Retrieval-Augmented Generation (RAG) is one of the most essential use cases for Large Language Models.
You can ground your LLM so it answers questions based on the contents of your documents.
In this tutorial, we build a fully offline RAG-based LLM app that uses Ollama for inference, ChromaDB as the vector store, and Streamlit for the UI.
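As a taste of the document-splitting step covered in the video, here is a minimal sketch of an overlapping text chunker. The function name and parameters are illustrative assumptions, not the exact code from the repo:

```python
def split_text(text: str, chunk_size: int = 400, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so neighboring chunks
    share context when they are embedded and retrieved later."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so chunks overlap.
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded (e.g. with nomic-embed-text via Ollama) and stored in ChromaDB; see the linked code example for the full pipeline.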
---
🔥 *Resources*
- *Code Example*: https://github.com/yankeexe/llm-rag-with-reranker-demo
- Ollama Download: https://ollama.dev/download
- Ollama Llama3.2:3b: https://ollama.dev/library/llama3.2
- Ollama nomic-embed-text: https://ollama.com/library/nomic-embed-text
- Cross Encoder: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
- Streamlit: https://docs.streamlit.io/develop/api-reference
- ChromaDB: https://docs.trychroma.com/guides
- Ollama SDK: https://github.com/ollama/ollama-python
_Example Docs_
- AI Practitioner Doc: https://d1.awsstatic.com/training-and-certification/docs-ai-practitioner/AWS-Certified-AI-Practitioner_Exam-Guide.pdf
- CKS Doc: https://github.com/cncf/curriculum/blob/master/CKS_Curriculum%20v1.31.pdf
---
⚡️ *Follow me*
- Github: https://github.com/yankeexe
- LinkedIn: https://www.linkedin.com/in/yankeemaharjan
- Twitter (X): https://x.com/yankexe
- Website: https://yankee.dev
---
🎞️ Chapters
0:00 Intro
0:16 Application Demo
1:13 Prerequisites
1:50 Code: Env Setup
2:40 Code: App UI
3:40 Code: Splitting Document + Data Structures
8:06 Code: Embedding + Vector Database
15:58 Code: Adding LLM + Grounding
20:30 Code: Re-ranking with Cross-Encoders
24:29 Demo: Multi-document Relevance Scoring
25:47 Outro