Retrieval Augmented Generation (RAG) is the de facto technique for giving LLMs the ability to work with any document or dataset, regardless of its size. Follow along as I cover how to parse and manipulate documents, explore how embeddings capture abstract concepts, implement a simple yet powerful way to surface the parts of a document most relevant to a given query, and ultimately build a script you can use to have a locally hosted LLM answer questions about your own documents.
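If you'd rather read than watch, here's a minimal sketch of where the script ends up, using the Ollama Python library. The file name, the model choices (nomic-embed-text for embeddings, mistral for chat), the blank-line chunking, and the five-chunk context window are illustrative assumptions rather than the video's exact code:

```python
# A minimal sketch of the full RAG pipeline; run with `python rag.py`.
# Assumes Ollama is running locally with the nomic-embed-text and mistral
# models pulled. "peter_pan.txt" and the helper names are illustrative.
import ollama

def parse_file(path):
    # Chunk a plain-text file into paragraphs split on blank lines.
    with open(path, encoding="utf-8") as f:
        return [p.strip() for p in f.read().split("\n\n") if p.strip()]

def cosine_similarity(a, b):
    # Dot product over the product of magnitudes; see the sketch below the timestamps.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

chunks = parse_file("peter_pan.txt")  # e.g. a Project Gutenberg download
embeddings = [
    ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
    for chunk in chunks
]

prompt = input("What do you want to know? ")
prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]

# Keep the five chunks most similar to the prompt as context for the LLM.
ranked = sorted(
    zip(embeddings, chunks),
    key=lambda pair: cosine_similarity(pair[0], prompt_embedding),
    reverse=True,
)
context = "\n\n".join(chunk for _, chunk in ranked[:5])

response = ollama.chat(
    model="mistral",
    messages=[
        {"role": "system", "content": "Answer using only this context:\n" + context},
        {"role": "user", "content": prompt},
    ],
)
print(response["message"]["content"])
```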
Check out my other Ollama videos: https://www.youtube.com/playlist?list=PL4041kTesIWby5zznE5UySIsGPrGuEqdB
Links:
Code from video - https://decoder.sh/videos/rag-from-the-ground-up-with-python-and-ollama
Ollama Python library - https://github.com/ollama/ollama-python
Project Gutenberg - https://www.gutenberg.org
Nomic embedding model (on Ollama) - https://ollama.com/library/nomic-embed-text
BGE embedding model - https://huggingface.co/CompendiumLabs/bge-base-en-v1.5-gguf/blob/main/bge-base-en-v1.5-f16.gguf
How to use a model from Hugging Face with Ollama - https://www.youtube.com/watch?v=fnvZJU5Fj3Q
Cosine Similarity - https://blog.gopenai.com/rag-for-everyone-a-beginners-guide-to-embedding-similarity-search-and-vector-db-423946475c90#cdfc
Timestamps:
00:00 - Intro
00:26 - Environment Setup
00:49 - Function review
01:50 - Source Document
02:18 - Starting the project
02:37 - parse_file()
04:35 - Understanding embeddings
05:40 - Implementing embeddings
07:01 - Timing embedding
07:35 - Caching embeddings (sketched below)
10:06 - Prompt embedding
10:19 - Cosine similarity for embedding comparison (sketched below)
12:16 - Brainstorming improvements
13:15 - Giving context to our LLM
14:29 - CLI input
14:49 - Next steps
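A note on the caching step at 07:35: embedding every paragraph of a book is the slow part of the pipeline, so it's worth doing it once and saving the vectors to disk. Here's a minimal sketch using one JSON file per source; the CACHE_DIR location and the embed_chunks helper are hypothetical names, not the video's exact code:

```python
import json
import os

import ollama

CACHE_DIR = "embeddings"  # hypothetical cache folder, one JSON file per source

def embed_chunks(source_name, model, chunks):
    """Embed each chunk once; later runs reload the saved vectors instead."""
    cache_path = os.path.join(CACHE_DIR, f"{source_name}.json")
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    embeddings = [
        ollama.embeddings(model=model, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(embeddings, f)
    return embeddings
```

One caveat with this approach: if you change the chunking or the embedding model, delete the cached file so the vectors get regenerated.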
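And on the similarity step at 10:19: cosine similarity is the dot product of two vectors divided by the product of their magnitudes. It scores direction rather than length, so two passages about the same concept can score high even if one is much longer. A dependency-free Python version:

```python
def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|), a score in [-1, 1]; higher = more similar
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = sum(x * x for x in a) ** 0.5
    mag_b = sum(y * y for y in b) ** 0.5
    return dot / (mag_a * mag_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # 1.0
print(cosine_similarity([1, 0], [0, 1]))        # 0.0
```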