RAG from the Ground Up with Python and Ollama

Decoder · 44,506 views · 1 year ago

Retrieval-Augmented Generation (RAG) is the de facto technique for giving LLMs the ability to interact with any document or dataset, regardless of its size. Follow along as I cover how to parse and manipulate documents, explore how embeddings are used to represent abstract concepts, implement a simple yet powerful way to surface the parts of a document most relevant to a given query, and ultimately build a script you can use to have a locally hosted LLM engage with your own documents.
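
The similarity search at the heart of this approach can be sketched in a few lines of plain Python. The vectors below are toy stand-ins for what an embedding model such as nomic-embed-text would actually return, so this is an illustration of the comparison step only, not a full pipeline:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(needle, haystack, top_k=3):
    # Score every stored chunk embedding against the query embedding
    # and return (score, chunk_index) pairs, best match first.
    scored = [(cosine_similarity(needle, vec), i) for i, vec in enumerate(haystack)]
    return sorted(scored, reverse=True)[:top_k]

# Toy 3-dimensional "embeddings" for illustration only.
chunks = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 1.0, 0.0]]
query = [0.9, 0.1, 0.0]
print(most_similar(query, chunks, top_k=2))
```

Cosine similarity compares the direction of two vectors rather than their magnitude, which is why it works well for ranking embeddings against a query.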

Check out my other Ollama videos: https://www.youtube.com/playlist?list=PL4041kTesIWby5zznE5UySIsGPrGuEqdB

Links:
Code from video - https://decoder.sh/videos/rag-from-the-ground-up-with-python-and-ollama
Ollama Python library - https://github.com/ollama/ollama-python
Project Gutenberg - https://www.gutenberg.org
Nomic Embedding model (on ollama) - https://ollama.com/library/nomic-embed-text
BGE Embedding model - https://huggingface.co/CompendiumLabs/bge-base-en-v1.5-gguf/blob/main/bge-base-en-v1.5-f16.gguf
How to use a model from HF with Ollama - https://www.youtube.com/watch?v=fnvZJU5Fj3Q
Cosine Similarity - https://blog.gopenai.com/rag-for-everyone-a-beginners-guide-to-embedding-similarity-search-and-vector-db-423946475c90#cdfc
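
The video walks through a parse_file() function for the source document (02:37). One minimal way such a function might split a plain-text book, like one from Project Gutenberg, into paragraph-sized chunks, treating blank lines as paragraph boundaries, is:

```python
def parse_file(filename):
    # Split a plain-text file into paragraph-sized chunks:
    # blank lines mark paragraph boundaries.
    paragraphs = []
    buffer = []
    with open(filename, encoding="utf-8-sig") as f:
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                paragraphs.append(" ".join(buffer))
                buffer = []
    if buffer:
        # Flush the final paragraph if the file doesn't end with a blank line.
        paragraphs.append(" ".join(buffer))
    return paragraphs
```

Each returned string is one chunk to embed; chunking by paragraph keeps chunks small enough to embed cheaply while preserving local context.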

Timestamps:
00:00 - Intro
00:26 - Environment Setup
00:49 - Function review
01:50 - Source Document
02:18 - Starting the project
02:37 - parse_file()
04:35 - Understanding embeddings
05:40 - Implementing embeddings
07:01 - Timing embedding
07:35 - Caching embeddings
10:06 - Prompt embedding
10:19 - Cosine similarity for embedding comparison
12:16 - Brainstorming improvements
13:15 - Giving context to our LLM
14:29 - CLI input
14:49 - Next steps
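
The caching step (07:35) amounts to saving computed embeddings to disk and reloading them instead of re-embedding the document on every run. A minimal sketch, where embed_chunks is a hypothetical stand-in for calling an embedding model (e.g. ollama.embeddings(model="nomic-embed-text", prompt=chunk) per chunk via the Ollama Python library):

```python
import json
import os

def embed_chunks(chunks):
    # Stand-in for a real embedding call; a real version would ask an
    # embedding model for a vector per chunk. Here: one fake dimension.
    return [[float(len(c))] for c in chunks]

def get_embeddings(cache_path, chunks):
    # Reuse cached embeddings when the chunk count matches; otherwise
    # compute them once and write the cache to disk as JSON.
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        if len(cached) == len(chunks):
            return cached
    embeddings = embed_chunks(chunks)
    with open(cache_path, "w") as f:
        json.dump(embeddings, f)
    return embeddings
```

The chunk-count check is a crude staleness test; hashing the source file would be a more robust invalidation strategy.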
