LLMs are powerful, but they have limitations: their knowledge is fixed in their weights, and their context window is limited. Worse, when they don't know something, they might just make it up. RAG, short for Retrieval-Augmented Generation, has emerged as a way to mitigate both of these problems. However, implementing RAG effectively is more complex than it seems. The nitty-gritty of what makes retrieval good is rarely talked about: no, cosine similarity is, in fact, not all you need. In this workshop, we explore what goes into building a robust RAG pipeline, and how simple insights from retrieval research can greatly improve your RAG efforts. We'll cover key topics like BM25, re-ranking, indexing, domain specificity, evaluation beyond LGTM@few, and filtering. Be prepared for a whole new crowd of incredibly useful buzzwords to enter your vocabulary.
More resources are available here:
https://parlance-labs.com/education/rag/ben.html
00:00 Introduction
Hamel introduces Ben Clavié, a researcher at Answer.ai with a strong background in information retrieval and the creator of the RAGatouille library.
00:48 Ben's Background
Ben shares his journey into AI and information retrieval, his work at Answer.ai, and the open-source libraries he maintains, including rerankers.
02:20 Agenda
Ben defines Retrieval-Augmented Generation (RAG), clarifies common misconceptions, and explains that RAG is not a silver bullet or an end-to-end system.
05:01 RAG Basics and Limitations
Ben explains the basic mechanics of RAG, emphasizing that it is simply the process of stitching retrieval and generation together, and discusses common failure points.
06:29 RAG MVP Pipeline
Ben breaks down the simple RAG pipeline, including model loading, data encoding, cosine similarity search, and obtaining relevant documents.
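The MVP pipeline described here can be sketched in a few lines. The toy vectors below stand in for the embeddings a real encoder model would produce; only the cosine-similarity search itself is shown for real:

```python
import numpy as np

def cosine_sim(query_vec, doc_matrix):
    """Cosine similarity between one query vector and each row of doc_matrix."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return d @ q

# Toy stand-ins for embeddings produced by a real encoder model.
doc_embeddings = np.array([[0.9, 0.1, 0.0],
                           [0.1, 0.8, 0.1],
                           [0.0, 0.2, 0.9]])
query_embedding = np.array([1.0, 0.0, 0.1])

scores = cosine_sim(query_embedding, doc_embeddings)
top_k = np.argsort(scores)[::-1][:2]  # indices of the 2 most similar documents
```

In a real pipeline, the encoding step is a model call and the retrieved documents are then stitched into the LLM prompt; everything else is exactly this.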
07:54 Vector Databases
Ben explains the role of vector databases in handling large-scale document retrieval efficiently and their place in the RAG pipeline.
08:46 Bi-Encoders
Ben describes bi-encoders, their efficiency in pre-computing document representations, and their role in quick query encoding and retrieval.
11:24 Cross-Encoders and Re-Ranking
Ben introduces cross-encoders, their computational expense, and their ability to provide more accurate relevance scores by encoding query-document pairs together.
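The re-ranking step can be sketched as follows. `cross_encoder_score` below is a hypothetical stand-in using simple word overlap; a real cross-encoder is a transformer that encodes the query-document pair jointly and outputs a relevance score:

```python
def cross_encoder_score(query: str, doc: str) -> float:
    """Stand-in for a real cross-encoder, which would encode the
    (query, document) pair jointly and output a relevance score."""
    q_words, d_words = set(query.lower().split()), set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def rerank(query, candidates, top_k=3):
    """Re-score candidates from the (cheap) first stage and keep the
    best top_k according to the (expensive) cross-encoder."""
    scored = [(cross_encoder_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

candidates = ["BM25 is a keyword search function",
              "cross-encoders rerank retrieved documents",
              "vector databases store embeddings"]
best = rerank("how do cross-encoders rerank documents", candidates, top_k=1)
```

The key design point is that the expensive scorer only ever sees a handful of candidates: the bi-encoder narrows millions of documents down, and the cross-encoder re-orders that shortlist.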
14:38 Importance of Keyword Search
Ben highlights the enduring relevance of keyword search methods like BM25 and their role in handling specific terms and acronyms effectively.
15:24 Integration of Full-Text Search
Ben discusses the integration of full-text search (TF-IDF) with vector search to handle detailed and specific queries better, especially in technical domains.
16:34 TF-IDF and BM25
Ben explains TF-IDF, BM25, and their implementation in modern retrieval systems, emphasizing their effectiveness despite being older techniques.
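BM25 is simple enough to sketch directly from its formula. This is a toy implementation of the common Lucene-style non-negative variant, with the usual textbook defaults for `k1` and `b`; real search engines ship heavily optimized versions of the same idea:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many documents each query term appears.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Term frequency saturates (k1) and is normalized by doc length (b).
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = ["BM25 ranks documents by term frequency".lower().split(),
        "embeddings capture semantic similarity".lower().split(),
        "BM25 and TF-IDF are keyword methods".lower().split()]
scores = bm25_scores("bm25 keyword".split(), docs)
```

Note how a rare term ("keyword", present in one document) gets a higher IDF weight than a common one ("bm25", present in two), which is exactly what makes BM25 handle acronyms and domain-specific terms so well.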
19:22 Combined Retrieval Approach
Ben illustrates a combined retrieval approach using both embeddings and keyword search, recommending a balanced weighting of scores.
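One common way to combine the two score lists is to normalize each and take a weighted sum. The min-max normalization and the 0.7 weight below are illustrative choices, not a prescription from the talk:

```python
def normalize(scores):
    """Min-max normalize so scores from different retrievers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, keyword_scores, alpha=0.7):
    """Weighted combination: alpha on the embedding scores, the rest on BM25."""
    v, k = normalize(vector_scores), normalize(keyword_scores)
    return [alpha * vs + (1 - alpha) * ks for vs, ks in zip(v, k)]

# Raw cosine similarities and raw BM25 scores live on different scales,
# which is why normalization must happen before mixing them.
combined = hybrid_scores([0.91, 0.35, 0.60], [0.0, 4.2, 1.1], alpha=0.7)
```

The right `alpha` depends on your data; the point is simply that neither retriever's raw scale should dominate by accident.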
19:33 Metadata Filtering
Ben emphasizes the importance of metadata in filtering documents, providing examples and explaining how metadata can significantly improve retrieval relevance.
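A minimal sketch of metadata filtering: restrict the candidate pool before any scoring happens, so semantically similar but impossible answers never compete. The document fields below are made up for illustration:

```python
docs = [
    {"text": "Q3 earnings grew 12%",        "year": 2023, "source": "finance"},
    {"text": "Q3 earnings fell 4%",         "year": 2021, "source": "finance"},
    {"text": "New office opened in Berlin", "year": 2023, "source": "press"},
]

def filter_by_metadata(docs, **conditions):
    """Keep only documents whose metadata matches every condition,
    so retrieval only ever scores plausible candidates."""
    return [d for d in docs if all(d.get(k) == v for k, v in conditions.items())]

candidates = filter_by_metadata(docs, year=2023, source="finance")
# Retrieval then runs only on `candidates`.
```

For a query like "what were our 2023 earnings", the 2021 report is nearly identical in embedding space; filtering on `year` is what keeps it out, which no similarity score alone can guarantee.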
22:37 Full Pipeline Overview
Ben presents a comprehensive RAG pipeline incorporating bi-encoders, cross-encoders, full-text search, and metadata filtering, showing how to implement these steps in code.
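The full pipeline reduces to three stages: filter, retrieve cheaply, re-rank expensively. The scoring functions below are toy stand-ins passed in as parameters; a real system would plug in a bi-encoder plus BM25 for the first stage and a cross-encoder for the second:

```python
def run_pipeline(query, docs, *, keep, first_stage_score, rerank_score,
                 first_stage_k=25, final_k=3):
    """Staged retrieval: metadata filter -> cheap first stage -> re-ranking."""
    pool = [d for d in docs if keep(d)]                 # 1. metadata filter
    pool.sort(key=lambda d: first_stage_score(query, d), reverse=True)
    candidates = pool[:first_stage_k]                   # 2. cheap retrieval
    candidates.sort(key=lambda d: rerank_score(query, d), reverse=True)
    return candidates[:final_k]                         # 3. expensive re-rank

docs = [{"text": "bi-encoders embed documents ahead of time", "lang": "en"},
        {"text": "cross-encoders score query-document pairs",  "lang": "en"},
        {"text": "les cross-encoders sont plus précis",        "lang": "fr"}]

# Toy scorer: word overlap stands in for both stages' real models.
overlap = lambda q, d: len(set(q.split()) & set(d["text"].split()))

top = run_pipeline("how do cross-encoders score pairs", docs,
                   keep=lambda d: d["lang"] == "en",
                   first_stage_score=overlap, rerank_score=overlap)
```

Passing the scorers in as parameters mirrors how the components stay swappable in practice: each stage can be upgraded (or fine-tuned) independently.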
26:05 Q&A Session Introduction
26:14 Fine-Tuning Bi-Encoder and Cross-Encoder Models
Ben discusses the importance of fine-tuning bi-encoder and cross-encoder models for improved retrieval accuracy, emphasizing the need to keep the bi-encoder stage loose (favoring recall) and the cross-encoder stage precise.
26:59 Combining Scores from Different Retrieval Methods
A participant asks about combining scores from different retrieval methods. Ben explains the pros and cons of weighted averages versus taking top candidates from multiple rankers, emphasizing the importance of context and data specifics.
29:01 The Importance of RAG as Context Lengths Get Longer
Ben reflects on how RAG may evolve as LLM context lengths grow, but emphasizes that long context windows are not a silver bullet.
30:06 Chunking Strategies for Long Documents
Ben discusses effective chunking strategies for long documents, including overlapping chunks and ensuring chunks do not cut off sentences, while considering the importance of latency tolerance in production systems.
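The two strategies mentioned here, overlapping chunks and not cutting sentences, can be sketched together. This toy version counts sentence characters only (ignoring joining spaces) as its size budget, a simplification:

```python
def chunk_sentences(sentences, max_len=300, overlap=1):
    """Group whole sentences into chunks of roughly max_len characters,
    repeating the last `overlap` sentences at the start of the next chunk
    so no chunk cuts a sentence, and neighboring chunks share context."""
    chunks, current = [], []
    for sent in sentences:
        if current and sum(len(s) for s in current) + len(sent) > max_len:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # carry context into the next chunk
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks

sentences = ["a" * 100, "b" * 100, "c" * 100, "d" * 100]
chunks = chunk_sentences(sentences, max_len=250, overlap=1)
# Consecutive chunks share one full sentence of overlap.
```

Sentence splitting itself is left out here; in practice that step (and chunk size) is where latency and quality trade off, as the answer notes.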
30:56 Fine-Tuning Encoders and Advanced Retrieval with ColBERT
Ben discusses when it is worth fine-tuning your encoders, and introduces ColBERT, a late-interaction approach for more fine-grained retrieval.