In this video we learn how to make Retrieval Augmented Generation (RAG) super fast for chatbots, Large Language Models (LLMs), and agents. We focus on designing RAG-powered conversational agents that use NVIDIA's NeMo Guardrails to decide when to call retrieval tools.
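As a rough sketch of the approach covered in the video: NeMo Guardrails uses Colang flows to route user questions to retrieval actions instead of letting the LLM decide via an agent loop. The flow below is illustrative only — the action names `retrieve` and `rag` and the exact intent examples are assumptions; see the linked notebook for the actual config.

```
define user ask question
  "what is retrieval augmented generation?"
  "tell me about llama 2"

define flow answer question
  user ask question
  # hypothetical action names registered from Python via register_action
  $contexts = execute retrieve(query=$last_user_message)
  $answer = execute rag(query=$last_user_message, contexts=$contexts)
  bot $answer
```

Because the routing decision is made by lightweight intent matching rather than a full agent reasoning step, the retrieval path avoids an extra LLM call, which is where the speedup comes from.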
Article:
https://www.pinecone.io/learn/fast-retrieval-augmented-generation/
Code:
https://github.com/pinecone-io/examples/blob/master/learn/generation/chatbots/nemo-guardrails/03-rag-with-actions.ipynb
Subscribe for Latest Articles and Videos:
https://www.pinecone.io/newsletter-signup/
AI Consulting:
https://aurelio.ai
Discord:
https://discord.gg/c5QtDB9RAP
Twitter: https://twitter.com/jamescalam
LinkedIn: https://www.linkedin.com/in/jamescalam/
00:00 Making RAG Faster
00:20 Different Types of RAG
01:03 Naive Retrieval Augmented Generation
02:22 RAG with Agents
05:06 Making RAG Faster
08:55 Implementing Fast RAG with Guardrails
11:02 Creating the Vector Database
12:52 RAG Functions in Guardrails
14:32 Guardrails Colang Config
16:13 Guardrails Register Actions
17:03 Testing RAG with Guardrails
19:42 RAG, Agents, and LLMs