Semantic Caching for LLM models

Houssem Dellai

This is how to enhance the performance of intelligent applications by implementing a cache. For LLMs, the situation is a bit different because we are dealing with user prompts: the same request can be expressed in different words and styles, so text-to-text comparison is not enough. Semantic caching addresses this by comparing the 'intent' of prompts instead of their text. How does this work? It requires a tool for semantic comparison, typically an embedding model, plus a vector database to store the prompts and their answers, which can be done using Redis Cache. How do you put it all together? You'll find out in the video :)

Disclaimer: This video is part of my courses on Udemy: https://www.udemy.com/user/houssem-dellai/

Follow me on Twitter for more content: https://twitter.com/houssemdellai
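To illustrate the idea, here is a minimal sketch of a semantic cache. It assumes the sentence-transformers package for embeddings, and an in-memory list stands in for the Redis vector store discussed in the video; the model name and similarity threshold are illustrative choices, not the video's exact setup.

```python
# Minimal semantic-cache sketch (illustrative, not the video's exact code).
# Assumes the sentence-transformers package; an in-memory list stands in
# for the Redis vector store described in the video.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model (assumption)

cache = []  # list of (embedding, prompt, answer) entries
SIMILARITY_THRESHOLD = 0.90  # tune for your use case

def embed(text: str) -> np.ndarray:
    vec = model.encode(text)
    return vec / np.linalg.norm(vec)  # normalize so dot product = cosine similarity

def lookup(prompt: str):
    """Return a cached answer if a semantically similar prompt exists."""
    query = embed(prompt)
    for vec, cached_prompt, answer in cache:
        if float(np.dot(query, vec)) >= SIMILARITY_THRESHOLD:
            return answer  # cache hit: same intent, possibly different wording
    return None  # cache miss: call the LLM, then store() the new answer

def store(prompt: str, answer: str):
    cache.append((embed(prompt), prompt, answer))

# Usage: check the cache before calling the LLM.
store("What is the capital of France?", "Paris.")
print(lookup("Tell me France's capital city"))  # likely a hit despite different wording
```

In a production setup, the list and loop would be replaced by a vector index (for example, Redis vector search), which performs the same nearest-neighbor comparison at scale.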
