CAG: Cache-Augmented Generation
Summary
Retrieval-Augmented Generation (RAG) has become the dominant approach for integrating external knowledge into LLMs, helping models access information beyond their training data. However, RAG comes with limitations, such as retrieval latency, document selection errors, and system complexity. Cache-Augmented Generation (CAG) presents an alternative that improves performance but does not fully address the core challenge of small context windows.
RAG has some drawbacks
- There can be significant retrieval latency as it searches for and organizes the correct data.
- It can select the wrong documents for a query. For example, it may retrieve an irrelevant document or rank a less relevant document above the correct one.
- It may introduce security and data issues 2️⃣.
- It adds system complexity:
  - an external application to manage the data (a vector database)
  - a process to keep that data up to date as it goes stale
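To make these drawbacks concrete, here is a minimal sketch of the retrieval step that sits in front of the model in a RAG pipeline. Everything in it is an illustrative assumption rather than any particular library's API: `ToyVectorStore`, `embed`, and `build_prompt` are hypothetical names, and the bag-of-words "embedding" stands in for a real embedding model and vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; a real system would run a separate vector database."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text):
        # Stale documents must be re-embedded and overwritten by some update process.
        self.docs[doc_id] = (text, embed(text))

    def search(self, query, k=1):
        # This scan runs on every query: the source of retrieval latency.
        q = embed(query)
        ranked = sorted(
            self.docs.items(),
            key=lambda item: cosine(q, item[1][1]),
            reverse=True,
        )
        # Only the top-k documents reach the model; a bad ranking means bad context.
        return [(doc_id, text) for doc_id, (text, _) in ranked[:k]]


def build_prompt(store, question, k=1):
    """Assemble the prompt that would be sent to the LLM (the model call is omitted)."""
    context = "\n".join(text for _, text in store.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.upsert("refunds", "the refund policy allows returns within 30 days")
store.upsert("shipping", "shipping takes five business days")
prompt = build_prompt(store, "what is the refund policy")
```

Each drawback maps to a line in the sketch: `search` runs on every query, the ranking decides which documents the model ever sees, and `upsert` has to be called again whenever a source document changes.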