In this article we are going to see why 128K-token (and larger) context models can't fully replace RAG.
We'll start with a brief reminder of the problems that RAG solves, before looking at the improvements in LLMs and their impact on the need to use RAG.
RAG isn't really new
The idea of injecting context to give a language model access to up-to-date data is quite "old" (at LLM timescales). It was first introduced by Facebook AI/Meta researchers in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". By comparison, the first version of ChatGPT was only released in November 2022.
In this paper they distinguish two kinds of memory:
- the parametric memory, which is inherent to the LLM: what it learned while being fed lots and lots of text during training,
- the non-parametric memory, which is the memory you can provide by feeding context into the prompt.
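The non-parametric side of this distinction can be sketched in a few lines: retrieve documents relevant to the question, then inject them into the prompt so the model can answer from them rather than from its training data. All names and the keyword-overlap retriever below are illustrative; a real RAG system would use embedding-based similarity search.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval (stand-in for an embedding search)."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved context: this is the non-parametric memory."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "ChatGPT was first released in November 2022.",
    "The RAG paper was published by Facebook AI in 2020.",
    "Paris is the capital of France.",
]
print(build_prompt("When was ChatGPT released?", docs))
```

The resulting string would then be sent to the LLM; everything the model knows beyond that prompt is its parametric memory.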