Large Language Models (LLMs) have demonstrated extraordinary capabilities and are transforming numerous sectors. However, significant limitations persist that Retrieval-Augmented Generation (RAG) systems only partially address. This research proposal targets one of the most critical of these limitations: the excessive consumption of computational resources.
The goal is to develop efficient and scalable retrieval-augmented language models (RALMs) by addressing three fundamental research questions: (RQ1) how to give the model the ability to “know when it does not know”; (RQ2) how to transfer knowledge from the model to an external memory; (RQ3) how to apply joint compression of the retrieval (IR) component and the LLM to reduce model size. Answering these questions will close substantial gaps in the existing literature: the project will present one of the first in-depth investigations of compression within RAG models and will open new avenues for future research, particularly on memory transfer. Improving the efficiency and scalability of RAG models will also have a significant social impact, contributing to reducing computational costs and making these technologies more accessible, mitigating environmental impact and promoting the sustainability of artificial intelligence models, and ensuring greater social equity by democratizing access to advanced machine learning tools.
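To make the setting concrete, the minimal sketch below illustrates the kind of retrieval-augmented pipeline the project targets and where RQ1 and RQ2 intervene: passages are fetched from an external memory and the model abstains when retrieval confidence is low. The toy hashing encoder, the fixed threshold, and all function names are hypothetical placeholders chosen for illustration, not components of the proposed system.

```python
# Minimal sketch of a retrieval-augmented pipeline with an abstention
# threshold (RQ1) over an external memory of passages (RQ2).
# The hashing encoder and threshold are placeholders, not project components.
import math
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy bag-of-words hashing encoder (stand-in for a learned encoder)."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, memory: list[str], k: int = 2):
    """Return the top-k passages and their cosine similarities to the query."""
    q = embed(query)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, embed(p))), p) for p in memory),
        reverse=True,
    )
    return scored[:k]

def answer(query: str, memory: list[str], threshold: float = 0.3) -> str:
    """Abstain ("know when it does not know") if retrieval confidence is low."""
    hits = retrieve(query, memory)
    if not hits or hits[0][0] < threshold:
        return "I don't know."                      # RQ1: calibrated abstention
    context = " ".join(p for _, p in hits)          # RQ2: external memory as context
    return f"[answer grounded in context: {context}]"

if __name__ == "__main__":
    memory = [
        "Copenhagen is the capital of Denmark.",
        "RAG systems combine retrieval with generation.",
    ]
    print(answer("What is the capital of Denmark?", memory))   # answers from memory
    print(answer("Who won the 1998 chess olympiad?", memory))  # abstains
```

In the proposed work, the placeholder encoder and fixed threshold would be replaced by learned, calibrated components, and the memory and model would be jointly compressed (RQ3); the sketch only fixes the interfaces these questions act on.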
The project is fully feasible, resting on a clear methodology and solid theoretical foundations. The collaboration with the Department of Computer Science at the University of Copenhagen (DIKU) will provide the resources needed to carry it out successfully.