DF-RAG: Enhancing RAG for question answering by balancing relevance and diversity of retrieved chunks
A pipeline that dynamically adapts the level of diversity for each query at test time without requiring prior information.
Retrieval-augmented generation (RAG) enables large language models (LLMs) to incorporate external knowledge for knowledge-intensive tasks, producing more factually accurate outputs. RAG's performance is therefore contingent on retrieving the right information. However, most existing RAG methods rely on cosine similarity for retrieval, which often introduces redundancy that limits information recall and thereby reduces performance on downstream tasks. In this work, we first show that diversity-focused retrieval, tuned at the dataset level, improves RAG performance across multiple challenging long-context question-answering benchmarks, including multi-hop ones. We then design an oracle that estimates the hypothetical upper bound achievable if diversity were optimized at the query level, revealing potential gains of up to 18% in F1 score. Motivated by this gap, we propose DF-RAG, a pipeline that dynamically adapts the level of diversity for each query at test time without requiring prior information. DF-RAG leverages a maximal marginal relevance (MMR)-based scoring mechanism combined with LLM-driven planning and execution, and can be used as a drop-in replacement for cosine similarity retrieval in modern RAG systems. Comprehensive experiments on five QA benchmarks show that DF-RAG consistently outperforms strong baselines, achieving 4-10% improvements in F1 over vanilla RAG and recovering up to 91.3% of the gap between vanilla RAG and the theoretical oracle upper bound.
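To make the relevance/diversity trade-off concrete, here is a minimal sketch of greedy selection under the standard MMR formulation. It is not DF-RAG's actual scoring function, which the abstract does not specify. The names `mmr_select` and `lam` and the toy 2-D embeddings are assumptions made for illustration; in DF-RAG the diversity level would be chosen per query by the LLM-driven planning step, which is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(query_vec, chunk_vecs, k, lam):
    """Greedy MMR: return the indices of up to k chunks, in selection order.

    lam = 1.0 reduces to plain cosine-similarity ranking; smaller values
    penalize chunks that resemble ones already selected.
    (Hypothetical helper, not part of any DF-RAG release.)
    """
    relevance = [cosine(query_vec, c) for c in chunk_vecs]
    selected = []
    candidates = list(range(len(chunk_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            # Redundancy: highest similarity to any already-selected chunk.
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lam * relevance[i] - (1.0 - lam) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy 2-D embeddings, made up for illustration.
query = [1.0, 0.0]
chunks = [
    [1.0, 0.1],   # 0: highly relevant
    [1.0, 0.12],  # 1: near-duplicate of chunk 0
    [0.8, -0.6],  # 2: less relevant, but covers different content
]

top_cosine = mmr_select(query, chunks, k=2, lam=1.0)
top_diverse = mmr_select(query, chunks, k=2, lam=0.5)
```

In this toy setup, `lam=1.0` selects chunks 0 and 1, which are near-duplicates, while `lam=0.5` replaces the duplicate with chunk 2. That is the redundancy effect the abstract attributes to pure cosine-similarity retrieval.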
Latest publications
ART: Adaptive Reasoning Trees for explainable claim verification
A hierarchical method for claim verification in Large Language Models.
EACL
Lessons from the field: An adaptable lifecycle approach to applied dialogue summarization
An industry case study on developing an agentic system to summarize multi-party interactions.
EACL
Deconstructing instruction-following: A new benchmark for granular analysis of Large Language Model instruction compliance abilities
A modular framework that uses a dynamically generated dataset to evaluate the capability of Large Language Models.
EACL