TLDR: QTT-RAG is a multilingual Retrieval-Augmented Generation (mRAG) framework that addresses poor translation quality by explicitly scoring each translation on three dimensions: semantic equivalence, grammatical accuracy, and naturalness & fluency. Instead of rewriting content, it attaches these scores as metadata so the LLM can weigh each passage by how reliable its translation is. Because the translated text is left unaltered, the approach avoids the factual distortions and hallucinations that rewriting can introduce. It is most beneficial for low-resource languages such as Korean and Finnish, and it consistently outperforms existing mRAG baselines.
Retrieval-Augmented Generation (RAG) systems let large language models (LLMs) access and use external knowledge. When these systems operate across multiple languages, a setting known as multilingual RAG (mRAG), they face an additional hurdle: the quality of the translations they depend on.
A common practice in mRAG, especially for low-resource languages such as Korean or Finnish, is to retrieve documents, often in English, and then translate them into the user’s query language. The core problem is that poor translations can severely degrade the LLM's output, leading to inaccurate or even fabricated responses. Existing solutions either assume the translations are good enough, as CrossRAG does, or attempt to rewrite the translated content, as DKM-RAG does. Rewriting can improve fluency, but it carries a high risk of factual distortion and hallucination, where the LLM generates information that is not present in the original source.
To tackle these issues, researchers have introduced a framework called Quality-Aware Translation Tagging in mRAG (QTT-RAG). Rather than altering translated content, it explicitly evaluates translation quality. QTT-RAG assesses each translation on three dimensions: semantic equivalence (does it preserve the original meaning?), grammatical accuracy (is it grammatically correct?), and naturalness & fluency (does it sound natural to a native speaker?). The resulting scores are attached to the translated documents as metadata, and the translated text itself is left unchanged.
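The tagging idea can be illustrated with a small data structure. This is a minimal sketch, not the paper's implementation: the class names, the `tag_passage` helper, the 1-5 score scale, and the `dummy_scorer` stand-in for the LLM-based evaluator are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class QualityTags:
    # The three dimensions named in the article. The 1-5 scale is an assumption.
    semantic_equivalence: int
    grammatical_accuracy: int
    naturalness_fluency: int


@dataclass(frozen=True)
class Passage:
    source_text: str        # passage as retrieved (for example, in English)
    translated_text: str    # machine translation into the query language
    quality: Optional[QualityTags] = None


def tag_passage(passage: Passage,
                scorer: Callable[[str, str], QualityTags]) -> Passage:
    """Return a copy with scores attached; the translation itself is untouched."""
    return replace(passage, quality=scorer(passage.source_text,
                                           passage.translated_text))


def dummy_scorer(source: str, translation: str) -> QualityTags:
    # Hypothetical stand-in for the LLM-based evaluator: returns fixed scores.
    return QualityTags(semantic_equivalence=4,
                       grammatical_accuracy=5,
                       naturalness_fluency=3)


raw = Passage(source_text="Helsinki is the capital of Finland.",
              translated_text="Helsinki on Suomen pääkaupunki.")
tagged = tag_passage(raw, dummy_scorer)
```

Making the records immutable mirrors the non-destructive design: tagging can only add metadata, never edit the translated text.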
The strength of QTT-RAG lies in this non-destructive design. With the quality scores in view, the generator LLM can rely more heavily on information from high-quality translations and treat lower-quality passages with caution. This preserves the factual integrity of the retrieved information, a significant improvement over methods that risk distorting facts through rewriting.
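One way the scores might reach the generator is as a short header in front of each passage in the prompt. The sketch below is an assumption about the format: the article does not give the actual prompt template, and the function names, dictionary keys, and instruction wording here are hypothetical.

```python
from typing import Dict, List


def render_context(passages: List[Dict]) -> str:
    """Format passages with their quality tags as plain-text prompt context."""
    blocks = []
    for i, p in enumerate(passages, start=1):
        tags = p.get("quality")
        if tags is None:
            # Passage was already in the query language, so it has no scores.
            header = f"[Doc {i}] (original language, not translated)"
        else:
            header = (f"[Doc {i}] (translation quality: "
                      f"semantic={tags['semantic']}, "
                      f"grammar={tags['grammar']}, "
                      f"fluency={tags['fluency']})")
        # The passage text is appended verbatim, never rewritten.
        blocks.append(f"{header}\n{p['text']}")
    return "\n\n".join(blocks)


def build_prompt(question: str, passages: List[Dict]) -> str:
    instruction = ("Answer the question using the documents below. "
                   "Rely more on documents with higher translation quality "
                   "scores and treat low-scoring ones with caution.")
    return (f"{instruction}\n\n{render_context(passages)}"
            f"\n\nQuestion: {question}\nAnswer:")


docs = [
    {"text": "Helsinki on Suomen pääkaupunki.",
     "quality": {"semantic": 5, "grammar": 5, "fluency": 4}},
    {"text": "Suomi itsenäistyi vuonna 1917.", "quality": None},
]
prompt = build_prompt("Mikä on Suomen pääkaupunki?", docs)
```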
The QTT-RAG system operates as a five-stage pipeline. First, it retrieves an initial set of documents. Second, it reranks them to identify the most relevant ones. Third, it detects each document's language: documents already in the query language bypass translation, while foreign-language documents are translated by a neural machine translation model. Fourth, the quality tagging module uses an LLM-based agent to score the translated documents. Finally, the quality-tagged documents are passed to the generator LLM along with the user query, guiding it toward more reliable and factually grounded responses.
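The control flow of these stages can be sketched as a single function that takes each component as a callable. This is an illustrative outline only: every function name, the `top_k` parameter, and the toy stand-ins below are assumptions, and grouping language detection with translation as one stage is this sketch's reading of the description.

```python
from typing import Callable, Dict, List


def run_pipeline(query: str, query_lang: str,
                 retrieve: Callable, rerank: Callable,
                 detect_lang: Callable, translate: Callable,
                 score: Callable, generate: Callable,
                 top_k: int = 3):
    # Stage 1: initial document retrieval.
    candidates = retrieve(query)
    # Stage 2: reranking, keeping the most relevant documents.
    top_docs = rerank(query, candidates)[:top_k]

    context: List[Dict] = []
    for doc in top_docs:
        # Stage 3: language detection; same-language documents skip translation.
        if detect_lang(doc) == query_lang:
            context.append({"text": doc, "quality": None})
            continue
        translated = translate(doc, query_lang)
        # Stage 4: quality tagging of the translation (text is not modified).
        context.append({"text": translated,
                        "quality": score(doc, translated)})

    # Stage 5: generation from the query plus quality-tagged documents.
    return generate(query, context)


# Toy stand-ins so the sketch runs end to end; none of these are real models.
result = run_pipeline(
    query="Mikä on Suomen pääkaupunki?",
    query_lang="fi",
    retrieve=lambda q: ["Helsinki on Suomen pääkaupunki.",
                        "Helsinki is the capital of Finland."],
    rerank=lambda q, docs: docs,
    detect_lang=lambda d: "fi" if "Suomen" in d else "en",
    translate=lambda d, lang: f"[{lang}] {d}",
    score=lambda src, tgt: {"semantic": 4, "grammar": 5, "fluency": 3},
    generate=lambda q, ctx: ctx,  # returns the context for inspection
)
```

Passing the components in as arguments keeps the sketch independent of any particular retriever, translator, or LLM.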
Experiments were conducted on two multilingual open-domain question answering benchmarks, XORQA and MKQA, covering two low-resource languages (Korean and Finnish) and one high-resource language (Chinese). QTT-RAG was evaluated against baselines such as CrossRAG and DKM-RAG using six instruction-tuned LLMs ranging from 2.4 billion to 14 billion parameters. The results consistently showed QTT-RAG outperforming the baselines, particularly in the low-resource settings, as measured by character 3-gram recall, a metric well suited to multilingual evaluation.
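Character 3-gram recall can be computed in a few lines. The version below is a sketch under stated assumptions: it lowercases both strings, counts unique 3-grams rather than repeated ones, and returns 0.0 when the reference is shorter than three characters. The paper's exact normalization may differ.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of unique character n-grams in lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def char3_recall(reference: str, prediction: str) -> float:
    """Fraction of the reference's character 3-grams found in the prediction."""
    ref = char_ngrams(reference)
    if not ref:
        # Reference too short to form a 3-gram (an assumption of this sketch).
        return 0.0
    pred = char_ngrams(prediction)
    return len(ref & pred) / len(ref)


full = char3_recall("Helsinki", "The capital is Helsinki.")  # 1.0
half = char3_recall("abcd", "xabcx")                         # 0.5
none = char3_recall("Seoul", "Busan")                        # 0.0
```

Because it works on characters rather than whitespace-separated words, a metric of this kind gives partial credit for inflected or differently segmented answers, which matters for languages such as Finnish, Korean, and Chinese.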
While QTT-RAG demonstrated significant gains in Korean and Finnish, the improvements were smaller in Chinese. This was attributed to Chinese being a high-resource language, meaning a larger proportion of retrieved documents were already in Chinese, reducing the need for cross-lingual translation and thus limiting the opportunities for QTT-RAG to apply its quality tagging benefits. An ablation study also confirmed that explicit quality tagging is generally more reliable than simply filtering out low-quality translations, as filtering risks discarding potentially useful, albeit imperfect, information.
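The difference between filtering and tagging can be shown with a toy example. This is a hypothetical illustration, not the ablation setup from the paper: the threshold, the score values, the dictionary keys, and the sample passages are all invented for the sketch.

```python
from typing import Dict, List


def filter_passages(passages: List[Dict], threshold: int) -> List[Dict]:
    """Filtering: drop any passage whose lowest score is below the threshold."""
    return [p for p in passages if min(p["scores"].values()) >= threshold]


def tag_passages(passages: List[Dict]) -> List[str]:
    """Tagging: keep every passage and prefix its scores to the text."""
    out = []
    for p in passages:
        s = p["scores"]
        out.append(f"(semantic={s['semantic']}, grammar={s['grammar']}, "
                   f"fluency={s['fluency']}) {p['text']}")
    return out


passages = [
    {"text": "Fluent translation that does not contain the answer.",
     "scores": {"semantic": 5, "grammar": 5, "fluency": 5}},
    {"text": "Clumsy translation, but it mention the answer: 1917.",
     "scores": {"semantic": 4, "grammar": 2, "fluency": 2}},
]

kept = filter_passages(passages, threshold=3)   # the second passage is dropped
tagged = tag_passages(passages)                 # both passages are kept
```

Here the filtered context loses the only passage that contains the answer, while the tagged context keeps it and flags it as a weak translation.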
In conclusion, QTT-RAG offers a practical and robust solution for multilingual RAG systems, enabling more effective use of cross-lingual documents in diverse language environments. By explicitly assessing and tagging translation quality, it allows LLMs to navigate the complexities of translated content with greater awareness, leading to more accurate and trustworthy responses. For more details, you can refer to the original research paper.


