A New Milestone in Vietnamese Legal AI: Introducing the VLQA Dataset

TLDR: The VLQA dataset is the first comprehensive, large, and high-quality Vietnamese dataset for legal question answering. It comprises over 3,000 real-world legal questions from citizens, meticulously annotated by legal professionals with references to nearly 60,000 statutory articles. This dataset addresses the critical scarcity of resources for legal Natural Language Processing (NLP) in low-resource languages like Vietnamese, enabling the development and evaluation of more reliable legal AI systems. Experiments with state-of-the-art models show its utility, while also highlighting current limitations of large language models (LLMs) in legal reasoning, such as factual inaccuracies and hallucinations.

The field of Artificial Intelligence (AI) and Natural Language Processing (NLP) has seen remarkable advancements, particularly with the rise of large language models (LLMs). These powerful models are increasingly being explored for complex tasks, including those within the legal domain. However, despite their impressive capabilities, there’s a significant gap between their current performance and the ultimate goal of fully automating legal tasks. This challenge is even more pronounced in countries with distinct legal systems and languages, especially those considered ‘low-resource’ in terms of available digital data, such as Vietnam.

Legal NLP in Vietnamese faces a major hurdle: the scarcity of high-quality, annotated data. Labeled legal corpora are essential for training, validating, and fine-tuning AI models for legal applications, so their absence holds the field back. Addressing this pressing need, a new research paper introduces the VLQA dataset.

What is VLQA?

VLQA, which stands for Vietnamese Legal Question Answering, is introduced as the first comprehensive, large, and high-quality dataset specifically designed for the Vietnamese legal domain. It aims to bridge the gap between complex legal knowledge and public understanding by providing a robust foundation for developing advanced legal AI systems.

The dataset stands out for several reasons. Firstly, it comprises over 3,000 real-world legal questions posed by Vietnamese citizens, so the data reflects genuine concerns encountered by everyday people. Secondly, these questions are meticulously annotated by legal professionals, with references to relevant statutory articles drawn from an expansive corpus of 59,636 legal provisions. According to the authors, this makes VLQA the largest expert-verified legal question answering dataset in the statutory domain to date. Lastly, it provides both the relevant legal articles and detailed, long-form answers, supporting two fundamental legal NLP tasks: information retrieval and question answering.

How VLQA Was Built

The creation of the VLQA dataset involved a meticulous four-phase process to ensure its quality and comprehensiveness. It began with the collection of an expansive article-based legal corpus, encompassing 2,162 legal documents across 27 common domains within the Vietnamese legal framework. This corpus, consisting of 59,636 articles, captures the structural hierarchy of Vietnamese legislation, providing rich context for legal queries.

Following this, question-answer-article triplets were collected from well-known online legal consultation platforms in Vietnam. These platforms feature user-generated legal concerns, ensuring the real-world relevance of the questions. A rigorous filtering process was applied to remove irrelevant or duplicated content and to align explicit legal references within the answers to the collected articles.
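
The filtering step described above can be pictured with a small sketch. This is not the authors' pipeline: the field names (question, answer, article_ids) and the exact-match deduplication rule are assumptions made purely for illustration. It drops repeated questions and keeps only the references that can be aligned to an article in the collected corpus.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC, lowercase, and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).lower().split())


def filter_triplets(triplets, corpus_ids):
    """Drop duplicate questions and triplets with no reference in the corpus.

    `triplets` is a list of dicts with hypothetical keys
    "question", "answer", and "article_ids".
    """
    seen = set()
    kept = []
    for t in triplets:
        key = normalize(t["question"])
        if key in seen:
            continue  # duplicate of a question that was already kept
        # Keep only references that align to a collected article.
        aligned = [a for a in t["article_ids"] if a in corpus_ids]
        if not aligned:
            continue  # nothing in the answer maps onto the corpus
        seen.add(key)
        kept.append({**t, "article_ids": aligned})
    return kept
```

A real pipeline would likely need fuzzier duplicate detection than normalized exact matching, since the same concern is often phrased in different ways.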

A crucial phase was expert validation. A professional annotation team, comprising senior law students supervised by an experienced legal expert, meticulously reviewed and refined the collected answers and their associated legal articles. Each question-answer pair was independently annotated by two annotators to minimize bias, and a legal expert conducted the final quality assurance, checking for clarity, validity (ensuring references to current statutory articles), and fluency of the answers.

Finally, the verified data, consisting of 3,129 high-quality triplets, was partitioned into training, validation, and testing subsets to facilitate model development and evaluation.
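
As a rough illustration of the partitioning step, the sketch below shuffles a list of triplets with a fixed seed and slices it into three subsets. The 80/10/10 ratio and the seed are assumptions for the example; the article does not state the split sizes used for VLQA.

```python
import random


def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle deterministically, then slice into train/validation/test.

    The fractions are illustrative defaults, not the VLQA split.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Example with 3,129 placeholder IDs, matching the number of triplets.
train, val, test = split_dataset(range(3129))
```

Fixing the seed keeps the split reproducible, which matters when different models are compared on the same test subset.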

Key Findings from Experiments

The researchers conducted extensive experiments to establish strong baselines on the VLQA dataset, evaluating various state-of-the-art retrieval and legal question-answering methods. For legal article retrieval, models like BGE-m3 showed strong performance in zero-shot settings, while fine-tuned models like mBERT significantly outperformed zero-shot baselines, underscoring the importance of domain adaptation for effective legal information retrieval.
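
To make the retrieval comparison concrete, here is a minimal sketch of two metrics commonly used for this kind of task, recall@k and reciprocal rank. The article does not say which metrics the paper reports, so treat the choice as an assumption. The article IDs are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant articles that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant article, or 0.0 if none is retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking for one question with two gold articles.
ranked = ["art_12", "art_7", "art_40"]
gold = ["art_7", "art_99"]
print(recall_at_k(ranked, gold, k=2), reciprocal_rank(ranked, gold))
```

Averaging these per-question scores over the test set gives a single number per model, which is how a zero-shot retriever and a fine-tuned one would typically be compared.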

In legal question answering, both extractive and generative models were benchmarked. Extractive models, which pull answers directly from text, generally performed well on lexical overlap metrics. Generative models, which create new answers, excelled in contextual semantic similarity. Interestingly, a smaller, fine-tuned generative model like BARTpho achieved performance comparable to much larger LLMs, further highlighting the benefits of domain-specific training.
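
A lexical overlap metric rewards answers that reuse the reference wording, which helps explain why extractive models score well on it. The sketch below computes a token-level F1 score as one example of such a metric. This is an assumption for illustration: the article does not name the specific metrics used in the paper.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Whitespace splitting on Vietnamese text yields syllables rather than words, so a real evaluation would likely apply a Vietnamese word segmenter first. A paraphrased but correct answer can score low here, which is why semantic similarity metrics and human evaluation are used alongside overlap scores.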

The study also evaluated the in-context learning capabilities of recent LLMs, including open-weight models like Qwen2.5 and commercial offerings like GPT-4o and DeepSeek-V3. GPT-4o-mini consistently achieved high scores on automatic metrics. However, the research revealed a significant caveat: although LLMs can generate fluent and well-structured responses, human evaluation often uncovered factual inaccuracies, incompleteness, or even ‘hallucinated’ elements not present in the source text. This disparity between superficial language generation and robust legal reasoning indicates a substantial area for future improvement in AI for legal applications.

Looking Ahead

The introduction of the VLQA dataset marks a significant step forward for legal NLP in Vietnam and for low-resource languages globally. By making this dataset publicly available, the researchers aim to foster further innovation in legal AI. The findings from their comprehensive evaluations highlight both the potential and the current limitations of state-of-the-art models in handling the nuances of legal information. Future work will focus on developing robust, end-to-end frameworks capable of processing lengthy legal texts, addressing complex legal queries, and meeting real-world application needs, ultimately contributing to more accessible and trustworthy legal assistance tools. You can find more details about this research in the paper available at arXiv:2507.19995.
