Enhancing Legal AI: A Structured Prompting Method for Long Documents

TLDR: A new study presents a structured prompting methodology for Large Language Models (LLMs) to effectively handle long legal documents. By combining document chunking with augmentation, engineered prompts for QWEN-2, and two novel heuristics (Distribution-based Localisation and Inverse Cardinality Weighting) for candidate selection, the approach achieves state-of-the-art performance in information retrieval from legal texts. This method offers a cost-effective alternative to fine-tuning, improving reliability and transparency in legal AI, and performs up to 9% better than previous methods.

Large Language Models (LLMs) are transforming many fields, but their adoption in the legal sector has faced unique hurdles. Key concerns include ensuring reliability and transparency, especially when dealing with the vast and complex nature of legal documents. A recent study by Strahinja Klem and Noura Al Moubayed introduces an innovative approach to overcome these challenges, offering a structured prompting methodology that allows general-purpose LLMs to effectively process and retrieve information from lengthy legal texts.

Addressing Legal Document Challenges with LLMs

Legal documents are notoriously long and intricate, often exceeding the context window of most LLMs. This “long document problem” means models struggle to process an entire document at once. Furthermore, the task of accurately retrieving specific information from these documents, known as the “information retrieval problem,” is a time-consuming and repetitive part of a legal professional’s job. The researchers recognized the need for trustworthy AI tools that can assist legal practitioners without becoming decision-makers themselves, prioritizing human agency and responsibility.

A Novel Structured Prompting Methodology

Instead of relying on expensive fine-tuning, which is common for specialized AI models, Klem and Al Moubayed propose a structured prompting methodology. This approach leverages the power of a general-purpose model, QWEN-2 (a 7 billion parameter variant), making the solution more accessible and scalable. The core of their method involves several key steps:

Chunking and Augmentation: To tackle the long document problem, legal documents are first split into smaller, manageable “chunks.” A crucial “augmentation” step is then applied, where redundancy is added between chunks. This helps to relink context that might otherwise be lost when a document is divided, significantly reducing the risk of missing critical information.
Engineered Prompting: The study emphasizes the importance of carefully crafted prompts. Through a systematic process of creation, testing, and optimization, the researchers developed prompts that guide the LLM to perform information retrieval tasks more accurately. This prompt engineering step is designed to increase the reliability of the model’s outputs.
Candidate Selection Heuristics: After the LLM processes each chunk and generates potential answers, a “candidate selection problem” arises – how to choose the most accurate answer from multiple possibilities. The researchers introduced two heuristics:
- Distribution-based Localisation (DBL): This heuristic uses patterns from existing data to predict where answers are most likely to appear within a document. Chunks containing these likely locations are given higher weight.
- Inverse Cardinality Weighting (ICW): This method groups similar answers and weights them inversely to the size of their groups. The idea is that correct answers may appear less often than incorrect or noisy responses, so down-weighting large groups helps isolate the most probable correct answer.
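The chunking and augmentation step can be illustrated with a minimal Python sketch. It assumes that the "redundancy" between chunks takes the form of a fixed character overlap between neighbouring windows; the article does not specify the exact scheme, and the function name and the `chunk_size` and `overlap` parameters are hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[tuple[int, str]]:
    """Split text into character windows; neighbours share `overlap` characters.

    Assumption: overlap is one possible form of the "augmentation" described
    in the article, not necessarily the scheme used in the study.
    Returns (start_offset, chunk_text) pairs.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += step
    return chunks


# Each chunk repeats the last two characters of the previous one.
print(chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2))
# prints [(0, 'abcd'), (2, 'cdef'), (4, 'efgh'), (6, 'ghij')]
```

Keeping the start offset alongside each chunk makes it possible to relate an answer back to its position in the full document later on.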
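The two candidate selection heuristics might be sketched as follows. This is an illustration under stated assumptions, not the formulation from the study: DBL is approximated by a histogram of relative answer positions, ICW by grouping answers that match after lowercasing and whitespace normalisation, and the two weights are combined by multiplication. All function names are hypothetical.

```python
from collections import Counter


def position_histogram(relative_positions, bins=10):
    """Share of known answers in each equal-width slice of a document (positions in 0..1)."""
    if not relative_positions:
        raise ValueError("need at least one known answer position")
    counts = [0] * bins
    for p in relative_positions:
        counts[min(int(p * bins), bins - 1)] += 1
    return [c / len(relative_positions) for c in counts]


def dbl_weight(histogram, chunk_start, chunk_end, doc_length):
    """Assumed DBL: histogram mass of the bin containing the chunk's midpoint."""
    midpoint = (chunk_start + chunk_end) / 2 / doc_length
    return histogram[min(int(midpoint * len(histogram)), len(histogram) - 1)]


def icw_weights(answers):
    """Assumed ICW: 1 / size of the group of answers with the same normalised text."""
    keys = [" ".join(a.lower().split()) for a in answers]
    sizes = Counter(keys)
    return [1 / sizes[k] for k in keys]


def select_candidate(candidates, histogram, doc_length):
    """Pick the answer with the highest DBL * ICW score.

    candidates: list of (answer, chunk_start, chunk_end) tuples.
    Multiplying the two weights is an assumption made for this sketch.
    """
    icw = icw_weights([answer for answer, _, _ in candidates])
    scored = [
        (dbl_weight(histogram, start, end, doc_length) * w, answer)
        for (answer, start, end), w in zip(candidates, icw)
    ]
    return max(scored, key=lambda pair: pair[0])[1]


# Toy data: past answers cluster near the start of documents.
hist = position_histogram([0.05, 0.1, 0.12, 0.9], bins=4)
candidates = [
    ("Not found", 500, 750),
    ("Delaware", 0, 250),
    ("not found", 750, 1000),
]
print(select_candidate(candidates, hist, doc_length=1000))
```

In this toy run the repeated "not found" responses form a group of two and are down-weighted, while the single answer from a chunk in a historically likely region scores highest.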

Performance and Implications

The methodology was tested on CUAD (the Contract Understanding Atticus Dataset), an American legal dataset designed for contract review and information retrieval. The model performed up to 9% better than previously presented methods, achieving state-of-the-art performance. This corresponds to an average increase in correctness of about 9% per question and 250 more correct answers in total.

While the study highlights the immense potential of structured prompt engineering in the legal domain, it also points out the limitations of current automatic evaluation metrics for question answering. This calls for future research into more specialized metrics that can accurately assess the nuanced and variable nature of legal text outputs.

Ultimately, this research underscores that combining structured prompt engineering with intelligent heuristics can turn generalist LLMs into powerful, reliable, and transparent tools for navigating long legal documents, while supporting accountability and responsibility in AI applications within law and beyond. For more details, see the full research paper by Klem and Al Moubayed.
