
AI’s Role in Islamic Inheritance: Assessing Language Models for Legal Reasoning

TLDR: A study evaluated various large language models (LLMs) on their ability to interpret and apply Arabic Islamic inheritance laws using the QIAS 2025 dataset. Proprietary models like GPT o3 and Gemini Flash 2.5 showed strong performance, and a majority voting ensemble achieved 92.7% accuracy, securing third place in the challenge. The research highlights LLMs’ potential in complex legal reasoning, while also noting challenges in fine-tuning and the importance of detailed datasets.

The intricate domain of Islamic inheritance, known as “Ilm al-Mawārīth,” is a cornerstone for Muslims, ensuring the equitable distribution of assets among heirs. However, the manual calculation of shares across numerous scenarios is notoriously complex, time-consuming, and prone to errors. Recent advancements in Large Language Models (LLMs) have opened new avenues, prompting researchers to explore their potential in assisting with such sophisticated legal reasoning tasks.
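To see why manual calculation is error-prone, consider a standard textbook case (illustrative only, not taken from the study's dataset): the deceased leaves a wife, one daughter, and a full brother. Under the commonly taught rules, the wife's share drops to 1/8 because a child exists, a sole daughter takes 1/2, and the brother, as residuary heir, inherits the remainder. A quick sketch with exact fractions:

```python
from fractions import Fraction

# Illustrative textbook case: wife, one daughter, one full brother.
wife = Fraction(1, 8)      # wife's share is 1/8 when the deceased leaves a child
daughter = Fraction(1, 2)  # a sole daughter takes 1/2
brother = 1 - wife - daughter  # residuary heir takes whatever remains

print(wife, daughter, brother)  # → 1/8 1/2 3/8
```

Even this simple case requires knowing which fixed shares apply, how the presence of one heir modifies another's share, and who absorbs the residue; real scenarios with many heirs multiply these interactions.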

A recent study delved into evaluating the reasoning capabilities of state-of-the-art LLMs in interpreting and applying Islamic inheritance laws. This research utilized a specialized dataset from the ArabicNLP QIAS 2025 challenge, which comprises inheritance case scenarios presented in Arabic and derived from authentic Islamic legal sources. The primary goal was to assess how accurately these models could identify heirs, compute their shares, and justify their reasoning in accordance with Islamic legal principles.

Exploring Model Performance

The study rigorously evaluated a range of models, including both base and fine-tuned LLMs. The experiments aimed to answer several key questions: how well do current Arabic open-source LLMs perform, to what extent do proprietary state-of-the-art LLMs excel, and can fine-tuning improve their performance in this specific domain?

For open-source Arabic models, Falcon3, Fanar, Allam, and an optimized version called “Allam Thinking” were tested. Among these, Allam showed relatively better performance, though overall, open-source models demonstrated a general lack of domain knowledge and reasoning capabilities in this complex area.

Proprietary models, including Gemini Flash 2.5, Gemini Pro 2.5, GPT-4o, and GPT o3, were also put to the test. Notably, GPT o3 and Gemini Flash 2.5 emerged as the strongest performers among the base models, showcasing advanced capabilities in understanding and reasoning about Islamic inheritance. Their accuracy rates were impressive, with GPT o3 achieving 92.3% and Gemini Flash 2.5 reaching 91.5%.

The Impact of Fine-Tuning and Prompt Design

The research also investigated the effect of fine-tuning LLMs. For instance, GPT-4o, a highly generalist model, saw significant improvement after fine-tuning with the domain-specific dataset, with its accuracy climbing to over 86%. This suggests that for models with moderate prior knowledge, fine-tuning can effectively bridge knowledge gaps.

Conversely, Gemini Flash 2.5 experienced a performance drop after fine-tuning. Researchers hypothesize this might be due to its optimization for Chain of Thought (CoT) reasoning. If the fine-tuning dataset primarily provides final labels without detailed reasoning chains, the model might lose its inherent reasoning structure, leading to a misalignment between its pre-trained capabilities and the fine-tuning objective.

Prompt design also played a crucial role. Models like GPT-4o showed notable sensitivity to how questions were phrased, with a Chain of Thought prompt significantly enhancing its performance compared to a simpler prompt. Gemini Flash 2.5, however, maintained high accuracy regardless of prompt content, indicating robust internal reasoning.
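The contrast between the two prompting styles can be sketched as follows. These prompts are illustrative, written in English for readability; they are not the exact prompts used in the study, and the dataset itself is in Arabic:

```python
# Hypothetical multiple-choice inheritance question (not from the QIAS dataset).
question = (
    "A man dies leaving a wife, a daughter, and a full brother. "
    "What share does the wife receive?"
)
choices = "A) 1/2  B) 1/4  C) 1/8  D) 1/6"

# Simple prompt: ask for the final answer letter only.
simple_prompt = f"{question}\n{choices}\nAnswer with the letter only."

# Chain-of-Thought prompt: ask the model to reason through the heirs first.
cot_prompt = (
    f"{question}\n{choices}\n"
    "First list each heir and the fixed share they are entitled to, "
    "then apply any adjustments, and finally state the answer letter."
)

print(simple_prompt)
print(cot_prompt)
```

Under the study's findings, a model like GPT-4o benefits from the second style, while Gemini Flash 2.5 answers accurately with either.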

A Winning Combination

In a final experiment, the study combined the predictions of the three best-performing base LLMs—GPT o3, Gemini Flash 2.5, and Gemini Pro 2.5—using a majority voting technique. This ensemble approach yielded a remarkable accuracy of 92.7%, securing third place overall in the QIAS 2025 challenge. This highlights the power of combining multiple strong models to achieve superior performance in complex legal reasoning tasks.
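The ensemble technique itself is simple: for each question, take the answer that most of the three models agree on. A minimal sketch, using hypothetical per-question answers (the model names match the study, but the answers below are made up for illustration):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common answer among model predictions.

    Ties are broken by the order predictions appear in the list.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical answers from the three base models on four questions.
per_model = {
    "gpt_o3":          ["B", "A", "D", "C"],
    "gemini_flash_25": ["B", "A", "C", "C"],
    "gemini_pro_25":   ["B", "C", "C", "C"],
}

ensemble = [
    majority_vote([answers[i] for answers in per_model.values()])
    for i in range(4)
]
print(ensemble)  # → ['B', 'A', 'C', 'C']
```

Because the three models err on different questions, the majority answer can be right even when one model is wrong, which is why the ensemble edged past the best single model's 92.3%.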


Looking Ahead

While LLMs show immense promise, limitations remain. They sometimes lack comprehensive knowledge of all inheritance scenarios, leading to occasional inaccuracies. The initial dataset, which lacked detailed reasoning for each answer choice, complicated the fine-tuning of reasoning-based models. However, a recently released second version of the dataset, including detailed reasoning, offers potential for further accuracy improvements.

This study underscores the significant potential of LLMs to assist in complex legal domains like Islamic inheritance, offering a path towards more efficient and accurate calculations. For more details, you can read the full research paper here.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
