
AI’s Role in Islamic Inheritance: Assessing Language Models for Legal Reasoning

TLDR: A study evaluated various large language models (LLMs) on their ability to interpret and apply Arabic Islamic inheritance laws using the QIAS 2025 dataset. Proprietary models like GPT o3 and Gemini Flash 2.5 showed strong performance, and a majority voting ensemble achieved 92.7% accuracy, securing third place in the challenge. The research highlights LLMs’ potential in complex legal reasoning, while also noting challenges in fine-tuning and the importance of detailed datasets.

The intricate domain of Islamic inheritance, known as “Ilm al-Mawārīth,” is a cornerstone for Muslims, ensuring the equitable distribution of assets among heirs. However, the manual calculation of shares across numerous scenarios is notoriously complex, time-consuming, and prone to errors. Recent advancements in Large Language Models (LLMs) have opened new avenues, prompting researchers to explore their potential in assisting with such sophisticated legal reasoning tasks.
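To see why manual calculation is error-prone, consider a standard textbook case (illustrative only, not taken from the study's dataset): the deceased leaves a wife, one daughter, and a full brother. Under the commonly taught rules, the wife's share drops to 1/8 because a child exists, a sole daughter takes 1/2, and the brother, as residuary heir, inherits the remainder. A quick sketch with exact fractions:

```python
from fractions import Fraction

# Illustrative textbook case: wife, one daughter, one full brother.
wife = Fraction(1, 8)      # wife's share is 1/8 when the deceased leaves a child
daughter = Fraction(1, 2)  # a sole daughter takes 1/2
brother = 1 - wife - daughter  # residuary heir takes whatever remains

print(wife, daughter, brother)  # → 1/8 1/2 3/8
```

Even this simple case requires knowing which fixed shares apply, how the presence of one heir modifies another's share, and who absorbs the residue; real scenarios with many heirs multiply these interactions.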

A recent study delved into evaluating the reasoning capabilities of state-of-the-art LLMs in interpreting and applying Islamic inheritance laws. This research utilized a specialized dataset from the ArabicNLP QIAS 2025 challenge, which comprises inheritance case scenarios presented in Arabic and derived from authentic Islamic legal sources. The primary goal was to assess how accurately these models could identify heirs, compute their shares, and justify their reasoning in accordance with Islamic legal principles.

Exploring Model Performance

The study rigorously evaluated a range of models, including both base and fine-tuned LLMs. The experiments aimed to answer several key questions: how well do current Arabic open-source LLMs perform, to what extent do proprietary state-of-the-art LLMs excel, and can fine-tuning improve their performance in this specific domain?

For open-source Arabic models, Falcon3, Fanar, Allam, and an optimized version called “Allam Thinking” were tested. Among these, Allam showed relatively better performance, though overall, open-source models demonstrated a general lack of domain knowledge and reasoning capabilities in this complex area.

Proprietary models, including Gemini Flash 2.5, Gemini Pro 2.5, GPT-4o, and GPT o3, were also put to the test. Notably, GPT o3 and Gemini Flash 2.5 emerged as the strongest performers among the base models, showcasing advanced capabilities in understanding and reasoning about Islamic inheritance. Their accuracy rates were impressive, with GPT o3 achieving 92.3% and Gemini Flash 2.5 reaching 91.5%.

The Impact of Fine-Tuning and Prompt Design

The research also investigated the effect of fine-tuning LLMs. For instance, GPT-4o, a highly generalist model, saw significant improvement after fine-tuning with the domain-specific dataset, with its accuracy climbing to over 86%. This suggests that for models with moderate prior knowledge, fine-tuning can effectively bridge knowledge gaps.

Conversely, Gemini Flash 2.5 experienced a performance drop after fine-tuning. Researchers hypothesize this might be due to its optimization for Chain of Thought (CoT) reasoning. If the fine-tuning dataset primarily provides final labels without detailed reasoning chains, the model might lose its inherent reasoning structure, leading to a misalignment between its pre-trained capabilities and the fine-tuning objective.

Prompt design also played a crucial role. Models like GPT-4o showed notable sensitivity to how questions were phrased, with a Chain of Thought prompt significantly enhancing its performance compared to a simpler prompt. Gemini Flash 2.5, however, maintained high accuracy regardless of prompt content, indicating robust internal reasoning.
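The contrast between the two prompting styles can be sketched as follows. These prompts are illustrative, written in English for readability; they are not the exact prompts used in the study, and the dataset itself is in Arabic:

```python
# Hypothetical multiple-choice inheritance question (not from the QIAS dataset).
question = (
    "A man dies leaving a wife, a daughter, and a full brother. "
    "What share does the wife receive?"
)
choices = "A) 1/2  B) 1/4  C) 1/8  D) 1/6"

# Simple prompt: ask for the final answer letter only.
simple_prompt = f"{question}\n{choices}\nAnswer with the letter only."

# Chain-of-Thought prompt: ask the model to reason through the heirs first.
cot_prompt = (
    f"{question}\n{choices}\n"
    "First list each heir and the fixed share they are entitled to, "
    "then apply any adjustments, and finally state the answer letter."
)

print(simple_prompt)
print(cot_prompt)
```

Under the study's findings, a model like GPT-4o benefits from the second style, while Gemini Flash 2.5 answers accurately with either.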

A Winning Combination

In a final experiment, the study combined the predictions of the three best-performing base LLMs—GPT o3, Gemini Flash 2.5, and Gemini Pro 2.5—using a majority voting technique. This ensemble approach yielded a remarkable accuracy of 92.7%, securing third place overall in the QIAS 2025 challenge. This highlights the power of combining multiple strong models to achieve superior performance in complex legal reasoning tasks.
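The ensemble technique itself is simple: for each question, take the answer that most of the three models agree on. A minimal sketch, using hypothetical per-question answers (the model names match the study, but the answers below are made up for illustration):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common answer among model predictions.

    Ties are broken by the order predictions appear in the list.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical answers from the three base models on four questions.
per_model = {
    "gpt_o3":          ["B", "A", "D", "C"],
    "gemini_flash_25": ["B", "A", "C", "C"],
    "gemini_pro_25":   ["B", "C", "C", "C"],
}

ensemble = [
    majority_vote([answers[i] for answers in per_model.values()])
    for i in range(4)
]
print(ensemble)  # → ['B', 'A', 'C', 'C']
```

Because the three models err on different questions, the majority answer can be right even when one model is wrong, which is why the ensemble edged past the best single model's 92.3%.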


Looking Ahead

While LLMs show immense promise, limitations remain. They sometimes lack comprehensive knowledge of all inheritance scenarios, leading to occasional inaccuracies. The initial dataset, which lacked detailed reasoning for each answer choice, complicated the fine-tuning of reasoning-based models. However, a recently released second version of the dataset, including detailed reasoning, offers potential for further accuracy improvements.

This study underscores the significant potential of LLMs to assist in complex legal domains like Islamic inheritance, offering a path towards more efficient and accurate calculations. For more details, you can read the full research paper here.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
