
Making AI Transparent in Education: A Deep Dive into the XAI Challenge 2025

TLDR: The XAI Challenge 2025, held at IJCNN 2025, focused on developing explainable Question-Answering (QA) systems for educational settings. Participants were tasked with building AI systems that could answer student queries about university policies and provide clear, logic-based natural language explanations. The challenge emphasized the integration of lightweight Large Language Models (LLMs) with symbolic reasoning, using a meticulously crafted dataset based on real university policies. Evaluation prioritized not only answer correctness but also the relevance of supporting evidence and the clarity of explanations. The competition highlighted the complexity of building transparent AI and showcased diverse approaches, demonstrating the feasibility of bridging LLMs and symbolic reasoning for trustworthy educational AI.

Artificial Intelligence (AI) is increasingly becoming a part of our educational landscape, from helping students with homework to managing university policies. As AI systems become more powerful, there’s a growing need for them to be transparent and understandable, especially in critical areas like education. This is where Explainable AI (XAI) comes in, focusing on making AI decisions clear and interpretable.

To address this crucial need, the XAI Challenge 2025 was organized as a hackathon-style competition. It was a collaborative effort between Ho Chi Minh City University of Technology (HCMUT) and the International Workshop on Trustworthiness and Reliability in Neurosymbolic AI (TRNS-AI), held as part of the International Joint Conference on Neural Networks (IJCNN 2025). The challenge aimed to push the boundaries of AI in education by tasking participants with building Question-Answering (QA) systems. These systems had to answer student queries about university policies and, crucially, provide clear, logic-based explanations in natural language.

The Challenge’s Unique Approach

What made the XAI Challenge 2025 stand out was its emphasis on combining Large Language Models (LLMs) with symbolic reasoning. While LLMs are excellent at understanding and generating human-like text, they can sometimes be opaque in their decision-making. Symbolic reasoning, on the other hand, involves using logic and rules, making its processes inherently more transparent. The challenge encouraged participants to use lightweight LLMs or hybrid LLM-symbolic systems, promoting solutions that were not only accurate but also transparent and trustworthy.

A Carefully Crafted Dataset

A high-quality dataset was central to the challenge. It was designed to reflect real-world academic scenarios, covering various university regulations like course enrollment and graduation requirements. The dataset was unique because it included premises in both natural language and formal logic, along with questions, correct answers, and human-readable explanations that outlined the reasoning steps. This structure encouraged systems to base their answers on explicit logical evidence rather than just statistical approximations. The dataset was built using a three-stage process: generating logically consistent premises with a Z3 Solver, translating them into natural language using AI, and finally, refining them through expert student review to ensure clarity and relevance.
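The first stage of that pipeline, generating logically consistent premises, can be illustrated with a small sketch. The challenge used the Z3 solver for this; the stand-in below brute-forces truth assignments instead so it runs without external dependencies. The premise encoding and variable names are illustrative assumptions, not the challenge's actual format.

```python
from itertools import product

def consistent(premises, variables):
    """Return True if some truth assignment satisfies every premise.

    A stand-in for a real SMT check (the challenge used Z3): premises are
    hypothetical boolean functions over a truth-assignment dict.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(p(assignment) for p in premises):
            return True
    return False

# Illustrative policy premises: "passing the prerequisite permits enrollment"
# and "this student passed the prerequisite".
premises = [
    lambda a: (not a["passed_prereq"]) or a["may_enroll"],  # passed -> enroll
    lambda a: a["passed_prereq"],                           # student passed
]
print(consistent(premises, ["passed_prereq", "may_enroll"]))  # True
```

Only premise sets that pass a check like this would move on to the natural-language translation and expert-review stages.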

Fair Play and Rigorous Evaluation

To ensure fairness and align with XAI goals, clear rules and constraints were set. Participants’ systems had to accept a question and a list of premises, then return an answer, the indices of supporting premises, and a concise explanation. External data was allowed only if fully disclosed, and hardcoded responses were prohibited. The evaluation protocol was multi-phased and comprehensive, assessing systems on three main dimensions: correctness of answers, relevance of premises (ensuring the system picked the minimal correct set of supporting evidence), and explainability (judging the clarity and logical coherence of the generated explanations). The final round even included live presentations where teams had to explain their design choices and reasoning strategies to a panel of professors.
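The required input/output contract described above can be sketched as a single function: question and premises in; answer, supporting-premise indices, and explanation out. The keyword-overlap retrieval here is a placeholder heuristic, not any team's actual method, and the premise texts are invented for illustration.

```python
import re

def tokens(text: str) -> set:
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def answer_query(question: str, premises: list) -> dict:
    """Toy QA system matching the challenge's I/O contract."""
    q_words = tokens(question)
    # Score each premise by word overlap with the question (placeholder
    # for real retrieval or symbolic reasoning).
    scores = [(i, len(q_words & tokens(p))) for i, p in enumerate(premises)]
    support = [i for i, s in sorted(scores, key=lambda t: -t[1]) if s > 0][:2]
    answer = "Yes" if support else "Unknown"
    explanation = "Based on premises " + ", ".join(map(str, support)) + "."
    return {"answer": answer,
            "supporting_premises": support,
            "explanation": explanation}

result = answer_query(
    "May a student enroll after passing the prerequisite?",
    ["Students who pass the prerequisite may enroll.",
     "Graduation requires 120 credits."])
print(result["answer"], result["supporting_premises"])  # Yes [0]
```

The evaluation would then score such a system on whether the answer is correct, whether `supporting_premises` is the minimal correct evidence set, and whether the explanation is clear and logically coherent.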

Insights from the Competition

The XAI Challenge 2025 attracted 107 participants across 28 teams, showcasing a diverse range of backgrounds and approaches. The results highlighted the inherent difficulty of the task, as even top scores were modest, underscoring the complexity of developing AI systems that can perform faithful and interpretable reasoning over real-world policies. The two-phase evaluation structure proved beneficial, allowing teams to refine their systems iteratively. Interestingly, selection scores didn’t always predict final rankings, emphasizing the importance of the live presentation and the ability to clearly explain one’s solution.


Diverse Solutions Emerged

The finalists showcased several innovative approaches:

  • Multi-Agent Systems with Symbolic Reasoning: One top team used a modular system where different AI agents handled parsing, logical inference (with Z3), and explanation synthesis.
  • Prompt-Based Learning with Task-Specific Templates: Other teams guided lightweight LLMs with tailored prompts to extract information and generate explanations, sometimes using Chain-of-Thought prompting for step-by-step reasoning.
  • Rule Retrieval with Symbolic Inference: Another method involved building a structured rulebase and using keyword matching to find relevant policies, then applying a symbolic solver for formal inference.
  • Multi-Task Fine-Tuning with Mixture-of-Experts: A learning-based system fine-tuned multiple LLMs for specific tasks like answer generation and explanation construction, routing them through a specialized architecture.
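The "rule retrieval with symbolic inference" pattern from the list above can be sketched as retrieving matching rules and then forward-chaining over them to derive new facts. The rules and facts below are invented for illustration; the actual finalist used a structured rulebase over real university policies and a symbolic solver.

```python
# Hypothetical rulebase: each rule is (body facts, derived fact).
RULES = [
    ({"completed_credits", "passed_thesis"}, "may_graduate"),
    ({"registered", "paid_fees"}, "may_enroll"),
]

def forward_chain(facts: set, rules) -> set:
    """Apply rules repeatedly until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

facts = {"completed_credits", "passed_thesis"}
print("may_graduate" in forward_chain(facts, RULES))  # True
```

Because every derived fact traces back to explicit rules, an explanation can simply cite the rules that fired, which is exactly the kind of transparency the challenge rewarded.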

These varied approaches demonstrated the trade-offs between transparency, flexibility, and performance, providing valuable insights into designing XAI systems for education and policy domains. The XAI Challenge 2025 successfully demonstrated the potential of combining LLMs and symbolic reasoning to create practical, interpretable solutions for real-world educational QA tasks. For more details, you can read the full research paper here.

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
