TLDR: A new research paper reviews 65 user studies on Explainable AI (XAI) systems, revealing a critical gap: current evaluations are often too technical and don’t adequately focus on human user needs. The study proposes a human-centered framework for XAI design and evaluation, categorizing metrics into core system (affection, cognition, usability, interpretability), explanation, and user levels. It expands design goals for AI novices (responsible use, acceptance, user experience) and data experts (human-AI collaboration, task performance), and offers guidelines for improving evaluation practices through validation, holistic assessment, and clear methodology. The aim is to ensure AI systems are not just performant but also understandable and trustworthy for all users.
As Artificial Intelligence (AI) becomes an increasingly common part of our daily lives, there’s a growing need for these intelligent systems to be not only powerful but also easy to understand. This is where Explainable AI (XAI) comes in, aiming to provide clear explanations for how AI makes its decisions and predictions. However, a recent comprehensive review highlights that the way we currently evaluate these XAI systems often focuses too much on technical aspects and not enough on the actual needs of the people using them.
A new research paper, titled On the Design and Evaluation of Human-Centered Explainable AI Systems: A Systematic Review and Taxonomy, by Aline Mangold, Juliane Zietz, Susanne Weinhold, and Sebastian Pannasch, delves into this issue. The authors conducted a thorough review of 65 user studies that evaluated XAI systems across various fields and applications. Their goal was to create a holistic guide for XAI developers, offering an overview of XAI system properties and evaluation metrics that are truly human-centered.
Understanding XAI Systems and Their Explanations
The review found that most XAI systems are designed for analysis, helping users understand data in different domains, from assessing social media content to image classification. These systems often provide explanations that are graphical (like charts and graphs) or textual (written or spoken). Many explanations focus on specific outputs, known as ‘local’ explanations, and frequently use ‘feature-based’ approaches, which highlight the most important factors influencing an AI’s prediction.
A key finding from the paper is the distinction between the ‘core system’ (the AI itself) and the ‘XAI explanation’ (how the AI’s decisions are communicated). Both are crucial for a complete understanding. Explanations can also be interactive, allowing users to ask for more details, or static, simply providing information without further interaction.
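To make the idea of a 'local', 'feature-based' explanation concrete, here is a minimal sketch for the simplest possible case, a linear model, where each feature's contribution to one prediction is just its weight times its value. The feature names, weights, and example inputs are illustrative and not taken from the paper.

```python
# A minimal sketch of a local, feature-based explanation for a linear model.
# All names and numbers below are illustrative, not from the reviewed studies.

def local_feature_attribution(weights, instance, feature_names):
    """Return each feature's contribution to one prediction (weight * value),
    ranked by absolute size -- the 'most important factors' for this output."""
    contributions = {
        name: w * x for name, w, x in zip(feature_names, weights, instance)
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Example: a toy credit-scoring model with three features.
weights = [0.8, -0.5, 0.1]
instance = [2.0, 1.0, 4.0]
names = ["income", "debt", "age"]
print(local_feature_attribution(weights, instance, names))
# [('income', 1.6), ('debt', -0.5), ('age', 0.4)]
```

Real XAI toolkits apply the same idea to non-linear models with more elaborate attribution methods, but the output shape is the same: a per-feature breakdown of a single prediction.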
Human-Centered Evaluation: Beyond Technical Metrics
The researchers categorized human-centered evaluation metrics into four main areas for the core system: Affection, Cognition, Usability, and Interpretability. They also looked at metrics specifically for explanations and user characteristics.
- Affection: This refers to users’ emotional engagement and feelings towards the system. ‘Trust’ was the most frequently measured metric here, as it’s vital for users to rely appropriately on AI. The paper points out a lack of validated questionnaires for measuring trust, suggesting the use of established scales like the Trust Scale for Explainable AI (TXAI).
- Cognition: This covers the mental processes users employ to understand the system. ‘Understandability’ was a key metric here, capturing how easy users find the system to understand.

- Usability: This assesses how effectively and intuitively users can interact with the system. ‘Usefulness’ and ‘intention to use’ were highlighted as important indicators of whether users will adopt the system.
- Interpretability: The paper redefined ‘transparency’ to focus on users’ perceived clarity of how specific features influence AI decisions, rather than just technical details.
Surprisingly, fewer than half of the reviewed papers evaluated the XAI explanation itself, focusing instead on the core system. The authors emphasize that ‘explanation usefulness’ and ‘explanation satisfaction’ are crucial metrics that deserve more attention.
User characteristics, such as ‘domain expertise’ (familiarity with the task or field the AI addresses), were also frequently assessed, as they significantly impact how users interact with and perceive XAI systems.
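The three evaluation levels described above can be summarized as a small lookup structure. The category names follow the article; the nesting and the helper function are an illustrative arrangement, not an API from the paper.

```python
# A compact sketch of the review's metric taxonomy as a Python mapping.
# Category names follow the article; the nesting itself is illustrative.

XAI_EVALUATION_TAXONOMY = {
    "core_system": {
        "affection": ["trust"],
        "cognition": ["understandability"],
        "usability": ["usefulness", "intention to use"],
        "interpretability": ["perceived transparency"],
    },
    "explanation": ["explanation usefulness", "explanation satisfaction"],
    "user": ["domain expertise"],
}

def metrics_for(level, dimension=None):
    """Look up the metrics recorded at a given evaluation level
    (and, for the core system, at a given dimension)."""
    entry = XAI_EVALUATION_TAXONOMY[level]
    return entry[dimension] if dimension else entry

print(metrics_for("core_system", "affection"))  # ['trust']
print(metrics_for("user"))                      # ['domain expertise']
```

Organizing the metrics this way makes the paper's main critique easy to see: a study that only samples from the "core_system" branch has left the "explanation" and "user" levels unmeasured.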
Extended Design Goals for Different User Groups
The paper extends existing design goals for two main user groups:
- AI Novices: For everyday users with little AI experience, the design goals include promoting ‘responsible use’ (understanding AI capabilities and risks), improving ‘acceptance’ (showing how the system aligns with user goals), and enhancing ‘user experience’ (making the system intuitive and easy to navigate).
- Data Experts: For professionals who use AI for analysis and decision-making, the goals are more performance-oriented. These include enhancing ‘human-AI collaboration’ (a bidirectional flow of information where humans and AI learn from each other) and improving ‘system and user task performance’ (equipping users with knowledge for optimal system use).
Guidelines for Better XAI Evaluation
Based on their findings, the authors provide several guidelines:
- Validation and Standardization: Use validated questionnaires and standardize user testing procedures to ensure accuracy and comparability of results.
- Presentation of Study Methodology: Clearly describe methods and constructs in research papers to enhance transparency and reproducibility.
- Holistic Evaluation: Consider all dimensions—affection, cognition, usability, interpretability, explanation, user characteristics, and user interaction behavior—for a comprehensive assessment.
- Evaluation of Explanations: Always integrate measures related to explanations to avoid incomplete evaluations.
- Consideration of Behavioral Intentions: Measure ‘intention to use’ in applied XAI studies, as it’s a direct predictor of actual usage behavior.
This research offers a valuable framework for developing and evaluating XAI systems from a human-centered perspective, ensuring that AI is not only intelligent but also understandable and trustworthy for all users.


