Understanding LLM Decisions: A New Look at Explainability with llmSHAP

TLDR: llmSHAP introduces a principled approach to explaining Large Language Model (LLM) decisions with Shapley values. It addresses two obstacles: the stochastic nature of LLM outputs and the high computational cost of computing Shapley values. The paper examines four implementation variants (Standard, Cache-based, Sliding-Window, and Counterfactual) and analyzes their adherence to the Shapley axioms, their computational complexity, and the practical trade-off between explanation speed and faithfulness to exact Shapley attributions. The cache-based method offers both axiomatic guarantees and a speedup. The sliding-window and counterfactual methods prioritize speed at the cost of some of the axioms.

Large Language Models (LLMs) have become incredibly powerful tools, assisting humans in making important decisions across various fields. However, their complex internal workings often make it difficult to understand exactly how they arrive at a particular output. This lack of transparency, often referred to as the ‘black box’ problem, raises concerns about trust and oversight. Explainable AI (XAI) aims to address these concerns by providing insights into why and how algorithms produce specific results.

One of the most prominent XAI methods is SHAP, which is based on the Shapley value from cooperative game theory. SHAP quantifies the contribution of individual input features to a model’s output. Shapley values offer a theoretically sound approach to explainability, but applying them directly to LLMs presents unique challenges. The primary issue is that Shapley values assume a deterministic model, meaning the same input always yields the same output. LLMs, by design, are often stochastic. Their outputs can vary even for identical inputs because of sampling settings such as temperature and top-p (nucleus) sampling.

A new research paper, titled “llmSHAP: A Principled Approach to LLM Explainability,” by Filip Naudot, Tobias Sundqvist, and Timotheus Kampik, tackles these challenges head-on. The authors introduce llmSHAP, a framework that adapts Shapley value-based explanations for the stochastic nature of LLMs. The paper explores fundamental trade-offs that arise when implementing Shapley values for LLM explainability, considering factors like explanation speed, agreement with exact Shapley values, and adherence to core principles.

Understanding the Shapley Value and LLM Challenges

The Shapley value is a concept from game theory that fairly distributes the total payoff of a cooperative game among its players. In XAI, the features of an input are treated as ‘players,’ and the model’s output is the ‘payoff.’ The Shapley value of a feature is its average marginal contribution over all orders in which the features can be added. Equivalently, it is a weighted average over all coalitions of the other features. It satisfies several desirable principles:

- Efficiency: the attributions sum to the payoff of the full input minus the payoff of the empty input.
- Symmetry: two features that contribute equally to every coalition receive equal attribution.
- Null Player: a feature that never changes the payoff receives zero attribution.
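
The following is a minimal sketch of the definition. It computes exact Shapley values for a value function v that maps a set of feature indices to a number. For an LLM, v might score the output produced when only the features in the coalition are kept in the prompt. That mapping, and the toy function toy_value, are illustrative assumptions rather than llmSHAP’s API.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

# A value function maps a coalition (a set of feature indices) to a payoff.
ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(n: int, v: ValueFn) -> List[float]:
    """Exact Shapley values for features 0..n-1.

    phi_i = sum over S subset of N without i of
            |S|! (n - |S| - 1)! / n! * (v(S with i) - v(S))
    """
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        phi.append(total)
    return phi


def toy_value(s: FrozenSet[int]) -> float:
    """Hypothetical payoff: features 0 and 1 matter only together,
    feature 2 adds 0.5 on its own, and feature 3 never matters."""
    return (1.0 if {0, 1} <= s else 0.0) + (0.5 if 2 in s else 0.0)


phi = exact_shapley(4, toy_value)
print([round(x, 6) for x in phi])  # [0.5, 0.5, 0.5, 0.0]
# Efficiency: the attributions sum to v(N) - v(empty set) = 1.5.
print(round(sum(phi), 6))  # 1.5
```

Features 0 and 1 are symmetric and split their joint effect equally, and the null feature 3 receives exactly zero.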

However, LLMs introduce two main problems: stochastic inference and the high cost of computing Shapley values. An LLM can produce different outputs for the same input, so the deterministic-inference assumption that the Shapley principles rely on is often violated. Furthermore, exact Shapley values for n features require evaluating the model on up to 2^n feature coalitions. Each coalition is a separate LLM call, so the cost quickly becomes prohibitive.
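
The stochasticity problem can be seen in a minimal sketch. It assumes a hypothetical payoff function, noisy_value, that adds Gaussian noise on every call as a stand-in for sampling an LLM at nonzero temperature. The code applies the exact Shapley formula but re-evaluates every coalition from scratch. As a result, the same coalition can receive different values in different terms.

```python
import random
from itertools import combinations
from math import factorial
from typing import List

rng = random.Random(0)  # fixed seed so the demo is reproducible


def noisy_value(coalition: frozenset) -> float:
    """Hypothetical stand-in for an LLM-derived payoff: a fixed score plus
    noise that is redrawn on every call (mimicking sampling with temperature > 0)."""
    base = (1.0 if {0, 1} <= coalition else 0.0) + (0.5 if 2 in coalition else 0.0)
    return base + rng.gauss(0.0, 0.1)


def shapley(n: int, v) -> List[float]:
    """Exact Shapley formula, but every term calls v afresh (no reuse)."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # The value of s drawn here can differ from the value of s drawn
                # in another term, so the terms no longer cancel when summed.
                total += w * (v(s | {i}) - v(s))
        phi.append(total)
    return phi


n = 4
phi = shapley(n, noisy_value)
# Efficiency would require this gap to be zero; with fresh redraws it generally is not.
gap = sum(phi) - (noisy_value(frozenset(range(n))) - noisy_value(frozenset()))
print(f"efficiency gap: {gap:.4f}")
```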

llmSHAP’s Solutions and Variants

The llmSHAP framework proposes different approaches to address these issues, each with its own trade-offs:

Standard Shapley (ϕS): This is the direct application of Shapley values to LLMs. It satisfies the Symmetry and Null Player axioms but cannot guarantee Efficiency under stochastic redraws. Efficiency depends on each coalition’s value canceling out when the marginal contributions of all features are summed. If every evaluation redraws a fresh LLM output, the same coalition can receive different values in different terms, and the terms no longer cancel.
Cache-based Shapley (ϕCS): To restore determinism, this variant caches the result of the LLM inference call for each unique coalition and reuses it whenever that coalition appears again. Inference is therefore deterministic from the perspective of the Shapley computation. As a result, this method satisfies all three axioms (Efficiency, Symmetry, and Null Player), and it significantly speeds up computation by avoiding redundant LLM calls. However, each coalition is sampled only once, so the method may present a ‘less stochastic’ view of the LLM’s behavior than the model exhibits in actual use.
Sliding-Window Shapley (ϕSW): This approach is designed to combat the exponential computational cost. Instead of considering all possible coalitions, it computes Shapley values within a smaller, fixed-size ‘window’ that slides across the input features. It is much faster, with a cost that grows approximately linearly with the number of features. However, this approximation violates the Efficiency and Symmetry axioms, although it still satisfies Null Player. It trades some accuracy for speed.
Counterfactual (ϕC): This is the simplest and fastest method, evaluating the effect of removing each feature individually from the full input. It’s essentially a Shapley value with a window size of one. Like the sliding-window approach, it violates Efficiency but satisfies Symmetry and Null Player. It provides a quick, intuitive measure of a feature’s impact.
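
The four variants can be compared side by side on a toy game. This is a minimal illustration, not llmSHAP’s implementation. toy_value and noisy_value are hypothetical payoffs, and caching uses Python’s functools.lru_cache. The sliding-window function encodes one plausible reading of the description: width-w windows with stride 1, features outside the window kept present, and per-window Shapley values averaged over the windows that contain each feature. The paper’s exact windowing scheme may differ. With w = 1, this reading reduces to the counterfactual, which is consistent with the description above.

```python
import random
from functools import lru_cache
from itertools import combinations
from math import factorial


def toy_value(s: frozenset) -> float:
    """Hypothetical deterministic payoff: features 0 and 1 matter only together,
    feature 2 adds 0.5 on its own, feature 3 is a null player."""
    return (1.0 if {0, 1} <= s else 0.0) + (0.5 if 2 in s else 0.0)


rng = random.Random(0)


def noisy_value(s: frozenset) -> float:
    """The same payoff plus noise redrawn on every call (stand-in for sampling)."""
    return toy_value(s) + rng.gauss(0.0, 0.1)


def shapley(players, v, fixed=frozenset()):
    """Exact Shapley values over `players`; features in `fixed` are always present."""
    players = list(players)
    m = len(players)
    phi = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(m):
            w = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = fixed | frozenset(subset)
                total += w * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def cached(v):
    """Cache-based variant: each unique coalition is evaluated once, then reused."""
    return lru_cache(maxsize=None)(v)


def counterfactual(n, v):
    """Leave-one-out: v(N) - v(N without i) for every feature i."""
    full = frozenset(range(n))
    return [v(full) - v(full - {i}) for i in range(n)]


def sliding_window(n, v, w):
    """Average each feature's Shapley value over every width-w window containing
    it, keeping out-of-window features present (an assumed reading, not the paper's spec)."""
    if not 1 <= w <= n:
        raise ValueError("window width must be between 1 and n")
    full = frozenset(range(n))
    sums, counts = [0.0] * n, [0] * n
    for start in range(n - w + 1):
        window = frozenset(range(start, start + w))
        for i, value in shapley(sorted(window), v, fixed=full - window).items():
            sums[i] += value
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Cache-based: the noisy payoff becomes a fixed function, so Efficiency holds.
cv = cached(noisy_value)
phi_cs = shapley(range(4), cv)
print(sum(phi_cs.values()) - (cv(frozenset(range(4))) - cv(frozenset())))  # ~0 up to rounding

print(counterfactual(4, toy_value))     # [1.0, 1.0, 0.5, 0.0]
print(sliding_window(4, toy_value, 2))  # [0.5, 0.75, 0.5, 0.0]
```

On this toy game, the exact Shapley values are [0.5, 0.5, 0.5, 0.0]. The counterfactual attributions sum to 2.5 rather than 1.5, so Efficiency fails. The sliding-window sketch gives the symmetric features 0 and 1 different values, so Symmetry fails. Both methods still assign zero to the null feature 3.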


Empirical Insights and Trade-offs

The researchers conducted an empirical evaluation using the OpenAI API with the gpt-4.1-mini model, comparing the different llmSHAP variants against the standard Shapley value (ϕS) as a gold standard. They measured both the cosine similarity of the attribution vectors (how closely they match the gold standard) and the runtime.
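
Cosine similarity compares the direction of two attribution vectors. It therefore measures whether a fast variant distributes attribution across features in the same proportions as the gold standard, ignoring overall scale. A minimal sketch follows. The two vectors are hypothetical attributions for a small toy game (exact Shapley values and leave-one-out values), not results from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two attribution vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("attribution vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


gold = [0.5, 0.5, 0.5, 0.0]    # hypothetical exact Shapley attributions
approx = [1.0, 1.0, 0.5, 0.0]  # hypothetical leave-one-out attributions
print(round(cosine_similarity(gold, approx), 4))  # 0.9623
```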

The results showed that the cache-based method (ϕCS) maintained the most stable similarity to the standard Shapley value across different feature counts, indicating consistent attributions. The counterfactual (ϕC) and sliding-window (ϕSW) methods, while faster (exhibiting linear growth in runtime compared to the exponential growth of ϕS and ϕCS), showed varying degrees of deviation from the gold standard, with ϕSW performing better than ϕC in terms of stability.
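
A rough count of LLM calls shows why the runtimes diverge. The counting rules are assumptions, not figures from the paper:

- Standard Shapley redraws both terms of every marginal contribution.
- The cache evaluates each unique coalition once.
- The sliding window caches within each window but not across windows.
- The counterfactual needs the full input plus one removal per feature.

```python
from typing import Dict


def llm_calls(n: int, w: int = 2) -> Dict[str, int]:
    """Approximate LLM calls for n features under the assumed counting rules above."""
    return {
        "standard": n * 2 ** n,               # n features x 2^(n-1) subsets x 2 calls
        "cache-based": 2 ** n,                # every unique coalition once
        "sliding-window": (n - w + 1) * 2 ** w,  # per-window cache, no sharing across windows
        "counterfactual": n + 1,              # full input plus n leave-one-out inputs
    }


for n in (4, 8, 12, 16):
    print(n, llm_calls(n))
```

For a fixed window width, the last two counts grow linearly in n, while the first two grow exponentially.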

In conclusion, the paper highlights crucial design choices for engineers applying Shapley values to LLM explainability. Caching inference results (ϕCS) guarantees principle satisfaction and speeds up computation, but might mask the LLM’s inherent stochasticity. Approximations like the sliding-window (ϕSW) and counterfactual (ϕC) methods offer significant speed improvements at the cost of violating some core Shapley principles. The choice depends on the specific application’s needs, balancing the desire for speed with the need for accurate and axiomatically sound explanations.

This research provides a foundational understanding of how to approach LLM explainability in a principled manner, paving the way for more trustworthy and understandable AI systems. For more details, see the full paper.
