
Understanding LLM Decisions: A New Look at Explainability with llmSHAP

TLDR: llmSHAP introduces a principled approach to explaining Large Language Model (LLM) decisions using Shapley values, addressing the challenges posed by LLMs’ stochastic nature and high computational cost. It explores four implementation variants (Standard, Cache-based, Sliding-Window, and Counterfactual), analyzing their adherence to Shapley axioms, computational complexity, and practical trade-offs between explanation speed and faithfulness to exact Shapley attributions. The cache-based method restores the axiomatic guarantees and avoids redundant model calls, while the sliding-window and counterfactual methods gain further speed at the cost of some theoretical principles.

Large Language Models (LLMs) are increasingly used to assist people in making important decisions across various fields. However, their complex internal workings make it difficult to understand exactly how they arrive at a particular output. This lack of transparency, often referred to as the ‘black box’ problem, raises concerns about trust and oversight. Explainable AI (XAI) aims to address these concerns by providing insights into why and how algorithms produce specific results.

One of the most prominent XAI methods is SHAP, which is based on the Shapley value from cooperative game theory. This method quantifies the contribution of individual input features to a model’s output. While Shapley values offer a theoretically sound approach to explainability, their direct application to LLMs presents unique challenges. The primary issue is that the Shapley guarantees assume a deterministic model, meaning the same input always yields the same output. LLMs are usually run stochastically: their outputs can vary even with identical inputs because tokens are sampled, with settings such as ‘temperature’ and ‘top-p’ controlling how much variation is allowed.

A new research paper, titled “llmSHAP: A Principled Approach to LLM Explainability,” by Filip Naudot, Tobias Sundqvist, and Timotheus Kampik, tackles these challenges head-on. The authors introduce llmSHAP, a framework that adapts Shapley value-based explanations for the stochastic nature of LLMs. The paper explores fundamental trade-offs that arise when implementing Shapley values for LLM explainability, considering factors like explanation speed, agreement with exact Shapley values, and adherence to core principles.

Understanding the Shapley Value and LLM Challenges

The Shapley value is a concept from game theory that fairly distributes the total payoff among players in a cooperative game. In XAI, features of an input are considered ‘players,’ and the model’s output is the ‘payoff.’ The Shapley value for a feature is its weighted average marginal contribution across all possible combinations (coalitions) of the other features. This method satisfies several desirable principles, including Efficiency (the attributions sum to the payoff of the full input, measured relative to the empty input), Symmetry (features that contribute equally to every coalition get equal attribution), and Null Player (features that never change the payoff get zero attribution).
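As a minimal sketch of the definition above, the following Python enumerates every coalition for a small deterministic game. The function names (`shapley_values`, `toy_value`) and the payoff numbers are illustrative assumptions, not code or data from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values for players 0..n-1.

    `value` maps a frozenset of player indices to a float payoff.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of any coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i when joining coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_value(coalition):
    # Hypothetical payoff: players 0 and 1 each add 1.0 and earn a 2.0
    # bonus when both are present; player 2 never changes the payoff.
    total = 1.0 * (0 in coalition) + 1.0 * (1 in coalition)
    if 0 in coalition and 1 in coalition:
        total += 2.0
    return total

phi = shapley_values(3, toy_value)
# Up to floating-point rounding, phi is [2.0, 2.0, 0.0].
```

In this toy game the three principles are visible directly: the attributions sum to the full payoff of 4.0 (Efficiency), the interchangeable players 0 and 1 receive the same share (Symmetry), and player 2 receives zero (Null Player).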

However, LLMs introduce two main problems: their stochastic inference and the high computational cost of calculating Shapley values. Since LLMs can produce different outputs for the same input, the assumption of deterministic inference, crucial for Shapley value principles, is often violated. Furthermore, computing Shapley values requires evaluating the model with numerous feature coalitions, which can be prohibitively expensive for LLMs.
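Both problems can be illustrated with a small sketch, assuming a hypothetical `noisy_value` function that stands in for a stochastic LLM score (a base payoff plus random noise). Nothing here calls a real model, and the noise level is an arbitrary choice for illustration.

```python
import random
from itertools import combinations
from math import factorial

rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
call_log = []           # records every simulated model call

def noisy_value(coalition):
    """Hypothetical stochastic payoff: coalition size plus sampling noise."""
    call_log.append(coalition)
    return float(len(coalition)) + rng.gauss(0.0, 0.1)

def shapley_with_redraws(n, value):
    """Shapley formula where every term triggers a fresh evaluation."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Both terms are new draws, even for coalitions seen before.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

n = 3
phi = shapley_with_redraws(n, noisy_value)
calls_for_attribution = len(call_log)  # two draws per (player, subset) pair
unique_coalitions = 2 ** n             # distinct coalitions that exist

# Efficiency would require this gap to be zero; with redraws it is not.
full = frozenset(range(n))
efficiency_gap = sum(phi) - (noisy_value(full) - noisy_value(frozenset()))
```

Even with three features, the naive loop makes 24 simulated calls for only 8 distinct coalitions, and the attributions no longer sum exactly to the payoff of the full input.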

llmSHAP’s Solutions and Variants

The llmSHAP framework proposes different approaches to address these issues, each with its own trade-offs:

  • Standard Shapley (ϕS): This is the direct application of Shapley values to LLMs. While it satisfies the Symmetry and Null Player axioms, it fails to guarantee Efficiency under stochastic redraws: repeated evaluations of the same coalition might yield different results, so the marginal-contribution terms no longer cancel out as Efficiency requires.

  • Cache-based Shapley (ϕCS): To restore determinism and improve efficiency, this variant caches the results of LLM inference calls for each unique coalition. By reusing cached results, it effectively makes the inference deterministic from the perspective of Shapley value calculation. This method satisfies all Shapley axioms (Efficiency, Symmetry, and Null Player) and significantly speeds up computation by avoiding redundant LLM calls. However, because each coalition is sampled only once, it might present a ‘less stochastic’ view of the LLM’s behavior than the model exhibits in actual use.

  • Sliding-Window Shapley (ϕSW): This approach is designed to combat the exponential computational cost. Instead of considering all possible coalitions, it computes Shapley values within a smaller, fixed-size ‘window’ that slides across the input features. While much faster (its cost grows approximately linearly with the number of features), this approximation violates the Efficiency and Symmetry axioms, though it still satisfies the Null Player axiom. It trades accuracy for speed.

  • Counterfactual (ϕC): This is the simplest and fastest method, evaluating the effect of removing each feature individually from the full input. It’s essentially the sliding-window variant with a window size of one. Like the sliding-window approach, it violates Efficiency; unlike it, it satisfies both Symmetry and Null Player. It provides a quick, intuitive measure of a feature’s impact.
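The variants listed above can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than the paper's implementation: the article does not spell out how the sliding window treats features outside the window or how overlapping windows are combined, so the code assumes that outside features stay present and that a feature's attributions are averaged over the windows containing it. All function names and the toy payoff are hypothetical.

```python
from itertools import combinations
from math import factorial

def cached(value):
    """Wrap a value function so each unique coalition is evaluated once."""
    cache = {}
    def wrapper(coalition):
        key = frozenset(coalition)
        if key not in cache:
            cache[key] = value(key)
        return cache[key]
    return wrapper

def shapley_in_window(window, context, value):
    """Shapley values for `window` features; `context` features stay present (assumption)."""
    w = len(window)
    phi = {}
    for i in window:
        others = [j for j in window if j != i]
        total = 0.0
        for size in range(w):
            weight = factorial(size) * factorial(w - size - 1) / factorial(w)
            for subset in combinations(others, size):
                s = context | frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def sliding_window_shapley(n, value, window_size):
    """Average per-window attributions over all windows containing a feature (assumption)."""
    sums, counts = [0.0] * n, [0] * n
    everyone = frozenset(range(n))
    for start in range(n - window_size + 1):
        window = list(range(start, start + window_size))
        context = everyone - frozenset(window)
        for i, v in shapley_in_window(window, context, value).items():
            sums[i] += v
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]

def counterfactual(n, value):
    """Payoff drop when each feature is removed from the full input."""
    everyone = frozenset(range(n))
    return [value(everyone) - value(everyone - {i}) for i in range(n)]

def toy_value(coalition):
    # Hypothetical deterministic payoff: features 0 and 1 add 1.0 each plus
    # a 2.0 bonus when both are present; features 2 and 3 do nothing.
    total = 1.0 * (0 in coalition) + 1.0 * (1 in coalition)
    if 0 in coalition and 1 in coalition:
        total += 2.0
    return total

n = 4
evaluations = []  # counts how often the underlying function is really called

def counted_value(coalition):
    evaluations.append(coalition)
    return toy_value(coalition)

value = cached(counted_value)
phi_cached = sliding_window_shapley(n, value, window_size=n)  # full window = exact Shapley
phi_window = sliding_window_shapley(n, value, window_size=2)
phi_counterfactual = counterfactual(n, value)
```

Under these assumptions, the toy game reproduces the axiom pattern described above. The cached exact attributions are about [2.0, 2.0, 0.0, 0.0] and sum to the full payoff of 4.0. The window of size two yields about [2.0, 2.5, 0.0, 0.0], which breaks both Efficiency and Symmetry but keeps the null features at zero. The counterfactual scores are about [3.0, 3.0, 0.0, 0.0], which are symmetric but sum to 6.0. The underlying function is evaluated only 16 times in total, once per unique coalition.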


Empirical Insights and Trade-offs

The researchers conducted an empirical evaluation using the OpenAI API with the gpt-4.1-mini model, comparing the different llmSHAP variants against the standard Shapley value (ϕS) as a gold standard. They measured both the cosine similarity of the attribution vectors (how closely they match the gold standard) and the runtime.
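The similarity measure can be sketched as follows. The helper name, the zero-vector convention, and the example vectors are illustrative assumptions and not values from the paper.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two attribution vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention assumed here for all-zero attributions
    return dot / (norm_a * norm_b)

gold = [2.0, 2.0, 0.0]       # hypothetical reference attributions
rescaled = [3.0, 3.0, 0.0]   # same direction, larger magnitude
reordered = [0.0, 2.0, 2.0]  # weight shifted to a different feature

same_direction = cosine_similarity(gold, rescaled)
shifted = cosine_similarity(gold, reordered)
```

Cosine similarity compares the direction of two attribution vectors and ignores their overall scale, so a variant that inflates every attribution by the same factor still scores close to 1, while one that shifts weight onto different features scores lower.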

The results showed that the cache-based method (ϕCS) maintained the most stable similarity to the standard Shapley value across different feature counts, indicating consistent attributions. The counterfactual (ϕC) and sliding-window (ϕSW) methods, while faster (exhibiting linear growth in runtime compared to the exponential growth of ϕS and ϕCS), showed varying degrees of deviation from the gold standard, with ϕSW performing better than ϕC in terms of stability.
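The difference between exponential and linear growth can be made concrete by counting distinct model evaluations. This is a rough sketch: it assumes results are cached, and the sliding-window figure is only an upper bound under the assumption that each window position evaluates every subset of its own features. The function names are hypothetical.

```python
def exact_evaluations(n):
    """Distinct coalitions of n features, each evaluated once with caching."""
    return 2 ** n

def counterfactual_evaluations(n):
    """The full input once, plus one evaluation per removed feature."""
    return n + 1

def sliding_window_evaluations_upper_bound(n, window_size):
    """Upper bound, assuming 1 <= window_size <= n and 2**window_size per window."""
    windows = n - window_size + 1
    return windows * 2 ** window_size

for n in (5, 10, 20):
    print(
        n,
        exact_evaluations(n),
        sliding_window_evaluations_upper_bound(n, 3),
        counterfactual_evaluations(n),
    )
```

For a fixed window size, the sliding-window count grows by a constant amount per added feature, whereas the exact count doubles.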

In conclusion, the paper highlights crucial design choices for engineers applying Shapley values to LLM explainability. Caching inference results (ϕCS) guarantees that the Shapley principles hold and speeds up computation, but might mask the LLM’s inherent stochasticity. Approximations like the sliding-window (ϕSW) and counterfactual (ϕC) methods offer significant speed improvements at the cost of violating some core Shapley principles. The choice depends on the specific application’s needs, balancing the desire for speed with the need for accurate and axiomatically sound explanations.

This research provides a foundational understanding of how to approach LLM explainability in a principled manner, paving the way for more trustworthy and understandable AI systems. For more details, see the full paper.

Nikhil Patel (https://blogs.edgentiq.com)
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
