Navigating the 'Rashomon Effect': Why AI Explanations in Self-Driving Cars Can Differ

TLDR: A new research paper quantifies the ‘Rashomon effect’ in autonomous driving, showing that equally accurate AI models can provide significantly different explanations for the same action prediction. Using Qualitative Explainable Graphs (QXGs) and two model types (interpretable gradient boosting and black-box Graph Neural Networks), the study found substantial disagreement in feature importance, particularly among GNNs. This highlights that explanation ambiguity is an inherent challenge in scene understanding, urging a shift from seeking a single ‘ground-truth’ explanation to understanding the multiplicity of valid rationales for building trustworthy AI.

Autonomous driving depends on AI systems that are reliable and trustworthy. A new research paper, titled “Rashomon in the Streets: Explanation Ambiguity in Scene Understanding,” examines a critical challenge for Explainable AI (XAI): the Rashomon effect. The term describes how multiple, equally accurate AI models can arrive at the same prediction through very different internal reasoning, which leads to conflicting explanations.

The study, conducted by researchers Helge Spieker, Jørn Eirik Betten, Arnaud Gotlieb, Nadjib Lazaar, and Nassim Belmecheri, provides the first empirical quantification of this explanation ambiguity specifically for action prediction in real-world driving scenarios. This is crucial because in safety-critical applications like self-driving cars, understanding why a system makes a particular decision is as important as the decision itself.

Understanding the Rashomon Effect in AI

Imagine several expert drivers observing the same traffic scene and all deciding to stop. While their final action is identical, their individual reasons for stopping might differ – one might focus on a pedestrian, another on a sudden brake light ahead, and a third on a changing traffic signal. The Rashomon effect in AI is similar: multiple high-performing models, trained on the same data, can make the same predictions but rely on different features or internal logic to do so. This poses a significant problem for XAI, as it means a single explanation from one model might not represent the full picture or even be consistent with other equally valid models.

Qualitative Explainable Graphs (QXGs): A New Way to See the Scene

To investigate this, the researchers used Qualitative Explainable Graphs (QXGs) as a symbolic representation of driving scenes. A QXG captures the spatial and temporal relationships between objects (like cars, pedestrians, and cyclists) in a graph structure. For instance, it can describe whether a pedestrian is “very close” to a car, or how their trajectories evolve over time. This structured representation lets AI models reason about complex scene dynamics.
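As a rough illustration of the idea, a qualitative scene graph can be pictured as a set of object pairs, each carrying a timeline of qualitative relations. The sketch below is a simplification under stated assumptions: the distance thresholds, the relation labels other than “very close”, and the names `SceneGraph`, `add_frame`, and `qualitative_distance` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from math import hypot

# Hypothetical distance bins in metres. The qualitative calculi actually used
# to build QXGs are not reproduced here.
DISTANCE_BINS = [(2.0, "very close"), (10.0, "close"), (30.0, "far")]


def qualitative_distance(a, b):
    """Map the Euclidean distance between two (x, y) points to a label."""
    d = hypot(a[0] - b[0], a[1] - b[1])
    for limit, label in DISTANCE_BINS:
        if d <= limit:
            return label
    return "very far"


@dataclass
class SceneGraph:
    # edges[(obj_a, obj_b)] -> list of (frame, relation label)
    edges: dict = field(default_factory=dict)

    def add_frame(self, frame, positions):
        """Record one qualitative relation per object pair for this frame."""
        ids = sorted(positions)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                label = qualitative_distance(positions[a], positions[b])
                self.edges.setdefault((a, b), []).append((frame, label))


# Two made-up frames: the ego vehicle approaches a pedestrian.
graph = SceneGraph()
graph.add_frame(0, {"ego": (0.0, 0.0), "pedestrian_1": (12.0, 1.0)})
graph.add_frame(1, {"ego": (4.0, 0.0), "pedestrian_1": (5.5, 0.5)})
timeline = graph.edges[("ego", "pedestrian_1")]
```

Each edge keeps the full sequence of labels, so a downstream model can see that the relation changed from one frame to the next, not only the latest state.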

Two Approaches to Explaining Actions

The study explored two distinct classes of models for action explanation, both using QXGs:

1. Pair-based Action Explanation: This approach uses interpretable models, specifically gradient boosting decision trees. It breaks down the complex scene graph into simpler pairs of objects (e.g., the ego vehicle and a pedestrian) and classifies which object pair might have caused an action. The advantage here is that the decision rules are more transparent, making them easier to inspect.

2. Graph-based Action Explanation: This method employs more complex Graph Neural Networks (GNNs). Unlike the pair-based approach, GNNs process the entire scene graph, incorporating a broader context of all objects. However, GNNs are considered “black-box” models, meaning their internal decision-making is not directly interpretable. Therefore, external techniques like feature attribution (e.g., Integrated Gradients or SHAP values) are needed to explain their predictions.
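To make the idea of feature attribution concrete, here is a minimal sketch of Integrated Gradients on a toy function. It is an illustration only: `toy_model` is a hypothetical stand-in for a trained network, gradients are approximated with finite differences, and nothing here reflects how the paper's GNNs were actually explained.

```python
def integrated_gradients(f, x, baseline, steps=200, h=1e-5):
    """Approximate Integrated Gradients of f at x relative to a baseline.

    The path integral is estimated with a midpoint Riemann sum, and each
    partial derivative with a central finite difference.
    """
    n = len(x)
    attributions = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = point.copy()
            down = point.copy()
            up[i] += h
            down[i] -= h
            grad_i = (f(up) - f(down)) / (2 * h)
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


def toy_model(v):
    # Hypothetical stand-in for a model output: one interaction term
    # plus one squared term.
    return v[0] * v[1] + v[2] ** 2


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_model, x, baseline)
gap = toy_model(x) - toy_model(baseline)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline, which is what `gap` holds here.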

The Experiment: Quantifying Disagreement

The researchers trained a “Rashomon set” of models for both approaches using the nuScenes dataset, a large-scale collection of real-world driving scenes, enhanced with object relevance annotations from the DriveLM dataset. A model entered the set only if it performed similarly well on a validation set. The researchers then measured how far the explanations agreed, both within and between these model classes, using two key metrics:
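One common way to form such a set is to keep every model whose validation score lies within a small tolerance of the best one. The sketch below assumes that rule; the tolerance `epsilon`, the model names, and the scores are made up for illustration, and the paper's actual selection criterion may differ.

```python
def rashomon_set(val_scores, epsilon=0.01):
    """Return the names of models within epsilon of the best validation score."""
    best = max(val_scores.values())
    return sorted(name for name, score in val_scores.items()
                  if best - score <= epsilon)


# Made-up validation accuracies for four training runs (not from the paper).
scores = {
    "gbdt_seed0": 0.912,
    "gbdt_seed1": 0.909,
    "gbdt_seed2": 0.871,
    "gbdt_seed3": 0.905,
}
selected = rashomon_set(scores, epsilon=0.01)
```

A smaller `epsilon` gives a stricter notion of “equally good” and a smaller set, while a larger one admits more models and usually more disagreement among their explanations.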

Fleiss’ Kappa: This metric assessed how much different models agreed on which features were the most important (top-k features) for a prediction.
Kendall’s W: This measured the consensus among models on the overall ranking of feature importance.
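Both metrics can be computed from per-model feature rankings. The sketch below uses the textbook formulas, with each model acting as a rater; for Fleiss' Kappa, every feature is treated as an item that a model labels either “in top-k” or “not in top-k”. That framing, the helper names, and the importance scores are assumptions for illustration, and ties in importance are not handled.

```python
def ranks_from_importance(importance):
    """Rank features 1..n by descending importance (no tie handling)."""
    order = sorted(range(len(importance)), key=lambda i: -importance[i])
    ranks = [0] * len(importance)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def kendalls_w(rank_rows):
    """Kendall's W for m raters (rows) ranking n items (columns), no ties."""
    m, n = len(rank_rows), len(rank_rows[0])
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def fleiss_kappa_topk(rank_rows, k):
    """Fleiss' kappa on the binary label 'feature is in the top-k'.

    Requires 0 < k < n and at least two raters; otherwise the chance
    agreement term makes the statistic undefined.
    """
    m, n = len(rank_rows), len(rank_rows[0])
    # For each feature, how many models placed it in their top-k.
    in_topk = [sum(1 for row in rank_rows if row[i] <= k) for i in range(n)]
    # Per-feature observed agreement between pairs of models.
    p_i = [(c * c + (m - c) ** 2 - m) / (m * (m - 1)) for c in in_topk]
    p_bar = sum(p_i) / n
    # Chance agreement from the overall share of each label.
    p_in = sum(in_topk) / (n * m)
    p_e = p_in ** 2 + (1 - p_in) ** 2
    return (p_bar - p_e) / (1 - p_e)


# Made-up importance scores from three models over four features.
importances = [
    [0.50, 0.30, 0.15, 0.05],
    [0.45, 0.35, 0.05, 0.15],
    [0.10, 0.20, 0.30, 0.40],
]
rank_rows = [ranks_from_importance(row) for row in importances]
w = kendalls_w(rank_rows)
kappa = fleiss_kappa_topk(rank_rows, k=2)
```

Kendall's W runs from 0 (no consensus) to 1 (identical rankings). Fleiss' Kappa equals 1 for perfect agreement and drops to 0 or below when agreement is no better than chance, as happens here because the third model reverses the ranking of the other two.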

Key Findings: Explanation Ambiguity is Real

The results revealed significant explanation disagreement:

Pair-based Models (Interpretable): These models showed a moderate level of agreement on feature importance, which improved when only considering correct predictions. This suggests that when these simpler models get it right, their reasoning is more consistent.
Graph-based Models (Black-box GNNs): In contrast, the GNNs exhibited very low agreement on feature ranking. Even for correct predictions, their explanations were highly divergent. This indicates that different GNNs, despite making the same correct prediction, might be relying on entirely different sets of features or internal pathways to reach that conclusion.

The study suggests that this explanation ambiguity is not merely a flaw in the models but an inherent property of the problem itself, possibly due to “symmetries in the data” (multiple equally valid reasons for an action) or “overparameterization” in complex neural networks.

Implications for Trustworthy AI

These findings serve as a crucial warning against blindly accepting single post-hoc explanations from AI models. The paper argues that instead of seeking a single “ground-truth” rationale, we should shift our perspective to ask: “What are the possible reasons a good model could have for this prediction?”

Future research should focus on understanding and leveraging this multiplicity of explanations, perhaps by developing techniques to find a “consensus explanation” or using explanation variance as a new form of uncertainty quantification. By acknowledging and embracing the diverse rationales behind AI decisions, we can build more robust and trustworthy AI systems for autonomous driving and beyond. You can read the full paper here: Rashomon in the Streets: Explanation Ambiguity in Scene Understanding.
