Addressing User-Facing AI Biases: How Product Teams Uncovered and Fixed "Over-the-Hood" Inclusivity Bugs

TL;DR: A study with three AI product teams identified six types of “over-the-hood” AI inclusivity bugs: user-facing barriers that disproportionately exclude users with certain problem-solving styles. Applying the GenderMag inclusive design method, along with a new “Pre-Action Fork” variant that prompts teams to consider users who doubt the AI’s outputs, the teams found 83 instances of these bugs and fixed 47. In one team’s evaluation, the fixes improved users’ understanding of the AI and gender equity.

In the research paper “Over-the-Hood” AI Inclusivity Bugs and How 3 AI Product Teams Found and Fixed Them, Andrew Anderson, Fatima A. Moussaoui, Jimena Noa Guevara, Md Montaser Hamid, and Margaret Burnett examine a crucial yet often overlooked aspect of artificial intelligence: “over-the-hood” inclusivity bugs. While much attention has gone to “under-the-hood” biases, such as those in algorithms or training data, this study focuses on barriers in user-facing AI products that can unfairly exclude users with certain problem-solving styles.

At the core of this research was a field study with three AI product teams: Team Game, Team Weather, and Team Farm. Each team investigated the AI inclusivity bugs in its own user-facing product and explored how an existing inclusive design method, GenderMag, could be adapted to find and fix them. The study uncovered 83 instances of 6 distinct AI inclusivity bug types, and the teams developed fixes for 47 of those instances. New variants of the method, called GenderMag-for-AI, proved particularly effective at detecting certain kinds of these bugs.

Understanding AI Inclusivity Bugs

The paper defines an AI inclusivity bug as a usability issue that arises specifically in the information an AI product presents and that disproportionately affects users with certain problem-solving styles. The GenderMag method, which the teams used, is built around five distinct problem-solving styles, each spanning a range of values. These styles capture how people approach problems and are statistically associated with gender. By reasoning through these styles, particularly via personas such as “Abi” (who represents risk-averse, comprehensive information processors), the teams could identify where their AI products fell short for diverse users.

The six identified AI inclusivity bug types are:

Interpret AI? (“What does this even mean?”): This was the most common bug type, in which users struggled to understand the AI’s output or reasoning. It often affected risk-averse users and comprehensive information processors. Fixes typically added clearer legends, information boxes, or labels.
AI input↔output? (“What does this (AI-input) have to do with that (AI-output)?”): Users found it unclear how the AI’s inputs related to its outputs. This bug was also strongly linked to risk-aversion. Solutions focused on explicitly connecting inputs and outputs, for example, by dynamically highlighting related elements in the interface.
AI: why should I? (“Why even look at this?”): This bug covers situations in which risk-averse users saw no point in engaging with the AI’s information at all, fearing it would waste their time. Fixes often used a “Surprise-Explain-Reward” strategy, making interactions more inviting and clearly signaling their benefits.
AI: more info! (“Need more info!”): Comprehensive information processors, like the Abi persona, often found the AI’s details insufficient to make decisions or trust the output. Adding more detailed information to the AI’s inputs or outputs was a common solution.
AI: actionable? (“So? What should I DO?”): Users were unclear about what actions they should take based on the AI’s information. This bug frequently impacted comprehensive information processors. Solutions included clearer instructions, intuitive icons, or explicit guidance on next steps.
AI changes? (“What’s changed?”): When the AI’s information changed over time, users with lower self-efficacy often struggled to understand these changes, sometimes blaming themselves. Fixes involved making changes explicit through visual cues like temporary highlights or legends.
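To make the taxonomy easier to apply during a design review, it can be encoded as a lookup from each bug type to the problem-solving style values the article links it to. The following is a minimal Python sketch under stated assumptions: the `BugType` record, the `BUG_TYPES` table, the `ABI` value set, and the `bugs_to_watch` helper are hypothetical illustrations, not part of GenderMag or the authors’ tooling, and each bug type is tagged only with the style values named in its description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugType:
    name: str                 # label used in the article
    user_question: str        # what the user is left asking
    affected: frozenset[str]  # style values the article links to this bug type


# Hypothetical encoding: style values come only from the article's
# descriptions, not from the paper's full per-instance data.
BUG_TYPES = [
    BugType("Interpret AI?", "What does this even mean?",
            frozenset({"risk-averse", "comprehensive"})),
    BugType("AI input↔output?",
            "What does this (AI-input) have to do with that (AI-output)?",
            frozenset({"risk-averse"})),
    BugType("AI: why should I?", "Why even look at this?",
            frozenset({"risk-averse"})),
    BugType("AI: more info!", "Need more info!",
            frozenset({"comprehensive"})),
    BugType("AI: actionable?", "So? What should I DO?",
            frozenset({"comprehensive"})),
    BugType("AI changes?", "What's changed?",
            frozenset({"lower self-efficacy"})),
]

# Assumed values for the Abi persona, limited to the three styles the
# taxonomy mentions (GenderMag defines its personas over five styles).
ABI = frozenset({"risk-averse", "comprehensive", "lower self-efficacy"})


def bugs_to_watch(persona: frozenset[str]) -> list[str]:
    """Return, in table order, the bug types affecting any of the persona's values."""
    return [bug.name for bug in BUG_TYPES if bug.affected & persona]


if __name__ == "__main__":
    print(bugs_to_watch(ABI))  # all six bug types
    print(bugs_to_watch(frozenset({"comprehensive"})))
    # ['Interpret AI?', 'AI: more info!', 'AI: actionable?']
```

Because Abi holds all three style values, every bug type applies to that persona; a persona with fewer of these values yields a shorter checklist.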


The Impact of GenderMag and its Evolution

The study found that the original GenderMag method helped teams identify and fix AI inclusivity bugs. For instance, Team Game’s external collaboration showed that the post-GenderMag version of the team’s product significantly improved users’ conceptual understanding of the AI’s reasoning and improved gender equity in mental-model scores by 45%.

However, the original GenderMag had a “blind spot”: it didn’t prompt teams to consider scenarios where users might doubt the AI’s recommendations. This led to the development of GenderMag-for-AI variants. The “Pre-Action Fork GenderMag” proved most successful. This variant introduced a branching path in the evaluation process, allowing teams to explicitly consider what happens when a user believes the AI versus when they doubt it. This adaptation uncovered new types of inclusivity bugs that would have otherwise been missed, highlighting the critical importance of addressing user skepticism in AI design.
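The fork can be pictured as a small change to the walkthrough’s per-action questions. In the following minimal Python sketch, the two base questions paraphrase GenderMag’s action questions, while the fork wording, the `WalkthroughPrompt` record, and the `action_prompts` helper are hypothetical; the paper’s actual forms may phrase and order the branches differently.

```python
from dataclasses import dataclass


@dataclass
class WalkthroughPrompt:
    action: str
    branch: str    # "n/a" without the fork; "believes AI" or "doubts AI" with it
    question: str


def action_prompts(persona: str, action: str, pre_action_fork: bool) -> list[WalkthroughPrompt]:
    """Build the questions an evaluator answers for one action in the walkthrough.

    The base questions paraphrase GenderMag's action questions; the fork
    wording below is a hypothetical illustration, not the paper's text.
    """
    base = [
        f"Will {persona} know what to do at this step ({action})?",
        f"If {persona} does the right thing, will {persona} know it was right "
        "and that progress is being made toward the goal?",
    ]
    if not pre_action_fork:
        # Original flow: a single pass, with no prompt to consider doubt.
        return [WalkthroughPrompt(action, "n/a", q) for q in base]

    prompts = []
    # Pre-Action Fork: before acting, branch on the persona's stance toward
    # the AI output and answer every action question once per branch.
    for branch, verb in (("believes AI", "believes"), ("doubts AI", "doubts")):
        for q in base:
            framed = f"Suppose {persona} {verb} the AI's output. {q}"
            prompts.append(WalkthroughPrompt(action, branch, framed))
    return prompts


if __name__ == "__main__":
    for p in action_prompts("Abi", "act on the AI's recommendation", pre_action_fork=True):
        print(f"[{p.branch}] {p.question}")
```

Answering each question under both stances doubles the evaluators’ work per action, but the “doubts AI” branch is the one that surfaces barriers the original flow never asks about.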

This research underscores that AI inclusivity extends beyond algorithmic fairness to the user’s entire experience. By systematically identifying and addressing “over-the-hood” bugs, developers can create AI products that are not only powerful but also genuinely accessible and beneficial to a wider, more diverse audience. For more details, read the full research paper.
