TLDR: A field study with three AI product teams identified six types of “over-the-hood” AI inclusivity bugs: user-facing barriers that disproportionately exclude users with certain problem-solving styles. Using the GenderMag inclusive design method, the teams found 83 instances of these bugs and developed fixes for 47 of them. For one team, the revised product improved users’ understanding of the AI and gender equity. A new “Pre-Action Fork” variant of the method, which prompts teams to consider scenarios where users might doubt AI outputs, uncovered bugs the original method missed.
The research paper “Over-the-Hood” AI Inclusivity Bugs and How 3 AI Product Teams Found and Fixed Them, by Andrew Anderson, Fatima A. Moussaoui, Jimena Noa Guevara, Md Montaser Hamid, and Margaret Burnett, examines an often-overlooked aspect of artificial intelligence: “over-the-hood” inclusivity bugs. While much attention has been paid to “under-the-hood” biases, such as those in algorithms or training data, this study focuses on barriers in user-facing AI products that can unfairly exclude users with certain problem-solving styles.
The core of this research was a field study with three AI product teams: Team Game, Team Weather, and Team Farm. Each team investigated AI inclusivity bugs in its own user-facing product and explored how an existing inclusive design method, GenderMag, could be adapted to find and fix them. The study revealed 83 instances of 6 distinct AI inclusivity bug types, with fixes developed for 47 of these instances (about 57%). The teams also tried GenderMag-for-AI variants of the method, and one of them, Pre-Action Fork GenderMag, proved particularly effective at detecting certain kinds of these bugs.
Understanding AI Inclusivity Bugs
The paper defines an AI inclusivity bug as a usability issue that exists specifically within the AI’s information and disproportionately affects certain groups of AI product users based on their problem-solving styles. The GenderMag method, which the teams used, is built around five distinct problem-solving styles, each with a range of values. These styles are linked to how people approach problems and have shown statistical ties to gender. By considering these styles, particularly through personas like “Abi” (representing risk-averse, comprehensive information processors), the teams could identify where their AI products fell short for diverse users.
The six identified AI inclusivity bug types are:
- Interpret AI? (“What does this even mean?”): This was the most common bug, where users struggled to understand the AI’s output or reasoning. It often affected risk-averse users and those who process information comprehensively. Fixes typically involved adding clearer legends, information boxes, or labels.
- AI input↔output? (“What does this (AI-input) have to do with that (AI-output)?”): Users found it unclear how the AI’s inputs related to its outputs. This bug was also strongly linked to risk-aversion. Solutions focused on explicitly connecting inputs and outputs, for example, by dynamically highlighting related elements in the interface.
- AI: why should I? (“Why even look at this?”): This bug described situations where risk-averse users didn’t see the point in engaging with the AI’s information at all, fearing a waste of time. Fixes often employed a “Surprise-Explain-Reward” strategy, making interactions more inviting and clearly hinting at benefits.
- AI: more info! (“Need more info!”): Comprehensive information processors, like the Abi persona, often found the AI’s details insufficient to make decisions or trust the output. Adding more detailed information to the AI’s inputs or outputs was a common solution.
- AI: actionable? (“So? What should I DO?”): Users were unclear about what actions they should take based on the AI’s information. This bug frequently impacted comprehensive information processors. Solutions included clearer instructions, intuitive icons, or explicit guidance on next steps.
- AI changes? (“What’s changed?”): When the AI’s information changed over time, users with lower self-efficacy often struggled to understand these changes, sometimes blaming themselves. Fixes involved making changes explicit through visual cues like temporary highlights or legends.
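The six bug types above can be written down as a small lookup table, which may be handy for tagging findings during a design review. This is only an illustrative sketch: the names `BugType`, `BugProfile`, `CATALOG`, and `bugs_for_style` are hypothetical and do not come from the paper, and the style labels and fixes simply restate the list above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class BugType(Enum):
    """The six AI inclusivity bug types, labeled as in the list above."""
    INTERPRET_AI = "Interpret AI?"
    INPUT_OUTPUT = "AI input↔output?"
    WHY_SHOULD_I = "AI: why should I?"
    MORE_INFO = "AI: more info!"
    ACTIONABLE = "AI: actionable?"
    AI_CHANGES = "AI changes?"


@dataclass(frozen=True)
class BugProfile:
    """Hypothetical record: who is mainly affected and how teams tended to fix it."""
    linked_styles: tuple[str, ...]
    typical_fixes: tuple[str, ...]


# Content restates the article's list; the structure itself is an assumption.
CATALOG: dict[BugType, BugProfile] = {
    BugType.INTERPRET_AI: BugProfile(
        ("risk-averse", "comprehensive information processing"),
        ("clearer legends", "information boxes", "labels"),
    ),
    BugType.INPUT_OUTPUT: BugProfile(
        ("risk-averse",),
        ("dynamically highlight related inputs and outputs",),
    ),
    BugType.WHY_SHOULD_I: BugProfile(
        ("risk-averse",),
        ("Surprise-Explain-Reward strategy",),
    ),
    BugType.MORE_INFO: BugProfile(
        ("comprehensive information processing",),
        ("add more detail to AI inputs or outputs",),
    ),
    BugType.ACTIONABLE: BugProfile(
        ("comprehensive information processing",),
        ("clearer instructions", "intuitive icons", "explicit next steps"),
    ),
    BugType.AI_CHANGES: BugProfile(
        ("lower self-efficacy",),
        ("temporary highlights", "legends"),
    ),
}


def bugs_for_style(style: str) -> list[BugType]:
    """Return the bug types the article links to a given problem-solving style."""
    return [bug for bug, profile in CATALOG.items() if style in profile.linked_styles]
```

For example, `bugs_for_style("risk-averse")` returns the three bug types that the list above ties to risk-aversion.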
The Impact of GenderMag and its Evolution
The study found that the original GenderMag method helped teams identify and fix AI inclusivity bugs. For instance, results from Team Game’s external collaboration showed that their post-GenderMag version significantly improved users’ conceptual understanding of the AI’s reasoning and boosted gender equity in mental model scores by 45%.
However, the original GenderMag had a “blind spot”: it didn’t prompt teams to consider scenarios where users might doubt the AI’s recommendations. This led to the development of GenderMag-for-AI variants. The “Pre-Action Fork GenderMag” proved most successful. This variant introduced a branching path in the evaluation process, allowing teams to explicitly consider what happens when a user believes the AI versus when they doubt it. This adaptation uncovered new types of inclusivity bugs that would have otherwise been missed, highlighting the critical importance of addressing user skepticism in AI design.
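The branching idea can be sketched in a few lines of code. This is a minimal illustration of the fork as described above, not the paper's actual procedure or forms: the names `Finding`, `pre_action_fork`, and `potential_bugs` are hypothetical, and the `judge` callback stands in for the evaluation team's own judgment about whether the persona would proceed.

```python
from dataclasses import dataclass
from typing import Callable

# The two branches the fork asks evaluators to consider for each step.
STANCES = ("believes AI", "doubts AI")


@dataclass(frozen=True)
class Finding:
    step: str            # the action being walked through
    stance: str          # which branch of the fork this finding belongs to
    would_proceed: bool  # team's judgment: would the persona take the action?
    note: str            # the team's reasoning


# Hypothetical callback: (step, stance) -> (would_proceed, note).
Judge = Callable[[str, str], tuple[bool, str]]


def pre_action_fork(step: str, judge: Judge) -> list[Finding]:
    """Evaluate one walkthrough step once per stance instead of only once."""
    findings = []
    for stance in STANCES:
        would_proceed, note = judge(step, stance)
        findings.append(Finding(step, stance, would_proceed, note))
    return findings


def potential_bugs(findings: list[Finding]) -> list[Finding]:
    """Branches where the persona would not proceed are candidate inclusivity bugs."""
    return [f for f in findings if not f.would_proceed]


# Example with made-up answers, for illustration only.
def example_judge(step: str, stance: str) -> tuple[bool, str]:
    if stance == "doubts AI":
        return False, "No way to check why the AI recommended this."
    return True, "Recommendation is clearly labeled."


if __name__ == "__main__":
    results = pre_action_fork("Act on the AI's recommendation", example_judge)
    for finding in potential_bugs(results):
        print(f"[{finding.stance}] {finding.step}: {finding.note}")
```

In this sketch, a walkthrough that only considered the "believes AI" branch would record no problem for the step, while the forked version surfaces the issue on the "doubts AI" branch.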
This research underscores that AI inclusivity extends beyond algorithmic fairness to encompass the user’s entire experience. By systematically identifying and addressing “over-the-hood” bugs, developers can create AI products that are not only powerful but also accessible and beneficial to a wider, more diverse audience. For more details, refer to the full research paper.


