TLDR: A new AI decision support system improves human performance in sequential decision-making tasks by adaptively narrowing down the set of actions available to humans, rather than providing single recommendations. Tested in a wildfire mitigation game with 1,600 participants, the system led to roughly 30% higher average cumulative reward than unsupported humans and more than 2% higher than the AI agent playing alone, demonstrating effective human-AI complementarity. The researchers also developed an efficient algorithm to optimize the level of human agency.
The idea of humans and machines working together to achieve better outcomes than either could alone, known as complementarity, has long fascinated researchers. While much progress has been made in one-shot prediction tasks, applying this principle to complex sequential decision-making has remained a significant challenge.
A recent study introduces a novel approach to decision support systems that aims to achieve this human-AI complementarity in sequential tasks without requiring human experts to understand precisely when to defer to the AI or when to exercise their own judgment. Instead, the system adaptively controls the level of human agency by narrowing down the set of actions a human can take.
How the System Works
Imagine a scenario where a human needs to make a series of decisions, such as managing a wildfire or making medical diagnoses. Traditionally, an AI might offer a single best recommendation, leaving the human to decide whether to follow it. This new system takes a different tack: it uses a pre-trained AI agent to present the human with a curated subset of possible actions, called an “action set.” The human then chooses an action from this reduced set.
The core mechanism involves an AI agent that evaluates all potential actions given the current state of the environment. For instance, in a wildfire scenario, the AI might assess which burning tiles are most critical to extinguish. The system then constructs an action set that always includes the AI’s top-ranked action, plus other actions whose estimated values are sufficiently close to the best one. Which near-best actions make the cut is partly randomized, so that the action set changes smoothly, rather than abruptly, as the level of agency varies. A crucial parameter, epsilon (ε), controls the size of this action set, effectively determining how much agency the human retains: a high ε means more choices for the human, while a low ε means fewer, more constrained choices.
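To make the role of ε concrete, here is a minimal sketch of an ε-controlled action set. It is not the paper’s exact construction: the function name `build_action_set`, the normalization of values by their range, and the `noise_width` jitter on the threshold are all hypothetical choices made for illustration. It only assumes the agent produces one value estimate per available action.

```python
import numpy as np


def build_action_set(q_values, epsilon, rng, noise_width=0.05):
    """Hypothetical sketch: indices of the actions offered to the human.

    q_values: the AI agent's value estimate for each available action.
    epsilon: agency level; larger values admit more near-best actions.
    noise_width: assumed half-width of a random jitter on the threshold.
    """
    q = np.asarray(q_values, dtype=float)
    best = int(np.argmax(q))
    spread = q.max() - q.min()
    # Normalized gap to the best action: 0 for the best, 1 for the worst.
    gap = (q.max() - q) / spread if spread > 0 else np.zeros_like(q)
    # One random jitter per decision step, so whether a borderline action is
    # included changes gradually (in probability) as epsilon varies.
    threshold = epsilon + rng.uniform(-noise_width, noise_width)
    offered = set(np.flatnonzero(gap <= threshold).tolist())
    offered.add(best)  # the AI's top-ranked action is always offered
    return sorted(offered)


rng = np.random.default_rng(0)
q = [1.0, 0.9, 0.5, 0.0]
print(build_action_set(q, 0.0, rng))  # [0]
print(build_action_set(q, 0.3, rng))  # [0, 1]
print(build_action_set(q, 2.0, rng))  # [0, 1, 2, 3]
```

At ε = 0 the human can only take the AI’s top action; as ε grows, the set expands until every action is available again. The jitter is one simple way to avoid a hard cutoff, under the assumption stated above; the paper’s own randomization may differ.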
Optimizing Human Agency
A key challenge is finding the optimal value of ε: the sweet spot where human-AI collaboration yields the best results. To address this, the researchers developed an efficient algorithm based on Lipschitz bandits. It explores different values of ε and identifies the one that maximizes the average total reward achieved by the human-AI team. The algorithm exploits the property that the team’s performance changes smoothly with ε; more precisely, it is Lipschitz continuous in ε, so a small change in ε can only cause a proportionally small change in performance. This smoothness allows a more targeted and efficient search for the optimum.
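A Lipschitz condition says that the mean reward r(ε) satisfies |r(ε) − r(ε′)| ≤ L·|ε − ε′| for some constant L. A textbook way to exploit this is to place a uniform grid over the range of ε and run a standard bandit algorithm (here UCB1) over the grid points. The sketch below shows that generic approach; it is not necessarily the algorithm from the paper. The callback `play_episode`, the synthetic reward curve in `simulated_episode`, and the grid size are all hypothetical, chosen only so the example runs on its own.

```python
import math
import random


def lipschitz_ucb(play_episode, eps_max, n_arms, horizon, seed=0):
    """Generic Lipschitz-bandit sketch: uniform grid over [0, eps_max] + UCB1.

    play_episode(epsilon, rng) is a hypothetical callback returning the total
    reward of one game played with the action set controlled by epsilon.
    """
    rng = random.Random(seed)
    arms = [eps_max * i / (n_arms - 1) for i in range(n_arms)]
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            k = t - 1  # try every grid point once
        else:
            # Optimism: empirical mean plus a confidence bonus that shrinks with use.
            k = max(range(n_arms), key=lambda i: totals[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        totals[k] += play_episode(arms[k], rng)
        counts[k] += 1
    best = max(range(n_arms), key=lambda i: totals[i] / counts[i])
    return arms[best], counts


def simulated_episode(epsilon, rng):
    # Made-up stand-in for one human-AI game: mean reward peaks at epsilon = 0.4
    # and is Lipschitz in epsilon (constant 2), plus Gaussian noise.
    return 1.0 - 2.0 * abs(epsilon - 0.4) + rng.gauss(0.0, 0.1)


best_eps, counts = lipschitz_ucb(simulated_episode, eps_max=1.0, n_arms=11, horizon=5000)
print(best_eps)  # 0.4
```

The Lipschitz constant is what justifies the grid: the grid point closest to the true optimum is at most half a grid spacing h away, so it loses at most L·h/2 in expected reward. A finer grid shrinks that loss but spreads exploration over more arms, which is the trade-off a Lipschitz bandit method has to balance.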
Real-World Evaluation: The Wildfire Mitigation Game
To rigorously evaluate their decision support system, the researchers conducted a large-scale human subject study involving 1,600 participants. These participants played 16,000 instances of a custom-designed wildfire mitigation game. In this game, players were presented with a 10×10 grid representing a forest, with healthy, burning, and burnt tiles. The goal was to prevent the fire from spreading by applying water mitigation measures to burning tiles.
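The game’s structure (a grid state, one watering action per turn, and a reward accumulated over an episode) can be sketched as a toy environment. Everything below beyond the three tile states is hypothetical: the `step` function, the 4-neighbour spread rule, `spread_prob`, marking a watered tile as burnt, and using the healthy-tile count as the per-step reward are illustrative assumptions, not the study’s actual game rules.

```python
import numpy as np

HEALTHY, BURNING, BURNT = 0, 1, 2  # hypothetical tile encoding


def step(grid, action, rng, spread_prob=0.3):
    """Advance a toy wildfire grid by one turn (hypothetical dynamics).

    action: (row, col) of the tile the player applies water to.
    Returns the next grid and a per-step reward equal to the healthy-tile count.
    """
    grid = grid.copy()
    ar, ac = action
    if grid[ar, ac] == BURNING:
        grid[ar, ac] = BURNT  # assumed: water stops the tile from spreading fire
    nxt = grid.copy()
    n_rows, n_cols = grid.shape
    for r, c in np.argwhere(grid == BURNING):
        nxt[r, c] = BURNT  # assumed: a burning tile burns out after one turn
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Fire can only reach tiles that were healthy at the start of the turn.
            if (0 <= nr < n_rows and 0 <= nc < n_cols
                    and grid[nr, nc] == HEALTHY and rng.random() < spread_prob):
                nxt[nr, nc] = BURNING
    return nxt, int((nxt == HEALTHY).sum())


forest = np.full((10, 10), HEALTHY)
forest[5, 5] = BURNING
nxt, reward = step(forest, (5, 5), np.random.default_rng(0))
print(reward)  # 99: the only fire was put out before it could spread
```

Summing the per-step rewards over an episode gives a cumulative reward of the kind the study compares across conditions, although the study’s own reward definition may differ.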
Participants played the game either on their own or with the support of the AI system, which narrowed down the burning tiles they could choose to extinguish. The AI agent itself was a Deep Q-Network (DQN), a type of reinforcement learning model, trained to manage wildfires autonomously.
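Before any narrowing can happen, the agent’s value estimates must be restricted to valid moves, which in this game are the burning tiles. The sketch below assumes, hypothetically, that the DQN outputs one value per grid tile (`q_grid`) and ranks only the burning tiles; the function name and tile encoding are illustrative, not taken from the study.

```python
import numpy as np

BURNING = 1  # hypothetical tile encoding


def rank_burning_tiles(q_grid, grid):
    """Order burning tiles from highest to lowest estimated value.

    q_grid: one value estimate per tile (assumed output layout of the agent).
    grid: tile states; only burning tiles are valid targets for water.
    """
    q = np.asarray(q_grid, dtype=float)
    rows, cols = np.nonzero(np.asarray(grid) == BURNING)
    order = np.argsort(-q[rows, cols], kind="stable")  # descending by value
    return [(int(rows[i]), int(cols[i])) for i in order]


grid = np.array([[0, 1, 0],
                 [1, 0, 2],
                 [0, 0, 1]])
q_grid = np.array([[9.0, 0.2, 5.0],
                   [0.7, 3.0, 8.0],
                   [1.0, 4.0, 0.5]])
print(rank_burning_tiles(q_grid, grid))  # [(1, 0), (2, 2), (0, 1)]
```

High values on non-burning tiles, such as the 9.0 in the corner, are ignored because watering those tiles is not a valid move. The resulting ranking is what an ε-based rule would then filter down to the tiles shown to the player.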
Impressive Results
The findings from the study were compelling:
- Participants supported by the AI system, operating at the optimal epsilon value, significantly outperformed those playing on their own, achieving approximately 30% higher average cumulative rewards.
- Even more remarkably, under the optimal ε, the human-AI team outperformed the AI agent playing autonomously by more than 2%. This is particularly significant given that the AI agent alone already substantially outscored unsupported human players.
These results strongly suggest that adaptively narrowing action choices for humans can indeed lead to effective human-AI complementarity in sequential decision-making tasks. The Lipschitz bandit algorithm also proved successful in efficiently identifying the optimal level of human agency.
While the study focused on a specific wildfire mitigation game, the principles demonstrated offer a promising direction for designing decision support systems in various complex domains. For those interested in the technical specifics, see the full research paper, “Narrowing Action Choices with AI Improves Human Sequential Decisions.”


