TLDR: Researchers have introduced and formalized a new category of game dynamics called Bounded One-Sided Response Games (BORGs), where one player’s action temporarily transfers control to an opponent who must fulfill a fixed condition. They developed a modified Monopoly Deal environment to isolate this dynamic and demonstrated that the standard Counterfactual Regret Minimization (CFR) algorithm can effectively learn strategies for it. The project also includes a lightweight, full-stack research platform for efficient and reproducible experimentation; the trained agent performs strongly against baseline opponents.
Artificial intelligence researchers are constantly seeking new ways to understand and model complex decision-making processes, often turning to card games as simplified yet strategically rich environments. These games help shed light on real-world challenges like negotiation, finance, and cybersecurity. Traditionally, these games are categorized by how players interact: strictly-sequential (players take turns with single actions), deterministic-response (actions trigger fixed outcomes), or unbounded reciprocal-response (players can counter each other repeatedly).
However, a team of researchers has identified and formalized a less-explored but strategically significant interaction pattern: the bounded one-sided response. They term games featuring this dynamic Bounded One-Sided Response Games (BORGs). In a BORG, when one player takes an action, control temporarily shifts to the opponent. The opponent must then perform a sequence of actions to meet a specific condition before the original player’s turn fully concludes. Crucially, this response phase is ‘one-sided’ (the opponent acts without immediate counterplay) and ‘bounded’ (it has a fixed condition for completion).
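The response phase described above can be sketched as a small state object. This is a minimal illustration of the ‘one-sided’ and ‘bounded’ properties; the class and field names are invented for exposition and are not taken from the paper’s implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ResponsePhase:
    """Bounded one-sided response: the responder acts alone until a fixed
    condition is met, at which point control returns to the initiator."""
    amount_owed: int                 # the fixed completion condition ('bounded')
    paid: int = 0
    actions_taken: list = field(default_factory=list)

    def respond(self, card_value: int, label: str) -> None:
        # Only the responder acts here -- the initiator has no counterplay,
        # which is what makes the phase 'one-sided'.
        self.paid += card_value
        self.actions_taken.append(label)

    @property
    def complete(self) -> bool:
        # The phase ends exactly when the condition is satisfied.
        return self.paid >= self.amount_owed

# Example: the opponent owes 5 and satisfies the demand with two cards.
phase = ResponsePhase(amount_owed=5)
phase.respond(3, "cash-3")
phase.respond(2, "property-2")
```

Once `complete` is true, the original player’s turn resolves and normal alternating play resumes.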
Introducing Monopoly Deal as a BORG Benchmark
To specifically isolate and study this BORG dynamic, Will Wolf introduced a modified version of the popular card game, Monopoly Deal. This adaptation simplifies some of the original game’s rules while highlighting the bounded one-sided response mechanism. For instance, when a player uses a ‘Rent’ card, the opponent is compelled to sequentially choose cards (either cash or property) to satisfy the rent demand. This interaction perfectly encapsulates the BORG dynamic, where the responding player must fulfill a condition before the turn resolves.
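As a concrete illustration of the Rent interaction, the responding player must choose cards until the demand is met. The greedy cash-first policy below is a hypothetical baseline responder, not the paper’s learned strategy; the card representation is likewise an assumption.

```python
def choose_payment(hand, rent_due):
    """Pick cards from 'hand' (a list of (kind, value) tuples) until the
    rent demand is satisfied, preferring cash so property sets survive."""
    # Sort cash before property, cheapest cards first.
    ordered = sorted(hand, key=lambda card: (card[0] != "cash", card[1]))
    chosen, total = [], 0
    for card in ordered:
        if total >= rent_due:
            break
        chosen.append(card)
        total += card[1]
    return chosen, total

hand = [("property", 4), ("cash", 1), ("cash", 3)]
chosen, total = choose_payment(hand, rent_due=4)
# the responder covers the rent with cash and keeps the property
```

A learned policy replaces this fixed preference with one derived from self-play, but the interface (pick cards until the condition holds) is the same bounded response.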
The research demonstrates that a well-established algorithm, Counterfactual Regret Minimization (CFR), can effectively learn strategies for this BORG environment without needing any new algorithmic modifications. CFR is a gold-standard method for computing approximate Nash equilibria in games where players have incomplete information, meaning they don’t know everything about their opponents’ hands or strategies.
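The core update inside CFR is regret matching: at each decision point, actions are played in proportion to how much the player regrets not having taken them in the past. A minimal sketch of that step, independent of any particular game:

```python
def regret_matching(cumulative_regrets):
    """Turn a list of cumulative regrets into a strategy: each action is
    played in proportion to its positive regret; if no action has positive
    regret, fall back to the uniform strategy."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n

# Two actions have accumulated regret 2.0; one has negative regret.
strategy = regret_matching([2.0, -1.0, 2.0])
```

Iterating this update while accumulating counterfactual regrets is what drives the average strategy toward an approximate Nash equilibrium.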
A Comprehensive Research Platform
Beyond the theoretical formalization and algorithmic demonstration, the project also delivers a practical, lightweight, and full-stack research platform. This system integrates the modified Monopoly Deal game environment, a parallelized CFR training engine, and even a human-playable web interface. The entire setup is designed to run efficiently on a single workstation, making it highly accessible for researchers. The platform prioritizes fast convergence (achieving stable strategies in about 20 minutes), detailed logging for introspection, easy human interaction with trained AI agents, and robust reproducibility through deterministic seeding and checkpointing.
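Deterministic seeding of the kind described above typically means deriving an independent, replayable random stream for each self-play game from a single base seed. The sketch below shows one standard way to do this (the derivation scheme and checkpoint format are illustrative assumptions, not the project’s own):

```python
import hashlib
import json
import random

def seeded_rng(base_seed: int, game_index: int) -> random.Random:
    """Derive a reproducible per-game RNG, so any game can be replayed
    exactly from its (base_seed, game_index) pair."""
    digest = hashlib.sha256(f"{base_seed}:{game_index}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

def to_checkpoint(iteration: int, policy: dict) -> str:
    """Serialize training state; writing this string to disk is the
    checkpoint step."""
    return json.dumps({"iteration": iteration, "policy": policy}, sort_keys=True)

# The same (seed, index) pair always replays the same randomness.
replayed = [seeded_rng(42, 7).random() for _ in range(2)]
ckpt = to_checkpoint(1000, {"infoset_a": [0.5, 0.5]})
```

Resuming from a checkpoint plus replaying seeded games gives bit-for-bit reproducible runs on a single workstation.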
The system’s architecture is divided into a training stack and a serving stack. The training stack uses local ‘Ray workers’ to run self-play games, with a central process managing the global policy and regret updates. The serving stack loads these trained policies into a web-based interface, allowing humans to play against the AI and visualize its decision-making process in real-time. All game interactions are logged to a database for later analysis.
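The worker/learner split in the training stack follows a common pattern: workers run self-play games independently and return regret deltas, which a central process folds into the global tables. The sketch below uses Python’s standard thread pool as a stand-in for Ray, with illustrative payloads rather than real CFR updates:

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def self_play_game(seed):
    """Worker: simulate one self-play game and return local regret deltas.
    The infoset keys and values here are illustrative stand-ins."""
    rng = random.Random(seed)
    return {f"infoset_{rng.randint(0, 3)}": rng.uniform(-1.0, 1.0)}

# Central process: fan out games to workers, then merge each worker's
# deltas into the global regret table -- the worker/learner split above.
global_regrets = defaultdict(float)
with ThreadPoolExecutor(max_workers=4) as pool:
    for delta in pool.map(self_play_game, range(8)):
        for infoset, regret in delta.items():
            global_regrets[infoset] += regret
```

With Ray, `self_play_game` would be a remote task and the merge loop would gather futures, but the data flow is the same.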
How the AI Learns and Performs
The CFR implementation uses a variant called Monte Carlo CFR (MCCFR) with an ‘action-based rollout’ strategy. To manage the complexity of the game, the state space is compressed using an ‘intent-based abstraction.’ This means the AI doesn’t track every minute detail but rather focuses on high-level ‘abstract actions’ like ‘StartNewPropertySet’ or ‘JustSayNo.’ This compact representation results in a manageable number of unique decision points (around a hundred), allowing for rapid and efficient learning.
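Intent-based abstraction amounts to keying the policy table on a few coarse, strategy-relevant features rather than the raw state. The feature choices below are hypothetical examples of this idea, not the project’s actual abstraction; the abstract-action names come from the article.

```python
def abstract_infoset_key(game_state):
    """Compress a raw game state into the coarse features the policy keys
    on. Capping counts keeps the number of unique keys small, which is how
    the table stays at roughly a hundred decision points."""
    return (
        game_state["phase"],                   # e.g. "normal" or "response"
        min(game_state["completed_sets"], 3),  # cap the count
        game_state["opponent_owes_rent"],
    )

# Abstract actions named in the article's description of the policy.
ABSTRACT_ACTIONS = ["StartNewPropertySet", "AddToPropertySet",
                    "CompletePropertySet", "JustSayNo", "GiveOpponentCash"]

key = abstract_infoset_key(
    {"phase": "response", "completed_sets": 1, "opponent_owes_rent": True}
)
```

Every concrete state that maps to the same key shares one strategy entry, which is what makes tabular CFR tractable here.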
Experiments showed impressive results. The maximum expected regret, a key metric for convergence in CFR, steadily declined and stabilized within just 1,000 games, taking approximately 19 minutes of training time. When evaluated against baseline opponents, the trained AI achieved a near-perfect win rate (almost 100%) against a random player and a strong 75% win rate against a more sophisticated ‘risk-aware’ heuristic opponent. The policy evolution analysis revealed that the AI learned to favor actions that promote property building and retention, such as playing ‘Just Say No’ or ‘Give Opponent Cash’ during response phases, and ‘AddToPropertySet’ or ‘CompletePropertySet’ during normal play.
Future Directions for BORG Research
While the current formulation of BORGs in Monopoly Deal treats the opponent’s response as a ‘multi-set decision’ (where the order of actions doesn’t affect the final outcome), future work aims to introduce sequential dependencies within the response phase. This would make the decision process even more complex and realistic. Researchers also plan to explore more advanced reinforcement learning techniques beyond tabular CFR to handle larger, more detailed state spaces and potentially remove the need for intent-based abstractions. As policy complexity grows, distributed training using cloud resources will also be a necessary next step.
This work provides a robust foundation for studying bounded one-sided response dynamics, offering both a formal framework and a practical, accessible platform for future AI research in sequential decision-making under uncertainty.