TLDR: This research introduces the Model Synthesis Architecture (MSA), a computational framework that explains how people reason in novel, ‘open-world’ situations. MSA uses large language models to identify relevant information and probabilistic programs to build tailored mental models on the fly. Evaluated on a sports reasoning dataset, MSA captures human judgments better than language model-only baselines, especially when new variables and unexpected scenarios are involved, suggesting a path to more human-like, flexible AI reasoning.
When faced with new and unexpected situations, humans possess a remarkable ability to pull together relevant information from their vast knowledge base and use it to make sense of the world, draw inferences, and predict outcomes. This flexible and coherent way of thinking, known as ‘open-world cognition,’ is a cornerstone of human intelligence. A recent research paper, titled ‘Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models,’ explores a computational approach to understanding and replicating this unique human capability.
The paper, authored by Lionel Wong, Katherine M. Collins, Lance Ying, Cedegao E. Zhang, Adrian Weller, Tobias Gerstenberg, Timothy O’Donnell, Alexander K. Lew, Jacob D. Andreas, Joshua B. Tenenbaum, and Tyler Brooke-Wilson, delves into the long-standing idea in cognitive science that people reason using ‘mental models’ – structured internal representations that mirror aspects of the world. These models help us maintain consistent beliefs, integrate new information, and evaluate different possibilities.
While traditional Bayesian models in cognitive science have successfully explained human judgments in many specific tasks, they often fall short when confronted with truly novel situations. These models are typically designed for a limited scope and cannot easily incorporate new, unforeseen variables or dependencies. This is where the concept of ‘open-world’ reasoning becomes crucial: how do we maintain coherent reasoning in a specific context while drawing on a vast, globally relevant pool of background knowledge?
Introducing the Model Synthesis Architecture (MSA)
The researchers propose a novel computational framework called the Model Synthesis Architecture (MSA) to address this challenge. MSA hypothesizes that human minds construct small, ad-hoc mental models on the fly, tailored to the specific demands of a task. By reasoning within these smaller, custom-built models, MSA can achieve local coherence over the relevant variables, while its ability to synthesize arbitrary models allows it to operate in open-ended environments where relevant considerations are not fixed in advance.
The MSA approach breaks down open-world reasoning into two main subproblems: first, ‘synthesizing’ ad-hoc models that include all relevant variables for a given situation; and second, ‘reasoning within’ that constructed model using general algorithms for belief updating and decision-making.
In their implementation, the team uses large language models (LMs) to handle the ‘global relevance’ aspect – retrieving and organizing relevant background knowledge. For the ‘local coherence’ aspect, they employ probabilistic programming languages (PPLs) to construct bespoke, coherent world models. This combination allows MSA to process arbitrary natural language inputs (thanks to the LM front end) and express arbitrary probabilistic models (due to the general-purpose PPL modeling language).
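The split between synthesizing a model and reasoning within it can be illustrated with a small sketch. This is not the authors' implementation: the generative model below is written by hand as a stand-in for a program an LM might synthesize, the athlete names and all numeric parameters are illustrative assumptions, and plain rejection sampling stands in for whatever inference a real PPL would provide.

```python
import random

ATHLETES = ("alice", "bob")  # hypothetical athlete names


def sample_world(rng, n_matches=2):
    """Draw one possible world from a small generative model.

    Hand-written stand-in for a program that an LM might synthesize
    from a vignette; every number here is an illustrative assumption.
    """
    # Latent variable: each athlete's underlying strength.
    strength = {name: rng.gauss(50, 15) for name in ATHLETES}
    alice_wins = []
    for _ in range(n_matches):
        # Latent variable: an athlete sometimes slacks off in a match,
        # which halves the strength they bring to it.
        effort = {name: 0.5 if rng.random() < 0.3 else 1.0 for name in ATHLETES}
        alice_wins.append(
            strength["alice"] * effort["alice"] > strength["bob"] * effort["bob"]
        )
    return {"strength": strength, "alice_wins": alice_wins}


def posterior_probability(model, query, evidence, n_samples=20000, seed=0):
    """Estimate P(query | evidence) under `model` by rejection sampling.

    This routine knows nothing about sports: it works for any model
    function that returns a sampled world.
    """
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        world = model(rng)
        if evidence(world):  # keep only worlds consistent with the evidence
            accepted += 1
            hits += query(world)
    if accepted == 0:
        raise ValueError("no sampled world matched the evidence")
    return hits / accepted


def alice_is_stronger(world):
    return world["strength"]["alice"] > world["strength"]["bob"]


def alice_won_both(world):
    return all(world["alice_wins"])


prior = posterior_probability(sample_world, alice_is_stronger, lambda world: True)
posterior = posterior_probability(sample_world, alice_is_stronger, alice_won_both)
print(f"P(Alice is stronger), no evidence: {prior:.2f}")
print(f"P(Alice is stronger | she won both matches): {posterior:.2f}")
```

The point of the design is that `posterior_probability` is generic while `sample_world` is disposable: a different vignette would call for a different model function, but the same inference routine would still apply.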
Evaluating MSA: The ‘Model Olympics’
To evaluate MSA, the researchers created a novel reasoning dataset called ‘Model Olympics,’ consisting of natural language vignettes about sporting events. This domain was chosen because it naturally integrates intuitive causal reasoning, uncertainty, and diverse latent variables, providing a structured yet open-ended setting for testing flexible cognitive architectures. For instance, in a new sports scenario, one cannot know in advance whether injuries, weather, team dynamics, or new equipment will be relevant.
Three experiments were conducted with human participants and compared against the MSA and other baseline models:
- Experiment 1 (Detailed Backgrounds): Participants reasoned about situations where all relevant causal relationships were explicitly described in language.
- Experiment 2 (Underspecified Backgrounds): Key relationships between variables were only briefly mentioned or implied, requiring participants to retrieve details from their background knowledge.
- Experiment 3 (Participant-Generated Novel Details): New, uncontrolled details were added in the form of ‘sports commentary’ written by other naive participants (e.g., that an athlete had pulled a muscle). These details had to be integrated into reasoning, directly testing open-world capabilities.
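To make the Experiment 3 setting concrete, the sketch below shows how a single new detail, such as a pulled muscle, might enter a probabilistic model as an extra variable and shift a prediction. It is a hand-written illustration rather than the paper's code: the variable names, the 0.2 injury prior, the 0.6 performance penalty, and the noise levels are all assumptions chosen for readability.

```python
import random


def sample_world(rng, with_injury):
    """Sample one world; `with_injury` toggles the extra variable.

    All distributions and constants are illustrative assumptions.
    """
    strength_a = rng.gauss(50, 15)  # Alice's latent strength
    strength_b = rng.gauss(50, 15)  # Bob's latent strength
    # New variable suggested by the commentary: Alice may have pulled
    # a muscle between the first race and the final.
    injured = with_injury and rng.random() < 0.2

    def performance(strength):
        # Performance on the day is strength plus random variation.
        return strength + rng.gauss(0, 10)

    won_first = performance(strength_a) > performance(strength_b)
    penalty = 0.6 if injured else 1.0  # an injury only affects the final
    wins_final = performance(strength_a * penalty) > performance(strength_b)
    return {"won_first": won_first, "injured": injured, "wins_final": wins_final}


def estimate(with_injury, evidence, n_samples=50000, seed=0):
    """Estimate P(Alice wins the final | evidence) by rejection sampling."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        world = sample_world(rng, with_injury)
        if evidence(world):
            accepted += 1
            hits += world["wins_final"]
    if accepted == 0:
        raise ValueError("no sampled world matched the evidence")
    return hits / accepted


# Original model: we only know that Alice won the first race.
base = estimate(False, lambda w: w["won_first"])
# Extended model: the commentary adds that Alice pulled a muscle.
extended = estimate(True, lambda w: w["won_first"] and w["injured"])
print(f"P(Alice wins final | won first race): {base:.2f}")
print(f"P(Alice wins final | won first race, pulled a muscle): {extended:.2f}")
```

Nothing else in the model has to change when the new variable is added; the same evidence about the first race is reused, and the injury simply lowers the predicted chance of winning the final.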
Key Findings and Implications
Across all experiments, the MSA consistently captured human judgments better than language model-only baselines, including those using chain-of-thought prompting. This suggests several important insights:
- Human Reasoning Aligns with Probabilistic Models: People’s judgments are generally consistent with Bayesian inference in ad-hoc probabilistic models, even when those models are synthesized on the fly from natural language.
- MSA Outperforms LMs in Generalization: The structured nature of MSA’s probabilistic models allows it to generalize better to arbitrary new details and less familiar settings (like the canoe racing domain, which was less likely to be in LM training data) compared to pure LMs. This difference was most pronounced in Experiment 3, highlighting MSA’s strength in open-world scenarios where new variables and dependencies emerge.
- Retrieving and Representing Relevant Information: The MSA demonstrated its ability to retrieve reasonable descriptions of variables and causal dependencies from natural language and formalize them into functional probabilistic programs.
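For readers who want the first finding in symbols, Bayesian inference within a synthesized model can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: M is the synthesized model, e the observed evidence, q the queried variable, and z the remaining latent variables, treated as discrete for simplicity.

```latex
P(q \mid e, M) \;=\; \frac{\sum_{z} P(q, e, z \mid M)}{\sum_{q'} \sum_{z} P(q', e, z \mid M)}
```

The numerator sums over every way the unobserved variables could be set while agreeing with both the query and the evidence; the denominator normalizes over all possible values of the query.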
The research suggests that while large language models are excellent at retrieving world knowledge, they may be less adept at integrating that evidence into a locally coherent world model in the way humans do. The explicit causal and probabilistic representations within MSA appear to force the model to focus on deeper structural properties rather than just superficial linguistic features, leading to a better fit with human reasoning.
This work offers a promising path toward understanding and replicating human reasoning in open-ended domains, bridging the gap between the flexibility of large language models and the coherence of structured probabilistic models. For more details, you can refer to the full research paper available at arXiv.org.


