TLDR: A new framework, Open-Universe Assistance Games (OU-AGs), and a method called GOOD (GOals from Open-ended Dialogue) are introduced to help AI agents infer and adapt to diverse, evolving human goals in real time. GOOD uses large language models to track, refine, and prioritize natural language goals, improving agent performance in complex tasks like grocery shopping and household robotics compared to baselines that lack explicit goal tracking.
A significant challenge for embodied AI agents is understanding and adapting to the diverse, often unstated goals and preferences of humans. Traditional designs often rely on a predefined set of goals, so they struggle in ‘open-universe’ environments where human needs are dynamic and underspecified. Imagine a grocery assistant that must account for allergies, local ingredient preferences, or specific dietary requirements: these are difficult for designers to anticipate in advance.
Introducing Open-Universe Assistance Games (OU-AGs)
To address this, researchers Rachel Ma, Jingyi Qu, Andreea Bobu, and Dylan Hadfield-Menell from MIT CSAIL have introduced a new framework called Open-Universe Assistance Games (OU-AGs). This framework allows AI agents to reason over an unbounded and evolving space of possible human goals. Unlike previous models that might only account for uncertainty in the environment, OU-AGs specifically model uncertainty about the human’s task and preferences, which can change and grow during an interaction.
For instance, in a grocery shopping scenario, a human’s initial goal might be a generic ‘buy cake ingredients’. Through dialogue, this could evolve to ‘buy vanilla cake ingredients for 12’ and later include a constraint like ‘don’t buy dairy’. OU-AGs are designed to track these evolving sets of preferences, allowing the AI to maintain an interpretable understanding of the human’s active goals.
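As a rough illustration of the grocery example above, an evolving goal set could be kept as a small list of natural-language strings. This is only a minimal sketch: the `GoalSet` class and its `refine` and `add` methods are hypothetical names chosen for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class GoalSet:
    """Hypothetical container for the human's currently active goals."""
    goals: list[str] = field(default_factory=list)

    def refine(self, old: str, new: str) -> None:
        # Replace a goal with a more specific version, keeping its position.
        self.goals[self.goals.index(old)] = new

    def add(self, goal: str) -> None:
        # Add a newly expressed goal or constraint, ignoring duplicates.
        if goal not in self.goals:
            self.goals.append(goal)


# Replay the dialogue from the example, one update per turn.
active = GoalSet(["buy cake ingredients"])
active.refine("buy cake ingredients", "buy vanilla cake ingredients for 12")
active.add("don't buy dairy")
print(active.goals)
```

Keeping the goals as readable strings is what makes the agent's current understanding easy for a person to inspect.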
GOOD: Goals from Open-ended Dialogue
To solve the challenges posed by OU-AGs, the team developed a data-efficient, online method called GOOD (GOals from Open-ended Dialogue). GOOD leverages large language models (LLMs) to extract and manage human goals expressed in natural language during an interaction. It performs three key functions:
- Proposing new candidate goal sets based on the ongoing dialogue.
- Removing goals that are no longer likely or relevant (perhaps because they’ve been achieved).
- Ranking these goals to guide the agent’s actions.
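The three functions above can be sketched as a single update step. This is a simplified illustration, not the authors' implementation: it tracks one flat list of goals rather than candidate goal sets, and the `propose`, `still_relevant`, and `score` callables are hypothetical stand-ins for LLM prompts, replaced here by toy keyword rules so the example runs on its own.

```python
from typing import Callable


def update_goals(
    dialogue: list[str],
    goals: list[str],
    propose: Callable[[list[str], list[str]], list[str]],
    still_relevant: Callable[[list[str], str], bool],
    score: Callable[[list[str], str], float],
) -> list[str]:
    """One propose / remove / rank pass over the tracked goals."""
    # 1. Propose: add candidates that are not already tracked.
    candidates = list(goals)
    for goal in propose(dialogue, goals):
        if goal not in candidates:
            candidates.append(goal)
    # 2. Remove: drop goals judged no longer likely or relevant.
    kept = [g for g in candidates if still_relevant(dialogue, g)]
    # 3. Rank: highest priority first.
    return sorted(kept, key=lambda g: score(dialogue, g), reverse=True)


# Toy stand-ins for the LLM calls (assumptions for illustration only).
def toy_propose(dialogue, goals):
    return ["avoid dairy"] if any("no dairy" in t.lower() for t in dialogue) else []


def toy_still_relevant(dialogue, goal):
    # Treat "buy flour" as achieved once the human says they already have it.
    done = any("already got the flour" in t.lower() for t in dialogue)
    return not (goal == "buy flour" and done)


def toy_score(dialogue, goal):
    return 2.0 if goal.startswith("avoid") else 1.0  # constraints come first


dialogue = [
    "I want to bake a vanilla cake.",
    "Oh, and no dairy please.",
    "I already got the flour.",
]
ranked = update_goals(
    dialogue, ["buy flour", "buy cake ingredients"],
    toy_propose, toy_still_relevant, toy_score,
)
print(ranked)
```

In a real system each callable would wrap a prompt to a language model; passing them in as parameters keeps the update logic testable without one.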
GOOD’s inference module can either use simple LLM prompting to select the most likely goals or employ a more explicit probabilistic inference by eliciting pairwise comparisons from the LLM to compute a distribution over goal sets. This allows the agent to estimate uncertainty and act only when sufficiently certain about a particular goal.
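One way to turn pairwise comparisons into a distribution is a Bradley-Terry model, sketched below. The article does not say which model the authors use, so Bradley-Terry, the smoothing pseudo-count `prior`, the comparison counts, and the function names are all assumptions made for illustration.

```python
def bradley_terry(wins: list[list[float]], iters: int = 200, prior: float = 0.1) -> list[float]:
    """Estimate a normalized preference distribution from pairwise wins.

    wins[i][j] is how often candidate i was preferred over candidate j.
    `prior` is a small pseudo-count (an assumption of this sketch) that
    keeps every estimate strictly positive.
    """
    n = len(wins)
    # Smoothed win counts; the diagonal is unused.
    w = [[wins[i][j] + prior if i != j else 0.0 for j in range(n)] for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(w[i][j] for j in range(n) if j != i)
            denom = sum((w[i][j] + w[j][i]) / (p[i] + p[j]) for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        p = [x / norm for x in new_p]  # renormalize so the values sum to 1
    return p


def choose_goal_set(probs: list[float], threshold: float):
    """Return the index of the most likely goal set, or None if too uncertain."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


# Hypothetical comparison counts for three candidate goal sets.
wins = [
    [0, 8, 9],
    [2, 0, 6],
    [1, 4, 0],
]
probs = bradley_terry(wins)
print(probs)
print(choose_goal_set(probs, threshold=0.5))   # confident enough to act
print(choose_goal_set(probs, threshold=0.95))  # too uncertain: None
```

Returning `None` below the threshold gives the agent a natural point at which to ask a clarifying question instead of acting.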
Real-World Applications and Performance
The researchers evaluated GOOD in two open-ended assistance domains: a text-based grocery shopping environment and a text-operated simulated household robotics environment (AI2Thor). They compared GOOD against a ‘Full Context Baseline’ agent, which relies solely on the full conversation history for decision-making without explicit goal tracking.
The results showed that GOOD consistently outperformed the baseline. In the robot domain, where actions are more varied and outcomes more distinct, GOOD significantly improved action quality. The baseline often struggled with long dialogue contexts, leading to repetitive or unhelpful actions. By explicitly tracking goals, GOOD agents could better focus their actions to meet human preferences. While the differences were less pronounced in the grocery domain, the benefits of explicit goal tracking were still evident.
Both LLM-as-a-judge and human evaluations confirmed GOOD’s superior performance, with human ratings generally mirroring the trends observed in LLM evaluations. The study also noted that while GOOD with probabilistic inference generally took longer to run than the baseline, it offered a more robust and interpretable method for goal tracking.
Future Directions
This research marks a significant step towards building more adaptable, interpretable, and corrigible AI agents. Future work aims to integrate GOOD with Vision-Language Models (VLMs) and other multimodal systems to support richer forms of input, moving beyond text-based scenarios. Further human subject studies are also planned to explore the benefits of interpretable goals and how human feedback can be incorporated for corrections. For more details, you can read the full research paper: Open-Universe Assistance Games.


