Navigating Uncertainty: How PlanU Empowers LLMs for Better Decision-Making

TLDR: PlanU is a new method that helps Large Language Models (LLMs) make better decisions in unpredictable environments. It uses Monte Carlo Tree Search (MCTS) but enhances it by modeling the full range of possible outcomes (quantile distribution) instead of just averages, and introduces a “Curiosity” score to balance exploring new options with exploiting known good ones. This allows LLMs to effectively handle both their own inherent uncertainties and the uncertainties of the environment, leading to superior performance in various complex tasks.

Large Language Models (LLMs) are becoming increasingly powerful, showing remarkable abilities in areas like reasoning and coding. This success has naturally led researchers to explore their potential in decision-making tasks, where an AI agent needs to choose actions to achieve specific goals.

However, LLMs often face significant hurdles when making decisions in environments filled with uncertainty. This challenge stems from two main sources: LLM uncertainty and environmental uncertainty.

LLM uncertainty arises from the inherent randomness in how these models generate text. Sometimes, this can lead to “hallucinations” or inaccurate outputs. Many current approaches try to tackle this by generating multiple reasoning paths or using complex search trees. But these methods frequently overlook the second type of challenge: environmental uncertainty. This occurs when the environment itself is unpredictable, meaning an action might lead to different outcomes each time it’s performed, rather than a single, deterministic result. Imagine trying to plan a series of actions where each step has a chance of failing or leading to an unexpected situation – that’s environmental uncertainty at play.
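
To make the idea of environmental uncertainty concrete, here is a minimal sketch of an action that can fail. The function name `stochastic_step`, the list-based state, and the 30% failure probability are illustrative assumptions, not details taken from the PlanU paper.

```python
import random

def stochastic_step(state, action, fail_prob=0.3, rng=random):
    """Apply an action to a state (hypothetical toy environment).

    With probability fail_prob the action fails and the state is unchanged;
    otherwise the action is appended to the state's history.
    """
    if rng.random() < fail_prob:
        return list(state)           # failure: nothing changes
    return list(state) + [action]    # success: the action takes effect

# Repeating the same action from the same state gives different outcomes.
rng = random.Random(0)  # seeded so the run is reproducible
outcomes = [stochastic_step([], "pick_up_block", rng=rng) for _ in range(1000)]
success_rate = sum(1 for o in outcomes if o) / len(outcomes)
print(f"empirical success rate: {success_rate:.2f}")
```

A planner that assumes the action always succeeds would be wrong roughly three times in ten in this toy setting, which is the kind of gap that planning under environmental uncertainty has to account for.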

Introducing PlanU: A New Approach to Decision Making Under Uncertainty

To address these critical uncertainty challenges, researchers have introduced PlanU, an innovative LLM-based planning method. PlanU is designed to help LLMs make better decisions in stochastic, or unpredictable, environments by integrating uncertainty directly into its planning process.

At its core, PlanU builds upon Monte Carlo Tree Search (MCTS), a well-known algorithm for exploring decision spaces. Unlike traditional MCTS methods that often simplify uncertainty by averaging outcomes, PlanU takes a more sophisticated approach. It models the potential “return” or outcome of each decision point in the MCTS tree not as a single average value, but as a “quantile distribution.” This allows PlanU to capture the full spectrum of possible outcomes and their likelihoods, providing a much richer understanding of the risks and rewards associated with each action.
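
The difference between an average and a quantile view can be illustrated with a small sketch. The helper `empirical_quantiles`, the two sample return lists, and the choice of five quantiles are assumptions made for illustration; the article does not specify how PlanU estimates or updates its quantiles.

```python
from statistics import mean

def empirical_quantiles(returns, n_quantiles=5):
    """Summarize sampled returns by n_quantiles values.

    Quantile i is read at the midpoint fraction (2i + 1) / (2 * n_quantiles)
    of the sorted samples. Integer arithmetic avoids rounding surprises.
    """
    ordered = sorted(returns)
    n = len(ordered)
    indices = [min(((2 * i + 1) * n) // (2 * n_quantiles), n - 1)
               for i in range(n_quantiles)]
    return [ordered[k] for k in indices]

# Two hypothetical actions with the same average return.
safe = [1.0] * 10                   # always returns 1
risky = [-4.0] * 5 + [6.0] * 5      # half the time -4, half the time +6

print(mean(safe), mean(risky))      # 1.0 1.0 (the averages are identical)
print(empirical_quantiles(safe))    # [1.0, 1.0, 1.0, 1.0, 1.0]
print(empirical_quantiles(risky))   # [-4.0, -4.0, 6.0, 6.0, 6.0]
```

A planner that only stores the mean cannot tell these two actions apart, while the quantile summary shows that the second one carries a real chance of a loss.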

Balancing Exploration and Exploitation with Curiosity

Another key innovation in PlanU is its “Upper Confidence Bounds with Curiosity” (UCC) score. During the planning process, PlanU needs to decide whether to stick with actions that have worked well in the past (exploitation) or try new, less-explored options (exploration). The UCC score helps PlanU strike this balance by considering not only the potential value of an action but also how “curious” the model is about the state that action leads to. This curiosity is measured by estimating the novelty of a state, encouraging the agent to explore less-visited parts of the environment.
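
The general shape of such a score can be sketched as a standard upper-confidence-bound term plus a novelty bonus. This is not the UCC formula from the paper: the function names, the count-based novelty estimate, and the weights `c` and `beta` are all assumptions chosen to show the idea.

```python
import math

def count_based_novelty(state_visits):
    """Hypothetical novelty estimate: high for rarely seen states."""
    return 1.0 / math.sqrt(1 + state_visits)

def ucc_like_score(value, parent_visits, child_visits, novelty,
                   c=1.0, beta=0.5):
    """Value estimate + UCB-style exploration term + curiosity bonus.

    An illustrative stand-in, not PlanU's actual UCC definition.
    """
    exploration = c * math.sqrt(math.log(parent_visits + 1) / (child_visits + 1))
    return value + exploration + beta * novelty

# Two actions with the same estimated value; one leads to a familiar
# state, the other to a state that has never been visited.
familiar = ucc_like_score(0.5, parent_visits=20, child_visits=10,
                          novelty=count_based_novelty(10))
novel = ucc_like_score(0.5, parent_visits=20, child_visits=1,
                       novelty=count_based_novelty(0))
print(novel > familiar)  # True
```

With equal value estimates, the action leading to the unvisited state receives the higher score, so the search tries it first. As visit counts grow, both bonus terms shrink and the value estimate dominates.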

Furthermore, to mitigate the impact of LLM uncertainty, PlanU uses a text encoder. This helps the model recognize that slightly different textual descriptions can refer to the same underlying state, preventing the LLM from getting confused by minor variations in language.
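
The state-matching idea can be sketched with a deliberately simple encoder. The bag-of-words `encode` function below is a toy stand-in for a learned text encoder, and the names `same_state` and the 0.8 threshold are assumptions; the article does not say which encoder or similarity rule PlanU uses.

```python
import math
import re
from collections import Counter

def encode(text):
    """Toy text encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def same_state(desc_a, desc_b, threshold=0.8):
    """Treat two descriptions as one state if their encodings are close."""
    return cosine(encode(desc_a), encode(desc_b)) >= threshold

print(same_state("The red block is on the table.",
                 "the red block is on the table"))   # True
print(same_state("The red block is on the table.",
                 "The chef is holding a tomato."))   # False
```

A bag-of-words encoder only absorbs surface differences such as casing and punctuation. A learned sentence encoder would be needed to merge genuine paraphrases, but the matching logic around it would look much the same.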

Demonstrated Effectiveness Across Diverse Tasks

Extensive experiments have shown PlanU’s effectiveness across various decision-making benchmarks, including Blocksworld (stacking blocks), Overcooked (preparing meals), and VirtualHome (household tasks). PlanU consistently outperformed existing state-of-the-art methods, especially in environments where actions had a chance of failure or led to unpredictable outcomes. For instance, in a simple stock investment task, PlanU correctly identified the optimal investment strategy where other methods struggled due to their inability to properly model environmental uncertainty.

The research also highlighted PlanU’s robustness to LLM uncertainty. Even when prompts were shuffled or injected with irrelevant information, PlanU’s performance remained largely stable, demonstrating its ability to handle the inherent variability of language models.

In conclusion, PlanU represents a significant step forward in enabling LLMs to make more reliable and effective decisions in complex, uncertain real-world scenarios. By explicitly modeling uncertainty through quantile distributions and fostering intelligent exploration with curiosity-driven scores, PlanU paves the way for more capable and trustworthy AI agents.
