TLDR: A new research paper explores how Large Language Models (LLMs) like ChatGPT can automatically formulate and solve complex stochastic optimization problems from natural language. Focusing on chance-constrained and two-stage stochastic models, the study introduces novel prompting strategies, including a multi-agent framework, and a ‘soft scoring’ metric for evaluation. Key findings show GPT-4-Turbo, combined with specific prompting methods, significantly outperforms other models, demonstrating LLMs’ potential to revolutionize decision-making under uncertainty.
Large Language Models (LLMs) like ChatGPT are rapidly transforming various fields, and operations research (OR) is no exception. While previous efforts have focused on using LLMs for deterministic optimization problems, a recent study delves into the more complex realm of stochastic optimization. This area is crucial for real-world decision-making where uncertainty plays a significant role, such as in supply chain management, energy systems, and finance.
The research, titled “Large Language Model-Based Automatic Formulation for Stochastic Optimization Models” by Amirreza Talebi from The Ohio State University, presents what it describes as the first integrated and systematic investigation into how LLMs can automatically formulate and solve stochastic optimization problems from natural language descriptions. Stochastic problems, which involve elements like chance constraints and two-stage recourse models, are inherently harder to formulate than deterministic ones because of their probabilistic variables and multi-stage decision-making.
Tackling Uncertainty with LLMs
The paper focuses on three main categories of stochastic optimization problems: joint chance-constrained models, individual chance-constrained models, and two-stage stochastic linear programs (SLP-2), along with their deterministic counterparts (DLP-2). These problems require sophisticated reasoning to handle probabilistic variables, constraints, and objective functions across different stages.
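For readers unfamiliar with these model classes, their standard textbook forms are sketched below. The notation (decision vector x, random vector ξ, risk levels α and α_i, recourse matrix W, technology matrix T) is generic and assumed here for illustration; it is not taken from the paper.

```latex
% Individual chance constraints: each constraint i holds with its own probability level
\mathbb{P}\big( a_i(\xi)^\top x \le b_i(\xi) \big) \ge 1 - \alpha_i, \qquad i = 1, \dots, m

% Joint chance constraint: all m constraints hold simultaneously with one probability level
\mathbb{P}\big( a_i(\xi)^\top x \le b_i(\xi), \; i = 1, \dots, m \big) \ge 1 - \alpha

% Two-stage stochastic linear program: first-stage decision x, second-stage recourse y
\min_{x \ge 0} \; c^\top x + \mathbb{E}_{\xi}\big[ Q(x, \xi) \big]
\quad \text{s.t.} \quad A x = b,
\qquad
Q(x, \xi) = \min_{y \ge 0} \big\{ q(\xi)^\top y \;:\; W y = h(\xi) - T(\xi)\, x \big\}
```

A deterministic two-stage model has the same structure with ξ fixed at a single known value, so the expectation disappears.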
To guide ChatGPT through these complex tasks, the study introduces several prompting strategies. These include ‘chain-of-thought’ and ‘modular reasoning,’ which help the LLM break problems down into manageable steps. A particularly novel approach is the multi-agent prompting framework, in which specialized ChatGPT agents collaborate. For instance, one agent might extract model elements (sets, parameters, variables), another might formulate the mathematical model, and reviewer agents evaluate the consistency and coherence of the outputs. This collaborative setup mimics a team of experts working together, with the aim of improving the LLM’s handling of intricate problems.
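The division of labor among agents can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Agent` class, the role instructions, and the `fake_llm` stand-in are all hypothetical, and a real pipeline would replace `fake_llm` with calls to an actual chat model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str          # e.g. "extractor", "formulator", "reviewer"
    instructions: str  # role-specific guidance (illustrative wording only)
    llm: LLM

    def run(self, message: str) -> str:
        # Prefix the prompt with the role tag and instructions, then query the model.
        prompt = f"[{self.role}] {self.instructions}\n\n{message}"
        return self.llm(prompt)


def formulate(problem: str, llm: LLM) -> Dict[str, str]:
    """Run extractor -> formulator -> reviewer once and return all three outputs."""
    extractor = Agent("extractor", "List the sets, parameters, and variables.", llm)
    formulator = Agent("formulator", "Write the mathematical model.", llm)
    reviewer = Agent("reviewer", "Check the model for consistency and coherence.", llm)

    elements = extractor.run(problem)
    model = formulator.run(f"Problem:\n{problem}\n\nElements:\n{elements}")
    review = reviewer.run(f"Elements:\n{elements}\n\nModel:\n{model}")
    return {"elements": elements, "model": model, "review": review}


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real chat model, keyed on the role tag in the prompt."""
    if prompt.startswith("[extractor]"):
        return "variables: x; parameters: c, alpha; random: demand"
    if prompt.startswith("[formulator]"):
        return "min c*x  s.t.  P(x >= demand) >= 1 - alpha"
    return "APPROVE"


result = formulate("Choose stock x to meet random demand with high probability.", fake_llm)
print(result["model"])
```

Passing the model as a plain callable keeps the orchestration logic testable without network access; swapping in a real client only requires a function with the same string-in, string-out shape.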
A New Way to Evaluate Performance
One of the key contributions of this paper is the introduction of a novel ‘soft scoring’ metric. Traditional evaluation methods often rely on canonical accuracy (exact matches to a ground-truth model) or execution-based accuracy (whether the generated code runs and produces the correct optimal solution). However, these methods can be limited, as they might not account for partial correctness, structural quality, or variations in notation. The soft scoring metric addresses these limitations by evaluating structural similarity, notational variations, and component-level permutations, providing a more nuanced assessment of the LLM-generated models.
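To make the idea of component-level, order-independent scoring concrete, here is a minimal Python sketch. It is an assumption-laden illustration and not the paper's metric: the function names, the token-based normalization, and the equal weighting of component types are all invented for this example.

```python
import re
from collections import Counter
from typing import Dict, List, Optional


def normalize(expr: str, rename: Optional[Dict[str, str]] = None) -> str:
    """Canonicalize one component: rename identifiers, drop whitespace, lowercase."""
    rename = rename or {}
    # Split into identifiers, numbers, and single non-space symbols.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+\.?\d*|\S", expr)
    return "".join(rename.get(t, t) for t in tokens).lower()


def soft_score(
    generated: Dict[str, List[str]],
    reference: Dict[str, List[str]],
    rename: Optional[Dict[str, str]] = None,
) -> Dict[str, float]:
    """Fraction of reference components recovered per type, ignoring list order."""
    scores: Dict[str, float] = {}
    for kind, ref_items in reference.items():
        ref = Counter(normalize(e) for e in ref_items)
        gen = Counter(normalize(e, rename) for e in generated.get(kind, []))
        matched = sum((ref & gen).values())  # multiset intersection
        scores[kind] = matched / len(ref_items) if ref_items else 1.0
    # Unweighted mean over component types (an arbitrary choice for this sketch).
    scores["overall"] = sum(scores.values()) / len(scores) if scores else 0.0
    return scores


reference = {
    "variables": ["x", "y"],
    "constraints": ["x + y <= 10", "x >= 0"],
    "objective": ["min 2*x + 3*y"],
}
# Same model with variable x called q, constraints reordered, and one wrong bound.
generated = {
    "variables": ["q", "y"],
    "constraints": ["q >= 0", "q + y <= 12"],
    "objective": ["min 2*q + 3*y"],
}
scores = soft_score(generated, reference, rename={"q": "x"})
print(scores)
```

In this example the renamed variable and the reordered constraints still earn credit, while the incorrect bound costs half of the constraint score. The sketch does not recognize algebraically equivalent rewrites such as `y + x <= 10`, and it does not penalize extra generated components; a fuller metric would need to handle both.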
Key Findings and Future Directions
The experimental evaluation covered several ChatGPT models (GPT-3.5-Turbo, GPT-3.5-16K, GPT-4, and GPT-4-Turbo) and prompting methods. GPT-4-Turbo consistently emerged as the top performer, excelling in partial score, variable matching, and objective accuracy, while also exhibiting the lowest error rates. Among the prompting strategies, ‘cot_s_instructions’ proved most effective for SLP-2 tasks, and the ‘agentic’ (multi-agent) approach demonstrated superior robustness in reasoning, even without explicit instructions.
The findings underscore that with carefully engineered prompts and multi-agent collaboration, LLMs can significantly facilitate stochastic formulations. This paves the way for intelligent, language-driven modeling pipelines in stochastic optimization, making complex OR tasks more accessible and efficient. While deterministic two-stage problems (DLP-2) were found to be the most tractable, constraint matching and overall accuracy remain areas for further improvement across all problem categories.
Future research aims to enhance the multi-agent system with dynamic feedback loops, integrate symbolic solvers for real-time validation, and extend the framework to handle non-linear, integer, or multi-stage stochastic models. This work represents a crucial step towards fully leveraging LLMs to automate and improve decision-making under uncertainty.


