Enhancing Software Reliability: A Hybrid Approach with Large Language Models and Scenario-Based Programming

TLDR: A new methodology integrates Large Language Models (LLMs) with Scenario-Based Programming (SBP) to improve software reliability. While LLMs accelerate development, they can introduce errors. This approach guides LLMs through a structured, iterative process of defining scenarios, providing background knowledge, and incrementally developing code, with human oversight and formal verification. A case study building a Connect4 AI agent demonstrated the methodology’s effectiveness, with the agent winning against strong opponents and achieving formal correctness guarantees in specific scenarios. The research also introduced extensions to SBP for event-specific priorities, enhancing control and flexibility.

Large Language Models (LLMs) have rapidly become essential tools for software developers, helping to create complex programs. They offer significant advantages like reducing development time, generating well-organized code, and even suggesting innovative ideas. However, LLMs often introduce errors and present incorrect code with convincing confidence, which can lead developers to accept flawed solutions.

To address this challenge and bring LLMs into the software development cycle more reliably, researchers Ayelet Berzack and Guy Katz propose a novel methodology. Their approach combines LLMs with “traditional” software engineering techniques, specifically focusing on Scenario-Based Programming (SBP). The goal is to streamline development, reduce errors, and allow users to verify crucial program properties with greater confidence. This hybrid method allows human developers to infuse their expert knowledge into the LLM and to inspect and verify its outputs.

Understanding the Core Components: LLMs and Scenario-Based Programming

LLMs are deep learning models trained on vast amounts of text, capable of generating coherent natural language and performing tasks like code synthesis and debugging. While powerful, they are probabilistic and can produce incorrect or non-executable code, a failure mode often called “hallucination.” This unreliability is a major concern for critical applications.

Scenario-Based Programming (SBP) is a paradigm where a system’s behavior is defined by independent, interacting scenarios. Each scenario represents a specific requirement or constraint and is implemented as a “scenario object.” These objects run concurrently and coordinate by declaring intentions related to events: requesting events, waiting for events, or blocking events. At each synchronization point, the SBP runtime environment selects an event that is requested and not blocked, triggers it, and resumes the scenarios that requested or waited for it. This modularity makes SBP programs easier to interpret, maintain, and verify, including with automated analysis techniques like model checking. The researchers used BPpy, a Python-based implementation of SBP, for their work.
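To make the request/wait/block mechanism concrete, here is a minimal pure-Python sketch of a scenario runtime. It is not BPpy's API: the `run` function, the dictionary keys, and the hot/cold water scenarios are all hypothetical names chosen for illustration, and the "first requested, unblocked event wins" selection rule is a simplifying assumption.

```python
from typing import Dict, Generator, List

# A scenario is a generator that yields its declarations at each sync point.
Scenario = Generator[Dict[str, List[str]], str, None]


def run(scenarios: List[Scenario]) -> List[str]:
    """Toy SBP loop (assumption: pick the first requested, unblocked event)."""
    active = []
    for scenario in scenarios:
        try:
            # Advance each scenario to its first synchronization point.
            active.append((scenario, next(scenario)))
        except StopIteration:
            pass
    trace: List[str] = []
    while True:
        blocked = {e for _, decl in active for e in decl.get("block", [])}
        requested = [e for _, decl in active
                     for e in decl.get("request", []) if e not in blocked]
        if not requested:
            return trace  # nothing selectable: the run ends
        event = requested[0]
        trace.append(event)
        resumed = []
        for scenario, decl in active:
            if event in decl.get("request", []) or event in decl.get("wait", []):
                try:
                    # Resume scenarios that requested or waited for the event.
                    resumed.append((scenario, scenario.send(event)))
                except StopIteration:
                    pass  # scenario finished
            else:
                resumed.append((scenario, decl))
        active = resumed


def add_hot() -> Scenario:
    for _ in range(3):
        yield {"request": ["HOT"]}


def add_cold() -> Scenario:
    for _ in range(3):
        yield {"request": ["COLD"]}


def interleave() -> Scenario:
    # Constraint scenario: forces HOT and COLD to alternate.
    while True:
        yield {"wait": ["HOT"], "block": ["COLD"]}
        yield {"wait": ["COLD"], "block": ["HOT"]}


trace = run([add_hot(), add_cold(), interleave()])
print(trace)
```

Each scenario states one requirement in isolation; the alternation emerges only from how the runtime combines the three sets of declarations.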

The Proposed Methodology for Reliable Software Development

The methodology emphasizes incremental development, careful planning, and active collaboration between the developer and the LLM. It’s designed to leverage LLMs for localized behaviors while compensating for their struggles with large-scale planning.

The process involves three key steps:

1. Define the Scenarios: Developers start by outlining the program’s overall goals and intended behaviors, breaking them down into individual, granular scenarios. The LLM can assist in identifying any missing responsibilities, helping to establish a clear, high-level structure for the SBP program.
2. Provide Background Knowledge to the LLM: Before implementation, the LLM is equipped with relevant background information. This includes an overview of Scenario-Based Programming (which is not widely represented in online resources) and domain-specific knowledge related to the application.
3. Incremental Scenario-Thread Development and Refinement: The LLM generates one scenario object at a time based on clear descriptions of required behavior. Developers manually inspect the code for logic and correctness. If flaws are found, targeted feedback is provided to the LLM for iterative refinement. The LLM acts as a coding partner, generating ideas, validating design decisions, and drafting code, while the developer guides the process and ensures correctness through testing and verification.

Case Study: Building a Connect4 Agent

To evaluate their methodology, the researchers applied it to a complex case study: developing an agent to play Connect4. The goals were to create an agent that understood the game rules and then to implement various strategies to improve its gameplay.

The process began by defining core game scenarios like board representation, player roles, turn alternation, winning conditions, and draws. After providing the LLM with SBP background, the team iteratively queried the LLM to produce scenario objects for each rule. For instance, the LLM initially misunderstood the `user_player` scenario’s role, but with clarification and targeted feedback, it successfully generated the correct code. This demonstrated how the methodology fosters productive collaboration, with the developer defining precise responsibilities and the LLM generating the implementation.

Next, the team tackled strategy implementation. The LLM helped generate strategies, starting with simple ones like prioritizing the middle column and blocking immediate threats, and progressing to more complex ones like fork prevention. Formal verification tools were used periodically to identify counterexamples where the opponent could still win. These insights were fed back to the LLM, which then helped diagnose problems and implement solutions, such as win-detection scenarios.
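The simpler strategies mentioned above (take an immediate win, block an immediate threat, otherwise prefer the middle column) can be sketched in plain Python. This is an illustrative stand-in, not the authors' scenario code: the board encoding (row 0 at the bottom, `0` for empty) and the function names are assumptions, and in the actual work each rule would be a separate scenario object rather than one function.

```python
from typing import List, Optional

ROWS, COLS = 6, 7
Board = List[List[int]]  # board[row][col]; row 0 is the bottom; 0 means empty


def landing_row(board: Board, col: int) -> Optional[int]:
    """Row where a disc dropped into `col` would land, or None if full."""
    for row in range(ROWS):
        if board[row][col] == 0:
            return row
    return None


def wins_at(board: Board, row: int, col: int, player: int) -> bool:
    """True if `player` placing a disc at (row, col) completes four in a row."""
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1  # the disc being placed
        for sign in (1, -1):
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 4:
            return True
    return False


def winning_columns(board: Board, player: int) -> List[int]:
    """Columns where `player` would win immediately by moving there."""
    cols = []
    for col in range(COLS):
        row = landing_row(board, col)
        if row is not None and wins_at(board, row, col, player):
            cols.append(col)
    return cols


def choose_move(board: Board, me: int, opponent: int) -> Optional[int]:
    """Win now if possible, else block an immediate threat, else go central."""
    wins = winning_columns(board, me)
    if wins:
        return wins[0]
    threats = winning_columns(board, opponent)
    if threats:
        return threats[0]
    for col in sorted(range(COLS), key=lambda c: abs(c - COLS // 2)):
        if landing_row(board, col) is not None:
            return col
    return None  # board is full


board = [[0] * COLS for _ in range(ROWS)]
for r in range(3):
    board[r][6] = 2  # opponent has three stacked in the last column
print(choose_move(board, me=1, opponent=2))
```

Rules of this kind only look one move ahead, which is why the counterexamples found by verification pushed the team toward deeper strategies such as fork prevention.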

Extending SBP for Greater Control

During the case study, two challenges emerged that required extending the standard SBP semantics. First, some strategies would block a move because it could lead to a later loss, even when that same move would secure an immediate win. Under standard SBP semantics, blocking always takes precedence over requests, so the winning move could not be triggered. Second, there was a need to request events with different priorities within a single synchronization point, especially for complex strategies like managing intersection forks.

The solution involved extending SBP to allow scenario objects to assign event-specific priorities to requested and blocked events. This introduced the concept of “soft blocking,” where a blocked event could still be triggered if requested with a higher priority. This enhancement, suggested by the LLM itself, provides greater control over event handling and allows for more nuanced strategic decisions.
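One way to picture soft blocking is as a change to the event-selection rule. The sketch below is an assumption-laden illustration, not the extension as implemented in the paper: the `select_event` function, the numeric priority scale, the event names, and the rule that a request must strictly exceed the strongest block on the same event are all hypothetical choices.

```python
from typing import Dict, List, Optional, Tuple

Declaration = Tuple[str, int]  # (event name, priority); higher number wins


def select_event(requests: List[Declaration],
                 blocks: List[Declaration]) -> Optional[str]:
    """Pick the highest-priority request that is not out-ranked by a block.

    Assumption: a block on an event is overridden only by a request whose
    priority is strictly higher than the strongest block on that event.
    """
    strongest_block: Dict[str, int] = {}
    for event, priority in blocks:
        strongest_block[event] = max(priority, strongest_block.get(event, priority))

    best: Optional[Declaration] = None
    for event, priority in requests:
        if event in strongest_block and priority <= strongest_block[event]:
            continue  # still blocked at this priority
        if best is None or priority > best[1]:
            best = (event, priority)
    return best[0] if best else None


# A defensive strategy blocks column 3, a win-detection scenario requests it
# with a higher priority, and a center-preference scenario requests column 4.
requests = [("drop_col_4", 1), ("drop_col_3", 10)]
blocks = [("drop_col_3", 5)]
print(select_event(requests, blocks))
```

With hard blocking, the same declarations would yield `drop_col_4`; the priority on the block is what lets the immediate win through.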

Evaluation: A Winning Agent and Improved Development

The evaluation focused on the agent’s robustness, capabilities, and the developer experience:

Agent Performance: The Connect4 agent was pitted against three strong, AI-based opponents available online: the “Pro Player,” “Unbeatable AI,” and an AI player by Keith Galli. Playing as the first player (yellow), the agent won every single match against all opponents, demonstrating a reliable and successful strategy.
Formal Guarantees: Due to the large state space of Connect4, full verification was not possible. However, using BPpy’s model checker, the researchers successfully verified that their agent was guaranteed to win in six different fixed opening sequences, regardless of the opponent’s subsequent moves. This provided valuable partial correctness guarantees.
Developer Experience: Initially, developers faced frustration with LLM hallucinations, especially given the limited SBP examples. However, providing clear, iterative feedback allowed the model to improve rapidly. The LLM proved valuable for generating creative solutions and accelerating development. The modularity of SBP was crucial for isolating logic and supporting verification, leading to a more structured and manageable workflow. While there was a learning curve, the methodology ultimately proved convenient and effective for transforming behavioral descriptions into robust code.
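The kind of guarantee described under Formal Guarantees (the agent wins from a fixed starting position whatever the opponent does) amounts to exhaustively exploring every opponent reply against a fixed agent policy. The sketch below shows that idea on a deliberately tiny stand-in game, not on Connect4 and not with BPpy's model checker: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The game, the policy, and all names are assumptions for illustration.

```python
def agent_move(pile: int) -> int:
    """Fixed agent policy: leave the opponent a multiple of four if possible."""
    return pile % 4 or 1


def agent_always_wins(pile: int, agent_to_move: bool = True) -> bool:
    """Check every opponent choice; True only if the agent wins on all paths."""
    if pile == 0:
        # The player who just moved took the last stone and won.
        return not agent_to_move
    if agent_to_move:
        # The agent is deterministic, so only its chosen move is explored.
        return agent_always_wins(pile - agent_move(pile), False)
    # The opponent is unconstrained, so every legal reply must be explored.
    return all(agent_always_wins(pile - take, True)
               for take in (1, 2, 3) if take <= pile)


# "Fixed openings" here are just starting pile sizes.
for start in (5, 6, 7, 8):
    print(start, agent_always_wins(start))
```

A failed check corresponds to a counterexample: a concrete sequence of opponent moves that beats the agent, which is the kind of feedback the researchers passed back to the LLM. The same exhaustive search is what becomes infeasible on the full Connect4 state space.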

This research demonstrates that combining LLM-driven code generation with scenario-based modeling can lead to the development of robust, maintainable, and high-performing systems. The methodology offers a controlled, reliable, and scalable approach to LLM-assisted software development.

Future Directions

Future research will explore more scalable model checking techniques for SBP, apply the methodology to other domains like robotics, and develop LLM agents specifically fine-tuned for scenario-based reasoning to further enhance coherence, safety, and reusability in scenario-driven development.
