How Large Language Models Analyze and Optimize Business Processes

TLDR: A research paper evaluates Large Language Models (LLMs), particularly ChatGPT (o3), for their ability to understand, analyze, and optimize business process models through conversational interaction. The study found that o3 excels at identifying syntactic and logical errors, reasoning deeply, and suggesting optimizations in complex process models from finance and healthcare domains. It significantly outperformed other LLMs like Claude, Grok, and Gemini. The findings suggest that LLMs can serve as valuable assistants for business process designers, making complex analysis accessible to non-experts.

In today’s fast-paced business world, efficient processes are the backbone of any successful organization. From handling customer orders to managing insurance claims, every operation relies on well-defined steps. Traditionally, designing and optimizing these processes, often using languages like Business Process Model and Notation (BPMN), has been the domain of expert designers. However, a recent research paper explores how Large Language Models (LLMs) could change this, acting as intelligent assistants for process analysis and optimization.

The paper, titled “Evaluation of LLMs for Process Model Analysis and Optimization,” by Akhil Kumar, J. Leon Zhao, and Om Dobariya, delves into the capabilities of several LLMs, with a particular focus on ChatGPT (model o3). The core idea is to see if these AI models can understand a process model presented interactively, identify errors, and reason deeply about it through natural language conversations.

The Promise of LLMs in Process Management

Large Language Models are advanced AI programs that process vast amounts of data to perform natural language tasks. They are designed to respond to user queries in a conversational style, making them ideal candidates for assisting in complex tasks like business process management. The researchers aimed to evaluate if these models could empower non-expert users to check their process models for correctness, suggest corrections, and perform various analyses independently.

A Deep Dive into LLM Capabilities

The study adopted a Design Science Research (DSR) framework to evaluate LLMs based on utility, consistency, and novelty. The evaluation workflow involved presenting a process model to an LLM and then posing interactive, conversational prompts to assess its capabilities. The LLM’s responses would then guide further tasks, such as applying fixes or performing calculations.

One of the primary case studies involved a mortgage application review process, intentionally designed with minor errors. ChatGPT (o3) was tested in a zero-shot setting, meaning it received no prior specific training for this task. The results were impressive:

Process Description: o3 accurately described the process from an image, breaking it down into main flow, approval, rejection, and notification paths.
Error Detection and Correction: It successfully identified both syntactic (BPMN notation) and logical errors. This included spotting duplicate task IDs, incorrect duration labels on gateways, misspellings, and truncated task names. Crucially, o3 also suggested precise fixes for these errors.
Redrawing Diagrams: After identifying errors, o3 was able to apply the suggested corrections and even produced a revised BPMN diagram.
Semantic Understanding and Reasoning: The model correctly calculated minimum, maximum, and average finish times for the process, providing clear reasoning for its calculations, including handling parallel sections and average time estimations.
Process Redesign: When presented with various redesign scenarios (e.g., making tasks optional, replacing tasks, doing tasks in parallel), o3 accurately calculated the time and cost impact of each scenario. It even demonstrated an understanding of the “fastest-possible redesign” by considering different process paths (acceptance vs. rejection) to find the absolute minimum time.
Logical Design Error Detection: o3 showed a deep understanding of process semantics by detecting logical errors even when the syntax was technically correct. For instance, it identified an incorrect parallel gateway being used where an exclusive choice was intended.

Comparative Performance

To generalize these findings, the researchers compared ChatGPT (o3) with Claude Opus 4, Grok 3, and Gemini 2.5 Flash using criteria like syntax error detection, logical error detection, semantic comprehension, reasoning ability, and BPMN diagramming. ChatGPT (o3) consistently outperformed the other LLMs, achieving a perfect score across all criteria. The other models struggled with various aspects, from failing to detect syntax errors to providing incorrect calculations or admitting inability to reproduce diagrams.

Handling Complexity: A Healthcare Process Example

To further stress-test the approach, a more complex healthcare process for diagnosing a suspected femoral fracture was presented to o3. This process featured nested control structures and inter-task temporal constraints. Again, o3 demonstrated remarkable capability, providing an accurate narrative, listing constraints, and even refining its time calculations when prompted to consider maximum wait times allowed by these constraints. This highlighted its ability to understand complex processes with multiple tasks, nested gateways, and intricate constraints.

The LLM’s “Thought Process”

The study also observed that o3’s reasoning processes seemed to mimic human thought. When asked complex questions, it could dissect the user’s prompt in detail, considering various angles to decipher the exact intention, much like a human analyst would. This anthropomorphic property suggests a sophisticated underlying mental model.

Also Read:

Conclusion: A Smart Assistant for All

The research concludes that LLMs like ChatGPT (o3) can serve as highly effective smart assistants and conversational partners for business process analysis. They can understand process models at syntactic, semantic, and logical levels, identify and correct errors, and perform complex calculations and redesign analyses. This capability opens the door for non-expert users to engage in sophisticated process design, analysis, and optimization, a domain previously reserved for specialists. The paper underscores that process design, analysis, and optimization are no longer solely the province of expert users, thanks to advancements in AI. For more details, you can read the full research paper here.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

How Large Language Models Analyze and Optimize Business Processes

The Promise of LLMs in Process Management

A Deep Dive into LLM Capabilities

Comparative Performance

Handling Complexity: A Healthcare Process Example

The LLM’s “Thought Process”

Conclusion: A Smart Assistant for All

Gen AI News and Updates

HKU Spearheads AI Integration in Hong Kong’s Digital Education Future

UNESCO’s 43rd General Conference Concludes with New Leadership and Landmark Ethics Frameworks for Technology

BRYGE AI Secures Silver Stevie® Award for Groundbreaking Health Tech Product for Women

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates