
Boosting Predictive Process Monitoring with Domain-Adapted LLMs

TLDR: This research adapts Large Language Models (LLMs) directly to process data for Predictive Process Monitoring (PPM) tasks such as next activity and remaining time prediction. Instead of relying on natural language reformulation or prompt engineering, the study uses parameter-efficient fine-tuning (PEFT) techniques. The resulting domain-adapted LLMs can outperform traditional recurrent neural networks and narrative-style LLM approaches, especially in multi-task settings, while converging faster and requiring less hyperparameter optimization, evidence of their ability to interpret sequential process information.

Large Language Models (LLMs) have rapidly become a cornerstone in various research fields, including Process Mining (PM). Traditionally, their application in PM has revolved around prompt engineering or transforming event logs into narrative-style datasets, leveraging the LLMs’ inherent semantic understanding. However, a recent study introduces a novel approach: directly adapting pretrained LLMs to process data without the need for natural language reformulation. This method, driven by the LLMs’ proficiency in generating token sequences—a task akin to objectives in PM—aims to unlock their full potential in this domain.

The research, titled Domain Adaptation of LLMs for Process Data, focuses on parameter-efficient fine-tuning (PEFT) techniques. This strategy is crucial for mitigating the substantial computational overhead typically associated with large models, making their adaptation more practical and accessible. The experimental setup specifically targets Predictive Process Monitoring (PPM), a branch of PM concerned with forecasting future process states and case behaviors. The study investigates both single-task and multi-task predictions, such as predicting the next activity (NA) or the remaining time (RT) in a process.
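In PPM terms, each prefix of a running case yields a pair of training targets: the activity that follows the prefix (NA) and the time until the case completes (RT). A minimal sketch of how such targets are derived, using a purely illustrative trace of (activity, timestamp) pairs:

```python
# Derive next-activity (NA) and remaining-time (RT) targets from the
# prefixes of a single case. The trace below is invented for illustration.
def ppm_targets(trace):
    """trace: list of (activity, timestamp) pairs, timestamps in hours."""
    end_time = trace[-1][1]
    samples = []
    for i in range(1, len(trace)):
        prefix = [act for act, _ in trace[:i]]
        next_activity = trace[i][0]                   # NA target
        remaining_time = end_time - trace[i - 1][1]   # RT target
        samples.append((prefix, next_activity, remaining_time))
    return samples

trace = [("Submit", 0.0), ("Review", 4.0), ("Approve", 10.0), ("Pay", 24.0)]
for prefix, na, rt in ppm_targets(trace):
    print(prefix, na, rt)
```

Each prefix thus doubles as a classification sample (NA) and a regression sample (RT), which is what makes the multi-task setup studied in the paper natural.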

The findings are compelling: the fine-tuned LLMs demonstrate a potential improvement in predictive performance compared to state-of-the-art recurrent neural network (RNN) approaches and even recent narrative-style solutions, particularly excelling in multi-task settings. Beyond performance, these adapted models exhibit faster convergence during training and require significantly less hyperparameter optimization, simplifying their deployment and maintenance.

Why Direct Adaptation?

Current methods often treat event logs as plain text, relying on the LLMs’ general language skills. This overlooks the structured, domain-specific nature of process data. Event logs are not linguistic artifacts; they adhere to a smaller alphabet of activity labels with a distinct syntax governed by behavioral relations. The authors argue that relying solely on semantic meaning, as captured by LLMs trained on natural text, is insufficient for modeling complex process behavior. Direct adaptation, through retraining embedding layers and fine-tuning specific weights, allows the LLM to learn from process data in its native format, bypassing the need for natural language conversion and evaluating its intrinsic capability to interpret sequential process information.
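Concretely, "native format" means swapping the LLM's large natural-language token vocabulary for the event log's much smaller activity alphabet and training a fresh embedding table over it. A minimal sketch, with an invented alphabet and toy dimensions (real backbones use vocabularies of ~50k tokens and embedding widths of 768 or more):

```python
import numpy as np

# Hypothetical activity alphabet from an event log -- far smaller than
# an LLM's natural-language vocabulary.
activities = ["<pad>", "<eos>", "Submit", "Review", "Approve", "Pay"]
vocab = {a: i for i, a in enumerate(activities)}

d_model = 8  # toy embedding width
rng = np.random.default_rng(0)
# Freshly initialised embedding table; under the paper's approach this is
# retrained on process data while most backbone weights stay frozen.
embedding = rng.normal(0, 0.02, size=(len(vocab), d_model))

def encode(trace):
    """Map a trace of activity labels to embedding vectors."""
    ids = [vocab[a] for a in trace]
    return embedding[ids]  # shape: (len(trace), d_model)

x = encode(["Submit", "Review", "Approve"])
print(x.shape)  # (3, 8)
```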

Furthermore, prompt engineering, while powerful, demands expert knowledge and is sensitive to both user error and model idiosyncrasies. Small changes in phrasing can lead to drastically different outcomes, highlighting the fragility of such approaches. The systematic fine-tuning approach presented in this paper offers a more robust and consistent alternative.

Methodology at a Glance

The proposed methodology employs various PEFT strategies across different LLMs, comparing their effectiveness against RNN-based and prompt-based solutions. The framework comprises four main components: input layers, backbone, output layers, and the PEFT of these components. Input layers convert raw event features into a common vector space. The backbone, typically a transformer model like GPT-2, Qwen2, or Llama3.2, transforms this representation. Output layers map the backbone’s outputs to task-specific predictions (e.g., NA or RT). PEFT involves training only a small subset of parameters, such as new input/output layers, while the main backbone is either frozen, partially frozen, or enhanced with adapter layers like Low-Rank Adaptation (LoRA).
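The four components can be sketched schematically. Everything below is an illustrative stand-in, not the paper's code: the backbone is a toy frozen function rather than a real transformer, and the two output heads show the split between NA classification and RT regression.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_activities = 8, 6  # toy sizes

# Input layer: maps raw event features into a common vector space.
W_in = rng.normal(0, 0.02, size=(n_activities, d_model))

def backbone(h):
    """Frozen stand-in for a transformer backbone (e.g. GPT-2, Qwen2,
    Llama3.2); a real one applies many self-attention blocks."""
    return np.tanh(h)

# Task-specific output layers, trained during PEFT.
W_na = rng.normal(0, 0.02, size=(d_model, n_activities))  # next activity
W_rt = rng.normal(0, 0.02, size=(d_model, 1))             # remaining time

def predict(event_ids):
    h = backbone(W_in[event_ids])          # (seq_len, d_model)
    last = h[-1]                           # representation of the last event
    na_logits = last @ W_na                # classification over activities
    rt_value = float((last @ W_rt)[0])     # regression: remaining time
    return na_logits, rt_value

na_logits, rt = predict([2, 3, 4])
print(na_logits.shape, rt)
```

In the PEFT setting, only the input/output layers (and any adapters) receive gradient updates; the backbone weights are reused as-is or only partially unfrozen.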

Key Experimental Insights

The experiments utilized five real-world event logs, including BPI12, BPI17, and three versions of BPI20, chosen for their diversity. The results highlighted several critical points:

  • Multi-task RNNs consistently underperformed, and the narrative-style solution (S-NAP) was outperformed by a clear margin across all datasets, suggesting that semantic capabilities alone are insufficient for learning complex process behavior.
  • While recurrent networks use fewer parameters and less runtime, they demand extensive hyperparameter optimization. LLMs, especially with LoRA, required less tuning and generally outperformed RNNs and S-NAP in both single- and multi-task setups.
  • Among the fine-tuned LLMs, Llama and Qwen demonstrated remarkable stability across datasets, while PM-GPT2 showed strong performance on specific datasets but lacked overall consistency.
  • LLMs and single-task RNNs converged faster than multi-task RNNs for NA prediction, with LLMs often needing fewer than five epochs. For RT prediction, LLMs consistently outperformed both single- and multi-task RNNs.
  • LoRA proved particularly effective for RT prediction, outperforming freezing configurations. This suggests that LLMs, whose pretraining objective is essentially next-token classification, benefit significantly from adapter layers when tackling regression tasks. For NA prediction, fine-tuning a few layers or using LoRA generally yielded better results than fully freezing the model.
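The LoRA idea behind that last point is simple: keep a frozen weight matrix W and learn a low-rank update BA, so the adapted layer computes Wx + (alpha/r)·BAx. Because B is initialised to zero, the adapter starts as a no-op and only gradually reshapes the frozen behavior. A minimal numpy sketch with invented shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4  # hidden size, LoRA rank, scaling factor

W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(0, 0.02, size=(r, d))  # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-init

def lora_forward(x):
    """Frozen path plus scaled low-rank update: Wx + (alpha/r) * B(Ax)."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B still all-zero, the adapter contributes nothing yet:
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (2·d·r values here) are trained, instead of the d·d entries of W, which is why LoRA keeps the tuning cost low even for large backbones.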

Future Directions

While this work marks a significant step forward, limitations remain. PEFT, though cost-effective, still involves more trainable parameters than RNNs. Future research could explore quantization techniques to further reduce model size. Additionally, adapting LLMs for other complex PM tasks like process discovery and anomaly detection, which don’t align with standard training formats, presents ongoing challenges. The study also notes that LoRA was used with default settings, implying potential for further optimization to reduce memory usage.

In conclusion, this study provides a systematic evaluation of fine-tuning methods for adapting LLMs to predictive process monitoring. By moving beyond prompt engineering and narrative reformulations, the research demonstrates that explicitly adapted LLMs can outperform traditional PPM models and narrative-style approaches in both single- and multi-task next activity and remaining time prediction, paving the way for more robust and efficient process analysis.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
