TLDR: A new research paper explores how Large Language Model (LLM) agents can significantly improve predictive maintenance (PdM) by automating the cleaning of noisy maintenance logs. The study introduces a synthetic data generation framework and benchmarks various LLMs on their ability to detect and correct common data errors in a real-time, stream-based manner. While LLM agents excel at generic cleaning tasks, they face challenges with domain-specific inconsistencies like temporal misalignments. The research highlights a cost-quality trade-off among models and outlines future enhancements to address current limitations, aiming for more robust and autonomous data curation in industrial settings.
Predictive maintenance (PdM) is a crucial strategy in industries like automotive, aiming to foresee equipment failures before they happen. This proactive approach helps reduce downtime, cut costs, and improve operational efficiency. However, a significant hurdle in implementing effective PdM has always been the quality of maintenance logs. These logs, which are vital for training machine learning models, are often riddled with errors, making them unreliable for accurate predictions.
Traditional methods of cleaning these logs are typically manual, time-consuming, and prone to human error. They often involve batch processing, where data is collected over long periods and then cleaned retrospectively. This process is inefficient and can leave residual noise, ultimately degrading the performance of predictive models.
A New Approach with LLM Agents
A recent research paper, titled “Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance,” explores a novel solution: leveraging Large Language Model (LLM)-based agents to automate and enhance the data cleaning process. Authored by Valeriu Dimidov, Faisal Hawlader, Sasan Jafarnejad, and Raphaël Frank from the University of Luxembourg, this study investigates how LLM agents can transform maintenance log cleaning from a batch-oriented task to a real-time, stream-based correction system. You can find the full paper here: Cleaning Maintenance Logs with LLM Agents.
Understanding the Data Challenges
The researchers highlight several common types of noise found in real-world maintenance logs, which can severely impact PdM models:
- Vehicle identifier misalignment: Incorrect vehicle IDs, like a device name being used instead of a license plate.
- Out-of-fleet vehicles: Records referencing vehicles not part of the monitored fleet.
- Invalid values: Typos or non-standard entries in categorical fields (e.g., ‘Brake Sysem’ instead of ‘Brake System’).
- Missing values: Critical fields left empty.
- Digital system test entries: Records related to system testing rather than actual vehicle maintenance.
- Wrong end dates: Maintenance end dates that conflict with vehicle usage patterns.
The Agentic Framework
To evaluate LLM agents, the researchers developed a synthetic data generation framework called AgenticPdmDataCleaner. This framework creates realistic, noisy maintenance logs, overcoming the challenges of data scarcity and privacy in the automotive sector. It simulates a fleet monitoring and maintenance process, including a fleet registry, sensor data, a service operations catalog, and the maintenance log itself.
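To give a feel for what such a framework produces, here is a minimal sketch of synthetic noise injection into a clean log record. The record schema and field names are illustrative assumptions, not the paper's actual data model; the noise types mirror the taxonomy listed above.

```python
import random

# Illustrative record schema; the paper's actual fields may differ.
CLEAN_RECORD = {
    "vehicle_id": "LUX-1234",
    "service": "Brake System",
    "end_date": "2024-03-15",
}

def inject_noise(record, noise_type):
    """Return a noisy copy of a clean maintenance record."""
    noisy = dict(record)
    if noise_type == "invalid_value":
        noisy["service"] = "Brake Sysem"      # typo in a categorical field
    elif noise_type == "missing_value":
        noisy["service"] = ""                 # critical field left empty
    elif noise_type == "id_misalignment":
        noisy["vehicle_id"] = "device-42"     # device name instead of a plate
    elif noise_type == "out_of_fleet":
        noisy["vehicle_id"] = "XYZ-9999"      # vehicle not in the fleet registry
    return noisy

noisy = inject_noise(CLEAN_RECORD, "invalid_value")
```

Generating noise programmatically like this gives a known ground truth for every record, which is what makes later detection and correction rates measurable.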
The LLM agents are equipped with two main interfaces:
- Database Tools: These allow agents to query enterprise data sources like the fleet registry, service catalog, and sensor data (e.g., odometer readings) to validate information.
- Log Cleaning API: This API enables the agent to perform actions on a maintenance record: accept (if the record is clean), reject (if it is irreparable or out-of-scope), or update (to apply a single-field correction).
The core task for the agents is a three-class classification problem for each log entry, where they must decide the appropriate action based on the record’s content and external data sources. Importantly, the agents operate in a zero-shot setting, meaning they are guided only by a system prompt and task instructions, without prior examples, forcing them to generalize to unseen noise patterns.
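The three-way decision can be sketched as follows. This is a toy rule-based stand-in for the LLM agent's reasoning step, not the paper's implementation: the record fields, registry, and fuzzy-matching heuristic are all illustrative assumptions.

```python
import difflib
from dataclasses import dataclass
from typing import Optional

# Hypothetical action type mirroring the three-class decision:
# accept, reject, or update with a single-field correction.
@dataclass
class CleaningAction:
    action: str                      # "accept" | "reject" | "update"
    field: Optional[str] = None      # only set for "update"
    new_value: Optional[str] = None  # only set for "update"

def decide(record, fleet_registry, service_catalog):
    """Toy rule-based stand-in for the agent's decision step."""
    if record["vehicle_id"] not in fleet_registry:
        return CleaningAction("reject")  # out-of-fleet: out-of-scope
    if record["service"] not in service_catalog:
        # Illustrative correction: fuzzy-match the typo to a known service.
        match = difflib.get_close_matches(
            record["service"], list(service_catalog), n=1, cutoff=0.8
        )
        if match:
            return CleaningAction("update", "service", match[0])
        return CleaningAction("reject")  # no plausible repair
    return CleaningAction("accept")

registry = {"LUX-1234"}
catalog = {"Brake System", "Oil Change"}
action = decide({"vehicle_id": "LUX-1234", "service": "Brake Sysem"},
                registry, catalog)
```

An LLM agent replaces the hand-written rules with zero-shot reasoning, but the action space it must emit is the same three-way choice.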
Performance and Insights
The study benchmarked six production-grade LLMs, ranging from small (Nemotron-Nano-9B-v2) to large (Gpt-5), evaluating their Error Detection Rate (EDR) and Error Correction Rate (ECR). Key findings include:
- Generic Noise Handling: All models performed well on noise-free records and generative noise types like ‘digital system test’ entries, achieving high EDRs. Larger models, such as Gpt-5 and Gpt-Oss-120B, also showed strong capabilities in correcting ‘invalid values’ and ‘missing values’.
- Domain-Specific Challenges: A significant limitation was the agents’ struggle with domain-specific noise patterns, particularly ‘wrong end dates’ and ‘vehicle identifier misalignments’. These require deeper temporal consistency checks and cross-table reasoning, which current LLMs found challenging.
- Cost-Quality Trade-off: Gpt-5 delivered the best overall performance but was the most expensive and slowest. Gpt-Oss-120B offered a favorable balance of price and performance, making it an attractive option for many industrial applications. Even smaller models like Nemotron demonstrated basic cleaning capabilities.
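With synthetic ground truth available for every record, detection and correction rates are straightforward to compute. The sketch below shows one plausible definition, with EDR as the fraction of noisy records the agent flagged and ECR as the fraction it restored to ground truth; the paper's exact formulas may differ.

```python
def detection_and_correction_rates(results):
    """Compute plausible EDR/ECR over evaluation results.

    results: list of (was_noisy, was_flagged, was_corrected) booleans,
    one tuple per log record. EDR = flagged noisy / all noisy;
    ECR = correctly repaired noisy / all noisy. Illustrative
    definitions, not necessarily the paper's exact metrics.
    """
    noisy = [r for r in results if r[0]]
    if not noisy:
        return 0.0, 0.0
    edr = sum(1 for r in noisy if r[1]) / len(noisy)
    ecr = sum(1 for r in noisy if r[2]) / len(noisy)
    return edr, ecr

edr, ecr = detection_and_correction_rates([
    (True, True, True),    # noisy, flagged, and repaired
    (True, True, False),   # noisy, flagged, but repair was wrong
    (True, False, False),  # noisy but missed entirely
    (False, False, False), # clean record, correctly left alone
])
```

The gap between EDR and ECR is what separates merely spotting a bad record from actually fixing it, which is where the larger models pulled ahead in the benchmark.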
Future Directions
The research highlights the promising potential of LLM agents for autonomous and context-aware data curation in PdM. However, it also points to areas for improvement. Future work includes expanding the noise taxonomy to cover more complex errors, integrating temporal-logic validators into the agent’s toolset, adopting hybrid rule-LLM architectures, and fine-tuning models on domain-specific data to enhance their understanding of industrial contexts. Ultimately, evaluating these agents with authentic, anonymized maintenance logs from industrial partners will be crucial to validate their real-world applicability.
This study marks a significant step towards more efficient and reliable data cleaning in predictive maintenance, paving the way for LLM-powered solutions to overcome long-standing data quality challenges in the automotive sector and beyond.


