Enhancing 5G Network Troubleshooting with Advanced Language Models

TLDR: A new framework uses reasoning Large Language Models (LLMs) for Root Cause Analysis (RCA) in 5G networks. It introduces TeleLogs, a new dataset, and a two-stage training method (supervised fine-tuning followed by reinforcement learning). This approach significantly improves LLM accuracy and reasoning for network troubleshooting, even enabling smaller models to outperform larger, state-of-the-art LLMs, and shows strong generalization.

Modern 5G mobile networks are incredibly complex systems that need to maintain high performance and reliability. However, like any intricate system, faults inevitably occur, ranging from hardware issues to software misconfigurations. While detecting these faults is important, truly resolving them requires understanding the underlying causes. This process is known as Root Cause Analysis (RCA), a crucial part of managing network operations.

Traditionally, RCA has relied on expert knowledge to create logical frameworks that map symptoms to potential root causes. However, manually encoding these rules becomes increasingly difficult as mobile networks grow in size and complexity. Machine learning techniques have been applied to automate RCA, but they often face limitations in scalability, interpretability, and generalization to new problems, especially when dealing with many symptoms or large amounts of data.

The Role of Large Language Models (LLMs)

Recent advancements in Large Language Models (LLMs) have opened new possibilities for designing more advanced RCA models. LLMs are good at processing unstructured data, combining different pieces of knowledge, and generating explanations that humans can read. This makes them well-suited for network troubleshooting tasks. However, LLMs also have limitations; their outputs, while rich in context, often lack the formal precision and consistency needed for critical decision-making, which rule-based systems typically provide.

To overcome these challenges, researchers are proposing to use ‘reasoning LLMs’ – models specifically fine-tuned for structured, multi-step reasoning. These models can produce coherent diagnostic explanations that combine learned patterns with specific rules from the domain, making them both more understandable and practical for use.

A Novel Approach to RCA in 5G Networks

A new research paper, titled “Reasoning Language Models for Root Cause Analysis in 5G Wireless Networks” by Mohamed Sana, Nicola Piovesan, Antonio De Domenico, Yibin Kang, Haozhe Zhang, Merouane Debbah, and Fadhel Ayed, introduces a lightweight framework that uses LLMs for RCA. The paper highlights how domain knowledge can be systematically integrated into the LLM’s reasoning process to improve both the accuracy and interpretability of fault diagnosis.

A key contribution of this work is the introduction of TeleLogs, a specially designed dataset of annotated troubleshooting problems. This dataset is built by simulating a network drive testing environment based on real network engineering parameters. It provides full visibility into network configuration and user-plane performance, allowing for detailed analysis of fault scenarios. TeleLogs includes network engineering parameters (like gNodeB ID, cell ID, antenna configurations) and user-plane data (like downlink throughput, signal strength, and mobility context). The diagnostic scenarios in TeleLogs focus on a specific symptom: a significant drop in downlink throughput below a certain threshold (e.g., 600 Mbps). The dataset comprises eight possible root causes, ranging from excessive vehicle speed affecting link quality to misconfigured antenna angles or interference from neighboring cells.
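To make the symptom definition concrete, here is a minimal Python sketch of how a drive-test measurement and the throughput check could be represented. The class name, field names, and units are hypothetical and do not reflect the actual TeleLogs schema; only the 600 Mbps example threshold comes from the description above.

```python
from dataclasses import dataclass

# Example threshold quoted in the article; the real dataset may use other values.
THROUGHPUT_THRESHOLD_MBPS = 600.0


@dataclass
class DriveTestSample:
    """One user-plane measurement. All field names are hypothetical."""
    gnodeb_id: int
    cell_id: int
    downlink_throughput_mbps: float
    rsrp_dbm: float            # signal strength
    vehicle_speed_kmh: float   # mobility context


def has_symptom(sample: DriveTestSample,
                threshold: float = THROUGHPUT_THRESHOLD_MBPS) -> bool:
    """Return True when downlink throughput falls below the threshold."""
    return sample.downlink_throughput_mbps < threshold


def low_throughput_samples(samples, threshold=THROUGHPUT_THRESHOLD_MBPS):
    """Keep only the samples that exhibit the symptom."""
    return [s for s in samples if has_symptom(s, threshold)]


samples = [
    DriveTestSample(1, 10, 812.4, -85.0, 30.0),
    DriveTestSample(1, 10, 412.0, -101.5, 95.0),
]
flagged = low_throughput_samples(samples)
```

In this sketch only the second sample is flagged. A diagnostic model would then have to decide which of the eight root causes explains the drop.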

Two-Stage Training Methodology

To train the reasoning LLMs to solve problems in TeleLogs, the researchers propose a two-stage training methodology:

1. Supervised Fine-Tuning (SFT)

This initial stage provides the model with a strong foundation by aligning its outputs with high-quality, labeled examples. It uses a multi-agent data generation pipeline where multiple LLM-based reasoning agents independently perform RCA using different strategies, such as elimination-based or contradiction-based prompting. An ‘aggregator agent’ then synthesizes these diverse reasoning paths into a concise and structured explanation, improving interpretability and efficiency.
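The data generation pipeline can be sketched roughly as follows. This is an illustrative stand-in, not the authors' implementation: the agents are plain Python functions returning canned text where a real pipeline would call an LLM, the cause label "C3" is invented, and the majority-vote aggregation is an assumption chosen to keep the example short.

```python
from collections import Counter
from typing import Callable, NamedTuple


class Trace(NamedTuple):
    strategy: str
    reasoning: str
    answer: str


Agent = Callable[[str], Trace]


def elimination_agent(problem: str) -> Trace:
    # Placeholder for an LLM prompted to rule out candidate causes one by one.
    return Trace("elimination",
                 "Speed and antenna tilt look nominal, so interference remains.",
                 "C3")


def contradiction_agent(problem: str) -> Trace:
    # Placeholder for an LLM prompted to assume a cause and look for contradictions.
    return Trace("contradiction",
                 "Assuming no interference contradicts the measurements near neighboring cells.",
                 "C3")


def aggregate(traces: list[Trace]) -> str:
    """Hypothetical aggregator: keep traces that agree with the majority answer."""
    majority, _ = Counter(t.answer for t in traces).most_common(1)[0]
    kept = [t for t in traces if t.answer == majority]
    steps = "\n".join(f"- ({t.strategy}) {t.reasoning}" for t in kept)
    return f"Reasoning:\n{steps}\nRoot cause: {majority}"


def build_sft_example(problem: str, agents: list[Agent]) -> dict:
    """Run every agent on the problem and package one training example."""
    traces = [agent(problem) for agent in agents]
    return {"prompt": problem, "completion": aggregate(traces)}


example = build_sft_example("Throughput dropped below threshold on cell 10.",
                            [elimination_agent, contradiction_agent])
```

The resulting prompt and completion pair is the kind of structured example a supervised fine-tuning run would consume.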

2. Reinforcement Learning (RL) Fine-Tuning

After SFT, the model undergoes RL training using a method called Group Relative Policy Optimization (GRPO). This stage further refines the model’s diagnostic performance and reasoning ability. RL helps the model learn to produce better outputs by rewarding correct answers and penalizing incorrect or less optimal reasoning paths.
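The "group relative" part of GRPO refers to scoring each sampled answer against the other answers sampled for the same prompt, rather than against a learned value function. The Python sketch below shows that normalization step only. The binary correctness reward and the use of the population standard deviation are assumptions made for illustration; the paper's actual reward design may differ.

```python
import statistics


def correctness_reward(predicted: str, truth: str) -> float:
    """Assumed reward: 1 for the correct root cause, 0 otherwise."""
    return 1.0 if predicted == truth else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward by the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four answers sampled for one problem whose true cause is "C3" (invented label).
sampled_answers = ["C3", "C1", "C5", "C3"]
rewards = [correctness_reward(a, "C3") for a in sampled_answers]
advantages = group_relative_advantages(rewards)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.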

Impressive Results and Generalization

The experiments, conducted using models from the Qwen family (1.5B, 7B, and 32B parameters), showed significant performance gains. The proposed SFT+RL method consistently and substantially improved accuracy across all model sizes compared to using SFT or RL alone, or the base models. For instance, even the smallest model, Qwen2.5-RCA-1.5B, achieved an accuracy of over 80%, a seven-fold gain over the base model.

Remarkably, the fine-tuned models often outperformed state-of-the-art reasoning LLMs of equal or larger size. For example, Qwen2.5-RCA-32B achieved over 95% accuracy, far surpassing models like Qwen3-32B and DeepSeek R1 Distill-Llama-70B. Even Qwen2.5-RCA-1.5B reached over 87% accuracy, more than 2.5 times the accuracy of some leading reasoning models.

To test the robustness and generalization of the method, the LLMs were also evaluated on a randomized version of the dataset, where superficial cues were altered. The fine-tuned models continued to show strong performance, indicating that they were learning robust causal reasoning strategies rather than just memorizing patterns.
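One simple way to alter a superficial cue is to shuffle the order in which candidate root causes are presented, so that a model cannot succeed by memorizing answer positions. The Python sketch below illustrates that idea under stated assumptions: the article does not say which cues the authors randomized, and the function name and cause descriptions here are hypothetical.

```python
import random


def randomize_options(causes: list[str], correct_index: int,
                      rng: random.Random) -> tuple[list[str], int]:
    """Shuffle candidate causes and return the new list and new correct index."""
    order = list(range(len(causes)))
    rng.shuffle(order)
    shuffled = [causes[i] for i in order]
    # The correct cause moved to wherever its old index now sits in `order`.
    return shuffled, order.index(correct_index)


causes = [
    "excessive vehicle speed",
    "misconfigured antenna angle",
    "interference from neighboring cells",
]
rng = random.Random(0)  # fixed seed so the randomized set is reproducible
shuffled, new_index = randomize_options(causes, 2, rng)
```

Whatever the permutation, `shuffled[new_index]` still names the true cause, so accuracy on the randomized set remains directly comparable to accuracy on the original one.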

Conclusion

This research demonstrates that reasoning LLMs, when properly trained with a domain-specific dataset and a two-stage methodology, can serve as powerful and explainable diagnostic tools for complex systems like 5G networks. Future work will explore extending this method to handle scenarios with multiple root causes and incorporating real-world operational data.