Bridging the Gap in Observability Data: Introducing TelecomTS

TLDR: TelecomTS is a new, large-scale, multi-modal observability dataset derived from a 5G telecommunications network. It addresses limitations of existing datasets by providing de-anonymized, heterogeneous time series data with absolute scale information, supporting tasks like anomaly detection, root-cause analysis, and multi-modal question-answering. Experiments show that current state-of-the-art models struggle with the unique, erratic nature of observability data, highlighting the need for new foundation models that leverage scale information for practical applications.

Modern enterprises rely heavily on monitoring complex systems, generating vast amounts of time series data known as observability data. Unlike traditional time series data from fields like weather or finance, observability data presents unique challenges: it’s often zero-inflated, highly stochastic, and lacks clear temporal patterns. Despite its critical importance for maintaining system health, public benchmarks for observability datasets have been scarce due to proprietary restrictions. Existing datasets are typically anonymized and normalized, which removes crucial scale information and limits their utility for advanced tasks beyond simple forecasting, such as anomaly detection, root-cause analysis, and multi-modal reasoning.

To address this gap, researchers have introduced TelecomTS, a large-scale observability dataset collected from a custom-built 5G telecommunications network. It features heterogeneous, de-anonymized covariates with explicit scale information, which makes it suitable for a range of downstream tasks: anomaly detection, root-cause analysis, and a new question-answering benchmark that requires multi-modal reasoning.

What Makes TelecomTS Unique?

TelecomTS differentiates itself from prior datasets in two key ways:

First, it offers heterogeneous and de-anonymized covariates with full scale information. Built from extensive data collection on a 5G network, TelecomTS comprises over 1 million observations of Key Performance Indicators (KPIs) across all layers of the protocol stack. It captures categorical covariates from dynamic communication protocols alongside mixed data types (integers and floating-point variables with diverse ranges), reflecting the true complexity of observability data. Crucially, preserving absolute scale information enables the design of meaningful tasks grounded in operational semantics and allows for investigating the impact of normalization strategies.
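To illustrate why absolute scale matters, here is a minimal sketch of how per-series z-score normalization can make two very different operating conditions look identical. The traces and the helper name `z_normalize` are hypothetical and are not taken from TelecomTS.

```python
from statistics import mean, pstdev


def z_normalize(series):
    """Per-series z-score: subtract the mean, divide by the population std."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information after centering.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


# Hypothetical throughput traces (Mbps): same shape, 100x different load.
light_load = [1.0, 2.0, 1.0, 3.0, 1.0]
heavy_load = [100.0, 200.0, 100.0, 300.0, 100.0]

norm_light = z_normalize(light_load)
norm_heavy = z_normalize(heavy_load)

# After normalization the two traces can no longer be told apart.
same_after_norm = all(abs(a - b) < 1e-9 for a, b in zip(norm_light, norm_heavy))
print("identical after normalization:", same_after_norm)
```

A model that only sees the normalized values cannot answer a question such as "is this link close to saturation?", which is the kind of operational task that preserved scale makes possible.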

Second, TelecomTS supports a comprehensive suite of downstream tasks. Recognizing that observability applications extend beyond simple forecasting, the dataset incorporates a diverse set of anomalies. These include real anomalies generated via controlled jamming signals and synthetically curated rare events based on scholarly descriptions of real-world failures. This native support for anomaly detection and root-cause analysis is complemented by a question-answering (Q&A) benchmark that combines temporal reasoning with domain-specific questions tied to network observability semantics.

How Was TelecomTS Created?

The dataset was collected on a custom-built 5G wireless network in a lab environment, which avoids the privacy concerns that restrict the release of production data. The setup recorded 18 KPIs from both the base station and mobile devices at a 100 ms resolution. To introduce variability, the lab was divided into three zones by distance from the base station, simulating different signal qualities. Data was collected with devices both static and in motion, and congestion scenarios were emulated by adding secondary devices that generated heavy traffic.

Anomalies were carefully curated, combining real-world jamming events with synthetically generated ones. The synthetic anomalies were designed to mimic realistic network failures, drawing from technical manuals and expert feedback to define KPI symptoms and their temporal characteristics. Each synthetic anomaly also comes with a generated textual troubleshooting ticket, created using GPT-4.1 and human-verified, mirroring real-world incident reports.
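As a rough illustration of what a synthetic anomaly with a defined KPI symptom and duration might look like, the sketch below scales a KPI over a fixed window and records per-sample labels. This is not the authors' generation procedure: the function `inject_drop`, the scaling factor, and the throughput values are assumptions made for the example. Only the 100 ms sample period comes from the article.

```python
SAMPLE_PERIOD_MS = 100  # sampling resolution stated in the article


def inject_drop(series, start, duration, factor):
    """Scale samples in [start, start + duration) by `factor`.

    Returns a modified copy of the series and a 0/1 label per sample.
    """
    if not 0 <= start < len(series):
        raise ValueError("start must index into the series")
    end = min(start + duration, len(series))
    modified = list(series)
    labels = [0] * len(series)
    for i in range(start, end):
        modified[i] = series[i] * factor
        labels[i] = 1
    return modified, labels


# Hypothetical 2-second throughput window (20 samples at 100 ms).
throughput = [50.0] * 20
anomalous, labels = inject_drop(throughput, start=5, duration=10, factor=0.25)

anomaly_ms = sum(labels) * SAMPLE_PERIOD_MS
print("anomaly duration (ms):", anomaly_ms)
```

Keeping the labels alongside the modified series is what allows both detection (is this window anomalous?) and localization (when does the anomaly start and end?) to be scored.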

For the question-answering task, two families of Q&A pairs were created: one focusing on qualitative and quantitative aspects of time series data (e.g., mean, variance, periodicity, trend) and another containing contextual ground truths about user behavior and network conditions (e.g., user activity, mobility state, zone, congestion status).
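The first family of questions can be pictured as statistics computed directly from a KPI window. The sketch below is one possible way to derive such pairs, assuming a least-squares slope as the trend criterion. The question wording, the `make_qa_pairs` helper, and the sample values are hypothetical and do not reproduce the benchmark's actual templates.

```python
from statistics import mean, pvariance


def trend_label(series, tolerance=1e-9):
    """Classify the trend from the least-squares slope over the sample index."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    x_bar = (n - 1) / 2
    y_bar = mean(series)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(series))
    den = sum((i - x_bar) ** 2 for i in range(n))
    slope = num / den
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "flat"


def make_qa_pairs(kpi_name, series):
    """Build question/answer pairs whose answers come from the series itself."""
    return [
        {"question": f"What is the mean of {kpi_name} in this window?",
         "answer": mean(series)},
        {"question": f"What is the variance of {kpi_name} in this window?",
         "answer": pvariance(series)},
        {"question": f"What is the overall trend of {kpi_name}?",
         "answer": trend_label(series)},
    ]


# Hypothetical window of a KPI; values are illustrative only.
qa = make_qa_pairs("TX_Bytes", [10.0, 12.0, 14.0, 16.0])
for pair in qa:
    print(pair["question"], "->", pair["answer"])
```

The second family (user activity, mobility state, zone, congestion status) cannot be derived this way, since those answers come from the recording conditions rather than from the values in the window.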

Key Findings from Benchmarking

Benchmarking state-of-the-art time series, language, and reasoning models on TelecomTS revealed that existing approaches consistently struggle with the abrupt, noisy, and high-variance dynamics inherent in observability data. These challenges manifest as elevated false positives in anomaly detection, misdiagnosed root causes, and poor performance on time series Q&A tasks.

For instance, large language models (LLMs) and reasoning models showed a strong bias toward false positives in anomaly detection when no contextual information was provided, often misinterpreting natural fluctuations as anomalies. Even with context, precision remained low. Time series foundation models, despite extensive pretraining, also struggled, indicating their learned representations are insufficient for this nuanced distinction. A notable exception was Mantis, a model that embeds scale information, demonstrating the importance of preserving absolute scale.
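A bias toward false positives shows up as high recall paired with low precision. The following sketch computes both from window-level labels. The labels are made up for illustration and are not results from the paper.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, and false positive rate from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical: 10 windows, 2 truly anomalous, detector flags 6 of them.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the detector catches every real anomaly, yet two thirds of its alerts are noise, which is the pattern the article describes when natural fluctuations are read as anomalies.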

Models were better at localizing an anomaly's duration once a sample had been identified as anomalous, but root-cause analysis remained challenging for LLMs. In forecasting, models had difficulty predicting sudden spikes and handling oscillatory patterns, and aggregate metrics masked this because long stable periods inflate overall performance.
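The effect of stable periods on aggregate forecasting metrics can be seen with a small worked example. The sketch below compares mean absolute error over a whole window with the error restricted to spike samples. The series, the flat forecast, and the spike threshold of 50.0 are all hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical KPI: mostly stable, with one sudden spike.
actual = [10.0] * 8 + [90.0, 10.0]
# A flat forecast that tracks the stable level and misses the spike.
predicted = [10.0] * 10

overall_mae = mae(actual, predicted)

# Restrict the error to samples above an assumed spike threshold.
spike_idx = [i for i, a in enumerate(actual) if a > 50.0]
spike_mae = mae([actual[i] for i in spike_idx],
                [predicted[i] for i in spike_idx])

print("overall MAE:", overall_mae)
print("MAE on spike samples:", spike_mae)
```

Here the overall error is a tenth of the error on the spike itself, so reporting spike-conditioned error next to the aggregate gives a more honest picture of a forecaster on erratic KPIs.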

In question-answering, models performed well on smooth KPIs but struggled with erratic ones like TX_Bytes. They also fell short in effectively linking engineering concepts and contextual knowledge to underlying time series data, pointing to a critical gap in multi-modal reasoning.

The Path Forward

TelecomTS exposes a gap between how current state-of-the-art models perform on existing benchmarks and how well they apply to real-world observability scenarios. Current models often fail to handle the abrupt, noisy, and irregular nature of observability data, the presence of categorical variables, and the role of scale information. The research points to the need for more robust, scale-aware time series foundation models that can operate in complex observability environments. The full research paper is titled TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis.
