Bridging the Gap in Observability Data: Introducing TelecomTS

TLDR: TelecomTS is a new, large-scale, multi-modal observability dataset derived from a 5G telecommunications network. It addresses limitations of existing datasets by providing de-anonymized, heterogeneous time series data with absolute scale information, supporting tasks like anomaly detection, root-cause analysis, and multi-modal question-answering. Experiments show that current state-of-the-art models struggle with the unique, erratic nature of observability data, highlighting the need for new foundation models that leverage scale information for practical applications.

Modern enterprises rely heavily on monitoring complex systems, which generates vast amounts of time series data known as observability data. Unlike traditional time series from fields like weather or finance, observability data presents unique challenges: it is often zero-inflated, highly stochastic, and lacks clear temporal patterns. Despite its importance for maintaining system health, public observability benchmarks have been scarce because of proprietary restrictions. Existing datasets are typically anonymized and normalized. This removes crucial scale information and limits their usefulness for tasks beyond simple forecasting, such as anomaly detection, root-cause analysis, and multi-modal reasoning.

To address this gap, researchers have introduced TelecomTS, a large-scale observability dataset derived from a live 5G telecommunications network built in a lab. The dataset stands out for its heterogeneous, de-anonymized covariates with explicit scale information, which make it suitable for a wide array of downstream tasks: anomaly detection, root-cause analysis, and a novel question-answering benchmark that demands multi-modal reasoning.

What Makes TelecomTS Unique?

TelecomTS differentiates itself from prior datasets in two key ways:

First, it offers heterogeneous and de-anonymized covariates with full scale information. Built from extensive data collection on a 5G network, TelecomTS comprises over 1 million observations of Key Performance Indicators (KPIs) across all layers of the protocol stack. It captures categorical covariates from dynamic communication protocols alongside mixed data types (integers and floating-point variables with diverse ranges), reflecting the true complexity of observability data. Crucially, preserving absolute scale information enables the design of meaningful tasks grounded in operational semantics and allows for investigating the impact of normalization strategies.
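
To see why normalization can discard exactly the information TelecomTS preserves, consider the minimal sketch below. It assumes NumPy, and the throughput values are made up for illustration rather than taken from the dataset. Per-series z-normalization maps a lightly loaded link and a link carrying 1000 times the traffic to the same normalized series. After that step, any question that depends on absolute levels can no longer be answered.

```python
import numpy as np


def z_normalize(x):
    """Per-series z-normalization: subtract the mean, divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    # Constant series (std == 0) are common in zero-inflated KPIs; map them to zeros.
    if std == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / std


# Hypothetical throughput traces (illustrative values, not from TelecomTS):
# identical shape, but the second link carries 1000x more traffic.
low_load = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
high_load = low_load * 1000.0

# After normalization the two are indistinguishable: the absolute scale is gone.
print(np.allclose(z_normalize(low_load), z_normalize(high_load)))  # True
```

Z-normalization is only one common strategy. Other schemes, such as min-max scaling, also remove absolute levels in the same way.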

Second, TelecomTS supports a comprehensive suite of downstream tasks. Recognizing that observability applications extend beyond simple forecasting, the dataset incorporates a diverse set of anomalies. These include real anomalies generated via controlled jamming signals and synthetically curated rare events based on scholarly descriptions of real-world failures. This native support for anomaly detection and root-cause analysis is complemented by a question-answering (Q&A) benchmark that combines temporal reasoning with domain-specific questions tied to network observability semantics.

How Was TelecomTS Created?

The dataset was collected on a custom-built 5G wireless network in a lab environment, which avoids privacy concerns. This setup recorded 18 KPIs from both the base station and the mobile devices at 100 ms resolution. To introduce variability, the lab was divided into three zones by distance from the base station, producing different signal qualities. Data was collected with devices both static and mobile. Congestion was emulated by adding secondary devices that generated heavy traffic.

Anomalies were carefully curated, combining real jamming events with synthetically generated ones. The synthetic anomalies were designed to mimic realistic network failures. They draw on technical manuals and expert feedback to define each failure's KPI symptoms and temporal characteristics. Each synthetic anomaly is also paired with a textual troubleshooting ticket, generated with GPT-4.1 and verified by humans, that mirrors real-world incident reports.
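
The article does not describe the exact injection procedure, so what follows is only a hypothetical sketch of the general idea, assuming NumPy. It models a synthetic anomaly as a multiplicative level shift over a chosen window and returns point-wise labels alongside the modified series. The function `inject_level_shift` and its parameters are illustrative and are not part of TelecomTS.

```python
import numpy as np


def inject_level_shift(kpi, start, duration, factor):
    """Hypothetical helper: scale kpi[start:start + duration] by `factor`.

    Returns the modified copy and a boolean mask labelling the anomalous samples.
    """
    series = np.asarray(kpi, dtype=float).copy()
    if start < 0 or duration <= 0 or start + duration > len(series):
        raise ValueError("anomaly window must lie inside the series")
    labels = np.zeros(len(series), dtype=bool)
    series[start:start + duration] *= factor
    labels[start:start + duration] = True
    return series, labels


# Hypothetical KPI at 100 ms resolution: 5 s of steady throughput (50 samples).
throughput = np.full(50, 80.0)

# Inject a 2 s (20-sample) drop to 10% of the normal level, starting at 1 s.
anomalous, labels = inject_level_shift(throughput, start=10, duration=20, factor=0.1)
print(labels.sum())  # 20
```

Real failure symptoms described in technical manuals are likely richer than a single level shift. For example, they may affect several KPIs together or follow a gradual onset. This sketch shows only the labelling mechanics.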

For the question-answering task, two families of Q&A pairs were created: one focusing on qualitative and quantitative aspects of time series data (e.g., mean, variance, periodicity, trend) and another containing contextual ground truths about user behavior and network conditions (e.g., user activity, mobility state, zone, congestion status).
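
Ground truths for the first family of questions can be computed directly from the raw series. The sketch below illustrates this under our own assumptions and is not the dataset's actual generation code. It uses NumPy and measures trend as the slope of a least-squares line. It estimates periodicity as the lag with the highest sample autocorrelation, a simple heuristic that a real pipeline may replace with something more robust. `describe_series` is a hypothetical helper.

```python
import numpy as np


def describe_series(x, max_lag=None):
    """Hypothetical helper: summary statistics a time series Q&A pair might ask about."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples")
    # Trend: slope (per sample) of a least-squares line fitted to the series.
    slope = np.polyfit(np.arange(n), x, 1)[0]
    # Periodicity: lag with the highest autocorrelation of the mean-centered series.
    centered = x - x.mean()
    energy = np.dot(centered, centered)
    max_lag = min(max_lag or n // 2, n - 1)
    period = None
    if energy > 0:
        acf = [np.dot(centered[:-k], centered[k:]) / energy for k in range(1, max_lag + 1)]
        period = int(np.argmax(acf)) + 1
    return {"mean": x.mean(), "variance": x.var(), "trend_slope": slope, "period": period}


# A noiseless sine wave with a period of 10 samples (1 s at 100 ms resolution).
wave = np.sin(2 * np.pi * np.arange(100) / 10)
print(describe_series(wave)["period"])  # 10
```

Statistics like these are straightforward on smooth signals. They become much less stable on bursty, zero-inflated KPIs, which is part of what makes the benchmark difficult.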

Key Findings from Benchmarking

Benchmarking state-of-the-art time series, language, and reasoning models on TelecomTS revealed that existing approaches consistently struggle with the abrupt, noisy, and high-variance dynamics inherent in observability data. These challenges manifest as elevated false positives in anomaly detection, misdiagnosed root causes, and poor performance on time series Q&A tasks.

For instance, large language models (LLMs) and reasoning models showed a strong bias toward false positives in anomaly detection when no contextual information was provided, often misinterpreting natural fluctuations as anomalies. Even with context, precision remained low. Time series foundation models also struggled despite extensive pretraining. This suggests that their learned representations cannot reliably separate natural fluctuations from genuine anomalies. A notable exception was Mantis, a model that embeds scale information, which underscores the importance of preserving absolute scale.
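
A false-positive bias shows up directly in precision. In the minimal pure-Python sketch below, the labels are hypothetical: 5 anomalous windows out of 100. A detector that flags every window achieves perfect recall, but its precision equals the anomaly rate. This is an extreme version of the failure mode described above.

```python
def precision_recall(y_true, y_pred):
    """Point-wise precision and recall for boolean anomaly labels."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)
    fp = sum(p and not t for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 5 anomalous windows out of 100.
y_true = [True] * 5 + [False] * 95

# Detector A flags everything, mistaking every fluctuation for an anomaly.
flag_all = [True] * 100
print(precision_recall(y_true, flag_all))  # (0.05, 1.0)

# Detector B catches 4 of the 5 anomalies but also flags 16 normal windows.
detector_b = [True] * 4 + [False] + [True] * 16 + [False] * 79
print(precision_recall(y_true, detector_b))  # (0.2, 0.8)
```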

Models were better at localizing an anomaly's duration once an anomalous sample had been identified, but root-cause analysis proved challenging for LLMs. Forecasting exposed the difficulty of predicting sudden spikes and oscillatory patterns. Aggregate metrics still looked favorable, because long stable periods inflated them.
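
The inflation effect is easy to reproduce with synthetic numbers. The sketch below uses NumPy and hypothetical KPI values. It scores a forecaster that always predicts the stable level. The forecaster's overall mean absolute error looks small because almost every step is stable, yet its error on the single burst is 200 times larger.

```python
import numpy as np


def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))


# Hypothetical KPI: stable at 10 units for 200 steps (20 s at 100 ms) with one burst.
actual = np.full(200, 10.0)
actual[100] = 500.0
forecast = np.full(200, 10.0)  # a forecaster that simply predicts the stable level

burst = np.abs(actual - np.median(actual)) > 0  # mask of the non-stable samples
print(mae(actual, forecast))                # 2.45
print(mae(actual[burst], forecast[burst]))  # 490.0
```

Reporting errors separately on stable and volatile segments, as in the last line, is one way to keep such spikes from being averaged away.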

In question-answering, models performed well on smooth KPIs but struggled with erratic ones like TX_Bytes. They also fell short in effectively linking engineering concepts and contextual knowledge to underlying time series data, pointing to a critical gap in multi-modal reasoning.

The Path Forward

The introduction of TelecomTS highlights a critical disparity between the performance of current state-of-the-art models on existing benchmarks and their applicability to real-world observability scenarios. The abrupt, noisy, and irregular nature of observability data, along with the presence of categorical variables and the crucial role of scale information, are aspects that current models often fail to adequately address. This research underscores the urgent need for more robust and scale-aware time series foundation models capable of effectively operating in complex, real-world observability environments. You can read the full research paper here: TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis.

Nikhil Patel
https://blogs.edgentiq.com
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
