TLDR: SPEAR (Soft Prompt Enhanced Anomaly Recognition) is a novel method that leverages large language models (LLMs) for time series anomaly detection. It addresses the limitations of traditional LLM applications by using learnable ‘soft prompts’ and data quantization. This approach transforms time series data into LLM-compatible embeddings, which are then combined with soft prompts and fed into a frozen LLM. The soft prompts are iteratively updated to adapt the LLM for anomaly detection without costly fine-tuning. SPEAR demonstrates superior performance over zero-shot LLMs and traditional methods like LSTM across various datasets, offering computational efficiency and improved accuracy, especially for imbalanced and variable-length time series data.
Time series data, which tracks changes over time in various fields like healthcare and internet traffic, often contains critical anomalies that signal important events. Detecting these anomalies is a crucial task, but traditional methods struggle with the diverse nature of time series sequences and anomalies that depend on context.
The rise of large language models (LLMs) has opened new avenues for tackling this challenge. However, directly applying LLMs to anomaly detection comes with its own set of hurdles. Existing approaches, such as prompt-based methods, fine-tuning, and zero-shot learning, often require meticulously crafted prompts, incur significant computational costs, or risk distorting the LLM’s pre-trained features. Furthermore, using advanced LLMs like GPT-4 for zero-shot detection can be financially prohibitive and raise privacy concerns, especially in real-world applications where computational efficiency and quick response times are paramount.
Introducing SPEAR: A Novel Approach
To address these limitations, researchers have proposed Soft Prompt Enhanced Anomaly Recognition (SPEAR), a new method that harnesses the power of LLMs for time series anomaly detection using soft prompts and quantization. SPEAR aims to adapt smaller LLMs to specific time series tasks efficiently, potentially matching or even surpassing the performance of larger, more resource-intensive models without the need for extensive training or fine-tuning.
The core idea behind SPEAR involves transforming continuous time series data into a format that LLMs can readily process. This is achieved through a process called quantization, where the data is converted into discrete tokens. These tokens are then embedded into a high-dimensional space. Crucially, SPEAR introduces ‘soft prompts’ – learnable embedding vectors that are combined with the input embeddings. These soft prompts act as guides, directing the frozen LLM’s attention towards the anomaly detection task without altering the LLM’s original weights.
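To make the quantization step concrete, here is a minimal sketch. It assumes min-max scaling followed by uniform binning; the article does not specify the exact scheme, and the function name `quantize` and the `num_bins` parameter are illustrative rather than taken from the paper.

```python
def quantize(series, num_bins=10):
    """Min-max scale a series to [0, 1], then map each value to a bin index."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        # A constant series carries no range information; use a single token.
        return [0] * len(series)
    tokens = []
    for x in series:
        scaled = (x - lo) / span
        # The maximum value scales to exactly 1.0, so clamp it into the last bin.
        tokens.append(min(int(scaled * num_bins), num_bins - 1))
    return tokens


readings = [0.0, 2.5, 5.0, 7.5, 10.0]
tokens = quantize(readings, num_bins=4)
print(tokens)  # [0, 1, 2, 3, 3]
```

Each integer then acts like a word ID: it can be looked up in an embedding table in the same way a text token would be.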
During training, the soft prompts are iteratively updated to minimize a cross-entropy loss, allowing the model to adapt specifically to identifying anomalies. Quantization, in turn, ensures that LLMs, which are inherently designed to handle discrete sequences, can process time series data at all. The use of soft prompts offers several advantages: it eliminates the need for complex chat templates, reduces computational costs, and enhances privacy by enabling smaller LLMs to be deployed locally.
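For readers who prefer notation, the training objective can be written roughly as follows. This is a sketch under assumed notation, not the paper's own formulation: P is the matrix of soft prompt vectors, E(x_i) the embeddings of the i-th quantized sequence, f_θ the frozen LLM together with its classification head, y_i ∈ {0, 1} the anomaly label, and η a learning rate.

```latex
\mathcal{L}(P) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \Big],
\qquad
\hat{p}_i = f_{\theta}\big([P;\, E(x_i)]\big),
\qquad
P \leftarrow P - \eta \, \nabla_{P} \mathcal{L}(P)
```

The key point is that the gradient is taken with respect to P only, so the LLM weights θ stay exactly as pre-trained.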
How SPEAR Works
The SPEAR framework operates in several key steps:
- Preprocessing and Quantization: Raw time series data is scaled and then quantized, converting continuous values into a sequence of discrete tokens.
- Tokenization and Embedding: These quantized tokens are mapped to high-dimensional embedding vectors.
- Soft Prompt Integration: Learnable soft prompt embeddings are initialized and then concatenated with the input embeddings, forming a combined sequence.
- Frozen LLM Processing: This combined embedding is fed into a pre-trained, frozen LLM (such as BERT or Gemma).
- Anomaly Classification: A small classification head is added to the LLM’s output to perform binary classification, identifying data points as either normal or anomalous.
- Soft Prompt Optimization: Only the soft prompt embeddings are updated during training using backpropagation, based on the difference between predicted and true anomaly labels.
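The steps above can be sketched end to end with a deliberately tiny toy. Everything here is an assumption made for illustration: the "frozen LLM" is replaced by mean pooling plus a fixed linear scorer, the embedding table is generated by a simple formula, and names such as `soft_prompt` and `train_step` are hypothetical. The one thing it does share with the method described is that only the soft prompt vectors are ever updated.

```python
import math

DIM, VOCAB, PROMPT_LEN = 4, 8, 2

# Frozen parts (never updated): token embedding table and a stand-in scorer.
embedding = [[math.sin(t + d) for d in range(DIM)] for t in range(VOCAB)]
frozen_w = [0.5, -0.3, 0.8, 0.1]

# Learnable part: the soft prompt vectors, initialized to zero.
soft_prompt = [[0.0] * DIM for _ in range(PROMPT_LEN)]


def forward(tokens, prompt):
    """Concatenate prompt and token embeddings, mean-pool, apply frozen scorer."""
    seq = prompt + [embedding[t] for t in tokens]
    pooled = [sum(vec[d] for vec in seq) / len(seq) for d in range(DIM)]
    logit = sum(w * h for w, h in zip(frozen_w, pooled))
    return 1.0 / (1.0 + math.exp(-logit)), len(seq)


def loss(tokens, label, prompt):
    """Binary cross-entropy between the predicted probability and the label."""
    p, _ = forward(tokens, prompt)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def train_step(tokens, label, prompt, lr=0.5):
    """One gradient step that touches the soft prompt and nothing else."""
    p, seq_len = forward(tokens, prompt)
    # d(loss)/d(logit) = p - label, and each prompt entry contributes
    # frozen_w[d] / seq_len to the logit through the mean pooling.
    for vec in prompt:
        for d in range(DIM):
            vec[d] -= lr * (p - label) * frozen_w[d] / seq_len


tokens, label = [0, 1, 2, 7, 2], 1  # one quantized sequence labeled anomalous
before = loss(tokens, label, soft_prompt)
for _ in range(100):
    train_step(tokens, label, soft_prompt)
after = loss(tokens, label, soft_prompt)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

A real setup would swap the stand-in scorer for a pre-trained model such as BERT with its parameters set to not require gradients, but the division of labor is the same.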
The paper also details sophisticated data preprocessing techniques to handle common challenges in time series anomaly detection, such as class imbalance (using T-SMOTE) and variable sequence lengths. For datasets like MIMIC-IV, additional context-based anomalies (e.g., monotonic trends, sudden spikes, shifts, and volatility changes) were introduced to enrich the data and make it more representative of real-world scenarios.
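As a rough illustration of what injecting context-based anomalies can look like, the sketch below adds a sudden spike and a level shift to a clean series and produces point-level labels. The function names, the magnitudes, and the choice to label every shifted point as anomalous are assumptions for this example; the paper's actual injection procedure may differ.

```python
def inject_spike(series, index, magnitude):
    """Add a sudden spike at one position; only that point is labeled anomalous."""
    out = list(series)
    out[index] += magnitude
    labels = [0] * len(series)
    labels[index] = 1
    return out, labels


def inject_shift(series, start, offset):
    """Shift the level of every point from `start` onward and label those points."""
    out = [x + offset if i >= start else x for i, x in enumerate(series)]
    labels = [1 if i >= start else 0 for i in range(len(series))]
    return out, labels


base = [1.0] * 6
spiked, spike_labels = inject_spike(base, index=2, magnitude=5.0)
shifted, shift_labels = inject_shift(base, start=4, offset=2.0)
print(spiked, spike_labels)    # [1.0, 1.0, 6.0, 1.0, 1.0, 1.0] [0, 0, 1, 0, 0, 0]
print(shifted, shift_labels)   # [1.0, 1.0, 1.0, 1.0, 3.0, 3.0] [0, 0, 0, 0, 1, 1]
```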
Experimental Results and Impact
SPEAR’s effectiveness was evaluated across three diverse datasets: MIMIC-IV (medical lab results), NASA’s satellite telemetry, and the Numenta Anomaly Benchmark (NAB). The results demonstrated that SPEAR, particularly when using BERT as the base LLM (SPEAR-BERT), consistently outperformed zero-shot LLM approaches and even traditional models like LSTM.
For instance, on the MIMIC-IV dataset, SPEAR-BERT achieved significantly higher accuracy compared to zero-shot Gemma, GPT-4, and BERT, and also surpassed LSTM. This is particularly notable for MIMIC-IV, which features irregular, variable-length sequences from medical records, a scenario where LLMs’ flexibility shines. While zero-shot models sometimes showed high recall, they often suffered from very low precision, leading to an abundance of false positives – a critical issue in applications like spacecraft telemetry where anomalies signify serious problems.
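The high-recall, low-precision pattern is easy to reproduce with made-up numbers. The sketch below uses a hypothetical dataset with 5 anomalies in 100 points and a detector that flags everything; none of these figures come from the paper.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1] * 5 + [0] * 95   # 5% of points are anomalous
flag_all = [1] * 100          # a detector that raises an alarm on every point
precision, recall = precision_recall(y_true, flag_all)
print(precision, recall)      # 0.05 1.0
```

Recall is perfect because no anomaly is missed, yet 95 of every 100 alarms are false.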
SPEAR-BERT also showed a more balanced performance across highly imbalanced datasets like NASA and NAB, excelling in metrics like AUROC and AUPR, which are crucial for accurately identifying minority classes (anomalies). The research highlights that soft prompts effectively guide LLMs to better detect time series anomalies, achieving a superior balance between recall and precision.
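For context, AUROC can be read as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point. The sketch below computes it by direct pairwise comparison on invented scores; it is meant to illustrate the metric, not to reproduce the paper's evaluation code.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (anomaly, normal) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3]
result = auroc(y_true, scores)
print(round(result, 4))  # 0.6667
```

Because it depends only on how scores are ranked, the metric does not reward a detector for simply labeling everything as the majority class.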
Furthermore, SPEAR demonstrated remarkable computational efficiency. The soft prompt models (0.06 MB for BERT, 0.16 MB for Gemma) were substantially smaller than a traditional LSTM model (0.29 MB), adding minimal overhead to existing pre-trained LLMs. This compact design allows for adaptability across different LLM sizes and eases the burden on users to craft detailed prompts.
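A back-of-the-envelope calculation shows why soft prompts are so small: they are just a handful of vectors, each as wide as the model's hidden size. The sketch below assumes 20 prompt tokens stored as 32-bit floats, with hidden sizes of 768 (BERT-base) and 2048 (Gemma 2B). The prompt length, the precision, and the exact model variants are assumptions made for this illustration, not details given in the article.

```python
def soft_prompt_megabytes(num_prompt_tokens, hidden_size, bytes_per_value=4):
    """Storage for a soft prompt: tokens x hidden size x bytes per value, in MB."""
    return num_prompt_tokens * hidden_size * bytes_per_value / 1_000_000


bert_mb = soft_prompt_megabytes(20, 768)     # 61,440 bytes
gemma_mb = soft_prompt_megabytes(20, 2048)   # 163,840 bytes
print(round(bert_mb, 2), round(gemma_mb, 2))  # 0.06 0.16
```

Under these assumptions the result lands at about 0.06 MB for a 768-wide model and matches the 0.16 MB quoted for Gemma, which suggests the reported sizes cover the prompt vectors alone and not the frozen LLM.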
In conclusion, SPEAR represents a significant step forward in leveraging LLMs for time series anomaly detection. By combining soft prompts with quantization, it offers an efficient, adaptable, and high-performing solution that can make LLM-based anomaly detection more accessible and practical for a wide range of real-world applications. For more details, see the full research paper.


