TLDR: CTBench is the first comprehensive benchmark for evaluating Time Series Generation (TSG) models in cryptocurrency markets. It features a curated dataset of 452 tokens, a dual-task evaluation framework (Predictive Utility and Statistical Arbitrage), and a suite of financial metrics. The benchmark evaluates eight TSG models across different market regimes, revealing trade-offs between statistical fidelity and real-world profitability, and offers practical recommendations for model selection.
Synthetic time series data is becoming an increasingly vital tool in quantitative finance. It helps augment data, stress-test financial models, and prototype new algorithms. However, the unique characteristics of cryptocurrency markets (24/7 trading, extreme volatility, and rapid shifts in market conditions) pose significant challenges that traditional time series generation (TSG) methods and benchmarks often fail to address.
Most existing research in this area either focuses on non-financial or traditional financial domains, narrows its scope to classification and forecasting without considering crypto-specific complexities, or lacks crucial financial evaluations, especially for real-world trading applications.
Introducing CTBench: A New Standard for Crypto Time Series Generation
To fill these critical gaps, researchers have introduced CTBench, the first comprehensive benchmark specifically designed for evaluating Time Series Generation models in the cryptocurrency domain. CTBench aims to provide a robust and realistic framework for assessing how well synthetic data can replicate and support real-world crypto market dynamics.
CTBench is built upon a meticulously curated, open-source dataset comprising data from 452 different cryptocurrency tokens. This dataset covers market activity from January 2020 to December 2024, capturing various market regimes including bull runs, crashes, and consolidation phases. The data is preprocessed to ensure high quality and includes essential financial features commonly used in quantitative trading, such as Alpha101 factors and technical indicators.
Dual-Task Evaluation Framework
A key innovation of CTBench is its dual-task evaluation framework, which assesses TSG models from two complementary perspectives:
- Predictive Utility Task: This task measures how effectively synthetic data can train forecasting models that perform well on actual market data. It evaluates whether the synthetic data preserves the temporal and cross-sectional patterns necessary for accurate predictions. Essentially, it asks: can synthetic data help us predict real market movements?
- Statistical Arbitrage Task: This task focuses on whether the reconstructed time series can support mean-reverting signals for trading. It assesses a model’s ability to isolate tradable residual signals from market dynamics. This means evaluating if the synthetic data can reveal profitable trading opportunities based on assets reverting to their historical averages.
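The two tasks can be illustrated with a toy sketch. This is a minimal illustration under stated assumptions, not CTBench's actual pipeline: the forecaster is a one-lag least-squares fit, the residual is assumed to be the real series minus a reconstruction supplied by the caller, and the function names (`fit_ar1`, `tstr_mse`, `residual_zscores`) are hypothetical.

```python
from statistics import mean, pstdev


def fit_ar1(series):
    """Least-squares fit of x[t+1] ~ a * x[t] + b on a single series."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / var_x
    return a, my - a * mx


def tstr_mse(synthetic, real):
    """Predictive utility: train on synthetic data, score on real data."""
    a, b = fit_ar1(synthetic)
    errors = [(a * xt + b - xn) ** 2 for xt, xn in zip(real[:-1], real[1:])]
    return mean(errors)


def residual_zscores(real, reconstructed):
    """Statistical arbitrage: standardize the residual real - reconstructed.

    Assumption: a large positive z-score marks a price above its
    reconstruction (a candidate short under mean reversion), and a large
    negative z-score marks a candidate long.
    """
    resid = [r - h for r, h in zip(real, reconstructed)]
    mu, sigma = mean(resid), pstdev(resid)
    return [(e - mu) / sigma for e in resid]
```

A low `tstr_mse` suggests the synthetic series preserved the dynamics the forecaster needs. The z-scores would feed a mean-reversion rule, such as entering a position when the absolute z-score crosses a chosen threshold.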
The benchmark also incorporates three diverse trading strategies – Cross-Sectional Momentum, Long-Only Top-Quantile, and Proportional-Weighting – to stress-test how well synthetic data supports various trading styles, reducing the risk of models overfitting to specific patterns.
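As a rough sketch of how the three strategies might turn per-asset scores (for example, predicted returns) into portfolio weights, consider the following. The weighting rules, the default `quantile`, and the function names are assumptions made for illustration; the source does not spell out CTBench's exact definitions.

```python
def _ranked_indices(scores):
    """Asset indices ordered from lowest score to highest."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


def cross_sectional_momentum(scores, quantile=0.2):
    """Long the top quantile, short the bottom quantile, equal-weighted.

    Assumes at least two assets and quantile <= 0.5, so the long and
    short legs never overlap.
    """
    n = len(scores)
    k = max(1, int(n * quantile))
    order = _ranked_indices(scores)
    weights = [0.0] * n
    for i in order[-k:]:
        weights[i] = 1.0 / k
    for i in order[:k]:
        weights[i] = -1.0 / k
    return weights


def long_only_top_quantile(scores, quantile=0.2):
    """Hold only the top quantile of assets, equal-weighted."""
    n = len(scores)
    k = max(1, int(n * quantile))
    weights = [0.0] * n
    for i in _ranked_indices(scores)[-k:]:
        weights[i] = 1.0 / k
    return weights


def proportional_weighting(scores):
    """Signed weights proportional to score, scaled to unit gross exposure."""
    gross = sum(abs(s) for s in scores)
    if gross == 0:
        return [0.0] * len(scores)
    return [s / gross for s in scores]
```

Running all three on the same scores shows why they stress different properties of the data: the first two depend only on the ranking of assets, while the third also depends on the magnitude of each score.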
Comprehensive Evaluation Metrics
CTBench employs a comprehensive suite of 13 metrics across five key dimensions to provide a holistic view of model performance:
- Forecasting Accuracy: Measured by Mean Squared Error (MSE) and Mean Absolute Error (MAE).
- Rank Fidelity: Assessed using Information Coefficient (IC) and Information Ratio (IR), which are crucial for strategies that rely on ranking assets.
- Trading Performance: Evaluated through Compound Annual Growth Rate (CAGR) and Sharpe Ratio (SR), indicating profitability and risk-adjusted returns.
- Risk Assessment: Quantified by Maximum Drawdown (MDD), Value at Risk (VaR), and Expected Shortfall (ES), which highlight potential losses during adverse market conditions.
- Computational Efficiency: Measured by Training Time and Inference Time, important for practical deployment.
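For readers less familiar with these metrics, the sketch below shows one common way to compute each of them from per-period returns. The conventions used here are assumptions and may differ from CTBench's own definitions: IC as a Pearson correlation, a zero risk-free rate, 365 periods per year (daily bars in a market that never closes), and simple historical estimates of VaR and ES.

```python
import math
from statistics import mean, stdev

PERIODS_PER_YEAR = 365  # assumption: daily bars, crypto trades every day


def mse(pred, actual):
    return mean((p - a) ** 2 for p, a in zip(pred, actual))


def mae(pred, actual):
    return mean(abs(p - a) for p, a in zip(pred, actual))


def information_coefficient(predicted, realized):
    """IC for one period: Pearson correlation across assets."""
    mp, mr = mean(predicted), mean(realized)
    cov = sum((p - mp) * (r - mr) for p, r in zip(predicted, realized))
    vp = sum((p - mp) ** 2 for p in predicted)
    vr = sum((r - mr) ** 2 for r in realized)
    return cov / math.sqrt(vp * vr)


def information_ratio(ics):
    """IR: mean of per-period ICs over their sample standard deviation."""
    return mean(ics) / stdev(ics)


def cagr(returns):
    """Compound annual growth rate from per-period simple returns."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (PERIODS_PER_YEAR / len(returns)) - 1


def sharpe_ratio(returns):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return mean(returns) / stdev(returns) * math.sqrt(PERIODS_PER_YEAR)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def var_and_es(returns, alpha=0.05):
    """Historical VaR and ES at level alpha, reported as positive losses."""
    losses = sorted((-r for r in returns), reverse=True)  # worst loss first
    k = max(1, int(len(losses) * alpha))
    tail = losses[:k]
    # VaR is the smallest loss in the tail; ES is the average tail loss.
    return tail[-1], mean(tail)
```

ES is never smaller than VaR at the same level, because it averages the losses at or beyond the VaR cutoff. That makes it the more conservative of the two in markets with heavy tails.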
Benchmarking Diverse TSG Models
The researchers benchmarked eight representative TSG models from five major methodological families: GAN-based (Quant-GAN, COSCI-GAN), VAE-based (TimeVAE, KoVAE), Diffusion-based (Diffusion-TS, FIDE), Flow-based (Fourier-Flow), and Mixed-type (LS4). This selection gives broad architectural coverage and allows the families to be compared under the same conditions.
Key Findings and Recommendations
The extensive evaluations revealed that there is no single universally dominant model. Instead, trade-offs exist between statistical fidelity and real-world profitability. For instance, models that excel in forecasting accuracy (like Diffusion-TS) might not necessarily yield the most profitable trading strategies, as they can sometimes suppress the volatility essential for directional gains.
Conversely, models like TimeVAE and COSCI-GAN often strike a better balance, delivering solid forecasting accuracy alongside robust returns, particularly in specific market regimes. TimeVAE, for example, proved robust in stable or mean-reverting markets, while COSCI-GAN thrived in volatile, directional environments.
The study also highlighted the importance of computational efficiency. VAE-based models like TimeVAE were the fastest, which makes them well suited to real-time applications. Diffusion models, while powerful, were more computationally intensive and are better suited to offline use.
Ultimately, CTBench provides actionable guidance for selecting and deploying TSG models in crypto analytics and strategy development. It emphasizes that effective model selection requires understanding the current market regime, the desired alpha source, and operational constraints, rather than solely prioritizing synthetic data fidelity.
The introduction of CTBench marks a significant step forward in the rigorous evaluation and development of Time Series Generation models for the dynamic and complex cryptocurrency markets. Future work aims to expand the dataset, integrate more advanced architectures, and support automated evaluation for enhanced usability.


