TLDR: ELATE is a framework that places large language models (LLMs) inside an evolutionary optimization loop to automate feature engineering for time-series data. It lowers forecasting error (8.4% lower RMSE on average than models without feature engineering), is more time- and memory-efficient than exhaustive expand-and-reduce methods such as TSFRESH, and returns interpretable feature code. The LLM draws on its contextual understanding to propose relevant transformations, which are then evaluated and pruned iteratively.
Time-series prediction, which involves forecasting future values from historical data, is a critical task across many industries, from predicting stock prices to understanding disease progression. While machine learning models have become increasingly popular for these tasks, a significant challenge remains: feature engineering. This process, where existing data features are transformed into new, more informative ones, is crucial for model performance but is often manual, time-consuming, and requires deep domain expertise.
Traditional attempts to automate feature engineering often rely on exhaustive enumeration, which is computationally expensive and lacks the nuanced understanding that a human data scientist brings to the table. These methods can miss valuable transformations or struggle with the sheer volume of possibilities, especially in complex time-series data where temporal relationships are key.
Introducing ELATE: A New Approach to Automated Time-Series Feature Engineering
Researchers Andrew Murray, Danial Dervovic, and Michael Cashmore from JP Morgan AI Research have introduced a novel solution called ELATE, which stands for Evolutionary Language model for Automated Time-series Engineering. This innovative framework combines the power of large language models (LLMs) with an evolutionary optimization process to automate the creation of features for time-series data.
ELATE addresses the limitations of previous automation efforts by leveraging the extensive domain knowledge embedded within LLMs. Instead of blindly trying every possible transformation, the language model proposes new, contextually relevant feature transformations. This is a significant departure from older methods, as LLMs can understand the ‘why’ behind a feature, such as recognizing that Body Mass Index (BMI) is a useful indicator for diabetes prediction, even if it requires multiple intermediate calculation steps.
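As a small illustration of a feature that needs an intermediate step, BMI can be derived from weight and height. This is only a sketch: the field names `weight_kg` and `height_cm` and the sample records are assumptions made for the example, not names or data from the paper.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100.0          # intermediate step: convert cm to m
    return weight_kg / (height_m ** 2)


# Hypothetical patient records; the keys are illustrative, not from the paper.
patients = [
    {"weight_kg": 70.0, "height_cm": 175.0},
    {"weight_kg": 95.0, "height_cm": 168.0},
]
for p in patients:
    p["bmi"] = bmi(p["weight_kg"], p["height_cm"])
```

An exhaustive search would have to stumble on the unit conversion, the squaring, and the division in the right order, whereas a proposer with domain knowledge can suggest the whole expression at once.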
How ELATE Works
The ELATE system operates by maintaining a dynamic collection of features. It starts with an initial set, and then, in an iterative process, the LLM is prompted to generate new features. This prompt includes a description of the dataset, examples of existing features, and information about previously generated features and their performance scores. This feedback loop helps the LLM learn and propose increasingly effective transformations.
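The prompt-and-feedback loop described above might look roughly like the following sketch. Everything here is an assumption for illustration: the function names `build_prompt` and `evolve`, the prompt wording, and the stubbed `propose` and `score` callables (which stand in for the LLM call and the statistical scoring) are hypothetical and are not taken from the ELATE implementation.

```python
from typing import Callable, List, Tuple

Feature = str                           # a feature is represented by its source code
History = List[Tuple[Feature, float]]   # (feature code, score) pairs


def build_prompt(dataset_description: str,
                 example_features: List[Feature],
                 history: History) -> str:
    """Assemble the three ingredients the article lists into one prompt."""
    lines = ["Dataset description:", dataset_description, "", "Existing features:"]
    lines += [f"- {f}" for f in example_features]
    lines += ["", "Previously generated features and their scores:"]
    # Show the best-scoring attempts first so the feedback is easy to read.
    for code, score in sorted(history, key=lambda item: item[1], reverse=True):
        lines.append(f"- score={score:.3f}: {code}")
    lines += ["", "Propose one new feature as a Python expression."]
    return "\n".join(lines)


def evolve(dataset_description: str,
           initial_features: List[Feature],
           propose: Callable[[str], Feature],
           score: Callable[[Feature], float],
           generations: int) -> History:
    """Run the propose -> score -> record loop for a fixed number of rounds."""
    history: History = []
    for _ in range(generations):
        prompt = build_prompt(dataset_description, initial_features, history)
        candidate = propose(prompt)          # stand-in for the LLM call
        history.append((candidate, score(candidate)))
    return history


# Deterministic stubs so the sketch runs without an LLM or a dataset.
canned = iter(["sales.shift(7)", "sales.rolling(7).mean()", "sales.diff()"])
fake_scores = {"sales.shift(7)": 0.42,
               "sales.rolling(7).mean()": 0.61,
               "sales.diff()": 0.18}

history = evolve(
    dataset_description="Daily store sales with a `sales` column.",
    initial_features=["sales"],
    propose=lambda prompt: next(canned),
    score=lambda code: fake_scores[code],
    generations=3,
)
```

Passing `propose` and `score` in as callables keeps the loop independent of any particular LLM provider or scoring measure, which also makes it easy to test with canned responses as shown.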
Once a new feature is proposed (in the form of Python code), it undergoes a validation process to ensure the code is correct and safe to execute. If valid, the feature is then evaluated using two time-series statistical measures: Granger causality and mutual information. Granger causality tests whether past values of the feature improve a (typically linear) prediction of the target, while mutual information measures general statistical dependence, including non-linear relationships. Together they quantify the predictive power of the new feature on the target variable.
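A minimal sketch of these two steps follows, under stated assumptions. The article does not describe how ELATE validates code, so `is_safe_expression` is a hypothetical static check built on Python's `ast` module (it is not a real sandbox). `granger_f_stat` is a simplified, single-lag version of a Granger-style F-test written with NumPy; a full test would usually consider several lags, and mutual information is not shown here.

```python
import ast

import numpy as np


def is_safe_expression(code: str) -> bool:
    """Static check: the code must parse as a single expression (so no
    imports or assignments) and must not touch double-underscore names."""
    try:
        tree = ast.parse(code, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            return False
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            return False
    return True


def granger_f_stat(feature: np.ndarray, target: np.ndarray) -> float:
    """F-statistic for: does feature[t-1] help predict target[t]
    beyond what target[t-1] already explains? (lag 1 only)"""
    y = target[1:]
    ones = np.ones_like(y)
    restricted = np.column_stack([ones, target[:-1]])
    unrestricted = np.column_stack([ones, target[:-1], feature[:-1]])

    def rss(design: np.ndarray) -> float:
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coef
        return float(residual @ residual)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(y) - unrestricted.shape[1]
    return (rss_r - rss_u) / (rss_u / dof)   # one added regressor, so q = 1


# Synthetic data (an assumption for the demo): y depends on the previous x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
unrelated = rng.normal(size=500)
y = np.empty(500)
y[0] = 0.0
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)

f_informative = granger_f_stat(x, y)
f_unrelated = granger_f_stat(unrelated, y)
```

On this synthetic series the lagged `x` should receive a far larger F-statistic than the unrelated series, which is the kind of signal a scoring step can use to rank candidate features.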
To manage the growing number of features, ELATE employs a SHAP (SHapley Additive exPlanations) filter. This filter intelligently prunes low-scoring or redundant features, ensuring that the system maintains a compact set of high-quality, impactful features. This evolutionary cycle of generation, evaluation, and selection allows ELATE to continuously refine and improve its feature set over multiple generations.
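To give a feel for SHAP-based pruning, the sketch below ranks features by mean absolute SHAP value and keeps the top few. It is an illustration under assumptions, not ELATE's filter: it uses the closed-form SHAP values of an ordinary least-squares model with features treated as independent, where the value for feature j on row i is w_j * (x_ij - mean(x_j)). The model ELATE actually explains, its thresholds, and its handling of redundant features are not described in this article, and the feature names are made up.

```python
import numpy as np


def linear_shap_importance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Mean |SHAP| per column for a least-squares linear model,
    using the closed form w_j * (X[i, j] - mean(X[:, j]))."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    weights = coef[1:]                        # drop the intercept
    centered = X - X.mean(axis=0)
    shap_values = centered * weights          # shape: (rows, features)
    return np.abs(shap_values).mean(axis=0)


def prune(names, X, y, keep):
    """Keep the `keep` features with the largest mean |SHAP|,
    returned in their original column order."""
    importance = linear_shap_importance(X, y)
    top = np.argsort(importance)[::-1][:keep]
    return [names[i] for i in sorted(top)]


# Synthetic data (an assumption for the demo): the third column is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=400)
names = ["lag_1", "rolling_mean_7", "noise_feature"]   # hypothetical names

kept = prune(names, X, y, keep=2)
```

A ranking like this only removes low-impact features; dropping redundant ones, as the article mentions, would need an additional check such as comparing candidates against features already in the pool.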
Demonstrated Performance and Efficiency
The researchers conducted extensive experiments across seven diverse time-series prediction tasks, including forecasting influenza cases, store sales, electricity transformer temperature, and energy demand. ELATE consistently outperformed eight baseline methods, including traditional feature engineering packages like VEST and TSFRESH, as well as LSTM neural networks.
On average, ELATE reduced Root Mean Squared Error (RMSE) by 8.4% and Mean Absolute Error (MAE) by 9.6% compared to models without any feature engineering. Notably, ELATE proved to be significantly more time- and memory-efficient than exhaustive expand-and-reduce approaches like TSFRESH, which often exceeded memory limits on larger datasets. ELATE was able to engineer features for problems with nearly 180,000 rows in a matter of hours, a task that would typically take data scientists days.
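For readers less familiar with these metrics, the sketch below shows how RMSE, MAE, and a percentage reduction against a baseline can be computed. It assumes the reported figures are relative reductions, (baseline - new) / baseline; the numbers in the example are made up for illustration and are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_error, new_error):
    """Fraction by which the error shrank relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


# Made-up numbers purely for illustration; they are not results from the paper.
actual = [10.0, 12.0, 9.0, 11.0]
base_pred = [13.0, 11.0, 10.0, 10.0]    # model trained on raw features
feat_pred = [11.0, 11.0, 10.0, 10.0]    # model trained with engineered features

rmse_gain = relative_reduction(rmse(actual, base_pred), rmse(actual, feat_pred))
mae_gain = relative_reduction(mae(actual, base_pred), mae(actual, feat_pred))
```

Because RMSE squares each error, removing one large miss (as in the first prediction here) moves RMSE more than MAE, which is why the two percentages in a study can differ.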
The study also explored the cost-effectiveness of using different LLMs. While GPT-4o yielded slightly better results, the more cost-effective GPT-3.5 Turbo still provided significant improvements over base features at a fraction of the cost, making ELATE a viable option for various budgets.
Interpretability and Future Directions
Unlike complex deep neural networks that learn features internally and are often difficult to interpret, ELATE explicitly returns the Python code used to generate each feature, along with a description of its potential utility. This interpretability is a major advantage, especially in high-stakes applications like healthcare and finance, where understanding the model’s decisions is crucial.
While ELATE represents a significant leap forward, the authors acknowledge areas for future improvement, such as optimizing LLM querying costs and exploring more advanced prompting strategies. They also emphasize that ELATE is designed to augment, not replace, human data scientists, as human oversight is still valuable to ensure the generated features make practical sense for the given task.
For more in-depth details, you can read the full research paper: ELATE: Evolutionary Language model for Automated Time-series Engineering.


