STONK: A New AI Framework for Smarter Stock Movement Prediction

TLDR: STONK (Stock Optimization using News Knowledge) is a novel multimodal AI framework that enhances daily stock movement prediction by integrating numerical market indicators with sentiment-enriched news embeddings. It combines these data types using feature concatenation and cross-modal attention. Backtesting results show STONK outperforms numerical-only baselines, demonstrating significant gains in predictive accuracy and risk-adjusted returns. The framework also highlights the importance of domain-adaptive sentiment scoring and robust time-series validation, while cautioning against the memorization tendencies of large language models in financial forecasting.

Financial markets are incredibly complex, influenced by a myriad of factors beyond just numbers. Traditionally, stock movement predictions have heavily relied on numerical indicators, often overlooking the rich, contextual information found in financial news. This narrow focus can lead to incomplete analyses, and even advanced AI models, like Large Language Models (LLMs), have shown limitations, sometimes appearing to predict based on memorization rather than genuine foresight.

To address these challenges, researchers have introduced STONK (Stock Optimization using News Knowledge), a multimodal framework designed to improve daily stock movement prediction by integrating numerical market indicators with sentiment-enriched news embeddings. It fuses the two data types through feature concatenation and cross-modal attention, aiming to overcome the limitations of analyzing either source in isolation.

The core idea behind STONK is to blend the ‘what’ (numerical data like stock prices and trading volumes) with the ‘why’ (the sentiment and context from news articles). The work makes three main contributions. First, it uses a multimodal fusion technique that combines numerical data with sentiment-annotated textual features. Second, it employs a domain-adaptive sentiment scoring method based on the RoBERTa language model, so that the sentiment extracted from financial news is better aligned with market signals. Third, it features a backtesting framework that uses rolling time-series splits and several financial metrics for model validation, giving a more realistic assessment of performance under changing market conditions.

How STONK Works: A Look Under the Hood

The STONK pipeline involves three main stages: Data Preparation, Feature Generation, and Market Vector Generation.

For **Data Preparation**, stock data from 2007 to 2023 was collected, including open, high, and close prices and trading volume. Daily stock movement was calculated, and news articles were preprocessed and assigned sentiment scores using the RoBERTa model. To keep the prediction fair, a one-day time lag was applied to the numerical data, so each prediction uses only information available before the day being predicted.
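The labeling and lagging step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that "movement" means the close-to-close direction (1 for up, 0 otherwise), and the function names `daily_movement` and `lag_features` are hypothetical.

```python
from typing import List, Sequence, Tuple


def daily_movement(closes: Sequence[float]) -> List[int]:
    """Return 1 when the close rose versus the previous day, else 0.

    Assumption: the article does not define "movement" precisely, so a
    simple close-to-close direction is used here.
    labels[i] describes the move from day i to day i + 1.
    """
    return [1 if closes[t] > closes[t - 1] else 0 for t in range(1, len(closes))]


def lag_features(features: Sequence[dict], closes: Sequence[float]) -> List[Tuple[dict, int]]:
    """Pair the features of day t - 1 with the movement label of day t."""
    labels = daily_movement(closes)
    return [(features[t - 1], labels[t - 1]) for t in range(1, len(closes))]


# Toy data: four trading days of closing prices.
closes = [100.0, 102.0, 101.0, 105.0]
features = [{"close": c} for c in closes]
samples = lag_features(features, closes)
```

Each sample pairs yesterday's numbers with today's outcome, so nothing from the predicted day leaks into the model input.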

In **Feature Generation**, both textual and numerical features are created. Textual embeddings are generated using five encoder-based models: FinBERT, ModernBERT, ELECTRA, DeBERTa, and MiniLM. For numerical features, eight indicators were selected and scaled: Open, High, Close, Volume, Daily Return, Volatility, Aggregate Sentiment Score, and Sentiment Volatility.
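The article does not say which scaler was used. As one plausible reading, the sketch below standardizes each numerical column to zero mean and unit variance, fitting the statistics on training rows only so that later data cannot influence earlier predictions. The helper names `fit_scaler` and `transform` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds) -> List[List[float]]:
    """Standardize rows with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy example with two columns; the second one is constant.
train_rows = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_scaler(train_rows)
scaled_train = transform(train_rows, means, stds)
scaled_test = transform([[5.0, 10.0]], means, stds)  # reuses training statistics
```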

Finally, for **Market Vector Generation**, two distinct methods are used to combine the numerical and textual data into a unified ‘market vector’. The first is simple **Concatenation**, where scaled numerical features are directly merged with the previous day’s aggregated textual embedding. The second, more advanced method, is **Cross-Modal Attention**. This technique allows the model to selectively focus on relevant parts of the textual information based on the numerical data, creating a more nuanced combined representation.
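To make the two fusion strategies concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions rather than STONK's implementation: the attention variant uses standard scaled dot-product attention, omits the learned projection matrices a real model would have, and assumes the numerical query has already been mapped to the same dimension as the text vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def concat_fusion(numeric: Sequence[float], text_embedding: Sequence[float]) -> List[float]:
    """Concatenation: append the text embedding to the numerical features."""
    return list(numeric) + list(text_embedding)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(
    query: Sequence[float], text_vectors: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention with a numerical query over text vectors.

    Returns the attention weights and the weighted sum of the text vectors.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim) for vec in text_vectors
    ]
    weights = softmax(scores)
    attended = [
        sum(w * vec[i] for w, vec in zip(weights, text_vectors)) for i in range(dim)
    ]
    return weights, attended


# Toy example: one query and two candidate text vectors.
market_vector = concat_fusion([0.1, 0.2], [0.3])
weights, attended = cross_modal_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the toy example the first text vector points in the same direction as the query, so it receives the larger attention weight.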

Testing STONK: Rigorous Evaluation

The researchers employed a rigorous evaluation strategy, primarily using a 5-Fold TimeSeriesSplit for their datasets. This method partitions the data chronologically into five segments, using earlier segments for training and later ones for testing in a rolling fashion. This approach is crucial for financial data, as it mimics real-world scenarios and avoids ‘information leakage’ that can inflate accuracy estimates in simpler random splits.
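The name TimeSeriesSplit suggests scikit-learn's splitter of the same name, though the article does not confirm which implementation was used. The standard-library sketch below follows that splitter's default scheme as an assumption: the training window grows with each fold, and every test block lies strictly after its training data. The function name `time_series_folds` is hypothetical.

```python
from typing import List, Tuple


def time_series_folds(n_samples: int, n_splits: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Build chronological (train, test) index folds with an expanding train window."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    folds = []
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        train = list(range(0, start))                  # everything before the test block
        test = list(range(start, start + test_size))   # the next block in time
        folds.append((train, test))
    return folds


# Twelve trading days split into five folds of two test days each.
folds = time_series_folds(12, n_splits=5)
```

A strictly rolling variant would cap the training window at a fixed length instead of letting it grow; either way, no test index ever precedes a training index.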

STONK’s performance was compared against several baselines. A simple Logistic Regression Classifier was used, both with and without sentiment scores, to highlight the impact of textual data. Additionally, the performance of a ‘Frozen LLM’ (QWEN 2.5 7B) was assessed using different prompting methods (zero-shot, one-shot, few-shot) to understand the inherent predictive capabilities of large language models without specific fine-tuning.
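The three prompting regimes differ only in how many labeled examples precede the query. The study's actual prompts are not reproduced in this article, so the wording and the `build_prompt` helper below are purely hypothetical; the sketch only shows the structural difference between zero-shot, one-shot, and few-shot inputs.

```python
from typing import Sequence, Tuple


def build_prompt(headline: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a prompt with zero, one, or several labeled examples.

    Assumption: the task wording and layout are invented for illustration.
    """
    lines = ["Task: decide whether the stock closes UP or DOWN the next trading day.", ""]
    for example_headline, example_label in examples:
        lines += [f"Headline: {example_headline}", f"Answer: {example_label}", ""]
    lines += [f"Headline: {headline}", "Answer:"]
    return "\n".join(lines)


# Invented demonstration headlines, not taken from the study's data.
demos = [
    ("Company beats quarterly earnings estimates", "UP"),
    ("Regulator opens probe into accounting practices", "DOWN"),
    ("Firm announces share buyback", "UP"),
]
zero_shot = build_prompt("Supplier warns of weaker demand")
one_shot = build_prompt("Supplier warns of weaker demand", demos[:1])
few_shot = build_prompt("Supplier warns of weaker demand", demos)
```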

Evaluation wasn’t just about accuracy; it also included financial metrics relevant to trading. Beyond standard classification metrics like Accuracy, Precision, Recall, and F-score, the study also looked at the Matthews Correlation Coefficient (MCC), Directional Win Rate (DWR), Profit Factor (PF), and Sharpe Ratio (SR). These financial metrics provide a more comprehensive view of how well the model performs in a simulated trading environment, considering profitability and risk-adjusted returns.
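For readers unfamiliar with these metrics, the sketch below computes them with common textbook definitions. Several choices are assumptions, since the article does not spell them out: the simulated strategy goes long on an "up" prediction and short otherwise, the win rate is the share of days with a positive strategy return, and the Sharpe Ratio is annualized over 252 trading days with a zero risk-free rate.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews Correlation Coefficient for binary labels (1 = up, 0 = down)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def strategy_returns(y_pred: Sequence[int], asset_returns: Sequence[float]) -> List[float]:
    """Assumed strategy: long when predicting up, short when predicting down."""
    return [r if p == 1 else -r for p, r in zip(y_pred, asset_returns)]


def directional_win_rate(returns: Sequence[float]) -> float:
    """Share of days on which the strategy made money."""
    return sum(1 for r in returns if r > 0) / len(returns)


def profit_factor(returns: Sequence[float]) -> float:
    """Gross profit divided by gross loss."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    return gains / losses if losses else float("inf")


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized mean-to-volatility ratio, assuming a zero risk-free rate."""
    spread = stdev(returns)
    return mean(returns) / spread * math.sqrt(periods_per_year) if spread else 0.0


# Toy example: four days of labels, predictions, and asset returns.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
daily = strategy_returns(y_pred, [0.02, -0.01, 0.01, 0.03])
```

On the toy data the strategy wins on three of four days: the one wrong call (predicting down on a day that rose 1%) is the only losing trade.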

Key Findings and Future Outlook

The results clearly demonstrated the value of integrating sentiment scores. Incorporating sentiment alongside numerical features consistently boosted performance across both classification and financial metrics. Specifically, the Concatenation fusion method with MiniLM showed a compelling combination of predictive accuracy and high profitability, achieving a Profit Factor of 2.03 and a Sharpe Ratio of 3.15. Meanwhile, the Cross-Modal Attention fusion with DeBERTa achieved top classification results (accuracy 0.68, F1 0.73) while maintaining robust financial returns.

Fine-tuning these models further improved their performance. For instance, MiniLM with Concatenation saw improvements in accuracy and F1 score, while DeBERTa with Cross-Modal Attention showed gains in both classification and risk-adjusted returns. The study also highlighted that while frozen LLMs might show promising recall in some prompting paradigms, this could be due to memorization rather than genuine forecasting ability, underscoring the need for robust evaluation methods.

In conclusion, STONK represents a significant step forward in multimodal financial forecasting. By integrating numerical market indicators with sentiment-enriched news embeddings, it achieves substantial gains in predictive accuracy and risk-adjusted returns compared to traditional numeric-only approaches. The researchers plan to extend this work to live trading deployment and multi-asset portfolios, and to explore additional data modalities and advanced learning techniques that improve adaptability in changing markets. You can read the full research paper here: Towards Unified Multimodal Financial Forecasting.

Nikhil Patel (https://blogs.edgentiq.com)
Nikhil Patel is a tech analyst and AI news reporter who brings a practitioner's perspective to every article. With prior experience working at an AI startup, he decodes the business mechanics behind product innovations, funding trends, and partnerships in the GenAI space. Nikhil's insights are sharp, forward-looking, and trusted by insiders and newcomers alike. You can reach him at: [email protected]
