STONK: A New AI Framework for Smarter Stock Movement Prediction

TLDR: STONK (Stock Optimization using News Knowledge) is a novel multimodal AI framework that enhances daily stock movement prediction by integrating numerical market indicators with sentiment-enriched news embeddings. It combines these data types using feature concatenation and cross-modal attention. Backtesting results show STONK outperforms numerical-only baselines, demonstrating significant gains in predictive accuracy and risk-adjusted returns. The framework also highlights the importance of domain-adaptive sentiment scoring and robust time-series validation, while cautioning against the memorization tendencies of large language models in financial forecasting.

Financial markets are complex and shaped by far more than price data. Stock movement prediction has traditionally relied on numerical indicators and has often overlooked the contextual information found in financial news. This narrow focus can lead to incomplete analyses. Even advanced AI models such as Large Language Models (LLMs) have shown limitations, sometimes appearing to predict based on memorization rather than genuine foresight.

To address these challenges, researchers have introduced a framework called STONK, which stands for Stock Optimization using News Knowledge. STONK is a multimodal system designed to improve daily stock movement prediction by integrating numerical market indicators with sentiment-enriched news embeddings. It combines the two data types through either feature concatenation or cross-modal attention, aiming to overcome the limitations of analyzing each source in isolation.

The core idea behind STONK is to blend the ‘what’ (numerical data like stock prices and trading volumes) with the ‘why’ (the sentiment and context from news articles). The framework makes three main contributions. First, it uses a multimodal fusion technique that combines numerical data with sentiment-annotated textual features. Second, it applies domain-adaptive sentiment scoring with the RoBERTa language model, so that the sentiment extracted from financial news is better matched to market signals. Third, it includes a backtesting framework that uses rolling time-series splits and several financial metrics for model validation, giving a more realistic assessment of performance in changing market conditions.

How STONK Works: A Look Under the Hood

The STONK pipeline involves three main stages: Data Preparation, Feature Generation, and Market Vector Generation.

For **Data Preparation**, stock data from 2007 to 2023 was collected, including open, high, and close prices and trading volume. Daily stock movement was calculated, and news articles were preprocessed and assigned sentiment scores using the RoBERTa model. To avoid look-ahead bias, the numerical data was lagged by one day, so each prediction uses only information available before the target day.
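The labeling and lagging step can be illustrated with a minimal sketch. It assumes that "movement" means whether the close rose relative to the previous close; the article does not give the exact definition, and the function names `daily_movement` and `lagged_pairs` are hypothetical.

```python
def daily_movement(closes):
    """Label each day 1 if its close is above the previous close, else 0.

    Assumption: this up/down definition is illustrative, not taken from the paper.
    The first day has no previous close, so it gets no label.
    """
    return [1 if today > prev else 0 for prev, today in zip(closes, closes[1:])]


def lagged_pairs(features, closes):
    """Pair the features of day t-1 with the movement observed on day t."""
    labels = daily_movement(closes)  # labels[i] describes day i + 1
    # Dropping the last feature row keeps features and labels the same length.
    return list(zip(features[:-1], labels))


closes = [100.0, 101.0, 100.5, 102.0]
features = ["day0", "day1", "day2", "day3"]  # placeholders for real feature rows
pairs = lagged_pairs(features, closes)
```

Here `pairs` matches each day's label with the previous day's features, so nothing from the target day itself leaks into the model input.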

In **Feature Generation**, both textual and numerical features are created. Textual embeddings are generated using five encoder-based models: FinBERT, ModernBERT, ELECTRA, DeBERTa, and MiniLM. For numerical features, eight indicators were selected and scaled: Open, High, Close, Volume, Daily Return, Volatility, aggregate sentiment score, and sentiment volatility.
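As a rough illustration of the two sentiment-derived indicators, the sketch below aggregates per-article sentiment scores into one pair of values per day. It assumes the aggregate score is the daily mean and sentiment volatility is the daily standard deviation; the article does not state the formulas, and `daily_sentiment_features` is a hypothetical helper.

```python
from statistics import mean, pstdev


def daily_sentiment_features(scores_by_day):
    """Map each day to (aggregate sentiment, sentiment volatility).

    Assumptions: aggregate = mean of the day's article scores,
    volatility = population standard deviation of those scores.
    """
    features = {}
    for day, scores in scores_by_day.items():
        if not scores:
            # Assumption: a day with no news is treated as neutral.
            features[day] = (0.0, 0.0)
            continue
        features[day] = (mean(scores), pstdev(scores))
    return features


example = daily_sentiment_features({
    "2023-01-03": [0.5, -0.5],  # mixed news: neutral on average, but dispersed
    "2023-01-04": [0.2],        # a single article has zero dispersion
    "2023-01-05": [],           # no news that day
})
```

Under these assumptions, a day with conflicting headlines has a near-zero aggregate score but high sentiment volatility, which is the distinction the two indicators are meant to capture.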

Finally, for **Market Vector Generation**, two distinct methods are used to combine the numerical and textual data into a unified ‘market vector’. The first is simple **Concatenation**, where scaled numerical features are directly merged with the previous day’s aggregated textual embedding. The second, more advanced method is **Cross-Modal Attention**, which lets the model weight the textual information according to its relevance to the numerical data, creating a more nuanced combined representation.
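The two fusion strategies can be sketched as follows. This is a simplified illustration, not the paper's architecture: it assumes a single attention head, attention over per-article embeddings, and a numerical query that has already been projected to the embedding dimension (a trained model would typically learn separate query, key, and value projections). All function names are hypothetical.

```python
import math


def concatenate(numeric, text_embedding):
    """Concatenation fusion: append the text embedding to the numeric features."""
    return list(numeric) + list(text_embedding)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(query, article_embeddings):
    """Scaled dot-product attention with the numeric vector as the query.

    Assumption: `query` has the same dimension as each article embedding.
    Returns the attended text context and the attention weights.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, emb)) / math.sqrt(d)
        for emb in article_embeddings
    ]
    weights = softmax(scores)
    context = [
        sum(w * emb[i] for w, emb in zip(weights, article_embeddings))
        for i in range(d)
    ]
    return context, weights


query = [1.0, 0.0]                    # projected numeric features (toy values)
articles = [[1.0, 0.0], [0.0, 1.0]]   # two toy article embeddings
context, weights = cross_modal_attention(query, articles)
market_vector = concatenate(query, context)
```

In this toy case the first article aligns with the query and therefore receives the larger attention weight, whereas plain concatenation would treat the aggregated text embedding the same way regardless of the numerical context.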

Testing STONK: Rigorous Evaluation

The researchers evaluated the models primarily with a 5-Fold TimeSeriesSplit. This method partitions the data chronologically, using earlier segments for training and later ones for testing in a rolling fashion. This matters for financial data: training only on the past mimics real deployment and avoids the ‘information leakage’ from future observations that can inflate accuracy estimates under random splits.
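A chronological split of this kind can be sketched in a few lines. The generator below follows the default expanding-window behaviour of scikit-learn's `TimeSeriesSplit` (no gap, no cap on training size); that the study used exactly these defaults is an assumption, and `time_series_splits` is a hypothetical stand-in written without external dependencies.

```python
def time_series_splits(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each test block immediately follows its training data, and the training
    window grows from one fold to the next.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(time_series_splits(12, n_splits=5))
for train, test in splits:
    print(f"train up to index {train[-1]}, test on {test}")
```

Every test index is later than every training index in its fold, which is the property that rules out training on the future.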

STONK’s performance was compared against several baselines. A simple Logistic Regression classifier was used, both with and without sentiment scores, to isolate the impact of textual data. Additionally, a ‘Frozen LLM’ (Qwen 2.5 7B) was assessed under zero-shot, one-shot, and few-shot prompting to gauge the predictive capability of a large language model without task-specific fine-tuning.

Evaluation wasn’t just about accuracy; it also included financial metrics relevant to trading. The classification metrics were Accuracy, Precision, Recall, F-score, and the Matthews Correlation Coefficient (MCC). The financial metrics were Directional Win Rate (DWR), Profit Factor (PF), and Sharpe Ratio (SR). Together, the financial metrics show how well the model performs in a simulated trading environment, in terms of both profitability and risk-adjusted returns.
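For readers unfamiliar with these metrics, the sketch below computes them using common textbook definitions. The exact formulas in the paper are not given in this article, so the following are assumptions: a long/short strategy that buys on a predicted rise and sells short on a predicted fall, no transaction costs, a zero risk-free rate, and annualization over 252 trading days.

```python
import math
from statistics import mean, stdev


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = up, 0 = down)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def strategy_returns(y_pred, asset_returns):
    """Assumed strategy: go long on a predicted rise, short on a predicted fall."""
    return [r if p == 1 else -r for p, r in zip(y_pred, asset_returns)]


def directional_win_rate(returns):
    """Share of trading days that ended with a positive strategy return."""
    return sum(1 for r in returns if r > 0) / len(returns)


def profit_factor(returns):
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    return gains / losses if losses else math.inf


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean/stdev of daily returns, assuming a zero risk-free rate."""
    sd = stdev(returns)
    return mean(returns) / sd * math.sqrt(periods_per_year) if sd else 0.0


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
asset_returns = [0.02, -0.01, 0.03, 0.01]  # toy daily returns
rets = strategy_returns(y_pred, asset_returns)
```

A Profit Factor above 1 means gross profits exceed gross losses, and a higher Sharpe Ratio means more return per unit of return volatility, which is why the two are reported alongside accuracy.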


Key Findings and Future Outlook

The results clearly demonstrated the value of integrating sentiment scores. Incorporating sentiment alongside numerical features consistently boosted performance across both classification and financial metrics. Specifically, the Concatenation fusion method with MiniLM showed a compelling combination of predictive accuracy and high profitability, achieving a Profit Factor of 2.03 and a Sharpe Ratio of 3.15. Meanwhile, the Cross-Modal Attention fusion with DeBERTa achieved top classification results (accuracy 0.68, F1 0.73) while maintaining robust financial returns.

Fine-tuning these models further improved their performance. For instance, MiniLM with Concatenation saw improvements in accuracy and F1 score, while DeBERTa with Cross-Modal Attention showed gains in both classification and risk-adjusted returns. The study also highlighted that while frozen LLMs might show promising recall in some prompting paradigms, this could be due to memorization rather than genuine forecasting ability, underscoring the need for robust evaluation methods.

In conclusion, STONK represents a significant step forward in multimodal financial forecasting. By integrating numerical market indicators with sentiment-enriched news embeddings, it achieves substantial gains in predictive accuracy and risk-adjusted returns compared to traditional numeric-only approaches. The researchers plan to extend this work to live trading deployment and multi-asset portfolios, and to explore additional data modalities and advanced learning techniques to improve adaptability in ever-changing markets. You can read the full research paper here: Towards Unified Multimodal Financial Forecasting.
