TLDR: A new research paper introduces Latent-Trajectory (LT) signals, which analyze the temporal evolution of a large language model’s internal representations (hidden states) during reasoning. The three signals (Net Change, Cumulative Change, and Aligned Change) predict whether a model’s solution is correct more reliably than prior baselines. Using LT signals to guide multi-sample inference reduces token usage by up to 70% and improves accuracy by an average of 2.6% compared to majority voting. The signals also enable early identification and selection of high-quality reasoning paths, offering both practical inference-time benefits and deeper insights into how AI models reason.
Large language models (LLMs) are becoming increasingly adept at complex reasoning tasks, often by generating multiple ‘chains-of-thought’ or reasoning traces. However, not all these traces are equally productive; some lead to correct answers, while others are inefficient or incorrect. Identifying which reasoning paths are likely to succeed is a significant challenge, and solving it could drastically reduce wasted computation and improve overall efficiency.
A new research paper, Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning, introduces a novel approach called Latent-Trajectory (LT) signals. These signals characterize the temporal evolution of a model’s internal representations (essentially, how the model’s ‘thoughts’ change over time) during the generation of intermediate reasoning tokens. By analyzing these internal dynamics, LT signals can predict the accuracy of a solution more reliably than previous methods.
Understanding Latent-Trajectory Signals
The core idea behind LT signals is to look beyond the surface-level natural language output and delve into the model’s hidden states, which are its internal numerical representations at each step of the reasoning process. The researchers, Martina G. Vilas, Safoora Yousefi, Besmira Nushi, Eric Horvitz, and Vidhisha Balachandran, propose three complementary signals:
- Net Change: This measures the total representational change from the beginning to the end of a reasoning trace. A larger net change often suggests a more significant and potentially deeper reasoning process.
- Cumulative Change: This quantifies the total amount of representational movement accumulated across all intermediate steps. Interestingly, traces with higher cumulative change (more ‘wandering’ in the latent space) tend to be less accurate.
- Aligned Change: This assesses how consistently intermediate updates progress towards the final state. Higher alignment indicates that the model’s internal steps are moving in a coherent direction towards a solution.
These signals are computed directly from the model’s hidden states during inference, requiring no additional training or external annotations. To make the signals more robust, the reasoning trace is divided into segments (e.g., 500 tokens), and the average hidden state for each segment is used.
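The three signals can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper’s exact definitions: it assumes `hidden` is a `(tokens, dim)` array of hidden states from a single layer, uses Euclidean distance for the size of a change, and measures alignment as the mean cosine similarity between each segment-to-segment update and the overall start-to-end drift. The function names and the absence of any normalization are illustrative choices.

```python
import numpy as np


def segment_means(hidden, segment_len=500):
    """Average hidden states over consecutive segments of `segment_len` tokens."""
    hidden = np.asarray(hidden, dtype=float)
    starts = range(0, hidden.shape[0], segment_len)
    return np.stack([hidden[s:s + segment_len].mean(axis=0) for s in starts])


def lt_signals(hidden, segment_len=500):
    """Return illustrative Net, Cumulative, and Aligned Change for one trace."""
    segs = segment_means(hidden, segment_len)
    if len(segs) < 2:
        raise ValueError("need at least two segments to measure change")

    steps = np.diff(segs, axis=0)      # update between consecutive segments
    drift = segs[-1] - segs[0]         # overall start-to-end displacement
    step_norms = np.linalg.norm(steps, axis=1)
    drift_norm = np.linalg.norm(drift)

    # Assumed alignment measure: cosine between each update and the drift.
    eps = 1e-12                        # guards against division by zero
    cosines = (steps @ drift) / (step_norms * drift_norm + eps)

    return {
        "net_change": float(drift_norm),
        "cumulative_change": float(step_norms.sum()),
        "aligned_change": float(cosines.mean()),
    }
```

Under these assumptions, a trace that moves in a straight line through latent space has cumulative change equal to net change and alignment close to 1, while a zigzagging trace accumulates more movement than its net displacement and scores lower on alignment.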
Superior Predictive Power
The study evaluated LT signals across various reasoning-enabled LLMs, including DeepSeek-R1-Distill-Qwen-14B, Phi-4-reasoning-plus, and Qwen3-14B, and across diverse domains such as scientific questions (GPQA Diamond), mathematical problems (AIME 2025), and algorithmic tasks (a TSP benchmark). The results showed that LT signals consistently and significantly distinguished between correct and incorrect answers. They outperformed two families of baselines: cross-layer metrics, which look at changes across different layers of the model, and output-based confidence measures, such as the logit margin or the entropy of the final token distribution. These baselines often performed close to or even below chance.
Specifically, Net Change and Aligned Change showed a positive correlation with accuracy, meaning larger and more directed shifts in the model’s internal state were linked to better performance. Conversely, Cumulative Change was negatively correlated, suggesting that excessive, less stable movement in the latent space often leads to incorrect answers.
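One common way to quantify how well a signal separates correct from incorrect traces is the area under the ROC curve (AUROC), where 0.5 corresponds to chance. The sketch below is an assumption about how such an evaluation could be run, not a description of the paper’s protocol. It computes AUROC directly as the fraction of (correct, incorrect) pairs in which the correct trace scores higher. For a negatively correlated signal such as Cumulative Change, the score would be negated first.

```python
def auroc(scores, labels):
    """Probability that a correct trace outscores an incorrect one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect trace")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (not results from the paper).
net_change = [0.9, 0.8, 0.3, 0.1]
cumulative_change = [1.0, 2.0, 4.0, 5.0]
is_correct = [1, 1, 0, 0]

auc_net = auroc(net_change, is_correct)
auc_cum = auroc([-c for c in cumulative_change], is_correct)  # sign flipped
```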
Boosting Efficiency and Accuracy
One of the most impactful findings is how LT signals can enhance multi-sample inference, a common strategy where models generate multiple potential answers and then aggregate them (e.g., using majority voting). By using LT signals to guide answer selection, the researchers demonstrated significant improvements:
- Reduced Token Usage: LT-guided selection reduced token usage by up to 70% compared to majority voting, leading to substantial savings in computational cost.
- Improved Accuracy: The approach did not trade accuracy for these savings; on average, it improved accuracy by 2.6% over majority voting baselines. This means LT signals can help identify correct reasoning paths even when most sampled solutions are incorrect.
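The article does not spell out the selection procedure, so the following is only one plausible sketch of how a per-trace score could save tokens relative to majority voting. It assumes samples are generated one at a time, each with a scalar LT-based score, and that generation stops as soon as a sample clears a threshold. If none does, it falls back to a majority vote. The names `lt_guided_select`, `threshold`, and `max_samples` are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer (first seen wins ties)."""
    return Counter(answers).most_common(1)[0][0]


def lt_guided_select(samples, threshold, max_samples=5):
    """Consume (answer, lt_score, n_tokens) tuples lazily; stop at the first
    sample whose score clears `threshold`, else fall back to majority vote.

    Returns (chosen_answer, tokens_spent).
    """
    seen, tokens = [], 0
    for answer, score, n_tokens in samples:
        seen.append(answer)
        tokens += n_tokens
        if score >= threshold:
            return answer, tokens       # confident trace found: stop sampling
        if len(seen) >= max_samples:
            break
    if not seen:
        raise ValueError("no samples were provided")
    return majority_vote(seen), tokens


# Toy data for illustration: (answer, score, tokens used by that trace).
toy = [("A", 0.2, 100), ("B", 0.9, 120), ("A", 0.1, 90)]
picked, spent = lt_guided_select(iter(toy), threshold=0.8)
```

In this toy run the second sample clears the threshold, so the third is never consumed. The same mechanism lets a high-scoring minority answer win over a lower-scoring majority.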
Furthermore, LT signals proved effective for early path selection. By evaluating partial reasoning traces, the signals could identify high-quality trajectories early in the generation process. This allows for the termination of less promising paths, allocating computational resources only to the most promising candidates. This early selection strategy reduced average token usage by 61% while maintaining or improving accuracy.
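As a rough illustration of early path selection, the sketch below assumes every candidate path is generated up to a fixed prefix length, scored from that partial trace, and then either continued or terminated. The top-`keep` rule, the function names, and the token accounting are assumptions made for illustration, and the numbers are toy values rather than results from the paper.

```python
def select_paths_early(prefix_scores, keep=1):
    """Return indices of the `keep` highest-scoring partial traces."""
    order = sorted(range(len(prefix_scores)),
                   key=lambda i: prefix_scores[i], reverse=True)
    return sorted(order[:keep])


def tokens_spent(full_lengths, prefix_len, kept):
    """Kept paths run to completion; all others stop at the prefix."""
    kept = set(kept)
    return sum(n if i in kept else min(n, prefix_len)
               for i, n in enumerate(full_lengths))


# Toy values: scores computed on the first 1000 tokens of four paths.
prefix_scores = [0.1, 0.7, 0.4, 0.9]
full_lengths = [4000, 5000, 3000, 6000]

kept = select_paths_early(prefix_scores, keep=2)
early_cost = tokens_spent(full_lengths, prefix_len=1000, kept=kept)
full_cost = sum(full_lengths)
saving = 1 - early_cost / full_cost
```

The saving depends on how short the scoring prefix is relative to full traces and on how many paths are kept, which is why an accurate early signal matters.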
A Deeper Look into AI Reasoning
Beyond the practical benefits, this research offers valuable insights into the interpretability of LLM reasoning. It reveals that successful reasoning trajectories are characterized by larger, more directionally consistent shifts in the model’s latent space, while unsuccessful ones involve more wandering and less aligned paths. This understanding can pave the way for future work in fine-tuning models to produce more reliable reasoning trajectories and developing more sophisticated learned classifiers for even greater efficiency and accuracy.


