TL;DR: DTS (Decoding Tree Sketching) is a new, training-free framework that curbs “overthinking” in Large Reasoning Models (LRMs). It selectively explores promising reasoning paths in parallel and stops at the first completed path, which tends to be both the shortest and the most accurate. This approach boosts accuracy by up to 8%, cuts reasoning length by 23%, and significantly reduces repetitive outputs, making LRMs more efficient and reliable.
Large Reasoning Models (LRMs) have shown impressive capabilities in complex tasks like mathematics and programming. However, they often “overthink,” producing excessively long chains of thought (CoT) that increase costs and can even reduce accuracy. This phenomenon, where longer reasoning paths accumulate errors and repetitions, is a significant challenge for the practical application of LRMs.
Researchers have observed a clear inverse relationship between reasoning length and correctness: shorter reasoning paths consistently achieve higher accuracy, while longer ones tend to degrade. Ideally, finding these short, optimal paths would involve exploring the entire tree-structured reasoning space, but that space grows exponentially with depth, making exhaustive exploration computationally infeasible.
To tackle this, a new framework called DTS (Decoding Tree Sketching) has been introduced. DTS is a model-agnostic decoding framework designed to enhance both the efficiency and accuracy of LRMs without requiring any additional training or supervision. It works by intelligently “sketching” the reasoning space during the decoding process.
How DTS Works
DTS operates by constructing a dynamic reasoning tree at inference time. Instead of blindly expanding every possible path, DTS selectively branches out only at points where the model’s next-token prediction is highly uncertain (indicated by high entropy). When the model is confident, it continues along a single path. This selective branching allows DTS to capture the most essential parts of the reasoning tree, forming a compact “backbone.”
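The branching rule can be sketched in a few lines. This is a minimal illustration, not the paper’s implementation: the entropy threshold here is a made-up value, and a real decoder would compute entropy over the model’s full next-token distribution on the GPU.

```python
import math

# Hypothetical threshold (in nats); the actual value used by DTS is not
# specified in this summary.
ENTROPY_THRESHOLD = 1.0

def token_entropy(probs):
    """Shannon entropy of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_branch(probs, threshold=ENTROPY_THRESHOLD):
    """Branch the reasoning tree only where the model is uncertain."""
    return token_entropy(probs) > threshold

# Confident prediction: keep decoding along a single path.
confident = [0.97, 0.01, 0.01, 0.01]   # entropy ≈ 0.17 nats
# Uncertain prediction: spawn parallel branches at this token.
uncertain = [0.25, 0.25, 0.25, 0.25]   # entropy ≈ 1.39 nats

print(should_branch(confident))  # False
print(should_branch(uncertain))  # True
```

Branching only at high-entropy tokens is what keeps the sketched tree compact: confident stretches of the chain of thought stay as single paths, so the number of branches stays far below the exponential worst case.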
All these potential reasoning paths are generated in parallel. A key feature of DTS is its “early stopping” strategy. Based on the observation that shorter paths are often more accurate, DTS stops as soon as any of its parallel branches successfully completes a reasoning path with an ending token. The first completed path, which is by definition the shortest, is then selected as the final answer. This approach directly aligns with the empirical finding that concise reasoning often leads to higher accuracy.
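The early-stopping idea can be illustrated with a toy scheduler. This is a hedged sketch, assuming each branch can be stepped token by token (`decode_first_finished`, the token streams, and the `<eos>` marker are all hypothetical; a real system would batch branches on the GPU):

```python
# Toy sketch of DTS-style early stopping over parallel branches.
EOS = "<eos>"

def decode_first_finished(branches):
    """Advance all branches in lockstep; return the first path to finish.

    Because every branch emits one token per round, the first branch to
    produce the end token is also the shortest path, matching DTS's
    preference for concise reasoning.
    """
    paths = [[] for _ in branches]
    iters = [iter(b) for b in branches]
    while True:
        for i, it in enumerate(iters):
            tok = next(it, EOS)  # exhausted streams count as finished
            paths[i].append(tok)
            if tok == EOS:
                return paths[i][:-1]  # drop the end token

# Hypothetical token streams: the shorter branch wins.
branch_a = ["step1", "step2", "step3", "answer=108", EOS]
branch_b = ["area", "=", "108", EOS]
print(decode_first_finished([branch_a, branch_b]))  # ['area', '=', '108']
```

The lockstep loop is what makes “first to finish” equivalent to “shortest”: no branch ever gets ahead of the others, so stopping at the first end token selects the most concise completed path.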
For example, imagine asking an LRM to calculate the area of a rectangle. A standard LRM might go through many unnecessary steps or even get stuck in a loop. DTS, however, would explore a few promising paths in parallel. If one path quickly arrives at “The area is length × width. Here, length = 12 and width = 9. So area = 12 × 9 = 108,” DTS would immediately select this shortest, correct path, preventing the model from overthinking. You can read more about this innovative approach in the full research paper: DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching.
Significant Improvements
Experiments conducted on the AIME2024 and AIME2025 datasets using DeepSeek-R1-Distill-Qwen-7B and 1.5B models demonstrated impressive results. DTS improved accuracy by up to 8% and significantly reduced the average reasoning length by 23%. Furthermore, it decreased the frequency of repetitive reasoning by 12%, a common issue where models get stuck in endless loops, consuming resources without progress.
The framework’s training-free and model-agnostic design makes it a plug-and-play solution, easily integrated into existing LRM setups without the need for extensive retraining or labeled data. Its ability to leverage GPU parallelism also ensures efficient and scalable optimization of reasoning paths.
In conclusion, DTS offers a robust solution to the “overthinking” problem in Large Reasoning Models. By intelligently sketching the decoding tree and prioritizing concise, accurate paths, DTS not only boosts performance but also makes LRM reasoning more efficient and reliable, paving the way for more practical and scalable AI applications.