TLDR: A new survey explores how to make Large Reasoning Models (LRMs) more efficient by tackling the problem of unnecessarily long and redundant reasoning chains. It details two main strategies: Concise Thinking (shortening reasoning) and Adaptive Thinking (adjusting reasoning depth based on problem difficulty). The paper categorizes current methods into training-free approaches (like prompt-guided strategies, pipeline-based systems, decoding manipulation, and model merging) and training-based approaches (including fine-tuning with variable-length data and reinforcement learning with length penalties or difficulty awareness). The survey concludes by highlighting critical future challenges, such as integrating model capability awareness, human preferences, and trustworthiness into the development of more effective and reliable LRMs.
Large Reasoning Models (LRMs) like OpenAI o1 and DeepSeek R1 have shown remarkable abilities in tackling complex tasks such as advanced mathematics and programming. These models achieve their impressive performance by generating detailed, step-by-step reasoning sequences, often referred to as Chain-of-Thought (CoT). This deliberate, ‘slow thinking’ approach is a significant leap from traditional large language models (LLMs) that typically rely on ‘fast thinking’.
However, this powerful capability comes with a notable challenge: LRMs frequently produce overly lengthy and redundant reasoning chains, even for simple questions. This ‘overthinking’ phenomenon leads to a substantial waste of computational resources, increases response times for straightforward queries, and ultimately limits the practical application of LRMs in real-world products. To address this, researchers are focusing on two key areas: shortening these verbose reasoning chains (Concise Thinking) and enabling models to adapt their thinking style—switching between fast and slow thinking—based on the difficulty of the input (Adaptive Thinking).
Approaches to Concise and Adaptive Thinking
The survey, titled “Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey,” categorizes existing research into two main directions: training-free methods and training-based methods.
Training-Free Methods: These approaches aim to achieve concise and adaptive thinking without requiring additional model training. They are convenient for rapid deployment but often rely on the model’s inherent ability to follow instructions, which can be inconsistent.
- Prompt-Guided Strategies: This involves crafting specific instructions within the prompt, such as telling the model to “Be concise” or setting a strict token limit for the answer. While simple, models don’t always strictly adhere to these instructions, and forcing brevity can sometimes lead to less accurate or inconsistent results.
- Pipeline-Based Methods: These design modular workflows to offload computational tasks. A common example is using a ‘router’ that directs simple questions to smaller, faster models and only sends complex problems to the more powerful LRMs. While effective in reducing reasoning length, these pipelines can introduce their own overhead, potentially increasing overall response time.
- Decoding Manipulation: This involves dynamically adjusting the model’s output generation process. Techniques include ‘budget forcing’ (setting maximum/minimum reasoning tokens), ‘early exiting’ (stopping reasoning when the model is confident enough), manipulating token probabilities to suppress verbose phrases, or ‘activation steering’ (directly influencing the model’s internal states to control reasoning depth). Parallel scaling, where multiple shorter responses are generated simultaneously, is also explored.
- Model Merging: This combines a ‘slow-thinking’ LRM with a ‘fast-thinking’ LLM, often through weight averaging, to create a hybrid model that balances efficiency and accuracy.
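To make three of these training-free ideas concrete, here is a minimal Python sketch of a difficulty router, a step budget with early exit, and weight-averaged merging. It is an illustration under stated assumptions, not the survey's implementation: the function names, the difficulty and confidence scores, the default thresholds, and the plain-list weight format are all hypothetical stand-ins for what a real system would obtain from a model.

```python
from typing import Dict, Iterable, List, Tuple


def route(difficulty: float, threshold: float = 0.5) -> str:
    """Pipeline-style router: easy inputs go to a small model, hard ones to the LRM.

    `difficulty` is assumed to come from some external estimator in [0, 1].
    """
    return "lrm" if difficulty >= threshold else "small_model"


def reason_with_budget(
    steps: Iterable[Tuple[str, float]],
    max_steps: int,
    confidence_threshold: float,
) -> List[str]:
    """Keep reasoning steps until the budget is spent or confidence is high enough.

    Each item in `steps` is (step_text, confidence_after_step); the confidence
    value is a hypothetical signal, e.g. derived from answer probabilities.
    """
    kept: List[str] = []
    for text, confidence in steps:
        if len(kept) >= max_steps:  # budget forcing: hard cap on reasoning steps
            break
        kept.append(text)
        if confidence >= confidence_threshold:  # early exit: confident enough
            break
    return kept


def merge_weights(
    slow: Dict[str, List[float]],
    fast: Dict[str, List[float]],
    alpha: float = 0.5,
) -> Dict[str, List[float]]:
    """Linearly interpolate two parameter sets with identical names and shapes.

    alpha = 1.0 returns the slow-thinking weights, alpha = 0.0 the fast-thinking ones.
    """
    if slow.keys() != fast.keys():
        raise ValueError("both models must expose the same parameter names")
    merged: Dict[str, List[float]] = {}
    for name in slow:
        if len(slow[name]) != len(fast[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * s + (1.0 - alpha) * f for s, f in zip(slow[name], fast[name])
        ]
    return merged
```

In this sketch, `alpha` is the single knob that trades reasoning depth for speed in the merged model, while `max_steps` and `confidence_threshold` play the same role at decoding time. Real merging methods operate on full tensors and may weight layers differently.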
Training-Based Methods: These approaches involve fine-tuning LRMs using specially prepared data or reinforcement learning to teach them adaptive reasoning behaviors.
- Fine-tuning: Models are trained on datasets that contain reasoning paths of varying lengths, which researchers construct by generating reasoning traces with diverse structures and complexity. Fine-tuning on this data teaches the model to adjust its reasoning length to the query’s complexity, giving concise answers to simple questions and detailed ones to complex problems. Methods in this family compress long CoTs, select short CoTs, or internalize the reasoning process (implicit CoT) so the model “thinks” without explicitly generating verbose steps.
- Reinforcement Learning (RL): RL methods use reward functions to encourage efficient reasoning. This can involve penalizing overly long responses, using variants of Group Relative Policy Optimization (GRPO) to manage different reasoning modes, or incorporating ‘difficulty-awareness’ so that reasoning depth matches the complexity of each problem. Some RL approaches explicitly train models to switch between a detailed “Thinking” mode and a direct “No Thinking” mode.
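The idea of a length penalty combined with difficulty awareness can be sketched as a simple reward function. This is one possible shaping, not a formula taken from the survey: the function name, the `token_budget`, the `penalty_weight`, and the `difficulty` score in [0, 1] are all assumptions made for illustration.

```python
def length_penalized_reward(
    is_correct: bool,
    num_tokens: int,
    token_budget: int,
    penalty_weight: float = 0.5,
    difficulty: float = 0.0,
) -> float:
    """Reward correctness, then subtract a penalty for exceeding the token budget.

    The penalty shrinks as `difficulty` approaches 1.0, so hard problems are
    allowed longer reasoning. All parameters are hypothetical.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    base = 1.0 if is_correct else 0.0
    # Fraction by which the response overshoots the budget, capped at 1.0.
    overflow = min(max(0, num_tokens - token_budget) / token_budget, 1.0)
    penalty = penalty_weight * (1.0 - difficulty) * overflow
    return base - penalty
```

With these defaults, a correct answer within budget keeps the full reward of 1.0, while a correct answer that uses twice the budget on an easy problem (`difficulty=0.0`) receives 0.5. Capping the overflow keeps the penalty bounded, so a correct but long answer still scores higher than an incorrect short one when `penalty_weight` is below 1.0.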
Challenges and Future Directions
Despite significant progress, several challenges remain. Current evaluations often focus solely on reasoning benchmarks, overlooking the impact on a model’s fundamental capabilities. There’s also a lack of unified evaluation standards, making it difficult to compare different methods.
Future research needs to focus on:
- Model Capability-aware Reasoning: Beyond just input difficulty, adaptive thinking should consider the LRM’s inherent knowledge and problem-solving abilities for specific questions.
- Human Preference-aware Reasoning: Aggressively shortening reasoning chains can reduce interpretability, which is a key strength of LRMs. Future work should ensure that reasoning processes align with human cognitive patterns and preferences, especially in sensitive domains like medicine or finance.
- Trustworthy Reasoning: The impact of concise and adaptive thinking on hallucination (e.g., “reasoning hallucination,” where the steps are plausible but factually incorrect), safety, and instruction following needs more attention. Models with stronger reasoning capability sometimes become less compliant with instructions, and the trade-off between reasoning capability and instruction adherence is a critical area for future study.
By addressing these challenges, researchers aim to develop LRMs that are not only powerful but also efficient, interpretable, and reliable for real-world applications.


