Beyond Discrete Tokens: Enhancing Reasoning in LLMs with Randomized Soft Thinking

TLDR: Large Language Models (LLMs) using “Soft Thinking” (reasoning with continuous probability distributions instead of discrete tokens) were thought to explore multiple reasoning paths simultaneously. However, this paper reveals that vanilla Soft Thinking acts greedily, predominantly relying on the most probable token, limiting its effectiveness. To overcome this “Greedy Pitfall,” the researchers introduce randomness using techniques like Gumbel-Softmax, which significantly improves performance by allowing LLMs to truly leverage the continuous concept space and explore diverse reasoning paths.

Large Language Models (LLMs) have made remarkable strides in various tasks, largely due to techniques like Chain-of-Thought (CoT) reasoning. CoT allows LLMs to break down complex problems into intermediate steps, much like human thought. However, traditional CoT relies on generating discrete, distinct tokens, which can limit the model’s ability to explore alternative solutions and reason beyond the confines of natural language.

Drawing inspiration from human cognition, where thought often involves abstract and fluid concepts, researchers have been exploring “Soft Thinking” or “Latent CoT” approaches. These methods aim to enable LLMs to reason within a continuous concept space by using hidden states or probability distributions (Soft Tokens) instead of discrete tokens. The idea is that Soft Tokens can carry more information and allow LLMs to explore multiple potential reasoning paths simultaneously.
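To make the idea of a Soft Token concrete, here is a minimal sketch. It assumes the soft input is formed as a probability-weighted average of token embeddings, which is one common way to feed a distribution back into a model; the article itself does not spell out this detail. The function name and the toy embedding table are hypothetical.

```python
from typing import List


def soft_token_embedding(probs: List[float], embeddings: List[List[float]]) -> List[float]:
    """Mix token embeddings using the next-token probabilities as weights.

    Assumption: a Soft Token enters the model as sum_i p_i * e_i, where e_i
    is the embedding of token i. A discrete token would use a single e_i.
    """
    if len(probs) != len(embeddings):
        raise ValueError("need exactly one probability per embedding row")
    dim = len(embeddings[0])
    mixed = [0.0] * dim
    for p, vec in zip(probs, embeddings):
        for d in range(dim):
            mixed[d] += p * vec[d]
    return mixed


# Toy vocabulary of three tokens with 2-dimensional embeddings (made-up numbers).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
next_token_probs = [0.5, 0.25, 0.25]

soft_input = soft_token_embedding(next_token_probs, toy_embeddings)  # blend of all three tokens
hard_input = toy_embeddings[0]                                       # discrete choice: top-1 token only
```

In this toy case the soft input carries a contribution from every token, while the discrete input keeps only the embedding of the most probable one.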

The Unexpected Reality: LLMs Have a ‘Heart of Stone’

Contrary to the common belief that Soft Thinking inherently allows for the simultaneous exploration of diverse reasoning paths, a recent research paper titled “LLMs Have a Heart of Stone: Demystifying the Soft Thinking Ability of Large Reasoning Models” reveals a different story. The authors, Chünhung Wu, Jinliang Lu, Zixuan Ren, Gangqiang Hu, Zhi Wu, Dai Dai, and Hua Wu from Baidu Inc., found that vanilla Soft Thinking implementations often underperform compared to traditional discrete token thinking.

Through a series of probing techniques, the researchers discovered that LLMs, when presented with Soft Tokens, predominantly rely on the single most influential component (the token with the highest probability) in the soft input for subsequent decoding steps. This creates a feedback loop, essentially reducing vanilla Soft Thinking to a form of greedy decoding. This “Greedy Pitfall” prevents the LLM from fully leveraging the richer information contained in Soft Tokens and exploring alternative reasoning trajectories.

For instance, their analysis showed that the model’s prediction behavior during a Soft Thinking step closely aligns with what would happen if only the highest-probability token were considered. The influence of the second-highest-probability token was found to be very limited. Furthermore, by tracking the internal hidden states, they observed that while the model might initially consider multiple paths, it quickly prunes away less dominant ones, favoring the most self-assured reasoning path.
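One way to quantify how closely two prediction distributions align is the Jensen-Shannon divergence. The sketch below is an illustration only: it is not necessarily the metric the authors used, and the three distributions are made-up numbers standing in for the model's next-step predictions after a soft input, after the top-1 token alone, and after the top-2 token alone.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    mid = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


# Hypothetical next-step predictions (not taken from the paper).
after_soft_token = [0.70, 0.20, 0.10]
after_top1_only = [0.72, 0.18, 0.10]
after_top2_only = [0.20, 0.70, 0.10]

gap_to_top1 = js_divergence(after_soft_token, after_top1_only)
gap_to_top2 = js_divergence(after_soft_token, after_top2_only)
```

A small gap to the top-1 prediction and a much larger gap to the top-2 prediction is the kind of pattern the greedy behavior described above would produce.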

Unleashing Potential: The Power of Randomness

Recognizing that the issue wasn’t with Soft Thinking itself but with its inherent greedy tendency, the researchers explored ways to introduce randomness into the process. The goal was to generate “randomized Soft Tokens” that are still valid probability distributions, reflect the original predictive information, and remain ‘soft’ (not collapsing into a single, discrete token).

Two main approaches were investigated: Dirichlet Sampling and the Gumbel-Softmax trick. Both methods aim to inject randomness into the Soft Token generation process. Experiments conducted on various reasoning benchmarks using mainstream LLMs like DeepSeek-R1-Distill-Qwen-32B, QwQ-32B, and Skywork-OR1-32B demonstrated significant improvements.
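As a rough sketch of the Dirichlet idea, the snippet below draws a randomized Soft Token from a Dirichlet distribution centered on the model's predicted probabilities. The parameterization (a single `concentration` value multiplied by each probability) and the function name are assumptions made for illustration; the article does not give the paper's exact formulation.

```python
import random
from typing import List, Optional


def dirichlet_soft_token(
    probs: List[float],
    concentration: float = 10.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sample a distribution from Dirichlet(concentration * probs).

    A Dirichlet sample can be built by normalizing independent Gamma draws.
    Larger `concentration` keeps samples closer to `probs`; smaller values
    add more randomness. Tokens with zero probability stay at zero.
    """
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    rng = rng or random.Random()
    draws = [rng.gammavariate(concentration * p, 1.0) if p > 0.0 else 0.0 for p in probs]
    total = sum(draws)
    if total == 0.0:
        # Degenerate draw (practically never happens): fall back to the input.
        return list(probs)
    return [d / total for d in draws]


rng = random.Random(0)
predicted = [0.6, 0.3, 0.1]
sample_a = dirichlet_soft_token(predicted, concentration=10.0, rng=rng)
sample_b = dirichlet_soft_token(predicted, concentration=10.0, rng=rng)
```

Each call returns a different but still valid probability distribution, which is the property the randomized Soft Tokens are meant to have.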

Both randomized Soft Thinking approaches outperformed vanilla Soft Thinking, effectively mitigating the “Greedy Pitfall.” Notably, the Gumbel-Softmax trick consistently achieved performance gains that surpassed even discrete Token Thinking. This method proved superior because it allows for a balanced adjustment of both randomness and ‘softness’ through a temperature hyperparameter, ensuring that the Soft Tokens remain diverse yet interpretable.
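The role of the temperature can be seen in a small sketch of the standard Gumbel-Softmax trick: add Gumbel noise to the log-probabilities, divide by a temperature, and renormalize with a softmax. The function name and default values are hypothetical, and the paper's implementation may differ in its details.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax_soft_token(
    probs: List[float],
    temperature: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return softmax((log p_i + g_i) / temperature) with g_i ~ Gumbel(0, 1).

    Low temperatures push the result toward a one-hot vector (less soft);
    high temperatures flatten it. The noise g_i supplies the randomness.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if not any(p > 0.0 for p in probs):
        raise ValueError("at least one probability must be positive")
    rng = rng or random.Random()
    scores = []
    for p in probs:
        if p <= 0.0:
            scores.append(-math.inf)  # impossible tokens stay impossible
            continue
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel_noise = -math.log(-math.log(u))
        scores.append((math.log(p) + gumbel_noise) / temperature)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


predicted = [0.6, 0.3, 0.1]
# Same seed means same noise, so only the temperature differs between the two calls.
sharper = gumbel_softmax_soft_token(predicted, temperature=0.1, rng=random.Random(1))
softer = gumbel_softmax_soft_token(predicted, temperature=1.0, rng=random.Random(1))
```

With identical noise, the low-temperature result concentrates more mass on its largest entry than the high-temperature one, which is the randomness-versus-softness trade-off the temperature controls.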

The paper also provides theoretical justification for the Gumbel-Softmax trick’s optimality, linking it to Luce’s choice axiom, which ensures that selection probabilities accurately reflect the relative preferences among items. This theoretical backing, combined with strong experimental results, positions the Gumbel-Softmax trick as an effective way to unlock the true potential of Soft Thinking.
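A well-known property behind this connection is the Gumbel-max trick: picking the index that maximizes log-probability plus Gumbel noise selects each token with exactly its original probability, so relative preferences between tokens are preserved. The snippet below is an empirical check of that property on a toy distribution; it is an illustration with hypothetical helper names, not a reproduction of the paper's proof.

```python
import math
import random
from collections import Counter
from typing import List


def gumbel_max_sample(probs: List[float], rng: random.Random) -> int:
    """Return argmax_i (log p_i + g_i), with g_i ~ Gumbel(0, 1)."""
    best_index, best_score = -1, -math.inf
    for i, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability tokens can never win
        u = max(rng.random(), 1e-12)  # guard against log(0)
        score = math.log(p) - math.log(-math.log(u))
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def empirical_frequencies(probs: List[float], n_samples: int, seed: int = 0) -> List[float]:
    """Estimate how often each index wins over n_samples independent draws."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(probs, rng) for _ in range(n_samples))
    return [counts[i] / n_samples for i in range(len(probs))]


predicted = [0.6, 0.3, 0.1]
frequencies = empirical_frequencies(predicted, n_samples=20000)
```

The observed frequencies should land close to the original probabilities, up to sampling noise.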

Looking Ahead

This research deepens our understanding of how LLMs reason internally and highlights a critical limitation in previous Soft Thinking implementations. By identifying and addressing the “Greedy Pitfall” through controlled randomness, particularly with the Gumbel-Softmax trick, the paper paves the way for more effective and robust reasoning capabilities in large language models. This work not only enhances current Soft Thinking approaches but also lays a foundation for future advancements, including reinforcement learning training for LLMs. You can read the full research paper here.
