Beyond Discrete Tokens: Enhancing Reasoning in LLMs with Randomized Soft Thinking

TLDR: Large Language Models (LLMs) using “Soft Thinking” (reasoning with continuous probability distributions instead of discrete tokens) were thought to explore multiple reasoning paths simultaneously. However, this paper reveals that vanilla Soft Thinking acts greedily, predominantly relying on the most probable token, limiting its effectiveness. To overcome this “Greedy Pitfall,” the researchers introduce randomness using techniques like Gumbel-Softmax, which significantly improves performance by allowing LLMs to truly leverage the continuous concept space and explore diverse reasoning paths.

Large Language Models (LLMs) have made remarkable strides in various tasks, largely due to techniques like Chain-of-Thought (CoT) reasoning. CoT allows LLMs to break down complex problems into intermediate steps, much like human thought. However, traditional CoT relies on generating discrete, distinct tokens, which can limit the model’s ability to explore alternative solutions and reason beyond the confines of natural language.

Drawing inspiration from human cognition, where thought often involves abstract and fluid concepts, researchers have been exploring “Soft Thinking” or “Latent CoT” approaches. These methods aim to enable LLMs to reason within a continuous concept space by using hidden states or probability distributions (Soft Tokens) instead of discrete tokens. The idea is that Soft Tokens can carry more information and allow LLMs to explore multiple potential reasoning paths simultaneously.
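A common construction for such a Soft Token, used here only for illustration, works like this. The model is fed the probability-weighted average of its input embeddings instead of the embedding of one chosen token. The sketch below contrasts the two inputs on a made-up three-token vocabulary. The embedding values and helper names are hypothetical. Practical implementations typically also truncate the distribution (for example with top-k or top-p) before mixing, and that step is omitted here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token_embedding(probs, embedding_table):
    """Soft Token input: the probability-weighted mixture of all token embeddings."""
    dim = len(embedding_table[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding_table)) for d in range(dim)]


def discrete_token_embedding(probs, embedding_table):
    """Discrete input: the embedding of the single top-1 token (greedy choice)."""
    top = max(range(len(probs)), key=lambda i: probs[i])
    return list(embedding_table[top])


# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = softmax([2.0, 1.0, 0.0])  # roughly [0.67, 0.24, 0.09]

print(soft_token_embedding(probs, EMBEDDINGS))      # blends all three embeddings
print(discrete_token_embedding(probs, EMBEDDINGS))  # [1.0, 0.0]
```

The soft input keeps some contribution from every token, while the discrete input discards everything except the top choice.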

The Unexpected Reality: LLMs Have a ‘Heart of Stone’

Contrary to the common belief that Soft Thinking inherently allows for the simultaneous exploration of diverse reasoning paths, a recent research paper titled “LLMs Have a Heart of Stone: Demystifying the Soft Thinking Ability of Large Reasoning Models” tells a different story. The authors, Chünhung Wu, Jinliang Lu, Zixuan Ren, Gangqiang Hu, Zhi Wu, Dai Dai, and Hua Wu from Baidu Inc., found that vanilla Soft Thinking implementations often underperform traditional discrete-token thinking.

Through a series of probing techniques, the researchers discovered that LLMs, when presented with Soft Tokens, predominantly rely on the single most influential component (the token with the highest probability) in the soft input for subsequent decoding steps. This creates a feedback loop, essentially reducing vanilla Soft Thinking to a form of greedy decoding. This “Greedy Pitfall” prevents the LLM from fully leveraging the richer information contained in Soft Tokens and exploring alternative reasoning trajectories.
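A toy model can illustrate the feedback loop. The sketch below assumes a hypothetical next-step function that reacts only to the dominant component of its soft input, which is the behavior the probing suggests. Under that assumption, feeding back full distributions produces exactly the same trajectory as greedy decoding. The transition table and function names are invented for illustration and do not come from the paper.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical transition table: row = dominant input token, entries = next-token probabilities.
TRANSITIONS = [
    [0.1, 0.6, 0.3],
    [0.2, 0.1, 0.7],
    [0.5, 0.4, 0.1],
]


def toy_model_step(soft_input):
    """Toy model that only 'sees' the highest-probability component of its input
    (an assumption mirroring the probing result, not a real LLM)."""
    return TRANSITIONS[argmax(soft_input)]


def vanilla_soft_thinking(initial_dist, steps):
    """Feed the full output distribution back as the next input; record top tokens."""
    dist, path = initial_dist, []
    for _ in range(steps):
        dist = toy_model_step(dist)
        path.append(argmax(dist))
    return path


def greedy_decoding(start_token, steps):
    """Feed back a one-hot vector for the top-1 token at every step."""
    token, path = start_token, []
    for _ in range(steps):
        one_hot = [1.0 if i == token else 0.0 for i in range(len(TRANSITIONS))]
        token = argmax(toy_model_step(one_hot))
        path.append(token)
    return path


print(vanilla_soft_thinking([0.7, 0.2, 0.1], steps=6))  # [1, 2, 0, 1, 2, 0]
print(greedy_decoding(0, steps=6))                     # [1, 2, 0, 1, 2, 0]
```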

For instance, their analysis showed that the model’s prediction behavior during a Soft Thinking step closely aligns with what would happen if only the highest-probability token were considered, while the second-highest-probability token had very limited influence. Furthermore, by tracking the internal hidden states, they observed that although the model may initially consider multiple paths, it quickly prunes the less dominant ones in favor of the most self-assured reasoning path.

Unleashing Potential: The Power of Randomness

Recognizing that the issue wasn’t with Soft Thinking itself but with its inherent greedy tendency, the researchers explored ways to introduce randomness into the process. The goal was to generate “randomized Soft Tokens” that are still valid probability distributions, reflect the original predictive information, and remain ‘soft’ (not collapsing into a single, discrete token).
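The first and third requirements can be phrased as simple checks. The sketch below tests three things about a candidate Soft Token:

- it is non-negative;
- it sums to one;
- it has not collapsed to a (near) one-hot vector, using entropy as the softness measure.

The entropy threshold and function names are hypothetical choices. The second requirement (faithfulness to the original prediction) is not checked here.

```python
import math


def entropy(q):
    """Shannon entropy in nats; zero and negative entries are skipped."""
    return -sum(x * math.log(x) for x in q if x > 0)


def is_valid_soft_token(q, tol=1e-6, min_entropy=0.05):
    """Check that q is a probability distribution that has not collapsed to one-hot.
    min_entropy is a hypothetical threshold, not a value from the paper."""
    non_negative = all(x >= -tol for x in q)
    normalized = abs(sum(q) - 1.0) <= tol
    still_soft = entropy(q) >= min_entropy
    return non_negative and normalized and still_soft


print(is_valid_soft_token([0.6, 0.3, 0.1]))  # True
print(is_valid_soft_token([1.0, 0.0, 0.0]))  # False: collapsed to a discrete token
print(is_valid_soft_token([0.5, 0.4]))       # False: does not sum to 1
```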

Two main approaches were investigated: Dirichlet Sampling and the Gumbel-Softmax trick. Both inject randomness into the Soft Token generation process. Experiments on various reasoning benchmarks with mainstream LLMs such as DeepSeek-R1-Distill-Qwen-32B, QwQ-32B, and Skywork-OR1-32B showed significant improvements.
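Dirichlet sampling can be sketched as drawing a random distribution whose expected value equals the model's original prediction. The version below assumes concentration parameters proportional to the predicted probabilities, scaled by a hypothetical `concentration` knob; larger values mean less randomness. The paper's exact parameterization may differ. The code uses the standard construction of a Dirichlet sample from normalized Gamma draws.

```python
import random


def dirichlet_soft_token(probs, concentration=10.0, rng=random):
    """Randomized Soft Token q ~ Dirichlet(concentration * probs).

    E[q] equals probs; a larger (hypothetical) concentration value keeps samples
    closer to probs, i.e. injects less randomness. Zero-probability tokens stay at 0.
    """
    # Standard construction: normalize independent Gamma(alpha_i, 1) draws.
    gammas = [rng.gammavariate(concentration * p, 1.0) if p > 0 else 0.0 for p in probs]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
probs = [0.6, 0.3, 0.1]
print(dirichlet_soft_token(probs, concentration=10.0, rng=rng))
```

Every sample is still a valid probability distribution, and its expected value equals the original prediction. This is how the approach keeps the original predictive information while adding randomness.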

Both randomized Soft Thinking approaches outperformed vanilla Soft Thinking, effectively mitigating the “Greedy Pitfall.” Notably, the Gumbel-Softmax trick consistently achieved performance gains that surpassed even discrete Token Thinking. This method proved superior because it allows for a balanced adjustment of both randomness and ‘softness’ through a temperature hyperparameter, ensuring that the Soft Tokens remain diverse yet interpretable.
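In its standard form, the Gumbel-Softmax trick perturbs each log-probability with independent Gumbel(0, 1) noise. It then passes the result through a temperature-scaled softmax: q_i = softmax((log p_i + g_i) / tau). Lower temperatures push q toward a one-hot sample, while higher temperatures keep it softer. The sketch below is a plain-Python illustration of those standard definitions. The function names and default temperature are illustrative, not the paper's settings.

```python
import math
import random


def gumbel_softmax_soft_token(probs, temperature=0.5, rng=random):
    """Randomized Soft Token via the Gumbel-Softmax trick:
    q_i = softmax((log p_i + g_i) / temperature), with g_i ~ Gumbel(0, 1) i.i.d.
    """
    scores = []
    for p in probs:
        # Clamp u away from 0 and 1 so the double log stays finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))  # inverse-CDF sample from Gumbel(0, 1)
        scores.append((math.log(p) + g) / temperature if p > 0 else float("-inf"))
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
probs = [0.6, 0.3, 0.1]
print(gumbel_softmax_soft_token(probs, temperature=0.1, rng=rng))  # typically close to one-hot
print(gumbel_softmax_soft_token(probs, temperature=2.0, rng=rng))  # typically much softer
```

A single temperature therefore trades off how random and how soft the resulting Soft Token is. This matches the balancing role the article attributes to the hyperparameter.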

The paper also provides theoretical justification for the Gumbel-Softmax trick’s optimality, linking it to Luce’s choice axiom, which ensures that selection probabilities accurately reflect the relative preferences among items. This theoretical backing, combined with strong experimental results, positions the Gumbel-Softmax trick as an effective way to unlock the true potential of Soft Thinking.
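One standard way to see the connection to Luce's choice axiom, not necessarily the paper's own argument, is through the Gumbel-max trick. This is the zero-temperature limit of Gumbel-Softmax. The index argmax_i (log p_i + g_i) is distributed exactly as Categorical(p), so token i is chosen with probability proportional to p_i. As a result, the odds between two tokens do not depend on which other tokens are present. The sketch below checks both properties empirically; the sample size and seeds are arbitrary.

```python
import math
import random
from collections import Counter


def gumbel_max_sample(probs, rng):
    """Gumbel-max trick: argmax_i (log p_i + g_i) with g_i ~ Gumbel(0, 1).
    probs need not be normalized, since adding a constant to every log p_i
    does not change the argmax."""
    best, best_score = None, float("-inf")
    for i, p in enumerate(probs):
        if p <= 0:
            continue
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        score = math.log(p) - math.log(-math.log(u))
        if score > best_score:
            best, best_score = i, score
    return best


def empirical_frequencies(probs, n=50_000, seed=0):
    """Fraction of n Gumbel-max draws that select each index."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(probs, rng) for _ in range(n))
    return [counts[i] / n for i in range(len(probs))]


# Selection frequencies track the probabilities themselves...
print(empirical_frequencies([0.5, 0.3, 0.2]))  # approximately [0.5, 0.3, 0.2]
# ...and dropping the third option leaves the 0.5 : 0.3 odds unchanged (about 0.625 : 0.375).
print(empirical_frequencies([0.5, 0.3]))
```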

Looking Ahead

This research deepens our understanding of how LLMs reason internally and highlights a critical limitation in previous Soft Thinking implementations. By identifying and addressing the “Greedy Pitfall” through controlled randomness, particularly with the Gumbel-Softmax trick, the paper paves the way for more effective and robust reasoning capabilities in large language models. This work not only enhances current Soft Thinking approaches but also lays a foundation for future advancements, including reinforcement learning training for LLMs. You can read the full research paper here.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]