TLDR: The Tiny Recursive Model (TRM), a new approach using a single two-layer neural network with only 7 million parameters, significantly outperforms the larger Hierarchical Reasoning Model (HRM) and many Large Language Models (LLMs) on challenging puzzle tasks like Sudoku, Maze, and ARC-AGI. TRM achieves this by simplifying recursive reasoning, eliminating complex theoretical assumptions, using a single network, and streamlining training, demonstrating that “less is more” for generalization on small datasets.
Large Language Models (LLMs) have made incredible strides, but they often struggle with complex puzzle-solving tasks like Sudoku, Maze navigation, and the ARC-AGI challenges. These problems require deep reasoning, and LLMs, which generate answers step-by-step, can easily make errors that invalidate the entire solution. While techniques like Chain-of-Thought (CoT) and Test-Time Compute (TTC) aim to improve reliability, they can be expensive, require high-quality data, and don’t always guarantee success.
An earlier approach, the Hierarchical Reasoning Model (HRM), offered a promising alternative. HRM used two small neural networks that “recurse”, repeating their operations at different frequencies, inspired by how the brain processes information. Combined with a technique called “deep supervision” (improving the answer over multiple supervised steps), HRM showed impressive results on these hard puzzle tasks, even outperforming LLMs with far fewer parameters. However, HRM was quite complex, relying on intricate biological arguments and mathematical theorems that weren’t always perfectly applicable. Its training also required a second forward pass per step for a feature called Adaptive Computational Time (ACT), making it less efficient.
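To make this concrete, here is a rough, illustrative sketch of an HRM-style recursion in PyTorch. It is not the authors' implementation: the GRU cells stand in for HRM's small transformer blocks, and the module names (`f_low`, `f_high`) and loop counts are assumptions chosen for readability.

```python
import torch
import torch.nn as nn

# Illustrative sketch of an HRM-style recursion (not the original implementation).
# A fast low-level network refines zL several times for every single update of zH
# by a slow high-level network; GRU cells stand in for HRM's small transformer blocks.

class HRMSketch(nn.Module):
    def __init__(self, d_model=128, n_low=6, n_high=3):
        super().__init__()
        self.f_low = nn.GRUCell(d_model, d_model)   # fast, low-level network (assumed form)
        self.f_high = nn.GRUCell(d_model, d_model)  # slow, high-level network (assumed form)
        self.head = nn.Linear(d_model, d_model)     # maps the high-level state to an answer
        self.n_low, self.n_high = n_low, n_high

    def forward(self, x, zL, zH):
        for _ in range(self.n_high):             # slow, high-level loop
            for _ in range(self.n_low):          # fast, low-level loop
                zL = self.f_low(x + zH, zL)      # refine zL given the input and zH
            zH = self.f_high(zL, zH)             # update zH from the refined zL
        return self.head(zH), zL, zH
```

With deep supervision, a block like this is applied for several outer steps: the latent states are detached and carried over between steps, and the answer is supervised at every step. HRM back-propagated through only the final updates of this recursion, justified by a fixed-point argument, which is exactly the machinery TRM removes.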
Introducing Tiny Recursive Models (TRM)
A new research paper, “Less is More: Recursive Reasoning with Tiny Networks,” introduces the Tiny Recursive Model (TRM), a significantly simpler and more effective approach. TRM achieves even higher generalization than HRM, using a single, much smaller network with only two layers. With just 7 million parameters, TRM outperforms many LLMs (like Deepseek R1, o3-mini, and Gemini 2.5 Pro) on ARC-AGI-1 and ARC-AGI-2, using less than 0.01% of their parameters. You can read the full paper here: Less is More: Recursive Reasoning with Tiny Networks.
Simplifying Recursive Reasoning
TRM addresses several complexities of HRM:
- No Fixed-Point Theorem Needed: HRM relied on a mathematical theorem to justify only back-propagating through the last few steps of its recursion. TRM bypasses this by back-propagating through the entire recursion process, which, surprisingly, leads to a massive boost in performance without needing complex theoretical assumptions.
- Clearer Latent Features: HRM used two latent features, zL and zH, with a hierarchical interpretation based on biological arguments. TRM simplifies this, viewing one feature (y) as the current proposed solution and the other (z) as a latent reasoning feature. This intuitive explanation clarifies why two features are optimal without needing complex biological justifications.
- Single, Tiny Network: HRM used two separate networks, doubling its parameter count. TRM demonstrates that a single network is sufficient for both roles, iterating on the latent reasoning feature and updating the proposed solution, which significantly reduces parameters while improving generalization (see the sketch after this list).
- Less is More in Layers: Counter-intuitively, TRM found that using fewer layers (2 instead of 4) in its network, while increasing the number of recursions, led to better generalization. This suggests that for tasks with limited data, smaller networks with deep recursion can prevent overfitting.
- Efficient Training with Simplified ACT: HRM’s Adaptive Computational Time (ACT) mechanism, designed to speed up training, required an extra forward pass. TRM simplifies ACT by removing the “continue loss,” eliminating the need for this expensive second pass without compromising accuracy.
- Enhanced Stability with EMA: For datasets with little training data, TRM keeps an Exponential Moving Average (EMA) of its weights, a technique commonly used to stabilize training in other models, which improves stability and reduces overfitting (a minimal EMA update is sketched below).
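Putting these pieces together, here is a minimal sketch of what TRM's core loop could look like in PyTorch. It is an illustration of the description above, not the paper's code: the MLP blocks, the hidden size, the number of inner recursions, and the halting head are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch of TRM's core idea (not the paper's implementation).
# A single tiny network is reused both to refine the latent reasoning feature z
# and to update the proposed answer y; gradients flow through the whole recursion.

class TinyRecursiveSketch(nn.Module):
    def __init__(self, d_model=128, n_layers=2, n_inner=6):
        super().__init__()
        # One small network (two layers here), shared by every recursion step.
        self.net = nn.Sequential(
            *[nn.Sequential(nn.Linear(d_model, d_model), nn.GELU()) for _ in range(n_layers)]
        )
        self.halt_head = nn.Linear(d_model, 1)  # predicts whether to stop (simplified ACT)
        self.n_inner = n_inner

    def forward(self, x, y, z):
        # Refine the latent reasoning feature z, conditioning on the input x and current answer y.
        for _ in range(self.n_inner):
            z = self.net(x + y + z)
        # Reuse the same network once more to update the proposed answer y from z.
        y = self.net(y + z)
        return y, z, self.halt_head(y)
```

Training would run this block for several supervision steps, comparing y to the target at each step (deep supervision) and back-propagating through the entire recursion rather than only its last updates. Because the halting head replaces the ACT “continue” loss, no second forward pass is needed.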
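The EMA of the weights can be maintained in a few lines; the decay value (0.999) and the placeholder model below are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # placeholder; in practice this would be the tiny recursive network

# Keep an exponential moving average of the weights for evaluation.
ema_params = {k: v.detach().clone() for k, v in model.state_dict().items()}

def update_ema(model, ema_params, decay=0.999):
    with torch.no_grad():
        for k, v in model.state_dict().items():
            if v.dtype.is_floating_point:
                ema_params[k].mul_(decay).add_(v.detach(), alpha=1.0 - decay)
            else:
                ema_params[k].copy_(v)

# Call update_ema(model, ema_params) after each optimizer step and evaluate with ema_params.
```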
Impressive Performance Gains
TRM shows significant improvements across various benchmarks:
- On Sudoku-Extreme, TRM-MLP achieved 87.4% test accuracy, a substantial leap from HRM’s 55.0%.
- For Maze-Hard, TRM-Att reached 85.3% accuracy, compared to HRM’s 74.5%.
- On ARC-AGI-1, TRM-Att scored 44.6%, surpassing HRM’s 40.3%.
- And on the more challenging ARC-AGI-2, TRM-Att achieved 7.8% accuracy, higher than HRM’s 5.0%.
These results are particularly noteworthy because TRM achieves them with significantly fewer parameters (7M for TRM-Att vs. 27M for HRM), demonstrating remarkable efficiency.
The Future of Recursive Reasoning
TRM represents a significant step forward in solving complex reasoning tasks with small, efficient models. By simplifying the underlying mechanisms and focusing on effective recursion, it offers a powerful alternative to large, resource-intensive LLMs for specific problem types. While currently a supervised learning method, future work could explore extending TRM to generative tasks, allowing it to produce multiple possible solutions.


