TLDR: Sticker-TTS is a novel AI framework that improves the reasoning capabilities of large reasoning models (LRMs) by learning from past attempts. It uses "stickers" (distilled key conditions) to guide an iterative process involving a Sticker Extractor, Modifier, and Utilizer. This two-stage trained system (imitation learning and self-improvement) consistently outperforms existing methods on mathematical reasoning benchmarks, demonstrating enhanced efficiency and scalability by effectively leveraging historical experience.
Large reasoning models (LRMs) have shown impressive capabilities in tackling complex reasoning tasks, and their performance can often be boosted by increasing the computational resources used during inference. However, most existing test-time scaling methods rely on sampling multiple solutions independently, which is inefficient because each new attempt ignores what earlier attempts revealed.
A new framework called Sticker-TTS aims to change this by learning from historical experience. It introduces a novel approach that coordinates three collaborative LRMs to iteratively explore and refine solutions, guided by what it calls “stickers.” These stickers are essentially distilled key conditions or critical information extracted from previous reasoning attempts.
At the heart of Sticker-TTS are these “stickers,” which are compact sets of essential solution cues. They are extracted, refined, and reused across multiple rounds of reasoning. This process helps the models focus on critical information without getting bogged down by overly verbose reasoning histories or overly brief final answers that lack detail for revision.
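To make the idea concrete, a sticker can be thought of as a small structured record rather than a full reasoning trace. The sketch below is purely illustrative; the field names and `render` helper are assumptions, not the paper's actual data format.

```python
from dataclasses import dataclass, field

# Hypothetical representation of a "sticker": a compact summary of one
# reasoning round. Field names are illustrative, not from the paper.
@dataclass
class Sticker:
    key_conditions: list[str]       # distilled conditions from the problem
    strategy: str                   # primary solution strategy identified
    weaknesses: list[str] = field(default_factory=list)  # flaws spotted in the trace

    def render(self) -> str:
        """Serialize the sticker into a short prompt fragment for the next round."""
        lines = ["Key conditions:"] + [f"- {c}" for c in self.key_conditions]
        lines.append(f"Strategy: {self.strategy}")
        if self.weaknesses:
            lines += ["Known weaknesses:"] + [f"- {w}" for w in self.weaknesses]
        return "\n".join(lines)
```

Because a sticker like this is only a few lines long, passing it between rounds is far cheaper than carrying the full reasoning history.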
The Sticker-TTS framework consists of three main components:
Sticker Extractor
This component is responsible for distilling concise and relevant insights, the “stickers,” from the model’s previous reasoning steps. It captures the primary strategy and identifies weaknesses in an existing reasoning trace, summarizing them into a structured sticker.
Sticker Modifier
The Sticker Modifier examines the extracted stickers for any potential errors, such as computational mistakes or flaws in methodology. It then applies necessary corrections, generating a revised sticker that helps address previously identified weaknesses in the reasoning.
Sticker Utilizer
This component integrates the modified sticker with the original question and the previous answer to generate a new, enhanced reasoning path. This new path then serves as input for the next iteration, allowing for progressive refinement of the solution.
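The three components above form a simple iterative loop. The following is a minimal sketch of that control flow, assuming four model wrappers (`solve`, `extract`, `modify`, `utilize`) that each call an underlying LRM; all names and signatures here are assumptions for illustration.

```python
def sticker_tts(question, solve, extract, modify, utilize, rounds=3):
    """Illustrative Sticker-TTS loop: extract, correct, and reuse stickers."""
    answer = solve(question)                          # initial long-form attempt
    for _ in range(rounds):
        sticker = extract(question, answer)           # Sticker Extractor: distill key cues
        sticker = modify(question, sticker)           # Sticker Modifier: fix errors in the sticker
        answer = utilize(question, sticker, answer)   # Sticker Utilizer: regenerate an improved answer
    return answer
```

Each iteration feeds the revised sticker and previous answer back in, so later rounds build on (rather than discard) earlier work.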
To make this collaborative framework effective, Sticker-TTS employs a two-stage optimization strategy. The first stage involves imitation learning, where the extractor and modifier are trained on a dataset of distilled examples. The second stage uses self-improvement, where the full framework generates its own reasoning trajectories, which are then filtered and used to iteratively refine the modules. This cycle of generation and retraining continuously improves the model’s reasoning ability.
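The self-improvement stage described above can be sketched as a generate-filter-retrain cycle. In this hedged sketch, `run_framework` and `is_correct` are hypothetical stand-ins for running the full pipeline and verifying an answer against ground truth.

```python
def self_improvement_round(problems, run_framework, is_correct):
    """Collect verified trajectories to fine-tune the extractor and modifier.

    run_framework(problem) -> (trajectory, final_answer)  # hypothetical interface
    is_correct(problem, answer) -> bool                   # answer verification
    """
    kept = []
    for problem in problems:
        trajectory, answer = run_framework(problem)
        if is_correct(problem, answer):          # filter: keep only successful runs
            kept.append((problem, trajectory))   # reuse as training data
    return kept
```

Repeating this cycle lets the modules train on progressively better trajectories generated by their own improved versions.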
Extensive evaluations were conducted on challenging mathematical reasoning benchmarks, including AIME-24, AIME-25, and OlymMATH. Sticker-TTS consistently outperformed strong baselines, such as self-consistency methods and advanced reinforcement learning approaches, even when using comparable computational resources. For instance, it achieved a 12.42% relative improvement over self-consistency on AIME-25 with a 7B model.
The framework also demonstrated scalability across different model sizes, showing considerable improvements with both 7B and 32B parameter variants. This indicates that Sticker-TTS can effectively divide labor among its components, regardless of the base model’s size, leading to coherent collaboration and specialized task execution.
Furthermore, Sticker-TTS proved to be highly efficient in its reasoning. The “stickers” themselves are lightweight and incur minimal overhead, as they replace lengthy reasoning traces. The main computational cost comes from the Sticker Modifier and Utilizer. This efficiency means that the total reasoning cost for N iterations is comparable to generating 2N long-form solutions, allowing Sticker-TTS to achieve superior performance with a favorable reasoning cost.
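The cost claim above reduces to simple arithmetic: if sticker extraction is negligible and each round costs roughly one Modifier call plus one Utilizer call (each about one long-form generation), then N rounds cost about 2N long-form generations. A back-of-the-envelope model, under exactly those assumptions:

```python
def approx_cost(n_rounds: int, long_form_cost: float = 1.0) -> float:
    """Rough cost model: each round ~ one Modifier + one Utilizer generation,
    with sticker extraction treated as free. Purely illustrative."""
    return 2 * n_rounds * long_form_cost
```

Under this model, 4 rounds of Sticker-TTS cost about as much as 8 independent long-form samples, which is the budget a comparison against self-consistency would match.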
The research highlights the effectiveness of using sticker-guided historical experience to enhance test-time scaling. For more technical details, you can refer to the original research paper.


