TLDR: A new research paper introduces ‘Chain-of-Edits’ (CoE), a method that lets small language models (up to 3B parameters) reason effectively by interacting with a stateful external tool through a custom command language. Benchmarked on Python code repair, CoE significantly improves the performance of smaller models, outperforming traditional ‘Chain-of-Thought’ methods and making advanced AI capabilities more accessible.
A new research paper proposes an approach that enables smaller language models (LMs) to achieve sophisticated reasoning capabilities by integrating tool usage into their generation process. Traditionally, large language models have relied on generating extensive ‘Chains-of-Thought’ (CoTs) in natural language to solve complex problems, but this strategy often proves inefficient or ineffective for more compact models.
The paper, titled “Replacing thinking with tool usage enables reasoning in small language models,” proposes a paradigm shift. Instead of generating verbose natural language thoughts, models are trained to interact with a stateful external tool, such as a text editor, through a series of structured commands. This new method is dubbed ‘Chain-of-Edits’ (CoE).
The core idea behind CoE is to format the model’s ‘thinking’ tokens as a multi-turn interaction trace with a tool. At each step, the model observes the tool’s current state (e.g., code in an editor, execution feedback) and then generates a command in a custom Domain-Specific Language (DSL) to modify that state. This constrained interaction significantly reduces the model’s action space and provides a denser reward signal, which is crucial for effective learning, especially for smaller models.
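To make that loop concrete, here is a minimal sketch of a single repair episode, assuming a toy editor that supports just two commands, REPLACE and RUN. The command names, trace layout, and test harness are illustrative assumptions; the paper defines its own DSL.

```python
# Sketch of one Chain-of-Edits episode: the model observes the editor
# state, issues a DSL command, and receives the tool's feedback.
# REPLACE/RUN are assumed commands, not the paper's actual DSL.

buggy_code = [
    "def add(a, b):",
    "    return a - b",  # bug: should be a + b
]

def apply_command(lines, command):
    """Apply one DSL command to the editor buffer and return feedback."""
    op, *args = command.split(" ", 2)
    if op == "REPLACE":           # REPLACE <line_no> <new_text>
        line_no, new_text = int(args[0]), args[1]
        lines[line_no - 1] = new_text
        return "ok"
    if op == "RUN":               # RUN: execute the buffer and check a test
        try:
            namespace = {}
            exec("\n".join(lines), namespace)
            assert namespace["add"](2, 3) == 5
            return "tests passed"
        except AssertionError:
            return "test failed: add(2, 3) != 5"
    return "unknown command"

# Each turn gives the model immediate, verifiable feedback:
print(apply_command(buggy_code, "RUN"))                         # test failed
print(apply_command(buggy_code, "REPLACE 2     return a + b"))  # ok
print(apply_command(buggy_code, "RUN"))                         # tests passed
```

Because every command either succeeds or produces concrete error feedback, the model gets a much denser learning signal than it would from free-form natural language reasoning.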
The researchers benchmarked this approach on the challenging task of repairing malfunctioning Python code. Their training pipeline involves two key stages: Supervised Fine-Tuning (SFT) on synthetically generated demonstrations of CoE usage, followed by Reinforcement Learning with Verifiable Rewards (RLVR). Notably, both stages use Low-Rank Adaptation (LoRA), a technique that fine-tunes efficiently by training only small added low-rank weight matrices rather than the full model.
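For the LoRA part of that pipeline, a setup along the following lines is typical, shown here with Hugging Face’s peft library. The checkpoint name, adapter rank, and target modules are assumptions for illustration; the paper’s exact configuration may differ.

```python
# Sketch of a LoRA fine-tuning setup (assumed hyperparameters;
# the paper's exact configuration may differ).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # assumed checkpoint

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The same adapter-based setup can then be reused for both the SFT and RLVR stages, keeping the base model frozen throughout.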
The results are particularly compelling for smaller models. The CoE approach led to significant performance improvements for models up to 3 billion parameters (1B and 3B Llama models) in code repair tasks. These models performed substantially better when using CoE compared to simply providing a direct answer or attempting natural language-based CoTs. In fact, traditional text-based CoT methods largely failed to induce reasoning behavior in these smaller models, often leading to repetitive or nonsensical outputs.
Interestingly, for a larger 8 billion parameter model, the benefits of CoE were less pronounced, and natural language reasoning (trained on a different dataset) showed better performance in some metrics. This suggests that while CoE is highly effective for smaller models, larger models might still leverage their extensive pre-training knowledge more effectively in a direct-answer or natural language reasoning setting.
Also Read:
- CodeAgents: Boosting LLM Agent Performance and Efficiency with Codified Reasoning
- Streamlining LLM Reasoning: A New Approach to Chain-of-Thought Compression
This research opens new avenues for democratizing access to advanced AI capabilities. By enabling smaller, more efficient language models to reason effectively through tool interaction, the findings could lead to more accessible and deployable AI systems for a variety of tasks. For more details, see the full research paper.