
Klear-AgentForge: A New Open-Source Pipeline for Building Versatile AI Agents

TLDR: Klear-AgentForge introduces an open-source pipeline for training high-performance AI agents, starting with the Qwen3-8B model. It uses supervised fine-tuning with synthetic data and multi-turn reinforcement learning to excel in diverse tasks like tool use and coding, achieving state-of-the-art results for its size and demonstrating competitive performance against much larger models. The research highlights the effectiveness of their training methodology and explores scaling factors for model and data, as well as the challenges and benefits of different RL strategies and test-time scaling.

The world of artificial intelligence is rapidly evolving, with a growing focus on ‘agentic’ models. Unlike traditional AI that gives a single response, agentic models can act autonomously over multiple steps to achieve complex goals, much like a human problem-solver. This capability is particularly valuable in areas like coding, where tasks often require planning, execution, and self-correction through several reasoning cycles.

However, developing these sophisticated AI agents, especially open-source ones, has been challenging due to the lack of detailed post-training methodologies. This is where Klear-AgentForge steps in, presenting a comprehensive and fully open-source pipeline for training high-performance agentic models.

Introducing Klear-AgentForge

Developed by the Klear Team at Kuaishou Technology, Klear-AgentForge is a new framework designed to enhance the agentic capabilities of large language models (LLMs). The project specifically focuses on building Klear-Qwen3-AgentForge, starting from the Qwen3-8B base model. The core idea is to unlock the potential for diverse agentic tasks, including tool use and coding, through a two-stage training process.

The Training Recipe: SFT and RL

The training methodology behind Klear-AgentForge involves two key phases:

1. Supervised Fine-Tuning (SFT): This initial stage involves training the model on a massive dataset of around 2.4 billion tokens. This data is a mix of high-quality open-source datasets and specially synthesized data for agentic tool use and coding. For tool-use, multi-turn prompting with powerful LLMs is used to generate new tools, tasks, and conversations. For coding, data includes problems from code contests and software engineering tasks from GitHub repositories, with a focus on creating ‘buggy code – fix patch – tests passing’ triplets to teach the model how to correct errors.
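The 'buggy code – fix patch – tests passing' idea can be sketched as a simple data record. This is a hypothetical illustration of what one such training triplet might look like; the field names and schema are invented here, not taken from the paper:

```python
# Hypothetical sketch of one 'buggy code - fix patch - tests passing'
# SFT triplet. Field names are illustrative, not the paper's actual schema.

def make_triplet(buggy_code: str, fix_patch: str, tests: list) -> dict:
    """Bundle a bug, its fix (as a diff), and the tests verifying the fix."""
    return {
        "buggy_code": buggy_code,
        "fix_patch": fix_patch,
        "tests": tests,
    }

triplet = make_triplet(
    buggy_code="def add(a, b):\n    return a - b  # bug: wrong operator",
    fix_patch="-    return a - b\n+    return a + b",
    tests=["assert add(2, 3) == 5"],
)
```

Pairing each patch with passing tests gives the model a verifiable signal that the fix is actually correct, rather than merely plausible-looking.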

2. Reinforcement Learning (RL): Following SFT, the model undergoes multi-turn reinforcement learning. This is crucial for agentic tasks, as it allows the model to learn from interactions with various environments over longer sequences of actions. To overcome the challenge of ‘sparse rewards’ (where feedback is only available at the end of a long sequence), Klear-AgentForge uses a fine-grained reward mechanism that provides localized feedback at intermediate actions. The training also employs a ‘disaggregated architecture’ to improve efficiency, separating the process of generating model responses (rollouts) from the actual training updates.
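The fine-grained reward idea can be illustrated with a minimal sketch: instead of a single reward at the end of a long trajectory, each intermediate action receives localized feedback. The scoring rules and trajectory structure below are invented for illustration and are not the paper's actual reward function:

```python
# Illustrative sketch of fine-grained rewards over a multi-turn trajectory.
# Scoring values and the 'tool_call_valid' field are stand-ins, not the
# actual reward design from the paper.

def fine_grained_rewards(actions, final_success):
    """Assign a small localized reward per action, plus a terminal reward."""
    rewards = []
    for act in actions:
        # Reward well-formed intermediate steps, penalize malformed ones.
        rewards.append(0.1 if act.get("tool_call_valid") else -0.1)
    # Add the sparse end-of-trajectory outcome reward on top.
    rewards[-1] += 1.0 if final_success else 0.0
    return rewards

trajectory = [
    {"tool_call_valid": True},
    {"tool_call_valid": False},
    {"tool_call_valid": True},
]
print(fine_grained_rewards(trajectory, final_success=True))
```

The point of the sketch: a dense per-step signal gives the policy gradient something to work with at every turn, rather than only at the end of a long action sequence.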

Performance and Key Findings

Klear-AgentForge-8B has demonstrated impressive results across various agentic benchmarks. It significantly outperforms official post-trained Qwen3-8B models in both ‘Thinking’ and ‘Non-Thinking’ modes. The model shows strong performance in tool-use benchmarks like BFCL v3 and τ-bench (Retail and Airline domains), as well as coding benchmarks such as SWE-bench Verified and Aider Polyglot. Notably, Klear-AgentForge-8B, despite being an 8B parameter model, competes effectively with much larger models, even matching the performance of some 32B systems in coding tasks.

The research also explored several scaling factors:

  • Model Scaling: While larger models generally perform better, the 8B models showed a more significant performance gain through in-domain data fine-tuning, suggesting that smaller models can rapidly enhance their agentic capabilities.
  • Data Scaling: Increasing both the number of unique prompts and the number of trajectories per prompt led to similar improvements in model accuracy.
  • Reasoning Data: Interestingly, incorporating reasoning-focused SFT data did not directly enhance agentic capabilities; in fact, it led to performance drops, indicating that a careful design of data mix and training strategies is needed for models to excel in both reasoning and agentic tasks.

In the RL analysis, the disaggregated training framework proved to be more efficient, boosting training speed by about 32%. The study also compared multi-task RL training with a model merging strategy. While both approaches yielded performance gains, multi-task RL training requires careful monitoring to prevent training collapse and does not inherently produce a synergistic boost across tasks. Model merging, while efficient, sometimes led to slight performance drops in coding tasks, possibly due to the imbalance in training data volume.
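The model-merging strategy mentioned above can be sketched as a weighted average of parameters from task-specialized models. This is a minimal illustration of the general idea, assuming a simple uniform average; the study's actual merging scheme and weighting are not specified here, and plain Python lists stand in for parameter tensors:

```python
# Minimal sketch of model merging: averaging parameters of task-specialized
# models. Weights and structure are illustrative; real merges operate on
# full tensor state dicts and may use non-uniform or more advanced schemes.

def merge_models(state_dicts, weights=None):
    """Weighted average of per-parameter vectors (lists stand in for tensors)."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n  # default: uniform average
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

tool_model = {"layer.w": [1.0, 2.0]}  # hypothetical tool-use specialist
code_model = {"layer.w": [3.0, 4.0]}  # hypothetical coding specialist
print(merge_models([tool_model, code_model]))  # {'layer.w': [2.0, 3.0]}
```

Averaging is cheap compared with joint multi-task RL, which is why merging is attractive, but as the article notes it can dilute the specialist that saw less training data.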

Test-time scaling, where multiple candidate solutions are generated and then selected, showed that increasing candidate diversity improves solution coverage. However, the overall performance gains are still limited by the effectiveness of the verification strategy. The ‘Agent Confidence Select’ method, which uses internal confidence estimation, showed the most stable improvements.
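The selection step in test-time scaling can be sketched as best-of-N with a confidence-based verifier, in the spirit of the ‘Agent Confidence Select’ method described above. The candidate solutions and confidence scores below are stand-ins; the actual confidence-estimation procedure is internal to the model and not reproduced here:

```python
# Hedged sketch of test-time scaling: generate N candidate solutions,
# then select the one with the highest internal confidence score.
# Candidates and scores are invented stand-ins for illustration.

def select_by_confidence(candidates):
    """candidates: list of (solution, confidence) pairs; pick the max."""
    return max(candidates, key=lambda c: c[1])[0]

candidates = [
    ("patch_a", 0.62),
    ("patch_b", 0.91),
    ("patch_c", 0.47),
]
print(select_by_confidence(candidates))  # patch_b
```

This makes the article's caveat concrete: generating more diverse candidates raises the chance a correct solution exists in the pool, but the realized gain depends entirely on whether the verifier (here, the confidence score) actually ranks that solution first.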


Looking Ahead

The Klear-AgentForge project aims to continue exploring ways to elicit multi-domain agentic abilities through post-training scaling. Future work will focus on ‘mid-training’ to smoothly transition base models to agentic ones, developing longer and broader RL training methods, and further researching small agentic LLMs. The team believes that while larger models currently show stronger performance, small language models offer a more compelling path for agentic AI due to their efficiency and suitability for high-volume use. A promising direction might be to train specialized small models for specific tasks and then use a meta-agent to intelligently combine their strengths.

For more in-depth technical details, you can read the full research paper here: Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
