TLDR: A new research paper introduces AsyncThink, a novel reasoning paradigm where large language models (LLMs) learn to organize their internal thinking processes into concurrently executable structures. Through an ‘organizer-worker’ protocol and a two-stage reinforcement learning approach, AsyncThink enables LLMs to dynamically assign sub-queries, merge knowledge, and produce coherent solutions collaboratively. This method significantly improves accuracy and reduces inference latency compared to sequential and parallel thinking, and demonstrates strong generalization to unseen tasks, paving the way for more efficient and adaptive AI agentic organizations.
A new vision for artificial intelligence, termed ‘agentic organization,’ is emerging, where AI agents collaborate to solve complex problems that go beyond what a single AI can achieve. This approach aims to create organizational systems where multiple agents work together, much like a team of humans, to tackle challenges more effectively.
While large language models (LLMs) have shown impressive individual reasoning abilities, enabling them to work collaboratively as an organized system has presented several hurdles. Traditional parallel thinking methods, which run multiple independent thought processes and then combine their results, often suffer from high latency because they are limited by the slowest process and the time taken for final aggregation. Furthermore, these methods typically rely on fixed, pre-designed workflows that struggle to adapt to the diverse requirements of different tasks.
To overcome these limitations, researchers at Microsoft Research have introduced a new reasoning paradigm called Asynchronous Thinking, or AsyncThink. This innovative approach teaches large language models to organize their internal thinking processes into structures that can be executed concurrently. The goal is to allow LLMs to learn how to dynamically manage and coordinate multiple thought processes, leading to more efficient and accurate problem-solving.
How AsyncThink Works: The Organizer-Worker Protocol
At the heart of AsyncThink is an ‘organizer-worker’ thinking protocol. In this setup, a single LLM plays two distinct roles: an organizer and multiple workers. The organizer acts as the central coordinator, dynamically structuring the thinking process, while the workers execute individual sub-queries assigned by the organizer.
The organizer uses a set of specific actions to manage the workflow:
- Think: The organizer advances its own decoding process.
- Fork: The organizer assigns a sub-query to an available worker. This allows a new thinking job to begin concurrently.
- Join: The organizer requests the output from a previously ‘Forked’ thinking job. If the worker is still processing, the organizer pauses and waits for the result, which is then integrated into its own context.
- Answer: The organizer terminates the process and provides the final solution.
Workers, on the other hand, receive sub-queries from the organizer and independently carry out their thinking tasks. Once a worker completes its task, it sends the result back to the organizer. This protocol allows for flexible and dynamic reasoning, enabling the model to explore various execution structures.
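The control flow of the protocol can be illustrated with a small sketch. This is not the paper's implementation: in AsyncThink the organizer and workers are the same LLM emitting actions as text, whereas here a thread pool stands in for the workers and the hypothetical `worker` function stands in for a worker's decoding. The class name `Organizer`, the job ids, and the string results are all assumptions made for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def worker(sub_query: str) -> str:
    """Hypothetical stand-in for a worker thinking about one sub-query."""
    return f"result({sub_query})"


class Organizer:
    """Toy organizer exposing the four actions named in the article."""

    def __init__(self, pool: ThreadPoolExecutor) -> None:
        self.pool = pool
        self.jobs: dict[str, Future] = {}   # forked jobs not yet joined
        self.context: list[str] = []        # the organizer's own context

    def think(self, text: str) -> None:
        # Think: the organizer advances its own decoding.
        self.context.append(text)

    def fork(self, job_id: str, sub_query: str) -> None:
        # Fork: hand a sub-query to a worker and return immediately.
        self.jobs[job_id] = self.pool.submit(worker, sub_query)

    def join(self, job_id: str) -> str:
        # Join: wait for the forked job if needed, then merge its output.
        result = self.jobs.pop(job_id).result()
        self.context.append(result)
        return result

    def answer(self) -> str:
        # Answer: terminate and emit the final solution.
        return " | ".join(self.context)


def run_demo() -> str:
    with ThreadPoolExecutor(max_workers=2) as pool:
        org = Organizer(pool)
        org.think("plan")
        org.fork("a", "left half")
        org.fork("b", "right half")
        org.think("check constraints")  # overlaps with the running workers
        org.join("a")
        org.join("b")
        return org.answer()


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the ordering: the organizer keeps thinking between Fork and Join, so its own work overlaps with the workers' instead of waiting for every branch to finish first.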
Learning to Organize: A Two-Stage Training Process
AsyncThink models are trained using a two-stage procedure:
- Cold-Start Format Fine-Tuning: Initially, the model is fine-tuned on synthetic data to learn the syntax and format of the AsyncThink actions (Fork, Join, etc.). This stage teaches the model how to use the protocol correctly, even if it doesn’t yet understand how to solve problems with it.
- Reinforcement Learning: In the second stage, the model is further optimized using reinforcement learning. It explores different thinking structures and refines its asynchronous thinking capabilities. The training is guided by a reward system that encourages:
  - Accuracy: Rewards for correct final answers.
  - Format Compliance: Penalties for errors in using the Fork-Join protocol.
  - Thinking Concurrency: Rewards for efficiently organizing thinking processes into concurrently executable parts, promoting parallel work.
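As a rough illustration of how three such signals could be folded into one scalar reward, consider the sketch below. The article does not say how the paper combines or weights them, so the additive form, the `format_penalty` and `concurrency_weight` values, and the assumption that concurrency is scored as a fraction between 0 and 1 are all hypothetical.

```python
def combined_reward(
    is_correct: bool,
    format_errors: int,
    concurrency: float,
    format_penalty: float = 0.5,      # hypothetical penalty per protocol error
    concurrency_weight: float = 0.2,  # hypothetical weight on concurrency
) -> float:
    """Toy reward: accuracy bonus, format penalty, concurrency bonus."""
    if not 0.0 <= concurrency <= 1.0:
        raise ValueError("concurrency is assumed to be a fraction in [0, 1]")
    if format_errors < 0:
        raise ValueError("format_errors must be non-negative")

    accuracy_term = 1.0 if is_correct else 0.0          # correct final answer
    format_term = -format_penalty * format_errors       # Fork-Join misuse
    concurrency_term = concurrency_weight * concurrency  # parallel structure
    return accuracy_term + format_term + concurrency_term


if __name__ == "__main__":
    # A correct, well-formed, fully concurrent rollout scores highest.
    print(combined_reward(True, 0, 1.0))
    # A wrong answer with two protocol errors is penalized.
    print(combined_reward(False, 2, 0.0))
```

In this toy weighting the concurrency bonus is kept smaller than the accuracy term, so a rollout cannot score well by forking aggressively while getting the answer wrong.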
Impressive Results and Generalization
Experiments conducted on tasks such as multi-solution countdown, mathematical reasoning (AMC-23 and AIME-24), and Sudoku demonstrated that AsyncThink consistently achieves higher accuracy while significantly reducing latency compared to traditional sequential and parallel thinking models. For instance, it achieved 28% lower inference latency than parallel thinking while improving accuracy on mathematical reasoning.
Remarkably, AsyncThink also showed strong generalization capabilities. Models trained solely on simple countdown data were able to perform zero-shot asynchronous thinking on previously unseen tasks, like Sudoku, with superior performance and lower latency. This indicates that AsyncThink learns a generalizable organization policy rather than just task-specific patterns.
The research highlights that short, organized thinking fragments can collectively achieve high problem-solving quality under a learned organization policy. This approach allows LLMs to adapt, explore, and optimize their thinking behavior according to task demands, distributing work across multiple agents rather than relying on a single, linear thought process.
This work formalizes the ‘learning-to-organize’ problem, introducing a new reasoning paradigm that allows LLMs to learn how to structure their internal thinking into concurrently executable processes through reinforcement learning. For more details, refer to the full research paper.
Future Directions for Agentic Organization
The researchers envision several exciting future directions for agentic organization:
- Scaling with Massive and Diverse Agents: Exploring how accuracy and latency evolve with hundreds or thousands of workers, and integrating heterogeneous expert agents equipped with various external tools (e.g., code interpreters, web search).
- Recursive Agentic Organization: Allowing workers to become sub-organizers, creating hierarchical structures for deeply nested and complex problems.
- Human-AI Agentic Organization: Integrating humans directly into the organizational framework, either as organizers delegating tasks to AI workers or as workers providing human judgment for AI-forked tasks.
These advancements pave the way for a new era of AI where collective intelligence and learned organizational strategies enable unprecedented problem-solving capabilities.


