TLDR: Dingtalk-DeepResearch is a new multi-agent AI framework from Alibaba Group’s Dingtalk, designed for complex enterprise tasks. It unifies deep research, heterogeneous table reasoning, and multimodal report generation. Key features include an entropy-guided, memory-aware online learning mechanism for continuous adaptation without LLM retraining, and DingAutoEvaluator, an automated evaluation engine that drives improvement through a data flywheel. The framework is already deployed in corporate workflows and will soon be available as a service, offering robust and adaptive intelligence for evolving business needs.
The Industrial Brain Team at Dingtalk, Alibaba Group, has unveiled a groundbreaking multi-agent intelligence framework called Dingtalk-DeepResearch. This innovative system is designed to tackle the complex and ever-evolving demands of real-world enterprise environments, offering advanced capabilities in deep research, heterogeneous table reasoning, and multimodal report generation.
Unlike traditional, static AI architectures, Dingtalk-DeepResearch introduces a dynamic approach where its agents can continuously learn and adapt. This is achieved through an entropy-guided, memory-aware online learning mechanism. Essentially, the system intelligently retrieves valuable past experiences from an episodic memory bank and explores diverse historical contexts. This process refines the agents’ reasoning and planning abilities without the need to retrain the underlying large language model (LLM), ensuring remarkable adaptability to new and changing tasks.
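The paper does not disclose the exact retrieval algorithm, but the core idea, widening exploration of past episodes when no single memory clearly matches the current task, can be sketched roughly as follows. Everything here is an illustrative stand-in, not Dingtalk's implementation: the Jaccard `similarity`, the softmax temperature, and the `explore_threshold` are invented for the example.

```python
import math

def similarity(a: str, b: str) -> float:
    """Jaccard word overlap as a toy stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def softmax(xs, temp=0.2):
    """Temperature-scaled softmax over raw relevance scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(ps):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in ps if p > 0)

def retrieve(memory, query, k=2, explore_threshold=0.9):
    """Entropy-guided episodic retrieval: when relevance scores are
    nearly uniform (high entropy), widen the window to pull in more
    diverse historical contexts; otherwise exploit the best matches."""
    sims = [similarity(task, query) for task in memory]
    h = entropy(softmax(sims))
    h_max = math.log(len(memory))          # entropy of a uniform pick
    ranked = [t for t, _ in sorted(zip(memory, sims), key=lambda p: -p[1])]
    if h / h_max > explore_threshold:      # uncertain: explore widely
        return ranked[: k + 2]
    return ranked[:k]                      # confident: top-k only
```

The point of the entropy gate is that "which past experience is relevant?" is itself a decision the agent can be uncertain about, and that uncertainty, not just raw similarity, should control how broadly memory is consulted.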
Driving Continuous Improvement with DingAutoEvaluator
A cornerstone of this framework is DingAutoEvaluator, an automated evaluation engine crucial for sustained improvement. This engine employs uncertainty-aware case mining, identifying instances where the model operates at the edge of its competence. These ‘grey-zone’ outputs are then prioritized for expert review, creating a high-value feedback loop. DingAutoEvaluator utilizes multi-dimensional metrics to assess performance across various stages, including retrieval, generation, LLM performance, reasoning quality, agent orchestration, and knowledge base health. This comprehensive evaluation forms a ‘data flywheel’ that not only prevents performance regression but also enriches training data, driving a closed-loop optimization process.
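Uncertainty-aware case mining can be pictured with a small sketch: score each evaluated case with the binary entropy of its automatic verdict, which peaks where the judgment is closest to a coin flip, and route only the most ambiguous cases to human experts. The `score` fields and review budget below are made up for illustration; the paper does not specify how DingAutoEvaluator quantifies uncertainty.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of a Bernoulli(p) verdict; maximal at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mine_grey_zone(cases, budget=2):
    """Rank evaluated cases by how uncertain the automatic verdict is
    and return the `budget` most ambiguous case IDs for expert review."""
    ranked = sorted(cases, key=lambda c: -binary_entropy(c["score"]))
    return [c["id"] for c in ranked[:budget]]

cases = [
    {"id": "A", "score": 0.97},  # clearly good: no review needed
    {"id": "B", "score": 0.52},  # grey zone: near-random verdict
    {"id": "C", "score": 0.04},  # clearly bad: no review needed
    {"id": "D", "score": 0.61},  # mildly uncertain
]
```

Spending the limited expert budget only on grey-zone outputs is what makes the feedback loop "high-value": confidently right and confidently wrong cases teach the system little, while boundary cases move it the most.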
The collected cases feed directly into document-reward modeling and a multi-stage, document-level reinforcement learning process. This learning occurs across both static and live environments, significantly improving factual accuracy, structural quality, and user alignment in generated documents. Beyond document generation, the same evaluation-driven methodology is applied to complex table parsing, retrieval, and reasoning. Feedback from DingAutoEvaluator (structural-fidelity checks, context-aware decomposition, metric-guided retrieval tuning, and SQL-based symbolic verification) helps the system identify and correct errors in heterogeneous table question answering. This, in turn, fine-tunes the NL2SQL generator, yielding iterative improvements in table-reasoning accuracy and robustness.
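One concrete way to picture the SQL-based symbolic verification step: execute the generated SQL against the actual table and compare the symbolic result with the answer the model claimed, flagging mismatches for correction. This is a minimal sketch using SQLite; the table, schema, and query are invented for illustration and are not from the paper.

```python
import sqlite3

def verify_answer(rows, schema_sql, generated_sql, claimed_answer):
    """Run the NL2SQL output against the real table and check that the
    symbolic result matches the model's stated answer. A mismatch
    flags the case as a table-QA error to feed back into training."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(generated_sql).fetchone()[0]
    conn.close()
    return result == claimed_answer, result

rows = [("north", "Q1", 120), ("north", "Q2", 90), ("south", "Q1", 75)]
schema = "CREATE TABLE sales (region TEXT, quarter TEXT, units INTEGER)"
sql = "SELECT SUM(units) FROM sales WHERE region = 'north'"

ok, truth = verify_answer(rows, schema, sql, claimed_answer=210)
```

Because SQL execution is exact, this check catches hallucinated aggregates that a fluent natural-language answer would otherwise slip past an LLM-only judge.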
A Multi-Layered Architecture for Enterprise Intelligence
The Dingtalk-DeepResearch framework is structured into three distinct yet integrated layers:
- Dingtalk-DeepResearch Agent Studio: This layer provides a suite of professional agents for deep research, tabular processing, and data analytics, alongside options for customizable personal agents.
- Dingtalk-DeepResearch Core: This central component integrates context compression, reasoning and planning, long/short-term memory, human-in-the-loop control, and the self-evolution engine. It also includes integrated tools for code execution, web search, file and tabular retrieval, multimodal processing, and seamless connectivity to the enterprise ecosystem, including Dingtalk’s internal files, messages, and tasks.
- Dingtalk-DeepResearch Data Layer: This unified data backbone encompasses knowledge graphs, databases, caches, and multimodal datasets (dialogue, audio, image, video, graph, text, tabular) from business, industry, personal, and synthetic sources, enabling intelligent correlation and retrieval of diverse corporate and sector-specific data.
The framework’s ability to handle heterogeneous data sources, perform multi-step reasoning, and generate structured or multimodal outputs addresses critical limitations of existing deep research systems, which often struggle with adaptive optimization, long-term memory, and the integration of structured and unstructured data.
Real-World Applications and Future Outlook
Dingtalk-DeepResearch has already been validated in production environments, demonstrating consistent gains in accuracy, structural quality, and user alignment. It is currently operational in mission-critical corporate workflows and is slated to become available soon as a service within Dingtalk, offering broader access and hands-on experience. The research paper highlights several showcases, including complex tabular parsing, retrieval, and reasoning in manufacturing and supply-chain scenarios, as well as semantically aligned vision-language fusion for multimodal document generation, such as a Kaggle competition case study on supermarket sales forecasting and a detailed market research report on video editing software. These examples underscore the system's practical robustness and scalability in high-stakes operational contexts.
For more in-depth technical details, you can refer to the original research paper: Dingtalk-DeepResearch: A Unified Multi-Agent Framework for Adaptive Intelligence in Enterprise Environments.


