TL;DR: DRF is a new framework for multi-agent LLM systems that quantifies agent performance, tracks each agent's honesty and capability with a dynamic reputation score, and uses an Upper Confidence Bound (UCB) strategy to select agents efficiently. By identifying and prioritizing high-performing, trustworthy, and cost-effective agents, it significantly improves task completion quality and collaboration efficiency in logical reasoning and code generation.
The rapid advancement of generative AI has led to the emergence of multi-agent systems powered by large language models (LLMs). These systems are becoming increasingly adept at handling complex tasks, moving from single-model approaches to collaborative multi-agent setups. However, a significant challenge remains: accurately measuring the performance of individual agents and assessing their trustworthiness within these collaborative environments.
To address this, researchers have introduced DRF, short for LLM-Agent Dynamic Reputation Filtering. DRF aims to quantify agent performance, measure agent honesty and capability through a reputation scoring mechanism, and make agent selection more efficient with an Upper Confidence Bound (UCB)-based strategy. Experimental results indicate that DRF substantially improves task completion quality and collaboration efficiency on tasks such as logical reasoning and code generation.
Understanding the DRF Framework
DRF tackles three key limitations of many current multi-agent frameworks: heavy reliance on human experts to predefine role allocation; an implicit assumption that agents are reliable, with no defense against malicious interference or adversarial prompts; and no differentiation between agents of different capability. The framework aims to build a multi-agent system that forms teams without being tied to specific task constraints, so agents can adapt dynamically to new requirements. It also includes a mechanism for identifying capable, efficient agents while systematically eliminating underperforming or malicious participants.
The core contributions of DRF are threefold:
- It introduces an interactive rating network that dynamically assesses agent performance during task execution, providing a quantifiable measure of effectiveness.
- It develops a reputation iteration mechanism to rigorously evaluate agent reputation and capability, significantly reducing risks from low-efficiency agents.
- It integrates the rating network and reputation iteration mechanism into an adaptive UCB selection architecture, which improved both task completion quality and cost-effectiveness in the reported experiments.
How DRF Works: Key Mechanisms
DRF operates with a novel agent team structure consisting of a core agent responsible for decision-making and control, and multiple task agents that execute and evaluate tasks. Key concepts within the framework include:
- Task and Round: A task is a problem solvable in finite steps, comprising multiple subtasks, each completable in a limited number of rounds.
- Reputation: An intrinsic property reflecting an agent’s trustworthiness and capability, quantified by a reputation score.
- Accuracy: Measures how well an agent’s task completion aligns with requirements.
- Cost: The cost an agent bids (e.g., in API calls) to participate in a task round.
The framework’s primary objectives are to minimize task execution cost and maximize the average payoff (accuracy) of the team.
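The definitions above can be captured in a small data model. The sketch below is a minimal illustration, assuming a team's round is summarized by each participating agent's bid cost and accuracy, and that "average payoff" means mean accuracy over the participating agents. The names `Agent`, `RoundResult`, and `team_objectives` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical record for one task agent; field names are illustrative."""
    name: str
    reputation: float  # trustworthiness/capability score tracked across rounds
    cost: float        # bid for one task round, e.g. expected API calls


@dataclass
class RoundResult:
    """Outcome of one agent's work in one round (hypothetical structure)."""
    agent: Agent
    accuracy: float    # how well the output met the requirements, in [0, 1]


def team_objectives(results: list[RoundResult]) -> tuple[float, float]:
    """Return (total cost, average payoff) for one round of a team.

    DRF's stated goals are to minimize the first value and maximize the
    second. Averaging accuracy over participants is an assumption here.
    """
    if not results:
        raise ValueError("a round needs at least one participating agent")
    total_cost = sum(r.agent.cost for r in results)
    avg_payoff = sum(r.accuracy for r in results) / len(results)
    return total_cost, avg_payoff
```

For example, two agents bidding 3.0 and 1.0 with accuracies 1.0 and 0.5 yield a total cost of 4.0 and an average payoff of 0.75. Keeping cost and payoff as separate outputs, rather than folding them into one number, leaves the trade-off between them to the selection strategy.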
Model Construction in Detail
DRF’s model construction involves three main components:
1. LLM-Agent Rating Network: This network quantifies agent performance. When a task is published, each agent decides whether to participate. Building the network involves a “forward pass”, in which participating agents generate solutions and evaluate one another’s solutions; following Reflexion, LLMs serve as the evaluators. A “backward pass” then computes each agent’s score from the evaluations it received, weighting evaluations from higher-reputation agents more heavily.
2. LLM-Agent Reputation Iteration Mechanism: After each round, agents receive a task score. A reputation score is assigned to each agent to measure its credibility. If an agent’s task score meets or exceeds a predefined threshold, its reputation increases. Conversely, if it falls below the threshold, its reputation decreases, with a greater decline for lower scores, allowing for quicker identification of malicious or underperforming agents.
3. LLM-Agent Task Scheduling Strategy: Initially, agent attributes and reputations are unknown. DRF therefore employs a redesigned UCB (Upper Confidence Bound) algorithm, a multi-armed bandit method from reinforcement learning, to balance exploration (trying new agents) and exploitation (using agents already known to perform well). The strategy identifies agents with high reputation values and weighs both reputation and cost when selecting agents for tasks. Agents whose reputation exceeds a set threshold are deemed trustworthy and exempted from further reputation checks.
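The three mechanisms can be sketched together in a few functions. This is a minimal sketch under stated assumptions, not the paper's exact formulas: the backward pass is assumed to be a reputation-weighted mean of peer ratings; the reputation update is assumed to add a fixed step above the threshold and subtract a penalty proportional to the shortfall below it; and the UCB index is assumed to be reputation per unit cost plus the standard `sqrt(ln t / n)` exploration bonus. All names (`AgentState`, `backward_pass`, `update_reputation`, `select_agents`) and default values (`threshold=0.6`, `trust_level=0.9`, and so on) are hypothetical. The forward pass itself, in which LLMs produce and rate solutions, is represented only by its output, the `evals` dictionary.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentState:
    """Hypothetical per-agent state; names and defaults are illustrative."""
    name: str
    cost: float              # bid per round, e.g. API calls (assumed > 0)
    reputation: float = 0.5  # assumed neutral starting reputation in [0, 1]
    pulls: int = 0           # how many rounds the agent has been selected
    trusted: bool = False    # once set, reputation is no longer updated


def backward_pass(evals: dict[str, dict[str, float]],
                  agents: dict[str, AgentState]) -> dict[str, float]:
    """Turn peer ratings into one task score per rated agent.

    evals[judge][target] is the judge's rating of target's solution in [0, 1].
    Ratings are averaged with the judge's reputation as weight (assumed rule).
    """
    num: dict[str, float] = {}
    den: dict[str, float] = {}
    for judge, ratings in evals.items():
        weight = agents[judge].reputation
        for target, rating in ratings.items():
            if target == judge:
                continue  # ignore self-evaluation
            num[target] = num.get(target, 0.0) + weight * rating
            den[target] = den.get(target, 0.0) + weight
    return {t: (num[t] / den[t] if den[t] > 0 else 0.0) for t in num}


def update_reputation(agent: AgentState, score: float, threshold: float = 0.6,
                      step_up: float = 0.1, step_down: float = 0.5,
                      trust_level: float = 0.9) -> None:
    """Threshold-based reputation update with shortfall-scaled penalties."""
    if agent.trusted:
        return  # trusted agents are exempt from further reputation checks
    if score >= threshold:
        agent.reputation = min(1.0, agent.reputation + step_up)
    else:
        # The further the score falls below the threshold, the larger the drop.
        penalty = step_down * (threshold - score)
        agent.reputation = max(0.0, agent.reputation - penalty)
    if agent.reputation >= trust_level:
        agent.trusted = True


def select_agents(agents: dict[str, AgentState], k: int, t: int,
                  c: float = 1.0) -> list[str]:
    """Pick k agents by a UCB index on reputation per unit cost (round t >= 1)."""
    def index(agent: AgentState) -> float:
        if agent.pulls == 0:
            return math.inf  # explore: every agent is tried at least once
        exploit = agent.reputation / agent.cost
        explore = c * math.sqrt(math.log(t) / agent.pulls)
        return exploit + explore

    # sorted() is stable, so ties (e.g. untried agents) keep insertion order.
    chosen = sorted(agents.values(), key=index, reverse=True)[:k]
    for agent in chosen:
        agent.pulls += 1
    return [agent.name for agent in chosen]
```

One round of this sketch would call `select_agents`, collect the chosen agents' solutions and cross-ratings from the LLMs, aggregate them with `backward_pass`, and feed each score into `update_reputation`. Dividing reputation by cost means that of two equally reputable agents the cheaper one ranks higher, while the exploration bonus shrinks as an agent is selected more often; `c` controls how aggressively new agents are tried.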
Experimental Validation
The effectiveness and advantages of DRF were demonstrated through experiments on real-world datasets: HumanEval for code tasks and BigBench for logical reasoning tasks. The experiments used agents powered by the DeepSeek-R1 model, configured to simulate low, medium, and high capability levels.
Results showed that DRF effectively increased the reputation of high-capability agents and decreased that of low-capability agents within a limited number of rounds. In comparative experiments against mainstream LLM agent frameworks like DyLAN, CodeT, and Reflexion, DRF consistently outperformed them in terms of task completion quality (Pass@1 metric for code tasks and accuracy for logical reasoning) and average cost. This success is attributed to DRF’s integrated rating network and reputation iteration mechanism, which efficiently identifies high-reputation agents, and its task scheduling strategy, which prioritizes these high-reputation, low-cost agents.
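The results above report Pass@1 on HumanEval without defining it. The Codex paper that introduced HumanEval estimates pass@k from n sampled solutions per problem, c of which pass the unit tests, as 1 - C(n-c, k)/C(n, k), averaged over problems. The sketch below implements that general estimator; how many samples per problem DRF's evaluation drew is not stated here, so the `(n, c)` inputs are assumptions, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # Probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    if not results:
        raise ValueError("need at least one problem")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimate reduces to c/n per problem, and with a single sample per problem (n = 1) Pass@1 is simply the fraction of problems whose one generated solution passes its tests.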
Conclusion and Future Outlook
DRF represents a significant step forward in LLM-based multi-agent frameworks. By combining an interactive rating network, a dynamic reputation iteration mechanism, and an adaptive UCB selection strategy, it enhances both collaboration efficiency and the quality of task completion. Unlike traditional systems, DRF leverages the reputation attribute of LLM agents through a reinforcement learning approach. The framework’s strong performance in logical reasoning and code generation tasks under various conditions highlights its potential.
Future research will explore integrating more advanced reinforcement learning algorithms, such as DQN, into DRF to handle even more complex and diverse tasks. The authors also plan to investigate enhancing LLMs with an experience-augmented reputation model, offering more sophisticated agent options for challenging tasks.


