Self-Evolution Agent: A New Approach to Computer Control

TLDR: SEA is a 7-billion-parameter AI agent designed for computer use. It rests on three contributions: a closed-loop data generation pipeline that produces verifiable tasks, a step-wise reinforcement learning strategy that uses multiple reward types to overcome sparse rewards in long-horizon tasks, and a grounding-based generalization enhancement method that merges planning and perception abilities. Together, these allow SEA to outperform models of similar scale and compete with larger models on complex computer control benchmarks.

The ‘computer use agent’ is gaining traction in artificial intelligence research. These agents operate a computer on a user’s behalf to carry out tasks, a capability often viewed as a step toward more general artificial intelligence. However, current agents face three substantial hurdles: high-quality training data is difficult to acquire, rewards are sparse in long, multi-step tasks, and processing complex visual information from computer screens is computationally expensive.

To address these challenges, researchers have introduced the Self-Evolution Agent (SEA), an approach to building more robust and efficient computer use agents. Detailed in the research paper ‘SEA: Self-Evolution Agent with Step-wise Reward for Computer Use’, the agent proposes new methods for data generation, reinforcement learning, and model enhancement to improve performance.

A New Approach to Data Generation

One of the core innovations of SEA is its automatic pipeline for generating verifiable task trajectories. Unlike traditional methods that rely on costly manual annotations and static data, SEA employs a closed-loop system. This system uses a ‘Task Agent’ to generate diverse task instructions and a ‘Code Generation Agent’ to create corresponding Python programs for both executing and verifying these tasks. This ensures that every generated task comes with a clear success criterion and a programmatic way to validate its completion. Furthermore, a method called Generation and Assessment for Trajectory Extraction (GATE) refines this data by performing multiple rounds of inference, selecting the most efficient and successful trajectories, and even filtering out redundant steps to create high-quality training data.
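The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the `Step` record, its `changed_screen` flag, and the `verify` callback (standing in for the generated verification program) are all hypothetical names introduced here, and "redundant" is assumed to mean "the action did not change the screen".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Step:
    action: str           # e.g. "click(120, 40)"
    changed_screen: bool  # hypothetical flag: did the action alter the UI state?


Trajectory = List[Step]


def drop_redundant_steps(traj: Sequence[Step]) -> Trajectory:
    """Assumed redundancy rule: remove steps that had no visible effect."""
    return [step for step in traj if step.changed_screen]


def select_trajectory(
    rollouts: Sequence[Trajectory],
    verify: Callable[[Sequence[Step]], bool],
) -> Optional[Trajectory]:
    """Keep verified rollouts, prune them, and return the shortest one."""
    candidates: List[Trajectory] = []
    for traj in rollouts:
        if not verify(traj):
            continue  # failed rollouts never become training data
        pruned = drop_redundant_steps(traj)
        # Only accept the pruned version if it still passes verification.
        candidates.append(pruned if verify(pruned) else list(traj))
    return min(candidates, key=len) if candidates else None
```

Re-checking the pruned trajectory with the same verifier is a design choice made for this sketch: it keeps the filter from silently turning a successful trajectory into a failing one.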

Efficient Step-wise Reinforcement Learning

Training AI agents for long, multi-step computer tasks is notoriously difficult due to ‘sparse rewards’ – meaning the agent only receives feedback after completing an entire task, making it hard to learn from individual actions. SEA tackles this with ‘Trajectory Reasoning by Step-wise Reinforcement Learning’ (TR-SRL). Instead of waiting for the final outcome, TR-SRL provides immediate feedback at each step of a task. It incorporates three types of rewards:

Step Reward: Given for successfully completing an individual step, providing clear, immediate feedback.
Reasoning and Action Consistency Reward: Encourages the agent’s internal thought process to align with its executed actions, promoting coherent behavior.
Action Format Reward: Penalizes actions that don’t conform to the required format, ensuring the agent generates valid and executable commands.

This step-wise training, combined with an efficient reinforcement learning algorithm, significantly reduces computational requirements compared to traditional long-horizon training methods.
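To make the three reward types concrete, here is a minimal sketch of how they might be combined for a single step. Everything specific in it is an assumption made for illustration: the action grammar in `ACTION_PATTERN`, the exact-match step check against a reference action, the keyword-based consistency check, and the weights are not taken from the paper.

```python
import re
from dataclasses import dataclass

# Hypothetical action grammar: a known verb followed by parenthesised arguments.
ACTION_PATTERN = re.compile(r"^(click|type|scroll|press)\((.*)\)$")


@dataclass
class StepOutput:
    reasoning: str  # the agent's stated thought for this step
    action: str     # the command it emitted, e.g. "click(120, 40)"


def format_reward(action: str) -> float:
    """Penalise actions that do not match the assumed grammar."""
    return 0.0 if ACTION_PATTERN.match(action.strip()) else -1.0


def step_reward(action: str, reference_action: str) -> float:
    """Reward a step that matches the reference action (assumed exact match)."""
    return 1.0 if action.strip() == reference_action.strip() else 0.0


def consistency_reward(out: StepOutput) -> float:
    """Crude proxy: the reasoning should mention the verb that was executed."""
    match = ACTION_PATTERN.match(out.action.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1) in out.reasoning.lower() else 0.0


def total_reward(
    out: StepOutput,
    reference_action: str,
    w_step: float = 1.0,
    w_consistency: float = 0.5,
    w_format: float = 0.5,
) -> float:
    """Weighted sum of the three per-step signals (weights are illustrative)."""
    return (
        w_step * step_reward(out.action, reference_action)
        + w_consistency * consistency_reward(out)
        + w_format * format_reward(out.action)
    )
```

Because each signal is computed from a single step, the agent gets feedback without having to roll out the whole task, which is the point of the step-wise setup described above.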

Enhancing Generalization and Perception

Beyond planning, a computer use agent also needs strong ‘grounding’ ability – the capacity to accurately locate target elements on a screen. SEA enhances this by first training a dedicated grounding model. This model is then merged with the planning model using a technique that combines their strengths without requiring extensive additional training. To further optimize efficiency, SEA introduces a ‘Temporal Compressed Sensing Mechanism’ (TCSM), which helps the agent focus on the most important visual information from recent screen observations, reducing computational overhead while maintaining critical semantic content.
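The article does not say which merging technique SEA uses. One common training-free option is linear interpolation of the two models' parameters, sketched below purely as an assumed stand-in. The `Weights` type, the `merge_weights` function, and the `alpha` mixing coefficient are hypothetical, and plain Python lists are used in place of real tensors so the example stays self-contained.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat values.
Weights = Dict[str, List[float]]


def merge_weights(planner: Weights, grounder: Weights, alpha: float = 0.5) -> Weights:
    """Linearly interpolate two checkpoints that share one architecture.

    alpha = 0.0 returns the planner unchanged; alpha = 1.0 returns the grounder.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if planner.keys() != grounder.keys():
        raise ValueError("models must have identical parameter names")

    merged: Weights = {}
    for name, p_values in planner.items():
        g_values = grounder[name]
        if len(p_values) != len(g_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * p + alpha * g for p, g in zip(p_values, g_values)
        ]
    return merged
```

Interpolation of this kind only makes sense when both models were fine-tuned from the same base checkpoint, so that their parameters line up one to one.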

Impressive Performance with Fewer Parameters

SEA’s effectiveness has been demonstrated on the OSWorld benchmark, a challenging platform for evaluating computer use agents in real-world applications. Despite having only 7 billion parameters, SEA outperforms other models of similar size and achieves performance comparable to much larger models. This result points to the efficiency and robustness of its data generation, training strategy, and enhancement methods, and it suggests a path toward more capable and accessible computer use agents.
