MapAgent: Enhancing Mobile Task Automation with Experience-Driven AI Planning

TLDR: MapAgent is a novel LLM-based agent framework for automating complex tasks on mobile devices. It addresses limitations of current LLM agents, such as missing knowledge of real-world apps and hallucinated actions, by drawing on a memory system built from historical task execution trajectories. The framework combines a trajectory-based memory mechanism that stores structured page information, a coarse-to-fine planning approach augmented by retrieved memory pages, and a task executor built on a dual-LLM architecture for robust action generation and progress monitoring. Experiments show that MapAgent achieves superior performance and efficiency in real-world mobile scenarios.

In the rapidly evolving landscape of artificial intelligence, Large Language Models (LLMs) have shown immense promise in automating tasks on mobile devices through their ability to interact with graphical user interfaces (GUIs). However, these advanced agents often hit roadblocks when faced with the complexities of real-world mobile applications. A significant challenge arises from LLMs’ inherent lack of practical knowledge about these apps, leading to inefficient task planning and sometimes even generating incorrect or ‘hallucinated’ actions.

Addressing these critical limitations, a new framework called MapAgent has been introduced. This innovative LLM-based agent leverages a unique memory system built from past task execution experiences, known as historical trajectories, to significantly enhance its current task planning capabilities. Imagine an agent that learns from its past interactions, much like a human user would, and applies that learned knowledge to new, similar tasks.

MapAgent operates through three core components that work in harmony. First, it features a Trajectory-based Memory Mechanism. This mechanism is inspired by how humans remember information during device operation. It takes the agent’s past task execution paths (sequences of actions and the pages encountered) and condenses them into a structured ‘page-memory database’. Each ‘page’ within a trajectory is captured as a concise yet comprehensive snapshot, detailing both its visual layout (UI) and its functional purpose. This ensures that crucial information from previous interactions is retained and organized for future use.
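The article does not describe the database schema, so the Python sketch below only illustrates the general idea of condensing trajectories into page records that pair a UI description with a functional description. The classes `PageMemory` and `PageMemoryDB` and the fields `ui_summary`, `function_summary`, and `action_taken` are hypothetical; in MapAgent the page descriptions would be produced by an LLM from the screens, whereas here they are passed in as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class PageMemory:
    """One condensed page snapshot from a past trajectory (hypothetical schema)."""
    app: str
    ui_summary: str        # concise description of the visual layout (UI)
    function_summary: str  # what the page is for
    action_taken: str      # action the agent performed on this page


@dataclass
class PageMemoryDB:
    """Minimal in-memory page-memory database (illustrative only)."""
    pages: list[PageMemory] = field(default_factory=list)

    def add_trajectory(self, app: str, steps: list[tuple[str, str, str]]) -> None:
        # Each step is (ui_summary, function_summary, action_taken). A real system
        # would have an LLM write these summaries from screenshots or view
        # hierarchies; here they are supplied directly.
        for ui, func, action in steps:
            self.pages.append(PageMemory(app, ui, func, action))

    def pages_for_app(self, app: str) -> list[PageMemory]:
        # Return every stored page that belongs to the given app.
        return [p for p in self.pages if p.app == app]


if __name__ == "__main__":
    db = PageMemoryDB()
    db.add_trajectory("Settings", [
        ("List of categories: Network, Display, Sound", "Top-level settings menu", "tap 'Display'"),
        ("Brightness slider, Dark mode toggle", "Display preferences", "toggle 'Dark mode'"),
    ])
    for page in db.pages_for_app("Settings"):
        print(page.function_summary, "->", page.action_taken)
```

Storing the UI description and the functional description as separate fields lets a later retrieval step match a subtask against what a page is for, not only against what it looks like.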

Second, MapAgent employs a Memory-Augmented Task Planning approach, which operates in a coarse-to-fine manner. When given a new task, the system first generates a broad plan, breaking the task down into general subtasks. It then retrieves relevant ‘pages’ from its memory database based on how similar they are to the current subtasks. This retrieved information is fed into the LLM planner, giving it valuable context and compensating for gaps in its understanding of real-world app scenarios. The result is more informed, context-aware task planning that helps the agent avoid common mistakes and getting stuck.

Finally, the planned tasks are carried out by the Task Executor, which is powered by a dual-LLM architecture. This executor translates the refined plans into concrete, executable actions on the mobile device. It consists of two collaborating LLM roles: a ‘Decision-maker’ that proposes actions and a ‘Judge’ that evaluates the progress and success of each action. Together with a short-term memory unit that records the current task’s historical responses, these roles keep track of task progress and let the agent adapt to the dynamic nature of mobile environments. The dual-LLM setup also helps the agent detect and correct errors, making it more robust in unpredictable situations.
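The Decision-maker/Judge loop can be sketched as follows, with both LLM roles and the device replaced by plain Python callables. The verdict labels (`"done"`, `"progress"`, `"error"`), the `max_steps` limit, and all function and type names are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ShortTermMemory:
    """History of the current task: (action, Judge verdict) pairs."""
    history: list[tuple[str, str]] = field(default_factory=list)

    def add(self, action: str, verdict: str) -> None:
        self.history.append((action, verdict))


# Hypothetical signatures for the two LLM roles and the device interface.
DecisionMaker = Callable[[str, str, ShortTermMemory], str]  # (subtask, screen, memory) -> action
Judge = Callable[[str, str, str], str]                      # (subtask, action, new_screen) -> verdict
Device = Callable[[str], str]                               # action -> new screen description


def run_subtask(subtask: str, screen: str, decide: DecisionMaker, judge: Judge,
                act: Device, memory: ShortTermMemory, max_steps: int = 5) -> bool:
    """Alternate Decision-maker and Judge until the Judge reports 'done' or steps run out."""
    for _ in range(max_steps):
        action = decide(subtask, screen, memory)
        screen = act(action)
        verdict = judge(subtask, action, screen)  # assumed labels: "done", "progress", "error"
        memory.add(action, verdict)
        if verdict == "done":
            return True
        # Otherwise loop again; the verdict is now in memory for the next decision.
    return False


# Toy stubs standing in for the LLM calls and the phone.
def fake_decide(subtask: str, screen: str, memory: ShortTermMemory) -> str:
    # Tries the wrong item first; switches if the Judge's last verdict was "error".
    if memory.history and memory.history[-1][1] == "error":
        return "tap Display"
    return "tap Sound"


def fake_act(action: str) -> str:
    return "Display page" if action == "tap Display" else "Sound page"


def fake_judge(subtask: str, action: str, new_screen: str) -> str:
    return "done" if new_screen == "Display page" else "error"


if __name__ == "__main__":
    mem = ShortTermMemory()
    ok = run_subtask("open display settings", "Settings page",
                     fake_decide, fake_judge, fake_act, mem)
    print(ok, mem.history)  # True [('tap Sound', 'error'), ('tap Display', 'done')]
```

Keeping the Judge separate from the Decision-maker means the component that proposes an action is not the one grading it, and the verdicts accumulated in short-term memory give the next decision a record of what has already failed.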

Extensive experiments in real-world scenarios, covering both English and Chinese mobile applications, show that MapAgent outperforms existing methods. It shows notable improvements on complex cross-app tasks, where it breaks large tasks into manageable subtasks while maintaining context. The framework also strikes a reasonable balance between computational efficiency and success rate, which supports its practicality for real-world deployment. The research paper detailing this framework is available at arXiv:2507.21953.

MapAgent represents a significant step forward in mobile task automation, offering a robust and intelligent solution that learns from experience, plans with enhanced context, and executes tasks with greater reliability. Its ability to bridge the knowledge gap between LLMs and real-world mobile applications opens up new possibilities for AI-assisted smartphone development and user interaction.
