WEBDART: A New Approach for LLM Agents to Master Complex Web Tasks

TLDR: WEBDART is a new framework that significantly improves how large language model (LLM) agents handle complex web tasks. It achieves this by breaking down difficult objectives into three manageable subtasks—navigation, information extraction, and execution—and continuously adapting its plan as new information appears on webpages. This dynamic decomposition and re-planning strategy helps LLM agents avoid being overwhelmed, leading to higher success rates and more efficient task completion on challenging benchmarks.

Large language models (LLMs) have shown impressive capabilities in automating simple web tasks, such as filling out forms or navigating to a specific product page. However, when faced with more intricate objectives (those requiring extensive navigation, extracting large amounts of information, or reasoning under specific constraints), these agents often struggle. The problem is often described as “cognitive overload”: the agent has to juggle too many demands at once. Human experts avoid it naturally by breaking a complex problem into simpler, sequential steps.

A new research paper introduces WEBDART, a novel framework designed to empower LLM agents to tackle these complex web chores with greater success. WEBDART stands for “Decomposition & Adaptive Re-planning for Tasks,” and its core innovation lies in two key areas: dynamic task decomposition and continuous re-planning.

Breaking Down the Challenge

Unlike traditional LLM agents that attempt to handle all aspects of a complex task simultaneously, WEBDART dynamically breaks down each objective into three distinct and focused subtasks:

Navigation: This involves browsing through multiple web pages to locate all potential sources of information relevant to the task.

Information Extraction: Once the relevant pages are identified, a dedicated module extracts the necessary content and converts it into a structured, standardized format.

Execution: The extracted and structured data is then analyzed to meet the task’s specific constraints, which might involve filtering, sorting, aggregation, or even generating Python code to perform calculations.

This modular approach allows the LLM agent to concentrate on one specific skill at a time, significantly reducing the cognitive burden and making complex objectives more manageable. For instance, instead of trying to navigate, filter by price, and rank products all at once, WEBDART first focuses on gathering all product information, then extracts it, and finally applies the filtering and ranking logic.
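The product example above can be sketched as three separate stages. This is only an illustration of the separation of concerns, not WEBDART's actual code: the `Product` record, the pipe-delimited page snippets, and the function names are all hypothetical stand-ins, and the navigation stage is reduced to a hard-coded list of already-visited pages.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float
    rating: float


# Stage 1 (navigation): hypothetical raw snippets collected while browsing.
visited_pages = [
    "Desk Lamp|24.99|4.5\nFloor Lamp|89.00|4.8",
    "Clip Light|12.50|4.1\nReading Lamp|31.00|4.7",
]


def extract(pages):
    """Stage 2 (information extraction): turn raw text into structured records."""
    products = []
    for page in pages:
        for line in page.splitlines():
            name, price, rating = line.split("|")
            products.append(Product(name, float(price), float(rating)))
    return products


def execute(products, max_price):
    """Stage 3 (execution): apply the task's constraints to the structured data."""
    affordable = [p for p in products if p.price <= max_price]
    return sorted(affordable, key=lambda p: p.rating, reverse=True)


ranked = execute(extract(visited_pages), max_price=40.0)
names = [p.name for p in ranked]
print(names)
```

Because filtering and ranking only run once every record has been collected and parsed, each stage can be checked on its own.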

Adapting on the Fly: Dynamic Re-planning

An initial plan, based solely on the task description, might not always be optimal. As an agent explores a website, it might discover new elements like price filters, sorting options, or shortcuts that were not apparent at the outset. WEBDART addresses this with its dynamic re-planning mechanism. After each navigation step, the agent continuously re-evaluates and revises its plan based on newly observed webpages. This adaptive adjustment helps correct any initial missteps, takes advantage of newly discovered efficiencies, and avoids redundant exploration, leading to more efficient and robust task completion.
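A minimal sketch of this re-planning loop, under simplifying assumptions: the plan is a plain list of page identifiers, `observe` is a hypothetical stand-in for loading a page, and the only shortcut the toy `replan` function recognizes is a “products per page” control. In the real framework an LLM makes the revision decision; here it is a hard-coded rule.

```python
def observe(page):
    """Stand-in for loading a page; returns the controls visible on it (hypothetical)."""
    if page.startswith("listing"):
        return {"controls": ["products_per_page"]}
    return {"controls": []}


def replan(plan, observation):
    """Revise the remaining plan if the newly observed page exposes a shortcut."""
    if "products_per_page" in observation["controls"] and len(plan) > 1:
        # One request showing every product replaces all remaining listing pages.
        return ["listing?per_page=all"]
    return plan


def navigate(initial_plan):
    """Visit pages in order, re-evaluating the plan after every step."""
    plan = list(initial_plan)
    history = []
    while plan:
        page = plan.pop(0)
        history.append(page)
        plan = replan(plan, observe(page))
    return history


initial_plan = [f"listing?page={i}" for i in range(1, 7)]
history = navigate(initial_plan)
print(history)
```

In this toy run the six-page initial plan collapses to two visits, because the first page reveals the control that makes the other five unnecessary.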

How WEBDART Works in Practice

The framework operates sequentially. First, the task is decomposed. Then, the navigation module guides the agent through the website, recording all observed pages. During this phase, dynamic re-planning can update the navigation goals if new opportunities arise. Once navigation is complete, the information extraction module selects the most relevant pages from the browsing history and extracts specific data fields into a structured format, like JSONL. Finally, the execution module processes this structured data. For data analysis tasks, it can generate and run Python code, even incorporating a self-reflection loop to correct errors. For action-oriented tasks, it can perform short-horizon navigation to post or submit information.
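The hand-off from extraction to execution might look roughly like the following sketch. The records, the field names, and `generate_code` are assumptions made for illustration: `generate_code` stands in for an LLM call and simply returns a buggy first draft, then a corrected one once it is shown the error. A real system would also sandbox generated code rather than call `exec` directly.

```python
import json

# Extraction output: one JSON object per line (JSONL). Values are hypothetical.
records = [
    {"name": "Desk Lamp", "price": "24.99"},
    {"name": "Floor Lamp", "price": "89.00"},
]
jsonl = "\n".join(json.dumps(r) for r in records)
rows = [json.loads(line) for line in jsonl.splitlines()]


def generate_code(error=None):
    """Stand-in for an LLM call: the first draft has a bug, the revision fixes it."""
    if error is None:
        return "result = sum(r['price'] for r in rows)"  # bug: prices are strings
    return "result = sum(float(r['price']) for r in rows)"


def run_with_reflection(rows, max_attempts=3):
    """Run generated code, feeding any error back into the next attempt."""
    error = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(error)
        scope = {"rows": rows}
        try:
            exec(code, scope)
            return scope["result"], attempt
        except Exception as exc:
            error = repr(exc)
    raise RuntimeError(f"no working program after {max_attempts} attempts: {error}")


total, attempts = run_with_reflection(rows)
print(round(total, 2), attempts)
```

The first draft fails with a `TypeError` because the prices are still strings; the error message is passed back, and the second draft succeeds.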

Impressive Results on Complex Web Tasks

WEBDART was rigorously evaluated on WebChoreArena, a benchmark specifically designed for higher-complexity web tasks, and WebArena, which focuses on simpler navigation-oriented objectives. The results were compelling:

On WebChoreArena, WEBDART consistently outperformed state-of-the-art agents across various LLM backbones (GPT-5, GPT-4o, and GLM-4.5-air-fp8), achieving up to a 13.7 percentage point increase in end-to-end success rates.

The dynamic re-planning module proved highly effective, cutting the average navigation path by up to 14.7 steps while also improving accuracy in several domains.

Crucially, WEBDART maintained competitive performance on the easier WebArena suite, demonstrating its versatility and robustness across different task complexities.

Case studies further illustrated WEBDART’s ability to adapt. For example, an agent initially planning to visit every page in a product category could revise its plan to use a “products displayed per page” menu, drastically cutting down navigation steps. Another case showed the agent correcting a flawed initial decomposition by directly extracting user submissions from a profile page rather than traversing every forum alphabetically.

In conclusion, WEBDART offers a significant advancement in the field of LLM-powered web agents. By explicitly decoupling subtasks and incorporating dynamic re-planning, it enables agents to handle complex web tasks more accurately and in fewer steps, paving the way for more capable and robust web automation tools. The full research paper is titled “WEBDART: Dynamic Decomposition and Re-planning for Complex Web Tasks.”
