
The Remote Labor Index: A New Measure for AI Automation

TLDR: The Remote Labor Index (RLI) is a new benchmark of 240 real-world, economically valuable remote work projects designed to measure AI automation. Sourced from freelance platforms, RLI projects are complex and diverse. Current frontier AI agents achieve a low automation rate of 2.5%, indicating they are far from autonomously performing most remote labor. However, relative performance scores show steady improvement among models. The RLI provides an empirical basis for tracking AI’s impact on the workforce.

A new study introduces the Remote Labor Index (RLI), a groundbreaking benchmark designed to empirically measure how well AI can automate real-world remote work. This index aims to provide a clear, standardized way to track AI’s impact on the workforce, moving beyond theoretical benchmarks to evaluate AI agents on economically valuable projects.

The RLI is unique because it comprises 240 complete projects sourced directly from online freelance platforms. These aren’t simplified tasks; they represent actual work performed by human professionals, complete with original project briefs and gold-standard human deliverables. This approach ensures the benchmark is grounded in real economic transactions and captures the true diversity and complexity of the remote labor market, including areas like game development, product design, architecture, and data analysis.

To create the RLI, researchers engaged with 358 experienced freelancers, collecting 550 initial projects. These projects underwent a rigorous cleaning and filtering process to ensure they were self-contained, reproducible, and met specific criteria, such as not requiring physical labor or client interaction. The final dataset spans 23 categories of work from the Upwork taxonomy and involves a wide variety of file formats, making it far more diverse than previous AI benchmarks.

The projects within the RLI are also significantly more complex than typical benchmark tasks: human professionals spent an average of 28.9 hours (median 11.5 hours) to complete them. Projects cost an average of $632.60 (median $200), and the full dataset represents more than 6,000 hours of work valued at over $140,000. This demonstrates the substantial economic value and difficulty captured by the RLI.

Researchers evaluated several leading AI agents, including ChatGPT agent, GPT-5, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, and Manus, on the RLI. The results revealed that current AI agents perform near the floor, with the highest-performing agent achieving an automation rate of only 2.5%. This means that less than 3% of the projects were completed by AI at a quality level comparable to or exceeding human work, indicating a significant gap between current AI capabilities and the demands of real-world remote labor.
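The headline metric is straightforward: the fraction of projects on which an agent's deliverable is judged at or above the quality of the human deliverable. The paper does not publish its evaluation code here, so the function and names below are an illustrative sketch of how such a rate could be computed, not the authors' implementation.

```python
def automation_rate(judgments):
    """Fraction of projects where the AI deliverable was judged
    at or above the quality of the gold-standard human deliverable.

    judgments: list of booleans, one per project (hypothetical input
    format assumed for illustration).
    """
    return sum(judgments) / len(judgments)

# At a 2.5% automation rate on 240 projects, only 6 deliverables
# would meet the human-quality bar.
rate = automation_rate([True] * 6 + [False] * 234)
```

At this scale, a single additional acceptable deliverable moves the rate by roughly 0.4 percentage points, which is why the paper pairs this floor-level metric with a relative score.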

Despite the low absolute automation rates, the study also used an Elo-based scoring system to measure the relative performance of different AI agents. This metric showed that models are steadily improving, with newer frontier models generally achieving higher scores than older ones. This suggests that while full automation is still distant, AI capabilities are progressing, and the RLI is sensitive enough to track these granular shifts.
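Elo-style ratings are derived from head-to-head comparisons: two agents' deliverables for the same project are compared, and the winner's rating rises while the loser's falls. The study does not specify its exact rating parameters, so the update rule below is a standard Elo sketch with an assumed K-factor, shown only to illustrate the mechanism.

```python
def elo_update(r_a, r_b, winner, k=32):
    """One pairwise Elo update after comparing two agents' deliverables
    on the same project. 'winner' is "a", "b", or "tie".
    k=32 is a conventional default, not a value from the paper.
    """
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if winner == "a" else 0.0 if winner == "b" else 0.5
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new
```

Because ratings shift even when neither deliverable clears the human-quality bar, this kind of relative score can register steady model-to-model improvement while the absolute automation rate stays near zero.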

Qualitative analysis of AI failures highlighted common issues such as technical and file integrity problems (corrupt or empty files), incomplete or malformed deliverables, poor overall quality, and inconsistencies across generated files. Successful AI deliverables, though few, were predominantly in creative projects like audio and image generation, as well as writing and data retrieval tasks, where current AI strengths are more pronounced.


The RLI provides an essential empirical foundation for understanding AI automation. It moves beyond specialized skill evaluations to assess end-to-end project completion in economically valuable contexts. This benchmark will be crucial for researchers, policymakers, and the public to monitor AI’s evolving capabilities and proactively address its potential impacts on the future of work. You can find more details about this research in the full paper: Remote Labor Index: Measuring AI Automation of Remote Work.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
