TLDR: AGOCS is an open-source, high-fidelity cloud workload simulator that uses real-world traces from Google’s 12.5K-node cluster to accurately model detailed job, task, and node behaviors. It’s designed for desktop use, implemented in Scala, and provides fine-grained resource usage statistics, making it ideal for researchers developing and testing advanced cloud scheduling and load balancing algorithms.
Understanding how cloud systems behave under different workloads is crucial for designing efficient and reliable distributed applications. However, gaining unrestricted access to large-scale cloud environments for testing can be challenging and expensive. This is where simulation frameworks come into play, offering a controlled environment to evaluate performance and strategies.
A new research paper introduces the Accurate Google Cloud Simulator (AGOCS), a novel, high-fidelity framework designed to bring realistic cloud workload simulation to your desktop. Developed by Leszek Sliwko and Vladimir Getov from the University of Westminster, AGOCS aims to fill a gap in existing simulators by providing exceptionally detailed and accurate insights into cloud operations.
The Challenge of Cloud Workload Simulation
Traditional cloud simulators often succeed in representing high-level infrastructure parameters like nodes and tasks. However, they frequently fall short in providing fine-grained system traces, such as specific application memory usage, local and remote disk space, disk I/O, or cycles per instruction. This lack of detail becomes particularly problematic when researching deep system-critical mechanisms like task scheduling or fault handling, where subtle interactions can have significant impacts.
The AGOCS framework addresses this by basing its simulations on real-world workload traces. While artificial workload generators exist, building a high-fidelity one that captures the dynamic and non-cyclical nature of cloud workloads is extremely difficult. Therefore, AGOCS leverages actual data from a Google Cluster with 12,500 nodes, collected over a calendar month in May 2011, to ensure its simulations are as close to reality as possible.
How AGOCS Works: A Glimpse Under the Hood
AGOCS is implemented in Scala, with a strong focus on parallel execution and an easy-to-extend design. The core of its operation revolves around processing a massive amount of real-world data – approximately 191 GB of uncompressed Google Cluster Data (GCD) traces. These traces record every significant event in the cluster, from job submissions and cancellations to changes in node configurations and dynamic resource usage by tasks.
The simulator handles a highly concurrent environment by treating all workload state updates as immutable events, each marked with a precise timestamp. This allows the system to maintain consistency even under heavy load. Key events include adding or removing tasks, updating task resource requirements or constraints, and changes to node configurations (e.g., adding new memory banks or taking nodes offline).
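The idea of timestamped, immutable events can be illustrated with a minimal sketch. This is not AGOCS code (AGOCS is written in Scala); the class names, fields, and the microsecond unit below are assumptions made purely for illustration. Each event is a frozen record, and the state object only ever changes by applying events in timestamp order.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


# Hypothetical event types; names and fields are illustrative, not taken from AGOCS.
@dataclass(frozen=True)
class AddTask:
    timestamp: int  # assumed unit: microseconds since the start of the trace
    task_id: str
    cpu_request: float
    memory_request: float


@dataclass(frozen=True)
class RemoveTask:
    timestamp: int
    task_id: str


@dataclass(frozen=True)
class UpdateNode:
    timestamp: int
    node_id: str
    cpu_capacity: float
    memory_capacity: float


@dataclass
class ClusterState:
    """Mutable system state that is only changed by applying events."""
    clock: int = 0
    tasks: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    nodes: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def apply(self, event) -> None:
        # Rejecting out-of-order events keeps the state consistent.
        if event.timestamp < self.clock:
            raise ValueError("events must be applied in timestamp order")
        self.clock = event.timestamp
        if isinstance(event, AddTask):
            self.tasks[event.task_id] = (event.cpu_request, event.memory_request)
        elif isinstance(event, RemoveTask):
            self.tasks.pop(event.task_id, None)
        elif isinstance(event, UpdateNode):
            self.nodes[event.node_id] = (event.cpu_capacity, event.memory_capacity)
        else:
            raise TypeError(f"unsupported event: {event!r}")


def replay(events) -> ClusterState:
    """Sort events by timestamp and apply them to a fresh state."""
    state = ClusterState()
    for event in sorted(events, key=lambda e: e.timestamp):
        state.apply(event)
    return state
```

Because the events are frozen, any number of readers can share them safely; only the code that applies them to the state needs to be serialized.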
The framework utilizes five independent workers, implemented as Akka Actors, to read and parse these workload traces. A central WorkloadGenerator then collects these events and updates a shared system state object. To manage the vast amount of data without loading it all into memory, AGOCS continuously reads and parses trace files at runtime, maintaining a buffer of upcoming events in fast-access memory to minimize blocking operations.
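A rough sketch of this read-ahead pattern follows. It swaps Akka Actors for plain Python threads with bounded queues, so it shows the buffering idea only, not the actual AGOCS design. The "timestamp,payload" line layout, the function names, and the assumption that each source is already sorted by timestamp are all hypothetical.

```python
import heapq
import queue
import threading
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str]  # (timestamp, payload): a simplified stand-in for a trace event
_DONE = object()         # sentinel marking the end of one source


def parse_line(line: str) -> Event:
    # Assumed "timestamp,payload" layout, for illustration only.
    timestamp, payload = line.split(",", 1)
    return int(timestamp), payload


def start_worker(lines: Iterable[str], buffer_size: int) -> "queue.Queue":
    """Parse one source on a background thread into a bounded buffer."""
    buffer: "queue.Queue" = queue.Queue(maxsize=buffer_size)

    def run() -> None:
        for line in lines:
            buffer.put(parse_line(line))  # blocks while the buffer is full
        buffer.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    return buffer


def drain(buffer: "queue.Queue") -> Iterator[Event]:
    """Yield buffered events until the worker signals completion."""
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


def merged_events(sources: List[Iterable[str]], buffer_size: int = 1000) -> Iterator[Event]:
    """Merge per-source event streams into one stream ordered by timestamp.

    Assumes every source is already sorted by timestamp.
    """
    buffers = [start_worker(source, buffer_size) for source in sources]
    return heapq.merge(*(drain(b) for b in buffers))
```

The bounded queues cap memory use: a worker that runs ahead of the consumer simply blocks until there is room, so only a window of upcoming events is held in memory at any time.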
Practical Applications and Performance
AGOCS was initially developed as part of the Multi-Agent System Balancer (MASB) research project, which aims to design intelligent schedulers for cloud systems. It provides a reliable stream of high-quality workload events to test and fine-tune these schedulers.
AGOCS can be paused at any time, allowing researchers to take snapshots of current task distributions and job states. Snapshots taken at the same simulated moment make it possible to compare scheduling algorithms directly while they run, which is a more dynamic analysis than reviewing only final statistics.
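Pause-and-snapshot behavior could look roughly like the sketch below. The class, its methods, and the (timestamp, task_id, node_id) event layout are hypothetical and are not the AGOCS API; a node_id of None is assumed here to mean that the task was removed.

```python
import copy


class PausableSimulation:
    """Toy simulation that can stop at a chosen timestamp and be inspected."""

    def __init__(self, events):
        # Each event is (timestamp, task_id, node_id); node_id None removes the task.
        self._events = sorted(events, key=lambda e: e[0])
        self._next = 0
        self.clock = 0
        self.placement = {}  # task_id -> node_id

    def run_until(self, pause_at):
        """Apply all pending events with timestamp <= pause_at, then pause."""
        while self._next < len(self._events) and self._events[self._next][0] <= pause_at:
            timestamp, task_id, node_id = self._events[self._next]
            if node_id is None:
                self.placement.pop(task_id, None)
            else:
                self.placement[task_id] = node_id
            self.clock = timestamp
            self._next += 1

    def snapshot(self):
        """Return an independent copy of the current task distribution."""
        return {"clock": self.clock, "placement": copy.deepcopy(self.placement)}
```

Running two schedulers against the same event stream and comparing their snapshots at the same pause point gives a like-for-like view of task placement at that moment.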
Despite processing huge datasets, AGOCS is designed to run comfortably on a typical desktop machine. For instance, running at a 75x speed factor on a MacBook Pro, a month-long simulation completes in approximately 9 hours, processing about 21.22 GB of trace data per hour. The system makes efficient use of the available CPU cores; its primary bottleneck is the speed at which it can read workload traces.
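These figures can be cross-checked with simple arithmetic. The short script below uses only numbers quoted in this article (191 GB of traces, roughly 9 hours of wall-clock time, a 75x speed factor); the function names are illustrative.

```python
TRACE_SIZE_GB = 191.0    # uncompressed trace size quoted in the article
WALL_CLOCK_HOURS = 9.0   # approximate run time quoted in the article
SPEED_FACTOR = 75        # simulated time advances 75x faster than real time


def throughput_gb_per_hour(size_gb: float, hours: float) -> float:
    """Average amount of trace data processed per wall-clock hour."""
    return size_gb / hours


def simulated_days(hours: float, speed_factor: float) -> float:
    """Simulated time covered by a run, expressed in days."""
    return hours * speed_factor / 24


print(f"{throughput_gb_per_hour(TRACE_SIZE_GB, WALL_CLOCK_HOURS):.2f} GB/h")  # 21.22 GB/h
print(f"{simulated_days(WALL_CLOCK_HOURS, SPEED_FACTOR):.1f} days")           # 28.1 days
```

Nine hours at 75x covers about 28 simulated days, which is consistent with a trace spanning roughly one month.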
AGOCS vs. Other Simulators
The paper compares AGOCS with other popular cloud simulation frameworks like CloudAnalyst, GreenCloud, and particularly CloudSim. While these tools offer broad simulation capabilities, AGOCS distinguishes itself with a focused goal: simulating the Google Cloud cell environment with very fine-grained details. It provides parameters such as CPU cores (requested/used), canonical and assigned memory, disk I/O time, cycles per instruction, and memory access per instruction, which are often missing in other simulators.
Unlike CloudSim, which is memory-driven and single-threaded, AGOCS is designed for multi-threading and uses all available CPU cores. CloudSim may perform better with smaller datasets, but the computation time of AGOCS grows more slowly as scenarios become more complex, which makes it the more scalable option for large, detailed simulations.
Looking Ahead
AGOCS is an open-source project, available on GitHub, and is continuously being developed. Future plans include a visualization module for cloud node workloads, implementation of parsers for alternative workload trace formats, and the ability to create and load task distribution and workload state snapshots for debugging and analysis.
For researchers and developers working on advanced cloud scheduling, load balancing, or other system-critical mechanisms, AGOCS offers a powerful tool to simulate complex cloud environments with unprecedented accuracy and detail. You can find more technical details in the full research paper: AGOCS – Accurate Google Cloud Simulator Framework.


