TLDR: The AI-Agent School (AAS) is a multi-agent simulation system that uses large language models (LLMs) to create high-fidelity educational scenarios. It features a “Zero-Exp” strategy with a dual memory system (experience and knowledge, each with short-term and long-term components) that allows AI teacher and student agents to autonomously evolve through interactions. Experiments show that this system effectively simulates complex educational dynamics, fostering advanced agent cognitive abilities and generating realistic behavioral data, moving education towards an “Era of Simulation.”
In the rapidly evolving landscape of artificial intelligence, large language models (LLMs) are increasingly being used to simulate and understand complex human systems. A new research paper introduces an innovative system called the AI-Agent School (AAS), designed to simulate intricate educational dynamics with remarkable fidelity.
The research addresses two problems: the teaching process is hard to model systematically, and current AI agents struggle to simulate the diverse behaviors of students and teachers accurately. To overcome these hurdles, AAS proposes a self-evolving mechanism that lets AI agents learn and adapt within a simulated school environment.
The Zero-Exp Strategy and Dual Memory System
Central to the AAS is the ‘Zero-Exp’ strategy, which guides agents to evolve from a state of no experience to expert-level behavior. This strategy is built upon a continuous cycle of “experience-reflection-optimization” and is powered by a sophisticated dual memory base. This memory system is divided into two main components: an Experience Base, which stores records of past events and interactions, and a Knowledge Base, containing structured information like academic knowledge or teaching methodologies.
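To make the split between the two bases concrete, here is a minimal Python sketch. The article does not describe how records are stored or retrieved, so everything below is an assumption: each base is a plain list of records, and relevance is naive word overlap standing in for whatever retrieval AAS actually uses. The names (Experience, Knowledge, DualMemory, _top_k) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    """A record of a past event, e.g. how a lesson or a peer discussion went."""
    day: int
    event: str
    outcome: str


@dataclass
class Knowledge:
    """A structured entry, e.g. a subject fact or a teaching method."""
    topic: str
    content: str


def _top_k(items, text_of, query, k):
    """Return up to k items sharing at least one word with the query, best first."""
    q = set(query.lower().split())
    scored = [(len(q & set(text_of(x).lower().split())), x) for x in items]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [x for _, x in scored[:k]]


@dataclass
class DualMemory:
    experiences: list[Experience] = field(default_factory=list)
    knowledge: list[Knowledge] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 2):
        """Search each base independently so events and facts stay separate."""
        exp = _top_k(self.experiences, lambda e: f"{e.event} {e.outcome}", query, k)
        kno = _top_k(self.knowledge, lambda n: f"{n.topic} {n.content}", query, k)
        return exp, kno
```

Keeping the bases separate means one query returns past events and reference knowledge as two distinct channels, which an agent can weigh differently when deciding what to do.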
Both the Experience and Knowledge Bases are further organized into short-term and long-term memory components. Short-term memory holds information deemed most relevant for current tasks, mimicking human attention, while long-term memory serves as a comprehensive repository of all accumulated experiences and knowledge. This hierarchical and dual memory structure allows agents to retain vast amounts of information, enabling long-term learning, reflection, and decision-making crucial for their autonomous evolution.
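The short-term/long-term hierarchy within a single base can be sketched as a small, prioritized tier sitting over a complete archive. This is an illustration under assumptions the article does not confirm: relevance is word overlap, the short-term tier has a fixed capacity, and it is refilled from the archive whenever the current task changes. TieredBase, refocus, and ranked are hypothetical names.

```python
from dataclasses import dataclass, field


def overlap(a: str, b: str) -> int:
    """Naive relevance score: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def ranked(pool: list[str], query: str) -> list[str]:
    """Items sharing at least one word with the query, most relevant first."""
    hits = [m for m in pool if overlap(m, query) > 0]
    return sorted(hits, key=lambda m: overlap(m, query), reverse=True)


@dataclass
class TieredBase:
    """One memory base: a bounded short-term tier over a complete long-term archive."""
    capacity: int = 3
    short_term: list[str] = field(default_factory=list)
    long_term: list[str] = field(default_factory=list)

    def refocus(self, task: str) -> None:
        # "Attention": refill short-term with the archive items most relevant to the task.
        self.short_term = ranked(self.long_term, task)[: self.capacity]

    def add(self, item: str, task: str) -> None:
        self.long_term.append(item)  # long-term keeps everything
        self.refocus(task)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        results = ranked(self.short_term, query)[:k]  # short-term first
        if len(results) < k:  # fall back to the full archive
            extra = [m for m in ranked(self.long_term, query) if m not in results]
            results += extra[: k - len(results)]
        return results
```

A real system would more likely score salience with embeddings or an LLM judgment than with word overlap; the part the article describes is the structure itself: a small prioritized tier, a complete archive, and lookups that check short-term memory first.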
A Simulated School Environment
The AAS environment is a detailed virtual school, modeled on real-world layouts, with 25 distinct areas such as classrooms, libraries, laboratories, and sports fields. It hosts two main types of interactive roles: teacher agents and student agents. Each agent's profile is generated by an LLM, giving it a rich and distinct background, personality traits, and other individual characteristics.
Agents perform a variety of actions tailored to their roles. Teacher agents engage in teaching practices, reflection, and guidance, while student agents participate in classroom learning, laboratory work, peer interaction, self-directed learning, and extracurricular activities. These actions drive the simulation and generate valuable behavioral data.
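One way to picture this role design is a registry that maps each role to the actions it may take, with the LLM-generated persona stored alongside. The action sets below follow the lists in the article; the identifiers, the AgentProfile fields, and the perform check are illustrative assumptions rather than AAS code, and the 25 areas are represented only by a free-text area name.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    TEACHER = "teacher"
    STUDENT = "student"


# Action sets taken from the article's description; the string labels are hypothetical.
ACTIONS: dict[Role, frozenset[str]] = {
    Role.TEACHER: frozenset({"teaching_practice", "reflection", "guidance"}),
    Role.STUDENT: frozenset({
        "classroom_learning", "laboratory_work", "peer_interaction",
        "self_directed_learning", "extracurricular_activity",
    }),
}


@dataclass
class AgentProfile:
    """A persona; in AAS fields like these are generated by an LLM."""
    name: str
    role: Role
    background: str
    personality: str


def perform(agent: AgentProfile, action: str, area: str) -> str:
    """Reject actions outside the agent's role; otherwise return a behavior log line."""
    if action not in ACTIONS[agent.role]:
        raise ValueError(f"{agent.role.value} cannot perform {action!r}")
    return f"{agent.name} ({agent.role.value}) does {action} in the {area}"
```

Logging each permitted action as a line of text is one simple way such a simulation could accumulate the behavioral data the article mentions.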
How Agents Evolve
The Zero-Exp mechanism is designed so that agents continuously improve their behavior. At each step of the simulation, agents take in the current state of the environment and their own role settings. They retrieve relevant information, checking short-term memory first, and combine it with their working memory (their recent interaction history). The agent's response and the outcomes of its actions then trigger a memory update and self-reflection: new insights and optimized strategies are added to the memory bases, and the agent's internal role settings (such as teaching methods or study habits) are updated dynamically.
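The step described above can be sketched as a single function. This is a minimal sketch under stated assumptions: the LLM is any function from a prompt string to a reply (stubbed in the usage at the bottom), retrieval is naive word overlap, the short-term tier keeps the five newest insights, and the 'insight|key=value' reflection format is invented purely to make the role-setting update parseable. None of these names come from the AAS paper.

```python
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def relevant(pool: list[str], observation: str) -> list[str]:
    """Memories sharing at least one word with the observation (naive retrieval)."""
    words = set(observation.lower().split())
    return [m for m in pool if words & set(m.lower().split())]


@dataclass
class Agent:
    role_settings: dict[str, str]  # e.g. {"teaching_method": "lecture"}
    short_term: list[str] = field(default_factory=list)
    long_term: list[str] = field(default_factory=list)
    working_memory: list[str] = field(default_factory=list)  # interaction history

    def step(self, observation: str, llm: LLM, k: int = 3) -> str:
        # 1. Retrieve, checking short-term memory before long-term memory.
        memories = relevant(self.short_term, observation)[:k]
        if len(memories) < k:
            extra = [m for m in relevant(self.long_term, observation) if m not in memories]
            memories += extra[: k - len(memories)]
        # 2. Combine role settings, observation, memories, and recent history.
        prompt = "\n".join([
            f"Role settings: {self.role_settings}",
            f"Observation: {observation}",
            "Relevant memories: " + "; ".join(memories),
            "Recent history: " + "; ".join(self.working_memory[-3:]),
        ])
        response = llm(prompt)
        self.working_memory.append(f"{observation} -> {response}")
        # 3. Reflect; the 'insight|key=value' reply format is an invented convention.
        reflection = llm(f"Reflect on: {observation} -> {response}. "
                         "Reply as 'insight|key=value' or 'insight|'.")
        insight, _, update = reflection.partition("|")
        self.long_term.append(insight)
        self.short_term = (self.short_term + [insight])[-5:]  # keep 5 newest (assumed)
        # 4. Optionally rewrite a role setting, e.g. the teaching method.
        if "=" in update:
            key, value = update.split("=", 1)
            self.role_settings[key.strip()] = value.strip()
        return response


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stand-in for a real model call.
        if prompt.startswith("Reflect"):
            return "students respond better to examples|teaching_method=worked examples"
        return "I will explain fractions with a lecture."

    teacher = Agent({"teaching_method": "lecture"})
    teacher.step("students look confused about fractions", fake_llm)
    print(teacher.role_settings)  # {'teaching_method': 'worked examples'}
```

The key property is that reflection writes back into both memory and role settings, so the next step runs with a different prompt than the last one; that feedback loop is what lets an agent drift from "no experience" toward more effective behavior.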
Experimental Validation
To evaluate the effectiveness of AAS and its Zero-Exp mechanism, the researchers ran extensive experiments with several LLMs powering the agents, including GPT-4o, Qwen3-235B-A22B, and Qwen3-8B. They designed nine memory configurations to isolate the effects of the dual memory structure and the short-term/long-term hierarchy.
The dataset for these simulations was created through a multi-step process involving LLM generation and rigorous expert refinement, ensuring realistic initial conditions and high-fidelity interaction sequences. Evaluation was performed using both automated metrics (ROUGE-L scores for text similarity) and human evaluation by educational experts.
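ROUGE-L scores a generated text against a reference by the length of their longest common subsequence (LCS): precision is LCS divided by the candidate length, recall is LCS divided by the reference length, and the score is their weighted F-measure. Below is a minimal sketch of the standard sentence-level version. The lowercased whitespace tokenization and the default weight beta = 1 are assumptions, since the article does not say which variant or library the authors used.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, by row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure over lowercased whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    sim = "the teacher asks a question"
    ref = "the teacher asks the class a question"
    print(round(rouge_l(sim, ref), 3))  # 0.833
```

Because LCS rewards in-order matches without requiring them to be contiguous, a simulated utterance that follows the reference's word order with a few omissions still scores well, as in the example above (precision 1, recall 5/7).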
The results were compelling. The full model, which combines the dual experience/knowledge base with the short-term/long-term hierarchy, consistently achieved the highest ROUGE-L scores. This points to three distinct benefits: having external memory at all, organizing experience and knowledge separately, and prioritizing salient memories in short-term memory.
Human evaluation further corroborated these findings. Educational experts judged the interactions generated by the full model as significantly more realistic than those from baseline configurations. Over time, the perceived realism of the full model’s agents approached that of expert-curated ground truth data, indicating a strong learning and adaptation curve.
Pioneering Computational Education Science
This research marks a significant step towards a new paradigm of “Computational Education Science,” integrating traditional educational research with advanced AI technologies. The AAS environment and Zero-Exp mechanism provide a verifiable technical model for developing educational digital twins and generating valuable behavioral data. This work helps propel the education field from the “Era of Experience” to the “Era of Simulation,” offering foundational elements for future educational systems, teacher training platforms, and policy simulation tools.
While promising, the research acknowledges limitations such as the current simulation scale (50 agents over 5 days) and the reliance on LLMs without visual perception capabilities. Future work aims to scale the environment, incorporate multimodal models, and apply the generated data to specific educational applications like personalized learning.


