TLDR: A study explored whether Multi-Agent Systems (MAS) embedded with the Knowledge–Learning–Instruction (KLI) framework can produce higher-quality K-12 learning activities than a single-agent AI. Although quantitative rubric scores differed only slightly, teachers overwhelmingly preferred the collaborative MAS-CMD system for its creativity, real-world relevance, and completeness, underscoring the importance of pedagogical principles in AI design for education.
Large Language Models (LLMs) are becoming increasingly popular among K-12 educators for creating instructional materials. However, these powerful AI tools often fall short of providing high-quality teaching support. Two factors largely explain this: commercial LLMs typically lack grounding in pedagogical theory, and most teachers don't have the time or expertise for the sophisticated prompt engineering needed to bridge this gap.
A recent study, titled Enabling Multi-Agent Systems as Learning Designers: Applying Learning Sciences to AI Instructional Design, introduces an innovative approach to overcome this challenge. Instead of relying on teachers to encode pedagogical nuance into their prompts, this research embeds the well-established Knowledge–Learning–Instruction (KLI) framework directly into a Multi-Agent System (MAS). This system acts as a sophisticated instructional designer, aiming to produce more effective and engaging learning activities.
Understanding the KLI Framework and Multi-Agent Systems
The Knowledge–Learning–Instruction (KLI) framework is a theory-driven approach to instructional design. It emphasizes the alignment of three core elements: Knowledge Components (what students need to learn), Learning Processes (how learning occurs), and Instructional Principles (specific methods to facilitate learning). By integrating KLI, the AI system is grounded in evidence-based educational theory.
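To make the idea of alignment more concrete, here is a minimal Python sketch of a KLI-style design record. The class names, fields, and example math content are hypothetical illustrations, not the study's data model. The three learning-process labels follow the broad categories proposed in the original KLI framework (memory and fluency-building, induction and refinement, understanding and sense-making); the check simply flags knowledge components that have no instructional principle attached.

```python
from dataclasses import dataclass, field
from enum import Enum


class LearningProcess(Enum):
    # The three broad learning-process categories from the KLI framework
    MEMORY_FLUENCY = "memory and fluency-building"
    INDUCTION_REFINEMENT = "induction and refinement"
    SENSE_MAKING = "understanding and sense-making"


@dataclass
class KnowledgeComponent:
    # What students need to learn (hypothetical example content below)
    description: str
    # How this component is expected to be learned
    process: LearningProcess
    # Instructional methods chosen to support that learning process
    principles: list[str] = field(default_factory=list)


def unaligned(components: list[KnowledgeComponent]) -> list[str]:
    """Return descriptions of components with no instructional principle attached."""
    return [kc.description for kc in components if not kc.principles]


# Hypothetical design: the second component is missing an instructional principle
design = [
    KnowledgeComponent("Compute the area of a rectangle",
                       LearningProcess.MEMORY_FLUENCY, ["spaced practice"]),
    KnowledgeComponent("Explain why area scales with the square of the scale factor",
                       LearningProcess.SENSE_MAKING),
]
print(unaligned(design))
```

A designer (human or agent) would use a check like this to notice that the second knowledge component still needs an instructional method matched to its sense-making process.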
Multi-Agent Systems (MAS) are particularly well-suited for this task because they allow for distributed decision-making and problem-solving. This means different AI agents can assume specialized pedagogical roles and work together to complete complex instructional design tasks, much like a team of human educators.
Three Systems for Generating Learning Activities
The researchers designed and tested three distinct systems for generating secondary Math and Science learning activities:
- Single-Agent System (SAS): This served as a baseline, simulating the simple prompts a teacher might write without specialized AI knowledge. It represents a basic, non-expert interaction with an LLM.
- Role-Based Multi-Agent System (MAS-Roles): This system operationalizes the KLI framework through a sequential pipeline. Specialized agents, such as a Knowledge Component (KC) Agent, a Learning Process Agent, an Instructional Principle Agent, a Design Agent, and a Feedback Agent, work in a fixed order, with the output of each agent feeding into the next.
- Multi-Agent System with Conquer and Merge Discussion (MAS-CMD): This is a more dynamic, collaborative architecture. It instantiates three distinct agents, each given a different "teacher persona" (e.g., Behaviorist, Constructivist). These agents independently draft learning activities, then engage in a structured, multi-turn dialogue to give feedback and revise their drafts. A final decision agent then selects the best activity.
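The difference between the two multi-agent architectures is mainly one of control flow, which the following Python sketch tries to illustrate. It is not the study's implementation: `fake_llm` is a hypothetical stand-in for a real model call that just tags its input, the decision agent is stubbed as a `choose` function that returns an index, and the exact discussion protocol (every persona critiques every other draft, then each revises its own) is an assumption.

```python
from typing import Callable


def fake_llm(role: str, prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; tags the prompt so the flow is visible
    return f"[{role}] {prompt}"


def mas_roles(task: str, llm: Callable[[str, str], str] = fake_llm) -> str:
    """Sequential pipeline: each specialist agent receives the previous agent's output."""
    roles = ["KC Agent", "Learning Process Agent", "Instructional Principle Agent",
             "Design Agent", "Feedback Agent"]
    output = task
    for role in roles:
        output = llm(role, output)
    return output


def mas_cmd(task: str, personas: list[str], rounds: int,
            llm: Callable[[str, str], str] = fake_llm,
            choose: Callable[[list[str]], int] = lambda drafts: 0) -> str:
    """Conquer and merge: independent drafts, peer-feedback rounds, then a decision step."""
    n = len(personas)
    # Conquer: each persona drafts an activity independently
    drafts = [llm(p, task) for p in personas]
    for _ in range(rounds):
        # Discussion: each persona critiques every other persona's current draft
        feedback = {
            i: [llm(personas[j], f"critique: {drafts[i]}") for j in range(n) if j != i]
            for i in range(n)
        }
        # Each persona revises its own draft using the critiques it received
        drafts = [llm(personas[i], f"revise {drafts[i]} using {len(feedback[i])} comments")
                  for i in range(n)]
    # Merge/decide: a decision agent picks one draft (stubbed as an index selector)
    return drafts[choose(drafts)]


print(mas_roles("Design a geometry activity"))
personas = ["Behaviorist", "Constructivist", "Cognitivist"]  # third persona is an assumption
print(mas_cmd("Design a geometry activity", personas, rounds=1))
```

In `mas_roles`, information only flows forward, so an early agent never sees later critique. In `mas_cmd`, every draft is exposed to the other personas before the decision step, which is one plausible reason a discussion-based design could yield more varied and refined activities, at the price of many more model calls.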
Evaluation by Teachers and AI
The generated learning materials were evaluated using a mixed-methods approach. Twenty practicing secondary Math and Science teachers provided expert human evaluation, using an adapted Quality Matters (QM) K-12 rubric. Additionally, a complementary LLM-as-a-judge system was employed, using both the QM rubric and a more detailed Integrated Learning Sciences Evaluation Rubric.
Key Findings: Teachers Prefer Collaboration and Creativity
While quantitative rubric scores showed only small, often statistically insignificant differences between the systems, the qualitative feedback from educators painted a clear and compelling picture. Teachers strongly preferred the activities generated by the collaborative MAS-CMD system.
Educators described the MAS-CMD outputs as markedly more creative, contextually relevant, and classroom-ready. They praised its "fantastic" and "so creative" ideas, which often incorporated strong real-world contexts, such as an "urban planning theme" that made geometry meaningful. Teachers also valued the comprehensive packages MAS-CMD provided, including worksheets, exit tickets, and even teacher dialogue, which they called "Super helpful" and said saved them time.
In contrast, the SAS system, while producing relevant content, often lacked polish, completeness, and innovation. The MAS-Roles system was seen as an improvement but sometimes felt generic or unresponsive to specific requests.
The study also highlighted a trade-off between quality and computational cost. The MAS-CMD system, which produced the highest-quality outputs according to teachers, was also the most computationally expensive in terms of time, tokens used, and requests made.
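A rough way to see why a discussion-based design costs more is to count model calls. The Python sketch below does this under stated assumptions only: one call per agent turn, five agents in the role-based pipeline (the agents listed for MAS-Roles), and a MAS-CMD discussion in which each persona critiques every other draft and then revises its own once per round. These counts are illustrative and are not the study's reported measurements.

```python
def calls_sas() -> int:
    # A single prompt and a single response
    return 1


def calls_mas_roles(num_agents: int = 5) -> int:
    # Assumption: one call per specialist agent in the sequential pipeline
    return num_agents


def calls_mas_cmd(num_personas: int = 3, rounds: int = 1) -> int:
    # Assumption: independent drafts, then per round every persona critiques
    # every other draft and each persona revises its own, then one decision call
    drafts = num_personas
    critiques = num_personas * (num_personas - 1)
    revisions = num_personas
    decision = 1
    return drafts + rounds * (critiques + revisions) + decision


print(calls_sas(), calls_mas_roles(), calls_mas_cmd(3, 1), calls_mas_cmd(3, 2))
# → 1 5 13 22
```

Under these assumptions, the number of critique calls grows quadratically with the number of personas and linearly with the number of discussion rounds, and each call also carries a longer context than in the sequential pipeline, so time and token usage grow faster still.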
Implications for AI in Education
This research suggests that simply using LLMs for content generation isn’t enough. Embedding learning science principles directly into AI architectures, particularly through collaborative multi-agent systems, offers a scalable path for creating high-quality educational content. The study underscores that the architecture of these multi-agent systems matters; a collaborative, discussion-based model like MAS-CMD can outperform sequential ones in producing innovative and coherent designs.
The divergence between quantitative and qualitative results is also a significant finding. It suggests that standardized rubrics may not fully capture the nuanced, multifaceted nature of what experienced educators perceive as “quality” in a lesson plan. The rich qualitative feedback provided insights into what teachers truly value: creativity, deep contextual relevance, and practical utility that reduces their workload.
Ultimately, this study points towards a promising future for AI in education, where LLM tools are not just content generators but pedagogically intelligent partners by design, helping to bridge the “prompting gap” and empower educators with innovative, classroom-ready materials.


