TLDR: PHYSICSMINIONS is a coevolutionary multi-agent AI system designed to solve the complex problems set in Physics Olympiads. It features three studios (Visual, Logic, and Review) that work together in an iterative feedback loop. With this system, open-source AI models have achieved gold-medal performance on the latest International Physics Olympiad (IPhO) for the first time, significantly outperforming single-model approaches and ranking among the top human contestants.
Physics Olympiads are widely recognized as the ultimate test of physical intelligence, demanding not only deep understanding of physics principles but also complex reasoning and the ability to interpret various forms of information, including diagrams and plots. For a long time, artificial intelligence (AI) systems have struggled to achieve gold-medal-level performance in these challenging competitions, with most existing approaches relying on single models that fall short of the required multimodal understanding.
Addressing this significant gap, a groundbreaking new system called PHYSICSMINIONS has been introduced. This coevolutionary multi-agent system is specifically designed to tackle the intricacies of Physics Olympiads, pushing the boundaries of what AI can achieve in competitive physics problem-solving.
How PHYSICSMINIONS Works: A Collaborative Approach
PHYSICSMINIONS operates through a unique architecture comprising three synergistic studios, each with specialized agents working in harmony:
1. Visual Studio: This studio is responsible for interpreting diagrams, plots, and other visual inputs. It transforms raw visual information into a structured, unambiguous format, such as a JSON description. This process involves an Inspector that identifies image types and extracts details, an Introspector that refines this information for consistency, and a Verifier that checks for errors. By converting visual data into a structured representation, the Visual Studio significantly reduces ambiguity and ensures accurate input for the subsequent stages.
2. Logic Studio: Once the visual information is processed, the Logic Studio takes over to formulate solutions. It consists of a Solver, which generates an initial solution, and an Introspector, which iteratively refines this solution. The solutions are presented in a structured format, including a summary and a detailed, step-by-step analysis with equations written in TeX. This structured approach makes the reasoning explicit and errors traceable, facilitating targeted improvements.
3. Review Studio: This is where the crucial dual-stage verification takes place. The Review Studio employs two types of verifiers: a Physics-Verifier and a General-Verifier. The Physics-Verifier conducts domain-specific checks, ensuring physical consistency, correct units, and appropriate use of constants. If this stage passes, the General-Verifier then performs more detailed inspections of logical consistency, reasoning flow, and numerical calculations. If any issues are found, a detailed bug report is sent back to the Logic Studio, guiding the Introspector to revise the solution. This iterative feedback loop, where solutions are continuously critiqued and refined, is at the heart of the system’s coevolutionary process.
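The feedback loop between the Logic Studio and the Review Studio can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Solution`, `review`, `solve_with_feedback`, and `max_revisions` are hypothetical, each agent is stood in for by a plain function (in the real system these would be model calls), and the stopping rule (accept on the first clean review, or give up after a fixed revision budget) is assumed for illustration. The `visual` dictionary mimics the kind of structured JSON description the Visual Studio is described as producing; its field names are invented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Solution:
    summary: str      # short statement of the final answer
    steps: List[str]  # step-by-step analysis, equations in TeX


# Hypothetical convention: a verifier returns None on a pass, else a bug report.
Verifier = Callable[[dict, Solution], Optional[str]]


def review(visual: dict, solution: Solution,
           physics_verifier: Verifier, general_verifier: Verifier) -> Optional[str]:
    """Dual-stage check: the general stage runs only if the physics stage passes."""
    report = physics_verifier(visual, solution)
    if report is not None:
        return report
    return general_verifier(visual, solution)


def solve_with_feedback(problem: str, visual: dict, solver, introspector,
                        physics_verifier: Verifier, general_verifier: Verifier,
                        max_revisions: int = 3) -> Tuple[Solution, bool, List[str]]:
    """Draft a solution, then revise it until a review passes or the budget runs out."""
    solution = solver(problem, visual)
    reports: List[str] = []
    for attempt in range(max_revisions + 1):
        report = review(visual, solution, physics_verifier, general_verifier)
        if report is None:
            return solution, True, reports
        reports.append(report)
        if attempt < max_revisions:
            # The bug report guides the Introspector's revision.
            solution = introspector(problem, visual, solution, report)
    return solution, False, reports


# Toy stand-ins for the agents (assumptions, for demonstration only).
def toy_solver(problem, visual):
    return Solution("t = 2", [r"t = \sqrt{2h/g}", "t = 2"])  # unit is missing


def toy_introspector(problem, visual, solution, report):
    return Solution("t = 2 s", solution.steps[:-1] + [r"t = 2\,\mathrm{s}"])


def toy_physics_verifier(visual, solution):
    if solution.summary.endswith(" s"):
        return None
    return "final answer is missing its unit"


def toy_general_verifier(visual, solution):
    return None  # the toy never finds logical or numerical issues


problem = "A ball is dropped from rest at the height shown. Take g = 10 m/s^2."
visual = {"image_type": "diagram", "objects": [{"label": "ball", "height_m": 20.0}]}

final, accepted, reports = solve_with_feedback(
    problem, visual, toy_solver, toy_introspector,
    toy_physics_verifier, toy_general_verifier,
)
print(final.summary, accepted, reports)
```

In this toy run the first draft fails the physics check because the unit is missing, the report goes back to the introspector, and the revised solution passes both stages. Placing the general check behind the physics check mirrors the ordering described above, where the more detailed inspection happens only once the domain-specific one passes.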
Historic Breakthroughs and Human-Level Performance
Evaluated on the HiPhO benchmark, which includes seven of the latest physics Olympiads, PHYSICSMINIONS has delivered remarkable results:
- Strong Generalization: The system consistently improves the performance of both open-source and closed-source AI models of different sizes, demonstrating clear benefits over their single-model baselines.
- Historic Medal Achievements: Open-source models, which previously secured only 1-2 gold medals, now achieve 6 gold medals across 7 Olympiads with PHYSICSMINIONS. Notably, this marks the first time an open-source model has achieved a gold medal in the latest International Physics Olympiad (IPhO) under the average-score metric.
- Scaling to Human Expert Levels: The system further advanced the open-source Pass@32 score to an impressive 26.8 out of 30 points on the latest IPhO. This places the AI system 4th among 406 contestants, far surpassing the top single-model score and outperforming 99% of human participants.
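The "outperforming 99%" figure follows from the rank quoted above. A quick check, assuming the system is simply counted as the 4th entry in a field of 406 (how ties and the AI's own slot are handled is not stated here, so the exact decimals are illustrative):

```python
rank = 4            # reported position of the system
contestants = 406   # reported size of the field

top_fraction = rank / contestants                  # share of the field at or above this rank
outperformed = (contestants - rank) / contestants  # share of the field ranked below

print(f"top {top_fraction:.2%} of the field")   # top 0.99% of the field
print(f"outperforms {outperformed:.2%}")        # outperforms 99.01%
```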
These achievements highlight PHYSICSMINIONS’s potential as a generalizable framework for Olympiad-level problem-solving, with the capacity to extend across various scientific disciplines.
Overcoming Limitations and Future Directions
While PHYSICSMINIONS represents a significant leap forward, challenges remain, particularly in precise data extraction from complex visual charts. The Visual Studio, despite its advancements, can still misinterpret some fine-grained details. Future work aims to enhance visual understanding, expand the use of external tools and domain-specific verifiers, and apply this coevolutionary paradigm to other challenging domains beyond physics.
The development of PHYSICSMINIONS signifies a major step towards creating AI systems that can not only understand complex scientific problems but also reason, verify, and refine solutions at a level comparable to, and in some cases exceeding, human experts.


