TLDR: EvoGraph is a new framework that allows software systems to automatically evolve and maintain their own code, documentation, and build processes. It uses a graph representation of all software artifacts, applies changes guided by specialized small language models (SLMs), and selects improvements based on performance, security, and other goals. This approach significantly reduces modernization costs and time, addressing common challenges in updating legacy systems and paving the way for continuously adapting software.
In the world of enterprise software, maintaining and updating vast amounts of legacy code is a monumental challenge. Many companies operate with millions of lines of older code that struggle to keep pace with modern business demands. Traditional modernization efforts often fail, with estimates suggesting over 70% do not succeed, largely because automated tools miss hidden business logic, performance requirements, or critical external connections. This often leads to a reliance on manual integration, which is slow and prone to errors.
To address these persistent issues, researchers Igor Costa and Christopher Baran from AutoHand AI have introduced a framework called EvoGraph. The system enables software to evolve its own source code, build processes, documentation, and even issue tickets automatically. EvoGraph represents every part of a software system as a typed directed graph, which lets it track and manage the relationships between different components. The full research paper describes the work in more detail.
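To make the idea of a typed directed graph concrete, here is a minimal sketch in Python. It is not EvoGraph's actual data model: the node kinds (`source`, `doc`, `build`, `ticket`), the edge types (`documents`, `builds`, `affects`), and the file and ticket names are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # hypothetical node types: "source", "doc", "build", "ticket"


@dataclass
class ArtifactGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # source node id -> set of (edge_type, destination node id)
    edges: Dict[str, Set[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def add_node(self, node_id: str, kind: str) -> None:
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[src].add((edge_type, dst))

    def neighbors(self, src: str, edge_type: Optional[str] = None) -> List[str]:
        """Destinations reachable from src, optionally filtered by edge type."""
        return sorted(
            dst
            for etype, dst in self.edges.get(src, set())
            if edge_type is None or etype == edge_type
        )


def build_example() -> ArtifactGraph:
    # All artifact names below are made up for the example.
    g = ArtifactGraph()
    g.add_node("billing.cbl", "source")
    g.add_node("billing.md", "doc")
    g.add_node("Makefile", "build")
    g.add_node("TICKET-1", "ticket")
    g.add_edge("billing.md", "documents", "billing.cbl")
    g.add_edge("Makefile", "builds", "billing.cbl")
    g.add_edge("TICKET-1", "affects", "billing.cbl")
    g.add_edge("TICKET-1", "affects", "billing.md")
    return g
```

Because edges carry a type, a query such as `build_example().neighbors("TICKET-1", "affects")` can answer "which artifacts does this ticket touch?" without scanning unrelated relationships.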
How EvoGraph Works
At its core, EvoGraph operates by applying learned “mutation operators” to this software graph. These mutations, which are essentially changes to the code, documentation, or build configurations, are guided by specialized small language models (SLMs). Unlike general-purpose large language models (LLMs), SLMs are fine-tuned for specific tasks and languages, which makes them efficient and effective for specialized jobs like legacy modernization.
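One way to picture a mutation operator is as a function that takes the graph and returns a modified candidate copy, leaving the original untouched until the candidate has been evaluated. The sketch below assumes a simple dictionary-based graph and uses a stub function, `fake_slm`, in place of a real model call. The names `make_mutation`, `fake_slm`, `print_old`, and `print_new` are hypothetical and do not come from the paper.

```python
import copy
from typing import Callable, Dict

# Assumed representation: node id -> {"kind": ..., "content": ...}
Graph = Dict[str, Dict[str, str]]


def make_mutation(kind: str, propose: Callable[[str], str]) -> Callable[[Graph], Graph]:
    """Build an operator that rewrites every node of the given kind.

    `propose` stands in for an SLM call that suggests new content.
    """

    def mutate(graph: Graph) -> Graph:
        candidate = copy.deepcopy(graph)  # never edit the live graph in place
        for node in candidate.values():
            if node["kind"] == kind:
                node["content"] = propose(node["content"])
        return candidate

    return mutate


def fake_slm(text: str) -> str:
    # Stub for illustration only: swaps a made-up deprecated call for its successor.
    return text.replace("print_old(", "print_new(")


graph: Graph = {
    "report.py": {"kind": "source", "content": "print_old(total)"},
    "report.md": {"kind": "doc", "content": "Calls print_old(total)."},
}

modernize_source = make_mutation("source", fake_slm)
candidate = modernize_source(graph)
```

Here only the `source` node is rewritten in `candidate`. The stale `doc` node would be the target of a separate documentation operator, which is the kind of cross-artifact follow-up the graph is meant to make visible.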
After applying mutations, EvoGraph selects the best changes based on a multi-objective fitness function. This means it evaluates potential changes against several critical criteria, including user task success, system performance (like reducing latency), security scores, business key performance indicators (KPIs), documentation freshness, and build reproducibility. This rigorous selection process ensures that only safe and beneficial changes are integrated into the system.
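The article does not say how EvoGraph combines these objectives, so the following sketch uses a common multi-objective technique, Pareto dominance, purely as an assumption. A candidate is kept only if no other candidate is at least as good on every objective and strictly better on at least one. The objective names and scores are invented, and every score is treated as "higher is better".

```python
from typing import Dict, List

Scores = Dict[str, float]  # objective name -> score, higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


# Hypothetical scores for three candidate mutations.
candidates = {
    "A": {"task_success": 0.9, "latency_gain": 0.4, "security": 0.80},
    "B": {"task_success": 0.9, "latency_gain": 0.2, "security": 0.80},
    "C": {"task_success": 0.7, "latency_gain": 0.1, "security": 0.95},
}

front = pareto_front(candidates)  # ["A", "C"]
```

Candidate B is dropped because A matches it on task success and security and beats it on latency. C survives despite weaker task success because nothing beats its security score. A weighted sum would be a simpler alternative, at the cost of having to pick weights up front.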
Key Advantages and Results
EvoGraph has demonstrated impressive capabilities across various benchmarks. For instance, it successfully fixed 83% of known security vulnerabilities in one benchmark. It also achieved 93% functional equivalence when translating COBOL code to Java, a notoriously difficult task. Furthermore, it maintained documentation freshness within two minutes of code changes, a significant improvement over manual processes.
Beyond these specific achievements, EvoGraph showed a 40% reduction in system latency and a remarkable sevenfold drop in feature lead time (the time it takes to go from a business request to a deployed feature) compared to strong existing methods. This highlights its potential to dramatically accelerate software development and maintenance cycles.
The Power of Small Language Models (SLMs)
A key innovation within EvoGraph is its extension, `evoGraph`, which specifically leverages small language models. The researchers emphasize that SLMs are not only powerful enough for specialized tasks in agentic systems but are also inherently more economical and operationally suitable than their larger counterparts. For example, SLMs like Phi-2 (2.7 billion parameters) and Nemotron-H (4.8 billion parameters) offer comparable performance to much larger models at a fraction of the computational cost.
This efficiency translates into significant cost savings. The `evoGraph` approach reduces GPU hours by 90% and energy consumption by a similar margin compared to methods using large language models like GPT-4. This makes large-scale legacy modernization projects economically viable even for organizations with limited resources. The smaller memory footprint of SLMs also allows for parallel processing of multiple code segments, further speeding up modernization efforts.
The `evoGraph` framework includes specialized SLMs for a range of legacy languages and platforms, including .NET, Lisp, CGI, ColdFusion, legacy Python, and C. These language-specific models are trained on relevant data, allowing them to capture domain-specific patterns and idioms that generalist models might miss. This specialization resulted in high semantic equivalence (82-96%) across different language translations.
Addressing Modernization Challenges
EvoGraph’s design directly tackles common reasons why modernization programs fail. It addresses implicit contracts by mining invariants from dynamic traces and legacy comments, blocking unsafe rollouts if these invariants are violated. It preserves performance by including latency and reproducible build metrics in its fitness evaluation. And it manages integration evolution by tracking cross-system dependencies, ensuring that mutations propagate correctly across different parts of the software ecosystem.
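As a rough illustration of gating rollouts on mined invariants, the sketch below infers simple value-range invariants from recorded traces of the legacy system and rejects a candidate whose traces fall outside them. Range invariants are an assumption made for brevity. The article does not describe which invariant forms EvoGraph mines, and the variable names and values are invented.

```python
from typing import Dict, Iterable, List, Tuple

Trace = Dict[str, float]  # variable name -> value observed in one run
Ranges = Dict[str, Tuple[float, float]]


def mine_ranges(traces: Iterable[Trace]) -> Ranges:
    """Infer a [min, max] invariant per variable from legacy traces."""
    ranges: Ranges = {}
    for trace in traces:
        for name, value in trace.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def violations(ranges: Ranges, trace: Trace) -> List[str]:
    """Variables in the trace whose value leaves the mined range."""
    return sorted(
        name
        for name, value in trace.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    )


def allow_rollout(ranges: Ranges, candidate_traces: Iterable[Trace]) -> bool:
    """Block the rollout if any candidate trace violates any invariant."""
    return all(not violations(ranges, t) for t in candidate_traces)


# Hypothetical traces recorded from the legacy system.
legacy = [
    {"discount": 0.00, "total": 100.0},
    {"discount": 0.15, "total": 85.0},
]
ranges = mine_ranges(legacy)

safe = allow_rollout(ranges, [{"discount": 0.10, "total": 90.0}])
unsafe = allow_rollout(ranges, [{"discount": 0.30, "total": 90.0}])
```

In this example the second candidate is blocked because it applies a discount the legacy system was never observed to give, the kind of implicit business rule that does not appear in any specification.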
In essence, EvoGraph offers a practical pathway toward “Software 3.0,” where systems can continuously adapt and evolve while remaining under measurable control. This framework represents a significant step forward in automating the complex and often costly process of maintaining and modernizing enterprise software.


