TLDR: Team Atlanta’s ATLANTIS system won the DARPA AI Cyber Challenge by integrating large language models (LLMs) with traditional security techniques like fuzzing and symbolic execution to autonomously discover and patch software vulnerabilities. The system features specialized modules for C, Java, and multi-language vulnerability detection, automated patch generation, and SARIF report assessment. ATLANTIS successfully identified numerous vulnerabilities, including 0-day bugs, demonstrating the significant potential of AI in enhancing cybersecurity.
ATLANTIS, a system developed by Team Atlanta, won the DARPA AI Cyber Challenge (AIxCC), a significant milestone for autonomous cybersecurity. It combines large language models (LLMs) with established security techniques to automatically discover and patch software vulnerabilities at scale.
The AIxCC, a two-year competition backed by substantial prizes and partnerships with leading AI companies, challenged teams to build Cyber Reasoning Systems (CRSs) capable of securing real-world open-source software. Team Atlanta, a collaboration of researchers from Georgia Institute of Technology, Samsung Research, KAIST, and POSTECH, successfully demonstrated ATLANTIS’s capabilities at DEF CON 33 in August 2025.
The ATLANTIS System: A Hybrid Approach to Cybersecurity
At its core, ATLANTIS is designed to orchestrate state-of-the-art vulnerability discovery methods, including symbolic execution, directed fuzzing, and static analysis. What sets it apart is the deep integration of LLMs, which helps overcome the traditional limitations of autonomous vulnerability discovery and patching. The system is built to handle multiple challenge projects concurrently, ensuring reliability and maximizing the utilization of allocated computing and LLM resources.
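To make the resource-utilization goal concrete, the following is a minimal Python sketch of one way to split a fixed pool of CPU cores and a shared LLM budget across concurrently running challenge projects. The article does not describe how ATLANTIS actually allocates resources. The `Project` class, the priority weights, and the functions `allocate_cores` and `split_llm_budget` are hypothetical illustrations, not part of ATLANTIS.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical fields: a challenge project name and a relative priority weight.
    name: str
    weight: float


def allocate_cores(projects: list[Project], total_cores: int) -> dict[str, int]:
    """Split total_cores across projects in proportion to their weights.

    Uses the largest-remainder method so that every core is assigned and
    none is left idle by rounding down.
    """
    total_weight = sum(p.weight for p in projects)
    if total_weight <= 0:
        raise ValueError("project weights must sum to a positive number")
    exact = {p.name: total_cores * p.weight / total_weight for p in projects}
    alloc = {name: int(share) for name, share in exact.items()}  # floor of each share
    leftover = total_cores - sum(alloc.values())
    # Give the leftover cores to the projects with the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def split_llm_budget(projects: list[Project], budget_usd: float) -> dict[str, float]:
    """Split a shared (hypothetical) LLM spending budget in proportion to weight."""
    total_weight = sum(p.weight for p in projects)
    return {p.name: budget_usd * p.weight / total_weight for p in projects}
```

For example, with weights 2, 1, and 1 and 10 cores, the first project receives 5 cores and the remaining 5 are divided between the other two. Largest-remainder rounding is one simple way to keep all allocated capacity in use. A real orchestrator would also have to rebalance as projects finish or change priority.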
The architecture of ATLANTIS is modular, comprising several specialized components:
- ATLANTIS-C: This module focuses on C and C++ projects, employing a multi-fuzzer ensemble approach. It combines various fuzzing engines like LibAFL, AFL++, and libFuzzer, augmented with LLM-driven seed generation and time-based task scheduling to efficiently find bugs.
- ATLANTIS-Java: Tailored for Java vulnerabilities, this subsystem adopts a sinkpoint-centered approach. It identifies security-sensitive API calls (sinkpoints) and uses a combination of static analysis, dynamic testing, and LLMs to both explore paths to these sinks and exploit them effectively. Key components include DEEPGENERATOR for high-throughput seed generation and a Concolic Executor for deep path exploration.
- ATLANTIS-Multilang: As the primary vulnerability discovery engine, ATLANTIS-Multilang supports fuzzing across multiple languages. It features various input generation modules with different levels of LLM dependence, from traditional fuzzers to highly LLM-intensive agents like the Multi-Language LLM Agent (MLLA). MLLA uses LLMs to derive tainted call graphs, identify bug candidates, and generate targeted inputs and scripts for exploitation.
- ATLANTIS-Patching: This crucial subsystem automatically generates security patches. It uses an ensemble of AI agents, each with distinct specializations and LLM models, to create and refine patches. A unified framework called CRETE supports efficient agent development and patch validation. A notable innovation is the use of custom LLMs trained for code context learning, allowing the system to intelligently retrieve relevant code snippets for more accurate patch generation.
- ATLANTIS-SARIF: This module is responsible for assessing Static Analysis Results Interchange Format (SARIF) reports broadcast during the competition. It uses reachability analysis and LLM-based validation to determine the correctness of reported vulnerabilities, prioritizing precision to avoid penalties for false assessments.
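The precision-first assessment described for ATLANTIS-SARIF can be sketched as follows. This minimal Python example assumes two inputs: a precomputed static call graph and a numeric confidence score from an LLM validator. Both inputs, the thresholds, and the functions `is_reachable` and `assess_report` are hypothetical stand-ins, not ATLANTIS's real interfaces. Reachability is checked with a breadth-first search from a fuzzing harness entry point. A verdict is issued only when reachability and the validator agree, and the sketch abstains otherwise.

```python
from collections import deque

# Hypothetical static call graph: each function maps to the functions it calls.
CallGraph = dict[str, set[str]]


def is_reachable(graph: CallGraph, entry: str, target: str) -> bool:
    """Breadth-first search from a harness entry point to the target function."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        if fn == target:
            return True
        for callee in graph.get(fn, set()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


def assess_report(
    graph: CallGraph,
    entry: str,
    target: str,
    llm_confidence: float,
    accept_at: float = 0.9,
    reject_at: float = 0.1,
) -> str:
    """Precision-first verdict for one SARIF result (hypothetical thresholds).

    target: function named by the SARIF result's location (resolved elsewhere).
    llm_confidence: hypothetical validator score in [0, 1] that the report is real.
    Disagreement between the two signals, or an uncertain score, yields "abstain".
    """
    reachable = is_reachable(graph, entry, target)
    if reachable and llm_confidence >= accept_at:
        return "valid"
    if not reachable and llm_confidence <= reject_at:
        return "invalid"
    return "abstain"
```

Requiring both signals to agree gives up some recall: reports the sketch cannot confidently classify are left unanswered. That trade-off matches the stated goal of avoiding penalties for wrong assessments.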
Overcoming Challenges with AI
The integration of LLMs presented several challenges, including context window limitations, high API costs, and the need for precise validation. ATLANTIS addresses these through specialized agent designs, multi-turn interactive retrieval, and reinforcement learning to optimize retrieval policies. For instance, the custom LLMs in ATLANTIS-Patching learn to identify and retrieve only the most relevant code contexts, significantly reducing token usage and improving patch quality.
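The token-saving idea behind selective context retrieval can be illustrated with a simple greedy selector. This is not the reinforcement-learned, multi-turn retrieval policy the article describes. It is a minimal Python sketch built on two assumptions: each candidate snippet already carries a relevance score, and token counts can be approximated by splitting on whitespace. The `Snippet` class, its fields, the `min_relevance` threshold, and `select_context` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    path: str         # hypothetical source file path
    text: str
    relevance: float  # hypothetical score from a retrieval model, in [0, 1]


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def select_context(
    snippets: list[Snippet], token_budget: int, min_relevance: float = 0.3
) -> list[Snippet]:
    """Greedily pick the most relevant snippets that fit in the token budget.

    Snippets below min_relevance are never included, even if budget remains,
    so the prompt stays small instead of being padded with weak context.
    """
    chosen: list[Snippet] = []
    used = 0
    for snip in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if snip.relevance < min_relevance:
            break  # every remaining snippet is even less relevant
        cost = approx_tokens(snip.text)
        if used + cost <= token_budget:
            chosen.append(snip)
            used += cost
    return chosen
```

A learned policy would replace the fixed relevance scores and threshold with decisions made turn by turn. Even this crude version shows the principle: a tight budget plus a relevance floor bounds how many tokens are sent to the model.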
Team Atlanta’s success is not just theoretical; the ATLANTIS system discovered 43 out of 70 total vulnerabilities in the final competition, including several 0-day bugs in widely used software like SQLite3 and Apache Commons Compress. These discoveries highlight the system’s ability to uncover previously unknown security flaws, demonstrating the practical impact of AI in enhancing national cybersecurity.
The team’s journey from AI skeptics to strong advocates underscores the transformative potential of LLMs in traditional security tasks. As AI models continue to advance, systems like ATLANTIS are poised to change how vulnerabilities are discovered and patched, offering a glimpse of a future in which software is secured autonomously. The complete system and its benchmarks are available as open source, and further insights will be shared through blog posts and academic publications. You can find the full research paper here: ATLANTIS: AI-driven Threat Localization, Analysis, and Triage Intelligence System.