TLDR: A new research paper introduces A1, an AI agent system that transforms Large Language Models (LLMs) into end-to-end exploit generators for smart contracts. A1 uses six specialized tools and execution feedback to autonomously discover, validate, and monetize real-world DeFi vulnerabilities. It achieved a 62.96% success rate on a benchmark dataset, recovering $9.33 million in value, and demonstrated advanced reasoning capabilities beyond traditional fuzzing tools. However, the study also highlights a significant economic asymmetry, where attackers can profit from vulnerabilities at values 10 times lower than what defenders require to break even, raising concerns about the future balance of offensive and defensive AI capabilities in blockchain security.
Smart contracts, the self-executing programs powering Decentralized Finance (DeFi), manage vast sums of digital assets, with over $111 billion USD currently locked. However, their autonomy and direct control over value make them prime targets for attackers, and losses to date exceed $11.59 billion USD. Traditional security practices, which rely on manual reviews and automated tools, struggle to keep pace with the escalating complexity and volume of contracts. Automated tools in particular suffer from high false positive rates and cannot confirm whether a reported issue is actually exploitable.
Introducing A1: An AI Agent for Smart Contract Exploit Generation
A new research paper, “AI Agent Smart Contract Exploit Generation”, introduces A1, an agentic, execution-driven system designed to turn any Large Language Model (LLM) into an end-to-end exploit generator for smart contracts. Unlike traditional fuzzers, which depend on rigid, hand-crafted heuristics, A1 operates without such constraints, offering a more flexible and adaptable approach to vulnerability discovery. It combines human-like reasoning with machine-scale speed and cost, acting as an always-on auditor.
A1 is equipped with six domain-specific tools that enable autonomous vulnerability discovery. These tools allow the agent to understand smart contract behavior, generate exploit strategies, test them against blockchain states, and refine its approach based on execution feedback. Crucially, every output is validated by concrete execution to eliminate false positives, so only profitable proofs of concept (PoCs) are reported.
How A1 Works
The A1 system operates by providing an LLM agent with a suite of specialized tools. These include a source code fetcher to resolve proxy contracts, a constructor parameter tool to extract initialization details, a state reader to query contract functions, and a code sanitizer to remove non-essential elements. For validation, A1 uses a concrete execution tool built on Forge, which simulates blockchain forks to test exploit strategies against authentic on-chain states. A revenue normalizer tool then converts extracted tokens into native currency, ensuring economic validation of vulnerabilities.
The LLM agent autonomously decides how to leverage these tools, acting as a security analyst. It processes contract context, generates hypotheses, and refines its attack strategies based on real-time execution feedback, including profitability indicators, detailed transaction traces, and revert reasons. This iterative refinement process allows A1 to evolve its understanding of contract behavior and potential attack vectors.
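The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration, not A1's actual implementation: the names `Feedback`, `refine_exploit`, `propose`, and `execute` are hypothetical, the LLM call and the Forge-based fork execution are replaced by plain callables, and the default budget of five iterations is an assumption for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Result of running one candidate exploit (hypothetical structure)."""
    profit: float                        # normalized profit; > 0 counts as success
    revert_reason: Optional[str] = None  # why the transaction reverted, if it did
    trace: str = ""                      # transaction trace shown to the model


def refine_exploit(
    propose: Callable[[List[Feedback]], str],
    execute: Callable[[str], Feedback],
    max_iterations: int = 5,
) -> Optional[str]:
    """Propose, execute, and refine until a candidate is profitable.

    `propose` stands in for the LLM: it sees all earlier feedback and
    returns a new candidate. `execute` stands in for concrete execution
    on a forked chain. Returns the first profitable candidate, or None
    if the iteration budget runs out.
    """
    history: List[Feedback] = []
    for _ in range(max_iterations):
        candidate = propose(history)
        feedback = execute(candidate)
        if feedback.profit > 0:
            return candidate         # concretely validated, so not a false positive
        history.append(feedback)     # failed attempts inform the next proposal
    return None


# Stubs used only to make the sketch runnable.
def stub_propose(history: List[Feedback]) -> str:
    return f"poc_v{len(history) + 1}"


def stub_execute(candidate: str) -> Feedback:
    if candidate == "poc_v3":
        return Feedback(profit=1.5)
    return Feedback(profit=0.0, revert_reason="insufficient balance")


result = refine_exploit(stub_propose, stub_execute)
```

With these stubs the third candidate is the first profitable one, so the loop stops there. The key design point is that success is decided by execution results rather than by the model's own judgment.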
Performance and Impact
A1 was evaluated on 36 real-world vulnerable contracts across Ethereum and Binance Smart Chain (BSC). On the VERITE benchmark it achieved a 62.96% success rate (17 of 27 contracts). Beyond this dataset, A1 succeeded on 9 additional vulnerable contracts, 5 of which involve incidents that occurred after the strongest model’s training cutoff date, which points to generalization rather than recall of known incidents. Across all 26 successful cases, A1 extracted up to $8.59 million USD per case, totaling $9.33 million USD.
The research analyzed 432 experiments across six different LLMs and found diminishing returns from additional refinement: iterations 2 through 5 each added only a modest average marginal gain in success rate. Per-experiment costs ranged from $0.01 to $3.59. While premium models like OpenAI’s o3-pro and o3 achieved higher success rates (88.5% and 73.1% respectively), more economical models also demonstrated notable success at lower costs.
A1 also showed unique advantages over traditional fuzzing tools. For instance, in the SGETH incident, A1 naturally reasoned about the need for multi-actor collaboration to exploit a privilege management vulnerability. In the GAME incident, it demonstrated strategic contract composition by deploying a helper contract and orchestrating a precise sequence to exploit a reentrancy flaw. These capabilities highlight LLMs’ strength in complex reasoning and strategy composition, complementing the systematic state space exploration of fuzzers.
Economic Viability and Asymmetry
The paper introduces an economic feasibility framework to assess A1’s viability for continuous security monitoring. The analysis reveals a troubling asymmetry: at a 0.1% vulnerability incidence rate, attackers can achieve profitability with a $6,000 exploit value, while defenders require a $60,000 exploit value to break even. This 10x difference in required exploit value for profitability suggests that even with symmetric technological capabilities, the current economic structure of bug bounties versus direct exploitation inherently favors attackers.
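The 10x gap can be reproduced with simple break-even arithmetic. The sketch below is illustrative and is not the paper's framework: it assumes a hypothetical cost of $6 per scanned contract, that every vulnerable contract found is successfully exploited, that an attacker keeps the full exploit value, and that a defender earns a bounty worth 10% of it. These values are chosen only because they match the thresholds quoted above.

```python
import math


def break_even_exploit_value(
    cost_per_scan: float,
    incidence_rate: float,
    captured_fraction: float,
) -> float:
    """Exploit value at which expected revenue per scan equals the scan cost.

    Expected revenue per scan = incidence_rate * captured_fraction * value,
    so the break-even value is cost_per_scan / (incidence_rate * captured_fraction).
    """
    return cost_per_scan / (incidence_rate * captured_fraction)


COST_PER_SCAN = 6.0     # hypothetical cost in USD, assumed equal for both sides
INCIDENCE_RATE = 0.001  # 0.1% of scanned contracts are vulnerable (from the text)
ATTACKER_SHARE = 1.0    # assumption: attacker keeps the full exploit value
DEFENDER_SHARE = 0.1    # assumption: bounty pays 10% of the exploit value

attacker_threshold = break_even_exploit_value(
    COST_PER_SCAN, INCIDENCE_RATE, ATTACKER_SHARE
)
defender_threshold = break_even_exploit_value(
    COST_PER_SCAN, INCIDENCE_RATE, DEFENDER_SHARE
)

print(f"attacker break-even: ${attacker_threshold:,.0f}")
print(f"defender break-even: ${defender_threshold:,.0f}")
print(f"ratio: {defender_threshold / attacker_threshold:.1f}x")
```

Under these assumptions the thresholds come out at roughly $6,000 and $60,000. Because scan cost and incidence rate are the same for both sides, the ratio depends only on the fraction of exploit value each side captures.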
This imbalance raises fundamental questions about whether AI agents inevitably favor exploitation over defense. To achieve equilibrium, either bug bounty payouts would need to significantly increase, or defensive scanning costs would need to decrease substantially relative to offensive costs. The findings emphasize that A1’s utility is maximized when integrated into continuous monitoring systems that can initiate analysis with minimal delay, as longer detection delays significantly reduce success probabilities.
Limitations and Future Outlook
The study acknowledges several limitations, including its focus on 36 real incidents (a small fraction of total DeFi hacks), reliance on proprietary models, and simplified assumptions regarding infrastructure costs and exploit caps. A1 currently supports only EVM-compatible contracts with verified source code, leaving complex proxies or non-EVM rollups out of scope. Despite these limitations, A1 represents a significant step forward, demonstrating that agentic LLMs open a new design space in smart contract security, complementing existing tools and potentially transforming the landscape of DeFi auditing.


