
AI’s Growing Prowess in Smart Contract Exploits: Understanding the Threat and Defense

TLDR: A new research paper introduces REX, a framework that uses Large Language Models (LLMs) to automatically generate and validate exploits for vulnerable smart contracts. The study found that LLMs, particularly Gemini 2.5 Pro and GPT-4.1, can reliably create functional exploits with high success rates, driven by their internal reasoning capabilities rather than contract complexity. The paper also explores various defense strategies, showing that combined measures can significantly reduce exploit success, though some vulnerability types remain challenging.

Smart contracts, the self-executing agreements on blockchain, have revolutionized various industries. However, their immutable nature means that even a small vulnerability can lead to permanent and substantial financial losses. A notable example is the February 2025 exploit on Bybit’s Safe multi-signature wallet, which resulted in a staggering 1.5 billion US dollars being drained.

While traditional vulnerability detection tools like Slither and Mythril exist, they often struggle with accuracy and scalability, performing poorly in real-world scenarios. This is where Large Language Models (LLMs) come into play. LLMs have shown impressive capabilities in code-related tasks, including generation, summarization, and bug fixing, making them a promising avenue for identifying smart contract vulnerabilities.

Introducing REX: Automated Exploit Generation

A recent research paper, Prompt to Pwn: Automated Exploit Generation for Smart Contracts, explores the feasibility of using LLMs for Automated Exploit Generation (AEG) against vulnerable smart contracts. The researchers introduce REX, a novel framework that integrates LLM-based exploit synthesis with the Foundry testing suite. REX enables the automated generation and validation of proof-of-concept (PoC) exploits, offering an end-to-end pipeline for exploit generation, compilation, execution, and verification.

The REX framework operates in five key steps:

  • Preprocessing: Input smart contracts are stripped of comments and other non-functional content, ensuring the LLM focuses solely on the core contract logic.
  • Script generation: Given a vulnerable contract, the LLM generates two Foundry scripts: an exploit contract that triggers the vulnerability and a test contract that validates its success. The LLMs iteratively optimize their prompts and reason step by step to improve accuracy.
  • Script optimization (optional): Common errors, such as non-checksummed Ethereum addresses and missing ‘payable’ casts, are fixed automatically.
  • Compilation and testing: The generated scripts are compiled and executed within a Foundry project, verifying them both syntactically and semantically.
  • Iterative feedback: If compilation or testing fails, error messages are returned to the LLM so it can correct and regenerate the scripts, until a valid exploit is found or a retry limit is reached.
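The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `generate` and `compile_and_test` callables are hypothetical stand-ins for the LLM and the Foundry test harness.

```python
import re

# Minimal sketch of REX's generate/compile/test feedback loop, assuming
# hypothetical `generate` and `compile_and_test` callables stand in for
# the LLM and the Foundry harness described in the paper.

def strip_comments(source: str) -> str:
    """Step 1: drop comments so the model sees only core contract logic."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    return re.sub(r"//[^\n]*", "", source)                      # line comments

def rex_loop(generate, compile_and_test, contract_source, max_retries=5):
    """Steps 2-5: generate scripts, validate them, and feed errors back."""
    contract = strip_comments(contract_source)
    feedback = None
    for _ in range(max_retries):
        # Step 2: the LLM emits an exploit contract plus a validating test,
        # optionally conditioned on error messages from the previous round.
        exploit, test = generate(contract, feedback)
        # Steps 3-4: auto-fixing, then compiling and running under Foundry.
        passed, errors = compile_and_test(exploit, test)
        if passed:
            return exploit  # validated proof-of-concept exploit
        # Step 5: return the failure output to the LLM and retry.
        feedback = errors
    return None  # retry budget exhausted without a working exploit
```

The loop terminates either with a validated exploit or after the retry limit, mirroring the paper's end-to-end generate-validate cycle.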

LLM Performance in Exploit Generation

The research evaluated five state-of-the-art LLMs: GPT-4.1, Gemini 2.5 Pro, Claude Opus 4, DeepSeek, and Qwen3 Plus. They were tested on both synthetic benchmarks (SmartBugs-Curated) and real-world smart contracts affected by known high-impact exploits (WEB3-AEG). The results were compelling: modern LLMs can reliably generate functional PoC exploits for diverse vulnerability types, with success rates reaching up to 92%.

Notably, Gemini 2.5 Pro and GPT-4.1 consistently outperformed the others in both synthetic and real-world scenarios. Gemini 2.5 Pro achieved the highest average success rate of 67.3% across vulnerability types, excelling in arithmetic bugs (92.9%), front-running (75.0%), and unchecked low-level calls (56.7%). It even demonstrated autonomous exploit discovery and expert-like reasoning in some real-world cases. That said, while LLMs showed strong performance, they primarily generated single-contract exploits. Human experts, in contrast, often craft complex exploit chains that span multiple contracts and interact with DeFi protocols for maximum profit.

Factors Influencing Exploit Generation

The study also delved into factors affecting AEG effectiveness. The LLM’s inherent capabilities emerged as the primary determinant of success. Models with stronger general coding abilities, as evidenced by benchmarks like Aider and LMArena, performed better in exploit generation. Recurring failure patterns included cryptographic limitations (e.g., incorrect Ethereum addresses) and semantic misunderstandings (e.g., misusing the ‘payable’ modifier).

Interestingly, structural properties of target contracts, such as code length or complexity, showed only weak correlations with AEG success. This suggests that while complexity might correlate with vulnerability, it doesn’t reliably predict how easily an LLM can exploit it. Vulnerability types with predictable structures, like arithmetic overflows, were more exploitable due to their simplicity and fixed patterns. Lastly, prompt engineering, while helpful for output format, had limited effect on overall AEG performance, indicating that the LLM’s internal reasoning capacity is more crucial than external instructions.

Defending Against LLM-Based Threats

The research also proposes several defense strategies to mitigate LLM-driven threats. These include:

  • Externalization via Code Splitting: Decomposing contract logic into modular components (e.g., separating proxies) forces LLMs to reason across multiple contracts, increasing exploit generation difficulty.
  • Structural, Not Superficial, Complexity: Using deep inheritance trees, abstract interfaces, and polymorphic dispatch complicates semantic tracing for LLMs, reducing exploit success.
  • Breaking Canonical Signatures: Diversifying vulnerability contexts with redundant logic, unconventional naming, or control-flow indirection can disrupt the LLM’s pattern-matching abilities.
  • Decoy Vulnerabilities: Intentionally introducing false-positive patterns that resemble canonical vulnerabilities can mislead LLMs.
  • Use of Edge Syntax and Low-Level Features: Implementing critical logic using less common Solidity constructs like inline Yul or assembly can introduce semantic obfuscation.
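As a rough illustration of the "Breaking Canonical Signatures" idea, the sketch below renames textbook identifiers (e.g. `withdraw`, `balances`) in a Solidity snippet, disrupting the token-level patterns an LLM matches on without changing contract semantics. The renaming map and snippet are illustrative assumptions, not from the paper, and a production tool would operate on an AST rather than raw text to avoid touching strings and comments.

```python
import re

# Hypothetical identifier-diversification pass: canonical names that appear
# in textbook vulnerability examples are swapped for unconventional ones.
# This targets the LLM's pattern matching, not the contract's behavior.
CANONICAL_RENAMES = {
    "withdraw": "settleOutbound",
    "balances": "ledgerState",
    "owner": "custodian",
}

def diversify_identifiers(solidity_source: str) -> str:
    for old, new in CANONICAL_RENAMES.items():
        # \b restricts the rename to whole identifiers, so e.g.
        # "withdrawAll" would be left untouched.
        solidity_source = re.sub(rf"\b{old}\b", new, solidity_source)
    return solidity_source

# Illustrative snippet with a classic reentrancy shape (external call
# before the balance update).
snippet = """
function withdraw(uint amount) public {
    require(balances[msg.sender] >= amount);
    (bool ok, ) = msg.sender.call{value: amount}("");
    balances[msg.sender] -= amount;
}
"""
```

Applied to the snippet, the pass yields a contract that still contains the same reentrancy flaw, but no longer matches the `withdraw`/`balances` surface pattern seen in training corpora.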

While individual defensive modifications showed limited impact, combining multiple techniques significantly reduced the success rate of LLM-based AEG. However, even with strengthened protection, certain vulnerability types, such as bad randomness and time manipulation, remained susceptible to LLM-generated attacks.

In conclusion, LLMs are proving to be powerful tools for automated exploit generation in smart contracts, driven primarily by their reasoning and code generation abilities. This research not only highlights their potential but also provides valuable insights into developing more robust defense mechanisms for the evolving landscape of blockchain security.

Dev Sundaram (https://blogs.edgentiq.com)
Dev Sundaram is an investigative tech journalist with a nose for exclusives and leaks. With stints in cybersecurity and enterprise AI reporting, Dev thrives on breaking big stories—product launches, funding rounds, regulatory shifts—and giving them context. He believes journalism should push the AI industry toward transparency and accountability, especially as Generative AI becomes mainstream. You can reach him at: [email protected]
