TLDR: A new research paper reveals that widely used production Large Language Models (LLMs) are inadvertently generating code containing malicious URLs at a significant rate. An automated audit framework tested four LLMs (GPT-4o, GPT-4o-mini, Llama-4-Scout, DeepSeek-V3) and found that, on average, 4.2% of generated programs included harmful links, often in response to seemingly benign developer prompts. This systemic vulnerability, likely due to poisoned training data from the internet, highlights an urgent need for enhanced safety checks and defense mechanisms in AI-assisted code generation.
Large Language Models (LLMs) have become indispensable tools for software developers, assisting in generating vast amounts of code. However, their reliance on massive internet datasets for training introduces a critical security vulnerability: the potential to absorb and reproduce malicious content. A recent study by researchers at the University of Toronto has uncovered a systemic issue, revealing that production LLMs are indeed generating code containing hidden scam endpoints at a non-negligible rate.
The Hidden Danger in AI-Generated Code
The research highlights that the internet is rife with misinformation, scams, and deliberately poisonous content. Unlike traditional web services that can delist harmful URLs in real-time, malicious content embedded in an LLM’s training data becomes permanently ingrained. This means even if the original source is removed from the web, the poisoned data can persist within models, repeatedly exposing users to harm.
This threat is particularly severe in AI-assisted code generation. Developers often integrate LLM-generated code into production systems, where it might access sensitive data or acquire administrative privileges. The sheer volume of code generated makes manual review challenging, allowing cleverly hidden vulnerabilities or malicious payloads to go unnoticed until execution, leading to significant damage.
A Real-World Cryptocurrency Scam
A striking example of this danger occurred in November 2024, when a ChatGPT user lost approximately $2,500 in cryptocurrency. The user requested a Python script to buy a token on the pump.fun platform. After initial corrections, ChatGPT generated code that included a malicious API endpoint: https://api.solanaapis.com/pumpfun/buy. Crucially, the script instructed the user to include their wallet’s private key directly in the POST request payload—a fundamental security violation. Despite initial errors, the user debugged the script with ChatGPT’s help, eventually executing a version that transmitted their private key to the malicious endpoint, resulting in the theft of their cryptocurrency.
Subsequent investigation revealed that the malicious domain was part of a large-scale cryptocurrency theft operation, with fraudulent APIs strategically distributed across popular developer platforms like GitHub and Stack Exchange to enhance their perceived legitimacy. This incident underscores that malicious actors exploit a key characteristic of LLM behavior: when legitimate services cannot fulfill highly specific user requirements, models may preferentially recommend endpoints that claim to provide exact functionality matches, regardless of security implications.
Uncovering the Vulnerability: The Automated Audit Framework
To systematically evaluate this threat, the researchers introduced a scalable, automated audit framework. This framework operates in four stages:
- Malicious URL Collection: It starts by gathering URLs from existing phishing databases, such as those maintained by MetaMask and PhishFort.
- Prompt Synthesis: A “Prompt LLM” analyzes the content of these malicious pages to synthesize innocuous, developer-style programming tasks. These prompts are designed to be specific, incorporating keywords from the malicious pages, and appear as benign coding requests.
- Code Generation and URL Extraction: These synthesized prompts are then fed to “Codegen LLMs” (the target production LLMs), which generate code snippets. A URL extraction module identifies all endpoints embedded in the generated code.
- URL Malice Detection and Human Adjudication: The extracted URLs are evaluated by an ensemble of independent detectors (e.g., Google Safe Browsing). Finally, human experts review the prompts to ensure they are genuinely innocuous developer requests, not adversarial ones.
Alarming Findings Across Production LLMs
The large-scale evaluation involved four prominent production LLMs: GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3. The results were unequivocal: all tested models generated malicious code at a non-negligible rate.
- On average, 4.2% of programs generated in the experiments contained malicious URLs.
- When focusing solely on the URLs extracted from the generated code, an alarming 12% were found to be malicious.
- The combination of GPT-4o-mini as the prompt generator and GPT-4o as the code generator yielded the highest rate, with 5.94% of its generated files containing malicious URLs, and 17.60% of extracted URLs being malicious.
- Even with “creative sampling” (introducing more randomness in generation), the vulnerability persisted, demonstrating it’s a fundamental issue, not just an artifact of deterministic settings.
- Crucially, the researchers identified 177 truly innocuous prompts that consistently triggered all four LLMs to produce harmful outputs, underscoring the systemic nature of the problem.
A significant finding was the high overlap of malicious domains identified by models from different companies. This suggests that despite independent data collection efforts, the public internet itself acts as a common source, leading to a convergence in malicious domain knowledge that is absorbed by these models.
Also Read:
- Securing AI on the Go: A Look at Privacy and Security in Mobile Large Language Models
- Pinpointing Safety: A New Look at LLM Jailbreak Defenses Through Knowledge Neurons
The Urgent Need for Stronger Defenses
These findings provide strong empirical evidence that the training data of production LLMs has been successfully poisoned at scale. The prevalence of malicious links in model-generated code poses a tangible and urgent threat to everyday users and developers. If executed, such code could lead to phishing attacks, malware payloads, or significant financial losses.
The study underscores the urgent need for more robust defense mechanisms and rigorous post-generation safety checks to mitigate the propagation of these hidden security threats. Developers and users should exercise extreme caution when integrating AI-generated code, especially when it involves external API calls or sensitive information. For more details, you can read the full research paper here.


