The Silent Threat: LLMs Found to Embed Malicious URLs in Generated Code

TLDR: A new research paper reveals that widely used production Large Language Models (LLMs) are inadvertently generating code containing malicious URLs at a significant rate. An automated audit framework tested four LLMs (GPT-4o, GPT-4o-mini, Llama-4-Scout, DeepSeek-V3) and found that, on average, 4.2% of generated programs included harmful links, often in response to seemingly benign developer prompts. This systemic vulnerability, likely due to poisoned training data from the internet, highlights an urgent need for enhanced safety checks and defense mechanisms in AI-assisted code generation.

Large Language Models (LLMs) have become indispensable tools for software developers, assisting in generating vast amounts of code. However, their reliance on massive internet datasets for training introduces a critical security vulnerability: the potential to absorb and reproduce malicious content. A recent study by researchers at the University of Toronto has uncovered a systemic issue, revealing that production LLMs are indeed generating code containing hidden scam endpoints at a non-negligible rate.

The Hidden Danger in AI-Generated Code

The research highlights that the internet is rife with misinformation, scams, and deliberately poisonous content. Unlike traditional web services that can delist harmful URLs in real-time, malicious content embedded in an LLM’s training data becomes permanently ingrained. This means even if the original source is removed from the web, the poisoned data can persist within models, repeatedly exposing users to harm.

This threat is particularly severe in AI-assisted code generation. Developers often integrate LLM-generated code into production systems, where it might access sensitive data or acquire administrative privileges. The sheer volume of code generated makes manual review challenging, allowing cleverly hidden vulnerabilities or malicious payloads to go unnoticed until execution, leading to significant damage.

A Real-World Cryptocurrency Scam

A striking example of this danger occurred in November 2024, when a ChatGPT user lost approximately $2,500 in cryptocurrency. The user requested a Python script to buy a token on the pump.fun platform. After initial corrections, ChatGPT generated code that included a malicious API endpoint: https://api.solanaapis.com/pumpfun/buy. Crucially, the script instructed the user to include their wallet’s private key directly in the POST request payload—a fundamental security violation. Despite initial errors, the user debugged the script with ChatGPT’s help, eventually executing a version that transmitted their private key to the malicious endpoint, resulting in the theft of their cryptocurrency.

Subsequent investigation revealed that the malicious domain was part of a large-scale cryptocurrency theft operation, with fraudulent APIs strategically distributed across popular developer platforms like GitHub and Stack Exchange to enhance their perceived legitimacy. This incident underscores that malicious actors exploit a key characteristic of LLM behavior: when legitimate services cannot fulfill highly specific user requirements, models may preferentially recommend endpoints that claim to provide exact functionality matches, regardless of security implications.

Uncovering the Vulnerability: The Automated Audit Framework

To systematically evaluate this threat, the researchers introduced a scalable, automated audit framework. This framework operates in four stages:

Malicious URL Collection: It starts by gathering URLs from existing phishing databases, such as those maintained by MetaMask and PhishFort.
Prompt Synthesis: A “Prompt LLM” analyzes the content of these malicious pages to synthesize innocuous, developer-style programming tasks. These prompts are designed to be specific, incorporating keywords from the malicious pages, and appear as benign coding requests.
Code Generation and URL Extraction: These synthesized prompts are then fed to “Codegen LLMs” (the target production LLMs), which generate code snippets. A URL extraction module identifies all endpoints embedded in the generated code.
URL Malice Detection and Human Adjudication: The extracted URLs are evaluated by an ensemble of independent detectors (e.g., Google Safe Browsing). Finally, human experts review the prompts to ensure they are genuinely innocuous developer requests, not adversarial ones.

Alarming Findings Across Production LLMs

The large-scale evaluation involved four prominent production LLMs: GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3. The results were unequivocal: all tested models generated malicious code at a non-negligible rate.

On average, 4.2% of programs generated in the experiments contained malicious URLs.
When focusing solely on the URLs extracted from the generated code, an alarming 12% were found to be malicious.
The combination of GPT-4o-mini as the prompt generator and GPT-4o as the code generator yielded the highest rate, with 5.94% of its generated files containing malicious URLs, and 17.60% of extracted URLs being malicious.
Even with “creative sampling” (introducing more randomness in generation), the vulnerability persisted, demonstrating it’s a fundamental issue, not just an artifact of deterministic settings.
Crucially, the researchers identified 177 truly innocuous prompts that consistently triggered all four LLMs to produce harmful outputs, underscoring the systemic nature of the problem.

A significant finding was the high overlap of malicious domains identified by models from different companies. This suggests that despite independent data collection efforts, the public internet itself acts as a common source, leading to a convergence in malicious domain knowledge that is absorbed by these models.

Also Read:

The Urgent Need for Stronger Defenses

These findings provide strong empirical evidence that the training data of production LLMs has been successfully poisoned at scale. The prevalence of malicious links in model-generated code poses a tangible and urgent threat to everyday users and developers. If executed, such code could lead to phishing attacks, malware payloads, or significant financial losses.

The study underscores the urgent need for more robust defense mechanisms and rigorous post-generation safety checks to mitigate the propagation of these hidden security threats. Developers and users should exercise extreme caution when integrating AI-generated code, especially when it involves external API calls or sensitive information. For more details, you can read the full research paper here.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

The Silent Threat: LLMs Found to Embed Malicious URLs in Generated Code

The Hidden Danger in AI-Generated Code

A Real-World Cryptocurrency Scam

Uncovering the Vulnerability: The Automated Audit Framework

Alarming Findings Across Production LLMs

The Urgent Need for Stronger Defenses

Gen AI News and Updates

Rubrik Report Reveals Alarming Decline in Cyber Resilience Amidst AI Agent Proliferation

Anthropic Reveals First AI-Orchestrated Cyber Espionage Campaign by Chinese State-Sponsored Group

TrojAI Unveils Defend for MCP to Bolster Security for AI Agent Workflows

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates