
New Research Reveals AI Models Generate Code with Significant Security Vulnerabilities

TLDR: A recent study by researchers from New York University, Columbia University, Monash University, and CSIRO has found that state-of-the-art AI models, including advanced ‘reasoning’ models, produce code with significant security flaws 18% to 50% of the time. These vulnerabilities are particularly severe in areas like authentication and cookie management, raising concerns about the security of AI-generated code and the need for new solutions to manage these risks.

A groundbreaking new study, published on November 2, 2025, by a collaborative team of researchers from New York University, Columbia University, Monash University, and Australia’s national science agency, CSIRO, has unveiled a critical security challenge in the rapidly evolving field of artificial intelligence. The research indicates that state-of-the-art large language models (LLMs) are prone to generating code containing significant security vulnerabilities, with flaws appearing in 18% to 50% of the code produced.

The study involved tasking nine advanced LLMs, including ‘reasoning’ models like o3-mini and DeepSeek-R1, with generating Chrome extensions based on 140 different functional scenarios. The findings revealed that the vulnerabilities were most pronounced when models were instructed to build tools for ‘Authentication & Identity’ or ‘Cookie Management,’ where vulnerable code was produced up to 83% and 78% of the time, respectively. The most common and severe flaw identified was ‘Privileged Storage Access,’ which improperly exposed sensitive browser data such as cookies, history, and bookmarks to untrusted sources.
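To make the 'Privileged Storage Access' pattern concrete, here is a minimal illustrative sketch, not code from the study itself: an extension-style message handler that returns privileged browser data. The data values, origin names, and function names below are all hypothetical, and plain functions stand in for Chrome's `chrome.runtime.onMessage` handlers so the logic can be shown without the extension APIs. The insecure variant answers any sender; the safer variant checks the sender's origin against an allowlist.

```javascript
// Hypothetical stand-in for privileged browser data an extension can read
// (cookies, history, bookmarks). Values are illustrative only.
const PRIVILEGED_DATA = {
  cookies: ['session=abc123'],
  history: ['https://bank.example/login'],
};

// Insecure pattern the study describes: the handler releases privileged
// data to any sender, so any page that can message the extension gets it.
function handleMessageInsecure(request, sender) {
  if (request.type === 'getData') {
    return PRIVILEGED_DATA; // exposed to untrusted sources
  }
  return null;
}

// Safer pattern: verify the sender's origin before releasing anything,
// and deny by default.
const TRUSTED_ORIGINS = new Set(['https://dashboard.example.com']);

function handleMessageSecure(request, sender) {
  if (request.type === 'getData' && TRUSTED_ORIGINS.has(sender.origin)) {
    return PRIVILEGED_DATA;
  }
  return null; // unknown origin: refuse
}
```

In a real extension the same check would sit inside a `chrome.runtime.onMessage` or `onMessageExternal` listener, paired with a restrictive `externally_connectable` entry in the manifest rather than a wildcard.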

Perhaps more concerning, the study highlighted that newer, more advanced ‘reasoning’ models often performed worse than their predecessors, generating more vulnerabilities or a higher density of them. This suggests that the pursuit of more sophisticated AI capabilities does not automatically translate to improved security performance.

These findings contribute to a growing 'Productivity Paradox' in AI-powered coding. While AI can significantly accelerate code generation, potentially quadrupling output for some teams, it can also introduce new bottlenecks and risks. The increased volume of lower-quality, insecure code can intensify challenges in code review, testing, and rework, potentially diminishing overall productivity or even producing a net negative impact. The authors of the Chrome extension study emphasized that 'human oversight remains essential for security assurance,' yet existing human oversight mechanisms are already stretched thin.


To address these multifaceted challenges, the study underscores the urgent need for new tools and approaches to manage both the first-order and second-order consequences of AI-generated code. With the volume of AI-generated code projected to surpass one trillion lines per year, organizations will require new ways to maintain security, visibility, and control. One example the researchers cite is Macroscope, an AI tool designed to assist with code review and codebase comprehension, which demonstrated a 48% detection rate on real-world production bugs while generating 75% fewer comments than comparably accurate AI review tools. Such tools are becoming increasingly important if the AI productivity boom is not to collapse under the weight of unreviewed, insecure, and incomprehensible code.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
