BreakFun: Unmasking LLM Vulnerabilities Through Structured Data Exploitation

TLDR: BreakFun is a new jailbreaking method that exploits Large Language Models’ (LLMs) strong adherence to structured data schemas. It uses a three-part prompt—innocent framing, a ‘Trojan Schema’ designed to compel harmful content, and a Chain-of-Thought distraction—to bypass safety mechanisms. The attack achieved an 89% success rate across 13 diverse LLMs, revealing a ‘Guardrail Divide’ where open-source models were highly vulnerable. The ‘Trojan Schema’ was identified as the primary causal factor. A proposed defense, ‘Adversarial Prompt Deconstruction,’ successfully detected all BreakFun attacks by isolating and analyzing the true semantic intent.

Large Language Models (LLMs) are particularly good at handling structured data such as code or JSON. That ability drives much of their adoption, but it also creates a vulnerability. Researchers Amirkia Rafiei Oskooei and Mehmet S. Aktas from Yildiz Technical University and Intellica Business Intelligence have explored this weakness through a new method called BreakFun, which exploits an LLM’s strong tendency to follow structured schemas to bypass its safety mechanisms.

The core idea behind BreakFun is what the authors call “cognitive misdirection.” It’s designed to make an LLM focus so intensely on a complex, seemingly harmless technical task – like interpreting a data structure and simulating its output – that it overlooks any harmful content hidden within. This allows the model’s usual safety checks to be bypassed.

How BreakFun Works: A Three-Part Attack

BreakFun uses a clever three-part prompt to achieve its goal:

1. Innocent Framing: The prompt starts by presenting itself as a benign, educational request. For example, it might pretend the user is a new programmer trying to understand how a schema-guided generation library works. This makes the LLM adopt a helpful persona and lowers its guard.

2. Trojan Schema: This is the heart of the attack. It’s a carefully designed code snippet containing a data structure that looks legitimate but is crafted to force the LLM to generate harmful content. This is done through “adversarial naming” (choosing class and field names that subtly lead to harmful output) and “structural customization” (adapting a generic schema with specific classes and fields for different harmful tasks, like generating malware or disinformation).

3. Chain-of-Thought (CoT) Distraction: The final component instructs the LLM to “think step by step” about the hypothetical code execution. It asks the model to explain the code library, describe how the schema is built, and then provide a concrete example of the structured output. This creates a significant cognitive load, forcing the model to concentrate on the *process* of generation rather than the *content*, which draws its attention further away from the malicious payload.

Widespread Vulnerability and the “Guardrail Divide”

The researchers tested BreakFun across 13 different LLMs, including models from major providers like Google, OpenAI, and Anthropic, as well as open-source models. The results were striking: BreakFun achieved an average success rate of 89% across all models, demonstrating that this vulnerability is widespread and affects a broad range of LLMs, regardless of their size or provider.

A key finding was the “Guardrail Divide.” Locally-hosted, open-source models (Tier 1) were extremely vulnerable, with nearly perfect success rates (around 98%). Production-hardened API-based systems (Tier 2) showed more resistance but were still consistently bypassed (around 78% success rate). This suggests that while provider-side safety mechanisms help, they don’t fix the underlying vulnerability in the models themselves.

Identifying the Core Weakness

An ablation study, where components of the BreakFun prompt were individually removed, confirmed that the “Trojan Schema” is the primary cause of the jailbreak. Its removal drastically reduced the attack’s success rate. The Chain-of-Thought distraction was also found to be a critical enabler, while the innocent framing played a minor enhancing role.
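The structure of such an ablation can be sketched in a few lines. This is only an illustration of the leave-one-out design described above, not the authors' evaluation harness: the component labels and the `ablation_variants` helper are hypothetical names, and no prompt text or measured success rates are included.

```python
from typing import Dict, List

# Labels for the three prompt components named in the article.
# These are placeholders only; no actual prompt text is involved.
COMPONENTS: List[str] = ["innocent_framing", "trojan_schema", "cot_distraction"]


def ablation_variants(components: List[str]) -> Dict[str, List[str]]:
    """Return the full configuration plus one variant per removed component."""
    variants: Dict[str, List[str]] = {"full": list(components)}
    for removed in components:
        # Leave-one-out: keep every component except the one being ablated.
        variants[f"without_{removed}"] = [c for c in components if c != removed]
    return variants


if __name__ == "__main__":
    for name, parts in ablation_variants(COMPONENTS).items():
        print(f"{name}: {parts}")
```

Each variant would then be evaluated on the same set of tasks, and the drop in success rate relative to the `full` configuration indicates how much the removed component contributes.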

A Promising Defense: Adversarial Prompt Deconstruction

Based on their findings, the researchers proposed a defense mechanism called “Adversarial Prompt Deconstruction.” This guardrail uses a smaller, secondary LLM to analyze and deconstruct the user’s input before it reaches the main model. It works in three steps:

1. Literal Transcription: Extracts all human-readable text and content from structural formatting, isolating the attacker’s payload.

2. CoT Unwrapping: Forces the guardrail LLM to explicitly transcribe the extracted strings, creating a “sanitized context” for analysis.

3. Logical OR: If *any* single component of the transcribed content is deemed harmful, the entire prompt is flagged.
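The three steps above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the paper's implementation: the real guardrail uses a secondary LLM, whereas here a hypothetical keyword check (`keyword_judge`, `FLAGGED_TERMS`) stands in for its judgment, and the regex-based extraction, function names, and example inputs are all invented for the sketch.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for the guardrail LLM's harmfulness judgment.
FLAGGED_TERMS = ("malware", "disinformation")


def keyword_judge(text: str) -> bool:
    """Assumed placeholder: flags text containing any listed term."""
    lowered = text.lower()
    return any(term in lowered for term in FLAGGED_TERMS)


def split_identifier(name: str) -> str:
    """Turn CamelCase or snake_case names into plain lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return spaced.replace("_", " ").lower().strip()


def literal_transcription(prompt: str) -> List[str]:
    """Step 1: pull human-readable text out of structural formatting."""
    literals = re.findall(r'"([^"\n]*)"', prompt)
    without_literals = re.sub(r'"[^"\n]*"', " ", prompt)
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_literals)
    words = [split_identifier(i) for i in identifiers]
    return [s for s in literals + words if s.strip()]


def cot_unwrapping(components: List[str]) -> List[str]:
    """Step 2: restate each extracted string as its own plain-text line."""
    return [f"{i}. {text}" for i, text in enumerate(components, start=1)]


def is_flagged(prompt: str, judge: Callable[[str], bool] = keyword_judge) -> bool:
    """Step 3: logical OR, one harmful component flags the whole prompt."""
    lines = cot_unwrapping(literal_transcription(prompt))
    return any(judge(line) for line in lines)


if __name__ == "__main__":
    benign = 'class Recipe:\n    title: str = "Pancakes"\n    steps: list'
    suspicious = "class MalwareBuilder:\n    payload_code: str"
    print(is_flagged(benign), is_flagged(suspicious))
```

Splitting identifiers into words matters here because, per the article, the adversarial intent sits in class and field names rather than in ordinary prose. The any-component rule also suggests why benign but complex prompts can be over-flagged: a single suspicious-looking name is enough to reject the whole input.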

This proof-of-concept guardrail successfully detected 100% of BreakFun attacks. While it had an 18% false positive rate on complex benign prompts, indicating a need for further refinement, it validates that targeting the deceptive schema is a viable mitigation strategy.
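To make the two figures concrete, the sketch below computes a detection rate and a false positive rate from per-prompt outcomes. The function name and the counts (50 attack prompts, 50 benign prompts) are illustrative assumptions chosen only to reproduce the reported percentages; they are not the paper's dataset sizes.

```python
from typing import Sequence, Tuple


def guardrail_rates(is_attack: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) for a guardrail."""
    pairs = list(zip(is_attack, flagged))
    true_pos = sum(1 for attack, flag in pairs if attack and flag)
    false_neg = sum(1 for attack, flag in pairs if attack and not flag)
    false_pos = sum(1 for attack, flag in pairs if not attack and flag)
    true_neg = sum(1 for attack, flag in pairs if not attack and not flag)
    if true_pos + false_neg == 0 or false_pos + true_neg == 0:
        raise ValueError("need at least one attack prompt and one benign prompt")
    detection_rate = true_pos / (true_pos + false_neg)
    false_positive_rate = false_pos / (false_pos + true_neg)
    return detection_rate, false_positive_rate


if __name__ == "__main__":
    # Illustrative counts only: every attack flagged, 9 of 50 benign prompts flagged.
    is_attack = [True] * 50 + [False] * 50
    flagged = [True] * 50 + [True] * 9 + [False] * 41
    detection, false_positive = guardrail_rates(is_attack, flagged)
    print(f"detection rate: {detection:.0%}, false positive rate: {false_positive:.0%}")
```

The two rates are computed over different populations: detection over attack prompts only, false positives over benign prompts only. A guardrail can therefore catch every attack while still rejecting a noticeable share of legitimate requests.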

Implications for AI Safety

The BreakFun research highlights a fundamental tension between an LLM’s advanced capabilities and its security. The models’ drive to follow complex instructions, which makes them powerful, also makes them vulnerable to this type of exploitation. The paper concludes that securing future LLM systems will require moving beyond simple content filtering and developing deeper, more fundamental resilience against cognitive deception and structural exploits.
