CHAI: How Malicious Signs Can Hijack Embodied AI Decisions

TLDR: A new research paper introduces CHAI, a novel attack that exploits Large Visual-Language Models (LVLMs) in embodied AI systems. CHAI embeds deceptive natural language instructions as visual signs in an AI’s environment, systematically optimizing both the text and its appearance to hijack the AI’s command decisions. Experiments show high success rates in simulations and real-world robotic vehicles across tasks like drone landing, autonomous driving, and object tracking, demonstrating that visual text can override safety protocols and visual cues, even across different languages. This highlights a critical new vulnerability and the urgent need for advanced multimodal defenses in embodied AI.

Embodied Artificial Intelligence (AI) holds immense promise for the future of robotic systems, especially in handling unpredictable situations where traditional data is scarce. These advanced AIs, often powered by Large Visual-Language Models (LVLMs), are designed to use common-sense reasoning, grounded in what they see and do, to adapt to new real-world scenarios. Think of autonomous cars navigating unexpected road conditions or drones making critical decisions during emergencies. However, these very capabilities also open the door to new and sophisticated security vulnerabilities.

Introducing CHAI: A New Threat to Embodied AI

A recent research paper introduces a novel class of prompt-based attacks called CHAI (Command Hijacking against embodied AI). This attack exploits the ability of LVLMs to interpret multimodal language – meaning they understand both visual information and natural language instructions. CHAI works by embedding deceptive natural language instructions, such as misleading signs, directly into the visual input that an embodied AI system perceives. It then systematically searches for the most effective text and visual characteristics to generate what are called ‘Visual Attack Prompts’.

Unlike previous attacks that primarily target the perception layer (like dirty road patterns confusing lane detection or LiDAR spoofing), CHAI focuses on hijacking the intermediate, text-based planning decisions made by embodied AIs. This means it doesn’t just make the AI misinterpret what it sees; it makes the AI misinterpret what it should *do* based on what it sees and reads.

How CHAI Works Under the Hood

The core of CHAI is a two-stage optimization process. First, it builds a ‘dictionary’ of candidate malicious prompts. An ‘attacker LLM’ (another language model) converses with the target LVLM and learns which phrases are most likely to succeed, which narrows the otherwise vast space of possible wording. Second, it jointly optimizes the semantic content of the visual prompt (what the sign says) and its perceptual realization (how it looks: color, font, size, placement). Optimizing both together is meant to make the attack semantically persuasive and visually effective, even under varying conditions.
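The two stages can be illustrated with a toy sketch. This is not the paper's implementation. The names `VisualPrompt`, `build_dictionary`, `joint_search`, `toy_score_text` and `toy_score_prompt` are hypothetical. A fixed candidate list stands in for the attacker LLM's conversation, an exhaustive grid search stands in for whatever optimizer the authors use, and the scoring functions are made-up stubs where a real study would query the target model and observe only its output.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class VisualPrompt:
    text: str       # semantic content: what the sign says
    color: str      # perceptual realization: how it looks
    font_size: int
    position: str   # where the sign sits in the frame


def build_dictionary(candidates: Iterable[str],
                     score_text: Callable[[str], float],
                     keep: int) -> List[str]:
    """Stage 1 (simplified): keep the `keep` highest-scoring phrases."""
    return sorted(candidates, key=score_text, reverse=True)[:keep]


def joint_search(dictionary: Iterable[str],
                 colors: Iterable[str],
                 sizes: Iterable[int],
                 positions: Iterable[str],
                 score_prompt: Callable[[VisualPrompt], float]) -> VisualPrompt:
    """Stage 2 (simplified): search text and appearance together."""
    grid = itertools.product(dictionary, colors, sizes, positions)
    return max((VisualPrompt(*combo) for combo in grid), key=score_prompt)


# Made-up scoring stubs (assumptions). They only return a number, which
# mirrors the black-box setting: no weights or internals are inspected.
def toy_score_text(text: str) -> float:
    return float(len(set(text.lower().split()) & {"safe", "land", "here"}))


def toy_score_prompt(prompt: VisualPrompt) -> float:
    bonus = 1.0 if prompt.color == "green" else 0.0
    return toy_score_text(prompt.text) + bonus + prompt.font_size / 100


phrases = ["Safe to land here", "Keep out", "Land here", "Hello world"]
dictionary = build_dictionary(phrases, toy_score_text, keep=2)
best = joint_search(dictionary, ["red", "green"], [24, 48],
                    ["top", "center"], toy_score_prompt)
print(dictionary, best)
```

The point of the sketch is the structure: stage 1 shrinks the text space before stage 2 pays the cost of searching text and appearance jointly.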

The researchers designed CHAI as a ‘black-box’ attack, meaning it doesn’t need to know the internal workings, weights, or architecture of the target LVLM. This makes it highly relevant for real-world scenarios where many advanced LVLMs are only accessible through limited APIs.

Real-World Impact and Experiments

The CHAI attack was rigorously evaluated across four different LVLM agents and a real robotic vehicle, demonstrating its broad applicability and effectiveness. The applications included:

Drone Emergency Landing: Misleading a drone to land on a crowded, unsafe rooftop instead of a clear one.
Autonomous Driving (DriveLM): Forcing an autonomous car to proceed into a crosswalk with pedestrians, despite safety protocols.
Aerial Object Tracking (CloudTrack): Tricking a drone into tracking a decoy civilian car instead of a target police vehicle.

In simulations, CHAI achieved impressive attack success rates (ASR), reaching up to 95.5% on CloudTrack, 81.8% on DriveLM, and 68.1% on drone landing. It consistently outperformed state-of-the-art attacks like SceneTAP, sometimes by as much as 10 times. Crucially, CHAI attacks also showed strong ‘transferability’, meaning they remained effective on images and scenarios not used during the attack’s optimization, with ASRs often above 70% for GPT-based models.
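For readers unfamiliar with the metric, attack success rate is simply the share of trials in which the model issued the attacker's intended command. A minimal sketch follows; the function name `attack_success_rate` and the trial counts are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def attack_success_rate(outcomes: Iterable[bool]) -> float:
    """Return the percentage of trials where the attack succeeded."""
    results = list(outcomes)
    if not results:
        raise ValueError("need at least one trial")
    return 100.0 * sum(results) / len(results)


# Hypothetical run: 17 hijacked decisions out of 20 trials.
trials = [True] * 17 + [False] * 3
print(f"ASR = {attack_success_rate(trials):.1f}%")
```

Transferability can be measured the same way by computing the rate only over images and scenarios that were held out during optimization.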

Perhaps the most compelling results came from real-world experiments using a physical robotic vehicle. By printing optimized visual prompts on paper and placing them in the scene, CHAI successfully biased LVLM decisions, achieving over 87% ASR for GPT-based models. This demonstrated that the attack is practical even under real-world challenges like variable lighting, viewing angles, and sensor noise.

Key Insights and Future Implications

The research revealed several critical insights:

Text Overrides Safety: LVLMs can be convinced by visual text prompts to bypass inherent safety considerations, even when they recognize other hazards in the scene (e.g., pedestrians).
Multilingual Vulnerability: CHAI attacks are effective across different languages, including Chinese, Spanish, and even ‘Spanglish’, potentially allowing attackers to hide their malicious intent from human observers.
Prompt Overrides Visual Cues: A well-crafted text prompt can make an LVLM misidentify objects, even when visual evidence contradicts the text.

These findings underscore an urgent need for defenses that reason over text and vision together. The paper suggests future work on filter-based defenses, safety alignment techniques for LVLMs, and provable defenses against such visual prompts. By exposing a new attack surface in LVLM-driven embodied AI, CHAI makes the case for putting such multimodal defenses in place before these systems are widely deployed in safety-critical applications. You can read the full research paper here: CHAI: Command Hijacking against embodied AI.
