TLDR: A new research paper introduces CHAI, a novel attack that exploits Large Visual-Language Models (LVLMs) in embodied AI systems. CHAI embeds deceptive natural language instructions as visual signs in an AI’s environment, systematically optimizing both the text and its appearance to hijack the AI’s command decisions. Experiments show high success rates in simulations and real-world robotic vehicles across tasks like drone landing, autonomous driving, and object tracking, demonstrating that visual text can override safety protocols and visual cues, even across different languages. This highlights a critical new vulnerability and the urgent need for advanced multimodal defenses in embodied AI.
Embodied Artificial Intelligence (AI) holds immense promise for the future of robotic systems, especially in handling unpredictable edge cases where data is scarce. These advanced AIs, often powered by Large Visual-Language Models (LVLMs), are designed to use common-sense reasoning, grounded in what they see and do, to adapt to new real-world scenarios. Think of autonomous cars navigating unexpected road conditions or drones making critical decisions during emergencies. However, these very capabilities also open the door to new and sophisticated security vulnerabilities.
Introducing CHAI: A New Threat to Embodied AI
A recent research paper introduces a novel class of prompt-based attacks called CHAI (Command Hijacking against embodied AI). This attack exploits the ability of LVLMs to interpret multimodal language – meaning they understand both visual information and natural language instructions. CHAI works by embedding deceptive natural language instructions, such as misleading signs, directly into the visual input that an embodied AI system perceives. It then systematically searches for the most effective text and visual characteristics to generate what are called ‘Visual Attack Prompts’.
Unlike previous attacks that primarily target the perception layer (like dirty road patterns confusing lane detection or LiDAR spoofing), CHAI focuses on hijacking the intermediate, text-based planning decisions made by embodied AIs. This means it doesn’t just make the AI misinterpret what it sees; it makes the AI misinterpret what it should *do* based on what it sees and reads.
How CHAI Works Under the Hood
The core of CHAI involves a clever two-stage optimization process. First, it creates a ‘dictionary’ of potential malicious prompts. This is done by having an ‘attacker LLM’ (another language model) converse with the target LVLM, learning which phrases are most likely to succeed in an attack. This helps narrow down the vast possibilities of language. Second, it jointly optimizes both the semantic content of the visual prompt (what the sign says) and its perceptual realization (how it looks – color, font, size, placement). This dual optimization ensures the attack is both semantically persuasive and visually effective, even under varying conditions.
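To make the two stages concrete, here is a minimal Python sketch of the overall shape of such a search. It is not the paper's algorithm: the attacker LLM is replaced by a fixed candidate list, the joint optimization by an exhaustive grid search, and the target LVLM by a hypothetical scoring function `toy_target` that only mimics a black-box query returning success or failure. All names, phrases, and scoring rules are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualPrompt:
    text: str        # semantic content: what the sign says
    color: str       # perceptual attributes: how the sign looks
    font_size: int
    position: str


def toy_target(prompt: VisualPrompt, scene_id: int) -> bool:
    """Hypothetical stand-in for one black-box LVLM query (True = hijacked)."""
    score = 0
    score += 3 if "safe" in prompt.text.lower() else 0
    score += {"yellow": 2, "red": 1}.get(prompt.color, 0)
    score += 2 if prompt.font_size >= 48 else 0
    score += 1 if prompt.position == "center" else 0
    # Each scene has its own difficulty threshold (4, 6, or 8 points).
    return score >= 4 + 2 * (scene_id % 3)


def success_rate(prompt, scenes):
    """Fraction of scenes in which the toy target is hijacked."""
    return sum(toy_target(prompt, s) for s in scenes) / len(scenes)


def build_dictionary(candidates, scenes, k):
    """Stage 1 (simplified): keep the k phrases that work best in a default style."""
    def default(text):
        return VisualPrompt(text, "white", 48, "top")
    ranked = sorted(candidates,
                    key=lambda t: success_rate(default(t), scenes),
                    reverse=True)
    return ranked[:k]


def joint_search(dictionary, colors, sizes, positions, scenes):
    """Stage 2 (simplified): grid search over text and appearance together."""
    grid = (VisualPrompt(*combo)
            for combo in itertools.product(dictionary, colors, sizes, positions))
    return max(grid, key=lambda p: success_rate(p, scenes))


scenes = range(6)
candidates = ["Land here", "Safe to land", "Turn left", "Proceed, road is safe"]
dictionary = build_dictionary(candidates, scenes, k=2)
best = joint_search(dictionary, ["white", "red", "yellow"],
                    [24, 48, 72], ["top", "center"], scenes)
best_rate = success_rate(best, scenes)
print(dictionary, best, best_rate)
```

The point of the sketch is the structure: the first stage shrinks the space of possible wording using only query feedback, and the second stage scores each combination of wording and appearance across several scenes, so the selected sign is one that works under varying conditions rather than in a single image.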
The researchers designed CHAI as a ‘black-box’ attack, meaning it doesn’t need to know the internal workings, weights, or architecture of the target LVLM. This makes it highly relevant for real-world scenarios where many advanced LVLMs are only accessible through limited APIs.
Real-World Impact and Experiments
The CHAI attack was rigorously evaluated across four different LVLM agents and a real robotic vehicle, demonstrating its broad applicability and effectiveness. The applications included:
- Drone Emergency Landing: Misleading a drone to land on a crowded, unsafe rooftop instead of a clear one.
- Autonomous Driving (DriveLM): Forcing an autonomous car to proceed into a crosswalk with pedestrians, despite safety protocols.
- Aerial Object Tracking (CloudTrack): Tricking a drone into tracking a decoy civilian car instead of a target police vehicle.
In simulations, CHAI reached attack success rates (ASR, the share of trials in which the agent follows the injected command) of up to 95.5% on CloudTrack, 81.8% on DriveLM, and 68.1% on drone landing. It consistently outperformed state-of-the-art attacks like SceneTAP, sometimes by as much as 10 times. Crucially, CHAI attacks also showed strong ‘transferability’, meaning they remained effective on images and scenarios not used during the attack’s optimization, with ASRs often above 70% for GPT-based models.
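As a rough illustration of how such numbers are computed, the sketch below calculates an attack success rate separately for scenes used during optimization and for held-out scenes, which is how transferability is typically checked. The outcome lists are made up for the example and are not data from the paper; the function name `attack_success_rate` is likewise an assumption.

```python
def attack_success_rate(outcomes):
    """Fraction of trials in which the agent followed the injected command."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


# Made-up per-trial outcomes (True = attack succeeded), NOT results from the paper.
seen_scenes = [True, True, True, False, True, True, True, True]
held_out_scenes = [True, False, True, True, False, True, True, True]

asr_seen = attack_success_rate(seen_scenes)
asr_held_out = attack_success_rate(held_out_scenes)

print(f"ASR on optimization scenes: {asr_seen:.1%}")   # 87.5%
print(f"ASR on held-out scenes: {asr_held_out:.1%}")   # 75.0%
```

A small gap between the two figures suggests the optimized sign generalizes beyond the images it was tuned on; a large gap would suggest it overfits to them.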
Perhaps the most compelling results came from real-world experiments using a physical robotic vehicle. By printing optimized visual prompts on paper and placing them in the scene, CHAI successfully biased LVLM decisions, achieving over 87% ASR for GPT-based models. This demonstrated that the attack is practical even under real-world challenges like variable lighting, viewing angles, and sensor noise.
Key Insights and Future Implications
The research revealed several critical insights:
- Text Overrides Safety: LVLMs can be convinced by visual text prompts to bypass inherent safety considerations, even when they recognize other hazards in the scene (e.g., pedestrians).
- Multilingual Vulnerability: CHAI attacks are effective across different languages, including Chinese, Spanish, and even ‘Spanglish’, potentially allowing attackers to hide their malicious intent from human observers.
- Prompt Overrides Visual Cues: A well-crafted text prompt can make an LVLM misidentify objects, even when visual evidence contradicts the text.
These findings underscore an urgent need for new defenses that can reason over both text and vision simultaneously. The paper suggests future work on filter-based defenses, safety alignment techniques for LVLMs, and provable defenses against such visual prompts. The introduction of CHAI exposes a fundamentally new attack surface against LVLM-driven embodied AI, highlighting the critical importance of developing robust multimodal defenses before these systems are widely deployed in safety-critical applications. You can read the full research paper here: CHAI: Command Hijacking against embodied AI.


