TLDR: A new research paper introduces Cross-Modal Preference Steering (CPS), a method that can subtly manipulate AI web agents by making imperceptible changes to both images and text descriptions of online content. This black-box attack, which doesn’t require access to the agent’s internal workings, significantly biases agent selections (e.g., product choices, movie recommendations) while remaining largely undetected, highlighting a critical security vulnerability in AI-powered decision-making systems.
AI-powered web agents are becoming increasingly common, taking on tasks that range from recommending movies to selecting products online. These agents are designed to make decisions on behalf of users, often by combining what they see (images) with what they read (textual descriptions). However, new research reveals a significant vulnerability: these agents can be subtly manipulated through a novel attack called Cross-Modal Preference Steering (CPS).
Traditionally, attacks on AI systems have often relied on unrealistic assumptions, such as having full access to the AI model’s internal workings (white-box access) or complete control over the webpages themselves. These requirements severely limit their practical application in the real world. The new research, however, focuses on a much more realistic scenario: an attacker who acts like a regular content publisher, able to edit only their own listing’s images and text, without any insight into the agent’s underlying AI model.
Understanding Cross-Modal Preference Steering (CPS)
CPS is a new attack framework that jointly exploits two fundamental vulnerabilities in Vision-Language Model (VLM)-based web agents: visual perception and textual interpretation. By making imperceptible modifications to both an item's images and its natural-language description, CPS can effectively steer an agent's decisions toward a targeted item.
One key vulnerability CPS exploits is visual. Many VLMs rely on similar image encoders, creating a shared weakness. The researchers used Projected Gradient Descent (PGD) to craft tiny image perturbations that are invisible to humans yet drastically alter how an AI agent perceives the image. For example, a photo of an "apple" can be made to register as an "orange" to the AI, with no visible change to a human observer. The attack transfers even to black-box commercial models like GPT-4.1 and works across various image resolutions.
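To make the visual step concrete, here is a minimal PGD sketch in PyTorch. It assumes an L-infinity pixel budget and a surrogate image encoder; the stand-in encoder, epsilon, and step counts below are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def pgd_steer(image, encoder, target_emb, eps=8/255, alpha=1/255, steps=50):
    """L-infinity PGD: nudge `image` so the encoder's embedding drifts
    toward `target_emb` (e.g. the embedding of "orange" for an apple
    photo), keeping every pixel within `eps` of the original."""
    orig = image.clone().detach()
    adv = orig.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        emb = encoder(adv)
        # Maximize cosine similarity to the target concept's embedding.
        loss = F.cosine_similarity(emb, target_emb, dim=-1).mean()
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + alpha * grad.sign()             # ascend similarity
            adv = orig + (adv - orig).clamp(-eps, eps)  # project into eps-ball
            adv = adv.clamp(0.0, 1.0)                   # stay a valid image
    return adv.detach()

# Illustrative usage with a stand-in encoder; a real attack would use a
# shared surrogate such as a CLIP image tower to transfer to black-box VLMs.
encoder = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(3 * 224 * 224, 512))
image = torch.rand(1, 3, 224, 224)
target_emb = torch.randn(1, 512)  # embedding of the target concept
adv_image = pgd_steer(image, encoder, target_emb)
print((adv_image - image).abs().max())  # perturbation stays within eps
```

Because many VLMs share similar encoder architectures, a perturbation crafted against one surrogate often transfers to others, which is what makes the black-box setting feasible.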
The second vulnerability CPS targets is textual. AI models, especially those trained with Reinforcement Learning from Human Feedback (RLHF), inadvertently develop systematic preferences for specific linguistic patterns and stylistic elements. Attackers can exploit these biases by crafting descriptions that subtly appeal to the agent’s learned preferences, without triggering any detection mechanisms. This means carefully chosen words and phrases can make an item seem more appealing to the AI.
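As a toy illustration of how such stylistic biases might be probed, the sketch below ranks candidate descriptions by how often a simulated agent picks the listing. The `query_agent` stub, the competitor text, and the simulated length/superlative bias are all hypothetical stand-ins for black-box queries to a real agent, not the paper's method.

```python
import random

# Stand-in for the black-box agent. A real attack would present the listing
# alongside competitors through the agent's interface and parse which item
# it selects; here we simulate a noisy bias toward longer, superlative-heavy
# text purely for illustration.
def query_agent(descriptions):
    scores = [len(d) + 20 * d.lower().count("best") + random.gauss(0, 5)
              for d in descriptions]
    return max(range(len(descriptions)), key=lambda i: scores[i])

COMPETITOR = "Wireless earbuds. 24-hour battery."

CANDIDATES = [
    "Wireless earbuds with 24-hour battery life.",
    "Expertly engineered wireless earbuds, a best-in-class pick with "
    "reliable all-day (24-hour) battery life.",
]

def preference_rate(description, trials=20):
    """Fraction of trials in which the agent selects our listing (index 0)."""
    wins = sum(query_agent([description, COMPETITOR]) == 0
               for _ in range(trials))
    return wins / trials

print(max(CANDIDATES, key=preference_rate))
```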
The power of CPS lies in the synergy of these two approaches. Coordinating visual perturbations with textual modifications amplifies the attack well beyond what either single-modal approach could achieve alone. The researchers used GPT-4.1 as an "attacker model" in a feedback loop, iteratively refining both the injected visual concept (e.g., making an image connote "best choice") and the textual description, preserving semantic consistency while maximizing the manipulation's effect.
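A hedged sketch of what such a coordinated loop could look like, reusing `pgd_steer` from the snippet above; `attacker_rewrite` and `agent_prefers_target` are hypothetical placeholders for the attacker-LLM call (GPT-4.1 in the paper) and the black-box agent queries.

```python
import random

def attacker_rewrite(text):
    # Placeholder for an attacker-LLM call that rewrites the description
    # while preserving its meaning (e.g. via a chat-completions API).
    return text + " A standout, best-in-class option."

def agent_prefers_target(image, text, trials=10):
    # Placeholder for querying the victim agent and measuring how often
    # it selects the modified listing over its competitors.
    return random.random()

def cps_attack(image, text, encoder, target_emb, rounds=5):
    """Coordinate both modalities: perturb the image toward the target
    concept, let the attacker model rewrite the description, and keep
    whichever (image, text) pair the agent prefers most."""
    best_image, best_text, best_rate = image, text, 0.0
    for _ in range(rounds):
        image = pgd_steer(image, encoder, target_emb)  # visual step
        text = attacker_rewrite(text)                  # textual step
        rate = agent_prefers_target(image, text)       # black-box feedback
        if rate > best_rate:
            best_image, best_text, best_rate = image, text, rate
    return best_image, best_text
```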
Real-World Impact and Stealth
The researchers evaluated CPS on agents powered by state-of-the-art VLMs, including GPT-4.1, Qwen-2.5VL, and Pixtral-Large, across tasks like movie selection and e-commerce. The results were striking: CPS consistently outperformed leading baseline methods, achieving preference manipulation rates as high as 71%. In other words, the attack biased the agent's selection toward the targeted item in the majority of trials.
Crucially, CPS also proved remarkably stealthy. While other manipulation methods were flagged at rates as high as 98%, CPS kept its detection rate between 18.5% and 26.0%. Even AI detectors explicitly prompted to look for manipulation struggled to identify CPS's subtle changes, exposing a critical gap in current defense mechanisms.
The findings underscore an urgent need for robust defenses against such sophisticated attacks. As AI agents become more integrated into our daily lives, mediating high-stakes decisions that affect market fairness, user autonomy, and information integrity, the ability of any content publisher to invisibly steer an agent's preferences poses immediate and serious risks. This work, detailed further in the research paper Cross-Modal Content Optimization for Steering Web Agent Preferences, is a vital step toward understanding and securing the next generation of autonomous web agents.