
Unseen Messages in Plain Sight: How LLMs Conceal Information

TLDR: A new protocol allows Large Language Models (LLMs) to hide a complete, meaningful text inside another, entirely different but plausible text of the same length. The method works efficiently even with modest LLMs. By radically separating a text from its author's true intent, it raises significant questions for AI safety, such as the covert deployment of unfiltered LLMs, and challenges our understanding of LLM knowledge and hallucinations.

Large Language Models (LLMs) have introduced a fascinating and somewhat unsettling capability: the ability to conceal an entire meaningful text within another, completely different yet coherent and plausible text of the same length. This groundbreaking development, explored in a recent paper by Antonio Norelli and Michael Bronstein from Project CETI and the University of Oxford, highlights a radical decoupling of text from authorial intent, further eroding trust in written communication.

Imagine a harsh political critique hidden inside a tweet celebrating the very leader it attacks, or a secret manuscript embedded within an ordinary product review. This is now achievable with a simple, efficient LLM-based protocol. Even modest 8-billion-parameter open-source LLMs produce high-quality results, encoding and decoding a message the length of a paper abstract on a laptop in seconds.

How Does This Text-Hiding Protocol Work?

The method, described as a straightforward 'recipe', involves a few key steps. First, the secret message (the text to hide) is tokenized with the LLM's tokenizer, and for each token the rank it occupies in the LLM's next-token distribution, given the preceding context, is recorded. Then, to construct the cover text (the 'stegotext'), the LLM is prompted with a secret prompt that acts as the key; instead of sampling tokens by probability, it is forced to pick, at each step, the token whose rank matches the recorded rank from the secret message. The receiver, possessing the same LLM and the secret key, reverses this process to reconstruct the hidden text exactly.
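
For intuition, here is a minimal sketch of both steps in Python, assuming a Hugging Face causal LM. The model name, prompts, and helper functions (token_ranks, ranks_to_text) are illustrative assumptions, not the paper's released code, and tokenizer boundary effects are glossed over.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any open-source causal LM should work in principle.
MODEL = "meta-llama/Meta-Llama-3-8B"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def token_ranks(text: str, prompt: str = "") -> list[int]:
    """Record, for each token of `text`, its rank in the model's next-token
    distribution given everything preceding it. Assumes the tokenizer
    prepends a BOS token, so the context is never empty."""
    ids = tok(prompt + text, return_tensors="pt").input_ids[0]
    start = len(tok(prompt).input_ids) if prompt else 1  # skip prompt (or BOS)
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    ranks = []
    for i in range(start, len(ids)):
        order = torch.argsort(logits[i - 1], descending=True)
        ranks.append((order == ids[i]).nonzero().item())
    return ranks

def ranks_to_text(ranks: list[int], key_prompt: str) -> str:
    """Build the stegotext: under the secret key prompt, pick at every step
    the token whose rank matches the recorded rank of the secret message."""
    ids = tok(key_prompt, return_tensors="pt").input_ids
    n_key = ids.shape[1]
    for r in ranks:
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        order = torch.argsort(logits, descending=True)
        ids = torch.cat([ids, order[r].view(1, 1)], dim=1)
    return tok.decode(ids[0, n_key:])

# Hypothetical example: hide a critique inside a celebratory tweet.
secret = "The count was rigged and the results cannot be trusted."
key = "Write a short tweet celebrating the leader's landslide victory:\n"
stegotext = ranks_to_text(token_ranks(secret), key)
```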

The quality of the resulting stegotext, i.e. how plausible and coherent it reads, depends on several factors: the soundness of the original message, the effectiveness of the secret prompt (the key), and the capability of the LLM used. If the LLM predicts the original message well, the recorded ranks stay low, so the stegotext keeps following the model's top choices under the key prompt and remains coherent with the context the key sets. If the original message is hard for the LLM to predict, high ranks force improbable tokens and the stegotext degrades into garble.
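
Decoding runs the same machinery in reverse: the receiver reads the ranks off the stegotext under the shared key, then replays them under the original (empty) context. Continuing the hypothetical sketch above:

```python
# Receiver side: same LLM, same secret key.
stego_ranks = token_ranks(stegotext, key)   # ranks of the stegotext under the key
recovered = ranks_to_text(stego_ranks, "")  # replay under the empty original context
assert recovered == secret  # exact reconstruction, up to tokenizer round-trip quirks
```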

Implications for AI Safety and Trust

This protocol opens up significant questions and potential applications, particularly in the realm of AI safety. One concrete scenario presented in the paper is how a company could covertly deploy an unfiltered LLM. By encoding the answers from a powerful, unaligned LLM within the compliant responses of a safe, aligned model, users could unknowingly extract uncensored information. The paper illustrates this with an example of a user asking how to tamper with a gas meter; the aligned chatbot provides a censored response, but a hidden message within it guides the user’s local open-source LLM to generate the uncensored instructions.
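
For illustration only, that scenario could be wired up with the hypothetical helpers sketched earlier; the shared key and the unfiltered answer below are placeholders, not the paper's actual prompts.

```python
# Hypothetical wiring of the covert-deployment scenario, reusing the
# token_ranks / ranks_to_text sketch above.
SHARED_KEY = "Reply with a polite, policy-compliant safety notice:\n"
unfiltered_answer = "..."  # placeholder for the unaligned model's output

# Provider side: hide the unfiltered answer inside a compliant-looking reply.
visible_reply = ranks_to_text(token_ranks(unfiltered_answer), SHARED_KEY)

# User side: the same open-source LLM plus the shared key decodes it locally.
recovered_answer = ranks_to_text(token_ranks(visible_reply, SHARED_KEY), "")
```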

The existence of such a protocol challenges our fundamental understanding of what it means for an LLM to ‘know’ something. LLMs can generate virtually any text, but does assigning a high probability to a text equate to knowledge? The paper suggests a novel perspective on ‘hallucinations’—not merely as factual falsehoods, but as a ‘void of intention.’ When a text is generated under the constraint of encoding another message, it becomes difficult to ascribe genuine intent to the LLM, much like appreciating Oulipo literature written under arbitrary constraints.

Ultimately, this research highlights the extreme constraint satisfaction problem underlying standard LLM text generation. It reveals a clash between the LLM’s ability to produce coherent text and the human expectation of authorial purpose. As machine-generated text becomes ubiquitous, this decoupling of text from human intent could profoundly impact our trust in written communication. The paper concludes by warning that any original text could now be a ‘beautiful and treacherous, and spacious, Trojan horse.’ You can read the full research paper here: LLMs can hide text in other text of the same length.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
