Securing Agentic AI: A Whitelisting Approach to Prompt Protection

TLDR: LLMZ+ is a novel security framework for agentic Large Language Models (LLMs) that shifts from traditional threat detection to a contextual prompt whitelisting approach. Inspired by firewall principles, it only permits contextually appropriate and safe messages to interact with the LLM, effectively preventing prompt injection and jailbreak attacks. The system uses a guard prompt and ingress/egress filters to evaluate messages, assigning a risk score. When combined with larger LLM models (like Llama3.3 70B) and message pre-processing (e.g., length filtering), LLMZ+ demonstrates near-perfect detection rates with zero false positives and false negatives, enhancing the long-term resilience and reducing maintenance overhead for LLM information security.

Agentic AI models are becoming increasingly sophisticated, offering powerful capabilities by interacting with data sources and API tools. However, this enhanced functionality also makes them a prime target for attackers. Unlike traditional software, agentic Large Language Models (LLMs) rely on non-deterministic behavior, defining a final goal but leaving the path selection to the LLM itself. This characteristic introduces significant security risks, particularly from ‘jailbreak’ attacks like prompt injection.

Traditional security mechanisms for LLMs primarily focus on detecting malicious intent and preventing it from reaching the agent. These detection-based approaches often rely on predefined signatures and heuristics, similar to anti-malware products. While effective to a degree, they require constant updates to their definition databases to counter new attack techniques, leading to ongoing maintenance costs and the risk of ‘failing silently’ if updates are delayed or incomplete.

A new approach, called LLMZ+, offers an alternative by moving beyond traditional detection. Inspired by the robust security practices of perimeter firewalls, LLMZ+ implements a ‘prompt whitelisting’ mechanism. Instead of trying to identify and block every possible malicious input, LLMZ+ operates on the principle of allowing only contextually appropriate and safe messages to interact with the agentic LLM, blocking everything else by default. This method ensures that all exchanges between external users and the LLM conform to predefined use cases and operational boundaries.

How LLMZ+ Works

LLMZ+ introduces a conceptual security boundary for agentic LLMs, drawing inspiration from the Demilitarized Zone (DMZ) architecture in network security. It leverages an auxiliary LLM, functioning as a ‘whitelist guard,’ for both incoming prompts from external users (ingress) and outgoing replies from the agentic LLM (egress).

The Ingress filter verifies that messages from external users are fully interpretable by the Guard Prompt, consistent with a natural customer-service conversation, and relevant to the business case served by the Agentic LLM.

The Egress filter ensures that outbound messages also remain consistent with the intended business use case. This can be enhanced with a simple contextual Retrieval-Augmented Generation (RAG) to inform the guard LLM about permitted data categories, or simpler regex-based filters to block sensitive information like Social Security Numbers.

Messages that do not satisfy these criteria are blocked, effectively preventing the exploitation of the agentic LLM through prompt-based attacks. This solution is specifically designed to address prompting threats and complements, rather than replaces, a comprehensive information security architecture.

Deployment and Evaluation

The LLMZ+ framework is particularly suited for agentic LLMs deployed in specific business contexts, such as customer support, payment facilitation, or product selection. These models often require privileged access to confidential information or APIs, making their security critical. The solution is not intended for generic, all-purpose agents that lack such access.

In an experimental setup involving a commercial fintech chatbot, LLMZ+ was evaluated using Llama3.1 and Llama3.3 models. The primary objective was to minimize false positive rates (legitimate messages incorrectly flagged) and false negative rates (malicious messages allowed to pass). The system assigns a risk score between 0 and 10 to each message, allowing administrators to set a blocking threshold.

Results showed that while smaller models like Llama3.1 8B had an optimal range for balancing detection, transitioning to larger models like Llama3.3 70B significantly improved performance, reducing the false positive rate to zero. When combined with simple message pre-processing, such as imposing a maximum message length (as prompt injection techniques often require lengthy instructions), LLMZ+ achieved ideal performance with both false positive and false negative rates of zero across all tested threshold values.

Also Read:

Practical Considerations

For real-world deployments, performance is key. While larger models offer superior detection, their execution times can be prohibitive in resource-constrained settings. Practical considerations include:

False Positive Overrides: Many false positives from smaller models can be addressed by simple non-LLM filters for common sensitive data types (e.g., SSNs, dates, addresses).
Message Pre-processing: Limiting message length can efficiently block the vast majority of prompt injection attacks.
Parallel Execution: For critical response times, the Guard prompt and Agentic prompt can run simultaneously, with the agentic response withheld until LLMZ+ makes a decision. This requires more resources but can improve overall latency.
Guard Model Selection: The choice of guard model should align with deployment scenarios. Smaller models like Llama3.1 8B, when fine-tuned with pre-processing, can be suitable for real-time applications where latency is paramount.

LLMZ+ represents a significant advancement in securing agentic AI systems by offering a dynamic, context-aware whitelisting approach that is resilient against evolving prompt injection attacks without requiring constant retraining. For more in-depth information, you can refer to the full research paper here.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Securing Agentic AI: A Whitelisting Approach to Prompt Protection

How LLMZ+ Works

Deployment and Evaluation

Practical Considerations

Gen AI News and Updates

Vesl AI Recognized for AI Infrastructure Innovation with ASOCIO Digital Summit Award

SOCi Achieves Major Milestone with 150,000 AI Agents Automating 10 Million Local Marketing Tasks

TD Synnex Unveils Agentic AI-Powered Digital Bridge to Revolutionize Partner Sales and Productivity

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates