CourtGuard: A Multiagent Approach to Local Prompt Injection Defense

TLDR: CourtGuard is a new, locally-runnable, multiagent LLM system designed to classify prompt injection attacks. It uses a “defense attorney,” “prosecution attorney,” and “judge” model to evaluate prompts. While generally less effective at detecting prompt injections than a simpler “Direct Detector,” CourtGuard achieves a lower false positive rate, meaning it’s better at correctly identifying benign prompts. This multiagent approach emphasizes considering both adversarial and benign scenarios, showing promise for local prompt injection defense in data-sensitive applications.

Large language models (LLMs) are becoming central to many applications, including those handling sensitive information like medical data, financial records, and intellectual property. However, this widespread integration also brings a significant risk: prompt injection attacks. These attacks manipulate LLMs into performing harmful actions, such as leaking confidential data, spreading misinformation, or behaving in unintended ways.

Despite ongoing research and various enterprise solutions, effectively defending against prompt injection remains a challenge. Existing defenses often fall short, with even leading solutions like Lakera Guard showing vulnerabilities. This highlights a critical need for robust and accessible defense mechanisms, especially for organizations that need to process sensitive data locally.

Introducing CourtGuard: A Novel Approach to Prompt Injection Defense

To address this gap, researchers Isaac Wu and Michael Maslowski have proposed CourtGuard, a unique, locally-runnable, multiagent prompt injection classifier. Unlike many enterprise solutions that might be costly or require external services, CourtGuard is designed to be easily implemented and deployed internally, using local LLMs.

The core idea behind CourtGuard is a court-like system. When a prompt is submitted, it’s evaluated by three distinct LLM “agents”:

Defense Attorney Model: This agent argues that the prompt is harmless and not a prompt injection.

Prosecution Attorney Model: This agent argues that the prompt is indeed a prompt injection.

Judge Model: After considering both the defense and prosecution arguments, this agent delivers the final verdict on whether the prompt is a prompt injection.

This multiagent framework allows CourtGuard to thoroughly examine prompts by considering both benign and adversarial interpretations, aiming for a more balanced and reasoned classification.

How CourtGuard Stacks Up Against Other Defenses

The researchers evaluated CourtGuard against a “Direct Detector,” which is essentially a single LLM acting as a judge, directly classifying prompts. They used various datasets, including LLMail-Inject for real-world attacks and NotInject for benign prompts containing trigger words.

A key finding was that CourtGuard demonstrated a lower false positive rate than the Direct Detector. This means it was better at correctly identifying benign prompts, reducing the chances of legitimate user inputs being wrongly flagged as malicious. This is crucial for user experience and application usability. However, the Direct Detector generally proved to be a better overall prompt injection detector, particularly in identifying actual attacks from the LLMail-Inject dataset, though there was an exception with the Phi model where CourtGuard performed better.

When compared to other solutions on the NotInject benchmark, CourtGuard (using Llama or Phi models) performed very well, even surpassing some well-known enterprise solutions like Meta’s PromptGuard and LakeraGuard. Only Meta’s LlamaGuard3 showed superior performance on this specific benchmark. On the Qualifire Prompt Injection Benchmark, enterprise solutions like Qualifire’s Sentinel model generally outperformed CourtGuard, but these are often larger, more complex systems with significantly lower inference times.

The Importance of Deliberation

The qualitative analysis revealed a significant difference in how CourtGuard and the Direct Detector operate. The Direct Detector often seemed to “assume” a classification early in its reasoning process, potentially relying on implicit knowledge from its training. In contrast, CourtGuard’s judge model, by design, explicitly considers both the defense and prosecution arguments before making a decision. This forced deliberation helps reduce false positives but might also explain why it sometimes has a lower true positive rate for prompt injections, as it has to articulate its reasoning rather than relying on hidden encodings.

Also Read:

Looking Ahead

While CourtGuard represents a promising step forward for local, multiagent prompt injection defense, it has limitations. Inference time for local LLMs can be several seconds, which researchers hope to optimize through parallelization, smaller models, and quantization. Additionally, the current evaluation focused on static, singular prompt injection attacks, and future work will need to test its robustness against multi-turn conversations and adaptive attacks in real-world scenarios.

Nevertheless, CourtGuard highlights the potential of multiagent systems and local LLMs in creating effective prompt injection defenses, especially for applications handling sensitive data where latency is not the absolute highest priority. Developers in such data-sensitive enterprises should consider these evolving, locally runnable approaches for enhanced security. You can find more details about CourtGuard and its implementation at the research paper.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

CourtGuard: A Multiagent Approach to Local Prompt Injection Defense

Introducing CourtGuard: A Novel Approach to Prompt Injection Defense

How CourtGuard Stacks Up Against Other Defenses

The Importance of Deliberation

Looking Ahead

Gen AI News and Updates

Amazon Bedrock’s A2A Protocol: The Catalyst for Next-Gen Cross-Framework Multi-Agent AI Systems

AZTECH Introduces Comprehensive AI Training Series to Propel Regional Digital Transformation

Rubrik Report Reveals Alarming Decline in Cyber Resilience Amidst AI Agent Proliferation

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates