TLDR: VelLMes is a new AI-based deception framework that uses Large Language Models (LLMs) to create highly interactive and realistic honeypots for various network services like SSH, MySQL, POP3, and HTTP. Evaluated through unit tests, an experiment with 89 human attackers (where ~30% were deceived), and real-life internet deployments, VelLMes demonstrated its effectiveness in simulating services and engaging attackers, responding correctly to over 90% of commands in real-world scenarios.
In the evolving landscape of cybersecurity, deception plays a crucial role in understanding and mitigating threats. Traditional deception systems, often called honeypots, aim to lure attackers into controlled environments to study their tactics without risking real systems. However, many existing systems are limited in their realism and interaction capabilities, often simulating only a single service like an SSH shell.
A new research paper introduces VelLMes, an innovative AI-based deception framework designed to overcome these limitations. VelLMes leverages Large Language Models (LLMs) to create highly interactive and realistic simulations of multiple network protocols and services, including SSH Linux shell, MySQL, POP3, and HTTP. These simulated services can be deployed as honeypots, offering a versatile toolkit for cybersecurity professionals.
What Makes VelLMes Unique?
VelLMes stands out due to its ability to simulate a variety of services with high fidelity. Unlike systems that might offer static or pre-scripted responses, VelLMes uses LLMs to generate dynamic and realistic interactions. This means that when an attacker interacts with a VelLMes honeypot, the responses feel authentic, making it harder for them to distinguish it from a real system. The framework is designed with human attackers in mind, prioritizing interactivity and realism.
A key safety feature of VelLMes is that all outputs are LLM-generated, meaning no actual commands are executed on a real system. This eliminates the risk of attackers breaking out of the sandbox environment, a common concern with more complex honeypot deployments.
How VelLMes Works
The core of VelLMes’s realism comes from careful “prompt engineering.” For each simulated service, a “personality prompt” is crafted and passed to the LLM when an interaction begins. This prompt instructs the LLM on how to behave, ensuring its responses are consistent with a real service. Techniques like chain-of-thought reasoning and providing examples are used to guide the LLM’s behavior. To maintain consistency across sessions, each interaction is saved, and this history is fed back to the LLM if an attacker reconnects.
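The prompt-assembly step above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the function name `build_messages` and the message-dict format (the common `role`/`content` chat convention) are assumptions for the example.

```python
def build_messages(personality_prompt, saved_history, new_input):
    """Assemble an LLM request for one honeypot interaction.

    The personality prompt goes first as the system message; any
    transcript saved from earlier sessions is replayed next, so the
    simulated service stays consistent when an attacker reconnects.
    """
    messages = [{"role": "system", "content": personality_prompt}]
    messages.extend(saved_history)  # replayed history from prior sessions
    messages.append({"role": "user", "content": new_input})
    return messages

# Example: a reconnecting attacker runs `pwd` after an earlier `ls`.
history = [
    {"role": "user", "content": "ls"},
    {"role": "assistant", "content": "Desktop  Documents  Downloads"},
]
msgs = build_messages("You are a Linux SSH shell.", history, "pwd")
```

Persisting and replaying the history is what lets the LLM keep earlier answers (file names, users, database contents) consistent across sessions.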
For instance, the SSH Linux shell component, called shelLM, uses a fine-tuned GPT-3.5 model to simulate a Linux environment, responding to commands and generating realistic file systems. The MySQL honeypot is instructed to act as a command-line client for an IT company’s database, generating plausible tables and records. Similarly, the POP3 honeypot simulates an email service with detailed message headers, and the HTTP honeypot acts as an internal printer server, generating HTML pages that render correctly in a real browser.
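A shelLM-style session loop might look like the sketch below. It uses a stub, `fake_llm`, in place of the fine-tuned model (a hypothetical name for this example); the key point from the safety discussion is visible in the code: attacker input is only ever passed to the language model, never to a real shell.

```python
def fake_llm(messages):
    # Stub standing in for the fine-tuned model; a deployment would
    # send `messages` to the LLM and return its completion.
    last = messages[-1]["content"].strip()
    if last == "whoami":
        return "root"
    return ""

def shell_session(commands):
    """Run a list of attacker commands through the simulated shell."""
    messages = [{"role": "system",
                 "content": "You are a Linux SSH shell. "
                            "Reply only with the command output."}]
    transcript = []
    for cmd in commands:
        messages.append({"role": "user", "content": cmd})
        output = fake_llm(messages)  # no real command is ever executed
        messages.append({"role": "assistant", "content": output})
        transcript.append((cmd, output))
    return transcript
```

Because every response comes from the model, there is no sandbox for the attacker to escape, which is the safety property the framework relies on.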
Rigorous Evaluation
The researchers conducted three types of evaluations to assess VelLMes’s capabilities:
- Generative Capabilities (Unit Tests): To test whether LLMs could realistically simulate the protocols, the researchers developed unit tests that checked for expected substrings, output length, and consistency. The results showed that LLMs, especially GPT-4 and fine-tuned GPT-3.5, could successfully simulate the various services, with some configurations achieving a 100% pass rate. How conversation history was handled significantly affected performance.
- Deception Capabilities (Human Attackers): An experiment with 89 human attackers was conducted using the shelLM honeypot. Attackers were randomly assigned either a real Ubuntu system or a shelLM honeypot and tasked with exfiltrating a secret key. Approximately 30% of the attackers who interacted with the LLM-based honeypot believed they were using a real system, highlighting the potential of LLMs for cyber deception, even allowing for biases introduced by the experimental setup.
- Real-Life Attacks (Internet Deployment): Ten instances of the shelLM honeypot were deployed on the internet for five days to capture real-world attacks. Of 2,825 attack sessions, 151 interactive sessions were analyzed. shelLM responded correctly to 98.91% of the 276 executed commands, demonstrating robustness against unstructured and unexpected attacks; it even kept one human attacker engaged long enough to manually inspect the system.
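The unit tests described in the first evaluation can be illustrated with a small sketch. The checks below (expected substrings and an output-length bound) mirror the kinds of assertions described; the function name and the canned sample output are invented for the example, not taken from the paper's test suite.

```python
def check_ls_output(output, expected_entries, max_lines=40):
    """Basic sanity checks on a simulated `ls` response:
    the output must contain each expected entry and must not be
    implausibly long for a directory listing."""
    lines = output.splitlines()
    assert len(lines) <= max_lines, "output suspiciously long"
    for entry in expected_entries:
        assert entry in output, f"missing expected entry: {entry}"
    return True

# Canned example standing in for a real model response.
simulated_output = "Desktop\nDocuments\nDownloads\n.bashrc"
check_ls_output(simulated_output, ["Documents", ".bashrc"])
```

Consistency checks (e.g. that a file listed in one session still exists in the next) would compare outputs across turns in the same way.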
Conclusion
VelLMes represents a significant step forward in AI-based cyber deception. By leveraging LLMs, it creates highly interactive and realistic honeypots for multiple protocols, proving effective against both human and automated attackers. The framework’s open-source release aims to contribute to the cybersecurity community. Future work will focus on improving LLM responses, incorporating more protocols, and designing even more robust evaluation experiments to minimize bias. You can find the full research paper here: VelLMes: A high-interaction AI-based deception framework.


