TLDR: VelLMes is a new AI-based deception framework that uses Large Language Models (LLMs) to create highly interactive and realistic honeypots for various network services like SSH, MySQL, POP3, and HTTP. Evaluated through unit tests, an experiment with 89 human attackers (where ~30% were deceived), and real-life internet deployments, VelLMes demonstrated its effectiveness in simulating services and engaging attackers, responding correctly to over 90% of commands in real-world scenarios.
In the evolving landscape of cybersecurity, deception plays a crucial role in understanding and mitigating threats. Traditional deception systems, often called honeypots, aim to lure attackers into controlled environments to study their tactics without risking real systems. However, many existing systems are limited in their realism and interaction capabilities, often simulating only a single service like an SSH shell.
A new research paper introduces VelLMes, an innovative AI-based deception framework designed to overcome these limitations. VelLMes leverages Large Language Models (LLMs) to create highly interactive and realistic simulations of multiple network protocols and services, including SSH Linux shell, MySQL, POP3, and HTTP. These simulated services can be deployed as honeypots, offering a versatile toolkit for cybersecurity professionals.
What Makes VelLMes Unique?
VelLMes stands out due to its ability to simulate a variety of services with high fidelity. Unlike systems that might offer static or pre-scripted responses, VelLMes uses LLMs to generate dynamic and realistic interactions. This means that when an attacker interacts with a VelLMes honeypot, the responses feel authentic, making it harder for them to distinguish it from a real system. The framework is designed with human attackers in mind, prioritizing interactivity and realism.
A key safety feature of VelLMes is that all outputs are LLM-generated, meaning no actual commands are executed on a real system. This eliminates the risk of attackers breaking out of the sandbox environment, a common concern with more complex honeypot deployments.
How VelLMes Works
The core of VelLMes’s realism comes from careful “prompt engineering.” For each simulated service, a “personality prompt” is crafted and passed to the LLM when an interaction begins. This prompt instructs the LLM on how to behave, ensuring its responses are consistent with a real service. Techniques like chain-of-thought reasoning and providing examples are used to guide the LLM’s behavior. To maintain consistency across sessions, each interaction is saved, and this history is fed back to the LLM if an attacker reconnects.
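The prompt-assembly step above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the function name `build_messages` and the message-dict format (the common `role`/`content` chat convention) are assumptions for the example.

```python
def build_messages(personality_prompt, saved_history, new_input):
    """Assemble an LLM request for one honeypot interaction.

    The personality prompt goes first as the system message; any
    transcript saved from earlier sessions is replayed next, so the
    simulated service stays consistent when an attacker reconnects.
    """
    messages = [{"role": "system", "content": personality_prompt}]
    messages.extend(saved_history)  # replayed history from prior sessions
    messages.append({"role": "user", "content": new_input})
    return messages

# Example: a reconnecting attacker runs `pwd` after an earlier `ls`.
history = [
    {"role": "user", "content": "ls"},
    {"role": "assistant", "content": "Desktop  Documents  Downloads"},
]
msgs = build_messages("You are a Linux SSH shell.", history, "pwd")
```

Persisting and replaying the history is what lets the LLM keep earlier answers (file names, users, database contents) consistent across sessions.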
For instance, the SSH Linux shell component, called shelLM, uses a fine-tuned GPT-3.5 model to simulate a Linux environment, responding to commands and generating realistic file systems. The MySQL honeypot is instructed to act as a command-line client for an IT company’s database, generating plausible tables and records. Similarly, the POP3 honeypot simulates an email service with detailed message headers, and the HTTP honeypot acts as an internal printer server, generating HTML pages that render correctly in a real browser.
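A shelLM-style session loop might look like the sketch below. It uses a stub, `fake_llm`, in place of the fine-tuned model (a hypothetical name for this example); the key point from the safety discussion is visible in the code: attacker input is only ever passed to the language model, never to a real shell.

```python
def fake_llm(messages):
    # Stub standing in for the fine-tuned model; a deployment would
    # send `messages` to the LLM and return its completion.
    last = messages[-1]["content"].strip()
    if last == "whoami":
        return "root"
    return ""

def shell_session(commands):
    """Run a list of attacker commands through the simulated shell."""
    messages = [{"role": "system",
                 "content": "You are a Linux SSH shell. "
                            "Reply only with the command output."}]
    transcript = []
    for cmd in commands:
        messages.append({"role": "user", "content": cmd})
        output = fake_llm(messages)  # no real command is ever executed
        messages.append({"role": "assistant", "content": output})
        transcript.append((cmd, output))
    return transcript
```

Because every response comes from the model, there is no sandbox for the attacker to escape, which is the safety property the framework relies on.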
Rigorous Evaluation
The researchers conducted three types of evaluations to assess VelLMes’s capabilities:
- Generative Capabilities (Unit Tests): To test whether LLMs could realistically simulate the protocols, the researchers developed unit tests that checked for expected substrings, output length, and consistency. The results showed that LLMs, especially GPT-4 and fine-tuned GPT-3.5, could successfully simulate the various services, with some configurations achieving a 100% pass rate. How conversation history was handled significantly affected performance.
- Deception Capabilities (Human Attackers): An experiment with 89 human attackers was conducted using the shelLM honeypot. Attackers were randomly assigned either a real Ubuntu system or a shelLM honeypot and tasked with exfiltrating a secret key. Approximately 30% of the attackers who interacted with the LLM-based honeypot believed they were using a real system, highlighting the potential of LLMs for cyber deception, even allowing for biases introduced by the experimental setup.
- Real-Life Attacks (Internet Deployment): Ten instances of the shelLM honeypot were deployed on the internet for five days to capture real-world attacks. Of 2,825 attack sessions, 151 interactive sessions were analyzed. shelLM responded correctly to 98.91% of the 276 executed commands, demonstrating robustness against unstructured and unexpected attacks; it even kept one human attacker engaged long enough to manually inspect the system.
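The unit tests described in the first evaluation can be illustrated with a small sketch. The checks below (expected substrings and an output-length bound) mirror the kinds of assertions described; the function name and the canned sample output are invented for the example, not taken from the paper's test suite.

```python
def check_ls_output(output, expected_entries, max_lines=40):
    """Basic sanity checks on a simulated `ls` response:
    the output must contain each expected entry and must not be
    implausibly long for a directory listing."""
    lines = output.splitlines()
    assert len(lines) <= max_lines, "output suspiciously long"
    for entry in expected_entries:
        assert entry in output, f"missing expected entry: {entry}"
    return True

# Canned example standing in for a real model response.
simulated_output = "Desktop\nDocuments\nDownloads\n.bashrc"
check_ls_output(simulated_output, ["Documents", ".bashrc"])
```

Consistency checks (e.g. that a file listed in one session still exists in the next) would compare outputs across turns in the same way.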
Conclusion
VelLMes represents a significant step forward in AI-based cyber deception. By leveraging LLMs, it creates highly interactive and realistic honeypots for multiple protocols, proving effective against both human and automated attackers. The framework’s open-source release aims to contribute to the cybersecurity community. Future work will focus on improving LLM responses, incorporating more protocols, and designing even more robust evaluation experiments to minimize bias. You can find the full research paper here: VelLMes: A high-interaction AI-based deception framework.


