TLDR: A cybersecurity expert’s experiment with Google’s Gemini CLI resulted in the AI hallucinating commands, leading to the deletion of his files. The incident, which saw Gemini issue a self-deprecating apology, highlights significant risks in AI coding agents and raises questions about the reliability and safety mechanisms of large language models in sensitive environments.
In a recent incident that has sent ripples through the artificial intelligence community, Google’s Gemini Command Line Interface (CLI) reportedly hallucinated commands, resulting in the accidental deletion of a cybersecurity expert’s files. Anuraag Gupta, a product manager at Cyware and a self-described ‘curious PM experimenting with vibe coding,’ detailed his experience on GitHub after testing the open-source CLI powered by Gemini 2.5 Pro.
Gupta’s intention was to compare Gemini CLI with Anthropic’s Claude Code for a coding task. However, instead of providing accurate assistance, the AI generated erroneous shell commands that wiped out directories. In a peculiar turn, Gemini itself issued an apology, stating, ‘I have failed you completely and catastrophically. My review of the commands confirms my gross incompetence.’
This mishap underscores growing concerns within the AI development sector regarding the dependability of large language models, particularly when deployed in critical environments like terminals. Gupta emphasized that while he was using the tool for non-critical experiments, the data loss was immediate and irreversible without prior backups. Industry observers note that despite Gemini CLI’s promise of seamless integration of Gemini’s capabilities into workflows, such hallucinations expose deficiencies in its safety protocols.
Discussions on platforms such as Hacker News have amplified the story, with other users sharing similar anecdotes of AI overconfidence or erratic behavior. These accounts may point to inconsistencies in reinforcement learning from human feedback (RLHF), which could leave models prone to extreme responses ranging from excessive humility to unwarranted hubris.
The incident prompts critical questions for enterprises considering the adoption of AI agents in production settings. Publications have quoted Gupta’s frustration and his subsequent recommendations for safer usage, including running AI tools in isolated environments or utilizing dry-run modes. He confirmed these details, stressing that his experience as a non-developer drawn to ‘vibe coding’ served as a harsh lesson on the inherent risks.
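Gupta's two suggestions, isolation and dry runs, can be illustrated with a minimal Python sketch. It is not part of Gemini CLI or any other tool; the function name `run_in_scratch_copy` and its parameters are hypothetical. The sketch copies a project into a throwaway directory and either reports the command (dry run) or executes it against the copy only.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_copy(project_dir, argv, dry_run=True):
    """Run argv against a throwaway copy of project_dir, never the original.

    Hypothetical helper for illustration. With dry_run=True the command is
    only reported, not executed. Returns (scratch_path, returncode), where
    returncode is None for a dry run.

    Note: setting cwd is NOT a real sandbox. A command that uses absolute
    paths can still reach files outside the copy; a container or VM gives
    stronger isolation.
    """
    scratch_root = Path(tempfile.mkdtemp(prefix="agent-scratch-"))
    scratch = scratch_root / "workdir"
    shutil.copytree(project_dir, scratch)  # the original stays untouched

    if dry_run:
        print(f"[dry-run] would run {argv} in {scratch}")
        return scratch, None

    result = subprocess.run(argv, cwd=scratch, check=False)
    return scratch, result.returncode


if __name__ == "__main__":
    # Demo: a destructive command only affects the scratch copy.
    src = Path(tempfile.mkdtemp(prefix="demo-project-"))
    (src / "notes.txt").write_text("important")
    delete_cmd = [sys.executable, "-c", "import os; os.remove('notes.txt')"]
    scratch, code = run_in_scratch_copy(src, delete_cmd, dry_run=False)
    print("original still has file:", (src / "notes.txt").exists())
    print("scratch copy has file:", (scratch / "notes.txt").exists())
```

Reviewing the scratch copy before applying any change to the real directory keeps a hallucinated command from touching the only copy of the data.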
Competitors such as Anthropic’s Claude and emerging models like Qwen3-Coder are positioning themselves as more robust alternatives, incorporating features designed to mitigate destructive errors. While Google’s own documentation for Gemini CLI, updated as recently as July 19, 2025, promotes its functionalities for tasks like image generation and GitHub repository integration, it largely overlooks potential pitfalls.
Experts argue that incidents like this necessitate stronger guardrails, including explicit user confirmations for file operations. Analysts have noted Gemini CLI’s cost-effectiveness compared to paid services but highlighted its shortcomings in precision, advising developers to view it as a supplementary tool rather than a standalone solution.
Gupta’s experience serves as a stark reminder that as AI tools become more prevalent, striking a balance between innovation and accountability is paramount. While Google has not yet issued a public statement on this specific case, the open-source nature of Gemini CLI encourages community contributions to enhance its safeguards. Users are cautioned that in a terminal environment, a single erroneous command from an AI can lead to significant data loss and erode trust in these powerful agents.
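The explicit-confirmation guardrail mentioned above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not how Gemini CLI or any competing agent works: the deny-list `DESTRUCTIVE` and the functions `needs_confirmation` and `guarded_run` are hypothetical names, and the check inspects only the first word of a command.

```python
import shlex
import subprocess

# Hypothetical deny-list for illustration; a real tool would need a much
# broader policy (pipelines, sudo, redirections, scripts, and so on).
DESTRUCTIVE = {"rm", "rmdir", "mv", "dd", "truncate", "shred"}


def needs_confirmation(command):
    """Return True if the command's program name is on the deny-list."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in DESTRUCTIVE


def guarded_run(command, confirm=input):
    """Run command, asking the user first when it looks destructive.

    `confirm` is any callable that takes a prompt and returns the answer;
    it defaults to the built-in input(). Returns the process exit code,
    or None if the user declined.
    """
    if needs_confirmation(command):
        answer = confirm(f"About to run: {command!r}. Type 'yes' to proceed: ")
        if answer.strip().lower() != "yes":
            return None
    # No shell is involved, so globbing and pipes are not interpreted.
    return subprocess.run(shlex.split(command), check=False).returncode
```

A gate this simple is easy to bypass (for example, `sudo rm` would slip through because only the first token is checked), so it complements backups and isolation rather than replacing them.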


