
Unmasking Infostealer Campaigns: How AI Analyzes Digital Crime Scene Screenshots

TLDR: A new research paper introduces a novel method using Large Language Models (LLMs) to analyze screenshots from infostealer infections. By processing these ‘digital crime scene’ images, the LLM (gpt-4o-mini) can identify malicious URLs, installer files, and social engineering tactics used by infostealers like Aurora. The study extracted hundreds of actionable indicators from 1000 screenshots, revealing common lures like cracked software and gaming mods, and distribution strategies via YouTube and Google Ads. It successfully mapped three distinct campaigns (Blitz Java, Zero MidJourney, Snow Microsoft 2022), demonstrating the LLM’s potential to transform passive forensic evidence into actionable threat intelligence, despite some challenges in browser tab analysis.

Infostealers are a dangerous type of malware designed to steal sensitive information like login credentials, session cookies, and personal data from infected computers. In 2024 alone, over 29 million instances of stolen data (known as stealer logs) were reported. The sheer volume of this data makes it virtually impossible for humans to manually analyze and mitigate these threats effectively.

While much of the cybersecurity research focuses on preventing malware infections, there’s a significant gap in how we reactively analyze the aftermath of an infection, especially by looking at associated artifacts like screenshots. These screenshots, captured at the moment of compromise, offer invaluable clues about how the infection occurred but have largely been overlooked.

A new research paper introduces a groundbreaking approach that uses Large Language Models (LLMs), specifically gpt-4o-mini, to analyze these infection screenshots. The goal is to extract potential Indicators of Compromise (IoCs), map out how infections spread, and track malware campaigns. Focusing on the Aurora infostealer, the study demonstrates how LLMs can process these visual records to identify infection vectors such as malicious URLs, installer files, and even the themes of exploited software.

The researchers successfully extracted 337 actionable URLs and 246 relevant files from 1000 screenshots. By correlating these extracted filenames, URLs, and infection themes, they were able to identify three distinct malware campaigns. This highlights the immense potential of LLM-driven analysis in uncovering the full workflow of an infection and significantly enhancing threat intelligence. This method shifts malware analysis from traditional log-based detection to a reactive, artifact-driven approach, offering a scalable way to identify infection vectors and enable quicker intervention.
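The correlation step described above can be sketched as a simple grouping operation: indicators that share a domain and a lure theme cluster into candidate campaigns. This is an illustrative assumption about how such correlation might work, not the paper's actual implementation; the `cluster_campaigns` helper and the data shape are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def cluster_campaigns(indicators):
    """Group extracted IoCs into candidate campaigns.

    indicators: list of dicts with 'url' and 'theme' keys (assumed shape).
    Returns a dict mapping (domain, theme) -> list of matching IoCs.
    """
    clusters = defaultdict(list)
    for ioc in indicators:
        # Use the URL's hostname plus the lure theme as a rough campaign key.
        domain = urlparse(ioc["url"]).hostname or "unknown"
        clusters[(domain, ioc["theme"])].append(ioc)
    return dict(clusters)
```

With this key, two downloads from the same fake-Java domain land in one cluster while an unrelated Midjourney lure forms its own, mirroring how shared filenames, URLs, and themes let the researchers separate the three campaigns.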

How the LLM Analyzes Screenshots

The researchers collected 1,000 screenshots from devices infected with the Aurora infostealer via the platform of Flare, an IT security company. These images were then encoded and fed into the gpt-4o-mini LLM. The screenshots were categorized into three types: web content, file system, and hybrid, to guide the LLM's analysis.

A detailed prompt was engineered to instruct the LLM to describe the main content, identify installer files and other programs, list all visible URLs, analyze browser tabs, and highlight any suspicious elements. For example, if a screenshot showed a YouTube video offering a ‘crack’ for antivirus software, the LLM would identify the video, its URL, and flag it as suspicious due to its nature.
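The encode-and-prompt step might look like the following sketch, which base64-encodes a screenshot and builds a vision request for gpt-4o-mini. The prompt wording and the `build_analysis_request` helper are illustrative assumptions; the paper's exact prompt is more detailed.

```python
import base64

# Instructions paraphrased from the article; not the paper's exact prompt.
ANALYSIS_PROMPT = (
    "Describe the main content of this screenshot. "
    "Identify any installer files or other programs. "
    "List all visible URLs. Analyze the open browser tabs. "
    "Highlight any suspicious elements."
)

def build_analysis_request(image_bytes: bytes, model: str = "gpt-4o-mini") -> dict:
    """Return a chat-completions payload for one encoded screenshot."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": ANALYSIS_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# The payload would then be sent via the OpenAI client, e.g.:
# client.chat.completions.create(**build_analysis_request(png_bytes))
```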

LLM Performance and Challenges

The LLM performed exceptionally well in several areas. It achieved 96% accuracy in providing comprehensive general descriptions of the screenshots, showing a strong understanding of the visual context. When files were present, the LLM was 100% accurate in identifying them, even when partially obscured. It also demonstrated strong capability (87%) in detecting suspicious elements, accurately identifying malicious URLs, files, or applications.

However, the LLM faced challenges, particularly with browser tab identification. While it sometimes perfectly identified tabs and infection vectors, its performance was inconsistent, especially with many open tabs or when distinguishing between active tabs and bookmarks. This inconsistency could impact its ability to pinpoint the exact infection vector. Despite this, even when it failed to fully identify tabs, it often still managed to flag suspicious elements within those tabs.

Insights from LLM-Generated Data

From the LLM-generated descriptions, the researchers extracted 337 unique and actionable URLs. These fell into categories like YouTube videos (often containing malicious links in their descriptions), file distribution platforms (like mega.nz or mediafire.com, providing direct access to malware), and other diverse domains (including potential phishing sites).

The LLM also identified 246 relevant files, including executable (.exe) files, compressed archives (.zip, .rar), and dynamic link libraries (.dll). While some filenames were generic, many were highly descriptive (e.g., ‘Microsoft Office Crack 2022.rar’), revealing the specific lures used by attackers.
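Turning the LLM's free-text descriptions into the URL and file counts above requires a post-processing pass. A minimal sketch of that extraction is below; the regex patterns and the `extract_iocs` helper are assumptions, as the paper does not publish its exact extraction rules.

```python
import re

# Rough patterns for URLs and for filenames with extensions the study
# reports (.exe, .zip, .rar, .dll). Illustrative, not exhaustive.
URL_RE = re.compile(r"https?://[^\s)\"'>]+")
FILE_RE = re.compile(r"\b[\w.-]+\.(?:exe|zip|rar|dll)\b", re.IGNORECASE)

def extract_iocs(description: str) -> dict:
    """Return de-duplicated URLs and filenames found in one LLM description."""
    return {
        "urls": sorted(set(URL_RE.findall(description))),
        "files": sorted(set(FILE_RE.findall(description))),
    }
```

Filenames containing spaces (like 'Microsoft Office Crack 2022.rar') would need a looser pattern or structured LLM output (e.g. JSON mode) to capture reliably.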

Common Lures and Distribution Strategies

The analysis revealed two primary themes used to trick users into downloading malware:

  • Cracked Software: About 28.3% of infections were related to users seeking free or cracked versions of popular software like Adobe Suite, Filmora, VEGAS Pro, Midjourney, and especially Microsoft Office. Attackers exploit the desire to avoid paying for legitimate licenses.
  • Gaming Mods, Cheats, and Hacks: Around 7.4% of infections involved malware disguised as mods or cheats for popular games like Roblox, Minecraft, and Fortnite. This targets players looking to enhance their gaming experience without paying.

Two main distribution strategies were identified:

  • YouTube as a Distribution System: Many screenshots showed YouTube videos acting as tutorials, guiding users to download malicious payloads disguised as free software or game cheats. These videos often instruct users to disable their antivirus software, claiming it’s necessary for installation.
  • Leveraging Google Ads: Threat actors paid for Google ads to promote malicious websites that mimicked official software pages. These fake sites appeared at the top of search results, tricking users who trusted the prominent placement into downloading malware.


Notable Malware Campaigns

The research detailed three significant campaigns:

  • Blitz Java: This campaign mimicked legitimate Java installations. Users searching for Java downloads clicked on sponsored Google ads leading to fake Java websites (e.g., go.java-gapp.space). Within a rapid 19-hour window, users downloaded a malware-laced ‘Java Client.zip’ and executed ‘Java Setup.exe’, leading to infection.
  • Zero MidJourney: Capitalizing on the popularity of the AI art platform Midjourney, this campaign used typosquatting domains (e.g., ai.mid-j0urney.org) and Google Ads. Users were lured to fake sites that advised them to disable their antivirus before downloading a supposed ‘beta version’ of Midjourney, which was actually malware.
  • Snow Microsoft 2022: This campaign distributed a cracked version of Microsoft Office 2022. Victims searched for ‘Microsoft Office 2022 crack’ on YouTube, found a video from the ‘DataStat’ channel, and followed a download link in the description. This led to a password-protected archive (password ‘YUKI’) hosted on MEGA.nz, which contained the infostealer. The password protection helped evade antivirus detection.

This research underscores that simple social engineering remains a primary infection vector, exploiting user trust in search engines, video platforms, and free software. While LLMs excel at extracting IoCs from screenshots, a hybrid approach combining LLM strengths with human analysis might be optimal for complex cases. The visual-based detection method is resilient to code changes, offering a promising avenue for scaling malware campaign tracking and early intervention. For more details, you can refer to the full research paper here.

Karthik Mehta
