
Unmasking AI-Generated Images: A System for Detection and Explanation

TL;DR: A new research paper introduces an explainable system for detecting AI-generated images, even low-resolution ones with adversarial perturbations. It combines a lightweight classifier, “Faster-Than-Lies,” with a Vision-Language Model (Qwen2-VL-7B) to achieve 96.5% accuracy. The system localizes artifacts using autoencoder-based reconstruction error maps and provides human-understandable explanations, making it suitable for deployment on local or edge devices with fast inference times.

In an era where artificial intelligence can create incredibly realistic images, distinguishing between genuine and AI-generated content has become a significant challenge. A new research paper, titled “Explainable Detection of AI-Generated Images with Artifact Localization Using Faster-Than-Lies and Vision–Language Models for Edge Devices,” introduces an innovative system designed to tackle this problem, not just by detecting fake images but also by explaining why they are deemed inauthentic.

The paper, authored by Aryan Mathur, Asaduddin Ahmed, Pushti Amit Vasoya, Simeon Kandan Sonar, Yasir Z, and Madesh Kuppusamy from the Indian Institute of Technology Palakkad, India, outlines a system that combines a lightweight convolutional classifier, dubbed “Faster-Than-Lies,” with a powerful Vision–Language Model (VLM), specifically Qwen2-VL-7B. This combination allows for the classification, localization, and explanation of artifacts in 32×32 resolution images.

Addressing Key Challenges in AI Image Detection

The researchers faced several unique hurdles, particularly when working with low-resolution 32×32 pixel images. Most existing research focuses on high-resolution imagery, so traditional methods perform poorly at this scale. The goal was to develop a model that was not only accurate but also lightweight and efficient enough to be deployed on local or edge devices, with an inference time of approximately 200 milliseconds on CPUs and a size of around 98 MB.

Another critical aspect was explainability. Simply labeling an image as fake isn’t enough; understanding the underlying reasons builds trust. This required integrating a fine-tuned VLM capable of translating technical findings into natural language explanations, even for low-resolution inputs. The team also had to contend with adversarial perturbations—deliberate alterations designed to fool detection systems—which significantly impacted model performance.

The System’s Core Components and Methodology

The system’s detection capabilities are powered by “Faster-Than-Lies,” a convolutional network chosen after extensive evaluation of various models like EfficientNet and Vision Transformers. Despite having slightly more parameters than initially targeted (29.8 million vs. 23 million), it achieved an optimal balance of speed and accuracy, with an inference time of 175 ms on an 8-core CPU.

For explainability and artifact localization, the researchers selected Qwen2-VL-7B. This VLM proved most effective at handling low-resolution images and generating meaningful explanations. To enhance its performance, especially for localization, an autoencoder was trained on real images to reconstruct them. When presented with an image, regions the autoencoder could not reconstruct well (indicating artifacts) produced higher pixel-wise reconstruction loss, which was rendered as attention maps highlighting the artifact regions. This visual aid significantly improved the VLM’s ability to understand and explain anomalies.
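To make the idea concrete, here is a minimal sketch of reconstruction-error localization. The architecture, layer sizes, and function names are illustrative assumptions for a 32×32 input, not the paper's actual autoencoder.

```python
# Sketch only: a tiny autoencoder trained on real images; regions it fails to
# reconstruct (high pixel-wise error) are treated as candidate artifact regions.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Small convolutional autoencoder for 32x32 RGB images (assumed layout)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),     # 8x8 -> 16x16
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),   # 16x16 -> 32x32
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def reconstruction_error_map(model: TinyAutoencoder, image: torch.Tensor) -> torch.Tensor:
    """Return a per-pixel error heatmap in [0, 1] for a (3, 32, 32) image."""
    model.eval()
    with torch.no_grad():
        recon = model(image.unsqueeze(0)).squeeze(0)
    # Squared error averaged over channels; high values mark poorly reconstructed regions.
    err = ((image - recon) ** 2).mean(dim=0)
    return (err - err.min()) / (err.max() - err.min() + 1e-8)
```

The resulting heatmap can then be overlaid on the input image and passed to the VLM alongside the original, which is how the attention maps described above guide the explanation step.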

The dataset used for training was an extended CiFAKE dataset, augmented with images from models like Stable Diffusion, PixArt, GigaGAN, and Adobe Firefly, totaling 1.2 million images. To counter adversarial perturbations, the training dataset was further enriched with synthetic perturbations such as Gaussian noise, motion blur, and adversarial noise, making the model more robust.
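As a rough illustration of the perturbation-style augmentations named above, the following sketch applies Gaussian noise and a simple motion blur; the noise level and kernel size are assumptions, not the paper's training settings.

```python
# Hedged sketch of two augmentations (Gaussian noise, motion blur) used to
# harden the detector against perturbed inputs. Parameters are illustrative.
import torch
import torch.nn.functional as F

def add_gaussian_noise(img: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """img: (3, H, W) tensor with values in [0, 1]."""
    return (img + sigma * torch.randn_like(img)).clamp(0.0, 1.0)

def motion_blur(img: torch.Tensor, kernel_size: int = 5) -> torch.Tensor:
    """Apply a simple horizontal motion-blur kernel to each channel."""
    kernel = torch.zeros(kernel_size, kernel_size)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size              # horizontal streak
    kernel = kernel.view(1, 1, kernel_size, kernel_size).repeat(3, 1, 1, 1)
    return F.conv2d(img.unsqueeze(0), kernel, padding=kernel_size // 2, groups=3).squeeze(0)

# Example: perturb a random 32x32 image the way the augmented training data might be.
example = torch.rand(3, 32, 32)
perturbed = motion_blur(add_gaussian_noise(example))
```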

Categorizing Artifacts and Generating Explanations

The team meticulously categorized 70 visual artifact types into eight semantic groups, including Geometric and Structural Anomalies, Texture and Surface Issues, Lighting and Reflection Problems, and Anatomical and Biological Anomalies. When a fake image is detected, a CLIP Encoder first identifies the top three most likely artifact categories. These categories are then fed to the Qwen2-VL-7B model, which performs a detailed visual analysis and generates a textual explanation of the detected artifacts.
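A hedged sketch of this two-stage explanation step is shown below, using the Hugging Face CLIP API to rank artifact groups and assemble a prompt for the VLM. The checkpoint name, the category phrasing (only the four groups named in this article are listed), and the prompt wording are illustrative assumptions, not the authors' implementation.

```python
# Sketch: score an image against artifact groups with CLIP, take the top three,
# and fold them into a prompt for the vision-language model.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

ARTIFACT_GROUPS = [
    "geometric and structural anomalies",
    "texture and surface issues",
    "lighting and reflection problems",
    "anatomical and biological anomalies",
    # ...the remaining groups from the paper's eight-category taxonomy
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")       # assumed checkpoint
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def top_artifact_categories(image: Image.Image, k: int = 3) -> list[str]:
    """Rank artifact groups by image-text similarity and return the top k."""
    inputs = processor(text=ARTIFACT_GROUPS, images=image, return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image[0]        # similarity to each group
    top = logits.topk(min(k, len(ARTIFACT_GROUPS))).indices.tolist()
    return [ARTIFACT_GROUPS[i] for i in top]

def build_vlm_prompt(categories: list[str]) -> str:
    """Assemble a natural-language prompt for the explanation model."""
    return (
        "This image was flagged as AI-generated. Focusing on "
        + ", ".join(categories)
        + ", describe the visual artifacts that indicate it is not authentic."
    )
```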


Performance and Future Outlook

The system achieved an impressive 96.5% accuracy on the extended CiFAKE dataset, even with adversarial perturbations. The Faster-Than-Lies classifier maintained a rapid inference time of 175 ms on 8-core CPUs, making it suitable for edge deployment. While the Qwen2-VL-7B model had a longer inference time (5.189s per image on an L40s GPU), the overall system demonstrated practical potential for real-world applications.

The researchers acknowledge limitations, such as the system’s reliance on the VLM for explanations and the need to incorporate more AI image sources and perturbation types. Future work aims to extend the approach to variable-resolution datasets, enhance localization precision using diffusion-based reconstruction, and evaluate cross-domain generalization for applications in forensics, industrial inspection, and social media moderation.

This research marks a significant step towards building trust in visual content by providing not just detection, but also clear, human-understandable explanations for AI-generated imagery. For more details, you can read the full research paper here.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
