TL;DR: EditTrack is a new framework that detects whether a suspicious image was created by AI-editing a specific base image and, if so, identifies the AI model used. It works by re-editing both the base and suspicious images with each candidate AI model, comparing the results with a set of similarity metrics, and classifying the outcome. The method significantly outperforms existing techniques, offering a crucial tool for digital forensics against deepfakes and copyright infringement.
The rapid advancements in AI image editing models have opened up incredible possibilities for transforming existing images into high-quality outputs with simple language instructions. From changing a dog into a cat in an artwork to replacing a person in a photo, these tools empower users to create diverse visual content. However, this powerful technology also brings significant concerns, particularly regarding copyright infringement and the creation of deepfakes. Imagine an artist’s work being subtly altered by AI and then claimed by someone else, or a deepfake image being used to spread misinformation. These scenarios highlight a critical need for tools that can identify and trace AI-assisted image modifications.
Addressing these growing concerns, researchers Zhengyuan Jiang, Yuyang Zhang, Moyang Guo, and Neil Zhenqiang Gong from Duke University have introduced EditTrack, a groundbreaking framework designed to detect and attribute AI-assisted image editing. Unlike previous methods that primarily focus on whether an image was AI-generated or edited in general, EditTrack specifically tackles the problem of determining if a suspicious image was derived from a particular base image using an AI editing model, and if so, identifying which model was responsible.
The Challenge with Existing Approaches
Existing methods for detecting AI-generated images fall short on this specific problem. They can tell whether an image is AI-edited, but they cannot establish a link to a specific base image. And while some attribution methods exist, they often overlook the characteristics that the editing process introduces relative to the original base image, which limits their effectiveness.
How EditTrack Works: A Novel Re-editing Strategy
EditTrack is built upon four key observations about how AI image editing models behave:
- Robustness: If an image is edited with a model and a prompt, using a semantically similar prompt with the same model will produce a very similar edited image.
- Stability: If an image has already been edited by a model, re-editing it with the same model and a similar prompt will result in an image that remains highly similar to the already-edited one.
- Variety: Different editing models will produce distinct results even when given the same base image and editing prompt.
- Dissimilarity: If a suspicious image was not originally derived from a base image by a particular model, that model will not be able to reproduce the suspicious image from the base image.
Leveraging these insights, EditTrack employs a clever “re-editing” strategy. Given a base image and a suspicious image, it first uses a captioning model to describe the differences between them, creating a “proxy” editing prompt. Then, for each candidate AI editing model, EditTrack applies this proxy prompt to both the original base image and the suspicious image, generating two new “re-edited” images.
The core idea is this: if the suspicious image was indeed created from the base image by a specific model, say Model X, then the re-edited images produced by Model X (from both the base and suspicious images) should look very similar to the suspicious image. In contrast, re-edited images from other models, or from Model X if the suspicious image was not originally from the base, would be comparatively dissimilar.
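To make this concrete, here is a minimal sketch of the re-editing step, assuming InstructPix2Pix (via Hugging Face diffusers) as one candidate model. The `describe_difference` function is a hypothetical stand-in for the difference-captioning step, since the post doesn't name the captioning model EditTrack uses:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

def describe_difference(base: Image.Image, suspicious: Image.Image) -> str:
    # Placeholder for EditTrack's captioning step: a captioning/VLM model
    # would describe how `suspicious` differs from `base`, producing a
    # "proxy" editing prompt. Hard-coded here purely for illustration.
    return "turn the dog into a cat"

# One candidate editing model; InstructPix2Pix stands in for "Model X".
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def re_edit_pair(base: Image.Image, suspicious: Image.Image):
    # Apply the same proxy prompt to both images with the same candidate
    # model, yielding the two re-edited images that EditTrack compares
    # against the suspicious image.
    prompt = describe_difference(base, suspicious)
    re_edited_base = pipe(prompt, image=base).images[0]
    re_edited_suspicious = pipe(prompt, image=suspicious).images[0]
    return re_edited_base, re_edited_suspicious
```

In the full framework, this pair of re-edited images would be produced once per candidate model, so that each candidate can be scored against the suspicious image.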
Quantifying Similarity with Multiple Metrics
To measure this similarity accurately, EditTrack doesn't rely on a single measure. Instead, it uses six complementary metrics across three categories (see the sketch after this list):
- Structural Similarity: Measures how the overall structure and composition align (e.g., geometric layout).
- Semantic Similarity: Captures high-level conceptual content, such as whether the images depict the same objects or scenes.
- Pixel-value Similarity: Evaluates low-level visual correspondence, focusing on pixels, colors, and textures.
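The post doesn't name the six specific metrics, so the sketch below uses one common stand-in per category (SSIM for structure, CLIP image-embedding cosine similarity for semantics, PSNR for pixel values) purely to illustrate how per-pair features could be computed:

```python
import numpy as np
import torch
from PIL import Image
from skimage.metrics import structural_similarity, peak_signal_noise_ratio
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def similarity_features(img_a: Image.Image, img_b: Image.Image) -> dict:
    a = np.asarray(img_a.convert("RGB").resize((512, 512)))
    b = np.asarray(img_b.convert("RGB").resize((512, 512)))

    # Structural: SSIM over the RGB channels.
    ssim = structural_similarity(a, b, channel_axis=-1, data_range=255)

    # Pixel-value: PSNR on raw intensities.
    psnr = peak_signal_noise_ratio(a, b, data_range=255)

    # Semantic: cosine similarity of CLIP image embeddings.
    inputs = processor(images=[img_a, img_b], return_tensors="pt")
    with torch.no_grad():
        emb = clip.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    clip_sim = float((emb[0] @ emb[1]).item())

    return {"ssim": ssim, "psnr": psnr, "clip": clip_sim}
```

Each re-edited image would be compared against the suspicious image this way, and the resulting scores stacked into a feature vector for the pair.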
These metrics generate a comprehensive set of features for each base-suspicious image pair. EditTrack then feeds these features into a multi-class classifier trained either to identify the specific editing model responsible or to determine that the image was not edited from the base.
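The post doesn't specify the classifier, so here is a minimal sketch assuming a random forest over the per-pair similarity features. The class set, feature layout, and placeholder training arrays are all illustrative assumptions, not details from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical label set: one class per candidate editor, plus a class
# meaning "the suspicious image was not edited from this base".
CLASSES = ["model_a", "model_b", "model_c", "not_edited_from_base"]

# One row per (base, suspicious) pair: similarity features between a
# candidate model's two re-edited images and the suspicious image.
# The layout (6 metrics x 2 re-edited images) is an assumption, and the
# random arrays are placeholders standing in for real training data.
X_train = np.random.rand(200, 6 * 2)
y_train = np.random.randint(len(CLASSES), size=200)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

def attribute(features: np.ndarray) -> str:
    # Predict which editor (if any) produced the suspicious image.
    return CLASSES[int(clf.predict(features.reshape(1, -1))[0])]
```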
Impressive Performance
The researchers conducted extensive evaluations using five state-of-the-art AI editing models and multiple datasets. EditTrack consistently achieved high detection and attribution accuracy, significantly outperforming a range of baselines, including direct prompting or fine-tuning of large vision-language models (VLMs), watermark-based methods, and other AI-generated-image detection techniques. Ablation studies also confirmed that re-editing both the base and suspicious images, and combining all six similarity metrics, contribute to EditTrack's superior performance.
Looking Ahead
EditTrack represents a significant step forward in digital forensics and content verification in the age of AI. By providing a reliable post-hoc mechanism to trace AI-assisted image editing, it offers a crucial tool for artists, law enforcement, and anyone concerned with the authenticity of digital media. Future work aims to extend this framework to detect and attribute AI-assisted editing in text and video content, further strengthening our ability to navigate the complexities of AI-generated media.