TL;DR: A new research paper proposes using large pre-trained multi-modal AI models for deepfake detection. It finds that features drawn from the intermediate layers of these models capture the “digital fingerprints” generators leave behind, effectively distinguishing real from fake content across images and audio, outperforming existing methods, and even identifying which model produced the fake.
In today’s digital age, generative AI models are creating incredibly realistic images, audio, and text. While these tools have many positive applications, they are unfortunately also being exploited by malicious users to spread misinformation and create “deepfakes.” This has led to an urgent need for reliable tools that can detect synthetic content, especially as new generative models emerge constantly.
Most existing deepfake detectors are trained to identify fakes from specific types of generators or data, meaning they often fail when faced with new or different kinds of synthetic media. This limitation makes it difficult to create a “universal” detector that works across various generative models and data types.
A new research paper, *Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics*, proposes a novel approach to this challenge. The authors, Tom Or and Omri Azencot, suggest using large, pre-trained multi-modal models: AI systems designed to understand and process different types of data, such as images and audio, simultaneously. Their key insight is that the “latent code” (the internal representations these models learn) already contains information that can distinguish real content from fake.
The core hypothesis of this research is that the most effective features for deepfake detection are found not in the very first or very last layers of these multi-modal models, but in their intermediate layers. Think of it like this: the earliest layers capture low-level details such as edges and textures, while the final layers focus on high-level semantics. The “sweet spot,” according to the researchers, lies in the middle layers, where low-level and high-level information are balanced, making them ideal for spotting the subtle digital fingerprints left by generative models.
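To make the idea concrete, here is a minimal sketch of how one might pull intermediate-layer features from a pre-trained multi-modal model. The paper does not prescribe this exact code; the choice of CLIP as the backbone, the Hugging Face API, the example file name, and the layer index are all illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative backbone choice: a pre-trained CLIP vision-language model
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

image = Image.open("example.png")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    # Ask the vision tower to return the activations of every layer
    out = model.vision_model(**inputs, output_hidden_states=True)

# hidden_states holds the patch embeddings plus one entry per transformer layer
hidden_states = out.hidden_states
mid = hidden_states[len(hidden_states) // 2]  # an intermediate layer (illustrative pick)

# Use that layer's [CLS] token as the feature vector for a downstream detector
feature = mid[:, 0, :]  # shape: (1, hidden_dim)
```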
To test this, the researchers ran several experiments. They visualized the features using t-SNE (t-distributed stochastic neighbor embedding), which showed a clear separation between real and fake content for intermediate-layer features, unlike the entangled clusters produced by the first and last layers. They also trained simple linear classifiers, such as Support Vector Machines (SVMs), on these intermediate features. These classifiers achieved state-of-the-art results across modalities, including images and audio, and proved computationally efficient and fast to train, even with limited data.
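A rough sketch of that evaluation pipeline, assuming features like those extracted above have been saved to disk (the file names and hyperparameters here are hypothetical, not the authors’ exact setup):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical feature matrix (n_samples, hidden_dim) and labels (1 = fake, 0 = real)
X = np.load("mid_layer_features.npy")
y = np.load("labels.npy")

# A simple linear classifier suffices when the features are already separable
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LinearSVC(C=1.0).fit(X_tr, y_tr)
print("detection accuracy:", clf.score(X_te, y_te))

# 2-D t-SNE projection to visually inspect the real/fake separation
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```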
The paper demonstrates that their method not only excels at general deepfake detection but also offers advanced capabilities. For instance, it can perform clustering-based detection, where similar fake images group together, making them easier to identify. Furthermore, the approach can even help identify the specific generative model that created the synthetic content, which could be useful for copyright issues or improving targeted detection methods.
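In the same hedged spirit, the sketch below shows how such capabilities could be built on top of the same intermediate features using off-the-shelf scikit-learn tools; the file names, cluster count, and choice of classifier are assumptions for illustration, not the authors’ exact recipe.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical inputs: intermediate-layer features of fake samples only,
# plus an integer ID for the generator that produced each sample
X_fake = np.load("fake_features.npy")
gen_ids = np.load("generator_ids.npy")

# Clustering-based detection: fakes from the same generator tend to group together
n_generators = len(np.unique(gen_ids))
clusters = KMeans(n_clusters=n_generators, n_init=10, random_state=0).fit_predict(X_fake)

# Source attribution framed as multi-class classification over known generators
attributor = LogisticRegression(max_iter=1000).fit(X_fake, gen_ids)
```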
This work represents a significant step towards developing more robust and universal deepfake detection tools. By leveraging the inherent properties of large pre-trained multi-modal models and focusing on their intermediate layers, the researchers have opened new avenues for combating the spread of synthetic misinformation across different data types and generative technologies.


