TL;DR: This research paper proposes a multi-level strategy for deepfake content moderation, specifically tailored to meet EU regulations such as the AI Act and the DSA. It highlights the limitations of current single-method approaches, given the rapid evolution of deepfake technology and the sheer volume of online content. The proposed strategy combines an initial marker-based categorization with a multimodal detection system that integrates both technical (e.g., AI-based anomaly detection) and trusted (e.g., human fact-checking) methods. It also incorporates a “downstream risk” assessment and a scoring system to guide transparent labeling of content as “Deepfakes”, “Verified”, “Untrustworthy”, or “Trustworthy”. The paper discusses challenges such as misclassification trade-offs, regulatory ambiguities, and enforcement needs, and advocates a holistic, adaptable, and collaborative approach to safeguarding democratic societies.
The proliferation of deepfake technologies, which produce AI-generated or manipulated media that resemble real persons, objects, or events and can falsely appear authentic, poses significant risks to democratic societies, particularly in areas like political communication on online platforms. Such fakes can manipulate public opinion and spread misinformation, as seen in recent geopolitical events and elections. Recognizing this growing threat, the European Union (EU) has introduced regulations, notably the Artificial Intelligence Act (AI Act) and the Digital Services Act (DSA), that impose transparency obligations regarding deepfake content on providers and deployers of AI systems and on online platforms.
However, the rapid evolution of deepfake technology and the sheer volume of content on online platforms make effective content moderation and enforcement of these regulations difficult. Current methods for identifying deepfakes often fall short, as no single approach offers a universal solution. The research paper “A Multi-Level Strategy for Deepfake Content Moderation under EU Regulation”, by Max-Paul Förster, Luca Deck, Raimund Weidlich, and Niklas Kühl, addresses these challenges by proposing a comprehensive multi-level strategy.
Understanding EU Regulations and the Need for a New Approach
The AI Act, whose transparency obligations apply from August 2026, requires providers of AI systems that generate synthetic content, including deepfakes, to mark their output in a machine-readable format. In addition, deployers of AI systems that generate or manipulate deepfake content must disclose its artificial origin. The DSA, in turn, places transparency obligations on very large online platforms (VLOPs) and very large online search engines (VLOSEs), requiring them to label deepfake content. A key difficulty for VLOPs is that they cannot always trace the origin of third-party content, especially if it comes from outside the EU or is intentionally deceptive. This is why robust detection frameworks are needed.
Existing methods for handling deepfakes can be broadly categorized into marking methods, technical detection methods, and trusted detection methods. Marking methods, such as metadata tags and frequency-component, cryptographic, or statistical watermarks, embed information directly into the content. While some are robust, none are entirely immune to manipulation, and all require constant adaptation. Technical detection methods, including artefact-based and undirected approaches (such as classification and anomaly detection), use algorithms to identify anomalies. These methods are scalable, but they often require vast amounts of training data, can lose performance when new deepfake technologies emerge (concept drift), and, because of their “black-box” nature, make their reasoning difficult to trace.
Trusted methods, conversely, leverage human intelligence, either through expert-based or crowd-sourced fact-checking. Experts can take context into account, but they are scarce and cannot keep pace with large volumes of content. Crowd-sourced methods offer scalability and diverse expertise but require sufficient incentives for participants and are vulnerable to external manipulation. The paper concludes that no individual method alone can meet both regulatory and practical demands.
The Proposed Multi-Level Strategy
To overcome these limitations, the researchers propose a multi-level strategy that combines the strengths of existing methods. The first level involves pre-categorizing content based on embedded markers. Positive markers indicate deepfakes, while negative markers authenticate real content. For this to be effective, a verified certification chain is crucial to prevent tampering.
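To make the first level concrete, the following is a minimal sketch of marker-based pre-categorization, assuming a marker is cryptographically bound to both the content bytes and its issuer. It is not the paper's implementation: `Marker`, `SignedMarker`, `sign`, `precategorize`, and the `TRUSTED_KEYS` registry are hypothetical names, HMAC with shared keys stands in for the public-key certificates a real certification chain would use, and the chain is simplified to a single link. Any marker that fails verification is treated as absent, so the item falls through to the second level.

```python
import hashlib
import hmac
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Marker(Enum):
    POSITIVE = "positive"  # declares the content AI-generated (deepfake)
    NEGATIVE = "negative"  # attests that the content is authentic


@dataclass(frozen=True)
class SignedMarker:
    marker: Marker
    issuer: str
    content_sha256: str
    signature: str


# Hypothetical registry of trusted issuers. Shared HMAC keys are a stand-in
# for the public-key certificates of a real certification chain.
TRUSTED_KEYS = {"camera-vendor": b"key-A", "genai-provider": b"key-B"}


def _payload(marker: Marker, issuer: str, digest: str) -> bytes:
    return f"{marker.value}|{issuer}|{digest}".encode()


def sign(marker: Marker, issuer: str, content: bytes, key: bytes) -> SignedMarker:
    """Bind a marker to the exact content bytes and to its issuer."""
    digest = hashlib.sha256(content).hexdigest()
    sig = hmac.new(key, _payload(marker, issuer, digest), hashlib.sha256).hexdigest()
    return SignedMarker(marker, issuer, digest, sig)


def precategorize(content: bytes, sm: Optional[SignedMarker]) -> str:
    """Level 1: trust a marker only if it verifies; otherwise defer to level 2."""
    if sm is None:
        return "unmarked"
    key = TRUSTED_KEYS.get(sm.issuer)
    if key is None:  # issuer is not in the trusted registry
        return "unmarked"
    digest = hashlib.sha256(content).hexdigest()
    if not hmac.compare_digest(digest, sm.content_sha256):  # content edited after marking
        return "unmarked"
    expected = hmac.new(key, _payload(sm.marker, sm.issuer, digest), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sm.signature):  # marker forged or altered
        return "unmarked"
    return "deepfake" if sm.marker is Marker.POSITIVE else "verified"
```

For example, a clip signed with a positive marker by `"genai-provider"` is pre-categorized as `"deepfake"`, but appending a single byte to it yields `"unmarked"`, because the hash no longer matches. This illustrates why the text insists on a verified chain: an unverifiable marker carries no information.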
When markers are absent, the second level employs a multimodal approach that integrates both technical and trusted detection methods. Technical methods, particularly undirected ones, are highly scalable and can detect a majority of deepfakes with high certainty. Trusted methods become vital when technical methods fail or when the content carries a high potential for harm. For instance, VLOPs could implement collective signing systems for content moderation or engage experts for complex cases. The strategy also incorporates an assessment of “downstream risk,” classifying content based on its potential impact, especially for sensitive areas like political communication.
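The routing logic of the second level can be sketched as follows, assuming a technical detector that outputs a probability `p_fake` and a coarse three-step downstream-risk scale. The topic lists, the `reach` cut-off, the confidence `margin`, and the function names `downstream_risk` and `route` are all hypothetical choices for illustration, not values from the paper. The sketch encodes the rule stated above: technical detection is the scalable default, and trusted methods step in when the detector is unsure or the potential harm is high.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Hypothetical topic lists; a platform would define its own.
HIGH_RISK_TOPICS = {"political", "election", "public-health"}
MEDIUM_RISK_TOPICS = {"news", "finance"}


def downstream_risk(topics: set[str], reach: int, viral_reach: int = 100_000) -> Risk:
    """Coarse downstream-risk class from topic tags and expected audience size."""
    if topics & HIGH_RISK_TOPICS:
        return Risk.HIGH
    if topics & MEDIUM_RISK_TOPICS or reach >= viral_reach:
        return Risk.MEDIUM
    return Risk.LOW


def route(p_fake: float, risk: Risk, margin: float = 0.15) -> str:
    """Pick the detection path for an unmarked item.

    p_fake is the technical detector's estimated probability that the item is fake.
    """
    confident = p_fake <= margin or p_fake >= 1 - margin
    if risk is Risk.HIGH:
        return "expert_review"   # high potential harm: worth spending scarce expert time
    if not confident:
        return "crowd_review"    # detector is unsure: add crowd-sourced signals
    return "technical_only"      # clear-cut, lower-risk item: scalable default
```

With these settings, a post tagged `"election"` is always sent to experts, while an ordinary post with `p_fake = 0.97` is handled by the detector alone. Spending expert time only where harm is highest reflects the scarcity of experts noted earlier.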
The ultimate goal is not just detection but effective labeling. The multi-level strategy therefore includes a scoring system that combines technical detection results, trusted detection insights, and the downstream risk assessment. The resulting score guides the final classification of content, which yields four labels: “Deepfakes” (for positively marked content), “Verified” (for negatively marked content), and, for unmarked content, “Untrustworthy” or “Trustworthy” depending on the score. This transparent scoring and labeling mechanism aims to build greater trust in content moderation processes.
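One possible shape for such a scoring system is sketched below. The paper does not prescribe a formula here, so the linear weighting `w_trusted`, the per-risk `THRESHOLDS`, and the names `combined_score` and `label` are assumptions made purely for illustration. Markers from the first level decide the label outright; otherwise a weighted score is compared with a threshold that drops as downstream risk rises, so that high-risk content is labeled “Untrustworthy” more readily.

```python
from typing import Optional

# Hypothetical per-risk thresholds: the higher the downstream risk, the lower
# the combined score needed before content is labeled "Untrustworthy".
THRESHOLDS = {"low": 0.7, "medium": 0.5, "high": 0.3}


def combined_score(p_fake_technical: float, p_fake_trusted: Optional[float],
                   w_trusted: float = 0.6) -> float:
    """Weighted mix of detector output and human verdict (both in [0, 1], 1 = fake)."""
    if p_fake_trusted is None:  # no human review was requested for this item
        return p_fake_technical
    return (1 - w_trusted) * p_fake_technical + w_trusted * p_fake_trusted


def label(marker: Optional[str], p_fake_technical: float,
          p_fake_trusted: Optional[float], risk: str) -> str:
    """Map a level-1 marker, or else the level-2 score, to one of four labels."""
    if marker == "positive":
        return "Deepfakes"
    if marker == "negative":
        return "Verified"
    score = combined_score(p_fake_technical, p_fake_trusted)
    return "Untrustworthy" if score >= THRESHOLDS[risk] else "Trustworthy"
```

Under these assumed settings, an unmarked item with a detector score of 0.4 and no human review is labeled “Trustworthy” in a low-risk context but “Untrustworthy” in a high-risk one. This makes the role of downstream risk in the final label explicit and auditable.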
Challenges and Future Outlook
Promising as it is, the multi-level strategy faces several challenges. Misclassification, whether a false positive (real content labeled as a deepfake) or a false negative (a deepfake labeled as real), can erode trust, and the two errors carry different costs: false negatives pose risks to democracy, while false positives can infringe on freedom of speech. The paper suggests that the scoring system allows platforms to balance these trade-offs, but determining the optimal threshold requires further research into the costs associated with each type of misclassification.
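Once the costs of each error type are estimated, a standard way to turn them into a threshold is cost-sensitive threshold selection on a labeled validation set. This technique is not one the paper prescribes, and the function names and toy data below are hypothetical. The sketch tries every observed score as a cut-off and keeps the one with the lowest average cost.

```python
def expected_cost(scores: list[float], is_fake: list[bool], threshold: float,
                  cost_fp: float, cost_fn: float) -> float:
    """Average misclassification cost when items with score >= threshold are flagged."""
    fp = sum(1 for s, y in zip(scores, is_fake) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, is_fake) if s < threshold and y)
    return (cost_fp * fp + cost_fn * fn) / len(scores)


def best_threshold(scores: list[float], is_fake: list[bool],
                   cost_fp: float, cost_fn: float) -> float:
    """Threshold with the lowest expected cost on a labeled validation set."""
    # Every observed score is a candidate cut-off; inf means "flag nothing".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: expected_cost(scores, is_fake, t, cost_fp, cost_fn))


# Toy validation set (illustrative only).
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
is_fake = [False, False, True, True, True, False]
print(best_threshold(scores, is_fake, cost_fp=1, cost_fn=1))  # 0.8
print(best_threshold(scores, is_fake, cost_fp=1, cost_fn=5))  # 0.35
```

On this toy data, raising the cost of a false negative from 1 to 5 pulls the optimal threshold down from 0.8 to 0.35, so more content gets flagged. That is exactly the democracy-versus-free-speech trade-off described above, expressed as a single tunable number.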
Another challenge lies in the vague wording of current EU regulations regarding the scope of labeling obligations. The paper argues that multi-level labeling systems, including the verification of real content, might be necessary to truly achieve a “safe, predictable, and trusted online environment” as envisioned by the DSA, especially for high-risk content like political communication.
Finally, effective enforcement of these transparency obligations requires proactive measures from authorities, such as random sample tests and collaboration with online platforms to refine and test different configurations of detection methods and scoring systems. The researchers conclude that their multi-level strategy is a crucial building block in making democratic societies more resilient to the risks posed by deepfakes, and they emphasize the ongoing need for interdisciplinary collaboration and improved media literacy among users.
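A random sample test by an authority might look like the sketch below. This is only an illustration under stated assumptions: the hypothetical `audit` function draws a seeded random sample of a platform's labeled items, compares each label with the auditor's own judgement (`auditor_says_fake` is a placeholder for expert review), and reports the observed error rate with a Wilson score interval, a standard confidence interval for a binomial proportion.

```python
import math
import random
from typing import Callable


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def audit(labels: dict[str, str], auditor_says_fake: Callable[[str], bool],
          k: int, seed: int = 0) -> tuple[float, tuple[float, float]]:
    """Draw k labeled items at random and compare platform labels with an audit.

    labels maps item ids to the platform's "Untrustworthy"/"Trustworthy" label;
    auditor_says_fake is the authority's own judgement for an item id.
    """
    rng = random.Random(seed)  # seeded so the audit can be reproduced
    sample = rng.sample(sorted(labels), k)
    errors = sum((labels[i] == "Untrustworthy") != auditor_says_fake(i) for i in sample)
    return errors / k, wilson_interval(errors, k)
```

Reporting an interval rather than a bare error rate matters for enforcement: even an audit that finds zero errors in 30 items still leaves a plausible error rate of roughly 11%, which tells the authority how large a sample it needs before drawing conclusions.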


