TLDR: The Atlantic has released an online tool that lets YouTube creators check whether their videos appear in AI training data sets. The tool is part of a broader Atlantic series examining the impact of AI on creators and intellectual property.
The Atlantic has launched an online tool that lets content creators search YouTube AI data sets. The tool, reported on September 20, 2025, aims to give creators transparency about whether their intellectual property has been used to train generative AI models.
Users can search for specific authors, YouTube channels, or screenwriters (with examples such as Zadie Smith, MrBeast, or Aaron Sorkin) to determine whether their work appears in these data sets. The tool grows out of The Atlantic’s ongoing AI series, which includes features such as ‘The AI Watchdog’ and ‘AI Is Coming for YouTube Creators,’ and which covers the growing overlap between AI development and the creative industries.
The Atlantic attaches several caveats to the results. Finding a creator’s work in a data set does not prove that AI companies trained on it, because some companies may selectively omit certain content. Conversely, a work’s absence from a particular data set does not mean it has not been used, because AI developers often draw on multiple data sets. Some data sets may also contain duplicate copies of the same work.
This release follows a similar effort by The Atlantic, which also developed a tool to check if creative works appear in LibGen, a large archive of pirated books, scientific papers, and articles. LibGen has reportedly been used to train various language models, including Meta’s Llama models, according to court documents. OpenAI, however, has stated that LibGen content is not included in the current versions of ChatGPT or its API.
The YouTube data set search tool reflects growing scrutiny of how AI models are trained and where their training data comes from, and it gives creators a way to learn more about where their work ends up.


