TLDR: A group of prominent authors has filed a lawsuit against Microsoft in New York federal court, alleging the tech giant used nearly 200,000 pirated books to train its Megatron AI model. The suit claims copyright infringement and seeks statutory damages and an injunction, the latest in an escalating series of legal battles over intellectual property in AI development.
Microsoft is facing a significant legal challenge as a group of high-profile authors has filed a lawsuit accusing the company of illegally using their copyrighted works to train its artificial intelligence models. The complaint, lodged in a New York federal court, alleges that Microsoft leveraged a dataset of nearly 200,000 pirated digital books to develop its ‘Megatron’ AI system, a generative model designed to produce human-like text.
Among the plaintiffs are Pulitzer Prize winner Kai Bird, New Yorker writer Jia Tolentino, and authors Daniel Okrent and Victor LaValle. They contend that Microsoft’s Megatron AI model, which responds to text prompts, was built on unlicensed literary material and is capable of mimicking the syntax, voice, and themes of the original copyrighted works. The authors argue that their creative output was exploited without consent or compensation, enabling Microsoft to gain commercially at the expense of creators and rightsholders.
The lawsuit seeks substantial statutory damages of up to $150,000 for each infringed work, along with a court order to prevent Microsoft from continuing its alleged unauthorized use of their materials. While the exact number of infringed titles is still being determined, the plaintiffs assert that thousands of authors’ works may have been used without proper licensing or attribution.
Microsoft has not yet issued a public response to these allegations. An attorney representing the authors has also declined to comment on the ongoing litigation.
This case is the latest in a series of high-stakes legal actions targeting major technology firms over the use of protected content in training generative AI tools, and it follows recent rulings in similar lawsuits against Meta Platforms and Anthropic. Just days prior to the Microsoft filing, a federal judge in California ruled that Anthropic’s use of copyrighted works to train its AI could qualify as ‘fair use’, while holding that the company could still face liability for sourcing those works through piracy. Similarly, a US court ruled that Meta’s training of AI models on copyrighted books fell under the ‘fair use’ doctrine, though the specifics of data sourcing remain a point of contention in the broader legal landscape.
The lawsuit against Microsoft underscores the intensifying debate surrounding copyright law and AI development, particularly concerning what constitutes ‘fair game’ for training data. The outcome of this case could set a significant precedent for how courts address intellectual property rights in the rapidly evolving field of artificial intelligence.