AI and The Law of Copyright

USE BY AI OF PROTECTED WORKS

Regulators have been grappling with whether the use of copyrighted works to train AI models constitutes an infringement of copyright. Currently, most companies train their AI models by scanning large quantities of data made available on the internet. This has prompted several copyright owners to file lawsuits against major AI companies, alleging that their training practices infringe the owners’ exclusive rights over such works, particularly in the context of large language models (“LLMs”).

To evaluate whether such claims are tenable, we have analysed what constitutes infringement under the Copyright Act and whether AI companies can avail themselves of any exceptions to infringement under the law.

What constitutes infringement?

Under the Copyright Act, a person who, without the owner’s permission, uses a copyrighted work in any manner over which the copyright owner has exclusive rights (i.e., the rights conferred on copyright owners by Section 14 of the Copyright Act) infringes the copyright in that work.

In the context of the use of such data by LLMs, the rights of reproduction and adaptation of copyrighted works are the key determinants of infringement.

The Delhi High Court, in the ongoing case of ANI Media vs. OpenAI,[6] is evaluating whether the use of copyrighted works by an LLM for training its algorithm constitutes an infringement of these rights. At this stage, the court has not provided a judgment on the issue.

Two amici have submitted their views to the Court and we find both sets of submissions problematic:

Whether the storage of copyrighted works for training purposes constitutes reproduction:

The first amicus argued that AI training involves reproduction of copyrighted works because the process requires the works to be copied and stored on the company’s servers. This is consistent with the position under the U.S. Copyright Act, which recognises that a work that is copied and “fixed” in a particular medium in a sufficiently permanent or stable form is reproduced. Courts have held that if a copyrighted work is copied and stored in any medium for a sufficient period of time, it is considered fixed in that medium and the copying therefore constitutes reproduction of the work. Storage of copyrighted works for AI training would most likely not be considered merely transient in nature and would therefore meet this test.

The second amicus disagreed with this position and argued that a work can only be considered to have been reproduced when the process of copying the work has an ‘expressive element’. Because AI models store copyrighted material for training only to extract necessary information, such as patterns, trends, and correlations in phrases and sentences, and the process does not involve any publication of the copyrighted work, the amicus did not consider AI training practices to constitute infringement of copyright.

There is much in traditional copyright jurisprudence that, in our view, ought to have led to a more nuanced approach to this issue. 

Whether the use of copyrighted works for training purposes constitutes adaptation:

To evaluate whether OpenAI’s training process constituted an infringement of the right of adaptation, the first amicus analysed the manner in which LLMs train their algorithms. The amicus stated that the use of copyrighted works by AI models does not involve a reading of the text itself but rather a tokenisation of the data based on the algorithm’s parameters and the conversion of the tokens into “vectors”, which are used to train the model. The amicus also cited a decision of the Delhi High Court which held that the right of adaptation necessarily involves the use of the original work and the ability to make modifications to the work.[7]

According to the amicus, the creation of a vector is akin to generating a representation that captures the associations and contextual relationship of the original work. Since the process did not involve any rearrangement or alteration of the original work itself, the amicus was of the opinion that it ought not to be considered an adaptation under the Copyright Act. The second amicus agreed with this finding. 

Exceptions and Defences to Infringement

Section 52 of the Copyright Act allows a person to use protected works without the copyright owner’s permission where the use amounts to “fair dealing”. Fair dealing is a limited exemption and covers only the use of copyrighted works for private or personal use (including research), for criticism or review of the work, or for reporting news and current affairs.

In the U.S., a similar exemption is provided under Section 107 of the U.S. Copyright Act (often referred to as the fair use doctrine). A determination of “fair use” depends on the following factors:

  • the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  • the nature of the copyrighted work;
  • the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  • the effect of the use upon the potential market for or value of the copyrighted work.

AI developers in the U.S. have sought to argue that the fair use doctrine should apply to the training of AI models. They contend that the use of copyrighted works is ultimately ‘transformative’ because the training process results in the creation of new content unrelated to the training data. In Bartz vs. Anthropic, a district court in California agreed with these arguments and held that training an LLM with copyrighted material may be protected by fair use if the use of the copyrighted works is “exceedingly transformative” and the original material is not acquired in a way that infringes the exclusive rights granted to copyright holders under the U.S. Copyright Act.[8] The court also observed that the training process used by AI models is not unlike that of a child learning from reading books, and that training cannot be held to infringe copyright solely because the model is developed using protected works. However, in a recent ruling in Kadrey vs. Meta Platforms,[9] a different district judge in California disagreed with this observation, stating that AI models can scan vast amounts of data and that the training process cannot be compared to that of a child learning new material. Though the judge ruled in favour of Meta Platforms and held that the AI model did not infringe U.S. copyright law, the ruling specifically stated that this was because the plaintiffs had failed to adduce sufficient evidence to support their claims.

Recently, the U.S. Copyright Office (“USCO”) also issued a non-binding report on whether the unauthorised use of copyrighted materials to train AI systems is defensible as fair use.[10] While the report provides arguments both for and against applying the defence of fair use to training AI models, it suggests that AI developers would not be able to rely on the defence where AI-generated outputs are substantially similar to the training data inputs.

In the European Union (“EU”), AI developers have sought to rely on the “text and data mining” exceptions provided under the Copyright Directive 2019/790/EU for training their AI models. The exceptions allow companies to carry out data mining of copyrighted works that are (a) lawfully accessible, provided the copyright holder has not restricted third parties from using the data, or (b) used only for scientific research purposes. The Hamburg Regional Court in Germany recently held that a non-profit AI company can rely on the text and data mining exemption to train its AI model as long as the purpose for doing so is in the interest of scientific development.[11] Though the case did not involve the use of copyrighted works for commercial purposes, it indicates that AI companies can rely on the text and data mining exception for training their models. Recital 105 of the EU AI Act also specifies that companies may train their general-purpose AI models with the use of copyrighted works where exceptions under the Copyright Directive 2019/790/EU apply.

Similar to the EU’s text and data mining exception, the UK’s Intellectual Property Office released a consultation paper in December 2024,[12] which sought to introduce a provision within the UK Copyright, Designs and Patents Act, 1988 that would allow AI companies to mine data in copyrighted works for commercial purposes provided that (a) copyright owners have the right to opt out of their works being used to train AI models and (b) AI companies disclose all training data sources so that copyright owners have visibility over whether their content was used. The Intellectual Property Office’s rationale for proposing the exception was to reduce ambiguity with respect to copyright infringement in the context of AI training processes and to give copyright owners the right to opt out (and potentially claim remuneration from AI companies by licensing the use of their works). The consultation period closed in February 2025 and there have been no further developments to date.

At this stage, it is unlikely that companies in India can claim fair dealing exemptions with respect to their AI training processes.

 


[6] CS (COMM) 1028/2024.

[7] Rediff.com India Ltd. v. E-Eighteen.com Ltd., 2013 SCC OnLine Del 2747.

[8] Bartz v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal.).

[9] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal.).

[10] Copyright and Artificial Intelligence, Part 3: Generative AI Training, United States Copyright Office, May 2025.

[11] LAION v. Robert Kneschke, 310 O 227/23.

[12] Copyright and AI: Consultation, UK Intellectual Property Office, December 2024.

 
