Where the Output of a generative AI system is the same or substantially similar to a third party’s copyright work (for example, in the case of software code, an image or text):

  • Has the copyright work been copied?
  • If so:
    • Is that an infringement?
    • Which of the AI actors has undertaken the infringement?

The question of how the Output was produced and what role (if any) the original copyright work played in the process may not be a straightforward one to answer. Questions include:

  • Was the AI system trained using the copyright work, or did the system otherwise have access to the work (for example, through direct internet access)?
  • What role did the training (or access) play in the production of the Output?

At the time of publication, none of the main Developers of generative AI systems have disclosed precisely how their systems operate at a detailed technical level, so these systems are currently, to a large extent, a ‘black box’ to Deployers and Users. It is also unclear to what extent the Developer of an AI system would be able to identify exactly how the system generated the Output, and from what training sources.


How Output is generated: A text-based example

Despite the black box problem, the general principles as to how content is produced can be illustrated by taking the example of an AI system that is text-based:

  • During training, the AI system analyzes Input to estimate the probability of words following other words in a specific context.
  • The system then uses this probability (determined during training), and the context that the Prompt provides, to produce the Output.
  • The Output is created by selecting a string of words with the highest probability of occurring one after the other in the context of the Prompt.
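
The three steps above can be illustrated with a toy model. The following is a minimal sketch only, assuming simple word-pair (bigram) counts in place of the neural networks that commercial systems use. The function names `train` and `generate` and the sample sentences are hypothetical and are not taken from any Developer's system.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Training step: count how often each word follows another word."""
    counts = defaultdict(Counter)
    for text in corpus:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_words=10):
    """Extend the Prompt by repeatedly appending the most frequent follower."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training Input
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat down",
]
counts = train(corpus)
output = generate(counts, "the", max_words=4)
print(output)
```

With these three training sentences, the Prompt "the" is extended to "the cat sat on the", because each added word is the most frequent follower of the word before it.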

Where part of the Output is similar to part of the copyright work, this might be evidence that the copyright work was used as Input; or the similarity might be due to the replicated part of a copyright work following the same pattern as the one determined by the probabilities identified by the generative AI system.

The question of infringement revolves around this weighing exercise, and will depend on how much has been reproduced and how unique that part was.


Although generative AI systems are trained on vast sources of data, the narrower the Prompt is, the more likely the system is to reproduce an exact copy of the underlying source.


Replicating Inputs

Where generative AI has to rely on a single data source in a particularly niche topic (for example, where the Prompt is to write software code to address a particular problem where only one example of such code exists in its training set), the most probable answer to the Prompt will be the replication of that Input data.

Examples of this can be seen in the current public deployment of ChatGPT. If asked to give the lyrics to a song about a paranoid android, the system will create what appears to be a new work (or at least a work that differs each time it is asked the same question), whereas if asked to give the lyrics to the song “Paranoid Android” by the band Radiohead, the system will deliver verbatim the lyrics to that particular song.
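
The replication effect can be sketched in the same toy setting. This is an illustration under stated assumptions, not a description of how ChatGPT works internally: it assumes a model that predicts the next word from the two preceding words, and the names `train`, `complete` and the sample texts are hypothetical.

```python
from collections import Counter, defaultdict


def train(corpus, n=2):
    """Map each n-word context to counts of the words that followed it."""
    counts = defaultdict(Counter)
    for text in corpus:
        words = text.split()
        for i in range(len(words) - n):
            counts[tuple(words[i:i + n])][words[i + n]] += 1
    return counts


def complete(counts, prompt, n=2, max_words=20):
    """Greedily append the most frequent follower of the last n words."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(tuple(words[-n:]))
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# A niche topic covered by exactly one (hypothetical) source in the training set
niche_source = "rotate the flange twice before sealing the valve housing"
corpus = [
    "sort the list then print it",
    "sort the list then return it",
    "sort the list and print it",
    niche_source,
]
counts = train(corpus)

# A common context has several candidate continuations...
print(len(counts[("list", "then")]))
# ...whereas a niche context has only one, so the source is replayed verbatim.
print(len(counts[("the", "flange")]))
print(complete(counts, "rotate the") == niche_source)
```

Because every context in the niche source has a single recorded continuation, the most probable completion of the narrow Prompt is the source itself.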


Does the Output of a generative AI system containing a part or all of a publicly accessible copyright work infringe copyright?

The evidentiary standard and the precise test that must be satisfied to determine whether the Output infringes copyright vary from jurisdiction to jurisdiction.

What evidence is required for the Output of a generative AI system containing a part or all of a publicly accessible copyright work to be considered copyright infringement?


Australia

Primary infringement: Under Australian law, direct or primary infringement requires copying of a ‘substantial part’ of a work, objective similarity to the infringed work, and a causal connection between the infringing subject matter and the work.1

The owner of the allegedly infringed work must prove that the infringer has copied their work.2 Case law has explained that this means the owner must establish that the alleged infringer had access to the infringed work and copied it, ‘directly, indirectly, consciously or unconsciously.’3

Secondary infringement: Secondary infringement, or authorizing infringement, allows an owner to commence proceedings against a person or entity that authorizes infringement.4 Infringement through authorization is particularly relevant to employers and businesses hosting, ‘making available,’ commercializing or profiting from infringing material.5


Canada

Under Canadian law, copyright infringement occurs when a person reproduces all or a substantial part of an original work without the copyright owner’s authorization or the availability of a statutory exception.

In considering what constitutes a ‘substantial part’ of a work, the Supreme Court of Canada noted that it is a ‘flexible notion,’ which is ‘a matter of fact and degree.’6

The Copyright Act protects authors against both literal and non-literal copying,7 as long as the copied material forms a substantial part of the infringed work. The substantiality analysis should be qualitative and holistic, and a piecemeal approach should be avoided. More specifically, the analysis should focus on whether the copied features constitute a substantial part of the plaintiff's work, and not whether they amount to a substantial part of the defendant's work.8

In Canada, there are two types of copyright infringement:

  1. Direct/Primary Infringement
    This type of infringement includes doing anything that only the owner of the copyright has the exclusive right to do (for example, copying a work or communicating to the public a sound recording without the copyright owner’s consent). Direct infringement can occur even if the infringer does not realize that copying the work infringes someone else’s copyright.
  2. Indirect/Secondary Infringement
    This type of infringement takes place when someone knows (or should have known) that a work or content infringes someone else’s copyright and sells, rents out, distributes or imports the work or content without the owner’s consent.

To establish a claim for copyright infringement, the owner of the copyright must be able to prove the following:

  • Copyright subsists in the work or content in question.
  • They are the owner of the copyright in the work or content.
  • The work or content (or elements of it) was infringed.

Copyright only protects the original expressions of ideas, and does not protect ‘ideas’ themselves. This is relevant to software, as functional ‘ideas’ can be expressed in different ways. A third party may use the ‘idea’ provided by the functionality of the software without ‘copying’ the code of the software.

The following should be borne in mind:

  • It is an open question whether it is an infringement of copyright to train the models of an AI system using training data when the owner of the training data has not authorized such use. For example, it is not yet clear whether the training process involves a reproduction of data sufficient to amount to infringement.
  • It is unclear if use of a dataset by an AI tool to generate new Output is an infringement of copyright in the dataset if the Output is similar to a data source and not authorized by the owner of the data source.
  • It is also unclear if an exception applies.

Accordingly, there are infringement risks relating to using copyright works as Input for the system (for example, by training the system) and relating to generating Output by the system that is similar to other copyright works.


China

According to judicial practice, copyright infringement is mainly decided on the basis of ‘access’ plus ‘substantial similarity.’

In Shanghai Xinchuang Culture Development Co. v. AI Co. (pseudonym),9 a first-instance judgment involving AI-generated content (AIGC) issued in February 2024, the Guangzhou Internet Court:

  • Took the view that the image Outputs that the AI drawing tool produced in response to simple commands containing a keyword were substantially similar to the plaintiff's copyright works of art.
  • Found that the defendant, which provided the tool to its members, did not exercise a reasonable duty of care to respect intellectual property rights.
  • Held that copyright infringement was established, applying the conventional method for determining infringement of copyright in works of art.


France

Counterfeiting is defined as ‘any reproduction, representation or distribution, by whatever means, of a work of the mind in violation of the author’s rights’ (Art. L335-3 of the French Intellectual Property Code). This definition is supplemented by Article L335-2 of the Intellectual Property Code, which provides that ‘any publication of writings, musical compositions, drawings, paintings or any other printed production, engraved in whole or in part, in defiance of the laws and regulations relating to the ownership of authors, is an infringement.’

The good faith or bad faith of the counterfeiter is not an element of copyright infringement. Thus, the counterfeiter will not be able to exonerate themselves by proving their good faith.

The infringement is characterized by the mere reproduction of the characteristic elements of the previous work and would be evaluated in light of the resemblance with that work.

Therefore, the use of part or all of a publicly accessible copyright work could constitute an act of counterfeiting unless such use (including reproduction, representation or distribution, by whatever means) is made with the prior consent of the copyright owner or without commercial intent (for private use).


Germany

In cases where the Output incorporates works (or protected parts of works) of third parties, it may infringe the copyright in the incorporated work. Whether the Output infringes the third party’s copyright depends on whether the Output is a reproduction of the work (Section 16 of the German Copyright Act) or free use and therefore lawful without the author’s consent (Section 23 (1), sentence 1 of the German Copyright Act).

According to the relevant case law of the Court of Justice of the European Union,10 the criterion is whether the third party’s work is still recognizable in the Output in its unique qualities (reproduction) or not (free use). This must be determined on a case-by-case basis. It is undisputed, however, that a 1:1 adoption of a work may be permissible if the adoption constitutes such a small part of the Output that the adopted work is completely absorbed into the Output and is no longer recognizable to a viewer in the new context.

What is disputed is whether the free use exemption applies if the Output itself is not copyrightable according to the criteria laid out elsewhere in this guide (see ‘Is the output of the generative AI system protected by intellectual property rights?’). Section 23 (1), sentence 2 of the German Copyright Act refers to ‘the newly created work.’ Therefore, according to prevailing opinion, the Output itself has to be copyrightable (and therefore a ‘work’ within the meaning of the German Copyright Act) in order to rely on the free use exception (otherwise the copyright of the third party who created the adapted work is infringed).

On the issue of evidence, to prove that the Output infringes copyright, the author must generally be able to provide all the facts necessary to prove that its work has not only been used to create the Output, but that it is still recognizable in the Output to the extent that it constitutes a reproduction of the work in question.

Hong Kong

A claimant will have to prove both that: (i) there is copying of the copyright work; and (ii) the allegedly infringing work and the original work are substantially similar.

The difficulty lies in proving the act of copying while the training dataset is unknown. Nevertheless, certain inferences of copying can be drawn from:

  • Whether the Provider, Deployer or User had access to the original work.
  • The degree of similarity between the works.

Independent creation, if successfully established, would be a valid defense to infringement.

The Netherlands

Copyright infringement requires the duplication (reproduction/copying) or disclosure (unauthorized distribution) of (elements of) a copyright work under Article 5 of the Dutch Copyright Act, unless such duplication or disclosure is done with the consent of the copyright owner or without commercial intent (for private use).

In cases where the Output incorporates works (or protected parts of works) of third parties, it may infringe the copyright in the incorporated work. Whether the Output infringes the third party's copyright depends on whether the Output qualifies as a reproduction of the work.

According to the relevant case law of the Court of Justice of the European Union,11 the criterion is whether the third party’s work is still recognizable in the Output in its unique qualities (reproduction) or not. This must be determined on a case-by-case basis.

Under Dutch law, reproduction of a copyright-protected work only occurs if copyright-protected elements are derived from an (individually determined, specific) work. A mere coincidental resemblance does not constitute an infringement of copyright. However, a reversed burden of proof applies here: a person who defends against an infringement claim by arguing that the similarity of its work (in this case, the Output) to the earlier work is purely coincidental and not borrowed will have to prove that there was no borrowing, not even unconscious borrowing. It is currently uncertain whether a User of an AI system can successfully substantiate that the reproduction of the work in the Output was ‘purely coincidental and not borrowed.’


Establishing copyright infringement requires the claimant to show that the defendant copied the claimant’s work. A rebuttable presumption of copying will arise if the claimant can show that: (i) the defendant had prior access to the claimant’s work; and (ii) there is a substantial similarity between these works.

The burden then shifts to the defendant to rebut the inference of copying (for example, by giving alternative explanations for the similarities).

There may be practical difficulties in proving that a Deployer or User of a generative AI system had access to a copyright work when the Deployer or User was not aware of how the system was trained.

South Africa

Copyright infringement occurs not only when misusing or misappropriating the whole of a work, but also when a substantial part of the work has been misused or misappropriated.

However, actual copying is required. Therefore it would be necessary to prove that the Input contained the original work and that the Output contains a substantial reproduction or adaptation of that original work.


English law requires evidence of actual copying, for example, showing that the alleged infringer had access to, and actually copied, the copyright work in whole or substantial part.

The difficulty is in showing that the Deployer or User of a generative AI system had access to a copyright work where the Deployer or User does not have access to all of the training data upon which the system was trained.

The Deployer or User may have referred to publicly available works and might not realize that a copyright work was used at all.

The independent creation of a work, regardless of similarity, does not amount to copyright infringement under English law.


US law requires copying for a finding of copyright infringement. Thus, the AI system must have had access to a copyright work and the Output of the AI system must be “substantially similar” to the copyright work. Access typically occurs when the AI system ingests the work.


The risk that use of any particular Output might give rise to a third-party infringement claim is clearly greatest where the Output is used in a public manner. The risk of third-party claims can be mitigated by measures that could include:

  • For Providers (and potentially for Deployers in enterprise deployments): Fine-tuning the system to train production of Outputs from limited data sources, to minimize the risk of verbatim copies being delivered as Outputs.
  • In respect of Output that is software code: Use of proprietary systems and/or vendors to scan code to detect if (publicly disclosed) third-party proprietary code and/or elements from open-source software (OSS) code are present. This can provide an element of comfort, particularly with Output to be used for commercial purposes. A Deployer could then take requisite steps to mitigate risks following the results of such scanning – for example, by seeking licenses or removing the code from its Output.
  • In respect of other Output content: Use of plagiarism-checking software tools to identify if there are any elements of its Output content that replicate third-party content.
  • General training of personnel on avoiding practices that might tend to increase risk – for example, asking the system to copy or replicate a competitor’s marketing literature or product description.
  • For Deployers, seeking contractual protection (for example, through an indemnity) from the Provider in relation to third-party infringement claims brought against the Deployer. Providers will be reluctant to offer this and a challenge will be drawing a line between the Provider’s responsibility for the infringement and the Deployer’s responsibility.
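
The scanning and plagiarism-checking measures listed above generally rest on comparing the Output with known third-party material. The following is a minimal sketch of one such comparison, assuming overlapping five-word sequences as the unit of comparison. The function names `ngrams` and `overlap_ratio` and the sample texts are hypothetical; commercial tools are considerably more sophisticated, and a high score indicates textual overlap only, not infringement in any jurisdiction.

```python
def ngrams(text, n=5):
    """Return the set of n-word sequences that occur in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output, reference, n=5):
    """Fraction of the Output's n-word sequences also found in the reference."""
    out = ngrams(output, n)
    if not out:
        return 0.0  # Output shorter than n words: nothing to compare
    return len(out & ngrams(reference, n)) / len(out)


# Hypothetical third-party text and two hypothetical Outputs
reference = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "she wrote that the quick brown fox jumps over the lazy dog yesterday"
unrelated = "completely different words appear in this sentence about something else"

print(overlap_ratio(copied, reference))     # part of the Output matches
print(overlap_ratio(unrelated, reference))  # no shared five-word sequences
```

A Deployer could use a score of this kind to flag Output for human review before taking the steps described above, such as seeking a license or removing the matching material.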

What are the consequences of the Output reproducing OSS code?

If the Output of the generative AI system includes OSS code, then additional considerations apply. In practice, this is most likely to arise where the system was trained on OSS code, but in theory the system could independently create code that is identical, or substantially similar, to OSS code.

The starting position will be the same as the general infringement position already outlined – unless a license applies, there is a risk that the copyright owner could bring a claim for copyright infringement. However, with OSS code there is obviously the opportunity for the Deployer/User to take the position that the license terms (on which the OSS code was published to the world) apply.


Typical OSS license terms

OSS license terms typically contain certain conditions that must be fulfilled in order for the license to apply:

  • For so-called permissive licenses, these may be limited solely to a requirement that the copyright owner is acknowledged. This may be difficult to comply with if (as is likely to be the case) this information is not delivered as part of the Output, but has to be independently searched.
  • For so-called restrictive or ‘copyleft’ licenses, such as the GNU General Public License Version 3 (the GPL), the conditions for the license are more onerous: if the licensee incorporates the licensed OSS code into its own proprietary code (either verbatim or with modifications and/or translated into another language) and then distributes it, the licensee may be required to license that proprietary code on the same open-source basis and to make the source code available to others.

If the Deployer/User had not itself previously downloaded the OSS code in question in a manner in which it could be said to have already accepted the OSS license terms, then it is unlikely that the OSS license terms could be enforced by the copyright owner against the Deployer/User in a breach of contract claim.

Even if an agreement could be said to be in place, it is unclear whether the copyright owner could enforce that agreement to require compliance with the restrictive terms, as opposed to having the right to sue for infringement on the basis that the conditions of the license have not been met.


2   EMI Songs Australia v Larrikin Music Publishing (2011) 90 IPR 50; [2011] FCAFC 47.

3   Brennan v Foster Blake [2021] FedCFamC2G 261.

5   Pokemon Company International, Inc v Redbubble Ltd (2017) 351 ALR 676.

6   Cinar Corporation v. Robinson, 2013 SCC 73.

7   Section 3(1) of the Copyright Act.

8   Cinar Corporation v. Robinson, 2013 SCC 73.

9   Shanghai Xinchuang Culture Development Co. v. AI Co. (pseudonym), (2024) Guangdong 0192 Minchu No. 113.

10   Court of Justice of the European Union, Case ref. C-476/17 – Pelham, ECLI:EU:C:2019:624.

11   Court of Justice of the European Union, Case ref. C-476/17 – Pelham, ECLI:EU:C:2019:624.
