Open source software and components can provide useful building blocks that reduce development time and effort. Some of them are made available under open source licences with “copyleft” requirements, which are designed to ensure that downstream users can, among other things, make changes to the software, obtain and modify the source code of derivative works, and redistribute copies of the software or modified versions of it.
Incorporating open source software and components released under copyleft licences may trigger source code disclosure requirements for derivative works, meaning the underlying source code may need to be made freely available (e.g., by offering access through a publicly available network server, by providing the source code together with the object code, or through a written offer). Source code disclosure requirements can be problematic where a company wishes to keep the source code of derivative works (or portions thereof) confidential or proprietary.
When incorporating open source components into proprietary (and confidential) code or creating new works based on open source components, it is important to carefully consider the open source licence terms to avoid inadvertently triggering a source code disclosure requirement.
Open source components can include computer software, structures, and datasets.
Trained neural networks are a special type of computing structure. A “neural network” comprises computer-based representations of various mathematical functions. It is “trained” through a typically iterative process that tunes those representations by shifting the parameter weights (e.g., coefficients) in the neural network architecture to optimize for a particular outcome or outcomes.
As an example, a neural network can be “trained” over a period of time to distinguish between cat and non-cat images by tweaking the parameter weights up and down in response to a training dataset of cat and non-cat images. After training, the trained neural network has specific parameter weights that represent the neural network’s “understanding” of “a cat” and “not a cat.” The trained neural network can be deployed to receive new unknown images as input, and generate output of a classification indicative of how confident the trained neural network is that a new unknown image is an image of a cat.
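The weight-tuning idea can be illustrated with a minimal sketch. It assumes a single artificial neuron (logistic regression) and two made-up numeric features per image; real image classifiers use far larger networks and raw pixel data. The feature names, learning rate, and epoch count are illustrative assumptions, not taken from any particular system.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash a raw score into a confidence between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Return the model's confidence that the input is a cat."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, labels, epochs=500, lr=0.5, seed=0):
    """Nudge the weights up or down after each example (gradient descent)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Hypothetical features: (ear pointiness, whisker density), each in [0, 1].
samples = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labels = [1, 1, 0, 0]  # 1 = cat, 0 = not a cat

weights, bias = train(samples, labels)
cat_confidence = predict(weights, bias, (0.85, 0.85))
non_cat_confidence = predict(weights, bias, (0.1, 0.1))
```

After training, `weights` and `bias` are the only things that distinguish this model from an untrained one, which is why the parameter values themselves carry the value discussed in this article.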
Trained neural networks can be particularly useful in high-complexity applications, such as machine translation, audio transcription, document classification, and weather modelling, where the underlying relationships and dependencies are unknown or very complex. How the neural network is trained can significantly affect its accuracy and performance. Once trained, the neural network can be coupled to a processing engine that supplies new inputs and generates outputs using the trained neural network.
It can be very expensive and time-consuming to train a neural network. Intricate approaches may be required (e.g., specific training steps or training techniques), and expensive or painstakingly collated datasets may be needed to train the neural network accurately. For example, the training set could include raw data that had to be manually labelled and classified, or non-public data that had to be purchased or collated from third parties.
The trained neural network itself can be a valuable component of a particular software project, and it may be desirable to keep it confidential.
When using open source components to develop trained neural networks, care needs to be taken to avoid inadvertently triggering open source disclosure obligations if the trained neural network itself is to be kept proprietary and confidential.
For example, other software that is used with or accompanies the trained neural network may embed open source components whose copyleft requirements could attach to the trained neural network.
Given the complexity of development processes and system architectures, it is unclear whether the scope of open source licences covers trained neural networks. For example, it is unclear whether a trained neural network and its specific parameters would be considered a “modified work” or “derivative work” requiring source code disclosure, or merely “data” that may fall outside the scope of the licence disclosure requirements. A company should therefore document and track the development of any trained neural network that involves open source components.
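One way to keep such a record is sketched below, under stated assumptions: the field names, the model name, the component entry, and the placeholder weight bytes are all hypothetical, and a real record would hash the actual weights file and list the components actually used.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class TrainingRecord:
    """Hypothetical provenance entry for one trained model."""
    model_name: str
    weights_sha256: str          # fingerprint of the exact weights produced
    components: list             # open source components involved in training
    training_data_note: str      # where the training data came from
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


record = TrainingRecord(
    model_name="cat-classifier",                          # illustrative name
    weights_sha256=sha256_of(b"placeholder weights bytes"),
    components=[
        {"name": "example-trainer", "version": "1.0", "licence": "GPL-3.0-only"},
    ],
    training_data_note="internally labelled images",
)

record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Storing one such entry per training run gives a dated trail of which components and licences were involved when a given set of weights was produced.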
Whether open source obligations extend to a trained neural network is an open question. It is currently disputed in ongoing litigation concerning disclosure requirements for a neural network that supports a chess engine. Until there is further clarity, the following steps may help support a future determination of whether open source obligations extend to the trained neural network.
- Segregate the source code relating to the operation of the trained neural network (for example, into a separate library) from other components of the system, such as the processing engine that receives the inputs and provides the outputs for downstream usage.
- Avoid compiling a single combined binary that contains both the trained neural network and the open source components. Instead, use separately compiled binaries that interact with one another.
- If possible, store the underlying weight parameters of the trained neural network in a separate file that the processing engine references, so that the processing engine containing the open source components can still function with a different trained neural network if needed.
- If possible, maintain the trained neural network on separate infrastructure (e.g., as a cloud-hosted model or under a SaaS model) so that customers do not run or store the trained neural network locally on their own devices.
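The separate-weights-file step above can be sketched as follows. This is a minimal illustration of the technical separation only, assuming a JSON weights file and a hypothetical `ProcessingEngine` class; it does not by itself answer the licensing question discussed above.

```python
import json
import tempfile
from pathlib import Path


def load_weights(path):
    """Read weights and bias from a standalone JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["weights"], data["bias"]


class ProcessingEngine:
    """Hypothetical engine: contains no weights, only a path to them."""

    def __init__(self, weights_path):
        self.weights, self.bias = load_weights(weights_path)

    def score(self, features):
        """Apply whichever model was loaded to the given input."""
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


with tempfile.TemporaryDirectory() as tmp:
    # Two interchangeable models, each stored in its own file.
    model_a = Path(tmp) / "model_a.json"
    model_b = Path(tmp) / "model_b.json"
    model_a.write_text(json.dumps({"weights": [1.0, 2.0], "bias": 0.5}), encoding="utf-8")
    model_b.write_text(json.dumps({"weights": [0.0, -1.0], "bias": 0.0}), encoding="utf-8")

    # The same engine code runs unchanged against either file.
    score_a = ProcessingEngine(model_a).score([1.0, 1.0])
    score_b = ProcessingEngine(model_b).score([1.0, 1.0])
```

Because the engine takes the weights path as a parameter, swapping in a different trained neural network requires no change to, or recompilation of, the engine code.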
Accordingly, it would be prudent to carefully consider open source licence terms when contemplating using open source components for developing a trained neural network. There are many different open source licences with varying permissions, obligations and limitations, making it important to identify all contemplated open source components and their corresponding licences. Otherwise, a company may inadvertently trigger a source code disclosure requirement for a valuable, confidential trained neural network.
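As a starting point for identifying components and their licences in a Python project, the sketch below lists installed packages with their self-declared licence metadata, using the standard library's `importlib.metadata`. The keyword pattern used to flag copyleft-family licences is a crude, illustrative assumption: declared metadata can be missing or wrong, and the output is no substitute for reading the licence texts.

```python
import re
from importlib import metadata

# Illustrative, non-exhaustive hints for copyleft-family licences.
COPYLEFT_PATTERN = re.compile(
    r"\b(AGPL|LGPL|GPL|MPL|EPL|CDDL)|GENERAL PUBLIC LICEN[SC]E",
    re.IGNORECASE,
)


def declared_licence(dist) -> str:
    """Collect whatever licence information the package declares."""
    md = dist.metadata
    parts = [md.get("License-Expression") or "", md.get("License") or ""]
    parts += [
        c for c in (md.get_all("Classifier") or []) if c.startswith("License ::")
    ]
    return " | ".join(p for p in parts if p)


def looks_copyleft(licence_text: str) -> bool:
    """Heuristic flag only; a match means 'review this', not a legal conclusion."""
    return bool(COPYLEFT_PATTERN.search(licence_text))


def inventory():
    """Return (name, declared licence, needs_review) for each installed package."""
    rows = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "(unknown)"
        text = declared_licence(dist)
        rows.append((name, text or "(not declared)", looks_copyleft(text)))
    return sorted(rows, key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, licence, needs_review in inventory():
        flag = "REVIEW" if needs_review else "      "
        print(f"{flag}  {name}: {licence[:80]}")
```

Packages with no declared licence are worth reviewing manually as well, since the absence of metadata says nothing about the terms that actually apply.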