
Navigating Licensing in Open Source LLMs

Jun 5, 2024


Over the past few months, we have been fine-tuning existing large language models on our own datasets using our optimization techniques. But how confident can we be that we are using these models legally in personal and professional projects? From using a source model to dynamically training on user-collected data, licensing is the topic of the hour.

In this blog, we at Tune.ai explain how developers can use large language models legally and avoid intellectual property that could land them in trouble.

The Promise and Peril of Open Source LLMs

Open-source communities have existed for decades. Put simply, open source is code that its developers make publicly available for the community to experiment with, use, and contribute to. The code itself is protected by licenses, which vary in form and length and dictate the extent to which you can use the project.

Open-source LLMs let the community understand, at the code level, what happens inside systems that remain black boxes when they are offered as proprietary large language models. A deeper understanding of the role data plays in these models lets the community hold them accountable for biases and develop the technology as a joint effort.

Try Open Source Large Language Models on Tune Chat!

However, open source comes with perils of its own, chiefly around reliability. Proprietary vendors control their product and can keep its terms stable. That may not hold for products built on open-source large language models: an unexpected change to the project's open-source license can quickly leave such products unable to keep pace with their paid counterparts, in some instances costing thousands of jobs.

Licensing 101

Licenses for large language models provide the legal framework that governs the extent to which an organization, individual, or group of enthusiasts can modify, use, distribute, or share part or all of a model. These licenses fall into two major families: permissive and copyleft.

Permissive Licensing

Permissive licenses, as the name suggests, offer a great deal of flexibility in how the licensed product can be used. They are known for their lenient terms and allow users to do almost anything with the software, including modifying it, distributing it, and using it for commercial purposes.

Some examples of Permissive Licenses are:

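  • BSD License: The Berkeley Software Distribution licenses are a family of open-source licenses that differ in the number of clauses they carry (for example, the 2-clause and 3-clause variants). They impose minimal conditions on reuse, mainly that the copyright notice and license text are retained.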

  • MIT License: This license permits a wide range of uses, including free use, modification, and sale of the software, while shielding the original author from any liability associated with the licensed product.

  • Apache License: This license is aimed at software that enterprises need to adopt immediately. It does not require them to disclose their code modifications, and it includes an explicit patent grant from contributors, which makes it a good fit for products that enterprises might be interested in using.
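The differences between these three licenses can be written down as a small lookup table. The following is a minimal sketch, not a legal tool: the class name `PermissiveLicense`, its fields, and the helper `redistribution_checklist` are hypothetical names introduced for illustration, and the obligations are simplified summaries of the BSD 3-Clause, MIT, and Apache 2.0 terms. The keys are SPDX short identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PermissiveLicense:
    """Simplified summary of a permissive license (illustrative only)."""

    spdx_id: str                 # SPDX short identifier
    keep_notice: bool            # redistributions must retain the copyright/license notice
    state_changes: bool          # modified files must carry a notice that they were changed
    explicit_patent_grant: bool  # license text contains an express patent grant


# Hypothetical lookup table; the flags are simplified readings of each license.
PERMISSIVE = {
    "BSD-3-Clause": PermissiveLicense("BSD-3-Clause", True, False, False),
    "MIT": PermissiveLicense("MIT", True, False, False),
    "Apache-2.0": PermissiveLicense("Apache-2.0", True, True, True),
}


def redistribution_checklist(spdx_id: str) -> list[str]:
    """Return the simplified steps to follow when redistributing the software."""
    lic = PERMISSIVE[spdx_id]
    steps = []
    if lic.keep_notice:
        steps.append("include the original copyright and license notice")
    if lic.state_changes:
        steps.append("mark modified files as changed")
    if lic.explicit_patent_grant:
        steps.append("note that contributors grant a patent license")
    return steps


if __name__ == "__main__":
    for name in PERMISSIVE:
        print(name, "->", redistribution_checklist(name))
```

None of the three entries requires you to publish your modifications, which is the property that sets permissive licenses apart.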

Some examples of successful Open Source Large Language Models using Permissive Licenses are:

Explore Open Source Large Language Models and their Licenses on LLM Explorer.

Copyleft License

Copyleft licenses are another family of open-source licenses that differ from permissive licenses in one major respect: the software and all its derivatives must remain accessible to all users. These licenses tie every derivative to the same license, i.e., derivatives must be offered under the same terms as the original.

In addition to the above conditions, the license requires that the source code (or the model weights, in the case of large language models) be made available and that all derivatives be distributed under the same conditions.

Some examples of Copyleft Licenses are:

  • GNU General Public License (GPL): This license is one of the stricter ones. It requires derivatives to be distributed under the GPL itself and does not allow them to be relicensed under different terms.

  • Creative Commons Attribution-ShareAlike (CC BY-SA): This is a flexible copyleft license that allows adaptation and redistribution as long as the license terms are carried forward and the original author is properly credited.
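The practical difference between the two families shows up when you pick a license for a derivative, such as a fine-tuned model. The sketch below captures that rule in its simplest form. It rests on loud assumptions: the function `allowed_derivative_license` and both sets are hypothetical, the identifiers are SPDX short identifiers chosen for illustration, and real compatibility rules are more nuanced than a strict "same license" check (license versions and documented one-way compatibilities are ignored here).

```python
# Hypothetical, simplified license families keyed by SPDX short identifier.
COPYLEFT = {"GPL-3.0-only", "CC-BY-SA-4.0"}
PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def allowed_derivative_license(upstream: str, proposed: str) -> bool:
    """Return True if a derivative of `upstream` may be released as `proposed`.

    Simplification: a copyleft upstream forces the same license on the
    derivative, while a permissive upstream accepts any license as long as
    the original notices are kept (notice handling is not modeled here).
    """
    if upstream in COPYLEFT:
        return proposed == upstream
    if upstream in PERMISSIVE:
        return True
    raise ValueError(f"unknown upstream license: {upstream}")


if __name__ == "__main__":
    print(allowed_derivative_license("Apache-2.0", "GPL-3.0-only"))   # permissive upstream
    print(allowed_derivative_license("GPL-3.0-only", "Apache-2.0"))   # copyleft upstream
```

Treat a check like this as a first filter only; the license text itself decides what is permitted.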

Some examples of successful Open Source Large Language Models using Copyleft Licenses are:

As licensing in open-source LLMs continues to evolve, the legal compliance requirements behind such development are still not fully understood. We invite you to stick around as we cover license compatibility, derivative works, attribution requirements, and a breakdown of well-known case studies involving intellectual property in GenAI.

Conclusion

In this blog, we have covered the main licenses used by open-source large language models, to raise awareness of how they can be used and to promote healthy development in the sector. Stay tuned as we dive deeper into the landscape of Open-Source Generative AI models in our next blog, discussing Multi-Model LLM Implementations.

Throughout the blog, we pointed to well-known LLMs that you can use right away in your hobby projects or enterprise applications. All of this can be done with a few clicks on Tune Studio, so head over there to start fine-tuning, deploying, and playing with LLMs.

Written by

Aryan Kargwal

Data Evangelist

Edited by

Abhishek Mishra

DevRel Engineer