When can I train AI with public content?

David Rosenthal, partner at VISCHER, lecturer at the Department of Humanities, Social and Political Sciences at ETH Zurich, lecturer at the Faculty of Law at the University of Basel

Thursday, 31 October 2024, 17.15

The training of AI models relies on data that improves the output of the respective models. For this reason, it is important to think about the origin of the data used right from the start. This includes knowing how it was collected, where it was obtained from, and which licence terms it is subject to. Data protection aspects should also be taken into account before training begins: Has the data been sufficiently anonymised, so that the results can no longer be traced back to individuals? Of course, you should also be aware of the legal conditions of the data set before feeding it into the AI model.

At the upcoming 17:15 Colloquium by the ETH Library, David Rosenthal will explain on which legal basis published content can be used for the training of AI models, especially large language models, and what needs to be considered. He will place a special focus on questions of copyright and licence agreements.

Presentation slides

Please note the following when consulting the presentation (PDF, 1 MB, in German) by Mr David Rosenthal:

The legal opinion expressed by D. Rosenthal in the slides does not necessarily reflect that of ETH Zurich.

The legal situation regarding artificial intelligence is currently unregulated and controversial. One particularly sensitive issue is the training of AI models with existing data, which can violate copyrights and personal rights on a large scale. Similar questions may arise when applying AI. There are various legal opinions on this subject; D. Rosenthal's is just one of them, and there are also contrary and differing opinions.

If you are involved in a specific research project at ETH Zurich in which AI is also trained with public data/data available on the internet, please consult the Legal Office in good time; if you are working with external partners, consult the Research Contracts Group.

ETH Zurich is responsible for such projects, which is why only its own legal and risk-related assessment may be decisive.

If you are employed at a different institution, contact the legal department of your institution.

Risks such as the unintentional violation of existing copyrights in data available on the internet must always be assessed on a case-by-case basis.

__________________

This text is a translation from German and is provided for information purposes only. In case of doubt, the German-language version is decisive.

David Rosenthal has broad experience in advising and representing national and multinational clients in the areas of data protection and other aspects of data law, technology law, AI, eDiscovery, technology arbitration and internal investigations. He studied law at the University of Basel and initially worked as a software developer, ran an independent press office in Basel and provided his own legal advice. In 2001, he joined Homburger, one of the high-end Swiss commercial law firms, as counsel, where he became co-head of its IT practice group.

On June 1, 2020, he became partner at VISCHER, one of the leading Swiss business law firms with particularly strong expertise in regulatory and TMT (Technology, Media and Telecommunications) matters. David has authored numerous publications on data protection law in Switzerland, is a frequent speaker at events, and lectures at ETH Zurich and the University of Basel. He is secretary of the Association for Corporate Data Protection (VUD) and the Cross-border eDiscovery Privacy & Investigations Association (CeDIV), and is on the board of the Swiss Forum of Communications Law (SF-FS).