OpenAI says it would be ‘impossible’ to develop AI tools such as ChatGPT without copyrighted material.
The ChatGPT developer OpenAI has said it would have been impossible to build tools such as its groundbreaking chatbot without access to copyrighted material, as pressure mounts on artificial intelligence companies over the content used to train their products.
Chatbots such as ChatGPT and image generators like Stable Diffusion are “trained” on a vast trove of data taken from the internet, with much of it covered by copyright – a legal protection against someone’s work being used without permission.
The New York Times has sued OpenAI and Microsoft, a major investor in OpenAI that uses its tools in its own products, accusing the two companies of unlawfully using the newspaper’s work to create their own.
In a submission to the House of Lords communications and digital select committee, OpenAI said it could not have trained GPT-4, the model that powers ChatGPT, without access to copyrighted material.
In the submission, first reported by the Telegraph, OpenAI argued that because copyright today covers a vast range of human creations – blogposts, photographs, forum posts, snippets of software code, and government documents – it would be impossible to train the most advanced AI models without using copyrighted materials.
The submission added that restricting training data to books and drawings whose copyright has expired would not produce effective AI systems: training only on public domain material created more than a century ago might make an interesting experiment, it said, but would fail to meet the needs of today’s users.
Responding to the New York Times lawsuit, filed last month, OpenAI said it respected the rights of content creators and owners. AI companies’ defence of using copyrighted material typically rests on the legal doctrine of “fair use”, which permits the use of content in certain circumstances without the owner’s permission. OpenAI maintains that, as a legal matter, copyright law does not prohibit training.
The New York Times’ case is one of several lawsuits facing OpenAI. In September, 17 authors, including John Grisham, Jodi Picoult, and George RR Martin, filed a complaint accusing the company of deliberate and widespread theft of their work.
Getty Images, which owns one of the world’s largest photo libraries, is suing Stability AI, the creator of Stable Diffusion, for copyright infringement in both the US and England and Wales. Separately, a group of music publishers including Universal Music is suing Anthropic, the Amazon-backed company behind the Claude chatbot, in the US for allegedly using the lyrics of numerous copyrighted songs to train its model.
On AI safety, OpenAI’s submission to the House of Lords said the company supported independent analysis of its security measures. It also endorsed “red-teaming”, in which external researchers test a product’s safety by simulating the behaviour of malicious actors.
At a global safety summit in the UK last year, OpenAI was among the companies that agreed to work with governments on safety testing their most powerful models before and after deployment.