News & Analysis

OpenAI Lures More Publishers to Its Side

This comes despite a legal dispute over whether it can use news publishers' content to train AI models

When the New York Times took on OpenAI late last year in a lawsuit over the use of the publication's data to train its AI models, opinion was split down the middle. The publishers felt that they deserved money for original content, while the so-called tech czars wanted them to think bigger.

Just so that our readers are aware, our team at CXOToday prefers thinking small in this context and believes that original content deserves a higher value than something created by a chatbot based on a bevy of queries.

Anyway, the point now is that OpenAI seems to be forging more deals with publishers along the same lines, specifically across Europe. Moreover, it is doing so in spite of the legal battle being fought with the NYT in a US court. Most recently, OpenAI announced deals with Le Monde and Prisa Media to bring French and Spanish news to ChatGPT.

OpenAI acquires news from two brands

OpenAI's blog post claims that the partnership would ensure that these companies’ current affairs coverage is shown to ChatGPT users where it makes sense, and that the coverage would also contribute to OpenAI’s training data.

This is what OpenAI said in the post: “Over the coming months, ChatGPT users will be able to interact with relevant news content from these publishers through select summaries with attribution and enhanced links to the original articles, giving users the ability to access additional information or related articles from their news sites … We are continually making improvements to ChatGPT and are supporting the essential role of the news industry in delivering real-time, authoritative information to users.”

In addition, the company revealed its licensing deals with a few other content providers. It has access to Shutterstock for images, videos and music training data, and has agreements with the Associated Press, Axel Springer (which owns Business Insider), Le Monde and Prisa Media.

No mention of payments, but is it enough?

There is no mention of the monetary aspects of the deal, but going by earlier media reports, the conjecture is that OpenAI could be paying between $1 million and $5 million a year for the archives that train GenAI models. Of course, this doesn’t help us estimate the payout to Shutterstock, though reports from The Information indicate that OpenAI could be paying out $4 million to $20 million a year for news.

At first glance, it might look like a decent sum of money, but set it against OpenAI’s war chest of $11 billion and annualized revenues of $2 billion, and the amount seems like peanuts. The challenge here is two-pronged: the first is that publications are themselves struggling to keep their heads above water, and the second is that OpenAI might just be staving off potential competitors from entering similar agreements later.

This ain’t about an OpenAI hegemony 

Once again, it would be erroneous to perceive this move from OpenAI as a response to what may happen if and when the likes of Google and Apple enter the GenAI race. The true question at this juncture is how these Big Tech giants can use their massive war chests to monopolise further AI developments by pushing away startup competition.

Of course, one cannot see any entry barriers at this point in time, but this move by OpenAI, and possibly a few more that could follow from fellow Big Tech giants, could challenge the status quo. For now, most AI vendors are ignoring IP holders and not opting to license the data they require to train AI models.

Companies are training on movie stills without having a deal with the owners of those images, and the same is the case with the music industry. Going forward, the question to ask would be whether licensing should just cover the cost of doing business and experimenting in the AI space, or go beyond that.

Maybe this is where an AI regulator would come into play. IP holders need protection, and so do AI vendors, many of whom are in the startup phase. There should be provisions to safeguard them from legal liability as long as transparency and ethical standards are maintained in the use of such content.