We need clarity about the deals between AI companies and news publishers. Here’s why

“What might be good for the parties involved isn’t necessarily beneficial to either industry, let alone society at large,” argues our researcher Felix Simon
OpenAI logo and AI Artificial Intelligence words are seen in this illustration taken May 4, 2023. REUTERS/Dado Ruvic/Illustration

10th October 2024

2024 has been deal-making time for AI companies and news publishers. The Financial Times, Condé Nast, the Texas Tribune, Der Spiegel – according to data from the Tow Center at Columbia University, at least 26 international publishers (depending on how you count) have struck licensing deals with (at times several) AI companies such as OpenAI, Microsoft and Perplexity since the start of the year. This does not count various agreements about training programmes or the joint development of tools, or deals struck between AI firms and other companies such as Reddit, Stack Overflow or the academic publisher Informa.

That we should eventually see such deals – and so many of them – is not surprising. Disruptive technologies and their developers have a habit of testing and pushing the boundaries of copyright. And so it is in the case of AI. 

At the heart of the tension between AI companies and publishers is the issue of whether the former can scrape and use copyrighted content without permission to train their systems. The companies' defence is that this constitutes fair use and is necessary for innovation and for improving AI capabilities. Publishers, meanwhile, argue that the practice violates their intellectual property rights. Historically, disputes between copyright holders and inventors have often ended with an accord being reached. Enter 2024, and the licensing deals between publishers and AI companies.

On the face of it, these deals are undoubtedly good for both sides involved, at least in the short term. 

AI companies get training data, which they need not only to train new iterations of their models but also to provide up-to-date responses to user queries. These agreements also lower their risk of being sued by even more publishers for copyright infringement and for using their data without permission.

For publishers, the same deals provide some welcome cash in the here and now and the promise of greater control over how their content will be used and presented. And there is, of course, the hope that a closer relationship will give them an edge over their competitors when it comes to the adoption of cutting-edge AI (whether that is actually happening is a different matter).

But what might be good for the parties involved isn’t necessarily beneficial to either industry, let alone society at large. And despite the growing number of agreements, the exact conditions of these deals are shrouded in mystery. Apart from what is contained in press releases and the odd statement or nugget of information in industry coverage, these deals might as well just contain the phrase ‘We agree that company A pays company B no less than X amount of money for their data, thank you very much’ and we wouldn’t know any better. 

Why the terms matter

Of course, it is well within the companies' rights to keep the details secret. That does not mean, however, that the secrecy is unproblematic, both for themselves and for others.

For one, it is difficult to independently assess whether the terms of these deals are fair: whether, for example, they accurately reflect not just the current value of the data but also its future “transfer value” – what the data is worth beyond the specific purpose for which it is used in the first instance. In the case of AI, this question looms large. How should one price the fact that a model trained with, say, a publisher's data not only gets better at factual accuracy or at providing news-like summaries, but also improves more generally across other tasks?

Of course, the companies will have tried to calculate this. But without outside scrutiny, it is hard to say whether these calculations are fair or useful – or whether one side is getting a raw deal. It would ultimately be in the interest of both sides if independent bodies could give their verdict. This would also help to set industry standards.

Not knowing how much training data is worth also makes it harder for smaller market participants to assess the terms of any deals they are offered. While a large publishing group might be able to hire an economic consultancy to do the maths for it (the usefulness of which is, again, hard to assess without knowing more about these calculations in the first place), smaller outlets might not be able to afford such a luxury. Instead, they risk being presented with a fait accompli and little room for negotiation.

The same is true for smaller AI startups that might want to acquire training data. Without accurate knowledge of what such data is actually worth, they not only run the risk of being overcharged – given the (so far) muddy legal situation, they also have another incentive not to bother in the first place and simply scrape away.

Finally, policymakers and regulators – and the academics whose work ideally helps inform their decisions, even though the reality is often messier and decidedly less utopian – also have a hard time without a better understanding of the details of these agreements. They cannot accurately assess the benefits and downsides of these partnerships, the changing shape of the information ecosystem, or the wider ramifications for society.

Understanding the power shifts that arise from the current AI boom in the information space – and figuring out which regulatory adjustments, if any, are required to account for them – is more difficult if you do not know what you are dealing with. And without a better understanding of the terms and conditions, winner-takes-most dynamics could calcify – in both directions.


This article originally appeared on NIKKEI Digital Governance. It represents the views of the author, not the view of the Reuters Institute.
