The AI Training Conundrum: Why We Can’t Apply The Old Rules
There’s a heated debate about the ethics of training AI using copyrighted material. And, at first, it looks like a clear-cut case of greedy corporations profiting off others without compensating them. But, is this way of thinking failing to grasp the realities of the AI age?
ARTIFICIAL INTELLIGENCE
Oliver Cook
10/20/2023 · 6 min read
Right now, there’s a heated debate about the ethics of training AI using copyrighted material. And, to many, it seems like a clear-cut case of greedy corporations profiting off the hard work of others without compensating them. But, I believe most people are failing to grasp the realities of the situation we are now in. Simply put, they are looking at the issue, and the wider world, through a lens that is now completely obsolete.
Medium says it’s not fair
What got me thinking about this? Well, yesterday I got an email from Medium, the online publishing platform, proudly informing me that it is “blocking AI companies from training their bots” on my writing. And, as someone who earns their living from writing, a part of me really wants to agree with Medium CEO Tony Stubblebine when he said:
"AI companies have nearly universally broken the fundamental rules of fairness. They are making money on your writing without asking for your consent, nor are they offering you compensation and credit. There's a lot more one could ask for, but these '3 Cs' are the minimum."
At first, it seems like a no-brainer. Of course, if big corporations are going to ‘make money’ from my writing, then I want a piece of the pie. I’d bet the majority of creatives agree. But, in reality, I believe Mr Stubblebine has got something very wrong, and he’s not going to like hearing the truth. Because, as much as I want to believe that the future could be a creative utopia where all of us are fairly compensated for our work, it simply isn’t possible. And, I’m not talking about technicalities like paying royalties, but about a problem at the most fundamental level.
But, humans are also constantly training on all material they encounter
The truth is that the training of AI is no different from the training of humans. And, I mean all of us. Everything that every single one of us creates is the result of a lifelong training process. We have been trained by everything we’ve ever seen, heard, felt, tasted, smelt, done. Everything we’ve ever experienced. Of course, most of us aren’t ever aware of this training, but it most certainly is happening. It is also known as learning. Occasionally, when we are aware of this learning process immediately before having an idea, we’ll refer to it as inspiration. And, sometimes, like when we’re at school or college, we’re forced to try and document our ideas in an iterative manner. But, the truth is that none of us live in a void, and nothing we ever come up with is free of what came before. From the moment we’re born, we’re being trained. Just as an AI is being trained from the moment it goes ‘live’.
Going back to Medium for a moment, every single writer on that platform, including yours truly, has been trained by everything they have ever read, watched, and listened to. So, by the logic of those who feel they should be asked for consent, and compensated and credited by AI companies, all of those writers should do the same for everything they trained themselves on. Now, I understand some of the things they’ve been trained on will be books they purchased, so in those cases, they did compensate the other writer. But, as a share of the sum total of information that has trained any of us, purchased books are going to be a tiny fraction of a percent. And, that goes even for book nerds like myself who’ve spent a fortune on real books and ebooks over the years.
Are we seriously supposed to monetarily compensate everyone responsible for every input? It would be impossible, and it would bring everything to a grinding halt. It would be insane, right? But, that is exactly what people like Mr Stubblebine are proposing - though I doubt they’re actually aware of the implications of their stance.
Another example of a company failing to grasp the new reality is Getty Images, which earlier this year sued Stability AI for what it calls the “brazen infringement” of its intellectual property. According to Getty, the AI company used over 12 million of Getty’s copyrighted images to train its Stable Diffusion product. The training consisted of the AI model looking at the images and their metadata. But, here’s the thing. I can go to Getty’s website and look at that data too. You can. Anyone can. Of course, we can’t download high-resolution, watermark-free images for uses covered by the various licenses without paying. But, we can definitely go and browse the vast library of images. And, when we do that, we are all training ourselves on those images. For example, if I draw a picture of an 1890s vintage British steam locomotive five years from now, then the inputs I absorbed from looking at relevant pictures on Getty’s site will be in play.
The core issues aren’t new - it’s just the scale and speed that have changed
In fact, I’ve been very aware of this phenomenon for most of my life, because I’m also an artist, illustrator, and photographer. When you’re in those fields, you can’t help but be aware of how other artists influence your own work. Yes, you try to avoid copying, but inevitably every new artwork is the result of everything the artist trained their mind on before creating it. Of course, some things will have a much bigger influence than others - or at least a more detectable one - but it is an undeniable truth.
I mean, let’s be honest. Writers have always been battling plagiarism, and trying to combat image thieves has long been a part of a visual creative’s life - especially in the internet age. But, when we’re talking about training, whether it’s training AI or training humans, we’re not talking about copying or theft, we’re talking about creating new content that is inspired by and draws upon that which has gone before - granted, sometimes heavily.
So, if AI is essentially just doing what humans have always done, and that process is fundamental to creativity itself, then how can we justify treating it differently? How can we apply one logic to machines and another to humans when they are doing the same thing? Especially when we humans are increasingly integrating AI technology into our own learning/training processes. Doing so will inevitably create a chaos of contradictions. And, really, the world already has enough of that.
Maybe everyone is just missing the point. They are trying to apply logic and morality from a different era - one that, even if you don’t like to acknowledge it, is gone forever. Maybe the real issue here is not how AI is being trained, but that the speed and scale of AI learning and creation are so vastly beyond the human scale that they are starting to cause severe trauma to the status quo. Could it be that businesses like Medium and Getty will just have to face the fact that their comfortable reality can’t exist anymore? Do writers, artists, and other creatives just have to accept that AI is only doing what they themselves have always done, but almost infinitely faster?
I think they do.
AI is doing exactly what humans have always done, but on a scale and at a speed that are overwhelming our ability to adapt.
Whether or not companies like Stability AI were right to let the technology loose on the world in the first place is another matter altogether. As much as I’m fascinated by AI, I suspect it is simply too powerful for our current society to cope with. The main issue is the sheer speed - the exponential nature of its advancement. Humans have taken millions of years to evolve to the point we’re at, and our institutions advance at a snail’s pace. But that’s a topic for another post, and one I’ve already discussed before.
For now, I’m just saying that any attempt to apply one set of ethics to AI training and another to human learning is doomed to failure.