- cross-posted to:
- technews
Stephen King: My Books Were Used to Train AI
One prominent author responds to the revelation that his writing is being used to coach artificial intelligence.
I mean, yeah, duh. Just ask any of them to write a paragraph “in the style of INSERT AUTHOR”.
If it can, then it was trained on that author. I’m not sure how that’s a problem though.
We don’t have the legal framework for this type of thing. So people are going to disagree with how using training data for a commercial AI product should work.
I imagine Stephen King would argue they didn’t have licenses or permission to use his books to train their AI, so he should be compensated or the AI deleted/retrained. He would argue that buying a copy of the book only allows humans to read it. Similarly, buying a CD doesn’t allow you to put that song in your advert.
I would argue we do have a legal precedent for this sort of thing. Companies hire creatives all the time and ask them to do things in the style of other creatives. You can’t copyright a style. You don’t own what you inspire.
That’s not what’s happening though. His works are being incorporated into an LLM without permission. I hope he sues the hell out of these people.
But that is what’s happening in the minds of creatives. Reading a book and taking inspiration is functionally the same mechanism that an LLM uses to learn. They read Stephen King, they copy some part of the style. Potentially very closely and for a corporation’s gain if that’s what’s asked of them.
One person being influenced by a prose style isn’t the same as a company using a copyrighted work without permission to train an LLM.
Every learning material a company or university has ever used has been used to train an LLM. Us.
Okay, I’m being a bit facetious here. I know people and ChatGPT aren’t equivalent. But the gap is closing. Maybe LLMs will never bridge the gap, but something will. I hesitate to write into law now that no work can ever be ingested or emulated by another intelligent entity. While the differences between a machine and a human are clear to you now, one day they won’t be.
The longer we hold onto the idea that our brains are somehow magically different from the way computers are learning (or will learn) to think, the harder we’ll get blindsided by reality when they’re indistinguishable from us.
There’s very little an LLM has in common with the human brain. We can’t do AGI yet and there’s no evidence that we will be able to create AGI any time soon.
The main issue as I see it is that we have companies trying to make money by creating LLMs. The people who created the source materials for these LLMs are not only not getting paid, they’re not even being asked permission. To me that’s dead wrong and I hope the courts agree.
Is that illegal though? As long as the model isn’t reproducing the original then copyright isn’t being violated. Maybe in the future there will be laws against it but as of now the grounds for a lawsuit are shaky at best.
There are already laws around what you can and can’t do with copyrighted material. If the owners of the LLM didn’t obtain written permission, I’d say they are on very shaky ground here.
What laws specifically? The only ones I can find refer to limits on redistribution, which isn’t happening here. If the models were able to reproduce the contents of the books that would be another issue that would need to be resolved. But I can’t find anything that would prohibit training.
What laws specifically?
Existing laws to protect copyrighted material.
“AI systems are “trained” to create literary, visual, and other artistic works by exposing the program to large amounts of data, which may consist of existing works such as text and images from the internet. This training process may involve making digital copies of existing works, carrying a risk of copyright infringement. As the U.S. Patent and Trademark Office has described, this process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” OpenAI, for example, acknowledges that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “necessarily involves first making copies of the data to be analyzed.” Creating such copies, without express or implied permission from the various copyright owners, may infringe the copyright holders’ exclusive right to make reproductions of their work.”