A project of cloud word processor Shaxpir, Prosecraft compiled over 27,000 books, comparing, ranking and analyzing them based on the “vividness” of their language.
“I’ve spent thousands of hours working on this project, cleaning up and annotating text, organizing and tweaking things,” Smith wrote.
Smith’s Prosecraft was not a generative AI tool, but authors worried it could become one, since he had amassed a dataset of a quarter billion words from published books, which he found by crawling the internet.
“Since I was only publishing summary statistics, and small snippets from the text of those books, I believed I was honoring the spirit of the Fair Use doctrine, which doesn’t require the consent of the original author,” Smith wrote.
But tools like ChatGPT are basically trained on the sum total of the internet, so this means that real travel writers or children’s books authors could be getting inadvertently plagiarized.
“I don’t think any writer is seriously convinced that AI is going to ruin books because like, well, that’s not how literature works, and everything I’ve seen ChatGPT write as a ‘story’ is just really fucking boring with no voice or real craft or style,” Masad said.
This is the best summary I could come up with:
A project of cloud word processor Shaxpir, Prosecraft compiled over 27,000 books, comparing, ranking and analyzing them based on the “vividness” of their language.
“I’ve spent thousands of hours working on this project, cleaning up and annotating text, organizing and tweaking things,” Smith wrote.
Smith’s Prosecraft was not a generative AI tool, but authors worried it could become one, since he had amassed a dataset of a quarter billion words from published books, which he found by crawling the internet.
“Since I was only publishing summary statistics, and small snippets from the text of those books, I believed I was honoring the spirit of the Fair Use doctrine, which doesn’t require the consent of the original author,” Smith wrote.
But tools like ChatGPT are basically trained on the sum total of the internet, so this means that real travel writers or children’s books authors could be getting inadvertently plagiarized.
“I don’t think any writer is seriously convinced that AI is going to ruin books because like, well, that’s not how literature works, and everything I’ve seen ChatGPT write as a ‘story’ is just really fucking boring with no voice or real craft or style,” Masad said.
I’m a bot and I’m open source!