This is the best summary I could come up with:
As reported by Adweek, the NYT updated its Terms of Service on August 3rd to prohibit its content (including text, photographs, images, audio/video clips, “look and feel,” metadata, and compilations) from being used in the development of “any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”
The updated terms also specify that automated tools such as website crawlers may not be used to access or collect such content without written permission from the publication.
Despite adding the new rules to its policy, the publication doesn't appear to have made any changes to its robots.txt, the file that tells web crawlers which URLs they may access.
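For context, robots.txt is purely advisory: a site lists which user agents may fetch which paths, and well-behaved crawlers check it before scraping, but nothing in the file is enforced. Here's a minimal sketch using Python's standard-library `urllib.robotparser`; the nytimes.com URLs are illustrative, and "GPTBot" is the user agent OpenAI announced for its crawler around the same time:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (advisory only; it can't
# enforce the Terms of Service, it just states the crawling rules).
parser = RobotFileParser()
parser.set_url("https://www.nytimes.com/robots.txt")
parser.read()

# Ask whether a given crawler user agent may fetch a given URL.
# Example URL chosen for illustration only.
url = "https://www.nytimes.com/section/technology"
print(parser.can_fetch("GPTBot", url))     # OpenAI's AI crawler
print(parser.can_fetch("Googlebot", url))  # a search crawler
```

The point of the observation above is that the ToS change is a legal measure, while robots.txt is the technical signal crawlers actually consult, and the latter reportedly hadn't been updated.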
Many large language models powering popular AI services like OpenAI’s ChatGPT are trained on vast datasets that could contain copyrighted or otherwise protected materials scraped from the web without the original creator’s permission.
That said, the NYT also signed a $100 million deal with Google back in February that allows the search giant to feature Times content across some of its platforms over the next three years.
Earlier this month, several news organizations including The Associated Press and the European Publishers’ Council signed an open letter calling for global lawmakers to usher in rules that would require transparency into training datasets and consent of rights holders before using data for training.