The Biggest Websites in the World are Blocking Large Language Models
The big boys don't want their content scraped by AI
AI companies built their large language models on the backs of other people’s content.
That’s just a fact.
You can decide for yourself whether or not that’s ethical (I think it isn’t), but the facts are the facts. It’s the world we live in and we all have to decide how to react to it.
The big boys of the internet are reacting by signaling that they DON’T want their content scraped by LLMs.
A new study from Originality.AI goes into the details.
Let’s take a look and find out what it means for your site. I’ll also go over the steps you should take if you want to do the same to protect your content.
Results of the study
18.6% of the top 1,000 websites block at least one AI crawler.
Broken down by crawler, the data shows that:
111 sites blocked GPTBot
32 blocked ChatGPT-User (plugin bot)
63 blocked CCBot
None blocked Anthropic-AI
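All of these blocks are signaled the same way: entries in the site's robots.txt file. As a minimal, illustrative sketch (not taken from any site in the study), opting out of all four crawlers above looks like this; note that the exact user-agent tokens each company honors can change over time:

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block the ChatGPT plugin/browsing bot
User-agent: ChatGPT-User
Disallow: /

# Block Common Crawl
User-agent: CCBot
Disallow: /

# Block Anthropic's crawler (token as listed in the study)
User-agent: anthropic-ai
Disallow: /
```

robots.txt is advisory: well-behaved crawlers respect it, but nothing technically enforces it.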
The number of major sites blocking GPTBot is increasing.
Among the biggest sites blocking ChatGPT is the New York Times.
Here’s the spreadsheet with the full list: https://docs.google.com/spreadsheets/d/1KR7PXruxpMBMZoUkPNNY8foUwGx1GjgMAiVWseGVHGw/edit#gid=0
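If you want to check whether a given robots.txt blocks these crawlers yourself, Python's standard library can do it. This is a sketch using only `urllib.robotparser`; the user-agent strings match the ones counted in the study, and the sample robots.txt content is purely illustrative:

```python
# Check which AI crawlers a robots.txt disallows, using only the
# Python standard library. The bot names below are the user agents
# from the study; the sample robots.txt is illustrative.
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai"]

# Illustrative robots.txt that blocks OpenAI's two bots only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""

def blocked_bots(robots_txt: str, bots=AI_BOTS) -> list:
    """Return the bots that are disallowed from fetching the site root."""
    parser = RobotFileParser()
    parser.modified()  # mark as fetched so can_fetch() evaluates the rules
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]

print(blocked_bots(ROBOTS_TXT))  # ['GPTBot', 'ChatGPT-User']
```

To run this against a live site, you'd point `RobotFileParser` at the site's robots.txt URL with `set_url()` and `read()` instead of parsing a string.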