Reddit communities are still private in protest of new API rules. Twitter moved behind a login wall and is rate-limiting users. Users are frustrated but still using these sites.
But what will happen to the Google index? Millions of search results are effectively dead links. Users who used Google to refine their searches of Reddit are now out of luck (Reddit's own search is inferior). Tweets on the search engine results page (SERP) now lead to a login wall for many users.
Advancements in AI might disrupt Google Search in a roundabout way:
Large models are trained on public data scraped from the web or pulled via APIs. Content-heavy sites are the most likely to be disrupted (why post on Stack Overflow?) by models trained on their own data. Naturally, they want to restrict access and either (1) sell the data or (2) train their own models. This restriction prevents (or complicates) Google's automatic crawling of the data for Search (and probably for training models, too).
Google will lose results site by site. It will be Google Search's death by a thousand cuts.
It's estimated that Wikipedia shows up on the first page of 99% of searches on Google. What if Wikipedia started charging for or restricting API access? It's a dataset found in almost every large language model training corpus. The Wikimedia Foundation is constantly looking for financial support (the "please donate" banners) and has already launched an enterprise API product (Wikimedia Enterprise, 2021).
One by one, search results become dead links and are removed from the index. Users will start to rely on site-specific searches behind walled gardens. The first page of search results will not only be filled with ads but will also be missing key results. Google may try to augment results with AI-generated answers, but (1) not all of these answers will be good enough, and (2) the data needed to train the models behind these answers will increasingly sit behind login walls or paywalls. Search might get progressively worse over the years until a new alternative arises.