In consumer tech, there’s a concept of “magic numbers”: thresholds that, once crossed, signal a high likelihood of a converted user. For Facebook, users who added “seven friends in ten days” retained far better than the segment who didn’t.
Twitter had “30 follows”, LinkedIn “50 connections”, Airbnb “four positive reviews,” and Dropbox “seven-day active user”.
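To make the idea concrete, here’s a minimal sketch of how a growth team might test a magic number like Facebook’s against its own event data. Everything here is hypothetical: the file names (`friend_events.csv`, `users.csv`), the column names, and the `retained_day_30` flag are stand-ins for whatever schema you actually have.

```python
import pandas as pd

# Hypothetical event log: one row per friend connection.
# Assumed columns: user_id, signup_date, event_date.
events = pd.read_csv("friend_events.csv", parse_dates=["signup_date", "event_date"])

# Did the connection happen within the user's first ten days?
events["within_window"] = (events["event_date"] - events["signup_date"]).dt.days <= 10

# Count friends added inside the window, per user.
friends_in_window = (
    events[events["within_window"]]
    .groupby("user_id")
    .size()
    .rename("friend_count")
)

# Users who crossed the magic number: seven friends in ten days.
hit_magic_number = friends_in_window[friends_in_window >= 7].index

# Compare retention across the two cohorts
# (assumed: a users table with a retained_day_30 boolean).
users = pd.read_csv("users.csv")
users["hit_magic_number"] = users["user_id"].isin(hit_magic_number)
print(users.groupby("hit_magic_number")["retained_day_30"].mean())
```

If the cohort that crossed the threshold retains meaningfully better, you’ve found your magic number; the product team then optimizes onboarding to push new users across it.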
What if we could explain emergent behavior on the Internet with the same type of simple threshold? A social network can’t emerge on the Internet without enough users. Search isn’t important if there aren’t enough websites. PageRank doesn’t have enough signal if there aren’t enough links. LLMs don’t work without enough data. Some candidate magic numbers, era by era:
Payments/Commerce — 12 million users on the Internet in 1995 when Amazon was launched (it would be interesting to know total Internet transaction revenue at the time).
Search — 2.4 million websites when Google was founded in 1998 (it would be interesting to know how many hyperlinks).
Social Networking — 670 million people (10% of the world population) were on the Internet in 2003 when MySpace was founded (Facebook followed in 2004).
Video — 1 megabit per second broadband (roughly 20x faster than dialup) was hitting the mainstream when YouTube was founded in 2005.
Cloud — I couldn’t find an accurate number anywhere, but two candidate metrics: (1) the number of startups being created, (2) the number of data centers being built.
Deep Learning — 22 nm processor architecture in 2012, when the ImageNet competition revitalized interest in deep neural networks.
Large Language Models (LLMs) — 500 billion tokens in the GPT-3 training data (Common Crawl, WebText2, Books1, Books2, Wikipedia).
LLMs (code) — 200 million repositories on GitHub in 2021, used to train OpenAI’s code-davinci-002 and unlock new capabilities in LLMs.