I worked on machine learning infrastructure during the last AI cycle, which was marked by stepwise advances in deep learning. Some lessons learned and other observations:
- Human-in-the-loop doesn't work. The idea of "faking it until you make it" by operating as a negative-margin services company in the hope of quickly automating away the expensive humans never panned out. The magic assistant startups never reached automation, and even large companies like Uber fell for the narrative. The failure stems from two miscalculations: (1) model performance may be somewhat predictable with more compute, but capability is much harder to predict, and (2) the "AI" is rarely the full product. See Human-in-the-loop and other AI Mistakes.
- Machine learning infrastructure changed too quickly for companies to gain a foothold. The hypothesis that the next generation of startups would mimic the machine learning stacks of Uber and Airbnb turned out to be false. Infrastructure takes time to build (and time to sell to enterprises); by the time it was productionized and SaaS-ified, the paradigms had changed and it was too late. See A New ML Stack and MLOps, Convergent or Divergent?
- As a corollary, adaptability was a key trait of successful companies: OpenAI started with reinforcement learning on video games and ended up with large language models.
- The simplest workflows won: complicated machine learning dev tools never found mass adoption. Instead, the startups that sat at the beginning of the workflow (i.e., data labeling) did the best, with Scale AI as the clearest example. The data stack, grounded in much more concrete analysis, saw more investment and success, and even there the simplest form won (batch, not streaming). See PyTorch vs. TensorFlow and Solving the Simple Case.
- The smallest models and frameworks saw the most adoption, not the largest or most sophisticated (e.g., YOLO, PyTorch). See Local AI Part 1 and Part 2.