GPUs are hard to come by, often fetching significant premiums on the aftermarket (if you can find them at all). Cloud regions see frequent shortages, and on-demand cloud prices offer little relief.
But there’s a different playbook in AI for the GPU-poor startups that don’t have access to large clusters of machines. Many will claim that GPU-poor startups have no moat; that’s only part of the story. Hardware/software cycles turn, and distribution is often a stronger moat than hardware. In fact, I believe that GPU-poor startups might be in a better position than their GPU-rich counterparts within the next few quarters.
But how do you operate as a GPU-poor startup?
A few ideas:
- On-device inference. Run small models on end-user machines, whether in the browser or on a mobile phone. You get zero network latency and better data-privacy controls, but you’re capped by the device’s compute (so, smaller models only); see the first sketch after this list.
- Commoditize your complement. Make the layer next to yours cheap and abundant so value accrues to your own. HuggingFace is a one-stop shop for uploading, downloading, and discovering models. It’s not the best place to run them, but it benefits from growing traffic from some of the best machine learning researchers and hackers.
- Thin wrappers. Ride the growing competition at the inference layer: route requests to the lowest-cost provider without wasting cycles optimizing for specific models. Large language models are interchangeable (in theory); see the wrapper sketch below.
- Vertical markets. While other companies are stuck training large models for months, GPU-poor startups can focus on solving real customer problems. No GPUs before product-market fit.
- Efficient inference. You might not have access to large training clusters, but you do have access to the latest open-source inference optimizations: quantization, batching, speculative decoding, and more. Plenty of ways to do more with less; a quantization sketch closes this section.
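To make the on-device point concrete, here’s a minimal sketch using llama-cpp-python to run a small quantized model entirely on the user’s machine (the model path is a placeholder; in a browser you’d reach for something like transformers.js instead):

```python
# Runs entirely on the end-user's machine: no network round trip,
# and the prompt never leaves the device.
from llama_cpp import Llama

# Hypothetical path to a small quantized model (a few GB, CPU-friendly).
llm = Llama(model_path="./models/small-model.Q4_K_M.gguf", n_ctx=2048)

out = llm(
    "Summarize: GPUs are scarce, so ship small models to the edge.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```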
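And the HuggingFace hub really is just an API call away; a sketch with the official huggingface_hub client (the repo and filename are illustrative):

```python
from huggingface_hub import hf_hub_download

# Pull a single file from a public model repo; the hub handles
# caching, so repeated calls hit the local copy.
path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",  # illustrative repo
    filename="llama-2-7b.Q4_K_M.gguf",   # illustrative file
)
print(path)  # local cache path, ready to load
```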
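The thin-wrapper play in practice: many providers expose OpenAI-compatible endpoints, so switching is mostly a base-URL change. A sketch with the openai Python client, where the provider URLs, env vars, and model names are all hypothetical:

```python
import os
from openai import OpenAI

# Hypothetical provider table: same wire protocol, different prices.
PROVIDERS = {
    "provider_a": {"base_url": "https://api.provider-a.example/v1", "model": "small-llm"},
    "provider_b": {"base_url": "https://api.provider-b.example/v1", "model": "small-llm"},
}

def complete(prompt: str, provider: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(
        base_url=cfg["base_url"],
        api_key=os.environ[f"{provider.upper()}_KEY"],  # hypothetical env vars
    )
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Route to whichever provider is cheapest today; the app code doesn't change.
print(complete("Hello!", provider="provider_a"))
```

The point of the design: all provider-specific detail lives in one table, so chasing the lowest price is a config edit, not a rewrite.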
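Finally, one concrete “do more with less” lever is quantization: loading weights in 4-bit roughly quarters memory versus fp16, usually with modest quality loss. A sketch with transformers plus bitsandbytes, assuming a CUDA GPU and an illustrative open model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative small open model

# 4-bit NF4 quantization: a ~7B-parameter model fits in well under
# 8 GB of VRAM, versus roughly 14 GB in fp16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

inputs = tok("Cheap inference beats big clusters when", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```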