In part 1, I talked about how it was inevitable that many of these models would eventually find their way on-device.
What are the implications?
- Enables low-latency applications – The last wave of low-latency AI models was dominated by image recognition; some of the most popular were low-footprint models that could run in embedded environments (see the post on YOLO)
- Different business model – with on-device AI, you might not charge for API access but instead for a subscription or for the device itself. It is also likely a cheaper serving model for companies shipping models. See Generative AI Value Chain.
- Privacy-preserving – data can be used to fine-tune models without ever leaving the user's device. Again, in the last wave of AI we saw research on approaches like federated learning (a vastly simplified explanation: train on-device and send only the weights; a minimal sketch follows this list). This opens up workflows to data that users (or enterprises) don't want to lose control of. While it might mean models can be more personalized, historically foundation models have captured personalization fairly well (e.g., Google).
- The importance of the edge – Even if things are easy to run locally, they might still live in a data center. In Where is the Edge? I talked about the different things this could mean. Serving inference at the edge seems like a good middle ground – low latency, but still easy for end users and accessible anywhere over HTTP.
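To make the federated-learning point above concrete, here is a minimal FedAvg-style sketch, assuming a toy linear model trained with plain numpy. The function names, the simulated "devices", and the synthetic data are all illustrative, not from any real federated-learning framework; the point is only that raw data stays on each device and the server sees nothing but averaged weights.

```python
# Minimal federated-averaging sketch: each "device" trains locally on private
# data, and only the resulting weights are sent back to be averaged.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=20):
    """Run a few gradient steps on one device's private data.
    Only the updated weights leave this function, never X or y."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient for a linear model
        w -= lr * grad
    return w

# Simulate three devices, each holding private data from the same true model.
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    devices.append((X, y))

# Server loop: broadcast global weights, collect per-device weights, average.
global_w = np.zeros(2)
for _ in range(10):
    local_weights = [local_update(global_w, X, y) for X, y in devices]
    global_w = np.mean(local_weights, axis=0)  # the only thing the server sees

print("estimated weights:", global_w)  # converges toward [2.0, -1.0]
```

Real systems send weight deltas or gradients, add secure aggregation, and weight the average by dataset size, but the shape of the workflow is the same.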