CUDA (Compute Unified Device Architecture) is a closed-source, low-level API that lets software interface with NVIDIA GPUs.
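To make “low-level” concrete, here’s a minimal sketch of what programming against CUDA looks like. This example uses Python via Numba’s CUDA bindings (an illustrative choice; CUDA kernels are more commonly written in C++), and assumes an NVIDIA GPU plus the `numba` package:

```python
# Minimal CUDA kernel sketch via Numba's CUDA bindings (assumes an
# NVIDIA GPU with CUDA drivers and the numba package installed).
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(a, b, out):
    i = cuda.grid(1)   # absolute index of this GPU thread
    if i < out.size:   # guard: the last block may overshoot the array
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

# Launch enough 256-thread blocks to cover all n elements; Numba compiles
# the kernel through CUDA, which schedules it across the GPU's cores.
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](a, b, out)

assert np.allclose(out, a + b)
```

The programmer picks the thread/block layout and writes per-thread code; CUDA handles compiling it for the specific GPU and scheduling it across thousands of cores. That is the layer any alternative stack has to replicate.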
CUDA is a major moat for NVIDIA. It’s part of why NVIDIA GPUs command such a premium over other hardware (and are perpetually in short supply).
A few reasons why the monopoly exists:
- Hardware/software synergy. NVIDIA has consistently shipped the fastest hardware and software. It’s been difficult for other companies to build this flywheel (software companies don’t have the hardware capabilities, and vice versa). Open-source libraries are orders of magnitude slower.
- First mover. NVIDIA introduced CUDA in 2006. Both consumers and enterprises locked themselves in by designing their applications around CUDA.
And how it could be disrupted in the future:
- Alternative open standards / abstraction layers. OpenAI released Triton, and PyTorch 2.0 uses Triton (via TorchInductor). Today these only act as a compilation layer on top of CUDA, but in the future they could support other platforms (or bypass CUDA entirely); a sketch of this layer follows the list.
- Competing product. NVIDIA has managed to ship the best products over the last decade, but there’s still a chance that another big tech company builds a compelling alternative. Cloud providers are building their own chips, and they have detailed workload data to feed back into chip architecture and design.
- Specialized hardware. Custom hardware accelerators, like Google’s TPU (Tensor Processing Unit), could become more popular than general-purpose GPUs.
- CPU-bound. While GPUs are ideal for AI because they excel at matrix multiplication (among other things), there’s potentially a future where small models run “good enough” on CPUs (a rough timing sketch of that core operation also follows below).
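On the abstraction-layer point: here’s a minimal sketch of where Triton already sits in the PyTorch 2.x stack. `torch.compile`’s default TorchInductor backend generates Triton kernels for GPU operations (`fused_op` is a hypothetical example function; this assumes PyTorch 2.x and a CUDA GPU):

```python
# Sketch of PyTorch 2.x compiling through TorchInductor, which emits
# Triton kernels on CUDA devices (fused_op is a made-up example).
import torch

def fused_op(x, y):
    # matmul + bias + relu: Inductor can fuse the pointwise ops into a
    # single generated Triton kernel rather than separate CUDA kernel calls
    return torch.relu(x @ y + 1.0)

compiled = torch.compile(fused_op)  # default backend is "inductor"

x = torch.randn(1024, 1024, device="cuda")
y = torch.randn(1024, 1024, device="cuda")
out = compiled(x, y)  # first call triggers tracing and Triton codegen
```

The relevance to the moat: user code written against `torch.compile` doesn’t reference CUDA directly, so if Inductor gains mature non-NVIDIA backends, the same code could run unchanged on other hardware.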
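And to make the CPU-bound scenario concrete, a rough sketch (my illustration, not a benchmark from the post) that times that core workload, matrix multiplication, on a CPU via NumPy’s BLAS backend:

```python
# Rough CPU matmul throughput measurement via NumPy's BLAS backend
# (illustrative only; results vary widely across CPUs and BLAS builds).
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up so BLAS thread pools and caches are initialized

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

flops = 2 * n**3  # an n x n matmul does ~2n^3 floating-point operations
print(f"~{flops / elapsed / 1e9:.0f} GFLOP/s on this CPU")
```

A high-end GPU still delivers orders of magnitude more matmul throughput, so the question is whether small models actually need it.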