Product teams and application engineers will be the buyers of the foundational model stack, not data teams. Why?
- Direct value without a data pipeline. Application engineers can get value from LLMs without involving a data team: for a proof-of-concept or demo, all they have to do is build a thin layer of infrastructure around a hosted foundational model. They don't need access to the data warehouse, since a tiny bit of copy-pasted data is enough to validate an idea (see the sketch after this list). And Python won't be the only language of LLMs.
- Shared infrastructure often means shared responsibility. MLOps and the data stack are beginning to converge toward DevOps primitives. Kubernetes has already infiltrated the data stack as the substrate for orchestration, data ingestion, and ETL. OpenAI uses Kubernetes to run distributed training and inference.
- Products, not insights. The early use cases for generative AI have been product-centric, not insight-centric; just look at the number of companies refreshing old products with new generative AI features. For now, data scientists are safe: extracting insights from data automatically is hard, because it requires a mix of technical expertise, domain knowledge, and exploration that is tough for models to emulate.
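To make the first point concrete, here is a minimal sketch of such a proof-of-concept in TypeScript rather than Python. The request shape follows OpenAI's hosted chat-completions REST API; the model name and the copy-pasted sample data are placeholders, not a recommendation.

```typescript
// Minimal proof-of-concept: call a hosted foundational model directly,
// with no data warehouse or pipeline involved. Assumes Node 18+ (global
// fetch) and an API key in the environment. The pasted snippet below
// stands in for "real" data an engineer might grab to validate an idea.
const pastedData = `2024-01-03, refund requested, "arrived broken"
2024-01-04, refund requested, "wrong size"`;

async function summarize(data: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo", // placeholder; any hosted model would do
      messages: [
        { role: "user", content: `Summarize these support tickets:\n${data}` },
      ],
    }),
  });
  const json = await response.json();
  return json.choices[0].message.content;
}

summarize(pastedData).then(console.log);
```

That is the whole demo: no orchestration, no ETL, no data team in the loop.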
Of course, organizations will eventually rush to fine-tune their own foundational models, and that will require data pipelines and data expertise. Prompt engineering is important too, and data scientists, with their habit of empirical iteration, are in the best position to figure it out; a sketch of what that work looks like follows.
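In practice, prompt engineering looks more like data work than software plumbing: curating examples and testing which framing of a task performs best. A hypothetical sketch, in the same TypeScript register as above, with invented example data:

```typescript
// A minimal sketch of few-shot prompt engineering: the "program" is a
// template plus a handful of curated examples. Which examples to pick,
// and how to phrase the instruction, is an empirical question, the kind
// data scientists already answer with held-out evaluation sets.
type Example = { ticket: string; label: string };

const fewShot: Example[] = [
  { ticket: "Package never arrived", label: "shipping" },
  { ticket: "Charged twice this month", label: "billing" },
];

function buildPrompt(examples: Example[], ticket: string): string {
  const shots = examples
    .map((e) => `Ticket: ${e.ticket}\nCategory: ${e.label}`)
    .join("\n\n");
  return `Classify each support ticket into one category.\n\n${shots}\n\nTicket: ${ticket}\nCategory:`;
}

console.log(buildPrompt(fewShot, "I want my money back"));
```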
Will data teams and engineering teams merge? Or will they coexist in some new configuration?