Literate programming is a paradigm introduced by Donald Knuth that interleaves natural language and source code in the same file. It extends explanation beyond code comments to Markdown, formatted text, graphs, and more.
Jupyter Notebooks were a reimagining of literate programming for the data science world. Now, with LLM-assisted environments like OpenAI’s code interpreter, will we see another form?
The intermixed natural language serves two purposes: (1) it acts as future documentation and explanation, and (2) it helps the model reason through the problem via chain-of-thought.
The constraints of Jupyter Notebooks
- Works best with interpreted languages, since each cell is simply sent to a running REPL
- Suffers from out-of-order execution (since the REPL is reused and the user may run cells in any order)
- Hard to convert to production code that can run unassisted
- Hard to test
- Hard to version (a notebook file stores code, outputs, and metadata together as JSON, so diffs are noisy)
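The out-of-order problem in the list above is easy to reproduce without a notebook at all. Here is a minimal sketch in plain Python, assuming each "cell" is just a string executed against one shared namespace (a simplification of what a real kernel does); the cell names and the `run` helper are made up for illustration.

```python
# Each string stands in for one notebook cell. All cells share one namespace,
# the way a notebook kernel reuses a single REPL session.
cells = {
    "define": "x = 1",
    "update": "x = x + 10",
    "report": "result = x * 2",
}


def run(order):
    """Run the named cells in the given order against a fresh namespace."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["result"]


# Running top to bottom gives the result the author intended.
top_to_bottom = run(["define", "update", "report"])

# Re-running one cell by accident silently changes the answer.
rerun_update = run(["define", "update", "update", "report"])

# Skipping a cell also "works", with yet another answer.
skipped_update = run(["define", "report"])

print(top_to_bottom, rerun_update, skipped_update)  # 22 42 2
```

The same three cells produce three different results depending on execution history, and nothing in the saved notebook records which history actually happened.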
LLMs solve some of these problems (but not all, yet). Some unordered thoughts:
- LLMs could close the notebook-to-production gap. They can extract, rewrite, and productionize code that would otherwise have stayed in a notebook
- They might be able to generate unit tests and other scaffolding for the code
- They can run more complex build tasks in a language-agnostic way (compiled languages, etc.)
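For a sense of where the notebook-to-production gap starts, here is a sketch of the purely mechanical first step: pulling the code cells out of a notebook's JSON and concatenating them into a script. It assumes the standard `.ipynb` layout (a top-level `cells` list whose entries have `cell_type` and `source`); the function name `code_cells_to_script` is hypothetical. Everything an LLM would add on top (rewriting, removing dead cells, adding tests) is deliberately not shown.

```python
import json


def code_cells_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook into one script (hypothetical helper)."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # drop Markdown and raw cells
        source = cell.get("source", "")
        # "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook, standing in for the contents of an .ipynb file.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data\n"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1\n"]},
        {"cell_type": "code", "source": "print(y)"},
    ]
})

script = code_cells_to_script(example)
print(script)
```

The output is only as good as the notebook: cells are emitted in file order rather than the order they were run, and notebook-only syntax such as `%` magics would still need to be rewritten by hand or by a model.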