Rethinking the PDF

Feb 2, 2022

Imagine being able to send full text and graphics documents (newspapers, magazine articles, technical manuals, etc.) over electronic mail distribution networks. These documents could be viewed on any machine and any selected document could be printed locally. This capability would truly change the way information is managed. – John Warnock, Co-founder of Adobe and PDF format

The format's creator, John Warnock (co-founder of Adobe), prototyped a compatibility layer where documents would look and, most importantly, print (!) the same regardless of the computer they were viewed on (1993 video). The PDF is now 30 years old and has outlived the printer. The "killer app" for PDF was tax returns – the IRS adopted PDF in 1996 because of a rumored frustration with the US Postal Service.

Entire businesses have been built around the file format. There's Adobe, which sells Adobe Acrobat as a part of its Creative Cloud. There are eSignature businesses like DocuSign, which build workflow features around the document (Adobe also has a competitor). There's DocSend, a document sharing and analytics platform that lets you see who is reading your PDF and for how long. Scribd tried to be the search engine for PDFs. But the PDF is showing its age. With billions of users and even more PDF documents, what would it take to rethink the format?

Some open problems with PDFs:

  • Enterprise-grade OCR (optical character recognition) for PDF documents still doesn't exist in 2022. I may be underestimating the complexity of a generalized solution, but with state-of-the-art computer vision techniques, I'd expect much better results.
  • Interactive and web-enabled forms. I'll admit, I still have trouble every time I'm asked to fill out a PDF form. Forms behave differently on different platforms. Sometimes the file saves without the data filled in. I haven't gotten to the bottom of this, but why isn't this easy?
  • Slow page loads – PDFs are inherently slow to load. Adobe now runs Acrobat in the browser with WebAssembly. Their main reasons for this change were performance (time until first render) and high fidelity. There are more opportunities to make PDFs (especially large ones) instant to view and browse.
  • Bloated size – More lightweight alternatives like ePub and MOBI exist for e-books. For generic use cases, smaller file-size alternatives like DjVu have existed for over two decades, but they haven't caught on.