How do you think about PDF documents? Many people talk about PDF as "old" or "boring". These sentiments aren't necessarily wrong, so why would anyone be interested in a blog or podcast about PDFs? Why talk about PDFs at all, their inner workings, history, and how PDFs affect modern society in so many ways? And why now, when so many other technologies seem more exciting, relevant, or important?
Before we start exploring the dark forest that is PDF, this first post is meant to unpack some answers to those whys.
Why talk about PDF?
The easy answer to this question is that PDFs are ✨everywhere✨. Every family, every nation, every culture, every business and organization depends upon PDFs in some way. The scale of PDF usage is so vast that it's hard to overstate. If you are curious about the world around you, anything as widely used as PDFs is worth understanding to some degree.
Some basic facts will help more than hyperbole though, so:
- PDFs are the most-used document format in the world, by a wide margin. Adobe's 2020 10-K filing with the U.S. Securities and Exchange Commission estimated that trillions of PDF documents are generated every year (yes, "trillions" with a 't').
- PDF is among the most mainstream, front-of-mind technologies. The word "PDF" is one of the most-searched-for terms on the web, even surpassing some of the most popular consumer technology brands, like "iphone" and "android". This is not what a niche or dying technology looks like.
So, PDFs are widely used (perhaps one of the most widely used consumer-facing technologies ever, for that matter), and that usage and PDF's mindshare among the public seem as high as they could possibly be.
PDF is semi-permanent infrastructure
While it's useful to understand the scale of PDF's mindshare and usage like any other popular technology, there are ways in which PDF is more like permanent infrastructure, with a lifespan measurable in human generations.
At least three factors make this a reasonable characterization of PDF:
No PDF alternatives exist
The last serious effort at an alternative to PDF was Microsoft's XPS in the mid-aughts (now called OpenXPS, after its standardization, or arguably retirement, at the ECMA standards body, which published it as a standard just once, in 2009), and it never gained any real traction.
Beyond XPS, PDF alternatives have only ever been successfully fielded to serve relatively niche use cases, such as ebook formats like ePub and .mobi.
PDF technologies are stable
Compared to the churn we've all become accustomed to in software libraries, frameworks, programming languages, consumer computing hardware, and even instruction sets and operating systems, PDF as a living platform for its constituent technologies has been remarkably stable: a PDF generated in 1993, when Adobe first released the format, is still a completely functional document today, even using the most modern tools. Few things in computing can make similar claims.
PDF is a living standard in industry and government
While originally created as a published but proprietary Adobe technology, PDF has been an ISO standard (ISO 32000) since 2008, most recently revised in 2020. Domain-specific subsets of the PDF standard (PDF/A, PDF/E, PDF/UA, PDF/VT, and PDF/X, all of which we'll get acquainted with in later posts) have additionally been published more than a dozen times.
Meanwhile, PDF has been enshrined via statute and regulation by many governments around the world as the preferred (and sometimes only) document format for official publications, legal agreements, archives, and so on.
Given all this, I fully expect not just my children to use and rely on PDFs throughout their lives, but (if they come to be) my grandchildren as well. Making a prediction like this about almost any other technology in computing would be foolish, but this one feels quite safe.
Why talk about PDF now?
People often talk about PDF as being "mature", and not in a good way. Many would (and do) call it "boring". I have occasionally been guilty of the same; even though most of my professional life has been built around PDFs, that familiarity (and some occasional shiny-object FOMO) has sometimes bred contempt. Certainly compared to faster-moving lanes of the tech world, PDF might seem like a backwater.
However, like any long-lived infrastructure that's mostly taken for granted, once you lift the cover on PDF, there's a wealth of fascinating history and technologies inside that are worth exploring and learning about:
- As a publishing medium, parts of PDF have their roots in typography and principles of printing that go back to the invention of the printing press.
- As a programmable document format, PDF incorporates lessons learned in programming language theory and information architecture that are not only still relevant, but oftentimes sorely lacking in modern software practices.
- As a container format, PDF incorporates dozens of other technologies to produce complete documents: image formats and rasters, color spaces, font rendering, and encryption and digital signature standards and techniques all play their part.
- As a global standard, PDF has grown up alongside and incorporated the best practices of internationalization, including support for every kind of human script, and right-to-left and bidirectional text.
- And, as common infrastructure that touches every single industry in the world, often in very sensitive places, PDF has a significant impact on how businesses and organizations operate, how they're structured, and how they interact with each other.
There is a beautiful depth to what PDF is, how it works, and the ways it affects us all. It deserves to be more widely understood. And yet, outside of a handful of insider-y industry blogs, there is little to no media dedicated to exploring it. I hope to change that, here…and of course, now is always the best time to start anything.
Why me?
For almost 25 years, I've been building tools to work with PDF documents — creating, "editing", converting, searching, obfuscating, extracting data from, and redacting them. Most of that work has been at Snowtide, where I've been the lead engineer for a loooong time on PDFxStream, a PDF data extraction library for Java and .NET. In the process, I've had the opportunity to work with a wide variety of customers and use cases, seeing PDF used and abused in ways I'm sure the original designers never intended.
As a side hobby, I've also spent a considerable amount of time learning about the history of PDF: the context in which it was initially created, the functional predecessors that informed its design, and so on.
All that experience means that I've seen the best and worst of what PDFs have to offer, in dozens of industries and roles. I know how the PDF format itself is designed, and how to engineer tools to work with it well.
I'm hoping to make the content here at The PDF Minute reflect that experience, so that in time it becomes an accessible educational resource for anyone who wants to better understand how PDF documents work, why they are the way they are, and how they influence and enable large swaths of modern society.
I'll be publishing here weekly, each time exploring one aspect of PDF's history, technologies, or impact on society. Every post will also be available in podcast form, as this one is. I hope you decide to join me on this little side quest, and subscribe — either to the blog via its RSS feed or mailing list, or to the podcast via your favorite podcast app — and follow The PDF Minute on Twitter, Mastodon, and Bluesky to know when new content drops, and to ask your own questions about PDF and related topics, maybe to be discussed in future posts.