This week I would like to address the assertion that "code is data" and how the application developer might benefit or be harmed by this idea in the practical pursuit of deadlines and functioning code. For some reason my essay written back in May, Minimize Code, Maximize Data got picked up on the blogosphere on Thursday, and comments on ycombinator, on reddit.com, and on the post itself have suggested the thesis is flawed or unworkable because "code is data." Let's take a look at that.
Credit Where Credit Due
I first heard the thesis "Minimize Code, Maximize Data" from A. Neil Pappalardo. I consider it the "best kept secret in programming" because I personally have found it to be almost completely absent from my own day-to-day experience with other programmers.
However, glomek over at reddit.com also credits Eric Raymond with the following quote, "Smart data structures and dumb code works a lot better than the other way around."
Also, sciolizer over on the news.ycombinator.com comments area gives us these quotes from some of the greats:
- Fred Brooks: "Show me your flow charts and conceal your tables and I shall continue to be mystified, show me your tables and I won't usually need your flow charts; they'll be obvious."
- Rob Pike: "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming."
- Eric S. Raymond (again): "Fold knowledge into data, so program logic can be stupid and robust."
- Peter Norvig: "Use data-driven programming, where pattern/action pairs are stored in a table."
And finally, Kragen Javier Sitaker left a comment on my original essay mentioning Tim Berners-Lee and his theory of "least power." You can read a description of that here.
So Why Do They Say Code is Data?
The suprising answer is that code is data, in particular contexts and when trying accomplish certain tasks. The contexts do not include application development, and the tasks do not involve storing of customer information, but the fact remains true for those who work in the right contexts.
As an example, at the bottom layer of the modern computer are the physical devices of CPU and RAM. Both the computer program being executed and the data it operates on are stored in RAM in the same way. This is called the Von Neumann architecture. Its a fascinating study and a programmer can only be improved by understanding it. At this level code is data in the most fundamental ways. There are many many other contexts and tasks for which it is true that code is data.
But we who create applications for customers are separated from Von Neumann by decades. These decades have seen a larger and larger stack of tools that allow us to concentrate on specialized tasks without worrying about how the tools below are doing their jobs. One of the most significant sets of tools that we use allow us to cleanly separate code from data and handle them differently.
The One and Only Difference
Trying to explain the differences between code and data is like trying to explain the differences between a fish and a bicycle. You can get bogged down endlessly explaining the rubber tire of the wheel, which the fish does not even have, or explaining the complexity of the gills, which the bicycle does not even have.
To avoid all of that nonsense I want to go straight to what data is and what code is. The differences after that are apparent.
Data is an inert record of fact. It does nothing but sit there.
A program is the actor, the agent, the power. The application program picks up the data, shakes it, polishes, and puts it down somewhere else (as in picking it up from the db server, transforming it into HTML, and delivering it to a browser).
To repeat: data is facts. Code is actions that operate on facts. The one and only difference is simply that they are not the same thing at all, they are a fish and a bicycle.
Exploiting The Difference
All of the quotes listed above, and my original essay in May on the subject, try to bring home a certain point. This point is simply that the better class of programs are those that begin with a distinction between fact and action, and seek first to organize the facts and only then to plan the actions.
Put another way, it is of enormous practical advantage to the programmer to fully understand that first and always he is manipulating facts (data). If he ignores the principles of how facts are organized and operated on, he can never reach his full abilities as a programmer. Only when he understands how the facts are organized can he see the clearest program designs.
And Again: Understand the facts first. From there design your data structures. After that the algorithms write themselves.
Minimizing and Maximizing
The specific advice to minimize code and maximize data is nothing more than taking the idea to its logical conclusion. If I write program X so that the data structures are paramount, and I find the algorithms to be simple (or "dumb" as ESR would say), easy to write and easy to maintain, don't I want to do that all of the time?
Conclusion
The wise programmer is one who can take the wisdom and theory of the industry and correctly judge what is appropriate and applicable and what is not. This is the programmer who has a shot at keeping focused, making budget and making deadlines. He knows when a generalized routine will support the overall project and when to just code the case at hand and move on.
The unwise programmer is one who cannot properly apply a theoretical concept to the correct context, or cannot judge the context in which a concept is appropriate. He is the one who produces mammoth abstractions, loses sight of the end-goals of the check-signers and end-users, and never seems to be able to make the deadline.
Of all of the advice I have received over the years, one of the most useful and productive has been to "minimize code, maximize data." As an application developer and framework developer it has served me better than most.