Sunday, April 19, 2009

The Relational Model

If you look at any system that was born on and for the internet, like Ruby on Rails, or the PHP language, you find an immense wealth of resources on the internet itself, in endless product web sites, blogs, and forums. But when you look for the same comprehensive information on products or ideas that matured before the web you find it is not there. Relational databases stand out as a product family that matured before the internet, and so their representation in cyberspace is very different from the newer stuff.

The Math Stuff

You may have heard relational theorists argue that the strength of relational databases comes from their solid mathematical foundations. Perhaps you have wondered, what does that mean? And why is that good?

To understand this, we have to begin with Edsger W. Dijkstra, a pioneer in the area of computer science with many accomplishments to his name. Dijkstra believed that the best way to develop a system or program was to begin with a mathematical description of the system, and then refine that system into a working program. When the program completely implemented the math, you were finished.

There is a really huge advantage to this approach. If you start out with a mathematical theory of some sort, which presumably has well known behaviors, then the working program will have all of those behaviors and, put simply, everybody will know what to expect of it.

This approach also reduces time wasted on creative efforts to work out how the program should behave. All those decisions collapse intot he simple drive to make the program mimic the math.

A Particular Bit of Math

It so happens that there is a particular body of math known as Relational Theory, which it seemd to E. F. Codd would be a very nice fit for storing business information. In his landmark 1970 paper A Relational Model of Data for Large Shared Data Banks (pdf) he sets out to show how these mathematical things called "relations" have behaviors that would be ideal for storing business models.

If we take the Dijkstra philosophy seriously, which is to build systems based on well-known mathematical theories, and we take Codd's claim that "Relations" match well to business record-keeping needs, the obvious conclusion is that we should build some kind of "Relational" datastore, and so we get the Relational Database systems of today.

So there in a nutshell is why relational theorists are so certain of the virtues of the relational model, it's behaviors are well-known, and if you can build something that matches them, you will have a very predictable system.

They are Still Talking About It

If you want to know more about the actual mathematics, check out the comp.databases.theory Usenet group, or check out Wikipedia's articles on Relational Algebra and Relational Calculus.

A Practical Downside

The downside to all of this comes whenever the mathematical model describes behaviors that are contrary to human goals or simply irrelevant to them. Examples are not hard to find.

When the web exploded in popularity, many programmers found that their greatest data storage needs centered on documents like web pages rather than collections of atomic values like a customer's discount code or credit terms. They found that relational databases were just not that good at storing documents, which only stands to reason because they were never intended to. In theory the model could be stretched, (if the programmer stretched as well), but the programmers could feel in their bones that the fit was not right, and they began searching for something new.

Another example is that of calculated values. If you have shopping cart, you probably have some field "TOTAL" somewhere that stores the final amount due for the customer. It so happens that such a thing violates relational theory, and there are some very bright theorists who will refuse all requests for assistance in getting that value to work, because you have violated their theory. This is probably the most shameful behavior that relational theorists exhibit - a complete refusal to consider extending the model to better reflect real world needs.

The Irony: There are No Relational Databases

The irony of it all is that when programmers set out to build relational systems, they ran into quite a few practical downsides and a sort of consensus was reached to break the model and create the SQL-based databases we have today. In a truly relational system a table would have quite a few more rules on it than we have in our SQL/TABLE based systems of today. But these rules must have seemed impractical or too difficult to implement, and they were scratched.

There is at least one product out there that claims to be truly relational, that is Dataphor.

The Weird Optional Part

Probably the grandest irony in the so-called relational database management systems is that any programmer can completely break the relational model by making bad table designs. If your tables are not normalized, you lose much of the benefits of the relational model, and you are completely free to make lots of non-normalized and de-normalized tables.

Conclusion

I have to admit I have always found the strength of relational databases to be their simplicy and power, and not so much their foundations (even if shaky) in mathematical theory. A modern database is very good at storing data in tabular form, and if you know how to design the tables, you've got a great foundation for a solid application. Going further, I've always found relational theorists to be unhelpful in the extreme in the edge cases where overall application needs are not fully met by the underlying mathematical model. The good news is that the products themselves have all of the power we need, so I left the relational theorists to their debates years ago.