Sunday, March 23, 2008

Of Tables and Constraints

Hello and welcome to the Database programmer blog. This is a blog for anybody who wants to learn about databases on their own terms and find out how database decisions affect their applications.

Constraints are probably the least talked about topic in application and database design. I suspect that there is something deep in the souls of programmers that just does not like to talk about limitations. We like to talk about flexibility, extensions, abstraction and so forth. Constraints rub us wrong. But of course we also love to write absolute rules for ourselves and others (thou shalt always use integer primary keys, thou shalt not use HTML TABLE elements, and so on). So go figure.

If you want to make a deadline, you have to limit what you do. You cannot make a list a kilometer long and finish it in a day. By the same principle, if a database is going to accurately store data, there must be rules about what makes the data correct. These rules we call constraints.

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.

Table Structure is The First Constraint

Table structure is not usually listed as a constraint, but your table structures represent the first and most basic constraints on your application.

The table structure can be pictured as the mission statement of the application. If you have a table of students, we can safely assume you must record certain facts about students. Likewise, if you do not have a table of Nobel Prize Winners, we can safely assume that Novel Laureates are outside the scope of your application.

This basic fact, that table structure represents a codification of design decisions, is worth getting into a little deeper.

Where Do You Want The Zebra?

The importance of table structures can be seen with an analogy. Imagine you purchase a building and some printing equipment, and then you hire some people and create a printing shop. Somebody comes in and says, "Where do you want this zebra?" The answer is that you have no need or use for a zebra in a printing shop, and it will actually get in the way and cause problems. So the answer is "Not here!" There is a constraint on the use of your building, which is that wild game is not welcome.

Database tables represent that decision to commit to a certain task or collection of tasks. They constrain the end users to perform the mission of the company, and not some other mission.

Sometimes the more naive programmers will not give up on this point, and they will respond with something like "Well, yes, but computers do not have the limitations of physical systems, they are so much more flexible." This is true but wrong. A computer language like C++ is incredibly flexible in its potential, but that potential is worth nothing until somebody sits down and writes code to do a specific task, at which point that code is good only for that task. In other words, the general purpose value of the computer is only realized when dedicated to a specific purpose. Likewise, a database server is in principle capable of storing data about anything, but is only useful once decisions have been made to create specific tables to store specific things.

To say it one more time, table designs are constraints because they say what will be tracked and what will not be tracked.

Primary Keys, Unique Constraints and Foreign Keys

The next category of constraints have been well discussed on this blog, and I do not want to spend too much time on them here, but we will do a quick review.

Your table designs are the foundation of your application. In simple terms, a well normalized database will have a table for every different kind of thing that must be tracked (students, courses, classrooms, etc). In addition, there are always plenty of cross references that list relationships between these primary entries (teacher to courses they are qualified to teach), and transactions that represent interactions between these (which courses a student has completed).

The primary key makes sure that there are no duplicates, and the foreign key ensures that data stored in separate tables makes sense. For instance, imagine a table that lists which courses each teacher is qualifed to teach. It would not make much sense to list teachers in this table that are not on the faculty, and the foreign key prevents these kinds of mistakes.

As a veteran database designer I am still often astonished at how many seemingly complicated and unique business requirements reduce to a simple collection of tables, with their primary and foreign keys. I have seen this happen so many times that I am now automatically skeptical when somebody tells me they have 'special needs' that probably will not fit into simple tables. They always end up fitting into tables.

Check Constraints

Once you have your tables and their keys, there is one more kind of constraint you can use, which is called the CHECK constraint. A check constraint is nothing more than some SQL expression that must always be true, and is attached either to a column or a table.

MySQL does not support CHECK constraints as of version 5.0. This is one of the reasons why users of other database systems often say nasty things about MySQL.

A typical example for a check constraint is that a discount price must be lower than a regular price. Another use is to make sure that a number is between one and 100, so that it can be used as a percentage. This bit of SQL illustrates these ideas in the PostgreSQL dialect:

CREATE SEQUENCE orderlines_line
CREATE TABLE ORDERLINES (
   order int references orders (order)
   ,line int DEFAULT nextval('orderlines_line')
   ,sku char(10) references items (sku)
   ,price numeric(10,2)
   ,discount_price numeric(10,2) CHECK (discount_price < price)
   ,tax_pct numeric(3) CHECK (tax_pct >= 0 AND tax_pct <= 100)
)

Like all server-side technologies, the CHECK constraint can greatly reduce the amount of client-side code you have to write. Going further, the CHECK constraint makes your application easier to port to other languages, if for no other reason than there is less code to port.

More Complex Constraints

Sometimes constraints are more complex than can be expressed with a CHECK constraint. In these cases you can attach a "trigger" to a table that fires when a row is written. The trigger can modify the data being written or prevent the write from happening. Triggers are a large topic so they will be treated separately in other essays. Like all other computer technologies, triggers have plenty of fans who quietly use them to great affect, while there are plenty of noisy types telling you your nose will fall off if you use them.

Conclusion

This week we have seen that constraints are used to help to ensure that all data going into a database is correct. Constraints begin with the basic decisions about tables themselves, and then go through the selection of keys, and can be more complex CHECK constraints or even be completely general purpose code-based rules enforced in Triggers.

Other Posts

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.

No comments: