Welcome to the Database Programmer. This is a blog for anybody who wants to learn about databases. I try to make it simple and easy to read without dumbing it down. Most programmers these days work on web sites of one sort or another, and since all non-trivial websites require a database, there is very good reason to learn how they work.
There is a new entry every Monday morning, and the complete table of contents is here. The overall theme right now is that good table design leads to tight and efficient code. Last week's entry gave a collection of guiding rules for crafting your primary keys, and now we can turn to the concept of Design Patterns in Table Design. Table Design Patterns are recurring patterns in tables and the relationships between them. If you learn what they are, you can learn to recognize these patterns in the requirements that users give you.
Most database programmers are comfortable with the idea of Table Design Patterns, and have their own rules they have worked out through experience. The patterns I present here are well known amongst database programmers, my only contribution is to explain them and to try over time to put them all in one place.
The Example: A School Class Schedule
In this series we are using the example of an application that manages a school of a few dozen faculty and few hundred or perhaps a couple thousand students. Each year the school administration must make up the actual class assignments for each teacher, including what classroom and period each class will be taught in. Then students must be assigned into the classes.
There are five rules that must be followed:
- A teacher must be qualified in advance for each course they teach.
- No two classes can be given in the same classroom in the same period.
- No teacher can be in two places at once, a teacher can only teach one class in one period.
- No student can be in two places at once, so no student can be in more than one class in the same period.
- A student cannot take the same class again after passing the class once.
All five rules can be handled with some smart table design and we will not need any application code.
Next week we will look at rules 2 - 4, and the week after that we will tackle rule 5. This week we look at Rule 1.
Rule One and The Cross Reference Validation Pattern
When a user gives us their requirements, they will not express them as as computer programs or table designs. The user will express his needs in his own terms, expecting us to translate those terms into tables and programs. So our job is to translate Rule 1 into one or more tables with proper primary and foreign keys.
When we look at rule one we see that it splits into two requirements:
- Subrule A: There must exist of a list of what course a teacher is qualified to teach.
- Subrule B: All actual course assignments must match to the list of allowed (or qualified) classes.
When I see these two requirements together I automatically think "validate against a cross reference." I call this the "Cross Reference Validation" pattern. It is different from a simple foreign key validation. Let's go through it and see why.
We can start start at subrule B, because it is easy to recognize. Any time some entry in a table must match an entry in another table, that means a foreign key.
Now we want to work out what the parent table looks like. Is it a simple master table with a one-column character key? Is it a transaction table? We answer this buy drilling deeper into Subrule A:
- Assumption 1: There is a list of teachers
- Assumption 2: There is a list of courses
- Assumption 3: There will be a list of teachers and courses together.
So the parent table must be a cross reference because each entry will list a teacher and a course. It should go without saying that we will have a table of tgeachers and another table of courses, so the table of qualifications must be a cross reference between these two.
Here is a picture of the Cross Reference Validation pattern:
CLASSROOM | PERIOD | COURSE | TEACHER | STUDENT | ASSIGN_ID ----------+------------+----------+----------+----------+----------- XXX | XXXX | XX | XXX | XXX | 1 XXX | XXXX | XX | XXX | XXX | 2 XXX | XXXX | XX | XXX | XXX | 3 XXX | XXXX | XX | XXX | XXX | 4 ----------+------------+----------+----------+----------+----------- | | | | Use A foreign | | key to make sure | | teacher is qualified | | for each course | | COURSE | TEACHER -----------+-------- XXX | XXXX XXX | XXXX
Here is the SQL to create these tables. I have left out anything not directly related to our Cross Reference Validation pattern:
-- Create the bottom table: allowed courses by teacher CREATE TABLE courses_x_teachers ( teacher char(10) ,course char(10) ,foreign key (teacher) references teachers(teacher) ,foreign key (courses) references courses(course) ,primary key (teacher,course) ); -- Create the main table (assumes we have already created -- teh TEACHERS table, the CLASSROOMS table and so forth CREATE TABLE enrollment ( classroom char(5) ,period char(5) ,course char(10) ,teacher char(10) ,student int -- this is a trx table, so use an integer ID for pk ,assign_id int IDENTITY ,primary key (assign_id) -- The cross reference validation pattern needs a foreign key ,foreign key (teacher,course) references courses_x_teachers (teacher,course)
Another Comment On Integer Keys
In last week's essay we saw that the rule of thumb for cross-references is to use a multi-column primary key and not to use an integer key. It should be obvious now why we do that. If we used an integer key, then it could not help us!
The tables as described above completely satisfy the business requirement: no entry is allowed if the teacher is not approved for that course. But an integer key provides no such value, it is useless. If you use an integer key for the cross-reference table, then you are required to write extra program code to "chase" the key out to the cross-reference and make sure the values for teacher and course match what you have in the child table.
Foreign Keys and Performance
One time I was giving a presentation and somebody in the audience said they had been told not to use foreign keys because they reduced performance. To me this was like saying don't put gas in your car because it weighs down the car and reduces mileage. The question was such a shock and so unexpected that I had no decent answer.
Since that time I have discovered that this kind of nonsense is actually prevelant these days. Such advice may make sense in cases where the data structure is trivial (at most 7-10 tables) and the presentation is paramount, but in any serious application such advice is ridiculous. Here is the real scoop.
Somehow, some way, your code must ensure that every class assignment made in the above example is valid. This means that the check for teacher-course validity must be made. If you do not perform this check then your software is structurally unsound and you will have bad data, with no way to stop it and you will always be patching live systems in crisis mode.
So because you must make the checks, we can ask, which is better, to make the checks in program code or using the declarative constraints in the table definition? This is easy to answer. Without the foreign key, our program must make one round trip to the server to make the check, and then a second round trip to do the insert. But if you use foreign keys you only have to make one round trip. Since the overhead of making a trip to the server is often more than the time spent executing on the server, the foreign key solution will often perform twice as well as the application code solution.
In short, if you're worried about performance, use foreign keys, not application code.
Architecture Note: Server-Side Errors
Many programmers are taught not to use foreign keys and so they are not used to the idea that the database server will throw errors. Once you start using the database for all that it is worth, you will depend more and more on the errors it throws, so you want to make sure your framework can read them and report them to the user the same way it reports your application-generated errors.
Conclusion: Learn to Recognize Foreign Keys
User requirements will never be expressed as program code or table design, but we can recognize common patterns in them. One of those patterns is the Cross Reference Validation pattern, which we implement with a foreign key into a cross reference table. This and other patterns will stand out if we examine user requirements with an aim to identifying:
- Lists of things you are keeping track of. These go into master tables, like courses and teachers.
- Relationships between those master items, like a list of teacher-course qualifications.
- Restrictions on how things can interact, so that a teacher must be qualified to teach courses means there will be a foreign key somehow into that teacher-course cross reference from some other table.
Next week we will examine Rules 2-4 and find out more about how unique constraints and their associated patterns can reduce the amount of code in our applications.