Monday, February 11, 2008

The Framework And The Database

Welcome to the Database Programmer. This is a regular blog for anybody who wants practical information about how databases work and how your database and application code work together.

There is a new entry every Monday morning, and the complete table of contents is here. This week we are going to depart from our discussion of Table Design Patterns to get into the area of frameworks and Tools.

The Trouble With Deep Nesting

Today's example deals with a situation of "deep nesting", where one table has not just a parent, but a grandparent and perhaps a great-grandparent. This case is going to bring into clear light one of the basic realities of writing database applications, which is that the tools and framework you use matter a lot. Complex and deeply nested schemas really lay bare how a framework supports the use of a database.

The trouble becomes plain to see when you have dived deeply into a complete application and are starting to spell out a large number of relationships between entities. Orthodox database theory will lead you first to simple primary keys and foreign keys, but from there it will lead you to multi-column primary and foreign keys and overlapping unique keys and overlapping foreign keys. Your choice of framework and other tools can have a dramatic effect on the choices you have in table design.

The Example: A School Class Schedule

In the past three weeks we have looked at the enrollment of students in courses and the assignments of teachers to teach those courses. In these three weeks we have been working through five rules:

  1. A teacher may not teach a class he/she is not qualified to teach.
  2. No two classes can be given in the same classroom in the same period.
  3. No teacher can be in two places at once, a teacher can only teach one class in one period.
  4. No student can be in two places at once, so no student can be in more than one class in the same period.
  5. A student cannot take the same class again after passing the class once.

In each of the past three weeks we have looked at just one or two rules. Now we are going to look at all of them together, including Rule 4 which so far we have not looked at.

For the Sake Of Argument: Completely Normalized

We will start by looking at a schema that has no integer primary keys and uses only natural keys. This is a very useful learning exercise even if you do not do things this way yourself.

  • Complex multi-table schemas stretch your table design skills so they improve your overall abilities.
  • Normalized tables that enforce all rules with primary keys and foreign keys require zero application code to maintain correctness, so if our framework supports the normalized schema then that is the way we want to go.
  • Following up with the previous point, if our framework does not make it easy for us to use the normalized tables, the exercise allows us to see what we are giving up (in terms of having to write more application code) for whatever benefits the framework offers.
  • The completed normalized schema is a solid reference point for whatever design we actually choose or are forced to choose. Having it available may be useful in resolving difficulties that come up if we are forced to abandon it by inadequate tools.

With that being said, here is the completely normalized schema where five rules are enforced solely through primary keys and foreign keys:

COURSES          TEACHERS
---------        ---------
 XXXXX            XXXXXX
 XXXXX            XXXXXX
  |                  |
  |         +--------+
  |         |
 /|\       /|\
COURSE | TEACHER         Teacher qualifications table
-------+---------        Primary key: course + teacher
XXXXX  | XXXXX
   |      |
   +--+---+ 
      |
      |Rule 1: Teachers must be qualified for each course
      |
     /|\             +--------+-------+---------> Rule 2: Only one course
  ===========        |        |       |           per classroom
COURSE | TEACHER | PERIOD | YEAR | CLASSROOM  
-------+---------+--------+------+-----------   
 XXXX  | XXXXX   | XXXXX  | XXXX | XXXXX    
 XXXX  | XXXXX   | XXXXX  | XXXX | XXXXX
 XXXX  | XXXXX   | XXXXX  | XXXX | XXXXX
 XXXX  | XXXXX   | XXXXX  | XXXX | XXXXX   
           |         |        |
           +---------+--------+-----------> Rule 3: Teacher cannot be
                                            in two places at the 
   |                  |       |      |      same time.
   |                  |       |      |
   +----------------+-+-------+------+
                    |            Primary key describes 
                    |            the actual unique entry,
                    |            4 column primary key!!
                    |
                    |
                    |  Unstated Rule: We never said this anywhere 
                    |  but a student can only enroll in 
                    |  classes that are actually offered
                   /|\          
   ===========================
COURSE | PERIOD | YEAR | CLASSROOM | STUDENT
-------+--------+------+-----------+---------
 XXXX  | XXXXX  | XXX  | XXXX      | XXXXX    
 XXXX  | XXXXX  | XXX  | XXXX      | XXXXX
 XXXX  | XXXXX  | XXX  | XXXX      | XXXXX
 XXXX  | XXXXX  | XXX  | XXXX      | XXXXX
   |       |       |                   |
   |       +-------+-------------------+---> PRIMARY KEY  
   |                                         Rule 4: Student cannot be
   |                                   |     in two places at once
   +-----------------------------------+     
                                       |
                                       |
                      Rule 5: Student may not take         
                      a course more than once.

Wow! That is quite a bit more complicated than anything we have seen before in these essays. Yet for all of its complexity it is composed entirely of unique constraints, primary keys and foreign keys. Because a primary key is functionally the same as a unique constraint, we have satisfied all five rules with only two building blocks.

So we are now ready to look what this may look like if we go with integer primary keys. Then we will be ready to compare them.

Looking at it With Integer Keys

I am going to take just the bottom two tables for this example, because the changes for those tables will be the same as the others. I am also concentrating only on rules 4 and 5, since the pattern repeats for the others. These two tables might look like this:

ID | COURSE | TEACHER | PERIOD | YEAR | CLASSROOM  
---+--------+---------+--------+------+-----------   
 1 | XXXX   | XXXXX   | XXXXX  | XXXX | XXXXX    
 2 | XXXX   | XXXXX   | XXXXX  | XXXX | XXXXX
 3 | XXXX   | XXXXX   | XXXXX  | XXXX | XXXXX
 4 | XXXX   | XXXXX   | XXXXX  | XXXX | XXXXX   
 |     |                  |       |
 |     |                  |       |   
 |     |        +---------+       |  These three columns
 |     |        |        +--------+  must be copied??
 |     |        |        |
 |    \|/      \|/      \|/
ID | COURSE | PERIOD | YEAR | STUDENT
---+--------+--------+------+---------
 2 | XXXX   | XXXXX  | XXX  | XXXXX    
 2 | XXXX   | XXXXX  | XXX  | XXXXX
 3 | XXXX   | XXXXX  | XXX  | XXXXX
 3 | XXXX   | XXXXX  | XXX  | XXXXX
      |        |        |       | |
      |        +--------+---------+---> Rule 4: Student cannot
      |                         |       be in two places at once
      |                         |                                  
      +-------------------------+            
                                |       
                                |       
                      Rule 5: Student may not take         
                      a course more than once.

In this case the child entries need only an ID value from the parent and a value for STUDENT. We can still use unique constraints to enforce rules 4 and 5 if we copy values from parent to child.

Only One Difference Between Them

There is only one difference between these two approaches. The normalized approach requires no application code to maintain correctness, but the non-normalized integer key requires applicaton code to enforce rules four and five. This application code can take one of two approaches:

  1. Copy Values from parent table to child and use unique constraints enforced by the server, which is how I've depicted it, or
  2. Check Rules 4 and 5 in Code before an insert to the child table. There will be other considerations like possibly disallowing changes to PERIOD or CLASSROOM once the schedule is populated, otherwise you have to recheck every student when these change.

Because the normalized version requires less applicaton code, that would be my choice, barring any downstream difficulties that would make it impractical. Before today I have never in these essays raised the issue of possible problems with "downstream code." Most of use libraries of code called "frameworks", some of them taken from the internet and some of them homegrown. I now raise this issue of frameworks because we have now seen quite a few examples and a very strong pattern is emerging. The pattern is that many business requirements can be cast as unique keys and foreign keys in normalized tables. Normalized tables require less code and keys require no code. If this pattern continues as we see more Table Design Patterns, it will become more and more important to make sure the framework can support this approach.

Why The Framework Matters

Let us look at a very common user interface issue, filling in values from lookup tables. Consider the case where the complete schedule has been generated by a custom program, but now a school administrator must make a manual change for a particular student. The administrator will need some editing screen where she brings up the student's schedule, drops some course, and adds another.

This operation requires that the administrator be shown a list of available courses. A few years ago we would have done this with an HTML SELECT, but nowadays we probably have some fancy Ajax-ified DIV-based list doing it. But the result is the same: when the user has made her selection, the values are filled in.

Now at this point, if the framework does not know how to handle multi-column foreign keys, the whole normalized idea is out the window. Because it is true that a well-designed database is no good without a user interface, and because we usually depend on the framework to provide the user interface, the choice of framework suddenly seems very serious indeed.

Conclusion: The Importance of Tools

From the beginning of this series in November 2007 until this essay, February 12 2008, I have shown examples of how normalized tables lead to less code and easier code. Further, I have shown examples of how different table design patterns can be used to enforce business rules with the two basic tools of a unique constraint and a foreign key.

Now however we have seen enough examples and learned enough basics that we have to begin to ask about the layers above the database. Do they support our decisions in the database layer and can they implement those decisions? Or will they get in the way and cause us to compromise our table designs and end up with more application code that is complicated and error prone?

So now it is time to expand the original statement that 'good tables lead to good applications' and say as well that 'a good framework enhances (or wipes out) the strengths of the table design'.

Next week we are going to continue to examine how much tools matter by looking at the process of modifying table structures. Many frameworks and philosophies that are out there today are based consciously or subconsciously on a simple desire to try to avoid changing table structures. However, since table structure changes are a fact of life, we will look at how best to handle them.

5 comments:

Bruno Martinez said...

I didn't understand the schema based on oids. There are repeated ids in the example.

KenDowns said...

Bruno: I should have given them separate names in the two tables instead of "ID" in both. They are meant to represent the commonly-held but counter-productive idea that an integer pk on every table is the easiest way to go.

Naresh Dharmi Vatsal said...

why we have more than one foreign keys in particular table related to Parent table same primary key.
What is benefit of this approach?
i am posting table structure below

CREATE TABLE "VW_VISA_RECORDS"
( "ID" NUMBER NOT NULL ENABLE,
"PER_ID_CONTACT_PERSON" NUMBER,

//Here CMS_PERSONS table id is referred 2 times in this table


CONSTRAINT "CMS_VVI_PER_VWU_FK" FOREIGN KEY ("VWU_PER_ID")
REFERENCES "CMS_PERSONS" ("ID") ON DELETE CASCADE ENABLE,


CONSTRAINT "CMS_VVI_PER_CONTACT_FK" FOREIGN KEY ("PER_ID_CONTACT_PERSON")
REFERENCES ""CMS_PERSONS" ("ID") ON DELETE CASCADE ENABLE,

KenDowns said...

@naresh, I understand your schema, but not your question. What are you asking?

Rolfen said...

How is the last rule enforced, ie
"The student cannot pass the class more than once" (but he can fail it more than once...)?