Monday, December 10, 2007

Database Skills: Foreign Keys

This is the third in a series of articles that will help you develop your database skills, Since all non-trivial websites require a database, and since database skills are different from coding skills, there is very good reason for any programmer to master the principles of database design and use.

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.

The Example

This week the example continues from the last two weeks. A school has 8 periods in a day that are varying lengths, sometimes an assembly, and of course homeroom. Our assignment is to code a screen that shows the begin and end times of each period depending on what schedule was being followed for the day. We have already decided to "spell it out" by populating a table with the start and end times for each period in each kind of day (the schedule was described in the first essay in the series. Here is a list of about half of the rows in the table:

DAY_TYPE  |  PERIOD | SEQUENCE  |  START    |   END
----------+---------+-----------+-----------+----------
NORMAL    |  HOME   |   0       | 8:00 am   |  8:10 am
NORMAL    |  1      |   1       | 8:15 am   |  9:00 am
NORMAL    |  2      |   2       | 9:05 am   |  9:50 am   
NORMAL    |  3      |   3       | 9:55 am   |  10:40 am
NORMAL    |  4      |   4       | 10:45 am  |  11:30 pm
NORMAL    |  LUNCH  |   5       | 11:35 pm  |  12:20 pm
NORMAL    |  6      |   6       | 12:25 pm  |  1:10 pm
NORMAL    |  7      |   7       | 1:15 pm   |  2:00 pm
NORMAL    |  8      |   8       | 2:05 pm   |  2:50 pm
ALONG     |  HOME   |   0       | 8:00 am   |  8:10 am
ALONG     |  1      |   1       | 8:15 am   |  8:50 am
ALONG     |  2      |   2       | 8:55 am   |  9:30 am
ALONG     |  3      |   3       | 9:35 am   |  10:10 am
ALONG     |  4      |   4       | 10:15am   |  10:50 am
ALONG     |  LUNCH  |   5       | 10:55 am  |  11:30 am
ALONG     |  6      |   6       | 11:35 am  |  12:10 pm
ALONG     |  7      |   7       | 12:15 pm  |  12:50 pm
ALONG     |  8      |   8       | 12:55 pm  |  1:30 pm
ALONG     |  ASMB   |   9       | 1:35 pm   |  2:50 pm
....
.... same for other two schedules
....

Now we will see how a foreign key will help to make sure the above table is correct.

The Foreign Key

A foreign key is used to prevent certain kinds of errors. In our example above we see that the column DAY_TYPE has only two values, NORMAL and ALONG (and ASHORT and ALUNCH that were not dipslayed). If a user were typing in this schedule and accidentally typed 'NRMAL' or 'NORMALL' we want to catch that and prevent it. A foreign key is how we do this. Making the column SCHEDULE_TYPE a foreign key means here is some other table that holds the allowed values. We call that other table the parent table. Such tables are also called lookup tables, reference tables and sometimes leaf tables. Here is the parent table of SCHEDULE_TYPES, which lists the allowed values of SCHEDULE_TYPE.

SCHEDULE_TYPE     |   DESCRIPTION
------------------+--------------------------------
NORMAL            |   Normal Schedule 
ALONG             |   Long Assembly
ASHORT            |   Short Assembly
ALUNCH            |   Short Assembly After Lunch

The column PERIOD will also be a foreign key, and the parent table will look like this:

PERIOD       |   DESCRIPTION
-------------+-----------------------------
HR           |   Homeroom
1            |   Period 1
2            |   Period 2
3            |   Period 3
4            |   Period 4
LUNCH        |   Lunch
6            |   Period 6
7            |   Period 7
8            |   Period 8
ASSM         |   Assembly

Database people say that a foreign key enforces "referential integrity". This term comes from the fact that one column reference another, and this reference ensures the integrity of the data.

The SQL Code

This SQL is illustrative only. Your particular database may require additional commands or syntactical variations.

Creating the two lookup tables is accomplished like so:

CREATE TABLE periods (
     PERIOD char(8)
    ,DESCRIPTION char(25)
    ,PRIMARY KEY (PERIOD)
)
CREATE TABLE day_types (
     DAY_TYPE char(8)
    ,DESCRIPTION char(25)
    ,PRIMARY KEY (DAY_TYPE) 
)

The table of actual periods for each kind of day is created with the code below. I have used type 'character' for the start and end times to keep things simple for now. Note the two foreign keys, one to each table:

CREATE TABLE periods_x_days (
     DAY_TYPE char(8)
    ,PERIOD char(8)
    ,SEQUENCE int
    ,START char(8)
    ,END char(8)
    ,PRIMARY KEY (DAY_TYPE,PERIOD)
    ,FOREIGN KEY (PERIOD) REFERENCES PERIODS(period)
    ,FOREIGN KEY (DAY_TYPE) REFERENCES DAY_TYPES(DAY_TYPE)
)

There is a multiple-column primary key that I snuck in there. By enforcing a primary key on DAY_TYPE and PERIOD, we make sure that there is no accidental double listing for a particular type-period combination.

The Basic Building Blocks

Now that we know how a foreign key works and what it does, we can start to talk about why it is important. In short, the foreign key connects your tables together so that you have a true database. Without the foreign key all you have is a jumbled bag of unrelated tables.

An analogy to code may be in order here. Most of us can list many ways in which bits of code can connect to each other. You can call a function, you can make complex objects via composition, you can make classes via inheritance, along with many others. All of these mechanisms tie bits of code together into a coherent application. The corresponding fundamental way to tie tables together in a database is the foreign key.

So to sum it up, we have seen so far that:

  1. A database programmer by inclination wants to spell everything out in tables.
  2. Each table stores information about exactly one kind of thing, and a primary key makes sure there are no duplications.
  3. Foreign keys define how different kinds of things relate to each other.

The Server Does The Work

Using a foreign key is a good way towards moving your business logic into the database server itself. Once you have defined the foreign keys above, the server will never allow you (or your users) to accidentally or maliciously insert bad data. This command:

INSERT INTO periods_x_days (
    day_type,period,sequence,start,end
)
values (
   'FICTION','99',5,'11:00 pm','11:05 pm'
)

...will throw an error when you try to execute it. The exact wording varies from product to product, but the bottom line is you are fully protected from these errors without doing any coding of your own.

Foreign Keys and The Classic Blunder

Last week we noted that many programmers have the idea that they can combine the PERIODS table and the DAY_TYPES table into a single table, thereby "saving" a table. I did my best to debunk this idea by pointing out that

  1. First, it is actually less efficient to do so because it complicates code and requires at least one extra colum, and
  2. Second, it destroys the value of the primary key column as a unique identifier of a single kind of thing.

We now have a third argument, based on our understanding of the foreign key. If we combine the tables then we can no longer use the combined table as a lookup table for foreign keys. The classic blunder of "saving a table" means the resulting table cannot be tied properly to anything else. This kills the idea off completely, since the entire purpose of creating the tables is to use them as parent tables for foreign keys!

Conclusion: The Building Blocks

When we grind out code, we do not usually expect the customer to give us rigorous specs. They tell us what they need in their own language and we have to convert those needs into functioning code. We use building blocks like classes, functions, conditionals and iterators to make those programs.

Designing a database is the same as designing code insofar as the customer will not be able to design it for you. They will be able to tell you want they want in their own language, but you will have to design the tables. That process of table design is all about identifying primary and foreign keys. Next week we will begin to see how to do this, through the process of normalization.

Other Posts

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.

6 comments:

zippy1981 said...

Ken,

Yet another outstanding article. I hope you continue the series and touch on such things as mapping tables and multi column primary keys.

Subbu said...

This is a great tutorial. I am enjoying it :)

KenDowns said...

subbu: Thanks!

Anonymous said...

Another good article, thanks.

Noticed another little typo. You used "ASMB" for the assemble in the first table, but "ASSM" in the third table.

Louis

Mário Marinato said...

Hello, buddy. This is another great article. You really got the gift to write clearly. Thanks for taking some of your time to do so.

Here's my contribution to the typo-hunting: just before the code to create the third table, you wrote "I have used type 'character' for the start and times". There's an 'END' missing here.

Keep up the good work, man!

Cheers from Brazil

KenDowns said...

Mario, thank you much, I've corrected it.