Monday, November 26, 2007

Database Skills: Introduction

This is the first in a series of essays that will help you develop your database skills from the beginning to the advanced.

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.

Since all non-trivial websites require a database, there is plenty good reason to master the principles of database programming.

Database skills are very different from coding skills. In many cases, the reflexes and intuition that lead an experienced programmer into great code will just as easily lead his database efforts into one classic blunder after another. To avoid these blunders, and indeed to produce masterful database sites, the programmer must learn something of the database in its own terms.

The recurring theme of these essays will be that good database design makes for good and efficient code, the two ideas do not oppose each other. Another basic thrust of the entire series will be that databases should be learned and understood in their own terms, instead of trying to gloss over the differences between database design and coding.

The Example

During the series we are going to build a school database, and in this first essay we are going to look only at the start and end times for the class periods. Our school administrator, Dr. Ann Clayborne, explains the schedule to us in plain English like so:

Each day is divided into 8 periods, where period 5 is always lunch. The day begins promptly with homeroom at 8:00 am which lasts for 10 minutes. Period one begins at 8:15 am. All periods are 45 minutes and there is a five minute break between each period. On assembly days the periods are 35 minutes each. If we are going to have a short assembly the periods are 40 minutes. Assemblies are at the end of the day and fill out the remainder of day. Sometimes we have the short assembly after lunch.

Figure it Out or Spell it Out?

Our assignment is to code up a screen that displays the start and end times of each period, including homeroom, lunch, and an assembly if necessary. The start and end times of periods will vary based on what kind of day it is.

For this introduction, I want to begin with the different mindsets that can be used to get the job done. In a nutshell, a code grinder is going to figure it out, while a database programmer is going to spell it out. Let us look at how the coder thinks of the problem, and then we will look at how the database programmer thinks of the problems.

The coder says to himself something like, "Well let's see, I'll hardcode homeroom at the start of each day. Then I'll need a loop that goes from one to eight. I'll hardode 8:15 as the start of period one, and depending on the day, I'll keep adding 35, 40, or 45 minutes to figure out the end of the period. Then I'll add 5 minutes to get the start of the next period. If there's an assembly I'll just have to add some if/else stuff in there. Oh, wait, I'll also have to calculate ahead of time the end of the day so I can work out the length of an assembly, since they did not tell me that."

Now let us consider what the database programmer might do. He says to himself, "Hmmm, I think I'll make a table that lists the start and end times of each period for each kind of day. I will put the assemblies and homeroom in there. On the scheduling page I'll just pull the schedule out of the database and put it on the screen."

The coder has cast the problem in terms of actions, which is what programs are all about. This may seem so obvious that there is no reason to bring it up, but but we do need to make this point strongly. No matter what language or orientation (procedural, object, aspect, functional), every program always comes down to a series of actions take in sequence. It is not so much knowing this that is important as remembering this when making design decisions, as we shall see in this series.

The database programmer, on the other hand, has cast the problem in terms of information he can store for later retrieval. Our example, while very simple, illustrates very clearly how different mindsets approach a problem. The coder thinks in terms of a series of actions while the database programmer thinks in terms of storing and retrieving facts. If we need a bumber sticker for this or a T-shirt slogan, we restate our starting point, which is that the coder wants to figure it out while a database programmer wants to spell it out.

Conclusion: the Tables and the Code

So we know that the database programmer is going to create tables and populate them with the data that he can then read out of the screen. A portion of the table would look like so:

NORMAL    |  HOME   |   0       | 8:00 am   |  8:10 am
NORMAL    |  1      |   1       | 8:15 am   |  9:00 am
NORMAL    |  2      |   2       | 9:05 am   |  9:50 am 
NORMAL    |  3      |   3       | 9:55 am   |  10:40 am
NORMAL    |  4      |   4       | 10:45 am  |  11:30 pm
NORMAL    |  LUNCH  |   5       | 11:35 pm  |  12:20 pm
NORMAL    |  6      |   6       | 12:25 pm  |  1:10 pm
NORMAL    |  7      |   7       | 1:15 pm   |  2:00 pm
NORMAL    |  8      |   8       | 2:05 pm   |  2:50 pm
SHORT     |  HOME   |   0       | 8:00 am   |  8:10 am
SHORT     |  1      |   1       | 8:15 am   |  8:50 am
SHORT     |  2      |   2       | 8:55 am   |  9:30 am
SHORT     |  3      |   3       | 9:35 am   |  10:10 am
SHORT     |  4      |   4       | 10:15am   |  10:50 am
SHORT     |  LUNCH  |   5       | 10:55 am  |  11:30 am
SHORT     |  6      |   6       | 11:35 am  |  12:20 pm
SHORT     |  7      |   7       | 12:15 pm  |  12:50 pm
SHORT     |  8      |   8       | 12:55 pm  |  1:30 pm
SHORT     |  ASMB   |   9       | 1:35 pm   |  2:50 pm
.... same for other two schedules

Returning now to the main theme, that good database design makes for good and efficient code, we look at the code that would display the periods for a single day (assume that $DAY is sanitized):

  "SELECT * FROM scheduleperiods
    where day_type=$DAY
    ORDER BY sequence"
foreach($rows as $row) {
  // Display start and end times

I will contend here with little argument that you cannot reduce the code to anything simpler than that. Any program that does not already know the period start and end times will have to figure them out, and will be more complicated, therefore harder to code, and more likely to contain errors.

In the next essay we will look at this table again, along with two other tables, and we will demystify primary and foreign keys.

Appendix: Fixing A Mistake

There is a deliberate mistake in the data listed above. How easy is it to fix, and how does this compare with fixing a mistake in code?

Other Posts

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.