Monday, November 26, 2007

Database Skills: Introduction

This is the first in a series of essays that will help you develop your database skills from the beginning to the advanced.

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.

Since all non-trivial websites require a database, there is plenty good reason to master the principles of database programming.

Database skills are very different from coding skills. In many cases, the reflexes and intuition that lead an experienced programmer into great code will just as easily lead his database efforts into one classic blunder after another. To avoid these blunders, and indeed to produce masterful database sites, the programmer must learn something of the database in its own terms.

The recurring theme of these essays will be that good database design makes for good and efficient code, the two ideas do not oppose each other. Another basic thrust of the entire series will be that databases should be learned and understood in their own terms, instead of trying to gloss over the differences between database design and coding.

The Example

During the series we are going to build a school database, and in this first essay we are going to look only at the start and end times for the class periods. Our school administrator, Dr. Ann Clayborne, explains the schedule to us in plain English like so:

Each day is divided into 8 periods, where period 5 is always lunch. The day begins promptly with homeroom at 8:00 am which lasts for 10 minutes. Period one begins at 8:15 am. All periods are 45 minutes and there is a five minute break between each period. On assembly days the periods are 35 minutes each. If we are going to have a short assembly the periods are 40 minutes. Assemblies are at the end of the day and fill out the remainder of day. Sometimes we have the short assembly after lunch.

Figure it Out or Spell it Out?

Our assignment is to code up a screen that displays the start and end times of each period, including homeroom, lunch, and an assembly if necessary. The start and end times of periods will vary based on what kind of day it is.

For this introduction, I want to begin with the different mindsets that can be used to get the job done. In a nutshell, a code grinder is going to figure it out, while a database programmer is going to spell it out. Let us look at how the coder thinks of the problem, and then we will look at how the database programmer thinks of the problems.

The coder says to himself something like, "Well let's see, I'll hardcode homeroom at the start of each day. Then I'll need a loop that goes from one to eight. I'll hardode 8:15 as the start of period one, and depending on the day, I'll keep adding 35, 40, or 45 minutes to figure out the end of the period. Then I'll add 5 minutes to get the start of the next period. If there's an assembly I'll just have to add some if/else stuff in there. Oh, wait, I'll also have to calculate ahead of time the end of the day so I can work out the length of an assembly, since they did not tell me that."

Now let us consider what the database programmer might do. He says to himself, "Hmmm, I think I'll make a table that lists the start and end times of each period for each kind of day. I will put the assemblies and homeroom in there. On the scheduling page I'll just pull the schedule out of the database and put it on the screen."

The coder has cast the problem in terms of actions, which is what programs are all about. This may seem so obvious that there is no reason to bring it up, but but we do need to make this point strongly. No matter what language or orientation (procedural, object, aspect, functional), every program always comes down to a series of actions take in sequence. It is not so much knowing this that is important as remembering this when making design decisions, as we shall see in this series.

The database programmer, on the other hand, has cast the problem in terms of information he can store for later retrieval. Our example, while very simple, illustrates very clearly how different mindsets approach a problem. The coder thinks in terms of a series of actions while the database programmer thinks in terms of storing and retrieving facts. If we need a bumber sticker for this or a T-shirt slogan, we restate our starting point, which is that the coder wants to figure it out while a database programmer wants to spell it out.

Conclusion: the Tables and the Code

So we know that the database programmer is going to create tables and populate them with the data that he can then read out of the screen. A portion of the table would look like so:

NORMAL    |  HOME   |   0       | 8:00 am   |  8:10 am
NORMAL    |  1      |   1       | 8:15 am   |  9:00 am
NORMAL    |  2      |   2       | 9:05 am   |  9:50 am 
NORMAL    |  3      |   3       | 9:55 am   |  10:40 am
NORMAL    |  4      |   4       | 10:45 am  |  11:30 pm
NORMAL    |  LUNCH  |   5       | 11:35 pm  |  12:20 pm
NORMAL    |  6      |   6       | 12:25 pm  |  1:10 pm
NORMAL    |  7      |   7       | 1:15 pm   |  2:00 pm
NORMAL    |  8      |   8       | 2:05 pm   |  2:50 pm
SHORT     |  HOME   |   0       | 8:00 am   |  8:10 am
SHORT     |  1      |   1       | 8:15 am   |  8:50 am
SHORT     |  2      |   2       | 8:55 am   |  9:30 am
SHORT     |  3      |   3       | 9:35 am   |  10:10 am
SHORT     |  4      |   4       | 10:15am   |  10:50 am
SHORT     |  LUNCH  |   5       | 10:55 am  |  11:30 am
SHORT     |  6      |   6       | 11:35 am  |  12:20 pm
SHORT     |  7      |   7       | 12:15 pm  |  12:50 pm
SHORT     |  8      |   8       | 12:55 pm  |  1:30 pm
SHORT     |  ASMB   |   9       | 1:35 pm   |  2:50 pm
.... same for other two schedules

Returning now to the main theme, that good database design makes for good and efficient code, we look at the code that would display the periods for a single day (assume that $DAY is sanitized):

  "SELECT * FROM scheduleperiods
    where day_type=$DAY
    ORDER BY sequence"
foreach($rows as $row) {
  // Display start and end times

I will contend here with little argument that you cannot reduce the code to anything simpler than that. Any program that does not already know the period start and end times will have to figure them out, and will be more complicated, therefore harder to code, and more likely to contain errors.

In the next essay we will look at this table again, along with two other tables, and we will demystify primary and foreign keys.

Appendix: Fixing A Mistake

There is a deliberate mistake in the data listed above. How easy is it to fix, and how does this compare with fixing a mistake in code?

Other Posts

This essay is part of the Keys, Normalization and Denormalization series. There is also a Complete Table of Contents and a Skills-oriented Table Of Contents.


Donald said...

You may want to post the code to SQL_AllRows this way people know how the data gets returned since it is not a natural PHP function.

Chris said...

Thanks for starting this column. I foresee you generating lots of useful links for my web design students.

garyamort said...

The primary key is the key that uniquely identifies a row, and due to changing requirements, it is best to simply set it as an integer which is auto incremented.

Other columns that need to be unique can simply have a unique index placed on the column(s) to ensure uniqueness.

Lastly, * is a bad idea in any code as it depends on the naming convention to be as expected and in the proper order, depending on code. Far better to be explicit, SELECT pid, code, description FROM...

KenDowns said...


With regard to best practice for primary keys, early posts in this series will concentrate on the basic unambiguous and uncontroversial definitions, such as what a primary key actually is. Later on we will get into different schools of thought on best practices for assigning primary keys, that is when I will treat the practice you mention as well as other equally valid practices.

rmarscher said...

I hate to even point this out, because it's so minor... but the SQL_AllRows query is missing it's closing double-quote. Looks like a nice blog though.

KenDowns said...

Rob, thanks, I fixed it.

elharo said...

Surprisingly, not every program comes down to a series of actions take in sequence. This model reflects common imperative programming languages such as Java, C, and Fortran. However functional programming languages such as XSLT and Scheme do not specify any particular sequence for their actions, and indeed the same program may be run in different orders by different compilers/interpreters.

John said...


So XSLT does not *have* to do iterative processing? Is that comment theory or practice, or both?

Can you provide an example of how XSLT is actually used in a way that is not a sequence of actions? I know one way to look at XSLT processing is as a series of SAX events, but I don't see how two different XSLT processors could produce two "different" documents. In fact, I don't see an advantage to this. Why would I want two different XSLT processors to produce two "different" documents? Do I gain anything significant that is worth considering when writing XSLT documents?

Note: "different" is used here such that two documents are "different" if a UNIX diff of each output reveals a line-by-line difference.

elharo said...

I said XSLT might run the statements in a different order. I didn't say it would produce different output because it ran them in a different order.

The key to functional programming is designing the language such that the order of execution of functions does not change the eventual outcome.

For example, consider the composition of two functions f(g(x)) where

f(x) := 2*x
g(x) := 3 - x

You can plug numbers into this function, but it doesn't matter whether you evaluate f or g first. For example, suppose we plug in 7. Evaluating g first we have:

g(7) = 3 - 7 = -4
f(-4) = 2 * -4 = -8

Evaluating f first we have

f(g(7) = 2 * g(7) = 2 * (3 - 7) = 6 - 14 = -8

Comes out the same in both orders.

This order independence comes out naturally as long as the programming language is defined as the composition of pure functions.

KenDowns said...


"The key to functional programming is designing the language such that the order of execution of functions does not change the eventual outcome"

There is much literature along the same lines for "declarative" languages. CSS is declarative for example, while HTML is not.

I mention it because one of Andromeda's
features is the ability to map procedural business logic into declarative business logic.

elharo said...

Yes. That would be a shared characteristic of SQL and functional languages.

However, functional languages are still 3GLs in the sense that an algorithm is provided. The algorithm simply doesn't include any constraints on which parts are carried out when.

4GL declarative languages like SQL go one giant step further: they don't even specify an algorithm. They leave it up to the system to decide how best to find the requested information.

In essence with a 4GL, you're figuring that the database can optimize the query better than you yourself can. In practice, that's true far more often than not.

John said...


Still, you said:

"Surprisingly, not every program comes down to a series of actions take in sequence."

I'm applying your statement to your other statement:

"However functional programming languages such as XSLT [...] do not specify any particular sequence for their actions."

It's clear now that you are referring to the implementor's view-point. The tip-off was the mention of #GLs. That conceptual breakdown is mostly useful from the compiler implementor's point of view. I can't see how it applies to the context Ken is discussing sequences of actions.

If an XSLT document is viewed as a series of SAX events, then the most important thing that matters is the SAX events result in the same transformation regardless of the XML processor. In fact, most solutions for parsing, transforming, validating and querying XML documents use some structural pattern (like an Adapter) to make the application independent of the vendor API. For XML stream processing, an example would be the StAX extension to JAXP.

No matter how you look at an XSLT document, hierarchical structure of the input data necessarily specifies what the output data will look like. Sure, an XSLT for-each can be executed in parallel, but it's output is glued together conceptually as a sequence of actions. A good example of this is MapReduce. That appears to be how Ken is using the idea of "sequence of actions". Ken's talking about data flow.

The only way to object to Ken's view is to point out event-driven programming can have almost arbitrary asynchronous actions. (Some synchronization can be achieved by reserving slots for signals and only firing an event off to all listeners when the signals in these slots form a known state.)

Ken's talking from a programmer's perspective. You appear to be talking from a compiler implementor's perspective.

Mário Marinato said...

Hey, buddy, that's a great blog you've got here. I just started a database administration course and your site will surely help me through the next months. Thanks for taking your time to write.


It look like no one pointed out the mistake on the table you listed, so here I go: I found it on the fifth and sixth lines: 11:30 and 11:35 are listed as 'pm', when they should be 'am'.

In fact, that's really easy to fix.