Welcome to the Database Programmer, the blog for anybody who
wants to learn the practical realities of working
with databases.
There is a new entry every Monday morning, and the
complete table of contents is here. This week we are
talking about tools again, specifically upgrades.
Requirements and Changes
We all write programs for other people to use, a
programmer without his users is like a musician
without an audience. So it goes without saying
that we need to know what our customers want.
The requirements we are given will never be right.
We are destined
to get requirements that are incomplete, incorrect,
contradictory, and impossible. We cannot control this
reality, we can only control our response to it.
Most programming theory is at its base a response to
this reality, so that you have the "waterfall" crowd
trying to get everything right before you start, the
CMM crowd trying to figure out exactly how long it
will take, and
the "agile" crowd saying you'll never get it right so
get started and modify as you go.
On the database side we do not have such a wealth of
theory on how to deal with mistakes (and changes).
This may be because the trend these
days is for theories and frameworks that preach a
gleeful ignorance of database principles, or it may
be because SQL/Relational databases were already firmly
established before the explosion of the web. But whatever
the reason, we have a general need for a basic review
of how to build your database when you know that
the requirements you are given will be full of holes.
The Perfect Requirement
Before we look at best practices, we will spend a few paragraphs
looking at why the requirements will never be right.
For one particular requirement to go completely right,
each of these seven things numbered events below must happen:
- The customer herself must understand the
requirement.
- The customer must consider the requirement worth
mentioning.
- The customer must state the requirement correctly.
- The requirement must get to the programmer
without corruption.
- The programmer must understand it.
- The programmer must design tables correctly.
- The programmer must correctly write supporting code.
These are not impossible, but since human beings are not
perfect, each step will be wrong some percent of the time,
and sometimes more than one will be wrong, and so on.
The last 3 all involve programmer error, and they all point
to the need for your development process to have some
iterative element, because it is inevitable
that somebody will be going back and repeating some work.
Steps 1 and 3 involve customer error. If you are lucky
the error will be bad enough to be noticeable, so that
you can go back to the customer and ask about it before
any tables are built or code is written. In any event,
dealing with these problems is much more about diplomacy
than technical knowledge. On the plus side, since these
things are inevitable, if you build your diplomatic skills
you will win a lot
of good will from your customers, which is worth far more
than technical skills.
Item 4 deserves some special mention. Item 4
is more likely to get screwed up the larger your organization
is. Some schools of thought recognize this and preach
a super-close connection between customers and programmers,
though this is often very difficult to get across to
managers who believe that layers of communication provide
protection against mistakes.
Item 2 is probably the worst. This gets messed up when
some requirement is so much a part of a customer's experience
that it would never occur to them to tell you, anymore than
they would include "remember to keep breathing"
in instructions on how to setup a home theatre system.
(See the earlier
essay on Limited Transactions for an example).
It Only Gets Worse
The seven items listed above are the simplest possible
list of requirements. In a larger company there will be
intermediate steps, each of which is usually meant to introduce
oversight but which usually introduces more errors. This
can only make things worse for the requirements.
Moreover, the guiding philosophy of the technical manager(s)
can introduce serious systematic error. For instance, certain
fashionable ideas like
"Table structure is not that important, just get the
code right first" can take hold of a project and cause huge
amounts of wasted effort because it just ain't so.
There are lots more problems like prima donna programmers,
staff changes, politics, and so on. There is no end to the
number of factors that can interfere with the
correct articulation, communication, and implementation of
requirements.
Iterative Database Development
The experience of decades has led again and again to the
conclusion that software development is iterative,
you will always go back and change the code.
There is no such thing as the finished program (except
maybe for PING), and there is rarely any such thing as the
bug-free program (again except maybe for PING).
But how exactly do we do iterative develpment with a
database? That is what I would like to address now.
We have to begin by looking at the kinds of changes
you can make to a database.
- Trivial Changes Like widening a column
- Additons such as new columns and new
tables.
- Tangled and Difficult changes like correcting
a structure mistake and copying and modifying data
into new locations.
Any sensible person is going to try to avoid #3,
and by contrast, changes of type #1
are no problem if we have tools that support structure changes. This leaves us with the surprising
conclusion that a strategy for iterative database development
is to spend your time making additions. When requirements are
found to be wrong, incomplete, contradictory, or in any other
way messed up, the ideal situation is to add a column or
two, add a table or two, and add some code, without ever having
to touch the existing code.
We will now see how to do this.
Identify Master Tables and Master Transactions
Every program has a core purpose, and at the center of this
purpose will be master entities that have to
be tracked. For an online store these will be at the very
least customers, items, and a cart (header and lines). For
the school management program that I have used in these
essays, these are teachers, students, courses, and
course enrollments.
When you are strongly confident that you have a few master
entities identified, you can start to frame your efforts
around them. You can ask for clarification of customer
requirements in terms of these master entities by asking,
"Is this a property of the customer?" or "Can a student
have more than one major?"
The deliverable at this stage is skeleton tables,
primary keys and
foreign keys. That is, to identify the central entities
and their relationship to each other. This effort is
not so much about identifying the attributes
or properties of these entities, but simply the entities
themselves. Once you have identified the entities, you will
basically spend the the rest of the project adding columns
(and code) as the requirements are worked through. This is
the essence of doing iterative development as a series of
additions.
The great thing about this approach is that if you
miss some major table you rarely have to perform major
surgery on the work you have already done.
If further discussions
reveal a central entity that was missed before, just make
a new table for it. The amazing and cool thing about this
is that you rarely have to change the existing tables
you have already worked up, except perhaps to add a new
foreign key.
Translate Features Into Columns and Tables
Many features can be resolved into columns once you have
the main tables worked out. For instance, on a school
system the specifications may say "every student must
choose a major." A seasoned database veteran immediately
recognizes this as a foreign key. There is a table
of majors (which is definitely related to the table
of departments somehow) and a "MAJOR" column in the
students table.
In terms of iterative development, we can see that it
should be fairly painless to add in a system for majors
without disturbing the rest of the system. We add a
table of MAJORS (probably with a foreign key to the
table of departments), and add a MAJOR column to the
STUDENTS table as a foreign key to the MAJORS table.
To repeat the main theme, the last thing we want to do
is get stuck massaging data or moving it from one table
to another on a live system. By far the
much easier route is to simply add tables or columns.
If you can learn to think of features in terms of the
underlying tables and columns, then you can move towards
always making additions, which are far easier.
Identify Inert Columns
I like to use the term "inert" to refer to columns that
are not involved in any business rules but must be
printed out or displayed from time to time. The easiest
example would be a student's mailing address. This has
nothing to do with her major, her grades, or her
courses, but must be printed occassionally for mailings
or on paperwork.
The main point with inert columns is that you can
add them in whenever you want without affecting
the rest of the system. When the requirements
omit these features they are the easiest to put in
after the fact.
Go For Maximum Detail
Imagine you are asked to make a simple Business-to-Business
system in which customers are shipped their goods and billed
later. The requirements say only that "customers often pay
more than one invoice with a single check and this should be
easy for the user to process."
So you know already that you have a table of customers, and
a table of invoices. Now we have to make a table of
payments. Here is the killer question: does the table of
payments link to invoices or to customers?
If we link payments to customers then we have the easiest
system for the user, they just enter the customer and the
check and we are done. By contrast, linking the payments to the
invoices is much harder because we have to actually create
two tables, a table of PAYMENTS and a table of SPLITS, which
details how a payment is split out to apply to various
payments. Worse, we probably have to add some special
user interface screens to make it easy for the user to
drag-n-drop or otherwise enter a payment and apply it to
their open invoices.
However, the right thing to do is to go for maximum detail,
and apply the payments to the invoices. There are two
major reasons. First, it is more faithful record
keeping. If you do not do it then the customer will
sooner or later ask for the details you have not provided.
Second, horror of horrors, if you have a live system and have
been applying payments to customers, then when you go
over to the other system you will have to move data
around and, worse, you will have to guess at how
to apply already entered data. This is why tables
should always be created with maximum detail in
terms of primary and foreign keys.
Everything Else
We have now seen a basic outline of how to do database
design in such a way that you can do it iteratively
by making mostly additions and avoiding painful
data re-arrangments.
There are more details than will
fit into a single blog entry, but hopefully this
will give the basic idea.
Next week we will begin to see query building. I had
originally expected to do calculated values first,
but upon reflection realized that queries should
come first, because query needs often determine
strategies for putting in calculated values.
Next Essay: Introduction To Queries