Welcome to the Database Programmer, the blog for anybody who wants to learn the practical realities of working with databases.
There is a new entry every Monday morning, and the complete table of contents is here. This week we are talking about tools again, specifically upgrades.
The Impermanent Primary Key
Sometimes you get a situation where you have a great possibility for a natural key, except that the key will change from time to time. I call this the "Impermanent Primary Key" and we are going to look at this pattern today.
Magazines and BIPADS
I am currently writing a system for a magazine distributor, while also maintaining his old system for him. When I first started with him he explained that a magazine is identified by a "BIPAD", a 5 digit number unique to the magazine. It seemed to me that if you take a BIPAD, and issue number, and a volume year, you have a pretty good choice for a natural primary key.
Then one day he walked up to me and said, "I have fourteen magazines with new bipads this month." This was a surprise, as I had been fairly sure that a BIPAD and a magazine were uniquely matched for all eternity.
Sidebar: Did the Requirement Change?
Before we get into the design pattern for this, it is worth asking, is this a case where the customer changed the requirement? The answer is a most emphatic no! The customer told me, if I had listened precisely, that a magazine "has a BIPAD". He never said that a magazine has only one unique BIPAD, I inferred that myself. Such inferrences are mistakes, and we cannot blame them on the customer, because it was my job to ask all sides of the question, such as "can a magazine have more than one BIPAD?" or "Is a BIPAD permanent?" or "Can a BIPAD ever be used again on another magazine?"
Spelling Out The Requirements
So the reality is that a BIPAD is assigned to only one magazine, but a magazine may get more than one BIPAD, as the bipad will change from time to time. My customer's operational requirements are:
- The history of individual BIPADs must be preserved in the transaction tables.
- The histories of multiple BIPADs for a single magazine may need to be combined from time to time.
- During a BIPAD change, transactions will continue for both BIPADs.
I should also note that transaction information comes from outside sources and always contains the BIPAD. It is our responsibility to make sure the BIPAD can be mapped into our database correctly. We have no power to ask outside parties to use numbers created or assigned by us.
Some Candidate Solutions
There are a few ways to handle this. Here are four choices that are most likely to come up in a brainstorming session:
- Use BIPAD as a primary key in the BIPADS table, and change the BIPAD as required.
- Create a From-To table that maps old BIPAD values to new values.
- Use an integer primary key in the BIPADS table and make BIPAD just a regular column, so you can just change it as required.
- Make the table of BIPADs a child table to a table of MAGAZINEs. When a BIPAD changes make a new entry in the BIPADs table.
We can eliminate option one immediately, because foreign keys do not allow this possibility. If you have a table of BIPADS, with BIPAD as a primary key, then your transaction tables will have foreign keys to this table. If you could change the value of a BIPAD then suddenly those transactions that reference the old value would not be valid. We say in every day language that you would orphan the transactions. A foreign key exists to prevent such things, so option one is out.
Option two looks pretty reasonable, but it is no good at all. I included option two as an example of what happens when you use a coding mentality to design tables. Option two records accurately an action, the changing of a BIPAD, by recording the old and new values. While this may seem harmless, it makes a wreck out the idea of using primary keys and foreign keys. What is the primary key of such a table? What table and column do the transaction tables reference? When an idea gives problems for the basic building blocks of a database, we know that nothing but trouble is going to follow.
This leaves options three and four, which we will now consider in more detail.
The First Two Requirements in Detail
The first two requirements are that we must be able to examine the individual histories and the combined histories as requested. Options three and four both allow this, as both of them have the BIPAD values in the transaction tables and both options also have a table that can be used to combine histories.
Option four uses the BIPAD values natively in all tables, so there is no need for any special planning or actions when saving data or querying it. If you want to query for a single BIPAD, then specify that. If you want to query for a magazine, then JOIN to the MAGAZINES table and filter on your magazine and there you go. Option four satisfies the requirements with no fanfare.
Option three requires some additional storage. The transaction tables need to keep track of the BIPAD for accuracy, but they also need that meaningless integer foreign key to the table of BIPADS. This always strikes me as funny because the integer key is promoted as a performance method, but the basic reality of a disk drive is that performance goes as the amount of data you have to read and write. Where option four writes only the value of BIPAD, option three must write more data, and therefore will always be slower.
Option three also requires more code and more disk activity. Because we receive transaction data as BIPAD values, but we are using a meaningless number as the foreign key to the BIPADS table, we have to do a lookup into the BIPADS table to find out the meaningless integer key. This means we need a read operation that was not necessary with Option four. Again this is rather ironic since the integer key is promoted as a performance tool.
Finally, the problem of having to look up the value of the BIPAD's integer key introduces application code, so now with Option three I have to write code where for Option four I do not.
But when all is said and done, if your tools lock you into option three and you have no choice, you can in fact satisfy requirements one and two. You wll be able to examine both the individual histories and the combined histories. So we probably have to say so far that it is a matter of taste, tools, and habits.
Disqualifying Option Three
The third requirement is that transactions will occur for both BIPADs during a transition. For option four this is not really an issue because there are two rows in the BIPADS table, one for each BIPAD. Any transaction table that is a child table of BIPADS is automatically covered by the foreign key.
Option three however has a fatal problem. When a transaction comes in with a BIPAD on it, we have to validate it by looking up the BIPAD in the BIPADs table, so we can get the meaningless integer primary key. Except that after a new value is entered the old value will not be there. We cannot satisfy the third requirement and Option three is now disqualified.
Those who are comfortable using integer primary keys for everything may suggest a way to rescue the situation, but because Option four does not require any application code and does not require rescuing, I will leave it to others to rescue option three.
More Details On The Impermanent Primary Key
When we have an impermanent primary key, one that changes from time to time, we can create a pair of tables. The master entities are tracked in the top table, and the child table tracks the impermanent values. This second table is the parent of all of the transactions:
In the example above, which has to do with magazines and BIPADs, the master table I built uses character primary keys, and the child table uses the BIPAD values, so it looks something like this:
MAGAZINE (PK) | DESCRIPTION --------------+-------------------------- TVGUIDE | TV GUIDE COSMO | COSMOPOLITAN ASTORIES | AMAZING STORIES CROSSW | CROSSWORDS | | | | /|\ MAGAZINE | BIPAD (PK) | OTHERS... ---------+------------+------------------- TVGUIDE | 12345 | XXXXX TVGUIDE | 34345 | XXXXX COSMO | 29830 | XXXXX COSMO | 23813 | XXXXX | | | /|\ Transaction Tables have foreign keys to this column
The SQL that would create these two tables might look like this:
CREATE TABLE MAGAZINES ( magazine char(10) ,description varchar(35) ,primary key (magazine) ); CREATE TABLE BIPADS ( magazine char(10) ,bipad char(5) ,primary key (biapd) ,foreign key (magazine) references magazines (magazine) )
Conclusion
The "Impermanent Primary key" pattern occurs when some value is permanent as far as individual transactions are concerned, but may change over the lifetime of the master entity. In these cases, we create a parent-child table pair. The actual master table sits at the very top, with a child table below it that holds the semi-permanent values. All transactions are children of the child table.
Next Essay: The Requirements are Always Wrong, Or, Iterative Database Development.