The stunning power of a data dictionary comes into play once the dictionary contains formulas for calculated values. The dictionary can then be used to generate code, and also to generate documentation. This double-win is not available without the calculations because the resulting docs and database would be incomplete, requiring tedious and error-prone manual completion.
There are links to related essays on normalization and denormalization at the bottom of this post.
This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.
Calculations and Normalization
Before I begin, I will point out that all calculated values stored in a database are denormalizing, they all introduce redundancies. This does not mean they are bad, it just means you need a way to make sure they stay correct (see Keeping Denormalized Values Correct, also see Normalization and Automation). If you cannot keep them correct, they will get very bad very fast. This essay will show you one approach to ensuring calculated values are always correct.
However, before I start, I have to point out how important it is to begin by normalizing your database (to at least 3NF) and adding calculations only upon the strong foundation of a normalized database. If you do not normalize first, you will discover that it is impossible to work up formulas that make any sense -- values will always seem to be not quite where you need them, and it will always seem you need one more kind of calculation to support, and it will be very difficult to write the code generator that gives strong results. But if you build on a normalized database, it turns out you only need a few features in your dictionary and your code generator.
Use Denormalization Patterns
Once you have normalized your database, you will find that your calculations all fall into three basic categories (detailed in April 2008 in Denormalization Patterns). These three patterns are:
- FETCH operations, like copying an item's price from the ITEMS table to the ORDERLINES table.
- EXTEND operations, which are calculations within a row, such as assigning EXTENDED_PRICE the value of QUANTITY * PRICE.
- AGGREGATE operations, like a SUM of the lines of an order to the order header.
This week we will look at the first type of operations, the FETCH operations.
Putting the FETCH Into Your Data Dictionary
So we have an ORDERLINES table, and it contains a PRICE column, and the value of that column should be copied from the ITEMS table. This is an extremely common operation in most database applications, so we decide it would be really cool if we could specify that in the data dictionary and have the code generator take care of it. This would chop a lot of labor off the development process.
Here is how a column like this would appear in my own dictionary format:
table orderlines: description: Order Lines module: sales column price: automation_id: fetch auto_formula: items.price ...more columns...
This looks nice, I have put the formula for the PRICE column into the data dictionary. Now of course I need that formula to get out into the application somehow so that it will always be executed and will never be violated. We will now see how to do that.
The Trigger Approach
When it comes to code generators, if there are ten programmers in a room, there are going to be at least 10 opinions on how to write and use a code generator (the non-programmer boss will also have an opinion, so that makes 11). I have no interest in bashing anybody's approach or trying to list all of the possibilities, so I will stick with the approach I use myself, which is to generate database trigger code. If you want to know why that approach works for me, check out Triggers, Encapsulation and Composition.
When I work on code generators, I begin by manually coding an example of what I'm getting at, so I know it works. The trigger snippet we are looking for must do two things. It must make sure the price is always copied, and it must make sure that no user can subvert the value. This snippet (which is in the PostgreSQL flavor of server-side SQL) does this on an insert:
-- top of trigger.... -- PART 1: Prevent users from subverting the -- the formula by throwing error if they -- try to supply a value: IF new.price IS NOT NULL THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'price,5001,may not be explicitly assigned;'; END IF; -- PART 2: If the value of SKU exists, use it to look -- up the price and copy it into the new row IF new.sku IS NOT NULL THEN SELECT INTO new.price par.price FROM items par WHERE new.sku = par.sku ; END IF; -- more trigger stuff
NOTE! You may notice my trigger code somehow seems to "know" to use the SKU column when searching the ITEMS table, yet my formula did not specify that. I am assuming your data dictionary contains definitions of primary keys and foreign keys, otherwise it is of no real use. I am further assuming that when I see the formula to "FETCH" from the ITEMS table, I can look up the foreign key that matches ORDERLINES to ITEMS and find out what column(s) to use.
The example above works on INSERT operations only. You need a slightly different version for updates, which throws an error if the user attempts to change the price, and which does a new FETCH if the user has changed the SKU value.
IF new.price <> old.price THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'price,5001,may not be explicitly assigned;'; END IF; IF coalesce(new.sku,'') <> coalesce(old.sku,'') THEN SELECT INTO new.price par.price FROM items par WHERE new.sku = par.sku ; END IF;
Sidebar: A Complete Trigger
If you want a teaser on how many amazing things the trigger can do once you've loaded up your dictionary and builder with features, here is a bit of code from a demo application. Most everything in it will get treated in this series on the data dictionary.
CREATE OR REPLACE FUNCTION orderlines_upd_bef_r_f() RETURNS trigger AS $BODY$ DECLARE NotifyList text = ''; ErrorList text = ''; ErrorCount int = 0; AnyInt int; AnyInt2 int; AnyRow RECORD; AnyChar varchar; AnyChar2 varchar; AnyChar3 varchar; AnyChar4 varchar; BEGIN SET search_path TO public; -- 1010 sequence validation IF (new.recnum_ol <> old.recnum_ol) THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'recnum_ol,3002, may not be re-assigned;'; END IF; -- 1010 sequence validation IF (new.skey <> old.skey) THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'skey,3002, may not be re-assigned;'; END IF; -- 3100 PK Change Validation IF new.recnum_ol <> old.recnum_ol THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'recnum_ol,1003,Cannot change value;'; END IF; -- 3100 END IF new.flag_taxable <> old.flag_taxable THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'flag_taxable,5001,may not be explicitly assigned;'; END IF; IF new.price <> old.price THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'price,5001,may not be explicitly assigned;'; END IF; IF coalesce(new.sku,'') <> coalesce(old.sku,'') THEN SELECT INTO new.flag_taxable ,new.price par.flag_taxable ,par.price FROM items par WHERE new.sku = par.sku ; END IF; -- 5000 Extended Columns IF new.amt_retail <> old.amt_retail THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'amt_retail,5002,Cannot assign value directly to column amt_retail ;'; ELSE new.amt_retail = CASE WHEN 1 = 1 THEN new.price*new.qty ELSE 0 END ; END IF; IF new.pct99_discount <> old.pct99_discount THEN IF new.pct99_discount <> (SELECT par.pct99_discount FROM orders par WHERE new.recnum_ord = par.recnum_ord ) THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'pct99_discount,5001,may not be explicitly assigned;'; END IF; END IF; IF new.taxauth <> old.taxauth THEN IF new.taxauth <> (SELECT par.taxauth FROM orders par WHERE new.recnum_ord = par.recnum_ord ) THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'taxauth,5001,may not be explicitly assigned;'; END IF; END IF; IF new.taxpct <> old.taxpct THEN IF new.taxpct <> (SELECT par.taxpct FROM orders par WHERE new.recnum_ord = par.recnum_ord ) THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'taxpct,5001,may not be explicitly assigned;'; END IF; END IF; IF coalesce(new.recnum_ord,0) <> coalesce(old.recnum_ord,0) THEN SELECT INTO new.pct99_discount ,new.taxauth ,new.taxpct par.pct99_discount ,par.taxauth ,par.taxpct FROM orders par WHERE new.recnum_ord = par.recnum_ord ; END IF; -- 5000 Extended Columns IF new.amt_discount <> old.amt_discount THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'amt_discount,5002,Cannot assign value directly to column amt_discount ;'; ELSE new.amt_discount = CASE WHEN 1 = 1 THEN new.amt_retail*new.pct99_discount*.01 ELSE 0 END ; END IF; -- 5000 Extended Columns IF new.amt_net <> old.amt_net THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'amt_net,5002,Cannot assign value directly to column amt_net ;'; ELSE new.amt_net = CASE WHEN 1 = 1 THEN new.amt_retail-new.amt_discount ELSE 0 END ; END IF; -- 5000 Extended Columns IF new.amt_tax <> old.amt_tax THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'amt_tax,5002,Cannot assign value directly to column amt_tax ;'; ELSE new.amt_tax = CASE WHEN new.flag_taxable = 'Y' THEN new.amt_net*new.taxpct*.01 ELSE 0 END ; END IF; -- 5000 Extended Columns IF new.amt_due <> old.amt_due THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'amt_due,5002,Cannot assign value directly to column amt_due ;'; ELSE new.amt_due = CASE WHEN 1 = 1 THEN new.amt_net+new.amt_tax ELSE 0 END ; END IF; -- 7010 Column Constraint new.flag_taxable = UPPER(new.flag_taxable); IF NOT (new.flag_taxable IN ('Y','N')) THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'new.flag_taxable,6001,Column -Taxable- can be either Y or N;'; END IF; -- 8001 Insert/Update Child Validation: NOT NULL IF new.sku IS NULL THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'sku,1005,Required Value;'; END IF; -- 8001 FK Insert/Update Child Validation IF new.sku IS NULL THEN --Error was reported above, not reported again --ErrorCount = ErrorCount + 1; --ErrorList = ErrorList || '*,1005,Foreign key columns may not be null: sku;'; ELSE -- LOCK TABLE items IN EXCLUSIVE MODE; SELECT INTO AnyInt COUNT(*) FROM items par WHERE par.sku = new.sku; IF AnyInt= 0 THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'sku,1006,Please Select Valid Value: ' || new.sku::varchar || ';'; END IF; END IF; -- 8001 Insert/Update Child Validation: NOT NULL IF new.recnum_ord IS NULL THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'recnum_ord,1005,Required Value;'; END IF; -- 8001 FK Insert/Update Child Validation IF new.recnum_ord IS NULL THEN --Error was reported above, not reported again --ErrorCount = ErrorCount + 1; --ErrorList = ErrorList || '*,1005,Foreign key columns may not be null: recnum_ord;'; ELSE -- LOCK TABLE orders IN EXCLUSIVE MODE; SELECT INTO AnyInt COUNT(*) FROM orders par WHERE par.recnum_ord = new.recnum_ord; IF AnyInt= 0 THEN ErrorCount = ErrorCount + 1; ErrorList = ErrorList || 'recnum_ord,1006,Please Select Valid Value: ' || new.recnum_ord::varchar || ';'; END IF; END IF; IF ErrorCount > 0 THEN RAISE EXCEPTION '%',ErrorList; RETURN null; ELSE IF NotifyList <> '' THEN RAISE NOTICE '%',NotifyList; END IF; RETURN new; END IF; END; $BODY$ LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER COST 100; ALTER FUNCTION orderlines_upd_bef_r_f() OWNER TO postgresql;
Variatons on FETCH
I have found two variations on FETCH that have proven very useful in real world applications.
The first I call DISTRIBUTE. It is dangerous because it can be a real performance killer, and turns out you very rarely need it. However, that being said, sometimes you want to copy a value from a parent table down to every row in a child table when the value changes in the parent. The first time I did this was to copy the final score from a GAMES table into a WAGERS table on a fake sports betting site.
The other variation I have found useful is FETCHDEF, my shorthand for "fetch by default." In this variation the user is free to supply a value of their own, but if they do not supply a value then it will be fetched for them.
The Code Generator Itself
As for writing the code generator itself, that is of course far more than I can cover in one blog entry or even 10. Morever, since anybody who decides to do so will do so in their own language and in their own style, there is little to be gained by showing code examples here.
Conclusion: Expand Your Dictionary!
If you make up a data dictionary that only contains structure information like columns and keys, and you write a builder program to build your database, you can get a big win on upgrades and installs. However, you can take that win much farther by adding calculated values to your database and expanding your builder to write trigger code. This week we have seen what it looks like to implement a FETCH calculation in your dictionary and what the resulting trigger code might look like.
Related Essays
This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.
The normalization essays on this blog are:
- Revisiting Normalization and Denormalization
- Pay Me Now Or Pay Me Later
- The Argument for Normalization
- First Normal Form
- Second Normal Form
- Third Normal Form and Calculated Values
- The Argument for Denormalization
- Denormalization Patterns
- Keeping Denormalized Values Correct
- Triggers, Encapsulation and Composition
- The Data Dictionary and Calculations, Part 1 (this essay)
- The Data Dictionary and Calculations, Part 2
4 comments:
It would be helpful if the author provides a working example of the logic above for a popular database such as Northwind (MS SQL Server) in the corresponding flavour of the SQL language. Otherwise it sounds a bit academic and ... not very clear.
But thanks for the effort anyway.
Regards,
Sergei.
I miss this blog! More, please! :)
Sorry Jorge, Unfortunately the blogger is no longer updating this blog.
Essential day to all, open minded you need to have a verifiably magnificent hair styling, you can without a completely striking stretch go to the website where you can pick an organized proficient. I see that it will be useful for you to pick a specialist for his work.
Post a Comment