Sunday, January 25, 2009

The Data Dictionary and Calculations, Part 2

There are links to related essays on normalization and denormalization at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The Simple Case Is Not Much Help

We will begin by examining a simple case of a shopping cart. We have columns QTY and PRICE, and we want to add the column EXTENDED_PRICE that will contain PRICE * QTY. Our dictionary might look something like this:

table orderlines:
    column price:
        # see last week's essay for details on FETCH
        automation_id: FETCH
        auto_formula: items.price
    # A user-entered value, no automation
    column quantity:
    # The extended price:
    column extended_price:
        automation_id: extend
        auto_formula: price * qty

This seems simple enough, we have specified the formula right in the table definition, and now we are free to make use of that formula in any way we want -- either by generating code or interpreting it at run-time.

Unfortunately it is a bad idea to start coding right now with this example. The problem is that it is too simple, and will lead us down paths that cause great problems when we hit more complex cases. We must begin with a more complex case before we consider how to use this formula in our framework.

Not All Items Are Taxable

Consider the case where you have a shopping cart online and you must collect sales tax, but not all items are taxable. This means you need conditional logic of some sort, you must look at a flag and then decide whether to add tax. Here is a first stab at what this might look like:

table orderlines:
    # We'll skip the details on these columns for now
    column price:    
    column quantity:
    column flag_taxable:
    column taxrate:
    # We need to know the extended amount, that is
    # what we will tax
    column extended_amount:
        automation_id: extend
        auto_formula: price * qty
    # Here is the column that matters
    column tax:
        automation_id: extend
        auto_formula: CASE WHEN flag_taxable = 'Y'
                           THEN taxrate * extended_amount
                           ELSE 0 END

While this looks like a simple enough extension to the first example, it gets us into a thorny decision, the decision between parsing and assembling

Parse Versus Assemble

Before I get into the parse vs. assemble, question, let me pull back and explain why the example bothers me, and why it is worth an entire essay to discuss. In short, we intend to use the dictionary to implement a radical form of DRY - Don't Repeat Yourself (see The Wikipedia article on DRY.) Once we have specified the formula in the dictionary, we want to use it for all code generation and docs generation at very least, but we may also want to refer to the formulas in Java code (or PHP, Python, Ruby etc.) or even in Javascript code on the browser.

But the problem with the example is that it is coded in SQL. In the form I presented, it can be used for generating triggers, but not for anything else, unless you intend to use a parser to split it all up into pieces that can be reassembled for different presentations. The example as written is useful only for a single purpose -- but everything in our dictionary ought to be useful at any layer in the framework for any purpose.

But it gets worse. What if the programmer uses a dialect of SQL aimed for one platform that does not work on another? To guarantee cross-server compatibility, we not only have to parse the phrase, but then re-assemble it.

There is a third argument against the use of SQL expressions. We may be able to parse the expression and satisfy ourselves that it is valid, but that still does not mean it will work -- it may refer to non-existent columns or require typecasts that the programmer did not provide. This leads to one terrible event that you ought to be able to prevent when you use a dictionary: having an upgrade run successfully only to hit a run-time error when somebody uses the system.

A much simpler method is to assemble expressions by having the programmer provide formulas that are already cut up into pieces.

The Assembly Route

So I want to have formulas, including conditionals, and I want to be able to use the formulas in PHP, Java, Javascript, inside of triggers, and I want to be able to generate docs out of them that do not contain code fragments, and I want to be able to guarantee when an upgrade has run that there will be no errors introduced through programmer mistakes in the dictionary. The way to do this is to specify the formulas a little differently:

    column taxable:
            case 00:
                compare: @flag_taxable = Y
                return: @taxrate * @extended_amount
            case 01:
                return: 0

Here are the changes I have made for this version:

  1. The programmer must specify each case in order
  2. Each case is a compare statement followed by a return
  3. A case without a compare is unconditional, it always returns and processing ends
  4. I stuck little '@' signs in front of column names, I will explain those in a moment.

In short, we want the programmer to provide us with the conditional statements already parsed out into little pieces, so when we load them they look like data instead of code. We now have the responsibility for assembling code fragments, but in exchange we have pre-parsed data that can be handed to any programming language and used.

Conclusion: Assembly Means Data

The decision to go the assembly route is simply another example of the Minimize Code, Maximize Data principle. The dictionary itself should be composed entirely of data values, no code snippets should be allowed to sneak in. The reason is simple. No matter what route we follow we will have to validate and assemble the formula - be it for PHP, Javascript, or an alternate database server. But if we let the programmer give us code snippets we have the extra burden of parsing as well. Who needs it?

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The normalization essays on this blog are:


Donald J Organ IV said...

auto_formulas: CASE WHEN flag_taxable = 'Y'
THEN taxrate * extended_amount

Donald J Organ IV said...

Oh sorry meant to ask if that is actually valid in Andromeda

zippy1981 said...

Glad to see your blog is back from the dead. I never realized that CTE's were implemented in databases other than Microsoft SQL server.

oleg said...

Hvis du vil have den maksimale glæde af at spille, så spil . Der er et meget stort antal forskellige spil – både klassiske og nye – alle vil finde noget, der passer til deres smag. Du kan spille både på computeren og fra telefonen, hvilket er meget praktisk. Og bonussystemet er også meget flot. Og vigtigst af alt casinoet har licens. Så jeg råder alle.