Sunday, January 25, 2009

The Data Dictionary and Calculations, Part 2

There are links to related essays on normalization and denormalization at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The Simple Case Is Not Much Help

We will begin by examining a simple case of a shopping cart. We have columns QTY and PRICE, and we want to add the column EXTENDED_PRICE that will contain PRICE * QTY. Our dictionary might look something like this:

table orderlines:
    column price:
        # see last week's essay for details on FETCH
        automation_id: FETCH
        auto_formula: items.price
    # A user-entered value, no automation
    column quantity:
    # The extended price:
    column extended_price:
        automation_id: extend
        auto_formula: price * qty

This seems simple enough, we have specified the formula right in the table definition, and now we are free to make use of that formula in any way we want -- either by generating code or interpreting it at run-time.

Unfortunately it is a bad idea to start coding right now with this example. The problem is that it is too simple, and will lead us down paths that cause great problems when we hit more complex cases. We must begin with a more complex case before we consider how to use this formula in our framework.

Not All Items Are Taxable

Consider the case where you have a shopping cart online and you must collect sales tax, but not all items are taxable. This means you need conditional logic of some sort, you must look at a flag and then decide whether to add tax. Here is a first stab at what this might look like:

table orderlines:
    # We'll skip the details on these columns for now
    column price:    
    column quantity:
    column flag_taxable:
    column taxrate:
    # We need to know the extended amount, that is
    # what we will tax
    column extended_amount:
        automation_id: extend
        auto_formula: price * qty
    # Here is the column that matters
    column tax:
        automation_id: extend
        auto_formula: CASE WHEN flag_taxable = 'Y'
                           THEN taxrate * extended_amount
                           ELSE 0 END

While this looks like a simple enough extension to the first example, it gets us into a thorny decision, the decision between parsing and assembling

Parse Versus Assemble

Before I get into the parse vs. assemble, question, let me pull back and explain why the example bothers me, and why it is worth an entire essay to discuss. In short, we intend to use the dictionary to implement a radical form of DRY - Don't Repeat Yourself (see The Wikipedia article on DRY.) Once we have specified the formula in the dictionary, we want to use it for all code generation and docs generation at very least, but we may also want to refer to the formulas in Java code (or PHP, Python, Ruby etc.) or even in Javascript code on the browser.

But the problem with the example is that it is coded in SQL. In the form I presented, it can be used for generating triggers, but not for anything else, unless you intend to use a parser to split it all up into pieces that can be reassembled for different presentations. The example as written is useful only for a single purpose -- but everything in our dictionary ought to be useful at any layer in the framework for any purpose.

But it gets worse. What if the programmer uses a dialect of SQL aimed for one platform that does not work on another? To guarantee cross-server compatibility, we not only have to parse the phrase, but then re-assemble it.

There is a third argument against the use of SQL expressions. We may be able to parse the expression and satisfy ourselves that it is valid, but that still does not mean it will work -- it may refer to non-existent columns or require typecasts that the programmer did not provide. This leads to one terrible event that you ought to be able to prevent when you use a dictionary: having an upgrade run successfully only to hit a run-time error when somebody uses the system.

A much simpler method is to assemble expressions by having the programmer provide formulas that are already cut up into pieces.

The Assembly Route

So I want to have formulas, including conditionals, and I want to be able to use the formulas in PHP, Java, Javascript, inside of triggers, and I want to be able to generate docs out of them that do not contain code fragments, and I want to be able to guarantee when an upgrade has run that there will be no errors introduced through programmer mistakes in the dictionary. The way to do this is to specify the formulas a little differently:

    column taxable:
            case 00:
                compare: @flag_taxable = Y
                return: @taxrate * @extended_amount
            case 01:
                return: 0

Here are the changes I have made for this version:

  1. The programmer must specify each case in order
  2. Each case is a compare statement followed by a return
  3. A case without a compare is unconditional, it always returns and processing ends
  4. I stuck little '@' signs in front of column names, I will explain those in a moment.

In short, we want the programmer to provide us with the conditional statements already parsed out into little pieces, so when we load them they look like data instead of code. We now have the responsibility for assembling code fragments, but in exchange we have pre-parsed data that can be handed to any programming language and used.

Conclusion: Assembly Means Data

The decision to go the assembly route is simply another example of the Minimize Code, Maximize Data principle. The dictionary itself should be composed entirely of data values, no code snippets should be allowed to sneak in. The reason is simple. No matter what route we follow we will have to validate and assemble the formula - be it for PHP, Javascript, or an alternate database server. But if we let the programmer give us code snippets we have the extra burden of parsing as well. Who needs it?

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The normalization essays on this blog are:


Donald J Organ IV said...

auto_formulas: CASE WHEN flag_taxable = 'Y'
THEN taxrate * extended_amount

Donald J Organ IV said...

Oh sorry meant to ask if that is actually valid in Andromeda

zippy1981 said...

Glad to see your blog is back from the dead. I never realized that CTE's were implemented in databases other than Microsoft SQL server.

The Marketer said...

I enjoyed your blog Thanks for sharing such an informative post. We are also providing the best services click on below links to visit our website.

digital marketing company in nagercoil
digital marketing services in nagercoil
digital marketing agency in nagercoil
best marketing services in nagercoil
SEO company in nagercoil
SEO services in nagercoil
social media marketing in nagercoil
social media company in nagercoil
PPC services in nagercoil
digital marketing company in velachery
digital marketing company in velachery
digital marketing services in velachery
digital marketing agency in velachery
SEO company in velachery
SEO services in velachery
social media marketing in velachery
social media company in velachery
PPC services in velachery
online advertisement services in velachery
online advertisement services in nagercoil

web design company in nagercoil
web development company in nagercoil
website design company in nagercoil
website development company in nagercoil
web designing company in nagercoil
website designing company in nagercoil
best web design company in nagercoil
web design company in velachery
web development company in velachery
website design company in velachery
website development company in velachery
web designing company in velachery
website designing company in velachery
best web design company in velachery

Thanks for Sharing - ( Groarz branding solutions )

matterlandsen said...

Did u try to use external powers for studying? Like ? They helped me a lot once.

oleg said...

Hvis du vil have den maksimale glæde af at spille, så spil . Der er et meget stort antal forskellige spil – både klassiske og nye – alle vil finde noget, der passer til deres smag. Du kan spille både på computeren og fra telefonen, hvilket er meget praktisk. Og bonussystemet er også meget flot. Og vigtigst af alt casinoet har licens. Så jeg råder alle.