Sunday, April 20, 2008

Advanced Table Design: Resolutions

Welcome to the Database Programmer! Every Monday morning this blog contains a new essay. This blog is for people who want to learn the practical realities of databases. Topics range from simple to advanced.

The main theme of these essays is that your applications will be leaner, faster, and easier to write and maintain if you understand how databases work.

The complete table of contents is here.

The Resolution Pattern

A resolution is an interesting database pattern because it involves both table design and a very specific form of SELECT statement. Just getting the tables down is not enough, you have to know how to write the SELECT that will return the correct value.

Basic Description and Example

A resolution pattern occurs when you need a value and there is more than one place where it might be. As an example, consider the case of a computer services shop that provides complete IT services, including programming. In their billing system, they have a simple table that lists the rates for their various activities.

ACTIVITY  | RATE
----------+-------
ITGENERAL | 100
PROJMGT   | 200
SOFTWARE  | 150

This is simple enough, but now suppose that you have a particular employee that you bill out at $175.00/hour for software development. This makes the picture a little more complicated. But suppose that it gets more complicated, suppose that you enter into an arrangement with a particular customer to do volume software development for them for $135.00. And just to make it interesting, suppose you have a very specific arrangement with a particular customer to provide the services of a particular employee for $185.00 for project management.

With this many possible billing arrangements, your super-simple invoicing program is suddenly not so simple. On any particular invoice line, you must resolve the actual hourly billing rate out of several possibilities. Because you must resolve the value, this pattern is called a resolution.

Precise Description of the Resolution Pattern

A resolution pattern has these characteristics:

  • The goal of a resolution is to find a particular value. In our example this is a billing rate.
  • Resolutions examine multiple possible values and pick the first match according to precedence.
  • Precedence usually begins with the most specific and falls back to the most general. In our example the most specific possible rate is defined for a customer-activity-employee, while the most general is the default rate for an activity.

Resolutions are not always easy to recognize. Mostly this is because customers do not tell you "we have a resolution." Instead they tell you they have a billing rate. The explanation of the special overrides for employees comes in a different conversation, and perhaps to a different member of your team. Then later comes the explanation of the other overrides. The resolution only becomes apparent when the various requirements are all sorted out and put next to each other. Then somebody says, "Hey, there are four different formulas for the billing rate!" Then you know you have a resolution.

The Table Design

A resolution requires one table for each possible level of detail where a value might be supplied. In our example there will be a table for:

  • Rates by activity-customer-employee
  • Rates by activity-customer
  • Rates by activity-employee
  • Final default values by activity

These table only contain values when they are relevant. The table of activity-customer does not contain a row for every possible combination of activities and customers, it only contains a row when there has been some agreement to provide an activity to a specific customer for a special rate.

Here are the tables:

ACTIVITY | CUSTOMER | RATE
---------+----------+------
PROJMGT  | PRAXIS   |  225
SOFTWARE | PRAXIS   |  235


ACTIVITY | EMPLOYEE | RATE
---------+----------+------
PROJMGT  | SRUSSEL  |  225


ACTIVITY | EMPLOYEE | CUSTOMER | RATE
---------+----------+----------+------
PROJMGT  | HIROKO   | PRAXIS   |  250

Resolving In Client Code Will Kill Performance

Now consider that the there is a table somewhere that is used to drive billing. Maybe the employees themselves record their time in this table, or maybe some clerical staff member is entering them. Whoever puts them in, each record has an activity, an employee, and a customer (and of course hours). You need to write a program that finds the correct billing rate for each row.

A die-hard code grinder will do all of this in the client. He will write a query to pull all of the rows from the time entry table. Then he will loop through these rows. For each line he will query the server for a the most detailed value, activity-employee-customer. If it is not found he will do a second query for the next table in line, and so forth. This will be a performance disaster because his program will be making a huge number of round trips to the server. If he understood the LEFT JOIN he would need only one trip to the server.

First Stab with A LEFT JOIN

Here is a query that does most of what we need for the resolution:

SELECT ol.activity,ol.employee,ol.customer
      ,aec.rate as aec_rate
      ,ac.rate  as ac_rate
      ,ae.rate  as ae_rate
      ,a.rate
  FROM orderlines ol
  LEFT JOIN act_emp_cust_rates     aec 
    ON ol.activity = aec.activity
   AND ol.customer = aec.customer
   AND ol.employee = aec.employee
  LEFT JOIN act_cust_rates         ac
    ON ol.activity = ae.activity
   AND ol.customer = ae.customer
  LEFT JOIN act_emp_rates          ae 
    ON ol.activity = aec.activity
   AND ol.employee = aec.employee
  JOIN activities                  a 
    ON ol.activity = a.activity  
 WHERE (....relevant search conditions....)

The LEFT JOIN tells the server to return all matching rows from the orderlines table, even if there is no match in the various override tables. The above query will return something like this:

ACTIVITY | EMPLOYEE | CUSTOMER | AEC_RATE | AC_RATE | AE_RATE | RATE
---------+----------+----------+----------+---------+---------+------
PROJMGT  | HIROKO   | PRAXIS   |      250 |  null   |  null   |  200
PROJMGT  | NIRGAL   | PRAXIS   |     null |   225   |  null   |  200
SOFTWARE | SRUSSEL  | PRAXIS   |     null |   235   |  null   |  150
PROJMGT  | SRUSSEL  | GE       |     null |  null   |   225   |  200
PROJMGT  | SRUSSEL  | NASA     |     null |  null   |   225   |  200
SOFTWARE | HIROKO   | PRAXIS   |     null |   235   |  null   |  150
SOFTWARE | HIROKO   | GE       |     null |  null   |  null   |  150

The Final Form of the Query

The first form of the query returns all four possible rates, and the effect of a LEFT JOIN is to have a NULL value where there was no match on the right side.

We can do better than this and return the actual rate by using a COALESCE function. A COALESCE allows us to list two or more values, and the function returns the first one that is not null. This lets us return the actual resolved value from the server:

SELECT ol.activity,ol.employee,ol.customer
      ,COALESCE(aec.rate,ac.rate,ae.rate,a.rate) as rate
  FROM orderlines ol
  LEFT JOIN act_emp_cust_rates     aec 
    ON ol.activity = aec.activity
   AND ol.customer = aec.customer
   AND ol.employee = aec.employee
  LEFT JOIN act_cust_rates         ac
    ON ol.activity = ae.activity
   AND ol.customer = ae.customer
  LEFT JOIN act_emp_rates          ae 
    ON ol.activity = aec.activity
   AND ol.employee = aec.employee
  JOIN activities                  a 
    ON ol.activity = a.activity  
 WHERE (....relevant search conditions....)

...which gives us the complete answer:

ACTIVITY | EMPLOYEE | CUSTOMER | RATE 
---------+----------+----------+------
PROJMGT  | HIROKO   | PRAXIS   |  250 
PROJMGT  | NIRGAL   | PRAXIS   |  225 
SOFTWARE | SRUSSEL  | PRAXIS   |  235 
PROJMGT  | SRUSSEL  | GE       |  225 
PROJMGT  | SRUSSEL  | NASA     |  225 
SOFTWARE | HIROKO   | PRAXIS   |  235 
SOFTWARE | HIROKO   | GE       |  150 

Conclusion: Tables and Queries Go Together

We have seen this week our first table design pattern that requires a certain form of query. This shows us as well that queries themselves will fall into patterns, and we will definitely see more of these patterns in future essays.

11 comments:

Phu Son said...

That's a great trick! Thanks!

Chui Tey said...

This pattern is also applicable when determining user preferences. Usually it falls back to a group preference, then system preference if a value isn't set.

KenDowns said...

Chui: very good point. Resolutions show up all over the place once you know about them.

vdibart said...

I've approached this in the past with a different solution. In my experience you usually know in advance which tables hold the data with the highest priority. In your example, you're reasonable sure that the order of priority would be:

1) rates by activity-customer-employee
2) rates by activity-employee
3) rates by activity-customer

With that in mind you can ease up on the joins a bit and do:

select activity, rate, employee, customer
from (
select activity, rate, customer, employee, 1 priority
from act_emp_cust_rates
where <...>
union all
select activity, rate, '' customer, employee, 2 priority
from act_emp_rates
where <...>
union all
select activity, rate, '' customer, '' employee, 3 priority
from act_cust_rates
where <...>
) order by priority asc

It results in something similar to your first stab query (with blanks for company/employee instead of for rate), but it's much much easier on the database than either version. It can also be embellished further to reduce the unused data, but that starts to reduce the performance.

jakyra said...

I've got to say, this is probably one of the single most helpful blog posts I've read in a while.

I had no idea this existed. Thanks for this. I love your blog!

KenDowns said...

jakyra: thanks!

xamdam said...

Thanks - this is great info! I was searching (without much hope) for pretty much this info, you caught me by surprise ;)
An interesting complication to this would be a situation where rates change over time (which is the version of the problem I am unfortunately struggling with). Wonder if you have any thoughts on that.

KenDowns said...


>xamdam
: Changing rates over time involves adding dates to your keys. This is a large topic and one I have in mind for a separate post.

But in a nutshell, you add date ranges to your various tables and add filters to your resolution WHERE clauses.

Jeremy said...

I have spent the last hour or so reading your blog entries and while all the ones I have read so far have been excellent, I think this one falls a little short.

This idea sounds simple enough. And it looks like it would work quite well for trivial tasks. But as Chui Tey mentioned what if you are doing this with user preferences. And...

What if you have dozens of user preferences? This makes the SQL look quite ugly.

What if preferences are added often? Now you have an ugly SQL that you have to change quite often.

What if the preferences follow different hierarchys? ...

Anyway, thanks for the taking the time to help people create better databases.

KenDowns said...

Jeremy: do you have a specific case where you have tried it and it is falling down?

vipes said...

I realise that this is an old post, but thought I would add my 2 pennorth (2 cents ;-) In one of the systems I've worked with before, there were a lot of settings that could be configured at different levels: user, question, paper, subject, series or default. Now I'm not saying that every setting was required at that level, but some were. So how about the following pattern. You have a table called Setting with columns: SettingId, Name, Description and possibly a default value. Then you have another table SettingValue which has FkSettingID, FkSettingLevelTypeId and Value which is the actual value. FkSettingLevelTypeId is a foreign key into table SettingLevelType which indicates the level that the setting value refers to. In my case that would be user, question, paper etc, with an indication of the priority of each. You can then create a select statement using 'TOP 1' to retrieve the highest priority value for a particular setting.