The Database Programmer: Theorems Regarding Business Logic

Tuesday, January 4, 2011

Theorems Regarding Business Logic

In yesterday's Rigorous Definition of Business Logic, we saw that business logic can be defined in four orders:

First Order Business Logic is entities and attributes that users (or other agents) can save, and the security rules that govern read/write access to the entitites and attributes.
Second Order Business Logic is entities and attributes derived by rules and formulas, such as calculated values and history tables.
Third Order Business Logic are non-algorithmic compound operations (no structure or looping is required in expressing the solution), such as a month-end batch billing or, for the old-timers out there, a year-end general ledger roll-up.
Fourth Order Business Logic are algorithmic compound operations. These occur when the action of one step affects the input to future steps. One example is ERP Allocation.

A Case Study

The best way to see if these have any value is to cook up some theorems and examine them with an example. We will take a vastly simplified time billing system, in which employees enter time which is billed once/month to customers. We'll work out some details a little below.

Theorem 1: 1st and 2nd Order, Analysis

The first theorem we can derive from these definitions is that we should look at First and Second Order Schemas together during analysis. This is because:

First Order Business Logic is about entities and atrributes
Second Order Business Logic is about entities and attributes
Second Order Business Logic is about values generated from First Order values and, possibly, other Second Order values
Therefore, Second Order values are always expressed ultimately in terms of First Order values
Therefore, they should be analyzed together

To give the devil his due, ORM does this easily, because it ignores so much database theory (paying a large price in performance for doing so) and considers an entire row, with its first order and second order values together, as being part of one class. This is likely the foundation for the claims of ORM users that they experience productivity gains when using ORM. Since I usually do nothing but bash ORM, I hope this statement will be taken as utterly sincere.

Going the other way, database theorists and evangelists who adhere to full normalization can hobble an analysis effort by refusing to consider 2nd order because those values denormalize the database, so sometimes the worst of my own crowd will prevent analysis by trying to keep these out of the conversation. So, assuming I have not pissed off my own friends, let's keep going.

So let's look at our case study of the time billing system. By theorem 1, our analysis of entities and attributes should include both 1st and 2nd order schema, something like this:

 
 INVOICES
-----------
 invoiceid      2nd Order, a generated unique value
 date           2nd Order if always takes date of batch run
 customer       2nd Order, a consequence of this being an
                           aggregation of INVOICE_LINES
 total_amount   2nd Order, a sum from INVOICE_LINES
               
 INVOICE_LINES
---------------
 invoiceid      2nd order, copied from INVOICES
 customer         +-  All three are 2nd order, a consequence
 employee         |   of this being an aggregration of
 activity         +-  employee time entries
 rate           2nd order, taken from ACTIVITIES table
                           (not depicted)
 hours          2nd order, summed from time entries
 amount         2nd order, rate * hours
 
 TIME_ENTRIES
--------------
 employeeid     2nd order, assuming system forces this
                    value to be the employee making
                    the entry
 date           1st order, entered by employee
 customer       1st order, entered by employee
 activity       1st order, entered by employee
 hours          1st order, entered by employee

Now, considering how much of that is 2nd order, which is almost all of it, the theorem is not only supported by the definition, but ought to line up squarely with our experience. Who would want to try to analyze this and claim that all the 2nd order stuff should not be there?

Theorem 2: 1st and 2nd Order, Implementation

The second theorem we can derive from these definitions is that First and Second Order Business logic require separate implementation techniques. This is because:

First Order Business Logic is about user-supplied values
Second Order Business Logic is about generated values
Therefore, unlike things cannot be implemented with like tools.

Going back to the time entry example, let's zoom in on the lowest table, the TIME_ENTRIES. The employee entering her time must supply customer, date, activity, and hours, while the system forces the value of employeeid. This means that customer and activity must be validated in their respective tables, and hours must be checked for something like <= 24. But for employeeid the system provides the value out of its context. So the two kinds of values are processed in very unlike ways. It seems reasonable that our code would be simpler if it did not try to force both kinds of values down the same validation pipe.

Theorem 3: 2nd and 3rd Order, Conservation of Action

This theorem states that the sum of Second and Third Order Business Logic is fixed:

Second Order Business Logic is about generating entities and attributes by rules or formulas
Third Order Business Logic is coded compound creation of entities and attributes
Given that a particular set of requirements resolves to a finite set of actions that generate entities and values, then
The sum of Second Order and Third Order Business Logic is fixed.

In plain English, this means that the more Business Logic you can implement through 2nd Order declarative rules and formulas, the fewer processing routines you have to code. Or, if you prefer, the more processes you code, the fewer declarative rules about entitities and attributes you will have.

This theorem may be hard to compare to experience for verification because most of us are so used to thinking in terms of the batch billing as a process that we cannot imagine it being implemented any other way: how exactly am I suppose to implement batch billing declaratively?.

Let's go back to the schema above, where we can realize upon examination that the entirety of the batch billing "process" has been detailed in a 2nd Order Schema, if we could somehow add these facts to our CREATE TABLE commands the way we add keys, types, and constraints, batch billing would occur without the batch part.

Consider this. Imagine that a user enters a a TIME_ENTRY. The system checks for a matching EMPLOYEE/CUSTOMER/ACTIVITY row in INVOICE_DETAIL, and when it finds the row it updates the totals. But if it does not find one then it creates one! Creation of the INVOICE_DETAIL record causes the system to check for the existence of an invoice for that customer, and when it does not find one it creates it and initializes the totals. Subsequent time entries not only update the INVOICE_DETAIL rows but the INVOICE rows as well. If this were happening, there would be no batch billing at the end of the month because the invoices would all be sitting there ready to go when the last time entry was made.

By the way, I coded something that does this in a pretty straight-forward way a few years ago, meaning you could skip the batch billing process and add a few details to a schema that would cause the database to behave exactly as described above. Although the the format for specifying these extra features was easy enough (so it seemed to me as the author), it seemed the conceptual shift of thinking that it required of people was far larger than I initially and naively imagined. Nevertheless, I toil forward, and that is the core idea behind my Triangulum project.

Observation: There Will Be Code

This is not so much a theorem as an observation. This observation is that if your application requires Fourth Order Business Logic then somebody is going to code something somewhere.

An anonymous reader pointed out in the comments to Part 2 that Oracle's MODEL clause may work in some cases. I would assume so, but I would also assume that reality can create complicated Fourth Order cases faster than SQL can evolve. Maybe.

But anyway, the real observation here is is that no modern language, either app level or SQL flavor, can express an algorithm declaratively. In other words, no combination of keys, constraints, calculations and derivations, and no known combination of advanced SQL functions and clauses will express an ERP Allocation routine or a Magazine Regulation routine. So you have to code it. This may not always be true, but I think it is true now.

This is in contrast to the example given in the previous section about the fixed total of 2nd and 3rd Order Logic. Unlike that example, you cannot provide enough 2nd order wizardry to eliminate fourth order. (well ok maybe you can, but I haven't figured it out yet myself and have never heard that anybody else is even trying. The trick would be to have a table that you truncate and insert a single row into, a trigger would fire that would know how to generate the next INSERT, generating a cascade. Of course, since this happens in a transaction, if you end up generating 100,000 inserts this might be a bad idea ha ha.)

Theorem 5: Second Order Tools Reduce Code

This theorem rests on the acceptance of an observation, that using meta-data repositories, or data dictionaries, is easier than coding. If that does not hold true, then this theorem does not hold true. But if that observation (my own observation, admittedly) does hold true, then:

By Theorem 3, the sum of 2nd and 3rd order logic is fixed
By observation, using meta-data that manages schema requires less time than coding,
By Theorem 1, 2nd order is analyzed and specified as schema
Then it is desirable to specify as much business logic as possible as 2nd order schema, reducing and possibly eliminating manual coding of Third Order programs.

Again we go back to the batch billing example. Is it possible to convert it all to 2nd Order as described above. Well yes it is, because I've done it. The trick is an extremely counter-intuitive modification to a foreign key that causes a failure to actually generate the parent row that would let the key succeed. To find out more about this, check out Triangulum (not ready for prime time as of this writing).

Conclusions

The major conclusion in all of this is that anlaysis and design should begin with First and Second Order Business Logic, which means working out schemas, both the user-supplied values and the system-supplied values.

When that is done, what we often call "processes" are layered on top of this.

Tomorrow we will see part 4 of 4, examining the business logic layer, asking, is it possible to create a pure business logic layer that gathers all business logic unto itself?