Sunday, January 2, 2011

Business Logic: From Working Definition to Rigorous Definition

This is part 2 of a 4 part mini-series that began before the holidays with A Working Definition Business Logic. Today we proceed to a rigorous definition, tomorrow we will see some theorems, and the series will wrap up with a post on the "business layer."

In the first post, the working definition said that business logic includes at least:

  • The Schema
  • Calculations
  • Processes

None of these was very rigorously defined, kind of a "I'll know it when I see it" type of thing, and we did not talk at all about security. Now the task becomes tightening this up into a rigorous definition.

Similar Reading

Toon Koppelaars has some excellent material along these same lines, and a good place to start is his Helsinki Declaration (IT Version). The articles have a different focus than this series, so they make great contrasting reading. I consider my time spent reading through it very well spent.

Definitions, Proofs, and Experience

What I propose below is a definition in four parts. As definitions, they are not supposed to prove anything, but they are definitely supposed to ring true to the experience of any developer who has created or worked on a non-trivial business application. This effort would be a success if we reach some concensus that "at least it's all in there", even if we go on to argue bitterly about which components should be included in which layers.

Also, while I claim the definitions below are rigorous, they are not yet formal. My instinct is that formal definitions can be developed using First Order Logic, which would allow the theorems we will see tomorrow to move from "yeah that sounds about right" to being formally provable.

As for their practical benefit, inasmuch as "the truth shall make you free", we ought to be able to improve our architectures if we can settle at very least what we are talking about when we use the vague term "business logic."

The Whole Picture

What we commonly call "business logic", by which we vaguely mean, "That stuff I have to code up", can in fact be rigorously defined as having four parts, which I believe are best termed orders, as there is a definite precedence to their discovery, analysis and implementation.

  • First Order: Schema
  • Second Order: Derivations
  • Third Order: Non-algorithmic compound operations
  • Fourth Order: Algorithmic compound operations

Now we examine each order in detail.

A Word About Schema and NoSQL

Even "schema-less" databases have a schema, they simply do not enforce it in the database server. Consider: an eCommerce site using MongoDB is not going to be tracking the local zoo's animal feeding schedule, because that is out of scope. No, the code is limited to dealing with orders, order lines, customers, items and stuff like that.

It is in the very act of expressing scope as "the data values we will handle" that a schema is developed. This holds true regardless of whether the datastore will be a filesystem, an RDBMS, a new NoSQL database, or anything else.

Because all applications have a schema, whether the database server enforces it or whether the application enforces it, we need a vocabulary to discuss the schema. Here we have an embarrasment of choices, we can talk about entities and attributes, classes and properties, documents and values, or columns and tables. The choice of "entities and attributes" is likely best because it is as close as possible to an implementation-agnostic language.

First Order Business Logic: Schema

We can define schema, including security, as:

that body of entities and their attributes whose relationships and values will be managed by the application stack, including the authorization of roles to read or write to entities and properties.

Schema in this definition does not include derived values of any kind or the processes that may operate on the schema values, those are higher order of business logic. This means that the schema actually defines the entire body of values that the application will accept from outside sources (users and other programs) and commit to the datastore. Restating again into even more practical terms, the schema is the stuff users can save themselves.

With all of that said, let's enumerate the properties of a schema.

Type is required for every attribute.

Constraints are limits to the values allowed for an attribute beyond its type. We may have a discount percent that may not exceed 1.0 or 100%.

Entity Integrity is usually thought of in terms of primary keys and the vague statement "you can't have duplicates." We cannot have a list of US States where "NY" is listed 4 times.

Referential Integrity means that when one entity links or refers to another entity, it must always refer to an existing entity. We cannot have some script kiddie flooding our site with sales of items "EAT_ME" and "F***_YOU", becuase those are not valid items.

The general term 'validation' is not included because any particular validation rule is is a combination of any or all of type, constraints, and integrity rules.

Second Orders Business Logic: Derived values

When we speak of derived values, we usually mean calculated values, but some derivations are not arithmetic, so the more general term "derived" is better. Derivations are:

A complete entity or an attribute of an entity generated from other entities or attributes according to a formula or rule.

The definition is sufficiently general that a "formula or rule" can include conditional logic.

Simple arithmetic derived values include things like calculating price * qty, or summing an order total.

Simple non-arithmetic derivations include things like fetching the price of an item to use on an order line. The price in the order is defined as being a copy of the item's price at the time of purchase.

An example of a complete entity being derived is a history table that tracks changes in some other table. This can also be implemented in NoSQL as a set of documents tracking the changes to some original document.

Security also applies to generated values only insofar as who can see them. But security is not an issue for writing these values because by definition they are generated from formulas and rules, and so no outside user can ever attempt to explicitly specify the value of a derived entity or property.

One final point about Second Order Business Logic is that it can be expressed declaratively, if we have the tools, which we do not, at least not in common use. I wrote one myself some years ago and am re-releasing it as Triangulum, but that is a post for another day.

Sorting out First and Second Order

The definitions of First and Second Order Business Logic have the advantage of being agnostic to what kind of datastore you are using, and being agnostic to whether or not the derived values are materialized. (In relational terms, derivations are almost always denormalizing if materialized, so in a fully normalized database they will not be there, and you have to go through the application to get them.)

Nevertheless, these two definitions can right off bring some confusion to the term "schema." Example: a history table is absolutely in a database schema, but I have called First Order Business Logic "schema" and Second Order Business Logic is, well, something else. The best solution here is to simply use the terms First Order Schema and Second Order Schema. An order_lines table is First Order schema, and the table holding its history is Second Order Schema.

The now ubiquitous auto-incremented surrogate primary keys pose another stumbling block. Because they are used so often (and so often because of seriously faulty reasoning, see A Sane Approach To Choosing Primary Keys) they would automatically be considered schema -- one of the very basic values of a sales order, check, etc. But they are system-generated so they must be Second Order, no? Isn't the orderid a very basic part of the schema and therefore First Order? No. In fact, by these definitions, very little if any of an order header is First Order, the tiny fragments that are first order might be the shipping address, the user's choice of shipping method, and payment details provided by the user. The other information that is system-generated, like Date, OrderId, and order total are all Second Order.

Third Order Business Logic

Before defining Third Order Business Logic I would like to offer a simple example: Batch Billing. A consulting company bills by the hour. Employees enter time tickets throughout the day. At the end of the month the billing agent runs a program that, in SQL terms:

  • Inserts a row into INVOICES for each customer with any time entries
  • Inserts a row into INVOICE_LINES that aggregates the time for each employee/customer combination.

This example ought to make clear what I mean by definining Third Order Business Logic as:

A Non algorithmic compound operation.

The "non-algorithmic" part comes from the fact that none of the individual documents, an INVOICE row and its INVOICE_LINES, is dependent on any other. There is no case in which the invoice for one customer will influence the value of the invoice for another. You do not need an algorithm to do the job, just one or more steps that may have to go in a certain order.

Put another way, it is a one-pass set-oriented operation. The fact that it must be executed in two steps is an artifact of how database servers deal with referential integrity, which is that you need the headers before you can put in the detail. In fact, when using a NoSQL database, it may be possible to insert the complete set of documents in one command, since the lines can be nested directly into the invoices.

Put yet a third way, in more practical terms, there is no conditional or looping logic required to specify the operation. This does not mean there will be no looping logic in the final implementation, because performance concerns and locking concerns may cause it to be implemented with 'chunking' or other strategies, but the important point is that the specification does not include loops or step-wise operations because the individual invoices are all functionally independent of each other.

I do not want to get side-tracked here, but I have had a working hypothesis in my mind for almost 7 years that Third Order Business Logic, even before I called it that, is an artifact, which appears necessary because of the limitations of our tools. In future posts I would like to show how a fully developed understanding and implementation of Second Order Business Logic can dissolve many cases of Third Order.

Fourth Order Business Logic

We now come to the upper bound of complexity for business logic, Fourth Order, which we label "algorithmic compound operations", and define a particular Fourth Order Business Logic process as:

Any operation where it is possible or certain that there will be at least two steps, X and Y, such that the result of Step X modifies the inputs available to Step Y.

In comparison to Third Order:

  • In Third Order the results are independent of one another, in Fourth Order they are not.
  • In Third Order no conditional or branching is required to express the solution, while in Fourth Order conditional, looping, or branching logic will be present in the expression of the solution.

Let's look at the example of ERP Allocation. In the interest of brevity, I am going to skip most of the explanation of the ERP Allocation algorithm and stick to this basic review: a company has a list of sales orders (demand) and a list of purchase orders (supply). Sales orders come in through EDI, and at least once/day the purchasing department must match supply to demand to find out what they need to order. Here is an unrealistically simple example of the supply and demand they might be facing:

  *** DEMAND ***          *** SUPPLY ***

    DATE    | QTY           DATE    | QTY
------------+-----      ------------+----- 
  3/ 1/2011 |  5          3/ 1/2011 |  3
  3/15/2011 | 15          3/ 3/2011 |  6
  4/ 1/2011 | 10          3/15/2011 | 20
  4/ 3/2011 |  7   

The desired output of the ERP Allocation might look like this:

 *** DEMAND ***      *** SUPPLY ****
    DATE    | QTY |  DATE_IN   | QTY  | FINAL 
------------+-----+------------+------+-------
  3/ 1/2011 |  5  |  3/ 1/2011 |  3   |  no
                  |  3/ 3/2011 |  2   | Yes 
  3/15/2011 | 15  |  3/ 3/2011 |  4   |  no
                  |  3/15/2011 | 11   | Yes
  4/ 1/2011 | 10  |  3/15/2011 |  9   |  no
  4/ 3/2011 |  7  |    null    | null |  no

From this the purchasing agents know that the Sales Order that ships on 3/1 will be two days late, and the Sales Orders that will ship on 4/1 and 4/3 cannot be filled completely. They have to order more stuff.

Now for the killer question: Can the desired output be generated in a single SQL query? The answer is no, not even with Common Table Expressions or other recursive constructs. The reason is that each match-up of a purchase order to a sales order modifies the supply available to the next sales order. Or, to use the definition of Fourth Order Business Logic, each iteration will consume some supply and so will affect the inputs available to the next step.

We can see this most clearly if we look at some pseudo-code:

for each sales order by date {
   while sales order demand not met {
      get earliest purchase order w/qty avial > 0
         break if none
      make entry in matching table
      // This is the write operation that 
      // means we have Fourth Order Business Logic
      reduce available qty of purchase order
   }
   break if no more purchase orders
}

Conclusions

As stated in the beginning, it is my belief that these four orders should "ring true" with any developer who has experience with non-trivial business applications. Though we may dispute terminology and argue over edge cases, the recognition and naming of the Four Orders should be of immediate benefit during analysis, design, coding, and refactoring. They rigorously establish both the minimum and maximum bounds of complexity while also filling in the two kinds of actions we all take between those bounds. They are datamodel agnostic, and even agnostic to implementation strategies within data models (like the normalize/denormalize debate in relational).

But their true power is in providing a framework of thought for the process of synthesizing requirements into a specification and from there an implementation.

Tomorrow we will see some theorems that we can derive from these definitions.

8 comments:

Anonymous said...

> Now for the killer question: Can the desired output be generated in a single SQL query?

Not even with the MODEL-clause...?

:-)

br1 said...

Here's a completely formal way to think of schemas: http://corp.galois.com/blog/2010/5/27/tech-talk-categories-are-databases.html

Sudhir DBAKings said...

Nice post very helpful

dbakings

Garry Laly said...

Very nice.
Follow my blog for tutorial about computer and programming vb.net : http://gcomxp.blogspot.com/

simon jarry said...

I don't have any words to enjoy this publish.....I am really amazed ....the particular person who developed this post certainly realized the topic well..thanks for sharing this with us.

BrainShakers Interactive

Martin Mathur said...

Hi, saw your post and find it very helpful as we are also involved in Database appending and Mailing Database

yogi shuemantri said...

nice post :), please visit back :D http://yosmantri.student.ipb.ac.id/2013/11/20/5-bacaan-wajib-seorang-programmer/ thanks :D

Chui Tey said...

Ken

Please write the follow up on how 3rd order is limitation of our tools. Many thanks!

Chui