Tuesday, December 21, 2010

A Working Definition of Business Logic, with Implications for CRUD Code

Update: the Second Post of this series is now available.

Update: the Third Post of this series is now available.

The Wikipedia entry on "Business Logic" has a wonderfully honest opening sentence stating that "Business logic, or domain logic, is a non-technical term... (emphasis mine)". If this is true, that the term is non-technical, or if you like, non-rigorous, then most of us spend the better part of our efforts working on something that does not even have a definition. Kind of scary.

Is it possible to come up with a decent working definition of business logic? It is certainly worth a try. This post is the first in a four-part series. The second post is about a more rigorous definition of Business Logic.

The Method

In this essay we will pursue a method of finding operations that we can define as business logic with a minimum of controversy, and identify those that can likely be excluded with a minimum of controversy. This may leave a bit of gray area that can be taken up in a later post.

An Easy Exclusion: Presentation Requirements

If we define Presentation Requirements as all requirements about "how it looks" as opposed to "what it is", then we can rule these out. But if we want to be rigorous we have to be clear: Presentation Requirements has to mean things like branding, skinning, accessibility, any and all formatting, and anything else that is about the appearance and not about the actual values fetched from somewhere.

Tables as the Foundation Layer

Database veterans are likely to agree that your table schema constitutes the foundation layer of all business rules. The schema, being the tables, columns, and keys, determines what must be provided and what must be excluded. If these are not business logic, I guess I don't know what is.

What about CouchDB and MongoDB and others that do not require a predefined schema? These systems give up the advantages of a fixed schema for scalability and simplicity. I would argue here that the schema has not disappeared, it has simply moved into the code that writes documents to the database. Unless the programmer wants a nightmare of chaos and confusion, he will enforce some document structure in the code, and so I still think it safe to say that even for these databases there is a schema somewhere that governs what must be stored and what must be excluded.
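To make "the schema has moved into the code" concrete, here is a minimal sketch in Python of application code that enforces a document structure before anything is written. The `CUSTOMER_SCHEMA` fields and the `validate_document` helper are hypothetical names made up for illustration, and no particular document database API is assumed.

```python
# Hypothetical schema for a "customer" document; the field names are illustrative only.
CUSTOMER_SCHEMA = {
    "name": str,
    "email": str,
    "credit_limit": float,
}


def validate_document(doc, schema):
    """Return doc unchanged if it matches the schema, otherwise raise ValueError."""
    # What must be provided
    missing = set(schema) - set(doc)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    # What must be excluded
    extra = set(doc) - set(schema)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    # Each value must have the declared type
    for field, expected_type in schema.items():
        if not isinstance(doc[field], expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    return doc
```

Calling something like this before every write is one way the code ends up governing what must be stored and what must be excluded, which is the same job a table definition does.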

So we have at least a foundation for a rigorous definition of business rules: the schema, be it enforced by the database itself or by the code, forms the bottom layer of the business logic.

Processes are Business Logic

The next easy addition to our definition of business logic would be processes, where a process can be defined loosely as anything that involves multiple statements, can run without user interaction, may depend on parameter tables, and may take longer than a user is willing to wait, requiring background processing.

I am sure we can all agree this is business logic, but as long as we are trying to be rigorous, we might say it is business logic because:

  • It must be coded
  • The algorithm(s) must be inferred from the requirements
  • It is entirely independent of Presentation Requirements

Calculations are Business Logic

We also should be able to agree that calculated values like an order total, and the total after tax and freight, are business logic. These are things we must code for to take user-supplied values and complete some picture.

The reasons are the same as for processes: they must be coded, the formulas must often be inferred from requirements (or forced out of The Explainer at gunpoint), and the formulas are completely independent of Presentation Requirements.
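As a small illustration of a calculation of this kind, the sketch below computes an order total and the total after tax and freight. The function name `order_totals` is hypothetical, and the rules it encodes (tax applies to the subtotal only, freight is added untaxed, tax rounds half up to the cent) are assumptions chosen for the example. In a real system those are exactly the details that have to be inferred from requirements.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def order_totals(lines, tax_rate, freight):
    """Compute subtotal, tax, and grand total for an order.

    lines is an iterable of (qty, price) pairs. Prices, tax_rate, and freight
    are passed as strings or Decimals so that no float rounding is involved.
    """
    subtotal = sum(
        (Decimal(qty) * Decimal(price) for qty, price in lines), Decimal("0")
    )
    # Assumed rule: tax is charged on the subtotal only and rounded to the cent.
    tax = (subtotal * Decimal(tax_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    # Assumed rule: freight is added after tax and is not itself taxed.
    total = subtotal + tax + Decimal(freight)
    return {"subtotal": subtotal, "tax": tax, "total": total}
```

Nothing in the function knows or cares how the numbers will be displayed, which is the sense in which the formulas are independent of Presentation Requirements.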

The Score So Far

So far we have excluded "mere" Presentation Requirements, and included three entries that I hope are non-controversial so far:

  • Schema
  • Processes
  • Calculations

These are three things that some programmer must design and code: the schema, either in a conventional relational database or in application code; processes, which definitely must be coded; and calculations, which also have to be coded.

What Have We Left Out?

Plenty. At the very least, security and notifications. But let's put those off for another day and see how we might handle what we have so far.

For the Schema, I have already mentioned that you can either put it into a relational database or manage it in application code when using a "NoSQL" database. More than that will have to wait for 2011, when I am hoping to run a series detailing different ways to implement schemas. I'm kind of excited to play around with CouchDB or MongoDB.

For processes, I have a separate post that examines the implications of the stored procedure route, the embedded SQL route, and the ORM route.

This leaves calculations. Let us now see how we might handle calculations.

Mixing CRUD and Processes

But before we get to CRUD, I should state that if your CRUD code involves processes, seek professional help immediately. Mixing processes into CRUD is an extremely common design error, and it can be devastating. It can be recognized when somebody says, "Yes, but when the salesman closes the sale we have to pick this up and move it over there, and then we have to...."

Alas, this post is running long already and so I cannot go into exactly how to solve these, but the solution will always be one of these:

  • Spawning a background job to run the process asynchronously. Easy because you don't have to recode much, but highly suspicious.
  • Examining why it seems necessary to do so much work on what ought to be a single INSERT into a sales table, with perhaps a few extra rows with some details. Much the better solution, but often very hard to see without a second pair of eyes to help you out.

So now we can move on to pure CRUD operations.

Let The Arguments Begin: Outbound CRUD

Outbound CRUD is any application code that grabs data from the database and passes it up to the Presentation layer.

A fully normalized database will, in appropriate cases, require business logic of the calculations variety, otherwise the display is not complete and meaningful to the user. There is really no getting around it in those cases.

However, a database Denormalized With Calculated Values requires no business logic for outbound CRUD; it only has to pick up what is asked for and pass it up. This is the route I prefer myself.

Deciding whether or not to include denormalized calculated values has heavy implications for the architecture of your system, but before we see why, we have to look at inbound CRUD.

Inbound CRUD

Inbound CRUD, in terms of business logic, is the mirror image of outbound. If your database is fully normalized, inbound CRUD should be free of business logic, since it is simply taking requests and pushing them to the database. However, if you are denormalizing by adding derived values, then it has to be done on the way in, so inbound CRUD code must contain business logic code of the calculations variety.
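The denormalized choice can be sketched in a few lines of Python using the standard `sqlite3` module. The `order_lines` table and the `make_db`, `insert_line`, and `fetch_lines` helpers are hypothetical names invented for this illustration. The point is only where the calculation lives: once on the inbound path, never on the outbound path.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a denormalized order_lines table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_lines ("
        " sku TEXT PRIMARY KEY, qty INTEGER, price REAL, extended_price REAL)"
    )
    return conn


def insert_line(conn, sku, qty, price):
    # Inbound path: the derived value is calculated once, on the way in.
    conn.execute(
        "INSERT INTO order_lines (sku, qty, price, extended_price)"
        " VALUES (?, ?, ?, ?)",
        (sku, qty, price, qty * price),
    )


def fetch_lines(conn):
    # Outbound path: no business logic, just pick up what is stored.
    return conn.execute(
        "SELECT sku, qty, price, extended_price FROM order_lines ORDER BY sku"
    ).fetchall()
```

With a fully normalized table the roles would swap: `insert_line` would store only `qty` and `price`, and `fetch_lines` would have to compute `qty * price` on every read.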

Now let us examine how the normalization question affects system architecture and application code.

Broken Symmetry

As stated above, denormalizing by including derived values forces calculated business logic on the inbound path, but frees your outbound path to be the "fast lane". The opposite decision, not storing calculated values, allows the inbound path to be the "fast lane" and forces the calculations into the outbound path.

The important conclusion is: if you have business logic of the calculation variety in both lanes then you may have some inconsistent practices, and there may be some gain involved in sorting those out.

But the two paths are not perfectly symmetric. Even a fully normalized database will often, sooner or later, commit those calculated values to columns. This usually happens when some definition of finality is met. Therefore, since the inbound path is more likely to contain calculations in any case, the two options are not really balanced. This is one reason why I prefer to store the calculated values and get them right on the way in.

One Final Option

When people ask me if I prefer to put business logic in the server, it is hard to answer without a lot of information about context. But when calculations are involved the answer is yes.

The reason is that calculations are incredibly easy to fit into patterns. The patterns themselves (almost) all follow foreign keys, since the foreign key is the only way to correctly relate data between tables. So you have the "FETCH" pattern, where a price is fetched from the items table to the cart, the "EXTEND" pattern, where qty * price = extended_price, and various "AGGREGATE" patterns, where totals are summed up to the invoice. There are others, but it is surprising how many calculations fall into these patterns.

Because these patterns are so easy to identify, it is actually conceivable to code triggers by hand to do them, but being an incurable toolmaker, I prefer to have a code generator put them together out of a data dictionary. More on that around the first of the year.
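To give a feel for what a hand-coded version looks like, here is a minimal sketch of the three patterns as a single trigger, using SQLite through Python's standard `sqlite3` module. This is not the output of any generator. The table names, column names, and the `make_db` helper are hypothetical, and the trigger handles inserts only (a real system would also need to cover updates and deletes).

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (sku TEXT PRIMARY KEY, price REAL NOT NULL);
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    total REAL NOT NULL DEFAULT 0
);
CREATE TABLE invoice_lines (
    line_id INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(invoice_id),
    sku TEXT NOT NULL REFERENCES items(sku),
    qty INTEGER NOT NULL,
    price REAL,
    extended_price REAL
);

CREATE TRIGGER invoice_lines_after_insert AFTER INSERT ON invoice_lines
BEGIN
    -- FETCH: copy the price along the foreign key from items
    UPDATE invoice_lines
       SET price = (SELECT price FROM items WHERE sku = NEW.sku)
     WHERE line_id = NEW.line_id;
    -- EXTEND: qty * price = extended_price
    UPDATE invoice_lines
       SET extended_price = qty * price
     WHERE line_id = NEW.line_id;
    -- AGGREGATE: sum the lines up to the parent invoice
    UPDATE invoices
       SET total = (SELECT SUM(extended_price) FROM invoice_lines
                     WHERE invoice_id = NEW.invoice_id)
     WHERE invoice_id = NEW.invoice_id;
END;
"""


def make_db():
    """Create an in-memory database with the sketch schema and trigger."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    return conn
```

The application then inserts only `invoice_id`, `sku`, and `qty`, and the database fills in the rest. Each of the three steps follows a foreign key, which is what makes them regular enough to generate from a data dictionary.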

Updates

Update 1: I realize I never made it quite clear that this is part 1, as the discussion so far seems reasonable but is hardly rigorous (yet). Part 2 will be on the way after I've fattened up for the holidays.

Update 2: It is well worth following the link Mr. Koppelaars has put in the comments: http://thehelsinkideclaration.blogspot.com/2009/03/window-on-data-applications.html

8 comments:

JulesLt said...

Validation is always an interesting one.

Anything more complex than simple data type checking is business logic - and that ranges from things like min & max allowed values, to more complex validations where the allowed values for B are dependent on the value of A.

We can implement these as constraints, ensuring the integrity of the database and protecting against rogue applications (and direct SQL).

We can implement them in our business objects - which is the proper way to do it. Constraints are therefore seen as wrong/evil as they implement business logic in the persistence layer. The application ensures the integrity of the data, and of course there will never ever be a second application / rewrite into a new language.

And then there is validation in the presentation layer itself. Again, this is the 'wrong' place to put business logic, but from a user perspective we all know how annoying it is to use a web application that only validates a form when you press submit. We like it when drop-lists are restricted based on previous input.

I'm not really sure what the answer is - obviously having the rules in the presentation layer ONLY is wrong, having the rules in the database only is too late - users do not want to see 'foreign key integrity constraint error on table PEOPLE'.

The business object only approach only seems to work when the presentation layer and business objects exist on the same tier - i.e. desktop applications. With client-server and the web request-response we end up needing some level of local/presentation layer code.

CASE tools and code generation were supposed to be the solution, but I think people were put off by finding themselves tied into dead products.

Toon Koppelaars said...

Nice post: it has been my interest too to find a clear definition of "business logic". Everybody uses the word, nobody ever provides a clear definition.
In my Helsinki-declaration I offer a definition, that's not too far away from what you are trying to achieve here. You can find it here:

http://thehelsinkideclaration.blogspot.com/2009/03/window-on-data-applications.html

I actually split the term "business logic" into "data logic" and "the rest" (also excluding all "UI-logic").

KenDowns said...

Toon:

I had actually just downloaded your document on Database-centric J2EE development, and am planning to cite it in early January in a post discussing the Fat-thin-fat model, which I have used exclusively since moving to web development.

I will read the Helsinki declaration and see if I can make a more detailed comment.

KenDowns said...

JulesLT:

"....is business logic - and that ranges from things like min & max allowed values..." Agreed. I deliberately glossed over these details for brevity, I'm glad you fleshed that out.

"...and of course there will never ever be a second application / rewrite into a new language." LOL.

Overall, it seems that all tiers at the very least need to know about the business logic, especially the foundation schema with types and constraints. I think the best solution, using DRY, is to have a single authoritative data dictionary that can be used for many tasks, beginning with the database upgrade itself, but continuing through to publishing itself in various forms that can be used by other tiers.

Anonymous said...

I like this little non sequitur:

"...If this is true, that the term is non-technical, or if you like,
non-rigorous, then most of us spend the better part of our
efforts working on something that does not even have a
definition...."

nyetter said...

"Inbound CRUD, in terms of business logic, is the mirror image of outbound. If your database is fully normalized, inbound CRUD should be free of business logic, since it is simply taking requests and pushing them to the database."

This seems either highly naive, or using too specific a definition of CRUD as to be useless.

Let's say I'm creating a user. Just an insert, right? Well yes, but of many more values than I am supplying in the form. We need to store the date on which the user was created, who created the user, their session id, perhaps a marketing affiliate or other source identification, and so on. Those values come from code that I would certainly call "business logic."

KenDowns said...

nyetter: Your point is well taken in cases where such items need to be added. However, I would still say that those items should be trivial to add: the date is nothing, the user is in context somewhere (like the session), and so the "fast lane" concept holds.

If by "affiliate information" you mean something fetched from another table, it is no longer "highly normalized" so the claim of inbound CRUD being simple no longer applies.

I've just about finished a new post, which I'm expecting to have finished for Jan 3rd, that goes the next step from a working definition to a rigorous definition. Not to give too much away, but in that system your example data would be "Second Order Business Logic".

Then if I ever get to it in copious free time, I believe the rigorous definition can be resolved to a formal definition.

Val said...

Let me add my thanks for this discussion, Ken.

I have spent a long time considering "inbound crud", which I characterize as transaction logic - logic that must be executed as transactions are saved, for database integrity.

I suggest this consists of:
* multi-attribute/table constraints
(eg, balance < creditLimit)
* multi-attribute/table derivations
(eg, balance = sum(unpaid orders))
* actions
(eg, send mail, start process, auditing)

Accepting a Service Layer as an anti-pattern, I believe a good approach is a Rich Active Declarative Data Model.

By this, I mean utilize ORM events to encapsulate declarative multi-table transaction logic into ORM Domain Objects. The small examples above suggest the approach, which includes chaining and automated dependency management.

This approach can (really) replace 100 lines of code with a single rule, and can plug into existing architectures/frameworks since ORM APIs are preserved.

Perhaps hard to believe, but easy to check out - examples provided.