<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-426922399870577072</id><updated>2012-01-29T11:07:56.007-05:00</updated><category term='primary keys'/><category term='candidate keys'/><category term='tools'/><category term='Lots of Links'/><category term='denormalization'/><category term='keys'/><category term='Data Dictionary'/><category term='Philosophy'/><category term='Ken&apos;s Law'/><category term='object ids'/><category term='table design'/><category term='third normal form'/><category term='table design patterns'/><category term='normalization'/><category term='spell it out'/><category term='calculated values'/><category term='surrogate keys'/><category term='cursors'/><category term='database skills'/><category term='abstraction'/><category term='orm'/><category term='Relational Model'/><category term='foreign keys'/><category term='SQL SELECT'/><category term='unique constraints'/><category term='Window Functions'/><category term='database programming'/><category term='Common Table Expressions'/><category term='recursion'/><category term='database'/><category term='Upsert'/><title type='text'>The Database Programmer</title><subtitle type='html'>All things related to database applications, both desktop and web.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/'/><link rel='hub' 
href='http://pubsubhubbub.appspot.com/'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>72</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-3712061853478620564</id><published>2011-01-21T22:21:00.001-05:00</published><updated>2011-01-21T22:21:39.967-05:00</updated><title type='text'>Maintaining One Code Base with Possibly Conflicting Custom Features</title><content type='html'>&lt;p&gt;Today's essay deals with the tricky issue of custom features
   for individual customers who are running instances of your
   software.
&lt;/p&gt;

&lt;p&gt;The question comes by way of a regular reader who prefers to
   remain anonymous, but asks this:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;... I work on a large (to me, anyway) application that serves as a client database, ticket system, time-tracking, billing, asset-tracking system.  We have some customers using their own instances of the software.  Often, those customers want additional fields put in different places (e.g., a priority column on tickets).  This results in having multiple branches to account for versions with slight changes in code and in the database.  This makes things painful and time-consuming in the long run: applying commits from master to the other branches requires testing on every branch; same with database migrate scripts, which frequently have to be modified.
&lt;/p&gt;

&lt;p&gt;
Is there an easier way?  I have thought about the possibility of making things "optional" in the database, such as a column on a table, and hiding its existence in the code when it's not "enabled."  This would have the benefit of a single code set and a single database schema, but I think it might lead to more dependence on the code and less on the database -- for example, it might mean constraints and keys couldn't be used in certain cases.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Restating the Question&lt;/h2&gt;

&lt;p&gt;Our reader asks, is it better to have different code branches
   or to try to keep a lot of potentially conflicting and optional
   items mixed in together?
&lt;/p&gt;

&lt;p&gt;Well, the wisdom of the ages is to maintain a single code branch,
   including the database schema.  I tried exactly once, very early
   in my career, to fork my own code, and gave up almost within days.
   When I went to work in larger shops I always arrived in a situation
   where the decision had already been made to maintain a single
   branch.  Funny thing, since most programmers cannot agree on the
   color of the sky when they're staring out the window, this is
   the only decision I have ever seen maintained with absolute
   unanimity no matter how many difficulties came out of it.
&lt;/p&gt;

&lt;p&gt;There is some simple arithmetic as to why this is so.  If you have
   a single feature for a customer that is giving you a headache, and
   you fork the code, you now have to update both code branches for
   every change plus regression test them both, including the feature
   that caused the headache.  But if you keep them combined you only
   have the one headache feature to deal with.  That's why people
   keep them together.
&lt;/p&gt;

&lt;h2&gt;Two Steps&lt;/h2&gt;

&lt;p&gt;Making custom features work smoothly is a two-step process.
   The first step is arguably more difficult than the second, 
   but the second step is absolutely crucial if you have
   business logic tied to the feature.
&lt;/p&gt;   

&lt;p&gt;Most programmers when confronted with this situation
   will attempt to make various features optional.  I 
   consider this to be a mistake because it complicates
   code, especially when we get to step 2.  By far the
   better solution is to make features &lt;i&gt;ignorable&lt;/i&gt;
   by anybody who does not want them.
&lt;/p&gt;

&lt;p&gt;The wonderful thing about ignorable features is 
   they tend to eliminate the problems with apparently
   conflicting features.  If you can rig the features
   so anybody can use either or both, you've eliminated
   the conflict.
&lt;/p&gt;

&lt;h2&gt;Step 1: The Schema&lt;/h2&gt;

&lt;p&gt;As mentioned above, the first step is arguably more
   difficult than the second, because it may involve
   casting requirements differently than they are
   presented.
&lt;/p&gt;

&lt;p&gt;For example,
   our reader asks about a priority column on tickets, 
   asked for by only one customer.  This may seem like
   a conflict because nobody else wants it, but we
   can dissolve the conflict when we make the feature
   ignorable.  The first step involves doing this at
   the database or schema level.
&lt;/p&gt;

&lt;p&gt;But first we should mention that the UI is easy:
   we might have a control panel 
   where we can make fields invisible.  Or maybe our
   users just ignore the fields they are not interested
   in.  Either way works.
&lt;/p&gt;

&lt;p&gt;The problem is in the database.
   If the values for priority come
   from a lookup table, which they should,
   then we have a foreign key, and
   we have a problem if we try to ignore it:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;We can allow nulls in the foreign key, which is
    fine for the people ignoring it, but
    &lt;li&gt;This means the people who require it can end
    up with tickets that have no priority because it does
    not prevent a user from leaving it blank.
&lt;/ul&gt;

&lt;p&gt;A simple answer here is to pre-populate your priority
   lookup table with a value of "Not applicable", perhaps
   with a hardcoded id of zero.  Then we set the default
   value for the TICKET.priority to zero.  This means people
   can safely ignore it because it will always be valid.
&lt;/p&gt;

&lt;p&gt;Then, for the customer who paid for it, we just go in
   after the install and delete the "Not applicable" row from
   the lookup table.  It's a one-time operation, not even worth
   writing a script for, and it forces them to create a set of
   priorities before using the system.  Further, because the
   column default of zero stays in place, it forces valid answers:
   users will be dinged with an FK violation if
   they do not provide a real priority.
&lt;/p&gt;
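
&lt;p&gt;A minimal sketch of this default-row trick, using Python's
   built-in sqlite3 module.  The table and column names
   (priorities, tickets, priority_id) are assumptions made up for
   illustration, and a real system would use its own DBMS and
   naming.
&lt;/p&gt;

&lt;pre&gt;
import sqlite3

def make_db():
    """Build the common schema every customer receives."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    db.executescript("""
        CREATE TABLE priorities (
            priority_id INTEGER PRIMARY KEY,
            description TEXT NOT NULL
        );
        -- Pre-populated placeholder with a hardcoded id of zero
        INSERT INTO priorities VALUES (0, 'Not applicable');
        CREATE TABLE tickets (
            ticket_id INTEGER PRIMARY KEY,
            subject   TEXT NOT NULL,
            priority  INTEGER NOT NULL DEFAULT 0
                      REFERENCES priorities (priority_id)
        );
    """)
    return db

# A customer ignoring the feature never mentions priority at all.
ignoring = make_db()
ignoring.execute("INSERT INTO tickets (subject) VALUES ('Printer jam')")
ignored_priority = ignoring.execute("SELECT priority FROM tickets").fetchone()[0]

# The customer who paid for the feature: delete the placeholder row once.
paying = make_db()
paying.execute("DELETE FROM priorities WHERE priority_id = 0")
paying.execute("INSERT INTO priorities VALUES (1, 'Urgent')")
try:
    # The default of zero no longer matches any row, so this must fail.
    paying.execute("INSERT INTO tickets (subject) VALUES ('No priority given')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
paying.execute("INSERT INTO tickets (subject, priority) VALUES ('Server down', 1)")
&lt;/pre&gt;

&lt;p&gt;In this sketch the first database accepts a ticket with no
   priority mentioned, while the second rejects the same insert
   with a foreign key violation until a real priority is supplied.
&lt;/p&gt;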

&lt;p&gt;For this particular example, there is no step 2, because
   the problem is completely solved at the schema level.
   To see how to work with step 2, I will make up an
   example of my own.
&lt;/p&gt;

&lt;h2&gt;Step 2: Unconditional Business Logic&lt;/h2&gt;

&lt;p&gt;To illustrate step 2, I'm going to make up an
   example that is not really appropriate to our 
   reader's question, frankly because I cannot think
   of one for that situation.
&lt;/p&gt;

&lt;p&gt;Let's say we have an eCommerce system, and one
   of our sites wants customer-level discounts based
   on customer groups, while another wants discounts
   based on volume of order -- the more you buy, the
   deeper the discount.  At this point most programmers
   start shouting in the meeting, "We'll make them
   optional!"  Big mistake, because it makes for lots
   of work.  Instead we will make them ignorable.
&lt;/p&gt;

&lt;p&gt;Step 1 is to make ignorable features in the schema.
   Our common code base contains a table of customer
   groups with a discount percent, and in the customers
   table we make a nullable foreign key to the customer
   groups table.  If anybody wants to use it, great, and
   if they want to ignore it, that's also fine.  We do
   the same thing with a table of discount amounts: 
   we make an empty table that lists threshold amounts
   and discount percents.  If anybody wants to use it
   they fill it in; everybody else leaves it blank.
&lt;/p&gt;

&lt;p&gt;Now for the business logic, the calculations of
   these two discounts.  The crucial idea here is
   &lt;i&gt;not to make up conditional logic that tries to
   figure out whether or not to apply the discounts.&lt;/i&gt;
   It is vastly easier to &lt;i&gt;always apply both 
   discounts, with the discounts coming out zero for
   those users who have ignored the features.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;So for the customer discount, if the customer's
   entry for customer group is null, it will not match
   to any discount, and you treat this as zero.
   The same goes for the sale amount discount: the lookup to
   see which sale amount they qualify for doesn't find
   anything because the table is empty, so it treats 
   it as zero.  
&lt;/p&gt;
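
&lt;p&gt;The two unconditional lookups might be sketched like this in
   Python with the built-in sqlite3 module.  All table and column
   names are hypothetical, and the sketch assumes that higher
   thresholds carry deeper discounts, so the largest qualifying
   percent is the one that applies.
&lt;/p&gt;

&lt;pre&gt;
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer_groups (
        group_id     INTEGER PRIMARY KEY,
        discount_pct REAL NOT NULL
    );
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        group_id    INTEGER REFERENCES customer_groups (group_id)  -- nullable
    );
    CREATE TABLE volume_discounts (
        threshold    REAL PRIMARY KEY,
        discount_pct REAL NOT NULL
    );
""")

def discounts(customer_id, order_amount):
    """Always compute both discounts; an ignored feature comes out zero."""
    # A null group_id matches no group, so MAX() is NULL and COALESCE gives 0.
    group_pct = db.execute(
        """SELECT COALESCE(MAX(g.discount_pct), 0)
             FROM customers c
             JOIN customer_groups g ON g.group_id = c.group_id
            WHERE c.customer_id = ?""",
        (customer_id,),
    ).fetchone()[0]
    # An empty table has no qualifying threshold, so this is also 0.
    volume_pct = db.execute(
        """SELECT COALESCE(MAX(discount_pct), 0)
             FROM volume_discounts
            WHERE threshold BETWEEN 0 AND ?""",
        (order_amount,),
    ).fetchone()[0]
    return group_pct, volume_pct

# A site that ignores both features: no group assigned, empty threshold table.
db.execute("INSERT INTO customers (customer_id) VALUES (1)")
ignoring_both = discounts(1, 500.0)

# A site that fills in both features; the calculation code does not change.
db.execute("INSERT INTO customer_groups VALUES (10, 5.0)")
db.execute("UPDATE customers SET group_id = 10 WHERE customer_id = 1")
db.execute("INSERT INTO volume_discounts VALUES (100.0, 2.0)")
db.execute("INSERT INTO volume_discounts VALUES (1000.0, 4.0)")
using_both = discounts(1, 500.0)
&lt;/pre&gt;

&lt;p&gt;Note that the function contains no switch asking whether a
   feature is turned on.  The data alone decides whether each
   discount has any effect.
&lt;/p&gt;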

&lt;p&gt;So the real trick at the business logic level is
   not to figure out which feature to use, which leads
   to complicated conditionals that always end up
   conflicting with each other, but to &lt;i&gt;always use
   all features and code them so they have no effect
   when they are being ignored.&lt;/i&gt;
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Once upon a time almost everybody coding for a living
   dealt with these situations -- we all wrote code that
   was going to ship off to live at our customer's site.
   Nowadays this is less common, but for those of us
   dealing with it, it is a big deal.
&lt;/p&gt;

&lt;p&gt;The wisdom of the ages is to maintain a common code
   base.  The method suggested here takes that idea
   to its most complete implementation, a totally common
   code base in which all features are active all of
   the time, with no conditionals or optional features
   (except perhaps in the UI and on printed reports),
   and with schema and business logic set up so that
   features that are being ignored simply have no
   effect on the user.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-3712061853478620564?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/3712061853478620564/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=3712061853478620564' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/3712061853478620564'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/3712061853478620564'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2011/01/maintaining-one-code-base-with-possibly.html' title='Maintaining One Code Base with Possibly Conflicting Custom Features'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-829227930117479016</id><published>2011-01-06T20:21:00.005-05:00</published><updated>2011-01-06T23:41:42.192-05:00</updated><title type='text'>Can You Really Create A Business Logic Layer?</title><content type='html'>&lt;p&gt;The past three posts of this little mini-series
   have gone from a &lt;a href="http://database-programmer.blogspot.com/2010/12/working-definition-of-business-logic.html"
   &gt;Working definition of business logic&lt;/a&gt;
   to a &lt;a href="http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html"
   &gt;Rigorous definition of business logic&lt;/a&gt;
   and on to some &lt;a href="http://database-programmer.blogspot.com/2011/01/theorems-regarding-business-logic.html"
   &gt;theorems about business logic&lt;/a&gt;.
   To wrap things up, I'd like to ask the question,
   is it possible to isolate business logic into
   a single tier?  
&lt;/p&gt;

&lt;h2&gt;Related Reading&lt;/h2&gt;

&lt;p&gt;There are plenty of opinions out there.
   For a pretty thorough explanation of how to put
   everything into the DBMS, check out 
   &lt;a href="http://thehelsinkideclaration.blogspot.com/2009/04/helsinki-code-layers-in-dbms.html"
   &gt;Toon Koppelaars' description&lt;/a&gt;.  Mr.
   Koppelaars has some good material, but you do 
   need to read through his earlier posts to get
   the definitions of some of his terms.  You can also
   follow his links through to some high quality
   discussions elsewhere.
&lt;/p&gt;
   
&lt;p&gt;Contrasting with Mr. Koppelaars' opinion is a piece
   which does not have nearly the same impact, IMHO,
   because in 
   &lt;a href="http://www.codeproject.com/KB/architecture/DudeWheresMyBusinessLogic.aspx"
   &gt;Dude, Where's My Business Logic?&lt;/a&gt; we get some solid
   history mixed with normative assertions based on
   either anecdote or nothing at all.  I'm a big believer
   in anecdote, but when I read a sentence that 
   says, "The database should not have any knowledge of what a customer is, but only of the elements that are used to store a customer." then
   I figure I'm dealing with somebody who needs to see 
   a bit more of the world.
&lt;/p&gt;


&lt;h2&gt;Starting At the Top: The User Interface&lt;/h2&gt;

&lt;p&gt;First, let's review that our rigorous definition of business logic 
   includes schema (types and constraints), 
   derived values (timestamps, userstamps, calculations,
   histories), non-algorithmic compound operations
   (like batch billing) and algorithmic compound
   operations, those that require looping in their
   code.  This encompasses everything we might do
   from the simplest passive things like a constraint
   that prevents discounts from being over 100% to
   the most complex hours-long business process,
   and everything in between.
&lt;/p&gt;

&lt;p&gt;Now I want to start out by using that definition
   to see a little bit about what is going on in
   the User Interface.  This is not the &lt;i&gt;presentation&lt;/i&gt;
   layer as it is often called but the &lt;i&gt;interaction&lt;/i&gt;
   layer and even the &lt;i&gt;command&lt;/i&gt; layer.
&lt;/p&gt;


&lt;p&gt;Consider an admin interface to 
   a database, where the user is entering or modifying
   prices for the price list.  Now, if the user could
   enter "Kim Stanley Robinson" as the price, that would be
   kind of silly, so of course the numeric inputs
   only allow numeric values.  Same goes for dates. 
&lt;/p&gt;

&lt;p&gt;So the foundation of usability for a UI 
   is at the very least
   knowledge of &lt;i&gt;and enforcement of&lt;/i&gt; types in
   the UI layer.  Don't be scared off by my
   claim that the UI is enforcing anything; we'll 
   get to that a little lower down.
&lt;/p&gt;
   
&lt;p&gt;Now consider the case where the user is 
   typing in a discount rate for this or that,
   and a discount is not allowed to be over 100%.
   The UI really ought to enforce this,
   otherwise the user's time is wasted when she
   enters an invalid value, finishes the entire form,
   and only then gets an error when she tries to
   save.  In the database world we call this
   a constraint, so the UI needs to know about
   constraints to better serve the user.
&lt;/p&gt;

&lt;p&gt;Now this same user is typing a form where there
   is an entry for US State.  The allowed values are
   in a table in the database, and it would be nice
   if the user had a drop-down list, and one that
   was auto-suggesting as the user typed.  Of course 
   the easiest way to do something like this is just
   make sure the UI form "knows" that this field is
   a foreign key to the STATES table, so it can generate
   the list using some generic library function that
   grabs a couple of columns out of the STATES
   table.  Of course, this kind of lookup thing will
   be happening all over the place, so it would work
   well if the UI knew about &lt;i&gt;and enforced&lt;/i&gt; foreign
   keys during entry.
&lt;/p&gt;
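
&lt;p&gt;One way such a generic library function could look, sketched
   in Python against SQLite.  The function name lookup_choices and
   the two tables are assumptions for illustration.  The sketch
   asks the database itself which columns are foreign keys through
   SQLite's foreign_key_list pragma, and it interpolates table
   names directly into SQL, which is only acceptable because they
   come from trusted code rather than from user input.
&lt;/p&gt;

&lt;pre&gt;
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE states (
        state TEXT PRIMARY KEY,
        name  TEXT NOT NULL
    );
    INSERT INTO states VALUES ('NY', 'New York'), ('CA', 'California');
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        state       TEXT REFERENCES states (state)
    );
""")

def lookup_choices(table, column):
    """Return rows for a drop-down if the column is a foreign key, else None."""
    # Each pragma row is (id, seq, parent_table, from_column, to_column, ...)
    for fk in db.execute(f"PRAGMA foreign_key_list({table})"):
        parent_table, from_col, to_col = fk[2], fk[3], fk[4]
        if from_col == column:
            query = f"SELECT * FROM {parent_table} ORDER BY {to_col}"
            return db.execute(query).fetchall()
    return None

# The form never hardcodes the STATES table; it only names its own column.
state_choices = lookup_choices("customers", "state")
&lt;/pre&gt;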

&lt;p&gt;And I suppose the user might at some point be 
   entering a purchase order.  The purchase order is
   automatically stamped with today's date.  The
   user might see it, but not be able to change it,
   so now our UI knows about system-generated values.
&lt;/p&gt;

&lt;p&gt;Is this user allowed to delete a customer?
   If not, the button should either be grayed out or not
   be there at all.  The UI needs to know about 
   &lt;i&gt;and enforce&lt;/i&gt; some security.
&lt;/p&gt;

&lt;h2&gt;More About Knowing and Enforcing&lt;/h2&gt;

&lt;p&gt;So in fact the UI layer not only knows the logic
   but is enforcing it.  It is enforcing it for
   two reasons, to improve the user experience with
   date pickers, lists, and so forth, and to prevent the user
   from entering invalid data and wasting round trips.
&lt;/p&gt;

&lt;p&gt;And yet, because we cannot trust what comes in
   to the web server over the wire, we have to
   &lt;i&gt;enforce every single rule a second time when
   we commit the data.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;You usually do not hear people say that the UI
   enforces business logic.  They usually say the
   opposite.  But the UI does enforce business logic.
   The problem is, everything the UI enforces has
   to be enforced again.  That may be why we often
   overlook the fact that it is doing so.
&lt;/p&gt;
   
&lt;h2&gt;The Application and The Database&lt;/h2&gt;

&lt;p&gt;Now let's go through the stuff the UI is
   enforcing, and see
   what happens in the application and the database.
&lt;/p&gt;
   
&lt;p&gt;With respect to &lt;b&gt;type&lt;/b&gt;, a strongly typed language
   will throw an error if the type is wrong, and a weakly
   typed language is wise to put in a type check anyway.
   Then the DBMS is going to only allow correctly typed
   values, so, including the UI, 
   &lt;i&gt;type is enforced three times&lt;/i&gt;.
&lt;/p&gt;

&lt;p&gt;With respect to &lt;b&gt;lookups&lt;/b&gt; like US state, in
   a SQL database we always let the server do that
   with a foreign key, if we know what is good for
   us.  That makes double enforcement for lookups.
&lt;/p&gt;

&lt;p&gt;So we can see where this is going.  As we look at
   constraints and security and anything else that
   must be right, we find it will be enforced at least
   twice, and as much as three times.
&lt;/p&gt;
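
&lt;p&gt;To make the double enforcement concrete, here is a small
   sketch in Python with the built-in sqlite3 module, reusing the
   discount rule from earlier.  The function ui_check_discount and
   the price_list table are invented for illustration.  The same
   rule appears once as a check that runs before the form is
   submitted and once as a constraint in the database.
&lt;/p&gt;

&lt;pre&gt;
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE price_list (
        sku          TEXT PRIMARY KEY,
        discount_pct REAL NOT NULL CHECK (discount_pct BETWEEN 0 AND 100)
    )
""")

def ui_check_discount(text):
    """First enforcement: return an error message, or None if acceptable."""
    try:
        value = float(text)                    # type check
    except ValueError:
        return "Discount must be a number"
    # Clamping changes the value only when it lies outside 0..100.
    if value != min(max(value, 0.0), 100.0):   # constraint check
        return "Discount must be between 0 and 100"
    return None

# The UI catches the bad value before any round trip is made...
ui_error = ui_check_discount("150")

# ...but a request that bypasses the UI is still stopped by the database.
try:
    db.execute("INSERT INTO price_list VALUES ('A-1', 150)")
    db_rejected = False
except sqlite3.IntegrityError:
    db_rejected = True
&lt;/pre&gt;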

&lt;h2&gt;You Cannot Isolate What Must be Duplicated&lt;/h2&gt;

&lt;p&gt;By defining First Order Business Logic, the simplest
   foundation layer, as including things like types
   and keys and constraints, we find that the enforcement
   of this First Order stuff is done 2 or 3 times, but
   never only once.  
&lt;/p&gt;

&lt;p&gt;This more or less leaves in tatters the idea of a 
   "Business Logic Layer" that is in any way capable of
   handling all business logic all by its lonesome.
   The UI layer is completely useless unless it is
   also enforcing as much logic as possible, and 
   even when we leave the Database Server as the
   final enforcer of First Order Business Logic
   (types, constraints, keys), it is still often good
   engineering to do some checks to prevent 
   expensive wasted trips to the server.
&lt;/p&gt;

&lt;p&gt;So we are wasting time if we sit around trying to figure
   out how to get the Business Logic 
   "where it belongs", because it "belongs" in at
   least two places and sometimes three.  Herding 
   the cats into a single pen is a fool's errand: it
   is at once unnecessary, undesirable, and impossible.
&lt;/p&gt;   

&lt;p&gt;&lt;b&gt;Update: Regular reader Dean Thrasher of Infovark summarizes
   most of what I'm saying here using an apt industry
   standard term: Business Logic is a &lt;i&gt;cross-cutting concern&lt;/i&gt;.
&lt;/b&gt;&lt;/p&gt;

&lt;h2&gt;Some Real Questions&lt;/h2&gt;

&lt;p&gt;Only when we have smashed the concept that Business
   Logic can exist in serene isolation in its own layer
   can we start to ask the questions that would actually
   speed up development and make for better engineering.
&lt;/p&gt;

&lt;p&gt;Freed of the illusion of a separate layer, when we
   look at the higher Third and Fourth Order Business
   Logic, which always require coding, we can decide where
   they go based either on &lt;a href="http://database-programmer.blogspot.com/2010/12/critical-analysis-of-algorithm-sproc.html"
   &gt;engineering&lt;/a&gt; or the availability of qualified
   programmers in particular technologies,
   but we should not make
   the mistake of believing they are going where they
   go because the gods would have it so.
&lt;/p&gt;
   
&lt;p&gt;But the real pressing question if we are seeking
   to create efficient manageable large systems is
   this: how do we distribute
   the same business logic into 2 or 3 (or more)
   different places so that it is enforced
   consistently everywhere?  Because a smaller code
   base is always easier to manage than a large one,
   and because configuration is always easier than
   coding, this comes down to meta-data, or if you
   prefer, a data dictionary.  That's the trick that
   always worked for me.
&lt;/p&gt;   
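
&lt;p&gt;As a rough sketch of the data dictionary idea, the Python
   below keeps one description of a table and derives both the
   database constraint and the pre-submit validation from it.
   The dictionary layout and the names build_ddl and validate are
   assumptions for illustration, not any particular tool's format.
&lt;/p&gt;

&lt;pre&gt;
import sqlite3

# One hypothetical dictionary entry describing a table.
DICTIONARY = {
    "price_list": {
        "sku":          {"type": "TEXT", "primary_key": True},
        "discount_pct": {"type": "REAL", "min": 0, "max": 100},
    }
}

def build_ddl(table):
    """Generate CREATE TABLE, including CHECK constraints, from the dictionary."""
    parts = []
    for name, spec in DICTIONARY[table].items():
        col = f"{name} {spec['type']} NOT NULL"
        if spec.get("primary_key"):
            col += " PRIMARY KEY"
        if "min" in spec:
            col += f" CHECK ({name} BETWEEN {spec['min']} AND {spec['max']})"
        parts.append(col)
    return f"CREATE TABLE {table} ({', '.join(parts)})"

def validate(table, row):
    """Apply the same rules before submission; return a list of messages."""
    errors = []
    for name, spec in DICTIONARY[table].items():
        value = row.get(name)
        if value is None:
            errors.append(f"{name} is required")
        elif "min" in spec and value != min(max(value, spec["min"]), spec["max"]):
            errors.append(f"{name} must be between {spec['min']} and {spec['max']}")
    return errors

db = sqlite3.connect(":memory:")
db.execute(build_ddl("price_list"))

bad_row = {"sku": "A-1", "discount_pct": 150}
ui_errors = validate("price_list", bad_row)
try:
    db.execute("INSERT INTO price_list VALUES (:sku, :discount_pct)", bad_row)
    db_rejected = False
except sqlite3.IntegrityError:
    db_rejected = True
&lt;/pre&gt;

&lt;p&gt;Changing the maximum in the dictionary changes both
   enforcement points at once, which is the consistency the
   question above is asking for.
&lt;/p&gt;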

&lt;h2&gt;Is This Only A Matter of Definitions?&lt;/h2&gt;

&lt;p&gt;Anybody who disagrees with the thesis here has
   only to say, "Ken, those things are not business
   logic just because you wrote a blog that says they
   are.  In my world business logic is about &lt;b&gt;code&lt;/b&gt;
   baby!"  Well sure, have it your way.
   After all, the nice thing about definitions is that we
   can all pick the ones we like. 
&lt;/p&gt;

&lt;p&gt;But these definitions, the theorems I derived on
   Tuesday, and the multiple-enforcement thesis presented
   here today should make sense to anybody struggling
   with where to put the business logic.  That struggle
   and its frustrations come from the mistake of
   &lt;i&gt;imposing abstract
   conceptual responsibilities&lt;/i&gt; on each tier instead
   of &lt;i&gt;using the tiers as each is able to get the
   job done.&lt;/i&gt;  Databases are wonderful for type,
   entity integrity (uniqueness), referential integrity,
   ACID compliance, and many other things.  Use them!
   Code is often better when the problem at hand cannot
   be solved with a combination of keys and constraints
   (Fourth Order Business Logic), but even that code can
   be put into the DB or in the application.
&lt;/p&gt;

&lt;p&gt;So beware of paradigms that assign responsibility
   without compromise to this or that tier.  It cannot
   be done.  Don't be afraid to use code for doing things
   that require structured imperative step-wise operations,
   and don't be afraid to use the database for what it is
   good for, and leave the arguments about "where everything
   belongs" to those with too much time on their hands.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-829227930117479016?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/829227930117479016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=829227930117479016' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/829227930117479016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/829227930117479016'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2011/01/can-you-really-create-business-logic.html' title='Can You Really Create A Business Logic Layer?'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-7146648252760964212</id><published>2011-01-04T17:33:00.001-05:00</published><updated>2011-01-04T17:33:38.639-05:00</updated><title type='text'>Theorems Regarding Business Logic</title><content type='html'>&lt;p&gt;In yesterday's &lt;a href="http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html"
   &gt;Rigorous Definition of Business Logic&lt;/a&gt;, we saw that
   business logic can be defined in four orders:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;First Order Business Logic is entities and
    attributes that users (or other agents) can save,
    and the security rules that govern read/write
    access to the entities and attributes.
    &lt;li&gt;Second Order Business Logic is entities
    and attributes derived by rules and formulas,
    such as calculated values and history tables.
    &lt;li&gt;Third Order Business Logic consists of non-algorithmic
    compound operations (no structure or looping is
    required in expressing the solution), such as
    a month-end batch billing or, for the old-timers
    out there, a year-end general ledger
    roll-up.
    &lt;li&gt;Fourth Order Business Logic consists of algorithmic
    compound operations.  These occur when the action
    of one step affects the input to future steps.
    One example is ERP Allocation.
&lt;/ul&gt;

&lt;h2&gt;A Case Study&lt;/h2&gt;

&lt;p&gt;The best way to see if these have any value is to
   cook up some theorems and examine them with an
   example.  We will take
   a vastly simplified time billing system, in which
   employees enter time which is billed once/month to
   customers.  We'll work out some details a little below.
&lt;/p&gt;

&lt;h2&gt;Theorem 1: 1st and 2nd Order, Analysis&lt;/h2&gt;

&lt;p&gt;The first theorem we can derive from these definitions
   is that we should look at First and Second Order Schemas
   together during analysis.  This is because:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;First Order Business Logic is about entities and attributes
    &lt;li&gt;Second Order Business Logic is about entities and attributes
    &lt;li&gt;Second Order Business Logic is about values
    generated from First Order values and, possibly,
    other Second Order values
    &lt;li&gt;Therefore, Second Order values are always 
    expressed ultimately in terms of First Order
    values
    &lt;li&gt;Therefore, they should be analyzed together
&lt;/ul&gt;

&lt;p&gt;To give the devil his due, ORM does this easily, because
   it ignores so much database theory (paying a large price
   in performance for doing so) and 
   considers an entire row, with its first order and
   second order values together, as being part of one class.
   This is likely the foundation for the claims of ORM
   users that they experience productivity gains when
   using ORM.  Since I usually do nothing but bash ORM,
   I hope this statement will be taken as utterly sincere.
&lt;/p&gt;

&lt;p&gt;Going the other way, database theorists and evangelists
   who adhere to full normalization can hobble an
   analysis effort by refusing to consider
   2nd order because those values &lt;i&gt;denormalize&lt;/i&gt; the database,
   so sometimes the worst of my own crowd will prevent
   analysis by trying to keep these out of the conversation.
   So, assuming I have not pissed off my own friends,
   let's keep going.
&lt;/p&gt;

&lt;p&gt;So let's look at our case study of the time billing
   system.  By theorem 1, our analysis of entities and
   attributes should include both 1st and 2nd order
   schema, something like this:
&lt;/p&gt;

&lt;pre&gt; 
 INVOICES
-----------
 invoiceid      2nd Order, a generated unique value
 date           2nd Order if always takes date of batch run
 customer       2nd Order, a consequence of this being an
                           aggregation of INVOICE_LINES
 total_amount   2nd Order, a sum from INVOICE_LINES
               
 INVOICE_LINES
---------------
 invoiceid      2nd order, copied from INVOICES
 customer         +-  All three are 2nd order, a consequence
 employee         |   of this being an aggregation of
 activity         +-  employee time entries
 rate           2nd order, taken from ACTIVITIES table
                           (not depicted)
 hours          2nd order, summed from time entries
 amount         2nd order, rate * hours
 
 TIME_ENTRIES
--------------
 employeeid     2nd order, assuming system forces this
                    value to be the employee making
                    the entry
 date           1st order, entered by employee
 customer       1st order, entered by employee
 activity       1st order, entered by employee
 hours          1st order, entered by employee
&lt;/pre&gt;

&lt;p&gt;Now, considering how much of that is 2nd order, which
   is almost all of it, the theorem is not only supported
   by the definition, but ought to line up squarely
   with our experience.  Who would want to try to analyze
   this and claim that all the 2nd order stuff should
   not be there?
&lt;/p&gt;

&lt;h2&gt;Theorem 2: 1st and 2nd Order, Implementation&lt;/h2&gt;

&lt;p&gt;The second theorem we can derive from these definitions
   is that First and Second Order Business logic require
   separate implementation techniques.  This is because:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;First Order Business Logic is about user-supplied values
    &lt;li&gt;Second Order Business Logic is about generated values
    &lt;li&gt;Therefore, unlike things cannot be implemented with
    like tools.
&lt;/ul&gt;

&lt;p&gt;Going back to the time entry example, let's zoom in on
   the lowest table, the TIME_ENTRIES.  The employee 
   entering her time must supply customer, date, activity, and
   hours, while the system forces the value of employeeid.
   This means that customer and activity must be validated
   in their respective tables, and hours must be checked
   for something like &amp;lt;= 24.  But for employeeid the
   system provides the value out of its context.
   So the two kinds of values are processed in very
   unlike ways.  It seems reasonable that our code would
   be simpler if it did not try to force both kinds of
   values down the same validation pipe.
&lt;/p&gt;
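
&lt;p&gt;Here is a minimal Python sketch of the two pipes, offered
   as an illustration only.  The lookup sets, the function names
   and the session_user argument are hypothetical stand-ins that
   I made up for this example; they do not come from any real system.
&lt;/p&gt;

&lt;pre&gt;
# Hypothetical stand-ins for the CUSTOMERS and ACTIVITIES tables.
CUSTOMERS = {"ACME", "GLOBEX"}
ACTIVITIES = {"CONSULT", "TRAVEL"}

def validate_first_order(entry):
    """Check only the values the employee typed in."""
    errors = []
    if entry["customer"] not in CUSTOMERS:
        errors.append("unknown customer")
    if entry["activity"] not in ACTIVITIES:
        errors.append("unknown activity")
    if not (24 >= entry["hours"] > 0):
        errors.append("hours out of range")
    return errors

def supply_second_order(entry, session_user):
    """Force system-owned values; whatever the client sent is overwritten."""
    completed = dict(entry)
    completed["employeeid"] = session_user
    return completed

def save_time_entry(entry, session_user):
    errors = validate_first_order(entry)
    if errors:
        raise ValueError("; ".join(errors))
    return supply_second_order(entry, session_user)

row = save_time_entry(
    {"customer": "ACME", "date": "2011-01-21", "activity": "CONSULT",
     "hours": 6, "employeeid": "someone_else"},
    session_user="ksmith",
)
&lt;/pre&gt;

&lt;p&gt;The employeeid that arrived with the entry is never validated
   at all, it is simply replaced from context, which is the point
   of keeping the two kinds of values apart.
&lt;/p&gt;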
    
&lt;h2&gt;Theorem 3: 2nd and 3rd Order, Conservation of Action&lt;/h2&gt;

&lt;p&gt;This theorem states that
   the sum of Second and Third Order
   Business Logic is fixed:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Second Order Business Logic is about generating
    entities and attributes by rules or formulas
    &lt;li&gt;Third Order Business Logic is coded
    compound creation of entities and attributes
    &lt;li&gt;Given that a particular set of requirements
    resolves to a finite set of actions that generate
    entities and values, then
    &lt;li&gt;The sum of Second Order and Third Order Business
    Logic is fixed.
&lt;/ul&gt;

&lt;p&gt;In plain English, this means that the more Business
   Logic you can implement through 2nd Order
   &lt;i&gt;declarative&lt;/i&gt; rules and formulas, the fewer
   processing routines you have to code.  Or, if you
   prefer, the more processes you code, the fewer 
   declarative rules about entities and 
   attributes you will have.
&lt;/p&gt;

&lt;p&gt;This theorem may be hard to verify against experience
   because most of us are so used to thinking of
   batch billing as a process that we cannot imagine it
   being implemented any other way: how exactly am I
   supposed to implement batch billing &lt;i&gt;declaratively?&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;Let's go back to the schema above, where we can 
   realize upon examination that the entirety of the batch
   billing "process" has been detailed in a 2nd Order
   Schema.  If we could somehow add these facts to our
   CREATE TABLE commands the way we add keys, types,
   and constraints, batch billing would occur
   without the batch part.
&lt;/p&gt;

&lt;p&gt;Consider this.  Imagine that a user enters a
   TIME_ENTRY.  The system
   checks for a matching EMPLOYEE/CUSTOMER/ACTIVITY
   row in INVOICE_LINES, and when it finds the row
   it updates the totals.  But if it does not find 
   one then it creates one!  Creation
   of the INVOICE_LINES row causes the system to
   check for the existence of an invoice for that
   customer, and when it does not find one it creates
   it and initializes the totals.  Subsequent time entries
   update not only the INVOICE_LINES rows but the
   INVOICES rows as well.  If this were happening, there would be no
   batch billing at the end of the month because the
   invoices would all be sitting there ready to go
   when the last time entry was made.
&lt;/p&gt;
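
&lt;p&gt;The cascade just described can be sketched in a few lines of
   Python.  This is only an in-memory illustration using plain
   dictionaries, not how a database or Triangulum does it, and the
   RATES table and the function name are assumptions made up for
   the example.
&lt;/p&gt;

&lt;pre&gt;
invoices = {}        # keyed by customer
invoice_lines = {}   # keyed by (customer, employee, activity)
RATES = {"CONSULT": 100, "TRAVEL": 50}   # hypothetical ACTIVITIES rates

def post_time_entry(employee, customer, activity, hours):
    key = (customer, employee, activity)
    line = invoice_lines.get(key)
    if line is None:
        # No matching line: create it, and create the invoice
        # for this customer if that does not exist either.
        if customer not in invoices:
            invoices[customer] = {"total_amount": 0}
        line = {"rate": RATES[activity], "hours": 0, "amount": 0}
        invoice_lines[key] = line
    old_amount = line["amount"]
    line["hours"] += hours
    line["amount"] = line["rate"] * line["hours"]
    # Push the change in the line amount up to the invoice total.
    invoices[customer]["total_amount"] += line["amount"] - old_amount

post_time_entry("ann", "ACME", "CONSULT", 2)
post_time_entry("ann", "ACME", "CONSULT", 3)
post_time_entry("bob", "ACME", "TRAVEL", 1)
&lt;/pre&gt;

&lt;p&gt;After those three entries there is one invoice for ACME with
   two lines and a total of 550, and nobody ran a batch.
&lt;/p&gt;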

&lt;p&gt;By the way, I coded something that does this in a
   pretty straightforward way a few years ago, meaning
   you could skip the batch billing process and add a few
   details to a schema that would cause the database to
   behave exactly as described above.  Although
   the format for specifying these extra features
   was easy enough (so it seemed to me as the author),
   it seemed the &lt;i&gt;conceptual shift of thinking&lt;/i&gt;
   that it required of people was far larger than I
   initially and naively imagined.  Nevertheless, 
   I toil forward, and that is
   the core idea behind my &lt;a href="http://code.google.com/p/triangulum-db/"
   &gt;Triangulum&lt;/a&gt; project.
&lt;/p&gt;
   
   

&lt;h2&gt;Observation: There Will Be Code&lt;/h2&gt;

&lt;p&gt;This is not so much a theorem as an observation.
   This observation is that if your application
   requires Fourth Order Business Logic then somebody
   is going to code something somewhere.
&lt;/p&gt;

&lt;p&gt;An anonymous reader pointed out in the comments
   to &lt;a href="http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html"
   &gt;Part 2&lt;/a&gt; that Oracle's MODEL clause may work
   in some cases.  I would assume so, but I would also
   assume that reality can create complicated Fourth
   Order cases faster than SQL can evolve.  Maybe.
&lt;/p&gt;
   

&lt;p&gt;But anyway, the real observation here is that
   no modern language, either app 
   level or SQL flavor, can express an algorithm
   declaratively.  In other words, no combination
   of keys, constraints, calculations and derivations,
   and no known combination of advanced SQL functions
   and clauses
   will express an ERP Allocation routine or a
   Magazine Regulation routine.  So you have to code it.
   This may not always be true, but I think it is
   true now.
&lt;/p&gt;

&lt;p&gt;This is in contrast to the example given in the
   previous section about the fixed total of
   2nd and 3rd Order Logic.  Unlike that example,
   you cannot provide enough
   2nd order wizardry to eliminate fourth order.
   &lt;i style="color:gray"&gt;(well ok maybe you can,
   but I haven't figured it
   out yet myself and have never heard that anybody
   else is even trying.  The trick would be to have
   a table that you truncate and insert a single row
   into, a trigger would fire that would know how
   to generate the
   next INSERT, generating a cascade.  Of course, since
   this happens in a transaction, if you end up 
   generating 100,000 inserts this might be a bad
   idea ha ha.)&lt;/i&gt;
&lt;/p&gt;

&lt;h2&gt;Theorem 5: Second Order Tools Reduce Code&lt;/h2&gt;

&lt;p&gt;This theorem rests on the acceptance of an observation,
   that using meta-data repositories, or data dictionaries,
   is easier than coding.  If that does not hold true,
   then this theorem does not hold true.  But if that 
   observation (my own observation, admittedly) does
   hold true, then:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;By Theorem 3, the sum of 2nd and 3rd order
    logic is fixed
    &lt;li&gt;By observation, using meta-data that manages
    schema requires less time than coding,
    &lt;li&gt;By Theorem 1, 2nd order is analyzed and specified
    as schema
    &lt;li&gt;Then it is desirable to specify as much business
    logic as possible as 2nd order schema, reducing
    and possibly eliminating manual coding of Third
    Order programs.
&lt;/ul&gt;

&lt;p&gt;Again we go back to the batch billing example.
   Is it possible to convert it all to 2nd Order as
   described above?  Well, yes it is, because I've done
   it.  The trick is an extremely counter-intuitive
   modification to a foreign key that causes a 
   failure to actually generate the parent row that
   would let the key succeed.  To find out more about
   this, check out &lt;a href="http://code.google.com/p/triangulum-db/"
   &gt;Triangulum&lt;/a&gt; (not ready for prime time as of this
   writing).
&lt;/p&gt; 

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;The major conclusion in all of this is that analysis
   and design should begin with First and Second Order
   Business Logic, which means working out schemas, both
   the user-supplied values and the system-supplied
   values.
&lt;/p&gt;

&lt;p&gt;When that is done, what we often call "processes" 
   are layered on top of this.
&lt;/p&gt;

&lt;p&gt;Tomorrow we will see part 4 of 4, examining the
   business logic layer, asking, is it possible to
   create a pure business logic layer that gathers
   all business logic unto itself?
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-7146648252760964212?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/7146648252760964212/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=7146648252760964212' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/7146648252760964212'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/7146648252760964212'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2011/01/theorems-regarding-business-logic.html' title='Theorems Regarding Business Logic'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-7772768006334848953</id><published>2011-01-02T13:05:00.004-05:00</published><updated>2011-01-04T17:36:12.995-05:00</updated><title type='text'>Business Logic: From Working Definition to Rigorous Definition</title><content type='html'>&lt;p&gt;This is part 2 of a 4 part mini-series that began
   before the holidays with  &lt;a href="http://database-programmer.blogspot.com/2010/12/working-definition-of-business-logic.html"
   &gt;A Working Definition of Business Logic&lt;/a&gt;.  Today we proceed
   to a rigorous definition, tomorrow we will see &lt;a href=
   "http://database-programmer.blogspot.com/2011/01/theorems-regarding-business-logic.html"&gt;some theorems&lt;/a&gt;,
   and the series will wrap up with a post on the "business layer."
&lt;/p&gt;

&lt;p&gt;In the first post, the working definition said that
   business logic includes at least:

&lt;ul&gt;&lt;li&gt;The Schema
    &lt;li&gt;Calculations
    &lt;li&gt;Processes
&lt;/ul&gt;

&lt;p&gt;None of these was very rigorously defined, kind of a 
   "I'll know it when I see it" type of thing, and we did
   not talk at all about security.  Now the task becomes
   tightening this up into a rigorous definition.
&lt;/p&gt;

&lt;h2&gt;Similar Reading&lt;/h2&gt;

&lt;p&gt;Toon Koppelaars has some excellent material along
   these same lines, and a good place to start is his
   &lt;a href="http://thehelsinkideclaration.blogspot.com/2009/03/window-on-data-applications.html"&gt;Helsinki Declaration (IT Version)&lt;/a&gt;.
   The articles have a different focus than this series,
   so they make great contrasting reading.  I consider
   my time spent reading through it very well spent.
&lt;/p&gt;
   

&lt;h2&gt;Definitions, Proofs, and Experience&lt;/h2&gt;

&lt;p&gt;What I propose below is a definition in four parts.
   As definitions, they are not supposed
   to prove anything, but they are definitely supposed
   to ring true to the experience of any developer
   who has created or worked on
   a non-trivial business application.  This effort
   would be a success if we reach some consensus that
   "at least it's all in there", even if we go
   on to argue bitterly about which components
   should be included in which layers.
&lt;/p&gt;

&lt;p&gt;Also, while I claim the definitions below are
   rigorous, they are not yet &lt;i&gt;formal&lt;/i&gt;.  My
   instinct is that formal definitions can be
   developed using &lt;a href="http://en.wikipedia.org/wiki/First-order_logic"
   &gt;First Order Logic&lt;/a&gt;, which would allow the
   theorems we will see tomorrow to move from
   "yeah that sounds about right" to being
   formally provable.
&lt;/p&gt;

&lt;p&gt;As for their practical benefit, inasmuch as
   "the truth shall make you free", we ought to be
   able to improve our architectures if we can settle
   at the very least &lt;i&gt;what we are talking about&lt;/i&gt;
   when we use the vague term "business logic."
&lt;/p&gt;   

&lt;h2&gt;The Whole Picture&lt;/h2&gt;

&lt;p&gt;What we commonly call "business logic", by
   which we vaguely mean, "That stuff I have
   to code up",
   can in fact be rigorously defined
   as having four parts, which I believe are
   best termed &lt;i&gt;orders&lt;/i&gt;, as there is a definite
   precedence to their discovery, analysis and implementation.
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;First Order: Schema 
    &lt;li&gt;Second Order: Derivations
    &lt;li&gt;Third Order: Non-algorithmic compound operations
    &lt;li&gt;Fourth Order: Algorithmic compound operations
&lt;/ul&gt;

&lt;p&gt;Now we examine each order in detail.
&lt;/p&gt;

&lt;h2&gt;A Word About Schema and NoSQL&lt;/h2&gt;

&lt;p&gt;Even "schema-less" databases have a schema, they
   simply do not enforce it in the database server.
   Consider: an eCommerce site using MongoDB is not
   going to be tracking the local zoo's animal
   feeding schedule, because that is out of scope.
   No, the code
   is limited to dealing with orders, order lines,
   customers, items and stuff like that.
&lt;/p&gt;
   
&lt;p&gt;&lt;i&gt;It is in the very act of expressing scope as
   "the data values we will handle" that a schema is
   developed.&lt;/i&gt;  This holds true regardless of whether
   the datastore will be a filesystem, an RDBMS, a 
   new NoSQL database, or anything else.
&lt;/p&gt;

&lt;p&gt;Because all applications have a schema, whether the
   database server enforces it or whether the 
   application enforces it, we need a vocabulary
   to discuss the schema.  Here we have an embarrassment
   of choices: we can talk about entities and attributes,
   classes and properties, documents and values, or
   columns and tables.  The choice of "entities and
   attributes" is likely best because it is as close as
   possible to an implementation-agnostic language.
&lt;/p&gt;

&lt;h2&gt;First Order Business Logic: Schema&lt;/h2&gt;

&lt;p&gt;We can define schema, including security, as:
&lt;/p&gt;

&lt;p class="quote"&gt;that body of entities and 
   their attributes whose relationships and
   values will be managed by the
   application stack, including the authorization of
   roles to read or write to entities and properties.
&lt;/p&gt;

&lt;p&gt;Schema in this definition does not include derived
   values of any kind or the processes that may operate
   on the schema values; those are higher orders of 
   business logic.  This means that the schema 
   actually defines &lt;i&gt;the entire body of values that
   the application will accept from outside sources
   (users and other programs) and commit to the
   datastore.&lt;/i&gt; Restating again into even more
   practical terms, the schema is the stuff users
   can save themselves.
&lt;/p&gt;
   
&lt;p&gt;With all of that said, let's enumerate the properties
   of a schema.  
&lt;/p&gt;   

&lt;p&gt;&lt;b&gt;Type&lt;/b&gt; is required for every attribute.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Constraints&lt;/b&gt; are limits to the values allowed
   for an attribute beyond its type.  We may have a
   discount percent that may not exceed 1.0 or 100%.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Entity Integrity&lt;/b&gt; is usually thought of 
   in terms of primary keys
   and the vague statement "you can't have duplicates."
   We cannot have a list of US States where "NY" is
   listed 4 times.  
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Referential Integrity&lt;/b&gt; means that when one
   entity links or refers to another entity, it must
   always refer to an existing entity.
   We cannot have some script kiddie flooding our 
   site with sales of 
   items "EAT_ME" and "F***_YOU", because those are
   not valid items.
&lt;/p&gt;

&lt;p&gt;The general term 'validation' is not included 
   because any particular validation rule
   is a combination of any or all of type, constraints,
   and integrity rules.
&lt;/p&gt;
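
&lt;p&gt;To make the four properties concrete, here is a small sketch
   using Python's built-in sqlite3 module.  The table and column
   names are invented for the example, and SQLite is used only
   because it needs no setup; note that SQLite applies type
   affinity rather than strict typing, so the "type" part is
   looser here than it would be in most servers.
&lt;/p&gt;

&lt;pre&gt;
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # SQLite leaves this off by default
db.executescript("""
    CREATE TABLE items (
        sku   TEXT PRIMARY KEY,                        -- entity integrity
        price NUMERIC NOT NULL                         -- type
    );
    CREATE TABLE order_lines (
        sku      TEXT NOT NULL REFERENCES items(sku),  -- referential integrity
        qty      INTEGER NOT NULL,
        discount NUMERIC NOT NULL
                 CHECK (discount BETWEEN 0 AND 1)      -- constraint
    );
    INSERT INTO items VALUES ('WIDGET', 9.99);
""")

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        db.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup = rejected("INSERT INTO items VALUES ('WIDGET', 1.00)")
bad_ref = rejected("INSERT INTO order_lines VALUES ('EAT_ME', 1, 0)")
bad_discount = rejected("INSERT INTO order_lines VALUES ('WIDGET', 1, 1.5)")
good = not rejected("INSERT INTO order_lines VALUES ('WIDGET', 1, 0.1)")
&lt;/pre&gt;

&lt;p&gt;The duplicate item, the sale of a non-existent item, and the
   150% discount are each refused, while the valid order line
   goes in.
&lt;/p&gt;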

&lt;h2&gt;Second Order Business Logic: Derived values&lt;/h2&gt;

&lt;p&gt;When we speak of derived values, we usually 
   mean calculated values, but some derivations
   are not arithmetic, so the more general term
   "derived" is better.  Derivations are:
&lt;/p&gt;

&lt;p class="quote"&gt;A complete entity or an attribute
   of an entity generated from other entities
   or attributes according to a formula or rule.
&lt;/p&gt;

&lt;p&gt;The definition is sufficiently general that
   a "formula or rule" can include conditional
   logic.
&lt;/p&gt;

&lt;p&gt;Simple arithmetic derived values include things
   like calculating price * qty, or summing an
   order total.
&lt;/p&gt;

&lt;p&gt;Simple non-arithmetic derivations include
   things like
   fetching the price of an item to use on an
   order line.  The price in the order is &lt;i&gt;defined&lt;/i&gt;
   as being a copy of the item's price at the
   time of purchase.
&lt;/p&gt;
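
&lt;p&gt;Both kinds of simple derivation fit in one short Python
   sketch.  The ITEMS dictionary and the add_order_line function
   are hypothetical names chosen for illustration.
&lt;/p&gt;

&lt;pre&gt;
ITEMS = {"WIDGET": {"price": 10}}   # hypothetical item master

def add_order_line(order, sku, qty):
    price = ITEMS[sku]["price"]      # non-arithmetic: copied at purchase time
    line = {"sku": sku, "qty": qty, "price": price,
            "extended": price * qty}                 # arithmetic: price * qty
    order["lines"].append(line)
    # arithmetic: the order total is the sum of its lines
    order["total"] = sum(l["extended"] for l in order["lines"])
    return line

order = {"lines": [], "total": 0}
add_order_line(order, "WIDGET", 3)
ITEMS["WIDGET"]["price"] = 12        # the item price changes later
add_order_line(order, "WIDGET", 1)
&lt;/pre&gt;

&lt;p&gt;The first line keeps its price of 10 after the item changes to
   12, and the order total comes to 42.  The user supplied only
   the sku and the quantity.
&lt;/p&gt;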

&lt;p&gt;An example of a complete entity being derived
   is a history table that tracks changes
   in some other table.
   This can also be implemented
   in NoSQL as a set of documents tracking the
   changes to some original document.
&lt;/p&gt;

&lt;p&gt;Security also applies to generated values
   only insofar as who can see them.  But security
   is not an issue for writing these values
   because by definition they are generated from
   formulas and rules, and so no outside user 
   can ever attempt to explicitly specify the
   value of a derived entity or property.
&lt;/p&gt;

&lt;p&gt;One final point about Second Order Business
   Logic is that it can be expressed declaratively,
   &lt;i&gt;if we have the tools&lt;/i&gt;, which we do not, at
   least not in common use.  I wrote one myself some
   years ago and am re-releasing it as &lt;a href=
   "http://code.google.com/p/triangulum-db/"
   &gt;Triangulum&lt;/a&gt;, but that is a post for another day.
&lt;/p&gt;

&lt;h2&gt;Sorting out First and Second Order&lt;/h2&gt;

&lt;p&gt;The definitions of First and Second Order Business Logic
   have the
   advantage of being agnostic to what kind of
   datastore you are using, and being agnostic
   to whether or not the derived values are
   materialized.  (In relational terms, derivations
   are almost always &lt;i&gt;denormalizing&lt;/i&gt; if
   materialized, so in a fully normalized database
   they will not be there, and you have to go through
   the application to get them.)
&lt;/p&gt;

&lt;p&gt;Nevertheless, these two definitions can right off
   bring some confusion to the term "schema." 
   Example: a history table is absolutely in a database schema,
   but I have called First Order Business Logic "schema" and
   Second Order Business Logic is, well, something else.
   The best solution here is to simply use the
   terms First Order Schema and Second Order Schema.
   An order_lines table is First Order schema, and 
   the table holding its history is Second Order Schema.
&lt;/p&gt;

&lt;p&gt;The now ubiquitous auto-incremented surrogate primary
   keys pose another stumbling block.  Because they are
   used so often (and so often because of seriously faulty
   reasoning, see &lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-sane-approach-to.html"
   &gt;A Sane Approach To Choosing Primary Keys&lt;/a&gt;) they
   would automatically be considered schema -- one of the
   very basic values of a sales order, check, etc.  But
   they are system-generated so they must be Second Order, no?
   Isn't the orderid a very basic part of the schema and
   therefore First Order?  No.  In fact, by these 
   definitions, very little if any of an order header 
   is First Order.  The tiny fragments that are First Order
   might be the shipping address, the user's choice of
   shipping method, and payment details provided by the
   user.  The other information that is system-generated,
   like Date, OrderId, and order total are all Second
   Order.
&lt;/p&gt;

&lt;h2&gt;Third Order Business Logic&lt;/h2&gt;

&lt;p&gt;Before defining Third Order Business Logic
   I would like to offer a simple example:
   &lt;b&gt;Batch Billing&lt;/b&gt;.  A consulting
   company bills by the hour.  Employees enter time
   tickets throughout the day.  At the end of the
   month the billing agent runs a program that, in
   SQL terms:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Inserts a row into INVOICES for each
    customer with any time entries
    &lt;li&gt;Inserts a row into INVOICE_LINES that
    aggregates the time for each employee/customer
    combination.
&lt;/ul&gt;

&lt;p&gt;This example ought to make clear what I mean by
   defining Third Order Business Logic as:
&lt;/p&gt;

&lt;p class="quote"&gt;A non-algorithmic compound 
   operation.
&lt;/p&gt;

&lt;p&gt;The "non-algorithmic" part comes from the fact that
   none of the individual documents, an INVOICE
   row and its INVOICE_LINES, is dependent on any other.
   There is no case in which the
   invoice for one customer will influence the value
   of the invoice for another.   You do not need an
   algorithm to do the job, just one or more steps
   that may have to go in a certain order.
&lt;/p&gt;

&lt;p&gt;Put another way, it is a one-pass set-oriented
   operation.  The fact that it must be executed in
   two steps is an &lt;i&gt;artifact&lt;/i&gt; of how database
   servers deal with referential integrity, which is
   that you need the headers before you can put in
   the detail.  In fact,
   when using a NoSQL database, it may be possible to 
   insert the complete set of documents in one 
   command, since the lines can be nested directly
   into the invoices.
&lt;/p&gt;

&lt;p&gt;Put yet a third way, in more practical terms,
   there is no conditional or looping logic required
   to &lt;i&gt;specify the operation&lt;/i&gt;.  This does not
   mean there will be no looping logic in the final
   implementation, because performance concerns and
   locking concerns may cause it to be implemented
   with 'chunking' or other strategies, but the
   important point is that the &lt;i&gt;specification&lt;/i&gt;
   does not include loops or step-wise operations
   because the individual invoices are all 
   functionally independent of each other.
&lt;/p&gt;
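
&lt;p&gt;As a sketch of the batch billing example, here it is in
   Python, written as two independent grouping steps.  The field
   names and the batch_bill function are assumptions for the
   example.  The for loop is only how Python spells a GROUP BY;
   no invoice ever looks at another invoice.
&lt;/p&gt;

&lt;pre&gt;
from collections import defaultdict

def batch_bill(time_entries):
    # Step 1: one INVOICES row per customer with any time entries.
    customers = sorted({e["customer"] for e in time_entries})
    invoices = {c: n for n, c in enumerate(customers, start=1)}
    # Step 2: one INVOICE_LINES row per employee/customer combination.
    hours = defaultdict(int)
    for e in time_entries:
        hours[(e["customer"], e["employee"])] += e["hours"]
    lines = [
        {"invoiceid": invoices[c], "employee": emp, "hours": h}
        for (c, emp), h in sorted(hours.items())
    ]
    return invoices, lines

entries = [
    {"customer": "ACME", "employee": "ann", "hours": 2},
    {"customer": "ACME", "employee": "ann", "hours": 3},
    {"customer": "GLOBEX", "employee": "bob", "hours": 4},
]
invoices, lines = batch_bill(entries)
&lt;/pre&gt;

&lt;p&gt;Step 2 needs the invoice ids from step 1, which is the
   header-before-detail ordering mentioned above, but nothing
   computed for ACME changes what is computed for GLOBEX.
&lt;/p&gt;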

&lt;p&gt;I do not want to get side-tracked here, but I
   have had a working hypothesis in my mind for
   almost 7 years that Third Order Business Logic,
   even before I called it that, is an &lt;i&gt;artifact&lt;/i&gt;,
   which appears necessary because of the limitations
   of our tools.  In future posts I would like to
   show how a fully developed understanding and
   implementation of Second Order Business Logic 
   can dissolve many cases of Third Order.
&lt;/p&gt;

&lt;h2&gt;Fourth Order Business Logic&lt;/h2&gt;

&lt;p&gt;We now come to the upper bound of complexity
   for business logic, Fourth Order, which
   we label "algorithmic compound operations",
   and define a particular Fourth Order Business
   Logic process as:
&lt;/p&gt;

&lt;p class="quote"&gt;Any operation where it
   is possible or certain that
   there will be at least
   two steps, X and Y, such that the result
   of Step X modifies the inputs available to
   Step Y.
&lt;/p&gt;

&lt;p&gt;In comparison to Third Order:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;In Third Order the results are 
    independent of one another, in Fourth Order
    they are not.
    &lt;li&gt;In Third Order no conditional or branching
    is required to express the solution, while in
    Fourth Order conditional, looping, or branching
    logic will be present in the expression of the
    solution.
&lt;/ul&gt;

&lt;p&gt;Let's look at the example of ERP Allocation.
   In the interest of brevity, I am going to skip most
   of the explanation of the ERP Allocation algorithm
   and stick to this basic review: a company has a list
   of sales orders (demand) and a list of purchase
   orders (supply).  Sales orders come in through EDI,
   and at least once a day the purchasing department
   must match supply to demand to find out what they
   need to order.  Here is an unrealistically simple
   example of the supply and demand they might be facing:
&lt;/p&gt;

&lt;pre&gt;
  *** DEMAND ***          *** SUPPLY ***

    DATE    | QTY           DATE    | QTY
------------+-----      ------------+----- 
  3/ 1/2011 |  5          3/ 1/2011 |  3
  3/15/2011 | 15          3/ 3/2011 |  6
  4/ 1/2011 | 10          3/15/2011 | 20
  4/ 3/2011 |  7   
&lt;/pre&gt;

&lt;p&gt;The desired output of the ERP Allocation
   might look like this:
&lt;/p&gt;

&lt;pre&gt;
 *** DEMAND ***      *** SUPPLY ****
    DATE    | QTY |  DATE_IN   | QTY  | FINAL 
------------+-----+------------+------+-------
  3/ 1/2011 |  5  |  3/ 1/2011 |  3   |  no
                  |  3/ 3/2011 |  2   | Yes 
  3/15/2011 | 15  |  3/ 3/2011 |  4   |  no
                  |  3/15/2011 | 11   | Yes
  4/ 1/2011 | 10  |  3/15/2011 |  9   |  no
  4/ 3/2011 |  7  |    null    | null |  no
&lt;/pre&gt;

&lt;p&gt;From this the purchasing agents know that the
   Sales Order that ships on 3/1 will be two days
   late, and the Sales Orders that will ship on
   4/1 and 4/3 cannot be filled completely.  They
   have to order more stuff.
&lt;/p&gt;

&lt;p&gt;Now for the killer question: Can the desired
   output be generated in a single SQL query?
   The answer is no, not even with Common
   Table Expressions or other recursive constructs.
   The reason is that &lt;b&gt;each match-up of a purchase
   order to a sales order modifies the supply
   available to the next sales order.&lt;/b&gt;  Or,
   to use the definition of Fourth Order Business
   Logic, each iteration will consume some supply
   and so &lt;i&gt;will affect the inputs available to
   the next step&lt;/i&gt;.
&lt;/p&gt;
                  
&lt;p&gt;We can see this most clearly if we look at some
   pseudo-code:
&lt;/p&gt;

&lt;pre&gt;
for each sales order by date {
   while sales order demand not met {
      get earliest purchase order w/qty avail &amp;gt; 0
         break if none
      make entry in matching table
      &lt;b&gt;&lt;font color="blue"&gt;// This is the write operation that 
      // means we have Fourth Order Business Logic&lt;/font&gt;
      reduce available qty of purchase order&lt;/b&gt;
   }
   break if no more purchase orders
}
&lt;/pre&gt;
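
&lt;p&gt;For anyone who wants to run it, here is one possible Python
   rendering of that pseudo-code.  It is a sketch under my own
   assumptions: the allocate function, the tuple layout and the
   ISO date strings are invented for the example, and unlike the
   pseudo-code it keeps going after supply runs out so that
   unfilled demand still gets a row with nulls, as in the desired
   output above.
&lt;/p&gt;

&lt;pre&gt;
def allocate(demand, supply):
    """demand and supply are lists of (date, qty), sorted by date.
    Returns rows of (demand_date, demand_qty, date_in, qty, final)."""
    available = [[date, qty] for date, qty in supply]   # mutable copy
    rows = []
    s = 0   # index of the earliest purchase order with quantity left
    for d_date, d_qty in demand:
        needed = d_qty
        matched_any = False
        while needed > 0 and len(available) > s:
            s_date, s_qty = available[s]
            take = min(needed, s_qty)
            needed -= take
            # This write is what makes it Fourth Order: it changes
            # the supply that the next sales order will see.
            available[s][1] -= take
            rows.append((d_date, d_qty, s_date, take, needed == 0))
            matched_any = True
            if available[s][1] == 0:
                s += 1
        if not matched_any:
            rows.append((d_date, d_qty, None, None, False))
    return rows

demand = [("2011-03-01", 5), ("2011-03-15", 15),
          ("2011-04-01", 10), ("2011-04-03", 7)]
supply = [("2011-03-01", 3), ("2011-03-03", 6), ("2011-03-15", 20)]
rows = allocate(demand, supply)
&lt;/pre&gt;

&lt;p&gt;With the sample supply and demand from above, this produces
   the same six rows as the desired output table.
&lt;/p&gt;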

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;As stated in the beginning, it is my belief
   that these four orders should "ring true" with 
   any developer who has experience with non-trivial
   business applications.  Though we may dispute
   terminology and argue over edge cases, the 
   recognition and naming of the Four Orders should 
   be of immediate benefit during analysis, design,
   coding, and refactoring.  They rigorously
   establish both the minimum and maximum bounds of
   complexity while also filling in the two kinds of
   actions we all take between those bounds. 
   They are datamodel agnostic,
   and even agnostic to implementation strategies
   within data models (like the normalize/denormalize
   debate in relational). 
&lt;/p&gt;

&lt;p&gt;But their true power is in providing a framework
   of thought for the process of synthesizing 
   requirements into a specification and from there
   an implementation.
&lt;/p&gt;

&lt;p&gt;Tomorrow we will see some theorems that we can
   derive from these definitions.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-7772768006334848953?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/7772768006334848953/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=7772768006334848953' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/7772768006334848953'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/7772768006334848953'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html' title='Business Logic: From Working Definition to Rigorous Definition'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-1727788788282862567</id><published>2010-12-21T22:25:00.006-05:00</published><updated>2011-01-04T17:34:42.521-05:00</updated><title type='text'>A Working Definition of Business Logic, with Implications for CRUD Code</title><content type='html'>&lt;p&gt;&lt;b&gt;Update: the &lt;a href="http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html"
&gt;Second Post&lt;/a&gt; of this series is now available.
&lt;/b&gt;
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Update: the &lt;a href="http://database-programmer.blogspot.com/2011/01/theorems-regarding-business-logic.html"
&gt;Third Post&lt;/a&gt; of this series is now available.
&lt;/b&gt;
&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Business_logic"
   &gt;Wikipedia entry on "Business Logic"&lt;/a&gt; has a 
   wonderfully honest opening sentence stating
   that "Business logic,
   or domain logic, is a &lt;i&gt;non-technical term&lt;/i&gt;... 
   (emphasis mine)".  If this is true, that the term
   is non-technical, or if you like, &lt;i&gt;non-rigorous&lt;/i&gt;,
   then most of us spend the better part of our efforts
   working on something that &lt;i&gt;does not even have a definition&lt;/i&gt;.
   Kind of scary.
&lt;/p&gt;

&lt;p&gt;Is it possible to come up with a decent working 
   definition of business logic?  It is certainly
   worth a try.  This post is the first in a four
   part series.  The &lt;a href="http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html"&gt;second post&lt;/a&gt; is about
   a more rigorous definition of Business Logic.
&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Complete Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;The Method&lt;/h2&gt;

&lt;p&gt;In this essay we will pursue a method of finding
   operations that we can define as business logic
   with a minimum of controversy, and identify those
   that can likely be excluded with a minimum of
   controversy.  This may leave a bit of gray area
   that can be taken up in a later post.
&lt;/p&gt;

&lt;h2&gt;An Easy Exclusion: Presentation Requirements&lt;/h2&gt;

&lt;p&gt;If we define &lt;b&gt;Presentation Requirements&lt;/b&gt;
   as all requirements about "how it looks" 
   as opposed to "what it is", then we can rule
   these out.  But if we want to be rigorous
   we have to be clear: Presentation Requirements
   has to mean things like branding, skinning,
   accessibility, any and all formatting, and 
   anything else that is about the appearance
   and not about the actual values fetched from
   somewhere.
&lt;/p&gt;

&lt;h2&gt;Tables as the Foundation Layer&lt;/h2&gt;

&lt;p&gt;Database veterans are likely to agree that your
   table schema constitutes the foundation layer of all
   business rules.  The schema, being the tables,
   columns, and keys, determines &lt;i&gt;what must be
   provided&lt;/i&gt; and &lt;i&gt;what must be excluded&lt;/i&gt;.
   If these are not business logic, I guess I don't
   know what is.
&lt;/p&gt;

&lt;p&gt;What about CouchDB and MongoDB and others that do
   not require a predefined schema?  These systems
   give up the advantages of a fixed schema for
   scalability and simplicity.  I would argue here
   that the schema has not disappeared, it has simply
   moved into the code that writes documents to the
   database.  Unless the programmer wants a nightmare
   of chaos and confusion, he will enforce some document
   structure in the code, and so I still think it safe
   to say that even for these databases there is a
   schema &lt;i&gt;somewhere&lt;/i&gt; that governs what must be
   stored and what must be excluded.
&lt;/p&gt;
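
&lt;p&gt;To make the "schema in the code" idea concrete, here is a
   minimal sketch in Python of a check that might run before a
   document is written to a schemaless store.  The field names
   and the validate_order function are hypothetical, invented
   for illustration only, and no particular database driver is
   assumed.
&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical document schema enforced in application code.
# Field names are illustrative, not taken from any real system.
REQUIRED_FIELDS = {"customer_id": int, "order_date": str, "lines": list}


def validate_order(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    problems = []
    # "What must be provided": every named field, with the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append("missing field: " + field)
        elif not isinstance(doc[field], expected_type):
            problems.append("wrong type for field: " + field)
    # "What must be excluded": reject anything the schema does not name.
    for field in doc:
        if field not in REQUIRED_FIELDS:
            problems.append("unexpected field: " + field)
    return problems


good = {"customer_id": 7, "order_date": "2010-12-21", "lines": []}
bad = {"customer_id": "7", "colour": "red"}
print(validate_order(good))
print(validate_order(bad))
```
&lt;/pre&gt;

&lt;p&gt;Whatever form such a check takes, it plays the same role
   a table definition plays in a relational database.
&lt;/p&gt;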

&lt;p&gt;So we have at least a foundation for a rigorous 
   definition of business rules: the schema, be it
   enforced by the database itself or by the code,
   forms the bottom layer of the business logic.
&lt;/p&gt;

&lt;h2&gt;Processes are Business Logic&lt;/h2&gt;

&lt;p&gt;The next easy addition to our definition of
   business logic would be processes, where
   a process can be defined loosely as anything
   that involves multiple statements, can run without
   user interaction, may depend on parameter
   tables, and may take longer
   than a user is willing to wait, requiring background
   processing.
&lt;/p&gt;

&lt;p&gt;I am sure we can all agree this is business logic,
   but as long as we are trying to be rigorous, we
   might say it is business logic because:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;It must be coded
    &lt;li&gt;The algorithm(s) must be inferred from the requirements
    &lt;li&gt;It is entirely independent of Presentation Requirements
&lt;/ul&gt;

&lt;h2&gt;Calculations are Business Logic&lt;/h2&gt;

&lt;p&gt;We also should be able to agree that calculated
   values like an order total, and the total after
   tax and freight, are business logic.  These are
   things we must code for to take user-supplied
   values and complete some picture.
&lt;/p&gt;

&lt;p&gt;The reasons are the same as for processes: they 
   must be coded, the formulas must often be inferred
   from requirements (or forced out of The Explainer
   at gunpoint), and the formulas are completely
   independent of Presentation Requirements.
&lt;/p&gt;
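
&lt;p&gt;As a small illustration of a calculation in this sense, the
   sketch below takes user-supplied lines and completes the picture
   with a subtotal, tax, and total.  The specific rules are assumptions
   made up for the example: tax applies to the subtotal only, freight
   is untaxed, and tax is rounded half-up to the cent.  A real set of
   requirements would dictate its own formulas.
&lt;/p&gt;

&lt;pre&gt;
```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def order_totals(lines, tax_rate, freight):
    """lines is a list of (qty, price) pairs; rates and amounts are Decimals."""
    # Sum the extended price of every line.
    subtotal = sum((qty * price for qty, price in lines), Decimal("0"))
    # Assumed rule: tax on the subtotal only, rounded half-up to the cent.
    tax = (subtotal * tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    total = subtotal + tax + freight
    return {"subtotal": subtotal, "tax": tax, "freight": freight, "total": total}


totals = order_totals(
    [(2, Decimal("9.99")), (1, Decimal("24.50"))],
    tax_rate=Decimal("0.06"),
    freight=Decimal("5.00"),
)
print(totals)
```
&lt;/pre&gt;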
   
&lt;h2&gt;The Score So Far&lt;/h2&gt;

&lt;p&gt;So far we have excluded "mere" Presentation 
   Requirements, and included three entries I hope
   so far are non-controversial:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Schema
    &lt;li&gt;Processes
    &lt;li&gt;Calculations
&lt;/ul&gt;

&lt;p&gt;These are three things that some programmer must
   design and code: the schema, either in a 
   conventional relational database or in application
   code; processes, which definitely must be
   coded; and calculations, which also have to be
   coded.
&lt;/p&gt;

&lt;h2&gt;What Have We Left Out?&lt;/h2&gt;

&lt;p&gt;Plenty.  At the very least, security and notifications.
   But let's put those off for another day and see
   how we might handle what we have so far.
&lt;/p&gt;

&lt;p&gt;For the Schema, I have already mentioned that you
   can either put it into a Relational database or
   manage it in application code when using a "NoSQL" 
   database.  More than that will have to wait for 
   2011, when I am hoping to run a series detailing
   different ways to implement schemas. I'm kind of
   excited to play around with CouchDB or MongoDB.
&lt;/p&gt;

&lt;p&gt;For processes, I have a &lt;a href="http://database-programmer.blogspot.com/2010/12/critical-analysis-of-algorithm-sproc.html"
   &gt;separate post&lt;/a&gt; that examines the implications
   of the stored procedure route, the embedded SQL route,
   and the ORM route.
&lt;/p&gt;

&lt;p&gt;This leaves calculations.  Let us now see how
   we might handle calculations.
&lt;/p&gt;

&lt;h2&gt;Mixing CRUD and Processes&lt;/h2&gt;

&lt;p&gt;But before we get to CRUD, I should state that
   if your CRUD code involves processes,
   &lt;i&gt;seek professional help immediately&lt;/i&gt;.
   Mixing processes into CRUD is an extremely
   common design error, and it can be
   devastating.  It can be recognized
   when somebody says, "Yes, but when the salesman
   closes the sale we have to pick this up and move
   it over there, and then we have to...."  
&lt;/p&gt;   

&lt;p&gt;Alas, this post is running long already and so
   I cannot go into exactly how to solve these, but
   the solution will always be one of these:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Spawning a background job to run the process
    asynchronously.  Easy because you don't have to
    recode much, but highly suspicious.
    &lt;li&gt;Examining why it seems necessary to do so
    much work on what ought to be a single INSERT
    into a sales table, with perhaps a few extra
    rows with some details.  Much the better solution,
    but often very hard to see without a second pair
    of eyes to help you out.
&lt;/ul&gt;

&lt;p&gt;So now we can move on to pure CRUD operations.&lt;/p&gt;
   
&lt;h2&gt;Let The Arguments Begin: Outbound CRUD&lt;/h2&gt;

&lt;p&gt;Outbound CRUD is any application code that
   grabs data from the database and passes it
   up to the Presentation layer. 
&lt;/p&gt;

&lt;p&gt;A &lt;b&gt;fully normalized database&lt;/b&gt; 
   will, in appropriate cases, require business logic of the
   calculations variety, otherwise the
   display is not
   complete and meaningful to the user.
   There is really no
   getting around it in those cases.
&lt;/p&gt;

&lt;p&gt;However, a database &lt;b&gt;Denormalized With
   Calculated Values&lt;/b&gt; requires no business logic
   for outbound CRUD, it only has to pick up what
   is asked for and pass it up.  This is the route
   I prefer myself.
&lt;/p&gt;

&lt;p&gt;Deciding whether or not to include denormalized
   calculated values has heavy implications for
   the architecture of your system, but before we
   see why, we have to look at inbound CRUD.
&lt;/p&gt;

&lt;h2&gt;Inbound CRUD&lt;/h2&gt;
   
&lt;p&gt;Inbound CRUD, in terms of business logic, is
   the mirror image of outbound.  If your
   database is fully normalized, inbound CRUD
   should be free of business logic, since it
   is simply taking requests and pushing them to
   the database.  However, if you are denormalizing
   by adding derived values, then it has to be
   done on the way in, so inbound CRUD code must
   contain business logic code of the calculations
   variety.
&lt;/p&gt;
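
&lt;p&gt;The mirror image can be sketched in a few lines.  This example
   assumes the denormalized route and uses SQLite through Python's
   standard sqlite3 module purely so that it runs anywhere.  The
   order_lines table and both function names are hypothetical.
   The inbound function does the calculation once, and the
   outbound function contains no business logic at all.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines ("
    " line_id INTEGER PRIMARY KEY,"
    " qty INTEGER NOT NULL,"
    " price_cents INTEGER NOT NULL,"
    " extended_cents INTEGER NOT NULL)"
)


def insert_line(qty, price_cents):
    # Inbound path: the calculation happens once, on the way in.
    extended_cents = qty * price_cents
    conn.execute(
        "INSERT INTO order_lines (qty, price_cents, extended_cents) VALUES (?, ?, ?)",
        (qty, price_cents, extended_cents),
    )


def fetch_lines():
    # Outbound path: no business logic, just pick up what was stored.
    return conn.execute(
        "SELECT qty, price_cents, extended_cents FROM order_lines ORDER BY line_id"
    ).fetchall()


insert_line(3, 250)
insert_line(1, 1999)
print(fetch_lines())
```
&lt;/pre&gt;

&lt;p&gt;In the fully normalized alternative the extended_cents column
   would disappear, insert_line would shrink to a bare INSERT, and
   the multiplication would move into fetch_lines.
&lt;/p&gt;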

&lt;p&gt;Now let us examine how the normalization
   question affects system architecture and
   application code.
&lt;/p&gt;

&lt;h2&gt;Broken Symmetry&lt;/h2&gt;

&lt;p&gt;As stated above, denormalizing by including
   derived values forces calculated business
   logic on the inbound path, but frees your
   outbound path to be the "fast lane".  
   The opposite decision, not storing calculated
   values, allows the inbound path to be the "fast lane"
   and forces the calculations into the outbound
   path.
&lt;/p&gt;

&lt;p&gt;The important conclusion is: if you have business
   logic of the calculation variety in both lanes
   then you may have some inconsistent practices,
   and there may be some gain involved in sorting
   those out.
&lt;/p&gt;
   
&lt;p&gt;But the two paths are not perfectly symmetric.
   Even a fully normalized database will often,
   sooner or later, commit those calculated values
   to columns.  This usually happens when some
   definition of finality is met.  Therefore, since
   the inbound path is more likely to contain calculations
   in any case, the two options are not really 
   balanced.  This is one reason why I prefer
   to store the calculated values and get them right
   on the way in.
&lt;/p&gt;
   
&lt;h2&gt;One Final Option&lt;/h2&gt;

&lt;p&gt;When people ask me if I prefer to put business
   logic in the server, it is hard to answer without
   a lot of information about context.  But when
   calculations are involved the answer is yes.
&lt;/p&gt;

&lt;p&gt;The reason is that calculations are incredibly
   easy to fit into patterns.  The patterns themselves
   (almost) all follow foreign keys, since the foreign
   key is the only way to correctly relate data between
   tables.  So you have the "FETCH" pattern, where a 
   price is fetched from the items table to the cart,
   the "EXTEND" pattern, where qty * price = extended_Price,
   and various "AGGREGATE" patterns, where totals are
   summed up to the invoice.  There are others, but it
   is surprising how many calculations fall into these
   patterns.
&lt;/p&gt;
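
&lt;p&gt;Here is a rough sketch of the three patterns as a hand-coded
   trigger.  It is an illustration under stated assumptions, not a
   finished implementation: it uses SQLite through Python's
   standard sqlite3 module rather than any particular production
   server, the table and column names are invented, and it handles
   only INSERT.  Updates and deletes would need triggers of their own.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (sku TEXT PRIMARY KEY, price_cents INTEGER NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    total_cents INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE order_lines (
    line_id        INTEGER PRIMARY KEY,
    order_id       INTEGER NOT NULL REFERENCES orders (order_id),
    sku            TEXT    NOT NULL REFERENCES items (sku),
    qty            INTEGER NOT NULL,
    price_cents    INTEGER NOT NULL DEFAULT 0,
    extended_cents INTEGER NOT NULL DEFAULT 0
);

CREATE TRIGGER order_lines_ai AFTER INSERT ON order_lines
BEGIN
    -- FETCH: copy the price along the foreign key from items.
    UPDATE order_lines
       SET price_cents = (SELECT price_cents FROM items
                           WHERE items.sku = NEW.sku)
     WHERE line_id = NEW.line_id;
    -- EXTEND: qty * price within the same row.
    UPDATE order_lines
       SET extended_cents = qty * price_cents
     WHERE line_id = NEW.line_id;
    -- AGGREGATE: sum the lines up to the parent order.
    UPDATE orders
       SET total_cents = (SELECT SUM(extended_cents) FROM order_lines
                           WHERE order_lines.order_id = NEW.order_id)
     WHERE order_id = NEW.order_id;
END;
""")

conn.execute("INSERT INTO items VALUES ('WIDGET', 250)")
conn.execute("INSERT INTO items VALUES ('GADGET', 1999)")
conn.execute("INSERT INTO orders (order_id) VALUES (1)")
# The application supplies only the order, the item and the quantity.
conn.execute("INSERT INTO order_lines (order_id, sku, qty) VALUES (1, 'WIDGET', 3)")
conn.execute("INSERT INTO order_lines (order_id, sku, qty) VALUES (1, 'GADGET', 1)")

lines = conn.execute(
    "SELECT sku, qty, price_cents, extended_cents FROM order_lines ORDER BY line_id"
).fetchall()
total = conn.execute("SELECT total_cents FROM orders WHERE order_id = 1").fetchone()[0]
print(lines, total)
```
&lt;/pre&gt;

&lt;p&gt;Notice that each of the three statements follows a foreign key
   or stays within a single row, which is what makes them easy to
   recognize as patterns.
&lt;/p&gt;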

&lt;p&gt;Because these patterns are so easy to identify, it
   is actually conceivable to code triggers by hand
   to do them, but being an incurable toolmaker, I
   prefer to have a code generator put them together
   out of a data dictionary.  More on that around the
   first of the year.
&lt;/p&gt;

&lt;h2&gt;Updates&lt;/h2&gt;

&lt;p&gt;&lt;b&gt;Update 1:&lt;/b&gt;  I realize I never made it quite clear that this is part
   1, as the discussion so far seems reasonable but is
hardly rigorous (yet).  Part 2 will be on the way after I've
fattened up for the holidays.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Update 2:&lt;/b&gt; It is well worth following the link Mr. Koppelaars has put in the comments:
&lt;a href="http://thehelsinkideclaration.blogspot.com/2009/03/window-on-data-applications.html"&gt;http://thehelsinkideclaration.blogspot.com/2009/03/window-on-data-applications.html&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-1727788788282862567?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/1727788788282862567/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=1727788788282862567' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1727788788282862567'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1727788788282862567'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/12/working-definition-of-business-logic.html' title='A Working Definition of Business Logic, with Implications for CRUD Code'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-105520872492947858</id><published>2010-12-19T13:10:00.001-05:00</published><updated>2010-12-19T18:38:55.356-05:00</updated><title type='text'>User-Submitted Analysis Topic: Email</title><content type='html'>&lt;p&gt;Reader &lt;a href="mailto:dean.thrasher@infovark.com"&gt;Dean Thrasher&lt;/a&gt;
   of &lt;a href="http://www.infovark.com"&gt;Infovark&lt;/a&gt; has submitted
   a schema for review and analysis as part of my 
   &lt;a href="http://database-programmer.blogspot.com/p/submit-your-analysis-request.html"
   &gt;User-Submitted Analysis Request&lt;/a&gt; series.
  Today we are going to take a first look
   at what he has.  Mr. Thrasher and I both hope that any and all readers
   will benefit from the exercise of publicly reviewing the schema.
&lt;/p&gt;

&lt;p&gt;This particular analysis request is a great start to the series,
   because it has to do with email.  Everybody uses email so we all
   understand at a very basic level what data will be handled.

&lt;h2&gt;Brief Introduction to User-Submitted Schemas&lt;/h2&gt;

&lt;p&gt;Mr. Thrasher and I have exchanged a couple of emails, but we have
   avoided any in-depth discussion.  Instead, we want to carry out the
   conversation on the public blog.  So I am not aiming to provide
   any "from on high" perfect analysis; instead this essay will contain
   a lot of questions and suggestions, and we will then move into the
   comments to go forward.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Disclosure:&lt;/b&gt;  None.  We are not paying each other anything, nor have
   I received any merchandise that would normally carry a licensing fee.
&lt;/p&gt;
  
&lt;p&gt;Today's essay is the very first in the
   &lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html#user"
   &gt;User-Submitted Analysis Requests&lt;/a&gt; series.  If you would like to see an analysis
   of your schema, follow that link and contact me.
&lt;/p&gt;

&lt;p&gt;This blog has a &lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Complete Table of Contents&lt;/a&gt; and a list
   of &lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Brief Description and Starting Point&lt;/h2&gt;

&lt;p&gt;To get us started, I am going to quote the &lt;a href="http://www.infovark.com/product/"
   &gt;Infovark Product Page&lt;/a&gt;, and then we will see what we want to zoom in on:

&lt;p style="padding: 10px 20px 10px 20px"&gt;Infovark automatically collects and catalogs your files and email. It consolidates your digital life into a personal wiki based on what it finds. Once you set Infovark to work, it will monitor your computer and keep your web site up-to-date&lt;/p&gt;

&lt;p&gt;So we know even before we see anything technical that we are going to have 
   tables of contacts, emails, phones, addresses, appointments and many other
   things pulled in from email systems, plus the value-add provided by the
   product.
&lt;/p&gt;

&lt;h2&gt;The Schema As-Is&lt;/h2&gt;

&lt;p&gt;We are going to start by looking at how the details of a CONTACT
   are stored.  The schema models contacts with a group of
   cross references, aka many-to-many relationships, like so:
&lt;/p&gt;

&lt;pre&gt;
CONTACTS +----- CONTACTS-X-EMAILS -------- EMAILADDRESSES
         |
         +----- CONTACTS-X-PHONES -------- PHONES
         |
         +----- CONTACTS-X-ADDRESSES ----- ADDRESSES
         |
         +----- CONTACTS-X-WEBADDRESSES--- WEBADDRESSES
&lt;/pre&gt;

&lt;p&gt;The first thing we have to note is that there is nothing wrong
   with this at all.  It is fully normalized and so it will be
   very easy to make sure that database writes will not produce
   anomalies or bad data.  
&lt;/p&gt;

&lt;p&gt;But, not surprisingly, Mr. Thrasher notes this makes for complicated
   SELECTs, so we want to ask if 
   perhaps it is &lt;i&gt;over-normalized&lt;/i&gt;: are there complications
   in there that do not need to be there?
&lt;/p&gt;

&lt;h2&gt;Email as A Property of Contact&lt;/h2&gt;

&lt;p&gt;If I were to 
   &lt;a href="http://database-programmer.blogspot.com/2008/01/table-design-patterns.html"
   &gt;follow my own advice&lt;/a&gt;, I would first want to identify the
   master tables.  Master tables generally represent real things in the
   world: people, places, products, services, events. 
&lt;/p&gt;

&lt;p&gt;So my first question is this: is an email address a free-standing
   entity in its own right that deserves a master table?  Or is it
   instead a property of the CONTACT?  I am going to suggest that an
   email address is a property of a CONTACT, and, since a CONTACT
   may have more than one email address, they should be stored in
   a child table of the CONTACTS, more like this:
&lt;/p&gt;

&lt;pre&gt;
CONTACTS +----- &lt;font color="red"&gt;&lt;s&gt;CONTACTS-X-EMAILS -------- EMAILADDRESSES&lt;/s&gt;&lt;/font&gt;
         +----- &lt;font color="green"&gt;CONTACTEMAILADDRESSES&lt;/font&gt;
         |
         +----- CONTACTS-X-PHONES -------- PHONES
         |
         +----- CONTACTS-X-ADDRESSES ----- ADDRESSES
         |
         +----- CONTACTS-X-WEBADDRESSES--- WEBADDRESSES
&lt;/pre&gt;

&lt;p&gt;Whether or not we make this switch depends not on
   technical arguments about keys or data types, but on 
   &lt;i&gt;whether this accurately models reality&lt;/i&gt;.  If in fact
   email addresses are simply properties of contacts, then
   this is the simplest way to do it.  Going further, the
   code that imports and reads the data will be easier to
   code, debug and maintain for two reasons: one, because 
   it is simpler, but more importantly, two, because it
   accurately models reality and therefore will be easier
   to think about.
&lt;/p&gt;

&lt;p&gt;If this proves to be the right way to go, it may be
   a one-off improvement, or it may repeat itself for
   Phones, Addresses, and Web Addresses, but we will take
   that up in the next post in the series.
&lt;/p&gt;

&lt;p&gt;I am going to proceed as if this change is correct, and
   ask then how it will ripple through the rest of the
   system.
&lt;/p&gt;

&lt;h2&gt;Some Specifics on the Email Addresses Table&lt;/h2&gt;

&lt;p&gt;The EMAILADDRESSES table currently has these columns:
&lt;/p&gt;

&lt;pre&gt;-- SQL Flavor is Firebird
CREATE TABLE EMAILADDRESS (
  ID           INTEGER NOT NULL,
  USERNAME     VARCHAR(64) NOT NULL COLLATE UNICODE_CI,
  HOSTNAME     VARCHAR(255) NOT NULL COLLATE UNICODE_CI,
  DISPLAYNAME  VARCHAR(255) NOT NULL
);

ALTER TABLE EMAILADDRESS
  ADD CONSTRAINT PK_EMAILADDRESS
  PRIMARY KEY (ID);

CREATE TRIGGER BI_EMAILADDRESS FOR EMAILADDRESS
ACTIVE BEFORE INSERT POSITION 0
AS
BEGIN
  IF (NEW.ID IS NULL) THEN
  NEW.ID = GEN_ID(GEN_EMAILADDRESS_ID,1);
END^&lt;/pre&gt;

&lt;p&gt;&lt;b&gt;Suggestion:&lt;/b&gt;  The first thing I notice is that the
   complete email address itself is not actually stored.  So we need to
   ask Mr. Thrasher what the thinking behind that was.  My first
   instinct is to store that, because it is the original natural
   value of interest.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Suggestion:&lt;/b&gt;  The columns USERNAME and HOSTNAME I could
   go either way on.  If they are needed for querying and statistics,
   it is better to put them in.  While this violates 3rd Normal Form and
   so puts us at risk, the values are supplied AFAIK by a batch import,
   and so there is only one codepath populating them, and we are likely safe.
   However, if we DO NOT need to query these values for statistics,
   and  they are only there for convenience at display time, I would
   likely remove them and generate them on-the-fly in application
   code.  There are some other good reasons to do this that will
   come up a little further along.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Suggestion:&lt;/b&gt;  Unless I missed something in the schema
   sent over, we need a unique constraint on the combination of
   CONTACTID and USERNAME and HOSTNAME.  Or, if we remove
   USERNAME and HOSTNAME in favor of the original EMAILADDRESS,
   we need a unique constraint on CONTACTID + EMAILADDRESS.
&lt;/p&gt;
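
&lt;p&gt;A quick sketch of the second variant of that constraint follows.
   It runs on SQLite through Python's standard sqlite3 module rather
   than Firebird, so the syntax is an approximation of what the real
   schema would need.  COLLATE NOCASE is used here as a rough
   stand-in for the case-insensitive collation in the original
   table, and the sample contact is made up.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE emailaddresses (
    contact_id   INTEGER NOT NULL REFERENCES contacts (contact_id),
    emailaddress TEXT    NOT NULL COLLATE NOCASE,
    label        TEXT,
    UNIQUE (contact_id, emailaddress)
);
INSERT INTO contacts VALUES (1, 'Pat Example');
INSERT INTO emailaddresses VALUES (1, 'pat@example.com', 'work');
""")

# The same address for the same contact, differing only in case,
# should be refused by the database itself.
try:
    conn.execute("INSERT INTO emailaddresses VALUES (1, 'PAT@example.com', 'home')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```
&lt;/pre&gt;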

&lt;h2&gt;Before We Get To Transactions&lt;/h2&gt;

&lt;p&gt;We are about to go into Part 2, which is about the other
   tables that reference EMAILADDRESSES, but before we do
   let's look at what the two tables would be if we made all
   changes suggested so far:
&lt;/p&gt;

&lt;pre&gt;
 CONTACTS             EMAILADDRESSES 
------------         --------------------
                      ID            (surrogate key)
 CONTACT_ID --------&amp;amp; CONTACT_ID
 other columns...     EMAILADDRESS  
                      LABEL
                      USERNAME      (possibly removed)
                      HOSTNAME      (possibly removed)
                      DISPLAYNAME
&lt;/pre&gt;                               

&lt;p&gt;You may notice the LABEL column showed up out of nowhere.
   That column was previously in the cross-reference.  When
   the cross-reference went away it landed in EMAILADDRESSES.
   That column LABEL holds values like "work", "home" and
   so on.  It is supplied from whatever system we pull 
   emails from, and so we have no constraints on it or
   rules about it.
&lt;/p&gt;

&lt;h2&gt;Changing Emails And Transactions&lt;/h2&gt;

&lt;p&gt;Now we move on from the basic storage of EMAIL addresses
   to the other tables that reference those addresses.
   These are things like emails themselves, with their lists
   of people sent to/from, and meetings, and presumably other
   types of transactions as well.
&lt;/p&gt;

&lt;p&gt;When we look at transactions, which will reference 
   contacts and email addresses, we also have to consider
   the fact that a CONTACT may change their email address
   over time.  Consider a person working for Vendor A, who
   moves over to Vendor B.  For some of the transactions
   they will have been at Vendor A, and then going forward
   they are all at Vendor B.  This leads to this very
   important question:
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Do Transactions store details about the CONTACTS as
   they were at the time of the transaction, or as they
   are now?&lt;/b&gt;
&lt;/p&gt;

&lt;p&gt;In other words, if a CONTACT moves from one company to
   another, and you look at a meeting with that person
   from last year, should it link to where they are now?
   Or should it be full of information about where they
   were at the time?
&lt;/p&gt;   
   
&lt;p&gt;The answer to this question is important because it
   determines how to proceed on the two final points I
   would like to raise:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Should the various transactions have a foreign
    key back to EMAILADDRESSES, or should they simply
    link back to CONTACTS and contain the EMAILADDRESS
    itself?
    &lt;li&gt;Do we need an integer surrogate key on the
    EMAILADDRESSES table, especially if we do not link
    back to it?
&lt;/ol&gt;  

&lt;h2&gt;First Final Suggestion&lt;/h2&gt;

&lt;p&gt;So the first of the final two suggestions is: maybe
   the transactions tables should just link back to CONTACTID
   and contain a freestanding EMAILADDRESS.  The first argument for
   this is that it preserves the history as it was, and
   if that is what we want, then this accomplishes it.
   The second argument is that by putting the actual value
   instead of an integer key back to some table, we 
   simplify coding by removing a join.
&lt;/p&gt;

&lt;p&gt;The arguments against embedding the email address might
   be basically, "hey, if this is a kind of a data warehouse,
   you are really supposed to be doing the snowflake thing and
   you don't want to waste space on that value."  To which I
   respond that the engineer always has the choice of trading
   space for speed.  Putting the email in directly is a correct
   recording of a fact, and takes more space, but eliminates
   a very common JOIN from many queries, so Mr. Thrasher may
   choose to make that call.
&lt;/p&gt;

&lt;p&gt;This also plays back to my question about whether we should
   have USERNAME and HOSTNAME in the EMAILADDRESSES table.
   If we start putting email addresses directly into tables,
   we can also keep putting these other two columns in, which
   trades again space for speed.  We could also skip them and
   code a parser in the application that generates them 
   on-the-fly as needed.
&lt;/p&gt;
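
&lt;p&gt;Such a parser can be very small.  The sketch below is one
   possible version in Python.  The name split_email is invented
   for the example, and it makes no attempt at full address
   validation.  It simply splits at the last "@" and lower-cases
   the host part.
&lt;/p&gt;

&lt;pre&gt;
```python
def split_email(address):
    """Split 'user@host' into (username, hostname) at the last '@'."""
    username, sep, hostname = address.strip().rpartition("@")
    if not sep or not username or not hostname:
        raise ValueError("not a usable email address: " + repr(address))
    # Host names are case-insensitive, so normalize them for display.
    return username, hostname.lower()


print(split_email("Pat.Example@Example.COM"))
```
&lt;/pre&gt;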

&lt;h2&gt;Second Final Suggestion&lt;/h2&gt;

&lt;p&gt;Now we go all of the way back to the child table
   and ask a basic question: Why is there an integer
   surrogate key there?  Integer surrogate keys are 
   useful in many situations, but contrary to what the
   web generation learned, they are not some kind of
   required approach in relational databases.
&lt;/p&gt;

&lt;p&gt;Consider: we need a unique constraint on CONTACTID+EMAILADDRESS
   anyway, so we have to justify why we would add a new
   column that does not add value.  The reflex answer tends
   to be "because they join faster" but that ignores the fact
   that if you use the natural key of CONTACTID+EMAILADDRESS,
   and put these columns into child tables, &lt;i&gt;you do not need
   to join at all!&lt;/i&gt;  If we use the surrogate key and embed
   it in child tables, then getting the CONTACT information
   forces two joins: through EMAILADDRESS to CONTACTS.  But if
   we use the natural key of CONTACTID + EMAILADDRESS &lt;i&gt;we
   already have the contact id&lt;/i&gt; which saves a JOIN when we
   are after CONTACTS details, and, unless we want to know
   something like LABEL, &lt;i&gt;we do not have to JOIN back to
   EMAILADDRESSES at all&lt;/i&gt;.
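
&lt;p&gt;To see the difference in join counts, here is a small runnable
   sketch using SQLite through Python's standard sqlite3 module.
   The meeting_attendees table is hypothetical, standing in for
   whatever transaction tables the real schema has, and the data
   is made up.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE emailaddresses (
    contact_id   INTEGER NOT NULL REFERENCES contacts (contact_id),
    emailaddress TEXT    NOT NULL,
    label        TEXT,
    PRIMARY KEY (contact_id, emailaddress)
);
-- Hypothetical transaction table carrying the natural key directly.
CREATE TABLE meeting_attendees (
    meeting_id   INTEGER NOT NULL,
    contact_id   INTEGER NOT NULL,
    emailaddress TEXT    NOT NULL,
    FOREIGN KEY (contact_id, emailaddress)
        REFERENCES emailaddresses (contact_id, emailaddress)
);
INSERT INTO contacts VALUES (1, 'Pat Example');
INSERT INTO emailaddresses VALUES (1, 'pat@vendor-a.example', 'work');
INSERT INTO meeting_attendees VALUES (10, 1, 'pat@vendor-a.example');
""")

# The address is already in the row: no join at all.
addresses = conn.execute(
    "SELECT emailaddress FROM meeting_attendees WHERE meeting_id = 10"
).fetchall()

# Contact details need one join, straight to contacts.
attendees = conn.execute(
    "SELECT c.name, a.emailaddress"
    " FROM meeting_attendees a"
    " JOIN contacts c ON c.contact_id = a.contact_id"
    " WHERE a.meeting_id = 10"
).fetchall()

print(addresses, attendees)
```
&lt;/pre&gt;

&lt;p&gt;With an integer surrogate key in meeting_attendees instead,
   the first query would need one join and the second would need two.
&lt;/p&gt;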
   
&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Well that's it.  As promised, we have a few suggestions
   and a lot of questions for Mr. Thrasher.  Check back in the
   coming days to see how the various questions work themselves
   out in the comments.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-105520872492947858?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/105520872492947858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=105520872492947858' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/105520872492947858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/105520872492947858'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/12/user-submitted-analysis-topic-email.html' title='User-Submitted Analysis Topic: Email'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-6904346884395834551</id><published>2010-12-16T23:24:00.003-05:00</published><updated>2010-12-21T22:52:08.094-05:00</updated><title type='text'>Critical Analysis of an Algorithm: Sproc, Embedded SQL, and ORM</title><content type='html'>&lt;p&gt;This is a follow-up to yesterday's 
   &lt;a href="http://database-programmer.blogspot.com/2010/12/historical-perspective-of-orm-and.html"
   &gt;historical perspective
   on ORM.&lt;/a&gt;  In this essay we examine a particular class of
   business logic and ask what happens if we go server-side,
   embedded SQL, or ORM.
&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Complete Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Processes&lt;/h2&gt;

&lt;p&gt;We are going to look at a process.  The term is not
   well defined, but my own working definition is any
   operation that has as many of the following properties
   as I seem to think are important at the time I make
   the decision:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Is categorically not CRUD: requires much more
    than displaying data to user or sending a single-row
    operation to the server
    &lt;li&gt;Involves reading or writing many rows
    &lt;li&gt;Involves reading or writing from multiple tables
    &lt;li&gt;Involves multiple passes of the same data
    &lt;li&gt;Involves no user interaction while executing
    &lt;li&gt;If not coded to be idempotent can cause huge
    headaches
    &lt;li&gt;Will likely take longer than a user is willing
    to wait (these days 3-10 seconds) and so runs in the
    background.
    &lt;li&gt;Depends upon rules tables that control its behavior
&lt;/ul&gt;    
    
&lt;h2&gt;A Particular Process: Magazine Regulation&lt;/h2&gt;

&lt;p&gt;I have a more complete description of this problem
   &lt;a href="http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"&gt;here&lt;/a&gt;, so this is going to be very
   short.  The system handles magazine deliveries to
   stores.  The shop running the system has thousands of stores
   and thousands of magazines.  Every store has a 
   &lt;i&gt;default quantity&lt;/i&gt; of the particular magazines they
   carry.
   For one particular magazine, &lt;b&gt;NewsTime&lt;/b&gt;,
   there are 1000 stores that get an average default
   quantity of 50, requiring 50,000 magazines each week.
&lt;/p&gt;

&lt;p&gt;Here is the twist.  You never get exactly 50,000, no
   matter what you tell the distributor.  Some weeks you
   get 45,000, others you get 55,000, with any variation
   in between.  So the &lt;b&gt;&lt;i&gt;Regulation
   Process&lt;/i&gt;&lt;/b&gt; adjusts the defaults for each store
   until the delivery amounts equal the on-hand total that
   was delivered on the truck.
&lt;/p&gt;

&lt;h2&gt;The Naive or Simple Algorithm&lt;/h2&gt;

&lt;p&gt;In the first pass, we are going to consider an
   unrealistically simple version of Magazine Regulation,
   where we have too many magazines and must up the
   quantities until we're giving out the entire amount on-hand.
&lt;/p&gt;

&lt;p&gt;Assume a table has already been populated that has
   the default quantities for each store, where the relevant
   columns for the moment would be these:
&lt;/p&gt;

&lt;pre&gt;
 StoreId   |  MagazineId   |  QTY_DEFAULT | QTY_REGULATED 
-----------+---------------+--------------+---------------
    1      |      50       |      75      |      0        
    2      |      50       |      23      |      0        
    4      |      50       |      48      |      0        
   10      |      50       |      19      |      0        
   17      |      50       |     110      |      0        
   21      |      50       |      82      |      0        
&lt;/pre&gt;
    
&lt;p&gt;We are told only that the increases must be evenly
   distributed; we can't just ship 5000 extra magazines to
   a single store.  That makes sense. A simple algorithm to do this would be:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Give each store one additional magazine until you
    run out of magazines or run out of stores.
    &lt;li&gt;Repeat step 1 until you run out of magazines.
&lt;/ol&gt;

&lt;h2&gt;The Pseudo-Code&lt;/h2&gt;

&lt;p&gt;Let's mock something up in pseudo-code that
   shows the structure of the solution:
&lt;/p&gt;

&lt;pre&gt;
magsToDeliver = get total magazines...
magsDefault   = get total of defaults...

-- Outer loop implements rule 2:
-- "repeat until you run out of magazines"
while magsToDeliver &gt; magsDefault {

   -- Inner loop implements rule 1:
   -- "increase each store by 1 until you
   --  run out of stores or run out of magazines"
   for each store getting this magazine {
        if magsToDeliver &lt;= magsDefault break
   
        -- If you want to go nuts, and even allow
        -- the accidental simultaneous execution
        -- of two routines doing the same thing, 
        -- put these lines here instead
        magsToDeliver = get total magazines...
        magsDefault   = get total of defaults...
   
        -- This is the actual job getting done
        qty_regulated +=1
        magsToDeliver -=1
   }
}
&lt;/pre&gt;
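
&lt;p&gt;For readers who want something they can run, the same two
   loops can be sketched in plain Python with no database involved.
   This is an in-memory illustration only.  The function name
   regulate, the dict of defaults, and the on-hand figure of 365
   are all invented for the example, and it handles only the
   surplus case described above.
&lt;/p&gt;

&lt;pre&gt;
```python
def regulate(defaults, on_hand):
    """defaults maps store id to default quantity; on_hand is the delivered total.

    Returns a dict of regulated quantities.  Only the surplus case is
    handled: with a shortage the defaults come back unchanged.
    """
    regulated = dict(defaults)
    stores = sorted(regulated)
    if not stores:
        return regulated
    surplus = max(on_hand - sum(defaults.values()), 0)
    # Outer loop implements rule 2: repeat until the magazines run out.
    while surplus != 0:
        # Inner loop implements rule 1: one more per store, in turn.
        for store in stores:
            if surplus == 0:
                break
            regulated[store] += 1
            surplus -= 1
    return regulated


# Defaults from the sample table above (they total 357); 365 arrived.
defaults = {1: 75, 2: 23, 4: 48, 10: 19, 17: 110, 21: 82}
result = regulate(defaults, 365)
print(result)
```
&lt;/pre&gt;

&lt;p&gt;With 8 extra magazines and 6 stores, every store gets one
   more on the first pass and the first two stores get a second.
&lt;/p&gt;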

&lt;h2&gt;The Three Methods&lt;/h2&gt;

&lt;p&gt;Let's characterize what happens with our three
   choices of stored procedure, embedded SQL, or
   ORM.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Stored Procedure.&lt;/b&gt;  
   Likely the fastest
   solution, considering all of that row-by-row
   going on.  If that were in app code (ORM or not) we would
   be making two 
   &lt;a href="http://database-programmer.blogspot.com/2010/12/cost-of-round-trips-to-server.html"
   &gt;round trips&lt;/a&gt; to the server per iteration.
&lt;/p&gt;

&lt;p&gt;The really crazy thing about the stored procedure
   solution is that it is &lt;i&gt;utterly neutral to the
   ORM question&lt;/i&gt;.  The entire ORM good-bad debate
   dissolves because there is no OOP code involved.
   So this could be the magic solution that ORM lovers
   and haters could both agree upon.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;App Code With Embedded SQL (no ORM).&lt;/b&gt;  Just about
   everybody hates this idea these days, but it should
   be here for completeness, and because there are some
   advantages.  The top of the pseudo-code requires two
   aggregate pulls, and if you are not afraid of SQL you
   can pull down the result in one pass, instead of 
   iterating on the client.  Further, the innermost operation
   can be coded in SQL as an "UPDATE deliveries from (SELECT
   TOP 1 deliverid From deliveries...)" so that you get only
   one round trip per iteration, where ORM will cost two.
&lt;/p&gt;
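
&lt;p&gt;As a rough illustration of "locate and update in one statement",
   here is a sketch using Python's built-in sqlite3 module.  SQLite has
   no TOP, so the sketch uses a LIMIT 1 subquery instead.  The store
   column and the "lowest quantity first" ordering are assumptions made
   up for the example, and because SQLite runs in-process there is no
   real network round trip here; the point is only that the application
   never reads the row before writing it.
&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries ("
    " deliverid INTEGER PRIMARY KEY,"
    " store TEXT, qty_regulated INTEGER)"
)
conn.executemany(
    "INSERT INTO deliveries (store, qty_regulated) VALUES (?, ?)",
    [("A", 5), ("B", 3), ("C", 4)],
)

# One statement locates the next row and increments it inside the
# database; the ordering rule here is an assumption for illustration.
conn.execute(
    "UPDATE deliveries SET qty_regulated = qty_regulated + 1"
    " WHERE deliverid = ("
    "   SELECT deliverid FROM deliveries"
    "   ORDER BY qty_regulated, deliverid LIMIT 1)"
)

rows = conn.execute(
    "SELECT store, qty_regulated FROM deliveries ORDER BY store"
).fetchall()
print(rows)
```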

&lt;p&gt;&lt;b&gt;Any Kind of ORM.&lt;/b&gt;  By this I mean the code will
   contain no SQL, and the innermost loop will likely
   instantiate some "delivery" objects, one after another,
   increment their qty_regulated property by one, and flush them out.
   This is twice as expensive as embedded SQL because you
   have to fetch the row from the database and then
   write it back, where the embedded SQL can issue a single
   command that locates and updates the row in a single
   statement.
&lt;/p&gt;

&lt;p&gt;Some may argue that I misunderstand ORM
   here, in that the library may be smart enough to allow
   the write without the read, and &lt;i&gt;without forcing you
   to embed SQL&lt;/i&gt;.  It would have to be something like
   A) instantiate empty object with key, B) assign value
   as an expression, like  "+=1", C) save.
   I welcome any such examples and will
   update the post accordingly if any are forthcoming.
   I am assuming that no ORM tool I have seen can do this
   and would be happy to be corrected.
&lt;/p&gt;

&lt;p&gt;If the ORM forces us to calculate the initial sum
   of QTY_Default by fetching each row as an object and summing
   them in the app, we get an extra complete set of round
   trips.   Bummer.  But if we say, "Hey my ORM tool lets
   me embed SQL in *emergencies*" then perhaps we can embed
   a query with an aggregate and skip that cost.  But
   oops, we've got some embedded SQL.  Leaky abstraction.
&lt;/p&gt;


&lt;h2&gt;The Score So Far&lt;/h2&gt;

&lt;p&gt;So how does it look so far?  All methods have 
   the same number of reads and writes to disk, 
   so we are scoring
   them on round trips.  If "X" is the number
   of extra magazines to be distributed, and "Y" is
   the number of stores getting the magazine, we have
   for round trips:
&lt;/p&gt;   

&lt;ul&gt;&lt;li&gt;&lt;b&gt;Stored Procedure:&lt;/b&gt; 1
    &lt;li&gt;&lt;b&gt;Embedded SQL:&lt;/b&gt; X + 1  (the first pull plus one trip per
    extra copy of the magazine)
    &lt;li&gt;&lt;b&gt;ORM, Hypothetical:&lt;/b&gt; X + 1 (if the ORM tool can figure out how
    to do an update without first reading the row to the app)
    &lt;li&gt;&lt;b&gt;ORM, Best Case:&lt;/b&gt; 2X + 1 (if the first pull can be an aggregate
        without embedding SQL, and two round trips per iteration)
    &lt;li&gt;&lt;b&gt;ORM, Worst Case:&lt;/b&gt; 2X + Y (if the first pull must aggregate
    app-side and there are two round trips per iteration)
    
&lt;/ul&gt;
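
&lt;p&gt;To put numbers on those formulas, the tallies can be written as a
   small function.  This is only arithmetic on the list above; the
   5000 extra magazines come from the example, while the figure of 300
   stores is an assumption picked for illustration.
&lt;/p&gt;

```python
def round_trips(x, y):
    """Round trips for x extra magazines across y stores."""
    return {
        "stored procedure": 1,
        "embedded SQL": x + 1,
        "ORM, hypothetical": x + 1,
        "ORM, best case": 2 * x + 1,
        "ORM, worst case": 2 * x + y,
    }


# 5000 extra copies (from the example), 300 stores (assumed).
for method, trips in round_trips(5000, 300).items():
    print(method, trips)
```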

&lt;p&gt;&lt;b&gt;Update: if you want a laugh, check out the image on the
   &lt;a href="http://en.wikipedia.org/wiki/Business_logic"&gt;Wikipedia page for "Business Logic"&lt;/a&gt;, it depicts
   aggregation occurring on the client side.&lt;/b&gt;
&lt;/p&gt;

&lt;p&gt;This gives us the shape of the engineering decision.
   With all options reading and updating the same number of
   rows, it all comes down to round trips.  As soon as you
   go client side your round trips go way up, and if your
   ORM tool does not support Update without Select, then
   it doubles from there.
&lt;/p&gt;

&lt;p&gt;Now multiply this across the entire application,
   every single action in code, every bit of "business
   logic" with any kind of loop that iterates over
   rows.
&lt;/p&gt;
 
&lt;h2&gt;It Gets Worse/Better: Considering SQL Possibilities&lt;/h2&gt;

&lt;p&gt;If you happen to know much about modern SQL,
   you may be aware of the amazingly cool SQL RANK()
   function.  If this function is used in the sproc
   or embedded SQL approaches, you can execute the
   algorithm with only one loop, in a maximum of
   &lt;b&gt;N=CEILING((Delivered-Regulated)/Stores)&lt;/b&gt;
   iterations.  This will go much faster than the
   row-by-row, and now those two options are pulling
   even further ahead of the naive row-by-row methods
   encouraged by an ORM tool.
&lt;/p&gt;
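
&lt;p&gt;The iteration count can be checked with a small sketch.  This is
   plain Python standing in for the SQL, and it does not use RANK()
   itself: each trip through the loop plays the part of one batch
   UPDATE that gives one copy to as many stores as there are copies
   left.  The function name and sample numbers are made up.
&lt;/p&gt;

```python
import math


def batch_passes(defaults, total_to_deliver):
    """Distribute extras in whole-batch passes; return (quantities, passes)."""
    regulated = list(defaults)
    extras = max(0, total_to_deliver - sum(regulated))
    passes = 0
    while extras and regulated:
        # One set-oriented pass: the first `batch` stores each get a copy.
        # In SQL this would be a single UPDATE filtered on a ranking.
        batch = min(extras, len(regulated))
        for i in range(batch):
            regulated[i] += 1
        extras -= batch
        passes += 1
    return regulated, passes


regulated, passes = batch_passes([10, 5, 8], 30)
# Compare the observed passes with CEILING((Delivered-Regulated)/Stores).
print(regulated, passes, math.ceil((30 - 23) / 3))
```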

&lt;p&gt;This ability of SQL will become extremely
   important, as we are about to blow apart the
   simplicity of the algorithm.
&lt;/p&gt;

&lt;h2&gt;We Now Return You To The Real World&lt;/h2&gt;

&lt;p&gt;I have never been paid good money to write an
   algorithm as simple as the one described above.
   This is because mature businesses have always
   refined these simple methods for years or decades,
   and so the real situation is always more complex.
&lt;/p&gt;

&lt;p&gt;In a real magazine regulation algorithm, the rules
   tend to be more like this:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Apply all of these rules whether you are
    increasing or decreasing the amounts to deliver
    &lt;li&gt;Stop applying the rules when delivery amounts
    have been balanced to what we have on hand, no matter
    where you are in the process
    &lt;li&gt;Always increase/decrease any particular store
    by exactly one on each iteration, no matter which rule
    you are working on
    &lt;li&gt;Never decrease any store below 2
    &lt;li&gt;Decrease any store whose past 3 issues sold less
    than 60% by 1, unless this would project their sales
    of this issue above 60%, and prioritize
    by prior sales % ascending.
    &lt;li&gt;If the previous step completes, and we are
    &lt;b&gt;short of magazines&lt;/b&gt; decrease each store
    by 1 by order of previous sales percents
    ascending.  Repeat until we are in balance.
    &lt;li&gt;If all stores are reduced to 2 and we are
    still short, abort and write error to log.
    &lt;li&gt;If after the decreases we have &lt;b&gt;excess magazines&lt;/b&gt;,
    increase any store whose past 3 issues sold more than
    70% by 1, unless this would reduce their projected
    sales of this issue below 70%, and prioritize by
    prior sales % descending (so the stores with the most
    sales are handled first in case we don't get to all of them)
    &lt;li&gt;If the previous step completes, and we are
    &lt;b&gt;still in excess&lt;/b&gt;, increase each store by 1 in order
    of previous sales percents descending.  Repeat until
    we are in balance.
&lt;/ol&gt;

&lt;p&gt;This can also get doubled again if we must implement one
   set of rules when we start out with too few magazines,
   and another set of rules when we start out with too many.
&lt;/p&gt;

&lt;p&gt;Well, it's not that hard.  It actually comes down to
   having four outer loops in succession.  The percent-based
   reduction, then the by 1 reduction, then the percent-based
   increase, then the by 1 increase.
&lt;/p&gt;

&lt;p&gt;But technically the more important matters are these:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;We now have to
    grab the sales % for every store for this magazine
    on their past 3 issues and keep it handy throughout
    the routine.
    
    &lt;li&gt;The rules stated above contain &lt;b&gt;constants&lt;/b&gt;
    like 70%, 60%.  These would properly be in some 
    parameter table to allow the user to control them,
    so those have to be fetched.
    
    &lt;li&gt;The loop through the stores is now much different,
    as we are &lt;i&gt;filtering&lt;/i&gt; on prior sales percent for
    the percent-based passes, and &lt;i&gt;filtering and ordering&lt;/i&gt;
    on prior sales percent for the by 1 passes.
&lt;/ul&gt;
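
&lt;p&gt;To make the filtering and ordering concrete, here is a sketch of a
   single percent-based decrease pass in Python.  Several things in it
   are assumptions: the dictionary layout, the function name, and above
   all the projection, which supposes a store will sell the same number
   of copies as before, so that delivering one fewer raises its
   percentage.  The 60% threshold and the floor of 2 come from the
   rules above, passed in as parameters rather than read from a
   parameter table.
&lt;/p&gt;

```python
def percent_decrease_pass(stores, shortfall, threshold=0.60, floor=2):
    """One pass of the percent-based decrease rule.

    stores: list of dicts with keys "qty" and "prior_pct", where
        prior_pct is the fraction sold over the past 3 issues.
    shortfall: how many copies must still be removed.
    Returns the shortfall left after the pass; mutates stores in place.
    """
    # Prioritize the weakest sellers first (prior sales % ascending).
    for store in sorted(stores, key=lambda s: s["prior_pct"]):
        if shortfall == 0:
            break  # balanced: stop wherever we are in the process
        if store["prior_pct"] >= threshold:
            continue  # rule applies only to stores under the threshold
        if floor >= store["qty"]:
            continue  # never decrease a store below the floor
        # Assumed projection: same copies sold, one fewer delivered.
        expected_sold = store["prior_pct"] * store["qty"]
        projected = expected_sold / (store["qty"] - 1)
        if projected > threshold:
            continue  # the decrease would push this store over the line
        store["qty"] -= 1
        shortfall -= 1
    return shortfall


stores = [
    {"name": "A", "qty": 10, "prior_pct": 0.40},
    {"name": "B", "qty": 3, "prior_pct": 0.55},
    {"name": "C", "qty": 2, "prior_pct": 0.10},
    {"name": "D", "qty": 8, "prior_pct": 0.90},
]
remaining = percent_decrease_pass(stores, shortfall=5)
print(remaining, [s["qty"] for s in stores])
```

&lt;p&gt;In this sample only store A is decreased: C is already at the
   floor, B would be projected above the threshold, and D sells too
   well to qualify.
&lt;/p&gt;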

&lt;h2&gt;Revisiting the Three Approaches&lt;/h2&gt;

&lt;p&gt;Now let's see how our three approaches would change.
&lt;/p&gt;
    
&lt;p&gt;&lt;b&gt;The Improved Stored Procedure.&lt;/b&gt;  If we change the
   sproc to use RANK() and make batch updates, we would
   pull the prior sales percents into a temp table and
   apply a &lt;a href="http://www.sql-server-performance.com/tips/covering_indexes_p1.aspx"
   &gt;covering index&lt;/a&gt; to cut the reads from that table in
   half.  Our main loop would then simply join to this
   temp table and use it for both filtering and ordering.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;The Embedded SQL.&lt;/b&gt;   If we also changed the 
   embedded SQL so it was making batch updates with
   RANK(), we would also generate a temp table.  This
   option remains the same as the sproc except for where
   we put the SQL.  However, it now has far fewer 
   round trips, and starts to look much more like the
   sproc in terms of performance.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;The ORM Approach.&lt;/b&gt;  The idea here would be to
   get those prior sales percents down into an ordered
   collection and then use them as
   the basis for the loops.  The thorny part is that they
   must be aggregated and sorted.  If we want to avoid
   all embedded SQL, then the aggregation
   can be done client-side if we don't mind pulling down
   3 times as many rows as are required.  The sorting we
   can pull off if we put the objects into a 
   collection such as an associative array, where the key
   is the sales percent, then we can use &lt;b&gt;[language of choice]&lt;/b&gt;'s
   built-in sorting (hopefully), and we have escaped the
   dread evil of embedded SQL.
&lt;/p&gt;
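
&lt;p&gt;A sketch of that client-side aggregation and sort, in Python, might
   look like the following.  The rows are invented sample data standing
   in for whatever objects the ORM hands back.  One design note: the
   sketch sorts (percent, store) pairs instead of keying a collection on
   the percent, because two stores with the same percent would otherwise
   overwrite each other.
&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical rows, one per store per issue for the past 3 issues:
# (store, copies delivered, copies sold).
rows = [
    ("A", 10, 6), ("A", 10, 7), ("A", 10, 5),
    ("B", 4, 4), ("B", 4, 3), ("B", 5, 5),
    ("C", 8, 2), ("C", 8, 3), ("C", 8, 1),
]

# Aggregate app-side: three rows come down for every one value we need.
delivered = defaultdict(int)
sold = defaultdict(int)
for store, qty_delivered, qty_sold in rows:
    delivered[store] += qty_delivered
    sold[store] += qty_sold

# Sort ascending by sales percent using the language's built-in sort.
ranked = sorted((sold[s] / delivered[s], s) for s in delivered)
for pct, store in ranked:
    print(store, round(pct, 3))
```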

&lt;p&gt;So we end up where we were, only more so.  The sproc 
   remains the fastest, and if we know how to code set-oriented
   nifty stuff with RANK() then the embedded SQL will run
   in almost the exact same time.  The ORM requires 
   most likely even more round trips and expensive 
   app-side operations that are performed much more efficiently
   in the db server, unless we are willing to break the
   abstraction and embed a bit of SQL.
&lt;/p&gt;

&lt;p&gt;But in the end, if all of that cost of the ORM kicks
   a 3 second routine to 7 seconds, still well below what
   any user would notice, and you avoid 
   embedded SQL, and it lets you keep your paradigm,
   who am I to judge?
&lt;/p&gt;
   
&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;I offer none.  There are so many conventions in play
   regarding where to put your code, what tool you are
   already using, and so forth, that it is really up to
   the reader to draw conclusions.  I only hope there is
   enough information here to do so.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-6904346884395834551?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/6904346884395834551/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=6904346884395834551' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6904346884395834551'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6904346884395834551'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/12/critical-analysis-of-algorithm-sproc.html' title='Critical Analysis of an Algorithm: Sproc, Embedded SQL, and ORM'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-489689270196685281</id><published>2010-12-15T21:55:00.007-05:00</published><updated>2011-01-03T12:23:44.009-05:00</updated><title type='text'>Historical Perspective of ORM and Alternatives</title><content type='html'>&lt;p&gt;A couple of years ago I broke my basic rule of sticking
   to practical how-to and general programming philosophy
   and wrote &lt;a href="http://database-programmer.blogspot.com/2008/06/why-i-do-not-use-orm.html"
   &gt;Why I Do Not Use ORM&lt;/a&gt;.  It sure got a lot of hits,
   and is read every day
   by people searching such things as "orm bad" or "why use orm".
   But I have never been
   satisfied with that post, and so I decided to take
   another stab from another angle.  There are legitimate
   problems that led to ORM, and those problems need to
   be looked at even if we cannot quite agree on what they
   are or if ORM is the answer.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;UPDATE: In response to comments below and on reddit.com,
   I have a &lt;a href="http://database-programmer.blogspot.com/2010/12/critical-analysis-of-algorithm-sproc.html"&gt;new post&lt;/a&gt; that gives a detailed
   analysis of an algorithm implemented as a sproc, in app
   code with embedded SQL, and in ORM.&lt;/b&gt;
&lt;/p&gt;



&lt;p&gt;Here then, is one man's short history of commercial 
   database application programming, from long before
   the ORM system, right up to the present.
&lt;/p&gt;
   
&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-complete-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;The Way Back Machine&lt;/h2&gt;

&lt;p&gt;When I began my career the world was a different place.
   No Web, no Java, and Object Orientation had not yet 
   entered the mainstream.  My first 
   application was written on a timeshare system (a microVAX)
   and writing LAN applications made me a good living for
   a while before I graduated to client/server.  
&lt;/p&gt;

&lt;p&gt;In those days there were three things a programmer
   (We were not "software engineers" yet, just
   programmers) had to know.  Every programmer I knew
   wanted to master all of these skills.  They were:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;How to design a database schema for correctness
    and efficiency.
    &lt;li&gt;How to code an application that could process
    data from the database, correctly and efficiently.
    &lt;li&gt;How to make a good UI, which came down to 
    hotkeys and stuffing the screen with as much info
    as possible.
&lt;/ul&gt;

&lt;p&gt;In this essay we are going to look at those first two.&lt;/p&gt;

&lt;p&gt;My own experience may be somewhat peculiar in that I
   have never worked on a team where the programmers
   were separated from the database. &lt;i style="color:#555"&gt;(OK, one exception, in
   my current assignment there is an iron curtain between
   the two, but happily it is not my problem from where
   I sit).&lt;/i&gt;  Coders made tables, and "tablers" wrote
   code.  So this focus on being a good developer by 
   developing both skills may be rare, enjoyed by those who
   have the same ecumenical background that I enjoyed.
&lt;/p&gt;

&lt;h2&gt;Some Changes That Did Not Matter&lt;/h2&gt;

&lt;p&gt;Things changed rapidly, but most of those changes
   did not really affect application development.
&lt;/p&gt;

&lt;p&gt;When Windows 95 came out, being "almost as good as
   a Mac", we recoded our DOS apps into Windows apps
   without too much trouble and life went on as before.
&lt;/p&gt;

&lt;p&gt;Laser printers replaced dot-matrix for most office use,
   CPUs kept getting faster (and Windows kept getting
   slower), each year there were more colors on the
   screen, disks got bigger and RAM got cheaper.
&lt;/p&gt;

&lt;p&gt;Only the internet and the new &lt;i&gt;stateless programming&lt;/i&gt;
   required any real adjustment, but it was easy for a
   database guy because good practice had always been to
   keep your transactions as short as possible.  The stateless
   thing just kind of tuned that to a sharp edge.
&lt;/p&gt;

&lt;p&gt;With the internet, the RDBMS finally lost its
   place as sole king of the datastore realm, but those new
   datastores will have to wait for another day, lest we
   get bogged down.
&lt;/p&gt;

&lt;h2&gt;Enter Object Orientation&lt;/h2&gt;

&lt;p&gt;Arguably nothing changed programming more than 
   Object Orientation.  Certainly not Windows 95, faster
   graphics or any of those other Moore's Law consequences.
   I would go so far as to say that even
   the explosion of the web just produced more programming,
   and of different kinds of apps, and even that did not
   come close to the impact of Object Orientation.
   Disagree if you like, but as it came in, it was
   new, it was strange, it was beautiful, and we were
   in love.
&lt;/p&gt;

&lt;p&gt;Now here is something you may not believe.  The biggest
   question for those of us already successfully developing
   large applications was: What is it good for?  What does
   it give me that I do not already have?  Sure it's 
   beautiful, but &lt;i&gt;what does it do?&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;User interfaces were for me the easiest first place to
   see the benefits.  When the widgets became classes and objects,
   and we employed encapsulation, inheritance and
   composition, the world
   changed and I don't know anybody who ever looked back.
&lt;/p&gt;

&lt;h2&gt;OOP, Data, and Data Structures&lt;/h2&gt;

&lt;p&gt;But in the matter of processing data, things were not
   so clear cut.  The biggest reason may have been that
   all languages back then had &lt;i&gt;specialized data structures&lt;/i&gt;
   that were highly tuned to handling relational data.
   These worked so well that nobody at first envisioned
   anything like &lt;a href="http://en.wikipedia.org/wiki/Active_record_pattern"
   &gt;ActiveRecord&lt;/a&gt; because
   we just did not need it.
&lt;/p&gt;

&lt;p&gt;With these structures you could write applications 
   that ran processes involving dozens of tables, lasting
   hours, and never wonder, "Gosh, how do I map this data
   to my language of choice?"  You chose the language you
   were using &lt;i&gt;precisely because it knew how to handle
   data!&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;I would like to throw in just one example to show how
   OOP was not relevant to getting work done back then.
   I was once asked to optimize something called "ERP
   Allocation" that ran once/day, but was taking 26 hours
   at the largest customer site, obviously a big problem.
   It turned out there was a call to the database inside of
   a tightly nested loop, and when I moved the query outside
   of the loop the results were dramatic.  The programmers
   got the idea and they took over from there.  The main
   point of course is that it was all about how to
   efficiently use a database.  The language was OOP, and
   the code was in a class, but that had nothing to do
   with the problem or the solution.  Going further, 
   coding a process so data intensive as this one
   using ActiveRecord
   was prima facie absurd to anybody who knew about data
   and code.
&lt;/p&gt;
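
&lt;p&gt;The essay does not show the ERP Allocation code, so the following
   is only a generic sketch of the pattern, using Python's sqlite3
   module with invented tables and data.  The first function issues one
   query per pass through the loop; the second runs a single query
   before the loop and does plain lookups inside it.
&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item TEXT PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)",
                 [("bolt", 0.25), ("nut", 0.10), ("gear", 4.00)])
orders = [("bolt", 100), ("nut", 250), ("gear", 3), ("bolt", 40)]


def total_query_in_loop(conn, orders):
    """One query per order line: the slow pattern."""
    total, queries = 0.0, 0
    for item, qty in orders:
        row = conn.execute(
            "SELECT price FROM prices WHERE item = ?", (item,)).fetchone()
        queries += 1
        total += row[0] * qty
    return total, queries


def total_query_hoisted(conn, orders):
    """One query before the loop, then dictionary lookups inside it."""
    prices = dict(conn.execute("SELECT item, price FROM prices"))
    total = sum(prices[item] * qty for item, qty in orders)
    return total, 1


print(total_query_in_loop(conn, orders))
print(total_query_hoisted(conn, orders))
```

&lt;p&gt;Both functions compute the same total; only the number of queries
   changes, from one per order line to one overall.
&lt;/p&gt;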

&lt;h2&gt;Java and the Languages of The Internet&lt;/h2&gt;

&lt;p&gt;But the web had another impact that was far
   more important than just switching to stateless
   programming.  This was the introduction
   of an entirely new family of languages that took
   over the application space, listed here in no
   particular order: Perl, PHP,
   Python, Ruby, and the king of them all: Java.
&lt;/p&gt;

&lt;p&gt;All of these languages have one thing in common
   that positively jumps out at a veteran: &lt;i&gt;a complete
   lack of data structures specialized for handling
   relational data.&lt;/i&gt;  
   So as these languages exploded in popularity
   with their dismal offerings in data handling, 
   the need to provide something better in that
   area became rapidly clear.
&lt;/p&gt;

&lt;p&gt;Java has a special role to play because it was
   pure OOP from the ground up.  Even the whitespace
   is an object!  The impact of Java is very important
   here because Object Orientation was now the One True
   Faith, and languages with a more
   flexible approach were gradually demoted
   to mere 'scripting' languages.  &lt;i style='color:#555'&gt;(
   Of course proponents will quickly point out that 1/12 of the
   world's population is now using a single application
   written in one of those 'scripting' languages).&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;So the explosion of languages without decent 
   data handling abilities, coupled with a rise in
   OOP-uber-alles thinking led us quite naturally to:
&lt;/p&gt;

&lt;h2&gt;The First Premise of ORM: The Design Mismatch&lt;/h2&gt;

&lt;p&gt;The first premise of ORM is that there is a design
   mismatch between OOP and Relational, which must be resolved
   before any meaningful work can be done.
&lt;/p&gt;

&lt;p&gt;This view is easy to sympathize with, even if you
   disagree, when you consider the points raised in the
   above sections, that the languages in play lack any real
   specialized data structures, and that a certain
   exclusive truthiness to OOP has arisen that is blind
   to entire classes of solutions.
&lt;/p&gt;

&lt;p&gt;So we must grant the ORM crowd their first 
   premise, in modified form.  It is not that there
   is a design mismatch, it is that there is something
   missing, something that was in older systems that
   is just not there in the newer languages.  Granting
   that this missing feature is an actual mismatch
   requires a belief in the Exclusive Truth of OOP,
   which I do not grant.   OOP is like the computer
   itself, of which Commander Spock said, "Computers
   make excellent and efficient servants, but I have no
   wish to serve under them."
&lt;/p&gt;

&lt;p&gt;But anyway, getting back to the story, the race
   was on to replace what had been lost, and to do it
   in an OOPy way.  
&lt;/p&gt;

&lt;h2&gt;The Second Premise of ORM: Persistence&lt;/h2&gt;

&lt;p&gt;Fast forward and we soon have an entire family
   of tools known as Object-Relational-Mappers,
   or ORM.  With them came an old idea: persistence.
&lt;/p&gt;

&lt;p&gt;The idea has always been around that databases
   exist to &lt;i&gt;persist&lt;/i&gt; the work of the programmer.
   I thought that myself when I was, oh, about 25 or
   so.  I learned fast that my view of reality was,
   *cough*, lacking,
   and that in fact there are two things
   that are truly real for a developer:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The users, who create the paycheck, and
    &lt;li&gt;The data, which those users seemed to think
    was supposed to be correct 100% of the time.
&lt;/ul&gt;

&lt;p&gt;From this perspective, the application code suddenly
   becomes a go-between, the necessary appliance that
   gets data from the db to the user (who creates the
   paycheck), and takes instructions back from the user
   and puts them in the database (correctly, thank you,
   and don't make the user wait).  No matter how 
   beautiful the code was, the user would only ever see
   the screen (or page nowadays) and you only heard about
   it if it was wrong.  Nobody cares about my code, nobody
   cares about yours.
&lt;/p&gt;

&lt;p&gt;However, in the ORM world the idea of a database as the
   &lt;i&gt;persistence&lt;/i&gt; layer now sits on a throne reserved for
   axiomatic truth.  Those who disagree with me on this 
   may say that I have the mistaken perspective of an outsider,
   to which I could say only that it is this very idea that
   keeps me an outsider.
&lt;/p&gt;

&lt;p&gt;But we should not paint the world with a broad brush.
   Chris Wong writes an excellent blog where he occasionally
   details how to respect the database while using Hibernate, in 
   &lt;a href="http://chriswongdevblog.blogspot.com/2010/12/oops-mangling-your-database-with.html"&gt;this post&lt;/a&gt;
   and &lt;a href="http://chriswongdevblog.blogspot.com/2010/10/beware-magic-flush.html"&gt;this post&lt;/a&gt;.&lt;/p&gt;
   
&lt;h2&gt;An Alternative World View&lt;/h2&gt;

&lt;p&gt;There are plenty of alternatives to ORM, but I would
   contend that they begin with a different world view.
   Good business recognizes the infinite value of the
   users as the generators of the Almighty Paycheck, and
   the database as the permanent record of a job well
   done.
&lt;/p&gt;

&lt;p&gt;This worldview forces us into a humble position with
   respect to our own application code, which is that it
   is little more than a waiter, carrying orders to the
   kitchen and food back to the patrons.  When we see it
   this way, the goal becomes to write code that can 
   efficiently get data back and forth.  A small handful
   of library routines can trap SQL injection, validate
   types, and ship data off to the database.  Another
   set can generate HTML, or, can simply pass JSON 
   data up to those nifty browser client libraries
   like &lt;a href="http://www.sencha.com/"&gt;ExtJS (now
   "Sencha" for some reason)&lt;/a&gt;.
&lt;/p&gt;
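
&lt;p&gt;As a hedged sketch of what one such library routine could look
   like, here is a small insert helper in Python with sqlite3.  The
   helper name and the "allowed" dictionary (a stand-in for a real data
   dictionary) are invented for illustration.  Table and column names
   come only from that dictionary, and values travel as bound
   parameters, so user input is never spliced into the SQL text.
&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")


def insert_row(conn, table, values, allowed):
    """Insert one row; allowed maps table name to {column: python type}."""
    columns = allowed[table]  # KeyError for a table we do not know
    for col, val in values.items():
        # KeyError for an unknown column, TypeError for a bad value.
        if not isinstance(val, columns[col]):
            raise TypeError(f"{col} must be {columns[col].__name__}")
    names = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    # Identifiers were checked against the dictionary above;
    # the values themselves are passed as parameters.
    conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})",
                 tuple(values.values()))


allowed = {"customers": {"name": str}}
insert_row(conn, "customers",
           {"name": "Robert'); DROP TABLE customers;--"}, allowed)
print(conn.execute("SELECT count(*) FROM customers").fetchone()[0])
```

&lt;p&gt;The hostile-looking name is stored as ordinary text, and the table
   is still there afterward.
&lt;/p&gt;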

&lt;p&gt;This covers a huge amount of what an application
   does, if you do not have much in the way of
   business logic.
&lt;/p&gt;
   
&lt;h2&gt;But How Do You Handle Business Logic?&lt;/h2&gt;

&lt;p&gt;I have an entire essay on this about half-written,
   but in short, it comes down to understanding what
   business logic really is.  &lt;b&gt;Update: &lt;a href=
   "http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html"&gt;This post is now available&lt;/a&gt;&lt;/b&gt;
&lt;/p&gt;

&lt;p&gt;The tables themselves are the bottom layer of
   business logic.  The table design itself implements
   the foundation for all of the business rules.
   This is why it is so important to get it right.
   The tables are organized using normalization to
   have a place for everything and everything in its
   place, and after that the application code mostly
   writes itself.
&lt;/p&gt;

&lt;p&gt;The application code then falls into two areas:
   value-add and no value-add.  There is no value-add
   when the application simply ships data off to the
   user or executes a user request to update the
   database.  Those kinds of things should be handled
   with the lightest possible library that gets the
   job done.
&lt;/p&gt;

&lt;p&gt;But the value-add stuff is different, where a 
   user's request requires lookups, possibly computations
   and so forth.  The problem here is that a naive 
   analysis of requirements (particularly the
   &lt;a href="http://database-programmer.blogspot.com/2008/02/false-patterns-such-as-reverse-foreign.html"&gt;transliteration error; scroll down to "The
   Customer Does Not Design Tables"&lt;/a&gt;)
   will tend to generate many cases of perceived need for
   value-add where a simpler design can reduce these
   cases to no value-add.  But even when the database has
   been simplified to pristine perfection, there are jobs
   that require loops, multiple passes and so forth,
   which must be made idempotent and robust, which 
   will always require some extra coding.  But if you know
   what you are doing, these always turn out to be the
   ERP Allocation example given above: they are a lot more
   about the data than the classes.
&lt;/p&gt;

&lt;p&gt;Another huge factor is where you come down on the
   normalization debate, particularly on the inclusion of
   derived values.  If you keep derived values out of the database,
   which is technically correct from a limited perspective,
   then suddenly the value-add code is much more important
   because &lt;i&gt;without it your data is incomplete&lt;/i&gt;.  If
   you elect to put derived values into your database then
   value-add code is only required &lt;i&gt;when writing to the
   database&lt;/i&gt;, so huge abstractions meant to handle any
   read/write situation are unnecessary.  (And of course,
   it is extremely important to &lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;
   Keep denormalized values correct&lt;/a&gt;
   ).
&lt;/p&gt;

&lt;h2&gt;And the Rest of It&lt;/h2&gt;

&lt;p&gt;This essay hardly covers the entirety of 
   making code and data work together.  You still have
   to synchronize schema changes to code, and I still
   think a data dictionary is the best &lt;a href="http://en.wikipedia.org/wiki/Don't_repeat_yourself"&gt;D-R-Y&lt;/a&gt; way to
   do that.  
&lt;/p&gt;

&lt;p&gt;I hope this essay shows something of why many programmers
   are so down on ORM, but much more importantly that there
   are coherent philosophies out there that begin with a
   different worldview and deliver what we were all doing
   before ORM and what we will all still be doing after
   ORM: delivering data back and forth between user and
   database.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-489689270196685281?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/489689270196685281/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=489689270196685281' title='23 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/489689270196685281'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/489689270196685281'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/12/historical-perspective-of-orm-and.html' title='Historical Perspective of ORM and Alternatives'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>23</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-5417860549199354385</id><published>2010-12-11T14:33:00.003-05:00</published><updated>2010-12-11T14:37:40.467-05:00</updated><title type='text'>The Cost of Round Trips To The Server</title><content type='html'>&lt;p&gt;A database is not much without the applications
   that connect to it, and one of the most important
   factors that affects the application's performance
   is how it retrieves data from queries.  In this essay
   we are going to see the effect of &lt;i&gt;round trips&lt;/i&gt;
   on application performance.
&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Complete Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Pulling 15,000 Rows&lt;/h2&gt;

&lt;p&gt;The test will pull 15,000 rows from a 
   table.  We do it three different ways and see
   which is faster and by how much.
&lt;/p&gt;

&lt;h2&gt;Getting a Lot of Rows&lt;/h2&gt;

&lt;p&gt;The script below creates a table and puts 1 million
   rows into it.  We want far more rows in the table than
   we will actually pull so that we can pull fresh rows
   on every pass through the test. It is deliberately crafted to spread
   out the adjacent values of the integer primary key.
   This is because, inasmuch as we can control what is
   going on, we want 
   every single row to be on a different page, so that
   in all tests the cost of retrieving the row is roughly
   the same and we are measuring only the effect of our
   retrieval methods.
&lt;/p&gt;

&lt;p&gt;The script can be run without modification in pgAdmin3,
   and with slight mods on MS SQL Server.
&lt;/p&gt;

&lt;pre class="sql" name="code"&gt;create table test000 (
    intpk int primary key
   ,filler char(40)
);


--  BLOCK 1, first 5000 rows
--  pgAdmin3: run as pgScript
--  All others: modify as required  
--
declare @x,@y;
set @x = 1;
set @y = string(40,40,1);
while @x &lt;= 5000 begin
    insert into test000 (intpk,filler)
    values ((@x-1)*200 +1,'@y');

    set @x = @x + 1;
end

-- BLOCK 2, put 5000 rows aside 
--
select  * into test000_temp from test000;

-- BLOCK 3, Insert the 5000 rows 199 more
--          times to get 1 million altogether
--  pgAdmin3: run as pgScript
--  All others: modify as required  
--  
declare @x;
set @x = 1;
while @x &lt;= 199 begin
    insert into test000 (intpk,filler)
    select intpk+@x,filler from test000_temp;

    set @x = @x + 1;
end&lt;/pre&gt;
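
&lt;p&gt;Once all three blocks have run, a quick sanity check
   confirms the load.  This is only a sketch for verification, and
   it assumes the script above ran unmodified, in which case the
   keys fill every integer from 1 to 1,000,000.
&lt;/p&gt;

&lt;pre class="sql" name="code"&gt;-- Expect row_count = 1000000, first_key = 1, last_key = 1000000
select count(*)   as row_count
     , min(intpk) as first_key
     , max(intpk) as last_key
  from test000;&lt;/pre&gt;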

&lt;h2&gt;Test 1: The Naive Code&lt;/h2&gt;

&lt;p&gt;The simplest code is a straight loop that
   pulls 15,000 consecutive rows by sending 
   an explicit query for each one.  
&lt;/p&gt;

&lt;pre class="php" name="code"&gt;# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 1, Individual explicit fetches
# rand(0,197) keeps the whole 15,000-row range inside the 1 million keys
$x1 = rand(0,197)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 1, using $x1 to $x2";
$timeBegin = microtime(true);
for ($x = $x1; $x &lt;= $x2; $x++) {
    # One query, and so one round trip, per row
    $dbResult = pg_query($dbConn, "select * from test000 where intpk=$x");
    $row = pg_fetch_array($dbResult);
}
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 1, elapsed time: ".$elapsed;
echo "\n";&lt;/pre&gt;

&lt;h2&gt;Test 2: Prepared Statements&lt;/h2&gt;

&lt;p&gt;The next command asks the server to prepare a 
   statement, but it still makes 15,000 round trips,
   executing the prepared statement with a new parameter
   each time.  The code looks like this:
&lt;/p&gt;

&lt;pre class="php" name="code"&gt;# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 2, Individual fetches with prepared statements
# rand(0,197) keeps the whole 15,000-row range inside the 1 million keys
$x1 = rand(0,197)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 2, using $x1 to $x2";
$timeBegin = microtime(true);
# Prepare once; single quotes keep PHP from trying to interpolate $1
$dbResult = pg_prepare($dbConn, "test000", 'select * from test000 where intpk=$1');
for ($x = $x1; $x &lt;= $x2; $x++) {
    $pqResult = pg_execute($dbConn, "test000", array($x));
    $row = pg_fetch_all($pqResult);
}
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 2, elapsed time: ".$elapsed;
echo "\n";&lt;/pre&gt;

&lt;h2&gt;Test 3: A single round trip&lt;/h2&gt;

&lt;p&gt;This time we issue a single command to retrieve
   15,000 rows, then we pull them all down in one
   shot.
&lt;/p&gt;

&lt;pre class="php" name="code"&gt;# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 3, One fetch, pull all rows
$timeBegin = microtime(true);
# rand(0,197) keeps the whole 15,000-row range inside the 1 million keys
$x1 = rand(0,197)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 3, using $x1 to $x2";
$dbResult = pg_query(
    $dbConn,
    "select * from test000 where intpk between $x1 and $x2"
);
$allRows = pg_fetch_all($dbResult);
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 3, elapsed time: ".$elapsed;
echo "\n";&lt;/pre&gt;

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;I ran this five times in a row, and this is what I got:&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;&lt;td style="border-bottom:1px solid black;"&gt;Naive 15,000&lt;/td&gt;
      &lt;td style="border-bottom:1px solid black;"&gt;Prepared 15,000&lt;/td&gt;
      &lt;td style="border-bottom:1px solid black;"&gt;One Round Trip&lt;/td&gt;&lt;/tr&gt;
  &lt;tr&gt;&lt;td&gt;~1.800 seconds&lt;/td&gt;
      &lt;td&gt;~1.150 seconds&lt;/td&gt;
      &lt;td&gt;~0.045 seconds&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
  
&lt;p&gt;Compared to the naive example, the &lt;i&gt;set-oriented&lt;/i&gt;
   fetch of all 15,000 rows in a single shot &lt;b&gt;&lt;i&gt;ran
   40 times faster&lt;/i&gt;&lt;/b&gt;.  This is what set-oriented
   code does for an application.
&lt;/p&gt;

&lt;p&gt;While the prepared statement option ran faster than
   the naive option, the
   set-oriented example still ran &lt;b&gt;&lt;i&gt;about 25 times faster
   than the repeated prepared statements&lt;/i&gt;&lt;/b&gt;.
&lt;/p&gt;

&lt;p&gt;I also re-arranged the order of the tests, and
   the results were the same.
&lt;/p&gt;

&lt;h2&gt;Does Server or Language Matter?&lt;/h2&gt;

&lt;p&gt;This test was done using PHP against PostgreSQL.
   Will other servers and client languages get different
   results?  Given the same hardware, a different client
   language or server is going to have a different spread,
   but the shape will be the same.  Fetching all rows in
   a single shot beats the living frack out of round trips
   inside of loops in any client language against any
   server.
&lt;/p&gt;

&lt;h2&gt;Putting It Into Use&lt;/h2&gt;

&lt;p&gt;The most obvious conclusion is that any query
   returning more than 1 row should return all rows
   as a set.  The advantage is so stark with large
   row counts that it is worthwhile making this the
   default for our applications, unless we can find
   a very good reason not to.  So what would the
   objections be?
&lt;/p&gt;

&lt;p&gt;One objection might go something like, "&lt;font color="#333"&gt;Ken, I
   see the numbers, but I know my app very well and
   we never pull more than 10-20 rows in a pop.  I 
   cannot imagine how it would matter at 10-20 rows,
   and I do not want to recode.&lt;/font&gt;"  This makes sense
   so I ran a few more
   tests with 20 and 100 rows, and found that, on 
   my hardware, you need about 100 rows to see a
   difference.  At 20 rows all three are neck-and-neck,
   and at 100 the set is pulling 4 times faster than
   the prepared statement and 6 times faster than the
   naive statement.  So the conclusion is not an
   absolute after all; some judgment is in order.
&lt;/p&gt;

&lt;p&gt;Another thing to consider is how many simultaneous
   reads and writes might be going on at any given
   time.  If your system is known to have 
   simultaneous transactions running regularly, then the
   complete fetch may be a good idea even if you do some
   tests for best-guess row count and the tests are inconclusive.
   The reason is that the test is a &lt;i&gt;single user case&lt;/i&gt;,
   but multiple &lt;i&gt;simultaneous&lt;/i&gt; users put a strain on
   the database, even when they are not accessing the same
   tables.  In this case we want the application to
   play the "good citizen" and get in and out as quickly 
   as possible to reduce strain on the server, which will
   improve the performance of the entire application, not
   just the portions optimized for complete fetches. 
&lt;/p&gt;

&lt;p&gt;Another objection might be, "&lt;font color="#333"&gt;Well, my code needs to 
   pull from multiple tables, so I cannot really do this.
   When we do -PROCESS-X- we go row by row and need to pull
   from multiple tables for each row.&lt;/font&gt;"  In this case
   you &lt;i&gt;definitely&lt;/i&gt; need to go set-oriented and pull all
   associated quantities down in a query with a JOIN or two.
   Consider this: if on your particular hardware the ratio
   of naive row-by-row to single fetch is 10, and you must
   pull from 2 other tables for each row, that means you are
   really running 30 times slower (a ratio of 10 x 3 reads per row) 
   than you could be.
&lt;/p&gt;
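
&lt;p&gt;As a rough sketch of what that looks like, suppose
   -PROCESS-X- walks a batch of order lines and needs the customer
   name and the item price for each one.  Every table and column
   name below (order_lines, customers, items, batch_id and so on)
   is made up for illustration and does not come from any real
   system.  One query with two JOINs replaces three queries per row:
&lt;/p&gt;

&lt;pre class="sql" name="code"&gt;-- Hypothetical schema, for illustration only
create table customers (
    customer_id   int primary key
   ,customer_name varchar(40)
);
create table items (
    item_id int primary key
   ,price   numeric(10,2)
);
create table order_lines (
    order_id    int
   ,line_no     int
   ,batch_id    int
   ,customer_id int references customers (customer_id)
   ,item_id     int references items (item_id)
   ,qty         int
   ,primary key (order_id, line_no)
);

-- One round trip returns every line in the batch together with
-- the values that would otherwise need two extra lookups per row
select ol.order_id
     , ol.line_no
     , ol.qty
     , c.customer_name
     , i.price
     , ol.qty * i.price as extended_price
  from order_lines ol
  join customers   c on c.customer_id = ol.customer_id
  join items       i on i.item_id     = ol.item_id
 where ol.batch_id = 1001
 order by ol.order_id, ol.line_no;&lt;/pre&gt;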

&lt;h2&gt;A Final Note About PHP, Data Structures, and Frameworks&lt;/h2&gt;

&lt;p&gt;Back when dinosaurs ruled the Earth and there was
   no internet (outside of universities, etc.),
   the languages we used had specialized data structures
   that were tuned to database use.  Compared to those
   older systems the newer languages born on the
   internet are more or less starving for such a 
   data structure.
&lt;/p&gt;

&lt;p&gt;PHP gets by fairly well because its associative
   array can be used as a passive (non object-oriented)
   data structure that comes pretty close to what we had
   before.  
&lt;/p&gt;

&lt;p&gt;I bring this up because the choice of a language and
   its support for a "fetch all" operation obviously
   impacts how well the conclusions of this post can 
   be implemented.  If your mapping tool has an iterator
   that absolves you of all knowledge of what is going
   on under the hood, it may be worthwhile to see if it
   is doing a complete fetch or a row-by-row.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-5417860549199354385?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/5417860549199354385/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=5417860549199354385' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/5417860549199354385'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/5417860549199354385'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/12/cost-of-round-trips-to-server.html' title='The Cost of Round Trips To The Server'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-6268639473191324831</id><published>2010-12-08T18:50:00.005-05:00</published><updated>2010-12-08T21:23:36.053-05:00</updated><title type='text'>Submit Analysis Request to the Database Programmer</title><content type='html'>&lt;p&gt;
I generally do not reveal too many details about systems I design for customers
or employers.  This leaves me sometimes in a bind for example material.  I either
have to simplify it beyond what I would like, or make something up that I have
not actually put into Production. 
&lt;/p&gt;

&lt;p&gt;On top of that, one of the key themes of this blog is that table design
   is a crucial skill, and if the examples I give do not match what you
   are doing, they may be hard to make use of.
&lt;/p&gt;

&lt;p&gt;So I would like to invite analysis requests.  Go over to the
   &lt;a href="http://database-programmer.blogspot.com/p/contact-author.html"
   &gt;Contact the Author&lt;/a&gt; page and drop me an email and tell me
   about the system you are trying to design or optimize.
&lt;/p&gt;

&lt;p&gt;There are no rules on the type of system.  
&lt;/p&gt;

&lt;p&gt;The most interesting mini-projects would be those where advice you
   have been given elsewhere (or here for that matter) does not seem
   to fit.
&lt;/p&gt;

&lt;p&gt;I will do my best to reply, even if I have to say no, so that
   nobody is left wondering.
&lt;/p&gt;

&lt;p&gt;Remember this blog is one of those hobby/professional things,
   good for all of us but nobody is getting paid, so if you are in
   a terrible hurry this might not be the best thing.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-6268639473191324831?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/6268639473191324831/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=6268639473191324831' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6268639473191324831'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6268639473191324831'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/12/submit-analysis-request-to-database.html' title='Submit Analysis Request to the Database Programmer'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-2771898978072480219</id><published>2010-12-02T21:36:00.002-05:00</published><updated>2010-12-02T21:42:04.422-05:00</updated><title type='text'>A Case When Table Design is Easy and Predictable</title><content type='html'>&lt;p&gt;Good table design is a great foundation for a successful
   application stack.  Table design patterns basically resolve
   into master tables and transaction tables.  When we know
   a thing or two about the master tables (or entities if you
   prefer), we can infer a great deal about the transactions.
&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;A Time Billing System&lt;/h2&gt;

&lt;p&gt;Imagine we have been asked to recode the company's 
   time-billing system.  Because this is for the company
   we work for, we have some inside knowledge about how
   things work.  We know that:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;There are, of course, customers.
    &lt;li&gt;...and employees who record time
    &lt;li&gt;Each hour we record goes against a Work Order
    &lt;li&gt;There are different kinds of jobs, like
        project management, programming, programming
        management, and others.
&lt;/ul&gt;

&lt;p&gt;Knowing only this, is it
   possible to anticipate what the system will look like?
   A safe answer is "no", on the claim that we will 
   undoubtedly learn more, but this safe answer happens
   to be wrong.  We can in fact anticipate
   the overall shape of the system, and new information
   will shift details, but it will not change the shape.
&lt;/p&gt;

&lt;p&gt;We can anticipate the nature of the transactions
   if we determine the &lt;i&gt;upper bound of complexity&lt;/i&gt;
   and the &lt;i&gt;combinatorial completeness&lt;/i&gt; of the
   system.
&lt;/p&gt;

&lt;h2&gt;The Upper Bound of Complexity&lt;/h2&gt;

&lt;p&gt;We can safely assume that the big number to get
   right is going to be the billing rate.  Our employer
   assumes we will get everything else right, but the
   billing rate is going to have them chewing their fingernails
   until they know we understand it and have coded it 
   correctly.
&lt;/p&gt;

&lt;p&gt;The cool thing is that we already have enough information
   to establish an &lt;i&gt;upper bound on the complexity&lt;/i&gt; of
   the system by looking at the master tables, where a master table
   is generally one that lists details about real things
   like people, places, things, or activities.
   So far we know (or think we know) about three master tables:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Customers
    &lt;li&gt;Employees
    &lt;li&gt;Services
&lt;/ul&gt;

&lt;p&gt;Now we define the upper bound of complexity as:&lt;/p&gt;

&lt;p style="margin: 10px; border: 2px solid black; 
          padding: 8px;
          background-color: lightgreen;
          font-weight: bolder"&gt;The upper bound of complexity
   occurs when the billing rate is determined by all three
   master entities.
&lt;/p&gt;

&lt;p&gt;In plain English, calculating a billing rate can be as
   complicated as looking up a rate specific to a customer
   for a service for an employee &lt;i&gt;but cannot be more 
   complex than that&lt;/i&gt; because there are no other entities
   with which to work.
&lt;/p&gt;

&lt;h2&gt;Combinatorially Complete&lt;/h2&gt;

&lt;p&gt;We can also anticipate all possible calculations for
   the billing rate by working through the complete set
   of combinations of master entities.  This would look
   like the list below.  Note that we are not trying to 
   figure out right now which of these is likely to occur,
   we just want to get them listed out:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Each service has a default rate
    &lt;li&gt;Each customer has a negotiated rate
    &lt;li&gt;Each employee bills out at a default rate
    &lt;li&gt;The combination customer-service may have a rate
    &lt;li&gt;The combination customer-employee may have a rate
    &lt;li&gt;The combination service-employee may have a rate
    &lt;li&gt;The combination customer-service-employee may have
        a rate (this is the upper bound of complexity, all
        three master entities determine the rate).
&lt;/ul&gt;

&lt;p&gt;Unless we live in a super-simple world where only the first 
   item in the list is present, we will end up dealing with
   several if not all of the combinations listed above.
&lt;/p&gt;

&lt;p&gt;Each of these combinations then becomes a table, and
   we know the billing rate will be determined by a
   &lt;a href="http://database-programmer.blogspot.com/2008/04/advanced-table-design-resolutions.html"&gt;resolution&lt;/a&gt;.
&lt;/p&gt;
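
&lt;p&gt;Here is a minimal sketch of how a resolution over these tables
   might look.  To keep it short it shows only three of the
   combination tables, and it assumes that the most specific rate
   wins, which is a business rule we would have to confirm.  All
   table and column names are hypothetical, and the foreign keys to
   the master tables are left out.
&lt;/p&gt;

&lt;pre class="sql" name="code"&gt;-- Hypothetical rate tables, one per combination (three shown)
create table rates_services (
    service varchar(20) primary key
   ,rate    numeric(8,2) not null
);
create table rates_customers_services (
    customer varchar(20)
   ,service  varchar(20)
   ,rate     numeric(8,2) not null
   ,primary key (customer, service)
);
create table rates_customers_services_employees (
    customer varchar(20)
   ,service  varchar(20)
   ,employee varchar(20)
   ,rate     numeric(8,2) not null
   ,primary key (customer, service, employee)
);
create table time_entries (
    entry_id int primary key
   ,customer varchar(20)
   ,service  varchar(20)
   ,employee varchar(20)
   ,hours    numeric(6,2)
);

-- The resolution: coalesce() takes the first rate that is present,
-- so the tables are listed from most specific to least specific
-- (an assumed precedence)
select t.entry_id
     , t.hours
     , coalesce(cse.rate, cs.rate, s.rate) as billing_rate
  from time_entries t
  left join rates_customers_services_employees cse
         on cse.customer = t.customer
        and cse.service  = t.service
        and cse.employee = t.employee
  left join rates_customers_services cs
         on cs.customer = t.customer
        and cs.service  = t.service
  left join rates_services s
         on s.service = t.service;&lt;/pre&gt;

&lt;p&gt;Adding another combination means adding one more table, one more
   LEFT JOIN, and one more argument to coalesce() in the right position.
&lt;/p&gt;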

&lt;h2&gt;New Information&lt;/h2&gt;

&lt;p&gt;Now comes the big day and we interview with somebody
   we'll call "The Explainer" who is going to officially
   explain the billing system.  Can he break what we
   already know?  No.  At most he can:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Make us aware of new master entities, perhaps
    there are "projects" and "contracts" that get their
    own billing arrangements.
    &lt;li&gt;Dispel our notions about some of the combinations
    by saying, "Oh we never give a customer a default
    rate, the default rates come out of the services."
&lt;/ul&gt;

&lt;h2&gt;Going in Cold&lt;/h2&gt;

&lt;p&gt;What about the case where we know absolutely nothing
   about an assignment when we go in to begin the interviews?
   We can do a good job of thinking on our feet if we draw
   "The Explainer" towards the master entities.  As we gain
   confidence that we know what the master entities are,
   we can ask questions to probe Combinatorial Completeness
   and the Upper Bound of Complexity.&lt;/p&gt;

&lt;p&gt;One caveat: This method works for transactions between
   master entities.  When "The Explainer" starts describing
   something that cannot be recognized as an interaction 
   between master entities, do not try to stuff the problem
   into this box; it may not fit.  
&lt;/p&gt;

&lt;h2&gt;What About the Application?&lt;/h2&gt;

&lt;p&gt;At this point, we can also anticipate a lot of 
   what the application will look like.  We will need
   maintenance screens for all of the master entities,
   and a really slick UI will allow for very easy editing
   of those various cross-reference combination tables.
   As long as that much is done, we are almost finished,
   but not yet.
&lt;/p&gt;

&lt;p&gt;There will be some billing process that pulls
   the time entries, finds the correct billing rate for
   each one, and permanently records the invoices.  If
   we use a &lt;a href="http://database-programmer.blogspot.com/2008/04/advanced-table-design-resolutions.html"&gt;resolution&lt;/a&gt;, this task is
   child's play to code, debug, and maintain.
&lt;/p&gt;

&lt;p&gt;Then of course there is the presentation, the actual
   bill.  Depending on the company, these may be delivered
   as hardcopy or in email.  That will of course have to
   be coded up.
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;There are two conclusions.  First, as originally stated,
   many transactions can be anticipated when you know what
   the master entities are.
&lt;/p&gt;

&lt;p&gt;But secondly, and every bit as important, once the table
   design is sound, the application pretty much writes itself.
   On a personal note, this is probably why I do not find
   application coding as exciting as I once did.  Once I 
   realized that the real challenge and satisfaction was in
   working out the tables, the coding of the app became a 
   bit of a drudge; it requires no judgment as far as 
   business rules are concerned.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-2771898978072480219?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/2771898978072480219/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=2771898978072480219' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2771898978072480219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2771898978072480219'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/12/case-when-table-design-is-easy-and.html' title='A Case When Table Design is Easy and Predictable'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-2401643054082238588</id><published>2010-11-30T22:29:00.006-05:00</published><updated>2010-12-01T21:43:22.904-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Window Functions'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL SELECT'/><title type='text'>The Really Cool NTILE() Window Function</title><content type='html'>&lt;p&gt;If you regularly code queries and have never been
   introduced to the &lt;i&gt;windowing functions&lt;/i&gt;, then
   you are in for a treat.  I've been meaning to write
   about these for over a year, and now it's time to get
   down to it.
&lt;/p&gt;

&lt;h2&gt;Support in Major Servers&lt;/h2&gt;

&lt;p&gt;SQL Server calls these functions 
   &lt;a href="http://msdn.microsoft.com/en-us/library/ms189798.aspx"
   &gt;Ranking Functions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;PostgreSQL supports a wider range of functions 
   than MS SQL Server, having added them in version
   8.4, and calls them
   &lt;a href="http://www.postgresql.org/docs/8.4/interactive/functions-window.html"
   &gt;Window Functions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Oracle's support is broader (by a reading of the docs)
   than SQL Server or PostgreSQL, and they call them
   &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions001.htm#i81407"
   &gt;Analytic Functions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I try to stay away from MySQL, but I did a quick Google on
   all three terms and came up with a few forum posts asking
   when and if they will be supported.
&lt;/p&gt;

&lt;h2&gt;The NTILE() Function&lt;/h2&gt;

&lt;p&gt;In this post we are going to look at NTILE, a cool function
   that allows you to segment query results into groups and 
   put numbers onto them.   The name is easy to remember because
   it can create any -tile, a percentile, a decile, or anything
   else.  In short, an &lt;i&gt;&lt;b&gt;n-&lt;/b&gt;&lt;/i&gt;tile.  But it is much easier to
   understand with an example, so let's go right to it.
&lt;/p&gt;

&lt;h2&gt;Finding percentiles&lt;/h2&gt;

&lt;p&gt;Consider a table of completed sales, perhaps on an eCommerce site.
   The Sales Manager would like them divided up into quartiles,
   four groups of equal size, and she wants the average and
   maximum sale in each quartile.  Let's say the company is not
   exactly hopping, and there are only twelve sales, which is good
   because we can list them all for the example.  If we already
   had the quartiles provided then the query would be easy.  Suppose
   we were lucky enough to be starting with this:
&lt;/p&gt;

&lt;pre&gt;
 CUSTTYPE | AMOUNT  | QUARTILE
----------+---------+----------
 RETAIL   |   78.00 |   &lt;font color="blue"&gt;1&lt;/font&gt;
 RETAIL   |  234.00 |   &lt;font color="blue"&gt;1&lt;/font&gt;
 DEALER   |  249.00 |   &lt;font color="blue"&gt;1&lt;/font&gt;
 DEALER   |  278.00 |   &lt;font color="red"&gt;2&lt;/font&gt;
 RETAIL   |  392.00 |   &lt;font color="red"&gt;2&lt;/font&gt;
 RETAIL   |  498.00 |   &lt;font color="red"&gt;2&lt;/font&gt;
 DEALER   |  500.00 |   &lt;font color="purple"&gt;3&lt;/font&gt;
 RETAIL   |  738.00 |   &lt;font color="purple"&gt;3&lt;/font&gt;
 DEALER   | 1250.00 |   &lt;font color="purple"&gt;3&lt;/font&gt;
 RETAIL   | 2029.00 |   &lt;font color="green"&gt;4&lt;/font&gt;
 RETAIL   | 2393.00 |   &lt;font color="green"&gt;4&lt;/font&gt;
 RETAIL   | 3933.00 |   &lt;font color="green"&gt;4&lt;/font&gt;
&lt;/pre&gt;

&lt;p&gt;The query would be child's play &lt;i&gt;if we already
   had the quartile&lt;/i&gt;:&lt;/p&gt;

&lt;pre&gt;
Select quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM ORDERS
 GROUP BY quartile
 ORDER BY quartile
&lt;/pre&gt;

&lt;h2&gt;The Problem is We Do Not Have Quartile&lt;/h2&gt;

&lt;p&gt;The problem of course is that we do not usually
   have handy columns like QUARTILE provided, but
   we can generate the QUARTILE column during the
   query by using NTILE.
&lt;/p&gt;

&lt;pre&gt;
Select quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM (
        &lt;font color="green"&gt;-- The subquery is necessary
        -- to process all rows and add the quartile column&lt;/font&gt;
        SELECT amount
             , ntile(4) over (order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY quartile
 ORDER BY quartile
&lt;/pre&gt;

&lt;p&gt;This query will give us what the Sales Manager wants.&lt;/p&gt;
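
&lt;p&gt;To try it out, the twelve sample sales can be loaded into a
   scratch table.  This is a sketch that assumes PostgreSQL 8.4 or
   later; the column types are an assumption, and round() is added
   only to keep the averages readable.
&lt;/p&gt;

&lt;pre&gt;
CREATE TABLE orders (
    custtype varchar(10)
  , amount   numeric(10,2)
);

INSERT INTO orders (custtype, amount) VALUES
    ('RETAIL',   78.00), ('RETAIL',  234.00), ('DEALER',  249.00)
  , ('DEALER',  278.00), ('RETAIL',  392.00), ('RETAIL',  498.00)
  , ('DEALER',  500.00), ('RETAIL',  738.00), ('DEALER', 1250.00)
  , ('RETAIL', 2029.00), ('RETAIL', 2393.00), ('RETAIL', 3933.00);

Select quartile
     , round(avg(amount), 2) as avgAmount
     , max(amount)           as maxAmount
  FROM (
        SELECT amount
             , ntile(4) over (order by amount) as quartile
          FROM orders
       ) x
 GROUP BY quartile
 ORDER BY quartile;

&lt;font color="green"&gt;-- Expected result:
--  quartile | avgamount | maxamount
-- ----------+-----------+-----------
--         1 |    187.00 |    249.00
--         2 |    389.33 |    498.00
--         3 |    829.33 |   1250.00
--         4 |   2785.00 |   3933.00&lt;/font&gt;
&lt;/pre&gt;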

&lt;h2&gt;Dissecting the Function and The OVER Clause&lt;/h2&gt;

&lt;p&gt;The NTILE() function takes a single argument, which tells
   the server how many groups to divide the data into.  If 
   the row count does not divide evenly, the groups cannot all
   be the same size, and the leftover rows are handed out one
   apiece to the lowest-numbered groups.  So
   in an exact case all of your groups have the same count of
   rows, but when it does not divide evenly, the later groups
   will be one row short.
&lt;/p&gt;
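
&lt;p&gt;A quick way to see how the rows are handed out when the count
   does not divide evenly is to number a throwaway set of rows.
   The sketch below assumes PostgreSQL, whose generate_series()
   function supplies ten rows to split into four groups.
&lt;/p&gt;

&lt;pre&gt;
&lt;font color="green"&gt;-- Ten rows cannot be split evenly into four groups&lt;/font&gt;
SELECT quartile
     , count(*) as rowsInGroup
  FROM (
        SELECT n
             , ntile(4) over (order by n) as quartile
          FROM generate_series(1,10) as g(n)
       ) x
 GROUP BY quartile
 ORDER BY quartile;

&lt;font color="green"&gt;-- PostgreSQL returns counts of 3, 3, 2, 2 for quartiles 1 through 4&lt;/font&gt;
&lt;/pre&gt;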

&lt;p&gt;If you pass 100 to NTILE(), you get a percentile.  If you
   pass 10, you get a decile, and so forth.
&lt;/p&gt;

&lt;p&gt;The magic is in the OVER clause.  This supports two sub-clauses,
   and the example shows one, the ORDER BY.  Quite simply, the
   ORDER BY clause tells the server how to line up the rows when
   adding the NTILE values.  The clause is very flexible, and has
   nothing to do with your query's overall ORDER BY clause.
&lt;/p&gt;

&lt;h2&gt;The Second Clause: PARTITION&lt;/h2&gt;

&lt;p&gt;Now we will pretend the Sales Manager is not satisfied, and
   wants separate numbers for the two Customer Types.  We could
   do this if the NTILE() function would create two sets
   of quartiles, one for each Customer Type, like so:
&lt;/p&gt;

&lt;pre&gt;
 CUSTTYPE | AMOUNT  | QUARTILE
----------+---------+----------
 DEALER   |  249.00 |   &lt;font color="blue"&gt;1&lt;/font&gt;
 DEALER   |  278.00 |   &lt;font color="red"&gt;2&lt;/font&gt;
 DEALER   |  500.00 |   &lt;font color="purple"&gt;3&lt;/font&gt;
 DEALER   | 1250.00 |   &lt;font color="green"&gt;4&lt;/font&gt;
 RETAIL   |   78.00 |   &lt;font color="blue"&gt;1&lt;/font&gt;
 RETAIL   |  234.00 |   &lt;font color="blue"&gt;1&lt;/font&gt;
 RETAIL   |  392.00 |   &lt;font color="red"&gt;2&lt;/font&gt;
 RETAIL   |  498.00 |   &lt;font color="red"&gt;2&lt;/font&gt;
 RETAIL   |  738.00 |   &lt;font color="purple"&gt;3&lt;/font&gt;
 RETAIL   | 2029.00 |   &lt;font color="purple"&gt;3&lt;/font&gt;
 RETAIL   | 2393.00 |   &lt;font color="green"&gt;4&lt;/font&gt;
 RETAIL   | 3933.00 |   &lt;font color="green"&gt;4&lt;/font&gt;
&lt;/pre&gt;

&lt;p&gt;We can do this by using the PARTITION BY clause,
   which tells the server to break the rows into
   groups and apply the NTILE() numbering separately
   within each group.  The new query would be this:
&lt;/p&gt;

&lt;pre&gt;
Select custtype
     , quartile
     , avg(amount) as avgAmount
     , max(amount) as maxAmount
  FROM (
        &lt;font color="green"&gt;-- The subquery is necessary
        -- to process all rows and add the quartile column&lt;/font&gt;
        SELECT custtype
             , amount
             , ntile(4) over (partition by custtype
                                 order by amount) as quartile
          FROM ORDERS
       ) x
 GROUP BY custtype,quartile
 ORDER BY custtype,quartile
&lt;/pre&gt;

&lt;h2&gt;Bonus Points: The Median&lt;/h2&gt;

&lt;p&gt;Now once again the Sales Manager, who is never satisfied,
   comes down and says that the average is no good, she
   needs the max and the &lt;i&gt;median&lt;/i&gt; sale value within each quartile.
   To keep it simple, she does not need this broken out
   by customer type, it can be applied to the entire set.
&lt;/p&gt;

&lt;p&gt;This is a case where we can use NTILE() twice.  The first
   time we will break all sales up into four groups, to get
   the quartiles, and then we will break up each quartile into
   two groups to get the median.  The code looks like this:
&lt;/p&gt;

&lt;pre&gt;
Select quartile
     , max(case when bitile=1 then amount end) as medAmount
     , max(amount) as maxAmount
  FROM (
        &lt;font color="green"&gt;-- The second pass adds the
        -- 2-tile value we will use to find medians&lt;/font&gt;
        SELECT quartile
             , amount
             , ntile(2) over (partition by quartile
                                  order by amount) as bitile
          FROM (
                &lt;font color="green"&gt;-- The subquery is necessary
                -- to process all rows and add the quartile column&lt;/font&gt;
                SELECT amount
                     , ntile(4) over (order by amount) as quartile
                  FROM ORDERS
               ) x1
       ) x2
 GROUP BY quartile
 ORDER BY quartile
&lt;/pre&gt;

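&lt;p&gt;The magic here is that we know we've divided the data
   evenly into four sets, so the median will be the maximum
   value half way through each set.  In other words, it will be the
   maximum value when the value of bitile=1 for each quartile.
   When a quartile holds an even number of rows there are two
   middle values, and this approach returns the lower of the two
   rather than their average.
&lt;/p&gt;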
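
&lt;p&gt;For comparison, some servers can compute a median directly.
   The sketch below is an alternative to the double NTILE() and
   assumes PostgreSQL 9.4 or later, which provides the
   percentile_cont() ordered-set aggregate.  It inlines the twelve
   sample amounts so it can be run on its own; the names sales and
   q are made up for the example.
&lt;/p&gt;

&lt;pre&gt;
WITH sales(amount) AS (
    VALUES (78.00), (234.00), (249.00), (278.00), (392.00), (498.00)
         , (500.00), (738.00), (1250.00), (2029.00), (2393.00), (3933.00)
), q AS (
    &lt;font color="green"&gt;-- Same first pass as before: add the quartile column&lt;/font&gt;
    SELECT amount
         , ntile(4) over (order by amount) as quartile
      FROM sales
)
Select quartile
     , percentile_cont(0.5) within group (order by amount) as medAmount
     , max(amount) as maxAmount
  FROM q
 GROUP BY quartile
 ORDER BY quartile;

&lt;font color="green"&gt;-- With three rows per quartile the medians are 234, 392, 738
-- and 2393, the same values the double NTILE() query finds&lt;/font&gt;
&lt;/pre&gt;

&lt;p&gt;The two approaches part ways only when a quartile has an even
   number of rows, because percentile_cont() interpolates between
   the two middle values.
&lt;/p&gt;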

&lt;h2&gt;One More Note About Oracle&lt;/h2&gt;

&lt;p&gt;Once you get down the basics of the OVER clause, Oracle
   looks really good, because they support the clause over
   the largest range of functions, at least going by the
   respective doc pages for each platform.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-2401643054082238588?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/2401643054082238588/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=2401643054082238588' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2401643054082238588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2401643054082238588'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/11/really-cool-ntile-window-function.html' title='The Really Cool NTILE() Window Function'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-8790205704344604408</id><published>2010-11-29T22:19:00.005-05:00</published><updated>2010-12-02T20:34:00.983-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cursors'/><title type='text'>Loops Without Cursors</title><content type='html'>&lt;h2&gt;Looping Without Cursors&lt;/h2&gt;

&lt;p&gt;Sometimes you need to process a table row-by-row,
   and the established approach is to use &lt;i&gt;cursors&lt;/i&gt;,
   which are verbose, slow, and painful to code 
   and use.  
&lt;/p&gt;

&lt;h2&gt;The Cursor Example&lt;/h2&gt;

&lt;p&gt;Here is the basic minimum syntax required to 
   loop through a table and get something done.
   The SQL flavor is MS SQL Server, but it's not
   much better in any other flavor.
&lt;/p&gt;

&lt;pre&gt;
&lt;font color="red"&gt;-- I coded this off the top of my head, there
-- may be a minor syntax error or two&lt;/font&gt;

&lt;font color="green"&gt;-- Most of this is pseudo-code, but take
-- note that it is ordered on column1&lt;/font&gt;
declare someCursorName cursor for
 select column1, column2, column3 
   from anyTable
  ORDER BY column1

&lt;font color="green"&gt;-- Have to do this now&lt;/font&gt;
open someCursorName

&lt;font color="green"&gt;-- Now you need to declare some variables
-- For the example I'm just making everything int&lt;/font&gt;
declare @column1 int
      , @column2 int
      , @column3 int

&lt;font color="green"&gt;-- Gosh, we're actually about to start the loop!  Finally!&lt;/font&gt;
fetch next from someCursorName into @column1,@column2,@column3
while @@fetch_status = 0 begin

   &lt;font color="green"&gt;--  If you still remember what you actually wanted
   --  to do inside the loop, code it here:&lt;/font&gt;

&lt;font color="green"&gt;-- Repeat this line from the top here again:&lt;/font&gt;
fetch next from someCursorName into @column1,@column2,@column3
end

&lt;font color="green"&gt;-- Not done yet, these two lines are crucial&lt;/font&gt;
close someCursorName
deallocate someCursorName
&lt;/pre&gt;

&lt;p&gt;Call me petty, but what I hate about that code is that I
   have to refer to specific columns of interest 3 times (not
   counting the declarations).  You refer to them in the
   cursor declaration and in the two FETCH commands.  With
   a little clever coding, we can vastly simplify this
   and do it only once.
&lt;/p&gt;

&lt;h2&gt;Using An Ordered Column&lt;/h2&gt;

&lt;p&gt;We can execute the same loop without the cursor if 
   one of the columns is ordered and unique.  Let us say
   that column1 is the primary key, and is an auto-incremented
   integer.  So it is ordered and unique.  The code now
   collapses down to:
&lt;/p&gt;

&lt;pre&gt;&lt;font color="red"&gt;-- I coded this off the top of my head, there
-- may be a minor syntax error or two&lt;/font&gt;

&lt;font color="green"&gt;-- We can't get around declaring the vars, so do that&lt;/font&gt;
declare @column1 int
      , @column2 int
      , @column3 int

&lt;font color="green"&gt;-- If you know a safe value for initialization, you
-- can use the code below.  If this is not 100% 
-- safe, you must query for the value or it must
-- be supplied from some other source&lt;/font&gt;
set @column1 = -1

&lt;font color="green"&gt;-- BONUS POINTS: Can this become an infinite loop?&lt;/font&gt;
while 1 = 1 begin

&lt;font color="green"&gt;-- Now we code the query and exit condition&lt;/font&gt;
 select TOP 1
        @column1 = column1
      , @column2 = column2
      , @column3 = column3 
   from anyTable
  &lt;font color="red"&gt;WHERE column1 &gt; @column1  -- this is what advances the loop&lt;/font&gt;
  ORDER BY column1

if @@rowcount = 0 begin
    break
end

    &lt;font color="green"&gt;    -- Put the actions here    &lt;/font&gt;    

end
&lt;/pre&gt;

&lt;h2&gt;Final Notes&lt;/h2&gt;

&lt;p&gt;The only requirement for this approach is
   that you have a unique ordered column.  
   This usually means a unique key or primary
   key.  If "column1" is not unique, the loop
   will process only one row for each distinct
   value and silently skip the rest of that
   group.
&lt;/p&gt;

&lt;p&gt;Also, it is very nice if you know a safe
   value to use as an initializer.  Without that,
   you must query for the minimum value that matches
   the condition and then decrement it by one.  
&lt;/p&gt;
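
&lt;p&gt;As a rough sketch of that second case, assuming the same
   made-up anyTable and an integer column1 as in the example
   above, the initializer might be queried like this:
&lt;/p&gt;

&lt;pre&gt;
&lt;font color="green"&gt;-- MIN() without GROUP BY always returns one row, so on an
-- empty table @column1 becomes NULL, the WHERE clause in
-- the loop matches nothing, and the loop exits on its first pass&lt;/font&gt;
select @column1 = min(column1) - 1
  from anyTable
&lt;/pre&gt;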

&lt;p&gt;Finally, can this loop become infinite?  No.
   Well, in the extremely unlikely situation
   that rows are being added to the base table faster
   than you are processing them, then yes, it could
   go on for a very long time.  But if that were 
   happening I'd say there was a separate problem to
   look at.
&lt;/p&gt;

&lt;p&gt;It should probably go without saying, but if
   the particular loop is going to happen very
   often, the table should be indexed on your
   unique ordered column.  If it is a primary key
   or you already have a unique constraint it is not
   necessary to create an index explicitly because
   there will be one as part of the key or constraint.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-8790205704344604408?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/8790205704344604408/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=8790205704344604408' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8790205704344604408'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8790205704344604408'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/11/loops-without-cursors.html' title='Loops Without Cursors'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-3549269602674216393</id><published>2010-11-27T13:43:00.003-05:00</published><updated>2010-11-28T22:05:24.263-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='denormalization'/><category scheme='http://www.blogger.com/atom/ns#' term='normalization'/><title type='text'>Revisiting Normalization and Denormalization</title><content type='html'>&lt;p&gt;In this blog I have done many articles on Normalization
   and Denormalization, but I have never put all of the arguments
   together in one place, so that is what I would like to do today.
&lt;/p&gt;

&lt;p&gt;There are links to related essays on normalization and denormalization at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;The What and Why of Normalization&lt;/h2&gt;

&lt;p&gt;Normalization is the process of designing tables so that each fact is
   stored in exactly one place.  A "fact" in this case is any detail that
   we have to keep track of, such as a product's description, a product's
   price, an employee's social security number, and so forth.
&lt;/p&gt;

&lt;p&gt;The process is all about figuring out what tables you need and what
   columns each table will have.  If we are talking about an employee's
   social security number, then we can guess right from the start that
   we will have a table of EMPLOYEES, and that one of the columns will be
   SSN.  As we get more details, we add more tables and columns.
&lt;/p&gt;

&lt;p&gt;The advantage of normalization comes when your application writes
   data to the database.  In the simplest terms, when the application
   needs to store some fact, it only has to go to one place to do it.
   Writing this kind of code is very easy.  Easy to write, easy to debug,
   easy to maintain and improve.  
&lt;/p&gt;

&lt;p&gt;When the database is not normalized, you end up spending more time
   writing more complicated application code that is harder to debug.
   The chances of bad data in your production database go way up.
   When a shop first experiences bad data in production, it starts to
   become tempting to "lock down" access to the database, either by
   forcing updates to go through stored procedures or by trying to
   enforce access to certain tables through certain codepaths.  Both
   of these strategies, stored procedures and codepaths, are
   actually the same strategy implemented in different tiers; they
   both try to prevent bugs by routing access through some bit of
   code that "knows what to do."  But if the database is normalized,
   you do not need any magic code that "knows what to do."
&lt;/p&gt;
   
&lt;p&gt;So that, in brief, is what normalization is and why we do it.
   Let's move on now to denormalization.
&lt;/p&gt;

&lt;h2&gt;Denormalization is Harder to Talk About&lt;/h2&gt;

&lt;p&gt;Normalization is easy to explain because there is a clearly
   stated end-goal: correct data.  Moreover, there are well-defined
   methods for reaching the goal, which we call the normal forms,
   &lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;, &lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;,
   and higher forms.  By contrast, denormalization is much harder
   to talk about because there is no agreed-upon end goal.  To make
   matters worse, denormalization violates the original theory of
   Relational Databases, so you still have plenty of people screaming
   not to do it at all, making things even more confusing.  What we have
   now in our industry is different
   shops denormalizing in different ways for different reasons.
&lt;/p&gt;

&lt;p&gt;The arguments that I have heard in my career boil down to two
   basic groups.  The first set of arguments centers around
   calculated or derived values, and the second set centers
   around programmer convenience. 
&lt;/p&gt;

&lt;h2&gt;Arguments for Derived Values&lt;/h2&gt;

&lt;p&gt;My own experience comes down heavily in favor of denormalizing
   by storing derived values directly into the tables, with the
   extremely significant caveat that you must have a way to ensure
   that they are always correct.  In this paradigm you maintain
   strict normalization for facts supplied from the outside,
   and then layer on additional facts that are calculated during
   write operations and saved permanently.
&lt;/p&gt;

&lt;p&gt;Here is a very simple example.
   A strictly normalized database happens to be missing data
   that many programmers would automatically assume should be
   stored.  Believe it or not, a simple value in a shopping
   cart like EXTENDED_PRICE is forbidden by 3rd normal form
   because it is a &lt;i&gt;non-key dependency&lt;/i&gt;, or, in plain
   English, since it can be derived from other values (QTY * PRICE),
   then it is redundant, and we no longer have each fact stored
   in exactly one place.  The value of EXTENDED_PRICE is only
   correct if it always equals QTY * PRICE, and so there is now
   a "fact" that is spread across three locations.
   If you store EXTENDED_PRICE, but do not have a way to ensure
   that it will always 100% of the time equal QTY * PRICE,
   then you will get bad data.
&lt;/p&gt;
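
&lt;p&gt;To make the risk concrete, here is a minimal sketch of a query
   that hunts for rows where the stored value has drifted.  The
   table name CART_ITEMS and the column CART_ITEM_ID are assumptions
   made up for illustration, and a real cart would likely need
   to account for rounding as well:
&lt;/p&gt;

&lt;pre&gt;
&lt;font color="green"&gt;-- Any row returned here is bad data: the derived value
-- is either missing or no longer equals QTY * PRICE&lt;/font&gt;
select cart_item_id, qty, price, extended_price
  from cart_items
 where extended_price != qty * price
    or extended_price is null
&lt;/pre&gt;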

&lt;p&gt;So, given the risk of bad data, what is to be gained by
   putting EXTENDED_PRICE into the cart?   The answer is that
   it adds value to the database and actually simplifies
   application code.  To see why, imagine a simple eCommerce
   shopping cart that does not store any derived values.  
   Every single display of the cart to the user must go all
   over the place to gather lots of details and recalculate
   everything.  This means re-calculating not just the
   EXTENDED_PRICE, but adding in item level discounts, taking
   account of possible tax exemptions for different items,
   rolling
   the totals to the cart, adding in tax, shipping, perhaps
   a customer discount, a coupon, and who knows what else.
   All of this just to display the cart, every time, no matter
   what the purpose.
&lt;/p&gt;

&lt;p&gt;This situation leads to three problems: a pitifully slow
   application (too many disk reads and lots of cycles calculating
   the values), maddening bugs when an application update
   has subtle changes to the calculations so the customer's
   order no longer displays the same numbers as it did yesterday,
   and the frustrating requirement that the simplest of reports
   must route through application code to calculate these values
   instead of simply reading them off the disk, which leads to
   reporting systems that are orders of magnitude slower than they
   could be and horribly more complicated than they need to be
   because they can't just read straight from the tables.
&lt;/p&gt;
   
&lt;p&gt;Now let's look at how that same shopping cart would be used
   if all of those calculated values were generated and saved
   when the order is written.  Building on your foundation of
   normalized values (price, qty), you need only one body of code
   that has to perform calculations.  This magic body of code 
   takes the user-supplied values, adds in the calculations, 
   and commits the changes.  &lt;i&gt;All other subsequent operations
   need only to read and display the data, making them faster,
   simpler, and more robust.&lt;/i&gt; 
&lt;/p&gt;

&lt;p&gt;So the obvious question is how to make sure the derived
   values are correct.  If they are correct, we gain the
   benefits with no down side.  If there is the smallest chance
   of bad data, we will quickly pay back any benefit we gained
   by chasing down the mistakes.  
&lt;/p&gt;

&lt;p&gt;From a technical standpoint, what we really need is some 
   technology that will make sure the calculations cannot
   be &lt;i&gt;subverted&lt;/i&gt;; it must not be possible for a stray
   bit of program code or SQL statement to
   put the wrong value in for EXTENDED_PRICE.  There are a
   few generally accepted ways to do this:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Require all writes to go through a certain codepath.
    The only PRO here is that you keep the logic in the 
    application code, and since most shops have more programmers
    than database people, this makes sense.  The only CON is that
    it never works.  One programmer working alone can maintain
    discipline, but a team cannot.  All it takes is one programmer
    who did not know about the required codepath to screw it all
    up.  Also, it makes your system inflexible, as it is no longer
    safe to write to the database except through a single application.
    &lt;li&gt;Require all writes to go through stored procedures.  
    This is nominally better than the codepath solution because it
    is not subvertible, and you can allow different side apps and
    utilities to safely write to the database.  But it makes a lot
    of work and tends to be very inflexible.
    &lt;li&gt;Putting triggers onto tables that perform the calculations
    and throw errors if a SQL statement attempts to explicitly
    write to a derived column.  This makes the values completely
    non-subvertible, ensures they will always be correct, and allows
    access from any application or utility.  The downside is that
    the triggers cannot be coded by hand except at extreme cost, and
    so must be generated from a data dictionary, which is fairly easy
    to do but tends to involve extreme psychological barriers.  In 
    these days of ORM many programmers mistakenly believe their
    class files define reality, but this is not true.  Reality is
    defined by the users who one way or another create the paychecks, and 
    by the database, which is the permanent record of facts.  But
    a programmer who thinks his classes define reality simply cannot
    see this and will reject the trigger solution for any number of
    invalid reasons.
&lt;/ul&gt;
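
&lt;p&gt;To give a feel for the trigger approach, here is a minimal
   hand-coded sketch in MS SQL Server syntax.  The table CART_ITEMS,
   its columns, and the trigger name are all made up for illustration.
   Note that this version simply overwrites whatever the statement
   supplied instead of throwing an error, and it assumes the database
   option RECURSIVE_TRIGGERS is OFF (the default), so the UPDATE
   inside the trigger does not fire the trigger again.
&lt;/p&gt;

&lt;pre&gt;
&lt;font color="green"&gt;-- Hypothetical table, not from any real system&lt;/font&gt;
create table cart_items (
    cart_item_id   int identity primary key
  , qty            int           not null
  , price          decimal(10,2) not null
  , extended_price decimal(12,2) null
)
go

create trigger cart_items_calc on cart_items
after insert, update
as
begin
    set nocount on

    &lt;font color="green"&gt;-- Recalculate for every row the statement touched,
    -- so the stored value always equals qty * price&lt;/font&gt;
    update c
       set extended_price = i.qty * i.price
      from cart_items c
      join inserted i
        on i.cart_item_id = c.cart_item_id
end
go
&lt;/pre&gt;

&lt;p&gt;With that in place, an INSERT that supplies only QTY and PRICE
   and an UPDATE that tries to sneak in a wrong EXTENDED_PRICE both
   end up with the calculated value on disk.
&lt;/p&gt;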

&lt;p&gt;So denormalizing by putting in derived values can make a database
   much more valuable, but it does require a clear systematic
   approach to generating the derived values.  There is no technical
   problem associated with ensuring the values are correct because
   of course the application has to do that somehow somewhere anyway;
   the real barriers tend to be psychological and political.
&lt;/p&gt;

&lt;h2&gt;Arguments For Programmer Convenience&lt;/h2&gt;

&lt;p&gt;The second set of arguments for denormalization tends to be
   rather weak, and come down to something like this (you have to
   picture the programmer whining like a child when he
   says this), "I don't like
   my data scattered around so many tables, can't we play some
   other game instead?"
&lt;/p&gt;

&lt;p&gt;Many programmers, when they first learn about normalization
   and build a normalized database,
   discover that the data they need to build a screen is "scattered"
   about in many tables, and that it is tedious and troublesome to
   get it all together for presentation to the user.  A simple 
   example might be a contacts list.  The main table is CONTACTS,
   and it contains not much more than first and last name.  A second
   table is a list of PHONES for each contact, and a third
   table is a list of various mailing addresses.  A fourth table
   of EMAILS stores their email addresses.  This makes four tables
   just to store a simple contact!  We programmers look at this and
   something inside of us says, "That's just way too complicated,
   can't I do something else instead?"
&lt;/p&gt;

&lt;p&gt;This is a case of programmer convenience clashing with correctness
   of data.  Nobody argues (at least not that I've heard) that they
   do not want the data to be correct, they just wonder if it is possible
   to simplify the tables so that they do not have to go out to so
   many places to get what they need.
&lt;/p&gt;

&lt;p&gt;In this case, programmers argue that denormalization will make
   for simpler code if they &lt;i&gt;deliberately skip one or more steps
   in the normalizing process.&lt;/i&gt;  (Technically I like to call the
   result a "non-normalized" database instead of denormalized, but
   most people call it denormalized, so we will go with that.)
&lt;/p&gt;

&lt;p&gt;The argument goes something like this:  I know for a fact that
   nobody in the contacts list will have more than 3 emails, so 
   I'm going to skip the EMAILS table and just put columns EMAIL1,
   EMAIL2, and EMAIL3 into the main CONTACTS table.  In this case,
   the programmer has decided to skip 1st Normal Form and put a
   &lt;i&gt;repeating group&lt;/i&gt; into the CONTACTS table.  This he argues
   makes for simpler database retrieval and easier coding.
&lt;/p&gt;

&lt;p&gt;The result is painfully predictable.  The simplification the 
   programmer sought at one stage becomes a raft of complications
   later on.  Here is an example that will appear trivial but really
   gets to the heart of the matter.  How do you count how many
   emails a user has?  A simple SELECT COUNT(*)...GROUP BY CONTACT
   that would have worked before now
   requires more complicated SQL.  But isn't this trivial?  Is it
   really that bad?  Well, if all you are coding is a CONTACTS
   list probably not, but if you are doing a real application with
   hundreds of tables and this "convenience" has been put out there
   in dozens of cases,
   then it becomes a detail that programmers need to know on a 
   table-by-table basis, it is an exception to how things ought
   to be that has to be accounted for by anybody who touches the
   table.  In any shop with more than 5 programmers, whatever 
   convenience the original programmer gained is lost quickly
   in the need to document and communicate these exceptions.
   And this is only a single trivial example.
&lt;/p&gt;
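
&lt;p&gt;To make the difference concrete, here is a rough sketch of both
   queries.  The column names CONTACT_ID, EMAIL1, EMAIL2 and EMAIL3
   are assumptions for illustration:
&lt;/p&gt;

&lt;pre&gt;
&lt;font color="green"&gt;-- Normalized: one row per email in the EMAILS table&lt;/font&gt;
select contact_id, count(*) as email_count
  from emails
 group by contact_id

&lt;font color="green"&gt;-- Repeating group: every slot has to be tested by hand.
-- The comparison is not true for a NULL or for an empty
-- string, so either kind of unused slot counts as zero&lt;/font&gt;
select contact_id
     , case when email1 != '' then 1 else 0 end
     + case when email2 != '' then 1 else 0 end
     + case when email3 != '' then 1 else 0 end as email_count
  from contacts
&lt;/pre&gt;

&lt;p&gt;Add an EMAIL4 column and the second query has to change, while
   the first one never does.
&lt;/p&gt;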

&lt;p&gt;Other examples come when it turns out you need more than
   three slots for phone.  In the normalized case this never comes
   up.  Any user can have any number of phones, and the code to
   display the phones is running through a loop, so it does not 
   need to be modified for the case of 1 phone, 2 phones, etc.
   But in the "convenient" denormalized case you now must
   modify the table structure &lt;i&gt;and the code that displays the contacts,&lt;/i&gt;
   making it quite inconvenient.
&lt;/p&gt;

&lt;p&gt;Then you have the case of how to define unused slots.  If the
   user has only one email, do we make EMAIL2 and EMAIL3 empty 
   or NULL?  This may also seem like a silly point until you've sat
   through a flamewar at the whiteboard and discovered just how
   passionate some people are about NULL values.  Avoiding that argument
   can save your shop a lot of wasted time.
&lt;/p&gt;

&lt;p&gt;In short, programmer convenience should never lead to a shortcut
   in &lt;i&gt;skipping normalization steps&lt;/i&gt; because it introduces far
   more complications than it can ever pay for.
&lt;/p&gt;

&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;


&lt;p&gt;The normalization essays on this blog are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html"
    &gt;Revisiting Normalization and Denormalization (this essay)&lt;/a&gt;&lt;/i&gt;.
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"&gt;Pay Me Now Or Pay Me Later&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"&gt;The Argument for Normalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"&gt;Third Normal Form and Calculated Values&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"&gt;The Argument for Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"&gt;Denormalization Patterns&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;Keeping Denormalized Values Correct&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"&gt;Triggers, Encapsulation and Composition&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"&gt;The Data Dictionary and Calculations, Part 1&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"&gt;The Data Dictionary and Calculations, Part 2&lt;/a&gt;
    
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-3549269602674216393?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/3549269602674216393/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=3549269602674216393' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/3549269602674216393'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/3549269602674216393'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html' title='Revisiting Normalization and Denormalization'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-1844378116404145532</id><published>2010-11-19T23:06:00.004-05:00</published><updated>2010-11-28T22:08:51.814-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='orm'/><category scheme='http://www.blogger.com/atom/ns#' term='Ken&apos;s Law'/><category scheme='http://www.blogger.com/atom/ns#' term='abstraction'/><title type='text'>Prepare Now For Possible Future Head Transplant</title><content type='html'>&lt;p&gt;This is the Database Programmer blog, for anybody who wants
   practical advice on database use.&lt;/p&gt;

&lt;p&gt;There are links to other essays at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Planning For The Unlikely&lt;/h2&gt;

&lt;p&gt;We programmers love to plan for things that will hardly
   ever happen, like coding the system's upgrade engine to
   handle spontaneous human combustion, making sure the
   SQL scrubbing layer can also launch a rocket into
   space, and, well, trying to work out ahead of time
   what to do if 
   we ever need a head transplant.  
&lt;/p&gt;

&lt;p&gt;The boss comes over and says, "can you toss a simple
   plot onto the sales staff's home page that shows sales
   by day?  Use that 'jquicky' or whatever you call it.
   Should take a couple of hours, right?"  And three days
   later we're working on the world's greatest plotting
   system that can report &lt;i&gt;everything except what the
   boss actually asked for because we haven't gotten
   around to that part of it yet.&lt;/i&gt;  (Really, can he expect
   me to just bang this out without the required 
   Seven Holy Layers of Abstraction and Five Ritual
   Forms of Parameterization, and the Just and Wholesome
   Mobile Support, or the features Not Yet Required
   but visible to the Far Seeing Wise and Learned Men?)
&lt;/p&gt;

&lt;h2&gt;Abstraction Contraptions&lt;/h2&gt;

&lt;p&gt;So what I am getting at is that programmers of all
   stripes are addicted to abstraction, it gives us
   goosebumps and makes us feel warm and tingly, and
   so we do it even when we do not need to.  We
   build abstraction contraptions.
&lt;/p&gt;

&lt;p&gt;When it comes to designing a database, this 
   unhealthy proclivity can seriously slow you
   down, because of what I call:
&lt;/p&gt;

&lt;h2&gt;Ken's Law&lt;/h2&gt;

&lt;p&gt;Everybody wants to be remembered for something.  If
   I could write my own epitaph, it might be:
&lt;/p&gt;

&lt;p style="border: 2px solid black; font-weight:bolder; 
          text-align: center; padding: 5px 5px 5px 5px;
          background-color:lightgreen"&gt;
Table-based datastores are optimally abstract
&lt;/p&gt;

&lt;p&gt;This law is not about database access, it is about
   database design.  It can be expressed informally as:
&lt;/p&gt;

&lt;p style="border: 2px solid black; font-weight:bolder; 
          text-align: center; padding: 5px 5px 5px 5px;
          background-color:lightgreen"&gt;
People Understand Tables Just Fine
&lt;/p&gt;

&lt;p&gt;Or more rigorously as:&lt;/p&gt;

&lt;p style="border: 2px solid black; font-weight:bolder; 
          text-align: center; padding: 5px 5px 5px 5px;
          background-color:lightgreen"&gt;
Table-based datastores are optimally
abstract; they require no additional abstraction
when requirements are converted to design; they
cannot be reduced to a less abstract form.
&lt;/p&gt;

&lt;h2&gt;Structured Atomic Values&lt;/h2&gt;

&lt;p&gt;I should point out that this essay deals with
   structured atomic values, who live in the
   Kingdom of The Relational Database.  The 
   concepts discussed here do not apply to
   free-text documents or images, or sound
   files, or any other media.
&lt;/p&gt;

&lt;h2&gt;No Additional Abstraction Required&lt;/h2&gt;

&lt;p&gt;My basic claim here is that you cannot create
   an abstraction of data schemas that will pay
   for itself.  At best you will create a 
   description of a database where everything
   has been given a different name, where tables have been 
   designated
   'jingdabs' and columns have been designated
   'floopies' and in the end all of your jingdab
   floopies will become columns in tables.  Oh,
   and I suppose I should mention the Kwengars will
   be foreign keys and the Slerzies will be primary
   keys.
&lt;/p&gt;

&lt;p&gt;After that it goes downhill, because if we  
   generate an abstraction that is not a simple 
   one-to-one mapping, we actually obscure the
   end goal.  Consider an example so simple as to
   border on the trivial or absurd.
    Why would we ever use the terms
   "One-to-Many" or "Many-to-Many" when the more
   precise terms "child table" and "cross-reference
   table" convey the same idea without the noise?
    I said above that this would sound
   trivial, and you can accuse me of nit-picking,
   but this is one of those camel's nose things,
   or perhaps a slippery slope.  When technical folk
   get together to design a system, we should call
   things what they are, and not make up new words
   to make ourselves sound smarter.
&lt;/p&gt;
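
&lt;p&gt;For anybody who has not run into the two terms, a minimal
   sketch of each might look like the following.  All of the table
   and column names here are made up for illustration:
&lt;/p&gt;

&lt;pre&gt;
&lt;font color="green"&gt;-- Two ordinary tables to build on&lt;/font&gt;
create table customers (
    customer_id int          primary key
  , name        varchar(100) not null
)

create table mailing_lists (
    list_id     int          primary key
  , description varchar(100) not null
)

&lt;font color="green"&gt;-- Child table: each order belongs to exactly one customer&lt;/font&gt;
create table orders (
    order_id    int primary key
  , customer_id int not null references customers (customer_id)
)

&lt;font color="green"&gt;-- Cross-reference table: any customer can be on any number
-- of lists, and any list can hold any number of customers&lt;/font&gt;
create table customer_mailing_lists (
    customer_id int not null references customers (customer_id)
  , list_id     int not null references mailing_lists (list_id)
  , primary key (customer_id, list_id)
)
&lt;/pre&gt;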


&lt;h2&gt;No de-Abstraction is Possible&lt;/h2&gt;

&lt;p&gt;The second half of Ken's law says that you
   cannot de-Abstract a table schema into some
   simpler form.  This should be very easy to
   understand, because relational databases 
   deal with atomic values, that is, values 
   which cannot themselves be decomposed.  If
   you cannot decompose something, then it cannot
   be an abstraction of something more specific.
&lt;/p&gt;

&lt;p&gt;Going further, if the schema has been
   normalized, then every fact is stored in
   exactly one place, so no further simplification
   is possible.  If you cannot simplify it or
   resolve it into something more specific, then
   it is not an abstraction of something else.
&lt;/p&gt;


&lt;h2&gt;But Does it Work?&lt;/h2&gt;

&lt;p&gt;I originally began to suspect the existence
   of what I call in all humility "Ken's Law"
   when I was sitting at a large conference table
   with my boss, her boss, a couple of peers, and
   3 or 4 reps from a Fortune 5 company.  My job
   was basically to be C3PO, human-cyborg relations.
   Some people at the table protested loudly that they
   were non-technical, while others had technical
   roles.  But &lt;i&gt;everybody at the table spent all
   day discussing table design.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;Later on, at a different position, we
   programmers received our instructions from
   Project Managers.  The best Project Managers worked
   with their customers to figure out what they were
   trying to keep track of, and handed us specs that
   were basically table layouts.  The customers loved
   these guys because they felt they could "understand
   what the project manager was talking about", and
   the project managers, who swore they were not technical,
   were respected because they handed us requirements
   we could actually understand and implement.
&lt;/p&gt;

&lt;p&gt;Since that time I have further learned that it 
   is very easy for anybody who deals with non-technical
   people to bring them directly to table design without
   telling them you are doing it.  All you have to do
   is listen to what words they use and adapt your
   conversation accordingly.  If they say things like
   "I need a screen that shows me orders by customer
   types" they have told you there will be a table of
   customer types.  Talk to them in terms of screens.
   If they say, "Our catalog has 3 different
   price lists and four discount schemes" then you know that
   there will be a PRICELIST table, a DISCOUNTS table, and
   likely some cross-references and parent-child relationships
   going on here.
&lt;/p&gt;
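
&lt;p&gt;As a rough sketch of where that catalog remark might lead,
   the tables could begin life something like this.  Every table
   and column name here is hypothetical, invented only for
   illustration, and the real layout would depend on what the
   customer says next.
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical names; a starting sketch, not a finished design
CREATE TABLE pricelist (
    pricelist    varchar(10) PRIMARY KEY,
    description  varchar(50)
);

CREATE TABLE discounts (
    discount     varchar(10) PRIMARY KEY,
    description  varchar(50),
    percent_off  numeric(5,2)
);

-- Cross-reference: which discount schemes apply to which price lists
CREATE TABLE pricelist_discounts (
    pricelist  varchar(10) NOT NULL REFERENCES pricelist (pricelist),
    discount   varchar(10) NOT NULL REFERENCES discounts (discount),
    PRIMARY KEY (pricelist, discount)
);
&lt;/pre&gt;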

&lt;h2&gt;So How Does ORM Come Into This?&lt;/h2&gt;

&lt;p&gt;One of the greatest abstraction contraptions of
   this century (so far), is ORM, or Object-Relational
   Mapping, which I &lt;a href="http://database-programmer.blogspot.com/2008/06/why-i-do-not-use-orm.html"&gt;do not use&lt;/a&gt; 
   precisely because it is an abstraction contraption.
&lt;/p&gt;

&lt;p&gt;To be clear, the mistake that ORM makes is not
   at the design phase, but at the access phase.
   The ORM meme complex instructs its victims 
   that it is ok to put 
   structured atomic values into a Relational
   Database, but when it comes time to access and
   use that data &lt;i&gt;we will pretend we did not put it
   into tables and we will pretend that the data is in
   objects.&lt;/i&gt;  In this sense the term Object-
   Relational Mapping is a complete misnomer, because
   the point is not to map data to objects but to
   create the illusion that the tables do not even
   exist.  
&lt;/p&gt;

&lt;p&gt;Perhaps ORM should stand for Obscuring Reality Machine.
&lt;/p&gt;

&lt;h2&gt;Getting Back to That Head Transplant&lt;/h2&gt;

&lt;p&gt;So what about that weird title involving head
   transplants?  Obviously a head transplant is
   impossible, making it also very unlikely, besides
   being silly and nonsensical.  It came to mind
   as a kind of aggregate of all of the bizarre
   and unrealistic ideas about abstracting data
   designs that I have heard over the years.
&lt;/p&gt;

&lt;p&gt;One of these ideas is that it is possible and
   beneficial to create a design that is abstract
   so that it can be implemented in any model:
   relational, hierarchical, or network.  I'm not
   saying such a thing is impossible, it is likely
   just a &lt;a href="http://www.catb.org/jargon/html/S/SMOP.html"&gt;small matter of programming&lt;/a&gt;,
   but for heaven's sake, what's the point?
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;So don't waste time creating abstractions that
   add steps, possibly obscure the goal, and 
   add no value.  Don't plan for things that are
   not likely to happen, and avoid abstraction
   contraptions.
&lt;/p&gt;

&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Other philosophy essays are:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/prepare-now-for-possible-future-head.html"
        &gt;Prepare Now For Possible Future Head Transplant (This Essay)&lt;/a&gt;&lt;/i&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/09/quest-for-absolute.html"
        &gt;The Quest for The Absolute&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/03/i-am-but-humble-filing-clerk.html"
        &gt;I Am But A Humble Filing Clerk&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/06/why-i-do-not-use-orm.html"
        &gt;Why I Do Not Use ORM&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"
        &gt;Minimize Code, Maximize Data&lt;/a&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-1844378116404145532?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/1844378116404145532/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=1844378116404145532' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1844378116404145532'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1844378116404145532'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/11/prepare-now-for-possible-future-head.html' title='Prepare Now For Possible Future Head Transplant'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-4686458638958533585</id><published>2010-11-13T12:14:00.002-05:00</published><updated>2010-11-29T20:45:18.533-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Lots of Links'/><title type='text'>Database Skills</title><content type='html'>&lt;p&gt;It seems strange to me that I've been working on this blog
   for 3 years or so (with one very long break) and somehow never
   got around to writing a simple list of skills that all 
   database experts need.  So here it is!
&lt;/p&gt;

&lt;h2&gt;Various Job Titles for Database People&lt;/h2&gt;

&lt;p&gt;There are three common job titles in the database area,
   which are Database Administrator (DBA), Database Programmer,
   and Database Architect.   These titles tend to be somewhat
   variable from shop-to-shop, but generally the "Architect"
   term indicates the highest level of skill combined with 
   considerable management responsibilities.  The "Programmer"
   term is somewhere below that, but the "DBA" is extremely
   variable.  I have seen shops where a person was called a 
   DBA and filled a relatively constrained role closer to
   IT or operations (routine tasks, no real programming) and
   other shops where a person with the DBA title was basically
   the Architect.
&lt;/p&gt;

&lt;p&gt;Because of this high variability in what titles mean, I am not
   going to waste time categorizing skills as belonging to one
   job title or another, I am simply going to list them all out.
&lt;/p&gt;

&lt;p&gt;The various levels of skills are these:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Before Hello World!: The basics of tables, columns, rows
    &lt;li&gt;The Hello World! Level: SQL Select
    &lt;li&gt;Just after Hello World!: Writing Data
    &lt;li&gt;Commands to create, modify and drop tables, or Data
        Definition Language (DDL)
    &lt;li&gt;Knowing how to use a Query Analyzer or optimization tool
    &lt;li&gt;Understanding Normalization
    &lt;li&gt;Understanding Denormalization
    &lt;li&gt;Understanding Primary Keys, Foreign Keys and Constraints
    &lt;li&gt;Understanding Transactions
    &lt;li&gt;Understanding ACID
    &lt;li&gt;Understanding Indexes as optimization tool
    &lt;li&gt;Views
    &lt;li&gt;Database Security
    &lt;li&gt;Upgrades and Installs
    &lt;li&gt;Efficient access of database from application
    &lt;li&gt;Bulk operations: loading or exporting large amounts
        of data
    &lt;li&gt;Understanding of Contexts and how they lead to
        different sets of Best Practices
    &lt;li&gt;Preventing performance degradation through 
        various maintenance tasks
    &lt;li&gt;Deployment strategies: partitioning, tablespaces
    &lt;li&gt;Deployment strategies, failure protection, from 
        simple backup to hot standbys
    &lt;li&gt;Server side coding: stored procedures and functions
    &lt;li&gt;Server side coding: triggers
    &lt;li&gt;Temporary tables
&lt;/ul&gt;

&lt;p&gt;As long as that list is, it only covers those of us who
   &lt;i&gt;use&lt;/i&gt; database systems.  There is an entire set of
   skills for those who actually &lt;i&gt;create and maintain&lt;/i&gt; these
   systems, but that is not something that will be treated
   in this blog.
&lt;/p&gt;

&lt;h2&gt;Before Hello World!: Tables and Columns&lt;/h2&gt;

&lt;p&gt;If you have never so much as typed a single SQL command,
   or seen a table diagram, or anything like that, then it is
   worth a few minutes to go through the basics of what
   a database does, which is to organize atomic values into
   tables.
&lt;/p&gt;

&lt;p&gt;I am going to write an essay on this soon, even though it
   may seem so obvious as to be completely unnecessary.  But I
   will do it because the most popular essay on this 
   blog is about using GROUP BY, which tells me newer programmers
   are starving for useful tutorials at the beginner level.
   So it seems to me, why not put something out there at the
   very beginning of the beginning?
&lt;/p&gt;
   

&lt;h2&gt;The Hello World! Level: SQL Select&lt;/h2&gt;

&lt;p&gt;If you are starting completely from scratch and want to know
   about database programming, you want to start with the SQL
   SELECT command.  This is the (almost) only command used to
   extract data from a database, and all of the possible ways to
   combine, filter and extract data are expressed in the many
   clauses of this command.
&lt;/p&gt;
  
&lt;ul&gt;&lt;li&gt;Simplest Possible &lt;a href="http://database-programmer.blogspot.com/2008/03/introduction-to-queries.html"&gt;SQL SELECT&lt;/a&gt; commands
    &lt;li&gt;Simple embellishments: renaming columns, calculations
    &lt;li&gt;Aggregations: &lt;a href="http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html"&gt;GROUP BY...HAVING&lt;/a&gt;
    &lt;li&gt;Multiple table queries: &lt;a href="http://database-programmer.blogspot.com/2008/03/how-sql-union-affects-table-design.html"&gt;UNION&lt;/a&gt; and &lt;a href="http://database-programmer.blogspot.com/2008/03/join-is-cornerstone-of-powerful-queries.html"&gt;JOIN (part 1)&lt;/a&gt; and
        &lt;a href="http://database-programmer.blogspot.com/2008/04/joins-part-two-many-forms-of-join.html"&gt;JOIN (part 2)&lt;/a&gt;
    &lt;li&gt;Subqueries as column values
    &lt;li&gt;Subqueries as tables
    &lt;li&gt;Partitioning Functions: ntile(), row_number(), etc.
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/recursive-queries-with-common-table.html"&gt;Recursive queries using Common Table Expressions&lt;/a&gt;
    &lt;li&gt;Extracting results as XML
&lt;/ul&gt;
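
&lt;p&gt;The first few items on that list might look like the
   sketch below.  The ORDERS and ORDER_LINES tables and their
   columns are assumptions made up for the example, they do not
   come from any particular system.
&lt;/p&gt;

&lt;pre&gt;
-- Simplest possible SELECT: every column, every row
SELECT * FROM orders;

-- Simple embellishments: rename a column, calculate a value
SELECT order_id
     , qty * price AS extended_price
  FROM order_lines;

-- Aggregation: total per customer, keeping only some totals
SELECT customer
     , SUM(amount) AS total_ordered
  FROM orders
 GROUP BY customer
HAVING SUM(amount) BETWEEN 1000 AND 5000;

-- Subquery as a table
SELECT big.customer, big.total_ordered
  FROM (SELECT customer, SUM(amount) AS total_ordered
          FROM orders
         GROUP BY customer) big
 ORDER BY big.total_ordered DESC;
&lt;/pre&gt;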


&lt;h2&gt;Just after Hello World!: Writing Data&lt;/h2&gt;

&lt;p&gt;When it comes time to change the data in a database
   there are three commands, listed below.  These commands
   are based on the tables-and-rows nature of databases,
   and allow you to add a row (or rows), change a row (or rows)
   and delete a row (or rows).

&lt;ul&gt;&lt;li&gt;The INSERT command
    &lt;li&gt;The UPDATE command
    &lt;li&gt;The DELETE command
&lt;/ul&gt;
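
&lt;p&gt;A minimal sketch of the three commands, assuming a
   hypothetical CUSTOMERS table with columns CUSTOMER and NAME:
&lt;/p&gt;

&lt;pre&gt;
-- Add a row
INSERT INTO customers (customer, name)
VALUES ('ACME', 'Acme Widgets');

-- Change a row (or rows): every row matching the WHERE clause
UPDATE customers
   SET name = 'Acme Widgets, Inc.'
 WHERE customer = 'ACME';

-- Delete a row (or rows): again, every row matching the WHERE clause
DELETE FROM customers
 WHERE customer = 'ACME';
&lt;/pre&gt;

&lt;p&gt;Leaving the WHERE clause off an UPDATE or DELETE applies the
   command to every row in the table, which is rarely what
   you want.
&lt;/p&gt;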

&lt;h2&gt;Commands to create, modify and drop tables, or Data
Definition Language (DDL)&lt;/h2&gt;

&lt;p&gt;The term "DDL" stands for "Data Definition Language"
   and includes all of the commands used to build the 
   tables that will hold the data for the INSERT, UPDATE,
   DELETE and SELECT statements.  The basic list of
   commands to be familiar with is:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Understanding Data Types (databases are strongly typed)
    &lt;li&gt;CREATE TABLE and ALTER TABLE
    &lt;li&gt;Commands to add and drop primary keys
    &lt;li&gt;Commands to add and drop foreign keys
    &lt;li&gt;Commands to add and drop constraints
    &lt;li&gt;Commands to add and drop indexes
&lt;/ul&gt;
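
&lt;p&gt;The commands on that list might be sketched as follows.
   The syntax is close to what PostgreSQL accepts and other
   products differ in the details, and the CUSTOMERS and ORDERS
   tables are hypothetical.
&lt;/p&gt;

&lt;pre&gt;
CREATE TABLE customers (
    customer  varchar(10) NOT NULL,
    name      varchar(50)
);

CREATE TABLE orders (
    order_id  integer NOT NULL,
    customer  varchar(10)
);

-- Add primary keys
ALTER TABLE customers ADD PRIMARY KEY (customer);
ALTER TABLE orders    ADD PRIMARY KEY (order_id);

-- Add a foreign key from ORDERS to CUSTOMERS
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
      FOREIGN KEY (customer) REFERENCES customers (customer);

-- Add a column, then a constraint on it
ALTER TABLE customers ADD COLUMN credit_limit numeric(10,2);
ALTER TABLE customers
  ADD CONSTRAINT customers_credit_limit_chk
      CHECK (credit_limit BETWEEN 0 AND 1000000);

-- Add and drop an index
CREATE INDEX customers_name_idx ON customers (name);
DROP INDEX customers_name_idx;
&lt;/pre&gt;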

&lt;p&gt;There are also numerous commands that are specific
   to different products.  Those will not be listed here
   today, but who knows what the future may bring.
&lt;/p&gt;


&lt;h2&gt;Knowing how to use a Query Analyzer or optimization tool&lt;/h2&gt;

&lt;p&gt;Database programmers, once they get started with the skills 
   listed above, tend to become more and more obsessed with 
   performance.  Every major database has some type of tool
   that lets you examine how the server is going to process
   a SQL SELECT, and database programmers depend on these tools
   to discover where they might alter tables or indexes or the
   SELECT itself to make the queries go faster.
&lt;/p&gt;
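
&lt;p&gt;In PostgreSQL and MySQL that tool is the EXPLAIN command,
   which is placed in front of the query you want to examine.
   The ORDERS table below is a hypothetical example, and the
   output format varies from product to product.
&lt;/p&gt;

&lt;pre&gt;
-- Shows the plan the server would use, without running the query
EXPLAIN
SELECT customer, SUM(amount)
  FROM orders
 WHERE customer = 'ACME'
 GROUP BY customer;
&lt;/pre&gt;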
   

&lt;h2&gt;Understanding Normalization&lt;/h2&gt;

&lt;p&gt;The term "normalization" refers to the process of analyzing
   the data that your system is required to store, and organizing
   it so that every fact is stored in exactly one place.  Understanding
   how to normalize data is an absolute requirement for the 
   database programmer who wants to design databases.
&lt;/p&gt;

&lt;p&gt;We speak of normalization in "forms" as in "first normal form",
   "second normal form", and so on.  It is a good idea to understand
   &lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"&gt;The argument for normalization&lt;/a&gt; and then to pursue
   at very least:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"&gt;Third Normal Form&lt;/a&gt;
    &lt;li&gt;Fourth Normal Form and higher forms
&lt;/ul&gt;

&lt;p&gt;Normalization is a fascinating topic to study, and it
   extends all the way up to "Domain-key Normal Form" which is
   considered the most complete normalization for a database.
&lt;/p&gt;


&lt;h2&gt;Understanding Denormalization&lt;/h2&gt;

&lt;p&gt;Every database programmer, after fully understanding
   normalization, realizes that there are severe practical
   problems with a fully normalized database.  Such a database
   solves many problems but generates problems of its own.
   This has led programmer after programmer down the path
   of &lt;i&gt;denormalization&lt;/i&gt;, the deliberate re-introduction
   of redundant values to improve the usability of the
   database.
&lt;/p&gt;

&lt;p&gt;There is a surprising lack of material available on the
   web regarding denormalization strategies.  Most of what
   you find is arguments and flame wars about whether or not
   to do it, with little to nothing on how to actually do it.
   For this reason, I provide my own essays on this blog on
   the strategies and methods I have worked out over the years:
&lt;/p&gt;

&lt;p&gt;After reviewing &lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"&gt;The Argument For Denormalization&lt;/a&gt;
   it is worthwhile to follow up with:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Understanding the "Automation Constraint" and how
        to &lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;Keep denormalized values correct&lt;/a&gt;
    &lt;li&gt;Understanding the &lt;a href="http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"&gt;three denormalization patterns of FETCH, EXTEND, and AGGREGATE&lt;/a&gt;
    &lt;li&gt;Other Patterns
&lt;/ul&gt;
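
&lt;p&gt;One reading of those three pattern names, offered only as a
   sketch: FETCH copies a value from a parent row into a child row,
   EXTEND calculates a value from other columns in the same row,
   and AGGREGATE totals child rows up into the parent.  The tables
   and columns below are hypothetical, and the one-shot UPDATE
   statements only show what each pattern computes, not how to
   keep the values correct over time.
&lt;/p&gt;

&lt;pre&gt;
-- FETCH: copy the price from the parent ITEMS row into each order line
UPDATE order_lines
   SET price = (SELECT items.price
                  FROM items
                 WHERE items.sku = order_lines.sku);

-- EXTEND: calculate a value from columns in the same row
UPDATE order_lines
   SET extended_price = qty * price;

-- AGGREGATE: total the child rows up to the parent ORDERS row
UPDATE orders
   SET lines_total = (SELECT COALESCE(SUM(order_lines.extended_price), 0)
                        FROM order_lines
                       WHERE order_lines.order_id = orders.order_id);
&lt;/pre&gt;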

&lt;p&gt;The arguments for and against denormalization are heavily
   affected by the &lt;a href="http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"&gt;Pay me now or pay me later&lt;/a&gt;
   design tradeoff.
&lt;/p&gt;


&lt;h2&gt;Understanding Primary Keys, Foreign Keys and Constraints&lt;/h2&gt;

&lt;p&gt;One might argue that this list of skills belongs much higher
   up the list, up there with the CREATE TABLE command.  However,
   I have found it useful to distinguish between simply &lt;i&gt;knowing the
   commands&lt;/i&gt; to make a primary key and actually understanding
   &lt;i&gt;the tremendous power&lt;/i&gt; of keys.
&lt;/p&gt;

&lt;p&gt;In this author's opinion it is not truly possible to understand
   how powerful and beneficial Primary keys and Foreign Keys are
   for an entire application stack until you have learned the commands,
   built some databases, and worked through the concepts of normalization
   and denormalization.  Only then can you revisit these humble
   tools and realize how powerful they are.
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-primary-keys-this-is.html"&gt;Primary Keys and Table Design&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-sane-approach-to.html"&gt;Choosing data types for primary keys&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-foreign-keys-this-is.html"&gt;Foreign keys and table design&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/different-foreign-keys-for-different.html"&gt;Foreign keys and cascading actions&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/03/of-tables-and-constraints.html"&gt;The How and Why of Constraints&lt;/a&gt;
&lt;/ul&gt;
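
&lt;p&gt;A small sketch of keys doing real work, using hypothetical
   ORDERS and ORDER_LINES tables.  The foreign key refuses any order
   line that points to a missing order, and the cascading action
   removes the lines when their order is deleted, all without a
   line of application code.
&lt;/p&gt;

&lt;pre&gt;
CREATE TABLE orders (
    order_id  integer PRIMARY KEY,
    customer  varchar(10) NOT NULL
);

CREATE TABLE order_lines (
    order_id  integer NOT NULL
              REFERENCES orders (order_id) ON DELETE CASCADE,
    line_no   integer NOT NULL,
    sku       varchar(20) NOT NULL,
    qty       integer NOT NULL,
    PRIMARY KEY (order_id, line_no)
);
&lt;/pre&gt;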

&lt;h2&gt;Understanding Transactions&lt;/h2&gt;

&lt;p&gt;The word "transaction" has two meanings in common day-to-day
   database talk.  One meaning is very loose and refers to some
   individual command or set of commands.  You might hear somebody
   using the term loosely when they say, "We're seeing about 
   10 transactions per second this week."
&lt;/p&gt;

&lt;p&gt;The more rigorous use of the term refers to a statement or
   set of statements that &lt;i&gt;must be guaranteed to either 
   complete in their entirety or fail in their entirety.&lt;/i&gt;
   This is a profoundly important concept once you get beyond
   simply making tables with keys and get into real-world
   heavy multi-user activity.  And this leads us to the
   next topic...
&lt;/p&gt;

&lt;h2&gt;Understanding ACID&lt;/h2&gt;

&lt;p&gt;Modern relational databases expect multiple simultaneous
   users to be writing and reading data all of the time.
   The term "ACID Compliance" refers to both the philosophy
   of how to handle this and the actual methods that 
   implement that philosophy.  The term ACID refers to:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The Atomic nature of each transaction
    &lt;li&gt;The Consistency of the database during and
        after simultaneous overlapping transactions
    &lt;li&gt;The Isolation of each transaction
    &lt;li&gt;The Durability of the results
&lt;/ul&gt;
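
&lt;p&gt;The Atomic part is the easiest to picture.  In the sketch
   below, which assumes a hypothetical ACCOUNTS table, either both
   UPDATE commands take effect or neither does.  PostgreSQL accepts
   BEGIN as written here, while some other products spell it
   BEGIN TRANSACTION or START TRANSACTION.
&lt;/p&gt;

&lt;pre&gt;
BEGIN;

UPDATE accounts
   SET balance = balance - 100
 WHERE account = 'CHECKING';

UPDATE accounts
   SET balance = balance + 100
 WHERE account = 'SAVINGS';

-- Make both changes permanent together; ROLLBACK would undo both
COMMIT;
&lt;/pre&gt;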

&lt;h2&gt;Understanding Indexes as optimization tool&lt;/h2&gt;

&lt;p&gt;An index is a special tool databases use to provide very 
   rapid access to large amounts of data.  Just like keys, it
   is not enough to know the commands, it is necessary to
   understand the subtle power of indexes when used with some
   craftsmanship.  The basic uses of indexes are:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;A simple index on a column to provide rapid search
        on that column
    &lt;li&gt;A "covering index" that includes extra columns that
        can further speed up certain common access patterns
    &lt;li&gt;Clustered indexes (MS SQL Server) and what they give
        and what they take away
    &lt;li&gt;The cost of indexes on write operations
&lt;/ul&gt;
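
&lt;p&gt;The first two uses might be sketched like this, assuming a
   hypothetical ORDERS table.  A plain multi-column index is used
   for the covering case because it works nearly everywhere.  Some
   products, MS SQL Server among them, also offer an INCLUDE clause
   for adding non-key columns to an index.
&lt;/p&gt;

&lt;pre&gt;
-- Simple index: rapid search on one column
CREATE INDEX orders_customer_idx
    ON orders (customer);

-- Covering index: a query that needs only CUSTOMER, ORDER_DATE
-- and AMOUNT may be answered from the index alone
CREATE INDEX orders_customer_date_amount_idx
    ON orders (customer, order_date, amount);
&lt;/pre&gt;

&lt;p&gt;Every index must be updated on each INSERT, UPDATE or DELETE
   that touches its columns, which is the write cost mentioned
   in the list.
&lt;/p&gt;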


&lt;h2&gt;Views&lt;/h2&gt;

&lt;p&gt;A view looks like a table to the SQL SELECT command.  The view
   itself is a stored SQL SELECT command that encodes some 
   query that is either used very often or is very complex.  In
   all cases, views are used to present the database data to
   the application in some simplified, convenient, or secure
   form.  The two major uses of views are:

&lt;ul&gt;&lt;li&gt;To simplify the application programmer's job
    &lt;li&gt;To provide a read-only interface for
        some applications
&lt;/ul&gt;
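
&lt;p&gt;A minimal sketch of a view, assuming hypothetical CUSTOMERS
   and ORDERS tables.  The application programmer can then query
   the view as if it were a table and never has to write the JOIN.
&lt;/p&gt;

&lt;pre&gt;
CREATE VIEW customer_totals AS
SELECT c.customer
     , c.name
     , SUM(o.amount) AS total_ordered
  FROM customers c
  JOIN orders o ON o.customer = c.customer
 GROUP BY c.customer, c.name;

-- The view looks like a table to SELECT
SELECT name, total_ordered
  FROM customer_totals
 WHERE customer = 'ACME';
&lt;/pre&gt;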

&lt;h2&gt;Upgrades and Installs&lt;/h2&gt;

&lt;p&gt;If you are a single programmer or hobbyist working with a
   database, it is all well and good to just add and drop tables
   as you wish.  But as soon as you get into development
   with quality control stages and multiple programmers, it becomes
   evident that you need a strategy for handling the
   &lt;i&gt;schema changes&lt;/i&gt; that come with new versions of
   the system.  There are multiple essays available on 
   this blog, covering:
&lt;/p&gt;


&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/dictionary-based-database-upgrades.html"&gt;Dictionary based upgrades&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/upgrading-indexes-with-data-dictionary.html"&gt;Upgrading indexes and keys&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/upgrading-indexes-with-data-dictionary.html"&gt;Upgrades, dictionary, and calculated values part 1&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"&gt;Upgrades, dictionary, and calculated values part 2&lt;/a&gt;
&lt;/ul&gt;
   

&lt;h2&gt;Database Security&lt;/h2&gt;

&lt;p&gt;Databases provide incredible security provisions that are
   just about completely ignored by modern web developers.
   Sometimes there is good reason for this, but overall anybody
   who wants to become a truly accomplished Database Programmer
   or Database Architect must have a thorough understanding
   of database security and how it can simplify the entire
   system stack.
&lt;/p&gt;

&lt;p&gt;Database security comes down to specifying who is allowed
   to perform the 4 basic operations of INSERT, UPDATE,
   DELETE and SELECT against which tables:
&lt;/p&gt;

&lt;p&gt;My basic introduction to security is &lt;a href="http://database-programmer.blogspot.com/2008/05/introducing-database-security.html"&gt;here&lt;/a&gt;.

&lt;ul&gt;&lt;li&gt;Understanding roles (we used to say users and groups)
    &lt;li&gt;Simple table-level security
    &lt;li&gt;Column-level security (not widely supported)
    &lt;li&gt;Row-level security (not widely supported)
&lt;/ul&gt;
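
&lt;p&gt;Simple table-level security with roles might be sketched as
   follows.  The role names and the ORDERS table are hypothetical,
   and the syntax is PostgreSQL-flavored.
&lt;/p&gt;

&lt;pre&gt;
-- Two roles with different jobs to do
CREATE ROLE order_entry;
CREATE ROLE reporting;

-- Order entry staff may read and write orders, but not delete them
GRANT SELECT, INSERT, UPDATE ON orders TO order_entry;

-- Reporting users may only read
GRANT SELECT ON orders TO reporting;
&lt;/pre&gt;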


&lt;h2&gt;Efficient access of database from application&lt;/h2&gt;

&lt;p&gt;Imagine you have the perfectly designed database, with
   every nuance and subtlety excellently crafted in the
   areas of keys, indexes, normalization, denormalization
   and security.  At this point your job branches out into
   several new areas, but one of the most important is 
   knowing how to write application code that efficiently
   accesses the database.
&lt;/p&gt;

&lt;h2&gt;Bulk operations: loading or exporting large amounts of data&lt;/h2&gt;

&lt;p&gt;Some database applications involve a large number of small
   transactions, where each trip to the database writes only a 
   single row or reads only a dozen or so rows.
&lt;/p&gt;

&lt;p&gt;But in many cases you need to bulk load large amounts of
   data in one shot, thousands or even millions of rows.  In
   these cases the techniques that govern small transactions
   are useless and counter-productive, and you need to learn
   some new commands and strategies to handle the bulk loads.
&lt;/p&gt;
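
&lt;p&gt;In PostgreSQL, for example, the bulk load command is COPY.
   The sketch below assumes a hypothetical ORDERS table and a CSV
   file at a made-up path that the database server itself can read.
   Other products have their own equivalents.
&lt;/p&gt;

&lt;pre&gt;
-- Load many rows in one command instead of one INSERT per row
COPY orders (order_id, customer, amount)
FROM '/tmp/orders.csv'
WITH (FORMAT csv, HEADER true);
&lt;/pre&gt;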

&lt;h2&gt;Understanding Contexts and how they lead to different sets of Best Practices&lt;/h2&gt;

&lt;p&gt;Not all databases are created for the same purpose.  If you
   have a very large operation then it will likely have multiple
   independent databases that fill the classical roles, while in
   a smaller shop the roles may be combined in one database.  I
   like to refer to these roles as "contexts" because they determine
   how the tables will be designed and how access to the tables
   will be governed.  The major contexts are:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;OLTP or Online Transaction Processing, characterized
        by simultaneous reads and writes, generally assumes
        little or no periods of inactivity, and generally 
        assumes that the individual transactions are very
        small.  The apps we were all writing in the 80s and
        90s to do accounting, ERP, MRP, Job control, payroll,
        airline reservations and many others fall into this
        context.
    &lt;li&gt;Data Warehouse context, characterized by periodic
        bulk loads of new information with most activity
        being reads.  The Data Warehouse context is largely
        associated with the "Star Schema" table design.
        Data in a Warehouse is historical; it never changes
        after it is loaded.
    &lt;li&gt;CMS or Content Management System, also characterized
        by very few writes compared to reads, but more likely
        to have a normalized structure.  Unlike a Data
        Warehouse, the data is subject to change, just not that
        often. 
    &lt;li&gt;Any other Read Only Context.  I include this category 
        because I spent some time working on Direct Marketing
        databases, which are like a Data Warehouse in that they
        are updated periodically and the data does not change,
        but the Star Schema is completely inappropriate for them.
&lt;/ul&gt;

&lt;p&gt;If you consider a huge online shopping system, you can see that
   within that application there are at least two contexts.  The
   product catalog is likely to see vastly fewer writes than 
   reads, but the shopping cart tables will be in a constant state
   of reads and writes.
&lt;/p&gt;        



&lt;h2&gt;Preventing performance degradation through 
various maintenance tasks&lt;/h2&gt;

&lt;p&gt;Once the database and its application stack are up and running,
   and the reads and writes are coming through, the laws of
   thermodynamics come into play and system performance can 
   begin to degrade even if the database stays the same size
   and the load on the system is steady.
&lt;/p&gt;

&lt;p&gt;Different vendors have different tools for combatting this,
   but they tend to come down to reclaiming temporary space and
   occasionally rebuilding indexes.  There are also log files
   that have to be purged, regular backups to be made, and other
   operations along those lines.
&lt;/p&gt;

&lt;h2&gt;Deployment strategies: partitioning, tablespaces&lt;/h2&gt;

&lt;p&gt;When systems become sufficiently large, it is no longer
   possible to just attach some disks to a box and run
   a database server.  The Database Architect must consider
   breaking different tables out onto different sets of 
   spindles, which is usually done with "tablespaces", and
   moving older data onto slower cheaper spindles, which is
   often done with Partitioning.
&lt;/p&gt;

&lt;h2&gt;Deployment strategies, failure protection, from 
simple backup to hot standbys&lt;/h2&gt;

&lt;p&gt;Because a database typically experiences simultaneous
   reads and writes from multiple sources, and may be expected
   to be up and running 24/7 &lt;i&gt;indefinitely&lt;/i&gt;, the concept
   of making a backup and recovering from a failure becomes
   more complicated than simply copying a few files to a 
   safe location.
&lt;/p&gt;

&lt;p&gt;In the most demanding case, you will need to provide a
   second complete box that can become fully live within
   seconds of a disastrous failing of the main box.  This is
   called various things, but Postgres calls it a "hot standby"
   in version 9 and some MS SQL Server shops call it a
   "failover cluster."
&lt;/p&gt;

&lt;p&gt;The ability to come up live on a second box when the first
   one fails is made possible by the way databases handle
   ACID compliance, and the fact that they produce something
   called a Write-Ahead-Log (WAL) that can be fed into a 
   second box that "replays" the log so that its copy of the
   database is getting the same changes as the master copy.
&lt;/p&gt;

&lt;h2&gt;Server side coding: stored procedures and functions&lt;/h2&gt;

&lt;p&gt;I really could not figure out where to put this entry 
   in the list, so I just punted and put it near the end.
   It could really go anywhere.&lt;/p&gt;

&lt;p&gt;Stored procedures or functions are procedural routines
   (not object oriented) that are on the database server and
   can be invoked directly from an application or embedded
   inside of SQL commands.  Generally speaking they provide
   various flow-control statements and rudimentary variable
   support so that you can code multi-step processes on the
   server itself instead of putting them in application code.&lt;/p&gt;
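
&lt;p&gt;As a sketch only, here is what a small function might look
   like in PostgreSQL's PL/pgSQL.  The function name, the
   ORDER_LINES table and its columns are all hypothetical, and
   other products use different procedural languages.
&lt;/p&gt;

&lt;pre&gt;
CREATE FUNCTION order_total(p_order_id integer)
RETURNS numeric AS $$
DECLARE
    v_total numeric;
BEGIN
    -- Total the lines of one order, or zero if it has no lines
    SELECT COALESCE(SUM(qty * price), 0)
      INTO v_total
      FROM order_lines
     WHERE order_id = p_order_id;

    RETURN v_total;
END;
$$ LANGUAGE plpgsql;

-- Embedded inside a SQL command
SELECT order_id, order_total(order_id) AS total
  FROM orders;
&lt;/pre&gt;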

&lt;h2&gt;Server side coding: Triggers&lt;/h2&gt;

&lt;p&gt;Triggers are quite possibly the most elegant and beautiful
   technology that servers support, absolutely the least
   understood, and definitely the most reviled by the
   ignorant.  You will find virtually no web content today
   that will explain why and how to use triggers and
   what they are good for.
&lt;/p&gt;

&lt;p&gt;Except of course for &lt;a href="http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"&gt;my own essay on
   triggers&lt;/a&gt; that discusses them in terms of
   encapsulation.
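
&lt;p&gt;For readers who have never seen one, a very small trigger
   in PostgreSQL might look like the sketch below.  The ORDER_LINES
   table and its columns are hypothetical.  The trigger sets
   EXTENDED_PRICE on every write, so no application can store a
   wrong value in that column.
&lt;/p&gt;

&lt;pre&gt;
-- The trigger function: runs once for each row being written
CREATE FUNCTION order_lines_extend() RETURNS trigger AS $$
BEGIN
    NEW.extended_price := NEW.qty * NEW.price;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach the function to the table
CREATE TRIGGER order_lines_extend_trg
BEFORE INSERT OR UPDATE ON order_lines
FOR EACH ROW EXECUTE PROCEDURE order_lines_extend();
&lt;/pre&gt;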

&lt;h2&gt;Temporary tables&lt;/h2&gt;

&lt;p&gt;Temporary tables are like Stored Procedures inasmuch as
   I had no idea where to put them in the list, so they
   just ended up at the end.
&lt;/p&gt;

&lt;p&gt;As the name implies, a temporary table is a table that
   you can create on-the-fly, and which usually disappears
   when your transaction is complete.  They are most often
   found in Stored Procedures.  They can impact performance
   for the worst in many ways, but can be extremely 
   useful when you are doing multi-staged analysis of
   data in a Data Warehouse (that's where I use them the most).
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-4686458638958533585?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/4686458638958533585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=4686458638958533585' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4686458638958533585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4686458638958533585'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/11/database-skills.html' title='Database Skills'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-1993954082447566275</id><published>2010-11-06T13:20:00.001-04:00</published><updated>2010-11-28T22:14:13.556-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recursion'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL SELECT'/><category scheme='http://www.blogger.com/atom/ns#' term='Common Table Expressions'/><title type='text'>Recursive Queries with Common Table Expressions</title><content type='html'>&lt;p&gt;
This week The Database Programmer returns after almost 18 months
with an entry on using Common Table Expressions (CTEs) to do
recursive queries.  Relational databases were plagued from their
inception with a lack of meaningful treatment for recursive
operations, and CTEs have finally plugged that hole.
&lt;/p&gt;

&lt;p&gt;
Common Table Expressions appeared in SQL Server 2005, and in
PostgreSQL 8.4, and are also available in Oracle.  As for
MySQL, since I don't use it, I did a quick Google search and
looked at the docs for 5.5, and couldn't really find 
anything.  I generally tend to assume MySQL cannot do the
tough stuff.
&lt;/p&gt;

&lt;h2&gt;But First, A Rant&lt;/h2&gt;

&lt;p&gt;
There have always been plenty of people who claimed SQL was
a bloated and clumsy language to work with.  Most of the time
I tend to agree, but I find the advantages of relational/SQL
systems to be so large that I'm willing to pay that price.
&lt;/p&gt;

&lt;p&gt;
But with Common Table Expressions (CTEs) I just can't help
drifting into conspiracy theories involving the enemies of
SQL infiltrating the committees and deliberately suggesting
the most twisted, bloated, complicated way they could think
of to do what is really a very basic operation.  In other
words, I am profoundly unimpressed with the syntax of CTEs,
but as long as they are here and they work, we'll go along.
&lt;/p&gt;

&lt;h2&gt;The Basic Example&lt;/h2&gt;

&lt;p&gt;
Your basic recursive table contains a foreign key to itself,
so that some rows in the table are children of some other
row in the table.  This recursion can nest to any depth,
and the chart below shows a very simple example:
&lt;/p&gt;

&lt;pre&gt;
Primary_key   |   Parent_Key  |  Notes  
--------------+---------------+---------------------------
     A        |     null      |   top level row, no parent
     B        |      A        |   first level child of A
     C        |      B        |   child of B, grandchild
              |               |   of A
     D        |      C        |   child of C, grandchild 
              |               |   of B, great-grandchild 
              |               |   of A
     X        |     null      |   top level row, no parent
     Y        |      X        |   child of X
     Z        |      Y        |   child of Y, grandchild 
              |               |   of X
&lt;/pre&gt;

&lt;p&gt;What we want is a query that can return a given row
and &lt;i&gt;all of its children&lt;/i&gt; out to any level, including
helpful information about the structure of the recursion,
something like this:
&lt;/p&gt;

&lt;pre&gt;
Primary_key | Level | Top_Key | Immediate_Parent_key 
------------+-------+---------+-----------------------
     A      |   1   |  A      | null
     B      |   2   |  A      | A    
     C      |   3   |  A      | B   
     D      |   4   |  A      | C
     X      |   1   |  X      | null
     Y      |   2   |  X      | X   
     Z      |   3   |  X      | Y   
&lt;/pre&gt;


&lt;h2&gt;And Another Rant&lt;/h2&gt;

&lt;p&gt;
At this point the mind boggles at how long this blog entry
needs to be to explain this simple operation.  But let's
get going anyway.
&lt;/p&gt;

&lt;h2&gt;The First Step and Last Step&lt;/h2&gt;

&lt;p&gt;A Common Table Expression begins with the "WITH" clause
and ends with a standard SQL Select:
&lt;/p&gt;

&lt;pre&gt;
;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
  ....we'll get to this below....
)
select * from myCTEName
&lt;/pre&gt;

&lt;p&gt;The basic idea is that we are going to define a CTE with
a name and a list of columns, and then SELECT out of it.
Without that final SELECT statement the CTE does not actually
do anything.  The SELECT can also be arbitrarily complex, doing
aggregations, WHERE clauses and anything else you might need.&lt;/p&gt;

&lt;p&gt;The first thing to notice is the leading semi-colon.  This is
a trick adopted by MS SQL Server users.  SQL Server does not
require statements to be terminated with a semi-colon, but a
SQL Server CTE requires the &lt;b&gt;previous&lt;/b&gt; statement to have 
been terminated with a semi-colon (nice huh?).  So SQL Server
programmers adopted the strategy of starting the CTE with
a semi-colon, which keeps the syntactical requirement with
the CTE, where it belongs.  The examples here use SQL Server
syntax.  On PostgreSQL the leading semi-colon is unnecessary, and
a recursive query must be written "WITH RECURSIVE" instead of "WITH".&lt;/p&gt;

&lt;p&gt;A given CTE sort of has a name.  That is, you have to name
it something, but think of it as a table alias in a SQL SELECT,
such as "Select * from myTable a JOIN otherTable b...".  It
exists only during the execution of the statement.  
&lt;/p&gt;

&lt;p&gt;The columns listed in the parentheses can have any names
(at least in SQL Server).  But these column names are what
you will refer to in the final SQL SELECT statement.
&lt;/p&gt;

&lt;h2&gt;Coding The Inside of the CTE, Step 1&lt;/h2&gt;

&lt;p&gt;Now we code the inside of the CTE in two steps.
The first step is called the "anchor", and it is a
straightforward query to find the top-level rows:
&lt;/p&gt;

&lt;pre&gt;
;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
    select primary_key   as primary_key
         , 1             as level
         , primary_key   as top_key
         , null          as immediate_parent_key
      from myRecursiveTable
     where Parent_key is null
)
select * from myCTEName
&lt;/pre&gt;

&lt;p&gt;This should be self-explanatory: we are querying only for
rows that have no parent (WHERE Parent_key is null) and we
are hardcoding the "level" column to 1, and we are also 
hardcoding the "immediate_parent_key" column to null.
&lt;/p&gt;

&lt;p&gt;This query alone would return two of the rows from
our desired output:
&lt;/p&gt;

&lt;pre&gt;
Primary_key | Level | Top_Key | Immediate_Parent_key 
------------+-------+---------+-----------------------
     A      |   1   |  A      | null
     X      |   1   |  X      | null
&lt;/pre&gt;

&lt;h2&gt;Coding The Inside of the CTE, Step 2&lt;/h2&gt;

&lt;p&gt;Now we are going to add the actual recursion.  When I first
learned CTEs this was the hardest part to figure out, because it
turned out my hard-won set-oriented thinking was actually slowing me
down.  I had to think like a procedural programmer when defining
the second half of the query.
&lt;/p&gt;

&lt;pre&gt;
;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
    select primary_key,1,primary_key,null
      from myRecursiveTable
     where Parent_key is null
    UNION ALL
    select chd.primary_key,par.level+1,par.top_key,chd.parent_key
      FROM myCTEName        par
      JOIN myRecursiveTable chd ON chd.parent_key = par.primary_key
)
select * from myCTEName
&lt;/pre&gt;

&lt;p&gt;Thinking step-wise, here is what is going on under the hood:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The server executes the "anchor" query, generating a
        result set called "myCTEName" containing just the
        top level rows.
    &lt;li&gt;The server then executes the second query.  At this
        point the result set "myCTEName" exists and can be
        referenced, so that you can link children to their
        parents.  (That's why you see the JOIN)
    &lt;li&gt;Step 2 is repeated recursively, adding grand-children,
        great-grand-children, and so on, until no more rows
        are being added, at which point it stops, and...
    &lt;li&gt;The final result set is passed to the trailing
        SELECT, which pulls results out of "myCTEName"
        as if it were a table or view.
&lt;/ol&gt;
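
&lt;p&gt;If it helps to see that step-wise behavior spelled out, here is a
small Python sketch that imitates the four steps by hand.  It is only
an illustration of the idea, not what any server actually runs.  The
names PARENT_OF and expand_hierarchy are made up for this sketch, the
dictionary simply holds the sample rows from the chart above, and the
loop assumes the data contains no cycles.&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical sample data mirroring the chart above:
# each primary key mapped to its parent key.
PARENT_OF = {"A": None, "B": "A", "C": "B", "D": "C",
             "X": None, "Y": "X", "Z": "Y"}


def expand_hierarchy(parent_of):
    """Imitate the anchor-then-repeat evaluation of a recursive CTE.

    Returns rows of (primary_key, level, top_key, immediate_parent_key).
    """
    # Step 1: the anchor query finds the top-level rows.
    latest = [(key, 1, key, None)
              for key, parent in parent_of.items() if parent is None]
    results = list(latest)

    # Steps 2 and 3: join children to the rows added by the previous
    # pass, and repeat until a pass adds nothing new.
    while latest:
        prior = {row[0]: row for row in latest}
        latest = [(child, prior[parent][1] + 1, prior[parent][2], parent)
                  for child, parent in parent_of.items()
                  if parent in prior]
        results.extend(latest)

    # Step 4: the finished set is what the trailing SELECT reads from.
    return sorted(results)


rows = expand_hierarchy(PARENT_OF)
for row in rows:
    print(row)
```
&lt;/pre&gt;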

&lt;p&gt;So when we code the 2nd part of the inside of the CTE, the
   part after the UNION ALL, act as if the first query has
   already run and produced a table/view called "myCTEName"
   that can be referenced.  Once you understand that, the
   query is pretty easy to understand:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The "From myCTEName par" clause tells us we are pulling
    from the previously generated set.  I like to use the alias
    "par" for "parent" to remind myself that the prior result is
    the parent row.
    &lt;li&gt;We then join to the original source table and use the
    alias "chd" to remind ourselves we are pulling child rows
    from there.  The "ON chd.parent_key = par.primary_key" 
    defines how children are joined to parents.
    &lt;li&gt;Our first column, "chd.primary_key", is the unique
    key for the results.
    &lt;li&gt;Our second column, "par.level+1" gives us a nifty
    automatically incremented "level" column.
    &lt;li&gt;Our third column, "par.top_key" ensures that all rows
    contain a reference to their top-most parent.
    &lt;li&gt;Our final column, "chd.parent_key", makes sure each
    row contains a reference to its immediate parent.
&lt;/ul&gt;
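
&lt;p&gt;To try the finished query without a SQL Server install, the
sketch below runs it against an in-memory SQLite database from Python.
A few assumptions to keep in mind: the table is a throwaway copy of the
sample rows from the chart above, SQLite accepts the "WITH RECURSIVE"
spelling that PostgreSQL requires, and the ORDER BY on the final SELECT
is there only to make the output order predictable.&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

# Hypothetical in-memory copy of the sample table from the chart above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table myRecursiveTable "
    "(primary_key text primary key, parent_key text)"
)
conn.executemany(
    "insert into myRecursiveTable (primary_key, parent_key) values (?, ?)",
    [("A", None), ("B", "A"), ("C", "B"), ("D", "C"),
     ("X", None), ("Y", "X"), ("Z", "Y")],
)

# Same shape as the query above, minus the leading semi-colon
# and with the RECURSIVE keyword added.
CTE_QUERY = """
WITH RECURSIVE myCTEName (primary_key, level, top_key, immediate_parent_key)
as (
    select primary_key, 1, primary_key, null
      from myRecursiveTable
     where parent_key is null
    UNION ALL
    select chd.primary_key, par.level + 1, par.top_key, chd.parent_key
      from myCTEName        par
      join myRecursiveTable chd on chd.parent_key = par.primary_key
)
select * from myCTEName order by top_key, level
"""

cte_rows = conn.execute(CTE_QUERY).fetchall()
for row in cte_rows:
    print(row)
```
&lt;/pre&gt;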

&lt;h2&gt;Finding Various Statistics&lt;/h2&gt;

&lt;p&gt;Once you have the inside of the CTE coded, the fun part
moves to the final SELECT, which is operating on the
complete set of results.  You do not necessarily have to pull
the complete list.  For instance, you may want to find out
the maximum nesting level for each top-level parent, or the count
of children (at any depth) beneath each top-level parent:
&lt;/p&gt;

&lt;pre&gt;
;WITH myCTEName (primary_key,level,top_key,immediate_parent_key)
as (
    select primary_key,1,primary_key,null
      from myRecursiveTable
     where Parent_key is null
    UNION ALL
    select chd.primary_key,par.level+1,par.top_key,chd.parent_key
      FROM myCTEName        par
      JOIN myRecursiveTable chd ON chd.parent_key = par.primary_key
)
select top_key
     , max(level) as maxNestingLevel
     , count(*)   as countRows
     , count(*)-1 as countChildren
  from myCTEName
 group by top_key
&lt;/pre&gt;
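
&lt;p&gt;Here is a hedged, runnable version of the statistics idea, again
using an in-memory SQLite database from Python as a stand-in for
SQL Server.  The sample rows are the ones from the chart near the top
of this post.  Because the aggregates are wanted once per top-level
row, the final SELECT groups by top_key; the ORDER BY is my own
addition to keep the output order stable.&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

# Hypothetical in-memory copy of the sample table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table myRecursiveTable "
    "(primary_key text primary key, parent_key text)"
)
conn.executemany(
    "insert into myRecursiveTable (primary_key, parent_key) values (?, ?)",
    [("A", None), ("B", "A"), ("C", "B"), ("D", "C"),
     ("X", None), ("Y", "X"), ("Z", "Y")],
)

# The inside of the CTE is unchanged; only the final SELECT differs.
STATS_QUERY = """
WITH RECURSIVE myCTEName (primary_key, level, top_key, immediate_parent_key)
as (
    select primary_key, 1, primary_key, null
      from myRecursiveTable
     where parent_key is null
    UNION ALL
    select chd.primary_key, par.level + 1, par.top_key, chd.parent_key
      from myCTEName        par
      join myRecursiveTable chd on chd.parent_key = par.primary_key
)
select top_key
     , max(level) as maxNestingLevel
     , count(*)   as countRows
     , count(*)-1 as countChildren
  from myCTEName
 group by top_key
 order by top_key
"""

stats = conn.execute(STATS_QUERY).fetchall()
for row in stats:
    print(row)
```
&lt;/pre&gt;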

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Common Table Expressions give SQL-based databases the (very)
   long-needed ability to execute recursive queries, albeit with
   a rather complex syntax.  Once you grasp the basics of how
   to code them, there are many possible uses that go far beyond
   the simple example shown here.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-1993954082447566275?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/1993954082447566275/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=1993954082447566275' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1993954082447566275'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1993954082447566275'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2010/11/recursive-queries-with-common-table.html' title='Recursive Queries with Common Table Expressions'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-9026934688258366105</id><published>2009-06-29T21:33:00.000-04:00</published><updated>2010-11-29T20:48:40.101-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Upsert'/><title type='text'>Approaches to "UPSERT"</title><content type='html'>&lt;p&gt;This week in the Database Programmer we look at something
   called an "UPSERT", the strange trick where an insert
   command may magically convert itself into an update if
   a row already exists with the provided key.  This trick
   is very useful in a variety of cases.  This week we will 
   see its basic use, and next week we will see how the same
   idea can be used to materialize summary tables efficiently.
&lt;/p&gt;

&lt;h2&gt;An UPSERT or ON DUPLICATE KEY...&lt;/h2&gt;

&lt;p&gt;The idea behind an UPSERT is simple.  The client issues
   an INSERT command.  If a row already exists with the
   given primary key, then instead of throwing a key
   violation error, it takes the non-key values and updates
   the row.
&lt;/p&gt;

&lt;p&gt;This is one of those strange (and very unusual) cases
   where MySQL actually supports something you will not 
   find in all of the other more mature databases.  So if you
   are using MySQL, you do not need to do anything special
   to make an UPSERT.  You just add the term "ON DUPLICATE
   KEY UPDATE" to the INSERT statement:
&lt;/p&gt;

&lt;pre&gt;
insert into mytable (a,b,c) values (1,2,3)
    on duplicate key update
     b = 2,
     c = 3
&lt;/pre&gt;

&lt;p&gt;The MySQL command gives you the flexibility to specify
   different operations on UPDATE versus INSERT, but with
   that flexibility comes the requirement that the UPDATE
   clause completely restates the operation.   
&lt;/p&gt;

&lt;p&gt;With the MySQL command there are also various considerations
   for AUTO_INCREMENT columns and multiple unique keys.
   You can read more at the MySQL page for the
   &lt;a href=
   "http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html"
   &gt;INSERT ... ON DUPLICATE KEY UPDATE&lt;/a&gt; feature.
&lt;/p&gt;
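
&lt;p&gt;The MySQL statement needs a MySQL server to run, so as a
stand-in here is a sketch of the same idea using SQLite from Python.
To be plain about the swap: this is SQLite's "ON CONFLICT ... DO UPDATE"
clause (available in SQLite 3.24 and later), not MySQL's
"ON DUPLICATE KEY UPDATE", and the table mytable with columns a, b
and c is made up for the illustration.  Notice that, as with MySQL,
the update branch has to restate what to do with each column.&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

# Hypothetical table: column a is the key, b and c are ordinary values.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (a integer primary key, b integer, c integer)")

# SQLite's analogue of an UPSERT; "excluded" names the row that
# the INSERT tried to add.
UPSERT = """
insert into mytable (a, b, c) values (?, ?, ?)
    on conflict (a) do update
   set b = excluded.b,
       c = excluded.c
"""

conn.execute(UPSERT, (1, 2, 3))     # no row with a = 1 yet: plain insert
conn.execute(UPSERT, (1, 20, 30))   # key exists: becomes an update

upsert_rows = conn.execute("select a, b, c from mytable").fetchall()
print(upsert_rows)
```
&lt;/pre&gt;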

&lt;h2&gt;A Note About MS SQL Server 2008&lt;/h2&gt;

&lt;p&gt;MS SQL Server introduced something like UPSERT in 
   SQL Server 2008.  It uses the MERGE command, which is 
   a bit hairy; check it out in this 
   &lt;a href=
   "http://www.databasejournal.com/features/mssql/article.php/3739131/UPSERT-Functionality-in-SQL-Server-2008.htm" 
   &gt;nice tutorial&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Coding a Simpler UPSERT&lt;/h2&gt;

&lt;p&gt;Let us say that we want a simpler UPSERT, where you do not
   have to mess with SQL Server's MERGE or rewrite the entire
   command as in MySQL.  This can be done with triggers.
&lt;/p&gt;

&lt;p&gt;To illustrate, consider a shopping cart with a natural key
   of ORDER_ID and SKU.  I want simple application code that 
   does not have to figure out if it needs to do an INSERT or
   UPDATE, and can always happily do INSERTs, knowing they will
   be converted to updates if the line is already there.
   In other words, I want simple application code that just keeps
   issuing commands like this:
&lt;/p&gt;

&lt;pre&gt;
INSERT INTO ORDERLINES
       (order_id,sku,qty)
VALUES 
       (1234,'ABC',5)
&lt;/pre&gt;

&lt;p&gt;We can accomplish this by a trigger.  The trigger must occur
   before the action, and it must redirect the action to an 
   UPDATE if necessary.  Let us look at examples for MySQL,
   Postgres, and SQL Server.
&lt;/p&gt;

&lt;h2&gt;A MySQL Trigger&lt;/h2&gt;

&lt;p&gt;Alas, MySQL giveth, and MySQL taketh away.  You cannot code
   your own UPSERT in MySQL because of an extremely severe 
   limitation in MySQL trigger rules.  A MySQL trigger &lt;i&gt;may not
   affect any row of its own table other than the row originally
   affected by the command that fired the trigger.&lt;/i&gt;  So a MySQL 
   trigger that fires on an attempt to create a new row may not
   update a different, existing row.
&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Note: I may be wrong about this. This limitation has bitten
   me on several features that I would like to provide for MySQL.
   I am actually hoping this limitation will not
   apply for UPSERTs because the new row does not yet exist, but
   I have not had a chance yet to try.&lt;/i&gt;
&lt;/p&gt;

&lt;h2&gt;A Postgres Trigger&lt;/h2&gt;

&lt;p&gt;The Postgres trigger example is pretty simple, and hopefully the
   logic is self-explanatory.  As with all code samples, I did
   this off the top of my head, so you may need to fix a syntax
   error or two.
&lt;/p&gt;
   
&lt;pre&gt;
CREATE OR REPLACE FUNCTION orderlines_insert_before_F()
RETURNS TRIGGER
 AS $BODY$
DECLARE
    result INTEGER; 
BEGIN
    SET SEARCH_PATH TO PUBLIC;
    
    -- Find out if there is a row
    result = (select count(*) from orderlines
                where order_id = new.order_id
                  and sku      = new.sku
               );

    -- On the update branch, perform the update
    -- and then return NULL to prevent the 
    -- original insert from occurring
    IF result = 1 THEN
        UPDATE orderlines 
           SET qty = new.qty
         WHERE order_id = new.order_id
           AND sku      = new.sku;
           
        RETURN null;
    END IF;
    
    -- The default branch is to return "NEW" which
    -- causes the original INSERT to go forward
    RETURN new;

END; $BODY$
LANGUAGE 'plpgsql' SECURITY DEFINER;

-- That extremely annoying second command you always
-- need for Postgres triggers.
CREATE TRIGGER orderlines_insert_before_T
   before insert
   ON ORDERLINES
   FOR EACH ROW
   EXECUTE PROCEDURE orderlines_insert_before_F();
&lt;/pre&gt;
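
&lt;p&gt;If you do not have a Postgres server handy, the same
redirect-the-insert idea can be tried in SQLite from Python.  This is
a sketch under stated assumptions, not a port of the function above:
SQLite has no "RETURN null", so the trigger uses SQLite's RAISE(IGNORE)
to abandon the original INSERT after doing the UPDATE, and the
ORDERLINES table is reduced to the three columns used in this post.&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orderlines (
    order_id integer,
    sku      text,
    qty      integer,
    primary key (order_id, sku)
);

-- Fires only when the key already exists: update the row, then
-- RAISE(IGNORE) abandons the original INSERT without an error.
create trigger orderlines_insert_before
before insert on orderlines
when exists (select 1 from orderlines
              where order_id = new.order_id
                and sku      = new.sku)
begin
    update orderlines
       set qty = new.qty
     where order_id = new.order_id
       and sku      = new.sku;
    select raise(ignore);
end;
""")

# The application just keeps issuing INSERTs.
INSERT = "insert into orderlines (order_id, sku, qty) values (?, ?, ?)"
conn.execute(INSERT, (1234, "ABC", 5))   # new line: inserted
conn.execute(INSERT, (1234, "ABC", 8))   # same key: converted to an update
conn.execute(INSERT, (1234, "XYZ", 1))   # different sku: inserted

trigger_rows = conn.execute(
    "select order_id, sku, qty from orderlines order by sku"
).fetchall()
print(trigger_rows)
```
&lt;/pre&gt;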


&lt;h2&gt;A SQL Server Trigger&lt;/h2&gt;

&lt;p&gt;SQL Server has no BEFORE INSERT triggers.  The closest equivalent
   is the INSTEAD OF trigger, which is significantly different
   from a Postgres trigger.  First of all, it operates at the
   &lt;i&gt;statement level&lt;/i&gt;, so that you have a set of new rows instead
   of just one.  Secondly, the trigger must itself contain an 
   explicit INSERT command, or the INSERT never happens.  All of this
   means our SQL Server example is quite a bit more verbose.
&lt;/p&gt;

&lt;p&gt;The basic logic of the SQL Server example is the same as the
   Postgres one, with two additional complications.  First, we use
   a CURSOR to loop through the incoming rows.  Second, we must 
   explicitly code the INSERT operation for the case where it 
   occurs.  But if you can see past the cruft we get for all of that,
   the SQL Server example is doing the same thing:
&lt;/p&gt;

&lt;pre&gt;
CREATE TRIGGER upsource_insert_before
ON orderlines
INSTEAD OF insert
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @new_order_id int;
    DECLARE @new_sku      varchar(15);
    DECLARE @new_qty      int;
    DECLARE @result       int;

    DECLARE trig_ins_orderlines CURSOR FOR 
            SELECT order_id, sku, qty FROM inserted;
    OPEN trig_ins_orderlines;

    FETCH NEXT FROM trig_ins_orderlines
     INTO @new_order_id
         ,@new_sku
         ,@new_qty;

    WHILE @@Fetch_status = 0 
    BEGIN
        -- Find out if there is a row now
        SET @result = (SELECT count(*) from orderlines
                        WHERE order_id = @new_order_id
                          AND sku      = @new_sku
                      )
    
        IF @result = 1 
        BEGIN
            -- Since there is already a row, do an
            -- update
            UPDATE orderlines
               SET qty = @new_qty
             WHERE order_id = @new_order_id
               AND sku      = @new_sku;
        END
        ELSE
        BEGIN
            -- When there is no row, we insert it
            INSERT INTO orderlines 
                  (order_id,sku,qty)
            VALUES
                  (@new_order_id,@new_sku,@new_qty);
        END

        -- Pull the next row
        FETCH NEXT FROM trig_ins_orderlines
         INTO @new_order_id
             ,@new_sku
             ,@new_qty;

    END  -- Cursor iteration

    CLOSE trig_ins_orderlines;
    DEALLOCATE trig_ins_orderlines;

END
&lt;/pre&gt;

&lt;h2&gt;A Vague Uneasy Feeling&lt;/h2&gt;

&lt;p&gt;While the examples above are definitely cool and nifty, 
   they ought to leave a certain nagging doubt in many 
   programmers' minds.  This doubt comes from the fact that
   an &lt;i&gt;insert is not necessarily an insert anymore&lt;/i&gt;,
   which can lead to confusion.  Just imagine the new programmer
   who has joined the team and is banging his head on his desk
   because he cannot figure out why his INSERTs are not 
   working!
&lt;/p&gt;

&lt;p&gt;We can add a refinement to the process by making the
   function optional.  Here is how we do it.
&lt;/p&gt;

&lt;p&gt;First, add a column to the ORDERLINES table called
   _UPSERT that is a char(1).  Then modify the trigger so that
   the UPSERT behavior only occurs if this column holds
   'Y'.  It is also extremely important to always set this value
   back to 'N' or NULL in the trigger, otherwise it will appear
   as 'Y' on subsequent INSERTs and it won't work properly.
&lt;/p&gt;

&lt;p&gt;So our new modified explicit upsert requires a SQL statement
   like this:
&lt;/p&gt;

&lt;pre&gt;
INSERT INTO ORDERLINES
       (_upsert,order_id,sku,qty)
VALUES
       ('Y',1234,'ABC',5)
&lt;/pre&gt;

&lt;p&gt;Our trigger code needs only a very slight modification.
   Here is the Postgres example; the SQL Server example should
   be very easy to update as well:
&lt;/p&gt;

&lt;pre&gt;
   ...trigger declaration and definition above
   IF new._upsert = 'Y' THEN
      result = (SELECT.....);
      new._upsert = 'N';
   ELSE
      result = 0;
   END IF;
   
   ...rest of trigger is the same
&lt;/pre&gt;
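
&lt;p&gt;For anybody who wants to experiment with the opt-in version, here
is a rough SQLite sketch driven from Python.  It is an approximation
under a few assumptions rather than a translation of the Postgres
code: SQLite cannot assign to NEW inside a trigger, so a second AFTER
INSERT trigger (my own addition) resets the flag to 'N', and
RAISE(IGNORE) stands in for "RETURN null".  The trigger names are
made up.&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orderlines (
    _upsert  text,
    order_id integer,
    sku      text,
    qty      integer,
    primary key (order_id, sku)
);

-- Redirect to an UPDATE only when the caller asked for it.
create trigger orderlines_upsert_before
before insert on orderlines
when new._upsert = 'Y'
 and exists (select 1 from orderlines
              where order_id = new.order_id
                and sku      = new.sku)
begin
    update orderlines
       set qty = new.qty
     where order_id = new.order_id
       and sku      = new.sku;
    select raise(ignore);
end;

-- SQLite cannot assign to NEW, so clear the flag after the insert.
create trigger orderlines_upsert_after
after insert on orderlines
when new._upsert = 'Y'
begin
    update orderlines
       set _upsert = 'N'
     where order_id = new.order_id
       and sku      = new.sku;
end;
""")

FLAGGED = "insert into orderlines (_upsert, order_id, sku, qty) values ('Y', ?, ?, ?)"
PLAIN = "insert into orderlines (order_id, sku, qty) values (?, ?, ?)"

conn.execute(FLAGGED, (1234, "ABC", 5))      # inserted, flag reset to 'N'
conn.execute(FLAGGED, (1234, "ABC", 8))      # converted to an update

try:
    conn.execute(PLAIN, (1234, "ABC", 9))    # no flag: ordinary key violation
    plain_insert_rejected = False
except sqlite3.IntegrityError:
    plain_insert_rejected = True

flag_rows = conn.execute(
    "select _upsert, order_id, sku, qty from orderlines"
).fetchall()
print(flag_rows, plain_insert_rejected)
```
&lt;/pre&gt;

&lt;p&gt;The point of the last INSERT is that an insert without the flag is
still just an insert, and fails on a duplicate key the way a new
programmer on the team would expect.&lt;/p&gt;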

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The UPSERT feature gives us simplified code and fewer
   round trips to the server.  Without the UPSERT there are
   times when the application may have to query the server to
   find out if a row exists, and then issue either an UPDATE
   or an INSERT.  With the UPSERT, one round trip is eliminated,
   and the check occurs much more efficiently inside of the
   server itself.
&lt;/p&gt;

&lt;p&gt;The downside to UPSERTs is that they can be confusing if 
   some type of explicit control is not put onto them such as
   the _UPSERT column.
&lt;/p&gt;

&lt;p&gt;Next week we will see a concept similar to UPSERT used
   to efficiently create summary tables.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-9026934688258366105?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/9026934688258366105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=9026934688258366105' title='20 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/9026934688258366105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/9026934688258366105'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/06/approaches-to-upsert.html' title='Approaches to &quot;UPSERT&quot;'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>20</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-2676473018609405160</id><published>2009-04-19T16:50:00.000-04:00</published><updated>2010-11-28T22:10:05.094-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Relational Model'/><title type='text'>The Relational Model</title><content type='html'>&lt;p&gt;
If you look at any system that was born on and for the 
internet, like Ruby on Rails, or the PHP language, you find
an immense wealth of resources on the internet itself, in 
endless product web sites, blogs, and forums.  But when
you look for the same comprehensive information on products
or ideas that matured before the web you find it is not there.
Relational databases stand out as a product family that matured
before the internet, and so their representation in cyberspace
is very different from the newer stuff.
&lt;/p&gt;

&lt;h2&gt;The Math Stuff&lt;/h2&gt;

&lt;p&gt;You may have heard relational theorists argue that the
   strength of relational databases comes from their solid
   mathematical foundations.  Perhaps you have wondered,
   what does that mean?  And why is that good?
&lt;/p&gt;

&lt;p&gt;To understand this, we have to begin with 
   &lt;a href="http://en.wikipedia.org/wiki/Edsger_W._Dijkstra"
   &gt;Edsger W. Dijkstra&lt;/a&gt;, a pioneer in the area of computer
   science with many accomplishments to his name.  Dijkstra
   believed that the best way to develop a system or program
   was to begin with a mathematical description of the system,
   and then refine that system into a working program.  When
   the program completely implemented the math, you were
   finished.
&lt;/p&gt;

&lt;p&gt;There is a really huge advantage to this approach.  If you
   start out with a mathematical theory of some sort, which 
   presumably has well known behaviors, then the working program
   will have all of those behaviors and, put simply, everybody
   will know what to expect of it.
&lt;/p&gt;

&lt;p&gt;This approach also reduces time wasted on creative efforts
   to work out how the program should behave.  All those 
   decisions collapse into the simple drive to make the program
   mimic the math.
&lt;/p&gt;

&lt;h2&gt;A Particular Bit of Math&lt;/h2&gt;

&lt;p&gt;It so happens that there is a particular body of math
   known as Relational Theory, which it seemed to 
   &lt;a href="http://en.wikipedia.org/wiki/Edgar_F._Codd"
   &gt;E. F. Codd&lt;/a&gt; would be a very nice fit for storing
   business information.  In his landmark 1970 paper
   &lt;a href="http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf"
   &gt;A Relational Model of Data for Large Shared Data Banks
   (pdf) &lt;/a&gt; he sets out to show how these mathematical
   things called "relations" have behaviors that would be
   ideal for storing business models.
&lt;/p&gt;

&lt;p&gt;If we take the Dijkstra philosophy seriously, which is to
   build systems based on well-known mathematical theories,
   and we take Codd's claim that "Relations" match well to
   business record-keeping needs, the obvious conclusion is
   that we should build some kind of "Relational" datastore,
   and so we get the Relational Database systems of today.
&lt;/p&gt;

&lt;p&gt;So there in a nutshell is why relational theorists are 
   so certain of the virtues of the relational model: its
   behaviors are well-known, and if you can build something
   that matches them, you will have a very predictable
   system.
&lt;/p&gt;

&lt;h2&gt;They are Still Talking About It&lt;/h2&gt;

&lt;p&gt;If you want to know more about the actual mathematics,
   check out the &lt;a href=
   "http://groups.google.com/group/comp.databases.theory/topics"
   &gt;comp.databases.theory&lt;/a&gt; Usenet group, or check out
   Wikipedia's articles on &lt;a href=
   "http://en.wikipedia.org/wiki/Relational_algebra"
   &gt;Relational Algebra&lt;/a&gt; and &lt;a href=
   "http://en.wikipedia.org/wiki/Relational_calculus"
   &gt;Relational Calculus&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;A Practical Downside&lt;/h2&gt;

&lt;p&gt;The downside to all of this comes whenever the mathematical
   model describes behaviors that are contrary to human goals
   or simply irrelevant to them.  Examples are not hard to
   find.
&lt;/p&gt;

&lt;p&gt;When the web exploded in popularity, many programmers found
   that their greatest data storage needs centered on &lt;i&gt;documents&lt;/i&gt;
   like web pages rather than &lt;i&gt;collections of atomic values&lt;/i&gt;
   like a customer's discount code or credit terms.  They found
   that relational databases were just not that good at storing
   documents, which only stands to reason because they were never
   intended to.  In &lt;i&gt;theory&lt;/i&gt; the model could be stretched,
   (if the programmer stretched as well), but the programmers
   could feel in their bones that the fit was not right, and they
   began searching for something new.
&lt;/p&gt;

&lt;p&gt;Another example is that of calculated values.  If you have a
   shopping cart, you probably have some field "TOTAL" somewhere
   that stores the final amount due for the customer.  It so
   happens that such a thing violates relational theory, and there
   are some very bright theorists who will refuse all requests
   for assistance in getting that value to work, because you
   have violated their theory.  This is probably the most shameful
   behavior that relational theorists exhibit - a complete
   refusal to consider extending the model to better reflect
   real world needs.
&lt;/p&gt;

&lt;h2&gt;The Irony: There are No Relational Databases&lt;/h2&gt;

&lt;p&gt;The irony of it all is that when programmers set out to build
   relational systems, they ran into quite a few practical 
   downsides and a sort of consensus was reached to break the
   model and create the SQL-based databases we have today.
   In a &lt;i&gt;truly relational&lt;/i&gt; system a table would have quite
   a few more rules on it than we have in our SQL/TABLE based
   systems of today.  But these rules must have seemed 
   impractical or too difficult to implement, and they were
   scratched.
&lt;/p&gt;

&lt;p&gt;There is at least one product out there that claims to
   be truly relational, that is &lt;a href=
   "http://en.wikipedia.org/wiki/Dataphor"&gt;Dataphor&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;The Weird Optional Part&lt;/h2&gt;

&lt;p&gt;Probably the grandest irony in the so-called relational
   database management systems is that any programmer can
   completely break the relational model by making bad
   table designs.  If your tables are not normalized, you
   lose many of the benefits of the relational model,
   and you are completely free to make lots of 
   non-normalized and de-normalized tables.  
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I have to admit I have always found the strength of 
   relational databases to be their simplicity and power,
   and not so much their foundations (even if shaky) in
   mathematical theory.  A modern database is very good
   at storing data in tabular form, and if you know how
   to design the tables, you've got a great foundation for
   a solid application.  Going further, I've always found
   relational theorists to be unhelpful in the extreme in
   the edge cases where overall application needs are not
   fully met by the underlying mathematical model.  The
   good news is that the products themselves have all of
   the power we need, so I left the relational theorists
   to their debates years ago.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-2676473018609405160?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/2676473018609405160/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=2676473018609405160' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2676473018609405160'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2676473018609405160'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/04/relational-model.html' title='The Relational Model'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-4444460780155602959</id><published>2009-03-01T19:08:00.003-05:00</published><updated>2010-11-28T22:10:19.298-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Philosophy'/><title type='text'>I Am But a Humble Filing Clerk</title><content type='html'>&lt;p&gt;This week we are returning to the series on Philosophy,
   and we will nail down the role of data and the 
   database in any application that requires such
   things.
&lt;/p&gt;

&lt;p&gt;This is the Database Programmer blog, for anybody who wants
   practical advice on database use.&lt;/p&gt;

&lt;p&gt;There are links to other essays at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Review of The Absolute&lt;/h2&gt;

&lt;p&gt;In the first essay in this series, &lt;a href=
   "http://database-programmer.blogspot.com/2008/09/quest-for-absolute.html"
   &gt;The Quest For the Absolute&lt;/a&gt;, I offered the opinion
   that all programmers by nature seek absolutes to 
   simplify and inform the development effort.  Taking
   a page from the ancient Greek philosopher Aristotle,
   I suggested that the best absolute was the quest for
   the &lt;i&gt;virtuous&lt;/i&gt; program, which is to say a program
   that served its purpose well.
&lt;/p&gt;

&lt;p&gt;A program that serves its purpose well is one that
   meets the needs of the check signer, the end-user,
   and the programmer.  The check signer needs some
   strategic goal to be met, the end-user must be 
   productive, and the programmer must make a living.
   If a program achieves all of these, it is an ideal
   virtuous program, and has satisfied the absolute
   requirements that are true of all programs.
&lt;/p&gt;

&lt;h2&gt;Considering the Decision Maker&lt;/h2&gt;

&lt;p&gt;Normally we think of a decision maker as some important
   person who has the power to choose your product or
   services, or to give her money to your competitor.
   She makes her decision based on how well she can judge
   who will meet her strategic needs.
&lt;/p&gt;

&lt;p&gt;Although the decision maker will have vastly different
   needs in different situations, and is usually thinking
   at a high level, she has at least one
   need that is universal: the simple need to keep and
   use records.  She needs a filing system.  All of her
   stated goals will &lt;i&gt;assume&lt;/i&gt; that you both know
   this unstated goal is down there at the foundation of
   the entire proposed system.
&lt;/p&gt;

&lt;p&gt;We programmers often forget this simple fact because
   computers have been around so long that we no longer
   remember their original forms, when it was impossible
   to mistake them for anything but
   electronic filing systems.   Way back when
   I was a kid the day came when phone bills started
   arriving with an "IBM Card" slipped into them.  You
   returned the card with your check -- they were moving
   their files into the electronic age.  Then came 
   electronic tickets on airlines -- nothing more than
   a big filing system.  The modern web sites we visit
   to buy tickets are nothing but an interface to what 
   remains a filing system at its heart.
&lt;/p&gt;

&lt;h2&gt;The Virtuous Programmer&lt;/h2&gt;

&lt;p&gt;So if we go back to the idea of "virtue" as the Greeks
   thought of it, which means serving your function well,
   a virtuous programmer will remember always that he
   is but a humble filing clerk.  This is not his entire
   purpose, but it is the beginning of all other purposes
   and the foundation that the higher purposes are 
   built upon.
&lt;/p&gt;

&lt;h2&gt;Not Just Relational&lt;/h2&gt;

&lt;p&gt;This principle is general to all programming.  An 
   email server is a program that must receive and 
   store email for later retrieval.  What good is an
   email server that cannot store anything?  What
   good is a camera without its memory card?  What 
   good is a mobile phone without its contacts list?
   What good is an image editing program if it cannot
   read and write files?
&lt;/p&gt;

&lt;p&gt;So all programs exist to process data, and the 
   business application programmer knows that in his
   context this means we are really making giant sexy
   record-keeping systems.  We are the guys that 
   color-code the filing cabinets.
&lt;/p&gt;
  
   
&lt;h2&gt;Does Not Mean Relational Is Required&lt;/h2&gt;   
   
&lt;p&gt;This idea, that we are filing clerks, does not
   automatically mean we must pick relational databases
   for the persistence layer -- the question of what
   filing system to use is a completely different
   question.
&lt;/p&gt;   
   
&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;If we begin with the idea that the ideal program
   meets the needs of the decision maker, end-user, and
   programmer, and if we consider first the needs of
   the decision maker, then we begin with the universal
   strategic need to keep good records.  The ideal
   programmer knows this need is at the bottom of all
   other needs, and remembers always that we are but
   humble filing clerks.
&lt;/p&gt;


&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;


&lt;p&gt;Other philosophy essays are:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/prepare-now-for-possible-future-head.html"
        &gt;Prepare Now For Possible Future Head Transplant&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/09/quest-for-absolute.html"
        &gt;The Quest for The Absolute&lt;/a&gt;
    &lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2009/03/i-am-but-humble-filing-clerk.html"
        &gt;I Am But A Humble Filing Clerk (this essay)&lt;/a&gt;&lt;/i&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/06/why-i-do-not-use-orm.html"
        &gt;Why I Do Not Use ORM&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"
        &gt;Minimize Code, Maximize Data&lt;/a&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-4444460780155602959?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/4444460780155602959/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=4444460780155602959' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4444460780155602959'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4444460780155602959'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/03/i-am-but-humble-filing-clerk.html' title='I Am But a Humble Filing Clerk'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-7550015338201836087</id><published>2009-02-14T14:05:00.002-05:00</published><updated>2009-02-14T14:08:09.923-05:00</updated><title type='text'>A Comprehensive Database Security Model</title><content type='html'>&lt;p&gt;This week I am taking a bit of a departure.  Normally I write
   about things I have already done, but this week I want to 
   speculate a bit on a security model I am thinking of coding
   up.  Basically I have been asking myself how to create a
   security model for database apps that never requires elevated
   privileges for code, but still allows for hosts sharing multiple
   applications, full table security including row level and 
   column level security, and structural immunity to SQL injection.
&lt;/p&gt;

&lt;h2&gt;The Functional Requirements&lt;/h2&gt;

&lt;p&gt;Let's consider a developer who will be hosting multiple 
   database applications on a server, sometimes instances of the
   same application for different customers.  The applications
   themselves will have different needs, but they all boil down
   to this:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Some applications will allow surfers to join the site 
    and create accounts for themselves, while others will be 
    private sites where an administrator must make user accounts.
    &lt;li&gt;Some applications will not contain sensitive data, and 
    so the site owner wants to send forgotten passwords in email 
    -- which means the passwords must be stored in plaintext.  Other
    site owners will need heightened security that disallows
    storing of passwords in plaintext.
    &lt;li&gt;In both cases, administrators must of course be able to
    manage accounts themselves.
    &lt;li&gt;The system should be structurally immune
    to SQL injection.
    &lt;li&gt;It must be possible to have users with the same user id
    ("Sheilia", "John", etc.) on multiple applications who are
    actually totally different people.
    &lt;li&gt;The application code must never need to run at an
    elevated privilege level for any reason -- not
    even to create accounts on public sites where
    users can join up and conduct transactions.
    &lt;li&gt;It must be possible for the site owners or their
    agents to directly
    connect to the database at very least for querying and
    possibly to do database writes without going through our
    application.
    &lt;li&gt;Users with accounts on one app must never be able to 
    sign on to another app on the same server.
&lt;/ul&gt;

&lt;p&gt;These requirements represent the most flexible possible
   combination of demands that I have so far seen in real life.
   The question is, can they be met while still providing
   security?  The model I'd like to speculate on today says
   yes.
&lt;/p&gt;

&lt;h2&gt;Informed Paranoia Versus Frightened Ignorance&lt;/h2&gt;

&lt;p&gt;Even the most naive programmer knows that the internet
   is not a safe place, but all too often a lot of security
   advice you find is based on &lt;i&gt;frightened ignorance&lt;/i&gt;
   and takes the form, "never do x, you don't know what might
   happen."  If we are to create a strong security model,
   we have to do better than this.  
&lt;/p&gt;

&lt;p&gt;Much better is to strive to be like a strong system architect,
   whose approach is based on &lt;i&gt;informed paranoia&lt;/i&gt;.  
   This hypothetical architect knows everybody is out 
   to compromise his system,  but he seeks a thorough knowledge
   of the inner workings of his tools so that he can
   engineer the vulnerabilities out as much as possible.  
   He is not looking to write rules for the programmer 
   that say "never do this", he is rather looking to make it
   impossible for the user or programmer to compromise
   the system.
&lt;/p&gt;

&lt;h2&gt;Two Examples&lt;/h2&gt;

&lt;p&gt;Let us consider a server hosting two applications, which
   are called "social" and "finance".
&lt;/p&gt;

&lt;p&gt;The "social" application is a social networking site with
   minimal security needs.  Most important is that the site
   owners want members of the general public to sign up, and
   they want to be able to email forgotten passwords
   (and we can't talk them out of it) -- so we
   have to store passwords in plaintext.
&lt;/p&gt;

&lt;p&gt;The "finance" application is a private site used by employees
   of a corporation around the world.  The general public is
   absolutely not welcome.  To make matters worse however, the
   corporation's IT department demands to be able to directly
   connect to the database and write to the database without
   going through the web app.  This means the server will have
   an open port to the database.  Sure it will be protected with
   SSL and passwords, but we must make sure that only users
   of "finance" can connect, and only to their own application.
&lt;/p&gt;
   
&lt;h2&gt;Dispensing With Single Sign-On&lt;/h2&gt;

&lt;p&gt;There are two ways to handle connections to a database.  One
   model is to give users real database accounts, the other is 
   to use a single account to sign on to the database.  Prior to
   the web coming along, there were proponents of both models in
   the client/server world, but amongst web developers the single
   sign-on method is so prevalent that I often wonder if they 
   know there is any other way to do it.
&lt;/p&gt;

&lt;p&gt;Nevertheless, we must dispense with the single sign-on method
   at the start, regardless of how many people think that Moses
   carved it on the third tablet, because it just has too many
   problems:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Single Sign-on &lt;b&gt;is the primary architectural flaw that makes
    SQL injection possible&lt;/b&gt;.  As we will see later, using real
    database accounts makes your site (almost) completely immune
    to SQL injection.
    &lt;li&gt;Single Sign-on requires a connection at the maximum privilege
    level that any system user might have, where the code then decides
    what it will let a particular user do.  This is a complete 
    violation of the requirement that code always run at the lowest
    possible privilege level.
    &lt;li&gt;Single Sign-on makes it impossible to meet the requirement that 
    authorized agents be allowed to connect to the database and
    directly read and write values.  
&lt;/ul&gt;

&lt;p&gt;So single sign-on just won't work with the requirements listed.
   This leads us to creating real accounts on the database server.
&lt;/p&gt;

&lt;h2&gt;Real Accounts and Basic Security&lt;/h2&gt;

&lt;p&gt;When you use a real database account, your code connects
   to the database using the username and password provided
   by the user.  Anything he is allowed to do your code will
   be allowed to do, and anything he is not allowed to do will
   throw an error if your code tries to do it.
&lt;/p&gt;

&lt;p&gt;This approach meets quite a few of our requirements nicely.
   A site owner's IT department can connect with the same 
   accounts they use on the web interface -- they have
   the same privileges in both cases.  Also, there is no
   need to ever have application code elevate its privilege
   level during normal operations, since no regular users should ever be 
   doing that.  This still leaves the issue of how to create
   accounts, but we will see that below.
&lt;/p&gt;

&lt;p&gt;A programmer who thinks of security in terms of &lt;i&gt;what code
   can run&lt;/i&gt; will have a very hard time wrapping his head around
   using real database accounts for public users.  The trick to
   understanding this approach
   is to forget about code for a minute and to
   think about tables.  The basic fact of database application
   security is that &lt;i&gt;all security
   resolves to table permissions&lt;/i&gt;.  In other words, our security
   model is all about who can read or write to what tables, it is
   not about who can run which program.
&lt;/p&gt;
   
&lt;p&gt;If we grant public users real database accounts, and they
   connect with those accounts, the security must be handled 
   within the database itself, and it comes down to:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Defining "groups" as collections of users who share
    permissions at the table level.
    &lt;li&gt;Deciding which groups are allowed select, insert, update,
    and delete privileges on which tables.
    &lt;li&gt;Granting and revoking those privileges on the server itself
    when the database is built.
    &lt;li&gt;At very least row-level security will be required, wherein
    a user can only see and manipulate certain rows in a table.
    This is how you keep users from using SQL Injection to mess
    with each other's order history or member profiles.
    &lt;li&gt;Column security is also very nice to finish off the
    picture, but we will not be talking about that today as it
    does not play into the requirements.
&lt;/ul&gt;

&lt;p&gt;Now we can spend a moment and see why this approach eliminates
   most SQL Injection vulnerabilities.  We will imagine a table of
   important information called SUPERSECRETS.  If somebody could
   slip in a SQL injection exploit and wipe out this table we'd all
   go to jail, so we absolutely cannot allow this.
   Naturally, most users would have no privileges on
   this table -- even though they are directly connected to the
   database they cannot even see the table exists, let alone 
   delete from it.  So if our hypothetical black hat
   somehow slips in ";delete from supersecrets"
   and our code fails to trap for it, nothing happens.  They have
   no privilege on that table.  On the other side of things, consider
   the user who is privileged to delete from that table.  If this
   user slips in a ";delete from supersecrets" he is only going to
   the trouble with SQL Injection &lt;i&gt;to do something he is perfectly
   welcome to do anyway through the user interface.&lt;/i&gt;  So much
   for SQL injection.
&lt;/p&gt;
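
&lt;p&gt;As a rough illustration of that argument, here is a toy
   model in Python.  It is only a sketch: a real system would
   rely on the database server's own grants, and the group
   names, the GRANTS dictionary and the is_allowed helper are
   all hypothetical.  The point it shows is that an action is
   refused unless some group the user belongs to was explicitly
   granted it.
&lt;/p&gt;

```python
# Toy model only: a real system relies on the database server's own GRANTs.
# The group names, table names, and helper below are hypothetical.

# Deny-by-default: a (group, table) pair that is absent has no privileges.
GRANTS = {
    ("members", "profiles"): {"select", "update"},
    ("secret_keepers", "supersecrets"): {"select", "delete"},
}


def is_allowed(groups, action, table):
    """Return True only if one of the user's groups was granted the action."""
    return any(action in GRANTS.get((g, table), set()) for g in groups)


# An ordinary member who slips in ";delete from supersecrets" is refused.
print(is_allowed({"members"}, "delete", "supersecrets"))         # False
# A privileged user gains nothing he could not already do in the interface.
print(is_allowed({"secret_keepers"}, "delete", "supersecrets"))  # True
```

&lt;p&gt;Nothing in this check depends on how the statement
   reached the server, which is why the injected statement and
   the legitimate one are treated exactly alike.
&lt;/p&gt;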

&lt;p&gt;To repeat a point made above: row-level security is a must.
   If you grant members of a social site global UPDATE privileges
   on the PROFILES table, and you fail to prevent a SQL Injection,
   all hell could break loose.  Much better is the ability to
   limit the user to seeing only his own row in the PROFILES table,
   so that once again you have created a structural immunity
   to SQL injection.
&lt;/p&gt;
   
&lt;h2&gt;Anonymous Access&lt;/h2&gt;

&lt;p&gt;Many public sites allow users to see all kinds of information
   when they are not logged on.  The most obvious example would
   be an eCommerce site that needs read access to the ITEMS table,
   among others.  Some type of anonymous access must be allowed
   by our hypothetical framework.
&lt;/p&gt;

&lt;p&gt;For our two examples, the "social" site might allow limited
   viewing of member profiles, while the "finance" application
   must show absolutely nothing to the general public.
&lt;/p&gt;

&lt;p&gt;If we want a general solution that fits both cases, we opt
   for a &lt;i&gt;deny-by-default&lt;/i&gt; model and allow each application
   to optionally have an anonymous account.
&lt;/p&gt;

&lt;p&gt;First we consider deny-by-default.  This means simply that
   our databases are always built so that no group has any
   permissions on any tables.  The programmer of the "social"
   site now has to grant certain permissions to the anonymous
   account, while the programmer of the "finance" application
   does nothing - he already has a secure system.
&lt;/p&gt;

&lt;p&gt;But still the "finance" site is not quite so simple.  An anonymous
   user account with no privileges &lt;i&gt;can still log in&lt;/i&gt;, and
   that should make any informed paranoid architect nervous.
   We should extend 
   the deny-by-default philosophy so the framework will
   not create an anonymous
   account unless requested.  This way the programmer of the
   "finance" application still basically does nothing, while
   the programmer of the "social" site must flip a flag to create
   the anonymous account.
&lt;/p&gt;
  

&lt;h2&gt;Virtualizing Users&lt;/h2&gt;

&lt;p&gt;If we are having real database accounts, there is one small
   detail that has to be addressed.  If the "social" site has
   a user "johnsmith" and the finance application has a user
   of the same name, but they are totally different people,
   we have to let both accounts exist but be totally separate.
&lt;/p&gt;

&lt;p&gt;The answer here is to alias the accounts.  The database
   server would actually have accounts "finance_johnsmith" and
   "social_johnsmith".  Our login process would simply take
   the username provided and prepend the application code to it
   when authenticating on the server.  'nuf said on that.
&lt;/p&gt;
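
&lt;p&gt;A minimal sketch of that aliasing step in Python might look
   like the following.  The function name, the underscore
   separator and the alphanumeric check are my own assumptions
   for illustration, not part of any framework.
&lt;/p&gt;

```python
# Hypothetical helper: the name, the underscore separator, and the
# alphanumeric rule are assumptions made for this sketch.
def database_account(app_code, username):
    """Build the real database account name for a user of one application."""
    if not app_code.isalnum() or not username.isalnum():
        # Refuse anything that could blur the boundary between prefix and name.
        raise ValueError("application code and username must be alphanumeric")
    return app_code + "_" + username


print(database_account("social", "johnsmith"))   # social_johnsmith
print(database_account("finance", "johnsmith"))  # finance_johnsmith
```

&lt;p&gt;Rejecting anything but letters and digits keeps a user from
   choosing a name that already contains a separator, so two
   different applications can never resolve to the same account.
&lt;/p&gt;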

&lt;h2&gt;Allowing Public Users To Join&lt;/h2&gt;

&lt;p&gt;The "social" site allows anybody to join up and create
   an account.  This means that somehow the web application
   must be able to create accounts on the database server.
   Yet it must do this without allowing the web code to
   elevate its privileges, and while preventing the disaster
   that would ensue if a user on the "social" site somehow
   got himself an account on the "finance" site.
&lt;/p&gt;

&lt;p&gt;Believe it or not, this is the easy part!  Here is how it
   works for the "social" site:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Create a table of users.  The primary key is the user_id
    which prevents duplication.
    &lt;li&gt;For the social site, there is a column called 
    PASSWORD that stores the password in plaintext.
    &lt;li&gt;Allow the anonymous account to INSERT into this table!
    (Remember though that deny-by-default means that so far
    this account has no other privileges).
    &lt;li&gt;Put an INSERT trigger on the table that automatically creates
    an aliased user account, so that "johnsmith" becomes
    "social_johnsmith".  The trigger also sets the password.
    &lt;li&gt;A DELETE trigger on the table would delete users if
    the row is deleted.
    &lt;li&gt;An UPDATE trigger on the table would update the password
    if the user UPDATES the table.
    &lt;li&gt;Row level security is an absolute must.
    Users must be able to
    SELECT from and UPDATE the table, but only their own row.  If your
    database server or framework cannot support row-level
    security, it's all out the window.
&lt;/ul&gt;

&lt;p&gt;This gives us a system that almost gets us where we need
   to be: the general public can create accounts, 
   the web application does not need to elevate its privileges,
   users can set and change their passwords, and no user can
   see or set anything for any other user.  However, this leaves
   the issue of password recovery.
&lt;/p&gt;

&lt;p&gt;In order to recover passwords and email them to members of
   the "social" site, it is tempting to think that 
   the anonymous account must be able to
   somehow read the users table, but that is no good because
   then we have a &lt;i&gt;structural flaw&lt;/i&gt; where a successful
   SQL injection would expose user accounts.  However, this
   also turns out to be easy.  There are two options:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Write a stored procedure that the anonymous user is
    free to execute, which does not return a password but
    actually emails it directly from within the database
    server.  This requires your database server be able to
    send emails.  (Postgres can, and I assume SQL Server
    can, and I don't really know about MySQL).
    &lt;li&gt;Create a table for password requests, allow inserts
    to it but nothing else.  A trigger sends the email.
    In this approach you can track email recovery requests.
&lt;/ul&gt;

&lt;p&gt;For the "finance" application we cannot allow any of this
   to happen, so again we go to the deny-by-default idea.  None
   of the behaviors above will happen unless the programmer
   sets a flag to turn them on when the database is built.
&lt;/p&gt;

&lt;p&gt;This does leave the detail of how users of the "finance"
   application will reset their passwords.
   For details on how a secure app can still allow password
   resets, see my posting of Sept 7 2008 &lt;a href=
   "http://database-programmer.blogspot.com/2008/09/advanced-table-design-secure-password.html"
   &gt;Secure Password Resets&lt;/a&gt;.
&lt;/p&gt;    

&lt;h2&gt;One More Detail on Public Users&lt;/h2&gt;

&lt;p&gt;We still have one more detail to handle for public users. 
   Presumably a user, having joined up, has more privileges than
   the anonymous account.  So the web application must be able
   to join them into a group without elevating its privileges.
   The solution here is the same as for creating the account:
   there will be a table that the anonymous user can make
   inserts into (but nothing else), and a trigger will join
   the user to whatever group is named.
&lt;/p&gt;

&lt;p&gt;Except for one more detail.  We cannot let the user join
   whatever group they want, only the special group for members.
   This requirement can be met by defining the idea of a "freejoin"
   group and also a "solo" group.  If the anonymous user inserts
   into a user-group table, and the requested group is flagged
   as allowing anybody to join, the trigger will allow it, but
   for any other group the trigger will reject the insert.
   The "solo" idea is similar, it means that if a user is in
   the "members" group, and that group is a "solo" group, they
   may not join any other groups.  This further jails in 
   members of the general public.
&lt;/p&gt;
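
&lt;p&gt;The "freejoin" and "solo" rules can be sketched with a
   trigger.  The example below uses Python's built-in sqlite3
   module purely as a stand-in for the real server, and every
   table, column and trigger name in it is hypothetical.  SQLite
   has no user accounts, so this shows only the trigger logic,
   not the privileges around it.
&lt;/p&gt;

```python
import sqlite3

# Illustration only: SQLite stands in for the real server, and all table,
# column, and trigger names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE groups (
    group_id TEXT PRIMARY KEY,
    freejoin INTEGER NOT NULL DEFAULT 0,  -- 1 means anybody may join
    solo     INTEGER NOT NULL DEFAULT 0   -- 1 means members may join nothing else
);
CREATE TABLE user_groups (
    user_id  TEXT NOT NULL,
    group_id TEXT NOT NULL REFERENCES groups(group_id),
    PRIMARY KEY (user_id, group_id)
);
CREATE TRIGGER user_groups_guard BEFORE INSERT ON user_groups
BEGIN
    -- Reject any group that is not flagged as freejoin.
    SELECT RAISE(ABORT, 'group is not freejoin')
     WHERE NOT EXISTS (SELECT 1 FROM groups g
                        WHERE g.group_id = NEW.group_id AND g.freejoin = 1);
    -- Reject the insert if the user already belongs to a solo group.
    SELECT RAISE(ABORT, 'user is in a solo group')
     WHERE EXISTS (SELECT 1 FROM user_groups ug
                     JOIN groups g ON g.group_id = ug.group_id
                    WHERE ug.user_id = NEW.user_id AND g.solo = 1);
END;
INSERT INTO groups VALUES ('members', 1, 1);
INSERT INTO groups VALUES ('newsletter', 1, 0);
INSERT INTO groups VALUES ('user_administrators', 0, 0);
""")


def try_join(user_id, group_id):
    """Attempt the insert; return True if the trigger let it through."""
    try:
        conn.execute("INSERT INTO user_groups VALUES (?, ?)", (user_id, group_id))
        return True
    except sqlite3.DatabaseError:
        return False


print(try_join("johnsmith", "members"))         # True: freejoin group
print(try_join("mary", "user_administrators"))  # False: not freejoin
print(try_join("johnsmith", "newsletter"))      # False: already in a solo group
```

&lt;p&gt;Because the rule lives in the trigger, the anonymous account
   needs nothing more than INSERT on the cross-reference table.
&lt;/p&gt;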

&lt;h2&gt;Almost Done: User Administration&lt;/h2&gt;

&lt;p&gt;In the last two sections we saw the idea of a table of users
   and a cross-reference of users to groups.  This turns out to
   solve another issue we will have: letting administrators
   manage groups.  If we define a group called "user_administrators"
   and give its members total
   power on these tables, along with CRUD screens 
   for managing them, then we have a user administrator system.
   This works for both the "social" and the "finance" application.
&lt;/p&gt;

&lt;p&gt;The triggers on the table have to be slightly different 
   for the two cases, but coding them up accordingly is a
   small exercise.
&lt;/p&gt;

&lt;h2&gt;Cross-Database Access&lt;/h2&gt;

&lt;p&gt;Believe it or not, the system outlined above has met all of
   our requirements except one.  So far we have a system that never 
   requires the web server to have any elevated privileges within
   the database, allows members of the public to join some sites
   while barring them from others, is structurally immune from
   SQL injection, allows different people on different sites to
   have the same user id, and allows administrators
   of both sites to directly manage accounts.  Moreover, we 
   can handle both plaintext passwords and more serious 
   reset-only situations.
&lt;/p&gt;

&lt;p&gt;This leaves only one very thorny issue: cross-database
   access.  The specific database server I use most is PostgreSQL,
   and this server has a problem (for this scenario) anyway,
   which is that out-of-the-box, a database account can connect
   to any database.  This does not mean the account has any
   privileges on the database, but we very seriously do not want
   this to happen at all.  If a member of the "social" site can
   connect to the "finance" app, we have a potential vulnerability
   even if he has zero privileges in that database.  We would be
   much happier if he could not connect at all.
&lt;/p&gt;

&lt;p&gt;In Postgres there is a solution to this, but I've grown to
   not like it.  In Postgres you can specify (with the samerole
   keyword in pg_hba.conf) that a user can only
   connect to a database if they are in a group that has the 
   same name as the database.  This is easy to set up, but it
   requires changing the default configuration of Postgres.  
   However, for the sheer challenge of it I'd like to work out
   how to do it without requiring that change.  So far I'm 
   still puzzling this out.  I'd also like to know that the
   approach would work at very least on MS SQL Server and
   MySQL.
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Most of what is in this week's essay is not that radical to
   any informed database veteran.  But to web programmers 
   who were unfortunate enough to grow up in the world
   of relational-databases-must-die nonsense, it is probably
   hard or impossible to imagine a system where users are
   connecting with real database accounts.  The ironic thing
   is that the approach described here is far more secure
   than any single sign-on system, but it requires the programmer
   to shift thinking away from action-based code-centric models
   to what is really going on: table-based privileges.  Once
   that hurdle is past, the rest of it comes easy.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-7550015338201836087?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/7550015338201836087/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=7550015338201836087' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/7550015338201836087'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/7550015338201836087'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/02/comprehensive-database-security-model.html' title='A Comprehensive Database Security Model'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-9115573581013299810</id><published>2009-02-01T16:29:00.000-05:00</published><updated>2009-02-01T16:30:55.164-05:00</updated><title type='text'>This Application Has Unique Business Rule Needs</title><content type='html'>&lt;p&gt;No it does not.  If it did, then your customer/employer
   would be
   doing something no other human being has ever done, which
   is unlikely in the extreme.  The application may be 
   unique in its particulars, but it is almost certainly
   extremely common in its patterns.  This week we will see
   how "unique" needs are in fact nothing but common ordinary
   development projects.
&lt;/p&gt;

&lt;h2&gt;Beginning With the Conclusion&lt;/h2&gt;

&lt;p&gt;I have had this conversation with many programmers over 
   the past few years, and it always follows the same
   patterns.  The easy part of the argument is showing the
   programmer that what he thinks is special or unique
   is in fact common.  The much harder part, because it
   involves the delicate human ego, is showing the programmer
   that he has not seen this because he is ignorant.  This
   is not fun to do, and I myself usually skip it because it is
   usually not worth the trouble.
&lt;/p&gt;

&lt;h2&gt;Path 1: Details&lt;/h2&gt;

&lt;p&gt;Occasionally I speak to a programmer who thinks he has
   a unique situation.  His claim begins with 
   the mountain of details he must handle, details which appear
   to be contradictory, subtle, and overall perplexing.  He
   wonders if some new approach is required to handle them.
&lt;/p&gt;

&lt;p&gt;In answering this claim, we begin with the easy part, 
   showing that the situation is itself not unique.  In short,
   all worthwhile projects involve mountains of detail, so 
   there is nothing special there.  When it comes to the 
   subtleties and the maze of exceptions and special cases,
   these are common in mature businesses that have evolved
   this complexity in response to business needs over the years.
   So again there is nothing unique here, the programmer's 
   situation is again common.
&lt;/p&gt;

&lt;p&gt;At this point we have to ask how the programmer will deal
   with this perplexing mountain of detail.  If he knows 
   what he is doing, he will give the general answer that he
   is going to break it down as much as possible into 
   independent smaller problems that can be solved on their
   own.  Since this is nothing more than how all programmers
   solve complex problems, the entire "uniqueness" claim
   has completely collapsed.  His project is utterly common.
&lt;/p&gt;

&lt;p&gt;The much harder part of the conversation comes if the
   programmer does not know how to break down
   the problem.  For instance, if the problem is all about
   a fiendishly complex pricing system with lots of discounts
   and pricing levels, and the programmer does not know that
   he needs to begin with the database, and he further does not
   want to hear that, well, there is not much I can do for
   him.  He will end up working a lot harder than he needs
   to, and will probably remain convinced he is dealing with
   something "unique".
&lt;/p&gt;   
   
&lt;p&gt;But let's go a little deeper into that example of the
   complicated pricing system.  Why do I claim that he must
   start with the tables, and that he is wasting time
   if he does not?  Well, a complete answer is much more than
   will fit here, and in fact I hit that theme over and over
   in these essays, but it comes down to:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;He must have an accurate and precise description of
    the details that govern the pricing scheme.  That is
    what tables are for.
    &lt;li&gt;In working out the mechanics of the tables, particularly
    their primary and foreign keys, he will come to
    his most complete understanding of the mechanisms
    involved.
    &lt;li&gt;When the tables completely reflect the details he
    must work with, the code will just about write itself.
    &lt;li&gt;Lastly, but probably most importantly, the customer
    will expect to control the pricing system by adjusting
    the parameters at all levels.  Again, that is what tables
    are for.  The user is in control of the pricing system
    if he can edit the tables (because of course he cannot
    edit the code).
&lt;/ul&gt;
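
&lt;p&gt;To make the last point concrete, here is a minimal sketch in Python
   using the sqlite3 module from the standard library.  The tables
   (ITEMS, CUSTOMER_LEVELS, CUSTOMERS), their columns, and the function
   names are all invented for illustration, and a real pricing system
   will have far more tables than this.  The point is only that the
   discount rule is a row the user can edit, while the code never changes.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3


def build_pricing_db():
    """Create an in-memory database with hypothetical pricing tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (
            sku   TEXT PRIMARY KEY,
            price NUMERIC NOT NULL
        );
        CREATE TABLE customer_levels (
            level        TEXT PRIMARY KEY,
            pct_discount NUMERIC NOT NULL
        );
        CREATE TABLE customers (
            customer TEXT PRIMARY KEY,
            level    TEXT NOT NULL REFERENCES customer_levels (level)
        );
        INSERT INTO items VALUES ('WIDGET', 100.00);
        INSERT INTO customer_levels VALUES ('RETAIL', 0), ('WHOLESALE', 15);
        INSERT INTO customers VALUES ('ACME', 'WHOLESALE'), ('BOB', 'RETAIL');
    """)
    return db


def net_price(db, customer, sku):
    """Look up a net price; every pricing rule lives in a table, none in code."""
    row = db.execute("""
        SELECT i.price * (100 - l.pct_discount) / 100.0
          FROM customers c
          JOIN customer_levels l ON l.level = c.level
          JOIN items i ON i.sku = ?
         WHERE c.customer = ?
    """, (sku, customer)).fetchone()
    return row[0] if row else None


db = build_pricing_db()
print(net_price(db, "ACME", "WIDGET"))

# The user changes the rule by editing a row, not by editing code.
db.execute("UPDATE customer_levels SET pct_discount = 20 WHERE level = 'WHOLESALE'")
print(net_price(db, "ACME", "WIDGET"))
```
&lt;/pre&gt;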
   
&lt;h2&gt;Path 2: Combinations&lt;/h2&gt;

&lt;p&gt;Once upon a time we had simple desktop business applications,
   games, and then this weird new thing, "the web".  Now they
   are all mixed together, as we play games on the internet that
   are tied into huge databases.  Modern applications often
   combine technologies that used to be comfortably separate.
   On any particular project, 
   some of the requirements look like they
   can be met with an RDBMS, some require management and 
   delivery of media such as MP3 or video, and the programmer is
   told as well that he must provide RSS feeds and import data coming in
   XML format.  Perhaps as well there will be stone tablets
   and papyrus scrolls.
&lt;/p&gt;

&lt;p&gt;This programmer may believe he is in a unique situation
   because of this &lt;i&gt;combination&lt;/i&gt; of needs.  Because no single
   toolset out there can meet the entire project, perhaps this
   is something never before seen?  But this does
   not hold up.  Just like the argument about complexity,
   he must break the problem up correctly, and when he has done
   so he will have a perfectly ordinary project.  Though I might
   add it will also be a very &lt;i&gt;interesting&lt;/i&gt; project and
   probably a lot of fun.
&lt;/p&gt;

&lt;h2&gt;In The End It Is All About Patterns&lt;/h2&gt;

&lt;p&gt;I have given two examples above taken from my own experience
   where programmers have claimed to me that they faced some 
   unique situation.  There are many other cases, and they always
   make perfect sense to the person who thinks he has discovered
   something new.  The biggest flaw in the programmer's thinking
   is failing to distinguish between &lt;i&gt;particulars&lt;/i&gt; and
   &lt;i&gt;patterns&lt;/i&gt;.
&lt;/p&gt;

&lt;p&gt;My claim in this essay is that the patterns of all problems
   are the same.  Somebody has seen it before, somebody has done
   it before, the answer is out there.  The process of analysis
   and programming is about slotting your particulars into the
   patterns that have already been established.
&lt;/p&gt;

&lt;p&gt;In the broadest sense all programs process data, and 
   particular programs break down into broad patterns of data
   access and manipulation.  Sometimes you have a broad range
   of users putting in data with very little massaging 
   (think Twitter) and sometimes you have one group controlling
   much of the data while others make use of it (think
   Amazon), and sometimes your data is mostly relational
   and table based (think any ecommerce or biz app) and
   sometimes it's mostly media (think YouTube).
&lt;/p&gt;

&lt;p&gt;Once you have these broad patterns identified, you can then
   proceed to make use of established practices within 
   each particular area.  What is the best way to provide
   sensitive data on the web and protect it from unauthorized
   eyes?  Somebody has done it before.  What is the best way
   to track large amounts of media?  Somebody has done it 
   before.  What is the best way to set up a complex pricing
   system with lots of discounts and pricing levels?  Somebody
   has done it before.  In all cases, your particulars may
   be different, but the patterns will be the same.
&lt;/p&gt;

&lt;h2&gt;Conclusion: Find the Patterns&lt;/h2&gt;

&lt;p&gt;Whenever I find myself looking at a situation that appears
   to be new, I try to tell myself that it may be new to me,
   but it is not likely to be new to the human race.  If it 
   does not appear to follow a well-known pattern then I
   proceed as if &lt;i&gt;I have not yet recognized the pattern&lt;/i&gt;
   and continue to analyze and break it apart until the pattern
   emerges.  So far it always has.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-9115573581013299810?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/9115573581013299810/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=9115573581013299810' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/9115573581013299810'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/9115573581013299810'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/02/this-application-has-unique-business.html' title='This Application Has Unique Business Rule Needs'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-4478895011513319689</id><published>2009-01-25T16:27:00.005-05:00</published><updated>2010-11-28T22:11:01.380-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='calculated values'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Dictionary'/><category scheme='http://www.blogger.com/atom/ns#' term='denormalization'/><title type='text'>The Data Dictionary and Calculations, Part 2</title><content type='html'>&lt;p&gt;There are links to related essays on normalization and denormalization at the &lt;a href="#bottom"&gt;bottom of this 
post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;The Simple Case Is Not Much Help&lt;/h2&gt;

&lt;p&gt;We will begin by examining a simple case of a shopping
   cart.  We have columns QTY and PRICE, and we want to
   add the column EXTENDED_PRICE that will contain 
   PRICE * QTY.  Our dictionary might look something like
   this:
&lt;/p&gt;

&lt;pre&gt;
table orderlines:
    ...details...
    
    column price:
        # see last week's essay for details on FETCH
        automation_id: FETCH
        auto_formula: items.price
    
    # A user-entered value, no automation
    column qty:
    
    # The extended price:
    column extended_price:
        automation_id: extend
        auto_formula: price * qty
&lt;/pre&gt;

&lt;p&gt;This seems simple enough: we have specified the formula right in
   the table definition, and now we are free to make use of that
   formula in any way we want -- either by generating code or
   interpreting it at run-time.
&lt;/p&gt;

&lt;p&gt;Unfortunately it is a bad idea to start coding right now 
   with this example.  The problem is that it is too simple,
   and will lead us down paths that cause great problems when
   we hit more complex cases.  We must begin with a more complex
   case before we consider how to use this formula in our
   framework.
&lt;/p&gt;

&lt;h2&gt;Not All Items Are Taxable&lt;/h2&gt;

&lt;p&gt;Consider the case where you have a shopping cart online and
   you must collect sales tax, but not all items are taxable.
   This means you need conditional logic of some sort, you must
   look at a flag and then decide whether to add tax.  Here is
   a first stab at what this might look like:
&lt;/p&gt;
    
&lt;pre&gt;
table orderlines:
    ...details....
    
    # We'll skip the details on these columns for now
    column price:    
    column qty:
    column flag_taxable:
    column taxrate:
    
    # We need to know the extended amount, that is
    # what we will tax
    column extended_amount:
        automation_id: extend
        auto_formula: price * qty
    
    # Here is the column that matters
    column tax:
        automation_id: extend
        auto_formula: CASE WHEN flag_taxable = 'Y'
                           THEN taxrate * extended_amount
                           ELSE 0 END
&lt;/pre&gt;

&lt;p&gt;While this looks like a simple enough extension to the first
   example, it gets us into a thorny decision, the decision
   between &lt;i&gt;parsing&lt;/i&gt; and &lt;i&gt;assembling&lt;/i&gt;.
&lt;/p&gt;

&lt;h2&gt;Parse Versus Assemble&lt;/h2&gt;

&lt;p&gt;Before I get into the parse vs. assemble question, let me 
   pull back and explain why the example bothers me, and why
   it is worth an entire essay to discuss.  In short, we intend
   to use the dictionary to implement a radical form of DRY -
   Don't Repeat Yourself (see &lt;a href=
   "http://en.wikipedia.org/wiki/Don%27t_repeat_yourself"
   &gt;the Wikipedia article on DRY&lt;/a&gt;).  Once we have specified the
   formula in the dictionary, we want to use it for all 
   code generation and docs generation at the very least, but we
   may also want to refer to the formulas in Java code (or PHP,
   Python, Ruby etc.) or even in Javascript code on the browser.
&lt;/p&gt;

&lt;p&gt;But the problem with the example is that it is coded in
   SQL.  In the form I presented, it can be used for generating
   triggers, but not for anything else, unless you intend to
   use a parser to split it all up into pieces that can be
   reassembled for different presentations.  The example as 
   written is useful only for a single purpose -- but everything
   in our dictionary ought to be useful at any layer in the
   framework for any purpose.
&lt;/p&gt;

&lt;p&gt;But it gets worse.  What if the programmer uses a dialect
   of SQL aimed at one platform that does not work on another?
   To guarantee cross-server compatibility, we not only have to
   parse the phrase, but then re-assemble it.
&lt;/p&gt;

&lt;p&gt;There is a third argument against the use of SQL expressions.
   We may be able to parse the expression and satisfy ourselves
   that it is valid, but that still does not mean it will
   work -- it may refer to non-existent columns or require 
   typecasts that the programmer did not provide.  This leads to
   one terrible event that you ought to be able to prevent when
   you use a dictionary: having an upgrade run successfully only
   to hit a run-time error when somebody uses the system.
&lt;/p&gt;

&lt;p&gt;A much simpler method is to &lt;i&gt;assemble&lt;/i&gt; expressions by 
   having the programmer provide formulas that are already cut up
   into pieces.
&lt;/p&gt;

&lt;h2&gt;The Assembly Route&lt;/h2&gt;

&lt;p&gt;So I want to have formulas, including conditionals, and I want
   to be able to use the formulas in PHP, Java, Javascript, inside
   of triggers, and I want to be able to generate docs out of them
   that do not contain code fragments, and I want to be able to
   guarantee when an upgrade has run that there will be no errors
   introduced through programmer mistakes in the dictionary.
   The way to do this is to specify the formulas a little
   differently:
&lt;/p&gt;

&lt;pre&gt;
    column tax:
        calculate:
            case 00:
                compare: @flag_taxable = Y
                return: @taxrate * @extended_amount
            case 01:
                return: 0
&lt;/pre&gt;

&lt;p&gt;Here are the changes I have made for this version:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The programmer must specify each case in order
    &lt;li&gt;Each case is a compare statement followed by a return
    &lt;li&gt;A case without a compare is unconditional, it always
        returns and processing ends
    &lt;li&gt;I stuck little '@' signs in front of column names,
        I will explain those in a moment.
&lt;/ol&gt;

&lt;p&gt;In short, we want the programmer to provide us with the
   conditional statements already parsed out into little pieces,
   so when we load them they look like data instead of code.
   We now have the responsibility for assembling code fragments,
   but in exchange we have pre-parsed data that can be handed
   to any programming language and used.
&lt;/p&gt;
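
&lt;p&gt;As a rough sketch of what assembly might look like, the Python below
   takes the conditional tax formula in its pre-parsed form and assembles
   it twice, once as a SQL CASE expression and once as Javascript.
   Everything here is an assumption made for illustration: the in-memory
   layout of the cases, the function names, the rule that tokens are
   separated by spaces, and the reading of '@' as marking a column name
   that can be checked against the dictionary.
&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical in-memory form of the conditional tax formula.
TAX_CASES = [
    {"compare": "@flag_taxable = Y", "return": "@taxrate * @extended_amount"},
    {"return": "0"},
]
# Columns the dictionary is assumed to know about for this table.
KNOWN_COLUMNS = {"flag_taxable", "taxrate", "extended_amount"}
OPERATORS = {"*", "+", "-", "/"}


def render(expr, column_prefix, equals):
    """Turn one space-separated expression into target-language text."""
    out = []
    for token in expr.split():
        if token.startswith("@"):
            name = token[1:]
            if name not in KNOWN_COLUMNS:
                # Caught while building, not when a user hits the code.
                raise ValueError("unknown column: " + name)
            out.append(column_prefix + name)
        elif token == "=":
            out.append(equals)
        elif token in OPERATORS or token.replace(".", "", 1).isdigit():
            out.append(token)
        else:
            # Anything else is treated as a string literal.
            out.append("'" + token + "'")
    return " ".join(out)


def to_sql(cases):
    """Assemble the cases into a CASE expression for a trigger."""
    parts = ["CASE"]
    for case in cases:
        value = render(case["return"], "new.", "=")
        if "compare" in case:
            test = render(case["compare"], "new.", "=")
            parts.append("WHEN " + test + " THEN " + value)
        else:
            parts.append("ELSE " + value)
            break  # an unconditional case ends processing
    parts.append("END")
    return " ".join(parts)


def to_javascript(cases):
    """Assemble the same cases into the body of a Javascript function."""
    lines = []
    for case in cases:
        value = render(case["return"], "row.", "===")
        if "compare" in case:
            test = render(case["compare"], "row.", "===")
            lines.append("if (" + test + ") return " + value + ";")
        else:
            lines.append("return " + value + ";")
            break
    return " ".join(lines)


print(to_sql(TAX_CASES))
print(to_javascript(TAX_CASES))
```
&lt;/pre&gt;

&lt;p&gt;Because column references are checked while assembling, a formula that
   names a missing column fails during the build rather than at run-time.
&lt;/p&gt;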

&lt;h2&gt;Conclusion: Assembly Means Data&lt;/h2&gt;

&lt;p&gt;The decision to go the assembly route is simply another example
   of the &lt;a href=
   "http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"
   &gt;Minimize Code, Maximize Data&lt;/a&gt; principle.  The dictionary 
   itself should be composed entirely of data values, no code snippets
   should be allowed to sneak in.  The reason is simple.  No matter what
   route we follow we will have to validate and assemble the formula -
   be it for PHP, Javascript, or an alternate database server.  But if
   we let the programmer give us code snippets we have the extra 
   burden of parsing as well.  Who needs it?
&lt;/p&gt;


&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;


&lt;p&gt;The normalization essays on this blog are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html"
    &gt;Revisiting Normalization and Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"&gt;Pay Me Now Or Pay Me Later&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"&gt;The Argument for Normalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"&gt;Third Normal Form and Calculated Values&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"&gt;The Argument for Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"&gt;Denormalization Patterns&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;Keeping Denormalized Values Correct&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"&gt;Triggers, Encapsulation and Composition&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"&gt;The Data Dictionary and Calculations, Part 1&lt;/a&gt;
    &lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"&gt;The Data Dictionary and Calculations, Part 2 (this essay)&lt;/a&gt;&lt;/i&gt;
    
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-4478895011513319689?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/4478895011513319689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=4478895011513319689' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4478895011513319689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4478895011513319689'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html' title='The Data Dictionary and Calculations, Part 2'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-8096860715885677410</id><published>2009-01-18T19:27:00.005-05:00</published><updated>2010-11-28T22:11:01.381-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='calculated values'/><category scheme='http://www.blogger.com/atom/ns#' term='Data Dictionary'/><category scheme='http://www.blogger.com/atom/ns#' term='denormalization'/><title type='text'>The Data Dictionary and Calculations, Part 1</title><content type='html'>&lt;p&gt;The stunning power of a data dictionary comes into play once
   the dictionary contains formulas for calculated values.
   The dictionary can then be used to generate code, and also
   to generate documentation.  This double-win is not available
   without the calculations because the resulting docs and
   database would be &lt;i&gt;incomplete&lt;/i&gt;, requiring tedious and
   error-prone manual completion.
&lt;/p&gt;


&lt;p&gt;There are links to related essays on normalization and denormalization at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;


&lt;h2&gt;Calculations and Normalization&lt;/h2&gt;

&lt;p&gt;Before I begin, I will point out that all calculated values 
   stored in a database are &lt;i&gt;denormalizing&lt;/i&gt;, they all 
   introduce redundancies.  This does not mean they are bad,
   it just means you need a way to make sure they stay 
   correct (see &lt;a href=
   "http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"
   &gt;Keeping Denormalized Values Correct&lt;/a&gt;, also see &lt;a href=
   "http://www.andromeda-project.org/pages/cms/normalization+and+automation.html"
   &gt;Normalization and Automation&lt;/a&gt;).  If you cannot
   keep them correct, they will get very bad very fast.  This essay
   will show you one approach to ensuring calculated values
   are always correct.
&lt;/p&gt;

&lt;p&gt;I also have to point out how important
   it is to begin by normalizing your database (to at least 3NF)
   and &lt;i&gt;adding calculations only upon the strong foundation
   of a normalized database&lt;/i&gt;.  If you do not normalize first,
   you will discover that it is impossible to work up formulas
   that make any sense -- values will always seem to be not quite
   where you need them, and it will always seem you need one more
   kind of calculation to support, and it will be very difficult
   to write the code generator that gives strong results.
   But if you build on a normalized database, it turns out you
   only need a few features in your dictionary and your code
   generator.
&lt;/p&gt;

&lt;h2&gt;Use Denormalization Patterns&lt;/h2&gt;

&lt;p&gt;Once you have normalized your database, you will find that
   your calculations all fall into three basic categories
   (detailed in April 2008 in &lt;a href=
   "http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"
   &gt;Denormalization Patterns&lt;/a&gt;).  These three patterns are:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;FETCH operations, like copying an item's price from
    the ITEMS table to the ORDERLINES table.
    &lt;li&gt;EXTEND operations, which are calculations within a row,
    such as assigning EXTENDED_PRICE the value of QUANTITY * PRICE.
    &lt;li&gt;AGGREGATE operations, like writing the SUM of the lines of an
    order to the order header.
&lt;/ul&gt;
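
&lt;p&gt;For readers who think better in code, here is a toy illustration of
   the three patterns in Python, working on plain in-memory rows rather
   than a database.  The sample data, the function names, and the
   AMT_TOTAL column on the order are made up for the example; in practice
   these operations would be carried out inside the database.
&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical sample data; table and column names follow the essay.
items = {"WIDGET": {"price": 10.0}, "GADGET": {"price": 2.5}}
order = {"recnum_ord": 1, "amt_total": 0.0}
orderlines = [
    {"recnum_ord": 1, "sku": "WIDGET", "quantity": 3},
    {"recnum_ord": 1, "sku": "GADGET", "quantity": 4},
]


def fetch(line):
    """FETCH: copy a value from the parent row found through a foreign key."""
    line["price"] = items[line["sku"]]["price"]


def extend(line):
    """EXTEND: calculate a value from other columns in the same row."""
    line["extended_price"] = line["quantity"] * line["price"]


def aggregate(order_row, lines):
    """AGGREGATE: roll child rows up into a column on the parent row."""
    order_row["amt_total"] = sum(
        line["extended_price"]
        for line in lines
        if line["recnum_ord"] == order_row["recnum_ord"]
    )


for line in orderlines:
    fetch(line)
    extend(line)
aggregate(order, orderlines)
print(order["amt_total"])
```
&lt;/pre&gt;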

&lt;p&gt;This week we will look at the first type of operations,
   the FETCH operations.
&lt;/p&gt;

&lt;h2&gt;Putting the FETCH Into Your Data Dictionary&lt;/h2&gt;

&lt;p&gt;So we have an ORDERLINES table, and it contains a PRICE 
   column, and the value of that column should be copied from
   the ITEMS table.  This is an extremely common operation in
   most database applications, so we decide it would be really
   cool if we could specify that in the data dictionary and have
   the code generator take care of it.  This would chop a lot of
   labor off the development process.
&lt;/p&gt;

&lt;p&gt;Here is how a column like this would appear in my own 
   dictionary format:
&lt;/p&gt;

&lt;pre&gt;
table orderlines:
    description: Order Lines
    module: sales
    
    column price:
        automation_id: fetch
        auto_formula: items.price
    ...more columns...
&lt;/pre&gt;

&lt;p&gt;This looks nice, I have put the formula for the PRICE column into
   the data dictionary.  Now of course I need that formula to get
   out into the application somehow so that &lt;i&gt;it will always be
   executed and will never be violated.&lt;/i&gt;  We will now see how to
   do that.
&lt;/p&gt;

&lt;h2&gt;The Trigger Approach&lt;/h2&gt;

&lt;p&gt;When it comes to code generators, if there are ten programmers
   in a room, there are going to be &lt;i&gt;at least 10 opinions&lt;/i&gt;
   on how to write and use a code generator (the non-programmer boss
   will also have an opinion, so that makes 11).  I have no interest
   in bashing anybody's approach or trying to list all of the
   possibilities, so I will stick with the approach I use myself,
   which is to generate database trigger code.  If you want to know
   why that approach works for me, check out &lt;a href=
   "http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"
   &gt;Triggers, Encapsulation and Composition&lt;/a&gt;.
&lt;/p&gt;   

&lt;p&gt;When I work on code generators, I begin by manually coding an
   example of what I'm getting at, so I know it works.  The trigger
   snippet we are looking for must do two things.  It must make sure
   the price is always copied, and it must make sure that no user
   can subvert the value.  This snippet (which is in the PostgreSQL
   flavor of server-side SQL) does this on an insert:
&lt;/p&gt;

&lt;pre&gt;
-- top of trigger....

    -- PART 1: Prevent users from subverting
    --         the formula by throwing an error if they
    --         try to supply a value:
    IF new.price IS NOT NULL THEN 
        ErrorCount = ErrorCount + 1; 
        ErrorList = ErrorList || 'price,5001,may not be explicitly assigned;';
    END IF;
    
    -- PART 2: If the value of SKU exists, use it to look
    --         up the price and copy it into the new row
    IF new.sku IS NOT NULL THEN 
      SELECT INTO new.price par.price
        FROM items par 
       WHERE new.sku = par.sku ;
    END IF;

-- more trigger stuff
&lt;/pre&gt;

&lt;p&gt;NOTE!  You may notice my trigger code somehow seems to "know" to
   use the SKU column when searching the ITEMS table, yet my formula
   did not specify that.  &lt;i&gt;I am assuming your data dictionary contains
   definitions of primary keys and foreign keys, otherwise it is of
   no real use&lt;/i&gt;.  I am further assuming that when I see the formula
   to "FETCH" from the ITEMS table, I can look up the foreign key that
   matches ORDERLINES to ITEMS and find out what column(s) to use.
&lt;/p&gt;
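
&lt;p&gt;Here is a minimal sketch, in Python, of how a generator might turn the
   dictionary entry into the insert snippet above, including the foreign
   key lookup.  The in-memory layout of the dictionary and the function
   name are assumptions made for illustration; it is a sketch of the idea,
   not the code of any real builder.
&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical in-memory form of the dictionary; the layout is an assumption.
DICTIONARY = {
    "orderlines": {
        "columns": {
            "price": {"automation_id": "fetch", "auto_formula": "items.price"},
        },
        # Keyed by parent table: child column mapped to parent column.
        "foreign_keys": {"items": {"sku": "sku"}},
    },
}


def fetch_on_insert(table, column):
    """Generate the insert-time trigger fragment for one FETCH column."""
    spec = DICTIONARY[table]["columns"][column]
    if spec["automation_id"] != "fetch":
        raise ValueError(column + " is not a FETCH column")
    parent, parent_column = spec["auto_formula"].split(".")
    fk = DICTIONARY[table]["foreign_keys"].get(parent)
    if fk is None:
        # Fail while building, not when a user saves a row.
        raise ValueError("no foreign key from " + table + " to " + parent)
    not_null = " AND ".join("new." + c + " IS NOT NULL" for c in fk)
    match = " AND ".join("new." + c + " = par." + p for c, p in fk.items())
    return "\n".join([
        "IF new." + column + " IS NOT NULL THEN",
        "    ErrorCount = ErrorCount + 1;",
        "    ErrorList = ErrorList || '" + column
        + ",5001,may not be explicitly assigned;';",
        "END IF;",
        "IF " + not_null + " THEN",
        "    SELECT INTO new." + column + " par." + parent_column,
        "      FROM " + parent + " par",
        "     WHERE " + match + ";",
        "END IF;",
    ])


print(fetch_on_insert("orderlines", "price"))
```
&lt;/pre&gt;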

&lt;p&gt;The example above works on INSERT operations only.  You need 
   a slightly different version for updates, which throws an error
   if the user attempts to change the price, and which does a new
   FETCH if the user has changed the SKU value.
&lt;/p&gt;

&lt;pre&gt;
    IF new.price &amp;lt;&amp;gt; old.price THEN 
        ErrorCount = ErrorCount + 1; 
        ErrorList = ErrorList || 'price,5001,may not be explicitly assigned;';
    END IF;
    IF  coalesce(new.sku,'') &amp;lt;&amp;gt; coalesce(old.sku,'')  THEN 
       SELECT INTO new.price par.price
         FROM items par WHERE new.sku = par.sku ;
    END IF;
&lt;/pre&gt;

&lt;h2&gt;Sidebar: A Complete Trigger&lt;/h2&gt;

&lt;p&gt;If you want a teaser on how many amazing things the trigger can
   do once you've loaded up your dictionary and builder with features,
   here is a bit of code from a demo application.  Most everything in
   it will get treated in this series on the data dictionary.
&lt;/p&gt;

&lt;pre style="height: 250px; overflow-y: scroll"&gt;
CREATE OR REPLACE FUNCTION orderlines_upd_bef_r_f()
  RETURNS trigger AS
$BODY$
DECLARE
    NotifyList text = '';
    ErrorList text = '';
    ErrorCount int = 0;
    AnyInt int;
    AnyInt2 int;
    AnyRow RECORD;
    AnyChar varchar;
    AnyChar2 varchar;
    AnyChar3 varchar;
    AnyChar4 varchar;
BEGIN
    SET search_path TO public;


    -- 1010 sequence validation
    IF (new.recnum_ol &amp;lt;&amp;gt; old.recnum_ol)  THEN 
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'recnum_ol,3002, may not be re-assigned;';
    END IF;

    -- 1010 sequence validation
    IF (new.skey &amp;lt;&amp;gt; old.skey)  THEN 
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'skey,3002, may not be re-assigned;';
    END IF;

    -- 3100 PK Change Validation
    IF new.recnum_ol &amp;lt;&amp;gt; old.recnum_ol THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'recnum_ol,1003,Cannot change value;';
    END IF;
    -- 3100 END


    IF new.flag_taxable &amp;lt;&amp;gt; old.flag_taxable THEN 
        ErrorCount = ErrorCount + 1; 
        ErrorList = ErrorList || 'flag_taxable,5001,may not be explicitly assigned;';
    END IF;

    IF new.price &amp;lt;&amp;gt; old.price THEN 
        ErrorCount = ErrorCount + 1; 
        ErrorList = ErrorList || 'price,5001,may not be explicitly assigned;';
    END IF;
   IF  coalesce(new.sku,'') &amp;lt;&amp;gt; coalesce(old.sku,'')  THEN 
       SELECT INTO new.flag_taxable
                   ,new.price
                   par.flag_taxable
                   ,par.price
         FROM items par WHERE new.sku = par.sku ;
   END IF;

    -- 5000 Extended Columns
    IF new.amt_retail &amp;lt;&amp;gt; old.amt_retail THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'amt_retail,5002,Cannot assign value directly to column amt_retail ;';
    ELSE 
        new.amt_retail =  CASE WHEN  1 = 1  THEN new.price*new.qty        ELSE 0 END ;
    END IF;


    IF new.pct99_discount &amp;lt;&amp;gt; old.pct99_discount THEN 
       IF new.pct99_discount &amp;lt;&amp;gt; (SELECT par.pct99_discount FROM orders par WHERE new.recnum_ord = par.recnum_ord ) THEN 
            ErrorCount = ErrorCount + 1; 
            ErrorList = ErrorList || 'pct99_discount,5001,may not be explicitly assigned;';
       END IF;
    END IF;

    IF new.taxauth &amp;lt;&amp;gt; old.taxauth THEN 
       IF new.taxauth &amp;lt;&amp;gt; (SELECT par.taxauth FROM orders par WHERE new.recnum_ord = par.recnum_ord ) THEN 
            ErrorCount = ErrorCount + 1; 
            ErrorList = ErrorList || 'taxauth,5001,may not be explicitly assigned;';
       END IF;
    END IF;

    IF new.taxpct &amp;lt;&amp;gt; old.taxpct THEN 
       IF new.taxpct &amp;lt;&amp;gt; (SELECT par.taxpct FROM orders par WHERE new.recnum_ord = par.recnum_ord ) THEN 
            ErrorCount = ErrorCount + 1; 
            ErrorList = ErrorList || 'taxpct,5001,may not be explicitly assigned;';
       END IF;
    END IF;
   IF  coalesce(new.recnum_ord,0) &amp;lt;&amp;gt; coalesce(old.recnum_ord,0)  THEN 
       SELECT INTO new.pct99_discount
                   ,new.taxauth
                   ,new.taxpct
                   par.pct99_discount
                   ,par.taxauth
                   ,par.taxpct
         FROM orders par WHERE new.recnum_ord = par.recnum_ord ;
   END IF;

    -- 5000 Extended Columns
    IF new.amt_discount &amp;lt;&amp;gt; old.amt_discount THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'amt_discount,5002,Cannot assign value directly to column amt_discount ;';
    ELSE 
        new.amt_discount =  CASE WHEN  1 = 1  THEN new.amt_retail*new.pct99_discount*.01        ELSE 0 END ;
    END IF;

    -- 5000 Extended Columns
    IF new.amt_net &amp;lt;&amp;gt; old.amt_net THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'amt_net,5002,Cannot assign value directly to column amt_net ;';
    ELSE 
        new.amt_net =  CASE WHEN  1 = 1  THEN new.amt_retail-new.amt_discount        ELSE 0 END ;
    END IF;

    -- 5000 Extended Columns
    IF new.amt_tax &amp;lt;&amp;gt; old.amt_tax THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'amt_tax,5002,Cannot assign value directly to column amt_tax ;';
    ELSE 
        new.amt_tax =  CASE WHEN new.flag_taxable = 'Y' THEN new.amt_net*new.taxpct*.01        ELSE 0 END ;
    END IF;

    -- 5000 Extended Columns
    IF new.amt_due &amp;lt;&amp;gt; old.amt_due THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'amt_due,5002,Cannot assign value directly to column amt_due ;';
    ELSE 
        new.amt_due =  CASE WHEN  1 = 1  THEN new.amt_net+new.amt_tax        ELSE 0 END ;
    END IF;

    -- 7010 Column Constraint
    new.flag_taxable = UPPER(new.flag_taxable);
    IF NOT (new.flag_taxable  IN ('Y','N')) THEN 
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'new.flag_taxable,6001,Column -Taxable- can be either Y or N;';
    END IF;

    -- 8001 Insert/Update Child Validation: NOT NULL
    IF new.sku IS NULL THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'sku,1005,Required Value;';
    END IF;
    -- 8001 FK Insert/Update Child Validation
    IF new.sku IS NULL THEN
        --Error was reported above, not reported again
        --ErrorCount = ErrorCount + 1;
        --ErrorList = ErrorList || '*,1005,Foreign key columns may not be null: sku;';
    ELSE
        -- LOCK TABLE items IN EXCLUSIVE MODE;
        SELECT INTO AnyInt COUNT(*) FROM items par 
            WHERE par.sku = new.sku;
        IF AnyInt= 0 THEN

            ErrorCount = ErrorCount + 1;
            ErrorList = ErrorList || 'sku,1006,Please Select Valid Value: ' || new.sku::varchar || ';';
        END IF;
    END IF;

    -- 8001 Insert/Update Child Validation: NOT NULL
    IF new.recnum_ord IS NULL THEN
        ErrorCount = ErrorCount + 1;
        ErrorList = ErrorList || 'recnum_ord,1005,Required Value;';
    END IF;
    -- 8001 FK Insert/Update Child Validation
    IF new.recnum_ord IS NULL THEN
        --Error was reported above, not reported again
        --ErrorCount = ErrorCount + 1;
        --ErrorList = ErrorList || '*,1005,Foreign key columns may not be null: recnum_ord;';
    ELSE
        -- LOCK TABLE orders IN EXCLUSIVE MODE;
        SELECT INTO AnyInt COUNT(*) FROM orders par 
            WHERE par.recnum_ord = new.recnum_ord;
        IF AnyInt= 0 THEN

            ErrorCount = ErrorCount + 1;
            ErrorList = ErrorList || 'recnum_ord,1006,Please Select Valid Value: ' || new.recnum_ord::varchar || ';';
        END IF;
    END IF;

    IF ErrorCount &gt; 0 THEN
        RAISE EXCEPTION '%',ErrorList;
        RETURN null;
    ELSE
        IF NotifyList &amp;lt;&amp;gt; '' THEN 
             RAISE NOTICE '%',NotifyList;
        END IF; 
        RETURN new;
    END IF;
END; $BODY$
  LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER
  COST 100;
ALTER FUNCTION orderlines_upd_bef_r_f() OWNER TO postgresql;

&lt;/pre&gt;

&lt;h2&gt;Variations on FETCH&lt;/h2&gt;

&lt;p&gt;I have found two variations on FETCH that have proven very 
   useful in real world applications.
&lt;/p&gt;

&lt;p&gt;The first I call DISTRIBUTE.  It is dangerous because it can be
   a real performance killer, and it turns out you very rarely need it.
   However, that being said, sometimes you want to copy a value from
   a parent table down to every row in a child table when the value
   changes in the parent.  The first time I did this was to copy the
   final score from a GAMES table into a WAGERS table on a fake
   sports betting site.
&lt;/p&gt;
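
&lt;p&gt;A minimal sketch of what a DISTRIBUTE trigger might look like
   in Postgres follows.  The column names game_id and final_score are
   assumptions made up for this illustration, and the code is
   hand-written, not generator output.  Note the guard that skips the
   child update unless the value actually changed, which keeps the
   performance hit as small as possible.
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical example: GAMES is the parent, WAGERS the child.
-- The columns game_id and final_score are assumed names.
CREATE OR REPLACE FUNCTION games_upd_aft_distribute() RETURNS trigger AS
$BODY$
BEGIN
    -- Push the value down only when it really changed
    IF new.final_score IS DISTINCT FROM old.final_score THEN
        UPDATE wagers
           SET final_score = new.final_score
         WHERE wagers.game_id = new.game_id;
    END IF;
    RETURN new;
END; $BODY$
  LANGUAGE plpgsql;

CREATE TRIGGER games_upd_aft_distribute
    AFTER UPDATE ON games
    FOR EACH ROW EXECUTE PROCEDURE games_upd_aft_distribute();
&lt;/pre&gt;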

&lt;p&gt;The other variation I have found useful is FETCHDEF, my shorthand
   for "fetch by default."  In this variation the user is free to
   supply a value of their own, but if they do not supply a value then
   it will be fetched for them.
&lt;/p&gt;
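
&lt;p&gt;As a rough sketch, FETCHDEF for the taxpct column might come out
   something like the code below.  It is written as a standalone
   trigger for clarity, the function and trigger names are made up,
   and it assumes that a NULL means "the user did not supply a value."
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical example of FETCHDEF on orderlines.taxpct.
-- Assumption: NULL means the user supplied nothing.
CREATE OR REPLACE FUNCTION orderlines_fetchdef_taxpct() RETURNS trigger AS
$BODY$
BEGIN
    -- A user-supplied value is left alone; otherwise fetch from parent
    IF new.taxpct IS NULL THEN
        SELECT par.taxpct INTO new.taxpct
          FROM orders par
         WHERE par.recnum_ord = new.recnum_ord;
    END IF;
    RETURN new;
END; $BODY$
  LANGUAGE plpgsql;

CREATE TRIGGER orderlines_fetchdef_taxpct
    BEFORE INSERT OR UPDATE ON orderlines
    FOR EACH ROW EXECUTE PROCEDURE orderlines_fetchdef_taxpct();
&lt;/pre&gt;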

&lt;h2&gt;The Code Generator Itself&lt;/h2&gt;

&lt;p&gt;As for writing the code generator itself, that is of course far more
   than I can cover in one blog entry or even 10.  Moreover, since 
   anybody who decides to do so will do so in their own language
   and in their own style, there is little to be gained by showing
   code examples here.
&lt;/p&gt;

&lt;h2&gt;Conclusion: Expand Your Dictionary!&lt;/h2&gt;

&lt;p&gt;If you make up a data dictionary that only contains structure
   information like columns and keys, and you write a builder program
   to build your database, you can get a big win on upgrades and
   installs.  However, you can take that win much farther by adding
   calculated values to your database and expanding your builder
   to write trigger code.  This week we have seen what it looks like
   to implement a FETCH calculation in your dictionary and what the
   resulting trigger code might look like.
&lt;/p&gt;

&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;The normalization essays on this blog are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html"
    &gt;Revisiting Normalization and Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"&gt;Pay Me Now Or Pay Me Later&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"&gt;The Argument for Normalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"&gt;Third Normal Form and Calculated Values&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"&gt;The Argument for Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"&gt;Denormalization Patterns&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;Keeping Denormalized Values Correct&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"&gt;Triggers, Encapsulation and Composition&lt;/a&gt;
    &lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"&gt;The Data Dictionary and Calculations, Part 1 (this essay)&lt;/a&gt;&lt;/i&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"&gt;The Data Dictionary and Calculations, Part 2&lt;/a&gt;
    
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-8096860715885677410?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/8096860715885677410/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=8096860715885677410' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8096860715885677410'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8096860715885677410'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html' title='The Data Dictionary and Calculations, Part 1'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-924934037677519872</id><published>2009-01-11T18:20:00.003-05:00</published><updated>2010-11-28T22:11:22.220-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Dictionary'/><title type='text'>Upgrading Indexes With a Data Dictionary</title><content type='html'>&lt;p&gt;When you set out to use a Data Dictionary to control your
   database upgrades, you must consider not just columns and
   tables, but also indexes and keys.  The process for adding
   indexes and keys is almost the same as that for columns and
   tables, but there are a few wrinkles you have to be aware
   of.
&lt;/p&gt;
   
   
&lt;h2&gt;Review of Basic Ideas&lt;/h2&gt;

&lt;p&gt;In my &lt;a href=
   "http://database-programmer.blogspot.com/2008/06/using-data-dictionary.html"
   &gt;First Essay On Data Dictionaries&lt;/a&gt;, one of the major points
   was that the dictionary is easiest to use if it is in some
   type of plaintext format that is in source control along with
   the rest of your application files, and is then processed with
   a "builder" program.
&lt;/p&gt;

&lt;p&gt;Last week we saw the &lt;a href=
   "http://database-programmer.blogspot.com/2009/01/dictionary-based-database-upgrades.html"
   &gt;Basic compare operation&lt;/a&gt; used by the builder to 
   build and update tables.
   You read in your data dictionary, query the information_schema
   to determine the current structure of the database, and then
   generate commands to add new tables and add new columns
   to existing tables.
&lt;/p&gt;

&lt;h2&gt;The Importance of Keys and Indexes&lt;/h2&gt;

&lt;p&gt;If a builder program is to be useful, it must be complete.
   If it leaves you with manual tasks after a build then the
   entire concept of automation is lost.  The builder must be
   able to build the entire structure of the database at the
   very least, and this means it must be able to work out
   keys and indexes.&lt;/p&gt;
   
&lt;h2&gt;The Basic Steps&lt;/h2&gt;

&lt;p&gt;The basic steps of building indexes and keys are these:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Load your dictionary to some format you can work with
        easily.  I prefer to load it to tables.
    &lt;li&gt;Query the database's INFORMATION_SCHEMA to determine
        which indexes and keys already exist.
    &lt;li&gt;Execute some type of diff to determine which indexes
        need to be built.
    &lt;li&gt;Build the indexes that are not there.
    &lt;li&gt;If you like, drop the indexes that are not in the spec.
&lt;/ul&gt;

&lt;h2&gt;Sidebar: Dropping Indexes&lt;/h2&gt;

&lt;p&gt;A builder program can add things, and it can also drop things.
   When it comes to &lt;i&gt;destructive&lt;/i&gt; operations, I prefer not
   to have my builder drop tables or columns, because the
   consequences of a mistake can be unthinkable.
&lt;/p&gt;

&lt;p&gt;However, when it comes to indexes, it is much more likely to
   be ok to drop a stray index.  Dropping indexes does not destroy
   user data.  Also, extraneous indexes will slow down inserts
   and updates, so getting rid of them is usually the Right Thing.
&lt;/p&gt;


&lt;h2&gt;Step 1: Your Specification&lt;/h2&gt;

&lt;p&gt;Your data dictionary format must have some way of letting
   you specify an index.  It is also a good idea to allow you
   to specify an ascending or descending property for each
   column, and to specify if the index is to be unique (effectively
   making it a unique constraint).
&lt;/p&gt;

&lt;p&gt;Here is an example of a very simple set of indexes:
&lt;/p&gt;

&lt;pre&gt;
table example:
    description: My example table
    
    index first_name:
        column first_name:
    index last_name:
        column last_name:
    index socialsec:
        unique: "Y"
        column socialsec:
        
    # ... column definitions follow...
&lt;/pre&gt;

&lt;p&gt;I am currently working on a program that requires frequent access
   by three columns, where the last two are in descending order
   but not the first.  An index spec for this might look like:
&lt;/p&gt;

&lt;pre&gt;
table shipments:
    description: Incoming Magazines
    
    index history:
        column bipad:
        column year:
            flag_asc: "N"
        column issue:
            flag_asc: "N"

    # ... column definitions follow...
&lt;/pre&gt;
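
&lt;p&gt;To make the goal concrete, here is roughly the Postgres DDL a builder
   might eventually generate from the specs above.  Reusing the spec
   names as the index names is an assumption made for this illustration;
   a real builder may name its indexes differently.
&lt;/p&gt;

&lt;pre&gt;
-- From the "socialsec" spec: unique: "Y" becomes a unique index
CREATE UNIQUE INDEX socialsec ON example (socialsec);

-- From the "history" spec: flag_asc: "N" becomes DESC
CREATE INDEX history ON shipments (bipad, year DESC, issue DESC);
&lt;/pre&gt;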

&lt;p&gt;As far as loading this into memory, I covered that in some detail
   &lt;a href=
   "http://database-programmer.blogspot.com/2009/01/dictionary-based-database-upgrades.html"
   &gt;last week&lt;/a&gt; and will not dwell on it here.  I will simply assume
   you have code to parse and load the spec to a format that works 
   for you.
&lt;/p&gt; 

&lt;h2&gt;Step 2: The Information Schema or Server Tables&lt;/h2&gt;

&lt;p&gt;When I set out to write my builder, I found that the information_schema
   was a bit more complicated than I needed.  The server I was using,
   Postgres, had a simpler way to get what I wanted.  I also found I would
   get all kinds of extraneous definitions of indexes on system tables
   or tables that were not in my spec.  The query below
   was the easiest way to get index definitions that were limited to the
   tables in my spec on the Postgres platform:
&lt;/p&gt;

&lt;pre&gt;
Select tablename,indexname,indexdef 
 FROM pg_indexes
 JOIN zdd.tables on pg_indexes.tablename = zdd.tables.table_id
 WHERE schemaname='public'
&lt;/pre&gt;
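
&lt;p&gt;The diff in the next step compares lists of column names, while
   indexdef is the complete CREATE INDEX command as text.  One way to get
   the column list directly is to read the Postgres system catalogs.
   The query below is only a sketch: it assumes Postgres 9.4 or later
   (for WITH ORDINALITY), it ignores expression indexes, and it is not
   the query used in my builder.
&lt;/p&gt;

&lt;pre&gt;
-- One row per index, columns joined with ":" in index order
SELECT t.relname AS table_id
      ,i.relname AS indexname
      ,string_agg(a.attname::text, ':' ORDER BY k.ord) AS index_columns
  FROM pg_index x
  JOIN pg_class     t ON t.oid = x.indrelid
  JOIN pg_class     i ON i.oid = x.indexrelid
  JOIN pg_namespace n ON n.oid = t.relnamespace
 CROSS JOIN LATERAL unnest(x.indkey) WITH ORDINALITY AS k(attnum, ord)
  JOIN pg_attribute a ON a.attrelid = t.oid
                     AND a.attnum   = k.attnum
 WHERE n.nspname = 'public'
 GROUP BY t.relname, i.relname
&lt;/pre&gt;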

&lt;p&gt;As far as primary keys and foreign keys go, the story is basically
   the same: your server may provide them in a convenient form, the
   way Postgres does with index definitions, or you may have to dig a little
   deeper to get precisely what you want.
&lt;/p&gt;

&lt;h2&gt;Step 3: The Diff&lt;/h2&gt;

&lt;p&gt;So now we have a picture of the indexes we need to exist, and the
   indexes that already exist.  It is time to look at how to diff
   them effectively.  This step does not work the same way as it does
   with columns and tables.
&lt;/p&gt;

&lt;p&gt;Before we go into how to do the diff, let's review how we did it with
   tables and columns.  We can basically diff tables and columns &lt;i&gt;by
   name&lt;/i&gt;.  If our spec lists table CUSTOMERS and it does not appear
   to exist in the database, we can build the table CUSTOMERS, simple
   as that.  But with indexes the name really does not mean anything;
   what really matters is which columns are being indexed.  
&lt;/p&gt;

&lt;p&gt;This is why we diff indexes on the column definitions, not on
   their names.  If you want a complete trail, you would begin with
   this table that describes your own indexes:
&lt;/p&gt;

&lt;pre&gt;
SPEC_NAME   | TABLE     | COLUMNS
------------+-----------+-----------------
CUST1       | CUSTOMERS | zipcode:state
HIST1       | HISTORY   | bipad:year:issue
ORDERS1     | ORDERS    | bipad:year:issue
&lt;/pre&gt;

&lt;p&gt;Then you pull the list of indexes from the server, and let's
   say you get something like this:
&lt;/p&gt;

&lt;pre&gt;
DB_NAME   | TABLE     | COLUMNS
----------+-----------+-----------------
CUST1     | CUSTOMERS | zipcode:state
ABCD      | HISTORY   | bipad:year:issue
ORDER1    | ORDERS    | year:issue
&lt;/pre&gt;

&lt;p&gt;When you join these two together, you match on TABLE and
   COLUMNS; we do not care about the index names.  A query to join
   them might look like this:
&lt;/p&gt;

&lt;pre&gt;
SELECT spec.spec_name,spec.table,spec.columns
      ,db.db_name,db.columns as db_cols
  FROM spec
  FULL OUTER JOIN db   ON spec.table = db.table
                      AND spec.columns = db.columns
&lt;/pre&gt;

&lt;p&gt;This query would give us the following output:&lt;/p&gt;

&lt;pre&gt;
SPEC_NAME   | TABLE     | COLUMNS          | DB_NAME | DB_COLS 
------------+-----------+------------------+---------+---------------------
CUST1       | CUSTOMERS | zipcode:state    | CUST1   | zipcode:state
HIST1       | HISTORY   | bipad:year:issue | ABCD    | bipad:year:issue
ORDERS1     | ORDERS    | bipad:year:issue |         |
            |           |                  | ORDER1  | year:issue
&lt;/pre&gt;

&lt;p&gt;Now let us examine the results row by row.&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The first row shows that the index on zipcode+state on the
        customers table is in the spec and in the database, so we take
        no action on that index.
    &lt;li&gt;The second row shows that the index on bipad+year+issue is 
        also in both the database and the spec.  This particular index
        has a different name in the database, but we don't care.  
        (Maybe the programmer changed the name in the spec).  We take
        no action on this index.
    &lt;li&gt;The third row shows an index on the ORDERS table that is not
        in the database, so we must build that index.
    &lt;li&gt;The fourth row shows an index in the database that is not
        in the spec; you can drop that if you want to.
&lt;/ul&gt;

&lt;h2&gt;The Rest of It&lt;/h2&gt;

&lt;p&gt;From here it is a simple matter to generate some commands to create
   the indexes we need.
&lt;/p&gt;
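
&lt;p&gt;As a sketch of that last step, the query below turns the diff into
   commands, using the same illustrative "spec" and "db" tables as above.
   It builds plain ascending, non-unique indexes only, and it reuses the
   spec name as the index name, which is an assumption.  (If you really
   create columns named "table" and "columns" you will have to quote them
   in the CREATE TABLE command, since TABLE is a reserved word.)
&lt;/p&gt;

&lt;pre&gt;
-- Indexes in the spec but not in the database: build them
SELECT 'CREATE INDEX ' || spec.spec_name
       || ' ON ' || spec.table
       || ' (' || replace(spec.columns, ':', ', ') || ')' AS command
  FROM spec
 WHERE NOT EXISTS (
       SELECT 1 FROM db
        WHERE db.table   = spec.table
          AND db.columns = spec.columns
       )
UNION ALL
-- Indexes in the database but not in the spec: drop them if you like
SELECT 'DROP INDEX ' || db.db_name
  FROM db
 WHERE NOT EXISTS (
       SELECT 1 FROM spec
        WHERE spec.table   = db.table
          AND spec.columns = db.columns
       )
&lt;/pre&gt;

&lt;p&gt;With the sample data above this produces one CREATE INDEX command
   for ORDERS1 and one DROP INDEX command for ORDER1.
&lt;/p&gt;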

&lt;p&gt;Keys work the same way, with a few obvious differences in how they
   might be named.
&lt;/p&gt;

&lt;p&gt;We can add features from here to track if the columns are being
   indexed in ascending or descending order.
&lt;/p&gt;

&lt;h2&gt;Conclusion: Indexes Go By Definition&lt;/h2&gt;

&lt;p&gt;When writing a database upgrade "builder" program, the key thing
   to understand about indexes and keys is that you are looking to
   identify and build indexes &lt;i&gt;according to their definition&lt;/i&gt;,
   and that names do not matter at all.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-924934037677519872?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/924934037677519872/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=924934037677519872' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/924934037677519872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/924934037677519872'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/01/upgrading-indexes-with-data-dictionary.html' title='Upgrading Indexes With a Data Dictionary'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-248041273634861313</id><published>2009-01-04T18:42:00.001-05:00</published><updated>2010-11-28T22:11:22.221-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Dictionary'/><title type='text'>Dictionary Based Database Upgrades</title><content type='html'>&lt;p&gt;The number one search term that brings people to this blog is
   "data dictionary."  So this week I will begin a  
   series on how to use the data dictionary to improve your own
   productivity and reduce errors.
&lt;/p&gt;

&lt;h2&gt;Building And Upgrading&lt;/h2&gt;

&lt;p&gt;This week we are going to see how to use a data dictionary to
   eliminate upgrade scripts (mostly) and make for more efficient upgrades.
   The approach described here also works for installing a system
   from scratch, so an install and an upgrade become the same
   process.&lt;/p&gt;
   
&lt;p&gt;The major problems with upgrade scripts are these:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;They are the least-tested code in any system, and are
        the most likely to break.  If a script breaks and anybody
        but the original programmer is running the upgrade, this
        leads to aborted upgrades and upset customers.
    &lt;li&gt;They are horribly inefficient when a customer upgrades
        after a long time: the same table may be rebuilt many times
        as script after script adds a column or two.
&lt;/ul&gt;

&lt;p&gt;By contrast, a dictionary-based upgrade can take any customer
   from any version of your software and &lt;i&gt;in the fewest steps
   possible&lt;/i&gt; bring them completely current, &lt;i&gt;with no possibility
   of broken scripts&lt;/i&gt;.
&lt;/p&gt;
   
&lt;p&gt;But first, a quick review of an important idea...&lt;/p&gt;

&lt;h2&gt;Review: The Text File&lt;/h2&gt;

&lt;p&gt;Back in June of 2008 this blog featured an &lt;a href=
   "http://database-programmer.blogspot.com/2008/06/using-data-dictionary.html"
   &gt;overview of the data dictionary&lt;/a&gt;.  There were many 
   ideas in that essay, but I wish to focus on one in particular,
   the question of &lt;i&gt;where to put the data dictionary and in
   what format.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;In terms of &lt;i&gt;where to put it&lt;/i&gt;, the data dictionary should
   be just another application file, in source control, and 
   delivered with your code.  When the dictionary is in a 
   plaintext file (or files) and treated like other application
   code, you do not have to invent any new
   methods for handling it, just use the same methods you use for
   the rest of your code.
&lt;/p&gt;

&lt;p&gt;In simple practical terms, it is best if a data dictionary can
   be easily read and written by both people and computers. 
   This leads to a plaintext file or files in some format such 
   as JSON or YAML.  I personally prefer YAML because it is a superset
   of JSON, so using YAML gives me the option to sneak in JSON syntax
   later if I choose, but starting with JSON does not let me go the
   other way.
&lt;/p&gt;

&lt;p&gt;Requiring easy handling by people tends to rule out XML, which is
   bletcherous to work with manually (in my humble opinion).  Requiring
   readability by the computer rules out UML unless your UML drawing
   tool can produce a usable data file (comments always welcome, tell us
   your favorite tool for doing this!).  When considering UML, it is 
   the class diagrams that are most likely to be translatable into a
   data dictionary.
&lt;/p&gt;

&lt;p&gt;Finally, encoding dictionary information in program
   class files technically
   meets the practical requirements listed above, but it has the
   disadvantage of &lt;i&gt;trapping data in code&lt;/i&gt;, which unnecessarily
   couples your dictionary to whatever language you are using at the
   moment.  It is much better if the dictionary sits outside of the
   code as pure data.  Not to mention that spreading the dictionary out
   in a collection of one-class-per-table files makes it much harder
   to do upgrades in the way I am about to describe.
&lt;/p&gt;

&lt;h2&gt;Review of Steps&lt;/h2&gt;

&lt;p&gt;When using a dictionary-based approach, you write some type of
   "builder" program that reads your dictionary file, examines the
   current structure of the database, and then generates SQL commands
   to alter and create tables to make them all current.
&lt;/p&gt;

&lt;p&gt;There are plenty of ways to do this.  My own approach is to 
   load the dictionary itself into tables, pull the current state into
   similar tables, and then do queries to find new and altered
   tables and columns.  If you want to see a full-blown program of
   this type, check out &lt;a href=
   "https://andro.svn.sourceforge.net/svnroot/andro/trunk/andro/application/androBuild.php"
   &gt;androBuild.php&lt;/a&gt;, the Andromeda implementation of this idea.
   The routines that apply to today's topic include "RealityGet()",
   "Differences()", "Analyze()" and "PlanMake()".
&lt;/p&gt;

&lt;h2&gt;Step 1: Load the Dictionary to RAM&lt;/h2&gt;

&lt;p&gt;To use the approach in this essay, you begin by parsing your plaintext
   file and loading it to tables.  Here is a simple example of what
   a dictionary file might look like in YAML format:
&lt;/p&gt;

&lt;pre&gt;
table states:
    description: US States
    
    columns:
        state:
            type: char
            colprec: 2
            caption: State Code
            primary_key: "Y"
        description:
            type: varchar
            colprec: 25
            caption: State Name
&lt;/pre&gt;

&lt;p&gt;If you are using PHP, you can parse this file using the 
   &lt;a href="http://spyc.sourceforge.net/"&gt;spyc&lt;/a&gt; program, which 
   converts the file into associative arrays.  All or nearly all
   modern languages have a YAML parser; check out the 
   &lt;a href="http://www.yaml.org"&gt;YAML site&lt;/a&gt; to find yours.
&lt;/p&gt;

&lt;h2&gt;Step 2: Load the Dictionary To Tables&lt;/h2&gt;

&lt;p&gt;The database you are building should have some tables that
   you can use as a scratch area during the upgrade.  You may 
   say, "The builder gives me tables, but I need tables to run
   the builder, how can I do this?"  The simplest way is to hardcode
   the creation of these tables.  A more mature solution would use
   a separate dictionary file that just defines the dictionary tables.
&lt;/p&gt;
   
&lt;p&gt;The structure of the tables should match the data file, of course.
   Here is what the YAML above would look like after being loaded
   to tables:
&lt;/p&gt;

&lt;pre&gt;
TABLE   | DESCRIPTION
--------+--------------------------------
states  | US States  
   

TABLE   |COLUMN      |CAPTION    |TYPE    |PRECISION  
--------+------------+-----------+--------+-----------
states  |state       |State Code |char    |2          
states  |description |State Name |varchar |25
&lt;/pre&gt;
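
&lt;p&gt;Here is a minimal sketch of what those scratch tables and their
   contents might look like.  The table names follow the TABLES_SPEC and
   COLUMNS_SPEC names used later in this essay, but the column names
   type_id, colprec and primary_key are assumptions for illustration only.
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical scratch tables to hold the parsed dictionary
CREATE TABLE tables_spec (
    table_id    varchar(60) PRIMARY KEY,
    description varchar(100)
);

CREATE TABLE columns_spec (
    table_id    varchar(60) REFERENCES tables_spec (table_id),
    column_id   varchar(60),
    caption     varchar(100),
    type_id     varchar(20),
    colprec     int,
    primary_key char(1) DEFAULT 'N',
    PRIMARY KEY (table_id, column_id)
);

-- The rows the YAML example would produce
INSERT INTO tables_spec (table_id, description)
VALUES ('states', 'US States');

INSERT INTO columns_spec
       (table_id, column_id, caption, type_id, colprec, primary_key)
VALUES ('states', 'state',       'State Code', 'char',    2,  'Y')
      ,('states', 'description', 'State Name', 'varchar', 25, 'N');
&lt;/pre&gt;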

&lt;h2&gt;Step 3: Fetch The Current State&lt;/h2&gt;

&lt;p&gt;Many database servers, including Postgres, MySQL and MS SQL
   Server, support the "information_schema", 
   a schema inside of each database that contains 
   tables that describe the structure of the database.  While you
   can make queries directly against the information_schema tables,
   I prefer to fetch the information out of them into my own
   tables so that all column names are consistent with my own.
   A simple query to do this might look like this:
&lt;/p&gt;

&lt;pre&gt;
-- Postgres-specific example of pulling info out of the
-- information_schema table:

insert into TABLES_NOW (table_id)  -- my dictionary table
SELECT table_name as table_id 
  FROM information_schema.tables  
 WHERE table_schema = 'public'
   AND table_type = 'BASE TABLE'
&lt;/pre&gt;

&lt;p&gt;Pulling column information out can be much more complicated, owing
   to differences in how vendors implement information_schema, and
   owing to the complex way data is stored in it.  Here is my own code
   to pull out the definitions of columns from the Postgres 
   information_schema, which also simplifies the definition
   dramatically, to make my downstream coding easier:
&lt;/p&gt;

&lt;pre&gt;
insert into zdd.tabflat_r 
 (table_id,column_id,formshort,colprec,colscale)  
 SELECT c.table_name,c.column_name, 
        -- collapse the verbose type names into short ones
        CASE WHEN POSITION('timestamp' IN data_type) &gt; 0 THEN 'timestamp'
             WHEN POSITION('character varying' IN data_type) &gt; 0 THEN 'varchar'
             WHEN POSITION('character' IN data_type) &gt; 0 THEN 'char'
             WHEN POSITION('integer' IN data_type) &gt; 0 THEN 'int'
             ELSE data_type END,
        -- precision: length for character types, digits for numerics
        CASE WHEN POSITION('character' IN data_type) &gt; 0 THEN character_maximum_length
             WHEN POSITION('numeric'   IN data_type) &gt; 0 THEN numeric_precision 
             ELSE 0 END,
        -- scale applies to numerics only
        CASE WHEN POSITION('numeric'   IN data_type) &gt; 0 THEN numeric_scale
             ELSE 0 END
   FROM information_schema.columns c 
   JOIN information_schema.tables t ON t.table_name   = c.table_name  
                                   AND t.table_schema = c.table_schema
  WHERE t.table_schema = 'public' 
    AND t.table_type   = 'BASE TABLE'
&lt;/pre&gt;

&lt;h2&gt;Step 4: The Magic Diff&lt;/h2&gt;

&lt;p&gt;Now we can see how the magic happens.  Imagine you have 20 tables
   in your application, and in the past week you have modified 5 of them
   and added two more.  You want to upgrade your demo site, so what is
   the next step for the builder?
&lt;/p&gt;

&lt;p&gt;The builder must now do a "diff" between your dictionary and the
   actual state of the database, looking for:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Completely new tables.
    &lt;li&gt;New columns in existing tables.
&lt;/ul&gt;

&lt;p&gt;Let's say you have two tables: "TABLES_SPEC", which lists the
   tables in your application, and "TABLES_NOW", which lists
   the tables in your database.   The following query will give
   you a list of new tables:
&lt;/p&gt;

&lt;pre&gt;
SELECT spec.table_id
  FROM TABLES_SPEC spec 
 WHERE NOT EXISTS (
       SELECT table_id from TABLES_NOW now
        WHERE now.table_id = spec.table_id
       )
&lt;/pre&gt;

&lt;p&gt;It is now a simple thing to pull the column definitions for each
   table and generate some DDL to create the tables.
&lt;/p&gt;
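
&lt;p&gt;One way to sketch that DDL generation is to let the database
   assemble the commands itself.  The query below assumes a COLUMNS_SPEC
   table with hypothetical columns type_id and colprec, and it handles
   only char and varchar precision; keys, defaults and everything else
   are left out.  It also requires a Postgres version that has
   string_agg (9.0 or later).
&lt;/p&gt;

&lt;pre&gt;
-- One CREATE TABLE command per table that is in the spec
-- but not yet in the database
SELECT 'CREATE TABLE ' || spec.table_id || ' ('
       || string_agg(
              spec.column_id || ' ' || spec.type_id
              || CASE WHEN spec.type_id IN ('char','varchar')
                      THEN '(' || spec.colprec::text || ')'
                      ELSE '' END
             ,', ' ORDER BY spec.column_id)
       || ')' AS command
  FROM COLUMNS_SPEC spec
 WHERE NOT EXISTS (
       SELECT table_id from TABLES_NOW now
        WHERE now.table_id = spec.table_id
       )
 GROUP BY spec.table_id
&lt;/pre&gt;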

&lt;p&gt;But we also have tables that have new columns.  We can pull those
   out like so:
&lt;/p&gt;

&lt;pre&gt;
SELECT * from COLUMNS_SPEC spec
 -- the first where clause gets new columns
 WHERE not exists (
       SELECT table_id FROM COLUMNS_NOW now
        WHERE spec.table_id = now.table_id
          AND spec.column_id= now.column_id
       )
   -- this second subquery makes sure we are
   -- getting only existing tables
   AND EXISTS (
       SELECT table_id from TABLES_NOW now
        WHERE now.table_id = spec.table_id
       )
&lt;/pre&gt;

&lt;p&gt;Now again it is a simple matter to generate DDL commands that
   add all of the new columns into each table.  Some databases
   will allow multiple columns to be added in one statement, while
   others will require one ALTER TABLE per new column (really horrible
   when you have to do that).
&lt;/p&gt;   
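
&lt;p&gt;A sketch of the one-command-per-column flavor, built on the query
   above, might look like this.  As before, the columns type_id and
   colprec are assumed names, and only char and varchar precision is
   handled.
&lt;/p&gt;

&lt;pre&gt;
-- One ALTER TABLE command per new column in an existing table
SELECT 'ALTER TABLE ' || spec.table_id
       || ' ADD COLUMN ' || spec.column_id || ' ' || spec.type_id
       || CASE WHEN spec.type_id IN ('char','varchar')
               THEN '(' || spec.colprec::text || ')'
               ELSE '' END AS command
  FROM COLUMNS_SPEC spec
 WHERE not exists (
       SELECT table_id FROM COLUMNS_NOW now
        WHERE spec.table_id = now.table_id
          AND spec.column_id= now.column_id
       )
   AND EXISTS (
       SELECT table_id from TABLES_NOW now
        WHERE now.table_id = spec.table_id
       )
&lt;/pre&gt;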

       
&lt;p&gt;Please note this is sample code only, just to give you ideas,
   and it will not cover every case.
&lt;/p&gt;

&lt;h2&gt;Sidebar: Avoid Destructive Actions&lt;/h2&gt;

&lt;p&gt;Do not rush into writing code that drops columns or tables that
   are not in your spec.  The results of a misstep can be disastrous
   (as in lose your job or your customer).  My own builder code is now
   4 1/2 years old and I have never yet bothered to write a 
   "destructive" upgrade that will clean out unused tables and columns.
   Maybe someday...
&lt;/p&gt;

&lt;h2&gt;What I Left out: Validation&lt;/h2&gt;

&lt;p&gt;There was no space in this essay to discuss a very important
   topic: validating the spec changes.  It may be that a programmer
   has done something nasty like change a column type from character
   to integer.  Most databases will fail attempting to alter the column
   because they don't know how to convert the data.  Your builder program
   can trap these events by &lt;i&gt;validating the upgrade&lt;/i&gt; before any
   changes are made.  This will be treated fully in a future essay.
&lt;/p&gt;
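
&lt;p&gt;Just to give the flavor of it, a first validation check could be as
   simple as the query below, which lists columns whose type in the spec
   differs from the type in the database.  The COLUMNS_SPEC and
   COLUMNS_NOW tables are the ones used above, the type_id column is an
   assumed name, and a real validation pass would have to check much more
   than this.
&lt;/p&gt;

&lt;pre&gt;
-- Columns whose type has changed; report these and stop
-- before issuing any DDL
SELECT spec.table_id, spec.column_id
      ,now.type_id  AS type_now
      ,spec.type_id AS type_spec
  FROM COLUMNS_SPEC spec
  JOIN COLUMNS_NOW  now ON now.table_id  = spec.table_id
                       AND now.column_id = spec.column_id
 WHERE now.type_id &amp;lt;&amp;gt; spec.type_id
&lt;/pre&gt;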

&lt;h2&gt;More that I Left Out: Indexes, Keys...&lt;/h2&gt;

&lt;p&gt;There are many many other things you can and really must create
   during the build, beginning with primary keys and foreign keys,
   not to mention indexes as well.  These will be covered in a future
   essay.
&lt;/p&gt;

&lt;h2&gt;More That I Left Out: When You Still Need Scripts&lt;/h2&gt;

&lt;p&gt;There are plenty of reasons why you may still need a few
   upgrade scripts; these will be discussed in a future essay.
   They all come down to moving data around when table structures
   change significantly.
&lt;/p&gt;

&lt;h2&gt;Conclusion: One Upgrade To Rule Them All&lt;/h2&gt;

&lt;p&gt;The approach described this week for upgrading databases has
   many advantages.  It is first and foremost the most efficient way
   to upgrade customers from any version directly to the latest
   version.  It is also the simplest way to handle both installations
   and upgrades: they are both the same process.  Putting the dictionary
   file into plaintext gives you complete source control just like any
   other application file, and overall you have a tight, efficient
   and error-free upgrade process.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-248041273634861313?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/248041273634861313/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=248041273634861313' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/248041273634861313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/248041273634861313'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2009/01/dictionary-based-database-upgrades.html' title='Dictionary Based Database Upgrades'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-2420668866355842257</id><published>2008-11-02T14:31:00.004-05:00</published><updated>2010-11-28T22:12:40.755-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='denormalization'/><title type='text'>Keeping Denormalized Values Correct</title><content type='html'>&lt;p&gt;A normalized database stores each fact in exactly one place.
   This makes for very robust write operations: it is much easier
   to get things right on the way in.  But it becomes much harder
   to get things out efficiently or easily, so very often we 
   denormalize, that is, we store facts in more than one place for
   easier retrieval.  This requires a very well thought out 
   strategy to make sure these repeated values are always correct.
&lt;/p&gt;

&lt;p&gt;There are links to related essays on normalization and denormalization at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Review of Methods&lt;/h2&gt;

&lt;p&gt;For our example this week we will consider a shopping cart.  The
   orders are kept in the ORDERS table and the items purchased are
   in the ORDER_LINES table.  We have denormalized the database by
   keeping the value ORDER_TOTAL in the ORDERS table.  Every time a line
   item is changed, the ORDER_TOTAL must be updated.
&lt;/p&gt;

&lt;p&gt;There are a handful of methods that are popular in the wild 
   for doing this.  Some practices emphasize 
   programmer discipline, others seek to prevent actions that
   will lead to incorrect values.  Strategies also fall into 
   application level or database level, while still others operate
   at the architecture level.
&lt;/p&gt;

&lt;p&gt;Personally I chose triggers about four years ago, which I will
   explain at the end of this essay.
&lt;/p&gt;

&lt;h2&gt;The Weakest Approach: Discipline&lt;/h2&gt;

&lt;p&gt;The simplest approach is to make programmers aware
   of all denormalized values and require them to remember, whenever they
   modify a line item, to update the ORDER_TOTAL.  This actually works well
   enough for small programming teams, where
   there are only one or maybe two programmers, preferably sitting right
   next to each other.  Mind-reading helps here as well.
&lt;/p&gt;

&lt;p&gt;Of course this approach falls apart like a rotten burlap bag as soon
   as the team or the program exceeds the ability of the team to keep
   it all straight in their heads.  
&lt;/p&gt;
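
&lt;p&gt;To make the failure concrete, here is a minimal sketch of the
   discipline approach going wrong.  It runs against an in-memory SQLite
   database through Python only so that it is self-contained; the column
   name "extended" and both function names are invented for the example.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total NUMERIC DEFAULT 0);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT, extended NUMERIC);
    INSERT INTO orders (order_id) VALUES (1);
""")


def add_line_carefully(order_id, sku, extended):
    # The disciplined programmer remembers both statements.
    db.execute("INSERT INTO order_lines VALUES (?, ?, ?)", (order_id, sku, extended))
    db.execute(
        "UPDATE orders SET order_total = order_total + ? WHERE order_id = ?",
        (extended, order_id),
    )


def add_line_in_a_hurry(order_id, sku, extended):
    # Somebody else forgets the second statement, and nothing complains.
    db.execute("INSERT INTO order_lines VALUES (?, ?, ?)", (order_id, sku, extended))


add_line_carefully(1, "WIDGET", 10)
add_line_in_a_hurry(1, "GADGET", 5)

# The stored total and the true total have now drifted apart.
stored = db.execute("SELECT order_total FROM orders WHERE order_id = 1").fetchone()[0]
actual = db.execute("SELECT SUM(extended) FROM order_lines WHERE order_id = 1").fetchone()[0]
```
&lt;/pre&gt;

&lt;p&gt;Neither the database nor the language objects to the second function,
   which is exactly the problem.
&lt;/p&gt;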

&lt;p&gt;But I did not bring up this example just to ridicule it.  I have found 
   that even seasoned veteran programmers (including your humble author)
   will fall into the trap of trying to enforce conventions at many 
   levels in their programming.  To see why this is always a bad idea
   and should always be avoided, consider this hypothetical case.
&lt;/p&gt;

&lt;p&gt;Imagine a new programming language is introduced known as Super-G,
   which is fashionable and wonderful and everybody loves it.  It has 
   a well-thought out typing system except for one odd behavior:
   If you code a line that concatenates a string with a date, program
   execution ends with no error.  The compiler does not trap for this
   and run-time does not tell you why it quits.
   You can Google for it and find
   out that everybody knows about it, and &lt;i&gt;you just have to remember
   not to do that!&lt;/i&gt;  The language's authors have no plans to fix it
   because nothing is wrong.  They fully expect you to always remember
   never to concatenate strings and dates.
&lt;/p&gt;

&lt;p&gt;The example is meant to be absurd, but it reinforces the point that any 
   strategy where &lt;i&gt;you just have to remember&lt;/i&gt; is out of the running
   from the start.  Since we would not accept this in any tool we use,
   we should certainly never build our own practices upon such sand,
   and certainly we would not count on it to keep denormalized values
   correct.
&lt;/p&gt;

&lt;h2&gt;Limiting Access To The Database&lt;/h2&gt;

&lt;p&gt;The next simplest strategy is to prune down which agents (programs
   or users) can get at the database.  The idea is simple: just let one
   program get at the database, make sure that program is correct,
   and force everybody to go through the application.
&lt;/p&gt;

&lt;p&gt;This will work if you can get your programs right and there is no
   chance that any of the check-signers will demand access except through
   your application.  Many programmers believe this is true for them.
   Some of them are right, but many are not: their users would love to
   get access to the database but the programmer has created a situation
   where it is impossible.
&lt;/p&gt;

&lt;p&gt;Personally I try to avoid this approach completely, and my reasons
   are both philosophical and technical.
&lt;/p&gt;

&lt;p&gt;On the technical side, successful programs always expand in
   scope, and the demand for flexible database access always 
   increases.  Limiting access to the database means that eventually
   you have to recode the entire database interface.  This means work
   for you, cost to the customer, and work for the customer in plugging
   into whatever interface you create.  This may be doable, but the
   overriding fact is that &lt;i&gt;databases already have an interface&lt;/i&gt;,
   and any time spent re-inventing it could better be spent
   on just about anything.
&lt;/p&gt;

&lt;p&gt;On the philosophical side I simply do not like any architecture
   where limitations are built in from the start.  Call it a personal
   prejudice, but I much prefer to find the flexible solution where
   there is one (and personally I love to find it where it appears 
   it does not exist).  Overall the flexible solution
   always leads to more possibilities for
   work, more features, and just plain more fun.
&lt;/p&gt;

&lt;h2&gt;Application Framework Strategies&lt;/h2&gt;

&lt;p&gt;If you are committed to maintaining the ORDER_TOTAL in application code,
   and you wish to avoid the "please remember to always...." blunder,
   then it must not be possible for new programmers or prima donna
   programmers to violate the requirement.  This means your framework
   cannot allow random SQL commands, and must somehow force all write
   access to particular tables to route through particular objects or
   functions.   A good ORM system should at the very least not only provide
   a mechanism for updating related tables, but also prevent any access
   except through that mechanism.
&lt;/p&gt;
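
&lt;p&gt;The shape of such a mechanism might look something like the sketch
   below.  The class and method names are hypothetical, and Python cannot
   truly hide the connection from a determined programmer, so this shows
   only the routing idea: callers get named operations and never write
   SQL against these tables themselves.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3


class OrderGateway:
    """Hypothetical single entry point for writes to ORDERS and ORDER_LINES."""

    def __init__(self):
        # The connection is private by convention; callers get no SQL access.
        self._db = sqlite3.connect(":memory:")
        self._db.executescript("""
            CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total NUMERIC DEFAULT 0);
            CREATE TABLE order_lines (order_id INTEGER, sku TEXT, extended NUMERIC);
        """)

    def create_order(self, order_id):
        with self._db:
            self._db.execute("INSERT INTO orders (order_id) VALUES (?)", (order_id,))

    def add_line(self, order_id, sku, extended):
        # Both writes share one transaction, so they succeed or fail together.
        with self._db:
            self._db.execute(
                "INSERT INTO order_lines VALUES (?, ?, ?)", (order_id, sku, extended)
            )
            self._db.execute(
                "UPDATE orders SET order_total = order_total + ? WHERE order_id = ?",
                (extended, order_id),
            )

    def order_total(self, order_id):
        row = self._db.execute(
            "SELECT order_total FROM orders WHERE order_id = ?", (order_id,)
        ).fetchone()
        return row[0]


gateway = OrderGateway()
gateway.create_order(1)
gateway.add_line(1, "WIDGET", 10)
gateway.add_line(1, "GADGET", 5)
total = gateway.order_total(1)
```
&lt;/pre&gt;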

&lt;p&gt;Personally I have no use for these kinds of systems, for reasons
   explained in the previous section, and so I cannot really comment
   on them beyond describing these basic minimum requirements.
&lt;/p&gt;

&lt;h2&gt;Server-Side Strategy: Stored Procedure&lt;/h2&gt;

&lt;p&gt;A few years ago I was working in Manhattan and a fellow programmer
   explained that at his wife's job all database write access had to go
   through stored procedures.  The idea was to ensure that business rules
   were always enforced and to prevent any programmer from wittingly
   or unwittingly violating the rules.  In the interest of full disclosure,
   I'll point out that I have never worked on such a system, and all of
   my knowledge of such systems is second-hand or third-hand.&lt;/p&gt;
   
&lt;p&gt;With that being said, the obvious up-side to this method is that you
   avoid forcing database access through your application, making things
   much more robust and extendable.  Further, you make sure, by coding 
   up routines that handle UPDATES and INSERTS to ORDERS and ORDER_LINES
   that the useful but troublesome ORDER_TOTAL value is always updated
   when it needs to be.   Further still, you can tie security to the
   stored procedures and control who can modify orders, which is a
   prime feature
   mentioned by everybody who has ever explained such a system to me.
&lt;/p&gt;

&lt;p&gt;There is a significant down-side if you intend to code the stored
   procedures manually.  My own experience is that server-side code is
   the most difficult to debug (please feel free to post a comment
   trumpeting your favorite debugger for stored procedures, I'm all
   ears).  
&lt;/p&gt;

&lt;p&gt;I have never been tempted to use a system like this because I 
   believe it is still exactly one level more complicated than it needs
   to be.  What I really want is to be able to directly code an INSERT
   to the ORDER_LINES table from any source and know the ORDER_TOTAL field
   will always be correct.  If that were possible, then all parties are
   liberated from inventing and then using any API except SQL.  Now of
   course many of us prefer to build some layer on top of SQL (myself
   included), but if the architecture supports direct SQL while enforcing
   business rules then all parties are free to use abstraction layers
   of their choosing, and nobody is forced to invent or accommodate 
   anything they do not wish to.
&lt;/p&gt;

&lt;h2&gt;Server-Side Strategy: Triggers&lt;/h2&gt;

&lt;p&gt;It is a simple technical fact that the tightest possible encapsulation
   of code and data occurs when you attach triggers to tables.  In our
   example of the ORDER_TOTAL value, any INSERT, UPDATE, or DELETE to
   the ORDER_LINES table would update the ORDER_TOTAL in the ORDERS
   table.  This approach gives maximum flexibility: you can directly 
   access the database without violating rules, and any player can use
   an abstraction layer of their choice, or none at all.  
&lt;/p&gt;
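
&lt;p&gt;Here is a minimal sketch of that arrangement.  It uses SQLite's
   trigger syntax, run through Python, purely so the example is
   self-contained; other servers spell their triggers differently.  The
   column "extended" and the trigger names are assumptions for the
   example.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total NUMERIC DEFAULT 0);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT, extended NUMERIC);

    CREATE TRIGGER order_lines_ins AFTER INSERT ON order_lines
    BEGIN
        UPDATE orders SET order_total = order_total + NEW.extended
         WHERE order_id = NEW.order_id;
    END;

    CREATE TRIGGER order_lines_upd AFTER UPDATE ON order_lines
    BEGIN
        UPDATE orders SET order_total = order_total - OLD.extended
         WHERE order_id = OLD.order_id;
        UPDATE orders SET order_total = order_total + NEW.extended
         WHERE order_id = NEW.order_id;
    END;

    CREATE TRIGGER order_lines_del AFTER DELETE ON order_lines
    BEGIN
        UPDATE orders SET order_total = order_total - OLD.extended
         WHERE order_id = OLD.order_id;
    END;
""")

# Plain SQL from any source; no application code touches ORDER_TOTAL.
db.execute("INSERT INTO orders (order_id) VALUES (1)")
db.execute("INSERT INTO order_lines VALUES (1, 'WIDGET', 10)")
db.execute("INSERT INTO order_lines VALUES (1, 'GADGET', 5)")
db.execute("UPDATE order_lines SET extended = 7 WHERE sku = 'GADGET'")
db.execute("DELETE FROM order_lines WHERE sku = 'WIDGET'")

total = db.execute("SELECT order_total FROM orders WHERE order_id = 1").fetchone()[0]
```
&lt;/pre&gt;

&lt;p&gt;After the insert, update and delete above, only the GADGET line
   remains at 7, and the stored total agrees without any caller having
   asked for it.
&lt;/p&gt;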

&lt;p&gt;Since many programmers find it very tedious and error-prone to code
   and debug server-side routines, this approach still faces a large
   obstacle if you intend to code the triggers by hand.  But this should 
   not be necessary when taking this approach, because all denormalization
   will follow patterns.  This is a theme that I tend to repeat over and
   over in these essays: your tables will all follow predictable patterns
   and your denormalizations will likewise follow patterns.  Whenever
   you have patterns you can have automation, and in this case that means
   generating the triggers instead of coding them by hand.
&lt;/p&gt;
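
&lt;p&gt;A generator for the SUM pattern can be very small.  The sketch
   below is not my own generator; the rule layout, the template, and the
   trigger naming scheme are assumptions chosen to show the idea, again
   using SQLite syntax so it can be run as-is.
&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

# Hypothetical dictionary entry describing one SUM denormalization.
SUM_RULES = [
    {"parent": "orders", "child": "order_lines", "key": "order_id",
     "target": "order_total", "source": "extended"},
]

TEMPLATE = """
CREATE TRIGGER {child}_{target}_ins AFTER INSERT ON {child}
BEGIN
    UPDATE {parent} SET {target} = {target} + NEW.{source}
     WHERE {key} = NEW.{key};
END;
CREATE TRIGGER {child}_{target}_del AFTER DELETE ON {child}
BEGIN
    UPDATE {parent} SET {target} = {target} - OLD.{source}
     WHERE {key} = OLD.{key};
END;
CREATE TRIGGER {child}_{target}_upd AFTER UPDATE ON {child}
BEGIN
    UPDATE {parent} SET {target} = {target} - OLD.{source}
     WHERE {key} = OLD.{key};
    UPDATE {parent} SET {target} = {target} + NEW.{source}
     WHERE {key} = NEW.{key};
END;
"""


def generate_triggers(rules):
    """Expand every rule in the dictionary into trigger DDL."""
    return "".join(TEMPLATE.format(**rule) for rule in rules)


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total NUMERIC DEFAULT 0);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT, extended NUMERIC);
""")
db.executescript(generate_triggers(SUM_RULES))

db.execute("INSERT INTO orders (order_id) VALUES (1)")
db.execute("INSERT INTO order_lines VALUES (1, 'WIDGET', 10)")
db.execute("INSERT INTO order_lines VALUES (1, 'GADGET', 5)")
db.execute("DELETE FROM order_lines WHERE sku = 'GADGET'")
total = db.execute("SELECT order_total FROM orders").fetchone()[0]
```
&lt;/pre&gt;

&lt;p&gt;Adding a second denormalized total then means adding one more
   entry to the rules, not writing three more triggers by hand.
&lt;/p&gt;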

&lt;p&gt;Another concern with this approach is security.  I have been stressing
   the inevitable need for expanded database access as your application
   matures, but if you let somebody in with full privileges, they could
   accidentally or maliciously cause huge damage if they can run willy-nilly
   wherever they want in the database.  The trigger-based approach 
   is the tightest possible way to enforce business rules, but it does
   nothing to address security.  And if you end up granting database
   access based on confidence in triggers, then you are forced into 
   enforcing security as well inside of the database -- but that is an
   essay for another day.
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Once we decide to denormalize then we are required to dream up a
   strategy to keep things correct going in.  The weakest strategies
   depend upon voluntary adherence to some set of conventions, and many
   strategies accept limitations in overall flexibility to reduce the
   threat from unknown elements.  The trigger option, not very popular
   these days, provides the tightest encapsulation of code and data,
   and lends itself well to code generation.
&lt;/p&gt;


&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;


&lt;p&gt;The normalization essays on this blog are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html"
    &gt;Revisiting Normalization and Denormalization&lt;/a&gt;.
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"&gt;Pay Me Now Or Pay Me Later&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"&gt;The Argument for Normalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"&gt;Third Normal Form and Calculated Values&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"&gt;The Argument for Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"&gt;Denormalization Patterns&lt;/a&gt;
    &lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;Keeping Denormalized Values Correct (this essay)&lt;/a&gt;&lt;/i&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"&gt;Triggers, Encapsulation and Composition&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"&gt;The Data Dictionary and Calculations, Part 1&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"&gt;The Data Dictionary and Calculations, Part 2&lt;/a&gt;
    
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-2420668866355842257?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/2420668866355842257/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=2420668866355842257' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2420668866355842257'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/2420668866355842257'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html' title='Keeping Denormalized Values Correct'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-4645622410868958041</id><published>2008-10-26T16:13:00.002-04:00</published><updated>2008-10-26T16:15:48.595-04:00</updated><title type='text'>Data and Code at the Application Level</title><content type='html'>&lt;p&gt;This week I would like to address the assertion that "code is data"
   and how the application developer might benefit or be harmed by
   this idea in the practical pursuit of deadlines and functioning
   code.  For some reason my essay written back in May, &lt;a href=
   "http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"
   &gt;Minimize Code, Maximize Data&lt;/a&gt; got picked up on the blogosphere
   on Thursday, and comments on &lt;a href=
   "http://news.ycombinator.com/item?id=343633"&gt;ycombinator&lt;/a&gt;,
   on &lt;a href=
   "http://www.reddit.com/r/programming/comments/7927o/the_database_programmer_minimize_code_maximize/"
   &gt;reddit.com&lt;/a&gt;, and on the post itself 
   have suggested the thesis is flawed or unworkable
   because "code is data."  Let's take a look at that.
&lt;/p&gt;

&lt;h2&gt;Credit Where Credit Is Due&lt;/h2&gt;

&lt;p&gt;I first heard the thesis "Minimize Code, Maximize Data" from 
   &lt;a href=
   "http://en.wikipedia.org/wiki/Neil_Pappalardo"
   &gt;A. Neil Pappalardo&lt;/a&gt;.  I consider it the "best kept secret in
   programming" because I personally have found it to be almost 
   completely absent from my own day-to-day experience with other
   programmers.
&lt;/p&gt;

&lt;p&gt;However, &lt;a href="http://www.reddit.com/user/glomek/"&gt;glomek&lt;/a&gt; over
   at reddit.com also credits Eric Raymond with the following quote, 
   "Smart data structures and dumb code works a lot better than the
   other way around."
&lt;/p&gt;

&lt;p&gt;Also, &lt;a href="http://news.ycombinator.com/user?id=sciolizer"&gt;sciolizer&lt;/a&gt;
   over on the &lt;a href="http://news.ycombinator.com/item?id=343633"
   &gt;news.ycombinator.com&lt;/a&gt; comments area gives us these quotes from 
   some of the greats:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;
  Fred Brooks: "Show me your flow charts and conceal your tables and I
    shall continue to be mystified, show me your tables and I won't 
    usually need your flow charts; they'll be obvious."
&lt;li&gt;Rob Pike: "Data dominates. If you've chosen the right data structures
   and organized things well, the algorithms will almost always be 
   self-evident. Data structures, not algorithms, are
   central to programming."
&lt;li&gt;Eric S. Raymond (again): "Fold knowledge into data, so program
   logic can be stupid and robust."
&lt;li&gt;Peter Norvig: "Use data-driven programming, where pattern/action
   pairs are stored in a table."
&lt;/ul&gt;

&lt;p&gt;And finally, &lt;a href="http://canonical.org/~kragen"&gt;Kragen Javier 
   Sitaker&lt;/a&gt; left a comment on my original essay mentioning Tim
   Berners-Lee and his theory of "least power."  You can read a 
   description of that
   &lt;a href="http://www.w3.org/DesignIssues/Principles.html"&gt;here&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;So Why Do They Say Code is Data?&lt;/h2&gt;

&lt;p&gt;The surprising answer is that code &lt;i&gt;is&lt;/i&gt; data, &lt;i&gt;in particular contexts
   and when trying to accomplish certain tasks.&lt;/i&gt;  The contexts do not
   include application development, and the tasks do not involve 
   storing of customer information, but the fact remains true for
   those who work in the right contexts.
&lt;/p&gt;

&lt;p&gt;As an example, at the bottom layer of the modern computer
   are the physical devices
   of CPU and RAM.  Both the computer program being executed and the
   data it operates on are stored in RAM in the same way.  This is called
   the &lt;a href="http://en.wikipedia.org/wiki/Von_Neumann_architecture"&gt;
   Von Neumann architecture&lt;/a&gt;.  It's a fascinating study and a programmer
   can only be improved by understanding it.  At this level code is
   data in the most fundamental ways.  There are many many other 
   contexts and tasks for which it is true that code is data.
&lt;/p&gt;

&lt;p&gt;But we who create applications for customers are separated from
   Von Neumann by decades.  These decades have seen a larger and larger
   stack of tools that allow us to concentrate on specialized tasks
   without worrying about how the tools below are doing their jobs.
   One of the most significant sets of tools that we use allows us
   to cleanly separate code from data and handle them differently.
&lt;/p&gt;

&lt;h2&gt;The One and Only Difference&lt;/h2&gt;

&lt;p&gt;Trying to explain the differences between code and data is like
   trying to explain the differences between a fish and a bicycle.
   You can get bogged down endlessly explaining the rubber tire
   of the wheel, which the fish does not even have, or explaining
   the complexity of the gills, which the bicycle does not even
   have.
&lt;/p&gt;

&lt;p&gt;To avoid all of that nonsense I want to go straight to what
   data is and what code is.  The differences after that are
   apparent.
&lt;/p&gt;

&lt;p&gt;Data is an inert record of fact.  It does nothing but sit
   there.  
&lt;/p&gt;

&lt;p&gt;A program is the actor, the agent, the power.  The application
   program picks up the data, shakes it, polishes it, and puts it
   down somewhere else (as in picking it up from the db server,
   transforming it into HTML, and delivering it to a browser).
&lt;/p&gt;

&lt;p&gt;To repeat: data is facts.  Code is actions that operate on 
   facts.  The one and only difference is simply that they are not the same
   thing at all, they are a fish and a bicycle. 
&lt;/p&gt;

&lt;h2&gt;Exploiting The Difference&lt;/h2&gt;

&lt;p&gt;All of the quotes listed above, and my original essay in May on the
   subject, try to bring home a certain point.  This point is simply
   that &lt;i&gt;the better class of programs are those that begin with
   a distinction between fact and action, and seek first to organize
   the facts and only then to plan the actions.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;Put another way, it is of enormous practical advantage to the
   programmer to fully understand that first and always he is 
   manipulating facts (data).  If he ignores the principles of how facts are
   organized and operated on, he can never reach his full abilities
   as a programmer.  Only when he understands how the facts are
   organized can he see the clearest program designs.
&lt;/p&gt;

&lt;p&gt;And again: Understand the
   facts first.  From there design your data structures.  After that
   the algorithms write themselves.  
&lt;/p&gt;

&lt;h2&gt;Minimizing and Maximizing&lt;/h2&gt;

&lt;p&gt;The specific advice to minimize code and maximize data is nothing
   more than taking the idea to its logical conclusion.  If I write
   program X so that the data structures are paramount, and I find the
   algorithms to be simple (or "dumb" as ESR would say), easy to write
   and easy to maintain, don't I want to do that all of the time?
&lt;/p&gt;
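
&lt;p&gt;A small sketch may help show what this looks like in practice.
   The field names and rules below are invented for the example; the
   point is only that the facts about each field sit in a table while
   the code that applies them stays short and "dumb."
&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical example: the facts about each field live in a table (data),
# and one small loop (code) applies them.

FIELD_RULES = {
    "name":  {"required": True,  "max_length": 40},
    "email": {"required": True,  "max_length": 80},
    "notes": {"required": False, "max_length": 500},
}


def validate(record, rules=FIELD_RULES):
    """Return a list of error messages for one record."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field, "")
        if rule["required"] and not value:
            errors.append(f"{field} is required")
        if len(value) > rule["max_length"]:
            errors.append(f"{field} is longer than {rule['max_length']} characters")
    return errors


errors = validate({"name": "", "email": "a@example.com"})
```
&lt;/pre&gt;

&lt;p&gt;Supporting a new field, or changing a limit, means editing the table
   and leaving the loop alone.
&lt;/p&gt;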

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The wise programmer is one who can take the wisdom and theory
   of the industry and correctly judge what is appropriate and
   applicable and what is not.  This is the programmer who has a shot
   at keeping focused, making budget and making deadlines.  He 
   knows when a generalized routine will support the overall 
   project and when to just code the case at hand and move on.
&lt;/p&gt;

&lt;p&gt;The unwise programmer is one who cannot properly apply a theoretical
   concept to the correct context, or cannot judge the context in
   which a concept is appropriate.  He is the one who produces
   mammoth abstractions, loses sight of the end-goals of the
   check-signers and end-users, and never seems to be able to
   make the deadline.
&lt;/p&gt;

&lt;p&gt;Of all of the advice I have received over the years, one of the
   most useful and productive has been to "minimize code, maximize
   data."  As an application developer and framework developer it has
   served me better than most.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-4645622410868958041?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/4645622410868958041/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=4645622410868958041' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4645622410868958041'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4645622410868958041'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/10/data-and-code-at-application-level.html' title='Data and Code at the Application Level'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-4160042624476522436</id><published>2008-10-19T19:23:00.003-04:00</published><updated>2010-11-28T22:12:04.938-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='denormalization'/><title type='text'>The Argument For Denormalization</title><content type='html'>&lt;p&gt;There are links to related essays on normalization and denormalization at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Non-normalized, Normalized and Denormalized&lt;/h2&gt;

&lt;p&gt;A &lt;b&gt;non-normalized&lt;/b&gt; database is a disorganized one, where nobody
   has bothered to work out where the facts should be stored.  It is like
   a stack of paper files that has been tossed down the stairs.  We 
   are not interested in non-normalized databases.
&lt;/p&gt;

&lt;p&gt;A &lt;b&gt;normalized&lt;/b&gt; database has been organized so that each fact is
   stored in exactly one place (2NF and greater) and no more than one fact
   is stored in each place (1NF).  In a normalized database there is
   a place for everything and everything is in its place.
&lt;/p&gt;

&lt;p&gt;A &lt;b&gt;denormalized&lt;/b&gt; database is a normalized database that has
   had redundancies deliberately re-introduced for some practical
   gain.
&lt;/p&gt;

&lt;p&gt;Most denormalizing means adding columns to tables that
   provide values you would otherwise have to calculate 
   as needed.  Values are copied from table to table, calculations
   are made within a row, and totals, averages and other aggregations
   are made between child and parent tables.
&lt;/p&gt;


&lt;h2&gt;Related Essays&lt;/h2&gt;

&lt;p&gt;If you are a first-time reader of this blog, I recommend taking
   a look at &lt;a href=
   "http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"
   &gt;Third Normal Form and Calculated Values&lt;/a&gt; and &lt;a href=
   "http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"
   &gt;Denormalization Patterns&lt;/a&gt;, which cover issues related to
   today's post.
&lt;/p&gt;

&lt;h2&gt;The Practical Problems Of Normalization&lt;/h2&gt;

&lt;p&gt;There are four practical problems with a fully normalized 
   database, three of which I have listed before.  I will list
   them all here for completeness:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;b&gt;No calculated values&lt;/b&gt;.  Calculated values are a fact
    of life for all applications, but a normalized database lacks
    them.  The burden of providing calculated values must be taken
    up by somebody somehow.  Denormalization is one approach to
    this, though there are others.
    &lt;li&gt;&lt;b&gt;Non-reproducible Calculations&lt;/b&gt;.  If you do not store
    calculated values in your database, your application must
    generate them on the fly as needed.  If your application changes
    over time, you risk not being able to reproduce prior results
    when the business rules drift far enough from the original.
    &lt;li&gt;&lt;b&gt;Join Jungles&lt;/b&gt;.  When each fact is stored in exactly
    one place, you may find it daunting to pull together everything
    needed for a certain query.  A query joining 4, 5, 7 or even 12
    tables may be required for something the end-user considers
    trivial and easy.  Such queries are hard to code, hard to debug,
    and dangerous to alter.
    &lt;li&gt;&lt;b&gt;Performance&lt;/b&gt;.  When you face a JOIN jungle you almost
    always face performance problems.  A JOIN is a very expensive
    operation compared to a single-table read, and the more JOINs you
    have the worse it gets.
&lt;/ol&gt;

&lt;h2&gt;The Convenience Argument&lt;/h2&gt;

&lt;p&gt;The convenience argument addresses the first problem listed
   above, no calculated values.  When calculated values are
   generated and added to tables, it is far easier for downstream
   programmers (including members of the customer's IT department)
   to generate their own reports and ad-hoc queries.  It is also
   much easier for members of the original team to generate
   display pages and reports.
&lt;/p&gt;

&lt;p&gt;This convenience is not a result of the simple presence of
   the calculated values.  The convenience stems from the fact
   that the downstream programmers do not have to get involved
   in code that generates or calculates the values.  They do not
   have to know anything about the API, the language the app
   was written in, or anything else, they just have to pull the
   data they need.
&lt;/p&gt;

&lt;p&gt;This convenience goes beyond the programmers to semi-technical
   users who may want to use their favorite 3rd party reporting
   tool (like Crystal Reports) to query the database.  If your
   application API will not work with their favorite tool
   (or if you don't have an API), then you have a disappointed
   customer.  But if the data is right there in tables they
   can pretty much use anything.
&lt;/p&gt;

&lt;p&gt;At this point you may be saying, sure, that's fine, but views
   get all of this done without denormalizing.  That is true,
   but when we go on to the next 3 arguments we will see something
   of why denormalizing often wins out over views.
&lt;/p&gt;

&lt;h2&gt;The Stability Argument&lt;/h2&gt;

&lt;p&gt;Every healthy computer program changes and grows as new users
   and customers make use of it.  During this process it is 
   inevitable that later customers will request significant changes
   to very basic functions that were coded early on and are
   considered stable.  When this happens the programmers have
   the daunting task of providing the original functionality
   unchanged for established customers, while providing the new
   functionality for the newer customers.
&lt;/p&gt;

&lt;p&gt;Denormalizing can help here.  When derived values are calculated
   during write operations and put directly into the database, they
   can basically stay there forever unchanged.   When a significant
   new version brings newer code to older users, there
   is no need to fear that an invoice printed last week will
   suddenly come out with different numbers.
&lt;/p&gt;
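
&lt;p&gt;As a minimal sketch of the idea, assume a hypothetical
   invoices table whose tax and total columns are filled in when the
   row is written.  The table name, column names and numbers here are
   made up for illustration.
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical table: tax and total are computed once, at write time
INSERT INTO invoices (invoice_id, subtotal, tax_rate, tax, total)
VALUES (1001, 200.00, 0.06, 12.00, 212.00);

-- A later release may calculate tax differently for new invoices,
-- but reprinting invoice 1001 only reads the stored numbers.
SELECT subtotal, tax, total
  FROM invoices
 WHERE invoice_id = 1001;
&lt;/pre&gt;

&lt;p&gt;Because the read does no arithmetic, new calculation code has
   nothing to recalculate on old rows.
&lt;/p&gt;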

&lt;p&gt;Of course, a bug in the calculation code still means that
   future calculations are wrong, and the worst case is when the
   bug gets out to production and writes bad calculated values
   into the tables.  When this happens you
   face the prospect of fixing bad data on a live system.  This is
   definitely my least favorite thing to do.
&lt;/p&gt;

&lt;h2&gt;The Simple Queries Argument&lt;/h2&gt;

&lt;p&gt;The third problem listed above is JOIN jungles: queries that 
   involve so many JOINs that they become impractical to write,
   difficult to debug, and dangerous to change.
&lt;/p&gt;

&lt;p&gt;When you denormalize a database by copying values around
   between parent and child tables, you reduce the number of
   JOINs that are required.  Very obvious examples include things
   like copying an item's price onto an order_lines table when
   a customer puts an item in their cart.  Each time you copy
   a fact from one table to another, you eliminate the need for
   a JOIN between those two tables.  Each eliminated JOIN leaves
   a simpler query that is easier to get right the first time,
   easier to debug, and easier to keep correct when changed.
&lt;/p&gt;
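
&lt;p&gt;To make the price example concrete, here is a rough sketch
   that assumes a hypothetical items table (sku, price) and an
   order_lines table (order_id, sku, qty, price).  The column names
   are my own assumptions, chosen only to show the shape of the two
   queries.
&lt;/p&gt;

&lt;pre&gt;
-- Normalized: price lives only on items, so an order total needs a JOIN
SELECT ol.order_id, SUM(ol.qty * i.price) AS order_total
  FROM order_lines ol
  JOIN items i ON i.sku = ol.sku
 GROUP BY ol.order_id;

-- Denormalized: price was copied onto order_lines when the line
-- was written, so the same total is a single-table read
SELECT order_id, SUM(qty * price) AS order_total
  FROM order_lines
 GROUP BY order_id;
&lt;/pre&gt;

&lt;p&gt;The two queries agree only while item prices stay the same.
   After a price change the second one still reports the price that
   was copied when the line was written.
&lt;/p&gt;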
   
&lt;p&gt;This argument also goes directly back to the convenience 
   argument.  If that huge customer you just landed is happy to
   hear that they can use Crystal Reports to generate reports,
   you may still face disappointment when they find the reports
   involve "too many tables" from their perspective for reports
   that "ought to be simple".  
&lt;/p&gt;

&lt;h2&gt;The Performance Argument&lt;/h2&gt;

&lt;p&gt;The final argument proceeds from our fourth problem listed
   above.  Normalized databases require a lot more JOINs than
   denormalized databases, and JOINs are very expensive.
   This means that, overall, any operation that reads
   and presents data will be more expensive in a normalized
   database than a denormalized one.  
&lt;/p&gt;

&lt;p&gt;Once we reduce the JOINs by copying data between tables,
   we end up improving performance because we need fewer JOINs
   to retrieve the same number of facts.
&lt;/p&gt;

&lt;p&gt;Denormalization is not the only way to get the convenience
   of copied values and calculated values.  Views and materialized
   views are the most often mentioned alternatives.  The
   choice between denormalizing and using views has a lot to do
   with the &lt;a href=
   "http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"
   &gt;Pay Me Now or Pay Me Later&lt;/a&gt; decision.
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Denormalization is not an absolute: it is not one of those things
   that all wise experienced programmers always do, and it is not 
   something that only fools ignore.  The four arguments listed here
   have guided me well in deciding when to denormalize (and when not to),
   and I hope that they are of some benefit to you when you face the
   same decisions.
&lt;/p&gt;

&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;The normalization essays on this blog are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html"
    &gt;Revisiting Normalization and Denormalization&lt;/a&gt;.
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"&gt;Pay Me Now Or Pay Me Later&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"&gt;The Argument for Normalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"&gt;Third Normal Form and Calculated Values&lt;/a&gt;
    &lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"&gt;The Argument for Denormalization (this essay)&lt;/a&gt;&lt;/i&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"&gt;Denormalization Patterns&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;Keeping Denormalized Values Correct&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"&gt;Triggers, Encapsulation and Composition&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"&gt;The Data Dictionary and Calculations, Part 1&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"&gt;The Data Dictionary and Calculations, Part 2&lt;/a&gt;
    
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-4160042624476522436?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/4160042624476522436/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=4160042624476522436' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4160042624476522436'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4160042624476522436'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html' title='The Argument For Denormalization'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-8570670664659852059</id><published>2008-10-12T12:47:00.002-04:00</published><updated>2010-11-27T13:51:31.248-05:00</updated><title type='text'>The Argument For Normalization</title><content type='html'>&lt;p&gt;This week we will review the practical arguments
   in favor of normalization.  The major concern as always on this
   blog is to examine database decisions in light of how they affect
   the overall application.  The major argument for normalization is
   very simple: you end up coding less, coding easier, and coding
   stronger, and you end up with fewer data errors.
&lt;/p&gt;

&lt;p&gt;There are links to related essays on normalization and denormalization at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;


&lt;h2&gt;Informal Description of Normalization&lt;/h2&gt;

&lt;p&gt;When I find a programmer who is stuck trying to grasp the concepts
   of normalization, the problem usually comes down to not being able
   to see the big picture.  The programmer may read any number of 
   rigorous papers on the subject (the &lt;a href=
"http://en.wikipedia.org/wiki/Database_normalization"
   &gt;Wikipedia article&lt;/a&gt; is a good place to start) but still be unable
   to get the basic point.  This leaves the programmer stumbling through
   table design, second-guessing himself, and then running through a
   frustrating sequence of redesigns.
   In the worst case it leads him to conclude normalization
   may not be worth the effort, at which point he starts writing really
   crappy applications.
&lt;/p&gt;

&lt;p&gt;The goal of normalization in simple terms is just this: to store each
   fact in exactly one place.  When you put each fact in only one place, you
   always know where to go to read it or write it.  When facts are 
   repeated in the database, the application programmer has an increased
   burden to make sure they are all consistent. 
   If he fails to shoulder this
   burden completely, the database will have inconsistent values for the
   same facts, leading to emergency phone calls and emails requesting
   help.  &lt;i&gt;These requests for help always come at 4:30pm as you are 
   getting ready for a date or an extended vacation.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;Once the programmer understands this very simple goal, he usually 
   has an "aha!" moment reading the various articles on normalization.
   Each rule for normalizing suddenly makes more sense, as it appears
   as just another way to make sure that there is a place for 
   everything and everything is in its place.
&lt;/p&gt;

&lt;h2&gt;The Programmer's Burden&lt;/h2&gt;
   
&lt;p&gt;When a programmer is dealing with a non-normalized database, he is
   going to run into 4 problems over and over again.  Three of these
   are called "anomalies" and the fourth is "inconsistency" (a
   fancy way of saying the database holds bad data).  Most authors 
   who write on normalization take it for granted that the reader 
   can readily see why the anomalies are bad, but I would like to
   spell it out here to make it crystal clear.  First we will look
   at the three kinds of anomalies, then we will go to the inconsistency
   problem, and see exactly how they affect the programmer.
&lt;/p&gt;

&lt;p&gt;Consider a fledgling programmer who has read too many Web 2.0
   blogs saying that relational databases are bad, and so has not 
   bothered to learn anything about them.  In the name of "simplicity",
   he creates a single table that lists employees, their email addresses,
   the customers they are assigned to, and the primary 
   email address of each customer.  This will lead to three kinds of
   anomaly, each of which leads to inconsistency.
&lt;/p&gt;
   
&lt;p&gt;An &lt;b&gt;Update Anomaly&lt;/b&gt; occurs when a fact is stored in 
   multiple locations and a user is able to change one without changing
   them all.  If a user goes to this employee-customer table and
   changes an employee's email on only one row, and no provision is
   made to change the others, then the database now has inconsistent
   values for the employee's email.
&lt;/p&gt;
   
&lt;p&gt;An &lt;b&gt;Insert Anomaly&lt;/b&gt; occurs when it is not actually
   possible to record a fact.  If an employee is hired but not yet
   assigned to any customers, it is not possible to store
   his email address!
&lt;/p&gt;

&lt;p&gt;A &lt;b&gt;Delete Anomaly&lt;/b&gt; occurs when the user deletes one
   fact and clobbers some other fact along the way.  If an employee
   goes on leave, so that we must remove (delete) their assignments,
   then we have lost their email address!
&lt;/p&gt;
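
&lt;p&gt;The three anomalies can be sketched in SQL.  The table name,
   column names and customer email addresses below are hypothetical,
   and I am assuming the programmer at least declared a primary key
   on employee plus customer.
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical single-table design, for illustration only
CREATE TABLE employee_customers (
    employee        VARCHAR(20) NOT NULL,
    employee_email  VARCHAR(60),
    customer        INTEGER     NOT NULL,
    customer_email  VARCHAR(60),
    PRIMARY KEY (employee, customer)
);

INSERT INTO employee_customers
VALUES ('ARANDOLPH', 'art@praxis.com', 100, 'ap@customer100.example');
INSERT INTO employee_customers
VALUES ('ARANDOLPH', 'art@praxis.com', 523, 'ap@customer523.example');

-- Update anomaly: only one of Art's two rows is changed, so the
-- table now holds two different email addresses for him.
UPDATE employee_customers
   SET employee_email = 'arandolph@praxis.com'
 WHERE employee = 'ARANDOLPH'
   AND customer = 100;

-- Insert anomaly: a new hire has no customer yet, so there is no
-- value for the NOT NULL customer column and a server that
-- enforces the constraint rejects this statement.
INSERT INTO employee_customers (employee, employee_email)
VALUES ('SRUSSELL', 'sax@overlook.edu');

-- Delete anomaly: removing Art's assignments removes every row
-- that held his email address.
DELETE FROM employee_customers
 WHERE employee = 'ARANDOLPH';
&lt;/pre&gt;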
   
&lt;p&gt;This non-normalized database requires the programmer to write 
   additional application code to try to intercept and correct these
   issues.  This is the &lt;i&gt;Programmer's Burden&lt;/i&gt; in a 
   non-normalized situation, and it gets worse and worse as the program
   expands.
&lt;/p&gt;

&lt;p&gt;The Programmer's Burden also emerges as a continuing stream of
   complaints from users that "the program is wrong."  For every case
   where the programmer fails to provide exception-handling code, a
   user will stumble across inconsistent data.  The customer says,
   "it says 'X' on this  screen but it says 'Y' on that screen,
   what's going on?"  As far as they are concerned it is a bug 
   (which of course it is) and must be fixed.  You can't make money
   coding new features when you are fixing garbage like that.
&lt;/p&gt;

&lt;h2&gt;The Basic Argument&lt;/h2&gt;

&lt;p&gt;So the basic argument for normalization is:
   we wish to avoid the Programmer's Burden as completely as possible.
   We want to spend our time on cool features, not going back over and
   over to fix features we thought were finished already.
&lt;/p&gt;

&lt;h2&gt;Special Comment on First Normal Form&lt;/h2&gt;

&lt;p&gt;First normal form is different from the others.  When a database
   designer violates the higher normal forms, the result is that
   a fact is recorded in more than one place.  However, when you 
   violate first normal form it results in &lt;i&gt;more than one fact in
   the same place.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;A basic example would be the same table of employees and customers,
   where we "solve" the problems listed
   above by storing only one row
   for each employee, with a comma-separated list of accounts, like so:
&lt;/p&gt;

&lt;pre&gt;
EMPLOYEE   EMAIL                     CUSTOMERS
------------------------------------------------------------
ARANDOLPH  art@praxis.com            100, 523, 638, 724
SRUSSELL   sax@overlook.edu          516, 123, 158
PBOYLE     phyllis@sp-elevataor.com  713, 928, 212
&lt;/pre&gt;

&lt;p&gt;The above scheme increases the Programmer's Burden because now
   he must &lt;i&gt;decompose&lt;/i&gt; the data that comes from the server.
   In technical terms we say that the value CUSTOMERS is
   &lt;i&gt;non-atomic&lt;/i&gt;: it is not a single fact.  Every piece of
   code that touches that table must break down the list of
   customers and sometimes reassemble it.  
&lt;/p&gt;

&lt;p&gt;To see this, consider the basic task of adding a customer for
   employee Art Randolph.  If the tables were set up properly,
   you would insert into a cross-reference of employees and 
   customers, and duplicates would be trapped by a primary key.
   But here you must retrieve the list of existing customers, 
   split it up in application code, and check that the value is
   not repeated.  Then you have to collapse the list back down and
   send it up to the server.  
&lt;/p&gt;

&lt;p&gt;All I can say is, no thanks.&lt;/p&gt;
   
&lt;h2&gt;By The Way, What Is The Right Way?&lt;/h2&gt;

&lt;p&gt;Now that we have beaten up our fledgling programmer's lousy
   employee-customer table, it would be worthwhile to spell
   out how to do it correctly.
&lt;/p&gt;

&lt;p&gt;First off, we always need one table for each kind of thing
   we are keeping track of.  That means we will have a table of
   employees and a table of customers.  This solves all of the
   anomalies and inconsistencies listed above because we
   put facts about employees in the employees table (like email
   address) and facts about customers in the customers table.
&lt;/p&gt;

&lt;p&gt;This leaves the issue of linking employees to customers.
   There are three ways to do it:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;If each customer gets a team of employees assigned to
    them, but an employee only ever works for one customer, then
    put a &lt;a href=
"http://database-programmer.blogspot.com/2007/12/database-skills-foreign-keys-this-is.html"
    &gt;Foreign Key&lt;/a&gt; on the employees table that links to the
    customers table.
    &lt;li&gt;If each employee works on more than one customer, but
    each customer gets only one employee, then put a 
    foreign key on the customers table that links back to
    employees.
    &lt;li&gt;If an employee can work for more than one customer and
    vice-versa, make a &lt;a href=
"http://database-programmer.blogspot.com/2008/01/database-skills-sane-approach-to.html#rule5"
    &gt;Cross-reference&lt;/a&gt; between customers and employees.
&lt;/ol&gt;
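
&lt;p&gt;Here is a minimal sketch of the third option, the
   cross-reference.  The table names, column names and column types
   are assumptions made for the example, and so is customer 815.
&lt;/p&gt;

&lt;pre&gt;
-- Facts about employees go here and nowhere else
CREATE TABLE employees (
    employee  VARCHAR(20) PRIMARY KEY,
    email     VARCHAR(60)
);

-- Facts about customers go here and nowhere else
CREATE TABLE customers (
    customer  INTEGER PRIMARY KEY,
    email     VARCHAR(60)
);

-- Cross-reference: one row per employee-customer assignment
CREATE TABLE employees_x_customers (
    employee  VARCHAR(20) REFERENCES employees (employee),
    customer  INTEGER     REFERENCES customers (customer),
    PRIMARY KEY (employee, customer)
);

-- Adding a customer for Art Randolph is one insert (assuming both
-- parent rows already exist).  Running the same insert a second
-- time is rejected by the primary key.
INSERT INTO employees_x_customers (employee, customer)
VALUES ('ARANDOLPH', 815);
&lt;/pre&gt;

&lt;p&gt;Compare this to the comma-separated list above: there is
   nothing to split, check, or reassemble in application code.
&lt;/p&gt;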


&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This week we have seen a fairly simple argument for normalization,
   and one that regular readers of this blog have seen before: 
   normalization eliminates unnecessary coding burdens.  It is hard
   enough to get software projects done on time and on budget without
   imposing additional labor that could be avoided entirely by
   normalizing. 
&lt;/p&gt;

&lt;p&gt;I do not mean to imply that normalizing takes &lt;i&gt;no time&lt;/i&gt;
   or is instantly easier than a fear-based retreat into coding
   your way out of things.  It does take time to learn to normalize
   and it does take time to learn to code an application around
   normalized tables.   In my own experience I passed through the
   various erroneous mindsets that I make fun of in this blog, and 
   each time I put effort into learning the "right way" then every
   effort I made after that was forever easier, had fewer bugs, and
   made my customers happier.  So I am not saying it is free, but
   I am saying it is one of the best bargains in town.
&lt;/p&gt;


&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;



&lt;p&gt;The normalization essays on this blog are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html"
    &gt;Revisiting Normalization and Denormalization&lt;/a&gt;.
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"&gt;Pay Me Now Or Pay Me Later&lt;/a&gt;
    &lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"&gt;The Argument for Normalization (this essay)&lt;/a&gt;&lt;/i&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"&gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"&gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"&gt;Third Normal Form and Calculated Values&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"&gt;The Argument for Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"&gt;Denormalization Patterns&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"&gt;Keeping Denormalized Values Correct&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"&gt;Triggers, Encapsulation and Composition&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"&gt;The Data Dictionary and Calculations, Part 1&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"&gt;The Data Dictionary and Calculations, Part 2&lt;/a&gt;
    
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-8570670664659852059?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/8570670664659852059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=8570670664659852059' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8570670664659852059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8570670664659852059'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html' title='The Argument For Normalization'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-6805318746973220049</id><published>2008-09-28T18:27:00.003-04:00</published><updated>2010-11-28T22:15:39.934-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Philosophy'/><title type='text'>The Quest for the Absolute</title><content type='html'>&lt;p&gt;This is the Database Programmer blog, for anybody who wants
   practical advice on database use.&lt;/p&gt;

&lt;p&gt;There are links to other essays at the &lt;a href="#bottom"&gt;bottom of this post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;


&lt;p&gt;Today I am taking a huge detour from technical matters to lay out the
philosophical groundwork behind this blog.  The ideas presented today
lie beneath every essay on this site.  It is easy to observe that people
seem driven to formulate absolute truths to guide their pursuits.
Programming is no different: programmers are driven to find the 
absolutes that will universally guide their efforts.  Those absolutes
are not that hard to find, if you know the method for seeking them
out.  Fortunately, we have hundreds and thousands of years
of human efforts, both successes and failures, to draw upon when 
embarking upon the task.
&lt;/p&gt;


&lt;h2&gt;Absolutes in the Post-Modern Age&lt;/h2&gt;

&lt;p&gt;Academics refer to our current stage of history as the "Post-Modern"
   age.  Thinking in the post-modern age is dominated by a 
   deep mistrust
   of the very concept of absolute truth.  Many thinkers have noted
   that in the post-modern age the only absolute is that there
   are no absolutes.  Now, anybody who has not 
   bothered to read much past what they are handed likely believes
   much of this without even thinking about it; they may not know that
   in the history of the human race such thinking is less than 60
   years old.
&lt;/p&gt;

&lt;p&gt;But that "no absolutes" stuff is all nonsense at 
   best and downright cowardice at
   worst.  If you want an example of an absolute truth, try stepping
   off the edge of a cliff: even if you do not believe in gravity,
   gravity believes in you.  It is an absolute truth for me that if
   I do not take care of my customers my life becomes unpleasant.
   It is a further absolute truth for me that I constantly observe
   programmers proclaiming absolutes (always use relational, always
   use OO, etc).  When I stop observing it, then I suppose it won't
   be an absolute anymore (and I suppose then it never was?)
&lt;/p&gt;

&lt;p&gt;So let us now cheerfully ignore the wailing of those who cry 
   that there are no absolutes, and ask if we might discover some
   elements of software development strategy that hold true always
   (ok, maybe mostly always) for the
   context of database application development.
&lt;/p&gt;
   

&lt;h2&gt;Aristotle and Virtue&lt;/h2&gt;

&lt;p&gt;Nowadays nobody has to read philosophy much anymore, at least not where
   I live (in the United States), so most programmers have never heard of
   a man named Aristotle, who lived more than 2300 years ago.  This is a shame,
   because Aristotle had a logical way of thinking about things that
   would warm the heart of any programmer.
&lt;/p&gt;

&lt;p&gt;One of Aristotle's major contributions to civilization was his
   formulation of what philosophers call "virtue".  Philosophers use the
   term in a technical sense, and they do not use "virtuous" to mean
   "nice" or "pleasant" or "good-natured."  To a philosopher (or at least
   those that taught me) something is virtuous in Aristotelean terms if
   it &lt;i&gt;performs its function well&lt;/i&gt;.  The standard classroom example
   is that a virtuous table serves the function of a table, and a 
   virtuous table maker is somebody who makes good tables.
&lt;/p&gt;

&lt;p&gt;This is a very useful concept for programmers.  If we want to speak
   of a "virtuous" program, we mean simply one that meets its goals.
   This takes the whole high-minded theory and philosophy stuff back
   to real down-to-earth terms.  (This is why I always preferred Aristotle
   to Plato).  
&lt;/p&gt;

&lt;p&gt;In the quest for the absolute, if we let the ancient philosophers
   guide us, we discover the surprisingly basic idea that our programs
   &lt;i&gt;should perform their functions well if they are to be called
   virtuous.&lt;/i&gt;  This is easy to swallow, easy to understand, and
   easy to flesh out.
&lt;/p&gt;


&lt;h2&gt;What is a Virtuous Computer Program?&lt;/h2&gt;

&lt;p&gt;A virtuous computer program is one that serves its purpose well, and
   so we need to flesh out the three purposes that are common to most
   programs:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;To meet some institutional or strategic goal of those who
        sign the checks (or accept the work as charity in some cases).
    &lt;li&gt;To meet the goals of end-users, which almost always comes down
        to performance and ease-of-use.
    &lt;li&gt;To provide income for the developers (or meet their own goal of
        providing charity work for non-profits).
&lt;/ul&gt;

&lt;p&gt;Notice what is not on the list: things like &lt;i&gt;ensure all data resides
   in a relational database&lt;/i&gt;, or &lt;i&gt;implement all code in strictly
   object-oriented languages&lt;/i&gt;.  We are not nearly ready to consider
   such specific strategies as those; they are completely out of place
   here in a discussion of the unifying goals of all projects.
&lt;/p&gt;

&lt;p&gt;So let's review.  So far we know that the 
   absolutes of programming are the pursuit
   of virtue, which turns out to be a fancy way of saying that the program
   should perform its functions well, which turns out to mean simply that
   it should do what the check-signer asked for, in a way that is workable
   for the end-users, and at a price that keeps the programmer fed.
&lt;/p&gt;

&lt;p&gt;This leads us towards strategies for reaching those goals.&lt;/p&gt;

&lt;h2&gt;The Virtuous Programming Strategy&lt;/h2&gt;

&lt;p&gt;Continuing with the idea that a virtuous program meets its basic
   goals, we can say that a virtuous strategy smooths the way for
   a programmer to meet the basic goals.
   An unvirtuous (or just plain bad) strategy
   litters the path with obstructions or ends up not meeting the
   goals of the check-signer, end-users, programmer, or all of the
   above.
&lt;/p&gt;

&lt;p&gt;Before we can begin to formulate a strategy, we must look next
   at the reality of the programming world.  Some of the fundamental
   realities include (but are not limited to):
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The end-user or check-signer may not fully understand or
        be able to articulate their requirements.
    &lt;li&gt;The programmer may not correctly understand requirements,
        even when correctly articulated.
    &lt;li&gt;In a healthy prosperous situation there will be new
        requirements that interact with established requirements
        in ways that range from no interaction at all to 
        fiendish incompatibilities.
    &lt;li&gt;The world will change around you, creating demands that did
        not exist when the system was created (some of us can still
        remember when there was no internet).
    &lt;li&gt;Staff will come and go.
    &lt;li&gt;...and so on.
&lt;/ul&gt;

&lt;p&gt;So even before we begin formulating particular strategies for particular
   situations, we recognize that our strategy has underlying goals it
   must facilitate, such as:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Being easy to change, both for correcting mistakes
        and adding features.
    &lt;li&gt;Being able to maintain and sort out possibly contradictory
        requirements that arise as the years go by.
    &lt;li&gt;Requiring little or no "deep magic" that depends on arcane
        knowledge of employees who may depart.
    &lt;li&gt;Being able to expect the unexpected (like the explosion
        of the web etc.)
&lt;/ul&gt;

&lt;p&gt;Only after we have worked through to this point can we begin to
   evaluate specific strategies and technologies.  We can now begin
   to ask about the proper context of the database server, where to
   use object orientation, and whether JavaScript is a good programming
   language.  Anything that responds to our core goals and realities
   can be considered for use; anything that does not play into the
   core goals is useless at best and obstructive at worst.
&lt;/p&gt;

&lt;p&gt;Future essays (and some past essays) in this series will refer
   back to these ideas.  For example, many developers have observed
   over the years that if you &lt;a href=
"http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"
   &gt;Minimize Code and Maximize Data&lt;/a&gt; then you gain many advantages
   in terms of development time, robustness, and feature count.
   Other ideas similar to this will come out over and over in future
   essays in this series.
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The strategies and techniques that you will see on this blog are
   all aimed, in one way or another, at the goals expressed in
   this essay.  At the very beginning come the goals of the
   check-signer, the end-users, and the programmer.  From there we
   seek strategies that will satisfy our need to grow, change,
   correct, and adapt.  Only then can we ask about the technologies
   such as databases and object-oriented languages and see how 
   well they let us meet all of these goals.
&lt;/p&gt;


&lt;a name="bottom"&gt;
&lt;h2&gt;Related Essays&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;This blog has two tables of contents, the  
&lt;a href="http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html"&gt;Topical Table of Contents&lt;/a&gt; and the list 
of 
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"&gt;Database Skills&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Other philosophy essays are:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/11/prepare-now-for-possible-future-head.html"
        &gt;Prepare Now For Possible Future Head Transplant&lt;/a&gt;
    &lt;li&gt;&lt;i&gt;&lt;a href="http://database-programmer.blogspot.com/2008/09/quest-for-absolute.html"
        &gt;The Quest for The Absolute (this essay)&lt;/a&gt;&lt;/i&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2009/03/i-am-but-humble-filing-clerk.html"
        &gt;I Am But A Humble Filing Clerk&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/06/why-i-do-not-use-orm.html"
        &gt;Why I Do Not Use ORM&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"
        &gt;Minimize Code, Maximize Data&lt;/a&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-6805318746973220049?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/6805318746973220049/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=6805318746973220049' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6805318746973220049'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6805318746973220049'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/09/quest-for-absolute.html' title='The Quest for the Absolute'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-6644921613098942952</id><published>2008-09-21T18:56:00.032-04:00</published><updated>2010-12-21T22:40:21.927-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Lots of Links'/><title type='text'>Topical Table of Contents</title><content type='html'>&lt;p&gt;This posting is updated whenever a new post goes up.&lt;/p&gt;

&lt;p&gt;There is also a
&lt;a href="http://database-programmer.blogspot.com/2010/11/database-skills.html"
&gt;Skills-oriented Table Of Contents&lt;/a&gt;.  It is not as complete as this list,
which lists all posts, but it is more centered on links as they relate to skills.
&lt;/p&gt;


&lt;div style="margin: 10px; border: 1px solid black; padding: 5px;
          background-color: lightgreen"&gt;
&lt;p&gt;          
If you want some free analysis, why not &lt;a href="http://database-programmer.blogspot.com/p/submit-your-analysis-request.html"
&gt;submit your schema&lt;/a&gt; to the
Database Programmer?  If you are willing to discuss your issues with a bit of public
exposure, I will provide free analysis, and everybody can benefit!
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://database-programmer.blogspot.com/2010/12/user-submitted-analysis-topic-email.html"
&gt;User-Submitted Analysis Topic: Email&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;a name="app"&gt;
&lt;h2&gt;The Application Stack&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/12/working-definition-of-business-logic.html"
    &gt;A Working Definition of Business Logic, With Implications for CRUD Code&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/12/cost-of-round-trips-to-server.html"
    &gt;The Cost of Round Trips To The Server&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/12/critical-analysis-of-algorithm-sproc.html"
    &gt;Critical Analysis of An Algorithm: Sproc, Embedded SQL, ORM&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2010/12/historical-perspective-of-orm-and.html"
    &gt;Historical Perspective of ORM and Alternatives&lt;/a&gt;
    
&lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/02/framework-and-database.html"
    &gt;The Framework And The Database&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/10/data-and-code-at-application-level.html"
    &gt;Data And Code At The Application Level&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/06/approaches-to-upsert.html"
    &gt;Approaches to UPSERT&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/06/why-i-do-not-use-orm.html"
    &gt;Why I Do Not Use ORM&lt;/a&gt;
&lt;/ul&gt;

&lt;a name="norm"&gt;
&lt;h2&gt;Table Design Basics: Keys, Normalization, Denormalization&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;The first group of posts introduces the must-know terms and techniques for table design.
&lt;/p&gt;

&lt;p&gt;It might be a good idea to start with 
&lt;a href="http://database-programmer.blogspot.com/2009/04/relational-model.html"
&gt;The Relational Model&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/11/database-skills-introdution.html"
    &gt;Introduction&lt;/a&gt; (spell it out or figure it out)
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-primary-keys-this-is.html"
    &gt;Primary Keys&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-foreign-keys-this-is.html"
    &gt;Foreign Keys&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-first-normal-form.html"
    &gt;First Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2007/12/database-skills-second-normal-form.html"
    &gt;Second Normal Form&lt;/a&gt;
    &lt;li&gt;&lt;a 
href="http://database-programmer.blogspot.com/2008/01/database-skills-third-normal-form-and.html"
    &gt;Third Normal Form and Calculated Values&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/03/of-tables-and-constraints.html"
    &gt;Of Tables and Constraints&lt;/a&gt; (also listed below in Table Design Patterns)
&lt;/ul&gt;

&lt;p&gt;Following up on the normal forms are some basic discussions of
   normalization and denormalization.
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html"
&gt;The Argument For Normalization&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html"
&gt;The Argument For Denormalization&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2010/11/revisiting-normalization-and.html"
&gt;Revisiting Normalization And Denormalization&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/04/denormalization-patterns.html"
&gt;Denormalization Patterns&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/11/keeping-denormalized-values-correct.html"
&gt;Keeping Denormalized Values Correct&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"
&gt;The Data Dictionary and Calculations, Part 1&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"
&gt;The Data Dictionary and Calculations, Part 2&lt;/a&gt;

&lt;/ul&gt;
  
&lt;br/&gt;
&lt;a name="tabdesign"&gt;
&lt;h2&gt;Table Design Patterns&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;The second subseries details commonly occurring patterns in table design,
   how to recognize them and when to use them.
&lt;/p&gt;

&lt;p&gt;There is a complete &lt;a href=
"http://database-programmer.blogspot.com/2008/01/table-design-patterns.html"
&gt;List of Table Design Patterns&lt;/a&gt;.  The rest of the entries are:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/01/database-skills-sane-approach-to.html"
    &gt;A Sane Approach To Choosing Primary Keys&lt;/a&gt;

    &lt;li&gt;&lt;a href="http://database-programmer.blogspot.com/2008/07/different-foreign-keys-for-different.html"
    &gt;Different Foreign Keys for Different Primary Keys&lt;/a&gt;

    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/01/table-design-patterns-cross-reference.html"
&gt;Cross Reference Validation Pattern&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/01/table-design-pattern-limited.html"
&gt;Limited Transaction Pattern&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/02/false-patterns-such-as-reverse-foreign.html"
&gt;False Patterns and the Reverse Foreign Key&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/02/primary-key-that-wasnt.html"
&gt;The Primary Key That Wasn't: Impermanent Primary Keys&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/04/advanced-table-design-resolutions.html"
&gt;Resolutions&lt;/a&gt; (Also listed in Queries below)
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/03/of-tables-and-constraints.html"
&gt;Of Tables And Constraints&lt;/a&gt; (Also listed above in keys, normalization, denormalization)
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/03/how-sql-union-affects-table-design.html"
&gt;How the SQL UNION Affects Table Design&lt;/a&gt; (Also listed below in Queries)

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/07/history-tables.html"
     &gt;History Tables&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/09/advanced-table-design-secure-password.html"
     &gt;Secure Password Resets&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2010/12/case-when-table-design-is-easy-and.html"
     &gt;A Case When Table Design is Easy and Predictable&lt;/a&gt; (combinatorial and maximum complexity)
&lt;/ul&gt;

&lt;br/&gt;

&lt;a name="select"&gt;
&lt;h2&gt;SQL SELECT and Queries&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/03/introduction-to-queries.html"
&gt;Introduction To Queries&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html"
&gt;GROUP BY, HAVING, SUM, AVG and COUNT(*)&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/03/how-sql-union-affects-table-design.html"
&gt;How SQL UNION Affects Table Design&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/03/join-is-cornerstone-of-powerful-queries.html"
&gt;The JOIN is the Cornerstone of Powerful Queries&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/04/joins-part-two-many-forms-of-join.html"
&gt;Joins Part Two, The Many Forms of JOIN&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/04/advanced-table-design-resolutions.html"
&gt;Resolutions&lt;/a&gt; (Also listed in Table Designs Above)
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2010/11/recursive-queries-with-common-table.html"
&gt;Recursive Queries With Common Table Expressions&lt;/a&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2010/11/really-cool-ntile-window-function.html"
&gt;The Really Cool NTILE() Window Function&lt;/a&gt;
&lt;/ul&gt;

&lt;a name="algorithms"&gt;
&lt;h2&gt;Algorithms and Processes&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/08/advanced-algorithm-sequencing.html"
    &gt;Sequencing Dependencies&lt;/a&gt;
&lt;/ul&gt;

&lt;a name="server"&gt;
&lt;h2&gt;Server-Side Code&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"
    &gt;Triggers, Encapsulation and Composition&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2010/11/loops-without-cursors.html"
    &gt;Loops Without Cursors&lt;/a&gt;
&lt;/ul&gt;

&lt;a name="analysis"&gt;
&lt;h2&gt;Analysis&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/03/requirements-are-always-wrong-or.html"
    &gt;The Requirements Are Always Wrong&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/02/this-application-has-unique-business.html"
    &gt;This Application Has Unique Business Rule Needs&lt;/a&gt;
&lt;/ul&gt;

&lt;a name="devcycle"&gt;
&lt;h2&gt;Development Cycle&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/02/database-development-table-structure.html"
    &gt;Table Structure Changes&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/dictionary-based-database-upgrades.html"
    &gt;Dictionary Based Database Upgrades&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/upgrading-indexes-with-data-dictionary.html"
    &gt;Upgrading Indexes With A Data Dictionary&lt;/a&gt;
&lt;/ul&gt;


&lt;a name="philo"&gt;
&lt;h2&gt;Philosophy&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html"
    &gt;Minimize Code, Maximize Data&lt;/a&gt;

    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/09/quest-for-absolute.html"
    &gt;The Quest For The Absolute&lt;/a&gt;

    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/03/i-am-but-humble-filing-clerk.html"
    &gt;I Am But A Humble Filing Clerk&lt;/a&gt;



    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html"
    &gt;Triggers, Encapsulation and Composition&lt;/a&gt;  (repeated here because there is some philosophy in there)
&lt;/ul&gt;


&lt;a name="dictionary"&gt;
&lt;h2&gt;Data Dictionary&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;Many of these posts are listed elsewhere in this table of contents,
   but I wanted to have them all together in one place as well.
&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/06/using-data-dictionary.html"
    &gt;Using a Data Dictionary&lt;/a&gt;

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calculations-part-1.html"
&gt;The Data Dictionary and Calculations, Part 1&lt;/a&gt;

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/data-dictionary-and-calcuations-part-2.html"
&gt;The Data Dictionary and Calculations, Part 2&lt;/a&gt;

    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/dictionary-based-database-upgrades.html"
    &gt;Dictionary Based Database Upgrades&lt;/a&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/01/upgrading-indexes-with-data-dictionary.html"
    &gt;Upgrading Indexes With A Data Dictionary&lt;/a&gt;
&lt;/ul&gt;

&lt;a name="security"&gt;
&lt;h2&gt;Security&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/05/introducing-database-security.html"
    &gt;Introducing Database Security&lt;/a&gt;

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2009/02/comprehensive-database-security-model.html"
     &gt;A Comprehensive Database Security Model&lt;/a&gt;

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/09/advanced-table-design-secure-password.html"
     &gt;Secure Password Resets&lt;/a&gt;
&lt;/ul&gt;


&lt;a name="perf"&gt;
&lt;h2&gt;Performance&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/06/database-performance-1-huge-inserts.html"
     &gt;Huge Inserts&lt;/a&gt;

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/07/database-performance-pay-me-now-or-pay.html"
     &gt;Pay Me Now or Pay Me Later&lt;/a&gt;

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/06/database-performance-web-layer.html"
     &gt;The Web Layer&lt;/a&gt;
&lt;/ul&gt;

&lt;a name="browser"&gt;
&lt;h2&gt;The Browser&lt;/h2&gt;
&lt;/a&gt;

&lt;ul&gt;
     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/07/wonderful-awful-browser.html"
     &gt;The Wonderful Awful Browser&lt;/a&gt;

     &lt;li&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/08/javascript-as-foreign-language.html"
     &gt;Javascript As a Foreign Language&lt;/a&gt;

&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-6644921613098942952?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/6644921613098942952/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=6644921613098942952' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6644921613098942952'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6644921613098942952'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html' title='Topical Table of Contents'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-6797440758396684905</id><published>2008-09-07T18:58:00.000-04:00</published><updated>2008-09-07T18:59:11.161-04:00</updated><title type='text'>Advanced Table Design: Secure Password Resets</title><content type='html'>&lt;p&gt;Most web-based database applications make use of email
   to allow users to change their passwords.  Completely
   securing this operation can be tricky business, and one
   of the best ways to do it is to use database
   server abilities.
&lt;/p&gt;

&lt;h2&gt;Disclaimer 1: Only As Secure as Email&lt;/h2&gt;

&lt;p&gt;We tend to take it for granted today that password reset systems
   work through email.  We reason that if a user can access an email
   sent by us then they are who they say they are.  Obviously this
   will not be true if a user's email account has been compromised.
&lt;/p&gt;

&lt;p&gt;Dealing with the possibility of compromised email accounts is
   outside the scope of this week's essay.  There are other 
   strategies available to reduce that risk, but they will be 
   treated in some future essay.
&lt;/p&gt;

&lt;h2&gt;Disclaimer 2: Only SSL (HTTPS) of Course!&lt;/h2&gt;

&lt;p&gt;It is not much use giving yourself a super-secure email-based reset system
   if you transmit sensitive information over unencrypted 
   connections.  Secure Sockets Layer (SSL) should always be used
   when high security is required.  For the end-user this means
   they are going to a site through HTTPS instead of HTTP.
&lt;/p&gt;

&lt;h2&gt;Password Resets vs. Sending Passwords&lt;/h2&gt;

&lt;p&gt;On some low-security systems it is acceptable
   to send a user his password in an email.  This approach
   is very ill-advised in higher security contexts because
   we have no control over the user's storage of that email.
   It could end up anywhere, and anybody might read it.
&lt;/p&gt;

&lt;p&gt;When security requirements are higher, it is better
   to force the user to reset their password.  There are
   several reasons for this, but the important one here
   is that we do not want to send the actual password in
   an email.  Therefore we must send a link that sends them
   to a page where they can provide a new password.
&lt;/p&gt;

&lt;h2&gt;The Requirements&lt;/h2&gt;

&lt;p&gt;If we spell out the requirements for a secure password
   reset system, they are at the very least these:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;We must generate some hash and send it to the user;
        this is how she will identify herself so we
        can let her change her password.
    &lt;li&gt;The hash must expire at some point, since we cannot
        be sure the user will completely purge the
        email (or that she even can, depending on the policy
        of the email host).
    &lt;li&gt;It must be completely impossible for anybody to read
        the hash, otherwise they could intercept the
        reset process and set a password for themselves.
    &lt;li&gt;Despite requirement 3 just listed, we must somehow
        verify the hash when the user presents it.
    &lt;li&gt;We must be able to change the user's password, which
        is a privileged operation, even though &lt;i&gt;the user
        is not even logged in&lt;/i&gt;.
&lt;/ol&gt;

&lt;p&gt;It is not actually possible to implement these requirements
   in application code alone (or perhaps I should say it is not
   possible to do so and still meet a minimum acceptable level of risk).
   There are two problems if you try it:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Requirements 3 and 4 cannot be reconciled.  If the
        application is able to read the hash to verify it,
        then a vulnerability in the application code could
        lead to compromise.  If we implement in application
        code we have the burden of ensuring practically
        zero vulnerabilities, while if we go server-side
        we have no such burden (at least for this feature).
    &lt;li&gt;Requirement 5 requires the application code to
        connect at a very high privilege level, which could
        lead to completely unrelated vulnerabilities.
&lt;/ol&gt;

&lt;h2&gt;Implementing In The Database&lt;/h2&gt;

&lt;p&gt;The system I will now describe meets all 5 of the 
   requirements listed above while never requiring a 
   privileged connection to the database.  The feature
   is implemented in an isolated system that cannot
   touch other systems, and it removes the burden of being
   particularly careful in writing the application
   code.
&lt;/p&gt;

&lt;p&gt;Since a picture is worth a thousand words, here it is:
&lt;/p&gt;

&lt;center&gt;
&lt;img src="http://www.andromeda-project.org/images/kfd-blog/dbskills-35-emailpassword.png" /&gt;
&lt;/center&gt;

&lt;p&gt;The process begins at the top left.  The user
   (yellow circle)
   clicks on some "Forgot Password" link and provides
   an email or account id.  This goes to the web server,
   which generates an INSERT to the &lt;i&gt;insert-only&lt;/i&gt;
   table of hashes.  This insert contains only the
   user's id; nothing else is needed.
   There is a trigger on the table that fires on the
   INSERT.  This trigger generates the hash and
   sends the email to the user.
&lt;/p&gt;
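
&lt;p&gt;As a rough sketch of that first step, the statement the web
   server issues might look like the following.  It assumes the
   &lt;i&gt;users_pwrequests&lt;/i&gt; table defined later in this essay, and
   the user id 'jsmith' is made up for illustration.
&lt;/p&gt;

&lt;pre class="code"&gt;
-- Issued over the web server's ordinary, unprivileged connection.
-- Only the user's id is supplied; the trigger on the table is
-- expected to fill in the hash.  In a real application the id
-- would be passed as a bound parameter, not pasted into the SQL.
INSERT INTO users_pwrequests (user_id)
VALUES ('jsmith');
&lt;/pre&gt;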

&lt;p&gt;The salient features here are that the table is
   insert-only, which is explained below, and that
   the trigger operates at super-user level, which
   is also explained below.
&lt;/p&gt;

&lt;p&gt;Once the user receives the link and clicks on it,
   our process goes over to the right.  The user 
   lands on a page and provides a new password
   (probably typing it in twice).
   The web server does basic things like making sure
   the two values match, that the password is long
   enough, and so on, and then generates an
   INSERT into a second table.  The insert contains
   the email or account ID, the hash, and the
   desired new password.
&lt;/p&gt;
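
&lt;p&gt;A sketch of that second INSERT, assuming the
   &lt;i&gt;users_pwverifies&lt;/i&gt; table defined later in this essay.  The
   user id, hash, and password shown are placeholders, not real values.
&lt;/p&gt;

&lt;pre class="code"&gt;
-- The hash is whatever arrived in the emailed link.  The new
-- password must fit the column (20 characters in the table
-- definition used in this essay).  As with any user input, a
-- real application would bind these values as parameters.
INSERT INTO users_pwverifies (user_id, md5, member_password)
VALUES ('jsmith'
       ,'d41d8cd98f00b204e9800998ecf8427e'
       ,'n3w-Passw0rd');
&lt;/pre&gt;

&lt;p&gt;If the hash does not match or has expired, the second trigger
   shown later in this essay raises an exception and the INSERT
   fails, which is how the web server learns the link was not valid.
&lt;/p&gt;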

&lt;p&gt;The magic begins on the INSERT into the second
   table.  An INSERT trigger running at superuser
   level is allowed to look at the first table and
   verify the hash and its expiration.  If these
   match, it sets the user's password.  
&lt;/p&gt;

&lt;p&gt;Simple, really, IMHO.&lt;/p&gt;

&lt;h2&gt;Feature 1: Insert Only Tables&lt;/h2&gt;

&lt;p&gt;This system depends on creating tables that any 
   unprivileged user can insert into, but which nobody
   can SELECT from, UPDATE, or DELETE from.
&lt;/p&gt;

&lt;p&gt;This may sound like a joke: "Insert Only Table", something
   like "Write only memory".  But the idea is very simple:
   if nobody can SELECT from the table then nobody can
   discover active hashes.  If nobody can UPDATE the table
   then nobody can forge hashes.  Finally, if nobody can
   DELETE from the table then nobody can cause mischief.
&lt;/p&gt;

&lt;p&gt;The code for the tables looks like this:&lt;/p&gt;

&lt;pre class="code"&gt;
-- FIRST TABLE
CREATE TABLE users_pwrequests
(
  recnum_pwr integer,
  user_id character varying(40),
  md5 character(32),
  ts_ins timestamp without time zone
);
-- NOTE! This syntax is PostgreSQL, there may be
-- slight variations on other platforms.
-- Remove every default privilege, then allow INSERT only.
REVOKE ALL ON TABLE users_pwrequests FROM PUBLIC;
GRANT INSERT ON TABLE users_pwrequests TO PUBLIC;

-- SECOND TABLE
CREATE TABLE users_pwverifies
(
  recnum_pwv integer,
  user_id character varying(40),
  md5 character(32),
  member_password character varying(20)
);
REVOKE ALL ON TABLE users_pwverifies FROM PUBLIC;
GRANT INSERT ON TABLE users_pwverifies TO PUBLIC;

&lt;/pre&gt;
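
&lt;p&gt;One way to convince yourself that the grants behave as intended
   is to try each operation from a low-privilege role.  The sketch
   below assumes the two tables exist with INSERT granted to PUBLIC.
   The role name &lt;i&gt;webapp&lt;/i&gt; is hypothetical, and the statements
   must be started by a user who is allowed to create roles and
   switch to them.
&lt;/p&gt;

&lt;pre class="code"&gt;
-- Hypothetical role standing in for the web server's login
CREATE ROLE webapp LOGIN;
SET ROLE webapp;

-- Allowed: INSERT is granted to PUBLIC.  If the trigger from this
-- essay is already attached to the table, this will also fire it.
INSERT INTO users_pwrequests (user_id) VALUES ('jsmith');

-- Each of these should be rejected with a permission-denied
-- error.  Run them one at a time, outside a transaction block,
-- because the first error would abort the rest of a transaction.
SELECT * FROM users_pwrequests;
UPDATE users_pwrequests SET md5 = 'x';
DELETE FROM users_pwrequests;

-- Return to the original role
RESET ROLE;
&lt;/pre&gt;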

&lt;h2&gt;Feature 2: Trigger Security Privileges&lt;/h2&gt;

&lt;p&gt;It is possible on most servers to severely limit
   a user's allowed actions on a table, but then
   to provide trigger code that fires on those actions
   and executes at super-user level.  Today's technique
   depends upon this ability.  Trigger code operating
   at superuser level can look at the insert-only
   table to verify a hash, and it can also set the
   user's password.
&lt;/p&gt;

&lt;p&gt;This basic ability is 
   what makes triggers so amazing and cool
   for implementing business logic (see also
   &lt;a href=
"http://database-programmer.blogspot.com/2008/05/database-triggers-encapsulation-and.html" 
   &gt;Triggers and Encapsulation&lt;/a&gt;), because there
   is no way for a user to directly invoke a 
   trigger for his own nefarious purposes, and there
   is no way for a cracker to avoid the firing of
   the trigger if he performs an action on a table.
   Triggers are truly the most powerful example of
   encapsulation of data and code that is available
   to today's programmer.
&lt;/p&gt;

&lt;p&gt;The first trigger looks something like this.
   This is PostgreSQL code, and your server will likely
   require variations.  I have also stripped it
   down for brevity, so it may not work exactly 
   without modification:
&lt;/p&gt;

&lt;pre class="code"&gt;
CREATE OR REPLACE FUNCTION users_pwrequests_ins_bef_r_f()
  RETURNS trigger AS
$BODY$
DECLARE
    NotifyList text = '';
    ErrorList text = '';
    ErrorCount int = 0;
    AnyInt int;
    AnyRow RECORD;
    AnyChar varchar;
    AnyChar2 varchar;
    AnyChar3 varchar;
    AnyChar4 varchar;
BEGIN
    -- necessary for an old glitch in pg security
    SET search_path TO public;

    -- Only execute if the user's id is valid
    SELECT INTO AnyInt Count(*)
           FROM users WHERE user_id = new.user_id;
    IF AnyInt &gt; 0 THEN 
       SELECT INTO AnyChar email
              FROM users WHERE user_id = new.user_id;
       -- This lets you put the email itself into 
       -- a table for admin control
       SELECT INTO AnyChar2 variable_value
              FROM variables
             WHERE variable = 'PW_EMAILCONTENT';
       -- Also the server is stored in a table
       SELECT INTO AnyChar3 variable_value
              FROM variables
             WHERE variable = 'SMTP_SERVER';
             
       -- This becomes the email FROM Address
       SELECT INTO AnyChar4 variable_value
              FROM variables
             WHERE variable = 'EMAIL_FROM';
       IF AnyChar4 IS NULL THEN AnyChar4 = ''; END IF;
       
       -- Very important! Set the md5 hash!
       new.md5 := md5(now()::varchar);
       -- Record the request time so the second trigger
       -- can check whether the link has expired
       new.ts_ins := now();
       
       -- Call out to a stored procedure that sends emails
       PERFORM pwmail(AnyChar
          ,'Password Reset Request'
          ,AnyChar2 || new.md5
          ,AnyChar3
          ,AnyChar4);
       EXECUTE ' ALTER ROLE ' || new.user_id || ' NOLOGIN ';
    END IF;    -- 3000 PK/UNIQUE Insert Validation

    -- A BEFORE INSERT row trigger must return the row to be saved
    RETURN new;
END; $BODY$
  -- The "SECURITY DEFINER" is crucial, it allows 
  -- the trigger to run as the super-user who 
  -- created it
  LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER;
&lt;/pre&gt;

&lt;p&gt;The second trigger looks like this:&lt;/p&gt;

&lt;pre class="code"&gt;
CREATE OR REPLACE FUNCTION users_pwverifies_ins_bef_r_f()
  RETURNS trigger AS
$BODY$
DECLARE
    NotifyList text = '';
    ErrorList text = '';
    ErrorCount int = 0;
    AnyInt int;
    AnyRow RECORD;
    AnyChar varchar;
    AnyChar2 varchar;
    AnyChar3 varchar;
    AnyChar4 varchar;
BEGIN
    SET search_path TO public;

    -- Read the first table to see if the 
    -- link is valid and has not expired
    SELECT INTO AnyInt Count(*)
           FROM users_pwrequests
          WHERE user_id = new.user_id
            AND md5     = new.md5
            AND age(now(),ts_ins) &lt; '20  min';         
    IF AnyInt = 0 THEN                                
        ErrorCount = ErrorCount + 1; 
        ErrorList  = ErrorList || 'user_id,9005,Invalid Link;';
    ELSE 
       -- Magic!  The user's password is set
        EXECUTE 'ALTER ROLE ' ||  new.user_id 
            || ' LOGIN PASSWORD ' 
            || quote_literal(new.member_password);
            
        -- Very important!  Now that we have set it,
        -- erase it so it is not saved to the table
        new.member_password := '';
    END IF;    -- 3000 PK/UNIQUE Insert Validation

    IF ErrorCount &gt; 0 THEN
        RAISE EXCEPTION '%',ErrorList;
        RETURN null;
    ELSE
        RETURN new;
    END IF;
END; $BODY$
  LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER;
&lt;/pre&gt;
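
&lt;p&gt;The listings above only define the trigger functions; they
   still have to be attached to the tables.  A minimal sketch
   follows.  The trigger names are made up for illustration, and
   BEFORE INSERT ... FOR EACH ROW is assumed because both
   functions change the new row before it is saved.
&lt;/p&gt;

&lt;pre class="code"&gt;
-- Fires for every row inserted into the first table:
-- generates the hash and sends the email
CREATE TRIGGER users_pwrequests_ins_bef_r
    BEFORE INSERT ON users_pwrequests
    FOR EACH ROW
    EXECUTE PROCEDURE users_pwrequests_ins_bef_r_f();

-- Fires for every row inserted into the second table:
-- verifies the hash and sets the password
CREATE TRIGGER users_pwverifies_ins_bef_r
    BEFORE INSERT ON users_pwverifies
    FOR EACH ROW
    EXECUTE PROCEDURE users_pwverifies_ins_bef_r_f();
&lt;/pre&gt;

&lt;p&gt;Because the functions are declared SECURITY DEFINER, they run
   with the privileges of the role that created them, so they
   should be created by the privileged owner and not by the web
   server's login.
&lt;/p&gt;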

&lt;h2&gt;Feature 3: Sending Email From Database Server&lt;/h2&gt;

&lt;p&gt;The technique presented above requires that your
   database server be able to send emails.  This
   is not always possible.  PostgreSQL 
   (&lt;a href="http://www.postgresql.org"&gt;www.postgresql.org&lt;/a&gt;)
   can do it, and I have to believe the other big guys
   can as well, but I have not tried it yet personally.
&lt;/p&gt;

&lt;p&gt;To send emails through a PostgreSQL server, you must 
   install Perl as an untrusted language, and then install
   the Perl MAIL package.  If anybody wants to know more
   about that then please leave a comment and I will
   expand the essay to include that.
&lt;/p&gt;

&lt;h2&gt;Feature 4: The Empty Column&lt;/h2&gt;

&lt;p&gt;There is one more note that should be made.  To use
   this system, you must tell the server the user's
   desired new password.  To do that, you must actually
   make it part of the INSERT command and therefore you
   must have a column for it in the 2nd read-only table.
   However, you certainly do not want to actually save
   it, so you have the trigger set the password
   first and then blank out the value, so the final 
   row saved to the table does not actually contain
   anything.  This is noted in the code comments on
   the second trigger, which is included above.
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The technique presented today makes full use of 
   database server abilities to create a password 
   reset system that is highly resistant to forgery,
   interception, and evil-admin meddling.  It makes
   use of a combination of restrictive table security,
   privileged trigger code, and sending emails from
   the database server.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-6797440758396684905?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/6797440758396684905/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=6797440758396684905' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6797440758396684905'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6797440758396684905'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/09/advanced-table-design-secure-password.html' title='Advanced Table Design: Secure Password Resets'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-6739932415295345173</id><published>2008-08-25T12:56:00.001-04:00</published><updated>2008-09-07T19:04:41.667-04:00</updated><title type='text'>Advanced Algorithm: Sequencing Dependencies</title><content type='html'>&lt;p&gt;Some database applications require you to perform a series of actions
   where you know only that some actions must be performed before others.
   Before you can perform the actions, you must work out a safe sequence
   that takes into account all of the dependencies.  This week in The
   Database Programmer we will see an algorithm for doing this.
&lt;/p&gt;

&lt;h2&gt;Examples&lt;/h2&gt;

&lt;p&gt;There are many examples where a programmer must work out dependencies
   before doing something.  
&lt;/p&gt;

&lt;p&gt;A manufacturing package may track many steps in the manufacture of an
   item.  Some steps cannot be performed until others are complete.  A 
   simple system would require the end-user to work out the entire process,
   but a better system would let the user enter only the dependencies: which
   processes require others to be complete.  In this kind of system the
   computer can be used to schedule manufacturing tasks.
&lt;/p&gt;

&lt;p&gt;All popular Linux distributions have a package installation system in 
   which each package lists its required dependencies.  If you want to install
   a large number of packages in one shot, producing a tangled bunch of
   related dependencies, today's algorithm can be used to work them all out.
&lt;/p&gt;

&lt;p&gt;If you are using a data dictionary to build tables, every foreign key
   represents a dependency, where the child table requires the parent table
   to exist before it can be built.  Today's algorithm can be used to
   sequence the tables and build them in order.
&lt;/p&gt;

&lt;p&gt;Another database example is generating code to perform calculations.
   Some calculations will depend on previous calculations, so your code
   generator must be able to sequence them all so that the calculations
   are performed in the proper order.  
&lt;/p&gt;

&lt;h2&gt;Big Words: Directed Acyclic Graph&lt;/h2&gt;

&lt;p&gt;The examples above are all cases of what mathematicians call a &lt;a href=
   "http://en.wikipedia.org/wiki/Directed_acyclic_graph"
   &gt;Directed Acyclic Graph&lt;/a&gt;.  If you do not want to read the entire
   Wikipedia article, the main points are these:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;We have a set of items.  These can be anything you are keeping
    track of in your database.
    &lt;li&gt;Any item may be connected to zero or more other items.
    &lt;li&gt;The connection is one-way only.  So if we say A requires B, we are
    not saying that B also requires A (in fact it is forbidden).
    &lt;li&gt;There can be no loops (cycles).  If A requires B, B may not require A.  
    Further, if A requires B, and B requires C, C may not require A.
&lt;/ul&gt;

&lt;p&gt;Whenever I can, I like to point out that it is very useful to read up
   on the mathematical foundations of certain programming techniques.  
   We can often pick up very useful insights from those who think of these
   things at the most abstract level.  It is also much easier to get 
   advice from the more abstract-minded database people if you are at least
   marginally familiar with the mathematical terms.
&lt;/p&gt;

&lt;h2&gt;The Tables&lt;/h2&gt;

&lt;p&gt;So now let us proceed to the tables and the code.  The tables below
   show a data dictionary that will be used to generate DDL to build
   a database:
&lt;/p&gt;

&lt;pre&gt;
Table: TABLES

TABLE       | DESCRIPTION            | SEQUENCE
------------+------------------------+---------
ORDERS      | Sales Orders Headers   |  &lt;b&gt;?&lt;/b&gt;
ORDER_LINES | Sales order lines      |  &lt;b&gt;?&lt;/b&gt;
CUSTOMERS   | Customers              |  &lt;b&gt;?&lt;/b&gt;
ITEMS       | Items                  |  &lt;b&gt;?&lt;/b&gt;


Table: DEPENDENCIES

CHILD        | PARENT
-------------+---------------
ORDERS       | CUSTOMERS
ORDER_LINES  | ORDERS
ORDER_LINES  | ITEMS
&lt;/pre&gt;

&lt;p&gt;The problem here is knowing the safe order in which to build the
   tables.  If I try to build ORDER_LINES before I have built ITEMS,
   then I cannot put a foreign key onto ORDER_LINES, because ITEMS is
   not there.  In short, I need to know the value of the SEQUENCE 
   column in the example above.
&lt;/p&gt;

&lt;h2&gt;The Expected Answer&lt;/h2&gt;

&lt;p&gt;The example above is simple enough that we can work it out by hand.
   This is actually a good idea, because we want to get an idea of what
   the answer will look like:
&lt;/p&gt;

&lt;pre&gt;
TABLE       | DESCRIPTION            | SEQUENCE
------------+------------------------+---------
ORDERS      | Sales Orders Headers   |  1
ORDER_LINES | Sales order lines      |  2
CUSTOMERS   | Customers              |  0
ITEMS       | Items                  |  0
&lt;/pre&gt;

&lt;p&gt;This answer should be self-explanatory, except maybe for the fact that
   both CUSTOMERS and ITEMS have the same value.  We need to look at that
   before we can see the code that produces it.  Is it OK that two entries
   have the same value, and how would our program handle that?
&lt;/p&gt;

&lt;p&gt;The short answer is that it is perfectly OK and natural for two or more
   entries to have the same value.  All this means is that they can be done
   in any order &lt;i&gt;relative to each other&lt;/i&gt;, so long as they are done 
   before the other entries.
&lt;/p&gt;

&lt;p&gt;In terms of the example, where we want to build these tables in a 
   database, it means that:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;We would query the list of tables and sort by SEQUENCE
    &lt;li&gt;We would loop through and build each table
    &lt;li&gt;We don't care about ITEMS and CUSTOMERS having the same value;
        they get built in whatever order the server returns them.
&lt;/ul&gt;

&lt;p&gt;The same concept applies to the other potential examples: manufacturing,
   software packages, and generating calculations.  So long as you follow
   the sequence, we don't care about items that have the same value.
&lt;/p&gt;

&lt;h2&gt;Stating the Solution in Plain English&lt;/h2&gt;

&lt;p&gt;We are now ready to work out a program that will generate the SEQUENCE
   column.  The basic steps the program must perform are:
&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Initialize the column to -1.  A value of -1 means "Not sequenced."
    &lt;li&gt;Update the column to zero for all items that have no dependencies.
    &lt;li&gt;Repeat the following action until the affected rows are zero:
    &lt;i&gt;Update the SEQUENCE column to 1 (then 2, then 3) for all rows
       that have all of their dependencies sequenced already.&lt;/i&gt;  
    &lt;li&gt;Once the command in step 3 is no longer affecting any rows,
        check for any rows that still have -1.  These are involved in
        &lt;i&gt;circular dependencies&lt;/i&gt; and we cannot proceed until the
        user straightens them out.
&lt;/ol&gt;
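
&lt;p&gt;The four steps above can also be sketched in client code.  The
   version below is only an illustration in Javascript, not the stored
   procedure itself: the function name sequenceTables and the shape of
   the two arrays are assumptions made for this example, with the data
   copied from the TABLES and DEPENDENCIES examples.
&lt;/p&gt;

```javascript
// Sample data mirroring the TABLES and DEPENDENCIES examples.
// The function and variable names here are illustrative only.
var tables = ['ORDERS', 'ORDER_LINES', 'CUSTOMERS', 'ITEMS'];
var dependencies = [
    { child: 'ORDERS',      parent: 'CUSTOMERS' },
    { child: 'ORDER_LINES', parent: 'ORDERS' },
    { child: 'ORDER_LINES', parent: 'ITEMS' }
];

function sequenceTables(tables, dependencies) {
    var sequence = {};

    // Step 1: everything starts out as -1, "not sequenced"
    tables.forEach(function (name) { sequence[name] = -1; });

    // Helper: list the parents of one table
    function parentsOf(name) {
        return dependencies
            .filter(function (dep) { return dep.child === name; })
            .map(function (dep) { return dep.parent; });
    }

    // Step 2: tables with no dependencies get 0
    tables.forEach(function (name) {
        if (parentsOf(name).length === 0) { sequence[name] = 0; }
    });

    // Step 3: repeat until a pass changes nothing
    var level = 1;
    var changed = 1;
    while (changed !== 0) {
        // Pick the ready tables first and assign afterwards, so a table
        // sequenced in this pass cannot unlock another in the same pass
        var ready = tables.filter(function (name) {
            if (sequence[name] !== -1) { return false; }
            return parentsOf(name).every(function (parent) {
                return sequence[parent] !== -1;
            });
        });
        ready.forEach(function (name) { sequence[name] = level; });
        changed = ready.length;
        level = level + 1;
    }

    // Step 4: anything still at -1 is caught in a circular dependency
    var unsequenced = tables.filter(function (name) {
        return sequence[name] === -1;
    });
    return { sequence: sequence, unsequenced: unsequenced };
}

var result = sequenceTables(tables, dependencies);
console.log(result.sequence);
console.log(result.unsequenced);
```

&lt;p&gt;For the sample data this gives CUSTOMERS and ITEMS the value 0,
   ORDERS the value 1 and ORDER_LINES the value 2, and the list of
   unsequenced tables comes back empty.
&lt;/p&gt;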

&lt;h2&gt;Stating the Solution in Code&lt;/h2&gt;

&lt;p&gt;The first step is very easy: we initialize the table with this command:
&lt;/p&gt;

&lt;pre&gt;
UPDATE TABLES SET SEQUENCE = -1;
&lt;/pre&gt;

&lt;p&gt;The next step is also very easy: we mark with a '0' all of the
   tables that have no dependencies.  The basic idea is to find all of the
   tables that never appear as a child in DEPENDENCIES. 
&lt;/p&gt;

&lt;pre&gt;
UPDATE TABLES SET SEQUENCE = 0
 WHERE NOT EXISTS (SELECT child FROM DEPENDENCIES
                    WHERE child = TABLES.TABLE);
&lt;/pre&gt;

&lt;p&gt;Now for the hard part.  We now have to execute a loop.  On each pass
   of the loop we are looking for all items &lt;i&gt;whose dependencies have
   all been sequenced.&lt;/i&gt;  We will do this over and over until the command
   is not affecting any rows.  It is important that we do not exit the
   loop by testing whether all rows are sequenced, because a circular
   dependency would prevent that from ever happening and we would have
   an infinite loop.
&lt;/p&gt;

&lt;p&gt;You can control this loop from client code, but I wrote mine as a
   Postgres stored procedure.  This algorithm turns out to be surprisingly
   complicated.  The UPDATE command below may not be all that 
   self-explanatory.  What it works out is:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Get a list of child tables from the DEPENDENCIES table
    &lt;li&gt;JOIN through to TABLES to look at the SEQUENCE value
        of their parents.
    &lt;li&gt;Group and check that the minimum value is zero or greater.  If
        it is, no parent is still at -1, which means all parents are
        sequenced and the table can be sequenced.
    &lt;li&gt;Update the SEQUENCE value for the tables we found
&lt;/ul&gt;

&lt;pre&gt;
CREATE OR REPLACE FUNCTION zdd.Table_Sequencer() RETURNS void AS
$BODY$
DECLARE
    -- Note that rowcount is initialized to be &gt; 0, this makes
    -- the loop work properly
    rowcount integer := 1;
    
    -- This tracks the value we are assigning to SEQUENCE.  We
    -- initialize it to 1 because we already took care of
    -- the rows that have value 0
    lnSeq integer := 1;
BEGIN
    while rowcount &gt; 0 LOOP
        UPDATE tables set SEQUENCE = lnSeq
          FROM (SELECT t1.CHILD 
                  FROM DEPENDENCIES t1 
                  JOIN TABLES       t2 ON t1.PARENT = t2.TABLE
                 GROUP BY t1.CHILD
                HAVING MIN(t2.SEQUENCE) &gt;= 0
                ) fins
          WHERE TABLES.TABLE = fins.CHILD
            AND TABLES.SEQUENCE = -1;

        -- Read the row count right after the UPDATE, before
        -- any other statement runs
        GET DIAGNOSTICS rowcount = ROW_COUNT;
        lnSeq := lnSeq + 1;
    END LOOP;

    RETURN;
END;
$BODY$
LANGUAGE plpgsql;
&lt;/pre&gt;

&lt;p&gt;The stored procedure above will stop executing once the UPDATE command
   is no longer having any effect.  Once that happens, your final step is
   to make sure that all rows have a valid SEQUENCE value, which is to say
   that no entry has SEQUENCE of -1.  If any of the rows have that value
   then you have a circular dependency.  You must report those rows to
   the user, and you can also report the dependencies that are causing
   the loop.
&lt;/p&gt;
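
&lt;p&gt;The final check can be sketched the same way in client code.
   The example below is a hedged illustration in Javascript: the
   function name reportUnsequenced and the sample data are made up
   for this example, and the data is arranged to contain a loop.
   It lists the rows still at -1 and the dependencies whose child
   and parent are both unsequenced.
&lt;/p&gt;

```javascript
// Hypothetical state after the loop has finished.  A and B require
// each other, and C requires A, so none of the three was sequenced.
var finalSequence = { CUSTOMERS: 0, A: -1, B: -1, C: -1 };
var allDependencies = [
    { child: 'A', parent: 'B' },
    { child: 'B', parent: 'A' },
    { child: 'C', parent: 'A' },
    { child: 'A', parent: 'CUSTOMERS' }
];

function reportUnsequenced(sequence, dependencies) {
    function isStuck(name) { return sequence[name] === -1; }

    // Rows that never received a SEQUENCE value
    var stuck = Object.keys(sequence).filter(isStuck);

    // Dependencies where both ends are unsequenced: these are the
    // ones that keep the stuck rows waiting
    var blocking = dependencies
        .filter(function (dep) { return isStuck(dep.child); })
        .filter(function (dep) { return isStuck(dep.parent); })
        .map(function (dep) { return dep.child + ' requires ' + dep.parent; });

    return { stuck: stuck, blocking: blocking };
}

var report = reportUnsequenced(finalSequence, allDependencies);
console.log(report.stuck);
console.log(report.blocking);
```

&lt;p&gt;Note that C is reported even though it is not part of the loop
   itself.  It simply sits downstream of A and cannot be sequenced
   until the loop between A and B is broken.
&lt;/p&gt;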

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Sequencing dependencies is a fundamental algorithm that has a lot of
   use cases in database applications.  It is easy enough to accomplish,
   but the innermost UPDATE command can be a little puzzling when you
   first look at it.  Once you have mastered this algorithm you are on
   the way to the "big leagues" of database applications such as ERP,
   MRP and others.
&lt;/p&gt;

&lt;p&gt;Next Essay: &lt;a href=
"http://database-programmer.blogspot.com/2008/09/advanced-table-design-secure-password.html"
   &gt;Secure Password Resets&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-6739932415295345173?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/6739932415295345173/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=6739932415295345173' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6739932415295345173'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/6739932415295345173'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/08/advanced-algorithm-sequencing.html' title='Advanced Algorithm: Sequencing Dependencies'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-1831564724368020791</id><published>2008-08-03T13:58:00.006-04:00</published><updated>2008-09-07T19:06:57.623-04:00</updated><title type='text'>Javascript As a Foreign Language</title><content type='html'>&lt;p&gt;So you know 37 different programming languages, 
   you've programmed moon landers, missiles and
   toasters, and how could Javascript be any problem?
   Then you start trying to code up some Javascript
   and find that it just does not feel right, 
   nothing seems to flow naturally or easily.  Your
   instincts do not seem to guide you.  You are not 
   alone, here is your cheatsheet...
&lt;/p&gt;

&lt;p&gt;Welcome to the Database Programmer blog.  If you
   are trying to write database applications in 2008
   then you most likely bump into Javascript.  My hope
   in this week's essay is to provide a "soft landing"
   into this beautiful and powerful but somewhat 
   strange language.
&lt;/p&gt;

&lt;p&gt;To see the other essays in this series, consult
   our &lt;a href=
"http://database-programmer.blogspot.com/2007/12/database-skills-complete-contents.html" 
   &gt;Complete Table of Contents&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;Contents&lt;/h2&gt;

&lt;p&gt;Today's essay is rather long.  It covers extremely
   basic ideas but proceeds directly to very powerful
   techniques.  I have provided
   a summary here so that you can skip over the material
   that may already be familiar to you.
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href='#fb'&gt;Firefox and Firebug&lt;/a&gt;
    &lt;li&gt;&lt;a href='#execution'&gt;Execution&lt;/a&gt;
    &lt;li&gt;&lt;a href='#scope'&gt;Variable Scope&lt;/a&gt;
    &lt;li&gt;&lt;a href='#adding'&gt;Adding Methods to Core Javascript&lt;/a&gt;
    &lt;li&gt;&lt;a href='#functions'&gt;Functions as First Class Citizens&lt;/a&gt;
    &lt;li&gt;&lt;a href='#objectclasses'&gt;Objects And Classes&lt;/a&gt;
    &lt;li&gt;&lt;a href='#objectsyntax'&gt;Creating An Object Without A Class&lt;/a&gt;
    &lt;li&gt;&lt;a href='#objectprops'&gt;Accessing Object Properties and Methods&lt;/a&gt;
    &lt;li&gt;&lt;a href='#iteration'&gt;Iteration&lt;/a&gt;
    &lt;li&gt;&lt;a href='#jsonajax'&gt;JSON and Ajax&lt;/a&gt;
    &lt;li&gt;&lt;a href='#syncajax'&gt;Synchronous Ajax: S-JSON&lt;/a&gt;
    &lt;li&gt;&lt;a href='#jquery'&gt;jQuery and Friends&lt;/a&gt;  
&lt;/ul&gt;

&lt;a name="fb"&gt;
&lt;h2&gt;Start Off: Firefox and Firebug&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;In case you have been living under a rock for
   the past few years, let me tell you to do your
   development in a real web browser, that is, 
   Firefox, and to immediately download the 
   Firebug extension.  Firebug more or less does 
   everything you need to debug Javascript, and it
   has many features you may not even know you need.
   Do not try to develop Javascript without Firebug.
&lt;/p&gt;

&lt;p&gt;In particular, Firebug has a "console" object
   that you can send messages to, such as this:
&lt;/p&gt;

&lt;pre&gt;
console.log("this is so much better than alert()!");
for(var x = 1; x&lt;10; x++) {
    console.log("We are on "+x);
}
&lt;/pre&gt;

&lt;a name="execution"&gt;
&lt;h2&gt;Execution&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;Javascript executes while your page is being loaded,
   and can be placed anywhere on the page.  While I make
   no claims that the example below is good or bad
   practice, it does illustrate how Javascript executes. 
&lt;/p&gt;

&lt;pre&gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
&amp;lt;script&amp;gt;
// Script is executing as it is encountered,
// so this variable comes into existence
// immediately
var x = 5;  

function square(x) {
    return x * x;
}

// Now that the square function is defined,
// we can call it
var y = square(x);
&amp;lt;/script&amp;gt;
&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
&amp;lt;h1 id='h1'&amp;gt;Here is a Javascript Example!&amp;lt;/h1&amp;gt;

&amp;lt;script&amp;gt;
// Script can be embedded directly in the
// body of your HTML (for better or worse!)
var h1 = document.getElementById('h1');
h1.innerHTML = 'I changed the H1 content!';

// This function can be used anywhere downstream
function changeH1(newText) {
    var h1 = document.getElementById('h1');
    h1.innerHTML = newText;
}
&amp;lt;/script&amp;gt;
&amp;lt;div&amp;gt;Here is a div of text&amp;lt;/div&amp;gt;
&amp;lt;script&amp;gt;
changeH1("Changing H1 yet again!");
&amp;lt;/script&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/pre&gt;

&lt;a name="scope"&gt;
&lt;h2&gt;Variable Scope&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;Scoping in Javascript is pretty straightforward.
   If you assign a value to a variable outside of any
   function it becomes a global.  If you explicitly
   define a variable as "window.x = 5" it becomes a
   global.  If you assign to a name inside a function
   without ever declaring it, that is a global too.
   If you put the keyword var in front of it inside a
   function, it becomes local to that entire function
   (and can mask a global variable of the same name).
   You can use the "var" keyword inside of loops, but
   the variable still belongs to the whole function, not
   just the loop.  Many javascript programmers use "var"
   everywhere.  Here is an example.
&lt;/p&gt;

&lt;pre&gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
&amp;lt;script&amp;gt;
// We are executing outside of a function, so
// both of these are globals:
var x = 5;
y = 10;

function example() {
    // No local variable named 'y' is declared in this
    // function, so without the "var" keyword we are
    // re-assigning the global variable
    y = 7;
    
    // Using the "var" keyword makes a local variable.
    // The declaration applies to the whole function, so
    // inside it we cannot "see" the global x by name
    var x = 2;
    alert(x);           // 2
    
    // I can still access the global variable through
    // the window object:
    window.x = 6;
    alert(x);           // 2, the local
    alert(window.x);    // 6, the global
}

&amp;lt;/script&amp;gt;
&amp;lt;/head&amp;gt;
&lt;/pre&gt;
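
&lt;p&gt;One detail that tends to surprise people coming from other
   languages is that a "var" declaration applies to the whole function,
   no matter where in the function it appears.  Here is a small sketch
   that shows it without needing a browser.  The function and variable
   names are made up for the example, and it assumes any reasonably
   recent Javascript engine.
&lt;/p&gt;

```javascript
var total = 5;   // declared outside any function

function shadowExample() {
    // The "var total" two lines down is hoisted to the top of this
    // function, so "total" is already local here, but not yet assigned
    var seenBefore = typeof total;
    var total = 2;
    return [seenBefore, total];
}

function loopExample() {
    for (var i = 0; i !== 3; i++) {
        var last = i;
    }
    // Both i and last are still visible after the loop has ended
    return [i, last];
}

console.log(shadowExample());  // [ 'undefined', 2 ]
console.log(loopExample());    // [ 3, 2 ]
console.log(total);            // 5, the outer variable was never touched
```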

&lt;a name="adding"&gt;
&lt;h2&gt;Adding Methods to Core Javascript&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;Javascript lacks certain functions that are very
   useful to have, such as trimming spaces from strings.
   One very cool thing about Javascript is that
   you can directly add these methods to the core language,
   by adding functions to the "prototype" object of the
   core classes.  Here is how you add a "trim" function
   to core Javascript.
&lt;/p&gt;

&lt;pre&gt;
String.prototype.trim = function() {
 return this.replace(/^\s+|\s+$/g,"");
}
x = "   abc  ";
alert('-' + x + '-'); // the dashes let you see the spaces
alert('-' + x.trim() + '-');  // spaces removed!
&lt;/pre&gt;

&lt;p&gt;When I first saw this trick I dutifully copy-n-pasted
   it in and it worked, but the syntax looked very
   perplexing, and I could not figure out how to make use
   of it myself.  My brain had not yet wrapped itself 
   around the Javascript mentality.  
   This leads directly to our next 
   concept, that functions are "first class citizens".
&lt;/p&gt;

&lt;a name="functions"&gt;
&lt;h2&gt;Functions as First Class Citizens&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;You may have heard that Javascript treats functions
   as "first class citizens" and wondered, "what does
   that mean?"  The best way to explain it in terms of
   other languages is that you can create functions 
   on the fly and pass them around like variables.
   This may be a little hard to grasp, so we will go 
   directly to examples.
&lt;/p&gt;

&lt;pre&gt;
// Most languages support this type of function definition
function square(x) {
    return x * x;
}

// Javascript gives you a slightly different syntax if
// you like, which can be extremely powerful
var square = function(x) {
    return x * x;
}

// The books usually go on to an example like this, 
// which frankly did not seem to me to have any purpose:
y = square;
alert( y(5) );
&lt;/pre&gt;

&lt;p&gt;The basic idea to get here is that you can do anything
   with a function that you can do with a variable.  There
   are multiple uses for this, but we have already seen one,
   namely, the ability to add a method to a previously
   created class.  This is what we did above when we added
   the "trim()" method to the base "String" class.
   This means that our approach to building class hierarchies
   is very different than in other Object-oriented languages
   like PHP, Foxpro, Delphi, VB and so forth.
&lt;/p&gt;

&lt;pre&gt;
// This example shows two different ways to add methods
// to HTML elements and make them act more object-oriented.

// Method 1, make a function that makes an INPUT read-only
// by changing its style and setting a property.  Notice
// the code refers to "this" as if it were part of an
// object, see below to see why that works.
function makeReadOnly() {
    this.className = 'readOnly';
    this.readOnly = true;
}

// Now attach that function to a DOM element (an HTML INPUT)
var input = document.getElementById('myInput');
input.makeReadOnly = makeReadOnly;

// Some other code can now tell the input to go into
// read only mode:
function changeModes() {
    var input = document.getElementById('myInput');
    // When this executes, the "this" variable in 
    // the function will refer to "input"
    input.makeReadOnly();
}
&lt;/pre&gt;

&lt;p&gt;There is another way to do this as well, that really
   illustrates how to make use of Javascript's native
   abilities:
&lt;/p&gt;

&lt;pre&gt;
// Method 2 is to define the function while adding it
// to the INPUT element.
var input = document.getElementById('myInput');
input.makeReadOnly = function() {
    this.className = 'readOnly';
    this.readOnly = true;
}

// This code works exactly as it did above
function changeModes() {
    var input = document.getElementById('myInput');
    input.makeReadOnly();
}
&lt;/pre&gt;
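
&lt;p&gt;The same idea can be tried outside of a browser.  In the sketch
   below two plain objects stand in for INPUT elements, and the names
   describe, nameInput, totalInput and applyToAll are all invented for
   the example.  It shows that "this" is decided by the object the
   function is called on, and that a function can be handed to another
   function like any other value.
&lt;/p&gt;

```javascript
// A stand-alone function that refers to "this"
function describe() {
    return this.label + ' is ' + (this.readOnly ? 'read-only' : 'editable');
}

// Two plain objects standing in for two INPUT elements
var nameInput  = { label: 'name',  readOnly: false };
var totalInput = { label: 'total', readOnly: true };

// Attach the very same function to both objects
nameInput.describe  = describe;
totalInput.describe = describe;

// A function can also be passed as an argument.  call() runs it
// with "this" set to whichever item we hand in.
function applyToAll(items, fn) {
    return items.map(function (item) { return fn.call(item); });
}

console.log(nameInput.describe());    // name is editable
console.log(totalInput.describe());   // total is read-only
console.log(applyToAll([nameInput, totalInput], describe));
```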

&lt;p&gt;Now that we have introduced this idea, it will come up
   all over the place in later examples.
&lt;/p&gt;

&lt;a name="objectclasses"&gt;
&lt;h2&gt;Objects And Classes&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;When I first tried to use Javascript I kept looking for
   the "class" keyword, but it's not there!  Believe it or
   not you use the "function" keyword to create what we 
   would call a class in other languages.  For comparison,
   here is how a simple class might look in PHP:
&lt;/p&gt;

&lt;pre&gt;
// Here is a simple PHP class for an 
// object that handles a row from a database
class dbRow {
    var $tableName = '';
    var $rowId = 0;
    
    function __construct($tableName,$id) {
        $this-&gt;tableName = $tableName;
        $this-&gt;fetchRow($id);
    }
    
    function fetchRow($id) {
        # ...more code here
    }
}

$x = new dbRow('customers',23);
&lt;/pre&gt;

&lt;p&gt;In Javascript we make a function instead of a class:&lt;/p&gt;

&lt;pre&gt;
function dbRow(tableName,id) {
    // When the object is instantiated, this
    // code runs immediately
    this.tableName = tableName;
    
    // We must define a fetchRow function before
    // we can actually call it....
    this.fetchRow = function(id) {
        // some kind of ajax stuff going on here
    }
    
    // ...and now we can invoke the function
    this.fetchRow(id);
}

// When this command returns we have a new "dbRow"
// object.  
var x = new dbRow('customers',23);
&lt;/pre&gt;
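
&lt;p&gt;The constructor above builds a fresh fetchRow function for every
   object it creates.  A common alternative, shown here only as a sketch,
   is to hang the method on the constructor's "prototype" so that all
   objects share one copy.  The names RowRecord and describe are
   hypothetical and chosen just for this example.
&lt;/p&gt;

```javascript
// The constructor only sets up the data for one object
function RowRecord(tableName, id) {
    this.tableName = tableName;
    this.id = id;
}

// A method placed on the prototype is shared by every RowRecord
RowRecord.prototype.describe = function () {
    return this.tableName + ' #' + this.id;
};

var a = new RowRecord('customers', 23);
var b = new RowRecord('items', 7);

console.log(a.describe());                // customers #23
console.log(b.describe());                // items #7
console.log(a.describe === b.describe);   // true, one shared function
console.log(a instanceof RowRecord);      // true
```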

&lt;a name="objectsyntax"&gt;
&lt;h2&gt;Creating An Object Without a Class&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;We can say Javascript is "purely dynamic", by
   which we mean you can define anything on the fly,
   including objects, even if you have no class 
   definition (er, I mean no "function()" definition...).
   You can explicitly create an object by enclosing
   the definition in curly braces.  Properties and their
   values are assigned with "name: value" syntax, separated
   by commas.   
   Since you can do anything with a function that you can
   do with a variable, the following is a nifty way to
   create an object:
&lt;/p&gt;

&lt;pre&gt;
var x = {
    propertyName: 'value',
    otherProperty: 'otherValue',
    
    square: function(x) {
        return x * x;
    }
    // Don't put a comma after the last property!
    // It will work in firefox but not in IE!
}

alert(x.square(5));
&lt;/pre&gt;
   
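&lt;p&gt;This syntax is called an object literal.  It is the basis of
   "JSON", for "Javascript Object Notation", a data format that
   uses a restricted form of the same syntax (property names in
   double quotes, and no functions).  If you can get 
   comfortable with object literals and JSON you can start to
   code up some really elegant Javascript.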
&lt;/p&gt;
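
&lt;p&gt;To see how the object syntax relates to JSON text, here is a
   short sketch using the built-in JSON.stringify() and JSON.parse()
   functions found in current Javascript engines.  The variable names
   are made up for the example.  Notice that the function does not
   survive the trip, because JSON text carries only data.
&lt;/p&gt;

```javascript
var row = {
    first: 'Sax',
    last: 'Russel',
    combine: function () { return this.first + ' ' + this.last; }
};

// Turn the object into JSON text; the function is left out
var text = JSON.stringify(row);
console.log(text);                  // {"first":"Sax","last":"Russel"}

// Turn the text back into a new object
var copy = JSON.parse(text);
console.log(copy.first);            // Sax
console.log(typeof copy.combine);   // undefined
```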

&lt;a name="objectprops"&gt;
&lt;h2&gt;Accessing Object Properties and Methods&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;You can hardcode references to an object's properties
   by using the ".property" syntax, but you can also use
   variables that hold the name of the property.
&lt;/p&gt;

&lt;pre&gt;
// Create an object
var x = {
    first: 'Sax',
    last: 'Russel',
    
    combine: function() {
        return this.first + ' ' + this.last;
    }
}

// You can now explicitly access properties
alert (x.first);
alert (x.last);

// But you can also have a variable hold the name
// of the property you want:
var propName = 'first';
alert (x[propName]);

// Objects can be nested to any depth, and you can
// mix hardcoded and variable names.  If we had a
// complex data dictionary stored on the browser,
// we might get the caption for a column like this:
var tableId = 'customers';
var columnId = 'total_sales';
var caption = dd[tableId].columns[columnId].caption;
&lt;/pre&gt;

&lt;p&gt;This works also for functions.  Assuming the same
   object as the above, we can invoke functions that
   are named by other variables:
&lt;/p&gt;

&lt;pre&gt;
var x = { .... repeated from above example };

var methodName = 'combine';
alert( x[methodName]() );
&lt;/pre&gt;

&lt;a name="iteration"&gt;
&lt;h2&gt;Iteration&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;As a database programmer I write a lot of code that
   iterates arrays and associative arrays.  Iteration
   tends to be very important to database programmers,
   as it is the most natural way to loop through rows
   retrieved from a database, or to loop through the
   values in a row.  Basic iteration of an array looks
   like this:
&lt;/p&gt;

&lt;pre&gt;
// make an array 
var myList = [ 'sax', 'anne', 'nirgal', 'frank' ];
for(var idx in myList) {
    // console.log() requires firebug
    console.log("Index and value: "+idx+", "+myList[idx])
}
&lt;/pre&gt;

&lt;p&gt;All of the action is in the line "for(var idx in myList)".
   This structure will loop through the array.  On 
   each pass the variable "idx" will contain the 
   array's index (as a string, not a number).  To actually get
   the value you need you have to go looking for myList[idx].
&lt;/p&gt;
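
&lt;p&gt;There is one trap worth knowing about when this loop is combined
   with methods added to a prototype: the loop visits those additions
   too.  The sketch below demonstrates the effect and two ways around
   it.  The method name firstItem is hypothetical, added only for the
   demonstration and removed again at the end.
&lt;/p&gt;

```javascript
var myList = ['sax', 'anne', 'nirgal', 'frank'];

// Each index arrives as a string, not a number
var indexTypes = [];
for (var idx in myList) {
    indexTypes.push(typeof idx);
}
console.log(indexTypes);   // "string", four times over

// Hypothetical method added to every array, in the same way
// a trim() method can be added to String
Array.prototype.firstItem = function () { return this[0]; };

// for...in now also visits the inherited "firstItem" key
var allKeys = [];
for (var key in myList) {
    allKeys.push(key);
}

// hasOwnProperty() filters the inherited key back out
var ownKeys = [];
for (var k in myList) {
    if (myList.hasOwnProperty(k)) {
        ownKeys.push(k);
    }
}

// forEach() sidesteps the problem and hands over a numeric index
var lines = [];
myList.forEach(function (value, i) {
    lines.push(i + ': ' + value);
});

delete Array.prototype.firstItem;   // tidy up after the demonstration
console.log(allKeys.length, ownKeys.length, lines[0]);   // 5 4 0: sax
```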

&lt;p&gt;Associative arrays are a very natural data structure for
   a database programmer, as they are an easy way to represent
   a single row retrieved from the database.  There is no
   explicit support for associative arrays in Javascript, 
   but this does not matter because you can use an object
   and get the same results.
&lt;/p&gt;

&lt;pre&gt;
// Here is an object
var myObject = {
   first: 'Sax',
   last: 'Russel',
   occupation: 'Physicist'
}
// Now we loop through it like an associative array
for(var key in myObject) {
    console.log("The array key is: " + key);
    console.log("The value is: " + myObject[key]);
}
&lt;/pre&gt;

&lt;a name="jsonajax"&gt;
&lt;h2&gt;JSON and Ajax&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;Nowadays everybody is jumping into AJAX with both feet.
   AJAX can be particularly useful to a database programmer,
   because you can make AJAX calls that return objects
   (including code), arrays, and database data.
&lt;/p&gt;

&lt;p&gt;I should note that the term "AJAX" itself means something
   very precise, being "Asynchronous Javascript and XML",
   while the example I am about to show contains no XML,
   so my use of the term is not strictly correct.  
   Nevertheless, many people routinely use the term AJAX to
   mean any round-trip to the server that fetches some 
   fragment of information without doing an entire page
   refresh.  While this is regrettable, I'm not going to try
   to buck that trend here.
&lt;/p&gt;

&lt;p&gt;That being said, here is a nifty way to use PHP to send
   a data structure back on an AJAX request:
&lt;/p&gt;

&lt;pre&gt;
# THE PHP CODE:
function someAjaxHandler() {
    $table = myFrameworkGetPostRetrievalFunction('table');
    $id    = myFrameworkGetPostRetrievalFunction('id');
    $row = myFrameworkRowRetrievalFunction($table,$id);
    
    # This nifty PHP function encodes arrays and objects
    # into JSON, very cool
    echo json_encode($row);
}
&lt;/pre&gt;

&lt;p&gt;This would be handled in the browser like so:
&lt;/p&gt;

&lt;pre&gt;
function someAjaxResponseHandler() {
    if (this.readyState != 4) return;
    try {
        window.requestData = JSON.parse(this.responseText);
    }
    catch(e) {
        alert("The server response was not parsable JSON!");
        return;
    }
}
&lt;/pre&gt;
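
&lt;p&gt;The parsing step can be tried on its own, away from any real
   request.  The helper below is a minimal sketch, assuming a Javascript
   engine with the built-in JSON.parse() function.  The name
   parseResponse and the sample strings are invented for the example.
   It returns a small result object instead of raising an alert, which
   makes it easy to test.
&lt;/p&gt;

```javascript
// parseResponse is a hypothetical helper name
function parseResponse(responseText) {
    try {
        return { ok: true, data: JSON.parse(responseText) };
    }
    catch (e) {
        // JSON.parse throws when the text is not valid JSON
        return { ok: false, error: e.message };
    }
}

// The kind of text json_encode() might send back for one row
var good = parseResponse('{"customer_id":23,"name":"Sax"}');
console.log(good.ok);                  // true
console.log(good.data.customer_id);    // 23

// The kind of text you get when the server prints an error instead
var bad = parseResponse('Fatal error: something went wrong');
console.log(bad.ok);                   // false
```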
  
&lt;a name="syncajax"&gt;
&lt;h2&gt;Synchronous AJAX and JSON: S-JSON&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;It is pretty safe to say that the asynchronous nature of
   AJAX is a powerful part of its appeal.  The request is sent
   and the browser remains responsive to the user until the
   request comes back.  This is especially powerful for fetching
   things in the background while the user works.
&lt;/p&gt;

&lt;p&gt;However, in database applications sometimes it is the Right
   Thing for the browser to stop while fetching data.  If a 
   user clicks on [SAVE] on a CRUD screen 
   to save some changes, we actually want
   the browser to wait until the server tells us that it all
   went ok (or not).  You can do this by setting a flag on your
   call.  I have found this a very powerful approach to
   writing desktop-in-browser applications:
&lt;/p&gt;

&lt;pre&gt;
// Named syncJSON rather than JSON so that it does not
// overwrite the browser's built-in JSON object
function syncJSON(url) {
    // Create the request object
    var http;
    if (window.XMLHttpRequest) {
        http = new XMLHttpRequest();
    }
    else {
        // Older versions of Internet Explorer
        http = new ActiveXObject("Microsoft.XMLHTTP");
    }

    // The trick is to pass "false" as the third parameter,
    // which says to not go asynchronously.

    http.open('POST' , url, false);
    http.send(null);
    
    // Execution now halts, waiting for the complete
    // return value

    // Once execution resumes, we can capture the
    // JSON string sent back by the server and do anything
    // we want with it
    try {
        window.requestData = JSON.parse(http.responseText);
    }
    catch(e) {
        alert("The server response was not parsable JSON!");
        return;
    }

    // Processing of the result occurs here...  
}
&lt;/pre&gt;

&lt;a name='jquery'&gt;
&lt;h2&gt;jQuery and Friends&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;Nowadays we have a growing list of very powerful Javascript
   libraries available.  Some of them are very high level and
   some of them are low-level.
&lt;/p&gt;

&lt;p&gt;One library I will mention by name as being very useful
   is &lt;a href="http://jquery.com"&gt;jQuery&lt;/a&gt;.  This library
   provides a wealth of extremely simple and powerful 
   abilities for finding and manipulating the HTML that is
   in your document.  I highly recommend it.
&lt;/p&gt;

&lt;h2&gt;Closing Thoughts&lt;/h2&gt;

&lt;p&gt;Any database programmer working in 2008 is either already
   required to use Javascript or may find himself facing it
   soon.  Javascript is very flexible and powerful, but is
   different from the languages we are used to for writing
   applications, like PHP, Foxpro, Delphi, VB and others.
&lt;/p&gt;

&lt;p&gt;Nevertheless, Javascript can do everything you need it to 
   do; you just have to grasp the "javascript mentality" as
   it were.  I have attempted this week in this essay to
   put into one place all of the facts and tricks that were
   not so obvious to me from reading books or simply 
   taking a snippet of code and trying to modify it.  I
   hope that you find it useful!
&lt;/p&gt;

&lt;p&gt;Next Essay: &lt;a href=
"http://database-programmer.blogspot.com/2008/08/advanced-algorithm-sequencing.html"
    &gt;Sequencing Dependencies&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-1831564724368020791?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/1831564724368020791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=1831564724368020791' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1831564724368020791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/1831564724368020791'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/08/javascript-as-foreign-language.html' title='Javascript As a Foreign Language'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-8406437372919342915</id><published>2008-07-27T15:38:00.002-04:00</published><updated>2008-08-03T14:09:05.086-04:00</updated><title type='text'>Different Foreign Keys for Different Tables</title><content type='html'>&lt;p&gt;A foreign key can be used to implement table design 
   patterns that span multiple tables.  By choosing how
   a foreign key handles a DELETE attempt on the parent
   table, you can structure your table designs to 
   follow two standard patterns.
&lt;/p&gt;

&lt;p&gt;Welcome to the Database Programmer blog.  This series
   of essays is for anybody who wants to learn about
   databases on their own terms.  There is a complete
   &lt;a href=
"http://database-programmer.blogspot.com/2007/12/database-skills-complete-contents.html"
   &gt;Table of Contents&lt;/a&gt;, as well as a summary of
   &lt;a href=
"http://database-programmer.blogspot.com/2008/01/table-design-patterns.html"
   &gt;Table Design Patterns&lt;/a&gt;.  There is a new essay in
   this spot each Monday morning.
&lt;/p&gt;

&lt;h2&gt;A Simple Example of Two Foreign Keys&lt;/h2&gt;

&lt;p&gt;Picture a basic shopping cart, with its two basic tables
   of CART and CART_LINES (or ORDERS and ORDER_LINES if you
   are more old-fashioned).  The table CUSTOMERS is also
   in there as a parent to CART.  Our three tables look
   something like this:
&lt;/p&gt;

&lt;pre&gt;
   CUSTOMERS
      |
      |
     /|\
     CART  Cart is child of customers
      |
      |
     /|\
   CART_LINES  Lines is child of Cart
&lt;/pre&gt;

&lt;p&gt;There are two foreign keys here.  CART has a foreign key
   to CUSTOMERS, and CART_LINES has a foreign key to CART,
   but the two foreign keys should behave very differently.
&lt;/p&gt;

&lt;h2&gt;Table Types and Table Design Patterns&lt;/h2&gt;

&lt;p&gt;In &lt;a href=
"http://database-programmer.blogspot.com/2008/01/database-skills-sane-approach-to.html"
   &gt;A Sane Approach To Choosing Primary Keys&lt;/a&gt; we saw 
   that table design begins with identifying the basic
   kinds of tables: Reference and Small Master Tables,
   Large Master Tables, Transactions, and Cross-References.
   Just as we picked different kinds of primary keys 
   for the different tables, so will we pick different
   kinds of foreign keys between these tables.
&lt;/p&gt;

&lt;h2&gt;Deleting a Customer&lt;/h2&gt;

&lt;p&gt;Imagine you have a customer who has made 10 orders
   in 2 years.  A system administrator, who is allowed
   to basically do anything, goes into your admin
   screens, looks up the customer, and clicks [DELETE].
   What should happen?
&lt;/p&gt;

&lt;p&gt;The near-universal answer is that the user should
   be denied the action.  An error should come back that
   says "That customer has orders, cannot delete."
   We want it this way because we never want to delete
   any parent row and "orphan" the child rows.  
   Database programmers know from long experience that
   if you allow the DELETE, your queries will give incorrect
   results, or you will work extremely hard with lots
   of weird LEFT JOINS and UNIONS trying to
   get them to come back correctly.  
&lt;/p&gt;

&lt;p&gt;This is not an issue of "flexibility", where a more
   robust system would allow the deletion.  This is a 
   basic question of record-keeping.  If the customer has
   orders on file then the customer must be kept on file.
   Enforcing this rule keeps code clean and simple, and
   trying to avoid this rule in the name of "flexibility"
   just makes heaps of work for everybody.
&lt;/p&gt;

&lt;p&gt;Going further, the administrator in question, who 
   supposedly can do anything, may not violate the rule.
   An administrator is simply somebody who can do anything
   &lt;i&gt;that would not produce bad data&lt;/i&gt;.  Administrators
   should not be given the ability to violate &lt;i&gt;the basic
   structure of the data&lt;/i&gt;, they simply have full 
   rights to do anything &lt;i&gt;within the structure of the
   data&lt;/i&gt;.
&lt;/p&gt;

&lt;a name="deleterestrict"&gt;
&lt;h2&gt;The DELETE RESTRICT Foreign Key&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;The behavior we want here is called DELETE RESTRICT.
   On most database servers this is the default 
   behavior for a foreign key.  It means that you cannot
   delete a parent table row if there are matching
   rows in the child table.
&lt;/p&gt;
   
&lt;p&gt;The DELETE RESTRICT pattern is almost universally used
   when the child table is a transaction table and the
   parent table is a master table or reference table.
&lt;/p&gt;

&lt;p&gt;The syntax looks something like this:&lt;/p&gt;

&lt;pre&gt;
-- Most database servers implement DELETE RESTRICT
-- by default, so this syntax:
Create table CART (
    customer integer REFERENCES customers
   ,order    integer.....
)

-- ...is the same as this explicit syntax:
Create table CART (
    customer integer REFERENCES customers
                     ON DELETE RESTRICT 
   ,order    integer.....
)
&lt;/pre&gt;
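
&lt;p&gt;To watch the rule at work, here is a minimal sketch in
   PostgreSQL-style SQL.  The column names (customer_id, cart_id)
   and the sample values are assumptions made up for this
   illustration, so treat it as a scratch-database experiment
   rather than a finished schema:
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical cut-down tables, just enough to show the rule
CREATE TABLE customers (
    customer_id integer PRIMARY KEY
);

CREATE TABLE cart (
    cart_id     integer PRIMARY KEY
   ,customer_id integer NOT NULL
                REFERENCES customers (customer_id)
                ON DELETE RESTRICT
);

INSERT INTO customers (customer_id) VALUES (9876), (5544);
INSERT INTO cart (cart_id, customer_id) VALUES (1234, 9876);

-- Rejected with a foreign key violation, because
-- cart 1234 still points at customer 9876
DELETE FROM customers WHERE customer_id = 9876;

-- Accepted, because customer 5544 has no rows in cart
DELETE FROM customers WHERE customer_id = 5544;
&lt;/pre&gt;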

&lt;a name="deleterecascade"&gt;
&lt;h2&gt;Deleting An Order and DELETE CASCADE&lt;/h2&gt;
&lt;/a&gt;

&lt;p&gt;Now let us say a staff member is on the phone with
   a customer, enters an order, enters five lines,
   and then the customer says "forget it" and the user
   needs to delete the entire order from the CART.
&lt;/p&gt;

&lt;p&gt;In this case the user wants to go delete the order,
   and he expects the computer &lt;i&gt;to also delete the
   lines&lt;/i&gt;.  This makes perfect sense, why keep the
   lines if we don't want the order?
&lt;/p&gt;

&lt;p&gt;It may seem strange that in the case of deleting 
   a customer it makes perfect sense to stop the user,
   but when deleting an order it makes perfect sense
   to delete the lines as well.  
&lt;/p&gt;

&lt;p&gt;The difference is that an entry in the CART table
   is a transaction entry.  When a user deletes a
   transaction they almost always want to automatically
   delete all of the relevant rows from all child tables
   as well.  The two rules basically are:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The user cannot delete a master entry that
        has transactions.
    &lt;li&gt;Deleting a transaction means deleting the
        entire transaction.
&lt;/ul&gt;

&lt;p&gt;NOTE: By "transaction" here I mean financial transaction
   or other interaction between master elements. I do not
   mean a database transaction.
&lt;/p&gt;

&lt;p&gt;The syntax for DELETE CASCADE looks something like this:
&lt;/p&gt;

&lt;pre&gt;
-- if the user deletes a row from CART,
-- do them the favor of deleting all of the
-- lines as well
Create table CART_LINES (
    order   integer REFERENCES CART
                    ON DELETE CASCADE
   ,order_line integer....
)
&lt;/pre&gt;
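
&lt;p&gt;The same kind of scratch experiment shows the cascade.  This
   is a sketch in PostgreSQL-style SQL, and the names cart_id and
   line_no are assumptions chosen for the example (ORDER itself is
   a reserved word in SQL, so it cannot be used unquoted as a
   column name):
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical cut-down tables, just enough to show the rule
CREATE TABLE cart (
    cart_id integer PRIMARY KEY
);

CREATE TABLE cart_lines (
    cart_id integer NOT NULL
            REFERENCES cart (cart_id)
            ON DELETE CASCADE
   ,line_no integer NOT NULL
   ,PRIMARY KEY (cart_id, line_no)
);

INSERT INTO cart (cart_id) VALUES (1234);
INSERT INTO cart_lines (cart_id, line_no)
VALUES (1234, 1), (1234, 2);

-- Deleting the parent row also deletes both lines
DELETE FROM cart WHERE cart_id = 1234;

-- Returns 0: the lines went away with their parent
SELECT COUNT(*) FROM cart_lines WHERE cart_id = 1234;
&lt;/pre&gt;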

&lt;h2&gt;Conclusion: Different Table Types, Different Foreign Key Types&lt;/h2&gt;

&lt;p&gt;I have said many times in these essays that the foreign key
   is the only meaningful way to connect data in different
   tables.  This week we have seen that the kind of foreign
   key you choose depends on what kind of tables you are
   connecting together.  Children of master tables generally
   get DELETE RESTRICT, and children of transaction tables
   generally get DELETE CASCADE.
&lt;/p&gt;

&lt;p&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/08/javascript-as-foreign-language.html"
   &gt;Next Essay: Javascript as a Foreign Language&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-8406437372919342915?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/8406437372919342915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=8406437372919342915' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8406437372919342915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/8406437372919342915'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/07/different-foreign-keys-for-different.html' title='Different Foreign Keys for Different Tables'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-4620199291474217668</id><published>2008-07-20T15:11:00.001-04:00</published><updated>2008-07-27T15:40:05.627-04:00</updated><title type='text'>History Tables</title><content type='html'>&lt;p&gt;A history table allows you to use one table to track
   changes in another table.  While the basic idea is
   simple, a naive implementation will lead to bloat
   and will be difficult to query.  A more sophisticated
   approach allows easier queries and can produce not
   just information about single rows, but can also
   support aggregate company-wide queries.
&lt;/p&gt;

&lt;p&gt;This week in the Database Programmer Blog we return to
   table design patterns with an essay on history tables.
   The basic premise of this blog is that good coding skills
   do not lead magically to good database skills -- you can
   only make optimal use of a database by understanding it
   on its own terms.  There is a new essay each Monday, 
   and there is a &lt;a href=
"http://database-programmer.blogspot.com/2007/12/database-skills-complete-contents.html"
   &gt;Complete Table of Contents&lt;/a&gt; and a &lt;a href=
"http://database-programmer.blogspot.com/2008/01/table-design-patterns.html"
   &gt;List of Table Design Patterns&lt;/a&gt;.
&lt;/p&gt;

&lt;h2&gt;What to Put Into A History Table&lt;/h2&gt;

&lt;p&gt;Naive approaches to history tables usually involve making
   &lt;i&gt;a complete copy of the original (or new) row&lt;/i&gt; when
   something changes in the source table.  This turns out 
   to be of little use, for reasons I will explain below.
   A much more useful approach is to track only a few
   columns and to store any combination of old values,
   new values, and differences.  A history table designed
   this way can be tremendously useful.
&lt;/p&gt;

&lt;p&gt;We will start with the example of a sales order table,
   called ORDERS.  The columns we are interested in might
   look like this:
&lt;/p&gt;

&lt;pre&gt;
ORDER | CUSTOMER | DATE     | LINES  |   TAX |  TOTAL |   PAID | BALANCE
------+----------+----------+--------+-------+--------+--------+---------
1234  | 9876     | 5/1/08   |  48.00 |  5.00 |  53.00 |      0 |   53.00
2345  | 9876     | 5/3/08   | 150.00 |     0 | 150.00 | 150.00 |       0
3456  | 5544     | 6/8/08   |  25.00 |  2.60 |  27.60 |  15.00 |   12.60
4567  | 3377     | 7/3/08   | 125.00 |  7.00 | 132.00 |  50.00 |   82.00
&lt;/pre&gt;

&lt;p&gt;We first have to ask which columns must be copied into history
   so that we can link the history table back to the ORDERS table.
   The only column we need for tracking is ORDER (the order 
   number), so
   the history table will always have an ORDER column.
&lt;/p&gt;

&lt;p&gt;We should also assume that the history table will contain at least
   a timestamp and a column to track the user who made the change,
   which brings us to a minimum of three columns.
&lt;/p&gt;

&lt;p&gt;Finally, it tends to be very useful to track what action 
   caused the history entry, be it an INSERT, UPDATE, or DELETE.
   This brings us up to four minimum columns.
&lt;/p&gt;

&lt;p&gt;Next we ask which columns we will definitely not need.  There
   are two groups of columns we will not need, which are 
   1) the columns that never change and 2) the columns we do not
   care about.  Columns that do not change are likely to be
   the CUSTOMER and the DATE column.  There is no need to bloat
   the history table with these values because we can just get
   them out of the ORDERS table.  The second group, columns we
   do not care about, are usually things like ship-to address,
   maybe an email, and other information.  Naturally there is
   no hard-and-fast rule here, it depends entirely upon the
   needs of the application.
&lt;/p&gt;

&lt;p&gt;So now we know what we definitely need and what we definitely
   do not need, and we are ready to begin work considering
   the columns that will change.  Not surprisingly, these are
   usually all about the numbers.  Next we will see how to
   track the numbers.
&lt;/p&gt;
   
&lt;h2&gt;Tracking Changes to Numbers&lt;/h2&gt;

&lt;p&gt;While it is certainly useful to store one or both of the
   old and new values for a number, it is far more useful to 
   store the &lt;i&gt;change in the value&lt;/i&gt;, or the &lt;i&gt;delta&lt;/i&gt;.
   Having this number in the history table makes for some
   really nifty abilities.  If you store all three of the
   old, new, and delta, then you can more or less 
   find out anything about the ORDER's history with very
   simple queries.
&lt;/p&gt;

&lt;p&gt;So we are now ready to consider what the history table
   might look like.  We will take the case of an order that
   was entered by user 'sax', updated twice by two other
   users, and in the end it was deleted by user 'anne'.
   Our first stab at the history table might look like this:
&lt;/p&gt;

&lt;pre&gt;
ORDER | USER_ID  | ACTION | DATE    | LINES_OLD | LINES_NEW | LINES_DELTA 
------+----------+--------+---------+-----------+-----------+-------------
1234  | sax      | UPD    | 5/1/08  |      0.00 |     48.00 |       48.00
1234  | arkady   | UPD    | 5/7/08  |     48.00 |     58.00 |       10.00
1234  | ralph    | UPD    | 6/1/08  |     58.00 |     25.00 |      -33.00
1234  | anne     | DEL    | 6/4/08  |     25.00 |      0.00 |      -25.00
&lt;/pre&gt;

&lt;p&gt;I should note that if you keep LINES_OLD and LINES_NEW, then
   strictly speaking you do not need the LINES_DELTA column.
   Whether or not you put it in depends on your approach to table
   design.   If your framework allows you to guarantee that it will
   be correct, then your queries will be that much simpler with
   the LINES_DELTA column present.
&lt;/p&gt;
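
&lt;p&gt;One way to get that guarantee from the server itself is a
   CHECK constraint.  The following is only a sketch in
   PostgreSQL-style SQL.  The names order_id and changed_on are
   my stand-ins for the ORDER and DATE columns shown above, 
   and the column types are assumptions:
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical definition of the history table shown above.
-- There is no REFERENCES clause on order_id because the DEL
-- entry has to outlive the order it describes.
CREATE TABLE orders_history (
    order_id    integer       NOT NULL
   ,user_id     varchar(30)   NOT NULL
   ,action      char(3)       NOT NULL
   ,changed_on  date          NOT NULL
   ,lines_old   numeric(10,2) NOT NULL
   ,lines_new   numeric(10,2) NOT NULL
   ,lines_delta numeric(10,2) NOT NULL
   -- the server refuses any row whose delta does not add up
   ,CHECK (lines_delta = lines_new - lines_old)
);

-- Accepted: 58.00 - 48.00 = 10.00
INSERT INTO orders_history
VALUES (1234, 'arkady', 'UPD', '2008-05-07', 48.00, 58.00, 10.00);

-- Rejected by the CHECK constraint: the delta is wrong
INSERT INTO orders_history
VALUES (1234, 'ralph', 'UPD', '2008-06-01', 58.00, 25.00, 33.00);
&lt;/pre&gt;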

&lt;p&gt;You may wonder why there is no entry for the original INSERT.
   This is because you must enter an order before you can 
   enter the lines, so the original value will always be zero.
   Only when lines start going in does the ORDER get any
   numbers.  This is true for header tables, but it would not
   be true for detail tables like ORDER_LINES_HISTORY.
&lt;/p&gt;

&lt;h2&gt;Some of the Obvious Queries&lt;/h2&gt;

&lt;p&gt;There are a few obvious queries that we can pull from the
   history table right away.  These include the following:
&lt;/p&gt;

&lt;pre&gt;
-- Find the value of the line items of an
-- order as of June 1st
SELECT LINES_NEW 
  FROM ORDERS_HISTORY
 WHERE ORDER = 1234
   AND DATE &lt;= '2008-06-01'
 ORDER BY DATE DESC LIMIT 1;
 
-- Find the original value of the line items,
-- and the user who entered it.  
SELECT LINES_NEW, USER_ID
  FROM ORDERS_HISTORY
 WHERE ORDER = 1234
 ORDER BY date LIMIT 1;
   
-- Find the users who have worked on an order
SELECT DISTINCT USER_ID
  FROM ORDERS_HISTORY
 WHERE ORDER = 1234;
&lt;/pre&gt; 

&lt;p&gt;Most of these queries should be pretty obvious, and there
   are plenty more that will suggest themselves once you
   start working with the history tables.
&lt;/p&gt;

&lt;h2&gt;Queries Involving the Delta&lt;/h2&gt;

&lt;p&gt;The real power of the DELTA column comes into play
   when you are trying to compute back-dated values
   such as the company's total open balance on 
   June 1, 2008.  If you have a naive history table that
   stores only the old value or only the new value, this
   is truly a tortuous query to write, but if you store the
   delta then it is really quite easy.
&lt;/p&gt;

&lt;pre&gt;
-- Query to calculate the total open balance of all
-- orders as of a given date
SELECT SUM(BALANCE_DELTA) 
  FROM ORDERS_HISTORY
 WHERE DATE &lt;= '2008-06-01';
&lt;/pre&gt;

&lt;p&gt;This magical little query works because paid orders
   will "wash out" of the total.  Consider an order that
   is entered on May 20 for $200.00, and is then paid
   on May 23rd.  It will have a +200.00 entry in the 
   BALANCE_DELTA column, and then it will have a -200.00
   entry 3 days later.  It will contribute the grand sum
   of zero to the total.
&lt;/p&gt;

&lt;p&gt;But an order entered on May 25th that has not been
   paid by June 1st will have only a +200 entry in
   the BALANCE_DELTA column, so it will contribute the
   correct amount of $200.00 to the balance as of 
   June 1st.
&lt;/p&gt;

&lt;p&gt;If the company owner wants a report of his total 
   open balances on each of the past 30 days, you can retrieve
   two queries and build his report on the client:
&lt;/p&gt;

&lt;pre&gt;
-- Get begin balance at the beginning of the period
SELECT SUM(BALANCE_DELTA) as BEGIN_BALANCE
  FROM ORDERS_HISTORY
 WHERE DATE &lt; '2008-06-01';

-- Get the total changes for each day.  When you
-- build the report on the client, add each day's
-- change amount to the prior day's balance
SELECT DATE, SUM(BALANCE_DELTA) AS BALANCE_DELTA
  FROM ORDERS_HISTORY
 WHERE DATE BETWEEN '2008-06-01' AND '2008-06-30'
 GROUP BY DATE
 ORDER BY DATE;
&lt;/pre&gt;
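
&lt;p&gt;If your server supports window functions, the running
   balance can also be computed in the query instead of on the
   client.  This is a sketch that assumes PostgreSQL 8.4 or
   later and a history table with the hypothetical column names 
   changed_on and balance_delta.  Note that a day with no
   history entries produces no row:
&lt;/p&gt;

&lt;pre&gt;
-- The inner query totals each day and keeps a running sum
-- from the start of history.  The outer query then keeps
-- only the days of the reporting period.
SELECT changed_on, day_delta, balance
  FROM (
        SELECT changed_on
              ,SUM(balance_delta) AS day_delta
              ,SUM(SUM(balance_delta))
                   OVER (ORDER BY changed_on) AS balance
          FROM orders_history
         WHERE changed_on &lt;= '2008-06-30'
         GROUP BY changed_on
       ) AS daily
 WHERE changed_on &gt;= '2008-06-01'
 ORDER BY changed_on;
&lt;/pre&gt;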

&lt;h2&gt;Keeping History Tables Clean&lt;/h2&gt;

&lt;p&gt;A clean history table is one that contains no unnecessary
   information.   You normally do not want entries going
   into the history table if nothing relevant changed.
   So your history table mechanism should examine the 
   columns it is tracking, and only make an entry to the
   history table if one of the columns of interest actually
   changed.
&lt;/p&gt;

&lt;h2&gt;Problems With The Naive History Table&lt;/h2&gt;

&lt;p&gt;A very basic history table will usually 
   &lt;i&gt;copy the entire original row&lt;/i&gt; from the source table
   into the history table whenever an INSERT, UPDATE
   or DELETE occurs.  One simple problem is that you
   end up with bloated history tables.  Because they are
   cluttered with unnecessary repetitions, they are difficult
   to work with by inspection.
&lt;/p&gt;

&lt;p&gt;A much more serious technical problem with the naive
   approach is that it is horribly difficult to produce
   the queries demonstrated above.  You must reproduce
   the concept of a delta by either running through all
   of the rows on the client, or you must make a difficult
   (and often impossible) JOIN of the history table
   to itself in which you connect each row to the 
   row that came just before it.
   All I can say is, no thanks, I'll go with the delta.
&lt;/p&gt;

&lt;h2&gt;History Table Security&lt;/h2&gt;

&lt;p&gt;History tables always involve some concept of
   auditing, that is, keeping track of user actions.
   This means we need to protect against deliberate
   falsification of the history tables, which leads
   to two rules.  First, a user must have no ability
   to directly DELETE rows from the history table,
   or they could erase the record of changes.
   Second, the user must have no ability to 
   directly INSERT new rows or UPDATE existing rows, because
   if they could, they could falsify the history.  These
   rules apply to both regular users and system 
   administrators; the administrator must have no
   privilege to subvert or manipulate the history.
&lt;/p&gt;

&lt;p&gt;Since history tables have a tendency to become
   seriously bloated, there must be some privileged
   group that can DELETE from the history tables,
   which they would do as a periodic purge operation.
   This group should have no ability to
   UPDATE the tables, because such privileges would open
   a potential hole for subverting the history.
   Regular system administrators should not be in
   this group, this should be a special group whose only
   purpose is to DELETE out of the history tables.
&lt;/p&gt;

&lt;p&gt;If you are making use of DELTA columns, then strictly
   speaking you do not want to purge, but &lt;i&gt;compress&lt;/i&gt;
   history tables.  If you want to purge out all entries
   in 2005, you must replace them with a single entry
   that contains a SUM of the DELTA columns for all 
   of 2005.
&lt;/p&gt;
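
&lt;p&gt;A compression pass might look something like the sketch
   below.  It assumes PostgreSQL-style SQL and a cut-down history
   table holding only the columns named in the INSERT.  The action
   code 'CMP' and the user name 'purge' are markers I made up
   for the example.  I keep one summary row per order so that
   per-order queries still work, and the job would have to run
   as a role allowed to do both the INSERT and the DELETE:
&lt;/p&gt;

&lt;pre&gt;
BEGIN;

-- One summary row per order, carrying the net change for 2005.
-- Rows already marked CMP are skipped so that running the job
-- twice does not count anything twice.
INSERT INTO orders_history
       (order_id, user_id, action, changed_on, balance_delta)
SELECT order_id, 'purge', 'CMP', DATE '2005-12-31'
      ,SUM(balance_delta)
  FROM orders_history
 WHERE changed_on BETWEEN '2005-01-01' AND '2005-12-31'
   AND action &lt;&gt; 'CMP'
 GROUP BY order_id;

-- Remove the detail rows that the summary rows replace
DELETE FROM orders_history
 WHERE changed_on BETWEEN '2005-01-01' AND '2005-12-31'
   AND action &lt;&gt; 'CMP';

COMMIT;
&lt;/pre&gt;

&lt;p&gt;Because the summary rows carry the same total as the rows
   they replace, a SUM of the DELTA column over any range that
   includes all of 2005 comes out the same before and after.
&lt;/p&gt;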

&lt;p&gt;So to sum up, we have the following security rules
   for a history table:
&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;No system user should be able to DELETE from the
        history table.
    &lt;li&gt;No system user should be able to UPDATE the 
        history table.
    &lt;li&gt;No system user should be able to directly control
        the INSERT into the history table.
    &lt;li&gt;A special group must be defined whose only ability
        is to DELETE from the history table, so that the
        tables can be purged (or compressed)
        from time to time.
&lt;/ul&gt;
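
&lt;p&gt;On a server with standard privileges these rules come down
   to a handful of GRANT and REVOKE commands.  Here is a sketch in
   PostgreSQL-style SQL.  The role names app_users and
   history_purge and the table name orders_history are
   assumptions for the example:
&lt;/p&gt;

&lt;pre&gt;
-- Hypothetical roles: app_users for everybody who logs in,
-- history_purge for the special purge group
CREATE ROLE app_users;
CREATE ROLE history_purge;

-- Start from nothing
REVOKE ALL ON orders_history FROM PUBLIC;
REVOKE ALL ON orders_history FROM app_users;

-- Ordinary users may read the history but not change it
GRANT SELECT ON orders_history TO app_users;

-- The purge group may remove rows.  SELECT is included
-- because a DELETE with a WHERE clause has to read the
-- columns it tests.
GRANT SELECT, DELETE ON orders_history TO history_purge;
&lt;/pre&gt;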

&lt;h2&gt;Implementation&lt;/h2&gt;

&lt;p&gt;As always, you have your choice of implementing the
   history mechanism in the client code or in the database
   itself.
&lt;/p&gt;

&lt;p&gt;The best performing and most secure method is to 
   implement history tables with triggers on the source
   table.  This is the best way to implement both
   security and the actual business rules in one
   encapsulated object (the table).
   However, if you have no current practices
   for coding server-side routines, or you do not have a
   data dictionary system that will generate the code for
   you, then it may not be practical to go server-side
   for a single feature.
&lt;/p&gt;
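
&lt;p&gt;For a feel of what the trigger approach involves, here is a
   sketch for PostgreSQL.  It assumes an ORDERS table with
   columns order_id and lines (with lines never NULL) and a
   history table with the hypothetical columns named in the
   INSERT.  It handles only UPDATE, and writes nothing when the
   tracked column is unchanged, which keeps the history clean.
   A real version would also need to cover DELETE:
&lt;/p&gt;

&lt;pre&gt;
-- SECURITY DEFINER makes the function run with the rights of
-- its owner, so users need no INSERT privilege of their own
-- on the history table.
CREATE FUNCTION orders_history_log() RETURNS trigger AS $$
BEGIN
    -- Skip the write when the tracked column did not change
    IF NEW.lines IS DISTINCT FROM OLD.lines THEN
        INSERT INTO orders_history
               (order_id, user_id, action, changed_on
               ,lines_old, lines_new, lines_delta)
        VALUES (NEW.order_id, session_user, 'UPD', current_date
               ,OLD.lines, NEW.lines, NEW.lines - OLD.lines);
    END IF;
    RETURN NULL;   -- the result of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

CREATE TRIGGER orders_history_upd
 AFTER UPDATE ON orders
   FOR EACH ROW EXECUTE PROCEDURE orders_history_log();
&lt;/pre&gt;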

&lt;p&gt;Implementing history tables in code has the usual 
   benefit of keeping you in the language and habits
   you are most familiar with, but it means that you cannot
   allow access to your database except through your
   application.  I cannot of course make a general rule here,
   this decision is best made by the design team based
   on the situation at hand and anticipated future needs.
&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;History tables have many uses.  Beyond the obvious
   first use of finding individual values at some point
   in the past, well crafted tables can produce 
   company-wide aggregations like total open balances
   on a given day, changes in booked orders on a day
   or in a range of days, and many other queries along
   those lines.  Security is very important to prevent
   history tables from being subverted.
&lt;/p&gt;

&lt;p&gt;&lt;a href=
"http://database-programmer.blogspot.com/2008/07/different-foreign-keys-for-different.html"
   &gt;NEXT ESSAY: Different Foreign Keys For Different Tables&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/426922399870577072-4620199291474217668?l=database-programmer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://database-programmer.blogspot.com/feeds/4620199291474217668/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=426922399870577072&amp;postID=4620199291474217668' title='36 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4620199291474217668'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/426922399870577072/posts/default/4620199291474217668'/><link rel='alternate' type='text/html' href='http://database-programmer.blogspot.com/2008/07/history-tables.html' title='History Tables'/><author><name>KenDowns</name><uri>http://www.blogger.com/profile/11117175783163937575</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='28' height='32' src='http://3.bp.blogspot.com/_JntqvNOLMzc/SbF8_AjPH8I/AAAAAAAAAAM/VO6lfomSAOM/S220/clip1.JPG'/></author><thr:total>36</thr:total></entry><entry><id>tag:blogger.com,1999:blog-426922399870577072.post-7300960025418463577</id><published>2008-07-14T07:31:00.002-04:00</published><updated>2008-07-20T15:20:35.203-04:00</updated><title type='text'>The Wonderful Awful Browser</title><content type='html'>&lt;p&gt;When a desktop programmer tries to write database applications
   for the browser, he faces a great many challenges, both technical
   and cultural.  Both sets of challenges appear because the browser
   and the web were invented for purposes different than our own. 
   On the technical side we must reinvent huge amounts of functionality
   that we got "for free" with the old desktop systems of Foxpro, Delphi,
   VB Classic and so on, and on the cultural side we must wade through
   mountains of irrelevant or downright damaging advice that is aimed
   at people working on the next version of Facebook or eBay.
   In this essay we look at as many of these challenges as I can 
   muster.
&lt;/p&gt;

&lt;h2&gt;Why A Desktop Developer Would Move to The Web&lt;/h2&gt;

&lt;p&gt;When the browser first appeared, it totally lacked the technical
   powers required to replace desktop applications.  Nevertheless,
   some programmers immediately began to
   ponder how to move into the world of the browser.  The reasons were
   simple then, are simple now, and have not changed:
&lt;/p&gt;   
   
&lt;ul&gt;&lt;li&gt;Far easier deployment -- nothing to install.
    &lt;li&gt;Worldwide access -- businesses with multiple locations are 
        suddenly much easier to take care of.
    &lt;li&gt;You could now create a public website and give customers and
        vendors limited access to certain information.
    &lt;li&gt;Operating System independence.  This is far more of a reality 
        now than we dreamed it might be in the darkest days of 
        Microsoft's Total World Domination, but there were visionaries
        early on who saw the possibilities.
&lt;/ul&gt;

&lt;p&gt;So there are many programmers, and I am one of them, who continue to
   work on the same kinds of applications we did before the web existed,
   but who now deploy these applications in the browser, for the reasons
   listed above.  Here now is our tale of woe and sorrow! 
&lt;/p&gt;

&lt;h2&gt;The Cultural Divide&lt;/h2&gt;

&lt;p&gt;While desktop programmers were scratching their heads
   and trying to figure out how to fit into
   this new world, a new 
   generation of programmers was growing up who were 
   perfecting this new platform and developing applications
   that were undreamed of before.  Unfortunately, some of
   the good advice they dreamed up is either irrelevant 
   or counter-productive to the database programmer who
   is deploying to the web.
&lt;/p&gt;

&lt;p&gt;The driving reality for the database application programmer
   is that her users &lt;i&gt;are not surfing&lt;/i&gt;.  They are using
   a dedicated program written for the purposes of the business
   they work for.  Most of what the browser can do is either
   not necessary or positively in their way, and the browser
   lacks productivity tools that they took for granted
   in "the old system."  This fact is central to the 
   cultural divide between application programmers and
   web programmers.
&lt;/p&gt;
   
&lt;h3&gt;The Infamous Back Button Problem&lt;/h3&gt;

&lt;p&gt;
  If I surf over
   to www.osnews.com and click on an article, when
   I am finished I click "BACK" once or twice until I'm back
   to osnews, and then pick another article.  But to the 
   application user, &lt;i&gt;who is not surfing but is using
   a dedicated program&lt;/i&gt;, who has clicked "New Patient", typed in
   the info, and clicked "Save", the back button is a positive
   menace.  It is misleading and dangerous.  This has led
   to who-knows-how-many rants from web programmers telling
   application programmers, "You don't understand the web,
   you shouldn't write it that way," to which the desperate
   application programmer replies only, "but you don't
   understand, I &lt;i&gt;must&lt;/i&gt; have it work this way."  The simple
   fact is that when a user is modifying data in a browser 
   there is no concept of BACK.  There may be "UNDO" or "REVERT",
   but once the data is saved it is saved.  This is why 
   application programmers resort to trying to hide or disable
   the button, or why they think they should be able to modify
   the history (which of course they cannot do because that would
   be a huge security hole for public sites).
&lt;/p&gt;

&lt;h3&gt;Ajax only Makes The Back Button Worse&lt;/h3&gt;

&lt;p&gt;
Picture a 
   user on  the customers screen, who then goes to the menu
   and picks the vendors screen.  They work for five 
   minutes on the vendors screen, and their wonderful snappy
   AJAX application is fetching search results and navigating
   from row to row and saving changes.  Then they decide they
   made a mistake and hit [BACK] and wham! they're on the
   customer screen!  It seems that the better the applications
   become, the worse the [BACK] button becomes.  In my own
   shop we have finally decided to have the login program
   pop up a new window which does not have the [BACK]
   button or the address bar.  This is considered heresy
   by web programmers (you don't understand the web! they 
   cry) but of course what is true for them is not true
   for us, and vice-versa.
&lt;/p&gt;

&lt;p&gt;This also leads to much work.  We must provide for 
   such features as UNDO with no native support in
   the browser, and worse, whatever native support
   the browser does have was intended for something
   totally different.  
&lt;/p&gt;

&lt;h3&gt;Your Application Must Work Without Javascript&lt;/h3&gt;

&lt;p&gt;
   This is dead wrong for the application programmer.
   Application programmers have a power that is 
   totally outside the experience of a pure web
   programmer: we can dictate system requirements to
   the customer.  This led to many unhappy problems
   before the web, but with Firefox (and Firebug!) we
   now have a platform that is free and robust.  We
   simply install Firefox (or instruct the IT 
   department to do so) and we have a platform that
   we know will support our application.  
&lt;/p&gt;
   
&lt;h3&gt;Keyboard Shortcuts&lt;/h3&gt;

&lt;p&gt;Nothing illustrates the divide between the web
   and the desktop like keyboard shortcuts.
&lt;/p&gt;

&lt;p&gt;When Windows 95 swept the
   office world (but before the web really came into
   its own), programmers developed a new term for
   applications that required constant use of the
   mouse:  we called them "mousetraps".  The worst
   kind of mousetrap program requires the user to 
   constantly lift their hand from the keyboard and
   go to the mouse, then back again.  This is fatiguing,
   confusing, and terribly counter-productive for the
   end user.
&lt;/p&gt;

&lt;p&gt;But the real problem is that the browser
   &lt;i&gt;was born a mousetrap&lt;/i&gt;.  From the perspective
   of the desktop programmer, keyboard shortcuts are
   clearly an afterthought, a "red-headed stepchild"
   as they say.  Native HTML supports only the
   ACCESSKEY attribute, and recently Firefox was changed
   so that the default key combination is ALT+SHIFT
   instead of ALT.  This small change led me to finally
   realize that these folks, to put it mildly, have
   never lived in my world and haven't the slightest
   clue what my users need.  I could expect no help
   from them on this front.
&lt;/p&gt;

&lt;p&gt;The solution for the web programmer is to 
   remember &lt;i&gt;my users are not surfing, they are using
   a dedicated program&lt;/i&gt;.  Therefore it is the
   Right Thing for the application programmer to 
   hijack the CTRL-N key and have it mean "New Patient"
   (or New Customer, New Vendor, etc.)
   instead of opening a new browser window.  Moreover,
   he must kill the CTRL-N so that it does nothing
   on a page where there is no [New] button. 
   If he does not, then sometimes CTRL-N
   will create a new patient, and sometimes it will pop
   up a new window with my.yahoo.com!  So the application
   programmer confidently rewires the "standard" browser
   keys and has happier customers for his effort.
&lt;/p&gt;
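
&lt;p&gt;A minimal sketch of that rewiring, not the framework from my shop.
   The names handleShortcut and newAction are hypothetical, and the
   event is assumed to be a standard keydown event exposing ctrlKey,
   key and preventDefault().  Be aware that some browsers reserve
   certain combinations for themselves and may not let a page
   cancel them, so test this on the browser you dictate to the customer.
&lt;/p&gt;

```javascript
// Hypothetical helper: decide what CTRL-N should do on the current screen.
// `newAction` is the handler behind the page's [New] button, or null when
// the page has no such button.
function handleShortcut(event, newAction) {
  // Anything other than CTRL-N is left for the browser to handle.
  if (!event.ctrlKey) return "ignored";
  if (event.key.toLowerCase() !== "n") return "ignored";

  // Always cancel the browser's own CTRL-N, so the key never
  // opens a new window, whatever screen the user is on.
  event.preventDefault();

  if (typeof newAction === "function") {
    newAction();          // e.g. open the "New Patient" form
    return "handled";
  }
  return "swallowed";     // no [New] button here: the key does nothing
}

// In a page this would be wired up with something like:
// document.addEventListener("keydown", (e) => handleShortcut(e, openNewPatient));
```

&lt;p&gt;The point of the "swallowed" branch is consistency: the key is
   cancelled on every screen, so it either means "New" or means nothing.
&lt;/p&gt;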

&lt;h2&gt;Technical Problems In the Browser&lt;/h2&gt;

&lt;h3&gt;No Default Focus&lt;/h3&gt;

&lt;p&gt;Have you ever gone to a website where the first thing you
   must do is log in, but the user id input does not have
   focus?  That is a sure sign that the page was written
   by a web programmer with no desktop experience.
   When you put a database program into the browser, you
   expect the user to be typing constantly, looking things
   up, adding information, and so forth.  So the application
   programmer must ensure that his first control always 
   receives focus.  Call it petty if you like, but without
   it your program becomes a mousetrap.  Perfection comes
   by concentrating on these small things that either
   annoy or please users.
&lt;/p&gt;
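
&lt;p&gt;A small sketch of one way to guarantee default focus.  The function
   name focusFirstControl is hypothetical, and each control is assumed
   to expose disabled, type and focus(), as standard form elements do.
&lt;/p&gt;

```javascript
// Hypothetical helper: give focus to the first control the user can
// actually type into, skipping disabled and hidden inputs.
// Returns the control that received focus, or null if none qualified.
function focusFirstControl(controls) {
  const first = controls
    .filter((c) => !c.disabled)
    .find((c) => c.type !== "hidden");
  if (first) first.focus();
  return first || null;
}

// In a page this might be called once the document has loaded:
// focusFirstControl(Array.from(document.forms[0].elements));
```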

&lt;h3&gt;Tabbing Off to Mars&lt;/h3&gt;

&lt;p&gt;Default browser behavior is to allow the user to TAB 
   through controls in the order they were created.
   This can be modified by explicitly assigning TABINDEX
   attributes to the control.  However, when you get to
   the last control, the browser then Tabs you up to the
   menu, or the address bar, or anywhere else.
&lt;/p&gt;

&lt;p&gt;In a business application, where &lt;i&gt;the user is not
   surfing the web&lt;/i&gt;, this is wrong.  Tabbing out of
   the content area is equivalent to exiting the 
   application: it throws the user into a context that
   they do not need and (sad to say) do not understand.
&lt;/p&gt;

&lt;p&gt;When I first began deploying business apps in the 
   browser I would get calls saying, "it's frozen" or
   "I'm typing and nothing's happening" and other such
   mysterious claims.  Once I observed the users I realized
   they were "tabbing to Mars": the focus was up on the
   menu or in the address bar or somewhere else equally
   irrelevant.  So we created the idea of the "Tab Loop",
   so that when the user hits TAB on the last control it
   loops back to the first.  This completely ended
   those calls.
&lt;/p&gt;
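
&lt;p&gt;The heart of a Tab Loop is just wrap-around arithmetic.  Here is a
   minimal sketch, not our production code: nextTabIndex is a
   hypothetical name, and it assumes the form has at least one control
   and that the controls are kept in an array in tab order.
&lt;/p&gt;

```javascript
// Hypothetical helper: given the number of controls on the form, the
// index of the control that has focus, and whether SHIFT is held,
// return the index that should receive focus next.
// TAB on the last control wraps to the first; SHIFT+TAB on the first
// control wraps to the last.  Assumes count is at least 1.
function nextTabIndex(count, current, shiftKey) {
  // Adding (count - 1) is the same as stepping back one, modulo count.
  const step = shiftKey ? count - 1 : 1;
  return (current + step) % count;
}

// In a page, a keydown handler would call event.preventDefault() when
// event.key is "Tab" and then call
// controls[nextTabIndex(controls.length, i, event.shiftKey)].focus().
```

&lt;p&gt;Because the handler cancels the browser's own TAB handling, focus
   never leaves the content area.
&lt;/p&gt;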
   
&lt;h3&gt;The Tower Of Babel&lt;/h3&gt;

&lt;p&gt;Desktop programmers have a luxury undreamed of by
   web programmers: they can do all or nearly all of
   their programming &lt;i&gt;in a single language&lt;/i&gt;, like
   classic Visual Basic, Foxpro, or Delphi (or heck,
   COBOL or 4GL!).  Most of
   these programmers also know SQL, but learning it is
   not seen as a burden; it is just part of the
   job.
&lt;/p&gt;

&lt;p&gt;But when the application programmer moves to the web,
   he is confronted with at least four systems he must
   grasp if he is to perform as a master craftsman:
   (X)HTML, CSS, Javascript, and one or more
   server languages like
   Ruby, PHP, Java, etc.  These different technologies
   all have syntaxes 
   and philosophies that are different from each other
   and from past experience.  All I can say is I'm
   glad I made the effort but I sure as heck hope I
   never have to make a change that dramatic again.
&lt;/p&gt;

   
&lt;h3&gt;Let's Not Talk About State&lt;/h3&gt;

&lt;p&gt;When an application programmer moves to the web,
   he is confronted with the totally alien concept
   that he cannot maintain state.  This idea has been
   discussed much in past years, and I suspect it may
   not be the problem it once was, as most of us have
   long since gotten past it.  I did not want to leave
   it out completely, but it is way too large an issue
   to discuss in a paragraph and I doubt anyway that I
   could add to the wisdom that is already out there.
&lt;/p&gt;

&lt;h3&gt;Lousy Widgets&lt;/h3&gt;

&lt;p&gt;The HTML SELECT element stinks.  Every serious
   application programmer either downloads a 
   replacement or writes his own replacement.  In
   the old world of the desktop we did not have to
   do such things.  
&lt;/p&gt;

&lt;p&gt;The final piece of the puzzle in my shop was
    &lt;a href="http://docs.jquery.com"&gt;jQuery&lt;/a&gt;.
    The irony of jQuery is that it seems to me
    its core function is DOM traversal and 
    manipulation, but its elegant simplicity has
    drawn creative minds to do things like create
    really nice widgets for entering time.
    In my shop we finished off our desktop-in-browser
    framework by using jQuery extensively, which
    solved much of our lousy widgets problem.
&lt;/p&gt;

&lt;h2&gt;Salvation In the New Javascript Frameworks?&lt;/h2&gt;

&lt;p&gt;I should mention the new frameworks that are 
   emerging for desktop-in-browser, such as 
   &lt;a href="http://www.extjs.com"&gt;extJS&lt;/a&gt; and
   &lt;a href="http://www.dojotoolkit.org"&gt;Dojo&lt;/a&gt;, not to
   mention of course the 
   &lt;a href="http://developer.yahoo.com/yui"&gt;Yahoo!
   User! Interface! Library!&lt;/a&gt;.  
&lt;/p&gt;

&lt;p&gt;In my case I set out on the task of doing browser-based
   applications four years ago, when none of these 
   technologies existed.  So I developed my own simple
   browser-side framework.  jQuery let me round off the
   rough edges, and now I have no need of a 
   third-party framework.  So I am afraid I cannot offer
   any experience in the use of these others.
&lt;/p&gt;

&lt;p&gt;But even so, if I could use somebody else's dedicated
   work and put my efforts elsewhere, that would only
   be wise, so I continue to watch them closely.
&lt;/p&gt;
   
   
&lt;h2&gt;When Do Application Programmers Accept 
    Advice From Web Programmers?&lt;/h2&gt;
    
&lt;p&gt;Does all of this mean that we application programmers
   can learn nothing from web programmers?  Of course
   not, that would be arrogant in the extreme.
   The answer is that when we enter the world 
   of the public website, we must seek the advice and
   guidance of the experts in that world.  So when I
   write the public portion of the application, the
   part that is visible to customers, vendors and other
