Monday, June 30, 2008

Database Performance: The Web Layer

A database application is a like a convoy of ships, it is only as fast as the the slowest ship. The three "ships" in a web-based database application are the database itself, the web layer, and the browser. Today we will continue our series on performance by examining how the web layer can efficiently retrieve data from the database.

Welcome to the Database Programmer blog. This blog is for anybody who wants to see practical examples of how databases work and how to create lean and efficient database applications. There is a Complete Table Of Contents that is updated each week, and a Master list of table design patterns that is updated whenever a new design pattern is presented.

Cost 1: Round Trips

The first basic cost of retrieving data is the "round trip". Database programmers speak of a "round trip" as occurring whenever you send a request the server and retrieve some results. Each round trip to the server carries some overhead, as the server must do some basic work to allocate and release resources at the start and end of the request. This overhead is added to the base cost the server must pay to actually go out to disk to find and retrieve your data.

If your application makes more round trips than are necessary, then the program will be slower than it could be.

Cost 2: Retrieval Size

Every byte that the application retrieves from the server carries a cost at several points. The server must go to disk and read it, the wire must carry the load from the db server to the web server, and the web server must hold the result in memory. If your web code regularly retrieves more information than it needs, then the program will be slower than it could be.

This is why you will see advice that you should never use "SELECT *..." in direct queries, because it is near certain that you are retrieving data you will not use. The more desirable query names the exact columns you need so that you have maximum efficiency. This is especially important if your table contains text (aka clob) fields, if you use "SELECT *..." on one of those tables you risk pulling all kinds of data over the wire that is just going to be thrown away.

Example 1: One Round Trip By Using JOIN

Consider the basic case where you are retrieving and displaying the line items from an order (or shopping cart as people call it these days). Let us assume that you have an ORDER_LINES table that contains SKU, QTY, and PRICE among others. The item's description is in the ITEMS table. To display the lines, you must retrieve each line and also retrieve the item's description.

To do this most efficiently, we can make a single round trip to the server that retrieves all of the columns we need in one shot, then do a loop to render them like so (the example is in PHP):

# Assume some function that gives you the order number,
# sanitized for safe substition
$order = GetOrderNumber();

# Form the SQL
$sq="SELECT ol.sku,ol.price,ol.qty,ol.extended_price
           ,i.description
       FROM ORDER_LINES ol
       JOIN ITEMS       i   ON ol.sku = i.sku
      WHERE ol.oder = $order";

# Most frameworks should have some command to retrieve
# all rows for a query, something like this:
$lines = SQL_AllRows($sq);

# Finally, render the HTML
foreach($lines as $line) {
    #
    #  HTML rendering code here
    #
}

I should stress that this example carries a reasonable expectation that the order is small enough that you don't start hitting the inefficiencies of your particular language. Rendering large results sets in a Web Application is severely problematic compared to the old desktop systems, and doing so requires separate techniques that will have to wait for a future essay.

Example 2: Caching Small Tables

Sometimes you will need to generate a printed report that involves many tables, including several description lookups. For instance, I have a medical application that generates statements for all patients who have a balance. A typical run will produce 100 or 200 statements, and each statement requires information from no less than 8 tables.

In cases like this you can simplify your queries by retrieving the small lookup tables in their entirety before going to the main query and loop. For the example of the medical program there are two tables that qualify for this treatment. These are the tables of "ICD9" codes and "CPT" codes. Both of these usually have only about 100 rows, and there are only 2 relevant columns in one and 3 in the other. Therefore there is a big gain to be had by simply loading them into RAM ahead of time and simplifying the resulting code.

This bare-bones example shows simply that the tables are loaded first, and then main execution begins.

# The function SQL_Allrows() gives me the complete result from
# a query, the 2nd argument says to return an associative
# array with key values made out of the named column.
# NOTE: an "icd9" code is a medical diagnosis code
$icd9codes = SQL_AllRows(
    "Select icd9code,description from icd9codes"
    ,"icd9code"
);

# NOTE: a "CPT" code is a medical procedure code
$cptcodes = SQL_AllRows(
    "select cptcode,description from cptcodes"
    ,"cptcodes"
);

#  ...now continue by pre-fetching the list of patients
#     we will be dealing with, and then we can finally
#     go into the main loop and refer to the $icd9codes
#     and $cptcodes array as needed.
#
$patients = SQL_AllRows(
    "Select patient from patients where balance > 0
      order by last_name,first_name"
);
foreach($patients as $patient) {
    #
    # retrieve the statement information, use
    # arrays $cptcodes and $icd9codes to display
    # descriptions for those codes
    #
}

Knowing Your Context

There is one more piece of the puzzle that a programmer must have if he is to make wise decisions when trying to balance round trips and retrieval size. This is a thorough knowledge of your context. Knowing your context can dramatically help in making decisions.

Some examples of context are:

  • Huge social networking site or portal with hundreds of hits per second.
  • eCommerce site.
  • Line of business program used by the staff of a company to do their daily work.

My own context is the third item, line of business applications. In this context the following realities hold:

  • A huge user base might be a few hundred, with never more than five or six simultaneous transactions going on.
  • A much more common user base is 10-20 users (or even 3 or 4!), with one transaction every 5-20 seconds.
  • The public website accessed by customers is limited to a few thousand potential users, of which you rarely if ever have two or more users on at the same time.

In this context I have a wealth of server resources, because my customer can spend as little as $1500.00 and get a server with more RAM than 10-20 users will ever use at the same time. Therefore, my own coding habits often tend toward caching lookup tables and pulling 300 rows into memory at one shot so that I can get them to the screen (or PDF, or CSV...) as fast as possible. But these decisions are guided by the context of my applications, if your context is different, you may be led to different conclusions.

Conclusion

It is not difficult to create database applications that perform well. The basic rules of thumb are to make a minimum number of round trips to the server and to retrieve precisely the values that you need and no more. These ideas work well because they minimize your most expensive operation, which is disk access.

It is also perfectly acceptable to denormalize your tables (following Denormalization Patterns) which simplifies your queries and reduces JOIN operations. Finally, you must know your context well, so that you can evaluate techniques such as caching lookup tables.

These ideas form the cornerstone of most performance optimization and you will find that applying them over and over rigorously will give you most of what you need to keep performance strong in the web layer.

Next Post: Pay Me Now or Pay Me Later

4 comments:

oraclenerd said...

Ken,

Why not just join the lookup tables in the database? That's what it was built to do right?

KenDowns said...

Oraclenerd: I do not know which example you are referring to or what you mean by "join in the database."

Do you mean us a view that contains the joins? That's perfectly fine. The essay and examples are meant to lay bare what is going on, but you can always move the JOINs into a view just as easily as coding them in direct SQL.

Or did you mean denormalize by copying values? Though I do advocate denormalization following strict patterns, personally I would not do it in this case, not for descriptions anyway. I usually reserve denormalization for numbers like price.

Boris said...

I think oraclenerd is refering to example 2:Caching Small Tables

aparna john said...

Hi,The interface is very friendly so novice web designers can build attractive websites quickly for Web Design Cochin. It has full support of the current web standards and you can easily preview the site on a number of browsers. Unlike Adobe Dreamweaver CS6, you cannot build sites that are optimized for smartphones and tablets but that isn't a concern for many novice web developers.Thanks.....