Saturday, December 11, 2010

The Cost of Round Trips To The Server

A database is not much without the applications that connect to it, and one of the most important factors that affects the application's performance is how it retrieves data from queries. In this essay we are going to see the effect of round trips on application performance.


Pulling 15,000 Rows

The test will pull 15,000 rows from a table. We do it three different ways and see which is faster and by how much.

Getting a Lot of Rows

The script below creates a table and puts 1 million rows into it. We want far more rows in the table than we will actually pull so that we can pull fresh rows on every pass through the test. It is deliberately crafted to spread out the adjacent values of the integer primary key. This is because, inasmuch as we can control what is going on, we want every row to land on a different page, so that in all tests the cost of retrieving a row is roughly the same and we are measuring only the effect of our retrieval methods.

The script can be run without modification in pgAdmin3, and with slight mods on MS SQL Server.

create table test000 (
    intpk int primary key
   ,filler char(40)
)


--  BLOCK 1, first 5000 rows
--  pgAdmin3: run as pgScript
--  All others: modify as required  
--
declare @x,@y;
set @x = 1;
set @y = string(40,40,1);
while @x <= 5000 begin
    insert into test000 (intpk,filler)
    values ((@x-1)*200 +1,'@y');

    set @x = @x + 1;
end

-- BLOCK 2, put 5000 rows aside 
--
select  * into test000_temp from test000

-- BLOCK 3, Insert the 5000 rows 199 more
--          times to get 1 million altogether
--  pgAdmin3: run as pgScript
--  All others: modify as required  
--  
declare @x;
set @x = 1;
while @x <= 199 begin
    insert into test000 (intpk,filler)
    select intpk+@x,filler from test000_temp;

    set @x = @x + 1;
end
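
To confirm the load worked, a quick count in plain SQL (run it as an ordinary query, not as pgScript) should report exactly 1,000,000 rows covering intpk 1 through 1,000,000:

-- BLOCK 4 (optional), sanity check the load
select count(*)   as row_count
      ,min(intpk) as first_key
      ,max(intpk) as last_key
  from test000;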

Test 1: The Naive Code

The simplest code is a straight loop that pulls 15,000 consecutive rows by sending an explicit query for each one.

# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 1, Individual explicit fetches
$x1 = rand(0,197)*5000 + 1;   # 197 keeps $x2 within the 1,000,000 rows
$x2 = $x1 + 14999;
echo "\nTest 1, using $x1 to $x2";
$timeBegin = microtime(true);
while ($x1 <= $x2) {
    # One query and one fetch per row: 15,000 round trips
    $dbResult = pg_exec("select * from test000 where intpk=$x1");
    $row = pg_fetch_array($dbResult);
    $x1++;
}
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 1, elapsed time: ".$elapsed;
echo "\n";

Test 2: Prepared Statements

The next command asks the server to prepare a statement, but it still makes 15,000 round trips, executing the prepared statement with a new parameter each time. The code looks like this:

# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 2, Individual fetches with prepared statements
$x1 = rand(0,197)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 2, using $x1 to $x2";
$timeBegin = microtime(true);
$dbResult = pg_prepare("test000","select * from test000 where intpk=$1");
while ($x1 <= $x2) {
    # Still 15,000 round trips, but the statement is parsed only once
    $pqResult = pg_execute("test000",array($x1));
    $row = pg_fetch_all($pqResult);
    $x1++;
}
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 2, elapsed time: ".$elapsed;
echo "\n";

Test 3: A Single Round Trip

This time we issue a single command to retrieve 15,000 rows, then we pull them all down in one shot.

# Make a database connection
$dbConn = pg_connect("dbname=roundTrips user=postgres");

# Program 3, One fetch, pull all rows
$x1 = rand(0,197)*5000 + 1;
$x2 = $x1 + 14999;
echo "\nTest 3, using $x1 to $x2";
$timeBegin = microtime(true);
$dbResult = pg_exec(
    "select * from test000 where intpk between $x1 and $x2"
);
$allRows = pg_fetch_all($dbResult);
$elapsed = microtime(true)-$timeBegin;
echo "\nTest 3, elapsed time: ".$elapsed;
echo "\n";

Results

I ran this five times in a row, and this is what I got:

Naive 15,000        Prepared 15,000     One Round Trip
~1.800 seconds      ~1.150 seconds      ~0.045 seconds

Compared to the naive example, the set-oriented fetch of all 15,000 rows in a single shot ran 40 times faster. This is what set-oriented code does for an application.

While the prepared statement option ran faster than the naive option, the set-oriented example still ran 25 times faster than the repeated prepared statements.

I also re-arranged the order of the tests, and the results were the same.

Does Server or Language Matter?

This test was done using PHP against PostgreSQL, so will other servers and client languages get different results? Given the same hardware, a different client language or server will show a different spread, but the shape will be the same: fetching all rows in a single shot beats the living frack out of round trips inside of loops, in any client language against any server.
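
Purely as an illustration (not one of the timed tests), here is the same single-round-trip shape written against PHP's PDO layer; the DSN, user name, and password below are placeholders, and it assumes the same test000 table on whatever server you point it at:

# Illustrative sketch only -- connection details are placeholders.
# The point is the shape: one query, one fetchAll(), one round trip.
$pdo = new PDO("mysql:host=localhost;dbname=roundTrips", "user", "password");

$x1 = rand(0,197)*5000 + 1;
$x2 = $x1 + 14999;

$stmt = $pdo->prepare(
    "select * from test000 where intpk between :lo and :hi"
);
$stmt->execute(array(':lo' => $x1, ':hi' => $x2));
$allRows = $stmt->fetchAll(PDO::FETCH_ASSOC);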

Putting It Into Use

The most obvious conclusion is that any query returning more than 1 row should return all rows as a set. The advantage is so stark with large row counts that it is worthwhile making this the default for our applications, unless we can find a very good reason not to. So what would the objections be?

One objection might go something like, "Ken, I see the numbers, but I know my app very well and we never pull more than 10-20 rows at a pop. I cannot imagine how it would matter at 10-20 rows, and I do not want to recode." This makes sense, so I ran a few more tests with 20 and 100 rows and found that, on my hardware, you need about 100 rows to see a difference. At 20 rows all three approaches are neck-and-neck, and at 100 rows the set-oriented fetch pulls 4 times faster than the prepared statement and 6 times faster than the naive code. So the conclusion is not an absolute after all; some judgment is in order.
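
For anyone who wants to repeat that check on their own hardware, a small helper along these lines (a sketch that assumes the same connection and test000 table as the tests above) re-runs the naive-versus-set comparison at any row count:

# Sketch: compare naive and set-oriented fetches at a given row count
function compareAtRowCount($rowCount) {
    $x1 = rand(0,197)*5000 + 1;
    $x2 = $x1 + $rowCount - 1;

    # Naive: one round trip per row
    $t = microtime(true);
    for ($i = $x1; $i <= $x2; $i++) {
        $row = pg_fetch_array(pg_exec("select * from test000 where intpk=$i"));
    }
    $naive = microtime(true) - $t;

    # Set-oriented: one round trip for all rows
    $t = microtime(true);
    $allRows = pg_fetch_all(
        pg_exec("select * from test000 where intpk between $x1 and $x2")
    );
    $set = microtime(true) - $t;

    echo "\n$rowCount rows: naive $naive, set $set";
}

compareAtRowCount(20);
compareAtRowCount(100);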

Another thing to consider is how many simultaneous reads and writes might be going on at any given time. If your system is known to have simultaneous transactions running regularly, then the complete fetch may be a good idea even if tests at your best-guess row count are inconclusive. The reason is that these tests cover the single-user case, but multiple simultaneous users put a strain on the database even when they are not accessing the same tables. In this case we want the application to play the "good citizen" and get in and out as quickly as possible to reduce strain on the server, which will improve the performance of the entire application, not just the portions optimized for complete fetches.

Another objection might be, "Well, my code needs to pull from multiple tables, so I cannot really do this. When we do -PROCESS-X- we go row by row and need to pull from multiple tables for each row." In this case you *definitely* need to go set-oriented and pull all associated quantities down in a query with a JOIN or two. Consider this: if on your particular hardware the ratio of naive row-by-row to single fetch is 10, and you must pull from 2 other tables for each row, then you are really running 30 times slower (a ratio of 10 x 3 reads) than you could be.
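
For example (the tables and columns below are hypothetical, standing in for whatever -PROCESS-X- actually touches), one joined query brings down everything the row-by-row version would need three queries per row to collect:

# Hypothetical example: orders, customers, and order_lines stand in
# for your own tables.  One round trip replaces thousands.
$dbResult = pg_exec(
    "select o.order_id
           ,o.order_date
           ,c.customer_name
           ,l.sku
           ,l.qty
       from orders o
       join customers   c on c.customer_id = o.customer_id
       join order_lines l on l.order_id    = o.order_id
      where o.order_date = '2010-12-11'"
);
$allRows = pg_fetch_all($dbResult);

foreach ($allRows as $row) {
    # process $row['order_id'], $row['customer_name'], $row['sku'], ...
}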

A Final Note About PHP, Data Structures, and Frameworks

Back when dinosaurs ruled the Earth and there was no internet (outside of universities, etc.), the languages we used had specialized data structures that were tuned to database use. Compared to those older systems, the newer languages born on the internet are more or less starving for such a data structure.

PHP gets by fairly well because its associative array can be used as a passive (non object-oriented) data structure that comes pretty close to what we had before.
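
For instance, pg_fetch_all() hands back exactly that kind of structure: a plain array of associative arrays that can be passed to any function or template with no classes or iterators in the way. A small example:

$dbResult = pg_exec("select intpk, filler from test000 where intpk between 1 and 5");
$rows = pg_fetch_all($dbResult);

# $rows is just data: $rows[0]['intpk'], $rows[0]['filler'], and so on
foreach ($rows as $row) {
    echo $row['intpk'] . ": " . $row['filler'] . "\n";
}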

I bring this up because the choice of a language and its support for a "fetch all" operation obviously affects how well the conclusions of this post can be put into practice. If your mapping tool has an iterator that absolves you of all knowledge of what is going on under the hood, it may be worthwhile to see whether it is doing a complete fetch or going row by row.
