Sunday, March 16, 2008

The JOIN is the Cornerstone of Powerful Queries

Welcome to the Database Programmer!

There are links to other essays at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The Very Basics of JOIN

Sometimes you need information that has been separated out into different tables. For instance, imagine you are a programmer on an in-house eCommerce site for a company that sells computer parts around the world. The Sales Manager walks down the hall one day and says she wants a detail listing of customer types and order dates. Our customer types are in the CUSTOMERS table, and the order dates are in the ORDERS table, so what she wants is this:

A SELECT statement to pull these columns would look like this:

SELECT customers.customer,customers.custtype,orders.date
  FROM customers 
  JOIN orders     ON customer.customer = order.customer

Refining With Aliases

Because the JOIN is so basic and common, it can get very cumbersome to constantly spell out long table names in front of every column. Therefore you can use an "alias" in the JOIN clause to give each table a nickname:

SELECT c.customer,c.custtype,o.date
  FROM customers c
  JOIN orders    o ON c.customer = o.customer

More Than One JOIN

Sometimes the information you need is in two tables that cannot be JOINed because they are not "next" to each other. Put another way, neither table has a foreign key to the other. Put a third way, they do not have any columns in column. Let us pretend our troublesome Sales Manager comes down the hall again and this time she wants a listing of every item ordered by every customer type. This time she is looking for the following:

This time we need to start with the CUSTOMERS table and then "go through" the ORDERS table to get to the information we need in the LINES table. This means two JOINs:

SELECT c.custtype,l.sku
  FROM customers c
  JOIN orders    o ON c.customer = o.customer
  JOIN lines     l ON o.order    = l.order

Most Requests Will Make Sense

Sometimes a user's request will appear to make no sense. The user asks for a combination of values that appear to have no connection, and we programmers object, "that makes no sense! Why would you want that?"

The most important idea to keep in mind here is not a technical idea at all, it is more a matter of how to keep people happy. In my experience it is extremely rare for a customer to ask for a query that well and truly makes no sense. Just because I do not understand it does not mean it makes no sense! In fact the Sales Manager likely knows her job very well and if she is asking for items by customer type she must have a reason.

Now, with that being said, the technical solution in these cases is to follow the foreign keys. If a database has been designed well, all tables will be connected to each other through foreign keys, and you can trace out a path that connects the various data points by following these foreign keys.

Denormalizing For Performance

You have probably heard people say that sometimes you need to "denormalize for performance." Now we will look at what that means.

Consider an assignment given to two people, one of them a veteran database programmer and the other a newbie. It is guaranteed that the veteran's database design will have a lot more tables in it than the newbie's database. This is because the veteran knows he will have far fewer errors getting data in if he keeps a separate table for each level of detail required by the program. By contrast, the newbie will be guided by a strange desire to save on tables as if there is some kind of world-wide shortage of tables.

But the veteran now has a problem. While normalization is great for ensuring correctness on the way in, it tends to require more JOINs on the way out, and it so happens that JOINs are rather expensive for a database to perform. In fact, they are one of the most expensive operations there is, and they only get worse as the number of tables being JOINed increases.

Therefore, the veteran will sometimes take a design to the fullest of normalization, and then deliberately denormalize it to reduce JOINs. A very simple example is adding the CUSTTYPE column to the ORDERS table and then copying the value of Customer Type onto each order. If the programmer is confident that the value will always be copied correctly, then any report on sales that involves customer types can avoid an expensive JOIN between ORDERS and CUSTOMERS. This is the essence of the "Denormalizing for Performance" approach, and we will see more essays specifically on that topic later in this series.

Denormalized is not the same as non-normalized. The newbie will have fewer tables, tables that are non-normalized because they have values bunched together that do not belong together. The newbie will spend a lot of time correcting data errors as a result of this. The veteran however will have lots of normalized tables and will look for (or write) a framework that assists in controlling where a user can write values and when the values are copied around to the their de-normalized spots.

Conclusion: Do Not Fear the JOIN

Just as the foreign key is the fundamental (and in fact the only) mechanism that relates data together, so the JOIN is the basic building block that ties that information back together.

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

Other essays relating to SQL SELECT are:

3 comments:

Alex said...

Great article again!

Maybe one should stress, that a FK relation is not mandatory for a join, but in most cases very useful ;-)

Outer joins soon to come? ;-)

KenDowns said...

Alex: thanks! You are correct that the FK is not mandatory, so thanks for adding that. I almost put into the article a comment that in the last 4 years I have been able to design databases so that every single JOIN follows an FK, but I left it out because I thought it would be stretching a point in an introductory article. Nevertheless, I do feel it is possible with a good design for all or nearly all JOINs to follow fks.

Outer joins will certainly be treated at some point soon.

Anonymous said...

Great stuff even though it 2014 now. Can you put the images back? e.g. /images/kfd-blog/join2.png was not found on your server. Thanks