Welcome to the Database Programmer, the blog for anybody who wants to learn the practical realities of working with databases.
There are links to other essays at the bottom of this post.
This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.
The Many Kinds of JOIN
When a programmer first decides to start learning SQL, the JOIN always appears simple at first. But it can produce some unexpected results. Sometimes a JOIN brings back more rows than the novice thinks it should, and sometimes less. This week we are going to concentrate on all of the various results that are produced by the JOIN.
A JOIN always operates on the same principle: it returns one row for each combination of rows from both sides of the match.
A Parent-Child JOIN Returns Number of Matching Children
Consider the following two simple tables:
CUSTOMER | ZIP CUSTOMER | ORDER | DATE ---------+--------- ---------+-------+---------- 1234 | 11733 1234 | 57 | 3/1/08 5283 | 77074 1234 | 78 | 3/15/08 1234 | 89 | 4/07/08 5283 | 23 | 2/13/08 5283 | 32 | 3/17/08
When you JOIN these two tables together, there are three combinations that match for customer 1234, and two combinations that match on 5283, so this query:
SELECT customers.zip ,orders.customer,orders.order,orders.date FROM customers JOIN orders ON customers.customer = orders.customer
...will return 5 rows. Values from the child table appear once, and values from the parent tables repeat. The return count matches the number of rows in the child table:
ZIP | CUSTOMER | ORDER | DATE ------+----------+-------+---------- 11733 | 1234 | 57 | 3/1/08 11733 | 1234 | 78 | 3/15/08 11733 | 1234 | 89 | 4/07/08 77074 | 5283 | 23 | 2/13/08 77074 | 5283 | 32 | 3/17/08
The Parent-Child And a Left JOIN
Now we consider the case when our Sales Manager comes down the hall and asks for a list of customers in zip code 11101, with their total sales. So you have this information in the database:
CUSTOMER | ZIP CUSTOMER | ORDER | DATE | AMOUNT ---------+--------- ---------+-------+---------+------- 1234 | 11101 1234 | 57 | 3/1/08 | 25 5283 | 11101 1234 | 78 | 3/15/08 | 35 2938 | 11101 1234 | 89 | 4/07/08 | 45 5283 | 23 | 2/13/08 | 55 5283 | 32 | 3/17/08 | 65
...and you write this query:
SELECT c.customer,sum(o.amount) as amount FROM customers c JOIN orders o ON c.customer = o.customer WHERE customer.zip = 11101
...and the result only has two rows:
CUSTOMER | AMOUNT ---------+--------- 1234 | 105 5283 | 120
The novice database programmer will look at this and say, "I don't get it, my filter says zip 11101, but I only got two rows back." This is because the JOIN clause acted as a filter, because the JOIN only returns where there is a match on both sides.
If you want to get all of the rows from the left side of the JOIN, even if they have no match on the right side, you use a LEFT JOIN:
SELECT c.customer,sum(o.amount) as amount FROM customers c LEFT JOIN orders o ON c.customer = o.customer WHERE customer.zip = 11101
...and now the answer is:
CUSTOMER | AMOUNT ---------+--------- 1234 | 105 5283 | 120 2938 | null
Of course we probably do not want that "null" because it messes up calculations, so you can use the COALESCE() function to replace nulls with zeroes. The COALESCE() function accepts a list of values and returns the first non-null:
SELECT c.customer,sum(COALESCE(o.amount,0)) as amount FROM customers c LEFT JOIN orders o ON c.customer = o.customer WHERE customer.zip = 11101
...and this completes the picture:
CUSTOMER | AMOUNT ---------+--------- 1234 | 105 5283 | 120 2938 | 0
You Can JOIN a Table to Itself
Recently a customer of mine dropped by with a problem he had in an insurance database. He had a table that listed various companies and their total premiums collected by year. He is required by various regulation to report the premiums as income at the rate of 28% in the year they are collected, with the remaining 72% being reported in the following year. It did not appear readily obvious how to do this.
This is a case of self-JOIN, where you JOIN a table to itself to match rows to each other. In this case we created the following query:
SELECT tab1.year ,tab1.premium * .28 as premium1 ,tab2.premium * .72 as premium2 FROM premiums tab1 JOIN premiums tab2 ON tab1.company = tab2.company WHERE tab2.year = tab1.year + 1
Notice that the JOIN condition only matches company to company, and an additional WHERE clause was required to limit the matches. Technically we would prefer to see that second condition in the JOIN clause, but most database servers do not support that. It is a little more confusing to a newbie as written, but not terribly so.
Another SELF JOIN and a FULL OUTER JOIN
Another customer of mine called recently to complain that his legacy system is not doing something right. He said there was information missing.
In his business, which is magazine distribution to retailers, he has a table called "DEFAULTS" that lists the default number of each magazine given to each store. Sometimes he wants to see the defaults for two magazines side by side, for all of the stores they are delivered to. If his defaults table looked like this:
MAGAZINE | STORE | DEFAULT ---------+-------+-------- 123 | A | 5 123 | B | 10 123 | C | 7 456 | A | 6 456 | D | 3
...and he wanted to see magazines 123 and 456 side by side, he would expect output like this:
STORE | MAG 123 | MAG 456 ------+----------+---------- A | 5 | 6 B | 10 | 0 C | 7 | 0 D | 0 | 3
Here is how you can recognize this request as needing a self-join. If it appears that you need two separate filters for the same table, and the two filters result in values that are meant to sit side-by-side, then you have a self-join. Our first stab at the query would be:
SELECT d1.store ,d1.default as mag123 ,d2.default as def456 FROM defaults d1 JOIN defaults d2 on d1.store = d2.store WHERE d1.magazine = 123 AND d2.magazine = 456
This is the query I found in his program, and as soon as I saw it his complaint made sense. He said basically, "There are only 5 rows for magazines x and y and there should be 500." The problem goes back to the fact that a JOIN limits the answer to those cases where there is a match on both sides. The query above would give only this:
STORE | MAG 123 | MAG 456 ------+----------+---------- A | 5 | 6
...because only store A has a row for both magazines. What we want is every entry for each magazine for either store even if there is no matching entry in the other store. The LEFT JOIN we saw above would help, but would only do half the job, it would give us:
STORE | MAG 123 | MAG 456 ------+----------+---------- A | 5 | 6 B | 10 | 0 C | 7 | 0
What we need is a FULL OUTER JOIN, which is basically a LEFT JOIN and a RIGHT JOIN. Its a LEFT JOIN that goes both ways. If we add in our COALESCE() functions the final query is:
SELECT d1.store ,COALESCE(d1.default,0) as mag123 ,COALESCE(d2.default,0) as def456 FROM defaults d1 FULL OUTER JOIN defaults d2 on d1.store = d2.store WHERE d1.magazine = 123 AND d2.magazine = 456
Conclusion: That your JOIN May Be Complete
The JOIN is a very powerful clause. The JOIN is the most-used method for combining information from multiple tables together. This week we saw that the JOIN has provisions for joining a table to itself, and for handling cases where the matches may not be all present.
Related Essays
This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.
Other essays relating to SQL SELECT are:
9 comments:
Thanks again for a concise article! Maybe you can leave some remarks about the "traditional" (within the where clause) way of joins in another article?! I find it easier to understand for beginners ;-)
Best regards
Alex
Thanks..
A very nice article for the novice database programmer.
This --
sum(COALESCE(o.amount,0))
is unnecessary.
Aggregate functions such as SUM ignore NULLs.
SUM(o.amount)
;o)
@rudy So does AVERAGE. I'm thinking he's covering his bases here so that any math works out as expect rather than as correct.
AVG antivirus comes with a daily scanning facility. To enable this feature, go to AVG Internet Security and click on Schedule Scan. On the Scan Parameters screen, create a name for your scan and then, specify which areas and file types you want to scan. Now, choose the Scheduling tabs from the left panel. Tick on the Schedule this scan as you want to run the scan on a daily basis. Call on +44-800-368-9065 to get connected with the technical experts for instant help.
AVG Help Number UK
AVG Activation Code
I read this post fully on the topic of the difference of hottest and earlier technologies, it's remarkable article.
토토
안전놀이터
Excellent items from you, man. I have take into account your stuff previous to and you're just too magnificent. 바카라사이트
바카라사이트 appreciate it for your hard work. You should keep it up forever! Best of luck.
Good information.
Post a Comment