Sunday, April 6, 2008

GROUP BY, HAVING, SUM, AVG, and COUNT(*)

FACTOID: This is the 2nd most popular page on this blog.

There are links to other essays at the bottom of this post.

UPDATED Friday Nov 19, 2010

Aggregation

You can use a SQL SELECT to aggregate data. Aggregation combines rows together and performs some operation on their combined values. Very common aggregations are COUNT, SUM, and AVG.

The simplest use of aggregations is to examine an entire table and pull out only the aggregations, with no other columns specified. Consider this SQL:

SELECT COUNT(*) as cnt
      ,SUM(sale_amount) as sumSales
      ,AVG(sale_amount) as avgSales
  FROM orders

If you have a very small sales order table, say about 7 rows, like this:

ORDER |  DATE      | STATE | SALE_AMOUNT
------+------------+-------+-------------
 1234 | 2007-11-01 | NY    |       10.00
 1235 | 2007-12-01 | TX    |       15.00
 1236 | 2008-01-01 | CA    |       20.00
 1237 | 2008-02-01 | TX    |       25.00
 1238 | 2008-03-01 | CA    |       30.00
 1239 | 2008-04-01 | NY    |       35.00
 1240 | 2008-05-01 | NY    |       40.00

Then the simple query above produces a one-row output:

CNT  | SUMSALES | AVGSALES
-----+----------+---------
  7  |      175 |       25
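
If you want to try this without a database server, here is a minimal sketch that loads the seven sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite and the helper name make_orders_db are assumptions made for illustration, not part of the original setup, and only the date, state and sale_amount columns are loaded because the query needs nothing else.

```python
import sqlite3

# (date, state, sale_amount) for the seven sample orders.
ROWS = [
    ("2007-11-01", "NY", 10.00),
    ("2007-12-01", "TX", 15.00),
    ("2008-01-01", "CA", 20.00),
    ("2008-02-01", "TX", 25.00),
    ("2008-03-01", "CA", 30.00),
    ("2008-04-01", "NY", 35.00),
    ("2008-05-01", "NY", 40.00),
]


def make_orders_db():
    """Hypothetical helper: build an in-memory orders table from ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (date TEXT, state TEXT, sale_amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_orders_db()
# No GROUP BY: the whole table collapses into a single output row.
totals = conn.execute(
    """
    SELECT COUNT(*)         AS cnt
          ,SUM(sale_amount) AS sumSales
          ,AVG(sale_amount) AS avgSales
      FROM orders
    """
).fetchone()
print(totals)  # (7, 175.0, 25.0)
```

SQLite reports the sum and average as floating point numbers, so other servers may format the same values differently.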

Some Notes on The Syntax

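When we use COUNT(*) the asterisk always goes inside the parentheses, and every row is counted. COUNT(column_name) is also legal, but it skips rows where that column is NULL.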

Note that the example names the output columns by saying "as sumSales" and "as avgSales". This is important because without it we will get whatever the database server decides to call it, which will vary from platform to platform, so it is a good idea to learn to use the "AS" clause.

The WHERE Clause Filters BEFORE the Aggregation

If you want to get just the sales from New York state, you can put a WHERE clause in:

SELECT COUNT(*) as cnt
      ,SUM(sale_amount) as sumSales
      ,AVG(sale_amount) as avgSales
  FROM orders
 WHERE state = 'NY'

...and you will get only the results for NY:

CNT | SUMSALES | AVGSALES
----+----------+----------
  3 |       85 | 28.33333

Notice of course that the average has a repeating decimal. Most databases have a ROUND function of some sort, so I can correct that with:

SELECT COUNT(*) as cnt
      ,SUM(sale_amount) as sum
      ,ROUND(AVG(sale_amount),2) as avg
  FROM orders
 WHERE state = 'NY'

...and get:

CNT | SUM  | AVG
----+------+----------
  3 |  85  |  28.33
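
Here is a rough sketch of the same filter-then-round step, again assuming SQLite through Python's sqlite3 module (my choice for a self-contained demo, not the original platform). The helper name make_orders_db is hypothetical, and the state is passed as a bound parameter rather than typed into the SQL.

```python
import sqlite3

# (date, state, sale_amount) for the seven sample orders.
ROWS = [
    ("2007-11-01", "NY", 10.00),
    ("2007-12-01", "TX", 15.00),
    ("2008-01-01", "CA", 20.00),
    ("2008-02-01", "TX", 25.00),
    ("2008-03-01", "CA", 30.00),
    ("2008-04-01", "NY", 35.00),
    ("2008-05-01", "NY", 40.00),
]


def make_orders_db():
    """Hypothetical helper: build an in-memory orders table from ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (date TEXT, state TEXT, sale_amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_orders_db()
# WHERE keeps only the NY rows; the aggregates then see just those three rows.
ny = conn.execute(
    """
    SELECT COUNT(*)                   AS cnt
          ,SUM(sale_amount)           AS sum
          ,ROUND(AVG(sale_amount), 2) AS avg
      FROM orders
     WHERE state = ?
    """,
    ("NY",),
).fetchone()
print(ny)  # (3, 85.0, 28.33)
```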

The Fun Begins With GROUP BY

The query above is fine, but it would be very laborious if you had to issue the query (or write a program to do it) for every possible state. The answer is the GROUP BY clause. The GROUP BY clause causes aggregations to occur in groups (naturally) for the columns you name.

SELECT state
      ,COUNT(*) as cnt
      ,SUM(sale_amount)          as sumSales
      ,ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state

Which gives us this result:

STATE | CNT | SUMSALES | AVGSALES
------+-----+----------+---------
NY    |  3  |       85 |       28
TX    |  2  |       40 |       20
CA    |  2  |       50 |       25
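
A small runnable sketch of the grouped query, assuming SQLite via Python's sqlite3 module and a hypothetical make_orders_db helper. I have added an ORDER BY, which the query above does not have, because without one the server is free to return the groups in any order.

```python
import sqlite3

# (date, state, sale_amount) for the seven sample orders.
ROWS = [
    ("2007-11-01", "NY", 10.00),
    ("2007-12-01", "TX", 15.00),
    ("2008-01-01", "CA", 20.00),
    ("2008-02-01", "TX", 25.00),
    ("2008-03-01", "CA", 30.00),
    ("2008-04-01", "NY", 35.00),
    ("2008-05-01", "NY", 40.00),
]


def make_orders_db():
    """Hypothetical helper: build an in-memory orders table from ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (date TEXT, state TEXT, sale_amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_orders_db()
# One output row per distinct state, each with its own aggregates.
by_state = conn.execute(
    """
    SELECT state
          ,COUNT(*)                   AS cnt
          ,SUM(sale_amount)           AS sumSales
          ,ROUND(AVG(sale_amount), 0) AS avgSales
      FROM orders
     GROUP BY state
     ORDER BY state
    """
).fetchall()
for row in by_state:
    print(row)
# ('CA', 2, 50.0, 25.0)
# ('NY', 3, 85.0, 28.0)
# ('TX', 2, 40.0, 20.0)
```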

Every Column a GROUP BY or Aggregate

When you use the GROUP BY clause, every column in the output must either be a GROUP BY column or be wrapped in an aggregate function. To understand this, imagine we put "Date" into the query above:

SELECT state
     , date -- huh?? which value should we get??
     , COUNT(*) as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state

Several states have more than one row in the table, so the database server would have to decide which value of DATE to give you. Since it cannot know which one you want, a typical server throws an error and says in short, "don't confuse me!"

Two More Aggregations, MIN and MAX

If we think again about the DATE column, in most practical situations we usually want to know the smallest or largest value, or both, so this query is not uncommon:

SELECT state
     , MIN(date)                 as minDate
     , MAX(date)                 as maxDate
     , COUNT(*)                  as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state

which yields:

STATE | MINDATE    | MAXDATE    | CNT | SUMSALES | AVGSALES
------+------------+------------+-----+----------+---------
NY    | 2007-11-01 | 2008-05-01 |  3  |       85 |       28
TX    | 2007-12-01 | 2008-02-01 |  2  |       40 |       20
CA    | 2008-01-01 | 2008-03-01 |  2  |       50 |       25

HAVING Clause is Like WHERE after GROUP BY

The HAVING clause lets us put a filter on the results after the aggregation has taken place. If your Sales Manager wants to know which states have an average sale amount of $25.00 or more, then the query would look like this:

SELECT state
      ,COUNT(*) as cnt
      ,SUM(sale_amount)          as sumSales
      ,ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state
HAVING AVG(sale_amount) >= 25

Which gives us this result. Notice that Texas is now missing, as they were just not selling big enough orders (sorry 'bout that Rhonda).

STATE | CNT | SUMSALES | AVGSALES
------+-----+----------+---------
NY    |  3  |       85 |       28
CA    |  2  |       50 |       25
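
To see the HAVING filter drop a whole group, here is a minimal sketch assuming SQLite through Python's sqlite3 module, with a hypothetical make_orders_db helper and an ORDER BY added only to make the row order predictable. Note that HAVING tests the unrounded average, so CA passes with exactly 25.

```python
import sqlite3

# (date, state, sale_amount) for the seven sample orders.
ROWS = [
    ("2007-11-01", "NY", 10.00),
    ("2007-12-01", "TX", 15.00),
    ("2008-01-01", "CA", 20.00),
    ("2008-02-01", "TX", 25.00),
    ("2008-03-01", "CA", 30.00),
    ("2008-04-01", "NY", 35.00),
    ("2008-05-01", "NY", 40.00),
]


def make_orders_db():
    """Hypothetical helper: build an in-memory orders table from ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (date TEXT, state TEXT, sale_amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_orders_db()
# Groups are built first; HAVING then discards TX, whose average is 20.
big_states = conn.execute(
    """
    SELECT state
          ,COUNT(*)                   AS cnt
          ,SUM(sale_amount)           AS sumSales
          ,ROUND(AVG(sale_amount), 0) AS avgSales
      FROM orders
     GROUP BY state
    HAVING AVG(sale_amount) >= 25
     ORDER BY state
    """
).fetchall()
print(big_states)  # [('CA', 2, 50.0, 25.0), ('NY', 3, 85.0, 28.0)]
```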

When to use WHERE, When to use HAVING

Then the Sales Manager might come down and say, 'I don't want the states that have no sales after December 2008'. We might automatically code the following, which is tragically wrong:

SELECT state
     , MIN(date)                 as minDate
     , MAX(date)                 as maxDate
     , COUNT(*)                  as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 -- WRONG! Will filter out individual rows!
 WHERE date > '2008-12-31'
 GROUP BY state

The problem here is that the WHERE clause throws away individual rows dated 2008-12-31 or earlier before any grouping happens, so the stats for every state that remains cover only its sales from 2009 onward. That is not right. The idea is to completely eliminate all results for states with no sales in 2009 or later, even if they had sales before that time, and to keep the full stats for every other state. So we use MAX and the HAVING clause:

SELECT state
     , MIN(date)                 as minDate
     , MAX(date)                 as maxDate
     , COUNT(*)                  as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state
HAVING MAX(date) > '2008-12-31'
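
The difference between filtering rows and filtering groups is easier to see side by side. This is only a sketch, assuming SQLite through Python's sqlite3 module and a hypothetical make_orders_db helper. Because every sample order falls before 2009, I have assumed an earlier cutoff of 2008-03-31 so that the sample rows straddle it; dates are stored as ISO text, which compares correctly as plain strings.

```python
import sqlite3

# (date, state, sale_amount) for the seven sample orders.
ROWS = [
    ("2007-11-01", "NY", 10.00),
    ("2007-12-01", "TX", 15.00),
    ("2008-01-01", "CA", 20.00),
    ("2008-02-01", "TX", 25.00),
    ("2008-03-01", "CA", 30.00),
    ("2008-04-01", "NY", 35.00),
    ("2008-05-01", "NY", 40.00),
]

CUTOFF = "2008-03-31"  # assumption: chosen so the sample data shows a difference


def make_orders_db():
    """Hypothetical helper: build an in-memory orders table from ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (date TEXT, state TEXT, sale_amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_orders_db()

# WHERE removes rows before grouping: NY loses its 2007-11-01 order.
where_rows = conn.execute(
    """
    SELECT state, COUNT(*) AS cnt, SUM(sale_amount) AS sumSales
      FROM orders
     WHERE date > ?
     GROUP BY state
     ORDER BY state
    """,
    (CUTOFF,),
).fetchall()

# HAVING removes whole groups after grouping: NY keeps all three orders.
having_rows = conn.execute(
    """
    SELECT state, COUNT(*) AS cnt, SUM(sale_amount) AS sumSales
      FROM orders
     GROUP BY state
    HAVING MAX(date) > ?
     ORDER BY state
    """,
    (CUTOFF,),
).fetchall()

print(where_rows)   # [('NY', 2, 75.0)]
print(having_rows)  # [('NY', 3, 85.0)]
```

Both versions return only NY, but only the HAVING version reports NY's complete stats.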

Using All Three

You can pull some pretty nice results out of a database in a single query if you know how to combine the WHERE, GROUP BY, and HAVING. If you have ever worked with a Sales Manager, you know they constantly want to know strange numbers, so let's say our Sales Manager says, "Can you tell me the average order size by state for all orders greater than 20? And don't bother with any average less than 30.00." We say, "Sure, don't walk away, I'll print it out right now."

SELECT state
      ,COUNT(*)                  as cnt
      ,SUM(sale_amount)          as sum
      ,ROUND(AVG(sale_amount),0) as avg
  FROM orders
 WHERE sale_amount > 20
 GROUP BY state
HAVING AVG(sale_amount) >= 30
   AND MAX(date) > '2008-12-31'
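
Here is a hedged sketch of the Sales Manager's request running against the sample rows, assuming SQLite through Python's sqlite3 module and a hypothetical make_orders_db helper. I have left out the MAX(date) condition, because every sample order is dated before 2009 and it would leave nothing to show, and I round to two places so NY's average stays visible.

```python
import sqlite3

# (date, state, sale_amount) for the seven sample orders.
ROWS = [
    ("2007-11-01", "NY", 10.00),
    ("2007-12-01", "TX", 15.00),
    ("2008-01-01", "CA", 20.00),
    ("2008-02-01", "TX", 25.00),
    ("2008-03-01", "CA", 30.00),
    ("2008-04-01", "NY", 35.00),
    ("2008-05-01", "NY", 40.00),
]


def make_orders_db():
    """Hypothetical helper: build an in-memory orders table from ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (date TEXT, state TEXT, sale_amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_orders_db()
# Step 1 (WHERE): drop orders of 20 or less.
# Step 2 (GROUP BY): aggregate the surviving rows per state.
# Step 3 (HAVING): drop states whose average is below 30 (TX, at 25).
report = conn.execute(
    """
    SELECT state
          ,COUNT(*)                   AS cnt
          ,SUM(sale_amount)           AS sum
          ,ROUND(AVG(sale_amount), 2) AS avg
      FROM orders
     WHERE sale_amount > 20
     GROUP BY state
    HAVING AVG(sale_amount) >= 30
     ORDER BY state
    """
).fetchall()
print(report)  # [('CA', 1, 30.0, 30.0), ('NY', 2, 75.0, 37.5)]
```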

How to Do a Weighted Average

Consider the case of a table that lists test, homework and quiz scores for the students in a certain course. Each particular score is worth a certain percentage of a student's grade, and the teacher wants the computer to calculate each student's final score. If the table looks like:

STUDENT     | WEIGHT | SCORE
------------+--------+-------
NIRGALAI    |     40 |    90
NIRGALAI    |     35 |    95
NIRGALAI    |     25 |    85
JBOONE      |     40 |    80
JBOONE      |     35 |    95
JBOONE      |     25 |    70
PCLAYBORNE  |     40 |    70
PCLAYBORNE  |     35 |    80
PCLAYBORNE  |     25 |    90

Then we can accomplish this in one pull like so:

SELECT student
      ,SUM(weight * score) / 100 as final
  FROM scores
 GROUP BY student

The nice thing about this query is that it works even if data is missing. If a student missed a test, they automatically get a zero averaged in.
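
The missing-data behavior can be checked with a short sketch, assuming SQLite through Python's sqlite3 module; the function name final_scores and the integer column types are my assumptions. I divide by 100.0 rather than 100 because SQLite does integer division when both sides are integers, which would truncate the result.

```python
import sqlite3

# (student, weight, score) rows from the table above.
SCORES = [
    ("NIRGALAI", 40, 90),
    ("NIRGALAI", 35, 95),
    ("NIRGALAI", 25, 85),
    ("JBOONE", 40, 80),
    ("JBOONE", 35, 95),
    ("JBOONE", 25, 70),
    ("PCLAYBORNE", 40, 70),
    ("PCLAYBORNE", 35, 80),
    ("PCLAYBORNE", 25, 90),
]


def final_scores(rows):
    """Hypothetical helper: load rows into a scores table, return {student: final}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (student TEXT, weight INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT student
              ,SUM(weight * score) / 100.0 AS final
          FROM scores
         GROUP BY student
        """
    ).fetchall()
    conn.close()
    return dict(result)


complete = final_scores(SCORES)
# Simulate a missed test by leaving out PCLAYBORNE's 25-weight row.
missing = final_scores([r for r in SCORES if r != ("PCLAYBORNE", 25, 90)])

print(complete["NIRGALAI"], complete["JBOONE"], complete["PCLAYBORNE"])  # 90.5 82.75 78.5
print(missing["PCLAYBORNE"])  # 56.0
```

With the row missing, PCLAYBORNE's 25 percent simply contributes nothing to the sum, which is the same as scoring a zero on it.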

Conclusion: Queries Are Where It's At

The only reason to put data into a database is to take it out again. The modern database has powerful strategies for ensuring the correctness of data going in (the primary key, foreign key and other constraints) and equally powerful tools for pulling the data back out.

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

72 comments:

  1. This is so timely. You saved me countless hours of work.

    Well I would probably have found the solution somewhere else, but your page was high up in the google result list ("group by" avg), and your explanation is easy to understand even for somebody whose DB experience consists of a few attempts to use MS Access.

    Many thanks.

  2. Is it possible to add values returned from a count function? My count function works properly, and I need to add the values resulting from the count function.

    Can the sum function be used on a count function? - example sum(count(*)) - not sure of the actual syntax. Thanks in advance!

    See below for code sample -

    select unique brtm.BROKER_TEAM_NAME, shhd.FILE_NO,
    (select count(shhd.BROKER_TEAM) from tpsdba.shipment_header shhd
    where brtm.BROKER_TEAM = shhd.BROKER_TEAM
    and (shhd.DATE_ENTRY >= '20080401' and shhd.DATE_ENTRY <= '20080404') )
    as NumEntries,
    (select Count(*) from tpsdba.ci_lines cidt
    where cidt.FILE_NO=shhd.FILE_NO) As CINum
    from tpsdba.broker_team brtm
    inner join
    tpsdba.shipment_header shhd
    on
    brtm.BROKER_TEAM = shhd.BROKER_TEAM
    where brtm.BROKER_TEAM = 'KXH'
    and (shhd.DATE_ENTRY >= '20080401' and shhd.DATE_ENTRY <= '20080404')

  3. Rob: In principle yes, you can sum counts. Your example shows a sub-query, which I have not written up on this series yet, but in short you can do a count in a subquery and then SUM the results in the outer query. Your example would be easier to work with if you stripped out everything except the count/sum columns, so off the top of my head you might have something like:


    SELECT sum(theCnt) as sum
    from (select count(*)
    as theCnt
    from subtable
    where...
    ) x

  4. help please is this possible the scenario is this....

    say i have a table named employee with the ff values.

    employeesal employee employeegroup
    100 a AA
    200 b AA
    300 c AB
    400 d AB
    500 e AA

    i need to count employee by employeegroup and after that when i count them i need each row of count of employeegroup divided by the sum of employeesal.

    this is my query...

    select employeegroup, count(employeegroup), employeesal/select count(employeegroup) ... as total
    .....


    I can't seem to get it right: the result of the total is zero. Is there any workaround query for this? Please email me, thanks so much. sherwin@pickle.ph

  5. Your explanations and examples are clear and straightforward - hard to find sometimes.

    I ran into a weirdness (or maybe not) recently. A co-worker used GROUP BY...HAVING but reversed the lines, as in

    SELECT col1, count(1)
    FROM tab1
    HAVING count(1) > 1
    GROUP BY col1

    Strangely, it worked!

  6. @anonymous: different servers enforce different rules for the ordering of the phrases. Back in foxpro you could mix them up any way you wanted.

  7. This is great information. Now what I don't understand is how to write the code/PHP that will display this information.

  8. Awesome.. so helpful. Clear and concise

  9. thanks man.
    I will have some examination tonight.

  10. I am not entirely sure if this query will actually perform. You listed:

    SELECT state
    ,COUNT(*)
    ,SUM(sale_amount) as sum
    ,ROUND(AVG(sale_amount) as avg
    FROM orders
    WHERE sale_amount > 20
    GROUP BY state
    HAVING avg(sale_amount) >= 30
    AND max(date) >= '2008-12-31'

    Is there a ")" missing after ,ROUND(AVG(sale_amount) <-? Also, HAVING avg(sale_amount) >= 30 should probably be HAVING AVG(sale_amount) >= 30 or HAVING avg => 30, no?

    Besides that, I really enjoyed reading your posts and straight forward explanations on this blog!

  11. Thanks for the explanations but I have a doubt in the following query ->

    SELECT state,
    , MIN(date) as minDate
    , MAX(date) as maxDate
    , COUNT(*) as cnt
    , SUM(sale_amount) as sumSales
    , ROUND(AVG(sale_amount),0) as avgSales
    FROM orders
    GROUP BY state
    HAVING MAX(date) >= '2008-12-31';


    In the above query, you said that we want to eliminate all states which have orders in 2009 or later, so shouldn't it be max(date)<='2008-12-31'; ??

  12. great article, really easy to understand, two thumbs up :D

  13. I need help. I have some queries to do. Which categories have the longest and shortest average film lengths? I did the following queries:

    --query to show the shortest avg(length)

    select name, round(avg(length),2) "AVERAGE"
    from film, film_category, category
    where film.film_id = film_category.film_id
    and
    film_category.category_id = category.category_id
    group by name
    having round(avg(length),2) <= all (
    select round(avg(length),2)
    from film, film_category, category
    where film.film_id = film_category.film_id
    and
    film_category.category_id = category.category_id
    group by name)



    --query to show the longest avg(length)
    select name, round(avg(length),2) "AVERAGE"
    from film, film_category, category
    where film.film_id = film_category.film_id
    and
    film_category.category_id = category.category_id
    group by name
    having round(avg(length),2) >= all (
    select round(avg(length),2)
    from film, film_category, category
    where film.film_id = film_category.film_id
    and
    film_category.category_id = category.category_id
    group by name)

    How can I join both queries to show the longest and shortest avg(length) at the same output?

  14. How i can call specific value from a column between period /one column dates/ and sum values/other column/

  15. SELECT `names`, `date`, SUM(`money`) FROM testimport WHERE `names` = 'namevariable' GROUP BY `names`
    Is that correct for the above query?

  16. I am trying to do a count on an aggregate, and I am not having any luck. How can I get a count for one column when specific criteria are met in other columns?

    Here is what I have so far? but I need for the o3.hdid to count the number of times that occurs if the other criteria in the report are met:

    SELECT DISTINCT(P.NAME),P.DOB

    FROM obs o

    INNER JOIN PERSON p ON o.PID = p.PID

    INNER JOIN OBS o2 ON p.PID = o2.PID

    AND o2.HDID = 1000

    AND o2.obsvalue > 30

    INNER JOIN obs o3 ON p.PID = o3.PID

    AND o3.HDID = 2000
    AND O3.OBSVALUE > '01-JAN-15'

    WHERE o.HDID IN (3000, 4000)

    AND o.obsdate between CURRENT_DATE-276 AND CURRENT_DATE + 365

  17. Many thanks for this simple explanation. I think I found a little mismatch between the way the names for the results are coded and the way the data output columns are named: You code "as sumSales" where the data output is named "SUM", which was a little confusing to me at first.

    best,

    Benjamin

  18. I have this query ... which returns the perfect value

    SELECT COUNT(*) as cnt
    ,SUM("KVA") as sumKVA
    ,AVG("KVA") as avgKVA
    FROM public."Trial1" "1" where "ASSETID" = '50000153'

    AND "READ_DATE"::text like '2015-06-01 %:%:%'

    ...............................

    But now I want to get multiple results for different months but in single query and in single table

  19. I have this query in PostGreSQL ... which returns the perfect value

    SELECT COUNT(*) as cnt
    ,SUM("KVA") as sumKVA
    ,AVG("KVA") as avgKVA
    FROM public."Trial1" "1" where "ASSETID" = '50000153'

    AND "READ_DATE"::text like '2015-06-01 %:%:%'


    Output:-
    cnt sumkva avgkva
    48 98527.9 2052.7

    ...............................

    But now I want to get multiple results for different dates of the month (01, 02, 03 ....so on) but in single query and in single table


    Something like :--

    cnt sumkva avgkva
    48 98527.9 2052.7
    48 ****** *****
    47 &*&*&* ******


    I am stuck with this could you please help me out. I have already tried Group BY feature for Date -- but since it contains time the query is returning the same value as in table.
