Suppose we have an SQL database table named “temperatures” with the following contents:
LOCATION TIME CELSIUS inside 2018-08-01 24 inside 2018-08-02 28 inside 2018-08-03 21 inside 2018-08-04 28 outside 2018-08-01 29 outside 2018-08-02 31 outside 2018-08-03 25 outside 2018-08-04 30
We want to find the highest temperature for each location. We use the MAX aggregate function and a GROUP BY expression:
SELECT location, MAX(celsius) celsius FROM temperatures GROUP BY location;
As expected, the result is:
LOCATION CELSIUS outside 31 inside 28
Now suppose we also want to know when each of these extreme temperatures occured. Naively, we try the following query:
SELECT location, time, MAX(celsius) celsius FROM temperatures GROUP BY location;
The response is an error: “not a GROUP BY expression”. In a GROUP BY expression all selected columns must be either part of the GROUP BY clause or an aggregate.
To achieve what we want we can use a JOIN:
SELECT t.location, t.time, t.celsius FROM temperatures t JOIN (SELECT location, MAX(celsius) celsius FROM temperatures GROUP BY location) tmax ON t.location=tmax.location AND t.celsius=tmax.celsius;
This query results in multiple rows per location if the maximum temperature was recorded at different times:
LOCATION TIME CELSIUS outside 2018-08-02 31 inside 2018-08-04 28 inside 2018-08-02 28
If we are only interested in the first occurrence of the maximum temperature per location, we can use the following query:
SELECT location, MIN(time) KEEP (DENSE_RANK LAST ORDER BY celsius) time, MAX(celsius) celsius FROM temperatures GROUP BY location;
LOCATION TIME CELSIUS inside 2018-08-02 28 outside 2018-08-02 31
Here we don’t need a JOIN anymore, because select clause for the time column is an aggregate as well.
One thought on “Selecting all columns of a database table with an SQL GROUP BY expression”
Unfortunately KEEP is an Oracle specific keyword, better would be to use a windowing function like partition by.