Archive for the ‘SQL’ Category

Numbering items within groups in MySQL

In my previous post, I described computing row numbers for items within a result set. If items in the result set can be ordered into groups, you can use the same technique to number items within the groups. This is valuable as it allows you to get the top 3, top 5, top N … items within each group (note that GROUP BY isn’t appropriate here; we don’t want to aggregate items within a group, we want an ordering of all the items within a group).

Consider the following sample of data where we’ve done an ordering by the location and age fields (ORDER BY location DESC, age DESC). Every user’s email address is within a group (location) and within that group users are sorted by age in descending order.

SELECT mytable.*
FROM mytable
ORDER BY location DESC, age DESC

ordered data

By using two variables, one to keep track of the item number (@itm) and another to keep track of the current group (@grp), and incrementing the item number if the current group has not changed, else resetting it to 1, we get a numbering of each item within each group. In this example, the group item number tells us who is the 1st oldest, 2nd oldest, etc. user in each location.

SELECT

    
@itm := IF(@grp = location, @itm + 1, 1) AS group_item_number,
    
@grp := location,

    
mytable.*

FROM     mytable,
        (
SELECT @itm := 0) AS itm_num_init,
        (
SELECT @grp := NULL) AS grp_init

ORDER BY location DESC, age DESC

group item numbers

Row numbers in MySQL

Surprisingly straight-forward. Row numbers are useful is cases where you need the rank of a row within a result set.

SET @cur_row := 0;
SELECT mytable.*,
@cur_row := @cur_row + 1 AS row_num
FROM mytable
ORDER BY whatever;

The variable initialization can be done without a SET declaration by doing it as a JOIN:

SELECT mytable.*,
@cur_row := @cur_row + 1 AS row_num
FROM mytable, (SELECT @cur_row := 0) AS row_num_init
ORDER BY whatever;

Solution via Daniel Vassallo’s answer on StackOverflow. A comment mentioned the row numbers being calculated before ordering is done, but I haven’t found this to be the case.

While the above queries will give you the row numbers as expected, putting a HAVING or GROUP BY clause in the same statement might be dangerous. From Xaprb:

The result of assigning to a variable and using it in the same statement (in the HAVING, for example) depends on the query plan the server chooses, the phase of the moon, and probably other things too.

So, in such cases, a subquery is the safe bet:

SELECT sub.* FROM

    
(SELECT mytable.*,
            
@cur_row := @cur_row + 1 AS row_num
    
FROM mytable, (SELECT @cur_row := 0) AS row_num_init
    
ORDER BY whatever) AS sub

HAVING sub.row_num ... ;

Generating a random timestamp in MySQL

Random timestamps are fairly simple to generate and useful when creating test data or fake data for demos. The technique involves converting a MySQL timestamp into unix time, adding or subtracting a certain time range (in seconds), then converting back to a MySQL timestamp.

I’ve used this technique presented on underdog-projects.net before, where a starting timestamp is specified and a random time span is added to the timestamp. However, I wanted to try something slightly different and, in my opinion, a bit more robust. Instead of specifying a starting timestamp, I wanted a pivot timestamp from which a random time span is randomly added or subtracted.

SET @pivot_ts = '2013-01-15 22:15:45';
SET @max_span = 432000; /* 5 days in seconds */
SET @bias = SIGN(-0.5 + RAND());

SELECT FROM_UNIXTIME(
    
UNIX_TIMESTAMP(@pivot_ts) + ( @bias * (FLOOR(RAND()*@max_span)) )
);

Of course all the variables can be made part of the SELECT statement to make everything more succient.