Saturday, July 7, 2012

Extended analytic and window functions in SQL 2012 -2

This is a follow up post from to explain on some of new additions to OVER clause in SQL 2012 which gives us more flexibility in calculating running counts.

Extending the earlier example, Lets consider scenario below

Analyzing the sample data and query above gives us an idea of how new enhancements to OVER clause in SQL 2012 can be effectively used to calculate various types of cumulative aggregates.
In the above query you can see two different ways of calculating running total. One is simply by using partition and order by while second method uses new enhancement of over which is rows between. rows between causes query engine to identify rows between a range inside the created horizontal partitions (row groups) based on the order as specified by order by clause. The range of rows is identified by means of various clauses as follows

1. unbounded preceding - this suggests to consider every row from start of current partition (group) based on specified order
2. current row - this indicates the engine to consider the current row under consideration
3. x preceding - this indicates to consider from last x rows from current row based on order specified by order by clause
4. x following - same as 3 but suceeding x rows rather than preceding from current row
5. unbounded following - consider till last row of partition from current row based on specified order

There's also an alternate way of specifying range by means of RANGE clause for unbounded and current row range. This works similar to above way with only difference being it considers multiple rows within current range rather than single row in case ordering criteria remains same for multiple rows.

Now lets analyze the output of this query

As you see from above both Totals column are showing  same value. Whilst fisrt column was calculated using standard ORDER BY syntax second one includes new rows range addition of SQL 2012. The next column shows same cumulative total but using RANGE instead of ROWS clause. As you see the only difference it makes are cases where you've multiple rows for same value of ordering column(s) in this case Item_date. ROWS strictly restricts CURRENT ROW to single row whereas RANGE consider all the rows with same value under CURRENT ROW. If you see case of Carona which has duplicate rows for date 13 May 2009 you can see difference in total values. RANGE works exactly like how your standard PARTITION BY...ORDER BY or CROSS APPLY subquery tends to work as it considers entire rows with same date value under current row context. ROWS will just consider each row as an individual block and calculates total relative to it.

I hope this post will be of some help to people who are trying to learn new enhancements for PARTITION BY in SQL 2012. Please give you valuable feedback if you need any further clarification on them.