Thursday, April 11, 2013

Demystifying LAST_VALUE function in SQL 2012

The impetus for writing this blog is the ambiguity existing regarding the documentation of LAST_VALUE() function in books online/ MSDN.
Recently I was playing around with new analytic functions of SQL 2012. Was trying to implement the LAST_VALUE functionality when I noticed the below discrepancy.
Consider the below table as an example
I've just shown first and last few records as table has about 1000 rows

Now I'm using below query to get previous,next,first and last prices for each product


SELECT OrderDesc,OrderDate,ProductName,Qty,ItemPrice,
LAG(ItemPrice,1) OVER (PARTITION BY ProductName ORDER BY OrderDate) AS PrevPrice1,
LEAD(ItemPrice,1) OVER (PARTITION BY ProductName ORDER BY OrderDate) AS NextPrice1,
FIRST_VALUE(ItemPrice) OVER (PARTITION BY ProductName ORDER BY OrderDate) AS StartingPrice,
LAST_VALUE(ItemPrice) OVER (PARTITION BY ProductName ORDER BY OrderDate) AS LastPrice
FROM TestTable
WHERE ProductName = 'Product10'
ORDER BY OrderDate

The output for the query is as follows (showing first few rows)

On analyzing the output we can see that FIRST_VALUE() function returns the first value of ItemPrice field in the partition (in this case based on ProductName) which is 201300 whereas for LAST_VALUE result we get the current ItemPrice value repeated for each of the row. This surprised me and I referred to MSDN documentation on the same which is below link
http://msdn.microsoft.com/en-IN/library/hh231517.aspx
 But I couldn't find out a satisfactory explanation for above behavior. I looked for related links and came across the below link which explains details on OVER clause
http://msdn.microsoft.com/en-IN/library/ms189461.aspx

Particularly have a look at below section
If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame. This applies only to functions that have can accept optional ROWS/RANGE specification. For example, ranking functions cannot accept ROWS/RANGE, therefore this window frame is not applied even though ORDER BY is present and ROWS/RANGE is not.

This had the bell ringing for me.It turned out that I ignored the ROWS/RANGE clause in above LAST_VALUE function so it defaulted to the window RANGE UNBOUNDED PRECEDING AND CURRENT ROW. This means for every row in the resultset the window considered was from first value of window (201300 ) till the current row under consideration. This was the reason for returning the same value as last value for each row. FIRST_VALUE still worked fine as window started with the initial value of the partition.
Now that we got the interpretation we can think what change we need to do to the above query to get our intended result. The solution is to consider the entire partition as the window for getting correct last value.
So rewriting the query we will get


SELECT OrderDesc,OrderDate,ProductName,Qty,ItemPrice,
LAG(ItemPrice,1) OVER (PARTITION BY ProductName ORDER BY OrderDate) AS PrevPrice1,
LEAD(ItemPrice,1) OVER (PARTITION BY ProductName ORDER BY OrderDate) AS NextPrice1,
FIRST_VALUE(ItemPrice) OVER (PARTITION BY ProductName ORDER BY OrderDate ROWS between unbounded preceding and unbounded following) AS StartingPrice,
LAST_VALUE(ItemPrice) OVER (PARTITION BY ProductName ORDER BY OrderDate ROWS between unbounded preceding and unbounded following) AS LastPrice
FROM TestTable
WHERE ProductName = 'Product10'
ORDER BY OrderDate

Lets now analyze the results of modified query



Comparing this to the initial table result we can see that it now picks last value as 89000 within product partition for Product 10 which is correct.
Thought of putting this in a blog as I couldn't see any documentation which clearly explains the reason for this observed behavior.
Feel free to comment if you need any more details on this.
The table creation scripts for above used table can be found in the below link
https://docs.google.com/file/d/0B4ZDNhljf8tQTVNRemFoZE5aSDA/edit?usp=sharing