Friday, July 5, 2013

T-SQL to DAX: Simulating CUME_DIST analytic function


CUME_DIST() analytical function was introduced in SQL 2012 for calculating the cumulative distribution of a value in a group of values.
Its calculated as number of rows with value equal to or less than current value divided by total rows for ascending sorting and vice versa for descending.
Translating this into DAX formula we can derive the formula as

=COUNTROWS(FILTER(StudentTable,EARLIER(StudentTable[Marks])<=StudentTable[Marks] && EARLIER(StudentTable[Subject])=StudentTable[Subject]))/COUNTROWS(FILTER(StudentTable,EARLIER(StudentTable[Subject])=StudentTable[Subject]))

Apply this to sample powerpivot sheet and we will get results as below


The formula uses FILTER() function to get resultset which will have for each row set of rows with value >= current row value (as we're considering descending order of Marks here). Applying COUNTROWS over this gives us count of records in this and we divide it by total count within the subject group. This will give cumulative distribution of each mark within same subject group.
Lets apply corresponding logic in T-SQL and validate the result.

DECLARE @Students table
(
ID int,
Name varchar(100),
[Subject] varchar(20),
Marks int
)
INSERT @Students
VALUES
(1, 'Praveen', 'Maths', 98),
(3, 'Mohan', 'Physics', 88),
(5, 'Mary', 'Maths', 92),
(9, 'Mathew', 'Physics', 90),
(11, 'Prithi', 'Maths', 85),
(17 ,'Mithun', 'Maths', 77),
(23, 'Martin', 'Physics', 82),
(25, 'Kiran', 'Maths', 93),
(27, 'Kumar', 'Chemistry' ,75),
(30, 'Sita', 'Physics', 88),
(32, 'Jophin', 'Chemistry', 62),
(33, 'Gladis', 'Chemistry', 83),
(35, 'Kunal', 'Chemistry', 88),
(36, 'Manu', 'Physics' ,93),
(40, 'Madhur', 'Chemistry' ,91)


SELECT *,
CUME_DIST() OVER (PARTITION BY [Subject] ORDER BY Marks DESC) AS CumeDist
FROM @Students
ORDER BY ID

The result will be as follows

On comparison we can see that this coincides with values we got in calculated column which indicates that DAX simulation applied is correct.