Tuesday, December 11, 2012

Different ways to implement date range comparison logic in Transact SQL


There are multiple ways of implementing date range comparisons in Transact SQL. The purpose of this post is to explain one of the efficient ways by which we can implement date range comparison. The impetus for this post was the fact that of late, there have been lots of occasions where I've been tasked with optimizing code and have come across lots of inefficient ways for doing date comparison.
Now lets consider an example scenario where we require data from a table to be filtered based on date range ie. get data for current day. There are various ways by which we can implement this filter like below


1. SELECT *
FROM Table
WHERE CONVERT(varchar(10),dateField,121) = CONVERT(varchar(10),GETDATE(),121)

2. SELECT *
FROM Table
WHERE YEAR(dateField) = YEAR(GETDATE())
AND MONTH(dateField) = MONTH(GETDATE())
AND DAY(dateField) = DAY(GETDATE())

3. SELECT *
FROM Table
WHERE CONVERT(date,dateField) = CONVERT(date,GETDATE())

4. SELECT *
FROM Table
WHERE DATEDIFF(dd,0,dateField)=DATEDIFF(dd,0,GETDATE())

5. SELECT *
FROM Table
WHERE dateField >= DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)
AND dateField < DATEADD(dd,DATEDIFF(dd,0,GETDATE()),1)

Now lets compare the various methods
Method 1 has two problems. It involves a data type convertion to make value varchar whic strips off the time part and then do comparison between them. The convertion operation is costly and also use of CONVERT function in WHERE will make it Non Sargeable clause which will cause an index if present on dateField to be ignored. This can cause a performance hit especially when dateset is large.

Method 2 involves no conversion but uses integer comparison. The only problem with this is again use of functions in WHERE making them Non Sargeable predicates.

Method 3 is similar to Method 1 with only difference being conversion to date datatype rather than varchar. It has the same issues as Method 1

Method 4 is slightly better as it involves only an integer comparison based on DATEDIFF function. But it still has the issue of predicate being Non Sargeable due to use of functions in WHERE clause field.

Method 5 is an enhancement from Method 4 and relies upon integer sequence of dates. Those who've seen my earlier post on date generation from integers would already know fact that dates are stored as numbers internally in sql server with base date value of 1 Jan 1900 corresponding to 0 and unit value representing a day. This method has additional advantage of not using any functions over search field making it Sargeable.

Of all the above methods I prefer method 5 as it was found to be most effective out of the above in most occasions. The same method can be tweaked to get data over entire month,quarter,year etc.The concept is to make use of integer arithmetic to find period elapsed between required start date and base date and add it from base date to get interval start. Similarly for end date add period elapsed between base date and next period start date and use < operator to get data till closing interval preceding the resultant date.
My suggestion is to use Method 5 wherever possible to implement date comparison logic as it will enable the optimizer to use an index if present on date field which will result in better performance.

7 comments:

  1. That's an interesting post as its something that I think we all encounter. I always prefer the following approach as it doesn't involve the use of any functions in the WHERE clause. DOn't know how it compares performance-wise though.
    What do you think?

    declare @dteStart datetime, @dteEnd datetime
    set @dteStart = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)
    set @dteEnd = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),1)

    SELECT *
    FROM Table
    WHERE dateField between @dteStart and @dteEnd

    ReplyDelete
  2. This is analogous to my method 5 but it has a slight problem as it will cause any records created at midnight of DATEADD(dd,DATEDIFF(dd,0,GETDATE()),1) also to be included which is not what we want. This is because BETWEEN translates to >= AND <= whereas what we want is to exclude the last = to avoid the records in the end boundary getting included.

    ReplyDelete
  3. See the illustration below to understand the difference

    declare @dteStart datetime, @dteEnd datetime
    set @dteStart = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)
    set @dteEnd = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),1)
    declare @t table
    (
    id int identity(1,1),
    dateval datetime
    )

    insert @t (dateval)
    select GETDATE() union all
    select GETDATE()-0.4 union all
    select dateadd(dd,datediff(dd,0,GETDATE()),1)

    SELECT *
    FROM @t
    WHERE dateval between @dteStart and @dteEnd



    SELECT *
    FROM @t
    WHERE dateval >= @dteStart and dateval < @dteEnd

    output
    -----------------------
    id dateval
    -----------------------
    1 2013-01-08 21:59:28.403
    2 2013-01-08 12:23:28.403
    3 2013-01-09 00:00:00.000

    id dateval
    -----------------------
    1 2013-01-08 21:59:28.403
    2 2013-01-08 12:23:28.403

    ReplyDelete
  4. Thanks for your reply. Surely though, the issue of records created at midnight applies to all the solutions that use DATEADD(dd,DATEDIFF(dd,0,GETDATE()),1).
    In any case, my point was that I have always thought it better to set the date variables at the top of the query so they only need to be calculated once, rather than in the WHERE clause where they need to be calculated for every row in the dataset.

    ReplyDelete
  5. Nope. that's not true. Try my illustration above to see the difference. it will not include records created at midnight next day as I'm using >= and <.

    ReplyDelete
  6. Regarding records created at midnight, ok fair enough, I couldn't really care less but my main point was that I have always thought it better to set the date variables at the top of the query so they only need to be calculated once, rather than in the WHERE clause where they need to be calculated once for each and every row in the dataset.
    I'd appreciate your comments on this as, like I said previously, its something we all have to deal with. Thanks.

    ReplyDelete
  7. If there are lots of date range filters based on these values, then it might prove a little better to set values in variables on top rather than repeating it everywhere.
    Similarly use of variables to hold calculated values from parameters will also prove to be beneficial in cases where parameter sniffing happens as use of variables will not cause bad plan to be cached and reused

    ReplyDelete