Saturday, February 21, 2015

In Memory Tables vs Table Variables - Comparison

Recently there was a casual discussion with one of my colleagues who was asking me regarding the newly introduced memory-optimized tables in SQL Server 2014. The discussion went through few aspects of memory-optimized tables and their comparison against table variables. This blog summarizes the points that came out of the discussion. Thought of sharing it for the benefit of larger audience

What are memory optimized tables?
Memory-optimized tables are special type of table which resides in main memory. They got introduced in SQL Server 2014 version and is a method of implementing in-memory OLTP. They are transactional in nature and can preserve durability if required. Another important aspect of memory-optimized tables are that rows support versioning. This means that at any point of time the table preserves multiple earlier versions of its rows.
Another important feature of these tables are that you can make them durable which is the default behavior or keep them as non durable in which case the data doesn't get persisted in disk. This will help you to save some disk IO, but at the expense of data loss in case of a server crash or restart.
The memory-optimized tables can be accessed using normal adhoc T-SQL statements or using natively compiled stored procedures.
What is a table variable?
Table variables are variables of table data type. Tables variable reside in main memory until they get swapped out to disk for space. They also does not store distribution statistics information within them. They also cause no recompiles. Tables variables also does not allow modification using ALTER TABLE. Explicit indexes cannot be created in table variables until SQL Server 2014
 Now that we have seen what a memory optimized table and table variables is we will now see some differences in behavior between them

Differences between memory optimized vs table variables

When you consider scope memory optimized tables have scope same as actual tables with a slight change. The memory optimized tables will be available across all connections. They come in two flavors - durable (default) and non durable. In the case of durable tables, they preserve the schema as well as the data as they're persisted to the disk whereas in the case of non durable tables only the schema is persisted. This means that during a server restart the tables gets recreated but without any data in them. The durability can also be made fully durable or delayed durability with the latter having a risk of causing some data loss in case of an intermediate server restart/failure.
In contrast the table variables have scope only within a batch in which they're declared. This is evident from the below example

declare @t table

id int,
val varchar(10)
insert @t
values (1,'Test'),

select * from @t


The first select works whereas second one does not as its outside the batch in which table variable is created (GO represents the end of that batch)

Tempdb Usage
Memory-optimized tables will reside in main memory itself.So they wont use up any tempdb resources at all!
On the other hand table variables will be in main  memory only until the space requirements is within the  available space of cache. Once there is a requirement for extra space it utilizes the tempdb. Hence while using table variables for large datasets you would see higher usage of tempdb.

Transaction Support
Memory-optimized provides transaction support similar to normal tables. They have two ways of handling transactions based on whether we want it to be fully durable or delayed durable. The former will cause the changes made by the transaction to be persistent whereas the latter would be durable   only after the in-memory transaction log is persisted to disk.
Table variables on the other hand doesn't support transactions. This is very evident from the below example snippet

begin tran testtemptabletran
declare @t table

id int,
val varchar(10)
insert @t
values (1,'Test'),

rollback tran testtemptabletran
select * from @t

Table Statistics
Memory-optimized tables store statistics info just like normal tables. But there is a difference in the way statistics gets updated for memory-optimized tables against normal disk persisted tables. Whilst disk based tables update statistics automatically in the case of memory-optimized tables we have to do this manually. We need to make use of CREATE STATISTICS statement for this purpose. We need specify  NORECOMPUTE clause while creating/updating statistics for memory-optimized tables as they does not update the statistics themselves. We should also specify FULLSCAN mode as memory-optimized tables only support Fullscan statistics which is a time consuming process as against disk based tables which uses sampled statistics as the default option.
In comparison table variables store only limited distribution statistics. This is why quite often you find wrong cardinality estimates in the execution plan when query involves table variables.

So as seen from above points both have their own pros and cons.
The main advantage of memory optimized tables are that they are in main memory itself unless you want to persist data to disk. So question is whether these advantages can be coupled with table variables to get more advantage. Fortunately we have a way of doing this.

The Hybrid Approach - Using Memory Optimized Table Type Variables
For creating memory optimized table variables what we can do is to create a table type which is memory-optimized. Once this is done we can create a variable for this table type which will be memory-optimized in nature. This will give merge the advantages of both tables variables as well as memory-optimized objects.
The main among them are

  • The memory-optimized table type variables never get swapped to disk as they are now memory-optimized. As such there is no tempdb usage and associated IO costs
  • The memory-optimized table type variables also take advantage of memory-optimized algorithms and can be accessed using natively compiled procedures for more efficiency
  • One more advantage of memory-optimized table type variables over memory optimized tables is that since they're non transactional in nature they support cross db queries whereas normal memory-optimized tables will not allow cross db access. This comes handy especially in the case where you want data to copied across multiple db tables where memory-optimized table type variables can be used as a medium for data transfer
We can create memory-optimized table type variable using below code

      Col1 [int] NOT NULL,

      Col2 varchar(100) NOT NULL,

      INDEX [IX_Col1] HASH (Col1)

            WITH ( BUCKET_COUNT = 8)



Now you can use this variable in dml operations like below



      (345,'Blah blah')

From the above points we can clearly see the advantages of conventional table variables vs the new memory-optimized tables usage. This also gives us some insights into how the two concepts can be merged to get a hybrid solution which merges the advantages of each.


Tuesday, January 13, 2015

Implementing Automatically Tidying Up Adhoc Jobs in SQLServer Agent

Business Case
Recently I had an opportunity to inherit a system from a client for doing some enhancements. One of the tasks involved in the project was also to clean up the system which had lots of left overs like unused tables, stored procedures etc. Along with them one thing I noticed was the presence of lots of jobs which are not executed at all but still existing on the production environment. On inquiring on this I was told that these were created by developers for performing activities which would require onetime execution of scripts like for example flushing out old data, applying quarter level financial calculations etc. As a policy the db admin team were creating jobs with a one time schedule for performing them unattended as most of activities had to be performed on weekends or out of office hours.

Issue with the above approach was that jobs were left in the SQLServer agent even after required task was performed and it was left for someone to go and tidy up them from the server. When I inherited the system it was a big mess with lots of unwanted jobs retained in the instance and it was a time consuming process to identify and remove the unwanted jobs from the system.

There is an easier solution for scenarios like above to create auto cleanup jobs which takes care of  the tidy up part by itself once the required task is performed. This blog explains how we can create the jobs with the auto cleanup capability.

 Consider the case of deleting oldest quarter data from the table at the beginning of each quarter. This would usually be done over the weekend as deleting bulk data would take considerable amount of time and also there would be some following task like index reorganizing etc.
We can utilize a one time scheduled job for this purpose. There's an option in the job properties which is seldom used which enables the job to do auto tidy up once the task is performed. The property is set as below in jobs notification properties

Once you set this property the job will get automatically destroyed once it runs successfully as per the set schedule.

One caveat to note here is that since the job gets deleted upon successful execution you will not be able to check the history to see if job executed successfully.
So if you need a mechanism to ensure the job executed successfully before getting deleted what you could do is to add a dummy step in the end with just a print statement as below

And then configure to log the output to a file

Once the job is executed and removed you can go and check the output file to see the logged message indicating successful job execution.

This method can be used to create auto tidying up jobs with notification which will help you to automate adhoc one time tasks based on a predefined schedule. This will help us to make sure the jobs get auto cleaned up after performing the requested tasks and would certainly save some amount of developer effort in getting this cleaned up.

Saturday, December 20, 2014

SSMS Table Designer Not Recognizing Rowversion Datatype

This blog is the result of a recent discussion I had with my fellow colleagues. 
One of my colleagues asked me why rowversion is not recognized as a valid datatype with SQLServer. I was surprised at this statement and told that I don't believe that is correct. He replied that he can illustrate it and showed the below on his system.
He launched an instance of SQLServer Management Studio (SSMS) which belonged to SQL 2014 Express Edition and tried creating a table using the table designer. The moment he tried to add a column of rowversion datatype and save it he got the below error message

I checked the datatype listing within the designer and found that its not having rowversion included as a type within which was a bit surprising!
This was the reason why it errored out.So the only solution seems to be to select timestamp as the datatype. But the irony is that timestamp is deprecated from SQL 2008 onwards. I've check this in 2012 and 2008 R2 versions of SSMS and there also the behavior is the same. 
I told him that this is a limitation of the designer and you can very well create rowversion field if you use T-SQL CREATE TABLE statement. He tried it out and confirmed that it indeed worked. But here also if you try to script out the table from SSMS it will still show column as of timestamp type only as it seems systypes still does not have the change implemented as per the connect item below

Apparently rowversion and timestamp are synonyms of one and the same thing. However ANSI recommends using rowversion and timestamp has been deprecated for use in all fresh development activities.
I've never been a fan of the table designer in SSMS. I prefer using T-SQL scripts for all DDL activities. This just adds one more reason for my preference.
Hope this blog will give a word of caution to anyone trying to create tables using the designer and want to declare a rowversion type field. I would suggest strongly considering T-SQL scripts for activities like table creation,constraint addition etc.

Tuesday, December 16, 2014

T-SQL Tips: Customizing CDC for Oracle Service Implementation in SQLServer

The purpose of this blog is to explain a quick tip which you can implement to customize the default functionality of CDC for Oracle service in SQLServer to help you overcome the limitation posed by the default deployment script.

Business Case
Recently in one of projects CDC for Oracle was implemented and over a weekend there was some failures with our ETL processes due to some spurious data coming from the source. This prompted us to clear the residual data for the week and rerun the jobs. We had CDC as the first layer which would pull data from the source Oracle instances to the SQLServer  instance where datawarehouse resided. Due to some data issues some of ETL had failed over the weekend so we were trying for a rerun of the jobs. To our surprise we were not getting any records for the jobs which were rerun. On analysis we found CDC tables to be empty. This was not certainly not expected by us.

On analysis the reason was found as follows.
By default the data retention period for CDC was set as 3 days by the deployment script which is why the data was getting purged by the CDC cleanup job so soon. In our case this retention period was less as in case of a long weekend by the time we realize the job failure we may not have data available in CDC tables for the failed days due to the short retention period

The obvious solution to the above issue was to increase the data retention period. There's a system stored procedure which can be used for this purpose which is sp_cdc_Change_job

The procedure has a parameter called retention which can be passed to override the default value set by the deployment script. The parameter designates the value in minutes that would be set for the data retention period. So you can set value as per your requirement. So in our case I set it to 7200 (5 * 24 * 60)  to make CDC retain the change data for 5 days. For this the complete sql command would like below

EXEC sys.sp_cdc_change_job @job_type= 'cleanup',@retention = 7200

Now some details on sp_cdc_change_job sp. The procedure has a set of parameters which can be used to override the default settings of CDC jobs. Some of the parameter we may be interested in addition to the above are

@pollinginterval - This would indicate the interval at which the Oracle log has to be polled to capture the changes. This has to be used in conjunction with the @continuos parameter ie only when @continuos = 1 then polling interval can be set as this indicates the polling has to be done continuously rather than as a one time activity

@threshold - This will indicate the number of entries that can be deleted using a given statement

@maxtrans - This indicates the maximum number of transactions that can be processed in a single cycle.

All these parameters can be utilized to override default settings of CDC service as per your requirement

The current configured values for the above parameters for CDC instance can be viewed by querying the 
dbo.cdc_jobs table

Saturday, November 15, 2014

T-SQL Tips: UDF To Swap The Parts Of A Datetime Value


There was a requirement recently which required changing only the year portion of one of the date fields.
The business scenario was to extend validity of insurance details on a table.
I've also come across similar requests in forums also asking whether its possible to swap one or more parts of a datetime value without affecting the other parts of the date


This article provides a solution for scenarios like above.
The logic for these type of scenarios uses date function DATEADD to do an integer arithmetic over datetime values to swap the required parts. Since dates are represented internally as numbers in SQL Server its very easy to do integer operations with date values.


Consider the below illustration

declare @date datetime = '20141020 16:35'

Now let the requirement be that we need to swap the year part of the date with 2020. We can do it by taking the offset between current year and new year value and add it to the date value ie 2020-2014

Translating it using DATEADD we can use

SELECT DATEADD(yy,@Year-YEAR(@date),@date) AS YearSwappedDate

Now see the result

As you see the year part gets swapped correctly for the passed value keeping the other parts unchanged

Now lets try for year and month parts together. Going again by the same theory we can use an expression as below

declare @date datetime = '20141020 16:35'
declare @Year int = 2020
declare @Month int = 9
SELECT DATEADD(mm,@Month -MONTH(@date)+ ((@Year-YEAR(@date))*12),@date) AS YearMonthSwappedDate

What we're doing here is take the offset between month and year parts and doing addition with that. I'm reducing year also to month level by multiplying 12 so that we can just use a single DATEADD call to do the addition. The result for this looks like below

Now lets go one more step ahead and do replacement of day part also. Again extending the logic we can write expression as below

declare @date datetime = '20141020 16:35'
declare @Year int = 2020
declare @Month int = 9
declare @Day int = 12

SELECT DATEADD(mm,@Month -MONTH(@date)+ ((@Year-YEAR(@date))*12),@date) + @Day - DAY(@date)

This provides us with the results as below

Generic Function

Lets convert this into a generic UDF. I've also extended it to add the hour minute and second parts.
The UDF would look like below

CREATE Function SwapDateParts
@InputDate datetime,
@SwapYear int = NULL,
@SwapMonth int = NULL,
@SwapDay int = NULL,
@SwapHour int = NULL,
@SwapMinute int = NULL,
@SwapSecond int = NULL
RETURNS datetime
DECLARE @SwappedDate datetime
SET @SwappedDate = DATEADD(ss,(COALESCE(@SwapHour-DATEPART(HOUR,@InputDate),0)*60*60) + (COALESCE(@SwapMinute-DATEPART(MINUTE,@InputDate),0)*60)+COALESCE(@SwapSecond-DATEPART(SECOND,@InputDate),0),DATEADD(mm,COALESCE(@SwapMonth -MONTH(@InputDate),0)+ (COALESCE((@SwapYear-YEAR(@InputDate))*12,0)),@InputDate)) + COALESCE(@SwapDay - DAY(@InputDate),0)
RETURN @SwappedDate

Try running it for different set of inputs and see the result

SELECT dbo.SwapDateParts('20140516 16:33:22',DEFAULT,1,DEFAULT,DEFAULT,DEFAULT,DEFAULT) AS MonthSwapped,
dbo.SwapDateParts('20140516 16:33:22',DEFAULT,6,11,DEFAULT,DEFAULT,DEFAULT) AS MonthDaySwapped,
dbo.SwapDateParts('20140516 16:33:22',1980,12,30,DEFAULT,DEFAULT,DEFAULT) AS YearMonthDaySwapped,
dbo.SwapDateParts('20140516 16:33:22',2054,DEFAULT,30,DEFAULT,56,17) AS YearDayMinuteSecondSwapped

See the output which would be the below

 This is a very good utility function that can be applied in similar scenarios to swap out one or more parts of a given datetime value