Tuesday, December 16, 2014

T-SQL Tips: Customizing CDC for Oracle Service Implementation in SQLServer

The purpose of this blog is to explain a quick tip which you can implement to customize the default functionality of CDC for Oracle service in SQLServer to help you overcome the limitation posed by the default deployment script.

Business Case
Recently in one of projects CDC for Oracle was implemented and over a weekend there was some failures with our ETL processes due to some spurious data coming from the source. This prompted us to clear the residual data for the week and rerun the jobs. We had CDC as the first layer which would pull data from the source Oracle instances to the SQLServer  instance where datawarehouse resided. Due to some data issues some of ETL had failed over the weekend so we were trying for a rerun of the jobs. To our surprise we were not getting any records for the jobs which were rerun. On analysis we found CDC tables to be empty. This was not certainly not expected by us.

On analysis the reason was found as follows.
By default the data retention period for CDC was set as 3 days by the deployment script which is why the data was getting purged by the CDC cleanup job so soon. In our case this retention period was less as in case of a long weekend by the time we realize the job failure we may not have data available in CDC tables for the failed days due to the short retention period

The obvious solution to the above issue was to increase the data retention period. There's a system stored procedure which can be used for this purpose which is sp_cdc_Change_job


The procedure has a parameter called retention which can be passed to override the default value set by the deployment script. The parameter designates the value in minutes that would be set for the data retention period. So you can set value as per your requirement. So in our case I set it to 7200 (5 * 24 * 60)  to make CDC retain the change data for 5 days. For this the complete sql command would like below

EXEC sys.sp_cdc_change_job @job_type= 'cleanup',@retention = 7200

Now some details on sp_cdc_change_job sp. The procedure has a set of parameters which can be used to override the default settings of CDC jobs. Some of the parameter we may be interested in addition to the above are

@pollinginterval - This would indicate the interval at which the Oracle log has to be polled to capture the changes. This has to be used in conjunction with the @continuos parameter ie only when @continuos = 1 then polling interval can be set as this indicates the polling has to be done continuously rather than as a one time activity

@threshold - This will indicate the number of entries that can be deleted using a given statement

@maxtrans - This indicates the maximum number of transactions that can be processed in a single cycle.

All these parameters can be utilized to override default settings of CDC service as per your requirement

The current configured values for the above parameters for CDC instance can be viewed by querying the 
dbo.cdc_jobs table