Wednesday, April 10, 2013

Implementing Checkpoints in SSIS

Checkpoints indicate a consistent state of a database. This is essential for database to perform actions like writing to log file, truncating log etc.
The concept of checkpoints have also been implemented in SSIS packages. Checkpoints will indicate a saved consistent state in package sequence which can be utilized in quite a few scenarios. This blog discusses one such practical scenario.
Consider the case where you've a SSIS package which involves a step which is resource intensive. This would obviously be a time consuming step. Imagine an intermediate failure in package after this intensive step.Rerunning the step would require again spending same amount of time which may not always be desirable.In such cases adding checkpoint support can always avoid rerunning these intensive steps again due to a failure in one of the succeeding steps. Checkpoint will ensure the packages last consistent state (last task which was success) is stored and any further execution will only start from last failed step.
 The example package for the illustration will be as follows

The package consists of two data flow tasks followed by execute sql task. The data flow tasks look like below
The first data flow task as shown above consists of flat file source to transfer data from text file to sql server table using OLEDB destination.
The second data flow task will  get product details and then retrieves associated order details using a lookup task and populated result to final table

The flat file will have data as follows

The product table will have following entries

As you see from above there are few products without any associated orders like ones with ID value 1 and 6. To mimic an error situation i've set the lookup nomatch property as below for the lookup task in the second data flow

This will ensure the task will fail when it doesn't find matching records for products with id values 1 and 6. This can be utilized for illustrating effect of checkpoint.
For enabling checkpoint support in the package we need to set following properties

  • CheckpointFileName should be pointed to valid filepath. This will contain checkpoint information as a xml file
  • CheckpointUsage should be set to IfExists
  • SaveCheckpoints to true

For all the other tasks involved FailParentOnFailure property should be set to true and for the outer container FailPackageOnFailure should be set to true.
Now lets try running the package. The package progress will be as follows
And we will get error information as expected telling that lookup failed
Now lets go ahead and fix the error by changing lookup property to redirect row instead of fail component

Lets now try rerunning the package and we will be amazed to see the result which will be as below

This was not something we expected as it completely ignored the second data flow task which failed in previous run. The reason lies in the previous execution result which is as below
As you see checkpoint has recorded the task as completed which is why it was ignored in the rerun. This is because the failure happened midway in the data flow task.This can be handled by wrapping the data flow inside a transaction. This is done by setting value as Required for TransactionOption property of data flow task. This will make sure data flow task will always initiate a transaction and will record result as failure unless it goes to completion successfully till the last step within it.
One more thing to keep in mind is to start the distributed transaction coordinator service if you want to use transaction from the local services of your machine (services.msc from command prompt).

After wrapping it in transaction and getting the earlier failure if you rerun you'll get below

As you see from above, it now resumes correctly from the last failed step and goes to completion. The reason is now data flow task started a brand new transaction each time and any intermediate failure will cause entire data flow to be failed and setting status accordingly in checkpoint file.
This is something you can apply in similar scenarios where you don't want to rerun some intensive high volume steps due to a failure happening on a succeeding task.
So to summarize the below are things you need to keep in mind for implementing checkpoints in SSIS

  • Set the Checkpoint related properties for package as outlined above
  • Set FailPackageOnFailure property for individual tasks and FailPackageOnFailure for containers
  • Set TransactionOption property to Required for all data flow tasks and start DTC service
Hope this post would help you to successfully implement checkpoints in ssis package.
As always do let me know of any feedback or any clarification you may have on the above.

The table scripts,sample file and package are available at the below link

Change the connection settings and setup the sample file in the required folder and then you will be able to run and test the package.Also ensure MSDTC is running as package is using transaction.

No comments:

Post a Comment