SSIS Checkpoints
I’m using SSIS for several years now, and I want to shortly cover an SSIS feature which last week help me a lot: using checkpoints in order to recover failed packages. I’m currently developing an ETL solution which requires the import and export of data in and out of a production system on a daily basis, and it is very important to ensure the process only touches the system when required and does not impact on the system performance (or, should I say, impact the least).
What I actually need is to get the daily content of several tables out to text files and I want to be sure that if any of the data flow tasks fail for any reason, I will only re-run this particular data flow and not the other tasks, which may have completed successfully. In comes the checkpoint file.
SSIS can use a checkpoint file in order to save package runtime information (this includes variables values, where the package actually failed etc.) and when the package is re-run after failure, it will start the package from the failed object/task, thus avoiding the work already done and completed successfully. You can configure the checkpoint file location in the package properties tab. you will also need to configure the “checkpoint usage” property where you have three possible settings to choose from: Never: which means the package does not use checkpoints and will restart from the beginning of the workflow, Always: which means the package will always try to restart from the previous failed task, and if the checkpoint file does not exist, it will fail, and IfExists, which tells the SSIS engine to start the package from the previously failed task only if a checkpoint file exists. If not – the package will restart from the beginning of the workflow. You will also need to set the “Save Checkpoints” property to “True” in order to enable the usage of checkpoint files.
And now, for the gotchas…
- Checkpoints are only applicable for the control flow tasks. Within a data flow task, you can not use checkpoints, and SSIS will always start the data flow execution from the beginning.
- The atomic unit of work that can be restarted is any task host container (such as the “execute SQL” task or the “For Loop” task). You can not restart a package from a failed “For Each” task: restarting a package from this task will re-run it from scratch. So, for example, if you’re looping through files in a directory and the package failed in the middle of looping, next time the package runs, it will loop through the files from the beginning. Remember that when you’re configuring checkpoints.
- Package configurations are not re-loaded when a package restarts, It uses the previously loaded configurations, which were saved in the checkpoint files.
- And, last but not least, for each container you want to define as a “restart” point, you need to set the “FailPackageOnFailure” property to “True”. This will ensure SSIS considers this container as a “restarting” point candidate.
Using checkpoints is a slight compromise for me in the above mentioned scenario. What I was really looking for and couldn’t find a solution just yet, is how to recover from a failed data flow task. So, when I’m exporting data to a text file and the task fails, I need a way to recover from that point, so I will not have to export the entire data set again, rather only the data I still need. Some sort of data auditing capabilities, where I can grab and keep the primary key of the data that flows in the data flow would really help. I hope Microsoft is thinking about this as well… Maybe in SQL Server 2011…
-Yaniv
(This is post number 4 for week number 4 of my SQL blog challenge)
Technorati Tags:
SSIS,
Checkpoints