Slowly Changing Dimension Type 2 Tips & Tricks
I’ve blogged in the past about Slowly Changing Dimension Type 2, and I see that it’s a subject that really interests you. I remember quite distinctly how hard and demanding it was to create it for my project (trust me I do!) and even to do it for two business entities. So, hoping that I may assist you even slightly, I thought I may share with you a few thoughts I have on the matter:
Do you really need it? Now I know this is going to sound quite obvious, but before you decide to take on such a demanding task, you should really ask yourself what you can take away from it. Does your user have a proven ROI from using the development over time in those areas for that entity? Does he have meaningful insights he can take away from comprising how the entity evolved over time? Only if the answer is “Yes” to both questions then you should go into the development of SCD type 2.
What are you going to track? Sometimes, your users can get carried away with themselves… it happens to the best, no doubt, whether it’s because they’re enthusiastic, or because they thinks bigger is better. No matter what the reason, sometimes your work as a System Analyst is to contain them. Now I’m not saying we’re not here to serve our users, but there are those time when they request things simply because they can, and not because they need them. Because development of Slowly Changing Dimension type 2 can be quite complicated, you should try to contain the amount of attributes you follow, as the more attributes you try to follow, the more problems may arise in development. I think it’d be fair to say that for the entities in my project only 4 are meaningful, though I found myself required to track more… Don’t forget, you can always snow – flake your entity, distinguishing between the attributes that will be tracked over time, and those that will stay constant or be overwritten.
SCD in SSIS – (this part is written with the contribution of Hamada Kais and Boris Kogan – thank you for your insight!)
Yes, you do have in ETL a component meant to track the way your entity has evolved over time, but…
- You have to ask yourself – what sort of data are you about to receive? Will you only get the new rows concerning your data or are you always going to get the entire history from the operational DB? The Slowly Changing Dimension Transformation component in SSIS is expecting to receive only new rows concerning the attributes of your business entity.
- The Slowly Changing Dimension Transformation component in SSIS only tracks changes from here onwards. If you need to build the SCD type 2 dimension for your business entity from its past, then you may need to use a stored procedure in T-SQL.
- SSIS only works with comparisons based on “equal” (=). If you’re looking to make a comparison that uses a “between” or even only “less than and equal to” (<=) or “more than and equal to” (=>), you may need to customize your SSIS solution.
- For very large SCD dimensions, the Slowly Changing Dimension component in SSIS may not perform as well as using conditional transforms and lookups on filtered data. You should consider looking into the Table Difference component developed by Alberto Ferrari to help you there.
In any case, be prepared to do a lot of “cleansing” work. There’s nothing like the QA that goes into checking the Slowly Changing Dimension to help you find all sorts of mistakes that may occur in the OLTP DB.
Still I have to admit, after you go through it all and emerge on the other side, there is nothing like the development of a Slowly Changing Dimension type 2 to make you feel like you’ve come a long way in BI development.