Slowly Changing Dimension type 2 in SSAS

September 8, 2007

There are quite a few ways of ways of documenting the changes a business entity goes through over time. The most famous of them all (and the most difficult to build) is a Slowly Changing Dimension (SCD) type 2. doing this in a table, you would create a table which would include: an incremental key, the business key, the attributes you’d like to track for that entity, a start date and an end date (for which the information was right) and a flag which would equal 1 when we are looking at the last row available for the member of that entity.


A table for example would be:


 

























Surrogate_key


Worker_id


Worker_status


Start_date


End_date


Flag


126


3789


Active


14/06/2006


31/12/2006


0


256


3789


Sick


01/01/2007


 


1


 


 


For more information about SCD, please refer to an article by Margy Ross and Ralph Kimball.


 


So you’ve done a very bold and difficult move and created a Slowly Changing Dimension type 2. How does that actually manifest itself in the Analysis Services project you’re building?


 


Well, they added something new in SQL Server 2005. Now, if a dimension in a cube is an SCD type 2, you may update the Meta Data of the relevant attributes in that dimension. Each attribute in a dimension, has a property called “Type”. Attribute types help classify an attribute in terms of business functionality. You may like to map out a few attributes for an SCD type 2. The attributes that need to be mapped and their relevant types are:


The business key (in this example Worker_id) to type: ScdOriginalID


The start date to type: ScdStartDate


The end date to type: ScdEndDate


The flag to type: SCDStatus


 


And updating that should be peanuts for you, after building the SCD type 2 dimension ;P

Add comment
facebook linkedin twitter email

Leave a Reply to Ella Maschiach Cancel Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*

32 comments

  1. furmanggOctober 13, 2007 ב 1:25

    Interesting. I’ve never used the Type=Scd* options. What are the advantages of doing that? Do any client tools you know of render this information differently if you mark dimension attributes as you specified?

    Reply
  2. Ella MaschiachOctober 14, 2007 ב 13:11

    Well, an answer given by Matt Carroll, states that attribute properties are not used by the SSAS server and are only “additional metadata available to clients”. A Microsoft Technet article, also states the same (see the “remarks” part in the article).

    I admit that I am not currently using a client tool that takes advantage of this, nor do I know of one that does.

    Still, adding the attribute type was recommended to me by a Microsoft BI advisor which works with my company. I understood from him that it enhances the performance. Moreover, another MSDN article also states that attribute type is mostly used for client tools, but it also states that some types have a specific meaning for SSAS. It states those types including the SCD attribute types, along with account, time and currency. Last but not least, I would say that in the Adventure Works project in SSAS, they also chose to mention those types for the relevant attributes in the Employee and the Product dimensions.

    Hope that helps.

     

    Reply
  3. cthJuly 11, 2008 ב 12:30

    Is the cube browser provided with management studio and BIDS able to present the data correctly with SCD type 2 and these meta data attributes set?

    Reply
  4. Ella MaschiachJuly 11, 2008 ב 13:46

    Yes, the cube browser in the BIDS and the Management Studio can show you your data correctly (as long as you had defined your attributes well enough). Still, I recommend you use that viewer only as a way for you to check yourself and how you defined your cube. For your user, be sure to use a proper cube browser.

    Reply
  5. PeterJuly 13, 2008 ב 15:31

    Hi Ella,

    When you say that the cube browser in BIDS shows the data correctly, it means that it will differentiate which dimension record to show depending on the time period being looked at? Is that correct? Furthermore what happens if the time span you are analysing spans two or more possible valid dimension records?

    Has anyone tried analysing SCD dimensions via Excel 2007? Do you know if the SCD support is there as well?

    Great BLOG btw. 🙂 Much appreciated

    Peter

    Reply
  6. Ella MaschiachJuly 14, 2008 ב 10:55

    Hi Peter,

    Indeed, the cube browser in the BIDS and the Management Studio, differentiates between the time period being looked at. Let us follow the example of worker 3789 which is given in the post. We make the following assumptions:

    • The relevant rows in the dimension are connected to a fact table containing the man’s salary for the years 2006, 2007.
    • The man’s salary is 100$ per month.
    • The two rows are the only rows that exist for the worker in the dimension till this date.

    If we look at the cube browser for details on the worker according to his status, we should see:

     
    Year
    Worker Status
    2006
    2007
    Active
    600
     
    Sick  
    1200

     

    And so the relevant measures are shown next to the relevant member of the attribute in that time. If I look only at 2006 I would have seen just the member “Active” for the worker’s status and the same way, if I look only at 2007 I would have seen just the member “Sick” for the worker’s status. As I was looking at the span of two years, I saw the two relevant members for that time. So your assumption is correct.

    From a short check I ran on Excel 2007 with my cube containing SCD type 2 dimensions, I see it works perfectly well there. It shows me exactly what my cube browser shows.

    Still, I would recommend reading a post by Marco Russo about a problem that may arise from using Excel 2007 against a surrogate key. So you can consider Excel 2007 as your front – end tool, but I would also recommend reading the posts in the Microsoft Excel blog about Analysis Services and how Excel works with it.

    … And thank you 🙂

    All the best,

    Ella

    Reply
  7. ConstantijnSeptember 19, 2008 ב 9:15

    Hi,

    I can’t get your example working. On SQL, I created a table workers are described in the top. I created a table called salaries. This has the fields ID, Date, Worker_ID, Salary

    In SSAS, I imported those two tables into the DSV. Linked them together. Set the primary keys to salaries.ID and workers.Surrogate_key. Made a link from salaries.worker_ID to workers.workers_ID.
    Added a server time dimension. Also linked that one to Salaries (not in the DSV but in the cube). Set the types of the worker as specified above.

    When I run the cube, I only get Active back not Sick.

    What goes wrong ?

    Constantijn

    Reply
  8. Ella MaschiachSeptember 20, 2008 ב 15:42

    Hi Constantijn,

    If you’d like, you can contact me through the “Contact” button at the top of the page. I’ll try to look at it. Have you looked at the Employee Dimension in the Adventure Works cube to see you have built and connected everything as you should?

    All the best,
    Ella

    Reply
  9. Scott ArbeitmanJanuary 18, 2009 ב 12:30

    Have many rows do we have in Fact table for this employee? The dimension is changing, not the employee facts, so I would think we have a single entry for this employee.

    Then which dimension surrogate do we point to? Do we keep this up to date with the active status?

    Reply
  10. Ella MaschiachJanuary 18, 2009 ב 16:06

    Hi Scott,

    Let’s say we have the cube, which at its lowest level is monthly. Well if we look at a few years time, then I think you have a few rows both in the dimension and the fact table for the same worker. If the worker holds a few positions in the company – then we a few rows in the fact table in one month for the same worker. One row in the fact for each position, in each department and the salary the worker got for his work in each position (for instance).
    Over time, the worker has also a few rows in the dimension table as well, as we document all the changes he went through during his working time at the company.
    Each row in the fact table should be connected to the row in the dimension that applies to that worker, in that time period.

    For a real feel of things, I really recommend downloading the Adventure Works Analysis Services sample and looking at the Product Dimension and how it connects to any of the fact tables (such as Internet Sales).

    Hope that helped,

    Ella

    Reply
  11. AlirezaApril 15, 2009 ב 12:53

    Hi, I hope somebody yet read this comments. I have a problem which needs to keep histories on a Type II SCD which also contains a parent-child relationship. I’m really confused how I can do it.

    Reply
  12. Ella MaschiachApril 15, 2009 ב 18:59

    Hi Alireza,

    I really reccomend you have a look at the Employee Dimension in the Adventure Works sample cube. It’s a parent child with SCD.

    All the best,

    Ella

    Reply
  13. Erik M.February 8, 2010 ב 18:14

    Hi Ella,
    I’ve read this article with much interest! I am in the process of designing an SSAS cube and dimensions for employee data that has to keep historical changes, therefore I created it with the Type 2 SCDs. What I am struggling with, is how I would achieve the design in such a way, that I can answer questions like how many employees there were in the company at a random given date, or how many employees started or left the company in a random given month.
    From your reply to Scott’s question, I am not really sure what to put in the dimension table and what to put in the fact table. For example all the changes of employee location, department, job title etc, I would put that in the dimension table. Do you know of any tutorial that describes this, or can you give me some tips?
    Thanks, Erik

    Reply
  14. Ella MaschiachFebruary 9, 2010 ב 13:32

    Hi,

    In my cube, I had a monthly fact with keys of employees and positions and percent of salary for that employee in that month for that job. The amount of people working during a month in the company would be a distinct count on the amount of employee business keys (in case an employee can work in more than one unit in the company). The business key may be kept on the fact table as well as on the dimension (though that is specific to each case).

    For the amount of people that were added that month, you can create a calculated measure taking the amount of employees of this month and subtracting from it the amount of the previous month. Thing is, you can’t drillthrough on a calculated measure. If you want to see the people who started or finished that month, you have to create 2 new fact tables (one for new employees and one for the leaving) and connect the relevant members from the employee scd dimension to it.

    Basically employee attributes are kept in the employee and facts are kept in the fact table.

    To learn more I would suggest reading Kimball on the matter and looking at the Adventure Works Cube on what they did for the Employee and Promotion.

    HTH,
    Ella

    Reply
  15. Munish BansalApril 22, 2010 ב 18:34

    Hi Ella,

    Thanks for a such an invovative post about untouched topic.

    But I am still not able to make it understand & use at full. I really don’t see any difference in terms of data available while browsing/querying the cube whether I configure the attributes of my Dim (SCD2) like by assigning their types to scd* or not.

    Please put some more light on that, probably by giving an example with & without such property settings.
    Also pls mention other properties of dim-attributes like keycolumns etc for SCD2 dimension to be set for their proper use.

    Thanks again,
    Munish Bansal

    Reply
  16. Ella MaschiachApril 25, 2010 ב 16:02

    Hi Munish,

    I’m glad you found the post helpful. I can’t say I have other things to add to defining properties of specific attributes in SCD apart from what I mentioned in the post itself.
    Configuring the properties should not affect the way you browse your cube.
    Please look at the answer I gave Furmangg above.

    All the best,

    Ella

    Reply
  17. ErikJune 3, 2010 ב 17:56

    Hi Ella,
    I am setting up a dimension in SSAS from an employee table that is populated according to the SCD Type 2 structure. This table contains information about which division this employee is in, who the manager is, etc. Most of the fields are of changing type SCD2. These are the steps that I take in BIDS:

    – Create dimension from dimension table that is populated according to the SCD Type 2 structure
    – Set key usage to the surrogate key field
    – Build some hierarchies
    – Deploy dimension
    – Process dimension

    It goes wrong in the processing part. It says it has found a duplicate attribute key. I understand why, because when an employee has gone through a lot of changes, there is a record in the dimension table for every change. When I change the key collection for the business key to the business key AND the surrogate key, then I am able to process the dimension. In the dimension browser though, I see a lot of doubles on lower levels in the hierarchy…

    In other words, what would be the best way to build up such a dimension in SSAS?

    Thanks!
    Erik

    Reply
  18. Ella MaschiachJune 6, 2010 ב 16:25

    Hi Erik,

    Well when I used the attribute key for my SCD dimension I only used the business key and that worked fine. Let’s say we have something of this sort:

     

    SurrogateKey

    BusinessKey

    Name

    StatusKey

     StatusDesc

     From

     To

    Flag

    111 

    372  

    Dany 

    Single 

    1/1/2007

    31/12/2007

    0

    123 

    372 

    Dany 

    Married 

    1/1/2008

     

    1

    I would do the following:

    Let the Surrogate Key be the dimension key and hide it
    Create an attribute of worker with name as name and key as business key and type ScdOriginalID
    Create an attribute of status with name as status description and key as status key
    Create an attribute of from date with type ScdStartDate
    Create an attribute of end date with type ScdEndDate
    Create an attribute of flag with SCDStatus

    I wouldn’t use the surrogate key along with the status key. There’s no need for it, and as you described, it would only duplicate the rows in the leaf level. if you’re falling in the processing phase, then you should check your data in the tables. Perhaps with the same status key you have two different descriptions? Look closely at the error you get for the select on that attribute and that should point you to your problem in the data.

    Last but not least, I would again recommend looking at the Adventure Works Analysis Sample and look at what they did with the Product dimension.

    Hope that helped,
    Ella

    Reply
  19. ErikJune 7, 2010 ב 13:24

    Hi Ella,
    thanks for your reply. The error message that I got during processing is this one:

    Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘< < TABLE NAME >>’, Column: ‘< < BUSINESS KEY >>’, Value: ‘258’. The attribute is ‘Resource’. Errors in the OLAP storage engine: The record was skipped because the attribute key is a duplicate.

    So I looked at the data for the resource with business key 258, and found out that there are hardly any differences in the data. The only thing I noticed, is that the column ManagerKey is different every time. This ManagerKey is a foreign key relationship to the same table to the surrogate key of the manager. Maybe that’s where it goes wrong. This means anyway that if the manager gets a new record in the database because of changes, the employees do too, because the surrogate key of the manager has been updated. Would it be an idea to use the business key of the manager for the ManagerKey column?

    Thanks!

    Reply
  20. Ella MaschiachJune 11, 2010 ב 17:12

    Hi Erik,

    You say “there are hardly any differences in the data”. There shouldn’t be any differences in the data. If you do a select distinct on the key and description, you should get just one name per key and no more. For further information, please check out the following:

    social.msdn.microsoft.com/…/1dc59006-b485-41e1-a692-52970fce1658

    http://www.sqldev.org/…/a-duplicate-attribute-key-has-been-found-13316.shtml

    http://www.sqldev.org/…/dimension-attribute-keycolumns-15863.shtml

    Hope that helped,

    Ella

    Reply
  21. PraveenJuly 29, 2010 ב 7:14

    Hi,
    I want to design employee dimension(SCD 2) like this.

    Key EmpId EmpName Status FromDate ToDate Flag
    1 2345 John A 1/1/2009 6/1/2009 0
    2 2345 John S 6/2/2009 12/1/2009 0
    3 2345 John D 12/2/2009 3/1/2010 0
    4 2345 John A 4/1/2010 1/1/2099 1

    When the user select the data range, lets say 1/1/2009 to 1/1/2010, we need to show employee status as D only. How can we do it ?

    Reply
  22. Ella MaschiachAugust 1, 2010 ב 13:51

    Hi Parveen,

    Right now, the only thing that pops to mind is a calculated member that takes the last non empty status for that time. Look at the following link for more detail:

    sqlblog.com/…/mdx-and-partitioning.aspx

    social.msdn.microsoft.com/…/d09deb84-4d54-49ff-a58b-db7039e4685e

    Hope that helped,

    Ella

    Reply
  23. VladimirAugust 8, 2010 ב 20:48

    Hi Ella.

    I have the same problem with Erik. I’ll try to explain reasons.
    We have
    ScdProductId AlternateProductId SubCategory Category
    1 50 bread food
    2 100 milk food

    And we have attribute relationship:
    ScdProductId->AlternateProductId->SubCategory->
    Category

    AlternateProductId many to one with SubCategory

    And we have one new row ScdProductId = 3

    ScdProductId AlternateProductId SubCategory Category
    1 50 bread food
    2 100 milk food
    3 100 drink food

    AlternateProductId is the same = 100 but SubCategory has changed.

    After processing we get Error
    “A duplicate attribute key has been found when processing: Column: ‘AlternateProductId , Value: 100”.

    AlternateProductId have two different parents milk and drink and it is wrong
    because of attribute relationship AlternateProductId -> SubCategory. (many to one ).

    I see than in Product Dimension in the Adventure Works we haven’t AlternateProductId at all.
    And we don’t get such Errors. But we also haven’t possibility to group sales by AlternateProductId(this is requirement of our project)

    What is the best way of modeling (Especially
    attribute relationship and hierarchy)
    in this case ?.

    Reply
  24. Ella MaschiachAugust 9, 2010 ב 10:23

    Hi Vladimir,

    As I have explained to Erik, you cannot have two different names for the same key – only one. If you’d like to see the change of subcategory for a product over time, please create a different table for it and connect it to the product table to view it all in the same dimension. This is also implemented in Adventure Works, so please follow their guide lines if you’re not sure how to do it.

    All the best,
    Ella

    Reply
  25. ErikSeptember 16, 2010 ב 15:40

    Ella,
    I think what Vladimir means is that the product with the business key 100 changes from the subcategory “milk” to the subcategory “drink” (same example that I have when an employee changes departments, managers, locations, etc.). In his case, I’d think he wants to show the dimension like this when he browses:

    Food
    – Bread
    – 50
    – 1
    – Milk
    – 100
    – 2
    – Drink
    – 100
    – 3

    In other words, the product with business key 100 should appear under milk, but also under drink.

    Reply
  26. ErikSeptember 17, 2010 ב 11:51

    Hi Ella,
    in addition to my post from yesterday, I have added this situation a bit more extensively on my blog: http://www.erikmonchen.com

    Thanks.

    Reply
  27. Ella MaschiachSeptember 19, 2010 ב 16:53

    Hi Erik,

    I’ve looked at the problem again.
    Please try following these links:

    Describes a problem quite similar to your own:
    social.msdn.microsoft.com/…/9f1773d5-9693-483f-b913-77b880d772f5

    Recommends using dimension health check through bids helper:
    social.msdn.microsoft.com/…/1bf4e5a1-b9ed-4235-9ca5-3d3767d9a5e2

    Tomislav Piasevoli gives a detailed answer:
    social.msdn.microsoft.com/…/91086c77-b399-40e8-b323-c5e5ded938d1

    Another detailed answer from Hilmar Buchta:
    ms-olap.blogspot.com/…/duplicate-attribute-key-has-been-found.html  

    Please review the posts, as I believe they will cover the answer.

    All the best,
    Ella

    Reply
  28. Srivatsav MedaDecember 14, 2010 ב 18:59

    Ella,

    Thank you for the post explaning the type 2 SCD in SSAS. I have one question. After we define the type attribute in dimension, how to link the dimension with the time. Will this be on SCD dates (start or end).

    Reply
  29. Ella MaschiachDecember 15, 2010 ב 10:26

    Hi Srivatsav,

    There’s no need for you to connect the dimension to a time dimension. You connect it to the relevant fact table and that fact table is connected to a time dimension. The SCD dimension relates to the fact through the SK. The time dimension relates to the fact table through the time key. That should be enough for your reports on top of the cube.

    Hope that helps,
    Ella

    Reply
  30. David ScottOctober 12, 2011 ב 17:03

    Hi Ella,

    I hope you are the right person to ask.  Having read your excellent thread above you seem to be an expert so I am hoping you can help me!

    I have taken on a BI project using SSAS 2008 and I am struggling to see how best to approach something.  

    I have a complex multihierarchy dimension for clients here at the bank.  I have modelled the hierarchy in the warehouse with a single static dimension table called tDimClient and then I have snowflaked out the various hierarchies into seperate single tables.  Some also have SCD type 2 attributes!  E.g.

    tDimClient.Client_Id links to the fact tables and has name attribute Client_Name

    tDimClient.Client_Id -> links to -> tDimRelationship.Client_Id

    tDimRelationship has additional key field Valid_From_date and attribute Relationship_Name

    tDimClient.Client_Id -> links to ->  tDimGeography.Client_Id

    tDimGeography has additional key field Valid_From_date and attributes City, Country, Region

    tDimClient.Client_Id -> links to -> tDimAccountManager.Client_Id

    tDimAccountManager has additional key field Valid_From_date and attributes Team, Department, Company

    What I am trying to do is build a single Client dimension with all 4 tables and then have multiple hierarchies defined, all hanging off the lowest level tDimClient.Client_id.

    My qeustion is, is this too ambitious?!  I have difficulty defining the dimension with just two of the tables (tDimClient -> tDimRelationship) and I get an error saying "the tDimClientRelationship table that is required for a join cannot be reached on the relationships in the data source view".  I assume this is because either SSAS or I am having problems understanding the implied Many-to-many relationship between a bottom level dimenstion table and a snowflaked SCD (even though it won’t be a many to many once the Valid To constraint is applied – I.e. a client will only ever have one relationship at a given time, but potentially multiple over time).

    So I guess in summary there are two questions.

    1. Am I asking too much of SSAS in trying to build a single dimension with multiple snowflaked hierarchies with SCDs on the outrigged tables.

    2. If not – Do you know what the problem could be with the error above?

    Many thanks for any advice you can give me!

    Reply
  31. Ella MaschiachOctober 16, 2011 ב 15:56

    Hi David,

    Seems like a pretty big project!…
    SSAS shouldn’t have a problem building an SCD from a few tables. If you need an example, please look at the Product dimension in Adventure Works, as it is an SCD.

    I would like to note that you don’t have a many to many relationship between the tables but a one to many. Each customer may have different attributes over time, but each attribute will connect just to one customer.

    I tried looking at the error message on google, but didn’t find anything. Did you check that all the client IDs connect to all the attributes and you’re not missing rows in the join? If that doesn’t work, please try asking at the forum: social.msdn.microsoft.com/…/threads

    All the best,
    Ella

    Reply