DCSIMG
Slowly Changing Dimension type 2 in SSAS - Ella Maschiach's BI Blog

Ella Maschiach's BI Blog

BI, Olap Technologies, Tools, System Analysis and Design

News

About Ella Maschiach

Business Inteligence

Slowly Changing Dimension type 2 in SSAS

There are quite a few ways of ways of documenting the changes a business entity goes through over time. The most famous of them all (and the most difficult to build) is a Slowly Changing Dimension (SCD) type 2. doing this in a table, you would create a table which would include: an incremental key, the business key, the attributes you'd like to track for that entity, a start date and an end date (for which the information was right) and a flag which would equal 1 when we are looking at the last row available for the member of that entity.

A table for example would be:

 

Surrogate_key

Worker_id

Worker_status

Start_date

End_date

Flag

126

3789

Active

14/06/2006

31/12/2006

0

256

3789

Sick

01/01/2007

 

1

 

 

For more information about SCD, please refer to an article by Margy Ross and Ralph Kimball.

 

So you've done a very bold and difficult move and created a Slowly Changing Dimension type 2. How does that actually manifest itself in the Analysis Services project you're building?

 

Well, they added something new in SQL Server 2005. Now, if a dimension in a cube is an SCD type 2, you may update the Meta Data of the relevant attributes in that dimension. Each attribute in a dimension, has a property called "Type". Attribute types help classify an attribute in terms of business functionality. You may like to map out a few attributes for an SCD type 2. The attributes that need to be mapped and their relevant types are:

The business key (in this example Worker_id) to type: ScdOriginalID

The start date to type: ScdStartDate

The end date to type: ScdEndDate

The flag to type: SCDStatus

 

And updating that should be peanuts for you, after building the SCD type 2 dimension ;P

Comments

furmangg said:

Interesting. I've never used the Type=Scd* options. What are the advantages of doing that? Do any client tools you know of render this information differently if you mark dimension attributes as you specified?

# October 13, 2007 1:25 AM

Ella Maschiach said:

Well, an answer given by Matt Carroll, states that attribute properties are not used by the SSAS server and are only "additional metadata available to clients". A Microsoft Technet article, also states the same (see the "remarks" part in the article).

I admit that I am not currently using a client tool that takes advantage of this, nor do I know of one that does.

Still, adding the attribute type was recommended to me by a Microsoft BI advisor which works with my company. I understood from him that it enhances the performance. Moreover, another MSDN article also states that attribute type is mostly used for client tools, but it also states that some types have a specific meaning for SSAS. It states those types including the SCD attribute types, along with account, time and currency. Last but not least, I would say that in the Adventure Works project in SSAS, they also chose to mention those types for the relevant attributes in the Employee and the Product dimensions.

Hope that helps.

 

# October 14, 2007 1:11 PM

cth said:

Is the cube browser provided with management studio and BIDS able to present the data correctly with SCD type 2 and these meta data attributes set?

# July 11, 2008 12:30 PM

Ella Maschiach said:

Yes, the cube browser in the BIDS and the Management Studio can show you your data correctly (as long as you had defined your attributes well enough). Still, I recommend you use that viewer only as a way for you to check yourself and how you defined your cube. For your user, be sure to use a proper cube browser.

# July 11, 2008 1:46 PM

Peter said:

Hi Ella,

When you say that the cube browser in BIDS shows the data correctly, it means that it will differentiate which dimension record to show depending on the time period being looked at? Is that correct? Furthermore what happens if the time span you are analysing spans two or more possible valid dimension records?

Has anyone tried analysing SCD dimensions via Excel 2007? Do you know if the SCD support is there as well?

Great BLOG btw. :-) Much appreciated

Peter

# July 13, 2008 3:31 PM

Ella Maschiach said:

Hi Peter,

Indeed, the cube browser in the BIDS and the Management Studio, differentiates between the time period being looked at. Let us follow the example of worker 3789 which is given in the post. We make the following assumptions:

  • The relevant rows in the dimension are connected to a fact table containing the man's salary for the years 2006, 2007.
  • The man's salary is 100$ per month.
  • The two rows are the only rows that exist for the worker in the dimension till this date.

If we look at the cube browser for details on the worker according to his status, we should see:

 
Year
Worker Status
2006
2007
Active
600
 
Sick  
1200

 

And so the relevant measures are shown next to the relevant member of the attribute in that time. If I look only at 2006 I would have seen just the member "Active" for the worker's status and the same way, if I look only at 2007 I would have seen just the member "Sick" for the worker's status. As I was looking at the span of two years, I saw the two relevant members for that time. So your assumption is correct.

From a short check I ran on Excel 2007 with my cube containing SCD type 2 dimensions, I see it works perfectly well there. It shows me exactly what my cube browser shows.

Still, I would recommend reading a post by Marco Russo about a problem that may arise from using Excel 2007 against a surrogate key. So you can consider Excel 2007 as your front - end tool, but I would also recommend reading the posts in the Microsoft Excel blog about Analysis Services and how Excel works with it.

... And thank you :)

All the best,

Ella

# July 14, 2008 10:55 AM

Ella Maschiach's BI Blog said:

Slowly Changing Dimension Type 2 Tips & Tricks I've blogged in the past about Slowly Changing

# September 9, 2008 7:43 AM

Constantijn said:

Hi,

I can't get your example working. On SQL, I created a table workers are described in the top. I created a table called salaries. This has the fields ID, Date, Worker_ID, Salary

In SSAS, I imported those two tables into the DSV. Linked them together. Set the primary keys to salaries.ID and workers.Surrogate_key. Made a link from salaries.worker_ID to workers.workers_ID.

Added a server time dimension. Also linked that one to Salaries (not in the DSV but in the cube). Set the types of the worker as specified above.

When I run the cube, I only get Active back not Sick.

What goes wrong ?

Constantijn

# September 19, 2008 9:15 AM

Ella Maschiach said:

Hi Constantijn,

If you'd like, you can contact me through the "Contact" button at the top of the page. I'll try to look at it. Have you looked at the Employee Dimension in the Adventure Works cube to see you have built and connected everything as you should?

All the best,

Ella

# September 20, 2008 3:42 PM

Scott Arbeitman said:

Have many rows do we have in Fact table for this employee? The dimension is changing, not the employee facts, so I would think we have a single entry for this employee.

Then which dimension surrogate do we point to? Do we keep this up to date with the active status?

# January 18, 2009 12:30 PM

Ella Maschiach said:

Hi Scott,

Let's say we have the cube, which at its lowest level is monthly. Well if we look at a few years time, then I think you have a few rows both in the dimension and the fact table for the same worker. If the worker holds a few positions in the company - then we a few rows in the fact table in one month for the same worker. One row in the fact for each position, in each department and the salary the worker got for his work in each position (for instance).
Over time, the worker has also a few rows in the dimension table as well, as we document all the changes he went through during his working time at the company.
Each row in the fact table should be connected to the row in the dimension that applies to that worker, in that time period.

For a real feel of things, I really recommend downloading the Adventure Works Analysis Services sample and looking at the Product Dimension and how it connects to any of the fact tables (such as Internet Sales).

Hope that helped,

Ella

# January 18, 2009 4:06 PM

Alireza said:

Hi, I hope somebody yet read this comments. I have a problem which needs to keep histories on a Type II SCD which also contains a parent-child relationship. I'm really confused how I can do it.

# April 15, 2009 12:53 PM

Ella Maschiach said:

Hi Alireza,

I really reccomend you have a look at the Employee Dimension in the Adventure Works sample cube. It's a parent child with SCD.

All the best,

Ella

# April 15, 2009 6:59 PM

Hans Geurtsen said:

Iedere BI ontwikkelaar zou het Kimball concept van een slowly changing dimension in een dimensionaal

# January 29, 2010 6:09 PM

Erik M. said:

Hi Ella,

I’ve read this article with much interest! I am in the process of designing an SSAS cube and dimensions for employee data that has to keep historical changes, therefore I created it with the Type 2 SCDs. What I am struggling with, is how I would achieve the design in such a way, that I can answer questions like how many employees there were in the company at a random given date, or how many employees started or left the company in a random given month.

From your reply to Scott’s question, I am not really sure what to put in the dimension table and what to put in the fact table. For example all the changes of employee location, department, job title etc, I would put that in the dimension table.  Do you know of any tutorial that describes this, or can you give me some tips?

Thanks, Erik

# February 8, 2010 6:14 PM

Ella Maschiach said:

Hi,

In my cube, I had a monthly fact with keys of employees and positions and percent of salary for that employee in that month for that job. The amount of people working during a month in the company would be a distinct count on the amount of employee business keys (in case an employee can work in more than one unit in the company). The business key may be kept on the fact table as well as on the dimension (though that is specific to each case).

For the amount of people that were added that month, you can create a calculated measure taking the amount of employees of this month and subtracting from it the amount of the previous month. Thing is, you can’t drillthrough on a calculated measure. If you want to see the people who started or finished that month, you have to create 2 new fact tables (one for new employees and one for the leaving) and connect the relevant members from the employee scd dimension to it.

Basically employee attributes are kept in the employee and facts are kept in the fact table.

To learn more I would suggest reading Kimball on the matter and looking at the Adventure Works Cube on what they did for the Employee and Promotion.

HTH,

Ella

# February 9, 2010 1:32 PM

Munish Bansal said:

Hi Ella,

Thanks for a such an invovative post about untouched topic.

But I am still not able to make it understand & use at full. I really don't see any difference in terms of data available while browsing/querying the cube whether I configure the attributes of my Dim (SCD2) like by assigning their types to scd* or not.

Please put some more light on that, probably by giving an example with & without such property settings.

Also pls mention other properties of dim-attributes like keycolumns etc for SCD2 dimension to be set for their proper use.

Thanks again,

Munish Bansal

# April 22, 2010 6:34 PM

Ella Maschiach said:

Hi Munish,

I’m glad you found the post helpful. I can’t say I have other things to add to defining properties of specific attributes in SCD apart from what I mentioned in the post itself.
Configuring the properties should not affect the way you browse your cube.
Please look at the answer I gave Furmangg above.

All the best,

Ella

# April 25, 2010 4:02 PM

Munish Bansal said:

Thanks Ella.

# May 5, 2010 5:33 PM

Erik said:

Hi Ella,

I am setting up a dimension in SSAS from an employee table that is populated according to the SCD Type 2 structure. This table contains information about which division this employee is in, who the manager is, etc. Most of the fields are of changing type SCD2. These are the steps that I take in BIDS:

- Create dimension from dimension table that is populated according to the SCD Type 2 structure

- Set key usage to the surrogate key field

- Build some hierarchies

- Deploy dimension

- Process dimension

It goes wrong in the processing part. It says it has found a duplicate attribute key. I understand why, because when an employee has gone through a lot of changes, there is a record in the dimension table for every change. When I change the key collection for the business key to the business key AND the surrogate key, then I am able to process the dimension. In the dimension browser though, I see a lot of doubles on lower levels in the hierarchy...

In other words, what would be the best way to build up such a dimension in SSAS?

Thanks!

Erik

# June 3, 2010 5:56 PM

Ella Maschiach said:

Hi Erik,

Well when I used the attribute key for my SCD dimension I only used the business key and that worked fine. Let's say we have something of this sort:

 

SurrogateKey

BusinessKey

Name

StatusKey

 StatusDesc

 From

 To

Flag

111 

372  

Dany 

Single 

1/1/2007

31/12/2007

0

123 

372 

Dany 

Married 

1/1/2008

 

1

I would do the following:

Let the Surrogate Key be the dimension key and hide it
Create an attribute of worker with name as name and key as business key and type ScdOriginalID
Create an attribute of status with name as status description and key as status key
Create an attribute of from date with type ScdStartDate
Create an attribute of end date with type ScdEndDate
Create an attribute of flag with SCDStatus

I wouldn't use the surrogate key along with the status key. There's no need for it, and as you described, it would only duplicate the rows in the leaf level. if you're falling in the processing phase, then you should check your data in the tables. Perhaps with the same status key you have two different descriptions? Look closely at the error you get for the select on that attribute and that should point you to your problem in the data.

Last but not least, I would again recommend looking at the Adventure Works Analysis Sample and look at what they did with the Product dimension.

Hope that helped,
Ella

# June 6, 2010 4:25 PM

Erik said:

Hi Ella,

thanks for your reply. The error message that I got during processing is this one:

Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: '<< TABLE NAME >>', Column: '<< BUSINESS KEY >>', Value: '258'. The attribute is 'Resource'. Errors in the OLAP storage engine: The record was skipped because the attribute key is a duplicate.

So I looked at the data for the resource with business key 258, and found out that there are hardly any differences in the data. The only thing I noticed, is that the column ManagerKey is different every time. This ManagerKey is a foreign key relationship to the same table to the surrogate key of the manager. Maybe that's where it goes wrong. This means anyway that if the manager gets a new record in the database because of changes, the employees do too, because the surrogate key of the manager has been updated. Would it be an idea to use the business key of the manager for the ManagerKey column?

Thanks!

# June 7, 2010 1:24 PM

Ella Maschiach said:

Hi Erik,

You say "there are hardly any differences in the data". There shouldn't be any differences in the data. If you do a select distinct on the key and description, you should get just one name per key and no more. For further information, please check out the following:

social.msdn.microsoft.com/.../1dc59006-b485-41e1-a692-52970fce1658

www.sqldev.org/.../a-duplicate-attribute-key-has-been-found-13316.shtml

www.sqldev.org/.../dimension-attribute-keycolumns-15863.shtml

Hope that helped,

Ella

# June 11, 2010 5:12 PM

Praveen said:

Hi,

I want to design employee dimension(SCD 2) like this.

Key EmpId EmpName Status FromDate  ToDate    Flag

1   2345  John  A 1/1/2009  6/1/2009  0

2   2345  John  S 6/2/2009  12/1/2009 0

3   2345  John  D 12/2/2009 3/1/2010  0

4   2345  John  A 4/1/2010  1/1/2099  1

When the user select the data range, lets say 1/1/2009 to 1/1/2010, we need to show employee status as D only. How can we do it ?

# July 29, 2010 7:14 AM

Ella Maschiach said:

Hi Parveen,

Right now, the only thing that pops to mind is a calculated member that takes the last non empty status for that time. Look at the following link for more detail:

sqlblog.com/.../mdx-and-partitioning.aspx

social.msdn.microsoft.com/.../d09deb84-4d54-49ff-a58b-db7039e4685e

Hope that helped,

Ella

# August 1, 2010 1:51 PM

Vladimir said:

Hi Ella.

I have the same problem with Erik. I’ll try to explain reasons.

We have

ScdProductId AlternateProductId SubCategory Category

1 50 bread food

2 100 milk food

And we have attribute relationship:

ScdProductId->AlternateProductId->SubCategory->

Category

AlternateProductId many to one with SubCategory

And we have one new row ScdProductId = 3  

ScdProductId AlternateProductId SubCategory Category

1 50 bread food

2 100 milk food

3 100 drink food

AlternateProductId is the same = 100 but SubCategory has changed.

After processing we get Error

“A duplicate attribute key has been found when processing: Column: 'AlternateProductId , Value: 100”.

AlternateProductId have two different parents milk and drink and it is wrong

because of attribute relationship AlternateProductId -> SubCategory. (many to one ).

I see than in Product Dimension in the Adventure Works we haven’t AlternateProductId at all.

And we don’t get such Errors. But we also haven’t possibility to group sales by AlternateProductId(this is requirement of our project)

What is the best way of modeling (Especially

attribute relationship and hierarchy)

in this case ?.

# August 8, 2010 8:48 PM

Ella Maschiach said:

Hi Vladimir,

As I have explained to Erik, you cannot have two different names for the same key – only one. If you’d like to see the change of subcategory for a product over time, please create a different table for it and connect it to the product table to view it all in the same dimension. This is also implemented in Adventure Works, so please follow their guide lines if you’re not sure how to do it.

All the best,

Ella

# August 9, 2010 10:23 AM

Erik said:

Ella,

I think what Vladimir means is that the product with the business key 100 changes from the subcategory "milk" to the subcategory "drink" (same example that I have when an employee changes departments, managers, locations, etc.). In his case, I'd think he wants to show the dimension like this when he browses:

Food

- Bread

 - 50

   - 1

- Milk

 - 100

   - 2

- Drink

 - 100

   - 3

In other words, the product with business key 100 should appear under milk, but also under drink.

# September 16, 2010 3:40 PM

Erik said:

Hi Ella,

in addition to my post from yesterday, I have added this situation a bit more extensively on my blog: www.erikmonchen.com

Thanks.

# September 17, 2010 11:51 AM

Ella Maschiach said:

Hi Erik,

I’ve looked at the problem again.
Please try following these links:

Describes a problem quite similar to your own:
social.msdn.microsoft.com/.../9f1773d5-9693-483f-b913-77b880d772f5

Recommends using dimension health check through bids helper:
social.msdn.microsoft.com/.../1bf4e5a1-b9ed-4235-9ca5-3d3767d9a5e2

Tomislav Piasevoli gives a detailed answer:
social.msdn.microsoft.com/.../91086c77-b399-40e8-b323-c5e5ded938d1

Another detailed answer from Hilmar Buchta:
ms-olap.blogspot.com/.../duplicate-attribute-key-has-been-found.html  

Please review the posts, as I believe they will cover the answer.

All the best,
Ella

# September 19, 2010 4:53 PM

Srivatsav Meda said:

Ella,

Thank you for the post explaning the type 2 SCD in SSAS. I have one question. After we define the type attribute in dimension, how to link the dimension with the time. Will this be on SCD dates (start or end).

# December 14, 2010 6:59 PM

Ella Maschiach said:

Hi Srivatsav,

There’s no need for you to connect the dimension to a time dimension. You connect it to the relevant fact table and that fact table is connected to a time dimension. The SCD dimension relates to the fact through the SK. The time dimension relates to the fact table through the time key. That should be enough for your reports on top of the cube.

Hope that helps,

Ella

# December 15, 2010 10:26 AM

David Scott said:

Hi Ella,

I hope you are the right person to ask.  Having read your excellent thread above you seem to be an expert so I am hoping you can help me!

I have taken on a BI project using SSAS 2008 and I am struggling to see how best to approach something.  

I have a complex multihierarchy dimension for clients here at the bank.  I have modelled the hierarchy in the warehouse with a single static dimension table called tDimClient and then I have snowflaked out the various hierarchies into seperate single tables.  Some also have SCD type 2 attributes!  E.g.

tDimClient.Client_Id links to the fact tables and has name attribute Client_Name

tDimClient.Client_Id -> links to -> tDimRelationship.Client_Id

tDimRelationship has additional key field Valid_From_date and attribute Relationship_Name

tDimClient.Client_Id -> links to ->  tDimGeography.Client_Id

tDimGeography has additional key field Valid_From_date and attributes City, Country, Region

tDimClient.Client_Id -> links to -> tDimAccountManager.Client_Id

tDimAccountManager has additional key field Valid_From_date and attributes Team, Department, Company

What I am trying to do is build a single Client dimension with all 4 tables and then have multiple hierarchies defined, all hanging off the lowest level tDimClient.Client_id.

My qeustion is, is this too ambitious?!  I have difficulty defining the dimension with just two of the tables (tDimClient -> tDimRelationship) and I get an error saying "the tDimClientRelationship table that is required for a join cannot be reached on the relationships in the data source view".  I assume this is because either SSAS or I am having problems understanding the implied Many-to-many relationship between a bottom level dimenstion table and a snowflaked SCD (even though it won't be a many to many once the Valid To constraint is applied - I.e. a client will only ever have one relationship at a given time, but potentially multiple over time).

So I guess in summary there are two questions.

1. Am I asking too much of SSAS in trying to build a single dimension with multiple snowflaked hierarchies with SCDs on the outrigged tables.

2. If not - Do you know what the problem could be with the error above?

Many thanks for any advice you can give me!

# October 12, 2011 5:03 PM

Ella Maschiach said:

Hi David,

Seems like a pretty big project!...
SSAS shouldn't have a problem building an SCD from a few tables. If you need an example, please look at the Product dimension in Adventure Works, as it is an SCD.

I would like to note that you don't have a many to many relationship between the tables but a one to many. Each customer may have different attributes over time, but each attribute will connect just to one customer.

I tried looking at the error message on google, but didn't find anything. Did you check that all the client IDs connect to all the attributes and you're not missing rows in the join? If that doesn't work, please try asking at the forum: social.msdn.microsoft.com/.../threads

All the best,
Ella

# October 16, 2011 3:56 PM
Leave a Comment

(required) 

(required) 

(optional)

(required) 


Enter the numbers above: