September 2007 - Posts
You can browse a cube through the cube browser in Analysis Services or through the one in the Management Studio. I usually use the one in the Management Studio, as it gives your more space to look. In the cube I created in Analysis Services, I added a Budget hierarchy in my Budget dimension. I decided to check the data for the new hierarchy in comparison to the money that was assigned to it in the Budget fact table. And it was problematic…I was opening the Budget hierarchy and in the first 3 levels it reacted pretty quickly, but at level 4 (and I have about 9 levels in the hierarchy) it would practically die and bring my computer down with it! I went into the Partitions tab in the Cube in the SSAS Project. I went into the Aggregation Design Wizard and pretty much went through it all! I gave it a performance gain that reaches 100%, I gave it an estimated storage of 100M, I gave it a chance to run aggregation till I clicked stop. BTW – this is also when I found out that at times, it doesn't really matter that I don't click stop. The aggregation wizard can, at times, stop making aggregations on its own, if it doesn't find it needs any more aggregations. Yes, very smart wizard :) …I also decided to manually click high values in the count the aggregation wizard runs before enabling you to manipulate the aggregations. Even that didn't work.
The only thing that seemed to have the slightest influence was connecting the dimension directly to the fact table (rather than leaving it as a referenced dimension as I did before). That made the cube browser react quickly even at the 4th and 5th level, but I still had 4 more levels to go.
In a moment of desperation, I decided to check how it would look to the user in the Panorama Nova View. And it reacted great there! In all of the levels! Needless to say, I was stunned! It was as if all my efforts from before were in vain. Checking what was happening in the background, I understood the reason for the difference in performance. In the cube browser, the back engine creates lots of sql queries to determine what it needs to show. In the Panorama Viewer on the other hand, there is just a single MDX query being generated for each click you make. And that – makes all the difference!
So my last word of advice is this: when you want to look at your cube – do it in the proper viewer…
Panorama Nova View Analytics is a viewer on a cube which enables you to drag and drop attributes and measures onto a grid, so as to study the influences different slicers and dicers have on a measure. Each combination of slicers and dicers with a certain measure, can be saved as a presentation (the name in Panorama for a dynamic report). The presentation is dynamic, as you can change not only the filters on the existing attributes (as in Reporting Services), but also add new attributes and remove the old ones, creating a new presentation altogether from the one you started with. A collection of presentations is saved into a book. Usually, a book will have presentations on a single cube, though that's not obligatory. The application in itself is based on a Java Applet, and the presentations are based on MDX queries that run in the background each time you change an attribute.
In the Panorama Nova View Administrator, you coordinate all the user security definitions there are against a book. After connecting your user groups to the role definitions you gave them in your OLAP cube, you need to synchronize the roles.
Panorama has it's own user with which it connects to the cube. The Panorama user has full access to the cube. When Panorama makes a query, (whether to build a new presentation or just show the user an existing one), it uses the Panorama user and plants the actual user (or the relevant user group – depending on how you defined it in the cube) as a parameter in the filter of the MDX query. An MDX query using just the Panorama User, would have access to all the data in the cube. The MDX query with the user name in it's filter, on the other hand, will only show the user what it has been granted to see in the cube. The query with the Panorama user is the one running when you define a presentation in the Panorama Nova View Desktop. The query with the actual user, is the one running when the user looks at the presentation through the Panorama Web Access. So, if I defined a presentation with salary measures in it, but you are in a user group that is not allowed to see salary measures, then you will see the presentation as having no data in the Web Access.
To enforce the security definitions, you need to tick a square saying you should do so in the Administrator tool, and also synchronize the roles. Synchronizing roles does two things:
- Update the list of people which belong to the user group.
- Update the restrictions you gave to that user group in your SSAS project.
Until you synchronize roles Panorama will allow users to look at presentations according to the old definitions it had – both for the people belonging to the user group and to what they can see.
So, if you add a user group or change definitions to your roles in the cube – don't forget, Panorama needs to know about it as well!
I thought this time around, we'll discuss how solve order can affect your MDX calculations. I found that people tend to ignore this property, as it is not mentioned in the "Form View" of the Calculation Tab of the cube (which is what people usually tend to use when they write MDX for a cube. I hope this will be taken into consideration in SQL Server 2008). Now, this is a shame, as solve order can greatly affect your calculations.
For each calculation you do in MDX, you can specify the optional property of solve_order. If you do not mention this property, it's default value will be zero. Solve_order is the property which actually determines in what order calculations will be done in case of a conflict for a certain cell. The calculation with the highest solve_order will be the one that's used for that cell.
There is still an exception for this rule in Analysis Services in SQL Server 2005. In SSAS, calculated members with the Aggregate function have a lower solve order than any intersecting calculated measure.
As another small tip, I would recommend you use solve_order in your calculations in jumps of 5 (that is: 5, 10, 15 etc.). That way, if you find after a while that you need to add another phase to your calculation, you won't have to go through re – numbering all of your solve order's – as you'll have the "space" to just add them in.
Be wary though! If you use calculated members, which are "competing" for the same cell in the cube, with the same solve_order, the outcome will be unpredictable.
For more info, you can refer to the ever reliable MSDN article.
There are quite a few ways of ways of documenting the changes a business entity goes through over time. The most famous of them all (and the most difficult to build) is a Slowly Changing Dimension (SCD) type 2. doing this in a table, you would create a table which would include: an incremental key, the business key, the attributes you'd like to track for that entity, a start date and an end date (for which the information was right) and a flag which would equal 1 when we are looking at the last row available for the member of that entity.
A table for example would be:
|
Surrogate_key |
Worker_id |
Worker_status |
Start_date |
End_date |
Flag |
|
126 |
3789 |
Active |
14/06/2006 |
31/12/2006 |
0 |
|
256 |
3789 |
Sick |
01/01/2007 |
|
1 |
For more information about SCD, please refer to an article by Margy Ross and Ralph Kimball.
So you've done a very bold and difficult move and created a Slowly Changing Dimension type 2. How does that actually manifest itself in the Analysis Services project you're building?
Well, they added something new in SQL Server 2005. Now, if a dimension in a cube is an SCD type 2, you may update the Meta Data of the relevant attributes in that dimension. Each attribute in a dimension, has a property called "Type". Attribute types help classify an attribute in terms of business functionality. You may like to map out a few attributes for an SCD type 2. The attributes that need to be mapped and their relevant types are:
The business key (in this example Worker_id) to type: ScdOriginalID
The start date to type: ScdStartDate
The end date to type: ScdEndDate
The flag to type: SCDStatus
And updating that should be peanuts for you, after building the SCD type 2 dimension ;P
So today's post will be about the usage of the many to many relationship that can reside between two dimensions. Quoting from the definition of Dimension Relationships, I would note that "Dimension usage defines the relationships between cube dimensions and the measure groups in a dimension…The many-to-many dimension relationship defines an association between a dimension and a measure group by specifying an intermediate fact table that is joined to the dimension table".
Now I know the usual example for this relationship is with bank accounts and people, as in: each person can have more than one bank account and each bank account can have more than one owner. But I thought that instead of that I would give you an example from my own business world of Human Resource as I found it adds an extra twist to it all.
In the HR world I've defined in my cube, I have people and jobs. Now a job can have two people occupying it (if the job is defined as having 100% of work and each person is occupying it only in 50%). A person is also capable of occupying two jobs (I work half time as a system analyst and half time as a developer). The added twist? Well a job can also exist without anyone on it. That is, a job can be totally vacant and ready to be occupied.
I had a fact table for the jobs stating how much work is defined per job (the "Fact Table" in the above drawing). The Job dimension (the "Regular Dimension" in the above drawing) is linked to this fact table. In another fact table, I had how much work is defined per worker occupying a job (the "Intermediate Fact Table" in the above drawing). Both the Worker dimension (the "M2M Dimension" in the above drawing) and the Job dimension were linked to this fact table. When I tried in the Cube > Dimension Usage to connect the Worker Dimension to the Jobs Fact table via the Worker on a Job Fact table, I found that the rows for all the jobs that were completely vacant disappeared from my Job Fact table. The Analysis Services gave me no notice of this while processing the new connection to the Measure Group. I found it out only through checking the details through the Cube Browser. It seems that as the intermediate fact table did not have rows for vacant jobs (but only for actual workers on a job) it's connection to the jobs fact table had dropped those rows from it as well.
So what was my solution? I created a named query on the Worker on a Job Fact table, creating in that query also a row with a non – existent worker on the vacant jobs that were in that period of time. I hid the non – existent worker in the Worker Dimension, giving him a very large ID which will make him very hard to find (unless you're specifically looking for him). I turned that named query into a fact table including a measure of Count Rows. I hid that measure (as I never really needed that table as a fact table in reality, as much as I needed it as and intermediate fact table) and I connected the Worker Dimension to the Job Fact table through this new Intermediate Fact Table. Thus, I had the correct connection (including all workers and all of the jobs in the intermediate fact table), and this was also invisible to the End – User as the only measure in this new fact table was invisible (meaning, the End – User could never know the table or the measure ever existed).
And so, when you use the Many-to-Many relationship in the Dimension Usage tab of the cube I would advise you (as always) to check very carefully the consequences of this new relationship. Did it drop you lines from the fact table? If it did, try working around it with the steps I've described.