September 2008 - Posts
I had a friend of mine ask me a question not long ago. He had developed a site for posting ads on selling houses for a certain company. The managers of the company were now trying to analyze the data on people posting ads. One of the things they were asking him to do, was give them a graph on how many people were posting ads on a certain span of prices. They weren't going to define him the span of money, but rather wanted him to give them the span, according to the amount of people that existed in it. And that's were he got stuck... because there was really (almost) no end to how much money can be put on a house, and there seemed to be endless values to choose from in between.
Now I remembered from the Analysis Services Tutorial I did, (which is a great place to start learning about SSAS), that you can group attribute members with the DiscretizationBucketCount property along with the DiscretizationMethod property. The DiscretizationMethod helps you determine the method by which SSAS will group your values (for instance, by Equal Areas) and the DiscretizationBucketCount determines the amount of groups that will be set.
So what was my suggestion? Turn the Sales table into a Fact - Dimension sort of table, where the price of the house is an attribute, and the amount of ads was the measure. We can then use DiscretizationMethod and DiscretizationBucketCount on the attribute of price to help us look at it more clearly.
Well I guess the future is really up in the cloud...
The last strategic move for Panorama is joining Google and offering a BI Analytics tool to work with Google Docs - PowerApps. But that's old news by now. GoodData has recently gotten funding for their BI in the cloud, giving you a complete platform for storing and analyzing your data. Business Objects offers "OnDemand" another platform that enables you to share Crystal Reports over the web. The list goes on, with other companies trying to get a piece of the pie.
So maybe that's what pushed Microsoft from just enabling you to store data on its SSDS (SQL Server Data Services) and add also Data Mining to the cloud... The SQL Server Data Mining Team has just put up a site to preview the abilities you could have for data mining over the web. The site currently has Adventure Works DW sample data and also enables you to load a CSV file of your own, for you to work on and experiment:
The visual effects are wonderful and should really help you better understand what potential lies in your data.
You could see the Key Influencers for cars (according to the AW Sample Data):

Or predict the sales for the following period:

Data Mining just got a whole lot more interesting for me - check it out for yourself!
I came across two very interesting web sites lately, both of which are a great source for good webcasts. The first site is Learn Microsoft BI which has all sorts of video on BI projects and how to develop them using SSAS. There has actually just been a first video on PerformancePoint Server as well. The site requires registration, but is well worth it.
The other site is SQL Server Videos which targets more DBAs and developers.
Having done just a few webcasts and knowing how much work goes into it, I am in complete awe of what's given in both sites. I strongly recommend you check both of them out.
Update 28/10/08: Microsoft has added videos to its MSDN catalog of Books Online for SQL Server 2008.
Slowly Changing Dimension Type 2 Tips & Tricks
I've blogged in the past about Slowly Changing Dimension Type 2, and I see that it's a subject that really interests you. I remember quite distinctly how hard and demanding it was to create it for my project (trust me I do!) and even to do it for two business entities. So, hoping that I may assist you even slightly, I thought I may share with you a few thoughts I have on the matter:
Do you really need it? Now I know this is going to sound quite obvious, but before you decide to take on such a demanding task, you should really ask yourself what you can take away from it. Does your user have a proven ROI from using the development over time in those areas for that entity? Does he have meaningful insights he can take away from comprising how the entity evolved over time? Only if the answer is "Yes" to both questions then you should go into the development of SCD type 2.
What are you going to track? Sometimes, your users can get carried away with themselves... it happens to the best, no doubt, whether it's because they're enthusiastic, or because they thinks bigger is better. No matter what the reason, sometimes your work as a System Analyst is to contain them. Now I'm not saying we're not here to serve our users, but there are those time when they request things simply because they can, and not because they need them. Because development of Slowly Changing Dimension type 2 can be quite complicated, you should try to contain the amount of attributes you follow, as the more attributes you try to follow, the more problems may arise in development. I think it'd be fair to say that for the entities in my project only 4 are meaningful, though I found myself required to track more... Don't forget, you can always snow - flake your entity, distinguishing between the attributes that will be tracked over time, and those that will stay constant or be overwritten.
SCD in SSIS - (this part is written with the contribution of Hamada Kais and Boris Kogan - thank you for your insight!)
Yes, you do have in ETL a component meant to track the way your entity has evolved over time, but...
- You have to ask yourself - what sort of data are you about to receive? Will you only get the new rows concerning your data or are you always going to get the entire history from the operational DB? The Slowly Changing Dimension Transformation component in SSIS is expecting to receive only new rows concerning the attributes of your business entity.
- The Slowly Changing Dimension Transformation component in SSIS only tracks changes from here onwards. If you need to build the SCD type 2 dimension for your business entity from its past, then you may need to use a stored procedure in T-SQL.
- SSIS only works with comparisons based on "equal" (=). If you're looking to make a comparison that uses a "between" or even only "less than and equal to" (<=) or "more than and equal to" (=>), you may need to customize your SSIS solution.
- For very large SCD dimensions, the Slowly Changing Dimension component in SSIS may not perform as well as using conditional transforms and lookups on filtered data. You should consider looking into the Table Difference component developed by Alberto Ferrari to help you there.
In any case, be prepared to do a lot of "cleansing" work. There's nothing like the QA that goes into checking the Slowly Changing Dimension to help you find all sorts of mistakes that may occur in the OLTP DB.
Still I have to admit, after you go through it all and emerge on the other side, there is nothing like the development of a Slowly Changing Dimension type 2 to make you feel like you've come a long way in BI development.