DCSIMG
February 2010 - Posts - Yaniv Mor

Yaniv Mor

On SQL Server and Business Intelligence

February 2010 - Posts

In the cloud…

Yesterday I presented at the Israeli BI User Group at Microsoft Headquarters in Raanana. I gave a presentation on the Data Profiling Task in SSIS and how to dynamically configure it and use it to be a part of your ETL workflow logic. I hope the crowd enjoyed it and that I was able to add value to at least some of the attendants. Meir Dudai from “Valinor” also gave a presentation about SQL Azure and the Cloud Computing concepts. Cloud computing is definitely one of the coolest buzzwords at the moment and from a second hand impression I got on this technology, using the Amazon MapReduce platform I know it is addressing a plethora of issues organizations are facing, dealing with massive data volumes that require processing and storing capacity.

I haven’t looked at the SQL Azure product, besides from what Meir presented, and from the looks of it, Microsoft still has a long way to go until this product becomes actually usable by the organizations described above. The multitude of features which are absent, the ridicules size limitations (10GB per database) and the throttling of queries which consume too much resources are nothing short of a showstopper to anyone who is considering testing, let alone moving to this environment. However, I believe these limitations are merely the signs of a product which is simply “not there yet” and will be removed as the product matures. I see great possiblities with utilizing Cloud Computing in an OLTP environment and even more so in a DWH/BI environment, where the requirements for transporting and manipulating large data sets is really a run of the mill activity and is occurring on a day by day, or even on an hour by hour basis. I also believe Cloud Computing can get us even closer to all those terms we started using several years ago, such as Real Time Data warehouse and/or OLAP. Terms I don’t think the industry has actually been able to deliver in a reliable way up until now (I’m not talking about really small data marts, of course, rather on large-scale, with TBs of data involved…)

Having said all that, I will start looking more closely on this technology, and on SQL Azure in particular, to see how the SQL Server BI product stack can be integrated with it and benefit from it.

-Yaniv

(This is post number 14 for week number 13 of my SQL blog challenge)

SSIS – Making the Text File Destination Dynamic

One of the data flow destination objects in SSIS is the Flat File destination object. You use this destination object when you export data from a source (SQL Server, Oracle, or anything else for that matter) to a flat file. This can come handy when you maintain an Operational Data Store (ODS) in a Data Warehouse environment and you need to store data in text files before loading it to the Data Warehouse database for example or when you need to distribute a segment of data to the outside world in an easy, accessible and readable format.

In SSIS, this Flat File Destination object needs to be linked with a Flat File Connection Manager. In this object you define the location of the file, the code page and the structure of the file (columns, data types, data length etc.). The problem with using this approach is that once a Flat File Connection Manager has been configured, the file structure has to be maintained and adhered to for the duration of the lifetime of the package. This means that if the source data has changed in any way, for example a data type change or an addition of a column, you will need to modify the connection manager to comply with these changes.

There is another option of how to deal with these kind of data changes using a Script Component as a Destination object. This component still needs to be linked with a Flat File Connection Manager, however, this connection manager needs to contain a single, blank, generic column and you do not need to add columns to this object at all.

When you configure the Script Component, you can use the StreamWriter Class to access the file defined in the Flat File Connection Manager and start appending data to it, based on the source data columns. This way, when you have a new column, you only need to modify the code in the Script Component and not the file definition in the connection manager.

At the end of the day, you cannot escape changing sections of your SSIS package when meta data changes, and its a personal preference whether you choose to use the connection manager configuration or the Script component to do so. Personally, I find the Script component changes more manageable.

As a side note, I have made some performance tests using the Script component as a destination as opposed to a Flat File destination and the results were similar with a slight preference towards the script component, and this is yet another reason (for me) to use it.

-Yaniv

(This is post number 13 for week number 12 of my SQL blog challenge)

Technorati Tags: ,

SSIS – Importing data from an Oracle data source

A quick note this week: this may be a known issue for some of you, I have only got to know this little trick the other day when I had to import data from an Oracle database to a SQL Server database (believe it or not – I did not have to use an Oracle database as a source up until now…). There were quite a few tables involved and a considerable amount of data to be moved, so the obvious solution was to use SSIS for this task. If you want to use the Microsoft provider for Oracle you have to install an Oracle client on the machine that runs your SSIS packages, so that was pretty easy to accomplish and once the client was installed we started to configure the ole-db source adapter. The data was a mix of numbers and character-type data and the next message that popped up the minute I configured the source was: “Cannot retrieve the column code page info from the OLEDB provider. If the component supports the “DefaultCodePage” property, the code page from that property will be used…”. Needless to say, when you hook this into a SQL Server ole-db destination and try to run the package, it does not run and throws this error again.

So, what to do? You need to change the “AlwaysUseDefaultCodePage” property of the Oracle ole-db source to True. This way the component will indeed use the machine’s default code page and things should go smooth from this point onwards.

-Yaniv

(This is post number 12 for week number 11 of my SQL blog challenge)

Technorati Tags: ,,

PowerPivot – Part 4: what’s inside?

In this 4’th instalment of the PowerPivot blog posts series we’ll take a quick look at PowerPivot internals. Firstly, what you need to understand is that a PowerPivot database is similar to an Analysis Services cube, but it is not exactly the same, certainly not in the way it stores data internally within the Excel file.

There is an easy way to see what is inside an Excel file (thanks to the crowd at the recent Israeli BI User Group meeting…): simply change the Excel file extension from xlsx to zip and then open the “compressed” file to look at its content. What you will see there is a bunch of xml files as well as other files. What we are interested in is the PowerPivot data file, which can be located under the /xl/CustomData/ folder under the name “item1.data” (it may come in other names, I have not made any tests on this, though). You will see that this file is the file that takes up much of the disk space that the Excel file in general takes. So, when you open up an Excel file containing a PowerPivot model. this data file is read into memory and from this point onwards, all the slicing and dicing of data and calculations is done in-memory, which is how PowerPivot models are able to work really fast.

What I could not find in the Excel file zipped content is any trace of the familiar Analysis Services folder structure (such as the map, header files). So, the format is not the same but the behaviour of the PowerPivot model is the same: you can query a PowerPivot model using MDX, you can create named sets and so on. As important is the fact that you can deploy a PowerPivot model to Sharepoint, using the Sharepoint SSAS module and later on you can use this model as a source to an Analysis Services database.

There is an excellent post describing the client architecture of PowerPivot here. You can also read more about how Analysis Services server can hook into a PowerPivot model on BOL here. This pre-release article discusses how SSAS can connect to a Sharepoint PowerPivot model in a VertiPaq mode. The vertiPaq engine is the special PowerPivot data engine that enables the in-memory storage of data and the quick calculations and retrieval of data and aggregations.

-Yaniv

(This is post number 11 for week number 10 of my SQL blog challenge)