May 2010 - Posts
This is continued live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv of Data Integration in Large Organizations.
The roadmap for SSIS 2011 - what will exist in the future version of Integration Services 11
Today everything exists in a package.
In SQL Server 2011 there is an SSIS Server which uses SQL Server OS to manage data flows without being confined to a package. Most of the work is around building the server and the tools to manage the server.
A new designer interface which will be built around common tasks.
A designer with a walkthrough according to the functions or transformations that you've added.
MetaData management:
- Where does the data come from
- What is affected from the package downstream - cube, SharePoint list
This is continued live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv of Data Integration in Large Organizations.
SSIS 2008 Performance and Scalability
The importance of measuring to improve performance
Understand and measure hardware:
How many CPU cores?
How much memory?
How fast is the I\O subsystem?
Understand the potential bottlenecks:
Understand the limits of the Source System (usually a bottleneck as it's usually slower than the new system)
Gains from the source system:
Find better drivers
Configure the driver
Optimize I\O and network configuration
Measure Speed per Connection: Row / sec = Row Count / Time of the data flow
Improving the speed of the Source
Use multiple connections (not possible for a text file, but is possible for a DB with a query that takes CustID 1-1000 and then 1000- 2000 etc)
Convert the data from string to the relevant type (like int) already at the source - can create better performance.
FastParse - doesn't have any error handling. We have to guarantee that the integers are indeed integers else we won't get the error and it just won't execute. We gain better speed.
Don't assume you know where performance is best. Test it to see where it will be best - source, transformation or destination.
SSIS will take all the memory possible for it. There is no memory management for it at the current version, though it is planned for future release.
Tuning the Data Flow
Use the NOLOCK hint to remove locking overhead in large table scans
SELECT only columns you need
Network Tuning
Change the network packet size in the connection manager - put in the max of 32767, cut it by half and see how it changed performance.
Make data types as narrow as possible
Casting data types is expensive, especially in dates
Think of money, float and decimal - consider how effective they are.
SQL destination - guarantees great performance if the DB is on the same server as the SSIS.
Use partitions and partitions SWITCH load data in parallel and in great performance and then use SWITCH to connect all the tables together.
Even if you use split, you don't have to union all them. You can load the data in parallel to the DB. This is dependant of having an index on the table.
Simplify the logic as much as possible.
Change the Design
Don't Sort unless you absolutely have to (a time you do need - joining data when one source is text file that needs to be sorted). Usually you don't need to use Sort. Use the capabilities of the Source System (indexes, set based operations).
Sometimes T-SQL is faster - taking advantage of Indexes or set operations: an Update by T-SQL, group by, sum.
Delta detection: if more than 10% of your data has changed - reload it!
Do minimally logged operation if possible.
Designing for parallelism
Partition the problem: partition source data, partitioning the target table.
Design to stay in memory.
Schedule it correctly.
Right now, we're using SQL Agent, suppose to get a scheduler for the next version of SSIS.
Lookup in SSIS gives good performance, close in performance to T-SQL.
This is continued live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv of Data Integration in Large Organizations.
SSIS 2008 improvements over SSIS 2005
C# scripting can reference all .net assemblies
VSA replaced by VSTA
Pipeline Scale Up
Tuning SSIS through looking at the buffers on the Data Flow.
In SSIS 2005 you would add buffers and break up flows
SSIS Tip:
Tools > Options > BI Designers > IS Designers:
Control Flow Data Connect
Data Flow Data Connect
Check them both to connect the components automatically
Use Row Sampling component to do performance testing.
One buffer structure goes all through the process.
We load data in the beginning but we don't need it to move in memory. The problem is when one buffers slows down the process, it slows down for all the buffers before him as well.
In 2008 we have benchmarks for each buffer and if needed, create buffers artificially (rather than manually as we had to in 2005).
Asynchronous component - the input and output are different. We don't know what have in the middle.
Sort - asynchronous, creates a new buffer as you can exclude a column.
The other components don't need to be held up when an asynchronous component doesn't run quickly, as they don't depend on it.
Advanced Editor for the component: SynchronousInputID exists for synchronous components (and not for asynchronous).
ADO .net support - for improved source UI and added destination adapter (enabling you to use SSIS to load data into providers other than SQL Server).
The performance is still 30% slower because of the use of ADO .net provider, but it enables greater flexibility.
Use OLEDB providers as is, preferably don't use .Net OLEDB providers through ADO .net providers unless you have to.
When you connect using the ADO .net, you will only be getting a list of tables. If the data isn't in shape of tables, you won't see anything in the dropdown list.
Use ADO .net if you have to use ODBC, or if the component you're using requires it, or if you need it for your Script component.
Your default should still be OLEDB.
Import \ Export Wizard is much smarter:
ADO .net support for source and destination
Data types conversion - does only accurate conversions
Starting Import \ Export from the program menu enables straight execution. Starting it in SSIS Package requires execution to be done manually.
Persistent cache usage reducing DB and memory usage.
Cache in Lookup enabling more control over sharing and lifetime of reference data.
Cache potential cons - requiring more disk storage and data in it available only for SSIS.
New to SQL Server 2008 that's useful for SSIS 2008:
SQL Merge Syntax
Change Data Capture CDC (works only against SQL Server or through partner product against other sources)
Enhanced supportability:
In case SSIS crashes, SuperDump collects all the data needed to analyze why it crashed. Enables quicker time to analyze and treat the problem.
Data Profiling which enables you to react to the data quality. Enables analysis of the amount of nulls in a column, the uniqueness of a field etc.
Community Samples on Codeplex - code for you to use freely (SharePoint List adapter for instance).
More connectivity out of the box in SSIS 2008. The Attunity connector which is inside SSIS 2008 moves data into Oracle quicker than with the connector Oracle gives you.
This is live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv. I would really like to thank Hamada who lent me his netstick so I can blog - Thanks Hamada!
Business Intelligence is about people not about IT.
Your data is always connected to real world and in direct connection to it. It's important to know your business for you to design a good Decision Support system.
BI has always been in the middle of the spectrum - tactical decisions. CEO and operational workers will usually look just at your BI reports, they won't create them themselves.
For the CEO, you have strategic decisions - Dashboards and Scorecards. Getting data aggregated.
For Operational Workers, you need embedded reports, analytic APIs so that they can get the info from where they're working (no special knowledge needed). Getting the data to the right place.
SSIS for the operational level. SSAS to model the operational data giving a consistent view of the data for the entire organization. PerformancePoint Services giving the high level strategic view.
Decisions flow from the strategic level down to the operational end. Traditional BI from the tactic level may move up to the strategic level and change their decision.
Questions you need to ask yourself: Who am I giving the info to? What level of info do they need? How will they use the info to make decisions?
Prototype a BI project. Test the accuracy and performance of the DW and also see that people actually use the solution you developed. Do Beta testing to see how people use the reports - look at the query log their using and tune the DW according to that.
An outer user should be treated differently only from the security aspect, but except for that you treat them just as any other user. This is unless you just give them a report rather than let them use freely the data.
Operational - won't learn new application - reports, embedded functions, mobile BI
Tactical - reports, cubes, annonationas and SharePoint workflow.
Starategic - PerformancePoint Services, annonationas and SharePoint workflow.
SharePoint is the largest growing server in Microsoft. SharePoint is growing because people need to share info and work on it together in a managed way. SharePoint 2010 has social features as well, as collaborating through social networks has become more and more important. Where you're collaborating and where you're sharing, is now also where you use your Business Intelligence, which is also integrated into SharePoint.
1. IT oversight - how much control IT has
2. Organizational alignments - coordination with the goals of the organization
3 S of a DWH:
Scale
Standards - built to. Critical. Laws for instance about HR DWH (for instance) and who do you allow against it (Security).
Skills
DWH and Reports - high on 1 + 2
Spreadsheets & desktop databases - low on 1 + 2
Between the two there exist:
Shared sheets, Databases
Departmental Data Marts
Sharepoint as a way to bind all of these together.
New to SQL Server 2008 R2 - Master Data. Master Data is not a DW. DW is for analysis and hold data of current state + history. Master Data is about the current state of the Business Entity - is for operational use. Master Data - keeps the data synchronized for all the departments.
Next version of SQL Server in 2011 will include more Data Quality capabilities based on abilities acquired from Zoomix.
ETL - Extratct Transform Load. Moving data in the organization. SSIS.
EAI - Enterprise Application Integration. Moving messages in the organization. Biztalk.
Biztalk and SSIS work both on 3 layers: Data, Logic, Presentation. Coordinate on all levels.
Request and Reply - with ETL updates. Cross reference table is updated by an ETL process in batch mode.
It's been three years that me and the blog have been together (truthfully, the date was on the 15th, and I didn't forget honey, I was just really busy with work, so you understand now don't you?... Ok, I'll get you flowers :))
I have to admit that looking back at older posts is sorta weird, cause I think my own voice in my writing has changed since then. But then again, I guess you never stop changing your own voice in your writing, just as you never stop changing yourself.
I also really learned to appreciate the blog more, as time passed. Just a few weeks back, Microsoft had organized a meeting for their IT bloggers. I was there. And it really made me look at all I got from the blog - both professionally and personally. You don't expect that from a blog, or at least I didn't...
Still, I know that you, yes you, my dear reader, have also contributed to this. I have gotten along the years quite a few comments on my posts that lead to new posts. That's because you helped me with your comments look into something interesting or challenging. That's how my post about Recursive Sum for a Matrix in SSRS started, back in the day.
Also, I understood, you really like free stuff - so it's not just me (yay! :))
And that you're also interested in data.
I would really like to hear from you personally. What is it you would like to read about most in this blog?
Are you looking to read more about best practices of design and development? More about the fundamentals of MDX (or perhaps DAX)? More about Report Builder in its newer versions? More free stuff? More videos and less text?
Post me and the blog a comment - we'd love to hear from you.
Ok, now enough with the gushing - hit it Fatboy Slim!
קהילת ה BI כולה בבאז ממפגש ה BI User Group הקרוב. המפגש יהיה ביום שני הקרוב, ה 31 לחודש. אישיות בכירה ממיקרוסופט עומדת להגיע ולהרצות על Data Mining. המפגש הוא ללא עלות.
להרשמה ולפרטים נוספים אנא פנו לדף אירוע ה BIUG.
Hot off the press! I just read on Teo Lachev's blog that Microsoft SQL Server 2008 R2 Feature Pack RTM - ed. Yeah, I know! I have to admit this is also slightly amusing for me, as just this morning I asked a Microsoft consultant if he knew when it's suppose to drop...
So you know what this means, you can now download a standalone version of Report Builder 3.0, PowerPivot for Excel and Data Mining Add-ins for Office 2007. Each can be downloaded by itself or as a whole package.
If you're looking to check out all the new goodies in Report Builder 3.0 and SSRS 2008 R2, I would also recommend you sign up to the Prologika forum. You can download from there the lecture Mr. Lachev gave at Atlanta.MDF on Reporting Services 2008 R2. The zip file includes not just the slides but also report samples for each and every new feature he covered.
Just don't forget first the Microsoft SQL Server 2008 R2 Feature Pack download.
Office 2010 RTM - ed on the end of April. You probably heard already about PowerPivot and Excel 2010. And of course there's also SharePoint 2010 which has Excel Services, Visio Services, Access Services, PerformancePoint Services and PowerPivot integration.
Still, that doesn't make for the entire BI scope in Office 2010. I have just recently learned that there are BI capabilities in Project Server 2010.
Going deeper I have learned that there were already such capabilities in Project Server 2007 which integrated with PerformancePoint Services 2007... still, never too late to learn something new :)
Project 2010 has multiple OLAP Database support as well as Reporting Services and Excel reports. Putting it all together on the Project Web Access, you can get a pretty impressive dashboard on how work is progressing in the different projects in your company. You can monitor money spent on the work done, create KPIs and use data bars and sparklines to further enhance visualization in your reports on your different organizational projects. MSProject sure came a long way since the 2003 version I'm currently using...
You can read more about BI in Project 2010 and also see the different Services that work together in SharePoint 2010. For an even broader view, you can join the Office 2010 virtual launch event taking place today (May 12th 2010). Recordings will also be available for some of the sessions in the event.
I think Microsoft is really making big steps in turning the slogan of "BI for the masses" into a reality.
The municipality has been planning to move to SQL Server 2008 R2. Now obviously I'm happy about it, but it also made me wonder about our Report Builder 1.0 models. I know Report Builder 3.0 doesn't have all the functionality that Report Builder 1.0 has. In some ways Report Builder 3.0 outdoes version 1.0 and in other ways it's lacking.
I was deliberating on how the move of upgrading to SQL Server 2008 R2 may affect us in the municipality. I mean, does that make our Report Models from 1.0 obsolete? Maybe there's no point in continuing to develop Report Models, as the next Report Builder versions (both 2.0 and 3.0) are not necessarily model driven as version 1.0 was.
So what is a girl left to do, but put up a question in the MSDN SQL Server Reporting Services Forum? Which is exactly what I did :)
I wanted to get a better insight into the Microsoft strategy for Report Builder in general. So, I understood from the answers that support for Report Builder 1.0 will not stop and that both Report Builder 2.0 and 3.0 support models. After you open a report with Report Builder 3.0 (or 2.0) you will not be able to modify it with Report Builder 1.0 again, as there is no backward compatibility.
Not too long ago, I saw that in his post regarding SQL Server 2008 R2, Robert Bruckner had put up a Report Builder 3.0 FAQ document. There are quite a few points there regarding Report Builder 3.0 and how it stands in comparison to Report Builder 1.0, so I guess I'm not the only one with doubts and hesitations... I really recommend you go over that document for a more in-depth and detailed look on the matter. Again, the message coming out of the document is - don't worry about it!
So I guess development will continue as usual here.
In the meanwhile, I'm looking forward to a standalone release of Report Builder 3.0 so I can give it a go with the SQL Server 2008 R2 Express version I already installed.
As you may have read, I've got a ticket to give away for the first SQL & BI convention which is taking place at the end of this month in Tel Aviv. Yes, I'm going to give you a ticket which will grant you a pass to a day of lectures about all the new features in SQL Server 2008 R2. The lectures in the convention are aimed at helping you learn of the new features and how they can help you in your day to day work. The ticket to a day at the convention is worth 900 shekels - but for the lucky winner it's free!
Keeping with the theme of free stuff - though I really love free stuff, sometimes it's worth paying for what you get. So to get a free ticket, you need to answer two questions regarding the difference between SQL Server Express (which is free) and the "regular" SQL Server versions (which you need to pay for):
-
Give me at least one difference that exists between SQL Server Management Studio 2008 R2 Express and SQL Server Management Studio 2008 R2
- Give me at least two examples of the abilities of Reporting Services in SQL Server 2008 R2 Enterprise Edition and SQL Server 2008 R2 Express with Advanced Services
I'll draw a name out of the hat from those who answered correctly the two questions above. The lucky winner will get to go to the SQL & BI convention and see Donald Farmer lecture on all the treats waiting for us in SQL Server 2008 R2.
So what are you waiting for?
Update May 5th 2010: I understood yesterday from Microsoft that I have a ticket to give away for just one of the days of the convention - sorry everybody. The Post was updated accordingly.
Update May 10th 2010: Just so there won't be any copying, I would really appreciate it if each one of you answering the question would give an answer slightly different to the one given before. As there are more than a few differences between the two versions, (and for the time being just one person answered), I believe this is possible.