DCSIMG
SSIS 2008 Performance and Scalability - Ella Maschiach's BI Blog

Ella Maschiach's BI Blog

BI, Olap Technologies, Tools, System Analysis and Design

News

About Ella Maschiach

Business Inteligence

SSIS 2008 Performance and Scalability

This is continued live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv of Data Integration in Large Organizations.

 

SSIS 2008 Performance and Scalability

 

The importance of measuring to improve performance

Understand and measure hardware:
How many CPU cores?
How much memory?
How fast is the I\O subsystem?

Understand the potential bottlenecks:
Understand the limits of the Source System (usually a bottleneck as it's usually slower than the new system)

Gains from the source system:
Find better drivers
Configure the driver
Optimize I\O and network configuration

Measure Speed per Connection: Row / sec = Row Count / Time of the data flow  

Improving the speed of the Source
Use multiple connections (not possible for a text file, but is possible for a DB with a query that takes CustID 1-1000 and then 1000- 2000 etc)
Convert the data from string to the relevant type (like int) already at the source - can create better performance.

FastParse - doesn't have any error handling. We have to guarantee that the integers are indeed integers else we won't get the error and it just won't execute. We gain better speed.

Don't assume you know where performance is best. Test it to see where it will be best - source, transformation or destination.

SSIS will take all the memory possible for it. There is no memory management for it at the current version, though it is planned for future release.

 

Tuning the Data Flow

Use the NOLOCK hint to remove locking overhead in large table scans
SELECT only columns you need

Network Tuning
Change the network packet size in the connection manager - put in the max of 32767, cut it by half and see how it changed performance.

Make data types as narrow as possible
Casting data types is expensive, especially in dates
Think of money, float and decimal - consider how effective they are.

SQL destination - guarantees great performance if the DB is on the same server as the SSIS.

Use partitions and partitions SWITCH load data in parallel and in great performance and then use SWITCH to connect all the tables together.

Even if you use split, you don't have to union all them. You can load the data in parallel to the DB. This is dependant of having an index on the table.

Simplify the logic as much as possible.

Change the Design
Don't Sort unless you absolutely have to (a time you do need - joining data when one source is text file that needs to be sorted). Usually you don't need to use Sort. Use the capabilities of the Source System (indexes, set based operations).
Sometimes T-SQL is faster - taking advantage of Indexes or set operations: an Update by T-SQL, group by, sum.

Delta detection: if more than 10% of your data has changed - reload it!

Do minimally logged operation if possible.

Designing for parallelism
Partition the problem: partition source data, partitioning the target table.
Design to stay in memory.

Schedule it correctly.
Right now, we're using SQL Agent, suppose to get a scheduler for the next version of SSIS.

Lookup in SSIS gives good performance, close in performance to T-SQL.

Comments

No Comments

Leave a Comment

(required) 

(required) 

(optional)

(required) 


Enter the numbers above: