<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.microsoft.co.il/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Ella Maschiach&amp;#39;s BI Blog : SSIS 2008</title><link>http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/SSIS+2008/default.aspx</link><description>Tags: SSIS 2008</description><dc:language>en</dc:language><generator>CommunityServer 2007.1 (Build: 20917.1142)</generator><item><title>SSIS 2008 Performance and Scalability</title><link>http://blogs.microsoft.co.il/blogs/barbaro/archive/2010/05/30/ssis-2008-performance-and-scalability-Donald-Farmer.aspx</link><pubDate>Sun, 30 May 2010 19:05:00 GMT</pubDate><guid isPermaLink="false">b5c4f5bc-c09b-4439-a595-91a98c1847df:644473</guid><dc:creator>Ella Maschiach</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.microsoft.co.il/blogs/barbaro/rsscomments.aspx?PostID=644473</wfw:commentRss><comments>http://blogs.microsoft.co.il/blogs/barbaro/archive/2010/05/30/ssis-2008-performance-and-scalability-Donald-Farmer.aspx#comments</comments><description>&lt;span style="FONT-FAMILY:&amp;#39;Tahoma&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:10pt;"&gt;
&lt;p&gt;This is continued live blogging from the Donald Farmer seminar from the SQL &amp;amp; BI convention in Tel Aviv of Data Integration in Large Organizations.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;SSIS 2008 Performance and Scalability&lt;/h1&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;The importance of measuring to improve performance&lt;/h2&gt;
&lt;p&gt;Understand and measure hardware:&lt;br /&gt;How many CPU cores?&lt;br /&gt;How much memory?&lt;br /&gt;How fast is the I\O subsystem?&lt;/p&gt;
&lt;p&gt;Understand the potential bottlenecks:&lt;br /&gt;Understand the limits of the Source System (usually a bottleneck as it&amp;#39;s usually slower than the new system)&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Gains from the source system:&lt;br /&gt;Find better drivers&lt;br /&gt;Configure the driver&lt;br /&gt;Optimize I\O and network configuration&lt;/p&gt;
&lt;p&gt;Measure Speed per Connection: Row / sec = Row Count / Time of the data flow&amp;nbsp;&amp;nbsp; &lt;/p&gt;
&lt;p&gt;Improving the speed of the Source &lt;br /&gt;Use multiple connections (not possible for a text file, but is possible for a DB with a query that takes CustID 1-1000 and then 1000- 2000 etc)&lt;br /&gt;Convert the data from string to the relevant type (like int) already at the source - can create better performance.&lt;/p&gt;
&lt;p&gt;FastParse - doesn&amp;#39;t have any error handling. We have to guarantee that the integers are indeed integers else we won&amp;#39;t get the error and it just won&amp;#39;t execute. We gain better speed.&lt;/p&gt;
&lt;p&gt;Don&amp;#39;t assume you know where performance is best. Test it to see where it will be best - source, transformation or destination.&lt;/p&gt;
&lt;p&gt;SSIS will take all the memory possible for it. There is no memory management for it at the current version, though it is planned for future release. &lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;Tuning the Data Flow&lt;/h2&gt;
&lt;p&gt;Use the NOLOCK hint to remove locking overhead in large table scans&lt;br /&gt;SELECT only columns you need&lt;/p&gt;
&lt;p&gt;Network Tuning&lt;br /&gt;Change the network packet size in the connection manager - put in the max of 32767, cut it by half and see how it changed performance.&lt;/p&gt;
&lt;p&gt;Make data types as narrow as possible&lt;br /&gt;Casting data types is expensive, especially in dates&lt;br /&gt;Think of money, float and decimal - consider how effective they are.&lt;/p&gt;
&lt;p&gt;SQL destination - guarantees great performance if the DB is on the same server as the SSIS. &lt;/p&gt;
&lt;p&gt;Use partitions and partitions SWITCH load data in parallel and in great performance and then use SWITCH to connect all the tables together. &lt;/p&gt;
&lt;p&gt;Even if you use split, you don&amp;#39;t have to union all them. You can load the data in parallel to the DB. This is dependant of having an index on the table.&lt;/p&gt;
&lt;p&gt;Simplify the logic as much as possible.&lt;/p&gt;
&lt;p&gt;Change the Design&lt;br /&gt;Don&amp;#39;t Sort unless you absolutely have to (a time you do need - joining data when one source is text file that needs to be sorted). Usually you don&amp;#39;t need to use Sort. Use the capabilities of the Source System (indexes, set based operations).&lt;br /&gt;Sometimes T-SQL is faster - taking advantage of Indexes or set operations: an Update by T-SQL, group by, sum.&lt;/p&gt;
&lt;p&gt;Delta detection: if more than 10% of your data has changed - reload it!&lt;/p&gt;
&lt;p&gt;Do minimally logged operation if possible.&lt;/p&gt;
&lt;p&gt;Designing for parallelism&lt;br /&gt;Partition the problem: partition source data, partitioning the target table.&lt;br /&gt;Design to stay in memory.&lt;/p&gt;
&lt;p&gt;Schedule it correctly. &lt;br /&gt;Right now, we&amp;#39;re using SQL Agent, suppose to get a scheduler for the next version of SSIS. &lt;/p&gt;
&lt;p&gt;Lookup in SSIS gives good performance, close in performance to T-SQL.&lt;/p&gt;&lt;/span&gt;&lt;img src="http://blogs.microsoft.co.il/aggbug.aspx?PostID=644473" width="1" height="1"&gt;</description><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/ITPro/default.aspx">ITPro</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/Dev/default.aspx">Dev</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/Donald+Farmer/default.aspx">Donald Farmer</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/SSIS+2008/default.aspx">SSIS 2008</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/Scalability/default.aspx">Scalability</category></item><item><title>SSIS 2008 improvements over SSIS 2005</title><link>http://blogs.microsoft.co.il/blogs/barbaro/archive/2010/05/30/ssis-2008-improvements-over-ssis-2005-donald-farmer.aspx</link><pubDate>Sun, 30 May 2010 15:38:00 GMT</pubDate><guid isPermaLink="false">b5c4f5bc-c09b-4439-a595-91a98c1847df:644300</guid><dc:creator>Ella Maschiach</dc:creator><slash:comments>5</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://blogs.microsoft.co.il/blogs/barbaro/rsscomments.aspx?PostID=644300</wfw:commentRss><comments>http://blogs.microsoft.co.il/blogs/barbaro/archive/2010/05/30/ssis-2008-improvements-over-ssis-2005-donald-farmer.aspx#comments</comments><description>&lt;p&gt;This is continued live blogging from the Donald Farmer seminar from the SQL &amp;amp; BI convention in Tel Aviv of Data Integration in Large Organizations.&lt;/p&gt;
&lt;p&gt;SSIS 2008 improvements over SSIS 2005&lt;/p&gt;
&lt;p&gt;C# scripting can reference all .net assemblies&lt;br /&gt;VSA replaced by VSTA&lt;/p&gt;
&lt;p&gt;Pipeline Scale Up&lt;/p&gt;
&lt;p&gt;Tuning SSIS through looking at the buffers on the Data Flow.&lt;br /&gt;In SSIS 2005 you would add buffers and break up flows&lt;/p&gt;
&lt;p&gt;&lt;b&gt;SSIS Tip: &lt;br /&gt;&lt;/b&gt;Tools &amp;gt; Options &amp;gt; BI Designers &amp;gt; IS Designers:&lt;br /&gt;Control Flow Data Connect&lt;br /&gt;Data Flow Data Connect&lt;br /&gt;Check them both to connect the components automatically &lt;/p&gt;
&lt;p&gt;Use Row Sampling component to do performance testing.&lt;/p&gt;
&lt;p&gt;One buffer structure goes all through the process.&lt;br /&gt;We load data in the beginning but we don&amp;#39;t need it to move in memory. The problem is when one buffers slows down the process, it slows down for all the buffers before him as well.&lt;br /&gt;In 2008 we have benchmarks for each buffer and if needed, create buffers artificially (rather than manually as we had to in 2005).&lt;/p&gt;
&lt;p&gt;Asynchronous component - the input and output are different. We don&amp;#39;t know what have in the middle. &lt;br /&gt;Sort - asynchronous, creates a new buffer as you can exclude a column.&lt;br /&gt;The other components don&amp;#39;t need to be held up when an asynchronous component doesn&amp;#39;t run quickly, as they don&amp;#39;t depend on it. &lt;br /&gt;Advanced Editor for the component: SynchronousInputID exists for synchronous components (and not for asynchronous).&lt;/p&gt;
&lt;p&gt;ADO .net support - for improved source UI and added destination adapter (enabling you to use SSIS to load data into providers other than SQL Server).&lt;br /&gt;The performance is still 30% slower because of the use of ADO .net provider, but it enables greater flexibility.&lt;br /&gt;Use OLEDB providers as is, preferably don&amp;#39;t use .Net OLEDB providers through ADO .net providers unless you have to.&lt;br /&gt;When you connect using the ADO .net, you will only be getting a list of tables. If the data isn&amp;#39;t in shape of tables, you won&amp;#39;t see anything in the dropdown list.&lt;/p&gt;
&lt;p&gt;Use ADO .net if you have to use ODBC, or if the component you&amp;#39;re using requires it, or if you need it for your Script component.&lt;br /&gt;Your default should still be OLEDB.&lt;/p&gt;
&lt;p&gt;Import \ Export Wizard is much smarter:&lt;br /&gt;ADO .net support for source and destination&lt;br /&gt;Data types conversion - does only accurate conversions&lt;/p&gt;
&lt;p&gt;Starting Import \ Export from the program menu enables straight execution. Starting it in SSIS Package requires execution to be done manually. &amp;nbsp;&lt;/p&gt;
&lt;p&gt;Persistent cache usage reducing DB and memory usage.&lt;br /&gt;Cache in Lookup enabling more control over sharing and lifetime of reference data.&lt;br /&gt;Cache potential cons - requiring more disk storage and data in it available only for SSIS. &lt;/p&gt;
&lt;p&gt;New to SQL Server 2008 that&amp;#39;s useful for SSIS 2008:&lt;br /&gt;SQL Merge Syntax&lt;br /&gt;Change Data Capture CDC (works only against SQL Server or through partner product against other sources)&lt;/p&gt;
&lt;p&gt;Enhanced supportability:&lt;br /&gt;In case SSIS crashes, SuperDump collects all the data needed to analyze why it crashed. Enables quicker time to analyze and treat the problem.&lt;/p&gt;
&lt;p&gt;Data Profiling which enables you to react to the data quality. Enables analysis of the amount of nulls in a column, the uniqueness of a field etc.&lt;/p&gt;
&lt;p&gt;Community Samples on Codeplex - code for you to use freely (SharePoint List adapter for instance).&lt;/p&gt;
&lt;p&gt;More connectivity out of the box in SSIS 2008. The Attunity connector which is inside SSIS 2008 moves data into Oracle quicker than with the connector Oracle gives you.&lt;/p&gt;&lt;img src="http://blogs.microsoft.co.il/aggbug.aspx?PostID=644300" width="1" height="1"&gt;</description><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/SSIS/default.aspx">SSIS</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/ITPro/default.aspx">ITPro</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/Dev/default.aspx">Dev</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/Donald+Farmer/default.aspx">Donald Farmer</category><category domain="http://blogs.microsoft.co.il/blogs/barbaro/archive/tags/SSIS+2008/default.aspx">SSIS 2008</category></item></channel></rss>