DCSIMG
SSIS Performance – The OLE DB Source Adapter - Yaniv Mor

Yaniv Mor

On SQL Server and Business Intelligence

SSIS Performance – The OLE DB Source Adapter

Technorati Tags: ,

I came across a BOL topic discussing SSIS data flow optimization techniques. For the OLE-DB source component there is a recommendation to use a SELECT statement instead of the "Table or View" access mode. For the sake of clarity, when you use the "Table or View" access method, you get to choose a specific table or view from the database you have designated as your source database, in a drop-down list. When you use the "SQL Command" access method, you can simply type in the query editor any SELECT statement you like.

The obvious advantage of using the "SQL Command" access method is that you can choose which columns to get and which not, thus reducing unnecessary data load on the ETL process. When you choose a specific table in the "Table or View" access mode, all columns from that table will be retrieved so in some, if not most cases this is a thing you would want to avoid. Just to clarify this point, even if you choose the "Table or View" access mode it does not mean that all columns of that table should pass through the data pipeline. You can still remove specific columns, simply by un-ticking them on the "Columns" pane of the OLE DB Source Editor. However, the source still has to consider those columns when it retrieves the metadata for the table. Using the "SQL Command" access mode can remove this overhead. Another benefit of using the "SQL Command" access method is that when the table schema changes, if the columns modified are not included in the SQL statement, then no metadata changes are detected by the OLE DB Source adapter, thus eliminating the need to refresh it.

However, the BOL article states that the "SQL Command" simply "performs better", regardless of the number of columns you choose to retrieve. I decided to check that.

I created a destination table based on the "FactInternetSales" table in the "AdventureWorksDW2008" sample database and named it "FactInternetSales_Destination".

 

CREATE TABLE [dbo].[FactInternetSalesDestination](
    [ProductKey] [int] NOT NULL,
    [OrderDateKey] [int] NOT NULL,
    [DueDateKey] [int] NOT NULL,
    [ShipDateKey] [int] NOT NULL,
    [CustomerKey] [int] NOT NULL,
    [PromotionKey] [int] NOT NULL,
    [CurrencyKey] [int] NOT NULL,
    [SalesTerritoryKey] [int] NOT NULL,
    [SalesOrderNumber] [nvarchar](20) NOT NULL,
    [SalesOrderLineNumber] [tinyint] NOT NULL,
    [RevisionNumber] [tinyint] NOT NULL,
    [OrderQuantity] [smallint] NOT NULL,
    [UnitPrice] [money] NOT NULL,
    [ExtendedAmount] [money] NOT NULL,
    [UnitPriceDiscountPct] [float] NOT NULL,
    [DiscountAmount] [float] NOT NULL,
    [ProductStandardCost] [money] NOT NULL,
    [TotalProductCost] [money] NOT NULL,
    [SalesAmount] [money] NOT NULL,
    [TaxAmt] [money] NOT NULL,
    [Freight] [money] NOT NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [CustomerPONumber] [nvarchar](25) NULL,
CONSTRAINT [PK_FactInternetSalesDestination_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED
(
    [SalesOrderNumber] ASC,
    [SalesOrderLineNumber] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

 

Next thing I created a new SSIS package, added an EXECUTE SQL task to clear the procedure cache and data buffers:

 

Clear_Cache_Pic2

I then added another EXECUTE SQL task to truncate the destination table:

 

Truncate_Table_Pic3

 

Next, I added a data flow task that will retrieve data from the source table and place it on the destination table. I've added no transformations at all to the data flow and I haven't modified any of the data flow default settings.

I executed the package 3 times in each access mode: the "Table or View" access mode and the "SQL Command" access mode, and the results of the test can be seen here(in seconds):

Results_Pic4

As you can see, execution time of the data flow task is much the same, regardless of the access mode chosen.

Next thing I wanted to check was the Profiler. I removed the "Destination" component in the data flow task, because I didn't want to capture the "Insert Bulk" statements, just the data retrieval statement.

Using the "Table or View" access mode, the query to get the data was:

Data_or_View_Profiler_5

Using the "SQL Command" access, the query to get the data was:

SQL_Command_Profiler_6

 

So, here lies the difference: when using the "Table or View" access mode, SSIS issues an ad-hoc SELECT statement on the source table. When using the "SQL Command" access method, it uses an sp_prepare and an sp_execute statements instead. Now, I know that using the sp_prepare statement can benefit performance because the execution plan fits the sent query. I'm also guessing that the fact that in my tests I didn't get to see any performance differences between the two methods is due to the fact that I'm running these tests on my small, isolated laptop and not in a busy server environment.

So, to conclude: do use the "SQL Command" access method, it's better for performance and it's better for manageability and better coding practices.

-Yaniv

(This is post number 1 for week number 1 of my SQL blog challenge)

Comments

No Comments

Leave a Comment

(required) 

(required) 

(optional)

(required) 


Enter the numbers above: