Archive

Posts Tagged ‘#SSISPackageOptimization’

OLE DB versus ADO NET Components in SSIS

ETL Package developers often ignore this question: What is the difference between OLE DB and ADO NET components? Should I use OLE DB Source/Destination or ADO NET Source/Destination? Which one is faster?

Let me explain what the difference is. The only difference is both of them use drivers/providers behind the scene, which have been developed using different technologies. ADO NET providers use ADO.NET along with C# behind the scene, while OLE DB providers use something else. Recently, the Microsoft said that they would not be enhancing the features in OLE DB providers. So, don’t expect more with OLE DB. Hence, you need to use ADO NET providers more and more. Don’t get scared! In fact, if you wish to connect to Azure SQL Database, you need to use ADO NET providers, as the OLE DB doesn’t offer flexibility to connect to Azure SQL Database or Data Warehouse. However, the OLE DB has got more providers to connect to different types of data stores compare to the ADO NET, at least as of now.

As far as performance is concerned, I personally did R&D with both the Sources/Destinations with huge data i.e. 1-5 GB, Database to Database, Database to File, File to Database, all types of package. Eventually, the statistics say that OLE DB has got an edge over ADO NET, and give you 60% more performance than that of ADO NET. Yes! ADO NET Sources/Destinations process data slower than OLE DB. Here, I would like to mention that I tweaked all option like DefaultBufferMaxRows, DefaultBufferSize, AutoAdjustBufferSize for SSIS 2016, etc. and then reached to this conclusion. And my experience of package development corroborates with this.

Hence, I recommend using OLE DB Source/Destination for the time being, until Microsoft enhances the ADO NET providers. Yes! You may need to use ADO NET in case of Azure, which is becoming very common now-a-days as every client is moving databases to Azure.

There are many other factors which affect the package performance, which I am going to discuss in other blogs. So, stay tuned!

Hope this helps you out.

 

Advertisements

Why Not to Use Sort and Aggregate Transformations in SSIS

I often come across queries about the SSIS package optimization techniques. There are many, and all depends up on scenarios. However, there are a few fundamental principals to architect SSIS packages. These help us to design a package, which runs faster. I am going to tell about two transformations that I always avoid: Sort Transformation and Aggregate Transformation.

You can categorize all the transformation in two broader categories: Synchronous and Asynchronous. Or, you may divide them into three types: Non-Blocking, Semi-Blocking and Blocking. It’s not right place to discuss all Categories/Types here. I will be discussing only Blocking Transformations here. Just for the sake of your knowledge, the Non-Blocking Transformation belong to Synchronous category, while Semi-Blocking and Blocking belong to Asynchronous category.

The Blocking transformations accept entire input at first i.e. all rows from source, perform the assigned operation, and then give the output of the operation. During this process, you have to wait for the transformation to collect entire required rows before getting output/desired result. This is where it bring degradation in the Data Flow performance, and in turn, slow package.

Both the Sort and Aggregate transformations belong to the Blocking types. This is the reason I always recommend not to use them altogether in your pipeline. Then question arises: How to sort the rows and, if required, to get aggregate?

There are ways! In case of OLE DB Source and SQL Server as source, you need to use SQL query, not the Table Name from drop down list, and perform these operations in the query. These operations will run at the database engine machine and will give you output instantly. In case of flat file/raw file, request your team to calculate these values and put it either in anther file or in first one/two lines into the source file. I am sure you know how to read/escape initial lines while reading flat files. There is an option available. Check it out! This is how you can avoid using Blocking transformations like Sort and Aggregate and, in turn, achieve better package performance.

I will be blogging about other techniques to optimize the SSIS package. So, stay tuned!