Why Not to Use Sort and Aggregate Transformations in SSIS

I often come across queries about the SSIS package optimization techniques. There are many, and all depends up on scenarios. However, there are a few fundamental principals to architect SSIS packages. These help us to design a package, which runs faster. I am going to tell about two transformations that I always avoid: Sort Transformation and Aggregate Transformation.

You can categorize all the transformation in two broader categories: Synchronous and Asynchronous. Or, you may divide them into three types: Non-Blocking, Semi-Blocking and Blocking. It’s not right place to discuss all Categories/Types here. I will be discussing only Blocking Transformations here. Just for the sake of your knowledge, the Non-Blocking Transformation belong to Synchronous category, while Semi-Blocking and Blocking belong to Asynchronous category.

The Blocking transformations accept entire input at first i.e. all rows from source, perform the assigned operation, and then give the output of the operation. During this process, you have to wait for the transformation to collect entire required rows before getting output/desired result. This is where it bring degradation in the Data Flow performance, and in turn, slow package.

Both the Sort and Aggregate transformations belong to the Blocking types. This is the reason I always recommend not to use them altogether in your pipeline. Then question arises: How to sort the rows and, if required, to get aggregate?

There are ways! In case of OLE DB Source and SQL Server as source, you need to use SQL query, not the Table Name from drop down list, and perform these operations in the query. These operations will run at the database engine machine and will give you output instantly. In case of flat file/raw file, request your team to calculate these values and put it either in anther file or in first one/two lines into the source file. I am sure you know how to read/escape initial lines while reading flat files. There is an option available. Check it out! This is how you can avoid using Blocking transformations like Sort and Aggregate and, in turn, achieve better package performance.

I will be blogging about other techniques to optimize the SSIS package. So, stay tuned!