Sunday, July 21, 2013

Merge, Merge Join and Union All Transformations

Let us continue our exploration on some more commonly used Data Flow Transformation used in SSIS.

In this article we shall look into Merge and the Merge Join Transformation.

(Although the names may look similar but from a functionality perspective they are entirely different).

The Merge Transformation:
The Merge Transformation merges two sorted data sets and outputs a sorted data set.
type.
E.g.: Let us consider a scenario where we have the sales data coming from India and US geographies and the requirement is to have consolidated output data sorted on Product Name.
In the below case the Unsorted data from the two sources will be sorted using a Sort transformation and will become the inputs to the Merge transformation, which will consolidate the two data set and will give sorted output.

                                            Data Flow Designer                                                                                                     

     

Quick peek at the actual Data

The Merge transformation requires that the merged columns in its inputs have matching metadata.
For example, we cannot merge a column that has a numeric data type with a column that has a character data

The Merge Join Transformation:
The Merge Join Transformation joins two data sets on a common key (or keys).
It is similar to the SQL Join like Inner Join, Full Outer Join and Left Outer Join, however the two input datasets can belong to two different sources altogether.
The Merge Join Transformation requires sorted data for its inputs and the sorting should be done on the column that would be used as Join Key.

E.g. Consider a scenario where we have a Table A(Employee) with columns EmpId, Emp_Name and Dept_Id and Table B(Department) has Dept_Id and Dept_Name. The requirement here is to have final data(Stored in SQL table) showing EmpId and Emp_Name from Table A and Dept_Name from the Table B.
In order to achieve the above objective we will use Merge Transformation which will join(Inner Join) the two data set on the basis of common key(Dept_Id) column and also provides the option to select the columns which we want in the final result set.

                                         Data Flow Designer                                            Quick peek at the actual Data
                                 
*Please note that:
·         We can avoid the Sort transformation if we are providing a sorted input to Merge Join Transformation. However we need to make sure that IsSorted Property in the SourceComponent is set to true.
·         In the Source component of a Merge Join it is necessary to configure the SortKey Position of the Join-Key column(Dept_Id in this case) to “1”.

The Union All Transformation:
The Union All transformation also provides the same functionality which Merge Transformation do. But they differ in below aspect
1.       In order to use Union All we don’t need to have a Sorted input as in the Merge Join.
2.       Union All can have more than two Input datasets while Merge transformation can handle only two.
3.       Output of the Union ALL is not sorted.

No comments:

Post a Comment