Sunday, July 21, 2013

Data Flow Task

In the last couple of articles we explored the internal architecture of SSIS and SSIS Designer.
In this article we shall look into one of the most important task in SSIS control Flow that drives actual data transfer - The Data Flow Task

Though Data Flow Task(DFT) may seem like just another task in the Control Flow Toolbox pane, but the scope/potential of this task is much higher than most of the other tasks.
As the name suggests this is where we define the flow of data from Source(s) to Destination(s).

Let us explore the Data Flow Task’s window/designer and understand the various components in it.

Consider a scenario where we have an Employee data stored in a text file which needs to be sorted on Employee Name and eventually load into a SQL Server table.

Let’s see how we can accomplish the above objective using the SSIS Data Flow Task. And use this implementation to understand the various components in the DFT.


Tool Box – Data flow tool box provides three different types of components
a.       Source Components
b.       Transformation Components and
c.        Destination Components

Source Component- Data flow source components pull the data from different external data sources and make it available to the other components in the data flow. There are many data flow components available in the tool box like Excel Source , OLEDB Source. In the above scenario since we have the Employee data stored in a Text File we have used Flat File Source Component.

Data Flow Transformations- Data Flow Transformations are used to perform tasks such as updating, summarizing, cleaning, merging, and sorting of the data (on the go). In the above scenario we have used a SORT transformation.

Destination Component - A  Data flow Destination component will load the data to a specific data store. A data flow can include multiple destinations that load data into different data stores. For e.g. we can have Excel Destination , SQL Server Destination. In the above scenario we have used OLEDB destination that will write the data in the SQL Server Table.

Connection Manager(Source)- Connection Managers are primarily the objects/components that store the address of Source/Destination. In the above scenario Connection Manager(Source) will hold the information of the location of the source file, format of the file etc.

Connection Manager(Destination)- In this scenario we have used an OLEDB connection manager which will hold the connection string to the SQL server database where we have the table.

Please note that a Source or a Destination component will only point to a particular Connection Manager and it is the Connection Manager object that actually stores the physical information of the File/Database.
This indirectly means that a single connection manager object can be reused across multiple Source/Destination components.

We shall dig deep into various Source, Destination and Transformation components in the near future.


No comments:

Post a Comment