Sunday, July 21, 2013

The Lookup Transformation:

The Lookup transformation performs lookups by joining data in the input columns (the primary dataset) with columns in a reference dataset. The reference dataset can be an existing table, a file, and so on.
The Lookup is similar to a Merge Join transformation, but it does not require sorted inputs and is usually more efficient when the reference dataset is small enough to be cached in memory.

Before we dig into further technical details, let us try to understand the Lookup through an example.

Consider a scenario where we have information on all Indian sports personalities along with the sport each one is associated with (the primary dataset).
In the second dataset we have information on all the events that are part of the London Olympics 2012 (the reference dataset).
We want the details of only those sports personalities who can participate in the London Olympics 2012 to be transferred to another database for further analysis.

                              


To achieve this, we will use the Lookup transformation. The primary dataset is its input, and the reference dataset is configured in the Lookup transformation along with the Lookup column (in this case "Event_Name").
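To make the idea concrete outside of SSIS, here is a minimal Python sketch of what the Lookup does in this scenario. It is only an illustration: the rows, the player names and the function `lookup_matches` are hypothetical (the original tables are not reproduced here), and it uses plain exact string comparison on "Event_Name".

```python
# Hypothetical sample rows; names and values are made up for illustration.
athletes = [  # primary dataset
    {"Name": "Player A", "Event_Name": "Boxing"},
    {"Name": "Player B", "Event_Name": "Cricket"},
    {"Name": "Player C", "Event_Name": "Badminton"},
]
olympic_events = [  # reference dataset
    {"Event_Name": "Boxing"},
    {"Event_Name": "Badminton"},
    {"Event_Name": "Archery"},
]


def lookup_matches(primary_rows, reference_rows, key):
    """Return the primary rows whose key value exists in the reference rows."""
    # Collect the reference keys once so each primary row is a fast set lookup.
    reference_keys = {row[key] for row in reference_rows}
    return [row for row in primary_rows if row[key] in reference_keys]


matched = lookup_matches(athletes, olympic_events, "Event_Name")
for row in matched:
    print(row["Name"], row["Event_Name"])
```

Only the rows whose "Event_Name" appears in the reference dataset are kept; the "Cricket" row is dropped because there is no matching event.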


Additional Info
·         The Lookup column in the primary and reference datasets must have the same data type.
·         If the reference table has multiple entries for a value (duplicates in the Lookup column), the Lookup stops searching as soon as it finds the first match.
·         If a reference is not found in the Lookup (in this case "Cricket"), we can configure the Lookup to redirect those rows to another destination, fail the component, or simply ignore the failure.
·         The Lookup also provides options to cache the reference dataset locally, which can significantly improve the performance of the package.
(There are three cache modes: a) Full Cache, b) Partial Cache, c) No Cache.)
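The points above can be sketched in Python as well. This is a simplified model under stated assumptions, not how SSIS is implemented: `build_cache`, `run_lookup`, the `on_no_match` values and the "Event_Id" column are all hypothetical names. Building the dictionary once up front is loosely comparable to the Full Cache mode, and the "ignore" branch assumes the row continues with an empty (None) lookup value.

```python
def build_cache(reference_rows, key):
    """Load the reference rows into a dict once, keeping the first row per key."""
    cache = {}
    for row in reference_rows:
        cache.setdefault(row[key], row)  # later duplicates are skipped
    return cache


def run_lookup(primary_rows, cache, key, value_column, on_no_match="redirect"):
    """Return (matched, unmatched) lists according to the no-match setting."""
    matched, unmatched = [], []
    for row in primary_rows:
        ref = cache.get(row[key])
        if ref is not None:
            matched.append({**row, value_column: ref[value_column]})
        elif on_no_match == "redirect":
            unmatched.append(row)  # send the row to a separate output
        elif on_no_match == "fail":
            raise LookupError(f"no match for {key}={row[key]!r}")
        elif on_no_match == "ignore":
            matched.append({**row, value_column: None})  # assumed behaviour
        else:
            raise ValueError(f"unknown on_no_match: {on_no_match!r}")
    return matched, unmatched


# Hypothetical data: "Boxing" appears twice in the reference rows.
reference = [
    {"Event_Name": "Boxing", "Event_Id": 1},
    {"Event_Name": "Boxing", "Event_Id": 2},
    {"Event_Name": "Badminton", "Event_Id": 3},
]
primary = [
    {"Name": "Player A", "Event_Name": "Boxing"},
    {"Name": "Player B", "Event_Name": "Cricket"},
]

cache = build_cache(reference, "Event_Name")
matched, unmatched = run_lookup(primary, cache, "Event_Name", "Event_Id")
print(matched)
print(unmatched)
```

With the default "redirect" setting, the "Boxing" row picks up the first matching reference row and the "Cricket" row ends up in the unmatched list.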
