Skip to main content

Informatica basics


a)      What is Informatica?

Informatica provides an environment that can extract data from multiple sources, transform the data according to the business logic that is built in the Informatica Client application and load the transformed data into files or relational targets.





Informatica comes in different packages:
PowerCenter license has all options, including distributed metadata (data about data).
PowerMart is a limited license and does not have a distributed metadata.

The other products that are provided by Informatica are
PowerAnalyzer which is a web based tool for data analysis.
SuperGlue provides graphical representation of data quality and flow, flexible analysis and reporting of overall data volumes, loading performance, etc.

1.      Architecture:

The diagram provided below provides an overview of the various components of Informatica and the connectivity between them:


Informatica 5.1 provides the following integrated components:

a)      Informatica Repository:
The Informatica Repository is a database with a set of metadata tables that is accessed by the Informatica Client and Server to save and retrieve metadata.
Repository stores the data needed for data extraction, transformation, loading, and management.

b)      Informatica Client:
The Informatica Client is used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create sessions to run the mapping logic.
The Informatica Client has three main applications:

                                i.            Repository Manager: This is used to create and administer the metadata repository.
The repository users and groups are created through the Repository Manager.
Assigning privileges and permissions, managing folders in the repository and managing locks on the mappings are also done through the Repository Manager

                               ii.            Designer: The Designer has five tools that are used to analyze sources, design target schemas and build the Source to Target mappings. These are
§  Source Analyzer: This is used to either import or create the source definitions.
§  Warehouse Designer: This is used to import or create target definitions.
§  Mapping Designer: This is used to create mappings that will be run by the Informatica Server to extract, transform and load data.
§  Transformation Developer: This is used to develop reusable transformations that can be used in mappings.
§  Mapplet Designer: This is used to create sets of transformations referred to as Mapplets which can be used across mappings.

                              iii.            Server Manager: The Server Manager is used to create, schedule, execute and monitor sessions.

c)      Informatica Server:
The Informatica Server reads the mapping and the session information from the repository. It extracts data from the mapping sources, stores it in the memory, applies the transformation rules and loads the transformed data into the mapping targets.

Connectivity:
Informatica uses the Network Protocol, Native Drivers or the ODBC for the Connectivity between its various components. The Connectivity details are as provided in the diagram above.


2.      Setting up Informatica:

                                 i.            Install and Configure the Server components.
                               ii.            Install the Client applications.
                              iii.            Configure the ODBC.
                             iv.            Register the Informatica Server in the Server Manager.
                               v.            Create a Repository, create users and groups, edit users profiles.
                             vi.            Add source and target definitions, set up mapping between the sources and targets, create a session for each mapping and run the sessions.

a)      Configuring the ODBC

                                 i.            Go to StartΓ SettingsΓ Control Panel
                               ii.            Go to Administrative ToolsΓ Data Sources(ODBC)
                              iii.            Click on the System DSN tab and add an entry.
                             iv.            Select MERANT CLOSED 3.60 32-BIT Oracle 8 driver.
                               v.            Provide any Data Source Name.
                             vi.            Provide the tns entry name for the (Informatica) database as the Server Name.
                            vii.            Do a test connect by providing the informatica database userid and password.
                          viii.            Save the settings.


b)      Configuring the Informatica Repository

                                 i.            Open the Repository Manager
                               ii.            Click on RepositoryΓ Add Repository
                              iii.            Provide the Name of an existing Repository and its Username
                             iv.            Click on RepositoryΓ Connect
                               v.            Provide the password for the repository.
                             vi.            Provide the Informatica database details (those provided during the ODBC setup).

                            vii.            Open the Designer
                          viii.            Click on the RepositoryΓ Connect tab.
                             ix.            Provide the password for the repository.
                               x.            The left pane displays the various folders and the Sources, Targets, Mappings, Transformations, Mapplets etc within each folder.
                             xi.            Click on the Mappings tab within any folder, select a mapping and drag it into the right pane to view the mapping.





3.      Case Study

A Transformation is a repository object that generates, modifies, or passes data.
The various Transformations that are provided by the Designer in Informatica have been explained with the aid of a mapping, Map_CD_Country_code. (Explained in blue)
The mapping is present in the cifSIT9i repository of the SIT machine under the folder Ecif_Dev_map

Objective: The mapping Map_CD_Country_code has been developed to extract data from the STG_COUNTRY table and move it into the ECIF_COUNTRY and the TRF_COUNTRY target tables.

a)      Source Definition:

                                 i.            The Source Definition contains a detailed definition of the Source.
                               ii.            The Source can be a Relational table, Fixed width and delimited flat files that do not contain binary data, COBOL files etc.
                              iii.            The relational source definition is imported from database tables by connecting to the source database from the client machine.

·        The Source in the Map_CD_Country_code is “Shortcut_To_STG_COUNTRY”*, a “Source Definition Shortcut”.
·        Right click on the Source and select edit.
·        In the Edit Transformations window, the Transformation tab has the following info:
The circled area provides the location of the object that the shortcut references.
In the above ex, the object referenced by the shortcut is present in the cifSIT9i repository under the Ecif_dev_def folder and the object name is STG_COUNTRY.

·        All fields from the Source are moved into the Source Qualifier.

What are the advantages of having a Shortcut?

The following are the main advantages of having a Shortcut:
ΓΌ  The main advantage of having a shortcut is maintenance.
If all instances of an object have to change, the original repository object is the only object that has to be edited and all shortcuts accessing the object automatically inherit the changes.

ΓΌ  Restricting the repository users to a set of predefined metadata by asking users to incorporate the shortcuts into their work instead of developing repository objects independently.

ΓΌ  Space can be saved in a repository by keeping a single repository object and using shortcuts to that object, instead of creating copies of the object in multiple folders.


For information on creating and working with Shortcuts, refer the Informatica Designer Help.


b)      Source Qualifier (SQ_Shortcut_To_STG_COUNTRY):
    1. The Source Qualifier is an Active transformation.
    2. The differences between an Active and a Passive transformation are as given below:
Active Transformation
Passive Transformation
An Active Transformation can change the number of rows that pass through it
A Passive Transformation does not change the number of rows that pass through it.
Ex.:
·        Advanced External Procedure
·        Aggregator
·        ERP Source Qualifier
·        Filter
·        Joiner
·        Normalizer
·        Rank
·        Source Qualifier
·        Router
·        Update Strategy
Ex:
  • Expression
  • External Procedure
  • Input
  • Lookup
  • Output
  • Sequence Generator
  • Stored Procedure
  • XML Source Qualifier

  • In the SQ_Shortcut_To_STG_COUNTRY, click on the Properties tabΓ  SQL Query.
The SQL Query is the query that is generated by Informatica and is a SELECT statement for each source column used in the mapping. But the Informatica Server reads only the columns in Source Qualifier that are connected to another transformation.
  • In SQ_Shortcut_To_STG_COUNTRY, since all 4 fields ISO_CTRY_COD, CTRY_NAM, EMU_IND, PROC_FLG columns are connected to the EXP_COUNTRY transformation and hence the default SQL Query generated by Informatica would have all 4 columns. In case, one of the fields had not been mapped to any other transformation, that field would not have appeared in the default SQL Query.

  • The ISO_CTRY_COD field from the Source Qualifier is moved to the Lookup transformation LKP_CTRY_COD and all the fields including the ISO_CTRY_COD is moved to the Expression transformation EXP_COUNTRY.


c)      Lookup Transformation (LKP_CTRY_COD)
    1. Lookup transformation is Passive transformation.
    2. A Lookup transformation would be used in an Informatica mapping to lookup data in a relational table, view, or synonym.
    3. The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. The result of the Lookup would then be passed on to other transformations and targets.
·        In the Lookup transformation LKP_CTRY_COD, the input field SRC_COUNTRY_CODE is looked up against the COUNTRY_CODE field of the Lookup table and if the Lookup is successful, then the corresponding COUNTRY_CODE is returned as the output.

How does the Lookup Cache work?
Informatica creates a data cache and an index cache when the first row in the data flow hits the Lookup transformation. This happens only when the Lookup cache option is enabled in the transformation properties.
To create these caches, Informatica issues a SELECT statement against the database where the lookup table resides and extracts all the data it needs for the lookup. After that, whenever a row passes through the lookup, Informatica tries to find a match within the cached data set based on the lookup conditions and input port values for that row.
When the cache option is disabled, Informatica queries the lookup table every time a row passes through the lookup.

Advantages of Lookup transformation over Source Qualifier/Joiner transformation
Lookup transformation helps in fetching data from a table exactly where we need it in the data stream, instead of having to pass the data through every step of the mapping, as it would with a Source Qualifier or a Joiner transformation.

How do we handle multiple matches in the Lookup table?
The Lookup transformation can be configured to handle multiple matches in the following ways:
Ø  Return the first matching value, or return the last matching value
The transformation can be configured to return the first matching value or the last matching value. The first and last values are the first values and last values found in the lookup cache that match the lookup condition.
Ø  Return an error: The Informatica server returns the default value for the output ports.


d)      Expression Transformation (EXP_COUNTRY)
                                 i.            Expression transformation is Passive transformation
·        All fields from the Source Qualifier are moved into the Expression transformation. The COUNTRY_CODE that is the output of the Lookup transformation is also moved into the Expression transformation.
·        O_PROC_FLAG has been set to ‘Y’ in the Expression transformation.
·        All fields from the Expression transformation except the PROC_FLG field are moved into the Filter transformations FIL_NOTNULL_CTRY_COD and FIL_NULL_CTRY_COD.

e)      Filter Transformation (FIL_NOTNULL_CTRY_COD)
·        Filter transformation is an Active transformation.
·        The COUNTRY_CODE field is checked for NOT NULL and if found true, the records are passed on to the Update Strategy UPD_COUNTRY_CODE, the Lookup transformation LKPTRANS and the Update Strategy UPD_UPD_STG_COUNTRY.

f)       Update Strategy Transformation (UPD_COUNTRY_CODE)
                           i.            Update Strategy transformation is an Active transformation.
·        The ISO_CTRY_COD, CTRY_NAM, BMU_IND fields are moved to the Update Strategy transformation from the FIL_NOTNULL_CTRY_COD transformation.
·        Click on the Properties tab
·        Update Strategy Expression is DD_UPDATE.
·        Forward Rejected Rows option is selected.
                         ii.            Update Strategy Expression is used to flag individual records for insert, delete, update or reject.
                        iii.            The below table lists the constants for each database operation and the numerical equivalent:

Operation
Constant
Numeric Value
Insert
DD_INSERT
0
Update
DD_UPDATE
1
Delete
DD_DELETE
2
Reject
DD_REJECT
3

                       iv.            A session can also be configured for handling specific database operations. This is done by setting the “Treat rows as” field in the Session Wizard dialog box that appears while session configuration.
·        Open the Server Manager.
·        Click on cifSIT9i under the Repositories tab
·        Click on RepositoryΓ Connect
·        Provide the Username
·        Expand the Ecif_Dev_map folder.
·        Select the s_Map_CD_Country_code in the right pane, right click and select edit.
·        Properties for Sessions window open up.
·        Pls refer fig below.
                         v.            The “Treat rows as” option determines the treatment for all rows in the session. The options provided here are insert, delete, update or data-driven.
                       vi.            If the mapping for the session contains an Update Strategy transformation, this field is marked Data Driven by default. If any other option is selected, the Informatica Server ignores all Update Strategy transformations in the mapping.
                      vii.            The Data Driven option is selected if records destined for the same table need to be flagged on occasion for one operation (for example, update), or for a different operation (for example, reject).
                  viii.            Records can be flagged for reject only with this option.
For more info on Update Strategy transformation and other settings for Update Strategy, refer the Informatica Designer help.
                       ix.            The Forward Rejected Rows option indicates whether the Update Strategy transformation pass rejected rows to the next transformation or rejects them.
                         x.            By default, Informatica Server forwards rejected rows to the next transformation.
                       xi.            The Informatica Server flags the rows for reject and writes them to the session reject files.
                      xii.            If the Forward Rejected Rows is not selected, the Informatica Server drops rejected rows and writes them to the session log file.
·        Update Strategy UPD_COUNTRY_CODE updates the target table Shortcut_to_ECIF_COUNTRY which is a shortcut to the ECIF_COUNTRY table.



g)      Update Strategy Transformation (UPD_UPD_STG_COUNTRY)
·        This receives the ISO_CTRY_COD and PROC_FLG fields from the filter transformation FIL_NOTNULL_CTRY_COD when the COUNTRY_CODE is NOT NULL.
·        This updates the target table Shortcut_To_STG_COUNTRY which is a shortcut to the STG_COUNTRY table.

h)      Lookup Transformation (LKPTRANS)
  • The ISO_CTRY_COD from the filter transformation FIL_NOTNULL_CTRY_COD is brought as input to the Lookup transformation.
  • ISO_CTRY_COD as SRC_ISO_CTRY_COD is looked up against the ISO_CTRY_COD of the TRF_COUNTRY lookup table and if the Lookup is successful, the corresponding ISO_CTRY_COD of the lookup table is taken as the output.
  • The output of the Lookup table is passed to the Filter transformations FIL_NULL_TRF_CTRY_COD and FIL_NOTNULL_TRF_CTRY_COD.



i)        Filter Transformation (FIL_NULL_TRF_CTRY_COD)
  • This transformation receives the ISO_CTRY_COD from the Lookup transformation LKPTRANS and the rest of the fields from the Filter transformation FIL_NOTNULL_CTRY_COD.
  • The ISO_CTRY_COD field which is the output of the previous lookup is checked for NULL and if found to be NULL, the records are inserted into the target Shortcut_To_TRF_COUNTRY which is a Shortcut to the TRF_COUNTRY table.

j)        Filter Transformation (FIL_NOTNULL_TRF_CTRY_COD)
  • This transformation receives the ISO_CTRY_COD from the Lookup transformation LKPTRANS and the rest of the fields from the Filter transformation FIL_NOTNULL_CTRY_COD.
  • The ISO_CTRY_COD field which is the output of the previous lookup is checked for NOT NULL and if found to be NOT NULL, the records are passed on to the Update Strategy UPD_TRF_CTRY_COD.

k)     Update Strategy Transformation (UPD_TRF_CTRY_COD)
  • This is used to update the target table Shortcut_To_TRF_COUNTRY, which is a Shortcut to the TRF_COUNTRY table.

l)        Filter Transformation (FIL_NULL_CTRY_COD)
  • The COUNTRY_CODE field is checked for NULL and if found true, the records are passed on to the Lookup transformation LKPTRANS1 and the Update Strategy UPD_INS_STG_COUNTRY.
  • The records are also inserted into the target table Shortcut_To_ECIF_COUNTRY which is a shortcut to the ECIF_COUNTRY table.

m)    Update Strategy Transformation (UPD_INS_STG_COUNTRY)
  • This receives the ISO_CTRY_COD and PROC_FLG fields from the filter transformation FIL_NULL_CTRY_COD when the COUNTRY_CODE is NULL.
  • This inserts a record into the target table Shortcut_To_STG_COUNTRY which is a shortcut to the STG_COUNTRY table.

n)      Lookup Transformation (LKPTRANS1)
  • The ISO_CTRY_COD from the filter transformation FIL_NULL_CTRY_COD is brought as input to the Lookup transformation.
  • ISO_CTRY_COD as SRC_ISO_CTRY_COD is looked up against the ISO_CTRY_COD of the TRF_COUNTRY lookup table and if the Lookup is successful, the corresponding ISO_CTRY_COD of the lookup table is taken as the output.
  • The output of the Lookup table is passed to the Filter transformations FIL_NULL_TRF_CTRY_COD2 and FIL_NOTNULL_TRF_CTRY_COD2.




o)      Filter Transformation (FIL_NULL_TRF_CTRY_COD2)
  • This transformation receives the ISO_CTRY_COD from the Lookup transformation LKPTRANS1 and the rest of the fields from the Filter transformation FIL_NULL_CTRY_COD.
  • The ISO_CTRY_COD1 field which is the output of the previous lookup is checked for NULL and if found to be NULL, the records are inserted into the target Shortcut_To_TRF_COUNTRY which is a Shortcut to the TRF_COUNTRY table.

p)      Filter Transformation (FIL_NOTNULL_TRF_CTRY_COD2)
  • This transformation receives the ISO_CTRY_COD from the Lookup transformation LKPTRANS1 and the rest of the fields from the Filter transformation FIL_NULL_CTRY_COD.
  • The ISO_CTRY_COD1 field which is the output of the previous lookup is checked for NOT NULL and if found to be NOT NULL, the records are passed on to the Update Strategy UPD_TRF_CTRY_COD2.

q)      Update Strategy Transformation (UPD_TRF_CTRY_COD2)
  • This is used to update the target table Shortcut_To_TRF_COUNTRY, which is a Shortcut to the TRF_COUNTRY table.

Stored Procedure Transformation (PR_COMP_COUNTRY)
         i.            A Stored Procedure is a Passive transformation.
       ii.            A Stored Procedure can be run with the following options
Normal
Pre-load of the Source.
Post-load of the Source.
Pre-load of the Target.
Post-load of the Target.
      iii.            Pre-load of the Source is when the Stored Procedure runs before the session retrieves data from the source.

·        The Stored Procedure PR_COMP_COUNTRY is called as a Source Pre Load procedure.


What is a Sequence Generator Transformation?      
Ø  The Sequence Generator transformation is an object in Informatica which outputs a unique sequential number to each dataflow that it is attached to.
Ø  The starting value and the increment are set in the Sequence Generator transformation and the NEXTVAL is connected to the dataflow.
Ø  A Sequence generator is normally placed after a filter (generally a filter that checks the primary key value of the target for NULL, which would indicate that the record is new) and before an update strategy that is set to DD_INSERT.
Ø  If multiple informatica mappings write to the same target table, the sequence generator should be used as a reusable object or a shortcut.
Ø  If non informatica routines write to the same target table, using a trigger or a database method is recommended.

Comments

  1. hi kashif i read your article its more useful to gain knowledge about informatica and your its functional implementation.
    Informatica Training in Chennai

    ReplyDelete

Post a Comment

Popular posts from this blog

Contact Me

Do You have any queries ?                   If you are having any query or wishing to get any type of help related Datawarehouse, OBIEE, OBIA, OAC then please e-email on below. I will reply to your email within 24 hrs. If I didn’t reply to you within 24 Hrs., Please be patience, I must be busy in some work. kashif7222@gmail.com

Top 130 SQL Interview Questions And Answers

1. Display the dept information from department table.   Select   *   from   dept; 2. Display the details of all employees   Select * from emp; 3. Display the name and job for all employees    Select ename ,job from emp; 4. Display name and salary for all employees.   Select ename   , sal   from emp;   5. Display employee number and total salary   for each employee. Select empno, sal+comm from emp; 6. Display employee name and annual salary for all employees.   Select empno,empname,12*sal+nvl(comm,0) annualsal from emp; 7. Display the names of all employees who are working in department number 10   Select ename from emp where deptno=10; 8. Display the names of all employees working as   clerks and drawing a salary more than 3000   Select ename from emp where job=’clerk’and sal>3000; 9. Display employee number and names for employees who earn commission   Select empno,ename from emp where comm is not null and comm>0. 10

Informatica sample project

Informatica sample project - 1 CareFirst – Blue Cross Blue Shield, Maryland (April 2009 – Current) Senior ETL Developer/Lead Model Office DWH Implementation (April 2009 – Current) CareFirst Blue Cross Blue Shield is one of the leading health care insurance provided in Atlantic region of United States covering Maryland, Delaware and Washington DC. Model Office project was built to create data warehouse for multiple subject areas including Members, Claims, and Revenue etc. The project was to provide data into EDM and to third party vendor (Verisk) to develop cubes based on data provided into EDM. I was responsible for analyzing source systems data, designing and developing ETL mappings. I was also responsible for coordinating testing with analysts and users. Responsibilities: ·          Interacted with Data Modelers and Business Analysts to understand the requirements and the impact of the ETL on the business. ·          Understood the requirement and develope