Transformation type:
Normalization is the process of organizing data. In database terms, this includes creating normalized tables and establishing relationships between those tables according to rules designed to both protect the data and make the database more flexible by eliminating redundancy and inconsistent dependencies.
The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to organize the data according to your own needs. A Normalizer transformation can appear anywhere in a pipeline when you normalize a relational source. Use a Normalizer transformation instead of the Source Qualifier transformation when you normalize a COBOL source. When you drag a COBOL source into the Mapping Designer workspace, the Mapping Designer creates a Normalizer transformation with input and output ports for every column in the source.
You primarily use the Normalizer transformation with COBOL sources, which are often stored in a denormalized format. The OCCURS statement in a COBOL file nests multiple records of information in a single record. Using the Normalizer transformation, you break out repeated data within a record into separate records. For each new record it creates, the Normalizer transformation generates a unique identifier. You can use this key value to join the normalized records.
You can also use the Normalizer transformation with relational sources to create multiple rows from a single row of data.
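As a rough illustration of creating multiple rows from a single row, the following Python sketch pivots repeated columns into separate rows. It is a conceptual analogue only, not PowerCenter code; the function name, column names, and sample values are all hypothetical.

```python
def normalize_row(row, repeated_columns, value_name):
    """Return one output row per repeated column found in a single input row."""
    # Columns that are not repeated are copied onto every output row.
    fixed = {k: v for k, v in row.items() if k not in repeated_columns}
    output = []
    for column in repeated_columns:
        out = dict(fixed)
        out[value_name] = row[column]
        output.append(out)
    return output


# Hypothetical denormalized row: one store with four quarterly sales columns.
source_row = {"store": "Store1", "sales_q1": 100, "sales_q2": 300,
              "sales_q3": 500, "sales_q4": 700}
rows = normalize_row(
    source_row, ["sales_q1", "sales_q2", "sales_q3", "sales_q4"], "sales"
)
```

Here the single input row becomes four rows, each carrying the store name and one sales value.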
Normalizing Data in a Mapping
Although the Normalizer transformation is designed to handle data read from COBOL sources, you can also use it to normalize data from any type of source in a mapping. You can add a Normalizer transformation to any data flow within a mapping to normalize components of a single record that contains denormalized data.
If you have denormalized data for which the Normalizer transformation has created key values, connect the ports representing the repeated data and the output port for the generated keys to a different pipeline branch in the mapping. Ultimately, you may want to write these values to different targets.
You can use a single Normalizer transformation to handle multiple levels of denormalization in the same record. For example, a single record might contain two different detail record sets. Rather than using two Normalizer transformations to handle the two different detail record sets, you handle both normalizations in the same transformation.
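The idea of handling two detail record sets in one pass, with a shared key that lets the results be written to different targets and joined later, can be sketched in Python as follows. This is an assumption-laden illustration rather than PowerCenter behavior; the record layout and the names `normalize_customer`, `phones`, and `addresses` are hypothetical.

```python
def normalize_customer(record, generated_key):
    """Split one record into two detail row sets that share a generated key."""
    # Each detail set becomes its own list of rows, as if routed to a
    # separate pipeline branch or target.
    phone_rows = [
        {"gk": generated_key, "phone": phone} for phone in record["phones"]
    ]
    address_rows = [
        {"gk": generated_key, "address": address}
        for address in record["addresses"]
    ]
    return phone_rows, address_rows


# Hypothetical record containing two different detail record sets.
customer = {
    "name": "A. Smith",
    "phones": ["555-0100", "555-0101"],
    "addresses": ["1 Main St"],
}
phone_rows, address_rows = normalize_customer(customer, generated_key=1)
```

Both output sets carry the same `gk` value, so rows written to different targets can still be related back to the original record.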
Normalizer Ports
When you create a Normalizer for a COBOL source, or in the mapping pipeline, the Designer identifies the OCCURS and REDEFINES statements and generates the following columns:
Generated key. One port for each REDEFINES clause. For more information, see Generated Key.
Generated Column ID. One port for each OCCURS clause. For more information, see Generated Column ID.
You can use these ports for primary and foreign key columns. The Normalizer key and column ID columns are also useful when you want to pivot input columns into rows. You cannot delete these ports.
Generated Key
The Designer generates a port for each REDEFINES clause to specify the generated key. You can use the generated key as a primary key column in the target table and to create a primary-foreign key relationship. The naming convention for the Normalizer generated key is:
GK_<redefined_field_name>
The Designer adds one column, such as GK_FILE_ONE or GK_HST_AMT, for each REDEFINES in the COBOL source. The Normalizer GK columns tell you the order of records in a REDEFINES clause. For example, if a COBOL file has 10 records, when you run the workflow, the PowerCenter Server numbers the first record 1, the second record 2, and so on.
You can create approximately two billion primary or foreign key values with the Normalizer by connecting the GK port to the desired transformation or target and using the values ranging from 1 to 2147483647. At the end of each session, the PowerCenter Server updates the GK value to the last value generated for the session plus one.
If you have multiple versions of the Normalizer transformation, the PowerCenter Server updates the GK value across all versions when it runs a session.
If you open the mapping after you run the session, the current value displays the last value generated for the session plus one. Since the PowerCenter Server uses the GK value to determine the first value for each session, you should only edit the GK value if you want to reset the sequence.
If you have multiple versions of the Normalizer, and you want to reset the sequence, you must check in the mapping after you modify the GK value.
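The way the GK value carries over from one session to the next can be modeled with a short Python sketch. It only mimics the numbering described above; the class name is hypothetical, and raising an error once the maximum is exceeded is an assumption, since the text does not say what happens at that point.

```python
MAX_GK = 2147483647  # upper end of the key range given in the text


class GeneratedKeySketch:
    """Toy model of a generated key whose value persists between sessions."""

    def __init__(self, current_value=1):
        # First key value that the next session will use.
        self.current_value = current_value

    def run_session(self, record_count):
        """Number record_count records, then store the last value plus one."""
        keys = []
        next_value = self.current_value
        for _ in range(record_count):
            if next_value > MAX_GK:
                # Assumption: overflow handling is not described in the source.
                raise OverflowError("generated key range exhausted")
            keys.append(next_value)
            next_value += 1
        self.current_value = next_value
        return keys


gk = GeneratedKeySketch()
first_session = gk.run_session(10)   # keys 1 through 10
second_session = gk.run_session(2)   # continues at 11
```

Setting `current_value` back to 1 before a session corresponds to resetting the sequence.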
Generated Column ID
The Designer generates a port for each OCCURS clause to specify the positional index within an OCCURS clause. You can use the generated column ID to create a primary-foreign key relationship. The naming convention for the Normalizer generated column ID is:
GCID_<occurring_field_name>
The Designer adds one column, such as GCID_HST_MTH or GCID_HST_AMT, for each OCCURS in the COBOL source. The Normalizer GCID columns tell you the order of records in an OCCURS clause. For example, if a record occurs two times, when you run the workflow, the PowerCenter Server numbers the first record 1 and the second record 2.
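The positional numbering within an OCCURS clause can be illustrated with a small Python sketch. It assumes each source record holds its repeated field as a list. The plain `GK` column name and the sample month values are hypothetical, and the code is not PowerCenter's implementation.

```python
def normalize_occurs(records, occurs_field):
    """Emit one row per occurrence, tagged with a record key and a position."""
    rows = []
    # GK numbers the source records from 1, as in the GK example in the text.
    for gk, record in enumerate(records, start=1):
        # The column ID restarts at 1 for every source record.
        for gcid, value in enumerate(record[occurs_field], start=1):
            rows.append({
                "GK": gk,
                "GCID_" + occurs_field: gcid,
                occurs_field: value,
            })
    return rows


# Hypothetical source: two records, each with a field that occurs two times.
records = [{"HST_MTH": ["JAN", "FEB"]}, {"HST_MTH": ["MAR", "APR"]}]
occurs_rows = normalize_occurs(records, "HST_MTH")
```

Together, the record key and the column ID identify each output row, which is what makes them usable in a primary-foreign key relationship.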