Skip to main content

Flat File Validation Using Informatica


Summary:

The intention of this case study is to give an idea about validation of a flat file source using information in file trailor. File will contain total record count in trailor and same will be verified with actual record count received in file before loading to target. If file is found to be valid, loading will be done else a record will be inserted in status table indicating that file to be processes is invalid.

1. Objective

The intention of this case study is to give an idea about validation of a flat file source using information in file trailor. File will contain total record count in trailor and same will be verified with actual record count received in file before loading to target. If file is found to be valid, loading will be done else a record will be inserted in status table indicating that file to be processes is invalid.

Other scenarios:



2. Source Definition:

Sample flat file source is a comma separated values file having employee records. In addition file is having a header (starting with H) and a trailor (starting with T). Header record contains file sequence number and trailor record contains total number of records in file (including header and trailor record).

In sample file total records in file are 14 (including header and trailor). File trailor is “T14”.
Data records are having columns empno, ename, job, mgr, hiredate, sal, comm, and deptno.

3. Target Definition

3.1 Target T_FILE_PROCESS_STATUS
First target is relational table (oracle) T_FILE_PROCESS_STATUS and has below structure.



This table will be used to load file information when the file is not valid. In case of a valid file nothing will be loaded in this table. Example data in this table for invalid & valid file will be as shown below.



Process_Date
Header_SeqNo
Trailor_count
File_Record_count
Is_File_Valid
01/12/2010
1500
12
14
N
02/12/2010
1500
14
14
Y



3.2 Target EMPLOYEE
Second target is EMPLOYEE table with below structure.


4. Mappings

4.1 Mapping to validate the source file: m_ValidateSourceFile

Below shown is a prototype of the mapping to check if total record count of file matches with record count specified in trailor. If there is a mismatch then a status record will be inserted in T_FILE_PROCESS_STATUS. For a valid record count in trailor, no record will be inserted in target table.




4.1.1 Transformations used

a) exp_Header_Trailor:

This expression is used after Source Qualifier to identify file header and file trailor. In this expression total number of records in file is also calculated.




In expression define following variable ports and output ports.

Eg;

Input column; empno



Please see the below expression used in different Variable ports:


Port Name


Expression



v_HeaderSeqNo


IIF (SUBSTR (Empno, 1, 1) ='H', SUBSTR (Empno, 2), NULL)



v_TrailorRecCnt


IIF SUBSTR (Empno, 1, 1) ='T', SUBSTR (Empno, 2), NULL)



v_FileRecCnt


v_FileRecCnt + 1



O_ProcessDate


SYSDATE



Variable to extract header sequence number is v_HeaderSeqNo, variable to extract trailor record count is v_TrailorRecCnt and variable to count total number of records in file is v_FileRecCnt.

b) AGG_HEADER_TRAILOR_ROWCOUNT

Aggregator is used to make a single row for header sequence number, trailor record count and file record count.

Port Name
Datatype
Precision
Scale
I
O
V
Expression
Group By
O_ HeaderSeqNo

string
10
0
Y
Y



O_TrailorRecCnt

string
10
0
Y
Y



O_File_Rec_Cnt
interger
10
0
Y
Y



O_Prosess_date
Date/time
29
0
Y
Y


Y
O_File_header
interger
10
0

Y

Max(O_ HeaderSeqNo)


O_file_trailor
interger
10
0

Y

Max (O_TrailorRecCnt
)

O_File_rec_count
interger
10
0

Y

Max(O_File_Rec_Cnt)



Port o_ProcessDate is specified as Group By port.

C) exp_VALIDATE_ROCCOUNT

This expression is used to compare trailor record count with file record count.




Output port o_IsFileValid uses below expression to check if trailor record count matches with file record count. If there is a mismatch then o_IsFileValid flag is set to ‘N’.

Port Name
Expression


O_isFile_Valid

IIF (o_FileTrailor != o_FileRecCount,'N')



d) FIL_INVALID_FILE

This filter will pass the row only when o_IsValidFile flag is ‘N’ i.e. trailor record count does not match with file record count.
Filter condition is o_IsFileValid = 'N'.


4.2 Mapping to load source file: m_LoadFile

This is simple source to target mapping with a filter in place to skip file header and trailor.


4.2.1 Transformations used

FIL_HEADER_TRAILOR

Filter is used to skip file header and trailor. Only data records will be passed and loaded to target T_EMPLOYEE.

Filter condition used is SUBSTR (Empno, 1, 1)! = ‘H’ AND SUBSTR (Empno, 1, 1) != 'T'


5. Sessions

Session tasks will be created corresponding to each mapping. Session s_ValidateSourceFile is for mapping m_ValidateSourceFile.

Session s_LoadFile is for mapping m_LoadFile. As date in source file is in DD-MON-YY format, DateTime Format String is changed from default to DD-MON-YY. All other settings are default.


6.  Workflow (wf_LoadFile):

Simple workflow as shown below is created using a start task and two session tasks.

Condition to execute session wf_LoadFile is defined as below expression.

IIF ($s_ValidateSourceFile.TgtSuccessRows = 0,
$s_ValidateSourceFile.Status = succeeded,
$s_ValidateSourceFile.Status = failed)

With this condition is place, session s_LoadFile will execute only when no row is being inserted to target by session s_ValidateSourceFile. As per the mapping logic, m_ValidateSourceFile will insert the row in target only when file is failed for validation. So no row inserted in target means file is valid and should be loaded to EMPLOYEE table using m_LoadFile mapping.
Using sample flat file to execute this workflow will insert no row in T_FILE_PROCESS_STATUS. 12 rows will be inserted to EMPLOYEE table.

Comments

Popular posts from this blog

Contact Me

Do You have any queries ?                   If you are having any query or wishing to get any type of help related Datawarehouse, OBIEE, OBIA, OAC then please e-email on below. I will reply to your email within 24 hrs. If I didn’t reply to you within 24 Hrs., Please be patience, I must be busy in some work. kashif7222@gmail.com

Top 130 SQL Interview Questions And Answers

1. Display the dept information from department table.   Select   *   from   dept; 2. Display the details of all employees   Select * from emp; 3. Display the name and job for all employees    Select ename ,job from emp; 4. Display name and salary for all employees.   Select ename   , sal   from emp;   5. Display employee number and total salary   for each employee. Select empno, sal+comm from emp; 6. Display employee name and annual salary for all employees.   Select empno,empname,12*sal+nvl(comm,0) annualsal from emp; 7. Display the names of all employees who are working in department number 10   Select ename from emp where deptno=10; 8. Display the names of all employees working as   clerks and drawing a salary more than 3000   Select ename from emp where job=’clerk’and sal>3000; 9. Display employee number and names for employees who earn commissi...

Informatica sample project

Informatica sample project - 1 CareFirst – Blue Cross Blue Shield, Maryland (April 2009 – Current) Senior ETL Developer/Lead Model Office DWH Implementation (April 2009 – Current) CareFirst Blue Cross Blue Shield is one of the leading health care insurance provided in Atlantic region of United States covering Maryland, Delaware and Washington DC. Model Office project was built to create data warehouse for multiple subject areas including Members, Claims, and Revenue etc. The project was to provide data into EDM and to third party vendor (Verisk) to develop cubes based on data provided into EDM. I was responsible for analyzing source systems data, designing and developing ETL mappings. I was also responsible for coordinating testing with analysts and users. Responsibilities: ·          Interacted with Data Modelers and Business Analysts to understand the requirements and the impact of the ETL on the business. ·  ...