
Data Warehousing and ETL Processes

 



Goals of Data Warehousing



1. “The data warehouse must make an organization’s information easily accessible.”

2. “The data warehouse must present the organization’s information consistently.”

3. “The data warehouse must be adaptive and resilient to change.”

4. “The data warehouse must be a secure bastion that protects our information assets.”

5. “The data warehouse must serve as the foundation for improved decision making.”

6. “The business community must accept the data warehouse if it is to be deemed successful.”


 

ETL Processes


ETL is an acronym for Extract, Transform, and Load: a three-step process that extracts data from source systems, transforms it, and loads it into a data warehouse. The three steps of the ETL process are:



1. Extract and load staging tables

Extract and consolidate data from one or more source systems and load it into the data warehouse staging tables.

2. Transform the data

Transform the data in the staging tables and compute calculated values in preparation for the load.

3. Load dimension and fact tables

Generate and maintain data warehouse surrogate keys and load the target dimension and fact tables.
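The three steps above can be sketched end to end with a minimal in-memory example. All table and column names here (src_orders, stg_orders, dim_customer, fact_orders) are illustrative assumptions, not part of any real warehouse:

```python
import sqlite3

# Minimal sketch of the three ETL steps, using hypothetical table names.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Pretend this table lives in a transactional source system.
cur.execute("CREATE TABLE src_orders (order_id INTEGER, customer TEXT, qty INTEGER, price REAL)")
cur.executemany("INSERT INTO src_orders VALUES (?,?,?,?)",
                [(1, "Acme", 2, 10.0), (2, "Acme", 1, 5.0), (3, "Beta", 4, 2.5)])

# Step 1: extract source rows and load the staging table.
cur.execute("CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, qty INTEGER, price REAL, amount REAL)")
cur.execute("""INSERT INTO stg_orders (order_id, customer, qty, price)
               SELECT order_id, customer, qty, price FROM src_orders""")

# Step 2: transform in staging -- compute the calculated value amount = qty * price.
cur.execute("UPDATE stg_orders SET amount = qty * price")

# Step 3: load the dimension (generating surrogate keys) and then the fact table.
cur.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY AUTOINCREMENT, customer TEXT UNIQUE)")
cur.execute("INSERT OR IGNORE INTO dim_customer (customer) SELECT DISTINCT customer FROM stg_orders")
cur.execute("""CREATE TABLE fact_orders AS
               SELECT d.customer_key, s.order_id, s.qty, s.amount
               FROM stg_orders s JOIN dim_customer d ON d.customer = s.customer""")

print(cur.execute("SELECT customer_key, order_id, amount FROM fact_orders ORDER BY order_id").fetchall())
```

The fact table carries the surrogate key from dim_customer rather than the natural customer name, which is what insulates the warehouse from source-side key changes.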




ETL Mapping Types





ETL processes are further divided into two types of mappings:



1. Source Dependent Extract (SDE) mappings

Extract data from the transactional source systems and load it into the data warehouse staging tables. SDE mappings are designed around each source system’s unique data model.

 



SDE Mapping




2. Source Independent Load (SIL) mappings

Extract and transform data from the staging tables and load it into the data warehouse target tables. SIL mappings are designed to be universal and work with any source.



SIL Mapping





There is a third type of mapping, known as a Post Load Processing (PLP) mapping, which is used to load aggregate tables after the target tables have been loaded. Like SIL mappings, PLP mappings are source independent.
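A PLP mapping can be sketched as a step that reads only already-loaded warehouse tables and rebuilds a summary table from them. The names here (fact_sales, agg_sales_by_region) are hypothetical:

```python
import sqlite3

# Sketch of a Post Load Processing (PLP) mapping, using hypothetical names.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Pretend the SIL mappings have already loaded this target fact table.
cur.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO fact_sales VALUES (?,?)",
                [("East", 100.0), ("East", 50.0), ("West", 75.0)])

# PLP step: rebuild the aggregate table from the loaded fact table.
# It reads only warehouse tables, which is why it is source independent.
cur.execute("""CREATE TABLE agg_sales_by_region AS
               SELECT region, SUM(amount) AS total_amount
               FROM fact_sales GROUP BY region""")

print(cur.execute("SELECT region, total_amount FROM agg_sales_by_region ORDER BY region").fetchall())
```

Because the aggregate is derived entirely from the fact table, the same PLP logic runs unchanged no matter which source system fed the warehouse.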




ETL Terminology



ETL – the process by which data is extracted from source A and then transformed, aggregated, and loaded into target B. ETL can be implemented with PL/SQL or with various ETL development tools.


Full Load – the process of extracting all required data from the source and loading it into the target tables. A full load truncates the target tables before loading.


Incremental Load – any subsequent load after the full load, extracting only the source data deltas (rows added or changed since the last load).


Star Schema – a denormalized schema consisting of a central fact table and one or more dimension tables.


Snowflake Schema – a normalized schema with multiple fact tables at different levels of grain and with dimension tables containing foreign key relationships to other dimension tables.
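The full versus incremental load distinction above is often implemented with a high-watermark: the load records the timestamp it last processed, and the next run extracts only rows modified after it. A minimal sketch, with hypothetical table names (src_customers, etl_watermark):

```python
import sqlite3

# Sketch of incremental (delta) extraction via a high-watermark timestamp.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Source system table with a last-modified timestamp on each row.
cur.execute("CREATE TABLE src_customers (id INTEGER, name TEXT, updated_at TEXT)")
cur.executemany("INSERT INTO src_customers VALUES (?,?,?)",
                [(1, "Acme", "2024-01-01"), (2, "Beta", "2024-02-15"), (3, "Gamma", "2024-03-10")])

# Watermark left behind by the previous (full or incremental) load.
cur.execute("CREATE TABLE etl_watermark (last_load TEXT)")
cur.execute("INSERT INTO etl_watermark VALUES ('2024-02-01')")

# Incremental extract: only the deltas since the watermark.
deltas = cur.execute("""SELECT id, name FROM src_customers
                        WHERE updated_at > (SELECT last_load FROM etl_watermark)
                        ORDER BY id""").fetchall()
print(deltas)  # only the rows changed after 2024-02-01

# Advance the watermark so the next run picks up later changes.
cur.execute("UPDATE etl_watermark SET last_load = (SELECT MAX(updated_at) FROM src_customers)")
```

A full load would instead truncate the targets and extract everything, ignoring the watermark.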

