Goals of Data Warehousing
1. “The data warehouse must make an organization’s information easily accessible.”
2. “The data warehouse must present the organization’s information consistently.”
3. “The data warehouse must be adaptive and resilient to change.”
4. “The data warehouse must be a secure bastion that protects our information assets.”
5. “The data warehouse must serve as the foundation for improved decision making.”
6. “The business community must accept the data warehouse if it is to be deemed successful.”
ETL Processes
ETL is an acronym for Extract, Transform, and Load: a three-step process that extracts data from source systems and loads it into a data warehouse. The three steps, sketched below, are:
1. Extract and Load Staging Tables: extracts and consolidates data from one or more source systems and loads it into the data warehouse staging tables.
2. Transform the Data: transforms data in the staging tables and computes calculated values in preparation for the load.
3. Load Dimension and Fact Tables: generates and maintains data warehouse surrogate keys and loads the target dimension and fact tables.
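As a minimal SQL sketch of the three steps (the table names src_orders, stg_orders, w_customer_d, and w_order_f, and the sequence order_f_seq, are illustrative assumptions, not a specific product's schema):

    -- Step 1: Extract and load staging tables.
    INSERT INTO stg_orders (order_id, customer_id, order_date, amount, discount)
    SELECT order_id, customer_id, order_date, amount, discount
    FROM   src_orders;                        -- source-system table (assumed)

    -- Step 2: Transform the data and compute calculated values.
    UPDATE stg_orders
    SET    net_amount = amount - NVL(discount, 0);

    -- Step 3: Load dimension and fact tables, generating surrogate keys.
    INSERT INTO w_order_f (row_wid, customer_wid, order_date, net_amount)
    SELECT order_f_seq.NEXTVAL,               -- warehouse surrogate key
           d.row_wid,                         -- dimension surrogate key lookup
           s.order_date,
           s.net_amount
    FROM   stg_orders s
    JOIN   w_customer_d d ON d.customer_id = s.customer_id;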
[Diagram: ETL Processes]
ETL processes are further refined into two types of mappings, both sketched after this list:
1. Source Dependent Extract (SDE) mappings: extract data from the transactional systems and load it into the data warehouse staging tables. SDE mappings are designed with respect to the source’s unique data model.
[Diagram: SDE Mapping]
2. Source Independent Load (SIL) mappings: extract and transform data from the staging tables and load it into the data warehouse target tables. SIL mappings are designed to be universal, working with any source.
[Diagram: SIL Mapping]
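A minimal sketch of the two mapping types (the names oe_order_lines, stg_order_lines, w_product_d, and w_order_line_f are assumptions for illustration): the SDE statement is written against the source's own model, while the SIL statement reads only from staging.

    -- SDE: source-dependent extract, written against the source's own
    -- data model (here an assumed transactional table, oe_order_lines).
    INSERT INTO stg_order_lines (order_id, line_id, product_code, qty)
    SELECT header_id, line_id, item_code, ordered_quantity
    FROM   oe_order_lines;

    -- SIL: source-independent load, written only against the staging layer,
    -- so it works unchanged no matter which source filled stg_order_lines.
    INSERT INTO w_order_line_f (row_wid, order_id, line_id, product_wid, qty)
    SELECT line_f_seq.NEXTVAL, s.order_id, s.line_id, p.row_wid, s.qty
    FROM   stg_order_lines s
    JOIN   w_product_d p ON p.product_code = s.product_code;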
There is a third type of mapping, known as a Post Load Processing (PLP) mapping, which is used to load aggregate tables after the target tables have been loaded. Like a SIL mapping, it is designed to be source independent.
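A PLP mapping might look like the following sketch, which rolls an already-loaded fact table up into a monthly aggregate (w_order_f and w_order_month_a are assumed names):

    -- PLP: runs after the target fact table is loaded; rolls daily fact
    -- rows up into a monthly aggregate, independent of any source system.
    INSERT INTO w_order_month_a (customer_wid, order_month, total_amount)
    SELECT customer_wid,
           TRUNC(order_date, 'MM') AS order_month,
           SUM(net_amount)         AS total_amount
    FROM   w_order_f
    GROUP BY customer_wid, TRUNC(order_date, 'MM');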
ETL Terminology
ETL – the process by which data is extracted from source A and then transformed, aggregated, and loaded into target B. ETL can be implemented with mediums such as PL/SQL or various ETL development tools.
Full Load – the process of extracting all required data from the source and loading it into the target tables. A full load will truncate all the target tables.
Incremental Load – any subsequent load after the full load, extracting only the deltas from the source. Both load types are sketched below.
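A minimal sketch of both load types, assuming a last_update_date column on the source and an etl_control table that tracks the last extract time (both are illustrative assumptions, not a standard):

    -- Full load: truncate the target, then reload everything from the source.
    TRUNCATE TABLE stg_orders;

    INSERT INTO stg_orders (order_id, customer_id, order_date, amount)
    SELECT order_id, customer_id, order_date, amount
    FROM   src_orders;

    -- Incremental load: extract only the deltas since the last run,
    -- using a watermark kept in the ETL control table.
    INSERT INTO stg_orders (order_id, customer_id, order_date, amount)
    SELECT order_id, customer_id, order_date, amount
    FROM   src_orders
    WHERE  last_update_date > (SELECT last_extract_date
                               FROM   etl_control
                               WHERE  table_name = 'SRC_ORDERS');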
Star Schema – a denormalized schema consisting of a central fact table and one or more dimensions.
Snowflake Schema – a normalized schema, possibly with multiple fact tables at different levels of grain, in which dimensions contain foreign key relationships to other dimensions.
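The difference is easiest to see in DDL. In the star version, the product dimension is one denormalized table; in the snowflake version it is normalized, with the dimension carrying a foreign key to another dimension (all names below are assumed for illustration):

    -- Star: one denormalized product dimension; category attributes
    -- live directly on the dimension row.
    CREATE TABLE w_product_d (
        row_wid        NUMBER PRIMARY KEY,
        product_code   VARCHAR2(30),
        product_name   VARCHAR2(100),
        category_name  VARCHAR2(100)
    );

    -- Snowflake: the dimension is normalized; category becomes its own
    -- dimension and the product dimension holds a foreign key to it.
    CREATE TABLE w_category_d (
        row_wid        NUMBER PRIMARY KEY,
        category_name  VARCHAR2(100)
    );

    CREATE TABLE w_product_d_sf (
        row_wid        NUMBER PRIMARY KEY,
        product_code   VARCHAR2(30),
        product_name   VARCHAR2(100),
        category_wid   NUMBER REFERENCES w_category_d (row_wid)
    );

    -- In either design, a central fact table references the dimension.
    CREATE TABLE w_sales_f (
        product_wid    NUMBER REFERENCES w_product_d (row_wid),
        sale_date      DATE,
        amount         NUMBER
    );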