It attempts to capitalize on the speed and user orientation of the “bottom-up” approach without sacrificing the integration enforced by a data warehouse in the “top-down” approach. Pieter Mimno is currently the most vocal proponent of this approach.
The hybrid approach recommends spending about two weeks developing an enterprise model before developing the first data mart. The first several data marts are also designed concurrently. After deploying the first few “dependent” data marts, an organization then backfills a data warehouse behind the data marts, instantiating the “fleshed out” version of the enterprise data model. The organization then transfers atomic data from the data marts to the data warehouse and consolidates redundant data feeds, saving the organization time, money, and processing resources. Organizations typically backfill a data warehouse once business users request views of atomic data across multiple data marts.
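The backfill step can be illustrated with a minimal sketch. It assumes two hypothetical dependent data marts, `sales_mart` and `finance_mart`, that each store atomic order rows with an invented schema (`order_id`, `customer_id`, `amount`). Because both marts were fed separately from the same operational system, one order appears in both. The hypothetical `backfill_warehouse` function consolidates the atomic rows into a single warehouse table, `dw_orders`, after which a question spanning both marts can be answered from one place. It uses only Python's standard `sqlite3` module.

```python
import sqlite3


def make_demo_marts() -> sqlite3.Connection:
    """Build an in-memory database with two hypothetical dependent data marts.

    Both marts hold atomic order rows; order 2 appears in both because each
    mart was fed separately from the same operational system.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales_mart   (order_id INTEGER, customer_id INTEGER, amount REAL);
        CREATE TABLE finance_mart (order_id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO sales_mart   VALUES (1, 10, 100.0), (2, 11, 50.0);
        INSERT INTO finance_mart VALUES (2, 11, 50.0), (3, 12, 75.0);
        """
    )
    return conn


def backfill_warehouse(conn: sqlite3.Connection) -> int:
    """Backfill a warehouse table with atomic rows from the marts; return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_orders ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    # UNION (not UNION ALL) removes identical rows duplicated across marts;
    # INSERT OR IGNORE keeps a rerun from adding the same order_id twice.
    conn.execute(
        "INSERT OR IGNORE INTO dw_orders (order_id, customer_id, amount) "
        "SELECT order_id, customer_id, amount FROM sales_mart "
        "UNION "
        "SELECT order_id, customer_id, amount FROM finance_mart"
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0]


def total_revenue(conn: sqlite3.Connection) -> float:
    """A cross-mart question answered from the warehouse's atomic data."""
    return conn.execute("SELECT SUM(amount) FROM dw_orders").fetchone()[0]


if __name__ == "__main__":
    conn = make_demo_marts()
    print(backfill_warehouse(conn))  # 3: order 2 is stored once
    print(total_revenue(conn))       # 225.0
```

The sketch only removes rows that are identical across marts. If two marts disagreed about the same order (for example, different amounts), `INSERT OR IGNORE` would keep whichever row arrived first, so a real backfill would need explicit conflict-resolution rules.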
Once the first data mart (DM) has proven to be a good investment, we should avoid building too many DMs (generally no more than two or three) without implementing an enterprise data warehouse (DW). The main reason is to establish a common enterprise data model from which the smaller DMs can be refreshed, maintained, and tuned for performance. At this stage we bring in the strength of the top-down approach: data replication from the operational systems is reduced. It is also essential to centralize the data extraction and removal process in one common, shared repository. Data cleansing and transformation, which account for most of the unexpected cost, can then be minimized. In the end, this translates to less maintenance and better performance.
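A minimal sketch of this hub-and-spoke refresh, assuming a single hypothetical operational feed of order rows. `extract`, `cleanse`, `refresh`, the `Order` record, and the region-based mart filters are illustrative names, not part of any product. The point it shows: the operational system is read once, cleansing rules run once in the shared repository, and every mart is derived from the already-cleansed warehouse data instead of repeating its own extraction and cleansing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Order:
    order_id: int
    region: str
    amount: float


def extract(source: Iterable[dict], log: List[str]) -> List[dict]:
    """Pull raw rows from the (hypothetical) operational system, logging each pull."""
    log.append("extract")
    return list(source)


def cleanse(raw_rows: List[dict]) -> List[Order]:
    """Hypothetical cleansing rules, applied once in the shared repository."""
    by_id: Dict[int, Order] = {}
    for row in raw_rows:
        order_id = int(row["order_id"])
        # Normalize region codes and types; a later duplicate replaces an earlier one.
        by_id[order_id] = Order(
            order_id=order_id,
            region=row["region"].strip().upper(),
            amount=float(row["amount"]),
        )
    return list(by_id.values())


def refresh(
    source: Iterable[dict],
    mart_filters: Dict[str, Callable[[Order], bool]],
) -> Tuple[List[Order], Dict[str, List[Order]], int]:
    """Extract and cleanse once into the warehouse, then derive every mart from it."""
    log: List[str] = []
    warehouse = cleanse(extract(source, log))
    marts = {name: [o for o in warehouse if keep(o)] for name, keep in mart_filters.items()}
    # Returns the warehouse, the refreshed marts, and how many times the source was read.
    return warehouse, marts, len(log)


if __name__ == "__main__":
    source = [
        {"order_id": 1, "region": " east ", "amount": "100"},
        {"order_id": 2, "region": "West", "amount": "50.5"},
        {"order_id": 1, "region": "EAST", "amount": "100"},
    ]
    warehouse, marts, extracts = refresh(source, {
        "east_mart": lambda o: o.region == "EAST",
        "west_mart": lambda o: o.region == "WEST",
    })
    print(extracts)  # 1: one read of the operational system serves both marts
    print(marts["east_mart"])
```

Adding a third mart in this sketch means adding one more filter, not another extraction and cleansing pipeline against the operational system.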