Sunday, April 7, 2013

Data Warehousing concepts: Kimball vs. Inmon vs. Hybrid vs. Vault

  • Ralph Kimball - bottom-up design: approach data marts are first created to provide reporting and analytical capabilities for specific business processes.
    • Data marts - contain, primarily, dimensions and facts.
      • Facts - either contain atomic (detailed) data, and, if necessary, summarized data.
      • Single data mart often models a specific business area (unit) i.e. "Sales," "Production."
    • Integration of data marts is managed through implementation is called: "data warehouse bus architecture."
    • Data Warehouse bus architecture: is an implementation of "the bus," a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.
      • Conformed dimensions - is a set of data attributes that have been physically implemented in multiple database tables using the same structure, attributes, domain values, definitions and concepts in each implementation.
      • Integration Points(between data marts) - integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus").
      • Drill across process - integration of two or more data marts process.
        • Works by grouping (summarizing) the data long the keys of the (shared) conformed dimensions of each fact participating in the "drill across" followed by a join on the keys of these grouped (summarized) facts.
      • Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse.
      • Important management task is making sure dimensions among data marts are consistent or "conform."
    • Advantage: data warehouse ends up being "segmented." into number of logically self contained (up and including The Bus) and consistent data marts, rather than a big and often complex centralized model.
      • Business value can be returned as quickly as the first data marts can be created.
      • i.e. data warehouse can start from "Sales" department, by building a Sales-data mart. And if business is expanded into "Production" department, then Production-data mart can be integrable, because they share the same "BUS."
      • If integration via the bus is achieved, the data warehouse, through its two data marts, will only be able to deliver the specific information that the individual data marts are designed to do, but integrated "Sales-Production" information, which, often is of critical business value.
  • Bill Inmon - top-down design: 1st author on the subject of data warehouse, as a centralized repository for the entire enterprise. Data warehouse design using normalized enterprise data model.
    • Inmon's vision the data warehouse is at the center of the "Corporate Information Factory (CIF)," which provides a logical framework for delivering business intelligence (BI) and business management capabilities.
    • Atomic data, data at the lowest level of detail, are stored in the data warehouse.
    • Dimensional data marts: are created containing data needed for specific business processes or department from the data warehouse.
    • Data warehouse is:
      • Subject oriented: the data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
      • Non-volitile: data in the data warehouse are never over-written or deleted - once committed, the data are static (do not change), read-only, and retained for future reporting.
      • Integrated: data warehouse contains data from most or all of an organization's operational systems and these data are made consistent.
      • Time variant: top-down design methodology generates highly consistent dimensional views of data across data marts, since all data marts are loaded from the centralized repository.
        • Top-down: design is robust against business changes.
        • Generating new dimensional data marts against the data stores in the data warehouse is a relatively simple task.
    • Disadvantage: top-down  design represents a very large project with a very broad scope.
      • Upfront cost for implementing a data warehouse is significant, and the duration of time from the start of project to he point that end users experience initial benefits can be substantial.
      • Inflexible and unresponsive to changing departmental needs during the implementation phases.
  • Hybrid design: data warehouse solutions often resemble hub and spoke architecture. Legacy systems feeding the DW/BI solution often include CRM and ERP, generating large amounts of data. To consolidate these various data models, and facilitate the ETL process, DW solutions often make use of an operational data store (ODS). The information then parsed into the actual DW.
    • Operational Data Store: is  database designed to integrate data from multiple sources for additional operations on the data)
    • To reduce redundancy, large systems will often store data in a normalized way. Data marts for specific reports can then be built on top of the DW solution.
    • Important to note that DW database in a hybrid solution is kept on 3d normal form to eliminate data redundancy. Although normal relational database is not efficient for BI reports.
    • DW effectively provides a single source of information from which the data marts can read, creating a highly flexible solutions from a BI point of view.
    • Hybrid architecture: allows DW to be replaced with a master data management solution where operational, not static information could reside.
      • Master data management: comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organization (which may include reference data).
  • Data Vault Modeling: is a hybrid design, consisting of the best of breed practices from both 3rd normal form and star-schema.
    • Data Vault model is not a true 3rd normal form, and breaks some of the rules that 3NF dictates be followed.
    • It is a top-down architecture with bottom-up design, geared to be strictly a data warehouse.
    • It is not geared to be end-user accessible, which when built, still requires the user of a data mart or star-schema based release are for business purposes.

2 comments:

  1. Kimball is not a bottom up architecture

    ReplyDelete
  2. As a result many people find that data vault modeling is very effective for data warehousing (especially enterprise data warehousing), operational integration applications, operational data stores, and integration master data management solutions.local paper shredding

    ReplyDelete

Note: Only a member of this blog may post a comment.