Data Warehouse Architecture
Most businesses that serve data through data marts use the two-tier data warehouse
architecture, which, as the name suggests, is made up of two tiers:
1. The Data Tier
This is the layer where actual data is stored after various ETL processes have been used to load
data into the data warehouse.
It is itself made up of three layers:
A source layer
A data staging layer
A data warehouse layer
Source layer: A data warehouse system draws on heterogeneous sources of data. That data
is stored initially in corporate relational databases or legacy databases, or it may come
from an information system outside the corporate walls.
Data Staging: The data stored in the sources should be extracted, cleansed to remove
inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one
standard schema. Extraction, Transformation, and Loading (ETL) tools can combine
heterogeneous schemata and extract, transform, cleanse, validate, filter, and load
source data into a data warehouse.
Data Warehouse layer: Information is saved to a single, logically centralized
repository: the data warehouse. The data warehouse can be accessed directly, but it can
also be used as a source for creating data marts, which partially replicate data warehouse
contents and are designed for specific enterprise departments. Metadata repositories
store information on sources, access procedures, data staging, users, data mart schemas,
and so on.
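The flow through the three layers above can be sketched in a few lines of Python. This is only an illustrative sketch, not a real ETL tool: the two sources, their field names, the `extract`, `cleanse`, and `load` helpers, and the country-based data mart are all hypothetical and chosen just to show extraction into one schema, cleansing, loading, and partial replication into a mart.

```python
# Source layer (hypothetical): two heterogeneous sources with different field names.
crm_rows = [
    {"cust_id": "1", "full_name": " Alice ", "country": "us"},
    {"cust_id": "2", "full_name": "Bob", "country": None},
]
legacy_rows = [
    {"ID": 3, "NAME": "carol", "CTRY": "DE"},
    {"ID": 1, "NAME": "Alice", "CTRY": "US"},  # same customer as CRM id 1
]


def extract():
    """Extract: read every source and map its fields onto one standard schema."""
    for r in crm_rows:
        yield {"id": r["cust_id"], "name": r["full_name"], "country": r["country"]}
    for r in legacy_rows:
        yield {"id": r["ID"], "name": r["NAME"], "country": r["CTRY"]}


def cleanse(row):
    """Transform: remove inconsistencies (types, case, whitespace) and fill gaps."""
    return {
        "id": int(row["id"]),
        "name": row["name"].strip().title(),
        "country": (row["country"] or "UNKNOWN").upper(),
    }


def load(rows):
    """Load: merge rows into one keyed store; the first occurrence of an id wins."""
    store = {}
    for row in rows:
        store.setdefault(row["id"], row)
    return store


# Data warehouse layer: one logically centralized repository.
warehouse = load(cleanse(r) for r in extract())

# A data mart partially replicates the warehouse for one group of users
# (here, hypothetically, the team that handles US customers).
us_mart = {k: v for k, v in warehouse.items() if v["country"] == "US"}

print(warehouse)
print(us_mart)
```

Keeping extraction, cleansing, and loading as separate functions mirrors the separation of concerns in the staging layer: a new source only needs a new mapping in `extract`.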
2. The Analysis Layer (Presentation Layer)
In this layer, integrated data is accessed efficiently and flexibly to issue reports,
dynamically analyze information, and simulate hypothetical business scenarios. It should
feature aggregate information navigators, complex query optimizers, and user-friendly GUIs.
Three-tier Data Warehouse Architecture
The three-tier approach is the most widely used architecture for data warehouse systems,
and it is what most organizations choose when building one. It solves the connectivity
problems that the two-tier architecture commonly faces.
The three-tier architecture is made up of:
A source layer
A reconciled layer
A data warehouse layer
The three-tier architecture is useful for extensive, enterprise-wide systems.
The source layer contains multiple source systems, and the data warehouse layer contains
both data warehouses and data marts. The reconciled layer sits between the source data
and the data warehouse.
The main advantage of the reconciled layer is that it creates a standard reference data model
for a whole enterprise. At the same time, it separates the problems of source data extraction and
integration from those of data warehouse population.
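The separation that the reconciled layer provides can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `ReconciledOrder` model, the two source formats, and the helper names are hypothetical. The point is that the warehouse population step only ever sees the standard reference model, never the source formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconciledOrder:
    """Hypothetical standard reference model shared by the whole enterprise."""
    order_id: int
    customer: str
    amount_eur: float


# Source extraction and integration: one adapter per (hypothetical) source system.
def reconcile_shop(row):
    # The web shop stores amounts in cents and customer names in lower case.
    return ReconciledOrder(row["id"], row["buyer"].upper(), row["total_cents"] / 100)


def reconcile_erp(row):
    # The ERP export is a tuple of strings: (order id, customer, amount).
    order_id, customer, amount = row
    return ReconciledOrder(int(order_id), customer.upper(), float(amount))


# Warehouse population: depends only on ReconciledOrder, not on any source format.
def populate_warehouse(orders):
    totals = {}
    for order in orders:
        totals[order.customer] = totals.get(order.customer, 0.0) + order.amount_eur
    return totals


shop_rows = [
    {"id": 1, "buyer": "acme", "total_cents": 1250},
    {"id": 3, "buyer": "globex", "total_cents": 300},
]
erp_rows = [("2", "ACME", "7.50")]

reconciled = [reconcile_shop(r) for r in shop_rows] + [reconcile_erp(r) for r in erp_rows]
customer_totals = populate_warehouse(reconciled)
print(customer_totals)
```

Adding a further source system in this sketch means writing one more adapter; `populate_warehouse` stays unchanged.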
Essentially, the three-tier architecture also has three tiers:
1. The bottom tier is the database of the warehouse, where the cleansed and transformed
data is loaded.
2. The middle tier is the application layer giving an abstracted view of the database. It
arranges the data to make it more suitable for analysis.
3. The top tier is where the user accesses and interacts with the data. It represents the
front-end client layer, where you can use reporting, query, analysis, or data mining
tools.
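As a rough illustration of the three tiers just listed, the following Python sketch uses an in-memory SQLite database as the bottom tier, a function as the middle tier, and a plain-text report as the top tier. The `sales` table, its columns, and the function names are hypothetical.

```python
import sqlite3

# Bottom tier: the warehouse database holding cleansed, transformed data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (year INTEGER, product TEXT, revenue REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2023, "widget", 10.0), (2023, "gadget", 5.0), (2024, "widget", 20.0)],
)


# Middle tier: an abstracted view that arranges the data for analysis.
def revenue_by_year(connection):
    cursor = connection.execute(
        "SELECT year, SUM(revenue) FROM sales GROUP BY year ORDER BY year"
    )
    return dict(cursor.fetchall())


# Top tier: the front-end client that presents the data to the user.
def render_report(summary):
    return "\n".join(f"{year}: {total:.2f}" for year, total in summary.items())


report = render_report(revenue_by_year(db))
print(report)
```

The top tier never issues SQL itself; it only consumes what the middle tier exposes, which is the abstraction the middle tier is meant to provide.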
1. Top-down approach:
L (Load): Data is loaded into the data warehouse after it has been transformed into the standard format.
3. Data warehouse –
After the data has been cleansed, it is stored in the data warehouse, which acts as the
central repository. It actually stores the metadata, and the actual data gets stored in
the data marts. Note that the data warehouse stores the data in its purest form in this
top-down approach.
4. Data Marts –
A data mart is also part of the storage component. It stores the information of a particular
function of an organization that is handled by a single authority. An organization can have
as many data marts as it has functions. We can also say that a data mart contains a subset
of the data stored in the data warehouse.
5. Data Mining –
Data mining is the practice of analyzing the large volumes of data present in the data
warehouse. It is used to find hidden patterns in the database or data warehouse with the
help of data mining algorithms.
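As one very small example of what "finding hidden patterns" can mean, the Python sketch below counts which pairs of items occur together in at least a minimum number of transactions (frequent-pair counting, the simplest form of frequent itemset mining). The transactions and the `frequent_pairs` helper are hypothetical, and real data mining tools use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions as they might be read from a warehouse table.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = frequent_pairs(transactions, min_support=3)
print(patterns)
```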
Inmon defines this approach as follows: the data warehouse is a central repository for the
complete organization, and data marts are created from it after the complete data warehouse
has been created.
2. Also, this model is considered the strongest model for business changes, which is why big
organizations prefer to follow this approach.
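The ordering that defines the top-down approach (warehouse first, data marts derived from it) can be sketched in Python as follows. The row format and the `build_warehouse` and `build_data_mart` helpers are hypothetical; the sketch only shows the direction of the dependency.

```python
def build_warehouse(source_rows):
    """Step 1: cleanse all source data and load it into the central repository."""
    return [
        {
            "function": row["function"].lower(),
            "metric": row["metric"],
            "value": float(row["value"]),
        }
        for row in source_rows
    ]


def build_data_mart(warehouse, function):
    """Step 2: derive a data mart as the warehouse subset for one function."""
    return [row for row in warehouse if row["function"] == function]


# Hypothetical raw rows from external sources.
source_rows = [
    {"function": "Sales", "metric": "orders", "value": "12"},
    {"function": "HR", "metric": "headcount", "value": "40"},
    {"function": "sales", "metric": "returns", "value": "1"},
]

# The complete warehouse exists before any data mart is created.
warehouse = build_warehouse(source_rows)
functions = sorted({row["function"] for row in warehouse})
marts = {name: build_data_mart(warehouse, name) for name in functions}
print(marts)
```

Because every mart is computed from the same cleansed warehouse, the marts cannot disagree with each other about a shared value in this sketch.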
2. Bottom-up approach:
1. First, the data is extracted from external sources (as in the top-down approach).
2. Then, the data goes through the staging area (as explained above) and is loaded into data
marts instead of the data warehouse. The data marts are created first and provide reporting
capability. Each data mart addresses a single business area.
4. Kimball describes this approach as follows: data marts are created first and provide a thin
view for analysis, and the data warehouse is created after the complete data marts have been created.
2. More data marts can be accommodated here, and in this way the data warehouse can
be extended.
3. Also, the cost and time taken to design this model are comparatively low.
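The bottom-up ordering (data marts first, warehouse assembled from them) can be sketched in Python like this. The row format and the `build_data_mart` and `extend_warehouse` helpers are hypothetical and only illustrate that each mart is usable for reporting before the warehouse exists, and that the warehouse grows as marts are added.

```python
# Hypothetical rows already extracted from external sources and staged.
source_rows = [
    {"area": "sales", "metric": "orders", "value": "12"},
    {"area": "hr", "metric": "headcount", "value": "40"},
    {"area": "sales", "metric": "returns", "value": "1"},
]


def build_data_mart(rows, business_area):
    """Load staged rows for a single business area straight into a data mart."""
    return [
        {"area": business_area, "metric": r["metric"], "value": float(r["value"])}
        for r in rows
        if r["area"] == business_area
    ]


def extend_warehouse(warehouse, mart):
    """Extend the warehouse with one more completed data mart."""
    return warehouse + mart


# The first mart is built and can serve reports before any warehouse exists.
sales_mart = build_data_mart(source_rows, "sales")
sales_report = {row["metric"]: row["value"] for row in sales_mart}

# The warehouse is created afterwards and extended mart by mart.
warehouse = extend_warehouse([], sales_mart)
hr_mart = build_data_mart(source_rows, "hr")
warehouse = extend_warehouse(warehouse, hr_mart)
print(sales_report, len(warehouse))
```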