Introduction To Data Warehousing
Manish Bhardwaj
Student of Information Technology
Ch.Brahm Prakash Government Engineering College, New Delhi
E-mail:[email protected]
In reality, even though the name first appeared in a 1988 IBM Systems Journal article, “An Architecture for a Business and Information System,” Bill Inmon, who is considered the “Father of Data Warehousing,” used a similar term as far back as the 1970s while working as a data professional and becoming an expert in relational data modeling. His experience in database technology and in developing data warehouses led to his first company, Prism Solutions, in 1991. There he released the Prism Warehouse Manager product, one of the first products for creating and running a data warehouse.

Over his career, Inmon has written books, founded a new company where he developed a methodology for achieving “data integration,” and been a keynote speaker at industry conferences and trade shows. He has also held seminars on developing data warehouses. His 1992 book, “Building the Data Warehouse,” is still highly regarded by IT professionals interested in the database world.

Even though the original data warehouse concept was identified by Bill Inmon, the technology advanced as a result of Ralph Kimball’s dimensional modeling concept for data warehouse design. His series of “Data Warehouse Toolkit” books, the growing interest in and importance of unstructured data, and improvements in database technology that add value to business operations have also driven changes in data warehousing.

The 1990s introduced operational business intelligence and unstructured content, alongside structured data sources, to increase the speed of delivery, to permit less-structured decision-making processes, and to reflect the requirements of present-day businesses.

Today, data warehousing is evolving to meet the growing demands of professionals worldwide, but the groundwork laid by Bill Inmon and Ralph Kimball still influences today’s practices.

Inmon’s work in support of large, centralized data warehouses and Kimball’s integrated systems of smaller data marts still influence today’s architectures. While larger businesses may benefit from Inmon’s data warehouse approach, smaller businesses might profit from Kimball’s approach, which normally requires a smaller budget to implement.

The development of data warehousing today is also driven by users’ need for real-time, on-the-go access to information for research and decision-making, as well as by advances in technology and the growth of cloud computing. Governance and data quality also receive attention today, but the key element, whether one follows Inmon’s or Kimball’s approach, remains the ability to integrate the data warehouse with the existing business data architecture.[4]

II. NEED FOR A DATA WAREHOUSE

Data warehousing is a vital component, and in most cases the foundation, of BI architecture. The data warehouse exists to answer the questions users have about the business, the performance of its various operations, business trends, and what can be done to improve the business. The main reasons for needing a data warehouse are highlighted below:

A. Data Integration

Even if you are a small credit union, your enterprise data flows through and lives in a variety of in-house and external systems. You want to ask questions that cut across those slices of key information (referred to as Key Performance Indicators, or KPIs), such as: what is the member profitability, or the member value attrition? You also want to be able to examine the answers across all products by location, time and channel. You know that all the required data probably exists, but it is not integrated and organized in a way that lets you get the answers easily.

Perhaps your IT staff has been producing the reports you need each time through a series of manual and automated steps: extracting the data from one source, sorting and merging it with data from other sources, manually scrubbing and enriching the data, and then running reports against it. You wonder whether there ought to be a better and more reliable way of doing this. A data warehouse serves not only as a repository for historical data but also as an excellent data integration platform. The data in the data warehouse is integrated, subject oriented, time-variant and non-volatile, which enables you to get a 360° view of your organization.

B. Advanced Reporting & Analysis

The data warehouse is designed specifically to support querying, reporting and analysis tasks. The data model is flattened (denormalized) and structured by subject areas to make it easier for users to obtain even complex summarized information with a relatively simple query and to carry out multi-dimensional analysis. This has two powerful benefits: multi-level trend analysis and end-user empowerment.

Multi-level trend analysis provides the capability to analyze key trends at every level across many different dimensions, e.g., Organization, Product, Location, Channel and Time, and the hierarchies within them. Most reporting, data analysis, and visualization tools take advantage of the underlying data model to provide powerful capabilities such as drill-down, roll-up, drill-across and various ways of slicing and dicing data.

The flattened data model makes it much easier for users to understand the data and write queries, rather than work with potentially hundreds of tables and write long queries with complex table joins and clauses.

C. Knowledge Discovery and DSS

Knowledge discovery and data mining (KDD) is the automatic extraction of non-obvious, hidden knowledge from large volumes of data. For example, classification
models could be used to classify members into low, medium and high lifetime value. Instead of coming up with a one-size-fits-all product, the membership can be divided into different clusters based on member profile using clustering models, and products could be tailored for each cluster. Affinity groupings could be used to identify better product bundling strategies.

These KDD applications employ various statistical and data mining techniques and rely on subject-oriented, summarized, cleansed and “de-noised” data, which a well-designed data warehouse can readily provide.

The data warehouse also enables an Executive Information System (EIS). Executives cannot be expected to sift through many different reports trying to get a holistic picture of the organization’s performance and make decisions. They need the KPIs delivered to them.

Some of these KPIs may require cross-product or cross-departmental analysis, which may be too labor-intensive to perform manually. This is especially relevant to relationship marketing and profitability analysis. The data in the data warehouse is already prepared and structured to support this kind of analysis.

D. Performance

Lastly, the performance of transactional systems and query response time make the case for a data warehouse. Transactional systems are meant to do just that, process transactions efficiently, and hence are designed to optimize frequent database reads and writes. The data warehouse, on the other hand, is designed to optimize frequent complex querying and analysis. Some ad hoc queries and interactive analyses that could be performed in a few seconds to minutes on a data warehouse could take a heavy toll on the transactional systems and drag their performance down. Holding historical data in transactional systems for long periods of time could also interfere with their performance. Hence, historical data needs to find its place in the data warehouse. [5]

III. ARCHITECTURE

Different data warehousing systems have different structures. Some may have an ODS (operational data store), while others may have multiple data marts. Some may have a small number of data sources, while others may have dozens. In general, all data warehouse systems have the following layers:

A. Data Source Layer:

It represents the different data sources that supply data to the data warehouse. The data source can be in any format: a plain text file, a relational database, another type of database, an Excel file, etc., can all act as a data source.

Many different types of data can serve as a data source: operational data (such as sales data, HR data, product data, inventory data, marketing data, and systems data), web server logs with user browsing data, and internal market research data. All these data sources together form the Data Source Layer.

B. Data Extraction Layer:

Data gets pulled from the data source into the data warehouse system.

Staging Area:

This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart.

C. ETL Layer:

This is where logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens.

D. Data Storage Layer:

This is where the transformed and cleansed data resides. Based on scope and functionality, three types of entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.

E. Data Logic Layer:

This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but do affect what the report looks like.

Figure 1 below describes the Data Warehousing Architecture:
Fig.2: Three-level architecture for a data warehousing system [7]

[9] Meenakshi Arora, Anjana Gosain, University School of Information Technology, Guru Gobind Singh Indraprastha University, Delhi, India, “Schema Evolution for Data Warehouse: A Survey”, International Journal of Computer Applications (0975-8887), Volume 22, No. 6, May 2011

[10] Paulraj Ponniah, “Data Warehousing Fundamentals”

H. System Operations Layer:

This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history.[6]
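
The flow of data through the layers described in this section can be illustrated with a minimal sketch in Python. It is an illustration under stated assumptions, not a reference design: the two sources (`core_banking_rows`, `web_channel_rows`), the fact table name `fact_balance`, the `ops_log` list, and the sample values are all hypothetical, and an in-memory SQLite database stands in for the data storage layer. The sketch extracts rows into a staging list, cleanses them in an ETL step, loads them into one flattened table, records job status as a system operations layer might, and ends with a roll-up by location.

```python
import sqlite3

# Data source layer: two hypothetical sources with inconsistent formatting.
core_banking_rows = [
    {"member": " Alice ", "location": "delhi", "product": "Savings", "balance": "1200.50"},
    {"member": "Bob", "location": "Mumbai", "product": "Loan", "balance": "-300"},
]
web_channel_rows = [
    {"member": "alice", "location": "Delhi", "product": "Savings", "balance": "99.50"},
    {"member": "Carol", "location": "Mumbai", "product": "Savings", "balance": "n/a"},
]

ops_log = []  # system operations layer: one status record per job (hypothetical format)


def extract(sources):
    """Data extraction layer: pull rows from every source into a staging list."""
    staging = []
    for source_name, rows in sources.items():
        for row in rows:
            staging.append({**row, "source": source_name})
    ops_log.append({"job": "extract", "status": "ok", "rows": len(staging)})
    return staging


def transform(staging):
    """ETL layer: cleanse staged rows and convert types for analysis."""
    clean, rejected = [], 0
    for row in staging:
        try:
            balance = float(row["balance"])
        except ValueError:
            rejected += 1  # drop rows whose balance cannot be parsed
            continue
        clean.append((
            row["member"].strip().title(),
            row["location"].strip().title(),
            row["product"].strip().title(),
            balance,
            row["source"],
        ))
    ops_log.append({"job": "transform", "status": "ok", "rows": len(clean), "rejected": rejected})
    return clean


def load(conn, rows):
    """Data storage layer: load cleansed rows into one flattened fact table."""
    conn.execute(
        "CREATE TABLE fact_balance "
        "(member TEXT, location TEXT, product TEXT, balance REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO fact_balance VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    ops_log.append({"job": "load", "status": "ok", "rows": len(rows)})


def rollup_by_location(conn):
    """Roll-up: summarize balances per location with a single query."""
    cursor = conn.execute(
        "SELECT location, SUM(balance) FROM fact_balance "
        "GROUP BY location ORDER BY location"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
staged = extract({"core_banking": core_banking_rows, "web_channel": web_channel_rows})
load(conn, transform(staged))
totals = rollup_by_location(conn)
print(totals)
print(ops_log)
```

With these sample rows, the row whose balance cannot be parsed is rejected during the ETL step, and the roll-up returns one total per location. Because the fact table is flattened, the summary needs a single `GROUP BY` rather than joins across several source tables, which is the point made earlier about denormalized models.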
IV. CONCLUSION: