Lecture 1,2 Introduction To DM
Database Management System
A software system that is used to create, maintain, and provide
controlled access to user databases
Figure 1-5 Components of the Database Environment
Components of the
Database Environment
• CASE Tools–computer-aided software engineering
• Repository–centralized storehouse of metadata
• Database Management System (DBMS) –
software for managing the database
• Database–storehouse of the data
• Application Programs–software using the data
• User Interface–text and graphical displays to users
• Data/Database Administrators–personnel
responsible for maintaining the database
• System Developers–personnel responsible for
designing databases and software
• End Users–people who use the applications and
databases
The Range of Database
Applications
• Personal databases
• Two-tier Client/Server databases
• Multitier Client/Server databases
• Enterprise applications
– Enterprise resource planning (ERP) systems
– Data warehousing implementations
Figure 1-8a Evolution of database technologies
• Data Warehouse: an introduction
Data Marts
From a data warehouse, data flows to various departments for
their customized DSS usage. These individual departmental
components are called data marts.
A data mart is a subset of a data warehouse, and data marts are
much more popular than data warehouses.
Reasons:
As the data warehouse grows, it becomes more complex.
The cost of processing the data also increases as the volume increases.
The data becomes harder to customize.
When the data warehouse is small, the DSS can easily summarize the data.
Software that accesses or analyzes a large quantity of data may not be
as easy to use as software that processes a small amount of data.
The department that owns the data mart can easily customize the data.
The processing load (and the risk of overload) is also very limited in a data mart.
A data mart can also be fed by data from external sources.
Loading a Data Mart
The data mart is loaded with data from a data warehouse by
means of a load program.
The main considerations for a load program are:
• Frequency
• Total or partial refreshment
• Customization of data from the warehouse (selection)
• Merging of data
• Summarization
• Efficiency
• Integrity of data
• Data relationship
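Several of these considerations can be illustrated with a small load program. The following is only a minimal sketch, assuming both the warehouse and the mart are SQLite databases; the table names `sales` and `sales_summary`, the columns, and the function `load_sales_mart` are hypothetical and chosen for illustration. It shows selection (one region), summarization (totals per product), and a total refresh done in a single transaction to protect the integrity of the data.

```python
import sqlite3

def load_sales_mart(warehouse, mart, region):
    """Totally refresh a hypothetical departmental mart from the warehouse."""
    # Selection and summarization: keep one region, total the amounts per product.
    rows = warehouse.execute(
        "SELECT product, SUM(amount) FROM sales "
        "WHERE region = ? GROUP BY product ORDER BY product",
        (region,),
    ).fetchall()
    # Total refresh in one transaction, so the mart is never left half-loaded.
    with mart:
        mart.execute(
            "CREATE TABLE IF NOT EXISTS sales_summary "
            "(product TEXT PRIMARY KEY, total REAL)"
        )
        mart.execute("DELETE FROM sales_summary")
        mart.executemany("INSERT INTO sales_summary VALUES (?, ?)", rows)
    return len(rows)

def demo_load():
    """Build a tiny in-memory warehouse and load the 'north' mart from it."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("north", "pen", 10.0), ("north", "pen", 5.0),
         ("north", "ink", 2.0), ("south", "pen", 7.0)],
    )
    warehouse.commit()
    mart = sqlite3.connect(":memory:")
    load_sales_mart(warehouse, mart, "north")
    return mart.execute(
        "SELECT product, total FROM sales_summary ORDER BY product"
    ).fetchall()
```

A partial refresh would replace the `DELETE` with an insert or update of only the rows that changed since the last load. How often the function runs (the frequency) is decided outside the load program itself.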
Metadata for Data Mart
Metadata describes the details (properties) of the data in a
data warehouse or data mart.
The components of metadata are:
• Description of the source of the data.
• Description of any customization that took place as
the data passed from the data warehouse into the data mart.
• Information about the data mart itself (its tables,
relationships, etc.).
• Definitions of all types.
Metadata is created and updated by the load programs that
move the data from the data warehouse to the data mart.
The linkage and relationship between the metadata of the
warehouse and the metadata of the mart must be well
established and well understood by the manager or analyst
using the metadata.
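As an illustration only, the metadata components listed above could be kept in a small record that the load program updates on every run. The class `MartMetadata`, its fields, and the sample values are assumptions made for this sketch, not a standard metadata format.

```python
from dataclasses import dataclass, field

@dataclass
class MartMetadata:
    """Hypothetical metadata record for one data mart."""
    source: str                                          # where the data came from
    customizations: list = field(default_factory=list)   # changes made on the way in
    tables: dict = field(default_factory=dict)           # table name -> column names

    def record_load(self, table, columns, customization):
        """Called by the load program each time it loads a table."""
        self.tables[table] = list(columns)
        if customization not in self.customizations:
            self.customizations.append(customization)

meta = MartMetadata(source="warehouse.sales")
meta.record_load("sales_summary", ["product", "total"],
                 "filtered to one region; amounts summed per product")
# Loading the same table again updates the entry instead of duplicating it.
meta.record_load("sales_summary", ["product", "total"],
                 "filtered to one region; amounts summed per product")
```

Keeping the `source` field explicit is one simple way to preserve the link between the mart's metadata and the warehouse's metadata.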
Data Model for a Data Mart
For a large data mart which may have some processing
involved, a formal data model is required.
For a simple small data mart which may not have any
processing involved, no data model is required.
Maintenance of a Data Mart
Periodic maintenance of a data mart means loading,
refreshing, and purging the data in it.
• daily basis (weather forecast)
• weekly basis (prices)
• monthly basis
• quarterly basis
• yearly basis
• after every 10 years (census)
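The refresh and purge steps can be sketched as two small checks. This is a hedged example: the function names, the `loaded_on` field, and the interval and retention values are hypothetical, and a real mart would apply the same logic inside its DBMS.

```python
from datetime import date, timedelta

def due_for_refresh(last_loaded, today, interval_days):
    """True when the mart's refresh interval (1 = daily, 7 = weekly, ...) has elapsed."""
    return (today - last_loaded).days >= interval_days

def purge(rows, today, keep_days):
    """Keep only rows loaded within the retention window; older rows are purged."""
    cutoff = today - timedelta(days=keep_days)
    return [row for row in rows if row["loaded_on"] >= cutoff]

today = date(2024, 3, 15)
rows = [
    {"item": "pen", "loaded_on": date(2024, 3, 14)},
    {"item": "ink", "loaded_on": date(2024, 1, 1)},
]
kept = purge(rows, today, keep_days=30)
```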
Nature of Data in a Data Mart
Data can be:
• Detailed
• A summary
• Ad hoc
Software Components for a Data Mart
Software found with a data
mart includes:
• DBMS
• Access and analysis
• Software for purging
• Software for metadata management
External Data
• External data that is required by more than
one data mart should first be placed in the
data warehouse itself and then moved to the
data marts that require it.
• This avoids duplication, because the data is centralized.
• Details about the external data, in addition to
the data itself, must also be stored.
Performance
Good performance can be achieved in
a data mart environment by:
– Extensive indexing
– Using star joins
– Limiting the volume of the data
– Creating arrays of data
– Creating profile records
– Creating pre-joined tables
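Three of these techniques (indexing, star joins, and pre-joined tables) can be shown together in a short sketch. It assumes an SQLite mart with a hypothetical `sales` fact table and two hypothetical dimension tables, `region` and `item`; none of these names or values come from the lecture.

```python
import sqlite3

def build_star():
    """Create a tiny star schema plus a pre-joined copy of it."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE region (rid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE item   (iid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales  (rid INTEGER, iid INTEGER, amount REAL);
        -- Indexing: the fact table is indexed on its dimension keys.
        CREATE INDEX idx_sales_dims ON sales (rid, iid);
        INSERT INTO region VALUES (1, 'north'), (2, 'south');
        INSERT INTO item   VALUES (1, 'pen'), (2, 'ink');
        INSERT INTO sales  VALUES (1, 1, 10.0), (1, 1, 5.0), (1, 2, 2.0), (2, 1, 7.0);
        -- Pre-joined table: the join is paid for once, at load time.
        CREATE TABLE sales_prejoined AS
            SELECT r.name AS region, i.name AS item, s.amount
            FROM sales s
            JOIN region r ON r.rid = s.rid
            JOIN item i ON i.iid = s.iid;
    """)
    return db

# Star join: the central fact table joined to each dimension table.
STAR_JOIN = """
    SELECT r.name, i.name, SUM(s.amount)
    FROM sales s
    JOIN region r ON r.rid = s.rid
    JOIN item i ON i.iid = s.iid
    GROUP BY r.name, i.name
    ORDER BY r.name, i.name
"""

# The same question answered from the pre-joined table, with no join at query time.
PREJOINED = """
    SELECT region, item, SUM(amount)
    FROM sales_prejoined
    GROUP BY region, item
    ORDER BY region, item
"""

db = build_star()
star_result = db.execute(STAR_JOIN).fetchall()
prejoined_result = db.execute(PREJOINED).fetchall()
```

Both queries return the same totals. The pre-joined table trades extra storage and load-time work for simpler, join-free queries.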
Star Schema for Multidimensional View
Figure: star schema (recoverable labels: Region, Products, Item (dimension), Iid)
Star Schema for Multidimensional View cont...