MAHARANA PRATAP GROUP OF INSTITUTIONS

KOTHI MANDHANA, KANPUR


(Approved by AICTE, New Delhi and Affiliated to Dr. AKTU, Lucknow)

Digital Notes
[Department of Computer Applications]

Subject Name : Knowledge Management


Subject Code : BCA-5001
Course : BCA
Branch : BCA
Semester : Vth
Prepared by : Mr. Sandeep Tripathi
Unit - 3

Multi-Dimensional Analysis

1. Data Mining
Data mining is the process of examining data from various perspectives to uncover hidden patterns and categorize them into useful information. The data is collected and assembled in particular areas such as data warehouses, where efficient analysis and data mining algorithms support decision making and other data requirements, ultimately helping to cut costs and generate revenue.
Data mining is the act of automatically searching large stores of information to find trends and patterns that go beyond simple analysis procedures. It uses complex mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also referred to as Knowledge Discovery in Databases (KDD), although strictly speaking it is one step of the KDD process described in Section 2.
Data mining is similar to data science: it is carried out by a person, in a specific situation, on a particular data set, with an objective. It includes various types of services such as text mining, web mining, audio and video mining, pictorial data mining, and social media mining. It is done through software that may be simple or highly specialized. By outsourcing data mining, the work can be done faster and at lower operating cost.

1.1 Data Mining Architecture

The data mining architecture consists of the following main elements: Data Warehouse, Data Mining Engine, Pattern Evaluation, User Interface, and Knowledge Base.

Data Warehouse:
A data warehouse is a place that stores information collected from multiple sources under a unified schema. Information stored in a data warehouse is critical to organizations for the decision-making process.

Data Mining Engine:


The data mining engine is the core component of the data mining process. It consists of various modules used to perform tasks such as clustering, classification, prediction, and correlation analysis.

Pattern Evaluation:
Pattern evaluation works with the data mining engine to identify the interesting patterns among those the engine finds, typically by applying interestingness measures.

User Interface:
The user interface provides communication between the user and the data mining system. It allows users to work with the system easily even if they do not have detailed knowledge of the system.

Knowledge Base:
The knowledge base contains domain knowledge that is very important in the data mining process. It provides input to the data mining engine and guides the engine in the process of pattern search.

1.2 Types of Databases

Relational Database:
A relational database is a collection of data sets formally organized into tables, records (rows), and columns, from which data can be accessed in various ways without having to reorganize the database tables. Tables convey and share information, which facilitates data searchability, reporting, and organization.
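As an illustration of tables that answer different questions without being reorganized, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names (customer, purchase, cust_id, amount) and the data are hypothetical.

```python
import sqlite3

# In-memory relational database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute(
    "CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY, "
    "cust_id INTEGER REFERENCES customer(cust_id), amount REAL)"
)
cur.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(1, "Asha", "Delhi"), (2, "Ravi", "Mumbai")])
cur.executemany("INSERT INTO purchase VALUES (?, ?, ?)",
                [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 400.0)])

# The same two tables answer different questions without being reorganized:
# total spend per city (join + group by) ...
spend_by_city = cur.execute(
    """SELECT c.city, SUM(p.amount)
       FROM customer AS c JOIN purchase AS p ON c.cust_id = p.cust_id
       GROUP BY c.city ORDER BY c.city"""
).fetchall()

# ... and the number of purchases per customer.
purchases_per_customer = cur.execute(
    """SELECT c.name, COUNT(p.purchase_id)
       FROM customer AS c LEFT JOIN purchase AS p ON c.cust_id = p.cust_id
       GROUP BY c.name ORDER BY c.name"""
).fetchall()

print(spend_by_city)           # [('Delhi', 350.0), ('Mumbai', 400.0)]
print(purchases_per_customer)  # [('Asha', 2), ('Ravi', 1)]
```

The join is computed at query time, so new questions only need new queries, not a new table layout.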


Data warehouses:
A data warehouse is a technology that collects data from various sources within the organization to provide meaningful business insights. The huge amount of data comes from multiple places such as marketing and finance. The extracted data is used for analytical purposes and helps in decision-making for a business organization. The data warehouse is designed for the analysis of data rather than for transaction processing.

Data Repositories:
The term data repository generally refers to a destination for data storage. However, many IT professionals use the term more specifically to refer to a particular kind of setup within an IT structure, for example, a group of databases in which an organization keeps various kinds of information.

Object-Relational Database:
A combination of the object-oriented database model and the relational database model is called an object-relational model. It supports classes, objects, inheritance, and so on.
One of the primary objectives of the object-relational data model is to close the gap between relational databases and the object-oriented practices frequently used in many programming languages, for example, C++, Java, and C#.

Transactional Database:
A transactional database refers to a database management system (DBMS) that can undo (roll back) a database transaction if it is not performed appropriately. Although this was a unique capability a long time ago, today most relational database systems support transactional database activities.
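The "undo" behaviour can be seen with SQLite, which supports transactions. This is a minimal sketch using Python's sqlite3 module; the account table, its CHECK constraint, and the transfer() helper are hypothetical and exist only to force a failure part-way through a transaction.

```python
import sqlite3

# Hypothetical account table; the CHECK constraint makes overdrafts fail.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction: all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed on the debit; the earlier credit was rolled back

def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

ok = transfer(conn, 1, 2, 200)       # succeeds
failed = transfer(conn, 2, 1, 1000)  # would overdraw account 2, so it is undone
print(ok, failed, balances(conn))    # True False [(1, 300), (2, 300)]
```

Note that in the failed transfer the credit to account 1 had already executed; the rollback is what removes it.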

1.3 Data Mining Applications


Data mining is primarily used by organizations with intense consumer demands, such as retail, communication, financial, and marketing companies, to determine prices, consumer preferences, product positioning, and the impact on sales, customer satisfaction, and corporate profits. Data mining enables a retailer to use point-of-sale records of customer purchases to develop products and promotions that help the organization attract customers.

Data mining is widely used in the following areas:

Data Mining in Healthcare:


Data mining in healthcare has excellent potential to improve the health system. It uses data and analytics to gain better insights and to identify best practices that enhance health care services and reduce costs. Analysts use data mining approaches such as machine learning, multi-dimensional databases, data visualization, soft computing, and statistics. Data mining can be used to forecast the number of patients in each category, which helps ensure that patients get intensive care at the right place and at the right time. Data mining also enables healthcare insurers to detect fraud and abuse.

Data Mining in Market Basket Analysis:


Market basket analysis is a modeling method based on the hypothesis that if you buy a certain group of products, you are more likely to buy another group of products. This technique may enable the retailer to understand the purchase behavior of a buyer. This information may help the retailer understand the buyer's requirements and alter the store's layout accordingly. Results can also be compared analytically between different stores and between customers in different demographic groups.
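The buy-together hypothesis is usually quantified with support (how often items appear together) and confidence (how often the second item appears given the first), the measures used for association rules. A minimal sketch in plain Python, assuming a hypothetical list of baskets; support() and confidence() are illustrative helper names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets (one set of items per transaction).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "jam"},
    {"milk", "tea"},
    {"bread", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Count how often each pair of items is bought together.
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

print(pair_counts.most_common(1))                  # [(('bread', 'butter'), 3)]
print(support({"bread", "butter"}, baskets))       # 0.6
print(confidence({"butter"}, {"bread"}, baskets))  # 1.0
```

Here every basket with butter also has bread, so the rule "butter => bread" has confidence 1.0; a retailer might use such a rule when deciding shelf placement.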

Data Mining in Education:


Educational data mining (EDM) is a newly emerging field concerned with developing techniques that explore knowledge from the data generated in educational environments. The objectives of EDM include predicting students' future learning behavior, studying the impact of educational support, and promoting learning science. An institution can use data mining to make precise decisions and to predict students' results. With these results, the institution can concentrate on what to teach and how to teach it.

Data Mining in Manufacturing Engineering:


Knowledge is the best asset possessed by a manufacturing company. Data mining tools can be beneficial for finding patterns in a complex manufacturing process. Data mining can be used in system-level design to obtain the relationships between product architecture, product portfolio, and the data needs of customers. It can also be used to forecast the product development period, cost, and expectations, among other tasks.

Data Mining in CRM (Customer Relationship Management):


Customer Relationship Management (CRM) is about acquiring and retaining customers, enhancing customer loyalty, and implementing customer-oriented strategies. To build a good relationship with its customers, a business organization needs to collect and analyze data. With data mining technologies, the collected data can be used for analytics.

Data Mining in Fraud Detection:


Billions of dollars are lost to fraud. Traditional methods of fraud detection are time-consuming and complex. Data mining finds meaningful patterns and turns data into information. An ideal fraud detection system should protect the data of all users. Supervised methods use a collection of sample records, each classified as fraudulent or non-fraudulent. A model is constructed from this data, and the model is then used to identify whether a new record is fraudulent or not.
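The supervised approach (labelled sample records, then a model that classifies new records) can be sketched with a very simple classifier. This example swaps in a nearest-centroid classifier purely for brevity; real fraud detection systems typically use richer models. The two features and all records are hypothetical.

```python
from math import dist
from statistics import mean

# Hypothetical labelled sample records:
# (amount in thousands of rupees, transactions in the last hour) -> label
training = [
    ((1.2, 1), "non-fraudulent"),
    ((0.8, 2), "non-fraudulent"),
    ((2.0, 1), "non-fraudulent"),
    ((9.5, 7), "fraudulent"),
    ((8.0, 9), "fraudulent"),
]

def fit_centroids(records):
    """Training step: average the feature vectors of each labelled class."""
    by_label = {}
    for features, label in records:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_label.items()}

def classify(features, centroids):
    """Assign a new record to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))

centroids = fit_centroids(training)
print(classify((1.5, 1), centroids))  # non-fraudulent
print(classify((7.0, 8), centroids))  # fraudulent
```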

Data Mining in Lie Detection:


Apprehending a criminal is comparatively easy, but bringing out the truth from them is a very challenging task. Law enforcement agencies may use data mining techniques to investigate offenses, monitor suspected terrorist communications, and so on. This also involves text mining, which seeks meaningful patterns in data that is usually unstructured text. Information collected from previous investigations is compared, and a model for lie detection is constructed.

Data Mining in Financial Banking:


The digitalization of the banking system generates an enormous amount of data with every new transaction. Data mining techniques can help bankers solve business-related problems in banking and finance by identifying trends, causalities, and correlations in business information and market costs that are not immediately evident to managers or executives, because the data volume is too large or the data is produced too rapidly for experts to examine on screen. Managers can use these findings to better target, acquire, retain, segment, and maintain profitable customers.

2. Knowledge Discovery Process (KDD Process)
Data mining is the core part of the knowledge discovery process.
KDD is the process of finding knowledge in data. It does this by using data mining methods (algorithms) to extract the desired knowledge from large amounts of data.

Figure: Knowledge Discovery Process (KDD)


The Knowledge Discovery Process consists of the following steps:

1 Data cleaning -
The first step in the Knowledge Discovery Process is Data Cleaning, in which noise and inconsistent data are removed.

2 Data Integration -
The second step is Data Integration, in which multiple data sources are combined.

3 Data Selection -
The third step is Data Selection, in which data relevant to the analysis task are retrieved from the database.

4 Data Transformation -
In Data Transformation, data are transformed into forms appropriate for mining by performing
summary or aggregation operations.

5 Data Mining -
In Data Mining, data mining methods (algorithms) are applied in order to extract data patterns.

6 Pattern Evaluation -
In Pattern Evaluation, the truly interesting patterns are identified based on interestingness measures.

7 Knowledge Presentation -
In Knowledge Presentation, the mined knowledge is presented to the user using knowledge representation and visualization techniques.
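The seven steps can be chained into a toy pipeline to show how the output of each step feeds the next. This is a minimal sketch, assuming two hypothetical in-memory sources of sales records; the function names are illustrative, and the mining and evaluation steps are deliberately trivial (deviation from an average, filtered by a fixed threshold) so that the flow stays visible.

```python
from statistics import mean

# Two hypothetical data sources with raw sales records (all values are strings).
store_a = [
    {"cust": "C1", "city": "Delhi", "amount": "250"},
    {"cust": "C2", "city": " delhi", "amount": "350"},  # inconsistent spelling
    {"cust": "C3", "city": "Mumbai", "amount": "n/a"},  # noisy value
]
store_b = [
    {"cust": "C4", "city": "Mumbai", "amount": "600"},
    {"cust": "C5", "city": "Kanpur", "amount": "100"},
    {"cust": "C6", "city": "Pune", "amount": "900"},
]

def clean(rows):
    """1. Data cleaning: drop unparseable amounts, normalise city names."""
    cleaned = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # noise: skip the record
        cleaned.append({"cust": r["cust"], "city": r["city"].strip().title(), "amount": amount})
    return cleaned

def integrate(*sources):
    """2. Data integration: combine multiple sources into one data set."""
    return [row for source in sources for row in source]

def select(rows, cities):
    """3. Data selection: keep only records relevant to the analysis task."""
    return [r for r in rows if r["city"] in cities]

def transform(rows):
    """4. Data transformation: aggregate to the average amount per city."""
    groups = {}
    for r in rows:
        groups.setdefault(r["city"], []).append(r["amount"])
    return {city: mean(amounts) for city, amounts in groups.items()}

def mine(avg_by_city):
    """5. Data mining (deliberately trivial): each city's deviation from the overall average."""
    overall = mean(list(avg_by_city.values()))
    return {city: avg - overall for city, avg in avg_by_city.items()}

def evaluate(patterns, threshold):
    """6. Pattern evaluation: keep only deviations large enough to be interesting."""
    return {city: d for city, d in patterns.items() if abs(d) >= threshold}

def present(interesting):
    """7. Knowledge presentation: turn the patterns into readable statements."""
    return [f"{city}: {'above' if d > 0 else 'below'} average by {abs(d):.0f}"
            for city, d in sorted(interesting.items())]

data = integrate(clean(store_a), clean(store_b))
relevant = select(data, {"Delhi", "Mumbai", "Kanpur"})
report = present(evaluate(mine(transform(relevant)), threshold=100))
print(report)  # ['Kanpur: below average by 233', 'Mumbai: above average by 267']
```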

3. Data Mining Techniques

Figure: Data Mining Techniques

1. Classification:
Classification is used to retrieve important and relevant information about data and metadata. This data mining method assigns data items to different, predefined classes.

2. Clustering:
Clustering analysis is a data mining technique used to identify groups of data items that are similar to each other. This process helps to understand the differences and similarities between the data.

3. Regression:
Regression analysis is the data mining method of identifying and analyzing the relationship between variables. It is used to estimate the value of a specific variable given the values of other variables.

4. Association Rules:
This data mining technique helps to find associations between two or more items. It discovers hidden patterns in the data set.

5. Outlier detection:
This type of data mining technique refers to the observation of data items in the dataset that do not match an expected pattern or expected behavior. This technique can be used in a variety of domains, such as intrusion detection, fraud detection, or fault detection. Outlier detection is also called outlier analysis or outlier mining.

6. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in transaction data over a certain period.

7. Prediction:
Prediction uses a combination of other data mining techniques such as trends, sequential patterns, clustering, and classification. It analyzes past events or instances in the right sequence to predict a future event.

3.1 Benefits of Data Mining:

• Data mining helps companies obtain knowledge-based information.
• Data mining helps organizations make profitable adjustments in operations and production.
• Data mining is a cost-effective and efficient solution compared with other statistical data applications.
• Data mining helps with the decision-making process.
• It facilitates automated prediction of trends and behaviors as well as automated discovery of hidden patterns.
• It can be implemented in new systems as well as existing platforms.
• It is a fast process, which makes it easy for users to analyze huge amounts of data in less time.

4. Multidimensional Data Model


A multidimensional model views data in the form of a data cube. A data cube enables data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.
The dimensions are the perspectives or entities with respect to which an organization keeps records. A multidimensional database makes it possible to answer complex business questions quickly and reliably. The Multidimensional Data Model can be defined as a way to arrange the data in the database, to help structure and organize the contents of the database. Unlike a one-dimensional system such as a list, the Multidimensional Data Model can include two, three, or more dimensions of objects from the database structure.
In organizations, it is usually used for objective analysis and report production, which can serve as the primary source for important decision-making processes. This model is typically applied in applications that use OLAP (Online Analytical Processing) techniques.

4.1 How does the Multidimensional Data Model work?
The Multidimensional Data Model, like every other system, usually follows a set of predefined steps, so that a consistent pattern is preserved across the industry and database structures that have already been built can be reused. A project should go through the following steps to construct a multidimensional data model:
• Gathering the requirements from the client
• Categorizing the various modules of the system
• Identifying the dimensions on which the system needs to be designed
• Drafting the real-time dimensions and their corresponding properties
• Discovering the facts from the dimensions and properties listed above
• Constructing the schema to place the data, using the information gathered in the above steps

For example, a shop may create a sales data warehouse to keep records of the store's sales for the dimensions time, item, and location. These dimensions allow the store to keep track of things such as the monthly sales of items and the locations at which the items were sold. Each dimension has a table related to it, called a dimension table, which describes the dimension further. For example, a dimension table for item may contain the attributes item_name, brand, and type.
A multidimensional data model is organized around a central theme, for example, sales. This theme is represented by a fact table. Facts are numerical measures. The fact table contains the names of the facts (measures) as well as keys to each of the related dimension tables.
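The sales example can be laid out as a small star schema: one fact table holding keys and a numeric measure, surrounded by dimension tables. A minimal sketch with Python's sqlite3 module; the attributes item_name, brand, type, and the measure rupee_sold come from these notes, while the table names, key columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe each dimension further.
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: a key to each dimension table plus the numeric measure.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    rupee_sold   REAL
);

INSERT INTO dim_time VALUES (1, 'Q1', 2023), (2, 'Q2', 2023);
INSERT INTO dim_item VALUES (1, 'Phone X', 'BrandA', 'Mobile'), (2, 'Router Y', 'BrandB', 'Modem');
INSERT INTO dim_location VALUES (1, 'Delhi', 'India'), (2, 'Mumbai', 'India');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 120.0), (1, 2, 1, 40.0), (2, 1, 1, 150.0),
    (1, 1, 2, 200.0), (2, 2, 2, 60.0);
""")

# Quarterly sales per item type in Delhi: join the fact table to its dimensions.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.rupee_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t     ON f.time_key = t.time_key
    JOIN dim_item AS i     ON f.item_key = i.item_key
    JOIN dim_location AS l ON f.location_key = l.location_key
    WHERE l.city = 'Delhi'
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(rows)  # [('Q1', 'Mobile', 120.0), ('Q1', 'Modem', 40.0), ('Q2', 'Mobile', 150.0)]
```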

Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in the table. In this 2D representation, the sales for Delhi are shown for the time dimension (organized in quarters) and the item dimension (classified according to the types of items sold). The fact or measure displayed is rupee_sold (in thousands).

Now suppose we want to view the sales data with a third dimension. For example, the data may be considered according to time and item, as well as location, for the cities Chennai, Kolkata, Mumbai, and Delhi. These 3D data are shown in the table, where they are represented as a series of 2D tables.
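To make the "series of 2D tables" idea concrete, a 3D cube can be held as a mapping from (time, item, location) coordinates to the measure and then split into one 2D table per city. A minimal sketch in plain Python; the figures are hypothetical (the values from the table in these notes are not reproduced) and only two of the four cities are used.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (quarter, item type, city) -> rupee_sold (in thousands).
cube = {
    ("Q1", "Mobile", "Delhi"): 120, ("Q1", "Modem", "Delhi"): 40,
    ("Q2", "Mobile", "Delhi"): 150, ("Q2", "Modem", "Delhi"): 35,
    ("Q1", "Mobile", "Mumbai"): 200, ("Q1", "Modem", "Mumbai"): 55,
    ("Q2", "Mobile", "Mumbai"): 180, ("Q2", "Modem", "Mumbai"): 60,
}

def as_2d_tables(cube):
    """Split the 3-D cube into one 2-D (quarter x item) table per city."""
    tables = defaultdict(lambda: defaultdict(dict))
    for (quarter, item, city), value in cube.items():
        tables[city][quarter][item] = value
    return tables

tables = as_2d_tables(cube)
for city in sorted(tables):
    print(city)
    for quarter in sorted(tables[city]):
        print("  ", quarter, tables[city][quarter])
```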

Conceptually, the same data may also be represented in the form of a 3D data cube, as shown in the figure.

5. Data Warehousing - OLAP


An Online Analytical Processing (OLAP) server is based on the multidimensional data model. It allows managers and analysts to gain insight into information through fast, consistent, and interactive access to it. This section covers the types of OLAP servers, OLAP operations, and the differences between OLAP and OLTP.

5.1 Types of OLAP Servers


There are four types of OLAP servers:

• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers

Relational OLAP
ROLAP servers are placed between the relational back-end server and the client front-end tools. To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.

ROLAP includes the following:

• Implementation of aggregation navigation logic.
• Optimization for each DBMS back end.
• Additional tools and services.

Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, storage utilization may be low if the data set is sparse. Therefore, many MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.
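The storage-utilization problem can be seen by comparing an array that reserves a slot for every possible cell with a sparse representation that keeps only the non-empty cells. A minimal sketch in plain Python, assuming hypothetical dimensions and sales; real MOLAP engines use specialized array storage and compression, and this is not how any particular server is implemented.

```python
from itertools import product

# Hypothetical dimensions of a sales cube.
quarters = ["Q1", "Q2", "Q3", "Q4"]
items = [f"item{i}" for i in range(50)]
cities = [f"city{c}" for c in range(20)]

# Only three (quarter, item, city) cells actually contain sales: the cube is sparse.
sales = {
    ("Q1", "item3", "city7"): 120.0,
    ("Q2", "item3", "city7"): 90.0,
    ("Q4", "item41", "city0"): 15.0,
}

# Array-based (dense) storage: one slot for every possible cell.
position = {cell: i for i, cell in enumerate(product(quarters, items, cities))}
dense = [0.0] * len(position)
for cell, value in sales.items():
    dense[position[cell]] = value

# Sparse storage: only the non-empty cells are kept, keyed by their coordinates.
sparse = dict(sales)

filled = sum(1 for v in dense if v != 0.0)
print(len(dense), filled, len(sparse))  # 4000 3 3
```

The dense array spends 4000 slots on 3 values, which is why a two-level scheme keeps dense sub-cubes as arrays and handles sparse regions differently.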

Hybrid OLAP
Hybrid OLAP is a combination of ROLAP and MOLAP. It offers the higher scalability of ROLAP and the faster computation of MOLAP. HOLAP servers allow large volumes of detailed information to be stored, while the aggregations are stored separately in a MOLAP store.

Specialized SQL Servers


Specialized SQL servers provide advanced query language and query processing support for
SQL queries over star and snowflake schemas in a read-only environment.

OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on multidimensional data.

Here is the list of OLAP operations:

• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)

Roll-up
Roll-up performs aggregation on a data cube in either of the following ways:
• By climbing up a concept hierarchy for a dimension
• By dimension reduction

The following diagram illustrates how roll-up works.

• Roll-up is performed by climbing up a concept hierarchy for the dimension location.
• The concept hierarchy for location is "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
• The data is then grouped into countries rather than cities.
• When roll-up is performed by dimension reduction, one or more dimensions are removed from the data cube.
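Both forms of roll-up are aggregations and can be sketched over a small cube stored as a Python dict. The city-to-country mapping stands in for one step of the location concept hierarchy; the cities, mapping, and figures are hypothetical, and SUM is assumed as the aggregate function.

```python
from collections import defaultdict

# Hypothetical cube at city level: (quarter, city) -> units sold.
sales_by_city = {
    ("Q1", "Toronto"): 400, ("Q1", "Vancouver"): 600, ("Q1", "Chicago"): 440,
    ("Q2", "Toronto"): 350, ("Q2", "Vancouver"): 500, ("Q2", "Chicago"): 300,
}
# One step of the location concept hierarchy (city < country), assumed for illustration.
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

def roll_up(cube, mapping):
    """Climb the hierarchy: SUM the measure from city level to country level."""
    result = defaultdict(int)
    for (quarter, city), value in cube.items():
        result[(quarter, mapping[city])] += value
    return dict(result)

def roll_up_by_dimension_reduction(cube):
    """Remove the location dimension entirely, leaving totals per quarter."""
    result = defaultdict(int)
    for (quarter, _city), value in cube.items():
        result[quarter] += value
    return dict(result)

print(roll_up(sales_by_city, city_to_country))
# {('Q1', 'Canada'): 1000, ('Q1', 'USA'): 440, ('Q2', 'Canada'): 850, ('Q2', 'USA'): 300}
print(roll_up_by_dimension_reduction(sales_by_city))  # {'Q1': 1440, 'Q2': 1150}
```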
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
The following diagram illustrates how drill-down works.

• Drill-down is performed by stepping down a concept hierarchy for the dimension time.
• The concept hierarchy for time is "day < month < quarter < year".
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
• When drill-down is performed by introducing a new dimension, one or more dimensions are added to the data cube.
• It navigates the data from less detailed data to highly detailed data.

Slice
The slice operation performs a selection on one particular dimension of a given cube and provides a new sub-cube. Consider the following diagram that shows how slice works.
• Here slice is performed for the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube by fixing that one dimension to a single value.
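A slice can be sketched over a cube stored as a dict keyed by coordinate tuples: fixing time = "Q1" keeps the matching cells and drops the time coordinate. The axis order, cities, items, and figures are hypothetical; slice_cube() is an illustrative helper name.

```python
# Hypothetical 3-D cube: (time, location, item) -> units sold.
AXES = ("time", "location", "item")
cube = {
    ("Q1", "Toronto", "Mobile"): 120, ("Q1", "Toronto", "Modem"): 80,
    ("Q1", "Vancouver", "Mobile"): 150, ("Q1", "Vancouver", "Modem"): 90,
    ("Q2", "Toronto", "Mobile"): 110, ("Q2", "Toronto", "Modem"): 70,
    ("Q2", "Vancouver", "Mobile"): 160, ("Q2", "Vancouver", "Modem"): 95,
}

def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value; that dimension drops out of the result."""
    axis = AXES.index(dimension)
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cube.items() if key[axis] == value}

q1 = slice_cube(cube, "time", "Q1")
print(q1)
# {('Toronto', 'Mobile'): 120, ('Toronto', 'Modem'): 80, ('Vancouver', 'Mobile'): 150, ('Vancouver', 'Modem'): 90}
```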
Dice
Dice performs a selection on two or more dimensions of a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.

The dice operation on the cube is based on the following selection criteria, which involve three dimensions:
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
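The three selection criteria above translate directly into a filter over a cube stored as a dict. A minimal sketch in plain Python; the cells outside the criteria (Chicago, Q3, a "Security" item) and all figures are hypothetical, added only so that the filter has something to remove.

```python
# Hypothetical 3-D cube: (time, location, item) -> units sold.
AXES = ("time", "location", "item")
cube = {
    ("Q1", "Toronto", "Mobile"): 120, ("Q1", "Toronto", "Modem"): 80,
    ("Q1", "Vancouver", "Mobile"): 150, ("Q1", "Chicago", "Mobile"): 200,
    ("Q2", "Vancouver", "Modem"): 95, ("Q2", "Toronto", "Security"): 40,
    ("Q3", "Toronto", "Mobile"): 130,
}

def dice(cube, **criteria):
    """Keep the cells whose coordinates satisfy every criterion (dimension -> allowed values)."""
    allowed = {AXES.index(dim): set(values) for dim, values in criteria.items()}
    return {key: v for key, v in cube.items()
            if all(key[axis] in ok for axis, ok in allowed.items())}

sub_cube = dice(cube,
                location={"Toronto", "Vancouver"},
                time={"Q1", "Q2"},
                item={"Mobile", "Modem"})
print(sub_cube)
# {('Q1', 'Toronto', 'Mobile'): 120, ('Q1', 'Toronto', 'Modem'): 80,
#  ('Q1', 'Vancouver', 'Mobile'): 150, ('Q2', 'Vancouver', 'Modem'): 95}
```

Unlike a slice, the result keeps all three dimensions; each one is merely restricted to the allowed values.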

Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide
an alternative presentation of data. Consider the following diagram that shows the pivot
operation.
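Pivoting changes only the presentation, not the data: for a 2D view it swaps rows and columns. A minimal sketch in plain Python, assuming a hypothetical item-by-city view stored as nested dicts; pivot() is an illustrative helper name.

```python
# Hypothetical 2-D view: rows = item, columns = city.
view = {
    "Mobile": {"Toronto": 120, "Vancouver": 150},
    "Modem": {"Toronto": 80, "Vancouver": 90},
}

def pivot(table):
    """Rotate the axes: former columns become rows and former rows become columns."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

print(pivot(view))
# {'Toronto': {'Mobile': 120, 'Modem': 80}, 'Vancouver': {'Mobile': 150, 'Modem': 90}}
```

Pivoting twice returns the original view, which reflects that no values are added, removed, or aggregated.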

OLAP vs OLTP

Sr. No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | Involves historical processing of information. | Involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | Useful in analyzing the business. | Useful in running the business.
4 | It focuses on information out. | It focuses on data in.
5 | Based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | Based on the Entity Relationship Model.
6 | Contains historical data. | Contains current data.
7 | Provides summarized and consolidated data. | Provides primitive and highly detailed data.
8 | Provides a summarized and multidimensional view of data. | Provides a detailed and flat relational view of data.
9 | Number of users is in hundreds. | Number of users is in thousands.
10 | Number of records accessed is in millions. | Number of records accessed is in tens.
11 | Database size is from 100 GB to 1 TB. | Database size is from 100 MB to 1 GB.
12 | Highly flexible. | Provides high performance.

