Solved DM Questions

Data mining solved questions

Uploaded by

chudarybushra

Question 1: What is data mining?

In your answer, address the following:

(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.

Data mining refers to the process of extracting or mining interesting knowledge or patterns from
large amounts of data.

(a) No, data mining is not another hype. "We are living in the information age" is a popular
saying; however, we are actually living in the data age. Terabytes or petabytes of data pour
into our computer networks, the World Wide Web (WWW), and various data storage
devices every day from business, society, science and engineering, medicine, and almost
every other aspect of daily life. Powerful and versatile tools are badly needed to
automatically uncover valuable information from the tremendous amounts of data and to
transform such data into organized knowledge. This necessity has led to the birth of data
mining.

(b) No. Data mining is not a simple transformation of technology developed from databases,
statistics, and machine learning. Instead, it involves an integration, rather than a simple
transformation, of techniques from multiple disciplines such as database technology,
statistics, machine learning, high-performance computing, pattern recognition, neural
networks, data visualization, and information retrieval.

(c) Database technology evolved from primitive file processing to sophisticated database
systems offering data modeling, indexing, query processing, and transaction management. As
huge amounts of data accumulated in databases and data warehouses over the years, retrieving
useful knowledge from them became increasingly complex, and the need for tools that go
beyond simple retrieval toward automated analysis gave rise to data mining, often called the
"knowledge discovery in databases" (KDD) process.

Data mining has its roots in three family lines:

Classical statistics - standard statistical methods for analyzing data and making numerical
predictions.

Artificial intelligence - heuristic, human-like reasoning applied to data problems.

Machine learning - a combination of classical statistics and AI, building systems that learn
and improve from data.

The complexity of retrieving useful information from ever-growing stored data led to the term
"data mining", first coined in the 1990s.

(d) Steps involved in data mining when viewed as a knowledge discovery process:

Data Cleaning - a process that removes or transforms noise and inconsistent data.

Data Integration - where data from heterogeneous data sources is combined for mining
purposes.

Data Selection - where data relevant to the analysis task are retrieved from the database.

Data Transformation - where data is transformed or consolidated into forms suitable for
mining.

Data Mining - an essential process where intelligent and efficient methods are applied in
order to extract patterns.

Pattern Evaluation - a process that identifies the truly interesting patterns representing
knowledge, based on some interestingness measures.

Knowledge Presentation - where visualization and knowledge representation techniques are
used to present the mined knowledge to the user.
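The steps above can be sketched on toy in-memory data. This is a minimal illustrative sketch, not a real mining system; the record layout, the duplicate-id rule, and the "age >= 18" selection criterion are all assumptions made for the example.

```python
# A minimal sketch of the KDD pipeline on toy in-memory records.
# Column names and the selection rule (age >= 18) are illustrative assumptions.

# Raw records from two hypothetical heterogeneous sources
source_a = [{"id": 1, "age": 25}, {"id": 2, "age": None}, {"id": 3, "age": 17}]
source_b = [{"id": 4, "age": 40}, {"id": 1, "age": 25}]  # id 1 duplicated

# 1. Data cleaning: drop records with missing values
cleaned = [r for r in source_a + source_b if r["age"] is not None]

# 2. Data integration: merge the sources, removing duplicate ids
seen, integrated = set(), []
for r in cleaned:
    if r["id"] not in seen:
        seen.add(r["id"])
        integrated.append(r)

# 3. Data selection: keep only records relevant to the analysis task
selected = [r for r in integrated if r["age"] >= 18]

# 4. Data transformation: consolidate into a form suitable for mining
ages = [r["age"] for r in selected]

# 5. Data mining (a trivial stand-in): extract a summary pattern
pattern = sum(ages) / len(ages)  # mean adult age

# 6./7. Pattern evaluation and knowledge presentation
print(f"Mean adult age across sources: {pattern}")  # 32.5
```

Each list comprehension stands in for what would, in practice, be a much larger cleaning, integration, or mining component.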

Question 2: How is a database different from a data warehouse?


Database System:
A database system is used in the traditional way of storing and retrieving data. The major
task of a database system is to perform query processing. These systems are generally
referred to as online transaction processing (OLTP) systems and are used for the day-to-day
operations of an organization.
Data Warehouse:
A data warehouse is a place where huge amounts of data are stored. It is meant for users or
knowledge workers in the role of data analysis and decision making. These systems organize
and present data in different formats and forms in order to serve the needs of specific
users for specific purposes. These systems are referred to as online analytical processing
(OLAP) systems.
Difference between Database System (DB) and Data Warehouse (DW):

1. DB: supports operational processes. DW: supports analysis and performance reporting.

2. DB: captures and maintains the data. DW: explores the data.

3. DB: current data. DW: multiple years of history.

4. DB: data is balanced within the scope of this one system. DW: data must be integrated and
balanced from multiple systems.

5. DB: data is updated when a transaction occurs. DW: data is updated on a schedule.

6. DB: data verification occurs when entry is done. DW: data verification occurs after the
fact.

7. DB: 100 MB to GB. DW: 100 GB to TB.

8. DB: ER based. DW: star/snowflake schema.

9. DB: application oriented. DW: subject oriented.

10. DB: primitive and highly detailed. DW: summarized and consolidated.

11. DB: flat relational. DW: multidimensional.

Second answer:

What is a Database?
A database is a collection of related data which represents some elements of the real world.
It is designed to be built and populated with data for a specific task, and it is a building
block of your data solution.
What is a Data Warehouse?
A data warehouse is an information system which stores historical and cumulative data from
single or multiple sources. It is designed to analyze, report on, and integrate transaction
data from different sources.

A data warehouse eases the analysis and reporting processes of an organization. It also
serves as a single version of the truth for the organization's decision-making and
forecasting processes.
Parameter-by-parameter comparison of Database (DB) and Data Warehouse (DW):

Purpose: DB is designed to record. DW is designed to analyze.

Processing method: DB uses online transaction processing (OLTP). DW uses online analytical
processing (OLAP).

Usage: DB helps to perform fundamental operations for your business. DW allows you to analyze
your business.

Tables and joins: DB tables and joins are complex because they are normalized. DW tables and
joins are simple because they are denormalized.

Orientation: DB is an application-oriented collection of data. DW is a subject-oriented
collection of data.

Storage limit: DB is generally limited to a single application. DW stores data from any
number of applications.

Availability: DB data is available in real time. DW data is refreshed from source systems as
and when needed.

Modeling: DB uses ER modeling techniques for designing. DW uses data modeling techniques for
designing.

Technique: DB captures data. DW analyzes data.

Data type: data stored in a DB is up to date. DW stores current and historical data, which
may not be up to date.

Storage of data: DB uses a flat relational approach for data storage. DW uses a dimensional
approach for the data structure, e.g. star and snowflake schemas.

Query type: DB uses simple transaction queries. DW uses complex queries for analysis
purposes.

Data summary: detailed data is stored in a DB. DW stores highly summarized data.

Question 3: Briefly explain the steps of making a decision tree.


1. Calculate the entropy of the dataset.
2. For each attribute/feature:
2.1. Calculate the entropy for all of its categorical values.
2.2. Calculate the information gain for the feature.
3. Split on the feature with the maximum information gain.
4. Repeat recursively on each subset until the desired tree is obtained.
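Steps 1-3 can be sketched with a small ID3-style helper. The toy "outlook" feature and the yes/no play labels below are illustrative assumptions, not data from the text.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p * log2(p))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy of the whole set minus the weighted entropy of the
    subsets produced by splitting on `feature`."""
    total = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# Toy dataset: "outlook" perfectly separates the two classes
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]

print(entropy(labels))                             # 1.0 (perfectly mixed)
print(information_gain(rows, labels, "outlook"))   # 1.0 (perfect split)
```

A real tree builder would call `information_gain` for every candidate feature, split on the best one, and recurse on each subset (step 4).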
Question 5: Explain the importance of evaluation criteria for classification
methods.

Performance evaluation of a classification model is important for understanding the quality
of the model, for refining the model, and for choosing an adequate model. The performance
evaluation criteria used for classification models are:
• Predictive (classification) accuracy: the ability of the model to correctly predict the
class label of new or previously unseen data; accuracy = % of test-set examples correctly
classified by the classifier.
• Speed: the computational cost involved in generating and using the model.
• Robustness: the ability of the model to make correct predictions given noisy data or data
with missing values.
• Scalability: the ability to construct the model efficiently given large amounts of data.
• Interpretability: the level of understanding and insight provided by the model.
• Simplicity: e.g. decision tree size and rule compactness.
• Domain-dependent quality indicators.
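The accuracy criterion above can be sketched directly. The spam/ham labels are a hypothetical test set chosen for illustration.

```python
# A minimal sketch of the accuracy criterion:
# accuracy = fraction of test-set examples the classifier labels correctly.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical test-set labels and classifier predictions
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]  # one mistake

print(f"accuracy = {accuracy(y_true, y_pred):.0%}")  # accuracy = 80%
```

In practice accuracy alone can mislead on imbalanced classes, which is one reason the other criteria (robustness, interpretability, domain-dependent indicators) matter as well.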
Question 6: How to improve the accuracy of classification?

1 - Cross-validation: split the training dataset into groups, always holding one group out
for evaluation and rotating the held-out group on each run. This shows which data trains a
more accurate model.

2 - Cross-dataset validation: the same idea as cross-validation, but using different
datasets.

3 - Tuning the model: adjust the parameters used to train the classification model (which
parameters matter depends on the classification algorithm in use).

4 - Improve, or introduce, a normalization process: find the techniques that produce more
consistent data for training.

5 - Understand the problem better, and try other methods for solving it. There is almost
always more than one way to solve the same problem, and the current approach may not be the
best one.

6 - More data: more variety and more volume generally give better results.

7 - Ensemble methods: (probably the easiest and most effective) combine multiple weak
models into a strong model with better predictions, with the models compensating for each
other's errors.
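Item 1 above can be sketched from scratch. The "classifier" here is a trivial majority-class predictor, an illustrative stand-in for a real model, and the yes/no labels are made-up data.

```python
import random

# A from-scratch sketch of k-fold cross-validation.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(0).shuffle(idx)  # fixed seed for reproducibility
    return [idx[i::k] for i in range(k)]

def majority_class(labels):
    """Trivial 'classifier': always predict the most common training label."""
    return max(set(labels), key=labels.count)

def cross_validate(labels, k=5):
    """Mean held-out accuracy of the majority-class predictor over k folds."""
    scores = []
    for fold in k_fold_indices(len(labels), k):
        train = [labels[i] for i in range(len(labels)) if i not in fold]
        test = [labels[i] for i in fold]
        pred = majority_class(train)
        scores.append(sum(l == pred for l in test) / len(test))
    return sum(scores) / len(scores)

labels = ["yes"] * 8 + ["no"] * 2
print(f"mean CV accuracy: {cross_validate(labels):.2f}")  # 0.80
```

Rotating the held-out fold means every example is tested exactly once, which gives a less biased accuracy estimate than a single train/test split.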
