Data Mining Exam for B.Sc. Students

DU question paper, Computer Science (Honours)

Uploaded by

Shaan


[This question paper contains 12 printed pages.]

Your Roll No. ...............

Sr. No. of Question Paper : 4192 H
Unique Paper Code         : 2343012005
Name of the Paper         : Data Mining I
Name of the Course        : B.Sc. (Hons.) Computer Science
Semester                  : IV

Duration : 3 Hours                    Maximum Marks : 90

Instructions for Candidates

1. Write your Roll No. on the top immediately on receipt of this question paper.
2. Section A (Question No. 1) is compulsory.
3. Attempt any four questions from Section B (Questions 2 to 7).
4. The use of a simple calculator is allowed.
5. Parts of the question must be answered together.

4192                                  12

7. Given a dataset with six records about startup companies, each record has two fields: Number of Clients and Annual Turnover. Assuming that k = 2 and initial cluster centres as the first two records, compute the cluster centres of the resulting clusters until the stopping criterion is met. Use Euclidean distance as the distance metric. Also, compute the SSE (Sum of Squared Error) of each generated cluster. (15)

Number of clients    Annual Turnover (in Lakhs)
185                  72
170                  56
168                  60
[rows 4 to 6 are not legible in the scan]
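The procedure asked for in Question 7 can be sketched in Python as below. This is only an illustrative sketch: the six points are hypothetical (the exam's own table is only partly legible here), and the function name `kmeans_first_k` is made up for this example. It uses the first k records as initial centres, Euclidean distance, and stops when the centres no longer change.

```python
import math

def kmeans_first_k(points, k=2, max_iter=100):
    """Plain k-means that takes the first k records as initial centres."""
    centres = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre (Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        new_centres = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:  # stopping criterion: centres unchanged
            break
        centres = new_centres
    # SSE per cluster: sum of squared distances of members to their centre.
    sse = [sum(math.dist(p, centres[i]) ** 2 for p in cl)
           for i, cl in enumerate(clusters)]
    return centres, clusters, sse

# Hypothetical records, NOT the values from the exam table.
points = [(1, 1), (5, 5), (1, 2), (2, 1), (5, 6), (6, 5)]
centres, clusters, sse = kmeans_first_k(points)
print(centres, sse)
```

With the real table, the two columns would replace the hypothetical tuples; since the two attributes may be on different scales, the resulting clusters depend on whether the data is normalised first.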
4192                                  2

Section A

1. (a) Differentiate between the unsupervised and supervised evaluation measures used for cluster validity. (3)

(b) What is the anti-monotone property of the support measure in association rule mining? Does the confidence measure follow the anti-monotone property? (3)

(c) Consider a dataset with two class labels, News and Entertainment, and six labeled documents D1-D6. A new document, D7, is to be classified. The similarity values of D7 with D1, D2, D3, D4, D5 and D6 are 0.75, 0.85, 0.66, 0.87, 0.70 and 0.84 respectively. Using the k-Nearest Neighbor classifier, predict the class label that should be assigned to D7 when k = 3. Will the predicted class label change with k = 5? (4)

Document    Class label
D1          News
D2          Entertainment
D3          Entertainment
D4          News
D5          News
D6          Entertainment

4192                                  11

ID     Age            Fever          BD          Outcome
P1     Young          Yes            High        In ICU
P2     Young          No             High        Hospitalized
P3     Elderly        [illegible]    High        In ICU
P4     Middle aged    [illegible]    Moderate    In ICU
P5     Middle aged    No             High        Home Care
P6     Middle aged    Yes            Moderate    In ICU
P7     Elderly        No             Moderate    In ICU
P8     Elderly        No             High        Deceased
P9     Elderly        [illegible]    High        In ICU
P10    Young          No             High        Hospitalized

BD: Breathing Difficulty

(a) Compute the Gini Index of Age, Fever, and BD attributes. Given that you construct a decision tree using the Gini Index as the splitting criterion, which of the three attributes would you choose at the root? Justify your choice. (9)

(b) Compute the Gini Index of ID. Why should it not be used as a splitting attribute for constructing a decision tree? (3)

(c) Given ten objects in the dataset (P1-P10), mention all train and test distributions for performing k-fold cross-validation. Assume the value of k = 5. (3)
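For Question 1(c) (page 2), the neighbour selection can be illustrated with a short Python sketch. It uses the similarity values and class labels given in the question; unweighted majority voting is an assumption here, since the question does not say how votes are combined, and the helper name `knn_predict` is made up for this example.

```python
from collections import Counter

# Similarity of D7 to each labelled document, as given in the question.
similarity = {"D1": 0.75, "D2": 0.85, "D3": 0.66,
              "D4": 0.87, "D5": 0.70, "D6": 0.84}
label = {"D1": "News", "D2": "Entertainment", "D3": "Entertainment",
         "D4": "News", "D5": "News", "D6": "Entertainment"}

def knn_predict(k):
    # Higher similarity means a nearer neighbour, so sort in descending order.
    neighbours = sorted(similarity, key=similarity.get, reverse=True)[:k]
    # Assumption: simple (unweighted) majority vote among the k neighbours.
    votes = Counter(label[d] for d in neighbours)
    return votes.most_common(1)[0][0], neighbours

pred3, nn3 = knn_predict(3)
pred5, nn5 = knn_predict(5)
print(pred3, nn3)
print(pred5, nn5)
```

Because the inputs are similarities rather than distances, the sort is descending; with distances it would be ascending.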
4192                                  3

(d) Consider the given dataset, which contains six objects, each with two attributes: Age and Salary. K-means clustering is used to cluster the given objects. Do you see any issue with applying K-means to the given dataset? If yes, then state the issue. Also apply the appropriate preprocessing technique to overcome it. If no, state explicitly that no preprocessing technique is required. (4)

            Age           Salary
            (in years)    (in rupees)
Object 1    40            62000
Object 2    24            48000
Object 3    30            54000
Object 4    35            6?000
Object 5    46            80000
Object 6    [illegible]   66000

(e) Define the curse of dimensionality. The Iris flower dataset comprises 150 data points and four features, namely sepal length, sepal width, petal width, and petal length. Is it high-dimensional data or low-dimensional data? Justify your answer. (4)

(f) Consider a decision tree to classify the health of an individual as Fit or Unfit given below:

4192                                  10

(i) List the confusion matrix for "Classifier A" and "Classifier B". Find the accuracy, precision, sensitivity, recall and specificity for each classifier. (8)

(ii) What problem may occur if the provided training dataset of 500 patients had only 15 positive instances and the remaining negative instances? Which performance measure would you choose to evaluate the classifiers in such a scenario? Which is the better classifier between Classifier A and Classifier B in such a scenario? (4)

(b) Consider a categorical attribute Grade with three values {A, B, and C}. Convert this attribute to asymmetric binary attributes. (3)

6. Consider the given COVID-19 dataset of ten patients.
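For Question 5(b) (page 10), one common reading of "asymmetric binary attributes" is one 0/1 attribute per category, where only the presence (1) is considered informative. The Python sketch below illustrates that reading; the attribute names `Grade=A`, `Grade=B`, `Grade=C` and the helper `to_asymmetric_binary` are made up for this example.

```python
GRADE_VALUES = ["A", "B", "C"]

def to_asymmetric_binary(grade):
    # One 0/1 attribute per category; only the matching one is set to 1.
    return {f"Grade={v}": int(grade == v) for v in GRADE_VALUES}

# Hypothetical sample of grades, used only to show the encoding.
encoded = [to_asymmetric_binary(g) for g in ["A", "C", "B"]]
for row in encoded:
    print(row)
```

Each object ends up with exactly one attribute set to 1, so the three new attributes together carry the same information as the original Grade.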
4192                                  4

[Figure: decision tree]

Age < 30 ?
    Yes: Smokes/Drinks ?
        Yes: Unfit
        No:  Fit
    No:  Workout ?
        Yes: Fit
        No:  Diet Control ?
            Yes: Fit
            No:  Unfit

(i) Extract all classification rules from the decision tree.

(ii) Classify the following object: Age = 50, Workout = No, Smokes/Drinks = No, Diet Control = No, Health = ? (4)

(g) Classify the following tasks as "predictive" or "descriptive". Justify your answer. (4)

(i) Foretelling whether an online user will shop on Flipkart for a specific item.

4192                                  9

(c) Enumerate all association rules generated from the largest frequent itemset found in each dataset scan. Compute the confidence of each generated rule. Assuming that the minimum confidence threshold is 70%, find all the strong association rules. (6)

5. (a) A medical team develops classification models for predicting the occurrence of a "genetic disorder" using Classifier A and Classifier B. Patients having genetic disorders are considered positive instances. In contrast, negative instances are ones with the absence of genetic disorders. The classifiers were tested on data from 500 patients and then obtained the result as:

                                                              Actual label
                                                      Presence of         Absence of
                                                      Genetic Disorder    Genetic Disorder
Classifier A, predicted "presence of genetic disorder"    131             [illegible]
Classifier A, predicted "absence of genetic disorder"     19              195
Classifier B, predicted "presence of genetic disorder"    82              72
Classifier B, predicted "absence of genetic disorder"     68              278
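The measures named in Question 5(a)(i) follow directly from the four cells of each confusion matrix, as the Python sketch below illustrates. Two readings are assumptions: the last cell for Classifier B is taken as 278 so that the four cells total 500, and the false-positive cell for Classifier A, which is not legible in the table, is inferred as 500 - 131 - 19 - 195 = 155. The helper name `metrics` is made up for this example.

```python
def metrics(tp, fp, fn, tn):
    """Standard measures from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # sensitivity is the same quantity
        "specificity": tn / (tn + fp),
    }

# Classifier B: tn read as 278 (ASSUMPTION, so that the cells sum to 500).
clf_b = metrics(tp=82, fp=72, fn=68, tn=278)
# Classifier A: fp=155 is an ASSUMPTION, inferred as 500 - 131 - 19 - 195.
clf_a = metrics(tp=131, fp=155, fn=19, tn=195)

for name, m in (("A", clf_a), ("B", clf_b)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Sensitivity and recall are two names for the same ratio, which is why the function reports it once.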
4192                                  5

(ii) Grouping the customers of a company according to their buying interests.

(iii) Finding a group of genes such that genes in each group have related functionality.

(iv) Using historical data from previous financial statements to project sales, revenue, and expenses for a company.

(h) Given two objects X = (22, 1, 42, 10) and Y = (20, 0, 36, 8), compute the distance between these two objects using the following distance measures:

(i) Euclidean Distance

(ii) Manhattan Distance (4)

Section B

2. (a) Given the following training dataset, compute all class conditional and prior probabilities. Use the Naive Bayes approach to predict the class label (Salary) for the test instance: (12)

4192                                  8

(iii) What is an outlier? Spot an outlier in the provided dataset. (3)

(b) What is the need for sampling in data mining? What problems arise if the sample size is too small or too large? (3)

4. Consider the following transactional data of a grocery store:

Transaction ID    Items
T1                Boots, Hoodie, [illegible]
T2                Boots, Hoodie
T3                Hoodie, Coat, Cardigan
T4                Cardigan, Coat
T5                Cardigan, Gloves
T6                Hoodie, Coat, Cardigan

(a) What is the maximum number of rules that can be extracted from this data (including rules that have zero support)? (3)

(b) Use the Apriori algorithm on the given transactional dataset and compute the candidate and frequent itemsets for each dataset scan. Assume a support threshold of 33.34%. (6)
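For Question 1(h) (page 5), the two distance measures can be checked with a few lines of Python. This is a minimal sketch that uses the vectors X and Y exactly as given in the question.

```python
import math

X = (22, 1, 42, 10)
Y = (20, 0, 36, 8)

# Euclidean distance: square root of the sum of squared differences.
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))
# Manhattan distance: sum of absolute differences.
manhattan = sum(abs(x - y) for x, y in zip(X, Y))

print(round(euclidean, 3), manhattan)
```

The per-attribute differences are 2, 1, 6 and 2, so the Euclidean distance is the square root of 45 and the Manhattan distance is 11.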
4192                                  6

Education Level = PG, Career = Management, Years of Experience = 3 to 10

Education Level    Career        Years of Experience    Salary
UG                 Management    Less than 3            Low
UG                 Management    3 to 10                Low
PG                 Management    Less than 3            High
PG                 Service       More than 10           Low
UG                 Service       3 to 10                Low
PG                 Service       3 to 10                High
PG                 Management    More than 10           High
PG                 Service       Less than 3            Low
UG                 Management    More than 10           High
UG                 Service       More than 10           Low

(b) A data mining application uses a particular type of data. Give one application for each of the following types: (3)

(i) Sparse dataset

(ii) Spatio-Temporal data

(iii) Graph-based data

3. (a) Consider the following dataset having details about different departments of a company:

4192                                  7

ID      Dept. Name                Location       Established On    Size      Annual Budget
DP12    Finance                   Nehru Place    5-01-2020         Large     460
DP19    Marketing                 Nehru Place    8-08-2020         Medium    300
DP21    Human Resource            Hauz Khas      2-01-2020         Medium    240
DP21    Production                               2-02-2020         Medium    290
DP33    Research & Development    Nehru Place    4-07-2021         Small     90
DP39    Information Technology    Hauz Khas      6-09-2020         Medium    210
DP41    Sales                     Nehru Place    9-09-2020         Large     5?0
DP52    Customer                  Hauz Khas      2-10-2020         Medium
DP55    Public Relations          Nehru Place    3-03-2021         Large     900

Annual Budget is in Lakhs

(i) Identify the type of attributes ID, Dept. Name, Location, Established On, Size, and Annual Budget as nominal, ordinal, interval, or ratio. Give justification for each. (6)

(ii) Suggest a technique for dealing with missing values in the attribute Location. Will the same technique apply to the attribute Annual Budget? Justify. (3)
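For Question 2(a) (page 6), the Naive Bayes computation can be sketched in Python as below. The ten training rows are transcribed from the table as best it can be read (the last row in particular is an assumed reading), no Laplace smoothing is applied, and the helper name `naive_bayes` is made up for this example.

```python
from collections import Counter

# (Education Level, Career, Years of Experience, Salary), as read from the
# training table; the final row is an ASSUMED reading of a garbled line.
rows = [
    ("UG", "Management", "Less than 3", "Low"),
    ("UG", "Management", "3 to 10", "Low"),
    ("PG", "Management", "Less than 3", "High"),
    ("PG", "Service", "More than 10", "Low"),
    ("UG", "Service", "3 to 10", "Low"),
    ("PG", "Service", "3 to 10", "High"),
    ("PG", "Management", "More than 10", "High"),
    ("PG", "Service", "Less than 3", "Low"),
    ("UG", "Management", "More than 10", "High"),
    ("UG", "Service", "More than 10", "Low"),
]

def naive_bayes(test):
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(rows)  # prior P(class)
        for j, value in enumerate(test):
            # Class-conditional P(attribute_j = value | class), no smoothing.
            match = sum(1 for r in rows if r[-1] == cls and r[j] == value)
            score *= match / n_cls
        scores[cls] = score
    return scores

scores = naive_bayes(("PG", "Management", "3 to 10"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are unnormalised products of the prior and the class-conditional probabilities; only their relative order matters for the prediction.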
