SCHOOL OF COMPUTER SCIENCE AND ENGINEERING
[Link] in COMPUTER SCIENCE & BUSINESS SYSTEMS (CSBS)
(Academic Year 2024-2025)
ASSIGNMENT QUESTIONS UNIT WISE
VI Semester
Subject: Data Mining and Analytics Code: 22CSBS212
Sl. No  Questions  CO  RBTL  Marks to be asked
UNIT-1
1 What is Data Mining? Discuss the applications of Data Mining with an example. CO1 3 5
2 Explain OLAP operation with an example. CO1 3 8
3 Explain Data pre-processing steps and the challenges faced in Data Mining. CO1 3 7
4 What is data mining? Explain motivating challenges. CO1 2 5
5 Explain the KDD process with the help of a neat diagram. CO1 3 7
6 What is the knowledge extraction process? Explain in detail with a diagram. CO1 2 5
7 Explain visualization techniques in data mining. CO1 3 8
8 Explain the following: CO1 3 6
i) Data ii) Data mining classification iii) Data warehouse
9 Explain in detail the kinds of patterns that can be mined. CO1 3 5
10 Discuss the applications of Data Mining and the major issues in Data Mining. CO1 2 5
UNIT-2
1 What is Data Cleaning? Describe various methods of Data Cleaning. CO2 2 5
2 Explain the different Data Reduction techniques. CO2 3 8
3 Explain Data Transformation methods with a suitable example. CO2 3 6
4 Define Data Integration and explain the issues of data integration. CO2 2 5
5 Define normalization. Explain the different normalization techniques. CO2 2 8
6 Apply the Z-score technique to normalize the data shown below. CO2 4 5
Student  Height (inches)
1        64
2        70
3        72
4        68
5        76
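One way to check the arithmetic for the Z-score question is sketched below. It assumes the population standard deviation (divide by n); a course that uses the sample standard deviation (divide by n - 1) would get different values. The function name `z_score_normalize` is illustrative only.

```python
from statistics import mean, pstdev

def z_score_normalize(values):
    """Return (v - mean) / std for each value, using the population std."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population (divide-by-n) deviation
    return [(v - mu) / sigma for v in values]

heights = [64, 70, 72, 68, 76]  # inches, from the table above
z_scores = z_score_normalize(heights)
print(z_scores)  # mean 70, std 4 -> [-1.5, 0.0, 0.5, -0.5, 1.5]
```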
7 Apply the min-max technique to normalize the data shown below. CO2 4 5
House  Square feet  Bedrooms
1      1200         3
2      1500         4
3      1000         2
4      1800         5
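A minimal sketch of min-max normalization for the house data follows. It assumes the target range is [0, 1], which the question does not state, and normalizes each column independently. The helper name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

sqft = [1200, 1500, 1000, 1800]   # square feet column from the table above
bedrooms = [3, 4, 2, 5]           # bedrooms column from the table above

sqft_norm = min_max_normalize(sqft)
bedrooms_norm = min_max_normalize(bedrooms)
print(sqft_norm)                              # [0.25, 0.625, 0.0, 1.0]
print([round(b, 3) for b in bedrooms_norm])   # [0.333, 0.667, 0.0, 1.0]
```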
8 Define Attributes. Explain the types of attributes with an example. CO2 3 6
9 Interpret the usage of statistical measures in data mining and briefly
explain the following with formulas: CO2 3 8
a) Mean b) Median c) Mode d) Midrange
10 Write a short note on: CO2 1 6
i) Class comparison ii) Attribute-oriented analysis
UNIT-3
1 Define Correlation analysis. Find the correlation for the data set below. CO3 2 6
A   B
20  8
20  34
9   4
8   7
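The correlation question can be checked with a short sketch of the Pearson correlation coefficient. This assumes the table's two columns are the paired variables A and B, and that Pearson's r is the intended measure; the function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

a = [20, 20, 9, 8]   # column A (assumed pairing with column B)
b = [8, 34, 4, 7]    # column B

r = pearson_r(a, b)
print(round(r, 4))
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.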
2 Define Classification. Briefly explain the 1R algorithm. CO3 3 8
3 Suppose a credit card company wants to determine which customers
should be sent promotional material for a life insurance offer. Using this
problem statement, determine which attribute set has the highest
accuracy for model generation by applying the 1R method to the data set
below. CO3 4 10
Age  Gender  Income  Previous insurance  Life insurance promotion
45   M       40-50K  NO                  NO
40   F       30-40K  NO                  YES
42   M       40-50K  NO                  NO
43   M       30-40K  YES                 YES
38   F       50-60K  NO                  YES
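The 1R procedure for the credit card data can be sketched as follows. Two assumptions are made here: the income ranges are written uniformly as 40-50K, 30-40K and 50-60K, and the numeric Age column is left out because 1R would first need a binning scheme that the question does not specify. The function `one_r` and the dictionary keys are illustrative names.

```python
from collections import Counter, defaultdict

# Rows from the table above (Age omitted; income labels assumed uniform).
rows = [
    {"Gender": "M", "Income": "40-50K", "Previous insurance": "NO",  "Promotion": "NO"},
    {"Gender": "F", "Income": "30-40K", "Previous insurance": "NO",  "Promotion": "YES"},
    {"Gender": "M", "Income": "40-50K", "Previous insurance": "NO",  "Promotion": "NO"},
    {"Gender": "M", "Income": "30-40K", "Previous insurance": "YES", "Promotion": "YES"},
    {"Gender": "F", "Income": "50-60K", "Previous insurance": "NO",  "Promotion": "YES"},
]

def one_r(rows, attributes, target):
    """For each attribute, predict the majority class per value and score it."""
    results = {}
    for attr in attributes:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One rule per attribute value: the most frequent class for that value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        results[attr] = (rule, correct / len(rows))
    return results

results = one_r(rows, ["Gender", "Income", "Previous insurance"], "Promotion")
best = max(results, key=lambda attr: results[attr][1])
for attr, (rule, accuracy) in results.items():
    print(attr, rule, accuracy)
print("Best attribute:", best)
```

Under these assumptions, Gender classifies 4 of 5 rows correctly, Income 5 of 5, and Previous insurance 3 of 5, so Income gives the most accurate single-attribute rule.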
4 Explain frequent itemset generation and rule generation in the Apriori
Algorithm with an example. CO3 2 8
5 A database has five transactions. Let the minimum support be 60% and
the minimum confidence be 80%. CO3 4 10
i) Find all frequent itemsets using the Apriori Algorithm.
ii) List and construct all the strong association rules.
TID  ITEMS
T1   {Bread, Butter, Milk}
T2   {Bread, Butter}
T3   {Beer, Cookies, Diapers}
T4   {Milk, Diapers, Bread, Butter}
T5   {Beer, Diapers}
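A compact, level-wise sketch of Apriori for the five transactions above is given below. It is meant for checking a hand-worked answer rather than as an efficient implementation; the function names `support`, `apriori` and `strong_rules` are hypothetical.

```python
from itertools import combinations

transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Beer", "Cookies", "Diapers"},
    {"Milk", "Diapers", "Bread", "Butter"},
    {"Beer", "Diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built one itemset size at a time."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset must be frequent, then check support.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k))
                 and support(c, transactions) >= min_support]
        k += 1
    return frequent

def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_conf:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules

frequent = apriori(transactions, min_support=0.6)
rules = strong_rules(frequent, min_conf=0.8)
print(frequent)
print(rules)
```

With a 60% threshold (3 of 5 transactions), the frequent itemsets are {Bread}, {Butter}, {Diapers} and {Bread, Butter}, and the two rules Bread → Butter and Butter → Bread both have confidence 100%.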
6 A database has five transactions. Let the minimum support be 3 and the
minimum confidence be 75%. CO3 4 10
i) Find the ordered itemset ii) Construct the FP-Tree
iii) Find the conditional frequent patterns and generate the frequent
patterns using the FP-Growth algorithm.
TID  ITEMS
T1   {M, O, N, K, E, Y}
T2   {D, O, N, K, E, Y}
T3   {M, A, K, E}
T4   {M, U, C, K, Y}
T5   {C, O, K, I, E}
7 Define association analysis. Explain the following with an example: CO3 2 8
i) Association rule ii) Support iii) Confidence iv) Frequent itemset
8 Construct an FP tree for the following dataset. Let minimum support = 3. CO3 4 8
TID  ITEMS
1    {a, b}
2    {b, c, d}
3    {a, c, d, e}
4    {a, d, e}
5    {a, b, c}
6    {a, b, c, d}
7    {a}
8    {a, b, c}
9    {a, b, d}
10   {b, c, e}
9 Construct an FP tree, showing the tree separately after reading each
transaction, and generate the frequent itemsets. Consider the
transaction dataset: CO3 4 8
TID  ITEMS
T1   {f, a, c, d, g, i, m, p}
T2   {a, b, c, f, l, m, o}
T3   {b, f, h, j, o}
T4   {b, c, k, s, p}
T5   {a, f, c, e, l, p, m, n}
Let minimum support = 3 and confidence = 70%.
10 Define Bayesian Classification with formula. Explain the types of
Bayesian Classifiers. CO3 3 7
UNIT-4
1 Define Trend Analysis. List the common methods or algorithms used for
trend analysis in data mining. CO4 3 2
2 Define Descriptive Analytics. List the applications of Descriptive
Analytics. CO4 1 2
3 Define Linear Regression. Explain the properties of Linear Regression. CO4 1 5
4 Find the linear regression equation for the given data: CO4 1 5
X  Y
3  8
9  6
5  4
3  2
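The least-squares line for the four (X, Y) points can be checked with the sketch below. It assumes simple linear regression of Y on X, using the usual closed-form formulas for slope and intercept; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

xs = [3, 9, 5, 3]   # X column from the table above
ys = [8, 6, 4, 2]   # Y column from the table above

slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.4f} + {slope:.4f} x")  # y = 4.1667 + 0.1667 x
```

Here Σx = 20, Σy = 20, Σxy = 104 and Σx² = 124, so the slope is 16/96 = 1/6 and the intercept is 25/6.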
5 List the parameters of Linear Regression. Find the slope of the linear regression
line if ∑x = 50, ∑y = 44, ∑x² = 150, ∑xy = 230 and n = 4. CO4 1 5
6 Interpret the forward and backward methods of regression in detail. CO4 1 6
7 Define hypotheses. Interpret the following tests: CO4 2 6
i) Wald test ii) LR test iii) Score test
8 Explain the step-by-step approach to implementing predictive models
and elaborate on how to build a predictive model for predicting house prices
using Linear Regression. CO4 2 7
9 Explain the following Generalized Linear Models: CO4 2 8
i) Poisson ii) Binomial iii) Inverse Gaussian iv) Gamma
10 Define heuristic method in data analytics. Find the shortest path from
Start (S) to Goal (G) using the A* Algorithm in the given graph. CO4 1 10
11 Define Linearization. List the Common Types of Linearization models. CO4 1 2
12 List the drawbacks of non-linearity in Regression Models and explain
the methods to detect non-linearity. CO4 3 5
13 Explain any 3 iterative methods for Non-Linear Regression models. CO4 3 8
UNIT-5
1 Define auto-covariance in time series analysis. CO5 1 2
2 State the relationship between auto-covariance and variance. CO5 1 2
3 Define Moving Average in time series analysis. CO5 1 2
4 Define Auto-Correlation and explain its properties. CO5 1 6
5 Explain the properties of Auto-Covariance. CO5 2 5
6 Define Moving Average. Consider a stock with the following prices
over 5 days: P = [20, 22, 24, 26, 28]. Compute the 3-day Simple Moving
Average. CO5 1 5
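The 3-day Simple Moving Average in this question can be verified with a short sketch. It assumes the average is taken over each full window of 3 consecutive days, so 5 prices yield 3 values; the function name `simple_moving_average` is illustrative.

```python
def simple_moving_average(prices, window):
    """Mean of each run of `window` consecutive prices."""
    return [sum(prices[i:i + window]) / window
            for i in range(len(prices) - window + 1)]

prices = [20, 22, 24, 26, 28]   # P from the question above
sma = simple_moving_average(prices, window=3)
print(sma)  # [22.0, 24.0, 26.0]
```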
7 Explain the steps for estimating an ARIMA model. CO5 2 5
8 Interpret the process involved in forecasting using ARIMA models. CO5 3 7
9 Define Decision Tree. Explain the following for decision trees: CO5 3 10
i) Structure ii) Hyperparameters iii) Advantages and Disadvantages
iv) Algorithm Steps
10 Explain the following on Mathematical Optimization: CO5 3 10
i) Components of an Optimization Problem ii) Types of Optimization
Problems iii) Optimization Methods iv) Applications of Optimization