Review Questions For CS410 Data Mining and Data Warehousing
Question
From the given data set, explain how to handle the missing values.
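The data set itself is not reproduced on this sheet, so the following is only a minimal sketch on invented records (the column names and values are assumptions). It shows three common ways to handle missing values with pandas: counting them, dropping incomplete rows, and imputing with the mean (numeric column) or the most frequent value (categorical column).

```python
import pandas as pd

# Hypothetical records; None marks a missing value.
df = pd.DataFrame({
    "age": [25, None, 31, 40],
    "city": ["A", "B", None, "B"],
})

# 1) Detect: number of missing values per column
missing_per_column = df.isna().sum()

# 2) Ignore the tuple: drop every row that has at least one missing value
dropped = df.dropna()

# 3) Fill in: mean for the numeric column, most frequent value for the categorical one
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

print(missing_per_column)
print(dropped)
print(filled)
```

Which option is appropriate depends on how much data is missing and why; dropping rows is the simplest choice but discards information.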
Question
With the help of a graph, discuss the knowledge discovery process.
Question
Data mining is
A) The actual discovery phase of a knowledge discovery process
B) The stage of selecting the right data for a KDD process
C) A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management
D) None of these
Question
Classification is
A) A subdivision of a set of examples into a number of classes.
B) A measure of the accuracy of the classification of a concept that is given by a certain theory.
C) The task of assigning a classification to a set of examples
D) None of these
Question
Use these methods to normalize the following group of data: 200, 300, 400, 600, 1000
a) min-max normalization by setting min = 0 and max = 1
b) z-score normalization
c) normalization by decimal scaling
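A minimal sketch for checking a hand calculation of the three methods. It assumes the population standard deviation for the z-score (some courses use the sample standard deviation instead, which gives slightly different values), and the variable names are illustrative only.

```python
from statistics import mean, pstdev

values = [200, 300, 400, 600, 1000]

# a) Min-max normalization onto [0, 1]: v' = (v - min) / (max - min)
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# b) Z-score normalization: v' = (v - mean) / std
#    Assumption: pstdev is the population standard deviation.
mu, sigma = mean(values), pstdev(values)
z_scores = [(v - mu) / sigma for v in values]

# c) Decimal scaling: v' = v / 10**j, with j the smallest integer
#    such that every scaled absolute value is below 1
max_abs = max(abs(v) for v in values)
j = 0
while max_abs / 10**j >= 1:
    j += 1
decimal_scaled = [v / 10**j for v in values]

print(min_max)                          # [0.0, 0.125, 0.25, 0.5, 1.0]
print([round(z, 3) for z in z_scores])  # [-1.061, -0.707, -0.354, 0.354, 1.768]
print(j, decimal_scaled)                # 4 [0.02, 0.03, 0.04, 0.06, 0.1]
```

Because the largest value is exactly 1000, dividing by 1000 would give 1.0, which is not below 1; that is why j = 4 here.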
Question
Given the following transaction database (TD), hand-simulate the Apriori algorithm and generate the association rules.
Minimum support S = 2, minimum confidence C = 50%
Question
Generate the FP-tree for the following transaction data set.
Transaction_ID Item code
1 E,A,D,B
2 D,A,C,E,B
3 C,A,B,E
4 B,A,D
5 D
6 D,B
7 A,D,E
8 B,C
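A minimal sketch of the two-pass FP-tree construction for these eight transactions, useful for checking a hand-drawn tree. The question gives no minimum support, so the sketch assumes a minimum count of 1 (every item is kept) and breaks ties between equally frequent items alphabetically; both are assumptions, and a different tie-break yields a differently shaped but equally valid tree. The header table and node links are omitted.

```python
from collections import Counter

transactions = [
    ["E", "A", "D", "B"],
    ["D", "A", "C", "E", "B"],
    ["C", "A", "B", "E"],
    ["B", "A", "D"],
    ["D"],
    ["D", "B"],
    ["A", "D", "E"],
    ["B", "C"],
]
MIN_COUNT = 1  # assumption: not stated in the question


class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


# Pass 1: count the support of every item
freq = Counter(item for t in transactions for item in set(t))

# Pass 2: insert each transaction with its items sorted by descending support
root = Node(None)
for t in transactions:
    ordered = sorted((i for i in set(t) if freq[i] >= MIN_COUNT),
                     key=lambda i: (-freq[i], i))
    node = root
    for item in ordered:
        if item not in node.children:
            node.children[item] = Node(item)
        node = node.children[item]
        node.count += 1


def show(node, depth=0):
    """Print the tree, one 'item:count' per line, indented by depth."""
    for child in sorted(node.children.values(), key=lambda c: c.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


show(root)
```

With these assumptions the item order is B, D, A, E, C, and the root has two branches, one starting at B and one starting at D.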
Question
Online Analytical Processing (OLAP) is a category of software that allows users to analyze
information from a single database system at the same time.
A) True
B) False
Question
Data selection is
A) The actual discovery phase of a knowledge discovery process
B) The stage of selecting the right data for a KDD process
C) A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management
D) None of these
Question
Hidden knowledge refers to
A) A set of databases from different vendors, possibly using different database paradigms
B) An approach to a problem that is not guaranteed to work but performs well in most cases
C) Information that is hidden in a database and that cannot be recovered by a simple SQL query.
D) None of these
Question
KDD (Knowledge Discovery in Databases) refers to
A) Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
B) Set of columns in a database table that can be used to identify each record within this table uniquely.
C) Collection of interesting and useful patterns in a database
D) None of these
Question
...................... is an essential process where intelligent methods are applied to extract data patterns.
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection
Question
............................. is a summarization of the general characteristics or features of a target class of
data.
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
Question
Trace the results of using the Apriori algorithm on the grocery store example with support
threshold s=33.34% and confidence threshold c=60%. Show the candidate and frequent itemsets
for each database scan. Also indicate the association rules that are generated and highlight the
strong ones considering confidence.
Transaction ID Items
T1 HotDogs, Buns, Ketchup
T2 HotDogs, Buns
T3 HotDogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 HotDogs, Coke, Chips
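A minimal Apriori sketch for checking a hand trace of this example. It is an illustrative implementation, not a reference one; the variable names are assumptions. Note that with six transactions a count of 2 gives a support of 33.33%, which is just below the 33.34% threshold, so an itemset must appear in at least 3 transactions to be frequent.

```python
from itertools import combinations

transactions = [
    {"HotDogs", "Buns", "Ketchup"},  # T1
    {"HotDogs", "Buns"},             # T2
    {"HotDogs", "Coke", "Chips"},    # T3
    {"Chips", "Coke"},               # T4
    {"Chips", "Ketchup"},            # T5
    {"HotDogs", "Coke", "Chips"},    # T6
]
MIN_SUPPORT = 0.3334
MIN_CONFIDENCE = 0.60
n = len(transactions)


def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


frequent = {}  # frozenset -> support count
candidates = [frozenset([i]) for i in sorted(set().union(*transactions))]
k = 1
while candidates:
    # One database scan per level: count candidates, keep the frequent ones
    level = {c: count(c) for c in candidates}
    level = {c: s for c, s in level.items() if s / n >= MIN_SUPPORT}
    frequent.update(level)
    k += 1
    # Join step: unions of two frequent itemsets that have exactly k items
    joined = {a | b for a in level for b in level if len(a | b) == k}
    # Prune step: every (k-1)-subset of a candidate must itself be frequent
    candidates = [c for c in joined
                  if all(frozenset(s) in level for s in combinations(c, k - 1))]

# Rule generation: confidence(X -> Y) = count(X u Y) / count(X)
rules = []
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(sorted(itemset), r)):
            conf = sup / frequent[lhs]
            if conf >= MIN_CONFIDENCE:
                rules.append((sorted(lhs), sorted(itemset - lhs), conf))

for lhs, rhs, conf in rules:
    print(lhs, "->", rhs, f"confidence={conf:.0%}")
```

Under these thresholds the only frequent itemset with more than one item is {Coke, Chips}, and both rules derived from it (Chips -> Coke at 75%, Coke -> Chips at 100%) pass the confidence threshold.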
Question
Which of the following code snippets will load the dataset named data_store.csv, if pandas has
been imported as pd and the file is stored in the same directory as your Python code?
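The answer options are not reproduced on this sheet. As a minimal sketch, the usual pandas call for this is `pd.read_csv` with the bare file name. The stand-in file written below exists only so that the sketch runs anywhere; its columns and rows are invented.

```python
from pathlib import Path

import pandas as pd

# Stand-in file so the sketch is runnable; the contents are hypothetical.
Path("data_store.csv").write_text("store_id,sales\n1,250\n2,310\n")

# A bare file name is resolved against the current working directory,
# which is the script's directory when the script is run from there.
data = pd.read_csv("data_store.csv")
print(data.head())
```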
Question
Discrimination between spam and ham e-mails is a classification task, true or false?
A. True
B. False
Question
a) Explain data preprocessing techniques in data mining.
Question
Explain the Apriori algorithm for finding frequent itemsets, with an example.
Question
What is KDD? Explain data mining as a step in the process of knowledge discovery.
Question
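List the characteristics of a data warehouse.
ANSWER
There are four key characteristics which separate the data warehouse from other major operational systems:
o Subject orientation: Data is organized by subject.
o Integration: Data from different sources is brought into a consistent format.
o Time variance: Data is kept with a time dimension, so historical values can be analyzed.
o Non-volatility: Once loaded, data is read and analyzed but not updated or deleted by routine operations.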
Question
Apply the Apriori algorithm to discover the frequent itemsets in the following transactions. Use 0.3 as the
minimum support value.
101 milk,bread,eggs
102 milk,juice
103 juice,butter
104 milk,bread,eggs
105 coffee,eggs
106 coffee
107 coffee,juice
108 milk,bread,cookies,eggs
109 cookies,butter
110 milk,bread
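To check a hand trace of this exercise, the sketch below counts the support of every possible itemset by brute force. This is deliberately not Apriori (there is no candidate pruning); it is only practical because there are seven distinct items, and the variable names are illustrative.

```python
from itertools import combinations

transactions = {
    101: {"milk", "bread", "eggs"},
    102: {"milk", "juice"},
    103: {"juice", "butter"},
    104: {"milk", "bread", "eggs"},
    105: {"coffee", "eggs"},
    106: {"coffee"},
    107: {"coffee", "juice"},
    108: {"milk", "bread", "cookies", "eggs"},
    109: {"cookies", "butter"},
    110: {"milk", "bread"},
}
MIN_SUPPORT = 0.3
n = len(transactions)
items = sorted(set().union(*transactions.values()))

# Enumerate every non-empty itemset and keep those meeting the threshold
frequent = {}  # tuple of items (alphabetical) -> support count
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        c = sum(1 for t in transactions.values() if set(combo) <= t)
        if c / n >= MIN_SUPPORT:
            frequent[combo] = c

for itemset, c in frequent.items():
    print(itemset, f"support={c / n:.0%}")
```

With 10 transactions, a minimum support of 0.3 means an itemset must occur in at least 3 of them; the largest itemset that qualifies is {bread, eggs, milk}.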
Question
Discuss the applications of association analysis.
ANSWER
Association analysis, of which market basket analysis is the best-known example, has applications in many fields.
Here are some of the most common ones:
1. Retail sales: Retailers use association analysis to understand the buying behavior of customers.
By analyzing the purchase history of customers, retailers can identify the items that are
frequently purchased together and use this information to make informed decisions about
product placement and marketing strategies.
2. Cross-selling and up-selling: Association analysis can be used by companies to cross-sell and
up-sell products to customers. By identifying the products that are frequently purchased together,
companies can offer bundle deals and discounts to encourage customers to purchase more
products.
3. E-commerce: Online retailers can use association analysis to recommend products to customers.
By analyzing the browsing and purchase history of customers, online retailers can provide
personalized recommendations to customers, leading to an increase in sales.
4. Healthcare: Association analysis is used in healthcare to identify patterns and trends in patient
data. For instance, it can be used to identify the combination of symptoms that frequently
co-occur in patients, leading to the identification of potential diseases or conditions.
5. Fraud detection: Association analysis can be used in fraud detection by identifying patterns and
relationships in transaction data. For instance, it can be used to identify suspicious patterns of
purchases or transactions that are often associated with fraudulent activities.
6. Social network analysis: Association analysis can be used in social network analysis to identify
the relationships between individuals or groups. This can be useful in identifying influencers and
opinion leaders in social networks.
GOOD LUCK!