Data Mining and Warehousing

This document outlines the examination paper for the Sixth Semester B. Tech. in Computer Science and Engineering/Artificial Intelligence and Machine Learning, focusing on Data Mining and Warehousing. It includes various questions related to OLAP queries, ETL processes, schema design, SQL commands, data normalization, decision trees, and clustering algorithms. The exam is structured to assess students' understanding of key concepts in data mining and warehousing within a 3-hour timeframe.

Course Code : CAT 307 MPNO/MS – 24 / 1747

Sixth Semester B. Tech. (Computer Science and Engineering /
Artificial Intelligence and Machine Learning) Examination

DATA MINING AND WAREHOUSING

Time: 3 Hours          Max. Marks: 60

Instructions to Candidates :—
(1) Assume suitable data wherever necessary.
(2) All questions carry marks as indicated.

1. (a) What is a data cube? If we create a cube for a sales application with three
dimensions, time, location and item, illustrate with an example how the sub-cubes
(cuboids) in the lattice can be created. 4(CO1)
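A minimal Python sketch of the lattice, assuming only the three dimensions named in the question: every subset of {time, location, item} is one cuboid, so there are 2^3 = 8 cuboids, from the 0-D apex cuboid (everything aggregated) up to the 3-D base cuboid. The function name `cuboid_lattice` is illustrative, not from any library.

```python
from itertools import combinations

dimensions = ["time", "location", "item"]


def cuboid_lattice(dims):
    """Group every cuboid (a subset of dims) by its number of dimensions."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboid_lattice(dimensions)
for level, cuboids in lattice.items():
    if level == 0:
        name = "0-D (apex)"
    elif level == len(dimensions):
        name = f"{level}-D (base)"
    else:
        name = f"{level}-D"
    print(name, cuboids)
```

Each cuboid one level down is obtained from a cuboid above it by aggregating away one dimension, e.g. (time, location, item) rolls up to (time, item) by summing over location.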
(b) Consider the following star schema of an automobile data warehouse:
Autos (ModelId, modelname, serialNo, color)
Dealers (DealerId, name, city, state, phone)
Time (TimeId, day, week, month, year)
Sales (ModelId, DealerId, TimeId, QtySold, CountSold)
where the attribute QtySold is intended to be the total price of all automobiles
for the given model, color, date and dealer, while CountSold is the total number
of automobiles in that category. Answer the following OLAP queries:
(i) Find the total sales generated for model names (Maruti, Honda) and dealer
states (Maharashtra, Gujarat) in September 2017 and October 2017
using ROLLUP across the three dimensions ModelId, DealerId and
TimeId.
(ii) Find the total sales generated for model names (Maruti, Honda) and dealer
states (Maharashtra, Gujarat) in September 2017 and October 2017
using CUBE across the dimensions ModelId, DealerId and TimeId.
(iii) Comment on the difference in output between the ROLLUP and CUBE
aggregation clauses. 3(CO1)
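For part (iii), the difference can be seen by listing the grouping sets each clause produces. The sketch below is Python rather than SQL and assumes the fact rows have already been joined to Autos, Dealers and Time and filtered to the requested models, states and months; the QtySold figures are made up. GROUP BY ROLLUP over n columns yields n + 1 grouping sets (dropping columns from the right), while GROUP BY CUBE yields all 2^n subsets. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("ModelId", "DealerId", "TimeId")

# Hypothetical fact rows, already joined so each key shows the model name,
# dealer state and month instead of surrogate ids.
sales = [
    {"ModelId": "Maruti", "DealerId": "Maharashtra", "TimeId": "2017-09", "QtySold": 500},
    {"ModelId": "Maruti", "DealerId": "Gujarat", "TimeId": "2017-10", "QtySold": 300},
    {"ModelId": "Honda", "DealerId": "Maharashtra", "TimeId": "2017-10", "QtySold": 700},
    {"ModelId": "Honda", "DealerId": "Gujarat", "TimeId": "2017-09", "QtySold": 400},
]


def rollup_sets(dims):
    """ROLLUP drops grouping columns from the right: n + 1 grouping sets."""
    return [dims[:k] for k in range(len(dims), -1, -1)]


def cube_sets(dims):
    """CUBE groups by every subset of the columns: 2 ** n grouping sets."""
    return [c for k in range(len(dims), -1, -1) for c in combinations(dims, k)]


def aggregate(rows, grouping_sets, measure):
    """SUM(measure) per grouping set; the empty set () is the grand total."""
    result = {}
    for gs in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            totals[tuple(row[d] for d in gs)] += row[measure]
        result[gs] = dict(totals)
    return result


rollup = aggregate(sales, rollup_sets(DIMS), "QtySold")
cube = aggregate(sales, cube_sets(DIMS), "QtySold")
print(len(rollup), "ROLLUP grouping sets;", len(cube), "CUBE grouping sets")
print("grand total:", rollup[()][()])
```

Every ROLLUP subtotal also appears in the CUBE output; CUBE additionally returns subtotals that do not start from ModelId, such as per-state or per-month totals.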
(c) What is meant by the ETL process? What is the purpose of 'refresh'
in the ETL process? 3(CO1)



2. (a) Suppose two stocks, Infosys and TCS, have the following values over one
week: (3, 6), (4, 9), (6, 11), (5, 12), (7, 15). If the stocks are affected
by the same industry trends, will their prices rise or fall together?
4(CO3)
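One way to answer is to compute the covariance of the two series, reading each pair as (Infosys, TCS) for one day (an assumption about the pair order). With population covariance, Cov(X, Y) = E[XY] - E[X]E[Y]: here E[X] = 5, E[Y] = 10.6 and E[XY] = 57, so Cov = 57 - 53 = 4. A positive value means the prices tend to rise together; Pearson's correlation (about 0.94) confirms a strong positive relationship. A minimal Python sketch:

```python
from math import sqrt

# Each pair read as (Infosys, TCS) for one trading day (assumed order).
prices = [(3, 6), (4, 9), (6, 11), (5, 12), (7, 15)]
infosys = [p[0] for p in prices]
tcs = [p[1] for p in prices]


def covariance(xs, ys):
    """Population covariance: E[XY] - E[X]E[Y]."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)


def correlation(xs, ys):
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


cov = covariance(infosys, tcs)
r = correlation(infosys, tcs)
trend = "tend to rise together" if cov > 0 else "do not rise together"
print(f"Cov = {cov:.2f}, r = {r:.3f}: prices {trend}")
```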

(b) 'SR' Restaurants, a wholesale restaurant supply company, supplies equipment to
55 different restaurants in Mumbai, such as tables, chairs, tablecloths, napkin
holders and cutlery, as well as kitchen equipment such as saucepans,
knives and chef clothing. The company wishes to analyze its daily sales in terms of
revenue, unit sales, costs and profit for each product and customer. It
would also like to see this information by product line and product group.

• Design a star schema for the given scenario.

• Convert the star schema into a snowflake schema.

Bring out the differences between the star and snowflake schemas. 6(CO3)

3. (a) Consider the following snapshot of the SALES table:

Explain how the query "Select the rows from the SALES table where product
is 'Washer', color is 'Almond' and division is 'East' or 'South'" will be
executed if bitmap indexes are created on the Product, Color and Region columns.
Show the intermediate steps.

5(CO1)
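The mechanics can be sketched in Python on six hypothetical rows (made-up values, not the snapshot in the question): one bitmap per distinct value of each indexed column, with bit i set when row i has that value; the predicate becomes Washer AND Almond AND (East OR South), and the set bits of the result are the row ids to fetch. The function names are illustrative.

```python
# Hypothetical SALES rows; only the indexed columns are shown.
sales = [
    {"Product": "Washer", "Color": "Almond", "Region": "East"},
    {"Product": "Washer", "Color": "White", "Region": "East"},
    {"Product": "Dryer", "Color": "Almond", "Region": "South"},
    {"Product": "Washer", "Color": "Almond", "Region": "South"},
    {"Product": "Washer", "Color": "Almond", "Region": "West"},
    {"Product": "Dryer", "Color": "White", "Region": "North"},
]
N = len(sales)


def build_bitmap_index(rows, column):
    """One integer bitmap per distinct value; bit i is set if row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def show(bitmap):
    """Render a bitmap with row 0 as the leftmost character."""
    return format(bitmap, f"0{N}b")[::-1]


product = build_bitmap_index(sales, "Product")
color = build_bitmap_index(sales, "Color")
region = build_bitmap_index(sales, "Region")

# Step 1: OR the two region bitmaps; step 2: AND with the other predicates.
east_or_south = region["East"] | region["South"]
result = product["Washer"] & color["Almond"] & east_or_south

print("Product = Washer  ", show(product["Washer"]))
print("Color = Almond    ", show(color["Almond"]))
print("Region East|South ", show(east_or_south))
print("AND of all three  ", show(result))
matches = [i for i in range(N) if result >> i & 1]
print("rows to fetch:", matches)
```

Only the final bitmap is used to access the table, so rows that fail any predicate are never read.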



(b) Write the SQL command to create an index-organized table (IOT) Employee with
the attributes empno, empname and salary in tablespace tsa, as directed:

(1) empno is the primary key of the table.

(2) PCTTHRESHOLD is 20.

(3) Specify the OVERFLOW and INCLUDING clauses. Assume empname is the
column named in the INCLUDING clause.

(4) Give the meaning of the PCTTHRESHOLD, INCLUDING and OVERFLOW clauses.

Mention the advantages of an IOT over B-tree indexes. 5(CO2)

4. (a) The following age data was collected in a survey of a particular region:
15, 17, 18, 18, 21, 22, 22, 23, 24, 24, 27, 27, 27, 27, 32, 35,
35, 37, 37, 37, 37, 38, 42, 47, 48, 54, 72. Apply the following methods
and show the results:

(i) Use smoothing by bin means with a depth of 3.

(ii) Use min-max normalization to transform the value 36 into the
range 0.0 to 1.0.

(iii) Use z-score normalization to transform the value.

(iv) Use normalization by decimal scaling to transform the value 36.

(v) Plot an equi-width histogram of width 10.
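Parts (i) to (v) can be checked with a short Python sketch. Assumptions: bins of depth 3 are taken from the sorted data (27 values, so 9 bins); the z-score uses the population standard deviation and, since part (iii) does not name the value, the same value 36 as parts (ii) and (iv); histogram buckets start at multiples of 10 (10-19, 20-29, ...). By hand, min-max gives (36 - 15) / (72 - 15) = 21/57 ≈ 0.368 and decimal scaling gives 36 / 10^2 = 0.36.

```python
from collections import Counter
from statistics import mean, pstdev

ages = [15, 17, 18, 18, 21, 22, 22, 23, 24, 24, 27, 27, 27, 27, 32, 35,
        35, 37, 37, 37, 37, 38, 42, 47, 48, 54, 72]

# (i) Equal-frequency bins of depth 3; each value is replaced by its bin mean.
depth = 3
bins = [ages[i:i + depth] for i in range(0, len(ages), depth)]
smoothed = [[round(mean(b), 2)] * len(b) for b in bins]

# (ii) Min-max normalization of 36 into [0.0, 1.0].
v = 36
min_max = (v - min(ages)) / (max(ages) - min(ages))

# (iii) z-score of the same value (assumed 36), using the population std dev.
z = (v - mean(ages)) / pstdev(ages)

# (iv) Decimal scaling: smallest j such that max(|x|) / 10**j < 1.
j = 0
while max(abs(a) for a in ages) / 10 ** j >= 1:
    j += 1
decimal_scaled = v / 10 ** j

# (v) Equal-width histogram of width 10; Counter returns 0 for empty buckets.
histogram = Counter(a // 10 * 10 for a in ages)

for b, s in zip(bins, smoothed):
    print(b, "->", s)
print(f"min-max: {min_max:.3f}  z-score: {z:.3f}  decimal scaling: {decimal_scaled}")
for start in range(min(histogram), max(histogram) + 10, 10):
    print(f"{start}-{start + 9}: {'*' * histogram[start]}")
```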

Sketch examples of different sampling techniques using a sample of size 5 and
the strata low, medium and high. 5(CO2)
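A Python sketch of four sampling techniques on the same age data. The strata boundaries (low < 30, medium 30 to 44, high >= 45), the allocation of the 5 draws across strata (2, 2, 1) and the clusters (consecutive runs of 5 sorted values) are assumptions for illustration; the question does not define them. The actual samples depend on the random seed.

```python
import random

ages = [15, 17, 18, 18, 21, 22, 22, 23, 24, 24, 27, 27, 27, 27, 32, 35,
        35, 37, 37, 37, 37, 38, 42, 47, 48, 54, 72]
random.seed(7)  # fixed only so that repeated runs draw the same samples

# SRSWOR: 5 draws, no position chosen twice.
srswor = random.sample(ages, 5)

# SRSWR: 5 draws, a position may be chosen again.
srswr = random.choices(ages, k=5)

# Cluster sample: split into hypothetical clusters of 5 and take one whole cluster.
clusters = [ages[i:i + 5] for i in range(0, len(ages), 5)]
cluster_sample = random.choice([c for c in clusters if len(c) == 5])

# Stratified sample: hypothetical strata boundaries and per-stratum sizes.
strata = {
    "low": [a for a in ages if a < 30],
    "medium": [a for a in ages if 30 <= a < 45],
    "high": [a for a in ages if a >= 45],
}
allocation = {"low": 2, "medium": 2, "high": 1}
stratified = {name: random.sample(group, allocation[name])
              for name, group in strata.items()}

print("SRSWOR:    ", srswor)
print("SRSWR:     ", srswr)
print("Cluster:   ", cluster_sample)
print("Stratified:", stratified)
```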

(b) What is a bitmap join index? List the advantages of creating a bitmap
join index over a normal index. Write a query that illustrates a bitmap join
index. 5(CO3)
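A bitmap join index is a bitmap index on the fact table keyed by the values of a column in a dimension table, with the join resolved once when the index is built, so a filter on the dimension attribute can be answered without joining at query time. The Python sketch below models only that idea, reusing the Sales and Dealers tables from Question 1(b) with made-up rows; it is not how any particular DBMS stores the index, and `build_bitmap_join_index` is a hypothetical name.

```python
# Hypothetical rows for the Dealers dimension and the Sales fact table.
dealers = {
    "D1": {"state": "Maharashtra"},
    "D2": {"state": "Gujarat"},
    "D3": {"state": "Maharashtra"},
}
sales = [  # (DealerId, QtySold)
    ("D1", 500), ("D2", 300), ("D3", 700), ("D2", 400), ("D1", 250),
]


def build_bitmap_join_index(fact_rows, dimension, attribute):
    """For each value of dimension[DealerId][attribute], a bitmap over fact-row positions."""
    index = {}
    for i, (dealer_id, _qty) in enumerate(fact_rows):
        value = dimension[dealer_id][attribute]  # join resolved once, at build time
        index[value] = index.get(value, 0) | (1 << i)
    return index


state_index = build_bitmap_join_index(sales, dealers, "state")

# Query: total QtySold for dealers in Maharashtra, answered without a join.
bitmap = state_index["Maharashtra"]
total = sum(qty for i, (_d, qty) in enumerate(sales) if bitmap >> i & 1)
print("Maharashtra total QtySold:", total)
```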



5. (a) Construct a decision tree for the following data set using the Gini index.

5(CO3)
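The Gini computation can be sketched in Python on a small hypothetical table (outlook, windy → play), not the question's data set: Gini(D) = 1 - Σ p_i², an attribute's split score is the size-weighted Gini of its partitions, and the attribute with the lowest score is chosen at each node. For brevity the sketch uses multiway splits; CART proper uses binary splits. Function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, attribute, target):
    """Size-weighted Gini of the partitions induced by one attribute (multiway)."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    return sum(len(p) / len(rows) * gini(p) for p in partitions.values())


def build_tree(rows, attributes, target):
    """Recursively split on the lowest-Gini attribute; leaves hold the majority class."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = min(attributes, key=lambda a: split_gini(rows, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


# Hypothetical training data (not the data set from the question).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

scores = {a: split_gini(rows, a, "play") for a in ("outlook", "windy")}
print("split scores:", scores)
print(build_tree(rows, ["outlook", "windy"], "play"))
```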
(b) Generate the frequent itemsets using the Apriori algorithm for the transaction
database shown below, with minimum support s_min = 3 and minimum
confidence = 60%.

5(CO1)
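A level-wise Apriori sketch in Python, run on five hypothetical transactions (not the question's database) with the stated thresholds, treating s_min = 3 as an absolute support count and 60% as the minimum confidence. Size-k candidates are formed by joining frequent (k-1)-itemsets and pruned if any (k-1)-subset is infrequent; rules X → Y are kept when support(X ∪ Y) / support(X) ≥ 0.6. Function names are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise Apriori: return {frozenset(itemset): support count} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}  # frequent 1-itemsets
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Join: keep unions of size k. Prune: every (k-1)-subset must be frequent.
            if len(union) == k and all(frozenset(s) in current
                                       for s in combinations(union, k - 1)):
                candidates.add(union)
        current = {}
        for c in candidates:  # one scan of the database per level
            count = sum(1 for t in transactions if c <= t)
            if count >= min_count:
                current[c] = count
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_conf):
    """Rules X -> Y with confidence support(X ∪ Y) / support(X) >= min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules


# Hypothetical transactions (not the database from the question).
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C", "D"}]

frequent = apriori(transactions, min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for lhs, rhs, conf in association_rules(frequent, min_conf=0.6):
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```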

6. (a) Use the DBSCAN algorithm to cluster the following examples with Euclidean
distance as the distance measure.
How many clusters will the algorithm form with Epsilon = 3 and MinPts = 3?
Draw the 11 by 11 space on graph paper and illustrate the discovered clusters.
A1 = (3, 11), A2 = (3, 6), A3 = (9, 5), A4 = (6, 9), A5 = (8, 6), A6 = (7, 5),
A7 = (2, 3), A8 = (5, 11). 5(CO4)
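A from-scratch DBSCAN sketch in Python for the eight points given. It assumes the common convention that a point's Epsilon-neighbourhood includes the point itself and that a distance exactly equal to Epsilon counts. Under that reading A3, A5, A6 and A8 are core points, giving two clusters, {A3, A5, A6} and {A1, A4, A8} (A1 and A4 are border points reached from A8), while A2 and A7 are noise. If the point itself were not counted toward MinPts, no point would be core and no cluster would form.

```python
from math import dist

points = {
    "A1": (3, 11), "A2": (3, 6), "A3": (9, 5), "A4": (6, 9),
    "A5": (8, 6), "A6": (7, 5), "A7": (2, 3), "A8": (5, 11),
}
NOISE = -1


def region_query(name, eps):
    """Names of all points within eps of `name`, including the point itself."""
    return [q for q in points if dist(points[name], points[q]) <= eps]


def dbscan(eps, min_pts):
    labels = {}
    cluster_id = 0
    for p in points:
        if p in labels:
            continue
        neighbours = region_query(p, eps)
        if len(neighbours) < min_pts:
            labels[p] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1  # p is a core point: start a new cluster
        labels[p] = cluster_id
        seeds = [q for q in neighbours if q != p]
        while seeds:
            q = seeds.pop()
            if labels.get(q) == NOISE:
                labels[q] = cluster_id  # border point: joins, but is not expanded
                continue
            if q in labels:
                continue
            labels[q] = cluster_id
            q_neighbours = region_query(q, eps)
            if len(q_neighbours) >= min_pts:  # q is also core: keep expanding
                seeds.extend(q_neighbours)
    return labels


labels = dbscan(eps=3, min_pts=3)
clusters = {}
for name, label in labels.items():
    clusters.setdefault(label, []).append(name)
print(clusters)
```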



(b) The distances between each pair of five cases are given below.
Cluster the five cases using the following procedures and draw the dendrogram for each:
(i) Single linkage hierarchical procedure.
(ii) Complete linkage hierarchical procedure.

5(CO4)
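A Python sketch of agglomerative clustering under both linkage rules, using a hypothetical 5 × 5 distance matrix rather than the question's data. Single linkage merges on the closest pair of members (`min`) and complete linkage on the farthest pair (`max`); the returned merge list, with heights, is exactly what the dendrogram draws. The function name `agglomerate` is illustrative.

```python
from itertools import combinations

# Hypothetical symmetric distance matrix for cases 1..5 (not the question's data).
matrix = [
    [0, 2, 6, 10, 9],
    [2, 0, 5, 9, 8],
    [6, 5, 0, 4, 5],
    [10, 9, 4, 0, 3],
    [9, 8, 5, 3, 0],
]
n = len(matrix)


def agglomerate(linkage):
    """Merge the two closest clusters until one remains.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Returns (cluster_a, cluster_b, height) for each merge, in order.
    """
    clusters = [frozenset([i + 1]) for i in range(n)]

    def distance(c1, c2):
        return linkage(matrix[a - 1][b - 1] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((sorted(c1), sorted(c2), distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


for name, linkage in (("single", min), ("complete", max)):
    print(name)
    for a, b, height in agglomerate(linkage):
        print(f"  merge {a} + {b} at {height}")
```

On this matrix both procedures merge the same pairs in the same order; only the heights at which {3} joins {4, 5} and the final merge happen differ, which is why complete-linkage dendrograms are typically taller.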

