DATA MINING MCQ
UNIT-1
1. Data Warehouse is defined as subject-oriented, integrated, time-variant and ___.
a. Volatile
b. Distributed
c. Non-Volatile
d. None of the above
Ans: c
2. Which one of the following is not a tool for Data warehouse development?
a. COGNOS
b. SCCS
c. Informatica
d. Business Objects
Ans: b
3. The Data Warehouse does not cater to the Real-time operational requirements of
the enterprise. (True/False).
Ans: True
4. Data Warehouse contains data for ___ purpose.
a. Real-Time Operation
b. Analysis
c. Validation
d. All of the above
Ans: b
5. In Data Warehouse, the requirements are gathered subject area wise.
(True/False)
Ans: True
6. Which of the following is a Source Data Component in Data Warehouse?
a. Production Data
b. Sales Data
c. Marketing Data
d. Purchase Data
Ans: a
7. Data Marts are.
a. Department level
b. Limited in size
c. Read-only
d. All the above
Ans: d
8. The three major Data Staging Components are Data Extraction, Data
Transformation and ___.
a. Data Retrieval
b. Data Loading
c. Data Refresh
d. Data Access
Ans: b
9. Dimensional model can be implemented with the following databases.
a. Relation database
b. MDDB
c. Flat files
d. Excel data files
Ans: a
10. Fact tables usually consist of ___ relationships.
a. Many to many
b. One to many
c. One to one
d. Many to one
Ans: a
11. Each Dimension table has a ___ relationship to the fact table.
a. Many to many
b. One to many
c. Many to one
d. One to one
Ans: b
12. Dimensional table and a fact table can be connected with the following
database keys:
a. Foreign key
b. Surrogate key
c. Candidate key
d. All of the above
Ans: a
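Illustration for questions 10 to 12: a minimal sketch, assuming Python's built-in sqlite3 module and made-up table names (dim_product, fact_sales), of a dimension table joined to a fact table through a foreign key. It is one possible layout, not a prescribed design.

```python
import sqlite3

# Hypothetical star-schema fragment: one dimension table, one fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE fact_sales ("
    " product_key INTEGER REFERENCES dim_product(product_key),"  # foreign key
    " units_sold INTEGER)"
)
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Pen"), (2, "Book")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", [(1, 10), (1, 5), (2, 3)])

# One dimension row matches many fact rows (one-to-many), so the join
# groups several fact rows under each product.
rows = conn.execute(
    "SELECT d.name, SUM(f.units_sold) FROM fact_sales f"
    " JOIN dim_product d ON d.product_key = f.product_key"
    " GROUP BY d.name ORDER BY d.name"
).fetchall()
print(rows)
```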
13. In Data Warehouse, linking a single record to all the duplicate records in the source
systems is called ___.
a. Decoding of fields
b. De-duplication
c. Merging of Information
d. Summarization
Ans: b
14. Which of the following is not a type of data loading?
a. Initial load
b. Incremental load
c. Iterative load
d. Full refresh
Ans: c
15. Adding value to the data to give it more meaning is called ___.
a. Data cleansing
b. Data profiling
c. Data integration
d. Data Enrichment
Ans: d
16. How many levels does CMM have?
a. One
b. Four
c. Five
d. Two
Ans: c
17. The full form of CMM is:
a. Capability Maturity Model
b. Capability Model Maturity
c. Comprehensive Material Management
d. Computer Material Management
Ans: a
18. OLAP stands for.
a. On-Line Application Processing
b. On-Line Analytical Processing
c. On-Line Ability Processing
d. None of the above
Ans: b
19. Which of the following are the intermediate servers that stand in between a
relational back-end server and client front end tools?
a. ROLAP
b. MOLAP
c. HOLAP
d. All of the above
Ans: d
20. A dimensional table does not contain hierarchies. (True/False)
Ans: False
21. ___ is used as a (dynamic) indexing method in relational database management
systems.
a. Bit map indexing
b. B+ tree indexing
c. Compression indexing
d. Clustered indexing
Ans: b
22. Parallelism improves processing for:
a. Large table scans and joins
b. Creation of large indexes
c. Bulk inserts, update and deletes
d. All of the above
Ans: d
23. According to Ralph Kimball, Back-room metadata guides.
a. Extraction
b. Cleaning
c. Loading processes
d. All the above
Ans: d
24. Storing, data mapping and transformation from source systems to the Data
Warehouse fall into:
a. Technical metadata
b. Operational metadata
c. Business metadata
d. None of the above
Ans: a
25. Key hierarchies and key performance indicators are ___ kind of metadata.
a. Technical metadata
b. Operational metadata
c. Business metadata
d. None of the above
Ans: c
26. Which of the following is the white box testing?
a. Unit testing
b. Regression
c. User acceptance testing
d. Integration testing
Ans: a
27. Which of the following are the main areas of testing that should be done for the
ETL process.
a. Making sure that all the records in the source system that should be brought into
the warehouse and all the components of the ETL process are complete.
b. All of the extracted source data is correctly transformed into dimension tables
and fact tables
c. All of the extracted and transformed data is successfully loaded into Data
Warehouse
d. All of the above
Ans: d
28. The advantage of using a data cube is that it allows fast indexing to pre-
computed summarized data. (True/False)
Ans: True
29. RAID stands for.
a. Rapid Application integration and Development
b. Redundant Array of Inexpensive Disks
c. Redundant Application of Inexpensive Disks
d. Redundant Array of Integrated Disks
Ans: b
30. Which of the following analytic tools should be used for extracting the data
from the Data Warehouse?
a. OLAP tools
b. Data mining tools
c. SQL
d. All the above
Ans: d
UNIT-2
1. Which of the following data mining technique is used for optimization?
a. Artificial Neural Networks
b. If then rule induction
c. Genetic algorithms
d. Decision trees
Ans: c
2. Which of the following tools provide enterprise intelligence?
a. Data mining
b. Data warehouse
c. Databases
d. None of the above
Ans: a
3. Predictive modelling requires which of the following Data set for initial model
creation?
a. Training data set
b. Test data set
c. Raw data set
d. All of the above
Ans: a
4. Click stream data is used for the following.
a. To track the user activity on the web page
b. To study customer buying patterns
c. To get feedback about web site design
d. All the above
Ans: d
5. Which of the following is the private network to access the data through the
web.
a. Internet
b. Extranet
c. Intranet
d. None of the above
Ans: c
6. Web-enabling the Data Warehouse uses the following as the information
delivery mechanism.
a. Web technology
b. Grid computing
c. Artificial intelligence
d. None of these
Ans: a
7. Web house is what kind of network?
a. Distributed system
b. Client and server only
c. Parallel system
d. None of the above
Ans: a
8. The system that delivers the results of requests for information through remote
browsers is called:
a. Web browser
b. Information delivery
c. Data presentation
d. Data dissemination
Ans: b
9. Who is called the Father of Data Warehouse?
a. Charles Babbage
b. Ralph Kimball
c. Bill Inmon
d. Fritz Bauer
Ans: c
10. Which of the following schema supports the normalization in dimensional
modelling.
a. Star schema
b. Snow-Flake schema
c. Fact-Constellation
d. None of these
Ans: b
11. CMMI means ____.
a. Capability Model Maturity Integration
b. Comprehensive Material Management Information
c. Capability Maturity Model Information
d. Capability Maturity Model Integration
Ans: d
12. Data Cubes contains ___ and ___.
a. Facts, Information
b. Dimensions, Weight
c. Dimensions, Facts
d. Data, Information
Ans: c
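Illustration for question 12: a small sketch, using made-up sales records, of how a data cube pairs dimensions (here region and quarter) with a fact (sales) and precomputes summaries for every combination of dimensions. The function name build_cube and the "*" marker for an aggregated dimension are assumptions chosen for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact records: two dimension values followed by one measure.
facts = [
    ("North", "Q1", 100),
    ("North", "Q2", 150),
    ("South", "Q1", 80),
]

def build_cube(facts, n_dims):
    """Precompute the sum of the measure for every subset of dimensions."""
    cube = defaultdict(int)
    for *keys, measure in facts:
        for r in range(n_dims + 1):
            for kept in combinations(range(n_dims), r):
                # "*" marks a dimension that has been aggregated away.
                cell = tuple(keys[i] if i in kept else "*" for i in range(n_dims))
                cube[cell] += measure
    return dict(cube)

cube = build_cube(facts, 2)
print(cube[("North", "*")])  # total for North across all quarters
print(cube[("*", "Q1")])     # total for Q1 across all regions
print(cube[("*", "*")])      # grand total
```

Because every cell is computed once up front, a summary query becomes a dictionary lookup rather than a scan of the fact records.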
13. The hypercube is the cube with ___ dimensions.
a. Three
b. Two
c. Four
d. One
Ans: c
14. Writing the same data to two disk drives connected to the same controller is
known as ___.
a. Data Duplexing
b. Data Mirroring
c. Disk Striping
d. Data Profiling
Ans: b
15. ___ provides the Enterprise with intelligence and ___ provides the Enterprise
with a memory.
a. Data Warehouse, Databases
b. Databases, Data Mining
c. Data mining, Data warehouse
d. Data Warehouse, Data Mining
Ans: c
16. Which of the following is an open-source Data mining tool?
a. Clementine
b. Intelligent Miner
c. Weka3
d. Enterprise Miner
Ans: c
17. In the star schema, the dimension table is ___ and the fact table is ___.
a. Wide, Wide
b. Wide, Deep
c. Deep, Wide
d. Deep, Deep
Ans: b
18. Which of the following is an open-source ETL tool?
a. Clover
b. SAS data Integrator
c. Cognos Decision Stream
d. Microsoft DTS
Ans: a
19. Conformed dimensions allow users to:
a. Share non-Key dimension data
b. Query Across fact tables with consistency
c. Work on fact and business subjects for which all users have the same meaning
d. All of the above
Ans: d
20. Which of the following is true for CMM level 2?
a. Data quality issues are acknowledged
b. Major problems are handled as and when they surface
c. Both a and b.
d. None of these
Ans: c
21. Data Warehouse is ___ triggered whereas OLTP is ___ triggered.
a. Event, User
b. System, User
c. System, Event
d. Insert, Update
Ans: b
22. UAT means.
a. User Acquisition Test
b. User Acceptance Test
c. Usage Acceptance Test
d. Usage Ambiguity Test
Ans: b
23. Meta Data means.
a. Data about Data
b. Catalogue of data
c. Data Warehouse Roadmap
d. All of the above
Ans: d
24. Which of the following interfaces are used to access the Data Warehouse?
a. Browser
b. Search engine
c. Active X applets
d. All the above
Ans: d
25. Data mining is a ___ driven approach, not a ___ driven approach.
a. Event, Data
b. Data, User
c. User, Event
d. User, Data
Ans: b
26. Which of the following is true for Administrative Metadata?
a. Access rights, protocols, physical location, retention criteria
b. Protocols, audit controls, source tables, usage statistics
c. Access rights, audit control, process automation, usage statistics
d. Audit control, schema definition, physical location, retention criteria
Ans: a
27. Which of the following RAID level does not implement error checking?
a. RAID1
b. RAID (0+1)
c. RAID0
d. RAID5
Ans: c
28. ____ and ____ of data take place on a large scale in the data staging area.
a. Sorting, searching
b. Searching, merging
c. Sorting, merging
d. Searching, acquisition
Ans: c
29. True/False
1. Data Warehouse contains only aggregated data and individual transactions.
2. A dimension is an entity or Subject area, which can group the data.
3. E-R modelling and dimensional modelling are the same.
a. 1-T, 2-F, 3-T
b. 1-F, 2-F, 3-F
c. 1-T, 2-T, 3-F
d. 1-F, 2-T, 3-T
Ans: c
30. True/False
1. Sorting the data in the given source file is a transformation
2. OLAP tools enable the user to access the data in Data Warehouse in an
interactive manner.
3. Data mining is a data-driven approach, not a user-driven approach
a. 1-T, 2-T, 3-T
b. 1-F, 2-F, 3-F
c. 1-T, 2-T, 3-F
d. 1-F, 2-T, 3-T
Ans: a
UNIT-3
1. OLTP stands for ___.
Ans. Online Transaction Processing
2. OLTP handles day to day business transactions (true/false)
Ans. True
3. Updates on the Data Warehouse are allowed (true/false)
Ans. False
4. Data Warehouse is a database that is designed for facilitating ___ and ___.
Ans. Query and Analysis
5. Data Warehouse is defined as subject-oriented, integrated, time-variant and ___.
Ans. Non-Volatile
6. Data Warehouse contains only aggregated data and individual transactions (true/false)
Ans. True
7. List the types of the data warehouse.
Ans. Real-time, federated and distributed
8. ___ data Warehouse will allow changes in the information to be monitored and recorded over time.
Ans. time-variant
9. The Data Warehouse functions as ___ and an Executive Information System (EIS).
Ans. DSS
10. Data about data is called ___.
Ans. Metadata
11. Data Warehouse contains data for ___ purpose.
Ans. Analysis
12. Data Warehouse is a storehouse of ___ data.
Ans. Historical
13. In most organizations, two groups of people are key to the success of the project, ___ and ___.
Ans. Senior Management and Working Management
14. OLTP systems are designed for ___.
Ans. Real-time business operations
15. Data Warehouses do not require real-time validation (True / False)
Ans. True
16. In most organizations, two groups of people are key to the success of the project, ___ and ___.
Ans. Senior Management and Working Management
17. In Data Warehouse, the requirements are gathered subject area wise. (True / False)
Ans. True
18. The 3 major functions that need to be performed for getting the data ready into the Data
Warehouse are extraction, transformation and ___.
Ans. Loading
19. ___ and ___ of data take place on a large scale in the data staging area.
Ans. Sorting and Merging
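Illustration for questions 18 and 19: a minimal sketch of the extraction, transformation and loading steps, with sorting and merging done before the load. The source rows and the function names extract, transform and load are hypothetical, and a plain list stands in for the warehouse table.

```python
import heapq

def extract(source):
    """Extraction: read raw rows from a (hypothetical) source system."""
    return list(source)

def transform(rows):
    """Transformation: clean the names and sort the rows by customer id."""
    cleaned = [(cid, name.strip().title()) for cid, name in rows]
    return sorted(cleaned)

def load(warehouse, *sorted_batches):
    """Loading: merge the already-sorted batches into the warehouse table."""
    warehouse.extend(heapq.merge(*sorted_batches))
    return warehouse

source_a = [(3, " carol "), (1, "alice")]
source_b = [(2, "BOB")]
warehouse = load([], transform(extract(source_a)), transform(extract(source_b)))
print(warehouse)
```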
20. Knowledge discovery is called ___.
Ans. Data Mining
21. The main purpose of E-R modelling is
a. To remove redundancy
b. To improve analysis for decision-making
c. To record historical data
d. None
Ans. a
22. E-R modelling and Dimensional modelling are the same (True / False)
Ans. False
23. A Dimension is an entity or subject area, which can group the data (True / False)
Ans. True
24. Dimensional model consists of ___ and ___ tables.
Ans. Dimensions and fact tables
25. ___ is often used in dimensional modelling.
Ans. Text data
26. Fact tables usually consist of ___ to ___ relationships.
Ans. Many to many
27. Dimensional model can be implemented with the following databases,
a. Relational database
b. MDDB
c. Flat files
d. Excel data files
e. None
Ans. a
28. Customer name change in the dimensional model comes under ___.
Ans. Slowly-changing-dimension
29. The most popular model for the data warehouse is ___.
Ans. Multidimensional model
30. Which of the following schema supports the normalization in dimensional modelling?
a. Star Schema
b. Snow-Flake schema
c. Fact-Constellation
Ans. b
UNIT-4
1. Each dimension table is in ___ relationship with the central fact table.
Ans. One-to-many
2. Dimensional table and a fact table can be connected with the following database keys:
a. Foreign key
b. Surrogate key
c. Candidate key
Ans. a
3. For sales analysis units sold is a ___ kind of measure.
Ans. Additive numeric measure
4. OLAP tools are data accessing and discovery tools (True / False)
Ans. True
5. In Data Warehouse a system with multiple architectures is called ___
Ans. Federated Data Warehouse architecture
6. Data marts are,
a) Department level
b) Limited in size
c) Read-only
d) All the above
Ans. d
7. The Data Warehouse functions as a Decision Support System and ___.
Ans. EIS
8. Data extraction, ___ and ___ encompass the areas of data acquisition and data storage.
Ans. Transformation and Loading
9. Populating all the Data Warehouse tables for the very first time is called ___.
Ans. Initial Load
10. Which of the following are open source ETL tools?
a) SAS Data Integrator
b) Ascential DataStage
c) Cognos Decision Stream
d) Microsoft DTS
e) Clover
Ans. e
11. Average daily balance is a ___ attribute.
Ans. Derived
12. OLAP stands for ___
Ans. Online analytical processing
13. OLAP tools enable the user to access the data in Data Warehouse in an interactive manner (True /
False)
Ans. True
14. ERP and CRM are ___ kinds of systems.
Ans. OLTP
15. Data cube contains ___ and ___.
Ans. Dimensions and Facts
16. A dimensional table contains hierarchies (True / False)
Ans. true
17. Which of the following are the intermediate servers that stand in between a relational back-end
server and client front-end tools?
a. ROLAP
b. MOLAP
c. HOLAP
d. All the above
Ans. d
18. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data.
(True / False)
Ans. true
19. In Data Warehouse, linking a single record to all the duplicate records in the source systems is called
___.
Ans. De-duplication
20. Sorting the data in the given source file is a transformation (True / False).
Ans. True
21. OLTP is abbreviated as ___
Ans. Online transaction processing
22. Query response time is ___ kind of metadata.
Ans. Operational metadata
23. Key hierarchies and key performance indicators are ___ kind of Metadata.
Ans. Business metadata
24. Storing, data mapping and transformation from source systems to the data warehouse fall into:
a. Technical metadata
b. Operational metadata
c. Business metadata
Ans. a
25. According to Ralph Kimball, Back-room metadata guides:
a. Extraction
b. Cleaning
c. Loading processes
d. All the above
Ans. d
26. One tool that can allow data warehouse managers to deal with metadata is called___.
Ans. Repository
27. Access rights, protocols are ___ metadata.
Ans. Administrative metadata
28. Data about data is called ___.
Ans. Metadata
29. Information can be converted into knowledge about ___ patterns and future trends.
Ans. Historical
30. Data about data is called ___.
Ans. Metadata
UNIT-5
1. The Apriori algorithm operates in the ______ method
a. Bottom-up search method
b. Breadth-first search method
c. None of above
d. Both a & b
2. A bi-directional search takes advantage of ______ process
a. Bottom-up process
b. Top-down process
c. None
d. Both a & b
3. The pincer-search has an advantage over the Apriori algorithm when the largest frequent itemset
is long.
a. True
b. false
4. MFCS stands for
a. Maximum Frequent Candidate Set
b. Minimal Frequent Candidate Set
c. None of above
5. MFCS helps in pruning the candidate set
a. True
b. False
6. DIC algorithm stands for___
a. Dynamic itemset counting algorithm
b. Dynamic itself counting algorithm
c. Dynamic item set countless algorithms
d. None of above
7. If the item set is in a dashed circle while completing a full pass it moves towards
a. Dashed circle
b. Dashed box
c. Solid Box
d. Solid circle
8. If the item set is in the dashed box then it moves into a solid box after completing a full pass
a. True
b. False
9. The dashed arrow indicates the movement of the item set
a. True
b. False
10. The vertical arrow indicates the movement of the item set after reaching the frequency
threshold
a. True
b. False
11. Frequent set properties are:
a. Downward closure property
b. Upward closure property
c. A & B
d. None of these
12. The property that any subset of a frequent set is also a frequent set is called the
a. Downward closure property
b. Upward closure property
c. Both a and b
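Illustration for the Apriori and closure-property questions above: a compact sketch of a level-wise frequent itemset search, in which a candidate of size k is kept only if all of its subsets of size k-1 were found frequent. The transactions and the function name apriori are made up for this example, and the code favours readability over efficiency.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: count 1-itemsets, then 2-itemsets, and so on."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        # Count the support of every candidate at the current level.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(transactions, min_support=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```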
13. Periodic maintenance of a data mart means
a. Loading
b. Refreshing
c. Purging
d. All are true
14. The FP-tree Growth algorithm was proposed by
a. Srikant
b. Agrawal
c. Han et al.
d. None of these
15. The main idea of the algorithm is to maintain a frequent pattern tree of the data set: an
extended prefix tree structure storing crucial and quantitative information about frequent sets.
a. Apriori algorithm
b. Pincer-Search algorithm
c. FP-Tree Growth algorithm
d. All of these
16. Data warehousing and data mining technologies have extensive potential applications in
various central government sectors such as:
a. Agriculture
b. Rural Development
c. Health and Energy
d. All of the above
17. ODS stands for
a. External operational data sources
b. Operational data source
c. Output data source
d. None of the above
18. Good performance can be achieved in a data mart environment by extensive use of
a. Indexes
b. creating profile records
c. volumes of data
d. all of the above
19. Features of the FP-tree are
(i). It is dependent on the support threshold
(ii). It depends on the ordering of the items
(iii). It depends on the different values of trees
(iv). It depends on frequent itemsets with respect to the given information
a. (i) & (ii)
b. (iii) & (iv)
c. (i) & (iii)
d. (ii) only
20. For a list T, we denote head_t as its first element and body_t as the remaining part of the list
(the portion of the list T after removal of head_t). Thus T is
a. {head} {body}
b. {head_t} {body_t}
c. {t_head}{t_body}
d. None of these
21. Partition Algorithm executes in
a. One phase
b. Two Phase
c. Three phase
d. None of these
22. In the first Phase of Partition Algorithm
a. Logically divides into a number of non-overlapping partitions
b. Logically divides into a number of overlapping Partitions
c. Does not divide into partitions
d. Divides into non-logically and non-overlapping Partitions
23. Functions of the second phase of the partition algorithm are
a. Actual support of item sets are generated
b. Frequent itemsets are identified
c. Both (a) & (b)
d. None of these
24. Partition algorithm is based on the
a. Size of the global Candidate set
b. Size of the local Candidate set
c. Size of frequent item sets
d. No. Of item sets
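Illustration for the Partition algorithm questions above: a rough sketch, assuming the usual textbook description in which the database is split into non-overlapping partitions, locally frequent itemsets are collected as global candidates, and a second pass counts their actual support. The helper names and the brute-force local search are simplifications made for this example.

```python
from itertools import combinations

def local_frequent(partition, min_fraction):
    """Brute-force frequent itemsets inside one partition (illustration only)."""
    threshold = min_fraction * len(partition)
    items = sorted({i for t in partition for i in t})
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            c = frozenset(combo)
            if sum(c <= t for t in partition) >= threshold:
                found.add(c)
    return found

def partition_algorithm(transactions, min_fraction, n_partitions):
    transactions = [frozenset(t) for t in transactions]
    # Phase 1: split into non-overlapping partitions and collect the
    # locally frequent itemsets as the global candidate set.
    size = -(-len(transactions) // n_partitions)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    candidates = set().union(*(local_frequent(p, min_fraction) for p in parts))
    # Phase 2: count the actual support of each candidate over the
    # whole database and keep the globally frequent ones.
    threshold = min_fraction * len(transactions)
    result = {}
    for c in candidates:
        support = sum(c <= t for t in transactions)
        if support >= threshold:
            result[c] = support
    return result

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = partition_algorithm(db, min_fraction=0.5, n_partitions=2)
print(len(result))
```

In this run the itemset {a, b, c} is locally frequent in the second partition, so it enters the candidate set, but the second phase rejects it because its support over the whole database is too low.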
25. The Pincer-Search algorithm is based on the principle of
a. Bottom-up
b. Top-Down
c. Directional
d. Bi-Directional
26. Pincer-Search Method Algorithm contains
(i) Frequent item set in a bottom-up manner
(ii) Recovery procedure to recover candidates
(iii) List of maximal frequent itemsets
(iv) Generate a number of partitions
a. (i) only
b. (i) & (iii) only
c. (i), (iii) & (iv)
d. (i), (ii) & (iii)
27. ___ is a full-breadth search, where no background knowledge of frequent itemsets is used for
pruning.
a. Level-cross filtering by single item
b. Level-by-level independent
c. Multi-level mining with uniform support
d. Multi-level mining with reduced support
28. Disadvantage of uniform support is
a. Items at lower levels of abstraction are unlikely to occur as frequently as those at higher levels.
b. If the minimum support threshold is set too high, it could miss several meaningful associations
c. Both (a) & (b)
d. None of these
29. The warehouse administrator is responsible for
a. Administration
b. Maintenance
c. both a and b
d. none of the above
30. The pincer-search has an advantage over the Apriori algorithm when the largest frequent itemset
is long
a. True
b. false
31. What are the common approaches to tree pruning?
a. Prepruning and Postpruning approach.
b. Prepruning.
c. Postpruning.
d. None of the above.
32. Tree pruning methods address the problem of ___.
a. Overfitting the branches
b. Overfitting the data
c. a and b both
d. None of the above
33. What is the full form of MDL?
a. Maximum Description Length
b. Minimum Description Length
c. Mean Described Length
d. Minimum Described Length