Sr. No. Questions A B C D Ans: Unit ONE SUB: 410243 DA
ONE
Sr. No. Questions a b c d Ans
23 What are the five V’s of Big Data? Volume velocity Variety All of the above
d
24 _________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. Scalding Cascalog Hcatalog Hcalding
b
25 What are the main components of Big Data? MapReduce HDFS YARN All of these
d
26 What are the different features of Big Data Analytics? Open-Source Scalability Data Recovery All the above
d
27 Define the Port Numbers for NameNode, Task Tracker and Job Tracker.
a. NameNode
b. Task Tracker
c. Job Tracker
d. All of the above
Ans: d
28 Facebook Tackles Big Data With _______ based on Hadoop
a. Project Prism
b. Prism
c. ProjectData
d. ProjectBid
Ans: a
38 Heuristic is
a. A set of databases from different vendors, possibly using different database paradigms
b. An approach to a problem that is not guaranteed to work but performs well in most cases
c. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d. None of these
Ans: b
39 In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences.
a. Web services
b. customer-facing
c. client/server
d. personalization
Ans: d
40 Heterogeneous databases referred to
a. A set of databases from different vendors, possibly using different database paradigms
b. An approach to a problem that is not guaranteed to work but performs well in most cases.
c. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d. None of these
Ans: a
UNIT SUB : 410243 DA
TWO
Sr. Questions a b c d Ans
No.
1 Movie Recommendation systems are an example of:
a. Classification
b. Clustering
c. Reinforcement Learning
d. Regression
Ans: b,c
3 What is the minimum no. of variables/features required to perform clustering? 0 1 2 3
b
27 Algorithm is
a. It uses machine-learning techniques. Here program can learn from past experience and adapt themselves to new situations
b. Computational procedure that takes some value as input and produces some value as output
c. Science of making machines performs tasks that would require intelligence when performed by humans
d. None of these
Ans: b
28 Bias is
a. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
b. Any mechanism employed by a learning system to constrain the search space of a hypothesis
c. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
d. None of these
Ans: b
29 Classification is
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: a
30 Binary attribute are
a. This takes only two values. In general, these values will be 0 and 1 and they can be coded as one bit
b. The natural environment of a certain species
c. Systems that can be used without knowledge of internal operations
d. None of these
Ans: a
32 Cluster is
a. Group of similar objects that differ significantly from other objects
b. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c. Symbolic representation of facts or ideas from which information can potentially be extracted
d. None of these
Ans: a
33 A definition of a concept is ----- if it recognizes all the instances of that concept. Complete Consistent Constant None of these
a
34 A definition of a concept is ----- if it classifies any examples as coming within the concept. Complete Consistent Constant None of these
b
38 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
b. The process of extracting implicit, previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
24 Which Association Rule would you prefer
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence
Ans: c
27 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are
a. Undefined
b. Not frequent
c. Frequent
d. Can not say
Ans: c
28 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398
Ans: b
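The sports-car question is a direct application of Bayes' rule. The following is a minimal sketch, assuming a binary hypothesis (subscribes / does not subscribe); the function name `posterior` and its parameter names are illustrative and not part of the question bank.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H | E) for a binary hypothesis H using Bayes' rule.

    prior                -- P(H), e.g. P(subscribes) = 0.03
    likelihood           -- P(E | H), e.g. P(sports car | subscribes) = 0.40
    likelihood_given_not -- P(E | not H), e.g. P(sports car | no subscription) = 0.30
    """
    # Total probability of the evidence E over both cases of H.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.03, likelihood=0.40, likelihood_given_not=0.30)
print(round(p, 4))  # 0.0396, i.e. option b
```

The numerator is 0.40 x 0.03 = 0.012 and the evidence is 0.012 + 0.30 x 0.97 = 0.303, which gives 0.012 / 0.303.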
33 Classification rules are extracted from _____________ decision tree root node branches siblings
a
34 What does K refer to in the K-Means algorithm, which is a non-hierarchical clustering approach?
a. Complexity
b. Fixed value
c. No of iterations
d. number of clusters
Ans: d
35 If a Linear regression model fits perfectly, i.e., train error is zero, then _____________________
a. Test error is also always zero
b. Test error is non zero
c. Couldn’t comment on Test error
d. Test error is equal to Train error
Ans: c
37 How many coefficients do you need to estimate in a simple linear regression model (One independent variable)? 1 2 3 4
b
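To see why the answer is two, a simple linear regression y = b0 + b1*x has exactly one intercept and one slope. The sketch below estimates both by ordinary least squares on made-up data; the function name and the sample values are assumptions for illustration only.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate the two coefficients (intercept b0, slope b1) by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```

Because the sample points lie exactly on a line, the training error here is zero, which by itself says nothing about the error on unseen data (compare question 35).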
12 How many terms are required for building a bayes model? 1 2 3 4
c
13 Where can the Bayes rule be used?
a. Solving queries
b. Increasing complexity
c. Decreasing complexity
d. Answering probabilistic query
Ans: d
21 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
b. The process of extracting implicit, previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
22 Classification task referred to
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: c
33 If the larger value is 60 and the smallest value is 40 and the number of classes is 5, then the class interval is 20 25 4 15
c
35 The classification method in which the upper and lower limits of the interval are also in the class interval itself is called the …. method
a. exclusive method
b. inclusive method
c. mid point method
d. None of these
Ans: b
36 Suppose there are 25 base classifiers. Each classifier has error rates of e = 0.35. Suppose you are using averaging as the ensemble technique. What is the probability that the ensemble of the above 25 classifiers will make a wrong prediction? Note: all classifiers are independent of each other
a. 0.05
b. 0.06
c. 0.07
d. 0.08
Ans: b
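The 0.06 figure can be checked numerically. The sketch below assumes the ensemble combines binary predictions by majority vote, so it is wrong only when at least 13 of the 25 independent classifiers are wrong; that reading of "averaging" is an assumption, and the function name `ensemble_error` is illustrative.

```python
from math import comb


def ensemble_error(n_classifiers, base_error):
    """Probability that a majority of independent classifiers is wrong.

    Assumes binary predictions combined by majority vote; the number of
    wrong classifiers then follows a Binomial(n_classifiers, base_error).
    """
    majority = n_classifiers // 2 + 1  # 13 of 25
    return sum(
        comb(n_classifiers, i)
        * base_error ** i
        * (1.0 - base_error) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )


print(round(ensemble_error(25, 0.35), 2))
```

The ensemble beats a single classifier here only because each base error rate is below 0.5 and the classifiers are independent.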
37 The most widely used metrics and tools to assess a classification model are:
a. Confusion matrix
b. Cost-sensitive accuracy
c. Area under the ROC curve
d. All of Above
Ans: d
III. Patterns that exist in the data can be found more easily by
using a visualization
5 Point out the correct combination with regards to kind keyword for graph plotting.
a. ‘hist’ for histogram
b. ‘box’ for boxplot
c. ‘area’ for area plots
d. all of the mentioned
Ans: d
6 Which of the following value is provided by kind keyword for
barplot?
bar bar bar none of the
mentioned
a
7 You can create a scatter plot matrix using the __________ method in pandas.tools.plotting.
a. sca_matrix
b. scatter_matrix
c. DataFrame.plot
d. all of the mentioned
Ans: b
8 Plots may also be adorned with error bars or tables. True FALSE Cannot Tell All Above
a
9 Which of the following plots are often used for checking randomness in time series?
a. Autocausation
b. Autorank
c. Autocorrelation
d. none of the mentioned
Ans: c
29 Information Visualization techniques are Pie Chart Scatterplot Histogram Area Chart
a
30 Which of the following is category of timeline?
a. Linear Timeline
b. Modular Timeline
c. Variant Timeline
d. ER Timeline
Ans: a
34 Information Visualization techniques are Flow Chart Time Line DFD All of above
d
35 Data visualization techniques are: Flow Chart Time Line Pie Chart None of these
c
36 Data visualization is related with…
a. Pictorial representations
b. numerical representation
c. numerical calculations
d. None of these
Ans: a
37 Which of the following follows interactive visualization approach?
a. Zoom+Pan
b. Focus+Context
c. Overview+Details
d. all of above
Ans: d
38 Which of the following are uses of data visualization?
a. See context of data
b. Clear data understanding
c. finding pattern in data
d. all of above
Ans: d
39 Which of the following specifies relationship amongst
variables?
Pie Chart Histogram Area Chart None of these
c
40 Which of the following specifies category Proportions? Pie Chart Scatter Plot Line Chart None of these
a
UNIT SUB : 410243 DA
SIX
Sr. No. Questions a b c d Ans
32 Which of the following is not a classification technique?
a. Logistic Regression
b. Random Forest
c. K-Means
d. Naïve Bayes
Ans: c
33 Which of the following are components of HIVE?
a. FLATTEN
b. Thrift Server
c. Muster
d. All of above
Ans: b
35 Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
a. MapReduce, Hive and HBase
b. MapReduce, MySQL and Google Apps
c. MapReduce, Hummer and Iguana
d. All of above
Ans: a
Sr. No. Objective Questions (MCQ / True or False / Fill up with Choices)
1. Which of the following is not an example of Social Media?
a. Twitter
b. Google
c. Insta
d. Youtube
2. By 2025, the volume of digital data will increase to
a. TB
b. YB
c. ZB
d. EB
3. What is needed for drawing insights for business?
a. Collecting the data
b. Storing the data
c. Analysing the data
d. All the above
4. Does Facebook use "Big Data" to perform the concept of Flashback? Is this True or False?
a. TRUE
b. FALSE
5. The process of describing data that is huge and complex to store and process is known as
a. Analytics
b. Data mining
c. Big Data
d. Data Warehouse
6. Data generated from online transactions is one example of the volume of big data. Is this True or False?
a. TRUE
b. FALSE
7. Velocity is the speed at which the data is processed
a. TRUE
b. FALSE
8. ________ data have a structure but cannot be stored in a database.
a. Structured
b. Semi-Structured
c. Unstructured
d. None of these
9. ________ refers to the ability to make your data useful for business.
a. Velocity
b. Variety
c. Value
d. Volume
20. There is only one operation between Mapping and Reducing. Is it True or False?
a. TRUE
b. FALSE
30. ________ is a type of local Reducer that groups similar data from the map phase into identifiable sets.
a. MAPPER
b. REDUCER
c. COMBINER
d. PARTITIONER
31. While installing Hadoop, how many XML files are edited? List them.
i. core-site.xml
ii. hdfs-site.xml
iii. mapred.xml
iv. yarn.xml
32.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>D:\hadoop\temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
33. Write the code for hdfs-site.xml ?
Sr. No. Objective Questions (MCQ / True or False / Fill up with Choices)
1. Movie Recommendation systems are an example of
1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression
a. 2 Only
b. 1 and 2
c. 1 and 3
d. 2 and 3
2. Sentiment Analysis is an example of
1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning
a. 1, 2 and 4
b. 1 and 3
c. 1, 2 and 3
d. 1 and 2
3. Can decision trees be used for performing clustering?
a. True
b. False
4. What is the minimum no. of variables/features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
5. For two runs of K-Means clustering, is it expected to get the same clustering results?
1. Yes
2. No
6. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
a. 1, 3 and 4
b. 1, 2 and 3
c. 1, 2 and 4
d. All of the above
7. Which of the following algorithm is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
8. After performing K-Means Clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusion can be drawn from the dendrogram?
9.
1. 1
2. 2
3. 3
4. 4
10. In which of the following cases will K-Means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes
a. 1 and 2
b. 2 and 3
c. 2 and 4
d. 1, 2 and 4
11. The discrete variables and continuous variables are two types of
a. Open end classification
b. Time series classification
c. Qualitative classification
d. Quantitative classification
12. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
13. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
14. Classification task referred to
1. A subdivision of a set of examples into a number of classes
2. A measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
15. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. None of these
16. ________ is good at handling missing data and supports both kinds of attributes (i.e. categorical and continuous attributes)
a. ID3.
b. C4.5.
c. CART.
d. Naïve Bayes.
17. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
a. Greedy Algorithms.
b. Divide and Conquer.
c. Backtracking.
d. Shortest Path Method.
18. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
a. TRUE
b. FALSE
19. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
a. ID3
b. Naïve Bayes classifiers
c. CART
d. None of these.
20. High entropy means that the partitions in classification are
a. Pure
b. Not pure
c. Useful
d. Useless
21. Which of the following statements about Naive Bayes is incorrect?
a. Attributes are equally important.
b. Attributes are statistically dependent on one another given the class value.
c. Attributes are statistically independent of one another given the class value.
d. Attributes can be nominal or numeric
22. The maximum value for entropy depends on the number of classes, so if we have 8 classes, what will be the max entropy?
a. Max Entropy is 1
b. Max Entropy is 2
c. Max Entropy is 3
d. Max Entropy is 4
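For the entropy question, the maximum is reached when all classes are equally likely, which gives log2(8) = 3 bits. A minimal sketch, assuming entropy is measured in bits (base-2 logarithm); the function name `entropy` is illustrative.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a discrete class distribution."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform_8 = [1 / 8] * 8            # eight equally likely classes
skewed_8 = [0.65] + [0.05] * 7     # one dominant class

print(entropy(uniform_8))          # 3.0, the maximum for 8 classes
print(entropy(skewed_8) < 3.0)     # True: any non-uniform split is lower
```

A pure partition (one class with probability 1) sits at the other extreme with entropy 0, which is why high entropy in question 20 corresponds to partitions that are not pure.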
23. John flies frequently and likes to upgrade his seat to first class. He has determined that if he checks in for his flight at least two hours early, the probability that he will get an upgrade is 0.75; otherwise, the probability that he will get an upgrade is 0.35. With his busy schedule, he checks in at least two hours before his flight only 40% of the time. Suppose John did not receive an upgrade on his most recent attempt. What is the probability that he did not arrive two hours early?
a. 0.892
b. 0.796
c. 0.685
d. 0.999
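The upgrade question can be worked through with Bayes' rule, conditioning on the event "no upgrade". The sketch below uses only the numbers stated in the question; the variable and function names are illustrative assumptions.

```python
def prob_not_early_given_no_upgrade(p_early, p_upgrade_early, p_upgrade_late):
    """P(not early | no upgrade) via Bayes' rule."""
    p_late = 1.0 - p_early
    # Complement of each upgrade probability gives the "no upgrade" likelihoods.
    no_upgrade_early = 1.0 - p_upgrade_early   # 0.25
    no_upgrade_late = 1.0 - p_upgrade_late     # 0.65
    evidence = no_upgrade_late * p_late + no_upgrade_early * p_early
    return no_upgrade_late * p_late / evidence


answer = prob_not_early_given_no_upgrade(
    p_early=0.40, p_upgrade_early=0.75, p_upgrade_late=0.35
)
print(round(answer, 3))  # 0.796, i.e. option b
```

Numerically this is (0.65 x 0.60) / (0.65 x 0.60 + 0.25 x 0.40) = 0.39 / 0.49.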
24. Point out the wrong statement.
a. k-nearest neighbor is same as k-means
b. k-means clustering is a method of vector quantization
c. k-means clustering aims to partition n observations into k clusters
d. none of the mentioned
25. Consider the following example: "How can we divide a set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time)?" Is this:
1. Clustering
2. Classification
3. Regression
4. None of These
Sr. No. Objective Questions (MCQ / True or False / Fill up with Choices)
1. ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of These
2. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 & 2
3. If {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the confidence of rule {bread,eggs}→{milk} is
1. 0
2. 1
3. 2
4. 3
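The confidence of a rule X→Y is support(X ∪ Y) / support(X). A minimal sketch for the bread/eggs/milk rule, assuming the supports are already known and stored in a dictionary keyed by itemset; the helper name `confidence` and the dictionary layout are illustrative.

```python
def confidence(supports, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    supports maps frozenset itemsets to their support values.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return supports[both] / supports[frozenset(antecedent)]


# Supports taken from the question above.
supports = {
    frozenset({"bread", "eggs"}): 0.15,
    frozenset({"bread", "eggs", "milk"}): 0.15,
}

print(confidence(supports, {"bread", "eggs"}, {"milk"}))  # 1.0
```

A confidence of 1 means every transaction containing bread and eggs also contains milk. Note that the superset can never have a larger support than its subset, which is the downward closure property from the previous question.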
4. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
a. True
b. False
5. A high-confidence rule can sometimes be misleading because confidence does not consider support of the itemset in the rule consequent. Is this true?
a. Yes
b. No
6. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of These
Answer
A
MCQ No - 2
What are the main components of Big Data?
(A) MapReduce
(B) HDFS
(C) YARN
(D) All of these
Answer
D
MCQ No - 3
What are the different features of Big Data Analytics?
(A) Open-Source
(B) Scalability
(C) Data Recovery
(D) All the above
Answer
D
MCQ No - 4
According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like
Hadoop?
(A) Big data management and data mining
(B) Data warehousing and business intelligence
(C) Management of Hadoop clusters
(D) Collecting and storing unstructured data
Answer
A
MCQ No - 5
What are the four V’s of Big Data?
(A) Volume
(B) Velocity
(C) Variety
(D) All the above
Answer
D
(B) Real-time
(C) Java-based
Answer
B
MCQ No - 7
(B) Drill
(C) Oozie
Answer
A
MCQ No - 8
Answer
C
MCQ No - 9
Answer
B
MCQ No - 10
(B) Both data and cost effective ways to mine data to make business sense out of it
Answer
B
The new source of big data that will trigger a Big Data revolution in the
years to come is
(A) Business transactions
(D) RDBMS
Answer
C
MCQ No - 12
(B) Row
(C) Event
(D) Record
Answer
C
MCQ No - 13
Listed below are the three steps that are followed to deploy a Big Data
Solution except
(A) Data Ingestion
Answer
C
MCQ No - 14
Check below the best answer to "which industries employ the use of so-
called "Big Data" in their day to day operations?
(A) Weather forecasting
(B) Marketing
(C) Healthcare
Answer
D
MCQ No - 15
(B) False
Answer
A
MCQ No - 16
Answer
A
MCQ No - 17
(B) 1970
(C) 1998
(D) 2005
Answer
C
MCQ No - 18
(B) Unstructured
(C) Processed
(D) Semi-Structured
Answer
C
MCQ No - 19
(B) Ad targeting
(C) Scheduling optimization
Answer
D
MCQ No - 20
The feature of big data that refers to the quality of the stored data is
______
(A) Variety
(B) Volume
(C) Variability
(D) Veracity
Answer
D
ZEAL EDUCATION SOCIETY’S
ZEAL COLLEGE OF ENGINEERING AND RESEARCH
NARHE │PUNE -41 │ INDIA
DEPARTMENT OF COMPUTER ENGINEERING
UNIT-1
1) What is Big Data?
a) Huge amount of data
b) Small amount of data
c) Huge File
d) Big Storage
Ans: a
Explanation: It is Huge amount of data
2) According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a
Explanation: Big data management and data mining
3) What are the main components of Big Data?
a)MapReduce
b)HDFS
c)YARN
d)All of these
Ans: d
Explanation: All of these
4) The sources of Big Data are
a)Stock Exchange
b)Transport Data
c) Banking Data
d) All of the Above
Ans: d
Explanation:
5) Big Data Characteristics are:
a) Structured data
b) Semi-structured data
c) Quasi-structured data
d) All of the above
Ans: d
Explanation:
6) BI tends to provide reports, dashboards, and queries on business
Ans: a
Explanation:
12) Select from option which is not the phase of data analytics
a) model planning
b) testing
c) discovery
d) operationalize
Ans: b
Explanation:
13) Which phase of data analytics require more time to complete
a) Data preparation
b) model building
c) communicate results
d) Discovery
Ans: a
Explanation:
14) What is analytic sandbox?
a) Tool
b) Separate repository
c) data cleaning
d) Data conditioning
Ans: b
Explanation:
15) The person which provides analytic techniques and modeling is called as.
a) Data Engineer
b) Data scientist
c) Business user
d) Project manager
Ans: b
Explanation:
16) What is task of Project manager?
a) analytic modelling
b) Provide requirement
c) ensure meeting objectives
d) creates DB environment
Ans: c
Explanation:
17) Identifying Key Stakeholders this task is performed in which phase?
a) Data preparation
b) model building
c) Discovery
d) communicate results
Ans: c
Explanation:
18) ETL process is performed in which phase
a) Discovery
b) communicate results
c) model planning
d) Data preparation
Ans: d
Explanation:
19) How much data Data science teams prefer for analysis?
a) too little
b) average
c) more
d) more than average
Ans: c
Explanation:
20) select from option tool which is not used in model planning phase
a) Data wrangler
b) R
c) SQL Analysis service
d) SAS/ACCESS
Ans: c
Explanation:
21) if reports and dashboards will be impacted and need to change this task is
performed by.
a) Project sponsor
b) BI Analyst
c) Data Engineer
d) Project manager
Ans: b
Explanation:
22) What is need of data analytic lifecycle.
a) Data cleaning
b) To solve Big data problems
c) Data conditioning
d) Data Exploration
Ans: b
Explanation:
23) How many phases are there in data analytic lifecycle?
a) 4
b) 5
c) 6
d) 7
Ans: c
24) The person with technical skills is called as?
a) Business user
b) Data Engineer
c) Data scientist
d) Project sponsor
Ans: b
25) What is outcome of Model building phase?
a) Analytic results
b) Quality data
c) Data
d) Potential resources
Ans: a
Pravin S.Patil
Subject Teacher
UNIT-II
1) A statement made about a population for testing purpose is called?
a) Statistic
b) Hypothesis
c) Level of Significance
d) Test-Statistic
Ans: b
Explanation:
2) If the assumed hypothesis is tested for rejection considering it to be true is
called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: a
Explanation:
3) A statement whose validity is tested on the basis of a sample is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: b
Explanation:
4) A hypothesis which defines the population distribution is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: c
Explanation:
5) If the null hypothesis is false then which of the following is accepted?
a) Null Hypothesis
b) Positive Hypothesis
c) Negative Hypothesis
d) Alternative Hypothesis.
Ans: d
Explanation:
b) β
c) α
d) 1-β
Ans: c
Explanation:
13) Alternative Hypothesis is also called as?
a) Composite hypothesis
b) Research Hypothesis
c) Simple Hypothesis
d) Null Hypothesis
Ans: b
Explanation:
14) Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: d
Explanation:
15) Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
Ans: c
Explanation:
16) Hierarchical clustering should be primarily used for exploration.
a) True
b) False
Ans: a
Explanation:
17) Which of the following function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
Ans: a
Explanation:
18) Which of the following clustering requires merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
Ans: b
Explanation:
a) True
b) False
Ans: a
20) Depending on acceptance and rejection of null hypothesis there are 2 types of
error produced
a) Type 1
b) Type 2
c) None of these
d) All of these
Ans: d
21) The power of a test can be defined as a possibility of …
a) Image Processing
b) Medical
c) Customer Segmentation
d) All of the above
Ans: d
Pravin S.Patil
Subject Teacher
Unit-I
Ans : C
Explanation: Data of petabyte size, i.e. 10^15 bytes, is called Big Data.
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are
Volume, Velocity, Variety, Veracity, Value
Ans : A
Explanation: Data which can be saved in tables are structured data like the transaction data of the
bank.
Ans : B
Explanation: BigData could be found in three forms: Structured, Unstructured and Semi-structured.
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
7. What percentage of the world’s total data has been created just within the past two years?
A. 80%
B. 85%
C. 90%
D. 95%
Ans : C
Explanation: 90% of the world’s total data has been created just within the past two years.
8) Which of the following step is performed by data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Ans: Data Cleansing
10. Communicative and collaborative is one among the key skill sets and behavioral characteristics of a
data scientist [True / False]?
a. True
b. False
Answer : a
11. ---------- are the sources of Bigdata [select all that apply]
I. Book
II. Facebook
III. Genome sequence
IV. Video Surveillance
Ans:
12. BI analyses the past data and make future predictions True/False ?
a. True
b. False
Answer : b
Ans: Phase 2 Data preparation is done in this phase. An analytical sandbox is used in this to perform
analytics for the entire duration of the project. While you explore, preprocess and condition data,
modeling follows suit. To get the data into the sandbox, you will perform ETLT (extract, transform, load
and transform).
A. Discovery
B. Model Planning
C. Model Building
D. Data Preparation
Phase 2 — Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team
can work with data and perform analytics for the duration of the project. The team needs to execute
extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox.
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
14. In which phase would the team expect to invest most of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
15. In which phase would the team expect to invest least time of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
16. from following tools which tool is used for Model building?
Ans B
17. from following tools which tool is used for Data preparation
a. Alpine Miner b. Excel c. Matlab d.Weka
Ans . A
18. To determine if the project was completed on time and within budget, is the key role of _____
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
A. 3
B. 6
C. 7
D. Any
20. In data Analytics life cycle we can move back and refine the work done. True or False
A. True
B. False
A. PPT
B.report
C. code
D. All of above
22. ________ provides subject matter expertise for analytical techniques, data modeling and applying
valid analytical techniques to give business problems.
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
Unit-II
(a) Hypothesis
(d) Test-statistic
Answer : a
2. Any hypothesis which is tested for the purpose of rejection under the assumption that it is true is
called:
Answer : a
3. A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is
false is called:
Answer : d
Answer : b
6. If the critical region is located equally in both sides of the sampling distribution of test-statistic, the
test is called:
Answer : b
Answer : b
Answer : a
10. A formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic
Answer : a
Answer : a
(a) Size of α
(b) Size of β
(c) Test-statistic
Answer : a
13. Student’s t-test is applicable only when:
Answer : a
14. In an unpaired samples t-test with sample sizes n1= 11 and n2= 11, the value of tabulated t should be
obtained for:
Answer : d
(a) To collect sample data and use them to formulate hypotheses about a population
(b) To draw conclusion about populations and then collect sample data to support the conclusions
(c) To draw conclusions about populations from sample data
Answer : c
16. The histogram to the right represents the hospital length of stay (in days) for patients at a nearby
medical facility. How many patients are included in the histogram?
a. 5
b. 21
c. 17
d. 9
Answer : b
17. Using the histogram to the right that represents the hospital lengths of stay (in days) for patients at a
nearby medical facility, determine the relationship between the mean and the median.
a. Mean = Median
b. Mean ≈ Median
Answer : d
18. The statement “If there is sufficient evidence to reject a null hypothesis at the 10%
significance level, then there is sufficient evidence to reject it at the 5% significance level” :
a. Always True
b. Never True
c. Sometimes True; the p-value for the statistical test needs to be provided for a conclusion
d. Not Enough Information; this would depend on the type of statistical test used
Answer : c
a) ANOV
b) AVA
c) ANOVA
d) ANVA
Ans:c
20) Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: defined distance metric, number of clusters, initial guess as to cluster centroids
25) Considering the K-means algorithm, after current iteration, we have 3 centroids (0, 1) (2, 1),
(-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?
a) Yes
b) No
Ans: Yes
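Worked note for Q25: a small sketch of the K-means assignment step, assuming Euclidean distance. Each point goes to its nearest centroid; the helper name is illustrative.

```python
import math

# Centroids after the current iteration, as given in Q25.
centroids = [(0, 1), (2, 1), (-1, 2)]


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


cluster_a = nearest_centroid((2, 3), centroids)
cluster_b = nearest_centroid((2, 0.5), centroids)
print(cluster_a, cluster_b, cluster_a == cluster_b)
```

Both points are closest to the centroid (2, 1) (distances 2 and 0.5), so they land in the same cluster.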
27) The most commonly used measure of similarity is the _____ or its square.
a)euclidean distance
b)city-block distance
c)Chebyshev’s distance
d)Manhattan distance
Ans: euclidean distance
30) Clustering is a-
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. None
Ans: Unsupervised learning
31) Which of the following clustering algorithms suffers from the problem of convergence at local
optima?
A. K- Means clustering
B. Hierarchical clustering
C. Diverse clustering
D. All of the above
Ans: K- Means clustering
33) Which of the following is a bad characteristic of a dataset for clustering analysis-
A. Data points with outliers
B. Data points with different densities
C. Data points with non-convex shapes
D. All of the above
Ans: Data points with outliers, Data points with different densities, Data points with non-convex
shapes
34) For clustering, we do not require-
A. Labeled data
B. Unlabeled data
C. Numerical data
D. Categorical data
Ans: Labeled Data
a. Parametric
b. non parametric
c. Distributed
d. Normal
38. Input data for Wilcoxon test is normally distributed, True or False?
d. None of these
40 Which of the following test statistics is used in the Wilcoxon Rank Sum Test?
d. none of these.
40. What must you include when applying Wilcoxon Rank sum test?
a. variance
b. Critical Value
c. Rank sum
d. standard deviation
a. False Positive
b. false negative
c. True Positive
d. True negative
a. False Positive
b. False negative
c. True Positive
d. True negative
ANOVA
a. Means
b. variance
c. standard Deviation
d. None of above.
b. F ratio
c. T-score
d. Chi Square
Q.25 What are the two types of variance which can occur in your data?
Q.26 If the between-group mean sum of squares increases, the value of the F statistic _____
a. Increases
b. Decreases
c. Neutral
d. None of these
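Worked note for Q.26: a sketch of the one-way ANOVA F ratio, F = (between-group mean square) / (within-group mean square). The two small data sets are invented for illustration; they share the same within-group spread but differ in how far apart the group means are.

```python
def one_way_f_statistic(groups):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)       # between-group mean square
    ms_within = ss_within / (n_total - k)   # within-group mean square
    return ms_between / ms_within


# Same within-group spread, group means close together vs. far apart.
f_close = one_way_f_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
f_far = one_way_f_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_close, f_far)
```

With the within-group mean square held fixed, a larger between-group mean square gives a larger F.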
Q.27 What must you include when applying ANOVA test?
a. Means
b. Critical Value
c. degree of freedom
d. F statistics
e. All of above
a.1
b.3
c.2
d.any
d.None of these
b.ANCOVA
c.MANOVA
d.ZANOVA
Unit-III
(A)Itemset
(B)Support
(C)Confidence
(D)Support Count
Ans:A
(A)Support
(B)Confidence
(C)Support Count
(D)Rules
Ans:C
3.An itemset whose support is greater than or equal to a minimum support threshold is ______
(A)Itemset
(B)Frequent Itemset
(C)Infrequent items
(D)Threshold values
Ans:B
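Worked note for Q3: a minimal sketch of the frequent-itemset test. The transactions and the 0.5 threshold are hypothetical, chosen only to show the comparison "support >= minimum support".

```python
# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support meets the minimum support threshold."""
    return support(itemset, transactions) >= min_support


print(support({"bread", "milk"}, transactions))                    # 0.5
print(is_frequent({"bread", "milk"}, transactions, 0.5))           # True
print(is_frequent({"bread", "milk", "butter"}, transactions, 0.5)) # False
```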
(A)It mines all frequent patterns through pruning rules with lesser support
(B)It mines all frequent patterns through pruning rules with higher support
Ans:C
(B)Transaction Increases
(C)Sampling
(D)Cleaning
Ans:A
A) TRUE
B) FALSE
Ans:A
A) TRUE
B) FALSE
Ans:A
8.Which of the following methods do we use to find the best fit line for data in Linear
Regression?
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans:A
9. A local retailer has a database that stores 10,000 transactions of last summer. After
analyzing the data, a data science team has identified the following statistics:
• {battery} appears in 6,000 transactions.
• {sunscreen} appears in 5,000 transactions.
• {sandals} appears in 4,000 transactions.
• {bowls} appears in 2,000 transactions.
• {battery, sunscreen} appears in 1,500 transactions.
• {battery, sandals} appears in 1,000 transactions.
• {battery, bowls} appears in 250 transactions.
• {battery, sunscreen, sandals} appears in 600 transactions.
Q) What are the confidence values of {battery}->{sunscreen} and {battery, sunscreen}->{sandals}?
a) 0.3 and 0.4
b) 0.25 and 0.4
c) 0.25 and 0.15
d) 0.6 and 0.4
Ans: b
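Worked note for Q9: confidence(X -> Y) is the support count of X together with Y divided by the support count of X. The sketch below uses only the counts stated in the question; the dictionary layout and function name are illustrative.

```python
# Support counts taken from the question statement.
support_count = {
    frozenset({"battery"}): 6000,
    frozenset({"battery", "sunscreen"}): 1500,
    frozenset({"battery", "sunscreen", "sandals"}): 600,
}


def confidence(antecedent, consequent):
    """confidence(X -> Y) = count(X and Y together) / count(X)."""
    x = frozenset(antecedent)
    return support_count[x | frozenset(consequent)] / support_count[x]


conf_1 = confidence({"battery"}, {"sunscreen"})
conf_2 = confidence({"battery", "sunscreen"}, {"sandals"})
print(conf_1, conf_2)  # 0.25 0.4
```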
a) Cor(X, Y) = 1
b) Cor(X, Y) = 0
c) Cor(X, Y) = 2
Ans:b
11. If a linear regression model fits perfectly, i.e., train error is zero, then
_____________________
Ans:C
12.Which of the following metrics can be used for evaluating regression models?
i) R Squared
iii) F Statistics
b) i and ii
Ans:d
13.How many coefficients do you need to estimate in a simple linear regression model (One
independent variable)?
a) 1
b) 2
c) 3
d) 4
Ans:b
14.In a simple linear regression model (One independent variable), If we change the input
variable by 1 unit. How much output variable will change?
a) by 1
b) no change
c) by intercept
d) by its slope
Ans:d
a) lm(formula, data)
b) lr(formula, data)
c) lrm(formula, data)
d) regression.linear(formula, data)
Ans:a
16.In syntax of linear model lm(formula,data,..), data refers to ______
a) Matrix
b) Vector
c) Array
d) List
Ans:b
17.In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to
__________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Ans:c
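Worked note for Q13, Q14 and Q17: a sketch of an ordinary least-squares fit for Y = β1 + β2X, written in plain Python rather than R's lm(). The data points are invented so that the true line is y = 1 + 2x; the function name is illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
beta1, beta2 = fit_simple_linear_regression(xs, ys)  # two coefficients to estimate
print(beta1, beta2)  # 1.0 2.0

# Raising x by one unit changes the prediction by exactly the slope.
change = (beta1 + beta2 * 6) - (beta1 + beta2 * 5)
print(change)  # 2.0
```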
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
19.The square of the correlation coefficient, r^2, is always non-negative and is called the
________
a) Regression
b) Coefficient of determination
c) KNN
d) Algorithm
Ans:b
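Worked note for Q19: a sketch that computes the Pearson correlation r by hand and squares it. The data are made up; the second series is the first one reversed, so r changes sign while r^2 stays the same.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys_up = [2, 1, 4, 3, 5]     # hypothetical, positively related to xs
ys_down = ys_up[::-1]       # same values reversed, negatively related

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
print(r_up, r_up ** 2)      # r is about 0.8, r^2 about 0.64
print(r_down, r_down ** 2)  # r is about -0.8, r^2 about 0.64
```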
20.Predicting y for a value of x that’s outside the range of values we actually saw for x in the
original data is called ___________
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:b
21.What is predicting y for a value of x that is within the interval of points that we saw in the
original data called?
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:c
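Worked note for Q20 and Q21: a tiny sketch that labels a prediction by whether the new x falls inside the observed range. The observed values and the function name are hypothetical.

```python
def prediction_kind(x_new, observed_xs):
    """Label a prediction by whether x_new lies inside the range of observed x values."""
    if min(observed_xs) <= x_new <= max(observed_xs):
        return "interpolation"
    return "extrapolation"


observed_xs = [10, 12, 15, 20]           # hypothetical x values seen in the data
print(prediction_kind(13, observed_xs))  # interpolation
print(prediction_kind(25, observed_xs))  # extrapolation
```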
22. ________ is a simple approach to supervised learning. It assumes that the dependence of Y
on X1, X2, . . . Xp is linear.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
23.Although it may seem overly simplistic, _______ is extremely useful both conceptually and
practically.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
24. __________ refers to a group of techniques for fitting and studying the straight-line
relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
Ans: c
Data Processing and Analysis
Unit 4
1. What is a hypothesis?
Answer: a
a. True
b. False
Answer: a
a. Concurring
b. Coding
c. Colouring
d. Segmenting
Answer: b
4. What is the cyclical process of collecting and analysing data
during a single research study called?
a. Interim analysis
b. Inter analysis
c. Inter-item analysis
d. Constant analysis
Answer: a
a. Typology
b. Diagramming
c. Enumeration
d. Coding
Answer: c
a. Can reduce time required to analyse data (i.e., after the data are
transcribed)
b. Help in storing and organising data
c. Make many procedures available that are rarely done by hand
due to time constraints
d. All of the above
Answer: d
7. Boolean operators are words that are used to create logical
combinations.
a. True
b. False
Answer: a
a. Categories
b. Units
c. Individuals
d. None of the above
Answer: a
a. Segmenting
b. Coding
c. Transcription
d. Mnemoning
Answer: c
Answer: a
11. Hypothesis testing and estimation are both types of descriptive
statistics.
a. True
b. False
Answer: b
a. True
b. False
Answer: a
13. A graph that uses vertical bars to represent data is called a ___
Answer: b
14. ___________ are used when you want to visually examine the
relationship between two quantitative variables.
a. Bar graphs
b. Pie graphs
c. Line graphs
d. Scatterplots
Answer: d
15. The denominator (bottom) of the z-score formula is
Answer: a
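Worked note for Q15: the options are missing above, so this only sketches the standard formula z = (x - mean) / standard deviation, whose denominator is the standard deviation. The data are invented, and the population standard deviation is used here as an assumption.

```python
import statistics

# Hypothetical data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def z_score(x, values):
    """z = (x - mean) / standard deviation (population form assumed here)."""
    return (x - statistics.mean(values)) / statistics.pstdev(values)


z = z_score(9, data)
print(z)  # 2.0
```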
a. Normal Distribution
b. Chi-Squared Distribution
c. Gamma Distribution
d. Poisson Distribution
Answer: b
a. Statistic
b. Hypothesis
c. Level of Significance
d. Test-Statistic
Answer: b
18. A hypothesis that is assumed to be true and then tested for possible rejection is
called?
a. Null Hypothesis
b. Statistical Hypothesis
c. Simple Hypothesis
d. Composite Hypothesis
Answer: a
a. Null Hypothesis
b. Positive Hypothesis
c. Negative Hypothesis
d. Alternative Hypothesis.
Answer: d
a. Composite hypothesis
b. Research Hypothesis
c. Simple Hypothesis
d. Null Hypothesis
Answer: b
Sr. No. | Marks | Question | A | B | C | D | Ans
0 | 1 | A group of 4 bits is also called? | Nibble | Byte | Kb | None | 4 bits make one nibble.
1 | 1 | There are how many types of Big Data: | 3 | 2 | 1 | None | Big Data is of 3 types.
2 | 1 | Which of the following are the V's of Big Data: | All | Volume | Variety | Velocity | This is an explanation.
3 | 1 | Which of these is not a characteristic of Big Data? | Storage | Volume | Variety | Velocity | This is an explanation.
4 | 2 | Which of the following is a drawback of Big Data: | Cost | Significant | Process | Fraud Detection | Big Data requires high cost to maintain a huge amount of data.
5 | 2 | Full form of GINA is: | Global Innovation Network and Analysis | Global Invention in Networks and Analytics | Globally Investment in Neurons and Analytics | None | GINA stands for Global Innovation Network and Analysis.
6 | 2 | Which is phase 3 in the Data Analytics Life Cycle? | Model Planning | Model Building | Data Preparation | Operationalize | Model Planning is the 3rd phase in the life cycle.
7 | 2 | The GINA team thought to accomplish mainly ____ goals: | 3 | 2 | 1 | 5 | GINA targeted three goals for the project.
8 | 2 | The Data Preparation stage doesn’t involve: | Analyzation | Collection | Cleansing | Processing | This is an explanation.
9 | 2 | Unstructured Data is further divided into how many types? | 2 | 3 | 4 | 5 | Unstructured data is divided into 2 types.
10 | 2 | The GINA team mainly used which software tool to analyze the data? | Tableau | Hadoop | HIVE | SQL | The team used Tableau to visualize the data.
11 | 2 | Which of the following is the first step of the Data Analytics Life Cycle: | Discovery | Data Preparation | Model Planning | Data Aware | This is an explanation.
12 | 2 | There are how many phases in the data analytics life cycle: | 6 | 5 | 4 | 7 | There are 6 stages in the data analytics life cycle.
13 | 2 | SEMMA Methodology has how many stages: | 5 | 4 | 6 | 7 | SEMMA methodology has five stages.
14 | 2 | Which phase of the Life Cycle requires collaboration with stakeholders? | Phase 5 | Phase 6 | Phase 4 | Phase 3 | Phase 5 involves collaboration with stakeholders.
15 | 2 | In Building a Model, how many phases are required: | 2 | 3 | 4 | 5 | This is an explanation.
16 | 2 | How much data in the whole world is structured: | 20% | 40% | 60% | 50% | Only 20% of the world's total data is structured.
17 | 2 | 10^21 bytes of memory is equal to: | 1ZB | 1TB | 1YB | 1XB | 10^21 B is equal to 1 ZB.
18 | 2 | Data Scientists in the GINA team used which technique on the textual description of the Innovation Roadmap Idea? | Natural Language Processing (NLP) | Hadoop | HIVE | SQL | The NLP technique was used on the description of the Innovation Roadmap Idea.
19 | 2 | How many types of data analytics methodologies are there? | 2 | 4 | 3 | 6 | There are two types of data analytics methodologies: EDA and CDA.
20 | 3 | Other name for Bell Curve is: | Normal Distribution | Poisson Distribution | Binomial Distribution | Bernoulli Distribution | Bell Curve is also known as normal distribution.
21 | 3 | One of the most important tasks in big data analytics is: | Statistical Modeling | Testing of Data | Visualization | Operationalize | One of the most important tasks in big data analytics is statistical modeling.
22 | 3 | Some of the approaches considered for building the data analytics lifecycle framework best practices are: | All | CRISP-DM | SEMMA | MAD Skills | This is an explanation.
23 | 3 | In Phase 4, the team develops datasets for: | All | Testing of Data | Training of Data | Production purposes | This is an explanation.
24 | 3 | Full form of CRISP-DM Methodology is: | Cross Industry Standard Process for Data Mining | Common Industry Standard Program for Data Mining | Cross International Standard Process for Data Modeling | Company's Initial Standards Progress for Data Methods | CRISP-DM stands for Cross Industry Standard Process for Data Mining.
25 | 3 | SEMMA Methodology doesn’t include which of the following stages: | Evaluate | Sample | Explore | Assess | This is an explanation.
26 | 3 | In which stage is the data monitored and analyzed to see if the generated model is creating the expected results? | Operationalize | Collection | Plan Model | Data Aware | In the last phase, i.e. Operationalize, data is monitored and analyzed to see if the generated model is creating the expected results.
27 | 3 | Data is captured in how many ways: | 3 | 4 | 5 | 6 | Data is captured in 3 main ways.
28 | 3 | In phase 2 of the Data Analytics Life Cycle, the team performs how many analytics to get the data in the sandbox? | 3 | 2 | 4 | 6 | The team performs ETL, ELT and ETLT in the 2nd phase of the cycle.
29 | 3 | The total area under the bell curve is ____ unit. | 1 | 2 | 3 | 4 | Area under the bell curve is 1 unit.
30 | 1 | Wilcoxon rank-sum test is also known as? | Mann-Whitney U test | Mean Difference | Alternative Hypothesis | Null Hypothesis | Wilcoxon rank-sum test is also called the Mann-Whitney U test.
31 | 1 | Which test is also known as T-test? | Hypothesis Test | Mean Difference | K-means test | None | This is an explanation.
32 | 1 | This equation is of which test? | Mean Difference | K-Means | Null Hypothesis | Alternative Hypothesis | This equation is of the Mean Difference test.
33 | 1 | A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called ___________. | One-tailed test | Two-tailed test | Tailed test | Null test | A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test.
34 | 1 | How many types of Statistical Hypothesis are there? | 2 | 3 | 4 | 6 | There are two types of Statistical Hypothesis.
35 | 1 | Analysis of Variance is also referred to as? | ANOVA | Mean Difference | Alternative Hypothesis | Null Hypothesis | ANOVA stands for Analysis of Variance.
36 | 1 | How many steps are involved in Hypothesis Testing? | 4 | 2 | 3 | 5 | There are 4 steps in Hypothesis testing.
37 | 2 | The strength of evidence in support of a null hypothesis is measured by? | P-value | K-value | H-value | Null-value | The strength of evidence in support of a null hypothesis is measured by the P-value.
38 | 2 | Difference in means is also called? | Two sample t-test | T-test | M-test | Two sample test | Difference in means is also known as the two sample t-test.
39 | 2 | The k-medoids is also called _______________ algorithm. | Partitioning Around Medoids (PAM) | Lloyd's Algorithm | Poisson's Algorithm | Regression | The k-medoids is also called the partitioning around medoids (PAM) algorithm.
40 | 2 | Clustering is an example of ____? | Unsupervised Learning | Supervised Learning | Classification | Regression | Clustering is an example of unsupervised learning.
41 | 2 | Which of the following is not an advantage of K-means Clustering? | Requires a Priori | Fast | Robust | Easy to evaluate | This is an explanation.
42 | 2 | The probability of committing a Type II error is called | Beta | Alpha | Delta | Theta | The probability of committing a Type II error is called Beta.
43 | 2 | The ______ variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster. | Less | More | Variable | Fixed | The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.
44 | 2 | Which hypothesis is usually the hypothesis that sample observations result purely from chance? | Null Hypothesis | Mean Difference | K-means test | Alternative Hypothesis | Null Hypothesis is usually the hypothesis that sample observations result purely from chance.
45 | 2 | "Classical" ANOVA for balanced data does how many things at once? | 3 | 2 | 1 | 4 | "Classical" ANOVA for balanced data does three things at once.
46 | 2 | K-means clustering is used to solve which problems? | NP-hard problems | NP Problems | Hypothesis Problems | P problems | NP-hard problems are solved using K-means clustering.
47 | 2 | The probability of committing a Type I error is called? | Alpha | Beta | Gamma | Delta | The probability of committing a Type I error is called alpha.
48 | 2 | K-means Clustering is also known as? | Lloyd's Algorithm | Gaussian Algorithm | Poisson's Algorithm | None | K-means clustering is also called Lloyd's algorithm.
49 | 3 | Which algorithm requires the user to specify the number of clusters k to be generated? | K-means clustering | Gaussian Algorithm | Alternative Hypothesis | Null Hypothesis | K-means clustering requires the user to specify the number of clusters k to be generated.
50 | 3 | K-means clustering uses which approach to solve the problems? | Expectation-maximization | Greedy Approach | Divide and Conquer | None | The expectation-maximization technique is used by K-means clustering.
51 | 3 | How many factors affect the power of a hypothesis test? | 3 | 2 | 1 | 4 | The power of a hypothesis test is affected by three factors.
52 | 3 | Law of Variance is called? | Eve's Law | Laplace Law | Poisson's Algorithm | Regression | Law of variance is also called Eve's law.
53 | 3 | K-Medoids use which approach to solve problems? | Greedy Approach | Divide and Conquer | Recursive | None | K-Medoids use the greedy approach to solve problems.
54 | 3 | The time complexity of K-means clustering is? | O(n^2) | O(nlogn) | O(n) | O(1) | Time complexity of K-means clustering is O(n^2).
55 | 3 | The number (k) of clusters assumed in k-medoids is known as? | Priori | Null Hypothesis | ANOVA | Effect size | The number k of clusters assumed is known as priori.
56 | 3 | What is the difference between the true value and the value specified in the null hypothesis? | Effect size | Null Hypothesis | Alternative Hypothesis | ANOVA | The effect size is the difference between the true value and the value specified in the null hypothesis.
57 | 3 | Time complexity of k-medoids is? | O(n^2) | O(nlogn) | O(n) | O(n^3) | This is an explanation.
58 | 3 | Which algorithm aims at minimizing an objective function known as the squared error function? | K-means | Mean Difference | Alternative Hypothesis | ANOVA | The K-means algorithm aims at minimizing an objective function known as the squared error function.
59 | 1 | Which algorithm was the earliest of the association rule algorithms? | Apriori Algorithm | Gaussian Algorithm | K-means clustering | Bernoulli Distribution | The Apriori Algorithm was the earliest of the association rule algorithms.
60 | 1 | The Apriori algorithm takes a ______ iterative approach to uncovering the frequent itemsets by first determining all the possible items | Bottom-Up | Top-Down | Recursive | None | The Apriori algorithm takes a bottom-up iterative approach to uncovering the frequent itemsets by first determining all the possible items.
61 | 1 | Apriori uses which structure to count candidate item sets efficiently? | BFS | DFS | Queue | Stack | Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently.
62 | 1 | "y=a+b*x^2". This equation shows which regression? | Polynomial Regression | Logistic Regression | Linear Regression | Lasso Regression | This is an explanation.
63 | 2 | __________ is defined as the measure of certainty or trustworthiness associated with each discovered rule. | Confidence | Recursion | Item-set | None | Confidence is defined as the measure of certainty or trustworthiness associated with each discovered rule.
64 | 2 | In which Regression do we predict the value as 1 or 0? | Logistic Regression | Linear Regression | Both | None | In Logistic Regression, we predict the value as 1 or 0.
65 | 2 | The formula for linear regression is: | Y’ = bX + A | Y’ = bX - A | Y’ = bX / A | Y’ = bX * A | The formula for linear regression is: Y’ = bX + A.
66 | 2 | Which regression is useful when there are a large number of independent variables? | Partial Least Squares (PLS) Regression | Cox Regression | Lasso Regression | Logistic Regression | PLS regression is also useful when there are a large number of independent variables.
67 | 2 | Which regression is an approach for predicting a response using a single feature? | Linear Regression | Logistic Regression | ElasticNet Regression | None | Simple linear regression is an approach for predicting a response using a single feature.
68 | 2 | Association rule mining consists of _______ steps. | 2 | 3 | 4 | 5 | Association rule mining consists of 2 steps.
69 | 2 | Which type of regression is suitable when the dependent variable is ordinal in nature? | Ordinal Regression | Linear Regression | Cox Regression | Logistic Regression | Ordinal regression is suitable when the dependent variable is ordinal in nature.
70 | 2 | Which regression is used for support vector machines? | ElasticNet Regression | Linear Regression | Logistic Regression | None | ElasticNet regression is used for support vector machines.
71 | 2 | Which regression can solve both linear and non-linear models? | Support Vector Regression | Linear Regression | Logistic Regression | ElasticNet Regression | Support Vector Regression can solve both linear and non-linear models.
72 | 2 | Which is the most common method used for fitting a regression line? | Least Square Method | Mean Difference | Null Hypothesis | Classification | Least Square Method is the most common method used for fitting a regression line.
73 | 2 | _______ problems are when the output variable is a real or continuous value. | Regression | Classification | Recursive | Hypothesis | A regression problem is when the output variable is a real or continuous value.
74 | 2 | Linear Regression is a machine learning algorithm based on ______ learning regression model. | Supervised Learning | Unsupervised Learning | Recursive Learning | All | Linear Regression is a machine learning algorithm based on supervised learning.
75 | 2 | When the dependent variable's variability is not equal across values of an independent variable, it is called | Heteroscedasticity | Homoscedasticity | Multicollinearity | Outliers | When the dependent variable's variability is not equal across values of an independent variable, it is called heteroscedasticity.
76 | 2 | _________ requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares | Logistic Regression | Linear Regression | Lasso Regression | ElasticNet Regression | Logistic Regression requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.
77 | 2 | PCR Regression is divided into how many steps? | 2 | 3 | 4 | 5 | PCR regression is divided into 2 steps.
78 | 3 | L2 regularization is also called? | Tikhonov Regularization | Norm Regularization | Poisson's Regularization | None | This is an explanation.
79 | 3 | When the variance of count data is greater than the mean count, it is a case of? | Overdispersion | Underdispersion | Dispersion | High dispersion | When the variance of count data is greater than the mean count, it is a case of overdispersion.
80 | 3 | Which regression assumes the normal distribution of the dependent variable? | Linear Regression | Logistic Regression | ElasticNet Regression | None | Linear regression assumes the normal or Gaussian distribution of the dependent variable.
81 | 3 | Nature of predicted data in regression is? | Ordered | Unordered | Both | None | Nature of predicted data in regression is ordered.
82 | 3 | Which regression uses a binary dependent variable but ignores the timing of events? | Logistic Regression | Linear Regression | Cox Regression | Lasso Regression | Logistic regression uses a binary dependent variable but ignores the timing of events.
83 | 3 | The Ridge Regression is also known as? | Shrinkage Regression | Percentile Regression | ElasticNet Regression | Lasso Regression | The ridge regression is also known as Shrinkage Regression.
84 | 3 | In which regression do we calculate Root Mean Square Error (RMSE) to predict the next weight value? | Linear Regression | ElasticNet Regression | Logistic Regression | All | In Linear Regression we calculate Root Mean Square Error (RMSE) to predict the next weight value.
85 | 3 | The ______ is the standard deviation of the observed residuals. | Residual standard error | Mean Difference Error | Data Error | All | The residual standard error is the standard deviation of the observed residuals.
86 | 3 | Which Regression is used when the dependent variable has count data? | Poisson Regression | Linear Regression | Cox Regression | Lasso Regression | Poisson regression is used when the dependent variable has count data.
87 | 3 | ________________ regression can handle both over-dispersion and under-dispersion. | Quasi-Poisson regression | Cox Regression | ElasticNet Regression | Linear Regression | Quasi-Poisson regression can handle both over-dispersion and under-dispersion.
88 | 3 | ___ is the regularization parameter in Lasso Regression? | λ | θ | Ω | β | λ is the regularization parameter in lasso regression.
89 | 1 | Decision Tree is a hierarchical model that does the separation of the input space into class regions using: | Recursion | Pointers | Greedy Approach | Divide and Conquer | Decision Tree is a hierarchical model that recursively does the separation of the input space into class regions.
90 | 1 | Learning Algorithm of Decision Tree is: | Greedy Approach | Divide and Conquer | Both | None | Decision Tree uses a greedy approach for its learning algorithm.
91 | 1 | Normal Distribution is also called? | Gaussian Distribution | Bernoulli Distribution | Naïve Bayes | Binary Distribution | This is an explanation.
92 | 1 | Classification has how many phases: | 2 | 3 | 4 | 5 | There are 2 phases of classification.
93 | 1 | "Every pair of features being classified is independent of each other". This principle is used by: | Naïve Bayes Classifier | Decision Tree | Bernoulli Distribution | Normal Distribution | Naïve Bayes uses the principle that every pair of features being classified is independent of each other.
94 | 2 | This equation is of which theorem? | Gaussian Distribution | Binary Distribution | Naïve Bayes | Cross-Entropy | This is an explanation.
95 | 2 | In Naïve Bayes, the datasets are divided into how many types? | 2 | 3 | 4 | 5 | Data sets are divided into two types in Naïve Bayes.
96 | 2 | Decision trees that can be used to predict non-categorical values are called? | Regression Trees | Categorical trees | Normal tree | None | Decision trees that can be used to predict non-categorical values are called regression trees.
97 | 2 | An attribute with ____ Gini index should be preferred in a decision tree. | Lower | Higher | Recursive | Negative | An attribute with a lower Gini index should be preferred.
98 | 2 | In Naïve Bayes, if any two events A and B are independent, then, | P(A,B)=P(A)P(B) | P(A,B)=P(A)/P(B) | P(A,B)=P(B) | P(A,B)=P(B)/P(A) | If any two events A and B are independent, then P(A,B)=P(A)P(B).
99 | 2 | What is the measure of uncertainty of a random variable in a decision tree? | Entropy | Gain | Gini Index | None | Entropy is the measure of uncertainty of a random variable.
100 | 2 | Which of the following is not true for decision trees? | Stable | Easy to understand | Easy to explain | Easy to evaluate | This is an explanation.
101 | 2 | Decision tree algorithm falls under the category of which learning? | Supervised | Unsupervised | Regression | Classification | Decision tree algorithm falls under the category of supervised learning.
102 | 2 | False Positives and False Negatives is an application of which theorem? | Bayes' Theorem | Binary Distribution | Bernoulli Distribution | Normal Distribution | One of the uses of Bayes' Theorem is false positives and false negatives.
103 | 2 | Decision Trees used in mining the data are of how many types? | 2 | 3 | 4 | 5 | There are 2 types of decision trees used in data mining.
104 | 3 | In Bayes' Theorem, P(A) and P(B) are the probabilities of observing A and B respectively; they are known as: | Marginal Probability | Normal Distribution | Bernoulli Distribution | Parallel Algorithm | P(A) and P(B) are the probabilities of observing A and B respectively; they are known as the marginal probability.
105 | 3 | ID3 Algorithm in a decision tree stands for? | Iterative Dichotomiser 3 (ID3) | Interval Driven | Interconnected Decision | None | ID3 stands for Iterative Dichotomiser 3 (ID3).
106 | 3 | Probably the best way of estimating performance for very small data sets is: | Bootstrapped Method | Normal Distribution | Naïve Bayes | Binary Distribution | Probably the best way of estimating performance for very small data sets is the bootstrapped method.
107 | 3 | The Decision Tree works on which form? | Disjunctive Normal Form | Product of Sum | Bijective Form | Conjunctive Form | Decision Tree works on Disjunctive Normal Form.
108 | 3 | The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a ________ distribution. | 1-D | 2-D | 3-D | None | The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution.
109 | 3 | Theoretical concept to evaluate Classifiers is: | COLT | PAC Model | Naïve Bayes | Prediction | This is an explanation.
110 | 3 | ____________ is a metric to measure how often a randomly chosen element would be incorrectly identified | Gini Index | Entropy | Pointer | Cross-Entropy | Gini Index is a metric to measure how often a randomly chosen element would be incorrectly identified.
111 | 3 | The most notable types of decision tree algorithms are: | 3 | 2 | 1 | 4 | The most notable types of decision tree algorithms are 3.
112 | 3 | Which process is completed when the subset at a node all has the same value of the target variable? | Recursive Partitioning | Termination | Transformation | Prediction | The recursive partition is completed when the subset at a node all has the same value of the target variable.
113 | 3 | The _______ method reserves a certain amount for testing and uses the remainder for training. | Holdout | Parallel Algorithm | Naïve Bayes | Normal Distribution | The holdout method reserves a certain amount for testing and uses the remainder for training.
114 | 3 | This equation is of which theorem? | Bayes' Theorem | Normal Distribution | Bernoulli Distribution | Cross-Entropy | This is an explanation.
115 | 3 | "Independence among the features". This is an assumption in: | Naïve Bayes Classifier | Bernoulli Distribution | Parallel Algorithm | Binary Distribution | Independence among the features is an assumption in Naïve Bayes.
116 | 3 | Error rate obtained from training data is called: | Resubstitution Error | Grid | Gini Index | True error | Error rate obtained from training data is called resubstitution error.
117 | 3 | In Decision Tree entropy is __________ to content. | proportional | inverse | High | Less | This is an explanation.
118 | 3 | In Decision Tree, no root-to-leaf path should contain the same discrete attribute ____________. | Twice | Once | Thrice | Four Times | No root-to-leaf path should contain the same discrete attribute twice.
119 | 1 | Using _________, designers can make information understandable for stakeholders. | Data Visualization | Classification | Regression | Supervised Learning | Using data visualization methods, designers can make information understandable for stakeholders.
120 | 1 | The additional visual methods include: | All | Tree Map | Parallel Coordinates | Semantic Networks | This is an explanation.
121 | 1 | Data Visualization tools don’t include: | MS Excel | Tableau | Power BI | Jupyter | This is an explanation.
122 | 1 | Which of the following requires JavaScript knowledge to run the visualization tool? | All | Chart.js | Polymap | Sigmajs | This is an explanation.
123 | 1 | Merits of Tableau don’t include which factor: | Cost | Performance | Usage | Computation | Merits of Tableau don’t include the cost factor.
124 | 1 | Which of these is not a type of Big Data Visualization? | Pictograph | Bar-Graph | Line-Chart | Pie-Chart | This is an explanation.
125 | 2 | The drag-and-drop editor of which tool makes it easy to create professional-looking designs without a lot of visual design skill? | Infogram | Google Chart | Tableau | Grafana | The drag-and-drop editor of Infogram makes it easy to create professional-looking designs without a lot of visual design skill.
126 | 2 | How many V's are defined for Data Visualization? | 4 | 6 | 2 | 3 | There are 4 V's of Data Visualization.
127 | 2 | Which of the following is not a free Data Visualization tool? | Tableau | Google Chart | Jupyter | Hub-Spot CRM | Tableau is a chargeable data visualization tool.
128 | 2 | Companies that work with both traditional and big data use which technique to look at customer segments or market shares? | Pie-Chart | Bar-Graph | Stream graph | Line-Chart | Companies that work with both traditional and big data may use a pie chart to look at customer segments or market shares.
129 | 2 | Visualization of Data includes which of the following problems: | All | Information Loss | Visual Noise | Large Image Perception | This is an explanation.
130 | 2 | Mainly, Data Visualization has how many types of challenges? | 5 | 6 | 4 | 2 | There are 5 main challenges to data visualization.
131 | 2 | Which tool uses HTML5/SVG to visualize data? | Google Charts | Jupyter | Grafana | Tableau | Google Charts uses HTML5/SVG since it is browser compatible.
132 | 2 | According to Colin Ware’s Information Visualization: Perception for Design, he defines _____ pre-attentive visual properties. | 4 | 2 | 1 | 3 | According to Colin Ware’s Information Visualization: Perception for Design, he defines four pre-attentive visual properties.
133 | 2 | _____ is based on space-filling visualization of hierarchical data. | Tree-Map | Stream graph | Bar-graph | Line-Chart | The tree map method is based on space-filling visualization of hierarchical data.
134 | 2 | Which graph shows the dependency relationships between activities and current schedule status? | Gantt-Chart | Line-Chart | Pie-Chart | Bar-Graph | A Gantt chart shows the dependency relationships between activities and current schedule status.
135 | 2 | Another name for distribution free data is: | Non parametric data | Parametric Data | Static data | Dynamic data | Non parametric data is also called distribution free data.
136 | 2 | Which chart is used for comparison of values, such as sales performance for several persons or businesses in a single time? | Bar-Graph | Gantt-Graph | Line-Chart | Pie-Chart | Bar Graph is used for comparison of values, such as sales performance for several persons or businesses in a single time.
137 | 2 | _____________ are graphics in the field of statistics used to visualize quantitative data. | Graphical-Techniques | Line-Chart | Regression | Classification | Graphical Techniques are graphics in the field of statistics used to visualize quantitative data.
138 | 2 | _____ can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion. | Parallel Coordinates | Stream graph | Google Chart | Jupyter | Parallel Coordinates can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion.
139 | 3 | Chart.js provides how many types of charts? | 8 | 5 | 3 | 6 | This is an explanation.
140 | 3 | Which visualization tool supports mixed data sources, annotations, and customizable alert functions, and can be extended via hundreds of available plugins? | Grafana | Tableau | Google Chart | Jupyter | Grafana supports mixed data sources, annotations, and customizable alert functions, and it can be extended via hundreds of available plugins.
Which tool was created Datawrapper was created
141 3 specifically for adding charts Data Wrapper Tableau Google Chart Jupyter specifically for adding charts and
and maps to news stories. maps to news stories.
Conventional Visualization Mekko chart is a new technique
142 3 Mekko Chart Pie-Chart Bar-graph Histogram
methods doesn’t include: to visualize data.
_____________ is a type of a Streamgraph is a type of a
stacked area graph, which is stacked area graph, which is
143 3 displaced around a central axis, Streamgraph Bar-Graph Pie-Chart Line-Chart displaced around a central axis,
resulting in flowing and organic resulting in flowing and organic
shape. shape
Which visual tool includes over
Fusion charts includes over 150
144 3 150 chart types and 1,000 Fusion charts Tableau Google Chart Jupyter
chart types and 1,000 map types
map types?
Which graph/chart is a
A semantic network is a
graphical representation of
graphical representation of
logical relationship between
logical relationship between
different concepts. It generates
145 3 Semantic Networks Bar-Graph Pie-Chart Line-Chart different concepts. It generates
directed graph, the
directed graph, the combination
combination of nodes or
of nodes or vertices, edges or
vertices, edges or arcs, and
arcs, and label over each edge
label over each edge.
According to SAS we can According to SAS we can
process only______ of process only 1 kilobit of
146 3 1 Kilobit 1 Byte 1 Bit 1 MB
information per second on a information per second on a flat
flat screen. screen
There are____ steps for
147 3 4 5 3 6 This is an explaination.
interactive data visualization:
When working with big data, When working with big data,
companies can use which companies can use the line chart
visualization technique to track visualization technique to track
148 3 total application clicks by Line-Chart Bar-Graph Pie-Chart Stream graph total application clicks by weeks,
weeks, the average number of the average number of
complaints to the call center by complaints to the call center by
months, etc.\n\n months, etc.\n\n
Which of the following
149 1 All Facebook Netflix Adobe This is an explaination.
Enterprises use HBase?
Sr. No. | Marks | Question | A | B | C | D | Ans
150 | 1 | Which NLP is used in the present era? | Neural NLP | Symbolic NLP | Statistical NLP | None | From 2010, Neural NLP is being used.
151 | 1 | The Computer World magazine states that unstructured information might account for more than ______ of all data in organizations. | 70-80% | 90% | 50% | 60% | The Computer World magazine states that unstructured information might account for more than 70%–80% of all data in organizations.
152 | 1 | Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely ___________. | Unstructured | Structured | Semantic | None | Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely or partly unstructured.
153 | 1 | Which standard provided a common framework for processing information to extract meaning and create structured data about the information? | Unstructured Information Management Architecture (UIMA) | Management Architecture for Data | Data Architecture | None | The Unstructured Information Management Architecture (UIMA) standard provided a common framework for processing this information to extract meaning and create structured data about the information.
154 | 2 | The base Apache Hadoop framework is composed of how many modules? | 4 | 2 | 3 | 6 | The base Apache Hadoop framework is composed of four modules.
155 | 2 | NoSQL doesn't include which software? | MS-SQL | HBASE | DynamoDB | MongoDB | This is an explanation.
156 | 2 | There are _______ main types of OLAP systems. | 3 | 2 | 5 | 6 | There are 3 types of OLAP systems.
157 | 2 | The SQL alternative in Apache HIVE is called? | HIVEQL | BASEQL | SPARK-QL | H-QL | HIVE-QL is the alternative to SQL in the Apache Hive family.
158 | 2 | A MapReduce program executes in how many stages? | 3 | 2 | 5 | 4 | A MapReduce program executes in three stages.
159 | 2 | How many types of NoSQL database are there? | 4 | 3 | 2 | 6 | There are 4 types of NoSQL databases.
160 | 2 | MapReduce is a processing technique and a program model for distributed computing based on which programming language? | JAVA | Python | C++ | R | MapReduce is a processing technique and a program model for distributed computing based on Java.
161 | 2 | Hive supports how many properties of transactions? | 4 | 3 | 2 | 1 | Hive supports all four properties of transactions.
162 | 2 | HDFS consists of only one Name Node, which is called the? | Master Node | Slave Node | Both | None | HDFS consists of only one Name Node, which is called the Master Node.
163 | 2 | Which Apache software is needed to process massive amounts of data for the purposes of natural-language search? | Apache HBASE | Apache Spark | Apache-PIG | Apache-mahout | HBase is used to process massive amounts of data for the purposes of natural-language search.
164 | 2 | Which databases store data in a format other than relational tables? | NO-SQL | HIVESQL | SPARK-QL | H-QL | NoSQL databases store data in a format other than relational tables.
165 | 2 | Which is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra? | Apache Mahout | Apache Spark | Apache-PIG | Apache HBASE | Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra.
166 | 2 | Which model is a specialization of the split-apply-combine strategy for data analysis? | MapReduce | Hadoop | HBASE | HIVE | The MapReduce model is a specialization of the split-apply-combine strategy for data analysis.
167 | 2 | All Hadoop commands are invoked by which command? | $HADOOP_HOME/bin/hadoop | $HADOOP/bin/hadoop | $HADOOP_HOME/hadoop | $HADOOP_HOME/bin | All Hadoop commands are invoked by the $HADOOP_HOME/bin/hadoop command.
168 | 3 | The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called? | Schema on Write | Schema on Read | Schema for Read Write | None | The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called schema on write.
Sr. No. | Marks | Question | A | B | C | D | Ans
169 | 3 | Which command formats the DFS filesystem? | Namenode -format | Node -format | Name -format | Format | The Namenode -format command formats the DFS file system.
170 | 3 | Which command applies the offline fsimage viewer to an fsimage? | oiv | fs | fc | ov | oiv applies the offline fsimage viewer to an fsimage.
171 | 3 | Hadoop requires which Java Runtime Environment (JRE) or higher version? | 1.6 | 1.2 | 1.5 | 1 | Hadoop requires Java Runtime Environment (JRE) 1.6 or higher.
172 | 3 | Every Data node sends a Heartbeat message to the Name node every ____ seconds and conveys that it is alive. | 3 | 2 | 4 | 1 | Every Data node sends a Heartbeat message to the Name node every 3 seconds and conveys that it is alive.
173 | 3 | HDFS can store files up to: | 1 TB | 1 GB | 1 ZB | 1 PB | HDFS can store up to 1 TB of files.
174 | 3 | Which of the following is a wide-column store? | HBase | SQL | DynamoDB | MongoDB | HBase is a popular wide-column store.
175 | 3 | Which node acts as both a DataNode and TaskTracker in Hadoop? | Slave Node | Data Node | Admin Node | Name Node | A slave or worker node acts as both a DataNode and TaskTracker.
176 | 3 | The HDFS system uses which protocol for communication? | TCP/IP | TCP | UDP | IP | The HDFS system uses TCP/IP sockets for communication.
177 | 3 | HDFS has how many services? | 5 | 4 | 2 | 6 | HDFS has five services.
178 | 3 | ____________ is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. | Apache HIVE | Apache Spark | Apache-PIG | Apache HBASE | HIVE is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
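Question 172 says each Data node sends a heartbeat to the Name node every 3 seconds. A minimal sketch of how a receiver might turn those heartbeats into a liveness decision follows. The class name and the `DEAD_AFTER` threshold are assumptions made for this illustration and are not HDFS defaults.

```python
HEARTBEAT_INTERVAL = 3   # seconds, as stated in question 172
DEAD_AFTER = 30          # seconds; illustrative threshold, NOT an HDFS default


class NameNodeMonitor:
    """Hypothetical tracker that records the last heartbeat of each Data node."""

    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, datanode, now):
        # Each heartbeat simply refreshes the timestamp for that node.
        self.last_seen[datanode] = now

    def live_nodes(self, now):
        # A node counts as alive if it reported within the last DEAD_AFTER seconds.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= DEAD_AFTER
        )
```

Timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock.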
Hadoop Online Quiz - Tutorialspoint https://www.tutorialspoint.com/hadoop/hadoop_online_quiz.htm
1 of 5 20-03-2021, 15:01
HADOOP MOCK TEST
http://www.tutorialspoint.com Copyright © tutorialspoint.com
This section presents various sets of mock tests related to the Hadoop framework. You can
download these sample mock tests to your local machine and solve them offline at your convenience.
Every mock test is supplied with a mock test key to let you verify the final score and grade yourself.
C - Can process data faster under the same network bandwidth as compared to HPC.
Q 5 - Which of the following is true for disk drives over a period of time?
B - Data Seek time is improving more slowly than data transfer rate.
C - Data Seek time and data transfer rate are both increasing proportionately.
D - Only the storage capacity is increasing without increase in data transfer rate.
A - Solr
B - Tez
C - Spark
D - Hive
C - Occupies only the size it needs and not the full block.
D - Can span over multiple blocks.
Q 10 - HDFS block size is larger as compared to the size of the disk blocks so that
D - A single file larger than the disk size can be stored across many disks in the cluster.
Q 11 - In a Hadoop cluster, what is true for a HDFS block that is no longer available
due to disk corruption or machine failure?
C - The namenode allows new client request to keep trying to read it.
D - The Mapreduce job process runs ignoring the block and the data stored in it.
Q 12 - Which utility is used for checking the health of a HDFS file system?
A - fchk
B - fsck
C - fsch
D - fcks
Q 13 - Which command lists the blocks that make up each file in the filesystem.
D - None
Q 15 - In the local disk of the namenode the files which are stored persistently are −
D - None of these
A - Take backup of filesystem metadata to a local disk and a remote NFS mount.
Q 19 - For the frequently accessed HDFS files the blocks are cached in
C - Both A&B
D - In the memory of the client application which requested the access to these files.
C - Failure of one namenode causes loss of some metadata availability from the entire
filesystem.
B - To reduce the cycle time required to bring back a new primary namenode after existing
primary fails.
A - When a client request comes, one of them chosen at random serves the request.
B - One of them is active while the other one remains powered off.
B - Preventing the start of a failover in the event of network failure with the active namenode.
C - Preventing the power down to the previously active namenode.
D - STONITH
Q 28 - The property used to set the default filesystem for Hadoop in core-site.xml is-
A - filesystem.default
B - fs.default
C - fs.defaultFS
D - hdfs.default
A - 1
B - 2
C - 3
D - 4
A - 2
B - 1
C - 0
D - 3
B - Zero
C - 3
B - Renaming
C - Moving
D - Executing.
ANSWER SHEET
1 C
2 A
3 D
4 B
5 B
6 C
7 C
8 B
9 C
10 D
11 B
12 B
13 A
14 B
15 A
16 C
17 A
18 D
19 A
20 C
21 C
22 B
23 B
24 D
25 B
26 D
27 C
28 B
29 C
30 B
31 D
32 D
HADOOP MOCK TEST
D - HDFS ftp
C - You can edit an existing record in an HDFS file which is already mounted using NFS.
D - gets both the data and block location from the namenode
Q 4 - Which scenario demands highest bandwidth for data transfer between nodes in
Hadoop?
Q 5 - The current block location of HDFS where data is being written to,
A - Optimal Scheduler
B - FIFO scheduler
C - Capacity scheduler
D - Fair scheduler
D - Fully-Distributed mode
A - C++
B - Python
C - Java
D - GO
Q 10 - The hdfs command to create the copy of a file from a local system is
A - CopyFromLocal
B - copyfromlocal
C - CopyLocal
D - copyFromLocal
D - The number of replicated copies is less than as specified by the replication factor.
Q 13 - When the namenode finds that some blocks are over replicated, it
A - Replication factor
A - Replication factor
A - Replication factor
A - Replication factor
A - Jsp
B - Jps
C - Hadoop fs –test
D - None
Q 19 - The information mapping data blocks with their corresponding files is stored in
A - Data node
B - Job Tracker
C - Task Tracker
D - Namenode
Q 20 - The file in Namenode which stores the information mapping the data block
location with file name is −
A - dfsimage
B - nameimage
C - fsimage
D - image
Q 21 - The namenode knows that the datanode is active using a mechanism known as
A - heartbeats
B - datapulse
C - h-signal
D - Active-pulse
B - Commodity grade
Q 24 - Which of the below Apache systems deals with ingesting streaming data into
Hadoop?
A - Oozie
B - Kafka
C - Flume
D - Hive
A - The average size of the data blocks used as input for the program
B - The location details of where the first whole record in a block begins and the last whole
record in the block ends.
C - Splitting the input data to a MapReduce program into a size already configured in the
mapred-site.xml
D - None of these
B - The Key-value pair of all the records from the input split processed by the mapper
B - Report the edit log information of the blocks in the data node.
Q 28 - The Zookeeper
A - The namenode updates the mapping between file name and block name
B - The namenode need not update mapping between file name and block name
Q 30 - When a client contacts the namenode for accessing a file, the namenode
responds with
C - Block ID and hostname of any one of the data nodes containing that block.
D - Block ID and hostname of all the data nodes containing that block.
Q 32 - The Hadoop tool used for uniformly spreading the data across the data nodes is
named −
A - Scheduler
B - Balancer
C - Spreader
D - Reporter
ANSWER SHEET
1 B
2 A
3 C
4 C
5 D
6 A
7 B
8 B
9 C
10 D
11 B
12 D
13 C
14 B
15 A
16 C
17 D
18 B
19 D
20 C
21 A
22 A
23 B
24 C
25 B
26 B
27 B
28 A
29 B
30 D
31 D
32 B
33 A
HADOOP MOCK TEST
B - Check if the fsimage file is in sync between namenode and secondary namenode
C - Merges the fsimage and edit log and uploads it back to active namenode.
D - Rack awareness
A - it is lost forever
C - It becomes hidden from the user but stays in the file system
A - REST API
B - RPC
C - RMI
D - IP Exchange
A - Structured
B - Semi-structured
C - Unstructured
B - 3 Physical machines
C - 4 Physical machines
D - 1 Physical machine
A - read
B - deleted
C - executed
D - Archived
Q 15 - hadoop fs –expunge
A - getmerge
B - putmerge
C - remerge
D - mergeall
A - changerep
B - rerep
C - setrep
D - xrep
Q 18 - The command used to copy a directory from one node to another in HDFS is
A - rcp
B - dcp
C - drcp
D - distcp
A - .hrc
B - .har
C - .hrh
D - .hrar
A - unrar
B - unhar
C - cp
D - cphar
Q 23 - When you increase the number of files stored in HDFS, The memory required by
namenode
A - Increases
B - Decreases
C - Remains unchanged
Q 24 - If we increase the size of files stored in HDFS without increasing the number of
files, then the memory required by namenode
A - Decreases
B - Increases
C - Remains unchanged
Q 27 - You can reserve the amount of disk usage in a data node by configuring the
dfs.datanode.du.reserved in which of the following file
A - Hdfs-site.xml
B - Hdfs-default.xml
C - Core-site.xml
D - Mapred-site.xml
Q 28 - The namenode loses its only copy of fsimage file. We can recover this from
A - Datanodes
B - Secondary namenode
C - Checkpoint node
D - Never
Q 29 - In a HDFS system with block size 64MB we store a file which is less than 64MB.
Which of the following is true?
B - Not limited
A - Jobtracker to Tasktracker
C - Jobtracker to namenode
D - Tasktracker to namenode
ANSWER SHEET
1 C
2 A
3 B
4 B
5 B
6 A
7 B
8 D
9 B
10 A
11 A
12 C
13 D
14 C
15 D
16 A
17 C
18 D
19 B
20 C
21 D
22 D
23 A
24 A
25 C
26 B
27 A
28 C
29 C
30 A
31 C
32 A
33 B
HADOOP MOCK TEST
A - Jobtracker to Tasktracker
C - Jobtracker to namenode
D - Tasktracker to namenode
A - Namenode
B - Datanode
C - Secondary namenode
D - Secondary datanode
A - Balanced scheduler
B - Fair scheduler
C - Capacity scheduler
D - FIFO scheduler.
A - The default input format is xml. Developer can specify other input formats as appropriate if
xml is not the correct input.
B - There is no default input format. The input format always should be specified.
C - The default input format is a sequence file format. The data needs to be preprocessed before
using the default input format.
D - The default input format is TextInputFormat with byte offset as a key and entire line as a
value.
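Option D above describes TextInputFormat as producing the byte offset of each line as the key and the line itself as the value. The idea can be sketched in plain Python. The function name is hypothetical, and this is only an illustration of the key/value shape, not Hadoop's Java implementation.

```python
def text_input_records(data: bytes):
    """Yield (byte_offset, line) pairs, one per line of the input bytes."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # The key is the offset of the first byte of the line;
        # the value is the line without its trailing newline.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)
```

A mapper fed by such a reader usually ignores the offset key and works on the line value.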
A - Velocity
B - Veracity
C - volume
D - variety
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
Q 10 - Which of the following technologies is a document store database?
A - HBase
B - Hive
C - Cassandra
D - CouchDB
A - It is a distributed framework.
A - Name node
B - Data node
C - Master node
D - None of these
A - Name node
B - Data node
C - slave node
D - None of these
Q 14 - What is AVRO?
A - Yes, Avro was specifically designed for data processing via Map-Reduce.
D - Avro specifies metadata that allows easier data access. This data cannot be used as part of
map-reduce execution, rather input specification only.
A - The distributed cache is a special component on the name node that will cache frequently used
data for faster client response. It is used during the reduce step.
B - The distributed cache is a special component on the data node that will cache frequently used data
for faster client response. It is used during the map step.
D - The distributed cache is a component that allows developers to deploy jars for Map-Reduce
processing.
Q 17 - What is writable?
A - Writable is a java interface that needs to be implemented for streaming data to remote
servers.
Q 18 - What is HBASE?
B - Hbase is a part of the Apache Hadoop project that provides interface for scanning large
amount of data using Hadoop infrastructure.
D - HBase is a part of the Apache Hadoop project that provides a SQL like interface for data
processing.
B - Hadoop was specifically designed to process large amount of data by taking advantage of
MPP hardware.
C - Hadoop ships the code to the data instead of sending the data to the code.
D - Hadoop uses sophisticated caching techniques on name node to speed processing of data.
Q 20 - When using HDFS, what occurs when a file is deleted from the command line?
B - It is placed into a trash directory common to all users for that cluster.
C - It is permanently deleted and the file attributes are recorded in a log file.
D - It is moved into the trash directory of the user who deleted it if trash is enabled.
Q 21 - When archiving Hadoop files, which of the following statements are true?
Choose two answers
3. MapReduce processes the original files names even after files are archived.
4. Archived files must be UN archived for HDFS and MapReduce to access the
original, small files.
5. Archive is intended for files that need to be saved but no longer accessed by
HDFS.
A - 1 & 3
B - 2 & 3
C - 2 & 4
D - 3 & 4
Q 22 - When writing data to HDFS what is true if the replication factor is three?
Choose 2 answers
2. The Data is stored on each DataNode with a separate file which contains a
checksum value.
4. The Client is returned with a success upon the successful writing of the first
block and checksum check.
A - 1 & 3
B - 2 & 3
C - 3 & 4
D - 1 & 4
Q 23 - Which of the following are among the duties of the Data Nodes in HDFS?
A - Maintain the file system tree and metadata for all files and directories.
Q 24 - Which of the following components retrieves the input splits directly from
HDFS to determine the number of map tasks?
A - The NameNode.
B - The TaskTrackers.
C - The JobClient.
D - The JobTracker.
A - 1 & 4
B - 2 & 3
C - 3 & 4
D - 2 & 4
Q 27 - Which one of the following statements is false regarding the Distributed Cache?
A - The Hadoop framework will ensure that any files in the Distributed Cache are distributed to all
map and reduce tasks.
B - The files in the cache can be text files, or they can be archive files like zip and JAR files.
D - The Hadoop framework will copy the files in the Distributed Cache on to the slave node
before any tasks for the job are executed on that node.
A - Region Server.
B - Nagios.
C - ZooKeeper.
D - Master Server.
A - HDFS.
B - Task Tracker.
C - Job Tracker.
D - Name Node.
E - Data Node.
Q 31 - Keys from the output of shuffle and sort implement which of the following
interface?
A - Writable.
B - WritableComparable.
C - Configurable.
D - ComparableWritable.
E - Comparable.
B - Output of the mapper and output of the combiner has to be same key value pair and they can
be heterogeneous
C - Output of the mapper and output of the combiner has to be same key value pair. Only if the
values satisfy associative and commutative property it can be done.
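The options above concern when a combiner can be applied to mapper output. A small word-count sketch in plain Python shows why summation qualifies: it is associative and commutative, so pre-summing on each mapper does not change the reducer's result. All function names here are hypothetical, and the code imitates the map, shuffle and reduce stages in a single process rather than using Hadoop.

```python
from collections import defaultdict


def mapper(line):
    """Map stage: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts from one mapper before the shuffle."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def shuffle(all_pairs):
    """Shuffle stage: group values by key across all mappers."""
    groups = defaultdict(list)
    for key, value in all_pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce stage: sum the (possibly pre-combined) counts for one key."""
    return key, sum(values)


def word_count(splits, use_combiner=True):
    shuffle_input = []
    for split in splits:  # each split stands in for one mapper's input
        pairs = [pair for line in split for pair in mapper(line)]
        shuffle_input.extend(combine(pairs) if use_combiner else pairs)
    return dict(reducer(k, v) for k, v in shuffle(shuffle_input).items())
```

Running `word_count` with and without the combiner gives the same counts; the combiner only reduces how many pairs pass through the shuffle.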
ANSWER SHEET
1 A
2 B
3 A
4 A
5 D
6 B
7 A
8 B
9 C
10 D
11 D
12 B
13 A
14 A
15 A
16 B
17 C
18 B
19 C
20 C
21 B
22 C
23 D
24 D
25 A
26 B
27 C
28 B
29 C
30 D
31 B
32 C
Seat No -
Total number of questions : 60
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 3. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
Q.no 4. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : ndarray
B : spatial
C : ndimage
D : special
A : Single point
B : Line
C : 2-D Plane
A : Text files
B : Satellite data
C : Sensor data
D : Seismic imagery data
A : Matlab
B : Scilab
C : Scipy
D : Numpy
Q.no 9. The procedure to organize items of a given collection into groups based on
some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. ------------- function is used to plot a histogram using matplotlib library
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 13. Which of the following is a measure used in decision trees while selecting
the splitting criterion that partitions data in the best possible manner?
A : Probability
B : Gini Index
C : Regression
D : Association
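The Gini index named in the options above can be computed directly from class counts. The sketch below uses the standard definition (one minus the sum of squared class proportions) and a size-weighted average for a two-way split. The function names are chosen for this illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_of_split(left, right):
    """Size-weighted Gini impurity of a candidate two-way split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A split whose weighted impurity is lower separates the classes better; a value of 0 means both sides are pure.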
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 16. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : prod()
B : mult()
C : dot()
D : *
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 20. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : 0
B : -1
C : 1
D : -2
Q.no 22. ------------ step is performed by the data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : KNN
C : Decision trees
D : Cluster analysis
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 26. ---------- function is used to get the element-wise remainder of division of arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
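The three measures listed above follow the usual association-rule definitions: support is the fraction of transactions containing an itemset, confidence is how often the rule holds when its antecedent is present, and lift compares that confidence with the consequent's baseline support. A minimal sketch, with hypothetical function names and transactions represented as Python sets:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )
```

A lift above 1 suggests the items occur together more often than independence would predict; a lift below 1 suggests the opposite.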
Q.no 28. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 30. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : NoSQL data
B : YouTube data
A : EPS
B : PDF
C : PNG
D : PS
Q.no 36. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 38. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
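The binning described in Q.no 38 can be sketched without a plotting library: values on the x-axis are assigned to equal-width bins and each bin is counted as one category. The function name and parameters below are hypothetical. Placing the right edge in the last bin is a convention chosen for this sketch.

```python
def histogram_counts(values, bins, low, high):
    """Count how many values fall into each of `bins` equal-width bins on [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value <= high:
            # A value equal to `high` is placed in the last bin.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts
```

Plotting these counts as adjacent bars over the bin ranges gives the familiar histogram shape.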
Q.no 39. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
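For the single-feature case, the linear relationship mentioned in Q.no 40 can be fitted with ordinary least squares in a few lines. This is a sketch with a hypothetical function name; libraries such as Scikit-Learn provide more general estimators.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The fitted line then predicts the continuous dependent variable as `slope * x + intercept`.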
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 42. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 43. --------- is a technique that duplicates the smaller array so that its dimensionality
and size match those of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 46. ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 47. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 49. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 52. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
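The statement in Q.no 52 that the area under the bell curve represents probabilities can be checked with Python's standard `statistics.NormalDist`. The helper name below is hypothetical. The area between two points is the difference of the cumulative distribution function at those points.

```python
from statistics import NormalDist


def probability_between(dist, a, b):
    """Area under the density of `dist` between a and b, i.e. P(a <= X <= b)."""
    return dist.cdf(b) - dist.cdf(a)


# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0.0, sigma=1.0)
within_one_sigma = probability_between(standard, -1.0, 1.0)
```

For the standard normal distribution, `within_one_sigma` is about 0.68, which matches the usual 68% rule for one standard deviation around the mean.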
Q.no 53. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
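Plot numbers in `subplot(nrows, ncols, plot_number)` count from 1, row by row. The sketch below checks this against matplotlib; the helper `grid_position` is a hypothetical name, and the "Agg" backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

def grid_position(ncols, plot_number):
    """Zero-based (row, column) of a 1-based, row-major plot number."""
    return divmod(plot_number - 1, ncols)

fig = plt.figure()
ax = plt.subplot(2, 3, 3)  # 2 rows, 3 columns, plot number 3
spec = ax.get_subplotspec()
actual = (spec.rowspan.start, spec.colspan.start)
expected = grid_position(3, 3)  # row 0 (top), column 2 (rightmost)
print(actual, expected)
plt.close(fig)
```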
Q.no 57. Catalog design is a complex process in which the items in a business's
catalog are often selected to complement each other, so that buying one item
leads to buying another. These items are therefore often complements or
closely related. Which algorith
A : Decision tree
C : Clustering
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 59. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 2. ----------- is data that depends on a data model and resides in a fixed field within
a record.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. A ---------- plot displays information as a series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 5. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
Q.no 9. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 10. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent itemsets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 11. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Un- Supervised
B : Supervised
C : Association
D : correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : KNN
C : Regression
D : Decision Tree
A : Classification
B : Regression
C : Clustering
D : Association
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
Q.no 19. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 20. ----------------- analysis estimates the relationship between a single dependent
variable and a single independent variable.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
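A relationship between one dependent and one independent variable can be sketched with a straight-line fit. The data below is synthetic, and `numpy.polyfit` is just one convenient way to obtain a least-squares line.

```python
import numpy as np

# One independent variable x and one dependent variable y (synthetic, exactly linear).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares fit of y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
prediction = slope * 5.0 + intercept  # estimate y for a new x
print(slope, intercept, prediction)
```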
Q.no 21. In a ------------, the x-axis values are grouped into bins and each bin is treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 22. ------- is the basic data structure of pandas and can be thought of as an SQL table
or a spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 23. From matplotlib, the ------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
A : 1
B : -1
C : 0
D : 2
Q.no 25. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
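The rule measures named in these options can be sketched in plain Python. The transactions and the helper names `support` and `confidence` are made up for the example.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent appears."""
    return support(antecedent | consequent) / support(antecedent)

s = support({"bread", "milk"})       # 2 of 4 transactions
c = confidence({"bread"}, {"milk"})  # 2 of the 3 transactions containing bread
print(s, c)
```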
Q.no 26. In the matplotlib library, the ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 27. The --------- function from the matplotlib.pyplot library plots a bar graph for
given values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : KNN
C : Regression
D : Cluster analysis
Q.no 29. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
A : Java
B : Ruby
C : R
D : None of these
A : 0
B : -1
C : 1
D : -2
Q.no 33. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
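As a rough sketch of how an attribute selection measure scores a candidate split, the following computes entropy-based information gain in plain Python. The labels and helper names are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

labels = ["yes", "yes", "no", "no"]
perfect_split = information_gain(labels, [["yes", "yes"], ["no", "no"]])
useless_split = information_gain(labels, [["yes", "no"], ["yes", "no"]])
print(perfect_split, useless_split)
```

A split that separates the classes completely gains the full parent entropy, while a split that leaves each group as mixed as the parent gains nothing.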
Q.no 34. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 37. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 38. ------------------- is a one-dimensional array defined in pandas that can be
used to store any data type.
A : Dict
B : series
C : ndarray
D : list
Q.no 39. To read an image from a file into an array, the --------------- function is used.
A : matplotlib.pyplot.imshow()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 42. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
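A minimal sketch of pairwise distances with SciPy, using three made-up points. `cdist` takes two collections, so the same set is passed twice here; `scipy.spatial.distance.pdist` is the related function for distances within a single set.

```python
import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Euclidean distance (the default metric) between every pair of points.
distances = cdist(points, points)
print(distances)
```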
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 46. Which function from numpy is used to return the truncated value of the
input element-wise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
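The difference between truncating and rounding can be sketched as follows, with sample values chosen for the example:

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.2, 1.7])

# trunc drops the fractional part, moving each value toward zero.
truncated = np.trunc(values)

# round moves each value to the nearest integer instead.
rounded = np.round(values)
print(truncated, rounded)
```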
Q.no 47. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 49. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 51. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 52. To save a figure to a file, we can use the ------------ method of the Figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 53. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. The ------- is characterized by a bell-shaped curve, and the area under the
curve represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Answer for Question No 1. is c
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 2. ------------ means the part of the population chosen for participation in the study.
A : Population
B : Sample
C : Association
D : Correlation
Q.no 3. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 4. To save or write dataframe data to a CSV file, the -------- function is used.
A : write_csv()
B : write_file()
C : csv_read()
D : to_csv()
A : Regression
B : Decision trees
C : KNN
D : SVM
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 9. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustring
B : Regression
C : Naïve Bays
D : Apriori
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 11. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 14. ------ analysis answers questions like "How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 15. ------------ plots show all individual data points without connecting them
with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 16. A ------------ chart is a circular plot divided into slices to show numerical
proportions.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 19. -------------- charts represent categorical data with rectangular bars.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Random
B : sequential
C : Same
Q.no 21. To rotate an image, the -------- function from the scipy library is used.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
Q.no 22. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
Q.no 23. ------ is a classification technique that relies on the naïve assumption that
input variables are independent of each other.
A : KNN
B : Naïve Bayes
C : Regression
Q.no 24. ----------- phase of the data analytics lifecycle usually takes the longest
time.
A : Data Preparation
B : Model Planning
C : Model Building
D : Communicate Results
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. Which statement will create a 5 x 5 array filled with all values 1?
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 28. Which function returns the n x n identity array, with its
main diagonal set to ones and all other elements set to zero?
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
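The array-creation functions that appear among these options can be compared side by side. This is a small sketch with arbitrary shapes:

```python
import numpy as np

ones_2d = np.ones((5, 5))     # 5 x 5 array filled with 1.0 (shape passed as a tuple)
ones_1d = np.ones(5)          # a bare integer gives a 1-D array of length 5
zeros_2d = np.zeros((2, 3))   # 2 x 3 array filled with 0.0
identity = np.identity(3)     # 3 x 3, ones on the main diagonal, zeros elsewhere
eye = np.eye(3)               # same result as identity(3) for a square array
print(ones_2d.shape, ones_1d.shape, identity)
```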
Q.no 29. From matplotlib, the ------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 31. The ---------- function is used to add two numpy arrays element-wise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 32. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 33. The --------- function from the matplotlib.pyplot library plots a bar graph for
given values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Website data
B : YouTube data
Q.no 36. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : EPS
B : PDF
C : PNG
D : PS
Q.no 38. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
C : Measures growth
Q.no 40. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 41. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 42. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 44. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 46. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 47. The --------- function applies custom operations to the entire dataframe.
A : function()
B : subroutine()
C : routine()
D : pipe()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 49. Which of the following algorithms is used in economics, finance, biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 50. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3 x 4, 5
C : 3 x 5, 4
D : 5 x 3, 4
Q.no 51. The ------- is characterized by a bell-shaped curve, and the area under the
curve represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 52. --------------- is basically extracting a particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
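Extracting a set of elements from an array can be sketched with NumPy's `start:stop:step` notation. The arrays below are illustrative.

```python
import numpy as np

arr = np.arange(10, 20)            # 10, 11, ..., 19

middle = arr[2:5]                  # elements at positions 2, 3 and 4
every_other = arr[::2]             # every second element
last_three = arr[-3:]              # the final three elements

grid = np.arange(12).reshape(3, 4)
block = grid[:2, 1:3]              # first two rows, columns 1 and 2
print(middle, every_other, last_three, block, sep="\n")
```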
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 54. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 57. Catalog design is a complex process in which the items in a business's
catalog are often selected to complement each other, so that buying one item
leads to buying another. These items are therefore often complements or
closely related. Which algorith
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Answer for Question No 1. is a
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
B : YouTube data
D : Sensor data
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 7. To import data from an Excel file into a dataframe, the ---------- function is
provided by the pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
Q.no 8. The ---------- function is used to get the positive square root of a numpy array
element-wise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : imsave()
B : imread()
C : read()
D : None of these
Q.no 10. NumPy supports this function to find the trigonometric sine element-wise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 12. In a numpy array, array indices always start from --------
A : 1
B : -1
C : 0
D : 2
Q.no 13. ----------------- analysis estimates the relationship between a single dependent
variable and a single independent variable.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 14. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : Classification
B : Regression
C : Clustering
D : Association
Q.no 16. ------------ means the part of the population chosen for participation in the study.
A : Population
B : Sample
C : Association
D : Correlation
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 20. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 21. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
A : NoSQL data
B : YouTube data
C : Text File data
Q.no 23. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 24. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 25. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 26. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
Q.no 27. Which function returns the n x n identity array, with its
main diagonal set to ones and all other elements set to zero?
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 28. The --------- function from the matplotlib.pyplot library plots a bar graph for
given values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
Q.no 30. The process by which we estimate the value of a dependent variable on the
basis of one or more independent variables is called -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 31. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 33. -------- is the measure of the likelihood that an event will occur in a
random experiment.
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 35. ----------- analysis finds the reasons behind success or failure in the past.
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 38. Broadcasting is a powerful technique that allows numpy to work with
arrays of ------------- .
A : Same Shapes
B : Different Shapes
C : Same values
D : Different values
Q.no 39. If a scatter diagram is drawn and all scatter points lie on a straight line,
then it indicates -------
A : No correlation
B : Perfect correlation
C : Regression
D : Skewness
Q.no 40. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 41. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the
curve represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 45. Catalog design is a complex process in which the items in a business's
catalog are often selected to complement each other, so that buying one item
leads to buying another. These items are therefore often complements or
closely related. Which algorith
A : Decision tree
C : Clustering
D : Support vector machine
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 48. The --------- function applies custom operations to the entire dataframe.
A : function()
B : subroutine()
C : routine()
D : pipe()
Q.no 49. To test the accuracy of a machine learning algorithm, the whole dataset
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
Q.no 50. Which function from numpy is used to return the truncated value of the
input element-wise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 51. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 53. --------- is a technique that duplicates the smaller array so that its
dimensionality and size match those of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 54. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3 x 4, 5
C : 3 x 5, 4
D : 5 x 3, 4
Q.no 56. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 57. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
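Row iteration in pandas can be sketched with `iterrows()` and `itertuples()`. The dataframe below is made up. Note that `iteritems()` iterated over columns rather than rows and, as far as this editor is aware, was removed in pandas 2.0 in favour of `items()`, so it is left out of the sketch.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# iterrows yields (index, Series) pairs, one per row.
rows_seen = [(index, row["name"], row["score"]) for index, row in df.iterrows()]

# itertuples yields one namedtuple per row, with the index in `.Index`.
tuples_seen = [(t.Index, t.name, t.score) for t in df.itertuples()]
print(rows_seen, tuples_seen)
```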
Q.no 58. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 59. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 2. The procedure to organize items of a given collection into groups based on
some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 3. ------------- is the fundamental library used for scientific computing.
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 4. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 6. The -------- function creates a 2-D array with ones on the diagonal and
zeros elsewhere.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
Q.no 8. To import data from a CSV file into a dataframe, the ---------- function is provided
by the pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Bayes Theorem
B : Pythagoras' Theorem
Q.no 11. ------------ means the part of the population chosen for participation in the study.
A : Population
B : Sample
C : Association
D : Correlation
Q.no 12. If the number of input features is 3, then the optimal hyperplane in a support
vector machine is a -------------
A : Single point
B : Line
C : 2-D Plane
Q.no 13. The ---------------- method of a dataframe returns the first n rows of the dataframe.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 14. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 15. ----------------- analysis estimates the relationship between a single dependent
variable and a single independent variable.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 16. The -------- library is built on top of NumPy, SciPy and Matplotlib.
A : Sympy
B : Scikit
C : Pandas
D : Numpy
Q.no 17. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 18. A ------------ chart is a circular plot divided into slices to show numerical
proportions.
A : Bar
B : Line
C : Scatter
D : Pie
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
Q.no 21. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 22. -------- is the measure of the likelihood that an event will occur in a
random experiment.
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 23. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e. its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
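The procedure described in this question (store the training set, then look up the closest stored points for each new point) can be sketched as follows. The function name `knn_predict`, the choice of Euclidean distance, majority voting and k = 3 are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k closest training points."""
    distances = np.linalg.norm(train_x - query, axis=1)  # distance to every stored point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# "Training" is nothing more than keeping the data around.
train_x = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_y = ["low", "low", "low", "high", "high", "high"]

prediction = knn_predict(train_x, train_y, np.array([1.1, 1.0]))
print(prediction)
```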
Q.no 24. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 25. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : set_title()
B : set_label()
C : set_xlabel()
D : get_xlabel()
A : 3
B : 5
C : 1
D : 10
C : Measures growth
Q.no 29. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : class distribution
B : test on an attribute
D : class labels
Q.no 31. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 32. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 33. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
Q.no 34. ------------ is a 2-D data structure defined in pandas in which data is arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
A : NoSQL data
B : YouTube data
Q.no 36. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 38. The process by which we estimate the value of a dependent variable on the
basis of one or more independent variables is called -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 39. ------- is the basic data structure of pandas and can be thought of as an SQL table
or a spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 41. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 42. The ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : display()
B : head()
C : describe()
D : sort()
Q.no 44. Catalog design is a complex process in which the items in a business's
catalog are often selected to complement each other, so that buying one item
leads to buying another. These items are therefore often complements or
closely related. Which algorith
A : Decision tree
C : Clustering
Q.no 45. To test the accuracy of a machine learning algorithm, the whole dataset
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
A : Train- 70%, Test - 30%
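A 70/30 split like the one in option A can be sketched with NumPy alone. The dataset here is just the integers 0 to 99, and the seed is arbitrary; libraries such as scikit-learn provide ready-made helpers for the same job.

```python
import numpy as np

data = np.arange(100)               # stand-in for 100 samples
rng = np.random.default_rng(0)      # fixed seed so the split is repeatable
shuffled = rng.permutation(data)    # shuffle before splitting to avoid ordering bias

split = len(shuffled) * 7 // 10     # 70% of the samples go to training
train, test = shuffled[:split], shuffled[split:]
print(len(train), len(test))
```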
Q.no 46. --------------- is basically extracting a particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 48. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 49. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 50. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 51. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 53. The plot_number parameter of the subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 54. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3 x 4, 5
C : 3 x 5, 4
D : 5 x 3, 4
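Note: subplot(nrows, ncols, plot_number) numbers the cells of the grid starting at 1 in the upper-left corner and counting left to right, then top to bottom. The arithmetic can be sketched without matplotlib; subplot_position is a hypothetical helper written only for this illustration.

```python
def subplot_position(nrows, ncols, plot_number):
    """Return the zero-based (row, column) of a 1-based plot number,
    counting left to right and then top to bottom."""
    if not 1 <= plot_number <= nrows * ncols:
        raise ValueError("plot_number must be between 1 and nrows*ncols")
    return divmod(plot_number - 1, ncols)

# subplot(4, 3, 5): a grid of 4 rows x 3 columns, fifth cell.
row, col = subplot_position(4, 3, 5)
print(row, col)

# subplot(2, 3, 3): the last cell of the first row, i.e. the top-right corner.
top_right = subplot_position(2, 3, 3)
print(top_right)
```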
Q.no 56. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 59. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 60. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Answer for Question No 1. is b
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. The ---------- function is used to get the positive square root of a NumPy array,
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
Q.no 4. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : numpy.random.ran()
B : rank
C : random.fill()
D : numpy.fillrandom()
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 10. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 12. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
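Note: the array-creation functions asked about above can be compared side by side. This is a small sketch that assumes NumPy is installed; the shapes are arbitrary examples.

```python
import numpy as np

ones = np.ones((2, 3))     # 2-D array filled with 1.0
zeros = np.zeros((2, 3))   # 2-D array filled with 0.0
identity = np.eye(3)       # 3 x 3 array: 1.0 on the diagonal, 0.0 elsewhere

# shape is a tuple of sizes per dimension, ndim is the number of
# dimensions (axes), and size is the total number of elements.
print(ones.shape, ones.ndim, ones.size)
print(identity)
```

numpy.empty() also allocates an array of the requested shape, but its contents are uninitialized, so it should not be relied on to contain zeros.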
Q.no 13. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 14. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
Q.no 15. In support vector machines, if there are 2 input features, then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 18. ------------ means the part of the population chosen for participation in the study.
A : Population
B : Sample
C : Association
D : Correlation
Q.no 19. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent itemsets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
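Note: the "prior knowledge" idea can be sketched as a level-wise search in which candidate itemsets of size k are built only from itemsets of size k-1 that were already found to be frequent. This is a simplified illustration in plain Python; it counts support with a direct scan rather than the hash tree used by optimized implementations, and the basket data is hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least
    min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}   # size-1 candidates
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]
result = apriori(baskets, min_support=3)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

The prune step is where prior knowledge pays off: the three-item candidate is discarded without counting because one of its pairs is not frequent.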
Q.no 20. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 21. ------------------ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 23. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
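Note: the three association-rule measures that appear in these questions can be computed directly from a list of transactions. The sketch below uses plain Python and hypothetical basket data; the function names are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the
    antecedent is present."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence of the rule relative to the consequent's own support."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(baskets, {"bread", "milk"})
c = confidence(baskets, {"bread"}, {"milk"})
l = lift(baskets, {"bread"}, {"milk"})
print(s, c, l)
```

A lift below 1, as in this data, suggests the two items appear together less often than would be expected if they were independent.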
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
Q.no 25. If X and Y are both independent of each other, then correlation
coefficient is ---------
A:1
B : -1
C:0
D:2
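Note: the Pearson correlation coefficient always lies between -1 and +1, and it is 0 when there is no linear relationship between the two variables. A minimal sketch in plain Python, with hypothetical data chosen so the results are easy to check by hand:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # perfect positive
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])    # perfect negative
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])  # no linear relationship
print(r_up, r_down, r_none)
```

Independence implies a coefficient of 0, but the converse does not hold in general: a coefficient of 0 only rules out a linear relationship.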
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 27. What is the use of the following function? plt.xlabel("Total Marks")
C : Measures growth
Q.no 29. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 30. ----------- analysis finds the reasons behind success or failure in the past.
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 31. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 32. The ---------- function is used to get the elementwise remainder of division of two arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 35. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
A:3
B:5
C:1
D : 10
A:1
B : -1
C:0
D:2
A : KNN
C : Regression
D : Cluster analysis
Q.no 39. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : XML data
B : YouTube data
Q.no 41. The plot_number parameter of the subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 42. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 43. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 44. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 48. --------- is a technique that duplicates a smaller array so that its dimensionality
and size match those of a larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
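Note: the behaviour described above can be seen when a 1-D array is combined with a 2-D array. This sketch assumes NumPy is installed; the values are arbitrary.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)

# The 1-D array is treated as if it were repeated for every row of the
# 2-D array, so the shapes (2, 3) and (3,) are compatible.
total = matrix + row
print(total)
```

The smaller array is stretched only conceptually; NumPy does not actually make copies of it in memory.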
Q.no 49. Which NumPy function is used to return the truncated value of the
input, elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
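Note: truncation drops the fractional part and therefore moves values toward zero, which differs from flooring for negative numbers. The sketch below uses the standard library's math.trunc on single values to show the rule; numpy.trunc applies the same rule to every element of an array.

```python
import math

values = [2.7, -2.7, 0.5, -0.5]

truncated = [math.trunc(v) for v in values]  # fractional part dropped
floored = [math.floor(v) for v in values]    # always rounds down

print(truncated)
print(floored)
```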
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 52. The ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 54. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The --------- argument of the merge function specifies which keys are to be
included in the resulting DataFrame when merging two DataFrames.
A : right
B : on
C : sort
D : how
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 59. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 60. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Answer for Question No 1. is a
Q.no 1. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 3. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : prod()
B : mult()
C : dot()
D:*
A : Single point
B : Line
C : 2-D Plane
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 7. ------ answers questions like "How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 8. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 11. The leaf nodes in decision trees return the ---------
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 13. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 14. The ---------- function is used to get the positive square root of a NumPy array,
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : Sympy
D : Scipy
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
A : KNN
C : Regression
D : Decision Tree
Q.no 19. To import data from a CSV file into a DataFrame, the ---------- function is provided
by the pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Java
B : Ruby
C:R
D : None of these
Q.no 23. ------------ is a 2-D data structure defined in pandas in which data is arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
Q.no 24. ------------------ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 25. Which of the following is not used for 2-D visualization?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 26. The -------- of a numpy array is a tuple of integers giving the size of the
array along each dimension.
A : axes
B : rank
C : shape
D : size
Q.no 27. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 28. --------- in a decision tree measures how much information a feature gives us
about the class.
A : Information Gain
B : Posterior probability
C : Prior probability
D : probability
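Note: the measure asked about above can be computed as the entropy of the class labels minus the weighted entropy that remains after splitting on a feature. A minimal sketch in plain Python; the tiny dataset and the feature names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy of the labels minus the weighted entropy left after
    grouping the records by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remaining = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remaining

play = ["yes", "yes", "no", "no"]
windy = ["false", "false", "true", "true"]   # separates the classes fully
humid = ["high", "low", "high", "low"]       # says nothing about the class

gain_windy = information_gain(play, windy)
gain_humid = information_gain(play, humid)
print(gain_windy, gain_humid)
```

A decision tree learner that uses this criterion would pick the feature with the larger gain as the test at the current node.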
Q.no 29. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 30. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 31. ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. The --------- function from the matplotlib.pyplot library plots a bar graph for the given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
Q.no 35. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 36. In matplotlib, the ------------- function groups smaller axes that can exist
together within a single figure.
A : subplot()
B : divide_figure()
C : add_fig()
D : group_fig()
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 39. The ---------- function is used to add two NumPy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 40. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 41. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3 x 4, 5
C : 3 x 5, 4
D : 5 x 3, 4
Q.no 43. Which NumPy function is used to return the truncated value of the
input, elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 44. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 45. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 46. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 47. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 48. The --------- argument of the merge function specifies which keys are to be
included in the resulting DataFrame when merging two DataFrames.
A : right
B : on
C : sort
D : how
Q.no 49. The --------- function performs custom operations on the entire DataFrame.
A : function()
B : subroutine()
C : routine()
D : pipe()
Q.no 50. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
Q.no 51. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-bottom
B : Bottom-to-top
C : Left-to-right
D : Right-to-left
Q.no 52. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 53. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 54. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 56. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 58. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 59. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
A : Bayes Theorem
B : Pythagoras' Theorem
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 3. ------------ rule mining is a technique to identify underlying relations
between different items.
A : Classification
B : Regression
C : Clustering
D : Association
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 5. To import data from an Excel file into a DataFrame, the ---------- function is
provided by the pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A:1
B : -1
C:0
D:2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 8. The ---------- function is used to get the positive square root of a NumPy array,
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Un- Supervised
B : Supervised
C : semi-supervised
D : group
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 11. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent itemsets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 12. To import data from a CSV file into a DataFrame, the ---------- function is provided
by the pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 13. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 15. In support vector machines, if there are 2 input features, then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : ndarray
B : spatial
C : ndimage
D : special
Q.no 17. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 18. NumPy supports this function to find the trigonometric sine, elementwise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 19. The procedure to organize items of a given collection into groups based
on some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
A : save image
B : read image
C : copy image
D : show image
Q.no 21. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 22. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 23. To rotate an image, the -------- function from the SciPy library is used.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 25. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 28. ------------------ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 29. Which of the following is not used for 2-D visualization?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
A : class distribution
B : test on an attribute
Q.no 32. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 33. ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 34. The ---------- function is used to get the elementwise remainder of division of two arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 35. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 36. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 37. The --------- function from the matplotlib.pyplot library plots a bar graph for the given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : ndimage
B : ndarray
C : signal
D : io
Q.no 41. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate itemsets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 42. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-bottom
B : Bottom-to-top
C : Left-to-right
D : Right-to-left
Q.no 44. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 46. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 48. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
Q.no 50. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 51. The plot_number parameter of the subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 52. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 53. The ----------- function from SciPy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 54. In this type of clustering, instead of putting each data point into a separate
cluster, a probability or likelihood of that data point being in those clusters is
assigned.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
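Note: the difference between assigning a point to exactly one cluster and assigning it a degree of membership in every cluster can be sketched as follows. The membership weights here are a softmax of negative Euclidean distance, which is an illustrative assumption and not the formula of any particular clustering algorithm; the centroids and the point are hypothetical.

```python
import math

def distances(point, centroids):
    """Euclidean distance from the point to each centroid."""
    return [math.dist(point, c) for c in centroids]

def hard_assignment(point, centroids):
    """Index of the single nearest centroid."""
    d = distances(point, centroids)
    return d.index(min(d))

def soft_assignment(point, centroids):
    """Membership weights for every cluster; they sum to 1."""
    weights = [math.exp(-d) for d in distances(point, centroids)]
    total = sum(weights)
    return [w / total for w in weights]

centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard = hard_assignment(point, centroids)
soft = soft_assignment(point, centroids)
print(hard, soft)
```

The first result names one cluster only, while the second keeps a share of membership in each cluster, with the larger share going to the closer centroid.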
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 56. The --------- argument of the merge function specifies which keys are to be
included in the resulting DataFrame when merging two DataFrames.
A : right
B : on
C : sort
D : how
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 58. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 59. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 60. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Answer for Question No 1. is a
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : imsave()
B : imread()
C : read()
D : None of these
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 7. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent itemsets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : save image
B : read image
C : copy image
D : show image
Q.no 10. Choose the correct option for machine-generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 11. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. Which of the following measures is used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
Q.no 13. ------------ means the part of the population chosen for participation in the study.
A : Population
B : Sample
C : Association
D : Correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : imsave()
B : imread()
C : save()
D : isave()
Q.no 16. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. ------- answers the question "What will happen in the future?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. The ---------------- method of a DataFrame returns the first n rows of the DataFrame.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 19. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 21. -------- uses a tree structure to specify a sequence of decisions and
consequences.
A : KNN
B : Naïve Bayes
C : Regression
D : Decision Tree
Q.no 22. Which statement will create a 5 x 5 array filled with all values 1?
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 23. In the matplotlib library, the ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 24. The ---------- function is used to get the elementwise remainder of division of two arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 25. In a ------------, the x-axis values are grouped into bins and each bin is treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
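Note: the binning step described above can be sketched in plain Python. The helper histogram_counts and the marks data are hypothetical; plotting libraries perform the same grouping before drawing the bars.

```python
def histogram_counts(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width bins
    covering the range [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # A value equal to `high` is placed in the last bin.
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts

marks = [35, 42, 48, 55, 58, 61, 67, 73, 88, 100]
counts = histogram_counts(marks, low=0, high=100, bins=4)
print(counts)  # one count per bin: 0-25, 25-50, 50-75, 75-100
```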
A : Java
B : Ruby
C:R
D : None of these
Q.no 27. The ----- algorithm is the simplest machine learning algorithm, for which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
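Note: the procedure described in this question (store the training data, then find the closest stored points for a new query) can be sketched in a few lines of plain Python. Euclidean distance and a majority vote among the k closest points are assumptions of this illustration, and the data is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among the k
    training points closest to it."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# "Training" is just keeping the labelled points.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

first = knn_predict(points, labels, (2, 2))
second = knn_predict(points, labels, (6, 5))
print(first, second)
```

All of the work happens at prediction time, which is why this family of methods is often called lazy learning.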
Q.no 28. The ------------------ module of matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 29. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 32. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 35. ----------- analysis finds the reasons behind success or failure in the past.
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. A ----------------- graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 37. Support(B) =
Q.no 38. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 39. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 40. When data are collected in a statistical study for only a portion or subset
of all elements of interest, we are using a
A : Sample
B : Parameter
C : Population
D : Probability
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 42. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 44. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 45. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-bottom
B : Bottom-to-top
C : Left-to-right
D : Right-to-left
Q.no 47. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 48. Which NumPy function returns the truncated value of the
input, element-wise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
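Worked illustration for Q.no 48: the sketch below contrasts numpy.trunc with numpy.round, assuming NumPy is installed. Truncation drops the fractional part (moving toward zero), whereas rounding moves to the nearest integer.

```python
import numpy as np

samples = np.array([-1.7, -0.2, 0.2, 1.7])
truncated = np.trunc(samples)  # fractional part discarded, toward zero
rounded = np.round(samples)    # nearest integer
print(truncated, rounded)
```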
Q.no 49. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 50. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 51. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 53. --------- function performs the custom operations for the entire dataframe.
A : function()
B : subroutine()
C : routine()
D : pipe()
Q.no 54. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 55. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
Q.no 56. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 57. To save a figure to a file, we can use the ------------ method of the Figure class
in matplotlib.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 58. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 60. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 3. A ------------ chart is a circular plot divided into slices to show numerical
proportions.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 4. The ------------ type of plot shows all individual data points without connecting
them with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A:1
B : -1
C:0
D:2
Q.no 8. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 10. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Bayes Theorem
B : Pythagorean Theorem
Q.no 12. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 13. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : imsave()
B : imread()
C : save()
D : isave()
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 16. ---------------- library from python provides efficient versions of a large
number of machine learning algorithms.
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 18. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. ---------------- is about developing code to enable the machine to learn to
perform tasks; its basic principle is the automatic modeling of the underlying processes that
have generated the collected data.
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 21. -------- is the measure of the likelihood that an event will occur in a
random experiment.
A : Probability
B : Correlation
C : Regression
D : Sample
B : Support
C : Confidence
D : lift
A:3
B:5
C:1
D : 10
A : ndimage
B : ndarray
C : signal
D : io
Q.no 25. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 26. The ---------- function is used to get the element-wise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
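Worked illustration for Q.no 26: a short sketch of numpy.mod on two arrays, assuming NumPy is installed. The result takes the sign of the divisor, as with Python's % operator, which is why -7 mod 3 gives 2.

```python
import numpy as np

x1 = np.array([7, -7, 9])
x2 = np.array([3, 3, 4])

# Element-wise remainder of x1 / x2; the sign follows the divisor.
remainders = np.mod(x1, x2)
print(remainders)
```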
Q.no 27. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 28. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 29. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 31. If X and Y are both independent of each other, then correlation
coefficient is ---------
A:1
B : -1
C:0
D:2
Q.no 32. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 33. Among the following clustering algorithm types in which of the following
type the notion of similarity is derived by the closeness of a data point to the
centroid of the clusters.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A:0
B : -1
C:1
D : -2
Q.no 35. ------- changes the arrangement of items in an array so that the shape of the
array changes while maintaining the same number of dimensions.
A : numpy.reshape()
B : numpy.empty()
C : numpy.flatten()
D : numpy.ravel()
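Worked illustration for Q.no 35: a short sketch of numpy.reshape, assuming NumPy is installed. The 2 x 3 array is rearranged into 3 x 2; the element count and the number of dimensions stay the same.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]
reshaped = np.reshape(grid, (3, 2))  # same six items, new shape
print(reshaped)
```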
B : YouTube data
A : KNN
C : Decision trees
D : Cluster analysis
A : XML data
B : YouTube data
A : class distribution
B : test on an attribute
D : class labels
Q.no 41. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 44. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 47. To reach the final point and make a prediction, a decision tree must
be traversed from ----------
A : Top to bottom
B : Bottom to top
C : Left to right
D : Right to left
Q.no 48. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 49. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 51. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 52. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 54. The ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 57. Determining the basic salary of an employee when his qualification is given is
a ----------- problem.
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 58. The statement subplot(4,3,5) will divide the figure into a ------- grid and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 59. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Answer for Question No 1. is a
MCQ No - 2
What are the main components of Big Data?
(A) MapReduce
(B) HDFS
(C) YARN
(D) All of these
Answer
D
MCQ No - 3
What are the different features of Big Data Analytics?
(A) Open-Source
(B) Scalability
(C) Data Recovery
(D) All the above
Answer
D
MCQ No - 4
According to analysts, for what can traditional IT systems provide a foundation when
they’re integrated with big data technologies like Hadoop?
(A) Big data management and data mining
(B) Data warehousing and business intelligence
(C) Management of Hadoop clusters
(D) Collecting and storing unstructured data
Answer
A
MCQ No - 5
What are the four V’s of Big Data?
(A) Volume
(B) Velocity
OptimusPrime Page 1
(C) Variety
(D) All the above
Answer
D
Answer
B
MCQ No - 7
___________ is general-purpose computing model and runtime system for distributed data
analytics.
(A) Mapreduce
(B) Drill
(C) Oozie
(D) None of the above
Answer
A
MCQ No - 8
The examination of large amounts of data to see what patterns or other useful information
can be found is known as
(A) Data examination
(B) Information analysis
(C) Big data analytics
(D) Data analysis
Answer
C
MCQ No - 9
Big data analysis does the following except
(A) Collects data
(B) Spreads data
(C) Organizes data
(D) Analyzes data
Answer
B
MCQ No - 10
What makes Big Data analysis difficult to optimize?
(A) Big Data is not difficult to optimize
(B) Both data and cost effective ways to mine data to make business sense out of it
(C) The technology to mine data
(D) All of the above
Answer
B
The new source of big data that will trigger a Big Data revolution in the years to come is
(A) Business transactions
(B) Social media
(C) Transactional data and sensor data
(D) RDBMS
Answer
C
MCQ No - 12
The unit of data that flows through a Flume agent is
(A) Log
(B) Row
(C) Event
(D) Record
Answer
C
MCQ No - 13
Listed below are the three steps that are followed to deploy a Big Data Solution except
(A) Data Ingestion
(B) Data Processing
(C) Data dissemination
(D) Data Storage
Answer
C
MCQ No - 14
Choose the best answer: which industries employ so-called "Big Data"
in their day-to-day operations?
(A) Weather forecasting
(B) Marketing
(C) Healthcare
(D) All of the above
Answer
D
MCQ No - 15
There are almost as many bits of information in the digital universe as there are stars in
the actual universe?
(A) True
(B) False
Answer
A
MCQ No - 16
The word 'Big data' was coined by
(A) Roger Mougalas
(B) John Philips
(C) Simon Woods
(D) Martin Green
Answer
A
MCQ No - 17
The word 'Big Data' was coined in the year
(A) 2000
(B) 1970
(C) 1998
(D) 2005
Answer
C
MCQ No - 18
Concerning the Forms of Big Data, which one of these is odd?
(A) Structured
(B) Unstructured
(C) Processed
(D) Semi-Structured
Answer
C
MCQ No - 19
Big Data applications benefit the media and entertainment industry by
(A) Predicting what the audience wants
(B) Ad targeting
(C) Scheduling optimization
(D) All of the above
Answer
D
MCQ No - 20
The feature of big data that refers to the quality of the stored data is ______
(A) Variety
(B) Volume
(C) Variability
(D) Veracity
Answer
D
Question 1
a) The distance between categories is equal across the range of interval/ratio data.
Question 2
Question 3
Question 4
Correct answer:
b) It summarizes the frequencies of two variables so that they can be compared.
Question 5
If there were a perfect positive correlation between two interval/ratio variables, the
Pearson's r test would give a correlation coefficient of:
Correct answer:
b) +1
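Worked illustration for Question 5: the sketch below computes Pearson's r from its definition for a perfectly linear, increasing relationship, which gives +1 (up to floating-point rounding). The function name and the toy data (hours, score) are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: score rises by exactly 2 for every extra hour.
hours = [1, 2, 3, 4, 5]
score = [2 * h + 1 for h in hours]
r_positive = pearson_r(hours, score)
r_negative = pearson_r(hours, [-h for h in hours])
print(r_positive, r_negative)
```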
Question 6
What is the name of the test that is used to assess the relationship between two ordinal variables?
Correct answer:
a) Spearman's rho
Question 7
Correct answer:
d) All of the above.
Question 8
Correct answer:
c) A relationship that appears to be true because each variable is related to a third one.
Question 9
Correct answer:
d) generalising their findings from the sample to the population.
Question 10
---------------------------------------------------------------------------------------------------------------------
SET 2 MCQs
---------------------------------------------------------------------------------------------------------------------
2. Which of the following is not a major data analysis approach?
A. Data Mining
B. Predictive Intelligence
C. Business Intelligence
D. Text Analytics
Ans : B
4. In descriptive statistics, data from the entire population or a sample is summarized with ?
A. integer descriptors
B. floating descriptors
C. numerical descriptors
D. decimal descriptors
Ans : C
7. The goal of business intelligence is to allow easy interpretation of large volumes of data to
identify new opportunities.
A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
Ans : A
8. The branch of statistics which deals with development of particular statistical methods is
classified as
A. industry statistics
B. economic statistics
C. applied statistics
D. applied statistics
Ans : D
---------------------------------------------------------------------------------------------------------------------
SET 3 MCQs
---------------------------------------------------------------------------------------------------------------------
Two group means are equal.
What must you include when applying Wilcoxon Rank sum test?
Answer
“Critical Value”, “Rank sum”
What are the two types of variance which can occur in your data?
Answer
Between and within groups
Clustering extracts the known patterns from the existing data, True or False?
Answer
False
bottom-up
K Means is _____
Answer
Centroid based method
WSS metric is the sum of the squares of the distances between each data point and the_____.
Answer
closest centroid
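Worked illustration of the WSS metric: the sketch below sums, over all points, the squared Euclidean distance to the closest centroid. It is a minimal sketch with hypothetical names (wss, toy_points, toy_centroids), not the routine of any specific clustering library.

```python
from math import dist


def wss(points, centroids):
    """Within sum of squares: squared distance from each point to its closest centroid."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)


# Hypothetical data: two tight groups, each one unit away from its centroid.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
toy_centroids = [(0, 1), (10, 1)]
total = wss(toy_points, toy_centroids)
print(total)
```

Plotting this value for increasing k and looking for the bend in the curve is the usual way the "elbow" is located.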
Once the clusters are identified, it is often useful to label them in a descriptive way.True or
False?
Answer
True
The process of identifying the appropriate value of k is referred to as finding the_____.
Answer
elbow
A _____ is a decision support tool that uses a tree-like graph or model of decisions and their
possible consequences, including chance event outcomes, resource costs, and utility
Answer
Decision tree
What is true about Data Visualization?
Answer
All of the above
Which one of the following is most basic and commonly used techniques?
Answer
“Line charts”
When a client contacts the namenode for accessing a file, the namenode responds with____
Answer
Block Id and hostname of all the data nodes containing that block.
---------------------------------------------------------------------------------------------------------------------
SET 4 MCQs
---------------------------------------------------------------------------------------------------------------------
2)
According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a Explanation: Big data management and data mining
3)
What are the main components of Big Data?
a)MapReduce
b)HDFS
c)YARN
d)All of these
Ans: d Explanation: All of these
4)
The sources of Big Data are
a)Stock Exchange
b)Transport Data
c) Banking Data
d) All of the Above
Ans: d Explanation:
5)
Big Data Characteristics are:
a) Structured data
b) Semi-structured data
c) Quasi-structured data
d) All of the above
Ans: d Explanation:
6)
BI tends to provide reports, dashboards, and queries on business questions for the current period
or in the past.
a) True
b) False
Ans: a Explanation:
7)
Big data can come in multiple forms, including structured and non-structured data
a) True
b) False
Ans: a Explanation:
8)
BI problems tend to require highly structured data organized
a) Rows
b) Columns
c) Accurate Reporting
d) All of the Above
Ans: d Explanation:
9)
EDW achieves the objective of reporting and sometimes the creation of dashboards, perform
analysis on unstructured data
a) High-value data is hard to reach and leverage
b) Data moves in batches from EDW to local analytical tools
c) Data Science projects will remain isolated
d) All of the Above
Ans: d Explanation:
10)
Drivers of Big Data
a) Medical information
b) Photos and video footage uploaded to the World Wide Web
c) data extracts
d) Both a and b
Ans: d Explanation:
11)
According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a Explanation:
12)
Select the option that is not a phase of data analytics.
a) model planning
b) testing
c) discovery
d) operationalize
Ans: b Explanation:
13)
Which phase of data analytics requires the most time to complete?
a) Data preparation
b) model building
c) communicate results
d) Discovery
Ans: a Explanation:
14)
What is analytic sandbox?
a) Tool
b) Separate repository
c) data cleaning
d) Data conditioning
Ans: b Explanation:
15)
The person who provides analytic techniques and modeling is called the
a) Data Engineer
b) Data scientist
c) Business user
d) Project manager
Ans: b Explanation:
16)
What is the task of the project manager?
a) analytic modelling
b) Provide requirement
c) ensure meeting objectives
d) creates DB environment
Ans: c
17)
In which phase is the task of identifying key stakeholders performed?
a) Data preparation
b) model building
c) Discovery
d) communicate results
Ans: c Explanation:
18)
ETL process is performed in which phase
a) Discovery
b) communicate results
c) model planning
d) Data preparation
Ans: d Explanation:
19)
How much data do data science teams prefer for analysis?
a) too little
b) average
c) more
d) more than average
Ans: c Explanation:
20)
Select the tool that is not used in the model planning phase.
a) Data wrangler
b) R
c) SQL Analysis service
d) SAS/ACCESS
Ans: c Explanation:
21)
Determining whether reports and dashboards will be impacted and need to change is a task performed by the
a) Project sponsor
b) BI Analyst
c) Data Engineer
d) Project manager
Ans: b Explanation:
22)
What is the need for the data analytics lifecycle?
a) Data cleaning
b) To solve Big data problems
c) Data conditioning
d) Data Exploration
Ans: b Explanation:
23)
How many phases are there in data analytic lifecycle?
a) 4
b) 5
c) 6
d) 7
Ans: c
24)
The person with technical skills is called the?
a) Business user
b) Data Engineer
c) Data scientist
d) Project sponsor
Ans: b
25)
What is the outcome of the model building phase?
a) Analytic results
b) Quality data
c) Data
d) Potential resources
Ans: a
2)
A hypothesis that is tested for rejection under the assumption that it is true is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: a Explanation:
3)
A statement whose validity is tested on the basis of a sample is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: b Explanation:
4)
A hypothesis which defines the population distribution is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: c Explanation:
5)
If the null hypothesis is false then which of the following is accepted?
a) Null Hypothesis
b) Positive Hypothesis
c) Negative Hypothesis
d) Alternative Hypothesis.
Ans: d Explanation:
6)
The rejection probability of Null Hypothesis when it is true is called as?
a) Level of Confidence
b) Level of Significance
c) Level of Margin
d) Level of Rejection
Ans: b Explanation:
7)
The point where the Null Hypothesis gets rejected is called as?
a) Significant Value
b) Rejection Value
c) Acceptance Value
d) Critical Value
Ans: d Explanation:
8)
If the Critical region is evenly distributed then the test is referred as?
a) Two tailed
b) One tailed
c) Three tailed
d) Zero tailed
Ans: a Explanation:
9)
The type of test is defined by which of the following?
a) Null Hypothesis
b) Simple Hypothesis
c) Alternative Hypothesis
d) Composite Hypothesis
Ans: c Explanation:
10)
Which of the following is defined as the rule or formula to test a Null Hypothesis?
a) Test statistic
b) Population statistic
c) Variance statistic
d) Null statistic
Ans: a Explanation:
11)
Type 1 error occurs when?
a) We reject H0 if it is True
b) We reject H0 if it is False
c) We accept H0 if it is True
d) We accept H0 if it is False
Ans: a Explanation:
12) The probability of Type 1 error is referred as?
a) 1-α
b) β
c) α
d) 1-β
Ans: c Explanation:
13)
Alternative Hypothesis is also called as?
a) Composite hypothesis
b) Research Hypothesis
c) Simple Hypothesis
d) Null Hypothesis
Ans: b Explanation:
14)
Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: d Explanation:
15)
Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
Ans: c Explanation:
16)
Hierarchical clustering should be primarily used for exploration.
a) True
b) False
Ans: a Explanation:
17)
Which of the following function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
Ans: a Explanation:
18)
Which of the following clustering requires merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
Ans: b Explanation:
19)
K-means is not deterministic and it also consists of number of iterations.
a) True
b) False
Ans: a
20)
Depending on acceptance and rejection of null hypothesis there are 2 types of error produced
a) Type 1
b) Type 2
c) None of these
d) All of these
Ans: d
21)
The power of a test can be defined as a possibility of …
a) Rejecting null hypothesis
b) Accepting null hypothesis
c) Increasing null hypothesis
d) Decreasing null hypothesis
Ans: a
22)
For a fixed significance level, a greater sample size is mandatory to discover a
a) Minor difference in mean
b) Major difference in mean
c) Average difference in mean
d) None of the above
Ans: a
23)
ANOVA tests if any of the population means vary from other population means
a) True
b) False
Ans: a
24)
Clustering is defined as group of same kind of objects which are gathered by use of
a) Unsupervised method
b) Supervised method
c) Semi supervised method
d) None of these
Ans: a
25)
Following are the applications of Kmeans
a) Image Processing
b) Medical
c) Customer Segmentation
d) All of the above
Ans: d
---------------------------------------------------------------------------------------------------------------------
SET 5 MCQs
---------------------------------------------------------------------------------------------------------------------
Explanation: Data in petabytes, i.e. of the order of 10^15 bytes, is called Big Data.
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
7. What percentage of the world’s total data has been created within just the past two years?
A. 80%
B. 85%
C. 90%
D. 95%
Ans : C
Explanation: 90% of the world’s total data has been created within just the past two years.
8) Which of the following step is performed by data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Ans: Data Cleansing
10. Communicative and collaborative is one among the key skill sets and behavioral
characteristics of a data scientist [True / False]?
a. True
b. False
Answer : a
11. ---------- are the sources of Bigdata [select all that apply]
I. Book
II. Facebook
III. Genome sequence
IV. Video Surveillance
Ans:
12. BI analyses past data and makes future predictions. True/False?
a. True
b. False
Answer : b
12. In which phase of data analytics is ETLT performed?
Ans: Phase 2. Data preparation is done in this phase. An analytic sandbox is used in this phase to
perform analytics for the entire duration of the project. You explore, preprocess, and condition the
data; modeling follows. To get the data into the sandbox, you perform ETLT (extract, transform,
load, and transform).
A. Discovery
B. Model Planning
C. Model Building
D. Data Preparation
14. In which phase would the team expect to invest most of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
15. In which phase would the team expect to invest least time of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
16. Which of the following tools is used for model building?
a. Hadoop b. Octave c. OpenRefine d. All of Above
Ans B
17. Which of the following tools is used for data preparation?
a. Alpine Miner b. Excel c. Matlab d.Weka
Ans . A
18. To determine if the project was completed on time and within budget, is the key role of
_____
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
20. In data Analytics life cycle we can move back and refine the work done. True or False
A. True
B. False
22. ________ provides subject matter expertise for analytical techniques, data modeling, and
applying valid analytical techniques to given business problems.
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
---------------------------------------------------------------------------------------------------------------------
SET 5 MCQs
---------------------------------------------------------------------------------------------------------------------
2. Any hypothesis which is tested for the purpose of rejection under the assumption that it is true
is called:
(a) Null hypothesis
(b) Alternative hypothesis
(c) Statistical hypothesis
(d) Composite hypothesis
Answer : a
3. A statement that is accepted if the sample data provide sufficient evidence that the null
hypothesis is false is called:
(a) Simple hypothesis
(b) Composite hypothesis
(c) Statistical hypothesis
(d) Alternative hypothesis
Answer : d
6. If the critical region is located equally in both sides of the sampling distribution of the
test statistic, the test is called:
(a) One tailed
(b) Two tailed
(c) Right tailed
(d) Left tailed
Answer : b
(d) Difficult to tell
Answer : b
10. A formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic
(b) Population statistic
(c) Both of these
(d) None of the above
Answer : a
14. In an unpaired samples t-test with sample sizes n1 = 11 and n2 = 11, the value of tabulated t
should be obtained for:
(a) 10 degrees of freedom
(b) 21 degrees of freedom
(c) 22 degrees of freedom
(d) 20 degrees of freedom
Answer : d
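Worked check for question 14: assuming the classical pooled (equal-variance) two-sample t-test, the degrees of freedom are n1 + n2 - 2. The function name below is hypothetical; a Welch-type test would use a different formula.

```python
def pooled_t_degrees_of_freedom(n1, n2):
    """Degrees of freedom for the pooled (equal-variance) unpaired t-test."""
    return n1 + n2 - 2


df = pooled_t_degrees_of_freedom(11, 11)
print(df)
```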
15. The purpose of statistical inference is:
(a) To collect sample data and use them to formulate hypotheses about a population
(b) To draw conclusions about populations and then collect sample data to support the conclusions
(c) To draw conclusions about populations from sample data
(d) To draw conclusions about the known value of population parameter
Answer : c
16. The histogram to the right represents the hospital length of stay (in days) for patients at a
nearby medical facility. How many patients are included in the histogram?
a. 5
b. 21
c. 17
d. 9
Answer : b
17. Using the histogram to the right that represents the hospital lengths of stay (in days) for
patients at a nearby medical facility, determine the relationship between the mean and the median.
a. Mean = Median
b. Mean ≈ Median
c. Mean < Median
d. Mean > Median
Answer : d
18. The statement “If there is sufficient evidence to reject a null hypothesis at the 10%
significance level, then there is sufficient evidence to reject it at the 5% significance level” :
Please select the best answer of those provided below.
a. Always True
b. Never True
c. Sometimes True; the p-value for the statistical test needs to be provided for a conclusion
d. Not Enough Information; this would depend on the type of statistical test used
Answer : c
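Worked illustration for question 18: the sketch below applies the usual decision rule (reject when the p-value is at most the significance level) to two hypothetical p-values. It shows why rejection at 10% only sometimes implies rejection at 5%.

```python
def rejected_at(p_value, alpha):
    """Usual decision rule: reject the null hypothesis when p_value <= alpha."""
    return p_value <= alpha


# Hypothetical p-values from two different tests.
for p in (0.07, 0.03):
    print(p, rejected_at(p, 0.10), rejected_at(p, 0.05))
```

With p = 0.07 the null hypothesis is rejected at the 10% level but not at the 5% level; with p = 0.03 it is rejected at both.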
d) all of the mentioned
Ans: defined distance metric, number of clusters, initial guess as to cluster centroids
25) Considering the K-means algorithm, after current iteration, we have 3 centroids (0, 1) (2, 1),
(-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?
a) Yes
b) No
Ans: Yes
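Worked check for question 25: the sketch below assigns each point to its nearest centroid by Euclidean distance, which is the assignment step of K-means. The helper name nearest_centroid is hypothetical; the coordinates are those given in the question.

```python
from math import dist

centroids = [(0, 1), (2, 1), (-1, 2)]


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


cluster_a = nearest_centroid((2, 3), centroids)
cluster_b = nearest_centroid((2, 0.5), centroids)
print(cluster_a, cluster_b)
```

Both points are closest to the centroid (2, 1), at distances 2 and 0.5 respectively, so they land in the same cluster.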
27) The most commonly used measure of similarity is the _____ or its square.
a)euclidean distance
b)city-block distance
c)Chebychev’s distance
d)Manhattan distance
Ans: euclidean distance
30) Clustering is a-
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. None
Ans: Unsupervised learning
31) Which of the following clustering algorithms suffers from the problem of convergence at
local optima?
A. K- Means clustering
B. Hierarchical clustering
C. Diverse clustering
D. All of the above
Ans: K- Means clustering, Hierarchical clustering, Diverse clustering
33) Which of the following is a bad characteristic of a dataset for clustering analysis-
A. Data points with outliers
B. Data points with different densities
C. Data points with non-convex shapes
D. All of the above
Ans: Data points with outliers, Data points with different densities, Data points with non-convex
shapes
B. Unlabeled data
C. Numerical data
D. Categorical data
Ans: Labeled Data
42. Type 2 is also called as
a. False Positive
b. False negative
c. True Positive
d. True negative
Q.25 What are the two types of variance which can occur in your data?
a. Independent and Dependent
b. Between and within groups
c. Personal and interpersonal
d. Anova and Anoca
Q.26 If the between-group mean sum of squares increases, the value of the F statistic _____
a. Increases
b. Decreases
c. Neutral
d. None of these
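To see how between-group variability drives the F statistic in Q.26, the sketch below computes a one-way ANOVA F value by hand (mean square between groups divided by mean square within groups). The two datasets are invented for illustration: they share the same within-group spread and differ only in how far apart the group means are.

```python
def f_statistic(groups):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: identical within-group spread, different group separation.
close_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
far_groups = [[1, 2, 3], [5, 6, 7], [9, 10, 11]]

f_close = f_statistic(close_groups)
f_far = f_statistic(far_groups)
print(f_close, f_far)  # 3.0 48.0
```

With the within-group variability held fixed, moving the group means further apart raises F.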
------------------------------------------------------------------------------------------ ---------------------------
SET 6 MCQs
---------------------------------------------------------------------------------------------------------------------
3. An itemset whose support is greater than or equal to a minimum support threshold is ______
(A) Itemset
(B) Frequent Itemset
(C) Infrequent items
(D) Threshold values
Ans: B
8. Which of the following methods do we use to find the best fit line for data in Linear Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans:A
9. A local retailer has a database that stores 10,000 transactions from last summer. After
analyzing the data, a data science team has identified the following statistics:
• {battery} appears in 6,000 transactions.
• {sunscreen} appears in 5,000 transactions.
• {sandals} appears in 4,000 transactions.
• {bowls} appears in 2,000 transactions.
• {battery, sunscreen} appears in 1,500 transactions.
• {battery, sandals} appears in 1,000 transactions.
• {battery, bowls} appears in 250 transactions.
• {battery, sunscreen, sandals} appears in 600 transactions.
Q) What are the confidence values of {battery} -> {sunscreen} and {battery, sunscreen} -> {sandals}?
a) 0.3 and 0.4
b) 0.25 and 0.4
c) 0.25 and 0.15
d) 0.6 and 0.4
Ans: b
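The two confidence values in question 9 follow from confidence(X -> Y) = count(X and Y together) / count(X). A minimal sketch using the transaction counts quoted in the question (the function name `confidence` is an illustrative choice):

```python
# Transaction counts copied from question 9.
counts = {
    frozenset({"battery"}): 6000,
    frozenset({"battery", "sunscreen"}): 1500,
    frozenset({"battery", "sunscreen", "sandals"}): 600,
}

def confidence(antecedent, consequent, counts):
    """confidence(X -> Y) = count(X union Y) / count(X)."""
    x = frozenset(antecedent)
    return counts[x | frozenset(consequent)] / counts[x]

conf_1 = confidence({"battery"}, {"sunscreen"}, counts)
conf_2 = confidence({"battery", "sunscreen"}, {"sandals"}, counts)
print(conf_1, conf_2)  # 0.25 0.4
```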
11. If a linear regression model fits perfectly, i.e., train error is zero, then
_____________________
a) Test error is also always zero
b) Test error is non zero
c) Couldn’t comment on Test error
d) Test error is equal to Train error
Ans:C
12. Which of the following metrics can be used for evaluating regression models?
i) R Squared
ii) Adjusted R Squared
iii) F Statistics
iv) RMSE / MSE / MAE
a) ii and iv
b) i and ii
c) ii, iii and iv
d) i, ii, iii and iv
Ans:d
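Most of the metrics listed in question 12 can be computed directly from paired observed and predicted values. The sketch below covers R squared, adjusted R squared, MSE, RMSE and MAE (the F statistic is left out to keep it short); the sample values and the function name are assumptions made for illustration.

```python
def regression_metrics(y_true, y_pred, n_predictors=1):
    """Return R^2, adjusted R^2, MSE, RMSE and MAE for paired observations."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": mse ** 0.5, "mae": mae}

# Hypothetical observed and predicted values.
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)
```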
13. How many coefficients do you need to estimate in a simple linear regression model (one
independent variable)?
a) 1
b) 2
c) 3
d) 4
Ans:b
14. In a simple linear regression model (one independent variable), if we change the input
variable by 1 unit, how much will the output variable change?
a) by 1
b) no change
c) by intercept
d) by its slope
Ans:d
17. In the mathematical equation of linear regression Y = β1 + β2X + ϵ, (β1, β2) refers to
__________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Ans:c
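The two coefficients of simple linear regression (Y-intercept β1 and slope β2) can be estimated with ordinary least squares. The sketch below uses the closed-form formulas on made-up data generated from y = 1 + 2x; the function name is an illustrative choice.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

Increasing x by one unit changes the prediction by exactly the slope, which matches the answer to question 14.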
19. The square of the correlation coefficient, r², is always non-negative and is called the
________
a) Regression
b) Coefficient of determination
c) KNN
d) Algorithm
Ans:b
20. Predicting y for a value of x that’s outside the range of values we actually saw for x in the
original data is called ___________
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:b
21. What is predicting y for a value of x that is within the interval of points that we saw in the
original data called?
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:c
22. ________ is a simple approach to supervised learning. It assumes that the dependence of Y
on X1, X2, . . . Xp is linear.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
23. Although it may seem overly simplistic, _______ is extremely useful both conceptually and
practically.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
24. __________ refers to a group of techniques for fitting and studying the straight-line
relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
---------------------------------------------------------------------------------------------------------------------
SET 7 MCQs
---------------------------------------------------------------------------------------------------------------------
B. cleaning data
C. transforming data
D. All of the above
Ans : D
4. In descriptive statistics, data from the entire population or a sample is summarized with ?
A. integer descriptors B. floating descriptors C. numerical descriptors D. decimal descriptors
Ans : C
7. The goal of business intelligence is to allow easy interpretation of large volumes of data to
identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
8. The branch of statistics which deals with development of particular statistical methods is
classified as
A. industry statistics B. economic statistics C. applied statistics D. mathematical statistics
Ans : D
B. estimating numerical characteristics of the data
C. modeling relationships within the data
D. describing associations within the data
Ans : C
1. What is a hypothesis?
a. A statement that the researcher wants to test through the data
collected in a study.
b. A research question the results will answer.
c. A theory that underpins the study.
d. A statistical method for calculating the extent to which the results
could have happened by chance.
Answer: a
d. Coding
Answer: c
12. A set of data organised in a participants (rows)-by-variables (columns) format is known as a “data set.”
a. True
b. False
Answer: a
13. A graph that uses vertical bars to represent data is called a ___
a. Line graph
b. Bar graph
c. Scatterplot
d. Vertical graph
Answer: b
14. ___________ are used when you want to visually examine the
relationship between two quantitative variables.
a. Bar graphs
b. Pie graphs
c. Line graphs
d. Scatterplots
Answer: d
b. Statistical Hypothesis
c. Simple Hypothesis
d. Composite Hypothesis
Answer: a
---------------------------------------------------------------------------------------------------------------------
SET 8 MCQs
---------------------------------------------------------------------------------------------------------------------
D - FIFO scheduler.
Q 12 - Which one of the following stores data?
A - Name node
B - Data node
C - Master node
D - None of these
Q 14 - What is AVRO?
A - Avro is a java serialization library.
B - Avro is a java compression library.
C - Avro is a java library that creates splittable files.
D - None of these answers are correct.
Q 17 - What is writable?
A - Writable is a java interface that needs to be implemented for streaming data to remote
servers.
B - Writable is a java interface that needs to be implemented for HDFS writes.
C - Writable is a java interface that needs to be implemented for MapReduce processing.
D - None of these answers are correct.
Q 18 - What is HBASE?
A - HBase is a separate set of Java APIs for the Hadoop cluster.
B - HBase is a part of the Apache Hadoop project that provides an interface for scanning large
amounts of data using Hadoop infrastructure.
D - HBase is a part of the Apache Hadoop project that provides a SQL like interface for data
processing.
Q 20 - When using HDFS, what occurs when a file is deleted from the command line?
A - It is permanently deleted if trash is enabled.
B - It is placed into a trash directory common to all users for that cluster.
C - It is permanently deleted and the file attributes are recorded in a log file.
D - It is moved into the trash directory of the user who deleted it if trash is enabled.
Q 21 - When archiving Hadoop files, which of the following statements are true?
Choose two answers
1. Archived files will display with the extension .arc.
2. Many small files will become fewer large files.
3. MapReduce processes the original files names even after files are archived.
4. Archived files must be unarchived for HDFS and MapReduce to access the
original, small files.
5. Archive is intended for files that need to be saved but no longer accessed by
HDFS.
A-1&3
B-2&3
C-2&4
D-3&4
Q 22 - When writing data to HDFS what is true if the replication factor is three?
Choose 2 answers
1. Data is written to DataNodes on three separate racks if RackAware.
2. The Data is stored on each DataNode with a separate file which contains a
checksum value.
3. Data is written to blocks on three different DataNodes.
4. The Client is returned with a success upon the successful writing of the first
block and checksum check.
A-1&3
B-2&3
C-3&4
D-1&4
Q 23 - Which of the following are among the duties of the Data Nodes in HDFS?
A - Maintain the file system tree and metadata for all files and directories.
B - None of the options is correct.
C - Control the execution of an individual map task or a reduce task.
D - Store and retrieve blocks when told to by clients or the NameNode.
E - Manage the file system namespace.
Q 24 - Which of the following components retrieves the input splits directly from
HDFS to determine the number of map tasks?
A - The NameNode.
B - The TaskTrackers.
C - The JobClient.
D - The JobTracker.
E - None of the options is correct.
Q 27 - Which one of the following statements is false regarding the Distributed Cache?
A - The Hadoop framework will ensure that any files in the Distributed Cache are distributed to
all map and reduce tasks.
B - The files in the cache can be text files, or they can be archive files like zip and JAR files.
C - Disk I/O is avoided because data in the cache is stored in memory.
D - The Hadoop framework will copy the files in the Distributed Cache on to the slave node
before any tasks for the job are executed on that node.
A - Compare the keys by byte.
B - Performance can be improved in the sort and shuffle phase by using RawComparator.
C - Intermediary keys are deserialized to perform a comparison.
Q 31 - Keys from the output of shuffle and sort implement which of the following
interfaces?
A - Writable.
B - WritableComparable.
C - Configurable.
D - ComparableWritable.
E - Comparable.
Answer Key :
1 A
2 B
3 A
4 A
5 D
6 B
7 A
8 B
9 C
10 D
11 D
12 B
13 A
14 A
15 A
16 B
17 C
18 B
19 C
20 C
21 B
22 C
23 D
24 D
25 A
26 B
27 C
28 B
29 C
30 D
31 B
32 C
---------------------------------------------------------------------------------------------------------------------
Q 4 - Which scenario demands highest bandwidth for data transfer between nodes in
Hadoop?
A - Different nodes on the same rack
B - Nodes on different racks in the same data center.
C - Nodes in different data centers
D - Data on the same node.
Q 5 - The current block location of HDFS where data is being written to,
A - is visible to the client requesting for it.
B - Block locations are never visible to client requests.
C - May or may not be visible to the reader.
D - becomes visible only after the buffered data is committed.
Q 10 - The hdfs command to create the copy of a file from a local system is
A - CopyFromLocal
B - copyfromlocal
C - CopyLocal
D - copyFromLocal
Q 13 - When the namenode finds that some blocks are over replicated, it
A - Stops the replication job in the entire hdfs file system.
B - It slows down the replication process for those blocks
C - It deletes the extra blocks.
D - It leaves the extra blocks as it is.
Q 19 - The information mapping data blocks with their corresponding files is stored in
A - Data node
B - Job Tracker
C - Task Tracker
D - Namenode
Q 20 - The file in Namenode which stores the information mapping the data block
location with file name is −
A - dfsimage
B - nameimage
C - fsimage
D - image
Q 21 - The namenode knows that the datanode is active using a mechanism known as
A - heartbeats
B - datapulse
C - h-signal
D - Active-pulse
Q 24 - Which of the below Apache systems deals with ingesting streaming data to
Hadoop?
A - Oozie
B - Kafka
C - Flume
D - Hive
D - Report the activity of various components handled by resource manager
Q 28 - The Zookeeper
A - Detects the failure of the namenode and elects a new namenode.
B - Detects the failure of datanodes and elects a new datanode.
C - Prevents the hardware from overheating by shutting them down.
D - Maintains a list of all the components IP address of the Hadoop cluster.
Q 30 - When a client contacts the namenode for accessing a file, the namenode
responds with
A - Size of the file requested.
B - Block ID of the file requested.
C - Block ID and hostname of any one of the data nodes containing that block.
D - Block ID and hostname of all the data nodes containing that block.
Q 32 - The Hadoop tool used for uniformly spreading the data across the data nodes is
named −
A - Scheduler
B - Balancer
C - Spreader
D - Reporter
Answer Key :
1 B
2 A
3 C
4 C
5 D
6 A
7 B
8 B
9 C
10 D
11 B
12 D
13 C
14 B
15 A
16 C
17 D
18 B
19 D
20 C
21 A
22 A
23 B
24 C
25 B
26 B
27 B
28 A
29 B
30 D
31 D
32 B
33 A
---------------------------------------------------------------------------------------------------------------------
B - It is aware of the mapping between the node and the rack
C - It is aware of the number of nodes in each of the rack
D - It is aware which data nodes are unavailable in the cluster.
Q 11 - Running start-dfs.sh results in
A - Starting namenode and datanode
B - Starting namenode only
C - Starting datanode only
D - Starting namenode and resource manager
Q 15 - hadoop fs -expunge
A - Gives the list of datanodes
B - Used to delete a file
C - Used to exchange a file between two datanodes.
D - Empties the trash.
Q 18 - The command used to copy a directory from one node to another in HDFS is
A - rcp
B - dcp
C - drcp
D - distcp
Q 23 - When you increase the number of files stored in HDFS, the memory required by the
namenode
A - Increases
B - Decreases
C - Remains unchanged
D - May increase or decrease
Q 24 - If we increase the size of files stored in HDFS without increasing the number of
files, then the memory required by namenode
A - Decreases
B - Increases
C - Remains unchanged
D - May or may not increase
Q 26 - The decommission feature in hadoop is used for
A - Decommissioning the namenode
B - Decommissioning the data nodes
C - Decommissioning the secondary namenode.
D - Decommissioning the entire Hadoop cluster.
Q 27 - You can reserve the amount of disk usage in a data node by configuring
dfs.datanode.du.reserved in which of the following files?
A - Hdfs-site.xml
B - Hdfs-default.xml
C - Core-site.xml
D - Mapred-site.xml
Q 28 - The namenode loses its only copy of fsimage file. We can recover this from
A - Datanodes
B - Secondary namenode
C - Checkpoint node
D - Never
Q 29 - In a HDFS system with block size 64MB we store a file which is less than 64MB.
Which of the following is true?
A - The file will consume 64MB
B - The file will consume more than 64MB
C - The file will consume less than 64MB.
D - Can not be predicted.
B - Tasktracker to Job tracker
C - Jobtracker to namenode
D - Tasktracker to namenode
Answer Key :
1 C
2 A
3 B
4 B
5 B
6 A
7 B
8 D
9 B
10 A
11 A
12 C
13 D
14 C
15 D
16 A
17 C
18 D
19 B
20 C
21 D
22 D
23 A
24 A
25 C
26 B
27 A
28 C
29 C
30 A
31 C
32 A
33 B
---------------------------------------------------------------------------------------------------------------------
large volume of data stored in a storage area network SAN. As compared to HPC,
Hadoop
A - Can process a larger volume of data.
B - Can run on a larger number of machines than HPC cluster.
C - Can process data faster under the same network bandwidth as compared to HPC.
D - Cannot run compute intensive jobs.
Q 4 - What is the main problem faced while reading and writing data in parallel from
multiple disks?
A - Processing high volume of data faster.
B - Combining data from multiple disks.
C - The software required to do this task is extremely costly.
D - The hardware required to do this task is extremely costly.
Q 5 - Which of the following is true for disk drives over a period of time?
A - Data Seek time is improving faster than data transfer rate.
B - Data Seek time is improving more slowly than data transfer rate.
C - Data Seek time and data transfer rate are both increasing proportionately.
D - Only the storage capacity is increasing without increase in data transfer rate.
B - Only append at the end of file
C - Writing into a file only once.
D - Low latency data access.
Q 10 - HDFS block size is larger as compared to the size of the disk blocks so that
A - Only HDFS files can be stored in the disk used.
B - The seek time is maximum
C - Transfer of a large file made of multiple disk blocks is not possible.
D - A single file larger than the disk size can be stored across many disks in the cluster.
Q 11 - In a Hadoop cluster, what is true for a HDFS block that is no longer available
due to disk corruption or machine failure?
A - It is lost forever
B - It can be replicated from its alternative locations to other live machines.
C - The namenode allows new client request to keep trying to read it.
D - The Mapreduce job process runs ignoring the block and the data stored in it.
Q 12 - Which utility is used for checking the health of a HDFS file system?
A - fchk
B - fsck
C - fsch
D - fcks
Q 13 - Which command lists the blocks that make up each file in the filesystem?
A - hdfs fsck / -files -blocks
B - hdfs fsck / -blocks -files
C - hdfs fchk / -blocks -files
D - hdfs fchk / -files -blocks
Q 15 - In the local disk of the namenode the files which are stored persistently are −
A - namespace image and edit log
B - block locations and namespace image
C - edit log and block locations
D - Namespace image, edit log and block locations.
Q 16 - When a client communicates with the HDFS file system, it needs to
communicate with
A - only the namenode
B - only the data node
C - both the namenode and datanode
D - None of these
Q 19 - For the frequently accessed HDFS files the blocks are cached in
A - the memory of the datanode
B - in the memory of the namenode
C - Both A&B
D - In the memory of the client application which requested the access to these files.
A - Faster creation of the replicas of primary namenode.
B - To reduce the cycle time required to bring back a new primary namenode after existing
primary fails.
C - Prevent data loss due to failure of primary namenode.
D - Prevent the primary namenode from becoming a single point of failure.
Q 28 - The property used to set the default filesystem for Hadoop in core-site.xml is
A - filesystem.default
B - fs.default
C - fs.defaultFS
D - hdfs.default
B - 1
C - 0
D - 3
Answer Key :
1 C
2 A
3 D
4 B
5 B
6 C
7 C
8 B
9 C
10 D
11 B
12 B
13 A
14 B
15 A
16 C
17 A
18 D
19 A
20 C
21 C
22 B
23 B
24 D
25 B
26 D
27 C
28 C
29 C
30 B
31 D
32 D
Objective Questions (MCQ / True or False / Fill in the Blank with Choices)
1. Which of the following is not an example of social media?
    a. Twitter
    b. Google
    c. Insta
    d. YouTube
2. By 2025, the volume of digital data will increase to
    a. TB
    b. YB
    c. ZB
    d. EB
3. What is needed to draw insights for business?
    a. Collecting the data
    b. Storing the data
    c. Analysing the data
    d. All of the above
4. Facebook uses "Big Data" to implement the concept of Flashback. Is this true or false?
    a. TRUE
    b. FALSE
5. The process of describing data that is huge and complex to store and process is known as
    a. Analytics
    b. Data mining
    c. Big Data
    d. Data Warehouse
6. Data generated from online transactions is one example of the volume of big data. Is this true or false?
    a. TRUE
    b. FALSE
7. Velocity is the speed at which the data is processed.
    a. TRUE
    b. FALSE
8. _______ data have a structure but cannot be stored in a database.
    a. Structured
    b. Semi-Structured
    c. Unstructured
    d. None of these
9. _______ refers to the ability to make your data useful for business.
    a. Velocity
    b. Variety
    c. Value
    d. Volume
10. Value tells the trustworthiness of data in terms of quality and accuracy.
    a. TRUE
    b. FALSE
11. GFS consists of a _______ Master and _______ Chunk Servers.
    a. Single, Single
    b. Multiple, Single
    c. Single, Multiple
    d. Multiple, Multiple
12. Files are divided into _______ sized chunks.
    a. Static
    b. Dynamic
    c. Fixed
    d. Variable
13. _______ is an open source framework for storing data and running applications on clusters of commodity hardware.
    a. HDFS
    b. Hadoop
    c. MapReduce
    d. Cloud
14. HDFS stores how much data in each cluster that can be scaled at any time?
    a. 32
    b. 64
    c. 128
    d. 256
15. Hadoop MapReduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently. Is this statement true or false?
    a. TRUE
    b. FALSE
16. Hortonworks was introduced by Cloudera and owned by Yahoo.
    a. TRUE
    b. FALSE
17. Hadoop YARN is used for cluster resource management in the Hadoop ecosystem.
    a. TRUE
    b. FALSE
18. Google introduced the MapReduce programming model in 2004.
    a. TRUE
    b. FALSE
19. _______ phase sorts the data and _______ creates logical clusters.
    a. REDUCE, YARN
    b. MAP, YARN
    c. REDUCE, MAP
    d. MAP, REDUCE
20. There is only one operation between Mapping and Reducing. Is it true or false?
    a. TRUE
    b. FALSE
28. _______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
    a. HDFS
    b. MAP REDUCE
    c. HADOOP
    d. HIVE
30. _______ is a type of local Reducer that groups similar data from the map phase into identifiable sets.
    a. MAPPER
    b. REDUCER
    c. COMBINER
    d. PARTITIONER
31. While installing Hadoop, how many XML files are edited? List them.
    i. core-site.xml
    ii. hdfs-site.xml
    iii. mapred.xml
    iv. yarn.xml
32.
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>D:\hadoop\temp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:50071</value>
  </property>
</configuration>
33. Write the code for hdfs-site.xml.
Objective Questions (MCQ / True or False / Fill in the Blank with Choices)
1. Movie recommendation systems are an example of
    1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression
    a. 2 only
    b. 1 and 2
    c. 1 and 3
    d. 2 and 3
2. Sentiment analysis is an example of
    1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning
    a. 1, 2 and 4
    b. 1 and 3
    c. 1, 2 and 3
    d. 1 and 2
3. Can decision trees be used for performing clustering?
    a. True
    b. False
4. What is the minimum number of variables/features required to perform clustering?
    1. 0
    2. 1
    3. 2
    4. 3
5. For two runs of K-Means clustering, is it expected to get the same clustering results?
    1. Yes
    2. No
6. Which of the following can act as possible termination conditions in K-Means?
    1. For a fixed number of iterations.
    2. Assignment of observations to clusters does not change between iterations, except for cases with a bad local minimum.
    3. Centroids do not change between successive iterations.
    4. Terminate when RSS falls below a threshold.
    a. 1, 3 and 4
    b. 1, 2 and 3
    c. 1, 2 and 4
    d. All of the above
7. Which of the following algorithms is most sensitive to outliers?
    1. K-means clustering algorithm
    2. K-medians clustering algorithm
    3. K-modes clustering algorithm
    4. K-medoids clustering algorithm
8. After performing K-Means clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusions can be drawn from the dendrogram?
    a. There were 28 data points in clustering analysis
    b. The best no. of clusters for the analyzed data points is 4
    c. The proximity function used is Average-link clustering
    d. The above dendrogram interpretation is not possible for K-Means clustering analysis
9. In the figure below, if you draw a horizontal line on the y-axis at y = 2, what will be the number of clusters formed?
    1. 1
    2. 2
    3. 3
    4. 4
10. In which of the following cases will K-Means clustering fail to give good results?
    1. Data points with outliers
    2. Data points with different densities
    3. Data points with round shapes
    4. Data points with non-convex shapes
    a. 1 and 2
    b. 2 and 3
    c. 2 and 4
    d. 1, 2 and 4
11. Discrete variables and continuous variables are two types of
    a. Open end classification
    b. Time series classification
    c. Qualitative classification
    d. Quantitative classification
12. A Bayesian classifier is
    1. A class of learning algorithm that tries to find an optimum classification of a set of examples using probabilistic theory.
    2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
    3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
    4. None of these
13. Classification accuracy is
    1. A subdivision of a set of examples into a number of classes
    2. A measure of the accuracy of the classification of a concept that is given by a certain theory
    3. The task of assigning a classification to a set of examples
    4. None of these
14. Classification task refers to
    1. A subdivision of a set of examples into a number of classes
    2. A measure of the accuracy of the classification of a concept that is given by a certain theory
    3. The task of assigning a classification to a set of examples
    4. None of these
15. Euclidean distance measure is
    1. A stage of the KDD process in which new data is added to the existing selection.
    2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
    3. The distance between two points as calculated using the Pythagoras theorem
    4. None of these
16. _______ is good at handling missing data and supports both kinds of attributes (i.e., categorical and continuous attributes).
    a. ID3.
    b. C4.5.
    c. CART.
    d. Naïve Bayes.
17. Decision trees use _______, in that they always choose the option that seems the best available at that moment.
    a. Greedy Algorithms.
    b. Divide and Conquer.
    c. Backtracking.
    d. Shortest Path Method.
18. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
    a. TRUE
    b. FALSE
19. _______ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
    a. ID3
    b. Naïve Bayes classifiers
    c. CART
    d. None of these.
20. High entropy means that the partitions in classification are
    a. Pure
    b. Not pure
    c. Useful
    d. Useless
21. Which of the following statements about Naive Bayes is incorrect?
    a. Attributes are equally important.
    b. Attributes are statistically dependent of one another given the class value.
    c. Attributes are statistically independent of one another given the class value.
    d. Attributes can be nominal or numeric
22. The maximum value for entropy depends on the number of classes. If we have 8 classes, what will be the max entropy?
    a. Max Entropy is 1
    b. Max Entropy is 2
    c. Max Entropy is 3
    d. Max Entropy is 4
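The entropy questions above can be checked numerically. This sketch assumes Shannon entropy with base-2 logarithms, which reaches its maximum for a uniform class distribution; the probability lists are illustrative.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

max_entropy_8 = entropy([1 / 8] * 8)   # uniform over 8 classes
pure_entropy = entropy([1.0])          # a pure partition (single class)
print(max_entropy_8, pure_entropy)     # 3.0 0.0
```

A uniform split over 8 classes gives log2(8) = 3 bits, while a pure partition has entropy 0, so high entropy corresponds to impure partitions.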
23. John flies frequently and likes to upgrade his seat to first class. He has determined that if he checks in for his flight at least two hours early, the probability that he will get an upgrade is 0.75; otherwise, the probability that he will get an upgrade is 0.35. With his busy schedule, he checks in at least two hours before his flight only 40% of the time. Suppose John did not receive an upgrade on his most recent attempt. What is the probability that he did not arrive two hours early?
    a. 0.892
    b. 0.796
    c. 0.685
    d. 0.999
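Question 23 is a direct application of Bayes' theorem. A minimal sketch using only the probabilities stated in the question (variable names are illustrative):

```python
# Probabilities stated in question 23.
p_early = 0.40
p_upgrade_given_early = 0.75
p_upgrade_given_late = 0.35

p_late = 1 - p_early
p_no_upgrade_given_early = 1 - p_upgrade_given_early
p_no_upgrade_given_late = 1 - p_upgrade_given_late

# Total probability of not getting an upgrade.
p_no_upgrade = (p_no_upgrade_given_early * p_early
                + p_no_upgrade_given_late * p_late)

# Bayes' theorem: P(late | no upgrade).
p_late_given_no_upgrade = p_no_upgrade_given_late * p_late / p_no_upgrade
print(round(p_late_given_no_upgrade, 3))  # 0.796
```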
24. Point out the wrong statement.
    a. k-nearest neighbor is same as k-means
    b. k-means clustering is a method of vector quantization
    c. k-means clustering aims to partition n observations into k clusters
    d. none of the mentioned
25. Consider the following example: "How can we divide a set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time)?" Is this:
    1. Clustering
    2. Classification
    3. Regression
    4. None of These
26. Can we use K-Means clustering to identify the objects in video?
    1. Yes
    2. No
27. Clustering techniques are _______ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
    1. Unsupervised
    2. Supervised
    3. Reinforcement
    4. Neural network
Objective Questions (MCQ / True or False / Fill in the Blank with Choices)
1. _______ metric is examined to determine a reasonably optimal value of k.
    1. Mean Square Error
    2. Within Sum of Squares (WSS)
    3. Speed
    4. None of These
2. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent. This is known as the
    1. Apriori Property
    2. Downward Closure Property
    3. Either 1 or 2
    4. Both 1 & 2
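The property in question 2 can be illustrated on a toy dataset: any subset of an itemset appears in at least as many transactions as the itemset itself. The transactions below are invented for illustration, and `support` is a hypothetical helper.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration.
transactions = [
    {"bread", "eggs", "milk"},
    {"bread", "eggs"},
    {"bread", "milk"},
    {"eggs", "milk"},
    {"bread", "eggs", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

full = frozenset({"bread", "eggs", "milk"})
subsets = [frozenset(c) for r in range(1, len(full)) for c in combinations(full, r)]

# Every proper subset is at least as frequent as the full itemset.
closure_holds = all(support(s) >= support(full) for s in subsets)
print(support(full), closure_holds)  # 0.4 True
```

Apriori-style algorithms use this in reverse: if a subset is infrequent, no superset needs to be counted.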
3. If {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of rule {bread, eggs} → {milk} is
    1. 0
    2. 1
    3. 2
    4. 3
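For question 3, confidence can be computed from supports as confidence(X → Y) = support(X ∪ Y) / support(X). A short sketch with the values from the question (the function name is an illustrative choice):

```python
def confidence_from_support(support_xy, support_x):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    return support_xy / support_x

# support({bread, eggs, milk}) = 0.15 and support({bread, eggs}) = 0.15, as stated above.
rule_confidence = confidence_from_support(0.15, 0.15)
print(rule_confidence)  # 1.0
```

Every transaction that contains bread and eggs also contains milk, so the confidence is 1.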
4. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
    a. True
    b. False
5. A high-confidence rule can sometimes be misleading because confidence does not consider the support of the itemset in the rule consequent. Is this true?
    a. Yes
    b. No
6. _______ recommend items based on similarity measures between users and/or items.
    1. Content Based Systems
    2. Hybrid System
    3. Collaborative Filtering Systems
    4. None of These
7. There are _______ major classifications of collaborative filtering mechanisms.
    1. 1
    2. 2
    3. 3
    4. None of These
8. Movie recommendation to people is an example of
    1. User Based Recommendation
    2. Item Based Recommendation
    3. Knowledge Based Recommendation
    4. Content Based Recommendation
9. _______ recommenders rely on an explicitly defined set of recommendation rules.
    1. Constraint Based
    2. Case Based
    3. Content Based
    4. User Based
10. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
    1. True
    2. False
11. Association rules are sometimes referred to as
    a. market basket analysis
    b. Itemset Filtering
    c. Frequent Itemset Analysis
    d. None of these.
12. If 80% of all transactions contain itemset {bread}, then the support of {bread} is 0.8. Similarly, if 60% of all transactions contain itemset {bread, butter}, then the support of {bread, butter} is
    a. 0.4
    b. 0.5
    c. 0.6
    d. 0.7
13. Lift is defined as the measure of certainty or trustworthiness associated with each discovered rule.
    a. TRUE
    b. FALSE
14. _______ is able to identify trustworthy rules, but it cannot tell whether a rule is coincidental.
    a. Lift
    b. Confidence
    c. Support
    d. Leverage
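The rule measures named in questions 12 to 14 can be compared with a small sketch. It uses the standard definitions of confidence, lift and leverage; support({bread}) = 0.8 and support({bread, butter}) = 0.6 come from question 12, while support({butter}) = 0.7 is a made-up value for illustration.

```python
def rule_measures(support_x, support_y, support_xy):
    """Return (confidence, lift, leverage) for the rule X -> Y."""
    confidence = support_xy / support_x
    lift = support_xy / (support_x * support_y)       # 1.0 means X and Y are independent
    leverage = support_xy - support_x * support_y     # 0.0 means X and Y are independent
    return confidence, lift, leverage

# support({butter}) = 0.7 is an assumption, not a value from the question bank.
confidence, lift, leverage = rule_measures(0.8, 0.7, 0.6)
print(round(confidence, 3), round(lift, 3), round(leverage, 3))  # 0.75 1.071 0.04
```

Confidence alone looks strong here (0.75), but a lift close to 1 and a leverage close to 0 indicate that the two items are nearly independent.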
15. _______ recommend items based on similarity measures between users and/or items. The items recommended to a user are those preferred by similar users.
    a. Collaborative Filtering System
    b. Content Based Recommendation
    c. Knowledge Based Recommendation
    d. Hybrid Approaches
16. Pure collaborative approaches take a matrix of given user–item ratings as the only input and typically produce output. Is it pure collaborative?
    a. Yes
    b. No
17. With respect to the determination of the set of similar users, one common measure used in recommender systems is
    a. Cosine Similarity Measure
    b. Pearson’s correlation coefficient.
    c. Mean Squared Error Method
    d. None of these.
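As an illustration of a user-similarity measure from question 17, the sketch below computes Pearson's correlation coefficient between two users' ratings of the same items. The rating vectors are hypothetical and the function name is an illustrative choice.

```python
def pearson(ratings_a, ratings_b):
    """Pearson correlation between two equally long rating lists."""
    n = len(ratings_a)
    mean_a = sum(ratings_a) / n
    mean_b = sum(ratings_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ratings_a, ratings_b))
    var_a = sum((a - mean_a) ** 2 for a in ratings_a)
    var_b = sum((b - mean_b) ** 2 for b in ratings_b)
    return cov / (var_a ** 0.5 * var_b ** 0.5)

# Hypothetical ratings of the same four items by two users.
user_1 = [5, 3, 4, 4]
user_2 = [3, 1, 2, 3]
similarity = pearson(user_1, user_2)
print(round(similarity, 2))  # 0.85
```

Unlike raw distance, the correlation ignores the fact that the second user rates everything lower on average and only compares rating patterns.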
18. Large-scale e-commerce sites often implement a different technique, _______, which is more apt for offline preprocessing and thus allows for the computation of recommendations in real time even for a very large rating matrix.
    a. Item-Based Recommendation
    b. User-Based Recommendation
    c. Content-Based Recommendation
    d. None of these
19. Here are two very short texts to compare. What is their cosine similarity measure?
    I. Julie loves me more than Linda loves me
    II. Jane likes me more than Julie loves me
    a. 0.6
    b. 0.7
    c. 0.8
    d. 0.9
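The cosine similarity in question 19 can be worked out by treating each text as a vector of word counts. This is a minimal sketch assuming lowercase whitespace tokenization with no stemming or stop-word removal; other preprocessing choices would give a different value.

```python
from collections import Counter
import math

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

text_1 = "Julie loves me more than Linda loves me"
text_2 = "Jane likes me more than Julie loves me"
similarity_19 = cosine_similarity(text_1, text_2)
print(round(similarity_19, 2))  # 0.82
```

The dot product is 9 and the vector norms are sqrt(12) and sqrt(10), giving about 0.82, which rounds to the 0.8 option.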
20. ______ is based on the availability of item descriptions and a profile that assigns importance to these characteristics.
a. Item-Based Recommendation
b. User-Based Recommendation
c. Content-Based Recommendation.
d. None of these
21. Consider the features of a movie which are not relevant to a recommendation system.
a. The set of actors of the movie.
b. The Director
c. The Year in which the movie was made
d. The Budget of the movie.
22. A ______ has been implemented for similarity-based retrieval under nearest neighbors.
a. k-nearest-neighbor method (kNN)
b. Convolutional Neural Network (CNN)
c. Bayes Theorem
d. Naïve Bayes Classifier
23. Case-based recommenders focus on the retrieval of similar items on the basis of different types of similarity measures
a. TRUE
b. FALSE
24. In ______ recommendation approaches, items are retrieved using similarity measures that describe to which extent item properties match some given user’s requirements.
a. Item-Based
b. Case-Based
c. Content-Based
d. User-Based
25. ______ are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor.
a. Weighted Hybrids
b. Mixed Hybrids
c. Cascade Hybrids
d. Switching Hybrids
26. ______ require an oracle that decides which recommender should be used in a specific situation, depending on the user profile and/or the quality of recommendation
a. Weighted Hybrids
b. Mixed Hybrids
c. Cascade Hybrids
d. Switching Hybrids
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 Business intelligence (BI) is a broad category of application programs which includes _____________
a) Decision support
b) Data mining
c) OLAP
d) All of the mentioned
Ans: d
2 BI can catalyze a business’s success in terms of _____________
a) Distinguish the products and services that drive revenues
b) Rank customers and locations based on profitability
c) Ranks customers and locations based on probability
d) All of the mentioned
Ans: d
3 Which of the following areas are affected by BI?
a) Revenue
b) CRM
c) Sales
d) All of the mentioned
Ans: b
4 ________ is a performance management tool that recapitulates an organization’s performance from several standpoints on a single page.
a) Balanced Scorecard
b) Data Cube
c) Dashboard
d) All of the mentioned
Ans: a
5 __________ is a system where operations like data extraction, transformation and loading are executed.
a) Data staging
b) Data integration
c) ETL
d) None of the mentioned
Ans: a
6 _________ is a category of applications and technologies for presenting and analyzing corporate and external data.
a) Data warehouse
b) MIS
c) EIS
d) All of the mentioned
Ans: c
7 Which of the following is the process of basing an organization’s actions and decisions on actual measured results of performance?
a) Institutional performance management
b) Gap analysis
c) Slice and Dice
d) None of the mentioned
Ans: a
8 Which of the following does not form part of BI Stack in SQL Server?
a) SSRS
b) SSIS
c) SSAS
d) OBIEE
Ans: d
a) Distinguish
the b) Rank c) Ranks
BI can catalyze a business’s success products and customers and customers and d) All of the
9 d
in terms of _____________ services that locations based locations based mentioned
drive on profitability on probability
revenues
10 This is an approach to selling goods and services in which a prospect explicitly agrees in advance to receive marketing information.
A. customer managed relationship
B. data mining
C. permission marketing
D. one-to-one marketing
Ans: c
11 In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences.
a. Web services
b. customer-facing
c. client/server
d. personalization
Ans: d
12 This is the processing of data about customers and their relationship with the enterprise in order to improve the enterprise’s future sales and service and lower cost.
a. clickstream analysis
b. database marketing
c. customer relationship management
d. CRM analytics
Ans: d
13 This is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.
a. best practice
b. data mart
c. business information warehouse
d. business intelligence
Ans: d
14 This is a systematic approach to the gathering, consolidation, and processing of consumer data (both for customers and potential customers) that is maintained in a company’s databases.
a. database marketing
b. marketing encyclopedia
c. application integration
d. service oriented integration
Ans: a
15 This is an arrangement in which a company outsources some or all of its customer relationship management functions to an application service provider (ASP).
a. spend management
b. supplier relationship management
c. hosted CRM
d. Customer Information Control System
Ans: c
16 This is an XML-based metalanguage developed by the Business Process Management Initiative (BPMI) as a means of modeling business processes, much as XML is, itself, a metalanguage with the ability to model enterprise data.
a. BizTalk
b. BPML
c. e-biz
d. ebXML
Ans: b
17 This is a central point in an enterprise from which all customer contacts are managed.
a. contact center
b. help system
c. multichannel marketing
d. call center
Ans: a
18 This is the practice of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, such as age, gender, interests, spending habits, and so on.
a. customer service chat
b. customer managed relationship
c. customer life cycle
d. customer segmentation
Ans: d
19 In data mining, this is a technique used to predict future behavior and anticipate the consequences of change.
a. predictive technology
b. disaster recovery
c. phase change
d. predictive modeling
Ans: d
20 According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a
21 All of the following accurately describe Hadoop, EXCEPT:
a) Open source
b) Real-time
c) Java-based
d) Distributed computing approach
Ans: b
__________ has the world’s largest Hadoop None of the
22 Apple Datamatics Facebook c
cluster. mentioned
All of the
23 What are the five V’s of Big Data? Volume velocity Variety d
above
_________ hides the limitations of Java
24 behind a powerful Scalding Cascalog Hcatalog Hcalding b
and concise Clojure API for Cascading.
What are the main components of Big
25 MapReduce HDFS YARN All of these d
Data?
What are the different features of Big Data
26 Open-Source Scalability Data Recovery All the above d
Analytics?
Define the Port Numbers for NameNode,
All of the
27 Task Tracker and NameNode Task Tracker Job Tracker d
above
Job Tracker.
Facebook Tackles Big Data With _______
28 Project Prism Prism ProjectData ProjectBid a
based on Hadoop
What is a unit of data that flows through a
29 Record Event Row Log b
Flume agent?
30 A feature F1 can take the values A, B, C, D, E, and F, and represents the grades of students from a college. Which of the following statements is true in this case?
a) Feature F1 is an example of nominal variable.
b) Feature F1 is an example of ordinal variable.
c) It doesn’t belong to any of the above category.
d) Both of these
Ans: b
31 Which of the following is an example of a deterministic algorithm?
a) PCA
b) K-Means
c) None of the above
d) all of the above
Ans: a
32 What is the entropy of the target variable?
a) -(5/8 log(5/8) + 3/8 log(3/8))
b) 5/8 log(5/8) + 3/8 log(3/8)
c) 5/8 log(5/8) + 3/8 log(3/8)
d) 5/8 log(3/8) – 3/8 log(5/8)
Ans: a
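The entropy expression in question 32 can be evaluated numerically. The sketch below assumes a binary target with class proportions 5/8 and 3/8 (as implied by the options) and uses base-2 logarithms, which the question itself does not specify.

```python
import math


def entropy(probabilities):
    """Shannon entropy, -sum(p * log2(p)), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


h = entropy([5 / 8, 3 / 8])
print(round(h, 4))
```

The leading minus sign matters: each p * log(p) term is negative for 0 < p < 1, so only the negated sum gives a non-negative entropy.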
33 Point out the correct statement.
a) OLAP is an umbrella term that refers to an assortment of software applications for analyzing an organization’s raw data for intelligent decision making
b) Business intelligence equips enterprises to gain business advantage from data
c) BI makes an organization agile thereby giving it a lower edge in today’s evolving market condition
d) None of the mentioned
Ans: b
a) Distinguish b) Rank
c) Ranks
the products customers and
BI can catalyze a business’s success in customers and d) All of the
34 and services locations d
terms of _____________ locations based mentioned
that drive based on
on probability
revenues profitability
Which of the following areas are affected d) All of the
35 a) Revenue b) CRM c) Sales b
by BI? mentioned
Which of the following does not form part
36 a) SSRS b) SSIS c) SSAS d) OBIEE d
of BI Stack in SQL Server?
a) Distinguish
the b) Rank c) Ranks
BI can catalyze a business’s success products and customers and customers and d) All of the
37 d
in terms of _____________ services that locations based locations based mentioned
drive on profitability on probability
revenues
38 Heuristic is
a) A set of databases from different vendors, possibly using different database paradigms
b) An approach to a problem that is not guaranteed to work but performs well in most cases
c) Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d) None of these
Ans: b
In an Internet context, this is the practice of
tailoring Web a. Web b. customer- d. personalizati
39 c. client/server d
pages to individual users’ characteristics or services facing on
preferences.
40 Heterogeneous databases referred to
a) A set of databases from different vendors, possibly using different database paradigms
b) An approach to a problem that is not guaranteed to work but performs well in most cases.
c) Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d) None of these
Ans: a
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 Movie Recommendation systems are an example of:
a) Classification
b) Clustering
c) Reinforcement Learning
d) Regression
Ans: b and c
2 Sentiment Analysis is an example of:
a) Regression
b) Classification
c) Clustering
d) Reinforcement Learning
Ans: a, b and d
What is the minimum no. of variables/
3 0 1 2 3 b
features required to perform clustering?
Is it possible that Assignment of
4 observations to clusters does not change Yes No Can’t say None of these a
between successive iterations in K-Means
5 Which of the following can act as possible termination conditions in K-Means?
a) For a fixed number of iterations.
b) Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum.
c) Centroids do not change between successive iterations.
d) Terminate when RSS falls below a threshold.
Ans: a,b,c,d
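Two of the termination conditions from question 5 (a fixed iteration budget, and cluster assignments that stop changing) can be seen in a minimal one-dimensional K-Means sketch. The function name, the sample points, and the starting centroids are all assumptions made for illustration; this is not a production implementation.

```python
def kmeans_1d(points, initial_centroids, max_iter=100):
    """Tiny 1-D K-Means that stops on stable assignments or after max_iter."""
    centroids = list(initial_centroids)
    assignments = None
    iterations = 0
    for iterations in range(1, max_iter + 1):  # condition: fixed iteration budget
        new_assignments = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:     # condition: assignments unchanged
            break
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:                        # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
    return centroids, assignments, iterations


centroids, assignments, iterations = kmeans_1d(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], initial_centroids=[1.0, 2.0]
)
print(centroids, assignments, iterations)
```

When assignments do not change, the centroids do not change either, so the two conditions stop the loop at the same point in this sketch. A different starting pair of centroids can converge to a different (local) optimum.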
6 Which of the following clustering algorithms suffers from the problem of convergence at local optima?
a) K-Means clustering algorithm
b) Agglomerative clustering algorithm
c) Expectation-Maximization clustering algorithm
d) Diverse clustering algorithm
Ans: a and c
K-means K-medians K-modes K-medoids
Which of the following algorithm is most
7 clustering clustering clustering clustering a
sensitive to outliers?
algorithm algorithm algorithm algorithm
8 How can Clustering (Unsupervised Learning) be used to improve the accuracy of Linear Regression model (Supervised Learning):
a) Creating different models for different cluster groups.
b) Creating an input feature for cluster ids as an ordinal variable.
c) Creating an input feature for cluster centroids as a continuous variable.
d) Creating an input feature for cluster size as a continuous variable.
Ans: a,b,c,d
9 What could be the possible reason(s) for producing two different dendrograms using agglomerative clustering algorithm for the same dataset?
a) Proximity function used
b) Number of data points used
c) Number of variables used
d) All of the above
Ans: d
10 In which of the following cases will K-Means clustering fail to give good results?
a) Data points with outliers
b) Data points with different densities
c) Data points with round shapes
d) Data points with non-convex shapes
Ans: a, b and d
11 Which of the following is/are valid iterative strategy for treating missing values before clustering analysis?
a) Imputation with mean
b) Nearest Neighbor assignment
c) Imputation with Expectation Maximization algorithm
d) All of the above
Ans: c
12 Feature scaling is an important step before applying K-Mean algorithm. What is the reason behind this?
a) In distance calculation it will give the same weights for all features
b) You always get the same clusters. If you use or don’t use feature scaling
c) In Manhattan distance it is an important step but in Euclidean it is not
d) None of these
Ans: a
13 Which of the following methods is used for finding the optimal number of clusters in K-Mean algorithm?
a) Elbow method
b) Manhattan method
c) Euclidean method
d) All of the above
Ans: a
14 What is true about K-Mean Clustering?
a) K-means is extremely sensitive to cluster center initializations
b) Bad initialization can lead to Poor convergence speed
c) Bad initialization can lead to bad overall clustering
d) None of these
Ans: d
15 Which of the following can be applied to get good results for K-means algorithm corresponding to global minima?
a) Try to run algorithm for different centroid initialization
b) Adjust number of iterations
c) Find out the optimal number of clusters
d) None of these
Ans: a,b,c
16 If you are using Multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into two clusters, which of the assumptions are important:
a) All the data points follow n Gaussian distribution (n >2)
b) All the data points follow two Gaussian distribution
c) All the data points follow two multinomial distribution
d) All the data points follow n multinomial distribution (n >2)
Ans: c
17 Which of the following is/are not true about Centroid based K-Means clustering algorithm and Distribution based expectation-maximization clustering algorithm:
a) Both starts with random initializations
b) Both are iterative algorithms
c) Both have strong assumptions that the data points must fulfill
d) Expectation maximization algorithm is a special case of K-Means
Ans: d
18 Which of the following is/are not true about DBSCAN clustering algorithm:
a) For data points to be in a cluster, they must be in a distance threshold to a core point
b) It has strong assumptions for the distribution of data points in dataspace
c) It has substantially high time complexity of order O(n^3)
d) It does not require prior knowledge of the no. of desired clusters
Ans: b and c
Which of the following are the high and low None of the
19 [0,1] (0,1) [-1,1] a
bounds for the existence of F-Score? above
20 All of the following increase the width of a confidence interval except:
a. Increased confidence level
b. Increased variability
c. Increased sample size
d. Decreased sample size
Ans: c
21 The p-value in hypothesis testing represents which of the following: Please select the best answer of those provided below.
a. The probability of failing to reject the null hypothesis, given the observed results
b. The probability that the null hypothesis is true, given the observed results
c. The probability that the observed results are statistically significant, given that the null hypothesis is true
d. The probability of observing results as extreme or more extreme than currently observed, given that the null hypothesis is true
Ans: d
22 Assume that the difference between the observed, paired sample values is defined in the same manner and that the specified significance level is the same for both hypothesis tests. Using the same data, the statement that “a paired/dependent two sample t-test is equivalent to a one sample t-test on the paired differences, resulting in the same test statistic, same p-value, and same conclusion” is: Please select the best answer of those provided below.
a. Always True
b. Never True
c. Sometimes True
d. Not Enough Information
Ans: a
23 Green sea turtles have normally distributed weights, measured in kilograms, with a mean of 134.5 and a variance of 49.0. A particular green sea turtle’s weight has a z-score of -2.4. What is the weight of this green sea turtle? Round to the nearest whole number.
a. 17 kg
b. 151 kg
c. 118 kg
d. 252 kg
Ans: c
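The arithmetic behind question 23 is a z-score inversion, x = mean + z * sd, where the standard deviation is the square root of the stated variance. A short check, with variable names chosen for this example:

```python
import math

mean_weight = 134.5   # kg, from the question
variance = 49.0       # kg^2, from the question
z_score = -2.4

std_dev = math.sqrt(variance)            # standard deviation is 7.0 kg
weight = mean_weight + z_score * std_dev
print(round(weight))  # 118
```

Using the variance (49.0) in place of the standard deviation (7.0) is the usual slip here; it would give 134.5 - 117.6, close to option a.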
24 What percentage of measurements in a dataset fall above the median?
a. 49%
b. 50%
c. 51%
d. Cannot Be Determined
Ans: d
25 The proportion of variation in 5k race times that can be explained by the variation in the age of competitive male runners was approximately 0.663. What is the value of the sample linear correlation coefficient? Round to 3 decimal places.
a. 0.663
b. 0.814
c. -0.814
d. 0.440
Ans: c
26 Using all of the results provided, is it reasonable to predict the 5k race time (minutes) of a competitive male runner 73 years of age?
a. Yes; linear correlation between age and 5k race times is statistically significant
b. Yes; both the sample linear regression equation and an age in years is provided
c. No; linear correlation between age and 5k race times is not statistically significant
d. No; the age provided is beyond the scope of our available sample data
Ans: d
27 Algorithm is
a) It uses machine-learning techniques. Here program can learn from past experience and adapt themselves to new situations
b) Computational procedure that takes some value as input and produces some value as output
c) Science of making machines performs tasks that would require intelligence when performed by humans
d) None of these
Ans: b
28 Bias is
a) A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
b) Any mechanism employed by a learning system to constrain the search space of a hypothesis
c) An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
d) None of these
Ans: b
29 Classification is
a) A subdivision of a set of examples into a number of classes
b) A measure of the accuracy, of the classification of a concept that is given by a certain theory
c) The task of assigning a classification to a set of examples
d) None of these
Ans: a
30 Binary attribute are
a) This takes only two values. In general, these values will be 0 and 1 and they can be coded as one bit
b) The natural environment of a certain species
c) Systems that can be used without knowledge of internal operations
d) None of these
Ans: a
Measure of the
A subdivision The task of
accuracy, of the
of a set of assigning a
classification of
31 Classification accuracy is examples into classification to None of these b
a concept that is
a number of a set of
given by a
classes examples
certain theory
32 Cluster is
a) Group of similar objects that differ significantly from other objects
b) Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c) Symbolic representation of facts or ideas from which information can potentially be extracted
d) None of these
Ans: a
A definition of a concept is-----if it
33 Complete Consistent Constant None of these a
recognizes all the instances of that concept
A definition or a concept is------------- if it
34 classifies any examples as coming within Complete Consistent Constant None of these b
the concept
35 Data selection is
a) The actual discovery phase of a knowledge discovery process
b) The stage of selecting the right data for a KDD process
c) A subject-oriented integrated time variant non-volatile collection of data in support of management
d) None of these
Ans: b
A measure of
A subdivision the accuracy, of The task of
of a set of the assigning a
36 Classification task referred to examples into classification of classification to None of these c
a number of a concept that is a set of
classes given by a examples
certain theory
37 Hybrid is
a) Combining different types of method or information
b) Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
c) Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
d) None of these
Ans: a
38 Discovery is
a) It is hidden within a database and can only be recovered if one is given certain clues (an example IS encrypted information).
b) The process of executing implicit previously unknown and potentially useful information from data
c) An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d) None of these
Ans: b
39 What could be the possible reason(s) for producing two different dendrograms using agglomerative clustering algorithm for the same dataset?
a) Proximity function used
b) Number of data points used
c) Number of variables used
d) All of the above
Ans: d
Is it possible that Assignment of
40 observations to clusters does not change Yes No Can’t say None of these a
between successive iterations in K-Means
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration
a) K-Means clustering
b) conceptual clustering
c) expectation maximization
d) agglomerative clustering
Ans: a
2 The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
a) The attributes are not linearly related.
b) As the value of one attribute decreases the value of the second attribute increases.
c) As the value of one attribute increases the value of the second attribute also increases.
d) The attributes show a linear relationship
Ans: b
Y is false
Given a rule of the form IF X THEN Y, rule Y is true when X is true when X is false when
when X is
3 confidence is defined as the conditional X is known to Y is known to Y is known to b
known to be
probability that be true. be true be false.
false.
Density based Hierarchical
Partitioning Model based
4 Chameleon is clustering clustering d
based algorithm algorithm
algorithm algorithm
5 Find odd man out DBSCAN K-Mean PAM None of above a
6 The number of iterations in apriori ___________
a) increases with the size of the data
b) decreases with the increase in size of the data
c) increases with the size of the maximum frequent set
d) decreases with increase in size of the maximum frequent set
Ans: c
Which of the following are interestingness
7 Recall Lift Accuracy All of Above b
measures for association rules?
2k – 1 2k – 2
2k candidate 2k -2 candidate
Given a frequent itemset L, If |L| = k, then candidate candidate
8 association association c
there are association association
rules rules
rules rules
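The count asked for in question 8 can be checked by brute force. For a frequent itemset L with |L| = k, every non-empty proper subset X of L gives one candidate rule X -> L \ X, which is the standard count of 2^k - 2. The helper name and the example itemset below are assumptions for illustration.

```python
from itertools import combinations


def candidate_rules(itemset):
    """All rules X -> (L minus X) where X is a non-empty proper subset of L."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):          # sizes 1 .. k-1 exclude empty and full sets
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules


rules = candidate_rules({"A", "B", "C", "D"})
print(len(rules))  # 14, which equals 2**4 - 2
```

The two excluded subsets are the empty set and L itself, since either would leave one side of the rule empty.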
_________ is an example for case based- Neural Genetic K-nearest
9 Decision trees d
learning networks algorithm neighbor
The average positive difference between mean positive mean squared mean absolute root mean
10 c
computed and desired outcome values. error error error squared error
11 Frequent item sets is
a) Superset of only closed frequent item sets
b) Superset of only maximal frequent item sets
c) Subset of maximal frequent item sets
d) Superset of both closed frequent item sets and maximal frequent item sets
Ans: d
12 Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes. Rule Accuracy: 70% and Rule Coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old?
a) 63
b) 38
c) 40
d) 89
Ans: b
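The arithmetic for question 12 can be laid out step by step, assuming rule coverage means the fraction of all individuals that satisfy the rule's antecedent, and rule accuracy means the fraction of those covered individuals for which the consequent holds. Variable names are chosen for this example.

```python
total_individuals = 200
rule_coverage = 0.63   # fraction matching "age < 30 & credit card insurance = yes"
rule_accuracy = 0.70   # fraction of covered individuals with life insurance = yes

covered = total_individuals * rule_coverage           # 126 individuals match the antecedent
not_insured = covered * (1 - rule_accuracy)           # covered, but life insurance = no
print(round(covered), round(not_insured))             # 126 38
```

The exact product is 37.8, so 38 is the nearest whole number of individuals.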
Simple Grouping Labeled Query results
13 Which of the following is cluster analysis? b
segmentation similar objects classification grouping
15 Which two parameters are needed for DBSCAN
a) Min threshold
b) Min points and eps
c) Min sup and min confidence
d) Number of centroids
Ans: b
Both
techniques
build models
whose Both models
The output of Both models
output is require numeric
Which statement is true about neural both models is a require input
16 determined attributes to d
network and linear regression models? categorical attributes to be
by a linear range between
attribute value. numeric.
sum of 0 and 1.
weighted
input attribute
values.
17 In the Apriori algorithm, if there are 100 frequent 1-itemsets, then the number of candidate 2-itemsets is
a) 100
b) 200
c) 4950
d) 5000
Ans: c
Finding
Significant Bottleneck in the Apriori Candidate Number of
18 frequent Pruning c
algorithm is generation iterations
itemsets
typically
are better able
Machine learning techniques differ from assume an have trouble are not able to
to deal with
19 statistical techniques in that machine underlying with large-sized explain their a
missing and
learning methods distribution for datasets behavior.
noisy data
the data
The probability of a hypothesis before the
20 a priori posterior conditional subjective a
presentation of evidence.
21 KDD represents extraction of data knowledge rules model b
Outliers
Outliers should
should be part The nature of
Outliers should be part of the
of the training the problem
be identified test dataset but
22 Which statement about outliers is true? dataset but determines how c
and removed should not be
should not be outliers are
from a dataset. present in the
present in the used
training data.
test data.
23 The most general form of distance is
a) Manhattan
b) Euclidean
c) Mean
d) Minkowski
Ans: d
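Why Minkowski is the most general of these can be shown in a few lines: with order p = 1 it reduces to the Manhattan distance, and with p = 2 to the Euclidean distance. The function below is a minimal sketch written for this example, not a library call.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


origin, point = (0, 0), (3, 4)
manhattan = minkowski_distance(origin, point, p=1)   # |3| + |4|
euclidean = minkowski_distance(origin, point, p=2)   # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```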
High support High support Low support Low support
24 Which Association Rule would you prefer and medium and low and high and low c
confidence confidence confidence confidence
In a Rule based classifier, If there is a rule
Mutually
25 for each combination of attribute values, Exhaustive Inclusive Comprehensive a
exclusive
what do you called that rule set R
To decrease the To improve the
If a set cannot If a set can
efficiency, do efficiency, do
pass a test, its pass a test, its
level-wise level-wise
26 The apriori property means supersets will supersets will a
generation of generation of
also fail the fail the same
frequent item frequent item
same test test
sets sets
If an item set ‘XYZ’ is a frequent item set,
27 Undefined Not frequent Frequent Can not say c
then all subsets of that frequent item set are
28 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
a) 0.0368
b) 0.0396
c) 0.0389
d) 0.0398
Ans: b
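Question 28 is a direct application of Bayes' rule, with the denominator expanded by the law of total probability. The variable names below are chosen for this example.

```python
p_car_given_sub = 0.40      # P(sports car | subscriber)
p_sub = 0.03                # P(subscriber)
p_car_given_not_sub = 0.30  # P(sports car | not a subscriber)

# Total probability of owning a sports car.
p_car = p_car_given_sub * p_sub + p_car_given_not_sub * (1 - p_sub)

# Bayes' rule: P(subscriber | sports car).
p_sub_given_car = p_car_given_sub * p_sub / p_car
print(round(p_sub_given_car, 4))  # 0.0396
```

The posterior (about 4%) stays close to the 3% prior because sports car ownership is only slightly more likely among subscribers than among non-subscribers.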
Simple regression assumes a __________
29 relationship between the input attribute and quadratic inverse linear reciprocal c
output attribute.
Only Both minimum
Neither support Minimum
To determine association rules from minimum support and
30 not confidence support is c
frequent item sets confidence confidence are
needed needed
needed needed
If {A,B,C,D} is a frequent itemset,
31 C –> A D –>ABCD A –> BC B –> ADC b
candidate rules which is not possible is
High support Low support Low support High support
32 Which Association Rule would you prefer and low and high and low and medium b
confidence confidence confidence confidence
Classification rules are extracted from
33 decision tree root node branches siblings a
_____________
34 What does K refer to in the K-Means algorithm, which is a non-hierarchical clustering approach?
a) Complexity
b) Fixed value
c) No of iterations
d) number of clusters
Ans: d
35 If a Linear regression model fits perfectly, i.e., train error is zero, then _____________________
a) Test error is also always zero
b) Test error is non zero
c) Couldn’t comment on Test error
d) Test error is equal to Train error
Ans: c
36 Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE/MSE/MAE
a) ii and iv
b) i and ii
c) ii, iii and iv
d) i, ii, iii and iv
Ans: d
How many coefficients do you need to
37 estimate in a simple linear regression model 1 2 3 4 b
(One independent variable)?
38 In a simple linear regression model (One independent variable), if we change the input variable by 1 unit, how much will the output variable change?
a) by 1
b) no change
c) by intercept
d) by its slope
Ans: d
In syntax of linear model
39 Matrix array vector list c
lm(formula,data,..), data refers to ______
40 In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to __________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Ans: c
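The roles of β1 (Y-intercept) and β2 (slope) in Y = β1 + β2X + ϵ can be seen by fitting a simple linear regression with the closed-form least-squares estimates. The function name and the sample data are assumptions for illustration; the data lie exactly on y = 1 + 2x, so the noise term is zero.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor; returns (beta1, beta2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta2 = sxy / sxx                 # slope: change in y per unit change in x
    beta1 = mean_y - beta2 * mean_x   # Y-intercept: predicted y at x = 0
    return beta1, beta2


beta1, beta2 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(beta1, beta2)  # 1.0 2.0
```

Two coefficients are estimated, and increasing x by one unit changes the prediction by exactly the slope, which ties in with questions 37 and 38 above.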
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
Ans: a
2 Decision Tree is a display of an algorithm. TRUE FALSE a
Flow-Chart &
Structure in
Structure in
which internal
which internal
node represents
node represents
test on an
test on an
attribute, each
attribute, each
3 What is Decision Tree? branch None of Above c
branch
represents
represents
outcome of test
outcome of test
and each leaf
and each leaf
node represents
node represents
class label
class label
9 Which of the following are the advantage/s of Decision Trees?
a) Possible Scenarios can be added
b) Use a white box model, If given result is provided by a model
c) Worst, best and expected values can be determined for different scenarios
d) All of Above
Ans: d
10 Which of the following statements about Naive Bayes is incorrect?
a) Attributes are equally important.
b) Attributes are statistically dependent of one another given the class value.
c) Attributes are statistically independent of one another given the class value.
d) Attributes can be nominal or numeric
Ans: b
Which of the following is not supervised Linear
11 Clustering Decision Tree Naive Bayesian a
learning? Regression
How many terms are required for building
12 1 2 3 4 c
a bayes model?
Answering
Solving Increasing Decreasing
13 Where does the bayes rule can be used? probabilistic d
queries complexity complexity
query
How the bayesian network can be used to Full Joint Partial
14 All of Above b
answer any query? distribution distribution distribution
Both
What is the consequence between a node
Functionally Conditionally Conditionally
15 and its predecessors while creating bayesian Dependant c
dependent independent dependant &
network?
Dependant
An approach to
the design of
learning
algorithms that
A class of
is inspired by
learning
the fact that
algorithm that
Any mechanism when people
tries to find
employed by a encounter new
an optimum
learning system situations, they
16 Bayesian classifiers is classification None of these a
to constrain the often explain
of a set of
search space of them by
examples
a hypothesis reference to
using the
familiar
probabilistic
experiences,
theory.
adapting the
explanations to
fit the new
situation.
An approach to
the design of
learning
algorithms that
A class of
is inspired by
learning
the fact that
algorithm that
Any mechanism when people
tries to find
employed by a encounter new
an optimum
learning system situations, they
17 Bias is classification None of these b
to constrain the often explain
of a set of
search space of them by
examples
a hypothesis reference to
using the
familiar
probabilistic
experiences,
theory
adapting the
explanations to
fit the new
situation.
Additional
acquaintance
used by a A neural
It is a form of
learning network that
18 Background knowledge referred to automatic None of these a
algorithm to makes use of a
learning.
facilitate the hidden layer
learning
process
19 Classification accuracy is
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: b
20 Classification is
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: a
21 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
b. The process of extracting implicit, previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
22 Classification task referred to
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: c
23 Euclidean distance measure is
a. A stage of the KDD process in which new data is added to the existing selection.
b. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
c. The distance between two points as calculated using the Pythagoras theorem
d. None of these
Ans: c
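The Euclidean distance definition in the question above can be illustrated with a minimal Python sketch. The function name `euclidean_distance` and the sample points are assumptions made for illustration; they do not come from the question bank.

```python
import math


def euclidean_distance(p, q):
    """Distance between two points of equal dimension (Pythagoras theorem)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square the coordinate differences, sum them, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical sample points forming a 3-4-5 right triangle.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```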
24 The problem of finding hidden structure in unlabeled data is called a. Supervised learning b. Unsupervised learning c. Reinforcement learning d. None of these Ans: b
25 Assume you want to perform supervised learning and to predict number of newborns according to size of storks’ population (http://www.brixtonhealth.com/storksBabies.pdf), it is an example of a. Classification b. Regression c. Clustering d. Structural equation modeling Ans: b
26 Discriminating between spam and ham e-mails is a classification task, true or false? a. TRUE b. FALSE Ans: a
27 Which of the following is not involved in data mining? a. Knowledge extraction b. Data archaeology c. Data exploration d. Data transformation Ans: d
28 Naive prediction is
a. A class of learning algorithms that try to derive a Prolog program from examples
b. A table with n independent attributes can be seen as an n-dimensional space.
c. A prediction made using an extremely simple method, such as always predicting the same output.
d. None of these
Ans: c
29 Node is
a. A component of a network
b. In the context of KDD and data mining, this refers to random errors in a database table.
c. One of the defining aspects of a data warehouse
d. None of these
Ans: a
30 Prediction is
a. The result of the application of a theory or a rule in a specific case
b. One of several possible entries within a database table that is chosen by the designer as the primary means of accessing the data in the table.
c. Discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces.
d. None of these
Ans: a
31 What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation d. None of these Ans: a
32 The classification method in which the upper limit of an interval is the same as the lower limit of the next class interval is called…. a. exclusive method b. inclusive method c. mid point method d. None of these Ans: a
33 If the largest value is 60, the smallest value is 40, and the number of classes is 5, then the class interval is a. 20 b. 25 c. 4 d. 15 Ans: c
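The arithmetic behind the class-interval question above can be checked with a short sketch. It assumes the usual rule that the class interval is the range divided by the number of classes; the function name `class_interval` is hypothetical.

```python
def class_interval(largest, smallest, num_classes):
    """Class width, assuming width = (largest - smallest) / number of classes."""
    if num_classes <= 0:
        raise ValueError("number of classes must be positive")
    return (largest - smallest) / num_classes


# Values taken from the question: largest 60, smallest 40, 5 classes.
print(class_interval(60, 40, 5))  # 4.0
```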
34 The summary and presentation of data in tabular form with several non-overlapping classes is referred to as a. nominal distribution b. frequency distribution c. ordinal distribution d. None of these Ans: b
35 The classification method in which both the upper and lower limits of an interval are included in the class interval itself is called…. a. exclusive method b. inclusive method c. mid point method d. None of these Ans: b
36 Suppose there are 25 base classifiers. Each classifier has an error rate of e = 0.35. Suppose you are using averaging as the ensemble technique. What is the probability that the ensemble of the above 25 classifiers will make a wrong prediction? Note: all classifiers are independent of each other a. 0.05 b. 0.06 c. 0.07 d. 0.08 Ans: b
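The answer to the 25-classifier question above can be reproduced with a binomial calculation. This is a sketch under an assumption: the ensemble is treated as a majority vote, so it is wrong only when at least 13 of the 25 independent classifiers are wrong. The function name `majority_vote_error` is hypothetical.

```python
from math import comb


def majority_vote_error(n_classifiers, base_error):
    """Probability that more than half of n independent classifiers err."""
    majority = n_classifiers // 2 + 1  # 13 when n_classifiers is 25
    return sum(
        comb(n_classifiers, k)
        * base_error ** k
        * (1 - base_error) ** (n_classifiers - k)
        for k in range(majority, n_classifiers + 1)
    )


# Values from the question: 25 classifiers, each with error rate 0.35.
print(round(majority_vote_error(25, 0.35), 2))  # roughly 0.06, i.e. option b
```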
37 The most widely used metrics and tools to assess a classification model are: a. Confusion matrix b. Cost-sensitive accuracy c. Area under the ROC curve d. All of Above Ans: d
38 When performing regression or classification, which of the following is the correct way to preprocess the data?
a. Normalize the data → PCA → training
b. PCA → normalize PCA output → training
c. Normalize the data → PCA → normalize PCA output → training
d. None of these
Ans: a
39 Which of the following is true about Naive Bayes ?
a. Assumes that all the features in a dataset are equally important
b. Assumes that all the features in a dataset are independent
c. both a and b
d. None of these
Ans: c
40 In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes a. 1 and 2 b. 2 and 3 c. 1, 2, and 3 d. 1 and 3 Ans: c
No. Question a b c d ANS
1 Data visualization is related with… a. Pictorial representations b. numerical representation c. numerical calculations d. None of these Ans: a
2 Which of the following are Use of data visualization a. See context of data b. Clear data understanding c. finding pattern in data d. all of above Ans: d
3 Which of the following statements are true about using visualizations to display a dataset?
I. Visualizations are visually appealing, but don’t help the viewer understand relationships that exist in the data
II. Visualizations like graphs, charts, or visualizations with pictures are useful for conveying information, while tables just filled with text are not useful.
a. I AND II b. II AND III c. I AND III d. ONLY III Ans: d
7 You can create a scatter plot matrix using the __________ method in pandas.tools.plotting. a. sca_matrix b. scatter_matrix c. DataFrame.plot d. all of the mentioned Ans: b
8 Plots may also be adorned with error bars or tables. a. True b. FALSE c. Cannot Tell d. All Above Ans: a
9 Which of the following plots are often used for checking randomness in time series? a. Autocausation b. Autorank c. Autocorrelation d. none of the mentioned Ans: c
10 __________ plots are used to visually assess the uncertainty of a statistic a. Lag b. RadViz c. Bootstrap d. All Above Ans: c
Which of the following is not a challenge in
11 Velocity Volume Version Variety c
Big Data Visualization?
Which of the following is not a problem in Large image Information
12 Visual Noise Scaled Data b
Big Data Visualization? perception Loss
Which of the following is a problem in Big Structured Multiple
13 Scaled Data Visual Noise c
Data Visualization? Data valued Data
Which of the following candidates is suitable for Type of
14 Cardinality Size of data all of above d
interactive visualization? Visual
Which of the following follows interactive Overview+Details
15 Zoom+Pan Focus+Context all of above d
visualization approach?
Overview+Details
16 Visual Mapping is important for_______ Remapping Focus Context a
17 Data visualization techniques are: Scatter Plot Line Chart Pie Chart all of above d
18 Information Visualization techniques are Flow Chart Time Line DFD All of above d
19 Data visualization techniques are: Flow Chart Time Line Pie Chart None of these c
20 Information Visualization techniques are Flow Chart Line Chart Pie Chart None of these a
21 Data visualization techniques are: Scatter Plot Time Line DFD None of these a
22 Information Visualization techniques are Scatter Plot Time Line Bubble Chart None of these b
Parallel
23 Data visualization techniques are: Histogram Time Line None of these a
Coordinates
Semantic
24 Information Visualization techniques are Histogram Area Chart None of these a
Network
Which of the following is related term with
25 Exponential U-Shape Null All of above d
correlation?
26 Data visualization techniques are: Scatter Plot Time Line DFD None of these a
27 Column graph is another name for _____ Bar Chart Scatterplot Histogram Area Chart a
Which of the following follows interactive Overview+Details
28 Zoom+Pan Focus+Context all of above d
visualization approach?
29 Information Visualization techniques are Pie Chart Scatterplot Histogram Area Chart a
Which of the following is category of Linear Modular Variant
30 ER Timeline a
timeline? Timeline Timeline Timeline
Which of the following specifies
31 Scatter Plot Line Chart Area Chart All of above d
relationship amongst variables?
Which of the following specifies category
32 Pie Chart Histogram Bar chart All of above d
Proportions?
Which of the following is category of Variant Comparative Modular
33 ER Timeline c
timeline? Timeline Timeline Timeline
34 Information Visualization techniques are Flow Chart Time Line DFD All of above d
35 Data visualization techniques are: Flow Chart Time Line Pie Chart None of these c
36 Data visualization is related with… a. Pictorial representations b. numerical representation c. numerical calculations d. None of these Ans: a
Which of the following follows interactive Overview+Details
37 Zoom+Pan Focus+Context all of above d
visualization approach?
38 Which of the following are Use of data visualization a. See context of data b. Clear data understanding c. finding pattern in data d. all of above Ans: d
Which of the following specifies
39 Pie Chart Histogram Area Chart None of these c
relationship amongst variables?
Which of the following specifies category
40 Pie Chart Scatter Plot Line Chart None of these a
Proportions?
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 Precise and steady format data is____ a. Structured Data b. Unstructured Data c. Semi-structured Data d. Quasi-structured Data Ans: a
2 Inconsistent Data is______ a. Structured Data b. Unstructured Data c. Semi-structured Data d. Quasi-structured Data Ans: b
3 Format that self defines itself is________ a. Structured Data b. Unstructured Data c. Semi-structured Data d. Quasi-structured Data Ans: c
4 A little bit inconsistent data is_______ a. Structured Data b. Unstructured Data c. Semi-structured Data d. Quasi-structured Data Ans: d
5 XML is an example of_______ a. Structured Data b. Unstructured Data c. Semi-structured Data d. Quasi-structured Data Ans:
6 RDBMS Follows__________ a. Structured Data b. Unstructured Data c. Semi-structured Data d. Quasi-structured Data Ans: a
7 Watson is developed by____ IBM Microsoft AT&T Google a
8 Hadoop is _____ based Framework. C++ Python JAVA C# c
Which of the following are components of MAPREDUCE
9 YARN HDFS All of Above d
Hadoop?
Which of the following are components of
10 JDBC Thrift Server CLI All of Above d
HIVE?
JAVA
Mountable
11 Mahout provides__________ Executable C# Executables All of Above a
Image Format
Libraries
Which of the following are components of
12 FLATTEN Thrift Server Muster None of these b
HIVE?
Which of the following are components of
13 FLATTEN Thrift Server Muster All of above b
HIVE?
Which of the following is components of
14 Fork YARN CLI Metadata b
Hadoop?
Structured Un Structured semi Structured Quasi
15 RDBMS Follows__________ a
Data Data Data Structured Data
Which of the following is a clustering Fuzzy K
16 Canopy K-Means All of above d
technique? means
Which of the following is HBASE Data
17 Row Table Column All of Above d
Model Terminology?
Which of the following is not a Logistic Recommender
18 Random Forest Naïve Bayes c
classification technique? Regression Algo
Which of the following is a classification Logistic
19 Random Forest Naïve Bayes All of Above d
technique? Regression
Which of the following is HBASE Data Column
20 Cell Timestamp All of Above d
Model Terminology? Family
Which of the following is a clustering Logistic
21 Random Forest K-Means Naïve Bayes c
technique? Regression
Which of the following is HBASE Data None of the
22 Identifier Variant Timestamp c
Model Terminology? above
Which of the following is not a Logistic
23 Random Forest K-Means Naïve Bayes c
classification technique? Regression
Which of the following are components of
24 FLATTEN Thrift Server Muster None of these b
HIVE?
Which of the following is HBASE Data Column None of the
25 Identifier Variant c
Model Terminology? Qualifier above
JAVA
Mountable None of the
26 Mahout provides__________ Executable C# Executables a
Image Format above
Libraries
Which of the following is not a clustering Logistic
27 Canopy K-Means Fuzzy K means a
technique? Regression
Which of the following is a clustering Fuzzy K
28 Canopy K-Means All of above d
technique? means
29 Point out the correct statement.
a. Hadoop do need specialized hardware to process the data
b. Hadoop 2.0 allows live stream processing of real-time data
c. In Hadoop programming framework output files are divided into lines or records
d. None of the above
Ans: b
30
31 What was Hadoop named after?
a. Creator Doug Cutting’s favorite circus act
b. Cutting’s high school rock band
c. The toy elephant of Cutting’s son
d. A sound Cutting’s laptop made during Hadoop development
Ans: c
32 ___________programming model used to develop Hadoop-based applications that can process massive amounts of data. a. MapReduce b. Mahout c. Oozie d. None of the above Ans: a
Which of the following is not a Logistic
33 Random Forest K-Means Naïve Bayes c
classification technique? Regression
Which of the following are components of
34 FLATTEN Thrift Server Muster All of above b
HIVE?
Which of the following is components of
35 Fork YARN CLI None of above b
Hadoop?
36 Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ a. MapReduce, Hive and HBase b. MapReduce, MySQL and Google Apps c. MapReduce, Hummer and Iguana d. All of above Ans: a
37 NoSQL databases are used mainly for handling large volumes of ______________ data. a. Structured Data b. Unstructured Data c. Semi-structured Data d. Quasi-structured Data Ans: b
38 Which of the following is not a phase of Data Analytics Life Cycle? a. Communication b. Recall c. Data Preparation d. Model Planning Ans: b
39 Which of the following is a NoSQL Database Type? a. SQL b. Document databases c. JSON d. All of above Ans: b
40 Which of the following is not a NoSQL database a. SQL Server b. MongoDB c. Cassandra d. None of the above Ans: a
marks question A B C D ans
A group of 4 bits is also
0 1 Nibble Byte Kb None 4 bits make one nibble.
called?
There are how many types of
1 1 3 2 1 None Big Data is of 3 types.
Big Data:
2 1 Which of the following are the V's of Big Data: A. All B. Volume C. Variety D. Velocity. Ans: This is an explanation.
3 1 Which of these is not a characteristic of Big data? A. Storage B. Volume C. Variety D. Velocity. Ans: This is an explanation.
Which of the following is a Big Data requires high cost to
4 2 Cost Significant Process Fraud Detection
drawback of Big Data: maintain huge amount of data
5 2 Full form of GINA is: A. Global Innovation Network and Analysis. B. Global Invention in Networks and Analytics C. Globally Investment in Neurons and Analytics D. None Ans: GINA stands for Global Innovation Network and Analysis.
Which is the phase 3 in Data Model Planning is the 3rd phase
6 2 Model Planning Model Building Data Preparation Operationalize
Analytics Life cycle. in life cycle.
GINA team thought to GINA targeted to achieve three
7 2 3 2 1 5
accomplish mainly____ goals: goals for the project.
8 2 The Data Preparation stage doesn’t involve: A. Analyzation B. Collection C. Cleansing D. Processing. Ans: This is an explanation.
Unstructured Data is further Unstructured data is divided into
9 2 2 3 4 5
divided into how many types? 2 types.
The GINA team mainly used
The team used Tableau to
10 2 which software tool to analyze Tableau Hadoop HIVE SQL
visualize the Data.
the Data
11 2 Which of the following is the first step of Data Analytics Life Cycle: A. Discovery B. Data Preparation. C. Model Planning D. Data Aware Ans: This is an explanation.
There are how many phases in there are 6 stages in data
12 2 6 5 4 7
data analytics life cycle: analytics life cycle.
SEMMA Methodology has SEMMA methodology has five
13 2 5 4 6 7
how many stages: stages.
Which phase of Life Cycle
Phase 5 involves collaboration
14 2 requires collaboration with Phase 5 Phase 6 Phase 4 Phase 3
with stakeholders.
stakeholders?
15 2 In Building a Model, how many phases are required: A. 2 B. 3 C. 4 D. 5 Ans: This is an explanation.
How much Data in the whole Only 20% of world's total data is
16 2 0.2 0.4 0.6 0.5
world is structured: structured.
17 2 10^21 bytes of memory is equal to: A. 1ZB B. 1TB C. 1YB D. 1XB Ans: 10^21 B is equal to 1 ZB.
18 2 Data Scientists in the GINA team used which technique on the textual Description of the Innovation Roadmap Idea. A. Natural Language Processing(NLP) B. Hadoop C. HIVE D. SQL Ans: NLP technique was used on the description of Innovation Roadmap Idea.
19 2 How many types of data analytics methodologies are there? A. 2 B. 4 C. 3 D. 6 Ans: There are two types of data analytical methodologies: EDA and CDA
20 3 Other name for Bell Curve is: A. Normal Distribution. B. Poisson Distribution C. Binomial Distribution D. Bernoulli Distribution. Ans: Bell Curve is also known as normal distribution.
21 3 One of the most important tasks in big data analytics is: A. Statistical Modeling B. Testing of Data C. Visualization D. Operationalize Ans: One of the most important tasks in big data analytics is statistical modeling
22 3 Some of the approaches considered for building the data analytics lifecycle framework best practices are: A. All B. CRISP-DM C. SEMMA D. MAD Skills Ans: This is an explanation.
23 3 In Phase 4, the team develops datasets for: A. All B. Testing of Data C. Training of Data D. Production purposes Ans: This is an explanation.
24 3 Full form of CRISP-DM Methodology is:
A. Cross Industry Standard Process for Data Mining
B. Cross International Standard Process for Data Modeling
C. Company's Initial Standards Progress for Data Methods
D. Common Industry Standard Program for Data Mining
Ans: CRISP-DM stands for Cross Industry Standard Process for Data Mining.
25 3 SEMMA Methodology doesn’t include which of the following stages: A. Evaluate B. Sample C. Explore D. Assess Ans: This is an explanation.
26 3 In which stage is the data monitored and analyzed to see if the generated model is creating the expected results. A. Operationalize B. Collection C. Plan Model D. Data Aware Ans: In the last phase, i.e. Operationalize, data is monitored and analyzed to see if the generated model is creating the expected results.
Data is captured in how many
27 3 3 4 5 6 Data is captured in 3 main ways.
ways:
28 3 In phase 2 of the Data Analytics Life Cycle, the team performs how many analytics to get the data in the sandbox. A. 3 B. 2 C. 4 D. 6 Ans: The team performs ETL and ELT and ETLT in 2nd phase of the cycle.
The total area under the bell Area under the bell curve is 1
29 3 1 2 3 4
curve is____unit. unit.
30 1 Wilcoxon rank-sum test is also known as? A. Mann-Whitney U test B. Mean Difference C. Alternative Hypothesis D. Null Hypothesis Ans: Wilcoxon rank-sum test is also called Mann-Whitney U Test.
31 1 Which test is also known as T-test? A. Hypothesis Test B. Mean Difference C. K-means test D. None Ans: This is an explanation.
This eqn is of Mean difference
32 1 This equation is of which test? Mean Difference K-Means Null Hypothesis Alternative Hypothesis
test.
A test of a statistical A test of a statistical hypothesis,
hypothesis, where the region of where the region of rejection is
33 1 rejection is on a side of the One tailed test Two-tailed test Tailed test Null test on only one side of the sampling
sampling distribution, is distribution, is called a one-tailed
called___________. test
34 1 How many types of Statistical Hypothesis are there? A. 2 B. 3 C. 4 D. 6 Ans: There are two types of Statistical Hypothesis.
35 1 Analysis of Variance is also referred to as? A. ANOVA B. Mean Difference C. Alternative Hypothesis D. Null Hypothesis Ans: ANOVA stands for Analysis of Variance.
How many steps are involved There are 4 steps in Hypothesis
36 1 4 2 3 5
in a Hypothesis Testing? testing.
The strength of evidence in The strength of evidence in
37 2 support of a null hypothesis is P-value K-value H-value Null-value support of a null hypothesis is
measured by? measured by the P-value.
Difference in means is also Difference in means is also
38 2 Two sample t-test T- test M-test Two sample test
called? known as two sample t test.
The k-medoids is also The k-medoids is also called
Partitioning Around Medoids
39 2 called_______________ Lloyd's Algorithm Poisson's Algorithm Regression partitioning around medoids
(PAM)
algorithm. (PAM) algorithm .
Clustering is an example of Clustering is an example of
40 2 Unsupervised Learning Supervised Learning Classification Regression
____? unsupervised learning.
41 2 Which of the following is not an advantage of K means Clustering? A. Requires a Priori B. Fast C. Robust D. easy to evaluate. Ans: This is an explanation.
The probability of committing a The probability of committing a
42 2 Beta Alpha Delta Theta
Type 2 error is called Type II error is called Beta
The______ variation we have
The less variation we have within
within clusters, the more
clusters, the more homogeneous
43 2 homogeneous (similar) the data Less More Variable Fixed
(similar) the data points are
points are within the same
within the same cluster.
cluster.
Which hypothesis is usually the Null Hypothesis is usually the
hypothesis in which sample hypothesis that sample
44 2 Null-Hypothesis Mean Difference K-means test Alternative Hypothesis
observations result is purely observations result purely from
from chance? chance.
45 2 "Classical" ANOVA for balanced data does how many things at once? A. 3 B. 2 C. 1 D. 4 Ans: "Classical" ANOVA for balanced data does three things at once.
K-mean clustering is used to NP hard problems are solved
46 2 NP-hard problems NP Problems Hypothesis Problems P problems
solve which problems? using K means clustering.
47 2 The probability of committing a Type I error is called? A. Alpha B. Beta C. Gamma D. Delta Ans: The probability of committing a Type I error is called alpha
K means Clustering is also K means clustering is also called
48 2 Lloyd's Algorithm Gaussian Algorithm Poisson's Algorithm None
known as? Lloyds algo.
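Lloyd's algorithm, named in the row above, alternates between assigning each point to its nearest centre and moving each centre to the mean of its points. The following is a minimal one-dimensional sketch; the function name `lloyd_kmeans_1d`, the sample data and the starting centres are all assumptions made for illustration.

```python
def lloyd_kmeans_1d(points, centers, iterations=10):
    """Run a fixed number of Lloyd iterations on 1-D data."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each point joins its nearest centre.
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            clusters[nearest].append(x)
        # Update step: each centre moves to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]  # keep empty clusters in place
            for i, c in enumerate(clusters)
        ]
    return centers


# Hypothetical data with two obvious groups and k = 2 chosen by the user.
print(lloyd_kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0]))  # [2.0, 11.0]
```

The number of centres passed in plays the role of k, which the user has to specify in advance.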
Which algorithm requires the k-means clustering requires the
49 3 user to specify the number of K-means clustering Gaussian Algorithm Alternative Hypothesis Null Hypothesis user to specify the number of
clusters k to be generated. clusters k to be generated.
50 3 K means clustering uses which approach to solve the problems? A. Expectation-maximization B. Greedy Approach C. Divide and Conquer D. None Ans: expectation-maximization technique is used by k means clustering.
How many factors affect the The power of a hypothesis test is
51 3 3 2 1 4
power of a hypothesis test? affected by three factors.
Law of variance is also called
52 3 Law of Variance is called? Eve's Law Laplace Law Poisson's Algorithm Regression
Eve's law.
53 3 K-Medoids use which approach to solve problems? A. Greedy Approach B. Divide and Conquer C. Recursive D. None Ans: K Medoids use greedy approach to solve problems
The time complexity of k Time complexity is O(n^2) of k
54 3 O(n^2) O(nlogn) O(n) O(1)
means clustering is? means clustering.
55 3 The number (k) of clusters assumed in k-medoids is known as? A. Priori B. Null Hypothesis C. ANOVA D. Effect size Ans: The number k of clusters is assumed known a priori.
The effect size is the difference
What is the difference between
between the true value and the
56 3 the true value and the value Effect -size Null Hypothesis Alternative Hypothesis ANOVA
value specified in the null
specified in the null hypothesis.
hypothesis.
57 3 Time complexity of k medoids is? A. O(n^2) B. O(nlogn) C. O(n) D. O(n^3) Ans: This is an explanation.
58 3 Which algorithm aims at minimizing an objective function known as squared error function A. K-means B. Mean Difference C. Alternative Hypothesis D. ANOVA Ans: K means algorithm aims at minimizing an objective function known as squared error function
59 1 Which algorithm was the earliest of the association rule algorithms? A. Apriori Algorithm B. Gaussian Algorithm C. K means clustering D. Bernoulli Distribution. Ans: Apriori Algorithm was the earliest of the association rule algorithms.
The Apriori algorithm takes The Apriori algorithm takes a
a______ iterative approach to bottom-up iterative approach to
60 1 uncovering the frequent Bottom-Up Top-Down Recursive None uncovering the frequent itemsets
itemsets by first determining all by first determining all the
the possible items possible items
61 1 Apriori uses which structure to count candidate item sets efficiently? A. BFS B. DFS C. Queue D. Stack Ans: Apriori uses breadth-first search and a Hash tree structure to count candidate item sets efficiently
62 1 "y=a+b*x^2". This equation shows which regression? A. Polynomial Regression B. Logistic Regression C. Linear Regression D. Lasso Regression Ans: This is an explanation.
63 2 __________ is defined as the measure of certainty or trustworthiness associated with each discovered rule. A. Confidence B. Recursion C. Item-set D. None Ans: Confidence is defined as the measure of certainty or trustworthiness associated with each discovered rule.
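Confidence, as named in the row above, is commonly computed for a rule X → Y as the number of transactions containing both X and Y divided by the number containing X. The sketch below assumes that standard definition; the function name `confidence` and the sample baskets are hypothetical.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent over a list of item sets."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    if with_antecedent == 0:
        raise ValueError("antecedent never occurs in the transactions")
    with_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    return with_both / with_antecedent


# Hypothetical market baskets, not taken from the question bank.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(round(confidence(baskets, {"bread"}, {"milk"}), 3))  # 0.667
```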
In which Regression, we In Logistic Regression, we
64 2 Logistic Regression Linear Regression Both None
predict the value by 1 or 0? predict the value by 1 or 0.
The formula for linear The formula for linear regression
65 2 Y’ = bX+A Y’ = bX - A. Y’ = bX /A. Y’ = bX * A.
regression is: is: Y’ = bX + A.
Which regression is useful PLS regression is also useful
Partial Least Squares(PLS)
66 2 when there are a large number Cox Regression Lasso Regression Logistic Regression when there are a large number of
Regression
of independent variables. independent variables.
67 2 Which regression is an approach for predicting a response using a single feature. A. Linear-Regression B. Logistic Regression C. Elasticnet Regression D. None Ans: Simple linear regression is an approach for predicting a response using a single feature.
Association rule mining consists Association rule mining consists
68 2 2 3 4 5
of _______ steps. of 2 steps
69 2 Which type of regression is suitable when dependent variable is ordinal in nature? A. Ordinal Regression B. Linear Regression C. Cox Regression D. Logistic Regression Ans: Ordinal regression is suitable when dependent variable is ordinal in nature
Which regression is used for ElasticNet regression is used for
70 2 ElasticNet Regression Linear Regression Logistic Regression None
support vector machines support vector machines,
71 2 Which regression can solve both linear and non-linear models? A. Support Vector Regression B. Linear Regression C. Logistic Regression D. ElasticNet Regression Ans: Support Vector Regression can solve both linear and non-linear models.
Which is the most common Least Square Method is the most
72 2 method used for fitting a Least Square Method Mean Difference Null Hypothesis Classification common method used for fitting
regression line a regression line
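The least squares method named in the row above fits a line Y' = bX + A by minimising the sum of squared residuals. The following sketch uses the closed-form solution for a single feature; the function name `fit_line` and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line y = b*x + a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return b, a


# Hypothetical points lying exactly on y = 2x + 1.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```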
_______problems are when A regression problem is when
73 2 the output variable is a real or Regression Classification Recursive Hypothesis the output variable is a real or
continuous value. continuous value.
Linear Regression is a machine
Linear Regression is a machine
learning algorithm based on
74 2 Supervised Learning Unsupervised Learning Recursive Learning All learning algorithm based on
______ learning regression
supervised regression algorithm.
model.
75 2 When dependent variable's variability is not equal across values of an independent variable, it is called A. Heteroscedasticity B. Homoscedasticity C. Multicollinearity D. Outliers. Ans: When dependent variable's variability is not equal across values of an independent variable, it is called heteroscedasticity
_________requires large Logistic Regression requires
sample sizes because maximum large sample sizes because
76 2 likelihood estimates are less Logistic Regression Linear Regression Lasso Regression ElasticNet Regression maximum likelihood estimates
powerful at low sample sizes are less powerful at low sample
than ordinary least square sizes than ordinary least square
PCR Regression is divided into PCR regression is divided into 2
77 2 2 3 4 5
how many steps? steps
78 3 L2 regularization is also called? Tikhonov Regularization Norm Regularization Poisson's Regularization None This is an explanation.
When the variance of count When the variance of count data
79 3 data is greater than the mean Overdispersion Underdispersion Dispersion High dispersion is greater than the mean count, it
count, it is a case of? is a case of overdispersion
80 3 Which regression assumes the normal distribution of the dependent variable? A. Linear-Regression B. Logistic Regression C. Elasticnet Regression D. None Ans: Linear regression assumes the normal or gaussian distribution of the dependent variable.
Nature of predicted data in Nature of predicted data in
81 3 Ordered Unordered Both None
regression is? regression is ordered.
82 3 Which regression uses a binary dependent variable but ignores the timing of events. A. Logistic Regression B. Linear Regression C. Cox Regression D. Lasso Regression Ans: Logistic regression uses a binary dependent variable but ignores the timing of events.
The Ridge Regression is also The ridge regression is also
83 3 Shrinkage Regression Percentile Regression Elasticnet Regression Lasso Regression
known as? known as Shrinkage Regression.
84 3 In which regression, we calculate Root Mean Square Error(RMSE) to predict the next weight value. A. Linear-Regression B. ElasticNet Regression C. Logistic Regression D. All Ans: In Linear Regression we calculate Root Mean Square Error(RMSE) to predict the next weight value.
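Root Mean Square Error, mentioned in the row above, is the square root of the mean of the squared differences between observed and predicted values. A minimal sketch follows; the function name `rmse` and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observations and predictions: errors are 1, 0 and -2.
print(round(rmse([3, 5, 7], [2, 5, 9]), 3))  # 1.291
```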
85 3 The______ is the standard deviation of the observed residuals. A. Residual standard error B. Mean Difference Error C. Data Error D. All Ans: The residual standard error is the standard deviation of the observed residuals.
86 3 Which Regression is used when dependent variable has count data. A. Poisson Regression B. Linear Regression C. Cox Regression D. Lasso Regression Ans: Poisson regression is used when dependent variable has count data.
87 3 ________________regression can handle both over-dispersion and under-dispersion. A. Quasi-Poisson regression B. Cox Regression C. Elasticnet Regression D. Linear Regression Ans: Quasi-Poisson regression can handle both over-dispersion and under-dispersion.
___ is the regularization
λ is the regularization parameter
88 3 parameter in Lasso λ θ Ω β
in lasso regression.
Regression?
89 1 Decision Tree is a hierarchical model that does the separation of the input space into class regions using: A. Recursion B. Pointers C. Greedy Approach D. Divide and Conquer Ans: Decision Tree is a hierarchical model that recursively does the separation of the input space into class regions
Learning Algorithm of Decision Decision Tree uses greedy
90 1 Greedy Approach Divide and Conquer Both None
Tree is: approach for learning algorithm.
91 1 Normal Distribution is also called? A. Gaussian Distribution B. Bernoulli Distribution C. Naïve Bayes D. Binary Distribution Ans: This is an explanation.
92 1 Classification has how many phases: A. 2 B. 3 C. 4 D. 5 Ans: There are 2 phases of classification.
93 1 "Every pair of features being classified is independent of each other". This principle is used by: A. Naïve Bayes Classifier B. Decision Tree C. Bernoulli Distribution D. Normal Distribution Ans: Naïve Bayes uses the principle that every pair of features being classified is independent of each other.
94 2 This equation is of which theorem? A. Gaussian Distribution B. Binary Distribution C. Naïve Bayes D. Cross-Entropy Ans: This is an explanation.
95 2 In Naïve Bayes, the datasets are divided into how many types? A. 2 B. 3 C. 4 D. 5 Ans: Data sets are divided into two types in Naïve Bayes.
96 2 Decision trees that can be used to predict non-categorical values are called? A. Regression Trees B. Categorical trees C. Normal tree D. None Ans: Decision trees that can be used to predict non-categorical values are called regression trees
An attribute with____Gini
an attribute with lower Gini index
97 2 index should be preferred in a Lower Higher Recursive Negative
should be preferred.
decision tree.
98 2 In Naïve Bayes, if any two events A and B are independent, then, A. P(A,B)=P(A)P(B) B. P(A,B)=P(A)/P(B) C. P(A,B)=P(B) D. P(A,B)=P(B)/P(A) Ans: If any two events A and B are independent, then P(A,B)=P(A)P(B)
What is the measure of
Entropy is the measure of
99 2 uncertainty of a random Entropy. Gain Gini Index None
uncertainty of a random variable
variable in a decision tree.
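Entropy as a measure of uncertainty, named in the row above, is usually computed for a set of class labels as the negative sum of p * log2(p) over the class proportions. The sketch below assumes that base-2 definition; the function name `entropy` and the sample labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    proportions = [count / n for count in Counter(labels).values()]
    return -sum(p * math.log2(p) for p in proportions)


# Hypothetical node with an even class split: maximum uncertainty for two classes.
print(entropy(["yes", "no", "yes", "no"]))  # 1.0
```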
100 2 Which of the following is not true for decision trees? A. Stable B. Easy to understand C. Easy to explain D. Easy to evaluate. Ans: This is an explanation.
Decision tree algorithm falls Decision tree algorithm falls
101 2 under the category of which Supervised Unsupervised Regression Classification under the category of supervised
learning? learning
False Positives and False One of the use Bayes Theorem is
102 2 Negatives is an application of Bayes' Theorem Binary Distribution Bernoulli Distribution Normal Distribution false positives and false
which theorem? negatives.
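The link between Bayes' Theorem and false positives in the row above can be made concrete with a small calculation. This sketch computes the probability that a condition is present given a positive test result; the function name `posterior` and all the rates used are hypothetical illustration values.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    # Total probability of a positive result: true positives plus false positives.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Assumed rates: 1% prevalence, 90% sensitivity, 5% false positive rate.
print(round(posterior(0.01, 0.9, 0.05), 3))  # 0.154
```

With these assumed rates most positive results are false positives, because the condition itself is rare.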
Decision Tree used in mining
There are 2 types of decision
103 2 the data are of how many 2 3 4 5
trees used in data mining.
types?
In Bayes' Theorem, P(A) and
P(A) and P(B) are the
P(B) are the probabilities of
probabilities of observing A and
104 3 observing A and B Marginal Probability Normal Distribution Bernoulli Distribution Parallel Algorithm.
B respectively; they are known
respectively; they are known OptimusPrime Page 99
as the marginal probability.
as:
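As a worked illustration of Bayes' Theorem applied to false positives, the sketch below computes the probability that a condition is present given a positive test. The function name `posterior` and all the numbers (1% prior, 99% sensitivity, 5% false positive rate) are assumptions chosen for the example, not values from the question bank.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) using Bayes' Theorem."""
    # Marginal probability of a positive test, P(B):
    # true positives plus false positives.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # P(A | B) = P(B | A) * P(A) / P(B)
    return sensitivity * prior / p_positive


# Hypothetical test: 1% of people have the condition,
# the test detects 99% of them and wrongly flags 5% of the rest.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 3))  # about 0.167
```

Even with an accurate test, most positives here are false positives because the marginal probability P(A) of the condition is small.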
105 (3 marks) ID3 Algorithm in a decision tree stands for?
A : Iterative Dichotomiser 3 (ID3)
B : Interval Driven
C : Interconnected Decision
D : None
Ans : ID3 stands for Iterative Dichotomiser 3.
106 (3 marks) Probably the best way of estimating performance for very small data sets is:
A : Bootstrapped Method
B : Normal Distribution
C : Naïve Bayes
D : Binary Distribution
Ans : Probably the best way of estimating performance for very small data sets is the bootstrapped method.
107 (3 marks) The Decision Tree works on which form?
A : Disjunctive Normal Form
B : Product of Sum
C : Bijective Form
D : Conjunctive Form
Ans : The Decision Tree works on Disjunctive Normal Form.
108 (3 marks) The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a ________ distribution.
A : 1-D
B : 2-D
C : 3-D
D : None
Ans : The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution.
109 (3 marks) Theoretical concept to evaluate classifiers is:
A : COLT
B : PAC Model
C : Naïve Bayes
D : Prediction
Ans : This is an explanation.
110 (3 marks) ____________ is a metric to measure how often a randomly chosen element would be incorrectly identified.
A : Gini Index
B : Entropy
C : Pointer
D : Cross-Entropy
Ans : Gini Index is a metric to measure how often a randomly chosen element would be incorrectly identified.
111 (3 marks) The most notable types of decision tree algorithms are:
A : 3
B : 2
C : 1
D : 4
Ans : There are 3 most notable types of decision tree algorithms.
112 (3 marks) Which process is completed when the subset at a node all has the same value of the target variable?
A : Recursive Partitioning
B : Termination
C : Transformation
D : Prediction
Ans : Recursive partitioning is completed when the subset at a node all has the same value of the target variable.
113 (3 marks) The _______ method reserves a certain amount for testing and uses the remainder for training.
A : Holdout
B : Parallel Algorithm
C : Naïve Bayes
D : Normal Distribution
Ans : The holdout method reserves a certain amount for testing and uses the remainder for training.
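The holdout method described above can be sketched in a few lines of standard-library Python. The helper `holdout_split`, the 30% test fraction and the fixed seed are illustrative assumptions; the question bank does not prescribe any of them.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records, reserve a fraction for testing,
    and return (training_set, test_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Ten hypothetical records: 3 are held out, 7 are used for training.
train, test = holdout_split(range(10), test_fraction=0.3)
print(len(train), len(test))
```

Measuring the error on `test` rather than on `train` avoids the optimistic resubstitution error that comes from evaluating on the training data.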
114 (3 marks) This equation is of which theorem?
A : Bayes' Theorem
B : Normal Distribution
C : Bernoulli Distribution
D : Cross-Entropy
Ans : This is an explanation.
115 (3 marks) "Independence among the features". This is an assumption in:
A : Naïve Bayes Classifier
B : Bernoulli Distribution
C : Parallel Algorithm
D : Binary Distribution
Ans : Independence among the features is an assumption in Naïve Bayes.
116 (3 marks) Error rate obtained from training data is called:
A : Resubstitution Error
B : Grid
C : Gini Index
D : True Error
Ans : The error rate obtained from training data is called resubstitution error.
117 (3 marks) In Decision Tree, entropy is __________ to content.
A : Proportional
B : Inverse
C : High
D : Less
Ans : This is an explanation.
118 (3 marks) In Decision Tree, no root-to-leaf path should contain the same discrete attribute ____________.
A : Twice
B : Once
C : Thrice
D : Four Times
Ans : No root-to-leaf path should contain the same discrete attribute twice.
119 (1 mark) Using _________, designers can make information understandable for stakeholders.
A : Data Visualization
B : Classification
C : Regression
D : Supervised Learning
Ans : Using data visualization methods, designers can make information understandable for stakeholders.
120 (1 mark) The additional visual methods include:
A : All
B : Tree Map
C : Parallel Coordinates
D : Semantic Networks
Ans : This is an explanation.
121 (1 mark) Data visualization tools do not include:
A : MS Excel
B : Tableau
C : Power BI
D : Jupyter
Ans : This is an explanation.
122 (1 mark) Which of the following visualization tools requires JavaScript knowledge to run?
A : All
B : Chart.js
C : Polymap
D : Sigmajs
Ans : This is an explanation.
123 (1 mark) Merits of Tableau do not include which factor:
A : Cost
B : Performance
C : Usage
D : Computation
Ans : Merits of Tableau do not include the cost factor.
124 (1 mark) Which of these is not a type of Big Data Visualization?
A : Pictograph
B : Bar-Graph
C : Line-Chart
D : Pie-Chart
Ans : This is an explanation.
125 (2 marks) The drag-and-drop editor of which tool makes it easy to create professional-looking designs without a lot of visual design skill?
A : Infogram
B : Google Chart
C : Tableau
D : Grafana
Ans : The drag-and-drop editor of Infogram makes it easy to create professional-looking designs without a lot of visual design skill.
126 (2 marks) How many V's are defined for Data Visualization?
A : 4
B : 6
C : 2
D : 3
Ans : There are 4 V's of data visualization.
127 (2 marks) Which of the following is not a free Data Visualization tool?
A : Tableau
B : Google Chart
C : Jupyter
D : HubSpot CRM
Ans : Tableau is a paid data visualization tool.
128 (2 marks) Companies that work with both traditional and big data use which technique to look at customer segments or market shares?
A : Pie-Chart
B : Bar-Graph
C : Stream graph
D : Line-Chart
Ans : Companies that work with both traditional and big data may use a pie chart to look at customer segments or market shares.
129 (2 marks) Visualization of data includes which of the following problems:
A : All
B : Information Loss
C : Visual Noise
D : Large Image Perception
Ans : This is an explanation.
130 (2 marks) Mainly, Data Visualization has how many types of challenges?
A : 5
B : 6
C : 4
D : 2
Ans : There are 5 main challenges to data visualization.
131 (2 marks) Which tool uses HTML5/SVG to visualize data?
A : Google Charts
B : Jupyter
C : Grafana
D : Tableau
Ans : Google Charts uses HTML5/SVG since it is browser compatible.
132 (2 marks) According to Colin Ware’s Information Visualization: Perception for Design, he defines _____ pre-attentive visual properties.
A : 4
B : 2
C : 1
D : 3
Ans : According to Colin Ware’s Information Visualization: Perception for Design, he defines four pre-attentive visual properties.
133 (2 marks) _____ is based on space-filling visualization of hierarchical data.
A : Tree-Map
B : Stream graph
C : Bar-graph
D : Line-Chart
Ans : The tree map method is based on space-filling visualization of hierarchical data.
134 (2 marks) Which graph shows the dependency relationships between activities and current schedule status?
A : Gantt-Chart
B : Line-Chart
C : Pie-Chart
D : Bar-Graph
Ans : A Gantt chart shows the dependency relationships between activities and current schedule status.
135 (2 marks) Another name for distribution-free data is:
A : Non-parametric data
B : Parametric data
C : Static data
D : Dynamic data
Ans : Non-parametric data is also called distribution-free data.
136 (2 marks) Which chart is used for comparison of values, such as sales performance for several persons or businesses in a single time period?
A : Bar-Graph
B : Gantt-Graph
C : Line-Chart
D : Pie-Chart
Ans : A bar graph is used for comparison of values, such as sales performance for several persons or businesses in a single time period.
137 (2 marks) _____________ are graphics in the field of statistics used to visualize quantitative data.
A : Graphical Techniques
B : Line-Chart
C : Regression
D : Classification
Ans : Graphical techniques are graphics in the field of statistics used to visualize quantitative data.
138 (2 marks) _____ can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion.
A : Parallel Coordinates
B : Stream graph
C : Google Chart
D : Jupyter
Ans : Parallel Coordinates can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion.
139 (3 marks) Chart.js provides how many types of charts?
A : 8
B : 5
C : 3
D : 6
Ans : This is an explanation.
140 (3 marks) Which visualization tool supports mixed data sources, annotations, and customizable alert functions, and can be extended via hundreds of available plugins?
A : Grafana
B : Tableau
C : Google Chart
D : Jupyter
Ans : Grafana supports mixed data sources, annotations, and customizable alert functions, and it can be extended via hundreds of available plugins.
141 (3 marks) Which tool was created specifically for adding charts and maps to news stories?
A : Datawrapper
B : Tableau
C : Google Chart
D : Jupyter
Ans : Datawrapper was created specifically for adding charts and maps to news stories.
142 (3 marks) Conventional visualization methods do not include:
A : Mekko Chart
B : Pie-Chart
C : Bar-graph
D : Histogram
Ans : Mekko chart is a new technique to visualize data.
143 (3 marks) _____________ is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape.
A : Streamgraph
B : Bar-Graph
C : Pie-Chart
D : Line-Chart
Ans : Streamgraph is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape.
144 (3 marks) Which visual tool includes over 150 chart types and 1,000 map types?
A : FusionCharts
B : Tableau
C : Google Chart
D : Jupyter
Ans : FusionCharts includes over 150 chart types and 1,000 map types.
145 (3 marks) Which graph/chart is a graphical representation of logical relationship between different concepts? It generates a directed graph: the combination of nodes or vertices, edges or arcs, and a label over each edge.
A : Semantic Networks
B : Bar-Graph
C : Pie-Chart
D : Line-Chart
Ans : A semantic network is a graphical representation of logical relationship between different concepts. It generates a directed graph: the combination of nodes or vertices, edges or arcs, and a label over each edge.
146 (3 marks) According to SAS, we can process only ______ of information per second on a flat screen.
A : 1 Kilobit
B : 1 Byte
C : 1 Bit
D : 1 MB
Ans : According to SAS, we can process only 1 kilobit of information per second on a flat screen.
147 (3 marks) There are ____ steps for interactive data visualization:
A : 4
B : 5
C : 3
D : 6
Ans : This is an explanation.
148 (3 marks) When working with big data, companies can use which visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.?
A : Line-Chart
B : Bar-Graph
C : Pie-Chart
D : Stream graph
Ans : When working with big data, companies can use the line chart visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.
149 (1 mark) Which of the following enterprises use HBase?
A : All
B : Facebook
C : Netflix
D : Adobe
Ans : This is an explanation.
150 (1 mark) Which NLP is used in the present era?
A : Neural NLP
B : Symbolic NLP
C : Statistical NLP
D : None
Ans : Since 2010, neural NLP has been in use.
151 (1 mark) The Computer World magazine states that unstructured information might account for more than ______ of all data in organizations.
A : 70-80%
B : 90%
C : 50%
D : 60%
Ans : The Computer World magazine states that unstructured information might account for more than 70%–80% of all data in organizations.
152 (1 mark) Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely or partly ___________.
A : Unstructured
B : Structured
C : Semantic
D : None
Ans : Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely or partly unstructured.
153 (1 mark) Which standard provided a common framework for processing information to extract meaning and create structured data about the information?
A : Unstructured Information Management Architecture (UIMA)
B : Management Architecture for Data
C : Data Architecture
D : None
Ans : The Unstructured Information Management Architecture (UIMA) standard provided a common framework for processing this information to extract meaning and create structured data about the information.
154 (2 marks) The base Apache Hadoop framework is composed of how many modules?
A : 4
B : 2
C : 3
D : 6
Ans : The base Apache Hadoop framework is composed of four modules.
155 (2 marks) NoSQL does not include which software?
A : MS-SQL
B : HBase
C : DynamoDB
D : MongoDB
Ans : This is an explanation.
156 (2 marks) There are _______ main types of OLAP systems.
A : 3
B : 2
C : 5
D : 6
Ans : There are 3 types of OLAP systems.
157 (2 marks) The SQL alternative in Apache Hive is called?
A : HiveQL
B : BASEQL
C : SPARK-QL
D : H-QL
Ans : HiveQL is the alternative to SQL in the Apache Hive family.
158 (2 marks) A MapReduce program executes in how many stages?
A : 3
B : 2
C : 5
D : 4
Ans : A MapReduce program executes in three stages.
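The three stages are usually named map, shuffle and reduce. The sketch below imitates them in a single Python process with a word count; it is only an illustration of the data flow and does not use Hadoop or any Hadoop API. The function names and the sample lines are assumptions made for the example.

```python
from collections import defaultdict


def map_stage(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_stage(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_stage(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data", "big analytics"]  # hypothetical input split
counts = reduce_stage(shuffle_stage(map_stage(lines)))
print(counts)
```

In a real cluster the map and reduce functions run on different nodes and the framework performs the shuffle; here the three functions are simply chained.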
159 (2 marks) How many types of NoSQL database are there?
A : 4
B : 3
C : 2
D : 6
Ans : There are 4 types of NoSQL databases.
160 (2 marks) MapReduce is a processing technique and a program model for distributed computing based on which programming language?
A : Java
B : Python
C : C++
D : R
Ans : MapReduce is a processing technique and a program model for distributed computing based on Java.
161 (2 marks) Hive supports how many properties of transactions?
A : 4
B : 3
C : 2
D : 1
Ans : Hive supports all four properties of transactions.
162 (2 marks) HDFS consists of only one NameNode, which is called the?
A : Master Node
B : Slave Node
C : Both
D : None
Ans : HDFS consists of only one NameNode, which is called the Master Node.
163 (2 marks) Which Apache software is needed to process massive amounts of data for the purposes of natural-language search?
A : Apache HBase
B : Apache Spark
C : Apache Pig
D : Apache Mahout
Ans : HBase is used to process massive amounts of data for the purposes of natural-language search.
164 (2 marks) Which databases store data in a format other than relational tables?
A : NoSQL
B : HIVESQL
C : SPARK-QL
D : H-QL
Ans : NoSQL databases store data in a format other than relational tables.
165 (2 marks) Which is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra?
A : Apache Mahout
B : Apache Spark
C : Apache Pig
D : Apache HBase
Ans : Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra.
166 (2 marks) Which model is a specialization of the split-apply-combine strategy for data analysis?
A : MapReduce
B : Hadoop
C : HBase
D : Hive
Ans : The MapReduce model is a specialization of the split-apply-combine strategy for data analysis.
167 (2 marks) All Hadoop commands are invoked by which command?
A : $HADOOP_HOME/bin/hadoop
B : $HADOOP/bin/hadoop
C : $HADOOP_HOME/hadoop
D : $HADOOP_HOME/bin
Ans : All Hadoop commands are invoked by the $HADOOP_HOME/bin/hadoop command.
168 (3 marks) The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called?
A : Schema on Write
B : Schema on Read
C : Schema for Read Write
D : None
Ans : The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called schema on write.
169 (3 marks) Which command formats the DFS filesystem?
A : Namenode -format
B : Node -format
C : Name -format
D : Format
Ans : The Namenode -format command formats the DFS file system.
170 (3 marks) Which command applies the offline fsimage viewer to an fsimage?
A : oiv
B : fs
C : fc
D : ov
Ans : oiv applies the offline fsimage viewer to an fsimage.
171 (3 marks) Hadoop requires which Java Runtime Environment (JRE) or higher version?
A : 1.6
B : 1.2
C : 1.5
D : 1
Ans : Hadoop requires Java Runtime Environment (JRE) 1.6 or higher.
172 (3 marks) Every DataNode sends a Heartbeat message to the NameNode every ____ seconds and conveys that it is alive.
A : 3
B : 2
C : 4
D : 1
Ans : Every DataNode sends a Heartbeat message to the NameNode every 3 seconds and conveys that it is alive.
173 (3 marks) HDFS can store files up to:
A : 1 TB
B : 1 GB
C : 1 ZB
D : 1 PB
Ans : HDFS can store up to 1 TB of files.
174 (3 marks) Which of the following is a wide-column store?
A : HBase
B : SQL
C : DynamoDB
D : MongoDB
Ans : HBase is a popular wide-column store.
175 (3 marks) Which node acts as both a DataNode and TaskTracker in Hadoop?
A : Slave Node
B : Data Node
C : Admin Node
D : Name Node
Ans : A slave or worker node acts as both a DataNode and TaskTracker.
176 (3 marks) The HDFS system uses which protocol for communication?
A : TCP/IP
B : TCP
C : UDP
D : IP
Ans : The HDFS system uses TCP/IP sockets for communication.
177 (3 marks) HDFS has how many services?
A : 5
B : 4
C : 2
D : 6
Ans : HDFS has five services.
178 (3 marks) ____________ is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
A : Apache Hive
B : Apache Spark
C : Apache Pig
D : Apache HBase
Ans : Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 3. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
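The array attributes listed in the options above can be inspected directly. The following is a minimal sketch that assumes NumPy is installed; the array contents and the variable name `a` are arbitrary choices for the illustration.

```python
import numpy as np

# 21 integers from 10 to 30 inclusive, arranged as 3 rows by 7 columns.
a = np.arange(10, 31).reshape(3, 7)

print(a.ndim)   # number of dimensions (axes) of the array
print(a.shape)  # length of the array along each axis
print(a.size)   # total number of elements
print(a.dtype)  # element type (platform dependent for integers)
```

Note that `np.arange(10, 31)` excludes the stop value, so 31 is needed to include 30.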
Q.no 4. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : ndarray
B : spatial
C : ndimage
D : special
A : Single point
B : Line
C : 2-D Plane
A : Text files
B : Satellite data
C : Sensor data
D : Seismic imagery data
A : Matlab
B : Scilab
C : Scipy
D : Numpy
Q.no 9. The procedure to organize items of a given collection into groups based on
some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. ------------- function is used to plot a histogram using matplotlib library
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 13. Which of the following is a measure used in decision trees when selecting
the splitting criterion that partitions the data in the best possible manner?
A : Probability
B : Gini Index
C : Regression
D : Association
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 16. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : prod()
B : mult()
C : dot()
D : *
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 20. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : 0
B : -1
C : 1
D : -2
Q.no 22. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : KNN
C : Decision trees
D : Cluster analysis
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 26. ---------- function is used to get the element-wise remainder of division of arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
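The three association-rule measures in the options above can be computed by hand on a toy example. This is a sketch under stated assumptions: the four transactions are invented, and `support`, `confidence` and `lift` are hypothetical helper names, not functions from any library.

```python
# Hypothetical market-basket data: each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds
    among transactions that contain the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent) / support(consequent)


print(support({"bread", "milk"}))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}))  # 2 of the 3 bread transactions
print(lift({"bread"}, {"milk"}))
```

Support describes how frequent an itemset is, while confidence describes how often the rule turns out to be true when its antecedent is present.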
Q.no 28. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 30. Pandas provides ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : NoSQL data
B : YouTube data
A : EPS
B : PDF
C : PNG
D : PS
Q.no 36. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 38. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 39. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 42. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 43. --------- is a technique that duplicates the smaller array so that its
dimensionality and size match those of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
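Broadcasting, as described in the question above, can be seen in a short sketch. It assumes NumPy is installed; the array values and variable names are made up for the illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)

# The smaller array is stretched along the missing axis so that
# it behaves as if it had the shape of the larger one.
result = matrix + row

# broadcast_to shows the stretched view that the addition uses.
stretched = np.broadcast_to(row, matrix.shape)

print(result)
print(stretched)
```

No copy of `row` is actually made for the addition; NumPy reuses the same values for each row of `matrix`.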
Q.no 44. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 45. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 46. ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 47. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 49. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 52. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 53. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 56. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. Catalog design is a complex process where the selection of items in a
business's catalog is often designed so that items complement each other and buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 59. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 2. ----------- data that depends on data model and resides in a fixed field within
a record.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 5. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
Q.no 9. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 10. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 11. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Un- Supervised
B : Supervised
C : Association
D : correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : KNN
C : Regression
D : Decision Tree
A : Classification
B : Regression
C : Clustering
D : Association
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
Q.no 19. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 20. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 21. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 22. ------- is the basic data structure of pandas and can be thought of as an SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 23. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
A : 1
B : -1
C : 0
D : 2
Q.no 25. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 26. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 27. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : KNN
C : Regression
D : Cluster analysis
Q.no 29. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
A : Java
B : Ruby
C : R
D : None of these
A : 0
B : -1
C : 1
D : -2
Q.no 33. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 34. -----------is not one of the key data science skill.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 37. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 38. ------------------- is a one-dimensional array defined in pandas that can be
used to store any data type.
A : Dict
B : series
C : ndarray
D : list
Q.no 39. To read image from a file into an array --------------- function is used.
A : matplotlib.pyplot.imshow()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 42. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 43. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 45. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 46. Which function from numpy is used to return the truncated value of the
input element-wise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 47. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 49. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 51. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
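For the simplest case of one independent variable, the coefficient of correlation can be computed directly from its definition. The sketch below is an illustration only: the helper name `correlation_coefficient` and the sample values are assumptions, and the question above concerns a set of independent variables, which this single-variable example does not cover.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: y rises exactly in step with x, then falls with it.
r_up = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])

print(r_up, r_down)  # close to +1 and -1
print(r_up ** 2)     # squaring gives the coefficient of determination
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate no linear correlation.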
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 53. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
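Subplot numbers run from 1 at the top-left cell, row by row, up to nrows*ncols. The sketch below reproduces that numbering in plain Python so it can be checked without matplotlib; `subplot_position` is a hypothetical helper written for this illustration, not a matplotlib function.

```python
def subplot_position(nrows, ncols, plot_number):
    """Return the zero-based (row, column) of a subplot number,
    counting from 1 at the top-left and moving left to right."""
    if not 1 <= plot_number <= nrows * ncols:
        raise ValueError("plot_number must be between 1 and nrows*ncols")
    return divmod(plot_number - 1, ncols)


# In a 2 x 3 grid, number 3 is the last cell of the first row,
# which is the top-right corner of the figure.
print(subplot_position(2, 3, 3))
# Number 4 starts the second row at the left edge.
print(subplot_position(2, 3, 4))
```

The same reasoning explains the shorthand `subplot(234)`: it means a grid of 2 rows and 3 columns with plot number 4.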
Q.no 56. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Answer for Question No 1. is c
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 2. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 3. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 4. To save or write dataframe data into csv file -------- function is used
A : write_csv()
B : write_file()
C : csv_read()
D : to_csv()
A : Regression
B : Decision trees
C : KNN
D : SVM
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 9. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 11. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 14. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 15. ------------ type of plot shows all individual data points without connecting
them with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 16. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. Which of the following measures is used in decision trees when selecting
the splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 19. -------------- charts represent categorical data with rectangular bars
A : Bar
B : Line
C : Scatter
D : Histogram
A : Random
B : sequential
C : Same
Q.no 21. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
Q.no 22. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
Q.no 23. ------ is a classification technique that relies on the naïve assumption that
input variables are independent of each other.
A : KNN
B : Naïve Bayes
C : Regression
Q.no 24. ----------- phase of the data analytics lifecycle usually takes the longest
time.
A : Data Preparation
B : Model Planning
C : Model Building
D : Communicate Results
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 28. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
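The array constructors named in the two questions above can be compared side by side. This is a small sketch; the variable names are illustrative only.

```python
import numpy as np

# A tuple shape gives a 2-D array; a bare integer gives a 1-D array.
ones_2d = np.ones((5, 5))   # 5 x 5 array filled with 1.0
ones_1d = np.ones(5)        # 1-D array of five 1.0 values

# identity(n) returns an n x n array with ones on the main diagonal.
identity = np.identity(3)

# eye(n) gives the same result for a square array.
eye = np.eye(3)

print(ones_2d.shape, ones_1d.shape)
print(identity)
```

Note that `numpy.eye` expects integer arguments, so passing a tuple as in option D of Q.no 27 is not a valid call.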
Q.no 29. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 31. ---------- function is used to add two numpy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
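A short sketch of elementwise addition with `numpy.add`, using made-up input arrays.

```python
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([10, 20, 30])

# numpy.add adds the arrays position by position; x1 + x2 is equivalent.
total = np.add(x1, x2)
print(total)
```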
Q.no 32. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 33. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Website data
B : YouTube data
Q.no 36. -----------is not one of the key data science skill.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : EPS
B : PDF
C : PNG
D : PS
Q.no 38. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
C : Measures growth
Q.no 40. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 41. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
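The effect of the `how` argument of `pandas.merge` can be seen on two small dataframes. This is a sketch; the frames and column names are hypothetical.

```python
import pandas as pd

# Hypothetical frames sharing the key column "id".
left = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [50, 60, 70]})

# how="inner" keeps only keys present in both frames.
inner = pd.merge(left, right, on="id", how="inner")

# how="outer" keeps keys from either frame, filling gaps with NaN.
outer = pd.merge(left, right, on="id", how="outer")

print(inner)
print(outer)
```

The other accepted values include `"left"` and `"right"`, which keep all keys from one side only.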
Q.no 42. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 43. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 44. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 46. Which of the following statements will create axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
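A sketch of how `subplot(nrows, ncols, index)` numbers its cells, assuming the non-interactive Agg backend so that it runs without a display. Cells are counted row by row from the top left, so in a 2 x 3 grid index 3 is the last cell of the first row.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

fig = plt.figure()

# 2 rows x 3 columns; index 3 is the third cell of the first row.
ax = plt.subplot(2, 3, 3)

# get_position() returns the axes box in figure coordinates (0..1),
# with the origin at the bottom left of the figure.
pos = ax.get_position()
print(pos.x0, pos.y0)
```

Both coordinates come out above 0.5 with the default layout, which places the axes in the top-right part of the figure.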
Q.no 47. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
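A minimal sketch of `DataFrame.pipe()`, which passes the whole dataframe to a user-supplied function. The helper `add_bonus` and the sample data are hypothetical.

```python
import pandas as pd

def add_bonus(frame, points):
    """Hypothetical custom operation: add a fixed value to every cell."""
    return frame + points

df = pd.DataFrame({"m1": [10, 20], "m2": [30, 40]})

# pipe() calls add_bonus(df, 5) and returns its result.
result = df.pipe(add_bonus, 5)
print(result)
```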
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 49. Which of the following algorithms is used in economics, finance, biology,
etc. to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 50. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 51. The ---------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 52. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
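A short sketch contrasting slicing with indexing on a numpy array. The array contents are illustrative.

```python
import numpy as np

a = np.arange(10)  # array of the integers 0 to 9

# Slicing extracts a set of elements: start 2, stop 7 (exclusive), step 2.
part = a[2:7:2]

# Indexing picks a single element.
single = a[3]

print(part, single)
```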
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 54. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 57. Catalog design is a complex process in which the items in a
business's catalog are often selected to complement each other, so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
B : YouTube data
D : Sensor data
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 7. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
Q.no 8. ---------- function is used to get the positive square root of a numpy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : imsave()
B : imread()
C : read()
D : None of these
Q.no 10. Numpy support this function to find trigonometric sine elementwise .
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 12. In numpy array , array indices always starts from --------
A : 1
B : -1
C : 0
D : 2
Q.no 13. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 14. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : Classification
B : Regression
C : Clustering
D : Association
Q.no 16. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 20. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 21. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
A : NoSQL data
B : YouTube data
C : Text File data
Q.no 23. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
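Information gain can be sketched in plain Python as the entropy of the parent node minus the weighted entropy of the child nodes after a split. The function names and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes completely.
gain_perfect = information_gain(parent, [["yes"] * 4, ["no"] * 4])

# A split that leaves both children as mixed as the parent.
mixed = ["yes", "yes", "no", "no"]
gain_poor = information_gain(parent, [mixed, mixed])

print(gain_perfect, gain_poor)
```

A decision tree learner that uses this measure would prefer the first split, since it yields the larger gain.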
Q.no 24. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 25. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 26. ------------the step is performed by data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
Q.no 27. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 28. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
Q.no 30. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 31. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 33. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 35. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 38. Broadcasting is a powerful technique that allows numpy to work with
arrays of ------------- .
A : Same Shapes
B : Different Shapes
C : Same values
D : Different values
Q.no 39. If scatter diagram is drawn and all scatter points lie on a straight line
then it indicates-------
A : No correlation
B : Perfect correlation
C : Regression
D : Skewness
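As a quick numerical check of the scatter-diagram question, points that lie exactly on a straight line give a correlation coefficient of magnitude 1. The data below is made up for the sketch.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2 * x + 1      # every point lies on a rising straight line
y_down = -3 * x + 4   # every point lies on a falling straight line

# corrcoef returns the 2 x 2 correlation matrix; [0, 1] is r between the inputs.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(r_up, r_down)
```

The sign only reflects the direction of the line; both cases are perfect correlation.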
Q.no 40. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 41. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 43. The ---------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 45. Catalog design is a complex process in which the items in a
business's catalog are often selected to complement each other, so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
D : Support vector machine
Q.no 46. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 48. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is a
good proportion for train-test splitting?
Q.no 50. Which function from numpy used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
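A sketch of the difference between truncating and rounding, using `numpy.trunc` and `numpy.round` on a few sample values.

```python
import numpy as np

values = np.array([-1.7, 1.7, 2.0])

# trunc drops the fractional part, moving each value toward zero.
truncated = np.trunc(values)

# round goes to the nearest integer instead.
rounded = np.round(values)

print(truncated, rounded)
```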
Q.no 51. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 53. --------- is technique that duplicates smaller array to make dimensionality
and size of an array as the size and dimensionality of larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
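A minimal sketch of broadcasting: a 1-D array is combined with each row of a 2-D array without being copied by hand. The shapes and values are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# NumPy stretches `row` across both rows of `matrix` before adding.
result = matrix + row

print(result)
```

NumPy does this without physically duplicating the smaller array in memory, although the result is the same as if it had been repeated.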
Q.no 54. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 57. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 58. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
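A short sketch of `numpy.logspace`, whose start and stop arguments are exponents of the base (10 by default). The chosen range is only an example.

```python
import numpy as np

# Four points from 10**0 to 10**3, evenly spaced on a log scale.
points = np.logspace(0, 3, num=4)

print(points)
```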
Q.no 59. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
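A sketch of pairwise distances with `scipy.spatial.distance.cdist`. The function takes two collections of points, so the same set is passed twice here to get all pairs within it; the sample points are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Entry [i, j] is the Euclidean distance between points i and j.
distances = cdist(points, points)

print(distances)
```

The same module also provides `pdist`, which works on a single set and returns the distances in condensed form.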
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 2. The procedure to organize items of a given collection into groups based on
some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 3. ------------- is fundamental library used for scientific computing
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 4. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 6. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
Q.no 8. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Bayes Theorem
B : Pythagorean Theorem
Q.no 11. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 12. If number of input features are 3 then optimal hyperplane in support
vector machine is -------------
A : Single point
B : Line
C : 2-D Plane
Q.no 13. ---------------- method of a dataframe returns the first n rows of the dataframe
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 14. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 15. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 16. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
Q.no 17. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 18. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
Q.no 21. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 22. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 23. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
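The behaviour described in the question above (store the training data, then look up the closest stored points) can be sketched in plain Python. The function name, the toy data and the choice of k = 3 with majority voting are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # "Training" is only storage; all the work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical 2-D training set with two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["low", "low", "low", "high", "high", "high"]

prediction = knn_predict(train_points, train_labels, (2, 2))
print(prediction)
```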
Q.no 24. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 25. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : 3
B : 5
C : 1
D : 10
C : Measures growth
Q.no 29. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
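Support, together with the related confidence and lift measures, can be sketched in plain Python on a toy transaction list. The transactions and helper names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

support_bread_milk = support({"bread", "milk"})
conf_bread_to_milk = confidence({"bread"}, {"milk"})

# Lift compares the rule's confidence with the consequent's overall support.
lift_bread_to_milk = conf_bread_to_milk / support({"milk"})

print(support_bread_milk, conf_bread_to_milk, lift_bread_to_milk)
```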
A : class distribution
B : test on an attribute
D : class labels
Q.no 31. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 32. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 33. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
Q.no 34. ------------ is 2-D data structure defined in pandas in which data arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
A : NoSQL data
B : YouTube data
Q.no 36. ------------the step is performed by data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 38. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 39. ------- is the basic data structure of pandas that can be thought of as an SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 41. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 42. ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : display()
B : head()
C : describe()
D : sort()
Q.no 44. Catalog design is a complex process in which the items in a
business's catalog are often selected to complement each other, so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 45. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is a
good proportion for train-test splitting?
A : Train- 70%, Test - 30%
Q.no 46. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 48. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 49. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 50. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 51. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 53. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 54. The ---------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 59. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 60. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Answer for Question No 1. is b
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- function used to get positive square root of an numppy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
Q.no 4. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : numpy.random.ran()
B : rank
C : random.fill()
D : numpy.fillrandom()
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 10. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 12. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 13. Pandas provide ----------- method in order to get label based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
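A sketch contrasting label-based and position-based indexing in pandas. Note that `loc` and `iloc` are indexers used with square brackets rather than called like functions; the sample dataframe is hypothetical.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"marks": [82, 91]}, index=["asha", "ravi"])

# loc selects by row and column labels.
by_label = df.loc["ravi", "marks"]

# iloc selects by integer position.
by_position = df.iloc[0, 0]

print(by_label, by_position)
```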
Q.no 14. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
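A short sketch of the array attributes listed in the options above, on an array with an arbitrary example shape.

```python
import numpy as np

arr = np.zeros((2, 3, 4))

dims = arr.ndim    # number of dimensions (axes)
count = arr.size   # total number of elements
kind = arr.dtype   # element type

print(dims, count, kind)
```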
Q.no 15. In support vector machines, if there are 2 input features then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 18. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 19. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 20. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 21. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 23. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
Q.no 25. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 27. What is the use of the following function? plt.xlabel("Total Marks")
C : Measures growth
Q.no 29. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 30. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 31. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 32. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
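A minimal sketch of the elementwise remainder with `numpy.mod`, using made-up inputs.

```python
import numpy as np

x1 = np.array([10, 11, 12])
x2 = np.array([3, 3, 5])

# mod returns the remainder of x1 / x2 for each pair of elements.
remainders = np.mod(x1, x2)

print(remainders)
```

The `%` operator on arrays gives the same result.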
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 35. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
A : 3
B : 5
C : 1
D : 10
A : 1
B : -1
C : 0
D : 2
A : KNN
C : Regression
D : Cluster analysis
Q.no 39. Among the following clustering algorithm types in which of the following
type the notion of similarity is derived by the closeness of a data point to the
centroid of the clusters.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : XML data
B : YouTube data
Q.no 41. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 42. The ---------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 43. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 44. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 48. --------- is technique that duplicates smaller array to make dimensionality
and size of an array as the size and dimensionality of larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 49. Which function from numpy used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 52. ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 54. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 56. Which of the following statements will create axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 59. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 60. Which of the following functions is not used to iterate over the rows of a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
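A brief sketch of two of the row iterators named in the options, assuming pandas is installed. The frame and its column names are hypothetical. Both loops visit the same rows; they differ only in what each step yields.

```python
import pandas as pd

df = pd.DataFrame({"name": ["x", "y"], "score": [1, 2]})

# iterrows() yields (index, Series) pairs.
rows_a = [(idx, row["name"], row["score"]) for idx, row in df.iterrows()]

# itertuples() yields one namedtuple per row, with the index in .Index.
rows_b = [(t.Index, t.name, t.score) for t in df.itertuples()]

print(rows_a)
print(rows_b)
```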
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 3. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : prod()
B : mult()
C : dot()
D : *
A : Single point
B : Line
C : 2-D Plane
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 7. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 8. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
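A minimal sketch of the difference between label-based and integer-position indexing, assuming pandas is installed. The frame, its labels and its values are illustrative. The index labels are deliberately out of alphabetical order so that label and position do not coincide.

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 75, 60]}, index=["c", "a", "b"])

# loc selects by label: the row whose index label is "a".
by_label = df.loc["a", "score"]

# iloc selects by integer position: the first row, first column.
by_position = df.iloc[0, 0]

print(by_label, by_position)
```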
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 11. The leaf nodes in a decision tree return the ---------
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 13. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 14. The ---------- function is used to get the positive square root of a numpy array,
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : Sympy
D : Scipy
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
A : KNN
C : Regression
D : Decision Tree
Q.no 19. To import data from a CSV file into a dataframe, the ---------- function is provided
by the pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Java
B : Ruby
C : R
D : None of these
Q.no 23. ------------ is a 2-D data structure defined in pandas in which data is arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
Q.no 24. A ------------------ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 25. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 26. The -------- of a numpy array is a tuple of integers giving the size of the
array along each dimension.
A : axes
B : rank
C : shape
D : size
Q.no 27. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 28. --------- in a decision tree measures how much information a feature gives us
about the class.
A : Information Gain
B : Posterior probability
C : Prior probability
D : probability
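A rough sketch of how such a measure can be computed, using only the Python standard library. The helper names `entropy` and `information_gain` and the toy labels are assumptions made for illustration. The gain is the entropy of the parent node minus the size-weighted entropy of the child nodes produced by a split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]

# A split that separates the classes perfectly removes all uncertainty.
perfect_gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])

# A split that leaves each group as mixed as the parent gains nothing.
useless_gain = information_gain(parent, [["yes", "no"], ["yes", "no"]])

print(perfect_gain, useless_gain)
```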
Q.no 29. The process by which we estimate the value of a dependent variable on the
basis of one or more independent variables is called -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 30. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 31. ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. The --------- function from the matplotlib.pyplot library plots a bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
Q.no 35. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : Software testing
Q.no 36. In matplotlib, the ------------- function groups smaller axes that can exist
together within a single figure.
A : subplot()
B : divide_figure()
C : add_fig()
D : group_fig()
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 39. The ---------- function is used to add two numpy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 40. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 41. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 43. Which numpy function returns the truncated value of the input,
elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 44. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
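A short sketch of log-scale spacing, assuming NumPy is installed. The start, stop and count are arbitrary. The arguments are exponents (base 10 by default), so the results are evenly spaced in the exponent rather than in the value.

```python
import numpy as np

# Four values from 10**0 to 10**3, evenly spaced on a log scale.
powers = np.logspace(0, 3, num=4)

# For comparison, linspace spaces the values evenly on a linear scale.
linear = np.linspace(1, 1000, num=4)

print(powers)
print(linear)
```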
Q.no 45. Which of the following statements will create an axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 46. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 47. To save a figure to a file, we can use the ------------ method of the Figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 48. The --------- argument of the merge function specifies which keys are
included in the resulting dataframe when merging two dataframes.
A : right
B : on
C : sort
D : how
Q.no 49. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
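A minimal sketch of applying a custom operation to a whole dataframe, assuming pandas is installed. The helper `add_bonus`, its `amount` parameter and the column names are hypothetical. The dataframe is passed as the first argument of the function, which lets custom steps be chained like built-in methods.

```python
import pandas as pd


def add_bonus(frame, amount):
    """Hypothetical custom step: add a `total` column to the whole frame."""
    return frame.assign(total=frame["score"] + amount)


df = pd.DataFrame({"score": [10, 20]})

# pipe() calls add_bonus(df, amount=5) and returns its result.
result = df.pipe(add_bonus, amount=5)

print(result)
```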
Q.no 50. --------------- is basically extracting a particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
Q.no 51. To reach the final point and make a prediction, a decision tree must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 52. Which of the following functions is not used to iterate over the rows of a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 53. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 54. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 56. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 58. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 59. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Bayes Theorem
B : Pythagorean Theorem
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 3. ------------ rule mining is a technique to identify underlying relations
between different items.
A : Classification
B : Regression
C : Clustering
D : Association
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 5. To import data from an Excel file into a dataframe, the ---------- function is
provided by the pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : 1
B : -1
C : 0
D : 2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 8. The ---------- function is used to get the positive square root of a numpy array,
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Unsupervised
B : Supervised
C : Semi-supervised
D : group
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 11. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent itemsets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 12. To import data from a CSV file into a dataframe, the ---------- function is provided
by the pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 13. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Unsupervised
B : Supervised
C : Association
D : correlation
Q.no 15. In support vector machines, if there are 2 input features, then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : ndarray
B : spatial
C : ndimage
D : special
Q.no 17. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 18. NumPy provides this function to compute the trigonometric sine elementwise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 19. The procedure of organizing items of a given collection into groups based
on some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
A : save image
B : read image
C : copy image
D : show image
Q.no 21. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 22. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 23. To rotate an image, the -------- function from the scipy library is used.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 25. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : Software testing
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 28. A ------------------ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 29. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
A : class distribution
B : test on an attribute
Q.no 32. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 33. ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 34. The ---------- function is used to get the elementwise remainder of division of arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
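A quick sketch of the elementwise remainder, assuming NumPy is installed. The arrays are arbitrary examples. Each element of the first array is divided by the element at the same position in the second, and only the remainder is kept.

```python
import numpy as np

x1 = np.array([7, 8, 9])
x2 = np.array([2, 3, 4])

# 7 % 2, 8 % 3 and 9 % 4, computed position by position.
remainders = np.mod(x1, x2)

print(remainders)
```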
Q.no 35. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 36. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
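A small worked sketch of the three association-rule measures named in the options, using only the Python standard library. The transactions and the helper names `support`, `confidence` and `lift` are made up for illustration and follow the usual textbook definitions.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence of the rule relative to the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


bread_support = support({"bread"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})

print(bread_support, rule_confidence, rule_lift)
```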
Q.no 37. The --------- function from the matplotlib.pyplot library plots a bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : ndimage
B : ndarray
C : signal
D : io
Q.no 41. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate itemsets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 42. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. To reach the final point and make a prediction, a decision tree must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following statements will create an axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 46. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 48. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 49. To test the accuracy of a machine learning algorithm, the whole dataset
should be divided into training and testing datasets. Which of the following is
a good proportion for the train-test split?
Q.no 50. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 51. The plot_number parameter of the subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 52. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 53. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 54. In this type of clustering, instead of putting each data point into a separate
cluster, a probability or likelihood of that data point being in those clusters is
assigned.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 56. The --------- argument of the merge function specifies which keys are
included in the resulting dataframe when merging two dataframes.
A : right
B : on
C : sort
D : how
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 58. Catalog design is a complex process in which the items in a
business's catalog are often selected to complement each other, so that buying
one item will lead to buying another. These items are therefore often complements or
closely related. Which algorith
A : Decision tree
C : Clustering
Q.no 59. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 60. To save a figure to a file, we can use the ------------ method of the Figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : imsave()
B : imread()
C : read()
D : None of these
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 7. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent itemsets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : save image
B : read image
C : copy image
D : show image
Q.no 10. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. Which of the following is a measure used in decision trees when selecting
the splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
Q.no 13. ------------ means the part of the population chosen for participation in the study.
A : Population
B : Sample
C : Association
D : Correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : imsave()
B : imread()
C : save()
D : isave()
Q.no 16. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. ------- answers the question "What will happen in future?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. The ---------------- method of a dataframe returns the first n rows of the dataframe.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 19. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 21. -------- uses a tree structure to specify a sequence of decisions and
consequences.
A : KNN
B : Naïve Bayes
C : Regression
D : Decision Tree
Q.no 22. Which statement will create a 5 x 5 array filled with all values 1?
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
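A minimal sketch of why the shape argument matters here, assuming NumPy is installed. A tuple gives a 2-D array, while a bare integer gives a 1-D array of that length.

```python
import numpy as np

# A tuple shape produces a 5 x 5 array of ones.
grid = np.ones((5, 5))

# A single integer produces a 1-D array with five ones.
vector = np.ones(5)

# zeros() takes the same kind of shape but fills with 0 instead.
blank = np.zeros((5, 5))

print(grid.shape, vector.shape, blank.sum())
```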
Q.no 23. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 24. The ---------- function is used to get the elementwise remainder of division of arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 25. In a ------------, the x-axis values are grouped into bins and each bin is treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
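A rough sketch of the procedure described in this question, using only the Python standard library (Python 3.8 or later for `math.dist`). The function name `knn_predict`, the parameter `k` and the toy points are assumptions for illustration. "Training" is just keeping the data; prediction is a majority vote among the closest stored points.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k stored points closest to `query`."""
    # Rank the stored points by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# "Building the model" is only storing these points and labels.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_group = knn_predict(points, labels, (5.5, 5.5))

print(near_origin, near_far_group)
```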
Q.no 28. The ------------------ module of matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 29. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 32. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 35. ----------- analysis finds the reasons behind success or failure in the past.
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. A ----------------- graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 37. Support(B) =
Q.no 38. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : Software testing
Q.no 39. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 40. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 42. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 44. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 45. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 46. To reach the final point and make a prediction, a decision tree must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 48. Which numpy function returns the truncated value of the input,
elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 49. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 50. Which of the following functions is not used to iterate over the rows of a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 51. Which of the following statements will create an axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 53. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 54. The --------- argument of the merge function specifies which keys are
included in the resulting dataframe when merging two dataframes.
A : right
B : on
C : sort
D : how
Q.no 55. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
Q.no 56. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 57. To save a figure to a file, we can use the ------------ method of the Figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 58. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 60. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate itemsets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 3. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 4. The ------------ type of plot shows all individual data points without connecting
them with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : 1
B : -1
C : 0
D : 2
Q.no 8. To import data from an Excel file into a dataframe, the ---------- function is
provided by the pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 10. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Bayes Theorem
B : Pythagorean Theorem
Q.no 12. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 13. The -------- library is built on top of NumPy, SciPy and Matplotlib.
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : imsave()
B : imread()
C : save()
D : isave()
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 16. ---------------- library from python provides efficient versions of a large
number of machine learning algorithms.
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 18. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. ---------------- is about developing code to enable the machine to learn to
perform tasks, and its basic principle is the automatic modeling of the underlying
processes that have generated the collected data.
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 21. -------- is the measure of the likelihood that an event will occur in a
random experiment.
A : Probability
B : Correlation
C : Regression
D : Sample
B : Support
C : Confidence
D : lift
A : 3
B : 5
C : 1
D : 10
A : ndimage
B : ndarray
C : signal
D : io
Q.no 25. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 26. The ---------- function is used to get the elementwise remainder of division of arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 28. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 29. The ------------------ module of matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In a ------------, the x-axis values are grouped into bins and each bin is treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 31. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
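A small sketch of the correlation coefficient, using only the Python standard library. The helper name `pearson_r` and the sample values are invented for illustration. The coefficient always lies between -1 and +1, and it is 0 when the two variables show no linear relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    return covariance / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


x = [1, 2, 3, 4]

r_up = pearson_r(x, [2, 4, 6, 8])      # y rises exactly with x
r_down = pearson_r(x, [8, 6, 4, 2])    # y falls exactly as x rises
r_none = pearson_r(x, [1, -1, -1, 1])  # no linear relationship in this sample

print(r_up, r_down, r_none)
```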
Q.no 32. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 33. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : 0
B : -1
C : 1
D : -2
Q.no 35. ------- changes the arrangement of items in an array so that the shape of the
array changes while maintaining the same number of dimensions.
A : numpy.reshape()
B : numpy.empty()
C : numpy.flatten()
D : numpy.ravel()
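A short sketch of rearranging an array without changing its number of dimensions, assuming NumPy is installed. The array contents are arbitrary. Collapsing the same array to one dimension is shown for contrast.

```python
import numpy as np

original = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns

# Same six items, new arrangement, still two dimensions.
reshaped = np.reshape(original, (3, 2))

# ravel() collapses the array to a single dimension instead.
flat = np.ravel(original)

print(reshaped.shape, reshaped.ndim)
print(flat.shape, flat.ndim)
```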
B : YouTube data
A : KNN
C : Decision trees
D : Cluster analysis
A : XML data
B : YouTube data
A : class distribution
B : test on an attribute
D : class labels
Q.no 41. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 44. Which of the following functions is not used to iterate over the rows of a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 47. To reach the final point and make a prediction, a decision tree must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 48. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 49. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 51. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 52. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 54. The ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 57. Determining the basic salary of an employee when his or her qualification is given is
a ----------- problem.
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 58. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 59. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Answer for Question No 1. is a
e.g 1 Write down question img.jpg Option a Option b Option c Option d a/b/c/d
1 Movie Recommendation systems are an example of: Classification Clustering Reinforcement Learning Regression b and c
2 Sentiment Analysis is an example of: Regression Classification Clustering Reinforcement Learning a,b and d
3 What is the minimum no. of variables/features required to perform clustering?
a. 0
b. 1
c. 2
d. 3
Ans: b
4 Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
a. Yes
b. No
c. Can’t say
d. None of these
Ans: a
5 Which of the following can act as possible termination conditions in K-Means?
a. For a fixed number of iterations.
b. Assignment of observations to clusters does not change between iterations, except for cases with a bad local minimum.
c. Centroids do not change between successive iterations.
d. Terminate when RSS falls below a threshold.
Ans: a,b,c,d
6 Which of the following clustering algorithms suffers from the problem of convergence at local optima?
a. K-Means clustering algorithm
b. Agglomerative clustering algorithm
c. Expectation-Maximization clustering algorithm
d. Diverse clustering algorithm
Ans: a and c
7 Which of the following algorithms is most sensitive to outliers?
a. K-means clustering algorithm
b. K-medians clustering algorithm
c. K-modes clustering algorithm
d. K-medoids clustering algorithm
Ans: a
8 How can Clustering (Unsupervised Learning) be used to improve the accuracy of a Linear Regression model (Supervised Learning)?
a. Creating different models for different cluster groups.
b. Creating an input feature for cluster ids as an ordinal variable.
c. Creating an input feature for cluster centroids as a continuous variable.
d. Creating an input feature for cluster size as a continuous variable.
Ans: a,b,c,d
What could be the possible reason(s) for producing two
different dendrograms using agglomerative clustering Proximity function used of data points used of variables used All of the above
9 algorithm for the same dataset? d
10 In which of the following cases will K-Means clustering fail to give good results?
a. Data points with outliers
b. Data points with different densities
c. Data points with round shapes
d. Data points with non-convex shapes
Ans: a,b and d
11 Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?
a. Imputation with mean
b. Nearest Neighbor assignment
c. Imputation with Expectation Maximization algorithm
d. All of the above
Ans: c
12 Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?
a. In distance calculation it will give the same weights for all features
b. You always get the same clusters, whether or not you use feature scaling
c. In Manhattan distance it is an important step but in Euclidean it is not
d. None of these
Ans: a
13 Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?
a. Elbow method
b. Manhattan method
c. Euclidean method
d. All of the above
Ans: a
14 What is true about K-Means Clustering?
a. K-means is extremely sensitive to cluster center initializations
b. Bad initialization can lead to poor convergence speed
c. Bad initialization can lead to bad overall clustering
d. None of these
Ans: d
15 Which of the following can be applied to get good results for the K-means algorithm corresponding to global minima?
a. Try to run algorithm for different centroid initialization
b. Adjust number of iterations
c. Find out the optimal number of clusters
d. None of these
Ans: a,b,c
16 If you are using Multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into two clusters, which of the assumptions are important:
a. All the data points follow two Gaussian distribution
b. All the data points follow n Gaussian distribution (n >2)
c. All the data points follow two multinomial distribution
d. All the data points follow n multinomial distribution (n >2)
Ans: c
17 Which of the following is/are not true about Centroid based K-Means clustering algorithm and Distribution based expectation-maximization clustering algorithm:
a. Both start with random initializations
b. Both are iterative algorithms
c. Both have strong assumptions that the data points must fulfill
d. Expectation maximization algorithm is a special case of K-Means
Ans: d
18 Which of the following is/are not true about DBSCAN clustering algorithm:
a. For data points to be in a cluster, they must be in a distance threshold to a core point
b. It has strong assumptions for the distribution of data points in dataspace
c. It has substantially high time complexity of order O(n^3)
d. It does not require prior knowledge of the no. of desired clusters
Ans: b and c
19 Which of the following are the high and low bounds for the existence of F-Score?
a. [0,1]
b. (0,1)
c. [-1,1]
d. None of the above
Ans: a
20 All of the following increase the width of a confidence interval except:
a. Increased confidence level
b. Increased variability
c. Increased sample size
d. Decreased sample size
Ans: c
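The confidence-interval question can be checked numerically. The sketch below assumes the textbook half-width formula z * sigma / sqrt(n) for a mean with known standard deviation; the function name `ci_half_width` and all numeric inputs are illustrative assumptions, not values from the question bank.

```python
import math

def ci_half_width(z, sigma, n):
    """Half-width of a z-based confidence interval for a mean."""
    return z * sigma / math.sqrt(n)

base = ci_half_width(1.96, 10.0, 25)          # 95% level, sigma = 10, n = 25
higher_conf = ci_half_width(2.576, 10.0, 25)  # 99% level widens the interval
more_var = ci_half_width(1.96, 20.0, 25)      # more variability widens it
larger_n = ci_half_width(1.96, 10.0, 100)     # larger sample narrows it

print(base, higher_conf, more_var, larger_n)
```

Only the larger sample size shrinks the interval; the other two changes make it wider.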
21 The p-value in hypothesis testing represents which of the following: Please select the best answer of those provided below.
a. The probability of failing to reject the null hypothesis, given the observed results
b. The probability that the null hypothesis is true, given the observed results
c. The probability that the observed results are statistically significant, given that the null hypothesis is true
d. The probability of observing results as extreme or more extreme than currently observed, given that the null hypothesis is true
Ans: d
22 Assume that the difference between the observed, paired sample values is defined in the same manner and that the specified significance level is the same for both hypothesis tests. Using the same data, the statement that “a paired/dependent two sample t-test is equivalent to a one sample t-test on the paired differences, resulting in the same test statistic, same p-value, and same conclusion” is: Please select the best answer of those provided below.
a. Always True
b. Never True
c. Sometimes True
d. Not Enough Information
Ans: a
23 Green sea turtles have normally distributed weights, measured in kilograms, with a mean of 134.5 and a variance of 49.0. A particular green sea turtle’s weight has a z-score of -2.4. What is the weight of this green sea turtle? Round to the nearest whole number.
a. 17 kg
b. 151 kg
c. 118 kg
d. 252 kg
Ans: c
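The turtle question is a direct application of x = mean + z * standard deviation, where the standard deviation is the square root of the variance. A minimal worked check, using only the numbers given in the question (the variable names are chosen for this example):

```python
import math

mean_weight = 134.5   # kilograms
variance = 49.0
z_score = -2.4

std_dev = math.sqrt(variance)              # 7.0 kg
weight = mean_weight + z_score * std_dev   # about 117.7 kg
rounded_weight = round(weight)

print(rounded_weight)
```

Rounded to the nearest whole number this gives 118 kg, which agrees with the keyed answer c.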
24 What percentage of measurements in a dataset fall above the median?
a. 49%
b. 50%
c. 51%
d. Cannot Be Determined
Ans: d
25 The proportion of variation in 5k race times that can be explained by the variation in the age of competitive male runners was approximately 0.663. What is the value of the sample linear correlation coefficient? Round to 3 decimal places.
a. 0.663
b. 0.814
c. -0.814
d. 0.440
Ans: c
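The proportion of explained variation is the coefficient of determination r squared, so the magnitude of the correlation coefficient is its square root. The short sketch below computes that magnitude only; the sign cannot be recovered from r squared alone and has to come from the direction of the relationship in the original study, which is not reproduced here.

```python
import math

r_squared = 0.663
r_magnitude = round(math.sqrt(r_squared), 3)

# The sign of r depends on the slope of the fitted line (an outside fact).
candidates = (r_magnitude, -r_magnitude)
print(candidates)
```

Both 0.814 and -0.814 are consistent with r squared = 0.663; the answer key selects the negative value.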
26 Using all of the results provided, is it reasonable to predict the 5k race time (minutes) of a competitive male runner 73 years of age?
a. Yes; linear correlation between age and 5k race times is statistically significant
b. Yes; both the sample linear regression equation and an age in years is provided
c. No; linear correlation between age and 5k race times is not statistically significant
d. No; the age provided is beyond the scope of our available sample data
Ans: d
27 Algorithm is
a. It uses machine-learning techniques. Here program can learn from past experience and adapt themselves to new situations
b. Computational procedure that takes some value as input and produces some value as output
c. Science of making machines performs tasks that would require intelligence when performed by humans
d. None of these
Ans: b
28 Bias is
a. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
b. Any mechanism employed by a learning system to constrain the search space of a hypothesis
c. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
d. None of these
Ans: b
29 Classification is
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: a
30 Binary attribute are
a. This takes only two values. In general, these values will be 0 and 1 and they can be coded as one bit
b. The natural environment of a certain species
c. Systems that can be used without knowledge of internal operations
d. None of these
Ans: a
31 Classification accuracy is
a. A subdivision of a set of examples into a number of classes
b. Measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: b
32 Cluster is
a. Group of similar objects that differ significantly from other objects
b. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c. Symbolic representation of facts or ideas from which information can potentially be extracted
d. None of these
Ans: a
33 A definition of a concept is ----- if it recognizes all the instances of that concept
a. Complete
b. Consistent
c. Constant
d. None of these
Ans: a
34 A definition or a concept is ------------- if it classifies any examples as coming within the concept
a. Complete
b. Consistent
c. Constant
d. None of these
Ans: b
35 Data selection is
a. The actual discovery phase of a knowledge discovery process
b. The stage of selecting the right data for a KDD process
c. A subject-oriented integrated time variant non-volatile collection of data in support of management
d. None of these
Ans: b
36 Classification task referred to
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: c
37 Hybrid is
a. Combining different types of method or information
b. Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
c. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
d. None of these
Ans: a
38 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example IS encrypted information).
b. The process of executing implicit previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
What could be the possible reason(s) for producing two
different dendrograms using agglomerative clustering Proximity function used of data points used of variables used All of the above
39 algorithm for the same dataset? d
40 Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
a. Yes
b. No
c. Can’t say
d. None of these
Ans: a
DA---Unit III
1 This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration
a. K-Means clustering
b. conceptual clustering
c. expectation maximization
d. agglomerative clustering
Ans: a
2 The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second attribute increases.
c. As the value of one attribute increases the value of the second attribute also increases.
d. The attributes show a linear relationship
Ans: b
3 Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
a. Y is false when X is known to be false.
b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false.
Ans: b
4 Chameleon is
a. Density based clustering algorithm
b. Partitioning based algorithm
c. Model based algorithm
d. Hierarchical clustering algorithm
Ans: d
5 Find odd man out
a. DBSCAN
b. K-Mean
c. PAM
d. None of above
Ans: a
6 The number of iterations in apriori ___________
a. increases with the size of the data
b. decreases with the increase in size of the data
c. increases with the size of the maximum frequent set
d. decreases with increase in size of the maximum frequent set
Ans: c
7 Which of the following are interestingness measures for association rules?
a. Recall
b. Lift
c. Accuracy
d. All of Above
Ans: b
8 Given a frequent itemset L, if |L| = k, then there are
a. 2^k – 1 candidate association rules
b. 2^k candidate association rules
c. 2^k – 2 candidate association rules
d. 2^(k–2) candidate association rules
Ans: c
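For a frequent itemset of size k, every non-empty proper subset can serve as a rule antecedent, with the remaining items as the consequent, which gives 2^k − 2 candidate rules. The sketch below enumerates them for small itemsets; the function name `candidate_rules` and the item labels are assumptions for illustration.

```python
from itertools import combinations

def candidate_rules(itemset):
    """List (antecedent, consequent) pairs where both sides are
    non-empty and together make up the whole itemset."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

rules = candidate_rules({"A", "B", "C"})
for antecedent, consequent in rules:
    print(antecedent, "->", consequent)
print(len(rules))
```

The empty set and the full itemset are excluded as antecedents, which accounts for the "minus 2" in the count.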
9 _________ is an example for case based-learning
a. Decision trees
b. Neural networks
c. Genetic algorithm
d. K-nearest neighbor
Ans: d
10 The average positive difference between computed and desired outcome values.
a. mean positive error
b. mean squared error
c. mean absolute error
d. root mean squared error
Ans: c
11 Frequent item sets is
a. Superset of only closed frequent item sets
b. Superset of only maximal frequent item sets
c. Subset of maximal frequent item sets
d. Superset of both closed frequent item sets and maximal frequent item sets
Ans: d
12 Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes. Rule Accuracy: 70% and Rule Coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old?
a. 63
b. 38
c. 40
d. 89
Ans: b
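One way to reach the keyed answer is to read rule coverage as the fraction of all individuals that satisfy the rule's condition, and rule accuracy as the fraction of those covered individuals for which the conclusion holds. Under that reading (an assumption about the intended definitions), the arithmetic is:

```python
total_individuals = 200
rule_coverage = 0.63   # fraction of individuals matching the IF part
rule_accuracy = 0.70   # fraction of matches where life insurance = yes

covered = total_individuals * rule_coverage   # 126 individuals
misclassified = covered * (1 - rule_accuracy) # about 37.8 individuals
answer = round(misclassified)

print(answer)
```

The covered individuals for whom the conclusion fails are exactly those with life insurance = no, and the count rounds to 38.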
13 Which of the following is cluster analysis?
a. Simple segmentation
b. Grouping similar objects
c. Labeled classification
d. Query results grouping
Ans: b
14 A good clustering method will produce high quality clusters with
a. high inter class similarity
b. low intra class similarity
c. high intra class similarity
d. None of above
Ans: c
15 Which two parameters are needed for DBSCAN
a. Min threshold
b. Min points and eps
c. Min sup and min confidence
d. Number of centroids
Ans: b
16 Which statement is true about neural network and linear regression models?
a. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
b. The output of both models is a categorical attribute value.
c. Both models require numeric attributes to range between 0 and 1.
d. Both models require input attributes to be numeric.
Ans: d
17 In Apriori algorithm, if 1 item-sets are 100, then the number of candidate 2 item-sets are
a. 100
b. 200
c. 4950
d. 5000
Ans: c
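The count of candidate 2-itemsets is the number of unordered pairs that can be formed from the frequent 1-itemsets, that is, "n choose 2". A quick check for n = 100 (variable names are chosen for this example):

```python
import math

frequent_1_itemsets = 100

# Every unordered pair of frequent items is a candidate 2-itemset.
candidate_2_itemsets = math.comb(frequent_1_itemsets, 2)

print(candidate_2_itemsets)
```

This equals 100 * 99 / 2 = 4950, matching option c.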
18 Significant Bottleneck in the Apriori algorithm is
a. Finding frequent itemsets
b. Pruning
c. Candidate generation
d. Number of iterations
Ans: c
19 Machine learning techniques differ from statistical techniques in that machine learning methods
a. are better able to deal with missing and noisy data
b. typically assume an underlying distribution for the data
c. have trouble with large-sized datasets
d. are not able to explain their behavior.
Ans: a
20 The probability of a hypothesis before the presentation of evidence.
a. a priori
b. posterior
c. conditional
d. subjective
Ans: a
21 KDD represents extraction of
a. data
b. knowledge
c. rules
d. model
Ans: b
22 Which statement about outliers is true?
a. Outliers should be part of the training dataset but should not be present in the test data.
b. Outliers should be identified and removed from a dataset.
c. The nature of the problem determines how outliers are used
d. Outliers should be part of the test dataset but should not be present in the training data.
Ans: c
23 The most general form of distance is
a. Manhattan
b. Euclidean
c. Mean
d. Minkowski
Ans: d
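The Minkowski distance is called the most general form because a single order parameter recovers the other familiar distances: order 1 gives Manhattan distance and order 2 gives Euclidean distance. A minimal sketch, with the function name `minkowski` and the sample points chosen for this example:

```python
def minkowski(p, q, order):
    """Minkowski distance of the given order between two points."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1 / order)

origin = (0.0, 0.0)
point = (3.0, 4.0)

manhattan = minkowski(origin, point, 1)   # |3| + |4|
euclidean = minkowski(origin, point, 2)   # sqrt(3^2 + 4^2)

print(manhattan, euclidean)
```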
24 Which Association Rule would you prefer
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence
Ans: c
25 In a Rule based classifier, if there is a rule for each combination of attribute values, what do you call that rule set R
a. Exhaustive
b. Inclusive
c. Comprehensive
d. Mutually exclusive
Ans: a
26 The apriori property means
a. If a set cannot pass a test, its supersets will also fail the same test
b. To decrease the efficiency, do level-wise generation of frequent item sets
c. To improve the efficiency, do level-wise generation of frequent item sets
d. If a set can pass a test, its supersets will fail the same test
Ans: a
27 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are
a. Undefined
b. Not frequent
c. Frequent
d. Can not say
Ans: c
28 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398
Ans: b
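The sports-car question is a standard application of Bayes' rule: the posterior is the joint probability of subscribing and owning a sports car divided by the total probability of owning a sports car. A worked check using only the numbers stated in the question (variable names are chosen for this example):

```python
p_subscribe = 0.03            # P(subscribes)
p_car_given_sub = 0.40        # P(sports car | subscribes)
p_car_given_not_sub = 0.30    # P(sports car | does not subscribe)

# Total probability of owning a sports car.
p_car = (p_car_given_sub * p_subscribe
         + p_car_given_not_sub * (1 - p_subscribe))

# Bayes' rule: P(subscribes | sports car).
p_sub_given_car = p_car_given_sub * p_subscribe / p_car

print(round(p_sub_given_car, 4))
```

The numerator is 0.012 and the denominator is 0.303, so the posterior is about 0.0396, which agrees with option b.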
29 Simple regression assumes a __________ relationship between the input attribute and output attribute.
a. quadratic
b. inverse
c. linear
d. reciprocal
Ans: c
30 To determine association rules from frequent item sets
a. Only minimum confidence needed
b. Neither support nor confidence needed
c. Both minimum support and confidence are needed
d. Minimum support is needed
Ans: c
31 If {A,B,C,D} is a frequent itemset, candidate rules which is not possible is
a. C –> A
b. D –> ABCD
c. A –> BC
d. B –> ADC
Ans: b
32 Which Association Rule would you prefer
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence
Ans: b
33 Classification rules are extracted from _____________
a. decision tree
b. root node
c. branches
d. siblings
Ans: a
34 What does K refer to in the K-Means algorithm, which is a non-hierarchical clustering approach?
a. Complexity
b. Fixed value
c. No. of iterations
d. Number of clusters
Ans: d
35 If a Linear regression model fits perfectly, i.e., train error is zero, then _____________________
a. Test error is also always zero
b. Test error is non zero
c. Couldn’t comment on Test error
d. Test error is equal to Train error
Ans: c
36 Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE/MSE/MAE
a. ii and iv
b. i and ii
c. ii, iii and iv
d. i, ii, iii and iv
Ans: d
37 How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
a. 1
b. 2
c. 3
d. 4
Ans: b
38 In a simple linear regression model (One independent variable), if we change the input variable by 1 unit, how much will the output variable change?
a. by 1
b. no change
c. by intercept
d. by its slope
Ans: d
39 In syntax of linear model lm(formula,data,..), data refers to ______
a. Matrix
b. array
c. vector
d. list
Ans: c
40 In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to __________
a. (X-intercept, Slope)
b. (Slope, X-Intercept)
c. (Y-Intercept, Slope)
d. (slope, Y-Intercept)
Ans: c
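The regression questions above rest on the fact that a simple linear model Y = β1 + β2X + ϵ has two coefficients to estimate, the Y-intercept β1 and the slope β2, and that a one-unit change in X changes the prediction by the slope. The sketch below estimates both by ordinary least squares; the function name `fit_simple_linear` and the toy data are assumptions for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = beta1 + beta2 * x.
    Returns (beta1, beta2) = (Y-intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data lying exactly on y = 1 + 2x.
beta1, beta2 = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])

# Raising x by one unit changes the prediction by the slope.
change_per_unit = (beta1 + beta2 * 5) - (beta1 + beta2 * 4)
print(beta1, beta2, change_per_unit)
```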
DA---Unit IV
23 What are the five V’s of Big Data?
a. Volume
b. Velocity
c. Variety
d. All of the above
Ans: d
24 _________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
a. Scalding
b. Cascalog
c. Hcatalog
d. Hcalding
Ans: b
25 What are the main components of Big Data?
a. MapReduce
b. HDFS
c. YARN
d. All of these
Ans: d
26 What are the different features of Big Data Analytics?
a. Open-Source
b. Scalability
c. Data Recovery
d. All the above
Ans: d
27 Define the Port Numbers for NameNode, Task Tracker and Job Tracker.
a. NameNode
b. Task Tracker
c. Job Tracker
d. All of the above
Ans: d
28 Facebook Tackles Big Data With _______ based on Hadoop
a. Project Prism
b. Prism
c. ProjectData
d. ProjectBid
Ans: a
38 Heuristic is
a. A set of databases from different vendors, possibly using different database paradigms
b. An approach to a problem that is not guaranteed to work but performs well in most cases
c. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d. None of these
Ans: b
39 In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences.
a. Web services
b. customer-facing
c. client/server
d. personalization
Ans: d
40 Heterogeneous databases referred to
a. A set of databases from different vendors, possibly using different database paradigms
b. An approach to a problem that is not guaranteed to work but performs well in most cases.
c. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d. None of these
Ans: a
UNIT TWO SUB : 410243 DA
Sr. No. Questions a b c d Ans
1 Movie Recommendation systems are an example of:
a. Classification
b. Clustering
c. Reinforcement Learning
d. Regression
Ans: b,c
3 What is the minimum no. of variables/features required to perform clustering?
a. 0
b. 1
c. 2
d. 3
Ans: b
27 Algorithm is
a. It uses machine-learning techniques. Here program can learn from past experience and adapt themselves to new situations
b. Computational procedure that takes some value as input and produces some value as output
c. Science of making machines performs tasks that would require intelligence when performed by humans
d. None of these
Ans: b
28 Bias is
a. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
b. Any mechanism employed by a learning system to constrain the search space of a hypothesis
c. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
d. None of these
Ans: b
29 Classification is
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: a
30 Binary attribute are
a. This takes only two values. In general, these values will be 0 and 1 and they can be coded as one bit
b. The natural environment of a certain species
c. Systems that can be used without knowledge of internal operations
d. None of these
Ans: a
32 Cluster is
a. Group of similar objects that differ significantly from other objects
b. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c. Symbolic representation of facts or ideas from which information can potentially be extracted
d. None of these
Ans: a
33 A definition of a concept is ----- if it recognizes all the instances of that concept
a. Complete
b. Consistent
c. Constant
d. None of these
Ans: a
34 A definition or a concept is ------------- if it classifies any examples as coming within the concept
a. Complete
b. Consistent
c. Constant
d. None of these
Ans: b
38 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example IS encrypted information).
b. The process of executing implicit previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
24 Which Association Rule would you prefer
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence
Ans: c
27 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are
a. Undefined
b. Not frequent
c. Frequent
d. Can not say
Ans: c
28 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398
Ans: b
33 Classification rules are extracted from _____________
a. decision tree
b. root node
c. branches
d. siblings
Ans: a
34 What does K refer to in the K-Means algorithm, which is a non-hierarchical clustering approach?
a. Complexity
b. Fixed value
c. No. of iterations
d. Number of clusters
Ans: d
35 If a Linear regression model fits perfectly, i.e., train error is zero, then _____________________
a. Test error is also always zero
b. Test error is non zero
c. Couldn’t comment on Test error
d. Test error is equal to Train error
Ans: c
37 How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
a. 1
b. 2
c. 3
d. 4
Ans: b
12 How many terms are required for building a bayes model?
a. 1
b. 2
c. 3
d. 4
Ans: c
13 Where can the bayes rule be used?
a. Solving queries
b. Increasing complexity
c. Decreasing complexity
d. Answering probabilistic query
Ans: d
21 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example IS encrypted information).
b. The process of executing implicit previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
22 Classification task referred to
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: c
33 larger value is 60 and the smallest value is 40 and the number of classes is 5 then the class interval is
a. 20
b. 25
c. 4
d. 15
Ans: c
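The class-interval question uses the usual rule that the class width is the range of the data divided by the number of classes. A minimal check with the values from the question (variable names are chosen for this example):

```python
largest_value = 60
smallest_value = 40
number_of_classes = 5

# Class interval (width) = range / number of classes.
class_interval = (largest_value - smallest_value) / number_of_classes

print(class_interval)
```

The range is 20, so five classes give a width of 4, matching option c.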
35 The classification method in which the upper and lower limit of interval is also in class interval itself is called….
a. exclusive method
b. inclusive method
c. mid point method
d. None of these
Ans: b
36 Suppose there are 25 base classifiers. Each classifier has error rates of e = 0.35. Suppose you are using averaging as ensemble of above 25 classifiers will make a wrong prediction? Note: all classifiers are independent of each other
a. 0.05
b. 0.06
c. 0.07
d. 0.08
Ans: b
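The ensemble question can be checked under one common reading: the ensemble predicts by majority vote, so it is wrong only when more than half of the 25 independent classifiers are wrong. That reading is an assumption, since part of the question text is missing. With it, the error probability is a binomial tail sum; the function name `majority_vote_error` is hypothetical.

```python
from math import comb

def majority_vote_error(n_classifiers, error_rate):
    """Probability that more than half of n independent classifiers,
    each wrong with probability error_rate, are wrong together."""
    k_min = n_classifiers // 2 + 1   # smallest losing majority (13 of 25)
    return sum(
        comb(n_classifiers, k)
        * error_rate ** k
        * (1 - error_rate) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )

p_wrong = majority_vote_error(25, 0.35)
print(round(p_wrong, 2))
```

The ensemble error is much lower than the 0.35 error of any single classifier, which is the point the question is making.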
37 The most widely used metrics and tools to assess a classification model are:
a. Confusion matrix
b. Cost-sensitive accuracy
c. Area under the ROC curve
d. All of Above
Ans: d
III. Patterns that exist in the data can be found more easily by
using a visualization
5 Point out the correct combination with regards to kind keyword for graph plotting.
a. ‘hist’ for histogram
b. ‘box’ for boxplot
c. ‘area’ for area plots
d. all of the mentioned
Ans: d
6 Which of the following value is provided by kind keyword for
barplot?
bar bar bar none of the
mentioned
a
7 You can create a scatter plot matrix using the __________ method in pandas.tools.plotting.
a. sca_matrix
b. scatter_matrix
c. DataFrame.plot
d. all of the mentioned
Ans: b
8 Plots may also be adorned with error bars or tables. True FALSE Cannot Tell All Above
a
9 Which of the following plots are often used for checking randomness in time series?
a. Autocausation
b. Autorank
c. Autocorrelation
d. none of the mentioned
Ans: c
29 Information Visualization techniques are
a. Pie Chart
b. Scatterplot
c. Histogram
d. Area Chart
Ans: a
30 Which of the following is a category of timeline?
a. Linear Timeline
b. Modular Timeline
c. Variant Timeline
d. ER Timeline
Ans: a
34 Information Visualization techniques are
a. Flow Chart
b. Time Line
c. DFD
d. All of above
Ans: d
35 Data visualization techniques are:
a. Flow Chart
b. Time Line
c. Pie Chart
d. None of these
Ans: c
36 Data visualization is related with…
a. Pictorial representations
b. numerical representation
c. numerical calculations
d. None of these
Ans: a
37 Which of the following follows interactive visualization approach?
a. Zoom+Pan
b. Focus+Context
c. Overview+Details
d. all of above
Ans: d
38 Which of the following are uses of data visualization?
a. See context of data
b. Clear data understanding
c. finding pattern in data
d. all of above
Ans: d
39 Which of the following specifies relationship amongst variables?
a. Pie Chart
b. Histogram
c. Area Chart
d. None of these
Ans: c
40 Which of the following specifies category Proportions?
a. Pie Chart
b. Scatter Plot
c. Line Chart
d. None of these
Ans: a
UNIT SIX SUB : 410243 DA
Sr. No. Questions a b c d Ans
32 Which of the following is not a classification technique?
a. Logistic Regression
b. Random Forest
c. K-Means
d. Naïve Bayes
Ans: c
33 Which of the following are components of HIVE?
a. FLATTEN
b. Thrift Server
c. Muster
d. All of above
Ans: b
35 Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
a. MapReduce, Hive and HBase
b. MapReduce, MySQL and Google Apps
c. MapReduce, Hummer and Iguana
d. All of above
Ans: a
Sr. No. Objective Questions (MCQ / True or False / Fill up with Choices)
1. Which of the following is not an example of Social Media?
a. Twitter
b. Google
c. Insta
d. Youtube
2. By 2025, the volume of digital data will increase to
a. TB
b. YB
c. ZB
d. EB
3. For drawing insights for business, what is needed?
a. Collecting the data
b. Storing the data
c. Analysing the data
d. All the above
4. Does Facebook use "Big Data" to perform the concept of Flashback? Is this True or False?
a. TRUE
b. FALSE
5. The process of describing data that is too huge and complex to store and process is known as
a. Analytics
b. Data mining
c. Big Data
d. Data Warehouse
6. Data generated from online transactions is one example of the volume of big data. Is this True or False?
a. TRUE
b. FALSE
7. Velocity is the speed at which the data is processed
a. TRUE
b. FALSE
8. ________ have a structure but cannot be stored in a database.
a. Structured
b. Semi-Structured
c. Unstructured
d. None of these
9. ________ refers to the ability to turn your data useful for business.
a. Velocity
b. Variety
c. Value
d. Volume
SUB : 410243 DA
20. There is only one operation between Mapping and Reducing. Is it True or False?
a. TRUE
b. FALSE
30. ________ is a type of local Reducer that groups similar data from the map phase into identifiable sets.
a. MAPPER
b. REDUCER
c. COMBINER
d. PARTITIONER
31. While installing Hadoop, how many XML files are edited? List them.
i. core-site.xml
ii. hdfs-site.xml
iii. mapred-site.xml
iv. yarn-site.xml
32.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>D:\hadoop\temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
33. Write the code for hdfs-site.xml ?
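As one possible starting point for this question, the sketch below builds a minimal hdfs-site.xml with the same `<configuration>`/`<property>` layout as the file in question 32. The property names `dfs.replication`, `dfs.namenode.name.dir` and `dfs.datanode.data.dir` are standard HDFS settings, but the replication factor and both directory paths are placeholder assumptions for a single-node setup and should be replaced with real values. The helper name `build_hdfs_site` is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_hdfs_site(properties):
    """Return hdfs-site.xml text for a dict of property name -> value."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):   # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

# Placeholder values: single-node replication and example data directories.
hdfs_site_xml = build_hdfs_site({
    "dfs.replication": 1,
    "dfs.namenode.name.dir": "D:/hadoop/data/namenode",
    "dfs.datanode.data.dir": "D:/hadoop/data/datanode",
})
print(hdfs_site_xml)
```

Generating the file from a dictionary keeps the property names and values in one place, which makes it easier to adapt the same sketch to the other site files.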
SUB : 410243 DA
Sr. No. Objective Questions (MCQ / True or False / Fill up with Choices)
1. Movie Recommendation systems are an example of
1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression
a. 2 Only
b. 1 and 2
c. 1 and 3
d. 2 and 3
2. Sentiment Analysis is an example of
1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning
a. 1, 2 and 4
b. 1 and 3
c. 1, 2 and 3
d. 1 and 2
3. Can decision trees be used for performing clustering?
a. True
b. False
4. What is the minimum no. of variables/features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
5. For two runs of K-Means clustering, is it expected to get the same clustering results?
1. Yes
2. No
6. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations, except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
a. 1, 3 and 4
b. 1, 2 and 3
c. 1, 2 and 4
d. All of the above
7. Which of the following algorithms is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
8. After performing K-Means Clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusions can be drawn from the dendrogram?
9.
1. 1
2. 2
3. 3
4. 4
10. In which of the following cases will K-Means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes
a. 1 and 2
b. 2 and 3
c. 2 and 4
d. 1, 2 and 4
11. The discrete variables and continuous variables are two types of
a. Open end classification
b. Time series classification
c. Qualitative classification
d. Quantitative classification
12. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
13. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
14. Classification task referred to
1. A subdivision of a set of examples into a number of classes
2. A measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
15. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. None of these
is good at handle missing data and support both the kind of
attributes ( i.e Categorial and Continuous attributes )
a. ID3.
16.
b. C4.5.
c. CART.
d. Naïve Bayes.
Decision trees use , in that they always choose the option
that seems the best available at that moment.
a. Greedy Algorithms.
17.
b. Divide and Conquer.
c. Backtracking.
d. Shortest Path Method.
Decision trees cannot handle categorical attributes with many distinct values, such as
country codes for telephone numbers.
18.
a. TRUE
b. FALSE
19. ______ are easy to implement and can execute efficiently even without
prior knowledge of the data; they are among the most popular algorithms for classifying
text documents.
a. ID3
b. Naïve Bayes classifiers
c. CART
d. None of these.
20. High entropy means that the partitions in classification are
a. Pure
b. Not pure
c. Useful
d. Useless
21. Which of the following statements about Naive Bayes is incorrect?
a. Attributes are equally important.
b. Attributes are statistically dependent on one another given the class value.
c. Attributes are statistically independent of one another given the class value.
d. Attributes can be nominal or numeric
22. The maximum value for entropy depends on the number of classes. If we have 8 classes,
what will be the max entropy?
a. Max Entropy is 1
b. Max Entropy is 2
c. Max Entropy is 3
d. Max Entropy is 4
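The dependence of maximum entropy on the number of classes can be checked numerically. The sketch below assumes Shannon entropy measured in bits (log base 2) and assumes the maximum is reached when all classes are equally likely; the function names are illustrative only.

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def max_entropy(num_classes):
    """Entropy of the uniform distribution over num_classes classes."""
    return entropy([1.0 / num_classes] * num_classes)

two_class_max = max_entropy(2)    # 1.0 bit
eight_class_max = max_entropy(8)  # 3.0 bits, i.e. log2(8)

# A perfectly pure partition (one class only) has zero entropy.
pure_partition = entropy([1.0])

print(two_class_max, eight_class_max, pure_partition)
```

Under these assumptions the maximum for k classes is log2(k), which also illustrates why high entropy corresponds to impure partitions.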
23. John flies frequently and likes to upgrade his seat to first class. He has determined that if
he checks in for his flight at least two hours early, the probability that he will get an
upgrade is 0.75; otherwise, the probability that he will get an upgrade is 0.35. With his
busy schedule, he checks in at least two hours before his flight only 40% of the time.
Suppose John did not receive an upgrade on his most recent attempt. What is the
probability that he did not arrive two hours early?
a. 0.892
b. 0.796
c. 0.685
d. 0.999
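The upgrade question is a direct application of Bayes' theorem. A minimal sketch of the arithmetic follows; the function and variable names are made up for illustration, and the only inputs are the three probabilities stated in the question.

```python
def posterior_late_given_no_upgrade(p_early, p_upgrade_if_early, p_upgrade_if_late):
    """P(not early | no upgrade) via Bayes' theorem."""
    p_late = 1.0 - p_early
    p_no_upgrade_if_early = 1.0 - p_upgrade_if_early
    p_no_upgrade_if_late = 1.0 - p_upgrade_if_late
    # Total probability of not getting an upgrade.
    p_no_upgrade = (p_no_upgrade_if_early * p_early
                    + p_no_upgrade_if_late * p_late)
    return p_no_upgrade_if_late * p_late / p_no_upgrade

result = posterior_late_given_no_upgrade(0.40, 0.75, 0.35)
print(round(result, 3))  # 0.796
```

Here P(no upgrade) = 0.25 × 0.40 + 0.65 × 0.60 = 0.49, and the posterior is 0.39 / 0.49.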
24. Point out the wrong statement.
a. k-nearest neighbor is same as k-means
b. k-means clustering is a method of vector quantization
c. k-means clustering aims to partition n observations into k clusters
d. none of the mentioned
25. Consider the following example: "How can we divide a set of articles such that those articles
have the same theme (we do not know the theme of the articles ahead of time)?" Is this:
1. Clustering
2. Classification
3. Regression
4. None of These
Sr. No. Objective Questions (MCQ / True or False / Fill up with Choices)
1. ______ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of These
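To make the WSS metric concrete, the following sketch computes the within-cluster sum of squares for two hand-made partitions of a toy dataset. The data points and the partitions are invented for illustration; in practice the clusters would come from running k-means for each candidate k and looking for the "elbow" where WSS stops dropping sharply.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def wss(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for points in clusters:
        center = centroid(points)
        for p in points:
            total += sum((x - m) ** 2 for x, m in zip(p, center))
    return total

# Hypothetical data: two well-separated groups along the x axis.
data = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0),
        (10.0, 1.0), (11.0, 1.0), (12.0, 1.0)]

wss_k1 = wss([data])                 # everything in one cluster
wss_k2 = wss([data[:3], data[3:]])   # the two natural groups

print(wss_k1, wss_k2)  # 125.5 4.0
```

The sharp drop from k = 1 to k = 2 is the kind of signal the elbow heuristic looks for.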
2. If an itemset is considered frequent, then any subset of the frequent itemset must also be
frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 & 2
3. If {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the
confidence of rule {bread,eggs}→{milk} is
1. 0
2. 1
3. 2
4. 3
4. Confidence is a measure of how X and Y are really related rather than coincidentally
happening together.
a. True
b. False
5. A high-confidence rule can sometimes be misleading because confidence does not consider
the support of the itemset in the rule consequent. Is this true?
a. Yes
b. No
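One common way to see why high confidence can mislead is to compare it with lift, which divides by the support of the consequent. The sketch below uses hypothetical support values chosen only to illustrate the effect; they do not come from the question bank.

```python
def confidence(support_xy, support_x):
    """Confidence of X -> Y: support(X and Y) / support(X)."""
    return support_xy / support_x

def lift(support_xy, support_x, support_y):
    """Lift of X -> Y: support(X and Y) / (support(X) * support(Y))."""
    return support_xy / (support_x * support_y)

# Hypothetical supports: Y is in 90% of all transactions anyway.
support_x = 0.20
support_y = 0.90
support_xy = 0.16

rule_confidence = confidence(support_xy, support_x)   # about 0.8
rule_lift = lift(support_xy, support_x, support_y)    # below 1

print(rule_confidence, rule_lift)
```

With these numbers the rule looks strong by confidence, yet a lift below 1 suggests X and Y co-occur less often than they would if they were independent.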
6. ______ recommend items based on similarity measures between users and/or
items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of These
Answer
A
MCQ No - 2
What are the main components of Big Data?
(A) MapReduce
(B) HDFS
(C) YARN
(D) All of these
Answer
D
MCQ No - 3
What are the different features of Big Data Analytics?
(A) Open-Source
(B) Scalability
(C) Data Recovery
(D) All the above
Answer
D
MCQ No - 4
According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like
Hadoop?
(A) Big data management and data mining
(B) Data warehousing and business intelligence
(C) Management of Hadoop clusters
(D) Collecting and storing unstructured data
Answer
A
MCQ No - 5
What are the four V’s of Big Data?
(A) Volume
(B) Velocity
(C) Variety
(D) All the above
Answer
D
(B) Real-time
(C) Java-based
Answer
B
MCQ No - 7
(B) Drill
(C) Oozie
Answer
A
MCQ No - 8
Answer
C
MCQ No - 9
Answer
B
MCQ No - 10
(B) Both data and cost effective ways to mine data to make business sense out of it
Answer
B
The new source of big data that will trigger a Big Data revolution in the
years to come is
(A) Business transactions
(D) RDBMS
Answer
C
MCQ No - 12
(B) Row
(C) Event
(D) Record
Answer
C
MCQ No - 13
Listed below are the three steps that are followed to deploy a Big Data
Solution except
(A) Data Ingestion
Answer
C
MCQ No - 14
Check below the best answer to "which industries employ the use of so-
called "Big Data" in their day to day operations?
(A) Weather forecasting
(B) Marketing
(C) Healthcare
Answer
D
MCQ No - 15
(B) False
Answer
A
MCQ No - 16
Answer
A
MCQ No - 17
(B) 1970
(C) 1998
(D) 2005
Answer
C
MCQ No - 18
(B) Unstructured
(C) Processed
(D) Semi-Structured
Answer
C
MCQ No - 19
(B) Ad targeting
(C) Scheduling optimization
Answer
D
MCQ No - 20
The feature of big data that refers to the quality of the stored data is
______
(A) Variety
(B) Volume
(C) Variability
(D) Veracity
Answer
D
ZEAL EDUCATION SOCIETY’S
ZEAL COLLEGE OF ENGINEERING AND RESEARCH
NARHE │PUNE -41 │ INDIA
DEPARTMENT OF COMPUTER ENGINEERING
UNIT-1
1) What is Big Data?
a) Huge amount of data
b) Small amount of data
c) Huge File
d) Big Storage
Ans: a
Explanation: It is Huge amount of data
2) According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a
Explanation: Big data management and data mining
3) What are the main components of Big Data?
a)MapReduce
b)HDFS
c)YARN
d)All of these
Ans: d
Explanation: All of these
4) The sources of Big Data are
a)Stock Exchange
b)Transport Data
c) Banking Data
d) All of the Above
Ans: d
Explanation:
5) Big Data Characteristics are:
a) Structured data
b) Semi-structured data
c) Quasi-structured data
d) All of the above
Ans: d
Explanation:
6) BI tends to provide reports, dashboards, and queries on business
Ans: a
Explanation:
12) Select the option that is not a phase of data analytics
a) model planning
b) testing
c) discovery
d) operationalize
Ans: b
Explanation:
13) Which phase of data analytics requires the most time to complete?
a) Data preparation
b) model building
c) communicate results
d) Discovery
Ans: a
Explanation:
14) What is analytic sandbox?
a) Tool
b) Separate repository
c) data cleaning
d) Data conditioning
Ans: b
Explanation:
15) The person who provides analytic techniques and modeling is called:
a) Data Engineer
b) Data scientist
c) Business user
d) Project manager
Ans: b
Explanation:
16) What is task of Project manager?
a) analytic modelling
b) Provide requirement
c) ensure meeting objectives
d) creates DB environment
Ans: c
Explanation:
17) In which phase is the task of identifying key stakeholders performed?
a) Data preparation
b) model building
c) Discovery
d) communicate results
Ans: c
Explanation:
18) ETL process is performed in which phase
a) Discovery
b) communicate results
c) model planning
d) Data preparation
Ans: d
Explanation:
19) How much data do data science teams prefer for analysis?
a) too little
b) average
c) more
d) more than average
Ans: c
Explanation:
20) Select the tool that is not used in the model planning phase
a) Data wrangler
b) R
c) SQL Analysis service
d) SAS/ACCESS
Ans: c
Explanation:
21) Determining whether reports and dashboards will be impacted and need to change is a
task performed by:
a) Project sponsor
b) BI Analyst
c) Data Engineer
d) Project manager
Ans: b
Explanation:
22) What is the need for the data analytics lifecycle?
a) Data cleaning
b) To solve Big data problems
c) Data conditioning
d) Data Exploration
Ans: b
Explanation:
23) How many phases are there in data analytic lifecycle?
a) 4
b) 5
c) 6
d) 7
Ans: c
24) The person with technical skills is called as?
a) Business user
b) Data Engineer
c) Data scientist
d) Project sponsor
Ans: b
25) What is outcome of Model building phase?
a) Analytic results
b) Quality data
c) Data
d) Potential resources
Ans: a
Pravin S.Patil
Subject Teacher
UNIT-II
1) A statement made about a population for testing purposes is called?
a) Statistic
b) Hypothesis
c) Level of Significance
d) Test-Statistic
Ans: b
Explanation:
2) A hypothesis that is tested for possible rejection under the assumption that it is
true is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: a
Explanation:
3) A statement whose validity is tested on the basis of a sample is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: b
Explanation:
4) A hypothesis which defines the population distribution is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: c
Explanation:
5) If the null hypothesis is false then which of the following is accepted?
a) Null Hypothesis
b) Positive Hypothesis
c) Negative Hypothesis
d) Alternative Hypothesis.
Ans: d
Explanation:
b) β
c) α
d) 1-β
Ans: c
Explanation:
13) Alternative Hypothesis is also called as?
a) Composite hypothesis
b) Research Hypothesis
c) Simple Hypothesis
d) Null Hypothesis
Ans: b
Explanation:
14) Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: d
Explanation:
15) Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
Ans: c
Explanation:
16) Hierarchical clustering should be primarily used for exploration.
a) True
b) False
Ans: a
Explanation:
17) Which of the following function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
Ans: a
Explanation:
18) Which of the following clustering requires merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
Ans: b
Explanation:
a) True
b) False
Ans: a
20) Depending on acceptance and rejection of null hypothesis there are 2 types of
error produced
a) Type 1
b) Type 2
c) None of these
d) All of these
Ans: d
21) The power of a test can be defined as a possibility of …
a) Image Processing
b) Medical
c) Customer Segmentation
d) All of the above
Ans: d
Pravin S.Patil
Subject Teacher
Unit-I
Ans : C
Explanation: Data on the order of petabytes (10^15 bytes) is called Big Data.
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are
Volume, Velocity, Variety, Veracity, Value
Ans : A
Explanation: Data which can be saved in tables are structured data like the transaction data of the
bank.
Ans : B
Explanation: BigData could be found in three forms: Structured, Unstructured and Semi-structured.
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
7. What percentage of the world's total data has been created within just the past two years?
A. 80%
B. 85%
C. 90%
D. 95%
Ans : C
Explanation: 90% of the world's total data has been created within just the past two
years.
8) Which of the following step is performed by data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Ans: Data Cleansing
10. Communicative and collaborative is one among the key skill sets and behavioral characteristics of a
data scientist [True / False]?
a. True
b. False
Answer : a
11. ---------- are the sources of Bigdata [select all that apply]
I. Book
II. Facebook
III. Genome sequence
IV. Video Surveillance
Ans:
12. BI analyses past data and makes future predictions. True/False?
a. True
b. False
Answer : b
Ans: Phase 2 Data preparation is done in this phase. An analytical sandbox is used in this to perform
analytics for the entire duration of the project. While you explore, preprocess and condition data,
modeling follows suit. To get the data into the sandbox, you will perform ETLT (extract, transform, load
and transform).
A. Discovery
B. Model Planning
C. Model Building
D. Data Preparation
Phase 2 — Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team
can work with data and perform analytics for the duration of the project. The team needs to execute
extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox.
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
14. In which phase would the team expect to invest most of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
15. In which phase would the team expect to invest least time of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
16. From the following tools, which tool is used for model building?
Ans B
17. From the following tools, which tool is used for data preparation?
a. Alpine Miner b. Excel c. Matlab d.Weka
Ans . A
18. To determine if the project was completed on time and within budget, is the key role of _____
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
A. 3
B. 6
C. 7
D. Any
20. In data Analytics life cycle we can move back and refine the work done. True or False
A. True
B. False
A. PPT
B.report
C. code
D. All of above
22. ________ provides subject matter expertise for analytical techniques, data modeling and applying
valid analytical techniques to give business problems.
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
Unit-II
(a) Hypothesis
(d) Test-statistic
Answer : a
2. Any hypothesis which is tested for the purpose of rejection under the assumption that it is true is
called:
Answer : a
3. A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is
false is called:
Answer : d
Answer : b
6. If the critical region is located equally in both sides of the sampling distribution of test-statistic, the
test is called:
Answer : b
Answer : b
Answer : a
10. A formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic
Answer : a
Answer : a
(a) Size of α
(b) Size of β
(c) Test-statistic
Answer : a
13. Student’s t-test is applicable only when:
Answer : a
14. In an unpaired samples t-test with sample sizes n1= 11 and n2= 11, the value of tabulated t should be
obtained for:
Answer : d
(a) To collect sample data and use them to formulate hypotheses about a population
(b) To draw conclusions about populations and then collect sample data to support the conclusions
(c) To draw conclusions about populations from sample data
Answer : c
16. The histogram to the right represents the hospital length of stay (in days) for patients at a nearby
medical facility. How many patients are included in the histogram?
a. 5
b. 21
c. 17
d. 9
Answer : b
17. Using the histogram to the right that represents the hospital lengths of stay (in days) for patients at a
nearby medical facility, determine the relationship between the mean and the median.
a. Mean = Median
b. Mean ≈ Median
Answer : d
18. The statement “If there is sufficient evidence to reject a null hypothesis at the 10%
significance level, then there is sufficient evidence to reject it at the 5% significance level” :
a. Always True
b. Never True
c. Sometimes True; the p-value for the statistical test needs to be provided for a conclusion
d. Not Enough Information; this would depend on the type of statistical test used
Answer : c
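The relationship between the 10% and 5% significance levels can be illustrated with a small decision function. The sketch assumes the common convention of rejecting the null hypothesis when the p-value is less than or equal to the significance level; the p-values used are hypothetical.

```python
def reject_null(p_value, alpha):
    """Decision rule (assumed convention): reject H0 when p-value <= alpha."""
    return p_value <= alpha

# Hypothetical p-value between 0.05 and 0.10: rejected at 10% but not at 5%.
borderline = 0.07
borderline_at_10 = reject_null(borderline, 0.10)  # True
borderline_at_5 = reject_null(borderline, 0.05)   # False

# Hypothetical small p-value: rejected at both levels.
strong = 0.01
strong_at_10 = reject_null(strong, 0.10)  # True
strong_at_5 = reject_null(strong, 0.05)   # True

print(borderline_at_10, borderline_at_5, strong_at_10, strong_at_5)
```

Rejection at 10% therefore carries over to 5% only for some p-values, which is why the p-value is needed to draw a conclusion.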
a) ANOV
b) AVA
c) ANOVA
d) ANVA
Ans:c
20) Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: defined distance metric, number of clusters, initial guess as to cluster centroids
25) Considering the K-means algorithm, after current iteration, we have 3 centroids (0, 1) (2, 1),
(-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?
a) Yes
b) No
Ans: Yes
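The assignment step in this question can be verified by computing the distance from each point to every centroid. The sketch assumes Euclidean distance and uses `math.dist` (available in Python 3.8 and later); the helper name is illustrative.

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to point under Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

centroids = [(0, 1), (2, 1), (-1, 2)]

cluster_a = nearest_centroid((2, 3), centroids)    # distances: ~2.83, 2.0, ~3.16
cluster_b = nearest_centroid((2, 0.5), centroids)  # distance to (2, 1) is 0.5

print(cluster_a, cluster_b, cluster_a == cluster_b)  # 1 1 True
```

Both points are closest to the centroid (2, 1), so they land in the same cluster in the next iteration.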
27) The most commonly used measure of similarity is the _____ or its square.
a)euclidean distance
b)city-block distance
c)Chebychev’s distance
d)Manhattan distance
Ans: euclidean distance
30) Clustering is a-
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. None
Ans: Unsupervised learning
31) Which of the following clustering algorithms suffers from the problem of convergence at local
optima?
A. K- Means clustering
B. Hierarchical clustering
C. Diverse clustering
D. All of the above
Ans: K- Means clustering, Hierarchical clustering, Diverse clustering
33) Which of the following is a bad characteristic of a dataset for clustering analysis-
A. Data points with outliers
B. Data points with different densities
C. Data points with non-convex shapes
D. All of the above
Ans: Data points with outliers, Data points with different densities, Data points with non-convex
shapes
34) For clustering, we do not require-
A. Labeled data
B. Unlabeled data
C. Numerical data
D. Categorical data
Ans: Labeled Data
a. Parametric
b. non parametric
c. Distributed
d. Normal
38. Input data for Wilcoxon test is normally distributed, True or False?
d. None of these
40. Which of the following test statistics is used in the Wilcoxon Rank Sum Test?
d. none of these.
40. What must you include when applying Wilcoxon Rank sum test?
a. variance
b. Critical Value
c. Rank sum
e. standard deviation
a. False Positive
b. false negative
c. True Positive
d. True negative
a. False Positive
b. False negative
c. True Positive
d. True negative
ANOVA
a. Means
b. variance
c. standard Deviation
d. None of above.
b. F ratio
c. T-score
d. Chi Square
Q.25 What are the two types of variance which can occur in your data?
Q.26 If between group mean sum of square variability increases value of F statistics_____
a. Increases
b. Decreases
c. Neutral
d. None of these
Q.27 What must you include when applying ANOVA test?
a. Means
b. Critical Value
c. degree of freedom
d. F statistics
e. All of above
a.1
b.3
c.2
d.any
d.None of these
b.ANCOVA
c.MANOVA
d.ZANOVA
Unit-III
(A)Itemset
(B)Support
(C)Confidence
(D)Support Count
Ans:A
(A)Support
(B)Confidence
(C)Support Count
(D)Rules
Ans:C
3.An itemset whose support is greater than or equal to a minimum support threshold is ______
(A)Itemset
(B)Frequent Itemset
(C)Infrequent items
(D)Threshold values
Ans:B
(A)It mines all frequent patterns through pruning rules with lesser support
(B)It mines all frequent patterns through pruning rules with higher support
Ans:C
(B)Transaction Increases
(C)Sampling
(D)Cleaning
Ans:A
A) TRUE
B) FALSE
Ans:A
A) TRUE
B) FALSE
Ans:A
8.Which of the following methods do we use to find the best fit line for data in Linear
Regression?
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans:A
9. A local retailer has a database that stores 10,000 transactions of last summer. After
analyzing the data, a data science team has identified the following statistics:
• {battery} appears in 6,000 transactions.
• {sunscreen} appears in 5,000 transactions.
• {sandals} appears in 4,000 transactions.
• {bowls} appears in 2,000 transactions.
• {battery, sunscreen} appears in 1,500 transactions.
• {battery, sandals} appears in 1,000 transactions.
• {battery, bowls} appears in 250 transactions.
• {battery, sunscreen, sandals} appears in 600 transactions.
Q) What are the confidence values of {battery}->{sunscreen} and {battery, sunscreen}->{sandals}?
a) 0.3 and 0.4
b) 0.25 and 0.4
c) 0.25 and 0.15
d) 0.6 and 0.4
Ans: b
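The two confidence values can be reproduced from the transaction counts given in the question. In this sketch the counts are stored in a dictionary keyed by itemset; the structure and function name are illustrative choices, not part of the source.

```python
# Transaction counts taken from the question (out of 10,000 transactions).
counts = {
    frozenset({"battery"}): 6000,
    frozenset({"sunscreen"}): 5000,
    frozenset({"sandals"}): 4000,
    frozenset({"bowls"}): 2000,
    frozenset({"battery", "sunscreen"}): 1500,
    frozenset({"battery", "sandals"}): 1000,
    frozenset({"battery", "bowls"}): 250,
    frozenset({"battery", "sunscreen", "sandals"}): 600,
}

def rule_confidence(antecedent, consequent):
    """Confidence of antecedent -> consequent: count(A and C) / count(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return counts[both] / counts[a]

conf_1 = rule_confidence({"battery"}, {"sunscreen"})             # 1500 / 6000
conf_2 = rule_confidence({"battery", "sunscreen"}, {"sandals"})  # 600 / 1500

print(conf_1, conf_2)  # 0.25 0.4
```

Because confidence is a ratio of counts, the total number of transactions cancels out and is not needed here.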
a) Cor(X, Y) = 1
b) Cor(X, Y) = 0
c) Cor(X, Y) = 2
Ans:b
11. If a linear regression model fits perfectly, i.e., train error is zero, then
_____________________
Ans:C
12.Which of the following metrics can be used for evaluating regression models?
i) R Squared
iii) F Statistics
b) i and ii
Ans:d
13.How many coefficients do you need to estimate in a simple linear regression model (One
independent variable)?
a) 1
b) 2
c) 3
d) 4
Ans:b
14.In a simple linear regression model (One independent variable), If we change the input
variable by 1 unit. How much output variable will change?
a) by 1
b) no change
c) by intercept
d) by its slope
Ans:d
a) lm(formula, data)
b) lr(formula, data)
c) lrm(formula, data)
d) regression.linear(formula, data)
Ans:a
16.In syntax of linear model lm(formula,data,..), data refers to ______
a) Matrix
b) Vector
c) Array
d) List
Ans:b
17.In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to
__________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Ans:c
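The roles of β1 (Y-intercept) and β2 (slope) can be illustrated with a small ordinary least squares fit. This is a minimal sketch in plain Python with made-up data lying exactly on the line y = 1 + 2x; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b1 + b2 * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    return intercept + slope * x

# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b1, b2 = fit_line(xs, ys)

# Changing the input by one unit changes the prediction by the slope.
unit_change = predict(b1, b2, 6.0) - predict(b1, b2, 5.0)

print(b1, b2, unit_change)  # 1.0 2.0 2.0
```

Two coefficients are estimated, and a one-unit change in x shifts the predicted y by exactly the slope.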
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
19. The square of the correlation coefficient, r^2, will always be positive and is called the
________
a) Regression
b) Coefficient of determination
c) KNN
d) Algorithm
Ans:b
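A short sketch of the coefficient of determination follows, using the Pearson correlation coefficient on made-up data with a negative trend. It shows that r can be negative while r^2 stays between 0 and 1 (strictly speaking r^2 is non-negative, since it equals zero when r is zero).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y tends to fall as x rises.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.0, 8.0, 5.0, 4.0, 1.0]

r = pearson_r(xs, ys)
r_squared = r ** 2

print(r, r_squared)
```

For simple linear regression with one predictor, r^2 is the fraction of the variance in y explained by the fitted line.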
20.Predicting y for a value of x that’s outside the range of values we actually saw for x in the
original data is called ___________
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:b
21.What is predicting y for a value of x that is within the interval of points that we saw in the
original data called?
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:c
22. ________ is a simple approach to supervised learning. It assumes that the dependence of Y
on X1, X2, . . . Xp is linear.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
23.Although it may seem overly simplistic, _______ is extremely useful both conceptually and
practically.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
24. __________ refers to a group of techniques for fitting and studying the straight-line
relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
Ans: c
Data Processing and Analysis
Unit 4
1. What is a hypothesis?
Answer: a
a. True
b. False
Answer: a
a. Concurring
b. Coding
c. Colouring
d. Segmenting
Answer: b
4. What is the cyclical process of collecting and analysing data
during a single research study called?
a. Interim analysis
b. Inter analysis
c. Inter-item analysis
d. Constant analysis
Answer: a
a. Typology
b. Diagramming
c. Enumeration
d. Coding
Answer: c
a. Can reduce time required to analyse data (i.e., after the data are
transcribed)
b. Help in storing and organising data
c. Make many procedures available that are rarely done by hand
due to time constraints
d. All of the above
Answer: d
7. Boolean operators are words that are used to create logical
combinations.
a. True
b. False
Answer: a
a. Categories
b. Units
c. Individuals
d. None of the above
Answer: a
a. Segmenting
b. Coding
c. Transcription
d. Mnemoning
Answer: c
Answer: a
11. Hypothesis testing and estimation are both types of descriptive
statistics.
a. True
b. False
Answer: b
a. True
b. False
Answer: a
13. A graph that uses vertical bars to represent data is called a ___
Answer: b
14. ___________ are used when you want to visually examine the
relationship between two quantitative variables.
a. Bar graphs
b. Pie graphs
c. Line graphs
d. Scatterplots
Answer: d
15. The denominator (bottom) of the z-score formula is
Answer: a
a. Normal Distribution
b. Chi-Squared Distribution
c. Gamma Distribution
d. Poisson Distribution
Answer b
a. Statistic
b. Hypothesis
c. Level of Significance
d. Test-Statistic
Answer: b
18. If the assumed hypothesis is tested for rejection considering it to
be true is called?
a. Null Hypothesis
b. Statistical Hypothesis
c. Simple Hypothesis
d. Composite Hypothesis
Answer: a
a. Null Hypothesis
b. Positive Hypothesis
c. Negative Hypothesis
d. Alternative Hypothesis.
Answer: d
a. Composite hypothesis
b. Research Hypothesis
c. Simple Hypothesis
d. Null Hypothesis
Answer: b
No. | marks | question | A | B | C | D | ans
0 | 1 | A group of 4 bits is also called? | Nibble | Byte | Kb | None | 4 bits make one nibble.
1 | 1 | There are how many types of Big Data: | 3 | 2 | 1 | None | Big Data is of 3 types.
2 | 1 | Which of the following are the V's of Big Data: | All | Volume | Variety | Velocity. | This is an explanation.
3 | 1 | Which of these is not a characteristic of Big Data? | Storage | Volume | Variety | Velocity. | This is an explanation.
4 | 2 | Which of the following is a drawback of Big Data: | Cost | Significant | Process | Fraud Detection | Big Data requires high cost to maintain a huge amount of data.
5 | 2 | Full form of GINA is: | Global Innovation Network and Analysis. | Global Invention in Networks and Analytics | Globally Investment in Neurons and Analytics | None | GINA stands for Global Innovation Network and Analysis.
6 | 2 | Which is phase 3 in the Data Analytics Life Cycle? | Model Planning | Model Building | Data Preparation | Operationalize | Model Planning is the 3rd phase in the life cycle.
7 | 2 | GINA team thought to accomplish mainly ____ goals: | 3 | 2 | 1 | 5 | GINA targeted to achieve three goals for the project.
8 | 2 | The Data Preparation stage doesn't involve: | Analyzation | Collection | Cleansing | Processing. | This is an explanation.
9 | 2 | Unstructured Data is further divided into how many types? | 2 | 3 | 4 | 5 | Unstructured data is divided into 2 types.
10 | 2 | The GINA team mainly used which software tool to analyze the Data | Tableau | Hadoop | HIVE | SQL | The team used Tableau to visualize the Data.
11 | 2 | Which of the following is the first step of the Data Analytics Life Cycle: | Discovery | Data Preparation. | Model Planning | Data Aware | This is an explanation.
12 | 2 | There are how many phases in the data analytics life cycle: | 6 | 5 | 4 | 7 | There are 6 stages in the data analytics life cycle.
13 | 2 | SEMMA Methodology has how many stages: | 5 | 4 | 6 | 7 | SEMMA methodology has five stages.
14 | 2 | Which phase of the Life Cycle requires collaboration with stakeholders? | Phase 5 | Phase 6 | Phase 4 | Phase 3 | Phase 5 involves collaboration with stakeholders.
15 | 2 | In Building a Model, how many phases are required: | 2 | 3 | 4 | 5 | This is an explanation.
16 | 2 | How much Data in the whole world is structured: | 0.2 | 0.4 | 0.6 | 0.5 | Only 20% of the world's total data is structured.
17 | 2 | 10^21 bytes of memory is equal to: | 1ZB | 1TB | 1YB | 1XB | 10^21 B is equal to 1 ZB.
18 | 2 | Data Scientists in the GINA team used which technique on the textual Description of the Innovation Roadmap Idea. | Natural Language Processing (NLP) | Hadoop | HIVE | SQL | NLP technique was used on the description of the Innovation Roadmap Idea.
19 | 2 | How many types of data analytics methodologies are there? | 2 | 4 | 3 | 6 | There are two types of data analytical methodologies: EDA and CDA.
20 | 3 | Other name for Bell Curve is: | Normal Distribution. | Poisson Distribution | Binomial Distribution | Bernoulli Distribution. | Bell Curve is also known as normal distribution.
21 | 3 | One of the most important tasks in big data analytics is: | Statistical Modeling | Testing of Data | Visualization | Operationalize | One of the most important tasks in big data analytics is statistical modeling.
22 | 3 | Some of the approaches considered for building the data analytics lifecycle framework best practices are: | All | CRISP-DM | SEMMA | MAD Skills | This is an explanation.
23 | 3 | In Phase 4, the team develops datasets for: | All | Testing of Data | Training of Data | Production purposes | This is an explanation.
24 | 3 | Full form of CRISP-DM Methodology is: | Cross Industry Standard Process for Data Mining | Cross International Standard Process for Data Modeling | Company's Initial Standards Progress for Data Methods | Common Industry Standard Program for Data Mining | CRISP-DM stands for Cross Industry Standard Process for Data Mining.
25 | 3 | SEMMA Methodology doesn't include which of the following stages: | Evaluate | Sample | Explore | Assess | This is an explanation.
26 | 3 | In which stage is the data monitored and analyzed to see if the generated model is creating the expected results? | Operationalize | Collection | Plan Model | Data Aware | In the last phase, i.e. Operationalize, data is monitored and analyzed to see if the generated model is creating the expected results.
27 | 3 | Data is captured in how many ways: | 3 | 4 | 5 | 6 | Data is captured in 3 main ways.
marks question A B C D ans
28 | 3 | In phase 2 of the Data Analytics Life Cycle, the team performs how many analytics to get the data into the sandbox? | 3 | 2 | 4 | 6 | The team performs ETL, ELT, and ETLT in the 2nd phase of the cycle.
29 | 3 | The total area under the bell curve is ____ unit. | 1 | 2 | 3 | 4 | The area under the bell curve is 1 unit.
30 | 1 | The Wilcoxon rank-sum test is also known as? | Mann-Whitney U test | Mean Difference | Alternative Hypothesis | Null Hypothesis | The Wilcoxon rank-sum test is also called the Mann-Whitney U test.
31 | 1 | Which test is also known as the t-test? | Hypothesis Test | Mean Difference | K-means test | None | -
32 | 1 | This equation is of which test? | Mean Difference | K-Means | Null Hypothesis | Alternative Hypothesis | This equation is of the mean difference test.
33 | 1 | A test of a statistical hypothesis, where the region of rejection is on one side of the sampling distribution, is called ____. | One-tailed test | Two-tailed test | Tailed test | Null test | A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test.
34 | 1 | How many types of statistical hypothesis are there? | 2 | 3 | 4 | 6 | There are two types of statistical hypothesis.
35 | 1 | Analysis of Variance is also referred to as? | ANOVA | Mean Difference | Alternative Hypothesis | Null Hypothesis | ANOVA stands for Analysis of Variance.
36 | 1 | How many steps are involved in hypothesis testing? | 4 | 2 | 3 | 5 | There are 4 steps in hypothesis testing.
37 | 2 | The strength of evidence in support of a null hypothesis is measured by? | P-value | K-value | H-value | Null-value | The strength of evidence in support of a null hypothesis is measured by the P-value.
38 | 2 | Difference in means is also called? | Two-sample t-test | T-test | M-test | Two-sample test | Difference in means is also known as the two-sample t-test.
39 | 2 | The k-medoids algorithm is also called the ____ algorithm. | Partitioning Around Medoids (PAM) | Lloyd's Algorithm | Poisson's Algorithm | Regression | The k-medoids algorithm is also called the Partitioning Around Medoids (PAM) algorithm.
40 | 2 | Clustering is an example of ____? | Unsupervised Learning | Supervised Learning | Classification | Regression | Clustering is an example of unsupervised learning.
41 | 2 | Which of the following is not an advantage of K-means clustering? | Requires a priori | Fast | Robust | Easy to evaluate | -
42 | 2 | The probability of committing a Type II error is called? | Beta | Alpha | Delta | Theta | The probability of committing a Type II error is called beta.
43 | 2 | The ____ variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster. | Less | More | Variable | Fixed | The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.
44 | 2 | Which hypothesis is usually the hypothesis that sample observations result purely from chance? | Null Hypothesis | Mean Difference | K-means test | Alternative Hypothesis | The null hypothesis is usually the hypothesis that sample observations result purely from chance.
45 | 2 | "Classical" ANOVA for balanced data does how many things at once? | 3 | 2 | 1 | 4 | "Classical" ANOVA for balanced data does three things at once.
46 | 2 | K-means clustering is used to solve which problems? | NP-hard problems | NP problems | Hypothesis problems | P problems | NP-hard problems are solved using K-means clustering.
47 | 2 | The probability of committing a Type I error is called? | Alpha | Beta | Gamma | Delta | The probability of committing a Type I error is called alpha.
48 | 2 | K-means clustering is also known as? | Lloyd's Algorithm | Gaussian Algorithm | Poisson's Algorithm | None | K-means clustering is also called Lloyd's algorithm.
49 | 3 | Which algorithm requires the user to specify the number of clusters k to be generated? | K-means clustering | Gaussian Algorithm | Alternative Hypothesis | Null Hypothesis | K-means clustering requires the user to specify the number of clusters k to be generated.
50 | 3 | K-means clustering uses which approach to solve problems? | Expectation-maximization | Greedy Approach | Divide and Conquer | None | The expectation-maximization technique is used by K-means clustering.
51 | 3 | How many factors affect the power of a hypothesis test? | 3 | 2 | 1 | 4 | The power of a hypothesis test is affected by three factors.
52 | 3 | The Law of Variance is called? | Eve's Law | Laplace Law | Poisson's Algorithm | Regression | The law of variance is also called Eve's law.
53 | 3 | K-medoids uses which approach to solve problems? | Greedy Approach | Divide and Conquer | Recursive | None | K-medoids uses a greedy approach to solve problems.
54 | 3 | The time complexity of K-means clustering is? | O(n^2) | O(nlogn) | O(n) | O(1) | The time complexity of K-means clustering is O(n^2).
55 | 3 | The number (k) of clusters assumed in k-medoids is known as? | Priori | Null Hypothesis | ANOVA | Effect size | The number k of clusters is assumed known a priori.
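Question 38 above equates the difference-in-means test with the two-sample t-test. The following is a minimal sketch of how such a statistic can be computed, assuming the unequal-variance (Welch) form; the function name `welch_t` and the sample values are illustrative and do not come from the question bank.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return the t statistic for the difference between two sample means.

    Assumption: Welch's form, which does not pool the two variances.
    """
    var_a = variance(sample_a)  # sample variance, n - 1 in the denominator
    var_b = variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 3.9, 4.6, 4.0]
print(round(welch_t(group_a, group_b), 3))
```

A large absolute value of the statistic is evidence against the null hypothesis that both groups share the same mean; turning it into a P-value (question 37) additionally requires the t distribution, which is left out here.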
No. | Marks | Question | A | B | C | D | Ans
56 | 3 | What is the difference between the true value and the value specified in the null hypothesis? | Effect size | Null Hypothesis | Alternative Hypothesis | ANOVA | The effect size is the difference between the true value and the value specified in the null hypothesis.
57 | 3 | The time complexity of k-medoids is? | O(n^2) | O(nlogn) | O(n) | O(n^3) | -
58 | 3 | Which algorithm aims at minimizing an objective function known as the squared error function? | K-means | Mean Difference | Alternative Hypothesis | ANOVA | The K-means algorithm aims at minimizing an objective function known as the squared error function.
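Questions 48, 49 and 58 describe K-means as Lloyd's algorithm, which needs k up front and minimizes a squared error function. The sketch below illustrates that loop on 2-D points under simple assumptions: the initial centroids are supplied by the caller, the iteration count is fixed, and the helper names `kmeans` and `squared_error` are illustrative.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points; len(centroids) is the chosen k."""
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


def squared_error(centroids, clusters):
    """Objective function: sum of squared distances to the assigned centroid."""
    return sum(
        (x - c[0]) ** 2 + (y - c[1]) ** 2
        for c, cl in zip(centroids, clusters)
        for x, y in cl
    )


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids, squared_error(centroids, clusters))
```

Each pass can only lower or keep the squared error, which is why the procedure settles on a local minimum that depends on the starting centroids.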
59 | 1 | Which algorithm was the earliest of the association rule algorithms? | Apriori Algorithm | Gaussian Algorithm | K-means clustering | Bernoulli Distribution | The Apriori algorithm was the earliest of the association rule algorithms.
60 | 1 | The Apriori algorithm takes a ____ iterative approach to uncovering the frequent itemsets by first determining all the possible items. | Bottom-Up | Top-Down | Recursive | None | The Apriori algorithm takes a bottom-up iterative approach to uncovering the frequent itemsets by first determining all the possible items.
61 | 1 | Apriori uses which structure to count candidate item sets efficiently? | BFS | DFS | Queue | Stack | Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently.
62 | 1 | "y=a+b*x^2". This equation shows which regression? | Polynomial Regression | Logistic Regression | Linear Regression | Lasso Regression | -
63 | 2 | ____ is defined as the measure of certainty or trustworthiness associated with each discovered rule. | Confidence | Recursion | Item-set | None | Confidence is defined as the measure of certainty or trustworthiness associated with each discovered rule.
64 | 2 | In which regression do we predict the value as 1 or 0? | Logistic Regression | Linear Regression | Both | None | In logistic regression, we predict the value as 1 or 0.
65 | 2 | The formula for linear regression is: | Y’ = bX + A | Y’ = bX - A | Y’ = bX / A | Y’ = bX * A | The formula for linear regression is Y’ = bX + A.
66 | 2 | Which regression is useful when there are a large number of independent variables? | Partial Least Squares (PLS) Regression | Cox Regression | Lasso Regression | Logistic Regression | PLS regression is also useful when there are a large number of independent variables.
67 | 2 | Which regression is an approach for predicting a response using a single feature? | Linear Regression | Logistic Regression | ElasticNet Regression | None | Simple linear regression is an approach for predicting a response using a single feature.
68 | 2 | Association rule mining consists of ____ steps. | 2 | 3 | 4 | 5 | Association rule mining consists of 2 steps.
69 | 2 | Which type of regression is suitable when the dependent variable is ordinal in nature? | Ordinal Regression | Linear Regression | Cox Regression | Logistic Regression | Ordinal regression is suitable when the dependent variable is ordinal in nature.
70 | 2 | Which regression is used for support vector machines? | ElasticNet Regression | Linear Regression | Logistic Regression | None | ElasticNet regression is used for support vector machines.
71 | 2 | Which regression can solve both linear and non-linear models? | Support Vector Regression | Linear Regression | Logistic Regression | ElasticNet Regression | Support Vector Regression can solve both linear and non-linear models.
72 | 2 | Which is the most common method used for fitting a regression line? | Least Squares Method | Mean Difference | Null Hypothesis | Classification | The least squares method is the most common method used for fitting a regression line.
73 | 2 | ____ problems are when the output variable is a real or continuous value. | Regression | Classification | Recursive | Hypothesis | A regression problem is when the output variable is a real or continuous value.
74 | 2 | Linear Regression is a machine learning algorithm based on ____ learning. | Supervised Learning | Unsupervised Learning | Recursive Learning | All | Linear Regression is a machine learning algorithm based on supervised learning.
75 | 2 | When the dependent variable's variability is not equal across values of an independent variable, it is called? | Heteroscedasticity | Homoscedasticity | Multicollinearity | Outliers | When the dependent variable's variability is not equal across values of an independent variable, it is called heteroscedasticity.
76 | 2 | ____ requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares. | Logistic Regression | Linear Regression | Lasso Regression | ElasticNet Regression | Logistic regression requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.
77 | 2 | PCR regression is divided into how many steps? | 2 | 3 | 4 | 5 | PCR regression is divided into 2 steps.
78 | 3 | L2 regularization is also called? | Tikhonov Regularization | Norm Regularization | Poisson's Regularization | None | -
79 | 3 | When the variance of count data is greater than the mean count, it is a case of? | Overdispersion | Underdispersion | Dispersion | High dispersion | When the variance of count data is greater than the mean count, it is a case of overdispersion.
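Questions 59 to 63 describe Apriori as a bottom-up, level-by-level search for frequent itemsets, with confidence as the measure attached to each rule. The sketch below illustrates the level-wise join and prune steps under simplifying assumptions: support is counted with a plain scan of the transactions rather than the hash tree mentioned in question 61, and the function name `apriori` and the basket data are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the support of each candidate with a full scan.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=2)
# Confidence of the rule bread -> milk: support(bread, milk) / support(bread).
confidence = frequent[frozenset({"bread", "milk"})] / frequent[frozenset({"bread"})]
print(len(frequent), round(confidence, 2))
```

The prune step is what makes the search bottom-up: an itemset is only considered once all of its smaller subsets have already been found frequent.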
No. | Marks | Question | A | B | C | D | Ans
80 | 3 | Which regression assumes the normal distribution of the dependent variable? | Linear Regression | Logistic Regression | ElasticNet Regression | None | Linear regression assumes the normal or Gaussian distribution of the dependent variable.
81 | 3 | The nature of predicted data in regression is? | Ordered | Unordered | Both | None | The nature of predicted data in regression is ordered.
82 | 3 | Which regression uses a binary dependent variable but ignores the timing of events? | Logistic Regression | Linear Regression | Cox Regression | Lasso Regression | Logistic regression uses a binary dependent variable but ignores the timing of events.
83 | 3 | Ridge Regression is also known as? | Shrinkage Regression | Percentile Regression | ElasticNet Regression | Lasso Regression | Ridge regression is also known as Shrinkage Regression.
84 | 3 | In which regression do we calculate Root Mean Square Error (RMSE) to predict the next weight value? | Linear Regression | ElasticNet Regression | Logistic Regression | All | In linear regression we calculate Root Mean Square Error (RMSE) to predict the next weight value.
85 | 3 | The ____ is the standard deviation of the observed residuals. | Residual standard error | Mean Difference Error | Data Error | All | The residual standard error is the standard deviation of the observed residuals.
86 | 3 | Which regression is used when the dependent variable has count data? | Poisson Regression | Linear Regression | Cox Regression | Lasso Regression | Poisson regression is used when the dependent variable has count data.
87 | 3 | ____ regression can handle both over-dispersion and under-dispersion. | Quasi-Poisson Regression | Cox Regression | ElasticNet Regression | Linear Regression | Quasi-Poisson regression can handle both over-dispersion and under-dispersion.
88 | 3 | ____ is the regularization parameter in Lasso Regression? | λ | θ | Ω | β | λ is the regularization parameter in lasso regression.
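Questions 65, 72 and 84 refer to the line Y’ = bX + A, the least squares method, and RMSE. As a rough illustration, the sketch below fits the slope `b` and intercept `a` with the closed-form least squares formulas and then reports the RMSE of the fit; the function names and the data points are illustrative.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least squares estimates (b, a) for the line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x  # the fitted line passes through the mean point
    return b, a


def rmse(xs, ys, b, a):
    """Root mean square error of the fitted line on the given points."""
    return sqrt(sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / len(xs))


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # generated from y = 2x + 1, so the fit should be exact
b, a = fit_line(xs, ys)
print(b, a, rmse(xs, ys, b, a))
```

With noisy data the RMSE is greater than zero and gives the typical size of a residual in the units of Y.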
89 | 1 | Decision Tree is a hierarchical model that does the separation of the input space into class regions using: | Recursion | Pointers | Greedy Approach | Divide and Conquer | Decision Tree is a hierarchical model that recursively does the separation of the input space into class regions.
90 | 1 | The learning algorithm of a Decision Tree is: | Greedy Approach | Divide and Conquer | Both | None | Decision Tree uses a greedy approach for its learning algorithm.
91 | 1 | The Normal Distribution is also called? | Gaussian Distribution | Bernoulli Distribution | Naïve Bayes | Binary Distribution | -
92 | 1 | Classification has how many phases? | 2 | 3 | 4 | 5 | There are 2 phases of classification.
93 | 1 | "Every pair of features being classified is independent of each other". This principle is used by: | Naïve Bayes Classifier | Decision Tree | Bernoulli Distribution | Normal Distribution | Naïve Bayes uses the principle that every pair of features being classified is independent of each other.
94 | 2 | This equation is of which theorem? | Gaussian Distribution | Binary Distribution | Naïve Bayes | Cross-Entropy | -
95 | 2 | In Naïve Bayes, the data sets are divided into how many types? | 2 | 3 | 4 | 5 | Data sets are divided into two types in Naïve Bayes.
96 | 2 | Decision trees that can be used to predict non-categorical values are called? | Regression Trees | Categorical Trees | Normal Tree | None | Decision trees that can be used to predict non-categorical values are called regression trees.
97 | 2 | An attribute with ____ Gini index should be preferred in a decision tree. | Lower | Higher | Recursive | Negative | An attribute with a lower Gini index should be preferred.
98 | 2 | In Naïve Bayes, if any two events A and B are independent, then: | P(A,B)=P(A)P(B) | P(A,B)=P(A)/P(B) | P(A,B)=P(B) | P(A,B)=P(B)/P(A) | If any two events A and B are independent, then P(A,B)=P(A)P(B).
99 | 2 | What is the measure of uncertainty of a random variable in a decision tree? | Entropy | Gain | Gini Index | None | Entropy is the measure of uncertainty of a random variable.
100 | 2 | Which of the following is not true for decision trees? | Stable | Easy to understand | Easy to explain | Easy to evaluate | -
101 | 2 | The decision tree algorithm falls under the category of which learning? | Supervised | Unsupervised | Regression | Classification | The decision tree algorithm falls under the category of supervised learning.
102 | 2 | False Positives and False Negatives is an application of which theorem? | Bayes' Theorem | Binary Distribution | Bernoulli Distribution | Normal Distribution | One of the uses of Bayes' Theorem is false positives and false negatives.
103 | 2 | Decision Trees used in mining the data are of how many types? | 2 | 3 | 4 | 5 | There are 2 types of decision trees used in data mining.
104 | 3 | In Bayes' Theorem, P(A) and P(B) are the probabilities of observing A and B respectively; they are known as: | Marginal Probability | Normal Distribution | Bernoulli Distribution | Parallel Algorithm | P(A) and P(B) are the probabilities of observing A and B respectively; they are known as the marginal probability.
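Questions 102 and 104 connect Bayes' Theorem to false positives and to marginal probabilities. The sketch below works through that relationship for a hypothetical screening test: it computes the marginal probability of a positive result and then the posterior probability of the condition given a positive result. The function name `posterior` and the rates used are illustrative assumptions.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior               = P(A), marginal probability of the condition
    sensitivity         = P(B | A), positive test given the condition
    false_positive_rate = P(B | not A)
    """
    # Marginal probability P(B) of a positive test (law of total probability).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false positives.
print(round(posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05), 4))
```

Even with a sensitive test, a low prior means most positives are false positives, which is the classic reason Bayes' Theorem is applied to this kind of problem.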
No. | Marks | Question | A | B | C | D | Ans
105 | 3 | The ID3 algorithm in a decision tree stands for? | Iterative Dichotomiser 3 (ID3) | Interval Driven | Interconnected Decision | None | ID3 stands for Iterative Dichotomiser 3.
106 | 3 | Probably the best way of estimating performance for very small data sets is: | Bootstrapped Method | Normal Distribution | Naïve Bayes | Binary Distribution | Probably the best way of estimating performance for very small data sets is the bootstrapped method.
107 | 3 | The Decision Tree works on which form? | Disjunctive Normal Form | Product of Sum | Bijective Form | Conjunctive Form | Decision Tree works on Disjunctive Normal Form.
108 | 3 | The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a ____ distribution. | 1-D | 2-D | 3-D | None | The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution.
109 | 3 | The theoretical concept to evaluate classifiers is: | COLT | PAC Model | Naïve Bayes | Prediction | -
110 | 3 | ____ is a metric to measure how often a randomly chosen element would be incorrectly identified. | Gini Index | Entropy | Pointer | Cross-Entropy | Gini Index is a metric to measure how often a randomly chosen element would be incorrectly identified.
111 | 3 | The most notable types of decision tree algorithms are: | 3 | 2 | 1 | 4 | The most notable types of decision tree algorithms are 3.
112 | 3 | Which process is completed when the subset at a node all has the same value of the target variable? | Recursive Partitioning | Termination | Transformation | Prediction | The recursive partition is completed when the subset at a node all has the same value of the target variable.
113 | 3 | The ____ method reserves a certain amount for testing and uses the remainder for training. | Holdout | Parallel Algorithm | Naïve Bayes | Normal Distribution | The holdout method reserves a certain amount for testing and uses the remainder for training.
114 | 3 | This equation is of which theorem? | Bayes' Theorem | Normal Distribution | Bernoulli Distribution | Cross-Entropy | -
115 | 3 | "Independence among the features". This is an assumption in: | Naïve Bayes Classifier | Bernoulli Distribution | Parallel Algorithm | Binary Distribution | Independence among the features is an assumption in Naïve Bayes.
116 | 3 | The error rate obtained from training data is called: | Resubstitution Error | Grid | Gini Index | True error | The error rate obtained from training data is called resubstitution error.
117 | 3 | In a Decision Tree, entropy is ____ to content. | Proportional | Inverse | High | Less | -
118 | 3 | In a Decision Tree, no root-to-leaf path should contain the same discrete attribute ____. | Twice | Once | Thrice | Four times | No root-to-leaf path should contain the same discrete attribute twice.
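Questions 97, 99 and 110 describe entropy and the Gini index as impurity measures for decision trees, with lower Gini preferred. The sketch below computes both measures for a list of class labels and the size-weighted Gini of a two-way split; the function names and the label data are illustrative.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Chance that a randomly chosen element is labeled incorrectly."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Uncertainty of the class label, in bits."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def split_gini(left, right):
    """Gini index of a two-way split, weighted by the size of each side."""
    total = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / total


mixed = ["yes", "yes", "no", "no"]
print(gini(mixed), entropy(mixed))                    # impurity before splitting
print(split_gini(["yes", "yes"], ["no", "no"]))       # a split into pure halves
```

A tree learner would evaluate `split_gini` for each candidate attribute and pick the one with the lowest value, which matches the greedy approach named in question 90.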
119 | 1 | Using ____, designers can make information understandable for stakeholders. | Data Visualization | Classification | Regression | Supervised Learning | Using data visualization methods, designers can make information understandable for stakeholders.
120 | 1 | The additional visual methods include: | All | Tree Map | Parallel Coordinates | Semantic Networks | -
121 | 1 | Data visualization tools don’t include: | MS Excel | Tableau | Power BI | Jupyter | -
122 | 1 | Which of the following requires JavaScript knowledge to run the visualization tool? | All | Chart.js | Polymap | Sigmajs | -
123 | 1 | Merits of Tableau don’t include which factor? | Cost | Performance | Usage | Computation | Merits of Tableau don’t include the cost factor.
124 | 1 | Which of these is not a type of Big Data visualization? | Pictograph | Bar-Graph | Line-Chart | Pie-Chart | -
125 | 2 | The drag-and-drop editor of which tool makes it easy to create professional-looking designs without a lot of visual design skill? | Infogram | Google Chart | Tableau | Grafana | The drag-and-drop editor of Infogram makes it easy to create professional-looking designs without a lot of visual design skill.
126 | 2 | How many V's are defined for data visualization? | 4 | 6 | 2 | 3 | There are 4 V's of data visualization.
127 | 2 | Which of the following is not a free data visualization tool? | Tableau | Google Chart | Jupyter | HubSpot CRM | Tableau is a chargeable data visualization tool.
128 | 2 | Companies that work with both traditional and big data use which technique to look at customer segments or market shares? | Pie-Chart | Bar-Graph | Stream graph | Line-Chart | Companies that work with both traditional and big data may use a pie chart to look at customer segments or market shares.
129 | 2 | Visualization of data includes which of the following problems? | All | Information Loss | Visual Noise | Large Image Perception | -
130 | 2 | Mainly, data visualization has how many types of challenges? | 5 | 6 | 4 | 2 | There are 5 main challenges to data visualization.
No. | Marks | Question | A | B | C | D | Ans
131 | 2 | Which tool uses HTML5/SVG to visualize data? | Google Charts | Jupyter | Grafana | Tableau | Google Charts uses HTML5/SVG since it is browser compatible.
132 | 2 | According to Colin Ware’s Information Visualization: Perception for Design, he defines ____ pre-attentive visual properties. | 4 | 2 | 1 | 3 | According to Colin Ware’s Information Visualization: Perception for Design, he defines four pre-attentive visual properties.
133 | 2 | ____ is based on space-filling visualization of hierarchical data. | Tree-Map | Stream graph | Bar-Graph | Line-Chart | The tree map method is based on space-filling visualization of hierarchical data.
134 | 2 | Which graph shows the dependency relationships between activities and current schedule status? | Gantt-Chart | Line-Chart | Pie-Chart | Bar-Graph | A Gantt chart shows the dependency relationships between activities and current schedule status.
135 | 2 | Another name for distribution-free data is: | Non-parametric data | Parametric data | Static data | Dynamic data | Non-parametric data is also called distribution-free data.
136 | 2 | Which chart is used for comparison of values, such as sales performance for several persons or businesses in a single time? | Bar-Graph | Gantt-Graph | Line-Chart | Pie-Chart | A bar graph is used for comparison of values, such as sales performance for several persons or businesses in a single time.
137 | 2 | ____ are graphics in the field of statistics used to visualize quantitative data. | Graphical Techniques | Line-Chart | Regression | Classification | Graphical techniques are graphics in the field of statistics used to visualize quantitative data.
138 | 2 | ____ can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion. | Parallel Coordinates | Stream graph | Google Chart | Jupyter | Parallel Coordinates can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion.
139 | 3 | Chart.js provides how many types of charts? | 8 | 5 | 3 | 6 | -
140 | 3 | Which visualization tool supports mixed data sources, annotations, and customizable alert functions, and can be extended via hundreds of available plugins? | Grafana | Tableau | Google Chart | Jupyter | Grafana supports mixed data sources, annotations, and customizable alert functions, and it can be extended via hundreds of available plugins.
141 | 3 | Which tool was created specifically for adding charts and maps to news stories? | Datawrapper | Tableau | Google Chart | Jupyter | Datawrapper was created specifically for adding charts and maps to news stories.
142 | 3 | Conventional visualization methods don’t include: | Mekko Chart | Pie-Chart | Bar-Graph | Histogram | The Mekko chart is a new technique to visualize data.
143 | 3 | ____ is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape. | Streamgraph | Bar-Graph | Pie-Chart | Line-Chart | A streamgraph is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape.
144 | 3 | Which visual tool includes over 150 chart types and 1,000 map types? | FusionCharts | Tableau | Google Chart | Jupyter | FusionCharts includes over 150 chart types and 1,000 map types.
145 | 3 | Which graph/chart is a graphical representation of the logical relationship between different concepts? It generates a directed graph, the combination of nodes or vertices, edges or arcs, and a label over each edge. | Semantic Networks | Bar-Graph | Pie-Chart | Line-Chart | A semantic network is a graphical representation of the logical relationship between different concepts. It generates a directed graph, the combination of nodes or vertices, edges or arcs, and a label over each edge.
146 | 3 | According to SAS, we can process only ____ of information per second on a flat screen. | 1 Kilobit | 1 Byte | 1 Bit | 1 MB | According to SAS, we can process only 1 kilobit of information per second on a flat screen.
147 | 3 | There are ____ steps for interactive data visualization: | 4 | 5 | 3 | 6 | -
148 | 3 | When working with big data, companies can use which visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.? | Line-Chart | Bar-Graph | Pie-Chart | Stream graph | When working with big data, companies can use the line chart visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.
149 | 1 | Which of the following enterprises use HBase? | All | Facebook | Netflix | Adobe | -
No. | Marks | Question | A | B | C | D | Ans
150 | 1 | Which NLP is used in the present era? | Neural NLP | Symbolic NLP | Statistical NLP | None | Since 2010, neural NLP has been in use.
151 | 1 | The Computer World magazine states that unstructured information might account for more than ____ of all data in organizations. | 70-80% | 90% | 50% | 60% | The Computer World magazine states that unstructured information might account for more than 70%–80% of all data in organizations.
152 | 1 | Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely ____. | Unstructured | Structured | Semantic | None | Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely or partly unstructured.
153 | 1 | Which standard provided a common framework for processing information to extract meaning and create structured data about the information? | Unstructured Information Management Architecture (UIMA) | Data Architecture | Management Architecture for Data | None | The Unstructured Information Management Architecture (UIMA) standard provided a common framework for processing this information to extract meaning and create structured data about the information.
154 | 2 | The base Apache Hadoop framework is composed of how many modules? | 4 | 2 | 3 | 6 | The base Apache Hadoop framework is composed of four modules.
155 | 2 | NoSQL doesn’t include which software? | MS-SQL | HBase | DynamoDB | MongoDB | -
156 | 2 | There are ____ main types of OLAP systems. | 3 | 2 | 5 | 6 | There are 3 types of OLAP systems.
157 | 2 | The SQL alternative in Apache Hive is called? | HiveQL | BASEQL | SPARK-QL | H-QL | HiveQL is the alternative to SQL in the Apache Hive family.
158 | 2 | A MapReduce program executes in how many stages? | 3 | 2 | 5 | 4 | A MapReduce program executes in three stages.
159 | 2 | How many types of NoSQL database are there? | 4 | 3 | 2 | 6 | There are 4 types of databases in NoSQL.
160 | 2 | MapReduce is a processing technique and a program model for distributed computing based on which programming language? | Java | Python | C++ | R | MapReduce is a processing technique and a program model for distributed computing based on Java.
161 | 2 | Hive supports how many properties of transactions? | 4 | 3 | 2 | 1 | Hive supports all four properties of transactions.
162 | 2 | HDFS consists of only one Name Node, which is called the? | Master Node | Slave Node | Both | None | HDFS consists of only one Name Node, which is called the Master Node.
163 | 2 | Which Apache software is needed to process massive amounts of data for the purposes of natural-language search? | Apache HBase | Apache Spark | Apache Pig | Apache Mahout | HBase is needed to process massive amounts of data for the purposes of natural-language search.
164 | 2 | Which databases store data in a format other than relational tables? | NoSQL | HIVESQL | SPARK-QL | H-QL | NoSQL databases store data in a format other than relational tables.
165 | 2 | Which is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra? | Apache Mahout | Apache Spark | Apache Pig | Apache HBase | Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra.
166 | 2 | Which model is a specialization of the split-apply-combine strategy for data analysis? | MapReduce | Hadoop | HBase | Hive | The MapReduce model is a specialization of the split-apply-combine strategy for data analysis.
167 | 2 | All Hadoop commands are invoked by which command? | $HADOOP_HOME/bin/hadoop | $HADOOP/bin/hadoop | $HADOOP_HOME/hadoop | $HADOOP_HOME/bin | All Hadoop commands are invoked by the $HADOOP_HOME/bin/hadoop command.
168 | 3 | The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called? | Schema on Write | Schema on Read | Schema for Read Write | None | The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called schema on write.
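Questions 158 and 166 describe MapReduce as a three-stage, split-apply-combine style of processing. The sketch below imitates those stages (map, shuffle, reduce) for a word count inside a single Python process; it is only an illustration of the data flow and does not use Hadoop's Java API, and all function names are illustrative.

```python
from collections import defaultdict


def map_stage(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_stage(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_stage(shuffle_stage(map_stage(lines)))


print(word_count(["big data", "big cluster"]))
```

In a real cluster the map and reduce stages run in parallel on different nodes and the shuffle moves data across the network; the single-process version only keeps the order of the stages.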
No. | Marks | Question | A | B | C | D | Ans
169 | 3 | Which command formats the DFS filesystem? | Namenode -format | Node -format | Name -format | Format | The Namenode -format command formats the DFS filesystem.
170 | 3 | Which command applies the offline fsimage viewer to an fsimage? | oiv | fs | fc | ov | oiv applies the offline fsimage viewer to an fsimage.
171 | 3 | Hadoop requires which Java Runtime Environment (JRE) version or higher? | 1.6 | 1.2 | 1.5 | 1 | Hadoop requires Java Runtime Environment (JRE) 1.6 or higher.
172 | 3 | Every Data Node sends a Heartbeat message to the Name Node every ____ seconds and conveys that it is alive. | 3 | 2 | 4 | 1 | Every Data Node sends a Heartbeat message to the Name Node every 3 seconds and conveys that it is alive.
173 | 3 | HDFS can store files up to: | 1 TB | 1 GB | 1 ZB | 1 PB | HDFS can store up to 1 TB of files.
174 | 3 | Which of the following is a wide-column store? | HBase | SQL | DynamoDB | MongoDB | HBase is a popular wide-column store.
175 | 3 | Which node acts as both a DataNode and TaskTracker in Hadoop? | Slave Node | Data Node | Admin Node | Name Node | A slave or worker node acts as both a DataNode and TaskTracker.
176 | 3 | The HDFS system uses which protocol for communication? | TCP/IP | TCP | UDP | IP | The HDFS system uses TCP/IP sockets for communication.
177 | 3 | HDFS has how many services? | 5 | 4 | 2 | 6 | HDFS has five services.
178 | 3 | ____ is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. | Apache Hive | Apache Spark | Apache Pig | Apache HBase | Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
Hadoop Online Quiz - Tutorialspoint https://www.tutorialspoint.com/hadoop/hadoop_online_quiz.htm
HADOOP MOCK TEST
http://www.tutorialspoint.com Copyright © tutorialspoint.com
This section presents various sets of mock tests related to the Hadoop framework. You can
download these sample mock tests to your local machine and solve them offline at your convenience.
Every mock test is supplied with a mock test key to let you verify the final score and grade yourself.
C - Can process data faster under the same network bandwidth as compared to HPC.
Q 5 - Which of the following is true for disk drives over a period of time?
B - Data Seek time is improving more slowly than data transfer rate.
C - Data Seek time and data transfer rate are both increasing proportionately.
D - Only the storage capacity is increasing without increase in data transfer rate.
A - Solr
B - Tez
C - Spark
D - Hive
C - Occupies only the size it needs and not the full block.
D - Can span over multiple blocks.
Q 10 - HDFS block size is larger as compared to the size of the disk blocks so that
D - A single file larger than the disk size can be stored across many disks in the cluster.
Q 11 - In a Hadoop cluster, what is true for a HDFS block that is no longer available
due to disk corruption or machine failure?
C - The namenode allows new client request to keep trying to read it.
D - The Mapreduce job process runs ignoring the block and the data stored in it.
Q 12 - Which utility is used for checking the health of a HDFS file system?
A - fchk
B - fsck
C - fsch
D - fcks
Q 13 - Which command lists the blocks that make up each file in the filesystem.
D - None
Q 15 - In the local disk of the namenode the files which are stored persistently are −
D - None of these
A - Take backup of filesystem metadata to a local disk and a remote NFS mount.
Q 19 - For the frequently accessed HDFS files the blocks are cached in
C - Both A&B
D - In the memory of the client application which requested the access to these files.
C - Failure of one namenode causes loss of some metadata availability from the entire
filesystem.
B - To reduce the cycle time required to bring back a new primary namenode after existing
primary fails.
A - When a client request comes, one of them chosen at random serves the request.
B - One of them is active while the other one remains powered off.
B - Preventing the start of a failover in the event of network failure with the active namenode.
C - Preventing the power down to the previously active namenode.
D - STONITH
Q 28 - The property used to set the default filesystem for Hadoop in core-site.xml is-
A - filesystem.default
B - fs.default
C - fs.defaultFS
D - hdfs.default
A - 1
B - 2
C - 3
D - 4
A - 2
B - 1
C - 0
D - 3
B - Zero
C - 3
B - Renaming
C - Moving
D - Executing.
ANSWER SHEET
1 C
2 A
3 D
4 B
5 B
6 C
7 C
8 B
9 C
10 D
11 B
12 B
13 A
14 B
15 A
16 C
17 A
18 D
19 A
20 C
21 C
22 B
23 B
24 D
25 B
26 D
27 C
28 C
29 C
30 B
31 D
32 D
HADOOP MOCK TEST
D - HDFS ftp
C - You can edit an existing record in HDFS file which is already mounted using NFS.
D - gets both the data and block location from the namenode
Q 4 - Which scenario demands highest bandwidth for data transfer between nodes in
Hadoop?
Q 5 - The current block location of HDFS where data is being written to,
A - Optimal Scheduler
B - FIFO scheduler
C - Capacity scheduler
D - Fair scheduler
D - Fully-Distributed mode
A - C++
B - Python
C - Java
D - GO
Q 10 - The hdfs command to create the copy of a file from a local system is
A - CopyFromLocal
B - copyfromlocal
C - CopyLocal
D - copyFromLocal
D - The number of replicated copies is less than as specified by the replication factor.
Q 13 - When the namenode finds that some blocks are over replicated, it
A - Replication factor
A - Replication factor
A - Replication factor
A - Replication factor
A - Jsp
B - Jps
C - Hadoop fs –test
D - None
Q 19 - The information mapping data blocks with their corresponding files is stored in
A - Data node
B - Job Tracker
C - Task Tracker
D - Namenode
Q 20 - The file in Namenode which stores the information mapping the data block
location with file name is −
A - dfsimage
B - nameimage
C - fsimage
D - image
Q 21 - The namenode knows that the datanode is active using a mechanism known as
A - heartbeats
B - datapulse
C - h-signal
D - Active-pulse
B - Commodity grade
Q 24 - Which of the following Apache systems deals with ingesting streaming data into
Hadoop?
A - Oozie
B - Kafka
C - Flume
D - Hive
A - The average size of the data blocks used as input for the program
B - The location details of where the first whole record in a block begins and the last whole
record in the block ends.
C - Splitting the input data to a MapReduce program into a size already configured in the
mapred-site.xml
D - None of these
B - The Key-value pair of all the records from the input split processed by the mapper
B - Report the edit log information of the blocks in the data node.
Q 28 - The Zookeeper
A - The namenode updates the mapping between file name and block name
B - The namenode need not update mapping between file name and block name
Q 30 - When a client contacts the namenode for accessing a file, the namenode
responds with
C - Block ID and hostname of any one of the data nodes containing that block.
D - Block ID and hostname of all the data nodes containing that block.
Q 32 - The Hadoop tool used for uniformly spreading the data across the data nodes is
named −
A - Scheduler
B - Balancer
C - Spreader
D - Reporter
ANSWER SHEET
1 B
2 A
3 C
4 C
5 D
6 A
7 B
8 B
9 C
10 D
11 B
12 D
13 C
14 B
15 A
16 C
17 D
18 B
19 D
20 C
21 A
22 A
23 B
24 C
25 B
26 B
27 B
28 A
29 B
30 D
31 D
32 B
33 A
HADOOP MOCK TEST
B - Check if the fsimage file is in sync between namenode and secondary namenode
C - Merges the fsimage and edit log and uploads it back to active namenode.
D - Rack awareness
A - it is lost forever
C - It becomes hidden from the user but stays in the file system
A - REST API
B - RPC
C - RMI
D - IP Exchange
A - Structured
B - Semi-structured
C - Unstructured
B - 3 Physical machines
C - 4 Physical machines
D - 1 Physical machine
A - read
B - deleted
C - executed
D - Archived
Q 15 - hadoop fs –expunge
A - getmerge
B - putmerge
C - remerge
D - mergeall
A - changerep
B - rerep
C - setrep
D - xrep
Q 18 - The command used to copy a directory from one node to another in HDFS is
A - rcp
B - dcp
C - drcp
D - distcp
A - .hrc
B - .har
C - .hrh
D - .hrar
A - unrar
B - unhar
C - cp
D - cphar
Q 23 - When you increase the number of files stored in HDFS, The memory required by
namenode
A - Increases
B - Decreases
C - Remains unchanged
Q 24 - If we increase the size of files stored in HDFS without increasing the number of
files, then the memory required by namenode
A - Decreases
B - Increases
C - Remains unchanged
Q 27 - You can reserve the amount of disk usage in a data node by configuring the
dfs.datanode.du.reserved in which of the following file
A - Hdfs-site.xml
B - Hdfs-default.xml
C - Core-site.xml
D - Mapred-site.xml
Q 28 - The namenode loses its only copy of fsimage file. We can recover this from
A - Datanodes
B - Secondary namenode
C - Checkpoint node
D - Never
Q 29 - In an HDFS system with block size 64MB we store a file which is less than 64MB.
Which of the following is true?
B - Not limited
A - Jobtracker to Tasktracker
C - Jobtracker to namenode
D - Tasktracker to namenode
ANSWER SHEET
1 C
2 A
3 B
4 B
5 B
6 A
7 B
8 D
9 B
10 A
11 A
12 C
13 D
14 C
15 D
16 A
17 C
18 D
19 B
20 C
21 D
22 D
23 A
24 A
25 C
26 B
27 A
28 C
29 C
30 A
31 C
32 A
33 B
HADOOP MOCK TEST
A - Jobtracker to Tasktracker
C - Jobtracker to namenode
D - Tasktracker to namenode
A - Namenode
B - Datanode
C - Secondary namenode
D - Secondary datanode
A - Balanced scheduler
B - Fair scheduler
C - Capacity scheduler
D - FIFO scheduler.
A - The default input format is xml. Developer can specify other input formats as appropriate if
xml is not the correct input.
B - There is no default input format. The input format always should be specified.
C - The default input format is a sequence file format. The data needs to be preprocessed before
using the default input format.
D - The default input format is TextInputFormat with byte offset as a key and entire line as a
value.
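The key/value shape described in option D can be illustrated with a small sketch. This is plain Python that only imitates the record layout (byte offset as key, line text as value); it is not Hadoop code, and the function name `text_input_records` is a hypothetical helper chosen for illustration.

```python
def text_input_records(data: bytes):
    """Yield (byte_offset, line) pairs, imitating the layout described above."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Key: byte offset where the line starts. Value: the line without its terminator.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


sample = b"hello world\nfoo bar\nlast line\n"
records = list(text_input_records(sample))
for key, value in records:
    print(key, value)
```

Each record's key is the position of the line within the input, so keys are unique per file but carry no meaning beyond ordering.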
A - Velocity
B - Veracity
C - volume
D - variety
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
Q 10 - Which of the following technologies is a document store database?
A - HBase
B - Hive
C - Cassandra
D - CouchDB
A - It is a distributed framework.
A - Name node
B - Data node
C - Master node
D - None of these
A - Name node
B - Data node
C - slave node
D - None of these
Q 14 - What is AVRO?
A - Yes, Avro was specifically designed for data processing via Map-Reduce.
D - Avro specifies metadata that allows easier data access. This data cannot be used as part of
map-reduce execution, rather input specification only.
A - The distributed cache is special component on name node that will cache frequently used
data for faster client response. It is used during reduce step.
B - The distributed cache is special component on data node that will cache frequently used data
for faster client response. It is used during map step.
D - The distributed cache is a component that allows developers to deploy jars for Map-Reduce
processing.
Q 17 - What is writable?
A - Writable is a java interface that needs to be implemented for streaming data to remote
servers.
Q 18 - What is HBASE?
B - Hbase is a part of the Apache Hadoop project that provides interface for scanning large
amount of data using Hadoop infrastructure.
D - HBase is a part of the Apache Hadoop project that provides a SQL like interface for data
processing.
B - Hadoop was specifically designed to process large amount of data by taking advantage of
MPP hardware.
C - Hadoop ships the code to the data instead of sending the data to the code.
D - Hadoop uses sophisticated caching techniques on name node to speed processing of data.
Q 20 - When using HDFS, what occurs when a file is deleted from the command line?
B - It is placed into a trash directory common to all users for that cluster.
C - It is permanently deleted and the file attributes are recorded in a log file.
D - It is moved into the trash directory of the user who deleted it if trash is enabled.
Q 21 - When archiving Hadoop files, which of the following statements are true?
Choose two answers
3. MapReduce processes the original file names even after files are archived.
4. Archived files must be unarchived for HDFS and MapReduce to access the
original, small files.
5. Archive is intended for files that need to be saved but no longer accessed by
HDFS.
A - 1 & 3
B - 2 & 3
C - 2 & 4
D - 3 & 4
Q 22 - When writing data to HDFS what is true if the replication factor is three?
Choose 2 answers
2. The Data is stored on each DataNode with a separate file which contains a
checksum value.
4. The Client is returned with a success upon the successful writing of the first
block and checksum check.
A - 1 & 3
B - 2 & 3
C - 3 & 4
D - 1 & 4
Q 23 - Which of the following are among the duties of the Data Nodes in HDFS?
A - Maintain the file system tree and metadata for all files and directories.
Q 24 - Which of the following components retrieves the input splits directly from
HDFS to determine the number of map tasks?
A - The NameNode.
B - The TaskTrackers.
C - The JobClient.
D - The JobTracker.
A - 1 & 4
B - 2 & 3
C - 3 & 4
D - 2 & 4
Q 27 - Which one of the following statements is false regarding the Distributed Cache?
A - The Hadoop framework will ensure that any files in the Distributed Cache are distributed to all
map and reduce tasks.
B - The files in the cache can be text files, or they can be archive files like zip and JAR files.
D - The Hadoop framework will copy the files in the Distributed Cache on to the slave node
before any tasks for the job are executed on that node.
A - Region Server.
B - Nagios.
C - ZooKeeper.
D - Master Server.
A - HDFS.
B - Task Tracker.
C - Job Tracker.
D - Name Node.
E - Data Node.
Q 31 - Keys from the output of shuffle and sort implement which of the following
interface?
A - Writable.
B - WritableComparable.
C - Configurable.
D - ComparableWritable.
E - Comparable.
B - Output of the mapper and output of the combiner has to be same key value pair and they can
be heterogeneous
C - Output of the mapper and output of the combiner has to be same key value pair. Only if the
values satisfy associative and commutative property it can be done.
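Why the associative and commutative property matters for a combiner can be seen in a small sketch. The code below is plain Python, not the MapReduce API; `combine`, `mapper_1` and `mapper_2` are hypothetical names. It applies the same function once per mapper and once more at the reducer: summing survives this two-stage application, averaging does not.

```python
from collections import defaultdict


def combine(pairs, fn):
    """Group values by key and fold each group with fn."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return [(key, fn(values)) for key, values in grouped.items()]


def mean(values):
    return sum(values) / len(values)


mapper_1 = [("a", 1), ("a", 2), ("a", 3)]
mapper_2 = [("a", 10)]

# Sum: combining per mapper and then reducing gives the same result as reducing once.
sum_with_combiner = combine(combine(mapper_1, sum) + combine(mapper_2, sum), sum)
sum_without_combiner = combine(mapper_1 + mapper_2, sum)

# Mean: the mean of per-mapper means is not the mean of all values.
mean_with_combiner = combine(combine(mapper_1, mean) + combine(mapper_2, mean), mean)
mean_without_combiner = combine(mapper_1 + mapper_2, mean)

print(sum_with_combiner, sum_without_combiner)
print(mean_with_combiner, mean_without_combiner)
```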
ANSWER SHEET
1 A
2 B
3 A
4 A
5 D
6 B
7 A
8 B
9 C
10 D
11 D
12 B
13 A
14 A
15 A
16 B
17 C
18 B
19 C
20 C
21 B
22 C
23 D
24 D
25 A
26 B
27 C
28 B
29 C
30 D
31 B
32 C
Seat No -
Total number of questions : 60
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 3. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
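A short sketch of the array attributes named in the options, assuming NumPy is installed; the array `a` is an arbitrary example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns

print(a.ndim)   # number of dimensions (axes)
print(a.shape)  # length along each axis
print(a.size)   # total number of elements
print(a.dtype)  # element type (platform-dependent integer here)
```

Note that `ndarray.axes` is not an ndarray attribute; `ndim`, `size` and `dtype` are.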
Q.no 4. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : ndarray
B : spatial
C : ndimage
D : special
A : Single point
B : Line
C : 2-D Plane
A : Text files
B : Satellite data
C : Sensor data
D : Seismic imagery data
A : Matlab
B : Scilab
C : Scipy
D : Numpy
Q.no 9. The procedure to organize items of a given collection into groups based on
some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 11. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. ------------- function is used to plot a histogram using matplotlib library
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 13. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Probability
B : Gini Index
C : Regression
D : Association
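How a split is scored with the Gini index can be sketched in a few lines. This is a minimal illustration using the common definition of Gini impurity (1 minus the sum of squared class proportions); the function names and label lists are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum(p_i ** 2) over class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each child node."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


pure = gini_impurity(["yes", "yes", "yes", "yes"])
mixed = gini_impurity(["yes", "yes", "no", "no"])
split = weighted_gini(["yes", "yes", "yes"], ["no", "no", "yes"])
print(pure, mixed, split)
```

A tree builder would evaluate candidate splits this way and prefer the one with the lowest weighted impurity.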
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 16. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : prod()
B : mult()
C : dot()
D : *
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 20. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : 0
B : -1
C : 1
D : -2
Q.no 22. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : KNN
C : Decision trees
D : Cluster analysis
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 26. ---------- function is used to get the element-wise remainder of division of two arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
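A brief sketch of the element-wise remainder, assuming NumPy is installed; `x1` and `x2` are arbitrary example arrays.

```python
import numpy as np

x1 = np.array([10, 11, 12, 13])
x2 = np.array([3, 4, 5, 6])

remainders = np.mod(x1, x2)     # element-wise x1 % x2
quotients = np.divide(x1, x2)   # element-wise true division, for comparison
print(remainders)
print(quotients)
```

NumPy also provides `numpy.remainder`, which computes the same result as `numpy.mod`; the spelling `numpy.reminder` in option D is not a NumPy function.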
Q.no 27. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
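The three measures in the options can be computed by hand on a toy data set. The sketch below uses the usual textbook definitions; the transactions and function names are made up for illustration.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to what independence of the two sides would predict."""
    return confidence(antecedent, consequent) / support(consequent)


print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
print(lift({"bread"}, {"milk"}))
```

Support measures how frequently an itemset appears in the data, while confidence measures how often the rule turns out to be true when its left-hand side is present.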
Q.no 28. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 30. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
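A minimal sketch of a database-style join in pandas, assuming pandas is installed. The frames `left` and `right` and their columns are invented for illustration; `on` names the key column and `how` selects which keys are kept.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# Inner join: only keys present in both frames.
inner = pd.merge(left, right, on="id", how="inner")

# Outer join: keys from either frame; missing values become NaN.
outer = pd.merge(left, right, on="id", how="outer")

print(inner)
print(outer)
```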
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : NoSQL data
B : YouTube data
A : EPS
B : PDF
C : PNG
D : PS
Q.no 36. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 38. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 39. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 42. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 43. --------- is a technique that duplicates the smaller array so that its dimensionality
and size match the size and dimensionality of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
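A small sketch of the technique the question describes, assuming NumPy is installed. The smaller array is conceptually stretched to the shape of the larger one; the explicit `np.tile` version is shown only to make the equivalence visible.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The row is applied to every row of the matrix without being copied by hand.
result = matrix + row

# Equivalent computation with the duplication written out explicitly.
explicit = matrix + np.tile(row, (2, 1))

print(result)
```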
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 46. ---------- is a machine learning algorithm used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 47. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
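A quick sketch of log-spaced values, assuming NumPy is installed. The start and stop arguments are exponents, not the end values themselves.

```python
import numpy as np

# Four points from 10**0 to 10**3, evenly spaced on a log scale.
points = np.logspace(0, 3, num=4)

# The base can be changed; here 2**0 .. 2**4.
base2 = np.logspace(0, 4, num=5, base=2)

print(points)
print(base2)
```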
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 49. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
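The level-wise idea behind frequent itemset mining can be sketched compactly. This is a simplified illustration, not a reference implementation: it counts candidates with a plain dictionary rather than a hash tree, `min_support` is an absolute count, and the basket data is invented.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Count all size-k candidates in one pass over the data (breadth-first by size).
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any candidate
        # that has an infrequent k-subset: this is the use of prior knowledge.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

No labels are involved at any point, which is why the algorithm is classed as unsupervised.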
Q.no 52. The -------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 53. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
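The subplot numbering can be reasoned about without drawing anything. In `subplot(nrows, ncols, index)` the index starts at 1 in the upper-left cell and increases to the right, row by row. The helper below is a hypothetical pure-Python function that maps an index to a zero-based (row, column) position; it does not call matplotlib.

```python
def subplot_position(nrows, ncols, index):
    """Map a 1-based subplot index to a 0-based (row, col); indices run row by row."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows*ncols")
    return divmod(index - 1, ncols)


top_right = subplot_position(2, 3, 3)
bottom_left = subplot_position(2, 3, 4)
print(top_right, bottom_left)
```

In a 2 x 3 grid the top-right cell is row 0, column 2, which corresponds to index 3.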
Q.no 57. Catalog design is a complex process where the selection of items in a
business's catalog is often designed so that the items complement each other and buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 59. ------------ algorithm models a series of logical If-Then-Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Answer for Question No 1. is a
13329_DATA ANALYTICS
Q.no 2. ----------- is data that depends on a data model and resides in a fixed field within
a record.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- plot displays information as a series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 5. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
Q.no 9. Which Python library is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 10. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 11. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Un- Supervised
B : Supervised
C : Association
D : correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : KNN
C : Regression
D : Decision Tree
A : Classification
B : Regression
C : Clustering
D : Association
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
Q.no 19. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 20. ----------------- analysis estimates the relationship between a single dependent
variable and a single independent variable.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 21. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 22. ------- is the basic data structure of pandas; it can be thought of as an SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 23. The ------------------ module of matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
A : 1
B : -1
C : 0
D : 2
Q.no 25. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 26. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 27. --------- function from the matplotlib.pyplot library plots a bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : KNN
C : Regression
D : Cluster analysis
Q.no 29. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
A : Java
B : Ruby
C : R
D : None of these
A : 0
B : -1
C : 1
D : -2
Q.no 33. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 34. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 37. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 38. ------------------- is a one-dimensional array defined in pandas that can be
used to store any data type.
A : Dict
B : series
C : ndarray
D : list
Q.no 39. To read an image from a file into an array, the --------------- function is used.
A : matplotlib.pyplot.imshow()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 42. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 46. Which numpy function is used to return the truncated value of the
input, element-wise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 47. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 49. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 51. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
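For the single-predictor case, the two coefficients in the options are directly related: the coefficient of determination is the square of the coefficient of correlation. The sketch below computes both from the standard Pearson formula; the data points and the function name `pearson_r` are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)   # strength and direction, between -1 and +1
r_squared = r ** 2      # share of variance in y explained by x (simple linear case)
print(r, r_squared)
```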
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 53. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. ---------- is a machine learning algorithm used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. The -------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Answer for Question No 1. is c
13329_DATA ANALYTICS
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 2. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 3. Choose the correct option for machine-generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 4. To save or write DataFrame data into a CSV file, the -------- function is used.
A : write_csv()
B : write_file()
C : csv_read()
D : to_csv()
A : Regression
B : Decision trees
C : KNN
D : SVM
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 9. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 11. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 14. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 15. ------------ plots show all individual data points without connecting them
with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 16. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 19. -------------- charts represent categorical data with rectangular bars.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Random
B : sequential
C : Same
Q.no 21. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
Q.no 22. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
Q.no 23. ------ is a classification technique that relies on the naïve assumption that
input variables are independent of each other.
A : KNN
B : Naïve Bayes
C : Regression
Q.no 24. ----------- phase of the data analytics lifecycle usually takes the longest
time.
A : Data Preparation
B : Model Planning
C : Model Building
D : Communicate Results
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 28. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 29. The ------------------ module of matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 31. ---------- function is used to add two numpy arrays element-wise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 32. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 33. --------- function from the matplotlib.pyplot library plots a bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Website data
B : YouTube data
Q.no 36. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : EPS
B : PDF
C : PNG
D : PS
Q.no 38. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
C : Measures growth
Q.no 40. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
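Support and confidence can be worked out by hand on a toy basket list. The sketch below uses plain Python; the transactions and the helper names `support` and `confidence` are hypothetical and not part of any library.

```python
# Toy transactions (made-up data for illustration)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```

Here the itemset {bread, milk} appears in 2 of 4 baskets, and the rule bread -> milk holds in 2 of the 3 baskets that contain bread.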
Q.no 41. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
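The effect of the merge arguments can be seen on two small frames. This is a sketch assuming pandas is installed; the column names and values are invented for the example.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# on= names the join column; how= decides which keys survive the join
inner = pd.merge(left, right, on="key", how="inner")  # keys present in both frames
outer = pd.merge(left, right, on="key", how="outer")  # keys present in either frame

print(inner)
print(outer)
```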
Q.no 42. Which of the following tasks is not performed by a data scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 44. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 46. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
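A minimal sketch of how `subplot(nrows, ncols, index)` places an axes, assuming matplotlib is installed. The `Agg` backend is chosen here only so the snippet runs without a display; that choice is an assumption of this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt

fig = plt.figure()

# 2 rows, 3 columns; cells are numbered row by row starting at 1,
# so index 3 is the last cell of the first row (top right)
ax = plt.subplot(2, 3, 3)
ax.set_title("top right")

spec = ax.get_subplotspec()
print(spec.rowspan.start, spec.colspan.start)
```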
Q.no 47. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
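The table-wise application asked about above can be sketched with `DataFrame.pipe`, assuming pandas is installed. The function `add_bonus` and the column `score` are hypothetical names used only for illustration.

```python
import pandas as pd

def add_bonus(df, amount):
    """Custom operation applied to the whole DataFrame (hypothetical helper)."""
    out = df.copy()
    out["score"] = out["score"] + amount
    return out

df = pd.DataFrame({"score": [10, 20]})

# pipe passes the DataFrame as the first argument of the given function
result = df.pipe(add_bonus, amount=5)

print(result)
```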
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 49. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 50. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
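The numbering used by `subplot(nrows, ncols, plot_number)` can be reasoned about without plotting anything. The helper `cell_of` below is a hypothetical illustration in plain Python, not a matplotlib function.

```python
def cell_of(nrows, ncols, plot_number):
    """Map a 1-based subplot number to a 0-based (row, column) pair."""
    if not 1 <= plot_number <= nrows * ncols:
        raise ValueError("plot_number must lie between 1 and nrows*ncols")
    # cells are counted left to right, then top to bottom
    return divmod(plot_number - 1, ncols)

# subplot(4, 3, 5): a grid of 4 rows and 3 columns, fifth cell
print(cell_of(4, 3, 5))
```

The same arithmetic shows why the plot number can range only from 1 to nrows*ncols.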
Q.no 51. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 52. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 54. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - Bottom
B : Bottom - to - Top
C : Left - to - Right
D : Right - to - Left
Q.no 57. Catalog design is a complex process where the selection of items in a
business's catalog is often designed to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
B : YouTube data
D : Sensor data
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 7. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
Q.no 8. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : imsave()
B : imread()
C : read()
D : None of these
Q.no 10. NumPy supports this function to find the trigonometric sine elementwise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 12. In a NumPy array, array indices always start from --------
A : 1
B : -1
C : 0
D : 2
Q.no 13. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 14. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : Classification
B : Regression
C : Clustering
D : Association
Q.no 16. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 20. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 21. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
A : NoSQL data
B : YouTube data
C : Text File data
Q.no 23. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 24. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 25. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 26. The ------------ step is performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
Q.no 27. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 28. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
Q.no 30. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 31. A ---------- is one of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 33. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 35. ----------- analysis finds the reasons behind success or failure in the past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 38. Broadcasting is a powerful technique that allows numpy to work with
arrays of ------------- .
A : Same Shapes
B : Different Shapes
C : Same values
D : Different values
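Broadcasting can be seen by adding a 1-D row to a 2-D matrix. This is a minimal sketch assuming NumPy is installed; the arrays are sample data.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# NumPy stretches the smaller array across the rows of the larger one
result = matrix + row

print(result.shape)
print(result)
```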
Q.no 39. If scatter diagram is drawn and all scatter points lie on a straight line
then it indicates-------
A : No correlation
B : Perfect correlation
C : Regression
D : Skewness
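When every point lies exactly on a straight line with positive slope, the correlation coefficient is 1. The sketch below assumes NumPy; the data are invented so that `y` is an exact linear function of `x`.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1          # every point lies on the line y = 2x + 1

# corrcoef returns the 2 x 2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]

print(r)
```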
Q.no 40. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 41. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 45. Catalog design is a complex process where the selection of items in a
business's catalog is often designed to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
D : Support vector machine
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - Bottom
B : Bottom - to - Top
C : Left - to - Right
D : Right - to - Left
Q.no 47. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 48. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 49. To test the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for the train-test split?
Q.no 50. Which NumPy function is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
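Truncation drops the fractional part and moves toward zero, which differs from rounding. A small sketch, assuming NumPy is installed; the sample values are arbitrary.

```python
import numpy as np

values = np.array([-1.7, 1.7])

truncated = np.trunc(values)   # discards the fractional part
rounded = np.round(values)     # rounds to the nearest integer instead

print(truncated)
print(rounded)
```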
Q.no 51. When an increase or decrease in one variable has no impact on the
other variable, it is called ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 53. --------- is a technique that duplicates a smaller array so that its dimensionality
and size match those of a larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 54. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. Which of the following tasks is not performed by a data scientist?
C : Challenge results
D : Staff Recruitment
Q.no 57. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
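Row iteration can be sketched as follows, assuming pandas is installed. The frame is made-up sample data. `iteritems()` is left out of the example on purpose: it iterated over columns and was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# iterrows yields (index, Series) pairs, one per row
from_rows = [(idx, row["score"]) for idx, row in df.iterrows()]

# itertuples yields one namedtuple per row; columns become attributes
from_tuples = [t.score for t in df.itertuples()]

print(from_rows)
print(from_tuples)
```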
Q.no 58. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
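A quick sketch of log-spaced values, assuming NumPy is installed. The start and stop arguments are exponents of base 10 by default.

```python
import numpy as np

# 4 samples from 10**0 to 10**3, evenly spaced on a log scale
samples = np.logspace(0, 3, num=4)

print(samples)
```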
Q.no 59. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
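Pairwise distances can be computed with `cdist` from `scipy.spatial.distance`. The sketch assumes SciPy and NumPy are installed; the two points are chosen so that the Euclidean distance between them is 5.

```python
import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[0.0, 0.0],
                   [3.0, 4.0]])

# cdist(A, B) returns the distance between every row of A and every row of B;
# passing the same set twice gives all pairs within that set
distances = cdist(points, points)

print(distances)
```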
Q.no 60. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 2. The procedure to organize items of a given collection into groups based on
some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 3. ------------- is fundamental library used for scientific computing
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 4. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 6. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
Q.no 8. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : Frrom_csv()
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Bayes Theorem
B : Pythagorean Theorem
Q.no 11. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 12. If number of input features are 3 then optimal hyperplane in support
vector machine is -------------
A : Single point
B : Line
C : 2-D Plane
Q.no 13. The ---------------- method of a DataFrame returns the first n rows of the DataFrame.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 14. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 15. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 16. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
Q.no 17. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 18. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
Q.no 21. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 22. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 23. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
Q.no 24. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 25. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : 3
B : 5
C : 1
D : 10
C : Measures growth
Q.no 29. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : class distribution
B : test on an attribute
D : class labels
Q.no 31. ----------- analysis finds the reasons behind success or failure in the past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 32. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 33. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
Q.no 34. ------------ is a 2-D data structure defined in pandas in which data is arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
A : NoSQL data
B : YouTube data
Q.no 36. The ------------ step is performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 38. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 39. ------- is the basic data structure of pandas; it can be thought of as an SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 41. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 42. The ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : display()
B : head()
C : describe()
D : sort()
Q.no 44. Catalog design is a complex process where the selection of items in a
business's catalog is often designed to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 45. To test the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for the train-test split?
A : Train- 70%, Test - 30%
Q.no 46. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 48. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 49. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - Bottom
B : Bottom - to - Top
C : Left - to - Right
D : Right - to - Left
Q.no 50. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 51. Which of the following tasks is not performed by a data scientist?
C : Challenge results
D : Staff Recruitment
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 53. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 54. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 59. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 60. Determining the basic salary of an employee from a given qualification is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Answer for Question No 1. is b
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
Q.no 4. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : numpy.random.ran()
B : rank
C : random.fill()
D : numpy.fillrandom()
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 10. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 12. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 13. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
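The difference between label-based and integer-based indexing can be sketched as below, assuming pandas is installed. The index labels and column name are invented. In current pandas, `loc` and `iloc` are indexers used with square brackets rather than called with parentheses, and `ix` was removed in pandas 1.0.

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 80, 70]}, index=["x", "y", "z"])

by_label = df.loc["y", "score"]    # row label "y", column label "score"
by_position = df.iloc[1, 0]        # second row, first column (0-based positions)

print(by_label, by_position)
```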
Q.no 14. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
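The basic array attributes can be inspected directly. A minimal sketch, assuming NumPy is installed; the shape (2, 3, 4) is arbitrary.

```python
import numpy as np

a = np.zeros((2, 3, 4))

print(a.ndim)    # number of dimensions (axes)
print(a.shape)   # tuple with the size along each dimension
print(a.size)    # total number of elements
print(a.dtype)   # element type
```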
Q.no 15. In support vector machines, if there are 2 input features, then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 18. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 19. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item sets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 20. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 21. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 23. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
Q.no 25. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 27. What is the use of the following function? plt.xlabel("Total Marks")
C : Measures growth
Q.no 29. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 30. ----------- analysis finds the reasons behind success or failure in the past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 31. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 32. ---------- function is used to get the elementwise remainder of array division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
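The elementwise remainder can be checked with a short sketch, assuming NumPy is installed. The arrays are sample data.

```python
import numpy as np

x1 = np.array([7, 8, 9])
x2 = np.array([4, 4, 4])

# numpy.mod returns the remainder of x1 / x2, element by element
remainders = np.mod(x1, x2)

print(remainders)
```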
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 35. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
A : 3
B : 5
C : 1
D : 10
A : 1
B : -1
C : 0
D : 2
A : KNN
C : Regression
D : Cluster analysis
Q.no 39. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : XML data
B : YouTube data
Q.no 41. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 42. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 43. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 44. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. Determining the basic salary of an employee from a given qualification is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 48. --------- is a technique that duplicates a smaller array so that its dimensionality
and size match those of a larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 49. Which NumPy function is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 52. The ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 54. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 59. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 60. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 3. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : prod()
B : mult()
C : dot()
D : *
A : Single point
B : Line
C : 2-D Plane
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 7. ------ analysis answers questions like "How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 8. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 11. The leaf nodes in decision trees return the ---------
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 13. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 14. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : Sympy
D : Scipy
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
A : KNN
C : Regression
D : Decision Tree
Q.no 19. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : Frrom_csv()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Java
B : Ruby
C : R
D : None of these
Q.no 23. ------------ is a 2-D data structure defined in pandas in which data is arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
Q.no 24. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 25. Which of the following is not used for 2-D visualization?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 26. The -------- of a numpy array is a tuple of integers giving the size of the
array along each dimension.
A : axes
B : rank
C : shape
D : size
Q.no 27. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 28. --------- in a decision tree measures how much information a feature gives us
about the class
A : Information Gain
B : Posterior probability
C : Prior probability
D : probability
Q.no 29. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 30. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 31. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
Q.no 35. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 36. In matplotlib ------------- function groups smaller axes that can exist
together within a single figure.
A : subplot()
B : divide_figure()
C : add_fig()
D : group_fig()
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 39. ---------- function is used to add two numpy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 40. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft clustering
C : Medium clustering
D : Simple clustering
Q.no 41. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
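The `subplot(nrows, ncols, index)` convention can be inspected with a short sketch. The `Agg` backend is an assumption so that the code runs without a display. Panels are numbered from 1, row by row.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumed, for headless runs)
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.subplot(4, 3, 5)   # grid of 4 rows x 3 columns, panel number 5

spec = ax.get_subplotspec()
rows, cols = spec.get_gridspec().get_geometry()
# num1 is the zero-based panel index, so panel 5 is stored as 4
row_index, col_index = divmod(spec.num1, cols)
plt.close(fig)
```

Here `row_index` and `col_index` are zero-based, so panel 5 sits in the second row, middle column.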
Q.no 43. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 44. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
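A minimal sketch of log-spaced values, next to linearly spaced values for contrast. The endpoints and count are assumed for illustration.

```python
import numpy as np

# Exponents 0, 1, 2, 3 evenly spaced; values are 10**exponent (base 10 is the default)
log_values = np.logspace(0, 3, num=4)

# Linearly spaced values over the same range, for comparison
lin_values = np.linspace(1, 1000, num=4)
```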
Q.no 45. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 46. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 47. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
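A minimal sketch of how the arguments of `pandas.merge` change the result. The two small frames and their column names are assumed.

```python
import pandas as pd

# Illustrative frames that share the column "key"
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# on= names the join column; how= selects which keys survive
inner = pd.merge(left, right, on="key", how="inner")   # keys present in both frames
outer = pd.merge(left, right, on="key", how="outer")   # union of keys from both frames
```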
Q.no 49. --------- function performs the custom operations for the entire dataframe.
A : function()
B : subroutine()
C : routine()
D : pipe()
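A minimal sketch of applying a custom function to a whole DataFrame with `DataFrame.pipe`. The helper `add_constant` is hypothetical and exists only for this example.

```python
import pandas as pd

def add_constant(frame, amount):
    """Hypothetical helper: add the same amount to every cell."""
    return frame + amount

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# pipe passes the DataFrame as the first argument of the function
shifted = df.pipe(add_constant, 10)
```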
Q.no 50. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
Q.no 51. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 52. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 53. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 54. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 56. Which of the following algorithms is used in Economics, Finance, Biology
etc. to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 58. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 59. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Bayes Theorem
B : Pythagorean Theorem
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 3. ------------ rule mining is a technique to identify underlying relations
between different items.
A : Classification
B : Regression
C : Clustering
D : Association
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 5. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : 1
B : -1
C : 0
D : 2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 8. ---------- function is used to get the positive square root of a numpy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Un- Supervised
B : Supervised
C : Semi-supervised
D : group
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 11. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item sets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 12. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 13. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
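The four array constructors listed in the options can be compared in a short sketch. The shapes are assumed for illustration.

```python
import numpy as np

ones = np.ones((2, 3))      # 2-D array filled with 1.0
zeros = np.zeros((2, 3))    # 2-D array filled with 0.0
identity = np.eye(3)        # 3 x 3 array with 1.0 on the main diagonal
scratch = np.empty((2, 3))  # allocated but uninitialised: contents are arbitrary
```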
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 15. In support vector machines, if there are 2 input features then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : ndarray
B : spatial
C : ndimage
D : special
Q.no 17. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 18. NumPy supports this function to find the trigonometric sine elementwise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 19. The procedure to organize items of a given collection into groups based
on some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
A : save image
B : read image
C : copy image
D : show image
Q.no 21. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 22. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 23. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 25. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 28. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 29. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
A : class distribution
B : test on an attribute
Q.no 32. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 33. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 34. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 35. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 36. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 37. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : ndimage
B : ndarray
C : signal
D : io
Q.no 41. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 42. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 43. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 48. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 49. To test the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is a
good proportion for the train-test split?
Q.no 50. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 51. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 52. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 53. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
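A minimal sketch of pairwise distances with `scipy.spatial.distance.cdist`. The sample points are assumed. `cdist` takes two collections of points, so passing the same set twice gives the distance between every pair within that set.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative 2-D points lying on one line
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Euclidean distance is the default metric; result is a 3 x 3 matrix
distances = cdist(points, points)
```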
Q.no 54. In this type of clustering, instead of putting each data point into a separate
cluster, a probability or likelihood of that data point being in each cluster is
assigned.
A : Hard clustering
B : Soft clustering
C : Medium clustering
D : Simple clustering
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 56. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 58. Catalog design is a complex process where the selection of items in a
business's catalog is often designed so that the items complement each other and buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 59. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 60. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Answer for Question No 1. is a
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : imsave()
B : imread()
C : read()
D : None of these
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 7. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item sets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : save image
B : read image
C : copy image
D : show image
Q.no 10. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
Q.no 13. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : imsave()
B : imread()
C : save()
D : isave()
Q.no 16. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. ------- answers the question "What will happen in future?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. The ---------------- method of a dataframe returns the first n rows of the dataframe.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
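A minimal sketch of looking at the first and last rows of a DataFrame. The single-column frame is assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})   # illustrative frame with rows 0..9

top = df.head(3)      # first three rows
bottom = df.tail(2)   # last two rows
```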
Q.no 19. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 21. -------- uses a tree structure to specify a sequence of decisions and
consequences.
A : KNN
B : Naïve Bayes
C : Regression
D : Decision Tree
Q.no 22. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 23. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 24. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 25. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
Q.no 28. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 29. Among the following clustering algorithm types in which of the following
type the notion of similarity is derived by the closeness of a data point to the
centroid of the clusters.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 32. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 35. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 37. Support(B) =
Q.no 38. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 39. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
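Support and confidence, as used in the association rule questions of this paper, can be computed by hand in a short sketch. The four transactions and the helper names `support` and `confidence` are assumptions made up for this example.

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when antecedent occurs."""
    return support(antecedent | consequent) / support(antecedent)

s = support({"bread", "milk"})         # 2 of 4 transactions
c = confidence({"bread"}, {"milk"})    # 2 of the 3 bread transactions
```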
Q.no 40. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 42. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 44. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 45. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 46. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 48. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 49. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 50. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 51. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 53. --------- function performs the custom operations for the entire dataframe.
A : function()
B : subroutine()
C : routine()
D : pipe()
Q.no 54. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 55. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
Q.no 56. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 57. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 58. Which of the following algorithms is used in Economics, Finance, Biology
etc. to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 60. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Answer for Question No 1. is a
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 3. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 4. ------------ plots show all individual data points without connecting
them with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : 1
B : -1
C : 0
D : 2
Q.no 8. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 10. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Bayes Theorem
B : Pythagorean Theorem
Q.no 12. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 13. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : imsave()
B : imread()
C : save()
D : isave()
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 16. ---------------- library from python provides efficient versions of a large
number of machine learning algorithms.
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 18. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. ---------------- is about developing code to enable the machine to learn to
perform tasks and its basic principle is the automatic modeling of underlying that
have generated the collected data.
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 21. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
B : Support
C : Confidence
D : lift
A : 3
B : 5
C : 1
D : 10
A : ndimage
B : ndarray
C : signal
D : io
Q.no 25. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 26. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 28. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 29. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 31. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 32. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 33. Among the following clustering algorithm types in which of the following
type the notion of similarity is derived by the closeness of a data point to the
centroid of the clusters.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : 0
B : -1
C : 1
D : -2
Q.no 35. ------- changes the arrangement of items in an array so that the shape of the
array changes while maintaining the same number of dimensions.
A : numpy.reshape()
B : numpy.empty()
C : numpy.flatten()
D : numpy.ravel()
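A minimal sketch of rearranging an array without changing its number of dimensions, with flattening shown for contrast. The array contents are assumed.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # illustrative 2 x 3 array: [[0, 1, 2], [3, 4, 5]]

b = np.reshape(a, (3, 2))        # same six items, new 3 x 2 shape, still 2-D
flat = a.flatten()               # 1-D copy of the same items
```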
B : YouTube data
A : KNN
C : Decision trees
D : Cluster analysis
A : XML data
B : YouTube data
A : class distribution
B : test on an attribute
D : class labels
Q.no 41. Which of the following algorithms is used in Economics, Finance, Biology
etc. to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 44. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 47. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 48. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 49. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 51. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 52. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 54. ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business but not with competitors.
A : Decision tree
C : Clustering
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 57. To determine the basic salary of an employee when his qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 58. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 59. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Answer for Question No 1. is a
23 What are the five V’s of Big Data?
a : Volume
b : Velocity
c : Variety
d : All of the above
Ans : d
24 _________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
a : Scalding
b : Cascalog
c : Hcatalog
d : Hcalding
Ans : b
25 What are the main components of Big Data?
a : MapReduce
b : HDFS
c : YARN
d : All of these
Ans : d
26 What are the different features of Big Data Analytics?
a : Open-Source
b : Scalability
c : Data Recovery
d : All the above
Ans : d
27 Define the Port Numbers for NameNode, Task Tracker and Job Tracker.
a : NameNode
b : Task Tracker
c : Job Tracker
d : All of the above
Ans : d
28 Facebook Tackles Big Data With _______ based on Hadoop
a : Project Prism
b : Prism
c : ProjectData
d : ProjectBid
Ans : a
38 Heuristic is
a : A set of databases from different vendors, possibly using different database paradigms
b : An approach to a problem that is not guaranteed to work but performs well in most cases
c : Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d : None of these
Ans : b
39 In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences.
a : Web services
b : customer-facing
c : client/server
d : personalization
Ans : d
40 Heterogeneous databases referred to
a : A set of databases from different vendors, possibly using different database paradigms
b : An approach to a problem that is not guaranteed to work but performs well in most cases.
c : Information that is hidden in a database and that cannot be recovered by a simple SQL query.
d : None of these
Ans : a
UNIT TWO SUB : 410243 DA
Sr. No. Questions a b c d Ans
1 Movie Recommendation systems are an example of:
a : Classification
b : Clustering
c : Reinforcement Learning
d : Regression
Ans : b,c
3 What is the minimum no. of variables/ features required to perform clustering?
a : 0
b : 1
c : 2
d : 3
Ans : b
27 Algorithm is
a : It uses machine-learning techniques. Here program can learn from past experience and adapt themselves to new situations
b : Computational procedure that takes some value as input and produces some value as output
c : Science of making machines performs tasks that would require intelligence when performed by humans
d : None of these
Ans : b
28 Bias is
a : A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
b : Any mechanism employed by a learning system to constrain the search space of a hypothesis
c : An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
d : None of these
Ans : b
29 Classification is
a : A subdivision of a set of examples into a number of classes
b : A measure of the accuracy, of the classification of a concept that is given by a certain theory
c : The task of assigning a classification to a set of examples
d : None of these
Ans : a
30 Binary attribute are
a : This takes only two values. In general, these values will be 0 and 1 and they can be coded as one bit
b : The natural environment of a certain species
c : Systems that can be used without knowledge of internal operations
d : None of these
Ans : a
32 Cluster is
a : Group of similar objects that differ significantly from other objects
b : Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
c : Symbolic representation of facts or ideas from which information can potentially be extracted
d : None of these
Ans : a
33 A definition of a concept is ----- if it recognizes all the instances of that concept
a : Complete
b : Consistent
c : Constant
d : None of these
Ans : a
34 A definition or a concept is ------------- if it classifies any examples as coming within the concept
a : Complete
b : Consistent
c : Constant
d : None of these
Ans : b
38 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
b. The process of extracting implicit, previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
24 Which Association Rule would you prefer
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence
Ans: c
27 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are
a. Undefined
b. Not frequent
c. Frequent
d. Can not say
Ans: c
28 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398
Ans: b
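The arithmetic behind answer b can be checked with Bayes' rule. The following is only a sketch: the function name `posterior` and its parameter names are illustrative assumptions, not part of the question bank.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) using Bayes' rule."""
    # Total probability of the evidence E over both cases (H and not H).
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# P(subscribes) = 0.03, P(car | subscribes) = 0.40, P(car | does not subscribe) = 0.30
p_subscribes_given_car = posterior(0.03, 0.40, 0.30)
print(round(p_subscribes_given_car, 4))  # 0.0396
```

The numerator is 0.40 × 0.03 = 0.012 and the denominator is 0.012 + 0.30 × 0.97 = 0.303, which gives roughly 0.0396.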
33 Classification rules are extracted from _____________
a. decision tree
b. root node
c. branches
d. siblings
Ans: a
34 What does K refers in the K-Means algorithm which is a non-hierarchical clustering approach?
a. Complexity
b. Fixed value
c. No of iterations
d. number of clusters
Ans: d
35 If a linear regression model fits perfectly, i.e., train error is zero, then _____________________
a. Test error is also always zero
b. Test error is non zero
c. Couldn’t comment on Test error
d. Test error is equal to Train error
Ans: c
37 How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
a. 1
b. 2
c. 3
d. 4
Ans: b
12 How many terms are required for building a bayes model?
a. 1
b. 2
c. 3
d. 4
Ans: c
13 Where can the Bayes rule be used?
a. Solving queries
b. Increasing complexity
c. Decreasing complexity
d. Answering probabilistic query
Ans: d
21 Discovery is
a. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
b. The process of extracting implicit, previously unknown and potentially useful information from data
c. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
d. None of these
Ans: b
22 Classification task referred to
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy, of the classification of a concept that is given by a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
Ans: c
33 If the larger value is 60 and the smallest value is 40 and the number of classes is 5, then the class interval is
a. 20
b. 25
c. 4
d. 15
Ans: c
35 The classification method in which the upper and lower limit of interval is also in class interval itself is called ….
a. exclusive method
b. inclusive method
c. mid point method
d. None of these
Ans: b
36 Suppose there are 25 base classifiers. Each classifier has error rates of e = 0.35. Suppose you are using averaging as the ensemble technique. What is the probability that the ensemble of above 25 classifiers will make a wrong prediction? Note: all classifiers are independent of each other
a. 0.05
b. 0.06
c. 0.07
d. 0.08
Ans: b
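The value 0.06 can be reproduced with a binomial tail sum. The sketch below assumes a two-class problem in which the ensemble is wrong whenever a majority (13 or more of the 25) of the independent base classifiers is wrong. The question only says "averaging", so the majority-vote reading and the function name `ensemble_error` are both assumptions.

```python
from math import comb

def ensemble_error(n_classifiers, base_error):
    """Probability that more than half of n independent classifiers are wrong."""
    majority = n_classifiers // 2 + 1  # 13 when n_classifiers is 25
    return sum(
        comb(n_classifiers, i) * base_error ** i * (1 - base_error) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )

error_25 = ensemble_error(25, 0.35)
print(round(error_25, 2))
```

Under these assumptions the ensemble error is far below the 0.35 error of a single classifier, which is the point the question is making.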
37 The most widely used metrics and tools to assess a classification model are:
a. Confusion matrix
b. Cost-sensitive accuracy
c. Area under the ROC curve
d. All of Above
Ans: d
III. Patterns that exist in the data can be found more easily by
using a visualization
5 Point out the correct combination with regards to kind keyword for graph plotting.
a. ‘hist’ for histogram
b. ‘box’ for boxplot
c. ‘area’ for area plots
d. all of the mentioned
Ans: d
6 Which of the following value is provided by kind keyword for
barplot?
bar bar bar none of the
mentioned
a
7 You can create a scatter plot matrix using the __________ method in pandas.tools.plotting.
a. sca_matrix
b. scatter_matrix
c. DataFrame.plot
d. all of the mentioned
Ans: b
8 Plots may also be adorned with error bars or tables. True FALSE Cannot Tell All Above
a
9 Which of the following plots are often used for checking randomness in time series?
a. Autocausation
b. Autorank
c. Autocorrelation
d. none of the mentioned
Ans: c
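An autocorrelation plot charts the sample autocorrelation of a series against the lag; for a random series the values stay close to zero at every non-zero lag. As a rough illustration, the helper below computes one such value in plain Python. The function name `autocorrelation` and the alternating test series are assumptions made for this sketch.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum

# A strictly alternating series is clearly not random: lag 1 is strongly negative.
alternating = [1.0, -1.0] * 10
r1 = autocorrelation(alternating, 1)
r2 = autocorrelation(alternating, 2)
print(r1, r2)
```

Values this far from zero indicate structure in the series, which is what such a plot is used to detect.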
29 Information Visualization techniques are
a. Pie Chart
b. Scatterplot
c. Histogram
d. Area Chart
Ans: a
30 Which of the following is category of timeline?
a. Linear Timeline
b. Modular Timeline
c. Variant Timeline
d. ER Timeline
Ans: a
34 Information Visualization techniques are
a. Flow Chart
b. Time Line
c. DFD
d. All of above
Ans: d
35 Data visualization techniques are:
a. Flow Chart
b. Time Line
c. Pie Chart
d. None of these
Ans: c
36 Data visualization is related with…
a. Pictorial representations
b. numerical representation
c. numerical calculations
d. None of these
Ans: a
37 Which of the following follows interactive visualization approach?
a. Zoom+Pan
b. Focus+Context
c. Overview+Details
d. all of above
Ans: d
38 Which of the following are uses of data visualization?
a. See context of data
b. Clear data understanding
c. finding pattern in data
d. all of above
Ans: d
39 Which of the following specifies relationship amongst
variables?
Pie Chart Histogram Area Chart None of these
c
40 Which of the following specifies category Proportions? Pie Chart Scatter Plot Line Chart None of these
a
UNIT SUB : 410243 DA
SIX
Sr. No. Questions a b c d Ans
32 Which of the following is not a classification technique?
a. Logistic Regression
b. Random Forest
c. K-Means
d. Naïve Bayes
Ans: c
33 Which of the following are components of HIVE?
a. FLATTEN
b. Thrift Server
c. Muster
d. All of above
Ans: b
35 Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
a. MapReduce, Hive and HBase
b. MapReduce, MySQL and Google Apps
c. MapReduce, Hummer and Iguana
d. All of above
Ans: a
Sr. No. Objective Questions (MCQ /True or False / Fill up with Choices )
1. Which of the following is not an example of Social Media?
a. Twitter
b. Google
c. Insta
d. Youtube
2. By 2025, the volume of digital data will increase to
a. TB
b. YB
c. ZB
d. EB
3. For drawing insights for business, what is needed?
a. Collecting the data
b. Storing the data
c. Analysing the data
d. All the above
4. Does Facebook use "Big Data" to perform the concept of Flashback? Is this True or False?
a. TRUE
b. FALSE
5. The process of describing data that is huge and complex to store and process is known as
a. Analytics
b. Data mining
c. Big Data
d. Data Warehouse
6. Data generated from online transactions is one example of the volume of big data. Is this True or False?
a. TRUE
b. FALSE
7. Velocity is the speed at which the data is processed
a. TRUE
b. FALSE
8. ________ have a structure but cannot be stored in a database.
a. Structured
b. Semi-Structured
c. Unstructured
d. None of these
9. ________ refers to the ability to turn your data useful for business.
a. Velocity
b. Variety
c. Value
d. Volume
20. There is only one operation between Mapping and Reducing. Is it True or False?
a. TRUE
b. FALSE
30. ________ is a type of local Reducer that groups similar data from the map phase into identifiable sets.
a. MAPPER
b. REDUCER
c. COMBINER
d. PARTITIONER
31. While installing Hadoop, how many XML files are edited? List them.
i. core-site.xml
ii. hdfs-site.xml
iii. mapred-site.xml
iv. yarn-site.xml
32.
<?xml version="1.0"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>D:\hadoop\temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
33. Write the code for hdfs-site.xml ?
Sr. No. Objective Questions (MCQ /True or False / Fill up with Choices )
1. Movie Recommendation systems are an example of
1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression
a. 2 Only
b. 1 and 2
c. 1 and 3
d. 2 and 3
2. Sentiment Analysis is an example of
1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning
a. 1, 2 and 4
b. 1 and 3
c. 1, 2 and 3
d. 1 and 2
3. Can decision trees be used for performing clustering?
a. True
b. False
4. What is the minimum no. of variables/features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
5. For two runs of K-Means clustering, is it expected to get the same clustering results?
1. Yes
2. No
6. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
a. 1, 3 and 4
b. 1, 2 and 3
c. 1, 2 and 4
d. All of the above
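Two of these termination conditions (a fixed iteration cap and assignments that stop changing) can be seen in a minimal one-dimensional K-Means loop. This is a sketch only: the function `kmeans_1d`, the sample points and the starting centroids are hypothetical, and it uses the standard Lloyd iteration rather than any particular library.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data; returns (centroids, assignments, iterations)."""
    centroids = list(centroids)
    assignments = None
    for iteration in range(1, max_iter + 1):  # condition 1: fixed number of iterations
        new_assignments = [
            min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
            for x in points
        ]
        if new_assignments == assignments:  # condition 2: assignments no longer change
            return centroids, assignments, iteration
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [x for x, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = sum(members) / len(members)
    return centroids, assignments, max_iter

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
final_centroids, labels, n_iter = kmeans_1d(points, centroids=[0.0, 5.0])
print(final_centroids, labels, n_iter)
```

Once the assignments are stable the centroids cannot move either, so condition 3 holds at the same point; a threshold on RSS would be an additional check inside the loop.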
7. Which of the following algorithm is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
8. After performing K-Means Clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusion can be drawn from the dendrogram?
9.
1. 1
2. 2
3. 3
4. 4
10. In which of the following cases will K-Means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes
a. 1 and 2
b. 2 and 3
c. 2 and 4
d. 1, 2 and 4
11. The discrete variables and continuous variables are two types of
a. Open end classification
b. Time series classification
c. Qualitative classification
d. Quantitative classification
12. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
13. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
14. Classification task referred to
1. A subdivision of a set of examples into a number of classes
2. A measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
15. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. None of these
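Option 3 of the Euclidean distance question describes the formula d(p, q) = sqrt(sum of squared coordinate differences). A short sketch, with the helper name `euclidean_distance` and the sample points chosen purely for illustration:

```python
import math

def euclidean_distance(p, q):
    """Distance between two points of equal dimension via the Pythagorean theorem."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The legs 3 and 4 of a right triangle give a hypotenuse of 5.
d = euclidean_distance((0, 0), (3, 4))
print(d)  # 5.0
```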
16. ________ is good at handling missing data and supports both kinds of attributes (i.e. categorical and continuous attributes)
a. ID3.
b. C4.5.
c. CART.
d. Naïve Bayes.
17. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
a. Greedy Algorithms.
b. Divide and Conquer.
c. Backtracking.
d. Shortest Path Method.
18. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
a. TRUE
b. FALSE
19. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
a. ID3
b. Naïve Bayes classifiers
c. CART
d. None of these.
20. High entropy means that the partitions in classification are
a. Pure
b. Not pure
c. Useful
d. Useless
21. Which of the following statements about Naive Bayes is incorrect?
a. Attributes are equally important.
b. Attributes are statistically dependent of one another given the class value.
c. Attributes are statistically independent of one another given the class value.
d. Attributes can be nominal or numeric
22. The maximum value for entropy depends on the number of classes, so if we have 8 classes, what will be the max entropy?
a. Max Entropy is 1
b. Max Entropy is 2
c. Max Entropy is 3
d. Max Entropy is 4
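For the entropy question, the maximum is reached when all classes are equally likely, which gives log2(8) = 3 bits for 8 classes. The sketch below checks this numerically; the function name `entropy` is an assumption introduced for illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; classes with zero probability contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

max_entropy_8 = entropy([1 / 8] * 8)    # uniform over 8 classes
pure_partition = entropy([1.0])         # a single class: no uncertainty
print(max_entropy_8, pure_partition)
```

The second value also illustrates question 20: a pure partition has zero entropy, so high entropy indicates a partition that is not pure.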
23. John flies frequently and likes to upgrade his seat to first class. He has determined that if he checks in for his flight at least two hours early, the probability that he will get an upgrade is 0.75; otherwise, the probability that he will get an upgrade is 0.35. With his busy schedule, he checks in at least two hours before his flight only 40% of the time. Suppose John did not receive an upgrade on his most recent attempt. What is the probability that he did not arrive two hours early?
a. 0.892
b. 0.796
c. 0.685
d. 0.999
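The upgrade question is a direct application of Bayes' rule to the event "no upgrade". A minimal worked check follows; the variable names are illustrative assumptions.

```python
p_early = 0.40                 # checks in at least two hours early
p_upgrade_given_early = 0.75
p_upgrade_given_late = 0.35

# Joint probabilities of "no upgrade" with each check-in behaviour.
p_no_upgrade_and_late = (1 - p_upgrade_given_late) * (1 - p_early)   # 0.65 * 0.60
p_no_upgrade_and_early = (1 - p_upgrade_given_early) * p_early       # 0.25 * 0.40

p_late_given_no_upgrade = p_no_upgrade_and_late / (
    p_no_upgrade_and_late + p_no_upgrade_and_early
)
print(round(p_late_given_no_upgrade, 3))  # 0.796
```

This is 0.39 / 0.49, which matches option b.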
24. Point out the wrong statement.
a. k-nearest neighbor is same as k-means
b. k-means clustering is a method of vector quantization
c. k-means clustering aims to partition n observations into k clusters
d. none of the mentioned
25. Consider the following example “How we can divide set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time)". Is this:
1. Clustering
2. Classification
3. Regression
4. None of These
Sr. No. Objective Questions (MCQ /True or False / Fill up with Choices )
1. ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of These
2. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 & 2
3. If {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the confidence of rule {bread,eggs}→{milk} is
1. 0
2. 1
3. 2
4. 3
4. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
a. True
b. False
5. A high-confidence rule can sometimes be misleading because confidence does not consider support of the itemset in the rule consequent. Is this True?
a. Yes
b. No
6. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of These
Answer
A
MCQ No - 2
What are the main components of Big Data?
(A) MapReduce
(B) HDFS
(C) YARN
(D) All of these
Answer
D
MCQ No - 3
What are the different features of Big Data Analytics?
(A) Open-Source
(B) Scalability
(C) Data Recovery
(D) All the above
Answer
D
MCQ No - 4
According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like
Hadoop?
(A) Big data management and data mining
(B) Data warehousing and business intelligence
(C) Management of Hadoop clusters
(D) Collecting and storing unstructured data
Answer
A
MCQ No - 5
What are the four V’s of Big Data?
(A) Volume
(B) Velocity
(C) Variety
(D) All the above
Answer
D
(B) Real-time
(C) Java-based
Answer
B
MCQ No - 7
(B) Drill
(C) Oozie
Answer
A
MCQ No - 8
Answer
C
MCQ No - 9
Answer
B
MCQ No - 10
(B) Both data and cost effective ways to mine data to make business sense out of it
Answer
B
The new source of big data that will trigger a Big Data revolution in the
years to come is
(A) Business transactions
(D) RDBMS
Answer
C
MCQ No - 12
(B) Row
(C) Event
(D) Record
Answer
C
MCQ No - 13
Listed below are the three steps that are followed to deploy a Big Data
Solution except
(A) Data Ingestion
Answer
C
MCQ No - 14
Check below the best answer to "which industries employ the use of so-
called "Big Data" in their day to day operations?
(A) Weather forecasting
(B) Marketing
(C) Healthcare
Answer
D
MCQ No - 15
(B) False
Answer
A
MCQ No - 16
Answer
A
MCQ No - 17
(B) 1970
(C) 1998
(D) 2005
Answer
C
MCQ No - 18
(B) Unstructured
(C) Processed
(D) Semi-Structured
Answer
C
MCQ No - 19
(B) Ad targeting
(C) Scheduling optimization
Answer
D
MCQ No - 20
The feature of big data that refers to the quality of the stored data is
______
(A) Variety
(B) Volume
(C) Variability
(D) Veracity
Answer
D
ZEAL EDUCATION SOCIETY’S
ZEAL COLLEGE OF ENGINEERING AND RESEARCH
NARHE │PUNE -41 │ INDIA
DEPARTMENT OF COMPUTER ENGINEERING
UNIT-1
1) What is Big Data?
a) Huge amount of data
b) Small amount of data
c) Huge File
d) Big Storage
Ans: a
Explanation: It is Huge amount of data
2) According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a
Explanation: Big data management and data mining
3) What are the main components of Big Data?
a)MapReduce
b)HDFS
c)YARN
d)All of these
Ans: d
Explanation: All of these
4) The sources of Big Data are
a)Stock Exchange
b)Transport Data
c) Banking Data
d) All of the Above
Ans: d
Explanation:
5) Big Data Characteristics are:
a) Structured data
b) Semi-structured data
c) Quasi-structured data
d) All of the above
Ans: d
Explanation:
6) BI tends to provide reports, dashboards, and queries on business
Ans: a
Explanation:
12) Select from option which is not the phase of data analytics
a) model planning
b) testing
c) discovery
d) operationalize
Ans: b
Explanation:
13) Which phase of data analytics require more time to complete
a) Data preparation
b) model building
c) communicate results
d) Discovery
Ans: a
Explanation:
14) What is analytic sandbox?
a) Tool
b) Separate repository
c) data cleaning
d) Data conditioning
Ans: b
Explanation:
15) The person which provides analytic techniques and modeling is called as.
a) Data Engineer
b) Data scientist
c) Business user
d) Project manager
Ans: b
Explanation:
16) What is task of Project manager?
a) analytic modelling
b) Provide requirement
c) ensure meeting objectives
d) creates DB environment
Ans: c
Explanation:
17) Identifying Key Stakeholders this task is performed in which phase?
a) Data preparation
b) model building
c) Discovery
d) communicate results
Ans: c
Explanation:
18) ETL process is performed in which phase
a) Discovery
b) communicate results
c) model planning
d) Data preparation
Ans: d
Explanation:
19) How much data do data science teams prefer for analysis?
a) too little
b) average
c) more
d) more than average
Ans: c
Explanation:
20) Select the tool which is not used in the model planning phase
a) Data wrangler
b) R
c) SQL Analysis service
d) SAS/ACCESS
Ans: c
Explanation:
21) Determining whether reports and dashboards will be impacted and need to change is a task performed by
a) Project sponsor
b) BI Analyst
c) Data Engineer
d) Project manager
Ans: b
Explanation:
22) What is the need for the data analytics lifecycle?
a) Data cleaning
b) To solve Big data problems
c) Data conditioning
d) Data Exploration
Ans: b
Explanation:
23) How many phases are there in data analytic lifecycle?
a) 4
b) 5
c) 6
d) 7
Ans: c
24) The person with technical skills is called as?
a) Business user
b) Data Engineer
c) Data scientist
d) Project sponsor
Ans: b
25) What is outcome of Model building phase?
a) Analytic results
b) Quality data
c) Data
d) Potential resources
Ans: a
Pravin S.Patil
Subject Teacher
UNIT-II
1) A statement made about a population for testing purpose is called?
a) Statistic
b) Hypothesis
c) Level of Significance
d) Test-Statistic
Ans: b
Explanation:
2) A hypothesis that is tested for possible rejection under the assumption that it is true is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: a
Explanation:
3) A statement whose validity is tested on the basis of a sample is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: b
Explanation:
4) A hypothesis which defines the population distribution is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: c
Explanation:
5) If the null hypothesis is false then which of the following is accepted?
a) Null Hypothesis
b) Positive Hypothesis
c) Negative Hypothesis
d) Alternative Hypothesis.
Ans: d
Explanation:
b) β
c) α
d) 1-β
Ans: c
Explanation:
13) Alternative Hypothesis is also called as?
a) Composite hypothesis
b) Research Hypothesis
c) Simple Hypothesis
d) Null Hypothesis
Ans: b
Explanation:
14) Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: d
Explanation:
15) Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
Ans: c
Explanation:
16) Hierarchical clustering should be primarily used for exploration.
a) True
b) False
Ans: a
Explanation:
17) Which of the following function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
Ans: a
Explanation:
18) Which of the following clustering requires merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
Ans: b
Explanation:
a) True
b) False
Ans: a
20) Depending on acceptance and rejection of null hypothesis there are 2 types of
error produced
a) Type 1
b) Type 2
c) None of these
d) All of these
Ans: d
21) The power of a test can be defined as a possibility of …
a) Image Processing
b) Medical
c) Customer Segmentation
d) All of the above
Ans: d
Pravin S.Patil
Subject Teacher
Unit-I
Ans : C
Explanation: data in Peta bytes i.e. 10^15 byte size is called Big Data.
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are
Volume, Velocity, Variety, Veracity, Value
Ans : A
Explanation: Data which can be saved in tables are structured data like the transaction data of the
bank.
Ans : B
Explanation: BigData could be found in three forms: Structured, Unstructured and Semi-structured.
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
7. What percentage of the world’s total data has been created just within the past two years?
A. 80%
B. 85%
C. 90%
D. 95%
Ans : C
Explanation: 90% of the world’s total data has been created just within the past two years.
8) Which of the following step is performed by data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Ans: Data Cleansing
10. Communicative and collaborative is one among the key skill sets and behavioral characteristics of a
data scientist [True / False]?
a. True
b. False
Answer : a
11. ---------- are the sources of Bigdata [select all that apply]
I. Book
II. Facebook
III. Genome sequence
IV. Video Surveillance
Ans:
12. BI analyses the past data and make future predictions True/False ?
a. True
b. False
Answer : b
Ans: Phase 2 Data preparation is done in this phase. An analytical sandbox is used in this to perform
analytics for the entire duration of the project. While you explore, preprocess and condition data,
modeling follows suit. To get the data into the sandbox, you will perform ETLT (extract, transform, load
and transform).
A. Discovery
B. Model Planning
C. Model Building
D. Data Preparation
Phase 2 — Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team
can work with data and perform analytics for the duration of the project. The team needs to execute
extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox.
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
14. In which phase would the team expect to invest most of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
15. In which phase would the team expect to invest least time of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
16. from following tools which tool is used for Model building?
Ans B
17. from following tools which tool is used for Data preparation
a. Alpine Miner b. Excel c. Matlab d.Weka
Ans . A
18. To determine if the project was completed on time and within budget, is the key role of _____
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
A. 3
B. 6
C. 7
D. Any
20. In data Analytics life cycle we can move back and refine the work done. True or False
A. True
B. False
A. PPT
B.report
C. code
D. All of above
22. ________ provides subject matter expertise for analytical techniques, data modeling and applying
valid analytical techniques to give business problems.
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
Unit-II
(a) Hypothesis
(d) Test-statistic
Answer : a
2. Any hypothesis which is tested for the purpose of rejection under the assumption that it is true is
called:
Answer : a
3. A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is
false is called:
Answer : d
Answer : b
6. If the critical region is located equally in both sides of the sampling distribution of test-statistic, the
test is called:
Answer : b
Answer : b
Answer : a
10. A formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic
Answer : a
Answer : a
(a) Size of α
(b) Size of β
(c) Test-statistic
Answer : a
13. Student’s t-test is applicable only when:
Answer : a
14. In an unpaired samples t-test with sample sizes n1= 11 and n2= 11, the value of tabulated t should be
obtained for:
Answer : d
(a) To collect sample data and use them to formulate hypotheses about a population
(b) To draw conclusion about populations and then collect sample data to support the conclusions
(c) To draw conclusions about populations from sample data
Answer : c
16. The histogram to the right represents the hospital length of stay (in days) for patients at a nearby
medical facility. How many patients are included in the histogram?
a. 5
b. 21
c. 17
d. 9
Answer : b
17. Using the histogram to the right that represents the hospital lengths of stay (in days) for patients at a
nearby medical facility, determine the relationship between the mean and the median.
a. Mean = Median
b. Mean ≈ Median
Answer : d
18. The statement “If there is sufficient evidence to reject a null hypothesis at the 10%
significance level, then there is sufficient evidence to reject it at the 5% significance level” :
a. Always True
b. Never True
c. Sometimes True; the p-value for the statistical test needs to be provided for a conclusion
d. Not Enough Information; this would depend on the type of statistical test used
Answer : c
a) ANOV
b) AVA
c) ANOVA
d) ANVA
Ans:c
20) Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: defined distance metric, number of clusters, initial guess as to cluster centroids
25) Considering the K-means algorithm, after current iteration, we have 3 centroids (0, 1) (2, 1),
(-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?
a) Yes
b) No
Ans: Yes
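The answer can be verified by assigning each point to its nearest centroid under Euclidean distance. The helper `nearest_centroid` below is a hypothetical name used only for this sketch; it relies on `math.dist` from the Python standard library (3.8 or later).

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` under Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

centroids = [(0, 1), (2, 1), (-1, 2)]
cluster_a = nearest_centroid((2, 3), centroids)    # distances: about 2.83, 2.0, 3.16
cluster_b = nearest_centroid((2, 0.5), centroids)  # closest to (2, 1) at distance 0.5
same_cluster = cluster_a == cluster_b
print(cluster_a, cluster_b, same_cluster)
```

Both points are closest to the centroid (2, 1), so they fall in the same cluster in the next iteration.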
27) The most commonly used measure of similarity is the _____ or its square.
a)euclidean distance
b)city-block distance
c)Chebychev’s distance
d)Manhattan distance
Ans: euclidean distance
30) Clustering is a-
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. None
Ans: Unsupervised learning
31) Which of the following clustering algorithms suffers from the problem of convergence at local
optima?
A. K- Means clustering
B. Hierarchical clustering
C. Diverse clustering
D. All of the above
Ans: K- Means clustering, Hierarchical clustering, Diverse clustering
33) Which of the following is a bad characteristic of a dataset for clustering analysis-
A. Data points with outliers
B. Data points with different densities
C. Data points with non-convex shapes
D. All of the above
Ans: Data points with outliers, Data points with different densities, Data points with non-convex
shapes
34) For clustering, we do not require-
A. Labeled data
B. Unlabeled data
C. Numerical data
D. Categorical data
Ans: Labeled Data
a. Parametric
b. non parametric
c. Distributed
d. Normal
38. Input data for Wilcoxon test is normally distributed, True or False?
d. None of these
40 Which of the following test statistics is used in Wilcoxon Rank Sum Test?
d. none of these.
40. What must you include when applying Wilcoxon Rank sum test?
a. variance
b. Critical Value
c. Rank sum
e. standard deviation
a. False Positive
b. false negative
c. True Positive
d. True negative
a. False Positive
b. False negative
c. True Positive
d. True negative
ANOVA
a. Means
b. variance
c. standard Deviation
d. None of above.
b. F ratio
c. T-score
d. Chi Square
Q.25 What are the two types of variance which can occur in your data?
Q.26 If between group mean sum of square variability increases value of F statistics_____
a. Increases
b. Decreases
c. Neutral
d. None of these
Q.27 What must you include when applying ANOVA test?
a. Means
b. Critical Value
c. degree of freedom
d. F statistics
e. All of above
a.1
b.3
c.2
d.any
d.None of these
b.ANCOVA
c.MANOVA
d.ZANOVA
Unit-III
(A)Itemset
(B)Support
(C)Confidence
(D)Support Count
Ans:A
(A)Support
(B)Confidence
(C)Support Count
(D)Rules
Ans:C
3.An itemset whose support is greater than or equal to a minimum support threshold is ______
(A)Itemset
(B)Frequent Itemset
(C)Infrequent items
(D)Threshold values
Ans:B
(A)It mines all frequent patterns through pruning rules with lesser support
(B)It mines all frequent patterns through pruning rules with higher support
Ans:C
(B)Transaction Increases
(C)Sampling
(D)Cleaning
Ans:A
A) TRUE
B) FALSE
Ans:A
A) TRUE
B) FALSE
Ans:A
8.Which of the following methods do we use to find the best fit line for data in Linear
Regression?
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans:A
9. A local retailer has a database that stores 10,000 transactions of last summer. After analyzing the data, a data science team has identified the following statistics:
• {battery} appears in 6,000 transactions.
• {sunscreen} appears in 5,000 transactions.
• {sandals} appears in 4,000 transactions.
• {bowls} appears in 2,000 transactions.
• {battery, sunscreen} appears in 1,500 transactions.
• {battery, sandals} appears in 1,000 transactions.
• {battery, bowls} appears in 250 transactions.
• {battery, sunscreen, sandals} appears in 600 transactions.
Q) What are the confidence values of {battery}->{sunscreen} and {battery, sunscreen}->{sandals}?
a) 0.3 and 0.4
b) 0.25 and 0.4
c) 0.25 and 0.15
d) 0.6 and 0.4
Ans: b
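Confidence of a rule X -> Y is the count of transactions containing both X and Y divided by the count containing X. The sketch below applies that definition to the counts given in the question; the dictionary layout and the function name `confidence` are assumptions made for illustration.

```python
# Transaction counts taken from the question (only the itemsets needed here).
counts = {
    frozenset({"battery"}): 6000,
    frozenset({"battery", "sunscreen"}): 1500,
    frozenset({"battery", "sunscreen", "sandals"}): 600,
}

def confidence(antecedent, consequent, itemset_counts):
    """confidence(X -> Y) = count(X and Y together) / count(X)."""
    x = frozenset(antecedent)
    return itemset_counts[x | frozenset(consequent)] / itemset_counts[x]

conf_sunscreen = confidence({"battery"}, {"sunscreen"}, counts)
conf_sandals = confidence({"battery", "sunscreen"}, {"sandals"}, counts)
print(conf_sunscreen, conf_sandals)  # 0.25 0.4
```

That is 1,500 / 6,000 = 0.25 and 600 / 1,500 = 0.4, matching option b.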
a) Cor(X, Y) = 1
b) Cor(X, Y) = 0
c) Cor(X, Y) = 2
Ans:b
11. If a linear regression model fits perfectly, i.e., train error is zero, then
_____________________
Ans:C
12.Which of the following metrics can be used for evaluating regression models?
i) R Squared
iii) F Statistics
b) i and ii
Ans:d
13.How many coefficients do you need to estimate in a simple linear regression model (One
independent variable)?
a) 1
b) 2
c) 3
d) 4
Ans:b
14.In a simple linear regression model (One independent variable), If we change the input
variable by 1 unit. How much output variable will change?
a) by 1
b) no change
c) by intercept
d) by its slope
Ans:d
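Illustration for questions 13 and 14: a minimal least-squares sketch, assuming a simple linear regression y = intercept + slope * x. The data and the names `fit_line` and `predict` are hypothetical. It shows that two coefficients are estimated and that a 1-unit change in x changes the prediction by the slope.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data generated from y = 3 + 2x.
xs = [1, 2, 3, 4]
ys = [5, 7, 9, 11]
intercept, slope = fit_line(xs, ys)

def predict(x):
    return intercept + slope * x

change_per_unit = predict(6) - predict(5)
print(intercept, slope, change_per_unit)
```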
a) lm(formula, data)
b) lr(formula, data)
c) lrm(formula, data)
d) regression.linear(formula, data)
Ans:a
16.In syntax of linear model lm(formula,data,..), data refers to ______
a) Matrix
b) Vector
c) Array
d) List
Ans:d
17.In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to
__________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Ans:c
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
19.The square of the correlation coefficient, r^2, will always be non-negative and is called the
________
a) Regression
b) Coefficient of determination
c) KNN
d) Algorithm
Ans:b
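Illustration for question 19: a small sketch, assuming the Pearson correlation coefficient r, showing that r can be negative while r^2 (the coefficient of determination in simple linear regression) cannot. The data sets and the helper name `pearson_r` are assumptions for the example.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # perfectly decreasing line
r_noisy = pearson_r(xs, [2, 1, 4, 3, 5])       # increasing with scatter

print(r_negative, r_negative ** 2)
print(r_noisy, r_noisy ** 2)
```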
20.Predicting y for a value of x that’s outside the range of values we actually saw for x in the
original data is called ___________
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:b
21.What is predicting y for a value of x that is within the interval of points that we saw in the
original data called?
a) Regression
b) Extrapolation
c) Intrapolation
d) Polation
Ans:c
22. ________ is a simple approach to supervised learning. It assumes that the dependence of Y
on X1, X2, . . . Xp is linear.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
23.Although it may seem overly simplistic, _______ is extremely useful both conceptually and
practically.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
24. __________ refers to a group of techniques for fitting and studying the straight-line
relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
Ans: c
Data Processing and Analysis
Unit 4
1. What is a hypothesis?
Answer: a
a. True
b. False
Answer: a
a. Concurring
b. Coding
c. Colouring
d. Segmenting
Answer: b
4. What is the cyclical process of collecting and analysing data
during a single research study called?
a. Interim analysis
b. Inter analysis
c. Inter-item analysis
d. Constant analysis
Answer: a
a. Typology
b. Diagramming
c. Enumeration
d. Coding
Answer: c
a. Can reduce time required to analyse data (i.e., after the data are
transcribed)
b. Help in storing and organising data
c. Make many procedures available that are rarely done by hand
due to time constraints
d. All of the above
Answer: d
7. Boolean operators are words that are used to create logical
combinations.
a. True
b. False
Answer: a
a. Categories
b. Units
c. Individuals
d. None of the above
Answer: a
a. Segmenting
b. Coding
c. Transcription
d. Mnemoning
Answer: c
Answer: a
11. Hypothesis testing and estimation are both types of descriptive
statistics.
a. True
b. False
Answer: b
a. True
b. False
Answer: a
13. A graph that uses vertical bars to represent data is called a ___
Answer: b
14. ___________ are used when you want to visually examine the
relationship between two quantitative variables.
a. Bar graphs
b. Pie graphs
c. Line graphs
d. Scatterplots
Answer: d
15. The denominator (bottom) of the z-score formula is
Answer: a
a. Normal Distribution
b. Chi-Squared Distribution
c. Gamma Distribution
d. Poisson Distribution
Answer: b
a. Statistic
b. Hypothesis
c. Level of Significance
d. Test-Statistic
Answer: b
18. A hypothesis that is assumed to be true and is then tested for possible
rejection is called?
a. Null Hypothesis
b. Statistical Hypothesis
c. Simple Hypothesis
d. Composite Hypothesis
Answer: a
a. Null Hypothesis
b. Positive Hypothesis
c. Negative Hypothesis
d. Alternative Hypothesis.
Answer: d
a. Composite hypothesis
b. Research Hypothesis
c. Simple Hypothesis
d. Null Hypothesis
Answer: b
No. | marks | question | A | B | C | D | ans
0 | 1 | A group of 4 bits is also called? | Nibble | Byte | Kb | None | 4 bits make one nibble.
1 | 1 | There are how many types of Big Data: | 3 | 2 | 1 | None | Big Data is of 3 types.
2 | 1 | Which of the following are the V's of Big Data: | All | Volume | Variety | Velocity | This is an explanation.
3 | 1 | Which of these is not a characteristic of Big Data? | Storage | Volume | Variety | Velocity | This is an explanation.
4 | 2 | Which of the following is a drawback of Big Data: | Cost | Significant | Process | Fraud Detection | Big Data requires high cost to maintain a huge amount of data.
5 | 2 | Full form of GINA is: | Global Innovation Network and Analysis | Global Invention in Networks and Analytics | Globally Investment in Neurons and Analytics | None | GINA stands for Global Innovation Network and Analysis.
6 | 2 | Which is phase 3 in the Data Analytics Life Cycle? | Model Planning | Model Building | Data Preparation | Operationalize | Model Planning is the 3rd phase in the life cycle.
7 | 2 | GINA team thought to accomplish mainly ____ goals: | 3 | 2 | 1 | 5 | GINA targeted to achieve three goals for the project.
8 | 2 | The Data Preparation stage doesn’t involve: | Analysis | Collection | Cleansing | Processing | This is an explanation.
9 | 2 | Unstructured Data is further divided into how many types? | 2 | 3 | 4 | 5 | Unstructured data is divided into 2 types.
10 | 2 | The GINA team mainly used which software tool to analyze the data? | Tableau | Hadoop | HIVE | SQL | The team used Tableau to visualize the data.
11 | 2 | Which of the following is the first step of the Data Analytics Life Cycle: | Discovery | Data Preparation | Model Planning | Data Aware | This is an explanation.
12 | 2 | There are how many phases in the data analytics life cycle: | 6 | 5 | 4 | 7 | There are 6 stages in the data analytics life cycle.
13 | 2 | SEMMA Methodology has how many stages: | 5 | 4 | 6 | 7 | SEMMA methodology has five stages.
14 | 2 | Which phase of the Life Cycle requires collaboration with stakeholders? | Phase 5 | Phase 6 | Phase 4 | Phase 3 | Phase 5 involves collaboration with stakeholders.
15 | 2 | In Building a Model, how many phases are required: | 2 | 3 | 4 | 5 | This is an explanation.
How much Data in the whole Only 20% of world's total data is
16 2 0.2 0.4 0.6 0.5
world is structured: structured.
10^7 bytes of memory is equal
17 2 1ZB 1TB 1YB 1XB 10^7 B is equal to 1 ZB.
to:
Data Scientists in the GINA
NLP technique was used on the
team used which technique on Natural Language
18 2 Hadoop HIVE SQL description of Innovation
the textual Description of the Processing(NLP)
Roadmap Idea.
Innovation Roadmap Idea.
How many types of data Two types of data anlytical
19 2 analytics methodologies are 2 4 3 6 methodologies are there. EDA
there? and CDA
Bell Curve is also known as
20 3 Other name for Bell Curve is: Normal Distribution. Poisson Distribution Bionomial Distribution Bernoulli Distribution.
normal distribution.
21 | 3 | One of the most important tasks in big data analytics is: | Statistical Modeling | Testing of Data | Visualization | Operationalize | One of the most important tasks in big data analytics is statistical modeling.
22 | 3 | Some of the approaches considered for building the data analytics lifecycle framework best practices are: | All | CRISP-DM | SEMMA | MAD Skills | This is an explanation.
23 | 3 | In Phase 4, the team develops datasets for: | All | Testing of Data | Training of Data | Production purposes | This is an explanation.
24 | 3 | Full form of CRISP-DM Methodology is: | Cross Industry Standard Process for Data Mining | Cross International Standard Process for Data Modeling | Company's Initial Standards Progress for Data Methods | Common Industry Standard Program for Data Mining | CRISP-DM stands for Cross Industry Standard Process for Data Mining.
25 | 3 | SEMMA Methodology doesn’t include which of the following stages: | Evaluate | Sample | Explore | Assess | This is an explanation.
26 | 3 | In which stage is the data monitored and analyzed to see if the generated model is creating the expected results? | Operationalize | Collection | Plan Model | Data Aware | In the last phase, i.e. Operationalize, data is monitored and analyzed to see if the generated model is creating the expected results.
27 | 3 | Data is captured in how many ways: | 3 | 4 | 5 | 6 | Data is captured in 3 main ways.
28 | 3 | In phase 2 of the Data Analytics Life Cycle, the team performs how many analytics to get the data in the sandbox? | 3 | 2 | 4 | 6 | The team performs ETL, ELT and ETLT in the 2nd phase of the cycle.
29 | 3 | The total area under the bell curve is ____ unit. | 1 | 2 | 3 | 4 | Area under the bell curve is 1 unit.
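Worked check for question 29: a numerical sketch, assuming the bell curve is the standard normal density. It integrates the density with the trapezoidal rule over [-8, 8], a range chosen for the example because the tails beyond it are negligible. The function names are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (Gaussian) distribution."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def area_under_curve(f, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

area = area_under_curve(normal_pdf, -8.0, 8.0)
print(area)
```

The result agrees with 1 to within numerical error, consistent with the total probability being 1.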
30 | 1 | Wilcoxon rank-sum test is also known as? | Mann-Whitney U test | Mean Difference | Alternative Hypothesis | Null Hypothesis | Wilcoxon rank-sum test is also called Mann-Whitney U test.
31 | 1 | Which test is also known as T-test? | Hypothesis Test | Mean Difference | K-means test | None | This is an explanation.
32 | 1 | This equation is of which test? | Mean Difference | K-Means | Null Hypothesis | Alternative Hypothesis | This equation is of the mean difference test.
33 | 1 | A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called ___________. | One-tailed test | Two-tailed test | Tailed test | Null test | A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test.
34 | 1 | How many types of Statistical Hypothesis are there? | 2 | 3 | 4 | 6 | There are two types of Statistical Hypothesis.
35 | 1 | Analysis of Variance is also referred to as? | ANOVA | Mean Difference | Alternative Hypothesis | Null Hypothesis | ANOVA stands for Analysis of Variance.
36 | 1 | How many steps are involved in Hypothesis Testing? | 4 | 2 | 3 | 5 | There are 4 steps in Hypothesis testing.
37 | 2 | The strength of evidence in support of a null hypothesis is measured by? | P-value | K-value | H-value | Null-value | The strength of evidence in support of a null hypothesis is measured by the P-value.
38 | 2 | Difference in means is also called? | Two sample t-test | T-test | M-test | Two sample test | Difference in means is also known as two sample t-test.
39 | 2 | The k-medoids is also called _______________ algorithm. | Partitioning Around Medoids (PAM) | Lloyd's Algorithm | Poisson's Algorithm | Regression | The k-medoids is also called partitioning around medoids (PAM) algorithm.
40 | 2 | Clustering is an example of ____? | Unsupervised Learning | Supervised Learning | Classification | Regression | Clustering is an example of unsupervised learning.
41 | 2 | Which of the following is not an advantage of K-means Clustering? | Requires a priori | Fast | Robust | Easy to evaluate | This is an explanation.
42 | 2 | The probability of committing a Type II error is called | Beta | Alpha | Delta | Theta | The probability of committing a Type II error is called Beta.
43 | 2 | The ______ variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster. | Less | More | Variable | Fixed | The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.
44 | 2 | Which hypothesis is usually the hypothesis that sample observations result purely from chance? | Null Hypothesis | Mean Difference | K-means test | Alternative Hypothesis | Null Hypothesis is usually the hypothesis that sample observations result purely from chance.
45 | 2 | "Classical" ANOVA for balanced data does how many things at once? | 3 | 2 | 1 | 4 | "Classical" ANOVA for balanced data does three things at once.
46 | 2 | K-means clustering is used to solve which problems? | NP-hard problems | NP Problems | Hypothesis Problems | P problems | NP-hard problems are solved using K-means clustering.
47 | 2 | The probability of committing a Type I error is called? | Alpha | Beta | Gamma | Delta | The probability of committing a Type I error is called alpha.
48 | 2 | K-means Clustering is also known as? | Lloyd's Algorithm | Gaussian Algorithm | Poisson's Algorithm | None | K-means clustering is also called Lloyd's algorithm.
49 | 3 | Which algorithm requires the user to specify the number of clusters k to be generated? | K-means clustering | Gaussian Algorithm | Alternative Hypothesis | Null Hypothesis | K-means clustering requires the user to specify the number of clusters k to be generated.
50 | 3 | K-means clustering uses which approach to solve the problems? | Expectation-maximization | Greedy Approach | Divide and Conquer | None | Expectation-maximization technique is used by K-means clustering.
51 | 3 | How many factors affect the power of a hypothesis test? | 3 | 2 | 1 | 4 | The power of a hypothesis test is affected by three factors.
52 | 3 | Law of Variance is called? | Eve's Law | Laplace Law | Poisson's Algorithm | Regression | Law of variance is also called Eve's law.
53 | 3 | K-Medoids use which approach to solve problems? | Greedy Approach | Divide and Conquer | Recursive | None | K-Medoids use greedy approach to solve problems.
54 | 3 | The time complexity of K-means clustering is? | O(n^2) | O(nlogn) | O(n) | O(1) | Time complexity of K-means clustering is O(n^2).
55 | 3 | The number (k) of clusters assumed in k-medoids is known as? | Priori | Null Hypothesis | ANOVA | Effect size | The number k of clusters is assumed known a priori.
56 | 3 | What is the difference between the true value and the value specified in the null hypothesis? | Effect size | Null Hypothesis | Alternative Hypothesis | ANOVA | The effect size is the difference between the true value and the value specified in the null hypothesis.
57 | 3 | Time complexity of k-medoids is? | O(n^2) | O(nlogn) | O(n) | O(n^3) | This is an explanation.
58 | 3 | Which algorithm aims at minimizing an objective function known as squared error function? | K-means | Mean Difference | Alternative Hypothesis | ANOVA | K-means algorithm aims at minimizing an objective function known as squared error function.
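Illustration for questions 48, 49 and 58: a minimal sketch of Lloyd's algorithm on one-dimensional data, assuming the user supplies the number of clusters through the initial centroids. The data, the starting centroids, and the name `kmeans_1d` are hypothetical. It alternates an assignment step and an update step and reports the squared error objective.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm for 1-D points; returns (centroids, squared error)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # Squared error objective: distance of each point to its nearest centroid.
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, sse = kmeans_1d(points, centroids=[0.0, 5.0])  # k = 2 chosen by the user
print(centroids, sse)
```

A fixed iteration count is used here for brevity; a fuller implementation would stop when the assignments no longer change.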
59 | 1 | Which algorithm was the earliest of the association rule algorithms? | Apriori Algorithm | Gaussian Algorithm | K-means clustering | Bernoulli Distribution | Apriori Algorithm was the earliest of the association rule algorithms.
60 | 1 | The Apriori algorithm takes a ______ iterative approach to uncovering the frequent itemsets by first determining all the possible items. | Bottom-Up | Top-Down | Recursive | None | The Apriori algorithm takes a bottom-up iterative approach to uncovering the frequent itemsets by first determining all the possible items.
61 | 1 | Apriori uses which structure to count candidate item sets efficiently? | BFS | DFS | Queue | Stack | Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently.
62 | 1 | "y=a+b*x^2". This equation shows which regression? | Polynomial Regression | Logistic Regression | Linear Regression | Lasso Regression | This is an explanation.
63 | 2 | __________ is defined as the measure of certainty or trustworthiness associated with each discovered rule. | Confidence | Recursion | Item-set | None | Confidence is defined as the measure of certainty or trustworthiness associated with each discovered rule.
64 | 2 | In which Regression do we predict the value by 1 or 0? | Logistic Regression | Linear Regression | Both | None | In Logistic Regression, we predict the value by 1 or 0.
65 | 2 | The formula for linear regression is: | Y’ = bX + A | Y’ = bX - A | Y’ = bX / A | Y’ = bX * A | The formula for linear regression is: Y’ = bX + A.
66 | 2 | Which regression is useful when there are a large number of independent variables? | Partial Least Squares (PLS) Regression | Cox Regression | Lasso Regression | Logistic Regression | PLS regression is also useful when there are a large number of independent variables.
67 | 2 | Which regression is an approach for predicting a response using a single feature? | Linear Regression | Logistic Regression | ElasticNet Regression | None | Simple linear regression is an approach for predicting a response using a single feature.
68 | 2 | Association rule mining consists of _______ steps. | 2 | 3 | 4 | 5 | Association rule mining consists of 2 steps.
69 | 2 | Which type of regression is suitable when the dependent variable is ordinal in nature? | Ordinal Regression | Linear Regression | Cox Regression | Logistic Regression | Ordinal regression is suitable when the dependent variable is ordinal in nature.
70 | 2 | Which regression is used for support vector machines? | ElasticNet Regression | Linear Regression | Logistic Regression | None | ElasticNet regression is used for support vector machines.
71 | 2 | Which regression can solve both linear and non-linear models? | Support Vector Regression | Linear Regression | Logistic Regression | ElasticNet Regression | Support Vector Regression can solve both linear and non-linear models.
72 | 2 | Which is the most common method used for fitting a regression line? | Least Square Method | Mean Difference | Null Hypothesis | Classification | Least Square Method is the most common method used for fitting a regression line.
73 | 2 | _______ problems are when the output variable is a real or continuous value. | Regression | Classification | Recursive | Hypothesis | A regression problem is when the output variable is a real or continuous value.
74 | 2 | Linear Regression is a machine learning algorithm based on ______ learning regression model. | Supervised Learning | Unsupervised Learning | Recursive Learning | All | Linear Regression is a machine learning algorithm based on supervised learning.
75 | 2 | When the dependent variable's variability is not equal across values of an independent variable, it is called | Heteroscedasticity | Homoscedasticity | Multicollinearity | Outliers | When the dependent variable's variability is not equal across values of an independent variable, it is called heteroscedasticity.
76 | 2 | _________ requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares. | Logistic Regression | Linear Regression | Lasso Regression | ElasticNet Regression | Logistic Regression requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.
77 | 2 | PCR Regression is divided into how many steps? | 2 | 3 | 4 | 5 | PCR regression is divided into 2 steps.
78 | 3 | L2 regularization is also called? | Tikhonov Regularization | Norm Regularization | Poisson's Regularization | None | This is an explanation.
79 | 3 | When the variance of count data is greater than the mean count, it is a case of? | Overdispersion | Underdispersion | Dispersion | High dispersion | When the variance of count data is greater than the mean count, it is a case of overdispersion.
80 | 3 | Which regression assumes the normal distribution of the dependent variable? | Linear Regression | Logistic Regression | ElasticNet Regression | None | Linear regression assumes the normal or Gaussian distribution of the dependent variable.
81 | 3 | Nature of predicted data in regression is? | Ordered | Unordered | Both | None | Nature of predicted data in regression is ordered.
82 | 3 | Which regression uses a binary dependent variable but ignores the timing of events? | Logistic Regression | Linear Regression | Cox Regression | Lasso Regression | Logistic regression uses a binary dependent variable but ignores the timing of events.
83 | 3 | The Ridge Regression is also known as? | Shrinkage Regression | Percentile Regression | ElasticNet Regression | Lasso Regression | The ridge regression is also known as Shrinkage Regression.
84 | 3 | In which regression do we calculate Root Mean Square Error (RMSE) to predict the next weight value? | Linear Regression | ElasticNet Regression | Logistic Regression | All | In Linear Regression we calculate Root Mean Square Error (RMSE) to predict the next weight value.
85 | 3 | The ______ is the standard deviation of the observed residuals. | Residual standard error | Mean Difference Error | Data Error | All | The residual standard error is the standard deviation of the observed residuals.
86 | 3 | Which Regression is used when the dependent variable has count data? | Poisson Regression | Linear Regression | Cox Regression | Lasso Regression | Poisson regression is used when the dependent variable has count data.
87 | 3 | ________________ regression can handle both over-dispersion and under-dispersion. | Quasi-Poisson regression | Cox Regression | ElasticNet Regression | Linear Regression | Quasi-Poisson regression can handle both over-dispersion and under-dispersion.
88 | 3 | ___ is the regularization parameter in Lasso Regression? | λ | θ | Ω | β | λ is the regularization parameter in lasso regression.
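Illustration for question 88: a minimal sketch of how the regularization parameter λ acts in lasso regression, assuming the simplest case of one feature and no intercept, where minimising 0.5 * sum((y - b*x)^2) + λ * |b| has a closed-form soft-thresholding solution. The data and the names `soft_threshold` and `lasso_1d` are assumptions for the example; general lasso solvers use iterative methods instead.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_1d(xs, ys, lam):
    """Coefficient b minimising 0.5 * sum((y - b*x)^2) + lam * |b|."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx

# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

b_none = lasso_1d(xs, ys, lam=0.0)    # no penalty: ordinary least squares
b_some = lasso_1d(xs, ys, lam=7.0)    # moderate penalty: coefficient shrinks
b_large = lasso_1d(xs, ys, lam=30.0)  # large penalty: coefficient set to zero
print(b_none, b_some, b_large)
```

Larger values of λ shrink the coefficient further, and a large enough λ removes the feature entirely.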
89 | 1 | Decision Tree is a hierarchical model that does the separation of the input space into class regions using: | Recursion | Pointers | Greedy Approach | Divide and Conquer | Decision Tree is a hierarchical model that recursively does the separation of the input space into class regions.
90 | 1 | Learning Algorithm of Decision Tree is: | Greedy Approach | Divide and Conquer | Both | None | Decision Tree uses greedy approach for its learning algorithm.
91 | 1 | Normal Distribution is also called? | Gaussian Distribution | Bernoulli Distribution | Naïve Bayes | Binary Distribution | This is an explanation.
92 | 1 | Classification has how many phases: | 2 | 3 | 4 | 5 | There are 2 phases of classification.
93 | 1 | "Every pair of features being classified is independent of each other". This principle is used by: | Naïve Bayes Classifier | Decision Tree | Bernoulli Distribution | Normal Distribution | Naïve Bayes uses the principle that every pair of features being classified is independent of each other.
94 | 2 | This equation is of which theorem? | Gaussian Distribution | Binary Distribution | Naïve Bayes | Cross-Entropy | This is an explanation.
95 | 2 | In Naïve Bayes, the datasets are divided into how many types? | 2 | 3 | 4 | 5 | Data sets are divided into two types in Naïve Bayes.
96 | 2 | Decision trees that can be used to predict non-categorical values are called? | Regression Trees | Categorical trees | Normal tree | None | Decision trees that can be used to predict non-categorical values are called regression trees.
97 | 2 | An attribute with ____ Gini index should be preferred in a decision tree. | Lower | Higher | Recursive | Negative | An attribute with lower Gini index should be preferred.
98 | 2 | In Naïve Bayes, if any two events A and B are independent, then, | P(A,B)=P(A)P(B) | P(A,B)=P(A)/P(B) | P(A,B)=P(B) | P(A,B)=P(B)/P(A) | If any two events A and B are independent, then P(A,B)=P(A)P(B).
99 | 2 | What is the measure of uncertainty of a random variable in a decision tree? | Entropy | Gain | Gini Index | None | Entropy is the measure of uncertainty of a random variable.
100 | 2 | Which of the following is not true for decision trees? | Stable | Easy to understand | Easy to explain | Easy to evaluate | This is an explanation.
101 | 2 | Decision tree algorithm falls under the category of which learning? | Supervised | Unsupervised | Regression | Classification | Decision tree algorithm falls under the category of supervised learning.
102 | 2 | False Positives and False Negatives is an application of which theorem? | Bayes' Theorem | Binary Distribution | Bernoulli Distribution | Normal Distribution | One of the uses of Bayes' Theorem is false positives and false negatives.
103 | 2 | Decision Trees used in mining the data are of how many types? | 2 | 3 | 4 | 5 | There are 2 types of decision trees used in data mining.
104 | 3 | In Bayes' Theorem, P(A) and P(B) are the probabilities of observing A and B respectively; they are known as: | Marginal Probability | Normal Distribution | Bernoulli Distribution | Parallel Algorithm | P(A) and P(B) are the probabilities of observing A and B respectively; they are known as the marginal probability.
105 | 3 | ID3 Algorithm in a decision tree stands for? | Iterative Dichotomiser 3 (ID3) | Interval Driven | Interconnected Decision | None | ID3 stands for Iterative Dichotomiser 3 (ID3).
106 | 3 | Probably the best way of estimating performance for very small data sets is: | Bootstrapped Method | Normal Distribution | Naïve Bayes | Binary Distribution | Probably the best way of estimating performance for very small data sets is the bootstrapped method.
107 | 3 | The Decision Tree works on which form? | Disjunctive Normal Form | Product of Sum | Bijective Form | Conjunctive Form | Decision Tree works on Disjunctive normal form.
108 | 3 | The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a ________ distribution. | 1-D | 2-D | 3-D | None | The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution.
109 | 3 | Theoretical concept to evaluate Classifiers is: | COLT | PAC Model | Naïve Bayes | Prediction | This is an explanation.
110 | 3 | ____________ is a metric to measure how often a randomly chosen element would be incorrectly identified. | Gini Index | Entropy | Pointer | Cross-Entropy | Gini Index is a metric to measure how often a randomly chosen element would be incorrectly identified.
111 | 3 | The most notable types of decision tree algorithms are: | 3 | 2 | 1 | 4 | The most notable types of decision tree algorithms are 3.
112 | 3 | Which process is completed when the subset at a node all has the same value of the target variable? | Recursive Partitioning | Termination | Transformation | Prediction | The recursive partition is completed when the subset at a node all has the same value of the target variable.
113 | 3 | The _______ method reserves a certain amount for testing and uses the remainder for training. | Holdout | Parallel Algorithm | Naïve Bayes | Normal Distribution | The holdout method reserves a certain amount for testing and uses the remainder for training.
114 | 3 | This equation is of which theorem? | Bayes' Theorem | Normal Distribution | Bernoulli Distribution | Cross-Entropy | This is an explanation.
115 | 3 | "Independence among the features". This is an assumption in: | Naïve Bayes Classifier | Bernoulli Distribution | Parallel Algorithm | Binary Distribution | Independence among the features is an assumption in Naïve Bayes.
116 | 3 | Error rate obtained from training data is called: | Resubstitution Error | Grid | Gini Index | True error | Error rate obtained from training data is called resubstitution error.
117 | 3 | In Decision Tree entropy is __________ to content. | proportional | inverse | High | Less | This is an explanation.
118 | 3 | In Decision Tree, no root-to-leaf path should contain the same discrete attribute ____________. | Twice | Once | Thrice | Four Times | No root-to-leaf path should contain the same discrete attribute twice.
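Illustration for questions 97, 99 and 110: a minimal sketch of the two impurity measures used when splitting a decision tree node, assuming the usual definitions Gini = 1 - sum(p_i^2) and entropy = -sum(p_i * log2(p_i)) over the class proportions p_i. The label lists are hypothetical.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: chance that a randomly chosen element is mislabelled."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy in bits: uncertainty of the class label at a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

pure_node = ["yes", "yes", "yes", "yes"]   # all one class
mixed_node = ["yes", "yes", "no", "no"]    # evenly split

print(gini(pure_node), entropy(pure_node))
print(gini(mixed_node), entropy(mixed_node))
```

A pure node scores 0 on both measures, which is why a split producing lower Gini index or entropy is preferred.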
119 | 1 | Using _________, designers can make information understandable for stakeholders. | Data Visualization | Classification | Regression | Supervised Learning | Using data visualization methods, designers can make information understandable for stakeholders.
120 | 1 | The additional visual methods include: | All | Tree Map | Parallel Coordinates | Semantic Networks | This is an explanation.
121 | 1 | Data Visualization tools don’t include: | MS Excel | Tableau | Power BI | Jupyter | This is an explanation.
122 | 1 | Which of the following requires JavaScript knowledge to run the visualization tool? | All | Chart.js | Polymap | Sigmajs | This is an explanation.
123 | 1 | Merits of Tableau don’t include which factor: | Cost | Performance | Usage | Computation | Merits of Tableau don’t include the cost factor.
124 | 1 | Which of these is not a type of Big Data Visualization? | Pictograph | Bar-Graph | Line-Chart | Pie-Chart | This is an explanation.
125 | 2 | The drag-and-drop editor of which tool makes it easy to create professional-looking designs without a lot of visual design skill? | Infogram | Google Chart | Tableau | Grafana | The drag-and-drop editor of Infogram makes it easy to create professional-looking designs without a lot of visual design skill.
126 | 2 | How many V's are defined for Data Visualization? | 4 | 6 | 2 | 3 | There are 4 V's of Data visualization.
127 | 2 | Which of the following is not a free Data Visualization tool? | Tableau | Google Chart | Jupyter | Hub-Spot CRM | Tableau is a chargeable tool of data visualization.
128 | 2 | Companies that work with both traditional and big data use which technique to look at customer segments or market shares? | Pie-Chart | Bar-Graph | Stream graph | Line-Chart | Companies that work with both traditional and big data may use pie chart to look at customer segments or market shares.
129 | 2 | Visualization of Data includes which of the following problems: | All | Information Loss | Visual Noise | Large Image Perception | This is an explanation.
130 | 2 | Mainly, Data Visualization has how many types of challenges? | 5 | 6 | 4 | 2 | There are 5 main challenges to data visualization.
131 | 2 | Which tool uses HTML5/SVG to visualize data? | Google Charts | Jupyter | Grafana | Tableau | Google Charts uses HTML5/SVG since it is browser compatible.
132 | 2 | According to Colin Ware’s Information Visualization: Perception for Design, he defines _____ pre-attentive visual properties. | 4 | 2 | 1 | 3 | According to Colin Ware’s Information Visualization: Perception for Design, he defines four pre-attentive visual properties.
133 | 2 | _____ is based on space-filling visualization of hierarchical data. | Tree-Map | Stream graph | Bar-graph | Line-Chart | Tree map method is based on space-filling visualization of hierarchical data.
134 | 2 | Which graph shows the dependency relationships between activities and current schedule status? | Gantt-Chart | Line-Chart | Pie-Chart | Bar-Graph | Gantt chart shows the dependency relationships between activities and current schedule status.
135 | 2 | Another name for distribution free data is: | Non parametric data | Parametric Data | Static data | Dynamic data | Non parametric data is also called distribution free data.
136 | 2 | Which chart is used for comparison of values, such as sales performance for several persons or businesses in a single time? | Bar-Graph | Gantt-Graph | Line-Chart | Pie-Chart | Bar Graph is used for comparison of values, such as sales performance for several persons or businesses in a single time.
137 | 2 | _____________ are graphics in the field of statistics used to visualize quantitative data. | Graphical Techniques | Line-Chart | Regression | Classification | Graphical Techniques are graphics in the field of statistics used to visualize quantitative data.
138 | 2 | _____ can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion. | Parallel Coordinates | Stream graph | Google Chart | Jupyter | Parallel Coordinates can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion.
139 | 3 | Chart.js provides how many types of charts? | 8 | 5 | 3 | 6 | This is an explanation.
140 | 3 | Which visualization tool supports mixed data sources, annotations, and customizable alert functions, and can be extended via hundreds of available plugins? | Grafana | Tableau | Google Chart | Jupyter | Grafana supports mixed data sources, annotations, and customizable alert functions, and it can be extended via hundreds of available plugins.
141 | 3 | Which tool was created specifically for adding charts and maps to news stories? | Datawrapper | Tableau | Google Chart | Jupyter | Datawrapper was created specifically for adding charts and maps to news stories.
142 | 3 | Conventional Visualization methods don’t include: | Mekko Chart | Pie-Chart | Bar-graph | Histogram | Mekko chart is a new technique to visualize data.
143 | 3 | _____________ is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape. | Streamgraph | Bar-Graph | Pie-Chart | Line-Chart | Streamgraph is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape.
144 | 3 | Which visual tool includes over 150 chart types and 1,000 map types? | Fusion charts | Tableau | Google Chart | Jupyter | Fusion charts includes over 150 chart types and 1,000 map types.
145 | 3 | Which graph/chart is a graphical representation of logical relationship between different concepts? It generates a directed graph, the combination of nodes or vertices, edges or arcs, and a label over each edge. | Semantic Networks | Bar-Graph | Pie-Chart | Line-Chart | A semantic network is a graphical representation of logical relationship between different concepts. It generates a directed graph, the combination of nodes or vertices, edges or arcs, and a label over each edge.
146 | 3 | According to SAS we can process only ______ of information per second on a flat screen. | 1 Kilobit | 1 Byte | 1 Bit | 1 MB | According to SAS we can process only 1 kilobit of information per second on a flat screen.
147 | 3 | There are ____ steps for interactive data visualization: | 4 | 5 | 3 | 6 | This is an explanation.
148 | 3 | When working with big data, companies can use which visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.? | Line-Chart | Bar-Graph | Pie-Chart | Stream graph | When working with big data, companies can use the line chart visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.
149 | 1 | Which of the following Enterprises use HBase? | All | Facebook | Netflix | Adobe | This is an explanation.
marks question A B C D ans
150 1 Which NLP is used in the present era?
A: Neural NLP B: Symbolic NLP C: Statistical NLP D: None
Ans: From 2010, Neural NLP is being used.
151 1 The Computer World magazine states that unstructured information might account for more than ______ of all data in organizations.
A: 70-80% B: 90% C: 50% D: 60%
Ans: The Computer World magazine states that unstructured information might account for more than 70%–80% of all data in organizations.
152 1 Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely ___________.
A: Unstructured B: Structured C: Semantic D: None
Ans: Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely or partly unstructured.
153 1 Which standard provided a common framework for processing information to extract meaning and create structured data about the information?
A: Unstructured Information Management Architecture (UIMA) B: Management Architecture for Data C: Data Architecture D: None
Ans: The Unstructured Information Management Architecture (UIMA) standard provided a common framework for processing this information to extract meaning and create structured data about the information.
154 2 The base Apache Hadoop framework is composed of how many modules?
A: 4 B: 2 C: 3 D: 6
Ans: The base Apache Hadoop framework is composed of four modules.
155 2 No-SQL doesn't include which software?
A: MS-SQL B: HBASE C: DynamoDB D: MongoDB
Ans: This is an explanation.
156 2 There are _______ main types of OLAP systems.
A: 3 B: 2 C: 5 D: 6
Ans: There are 3 types of OLAP systems.
157 2 The SQL alternative in Apache HIVE is called?
A: HIVEQL B: BASEQL C: SPARK-QL D: H-QL
Ans: HIVE-QL is the alternative to SQL in the Apache Hive family.
158 2 A MapReduce program executes in how many stages?
A: 3 B: 2 C: 5 D: 4
Ans: A MapReduce program executes in three stages.
159 2 How many types of NO-SQL database are there?
A: 4 B: 3 C: 2 D: 6
Ans: There are 4 types of databases in NO-SQL.
160 2 MapReduce is a processing technique and a program model for distributed computing based on which programming language?
A: JAVA B: Python C: C++ D: R
Ans: MapReduce is a processing technique and a program model for distributed computing based on Java.
161 2 Hive supports how many properties of transactions?
A: 4 B: 3 C: 2 D: 1
Ans: Hive supports all four properties of transactions.
162 2 HDFS consists of only one Name Node, which is called the?
A: Master Node B: Slave Node C: Both D: None
Ans: HDFS consists of only one Name Node, which is called the Master Node.
163 2 Which Apache software is needed to process massive amounts of data for the purposes of natural-language search?
A: Apache HBASE B: Apache Spark C: Apache-PIG D: Apache-mahout
Ans: HBase is needed to process massive amounts of data for the purposes of natural-language search.
164 2 Which databases store data in a format other than relational tables?
A: NO-SQL B: HIVESQL C: SPARK-QL D: H-QL
Ans: No-SQL databases store data in a format other than relational tables.
165 2 Which is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra?
A: Apache Mahout B: Apache Spark C: Apache-PIG D: Apache HBASE
Ans: Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra.
166 2 Which model is a specialization of the split-apply-combine strategy for data analysis?
A: MapReduce B: Hadoop C: HBASE D: HIVE
Ans: The MapReduce model is a specialization of the split-apply-combine strategy for data analysis.
167 2 All Hadoop commands are invoked by which command?
A: $HADOOP_HOME/bin/hadoop B: $HADOOP/bin/hadoop C: $HADOOP_HOME/hadoop D: $HADOOP_HOME/bin
Ans: All Hadoop commands are invoked by the $HADOOP_HOME/bin/hadoop command.
168 3 The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called?
A: Schema on Write B: Schema on Read C: Schema for Read Write D: None
Ans: The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called schema on write.
169 3 Which command formats the DFS filesystem?
A: Namenode -format B: Node -format C: Name -format D: Format
Ans: The Namenode -format command formats the DFS file system.
170 3 Which command applies the offline fsimage viewer to an fsimage?
A: oiv B: fs C: fc D: ov
Ans: oiv applies the offline fsimage viewer to an fsimage.
171 3 Hadoop requires which Java Runtime Environment (JRE) version or higher?
A: 1.6 B: 1.2 C: 1.5 D: 1
Ans: Hadoop requires Java Runtime Environment (JRE) 1.6 or higher.
172 3 Every Data node sends a Heartbeat message to the Name node every ____ seconds and conveys that it is alive.
A: 3 B: 2 C: 4 D: 1
Ans: Every Data node sends a Heartbeat message to the Name node every 3 seconds and conveys that it is alive.
173 3 HDFS can store files up to:
A: 1 TB B: 1 GB C: 1 ZB D: 1 PB
Ans: HDFS can store up to 1 TB of files.
174 3 Which of the following is a wide-column store?
A: HBase B: SQL C: DynamoDB D: MongoDB
Ans: HBASE is a popular wide-column store.
175 3 Which node acts as both a DataNode and TaskTracker in Hadoop?
A: Slave Node B: Data Node C: Admin Node D: Name Node
Ans: A slave or worker node acts as both a DataNode and TaskTracker.
176 3 The HDFS system uses which protocol for communication?
A: TCP/IP B: TCP C: UDP D: IP
Ans: The HDFS system uses TCP/IP sockets for communication.
177 3 HDFS has how many services?
A: 5 B: 4 C: 2 D: 6
Ans: HDFS has five services.
178 3 ____________ is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
A: Apache HIVE B: Apache Spark C: Apache-PIG D: Apache HBASE
Ans: HIVE is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
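Rows 158 and 166 describe MapReduce as a three-stage model that specializes split-apply-combine. The sketch below illustrates the idea in a single Python process, assuming the three stages are map, shuffle and reduce (the table does not name them). It is not the Hadoop API, which the table describes as Java-based, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three stages in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["big data big cluster", "data node"])
```

In a real cluster each stage would run in parallel across many nodes; here the stages run one after another only to show how data flows between them.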
Hadoop Online Quiz - Tutorialspoint https://www.tutorialspoint.com/hadoop/hadoop_online_quiz.htm
HADOOP MOCK TEST
http://www.tutorialspoint.com Copyright © tutorialspoint.com
This section presents various sets of mock tests related to the Hadoop framework. You can
download these sample mock tests to your local machine and solve them offline at your convenience.
Every mock test is supplied with a mock test key to let you verify the final score and grade yourself.
C - Can process data faster under the same network bandwidth as compared to HPC.
Q 5 - Which of the following is true for disk drives over a period of time?
B - Data Seek time is improving more slowly than data transfer rate.
C - Data Seek time and data transfer rate are both increasing proportionately.
D - Only the storage capacity is increasing without increase in data transfer rate.
A - Solr
B - Tez
C - Spark
D - Hive
C - Occupies only the size it needs and not the full block.
D - Can span over multiple blocks.
Q 10 - HDFS block size is larger as compared to the size of the disk blocks so that
D - A single file larger than the disk size can be stored across many disks in the cluster.
Q 11 - In a Hadoop cluster, what is true for a HDFS block that is no longer available
due to disk corruption or machine failure?
C - The namenode allows new client request to keep trying to read it.
D - The Mapreduce job process runs ignoring the block and the data stored in it.
Q 12 - Which utility is used for checking the health of a HDFS file system?
A - fchk
B - fsck
C - fsch
D - fcks
Q 13 - Which command lists the blocks that make up each file in the filesystem.
D - None
Q 15 - In the local disk of the namenode the files which are stored persistently are −
D - None of these
A - Take backup of filesystem metadata to a local disk and a remote NFS mount.
Q 19 - For the frequently accessed HDFS files the blocks are cached in
C - Both A&B
D - In the memory of the client application which requested the access to these files.
C - Failure of one namenode causes loss of some metadata availability from the entire
filesystem.
B - To reduce the cycle time required to bring back a new primary namenode after existing
primary fails.
A - When a client request comes, one of them chosen at random serves the request.
B - One of them is active while the other one remains powered off.
B - Preventing the start of a failover in the event of network failure with the active namenode.
C - Preventing the power down to the previously active namenode.
D - STONITH
Q 28 - The property used to set the default filesystem for Hadoop in core-site.xml is-
A - filesystem.default
B - fs.default
C - fs.defaultFS
D - hdfs.default
A - 1
B - 2
C - 3
D - 4
A - 2
B - 1
C - 0
D - 3
B - Zero
C - 3
B - Renaming
C - Moving
D - Executing.
ANSWER SHEET
1 C
2 A
3 D
4 B
5 B
6 C
7 C
8 B
9 C
10 D
11 B
12 B
13 A
14 B
15 A
16 C
17 A
18 D
19 A
20 C
21 C
22 B
23 B
24 D
25 B
26 D
27 C
28 B
29 C
30 B
31 D
32 D
HADOOP MOCK TEST
D - HDFS ftp
C - You can edit an existing record in an HDFS file which is already mounted using NFS.
D - gets both the data and block location from the namenode
Q 4 - Which scenario demands highest bandwidth for data transfer between nodes in
Hadoop?
Q 5 - The current block location of HDFS where data is being written to,
A - Optimal Scheduler
B - FIFO scheduler
C - Capacity scheduler
D - Fair scheduler
D - Fully-Distributed mode
A - C++
B - Python
C - Java
D - GO
Q 10 - The hdfs command to create the copy of a file from a local system is
A - CopyFromLocal
B - copyfromlocal
C - CopyLocal
D - copyFromLocal
D - The number of replicated copies is less than as specified by the replication factor.
Q 13 - When the namenode finds that some blocks are over replicated, it
A - Replication factor
A - Replication factor
A - Replication factor
A - Replication factor
A - Jsp
B - Jps
C - Hadoop fs –test
D - None
Q 19 - The information mapping data blocks with their corresponding files is stored in
A - Data node
B - Job Tracker
C - Task Tracker
D - Namenode
Q 20 - The file in Namenode which stores the information mapping the data block
location with file name is −
A - dfsimage
B - nameimage
C - fsimage
D - image
Q 21 - The namenode knows that the datanode is active using a mechanism known as
A - heartbeats
B - datapulse
C - h-signal
D - Active-pulse
B - Commodity grade
Q 24 - Which of the following Apache systems deals with ingesting streaming data into
Hadoop?
A - Oozie
B - Kafka
C - Flume
D - Hive
A - The average size of the data blocks used as input for the program
B - The location details of where the first whole record in a block begins and the last whole
record in the block ends.
C - Splitting the input data to a MapReduce program into a size already configured in the
mapred-site.xml
D - None of these
B - The Key-value pair of all the records from the input split processed by the mapper
B - Report the edit log information of the blocks in the data node.
Q 28 - The Zookeeper
A - The namenode updates the mapping between file name and block name
B - The namenode need not update mapping between file name and block name
Q 30 - When a client contacts the namenode for accessing a file, the namenode
responds with
C - Block ID and hostname of any one of the data nodes containing that block.
D - Block ID and hostname of all the data nodes containing that block.
Q 32 - The Hadoop tool used for uniformly spreading the data across the data nodes is
named −
A - Scheduler
B - Balancer
C - Spreader
D - Reporter
ANSWER SHEET
1 B
2 A
3 C
4 C
5 D
6 A
7 B
8 B
9 C
10 D
11 B
12 D
13 C
14 B
15 A
16 C
17 D
18 B
19 D
20 C
21 A
22 A
23 B
24 C
25 B
26 B
27 B
28 A
29 B
30 D
31 D
32 B
33 A
HADOOP MOCK TEST
B - Check if the fsimage file is in sync between namenode and secondary namenode
C - Merges the fsimage and edit log and uploads it back to active namenode.
D - Rack awareness
A - it is lost forever
C - It becomes hidden from the user but stays in the file system
A - REST API
B - RPC
C - RMI
D - IP Exchange
A - Structured
B - Semi-structured
C - Unstructured
B - 3 Physical machines
C - 4 Physical machines
D - 1 Physical machine
A - read
B - deleted
C - executed
D - Archived
Q 15 - hadoop fs –expunge
A - getmerge
B - putmerge
C - remerge
D - mergeall
A - changerep
B - rerep
C - setrep
D - xrep
Q 18 - The command used to copy a directory from one node to another in HDFS is
A - rcp
B - dcp
C - drcp
D - distcp
A - .hrc
B - .har
C - .hrh
D - .hrar
A - unrar
B - unhar
C - cp
D - cphar
Q 23 - When you increase the number of files stored in HDFS, The memory required by
namenode
A - Increases
B - Decreases
C - Remains unchanged
Q 24 - If we increase the size of files stored in HDFS without increasing the number of
files, then the memory required by namenode
A - Decreases
B - Increases
C - Remains unchanged
Q 27 - You can reserve the amount of disk usage in a data node by configuring the
dfs.datanode.du.reserved in which of the following file
A - Hdfs-site.xml
B - Hdfs-default.xml
C - Core-site.xml
D - Mapred-site.xml
Q 28 - The namenode loses its only copy of fsimage file. We can recover this from
A - Datanodes
B - Secondary namenode
C - Checkpoint node
D - Never
Q 29 - In a HDFS system with block size 64MB we store a file which is less than 64MB.
Which of the following is true?
B - Not limited
A - Jobtracker to Tasktracker
C - Jobtracker to namenode
D - Tasktracker to namenode
ANSWER SHEET
1 C
2 A
3 B
4 B
5 B
6 A
7 B
8 D
9 B
10 A
11 A
12 C
13 D
14 C
15 D
16 A
17 C
18 D
19 B
20 C
21 D
22 D
23 A
24 A
25 C
26 B
27 A
28 C
29 C
30 A
31 C
32 A
33 B
HADOOP MOCK TEST
A - Jobtracker to Tasktracker
C - Jobtracker to namenode
D - Tasktracker to namenode
A - Namenode
B - Datanode
C - Secondary namenode
D - Secondary datanode
A - Balanced scheduler
B - Fair scheduler
C - Capacity scheduler
D - FIFO scheduler.
A - The default input format is xml. Developer can specify other input formats as appropriate if
xml is not the correct input.
B - There is no default input format. The input format always should be specified.
C - The default input format is a sequence file format. The data needs to be preprocessed before
using the default input format.
D - The default input format is TextInputFormat with byte offset as a key and entire line as a
value.
A - Velocity
B - Veracity
C - volume
D - variety
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
A - HBase
B - Avro
C - Sqoop
D - Zookeeper
Q 10 - Which of the following technologies is a document store database?
A - HBase
B - Hive
C - Cassandra
D - CouchDB
A - It is a distributed framework.
A - Name node
B - Data node
C - Master node
D - None of these
A - Name node
B - Data node
C - slave node
D - None of these
Q 14 - What is AVRO?
A - Yes, Avro was specifically designed for data processing via Map-Reduce.
D - Avro specifies metadata that allows easier data access. This data cannot be used as part of
map-reduce execution, rather input specification only.
A - The distributed cache is special component on name node that will cache frequently used
data for faster client response. It is used during reduce step.
B - The distributed cache is special component on data node that will cache frequently used data
for faster client response. It is used during map step.
D - The distributed cache is a component that allows developers to deploy jars for Map-Reduce
processing.
Q 17 - What is writable?
A - Writable is a java interface that needs to be implemented for streaming data to remote
servers.
Q 18 - What is HBASE?
B - Hbase is a part of the Apache Hadoop project that provides interface for scanning large
amount of data using Hadoop infrastructure.
D - HBase is a part of the Apache Hadoop project that provides a SQL like interface for data
processing.
B - Hadoop was specifically designed to process large amount of data by taking advantage of
MPP hardware.
C - Hadoop ships the code to the data instead of sending the data to the code.
D - Hadoop uses sophisticated caching techniques on name node to speed processing of data.
Q 20 - When using HDFS, what occurs when a file is deleted from the command line?
B - It is placed into a trash directory common to all users for that cluster.
C - It is permanently deleted and the file attributes are recorded in a log file.
D - It is moved into the trash directory of the user who deleted it if trash is enabled.
Q 21 - When archiving Hadoop files, which of the following statements are true?
Choose two answers
3. MapReduce processes the original file names even after files are archived.
4. Archived files must be unarchived for HDFS and MapReduce to access the
original, small files.
5. Archive is intended for files that need to be saved but no longer accessed by
HDFS.
A - 1 & 3
B - 2 & 3
C - 2 & 4
D - 3 & 4
Q 22 - When writing data to HDFS what is true if the replication factor is three?
Choose 2 answers
2. The Data is stored on each DataNode with a separate file which contains a
checksum value.
4. The Client is returned with a success upon the successful writing of the first
block and checksum check.
A - 1 & 3
B - 2 & 3
C - 3 & 4
D - 1 & 4
Q 23 - Which of the following are among the duties of the Data Nodes in HDFS?
A - Maintain the file system tree and metadata for all files and directories.
Q 24 - Which of the following components retrieves the input splits directly from
HDFS to determine the number of map tasks?
A - The NameNode.
B - The TaskTrackers.
C - The JobClient.
D - The JobTracker.
A - 1 & 4
B - 2 & 3
C - 3 & 4
D - 2 & 4
Q 27 - Which one of the following statements is false regarding the Distributed Cache?
A - The Hadoop framework will ensure that any files in the Distributed Cache are distributed to all
map and reduce tasks.
B - The files in the cache can be text files, or they can be archive files like zip and JAR files.
D - The Hadoop framework will copy the files in the Distributed Cache on to the slave node
before any tasks for the job are executed on that node.
A - Region Server.
B - Nagios.
C - ZooKeeper.
D - Master Server.
A - HDFS.
B - Task Tracker.
C - Job Tracker.
D - Name Node.
E - Data Node.
Q 31 - Keys from the output of shuffle and sort implement which of the following
interface?
A - Writable.
B - WritableComparable.
C - Configurable.
D - ComparableWritable.
E - Comparable.
B - Output of the mapper and output of the combiner has to be same key value pair and they can
be heterogeneous
C - Output of the mapper and output of the combiner has to be same key value pair. Only if the
values satisfy associative and commutative property it can be done.
ANSWER SHEET
1 A
2 B
3 A
4 A
5 D
6 B
7 A
8 B
9 C
10 D
11 D
12 B
13 A
14 A
15 A
16 B
17 C
18 B
19 C
20 C
21 B
22 C
23 D
24 D
25 A
26 B
27 C
28 B
29 C
30 D
31 B
32 C
Seat No -
Total number of questions : 60
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 3. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
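The attributes listed in Q.no 3 can be compared on a small array. This is a minimal sketch assuming NumPy is installed; the array shape is an arbitrary illustration.

```python
import numpy as np

# A 3-dimensional array with shape (2, 3, 4), filled with zeros.
a = np.zeros((2, 3, 4))

n_axes = a.ndim        # number of dimensions (axes) of the array
n_elements = a.size    # total number of elements: 2 * 3 * 4
element_type = a.dtype # data type of each element
```

Here `ndim` counts axes, `size` counts elements, and `dtype` describes the element type, so the three attributes answer different questions about the same array.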
Q.no 4. The ----------- algorithm is based on the fact that it uses prior
knowledge to find frequent item sets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : ndarray
B : spatial
C : ndimage
D : special
A : Single point
B : Line
C : 2-D Plane
A : Text files
B : Satellite data
C : Sensor data
D : Seismic imagery data
A : Matlab
B : Scilab
C : Scipy
D : Numpy
Q.no 9. The procedure to organize items of a given collection into groups based on
some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 11. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. ------------- function is used to plot a histogram using matplotlib library
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 13. Which of the following is a measure used in decision trees when selecting
the splitting criterion that partitions data in the best possible manner?
A : Probability
B : Gini Index
C : Regression
D : Association
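To make the split-selection measure in Q.no 13 concrete, the sketch below computes the Gini index of a set of class labels and the weighted Gini of a two-way split. It assumes the usual definition, 1 minus the sum of squared class proportions; the function names and sample labels are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)


mixed = gini_index(["yes", "yes", "no", "no"])         # evenly mixed node
pure = gini_index(["yes", "yes", "yes"])                # single-class node
split = weighted_gini(["yes", "yes"], ["no", "no"])     # perfectly separating split
```

A lower weighted value indicates a cleaner partition, so a tree builder following this measure would prefer the split with the smallest weighted Gini.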
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 16. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : prod()
B : mult()
C : dot()
D : *
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 20. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : 0
B : -1
C : 1
D : -2
Q.no 22. ------------ step is performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : KNN
C : Decision trees
D : Cluster analysis
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 26. ---------- function is used to get the element-wise remainder of division of arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
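The element-wise remainder asked about in Q.no 26 can be checked directly. This is a minimal sketch assuming NumPy is installed; the input values are arbitrary.

```python
import numpy as np

x1 = np.array([7, 8, 9])
x2 = np.array([2, 3, 5])

# Remainder of x1 / x2, computed position by position.
remainders = np.mod(x1, x2)

# True (floating-point) division of the same arrays, for contrast.
quotients = np.true_divide(x1, x2)
```

`np.mod` returns the leftover after integer division for each pair of elements, while `np.true_divide` returns the fractional quotient.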
Q.no 27. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
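The measures named in Q.no 27 can be computed by hand on a toy basket data set. The sketch assumes the common definitions: support is the fraction of transactions containing an itemset, and confidence of a rule X -> Y is support(X and Y) divided by support(X). The transactions and function names are made up for illustration.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
```

Here bread and milk appear together in 2 of 4 baskets, and milk appears in 2 of the 3 baskets that contain bread.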
Q.no 28. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 30. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : NoSQL data
B : YouTube data
A : EPS
B : PDF
C : PNG
D : PS
Q.no 36. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 38. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 39. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 42. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 43. --------- is technique that duplicates smaller array to make dimensionality
and size of an array as the size and dimensionality of larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
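The behaviour described in Q.no 43 can be seen by adding a 1-D array to a 2-D array. This is a minimal sketch assuming NumPy is installed; note that NumPy stretches the smaller array conceptually rather than physically copying it.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The row is treated as if it were repeated for each row of the matrix.
result = matrix + row
```

The shapes (2, 3) and (3,) are compatible because their trailing dimensions match, so the operation succeeds without an explicit loop or tiling step.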
Q.no 44. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 45. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 46. ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 47. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
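For Q.no 47, the difference between evenly spaced values on a log scale and the logarithm of existing values can be shown side by side. This is a minimal sketch assuming NumPy is installed; the endpoints are arbitrary.

```python
import numpy as np

# Four points evenly spaced on a base-10 log scale from 10**0 to 10**3.
spaced = np.logspace(0, 3, num=4)

# Natural logarithm of existing values, element by element.
logs = np.log(np.array([1.0, np.e]))
```

`np.logspace` generates new sample points whose exponents are evenly spaced, whereas `np.log` only transforms values that already exist.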
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
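The effect of the merge arguments in Q.no 48 is easiest to see on two small frames. This is a minimal sketch assuming pandas is installed; the column names and values are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

# `on` names the join column; `how` decides which keys survive the join.
inner = pd.merge(left, right, on="id", how="inner")   # keys present in both
outer = pd.merge(left, right, on="id", how="outer")   # keys present in either
left_join = pd.merge(left, right, on="id", how="left")  # all keys from `left`
```

Rows whose key is missing on one side are filled with NaN in the outer and left joins, which mirrors the behaviour of the corresponding SQL joins.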
Q.no 49. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 52. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 53. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 56. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 59. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
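Q.no 59 and Q.no 60 describe a decision tree as a chain of If-Then-Else tests that is followed from the root down to a leaf. The sketch below shows that traversal on a hand-written tree; the nested-dict layout, feature names and labels are hypothetical and not taken from any library.

```python
# Internal nodes are dicts with a feature to test and one branch per outcome;
# leaves are plain class labels.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}


def predict(node, sample):
    """Walk from the root toward a leaf, following the branch chosen by each test."""
    while isinstance(node, dict):
        outcome = sample[node["feature"]]
        node = node["branches"][outcome]
    return node


label_sunny = predict(tree, {"outlook": "sunny", "windy": False})
label_rainy = predict(tree, {"outlook": "rainy", "windy": True})
```

Each loop iteration corresponds to one If-Then-Else decision, and the prediction is whatever label sits at the leaf where the walk ends.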
Answer for Question No 1. is a
13329_DATA ANALYTICS
Q.no 2. ----------- data that depends on data model and resides in a fixed field within
a record.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 5. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
Q.no 9. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 10. The ----------- algorithm is based on the fact that it uses prior
knowledge to find frequent item sets.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 11. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Un- Supervised
B : Supervised
C : Association
D : correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : KNN
C : Regression
D : Decision Tree
A : Classification
B : Regression
C : Clustering
D : Association
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
Q.no 19. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 20. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 21. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 22. ------- is the basic data structure of pandas and can be thought of as a SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 23. In matplotlib, the ------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
A : 1
B : -1
C : 0
D : 2
Q.no 25. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
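The three measures named in the options can be sketched as follows, assuming transactions are represented as Python sets. The function names and the baskets are hypothetical illustrations, not a library API.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule relative to the baseline frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical market baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets contain milk
rule_lift = lift(baskets, {"bread"}, {"milk"})
```

A lift below 1, as in this toy data, suggests the two items appear together less often than they would if they were unrelated.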
Q.no 26. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 27. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : KNN
C : Regression
D : Cluster analysis
Q.no 29. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
A : Java
B : Ruby
C : R
D : None of these
A : 0
B : -1
C : 1
D : -2
Q.no 33. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
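As a rough illustration of an attribute selection measure, the sketch below computes entropy and information gain for a single categorical attribute. The function names, the "outlook" attribute and the four rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions left after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: the attribute separates the two classes perfectly.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
gain = information_gain(rows, labels, "outlook")
```

A decision tree learner would evaluate this quantity for every candidate attribute and split on the one with the largest gain.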
Q.no 34. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 37. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 38. ------------------- is a one-dimensional array defined in pandas that can be
used to store any data type.
A : Dict
B : series
C : ndarray
D : list
Q.no 39. To read image from a file into an array --------------- function is used.
A : matplotlib.pyplot.imshow()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 42. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
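A pure-Python sketch of what a pairwise distance computation produces, assuming Euclidean distance. SciPy's scipy.spatial.distance.cdist takes two collections of points; passing the same set twice, as below, covers all pairs within one set. The helper pairwise_distances and the points are hypothetical.

```python
from math import dist


def pairwise_distances(points_a, points_b):
    """Matrix whose entry [i][j] is the Euclidean distance between points_a[i] and points_b[j]."""
    return [[dist(p, q) for q in points_b] for p in points_a]


# Hypothetical 2-D points; using the same set on both sides gives every pair.
points = [(0, 0), (3, 4), (6, 8)]
matrix = pairwise_distances(points, points)
```

The result is symmetric with zeros on the diagonal, since each point is at distance 0 from itself.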
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following tasks is not performed by a data scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 46. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
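To show how truncation differs from rounding, here is a standard-library sketch that applies math.trunc to each element of a list. This only mimics the elementwise behaviour; numpy's own function works on arrays and returns floats. The sample values are hypothetical.

```python
import math

# Hypothetical input values.
values = [-1.7, -0.2, 0.2, 1.5, 2.5]

# Truncation drops the fractional part, moving each value toward zero.
truncated = [math.trunc(v) for v in values]

# Rounding goes to the nearest integer (ties go to the even neighbour in Python 3).
rounded = [round(v) for v in values]
```

Note that -1.7 truncates to -1 but rounds to -2, which is the case that usually distinguishes the two operations.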
Q.no 47. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 49. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 51. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 53. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. Which of the following statements will create an axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
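The numbering used by subplot() can be sketched without any plotting library, assuming the usual convention that plot numbers start at 1 in the top-left cell and increase left to right, then row by row. The helper subplot_position is hypothetical.

```python
def subplot_position(nrows, ncols, plot_number):
    """Return the zero-based (row, column) of a 1-based subplot number."""
    if not 1 <= plot_number <= nrows * ncols:
        raise ValueError("plot_number must be between 1 and nrows * ncols")
    return divmod(plot_number - 1, ncols)


# In a 2 x 3 grid, plot number 3 is the last cell of the first row (top right).
top_right = subplot_position(2, 3, 3)
# In a 4 x 3 grid, plot number 5 falls in the second row, second column.
middle = subplot_position(4, 3, 5)
```

This also shows why valid plot numbers run from 1 to nrows*ncols: anything outside that range has no cell in the grid.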
Q.no 57. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
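The "area under the curve is a probability" idea can be checked with the standard library's statistics.NormalDist (Python 3.8+). The choice of a standard normal distribution here is an assumption for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0.0, sigma=1.0)

# Area under the bell curve between -1 and +1 standard deviations.
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)

# Area to the left of the mean; by symmetry this is one half.
below_mean = standard.cdf(0.0)
```

The first value comes out at roughly 0.68, the familiar "about 68% within one standard deviation" rule.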
Answer for Question No 1. is c
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 2. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 3. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 4. To save or write dataframe data into csv file -------- function is used
A : write_csv()
B : write_file()
C : csv_read()
D : to_csv()
A : Regression
B : Decision trees
C : KNN
D : SVM
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 9. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 11. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Unsupervised
B : Supervised
C : Association
D : correlation
Q.no 14. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 15. ------------ plots show all individual data points without connecting
them with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 16. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. Which of the following measures is used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 19. -------------- charts represent categorical data with rectangular bars
A : Bar
B : Line
C : Scatter
D : Histogram
A : Random
B : sequential
C : Same
Q.no 21. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
Q.no 22. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
Q.no 23. ------ is a classification technique that relies on the naïve assumption that
input variables are independent of each other.
A : KNN
B : Naïve Bayes
C : Regression
Q.no 24. ----------- phase of the data analytics lifecycle usually takes the longest
time.
A : Data Preparation
B : Model Planning
C : Model Building
D : Communicate Results
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 28. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 29. The ------------------ module of matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In this type of clustering each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 31. ---------- function is used to add two numpy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 32. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 33. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Website data
B : YouTube data
Q.no 36. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : EPS
B : PDF
C : PNG
D : PS
Q.no 38. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
C : Measures growth
Q.no 40. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 41. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 42. Which of the following tasks is not performed by a data scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 44. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 46. Which of the following statements will create an axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 47. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 49. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 50. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 51. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 52. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 54. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 57. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Answer for Question No 1. is a
13329_DATA ANALYTICS
A : Unsupervised
B : Supervised
C : Both of these
D : None of These
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
B : YouTube data
D : Sensor data
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 7. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
Q.no 8. ---------- function is used to get the positive square root of a numpy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : imsave()
B : imread()
C : read()
D : None of these
Q.no 10. Numpy supports this function to find the trigonometric sine elementwise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 12. In a numpy array, array indices always start from --------
A : 1
B : -1
C : 0
D : 2
Q.no 13. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 14. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : Classification
B : Regression
C : Clustering
D : Association
Q.no 16. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 20. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 21. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
A : NoSQL data
B : YouTube data
C : Text File data
Q.no 23. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 24. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 25. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 26. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
Q.no 27. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 28. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
Q.no 30. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 31. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 33. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 35. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 38. Broadcasting is a powerful technique that allows numpy to work with
arrays of ------------- .
A : Same Shapes
B : Different Shapes
C : Same values
D : Different values
Q.no 39. If scatter diagram is drawn and all scatter points lie on a straight line
then it indicates-------
A : No correlation
B : Perfect correlation
C : Regression
D : Skewness
Q.no 40. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 41. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 45. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
D : Support vector machine
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 48. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is a
good proportion for train-test splitting?
Q.no 50. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 51. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 53. --------- is a technique that duplicates a smaller array so that its dimensionality
and size match the size and dimensionality of a larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 54. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. Which of the following tasks is not performed by a data scientist?
C : Challenge results
D : Staff Recruitment
Q.no 57. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 58. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
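A plain-Python sketch of what "evenly spaced on a log scale" means, assuming base-10 exponents and inclusive endpoints (the defaults that numpy.logspace also uses). The function below is a hypothetical stand-in written for illustration, not numpy's implementation.

```python
def logspace(start, stop, num, base=10.0):
    """Return num values base**e, where the exponents e are evenly spaced from start to stop."""
    if num == 1:
        return [base ** start]
    step = (stop - start) / (num - 1)
    return [base ** (start + i * step) for i in range(num)]


# Exponents 0, 1, 2, 3 give one point per decade.
points = logspace(0, 3, 4)
```

The exponents are equally spaced, so consecutive values differ by a constant ratio rather than a constant difference.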
Q.no 59. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 2. The procedure to organize items of a given collection into groups based on
some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 3. ------------- is fundamental library used for scientific computing
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 4. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 6. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
Q.no 8. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Bayes Theorem
B : Pythagoras' Theorem
Q.no 11. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 12. If number of input features are 3 then optimal hyperplane in support
vector machine is -------------
A : Single point
B : Line
C : 2-D Plane
Q.no 13. The ---------------- method of a dataframe returns the first n rows of the dataframe
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 14. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 15. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 16. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
Q.no 17. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 18. ------------ chart is a circular plot divides into sclices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
Q.no 21. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 22. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 23. The ----- algorithm is the simplest machine learning algorithm: building
the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
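The "store the training data, then look up the closest points" procedure described in the question can be sketched as follows, assuming Euclidean distance and a majority vote among the k closest points. The function name knn_predict and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k closest training points."""
    # "Training" is just keeping the data; all the work happens at prediction time.
    by_distance = sorted(zip(train_points, train_labels), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated classes.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(train_points, train_labels, (2, 2), k=3)
```

Choosing an odd k for a two-class problem avoids tied votes.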
Q.no 24. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
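A short sketch of the Pearson correlation coefficient, assuming that is the coefficient the question has in mind. The function name pearson_r and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Points on a rising straight line, a falling straight line, and a symmetric curve.
perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
no_linear_relation = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The last case is worth a second look: the coefficient is 0 even though y is fully determined by x, so a zero coefficient indicates no linear relationship rather than proving independence.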
Q.no 25. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : 3
B : 5
C : 1
D : 10
C : Measures growth
Q.no 29. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : class distribution
B : test on an attribute
D : class labels
Q.no 31. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 32. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 33. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
Q.no 34. ------------ is 2-D data structure defined in pandas in which data arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
A : NoSQL data
B : YouTube data
Q.no 36. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 38. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 39. ------- is the basic data structure of pandas and can be thought of as an SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 41. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 42. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : display()
B : head()
C : describe()
D : sort()
Q.no 44. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 45. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is a
good proportion for train-test splitting?
A : Train- 70%, Test - 30%
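A minimal sketch of a 70/30 split, assuming the rows are shuffled first so that both parts are representative. The helper split_train_test is hypothetical; scikit-learn offers a similar utility called train_test_split.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 10 rows: 7 go to training and 3 to testing.
train_rows, test_rows = split_train_test(range(10), test_fraction=0.3)
```

Fixing the seed keeps the split reproducible, which makes accuracy figures comparable between runs.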
Q.no 46. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 48. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 49. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 50. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 51. Which of the following tasks is not performed by a data scientist?
C : Challenge results
D : Staff Recruitment
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 53. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 54. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 59. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 60. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Answer for Question No 1. is b
13329_DATA ANALYTICS
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- function is used to get the positive square root of a numpy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
Q.no 4. -------------- data does not fit into a data model due to variations in contents.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : numpy.random.ran()
B : rank
C : random.fill()
D : numpy.fillrandom()
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 10. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 12. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 13. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 14. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
Q.no 15. In support vector machines, if there are 2 input features then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 18. ------------ means the part of the population chosen for participation in the study.
A : Population
B : Sample
C : Association
D : Correlation
Q.no 19. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 20. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 21. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 23. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
Q.no 25. If X and Y are both independent of each other, then correlation
coefficient is ---------
A:1
B : -1
C:0
D:2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 27. What is the use of the following function? plt.xlabel("Total Marks")
C : Measures growth
Q.no 29. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 30. ----------- analysis finds the reasons behind success or failure in the past.
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 31. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 32. ---------- function is used to get the elementwise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
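The difference between elementwise division and elementwise remainder can be seen on two small arrays. A minimal sketch; the values of x1 and x2 are made up for illustration.

```python
import numpy as np

x1 = np.array([10, 11, 12])
x2 = np.array([3, 4, 5])

quotients = np.divide(x1, x2)  # true (floating-point) division, element by element
remainders = np.mod(x1, x2)    # remainder of each division, element by element

print(quotients)
print(remainders)
```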
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 35. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
A:3
B:5
C:1
D : 10
A:1
B : -1
C:0
D:2
A : KNN
C : Regression
D : Cluster analysis
Q.no 39. Among the following clustering algorithm types, in which type is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : XML data
B : YouTube data
Q.no 41. The plot_number parameter of the subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 42. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 43. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
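The numbering of sub-axes can be checked with a short script. This is a sketch under stated assumptions: the 2 x 3 grid is an arbitrary choice, and the "Agg" backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure()

# subplot(nrows, ncols, plot_number): plot_number runs from 1 to nrows*ncols,
# counting left to right along each row, starting with the top row.
axes = [plt.subplot(2, 3, number) for number in range(1, 7)]
for number, ax in enumerate(axes, start=1):
    ax.set_title(f"plot {number}")

top_right = axes[2]  # plot number 3 in a 2 x 3 grid
```

With this numbering, plot number 3 of a 2 x 3 grid sits in the first row and last column.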
Q.no 44. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. Determining the basic salary of an employee from the given qualification is
a ----------- problem.
A : Correlation
B : Regression
C : Association
D : Qualitative
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 48. --------- is a technique that duplicates the smaller array so that its dimensionality
and size match those of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
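The behaviour described in this question can be sketched with a 2 x 3 array and a 1-D array of length 3. The values are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)

# NumPy treats `row` as if it were repeated for every row of `matrix`,
# so the shapes become compatible without an explicit copy.
result = matrix + row

print(result)
```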
Q.no 49. Which NumPy function is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
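Truncation and rounding give different results for values whose fractional part is above one half. A minimal sketch with made-up input values:

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.2, 1.7])

truncated = np.trunc(values)  # discards the fractional part (moves toward zero)
rounded = np.round(values)    # rounds to the nearest whole number

print(truncated)
print(rounded)
```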
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 52. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 54. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
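The effect of the merge arguments can be seen on two small DataFrames. This is a minimal sketch; the column names key, x and y and their values are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# `on` names the column to join on; `how` decides which keys survive the join.
inner = pd.merge(left, right, on="key", how="inner")     # keys present in both frames
outer = pd.merge(left, right, on="key", how="outer")     # keys present in either frame
left_join = pd.merge(left, right, on="key", how="left")  # every key of the left frame

print(inner)
print(outer)
print(left_join)
```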
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 59. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 60. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 3. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : prod()
B : mult()
C : dot()
D:*
A : Single point
B : Line
C : 2-D Plane
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 7. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 8. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 11. The leaf nodes in decision trees return the ---------
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 13. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 14. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : Sympy
D : Scipy
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
A : KNN
C : Regression
D : Decision Tree
Q.no 19. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : Frrom_csv()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Java
B : Ruby
C:R
D : None of these
Q.no 23. ------------ is 2-D data structure defined in pandas in which data arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
Q.no 24. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 25. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 26. The -------- of a numpy array is a tuple of integers giving the size of the
array along each dimension.
A : axes
B : rank
C : shape
D : size
Q.no 27. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
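Label-based and integer-based indexing can be contrasted on one DataFrame. A minimal sketch; the index labels and the marks column are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"marks": [70, 85, 90]}, index=["asha", "ravi", "meena"])

by_label = df.loc["ravi", "marks"]  # selects by row label and column label
by_position = df.iloc[1, 0]         # selects by integer row and column position

print(by_label, by_position)
```

Both expressions pick the same cell here, because "ravi" is the second row and "marks" is the first column.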
Q.no 28. --------- in decision tree measures how much information a feature gives us
about the class
A : Information Gain
B : Posterior probability
C : Prior probability
D : probability
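The measure asked about here can be computed by hand for a tiny example. This is a sketch using the usual entropy-based definition; the helper names entropy and information_gain and the yes/no labels are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]            # the feature separates the classes fully
useless_split = [["yes", "no"] * 2, ["yes", "no"] * 2]  # the feature tells us nothing

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A feature that separates the classes completely recovers the full entropy of the parent, while a feature that leaves each group as mixed as before gains nothing.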
Q.no 29. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 30. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 31. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
Q.no 35. -----------is not one of the key data science skill.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 36. In matplotlib ------------- function groups smaller axes that can exist
together within a single figure.
A : subplot()
B : divide_figure()
C : add_fig()
D : group_fig()
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 39. ---------- function is used to add two NumPy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 40. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 41. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 43. Which NumPy function is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 44. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
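Log-scale spacing can be illustrated with four points between 10**0 and 10**3. A minimal sketch; the start, stop and num values are chosen only for illustration.

```python
import numpy as np

# start and stop are exponents of the base (10 by default), not the end values.
points = np.logspace(0, 3, num=4)

print(points)
```

Each value is ten times the previous one, so the points are evenly spaced when plotted on a logarithmic axis.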
Q.no 45. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 46. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 47. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 49. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
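Applying a custom operation to a whole DataFrame can be sketched as follows. The function add_bonus, its points parameter and the marks column are hypothetical names introduced only for this example.

```python
import pandas as pd

def add_bonus(frame, points):
    """Return a copy of the frame with `points` added to every mark."""
    out = frame.copy()
    out["marks"] = out["marks"] + points
    return out

df = pd.DataFrame({"marks": [40, 55, 70]})

# pipe() passes the whole DataFrame as the first argument of the function.
result = df.pipe(add_bonus, points=5)

print(result)
```

Because the function receives and returns a DataFrame, several such calls can be chained one after another.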
Q.no 50. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
Q.no 51. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 52. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
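Row iteration over a DataFrame can be sketched with two of the methods named above. The name and marks columns are hypothetical. Note that iteritems() walked over columns rather than rows and was removed in pandas 2.0 in favour of items(), so it is not called here.

```python
import pandas as pd

df = pd.DataFrame({"name": ["asha", "ravi"], "marks": [70, 85]})

# iterrows() yields (index, Series) pairs, one per row.
from_iterrows = [(index, row["name"], row["marks"]) for index, row in df.iterrows()]

# itertuples() yields one namedtuple per row, with the index in the `Index` field.
from_itertuples = [(t.Index, t.name, t.marks) for t in df.itertuples()]

print(from_iterrows)
print(from_itertuples)
```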
Q.no 53. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 54. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 56. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 58. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 59. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Bayes Theorem
B : Pythagorean Theorem
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 3. ------------ rule mining is a technique to identify underlying relations
between different items.
A : Classification
B : Regression
C : Clustering
D : Association
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 5. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A:1
B : -1
C:0
D:2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 8. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Un- Supervised
B : Supervised
C : Semi-supervised
D : group
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 11. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 12. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : Frrom_csv()
Q.no 13. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 15. In support vector machines, if there are 2 input features, then the decision
boundary or hyperplane is ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : ndarray
B : spatial
C : ndimage
D : special
Q.no 17. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 18. NumPy provides this function to compute the trigonometric sine elementwise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 19. The procedure of organizing items of a given collection into groups based
on some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
A : save image
B : read image
C : copy image
D : show image
Q.no 21. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 22. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 23. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 25. -----------is not one of the key data science skill.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 28. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 29. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
A : class distribution
B : test on an attribute
Q.no 32. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 33. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 34. ---------- function is used to get the elementwise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 35. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 36. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 37. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : ndimage
B : ndarray
C : signal
D : io
Q.no 41. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 42. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 48. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
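The cases listed in the options can be reproduced numerically with the Pearson correlation coefficient. A minimal sketch with made-up data; note that a coefficient of 0 only rules out a linear relationship.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

rising = 2 * x + 1        # moves exactly with x
falling = -3 * x + 4      # moves exactly against x
no_trend = (x - 3) ** 2   # has no linear trend with x

# corrcoef returns a 2 x 2 matrix; entry [0, 1] is the coefficient between the inputs.
r_rising = np.corrcoef(x, rising)[0, 1]
r_falling = np.corrcoef(x, falling)[0, 1]
r_no_trend = np.corrcoef(x, no_trend)[0, 1]

print(r_rising, r_falling, r_no_trend)
```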
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
Q.no 50. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 51. The plot_number parameter of the subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 52. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 53. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 54. In this type of clustering, instead of putting each data point into a separate
cluster, a probability or likelihood of that data point being in those clusters is
assigned.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 56. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 58. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 59. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 60. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : imsave()
B : imread()
C : read()
D : None of these
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 7. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : save image
B : read image
C : copy image
D : show image
Q.no 10. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. Which of the following is a measure used in decision trees when selecting
the splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
Q.no 13. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : imsave()
B : imread()
C : save()
D : isave()
Q.no 16. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. ------- answers the question "What will happen in future?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. The ---------------- method of a DataFrame returns its first n rows.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 19. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 21. -------- uses a tree structure to specify a sequence of decisions and
consequences.
A : KNN
B : Naïve Bayes
C : Regression
D : Decision Tree
Q.no 22. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 23. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 24. ---------- function is used to get the elementwise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 25. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Java
B : Ruby
C:R
D : None of these
Q.no 27. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
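The procedure described in this question (store the training data, then look up the closest stored points for a new query) can be sketched in a few lines. This is an illustrative sketch, not a library implementation: the function knn_predict, the choice of Euclidean distance, k=3 and the toy data are all assumptions.

```python
from collections import Counter

import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k closest training points."""
    distances = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance to every stored point
    nearest = np.argsort(distances)[:k]                   # positions of the k smallest distances
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# "Training" is nothing more than keeping these arrays around.
train_X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, np.array([1.1, 1.0])))
print(knn_predict(train_X, train_y, np.array([5.0, 4.8])))
```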
Q.no 28. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 29. Among the following clustering algorithm types, in which type is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 32. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 35. ----------- analysis finds the reasons behind success or failure in the past.
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 37. Support(B) =
Q.no 38. -----------is not one of the key data science skill.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 39. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
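The three measures named in the options can be computed directly from a list of transactions. This is a minimal sketch using the standard definitions; the transactions and the helper names support, confidence and lift are made up for illustration.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent is present."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence of the rule relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
print(lift({"bread"}, {"milk"}))
```

A lift below 1 in this toy data means that bread buyers pick up milk slightly less often than shoppers in general.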
Q.no 40. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 42. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 44. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 45. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 48. Which NumPy function is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 49. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 50. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 51. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 53. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 54. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 55. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
Q.no 56. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 57. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 58. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 60. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 3. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 4. ------------ plots show all individual data points without connecting them
with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A:1
B : -1
C:0
D:2
Q.no 8. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 10. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Bayes Theorem
B : Pythagorean Theorem
Q.no 12. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 13. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : imsave()
B : imread()
C : save()
D : isave()
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 16. ---------------- library from python provides efficient versions of a large
number of machine learning algorithms.
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 18. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. ---------------- is about developing code to enable the machine to learn to
perform tasks, and its basic principle is the automatic modeling of the underlying
processes that have generated the collected data.
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 21. -------- is the measure of the likelihood that an event will occur in a
random experiment.
A : Probability
B : Correlation
C : Regression
D : Sample
B : Support
C : Confidence
D : lift
A : 3
B : 5
C : 1
D : 10
A : ndimage
B : ndarray
C : signal
D : io
Q.no 25. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 26. ---------- function is used to get the element-wise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 28. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 29. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 31. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 32. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 33. Among the following clustering algorithm types in which of the following
type the notion of similarity is derived by the closeness of a data point to the
centroid of the clusters.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : 0
B : -1
C : 1
D : -2
Q.no 35. ------- changes the arrangement of items in an array so that the shape of the
array changes while maintaining the same number of dimensions.
A : numpy.reshape()
B : numpy.empty()
C : numpy.flatten()
D : numpy.ravel()
B : YouTube data
A : KNN
C : Decision trees
D : Cluster analysis
A : XML data
B : YouTube data
A : class distribution
B : test on an attribute
D : class labels
Q.no 41. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 44. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 47. To reach to the final point and to make prediction , decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 48. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 49. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 51. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 52. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 54. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 57. Determining the basic salary of an employee when their qualification is given is
a ----------- problem.
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 58. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3 x 4, 5
C : 3 x 5, 4
D : 5 x 3, 4
Q.no 59. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Answer for Question No 1. is a
MCQ No - 2
What are the main components of Big Data?
(A) MapReduce
(B) HDFS
(C) YARN
(D) All of these
Answer
D
MCQ No - 3
What are the different features of Big Data Analytics?
(A) Open-Source
(B) Scalability
(C) Data Recovery
(D) All the above
Answer
D
MCQ No - 4
According to analysts, for what can traditional IT systems provide a foundation when
they’re integrated with big data technologies like Hadoop?
(A) Big data management and data mining
(B) Data warehousing and business intelligence
(C) Management of Hadoop clusters
(D) Collecting and storing unstructured data
Answer
A
MCQ No - 5
What are the four V’s of Big Data?
(A) Volume
(B) Velocity
OptimusPrime Page 1
(C) Variety
(D) All the above
Answer
D
Answer
B
MCQ No - 7
___________ is general-purpose computing model and runtime system for distributed data
analytics.
(A) Mapreduce
(B) Drill
(C) Oozie
(D) None of the above
Answer
A
MCQ No - 8
The examination of large amounts of data to see what patterns or other useful information
can be found is known as
(A) Data examination
(B) Information analysis
(C) Big data analytics
(D) Data analysis
Answer
C
MCQ No - 9
Big data analysis does the following except
(A) Collects data
(B) Spreads data
(C) Organizes data
(D) Analyzes data
Answer
B
MCQ No - 10
What makes Big Data analysis difficult to optimize?
(A) Big Data is not difficult to optimize
(B) Both data and cost effective ways to mine data to make business sense out of it
(C) The technology to mine data
(D) All of the above
Answer
B
MCQ No - 11
The new source of big data that will trigger a Big Data revolution in the years to come is
(A) Business transactions
(B) Social media
(C) Transactional data and sensor data
(D) RDBMS
Answer
C
MCQ No - 12
The unit of data that flows through a Flume agent is
(A) Log
(B) Row
(C) Event
(D) Record
Answer
C
MCQ No - 13
Listed below are the three steps that are followed to deploy a Big Data Solution except
(A) Data Ingestion
(B) Data Processing
(C) Data dissemination
(D) Data Storage
Answer
C
MCQ No - 14
Choose the best answer: which industries employ so-called "Big Data"
in their day-to-day operations?
(A) Weather forecasting
(B) Marketing
(C) Healthcare
(D) All of the above
Answer
D
MCQ No - 15
There are almost as many bits of information in the digital universe as there are stars in
the actual universe.
(A) True
(B) False
Answer
A
MCQ No - 16
The word 'Big data' was coined by
(A) Roger Mougalas
(B) John Philips
(C) Simon Woods
(D) Martin Green
Answer
A
MCQ No - 17
The word 'Big Data' was coined in the year
(A) 2000
(B) 1970
(C) 1998
(D) 2005
Answer
C
MCQ No - 18
Concerning the Forms of Big Data, which one of these is odd?
(A) Structured
(B) Unstructured
(C) Processed
(D) Semi-Structured
Answer
C
MCQ No - 19
Big Data applications benefit the media and entertainment industry by
(A) Predicting what the audience wants
(B) Ad targeting
(C) Scheduling optimization
(D) All of the above
Answer
D
MCQ No - 20
The feature of big data that refers to the quality of the stored data is ______
(A) Variety
(B) Volume
(C) Variability
(D) Veracity
Answer
D
Question 1
a) The distance between categories is equal across the range of interval/ratio data.
Question 2
Question 3
Question 4
Correct answer:
b) It summarizes the frequencies of two variables so that they can be compared.
Question 5
If there were a perfect positive correlation between two interval/ratio variables, the
Pearson's r test would give a correlation coefficient of:
Correct answer:
b) +1
Question 6
What is the name of the test that is used to assess the relationship between two ordinal variables?
Correct answer:
a) Spearman's rho
Question 7
Correct answer:
d) All of the above.
Question 8
Correct answer:
c) A relationship that appears to be true because each variable is related to a third one.
Question 9
Correct answer:
d) generalising their findings from the sample to the population.
Question 10
---------------------------------------------------------------------------------------------------------------------
SET 2 MCQs
---------------------------------------------------------------------------------------------------------------------
2. Which of the following is not a major data analysis approach?
A. Data Mining
B. Predictive Intelligence
C. Business Intelligence
D. Text Analytics
Ans : B
4. In descriptive statistics, data from the entire population or a sample is summarized with ?
A. integer descriptors
B. floating descriptors
C. numerical descriptors
D. decimal descriptors
Ans : C
7. The goal of business intelligence is to allow easy interpretation of large volumes of data to
identify new opportunities.
A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
Ans : A
8. The branch of statistics which deals with development of particular statistical methods is
classified as
A. industry statistics
B. economic statistics
C. applied statistics
D. applied statistics
Ans : D
---------------------------------------------------------------------------------------------------------------------
SET 3 MCQs
---------------------------------------------------------------------------------------------------------------------
Two group means are equal.
What must you include when applying Wilcoxon Rank sum test?
Answer
“Critical Value”, “Rank sum”
What are the two types of variance which can occur in your data?
Answer
Between and within groups
Clustering extracts the known patterns from the existing data, True or False?
Answer
False
bottom-up
K Means is _____
Answer
Centroid based method
WSS metric is the sum of the squares of the distances between each data point and the_____.
Answer
closest centroid
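The WSS definition above can be sketched in a few lines of Python. This is only an illustration: the function name wss, the sample points and the centroid positions are made up for the example and do not come from any particular library.

```python
def wss(points, centroids):
    """Within sum of squares: every point contributes the squared
    Euclidean distance to its closest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))  # squared distance to one centroid
            for c in centroids
        )
    return total

# Hypothetical data: two tight groups, far apart on the x-axis.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

wss_k1 = wss(points, [(5, 1)])            # one centroid in the middle
wss_k2 = wss(points, [(0, 1), (10, 1)])   # one centroid per group

print(wss_k1, wss_k2)
```

Adding the second centroid makes WSS drop sharply here, which is the kind of change one looks for when plotting WSS against k.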
Once the clusters are identified, it is often useful to label them in a descriptive way.True or
False?
Answer
True
The process of identifying the appropriate value of k is referred to as finding the_____.
Answer
elbow
A _____ is a decision support tool that uses a tree-like graph or model of decisions and their
possible consequences, including chance event outcomes, resource costs, and utility
Answer
Decision tree
What is true about Data Visualization?
Answer
All of the above
Which one of the following is most basic and commonly used techniques?
Answer
“Line charts”
When a client contacts the namenode for accessing a file, the namenode responds with____
Answer
Block Id and hostname of all the data nodes containing that block.
---------------------------------------------------------------------------------------------------------------------
SET 4 MCQs
---------------------------------------------------------------------------------------------------------------------
2)
According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a Explanation: Big data management and data mining
3)
What are the main components of Big Data?
a)MapReduce
b)HDFS
c)YARN
d)All of these
Ans: d Explanation: All of these
4)
The sources of Big Data are
a)Stock Exchange
b)Transport Data
c) Banking Data
d) All of the Above
Ans: d Explanation:
5)
Big Data Characteristics are:
a) Structured data
b) Semi-structured data
c) Quasi-structured data
d) All of the above
Ans: d Explanation:
6)
BI tends to provide reports, dashboards, and queries on business questions for the current period
or in the past.
a) True
b) False
Ans: a Explanation:
7)
Big data can come in multiple forms, including structured and non-structured data
a) True
b) False
Ans: a Explanation:
8)
BI problems tend to require highly structured data organized
a) Rows
b) Columns
c) Accurate Reporting
d) All of the Above
Ans: d Explanation:
9)
EDW achieves the objective of reporting and sometimes the creation of dashboards, perform
analysis on unstructured data
a) High-value data is hard to reach and leverage
b) Data moves in batches from EDW to local analytical tools
c) Data Science projects will remain isolated
d) All of the Above
Ans: d Explanation:
10)
Drivers of Big Data
a) Medical information
b) Photos and video footage uploaded to the World Wide Web
c) data extracts
d) Both a and b
Ans: d Explanation:
11)
According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Ans: a Explanation:
12)
Select from option which is not the phase of data analytics
a) model planning
b) testing
c) discovery
d) operationalize
Ans: b Explanation:
13)
Which phase of data analytics require more time to complete
a) Data preparation
b) model building
c) communicate results
d) Discovery
Ans: a Explanation:
14)
What is analytic sandbox?
a) Tool
b) Separate repository
c) data cleaning
d) Data conditioning
Ans: b Explanation:
15)
The person who provides analytic techniques and modeling is called the:
a) Data Engineer
b) Data scientist
c) Business user
d) Project manager
Ans: b Explanation:
16)
What is task of Project manager?
a) analytic modelling
b) Provide requirement
c) ensure meeting objectives
d) creates DB environment
Ans: c
17)
In which phase is the task of identifying key stakeholders performed?
a) Data preparation
b) model building
c) Discovery
d) communicate results
Ans: c Explanation:
18)
ETL process is performed in which phase
a) Discovery
b) communicate results
c) model planning
d) Data preparation
Ans: d Explanation:
19)
How much data do data science teams prefer for analysis?
a) too little
b) average
c) more
d) more than average
Ans: c Explanation:
20)
Select the tool that is not used in the model planning phase.
a) Data wrangler
b) R
c) SQL Analysis service
d) SAS/ACCESS
Ans: c Explanation:
21)
If reports and dashboards will be impacted and need to change, this task is performed by:
a) Project sponsor
b) BI Analyst
c) Data Engineer
d) Project manager
Ans: b Explanation:
22)
What is the need for the data analytics lifecycle?
a) Data cleaning
b) To solve Big data problems
c) Data conditioning
d) Data Exploration
Ans: b Explanation:
23)
How many phases are there in data analytic lifecycle?
a) 4
b) 5
c) 6
d) 7
Ans: c
24)
The person with technical skills is called as?
a) Business user
b) Data Engineer
c) Data scientist
d) Project sponsor
Ans: b
25)
What is outcome of Model building phase?
a) Analytic results
b) Quality data
c) Data
d) Potential resources
Ans: a
2)
A hypothesis that is tested for possible rejection under the assumption that it is true is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: a Explanation:
3)
A statement whose validity is tested on the basis of a sample is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: b Explanation:
4)
A hypothesis which defines the population distribution is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
Ans: c Explanation:
5)
If the null hypothesis is false then which of the following is accepted?
a) Null Hypothesis
b) Positive Hypothesis
c) Negative Hypothesis
d) Alternative Hypothesis.
Ans: d Explanation:
6)
The probability of rejecting the Null Hypothesis when it is true is called?
a) Level of Confidence
b) Level of Significance
c) Level of Margin
d) Level of Rejection
Ans: b Explanation:
7)
The point where the Null Hypothesis gets rejected is called as?
a) Significant Value
b) Rejection Value
c) Acceptance Value
d) Critical Value
Ans: d Explanation:
8)
If the Critical region is evenly distributed then the test is referred as?
a) Two tailed
b) One tailed
c) Three tailed
d) Zero tailed
Ans: a Explanation:
9)
The type of test is defined by which of the following?
a) Null Hypothesis
b) Simple Hypothesis
c) Alternative Hypothesis
d) Composite Hypothesis
Ans: c Explanation:
10)
Which of the following is defined as the rule or formula to test a Null Hypothesis?
a) Test statistic
b) Population statistic
c) Variance statistic
d) Null statistic
Ans: a Explanation:
11)
Type 1 error occurs when?
a) We reject H0 if it is True
b) We reject H0 if it is False
c) We accept H0 if it is True
d) We accept H0 if it is False
Ans: a Explanation:
12) The probability of Type 1 error is referred as?
a) 1-α
b) β
c) α
d) 1-β
Ans: c Explanation:
13)
Alternative Hypothesis is also called as?
a) Composite hypothesis
b) Research Hypothesis
c) Simple Hypothesis
d) Null Hypothesis
Ans: b Explanation:
14)
Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans: d Explanation:
15)
Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
Ans: c Explanation:
16)
Hierarchical clustering should be primarily used for exploration.
a) True
b) False
Ans: a Explanation:
17)
Which of the following function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
Ans: a Explanation:
18)
Which of the following clustering requires merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
Ans: b Explanation:
19)
K-means is not deterministic and it also consists of number of iterations.
a) True
b) False
Ans: a
20)
Depending on acceptance and rejection of null hypothesis there are 2 types of error produced
a) Type 1
b) Type 2
c) None of these
d) All of these
Ans: d
21)
The power of a test can be defined as a possibility of …
a) Rejecting null hypothesis
b) Accepting null hypothesis
c) Increasing null hypothesis
d) Decreasing null hypothesis
Ans: a
22)
For a fixed significance level, a greater sample size is mandatory to discover a
a) Minor difference in mean
b) Major difference in mean
c) Average difference in mean
d) None of the above
Ans: a
23)
ANOVA tests if any of the population means vary from other population means
a) True
b) False
Ans: a
24)
Clustering is defined as group of same kind of objects which are gathered by use of
a) Unsupervised method
b) Supervised method
c) Semi supervised method
d) None of these
Ans: a
25)
Following are the applications of Kmeans
a) Image Processing
b) Medical
c) Customer Segmentation
d) All of the above
Ans: d
---------------------------------------------------------------------------------------------------------------------
SET 5 MCQs
---------------------------------------------------------------------------------------------------------------------
Explanation: data in Peta bytes i.e. 10^15 byte size is called Big Data.
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
7. What percentage of the world’s total data has been created within just the past two
years?
A. 80%
B. 85%
C. 90%
D. 95%
Ans : C
Explanation: 90% of the world’s total data has been created within just the past two years.
8) Which of the following step is performed by data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Ans: Data Cleansing
10. Being communicative and collaborative is one of the key skill sets and behavioral
characteristics of a data scientist [True / False]?
a. True
b. False
Answer : a
11. ---------- are the sources of Bigdata [select all that apply]
I. Book
II. Facebook
III. Genome sequence
IV. Video Surveillance
Ans:
12. BI analyses past data and makes future predictions. True/False?
a. True
b. False
Answer : b
12. In which phase of data analytics is ETLT performed?
A. Discovery
B. Model Planning
C. Model Building
D. Data Preparation
Ans: Phase 2. Data preparation is done in this phase. An analytical sandbox is used in this
phase to perform analytics for the entire duration of the project. While you explore,
preprocess and condition data, modeling follows suit. To get the data into the sandbox, you
will perform ETLT (extract, transform, load and transform).
14. In which phase would the team expect to invest most of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
15. In which phase would the team expect to invest least time of the project time?
A. Data Preparation
B. Model Planning
C. Model Building
D. Discovery
16. Which of the following tools is used for model building?
a. Hadoop b. Octave c. OpenRefine d. All of Above
Ans: B
17. Which of the following tools is used for data preparation?
a. Alpine Miner b. Excel c. Matlab d. Weka
Ans: A
18. Determining whether the project was completed on time and within budget is the key role of
_____
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
20. In the data analytics life cycle, we can move back and refine the work done. True or False?
A. True
B. False
22. ________ provides subject matter expertise for analytical techniques, data modeling, and
applying valid analytical techniques to given business problems.
A. Project Sponsor
B. Project Manager
C. Data Engineer
D. Data Scientist
---------------------------------------------------------------------------------------------------------------------
SET 5 MCQs
---------------------------------------------------------------------------------------------------------------------
2. Any hypothesis which is tested for the purpose of rejection under the assumption that it is true
is called:
(a) Null hypothesis
(b) Alternative hypothesis
(c) Statistical hypothesis
(d) Composite hypothesis
Answer : a
3. A statement that is accepted if the sample data provide sufficient evidence that the null
hypothesis is false is called:
(a) Simple hypothesis
(b) Composite hypothesis
(c) Statistical hypothesis
(d) Alternative hypothesis
Answer : d
6. If the critical region is located equally on both sides of the sampling distribution of the
test statistic, the test is called:
(a) One tailed
(b) Two tailed
(c) Right tailed
(d) Left tailed
Answer : b
(d) Difficult to tell
Answer : b
10. A formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic
(b) Population statistic
(c) Both of these
(d) None of the above
Answer : a
14. In an unpaired samples t-test with sample sizes n1 = 11 and n2 = 11, the value of tabulated t
should be obtained for:
(a) 10 degrees of freedom
(b) 21 degrees of freedom
(c) 22 degrees of freedom
(d) 20 degrees of freedom
Answer : d
15. The purpose of statistical inference is:
(a) To collect sample data and use them to formulate hypotheses about a population
(b) To draw conclusions about populations and then collect sample data to support the
conclusions
(c) To draw conclusions about populations from sample data
(d) To draw conclusions about the known value of population parameter
Answer : c
16. The histogram to the right represents the hospital length of stay (in days) for patients at a
nearby medical facility. How many patients are included in the histogram?
a. 5
b. 21
c. 17
d. 9
Answer : b
17. Using the histogram to the right that represents the hospital lengths of stay (in days) for
patients at a nearby medical facility, determine the relationship between the mean and the median.
a. Mean = Median
b. Mean ≈ Median
c. Mean < Median
d. Mean > Median
Answer : d
18. The statement “If there is sufficient evidence to reject a null hypothesis at the 10%
significance level, then there is sufficient evidence to reject it at the 5% significance level” :
Please select the best answer of those provided below.
a. Always True
b. Never True
c. Sometimes True; the p-value for the statistical test needs to be provided for a conclusion
d. Not Enough Information; this would depend on the type of statistical test used
Answer : c
d) all of the mentioned
Ans: defined distance metric, number of clusters, initial guess as to cluster centroids
25) Considering the K-means algorithm, after current iteration, we have 3 centroids (0, 1) (2, 1),
(-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?
a) Yes
b) No
Ans: Yes
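The answer above can be checked with a short sketch of the K-means assignment step, assuming Euclidean distance (the usual choice, though the question does not state it). The helper names squared_distance and nearest_centroid are invented for this illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (the K-means assignment step)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

centroids = [(0, 1), (2, 1), (-1, 2)]       # centroids from the question
query_points = [(2, 3), (2, 0.5)]           # points from the question

assignments = [nearest_centroid(p, centroids) for p in query_points]
print(assignments)  # → [1, 1]
```

Both points are closest to the centroid (2, 1), so they land in the same cluster in the next iteration.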
27) The most commonly used measure of similarity is the _____ or its square.
a)euclidean distance
b)city-block distance
c)Chebychev’s distance
d)Manhattan distance
Ans: euclidean distance
30) Clustering is a-
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. None
Ans: Unsupervised learning
31) Which of the following clustering algorithms suffers from the problem of convergence at
local optima?
A. K- Means clustering
B. Hierarchical clustering
C. Diverse clustering
D. All of the above
Ans: K- Means clustering, Hierarchical clustering, Diverse clustering
33) Which of the following is a bad characteristic of a dataset for clustering analysis-
A. Data points with outliers
B. Data points with different densities
C. Data points with non-convex shapes
D. All of the above
Ans: Data points with outliers, Data points with different densities, Data points with non-convex
shapes
B. Unlabeled data
C. Numerical data
D. Categorical data
Ans: Labeled Data
42. A Type 2 error is also called a
a. False Positive
b. False negative
c. True Positive
d. True negative
Q.25 What are the two types of variance which can occur in your data?
a. Independent and Dependent
b. Between and within groups
c. Personal and interpersonal
d. ANOVA and ANCOVA
Q.26 If the between-group mean sum of squares increases, the value of the F statistic _____
a. Increases
b. Decreases
c. Neutral
d. None of these
---------------------------------------------------------------------------------------------------------------------
SET 6 MCQs
---------------------------------------------------------------------------------------------------------------------
3. An itemset whose support is greater than or equal to a minimum support threshold is ______
(A) Itemset
(B) Frequent Itemset
(C) Infrequent items
(D) Threshold values
Ans:B
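To make the definition concrete, here is a minimal brute-force sketch that keeps only the itemsets whose support meets a threshold. It is not the Apriori algorithm (there is no candidate pruning), and the function name frequent_itemsets and the toy transactions are assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, size):
    """Return {itemset: support} for all itemsets of the given size whose
    support (fraction of transactions containing them) is >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for candidate in combinations(items, size):
        count = sum(1 for t in transactions if set(candidate) <= t)
        support = count / n
        if support >= min_support:
            result[candidate] = support
    return result

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

pairs = frequent_itemsets(transactions, min_support=0.5, size=2)
print(pairs)
```

With a threshold of 0.5, the pairs (bread, butter) and (bread, milk) are frequent, while (butter, milk) appears in only one of four transactions and is dropped.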
8. Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans:A
9. A local retailer has a database that stores 10,000 transactions of last summer. After
analyzing the data, a data science team has identified the following statistics:
• {battery} appears in 6,000 transactions.
• {sunscreen} appears in 5,000 transactions.
• {sandals} appears in 4,000 transactions.
• {bowls} appears in 2,000 transactions.
• {battery, sunscreen} appears in 1,500 transactions.
• {battery, sandals} appears in 1,000 transactions.
• {battery, bowls} appears in 250 transactions.
• {battery, sunscreen, sandals} appears in 600 transactions.
Q) What are the confidence values of {battery} -> {sunscreen} and {battery, sunscreen} -> {sandals}?
a) 0.3 and 0.4
b) 0.25 and 0.4
c) 0.25 and 0.15
d) 0.6 and 0.4
Ans: b
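The answer follows from the standard definition confidence(X -> Y) = count(X and Y together) / count(X). The sketch below reproduces the arithmetic from the counts given in the question; the dictionary layout and the function name confidence are illustrative choices only.

```python
# Transaction counts taken from the question statement.
support_count = {
    frozenset({"battery"}): 6000,
    frozenset({"battery", "sunscreen"}): 1500,
    frozenset({"battery", "sunscreen", "sandals"}): 600,
}

def confidence(antecedent, consequent):
    """confidence(X -> Y) = count(X union Y) / count(X)."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    return support_count[xy] / support_count[x]

c1 = confidence({"battery"}, {"sunscreen"})             # 1500 / 6000
c2 = confidence({"battery", "sunscreen"}, {"sandals"})  # 600 / 1500

print(c1, c2)  # → 0.25 0.4
```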
11. If a Linear regression model fits perfectly, i.e., train error is zero, then
_____________________
a) Test error is also always zero
b) Test error is non zero
c) Couldn’t comment on Test error
d) Test error is equal to Train error
Ans:C
12. Which of the following metrics can be used for evaluating regression models?
i) R Squared
ii) Adjusted R Squared
iii) F Statistics
iv) RMSE / MSE / MAE
a) ii and iv
b) i and ii
c) ii, iii and iv
d) i, ii, iii and iv
Ans:d
13. How many coefficients do you need to estimate in a simple linear regression model (One
independent variable)?
a) 1
b) 2
c) 3
d) 4
Ans:b
14. In a simple linear regression model (one independent variable), if we change the input
variable by 1 unit, how much will the output variable change?
a) by 1
b) no change
c) by intercept
d) by its slope
Ans:d
17. In the mathematical equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to
__________
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (slope, Y-Intercept)
Ans:c
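As a small illustration of the last few questions, the sketch below fits Y = β1 + β2X by ordinary least squares using the closed-form formulas. The function name, the predict helper and the sample data (generated from y = 3 + 2x with no noise) are assumptions made for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical noise-free data lying exactly on y = 3 + 2x.
xs = [1, 2, 3, 4]
ys = [5, 7, 9, 11]

beta1, beta2 = fit_simple_linear_regression(xs, ys)  # (Y-intercept, slope)

def predict(x):
    return beta1 + beta2 * x

change_per_unit = predict(11) - predict(10)
print(beta1, beta2, change_per_unit)  # → 3.0 2.0 2.0
```

Two coefficients are estimated, β1 is the Y-intercept and β2 the slope, and raising x by one unit changes the prediction by exactly the slope.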
19. The square of the correlation coefficient, r², is never negative and is called the
________
a) Regression
b) Coefficient of determination
c) KNN
d) Algorithm
Ans:b
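A quick numeric check, using made-up data with a negative linear trend, shows that r can be negative while its square stays between 0 and 1. The function name pearson_r and the sample values are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y falls as x rises.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]

r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination for simple linear regression

print(round(r, 4), round(r_squared, 3))  # → -0.9839 0.968
```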
20. Predicting y for a value of x that’s outside the range of values we actually saw for x in the
original data is called ___________
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:b
21. What is predicting y for a value of x that is within the interval of points that we saw in the
original data called?
a) Regression
b) Extrapolation
c) Interpolation
d) Polation
Ans:c
22. ________ is a simple approach to supervised learning. It assumes that the dependence of Y
on X1, X2, . . . Xp is linear.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
23. Although it may seem overly simplistic, _______ is extremely useful both conceptually and
practically.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
24. __________ refers to a group of techniques for fitting and studying the straight- line
relationship between two variables.
a) Linear regression
b) Logistic regression
c) Gradient Descent
d) Greedy algorithms
Ans:a
---------------------------------------------------------------------------------------------------------------------
SET 7 MCQs
---------------------------------------------------------------------------------------------------------------------
B. cleaning data
C. transforming data
D. All of the above
Ans : D
4. In descriptive statistics, data from the entire population or a sample is summarized with ?
A. integer descriptors B. floating descriptors C. numerical descriptors D. decimal descriptors
Ans : C
7. The goal of business intelligence is to allow easy interpretation of large volumes of data to
identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
8. The branch of statistics which deals with development of particular statistical methods is
classified as
A. industry statistics B. economic statistics C. applied statistics D. applied statistics
Ans : D
B. estimating numerical characteristics of the data
C. modeling relationships within the data
D. describing associations within the data
Ans : C
1. What is a hypothesis?
a. A statement that the researcher wants to test through the data
collected in a study.
b. A research question the results will answer.
c. A theory that underpins the study.
d. A statistical method for calculating the extent to which the results
could have happened by chance.
Answer: a
d. Coding
Answer: c
12. A set of data organised in a participants (rows)-by-variables (columns) format is known as a “data set.”
a. True
b. False
Answer: a
13. A graph that uses vertical bars to represent data is called a ___
a. Line graph
b. Bar graph
c. Scatterplot
d. Vertical graph
Answer: b
14. ___________ are used when you want to visually examine the
relationship between two quantitative variables.
a. Bar graphs
b. Pie graphs
c. Line graphs
d. Scatterplots
Answer: d
b. Statistical Hypothesis
c. Simple Hypothesis
d. Composite Hypothesis
Answer: a
---------------------------------------------------------------------------------------------------------------------
SET 8 MCQs
---------------------------------------------------------------------------------------------------------------------
D - FIFO scheduler.
Q 12 - Which one of the following stores data?
A - Name node
B - Data node
C - Master node
D - None of these
Q 14 - What is AVRO?
A - Avro is a java serialization library.
B - Avro is a java compression library.
C - Avro is a java library that create split table files.
D - None of these answers are correct.
Q 17 - What is writable?
A - Writable is a java interface that needs to be implemented for streaming data to remote
servers.
B - Writable is a java interface that needs to be implemented for HDFS writes.
C - Writable is a java interface that needs to be implemented for MapReduce processing.
D - None of these answers are correct.
Q 18 - What is HBASE?
A - Hbase is separate set of the Java API for Hadoop cluster.
B - Hbase is a part of the Apache Hadoop project that provides interface for scanning large
amount of data using Hadoop infrastructure.
D - HBase is a part of the Apache Hadoop project that provides a SQL like interface for data
processing.
Q 20 - When using HDFS, what occurs when a file is deleted from the command line?
A - It is permanently deleted if trash is enabled.
B - It is placed into a trash directory common to all users for that cluster.
C - It is permanently deleted and the file attributes are recorded in a log file.
D - It is moved into the trash directory of the user who deleted it if trash is enabled.
Q 21 - When archiving Hadoop files, which of the following statements are true?
Choose two answers
1. Archived files will display with the extension .arc.
2. Many small files will become fewer large files.
3. MapReduce processes the original files names even after files are archived.
4. Archived files must be unarchived for HDFS and MapReduce to access the
original, small files.
5. Archive is intended for files that need to be saved but no longer accessed by
HDFS.
A-1&3
B-2&3
C-2&4
D-3&4
Q 22 - When writing data to HDFS what is true if the replication factor is three?
Choose 2 answers
1. Data is written to DataNodes on three separate racks if Rack Aware.
2. The Data is stored on each DataNode with a separate file which contains a
checksum value.
3. Data is written to blocks on three different DataNodes.
4. The Client is returned with a success upon the successful writing of the first
block and checksum check.
A-1&3
B-2&3
C-3&4
D-1&4
Q 23 - Which of the following are among the duties of the Data Nodes in HDFS?
A - Maintain the file system tree and metadata for all files and directories.
B - None of the options is correct.
C - Control the execution of an individual map task or a reduce task.
D - Store and retrieve blocks when told to by clients or the NameNode.
E - Manage the file system namespace.
Q 24 - Which of the following components retrieves the input splits directly from
HDFS to determine the number of map tasks?
A - The NameNode.
B - The TaskTrackers.
C - The JobClient.
D - The JobTracker.
E - None of the options is correct.
Q 27 - Which one of the following statements is false regarding the Distributed Cache?
A - The Hadoop framework will ensure that any files in the Distributed Cache are distributed to
all map and reduce tasks.
B - The files in the cache can be text files, or they can be archive files like zip and JAR files.
C - Disk I/O is avoided because data in the cache is stored in memory.
D - The Hadoop framework will copy the files in the Distributed Cache on to the slave node
before any tasks for the job are executed on that node.
A - Compare the keys by byte.
B - Performance can be improved in the sort and shuffle phase by using RawComparator.
C - Intermediary keys are deserialized to perform a comparison.
Q 31 - Keys from the output of shuffle and sort implement which of the following
interface?
A - Writable.
B - WritableComparable.
C - Configurable.
D - ComparableWritable.
E - Comparable.
Answer Key :
1 A
2 B
3 A
4 A
5 D
6 B
7 A
8 B
9 C
10 D
11 D
12 B
13 A
14 A
15 A
16 B
17 C
18 B
19 C
20 C
21 B
22 C
23 D
24 D
25 A
26 B
27 C
28 B
29 C
30 D
31 B
32 C
---------------------------------------------------------------------------------------------------------------------
Q 4 - Which scenario demands highest bandwidth for data transfer between nodes in
Hadoop?
A - Different nodes on the same rack
B - Nodes on different racks in the same data center.
C - Nodes in different data centers
D - Data on the same node.
Q 5 - The current block location of HDFS where data is being written to,
A - is visible to the client requesting for it.
B - Block locations are never visible to client requests.
C - May or may not be visible to the reader.
D - becomes visible only after the buffered data is committed.
Q 10 - The hdfs command to create the copy of a file from a local system is
A - CopyFromLocal
B - copyfromlocal
C - CopyLocal
D - copyFromLocal
Q 13 - When the namenode finds that some blocks are over replicated, it
A - Stops the replication job in the entire hdfs file system.
B - It slows down the replication process for those blocks
C - It deletes the extra blocks.
D - It leaves the extra blocks as it is.
Q 19 - The information mapping data blocks with their corresponding files is stored in
A - Data node
B - Job Tracker
C - Task Tracker
D - Namenode
Q 20 - The file in Namenode which stores the information mapping the data block
location with file name is −
A - dfsimage
B - nameimage
C - fsimage
D - image
Q 21 - The namenode knows that the datanode is active using a mechanism known as
A - heartbeats
B - datapulse
C - h-signal
D - Active-pulse
Q 24 - Which of the below apache system deals with ingesting streaming data to
hadoop
A - Oozie
B - Kafka
C - Flume
D - Hive
D - Report the activity of various components handled by resource manager
Q 28 - The Zookeeper
A - Detects the failure of the namenode and elects a new namenode.
B - Detects the failure of datanodes and elects a new datanode.
C - Prevents the hardware from overheating by shutting them down.
D - Maintains a list of all the components IP address of the Hadoop cluster.
Q 30 - When a client contacts the namenode for accessing a file, the namenode
responds with
A - Size of the file requested.
B - Block ID of the file requested.
C - Block ID and hostname of any one of the data nodes containing that block.
D - Block ID and hostname of all the data nodes containing that block.
Q 32 - The Hadoop tool used for uniformly spreading the data across the data nodes is
named −
A - Scheduler
B - Balancer
C - Spreader
D - Reporter
Answer Key :
1 B
2 A
3 C
4 C
5 D
6 A
7 B
8 B
9 C
10 D
11 B
12 D
13 C
14 B
15 A
16 C
17 D
18 B
19 D
20 C
21 A
22 A
23 B
24 C
25 B
26 B
27 B
28 A
29 B
30 D
31 D
32 B
33 A
---------------------------------------------------------------------------------------------------------------------
B - It is aware of the mapping between the node and the rack
C - It is aware of the number of nodes in each of the rack
D - It is aware which data nodes are unavailable in the cluster.
Q 11 - Running start-dfs.sh results in
A - Starting namenode and datanode
B - Starting namenode only
C - Starting datanode only
D - Starting namenode and resource manager
Q 15 - hadoop fs -expunge
A - Gives the list of datanodes
B - Used to delete a file
C - Used to exchange a file between two datanodes.
D - Empties the trash.
Q 18 - The command used to copy a directory from one node to another in HDFS is
A - rcp
B - dcp
C - drcp
D - distcp
Q 23 - When you increase the number of files stored in HDFS, The memory required by
namenode
A - Increases
B - Decreases
C - Remains unchanged
D - May increase or decrease
Q 24 - If we increase the size of files stored in HDFS without increasing the number of
files, then the memory required by namenode
A - Decreases
B - Increases
C - Remains unchanged
D - May or may not increase
Q 26 - The decommission feature in hadoop is used for
A - Decommissioning the namenode
B - Decommissioning the data nodes
C - Decommissioning the secondary namenode.
D - Decommissioning the entire Hadoop cluster.
Q 27 - You can reserve the amount of disk usage in a data node by configuring the
dfs.datanode.du.reserved in which of the following file
A - Hdfs-site.xml
B - Hdfs-default.xml
C - Core-site.xml
D - Mapred-site.xml
Q 28 - The namenode loses its only copy of fsimage file. We can recover this from
A - Datanodes
B - Secondary namenode
C - Checkpoint node
D - Never
Q 29 - In a HDFS system with block size 64MB we store a file which is less than 64MB.
Which of the following is true?
A - The file will consume 64MB
B - The file will consume more than 64MB
C - The file will consume less than 64MB.
D - Can not be predicted.
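For question 29, a rough sketch of how block count and consumed space relate to file size may help. This is a simplified model under stated assumptions: it ignores checksum and metadata overhead, and the function name and the default replication of 1 are choices made for this example.

```python
import math


def hdfs_usage(file_size_mb, block_size_mb=64, replication=1):
    """Return (number of blocks, raw storage in MB) for one file.

    Simplified model: a block only occupies as much disk space as the data
    it actually holds, so a small file does not consume a full block.
    """
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    raw_storage_mb = file_size_mb * replication
    return num_blocks, raw_storage_mb


print(hdfs_usage(10))                  # one block, 10 MB on disk
print(hdfs_usage(200))                 # four blocks, 200 MB on disk
print(hdfs_usage(10, replication=3))   # one block per replica, 30 MB in total
```

The block size is an upper bound per block rather than a fixed allocation, which is why a file smaller than 64 MB consumes less than 64 MB.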
B - Tasktracker to Job tracker
C - Jobtracker to namenode
D - Tasktracker to namenode
Answer Key :
1 C
2 A
3 B
4 B
5 B
6 A
7 B
8 D
9 B
10 A
11 A
12 C
13 D
14 C
15 D
16 A
17 C
18 D
19 B
20 C
21 D
22 D
23 A
24 A
25 C
26 B
27 A
28 C
29 C
30 A
31 C
32 A
33 B
---------------------------------------------------------------------------------------------------------------------
large volume of data stored in a storage area network SAN. As compared to HPC,
Hadoop
A - Can process a larger volume of data.
B - Can run on a larger number of machines than HPC cluster.
C - Can process data faster under the same network bandwidth as compared to HPC.
D - Cannot run compute intensive jobs.
Q 4 - What is the main problem faced while reading and writing data in parallel from
multiple disks?
A - Processing high volume of data faster.
B - Combining data from multiple disks.
C - The software required to do this task is extremely costly.
D - The hardware required to do this task is extremely costly.
Q 5 - Which of the following is true for disk drives over a period of time?
A - Data Seek time is improving faster than data transfer rate.
B - Data Seek time is improving more slowly than data transfer rate.
C - Data Seek time and data transfer rate are both increasing proportionately.
D - Only the storage capacity is increasing without increase in data transfer rate.
B - Only append at the end of file
C - Writing into a file only once.
D - Low latency data access.
Q 10 - HDFS block size is larger as compared to the size of the disk blocks so that
A - Only HDFS files can be stored in the disk used.
B - The seek time is maximum
C - Transfer of large files made of multiple disk blocks is not possible.
D - A single file larger than the disk size can be stored across many disks in the cluster.
Q 11 - In a Hadoop cluster, what is true for a HDFS block that is no longer available
due to disk corruption or machine failure?
A - It is lost for ever
B - It can be replicated from its alternative locations to other live machines.
C - The namenode allows new client request to keep trying to read it.
D - The Mapreduce job process runs ignoring the block and the data stored in it.
Q 12 - Which utility is used for checking the health of a HDFS file system?
A - fchk
B - fsck
C - fsch
D - fcks
Q 13 - Which command lists the blocks that make up each file in the filesystem.
A - hdfs fsck / -files -blocks
B - hdfs fsck / -blocks -files
C - hdfs fchk / -blocks -files
D - hdfs fchk / -files -blocks
Q 15 - In the local disk of the namenode the files which are stored persistently are −
A - namespace image and edit log
B - block locations and namespace image
C - edit log and block locations
D - Namespace image, edit log and block locations.
Q 16 - When a client communicates with the HDFS file system, it needs to
communicate with
A - only the namenode
B - only the data node
C - both the namenode and datanode
D - None of these
Q 19 - For the frequently accessed HDFS files the blocks are cached in
A - the memory of the datanode
B - in the memory of the namenode
C - Both A&B
D - In the memory of the client application which requested the access to these files.
A - Faster creation of the replicas of primary namenode.
B - To reduce the cycle time required to bring back a new primary namenode after existing
primary fails.
C - Prevent data loss due to failure of primary namenode.
D - Prevent the primary namenode from becoming single point of failure.
Q 28 - The property used to set the default filesystem for Hadoop in core-site.xml is
A - filesystem.default
B - fs.default
C - fs.defaultFS
D - hdfs.default
B-1
C-0
D–3
Answer Key :
1 C
2 A
3 D
4 B
5 B
6 C
7 C
8 B
9 C
10 D
11 B
12 B
13 A
14 B
15 A
16 C
17 A
18 D
19 A
20 C
21 C
22 B
23 B
24 D
25 B
26 D
27 C
28 B
29 C
30 B
31 D
32 D
S. No.  Objective Questions (MCQ / True or False / Fill up with Choices)
1. Which of the following is not an example of Social Media?
a. Twitter
b. Google
c. Insta
d. Youtube
2. By 2025, the volume of digital data will increase to
a. TB
b. YB
c. ZB
d. EB
3. For drawing insights for business, what is needed?
a. Collecting the data
b. Storing the data
c. Analysing the data
d. All the above
4. Does Facebook use "Big Data" to perform the concept of Flashback? Is this True or False?
a. TRUE
b. FALSE
5. The process of describing data that is huge and complex to store and process is known as
a. Analytics
b. Data mining
c. Big Data
d. Data Warehouse
6. Data generated from online transactions is one example of the volume of big data. Is this True or False?
a. TRUE
b. FALSE
7. Velocity is the speed at which the data is processed.
a. TRUE
b. FALSE
8. ________ have a structure but cannot be stored in a database.
a. Structured
b. Semi-Structured
c. Unstructured
d. None of these
9. ________ refers to the ability to make your data useful for business.
a. Velocity
b. Variety
c. Value
d. Volume
10. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE
b. FALSE
11. GFS consists of a ________ Master and ________ Chunk Servers.
a. Single, Single
b. Multiple, Single
c. Single, Multiple
d. Multiple, Multiple
12. Files are divided into ________ sized Chunks.
a. Static
b. Dynamic
c. Fixed
d. Variable
13. ________ is an open source framework for storing data and running applications on
clusters of commodity hardware.
a. HDFS
b. Hadoop
c. MapReduce
d. Cloud
14. HDFS stores how much data in each cluster that can be scaled at any time?
a. 32
b. 64
c. 128
d. 256
15. Hadoop MapReduce allows you to perform distributed parallel processing on large
volumes of data quickly and efficiently. Is this MapReduce or Hadoop, i.e. is the statement
True or False?
a. TRUE
b. FALSE
16. Hortonworks was introduced by Cloudera and owned by Yahoo.
a. TRUE
b. FALSE
17. Hadoop YARN is used for Cluster Resource Management in Hadoop Ecosystem.
a. TRUE
b. FALSE
18. Google introduced the MapReduce programming model in 2004.
a. TRUE
b. FALSE
19. ________ phase sorts the data & creates logical clusters.
a. Reduce, YARN
b. MAP, YARN
c. REDUCE, MAP
d. MAP, REDUCE
20. There is only one operation between Mapping and Reducing. Is it True or False?
a. TRUE
b. FALSE
28. ________ is a programming model for writing applications that can process Big
Data in parallel on multiple nodes.
a. HDFS
b. MAP REDUCE
c. HADOOP
d. HIVE
30. ________ is a type of local Reducer that groups similar data from the map phase
into identifiable sets.
a. MAPPER
b. REDUCER
c. COMBINER
d. PARTITIONER
31. While installing Hadoop, how many XML files are edited? List them.
i. core-site.xml
ii. hdfs-site.xml
iii. mapred-site.xml
iv. yarn-site.xml
32.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>D:\hadoop\temp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:50071</value>
  </property>
</configuration>
33. Write the code for hdfs-site.xml ?
S. No.  Objective Questions (MCQ / True or False / Fill up with Choices)
1. Movie Recommendation systems are an example of
1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression
a. 2 Only
b. 1 and 2
c. 1 and 3
d. 2 and 3
2. Sentiment Analysis is an example of
1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning
a. 1, 2 and 4
b. 1 and 3
c. 1, 2 and 3
d. 1 and 2
3. Can decision trees be used for performing clustering?
a. True
b. False
4. What is the minimum no. of variables/features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
5. For two runs of K-Mean clustering is it expected to get same clustering results?
1. Yes
2. No
6. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations. Except for
cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
a. 1, 3 and 4
b. 1, 2 and 3
c. 1, 2 and 4
d. All of the above
7. Which of the following algorithm is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
8. After performing K-Means Clustering analysis on a dataset, you observed the following
dendrogram. Which of the following conclusion can be drawn from the dendrogram?
a. There were 28 data points in clustering analysis
b. The best no. of clusters for the analyzed data points is 4
c. The proximity function used is Average-link clustering
d. The above dendrogram interpretation is not possible for K-Means clustering
analysis
9. In the figure below, if you draw a horizontal line on y-axis for y=2. What will be the
number of clusters formed?
1. 1
2. 2
3. 3
4. 4
10. In which of the following cases will K-Means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes
a. 1 and 2
b. 2 and 3
c. 2 and 4
d. 1, 2 and 4
11. The discrete variables and continuous variables are two types of
a. Open end classification
b. Time series classification
c. Qualitative classification
d. Quantitative classification
12. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a
hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when
people encounter new situations, they often explain them by reference to familiar
experiences, adapting the explanations to fit the new situation.
4. None of these
13. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a
certain theory
3. The task of assigning a classification to a set of examples
4. None of these
14. Classification task referred to
1. A subdivision of a set of examples into a number of classes
2. A measure of the accuracy, of the classification of a concept that is given by a
certain theory
3. The task of assigning a classification to a set of examples
4. None of these
15. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible
solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. None of these
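The Pythagoras-based definition in question 15 can be written out directly. This is a minimal sketch; the function name and the sample points are assumptions for illustration.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 3-4-5 right triangle: the distance is the length of the hypotenuse.
print(euclidean_distance((0, 0), (3, 4)))
```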
16. ________ is good at handling missing data and supports both kinds of
attributes (i.e. categorical and continuous attributes).
a. ID3.
b. C4.5.
c. CART.
d. Naïve Bayes.
17. Decision trees use ________, in that they always choose the option
that seems the best available at that moment.
a. Greedy Algorithms.
b. Divide and Conquer.
c. Backtracking.
d. Shortest Path Method.
18. Decision trees cannot handle categorical attributes with many distinct values, such as
country codes for telephone numbers.
a. TRUE
b. FALSE
19. ________ are easy to implement and can execute efficiently even without
prior knowledge of the data; they are among the most popular algorithms for classifying
text documents.
a. ID3
b. Naïve Bayes classifiers
c. CART
d. None of these.
20. High entropy means that the partitions in classification are
a. Pure
b. Not pure
c. Useful
d. Useless
21. Which of the following statements about Naive Bayes is incorrect?
a. Attributes are equally important.
b. Attributes are statistically dependent on one another given the class value.
c. Attributes are statistically independent of one another given the class value.
d. Attributes can be nominal or numeric
22. The maximum value for entropy depends on the number of classes, so if we have 8 classes,
what will be the max entropy?
a. Max Entropy is 1
b. Max Entropy is 2
c. Max Entropy is 3
d. Max Entropy is 4
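The maximum-entropy question can be checked numerically. The sketch below assumes entropy is measured in bits (base-2 logarithm), which is the usual convention for decision trees; the function name is hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Entropy is largest when all classes are equally likely.
uniform_8 = [1 / 8] * 8
max_entropy_8 = entropy(uniform_8)   # equals log2(8)
pure_partition = entropy([1.0])      # a pure partition has zero entropy
print(max_entropy_8, pure_partition)
```

With k equally likely classes the value is log2(k), so 8 classes give 3 bits, and high entropy corresponds to impure partitions.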
23. John flies frequently and likes to upgrade his seat to first class. He has determined that if
he checks in for his flight at least two hours early, the probability that he will get an
upgrade is 0.75; otherwise, the probability that he will get an upgrade is 0.35. With his
busy schedule, he checks in at least two hours before his flight only 40% of the time.
Suppose John did not receive an upgrade on his most recent attempt. What is the
probability that he did not arrive two hours early?
a. 0.892
b. 0.796
c. 0.685
d. 0.999
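The upgrade problem is a direct application of Bayes' theorem. The sketch below uses only the probabilities stated in the question; the variable names are chosen for this example.

```python
# Probabilities given in the question.
p_early = 0.40
p_upgrade_given_early = 0.75
p_upgrade_given_late = 0.35

# Complements needed for the "no upgrade" event.
p_late = 1 - p_early
p_no_upgrade_given_early = 1 - p_upgrade_given_early
p_no_upgrade_given_late = 1 - p_upgrade_given_late

# Total probability of not receiving an upgrade.
p_no_upgrade = (p_early * p_no_upgrade_given_early
                + p_late * p_no_upgrade_given_late)

# Bayes' theorem: P(late | no upgrade).
p_late_given_no_upgrade = p_late * p_no_upgrade_given_late / p_no_upgrade
print(round(p_late_given_no_upgrade, 3))
```

The calculation is 0.6 × 0.65 / (0.4 × 0.25 + 0.6 × 0.65) = 0.39 / 0.49, which rounds to 0.796.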
24. Point out the wrong statement.
a. k-nearest neighbor is same as k-means
b. k-means clustering is a method of vector quantization
c. k-means clustering aims to partition n observations into k clusters
d. none of the mentioned
25. Consider the following example “How we can divide set of articles such that those articles
have the same theme (we do not know the theme of the articles ahead of time)". Is this:
1. Clustering
2. Classification
3. Regression
4. None of These
26. Can we use K Mean Clustering to identify the objects in video?
1. Yes
2. No
27. Clustering techniques are ________ in the sense that the data scientist
does not determine, in advance, the labels to apply to the clusters.
1. Unsupervised
2. Supervised
3. Reinforcement
4. Neural network
S. No.  Objective Questions (MCQ / True or False / Fill up with Choices)
1. ________ metric is examined to determine a reasonably optimal value of
k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of These
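To make the Within Sum of Squares (WSS) metric concrete, here is a minimal sketch that computes it for a given partition of points. The cluster assignments are supplied by hand rather than produced by a k-means run, and the function name and data are assumptions for illustration.

```python
def within_sum_of_squares(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for points in clusters:
        dims = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2
                     for p in points for d in range(dims))
    return total


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (3, 1), (10, 10), (12, 10)]
wss_k1 = within_sum_of_squares([points])                   # everything in one cluster
wss_k2 = within_sum_of_squares([points[:2], points[2:]])   # two natural clusters
print(wss_k1, wss_k2)
```

WSS drops sharply when k matches the natural grouping and only slowly afterwards, which is the basis of the elbow heuristic for choosing k.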
2. If an itemset is considered frequent, then any subset of the frequent itemset must also be
frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 & 2
3. If {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the
confidence of rule {bread,eggs}→{milk} is
1. 0
2. 1
3. 2
4. 3
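The confidence of a rule X→Y is Support(X ∪ Y) / Support(X). A small sketch with the numbers from the question above (the function name is an assumption):

```python
def confidence(support_xy, support_x):
    """Confidence of the rule X -> Y from the two support values."""
    return support_xy / support_x


# Support({bread, eggs, milk}) = 0.15 and Support({bread, eggs}) = 0.15.
rule_confidence = confidence(0.15, 0.15)
print(rule_confidence)
```

A confidence of 1 means every transaction that contains {bread, eggs} also contains milk.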
4. Confidence is a measure of how X and Y are really related rather than coincidentally
happening together.
a. True
b. False
5. A high-confidence rule can sometimes be misleading because confidence does not consider
support of the itemset in the rule consequent. Is this true?
a. Yes
b. No
6. ________ recommend items based on similarity measures between users and/or
items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of These
7. There are ________ major classifications of Collaborative Filtering mechanisms.
1. 1
2. 2
3. 3
4. None of These
8. Movie recommendation to people is an example of
1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation
4. Content Based Recommendation
9. ________ recommenders rely on an explicitly defined set of recommendation
rules.
1. Constraint Based
2. Case Based
3. Content Based
4. User Based
10. Parallelized hybrid recommender systems operate dependently of one another and produce
separate recommendation lists.
1. True
2. False
11. Association rules are sometimes referred to as
a. market basket analysis
b. Itemset Filtering
c. Frequent Itemset Analysis
d. None of these.
12. If 80% of all transactions contain itemset {bread}, then the support of {bread} is 0.8.
Similarly, if 60% of all transactions contain itemset {bread,butter}, then the support of
{bread,butter} is
a. 0.4
b. 0.5
c. 0.6
d. 0.7
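Support is simply the fraction of transactions that contain an itemset. The sketch below builds ten made-up transactions that reproduce the 80% and 60% figures from the question; the data and the function name are assumptions for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


# Hypothetical baskets: 8 of 10 contain bread, 6 of 10 contain bread and butter.
transactions = (
    [["bread", "butter"]] * 6
    + [["bread", "milk"]] * 2
    + [["eggs"], ["milk", "eggs"]]
)
support_bread = support({"bread"}, transactions)
support_bread_butter = support({"bread", "butter"}, transactions)
print(support_bread, support_bread_butter)
```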
13. Lift is defined as the measure of certainty or trustworthiness associated with each
discovered rule.
a. TRUE
b. FALSE
14. ________ is able to identify trustworthy rules, but it cannot tell whether a rule is
coincidental.
a. Lift
b. Confidence
c. Support
d. Leverage
15. ________ recommend items based on similarity measures between users
and/or items. The items recommended to a user are those preferred by similar users.
a. Collaborative Filtering System
b. Content Based Recommendation
c. Knowledge Based Recommendation
d. Hybrid Approaches
16. Pure collaborative approaches take a matrix of given user–item ratings as the only input
and typically produce output. Is it Pure Collaborative?
a. Yes
b. No
17. With respect to the determination of the set of similar users, one common measure used in
recommender systems is
a. Cosine Similarity Measure
b. Pearson’s correlation coefficient.
c. Mean Squared Error Method
d. None of these.
18. Large-scale e-commerce sites often implement a different technique, ________,
which is more apt for offline preprocessing and thus allows for
the computation of recommendations in real time even for a very large rating matrix.
a. Item-Based Recommendation
b. User-Based Recommendation
c. Content-Based Recommendation
d. None of these
19. Here are two very short texts to compare. Find the cosine similarity measure.
I. Julie loves me more than Linda loves me
II. Jane likes me more than Julie loves me
a. 0.6
b. 0.7
c. 0.8
d. 0.9
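The cosine similarity of the two short texts can be worked out with word-count vectors. This sketch assumes simple lower-casing and whitespace tokenisation, with no stemming or stop-word removal; the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b)


text_1 = "Julie loves me more than Linda loves me"
text_2 = "Jane likes me more than Julie loves me"
similarity = cosine_similarity(text_1, text_2)
print(round(similarity, 3))
```

The dot product is 9 and the vector norms are √12 and √10, so the similarity is 9 / √120 ≈ 0.822, which matches the 0.8 option.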
20. ________ is based on the availability of item descriptions and a profile that
assigns importance to these characteristics.
a. Item-Based Recommendation
b. User-Based Recommendation
c. Content-Based Recommendation.
d. None of these
21. Consider the features of a movie which are not relevant to a recommendation system.
a. The set of actors of the movie.
b. The Director
c. The Year in which the movie was made
d. The Budget of the movie.
22. A ________ has been implemented for similarity-based retrieval under
nearest neighbors.
a. k-nearest-neighbor method (kNN)
b. Conventional Neural Network (CNN)
c. Bayes Theorem
d. Naïve Bayes Classifier
23. Case-based recommenders focus on the retrieval of similar items on the basis of different
types of similarity measures.
a. TRUE
b. FALSE
24. In ________ recommendation approaches, items are retrieved using similarity
measures that describe to which extent item properties match some given user’s
requirements.
a. Item-Based
b. Case-Based
c. Content-Based
d. User-Based
25. ________ are based on a sequenced order of techniques, in which each succeeding
recommender only refines the recommendations of its predecessor.
a. Weighted Hybrids
b. Mixed Hybrids
c. Cascade Hybrids
d. Switching Hybrids
26. ________ require an oracle that decides which recommender should be
used in a specific situation, depending on the user profile and/or the quality of
recommendation
a. Weighted Hybrids
b. Mixed Hybrids
c. Cascade Hybrids
d. Switching Hybrids
No
Question a b c d ANS
.
Eg a/b/c/
Write down question Option a Option b Option c Option d
. d
Business intelligence (BI) is a broad
category a) Decision d) All of the
1 b) Data mining c) OLAP d
of application programs which support mentioned
includes _____________
a) Distinguish
the b) Rank c) Ranks
BI can catalyze a business’s success products and customers and customers and d) All of the
2 d
in terms of _____________ services locations based locations based mentioned
that drive on profitability on probability
revenues
Which of the following areas are d) All of the
3 a) Revenue b) CRM c) Sales b
affected by BI? mentioned
________ is a performance management
tool that recapitulates an organization’s a) Balanced d) All of the
4 b) Data Cube c) Dashboard a
performance from several standpoints Scorecard mentioned
on a single page.
__________ is a system where operations
a) Data b) Data d) None of the
5 like data extraction, transformation and c) ETL a
staging integration mentioned
loading operations are executed.
_________ is a category of applications
and a) Data d) All of the
6 b) MIS c) EIS c
technologies for presenting and analyzing warehouse mentioned
corporate and external data.
Which of the following is the process of a)
basing an organization’s actions and Institutional c) Slice and d) None of the
7 b) Gap analysis a
decisions performance Dice mentioned
on actual measured results of performance? management
Which of the following does not form part
8 a) SSRS b) SSIS c) SSAS d) OBIEE d
of BI Stack in SQL Server?
a) Distinguish
the b) Rank c) Ranks
BI can catalyze a business’s success products and customers and customers and d) All of the
9 d
in terms of _____________ services that locations based locations based mentioned
drive on profitability on probability
revenues
10 This is an approach to selling goods and services in which a prospect explicitly agrees in advance to receive marketing information. A. customer managed relationship B. data mining C. permission marketing D. one-to-one marketing Ans: c
11 In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences. a. Web services b. customer-facing c. client/server d. personalization Ans: d
12 This is the processing of data about customers and their relationship with the enterprise in order to improve the enterprise’s future sales and service and lower cost. a. clickstream analysis b. database marketing c. customer relationship management d. CRM analytics Ans: d
13 This is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. a. best practice b. data mart c. business information warehouse d. business intelligence Ans: d
14 This is a systematic approach to the gathering, consolidation, and processing of consumer data (both for customers and potential customers) that is maintained in a company’s databases. a. database marketing b. marketing encyclopedia c. application integration d. service oriented integration Ans: a
15 This is an arrangement in which a company outsources some or all of its customer relationship management functions to an application service provider (ASP). a. spend management b. supplier relationship management c. hosted CRM d. Customer Information Control System Ans: c
16 This is an XML-based metalanguage developed by the Business Process Management Initiative (BPMI) as a means of modeling business processes, much as XML is, itself, a metalanguage with the ability to model enterprise data. a. BizTalk b. BPML c. e-biz d. ebXML Ans: b
17 This is a central point in an enterprise from which all customer contacts are managed. a. contact center b. help system c. multichannel marketing d. call center Ans: a
18 This is the practice of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, such as age, gender, interests, spending habits, and so on. a. customer service chat b. customer managed relationship c. customer life cycle d. customer segmentation Ans: d
19 In data mining, this is a technique used to predict future behavior and anticipate the consequences of change. a. predictive technology b. disaster recovery c. phase change d. predictive modeling Ans: d
20 According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? a) Big data management and data mining b) Data warehousing and business intelligence c) Management of Hadoop clusters d) Collecting and storing unstructured data Ans: a
21 All of the following accurately describe Hadoop, EXCEPT: a) Open source b) Real-time c) Java-based d) Distributed computing approach Ans: b
22 __________ has the world’s largest Hadoop cluster. a) Apple b) Datamatics c) Facebook d) None of the mentioned Ans: c
23 What are the five V’s of Big Data? a) Volume b) velocity c) Variety d) All of the above Ans: d
24 _________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. a) Scalding b) Cascalog c) Hcatalog d) Hcalding Ans: b
25 What are the main components of Big Data? a) MapReduce b) HDFS c) YARN d) All of these Ans: d
26 What are the different features of Big Data Analytics? a) Open-Source b) Scalability c) Data Recovery d) All the above Ans: d
27 Define the Port Numbers for NameNode, Task Tracker and Job Tracker. a) NameNode b) Task Tracker c) Job Tracker d) All of the above Ans: d
28 Facebook Tackles Big Data With _______ based on Hadoop a) Project Prism b) Prism c) ProjectData d) ProjectBid Ans: a
29 What is a unit of data that flows through a Flume agent? a) Record b) Event c) Row d) Log Ans: b
30 A feature F1 can take the values A, B, C, D, E, & F and represents the grade of students from a college. Which of the following statements is true in this case? a) Feature F1 is an example of nominal variable. b) Feature F1 is an example of ordinal variable. c) It doesn’t belong to any of the above category. d) Both of these Ans: b
31 Which of the following is an example of a deterministic algorithm? a) PCA b) K-Means c) None of the above d) all of the above Ans: a
32 What is the entropy of the target variable? a) -(5/8 log(5/8) + 3/8 log(3/8)) b) 5/8 log(5/8) + 3/8 log(3/8) c) 5/8 log(5/8) + 3/8 log(3/8) d) 5/8 log(3/8) – 3/8 log(5/8) Ans: a
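Note on question 32: the answer options imply a binary target with class proportions 5/8 and 3/8. The sketch below evaluates the entropy formula from option a numerically. The base-2 logarithm and the helper name `entropy` are assumptions made for illustration; the question itself does not state the base.

```python
import math

def entropy(probabilities, base=2):
    """Shannon entropy of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# Class proportions taken from the answer options: 5 of 8 versus 3 of 8.
h = entropy([5 / 8, 3 / 8])
print(round(h, 4))
```

The leading minus sign is what makes the result positive, since each log term of a probability below 1 is negative.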
33 Point out the correct statement. a) OLAP is an umbrella term that refers to an assortment of software applications for analyzing an organization’s raw data for intelligent decision making b) Business intelligence equips enterprises to gain business advantage from data c) BI makes an organization agile thereby giving it a lower edge in today’s evolving market condition d) None of the mentioned Ans: b
34 BI can catalyze a business’s success in terms of _____________ a) Distinguish the products and services that drive revenues b) Rank customers and locations based on profitability c) Ranks customers and locations based on probability d) All of the mentioned Ans: d
35 Which of the following areas are affected by BI? a) Revenue b) CRM c) Sales d) All of the mentioned Ans: b
36 Which of the following does not form part of BI Stack in SQL Server? a) SSRS b) SSIS c) SSAS d) OBIEE Ans: d
37 BI can catalyze a business’s success in terms of _____________ a) Distinguish the products and services that drive revenues b) Rank customers and locations based on profitability c) Ranks customers and locations based on probability d) All of the mentioned Ans: d
38 Heuristic is a) A set of databases from different vendors, possibly using different database paradigms b) An approach to a problem that is not guaranteed to work but performs well in most cases c) Information that is hidden in a database and that cannot be recovered by a simple SQL query. d) None of these Ans: b
39 In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences. a. Web services b. customer-facing c. client/server d. personalization Ans: d
40 Heterogeneous databases referred to a) A set of databases from different vendors, possibly using different database paradigms b) An approach to a problem that is not guaranteed to work but performs well in most cases. c) Information that is hidden in a database and that cannot be recovered by a simple SQL query. d) None of these Ans: a
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 Movie Recommendation systems are an example of: a) Classification b) Clustering c) Reinforcement Learning d) Regression Ans: b and c
2 Sentiment Analysis is an example of: a) Regression b) Classification c) Clustering d) Reinforcement Learning Ans: a, b and d
3 What is the minimum no. of variables/features required to perform clustering? a) 0 b) 1 c) 2 d) 3 Ans: b
4 Is it possible that Assignment of observations to clusters does not change between successive iterations in K-Means? a) Yes b) No c) Can’t say d) None of these Ans: a
5 Which of the following can act as possible termination conditions in K-Means? a) For a fixed number of iterations. b) Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum. c) Centroids do not change between successive iterations. d) Terminate when RSS falls below a threshold. Ans: a,b,c,d
6 Which of the following clustering algorithms suffers from the problem of convergence at local optima? a) K-Means clustering algorithm b) Agglomerative clustering algorithm c) Expectation-Maximization clustering algorithm d) Diverse clustering algorithm Ans: a and c
7 Which of the following algorithm is most sensitive to outliers? a) K-means clustering algorithm b) K-medians clustering algorithm c) K-modes clustering algorithm d) K-medoids clustering algorithm Ans: a
8 How can Clustering (Unsupervised Learning) be used to improve the accuracy of Linear Regression model (Supervised Learning): a) Creating different models for different cluster groups. b) Creating an input feature for cluster ids as an ordinal variable. c) Creating an input feature for cluster size as a continuous variable. d) Creating an input feature for cluster centroids as a continuous variable. Ans: a,b,c,d
9 What could be the possible reason(s) for producing two different dendrograms using agglomerative clustering algorithm for the same dataset? a) Proximity function used b) No. of data points used c) No. of variables used d) All of the above Ans: d
10 In which of the following cases will K-Means clustering fail to give good results? a) Data points with outliers b) Data points with different densities c) Data points with round shapes d) Data points with non-convex shapes Ans: a, b and d
11 Which of the following is/are valid iterative strategy for treating missing values before clustering analysis? a) Imputation with mean b) Nearest Neighbor assignment c) Imputation with Expectation Maximization algorithm d) All of the above Ans: c
12 Feature scaling is an important step before applying K-Mean algorithm. What is the reason behind this? a) In distance calculation it will give the same weights for all features b) You always get the same clusters. If you use or don’t use feature scaling c) In Manhattan distance it is an important step but in Euclidean it is not d) None of these Ans: a
13 Which of the following method is used for finding the optimal number of clusters in K-Mean algorithm? a) Elbow method b) Manhattan method c) Euclidean method d) All of the above Ans: a
14 What is true about K-Mean Clustering? a) K-means is extremely sensitive to cluster center initializations b) Bad initialization can lead to Poor convergence speed c) Bad initialization can lead to bad overall clustering d) None of these Ans: d
15 Which of the following can be applied to get good results for K-means algorithm corresponding to global minima? a) Try to run algorithm for different centroid initialization b) Adjust number of iterations c) Find out the optimal number of clusters d) None of these Ans: a,b,c
16 If you are using Multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into two clusters, which of the assumptions are important: a) All the data points follow two Gaussian distribution b) All the data points follow n Gaussian distribution (n >2) c) All the data points follow two multinomial distribution d) All the data points follow n multinomial distribution (n >2) Ans: c
17 Which of the following is/are not true about Centroid based K-Means clustering algorithm and Distribution based expectation-maximization clustering algorithm: a) Both starts with random initializations b) Both are iterative algorithms c) Both have strong assumptions that the data points must fulfill d) Expectation maximization algorithm is a special case of K-Means Ans: d
18 Which of the following is/are not true about DBSCAN clustering algorithm: a) For data points to be in a cluster, they must be in a distance threshold to a core point b) It has strong assumptions for the distribution of data points in dataspace c) It has substantially high time complexity of order O(n^3) d) It does not require prior knowledge of the no. of desired clusters Ans: b and c
19 Which of the following are the high and low bounds for the existence of F-Score? a) [0,1] b) (0,1) c) [-1,1] d) None of the above Ans: a
20 All of the following increase the width of a confidence interval except: a. Increased confidence level b. Increased variability c. Increased sample size d. Decreased sample size Ans: c
21 The p-value in hypothesis testing represents which of the following: Please select the best answer of those provided below. a. The probability of failing to reject the null hypothesis, given the observed results b. The probability that the null hypothesis is true, given the observed results c. The probability that the observed results are statistically significant, given that the null hypothesis is true d. The probability of observing results as extreme or more extreme than currently observed, given that the null hypothesis is true Ans: d
22 Assume that the difference between the observed, paired sample values is defined in the same manner and that the specified significance level is the same for both hypothesis tests. Using the same data, the statement that “a paired/dependent two sample t-test is equivalent to a one sample t-test on the paired differences, resulting in the same test statistic, same p-value, and same conclusion” is: Please select the best answer of those provided below. a. Always True b. Never True c. Sometimes True d. Not Enough Information Ans: a
23 Green sea turtles have normally distributed weights, measured in kilograms, with a mean of 134.5 and a variance of 49.0. A particular green sea turtle’s weight has a z-score of -2.4. What is the weight of this green sea turtle? Round to the nearest whole number. a. 17 kg b. 151 kg c. 118 kg d. 252 kg Ans: c
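Note on question 23: a z-score converts back to a raw value through x = mean + z × standard deviation, where the standard deviation is the square root of the stated variance. A minimal sketch of that arithmetic follows; the function name `value_from_z` is a hypothetical helper, not part of the question.

```python
import math

def value_from_z(mean, variance, z):
    """Invert the z-score formula z = (x - mean) / sd, with sd = sqrt(variance)."""
    return mean + z * math.sqrt(variance)

# Figures from the question: mean 134.5 kg, variance 49.0, z-score -2.4.
weight = value_from_z(mean=134.5, variance=49.0, z=-2.4)
print(round(weight))
```

A common slip is to multiply the z-score by the variance (49.0) instead of the standard deviation (7.0), which gives one of the distractor values.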
24 What percentage of measurements in a dataset fall above the median? a. 49% b. 50% c. 51% d. Cannot Be Determined Ans: d
25 The proportion of variation in 5k race times that can be explained by the variation in the age of competitive male runners was approximately 0.663. What is the value of the sample linear correlation coefficient? Round to 3 decimal places. a. 0.663 b. 0.814 c. -0.814 d. 0.440 Ans: c
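Note on question 25: the proportion of explained variation is the coefficient of determination, r squared, so the magnitude of the correlation coefficient is its square root. The sign cannot be recovered from r squared alone; it follows the sign of the regression slope, which is not reproduced in this question bank. The sketch below therefore takes the slope sign as an explicit, assumed input (the name `slope_sign` is hypothetical).

```python
import math

def correlation_from_r_squared(r_squared, slope_sign):
    """Return r given r^2 and the sign (+1 or -1) of the fitted regression slope."""
    return slope_sign * math.sqrt(r_squared)

r_magnitude = math.sqrt(0.663)
# Assumption: a negative slope, which is what the listed answer (c) implies.
r = correlation_from_r_squared(0.663, slope_sign=-1)
print(round(r_magnitude, 3), round(r, 3))
```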
26 Using all of the results provided, is it reasonable to predict the 5k race time (minutes) of a competitive male runner 73 years of age? a. Yes; linear correlation between age and 5k race times is statistically significant b. Yes; both the sample linear regression equation and an age in years is provided c. No; linear correlation between age and 5k race times is not statistically significant d. No; the age provided is beyond the scope of our available sample data Ans: d
27 Algorithm is a) It uses machine-learning techniques. Here program can learn from past experience and adapt themselves to new situations b) Computational procedure that takes some value as input and produces some value as output c) Science of making machines performs tasks that would require intelligence when performed by humans d) None of these Ans: b
28 Bias is a) A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory b) Any mechanism employed by a learning system to constrain the search space of a hypothesis c) An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation. d) None of these Ans: b
29 Classification is a) A subdivision of a set of examples into a number of classes b) A measure of the accuracy, of the classification of a concept that is given by a certain theory c) The task of assigning a classification to a set of examples d) None of these Ans: a
30 Binary attribute are a) This takes only two values. In general, these values will be 0 and 1 and they can be coded as one bit b) The natural environment of a certain species c) Systems that can be used without knowledge of internal operations d) None of these Ans: a
31 Classification accuracy is a) A subdivision of a set of examples into a number of classes b) Measure of the accuracy, of the classification of a concept that is given by a certain theory c) The task of assigning a classification to a set of examples d) None of these Ans: b
32 Cluster is a) Group of similar objects that differ significantly from other objects b) Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm c) Symbolic representation of facts or ideas from which information can potentially be extracted d) None of these Ans: a
33 A definition of a concept is ----- if it recognizes all the instances of that concept a) Complete b) Consistent c) Constant d) None of these Ans: a
34 A definition or a concept is ------------- if it classifies any examples as coming within the concept a) Complete b) Consistent c) Constant d) None of these Ans: b
35 Data selection is a) The actual discovery phase of a knowledge discovery process b) The stage of selecting the right data for a KDD process c) A subject-oriented integrated time variant non-volatile collection of data in support of management d) None of these Ans: b
36 Classification task referred to a) A subdivision of a set of examples into a number of classes b) A measure of the accuracy, of the classification of a concept that is given by a certain theory c) The task of assigning a classification to a set of examples d) None of these Ans: c
37 Hybrid is a) Combining different types of method or information b) Approach to the design of learning algorithms that is structured along the lines of the theory of evolution. c) Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules. d) None of these Ans: a
38 Discovery is a) It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information). b) The process of extracting implicit previously unknown and potentially useful information from data c) An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes. d) None of these Ans: b
39 What could be the possible reason(s) for producing two different dendrograms using agglomerative clustering algorithm for the same dataset? a) Proximity function used b) No. of data points used c) No. of variables used d) All of the above Ans: d
40 Is it possible that Assignment of observations to clusters does not change between successive iterations in K-Means? a) Yes b) No c) Can’t say d) None of these Ans: a
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration a) K-Means clustering b) conceptual clustering c) expectation maximization d) agglomerative clustering Ans: a
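Note on question 1: the stopping rule described (means identical to those of the previous iteration) can be illustrated with a toy one-dimensional K-Means. This is only a sketch under simplifying assumptions: the data, the starting centroids and the function name `kmeans_1d` are invented for illustration, and an empty cluster simply keeps its old centroid.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Toy 1-D K-Means that stops when the centroids no longer change."""
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: recompute each centroid as the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Termination: the means match those of the previous iteration.
        if new_centroids == centroids:
            return centroids, iteration
        centroids = new_centroids
    return centroids, max_iter

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, iterations = kmeans_1d(points, centroids=[1.0, 2.0])
print(final_centroids, iterations)
```

The `max_iter` cap corresponds to the "fixed number of iterations" stopping condition that appears elsewhere in this question bank; practical implementations usually compare centroids within a tolerance rather than for exact equality.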
2 The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you? a) The attributes are not linearly related. b) As the value of one attribute decreases the value of the second attribute increases. c) As the value of one attribute increases the value of the second attribute also increases. d) The attributes show a linear relationship Ans: b
Y is false
Given a rule of the form IF X THEN Y, rule Y is true when X is true when X is false when
when X is
3 confidence is defined as the conditional X is known to Y is known to Y is known to b
known to be
probability that be true. be true be false.
false.
4 Chameleon is a) Density based clustering algorithm b) Partitioning based algorithm c) Model based algorithm d) Hierarchical clustering algorithm Ans: d
5 Find odd man out DBSCAN K-Mean PAM None of above a
6 The number of iterations in apriori ___________ a) increases with the size of the data b) decreases with the increase in size of the data c) increases with the size of the maximum frequent set d) decreases with increase in size of the maximum frequent set Ans: c
7 Which of the following are interestingness measures for association rules? a) Recall b) Lift c) Accuracy d) All of Above Ans: b
8 Given a frequent itemset L, If |L| = k, then there are a) 2^k – 1 candidate association rules b) 2^k candidate association rules c) 2^k – 2 candidate association rules d) 2k – 2 candidate association rules Ans: c
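Note on question 8: each candidate rule splits the frequent itemset into a non-empty antecedent and a non-empty consequent, so a k-item set yields 2^k − 2 rules (all subsets, minus the empty set and the full set as antecedent). The sketch below enumerates them to confirm the count; `candidate_rules` is a hypothetical helper written for this note.

```python
from itertools import combinations

def candidate_rules(itemset):
    """List every (antecedent, consequent) split with both sides non-empty."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

rules = candidate_rules({"A", "B", "C"})
for antecedent, consequent in rules:
    print(antecedent, "->", consequent)
print(len(rules), "rules for k = 3")
```

Because antecedent and consequent never share an item, a rule whose consequent repeats the antecedent's item can never appear in this enumeration.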
9 _________ is an example for case-based learning a) Decision trees b) Neural networks c) Genetic algorithm d) K-nearest neighbor Ans: d
10 The average positive difference between computed and desired outcome values. a) mean positive error b) mean squared error c) mean absolute error d) root mean squared error Ans: c
11 Frequent item sets is a) Superset of only closed frequent item sets b) Superset of only maximal frequent item sets c) Subset of maximal frequent item sets d) Superset of both closed frequent item sets and maximal frequent item sets Ans: d
12 Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes. Rule Accuracy: 70% and Rule Coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old? a) 63 b) 38 c) 40 d) 89 Ans: b
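Note on question 12: the listed answer follows if rule coverage is read as the fraction of all 200 individuals who satisfy the rule's antecedent, and rule accuracy as the fraction of those covered individuals for whom the consequent holds. That reading is an assumption (coverage is defined differently in some texts); under it, the arithmetic is sketched below.

```python
total_individuals = 200
rule_coverage = 0.63   # assumed: share of all individuals matching the IF part
rule_accuracy = 0.70   # assumed: share of matching individuals with life insurance = yes

covered = total_individuals * rule_coverage      # individuals matching the antecedent
misclassified = covered * (1 - rule_accuracy)    # covered, but life insurance = no
print(round(covered), round(misclassified))
```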
13 Which of the following is cluster analysis? a) Simple segmentation b) Grouping similar objects c) Labeled classification d) Query results grouping Ans: b
15 Which two parameters are needed for DBSCAN a) Min threshold b) Min points and eps c) Min sup and min confidence d) Number of centroids Ans: b
16 Which statement is true about neural network and linear regression models? a) Both techniques build models whose output is determined by a linear sum of weighted input attribute values. b) The output of both models is a categorical attribute value. c) Both models require numeric attributes to range between 0 and 1. d) Both models require input attributes to be numeric. Ans: d
17 In Apriori algorithm, if 1 item-sets are 100, then the number of candidate 2 item-sets are a) 100 b) 200 c) 4950 d) 5000 Ans: c
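Note on question 17: candidate 2-itemsets are the unordered pairs of frequent 1-itemsets, so their number is "n choose 2". A quick check using the standard library (`math.comb` requires Python 3.8 or later):

```python
import math

frequent_1_itemsets = 100
# Every unordered pair of frequent single items is a candidate 2-itemset.
candidate_2_itemsets = math.comb(frequent_1_itemsets, 2)
print(candidate_2_itemsets)
```

The same value comes from n(n − 1)/2 = 100 × 99 / 2.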
18 Significant Bottleneck in the Apriori algorithm is a) Finding frequent itemsets b) Pruning c) Candidate generation d) Number of iterations Ans: c
typically
are better able
Machine learning techniques differ from assume an have trouble are not able to
to deal with
19 statistical techniques in that machine underlying with large-sized explain their a
missing and
learning methods distribution for datasets behavior.
noisy data
the data
20 The probability of a hypothesis before the presentation of evidence. a) a priori b) posterior c) conditional d) subjective Ans: a
21 KDD represents extraction of data knowledge rules model b
Outliers
Outliers should
should be part The nature of
Outliers should be part of the
of the training the problem
be identified test dataset but
22 Which statement about outliers is true? dataset but determines how c
and removed should not be
should not be outliers are
from a dataset. present in the
present in the used
training data.
test data.
23 The most general form of distance is a) Manhattan b) Euclidean c) Mean d) Minkowski Ans: d
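Note on question 23: Minkowski distance is called the most general form because a single order parameter p recovers the other two metrics: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance. A minimal sketch (the function name `minkowski` and the sample points are chosen here for illustration):

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

origin, point = (0, 0), (3, 4)
manhattan = minkowski(origin, point, p=1)   # |3| + |4|
euclidean = minkowski(origin, point, p=2)   # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```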
24 Which Association Rule would you prefer a) High support and medium confidence b) High support and low confidence c) Low support and high confidence d) Low support and low confidence Ans: c
25 In a Rule based classifier, if there is a rule for each combination of attribute values, what do you call that rule set R a) Exhaustive b) Inclusive c) Comprehensive d) Mutually exclusive Ans: a
26 The apriori property means a) If a set cannot pass a test, its supersets will also fail the same test b) To decrease the efficiency, do level-wise generation of frequent item sets c) To improve the efficiency, do level-wise generation of frequent item sets d) If a set can pass a test, its supersets will fail the same test Ans: a
27 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are a) Undefined b) Not frequent c) Frequent d) Can not say Ans: c
28 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car a) 0.0368 b) 0.0396 c) 0.0389 d) 0.0398 Ans: b
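Note on question 28: this is a direct application of Bayes' rule, with the denominator expanded by the law of total probability over subscribers and non-subscribers. The sketch below plugs in the figures from the question; the variable names are chosen for this note.

```python
p_subscribe = 0.03                 # P(subscribes)
p_car_given_subscribe = 0.40       # P(sports car | subscribes)
p_car_given_not_subscribe = 0.30   # P(sports car | does not subscribe)

# Law of total probability: overall chance of owning a sports car.
p_car = (p_car_given_subscribe * p_subscribe
         + p_car_given_not_subscribe * (1 - p_subscribe))

# Bayes' rule: P(subscribes | sports car).
p_subscribe_given_car = p_car_given_subscribe * p_subscribe / p_car
print(round(p_subscribe_given_car, 4))
```

The posterior stays small because so few people subscribe in the first place, even though subscribers are more likely to own a sports car.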
29 Simple regression assumes a __________ relationship between the input attribute and output attribute. a) quadratic b) inverse c) linear d) reciprocal Ans: c
30 To determine association rules from frequent item sets a) Only minimum confidence needed b) Neither support nor confidence needed c) Both minimum support and confidence are needed d) Minimum support is needed Ans: c
31 If {A,B,C,D} is a frequent itemset, candidate rules which is not possible is a) C –> A b) D –> ABCD c) A –> BC d) B –> ADC Ans: b
32 Which Association Rule would you prefer a) High support and low confidence b) Low support and high confidence c) Low support and low confidence d) High support and medium confidence Ans: b
33 Classification rules are extracted from _____________ a) decision tree b) root node c) branches d) siblings Ans: a
34 What does K refer to in the K-Means algorithm, which is a non-hierarchical clustering approach? a) Complexity b) Fixed value c) No of iterations d) number of clusters Ans: d
35 If Linear regression model fits perfectly, i.e., train error is zero, then _____________________ a) Test error is also always zero b) Test error is non zero c) Couldn’t comment on Test error d) Test error is equal to Train error Ans: c
36 Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE/MSE/MAE a) ii and iv b) i and ii c) ii, iii and iv d) i, ii, iii and iv Ans: d
37 How many coefficients do you need to estimate in a simple linear regression model (One independent variable)? a) 1 b) 2 c) 3 d) 4 Ans: b
38 In a simple linear regression model (One independent variable), if we change the input variable by 1 unit, how much will the output variable change? a) by 1 b) no change c) by intercept d) by its slope Ans: d
39 In syntax of linear model lm(formula,data,..), data refers to ______ a) Matrix b) array c) vector d) list Ans: c
40 In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to __________ a) (X-intercept, Slope) b) (Slope, X-Intercept) c) (Y-Intercept, Slope) d) (slope, Y-Intercept) Ans: c
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
1 A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. a) Decision tree b) Graphs c) Trees d) Neural Networks Ans: a
2 Decision Tree is a display of an algorithm. TRUE FALSE a
Flow-Chart &
Structure in
Structure in
which internal
which internal
node represents
node represents
test on an
test on an
attribute, each
attribute, each
3 What is Decision Tree? branch None of Above c
branch
represents
represents
outcome of test
outcome of test
and each leaf
and each leaf
node represents
node represents
class label
class label
9 Which of the following are the advantage/s of Decision Trees? a) Possible Scenarios can be added b) Use a white box model, If given result is provided by a model c) Worst, best and expected values can be determined for different scenarios d) All of Above Ans: d
10 Which of the following statements about Naive Bayes is incorrect? a) Attributes are equally important. b) Attributes are statistically dependent of one another given the class value. c) Attributes are statistically independent of one another given the class value. d) Attributes can be nominal or numeric Ans: b
11 Which of the following is not supervised learning? a) Clustering b) Decision Tree c) Linear Regression d) Naive Bayesian Ans: a
12 How many terms are required for building a Bayes model? a) 1 b) 2 c) 3 d) 4 Ans: c
13 Where can the Bayes rule be used? a) Solving queries b) Increasing complexity c) Decreasing complexity d) Answering probabilistic query Ans: d
14 How can the Bayesian network be used to answer any query? a) Full distribution b) Joint distribution c) Partial distribution d) All of Above Ans: b
15 What is the consequence between a node and its predecessors while creating a Bayesian network? a) Functionally dependent b) Dependant c) Conditionally independent d) Both Conditionally dependant & Dependant Ans: c
16 Bayesian classifiers is a) A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory. b) Any mechanism employed by a learning system to constrain the search space of a hypothesis c) An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation. d) None of these Ans: a
17 Bias is a) A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory b) Any mechanism employed by a learning system to constrain the search space of a hypothesis c) An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation. d) None of these Ans: b
18 Background knowledge referred to a) Additional acquaintance used by a learning algorithm to facilitate the learning process b) A neural network that makes use of a hidden layer c) It is a form of automatic learning. d) None of these Ans: a
19 Classification accuracy is a) A subdivision of a set of examples into a number of classes b) A measure of the accuracy, of the classification of a concept that is given by a certain theory c) The task of assigning a classification to a set of examples d) None of these Ans: b
20 Classification is a) A subdivision of a set of examples into a number of classes b) A measure of the accuracy, of the classification of a concept that is given by a certain theory c) The task of assigning a classification to a set of examples d) None of these Ans: a
21 Discovery is a) It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information). b) The process of extracting implicit previously unknown and potentially useful information from data c) An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes. d) None of these Ans: b
22 Classification task referred to a) A subdivision of a set of examples into a number of classes b) A measure of the accuracy, of the classification of a concept that is given by a certain theory c) The task of assigning a classification to a set of examples d) None of these Ans: c
23 Euclidean distance measure is a) A stage of the KDD process in which new data is added to the existing selection. b) The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them c) The distance between two points as calculated using the Pythagoras theorem d) None of these Ans: c
24 The problem of finding hidden structure in unlabeled data is called a) Supervised learning b) Unsupervised learning c) Reinforcement learning d) None of these Ans: b
25 Assume you want to perform supervised learning and to predict number of newborns according to size of storks’ population (http://www.brixtonhealth.com/storksBabies.pdf), it is an example of a) Classification b) Regression c) Clustering d) Structural equation modeling Ans: b
26 Discriminating between spam and ham e-mails is a classification task, true or false? a) TRUE b) FALSE Ans: a
27 Which of the following is not involved in data mining? a) Knowledge extraction b) Data archaeology c) Data exploration d) Data transformation Ans: d
28 Naive prediction is a) A class of learning algorithms that try to derive a Prolog program from examples b) A table with n independent attributes can be seen as an n-dimensional space. c) A prediction made using an extremely simple method, such as always predicting the same output. d) None of these Ans: c
29 Node is a) A component of a network b) In the context of KDD and data mining, this refers to random errors in a database table. c) One of the defining aspects of a data warehouse d) None of these Ans: a
30 Prediction is a) The result of the application of a theory or a rule in a specific case b) One of several possible enters within a database table that is chosen by the designer as the primary means of accessing the data in the table. c) Discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces. d) None of these Ans: a
31 What is the relation between the distance between clusters and the corresponding class discriminability? a) proportional b) inversely-proportional c) no-relation d) None of these Ans: a
32 The classification method in which the upper limit of an interval is the same as the lower limit of the next class interval is called…. a) exclusive method b) inclusive method c) mid point method d) None of these Ans: a
33 If the largest value is 60, the smallest value is 40 and the number of classes is 5, then the class interval is a) 20 b) 25 c) 4 d) 15 Ans: c
34 Summary and presentation of data in tabular form with several non-overlapping classes is referred to as a) nominal distribution b) frequency distribution c) ordinal distribution d) None of these Ans: b
35 The classification method in which the upper and lower limits of an interval are also in the class interval itself is called…. a) exclusive method b) inclusive method c) mid point method d) None of these Ans: b
36 Suppose there are 25 base classifiers. Each classifier has error rates of e = 0.35. Suppose you are using averaging as the ensemble technique. What is the probability that the ensemble of the above 25 classifiers will make a wrong prediction? Note: all classifiers are independent of each other a) 0.05 b) 0.06 c) 0.07 d) 0.08 Ans: b
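Note on question 36: one common way to reach the listed answer is to treat the ensemble as a majority vote, so it errs only when at least 13 of the 25 independent classifiers err. That reading is an assumption about the intended setup; under it, the ensemble error is a binomial tail probability, sketched below with a hypothetical helper `ensemble_error`.

```python
import math

def ensemble_error(n_classifiers, base_error):
    """P(more than half of n independent classifiers are wrong), a binomial tail."""
    majority = n_classifiers // 2 + 1
    return sum(
        math.comb(n_classifiers, k)
        * base_error ** k
        * (1 - base_error) ** (n_classifiers - k)
        for k in range(majority, n_classifiers + 1)
    )

error = ensemble_error(n_classifiers=25, base_error=0.35)
print(round(error, 2))
```

The result is far below the 0.35 error of a single classifier, which holds only because the base classifiers are assumed independent and each is better than random guessing.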
37 The most widely used metrics and tools to assess a classification model are: a) Confusion matrix b) Cost-sensitive accuracy c) Area under the ROC curve d) All of Above Ans: d
38 When performing regression or classification, which of the following is the correct way to preprocess the data? a) Normalize the data → PCA → training b) PCA → normalize PCA output → training c) Normalize the data → PCA → normalize PCA output → training d) None of these Ans: a
39 Which of the following is true about Naive Bayes ? a) Assumes that all the features in a dataset are equally important b) Assumes that all the features in a dataset are independent c) both a and b d) None of these Ans: c
40 In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes a) 1 and 2 b) 2 and 3 c) 1, 2, and 3 d) 1 and 3 Ans: c
No. Question a b c d ANS
1 Data visualization is related with… a. Pictorial representations b. numerical representation c. numerical calculations d. None of these Ans: a
2 Which of the following are uses of data visualization? a. See context of data b. Clear data understanding c. finding pattern in data d. all of above Ans: d
Which of the following statements are true
about using visualizations to display a
dataset? I. Visualizations are visually
appealing, but don’t help the viewer
understand relationships that exist in the
data
3 I AND II II AND III I AND III ONLY III d
II. Visualizations like graphs, charts, or
visualizations with pictures are useful for
conveying information, while tables just
filled with text are not useful.
You can create a scatter plot matrix using
all of the
7 the __________ method in sca_matrix scatter_matrix DataFrame.plot b
mentioned
pandas.tools.plotting.
Plots may also be adorned with error bars or
8 True FALSE Cannot Tell All Above a
tables.
9 Which of the following plots are often used for checking randomness in time series? a. Autocausation b. Autorank c. Autocorrelation d. none of the mentioned Ans: c
__________ plots are used to visually
10 Lag RadViz Bootstrap All Above c
assess the uncertainty of a statistic
Which of the following is not a challenge in
11 Velocity Volume Version Variety c
Big Data Visualization?
Which of the following is not a problem in Large image Information
12 Visual Noise Scaled Data b
Big Data Visualization? perception Loss
Which of the following is a problem in Big Structured Multiple
13 Scaled Data Visual Noise c
Data Visualization? Data valued Data
Which of the candidates is suitable for Type of
14 Cardinality Size of data all of above d
interactive visualization? Visual
Which of the following follows interactive Overview+Details
15 Zoom+Pan Focus+Context all of above d
visualization approach?
Overview+Details
16 Visual Mapping is important for_______ Remapping Focus Context a
17 Data visualization techniques are: Scatter Plot Line Chart Pie Chart all of above d
18 Information Visualization techniques are Flow Chart Time Line DFD All of above d
19 Data visualization techniques are: Flow Chart Time Line Pie Chart None of these c
20 Information Visualization techniques are Flow Chart Line Chart Pie Chart None of these a
21 Data visualization techniques are: Scatter Plot Time Line DFD None of these a
22 Information Visualization techniques are Scatter Plot Time Line Bubble Chart None of these b
Parallel
23 Data visualization techniques are: Histogram Time Line None of these a
Coordinates
Semantic
24 Information Visualization techniques are Histogram Area Chart None of these a
Network
Which of the following terms is related to
25 Exponential U-Shape Null All of above d
correlation?
26 Data visualization techniques are: Scatter Plot Time Line DFD None of these a
27 Column graph is another name for _____ Bar Chart Scatterplot Histogram Area Chart a
Which of the following follows interactive Overview+Details
28 Zoom+Pan Focus+Context all of above d
visualization approach?
29 Information Visualization techniques are Pie Chart Scatterplot Histogram Area Chart a
Which of the following is category of Linear Modular Variant
30 ER Timeline a
timeline? Timeline Timeline Timeline
Which of the following specifies
31 Scatter Plot Line Chart Area Chart All of above d
relationship amongst variables?
Which of the following specifies category
32 Pie Chart Histogram Bar chart All of above d
Proportions?
Which of the following is a category of Variant Comparative Modular
33 ER Timeline c
timeline? Timeline Timeline Timeline
34 Information Visualization techniques are Flow Chart Time Line DFD All of above d
35 Data visualization techniques are: Flow Chart Time Line Pie Chart None of these c
36 Data visualization is related with… a. Pictorial representations b. numerical representation c. numerical calculations d. None of these Ans: a
Which of the following follows interactive Overview+Details
37 Zoom+Pan Focus+Context all of above d
visualization approach?
38 Which of the following are uses of data visualization? a. See context of data b. Clear data understanding c. finding pattern in data d. all of above Ans: d
Which of the following specifies
39 Pie Chart Histogram Area Chart None of these c
relationship amongst variables?
Which of the following specifies category
40 Pie Chart Scatter Plot Line Chart None of these a
Proportions?
No. Question a b c d ANS
Eg. Write down question Option a Option b Option c Option d a/b/c/d
Structured Un Structured semi Structured Quasi
1 Precise and steady format data is____ a
Data Data Data Structured Data
Structured Un Structured semi Structured Quasi
2 Inconsistent Data is______ b
Data Data Data Structured Data
Structured Un Structured semi Structured Quasi
3 Format that self defines itself is________ c
Data Data Data Structured Data
Structured Un Structured semi Structured Quasi
4 A little bit inconsistent data is_______ d
Data Data Data Structured Data
Structured Un Structured semi Structured Quasi
5 XML is an example of_______ c
Data Data Data Structured Data
Structured Un Structured semi Structured Quasi
6 RDBMS Follows__________ a
Data Data Data Structured Data
7 Watson is developed by____ IBM Microsoft AT&T Google a
8 Hadoop is _____ based Framework. C++ Python JAVA C# c
Which of the following are components of MAPREDUCE
9 YARN HDFS All of Above d
Hadoop?
Which of the following are components of
10 JDBC Thrift Server CLI All of Above d
HIVE?
JAVA
Mountable
11 Mahout provides__________ Executable C# Executables All of Above a
Image Format
Libraries
Which of the following are components of
12 FLATTEN Thrift Server Muster None of these b
HIVE?
Which of the following are components of
13 FLATTEN Thrift Server Muster All of above b
HIVE?
Which of the following is a component of
14 Fork YARN CLI Metadata b
Hadoop?
Structured Un Structured semi Structured Quasi
15 RDBMS Follows__________ a
Data Data Data Structured Data
Which of the following is a clustering Fuzzy K
16 Canopy K-Means All of above d
technique? means
Which of the following is HBASE Data
17 Row Table Column All of Above d
Model Terminology?
Which of the following is not a Logistic Recommender
18 Random Forest Naïve Bayes c
classification technique? Regression Algo
Which of the following is a classification Logistic
19 Random Forest Naïve Bayes All of Above d
technique? Regression
Which of the following is HBASE Data Column
20 Cell Timestamp All of Above d
Model Terminology? Family
Which of the following is a clustering Logistic
21 Random Forest K-Means Naïve Bayes c
technique? Regression
Which of the following is HBASE Data None of the
22 Identifier Variant Timestamp c
Model Terminology? above
Which of the following is not a Logistic
23 Random Forest K-Means Naïve Bayes c
classification technique? Regression
Which of the following are components of
24 FLATTEN Thrift Server Muster None of these b
HIVE?
Which of the following is HBASE Data Column None of the
25 Identifier Variant c
Model Terminology? Qualifier above
JAVA
Mountable None of the
26 Mahout provides__________ Executable C# Executables a
Image Format above
Libraries
Which of the following is not a clustering Logistic
27 Canopy K-Means Fuzzy K means a
technique? Regression
Which of the following is a clustering Fuzzy K
28 Canopy K-Means All of above d
technique? means
29 Point out the correct statement.
a. Hadoop does need specialized hardware to process the data
b. Hadoop 2.0 allows live stream processing of real-time data
c. In Hadoop programming framework output files are divided into lines or records
d. None of the above
Ans: b
30
31 What was Hadoop named after?
a. Creator Doug Cutting’s favorite circus act
b. Cutting’s high school rock band
c. The toy elephant of Cutting’s son
d. A sound Cutting’s laptop made during Hadoop development
Ans: c
32 ___________ programming model used to develop Hadoop-based applications that can process massive amounts of data. a. MapReduce b. Mahout c. Oozie d. None of the above Ans: a
Which of the following is not a Logistic
33 Random Forest K-Means Naïve Bayes c
classification technique? Regression
Which of the following are components of
34 FLATTEN Thrift Server Muster All of above b
HIVE?
Which of the following is a component of
35 Fork YARN CLI None of above b
Hadoop?
36 Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ a. MapReduce, Hive and HBase b. MapReduce, MySQL and Google Apps c. MapReduce, Hummer and Iguana d. All of above Ans: a
37 NoSQL databases are used mainly for handling large volumes of ______________ data. a. Structured Data b. Un Structured Data c. semi Structured Data d. Quasi Structured Data Ans: b
Which of the following is not a phase of Communication Data Model
38 Recall b
Data Analytics Life Cycle? Preparation Planning
Which of the following is a NoSQL Document
39 SQL JSON All of above b
Database Type? databases
Which of the following is not a NoSQL None of the
40 SQL Server MongoDB Cassandra a
database above
marks question A B C D ans
A group of 4 bits is also
0 1 Nibble Byte Kb None 4 bits make one nibble.
called?
There are how many types of
1 1 3 2 1 None Big Data is of 3 types.
Big Data:
Which of the following are the
2 1 All Volume Variety Velocity. This is an explanation.
V's of Big Data:
Which of these is not a
3 1 Storage Volume Variety Velocity. This is an explanation.
characteristic of Big data?
Which of the following is a Big Data requires high cost to
4 2 Cost Significant Process Fraud Detection
drawback of Big Data: maintain huge amount of data
GINA stands for Global
Global Innovation Network and Global Invention in Globally Investment in
5 2 Fullform of GINA is: None Innovations Networks and
Analysis. Networks and Analytics Neurons and Analytics
Analysis.
Which is the phase 3 in Data Model Planning is the 3rd phase
6 2 Model Planning Model Building Data Preparation Operationalize
Analytics Life cycle. in life cycle.
GINA team thought to GINA targeted to achieve three
7 2 3 2 1 5
accomplish mainly____ goals: goals for the project.
The Data Preparation stage
8 2 Analyzation Collection Cleansing Processing. This is an explanation.
doesn’t involve:
Unstructured Data is further Unstructured data is divided into
9 2 2 3 4 5
divided into how many types? 2 types.
The GINA team mainly used
The team used Tableau to
10 2 which software tool to analyze Tableau Hadoop HIVE SQL
visualize the Data.
the Data
Which of the following is the first
11 2 step of Data Analytics Life Discovery Data Preparation. Model Planning Data Aware This is an explanation.
Cycle:
There are how many phases in there are 6 stages in data
12 2 6 5 4 7
data analytics life cycle: analytics life cycle.
SEMMA Methodology has SEMMA methodology has five
13 2 5 4 6 7
how many stages: stages.
Which phase of Life Cycle
Phase 5 involves collaboration
14 2 requires collaboration with Phase 5 Phase 6 Phase 4 Phase 3
with stakeholders.
stakeholders?
In Building a Model, how many
15 2 2 3 4 5 This is an explanation.
phases are required:
How much Data in the whole Only 20% of world's total data is
16 2 0.2 0.4 0.6 0.5
world is structured: structured.
1000^7 bytes of memory is equal
17 2 1ZB 1TB 1YB 1XB 1000^7 B (10^21 B) is equal to 1 ZB.
to:
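A quick check of the zettabyte figure, as a minimal sketch. It assumes decimal (SI) prefixes, where each step is a factor of 1000. The names `SI_PREFIXES` and `bytes_in` are illustrative only.

```python
# Decimal (SI) prefixes, each a factor of 1000 larger than the previous one.
SI_PREFIXES = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def bytes_in(unit):
    """Number of bytes in one `unit`, e.g. bytes_in("kB") == 1000."""
    return 1000 ** (SI_PREFIXES.index(unit) + 1)

# ZB is the seventh prefix, so 1 ZB = 1000^7 = 10^21 bytes.
print(bytes_in("ZB") == 1000 ** 7 == 10 ** 21)
```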
Data Scientists in the GINA
NLP technique was used on the
team used which technique on Natural Language
18 2 Hadoop HIVE SQL description of Innovation
the textual Description of the Processing(NLP)
Roadmap Idea.
Innovation Roadmap Idea.
How many types of data Two types of data analytical
19 2 analytics methodologies are 2 4 3 6 methodologies are there. EDA
there? and CDA
Bell Curve is also known as
20 3 Other name for Bell Curve is: Normal Distribution. Poisson Distribution Binomial Distribution Bernoulli Distribution.
normal distribution.
One of the most important tasks
One of the most important
21 3 Statistical Modeling Testing of Data Visualization Operationalize in big data analytics is statistical
tasks in big data analytics is:
modeling
Some of the approaches
considered for building the data
22 3 All CRISP-DM SEMMA MAD Skills This is an explanation.
analytics lifecycle framework
best practices are:
In Phase 4, the team develops
23 3 All Testing of Data Training of Data Production purposes This is an explanation.
datasets for:
Cross International Company's Initial CRISP-DM stands for Cross
Fullform of CRISP-DM Cross Industry Standard Process Common Industry Standard
24 3 Standard Process for Standards Progress for Industry Standard Process for
Methodology is: for Data Mining Program for Data Mining
Data Modeling Data Methods Data Mining.
SEMMA Methodology
25 3 doesn’t include which of the Evaluate Sample Explore Assess This is an explanation.
following stages:
In which stage, the data is In last phase i.e. Operationalize
monitored and analyzed to see Data is monitored and analyzed
26 3 Operationalize Collection Plan Model Data Aware
if the generated model is to see if the generated model is
creating the expected results. creating the expected results.
Data is captured in how many
27 3 3 4 5 6 Data is captured in 3 main ways.
ways:
In phase 2 of the Data
The team performs ETL and
Analytics Life Cycle, the team
28 3 3 2 4 6 ELT and ETLT in 2nd phase of
performs how many analytics
the cycle.
to get the data in the sandbox.
The total area under the bell Area under the bell curve is 1
29 3 1 2 3 4
curve is____unit. unit.
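The unit area under the bell curve can be confirmed numerically. The sketch below integrates the standard normal density with the trapezoidal rule over [-8, 8]; the interval and step count are arbitrary choices for illustration, and the function names are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (Gaussian) distribution at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def area_under_curve(f, lo, hi, steps=10_000):
    """Trapezoidal-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# The tails beyond 8 standard deviations are negligible, so this is close to 1.
print(area_under_curve(normal_pdf, -8.0, 8.0))
```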
Wilcoxon rank-sum test is also Wilcoxon rank-sum test is also
30 1 Mann-Whitney U test Mean Difference Alternative Hypothesis Null Hypothesis
known as? called Mann-Whitney U Test.
Which test is also known as T-
31 1 Hypothesis Test Mean Difference K-means test None This is an explanation.
test?
This eqn is of Mean difference
32 1 This equation is of which test? Mean Difference K-Means Null Hypothesis Alternative Hypothesis
test.
A test of a statistical A test of a statistical hypothesis,
hypothesis, where the region of where the region of rejection is
33 1 rejection is on a side of the One tailed test Two-tailed test Tailed test Null test on only one side of the sampling
sampling distribution, is distribution, is called a one-tailed
called___________. test
How many types of Statistical There are two types of Statistical
34 1 2 3 4 6
Hypothesis are there? Hypothesis.
Analysis of Variance is also ANOVA stands for Analysis of
35 1 ANOVA Mean Difference Alternative Hypothesis Null Hypothesis
referred to as? Variance.
How many steps are involved There are 4 steps in Hypothesis
36 1 4 2 3 5
in a Hypothesis Testing? testing.
The strength of evidence in The strength of evidence in
37 2 support of a null hypothesis is P-value K-value H-value Null-value support of a null hypothesis is
measured by? measured by the P-value.
Difference in means is also Difference in means is also
38 2 Two sample t-test T- test M-test Two sample test
called? known as two sample t test.
The k-medoids is also The k-medoids is also called
Partitioning Around Medoids
39 2 called_______________ Lloyd's Algorithm Poisson's Algorithm Regression partitioning around medoids
(PAM)
algorithm. (PAM) algorithm .
Clustering is an example of Clustering is an example of
40 2 Unsupervised Learning Supervised Learning Classification Regression
____? unsupervised learning.
Which of the following is not an
41 2 advantage of K means Requires a Priori Fast Robust easy to evaluate. This is an explanation.
Clustering?
The probability of committing a The probability of committing a
42 2 Beta Alpha Delta Theta
Type 2 error is called Type II error is called Beta
The______ variation we have
The less variation we have within
within clusters, the more
clusters, the more homogeneous
43 2 homogeneous (similar) the data Less More Variable Fixed
(similar) the data points are
points are within the same
within the same cluster.
cluster.
Which hypothesis is usually the Null Hypothesis is usually the
hypothesis in which sample hypothesis that sample
44 2 Null-Hypothesis Mean Difference K-means test Alternative Hypothesis
observations result is purely observations result purely from
from chance? chance.
"Classical" ANOVA for
"Classical" ANOVA for balanced
45 2 balanced data does how many 3 2 1 4
data does three things at once.
things at once?
K-mean clustering is used to NP hard problems are solved
46 2 NP-hard problems NP Problems Hypothesis Problems P problems
solve which problems? using K means clustering.
The probability of committing a The probability of committing a
47 2 Alpha Beta Gamma Delta
Type I error is called? Type I error is called alpha
K means Clustering is also K means clustering is also called
48 2 Lloyd's Algorithm Gaussian Algorithm Poisson's Algorithm None
known as? Lloyds algo.
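Lloyd's algorithm alternates an assignment step and an update step until the centroids stop moving. The following is a minimal sketch on one-dimensional data, not a production implementation. The function name `kmeans_1d` and the choice of initial centroids are assumptions for illustration.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data; `centroids` are the initial guesses."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, clusters

centers, groups = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
print(centers, groups)
```

Note that the number of clusters is fixed by the length of `centroids`, which matches the point made elsewhere in this bank that k must be specified in advance.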
Which algorithm requires the k-means clustering requires the
49 3 user to specify the number of K-means clustering Gaussian Algorithm Alternative Hypothesis Null Hypothesis user to specify the number of
clusters k to be generated. clusters k to be generated.
K means clustering uses which expectation-maximization
50 3 approach to solve the Expectation-maximization Greedy Approach Divide and Conquer None technique is used by k means
problems? clustering.
How many factors affect the The power of a hypothesis test is
51 3 3 2 1 4
power of a hypothesis test? affected by three factors.
Law of variance is also called
52 3 Law of Variance is called? Eve's Law Laplace Law Poisson's Algorithm Regression
Eve's law.
K-Medoids use which K Medoids use greedy
53 3 Greedy Approach Divide and Conquer Recursive None
approach to solve problems? approach to solve problems
The time complexity of k Time complexity is O(n^2) of k
54 3 O(n^2) O(nlogn) O(n) O(1)
means clustering is? means clustering.
The number (k) of clusters
The number k of clusters
55 3 assumed in k-medoids is Priori Null Hypothesis ANOVA Effect size
assumed is known a priori.
known as?
The effect size is the difference
What is the difference between
between the true value and the
56 3 the true value and the value Effect size Null Hypothesis Alternative Hypothesis ANOVA
value specified in the null
specified in the null hypothesis.
hypothesis.
Time complexity of k medoids
57 3 O(n^2) O(nlogn) O(n) O(n^3) This is an explanation.
is?
Which algorithm aims at K means algorithm aims at
58 3 minimizing an objective function K-means Mean Difference Alternative Hypothesis ANOVA minimizing an objective function
known as squared error function known as squared error function
Which algorithm was the
The Apriori algorithm was the earliest of
59 1 earliest of the association rule Apriori Algorithm Gaussian Algorithm K means clustering Bernoulli Distribution.
the association rule algorithms.
algorithms?
The Apriori algorithm takes The Apriori algorithm takes a
a______ iterative approach to bottom-up iterative approach to
60 1 uncovering the frequent Bottom-Up Top-Down Recursive None uncovering the frequent itemsets
itemsets by first determining all by first determining all the
the possible items possible items
Apriori uses breadth-first search
Apriori uses which structure to
and a Hash tree structure to
61 1 count candidate item sets BFS DFS Queue Stack
count candidate item sets
efficiently?
efficiently
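The bottom-up, level-by-level search described in questions 60 and 61 can be sketched as follows. This is a simplified illustration: it counts candidates with a plain dictionary rather than a hash tree, and the function name `apriori` and the sample transactions are made up for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every distinct item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
result = apriori(baskets, min_support=2)
print(sorted((sorted(s), n) for s, n in result.items()))
```

Each pass of the loop handles all itemsets of one size before moving to the next size, which is the breadth-first behaviour the question refers to.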
"y=a+b*x^2". This equation
62 1 Polynomial Regression Logistic Regression Linear Regression Lasso Regression This is an explanation.
shows which regression?
__________ is defined as the Confidence is defined as the
measure of certainty or measure of certainty or
63 2 Confidence Recursion Item-set None
trustworthiness associated with trustworthiness associated
each discovered rule. with each discovered rule.
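Confidence of a rule X → Y is commonly computed as support(X ∪ Y) / support(X). A minimal sketch, with a hypothetical function name and made-up transactions:

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_x = sum(1 for t in transactions if antecedent <= set(t))
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    return n_xy / n_x if n_x else 0.0

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
# bread appears in 3 baskets, bread and milk together in 2.
print(confidence(baskets, {"bread"}, {"milk"}))
```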
In which Regression, we In Logistic Regression, we
64 2 Logistic Regression Linear Regression Both None
predict the value by 1 or 0? predict the value by 1 or 0.
The formula for linear The formula for linear regression
65 2 Y’ = bX+A Y’ = bX - A. Y’ = bX /A. Y’ = bX * A.
regression is: is: Y’ = bX + A.
Which regression is useful PLS regression is also useful
Partial Least Squares(PLS)
66 2 when there are a large number Cox Regression Lasso Regression Logistic Regression when there are a large number of
Regression
of independent variables. independent variables.
Which regression is an Simple linear regression is an
67 2 approach for predicting a Linear-Regression Logistic Regression Elasticnet Regression None approach for predicting a
response using a single feature. response using a single feature.
Association rule mining consists Association rule mining consists
68 2 2 3 4 5
of _______ steps. of 2 steps
Which type of regression is Ordinal regression is suitable
69 2 suitable when dependent Ordinal Regression Linear Regression Cox Regression Logistic Regression when dependent variable is
variable is ordinal in nature? ordinal in nature
Which regression is used for ElasticNet regression is used for
70 2 ElasticNet Regression Linear Regression Logistic Regression None
support vector machines support vector machines,
Which regression can solve Support-Vector Regression can
71 2 both linear and non-linear Support Vector Regression Linear Regression Logistic Regression ElasticNet Regression solve both linear and non linear
models? models.
Which is the most common Least Square Method is the most
72 2 method used for fitting a Least Square Method Mean Difference Null Hypothesis Classification common method used for fitting
regression line a regression line
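The least squares fit of the line Y' = bX + A has a closed form: b is the covariance of X and Y divided by the variance of X, and A is mean(Y) - b * mean(X). A minimal sketch, assuming a single feature and a hypothetical function name `fit_line`:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b*x + a; returns (b, a)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

# Points lying exactly on y = 2x + 1.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```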
_______problems are when A regression problem is when
73 2 the output variable is a real or Regression Classification Recursive Hypothesis the output variable is a real or
continuous value. continuous value.
Linear Regression is a machine
Linear Regression is a machine
learning algorithm based on
74 2 Supervised Learning Unsupervised Learning Recursive Learning All learning algorithm based on
______ learning regression
supervised regression algorithm.
model.
When dependent variable's
When dependent variable's
variability is not equal across
variability is not equal across
75 2 Heteroscedasticity Homoscedasticity Multicollinearity Outliers. values of an independent
values of an independent
variable, it is called
variable, it is called
heteroscedasticity
_________requires large Logistic Regression requires
sample sizes because maximum large sample sizes because
76 2 likelihood estimates are less Logistic Regression Linear Regression Lasso Regression ElasticNet Regression maximum likelihood estimates
powerful at low sample sizes are less powerful at low sample
than ordinary least square sizes than ordinary least square
PCR Regression is divided into PCR regression is divided into 2
77 2 2 3 4 5
how many steps? steps
78 3 L2 regularization is also called? Tikhonov Regularization Norm Regularization Poisson's Regularization None This is an explanation.
When the variance of count When the variance of count data
79 3 data is greater than the mean Overdispersion Underdispersion Dispersion High dispersion is greater than the mean count, it
count, it is a case of? is a case of overdispersion
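A rough way to spot overdispersion is to compare the sample variance of the counts with their mean. The sketch below uses the standard library and made-up counts; the function name `dispersion_ratio` is illustrative, and a ratio well above 1 is only a hint, not a formal test.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by the mean; about 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

overdispersed = [0, 0, 1, 1, 2, 10, 15]   # variance far above the mean
underdispersed = [2, 2, 3, 3]             # variance below the mean
print(dispersion_ratio(overdispersed) > 1, dispersion_ratio(underdispersed) < 1)
```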
Which regression assumes the Linear regression assumes the
80 3 normal distribution of the Linear-Regression Logistic Regression Elasticnet Regression None normal or Gaussian distribution of
dependent variable? the dependent variable.
Nature of predicted data in Nature of predicted data in
81 3 Ordered Unordered Both None
regression is? regression is ordered.
Which regression uses a binary Logistic regression uses a binary
82 3 dependent variable but ignores Logistic Regression Linear Regression Cox Regression Lasso Regression dependent variable but ignores
the timing of events. the timing of events.
The Ridge Regression is also The ridge regression is also
83 3 Shrinkage Regression Percentile Regression Elasticnet Regression Lasso Regression
known as? known as Shrinkage Regression.
In which regression, we In Linear Regression we calculate
calculate Root Mean Square Root Mean Square
84 3 Linear-Regression ElasticNet Regression Logistic Regression All
Error(RMSE) to predict the Error(RMSE) to predict the next
next weight value. weight value.
The______ is the standard The residual standard error is the
85 3 deviation of the observed Residual standard error Mean Difference Error Data Error All standard deviation of
residuals. the observed residuals.
Which Regression is used Poisson regression is used when
86 3 when dependent variable has Poisson Regression Linear Regression Cox Regression Lasso Regression dependent variable has count
count data. data.
________________regression
Quasi-Poisson regression can
can handle both over-
87 3 Quasi-Poisson regression Cox Regression Elasticnet Regression Linear Regression handle both over-dispersion and
dispersion and under-
under-dispersion.
dispersion.
___ is the regularization
λ is the regularization parameter
88 3 parameter in Lasso λ θ Ω β
in lasso regression.
Regression?
Decision Tree is a hierarchical Decision Tree is a hierarchical
model that does the separation model that recursively does the
89 1 Recursion Pointers Greedy Approach Divide and Conquer
of the input space into class separation of the input space
regions using: into class regions
Learning Algorithm of Decision Decision Tree uses greedy
90 1 Greedy Approach Divide and Conquer Both None
Tree is: approach for learning algorithm.
Normal Distribution is also
91 1 Gaussian Distribution Bernoulli Distribution Naïve Bayes Binary Distribution This is an explanation.
called?
Classification has how many There are 2 phases of
92 1 2 3 4 5
phases: classification.
"Every pair of features being Naïve Bayes uses the principle that
classified is independent of every pair of features being
93 1 Naïve Bayes Classifier Decision Tree Bernoulli Distribution Normal Distribution
each other". This principle is classified is independent of each
used by: other.
This equation is of which
94 2 Gaussian Distribution Binary Distribution Naïve Bayes Cross-Entropy This is an explanation.
theorem?
In Naïve Bayes, the datasets
data sets are divided into two
95 2 are divided into how many 2 3 4 5
types in Naïve Bayes.
types?
Decision trees used to Decision trees used to
96 2 predict non-categorical values Regression Trees Categorical trees Normal tree None predict non-categorical values are
are called? called regression trees.
An attribute with____Gini
an attribute with lower Gini index
97 2 index should be preferred in a Lower Higher Recursive Negative
should be preferred.
decision tree.
In Naïve Bayes, if any two If any two events A and B are
98 2 events A and B are P(A,B)=P(A)P(B) P(A,B)=P(A)/P(B) P(A,B)=P(B) P(A,B)=P(B)/P(A) independent,
independent, then, then P(A,B)=P(A)P(B)
What is the measure of
Entropy is the measure of
99 2 uncertainty of a random Entropy. Gain Gini Index None
uncertainty of a random variable
variable in a decision tree.
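Entropy of a set of class labels can be computed as -Σ p·log2(p) over the class proportions. A minimal sketch with a hypothetical function name and toy labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))  # 1.0: a 50/50 split is maximally uncertain
```

A node whose examples all share one label has zero entropy, which is why splits that reduce entropy are preferred.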
Which of the following is not
100 2 Stable Easy to understand Easy to explain Easy to evaluate. This is an explanation.
true for decision trees?
Decision tree algorithm falls Decision tree algorithm falls
101 2 under the category of which Supervised Unsupervised Regression Classification under the category of supervised
learning? learning
False Positives and False One of the uses of Bayes' Theorem is
102 2 Negatives is an application of Bayes' Theorem Binary Distribution Bernoulli Distribution Normal Distribution false positives and false
which theorem? negatives.
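The false-positive application of Bayes' theorem can be illustrated with a diagnostic test. The numbers below (1% prevalence, 99% sensitivity, 5% false-positive rate) are invented for the example, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # P(positive) = P(pos | cond) P(cond) + P(pos | no cond) P(no cond)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Even with an accurate test, most positives are false when the condition is rare.
print(posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05))
```

With these assumed figures the posterior is 1/6, so about five out of six positive results are false positives.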
Decision Tree used in mining
There are 2 types of decision
103 2 the data are of how many 2 3 4 5
trees used in data mining.
types?
In Bayes' Theorem, P(A) and
P(A) and P(B) are the
P(B) are the probabilities of
probabilities of observing A and
104 3 observing A and B Marginal Probability Normal Distribution Bernoulli Distribution Parallel Algorithm.
B respectively; they are known
respectively; they are known
as the marginal probability.
as:
ID3 Algorithm in a decision ID3 stands for Iterative
105 3 Iterative Dichotomiser 3 (ID3) Interval Driven Interconnected Decision None
tree stands for? Dichotomiser 3 (ID3)
Probably the best way of
Probably the best way of
estimating performance for very
106 3 estimating performance for Boot Strapped Method Normal Distribution Naïve Bayes Binary Distribution
small data sets is bootstrapped
very small data sets is:
method
The Decision Tree works on Decision Tree works on
107 3 Disjunctive Normal Form Product of Sum Bijective Form Conjunctive Form
which form? Disjunctive normal form.
The decoupling of the class The decoupling of the class
conditional feature distributions conditional feature distributions
108 3 means that each distribution 1-D 2-D 3-D NONE means that each distribution can
can be independently estimated be independently estimated as a
as a________ distribution. one dimensional distribution.
Theoretical concept to evaluate
109 3 COLT PAC Model Naïve Bayes Prediction. This is an explanation.
Classifiers is:
____________is a metric to Gini Index is a metric to measure
measure how often a randomly how often a randomly chosen
110 3 Gini Index Entropy Pointer Cross-Entropy
chosen element would be element would be incorrectly
incorrectly identified identified
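Gini impurity is commonly computed as 1 - Σ p², where p are the class proportions at a node. A minimal sketch with a hypothetical function name and toy labels:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: chance of mislabeling a random element drawn from `labels`."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))  # 0.5 for an even two-class split
print(gini(["yes", "yes", "yes", "yes"]))  # 0.0 for a pure node
```

A lower value means a purer node, which is why an attribute with a lower Gini index is preferred when choosing a split.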
The most notable types of The most notable types of
111 3 3 2 1 4
decision tree algorithms are: decision tree algorithms are 3
Which process is completed The recursive partition is
when the subset at a node all completed when the subset at a
112 3 Recursive Partitioning Termination Transformation Prediction.
has the same value of the target node all has the same value of
variable? the target variable
The_______ method reserves The holdout method reserves a
113 3 a certain amount for testing and Holdout Parallel Algorithm Naïve Bayes Normal Distribution certain amount for testing and
uses the remainder for training. uses the remainder for training
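The holdout method can be sketched in a few lines: shuffle the data, reserve a fraction for testing, and train on the rest. The 30% test fraction, the fixed seed, and the function name `holdout_split` are assumptions for illustration.

```python
import random

def holdout_split(data, test_fraction=0.3, seed=0):
    """Shuffle `data` and return (train, test) lists."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(10))
print(len(train), len(test))  # 7 3
```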
This equation is of which
114 3 Bayes' Theorem Normal Distribution Bernoulli Distribution Cross-Entropy This is an explanation.
theorem?
"Independence among the Independence among the
115 3 features". This is an assumption Naïve Bayes Classifier Bernoulli Distribution Parallel Algorithm Binary Distribution features is an assumption in
in: Naïve Bayes.
Error rate obtained from error rate obtained from training
116 3 Resubstitution Error Grid Gini Index True error
training data is called: data is called resubstitution error.
In Decision Tree entropy is
117 3 proportional inverse High Less This is an explanation.
__________ to content.
In Decision Tree, No root-to-
No root-to-leaf path should
leaf path should contain the
118 3 Twice Once Thrice Four Times. contain the same discrete
same discrete attribute
attribute twice
____________.
Using_________, designers
Using data visualization methods,
can make information
119 1 Data Visualization Classification Regression Supervised Learning. designers can make information
understandable for
understandable for stakeholders.
stakeholders.
The additional visual methods
120 1 All Tree Map Parallel Coordinates Semantic Networks. This is an explanation.
include:
Data Visualization tools
121 1 MS-Excel Tableau Power BI Jupyter This is an explanation.
Doesn’t include:
Which of the following requires
122 1 Javascript Knowledge to run All Chart.js Polymap Sigmajs This is an explanation.
the visualization tool?
Merits of Tableau doesn’t Merits of tableau doesn’t include
123 1 Cost Performance Usage Computation
include which factor: the cost factor.
Which of these is not a type of
124 1 Pictograph Bar-Graph Line-Chart Pie-Chart This is an explaination.
Big Data Visualization.
The drag-and-drop editor od
The drag-and-drop editor of
which tool makes it easy to
Infogram makes it easy to create
125 2 create professional-looking Infogram Google Chart Tableau Grafana
professional-looking designs
designs without a lot of visual
without a lot of visual design skill.
design skill.
How many V's are defined for There are 4 V's of Data
126 2 4 6 2 3
Data Visualization. visualization.
Which of the following is not a Tableau is a chargeable tool of
127 2 Tableau Google Chart Jupyter Hub-Spot CRM
free Data Visualization tool? data visualization.
Companies that work with
Companies that work with both
both traditional and big data
traditional and big data may use
128 2 use which technique to look at Pie-Chart Bar-Graph Stream graph Line-Chart
pie chart to look at customer
customer segments or market
segments or market shares
shares?
Visualization of Data includes
129 2 which of the following All Information Loss Visual Noise Large Image Perception. This is an explaination.
problems: OptimusPrime Page 100
Mainly, Data Visualization has There are 5 main challenges to
130 2 5 6 4 2
how many types of challenges? data visualization.
No. | Marks | Question | A | B | C | D | Ans
131 | 2 | Which tool uses HTML5/SVG to visualize data? | Google Charts | Jupyter | Grafana | Tableau | Google Charts uses HTML5/SVG since it is browser compatible.
132 | 2 | According to Colin Ware's Information Visualization: Perception for Design, he defines _____ pre-attentive visual properties. | 4 | 2 | 1 | 3 | According to Colin Ware's Information Visualization: Perception for Design, he defines four pre-attentive visual properties.
133 | 2 | _____ is based on space-filling visualization of hierarchical data. | Tree-Map | Stream graph | Bar-graph | Line-Chart | The tree map method is based on space-filling visualization of hierarchical data.
134 | 2 | Which graph shows the dependency relationships between activities and current schedule status? | Gantt-Chart | Line-Chart | Pie-Chart | Bar-Graph | A Gantt chart shows the dependency relationships between activities and current schedule status.
135 | 2 | Another name for distribution-free data is: | Non-parametric data | Parametric data | Static data | Dynamic data | Non-parametric data is also called distribution-free data.
136 | 2 | Which chart is used for comparison of values, such as sales performance for several persons or businesses in a single time? | Bar-Graph | Gantt-Graph | Line-Chart | Pie-Chart | A bar graph is used for comparison of values, such as sales performance for several persons or businesses in a single time.
137 | 2 | _____________ are graphics in the field of statistics used to visualize quantitative data. | Graphical-Techniques | Line-Chart | Regression | Classification | Graphical techniques are graphics in the field of statistics used to visualize quantitative data.
138 | 2 | _____ can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion. | Parallel Coordinates | Stream graph | Google Chart | Jupyter | Parallel coordinates can handle several factors for a large number of objects per single screen, so it satisfies the data variety criterion.
139 | 3 | Chart.js provides how many types of charts? | 8 | 5 | 3 | 6 | This is an explanation.
140 | 3 | Which visualization tool supports mixed data sources, annotations, and customizable alert functions, and can be extended via hundreds of available plugins? | Grafana | Tableau | Google Chart | Jupyter | Grafana supports mixed data sources, annotations, and customizable alert functions, and it can be extended via hundreds of available plugins.
141 | 3 | Which tool was created specifically for adding charts and maps to news stories? | Datawrapper | Tableau | Google Chart | Jupyter | Datawrapper was created specifically for adding charts and maps to news stories.
142 | 3 | Conventional visualization methods don't include: | Mekko Chart | Pie-Chart | Bar-graph | Histogram | The Mekko chart is a new technique to visualize data.
143 | 3 | _____________ is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape. | Streamgraph | Bar-Graph | Pie-Chart | Line-Chart | A streamgraph is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing and organic shape.
144 | 3 | Which visual tool includes over 150 chart types and 1,000 map types? | FusionCharts | Tableau | Google Chart | Jupyter | FusionCharts includes over 150 chart types and 1,000 map types.
145 | 3 | Which graph/chart is a graphical representation of the logical relationship between different concepts? It generates a directed graph, the combination of nodes or vertices, edges or arcs, and a label over each edge. | Semantic Networks | Bar-Graph | Pie-Chart | Line-Chart | A semantic network is a graphical representation of the logical relationship between different concepts. It generates a directed graph, the combination of nodes or vertices, edges or arcs, and a label over each edge.
146 | 3 | According to SAS we can process only ______ of information per second on a flat screen. | 1 Kilobit | 1 Byte | 1 Bit | 1 MB | According to SAS we can process only 1 kilobit of information per second on a flat screen.
147 | 3 | There are ____ steps for interactive data visualization: | 4 | 5 | 3 | 6 | This is an explanation.
148 | 3 | When working with big data, companies can use which visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.? | Line-Chart | Bar-Graph | Pie-Chart | Stream graph | When working with big data, companies can use the line chart visualization technique to track total application clicks by weeks, the average number of complaints to the call center by months, etc.
149 | 1 | Which of the following enterprises use HBase? | All | Facebook | Netflix | Adobe | This is an explanation.
No. | Marks | Question | A | B | C | D | Ans
150 | 1 | Which NLP is used in the present era? | Neural NLP | Symbolic NLP | Statistical NLP | None | Since 2010, neural NLP has been used.
151 | 1 | The Computer World magazine states that unstructured information might account for more than ______ of all data in organizations. | 70-80% | 90% | 50% | 60% | The Computer World magazine states that unstructured information might account for more than 70-80% of all data in organizations.
152 | 1 | Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely or partly ___________. | Unstructured | Structured | Semantic | None | Almost all of the information we use and share every day, such as articles, documents and e-mails, is completely or partly unstructured.
153 | 1 | Which standard provided a common framework for processing information to extract meaning and create structured data about the information? | Unstructured Information Management Architecture (UIMA) | Management Architecture for Data | Data Architecture | None | The Unstructured Information Management Architecture (UIMA) standard provided a common framework for processing this information to extract meaning and create structured data about the information.
154 | 2 | The base Apache Hadoop framework is composed of how many modules? | 4 | 2 | 3 | 6 | The base Apache Hadoop framework is composed of four modules.
155 | 2 | NoSQL doesn't include which software? | MS-SQL | HBase | DynamoDB | MongoDB | This is an explanation.
156 | 2 | There are _______ main types of OLAP systems. | 3 | 2 | 5 | 6 | There are 3 types of OLAP systems.
157 | 2 | The SQL alternative in Apache Hive is called? | HIVEQL | BASEQL | SPARK-QL | H-QL | HiveQL is the alternative to SQL in the Apache Hive family.
158 | 2 | A MapReduce program executes in how many stages? | 3 | 2 | 5 | 4 | A MapReduce program executes in three stages.
159 | 2 | How many types of NoSQL database are there? | 4 | 3 | 2 | 6 | There are 4 types of NoSQL databases.
160 | 2 | MapReduce is a processing technique and a program model for distributed computing based on which programming language? | Java | Python | C++ | R | MapReduce is a processing technique and a program model for distributed computing based on Java.
161 | 2 | Hive supports how many properties of transactions? | 4 | 3 | 2 | 1 | Hive supports all four properties of transactions.
162 | 2 | HDFS consists of only one Name Node, which is called the? | Master Node | Slave Node | Both | None | HDFS consists of only one Name Node, which is called the Master Node.
163 | 2 | Which Apache software is needed to process massive amounts of data for the purposes of natural-language search? | Apache HBase | Apache Spark | Apache Pig | Apache Mahout | HBase is used to process massive amounts of data for the purposes of natural-language search.
164 | 2 | Which databases store data in a format other than relational tables? | NoSQL | HIVESQL | SPARK-QL | H-QL | NoSQL databases store data in a format other than relational tables.
165 | 2 | Which is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra? | Apache Mahout | Apache Spark | Apache Pig | Apache HBase | Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra.
166 | 2 | Which model is a specialization of the split-apply-combine strategy for data analysis? | MapReduce | Hadoop | HBase | Hive | The MapReduce model is a specialization of the split-apply-combine strategy for data analysis.
167 | 2 | All Hadoop commands are invoked by which command? | $HADOOP_HOME/bin/hadoop | $HADOOP/bin/hadoop | $HADOOP_HOME/hadoop | $HADOOP_HOME/bin | All Hadoop commands are invoked by the $HADOOP_HOME/bin/hadoop command.
168 | 3 | The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called? | Schema on Write | Schema on Read | Schema for Read Write | None | The table typically enforces the schema when the data is loaded into the table. This enables the database to make sure that the data entered follows the representation of the table as specified by the table definition. This design is called schema on write.
No. | Marks | Question | A | B | C | D | Ans
169 | 3 | Which command formats the DFS filesystem? | Namenode -format | Node -format | Name -format | Format | The Namenode -format command formats the DFS file system.
170 | 3 | Which command applies the offline fsimage viewer to an fsimage? | oiv | fs | fc | ov | oiv applies the offline fsimage viewer to an fsimage.
171 | 3 | Hadoop requires which Java Runtime Environment (JRE) version or higher? | 1.6 | 1.2 | 1.5 | 1 | Hadoop requires Java Runtime Environment (JRE) 1.6 or higher.
172 | 3 | Every Data node sends a Heartbeat message to the Name node every ____ seconds and conveys that it is alive. | 3 | 2 | 4 | 1 | Every Data node sends a Heartbeat message to the Name node every 3 seconds and conveys that it is alive.
173 | 3 | HDFS can store files up to: | 1 TB | 1 GB | 1 ZB | 1 PB | HDFS can store up to 1 TB of files.
174 | 3 | Which of the following is a wide-column store? | HBase | SQL | DynamoDB | MongoDB | HBase is a popular wide-column store.
175 | 3 | Which node acts as both a DataNode and TaskTracker in Hadoop? | Slave Node | Data Node | Admin Node | Name Node | A slave or worker node acts as both a DataNode and TaskTracker.
176 | 3 | The HDFS system uses which protocol for communication? | TCP/IP | TCP | UDP | IP | The HDFS system uses TCP/IP sockets for communication.
177 | 3 | HDFS has how many services? | 5 | 4 | 2 | 6 | HDFS has five services.
178 | 3 | ____________ is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. | Apache Hive | Apache Spark | Apache Pig | Apache HBase | Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
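The rows above describe MapReduce as running in three stages and as a specialization of split-apply-combine. A minimal single-process sketch of that idea in plain Python follows; it does not use the Hadoop API, and the function names (`map_stage`, `shuffle_stage`, `reduce_stage`, `word_count`) are illustrative assumptions only.

```python
from collections import defaultdict


def map_stage(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_stage(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three stages on an in-memory list of lines."""
    return reduce_stage(shuffle_stage(map_stage(lines)))


counts = word_count(["big data big", "data analytics"])
print(counts)
```

In a real cluster each stage runs on many nodes in parallel; here everything runs in one process purely to show the data flow.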
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 1. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 3. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
Q.no 4. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : ndarray
B : spatial
C : ndimage
D : special
A : Single point
B : Line
C : 2-D Plane
A : Text files
B : Satellite data
C : Sensor data
D : Seismic imagery data
A : Matlab
B : Scilab
C : Scipy
D : Numpy
Q.no 9. The procedure to organize items of a given collection into groups based on
some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 11. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. ------------- function is used to plot a histogram using matplotlib library
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 13. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Probability
B : Gini Index
C : Regression
D : Association
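As a small illustration of the Gini index named in the options above, the following sketch computes Gini impurity for a set of class labels and the size-weighted impurity of a two-way split. The helper names are hypothetical, and the formula used is the common textbook definition, 1 minus the sum of squared class proportions.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def gini_of_split(left, right):
    """Size-weighted Gini impurity of a two-way split."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_impurity(left) + \
           (len(right) / total) * gini_impurity(right)


pure = gini_impurity(["yes", "yes", "yes"])        # one class only
mixed = gini_impurity(["yes", "no", "yes", "no"])  # evenly mixed classes
split = gini_of_split(["yes", "yes"], ["no", "no"])
print(pure, mixed, split)
```

A split with lower weighted impurity separates the classes better, which is why a tree learner would prefer it.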
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 16. ------ answers questions like "How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : prod()
B : mult()
C : dot()
D : *
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 20. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : 0
B : -1
C : 1
D : -2
Q.no 22. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : KNN
C : Decision trees
D : Cluster analysis
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 26. ---------- function is used to get the element-wise remainder of division of two arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
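The three rule measures listed in the options above can be illustrated on a toy basket dataset. This is a minimal sketch using the usual textbook definitions; the function names and the sample transactions are assumptions made up for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent occurs."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule relative to the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / \
        support(transactions, consequent)


# Hypothetical sample data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
rule_lift = lift(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1, as in this toy data, suggests the two items appear together less often than independence would predict.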
Q.no 28. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 30. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : NoSQL data
B : YouTube data
A : EPS
B : PDF
C : PNG
D : PS
Q.no 36. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 38. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 39. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 42. Which of the following functions is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 43. --------- is a technique that duplicates the smaller array so that its dimensionality
and size match those of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 46. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 47. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 49. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
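To make the "prior knowledge" idea behind Apriori concrete, here is a compact level-wise sketch in plain Python. It counts candidates with a dictionary rather than the hash tree mentioned elsewhere in this paper, so it is a simplification; `apriori` and the sample baskets are illustrative assumptions, not a library API.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for all frequent itemsets.

    min_support is an absolute count. Candidates of size k+1 are built only
    from frequent k-itemsets, and any candidate with an infrequent k-subset
    is pruned before counting.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step and prune step for the next level.
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            candidate = a | b
            if len(candidate) == k and all(
                frozenset(s) in level for s in combinations(candidate, k - 1)
            ):
                current.add(candidate)
    return frequent


# Hypothetical sample data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
print(result)
```

The pruning step is where the breadth-first search saves work: the three-item candidate is never counted because one of its pairs is already known to be infrequent.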
Q.no 52. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 53. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 56. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
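Subplot numbers in matplotlib start at 1 in the upper-left cell and increase row by row. The sketch below reproduces only that numbering arithmetic in plain Python, without importing matplotlib; `subplot_position` is a hypothetical helper written for this illustration.

```python
def subplot_position(nrows, ncols, plot_number):
    """Return the zero-based (row, col) of a 1-based, row-major plot number."""
    if not 1 <= plot_number <= nrows * ncols:
        raise ValueError("plot_number must be between 1 and nrows * ncols")
    return divmod(plot_number - 1, ncols)


top_right = subplot_position(2, 3, 3)  # last column of the first row
second_row = subplot_position(4, 3, 5)
print(top_right, second_row)
```

The same arithmetic explains why the plot number can never exceed nrows * ncols.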
Q.no 57. Catalog design is a complex process where the selection of items in a
business's catalog is often designed so that the items complement each other and buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 59. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 2. ----------- data that depends on data model and resides in a fixed field within
a record.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 5. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
Q.no 9. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 10. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 11. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Un- Supervised
B : Supervised
C : Association
D : correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : KNN
C : Regression
D : Decision Tree
A : Classification
B : Regression
C : Clustering
D : Association
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
Q.no 19. Which function is used to give a title to the axes?
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 20. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
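Simple regression with one dependent and one independent variable can be sketched with the ordinary least-squares formulas. This is a minimal illustration in plain Python under that assumption; `fit_simple_regression` and the sample points are made up for the example.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on y = 1 + 2x.
slope, intercept = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Multiple regression extends the same idea to several independent variables, which requires solving a system of equations rather than two closed-form expressions.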
Q.no 21. In ------------ the x-axis is grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 22. ------- is the basic data structure of pandas and can be thought of as an SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 23. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
A : 1
B : -1
C : 0
D : 2
Q.no 25. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 26. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 27. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : KNN
C : Regression
D : Cluster analysis
Q.no 29. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
A : Java
B : Ruby
C : R
D : None of these
A : 0
B : -1
C : 1
D : -2
Q.no 33. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
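Information gain, listed among the options above, is the drop in entropy produced by a split. The following sketch computes both quantities in plain Python using the standard Shannon entropy definition; the helper names and labels are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    remainder = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder


# Hypothetical labels: a perfectly separating split recovers all of the entropy.
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])
no_gain = information_gain(parent, [["yes", "no"], ["yes", "no"]])
print(gain, no_gain)
```

An attribute whose split yields the highest gain is the one a tree learner of this kind would select at that node.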
Q.no 34. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 37. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 38. ------------------- is a one-dimensional array defined in pandas that can be
used to store any data type.
A : Dict
B : series
C : ndarray
D : list
Q.no 39. To read image from a file into an array --------------- function is used.
A : matplotlib.pyplot.imshow()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 42. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. Determining the basic salary of an employee, given their qualification, is
a ----------- problem.
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 46. Which function from numpy is used to return the truncated value of the
input element-wise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 47. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 49. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 51. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 53. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
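The correlation cases in the options above can be checked numerically with the Pearson correlation coefficient. This is a minimal plain-Python sketch of the standard formula; `pearson_r` and the sample series are assumptions chosen so that each case is easy to verify by hand.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4]
positive = pearson_r(xs, [2, 4, 6, 8])        # y rises with x
negative = pearson_r(xs, [8, 6, 4, 2])        # y falls as x rises
uncorrelated = pearson_r(xs, [1, -1, -1, 1])  # no linear relationship
print(positive, negative, uncorrelated)
```

The coefficient always lies between -1 and +1, with 0 indicating no linear correlation.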
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Answer for Question No 1. is c
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 2. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 3. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 4. To save or write dataframe data into csv file -------- function is used
A : write_csv()
B : write_file()
C : csv_read()
D : to_csv()
A : Regression
B : Decision trees
C : KNN
D : SVM
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 9. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 11. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 14. ------ answers questions like "How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 15. ------------ type of plot shows all individual data points without connecting
them with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 16. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 19. -------------- charts represent categorical data with rectangular bars.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Random
B : sequential
C : Same
Q.no 21. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
Q.no 22. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
Q.no 23. ------ is a classification technique that relies on the naïve assumption that
input variables are independent of each other.
A : KNN
B : Naïve Bayes
C : Regression
Q.no 24. ----------- phase of the data analytics lifecycle usually takes the longest
time.
A : Data Preparation
B : Model Planning
C : Model Building
D : Communicate Results
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 28. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 29. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 31. ---------- function is used to add two NumPy arrays element-wise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 32. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 33. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Website data
B : YouTube data
Q.no 36. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : EPS
B : PDF
C : PNG
D : PS
Q.no 38. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
C : Measures growth
Q.no 40. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 41. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 42. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 44. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 46. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
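Subplot cells are numbered row by row, starting at 1 in the top-left corner. The sketch below illustrates this on a 2 x 3 grid; the `Agg` backend is selected only so the script runs without a display, and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure()

# 2 rows x 3 columns: cells 1-3 form the top row, cells 4-6 the bottom row
ax_top_right = plt.subplot(2, 3, 3)    # last cell of the first row
ax_bottom_left = plt.subplot(2, 3, 4)  # first cell of the second row
ax_shorthand = plt.subplot(236)        # three-digit shorthand for subplot(2, 3, 6)

ax_top_right.set_title("cell 3")
ax_bottom_left.set_title("cell 4")
ax_shorthand.set_title("cell 6")
```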
Q.no 47. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 49. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 50. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 51. The ------- is characterized by a bell-shaped curve, and the area under the
curve represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 52. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
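A brief sketch contrasting slicing with plain indexing on NumPy arrays; the array contents and names are arbitrary.

```python
import numpy as np

arr = np.arange(10)       # [0 1 2 3 4 5 6 7 8 9]

first_three = arr[:3]     # slice: elements at positions 0, 1, 2
every_other = arr[1:8:2]  # slice with a step: positions 1, 3, 5, 7
single = arr[4]           # indexing: one element

grid = np.arange(12).reshape(3, 4)
block = grid[0:2, 1:3]    # rows 0-1 and columns 1-2 of a 2-D array

print(first_three, every_other, single)
print(block)
```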
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 54. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 57. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
B : YouTube data
D : Sensor data
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 7. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
Q.no 8. The ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : imsave()
B : imread()
C : read()
D : None of these
Q.no 10. NumPy supports this function to find the trigonometric sine elementwise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 12. In a NumPy array, array indices always start from --------
A : 1
B : -1
C : 0
D : 2
Q.no 13. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 14. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : Classification
B : Regression
C : Clustering
D : Association
Q.no 16. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 20. ---------- plot displays information as series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 21. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
A : NoSQL data
B : YouTube data
C : Text File data
Q.no 23. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 24. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 25. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 26. The ------------ step is performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
Q.no 27. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 28. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
Q.no 30. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 31. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 33. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 35. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 38. Broadcasting is a powerful technique that allows numpy to work with
arrays of ------------- .
A : Same Shapes
B : Different Shapes
C : Same values
D : Different values
Q.no 39. If scatter diagram is drawn and all scatter points lie on a straight line
then it indicates-------
A : No correlation
B : Perfect correlation
C : Regression
D : Skewness
Q.no 40. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 41. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the
curve represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 45. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
D : Support vector machine
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 48. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
Q.no 50. Which function from numpy used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
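The difference between truncating and rounding is easiest to see on negative and half-way values. A small sketch with arbitrary sample values:

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.2, 1.5, 2.7])

# trunc drops the fractional part, moving each value toward zero
truncated = np.trunc(values)

# round goes to the nearest integer (exact halves go to the nearest even value)
rounded = np.round(values)

print(truncated)
print(rounded)
```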
Q.no 51. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 53. --------- is technique that duplicates smaller array to make dimensionality
and size of an array as the size and dimensionality of larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
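A short sketch of the idea described in the question, with arbitrary sample arrays. The smaller array behaves as if it were repeated to match the larger one, although NumPy does not actually need to copy it in memory.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
column = np.array([[100], [200]])   # shape (2, 1)

# row is stretched across both rows of matrix
row_sum = matrix + row

# column is stretched across all three columns of matrix
column_sum = matrix + column

print(row_sum)
print(column_sum)
```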
Q.no 54. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 57. Which of the following functions is not used to iterate over a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
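For reference, a minimal sketch of DataFrame iteration on a hypothetical two-row frame. It uses `items()`, which iterates over columns; `iteritems()` was the older name for the same behaviour and was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "marks": [70, 85]})

# iterrows yields (index, row-as-Series) pairs
row_pairs = [(idx, row["marks"]) for idx, row in df.iterrows()]

# itertuples yields one namedtuple per row
row_tuples = [(t.name, t.marks) for t in df.itertuples(index=False)]

# items yields (column label, column-as-Series) pairs
column_labels = [label for label, _ in df.items()]

print(row_pairs)
print(row_tuples)
print(column_labels)
```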
Q.no 58. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
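As an illustration, the sketch below compares log-spaced and linearly spaced points; the endpoints 0 and 3 are arbitrary and are interpreted by `logspace` as exponents of base 10.

```python
import numpy as np

# Four points from 10**0 to 10**3, evenly spaced on a log scale
log_points = np.logspace(0, 3, num=4)

# Four points from 0 to 3, evenly spaced on a linear scale
lin_points = np.linspace(0, 3, num=4)

print(log_points)  # 1, 10, 100, 1000
print(lin_points)  # 0, 1, 2, 3
```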
Q.no 59. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
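A hedged sketch of pairwise distances in SciPy, using three made-up 2-D points. `cdist` compares two collections of points, so passing the same set twice gives the full distance matrix; `pdist` is a related function that returns each pair within one set only once.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Full 3 x 3 matrix of Euclidean distances between every pair of points
all_pairs = cdist(points, points)

# Condensed form: one entry per unordered pair (0,1), (0,2), (1,2)
condensed = pdist(points)

print(all_pairs)
print(condensed)
```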
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 2. The procedure to organize items of a given collection into groups based on
some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 3. ------------- is fundamental library used for scientific computing
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 4. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 6. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
Q.no 8. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Bayes Theorem
B : Pythagorean Theorem
Q.no 11. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 12. If number of input features are 3 then optimal hyperplane in support
vector machine is -------------
A : Single point
B : Line
C : 2-D Plane
Q.no 13. The ---------------- method of a dataframe returns the first n rows of the dataframe.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 14. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 15. ----------------- analysis estimates the relationship between single dependent
variable and single independent variable
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 16. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
Q.no 17. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 18. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
Q.no 21. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 22. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 23. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
Q.no 24. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 25. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : 3
B : 5
C : 1
D : 10
C : Measures growth
Q.no 29. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : class distribution
B : test on an attribute
D : class labels
Q.no 31. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 32. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 33. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
Q.no 34. ------------ is 2-D data structure defined in pandas in which data arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
A : NoSQL data
B : YouTube data
Q.no 36. The ------------ step is performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 38. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 39. ------- is the basic data structure of pandas and can be thought of as an SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 41. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 42. The ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business but not with competitors.
A : Decision tree
C : Clustering
A : display()
B : head()
C : describe()
D : sort()
Q.no 44. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 45. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
A : Train- 70%, Test - 30%
Q.no 46. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 48. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 49. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 50. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 51. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 53. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 54. The ------- is characterized by a bell-shaped curve, and the area under the
curve represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 59. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 60. Determining the basic salary of an employee when his qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Answer for Question No 1. is b
13329_DATA ANALYTICS
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. The ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
Q.no 4. -------------- data does not fit into a data model due to variations in contents.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : numpy.random.ran()
B : rank
C : random.fill()
D : numpy.fillrandom()
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 10. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 12. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 13. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 14. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
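The array attributes that appear in these options can be inspected directly. A minimal sketch on an arbitrary 3-D array:

```python
import numpy as np

arr = np.zeros((2, 3, 4))

n_dims = arr.ndim      # number of dimensions (axes)
dims = arr.shape       # tuple giving the size along each dimension
n_items = arr.size     # total number of elements
item_type = arr.dtype  # data type of the elements

print(n_dims, dims, n_items, item_type)
```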
Q.no 15. In support vector machines, if there are 2 input features then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 18. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 19. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 20. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 21. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 23. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
Q.no 25. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 27. What is the use of the following function? plt.xlabel("Total Marks")
C : Measures growth
Q.no 29. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 30. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 31. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 32. The ---------- function is used to get the elementwise remainder of division of two arrays.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 35. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
A : 3
B : 5
C : 1
D : 10
A : 1
B : -1
C : 0
D : 2
A : KNN
C : Regression
D : Cluster analysis
Q.no 39. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : XML data
B : YouTube data
Q.no 41. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 42. The ------- is characterized by a bell-shaped curve, and the area under the
curve represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 43. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 44. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. Determining the basic salary of an employee when his qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 48. --------- is technique that duplicates smaller array to make dimensionality
and size of an array as the size and dimensionality of larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 49. Which function from numpy used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. Which of the following machine learning algorithms is used for market
basket analysis, that is, to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 52. The ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business but not with competitors.
A : Decision tree
C : Clustering
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 54. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 59. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : savefig()
C : Figure()
D : save_image()
Q.no 60. Which of the following functions is not used to iterate over a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Q.no 1. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 3. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : prod()
B : mult()
C : dot()
D : *
A : Single point
B : Line
C : 2-D Plane
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 7. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 8. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 11. The leaf nodes in decision trees return the ---------
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 13. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 14. The ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : Sympy
D : Scipy
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
A : KNN
C : Regression
D : Decision Tree
Q.no 19. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Java
B : Ruby
C : R
D : None of these
Q.no 23. ------------ is 2-D data structure defined in pandas in which data arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
Q.no 24. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 25. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 26. The -------- of a numpy array is a tuple of integers giving the size of the
array along each dimension.
A : axes
B : rank
C : shape
D : size
Q.no 27. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
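The label-based and integer-based indexers can be compared side by side. The sketch below uses a hypothetical frame with string labels; note that in current pandas `loc` and `iloc` are used with square brackets rather than called like functions, and `ix` was removed in pandas 1.0.

```python
import pandas as pd

df = pd.DataFrame({"marks": [70, 85, 90]}, index=["a", "b", "c"])

by_label = df.loc["b", "marks"]  # label-based: row labelled "b"
by_position = df.iloc[1, 0]      # integer-based: second row, first column

label_slice = df.loc["a":"b"]    # label slices include both end points
position_slice = df.iloc[0:1]    # integer slices exclude the stop position

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```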
Q.no 28. --------- in decision tree measures how much information a feature gives us
about the class
A : Information Gain
B : Posterior probability
C : Prior probability
D : probability
Q.no 29. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 30. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 31. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
Q.no 35. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 36. In matplotlib, the ------------- function groups smaller axes that can exist
together within a single figure.
A : subplot()
B : divide_figure()
C : add_fig()
D : group_fig()
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 39. ---------- function is used to add two NumPy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 40. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 41. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
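The arithmetic behind `subplot(nrows, ncols, index)` can be sketched without matplotlib itself: the index starts at 1 in the upper-left cell and increases left to right, then top to bottom. The helper name `subplot_position` below is a hypothetical illustration, not a matplotlib function.

```python
def subplot_position(nrows, ncols, index):
    """Return the zero-based (row, column) of a 1-based subplot index on an nrows x ncols grid."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must lie between 1 and nrows * ncols")
    # Cells are numbered row by row, so integer division gives the row
    # and the remainder gives the column.
    return divmod(index - 1, ncols)


# subplot(4, 3, 5): a grid of 4 rows and 3 columns, fifth cell.
pos_435 = subplot_position(4, 3, 5)

# subplot(2, 3, 3): first row, last column, i.e. the top-right cell.
pos_233 = subplot_position(2, 3, 3)
```

The same rule explains why the plot number can only range from 1 to nrows*ncols.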
Q.no 43. Which NumPy function is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 44. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 45. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 46. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 47. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 49. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 50. --------------- is basically extracting a particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
Q.no 51. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
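A small sketch of what traversing a decision tree looks like in code, assuming a hypothetical nested-dictionary representation (the keys `attribute` and `branches` and the toy weather tree are invented for illustration):

```python
# Internal nodes are dicts that test one attribute; leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rain": "stay in",
    },
}


def predict(node, sample):
    """Walk from the root towards a leaf, following the branch that matches each test."""
    while isinstance(node, dict):
        outcome = sample[node["attribute"]]
        node = node["branches"][outcome]
    return node


prediction = predict(tree, {"outlook": "sunny", "windy": False})
```

Each loop iteration moves one level down the tree, so a prediction costs at most as many tests as the tree is deep.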
Q.no 52. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 53. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 54. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 56. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 58. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 59. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Bayes Theorem
B : Pythagoras' Theorem
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 3. ------------ rule mining is a technique to identify underlying relations
between different items.
A : Classification
B : Regression
C : Clustering
D : Association
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 5. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : 1
B : -1
C : 0
D : 2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 8. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Un- Supervised
B : Supervised
C : Semi-supervised
D : group
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 11. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 12. To import data from a CSV file into a DataFrame, the ---------- function is provided
by the pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : Frrom_csv()
Q.no 13. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 15. In support vector machines, if there are 2 input features, then the decision
boundary or hyperplane is ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : ndarray
B : spatial
C : ndimage
D : special
Q.no 17. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 18. Numpy support this function to find trigonometric sine elementwise .
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 19. The procedure to organize items of a given collection into groups based
on some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
A : save image
B : read image
C : copy image
D : show image
Q.no 21. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 22. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 23. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 25. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 28. ------------------ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 29. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
A : class distribution
B : test on an attribute
Q.no 32. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 33. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 34. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 35. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 36. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
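To make the difference between support and confidence concrete, here is a minimal sketch in plain Python. The `transactions` list and the helper names `support` and `confidence` are hypothetical examples, not part of any library.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """How often the rule antecedent -> consequent holds among baskets with the antecedent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


support_bread = support({"bread"}, transactions)
support_bread_milk = support({"bread", "milk"}, transactions)
confidence_bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
```

Support measures how frequently an itemset appears in the dataset, while confidence measures how often the rule turns out to be true once the antecedent is present.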
Q.no 37. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : ndimage
B : ndarray
C : signal
D : io
Q.no 41. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 42. Which of the following task is not performed by Data Scientist.
C : Challenge results
D : Staff Recruitment
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 48. When an increase or decrease in one variable has no impact on the
other variable, it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
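The correlation types listed above can be illustrated with a short, standard-library-only sketch of the Pearson correlation coefficient. The function name `pearson_r` and the sample data are assumptions made for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


xs = [1, 2, 3, 4]
r_positive = pearson_r(xs, [2, 4, 6, 8])    # y rises with x
r_negative = pearson_r(xs, [8, 6, 4, 2])    # y falls as x rises
r_none = pearson_r(xs, [1, -1, -1, 1])      # no linear relationship
```

The coefficient always lies between -1 and +1, with values near 0 indicating no linear correlation.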
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
Q.no 50. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 51. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 52. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 53. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 54. In this type of clustering, instead of putting each data point into a separate
cluster, a probability or likelihood of that data point being in those clusters is
assigned.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 56. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 58. Catalog design is a complex process where the selection of items in a
business's catalog is often designed to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 59. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 60. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Answer for Question No 1. is a
13329_DATA ANALYTICS
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : imsave()
B : imread()
C : read()
D : None of these
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 7. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : save image
B : read image
C : copy image
D : show image
Q.no 10. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. Which of the following is a measure used in decision trees when selecting
the splitting criterion that partitions data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
Q.no 13. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : imsave()
B : imread()
C : save()
D : isave()
Q.no 16. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. ------- answers the question "What will happen in future?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. The ---------------- method of a DataFrame reads the first n rows from the DataFrame.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 19. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 21. -------- uses a tree structure to specify a sequence of decisions and
consequences.
A : KNN
B : Naïve Bayes
C : Regression
D : Decision Tree
Q.no 22. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 23. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 24. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 25. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
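The "store the training data, then look up the closest points" idea described in this question can be sketched in a few lines of plain Python. This is an illustrative toy, not a library implementation; the function name `knn_predict` and the sample points are assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k closest training points."""
    # "Training" is just keeping the data; all the work happens at prediction time.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training set with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

label_near_origin = knn_predict(points, labels, (2, 2))
label_far_corner = knn_predict(points, labels, (9, 9))
```

Because every prediction scans the whole training set, this simple approach becomes slow on large datasets, which is the trade-off for having no real training step.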
Q.no 28. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 29. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 32. In this type of algorithms inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 35. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 37. Support(B) =
Q.no 38. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 39. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 40. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 42. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 44. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 45. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 47. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 48. Which NumPy function is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 49. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 50. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 51. Which of the following statement will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 53. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 54. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 55. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
Q.no 56. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 57. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 58. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 60. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Answer for Question No 1. is a
13329_DATA ANALYTICS
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 3. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 4. ------------ type of plots show all individual data points without connected
with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : 1
B : -1
C : 0
D : 2
Q.no 8. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 10. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Bayes Theorem
B : Pythagoras' Theorem
Q.no 12. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 13. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : imsave()
B : imread()
C : save()
D : isave()
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 16. ---------------- library from python provides efficient versions of a large
number of machine learning algorithms.
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 18. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. ---------------- is about developing code to enable the machine to learn to
perform tasks, and its basic principle is the automatic modeling of the underlying
processes that have generated the collected data.
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 21. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
B : Support
C : Confidence
D : lift
A : 3
B : 5
C : 1
D : 10
A : ndimage
B : ndarray
C : signal
D : io
Q.no 25. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 26. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 28. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 29. From matplotlib------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 31. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 32. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 33. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the
centroid of the clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : 0
B : -1
C : 1
D : -2
Q.no 35. ------- changes the arrangement of items in an array so that the shape of the
array changes while maintaining the same number of dimensions.
A : numpy.reshape()
B : numpy.empty()
C : numpy.flatten()
D : numpy.ravel()
B : YouTube data
A : KNN
C : Decision trees
D : Cluster analysis
A : XML data
B : YouTube data
A : class distribution
B : test on an attribute
D : class labels
Q.no 41. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 44. Which of the following function is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 47. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 48. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 49. Which of the following function is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 51. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 52. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 54. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 57. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 58. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 59. ------------ algorithm models a series of logical If-Then- Else decision
statements, there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Q.no 1. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 3. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
Q.no 4. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : ndarray
B : spatial
C : ndimage
D : special
A : Single point
B : Line
C : 2-D Plane
A : Text files
B : Satellite data
C : Sensor data
D : Seismic imagery data
A : Matlab
B : Scilab
C : Scipy
D : Numpy
Q.no 9. The procedure to organize items of a given collection into groups based on
some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. ------------- function is used to plot a histogram using matplotlib library
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 13. Which of the following is a measure used in decision trees when selecting
the splitting criterion that partitions data in the best possible manner?
A : Probability
B : Gini Index
C : Regression
D : Association
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 16. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. -------------- data does not fit into a data model due to variations in contents.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : prod()
B : mult()
C : dot()
D : *
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 20. -------- library is built on the top of Numpy, SciPy and Matplotlib
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : 0
B : -1
C : 1
D : -2
Q.no 22. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : KNN
C : Decision trees
D : Cluster analysis
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 26. ---------- function used to get arrays elementwise remainder of division
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 28. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 30. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : NoSQL data
B : YouTube data
A : EPS
B : PDF
C : PNG
D : PS
Q.no 36. ------------------ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 38. In a ------------ the x-axis values are grouped into bins and each bin is treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 39. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 42. Which of the following functions is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 43. --------- is a technique that duplicates the smaller array so that its dimensionality
and size match the size and dimensionality of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
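The stretching of a smaller array to match a larger one can be seen in a short sketch, assuming NumPy is installed. The names `matrix` and `row` and their values are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)

# NumPy conceptually repeats `row` for each row of `matrix`
# so the two operands have compatible shapes before adding.
result = matrix + row
print(result.shape)  # (2, 3)
print(result)
```

No copy of `row` is written out by hand; the shapes `(2, 3)` and `(3,)` are compatible because their trailing dimensions match.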
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 46. The ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 47. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
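A small sketch of log-spaced values, assuming NumPy is installed. With the default base of 10, the start and stop arguments are exponents; the particular range below is arbitrary.

```python
import numpy as np

# Four points evenly spaced on a log scale from 10**0 to 10**3.
points = np.logspace(0, 3, num=4)
print(points)  # [   1.   10.  100. 1000.]
```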
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
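A hedged sketch of how the choice of join type changes which keys survive a merge, assuming pandas is installed. The frames `left` and `right` and the column names are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# Inner join: only keys present in both frames are kept.
inner = pd.merge(left, right, on="id", how="inner")

# Outer join: keys from either frame are kept, gaps become NaN.
outer = pd.merge(left, right, on="id", how="outer")

print(inner["id"].tolist())          # [2, 3]
print(sorted(outer["id"].tolist()))  # [1, 2, 3, 4]
```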
Q.no 49. Which of the following functions is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 52. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 53. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
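Subplot numbers run row by row from the top left, starting at 1. The helper below is a plain-Python sketch of that numbering rule; `subplot_position` is a hypothetical name, not a matplotlib function.

```python
def subplot_position(nrows, ncols, plot_number):
    """Return the zero-based (row, column) of a 1-based subplot number."""
    if not 1 <= plot_number <= nrows * ncols:
        raise ValueError("plot_number must be between 1 and nrows*ncols")
    return divmod(plot_number - 1, ncols)

# In a 2 x 3 grid, number 3 is the last column of the first row.
top_right = subplot_position(2, 3, 3)
bottom_left = subplot_position(2, 3, 4)
print(top_right, bottom_left)  # (0, 2) (1, 0)
```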
Q.no 57. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 59. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-bottom
B : Bottom-to-top
C : Left-to-right
D : Right-to-left
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
Q.no 2. ----------- data depends on a data model and resides in a fixed field within
a record.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. A ---------- plot displays information as a series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 5. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
Q.no 9. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 10. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 11. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Un- Supervised
B : Supervised
C : Association
D : correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : KNN
C : Regression
D : Decision Tree
A : Classification
B : Regression
C : Clustering
D : Association
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
Q.no 19. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 20. ----------------- analysis estimates the relationship between a single dependent
variable and a single independent variable.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 21. In a ------------ the x-axis values are grouped into bins and each bin is treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 22. ------- is the basic data structure of pandas; it can be thought of as a SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 23. From matplotlib, the ------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
A : 1
B : -1
C : 0
D : 2
Q.no 25. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 26. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 27. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : KNN
C : Regression
D : Cluster analysis
Q.no 29. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
A : Java
B : Ruby
C : R
D : None of these
A : 0
B : -1
C : 1
D : -2
Q.no 33. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
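As a rough illustration of an attribute selection measure, the sketch below computes entropy and information gain in plain Python. The function names and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child splits."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each branch is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # each branch is as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
print(gain_perfect, gain_useless)
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves the class mix unchanged gains nothing.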
Q.no 34. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 37. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 38. ------------------- is a one-dimensional array defined in pandas that can be
used to store any data type.
A : Dict
B : series
C : ndarray
D : list
Q.no 39. To read image from a file into an array --------------- function is used.
A : matplotlib.pyplot.imshow()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 42. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-bottom
B : Bottom-to-top
C : Left-to-right
D : Right-to-left
Q.no 44. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 45. Determining the basic salary of an employee when their qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 46. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
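A brief sketch of element-wise truncation, assuming NumPy is installed. Truncation drops the fractional part and moves each value toward zero, which differs from rounding; the sample values are arbitrary.

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.2, 1.7])

# numpy.trunc discards the fractional part of each element.
truncated = np.trunc(values)

# numpy.round goes to the nearest integer instead, shown for comparison.
rounded = np.round(values)

print(truncated)
print(rounded)
```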
Q.no 47. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 49. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 51. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 53. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The ---------- machine learning algorithm is used in cross marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Answer for Question No 1. is c
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 2. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 3. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 4. To save or write dataframe data into csv file -------- function is used
A : write_csv()
B : write_file()
C : csv_read()
D : to_csv()
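A minimal sketch of writing a DataFrame as CSV and reading it back, assuming pandas is installed. An in-memory `io.StringIO` buffer stands in for a file on disk, and the column names and values are invented.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "marks": [45, 38]})

# Write the frame as CSV text; index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame.
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(csv_text)
```

Passing a path such as a `.csv` file name instead of the buffer would write to disk in the same way.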
A : Regression
B : Decision trees
C : KNN
D : SVM
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 9. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 11. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 14. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 15. ------------ plots show all individual data points without connecting them
with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 16. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. Which of the following is a measure used in decision trees when selecting
the splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 19. -------------- charts represent categorical data with rectangular bars.
A : Bar
B : Line
C : Scatter
D : Histogram
A : Random
B : sequential
C : Same
Q.no 21. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
Q.no 22. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
Q.no 23. ------ is a classification technique that relies on the naïve assumption that
input variables are independent of each other.
A : KNN
B : Naïve Bayes
C : Regression
Q.no 24. ----------- phase of the data analytics lifecycle usually takes the longest
time.
A : Data Preparation
B : Model Planning
C : Model Building
D : Communicate Results
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
A : Java
B : Ruby
C : R
D : None of these
Q.no 27. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 28. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
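The array constructors from the two questions above can be compared in a short sketch, assuming NumPy is installed. The sizes are arbitrary.

```python
import numpy as np

# A 5 x 5 array filled with ones; the shape is passed as a tuple.
all_ones = np.ones((5, 5))

# An n x n array with ones on the main diagonal and zeros elsewhere.
identity = np.identity(3)

# numpy.eye builds the same square matrix when given a single size.
eye = np.eye(3)

print(all_ones.shape)                # (5, 5)
print(np.array_equal(identity, eye)) # True
```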
Q.no 29. From matplotlib, the ------------------ module is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 31. The ---------- function is used to add two NumPy arrays element-wise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 32. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 33. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Website data
B : YouTube data
Q.no 36. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : EPS
B : PDF
C : PNG
D : PS
Q.no 38. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
C : Measures growth
Q.no 40. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 41. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 42. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 44. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 46. Which of the following statements will create an axes at the top right
corner of the current figure
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 47. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
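A hedged sketch of applying a custom operation to a whole DataFrame, assuming pandas is installed. The function `add_bonus` and the column names are hypothetical.

```python
import pandas as pd

def add_bonus(frame, bonus):
    """Return a copy of frame with a new 'total' column (illustrative helper)."""
    return frame.assign(total=frame["marks"] + bonus)

df = pd.DataFrame({"marks": [40, 45]})

# pipe passes the entire DataFrame as the first argument of the function,
# followed by any extra arguments.
result = df.pipe(add_bonus, bonus=5)
print(result)
```

Because the helper uses `assign`, the original `df` is left unchanged, which makes such steps easy to chain.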
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 49. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 50. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 51. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 52. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
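The difference between picking one element and extracting a set of elements can be sketched as follows, assuming NumPy is installed. The array `values` is made up for illustration.

```python
import numpy as np

values = np.arange(10, 20)   # 10, 11, ..., 19

# Indexing selects a single element by position.
first = values[0]

# Slicing extracts a range of elements: start (inclusive) to stop (exclusive).
middle = values[2:5]

# A step can be added to take every second element.
every_other = values[::2]

print(first, middle, every_other)
```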
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 54. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 55. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 56. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-bottom
B : Bottom-to-top
C : Left-to-right
D : Right-to-left
Q.no 57. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 58. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
A : display()
B : head()
C : describe()
D : sort()
Q.no 60. Which of the following functions is used to split a figure into nrows*ncols
sub-axes.
A : plot()
B : draw()
C : bar()
D : subplot()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
A : Un- Supervised
B : Supervised
C : Both of these
D : None of These
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
B : YouTube data
D : Sensor data
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 7. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
Q.no 8. The ---------- function is used to get the positive square root of a NumPy array
element-wise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : imsave()
B : imread()
C : read()
D : None of these
Q.no 10. NumPy supports this function to find the trigonometric sine element-wise.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 12. In a NumPy array, array indices always start from --------
A : 1
B : -1
C : 0
D : 2
Q.no 13. ----------------- analysis estimates the relationship between a single dependent
variable and a single independent variable.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 14. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : Classification
B : Regression
C : Clustering
D : Association
Q.no 16. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : Density clustering
B : K-Mean clustering
C : Centroid clustering
D : Simple clustering
Q.no 20. A ---------- plot displays information as a series of data points connected by
straight lines.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 21. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
A : NoSQL data
B : YouTube data
C : Text File data
Q.no 23. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 24. A -----------------graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 25. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 26. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
Q.no 27. Which function returns the identity array with n x n dimension with its
main diagonal set to ones and all other elements to zero.
A : numpy.ones()
B : numpy.zeros()
C : numpy.fill()
D : numpy.identity()
Q.no 28. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : Pandas
B : Numpy
C : matplotlib
D : ndarray
Q.no 30. The process by which we estimate value of dependent variable on the
basis of one or more independent variables is called as -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 31. A ---------- is an example of the most widely used machine learning
algorithms; much of its popularity is because it can be adapted to almost any type
of data.
A : Clustering
B : Regression
C : Decision trees
D : Apriori
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 33. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 35. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. Pandas provides the ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 38. Broadcasting is a powerful technique that allows numpy to work with
arrays of ------------- .
A : Same Shapes
B : Different Shapes
C : Same values
D : Different values
Q.no 39. If scatter diagram is drawn and all scatter points lie on a straight line
then it indicates-------
A : No correlation
B : Perfect correlation
C : Regression
D : Skewness
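To illustrate what points lying exactly on a straight line imply, here is a plain-Python sketch of the Pearson correlation coefficient. The function name `pearson` and the sample data are assumptions for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points on an upward straight line
falling = [10 - 3 * x for x in xs]   # points on a downward straight line

r_up = pearson(xs, rising)      # very close to +1
r_down = pearson(xs, falling)   # very close to -1
print(r_up, r_down)
```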
Q.no 40. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 41. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 45. Catalog design is a complex process where the items in a
business's catalog are often selected to complement each other so that buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
D : Support vector machine
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-bottom
B : Bottom-to-top
C : Left-to-right
D : Right-to-left
Q.no 47. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 48. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole data set
should be divided into training and testing datasets. Which of the following is
a good proportion for train-test splitting?
Q.no 50. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 51. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 53. --------- is a technique that duplicates the smaller array so that its dimensionality
and size match the size and dimensionality of the larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
Q.no 54. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 56. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 57. Which of the following functions is not used to iterate over the rows of the
DataFrame.
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 58. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
Q.no 59. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 60. In unsupervised learning, scikit-learn uses the ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 2. The procedure to organize items of a given collection into groups based on
some similar features is called -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
Q.no 3. ------------- is the fundamental library used for scientific computing.
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 4. -------- function is used to add a title to each axis instance in a figure.
A : set_title()
B : get_title()
C : set_label()
D : title()
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 6. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
Q.no 8. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : from_csv()
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Bayes Theorem
B : Pythagoras' Theorem
Q.no 11. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 12. If the number of input features is 3, then the optimal hyperplane in a support
vector machine is a -------------
A : Single point
B : Line
C : 2-D Plane
Q.no 13. The ---------------- method of a DataFrame reads the first n rows from the DataFrame.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
Q.no 14. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 15. ----------------- analysis estimates the relationship between a single dependent
variable and a single independent variable.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
Q.no 16. The -------- library is built on top of NumPy, SciPy and Matplotlib.
A : Sympy
B : Scikit
C : Pandas
D : Numpy
Q.no 17. Which library from python is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
Q.no 18. A ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
Q.no 21. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 22. -------- is the measure of the likelihood that an event will occur in a
random experiment
A : Probability
B : Correlation
C : Regression
D : Sample
Q.no 23. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset i.e its
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
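The store-then-look-up idea described in the question above can be sketched in plain Python. The function `knn_predict`, the points and the labels are all hypothetical, and Euclidean distance with a majority vote is one common choice rather than the only one.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k closest training points."""
    # "Training" is just keeping the data; all the work happens at prediction time.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_points, train_labels, (2, 2))
near_b = knn_predict(train_points, train_labels, (7, 9))
print(near_a, near_b)  # a b
```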
Q.no 24. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 25. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : 3
B : 5
C : 1
D : 10
C : Measures growth
Q.no 29. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : class distribution
B : test on an attribute
D : class labels
Q.no 31. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 32. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 33. Pandas provide ----------- function as the entry point for all standard
database join operations while merging two DataFrame objects.
A : concat()
B : replace()
C : merge()
D : add()
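A short sketch of a database-style join with `pandas.merge` follows. The two DataFrames and their column names are invented for illustration; `on` names the key column and `how` selects which keys survive in the result.

```python
import pandas as pd

# Hypothetical tables sharing the key column "roll_no".
students = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
marks = pd.DataFrame({"roll_no": [2, 3, 4], "marks": [71, 88, 64]})

# how="inner" keeps only keys present in both frames.
inner = pd.merge(students, marks, on="roll_no", how="inner")

# how="outer" keeps keys from either frame and fills the gaps with NaN.
outer = pd.merge(students, marks, on="roll_no", how="outer")
```

Other accepted values of `how` include `"left"` and `"right"`, which keep all keys from one side only.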
Q.no 34. ------------ is a 2-D data structure defined in pandas in which data is arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
A : NoSQL data
B : YouTube data
Q.no 36. ------------ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 38. The process by which we estimate the value of a dependent variable on the
basis of one or more independent variables is called -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
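As a rough illustration of estimating a dependent variable from an independent one, the sketch below fits a straight line with `numpy.polyfit`. The variable names and the numbers are hypothetical, and the data is deliberately exactly linear so the fitted coefficients are easy to check.

```python
import numpy as np

# Hypothetical data: salary (in thousands) as an exact linear function of experience.
years_experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([30.0, 35.0, 40.0, 45.0, 50.0])  # 25 + 5 * years

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept].
slope, intercept = np.polyfit(years_experience, salary, deg=1)

# Estimate the dependent variable for an unseen value of the independent variable.
predicted = slope * 6.0 + intercept
```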
Q.no 39. ------- is the basic data structure of pandas and can be thought of as a SQL table or a
spreadsheet data representation.
A : Dataframe
B : series
C : list
D : ndarray
Q.no 40. ------------- regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 41. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 42. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : display()
B : head()
C : describe()
D : sort()
Q.no 44. Catalog design is a complex process where the selection of items in a
business's catalog is often designed so that items complement each other and buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 45. For testing the accuracy of a machine learning algorithm, the whole dataset
should be divided into training and testing datasets. Which of the following is a
good proportion for train-test splitting?
A : Train- 70%, Test - 30%
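A 70/30 split like the one in option A can be sketched without any library. The helper name `train_test_split` below is a hypothetical stand-in written for this example (it is not the scikit-learn function of the same name), and the fixed seed is only there to make the shuffle repeatable.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 10 records.
data = list(range(10))
train, test = train_test_split(data, test_fraction=0.3)
```

Shuffling before splitting matters when the records are stored in some meaningful order, for example sorted by class.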
Q.no 46. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
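The difference between taking a range of elements and taking a single one is easiest to see on a small array. The sketch below uses made-up arrays; slices follow the usual `start:stop:step` form with `stop` excluded.

```python
import numpy as np

a = np.arange(10, 20)          # values 10, 11, ..., 19

first_three = a[:3]            # elements at positions 0, 1, 2
every_other = a[2:8:2]         # positions 2, 4, 6
single = a[4]                  # plain indexing returns one element

grid = np.arange(12).reshape(3, 4)
block = grid[0:2, 1:3]         # rows 0-1, columns 1-2
```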
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 48. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 49. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 50. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 51. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 52. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
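For reference, the sketch below writes a figure out as PNG data. Note that in Matplotlib itself the method is spelled `savefig` (no underscore), both on `Figure` objects and as `pyplot.savefig`. The example saves into an in-memory buffer only to avoid touching the file system; a file name such as "plot.png" could be passed instead. The plotted values are arbitrary.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 6])

# Save the figure; a path string would write a file on disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

png_bytes = buffer.getvalue()
```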
Q.no 53. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 54. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 55. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
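The `subplot(nrows, ncols, plot_number)` call from the question above can be checked directly. In this sketch the grid position is read back from the axes object; plot numbers count from 1, left to right and top to bottom, while the row and column indices reported by Matplotlib count from 0. The `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.subplot(4, 3, 5)          # 4 rows x 3 columns, plot number 5
ax.set_title("plot number 5")

spec = ax.get_subplotspec()
grid_shape = spec.get_gridspec().get_geometry()      # (nrows, ncols)
row, col = spec.rowspan.start, spec.colspan.start    # zero-based position
plt.close(fig)
```

Plot number 5 in a 4 x 3 grid is the middle cell of the second row, which is why the zero-based row and column are both 1.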
Q.no 56. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 59. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
B : Association Rule Mining
C : Clustering
Q.no 60. To determine the basic salary of an employee when his qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
Answer for Question No 1. is b
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 3. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
Q.no 4. -------------- data does not fit into a data model due to variations in content.
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : numpy.random.ran()
B : rank
C : random.fill()
D : numpy.fillrandom()
A : YouTube data
B : Satellite data
C : Sensor data
Q.no 9. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 10. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Pandas
B : Numpy
C : Sympy
D : Scipy
Q.no 12. The -------- function creates a 2-D array with diagonal values 1 and rest
values zeros.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
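The array constructors named in the last few questions can be compared side by side. This is a minimal sketch with arbitrary shapes; note that `numpy.empty()` allocates memory without initializing it, so its contents are not predictable and it is left out here.

```python
import numpy as np

ones = np.ones((2, 3))      # 2 x 3 array filled with 1.0
zeros = np.zeros((2, 3))    # 2 x 3 array filled with 0.0
identity = np.eye(3)        # 3 x 3 array: 1.0 on the diagonal, 0.0 elsewhere
```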
Q.no 13. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
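The sketch below contrasts label-based and integer-position-based selection on a small, made-up DataFrame. One practical detail: although the options write them with parentheses, `loc` and `iloc` are indexers and are used with square brackets.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame({"marks": [55, 72, 90]}, index=["asha", "ravi", "meena"])

by_label = df.loc["ravi", "marks"]      # select by row label and column label
by_position = df.iloc[2, 0]             # select by integer row and column position

label_slice = df.loc["asha":"ravi"]     # label slices include both end points
position_slice = df.iloc[0:1]           # position slices exclude the stop index
```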
Q.no 14. The ---------- attribute specifies the number of dimensions or axes of the
array.
A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes
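The basic `ndarray` attributes mentioned in the options can be inspected on any array. The shape and dtype used below are arbitrary choices for illustration.

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=np.int32)

n_dims = a.ndim        # number of dimensions (axes)
shape = a.shape        # size along each dimension, as a tuple
n_items = a.size       # total number of elements
item_type = a.dtype    # element data type
```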
Q.no 15. In support vector machines, if there are 2 input features then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 17. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 18. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
Q.no 19. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 20. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 21. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
Q.no 23. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
Q.no 25. If X and Y are both independent of each other, then correlation
coefficient is ---------
A:1
B : -1
C:0
D:2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 27. What is the use of the following function? plt.xlabel("Total Marks")
C : Measures growth
Q.no 29. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 30. ----------- analysis finds the reasons behind success or failure in past
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 31. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 32. ---------- function is used to get the elementwise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
A : 3, 4, 5
B : 3,4,5,6
C : 2,3,4,5
D : 1,2,3,4,5
A : Correlation coefficient
B : Regression coefficient
C : Association coefficient
D : Probability
Q.no 35. The process by which we estimate the value of a dependent variable on the
basis of one or more independent variables is called -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
A:3
B:5
C:1
D : 10
A:1
B : -1
C:0
D:2
A : KNN
C : Regression
D : Cluster analysis
Q.no 39. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the centroid of the
clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : XML data
B : YouTube data
Q.no 41. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 42. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 43. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 44. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. To determine the basic salary of an employee when his qualification is given is
a ----------- problem
A : Correlation
B : Regression
C : Association
D : Qualitative
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 48. --------- is a technique that stretches a smaller array so that its dimensionality
and size match the dimensionality and size of a larger array.
A : Multiplication
B : Broadcasting
C : Addition
D : Flatten
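A small sketch of the technique described above: a one-row array is combined with a two-row array, and NumPy treats the smaller one as if it were repeated along the missing dimension. The arrays are arbitrary, and the `numpy.tile` line is included only to show the explicit equivalent; NumPy does not actually need to make that copy.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# The row is virtually repeated for each row of the matrix.
result = matrix + row

# Explicit version of the same operation, with the row physically duplicated.
explicit = matrix + np.tile(row, (2, 1))
```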
Q.no 49. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
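Truncation and rounding are easy to confuse, so the sketch below applies both to the same made-up values. `numpy.trunc` drops the fractional part (moving toward zero), while `numpy.round` goes to the nearest integer and resolves exact halves to the even neighbour.

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.2, 1.5, 2.7])

truncated = np.trunc(values)   # fractional part discarded, toward zero
rounded = np.round(values)     # nearest integer, halves go to the even value
```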
Q.no 50. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 51. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 52. ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 54. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 56. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 57. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 59. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 60. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Answer for Question No 1. is a
13329_DATA ANALYTICS
Q.no 1. Unsupervised learning makes sense of ------------- data without having any
predefined dataset for its training.
A : unlabeled
B : labeled
C : semi-labeled
D : Empty dataset
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 3. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : prod()
B : mult()
C : dot()
D:*
A : Single point
B : Line
C : 2-D Plane
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 7. ------ answers the questions like " How can we make it happen?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 8. Pandas provides the ----------- method for label-based indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 11. The leaf nodes in decision trees return the ---------
A : decision condition
B : class labels
C : decision on variables
D : test score
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 13. The -------- function creates a 2-D array with all values 0 (zeros).
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
Q.no 14. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Pandas
B : Numpy
C : Sympy
D : Scipy
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
A : KNN
C : Regression
D : Decision Tree
Q.no 19. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : Frrom_csv()
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : Java
B : Ruby
C:R
D : None of these
Q.no 23. ------------ is a 2-D data structure defined in pandas in which data is arranged in
rows and columns.
A : Series
B : Dataframe
C : ndarray
D : list
Q.no 24. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 25. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 26. The -------- of a numpy array is a tuple of integers giving the size of the
array along each dimension.
A : axes
B : rank
C : shape
D : size
Q.no 27. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 28. --------- in decision tree measures how much information a feature gives us
about the class
A : Information Gain
B : Posterior probability
C : Prior probability
D : probability
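To show what "how much information a feature gives us about the class" means numerically, the sketch below computes entropy and information gain for two made-up features. The function names and data are hypothetical; the definition used is the usual one, i.e. the entropy of the labels minus the weighted entropy of the groups produced by splitting on the feature.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy of `labels` minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
perfect_feature = ["a", "a", "b", "b"]   # separates the classes completely
useless_feature = ["a", "b", "a", "b"]   # each group is still a 50/50 mix

gain_perfect = information_gain(labels, perfect_feature)
gain_useless = information_gain(labels, useless_feature)
```

A decision tree learner that uses this measure would pick the feature with the larger gain as the next test.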
Q.no 29. The process by which we estimate the value of a dependent variable on the
basis of one or more independent variables is called -----------
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 30. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
Q.no 31. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
Q.no 35. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 36. In matplotlib, the ------------- function groups smaller axes that can exist
together within a single figure.
A : subplot()
B : divide_figure()
C : add_fig()
D : group_fig()
A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 39. ---------- function is used to add two NumPy arrays elementwise.
A : numpy.add(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.addition(x1,x2)
Q.no 40. In this type of clustering, each data point either belongs to a cluster
completely or not.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
Q.no 41. The statement subplot(4,3,5) will divide the figure into ------- and specify that
plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3x 4, 5
C : 3 x 5, 4
D : 5x 3, 4
Q.no 43. Which function from numpy is used to return the truncated value of the
input elementwise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
Q.no 44. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
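For comparison with linear spacing, the sketch below generates points evenly spaced on a log scale. The arguments are arbitrary; with the default base of 10, `start` and `stop` are exponents rather than the end values themselves.

```python
import numpy as np

# Four points from 10**0 to 10**3, evenly spaced in the exponent.
log_points = np.logspace(0, 3, num=4)

# The same interval with evenly spaced values, for contrast.
lin_points = np.linspace(1, 1000, num=4)
```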
Q.no 45. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
Q.no 46. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Q.no 47. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 48. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
Q.no 49. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
Q.no 50. --------------- is basically extracting particular set of elements from an array.
A : Slicing
B : indexing
C : sorting
D : broadcasting
Q.no 51. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 52. Which of the following functions is not used to iterate over the rows of the
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 53. Which of the following machine learning algorithms is used for market
basket analysis, i.e. to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
C : Clustering
Q.no 54. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 56. Which of the following algorithms is used in Economics, Finance, Biology,
etc. to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 58. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 59. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Answer for Question No 1. is a
13329_DATA ANALYTICS
A : Bayes Theorem
B : Pythagoras' Theorem
A : hist()
B : bar()
C : pie()
D : scatter()
Q.no 3. ------------ rule mining is a technique to identify underlying relations
between different items.
A : Classification
B : Regression
C : Clustering
D : Association
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
Q.no 5. To import data from excel file into a dataframe ---------- function is
provided by pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A:1
B : -1
C:0
D:2
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 8. ---------- function is used to get the positive square root of a NumPy array
elementwise.
A : numpy.sqrt(x1)
B : numpy.mod(x1)
C : numpy.square(x1)
D : numpy.find(x1,2)
A : Un- Supervised
B : Supervised
C : Semi-supervised
D : group
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 11. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 12. To import data from csv file into a dataframe ---------- function is provided
by pandas package.
A : read_csv()
B : read_file()
C : csv_read()
D : Frrom_csv()
Q.no 13. The -------- function creates a 2-D array with all values 1.
A : numpy.ones()
B : numpy.zeros()
C : numpy.eye()
D : numpy.empty()
A : Un- Supervised
B : Supervised
C : Association
D : correlation
Q.no 15. In support vector machines, if there are 2 input features then the decision
boundary or hyperplane is a ---------------.
A : 2-D plane
B : 3-D plane
C : Line
D : point
A : ndarray
B : spatial
C : ndimage
D : special
Q.no 17. ------------ uses a tree structure to specify sequences of decisions and
consequences.
A : Regression
B : Decision trees
C : KNN
D : SVM
Q.no 18. Numpy support this function to find trigonometric sine elementwise .
A : numpy.sin()
B : numpy.cosine()
C : numpy.tangent()
D : numpy.rad2sin(x1)
Q.no 19. The procedure to organize items of a given collection into groups based
on some similar features called as -------------
A : Regression
B : Clustering
C : Decision Trees
D : Association
A : save image
B : read image
C : copy image
D : show image
Q.no 21. -------------- models search the data space for areas of varied density of data
points in the data space.
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
Q.no 22. Pandas provides the ----------- method for purely integer-based
indexing.
A : iloc()
B : loc()
C : ix()
D : xloc()
Q.no 23. To rotate an image -------- function is used from scipy library.
A : rotation()
B : scipy.move()
C : scipy.ndimage.rotate()
D : scipy.flip()
A : KNN
C : Decision trees
D : Cluster analysis
Q.no 25. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
A : Non-linear
B : Linear
C : Both of these
D : None of These
Q.no 28. ------------------is a flow-chart like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.
A : Decision tree
C : Clustering
Q.no 29. Which of the following is not used for 2-D Visualisation?
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
A : class distribution
B : test on an attribute
Q.no 32. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 33. A ------------ is a supervised machine learning algorithm which relies on the
assumption of feature independence to classify input data.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
Q.no 34. ---------- function is used to get the elementwise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 35. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
Q.no 36. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
Q.no 37. --------- function from matplotlib.pyplot library plots bar graph for given
values of x and y.
A : plot()
B : draw()
C : bar()
D : linedraw()
A : set_title()
B : set_lable()
C : set_xlabel()
D : get_xlabel()
A : ndimage
B : ndarray
C : signal
D : io
Q.no 41. Apriori algorithm uses breadth first search and ------------structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Q.no 42. Which of the following tasks is not performed by a Data Scientist?
C : Challenge results
D : Staff Recruitment
Q.no 43. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top - to - bottom
B : Bottom- to - Top
C : Left- to Right
D : Right - to - Left
Q.no 44. Which of the following statements will create an axes at the top right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 48. When an increase or decrease in one variable has no impact on the
other variable, then it is ------------
A : Perfect correlation
B : No Correlation
C : Positive Correlation
D : Negative Correlation
Q.no 49. For testing the accuracy of a machine learning algorithm, the whole dataset
should be divided into training and testing datasets. Which of the following is a
good proportion for train-test splitting?
Q.no 50. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 51. Plot_number parameter from subplot() function can range from 1 to ------
A : nrows*ncols
B : max
C : nrows
D : ncols
Q.no 52. ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 53. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
Q.no 54. In this type of clustering, instead of putting each data point into a separate
cluster, a probability or likelihood of that data point being in those clusters is
assigned.
A : Hard clustering
B : Soft Clustering
C : Medium clustering
D : Simple clustering
A : Regression
B : Continuous
C : Regressand
D : Independent
Q.no 56. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 58. Catalog design is a complex process where the selection of items in a
business's catalog is often designed so that items complement each other and buying
one item will lead to buying another. So these items are often complements or
very related. Which algorith
A : Decision tree
C : Clustering
Q.no 59. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
Q.no 60. To save a figure into a file we can use ------------ method in the figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Answer for Question No 1. is a
13329_DATA ANALYTICS
A : -1 and +1
B : -1 and 0
C : 0 and 1
D : 0 and infinite
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : imsave()
B : imread()
C : read()
D : None of these
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 7. The ----------- algorithm is based on the fact that the algorithm uses prior
knowledge to find frequent item set.
A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori
B : Un-Structured data
C : Semi-Structured data
D : Scattered
A : save image
B : read image
C : copy image
D : show image
Q.no 10. Choose correct option for machine generated unstructured data.
A : Website data
B : YouTube data
D : Sensor data
Q.no 11. Which function is used to give title for the axes.
A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()
Q.no 12. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Information Gain
B : Probability
C : Regression
D : Association
Q.no 13. ------------ means part of population chosen for participation in the study
A : Population
B : Sample
C : Association
D : Correlation
A : YouTube data
B : Satellite data
C : Sensor data
A : imsave()
B : imread()
C : save()
D : isave()
Q.no 16. ------------ chart is a circular plot divided into slices to show numerical
proportion.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 17. ------- answers the question "What will happen in future?"
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 18. The ---------------- method of a DataFrame returns the first n rows of the DataFrame.
A : head(n)
B : tail(n)
C : first(n)
D : start(n)
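A quick sketch of peeking at the start and end of a DataFrame follows. The frame and its single column are made up for illustration; when `n` is omitted, both methods default to 5 rows.

```python
import pandas as pd

# Hypothetical frame with ten rows.
df = pd.DataFrame({"roll_no": range(1, 11)})

first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows
default_head = df.head()   # first 5 rows by default
```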
Q.no 19. ----------- refers to the graphical representation of information and data.
A : Data Visualization
B : Data mining
C : Data warehousing
D : Data Structures
A : NumPy
B : SciPy
C : sklearn
D : None of these
Q.no 21. -------- uses a tree structure to specify a sequence of decisions and
consequences.
A : KNN
B : Naïve Bayes
C : Regression
D : Decision Tree
Q.no 22. Which statement will create 5 x 5 array filled with all values 1
A : x=numpy.ones((5,5))
B : x=numpy.ones(5)
C : x=numpy.zeros((5,5))
D : x=numpy.eye((5,5))
Q.no 23. In matplotlib library ------------- module supports basic image loading,
rescaling and display operations.
A : picture
B : image
C : pyplot
D : sympy
Q.no 24. ---------- function is used to get the elementwise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 25. In ------------ the x-axes are grouped into bins and each bin will be treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
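The binning step described in the question above can be reproduced numerically with `numpy.histogram`, which counts how many values fall into each bin. The marks and bin edges below are invented for illustration; each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import numpy as np

# Hypothetical marks and hand-picked bin edges.
marks = np.array([35, 42, 47, 58, 61, 66, 73, 79, 88, 94])
bin_edges = [30, 50, 70, 90, 100]

# counts[i] is the number of marks in the bin from edges[i] to edges[i + 1].
counts, edges = np.histogram(marks, bins=bin_edges)
```

Plotting functions such as `matplotlib.pyplot.hist` perform the same counting before drawing the bars.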
A : Java
B : Ruby
C:R
D : None of these
Q.no 27. The ----- algorithm is the simplest machine learning algorithm, in which
building the model consists only of storing the training dataset. To make a
prediction for a new data point, the algorithm finds the closest data points in the
training dataset, i.e., its nearest neighbors.
A : Apriori
B : K-Nearest Neighbors
C : K-Means
D : Decision Trees
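The "store the training data, then look up the closest points" idea can be sketched in a few lines. This is a simplified illustration under assumed toy data, not a production implementation; the function name `knn_predict` and the choice of Euclidean distance with majority voting are assumptions.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k closest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to each point
    nearest = np.argsort(distances)[:k]                   # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# "Training" is just storing the data (hypothetical two-cluster example).
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = ["a", "a", "a", "b", "b", "b"]

pred_a = knn_predict(X_train, y_train, np.array([1.1, 1.0]))
pred_b = knn_predict(X_train, y_train, np.array([5.0, 5.0]))
print(pred_a, pred_b)
```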
Q.no 28. The ------------------ module from matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 29. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the centroid of the
clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : Classification
B : Regression
C : Clustering
D : Naïve Bayes
Q.no 32. In this type of algorithm, inputs are provided but not the desired output.
A : Cluster analysis
C : Decision trees
D : Naïve Bayes
A : KNN
C : Regression
D : Decision Tree
Q.no 34. Which of the following is used as attribute selection measure in decision
tree algorithms?
A : Information Gain
B : Posterior probability
C : Prior probability
D : Support
Q.no 35. ----------- analysis finds the reasons behind success or failure in the past.
A : Descriptive
B : Prescriptive
C : Predictive
D : Probability
Q.no 36. A ----------------- graph is a circular plot, divided into slices to show numerical
proportions.
A : Bar
B : Scatter
C : pie
D : line
Q.no 37. Support(B) =
Q.no 38. ----------- is not one of the key data science skills.
A : Statistics
B : Machine Learning
C : Data Visualization
D : software tester
Q.no 39. ------------ is an indication of how frequently the itemset appears in the
dataset in association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
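To make the idea of itemset frequency concrete, here is a minimal sketch that computes support as the fraction of transactions containing an itemset. The transactions and the helper name `support` are invented for this example.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


support_bread = support({"bread"}, transactions)
support_bread_milk = support({"bread", "milk"}, transactions)
print(support_bread, support_bread_milk)
```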
Q.no 40. When data are collected in a statistical study for only a portion or subset
of all elements of interest we are using
A : Sample
B : Parameter
C : Population
D : Probability
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 42. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 43. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 44. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 45. Which function returns an ndarray object that contains numbers that
are evenly spaced on a log scale?
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
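A brief sketch of generating values evenly spaced on a log scale. The endpoints and the number of points are arbitrary choices for the example.

```python
import numpy as np

# Four points from 10**0 to 10**3; the exponents are evenly spaced, base 10 by default.
values = np.logspace(0, 3, num=4)

# The same exponents spaced linearly, for comparison.
exponents = np.linspace(0, 3, num=4)

print(values)
print(exponents)
```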
Q.no 46. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-Bottom
B : Bottom-to-Top
C : Left-to-Right
D : Right-to-Left
Q.no 47. -------- is an unsupervised algorithm used for frequent itemset mining.
A : Apriori
C : Decision trees
D : Cluster analysis
Q.no 48. Which numpy function is used to return the truncated value of the
input, element-wise?
A : round()
B : trunc()
C : del()
D : remove_decimal()
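Truncation drops the fractional part toward zero, which differs from rounding for values such as 1.7. The sketch below uses arbitrary sample values to show the difference.

```python
import numpy as np

a = np.array([-1.7, -0.2, 0.2, 1.7])

truncated = np.trunc(a)   # fractional part discarded, toward zero
rounded = np.round(a)     # nearest integer, for comparison

print(truncated)
print(rounded)
```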
Q.no 49. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by-------------
A : Coefficient of Correlation
B : Coefficient of Determination
D : Probability
Q.no 50. Which of the following functions is not used to iterate over the rows of a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 51. Which of the following statements will create an axes at the top-right
corner of the current figure?
A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)
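Subplot indices count from 1, left to right and then top to bottom, so the position of a given index in the grid can be checked programmatically. The sketch below assumes a non-interactive backend ("Agg") so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment, no GUI window needed
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.subplot(2, 3, 3)   # 2 rows x 3 columns grid, index 3

spec = ax.get_subplotspec()
nrows, ncols = spec.get_gridspec().get_geometry()
row = spec.rowspan.start    # 0-based row of this axes
col = spec.colspan.start    # 0-based column of this axes

print(nrows, ncols, row, col)
plt.close(fig)
```

Row 0 with the last column index corresponds to the top-right cell of the grid.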
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 53. --------- function performs the custom operations for the entire dataframe.
A : function()
B : surutine()
C : rutine()
D : pipe()
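A minimal sketch of applying a user-defined function to a whole DataFrame through `pipe()`. The function `add_bonus`, its `amount` parameter, and the data are hypothetical.

```python
import pandas as pd


def add_bonus(frame, amount):
    """Return a copy of the frame with `amount` added to every salary."""
    out = frame.copy()
    out["salary"] = out["salary"] + amount
    return out


df = pd.DataFrame({"name": ["A", "B"], "salary": [100, 200]})

# pipe() passes the whole DataFrame as the first argument of the function.
result = df.pipe(add_bonus, amount=50)
print(result)
```

Because `pipe()` returns whatever the function returns, several such calls can be chained one after another.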
Q.no 54. The --------- argument of merge function while merging two dataframes
specifies which keys are to be included in the resulting dataframe.
A : right
B : on
C : sort
D : how
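The effect of the `how` argument on which keys survive a merge can be seen with two small frames. The column names and values below are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

inner = pd.merge(left, right, on="key", how="inner")      # keys present in both frames
outer = pd.merge(left, right, on="key", how="outer")      # keys present in either frame
left_join = pd.merge(left, right, on="key", how="left")   # all keys from the left frame

print(inner)
print(outer)
print(left_join)
```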
Q.no 55. Which of the following machine learning algorithms is used for market
basket analysis, i.e., to analyze the association of purchased items in a single
basket or single purchase?
A : Decision tree
Q.no 56. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 57. To save a figure to a file, we can use the ------------ method of the Figure class
of matplotlib.pyplot.
A : save()
B : save_fig()
C : Figure()
D : save_image()
Q.no 58. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)
Q.no 60. The Apriori algorithm uses breadth-first search and a ------------ structure to
count candidate item sets efficiently.
A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree
Answer for Question No 1. is a
13329_DATA ANALYTICS
Time : 1hr
Max Marks : 50
N.B
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : Simple Regression
B : Multiple regression
C : Correlation
D : Probability
A : KNN
B : Naïve Bayes
C : Decision Trees
D : Cluster analysis
Q.no 3. A ------------ chart is a circular plot divided into slices to show numerical
proportions.
A : Bar
B : Line
C : Scatter
D : Pie
Q.no 4. ------------ plots show all individual data points without connecting
them with lines.
A : Bar
B : Line
C : Scatter
D : Histogram
A : PCA
B : Decision Tree
C : Linear Regression
D : Naive Bayesian
A : 0 and 1
B : -1 and +1
C : -1 and 0
D : 0 and infinite
A : 1
B : -1
C : 0
D : 2
Q.no 8. To import data from an Excel file into a DataFrame, the ---------- function is
provided by the pandas package.
A : read_csv()
B : read_file()
C : read()
D : read_excel()
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 10. Which of the following is not a raster image file format?
A : PNG
B : JPG
C : BMP
D : PDF
A : Bayes Theorem
B : Pythagoras' Theorem
Q.no 12. ---- is a technique to learn from examples and experience, without being
explicitly programmed.
A : Machine Learning
B : Software Testing
C : Computer Science
D : Data mining
Q.no 13. The -------- library is built on top of NumPy, SciPy and Matplotlib.
A : Sympy
B : Scikit
C : Pandas
D : Numpy
A : imsave()
B : imread()
C : save()
D : isave()
A : Pie charts
B : Bar charts
C : Andrews curves
D : Scatter plots
Q.no 16. The ---------------- Python library provides efficient versions of a large
number of machine learning algorithms.
A : Pandas
B : Numpy
C : Scikit-Learn
D : image
Q.no 18. Which Python library is used for implementing machine learning
algorithms?
A : Scikit-Learn
B : Pandas
C : Matplotlib
D : Numpy
A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered
Q.no 20. ---------------- is about developing code to enable the machine to learn to
perform tasks, and its basic principle is the automatic modeling of the underlying
processes that have generated the collected data.
A : Data Science
B : Data Analytics
C : Data Warehousing
D : Data mining
Q.no 21. -------- is the measure of the likelihood that an event will occur in a
random experiment.
A : Probability
B : Correlation
C : Regression
D : Sample
B : Support
C : Confidence
D : lift
A : 3
B : 5
C : 1
D : 10
A : ndimage
B : ndarray
C : signal
D : io
Q.no 25. ------ module from sklearn gathers popular unsupervised clustering
algorithms.
A : sklearn.covariance
B : sklearn.base
C : sklearn.neighbors
D : sklearn.cluster
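As a rough sketch of using a clustering estimator from scikit-learn, the example below fits k-means on two well-separated groups of points. The data, the number of clusters, and the `random_state` are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two obvious groups.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)   # cluster index assigned to each point

print(labels)
print(model.cluster_centers_)
```

No class labels are supplied; the estimator infers the grouping from the input points alone.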
Q.no 26. The ---------- function is used to get the element-wise remainder of array division.
A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)
Q.no 27. Which of the following plots is not used for multidimensional
visualization?
A : Andrews Curves
B : Parallel Chart
C : Deviation Chart
D : Bar
Q.no 28. --------------- searches for the linear optimal separating hyperplane for
separation of the data using essential training tuples called support vectors
A : Decision tree
C : Clustering
Q.no 29. The ------------------ module from matplotlib is used for plotting various plots.
A : Scilearn
B : Pyplot
C : Scilab
D : Matlab
Q.no 30. In a ------------, the x-axis values are grouped into bins and each bin is treated
as a category.
A : Bar
B : Line
C : Scatter
D : Histogram
Q.no 31. If X and Y are both independent of each other, then correlation
coefficient is ---------
A : 1
B : -1
C : 0
D : 2
Q.no 32. ----------- is an indication of how often the rule has been found to be true in
association rule mining.
A : Confidence
B : Support
C : Lift
D : None of These
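How often a rule holds can be computed as the share of transactions containing the antecedent that also contain the consequent. The sketch below uses invented transactions and a hypothetical helper `confidence`.

```python
# Hypothetical transactions for the rule {diapers} -> {wipes}.
transactions = [
    {"diapers", "wipes"},
    {"diapers", "wipes", "milk"},
    {"diapers"},
    {"milk"},
]


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


rule_confidence = confidence({"diapers"}, {"wipes"}, transactions)
print(rule_confidence)
```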
Q.no 33. In which of the following types of clustering algorithm is the notion of
similarity derived from the closeness of a data point to the centroid of the
clusters?
A : Connectivity models
B : Centroid models
C : Distribution models
D : Density models
A : 0
B : -1
C : 1
D : -2
Q.no 35. ------- changes the arrangement of items in an array so that the shape of
the array changes while maintaining the same number of dimensions.
A : numpy.reshape()
B : numpy.empty()
C : numpy.flatten()
D : numpy.ravel()
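The contrast between rearranging an array into a new shape and collapsing it to one dimension can be sketched as follows. The array contents are arbitrary.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # 2 x 3 array holding 0..5

b = np.reshape(a, (3, 2))        # same six items, new 3 x 2 shape, still 2-D
flat = a.ravel()                 # collapsed to 1-D, for comparison

print(b)
print(flat)
```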
B : YouTube data
A : KNN
C : Decision trees
D : Cluster analysis
A : XML data
B : YouTube data
A : class distribution
B : test on an attribute
D : class labels
Q.no 41. Which of the following algorithms is used in Economics, Finance, Biology,
etc., to model relationships between parameters of interest?
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
A : Regression
B : Continuous
C : Regressand
D : Independent
A : Regressor
B : Continuous
C : Regressand
D : Estimated
Q.no 44. Which of the following functions is not used to iterate over the rows of a
DataFrame?
A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
Q.no 45. ------------ analysis is a set of statistical processes for estimating the
relationships among dependent and independent variables.
A : Regression
B : Decision tree
C : KNN
D : None of These
Q.no 46. In unsupervised learning, scikit learn uses ------------------- method to infer
properties of the data.
A : extract()
B : transform()
C : infer()
D : classify()
Q.no 47. To reach the final point and make a prediction, decision trees must
be traversed from ----------
A : Top-to-Bottom
B : Bottom-to-Top
C : Left-to-Right
D : Right-to-Left
Q.no 48. The ------- is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.
A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability
Q.no 49. Which of the following functions is used to split a figure into nrows*ncols
sub-axes?
A : plot()
B : draw()
C : bar()
D : subplot()
B : Selecting dataset
C : Data preprocessing
D : Data modeling
Q.no 51. ----------- function from scipy is used to calculate the distance between all
pairs of points in a given set.
A : scipy.spatial.distance()
B : scipy.spatial.distance.measure()
C : scipy.spatial.distance.cdist()
D : distance(x1,y1)
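A short sketch of computing all pairwise distances within a set of points by passing the same array twice to `cdist`. The points are chosen so the Euclidean distances (the default metric) are easy to verify by hand.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical points lying on a line, 5 units apart.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Entry [i, j] is the distance between points[i] and points[j].
d = cdist(points, points)
print(d)
```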
Q.no 52. Which function returns an ndarray object that contains the numbers that
are evenly spaced on a log scale.
A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()
A : Axes
B : Canvas
C : Figure
D : FigureCanvas
Q.no 54. The ---------- machine learning algorithm is used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.
A : Decision tree
C : Clustering
A : Entropy
B : Support
C : Confidence
D : lift
Q.no 57. Determining the basic salary of an employee when the employee's qualification
is given is a ----------- problem.
A : Correlation
B : Regression
C : Association
D : Qualitative
Q.no 58. The statement subplot(4,3,5) will divide the figure into a ------- grid and specify
that plotting should be done on plot number -----------
A : 4 x 3, 5
B : 3 x 4, 5
C : 3 x 5, 4
D : 5 x 3, 4
Q.no 59. The ------------ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and response variables.
A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes
Q.no 60. --------- function is used to display an image through an external viewer in
scipy.
A : display()
B : imread()
C : imshow()
D : show()
Answer for Question No 1. is a