Bda MCQ Set

The document contains a set of 50 multiple choice questions related to big data analytics concepts and technologies such as Hadoop, MapReduce, HDFS, YARN, Apache Spark, machine learning, and NoSQL databases. The questions cover topics including the main components of Hadoop, characteristics of big data, MapReduce algorithms, HDFS architecture, Spark SQL, machine learning algorithms and techniques, and NoSQL databases.

Uploaded by

akshay Baleshgol

BDA MCQ SET:

1. What are the main components of Hadoop?
A. MapReduce B. HDFS C. YARN D. All of the above
2. Big data analytics works on unstructured data, where no specific pattern of the data
is defined.
A. True B. False C. Can’t Say D. None of the above
3. Identify the incorrect big data technology.
A. Apache Pytorch B. Apache Kafka C. Apache Hadoop D. Apache Spark
4. Identify among the options below which is a general-purpose computing model and runtime
system for distributed data analytics.
A. HDFS B. MapReduce C. Oozie D. All of the above
5. Big data analysis does all of the following except:
A. Spreads data B. Analyzes data C. Organizes data D. Collects data
6. What is NOT a characteristic of big data?
A. Volume B. Variety C. Vision D. Velocity
7. Pig is a Hadoop-based open-source platform for analyzing large-scale datasets via its
own SQL-like language, _______.
A. Pig Latin B. Pig German C. Pig Roman D. Pig Italian
8. The key aspect of the MapReduce algorithm is that if every Map and Reduce is
independent of all other ongoing Maps and Reduces in the network, the operation will run in
______ keys and lists of data.
A. series on same B. series on different C. parallel on different D. parallel on same
9. In Hadoop MapReduce, _____ is a Java class that comes with several methods to retrieve
keys and values by iterating over them among the data splits.
A. Mapper B. RecordReader C. Reporter D. Record Collect
10. Which of the following failure scenarios makes HDFS unavailable?
A. TaskTracker failure B. JobTracker failure C. NameNode failure D. DataNode failure
11. Hadoop MapReduce is a popular _____ for easily writing applications that process vast
amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes).
A. Spring framework B. Java framework C. Django framework D. Web framework
12. Which is not a way to link R and Hadoop?
A. RHIPE B. RHadoop C. Hadoop streaming D. RHDFS
13. The RHIPE package uses the ________ technique to perform data analytics over Big
Data.
A. Divide and recombine B. Divide and conquer C. Integrate and recombine D. None of the above
14. The ____ phase of the data analytics lifecycle usually takes the longest time.
A. Phase 2: Data Preparation B. Phase 3: Model Planning
C. Phase 4: Model Building D. Phase 5: Communicate Results
15. The data analytics project life cycle stages in correct sequence are __________
A. Identifying the problem > designing the requirements > pre-processing data > performing analytics over data > visualizing data
B. Identifying the problem > designing the requirements > performing analytics over data > pre-processing data > visualizing data
C. Identifying the problem > performing analytics over data > designing the requirements > pre-processing data > visualizing data
D. Identifying the problem > visualizing data > designing the requirements > pre-processing data > performing analytics over data
16. Which of the following is/are true about the Random Forest and Gradient Boosting ensemble
methods?
1. Both methods can be used for classification tasks
2. Random Forest is used for classification, whereas Gradient Boosting is used for regression tasks
3. Random Forest is used for regression, whereas Gradient Boosting is used for classification tasks
4. Both methods can be used for regression tasks
A. 1 B. 2 C. 2 and 3 D. 1 and 4
17. In Random Forest you can generate hundreds of trees (say T1, T2, ..., Tn) and then
aggregate the results of these trees. Which of the following is true about an individual tree (Tk) in
Random Forest?
1. Individual tree is built on a subset of the features
2. Individual tree is built on all the features
3. Individual tree is built on a subset of observations
4. Individual tree is built on the full set of observations
A. 1 and 3 B. 1 and 4 C. 2 and 3 D. 2 and 4
18. The primary Machine Learning API for Spark is now the _____-based API.
A. DataFrame B. Dataset C. RDD D. All of the above
19. Which of the following is a module for structured data processing?
A. GraphX B. MLlib C. SparkSQL D. SparkR
20. SparkSQL translates commands into code. This code is processed by ________
A. Driver nodes B. Executor nodes C. Cluster manager D. None of the above
21. SparkSQL plays the main role in the optimization of queries.
A. True B. False C. Can’t Say D. None is correct
22. Which of the following is not a SparkSQL query execution phase?
A. Analysis B. Logical Optimization C. Execution D. Physical Planning
23. What is an action in a Spark RDD?
A. A way to send results from the executors to the driver
B. Takes an RDD as input and produces one or more RDDs as output
C. Creates one or many new RDDs
D. All of the above
24. Which of the following is true about a narrow transformation?
A. The data required to compute resides on multiple partitions.
B. The data required to compute resides on a single partition.
C. Both
D. None of the above
25. __________ is a distributed machine learning framework on top of Spark.
A. MLlib B. GraphX C. Spark Streaming D. RDDs
26. Which of the following components of the Spark runtime architecture provides resources to
execute a task?
A. Cluster manager B. Worker nodes C. Driver program D. Spark context
27. Among the following options, identify the one which is not a type of learning.
A. Unsupervised learning B. Reinforcement learning
C. Supervised learning D. Semi unsupervised learning
28. Identify the type of learning in which labeled training data is used.
A. Unsupervised learning B. Reinforcement learning
C. Semi unsupervised learning D. Supervised learning
29. Machine learning is a subset of which of the following?
A. Artificial Intelligence B. Deep Learning C. Data Learning D. None of the above
30. Which of the following machine learning techniques helps in detecting the outliers in
data?
A. Classification B. Clustering C. Anomaly Detection D. All of the above
31. Which of the following are common classes of problems in machine learning?
A. Regression B. Classification C. Clustering D. All of the above
32. What is a content-based recommendation system?
A. Tries to recommend items to users based on a profile built from their preferences
B. Similarity among items
C. Similarity among users buying, watching, or enjoying something
D. All of the above
33. Machine Learning is a field of AI consisting of learning algorithms that __________
A. At executing some task B. Over time with experience
C. Improve their performance D. All of the above
34. Which of the following machine learning algorithms is based upon the idea of bagging?
A. Decision tree B. Random forest C. Classification D. Regression
35. Among the following options, identify the one which is false regarding regression.
A. It is used for prediction
B. It is used for interpretation
C. It relates inputs to outputs
D. It discovers causal relationships
36. Which application of social network data analysis is used by a customer retention
manager?
A. Business intelligence B. Marketing
C. Product design and development D. Insurance fraud
37. Identify the technologies that enable fraud identification and the predictive modeling
process:
A. Text mining B. Social media data analysis C. Regression analysis D. All of the above
38. Who among the following do you think would be able to deal with the growing number of
data sources efficiently?
A. Business developer B. Data scientist C. Sales executive D. Web designer
39. Which of the following is a disadvantage of relational databases?
A. Concurrency B. Impedance mismatch C. ACID transactions D. Normalization
40. _________ is a command-line tool that can import individual tables, specific columns, or entire database
files directly into the distributed file system or data warehouse.
A. Sqoop B. ZooKeeper C. Pig D. HBase
41. Hadoop is a framework that works with a variety of related tools. Common cohorts
include ____________
A. MapReduce, Hive and HBase B. MapReduce, MySQL and Google Apps
C. MapReduce, Hummer and Iguana D. MapReduce, Heron and Trumpet
42. Which of the following is not a NoSQL database?
A. Cassandra B. MongoDB C. SQL Server D. None of these
43. __________ can best be described as a programming model used to develop Hadoop-based
applications that can process massive amounts of data.
A. MapReduce B. Mahout C. Oozie D. All of the mentioned
44. A ________ node acts as the slave and is responsible for executing a task assigned to it
by the JobTracker.
A. MapReduce B. Mapper C. TaskTracker D. JobTracker
45. Which of the following are benefits of big data processing?
A. Cost Reduction B. Time Reductions C. Smarter Business Decisions D. All of the above
46. MongoDB is a ____ database.
A. SQL B. DBMS C. NoSQL D. RDBMS
47. The number of maps is usually driven by the total size of ____________
A. inputs B. outputs C. tasks D. None of the mentioned
48. Which of the following is a NoSQL database type?
A. SQL B. JSON C. Document databases D. None of the above
49. The _________ function is responsible for consolidating the results produced by each of the
Map() functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned
50. According to analysts, for what can traditional IT systems provide a foundation when
they’re integrated with big data technologies like Hadoop?
A. Big data management and data mining B. Data warehousing and business intelligence
C. Management of Hadoop clusters D. Collecting and storing unstructured data
51. In Big Data environments, velocity refers to:
A. Data can arrive at fast speed
B. Enormous datasets can accumulate within very short periods of time
C. Velocity of data translates into the amount of time it takes for the data to be processed
D. All of the mentioned above
52. ______ involves the simultaneous execution of multiple sub-tasks that collectively
comprise a larger task.
A. Parallel data processing B. Single channel processing
C. Multi data processing D. None of the mentioned above
53. What is a NoSQL database?
A. NoSQL is a database that is an enhanced form of RDBMS.
B. NoSQL is a database that is built with enhancements to DBMS.
C. NoSQL is a database that is built on ways and means other than tables and columns.
D. None of the above
54. Structured data conforms to a data model or schema and is often stored in tabular form.
A. True B. False
55. Which of the following are the simplest NoSQL databases?
A. Key value B. Document C. Wide column D. All of the above
55. Unprocessed data or processed data are observations or measurements that can be
expressed as text, numbers, or other types of media.
A. True B. False
56. All of the following accurately describe Hadoop, EXCEPT ____________
A. Open-source B. Real-time C. Java-based D. Distributed computing approach
57. The _________ function is responsible for consolidating the results produced by each of the
Map() functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned
58. ________ is a utility which allows users to create and run jobs with any executables as the
mapper and/or the reducer.
A. Hadoop Strdata B. Hadoop Streaming C. Hadoop Stream D. None of the mentioned
59. Data that does not conform to a data model or data schema is known as ______.
A. Structured data B. Unstructured data C. Semi-structured data D. All of the above
60. Map outputs larger than ___________ percent of the memory allocated to copying map
outputs will be written directly to disk without first staging through memory.
A. 10 B. 15 C. 25 D. 35
61. What is the aim of NoSQL?
A. NoSQL is not suitable for storing structured data.
B. NoSQL databases allow storing non-structured data.
C. NoSQL is a new data format to store large datasets.
D. NoSQL provides an alternative to SQL databases to store textual data.
62. In computers, a ____ is a symbolic representation of facts or concepts from which
information may be obtained with a reasonable degree of confidence.
A. Data B. Knowledge C. Program D. Algorithm
63. As companies move past the experimental phase with Hadoop, many cite the need for
additional capabilities, including _______________
A. Improved data storage and information retrieval
B. Improved extract, transform and load features for data integration
C. Improved data warehousing functionality
D. Improved security, workload management, and SQL support
64. The _________ function is responsible for consolidating the results produced by each of the
Map() functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned
65. MongoDB supports cross-platform use and is written in the _____ language.
A. Python B. C++ C. R D. Java
66. What are the types of NoSQL databases?
A. Document databases B. Key-value stores
C. Graph and column-oriented databases D. All of the above
67. Which of the following represents the use of Hadoop?
A. Robust and Scalable B. Affordable and Cost Effective
C. Adaptive and Flexible D. All of the mentioned above
68. Which of the following is not a strong feature of NoSQL databases?
A. Scalability B. Relational data
C. Faster data access than RDBMS D. Data easily held across multiple servers
69. What does the Apriori algorithm do?
A. It mines all frequent patterns through pruning rules with lesser support
B. It mines all frequent patterns through pruning rules with higher support
C. It mines all frequent patterns by constructing a FP tree
D. All of these
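
Study aid for the MapReduce questions in this set (8, 43, 47 and 49): the following is a minimal single-process sketch of the map, shuffle and reduce steps, written in plain Python rather than against the Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are illustrative assumptions, not part of the original question set.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line.

    Each call depends only on its own line, so calls could run in parallel.
    """
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: consolidate all values that the map step emitted for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical two-line input used only for illustration.
    print(word_count(["big data big clusters", "data nodes"]))
```

In this sketch the number of map calls grows with the amount of input, and each reduce call sees every value for exactly one key.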
