Bda MCQ Set

The document contains a set of 50 multiple choice questions related to big data analytics concepts and technologies such as Hadoop, MapReduce, HDFS, YARN, Apache Spark, machine learning, and NoSQL databases. The questions cover topics including the main components of Hadoop, characteristics of big data, MapReduce algorithms, HDFS architecture, Spark SQL, machine learning algorithms and techniques, and NoSQL databases.

Uploaded by

akshay Baleshgol

BDA MCQ SET:

1. What are the main components of Hadoop?
A. MapReduce B. HDFS C. YARN D. All of the above
2. Big data analytics works on unstructured data, where no specific pattern of the data is defined.
A. True B. False C. Can’t Say D. None of the above
3. Identify the incorrect big data technology.
A. Apache Pytorch B. Apache Kafka C. Apache Hadoop D. Apache Spark
4. Identify which of the options below is a general-purpose computing model and runtime system for distributed data analytics.
A. HDFS B. MapReduce C. Oozie D. All of the above
5. Big data analysis does the following except?
A. Spreads data B. Analyzes data C. Organizes data D. Collects data
6. What is NOT a characteristic of big data?
A. Volume B. Variety C. Vision D. Velocity
7. Pig is a Hadoop-based open-source platform for analyzing large-scale datasets via its own SQL-like language, _______
A. Pig Latin B. Pig German C. Pig Roman D. Pig Italian
8. The key aspect of the MapReduce algorithm is that if every Map and Reduce is independent of all other ongoing Maps and Reduces in the network, the operation will run in ______ keys and lists of data.
A. series on same B. series on different C. parallel on different D. parallel on same
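The independence described in question 8 can be illustrated with a minimal sketch in plain Python. This is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the word-count task are assumptions chosen for illustration, and a thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for pairs in mapped_pairs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item):
    """Consolidate the list of values collected for a single key."""
    key, values = item
    return key, sum(values)


def word_count(lines):
    # Each map call sees only its own line and each reduce call sees only
    # its own key, so both phases can be handed to a pool of workers.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, lines))
        grouped = shuffle(mapped)
        return dict(pool.map(reduce_phase, grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data nodes"]))
```

Because no map or reduce call reads state written by another, the calls could run in any order or on different machines without changing the result.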
9. In Hadoop MapReduce, _____ is a Java class that comes with several methods to retrieve keys and values by iterating over them among the data splits.
A. Mapper B. Record Reader C. Reporter D. Record Collect
10. Which of the following failure scenarios makes HDFS unavailable?
A. TaskTracker failure B. JobTracker failure C. NameNode failure D. DataNode failure
11. Hadoop MapReduce is a popular _____ for easily writing applications. It processes vast amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes).
A. Spring framework B. Java framework C. Django framework D. Web framework
12. Which is not a way to link R and Hadoop?
A. RHIPE B. RHadoop C. Hadoop streaming D. RHDFS
13. The RHIPE package uses the ________ technique to perform data analytics over Big Data.
A. Divide and recombine B. Divide and conquer C. Integrate and recombine D. None of the above
14. The ____ phase of the data analytics lifecycle usually takes the longest time.
A. Phase 2: Data Preparation B. Phase 3: Model Planning C. Phase 4: Model Building D. Phase 5: Communicate Results
15. The data analytics project life cycle stages in correct sequence are __________
A. Identifying the problem > designing the requirements > pre-processing data > performing analytics over data > visualizing data
B. Identifying the problem > designing the requirements > performing analytics over data > pre-processing data > visualizing data
C. Identifying the problem > performing analytics over data > designing the requirements > pre-processing data > visualizing data
D. Identifying the problem > visualizing data > designing the requirements > pre-processing data > performing analytics over data
16. Which of the following is/are true about Random Forest and Gradient Boosting ensemble methods?
1. Both methods can be used for classification tasks
2. Random Forest is used for classification whereas Gradient Boosting is used for regression tasks
3. Random Forest is used for regression whereas Gradient Boosting is used for classification tasks
4. Both methods can be used for regression tasks
A. 1 B. 2 C. 2 and 3 D. 1 and 4
17. In Random Forest you can generate hundreds of trees (say T1, T2, ..., Tn) and then aggregate the results of these trees. Which of the following is true about an individual tree (Tk) in Random Forest?
1. Individual tree is built on a subset of the features
2. Individual tree is built on all the features
3. Individual tree is built on a subset of observations
4. Individual tree is built on the full set of observations
A. 1 and 3 B. 1 and 4 C. 2 and 3 D. 2 and 4
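The two kinds of sampling named in question 17 can be sketched with the standard library alone. The helper name sample_tree_inputs and the feature names are hypothetical. Many Random Forest implementations draw the feature subset at each split rather than once per tree; this sketch draws it once per tree only to match the wording of the question.

```python
import random


def sample_tree_inputs(n_rows, feature_names, n_features, seed=None):
    """Pick the rows and features one hypothetical tree Tk would be trained on."""
    rng = random.Random(seed)
    # Bootstrap sample: draw n_rows row indices with replacement, so some
    # rows repeat and others are left out.
    row_indices = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Feature subset: draw n_features distinct feature names.
    features = rng.sample(feature_names, n_features)
    return row_indices, features


if __name__ == "__main__":
    names = ["age", "income", "city", "clicks"]
    for k in range(3):
        rows, feats = sample_tree_inputs(10, names, 2, seed=k)
        print(f"T{k + 1}: rows={sorted(set(rows))} features={feats}")
```

Each tree sees a different slice of the data, which is what makes the aggregated result less sensitive to any single tree.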
18. The primary Machine Learning API for Spark is now the _____-based API.
A. DataFrame B. Dataset C. RDD D. All of the above
19. Which of the following is a module for structured data processing?
A. GraphX B. MLlib C. Spark SQL D. SparkR
20. Spark SQL translates commands into code. This code is processed by ________
A. Driver nodes B. Executor nodes C. Cluster manager D. None of the above
21. Spark SQL plays the main role in the optimization of queries.
A. True B. False C. Can’t Say D. None is correct
22. Which of the following is not a Spark SQL query execution phase?
A. Analysis B. Logical Optimization C. Execution D. Physical Planning
23. What is an action in Spark RDD?
A. The ways to send results from executors to the driver
B. Takes an RDD as input and produces one or more RDDs as output
C. Creates one or many new RDDs
D. All of the above
24. Which of the following is true about a narrow transformation?
A. The data required to compute resides on multiple partitions.
B. The data required to compute resides on a single partition.
C. Both
D. None of the above
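The difference between narrow and wide dependencies in question 24 can be simulated with nested Python lists standing in for partitions. This is not Spark's API: narrow_map_values and wide_group_by_key are hypothetical helpers, and routing by ord() assumes single-character keys.

```python
from collections import defaultdict

# Two partitions of (key, value) records, standing in for an RDD's partitions.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]


def narrow_map_values(parts, fn):
    """Narrow: each output partition is computed from exactly one input partition."""
    return [[(k, fn(v)) for k, v in part] for part in parts]


def wide_group_by_key(parts, n_out):
    """Wide: an output partition may need records from every input partition."""
    out = [defaultdict(list) for _ in range(n_out)]
    for part in parts:
        for k, v in part:
            # Route each record by its key; ord() keeps the routing deterministic.
            out[ord(k) % n_out][k].append(v)
    return [dict(d) for d in out]


if __name__ == "__main__":
    print(narrow_map_values(partitions, lambda v: v * 10))
    print(wide_group_by_key(partitions, 2))
```

In the narrow case no record leaves its partition; in the wide case the values for key "a" have to be brought together from both input partitions, which is the data movement a shuffle performs.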
25. __________ is a distributed machine learning framework on top of Spark.
A. MLlib B. GraphX C. Spark Streaming D. RDDs
26. Which of the following components of the Spark runtime architecture provides resources to execute a task?
A. Cluster manager B. Worker nodes C. Driver program D. Spark context
27. Among the following options, identify the one which is not a type of learning.
A. Unsupervised learning B. Reinforcement learning C. Supervised learning D. Semi unsupervised learning
28. Identify the type of learning in which labeled training data is used.
A. Unsupervised learning B. Reinforcement learning C. Semi unsupervised learning D. Supervised learning
29. Machine learning is a subset of which of the following?
A. Artificial Intelligence B. Deep Learning C. Data Learning D. None of the above
30. Which of the following machine learning techniques helps in detecting outliers in data?
A. Classification B. Clustering C. Anomaly Detection D. All of the above
31. Which of the following are common classes of problems in machine learning?
A. Regression B. Classification C. Clustering D. All of the above
32. What is a content-based recommendation system?
A. Tries to recommend items based on a profile built from their preferences
B. Similarity among items
C. Similarity among users buying, watching, or enjoying something
D. All of the above
33. Machine Learning is a field of AI consisting of learning algorithms that __________
A. At executing some task B. Over time with experience C. Improve their performance D. All of the above
34. Which of the following machine learning algorithms is based upon the idea of bagging?
A. Decision tree B. Random forest C. Classification D. Regression
35. Among the following options, identify the one which is false regarding regression.
A. It is used for prediction
B. It is used for interpretation
C. It relates inputs to outputs
D. It discovers causal relationships
36. Which application of social network data analysis is used by a customer retention manager?
A. Business intelligence B. Marketing C. Product design and development D. Insurance fraud
37. Identify the technologies that enable fraud identification and the predictive modeling process:
A. Text mining B. Social media data analysis C. Regression analysis D. All of the above
38. Who among the following do you think would be able to deal with the growing number of data sources efficiently?
A. Business developer B. Data scientist C. Sales executive D. Web designer
39. Which of the following is a disadvantage of relational databases?
A. Concurrency B. Impedance mismatch C. ACID transactions D. Normalization
40. _________ is a command-line tool that can import individual tables, specific columns, or entire database files directly into the distributed file system or data warehouse.
A. Sqoop B. ZooKeeper C. Pig D. HBase
41. Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
A. MapReduce, Hive and HBase B. MapReduce, MySQL and Google Apps C. MapReduce, Hummer and Iguana D. MapReduce, Heron and Trumpet
42. Which of the following is not a NoSQL database?
A. Cassandra B. MongoDB C. SQL Server D. None of these
43. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
A. MapReduce B. Mahout C. Oozie D. All of the mentioned
44. A ________ node acts as the slave and is responsible for executing a task assigned to it by the JobTracker.
A. MapReduce B. Mapper C. TaskTracker D. JobTracker
45. Which of the following are benefits of Big Data processing?
A. Cost reduction B. Time reductions C. Smarter business decisions D. All of the above
46. MongoDB is a ____ database.
A. SQL B. DBMS C. NoSQL D. RDBMS
47. The number of maps is usually driven by the total size of ____________
A. inputs B. outputs C. tasks D. None of the mentioned
48. Which of the following is a NoSQL database type?
A. SQL B. JSON C. Document databases D. None of the above
49. The _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned
50. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
A. Big data management and data mining B. Data warehousing and business intelligence C. Management of Hadoop clusters D. Collecting and storing unstructured data
51. In Big Data environments, velocity refers to:
A. Data can arrive at fast speed
B. Enormous datasets can accumulate within very short periods of time
C. Velocity of data translates into the amount of time it takes for the data to be processed
D. All of the above
52. ______ involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task.
A. Parallel data processing B. Single channel processing C. Multi data processing D. None of the above
53. What is a NoSQL database?
A. NoSQL is a database that is an enhanced form of RDBMS.
B. NoSQL is a database that is built with enhancements to DBMS.
C. NoSQL is a database that is built on ways and means other than tables and columns.
D. None of the above
54. Structured data conforms to a data model or schema and is often stored in tabular form.
A. True B. False
55. Which of the following are the simplest NoSQL databases?
A. Key value B. Document C. Wide column D. All of the above
55. Unprocessed data or processed data are observations or measurements that can be expressed as text, numbers, or other types of media.
A. True B. False
56. All of the following accurately describe Hadoop, EXCEPT ____________
A. Open-source B. Real-time C. Java-based D. Distributed computing approach
57. The _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned
58. ________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
A. Hadoop Strdata B. Hadoop Streaming C. Hadoop Stream D. None of the mentioned
59. Data that does not conform to a data model or data schema is known as ______.
A. Structured data B. Unstructured data C. Semi-structured data D. All of the above
60. Map outputs larger than ___________ percent of the memory allocated to copying map outputs will be written directly to disk without first staging through memory.
A. 10 B. 15 C. 25 D. 35
61. What is the aim of NoSQL?
A. NoSQL is not suitable for storing structured data.
B. NoSQL databases allow storing non-structured data.
C. NoSQL is a new data format to store large datasets.
D. NoSQL provides an alternative to SQL databases to store textual data.
62. In computers, a ____ is a symbolic representation of facts or concepts from which information may be obtained with a reasonable degree of confidence.
A. Data B. Knowledge C. Program D. Algorithm
63. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________
A. Improved data storage and information retrieval
B. Improved extract, transform and load features for data integration
C. Improved data warehousing functionality
D. Improved security, workload management, and SQL support
64. The _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
A. Reduce B. Map C. Reducer D. All of the mentioned
65. MongoDB supports cross-platform use and is written in the _____ language.
A. Python B. C++ C. R D. Java
66. What are the types of NoSQL databases?
A. Document databases B. Key-value stores C. Graph and column-oriented databases D. All of the above
67. Which of the following represents the use of Hadoop?
A. Robust and scalable B. Affordable and cost-effective C. Adaptive and flexible D. All of the above
68. Which of the following is not a strong feature of NoSQL databases?
A. Scalability B. Relational data C. Faster data access than RDBMS D. Data easily held across multiple servers
69. What does the Apriori algorithm do?
A. It mines all frequent patterns through pruning rules with lesser support
B. It mines all frequent patterns through pruning rules with higher support
C. It mines all frequent patterns by constructing an FP-tree
D. All of these
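The support-based pruning that question 69 refers to can be sketched in a few lines of plain Python. This is a minimal illustration, not an optimized implementation: the function name apriori, the absolute min_support count, and the grocery transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
               {"bread", "eggs"}, {"milk", "eggs"}]
    for itemset, support in apriori(baskets, min_support=2).items():
        print(sorted(itemset), support)
```

Itemsets whose support falls below the threshold are discarded at each level, so none of their supersets is ever counted.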
