MCQ Big

MCQ:

1. ………………… data files are a compact, efficient binary format that provides interoperability with
applications written in other programming languages
a. Avro b. sequence files c. dictionaries d. JSON
2. …………………are a binary format that store individual records in custom record-specific data types
a. Avro b. sequence files c. dictionaries d. JSON
3. ………………… is an incredibly rich and flexible data representation format
a. XML b. sequence files c. dictionaries d. JSON
4. ……………….. is a columnar storage format available to any project in the Hadoop ecosystem, built from
the ground up with complex nested data structures in mind, and supports compression and encoding schemes.
a. Parquet b. sequence files c. dictionaries d. Dictionary
5. ………………….. is a plain-text object serialization format that can represent quite complex data in a way
that can be transferred between a user and a program, or from one program to another.
a. Parquet b. JSON c. dictionaries d. Dictionary
6. …………………. was specifically introduced to handle the rise in data types, data access, and data
availability needs brought on by the dot-com boom.
a. NoSQL b. JSON c. dictionaries d. Dictionary
7. ……………….: all reading and writing of data in one region is done by the assigned Region Server
a. Atomicity b. Durability c. Scalability d. Consistency
8. ................. mode: execution on a single machine, without requiring HDFS [Local]
9. …………………….mode: execution on an HDFS cluster, with the Pig script converted to a MapReduce job
[MapReduce].
10. …………………. A system for managing and querying structured data built on top of Hadoop [Hive].
11. ………….. is a component of Hive. It is a table and storage management layer for Hadoop that enables
users with different data processing tools – including Pig and MapReduce. [HCatalog]
12. …………….. provides a service that you can use to run Hadoop MapReduce (or YARN), Pig, and Hive jobs, or
perform Hive metadata operations, using an HTTP interface [WebHCat]
13. …………………database is a columnar storage database [HBase].
14. …………………….. is designed to work with Spark via SQL and HiveQL (a Hive variant of SQL). [Spark SQL]
15. …………… provides processing of live streams of data.[Spark Streaming]
16. …………………. is the machine learning library that provides multiple types of machine learning
algorithms. [MLlib]
17. …………………… is a graph processing library with APIs to manipulate graphs and perform graph-
parallel computations. [GraphX]
18. ………………… Fault-tolerant collection of elements that can be operated on in parallel[RDD]
19. map(func) Return a new dataset formed by passing each element of the source through a function func.
20. filter(func) Return a new dataset formed by selecting those elements of the source on which func returns
true.
21. flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items. So func should
return a Seq rather than a single item.
22. The ……………. function combines two sets of key/value pairs and returns a set of keys, each mapped to a
pair of values taken from the two initial sets. [join]
23. The ……………….. function aggregates on each key by using the given reduce function. This is something
you would use in a WordCount to sum up the values for each word to count its occurrences.[ reduceByKey]
24. The …………….is the main entry point for Spark functionality; it represents the connection to a Spark
cluster [SparkContext]
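The RDD operations in items 18–24 can be imitated in plain Python. The sketch below is only an analogue with hypothetical sample data: real Spark applies the same flatMap/map/reduceByKey/join semantics in parallel across the partitions of an RDD, via a SparkContext, rather than locally in one process.

```python
from itertools import chain
from collections import defaultdict

# Hypothetical input lines (stand-in for an RDD of strings).
data = ["hello world", "hello spark"]

# flatMap: each input item may yield 0..n output items.
words = list(chain.from_iterable(line.split() for line in data))

# map: exactly one output per input -> (key, value) pairs.
pairs = [(w, 1) for w in words]

# reduceByKey: aggregate values per key with a reduce function,
# as in WordCount (item 23).
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n
# counts -> {'hello': 2, 'world': 1, 'spark': 1}

# join: combine two key/value datasets on matching keys (item 22).
ages = [("alice", 30), ("bob", 25)]
cities = [("alice", "Cairo"), ("bob", "Giza")]
joined = [(k, (v1, v2)) for k, v1 in ages for k2, v2 in cities if k == k2]
# joined -> [('alice', (30, 'Cairo')), ('bob', (25, 'Giza'))]
```

In real Spark these would be written against an RDD, e.g. `rdd.flatMap(...).map(...).reduceByKey(...)`, and evaluated lazily across the cluster.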
1. The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce
B. mapping, Reduction
C. Map, Reduction
D. Map, Reduce
2. _____ takes a set of data and converts it into another set of data, where individual elements are broken down
into tuples (key/value pairs).
A. Map
B. Reduce
C. Both A and B
D. Node
Explanation: Map takes a set of data and converts it into another set of data, where individual elements are
broken down into tuples (key/value pairs).
3. ______ task, which takes the output from a map as an input and combines those data tuples into a smaller set
of tuples.
A. Map
B. Reduce
C. Node
D. Both A and B
Explanation: Reduce task, which takes the output from a map as an input and combines those data tuples into a
smaller set of tuples.
4. In how many stages the MapReduce program executes?
A. 2
B. 3
C. 4
D. 5
Explanation: MapReduce program executes in three stages, namely map stage, shuffle stage, and reduce stage.
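The three stages named in the explanation above can be sketched as a toy WordCount (the input records here are hypothetical). It runs map, shuffle, and reduce locally in one process; a real MapReduce job distributes the same steps across the cluster.

```python
from collections import defaultdict

# Hypothetical input records (stand-in for HDFS input splits).
records = ["big data", "big clusters"]

# Map stage: break each record into (key, value) tuples.
mapped = [(word, 1) for line in records for word in line.split()]

# Shuffle stage: group all values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce stage: combine each key's values into a smaller set of tuples.
reduced = {key: sum(values) for key, values in groups.items()}
# reduced -> {'big': 2, 'data': 1, 'clusters': 1}
```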
5. Which of the following is used to schedule jobs and track the assigned jobs to the Task Tracker?
A. SlaveNode
B. MasterNode
C. JobTracker
D. Task Tracker
Explanation: JobTracker: schedules jobs and tracks the assigned jobs to the Task Tracker.
6. Which of the following is used for an execution of a Mapper or a Reducer on a slice of data?
A. Task
B. Job
C. Mapper
D. PayLoad
Explanation: Task : An execution of a Mapper or a Reducer on a slice of data.
8. Point out the correct statement.

A. MapReduce tries to place the data and the compute as close as possible
B. Map Task in MapReduce is performed using the Mapper() function
C. Reduce Task in MapReduce is performed using the Map() function
D. None of the above
9. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in
____________

A. C
B. C#
C. Java
D. None of the above
10. The number of maps is usually driven by the total size of ____________
A. Inputs
B. Output
C. Task
D. None of the above
1. Data in ___________ bytes size is called Big Data.

A. Tera
B. Giga
C. Peta
D. Meta
2. How many V's of Big Data are there?
A. 2
B. 3
C. 4
D. 5
3. Transaction data of a bank is?
A. structured data
B. unstructured data
C. Both A and B
D. None of the above

4. In how many forms BigData could be found?

A. 2
B. 3
C. 4
D. 5
Explanation: BigData could be found in three forms: Structured, Unstructured and Semi-structured.

10. What are the main components of Big Data?


A. MapReduce
B. HDFS
C. YARN
D. All of the above
Explanation: All of the above are the main components of Big Data.

1. A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication

Explanation: Metadata related to HDFS, such as data blocks and replication details, is stored and maintained on the NameNode.
2. Point out the correct statement.
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned

3. HDFS works in a __________ fashion.


a) master-worker
b) master-slave
c) worker/slave
d) all of the mentioned

4. ________ NameNode is used when the Primary NameNode goes down.


a) Rack
b) Data
c) Secondary
d) None of the mentioned

6. Which of the following scenarios may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) None of the mentioned

8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication

10. HDFS is implemented in _____________ programming language.


a) C++
b) Java
c) Scala
d) None of the mentioned

11. For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication

13. For ________ the HBase Master UI provides information about the HBase Master uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned

14. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned

1. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker

3. ___________ part of the MapReduce is responsible for processing one or more chunks of data and producing
the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned

4. _________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned

7. ________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the
reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer: b

8. __________ maps input key/value pairs to a set of intermediate key/value pairs.


a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned

11. Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster.
a) MapReduce
b) Map
c) Reducer
d) All of the mentioned

1. ________ is the architectural center of Hadoop that allows multiple data processing engines.
a) YARN
b) Hive
c) Incubator
d) Chuckwa

3. YARN’s dynamic allocation of cluster resources improves utilization over more static _______ rules used in
early versions of Hadoop.
a) Hive
b) MapReduce
c) Impala
d) All of the mentioned

………………… has the responsibility of negotiating appropriate resource containers from the Scheduler,
tracking their status, and monitoring progress
a) NodeManager
b) ResourceManager
c) ApplicationMaster
d) All of the mentioned

4. The __________ is a framework-specific entity that negotiates resources from the ResourceManager.
a) NodeManager
b) ResourceManager
c) ApplicationMaster
d) All of the mentioned

6. Apache Hadoop YARN stands for _________


a) Yet Another Reserve Negotiator
b) Yet Another Resource Network
c) Yet Another Resource Negotiator
d) All of the mentioned

8. The ____________ is the ultimate authority that arbitrates resources among all the applications in the system.
a) NodeManager
b) ResourceManager
c) ApplicationMaster
d) All of the mentioned

9. The __________ is responsible for allocating resources to the various running applications subject to familiar
constraints of capacities, queues etc.
a) Manager
b) Master
c) Scheduler
d) None of the mentioned

2. YARN helps to manage the resources across the ________.
A. clusters
B. Negotiator
C. Jobs
D. Hadoop System
3. How many major components does YARN have?
A. 2
B. 3
C. 4
D. 5
4. Which of the following is the component of YARN?
A. Resource Manager
B. Node Manager
C. Application Manager
D. All of the above
Explanation: YARN consists of three major components, i.e. Resource Manager, Node Manager, and Application
Manager.
5. Which managers work on the allocation of resources?
A. Nodes Manager
B. Resource Manager
C. Application Manager
D. All of the above
Explanation: Node Managers work on the allocation of resources such as CPU, memory, and bandwidth per
machine, and later acknowledge the Resource Manager.
7. …………………. is responsible for accepting job submissions, negotiating the first container for executing
the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster
container on failure.
A. NodeManager
B. ApplicationManager
C. ApplicationMaster
D. All of the above.
