
SRI INDU COLLEGE OF ENGG & TECH
LESSON PLAN (Regulation: R22)
Prepared on: 27/02/2025
DEPARTMENT OF AI&DS and DATA SCIENCE
Sub. Code & Title: (R22CSD3213) BIG DATA ANALYTICS
Academic Year: 2024-25    Year/Sem./Section: III/II AI&DS, DS (A & B)
Faculty Name & Designation: D. SUMA, Assistant Professor; M. JYOTHI REDDY, Assistant Professor

QUESTION BANK WITH BLOOMS TAXONOMY LEVEL (BTL)


(1. Remembering, 2. Understanding, 3. Applying, 4. Analyzing, 5. Evaluating, 6. Creating)

UNIT-I: INTRODUCTION TO BIG DATA ANALYTICS

Multiple choice Questions    BT Level    Course Outcome
1C-1 What is Big Data? 1 CO1
a) A large database
b) A collection of unstructured data
c) A set of data that is too large to be processed by traditional methods
d) A type of data analytics software
1C-2 Which of the following is a characteristic of Big Data? 1 CO1
a) Low volume
b) Low velocity
c) High variety
d) Low veracity
1C-3 What is the primary goal of Big Data Analytics? 1 CO1
a) To collect and store large amounts of data
b) To extract insights and patterns from large datasets
c) To develop new data storage technologies
d) To create new data analytics software
1C-4 The word 'Big data' was coined by 1 CO1
(a) Roger Mougalas
(b) John Philips
(c) Simon Woods
(d) Martin Green
1C-5 The examination of large amounts of data to see what patterns or other useful information 1 CO1
can be found is known as
(a) Data examination
(b) Information analysis
(c) Big data analytics
(d) Data analysis
1C-6 What makes Big Data analysis difficult to optimize? 1 CO1
(a) Big Data is not difficult to optimize
(b) Both data and cost effective ways to mine data to make business sense out of it
(c) The technology to mine data
(d) All of the above
1C-7 What are the main components of Big Data? 1 CO1
(a) MapReduce
(b) HDFS
(c) YARN
(d) All of these
1C-8 What are the different features of Big Data Analytics? 4 CO1
(a) Open-Source
(b) Scalability
(c) Data Recovery
(d) All the above
1C-9 Big data analysis does the following except 4 CO1
(a) Collects data
(b) Spreads data
(c) Organizes data
(d) Analyzes data
1C-10 Which industries employ so-called "Big Data" in their day-to-day operations? 1 CO1
(a) Weather forecasting
(b) Marketing
(c) Healthcare
(d) All of the above
1C-11 Big Data applications benefit the media and entertainment industry by 1 CO1
(a) Predicting what the audience wants
(b) Ad targeting
(c) Scheduling optimization
(d) All of the above
1C-12 Identify the slave node among the following. 2 CO1
a) Job node
b) Data node
c) Task node
d) Name node
1C-13 Which of the following is a driver for Big Data? 2 CO1
a) Decreasing use of social media
b) Increasing adoption of cloud computing
c) Decreasing use of mobile devices
d) Decreasing use of IoT devices
1C-14 What is Data velocity? 1 CO1
a) The speed at which data is generated and processed
b) The volume of the data
c) The variety of the data
d) The accuracy and quality of the data
1C-15 What is Data variety? 1 CO1
a) The different types of data being generated
b) The volume of the data
c) The velocity of the data
d) The accuracy and quality of the data
Fill in the blanks
1F-1 Big Data is characterized by its __________. 1 CO1
1F-2 The primary goal of Big Data Analytics is to __________. 1 CO1
1F-3 Predictive maintenance uses data analytics to __________. 2 CO1
1F-4 Customer segmentation involves dividing customers into groups based on their __________. 2 CO1
1F-5 Big Data Analytics helps organizations to make better __________. 3 CO1
1F-6 The Four V's of Big Data are __________. 1 CO1
1F-7 The increasing use of __________ is a driver for Big Data. 2 CO1
1F-8 Big Data Analytics involves the use of __________. 3 CO1
1F-9 The accuracy and quality of the data is referred to as __________. 4 CO1
1F-10 The speed at which data is generated and processed is referred to as __________. 6 CO1
1F-11 The different types of data being generated is referred to as __________. 1 CO1
1F-12 Big Data Analytics helps organizations to gain a competitive __________. 2 CO1
1F-13 The use of Big Data Analytics can lead to improved __________. 1 CO1
1F-14 Big Data Analytics can help organizations to identify new __________. 1 CO1
1F-15 The increasing use of IoT devices is a driver for __________. 2 CO1

Match the following


1M-1 Match the following. 1 CO1
a) NLP 1) Content analytics
b) Text analytics 2) Text messages
c) UIMA 3) Comprehend human or Natural Language Input
d) Noisy Unstructured data 4) Text Mining
e) IBM 5) UIMA
1M-2 Match the following. 2 CO1
a) JSON 1) SOAP
b) MongoDB 2) REST
c) XML 3) JSON
d) Flexible structure 4) Use methods at the intersection of Statistics, AI, ML, DBs
e) Data Mining 5) XML
1M-3 Match the following. 2 CO1
a) Scientific data 1) Machine generated structured data
b) Point-of-sale 2) Machine generated unstructured data
c) Social-media data 3) Human generated structured data
d) Gaming-related data 4) Human generated unstructured data
e) Mobile data 5) Human generated unstructured data
1M-4 Match the following 1 CO1
a) Big Data 1) Process of analyzing big data to extract useful insights.
b) Four V’s of Big data 2) Technological advancements & increasing data generation.
c) Drivers for Big data 3) Large volume of structured and unstructured data.
d) Big data analytics 4) Volume, Velocity, Variety, and Veracity
e) Data 5) Collection of Raw Facts and Figures.
1M-5 Match the following 3 CO1
a) Volume 1) Social media data (Facebook, Twitter).
b) Velocity 2) Real-time stock market analysis.
c) Variety 3) Semi-structured, Structured and Unstructured data.
d) Veracity 4) Data accuracy and trustworthiness
e) Variability 5) Data flows can be highly Inconsistent.

5-MARKS QUESTIONS
1 Explain the concept of Big Data and discuss its significance in today's business and technological 2 CO1
landscape.

2 Define the Six V’s of Big Data and explain how each factor (Volume, Velocity, Variety, 2 CO1
Veracity, Value, Variability) impacts the way data is processed and analysed.

3 What are the major key components of Big Data Analytics? 1 CO1
4 Explain the term "Velocity" in the context of Big Data and discuss the challenges that organizations 4 CO1
face when trying to process and analyze data in real time.

5 What is meant by "Variety" in Big Data? How does the diverse nature of data types (e.g., structured, 2 CO1
unstructured, and semi-structured) affect the way businesses manage data?

6 Define the concept of "Veracity" in Big Data. What are some common issues related to data 1 CO1
uncertainty, and how can organizations address them?

7 What is Big Data Analytics and how is it helpful in real-time applications? 4 CO1
8 What are the major Key components of Big Data Analytics? 4 CO1
9 What are the different types of Big Data Analytics? 2 CO1
10 What is the Importance of Big Data for Business? 1 CO1
11 Discuss the main challenges faced by organizations when performing Big Data Analytics. How can 4 CO1
these challenges be overcome?

12 What are the Applications of Big data Analytics? 4 CO1


13 What are the different types of drivers for Big Data in the business sector? 4 CO1
14 How does Big Data Analytics benefit the retail industry? Discuss its role in personalizing customer 2 CO1
experiences, optimizing inventory, and enhancing marketing strategies.

15 Discuss the applications of Big Data Analytics in the development of Smart Cities. How can 2 CO1
analyzing data from urban infrastructure help improve traffic management, energy consumption, and
public services?
UNIT-II: BIG DATA TECHNOLOGIES
Multiple choice Questions
2C-1 What is the primary purpose of Hadoop's MapReduce? 1 CO2
A) Data storage
B) Data processing
C) Data visualization
D) Data mining
2C-2 Which of the following is an open-source technology for Big Data analytics? 1 CO2
A) Apache Hadoop
B) Microsoft SQL Server
C) Oracle Database
D) IBM DB2
2C-3 What is the term for the process of identifying, categorizing, and understanding the data available 1 CO2
within an organization?
A) Data discovery
B) Data profiling
C) Data cataloging
D) Data mining
2C-4 Which of the following cloud computing benefits is particularly relevant to Big Data analytics? 1 CO2
A) Scalability
B) Security
C) Compliance
D) Cost-effectiveness
2C-5 What is the term for the type of advanced analytics that uses statistical models and machine learning 1 CO2
algorithms to predict future events or outcomes?
A) Descriptive analytics
B) Predictive analytics
C) Prescriptive analytics
D) Diagnostic analytics
2C-6 Which of the following mobile business intelligence benefits is particularly relevant to Big Data 2 CO2
analytics?
A) Improved decision-making
B) Increased productivity
C) Enhanced customer experience
D) All of the above
2C-7 What is the term for the distributed computing framework that enables the processing of large 1 CO2
datasets in parallel?
A) Apache Hadoop
B) Apache Spark
C) Apache Cassandra
D) MapReduce
2C-8 Which of the following data discovery techniques is used to analyze data to understand its 5 CO2
distribution, patterns, and relationships?
A) Data profiling
B) Data cataloguing
C) Data mining
D) Data visualization
2C-9 What was Hadoop written in? 5 CO2
a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)
2C-10 Which of the following predictive analytics techniques is used to forecast outcomes based on 4 CO2
historical data?
A) Statistical modelling
B) Machine learning
C) Data mining
D) Text analytics
2C-11 What is the term for the ability to access and analyze data on mobile devices? 2 CO2
A) Mobile business intelligence
B) Mobile data analytics
C) Mobile data science
D) Mobile data mining
2C-12 Which of the following Big Data analytics benefits is particularly relevant to organizations? 5 CO2
A) Improved decision-making
B) Increased productivity
C) Enhanced customer experience
D) All of the above
2C-13 What license is Hadoop distributed under? 2 CO2
a) Apache License 2.0
b) Mozilla Public License
c) Shareware
d) Commercial
2C-14 What was Hadoop named after? 4 CO2
a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop development
2C-15 Which of the following platforms does Hadoop run on? 5 CO2
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like
Fill in the blanks
2F-1 The primary purpose of __________ is to enable the processing of large datasets in parallel. 2 CO2
2F-2 __________ is an open-source technology for Big Data analytics. 4 CO2
2F-3 The process of identifying, categorizing, and understanding the data available within an organization is called __________. 6 CO2
2F-4 __________ is a cloud computing benefit that is particularly relevant to Big Data analytics. 6 CO2
2F-5 __________ is the type of advanced analytics that uses statistical models and machine learning algorithms to predict future events or outcomes. 5 CO2
2F-6 __________ is a mobile business intelligence benefit that is particularly relevant to Big Data analytics. 4 CO2
2F-7 __________ is the distributed computing framework that enables the processing of large datasets in parallel. 5 CO2
2F-8 __________ is the distributed computing framework that enables the processing of large datasets in parallel. 6 CO2
2F-9 __________ is the NoSQL database designed for handling large amounts of distributed data. 3 CO2
2F-10 __________ is a predictive analytics technique used to forecast outcomes based on historical data. 4 CO2
2F-11 __________ is the ability to access and analyse data on mobile devices. 5 CO2
2F-12 The process of creating a centralized repository of metadata about the data available within an organization is called __________. 6 CO2
2F-13 __________ is an open-source technology used for in-memory computing. 4 CO2
2F-14 The ability to quickly scale up or down to meet changing business needs is called __________. 4 CO2
2F-15 Big Data analytics can help organizations to __________ by analyzing data on operations. 3 CO2
Match the following
2M-1 Match the following 5 CO2
a) Hadoop 1) Open-source framework for storing and processing Big Data
b) Predictive analytics 2) Uses historical data to make future predictions.
c) Mobile Business Intelligence 3) Real-time business analytics on mobile devices
d) Open-source Technology 4) Software available for free modification and distribution
2M-2 Match the following 6 CO2
a) Hadoop’s Parallel World 1) Distributed processing across multiple nodes
b) Cloud Big Data 2) Google BigQuery, AWS Redshift.
c) Mobile Analytics 3) Identifying patterns and trends in datasets.
d) Data Discovery 4) Tracking user behavior on mobile apps
2M-3 Match the following 4 CO2
a) HDFS 1) SQL-like query language for Hadoop.
b) Map Reduce 2) Parallel processing framework in Hadoop
c) Pig 3) Hadoop Distributed File System.
d) Hive 4) High-level scripting language for data transformation.
2M-4 Match the following 2 CO2
a) Data Processing Framework 1) Scalable databases like MongoDB and Cassandra
b) Machine Learning in Big Data 2) Parallel processing framework in Hadoop
c) NoSQL Databases 3) Apache Spark, Hadoop MapReduce
d) Distributed Computing 4) Processing data across multiple machines.
2M-5 Match the following 3 CO2
a) Business Intelligence 1) Data-driven decision-making in organizations.
b) Data Ingestion 2) Processing data closer to the source
c) Edge Computing 3) Collecting and importing raw data into systems
d) Big Data Security 4) Protecting data from unauthorized access.
5-MARKS QUESTIONS
1 Describe the architecture of Hadoop's MapReduce and explain its role in processing large 6 CO2
datasets.
2 Explain the concept of data discovery and its importance in Big Data analytics. Describe the 2 CO2
various techniques used in data discovery.
3 Describe the features and benefits of using Apache Hadoop in Big Data analytics. Explain its role 2 CO2
in processing large datasets.
4 Explain the relationship between cloud computing and Big Data. Describe the benefits of using 5 CO2
cloud computing in Big Data analytics.
5 Describe the concept of predictive analytics and its importance in Big Data analytics. Explain the 6 CO2
various techniques used in predictive analytics.
6 Explain the role of mobile business intelligence in Big Data analytics. Describe the benefits of 2 CO2
using mobile devices to access and analyze data.
7 Describe the features and benefits of using Apache Spark in Big Data analytics. Explain its role 2 CO2
in processing large datasets.
8 Explain the importance of data cataloguing in Big Data analytics. Describe the various techniques 2 CO2
used in data cataloguing.
9 Describe the concept of scalability and its importance in Big Data analytics. Explain the various 1 CO2
techniques used to achieve scalability.
10 Explain the role of Apache Cassandra in Big Data analytics. Describe its features and benefits. 3 CO2
11 Describe the concept of data profiling and its importance in Big Data analytics. Explain the 4 CO2
various techniques used in data profiling.
12 Explain the relationship between statistical modelling and predictive analytics. Describe the 4 CO2
various techniques used in statistical modelling.
13 Describe the benefits of using mobile data analytics in Big Data analytics. Explain the various 2 CO2
techniques used in mobile data analytics.
14 Explain the importance of open-source technology in Big Data analytics. Describe the 1 CO2
various open-source technologies used in Big Data analytics.
15 Describe the importance of Big Data analytics in today's business environment. Explain the 3 CO2
various benefits of using Big Data analytics.
UNIT–III: INTRODUCTION TO HADOOP
Multiple choice Questions
3C-1 What is the primary purpose of Hadoop's MapReduce? 1 CO3
A) Data storage
B) Data processing
C) Data visualization
D) Data mining
3C-2 Which of the following is a component of the Hadoop ecosystem? 1 CO3
A) Apache Spark
B) Apache HBase
C) Apache Hive
D) All of the above
3C-3 What is the process of moving data into Hadoop called? 1 CO3
A) Data ingestion
B) Data export
C) Data processing
D) Data storage
3C-4 Hadoop is a framework that works with a variety of related tools. Common cohorts include 1 CO3
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet
3C-5 What is the output of a MapReduce job? 1 CO3
a) A processed dataset
b) A raw dataset
c) A data model
d) A data visualization
3C-6 Which of the following data serialization formats is used in Hadoop? 6 CO3
A) Apache Avro
B) Apache Protocol Buffers
C) JSON
D) All of the above
3C-7 What is the purpose of Data serialization in Hadoop? 2 CO3
A) To compress data
B) To transmit data over a network
C) To convert data into a readable format
D) All of the above
3C-8 Which of the following is a benefit of using Hadoop? 1 CO4
A) Scalability
B) Flexibility
C) Cost-effectiveness
D) All of the above
3C-9 What is the name of the programming model used for processing large datasets in parallel across 2 CO4
a cluster of computers?
A) MapReduce
B) Apache Spark
C) Apache Hadoop
D) HDFS
3C-10 Which of the following is a component of the MapReduce programming model? 1 CO4
A) Mapper
B) Reducer
C) Combiner
D) All of the above
3C-11 __________ is a platform for constructing data flows for extract, transform, and load 6 CO4
(ETL) processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive
3C-12 __________ is a general-purpose computing model and runtime system for distributed 4 CO4
data analytics.
a) MapReduce
b) Drill
c) Oozie
d) None of the mentioned
3C-13 __________ is the most popular high-level Java API in the Hadoop ecosystem. 6 CO4
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading
3C-14 __________ has the world’s largest Hadoop cluster. 5 CO4
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned
3C-15 Facebook tackles Big Data with __________, based on Hadoop. 3 CO4
a) ‘Project Prism’
b) ‘Prism’
c) ‘Project Big’
d) ‘Project Data’
Fill in the blanks
3F-1 The primary purpose of Hadoop's __________ is to enable the processing of large datasets in parallel. 3 CO3
3F-2 The Hadoop ecosystem consists of various tools and technologies, including __________. 3 CO3
3F-3 The process of moving data into Hadoop is called __________. 4 CO3
3F-4 Apache __________ is a tool used for data ingestion in Hadoop. 6 CO3
3F-5 The output of a MapReduce job is a __________. 5 CO3
3F-6 Data serialization is the process of converting data into a format that can be __________. 2 CO3
3F-7 Apache __________ is a data serialization format used in Hadoop. 6 CO3
3F-8 The purpose of data serialization in Hadoop is to __________. 4 CO4
3F-9 Hadoop provides a __________ platform for processing large datasets. 3 CO4
3F-10 Apache Hive is a data warehousing and SQL-like query language for __________. 5 CO4
3F-11 Apache Pig is a high-level data processing language and framework for __________. 5 CO4
3F-12 Data serialization is important in Hadoop because it enables data to be stored and transmitted __________. 6 CO4
3F-13 Hadoop is an open-source, distributed computing framework that enables the processing of large datasets across a cluster of __________. 4 CO4
3F-14 Data can be moved into Hadoop using tools such as __________. 4 CO4
3F-15 The inputs to MapReduce are typically large datasets stored in Hadoop's __________. 5 CO4

Match the following

3M-1 Match the following 4 CO3


a) Apache Hadoop 1) Programming model for parallel data processing
b) Hadoop Ecosystem 2) Includes HDFS, MapReduce, YARN, Hive, Pig, etc.
c) Data Serialization 3) Process of converting data into a format for storage or transmission
d) Map Reduce 4) Framework for distributed storage and processing of Big Data

3M-2 Match the following 5 CO3


a) Data Serialization 1) Text files, Sequence files, Parquet files
b) Moving Data In & Out of Hadoop 2) HDFS storing large files across multiple machines
c) Hadoop Input & Output 3) Sqoop for databases, Flume for logs
d) Distributed Storage 4) JSON, Avro, Protocol Buffers
3M-3 Match the following 3 CO3
a) HDFS Replication 1) Map, Shuffle, Reduce
b) MapReduce Phases 2) Splitting data into blocks and storing in HDFS
c) Hadoop Configuration 3) Copies of data blocks across multiple nodes
d) Hadoop File Write Process 4) Setting up cluster nodes, memory allocation.
3M-4 Match the following 6 CO4
a) YARN 1) NoSQL database for Hadoop
b) HDFS Daemons 2) Cloudera, Hortonworks, Map R
c) Hadoop Distributors 3) Name node, Data node, Secondary Name node
d) Hbase 4) Resource manager in Hadoop

3M-5 Match the following 3 CO4


a) Hadoop Streaming 1) Manages task execution in Hadoop
b) Data Locality 2) Recovering from hardware failures in Hadoop
c) Fault Tolerance 3) Running non-Java code in MapReduce
d) Job Tracker 4) Processing data where it is stored
5-Mark Questions
1 What is Big Data? Discuss the three main characteristics of Big Data (Volume, Velocity, and 2 CO3
Variety).

2 Explain in detail about Hadoop and its core components: HDFS, MapReduce, and YARN. 3 CO3

3 Describe the role and functionality of HDFS (Hadoop Distributed File System) in the Hadoop 1 CO3
ecosystem. How does it handle large datasets?

4 What is MapReduce in Hadoop? Explain the Map and Reduce phases of the MapReduce process 4 CO3
with an example

5 Discuss the role of YARN (Yet Another Resource Negotiator) in managing resources in a 5 CO3
Hadoop cluster. How does it improve resource allocation?

6 What is the Hadoop ecosystem? List and briefly describe five key components or tools in the 6 CO3
Hadoop ecosystem.

7 Explain the process of moving data into Hadoop using Apache Flume. How does it help in 1 CO3
managing log data?

8 Describe how Apache Sqoop is used to transfer data between relational databases and Hadoop. 5 CO4
Provide an example scenario where Sqoop is beneficial.

9 Discuss the importance of data serialization in Hadoop. What are the benefits of using formats 4 CO4
like Avro, Parquet, and Sequence File for data serialization?

10 How does MapReduce handle the input data? Explain how the input data is split and processed 3 CO4
across different nodes in a Hadoop cluster.

11 What is the significance of Hadoop's Writable interface? How does it help in the serialization and 5 CO4
deserialization of data in MapReduce jobs?

12 Explain the role of Apache Kafka in the Hadoop ecosystem. How does it support real-time data 1 CO4
streaming into Hadoop?

13 Discuss the features and advantages of using Apache Hive for querying and managing large 3 CO4
datasets in Hadoop.

14 What is the difference between HDFS and traditional file systems? How does HDFS ensure fault 1 CO4
tolerance and high availability?

15 Describe the concept of "data locality" in Hadoop. How does it affect the performance of 1 CO4
MapReduce jobs in a Hadoop cluster?
UNIT IV: HADOOP ARCHITECTURE
Multiple choice Questions
4C-1 Which of the following is a component of Hadoop? 1 CO3
a) MySQL
b) PostgreSQL
c) HDFS
d) Oracle

4C-2 What does HDFS stand for? 1 CO3


a) Hadoop File Distribution System
b) Hadoop Distributed File Storage
c) Hadoop Distributed File System
d) High Definition File System

4C-3 Which node stores metadata in HDFS? 1 CO3


a) DataNode
b) TaskTracker
c) NameNode
d) ResourceManager
4C-4 The default block size in HDFS is: 1 CO3


a) 16MB
b) 64MB
c) 128MB
d) 1GB
4C-5 Which node handles actual data storage in HDFS? 1 CO3
a) NameNode
b) ResourceManager
c) JobTracker
d) DataNode

4C-6 Which of the following is NOT a Hadoop distributor? 6 CO3


a) Cloudera
b) Hortonworks
c) Oracle
d) MapR

4C-7 Which component of Hadoop processes data? 2 CO3


a) HDFS
b) MapReduce
c) NameNode
d) HBase
4C-8 HBase is built on top of which storage system? 1 CO4
a) MySQL
b) Hive
c) HDFS
d) Oracle

4C-9 What is the role of Secondary NameNode? 2 CO4


a) Backup for NameNode
b) Processes client requests
c) Stores data blocks
d) Executes Map tasks
4C-10 Hive converts queries into: 1 CO4
a) SQL queries
b) Spark jobs
c) Java code
d) MapReduce jobs

4C-11 Which of the following is true about Pig? 6 CO4


a) It is a relational database
b) Uses Pig Latin
c) Replaces MapReduce
d) Only used for streaming data

4C-12 Which architecture does Hadoop follow? 4 CO4


a) Peer-to-peer
b) Client-server
c) Master-slave
d) Mesh

4C-13 Which file contains core configuration settings in Hadoop? 6 CO4
a) mapred-site.xml
b) yarn-site.xml
c) core-site.xml
d) hdfs-site.xml

4C-14 What does MapReduce consist of? 5 CO4
a) Read and Write
b) Map and Reduce functions
c) Input and Output
d) Block and File

4C-15 Which is used for real-time read/write operations in Hadoop? 3 CO4


a) Hive
b) Pig
c) HBase
d) MapReduce

FILL IN THE BLANKS


4F-1 HDFS stands for _________. 3 CO3

4F-2 Hadoop is designed for _________ processing of large data sets. 3 CO3

4F-3 Hadoop is an open-source framework used for storing and processing __________ data sets. 4 CO3

4F-4 In Hadoop, data is stored in a distributed file system known as __________. 6 CO3

4F-5 The primary node in HDFS responsible for managing file system metadata is called the 5 CO3
__________.
4F-6 The __________ Node periodically pulls a copy of the file system metadata from the Name 2 CO3
Node.
4F-7 The actual data in HDFS is stored in blocks on __________ Nodes. 6 CO3

4F-8 HDFS follows a __________ architecture for scalability and fault tolerance. 4 CO4

4F-9 The default block size in HDFS is __________ MB (configurable). 3 CO4

4F-10 The process of writing a file in HDFS begins by contacting the __________. 5 CO4

4F-11 In a MapReduce program, the __________ function processes input data and produces key-value 5 CO4
pairs.
4F-12 __________ is a distributed, column-oriented NoSQL database built on top of HDFS. 6 CO4

4F-13 __________ is a data warehouse infrastructure built on top of Hadoop for querying and 4 CO4
managing large datasets using SQL-like syntax.
4F-14 __________ is a high-level platform for creating MapReduce programs using a scripting 4 CO4
language.

4F-15 Hadoop can be distributed by vendors such as Cloudera, Hortonworks, and __________. 5 CO4

Match the following

4M-1 Match the following 4 CO3


a. NameNode 1. Stores actual data in HDFS
b. DataNode 2. Coordinates data processing jobs
c. Secondary NameNode 3. Stores metadata about HDFS files
d. ResourceManager 4. Performs periodic checkpoints
e. NodeManager 5. Manages execution on a single node

4M-2 Match the following 5 CO3


a. RDBMS 1. Schema-less or flexible schema
b. Hadoop 2. Schema-on-write
c. Structured Data 3. Stored in tables with predefined schema
d. Unstructured Data 4. Can include images, logs, or text
e. Hadoop Storage 5. Distributed across multiple nodes

4M-3 Match the following 3 CO3


a. NameNode 1. Controls individual DataNode services
b. DataNode 2. Stores blocks of data
c. Secondary NameNode 3. Helps in checkpointing
d. ResourceManager 4. Handles job scheduling
e. NodeManager 5. Works under ResourceManager
4M-4 Match the following 6 CO4
a. HBase 1. NoSQL database on Hadoop
b. Hive 2. SQL-like querying tool
c. Pig 3. Dataflow scripting language
d. MapReduce 4. Programming model for processing
e. HDFS 5. Storage system for Hadoop

4M-5 Match the following 3 CO4


a. Cloudera 1. Provides CDP (Cloudera Data Platform)
b. Hortonworks 2. Merged with Cloudera in 2019
c. MapR 3. Known for its high-performance FS
d. Apache Foundation 4. Maintains Hadoop open source
e. Amazon EMR 5. Cloud-based Hadoop solution
5-Mark Questions

1 Compare and contrast RDBMS and Hadoop with respect to data handling, scalability, and 2 CO3
structure.

2 Describe the core components of the Hadoop framework and explain how they work together. 3 CO3

3 List any three major Hadoop distributors and describe the key features of any one of them. 1 CO3

4 Explain the role and structure of HDFS in the Hadoop ecosystem. What makes it suitable for big 4 CO3
data storage?

5 What are the major daemons in HDFS? Describe the role of each daemon. 5 CO3

6 Describe the complete process of writing and reading a file in HDFS. Use a diagram if necessary. 6 CO3

7 What is the NameNode in Hadoop? Explain its responsibilities and why it is considered the 1 CO3
centerpiece of HDFS.

8 Discuss the purpose of the Secondary NameNode. How is it different from a backup NameNode? 5 CO4

9 Describe the role of a DataNode in Hadoop. How does it interact with the NameNode and 4 CO4
client?

10 Explain the architecture of HDFS using a neat labeled diagram. Highlight the data flow in 3 CO4
read/write operations.

11 Discuss the importance of Hadoop configuration files. Name any two and explain their use. 5 CO4

12 Explain the working of the MapReduce framework with an example. How does it achieve 1 CO4
parallelism?

13 What is HBase and how does it complement Hadoop in real-time big data applications? 3 CO4

14 What is Hive? Explain its architecture and how it simplifies querying in Hadoop. 1 CO4

15 Describe Apache Pig and its data flow language Pig Latin. How does it benefit data analysts 1 CO4
working with Hadoop?
UNIT–V: DATA ANALYTICS WITH R, MACHINE LEARNING
Multiple choice Questions
5C-1 Which of the following is a supervised learning algorithm? 1 CO3
a) K-means
b) KNN
c) PCA
d) Apriori

5C-2 In R, which package is commonly used for data visualization? 1 CO3


a) dplyr
b) ggplot2
c) tidyr
d) lubridate

5C-3 Which machine learning type does not use labeled data? 1 CO3
a) Supervised learning
b) Unsupervised learning
c) Reinforcement learning
d) Deep learning

5C-4 Collaborative filtering is mainly used in: 1 CO3


a) Fraud detection
b) Recommendation systems
c) Image classification
d) Spam filtering

5C-5 Which of the following is an example of a classification problem? 1 CO3


a) Predicting stock prices
b) Grouping customers based on behavior
c) Predicting if an email is spam or not
d) Finding frequent itemsets

5C-6 Which function in R is used to create a decision tree? 6 CO3


a) lm()
b) rpart()
c) kmeans()
d) svm()

5C-7 The term “Big R” refers to: 2 CO3


a) R with big screens
b) Use of R in embedded systems
c) Scalable R tools for big data analytics
d) R for graphics

5C-8 Which R package is commonly used for machine learning? 1 CO4


a) lattice
b) e1071
c) MASS
d) forecast
5C-9 Which algorithm is used for clustering in unsupervised learning? 2 CO4
a) Linear regression
b) Logistic regression
c) K-means
d) Decision tree

5C-10 Which of the following best describes collaborative filtering? 1 CO4


a) It filters unwanted messages
b) It groups similar data points
c) It makes recommendations based on user-item interactions
d) It filters data from social media

5C-11 Which package in R can be used for sentiment analysis in social media analytics? 6 CO4
a) sentimentr
b) dplyr
c) stats
d) lubridate

5C-12 Which mobile analytics metric indicates how many users uninstall an app? 4 CO4
a) Engagement rate
b) Churn rate
c) Retention rate
d) Conversion rate

5C-13 Which method is used in Big R to connect to Hadoop? 6 CO4
a) HadoopConnect
b) rhdfs
c) hbaseR
d) sparklyr
5C-14 Which of the following is an unsupervised learning technique? 5 CO4
a) Naive Bayes
b) Decision Trees
c) K-means Clustering
d) Random Forest
5C-15 Which function in R is used for linear regression? 3 CO4
a) linreg()
b) lm()
c) reg()
d) lr()
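
As a quick revision aid for items 5C-6 and 5C-15 above, a minimal base-R sketch of the two functions they name; the built-in datasets and formulas are illustrative choices, not part of the question bank:

    # lm(): linear regression (5C-15) - model mpg as a function of car weight in mtcars
    fit_lm <- lm(mpg ~ wt, data = mtcars)
    summary(fit_lm)                               # coefficients, R-squared, p-values

    # rpart(): decision tree (5C-6); rpart is a recommended package bundled with standard R installations
    library(rpart)
    fit_tree <- rpart(Species ~ ., data = iris)   # classification tree on the iris data
    predict(fit_tree, head(iris), type = "class") # predicted classes for a few rows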
Fill in the blanks
5F-1 The __________ function in R is used to build linear models. 3 CO3

5F-2 In supervised learning, the model is trained on a dataset that includes __________ labels. 3 CO3

5F-3 The R package __________ is used for data manipulation and transformation. 4 CO3

5F-4 In collaborative filtering, recommendations are made based on __________ behavior. 6 CO3
5F-5 K-means is a type of __________ learning algorithm. 5 CO3

5F-6 In social media analytics, __________ analysis is used to determine users’ emotions or opinions. 2 CO3

5F-7 The __________ package in R is widely used for creating plots and graphs 6 CO3

5F-8 In machine learning, the process of improving model accuracy using test data is called 4 CO4
__________.

5F-9 Mobile analytics helps in tracking user behavior through __________ devices. 3 CO4

5F-10 The R package __________ is used to connect R with Hadoop’s file system. 5 CO4

5F-11 In classification, the output variable is typically __________. 5 CO4

5F-12 The term “Big R” refers to using R for __________ data analytics. 6 CO4

5F-13 Social media data is often __________, meaning it comes in many different formats. 4 CO4

5F-14 The __________ method in R is used to train support vector machines. 4 CO4

5F-15 _________ learning uses feedback to learn actions (e.g., gaming bots). 5 CO4

Match the following

5M-1 a. ggplot2 1. Data manipulation 4 CO4


b. dplyr 2. Machine learning algorithms
c. caret 3. Text mining
d. tm 4. Data visualization
e. shiny 5. Web application framework in R
5M-2 a. Linear Regression 1. Unsupervised Learning 5 CO4
b. K-means Clustering 2. Supervised Learning
c. Decision Trees 3. Based on labeled data
d. PCA 4. Dimensionality reduction
e. Supervised Learning 5. Clustering algorithm
5M-3 a. Social Media Analytics 1. Customer churn tracking 3 CO4
b. Mobile Analytics 2. Analyzing tweets and sentiments
c. Collaborative Filtering 3. Predicting user ratings
d. Supervised Learning 4. Email spam detection
e. Unsupervised Learning 5. Customer segmentation
5M-4 a. lm() 1. Clustering (K-means) 6 CO4
b. kmeans() 2. Linear regression model
c. predict() 3. Model prediction
d. rpart() 4. Decision tree
e. table() 5. Frequency distribution
5M-5 a. rhdfs 1. Connect R with Spark 3 CO4
b. sparklyr 2. Use of Hadoop File System in R
c. Big R 3. R for Big Data processing
d. HDFS 4. Storage layer in Hadoop
e. MapReduce 5. Programming model for big data
5-Mark Questions
1 Define Data Analytics. How is R used as a tool for data analysis and visualization? 2 CO5

2 Differentiate between Supervised and Unsupervised Learning. Provide suitable examples. 3 CO5

3 Explain the steps to implement a supervised learning algorithm in R. Use any example like linear 1 CO5
regression.

4 Describe the K-means clustering algorithm and how it is applied in R. 4 CO5
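
A minimal R sketch of the kind of application Question 4 refers to; the iris measurements, the scaling step, and k = 3 are illustrative assumptions rather than a prescribed answer:

    set.seed(7)                                  # k-means starts from random centres
    features <- scale(iris[, 1:4])               # standardise the four numeric columns
    km <- kmeans(features, centers = 3, nstart = 25)
    km$centers                                   # cluster centres (in scaled units)
    table(km$cluster, iris$Species)              # compare clusters with the known species labels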

5 What is the difference between classification and regression? Explain with reference to R 5 CO5
functions.

6 Explain collaborative filtering. How does it work in recommendation systems? 6 CO5
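
A toy, base-R illustration of the user-based collaborative filtering idea behind Question 6; the rating matrix, the cosine similarity measure, and treating unrated items as 0 are simplifying assumptions:

    # Rows = users, columns = items; 0 means "not yet rated"
    ratings <- matrix(c(5, 3, 0, 1,
                        4, 0, 0, 1,
                        1, 1, 0, 5,
                        0, 0, 5, 4),
                      nrow = 4, byrow = TRUE,
                      dimnames = list(paste0("user", 1:4), paste0("item", 1:4)))

    cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

    # Similarity of every user to user1
    sims <- apply(ratings, 1, cosine, b = ratings["user1", ])

    # Predict user1's rating for item3 as a similarity-weighted average of the other users' ratings
    others <- setdiff(rownames(ratings), "user1")
    sum(sims[others] * ratings[others, "item3"]) / sum(sims[others])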

7 What is the role of R in building recommendation systems using user-item interaction data? 1 CO5

8 Discuss the application of R in Social Media Analytics. Mention any relevant libraries. 5 CO5

9 What is sentiment analysis in social media? How can it be implemented in R using real-time 4 CO5
Twitter data?

10 Explain the importance of Mobile Analytics. What kind of insights can it offer to businesses? 3 CO5

11 What are the challenges of using R for Big Data Analytics and how can they be addressed? 5 CO5

12 What is Big R? Describe how R can be used in a big data environment (e.g., Hadoop, Spark). 1 CO5
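
One hedged sketch for Question 12 of how R can reach a big data engine, here via the sparklyr package against a local Spark instance (sparklyr, dplyr, and a local Spark installation are assumed; the mtcars data is only an illustration):

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")              # connect R to a local Spark instance

    cars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)  # push an R data frame into Spark

    cars_tbl %>%                                       # dplyr verbs are translated to Spark SQL
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
      collect()                                        # bring the small aggregated result back to R

    spark_disconnect(sc)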

13 List and explain any five important R packages used in machine learning and data analytics. 3 CO5

14 How does data preprocessing influence machine learning performance in R? Explain with an 1 CO5
example.

15 Describe a complete machine learning workflow in R — from importing data to evaluating the 1 CO5
model.
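
A compact, base-R sketch of the workflow Question 15 describes, from importing data to evaluating the model; the built-in mtcars data, the 70/30 split, and the mpg ~ wt model are illustrative assumptions rather than a prescribed answer:

    # 1. Import / inspect data (a built-in dataset stands in for read.csv())
    data(mtcars)
    str(mtcars)

    # 2. Split into training and test sets
    set.seed(42)
    idx   <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
    train <- mtcars[idx, ]
    test  <- mtcars[-idx, ]

    # 3. Train a model (simple linear regression)
    model <- lm(mpg ~ wt, data = train)
    summary(model)

    # 4. Predict on unseen data
    pred <- predict(model, newdata = test)

    # 5. Evaluate with root-mean-squared error
    sqrt(mean((test$mpg - pred)^2))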

FACULTY HOD
