Big Data Analytics Question Bank Nw
LESSON PLAN
Prepared on: 27/02/2025
Regulation: R22
DEPARTMENT OF AI&DS and DATA SCIENCE
Sub. Code & Title: (R22CSD3213) BIG DATA ANALYTICS
Academic Year: 2024-25; Year/Sem./Section: III/II AI&DS, DS (A&B)
Faculty Name & Designation: D. SUMA, ASSISTANT PROFESSOR; M. JYOTHI REDDY, ASSISTANT PROFESSOR
5-MARKS QUESTIONS
1 Explain the concept of Big Data and discuss its significance in today's business and technological landscape. 2 CO1
2 Define the Six V’s of Big Data and explain how each factor (Volume, Velocity, Variety, Veracity, Value, Variability) impacts the way data is processed and analysed. 2 CO1
3 What are the major key components of Big Data Analytics? 1 CO1
4 Explain the term "Velocity" in the context of Big Data and discuss the challenges that organizations face when trying to process and analyze data in real time. 4 CO1
5 What is meant by "Variety" in Big Data? How does the diverse nature of data types (e.g., structured, unstructured, and semi-structured) affect the way businesses manage data? 2 CO1
6 Define the concept of "Veracity" in Big Data. What are some common issues related to data uncertainty, and how can organizations address them? 1 CO1
7 What is Big Data Analytics, and how is it helpful in real-time applications? 4 CO1
8 What are the major key components of Big Data Analytics? 4 CO1
9 What are the different types of Big Data Analytics? 2 CO1
10 What is the importance of Big Data for business? 1 CO1
11 Discuss the main challenges faced by organizations when performing Big Data Analytics. How can these challenges be overcome? 4 CO1
15 Discuss the applications of Big Data Analytics in the development of Smart Cities. How can analyzing data from urban infrastructure help improve traffic management, energy consumption, and public services? 2 CO1
UNIT-II: BIG DATA TECHNOLOGIES
Multiple choice Questions
2C-1 What is the primary purpose of Hadoop's MapReduce? 1 CO2
A) Data storage
B) Data processing
C) Data visualization
D) Data mining
2C-2 Which of the following is an open-source technology for Big Data analytics? 1 CO2
A) Apache Hadoop
B) Microsoft SQL Server
C) Oracle Database
D) IBM DB2
2C-3 What is the term for the process of identifying, categorizing, and understanding the data available within an organization? 1 CO2
A) Data discovery
B) Data profiling
C) Data cataloging
D) Data mining
2C-4 Which of the following cloud computing benefits is particularly relevant to Big Data analytics? 1 CO2
A) Scalability
B) Security
C) Compliance
D) Cost-effectiveness
2C-5 What is the term for the type of advanced analytics that uses statistical models and machine learning algorithms to predict future events or outcomes? 1 CO2
A) Descriptive analytics
B) Predictive analytics
C) Prescriptive analytics
D) Diagnostic analytics
2C-6 Which of the following mobile business intelligence benefits is particularly relevant to Big Data analytics? 2 CO2
A) Improved decision-making
B) Increased productivity
C) Enhanced customer experience
D) All of the above
2C-7 What is the term for the distributed computing framework that enables the processing of large datasets in parallel? 1 CO2
A) Apache Hadoop
B) Apache Spark
C) Apache Cassandra
D) MapReduce
2C-8 Which of the following data discovery techniques is used to analyze data to understand its distribution, patterns, and relationships? 5 CO2
A) Data profiling
B) Data cataloguing
C) Data mining
D) Data visualization
2C-9 What was Hadoop written in? 5 CO2
a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)
2C-10 Which of the following predictive analytics techniques is used to forecast outcomes based on historical data? 4 CO2
A) Statistical modelling
B) Machine learning
C) Data mining
D) Text analytics
2C-11 What is the term for the ability to access and analyze data on mobile devices? 2 CO2
A) Mobile business intelligence
B) Mobile data analytics
C) Mobile data science
D) Mobile data mining
2C-12 Which of the following Big Data analytics benefits is particularly relevant to organizations? 5 CO2
A) Improved decision-making
B) Increased productivity
C) Enhanced customer experience
D) All of the above
2C-13 What license is Hadoop distributed under? 2 CO2
a) Apache License 2.0
b) Mozilla Public License
c) Shareware
d) Commercial
2C-14 What was Hadoop named after? 4 CO2
a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop development
2C-15 Which of the following platforms does Hadoop run on? 5 CO2
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like
Fill in the blanks
2F-1 The primary purpose of __________ is to enable the processing of large datasets in parallel. 2 CO2
2F-2 __________ is an open-source technology for Big Data analytics. 4 CO2
2F-3 The process of identifying, categorizing, and understanding the data available within an organization is called __________. 6 CO2
2F-4 __________ is a cloud computing benefit that is particularly relevant to Big Data analytics. 6 CO2
2F-5 __________ is the type of advanced analytics that uses statistical models and machine learning algorithms to predict future events or outcomes. 5 CO2
2F-6 __________ is a mobile business intelligence benefit that is particularly relevant to Big Data analytics. 4 CO2
2F-7 __________ is the distributed computing framework that enables the processing of large datasets in parallel. 5 CO2
2F-8 __________ is the distributed computing framework that enables the processing of large datasets in parallel. 6 CO2
2F-9 __________ is the NoSQL database designed for handling large amounts of distributed data. 3 CO2
2F-10 __________ is a predictive analytics technique used to forecast outcomes based on historical data. 4 CO2
2F-11 __________ is the ability to access and analyse data on mobile devices. 5 CO2
2F-12 The process of creating a centralized repository of metadata about the data available within an organization is called __________. 6 CO2
2F-13 __________ is an open-source technology used for in-memory computing. 4 CO2
2F-14 The ability to quickly scale up or down to meet changing business needs is called __________. 4 CO2
2F-15 Big Data analytics can help organizations to __________ by analyzing data on operations. 3 CO2
Match the following
2M-1 Match the following 5 CO2
a) Hadoop 1) Open-source framework for storing and processing Big Data
b) Predictive analytics 2) Uses historical data to make future predictions.
c) Mobile Business Intelligence 3) Real-time business analytics on mobile devices
d) Open-source Technology 4) Software available for free modification and distribution
2M-2 Match the following 6 CO2
a) Hadoop’s Parallel World 1) Distributed processing across multiple nodes
b) Cloud Big Data 2) Google BigQuery, AWS Redshift
c) Mobile Analytics 3) Identifying patterns and trends in datasets.
d) Data Discovery 4) Tracking user behavior on mobile apps
2M-3 Match the following 4 CO2
a) HDFS 1) SQL-like query language for Hadoop.
b) MapReduce 2) Parallel processing framework in Hadoop
c) Pig 3) Hadoop Distributed File System.
d) Hive 4) High-level scripting language for data transformation.
2M-4 Match the following 2 CO2
a) Data Processing Framework 1) Scalable databases like MongoDB and Cassandra
b) Machine Learning in Big Data 2) Parallel processing framework in Hadoop
c) NoSQL Databases 3) Apache Spark, Hadoop MapReduce
d) Distributed Computing 4) Processing data across multiple machines.
2M-5 Match the following 3 CO2
a) Business Intelligence 1) Data-driven decision-making in organizations.
b) Data Ingestion 2) Processing data closer to the source
c) Edge Computing 3) Collecting and importing raw data into systems
d) Big Data Security 4) Protecting data from unauthorized access.
5-MARKS QUESTIONS
1 Describe the architecture of Hadoop's MapReduce and explain its role in processing large datasets. 6 CO2
2 Explain the concept of data discovery and its importance in Big Data analytics. Describe the various techniques used in data discovery. 2 CO2
3 Describe the features and benefits of using Apache Hadoop in Big Data analytics. Explain its role in processing large datasets. 2 CO2
4 Explain the relationship between cloud computing and Big Data. Describe the benefits of using cloud computing in Big Data analytics. 5 CO2
5 Describe the concept of predictive analytics and its importance in Big Data analytics. Explain the various techniques used in predictive analytics. 6 CO2
6 Explain the role of mobile business intelligence in Big Data analytics. Describe the benefits of using mobile devices to access and analyze data. 2 CO2
7 Describe the features and benefits of using Apache Spark in Big Data analytics. Explain its role in processing large datasets. 2 CO2
8 Explain the importance of data cataloguing in Big Data analytics. Describe the various techniques used in data cataloguing. 2 CO2
9 Describe the concept of scalability and its importance in Big Data analytics. Explain the various techniques used to achieve scalability. 1 CO2
10 Explain the role of Apache Cassandra in Big Data analytics. Describe its features and benefits. 3 CO2
11 Describe the concept of data profiling and its importance in Big Data analytics. Explain the various techniques used in data profiling. 4 CO2
12 Explain the relationship between statistical modelling and predictive analytics. Describe the various techniques used in statistical modelling. 4 CO2
13 Describe the benefits of using mobile data analytics in Big Data analytics. Explain the various techniques used in mobile data analytics. 2 CO2
14 Explain the importance of open-source technology in Big Data analytics. Describe the various open-source technologies used in Big Data analytics. 1 CO2
15 Describe the importance of Big Data analytics in today's business environment. Explain the various benefits of using Big Data analytics. 3 CO2
UNIT–III: INTRODUCTION TO HADOOP
Multiple choice Questions
3C-1 What is the primary purpose of Hadoop's MapReduce? 1 CO3
A) Data storage
B) Data processing
C) Data visualization
D) Data mining
3C-2 Which of the following is a component of the Hadoop ecosystem? 1 CO3
A) Apache Spark
B) Apache HBase
C) Apache Hive
D) All of the above
3C-3 What is the process of moving data into Hadoop called? 1 CO3
A) Data ingestion
B) Data export
C) Data processing
D) Data storage
3C-4 Hadoop is a framework that works with a variety of related tools. Common cohorts include __________. 1 CO3
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet
3C-5 What is the output of a MapReduce job? 1 CO3
a) A processed dataset
b) A raw dataset
c) A data model
d) A data visualization
3C-6 Which of the following data serialization formats is used in Hadoop? 6 CO3
A) Apache Avro
B) Protocol Buffers
C) JSON
D) All of the above
3C-7 What is the purpose of data serialization in Hadoop? 2 CO3
A) To compress data
B) To transmit data over a network
C) To convert data into a readable format
D) All of the above
3C-8 Which of the following is a benefit of using Hadoop? 1 CO4
A) Scalability
B) Flexibility
C) Cost-effectiveness
D) All of the above
3C-9 What is the name of the programming model used for processing large datasets in parallel across a cluster of computers? 2 CO4
A) MapReduce
B) Apache Spark
C) Apache Hadoop
D) HDFS
3C-10 Which of the following is a component of the MapReduce programming model? 1 CO4
A) Mapper
B) Reducer
C) Combiner
D) All of the above
3C-11 __________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. 6 CO4
a) Pig Latin
b) Oozie
c) Pig
d) Hive
3C-12 __________ is a general-purpose computing model and runtime system for distributed data analytics. 4 CO4
a) MapReduce
b) Drill
c) Oozie
d) None of the mentioned
3C-13 __________ is the most popular high-level Java API in the Hadoop ecosystem. 6 CO4
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading
3C-14 __________ has the world’s largest Hadoop cluster. 5 CO4
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned
3C-15 Facebook tackles Big Data with __________ based on Hadoop. 3 CO4
a) ‘Project Prism’
b) ‘Prism’
c) ‘Project Big’
d) ‘Project Data’
Fill in the blanks
3F-1 The primary purpose of Hadoop's __________ is to enable the processing of large datasets in parallel. 3 CO3
3F-2 The Hadoop ecosystem consists of various tools and technologies, including __________. 3 CO3
3F-3 The process of moving data into Hadoop is called __________. 4 CO3
3F-6 Data serialization is the process of converting data into a format that can be __________. 2 CO3
3F-7 Apache __________ is a data serialization format used in Hadoop. 6 CO3
3F-10 Apache Hive is a data warehousing and SQL-like query language for __________. 5 CO4
3F-11 Apache Pig is a high-level data processing language and framework for __________. 5 CO4
3F-12 Data serialization is important in Hadoop because it enables data to be stored and transmitted __________. 6 CO4
3F-13 Hadoop is an open-source, distributed computing framework that enables the processing of large datasets across a cluster of __________. 4 CO4
3F-14 Data can be moved into Hadoop using tools such as __________. 4 CO4
3F-15 The inputs to MapReduce are typically large datasets stored in Hadoop's __________. 5 CO4
5-MARKS QUESTIONS
2 Explain Hadoop and its core components (HDFS, MapReduce, and YARN) in detail. 3 CO3
3 Describe the role and functionality of HDFS (Hadoop Distributed File System) in the Hadoop ecosystem. How does it handle large datasets? 1 CO3
4 What is MapReduce in Hadoop? Explain the Map and Reduce phases of the MapReduce process with an example. 4 CO3
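For illustration, the Map, shuffle/sort and Reduce phases of the classic word-count example can be simulated in a few lines of Python. This is a minimal single-process sketch under stated assumptions, not Hadoop code. Real jobs implement the framework's Mapper and Reducer classes (typically in Java), and the function names below are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(mapped_pairs):
    """Shuffle/sort: group all values by key and order the keys,
    as the framework does between the Map and Reduce phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    # Each line stands in for one input split handled by a separate mapper.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_and_sort(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(data))  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

On a real cluster, each line (split) would be mapped on a different node, and the shuffle would move the intermediate pairs over the network to the reducers.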
5 Discuss the role of YARN (Yet Another Resource Negotiator) in managing resources in a Hadoop cluster. How does it improve resource allocation? 5 CO3
6 What is the Hadoop ecosystem? List and briefly describe five key components or tools in the Hadoop ecosystem. 6 CO3
7 Explain the process of moving data into Hadoop using Apache Flume. How does it help in managing log data? 1 CO3
8 Describe how Apache Sqoop is used to transfer data between relational databases and Hadoop. Provide an example scenario where Sqoop is beneficial. 5 CO4
9 Discuss the importance of data serialization in Hadoop. What are the benefits of using formats like Avro, Parquet, and SequenceFile for data serialization? 4 CO4
10 How does MapReduce handle the input data? Explain how the input data is split and processed across different nodes in a Hadoop cluster. 3 CO4
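As a worked illustration of how input is divided, the sketch below cuts a file into fixed-size splits and assigns one map task per split. It assumes the split size equals the HDFS block size (128 MB, the default `dfs.blocksize` in Hadoop 2.x and 3.x). It ignores record boundaries and other refinements in Hadoop's `FileInputFormat`. `plan_splits` is a hypothetical helper.

```python
HDFS_BLOCK_SIZE_MB = 128  # assumed split size: default HDFS block size in Hadoop 2.x/3.x


def plan_splits(file_size_mb, split_size_mb=HDFS_BLOCK_SIZE_MB):
    """Return (start_offset_mb, length_mb) pairs, one per input split.

    Each split would be handed to its own map task; the last split holds
    whatever is left over and may be smaller than split_size_mb.
    """
    splits = []
    offset = 0
    while offset < file_size_mb:
        length = min(split_size_mb, file_size_mb - offset)
        splits.append((offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    splits = plan_splits(300)
    print(splits)  # [(0, 128), (128, 128), (256, 44)]
    print(f"{len(splits)} map tasks")  # 3 map tasks
```

A 300 MB file therefore yields three map tasks under these assumptions. The scheduler then tries to run each task on a node that holds the corresponding block.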
11 What is the significance of Hadoop's Writable interface? How does it help in the serialization and deserialization of data in MapReduce jobs? 5 CO4
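Hadoop's `Writable` interface requires a type to implement `write(DataOutput)` and `readFields(DataInput)`. The type writes its fields as compact binary and reads them back in the same order. The Python class below only mimics that contract for a hypothetical word-count record. `WordCountWritable` is an illustrative name, and its byte layout is an assumption; it is not compatible with Hadoop's own `Text` or `IntWritable`.

```python
import io
import struct


class WordCountWritable:
    """Hypothetical record that mimics the Writable contract:
    write() serializes the fields to a binary stream and read_fields()
    restores them in exactly the same order.
    NOT byte-compatible with Hadoop's Text/IntWritable."""

    def __init__(self, word="", count=0):
        self.word = word
        self.count = count

    def write(self, out):
        encoded = self.word.encode("utf-8")
        # Big-endian like Java's DataOutput: 4-byte length, word bytes, 4-byte count.
        out.write(struct.pack(">i", len(encoded)))
        out.write(encoded)
        out.write(struct.pack(">i", self.count))

    def read_fields(self, inp):
        # Fields must be read back in the same order they were written.
        (length,) = struct.unpack(">i", inp.read(4))
        self.word = inp.read(length).decode("utf-8")
        (self.count,) = struct.unpack(">i", inp.read(4))


if __name__ == "__main__":
    buf = io.BytesIO()
    WordCountWritable("hadoop", 3).write(buf)
    raw = buf.getvalue()
    print(len(raw))  # 14 bytes: 4 (length) + 6 (word) + 4 (count)
    restored = WordCountWritable()
    restored.read_fields(io.BytesIO(raw))
    print(restored.word, restored.count)  # hadoop 3
```

The compact, schema-less layout is the reason this style of serialization suits MapReduce. Records move between mappers and reducers as raw bytes, and no field names are stored.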
12 Explain the role of Apache Kafka in the Hadoop ecosystem. How does it support real-time data streaming into Hadoop? 1 CO4
13 Discuss the features and advantages of using Apache Hive for querying and managing large datasets in Hadoop. 3 CO4
14 What is the difference between HDFS and traditional file systems? How does HDFS ensure fault tolerance and high availability? 1 CO4
15 Describe the concept of "data locality" in Hadoop. How does it affect the performance of MapReduce jobs in a Hadoop cluster? 1 CO4
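Data locality can be illustrated with a toy scheduler. It sends each map task to a free node that already holds a replica of the task's block and falls back to a remote node only when no such node is free. The greedy, one-task-per-node policy and the `assign_map_tasks` helper are simplifications for illustration. Hadoop's YARN schedulers also consider rack locality and other policies.

```python
def assign_map_tasks(block_locations, free_nodes):
    """Greedily assign one map task per block, preferring data-local nodes.

    block_locations: dict of block_id -> list of nodes holding a replica
    free_nodes: list of nodes with one free task slot each (assumption)
    Returns dict of block_id -> (node, "data-local" or "remote").
    """
    available = list(free_nodes)
    plan = {}
    for block, replicas in block_locations.items():
        # Prefer a free node that already stores this block: no network copy needed.
        local = next((node for node in available if node in replicas), None)
        if local is not None:
            node, kind = local, "data-local"
        elif available:
            # No local node is free: the block must be read over the network.
            node, kind = available[0], "remote"
        else:
            break  # no capacity left; remaining blocks wait for a later round
        available.remove(node)
        plan[block] = (node, kind)
    return plan


if __name__ == "__main__":
    blocks = {
        "b1": ["node1", "node2"],
        "b2": ["node2", "node3"],
        "b3": ["node1", "node3"],
    }
    print(assign_map_tasks(blocks, ["node1", "node2", "node4"]))
    # {'b1': ('node1', 'data-local'), 'b2': ('node2', 'data-local'), 'b3': ('node4', 'remote')}
```

Every "remote" assignment means a block is copied across the network before the map task can start. This is why maximizing data-local assignments improves MapReduce job performance.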
UNIT IV: HADOOP ARCHITECTURE
Multiple choice Questions
4C-1 Which of the following is a component of Hadoop? 1 CO3
a) MySQL
b) PostgreSQL
c) HDFS
d) Oracle
4C-13 Which file contains core configuration settings in Hadoop? 6 CO4
a) mapred-site.xml
b) yarn-site.xml
c) core-site.xml
d) hdfs-site.xml
4C-14 What does MapReduce consist of? 5 CO4
a) Read and Write
b) Map and Reduce functions
c) Input and Output
d) Block and File
Fill in the blanks
4F-2 Hadoop is designed for __________ processing of large data sets. 3 CO3
4F-3 Hadoop is an open-source framework used for storing and processing __________ data sets. 4 CO3
4F-4 In Hadoop, data is stored in a distributed file system known as __________. 6 CO3
4F-5 The primary node in HDFS responsible for managing file system metadata is called the 5 CO3
__________.
4F-6 The __________ NameNode periodically pulls a copy of the file system metadata from the NameNode. 2 CO3
4F-7 The actual data in HDFS is stored in blocks on __________ Nodes. 6 CO3
4F-8 HDFS follows a __________ architecture for scalability and fault tolerance. 4 CO4
4F-10 The process of writing a file in HDFS begins by contacting the __________. 5 CO4
4F-11 In a MapReduce program, the __________ function processes input data and produces key-value pairs. 5 CO4
4F-12 __________ is a distributed, column-oriented NoSQL database built on top of HDFS. 6 CO4
4F-13 __________ is a data warehouse infrastructure built on top of Hadoop for querying and managing large datasets using SQL-like syntax. 4 CO4
4F-14 __________ is a high-level platform for creating MapReduce programs using a scripting language. 4 CO4
4F-15 Hadoop can be distributed by vendors such as Cloudera, Hortonworks, and __________. 5 CO4
5-MARKS QUESTIONS
1 Compare and contrast RDBMS and Hadoop with respect to data handling, scalability, and structure. 2 CO3
2 Describe the core components of the Hadoop framework and explain how they work together. 3 CO3
3 List any three major Hadoop distributors and describe the key features of any one of them. 1 CO3
4 Explain the role and structure of HDFS in the Hadoop ecosystem. What makes it suitable for big data storage? 4 CO3
5 What are the major daemons in HDFS? Describe the role of each daemon. 5 CO3
6 Describe the complete process of writing and reading a file in HDFS. Use a diagram if necessary. 6 CO3
7 What is the NameNode in Hadoop? Explain its responsibilities and why it is considered the centerpiece of HDFS. 1 CO3
8 Discuss the purpose of the Secondary NameNode. How is it different from a backup NameNode? 5 CO4
9 Describe the role of a DataNode in Hadoop. How does it interact with the NameNode and the client? 4 CO4
10 Explain the architecture of HDFS using a neat labeled diagram. Highlight the data flow in read/write operations. 3 CO4
11 Discuss the importance of Hadoop configuration files. Name any two and explain their use. 5 CO4
12 Explain the working of the MapReduce framework with an example. How does it achieve parallelism? 1 CO4
13 What is HBase and how does it complement Hadoop in real-time big data applications? 3 CO4
14 What is Hive? Explain its architecture and how it simplifies querying in Hadoop. 1 CO4
15 Describe Apache Pig and its data flow language, Pig Latin. How does it benefit data analysts working with Hadoop? 1 CO4
UNIT–V: DATA ANALYTICS WITH R, MACHINE LEARNING
Multiple choice Questions
5C-1 Which of the following is a supervised learning algorithm? 1 CO3
a) K-means
b) KNN
c) PCA
d) Apriori
5C-3 Which machine learning type does not use labeled data? 1 CO3
a) Supervised learning
b) Unsupervised learning
c) Reinforcement learning
d) Deep learning
5C-11 Which package in R can be used for sentiment analysis in social media analytics? 6 CO4
a) sentimentr
b) dplyr
c) stats
d) lubridate
5C-12 Which mobile analytics metric indicates how many users uninstall an app? 4 CO4
a) Engagement rate
b) Churn rate
c) Retention rate
d) Conversion rate
5C-13 Which method is used in Big R to connect to Hadoop? 6 CO4
a) HadoopConnect
b) rhdfs
c) hbaseR
d) sparklyr
5C-14 Which of the following is an unsupervised learning technique? 5 CO4
a) Naive Bayes
b) Decision Trees
c) K-means Clustering
d) Random Forest
5C-15 Which function in R is used for linear regression? 3 CO4
a) linreg()
b) lm()
c) reg()
d) lr()
Fill in the blanks
5F-1 The __________ function in R is used to build linear models. 3 CO3
5F-2 In supervised learning, the model is trained on a dataset that includes __________ labels. 3 CO3
5F-3 The R package __________ is used for data manipulation and transformation. 4 CO3
5F-4 In collaborative filtering, recommendations are made based on __________ behavior. 6 CO3
5F-5 K-means is a type of __________ learning algorithm. 5 CO3
5F-6 In social media analytics, __________ analysis is used to determine users’ emotions or opinions. 2 CO3
5F-7 The __________ package in R is widely used for creating plots and graphs. 6 CO3
5F-8 In machine learning, the process of improving model accuracy using test data is called __________. 4 CO4
5F-9 Mobile analytics helps in tracking user behavior through __________ devices. 3 CO4
5F-10 The R package __________ is used to connect R with Hadoop’s file system. 5 CO4
5F-12 The term “Big R” refers to using R for __________ data analytics. 6 CO4
5F-13 Social media data is often __________, meaning it comes in many different formats. 4 CO4
5F-14 The __________ method in R is used to train support vector machines. 4 CO4
5F-15 _________ learning uses feedback to learn actions (e.g., gaming bots). 5 CO4
5-MARKS QUESTIONS
5 What is the difference between classification and regression? Explain with reference to R functions. 5 CO5
7 What is the role of R in building recommendation systems using user-item interaction data? 1 CO5
8 Discuss the application of R in Social Media Analytics. Mention any relevant libraries. 5 CO5
9 What is sentiment analysis in social media? How can it be implemented in R using real-time Twitter data? 4 CO5
10 Explain the importance of Mobile Analytics. What kind of insights can it offer to businesses? 3 CO5
11 What are the challenges of using R for Big Data Analytics and how can they be addressed? 5 CO5
12 What is Big R? Describe how R can be used in a big data environment (e.g., Hadoop, Spark). 1 CO5
13 List and explain any five important R packages used in machine learning and data analytics. 3 CO5
14 How does data preprocessing influence machine learning performance in R? Explain with an example. 1 CO5
15 Describe a complete machine learning workflow in R, from importing data to evaluating the model. 1 CO5
FACULTY HOD