Data Analytics

Uploaded by

Keerti Sh
Copyright
© All Rights Reserved

Module 1 and Module 2 Practice Questions: Data Analytics

1. What is Data Science?

a) The study of ancient civilizations through data
b) A field that uses scientific methods, processes, algorithms, and systems to extract
knowledge from data
c) A branch of engineering focused on building data centers
d) The study of celestial objects through data

2. Which of the following is NOT a step in the data science process?

a) Data Collection
b) Data Cleaning
c) Data Dancing
d) Data Modeling

3. What does 'EDA' stand for in data analytics?

a) Enhanced Data Analytics
b) Exploratory Data Analysis
c) Extensive Data Application
d) Effective Data Aggregation

4. Which programming language is most commonly used for data science?

a) Java
b) Python
c) HTML
d) PHP

5. What is a 'dataset'?

a) A collection of algorithms used in machine learning
b) A collection of data points organized in a structured manner
c) A software tool for data visualization
d) A process for cleaning data

6. What is the primary purpose of data visualization?

a) To create complex data structures
b) To visually represent data to find patterns and insights
c) To replace data analysis
d) To store data in the cloud

7. Which of the following is a common data visualization tool?

a) Tableau
b) Eclipse
c) Visual Studio
d) MySQL

8. What does the term 'Big Data' refer to?

a) Data that is larger in size than usual
b) A specific database software
c) The practice of collecting small data sets
d) Data that cannot be stored on a single computer

9. Which of these is a popular library for data manipulation in Python?

a) Pandas
b) Jupyter
c) NumPy
d) TensorFlow

10. What is 'machine learning'?

a) A subset of artificial intelligence that focuses on building systems that learn from data
b) A technique for cleaning data
c) A method for storing large data sets
d) A programming language used for data science

Answers:

1. b) A field that uses scientific methods, processes, algorithms, and systems to extract
knowledge from data
2. c) Data Dancing
3. b) Exploratory Data Analysis
4. b) Python
5. b) A collection of data points organized in a structured manner
6. b) To visually represent data to find patterns and insights
7. a) Tableau
8. d) Data that cannot be stored on a single computer
9. a) Pandas
10. a) A subset of artificial intelligence that focuses on building systems that learn from data
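
The collection, cleaning, and exploratory steps named in the questions above can be illustrated with a minimal sketch. Pandas would be the usual tool for this kind of work; the version below uses only the Python standard library so that it runs anywhere. The dataset, column names, and helper functions are hypothetical and chosen purely for illustration.

```python
import csv
import io
import statistics

# Hypothetical dataset: each row is one data point with named columns.
RAW = """city,temp_c
Pune,31.0
Delhi,
Mumbai,29.5
Pune,31.0
Chennai,33.5
"""

def load_rows(text):
    """Data collection: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Data cleaning: drop rows with a missing temperature and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["temp_c"]:
            continue  # missing value
        key = (row["city"], row["temp_c"])
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append({"city": row["city"], "temp_c": float(row["temp_c"])})
    return cleaned

def summarize(rows):
    """Exploratory step: basic summary statistics for one numeric column."""
    temps = [r["temp_c"] for r in rows]
    return {
        "count": len(temps),
        "min": min(temps),
        "max": max(temps),
        "mean": statistics.mean(temps),
    }

rows = clean_rows(load_rows(RAW))
summary = summarize(rows)
print(summary)
```

Of the five raw rows, one has a missing value and one is a duplicate, so three rows remain after cleaning.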

1. What is Big Data characterized by?

a) Volume, Velocity, Variety, Veracity, and Value
b) Veracity, Volume, Viscosity, Value, and Versatility
c) Volume, Velocity, Variety, Viscosity, and Volatility
d) Value, Variety, Veracity, Versatility, and Velocity

2. Which of the following is a common technology used to handle Big Data?

a) SQL
b) Hadoop
c) HTML
d) CSS

3. What does the Internet of Things (IoT) refer to?

a) A network of physical objects embedded with sensors, software, and other
technologies
b) A new programming language
c) A type of social media platform
d) An advanced form of cloud storage

4. Which of the following devices can be part of IoT?

a) Smart Thermostats
b) Smartphones
c) Wearable Fitness Trackers
d) All of the above

5. How does IoT typically generate data?

a) Through manual data entry
b) Through automated data collection from sensors and devices
c) By generating random numbers
d) By capturing data from social media

6. What is the primary benefit of IoT in smart homes?

a) Increasing social media followers
b) Automating and improving the efficiency of home functions
c) Developing new programming languages
d) Enhancing gaming experiences

7. In the context of data science, what is the role of statistics?

a) To replace data science
b) To provide tools and techniques for analyzing data and drawing conclusions
c) To manage databases
d) To design hardware for data storage

8. Which statistical concept is used to summarize the central tendency of data?

a) Mean
b) Variance
c) Standard Deviation
d) Correlation

9. Which statistical method can be used to identify the relationship between two
variables?

a) Regression Analysis
b) Principal Component Analysis
c) k-Means Clustering
d) Neural Networks

10. How do data science and statistics complement each other?

a) Statistics provides the theoretical foundation for data analysis, while data science
applies these techniques to real-world data
b) Data science replaces the need for statistics
c) Statistics only deals with historical data, while data science only deals with future
predictions
d) They do not complement each other

11. Which of the following is a common use case of Big Data analytics?

a) Analyzing social media trends
b) Developing simple websites
c) Creating mobile apps
d) Building desktop applications

12. What is the purpose of predictive analytics in data science?

a) To predict future trends based on historical data
b) To clean and preprocess data
c) To visualize data in real-time
d) To store data securely in the cloud

Answers:

1. a) Volume, Velocity, Variety, Veracity, and Value
2. b) Hadoop
3. a) A network of physical objects embedded with sensors, software, and other
technologies
4. d) All of the above
5. b) Through automated data collection from sensors and devices
6. b) Automating and improving the efficiency of home functions
7. b) To provide tools and techniques for analyzing data and drawing conclusions
8. a) Mean
9. a) Regression Analysis
10. a) Statistics provides the theoretical foundation for data analysis, while data science
applies these techniques to real-world data
11. a) Analyzing social media trends
12. a) To predict future trends based on historical data
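
The statistical concepts in this set (mean, variance, standard deviation, correlation, and regression) can be shown together in a short sketch. The paired observations below are made up for illustration, and `pearson_r` and `fit_line` are hypothetical helper names, written by hand so that each formula is visible.

```python
import math
import statistics

# Hypothetical paired observations, e.g. hours studied (x) and quiz score (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Central tendency and spread.
mean_x = statistics.mean(x)
var_x = statistics.variance(x)   # sample variance (divides by n - 1)
std_x = statistics.stdev(x)      # sample standard deviation

def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship, in [-1, 1]."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Simple linear regression by least squares: returns (slope, intercept)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
print(mean_x, var_x, std_x, r, slope, intercept)
```

Correlation only measures how strongly two variables move together, while the fitted line gives a rule for predicting one from the other.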

1. Which of the following is a common limitation of data science?

a) Lack of data availability
b) Excessive manual labor
c) Too much storage space
d) High cost of software

2. What can cause a data science project to fail?

a) Clear and well-defined objectives
b) Poor data quality and incomplete data
c) Adequate computational resources
d) Strong collaboration among team members

3. Why is data privacy a significant concern in data science?

a) Data privacy is not a concern in data science
b) Because data is often collected without user consent
c) Because data is always encrypted
d) Because data is rarely used for decision-making

4. Which of the following is a potential ethical issue in data science?

a) Data cleaning
b) Algorithm bias
c) Data visualization
d) Data storage

5. What is one major challenge in integrating data from multiple sources?

a) Redundant data collection
b) Data consistency and compatibility issues
c) Increased computational power
d) Simplicity of integration

6. Which methodology is commonly used in chemical engineering to model chemical
processes?

a) Regression Analysis
b) Monte Carlo Simulation
c) k-Means Clustering
d) Principal Component Analysis

7. What is the primary use of machine learning in chemical engineering?

a) Predicting material properties and optimizing processes
b) Creating marketing strategies
c) Designing user interfaces
d) Developing new programming languages

8. How can data science help in the optimization of chemical reactions?

a) By visualizing reaction mechanisms
b) By analyzing and predicting the outcomes of various reaction conditions
c) By manually testing all possible reactions
d) By ignoring experimental data

9. Which of the following is an application of data science in chemical engineering?

a) Predictive maintenance of chemical plants
b) Social media analysis
c) Web development
d) Game design

10. What role does big data play in chemical engineering?

a) It allows for real-time monitoring and control of processes
b) It reduces the need for experimental data
c) It is primarily used for entertainment purposes
d) It is not applicable in chemical engineering

Answers:

1. a) Lack of data availability
2. b) Poor data quality and incomplete data
3. b) Because data is often collected without user consent
4. b) Algorithm bias
5. b) Data consistency and compatibility issues
6. b) Monte Carlo Simulation
7. a) Predicting material properties and optimizing processes
8. b) By analyzing and predicting the outcomes of various reaction conditions
9. a) Predictive maintenance of chemical plants
10. a) It allows for real-time monitoring and control of processes
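
As an illustration of Monte Carlo simulation applied to a chemical process, the sketch below propagates uncertainty in reactor temperature through the Arrhenius equation, k = A exp(-Ea / (R T)). The pre-exponential factor, activation energy, and temperature distribution are hypothetical values chosen only to make the example run; they do not describe any real reaction.

```python
import math
import random
import statistics

R = 8.314  # universal gas constant, J/(mol*K)

def rate_constant(temp_k, pre_exp=1.0e7, activation_energy=5.0e4):
    """Arrhenius rate constant; pre_exp (1/s) and activation_energy (J/mol) are made-up values."""
    return pre_exp * math.exp(-activation_energy / (R * temp_k))

def simulate(n_samples=10_000, mean_temp=350.0, sd_temp=2.0, seed=0):
    """Monte Carlo: draw random temperatures, evaluate the model, summarize the outputs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    samples = [
        rate_constant(rng.gauss(mean_temp, sd_temp))
        for _ in range(n_samples)
    ]
    return statistics.mean(samples), statistics.stdev(samples)

mean_k, sd_k = simulate()
print(f"mean k = {mean_k:.4f} 1/s, spread = {sd_k:.4f} 1/s")
```

Instead of a single predicted rate constant, the simulation gives a distribution of outcomes, which is what makes the method useful when inputs are uncertain.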

1. Which of the following is a current trend in data science?

a) Decrease in the use of artificial intelligence
b) Increased emphasis on explainable AI (XAI)
c) Decline in cloud-based solutions
d) Reduced focus on data privacy

2. What is 'explainable AI'?

a) AI that can solve any problem without human intervention
b) AI systems designed to make their decision-making processes transparent and
understandable to humans
c) AI that does not require data to function
d) AI that can only be used by data scientists

3. Which trend involves using machine learning models that can continually learn
from new data without being explicitly programmed?

a) Static machine learning
b) Transfer learning
c) Continual learning
d) Batch processing

4. What is 'AutoML'?

a) A machine learning model that requires no data
b) A set of tools and processes that automate the process of applying machine learning to
real-world problems
c) A manual method of tuning machine learning models
d) An outdated approach to data science

5. Which of the following is an emerging trend in data visualization?

a) Decreasing the use of visual tools
b) Using more text-based summaries
c) Interactive and real-time data visualization
d) Static and unchangeable graphs

6. In data science, what is 'data democratization'?

a) Limiting access to data to only top-level management
b) Making data accessible to a broader audience within an organization
c) Selling data to third parties
d) Removing all restrictions on data usage

7. What is the purpose of experimentation in data science?

a) To randomly change data sets
b) To test hypotheses and validate models
c) To replace data analysis
d) To create more complex algorithms without testing

8. Which method is commonly used in data science experiments to validate the
performance of models?

a) Cross-validation
b) Manual inspection
c) Random guessing
d) Visualization

9. What is A/B testing?

a) A method to train neural networks
b) An experimental approach to compare two versions of a variable to determine which
performs better
c) A way to store data in databases
d) A technique for cleaning data

10. Which of the following is a key aspect of running successful data science
experiments?

a) Ignoring data integrity
b) Having a well-defined hypothesis and control group
c) Avoiding documentation
d) Making changes based on gut feelings

11. What does 'causality' refer to in the context of data science experiments?

a) The correlation between two variables
b) The ability to determine that one variable directly affects another
c) The random association of events
d) The collection of large datasets

12. Why is reproducibility important in data science experimentation?

a) It helps in reducing the amount of data needed
b) It ensures that results can be consistently duplicated and verified by others
c) It decreases the time required for experiments
d) It eliminates the need for further testing

Answers:

1. b) Increased emphasis on explainable AI (XAI)
2. b) AI systems designed to make their decision-making processes transparent and
understandable to humans
3. c) Continual learning
4. b) A set of tools and processes that automate the process of applying machine learning to
real-world problems
5. c) Interactive and real-time data visualization
6. b) Making data accessible to a broader audience within an organization
7. b) To test hypotheses and validate models
8. a) Cross-validation
9. b) An experimental approach to compare two versions of a variable to determine which
performs better
10. b) Having a well-defined hypothesis and control group
11. b) The ability to determine that one variable directly affects another
12. b) It ensures that results can be consistently duplicated and verified by others
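
An A/B test with a control group and a stated hypothesis can be analyzed in several ways. One common choice, assumed here, is a two-proportion z-test on conversion counts. The visitor and conversion numbers are invented for illustration, and `two_proportion_z_test` is a hypothetical helper name.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and variant (B).

    Null hypothesis: both versions have the same underlying conversion rate.
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical experiment: 2000 visitors per group.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two versions is unlikely to be chance alone. Random assignment of users to A and B is what allows the result to be read as causal rather than as a mere correlation.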

1. What is the primary goal of computational modeling in data science?

a) To create physical models
b) To simulate real-world processes and predict outcomes
c) To replace data storage systems
d) To manage hardware resources

2. Which of the following is a common type of computational model?

a) Geometric models
b) Analytical models
c) Graphical models
d) Numerical models

3. What does 'overfitting' mean in the context of model training?

a) The model is too simple and underperforms
b) The model performs well on training data but poorly on unseen data
c) The model has insufficient data to learn from
d) The model is optimized for speed rather than accuracy

4. Which method is used to evaluate the performance of a computational model?

a) Data cleaning
b) Cross-validation
c) Data augmentation
d) Feature selection

5. What is supervised learning?

a) A type of machine learning where the algorithm learns from labeled training data
b) A machine learning method that requires no data
c) Learning that occurs without any human intervention
d) A type of learning that relies on reinforcement from the environment

6. Which of the following is a common algorithm used in supervised learning?

a) K-means clustering
b) Decision trees
c) Principal component analysis
d) Independent component analysis

7. What is unsupervised learning?

a) Learning from data that has labels
b) Learning from data that does not have labels
c) Learning that occurs through reward and punishment
d) Learning that requires constant human supervision

8. Which algorithm is typically used for clustering in unsupervised learning?

a) Linear regression
b) K-means
c) Logistic regression
d) Support vector machines

9. What is big data analytics?

a) The process of analyzing large and complex data sets to uncover hidden patterns and
insights
b) A method to minimize the amount of data collected
c) A simple way to store large amounts of data
d) The process of visualizing small data sets

10. Which of the following technologies is often associated with big data analytics?

a) HTML
b) Hadoop
c) CSS
d) XML

11. What is the main advantage of using big data analytics in business?

a) It reduces the need for data storage
b) It allows for better decision-making by uncovering trends and insights
c) It simplifies the data collection process
d) It ensures data is never lost

12. Which component of the Hadoop ecosystem is responsible for distributed storage?

a) MapReduce
b) HDFS (Hadoop Distributed File System)
c) YARN
d) Hive

Answers:

1. b) To simulate real-world processes and predict outcomes
2. d) Numerical models
3. b) The model performs well on training data but poorly on unseen data
4. b) Cross-validation
5. a) A type of machine learning where the algorithm learns from labeled training data
6. b) Decision trees
7. b) Learning from data that does not have labels
8. b) K-means
9. a) The process of analyzing large and complex data sets to uncover hidden patterns and
insights
10. b) Hadoop
11. b) It allows for better decision-making by uncovering trends and insights
12. b) HDFS (Hadoop Distributed File System)
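
Cross-validation estimates how a supervised model performs on unseen data, which is how overfitting is detected in practice. The sketch below is a minimal k-fold version for a straight-line model. The labeled data are made up, the folds are taken in order without shuffling (a simplifying assumption), and the function names are hypothetical.

```python
import statistics

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; every index is held out exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop

def fit_line(xs, ys):
    """Supervised learning step: least-squares line fitted to labeled points."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def cross_val_mse(xs, ys, k=5):
    """Average mean squared error over k held-out folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
        fold_errors.append(statistics.mean(errors))
    return statistics.mean(fold_errors)

# Hypothetical labeled data: a line y = 2x + 1 with alternating +/-0.5 noise.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

cv_mse = cross_val_mse(xs, ys, k=5)
print(f"cross-validated MSE = {cv_mse:.3f}")
```

A model that overfits would show a low error on the points it was trained on but a much higher error on the held-out folds.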
