Python ML Guide for Beginners
Tarkeshwar Barua, Kamal Kant Hiran, Ritesh Kumar Jain, Ruchi Doshi
Machine Learning with Python
De Gruyter STEM
Tarkeshwar Barua, Kamal Kant Hiran, Ritesh Kumar Jain, Ruchi Doshi
ISBN 9783110697162
e-ISBN (PDF) 9783110697186
e-ISBN (EPUB) 9783110697254
Contents
Chapter 1 Introduction to Machine Learning
Summary
Exercise (MCQs)
Answers
Answers
Descriptive Questions
2.1 Why Python?
2.1.1 Drawbacks of Python
2.1.2 History of Python
2.1.3 Major Features of Python
2.1.4 Market Demand
2.1.5 Why Python in Mobile App Development?
2.1.6 Python Versions
Summary
Exercise (MCQs)
Answers
Answers
Descriptive Questions
3.3.1 Handling Missing Data
3.3.2 Data Type Conversions
Summary
Exercise (MCQs)
Answers
Answers
Descriptive Questions
4.3.2 Metrics for Regression
4.4 Cross-Validation
4.4.1 k-Fold Cross-Validation
4.4.2 Leave-One-Out and Stratified K-Fold
Summary
Exercise (MCQs)
Answers
Answers
Descriptive Questions
5.3.1 Building Decision Trees
5.3.2 Entropy and Information Gain
5.3.3 Random Forests and Bagging
Summary
Exercise (MCQs)
Answers
Answers
Descriptive Questions
How Do SOMs Work?
Where Should SOMs Not Be Used?
Choosing the Right Technique
6.4 Clustering
6.4.1 Isolation Forest
6.4.2 Density-Based Methods
6.4.3 Other Techniques
Summary
Exercise (MCQs)
Answers
7.2 Perceptron
7.2.1 Structure of Perceptron
7.2.2 Function of Perceptron
7.2.3 Where to Use Perceptron
7.2.4 Where to Use Activation Function
7.3 TensorFlow
7.3.1 Computational Graph
7.3.2 Eager Execution
7.3.3 Keras
7.3.4 Sessions
7.3.5 Common Operations
7.14 Regularization and Optimization in Deep Learning
Summary
Exercise (MCQs)
Answer Key
Answers
8.7 Challenges of Reinforcement Learning
8.8 Q-learning
Summary
Exercise (MCQs)
Answer Key
Answers
References
Index
Chapter 1 Introduction to Machine Learning
Fig. 1.1: AI, ML, and deep learning.
Different experts and sources may provide slightly varied definitions of ML,
reflecting different perspectives on the field. Here are a few diverse definitions
given by various authorities:
“Machine learning is a field of study that gives computers the ability to learn
without being explicitly programmed.”
“A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.”
“Machine learning is the field of study that gives computers the ability to learn
without being explicitly programmed. It is a type of artificial intelligence that
provides systems with the ability to automatically learn and improve from
experience.”
models to enable a computer to carry out tasks without specific programming.”
The establishment of ML can be traced back to the 1950s when renowned figures
like Alan Turing made significant contributions. Turing introduced the concept of
machines capable of acquiring knowledge through experience, which
subsequently paved the way for future advancements in this field.
The phrase “artificial intelligence” (AI) was introduced during the Dartmouth
Conference in 1956. This conference established the foundation for the
development of AI and ML as multidisciplinary areas of study.
1990s – Support Vector Machines and Boosting
The 1990s witnessed the emergence of support vector machines (SVMs) for classification tasks and boosting algorithms for enhancing the performance of weak learners.
The late 1990s witnessed an upsurge in fascination with data mining, a field closely
associated with ML, which concentrated on extracting knowledge from data, owing
to the increased availability of large datasets.
The decade of the 2010s observed the prevalence of deep learning in diverse
domains, attaining significant advancements in the realms of image and speech
identification, natural language comprehension, and beyond. The progress made
in technological infrastructure, particularly the utilization of graphics processing
units (GPUs), proved to be pivotal in the triumph of deep learning.
encompasses different methodologies, each designed to address specific learning
situations. The main forms of ML comprise supervised learning, unsupervised
learning, and reinforcement learning, each providing distinct approaches and
applications for solving various problems. Now, let us delve into an investigation of
these foundational classifications.
The objective is for the algorithm to acquire knowledge of a mapping from the
input to the output, such that when faced with new, unseen input data, it can make
accurate predictions about the corresponding output.
Key Characteristics
Training data: The dataset used for training purposes comprises pairs of
inputs and corresponding outputs, with each input having its correct output
provided.
Learning objective: The algorithm aims to learn the relationship or
mapping between inputs and outputs.
Examples: Common applications include image classification, spam filtering,
and regression problems.
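As a minimal sketch of this setup (using Scikit-learn, with made-up toy data), a classifier is trained on labeled input-output pairs and then predicts the output for an unseen input:
from sklearn.tree import DecisionTreeClassifier

# Toy labeled dataset: each input is [weight_in_grams, smoothness_score],
# and each label is 0 for "apple" or 1 for "orange" (illustrative values only).
X_train = [[150, 0.90], [170, 0.80], [140, 0.95], [130, 0.30], [120, 0.20], [125, 0.35]]
y_train = [0, 0, 0, 1, 1, 1]

# The algorithm learns a mapping from inputs to outputs.
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Predict the label of a new, unseen input.
print(model.predict([[160, 0.85]]))  # Expected: [0] for this toy data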
Example Scenario
Key Characteristics
Example Scenario
Task: Grouping similar customer purchase behaviors.
Training data: Purchase data without specific labels.
Learning objective: The algorithm identifies natural groupings or clusters of
similar purchase patterns.
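A minimal sketch of this scenario, assuming Scikit-learn and a small, made-up purchase dataset:
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled purchase data: [purchases_per_month, average_order_value]
# (hypothetical values for illustration only).
purchases = np.array([
    [2, 15], [3, 18], [2, 20],        # occasional, low-value shoppers
    [12, 90], [15, 110], [14, 95],    # frequent, high-value shoppers
])

# The algorithm groups similar purchase behaviors without any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(purchases)
print(labels)  # e.g., [0 0 0 1 1 1] -- two natural groupings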
Key Characteristics
Example Scenario
ensuing paragraphs offer comprehensive analysis of a few principal applications.
Healthcare
Finance
Autonomous Vehicles
Object detection and recognition: ML algorithms are utilized to analyze
sensor data to identify and categorize various entities such as objects,
pedestrians, and obstacles. These algorithms play a crucial role in the
decision-making process within autonomous vehicles.
Path planning: Reinforcement learning is applied for path planning,
enabling vehicles to navigate complex environments and make dynamic
decisions.
Education
Although ML has made notable progress, it also faces various obstacles that affect
its progress, implementation, and efficacy. Resolving these obstacles is imperative
for furthering the field and guaranteeing morally sound, resilient, and
comprehensible ML systems. Presented below are the principal challenges
encountered in ML.
Lack of Standardization
Challenge: The absence of standardized evaluation metrics, datasets, and
model architectures can impede reproducibility and hinder fair comparisons
between different models.
Mitigation: Encouraging open science practices, sharing benchmarks, and
adopting standardized evaluation protocols contribute to increased
transparency and collaboration.
Ethical Considerations
Adversarial Attacks
Continuous Learning
Challenge: Many ML models are designed for static datasets, and adapting
to evolving data over time (concept drift) is a challenge.
Mitigation: Implementing online learning approaches, retraining models
periodically, and staying vigilant to changes in data distributions help
address continuous learning challenges.
Privacy Concerns
Challenge: Handling sensitive information in training data raises privacy
concerns, especially in healthcare and finance applications.
Mitigation: Adopting privacy-preserving techniques, such as federated
learning and differential privacy, helps protect individual privacy while still
allowing model training.
Python has become the preferred programming language for ML due to several compelling factors. Efficient development is facilitated by its simple, easy-to-understand syntax, which makes it accessible to both novices and experienced developers. Python’s extensive ecosystem incorporates powerful libraries like
Scikit-learn, TensorFlow, and PyTorch, providing robust tools for a wide range of
tasks, from data preprocessing to complex deep learning models.
The language’s versatility is demonstrated by its ability to seamlessly integrate
with other technologies, enabling smooth incorporation into different data science
workflows and frameworks. Python’s strong community support plays a crucial
role, ensuring a vast array of resources, tutorials, and forums for problem-solving.
Its popularity extends beyond the realm of ML, fostering collaborations across
different disciplines. The open-source nature of Python and its compatibility with
various platforms contribute to its widespread adoption in research, industry, and
academia. Consequently, Python’s user-friendly nature, extensive libraries, and
community support collectively establish it as the preferred language for
practitioners and researchers navigating the diverse field of ML.
In the realm of ML, Python is synonymous with a rich ecosystem of libraries that
empower developers and researchers to build and deploy sophisticated models.
Here are three standout libraries that have played pivotal roles in shaping the
landscape of ML:
1.2.2.1 Scikit-learn
reduction tasks.
1.2.2.2 TensorFlow
1.2.2.3 PyTorch
PyTorch is a dynamic and popular deep learning library known for its imperative
programming style. Favored by researchers and developers alike, PyTorch
facilitates building dynamic computational graphs, offering flexibility and ease in
experimenting with various neural network architectures.
These libraries serve as pillars in the Python ML ecosystem, each contributing
unique strengths and functionalities. Their widespread adoption underscores their
significance in the development and advancement of ML applications.
Python has become the prevailing programming language in the realm of ML;
however, it is crucial to evaluate it alongside other languages that are frequently
employed in this particular domain.
Feature | Python | R | Java | C++
Performance | ❌ Lower performance | ❌ Lower performance in certain cases | ✔ High performance | ✔ High performance
Learning Curve | ✔ Beginner-friendly | ❌ Steeper learning curve | ❌ Moderate learning curve | ❌ Steeper learning curve
Enterprise Integration | ❌ General-purpose limitations | ❌ General-purpose limitations | ✔ Robust, suitable for integration | ❌ General-purpose limitations
Low-Level Control | ❌ Limited low-level control | ❌ Limited low-level control | ❌ Limited low-level control | ✔ Provides low-level control
The Python community for ML is a lively and cooperative ecosystem that has a
crucial function in the progress, dissemination, and enhancement of ML
endeavors. Presented here is a summary of the community and the ample
resources that are accessible for Python in the field of ML.
Open-Source Collaboration
and sessions for ML, providing a platform for networking, knowledge
sharing, and collaboration.
Local meetups: Python and ML enthusiasts frequently organize local
meetups, fostering community building and face-to-face interactions.
Educational Platforms
GitHub: The Python ML community heavily utilizes GitHub for version control,
collaborative development, and sharing of code repositories. This platform
facilitates collaboration on open-source projects and promotes code transparency.
for discussions, announcements, and community updates.
Weekly newsletters: Newsletters like “PyCoder’s Weekly” curate the latest
Python and ML news, articles, and resources, keeping the community
informed.
The collaborative and inclusive nature of the Python ML community, coupled with
the abundance of educational resources and platforms, makes it an ideal
environment for developers, researchers, and learners to thrive and contribute to
the evolving landscape of ML in Python.
Installing Python is the foundational step for any ML endeavor. Follow these steps
for a seamless installation:
Download Python
Run the downloaded installer. During the installation process, make sure to select the option labeled “Add Python to PATH.” By doing so, Python will be conveniently accessible through the command line interface.
Verify Installation
Fig. 1.4: Python version.
This example demonstrates checking the Python version, with the result indicating
that Python 3.8.12 is installed. Now, with Python installed, you’re ready to proceed
to the next steps of setting up your ML environment.
Dependency Isolation
Virtual environments create isolated spaces for Python projects. Each environment
has its own set of installed packages, preventing conflicts between project
dependencies.
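As a point of reference, a typical workflow with the built-in venv module looks like this (the environment name venv is just a common convention):
$ python -m venv venv
$ source venv/bin/activate      # on Windows: venv\Scripts\activate
(venv) $ pip install numpy      # installed only inside this environment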
Version Control
Virtual environments keep your system’s global Python environment clean. You
can experiment with different versions of libraries and frameworks without
affecting other projects or the system-wide Python installation.
Easy Replication
Once activated, the terminal prompt changes to indicate the active virtual
environment, ensuring that any installed packages are specific to the project tied
to that environment.
Understanding and leveraging virtual environments is a best practice in Python
development, especially in the context of ML projects, where dependencies can be
project-specific and evolve over time.
NumPy
Pandas
Scikit-learn
TensorFlow
PyTorch
Keras
Package managers are essential tools for managing Python libraries and
dependencies. Two widely used package managers in the Python ecosystem are
pip and conda. Here’s an overview of how to use them.
pip
Description: pip serves as the primary package manager for the Python
programming language. It streamlines the procedure of installation,
enhancement, and administration of Python packages.
Installation: If one is utilizing Python 3.4 or a more recent version, it is
probable that the pip package manager is already present. To enhance the
pip package manager to the most recent iteration, execute the following
command:
$ python -m pip install --upgrade pip
Installing a package:
$ pip install package_name
Example (installing NumPy):
$ pip install numpy
conda
1.3.5 Setting Up Jupyter Notebook
The Jupyter Notebook, which is extensively utilized for interactive data analysis,
exploration, and ML development, possesses significant capabilities. A
comprehensive tutorial outlining the process of setting up Jupyter Notebook in
your Python environment is presented here.
Installation
This action will initiate the launch of a fresh tab in the web browser, thereby
revealing the Jupyter Notebook dashboard.
Jupyter Notebooks are composed of cells that can contain either code or
markdown (text). The execution of a cell can be initiated by selecting it and
clicking the “Run” button or by utilizing the appropriate keyboard shortcut,
typically Shift + Enter.
To add another cell, use the “+” icon in the toolbar, or press the B key (while in command mode) to insert a new cell below the currently selected cell.
means of distinct kernels. If one desires to employ a virtual environment
other than the standard Python environment, it is possible to procure a
kernel for said environment. By installing the ipykernel package, it becomes
feasible to generate a fresh kernel.
$ pip install ipykernel
$ python -m ipykernel install --user --name=myenv --display-name="My Environment"
Now, in your Jupyter Notebook, you can choose “My Environment” as a kernel.
Exporting Notebooks
You can export your Jupyter Notebooks to various formats, including HTML,
PDF, and Markdown. Use the “File” menu to access the “Download as”
option and choose your preferred format.
To stop the Jupyter Notebook server, go back to the terminal where it’s
running and press Ctrl + C. Confirm with Y and press Enter.
Managing ML projects in Python involves more than just writing code. Adopting
best practices ensures project organization, reproducibility, and collaboration.
Here’s a detailed guide.
Project Structure
├── README.md
└── requirements.txt
Use Git for version control. Initialize a Git repository at the project’s root and
commit regularly. Host your code on platforms like GitHub or GitLab for
collaboration and backup.
$ git init
$ git add .
$ git commit -m "Initial commit"
Virtual Environments
Documentation
Automated Testing
Implement unit tests to ensure code correctness. Tools like pytest can be
used for automated testing. Run tests regularly to catch potential issues
early.
(venv) $ pip install pytest
(venv) $ pytest tests/
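As an illustration, a minimal test file that pytest would discover might look like this (the tests/test_math_utils.py path and the add function are hypothetical):
# tests/test_math_utils.py
def add(a, b):
    return a + b

def test_add_returns_sum():
    assert add(2, 3) == 5

def test_add_handles_negatives():
    assert add(-1, 1) == 0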
Continuous Integration
Set up a continuous integration workflow (for example, with GitHub Actions) so the test suite runs automatically on every push and pull request:
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so requirements.txt and tests/ are available
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m venv venv
          source venv/bin/activate
          pip install -r requirements.txt
      - name: Run tests
        run: |
          # Each run step starts a new shell, so re-activate the environment
          source venv/bin/activate
          pytest tests/
Reproducibility
Code Reviews
Incorporate code review practices. Peer reviews help catch bugs, improve
code quality, and ensure that the project adheres to coding standards.
Environment Variables
If applicable, plan for scaling your ML models. Consider containerization
(e.g., Docker) for deployment, and design your models to handle production-
level loads.
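A minimal sketch of a Dockerfile for packaging such a project (the file layout and the serve.py entry point are assumptions):
FROM python:3.8-slim
WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the project and start the (hypothetical) serving script.
COPY . .
CMD ["python", "serve.py"]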
Collaboration Platforms
Summary
ML is a constituent part of AI, which enables computers to acquire
knowledge and render judgments through experiential learning without the
need for explicit programming.
Developed from foundational concepts in the 1950s, ML has progressed
through key milestones, including expert systems, SVMs, and the dominance
of deep learning in the 2010s.
Supervised learning entails instructing an algorithm using annotated data,
wherein the system is directed to comprehend the correlation between
inputs and outputs through input-output pairs. Image classification and
spam filtering are among the widely recognized applications of this
approach.
Unsupervised learning pertains to the handling of data that lacks labels,
thereby compelling the algorithm to unveil patterns or correlations present
within the data in the absence of pre-established output labels. The
encompassed tasks encompass clustering and dimensionality reduction,
which encompass the grouping of analogous customer purchase behaviors.
Reinforcement learning is centered around the acquisition of decision-
making skills by an agent through its active engagement with an
environment, wherein it acquires feedback in the form of rewards or
penalties. This approach finds application in various domains such as game
playing (for instance, AlphaGo) and robotic control, with the ultimate aim of
learning a strategy to optimize cumulative rewards within a given
environment.
ML is widely applied across industries, transforming healthcare with disease
diagnosis and predictive analytics, enhancing finance through fraud
detection and algorithmic trading, improving marketing with
recommendation systems, and contributing to autonomous vehicles, image
recognition, education, and more.
Key challenges include ensuring data quality and mitigating biases,
addressing interpretability issues in complex models, balancing overfitting
and underfitting, promoting standardization for fair comparisons, managing
computational complexity, navigating ethical considerations, countering
adversarial attacks, adapting to continuous learning, and addressing privacy
concerns in handling sensitive information.
Python’s versatility, readability, and extensive library support, including
Scikit-learn, TensorFlow, and PyTorch, have made it the dominant
programming language in the ML landscape. Its active community and
integration capabilities contribute to its widespread adoption across
industries.
The foundational Python libraries for ML development encompass NumPy
and Pandas, which are employed for data manipulation, Matplotlib, which is
utilized for visualization, Scikit-learn, which is employed for ML algorithms,
TensorFlow and PyTorch, which are utilized for deep learning, and Keras,
which is employed for high-level neural network APIs.
Utilize pip and conda for managing Python libraries and dependencies based
on project requirements.
Set up Jupyter Notebook for interactive data analysis and ML development,
allowing seamless integration with Python libraries.
Exercise (MCQs)
1.
c) Data manipulation
d) Virtual environments
2.
b) Reinforcement
c) Labeled dataset
d) Clustering
3.
d) Keras
4.
Which Python library is essential for numerical computing and supports large
arrays?
a) Matplotlib
b) TensorFlow
c) Pandas
d) NumPy
6.
a) Version control
b) Package management and environment management
c) Data visualization
d) Deep learning
7.
c) Git
d) Conda
8.
d) Version control
9.
c) Travis CI
d) Scikit-learn
10.
a) To write documentation
b) To capture errors and important information
c) To create visualizations
Answers
1. b) AI subset
2. c) Labeled dataset
3. c) PyTorch
4. c) Isolating project dependencies
5. d) NumPy
6. b) Package management and environment management
7. c) Git
8. c) Interactive data analysis and development
9. c) Travis CI
10. b) To capture errors and important information
7. Scikit-learn, TensorFlow, and PyTorch are examples of __________ libraries in
the Python machine learning ecosystem.
8. Virtual environments in Python are crucial for managing project
dependencies and isolating different projects from each other. They create
isolated spaces for Python projects, preventing conflicts between __________.
9. __________ is the default package manager for Python, simplifying the process
of installing, upgrading, and managing Python packages.
10. Jupyter Notebook is a powerful tool widely used for interactive data analysis,
exploration, and machine learning development. It provides an interactive
and visual environment for developing machine learning models, visualizing
data, and documenting your work. It seamlessly integrates with the Python
__________, making it a versatile tool for data scientists and developers alike.
Answers
1. explicitly
2. labeled
3. unlabeled
4. environment
5. complex
6. versatility
7. machine learning
8. project dependencies
9. pip
10. ecosystem
Descriptive Questions
1. Explain the fundamental types of machine learning discussed in the
overview, highlighting their key characteristics and providing examples for
each.
2. Discuss the transformative applications of machine learning across various
industries, providing detailed insights into specific use cases in healthcare,
finance, marketing, and autonomous systems.
3. Why has Python become the dominant programming language in the
machine learning landscape? Provide a comprehensive explanation,
touching on its features, ecosystem, and community support.
4. Explore the significance of virtual environments in Python for machine
learning projects. Explain how they contribute to dependency isolation,
version control, and maintaining a clean development environment.
5. Elaborate on the essential Python libraries for machine learning, such as
NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow, PyTorch, and Keras.
Describe the role and importance of each library in the machine learning
ecosystem.
6. Compare and contrast the use of pip and conda as package managers in
Python, emphasizing their features and best use cases.
7. Provide a step-by-step guide on setting up Jupyter Notebook in a Python
environment, including installation, starting the notebook, creating a new
notebook, and installing additional kernels.
8. Discuss the best practices for managing machine learning projects in Python,
covering aspects like project structure, version control with Git,
documentation, automated testing, and continuous integration.
9. Explain the importance of proper logging and monitoring in machine
learning projects, emphasizing their role in capturing information and errors,
especially for ML models’ performance and drift over time.
10. How can reproducibility be ensured in machine learning experiments?
Discuss the steps to document and reproduce experiments, including details
on data sources, preprocessing, and model hyperparameters.
Chapter 2 Basics of Python Programming
In the absence of Cython, code execution is slow. Cython allows code to be compiled at the C level, leveraging C compiler optimizations.
Improved performance requires the use of a GPU or Cython.
To avoid wasting resources, an application’s speed may need to be constrained.
Python language, which was created by Guido van Rossum in 1991, is not named
after the type of snake, but rather after a British Comedy group called “Monty
Python’s Flying Circus.” Guido van Rossum, being a big fan of the group and their
quirky humor, named the language in their honor. In the year 2000, Python 2.0 was
released with new features such as comprehensions, cycle detecting garbage
collection, and Unicode support, 10 years after the initial release of Python 1.0.
Python programs often pay tribute to the group by incorporating their jokes and
famous quotes into the code.
Python is available in two major versions: Python 3.x and Python 2.x. Python 2.x, the legacy version, was supported until 2020, while Python 3.x is the actively developed and more popular version. Some features from
Python 3 can be imported into Python 2.x using the “__future__” module. Python
3.0, released in 2008, was a major release that lacked backward compatibility,
meaning that code written in Python 2.x could not run on the Python 3.x series.
However, because a large amount of code had already been written in the Python 2.x series, Python 2.7 eased the transition by backporting selected Python 3 features. Initially, support for Python 2.7 was set to end in 2015, but it was later extended to 2020. Guido van Rossum led the Python project until July 2018, when he passed the role on to a five-person steering council, which is now responsible for releasing future versions of Python.
dynamic pages for clients.
Interactive language: Python code does not need to be explicitly compiled from human-readable code into executable code before it is run. This characteristic renders Python highly interactive, as it allows for real-time modifications.
The Kivy framework provides support for building Python apps for Android and iOS.
The same code can be executed on all supported platforms due to platform independence.
Fig. 2.1: Market demand of programming languages. Source: https://doi.org/10.1515/9783110689488 (page 8).
Fig. 2.2: Language-wise average salary.
In today’s world, mobile devices have become an integral part of our daily lives,
making it nearly impossible for individuals to function without their smartphones.
These devices have greatly simplified our lives. Recognizing this, many software
companies have shifted their focus to mobile app development. However,
developing mobile apps presents a number of challenges due to the existence of
various mobile phone platforms such as Android, iOS, Windows, etc., each with its
own unique software requirements. Consequently, programmers are required to
write code natively for each platform, a time-consuming task. To mitigate this
issue, a cross-platform approach is recommended. Python, with its ease of use and
extensive library support, simplifies the process of app development.
The importance of Python can be seen in the figure below, which highlights the
market demand in various sectors. Python is renowned for its simplicity and
versatility, as it can be learned and utilized on multiple platforms. It offers robust
integration capabilities with various technologies, leading to increased
programming productivity throughout the development life cycle. Python is
particularly well-suited for large and complex projects with evolving requirements.
Furthermore, it is currently the fastest growing programming language and is
compatible with millions of phones across diverse industries. Additionally, Python
code is highly readable. Table 2.1 shows the various Python versions released from
Python 1.0 to Python 3.12.
Fig. 2.4: Industry sector-wise market demand.
Indentation
Python employs indentation as a means to establish code blocks. In contrast to
numerous other programming languages that employ braces {} or keywords such
as begin and end to demarcate code blocks, Python relies on uniform indentation.
This particular characteristic distinguishes Python and is indispensable for the
legibility and organization of the code.
def welcome(name):
if name == "Ritesh":
print("Hello, Ritesh!")
else:
print("Hello, stranger!")
The example provided above illustrates how the scope of the if-else block is
determined by the indentation, specifically the whitespace that precedes the print
statements and else statement. Typically, this indentation consists of four spaces,
although it is also possible to use a tab. It is crucial to maintain consistency in the
chosen indentation style across the entire codebase.
Whitespace
Whitespace, including spaces and tabs, is used for separation and clarity in Python
code. However, excessive or inconsistent use of whitespace can lead to syntax
errors.
Comments
Comments in Python are used to explain code and make it more understandable.
They are not executed and are preceded by the # symbol.
Documentation
def multiply(a, b):
    """
    Multiply two numbers.

    Parameters:
    - a (int): The first number.
    - b (int): The second number.

    Returns:
    int: The result of multiplying a and b.
    """
    return a * b
The docstring in this given example offers comprehensive details regarding the
function, which encompasses its objective, parameters, and the value it returns.
The significance of having appropriate documentation cannot be overstated, as it
plays a vital role in aiding fellow developers in comprehending the utilization of
your code, while also facilitating the seamless operation of tools such as automatic
documentation generators.
2.3 Data Types and Variables
Python is a language that is dynamically typed, which implies that there is no
requirement to explicitly declare the data type of a variable. Nevertheless,
comprehending data types is of utmost importance for proficient programming.
# Integer
age = 25
# Float
height = 5.8
# String
name = "John"
# Boolean
is_student = True
# NoneType
no_value = None
Lists
their formation, manipulation, and assorted operations through extensive
illustrations.
Creating Lists
– Empty list:
An empty list is created without any elements, useful for dynamic population.
empty_list = []
The list can be created from an iterable, such as a string, tuple, or another list, by
utilizing the constructor list().
word_list = list("Python")
# Output: ['P', 'y', 't', 'h', 'o', 'n']
Nested Lists
Lists can be nested, allowing the creation of multidimensional structures.
Accessing Elements
Retrieving particular values from Python lists involves the act of accessing
elements based on their respective index. The indexing of lists in Python follows a
zero-based approach, whereby the initial element possesses an index of 0, the
subsequent element possesses an index of 1, and so forth.
– Basic indexing:
Index notation is used to access elements within a list, wherein the indexing
begins from 0.
– Negative indexing:
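A short sketch covering both forms of indexing (list contents are illustrative):
fruits = ["apple", "banana", "orange"]
first_fruit = fruits[0]     # Basic indexing: "apple"
last_fruit = fruits[-1]     # Negative indexing counts from the end: "orange"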
List Slicing
List slicing enables the creation of a novel list through the extraction of a subset of
elements from a preexisting list. The slicing syntax, list[start:stop:step], entails
the utilization of the start index as the point of origin, the stop index as the point of
termination, and the step size as the interval between elements.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Basic slicing
subset1 = numbers[2:6] # Elements from index 2 to 5
print(subset1) # Output: [3, 4, 5, 6]
# Slicing with step
subset2 = numbers[1:8:2] # Elements from index 1 to 7, every 2nd element
print(subset2) # Output: [2, 4, 6, 8]
# Slicing with negative indices
subset3 = numbers[-5:-2] # Elements from index -5 to -3
print(subset3) # Output: [5, 6, 7]
Modifying Elements
Modifying elements in Python lists is a crucial aspect of working with mutable data
structures. Lists allow us to change the values of existing elements at specific
indices.
– Basic modification:
To alter an element within a list, one employs its index and designates a fresh
value to said index.
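A brief sketch of this kind of in-place modification (values are illustrative):
numbers = [1, 2, 3, 4, 5]
numbers[1] = 20
# Output: [1, 20, 3, 4, 5]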
The slicing used to modify multiple elements, assigning a new list to the specified
range of indices.
numbers = [1, 2, 3, 4, 5]
numbers[1:4] = [10, 20, 30]
# Output: [1, 10, 20, 30, 5]
Adding Elements
The extend() method appends elements from an iterable, such as another list, to
the list’s end. The += operator accomplishes the identical outcome.
fruits = ["apple", "banana", "orange"]
new_fruits = ["grape", "kiwi"]
fruits.extend(new_fruits)
# Output: ["apple", "banana", "orange", "grape", "kiwi"]
In this instance, the elements derived from the new_fruits list are appended to the
concluding segment of the fruits list.
The += operator can also add a single element to the end of the list when that element is wrapped in an iterable, such as a one-element list. In this particular context, the element denoted as “grape” is appended to the fruits list through the utilization of the += operator.
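A brief sketch of the += usage just described:
fruits = ["apple", "banana", "orange"]
fruits += ["grape"]   # += appends the elements of the iterable on the right
# Output: ["apple", "banana", "orange", "grape"]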
Removing Elements
Removing elements from Python lists is a fundamental operation for the purpose
of preserving and adjusting lists. Diverse methods can be employed based on the
particular need.
In this particular instance, the initial instance of the term “banana” is eliminated
from the enumeration of fruits.
pop() method. In the event that no index is specified, the method proceeds to
remove and return the last element.
Here, the element at index 1 (“banana”) is removed and assigned to the variable
removed_fruit.
The del statement can be used to remove elements by index or delete the entire
list.
In this example, the element at index 1 (“banana”) is removed using the del
statement.
The clear() method eliminates all elements from the list, resulting in an empty
state.
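The removal methods described above can be sketched together as follows (list contents are illustrative):
fruits = ["apple", "banana", "orange", "kiwi"]
fruits.remove("banana")         # Removes the first occurrence of "banana"
removed_fruit = fruits.pop(1)   # Removes and returns the element at index 1 ("orange")
del fruits[0]                   # Deletes the element at index 0 ("apple")
fruits.clear()                  # Removes all remaining elements
# Output: []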
Length of a List
The len() function enables the calculation of the quantity of elements present in a
given list.
num_elements = len(numbers)
# Output: 5
Nesting Lists
Creating lists within lists in the Python programming language entails the creation
of a list where the constituent elements are also lists. This particular technique
facilitates the generation of intricate data structures, including but not limited to
matrices or lists that contain other lists. It is important to note that each individual
element within the outer list has the potential to be a list in its own right.
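A small sketch of a nested list representing a matrix:
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
element = matrix[1][2]   # Row index 1, column index 2
# Output: 6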
Tuples
Creating Tuples
– Using parentheses:
– Without parentheses:
Tuples can also be created without explicit parentheses. The commas alone are
sufficient to define a tuple.
numbers_tuple = 1, 2, 3
Tuples have the capability to encompass elements originating from diverse data
types.
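For instance, a single tuple can mix strings, integers, floats, and Booleans (values chosen arbitrarily):
mixed_tuple = ("apple", 3, 4.5, True)
– Empty tuple:
An empty tuple is created with a pair of parentheses containing no elements.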
empty_tuple = ()
– Single-element tuple:
A tuple containing only one element necessitates the inclusion of a trailing comma
to differentiate it from a typical value enclosed in parentheses.
single_element_tuple = ("apple",)
Accessing Elements
– Basic indexing:
To access an element in a tuple, specify its index within square brackets.
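A short sketch (tuple contents are illustrative):
fruits_tuple = ("apple", "banana", "orange")
first_fruit = fruits_tuple[0]
# Output: "apple"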
– Negative indexing:
Negative indexing enables the retrieval of elements from the termination of the
tuple.
last_fruit = fruits_tuple[-1]
# Output: "orange"
Here, last_fruit is assigned the value of the last element using negative indexing.
– Slicing:
Tuple slicing allows you to create a new tuple by extracting a subset of elements.
numbers_tuple = (1, 2, 3, 4, 5)
subset = numbers_tuple[1:4]
# Output: (2, 3, 4)
– Omitting indices:
If the start or stop index is not provided when slicing a tuple, the default behavior
is to use the beginning or end of the tuple, respectively.
Immutable Nature
The characteristic that sets tuples in Python apart from other data structures, like
lists, is their immutable nature. Immutability implies that, once a tuple is created,
its elements cannot be altered, adjusted, appended, or erased.
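A brief sketch of what happens when mutation is attempted:
fruits_tuple = ("apple", "banana", "orange")
try:
    fruits_tuple[0] = "kiwi"    # Attempting to modify a tuple element
except TypeError as error:
    print(error)                # 'tuple' object does not support item assignment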
Tuple Unpacking
# Creating a tuple
coordinates = (3, 7)
# Tuple unpacking
x, y = coordinates
# Variables x and y now hold the values 3 and 7, respectively
In this illustration, the values of the coordinates tuple are decomposed into the
variables x and y. It is essential that the count of variables on the left side of the
assignment corresponds to the count of elements in the tuple.
– Unpacking in functions:
def get_coordinates():
return 5, 10
# Calling the function and unpacking the result
x, y = get_coordinates()
# x is now 5, y is now 10
Here, the function get_coordinates returns a tuple, and the values are unpacked
into variables x and y when calling the function.
– Extended unpacking:
# Creating a tuple
numbers = (1, 2, 3, 4, 5)
# Unpacking with extended unpacking
first, *rest, last = numbers
# first is 1, rest is [2, 3, 4], last is 5
In this example, the *rest syntax captures all elements between the first and last
elements in the tuple.
– Ignoring elements:
# Creating a tuple
point = (8, 3, 5)
# Unpacking and ignoring the middle value
x, _, z = point
# x is 8, z is 5
Here, the underscore _ is used to ignore the second element in the tuple.
Dictionaries
Creating a Dictionary
– Creating an empty dictionary:
Dictionaries have the capability to store values that belong to various data types.
Here, the “grades” key is associated with another dictionary containing subject
grades.
We may also construct a dictionary by utilizing the dict() constructor and providing
key-value pairs as arguments.
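A few sketches of the creation styles just described (the student data is illustrative):
# Creating an empty dictionary
empty_dict = {}

# A dictionary whose values have various data types, including a nested
# dictionary of subject grades under the "grades" key
student = {
    "name": "John Doe",
    "age": 20,
    "grades": {"Math": "A", "Physics": "B+"}
}

# Using the dict() constructor with key-value pairs as arguments
person = dict(name="Alice", age=30)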
– Basic accessing:
# Creating a dictionary of student information
student = {
"name": "John Doe",
"age": 20,
"grade": "A",
"courses": ["Math", "Physics", "English"]
}
# Accessing elements using keys
student_name = student["name"]
student_age = student["age"]
In this example, the values associated with the keys “name” and “age” are
accessed and assigned to variables.
The get() method enables the specification of a default value that will be returned
in the event that the key is not discovered.
If one attempts to retrieve a key that is not present in the dictionary, it will lead to
the occurrence of a KeyError. To prevent this from happening, one can utilize the
get() method, which will yield a value of None if the specified key is not found.
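A short sketch of get() with and without a default value (the student dictionary is as in the earlier examples):
student = {"name": "John Doe", "age": 20}
grade = student.get("grade")          # Returns None instead of raising KeyError
grade = student.get("grade", "N/A")   # Returns the default value "N/A"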
We are able to utilize the “in” keyword to verify the existence of a specific key
within the dictionary prior to accessing it.
if "courses" in student:
    student_courses = student["courses"]
else:
    student_courses = []
The methods keys(), values(), and items() provide the means to retrieve keys,
values, and key-value pairs, correspondingly.
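A combined sketch of these access methods, followed by an in-place update of the "age" value (student data as in the earlier examples):
student = {"name": "John Doe", "age": 20, "grade": "A"}
print(student.keys())     # dict_keys(['name', 'age', 'grade'])
print(student.values())   # dict_values(['John Doe', 20, 'A'])
print(student.items())    # dict_items([('name', 'John Doe'), ('age', 20), ('grade', 'A')])

# Modifying the value associated with an existing key
student["age"] = 21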
In this instance, the numerical value linked to the designated identifier “age” has
been altered from 20 to 21.
"age": 20,
"grade": "A",
"courses": ["Math", "Physics", "English"]
}
# Adding a new key-value pair for the "gender" information
student["gender"] = "Male"
Here, a novel key-value pair consisting of the key “gender” and the value “Male” is
appended to the dictionary.
The update() method is employed in this particular instance to alter the value
linked to the “age” key, introduce a fresh key-value pair for the attribute of
“gender,” and append yet another new key-value pair for the attribute of “city.”
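A sketch of the update() call just described (the new values are assumptions for illustration):
student = {"name": "John Doe", "age": 20, "grade": "A"}
student.update({
    "age": 21,           # alter the value linked to the existing "age" key
    "gender": "Male",    # add a new key-value pair
    "city": "New York"   # add another new key-value pair (value chosen arbitrarily)
})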
Deleting specific key-value pairs is a frequent task in Python when working with
dictionaries. This can be accomplished by employing various approaches.
The del statement provides the functionality to eliminate a particular key and its
corresponding value from a dictionary.
student = {
"name": "John Doe",
"age": 20,
"grade": "A",
"courses": ["Math", "Physics", "English"]
}
# Removing the "grade" key and its value
del student["grade"]
In this particular instance, the dictionary’s key “grade” and its corresponding value
are extracted from the dictionary by employing the del statement.
The “courses” key and its corresponding value are extracted from the dictionary,
and the value is subsequently assigned to the variable courses.
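A sketch of the pop() call described above (student data as in the earlier examples):
student = {
    "name": "John Doe",
    "age": 20,
    "courses": ["Math", "Physics", "English"]
}
# Removing the "courses" key and retrieving its value
courses = student.pop("courses")
# Output: ["Math", "Physics", "English"]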
The popitem() method removes and returns the last key-value pair in the dictionary. It is worth noting that, beginning with Python 3.7, dictionaries preserve the order in which items were inserted, so the last pair is the most recently added one.
"age": 20,
"grade": "A",
"courses": ["Math", "Physics", "English"]
}
# Removing and retrieving the last key-value pair
last_item = student.popitem()
In this instance, the final key-value pair within the dictionary is eliminated and
designated to the variable last_item.
The clear() function eliminates all pairs of keys and values from the dictionary,
resulting in an empty state.
Sets
Creating a Set
# Creating a set with explicit elements
fruits_set = {"apple", "banana", "orange"}
# Creating an empty set
empty_set = set()
# Creating a set
fruits_set = {"apple", "banana", "orange"}
# Adding a single element
fruits_set.add("kiwi")
In this particular instance, the inclusion of the element “kiwi” within the fruits_set
is achieved through the utilization of the add() method.
To incorporate multiple elements into a set, one can employ the update() function
by supplying an iterable object, such as a list or another set.
# Creating a set
fruits_set = {"apple", "banana", "orange"}
# Adding multiple elements
fruits_set.update(["grape", "pineapple"])
The fruits_set incorporates the elements “grape” and “pineapple” through the
utilization of the update() method.
The remove() function is utilized to eliminate a particular element from the set. In
the event that the element does not exist, it will raise a KeyError.
# Creating a set
fruits_set = {"apple", "banana", "orange"}
# Removing a specific element
fruits_set.remove("banana")
The discard() method eliminates a particular element from the set. In case the
element is absent, no action is taken and no error is raised.
# Creating a set
fruits_set = {"apple", "banana", "orange"}
# Discarding an element
fruits_set.discard("kiwi")
Here, the element “kiwi” is discarded from the fruits_set, but if “kiwi” were not
present, it would not raise an error.
# Creating a set
fruits_set = {"apple", "banana", "orange"}
# Pop an arbitrary element
popped_element = fruits_set.pop()
In this instance, an arbitrary element is removed from the fruits_set, and its value is
subsequently assigned to the variable popped_element.
The clear() method eradicates all elements from the set, resulting in an empty set.
# Creating a set
fruits_set = {"apple", "banana", "orange"}
# Clearing all elements from the set
fruits_set.clear()
Set Operations
Sets in Python support various operations that enable us to perform common set-
related tasks. Below are the fundamental set operations:
The amalgamation of two sets encompasses all distinct elements from both sets.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
union_set = set1 | set2
# or using method: union_set = set1.union(set2)
# Output: {1, 2, 3, 4, 5}
– Intersection (&) – common elements:
The intersection of two sets comprises elements that are shared by both sets.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
intersection_set = set1 & set2
# or using method: intersection_set = set1.intersection(set2)
# Output: {3}
– Difference (-) – elements in the first set but not in the second:
The dissimilarity between two sets comprises of elements that exist in the initial
set but do not belong to the subsequent set.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
difference_set = set1 - set2
# or using method: difference_set = set1.difference(set2)
# Output: {1, 2}
The set obtained by taking the symmetric difference of two sets is composed of
elements that belong to either of the sets, but not to both sets simultaneously.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
symmetric_difference_set = set1 ^ set2
# or using method: symmetric_difference_set = set1.symmetric_difference(set2)
# Output: {1, 2, 4, 5}
Conditional statements
Loops
– if statement:
Syntax
if condition:
# code to be executed if the condition is true
x = 10
if x > 5:
print("x is greater than 5")
The print statement will be executed solely if the condition x > 5 is verified, as
illustrated in this example.
– if-else statement:
The if-else statement grants the capability to execute a specific block of code when
a given condition is found to be true, and alternatively, to execute a distinct block
of code when the condition is found to be false.
if condition:
# code to be executed if the condition is true
else:
# code to be executed if the condition is false
x = 3
if x % 2 == 0:
print("x is even")
else:
print("x is odd")
In this instance, the program shall output whether x is classified as even or odd
contingent upon the condition.
– if-elif-else statement:
if condition1:
# code to be executed if condition1 is true
elif condition2:
# code to be executed if condition2 is true
else:
# code to be executed if none of the conditions are true
x = 0
if x > 0:
print("x is positive")
elif x < 0:
print("x is negative")
else:
print("x is zero")
2.4.2 Loops
automating redundant tasks, iterating through data collections, and performing
operations until a specific condition is satisfied. Python offers two main types of
looping statements: for loops and while loops.
– for loop:
The for loop is employed to iterate through a sequence, be it a list, tuple, string, or
range, or any other iterable objects. This loop carries out a set of instructions for
every element present in the sequence.
Syntax
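The general form, followed by a minimal example (the fruits list is illustrative):
for variable in sequence:
    # code to be executed for each element in the sequence

fruits = ["apple", "banana", "orange"]
for fruit in fruits:
    print(fruit)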
In this example, the for loop iterates over the list of fruits, and the code inside the
loop prints each fruit.
– while loop:
The while loop persists in executing a block of code for as long as a specified
condition remains true. It iterates the execution until the condition becomes false.
Syntax
while condition:
# code to be executed as long as the condition is true
count = 0
while count < 5:
print(count)
count += 1
The while loop, in this particular case, outputs the count value and increases it by 1
during each iteration, provided that the count is below 5.
2.5 Functions and Modules
Functions and modules are essential principles in Python that endorse the
organization, legibility, and reusability of code. Functions encapsulate the logic of
code, while modules and packages furnish a structured approach to arranging and
aggregating correlated functionality.
Functions in Python enable us to encapsulate and reuse code, enhancing the
legibility and maintainability of our programs.
Modules and packages aid in the organization of code into manageable units,
thereby facilitating code reuse and maintenance.
Syntax
def function_name(parameters):
# code to be executed
return result # optional
The add_numbers function in this illustrative case accepts two parameters, namely
a and b, performs the summation operation on them, and subsequently provides
the outcome as the returned value.
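A sketch of the add_numbers function described above:
def add_numbers(a, b):
    result = a + b
    return result

# Calling the function
total = add_numbers(5, 3)
print(total)  # Output: 8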
def greet(name):
greeting_message = f"Hello, {name}!"
return greeting_message
# Calling the function
message = greet("Alice")
print(message) # Output: Hello, Alice!
Here, the greet function takes a name parameter and returns a personalized
greeting message.
The implementation of lambda functions proves to be especially advantageous
when dealing with temporary operations that can be conveyed as arguments to
higher-order functions.
Lambda functions are especially useful when a small, one-off function is
needed and defining a full function using def seems too verbose. They are
commonly used in situations where functions are treated as first-class citizens,
such as when passing functions as arguments to higher-order functions.
Syntax
add_numbers = lambda x, y: x + y
result = add_numbers(5, 3)
print(result) # Output: 8
square = lambda x: x ** 2
result = square(4)
print(result) # Output: 16
Example 3: Checking if a number is even
is_even = lambda x: x % 2 == 0
result = is_even(7)
print(result) # Output: False
Modules
# my_module.py
def add(a, b):
return a + b
def subtract(a, b):
return a - b
# main_program.py
import my_module
result = my_module.add(5, 3)
print(result) # Output: 8
Packages
Package Structure
my_package/
|-- __init__.py
|-- module1.py
|-- module2.py
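For the import example that follows to work, module1.py could define a square function, for instance (its exact contents are an assumption):
# my_package/module1.py
def square(x):
    return x ** 2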
# main_program.py
from my_package import module1
result = module1.square(4)
print(result) # Output: 16
external data sources. It offers a means of interacting with data stored on disk,
enabling the integration of Python programs with external data and facilitating
efficient data management.
File I/O in Python pertains to the procedures encompassing the act of both
retrieving data from and storing data into files. Python furnishes an assortment of
pre-existing functions and methods for file I/O, affording you the capability to
engage with files on your operating system. The fundamental constituents of file
I/O encompass the act of initiating file access, acquiring data from files, inscribing
data into files, and ultimately terminating file access.
Opening a File
The Python open() function is an inherent function used to initiate the process of
opening files. This function is an essential component of file management within
the Python programming language, facilitating a multitude of file-related tasks
such as reading, writing, and appending. As a result of executing the open()
function, a file object is returned, which in turn grants access to a range of
methods for the manipulation of files.
Syntax
Modes
file = open("example.txt", "r")
The with statement is employed in conjunction with file operations for the purpose
of guaranteeing proper closure of the file following the completion of said
operations.
Reading from a file in Python involves opening the file in a specific mode, reading
its content, and then closing the file. Python provides various methods for reading
different amounts of data from a file.
Use the open() function to initiate the process of accessing a file in read mode (‘r’).
Alternatively, it is possible to explicitly specify ‘rt’ in order to denote text mode.
The read() function retrieves the complete content of the file and presents it as a
unified string.
The readline() function is responsible for reading a solitary line from the document
whenever it is invoked. This feature proves to be advantageous while handling
voluminous files.
The method called readlines() is responsible for reading and retrieving all lines
from the file, after which it will present them in the form of a list.
We can additionally iterate directly over the file object, whereby lines are
automatically read.
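A combined sketch of these reading approaches, assuming a text file named example.txt exists in the working directory:
# Read the entire file as a single string
with open("example.txt", "r") as file:
    content = file.read()

# Read one line at a time
with open("example.txt", "r") as file:
    first_line = file.readline()
    second_line = file.readline()

# Read all lines into a list of strings
with open("example.txt", "r") as file:
    lines = file.readlines()

# Iterate directly over the file object, line by line
with open("example.txt", "r") as file:
    for line in file:
        print(line.strip())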
Writing to a File
To perform file writing operations in Python, it is necessary to first open the file
using a designated mode, subsequently write the desired data into it, and finally
close the file. In the Python programming language, there exist multiple methods
that facilitate the process of writing data to a file.
The open() function is utilized to initiate the opening of a file in the write mode
(‘w’). In the event that the file has already been created, opening it in write mode
will result in the removal of its current contents. Conversely, if the file does not
exist, a new file will be generated.
with open("output.txt", "w") as file:
# File operations go here
The write() function is employed for the purpose of writing information to the file.
It has the capability to write various types of data, such as strings, numbers, or any
other data format that can be transformed into a string.
– Appending to a file:
To append new content to an existing file without replacing its current content, it is
possible to open the file in append mode (‘a’). Subsequently, the write() method
will append the new content to the conclusion of the file.
Multiple lines can be written to a file either through the repeated utilization of the
write() method or by supplying a sequence of strings to the writelines() method.
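A combined sketch of the writing operations just described (output.txt is the file name used earlier in this section):
# Write mode: creates the file or overwrites its existing contents
with open("output.txt", "w") as file:
    file.write("First line\n")

# Append mode: adds new content to the end of the file
with open("output.txt", "a") as file:
    file.write("Appended line\n")

# Writing multiple lines with writelines()
lines = ["Line A\n", "Line B\n", "Line C\n"]
with open("output.txt", "a") as file:
    file.writelines(lines)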
Importing libraries in Python is an indispensable component of programming, as it
enables us to exploit pre-existing code and functionalities created by other
individuals. The process of importing can be accomplished through various
methods, and the following instances provide illustrations of distinct approaches.
This allows us to avail ourselves of an extensive range of functionalities, thereby
enhancing the efficiency and manageability of our code.
The import keyword is used in this instance to import the entire math library.
Subsequently, the square root of 25 is calculated using the math.sqrt() function.
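A sketch matching that description:
# Example: Importing the entire math library
import math
result = math.sqrt(25)
print(result)  # Output: 5.0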
# Example: Importing only the sqrt function from the math library
from math import sqrt
result = sqrt(25)
print(result) # Output: 5.0
In this case, only the sqrt function from the math library is imported. This
approach eliminates the need to prefix the function with the library name when
used.
Here, the pandas library is imported with the alias pd. This is a common
convention to simplify code and make it more readable. The pd alias is then used
to call functions from the pandas library.
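A sketch of the aliased import just described (the example DataFrame contents are arbitrary):
# Example: Importing the pandas library with an alias
import pandas as pd
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})
print(df.head())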
# Example: Importing all modules from the datetime library
from datetime import *
current_date = date.today()
print(current_date) # Output: 2023-12-12
Importing all names using the * wildcard allows us to use all public functions and classes without explicitly mentioning them. However, this approach is generally discouraged to avoid name clashes.
The Python libraries form the backbone of various Python applications, providing
solutions across different domains, including data science, machine learning, web
development, and more. Depending on our project requirements, we can choose
the libraries that best suit our needs.
NumPy
Main Features
Use Cases
Pandas
Main Features
DataFrame: A two-dimensional table for data manipulation.
Series: A one-dimensional labeled array.
Data cleaning, merging, and reshaping tools.
Use Cases
Matplotlib
Main Features
Use Cases
Requests
Main Features
Use Cases
Scikit-learn
Main Features
Use Cases
Main Features
Use Cases
Description: Django and Flask are web frameworks for building web
applications.
Main Features
Flask: Micro-framework for lightweight and flexible applications.
Use Cases
Beautiful Soup
Description: Beautiful Soup is a web scraping library for pulling data out of
HTML and XML files.
Main Features
Use Cases
SQLAlchemy
Main Features
Use Cases
OpenCV
Main Features
Use Cases
Classes
class ClassName:
def __init__(self, parameter1, parameter2, ...):
# Constructor or initializer method
# Set up instance attributes
The class keyword is employed for the purpose of declaring a class. In the
provided instance, the name of the class is denoted by ClassName.
Constructor (__init__) method:
The __init__ method, which is referred to as the “constructor,” is a
distinctive method that is automatically executed upon the creation of
an object from the class.
The initialization of the object’s attributes is performed. The self
parameter is employed to refer to the instance of the class and to make
references to instance attributes.
Other methods:
Additional methods within the class are defined like regular functions,
taking self as the first parameter.
These methods can perform various actions and access instance
attributes.
Example
class Car:
def __init__(self, make, model, year):
self.make = make
self.model = model
self.year = year
def display_info(self):
return f"{self.year} {self.make} {self.model}"
The given example demonstrates the implementation of a class called Car. This
class possesses the attributes of make, model, and year, as well as a method
named display_info. The initialization method, __init__, is responsible for assigning
values to the attributes of the Car object.
Objects
– Creating an object:
To instantiate an instance of the class, one invokes the class as though it were a
function. By doing so, the constructor (__init__) method is invoked, passing the
specified initial values.
Here, car1 and car2 are instances of the Car class. The __init__ method is
automatically called when creating these objects, initializing their attributes.
We can access attributes using the dot (.) notation. In this case, car1.make returns
the value of the make attribute. Similarly, car2.display_info() calls the display_info
method.
Example 1
class Car:
def __init__(self, make, model, year):
self.make = make
self.model = model
self.year = year
def display_info(self):
return f"{self.year} {self.make} {self.model}"
# Creating objects of the Car class
car1 = Car("Toyota", "Camry", 2022)
car2 = Car("Honda", "Civic", 2021)
# Accessing attributes and calling methods
print(car1.make) # Output: Toyota
print(car2.display_info()) # Output: 2021 Honda Civic
Example 2
class Dog:
def __init__(self, name, age):
self.name = name
self.age = age
def bark(self):
return "Woof!"
# Creating an object of the Dog class
my_dog = Dog("Buddy", 3)
# Accessing attributes and calling methods
print(my_dog.name) # Output: Buddy
print(my_dog.bark()) # Output: Woof!
In this particular instance, the class is denoted as “Dog,” while an object named
“my_dog” is created from the aforementioned class. The initialization method,
“__init__,” serves to establish the attributes (name and age), while the bark method
acts as a function connected to the Dog class, capable of being invoked on
instances of the class. Subsequently, the object my_dog possesses the ability to
access its attributes and invoke methods that have been defined within the class.
Inheritance in Python
Inheritance allows a class (the child or derived class) to acquire the attributes and methods of another class (the parent or base class). This enables the reuse of code and the establishment of a hierarchical structure encompassing various classes.
# Base class
class Animal:
def __init__(self, name):
self.name = name
def speak(self):
return "Some generic sound"
# Derived class (inherits from Animal)
class Dog(Animal):
def speak(self):
return "Woof!"
# Creating objects
animal = Animal("Generic Animal")
dog = Dog("Buddy")
# Calling the speak method
print(animal.speak()) # Output: Some generic sound
print(dog.speak()) # Output: Woof!
In this particular instance, the base class Animal possesses a method known as
speak. The Dog class, derived from Animal, supersedes the speak method. Both
classes, namely Animal and Dog, can be utilized, and the speak method exhibits
distinct behavior contingent upon the class.
Types of Inheritance
Python supports several types of inheritance: single, multiple, multilevel, hierarchical, and hybrid. In hybrid inheritance, a combination of these forms is employed; for example, a class may inherit from several classes using multiple inheritance and from another class using single inheritance.
# Base class
class Animal:
def __init__(self, name):
self.name = name
def speak(self):
return "Some generic sound"
# Derived class (inherits from Animal)
class Dog(Animal):
def bark(self):
return "Woof!"
# Creating objects
animal = Animal("Generic Animal")
dog = Dog("Buddy")
# Accessing methods from the base and derived classes
print(animal.speak()) # Output: Some generic sound
print(dog.speak()) # Output: Some generic sound
print(dog.bark()) # Output: Woof!
In this particular instance, the Animal class serves as the fundamental class, while
the Dog class functions as the subclass. The Dog class acquires the speak method
from its parent class Animal.
# Base classes
class Engine:
def start(self):
return "Engine started"
class Electric:
def charge(self):
return "Charging electric power"
# Derived class (inherits from both Engine and Electric)
class HybridCar(Engine, Electric):
def drive(self):
return "Driving in hybrid mode"
# Creating an object
hybrid_car = HybridCar()
# Accessing methods from the base classes
print(hybrid_car.start()) # Output: Engine started
print(hybrid_car.charge()) # Output: Charging electric power
print(hybrid_car.drive()) # Output: Driving in hybrid mode
# Base class
class Animal:
def speak(self):
return "Some generic sound"
# Intermediate class (inherits from Animal)
class Dog(Animal):
def bark(self):
return "Woof!"
# Derived class (inherits from Dog)
class Poodle(Dog):
def dance(self):
return "Poodle is dancing"
# Creating an object
poodle = Poodle()
# Accessing methods from all levels of inheritance
print(poodle.speak()) # Output: Some generic sound
print(poodle.bark()) # Output: Woof!
print(poodle.dance()) # Output: Poodle is dancing
Polymorphism in Python
class Shape:
def area(self):
return "Some generic area calculation"
class Square(Shape):
def __init__(self, side):
self.side = side
def area(self):
return self.side ** 2
class Circle(Shape):
def __init__(self, radius):
self.radius = radius
def area(self):
return 3.14 * self.radius ** 2
# Using polymorphism
shapes = [Square(5), Circle(3)]
for shape in shapes:
print(f"Area: {shape.area()}")
# Output
# Area: 25
# Area: 28.26
In this particular instance, the class Shape serves as the fundamental class housing
a method for computing area. The classes Square and Circle are subclasses of
Shape and possess an overridden version of the area method. The list shapes
encompasses objects from both classes, employing the identical method (area) to
determine the area, thereby exemplifying the concept of polymorphism.
Summary
Python syntax places a strong emphasis on readability, employing
indentation to denote block structure.
The correct use of indentation is of utmost importance in Python, as it serves
as a crucial indicator of the underlying block structure.
Comments, which serve as a form of documentation, are denoted by a #
symbol. Python offers support for a variety of data types, encompassing
integers, floats, strings, and booleans.
Variables are utilized to store data and manipulate it as needed. Lists are
mutable, meaning they can be modified, while tuples are immutable,
meaning they cannot be changed.
Dictionaries employ key-value pairs, while sets consist solely of unique
elements.
Python is an object-oriented language, with classes and objects serving as its
core concepts.
A class functions as a blueprint for creating objects, while objects are
instances of classes.
Inheritance allows a class to inherit attributes and methods from another
class.
Polymorphism enables objects of different types to be treated as if they were
objects of a common type.
Exercise (MCQs)
1.
C. –
D.
/✶ ✶/
3.
C. To create loops
D. To define functions
4.
C. String
D. Dictionary
5.
C. String
D. Dictionary
6.
C.
Unique elements
D. Key-value pairs
7.
D. A loop structure
9.
C. Variables
D. Methods
10.
C. Method overloading
D. Variable overloading
11.
Which type of inheritance involves a class inheriting from more than one base
class?
A. Single inheritance
B. Multiple inheritance
C. Multilevel inheritance
D.
Hierarchical inheritance
Answers
1. C
2. B
3. A
4. C
5. A
6. C
7. B
8. C
9. A
10. B
11. B
10. Inheritance in Python allows a class to inherit attributes and methods from
__________ class(es).
11. __________ inheritance involves a class inheriting from more than one base
class.
Answers
1. indentation
2. #
3. booleans
4. primitive
5. mutable, immutable
6. key-value, unique
7. index
8. blueprint
9. objects
10. another
11. multiple
Descriptive Questions
1. Explain the significance of proper indentation in Python syntax and how it
influences the structure of the code.
2. Provide an overview of primitive data types in Python and explain the
purpose of variables in the context of Python programming.
3. Discuss the differences between lists and tuples in Python, emphasizing their
mutability or immutability.
4. Explain the key characteristics of dictionaries and sets in Python, highlighting
their use cases and unique properties.
5. Define the concepts of classes and objects in Python’s object-oriented
programming paradigm, providing examples for better understanding.
6. Elaborate on the concept of inheritance in Python, discussing how it enables
code reuse. Provide an example to illustrate polymorphism.
7. Describe the various types of inheritance in Python, including single,
multiple, multilevel, hierarchical, and hybrid inheritance.
8. Walk through the polymorphism example involving Shape, Square, and Circle
classes. Explain how the common interface (area method) is utilized.
9. Explore the control statements in Python, including if, for, and while
statements. Provide examples to demonstrate their usage.
10. Discuss the role of functions in Python and explain the process of defining
functions. Additionally, explain the concept of modules and how they
enhance code organization.
11. Provide an overview of working with files in Python, explaining the open()
function and file input/output operations.
12. Explain the importance of libraries in Python programming and discuss the
process of importing libraries. Provide examples of popular Python libraries.
13. Delve deeper into object-oriented programming concepts, specifically
focusing on defining classes, creating objects, and encapsulating attributes
and methods.
14. Elaborate on file input/output operations in Python, including reading from
and writing to files. Explain the with statement for handling files.
15. Provide a detailed explanation of conditional statements in Python, including
the syntax and usage of if, elif, and else statements.
16. Write a Python program that takes a list of numbers, squares each element,
and prints the resulting list.
17. Create a tuple containing the names of your favorite fruits. Write a program
that asks the user for input and checks if the fruit is in the tuple.
18. Design a program that prompts the user to enter information about a book
(title, author, and publication year) and stores it in a dictionary. Print the
dictionary at the end.
19. Implement a Python program with classes representing geometric shapes
(e.g., Circle, Square) inheriting from a common base class. Use
polymorphism to calculate and print the area of each shape.
20. Create a text file with some content. Write a Python program that reads the
content of the file, counts the number of words, and prints the result.
21. Write a function that takes a list of numbers as input and returns the sum of
all the even numbers in the list.
22. Create a Python module that defines a function to calculate the area of a
rectangle. In another program, import the module and use the function to
calculate the area of a rectangle.
23. Write a program that takes two sets as input from the user and prints the
union, intersection, and difference of the sets.
24. Design a simple movie database using classes. Create classes for movies,
directors, and actors, allowing users to add and retrieve information about
movies, directors, and actors.
25. Write a program that asks the user for their age. Based on the age, print
different messages such as “You’re a child,” “You’re a teenager,” or “You’re
an adult.”
Chapter 3 Data Preprocessing in Python
– Introduction to NumPy
1. Arrays:
1. Mathematical operations:
NumPy provides universal functions (ufuncs) that operate element-wise on arrays. These ufuncs cover a broad spectrum of mathematical operations, including fundamental arithmetic, trigonometry, logarithms, and various others.
Linear algebra: NumPy incorporates an extensive collection of functions
dedicated to operations in the field of linear algebra, encompassing tasks
such as the multiplication of matrices, decomposition of eigenvalues, as well
as the resolution of linear systems of equations.
1. Performance:
1. Applications:
illustrative instances, and optimal methodologies.
Array Creation
Using np.array()
import numpy as np
# Creating a 1D array from a Python list
arr1d = np.array([1, 2, 3, 4, 5])
print("1D Array:")
print(arr1d)
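The 2D example described next is missing from the text; a minimal sketch of it would be:
# Creating a 2D array from a nested Python list
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:")
print(arr2d)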
A 2D array (arr2d) is generated from a nested Python list in this instance. The outer
list symbolizes rows, while the inner lists symbolize the elements of each row.
Consequently, the resulting structure is a 2D array.
# Specifying the data type of the array
arr_float = np.array([1, 2, 3], dtype=float)
print("\nArray with Specified Data Type (float):")
print(arr_float)
We have the option to explicitly indicate the data type of the array by utilizing the
dtype parameter. In the given instance, we generate a one-dimensional array
consisting of integers and specifically designate it as a float.
Using np.zeros()
import numpy as np
# Creating a 1D array filled with zeros
zeros_1d = np.zeros(5)
print("1D Array of Zeros:")
print(zeros_1d)
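The 2D example described next is not shown; a minimal sketch of it would be:
# Creating a 2D array of zeros with 3 rows and 4 columns
zeros_2d = np.zeros((3, 4))
print("\n2D Array of Zeros:")
print(zeros_2d)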
Here, a 2D array (zeros_2d) with three rows and four columns, all initialized to zero,
is created using np.zeros().
Using np.ones()
The np.ones() function is similar to np.zeros(), but it creates an array filled with
ones.
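The snippet that the following sentence describes is missing; a minimal sketch of it would be:
import numpy as np
# Creating a 1D array of ones with six elements
ones_1d = np.ones(6)
print("1D Array of Ones:")
print(ones_1d)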
This example demonstrates the creation of a 1D array (ones_1d) with six elements,
all initialized to one, using np.ones().
# Creating a 1D array filled with zeros with a specified data type (float)
zeros_float = np.zeros(4, dtype=float)
print("\n1D Array of Zeros with Specified Data Type (float):")
print(zeros_float)
In this example, a 1D array (zeros_float) with a specified data type (float) is created
using np.zeros().
# Creating a 2D array filled with ones with a specified data type (int)
ones_int = np.ones((3, 2), dtype=int)
print("\n2D Array of Ones with Specified Data Type (int):")
print(ones_int)
Here, a 2D array (ones_int) with a specified data type (int) is created using
np.ones().
Using np.arange()
The np.arange() function is utilized to generate an array that contains values that
are evenly spaced within a specified range. It bears resemblance to the pre-
existing range() function in Python; however, it yields a NumPy array.
import numpy as np
# Creating a 1D array with values from 0 to 9
arr1d = np.arange(10)
print("1D Array:")
print(arr1d)
print(arr_custom)
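The code that the next paragraph describes is not shown; a minimal sketch of it would be:
# Creating a 1D array with values from 0 to 4 and an explicit float data type
arr_float_dtype = np.arange(5, dtype=float)
print("\n1D Array with Float Data Type:")
print(arr_float_dtype)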
The dtype parameter enables us to specify the data type of the resulting array. In
the present case, an array of dimension 1 (arr_float_dtype) is generated with values
ranging from 0 to 4, while a particular data type (float) is assigned to it.
import numpy as np
# Creating a 1D array
arr1d = np.arange(6)
print("Original 1D Array:")
print(arr1d)
# Reshaping to a 2D array with 2 rows and 3 columns
arr2d = arr1d.reshape(2, 3)
print("\nReshaped 2D Array:")
print(arr2d)
# Creating a 2D array
arr2d_original = np.array([[1, 2, 3], [4, 5, 6]])
print("\nOriginal 2D Array:")
print(arr2d_original)
# Reshaping to a flattened 1D array
arr_flattened = arr2d_original.reshape(-1)
print("\nFlattened 1D Array:")
print(arr_flattened)
Here, we commence by utilizing a 2D array, denoted as arr2d_original, and proceed
to employ the reshape() method with the objective of generating a 1D array that
has been flattened, referred to as arr_flattened. Passing -1 as an argument to the reshape() method lets NumPy infer the size of the remaining dimension automatically.
Array Manipulation
NumPy arrays facilitate the execution of robust indexing and slicing operations.
import numpy as np
# Creating a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Original 2D Array:")
print(arr2d)
# Accessing elements using indexing
print("\nElement at row 1, column 2:", arr2d[1, 2])
# Slicing 1D array
print("Sliced row 2:", arr2d[1, :])
# Slicing 2D array
print("Sliced subarray:")
print(arr2d[:2, 1:])
Array Concatenation
arr2d_1 = np.array([[1, 2], [3, 4]])
arr2d_2 = np.array([[5, 6]])
# Concatenating 2D arrays along rows (axis=0)
concatenated_2d_arr = np.concatenate([arr2d_1, arr2d_2], axis=0)
print("\nConcatenated 2D Array along Rows:")
print(concatenated_2d_arr)
Transposition
# Creating a 2D array
arr2d_original = np.array([[1, 2, 3], [4, 5, 6]])
print("Original 2D Array:")
print(arr2d_original)
# Transposing the 2D array
arr2d_transposed = arr2d_original.T
print("\nTransposed 2D Array:")
print(arr2d_transposed)
Reshaping
# Creating a 1D array
arr1d = np.arange(6)
print("Original 1D Array:")
print(arr1d)
# Reshaping to a 2D array with 2 rows and 3 columns
arr2d_reshaped = arr1d.reshape(2, 3)
print("\nReshaped 2D Array:")
print(arr2d_reshaped)
Splitting Arrays
Dividing an array into multiple subarrays along a specified axis can be achieved
through the process of array partitioning.
# Creating a 1D array
arr1d_to_split = np.array([1, 2, 3, 4, 5, 6])
# Splitting the 1D array into three parts
split_arr = np.split(arr1d_to_split, [2, 4])
print("Split Arrays:")
print(split_arr)
Adding/Removing Elements
# Creating a 1D array
arr1d_original = np.array([1, 2, 3, 4, 5])
# Appending an element
arr1d_appended = np.append(arr1d_original, 6)
print("Appended 1D Array:")
print(arr1d_appended)
# Removing an element by index
arr1d_removed = np.delete(arr1d_original, 2)
print("\nArray after removing element at index 2:")
print(arr1d_removed)
Element-wise operations
import numpy as np
# Creating two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Addition
result_addition = arr1 + arr2
print("Addition:", result_addition)
# Subtraction
result_subtraction = arr1 - arr2
print("Subtraction:", result_subtraction)
# Multiplication
result_multiplication = arr1 * arr2
print("Multiplication:", result_multiplication)
# Division
result_division = arr1 / arr2
print("Division:", result_division)
Mathematical Functions
# Creating an array
arr = np.array([1, 2, 3, 4, 5])
# Square root
result_sqrt = np.sqrt(arr)
print("Square Root:", result_sqrt)
# Exponential
result_exp = np.exp(arr)
print("Exponential:", result_exp)
# Logarithm (natural logarithm)
result_log = np.log(arr)
print("Natural Logarithm:", result_log)
# Trigonometric functions
result_sin = np.sin(arr)
print("Sine:", result_sin)
result_cos = np.cos(arr)
print("Cosine:", result_cos)
# Summation
sum_result = np.sum(arr)
print("Sum:", sum_result)
# Mean
mean_result = np.mean(arr)
print("Mean:", mean_result)
# Minimum and Maximum
min_value = np.min(arr)
max_value = np.max(arr)
print("Minimum:", min_value)
print("Maximum:", max_value)
Broadcasting
import numpy as np
# Creating a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
# Broadcasting a scalar to the entire array
result_broadcasting = arr2d + 10
print("Broadcasting Result:")
print(result_broadcasting)
The scalar 10 is broadcast to every element of the 2D array arr2d in this particular instance.
Vectorization
result_squared = arr**2
print("Vectorized Square Operation:")
print(result_squared)
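The square-root snippet that the following sentence refers to is missing; a minimal sketch of it would be:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Applying the np.sqrt() universal function element-wise
result_sqrt = np.sqrt(arr)
print("Vectorized Square Root:", result_sqrt)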
Here, the np.sqrt() function is a universal function that applies the square root
operation element-wise to the array.
Array Broadcasting
3.1.2 Scientific Computations with SciPy
Key Features
Running pip install scipy will install the latest version of SciPy and its dependencies.
After installation, you can verify it by importing SciPy in a Python script or the
Python interactive environment:
import scipy
# Check the version
print("SciPy Version:", scipy.__version__)
If there are no errors, and the version is printed, SciPy is successfully installed.
Dependencies
import numpy as np
from scipy import linalg # Importing SciPy's linear algebra module
# Creating a 1D array
arr_1d = np.array([1, 2, 3])
# Creating a 2D matrix
matrix_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("1D Array:")
print(arr_1d)
print("\n2D Matrix:")
print(matrix_2d)
# Coefficient matrix
coeff_matrix = np.array([[2, 1], [3, -1]])
# Right-hand side
rhs_vector = np.array([8, 1])
# Solving the linear system
solution = linalg.solve(coeff_matrix, rhs_vector)
print("Solution to the Linear System:")
print(solution)
Consider the matrix A = [[4, -2], [1, 1]]:
# Matrix A
matrix_A = np.array([[4, -2], [1, 1]])
# Computing eigenvalues and eigenvectors
eigenvalues, eigenvectors = linalg.eig(matrix_A)
print("Eigenvalues:")
print(eigenvalues)
print("\nEigenvectors:")
print(eigenvectors)
Consider the matrix B = [[1, 2], [2, 3]]:
# Matrix B
matrix_B = np.array([[1, 2], [2, 3]])
# Computing Singular Value Decomposition
U, S, Vt = linalg.svd(matrix_B)
print("U matrix:")
print(U)
print("\nS matrix (singular values):")
print(S)
print("\nVt matrix (transpose of V):")
print(Vt)
Numerical Integration
SciPy provides powerful numerical integration methods through
scipy.integrate. The quad() function is commonly used.
Consider integrating f(x) = x^2 over the interval [0, 1]:
from scipy import integrate
# Function to integrate: f(x) = x**2
def func(x):
    return x ** 2
# Numerical integration
result, error = integrate.quad(func, 0, 1)
print("Result of Numerical Integration:")
print(result)
Numerical Differentiation
scipy.misc.derivative() computes the derivative of a function at a given point
using numerical methods.
Consider differentiating g(x) = e^x at x = 2:
from scipy.misc import derivative
import numpy as np
def func_g(x):
    return np.exp(x)
# Numerical differentiation
derivative_at_2 = derivative(func_g, 2.0, dx=1e-6)
print("Result of Numerical Differentiation at x=2:")
print(derivative_at_2)
DataFrame
Features
import pandas as pd
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)
Output
→Fig. 3.1 shows the output generated by the code snippet above, which creates a pandas DataFrame. The figure shows the structure and contents of the DataFrame, including its rows, columns, and the data stored in each cell.
Series
A Series refers to a singularly structured and labeled array in the pandas library. It
is essentially a solitary column extracted from a DataFrame, but it is also capable of
existing independently. The Series data structure has the capability to store
various data types, encompassing integers, floating-point numbers, strings, and
other forms of data.
Features
# Creating a Series from a list
ages = pd.Series([25, 30, 35], name='Age')
print(ages)
Output
→Fig. 3.2 shows the output of the code that creates the pandas Series object. The figure shows the layout and elements of the Series, a one-dimensional labeled structure in the pandas library. The output includes the index labels associated with each value in the Series, giving a clear picture of how data is organized and stored in this data structure.
From a Dictionary
The column names in a dictionary are derived from its keys, and the data is
represented by the corresponding values.
import pandas as pd
# Creating a DataFrame from a dictionary
data_dict = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'San Francisco', 'Los Angeles']}
df_from_dict = pd.DataFrame(data_dict)
print("DataFrame from Dictionary:")
print(df_from_dict)
Output:
From a List of Lists
Each individual list that is nested within the main list represents a single row, while the main list itself encompasses all of the rows.
data_list = [['Alice', 25, 'New York'],
             ['Bob', 30, 'San Francisco'],
             ['Charlie', 35, 'Los Angeles']]
df_from_list = pd.DataFrame(data_list, columns=['Name', 'Age', 'City'])
print("\nDataFrame from List of Lists:")
print(df_from_list)
Output:
import numpy as np
# Creating a DataFrame from a NumPy array
data_array = np.array([['Alice', 25, 'New York'],
['Bob', 30, 'San Francisco'],
['Charlie', 35, 'Los Angeles']])
df_from_array = pd.DataFrame(data_array, columns=['Name', 'Age', 'City'])
print("\nDataFrame from NumPy Array:")
print(df_from_array)
Output:
Fig. 3.5: Create a DataFrame from NumPy array.
Output:
Fig. 3.6: Create a Series from list.
Output:
Fig. 3.7: Create a Series from NumPy array.
Data cleansing with pandas involves the preparation and refinement of datasets to
ensure their suitability for analysis. Essential elements of data cleansing
encompass the management of missing data as well as the identification and
elimination of duplicate records. Pandas offers methods such as dropna() and
fillna() to address missing values, while the drop_duplicates() function is employed
to handle duplicated rows. These operations play a pivotal role in preserving data
integrity and guaranteeing accurate analyses.
Furthermore, pandas grants the ability to explore and comprehend data
through functions such as info() and describe(), which assist in the identification of
outliers or anomalies. The versatility of pandas facilitates the manipulation of data,
including the alteration of data types or the conversion of categorical variables. By
utilizing these tools, data cleansing with pandas ensures that datasets remain
consistent, comprehensive, and primed for meaningful analysis and insights.
Dealing with missing data is an essential stage in the process of analyzing data,
and pandas offers powerful tools to accomplish this objective. Two popular
approaches entail the identification and management of missing values:
Identifying missing values:
Pandas facilitates the identification of missing values through the utilization of
functions such as isnull() or info(). A notable illustration of this is the function
df.isnull().sum(), which provides the tally of missing values in every column.
import pandas as pd
import numpy as np
# Creating a DataFrame with missing values
data = {'Name': ['Alice', 'Bob', np.nan, 'Charlie'],
'Age': [25, np.nan, 35, 40],
'City': ['New York', 'San Francisco', 'Los Angeles', np.nan]}
df = pd.DataFrame(data)
# Identifying missing values using isnull() and sum()
missing_values_count = df.isnull().sum()
print("DataFrame with Missing Values:")
print(df)
print("\nMissing Values Count:")
print(missing_values_count)
Output:
Fig. 3.8: Missing values count in DataFrame.
# Filling missing values with fillna() (the fill value here is assumed)
df_cleaned_fillna = df.fillna(0)
print("\nDataFrame after Fillna:")
print(df_cleaned_fillna)
Output:
Duplicate records can distort summary statistics and lead to skewed analyses. The identification of duplicate rows can be accomplished by
utilizing the duplicated() method, allowing for the exploration of redundant
records based on either column values or the entire row. The drop_duplicates()
function facilitates the elimination of duplicate rows, resulting in a DataFrame that
exclusively contains unique records. The strategic management of duplicates is
imperative in order to preserve the accuracy of analyses and to prevent biases that
may arise from repeated data points. These operations possess particular value
when working with extensive datasets or when integrating data from multiple
sources, as they ensure that the resulting DataFrame accurately represents the
underlying information.
Identifying duplicates:
Identifying and handling duplicate rows is crucial to maintaining data integrity.
Pandas provides methods to identify duplicate rows based on column values or the
entire row.
import pandas as pd
# Creating a DataFrame with duplicate rows
data = {'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
'Age': [25, 30, 25, 35, 30],
'City': ['New York', 'San Francisco', 'New York', 'Los Angeles', 'San Francisco']}
df = pd.DataFrame(data)
# Identifying duplicates using duplicated()
duplicates = df[df.duplicated()]
print("DataFrame with Duplicates:")
print(df)
print("\nIdentified Duplicates:")
print(duplicates)
Output:
Fig. 3.10: Identifying duplicate values in DataFrame.
Removing duplicates
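The code for this step is not reproduced in the text; a minimal sketch using drop_duplicates() on the DataFrame above would be:
# Removing duplicate rows, keeping the first occurrence of each
df_no_duplicates = df.drop_duplicates()
print("\nDataFrame after Removing Duplicates:")
print(df_no_duplicates)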
Output:
Fig. 3.11: Removing duplicate values in DataFrame.
The identification of duplicate rows in the DataFrame was conducted through the
utilization of the duplicated() method in this particular instance. The resulting
DataFrame, denoted as duplicates, showcases the duplicate rows that were
identified. Subsequently, the removal of duplicate rows was carried out by means
of the drop_duplicates() function, thereby generating a DataFrame devoid of any
duplicates. These operations are of utmost importance in guaranteeing the
accuracy of the data and ensuring that the analyses conducted are not distorted by
the presence of redundant information.
The employment of the pivot function in the pandas library acts as a powerful tool
to enable the conversion of data from a lengthy configuration to a broad
arrangement. This specific feature offers significant benefits when dealing with
datasets that are structured in a stacked or vertical manner, thus requiring a
transformation into a horizontal or tabular format. Essential elements in this
procedure include the index, columns, and values, which collectively determine the
framework of the resulting DataFrame.
import pandas as pd
# Creating a DataFrame
data = {'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
'Category': ['A', 'B', 'A', 'B'],
'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)
# Reshaping with pivot
df_pivot = df.pivot(index='Date', columns='Category', values='Value')
print("Original DataFrame:")
print(df)
print("\nDataFrame after Pivot:")
print(df_pivot)
Output:
Fig. 3.12: Reshaping with pivot.
In this example, the original DataFrame has a long format, with each date having
separate rows for categories A and B. After applying pivot, the data is reshaped
into a wide format, with dates as the index and categories as columns, providing a
clearer tabular structure.
Melting data:
The utilization of the melt function in the pandas library is paramount in the
conversion of wide-format data to long-format. This becomes particularly valuable
in situations where the data is arranged in a tabular or horizontal structure and
necessitates transformation into a stacked or vertical format.
The inclusion of the id_vars, var_name, and value_name parameters in the
method affords the opportunity for customization of the melted DataFrame.
import pandas as pd
# Creating a DataFrame
data = {'Date': ['2022-01-01', '2022-01-02'],
'Category_A': [10, 15],
'Category_B': [20, 25]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Melting data with the melt function
df_melted = pd.melt(df, id_vars=['Date'], var_name='Category', value_name='Value')
print("\nDataFrame after Melting:")
print(df_melted)
Output:
Fig. 3.13: Melting the data.
Careful data cleansing and transformation facilitate more precise and dependable analyses,
thereby leading to enhanced insights and decision-making capabilities.
import pandas as pd
from sklearn.impute import SimpleImputer
# Sample dataset with missing values
data = {'Age': [30, None, 50, 60],
'Income': [50000, 60000, None, 80000],
'Education_Level': [12, 16, 18, None]}
df = pd.DataFrame(data)
# Imputation using mean
imputer = SimpleImputer(strategy='mean')
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print("Original DataFrame:")
print(df)
print("\nDataFrame after Imputation:")
print(df_imputed)
Output:
Fig. 3.14: Imputing missing values in DataFrame.
In this particular instance, there are absent values in the columns denoted as
‘Age’, ‘Income,’ and ‘Education_Level’. To address this issue, we employ the
utilization of the SimpleImputer module from the renowned scikit-learn library to
impute the missing values with the mean value of each corresponding column. The
resultant DataFrame encompasses the imputed values, thereby guaranteeing the
wholeness of the dataset and facilitating further analysis and modeling endeavors.
Data type conversions play a vital role in the preprocessing of data, as they allow
for the representation of data in a manner that is suitable for analysis, modeling,
or storage. This process entails the alteration of data from one type to another, for
instance, the conversion of strings into numerical values or the transformation of
categorical variables into numerical representations.
The conversions of data types serve to guarantee the consistency of data and
its compatibility with various algorithms and tools. To illustrate, the conversion of
categorical variables into numerical format, accomplished through the utilization
of encoding techniques like one-hot encoding or label encoding, facilitates the
effective processing of such variables by machine learning algorithms. Likewise,
the conversion of numerical values from one data type to another, such as the
transition from integers to floats, may become necessary in order to address
precision or scaling requirements.
import pandas as pd
# Sample DataFrame with mixed data types
data = {'ID': ['001', '002', '003', '004'],
'Age': [30, 40, 50, 60],
'Income': ['50000', '60000', '70000', '80000'],
'Education_Level': [12.5, 16.2, 18.9, 20.7]}
df = pd.DataFrame(data)
# Convert 'ID' column from string to integer
df['ID'] = df['ID'].astype(int)
# Convert 'Income' column from string to integer
df['Income'] = df['Income'].astype(int)
print("DataFrame after Data Type Conversions:")
print(df)
Output:
In this example, we have a DataFrame with mixed data types. We convert the ‘ID’
and ‘Income’ columns from strings to integers using the astype method in pandas,
ensuring consistency and enabling numerical operations or analysis of these
columns. Similarly, other data type conversions can be performed as needed to
prepare the data for further processing or modeling.
3.4 Feature Engineering
Feature engineering involves the process of transforming raw data into a format
that is more suitable for machine learning algorithms, with the goal of improving
the performance of the model. It encompasses the selection, creation, and
modification of features to extract meaningful patterns and insights from the data.
This may include converting categorical variables into numerical representations
using techniques like one-hot encoding or label encoding. Additionally, feature
engineering includes feature scaling, which standardizes or normalizes numerical
features to ensure consistency and comparability across different scales.
Moreover, it involves techniques for reducing data dimensionality, such as
principal component analysis (PCA) and feature selection methods that aim to
decrease the number of features and eliminate irrelevant or redundant ones. The
effective implementation of feature engineering can significantly enhance the
accuracy, interpretability, and generalization of the model to new data, making it a
critical step in the machine learning pipeline.
One-Hot Encoding
import pandas as pd
# Sample categorical data
data = {'Color': ['Red', 'Green', 'Blue', 'Red', 'Blue']}
df = pd.DataFrame(data)
# One-hot encoding
one_hot_encoded = pd.get_dummies(df['Color'])
print("Original DataFrame:")
print(df)
print("\nOne-hot Encoded DataFrame:")
print(one_hot_encoded)
Output:
Fig. 3.16: One-hot encoding.
In this specific instance, the categorical variable ‘Color’ is transformed into binary
vectors through the process of one-hot encoding, which conveys the existence or
nonexistence of each category. This particular conversion allows machine learning
algorithms to effectively understand and make use of categorical data.
Label Encoding
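The label-encoding code itself is not shown; a minimal sketch that assigns integer labels in order of first appearance (one common approach, using pandas' factorize; an assumption, not the original snippet) would be:
import pandas as pd
# Sample categorical data
data = {'Color': ['Red', 'Green', 'Blue', 'Red', 'Blue']}
df = pd.DataFrame(data)
# Assign an integer label to each category in order of first appearance
codes, uniques = pd.factorize(df['Color'])
df['Color_Encoded'] = codes
print(df)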
Output:
Fig. 3.17: Label encoding.
In this example, the categorical variable ‘Color’ has been encoded with labels that
represent numerical values. Every distinct category is given a numerical label
according to its sequential appearance in the data. Nevertheless, prudence is
advised when utilizing label encoding, particularly in scenarios where the
categorical variable lacks inherent ordinality. This is due to the potential for
machine learning algorithms to misinterpret the encoded labels, resulting in
misinterpretation of the data.
Standardization
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Sample numerical data
data = {'Age': [30, 40, 50, 60],
'Income': [50000, 60000, 70000, 80000],
'Education_Level': [12, 16, 18, 20]}
df = pd.DataFrame(data)
# Standardization
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print("Original DataFrame:")
print(df)
print("\nStandardized DataFrame:")
print(df_scaled)
Output:
Fig. 3.18: Standardization of DataFrame.
In this instance, the initial numerical data depicting age, income, and education
level is standardized through the utilization of the StandardScaler implemented in
scikit-learn. Each individual characteristic is adjusted in such a way that it
possesses an average value of 0 and a standard deviation of 1, thereby rendering
them amenable to comparison across varying scales. This safeguard guarantees
that no individual characteristic holds undue influence over the process of learning
in machine learning algorithms that heavily rely on numerical data.
Normalization
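The normalization code is not reproduced here; a minimal sketch using scikit-learn's MinMaxScaler, which the explanation below refers to, would be:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
# Sample numerical data
data = {'Age': [30, 40, 50, 60],
        'Income': [50000, 60000, 70000, 80000],
        'Education_Level': [12, 16, 18, 20]}
df = pd.DataFrame(data)
# Normalization to the range [0, 1]
scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print("Original DataFrame:")
print(df)
print("\nNormalized DataFrame:")
print(df_normalized)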
Output:
In this instance, the initial numeric information signifying age, income, and level of
education undergoes normalization by utilizing the MinMaxScaler from scikit-
learn. The values of each characteristic are adjusted to a span ranging from 0 to 1,
thereby maintaining their relative associations while guaranteeing consistency
across various scales. This facilitates equitable comparisons between
characteristics and averts the prevalence of any individual attribute in machine
learning algorithms, thereby upholding the integrity of the learning process.
Matplotlib, a foundational plotting library in Python, provides a significant level of customization in generating a wide
range of static plots, including line plots, scatter plots, histograms, bar plots, and
more. It empowers users with precise control over various plot elements, such as
colors, markers, labels, and annotations.
Seaborn, constructed atop Matplotlib, furnishes a more advanced interface for
crafting informative and visually appealing statistical graphics. By providing
convenient functions for plotting data with minimal code, it simplifies the
otherwise intricate process of generating complex visualizations. Seaborn excels in
the production of visually captivating plots for statistical analysis, including
specialized ones like violin plots, box plots, pair plots, and heatmaps.
Collectively, Matplotlib and Seaborn constitute a potent toolkit for data
visualization, enabling analysts and data scientists to swiftly and effectively explore
patterns, trends, and relationships within datasets. These libraries facilitate the
creation of publication-quality plots, enhancing data storytelling and presentation and rendering them indispensable tools in the data analysis workflow.
Line Plot
Line plots are employed to represent the trajectory of data points across an
unbroken duration. They are constructed by joining the data points with linear
segments. For instance, one could construct a line plot to depict the variation in
stock prices over a given period of time.
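The plotting code for this figure is missing; a minimal sketch (the data values are assumed) would be:
import matplotlib.pyplot as plt
# Hypothetical stock prices over ten days
days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
prices = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115]
plt.plot(days, prices)
plt.xlabel('Day')
plt.ylabel('Stock Price')
plt.title('Line Plot Example')
plt.show()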
Output:
Fig. 3.20: Line plot.
Scatter Plot
Output:
Fig. 3.21: Scatter plot.
Bar Plot
Bar plots depict categorical data using rectangular bars, where the length of each
bar signifies the value of the corresponding category. Such visualizations prove
valuable in the task of comparing the quantities associated with various
categories. For instance, they are employed to assess the sales performance of
different products.
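Most of the plotting code for this example is missing; a minimal sketch (product names and sales figures are assumed) leading up to the plt.show() call retained below would be:
import matplotlib.pyplot as plt
# Hypothetical sales figures for three products
categories = ['Product A', 'Product B', 'Product C']
sales = [100, 150, 80]
plt.bar(categories, sales)
plt.xlabel('Product')
plt.ylabel('Sales')
plt.title('Bar Plot Example')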
plt.show()
Output:
Histogram
import matplotlib.pyplot as plt
# Sample data for the histogram (values assumed for illustration)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
plt.hist(data, bins=5)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram Example')
plt.show()
Output:
Pie Chart
Pie charts depict categorical information by dividing a circle into slices, with each
slice denoting a specific category and its magnitude indicating the proportion of
that category in the entirety. These charts serve a practical purpose in
demonstrating the makeup of an entire entity, such as the allocation of expenses
in a budget.
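The pie chart code is not shown; a minimal sketch (the budget categories and shares are assumed) would be:
import matplotlib.pyplot as plt
# Hypothetical budget allocation
labels = ['Rent', 'Food', 'Transport', 'Savings']
sizes = [40, 30, 15, 15]
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title('Pie Chart Example')
plt.show()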
Output:
Fig. 3.24: Pie chart.
Box Plot
Box plots summarize the distribution of numerical data through the median, quartiles, and outliers. Their value lies in the identification of outliers and
the comprehension of the data’s dispersion and central tendency. For instance,
they can be employed to contrast the distribution of test scores among various
student groups.
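The box plot code is missing; a minimal sketch (the test scores are assumed) would be:
import matplotlib.pyplot as plt
# Hypothetical test scores for two student groups
group_a = [55, 60, 62, 65, 70, 72, 75, 90]
group_b = [50, 58, 61, 63, 68, 70, 74, 95]
plt.boxplot([group_a, group_b])
plt.xticks([1, 2], ['Group A', 'Group B'])
plt.title('Box Plot Example')
plt.show()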
Output:
Violin Plot
Violin plots amalgamate a box plot and a kernel density plot to exhibit the
distribution of numerical data. They offer a more intricate perspective of the
distribution of data in comparison to box plots. For instance, they can be utilized to
visualize the distribution of heights among diverse age groups.
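The violin plot code is missing; a minimal sketch using Matplotlib's violinplot (the height values are assumed) would be:
import matplotlib.pyplot as plt
# Hypothetical heights (in cm) for two age groups
heights_20s = [165, 170, 172, 168, 175, 180, 177]
heights_40s = [160, 166, 171, 169, 174, 178, 173]
plt.violinplot([heights_20s, heights_40s])
plt.xticks([1, 2], ['20s', '40s'])
plt.title('Violin Plot Example')
plt.show()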
Output:
Fig. 3.26: Violin plot.
Heatmap
plt.show()
Output:
Area Plot
Area plots are similar to line plots but fill the area below the line, making them
useful for visualizing cumulative data or stacked data. For example, visualizing the
cumulative sales over time for different product categories.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [1, 2, 3, 4, 5]
y2 = [1, 4, 9, 16, 25]
plt.fill_between(x, y1, color='skyblue', alpha=0.4)
plt.fill_between(x, y2, color='orange', alpha=0.4)
plt.title('Area Plot Example')
plt.show()
Output:
Contour Plot
Contour plots represent three-dimensional data in two dimensions using contour lines, which makes them useful for geographical or scientific data. For example, visualizing elevation data on a map.
Output:
3.5.2 Advanced Visualizations with Seaborn
The examples in this section assume a small CSV file, dataset.csv, with the following contents:
x_variable,y_variable,category
1,2,A
2,3,B
3,4,A
4,5,B
5,6,A
6,7,B
Pair Plot
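The pair plot code is not reproduced; a minimal sketch based on the dataset.csv file above (using the category column as the hue is an assumption) would be:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('dataset.csv')
# Pairwise scatter plots and distributions, colored by category
sns.pairplot(data, hue='category')
plt.show()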
Output:
Fig. 3.30: Pair plot.
Joint Plot
Joint plots integrate scatter plots and histograms to visually represent the
correlation between two quantitative variables in addition to their respective
distributions.
import seaborn as sns
import pandas as pd
data = pd.read_csv('dataset.csv')
sns.jointplot(x='x_variable', y='y_variable', data=data, kind='scatter')
Output:
PairGrid
Output:
Fig. 3.32: Pair grid.
An imbalanced dataset is one in which the quantity of samples in one category greatly surpasses the quantity of samples in another category.
Imbalanced datasets pose challenges in the training of models because
algorithms tend to favor the category with a larger number of samples, leading to
biased predictions and inferior performance for categories with fewer samples. To
address this issue, various methodologies are employed, including resampling
techniques such as oversampling (increasing the number of samples in the
minority category) and undersampling (reducing the number of samples in the
majority category).
Furthermore, synthetic data generation techniques such as SMOTE (Synthetic
Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling) are
utilized to generate artificial data points for the minority category, thus achieving a
balanced dataset. Handling imbalanced data ensures that machine learning
models are trained on datasets that are more representative, resulting in
improved performance and generalization across all categories.
For oversampling, we can employ the RandomOverSampler function, sourced from the imbalanced-learn library, in order
to oversample the minority class. Lastly, we display the class distributions both
prior to and subsequent to the resampling process, enabling us to observe the
resultant balancing effect.
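The oversampling code itself is not shown; a minimal sketch using RandomOverSampler (the sample data are assumed) would be:
from imblearn.over_sampling import RandomOverSampler
import numpy as np
# Sample imbalanced dataset (values assumed for illustration)
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([0, 0, 0, 0, 1, 1])
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X, y)
print("Class distribution before resampling:", np.bincount(y))
print("Class distribution after resampling:", np.bincount(y_resampled))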
Here’s a programming example of undersampling using the imbalanced-learn
library:
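The undersampling example itself is missing; a minimal sketch using RandomUnderSampler (the sample data are assumed) would be:
from imblearn.under_sampling import RandomUnderSampler
import numpy as np
# Sample imbalanced dataset (values assumed for illustration)
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([0, 0, 0, 0, 1, 1])
rus = RandomUnderSampler(random_state=42)
X_resampled, y_resampled = rus.fit_resample(X, y)
print("Class distribution before resampling:", np.bincount(y))
print("Class distribution after resampling:", np.bincount(y_resampled))
Synthetic data generation with SMOTE, shown next, takes a different approach: rather than duplicating or discarding existing samples, it creates new samples for the minority class.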
from imblearn.over_sampling import SMOTE
import numpy as np
# Sample imbalanced dataset
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 0, 1, 1])
# Instantiate SMOTE
smote = SMOTE()
# Generate synthetic samples
X_synthetic, y_synthetic = smote.fit_resample(X, y)
print("Original class distribution:", np.bincount(y))
print("Synthetic class distribution:", np.bincount(y_synthetic))
In this instance, SMOTE is utilized to produce artificial data points for the
underrepresented class within the dataset that exhibits imbalance. The
fit_resample technique is utilized to execute the generation of synthetic data, and
the resultant distributions of classes both before and after the application of
SMOTE are compared to observe the effect of achieving balance.
Here is an illustrative programming example showcasing the utilization of
ADASYN (Adaptive Synthetic Sampling) for the purpose of creating synthetic data
points for the underrepresented class within an imbalanced dataset:
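The ADASYN example itself is not reproduced; a minimal sketch (the dataset is generated with make_classification purely for illustration) would be:
from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification
import numpy as np
# Generate a synthetic imbalanced dataset (roughly a 90%/10% class split)
X, y = make_classification(n_samples=200, n_features=4, weights=[0.9, 0.1], random_state=42)
adasyn = ADASYN(random_state=42)
X_resampled, y_resampled = adasyn.fit_resample(X, y)
print("Original class distribution:", np.bincount(y))
print("Resampled class distribution:", np.bincount(y_resampled))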
Summary
Numerical and scientific computing using NumPy and SciPy: Covered array
creation, manipulation, and various numerical operations using NumPy and
SciPy libraries.
Loading data with pandas: Introduction to pandas library for data
manipulation and analysis, including DataFrame and Series, along with data
manipulation techniques.
Data cleaning and transformation: Discussed strategies for handling missing
data and data type conversions in datasets.
Feature engineering: Covered techniques such as encoding categorical
variables and feature scaling for preparing data for machine learning.
Data visualization with Matplotlib and Seaborn: Introduced Matplotlib and
Seaborn libraries for data visualization, including basic and advanced
plotting techniques.
Handling imbalanced data: Discussed challenges posed by imbalanced
datasets in machine learning and techniques like resampling and synthetic
data generation to address class imbalance.
Exercise (MCQs)
1.
c) NumPy
d) Seaborn
3.
What is the primary data structure in pandas for storing and manipulating
data?
a) Arrays
b) Lists
c) DataFrame
d)
Tuples
4.
d) Imputation
5.
c) Pandas
d) Matplotlib
7.
c) Histogram
d)
Heatmap
8.
c) Data cleaning
d) Dimensionality reduction
9.
What is the purpose of synthetic data generation techniques like SMOTE and
ADASYN?
a) To create artificial data points for majority classes
b) To remove outliers from datasets
c) Data imputation
d) Dimensionality reduction
Answers
1. b
2. c
3. c
4. d
5. c
6. d
7. b
8. a
9. a
10. a
Answers
1. NumPy, SciPy
2. DataFrame
3. oversampling, undersampling
4. scale
5. data visualization
6. pairwise
7. synthetic
8. Label encoding, one-hot encoding
9. missing
10. objects
Descriptive Questions
1. Explain the importance of proper indentation in Python syntax and how it
impacts the readability of code.
2. Describe the role of pandas in data manipulation and analysis, and provide
examples of DataFrame operations.
3. Discuss the significance of feature engineering in machine learning and
explain the difference between encoding categorical variables and feature
scaling.
4. Explain the process of data visualization using Matplotlib and Seaborn, and
provide examples of basic and advanced plotting techniques.
5. Describe the challenges posed by imbalanced datasets in machine learning
and discuss techniques such as resampling and synthetic data generation to
address class imbalance.
6. Explain the concept of feature scaling and discuss its importance in
preparing data for machine learning models.
7. Discuss the role of comments in Python code and how they contribute to
code documentation and readability.
8. Explain the concept of object-oriented programming in Python, including
classes, objects, inheritance, and polymorphism.
9. Discuss the strategies for handling missing data in datasets and the
importance of data imputation in data preprocessing.
10. Describe the process of encoding categorical variables in feature
engineering and discuss the differences between label encoding and one-hot
encoding.
11. Write a Python program that creates a 2D NumPy array and performs the
following operations:
a. Compute the mean, median, and standard deviation of the array.
b. Reshape the array into a different shape.
c. Perform element-wise addition and multiplication with another array.
12. Write a Python program that loads a CSV file using pandas and performs the
following operations:
a. Display the first few rows of the DataFrame.
b. Calculate summary statistics for numerical columns.
c. Convert a categorical column to numerical one using label encoding.
13. Write a Python program that generates a line plot using Matplotlib to
visualize a time series dataset.
a. Include labels for the x and y axes.
b. Add a title to the plot.
c. Customize the line style and color.
14. Write a Python program that loads an imbalanced dataset and implements
oversampling using the SMOTE technique from the imbalanced-learn library.
a. Display the class distribution before and after oversampling.
b. Train a simple machine learning model (e.g., logistic regression) on the
balanced dataset and evaluate its performance.
15. Write a Python program that preprocesses a dataset for machine learning
using feature engineering techniques.
a. Encode categorical variables using one-hot encoding.
b. Scale numerical features using Min-Max scaling or standardization.
c. Split the dataset into training and testing sets for model evaluation.
Chapter 4 Foundations of Machine Learning
concepts and techniques in later chapters.
house prices, stock prices, and temperature forecasting.
employed to predict future stock prices by analyzing historical data and market
indicators. Common regression techniques include linear regression, polynomial
regression, and decision tree regression.
To exemplify, let us examine a straightforward classification problem:
forecasting if a loan applicant of a bank will fail to pay their loan or not. The input
characteristics in this situation could consist of the applicant’s credit score,
earnings, and debt-to-income ratio, while the output would be a binary tag
indicating either “default” or “no default.” A logistic regression model can be
educated on past data to categorize potential loan applicants based on these
characteristics.
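To make the idea concrete, a rough scikit-learn sketch is given below; the applicant data, feature values, and model settings are invented for illustration and do not come from the text:
from sklearn.linear_model import LogisticRegression
import numpy as np
# Hypothetical applicant data: [credit_score, income, debt_to_income_ratio]
X = np.array([[650, 40000, 0.40],
              [720, 85000, 0.20],
              [580, 30000, 0.55],
              [700, 60000, 0.30],
              [610, 35000, 0.50],
              [750, 90000, 0.15]])
# Labels: 1 = default, 0 = no default
y = np.array([1, 0, 1, 0, 1, 0])
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
# Classify a new applicant
new_applicant = np.array([[690, 55000, 0.35]])
print(model.predict(new_applicant))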
In contrast, suppose we have an interest in predicting the price of a dwelling
based on its dimensions, number of bedrooms, and geographical location. This
scenario introduces a regression issue as the output variable, specifically the
dwelling price, exhibits a continuous characteristic. In this instance, a linear
regression model can be utilized to establish the connection between the input
attributes and the dwelling prices by employing a dataset consisting of past real
estate transactions. Consequently, this grants us the capability to generate
predictions regarding the price of newly constructed dwellings.
Classification involves the classification of data into classes or labels, whereas
regression concentrates on the prediction of numerical values. A comprehensive
understanding of the differences between these two types of supervised learning
tasks is crucial when choosing appropriate algorithms and methodologies to
effectively tackle various real-world problems.
Types of Classification
Types of Regression
Linear regression: Linear regression models the relationship between a dependent variable and one or more independent variables using a linear equation. Its application lies in the forecast of
continuous numerical values. Instances include the anticipation of housing
costs based on the area and quantity of bedrooms, the prediction of stock
prices utilizing historical data, and the estimation of sales revenue by taking
marketing expenses into account.
Polynomial regression: Polynomial regression extends the scope of linear
regression by employing a polynomial function to fit the data, rather than a
mere straight line. This methodology effectively captures intricate nonlinear
associations between the variables. Illustrative instances encompass the
prediction of a projectile’s trajectory, the modeling of temporal temperature
fluctuations, and the fitting of growth curves in the realm of biology.
Logistic regression: Despite being named as such, logistic regression is
utilized as a classification algorithm for tasks involving binary classification. It
constructs a model to determine the probability that a given instance is part
of a particular class by employing a logistic function. Instances of its
application include predicting the probability of a customer making a
purchase, estimating the likelihood of a patient having a specific disease
based on medical tests, and forecasting the probability of defaulting on a
loan.
Ridge and Lasso regression: Ridge and Lasso regression are methods
utilized to regularize linear regression models by integrating a penalty term
into the cost function. These methods assist in mitigating overfitting and
improving the generalization capability of the model. Ridge regression
incorporates a penalty term that is directly proportional to the square of the
coefficients’ magnitude, whereas Lasso regression incorporates a penalty
term that is directly proportional to the absolute value of the coefficients.
These methods are particularly advantageous when dealing with
multicollinearity and datasets with high dimensionality.
Classification and regression techniques, which are essential tools in the field of
machine learning, are widely used in diverse domains including finance,
healthcare, marketing, and engineering.
Clustering algorithms analyze unlabeled data to discover inherent groupings or clusters within the data. For example, in the
context of customer segmentation, clustering algorithms can effectively divide
customers into distinct groups based on their purchasing behavior, demographics,
or other relevant features. A well-known algorithm for clustering is the K-means
algorithm, which assigns data points to clusters by minimizing the distance
between each point and the centroid of its assigned cluster. Another technique,
called “hierarchical clustering,” constructs a hierarchical structure of clusters by
recursively merging or splitting clusters based on their similarity.
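As a brief illustration (not taken from the text), a K-means sketch on invented customer data might look like this; the feature values and the choice of two clusters are assumptions:
from sklearn.cluster import KMeans
import numpy as np
# Hypothetical customer features: [annual_spend, purchase_frequency]
X = np.array([[200, 2], [250, 3], [220, 2],
              [900, 15], [950, 18], [880, 14]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print("Cluster labels:", labels)
print("Cluster centroids:")
print(kmeans.cluster_centers_)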
On the contrary, association rule learning seeks to uncover intriguing
connections or associations amidst various variables within extensive datasets. The
primary emphasis resides in the identification of patterns of co-occurrence or
correlation among items. An exemplary illustration of this phenomenon is market
basket analysis, where association rules are employed to unveil the relationships
between products that are frequently purchased together during transactions. The
Apriori algorithm is commonly employed for this endeavor, as it systematically
generates potential itemsets and eliminates those that fail to satisfy the minimum
support criteria, thus effectively identifying frequent itemsets and association
rules.
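To make the market basket idea concrete, the sketch below mines frequent itemsets with the Apriori algorithm using the third-party mlxtend library; the library choice, the toy transactions, and the minimum support threshold are all assumptions made for illustration.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Toy transactions for market basket analysis
transactions = [['diapers', 'baby formula', 'milk'],
                ['diapers', 'baby formula'],
                ['milk', 'bread'],
                ['diapers', 'baby formula', 'bread']]

# One-hot encode the transactions and mine itemsets meeting the minimum support
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print(frequent_itemsets)

From such frequent itemsets, rules of the form "diapers → baby formula" can then be derived and ranked by confidence.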
To illustrate, consider a retail store that wants to analyze customer purchase
data. Using clustering, the store can group customers into distinct segments
based on their purchasing preferences. This information can then be used to
design targeted marketing campaigns or personalized recommendations.
Meanwhile, association analysis can reveal patterns such as "customers who buy
diapers are likely to also buy baby formula," enabling the store to optimize
product placement or promotional activities.
In summary, the focus of clustering is on the identification of natural
groupings or clusters within data. On the other hand, association rule learning is
primarily concerned with the discovery of relationships or patterns of co-
occurrence among variables. The utilization of both approaches is extremely
valuable in the process of uncovering insights and patterns within unlabeled data.
Ultimately, this enables businesses to make well-informed decisions and enhance
their operational efficiency.
Types of Clustering
K-means clustering: K-means clustering is widely used owing to its
simplicity and effectiveness, although it requires the number of clusters to
be specified in advance.
Hierarchical clustering: Hierarchical clustering, conversely, builds a
dendrogram, a tree-like arrangement, through iterative merging or splitting
of clusters based on the similarity of data points. It does not require the prior
specification of cluster numbers and can yield a valuable understanding of
the hierarchical composition of the data. Two frequently employed
techniques in hierarchical clustering are agglomerative (bottom-up) and
divisive (top-down) clustering.
DBSCAN: Density-based clustering (DBSCAN) is a clustering algorithm that
detects clusters by taking into account areas of high density that are
separated by areas of low density. This method can identify clusters of
various shapes and is robust against noise and outliers. DBSCAN,
abbreviated from density-based spatial clustering of applications with noise,
is a widely used density-based clustering algorithm that necessitates the
specification of two parameters: epsilon, which denotes the maximum
distance between points for them to be considered part of the same cluster,
and minPts, which represents the minimum number of points required to
form a dense region.
Mean shift clustering: Mean shift clustering, on the other hand, is a
clustering technique devoid of parameters that detects clusters through the
displacement of centroids towards areas of heightened data density.
Comparable to hierarchical clustering, it does not necessitate the a priori
specification of cluster quantity and can autonomously ascertain the optimal
number of clusters based on the distribution of data.
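The sketch below shows how the DBSCAN parameters described above map onto scikit-learn's implementation; the moon-shaped synthetic data and the eps and min_samples values are illustrative and would normally be tuned to the data.

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Non-spherical clusters that centroid-based methods would struggle with
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps is the neighborhood radius; min_samples is the minimum point count for a dense region
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

# Points labeled -1 are treated as noise/outliers
print("Cluster labels found:", set(db.labels_))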
Types of Association
for explicit generation of candidate itemsets. This property renders it
especially advantageous for processing large datasets containing a
substantial number of transactions.
Eclat algorithm: The Eclat algorithm is an additional and widely recognized
algorithm for association rule learning. It explores frequent itemsets by
intersecting transaction tidsets. Through the utilization of the downward
closure property of support, it effectively detects frequent itemsets.
PrefixSpan algorithm: The PrefixSpan algorithm, conversely, is utilized
specifically to extract sequential patterns in sequence databases. It employs
a recursive approach to generate frequent sequences through the extension
of prefix patterns, subsequently optimizing the search space to effectively
discern frequent sequential patterns.
Various types of clustering and association techniques play a crucial role in the
domain of data mining and pattern recognition. They facilitate the identification of
significant patterns and valuable insights from extensive datasets across a wide
range of fields, including market basket analysis, customer segmentation, and
recommendation systems.
Model hyperparameters include settings such as the regularization factor, which
controls the balance between accurately fitting the training data and reducing
model complexity. The selection of appropriate hyperparameters is often
accomplished through cross-validation, which assesses the model's effectiveness
on a distinct validation dataset.
Overfitting occurs when a model acquires noise or irrelevant patterns from the
training data, leading to insufficient generalization performance. The use of
regularization techniques, such as L2 and L1 regularization, can mitigate
overfitting by penalizing complex models and advocating for simplicity. To
construct models that effectively generalize to unseen data and provide accurate
predictions in real-world scenarios, it is imperative to employ appropriate
regularization and conduct hyperparameter tuning.
The concept of balancing bias and variance is a fundamental aspect of the domain
of machine learning. This balance is essential to attain an equilibrium between the
bias and variance of a specific model. Bias represents the difference between the
model’s expected prediction and the true value, while variance measures the
variability in the model’s predictions across various training datasets.
Let us consider a simple example that involves the application of a polynomial
regression model to a collection of data points. Suppose we have a dataset that
consists of only one characteristic, labeled as “x,” and its associated target
variable, labeled as “y.” Our aim is to construct a polynomial regression model that
can effectively forecast the value of “y” based on “x”. To accomplish this task, we
can express our model in the following manner:
y = β0 + β1x + β2x² + … + βnxⁿ + ϵ
where ϵ represents the error term, and β0, β1, …, βn are the coefficients of the
polynomial terms.
The model's bias can be measured as the disparity between the model's expected
prediction and the true value:

Bias(f̂(x)) = E[f̂(x)] − f(x)

where f̂(x) represents the value predicted by the model, f(x) is the true value, and
E[f̂(x)] is the expected value of the predictions over different training datasets.
On the other hand, the model's variance quantifies the amount of variation in the
predictions at a specific point across different training datasets. It is computed as:

Var(f̂(x)) = E[(f̂(x) − E[f̂(x)])²]
The aim is to identify a model that attains an optimal balance between bias and
variance. A model that displays a high level of bias but a low level of variance, like a
linear regression model, may oversimplify the underlying relationship within the
data, thus resulting in systematic errors (underfitting). On the other hand, a model
that exhibits low bias but high variance, such as a high-degree polynomial
regression model, may capture the noise present in the training data, leading to an
increased susceptibility to fluctuations in the training set (overfitting).
To exemplify the trade-off between bias and variance, we shall examine the
process of fitting polynomial regression models to a specific dataset, where the
degrees of the polynomials differ. In this scenario, as the degree of the polynomial
rises, the bias decreases (resulting in a more flexible model capable of capturing
more complex relationships within the data), while the variance increases (causing
the model to be more sensitive to fluctuations in the training data).
By choosing the suitable degree of a polynomial, our aim is to find a
harmonious equilibrium between bias and variance that reduces the total error
(the sum of the squared bias and variance). This objective is frequently achieved
through methods like cross-validation or regularization, which work to address
overfitting by penalizing overly complex models.
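A compact way to see this trade-off numerically is to score polynomial models of increasing degree with cross-validation. The sketch below does this with scikit-learn on synthetic data; the cubic ground truth, the noise level, and the degrees compared are all illustrative assumptions.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data generated from a noisy cubic relationship
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 0.5 * x.ravel() ** 3 - x.ravel() + rng.normal(scale=2.0, size=60)

# Low degrees underfit (high bias); very high degrees overfit (high variance)
for degree in [1, 3, 10]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, x, y, cv=5, scoring='neg_mean_squared_error')
    print(f"degree={degree}: mean CV MSE={-scores.mean():.2f}")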
In summary, the bias-variance trade-off highlights the fundamental trade-off
that exists between the bias and variance of machine learning models.
Understanding this trade-off is essential when choosing the appropriate
complexity of a model and preventing cases of underfitting or overfitting in real-
world applications.
The concept of balancing bias and variance is a pivotal principle in the domain
of machine learning. It entails achieving an equilibrium between a model’s
capacity to precisely grasp the intrinsic patterns within a dataset (reduced bias)
and its adaptability to various datasets (reduced variance). Numerous Python
libraries are accessible, providing resources and methodologies for
comprehending and handling this trade-off.
scikit-learn, a Python library, is extensively employed for a range of machine
learning undertakings, encompassing both supervised and unsupervised learning.
Although scikit-learn does not furnish distinct functions for measuring bias and
variance, it does present an array of instruments for assessing models. These
instruments consist of cross-validation, learning curves, and validation curves,
which serve the purpose of evaluating the bias and variance of a model.
An example of utilizing learning curves in scikit-learn to visually represent the
bias-variance trade-off is as follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LogisticRegression

# Load data and define model
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Compute learning curves (5-fold CV is an illustrative choice) and plot mean accuracy
train_sizes, train_scores, val_scores = learning_curve(model, X, y, cv=5)
plt.plot(train_sizes, train_scores.mean(axis=1), label="Training accuracy")
plt.plot(train_sizes, val_scores.mean(axis=1), label="Validation accuracy")
plt.xlabel("Number of samples"); plt.ylabel("Accuracy"); plt.legend()
plt.show()
Fig. 4.1: Number of samples vs accuracy.
TensorFlow and Keras: TensorFlow and its associated high-level API, Keras,
provide a diverse range of tools and methodologies for the construction and
training of deep learning models. These libraries furnish functionalities that enable
the implementation of regularization techniques, dropout, and early stopping, all
of which serve to effectively address the bias-variance trade-off.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
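As a rough sketch of how dropout and early stopping might be wired together in Keras for a small classifier, the snippet below builds on the imports above; the layer sizes, dropout rate, and patience value are illustrative assumptions, not prescriptions.

from tensorflow.keras.callbacks import EarlyStopping

# Small network with dropout to reduce variance
model = Sequential([
    Dense(64, activation='relu', input_shape=(4,)),
    Dropout(0.5),
    Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early stopping halts training once validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])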
For a linear model, the unregularized cost function can be written as

J(θ) = (1/2m) ∑ᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

where J(θ) is the cost function, hθ(x⁽ⁱ⁾) is the predicted value for the ith
example, y⁽ⁱ⁾ is the true value, and m is the number of training examples.
In the context of regularization in the L2 norm, which is alternatively referred
to as “Ridge regularization,” an additional term is incorporated into the cost
function that is directly proportional to the square of the magnitude of the
coefficients of the model:
JL2(θ) = J(θ) + λ ∑ⱼ θⱼ²

where the sum runs over the n model coefficients.
We fit a linear regression model to this dataset using both L1 and L2 regularization
with λ = 0.1. After training the models, we examine the values of the coefficients θ0
and θ1 .
With L2 regularization, the resulting coefficients might be:
θ0 = 0.5
θ1 = 0.9
With L1 regularization, the resulting coefficients might be:
θ0 = 0.7
θ1 = 0.8
Comparing the two, L2 regularization shrinks the coefficients smoothly toward
zero without eliminating any of them, whereas L1 regularization tends to drive
some coefficients exactly to zero, effectively performing feature selection by
discarding irrelevant features. This behavior arises because the L1 penalty
favors sparse solutions, which is beneficial in situations where feature
sparsity is desirable.
In summary, L1 and L2 regularization are methods used to prevent overfitting
in machine learning models by penalizing the coefficients with high values. L2
regularization penalizes the squared magnitude of the coefficients, while L1
regularization penalizes the absolute magnitude. It is crucial to understand the
differences between these regularization methods to choose the most suitable
technique and effectively control model complexity.
Several libraries in Python offer implementations of L1 and L2 regularization
methods, which are frequently employed in machine learning to mitigate
overfitting and enhance the generalization capability of models. scikit-learn and
TensorFlow/Keras are two commonly utilized libraries for this purpose. In the
subsequent section, I will elucidate the usage of these libraries in performing L1
and L2 regularization, illustrated through an example employing the Iris dataset.
scikit-learn: Scikit-learn provides linear models that support L1 and L2
regularization. These models include LogisticRegression, LinearRegression, and
Ridge, among others.
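A minimal sketch of how such a regularized model might be set up on the Iris dataset is shown below; the train/test split, the scaling step, and the C value are assumptions made for illustration. The evaluation snippet that follows then scores the fitted model on the held-out test set.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Load Iris and split into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features so the penalty treats all coefficients comparably
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# L2-penalized logistic regression; C is the inverse of the regularization strength
# (penalty='l1' works the same way with a compatible solver such as 'liblinear')
model = LogisticRegression(penalty='l2', C=1.0, max_iter=200)
model.fit(X_train_scaled, y_train)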
# Evaluate model
accuracy = model.score(X_test_scaled, y_test)
print("Accuracy:", accuracy)
Accuracy: 0.9666666666666667
TensorFlow/Keras: TensorFlow and its high-level API Keras provide an
assortment of regularizers that can be implemented in the layers of a neural
network. These regularizers encompass l1, l2, and l1_l2.
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Define a simple neural network model with L2 regularization
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # 0.01 chosen for illustration
tf.keras.layers.Dense(3, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(X_train_scaled, y_train, epochs=50, batch_size=32, validation_data=(X_test_scaled, y_test))
# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test)
print("Test Accuracy:", test_accuracy)
By adjusting the regularization factor passed to the regularizers in
TensorFlow/Keras layers, one can effectively regulate the degree of
regularization that is applied.
repeated multiple times, and the average performance across the subsets is
calculated to obtain a reliable estimate of the model’s performance.
In summary, evaluation metrics play a pivotal role in assessing the
effectiveness of machine learning models across various tasks and datasets.
Through careful selection of suitable assessment criteria and utilization of
methods like cross-validation, experts can gain a valuable understanding of model
performance and make informed choices to improve model accuracy and
generalizability.
Recall, also referred to as sensitivity or true positive rate, quantifies the ratio of
accurate positive forecasts in relation to the entirety of genuine positive
occurrences, and its computation can be accomplished by:
Recall = TP / (TP + FN)
The F1 score, known as the harmonic mean of precision and recall, offers an
equilibrium between these two metrics and proves particularly advantageous
when dealing with imbalanced datasets. One can compute it using the subsequent
formula:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
Using this confusion matrix, we can derive the accuracy, precision, recall, and F1
score of the model.
scikit-learn: Scikit-learn, a machine learning library extensively employed in
Python, furnishes a broad range of tools for the purpose of classification analysis.
Within this library, one can find functions that enable the computation of diverse
classification metrics, encompassing accuracy, precision, recall, F1 score, and area
under the ROC curve (ROC AUC).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true and y_pred are arrays of true and predicted class labels
# Compute accuracy
accuracy = accuracy_score(y_true, y_pred)
# Compute precision
precision = precision_score(y_true, y_pred)
# Compute recall
recall = recall_score(y_true, y_pred)
# Compute F1 score
f1 = f1_score(y_true, y_pred)
TensorFlow and PyTorch: While TensorFlow and PyTorch are primarily deep
learning libraries, they also offer functionality for computing classification metrics.
These libraries are particularly useful when working with neural network models.
import tensorflow as tf
# Compute accuracy
accuracy = tf.keras.metrics.Accuracy()
accuracy.update_state(y_true, y_pred)
accuracy_result = accuracy.result().numpy()
# Compute precision and recall, then derive the F1 score from them
precision = tf.keras.metrics.Precision()
recall = tf.keras.metrics.Recall()
precision.update_state(y_true, y_pred)
recall.update_state(y_true, y_pred)
precision_result = precision.result().numpy()
recall_result = recall.result().numpy()
f1_result = 2 * precision_result * recall_result / (precision_result + recall_result)
Pandas and NumPy: Pandas and NumPy constitute essential libraries utilized for
the manipulation of data and numerical computations within the Python
programming language. Despite their lack of dedicated functions for the
computation of classification metrics, they are frequently employed in tandem with
other libraries to preprocess data and manually derive metrics.
import numpy as np
import pandas as pd
# Compute accuracy
accuracy = np.mean(y_true == y_pred)
# Compute precision
true_positives = np.sum((y_true == 1) & (y_pred == 1))
false_positives = np.sum((y_true == 0) & (y_pred == 1))
precision = true_positives / (true_positives + false_positives)
# Compute recall
false_negatives = np.sum((y_true == 1) & (y_pred == 0))
recall = true_positives / (true_positives + false_negatives)
# Compute F1 score
f1 = 2 * (precision * recall) / (precision + recall)
The mean squared error (MSE) quantifies the average squared difference between
the predicted and actual values:

MSE = (1/n) ∑ᵢ (yᵢ − ŷᵢ)²

where n is the number of instances, yᵢ is the true value, and ŷᵢ is the predicted
value for the ith instance.
MAE is another metric that is frequently employed. It quantifies the average
absolute disparity between the predicted and actual values:

MAE = (1/n) ∑ᵢ |yᵢ − ŷᵢ|
RMSE = √MSE
These metrics provide different perspectives on model performance. MSE and
RMSE penalize large errors more heavily, making them sensitive to outliers. MAE,
on the other hand, treats all errors equally and is more robust to outliers.
Let’s consider a numerical example to illustrate these metrics. Suppose we
have a dataset with five instances:
(x1 , y1 )=(1, 3)
(x2 , y2 )=(2, 5)
(x3 , y3 )=(3, 7)
(x4 , y4 )=(4, 9)
(x5 , y5 )=(5, 11)
A regression model is utilized to make predictions on y by taking x as input. The
following predictions are made by the model:
ŷ 1 = 2
ŷ 2 = 4
ŷ 3 = 6
ŷ 4 = 8
ŷ 5 = 10
Using these predictions, we can compute the MSE, MAE, and RMSE of the model:

MSE = (1/5) ∑ᵢ (yᵢ − ŷᵢ)² = (1/5) × (1² + 1² + 1² + 1² + 1²) = 1.0

MAE = (1/5) ∑ᵢ |yᵢ − ŷᵢ| = (1/5) × (1 + 1 + 1 + 1 + 1) = 1.0

RMSE = √MSE = 1.0
TensorFlow and PyTorch: While TensorFlow and PyTorch are primarily deep
learning libraries, they also offer functionality for computing regression metrics.
These libraries are particularly useful when working with neural network models.
import torch
import torch.nn as nn
Pandas and NumPy: Pandas and NumPy are essential libraries utilized to
manipulate data and execute numerical computations within the Python
programming language. Although these libraries do not offer dedicated
functionalities for the calculation of regression metrics, they are frequently
employed alongside other libraries to preprocess data and manually compute
metrics.
import numpy as np
import pandas as pd
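For instance, the metrics from the worked example above can be computed by hand in a few lines of NumPy; this is a minimal sketch rather than a dedicated metrics API.

import numpy as np

# Actual and predicted values from the worked example above
y_true = np.array([3, 5, 7, 9, 11])
y_pred = np.array([2, 4, 6, 8, 10])

mse = np.mean((y_true - y_pred) ** 2)      # 1.0
mae = np.mean(np.abs(y_true - y_pred))     # 1.0
rmse = np.sqrt(mse)                        # 1.0
print("MSE:", mse, "MAE:", mae, "RMSE:", rmse)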
4.4 Cross-Validation
Cross-validation is an essential procedure utilized in the domain of machine
learning to evaluate the effectiveness of a model and deduce its capacity to
generalize to unseen data. This technique involves dividing the dataset into several
subsets, known as folds, where the model is trained on a portion of the data and
assessed on the remaining portion. This iterative process is repeated multiple
times, with each fold serving as the validation set precisely once. The application of
cross-validation effectively tackles the issue of an excessively optimistic or
pessimistic model performance resulting from stochastic fluctuations in the
training and testing data partitions.
Among the various methodologies for cross-validation, one of the most
prevalent approaches is commonly referred to as “k-fold cross-validation.” This
particular method entails dividing the dataset into k folds, with each fold being of
equal size. The model is then trained on k-1 folds and evaluated on the remaining
fold. This process is repeated k times, with each fold being used as the validation
set once. Typically, the final performance metric is obtained by calculating the
average of the performance metrics across all the folds. This k-fold cross-validation
methodology is highly reliable in estimating the performance of the model and
effectively addresses the variability introduced by random data partitions.
Another form of cross-validation is known as “leave-one-out cross-validation”
(LOOCV), in which each instance in the dataset is used as a separate validation set,
and the model is trained on the remaining instances. LOOCV offers significant
benefits for datasets with limited size, as it provides a more accurate assessment
of model performance. Nonetheless, this approach can be computationally
intensive for datasets with a large number of instances.
Stratified k-fold cross-validation is an adaptation of the k-fold cross-validation
approach that ensures that the class distribution of the original dataset is
preserved in each fold. This is especially important when working with imbalanced
datasets, where one class may have a lower representation. The utilization of
stratified k-fold cross-validation guarantees that each class is proportionally
represented in both the training and validation sets, leading to more reliable
performance estimations.
Cross-validation is of utmost importance in the process of selecting models
and tuning hyperparameters, as it enables the discovery of the most effective
model structure and optimal values for hyperparameters. Through the comparison
of model performance across various folds, practitioners are able to make well-
informed choices and prevent overfitting to the training data.
Cross-validation may present a potential limitation in terms of computational
cost, especially when confronted with vast datasets or intricate models. However,
the advantages gained from acquiring a dependable evaluation of model
performance frequently surpass the computational burden. Furthermore, the
utilization of methods like parallelization and optimization can aid in mitigating the
computational overhead linked with cross-validation.
Cross-validation is a crucial technique in the field of machine learning, used to
estimate the performance of models and choose the most efficient one. By
reducing the impact of random data partitions and offering reliable estimates of
model generalization, cross-validation empowers practitioners to make informed
decisions and build models that effectively generalize to new data.
4.4.1 k-Fold Cross-Validation
In the Python programming language, numerous libraries are available that
facilitate the implementation of k-fold cross-validation. These libraries are widely
used for evaluating the performance of machine learning models. Here are some
popular Python libraries typically employed for k-fold cross-validation:
scikit-learn: Scikit-learn is an influential Python library for machine learning
that offers extensive resources for data analysis and modeling. Within the sklearn
model_selection module, it incorporates a flexible KFold class that enables the
partitioning of the dataset into k folds, thereby facilitating cross-validation.
import numpy as np
from sklearn.model_selection import KFold

# Split the data into 5 folds (X and y are assumed to be loaded already)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Train and evaluate model on this fold
Pandas and NumPy: Pandas and NumPy are essential libraries in Python for the
manipulation of data and the computation of numerical values. Despite their lack
of inherent cross-validation capabilities, these libraries are frequently employed in
tandem with other libraries to preprocess data and facilitate cross-validation.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
LOOCV and stratified k-fold cross-validation are two extensively utilized techniques
for assessing the effectiveness of machine learning models.
LOOCV entails dividing the dataset into n folds, with n denoting the number of
instances in the dataset. In each iteration, one instance is excluded and used as
the validation set, while the model is trained on the remaining n-1 instances. This
process is repeated n times, with each instance serving as the validation set once.
LOOCV provides a precise evaluation of model performance, although it can be
computationally demanding, especially for large datasets.
Let us illustrate the LOOCV methodology using a numerical example. We can
assume that we have a dataset comprising 100 instances. In the first iteration, the
model is trained on instances 2 to 100, and its performance is evaluated on
instance 1. Moving on to the second iteration, the model is trained on instances 1
and 3 to 100, with the performance assessed on instance 2. This process is
repeated for all instances, leading to the computation of a performance metric
(such as accuracy) for each iteration. The final estimation of the model’s
performance is obtained by averaging the performance metrics across all
instances.
Stratified k-fold cross-validation, a variant of K-fold cross-validation, ensures
that each fold in the cross-validation process maintains the same class distribution
as the original dataset. This is of particular importance when dealing with
imbalanced datasets, where one class may be disproportionately represented. The
stratified approach guarantees that each class is proportionately represented in
both the training and validation sets, thus yielding more reliable performance
estimates.
Let us expound upon the concept of stratified k-fold cross-validation through
the use of a numerical illustration. Assume we are faced with a binary classification
problem, which consists of a total of 100 instances. Out of these instances, 80
belong to class 0 while the remaining 20 belong to class 1. Our objective is to apply
stratified k-fold cross-validation with a value of k equal to 5. By dividing the dataset
into 5 folds, we ensure that each fold maintains the original dataset’s class
distribution. This guarantees that each fold contains a proportionate
representation of both classes, resulting in more dependable performance
evaluations.
LOOCV and stratified k-fold cross-validation are two powerful methodologies
utilized for assessing the effectiveness of machine learning models. LOOCV
provides an accurate evaluation of model performance, although it comes at the
expense of computational overhead. On the other hand, stratified K-fold cross-
validation ensures that each fold preserves the class distribution of the original
dataset, leading to more reliable performance estimates, particularly when dealing
with imbalanced datasets.
LOOCV and stratified k-fold cross-validation are commonly used
methodologies for evaluating the effectiveness of machine learning models.
Several Python libraries are available to facilitate the implementation of these
methodologies.
scikit-learn: Scikit-learn is a widely recognized Python machine learning
library renowned for providing effective tools for the analysis and modeling of
data. It encompasses a wide range of functions and classes that facilitate the
execution of various cross-validation techniques, including but not limited to
LOOCV and stratified k-fold cross-validation.
The StratifiedKFold class, for instance, preserves the distribution of classes within the dataset.
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
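Building on the Iris data loaded above, a minimal sketch of both techniques with scikit-learn might look as follows; the use of LogisticRegression as the estimator and five stratified folds are assumptions made for illustration.

from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=200)

# Leave-one-out: each instance is held out once as the validation set
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOOCV mean accuracy:", loo_scores.mean())

# Stratified k-fold: each fold preserves the original class proportions
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
skf_scores = cross_val_score(model, X, y, cv=skf)
print("Stratified 5-fold mean accuracy:", skf_scores.mean())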
When using PyTorch models, a manual cross-validation loop may be required.
import numpy as np
from sklearn.model_selection import StratifiedKFold
import torch
import torch.nn as nn
import torch.optim as optim
# Reconstructed fold setup: the simple linear classifier, learning rate, and epoch
# count below are illustrative assumptions
X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.long)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
accuracies_skf = []
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X_t[train_index], X_t[test_index]
    y_train, y_test = y_t[train_index], y_t[test_index]
    model = nn.Linear(4, 3)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    # Train the model
    for epoch in range(100):
        optimizer.zero_grad()
        outputs = model(X_train)
        loss = criterion(outputs, y_train)
        loss.backward()
        optimizer.step()
    # Evaluate accuracy on this fold
    preds = model(X_test).argmax(dim=1)
    accuracies_skf.append((preds == y_test).float().mean().item())
mean_accuracy_skf = np.mean(accuracies_skf)
print("Mean accuracy with Stratified K-Fold (PyTorch):", mean_accuracy_skf)
These libraries provide effective and reliable resources for implementing cross-
validation methods, enabling professionals to accurately assess the performance
of models and make well-informed choices regarding model selection and fine-
tuning of hyperparameters.
Summary
The introduction pertains to fundamental concepts such as the dichotomy
between supervised and unsupervised learning, the issue of overfitting, the
concept of regularization, the assessment of evaluation metrics, and the
utilization of cross-validation.
Supervised learning entails the utilization of labeled data to train, whereas
unsupervised learning revolves around the handling of unlabeled data.
Supervised learning is concerned with making predictions based on input–
output pairs, whereas unsupervised learning is primarily focused on the
identification of patterns or structures within the data.
Classification is a form of learning that involves the prediction of discrete
class labels, while regression entails the prediction of continuous numeric
values.
For instance, classification can be applied to the prediction of species of iris
flowers, while regression can be utilized to forecast house prices.
Clustering is a procedure that involves the categorization of similar data
points based on their characteristics, while association is concerned with the
identification of patterns or relationships among variables. For example,
clustering can be utilized to group customers based on their purchasing
behavior, while association can be employed to identify frequent itemsets in
market basket analysis.
Overfitting occurs when a model becomes excessively fine-tuned to the
training data and as a result, performs inadequately when presented with
unseen data. To prevent overfitting, regularization techniques such as L1 and
L2 are implemented to penalize large parameter values.
Diverse evaluation metrics, including accuracy, precision, recall, F1 score,
MSE, MAE, and R^2 are used to assess the performance of models.
Cross-validation techniques such as k-Fold, Leave-One-Out, and stratified k-
fold are employed to estimate the performance of models on unseen data.
Python libraries such as scikit-learn, TensorFlow, PyTorch, and Yellowbrick
provide a range of tools for implementing machine learning algorithms and
techniques.
Code examples serve to demonstrate the loading of datasets, the
preprocessing of data, the training of models, and the evaluation of
performance using appropriate metrics and cross-validation techniques.
Exercise (MCQs)
1.
C) Clustering
D) Dimensionality reduction
2.
D) Supervised learning is used for classification, while unsupervised learning is
used for regression.
3.
C) Clustering
D) Association
4.
Which technique involves grouping similar data points together based on their
features?
A) Clustering
B) Association
C) Classification
D) Regression
5.
C) Recall
D) F1 score
7.
Which cross-validation technique involves dividing the dataset into k folds and
using each fold as a test set exactly once?
A) k-Fold cross-validation
B) Leave-one-out cross-validation
C) Stratified k-fold cross-validation
D) Random split cross-validation
8.
C) Matplotlib
D) Pandas
10.
B) L2 regularization
C) Recall
D) F1 score
12.
C) R-squared (R^2)
Which visualization technique helps analyze the relationship between training size
and model performance?
A) Learning curves
B) Validation curves
C) Residual plots
D) Confusion matrices
15.
Which library offers high-level APIs for building and training deep learning
models?
A) Matplotlib
B) TensorFlow
C) Scikit-learn
D) PyTorch
16.
Answers
1. A) Regularization
2. A) Supervised learning involves labeled data, while unsupervised learning
involves unlabeled data.
3. B) Regression
4. A) Clustering
5. A) Large parameter values
6. D) F1 score
7. A) k-Fold cross-validation
8. C) Underfitting and overfitting
9. B) TensorFlow
10. A) L1 regularization
11. A) Accuracy
12. A) Mean squared error (MSE)
13. B) Leave-one-out cross-validation
14. A) Learning curves
15. D) PyTorch
16. C) Principal component analysis (PCA)
Answers
1. memorize
2. labeled, unlabeled
3. discrete, continuous
4. similar, patterns or relationships
5. overfitting
6. precision, recall
7. k
8. complexity, variance
9. neural network
10. absolute value, square
11. true positive, all positive
12. Mean Squared Error, errors
13. TensorFlow
14. scikit-learn
15. dimensionality
Descriptive Questions
1. Explain the concept of overfitting in machine learning and provide strategies
to mitigate it.
2. Compare and contrast supervised and unsupervised learning techniques,
providing examples of each.
3. Describe the difference between classification and regression tasks in
machine learning, providing real-world examples.
4. Discuss the role of regularization techniques such as L1 and L2 in preventing
overfitting, and explain how they work.
5. What are evaluation metrics in machine learning, and why are they
important? Provide examples of commonly used metrics.
6. Explain the concept of cross-validation and its importance in evaluating
machine learning models. Provide examples of cross-validation techniques.
7. Discuss the bias-variance trade-off in machine learning and how it impacts
model performance. Provide examples to illustrate.
8. Describe the role of Python libraries such as scikit-learn, TensorFlow, and
PyTorch in implementing machine learning algorithms.
9. What are some common techniques used for feature selection and
dimensionality reduction in machine learning? Explain each briefly.
10. Discuss the advantages and disadvantages of different types of clustering
algorithms in unsupervised learning.
11. Given a dataset with 100 samples and 10 features, how many data points
would be in each fold in 5-fold cross-validation?
12. Calculate the mean squared error (MSE) for a regression model with actual
values [5, 10, 15] and predicted values [4, 11, 16].
13. Suppose a classification model correctly predicts 80 out of 100 positive cases
and 90 out of 100 negative cases. Calculate the accuracy, precision, recall,
and F1 score.
14. Implement k-fold cross-validation with k = 3 on a dataset using Python and
scikit-learn, and evaluate a logistic regression model.
15. Use L1 regularization with a logistic regression model to classify samples in
the Iris dataset, and tune the regularization strength parameter to optimize
model performance.
16. Compute the mean and standard deviation of a feature in the Iris dataset
using NumPy.
Chapter 5 Classic Machine Learning
Algorithms
Naive Bayes is a classifier that uses Bayes’ theorem and assumes strong
independence between features. Despite its simplicity, it performs well in practice
and is widely used for tasks like text classification and spam filtering.
Ensemble methods, such as Boosting and Bagging, combine multiple base
learners to improve predictive performance. AdaBoost trains weak learners
sequentially, focusing on instances that are difficult to classify. Gradient Boosting
constructs models step by step, optimizing for errors made by previous models.
the realms of economics, finance, social sciences, engineering, and natural
sciences, to undertake a multitude of tasks, including prognosticating forthcoming
results, comprehending associations amid variables, and assessing the impacts of
interventions or treatments. Notwithstanding its uncomplicated nature, linear
regression persists as one of the most potent and comprehensible instruments
within the statistical arsenal.
Step 1: Data Collection
To see how study hours and exam scores relate to one another, first plot the data
points on a scatter plot.
import matplotlib.pyplot as plt

hours_studied = [2, 3, 4, 5, 6]
exam_scores = [65, 70, 75, 80, 85]
plt.scatter(hours_studied, exam_scores)
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.title('Relationship Between Hours Studied and Exam Score')
plt.show()
Fig. 5.1: Relationship between Hours Studied and Exam Score.
Fig. 5.1 presents a scatter plot of the relationship between the number of hours of
study and the corresponding test scores before visually applying simple linear
regression. The data points in the graph show the distribution of these two
variables, allowing a preliminary examination of the potential relationship between
study time and academic achievement.
Now, use the simple linear regression model to fit a line to the data.
The formula we'll use is y = mx + b,
where:
y is the expected exam score,
x is the number of hours studied,
m is the slope of the line, and
b is the y-intercept.

The slope and intercept are computed as:
m = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]

b = [∑y − m(∑x)] / n

where:
n = 5
∑xy = 1550
∑x = 20
∑y = 375
∑x² = 90

m = (5 × 1550 − 20 × 375) / (5 × 90 − 20²) = (7750 − 7500) / (450 − 400) = 250 / 50 = 5

b = (375 − 5 × 20) / 5 = (375 − 100) / 5 = 275 / 5 = 55
After calculating, the slope m = 5 and y-intercept b = 55.
The line of best fit is therefore y = 5x + 55.
Fig. 5.2 shows a linear regression best fit line superimposed on a scatter plot of
test scores versus hours of practice. A line of best fit is a linear model calculated to
best represent the relationship between two variables, minimizing the deviation
between the actual data points and the predicted values on the line.
Make predictions for new data points using the equation of line. For instance, we
can forecast a student’s exam score as follows if they study for seven hours:
y =(5 × 7)+55 = 90
So, the predicted exam score for a student who studies 7 hours is 90.
In order to determine how well our model matches the data, we can finally analyze
its performance using measures like Mean Squared Error (MSE) or R2.
Python Code to evaluate the model:
import numpy as np
# Given data
hours_studied = np.array([2, 3, 4, 5, 6])
exam_scores = np.array([65, 70, 75, 80, 85])
# Predictions from the fitted line y = 5x + 55, then MSE and R^2
predicted = 5 * hours_studied + 55
mse = np.mean((exam_scores - predicted) ** 2)
r2 = 1 - np.sum((exam_scores - predicted) ** 2) / np.sum((exam_scores - exam_scores.mean()) ** 2)
print("MSE:", mse)
print("R^2:", r2)
MSE: 0.0
R^2: 1.0
Thus, the R^2 value is 1.00 and the MSE is 0.0. These measurements show how well
the model matches the data, with the number of hours studied accounting for
almost 100% of the variation in exam scores.
Let’s look at another scenario in which we wish to estimate a house’s cost
depending on its square footage. We will use the scikit-learn toolkit and Python to
do simple linear regression.
sklearn.linear_model
This module in scikit-learn contains various classes for linear models, including
regression models.
LinearRegression class
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Example data: House size (in square feet) and corresponding prices
house_sizes = np.array([800, 1000, 1200, 1500, 1800]).reshape(-1, 1) # Reshape to
house_prices = np.array([100000, 150000, 180000, 210000, 250000])
model = LinearRegression()
model.fit(house_sizes, house_prices)
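Continuing the example, the fitted model can be used to predict the price of a new house and to draw the fitted line shown in Fig. 5.3; the 1,400-square-foot query below is an arbitrary illustrative value.

# Predict the price of a hypothetical 1,400-square-foot house
new_size = np.array([[1400]])
print("Predicted price:", model.predict(new_size)[0])

# Plot the observations and the fitted regression line
plt.scatter(house_sizes, house_prices, label='Observed prices')
plt.plot(house_sizes, model.predict(house_sizes), color='red', label='Fitted line')
plt.xlabel('House size (square feet)')
plt.ylabel('Price')
plt.legend()
plt.show()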
Fig. 5.3 shows an example of using linear regression to predict house prices
based on property size. The scatter plot shows the relationship between the
square footage of the homes and their corresponding sale prices. The line of
best fit, estimated using linear regression, is superimposed on the data
points, representing a linear model that captures the relationship between
house size and the target variable, house price.
Fig. 5.3: Sample linear regression: house price prediction.
Methods like ordinary least squares (OLS), which minimize the sum of squared
discrepancies between actual and predicted values, are used to estimate the
coefficients. Metrics like MSE and R2 are used to assess the model’s performance
and determine its goodness of fit.
Multiple linear regression enables the modeling of more intricate relationships
and interactions among multiple predictors, making it a versatile tool in diverse
fields such as finance, economics, engineering, and social sciences. However, it
assumes linearity, independence of predictors, constant variance of errors, and
normally distributed residuals, all of which should be examined before
interpretation.
A multiple linear regression model’s coefficients are interpreted by evaluating
each independent variable’s effect on the dependent variable while holding the
other variables constant. Assuming that all other variables stay constant, the
coefficients show how the dependent variable changes for every unit change in the
corresponding independent variable.
Multiple linear regression can be performed using various methods, each with
its advantages and disadvantages.
A popular strategy for fitting a multiple linear regression model to a given dataset
is ordinary least squares (OLS). Finding the regression model’s coefficients, or
weights, is its main goal in order to reduce the sum of squared differences
between the dependent variable’s observed and predicted values. The OLS
method is computationally efficient and produces objective estimations of the
coefficients.
The most common method used to fit a multiple linear regression model is
ordinary least squares.
Its goal is to reduce the total sum of squared differences between the
dependent variable’s expected and observed values.
OLS estimates the coefficients (weights) for the independent variables by
determining the values that minimize the residual sum of squares (RSS).
This approach yields unbiased coefficient estimates and is computationally
efficient and simple to apply.
To execute OLS in Python, the statsmodels library can be employed, which offers a
convenient interface for fitting statistical models, including linear regression.
Below is an outline of the steps involved in performing OLS using statsmodels:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Example dataset (toy values; Y is a perfectly linear function of X1)
data = {
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 3, 4, 5, 6],
    'Y': [2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)

# Standard statsmodels workflow: add an intercept term and fit the OLS model
X = sm.add_constant(df[['X1', 'X2']])
model = sm.OLS(df['Y'], X).fit()
print(model.summary())
Fig. 5.4 provides a visual representation of the ordinary least squares (OLS)
method, which is a fundamental technique employed in linear regression analysis.
Fig. 5.4: The ordinary least squares (OLS) method.
One popular optimization approach that seeks to reduce the cost function –
such as the Mean Squared Error – is gradient descent. The coefficients are
adjusted iteratively to accomplish this.
Gradient descent, when used in multiple linear regression, updates the
coefficients by moving in the direction of the cost function's steepest descent.
Gradient descent can process the data in batches or mini-batches when faced
with massive datasets that are too big to fit into memory.
It is noteworthy that gradient descent might converge to a local minimum
rather than the intended global minimum, so it is vital to fine-tune
hyperparameters such as the learning rate.
import numpy as np
import matplotlib.pyplot as plt
# Sample data: 100 points from y = 4 + 3x plus noise (values are illustrative)
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Add an intercept column and set the Gradient Descent parameters
X_b = np.c_[np.ones((100, 1)), X]
eta = 0.1            # learning rate
n_iterations = 1000
m = 100
theta = np.random.randn(2, 1)   # random initialization
# Gradient Descent
for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
    theta -= eta * gradients
# Plot the data points and the fitted regression line
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b.dot(theta)
plt.scatter(X, y, label='Data points')
plt.plot(X_new, y_predict, color='red', label='Regression Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Gradient Descent Linear Regression')
plt.legend()
plt.show()
print("Final coefficients (intercept, slope):", theta.ravel())
Create a sample set of data called X and Y, where X represents a feature and
Y the desired variable.
np.c_[np.ones((100, 1)), X] to add an intercept term to the independent
variables.
Establish the settings for Gradient Descent, including the number of data
points (m), the number of iterations (n_iterations), and the learning rate
(eta).
Randomly initialize the coefficients theta.
Update the coefficients theta using the gradient of the cost function with
respect to the coefficients and perform Gradient Descent for a
predetermined number of iterations.
Use Matplotlib to plot the regression line and the data points.
Print the regression line’s final coefficients at the end.
Program output is a plot with data points and the regression line fitted with
gradient descent. Furthermore, the final regression line coefficients, which reflect
the slope and intercept, are printed.
Fig. 5.5: Gradient descent linear regression.
This method involves solving the normal equation θ = (XᵀX)⁻¹Xᵀy, where X is the
matrix of independent variables, y is the vector of the dependent variable,
and (XᵀX)⁻¹ is the inverse of the matrix XᵀX.
The multiple linear regression model’s coefficients can be solved using
matrix inversion in the form of an algebraic statement.
This operation can be resource-intensive when applied to extensive datasets,
owing to the requirement of calculating the inverse of a matrix. This is
particularly true when the matrix is not well conditioned.
Nevertheless, matrix inversion ensures precise solutions without the
necessity for iterative optimization, rendering it advantageous for datasets of
smaller proportions.
import numpy as np
import matplotlib.pyplot as plt
# Illustrative data; a single predictor is used so the fit can be plotted in 2-D
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]
# Solve the normal equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
plt.scatter(X, y, label='Data points')
plt.plot(X, X_b.dot(theta), color='red', label='Regression Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Multiple Linear Regression with Matrix Inversion Method')
plt.legend()
plt.show()
Selection of Method
Matrix Inversion is a suitable technique for datasets ranging from small to
moderately sized, provided that computational resources are sufficient.
Fig. 5.6 presents a visual representation of a multi-linear regression model, which
involves more than one independent variable, fitted using the Matrix Inversion
method.
Working Principle
For instance, consider a scenario where we have data on the relationship between
the temperature (x) and the rate of ice cream sales (y). Simple linear regression
may not capture the non-linear relationship adequately. In such cases, polynomial
regression, such as quadratic or cubic regression, can be used to better model the
curvature in the relationship, potentially improving predictive accuracy.
Applications
With Python, we can utilize libraries like NumPy and scikit-learn to conduct
polynomial regression. NumPy will be utilized for numerical operations, and scikit-
learn offers useful methods for regression modeling and polynomial features.
We’ll visualize the data using Matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Illustrative data: temperature vs. ice cream sales with a curved relationship
X = np.array([[10], [15], [20], [25], [30], [35]])
y = np.array([20, 35, 60, 95, 140, 195])
# Polynomial features
poly_features = PolynomialFeatures(degree=2)  # Quadratic regression
X_poly = poly_features.fit_transform(X)
# Fit a linear regression model on the polynomial features
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)
# Predictions
y_pred = poly_reg.predict(X_poly)
# Plot the data points and the fitted curve
plt.scatter(X, y, label='Data points')
plt.plot(X, y_pred, color='red', label='Quadratic fit')
plt.legend()
plt.show()
# Print coefficients
print("Intercept:", poly_reg.intercept_)
print("Coefficients:", poly_reg.coef_)
Use PolynomialFeatures from scikit-learn to generate polynomial features up
to degree 2 (quadratic regression).
Utilizing the polynomial characteristics, fit a linear regression model.
Based on the fitted model, make predictions.
Matplotlib can be used to plot the regression line and the data points.
The polynomial regression model’s coefficients should then be printed.
The program also prints the coefficients associated with the polynomial
regression model, encompassing both the intercept and the coefficients
pertaining to the polynomial features.
Working Principle
Advantages
Limitations
Applications
Example
Suppose we have a small dataset with two features, x and y, alongside a binary
target variable (Pass/Fail) which signifies whether a student passes (1) or
fails (0) an examination. Here is a condensed rendition of our dataset:
x y Pass/Fail
2.5 3.5 1
3.0 4.0 1
2.0 3.0 0
2.5 3.0 1
3.5 4.0 0
Example
regression model. The model is characterized by the coefficients θ0 = − 1, θ1 = 2,
and θ2 = 3.
For a new student with x = 2.8 and y = 3.7, we compute:

P(pass = 1 | x, y) = 1 / (1 + e^(−(−1 + 2 × 2.8 + 3 × 3.7))) = 1 / (1 + e^(−15.7)) ≈ 0.9999998
So, the logistic regression model predicts with high probability that the new
student will pass the exam.
Working Principle
to measure the model’s capacity to make accurate predictions of the true
class labels on data that has not been previously observed.
Prediction: After the completion of the training and evaluation process, the
model can be employed for the purpose of making predictions on fresh data
instances. This is achieved by classifying each instance into one of the two
classes, which is based on the feature values associated with that particular
instance.
Evaluation Metrics
Applications
Example
including transaction amount, time of transaction, and whether the transaction is
fraudulent (1) or legitimate (0). By training a binary classification model on this
dataset, we can predict whether future transactions are likely to be fraudulent,
enabling proactive measures to prevent fraudulent activities.
Let’s consider a Python code to perform binary classification using logistic
regression, along with plots and evaluation metrics. We’ll use the famous Iris
dataset available in scikit-learn, where we’ll classify whether a given iris flower is of
the “setosa” species or not.
Load the Iris dataset and retrieve solely the characteristics and target labels
linked to the “setosa” species.
Divide the dataset into two sets for the purpose of training and testing.
Employ a logistic regression model to train the data for classification.
Generate predictions using the test data and assess the model’s
performance by considering accuracy, precision, recall, F1-score, and the
confusion matrix.
Represent the data points together with the decision boundary in a visual
manner to facilitate classification visualization.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Binary target: setosa (1) vs. not setosa (0); two features keep the boundary plottable
iris = load_iris()
X = iris.data[:, :2]
y = (iris.target == 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Fig. 5.8: Binary classification (Setosa vs. Not Setosa).
The F1-score is a measure of balance between precision and recall,
calculated as the harmonic mean of the two metrics. The confusion matrix
displays the frequencies of true positive, true negative, false positive, and
false negative predictions.
Moreover, the code generates a visual representation of the decision
boundary in conjunction with the test data points, which aids in
understanding the classification. Data points classified as “setosa” are
depicted in one color, while those classified as “not setosa” are represented
in a different color.
Example
Working Principle
dataset, where it learns the underlying patterns and relationships between
the features and class labels. Throughout the training process, the algorithm
adjusts its internal parameters to minimize a predefined loss or error
function.
Model evaluation: The performance of the trained model is assessed using
various metrics, including accuracy, precision, recall, F1-score, and confusion
matrix. These metrics provide an evaluation of how effectively the model
predicts the true class labels for unseen data.
Prediction: Once the model has been trained and evaluated, it can be
utilized to make predictions on new data instances. This involves assigning
each instance to one of the multiple classes based on its feature values.
Evaluation Metrics
Applications
scikit-learn. Our objective is to categorize iris flowers into one of three species,
namely Setosa, Versicolor, or Virginica. To accomplish this objective, we will employ
a basic logistic regression model.
Load the dataset of Iris and extract the features and target labels.
Divide the dataset into sets for training and testing.
Train a logistic regression model on the training data.
Formulate predictions on the data meant for testing.
Assess the model by means of accuracy, precision, recall, F1-score, and the
matrix of confusion.
Create a visualization of the performance of the model by means of a plot of
the matrix of confusion.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Load the Iris dataset (three classes), split, train, and predict
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate (macro averaging combines the per-class scores)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("Confusion Matrix:\n", conf_matrix)

# Visualize the confusion matrix as a heatmap
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted'); plt.ylabel('Actual'); plt.show()
Fig. 5.9: Multiclass classification.
Working Principle
instances into their corresponding classes.
Regularization term: Regularization incorporates a penalty component into
the loss function, which in turn discourages the existence of coefficients with
large values. When it comes to logistic regression, the two prevalent
methods of regularization are L1 regularization (also known as Lasso) and L2
regularization (frequently referred to as Ridge).
L1 regularization (Lasso): L1 regularization incorporates the absolute
values of the coefficients into the loss function. This mechanism promotes
sparsity in the model by driving certain coefficients to zero, effectively
performing feature selection.
L2 regularization (Ridge): L2 regularization includes the squared values of
the coefficients in the loss function. It penalizes the presence of large
coefficient values, leading to their reduction and preventing overfitting.
Regularization parameter (λ): The strength of regularization is determined
by a hyperparameter denoted as λ (lambda). Higher values of λ correspond
to more intense regularization, resulting in smaller coefficient values and
simpler models.
Example
Loss = −(1/N) ∑ᵢ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)] + λ ∑ⱼ θⱼ²

where the first sum runs over the N training examples and the second over the p model coefficients.
Below, an example of Python code is presented, which demonstrates the
implementation of L2 regularization (also known as Ridge) in logistic regression.
This example includes the usage of plots and evaluation metrics. In order to apply
regularization and prevent overfitting, the Iris dataset from scikit-learn is utilized.
Load the Iris dataset and obtain the features as well as target labels.
Partition the dataset into separate training and testing sets.
Employ a logistic regression model with L2 regularization (Ridge) to train the
data.
Generate predictions on the test data.
Assess the model’s performance using accuracy, precision, recall, F1-score,
and the confusion matrix.
Illustrate the impact of regularization on the model’s decision boundary by
plotting it.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
# Load the Iris dataset and split it (split ratio and random_state are illustrative choices)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
# Logistic regression with L2 (Ridge) regularization; C is the inverse of the regularization strength
model = LogisticRegression(penalty='l2', C=1.0, max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate the model (macro averaging, since this is a multiclass problem)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("Confusion Matrix:\n", conf_matrix)
Fig. 5.10: Regularization in Logistic Regression (L2).
The confusion matrix summarizes the counts of true positive, true negative, false positive, and false negative predictions.
Moreover, the code generates a visual representation that illustrates the
decision boundary alongside the test data points.
The utilization of regularization techniques aids in smoothing the decision
boundary, thereby mitigating the risk of overfitting to the training data and
enhancing the model’s ability to generalize to new data.
Decision Trees
Definition: Decision Trees are hierarchical structures that bear semblance to trees.
Each internal node within these structures represents a decision predicated on a
specific characteristic. Meanwhile, each branch signifies the resulting outcome of
said decision. Finally, each leaf node serves as a representation of the ultimate
decision or prediction.
Working principle: Decision Trees repeatedly divide the feature space by
considering the values of input features. At each node, the algorithm selects the
feature that most effectively separates the data into similar subsets. This process
persists until either all data points are assigned to the same category or a
predetermined stopping criterion is met.
Advantages:
Interpretability: Decision Trees possess a straightforward and comprehensible
nature, rendering them appropriate for elucidating the rationale behind
predictions.
Non-parametric: They refrain from making presumptions about the underlying distribution of the data, and they accommodate both numerical and categorical data.
Disadvantages:
Prone to overfitting: Decision Trees possess the inclination to produce
excessively complex models that excessively adhere to the training data, hence
leading to insufficient generalization on unseen data.
Instability: Minor fluctuations in the data can give rise to dissimilar tree
structures, thus rendering them susceptible to noise.
Random Forests
Definition: A Random Forest consists of a group of Decision Trees. Every individual tree in the forest is trained
on a randomly chosen subset of the training data along with a randomized subset
of features. The predictions made by each tree are then combined to formulate the
ultimate prediction.
Working principle: Random Forests combine the predictive power of multiple
Decision Trees to improve generalization performance and mitigate overfitting. In
the training phase, each tree is grown independently using a random subset of the
training data and features. The final prediction is obtained by averaging the
predictions made by all the trees (for regression) or by majority voting (for
classification).
Advantages:
Enhanced generalization: Random Forests alleviate the problem of overfitting
by aggregating the predictions of numerous individual trees.
Robustness: They adeptly handle noisy data and outliers due to the amalgamation effect.
Feature importance: They furnish a metric of feature importance, affording users the ability to discern the most pertinent features for prediction.
Disadvantages:
Complexity: Random Forests exhibit greater intricacy compared to individual
Decision Trees, rendering them more arduous to interpret.
Computational cost: Training and predicting with Random Forests can incur
considerable computational expenses, particularly when confronted with
voluminous datasets.
Applications
Decision Trees and Random Forests are extensively utilized across diverse domains.
Decision Trees and Random Forests are formidable machine learning algorithms
with their individual merits and demerits. While Decision Trees proffer
transparency and simplicity, Random Forests furnish enhanced generalization
aptitude and resilience via ensemble learning.
The process of constructing Decision Trees entails iteratively dividing the dataset
according to the input feature values, resulting in the formation of a hierarchical
structure resembling a tree. In this structure, internal nodes signify decisions,
while leaf nodes indicate the ultimate prediction.
Example: Predicting Loan Approval
Suppose we have a dataset of loan applicants containing features such as
income, credit score, and employment status, and the target variable indicates
whether the loan was approved (Yes or No).
The process of constructing decision trees begins with the selection of the
optimal split, which involves identifying the feature that can most effectively
divide the dataset into separate, homogeneous subsets. An example of this
could be the credit score feature, which has the ability to create groups with
the highest level of purity. In other words, each group primarily consists of
instances belonging to a single class label, such as “Yes” or “No” for loan
approval.
Once the most advantageous division has been ascertained, a decision node
is established at the apex of the tree to symbolize the selected attribute and
its associated partition. This acts as a crucial juncture in the process of
making decisions.
After the decision node is established, the dataset is partitioned into
subgroups according to the values of the chosen characteristic. Each
subgroup corresponds to a branch emerging from the decision node,
facilitating additional examination and investigation.
The procedure of recursive division subsequently occurs, in which the
previously mentioned measures are reiterated for each subset. This iterative
strategy persists until one of the termination conditions is satisfied. These
criteria include the following:
All data points within a subset belong to the same class, indicating a
high level of homogeneity.
The maximum depth of the tree has been reached, indicating that
further splits would not yield significant improvements.
The minimum number of data points within a node has been reached,
suggesting that further division would not provide meaningful insights.
Upon the satisfaction of the stopping criteria, the initiation of the
construction of leaf nodes takes place. These leaf nodes encapsulate the
label of the majority class that is found within the subset. As a result, the tree
is empowered to generate precise predictions by leveraging the provided
data.
The construction of a decision tree thus proceeds through a sequence of systematically
executed steps, including the selection of an optimal split, the creation of decision
nodes and leaf nodes, and the iterative process of recursive splitting. These steps
ultimately result in the creation of a powerful and interpretable model for making
data-driven decisions.
→Fig. 5.11 shows the decision tree for credit card approval.
In this example, the decision tree splits the dataset based on the credit score
feature. If an applicant’s credit score is 700 or higher, they are approved for the
loan; otherwise, they are denied.
Let us examine a more intricate numerical illustration of constructing a
decision tree to anticipate whether consumers will procure a product, relying on
demographic and behavioral attributes.
Example: Predicting Purchase Decision
Suppose we have a dataset of customers containing the following features:
Age (numeric)
Gender (categorical: Male, Female)
Income (numeric)
Website Visit Duration (numeric)
Product Reviews (numeric)
And the target variable indicates whether the customer made a purchase (Yes or
No).
Selecting the Optimal Division: Our initial step involves the selection of the
characteristic that yields the most homogeneous subsets within the dataset.
As an illustration, we may discover that segmenting the data based on age
leads to subsets exhibiting the highest degree of purity.
Generation of Decision Nodes: At the apex of the tree, we establish a
decision node that represents the chosen characteristic and its division point.
For example, if the optimal division is age < 30, we generate a decision node
labeled “Age < 30?”
Subdivision of the Data: The dataset is partitioned into subgroups based on
the values of the chosen attribute. Each subgroup represents a branch
originating from the decision node.
Recursive Division: We perform the aforementioned process recursively for
each subset until one of the specified stopping conditions is fulfilled:
All data points within a subset pertain to the same class (homogeneous).
The maximum depth of the tree has been reached.
The minimum number of data points within a node has been reached.
No significant enhancement in the reduction of impurity is observed.
Generation of Terminal Nodes: Upon fulfillment of the stopping conditions,
we generate terminal nodes that contain the majority class label found
within the respective subset.
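To make these steps concrete, the following sketch grows a shallow tree on a small, made-up purchase dataset (the values, column names, and max_depth are illustrative assumptions, not the chapter's own data) and prints the learned split rules:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
# Tiny, made-up purchase dataset (illustrative values only)
data = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46, 56, 28, 30],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Income': [25000, 32000, 60000, 52000, 58000, 61000, 27000, 35000],
    'VisitDuration': [12, 8, 5, 9, 4, 6, 15, 7],
    'Purchased': ['Yes', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'Yes'],
})
# One-hot encode the categorical Gender feature and separate features from the target
X = pd.get_dummies(data.drop(columns='Purchased'), columns=['Gender'])
y = data['Purchased']
# Grow a shallow tree; max_depth acts as the stopping criterion described above
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
# Print the learned decision nodes and leaf nodes as text rules
print(export_text(tree, feature_names=list(X.columns)))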
→Fig. 5.12 depicts a decision tree model that predicts whether a consumer will
procure a specific product or not.
Fig. 5.12: Decision tree: consumers will procure a product.
In this particular illustration, the decision tree forecasts a transaction in the event
that the patron is below 30 years of age and possesses an income that falls below
50 K, or if they are of the male gender. In any other case, the decision tree
proceeds to divide based on the duration of the visit to the website, projecting a
transaction if the duration of the visit falls below 10 min.
The scikit-learn library, often employed for Decision Trees, is widely utilized in
the field. This library is a robust tool for machine learning, offering a range of
algorithms, such as Decision Trees, to construct models that can make accurate
predictions. A comprehensive elucidation of scikit-learn’s DecisionTreeClassifier
can be found below:
The accuracy of a trained DecisionTreeClassifier can be evaluated through the metrics module in the sklearn library.
Example:
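A minimal sketch of such an example, assuming the Iris dataset, a default DecisionTreeClassifier, and the accuracy_score function from the metrics module (the exact accuracy printed depends on how the data happens to be split):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load the Iris dataset and split it into training and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
# Train a decision tree classifier and report its accuracy on the test set
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))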
Output:
Accuracy: 1.0
Below is an illustrative Python code snippet showcasing the implementation of
Decision Trees for regression on the Diabetes dataset from the scikit-learn library.
The code encompasses various steps such as dataset loading, model training using
DecisionTreeRegressor, prediction generation, model evaluation, and visualization
of the decision tree.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import mean_squared_error
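# (Sketch of the listing's body using the imports above; the split ratio, random_state,
# and max_depth are illustrative choices, so the exact MSE may differ from the value below.)
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target,
                                                    test_size=0.2, random_state=42)
# Train a decision tree regressor and generate predictions on the test set
regressor = DecisionTreeRegressor(max_depth=3, random_state=42)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
# Evaluate the model with mean squared error
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
# Visualize the fitted decision tree
plt.figure(figsize=(16, 8))
plot_tree(regressor, feature_names=diabetes.feature_names, filled=True)
plt.show()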
Output:
Mean Squared Error: 4976.797752808989
The MSE computation is executed by the code in order to determine the
disparity between the target values (y_test) and the predicted values (y_pred) on
the test set. MSE represents the average squared deviation between the actual and
predicted values, thus providing an indication of the model’s precision.
→Fig. 5.13 presents a decision tree model trained on a diabetes dataset. The
tree structure consists of internal nodes representing tests or decisions based on
various features or attributes.
Fig. 5.13: Decision tree of diabetes dataset.
The act of visualizing the decision tree is facilitated by the plot_tree function from
scikit-learn. It portrays the structure of the decision tree through the utilization of
nodes and branches. Each node corresponds to a decision based on a specific
feature, while each leaf node corresponds to the predicted target value.
The data’s output and visualization provide valuable insights into the
performance and decision-making process of the Decision Tree Regression model,
which has been trained on the Diabetes dataset. Understanding the organization
of the decision tree and interpreting its nodes and branches is essential for gaining
insights into the relationships between characteristics and the target variable.
Furthermore, evaluating the model’s performance by utilizing metrics such as MSE
assists in assessing its accuracy and effectiveness in generating predictions.
Entropy and Information Gain are principles utilized in Decision Trees for the
purpose of ascertaining the most optimal attribute to divide the data at each node.
Entropy
Entropy quantifies the degree of impurity or uncertainty within a given dataset,
and it is determined through the analysis of the dataset’s distribution in terms of
various classes or categories.
Formula: For a dataset $S$ with $K$ classes and proportion $p_i$ of class $i$:
$$\text{Entropy}(S) = -\sum_{i=1}^{K} p_i \log_2(p_i)$$
Proportion of class B: $p_B = \frac{4}{10} = 0.4$
Information Gain
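Information Gain measures the reduction in entropy obtained by splitting a dataset $S$ on a feature $A$; in its standard form:
$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$
where $S_v$ is the subset of $S$ in which feature $A$ takes the value $v$. The feature that yields the highest gain is selected for the split.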
Entropy of subset $S_1$: $\text{Entropy}(S_1) = -(0.7\log_2 0.7 + 0.3\log_2 0.3) \approx 0.881$
Entropy of subset $S_2$: $\text{Entropy}(S_2) = -(0.3\log_2 0.3 + 0.7\log_2 0.7) \approx 0.881$
Total instances: $N = 14$
Number of instances with Play = Yes: $N_{\text{Yes}} = 9$
Number of instances with Play = No: $N_{\text{No}} = 5$
$$\text{Entropy}(S) = -\left(\tfrac{9}{14}\log_2\tfrac{9}{14} + \tfrac{5}{14}\log_2\tfrac{5}{14}\right)$$
$$\text{Entropy}(S) = -\left(\tfrac{9}{14}\times(-0.637) + \tfrac{5}{14}\times(-1.485)\right)$$
$$\text{Entropy}(S) = -(-0.410 - 0.530)$$
$$\text{Entropy}(S) \approx 0.940$$
Step 2: Calculate Information Gain for each Feature
For Outlook:
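Assuming the classic play-tennis distribution for Outlook (Sunny: 2 Yes / 3 No, Overcast: 4 Yes / 0 No, Rain: 3 Yes / 2 No), which is consistent with the 9 Yes / 5 No totals above, the gain would be computed as:
$$\text{Entropy}(S_{\text{Sunny}}) = \text{Entropy}(S_{\text{Rain}}) \approx 0.971, \qquad \text{Entropy}(S_{\text{Overcast}}) = 0$$
$$\text{Gain}(S, \text{Outlook}) = 0.940 - \left(\tfrac{5}{14}\times 0.971 + \tfrac{4}{14}\times 0 + \tfrac{5}{14}\times 0.971\right) \approx 0.247$$
The same procedure is then repeated for Temperature and the remaining features, and the feature with the largest gain becomes the root split.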
For Temperature:
5.3.3 Random Forests and Bagging
Example
1. When training each decision tree, instead of using all features, we randomly
select a subset of features.
2. The subset of characteristics is employed to ascertain the optimal division at
every node within the tree.
3. The variability in the process of selecting features guarantees that every
decision tree within the collection acquires distinct characteristics of the
dataset, thereby resulting in a more heterogeneous assortment of models.
4. Predictions are aggregated as in bagging.
Advantages
The Python libraries commonly employed for Random Forests and Bagging are
primarily implemented by scikit-learn, a well-known Python library for machine
learning. Within this context, we find the main libraries utilized to carry out
Random Forests and Bagging:
scikit-learn (sklearn)
numpy (np)
matplotlib.pyplot (plt)
These libraries present a comprehensive array of tools for the implementation and
assessment of Random Forests and Bagging algorithms in the Python
programming language. They offer proficient implementations of these ensemble
techniques and provide supplementary functionalities for data preprocessing,
evaluation, and visualization, rendering them indispensable for the construction of
resilient machine learning models.
Below is a Python code example demonstrating the use of Random Forests and
Bagging with the Iris dataset from scikit-learn. It includes loading the dataset,
training Random Forest and Bagging classifiers, making predictions, evaluating the
models, and visualizing the decision boundaries.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from itertools import product
# Load the Iris dataset (only the first two features, for visualization purposes)
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
# Split into training and test sets (split ratio and random_state are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Instantiate the classifiers (Bagging uses decision trees as its default base estimator)
dt = DecisionTreeClassifier(random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
bagging = BaggingClassifier(n_estimators=100, random_state=42)
# Train classifiers
dt.fit(X_train, y_train)
rf.fit(X_train, y_train)
bagging.fit(X_train, y_train)
# Make predictions
y_pred_dt = dt.predict(X_test)
y_pred_rf = rf.predict(X_test)
y_pred_bagging = bagging.predict(X_test)
# Evaluate classifiers
accuracy_dt = accuracy_score(y_test, y_pred_dt)
accuracy_rf = accuracy_score(y_test, y_pred_rf)
accuracy_bagging = accuracy_score(y_test, y_pred_bagging)
# Plot the decision boundary of each classifier side by side
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, clf, acc, title in zip(axes, [dt, rf, bagging],
                               [accuracy_dt, accuracy_rf, accuracy_bagging],
                               ['Decision Tree', 'Random Forest', 'Bagging']):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', s=20)
    ax.set_title(f'{title} (accuracy: {acc:.2f})')
plt.show()
→Fig. 5.14 shows the plots for comparisons of accuracy in Decision Tree, Random
Forest, and Bagging.
Fig. 5.14: Comparisons of accuracy in Decision Tree, Random Forest, and Bagging.
The code uses the Iris dataset, which contains 150 samples with 4 features each
(sepal length, sepal width, petal length, and petal width). For visualization
purposes, only the first two features are used.
Output
The accuracy of the Decision Tree, Random Forest, and Bagging classifiers on the test data is displayed.
Decision boundaries of each classifier are plotted to visualize their
performance in separating different classes.
Explanation
The Decision Tree, Random Forest, and Bagging classifiers are first instantiated. By employing the fit method, each classifier is trained on the
training data.
Predictions are generated on the test data through the utilization of the
predict method.
Using the accuracy_score function from scikit-learn, accuracy scores are
computed for each classifier.
Eventually, decision boundaries are plotted to visually represent how each
classifier separates the classes within the feature space.
Visualization
Decision boundaries are plotted for each classifier, showing regions where each
class is predicted. Different colors represent different classes, and data points are
plotted as markers. Decision boundaries help visualize the classification
performance of each classifier in the feature space.
Types of Support Vector Machines
Working Principle
Data Preparation: We start by preparing our dataset, consisting of features (x1 and
x2) and corresponding class labels.
Model Training:
Model Evaluation:
Once the training process of the model is completed, we proceed to assess
its performance by utilizing a range of metrics including accuracy, precision,
recall, and F1-score.
Additionally, we employ the technique of visualizing the decision boundary to
gain insights into the effectiveness of the Support Vector Machine (SVM) in
segregating the classes within the feature space.
The execution of Support Vector Machine (SVM) commonly involves the utilization
of Python libraries, which are predominantly provided by scikit-learn (sklearn), a
popular machine learning library in the Python programming language. In the
subsequent discussion, we will introduce the main libraries employed in the
implementation of SVM.
scikit-learn (sklearn)
numpy (np)
matplotlib.pyplot (plt)
In SVM, matplotlib.pyplot is frequently employed to illustrate decision boundaries, support vectors, and other pertinent graphical representations used to assess the model.
These libraries offer a comprehensive set of tools for implementing and evaluating
SVM algorithms in Python. They provide efficient implementations of SVM models,
support for data manipulation and numerical computations, and functionalities for
visualization and model evaluation. By leveraging these libraries, users can easily
build, train, and evaluate SVM models for various classification and regression
tasks.
Objective: The aim of the Linear Support Vector Machine (SVM) is to identify
the hyperplane possessing the utmost margin, thereby distinguishing the
classes within the feature space.
Hyperplane: The equation w⋅x + b = 0 serves as a mathematical
representation of the hyperplane, in which the weight vector w is orthogonal
to the hyperplane, the feature vector x is representative of the features, and
the bias term b has a significant role.
Margin: The margin is defined as the separation between the hyperplane
and the closest support vectors of each class, in terms of distance. The
objective of the linear SVM is to optimize this margin, aiming to maximize it.
Optimization: The formulation of Linear SVM involves an optimization
problem that aims to minimize the norm of the weight vector w, while
ensuring that the constraint yi(w⋅xi + b) ≥ 1 holds for all training instances
(xi,yi).
Classification: Upon completion of training, Linear SVM assigns new data
points to classes based on their position in relation to the hyperplane. Data
points falling on one side are assigned to one class, while those on the other
side are assigned to the other class.
Kernel trick: While the performance of Linear SVM is commendable in the
case of linearly separable data, the utilization of kernel functions allows for
the transformation of data into a higher-dimensional space. This
transformation effectively renders the data linearly separable, thereby
facilitating the classification of data that is not linearly separable.
Regularization: Linear SVM encompasses regularization parameters to
handle outliers and achieve a trade-off between maximizing the margin and
minimizing classification errors.
Scalability: Linear SVM is characterized by its efficiency and scalability,
making it a suitable choice for handling large datasets that have high-
dimensional feature spaces.
The Linear Support Vector Machine (SVM) algorithm exhibits remarkable efficacy in
executing binary classification tasks, demonstrating robust resilience to errors,
substantial computational efficiency, and the capability to manage voluminous
datasets. It discerns the most suitable hyperplane that effectively discriminates
between distinct classes within the feature space, thereby contributing to its
extensive utilization in various machine learning tasks.
Let us contemplate a binary classification problem which entails two distinct
features, denoted as x1 and x2. In the scenario at hand, we are presented with a
dataset that can be described as follows:
x1 x2 Class
1 2 0
2 3 0
3 4 1
4 5 1
Data Preparation:
The dataset is prepared by incorporating features (x1 and x2) alongside their
corresponding class labels (0 or 1).
Model Training:
Model Evaluation:
The evaluation of the Linear Support Vector Machine (SVM) model is conducted
by making predictions on the test data and comparing them with the actual labels.
To measure the performance of the model, different metrics such as accuracy,
precision, recall, and F1-score are computed. Additionally, the decision boundary,
which represents the hyperplane that effectively separates the two classes, is
presented visually.
Let’s consider a Python code to perform Linear SVM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
# Generate two Gaussian clusters centered at (-2, -2) and (2, 2), labeled -1 and 1
np.random.seed(42)
X = np.vstack([np.random.randn(20, 2) - 2, np.random.randn(20, 2) + 2])
y = np.array([-1] * 20 + [1] * 20)
# Train a linear SVM and read off the hyperplane parameters w and b
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
# Mesh grid over the data range, used to color the decision regions
x0_min, x0_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x1_min, x1_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx0, xx1 = np.meshgrid(np.linspace(x0_min, x0_max, 200), np.linspace(x1_min, x1_max, 200))
Z = clf.predict(np.c_[xx0.ravel(), xx1.ravel()]).reshape(xx0.shape)
plt.contourf(xx0, xx1, Z, alpha=0.3, cmap=plt.cm.Paired)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.title('Data After SVM with Hyperplane')
plt.xlabel('X1')
plt.ylabel('X2')
# Plot hyperplane
plt.plot([x0_min, x0_max], [-(w[0]*x0_min + b)/w[1], -(w[0]*x0_max + b)/w[1]], 'k--')
plt.xlim(x0_min, x0_max)
plt.ylim(x1_min, x1_max)
plt.show()
→Fig. 5.15 presents a scatter plot visualization of the dataset prior to applying the
Support Vector Machine (SVM) algorithm. The plot displays the data points, each
representing an individual observation or instance, distributed across two
dimensions or features. These features are represented by the x-axis and y-axis,
respectively.
→Fig. 5.16 presents a scatter plot visualization of the dataset after applying the
Support Vector Machine (SVM) algorithm. Similar to the previous scatter plot (Fig.
5.15), the data points are plotted in the feature space, with the x-axis and y-axis
representing two chosen features or dimensions.
Generate Synthetic Data: The code creates two sets of randomly generated 2D points.
The first collection of data points is drawn from a standard normal distribution centered at (−2, −2) and denoted as class −1.
The second collection of data points is drawn from a standard normal distribution centered at (2, 2) and denoted as class 1.
The acquired data is stored in the variable X, with the corresponding
labels being stored in the variable y.
Train Linear SVM:
We create an SVM classifier object using svm.SVC with kernel = ‘linear’,
indicating a linear kernel.
The classifier undergoes training on the synthetic data with the
utilization of the fit method.
Plot Data After SVM:
We create a contour plot to visualize the decision boundary (hyperplane)
obtained after applying SVM.
The decision boundary is plotted along with the support vectors and
data points.
Support vectors are marked with circles, and the hyperplane is plotted as
a dashed line.
The title and axis labels are added to the plot for clarity.
This code exemplifies the utilization of scikit-learn in order to execute Linear SVM
classification on fabricated data. Initially, it generates fabricated data and graphs it
prior to the application of SVM. Subsequently, it proceeds to train a Linear SVM
model and graphically illustrates the data post SVM application, showcasing the
decision boundary and support vectors.
Let us examine an alternative Python code that executes Linear SVM on a
dataset obtained from scikit-learn, specifically the load_iris dataset.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
# Load the iris dataset and keep only the first two features
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
# Create a pipeline to scale the data and apply SVM
clf = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svc', LinearSVC(C=1, loss='hinge', random_state=42))
])
# Fit the pipeline
clf.fit(X, y)
# Mesh grid used to draw the decision regions
x0_min, x0_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x1_min, x1_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x0_min, x0_max, 0.02), np.arange(x1_min, x1_max, 0.02))
# Plot the original data (before SVM) and the data after SVM side by side
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
plt.title('Data before SVM')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1, edgecolor='k')
plt.subplot(1, 2, 2)
plt.title('Data after SVM')
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Set1, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1, edgecolor='k')
plt.show()
Loads the iris dataset from the scikit-learn library and proceeds to exclusively
choose the initial two characteristics.
Creates a pipeline that first scales the data using StandardScaler and then
applies a linear SVM using LinearSVC.
Fits the pipeline to the data.
Creates a mesh grid to plot the data before and after applying SVM.
Plots the original data (before SVM) in the first subplot.
Plots the data after applying SVM in the second subplot, including the
decision boundary (hyperplane) using plt.contourf.
Displays the plots.
The above code displays two subplots. The first subplot shows the original data
points before applying SVM, colored according to their class labels. The second
subplot shows the data points after applying linear SVM, along with the decision
boundary (hyperplane) separating the classes as shown below.
→Fig. 5.17 presents a comparative visualization of the Iris dataset before and
after applying the Support Vector Machine (SVM) algorithm.
The Kernel Support Vector Machine (SVM) is a significant advancement over the conventional SVM algorithm: it allows the SVM to classify data that cannot be separated linearly, which is accomplished by implicitly mapping the data into a feature space of higher dimensionality.
The primary objective of Kernel SVM is to identify the optimal hyperplane
in the feature space, capable of separating the classes. This is achieved by non-
linearly transforming the data using kernel functions.
The technique known as the kernel trick plays a crucial role in allowing the
computation of dot products in the feature space of higher dimensionality, all the
while avoiding the explicit transformation of the data. Various kernel functions,
such as the linear, polynomial, radial basis function (RBF), and sigmoid functions,
are employed to gauge the similarity between different data points.
By transforming the data into a space of higher dimensionality, kernel SVM
enables the establishment of decision boundaries that are non-linear in nature
within the original feature space. This allows for the linear separation of classes.
Several types of kernels are available for Kernel SVM, most commonly the linear, polynomial, RBF, and sigmoid kernels mentioned above.
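For reference, the widely used RBF kernel measures the similarity between two points $x$ and $x'$ as
$$K(x, x') = \exp\left(-\gamma\,\lVert x - x' \rVert^{2}\right)$$
where the hyperparameter $\gamma$ controls how quickly the similarity decays with distance.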
The optimization problem for Kernel SVM is tackled through techniques such as
the Sequential Minimal Optimization (SMO) algorithm or quadratic programming.
These methods ascertain the optimal hyperplane in the higher-dimensional space,
either by maximizing the margin between classes or by minimizing classification
errors.
In terms of scalability, Kernel SVM can prove computationally demanding for
large datasets, particularly when non-linear kernels and high-dimensional feature
spaces are involved.
Kernel SVM finds widespread application in diverse machine learning tasks,
encompassing classification, regression, and anomaly detection, where non-linear
relationships are apparent in the data.
To conclude, Kernel SVM stands as a versatile and effective algorithm for
handling non-linear relationships in data, enabling the construction of intricate
decision boundaries in the feature space. Its capacity to implicitly transform data
using kernel functions renders it suitable for a vast array of machine learning
applications.
Below is a Python code that demonstrates the implementation of Kernel SVM.
Steps:
First generate synthetic data using the make_circles function from scikit-
learn, creating circular clusters with some noise.
Plot the generated data before applying SVM using plt.scatter.
Train a model with Radial Basis Function (RBF) kernel using Support Vector
Classifier (SVC) from the scikit-learn library in order to conduct Kernel
Support Vector Machine (SVM) training.
Following the completion of SVM model training, it is necessary to visually
represent the data through a plot and superimpose the decision boundary
derived from the SVM model using the contourf method.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# Generate synthetic data
X, y = make_circles(n_samples=100, noise=0.1, factor=0.4, random_state=42)
# Plot data before SVM
plt.figure(figsize=(10, 5))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
plt.title('Data Before SVM')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
# Train Kernel SVM
svm_model = SVC(kernel='rbf', gamma='auto')
svm_model.fit(X, y)
# Plot data after SVM
plt.figure(figsize=(10, 5))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
# Plot decision boundary
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50),
np.linspace(ylim[0], ylim[1], 50))
Z = svm_model.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap='coolwarm', alpha=0.3)
plt.title('Data After SVM with Hyperplane')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
→Fig. 5.18 presents a scatter plot visualization of the dataset prior to applying the
kernel Support Vector Machine (SVM) algorithm. The plot displays the data points,
each representing an individual observation or instance, distributed across two
dimensions or features. These features are represented by the x-axis and y-axis,
respectively.
The first plot displays the synthetic data generated using the make_circles
function.
Data points belonging to different classes are represented by different
colors.
The original feature space does not allow for linear separation due to its
circular distribution.
→Fig. 5.19 presents a scatter plot visualization of the dataset after applying the
kernel Support Vector Machine (SVM) algorithm. Similar to the previous scatter
plot (Fig. 5.18), the data points are plotted in the feature space, with the x-axis and
y-axis representing two chosen features or dimensions.
Fig. 5.19: Data after Kernel SVM with hyperplane.
The second plot shows the same synthetic data after applying Kernel SVM.
The data points are once again depicted using distinct colors to indicate
varying classes.
Furthermore, the SVM model’s learned decision boundary is illustrated as a
contour plot.
The decision boundary effectively separates the circular clusters into
different regions, demonstrating the non-linear separation capability of
Kernel SVM.
Let us now examine an alternative Python code that executes the kernel Support
Vector Machine (SVM) algorithm, utilizing the Radial Basis Function (RBF) kernel,
on a dataset known as Iris, sourced from the scikit-learn library.
Steps:
Loads the iris dataset from the scikit-learn library and specifically chooses
solely the initial two characteristics.
Creates a pipeline that first scales the data using StandardScaler and then
applies a kernel SVM with an RBF kernel using SVC(kernel = ‘rbf’).
Fits the pipeline to the data.
Creates a mesh grid to plot the data before and after applying SVM.
Plots the original data (before SVM) in the first subplot.
Plots the data after applying SVM in the second subplot, including the
decision boundary using plt.contourf.
Displays the plots.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # We only take the first two features
y = iris.target
# Create a pipeline to scale the data and apply SVM
clf = Pipeline([
('scaler', StandardScaler()),
('svc', SVC(kernel='rbf', C=10, gamma=0.1, random_state=42))
])
# Fit the pipeline
clf.fit(X, y)
# Plot the data before applying SVM
x0_min, x0_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x1_min, x1_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x0_min, x0_max, 0.1),
np.arange(x1_min, x1_max, 0.1))
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
plt.title('Data before SVM')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1, edgecolor='k')
# Plot the data after applying SVM
plt.subplot(1, 2, 2)
plt.title('Data after SVM')
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Set1, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1, edgecolor='k')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()
→Fig. 5.20 presents a comparative visualization of the Iris dataset before and after
applying the Kernel Support Vector Machine (SVM) algorithm. The code displays
two subplots. The first subplot shows the original data points before applying SVM,
colored according to their class labels. The second subplot shows the data points
after applying kernel SVM with an RBF kernel, along with the decision boundary
separating the classes.
The kernel type refers to the specific type of kernel function utilized, such as
linear, polynomial, or radial basis function (RBF).
The regularization parameter (C) plays a crucial role in balancing the
optimization of margin maximization and classification error minimization.
Additionally, kernel-specific parameters, such as the degree of the
polynomial kernel or the gamma parameter for the RBF kernel, further
contribute to the customization and fine-tuning of the kernel function.
This hyperparameter tuning step is crucial as it contributes to improving the generalization capability of the
SVM models and achieving higher predictive accuracy. To efficiently explore the
hyperparameter space and identify the optimal configuration, various techniques
such as grid search, random search, and Bayesian optimization can be employed.
The following Python code exemplifies the process of hyperparameter tuning
for SVM using grid search. Additionally, it includes plots that depict the
performance of the SVM model before and after the hyperparameter tuning stage.
For this demonstration, synthetic data will be used, which is generated through the
implementation of the make_classification function from the scikit-learn library.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
# Step 1: Generate Synthetic Data
X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
n_clusters_per_class=1, n_redundant=0, random_state=42)
# Step 2: Split into Training and Testing Sets (split ratio chosen arbitrarily)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Step 3: Plot Data Before SVM and Hyperparameter Tuning
plt.figure(figsize=(10, 5))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
plt.title('Data Before SVM and Hypertuning')
plt.show()
# Step 4: Define the Hyperparameter Grid (grid values chosen for illustration)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1], 'kernel': ['rbf']}
# Step 5: Tune Hyperparameters with Grid Search
svm_model = GridSearchCV(SVC(), param_grid, cv=5)
svm_model.fit(X_train, y_train)
# Step 6: Plot Data After SVM Using the Tuned Model's Decision Boundary
plt.figure(figsize=(10, 5))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50),
np.linspace(ylim[0], ylim[1], 50))
Z = svm_model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap='coolwarm', alpha=0.3)
plt.title('Data After SVM with Hyperplane')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
# Step 7: Display Best Hyperparameters
print("Best Hyperparameters:", svm_model.best_params_)
# Step 8: Evaluate Model Performance
accuracy = svm_model.score(X_test, y_test)
print("Accuracy:", accuracy)
Steps:
→Fig. 5.21 presents a scatter plot visualization of the dataset prior to applying the
Support Vector Machine (SVM) algorithm and performing hyperparameter tuning.
The plot displays the data points, each representing an individual observation or
instance, distributed across two dimensions or features. These features are
represented by the x-axis and y-axis, respectively.
Fig. 5.21: Data before SVM and Hypertuning.
→Fig. 5.22 presents a scatter plot visualization of the dataset after applying the
Support Vector Machine (SVM) algorithm and performing hyperparameter tuning.
Similar to the previous scatter plot (Fig. 5.21), the data points are plotted in the
feature space, with the x-axis and y-axis representing two chosen features or
dimensions.
Fig. 5.22: Data after SVM and Hypertuning.
Initialization: The first step is to determine the number of clusters (k) and
randomly initialize the centroids of these clusters.
Assignment Step: Next, we allocate each data point to the nearest centroid
by utilizing a distance metric, commonly the Euclidean distance.
Update Step: We then proceed to recalculate the centroids of the clusters by
computing the mean of all data points assigned to each cluster.
Convergence: Finally, we repeat the assignment and update steps until
convergence criteria are satisfied, such as reaching a maximum number of
iterations or observing minimal change in centroids.
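The four steps above can be sketched directly in NumPy. The following minimal implementation is a simplified sketch (it assumes Euclidean distance and does not handle empty clusters) that mirrors the initialization, assignment, update, and convergence stages:
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Convergence: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels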
Initialization Methods:
Clustering is an unsupervised learning approach, eliminating the requirement of labeled instances.
In clustering, data points are partitioned into clusters, with the aim of
maximizing intra-cluster similarity and minimizing inter-cluster similarity. Common
clustering algorithms include K-Means, Hierarchical Clustering, and DBSCAN, each
with its approach to defining clusters.
K-Means iteratively assigns data points to the nearest cluster centroid and
updates centroids until convergence.
Hierarchical Clustering constructs a hierarchical structure of clusters through
the repeated process of merging or dividing according to a linkage criterion.
DBSCAN identifies clusters based on density, grouping points in high-density
regions while considering low-density regions as noise.
Silhouette Score: The silhouette score quantifies how similar a data point is to its own cluster relative to other clusters. A higher
silhouette score is indicative of superior clustering performance. Silhouette scores
can be calculated for various values of k, and the optimal number of clusters is
determined by selecting the value of k that maximizes the average silhouette score
across all data points.
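As a minimal sketch of how the average silhouette score might be compared across candidate values of k with scikit-learn (the synthetic dataset and the range of k below are illustrative assumptions):
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
# Small synthetic dataset with three blobs (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
# The silhouette score is undefined for k = 1, so candidate values start at 2
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42).fit_predict(X)
    print(f"k={k}: average silhouette score = {silhouette_score(X, labels):.3f}")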
Gap Statistics: Gap statistics are utilized to compare the dispersion present
within clusters with that of a reference null distribution in order to ascertain the
optimal quantity of clusters. This process entails generating reference datasets
that possess random uniform distributions and subsequently calculating the gap
statistic for every value of k. The value of k that maximizes the gap statistic
signifies the optimal quantity of clusters.
Expert Review and Iteration: It is crucial to review the results obtained from
various methods and consider additional factors such as interpretability,
practicality, and the specific objectives of the analysis. The selection process may
involve iterating through different values of k and evaluating the clustering results
until a satisfactory solution is achieved.
Visual Inspection: Visualizing the data and clustering results can provide
insights into the underlying structure and assist in selecting the optimal quantity
of clusters. Techniques such as scatter plots, heatmaps, or dendrograms can be
employed to visualize clusters and assess their quality.
Cross-Validation: Cross-validation methodologies, including k-fold cross-
validation, can be employed to assess the robustness and efficacy of clustering
algorithms across various k values in terms of stability and generalizability.
In conclusion, the process of selecting the quantity of clusters involves a
combination of statistical methods, domain knowledge, and expert judgment. It is
an iterative process that requires careful consideration of various factors to
determine the optimal quantity of clusters for meaningful and interpretable
results.
Consider a numerical illustration in which a dataset of two-dimensional points
is available, and the objective is to ascertain the optimal number of clusters
through the utilization of the elbow method.
Suppose we have the following dataset:
X = [(2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (20, 22), (21, 23), (22, 24)]
import numpy as np
import matplotlib.pyplot as plt
X = np.array([(2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (20, 22), (21, 23), (22, 24)])
plt.scatter(X[:, 0], X[:, 1])
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Dataset Visualization')
plt.show()
The scatter plot illustrates that the dataset encompasses points that have the
potential to develop clusters. Subsequently, we shall employ the K-Means
clustering algorithm for various values of k and employ the elbow method to
ascertain the most optimal number of clusters.
→Fig. 5.23 presents scatter plot of data set used for selecting number of
clusters.
from sklearn.cluster import KMeans
# Define a range of k values
k_values = range(1, 6)
inertia = []
# Apply K-Means for each k value and compute inertia
for k in k_values:
kmeans = KMeans(n_clusters=k, random_state=42)
kmeans.fit(X)
inertia.append(kmeans.inertia_)
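A short continuation of this listing then plots the recorded inertia values against k to reveal the elbow shown in Fig. 5.24:
# Plot the within-cluster sum of squares (inertia) against k
plt.plot(k_values, inertia, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Within-cluster sum of squares (inertia)')
plt.title('Elbow Method for Optimal k')
plt.show()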
→Fig. 5.24 illustrates the application of the Elbow method, a widely used technique for determining the optimal number of clusters (k) in k-means clustering. The elbow method aids us in discerning the juncture at which the pace of decline in the within-cluster sum of squares (WCSS) decelerates. In the present scenario, the elbow occurs at k = 3, so we can deduce that the most suitable number of clusters for this dataset is 3.
Fig. 5.24: Elbow method for optimal k.
as well as the labels assigned to each data point.
5. Proceed to plot the data following K-Means clustering, where each data point
is color-coded based on its assigned cluster, and the cluster centers are
denoted by red crosses.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
# Step 1: Generate Synthetic Data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
# Step 2: Plot Data Before K-Means
plt.figure(figsize=(10, 5))
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.title('Data Before K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
# Step 3: Apply K-Means Clustering
kmeans = KMeans(n_clusters=4, random_state=0)
kmeans.fit(X)
# Step 4: Get Cluster Centers and Labels
cluster_centers = kmeans.cluster_centers_
cluster_labels = kmeans.labels_
# Step 5: Plot Data After K-Means
plt.figure(figsize=(10, 5))
plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, s=50, cmap='viridis')
plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1], c='red', s=200, alpha=0.5, marker='X')
plt.title('Data After K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
→Fig. 5.25 presents a scatter plot visualization of the dataset prior to applying the
k-means clustering algorithm. The plot displays the data points, each representing
an individual observation or instance, distributed across two dimensions or
features. These features are represented by the x-axis and y-axis, respectively.
Fig. 5.25: Data before K-Means clustering.
→Fig. 5.26 presents a scatter plot visualization of the dataset after applying the k-means clustering algorithm. Similar to the previous scatter plot (Fig. 5.25), the data points
are plotted in the feature space, with the x-axis and y-axis representing two chosen
features or dimensions.
However, in this figure, an additional component is superimposed onto the
scatter plot: the cluster assignments resulting from the k-means algorithm.
Fig. 5.26: Data after K-Means clustering.
The initial plot illustrates the synthetic data prior to the implementation of K-
Means clustering.
The second plot displays the data after clustering with K-Means. Each point is
colored according to its assigned cluster, and the centroids of the clusters
are marked in red. This visualization helps us understand how K-Means has
grouped the data into clusters based on similarity.
Let us examine an additional Python script that performs the k-means clustering
algorithm on a dataset obtained from scikit-learn, while simultaneously producing
graphical depictions of the data before and after the utilization of the k-means
clustering algorithm.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # We only take the first two features
# Plot the data before applying k-means
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
plt.title('Data before K-Means')
plt.scatter(X[:, 0], X[:, 1], cmap='viridis')
# Apply k-means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_
# Plot the data after applying k-means
plt.subplot(1, 2, 2)
plt.title('Data after K-Means')
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
# Plot the cluster centers
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=100, alpha=0.5)
plt.show()
Steps:
The first subplot shows the original data points before applying k-means
clustering, colored with a continuous colormap (viridis in this case).
The second subplot shows the data points after applying k-means clustering,
where each point is colored according to its assigned cluster. Additionally,
the cluster centers are plotted as larger red dots.
→Fig. 5.27 presents a visual comparison of the IRIS dataset before and after
applying the k-means clustering algorithm.
PCA transforms high-dimensional data into a smaller set of uncorrelated components, thereby reducing computational complexity, eliminating noise, and facilitating data visualization within a lower-dimensional space.
The covariance between two features $x_i$ and $x_j$ is computed as
$$\mathrm{Cov}(x_i, x_j) = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki} - \mu_i)(x_{kj} - \mu_j)$$
where $n$ represents the total number of samples, $x_{ki}$ and $x_{kj}$ denote the values of the $i$th and $j$th features of the $k$th sample, and $\mu_i$ and $\mu_j$ correspond to the means of features $x_i$ and $x_j$, respectively.
Eigendecomposition
Projection
Finally, the data is projected onto the designated principal components in order to
acquire the representation with reduced dimensions. This projection is
accomplished by performing the multiplication of the original data matrix with the
matrix consisting of the chosen eigenvectors (principal components).
Mathematical Representation
$$Y = X \cdot V_k$$
where $V_k$ is the matrix of the first $k$ eigenvectors.
PCA is an influential instrument utilized for the pre-processing of data,
visualization, and the extraction of features. Its extensive employment can be
witnessed across diverse domains, such as machine learning, signal processing,
and image analysis. Through its mathematical underpinnings, PCA affords valuable
perspectives into the inherent structure of data with high dimensions,
consequently facilitating the computation of reduced-dimensional representations
in an efficient manner.
PCA finds multiple applications, such as the visual representation of data,
reduction of noise, extraction of features, and compression of data. It is
extensively utilized as a preliminary step in machine learning pipelines, particularly
when handling high-dimensional data, for example, images, signals, or textual
data.
Below are the steps to perform PCA:
1. Data preprocessing:
Standardization of the data is achieved by subtracting the mean value
from each feature and subsequently dividing by the standard deviation.
This crucial step guarantees that all features are uniformly scaled and
possess equal significance in the subsequent analysis.
2. Calculate the covariance matrix:
Compute the covariance matrix of the standardized data, whereby the
covariance matrix denotes the measure of the variance and covariance
among the features.
3. Calculate eigenvectors and eigenvalues:
Compute the eigenvectors and associated eigenvalues of the covariance
matrix.
The eigenvectors depict the principal components, while the eigenvalues
indicate the extent to which each principal component captures the
variance.
4. Sort eigenvectors by eigenvalues:
Sort the eigenvectors in descending order based on their corresponding
eigenvalues.
The principal component that captures the maximum variance in the
data is the eigenvector associated with the highest eigenvalue.
5. Select principal components:
Decide the number of principal components to retain by considering the
extent of variance you wish to preserve or the intended reduction in
dimensionality.
One common approach is to choose the top k principal components that
capture a certain percentage (e.g., 95%) of the total variance in the data.
6. Project data onto principal components:
Project the original data onto the selected principal components by
multiplying the original data with the chosen eigenvectors.
This procedure converts the data from the initial feature space to the
subsequent subspace specified by the principal components.
7. Dimensionality reduction:
The transformed data now exists in the lower-dimensional subspace
defined by the selected principal components.
If one were to select k principal components, the dimensionality of the
data would be diminished from the initial number of features to k
dimensions.
8. Optional: reconstruction or visualization:
Optionally, it is possible to restore the initial data from the diminished
representation by performing the multiplication of the transformed data
with the transposed chosen eigenvectors.
Visualize the transformed data in the lower-dimensional subspace for
exploration or interpretation purposes.
Dimensionality reduction is a crucial technique in the realm of machine learning
and data analysis, especially when dealing with data that contains a large number
of dimensions. The main objective of dimensionality reduction is to reduce the
number of features or dimensions in a dataset while retaining as much relevant
information as possible.
The ultimate goal of dimensionality reduction is to transform data with a high
number of dimensions into a space with fewer dimensions, making it more
manageable and computationally efficient. By doing so, it helps overcome the
challenges posed by the “curse of dimensionality,” which occurs when the
complexity of the data increases exponentially with the number of dimensions.
This, in turn, leads to issues like overfitting, increased computational costs, and
data sparsity.
There are two primary approaches to dimensionality reduction: feature
selection and feature extraction. Feature selection involves choosing a subset of
the original features, while feature extraction involves creating new features by
combining or transforming the original ones.
PCA is a commonly used technique for feature extraction. It identifies the
directions of maximum variance in the data and projects the data onto a lower-
dimensional subspace defined by these directions, known as principal
components.
Other notable techniques for dimensionality reduction include Linear
Discriminant Analysis (LDA), which aims to maximize the separability between
classes, and t-Distributed Stochastic Neighbor Embedding (t-SNE), a non-linear
technique suitable for visualizing high-dimensional data in a lower-dimensional
space.
Dimensionality reduction has the potential to improve the performance of
machine learning models by eliminating irrelevant or redundant features, reducing
noise, and enhancing the interpretability of the data. However, it is important to
strike a balance between reducing dimensions and preserving essential
information for the specific task at hand. Dimensionality reduction techniques find
extensive use in diverse domains such as image and signal processing, text
mining, bioinformatics, and recommendation systems, among others. They play a
fundamental role in data preprocessing, visualization, and feature engineering
pipelines within the context of machine learning workflows.
Feature selection and feature extraction are two frequently utilized
methodologies within the domains of machine learning and data analysis, which
aim to reduce the dimensionality of datasets and amplify the efficacy of models.
Herein lies an extensive elucidation of each approach:
Feature Selection
Feature selection involves the careful selection of a subset of the original features
from the dataset, with the exclusion of any features that are deemed irrelevant or
redundant. The primary objective is to ameliorate model performance by curbing
overfitting, diminishing computational complexity, and augmenting
interpretability. The techniques employed for feature selection are commonly classified into three types: filter methods, which rank features using statistical measures computed independently of any model; wrapper methods, which search over feature subsets by repeatedly training and evaluating a model; and embedded methods, which perform selection as part of model training itself (for example, L1-regularized models).
Feature Extraction
Feature extraction is the process of converting the initial features into a fresh
collection of features through the act of combining or altering them, all the while
preserving the crucial information. The primary objective is to decrease the
dimensionality of the dataset while maintaining the utmost amount of pertinent
information possible. There are two main classifications for feature extraction
methods: linear and non-linear techniques:
Non-linear techniques, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Isomap, produce novel characteristics by
capturing non-linear associations within the data. These approaches prove to
be especially advantageous when it comes to representing high-dimensional
data in lower-dimensional spaces while simultaneously conserving local
structures.
Eigenvectors
An eigenvector of a square matrix $A$ is a non-zero vector $v$ whose direction is left unchanged by the transformation, satisfying
$$A \cdot v = \lambda \cdot v$$
where $\lambda$ denotes a scalar referred to as the eigenvalue corresponding to $v$.
Eigenvalues
Eigenvalues are the scalars that represent the factor by which the
corresponding eigenvector is stretched or compressed during a linear
transformation.
Each eigenvector of a matrix A corresponds to a unique eigenvalue.
Eigenvalues have a significant impact on the determination of the
characteristics of linear transformations, such as the process of diagonalizing
matrices, analyzing stability, and finding solutions to differential equations.
Applications
Image and signal processing: Eigenvectors are used for image compression
and noise reduction.
Structural analysis: Eigenvalues determine the stability and natural
frequencies of structures.
Machine learning: Eigenvectors and eigenvalues are used in dimensionality
reduction, feature extraction, and clustering algorithms.
import numpy as np
# Original data
X = np.array([[2.5, 2.4, 0.5, 0.7],
[2.1, 1.9, 1.8, 1.3],
[1.6, 1.6, 1.5, 1.1],
[1.0, 0.9, 1.0, 0.7],
[0.5, 0.6, 0.7, 0.5]])
# Step 1: Calculate the mean of each feature (dimension)
mean = np.mean(X, axis=0)
print("Mean:", mean)
# Step 2: Subtract the mean from each observation to center the data
X_centered = X - mean
# Step 3: Calculate the covariance matrix
covariance_matrix = np.cov(X_centered.T)
print("\nCovariance Matrix:")
print(covariance_matrix)
# Step 4: Calculate the eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)
print("\nEigenvalues:", eigenvalues)
# Step 5: Sort the eigenvectors in descending order of their corresponding eigenvalues
sorted_indices = np.argsort(eigenvalues)[::-1]
sorted_eigenvectors = eigenvectors[:, sorted_indices]
sorted_eigenvalues = eigenvalues[sorted_indices]
print("\nSorted Eigenvectors:")
for i, eigenvector in enumerate(sorted_eigenvectors.T):
    print(f"PC{i+1}: {eigenvector}")
# Step 6: Project the centered data onto the new subspace defined by the principal components
X_projected = X_centered @ sorted_eigenvectors
print("\nProjected Data:")
print(X_projected)
This code follows the standard steps for dimensionality reduction using PCA, as broken down in the step comments above.
→Fig. 5.28 displays the results of Principal Component Analysis (PCA), a technique
used for dimensionality reduction, showcasing the transformed dataset where
data points are represented in a lower-dimensional space while preserving the
most significant variance across the original features.
Fig. 5.28: Principal component analysis (PCA) for dimensionality reduction.
The code prints out the mean, covariance matrix, eigenvalues, sorted eigenvectors
(labeled as PC1, PC2, etc.), and the projected data X_projected.
Below is an exemplification of a Python code that showcases the
implementation of PCA on an IRIS dataset. Additionally, it demonstrates the
visualization of the outcomes both pre- and post-PCA.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Instantiate PCA and fit the data
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Plot original data
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title('Original Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar()
# Plot data after PCA
plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title('Data After PCA')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar()
plt.tight_layout()
plt.show()
This code generates two subplots: one showing the original data and another showing the data after PCA. The data points are colored based on the target labels (species) to visualize any patterns or clusters before and after PCA.
→Fig. 5.29 illustrates the IRIS dataset both before and after Principal
Component Analysis (PCA) transformation. The plot likely demonstrates how PCA
reduces the dimensionality of the data while retaining the most important
information, aiding in visualizing the dataset’s structure and potential clustering
patterns.
Bayes’ Theorem
P(class | data) = (P(data | class) · P(class)) / P(data)
Where:
P(class | data) is the posterior probability of the class given the data, P(data | class) is the likelihood, P(class) is the prior probability of the class, and P(data) is the evidence.
The “naive” assumption is derived from the notion that Naive Bayes assumes
conditional independence among all features, given the class label. This
simplification enables the calculation of P(data|class) to be performed in the
subsequent manner:
Mathematically, this is expressed as:
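P(data | class) = P(x1 | class) · P(x2 | class) · … · P(xn | class)
where x1, x2, …, xn are the individual features, so the posterior probability of a class is proportional to P(class) multiplied by the product of the per-feature likelihoods.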
Bernoulli Naive Bayes: Assumes binary features, often used for document
classification tasks.
Let’s consider a simple example of using Naive Bayes for email spam classification.
We consider a dataset consisting of emails, each classified as either “spam” or
“not spam” (ham). Our objective is to construct a Naive Bayes classifier that can
forecast whether a new email is spam or not by analyzing the presence or absence
of specific words within the email’s body.
Let’s say we have the following training data:
['you', 'have', 'won', 'free', 'vacation', 'make', 'money', 'fast', 'secret', 'system']
In order to educate the Naive Bayes classifier, it is necessary to compute the prior
probabilities P(spam) and P(ham), as well as the likelihood probabilities
P(word|spam) and P(word|ham) for each individual word.
Given that an equivalent amount of spam and ham emails are present in the
training data, the prior probabilities would be as follows:
For the likelihood probabilities, we can calculate the frequency of each word in the
spam and ham emails. For example:
Once all the requisite probabilities have been estimated from the training data, a
new email can be classified by computing the posterior probability P(spam|email)
and P(ham|email) utilizing Bayes’ theorem and the assumption of naive
independence. The prediction is made by selecting the class with the highest
posterior probability.
For example, let’s say we have a new email with the text: “Get rich quickly with
our system!” To classify this email, we would calculate P(spam|email) and
P(ham|email) using the estimated probabilities from the training phase, and
choose the class with the higher probability.
This particular example demonstrates the fundamental operations of Naive
Bayes in the context of text classification. In practical applications, more
sophisticated techniques such as feature selection, smoothing, and handling of
non-occurring events are frequently employed to enhance the performance of
Naive Bayes classifiers.
Let’s consider a numerical example of using Naive Bayes for classification.
Suppose we possess a dataset comprising weather observations, wherein each
instance is categorized as either “Play” or “Don’t Play” contingent upon four
characteristics: Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool),
Humidity (High, Normal), and Wind (Strong, Weak).
Here’s the training dataset:
Next, we calculate the likelihood probabilities for each feature value given the
class:
For example, let’s calculate P(Outlook = Sunny|Play) and P(Outlook =
Sunny|Don’t Play):
We can similarly calculate the likelihood probabilities for all feature values and
classes.
Now, let’s classify a new instance with the feature values: Outlook = Overcast,
Temperature = Cool, Humidity = High, Wind = Strong.
In order to determine the posterior probability of each class, the utilization of
Bayes’ theorem and the assumption of feature independence, commonly referred
to as the “naive” assumption, is employed.
We don’t need to calculate P(features) since it’s a scaling factor, and we’re only
interested in the relative probabilities.
Since P(Play|features) > P(Don’t Play|features), we would classify this new
instance as “Play.”
This example illustrates the calculations involved in training and using a Naive
Bayes classifier. In practice, techniques like Laplace smoothing are often employed
to handle zero probabilities and prevent overfitting.
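The same style of calculation can be reproduced with scikit-learn's CategoricalNB. The sketch below is illustrative only: the handful of weather rows is a hypothetical table (the book's actual training table is not reproduced here), and alpha=1.0 corresponds to Laplace smoothing.
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB
# Hypothetical weather rows (Outlook, Temperature, Humidity, Wind) for illustration only
rows = [["Sunny", "Hot", "High", "Weak"], ["Sunny", "Hot", "High", "Strong"],
        ["Overcast", "Hot", "High", "Weak"], ["Rain", "Mild", "High", "Weak"],
        ["Rain", "Cool", "Normal", "Weak"], ["Rain", "Cool", "Normal", "Strong"],
        ["Overcast", "Cool", "Normal", "Strong"], ["Sunny", "Mild", "High", "Weak"]]
labels = ["Don't Play", "Don't Play", "Play", "Play", "Play", "Don't Play", "Play", "Don't Play"]
# Encode the categorical feature values as integers, as CategoricalNB expects
encoder = OrdinalEncoder()
X = encoder.fit_transform(rows)
clf = CategoricalNB(alpha=1.0)  # alpha=1.0 gives Laplace smoothing
clf.fit(X, labels)
# Classify the new instance: Outlook=Overcast, Temperature=Cool, Humidity=High, Wind=Strong
new_instance = encoder.transform([["Overcast", "Cool", "High", "Strong"]])
print(clf.predict(new_instance))
print(clf.predict_proba(new_instance))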
Applications
5.7.1 Gaussian Naive Bayes
The Gaussian Naive Bayes technique is a variation of the Naive Bayes algorithm,
which proves to be highly advantageous in scenarios involving continuous or
numerical characteristics. It operates under the assumption that the continuous
attributes conform to a Gaussian (normal) distribution for every class.
The key steps in Gaussian Naive Bayes are:
1. Calculate the prior probabilities of each class, P(class), from the training data.
2. For every continuous feature, the mean and standard deviation should be
computed for that particular feature in each class.
3. Assuming a Gaussian distribution, the likelihood of a feature value x given a
class is calculated using the probability density function:
P(x | class) = (1 / (√(2π) · σ)) · exp(−(x − μ)² / (2σ²))
where the mean and standard deviation of the feature in that class are represented by μ and σ, respectively.
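As a quick illustration of this density (a sketch, not one of the book's listings), the likelihood of a single feature value can be evaluated directly; the mean and standard deviation below are illustrative numbers:
import numpy as np
def gaussian_likelihood(x, mu, sigma):
    # P(x | class) under a Gaussian with class-specific mean mu and standard deviation sigma
    return (1.0 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
# Likelihood of observing x = 6.2 for a class with mean 5.9 and standard deviation 0.4
print(gaussian_likelihood(6.2, 5.9, 0.4))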
The Gaussian assumption makes Gaussian Naive Bayes particularly effective for
continuous data, as it captures the distribution of feature values within each class.
However, it may not perform well if the feature distributions are significantly non-
Gaussian or if there are strong dependencies between features.
Like regular Naive Bayes, Gaussian Naive Bayes is computationally efficient
and can be a good baseline classifier, especially when dealing with high-
dimensional continuous data. However, more advanced techniques like kernel
density estimation or semi-supervised learning may be required for complex data
distributions.
Let’s consider a numerical example of using Gaussian Naive Bayes for
classification.
Suppose we possess a collection of measurements for iris flowers, in which
each individual is categorized as one of three distinct species: Setosa, Versicolor, or
Virginica. The attributes included in this dataset encompass sepal length, sepal
width, petal length, and petal width, all of which are expressed in centimeters.
Here’s a small subset of the training dataset:
First, we calculate the prior probabilities of each class from the training data:
For Versicolor:
For Virginica:
Now, let’s classify a new instance with the feature values: Sepal Length = 6.2, Sepal
Width = 3.4, Petal Length = 5.4, Petal Width = 2.3.
To calculate the posterior probability for each class, we use Bayes’ theorem
and the Gaussian probability density function for the continuous features:
P(Setosa|data) = (P(6.2|Setosa) · P(3.4|Setosa) · P(5.4|Setosa) · P(2.3|Setosa) · P(Setosa)) / P(data)
P(Versicolor|data) = (P(6.2|Versicolor) · P(3.4|Versicolor) · P(5.4|Versicolor) · P(2.3|Versicolor) · P(Versicolor)) / P(data)
P(Virginica|data) = (P(6.2|Virginica) · P(3.4|Virginica) · P(5.4|Virginica) · P(2.3|Virginica) · P(Virginica)) / P(data)
Plugging in the calculated means, standard deviations, and prior probabilities,
we get:
Since P(Virginica|data) is the highest, we would classify this new instance as the
Virginica species.
This illustration showcases the computations entailed in the training and
utilization of a Gaussian Naive Bayes classifier for continuous attributes. In
practical application, methods such as feature scaling and the management of
absent values may be necessary to enhance performance.
Below is a Python code that implements Gaussian Naive Bayes classification on
a dataset.
Steps:
Firstly, it is essential to import the required libraries for the task at hand.
These libraries include numpy, which is used for performing numerical operations; matplotlib.pyplot, which is used for plotting; make_blobs from sklearn.datasets, which allows us to generate a synthetic dataset; GaussianNB from sklearn.naive_bayes, which is the Gaussian Naive Bayes classifier; train_test_split from sklearn.model_selection, which is used to split the data into train and test sets; and accuracy_score from sklearn.metrics, which is used to calculate the classification accuracy.
To generate a synthetic dataset with two clusters, we can utilize the
make_blobs function. This function will generate a dataset with 1,000
samples, two features, and two classes.
In order to evaluate the performance of our model, it is necessary to split the
data into training and test sets. To achieve this, we can make use of the
train_test_split function. In this case, we will allocate 80% of the data for
training and the remaining 20% for testing.
Before applying the Naive Bayes classifier to the training data, it is beneficial
to visualize the data and observe the separability of the classes. This can be
accomplished by plotting the training data using the plt.scatter function.
Next, we will create an instance of the GaussianNB classifier and fit it to the
training data using the gnb.fit(X_train, y_train) command.
Once the classifier has been trained, we can proceed to make predictions on
the test set. This can be done using the y_pred = gnb.predict(X_test)
command.
To assess the performance of our model, we need to calculate the
classification accuracy. This can be achieved by utilizing the accuracy_score
function and passing in the true labels (y_test) and the predicted labels
(y_pred). The resulting accuracy can then be printed for further analysis.
Finally, we can visualize the test data after applying the Naive Bayes
classifier. This can be done by plotting the test data and coloring the points
according to the predicted class labels (y_pred).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=0)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # random_state value assumed
# Plot the data before applying Naive Bayes
plt.figure(figsize=(10, 6))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='viridis', edgecolor='k')
plt.title('Data before applying Naive Bayes')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
# Train the Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Make predictions on the test set
y_pred = gnb.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
# Plot the data after applying Naive Bayes
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap='viridis', edgecolor='k')
plt.title('Data after applying Naive Bayes')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
→Fig. 5.30 presents a scatter plot visualizing the dataset before applying the Naïve
Bayes classification algorithm.
Accuracy: 0.94
The code gives two plots: one showing the original data before applying Naive
Bayes, and another showing the data after applying Naive Bayes, with the points
colored according to their predicted class labels. You should also see the
classification accuracy printed in the console.
The output will depend on the random state used to generate the synthetic
dataset, but you should expect a reasonably high accuracy since the data is well-
separated into two clusters.
This illustration exhibits the utilization of the Gaussian Naive Bayes classifier in
the Python programming language. Additionally, it showcases the visualization of
the data prior to and subsequent to the application of the algorithm. Furthermore,
the evaluation of the performance of the algorithm is conducted by employing the
accuracy metric.
→Fig. 5.31 shows a scatter plot visualizing the dataset after applying the Naïve
Bayes classification algorithm. This plot demonstrates how the algorithm has
classified the data points into different classes based on their features, providing
insights into the effectiveness of the Naïve Bayes classifier in separating the data
points according to their characteristics.
Let’s consider another Python code example that implements Gaussian Naive
Bayes classification on the iris dataset from scikit-learn.
Steps:
Firstly, the necessary libraries should be imported. These include numpy for performing numerical operations, matplotlib.pyplot for generating plots, load_iris from sklearn.datasets for loading the iris dataset, GaussianNB from sklearn.naive_bayes for implementing the Gaussian Naive Bayes classifier, train_test_split from sklearn.model_selection for splitting the data into train and test sets, and accuracy_score from sklearn.metrics for calculating the classification accuracy.
To load the iris dataset, the load_iris() function from scikit-learn can be
utilized. To focus on visualization, only the first two features, namely sepal
length and sepal width, are selected.
In order to divide the data into training and test sets, the train_test_split
function is employed. In this particular case, 80% of the data is allocated for
training purposes, while the remaining 20% is designated for testing.
To visualize the training data prior to applying the Naive Bayes algorithm,
the plt.scatter function can be used. This will provide a visual representation
of the data and the separability of the classes.
By creating an instance of the GaussianNB classifier and fitting it to the
training data using the gnb.fit(X_train, y_train) syntax, the algorithm can be
implemented.
To make predictions on the test set, the y_pred = gnb.predict(X_test) code can
be executed.
Once the predictions are made, the accuracy of the classification can be
calculated using the accuracy_score(y_test, y_pred) function. The result can
then be printed.
Finally, to visualize the test data after applying the Naive Bayes algorithm,
the predicted class labels, y_pred, can be used to color the points. This can be
achieved by plotting the test data and assigning colors based on the
predicted labels.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the iris dataset
iris = load_iris()
X = iris.data[:, :2] # We only take the first two features for visualization
y = iris.target
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # random_state value assumed
# Plot the data before applying Naive Bayes
plt.figure(figsize=(10, 6))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='viridis', edgecolor='k')
plt.title('Iris Data before applying Naive Bayes')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.show()
# Train the Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Make predictions on the test set
y_pred = gnb.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
# Plot the data after applying Naive Bayes
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap='viridis', edgecolor='k')
plt.title('Iris Data after applying Naive Bayes')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.show()
→Fig. 5.32 displays a visualization of the IRIS dataset before applying the Naïve
Bayes classification algorithm.
Fig. 5.32: IRIS data before Naïve Bayes.
Accuracy: 0.90
The first plot shows the iris data before applying Naive Bayes, where the points
are colored according to their true class labels. The second plot shows the data
after applying Naive Bayes, with the points colored according to their predicted
class labels. The performance of the Gaussian Naive Bayes classifier in
distinguishing the three classes using the sepal length and sepal width features is
deemed to be satisfactory.
→Fig. 5.33 presents a visualization of the IRIS dataset after applying the Naïve
Bayes classification algorithm.
Fig. 5.33: IRIS data after Naïve Bayes.
The Multinomial Naive Bayes algorithm is a variation of the Naive Bayes algorithm that is well suited for text classification tasks, in which the features represent the occurrence frequency of words or tokens within a document.
The key steps in Multinomial Naive Bayes are:
1. Calculate the prior probabilities of each class from the training data, P(class).
2. For each class, calculate the likelihood of observing a particular word count vector by modeling it as a multinomial distribution:
1. The probabilities pi are estimated from the training data as (count of word i in class + alpha) / (total word count in class + alpha × vocabulary size), where alpha is a smoothing parameter to avoid zero probabilities.
2. To categorize a novel record, one must compute the posterior probability for
each category by utilizing Bayes’ theorem:
Review Label
“great movie loved it” Positive
“terrible acting awful plot” Negative
“amazing visuals good story” Positive
“boring predictable waste of time” Negative
To commence, it is imperative to construct a lexicon comprising solely distinctive
terms derived from the dataset: Vocabulary = [“great”, “movie”, “loved”, “it”,
“terrible”, “acting”, “awful”, “plot”, “amazing”, “visuals”, “good”, “story”,
“boring”, “predictable”, “waste”, “of”, “time”]
Next, we calculate the prior probabilities of each class from the training data:
Now, we calculate the likelihood probabilities P(word|class) for each word and
class. To avoid zero probabilities, we use additive smoothing with α = 1.
For the Positive class:
Now, let’s classify a new review: “good movie but boring plot.”
We determine the posterior probability for each category by employing Bayes’
theorem and the multinomial likelihood.
P(Positive|review) = (P(good|Positive) · P(movie|Positive) · P(but|Positive) · P(boring|Positive) · P(plot|Positive) · P(Positive)) / P(review)
P(Negative|review) = (P(good|Negative) · P(movie|Negative) · P(but|Negative) · P(boring|Negative) · P(plot|Negative) · P(Negative)) / P(review)
Plugging in the calculated probabilities, we get:
Since P(Positive|review) > P(Negative|review), we would classify this new review as
“Positive.”
This particular example serves to demonstrate the computations that are
required in the process of training and utilizing a Multinomial Naive Bayes
classifier for the purpose of text classification. In practice, additional preprocessing
steps like tokenization, stopword removal, and feature selection may be required
for better performance.
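As a quick cross-check of this worked example (a sketch, not one of the book's listings), the same four reviews can be passed through scikit-learn's CountVectorizer and MultinomialNB, with alpha=1.0 matching the additive smoothing used above; the book's own example follows afterwards.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
reviews = ["great movie loved it",
           "terrible acting awful plot",
           "amazing visuals good story",
           "boring predictable waste of time"]
labels = ["Positive", "Negative", "Positive", "Negative"]
# Turn each review into a vector of word counts over the vocabulary
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
clf = MultinomialNB(alpha=1.0)
clf.fit(X, labels)
new_review = ["good movie but boring plot"]
print(clf.predict(vectorizer.transform(new_review)))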
Below is the Python code for Multinomial Naive Bayes:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
# Sample data
X = np.array([['young', 'yes', 'no', 'good'],
['young', 'yes', 'no', 'poor'],
['old', 'yes', 'yes', 'good'],
['old', 'yes', 'yes', 'poor'],
['young', 'no', 'no', 'good'],
['young', 'no', 'yes', 'poor'],
['old', 'no', 'yes', 'good'],
['old', 'no', 'yes', 'poor']])
y = np.array([1, 0, 1, 0, 1, 0, 0, 0])
# Label encode categorical data
label_encoder = LabelEncoder()
X_encoded = np.empty(X.shape)
for i in range(X.shape[1]):
    X_encoded[:, i] = label_encoder.fit_transform(X[:, i])
# Split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)  # random_state value assumed
# Model training
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
# Predictions
y_pred = mnb.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ", accuracy)
Accuracy: 0.5
arsenal, extensively employed in diverse domains to confront arduous prediction
tasks and achieve cutting-edge performance.
performance and robustness.
Below is the Python code for the Bagging algorithm; the steps are listed first, and a sketch of the corresponding code follows this list:
Generate synthetic data with two features and two classes using the make_classification function from sklearn.
Then, proceed to split the data into training and testing sets using the train_test_split method.
Before applying Bagging, visualize the data through plotting.
Prior to Bagging, train a Random Forest classifier on the training data and make predictions on the test set.
Display the accuracy before Bagging.
After Bagging, train an ensemble of base classifiers with Bagging and plot the data again, colored by the bagged model's predictions.
Lastly, exhibit both the plots, before and after Bagging.
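The full listing for these steps is not reproduced here; the fragment below, which starts at "# Plot after bagging", assumes that a fitted bagging ensemble named model and its test accuracy bag_acc already exist. A minimal sketch of the earlier steps, using decision trees as the base learners (consistent with the plot description further below) and illustrative parameter values, might look like this:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Synthetic two-feature, two-class dataset
X, y = make_classification(n_samples=500, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Baseline: a single random forest before bagging
rf = RandomForestClassifier(n_estimators=10, random_state=42)
rf.fit(X_train, y_train)
rf_acc = accuracy_score(y_test, rf.predict(X_test))
# Plot before bagging
plt.figure(figsize=(10, 7))
plt.title("Before Bagging: Accuracy= " + str(rf_acc))
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
# Bagging ensemble of decision trees
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
model.fit(X_train, y_train)
bag_acc = accuracy_score(y_test, model.predict(X_test))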
# Plot after bagging
plt.figure(figsize=(10, 7))
plt.title("After Bagging: Accuracy= "+str(bag_acc))
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.scatter(X[:,0], X[:,1], c=model.predict(X))
plt.show()
→Fig. 5.34 illustrates a plot representing the dataset before applying the bagging
ensemble technique.
→Fig. 5.35 demonstrates a plot representing the dataset after applying the
bagging ensemble technique.
Fig. 5.35: Accuracy after Bagging.
The result of executing this code will yield two graphical representations:
Plot before Bagging: Displays the spread of the data points prior to the
implementation of Bagging.
Plot after Bagging: Depicts the spread of the data points after the application
of Bagging with numerous Decision Tree classifiers.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Generate synthetic dataset
X, y = make_classification(n_samples = 1000, n_features=10, n_classes=2, random_state=42)  # random_state value assumed
# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # random_state value assumed
# Base GBM model
gbc = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100, max_depth=3, random_state=42)  # random_state value assumed
gbc.fit(X_train, y_train)
# Accuracy
acc = gbc.score(X_test, y_test)
print("Accuracy: ", acc)
# Plot Before Boosting
plt.figure(figsize = (6, 6))
plt.scatter(X[:,0], X[:,1], c = y)
plt.title("Before Boosting")
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
# Plot After Boosting
plt.figure(figsize = (6, 6))
plt.scatter(X[:,0], X[:,1], c = gbc.predict(X))
plt.title("After Boosting")
plt.xlabel('Feature 1'); plt.ylabel('Feature 2')
plt.show()
Accuracy: 0.995
→Fig. 5.36 illustrates a plot representing the dataset before applying the
boosting ensemble technique.
Fig. 5.36: Accuracy before Boosting.
a. Train a base or weak learner model using the weighted training data.
b. Evaluate the model’s efficacy on the training data through a meticulous
comparison between its predictions and the corresponding actual
values.
c. Compute the alpha parameter, which denotes the significance of the
model based on its error.
d. Increase the weights of the samples that the model incorrectly
predicted.
e. Decrease the weights of the samples that the model correctly predicted.
3. Repeat the aforementioned procedures for a designated quantity of
estimators or iterations.
4. The final model combines each weak learner model using the alpha weights.
→Fig. 5.37 illustrates a plot representing the dataset after applying the boosting ensemble technique.
Fig. 5.37: Accuracy after Boosting.
The key concept behind AdaBoost is that it assigns more importance to the
samples that were misclassified by the previous models in each round. Hence, the
subsequent models aim to rectify the errors made by their predecessors.
The alpha parameter governs the contribution of each weak learner to the final
strong learner model. A higher alpha signifies better models.
By training successive models on the errors made by previous models and
combining multiple weak models, AdaBoost mitigates bias and variance, resulting
in improved performance.
Some advantages of AdaBoost include its ease of implementation, minimal
need for tuning, and compatibility with various simple weak learner models,
thereby yielding strong performance.
The Python library used for AdaBoost is
sklearn.ensemble.AdaBoostClassifier or
sklearn.ensemble.AdaBoostRegressor, depending on whether you’re working
on a classification or regression problem. These classes are included in the scikit-
learn library, which is widely recognized as a prominent machine learning library in
the Python programming language.
Here’s a brief explanation of the key components of the AdaBoostClassifier
and AdaBoostRegressor classes:
1. AdaBoostClassifier:
This class is used for classification tasks.
It implements the AdaBoost algorithm for classification.
The main parameters include base_estimator, n_estimators,
learning_rate, and algorithm.
The base_estimator parameter specifies the base learner to be used for
training (default is a decision tree).
The parameter n_estimators determines the quantity of boosting
rounds, which refers to the utilization of weak learners.
The contribution of each weak learner to the final prediction is regulated
by the learning_rate.
algorithm specifies the algorithm used for boosting (SAMME or
SAMME.R).
After undergoing training, the model possesses the ability to forecast
the classification labels of novel data points.
2. AdaBoostRegressor:
This class is used for regression tasks.
It implements the AdaBoost algorithm for regression.
It has similar parameters to AdaBoostClassifier, but it is used for
predicting continuous target variables instead of discrete class labels.
After training, the model can predict the continuous target values of new
data points.
numerous less influential models. The following are the essential stages:
The key difference from AdaBoost is that each new model in Gradient Boosting tries to correct the residual errors from the previous step rather than focusing on misclassified examples.
The learning rate shrinks the contribution of each model to prevent overfitting, and the tree depth is also kept small.
Gradient descent-style improvement along the gradient of the residual errors leads to a strong overall prediction: combining multiple additive models yields robust performance despite weak individual models (see the short sketch below).
Its advantages are built-in regularization and the ability to handle a variety of data, but it can overfit if not properly tuned.
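A minimal from-scratch sketch of this residual-correction idea for regression (illustrative only: a constant initial prediction, shallow regression trees, and an assumed learning rate of 0.1):
import numpy as np
from sklearn.tree import DecisionTreeRegressor
# Toy one-dimensional regression data
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 10, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)
learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # stage 0: a constant prediction
trees = []
for _ in range(50):
    residuals = y - prediction                     # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)      # a small (weak) tree
    tree.fit(X, residuals)                         # fit the tree to the residuals
    prediction += learning_rate * tree.predict(X)  # add its shrunken contribution
    trees.append(tree)
print("Training MSE after 50 stages:", np.mean((y - prediction) ** 2))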
The Python library used for Gradient Boosting is
sklearn.ensemble.GradientBoostingClassifier for classification tasks and
sklearn.ensemble.GradientBoostingRegressor for regression tasks. This library
constitutes an integral component of the scikit-learn (sklearn) package, renowned
for its widespread adoption as a Python-based machine learning library.
Here’s a brief explanation of the key components of the Gradient Boosting
library:
the loss function (loss), among other parameters.
4. Ensemble learning: Gradient Boosting is a method of ensemble learning
that sequentially combines several weak learners, most commonly decision
trees. The subsequent models in this technique aim to rectify the mistakes
made by their predecessors, ultimately leading to the development of a
powerful learner.
5. Gradient Boosting algorithm: Gradient Boosting constructs a sequence of
trees in an ensemble. In each iteration, a new tree is fitted to the residuals
from the previous iteration. The process of fitting involves minimizing a loss
function through the use of gradient descent.
6. Feature Importance: Gradient Boosting offers a feature importance
attribute, enabling users to comprehend the significance of each feature in
the prediction procedure.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
n_clusters_per_class=1, n_redundant=0, random_state=42)
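# The lines below are an assumed reconstruction of the part of the listing that is
# not reproduced here (the train/test split and the two boosting models); the
# parameter values are illustrative.
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train an AdaBoost classifier with shallow decision trees as weak learners
ada_boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42)
ada_boost.fit(X_train, y_train)
# Train a Gradient Boosting classifier
grad_boost = GradientBoostingClassifier(n_estimators=50, random_state=42)
grad_boost.fit(X_train, y_train)
# Make predictions on the test set
y_pred_ada = ada_boost.predict(X_test)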
y_pred_grad = grad_boost.predict(X_test)
# Calculate accuracy scores
accuracy_ada = accuracy_score(y_test, y_pred_ada)
accuracy_grad = accuracy_score(y_test, y_pred_grad)
# Plotting
plt.figure(figsize=(18, 5))
# Before Boosting
plt.subplot(1, 3, 1)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm', marker='o')
plt.title('Before Boosting')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
# After AdaBoost
plt.subplot(1, 3, 2)
plt.scatter(X_train[:, 0], X_train[:, 1], c=ada_boost.predict(X_train), cmap='coolwarm', marker='o')
plt.title('After AdaBoost')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
# After Gradient Boosting
plt.subplot(1, 3, 3)
plt.scatter(X_train[:, 0], X_train[:, 1], c=grad_boost.predict(X_train), cmap='coolwarm', marker='o')
plt.title('After Gradient Boosting')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.tight_layout()
plt.show()
Summary
Linear regression is utilized to model the correlation between a dependent
variable and one or more independent variables by means of a linear
equation.
Logistic regression is employed for binary or multiclass classification tasks,
estimating the likelihood of a sample belonging to a specific class.
Decision trees and random forests are utilized for both classification and
regression tasks, creating structures resembling trees to make decisions
based on the values of features.
Support vector machines are employed for classification and regression
tasks, particularly effective in spaces with a high number of dimensions and
with intricate datasets.
k-Means clustering is used for unsupervised learning tasks, clustering data
points into k clusters based on their similarity. Principal component analysis
is applied for dimensionality reduction, transforming data with a high
number of dimensions into a lower-dimensional space while retaining the
variation.
Naive Bayes is used for classification tasks, particularly in text categorization
and spam filtering, based on Bayes’ theorem with strong assumptions of
independence.
Ensemble methods like boosting and bagging are used to improve model
performance by combining multiple weak learners into a strong learner
through either boosting (iterative improvement) or bagging (parallel
improvement).
The gradient descent method is applied for optimizing model parameters by
iteratively minimizing a cost function using gradient information.
The matrix inversion method is utilized for solving linear regression
problems by directly computing the parameter estimates using matrix
operations.
Polynomial regression is used for capturing non-linear relationships between
features and targets by fitting polynomial functions to the data.
Regularization in logistic regression is employed to prevent overfitting in
logistic regression models by penalizing large parameter values.
Clustering basics are used to identify natural groupings in data, assisting in
data exploration and segmentation tasks.
Multiclass classification is employed to forecast numerous categories within
a solitary undertaking, frequently implemented in the domains of image
recognition, natural language processing, and sentiment analysis.
The comprehensive analysis of bagging and boosting is employed to
enhance the resilience and generalizability of models by amalgamating
predictions from an assortment of models, each of which is trained on
distinct subsets of the data.
Exercise (MCQs)
1.
c) Regression
d) Dimensionality reduction
2.
c) Clustering tasks
d) Dimensionality reduction
3.
c) Clustering
d) Dimensionality reduction
4.
b) Low-dimensional spaces
c) Dimensionality reduction
d) Clustering
6.
c) Model evaluation
d) Clustering
7.
c) Decision trees
d) SVM
8.
b) A single strong learner
c) Linear models
d) Non-linear models
9.
d) Dimensionality reduction
11.
c) Clustering
d) Dimensionality reduction
12.
b) Decrease overfitting
c) Increase bias
d) Decrease variance
13.
c) Reinforcement learning
d) Semi-supervised learning
14.
c) Continuous values
d) Non-numeric data
15.
c) Increase overfitting
d) Reduce variance
Answers
1. c) Regression
2. b) Classification
3. b) Classification
4. a) High-dimensional spaces
5. d) Clustering
6. b) Dimensionality reduction
7. a) Bayes’ theorem
8. a) Multiple weak learners
9. a) Minimizing a cost function
10. c) Solving linear regression
11. b) Modeling non-linear relationships
12. b) Decrease overfitting
13. b) Unsupervised learning
14. b) Multiple classes
15. a) Improve model performance
similar data points together.
14. Multiclass classification involves predicting the ____________ of an input from
more than two classes.
15. Bagging and boosting are ensemble methods used to ____________ model
performance by combining multiple weak learners.
Answers
1. dependent, independent
2. classification
3. features
4. hyperplane
5. distinct
6. dimensionality reduction, visualization
7. Bayes’, independent
8. weak, performance
9. optimize
10. inverse
11. non-linear, complex
12. overfitting, cost
13. unsupervised
14. class
15. improve
Descriptive Questions
1. Describe the difference between supervised and unsupervised learning.
2. Explain the concept of overfitting and how it can be mitigated.
3. Describe the difference between classification and regression tasks.
4. Explain the main idea behind decision trees and how they make predictions.
5. Describe the concept of support vectors in Support Vector Machines (SVM).
6. Explain the k-means clustering algorithm and how it works.
7. Describe the purpose of Principal Component Analysis (PCA) in
dimensionality reduction.
8. Explain how Naive Bayes classifiers work and the assumptions they make.
9. Describe the difference between bagging and boosting ensemble methods.
10. Explain the concept of regularization in logistic regression and its purpose.
11. Use the Boston house prices dataset (from sklearn.datasets import load_boston; note that this loader has been removed from recent scikit-learn versions, so an alternative such as fetch_california_housing may be used instead) to predict house prices based on features like average number of rooms and crime rate.
12. Use the Iris dataset (from sklearn.datasets import load_iris) to classify iris
flowers into different species based on sepal and petal measurements.
13. Use the Iris dataset to build a decision tree classifier to predict the species of
an iris flower.
14. Generate a synthetic dataset using make_blobs from sklearn.datasets and
apply k-means clustering to identify clusters.
15. Apply PCA to the Iris dataset to reduce the dimensionality of the data and
visualize the transformed data.
Chapter 6 Advanced Machine Learning
Techniques
High Accuracy
GBTs can often achieve high accuracy on a variety of tasks, even with complex
datasets.
Flexibility
GBTs can be adapted to a wide range of tasks by changing the type of weak
learner, the loss function, and other hyperparameters.
Interpretability
Unlike some other machine learning models, GBTs can be relatively easy to
interpret, which can be helpful for understanding why the model is making certain
predictions.
Quick Response
If we work with XGBoost, very little data preprocessing is needed: XGBoost handles missing values natively, and most tabular data can be used for training and testing the model with minimal preparation.
Faster Prediction
Drawing 6.1: Decision tree.
There is no doubt that GBTs perform very well in the field of machine learning, yet, like a coin, they have two sides and come with some disadvantages. In further development, boosting can be optimized with an adequate loss function. It works on the concept that "a big set of weak learners can create one strong learner." XGBoost grew out of the gradient boosted decision tree algorithm; the decision tree has always been known as a weak learner, which is why so many experiments have been built on it. Still, we can say that some drawbacks remain in these algorithms.
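The listing below assumes that X_train, X_test, y_train, and y_test already exist. As an assumed stand-in (the book's own dataset is not shown at this point), a quick preparation step on a generic regression dataset might look like this:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
# Any regression dataset will do here; California housing is used purely as a stand-in
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target
# Keep 80% of the rows for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)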
import xgboost as xgb
# Create regression matrices
dtrain_reg = xgb.DMatrix(X_train, y_train, enable_categorical=True)
dtest_reg = xgb.DMatrix(X_test, y_test, enable_categorical=True)
In the code above, we have seen the step-by-step preparation of the training and testing data. The percentage of the dataset used for training and testing may be changed according to the application, but in many cases 20% is kept for testing and 80% for training. In gradient boosting, at each step a new weak model is trained to predict the "error" of the current strong model, where the error is the difference between the predicted value and the expected value; a model with a lower error is considered better than one with a higher error rate:
Fi+1 = Fi − fi
where Fi+1 is the updated strong model, Fi is the strong model at step i, and fi is the weak model trained at step i. This operation keeps repeating until the model reaches the required accuracy or another stopping criterion is met. Some of the points given below will help us to understand the internal working mechanism of GBTs:
1. GBT works by iteratively building the decision trees. Each tree is trained to
improve upon the predictions of the previous tree, and the final prediction is
made by combining the predictions of all the trees.
2. The loss function is a measure of how well the model’s predictions fit the
data. The loss function is used to train each tree, and it is also used to
determine when to stop training the model.
3. GBTs have a number of hyperparameters that can be tuned to improve the
model’s performance. These hyperparameters include the number of trees,
the learning rate, and the maximum depth of the trees.
XGBoost, short for eXtreme Gradient Boosting, is a specific and very popular
implementation of GBTs. In the previous code, we saw how the data is collected and sampled into training and testing datasets by a specified percentage. Now let's have a look at how to calculate the mean squared error (MSE) to check the accuracy of the trained model.
import numpy as np
# actual and predicted stand for the arrays of true and predicted target values
mse = np.mean((actual - predicted) ** 2)
rmse = np.sqrt(mse)
# Define hyperparameters
params = {"objective": "reg:squarederror", "tree_method": "gpu_hist"}
n = 100
model = xgb.train(
params=params,
dtrain=dtrain_reg,
num_boost_round=n,
)
from sklearn.metrics import mean_squared_error
preds = model.predict(dtest_reg)
rmse = mean_squared_error(y_test, preds, squared=False)
print(f"RMSE of the base model: {rmse:.3f}")
In general, we can say that all these boosting algorithms share a common
principle, use boosting to create an ensemble of learners, and not only inherit the
strengths of GBTs like high accuracy, flexibility, and interpretability, but also boast
several improvements and unique features:
Due to its impressive performance and flexibility, XGBoost has become widely
adopted across various domains, including:
3. Manufacturing: Predicting machine failures, optimizing production
processes, and improving quality control
4. Healthcare: Predicting disease diagnoses, analyzing medical images, and
personalizing treatment plans
Gradient boosting techniques deal with one of the biggest problems: bias. A single decision tree usually underfits the data because it splits the dataset only a few times in an attempt to separate the classes, even though technically the data can keep being divided into smaller pieces. Repeatedly correcting this underfitting is how we can improve the performance of the model.
Random forest is a good example here, where trees are grown and pruned based on the sampled data, and bagging is the technique used to reduce the overall variance of the algorithm:
H(x) = ∑_{j=1}^{m} αj hj(x)
where αj is the learning rate and hj(x) is a weak learner; summing all of these weighted weak learners gives an ensemble that is more powerful than any individual weak learner. Let's see another extreme gradient boosting example, closer to a production environment, that will produce better accuracy compared with the previous example. Here, we have given the complete code from beginning to end:
mse = np.mean((actual - predicted) ** 2)
rmse = np.sqrt(mse)
# Define hyperparameters
params = {"objective": "reg:squarederror", "tree_method": "gpu_hist"}
n = 100
model = xgb.train(
params=params,
dtrain=dtrain_reg,
num_boost_round=n,
)
from sklearn.metrics import mean_squared_error
preds = model.predict(dtest_reg)
rmse = mean_squared_error(y_test, preds, squared=False)
print(f"RMSE of the base model: {rmse:.3f}")
params = {"objective": "multi:softprob", "tree_method": "gpu_hist", "num_class"
n = 1000
results = xgb.cv(
params, dtrain_clf,
num_boost_round=n,
nfold=5,
metrics=["mlogloss", "auc", "merror"],
)
results.keys()
Index(['train-mlogloss-mean', 'train-mlogloss-std', 'train-auc-mean',
'train-auc-std', 'train-merror-mean', 'train-merror-std',
dtype='object')
results['test-auc-mean'].max()
import xgboost as xgb
# Train a model using the scikit-learn API
xgb_classifier = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic')
xgb_classifier.fit(X_train, y_train)
# Convert the model to a native API model
model = xgb_classifier.get_booster()
Ensemble learning makes predictions based on a number of different models. By combining several different models, ensemble learning tends to be more flexible, with less bias and less variance (data sensitivity). In this segment, bagging and boosting are the most common approaches, where bagging trains a bunch of models in parallel, each learning from a random subset of the data. Let's have a look at the following code, which shows an actual implementation of a gradient boosting classifier on 5,000 random numbers:
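The original listing is not reproduced here; a minimal sketch consistent with that description (5,000 synthetic samples and a gradient boosting classifier scored on held-out data; the exact score will differ) might look like this:
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
# 5,000 random samples with two classes
X, y = make_classification(n_samples=5000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))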
The above code gives us a score of 0.9274285714285714, which is a good result for the initial stage. If we train this model further, the accuracy can improve because of the bagging and boosting techniques.
Fig. 6.1: Boosting iterations and deviation in training model.
Fig. 6.1 shows a plot of the boosting iterations and the deviation in the training model. The best example of bagging is the random forest. On the other hand, boosting means training a bunch of models sequentially, where every model learns from the mistakes of the previous one. The best example of boosting is the gradient boosting tree. Let's have a look at the picture given below.
Drawing 6.2: Bagging method.
Drawing 6.2 shows the bagging method. The boosting technique combines weak learners sequentially so that every new tree corrects the error of the previous one. There are several different loss functions, but for multiclass classification, cross-entropy is a very popular option. The cross-entropy of a distribution q relative to a distribution p over a given set is defined by the cross-entropy formula:
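H(p, q) = − ∑ p(x) log q(x), summed over all x in the set.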
Finally, we can say that boosting is a core concept in XGBoost and plays a crucial role in its impressive performance and capabilities. It refines the traditional gradient boosting approach: it builds an ensemble of weak learners (often decision trees) sequentially, where each new learner focuses on correcting the errors made by previous learners and uses the gradient of the loss function to guide the learning process. Some of the key components of XGBoost are given below:
significantly speeding up training for large datasets.
Sparse data handling: Efficiently handles data with many missing values,
making it suitable for real-world scenarios.
There are a few steps in the boosting procedure of XGBoost, and some of them are given below:
Initialize: Begin with a constant prediction (e.g., mean of target variable).
The Drawing 6.3 shows the Boosting algorithm.
Iteration
a. Calculate the residual (difference between actual and predicted values) for
each data point.
b. Build a new decision tree on the residuals, focusing on reducing errors.
c. Update the final prediction by weighting the new tree’s predictions based on
the learning rate.
d. Repeat: Iterate steps a-c until a stopping criterion is met (e.g., maximum
iterations, minimal progress).
1. Finance: Fraud detection, credit risk assessment, and stock price prediction
2. E-commerce: Product recommendation, customer churn prediction, and
anomaly detection
3. Natural language processing (NLP): Text classification, sentiment analysis,
and machine translation
4. Computer vision: Image classification, object detection, and image
segmentation
Both algorithms are widely used in different sectors; still, we can compare them as shown below.
Tab. 6.1 compares the LightGBM and XGBoost algorithms with respect to speed, memory usage, accuracy, and ease of use. Let's have a look with some hands-on coding; a sample dataset snapshot is given below. Before executing this code, do not forget to install the "lightgbm" library by using the following command:
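pip install lightgbm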
The code below will display a training accuracy of 0.9647 and a testing accuracy of 0.8163. The numbers may differ on your computer according to the dataset:
Fig. 6.2: Dataset for model training with LightGBM.
Fig. 6.2 shows the dataset for model training with LightGBM.
import pandas as pd
from sklearn.model_selection import train_test_split
import lightgbm as lgb
data = pd.read_csv("SVMtrain.csv")
# To define the input and output feature
x = data.drop(['Embarked', 'PassengerId'], axis=1)
y = data.Embarked
# train and test split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)  # random_state value assumed
model = lgb.LGBMClassifier(learning_rate=0.09, max_depth=-5, random_state=42)  # random_state value assumed
model.fit(x_train, y_train, eval_set=[(x_test, y_test), (x_train, y_train)],
verbose=20, eval_metric='logloss')
print('Training accuracy {:.4f}'.format(model.score(x_train, y_train)))
print('Testing accuracy {:.4f}'.format(model.score(x_test, y_test)))
Components of LightGBM
LightGBM models. It is specifically designed for gradient boosting algorithms and
might not be directly applicable to other machine learning models.
1. Calculate gradients: For each data point, the algorithm calculates the
gradient of the loss function with respect to the current model’s predictions.
Imagine a landscape where the loss function represents the valleys and hills,
and the gradient points you in the direction of steepest descent.
2. Sort by gradients: Data points are then sorted based on the absolute
magnitude of their gradients. Points with larger gradients (meaning they are
farther away from the ideal prediction) are considered more “important” for
training.
Selective Sampling
1. Large gradients: All data points with large gradients are retained for
training. These points contain valuable information for improvement.
2. Small gradients: Points with small gradients are randomly sampled with a
certain probability. This maintains diversity in the training set while
focusing on more informative points.
EFB is a valuable technique for improving the efficiency and potentially the
accuracy of LightGBM models, especially when dealing with large datasets and
high-dimensional feature spaces. However, it is important to consider its
limitations and evaluate its effectiveness in your specific context. Steps are given
below:
Kernel methods are a powerful technique used in various machine learning tasks
like classification, regression, and clustering. They offer a way to effectively handle
nonlinear data by implicitly transforming it into a higher dimensional space where
complex relationships become more apparent. Imagine trying to separate
different colored dots on a two-dimensional plane. Let's have a look at the code below.
By applying a kernel to the image, we update the pixels of the photograph loaded in memory. If we apply the following changes, we will get a mostly black image, which is one way of extracting the important and required features from the image. The result will look like the snapshot below:
Fig. 6.3: Author’s original image.
The Fig 6.3 shows the Author’s original image before applying the kernel method.
If the dots are linearly separable (e.g., a straight line can divide them),
traditional linear algorithms like linear regression or linear support vector
machines (SVMs) work well. However, what if the dots form a more complex
pattern, like a circle or a spiral? Linear algorithms would not be able to effectively
separate them.
The Fig 6.4 shows the Author’s image after applying the kernel method.
Kernel methods come to the rescue! They implicitly map the data to a higher
dimensional space where the separation becomes linear. This mapping is done
using a mathematical function called a kernel.
1. Kernel function: This function takes two data points as input and calculates
a similarity measure between them. Different kernels exist for different data
types and problems (e.g., linear kernel, Gaussian kernel, and polynomial
kernel).
2. Feature space: The kernel function essentially represents an inner product
in a high-dimensional feature space, even though we never explicitly
compute the coordinates of the data points in that space.
3. Linear algorithm: A standard linear algorithm, like a linear SVM or linear
regression, operates in this high-dimensional space using the similarity
measure provided by the kernel.
The “kernel trick” is a key aspect of using kernel methods effectively in machine
learning. It refers to the clever way that kernel methods work with data without
explicitly transforming it into high-dimensional space, saving both computational
time and memory. Imagine you have data points that are not linearly separable in
their original lower dimensional space. To use a linear algorithm for classification
or regression, you would need to explicitly map the data into a higher dimensional
space, where it becomes linearly separable. However, this transformation can be
computationally expensive and memory-intensive for large datasets.
Instead of explicitly performing the transformation, the kernel trick leverages a
special function called a kernel. This kernel function takes two data points as input
and computes a measure of their similarity based on their inner product in the
high-dimensional space.
relationships between data points. Choosing the right kernel is crucial for the
performance of the model.
The kernel tricks will help us by selecting the right kernel, and its hyperparameters
are crucial for good performance. We can use it to understand the model behavior
in the high-dimensional space, which can be challenging. Kernel methods can be
prone to overfitting if not regularized properly. The kernel trick is a powerful
technique that unlocks the potential of kernel methods in machine learning. By
efficiently capturing data similarity in a high-dimensional space without explicit
computations, it enables flexible and powerful solutions for nonlinear problems.
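A small sketch of this idea (illustrative, not one of the book's listings): a linear SVM struggles on concentric-circle data, while an SVM with an RBF kernel separates it almost perfectly.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Two concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
linear_svm = SVC(kernel='linear').fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf', gamma='scale').fit(X_train, y_train)
print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))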
The radial basis function (RBF) kernel, also known as the Gaussian kernel, is one of
the most popular and versatile kernels used in machine learning, particularly with
SVMs and other kernel methods. It excels at handling nonlinear data, making it a
valuable tool for various tasks like classification, regression, and clustering.
Imagine data points scattered in a two-dimensional plane. If the data forms a
straight line, a linear kernel can separate them easily. But what if the data forms a
circle or a more complex shape? That’s where RBF comes in. It implicitly maps the
data points to a higher dimensional space, where the separation becomes more
linear. This mapping is done by calculating the similarity between each pair of data
points based on their Euclidean distance. Points closer in the original space have a
higher similarity in the high-dimensional space, represented by a larger kernel
value. The interpolant that takes the form of a weighted sum of RBF interpolation
is a mesh-free method, meaning the nodes (points in the domain) need not lie on a
structured grid, and does not require the formation of a mesh. It is often spectrally
accurate and stable for large numbers of nodes even in high dimensions.
The Fig 6.5 shows the Different stages of RBF optimization.
Fig. 6.5: Different stages of RBF optimization.
There are many types of RBFs available and some of them are given below.
import pandas as pd
import numpy as np
from keras.layers import Layer
from keras import backend as K
class RBFLayer(Layer):
    def __init__(self, units, gamma, **kwargs):
        super(RBFLayer, self).__init__(**kwargs)
        self.units = units
        self.gamma = K.cast_to_floatx(gamma)

    def build(self, input_shape):
        # print(input_shape)
        # print(self.units)
        self.mu = self.add_weight(name='mu',
                                  shape=(int(input_shape[1]), self.units),
                                  initializer='uniform',
                                  trainable=True)
        super(RBFLayer, self).build(input_shape)

    def call(self, inputs):
        diff = K.expand_dims(inputs) - self.mu
        l2 = K.sum(K.pow(diff, 2), axis=1)
        res = K.exp(-1 * self.gamma * l2)
        return res

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)
# following dataset can be download from the URL
# →https://www.kaggle.com/datasets/anokas/kuzushiji?resource=download&select=k49-
# →https://www.kaggle.com/datasets/anokas/kuzushiji?resource=download&select=k49-
X = np.load('k49-train-imgs.npz')['arr_0']
y = np.load('k49-train-labels.npz')['arr_0']
y = (y <= 25).astype(int)
from keras.layers import Dense, Flatten
from keras.models import Sequential
from keras.losses import binary_crossentropy
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(RBFLayer(10, 0.5))
model.add(Dense(1, activation='sigmoid', name='foo'))
model.compile(optimizer='rmsprop', loss=binary_crossentropy)
model.fit(X, y, batch_size=256, epochs=3)
==================================Output===================================
Epoch 1/3
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\utils\tf_utils.py:492: The name
tf.ragged.RaggedTensorValue is deprecated. Please use
tf.compat.v1.ragged.RaggedTensorValue instead.
Epoch 2/3
Epoch 3/3
===========================================================================
DBSCAN (density-based spatial clustering of applications with noise) groups points that are densely connected neighbors, allowing for irregular-shaped clusters and handling noise effectively. It is a powerful clustering algorithm that goes beyond the limitations of k-means. We can use DBSCAN in anomaly detection (noise identification), image segmentation, customer segmentation, market research, and scientific data analysis. DBSCAN is a valuable tool in your data analysis toolkit, but understanding its strengths and limitations is crucial for effective application.
Fig. 6.6 shows the plot of sorted observations versus k-NN distance.
Let's have a look at the code below, which implements the DBSCAN algorithm:
import pandas as pd
from numpy import array

df = pd.read_csv("https://reneshbedre.github.io/assets/posts/tsne/tsne_scores.csv")
df.head(2)

# check the shape of dataset
print(df.shape)

import numpy as np
from sklearn.neighbors import NearestNeighbors

# n_neighbors = 5, as the kneighbors function returns the distance of each point
# to itself (i.e. the first column will be zeros)
nbrs = NearestNeighbors(n_neighbors=5).fit(df)
# Find the k-neighbors of a point
neigh_dist, neigh_ind = nbrs.kneighbors(df)
# sort the neighbor distances (lengths to points) in ascending order
# axis = 0 sorts along each column
sort_neigh_dist = np.sort(neigh_dist, axis=0)

import matplotlib.pyplot as plt

k_dist = sort_neigh_dist[:, 4]
plt.plot(k_dist)
plt.ylabel("k-NN distance")
plt.xlabel("Sorted observations (4th NN)")
plt.show()

from kneed import KneeLocator

# the sensitivity value S was truncated in the original listing; 1.0 is kneed's default
kneedle = KneeLocator(x=range(1, len(neigh_dist) + 1), y=k_dist, S=1.0,
                      curve="concave", direction="increasing", online=True)
# get the estimate of knee point
print(kneedle.knee_y)
kneedle.plot_knee()
plt.show()

from sklearn.cluster import DBSCAN

clusters = DBSCAN(eps=4.54, min_samples=4).fit(df)
# get cluster labels
clusters.labels_
# check unique clusters
set(clusters.labels_)

from collections import Counter
Counter(clusters.labels_)

import seaborn as sns
import matplotlib.pyplot as plt

p = sns.scatterplot(data=df, x="t-SNE-1", y="t-SNE-2", hue=clusters.labels_)
sns.move_legend(p, "upper right", bbox_to_anchor=(1.17, 1.), title='Clusters')
plt.show()
Fig. 6.6: Sorted observation versus k-NN distance.
Fig. 6.7: Find knee point.
Fig. 6.8: Final clustering after implementation of DBSCAN.
Parameter sensitivity: Choosing optimal values for ε and MinPts can impact
results and require some experimentation.
Curse of dimensionality: Can be less effective on high-dimensional data, because distance calculations become less discriminative as dimensionality grows.
CLARA (clustering large applications): This algorithm partitions data into k clusters while minimizing the cost of moving points between clusters, leading to compact and well-separated clusters; the CLARANS variant used in the code later in this section adds a randomized search over candidate medoids. A related technique, clustering by fast search and find of density peaks (CFSFDP), is a density-based algorithm known for its efficiency and its ability to identify clusters without a predetermined number. Both can be used in anomaly detection (noise identification), image segmentation, customer segmentation, market research, and scientific data analysis.
Fig. 6.8 shows the final clustering after applying DBSCAN.
Some of the important features of CFSFDP are given below:
Density-based: Similar to DBSCAN, it identifies clusters based on density,
focusing on areas with many neighboring points and leaving out sparse
regions.
Fast search: Utilizes a decision graph to efficiently identify core points and
their potential cluster membership.
No predefined number of clusters: Like DBSCAN, you don’t need to specify
the number of clusters beforehand. CFSFDP finds them based on the data’s
inherent density.
Handles noise effectively: Can identify and exclude outliers or points in
low-density regions.
Finds arbitrarily shaped clusters: Similar to DBSCAN, it can discover
clusters of any shape, including elongated or irregular forms.
Mixture models: These models assume that the data arises from a mixture of
probability distributions, where each distribution represents a cluster. Popular
examples include Gaussian mixture models (GMMs) and latent Dirichlet allocation
(LDA). GMMs are a powerful tool for data clustering and density estimation,
particularly when dealing with data that can be represented as a mixture of
multiple overlapping Gaussian distributions, while LDA is a powerful and versatile
probabilistic topic modeling technique widely used in text analysis and NLP tasks.
We can use them in document categorization and organization, text summarization and topic extraction, information retrieval and recommendation systems, anomaly detection and plagiarism detection, sentiment analysis and opinion mining, and language modeling and dialogue systems. Mixture models are, however, not recommended in every setting.
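As a minimal sketch (assuming scikit-learn; the data and number of components are illustrative), a Gaussian mixture model can be fitted and used for clustering as follows:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# synthetic data drawn from three overlapping groups
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.5, random_state=7)

# fit a mixture of three Gaussians and assign each point to its most likely component
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=7)
labels = gmm.fit_predict(X)

print("Component weights:", gmm.weights_)
print("Average log-likelihood per sample:", gmm.score(X))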
# CLARANS example for the CLARA/CLARANS discussion above.
# The imports and iris loading below are reconstructed; the original listing started mid-script.
from pyclustering.cluster.clarans import clarans
from pyclustering.utils import timedcall
from sklearn import datasets

# load the iris dataset and convert it to a plain list of points
iris = datasets.load_iris()
data = iris.data.tolist()
print("A peek into the dataset : ", data[:4])

"""
@details The higher the value of maxneighbor, the closer is CLARANS to K-Medoids, and the longer each local search takes.
@param[in] data: Input data that is presented as a list of points (objects); each point should be represented by a list or tuple.
@param[in] number_clusters: amount of clusters that should be allocated.
@param[in] numlocal: the number of local minima obtained (amount of iterations for solving the problem).
@param[in] maxneighbor: the maximum number of neighbors examined.
"""
clarans_instance = clarans(data, 3, 6, 4)
# call the clarans method 'process' to run the algorithm and time it
(ticks, result) = timedcall(clarans_instance.process)
print("Execution time : ", ticks, "\n")
# returns the clusters
clusters = clarans_instance.get_clusters()
# returns the medoids
medoids = clarans_instance.get_medoids()
print("Index of the points that are in a cluster : ", clusters)
print("The target class of each datapoint : ", iris.target)
print("The index of medoids that algorithm found to be best : ", medoids)
=================================Output====================================
A peek into the dataset : [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2]]
Index of the points that are in a cluster : [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82,
83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 119,
121, 123, 126, 127, 133, 138, 142, 149], [77, 100, 102, 103, 104, 105, 107, 108, 109,
110, 111, 112, 114, 115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 146, 147, 148], [0, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]]
The target class of each datapoint : [0 0 0 ... 1 1 1 ... 2 2 2]  (the iris targets: 50 zeros, 50 ones, and 50 twos)
The index of medoids that algorithm found to be best : [78, 128, 2]
===========================================================================
1. Choose a kernel (window): a weighting function whose influence on neighboring points decreases with distance. (The commonly used kernels include flat and Gaussian.)
2. Start at each data point:
i. Calculate the average (mean) of its neighboring points within the kernel
bandwidth.
ii. Shift the point toward this calculated mean.
3. Repeat steps 1 and 2: Iteratively shift each point toward the mean of its
neighbors until convergence (meaning points stop moving significantly).
4. Identify clusters: Points that converge to the same location form a cluster.
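The mean-shift procedure above can be sketched with scikit-learn's MeanShift implementation (a minimal sketch on synthetic data; the bandwidth estimation is illustrative):
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# synthetic two-dimensional data with three dense regions
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# estimate a reasonable kernel bandwidth from the data itself
bandwidth = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bandwidth)
labels = ms.fit_predict(X)

print("Estimated clusters:", len(np.unique(labels)))
print("Cluster centers (modes):\n", ms.cluster_centers_)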
1. Initialize the map: Create a grid of neurons in the lower dimensional space,
each with randomly assigned weights.
2. Present a data point: Randomly select a data point from the high-
dimensional space.
3. Find the winning neuron: Calculate the distance between the data point
and each neuron using a similarity measure (e.g., Euclidean distance).
Identify the neuron closest to the data point as the “winner.”
4. Update weights: Adjust the weights of the winning neuron and its
neighbors toward the data point, making them more responsive to similar
data points in the future.
5. Repeat steps 2–4: Iterate through the data points multiple times, refining
the map based on the data distribution.
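These SOM training steps can be sketched with the third-party minisom package (an assumption about tooling; install with pip install minisom):
import numpy as np
from minisom import MiniSom

# random high-dimensional data: 200 samples with 10 features, scaled to [0, 1]
data = np.random.rand(200, 10)

# step 1: initialize a 6x6 grid of neurons with random weights
som = MiniSom(x=6, y=6, input_len=10, sigma=1.0, learning_rate=0.5)
som.random_weights_init(data)

# steps 2-5: repeatedly present random samples, find the winning neuron,
# and pull it (and its neighbors) toward the sample
som.train_random(data, num_iteration=1000)

# after training, each sample maps to the grid position of its winning neuron
print("Winning neuron for the first sample:", som.winner(data[0]))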
1. Parameter selection: Choosing the right grid size and learning rate can
impact the results.
2. Interpretability: While visually informative, understanding the meaning of
specific regions on the map might require further analysis.
3. Performance in high dimensions: Effectiveness can decrease with
increasing dimensionality of the input data.
The best clustering technique depends on your specific data and goals. Consider factors such as the size and dimensionality of the data, the expected cluster shapes, the amount of noise, and how sensitive the method is to its parameters.
6.3.1 Statistical Methods
6.3.1.1 Z-Score
Z-score, also known as the standard score, is a simple yet powerful statistical method for anomaly detection. It tells you how many standard deviations a particular data point is away from the mean of the dataset, providing a standardized measure of deviation from the average. Points with high absolute Z-scores are potential anomalies:
Z-score = (x − mean) / standard deviation
where mean is the average of all data points in the dataset and standard deviation
measures the spread of the data around the mean.
A Z-score of 0 means the data point is exactly equal to the mean. Positive Z-
scores indicate points above the mean, with higher values representing larger
deviations. Negative Z-scores indicate points below the mean, with absolute values
signifying the degree of deviation. Typically, data points with absolute Z-scores
greater than 2 or 3 are considered potential anomalies, as they fall outside the
expected range of the majority of data points. However, this threshold can be
adjusted based on your specific data and desired sensitivity. Let's have a look at the code:
import pandas as pd
import numpy as np
# Generate some sample data (replace with your dataset)
data = np.random.normal(loc=0, scale=1, size=100)
# Calculate the z-score for each data point
z_scores = (data - np.mean(data)) / np.std(data)
# Set a threshold for anomaly detection (e.g., z-score > 2 or < -2)
threshold = 2
# Identify anomalies
anomalies = np.where(np.abs(z_scores) > threshold)[0]
print(f"Anomalous data points: {anomalies}")
Feature scaling: In machine learning, Z-score is often used to standardize
features before processing by algorithms, ensuring all features have similar
scales and preventing one feature from dominating the analysis.
Quality control: Monitoring process variables in manufacturing or other
industries often involves using Z-scores to detect deviations from normal
operating ranges.
1. Can be sensitive to outliers itself, as they affect the mean and standard
deviation used for normalization.
2. Less effective for skewed or heavy-tailed data distributions.
1. Outlier detection: Points falling outside the range Q1 – 1.5 IQR and Q3 + 1.5
IQR can be considered potential outliers.
2. Data exploration: IQR provides a quick grasp of the central tendency and
spread of your data, complementing the median.
3. Comparing data distributions: You can compare the IQRs of different
groups or datasets to understand their relative variability.
4. Robust measure of variability: IQR is less sensitive to outliers compared to
the range, making it more reliable for skewed or noisy data.
The IQR (interquartile range) is the range spanned by the middle 50% of the data, between the first quartile (Q1) and the third quartile (Q3).
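A minimal NumPy sketch of the IQR outlier rule (synthetic data for illustration):
import numpy as np

# normally distributed data with a few injected outliers
data = np.concatenate([np.random.normal(0, 1, 200), [8.0, -7.5, 9.2]])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# the classic 1.5 * IQR fences
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(f"Q1={q1:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print("Potential outliers:", outliers)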
One-class SVM (OCSVM) is also known as learning normalcy for anomaly detection. OCSVM is a powerful and versatile tool for anomaly detection, leveraging the principles of SVMs to learn the normal behavior of your data and identify points that deviate significantly. Imagine you have a training dataset containing only examples of "normal" data. An OCSVM analyzes this data and constructs a boundary around it, capturing the essential characteristics of normalcy; data points falling outside this boundary are then flagged as potential anomalies. Some well-known applications are fraud detection, anomaly detection in sensor data, network intrusion detection, and industrial process monitoring.
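A minimal sketch with scikit-learn's OneClassSVM (the nu and gamma values are illustrative assumptions):
import numpy as np
from sklearn.svm import OneClassSVM

# "normal" training data clustered around the origin
rng = np.random.RandomState(0)
X_train = 0.5 * rng.randn(200, 2)

# new observations: some normal, some far from the training distribution
X_new = np.array([[0.1, -0.2], [0.3, 0.4], [4.0, 4.0], [-5.0, 3.0]])

# learn a boundary around the normal data
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X_train)

# +1 means inside the learned boundary (normal), -1 means outside (anomaly)
print(ocsvm.predict(X_new))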
Nearest neighbor (NN) is a versatile technique for anomaly detection. Points with
very few or many neighbors compared to others might be anomalies. The
assumption is that normal data points tend to cluster together, while anomalies
deviate from these clusters, having fewer (or more) similar neighbors. By analyzing
the number of NN for each data point, we can identify those significantly different
and flag them as potential anomalies. We can use NN in fraud detection, intrusion
detection in networks, and anomaly detection in sensor data.
6.4 Clustering
Clustering in deep learning refers to using deep neural networks to automatically
group data points into meaningful clusters based on their underlying similarities.
Unlike traditional clustering algorithms, deep learning approaches can discover
complex, nonlinear relationships between data points, making them particularly suitable for analyzing high-dimensional data. Data points not assigned to any cluster or
belonging to small clusters could be anomalous. We can use it in the following
fields:
Document clustering: Categorizing documents based on their topics or
themes
Customer segmentation: Grouping customers based on their behavior or
preferences
Anomaly detection: Identifying data points that deviate significantly from
the typical clusters
Scientific data analysis: Discovering hidden patterns and relationships in
complex datasets
6.4.2 Density-Based Methods
LOF: LOF (local outlier factor) is another powerful algorithm for anomaly detection, utilizing the concept of local density to identify data points deviating from their surroundings. It calculates the ratio of the local density of a point to the densities of its neighbors; high LOF values indicate potential anomalies. In other words, LOF compares the local density of a data point (its neighborhood) to the local densities of its neighbors, and anomalies are points with significantly lower local density, indicating that they reside in sparse regions compared to their surroundings. A minimal usage sketch is given below.
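The sketch assumes scikit-learn's LocalOutlierFactor as the implementation (the data and parameters are illustrative):
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(1)
# dense "normal" cluster plus a few isolated points
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),
               [[4.0, 4.0], [-4.0, 3.5], [5.0, -4.5]]])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)        # -1 for anomalies, +1 for inliers

# negative_outlier_factor_ is the negated LOF score: the lower, the more anomalous
print("Number of flagged anomalies:", np.sum(labels == -1))
print("Most anomalous scores:", np.sort(lof.negative_outlier_factor_)[:3])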
1. Time series analysis: Time series analysis is a powerful tool for studying and understanding sequences of data points collected over time. It detects anomalies in time-dependent data as deviations from expected patterns or trends. Time series analysis focuses on extracting meaningful information
trends. Time series analysis focuses on extracting meaningful information
and patterns from data points ordered chronologically. Some of the key
points about time series analysis are given below:
Decomposition: Breaks down the time series into trend, seasonality,
and residual components to analyze each aspect separately.
Autocorrelation and partial autocorrelation: Examine the relationship
between data points at different time lags to identify patterns and
dependencies.
Statistical modeling: Fit various statistical models (e.g., ARIMA and
SARIMA) to the data to capture seasonality, trends, and random
components.
Machine learning: Utilize techniques like recurrent neural networks or
long short-term memory networks to automatically learn complex
patterns and make predictions.
This could involve anything from stock prices to website traffic, and sensor
readings to weather data. It helps to answer questions like:
1. Are there any trends or seasonalities in the data?
2. What are the underlying patterns driving the data’s behavior?
3. Can we predict future values based on the historical data?
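For example, a simple forecasting sketch with statsmodels (the series and the ARIMA order are illustrative assumptions):
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# synthetic monthly series with a trend and some noise
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
series = pd.Series(50 + 0.8 * np.arange(48) + np.random.normal(0, 2, 48), index=idx)

# fit a small ARIMA model and forecast the next six months
model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()
print(fitted.forecast(steps=6))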
1. Fourier transform (FT): The fundamental tool, decomposing the signal into
a sum of sine and cosine waves of different frequencies and amplitudes.
2. Fast FT: A computationally efficient algorithm for calculating the FT, making
it practical for large datasets.
3. Power spectral density (PSD): This represents the distribution of power (or
energy) across different frequencies, providing insights into the dominant
frequencies and their relative importance.
4. Spectrogram: A visual representation of the PSD over time, showing how
the frequency content changes over the signal’s duration.
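A minimal NumPy sketch of these frequency-domain ideas (the signal is synthetic):
import numpy as np

fs = 500.0                        # sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)   # two seconds of samples

# a signal containing 5 Hz and 40 Hz components plus noise
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + 0.2 * np.random.randn(t.size)

# fast Fourier transform and a simple power spectral density estimate
spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
psd = (np.abs(spectrum) ** 2) / (fs * x.size)

# the dominant frequencies should be close to 5 Hz and 40 Hz
top = freqs[np.argsort(psd)[-2:]]
print("Dominant frequencies (Hz):", np.sort(top))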
Summary
Gradient-boosted trees (GBTs) are a powerful machine learning technique used for both regression and classification tasks. They work by combining multiple weak learners, typically decision trees, into a single strong learner. XGBoost and LightGBM are two gradient boosting powerhouses.
Both XGBoost and LightGBM are popular implementations of GBTs known for
their high accuracy and efficiency. They are powerful algorithms for gradient
boosting, combining weak learners (decision trees) for improved accuracy and
performance.
Ensemble learning is a powerful machine learning technique that combines
the predictions of multiple models to achieve better performance than any single
model. It is like getting multiple experts to weigh in on a problem and then taking
the best guess based on their combined insights.
Kernel methods are a powerful class of algorithms in machine learning,
particularly known for their ability to handle nonlinear relationships in data.
While traditional linear models are limited to linear relationships, kernel methods
can learn complex patterns by implicitly mapping data into a higher dimensional
space where these relationships become linear. The “kernel trick” is a crucial
aspect of kernel methods, often referred to as the key to their magic. It allows
them to handle nonlinear data while maintaining computational efficiency. In
other words, we can say that they transform data into higher dimensions for linear
separation, enabling complex relationships between features.
The RBF kernel is a popular and powerful kernel function widely used in various machine learning algorithms, particularly SVMs. It is especially well suited to nonlinear data.
While k-means is a popular and widely used clustering technique, it does have
its limitations. Here are some alternative clustering techniques you can consider
depending on your specific needs and data characteristics:
DBSCAN is a powerful clustering algorithm that groups data points based on their density and connectivity. It is particularly useful for irregularly shaped clusters and noisy data.
NN is a fundamental concept in machine learning that has applications in both
classification and regression tasks. It is a simple yet powerful technique that
leverages the similarity between data points for making predictions.
Clustering is a fundamental technique in unsupervised machine learning that
involves grouping similar data points together. It is a powerful tool for discovering
hidden patterns and structures within unlabeled data, offering valuable insights
across various domains.
Distance-based methods refer to a broad range of techniques in machine
learning that leverage the concept of distance between data points for various
tasks. These methods have applications in both classification and regression,
making them valuable tools across different domains.
One-class SVMs are a powerful anomaly detection technique that leverages the
principles of SVMs for unsupervised learning tasks. It learns a boundary around
normal data, flagging points outside as anomalies.
The IQR is a robust measure of variability in statistics. It tells you how spread
out the middle 50% of your data is, excluding the potential outliers at the very top
and bottom.
Z-score identifies outliers based on standard deviations from the mean,
sensitive to outliers itself. Anomaly detection is a crucial aspect of data analysis,
focusing on identifying data points that deviate significantly from the expected
patterns or norms. It plays a vital role in various domains, from fraud detection in
finance to equipment failure prediction in manufacturing.
SOMs, also known as Kohonen maps, are a type of artificial neural network
used for dimensionality reduction and visualization of high-dimensional data.
They excel at preserving the topological structure of the data while mapping it
onto a lower dimensional space, typically a 2D grid.
Mean shift is a powerful and versatile technique in machine learning and data
analysis, particularly useful for unsupervised learning tasks like clustering and
density estimation. It operates by iteratively shifting data points toward the
“densest” region in their vicinity, ultimately converging to the modes or peaks in
the underlying data distribution.
Spectral clustering is a powerful technique in machine learning often used for
unsupervised learning tasks like clustering and graph partitioning. It leverages
the spectral properties of a similarity matrix to group data points based on their
underlying structure. This makes it particularly useful for identifying nonconvex
clusters and handling data with complex shapes, where other clustering
algorithms like k-means might struggle.
Exercise (MCQs)
1.
What is GBT?
A) A regression technique that iteratively builds trees to improve predictions
B) A classification technique that uses decision trees to classify data points
D) Small datasets with few features
5.
D) None of the above
6.
C) It is sensitive to outliers.
D) It is computationally efficient.
7.
B) It defines the maximum distance between two points to be considered
neighbors
C) It determines the density threshold for clustering
B) Anomaly detection
C) Customer segmentation
1. What is the role of learning rate in GBTs?
2. How does early stopping help prevent overfitting in GBTs?
3. What are the different regularization techniques available in XGBoost and
LightGBM?
4. How can you compare the performance of XGBoost and LightGBM on a
specific dataset?
5. What are some limitations of GBTs?
6. Explain how GBTs can be used for both regression and classification tasks.
7. When might other ensemble methods like random forests be preferable to
GBTs?
8. How can GBTs be tuned for optimal performance?
9. Discuss the importance of feature engineering for GBTs.
10. What are some emerging developments in the field of gradient boosting?
11. Explain how DBSCAN works in detail.
12. What are the steps involved in using DBSCAN for clustering a dataset?
13. How can you choose the appropriate values for eps and min_samples?
14. How can you evaluate the performance of DBSCAN on a clustering task?
15. What are some of the alternative clustering algorithms to DBSCAN?
Answers
1. AdaBoost
2. overfitting
3. speed and efficiency
4. Ensemble learning
5. LightGBM
1. DBSCAN is a hierarchical clustering algorithm. (False)
2. DBSCAN is a density-based clustering algorithm. (True)
3. DBSCAN can handle clusters of arbitrary shapes. (True)
4. DBSCAN is always the best choice for clustering data. (False)
5. DBSCAN requires a distance metric to be defined. (True)
Chapter 7 Neural Networks and Deep
Learning
Fig. 7.1: Biological neuron in the human brain.
Fig. 7.1 shows a biological neuron in the human brain.
These networks learn by adjusting the connections between neurons based on
data, enabling them to perform complex tasks like:
Image recognition: Classifying objects in images (e.g., cats, dogs, and cars)
Natural language processing: Understanding and generating text (e.g.,
machine translation and chatbots)
Recommendation systems: Suggesting products or content users might
like
Fraud detection: Identifying suspicious financial transactions
1. Neurons: The basic unit of a neural network. Each neuron receives input
from other neurons, applies an activation function to process it, and sends its
output to other neurons.
2. Layers: Neurons are organized into layers: input layer (receives raw data),
hidden layer (process information), and output layer (generates final
output).
3. Activation function: Determines how a neuron transforms its input.
Common types include sigmoid, ReLU, and tanh.
4. Learning: Neural networks learn by adjusting the weights of connections
between neurons. This is done through algorithms like backpropagation,
which minimizes the error between the network’s output and the desired
output. In the world of neural networks, activation functions play a crucial
role, acting as the gatekeepers that determine what information gets passed
on to the next layer. They take the weighted sum of inputs from a neuron
and transform it into an output value, introducing nonlinearity and allowing
the network to learn complex patterns.
5. Training data: Large amounts of data are needed to train neural networks
effectively. The quality and quantity of data significantly impact performance.
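As a quick numerical illustration of the activation functions mentioned above, here is a minimal NumPy sketch (the input values are arbitrary):
import numpy as np

def sigmoid(x):
    # squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # passes positive values through and zeroes out negatives
    return np.maximum(0.0, x)

def tanh(x):
    # squashes input into the range (-1, 1)
    return np.tanh(x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # weighted sums arriving at a neuron
print(sigmoid(z))
print(relu(z))
print(tanh(z))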
Fig. 7.2: Artificial neural network.
Data requirements: Need large amounts of data for training, which can be
expensive and time-consuming.
Black box problem: Can be difficult to understand how they make decisions.
Computational cost: Training large networks can require significant
computing power.
7.2 Perceptron
Perceptrons are the fundamental building blocks of neural networks, serving as
the basic unit of computation. While seemingly simple, they hold immense power
when combined and trained, allowing complex learning and problem-solving.
Here’s a breakdown of their key features:
Perceptrons act as linear classifiers, meaning they can only learn and represent
linearly separable patterns. This limitation led to the development of more
complex architectures like MLPs with multiple layers and nonlinear activation
functions. Despite their limitations, perceptrons are powerful tools for
understanding the basic principles of neural networks and learning algorithms like
perceptron learning rule.
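A minimal sketch of the perceptron learning rule (illustrative NumPy code learning the linearly separable AND function; the learning rate and epoch count are arbitrary choices):
import numpy as np

# inputs and targets for the AND function (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)          # weights
b = 0.0                  # bias
lr = 0.1                 # learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        # step activation: fire if the weighted sum exceeds zero
        prediction = 1 if np.dot(w, xi) + b > 0 else 0
        # perceptron learning rule: nudge weights toward misclassified points
        error = target - prediction
        w += lr * error * xi
        b += lr * error

print("Learned weights:", w, "bias:", b)
print("Predictions:", [1 if np.dot(w, xi) + b > 0 else 0 for xi in X])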
While not used in complex tasks anymore, perceptrons still find application
in:
Simple classification problems like spam filtering
Feature extraction and dimensionality reduction
Understanding the theoretical foundations of neural networks
To delve deeper, consider exploring concepts like
MLPs and their ability to learn nonlinear relationships
Different activation functions and their impact on learning
Perceptron learning rule and its limitations
Advanced neural network architectures like CNNs and recurrent neural
networks (RNNs) built upon the foundation of perceptrons
Fig. 7.3: Backpropagation.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

# Create a Training and Test Data Set
input_train = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
                        [10, 0, 0], [10, 1, 1], [10, 0, 1]])
output_train = np.array([[0], [0], [0], [1], [1], [1]])
input_pred = np.array([1, 1, 0])
input_test = np.array([[1, 1, 1], [10, 0, 1], [0, 1, 10],
                       [10, 1, 10], [0, 0, 0], [0, 1, 1]])
output_test = np.array([[0], [1], [0], [1], [0], [0]])

# Scale the Data
scaler = MinMaxScaler()
input_train_scaled = scaler.fit_transform(input_train)
output_train_scaled = scaler.fit_transform(output_train)
input_test_scaled = scaler.fit_transform(input_test)
output_test_scaled = scaler.fit_transform(output_test)

# Create a Neural Network Class
class NeuralNetwork():
    # Create an Initialize Function
    def __init__(self, ):
        self.inputSize = 3
        self.outputSize = 1
        self.hiddenSize = 3
        self.W1 = np.random.rand(self.inputSize, self.hiddenSize)
        self.W2 = np.random.rand(self.hiddenSize, self.outputSize)
        self.error_list = []
        self.limit = 0.5
        self.true_positives = 0
        self.false_positives = 0
        self.true_negatives = 0
        self.false_negatives = 0

    # Create a Forward Propagation Function
    def forward(self, X):
        self.z = np.matmul(X, self.W1)
        self.z2 = self.sigmoid(self.z)
        self.z3 = np.matmul(self.z2, self.W2)
        o = self.sigmoid(self.z3)
        return o

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))

    def sigmoidPrime(self, s):
        return s * (1 - s)

    # Create a Backward Propagation Function
    def backward(self, X, y, o):
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        self.z2_error = np.matmul(self.o_delta,
                                  np.matrix.transpose(self.W2))
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)
        self.W1 += np.matmul(np.matrix.transpose(X), self.z2_delta)
        self.W2 += np.matmul(np.matrix.transpose(self.z2),
                             self.o_delta)

    # Create a Training Function
    def train(self, X, y, epochs):
        for epoch in range(epochs):
            o = self.forward(X)
            self.backward(X, y, o)
            self.error_list.append(np.abs(self.o_error).mean())

    # Create a Prediction Function
    def predict(self, x_predicted):
        return self.forward(x_predicted).item()

    # Plot the Mean Absolute Error Development
    def view_error_development(self):
        plt.plot(range(len(self.error_list)), self.error_list)
        plt.title('Mean Sum Squared Loss')
        plt.xlabel('Epoch')
        plt.ylabel('Loss')

    # Calculate the Accuracy and its Components
    def test_evaluation(self, input_test, output_test):
        for i, test_element in enumerate(input_test):
            if self.predict(test_element) > self.limit and \
                    output_test[i] == 1:
                self.true_positives += 1
            if self.predict(test_element) < self.limit and \
                    output_test[i] == 1:
                self.false_negatives += 1
            if self.predict(test_element) > self.limit and \
                    output_test[i] == 0:
                self.false_positives += 1
            if self.predict(test_element) < self.limit and \
                    output_test[i] == 0:
                self.true_negatives += 1
        print('True positives: ', self.true_positives,
              '\nTrue negatives: ', self.true_negatives,
              '\nFalse positives: ', self.false_positives,
              '\nFalse negatives: ', self.false_negatives,
              '\nAccuracy: ',
              (self.true_positives + self.true_negatives) /
              (self.true_positives + self.true_negatives +
               self.false_positives + self.false_negatives))

# Run a Script That Trains and Evaluates the Neural Network Model
NN = NeuralNetwork()
NN.train(input_train_scaled, output_train_scaled, 200)
NN.predict(input_pred)
NN.view_error_development()
NN.test_evaluation(input_test_scaled, output_test_scaled)
7.3 TensorFlow
TensorFlow is a powerful and popular open-source library for building and training
machine learning models, particularly in the realm of deep learning. The tensor is the fundamental data structure in TensorFlow: it represents multidimensional arrays of numerical values, similar to matrices in linear algebra. Tensors have a specific
data type (integers, floats, etc.) and shape (number of dimensions and elements in
each dimension). Operations in TensorFlow are performed on tensors, allowing for
calculations and manipulations.
TensorFlow 2.0 introduced eager execution, allowing you to see the results of
operations immediately, line by line, similar to traditional scripting
languages.
This makes learning and debugging easier compared to the older symbolic
execution mode.
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10)
])
predictions = model(x_train[:1]).numpy()
predictions
tf.nn.softmax(predictions).numpy()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer='adam',
loss=loss_fn,
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
probability_model = tf.keras.Sequential([
model,
tf.keras.layers.Softmax()
])
probability_model(x_test[:5])
===============================Output======================================
Epoch 1/5
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\utils\tf_utils.py:492: The name
tf.ragged.RaggedTensorValue is deprecated. Please use
tf.compat.v1.ragged.RaggedTensorValue instead.
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\engine\base_layer_utils.py:384: The name
tf.executing_eagerly_outside_functions is deprecated. Please use
tf.compat.v1.executing_eagerly_outside_functions instead.
1875/1875 [==============================] - 6s 2ms/step - loss:
0.2924 - accuracy: 0.9146
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss:
0.1407 - accuracy: 0.9585
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss:
0.1039 - accuracy: 0.9687
Epoch 4/5
1875/1875 [==============================] - 6s 3ms/step - loss:
0.0874 - accuracy: 0.9734
Epoch 5/5
1875/1875 [==============================] - 9s 5ms/step - loss:
0.0734 - accuracy: 0.9770
313/313 - 2s - loss: 0.0697 - accuracy: 0.9786 - 2s/epoch – 5ms/step
===========================================================================
7.3.3 Keras
7.3.4 Sessions
import tensorflow as tf
# Create two tensors
a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])
# Add the tensors
c = tf.add(a, b)
# Print the result
print(c)  # Output: tf.Tensor([5 7 9], shape=(3,), dtype=int32)
categorical cross-entropy, mean squared error) appropriate for your
task.
2. Set metrics to evaluate your model’s performance (e.g., accuracy,
precision, and recall).
7. Train your model:
1. Use the fit method to train your model on the training data.
2. Monitor training progress on validation data to prevent overfitting.
3. Adjust hyperparameters (e.g., learning rate and number of epochs) if
needed.
8. Evaluate your model:
1. Use the evaluate method to assess performance on the testing data.
2. Analyze metrics to understand your model’s strengths and weaknesses.
9. Save and load your model: Use the save and load methods to save your
trained model for future use
Let's look at the code below, which demonstrates a simple linear regression model using TensorFlow's eager execution:
import tensorflow as tf

# Define training data
x_train = tf.constant([1.0, 2.0, 3.0, 4.0])
y_train = tf.constant([2.0, 4.0, 6.0, 8.0])

# Initialize variables for weights and bias
w = tf.Variable(tf.random.normal([1]))
print("Weight : ", w)
b = tf.Variable(tf.random.normal([1]))
print("bias : ", b)

# Define the model function
def predict(x):
    return w * x + b

# Define the loss function (mean squared error)
def loss(x, y):
    return tf.reduce_mean(tf.square(predict(x) - y))

# Optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# Training loop
for epoch in range(100):
    # Calculate loss
    current_loss = loss(x_train, y_train)
    # Update weights and bias based on the loss
    optimizer.minimize(lambda: loss(x_train, y_train), var_list=[w, b])
    # Print the current loss
    print(f"Epoch {epoch + 1}, Loss: {current_loss}")

# Make a prediction
prediction = predict(5)
print(f"Prediction for x = 5: {prediction}")
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Preprocess the data
x_train = x_train.reshape(60000, 28 * 28).astype("float32") / 255.0
x_test = x_test.reshape(10000, 28 * 28).astype("float32") / 255.0
# One-hot encode the labels
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
# Build the model
model = Sequential([
Flatten(input_shape=(28, 28)),
Dense(128, activation="relu"),
Dense(10, activation="softmax")
])
# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# Train the model
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {accuracy:.4f}")
Keras is a high-level API built on top of TensorFlow, designed for ease of use and
rapid prototyping. It offers prebuilt components like layers, optimizers, and loss
functions, simplifying the process of building and experimenting with neural
networks. Keras is known for its readability and Pythonic syntax, making it easier
to learn and use compared to TensorFlow’s low-level APIs. Keras is widely used for
quick experimentation, building prototypes, and developing deep learning models
where simplicity and speed are priorities. On the other hand TensorFlow is a
comprehensive framework offering a wide range of functionalities for various
machine learning tasks including data manipulation, numerical computations, and
deep learning. It provides low-level APIs that give you fine-grained control over
your model architecture and training process. This allows for flexibility and
customization, but requires more coding effort and understanding of the
underlying concepts. TensorFlow is often used for research, complex tasks, and
production-grade models where fine-tuning and control are crucial.
If you’re new to deep learning or want to quickly experiment with different
architectures, Keras is a great starting point. As you gain experience and need
more control or flexibility, you can gradually transition to using TensorFlow’s low-
level APIs. You can even combine Keras and TensorFlow by building your model
with Keras’ high-level API and then fine-tuning specific parts using TensorFlow’s
lower-level functionalities. We can classify the TensorFlow and Keras applications
based upon the model requirements. Tab. 7.1 shows the differences between the TensorFlow and Keras frameworks across different features:
scaled to larger and more complex datasets by adding more layers and increasing
the number of filters. Some of the key concepts are given below:
CNNs are best at recognizing objects, scenes, and activities in images (e.g., classifying handwritten digits and detecting faces in photos). They perform well at locating and classifying objects within an image (e.g., identifying cars, pedestrians, and traffic signs in self-driving car applications). Image segmentation is a very powerful capability of CNNs, dividing an image into different regions corresponding to objects or semantic categories (e.g., segmenting organs in medical images). CNNs are also good at applying the artistic style of one image to another (e.g., creating images that look like they were painted by Van Gogh). Some of the popular CNN architectures include LeNet-5, AlexNet, VGGNet, ResNet, and Inception. Frameworks like TensorFlow and PyTorch offer tools for building and training CNNs. Let's have a look at a CNN model trained in Python:
# (this excerpt starts mid-script: the dataset-loading lines that define train_ds,
#  val_ds, class_names, img_height and img_width precede it in the full listing)
print(class_names)
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")

for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break

AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

normalization_layer = layers.Rescaling(1. / 255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))

num_classes = len(class_names)
model = Sequential([
    layers.Rescaling(1. / 255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()

epochs = 10
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal",
                          input_shape=(img_height,
                                       img_width,
                                       3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ]
)

plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")

model = Sequential([
    data_augmentation,
    layers.Rescaling(1. / 255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, name="outputs")
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()

epochs = 15
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_im"  # URL truncated in the original listing
sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url)
img = tf.keras.utils.load_img(
    sunflower_path, target_size=(img_height, img_width)
)
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)

# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the model.
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

TF_MODEL_FILE_PATH = 'model.tflite'  # The default path to the saved TensorFlow Lite model
interpreter = tf.lite.Interpreter(model_path=TF_MODEL_FILE_PATH)
interpreter.get_signature_list()
classify_lite = interpreter.get_signature_runner('serving_default')
classify_lite
predictions_lite = classify_lite(sequential_1_input=img_array)['outputs']
score_lite = tf.nn.softmax(predictions_lite)
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score_lite)], 100 * np.max(score_lite))
)
print(np.max(np.abs(predictions - predictions_lite)))
Fig. 7.4: List of feature classification.
===========================Output==========================================
3670
2024-02-26 08:25:07.145531: I
tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is
optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX2
AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the
appropriate compiler flags.
(32,)
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\backend.py:873: The name tf.get_default_graph is
deprecated. Please use tf.compat.v1.get_default_graph instead.
0.0 1.0
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\layers\pooling\max_pooling2d.py:161: The name
tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\optimizers\__init__.py:309: The name tf.train.Optimizer is
deprecated. Please use tf.compat.v1.train.Optimizer instead.
Model: "sequential"
max_pooling2d_2 (MaxPooling2D) (None, 22, 22, 64) 0
flatten (Flatten) (None, 30976) 0
dense (Dense) (None, 128) 3965056
dense_1 (Dense) (None, 5) 645
===========================================================================
Total params: 3989285 (15.22 MB)
Trainable params: 3989285 (15.22 MB)
Non-trainable params: 0 (0.00 Byte)
Epoch 1/10
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\utils\tf_utils.py:492: The name
tf.ragged.RaggedTensorValue is deprecated. Please use
tf.compat.v1.ragged.RaggedTensorValue instead.
WARNING:tensorflow:From
C:\Users\23188\PycharmProjects\workshop\.venv\Lib\site-
packages\keras\src\engine\base_layer_utils.py:384: The name
tf.executing_eagerly_outside_functions is deprecated. Please use
tf.compat.v1.executing_eagerly_outside_functions instead.
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
accuracy: 0.8372 - val_loss: 0.8921 - val_accuracy: 0.6567
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
===========================================================================
Fig. 7.5: Training and validation accuracy versus training and validation loss.
Fig. 7.5 shows the training and validation accuracy versus the training and validation loss. The same model is then trained for 15 epochs; Fig. 7.6 shows the corresponding curves.
Fig. 7.6: Training and validation accuracy versus training and validation loss with
15 epochs.
Fig. 7.7: Trained model with 15 epochs.
conv2d_4 (Conv2D) (None, 90, 90, 32) 4640
max_pooling2d_4 (MaxPooling2D) (None, 45, 45, 32) 0
conv2d_5 (Conv2D) (None, 45, 45, 64) 18496
max_pooling2d_5 (MaxPooling2D) (None, 22, 22, 64) 0
dropout (Dropout) (None, 22, 22, 64) 0
flatten_1 (Flatten) (None, 30976) 0
dense_2 (Dense) (None, 128) 3965056
outputs (Dense) (None, 5) 645
===========================================================================
Total params: 3989285 (15.22 MB)
Trainable params: 3989285 (15.22 MB)
Nontrainable params: 0 (0.00 Byte)
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
This image most likely belongs to sunflowers with a 99.46 percent confidence.
2024-02-26 08:38:03.838746: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags {
serve }; Status: success: OK. Took 232444 microseconds.
---------------------------------
* 11 ARITH ops
(f32: 3)
(f32: 2)
(f32: 3)
(f32: 1)
This image most likely belongs to sunflowers with a 99.46 percent confidence.
9.536743e-07
===========================================================================
can even strengthen the feature extraction process. Pooling layers reduce
the spatial dimensions (width and height) of the data, typically by a factor of
2 or more. This results in a smaller output with fewer elements. By applying
different pooling operations, the layer summarizes the information
contained within a specific region of the input data. This region is often
called a pooling window.
3. Activation function: Introduces nonlinearity, allowing the network to learn
complex relationships between features. Popular choices include ReLU and
Leaky ReLU.
4. Fully connected layer: Typically used in the final stages for tasks like
classification or regression. Connects all neurons in one layer to all neurons
in the next, integrating the learned features.
The optimal architecture depends on your specific problem, dataset size, and
computational resources. Experimentation and exploration are key to finding the
best configuration for your needs. Few points must have to be kept in mind before
developing any CNN model:
Number of layers: Deeper networks often learn more complex features, but
require more data and computational resources.
Filter size and number: Smaller filters capture local details, while larger
ones capture larger features. More filters allow for learning a wider variety of
patterns.
Pooling type and stride: Max pooling identifies dominant features, while
average pooling summarizes information. Stride controls the downsampling
rate. Max pooling selects the maximum value within the pooling window.
This emphasizes the strongest activations and can be useful for detecting
dominant features like edges.
Activation function: Choice depends on the task and desired properties
(e.g., ReLU for efficiency and Leaky ReLU for avoiding dying neurons).
Batch normalization: Helps stabilize training and improve generalization.
Regularization techniques: Dropout and L1/L2 regularization prevent
overfitting by reducing model complexity.
Transfer learning: Utilize pretrained models on large datasets (e.g.,
ImageNet) as a starting point for fine-tuning on your specific task.
Data augmentation: Artificially expand your dataset with variations (e.g.,
flips and rotations) to improve generalization.
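As a small illustration of the pooling choices discussed above, the following sketch (with an arbitrary 4 × 4 input) compares max pooling and average pooling in Keras:
import numpy as np
import tensorflow as tf

# a single 4x4, single-channel "image" (batch of 1), values 0..15 for illustration
x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)

# 2x2 windows with stride 2 halve each spatial dimension
max_pool = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2)(x)

print(max_pool.numpy().reshape(2, 2))   # keeps the strongest activation in each window
print(avg_pool.numpy().reshape(2, 2))   # summarizes each window by its mean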
Fig. 7.8: Recurrent neural network.
1. Vanilla RNN: The basic RNN architecture, but can suffer from vanishing and
exploding gradients, limiting its ability to learn long-term dependencies.
Vanilla RNNs struggle with vanishing and exploding gradients, making it
difficult to learn dependencies over long sequences. LSTMs address this by
introducing gating mechanisms that control the flow of information through
the network.
2. Long short-term memory (LSTM): Introduces gating mechanisms to
control the flow of information, addressing the gradient issues and enabling
learning of longer dependencies. LSTM networks stand out for their ability to
learn and exploit long-term dependencies within sequential data as shown in
Fig. 7.9.
Fig. 7.9 shows the long short-term memory (LSTM) architecture.
This makes them particularly well-suited for tasks like NLP, speech
recognition, and time series forecasting, where understanding the context
across extended periods is crucial. The three main important gates available
are
A. Forget gate: Decides what information from the previous cell state
(memory) to discard.
B. Input gate: Controls what information from the current input to integrate
into the cell state.
C. Output gate: Determines what part of the cell state to expose as the output
of the unit.
3. Gated recurrent unit (GRU): Similar to LSTM but with simpler architecture
and fewer parameters, offering a balance between performance and
efficiency. GRU emerges as a compelling alternative to LSTMs. While both
excel at learning long-term dependencies within sequential data, GRUs offer
a more streamlined architecture with certain advantages. GRUs aim to
provide comparable performance to LSTMs while being simpler and
potentially less computationally expensive. GRUs combine the Forget and
Input gates of LSTMs into a single update gate and introduce a reset gate to
control the flow of information. This reduces the number of parameters and
operations compared to LSTMs.
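As a small sketch (with illustrative input shapes), swapping an LSTM layer for a GRU in Keras is a one-line change, and the parameter counts reflect the simpler gating:
import tensorflow as tf
from tensorflow.keras import layers, models

# toy sequence input: 20 time steps with 8 features each
lstm_model = models.Sequential([
    layers.LSTM(32, input_shape=(20, 8)),   # gates: forget, input, output
    layers.Dense(1)
])
gru_model = models.Sequential([
    layers.GRU(32, input_shape=(20, 8)),    # gates: update, reset (fewer parameters)
    layers.Dense(1)
])

print("LSTM parameters:", lstm_model.count_params())
print("GRU parameters: ", gru_model.count_params())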
# (this excerpt starts mid-script: the imports, the scaled training data and the
#  first SimpleRNN layer of `regressor` are defined earlier in the full listing)
regressor.add(Dropout(0.2))
# Adding the second RNN layer and some Dropout regularization
regressor.add(SimpleRNN(units=50, activation='tanh', return_sequences=True))
regressor.add(Dropout(0.2))
# Adding the third RNN layer and some Dropout regularization
regressor.add(SimpleRNN(units=50, activation='tanh', return_sequences=True))
regressor.add(Dropout(0.2))
# Adding the fourth RNN layer and some Dropout regularization
regressor.add(SimpleRNN(units=50))
regressor.add(Dropout(0.2))
# Adding the output layer
regressor.add(Dense(units=1))
# Compile the RNN
regressor.compile(optimizer='adam', loss='mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs=100, batch_size=32)

dataset_test = pd.read_csv('../input/stockprice-test/Stock_Price_Test.csv')
dataset_test.head()
real_stock_price = dataset_test.loc[:, ['Open']].values
real_stock_price

# Getting the predicted stock price
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis=0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - timesteps:].values.reshape(-1, 1)
inputs = scaler.transform(inputs)  # minmax scaler
inputs
X_test = []
for i in range(timesteps, 70):
    X_test.append(inputs[i - timesteps:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = scaler.inverse_transform(predicted_stock_price)
# inverse_transform converts the scaled predictions back to the real value range
plt.plot(real_stock_price, color='red', label='Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
data = pd.read_csv('../input/international-airline-passengers/international-airline-passe
data.head()
dataset = data.iloc[:, 1].values
plt.plot(dataset)
plt.xlabel('time')
plt.ylabel('number of passengers (in thousands)')
plt.title('Passengers')
plt.show()
dataset = dataset.reshape(-1,1) #(145, ) iken (145,1)e çevirdik
dataset = dataset.astype('float32')
dataset.shape
scaler = MinMaxScaler(feature_range= (0,1))
dataset = scaler.fit_transform(dataset)
train_size = int(len(dataset) * 0.5)
test_size = len(dataset)- train_size
train = dataset[0:train_size, :]
test = dataset[train_size:len(dataset), :]
print('train size: {}, test size: {}'.format(len(train), len(test)))
dataX = []
datay = []
timestemp = 10
for i in range(len(train) - timestemp - 1):
    a = train[i:(i + timestemp), 0]
    dataX.append(a)
    datay.append(train[i + timestemp, 0])
trainX, trainy = np.array(dataX), np.array(datay)
dataX = []
datay = []
for i in range(len(test) - timestemp - 1):
    a = test[i:(i + timestemp), 0]
    dataX.append(a)
    datay.append(test[i + timestemp, 0])
testX, testy = np.array(dataX), np.array(datay)
trainX.shape
trainX = np.reshape(trainX, (trainX.shape[0],1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0],1, testX.shape[1]))
trainX.shape
# Creating LSTM Model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# model
model = Sequential()
model.add(LSTM(10, input_shape=(1, timestemp))) # 10 lstm neuron(block)
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainy, epochs=50, batch_size=1)
#make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainy = scaler.inverse_transform([trainy])
testPredict = scaler.inverse_transform(testPredict)
testy = scaler.inverse_transform([testy])
import math
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainy[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testy[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
# shifting train
trainPredictPlot = np.empty_like(dataset)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[timestemp:len(trainPredict)+timestemp, :] = trainPredict
# shifting test predictions for plotting
testPredictPlot = np.empty_like(dataset)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict)+(timestemp*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
sequence of data into another. This makes them invaluable for tasks like machine
translation, text summarization, speech recognition, and chatbots, where
understanding and generating sequences are crucial. There are two main
important components of seq2seq model like Encoder-Decoder Architecture and
attention mechanism. Encode-decoder architecture from seq2seq model consists
of two main components:
Encoder: Processes the input sequence, capturing its meaning and context.
It can be an RNN like LSTM or GRU, or a transformer-based architecture.
Decoder: Generates the output sequence, conditioned on the information
encoded by the encoder. It also uses an RNN or transformer-based
architecture, often with an attention mechanism to focus on relevant parts of
the encoded sequence.
On the other hand attention mechanism is the key component that allows the
decoder to selectively attend to different parts of the encoded sequence when
generating each element of the output sequence. This helps capture long-range
dependencies and improve the accuracy and coherence of the generated output.
We can use it to translate text from one language to another, taking the context and
grammar of both languages into account, and to generate a concise summary of a
longer document that captures its main points and overall meaning.
Seq2seq models can also convert spoken language into text, allowing for the
nuances of pronunciation and context, and power conversational agents that
understand and respond to user queries in a natural way. They are likewise useful
for generating descriptions of images from their visual content and for composing
musical pieces in specific styles or themes. Modern seq2seq architectures include
transformer-based models such as T5 and BART, which rely on advanced attention
mechanisms like self-attention and masked attention, and they can be trained with
a variety of loss functions and training techniques. Libraries such as TensorFlow
and PyTorch can be used for building and training seq2seq models. Let's have a
look at the Python implementation:
import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.models import Model
# Define the Encoder
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units):
        super(Encoder, self).__init__()
        self.embedding = Embedding(vocab_size, embedding_dim)
        self.lstm = LSTM(enc_units, return_sequences=True, return_state=True)

    def call(self, x):
        x = self.embedding(x)
        output, state_h, state_c = self.lstm(x)
        return output, state_h, state_c
# Define the Decoder
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units):
        super(Decoder, self).__init__()
        self.embedding = Embedding(vocab_size, embedding_dim)
        self.lstm = LSTM(dec_units, return_sequences=True, return_state=True)
        self.dense = Dense(vocab_size, activation='softmax')

    def call(self, x, initial_state):
        x = self.embedding(x)
        output, _, _ = self.lstm(x, initial_state=initial_state)
        prediction = self.dense(output)
        return prediction
# Define the Seq2Seq Model
class Seq2SeqModel(tf.keras.Model):
    def __init__(self, encoder, decoder):
        super(Seq2SeqModel, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def call(self, inputs):
        source, target = inputs
        enc_output, enc_state_h, enc_state_c = self.encoder(source)
        dec_output = self.decoder(target, initial_state=[enc_state_h, enc_state_c])
        return dec_output
# Define the hyperparameters and instantiate the model
vocab_size = 10000 # Example vocabulary size
embedding_dim = 256
enc_units = 512
dec_units = 512
encoder = Encoder(vocab_size, embedding_dim, enc_units)
decoder = Decoder(vocab_size, embedding_dim, dec_units)
seq2seq_model = Seq2SeqModel(encoder, decoder)
# Compile the model (you may choose an appropriate optimizer and loss function)
seq2seq_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
The transfer learning emerges as a powerful technique for accelerating and
enhancing the training process, especially for complex tasks and limited data. It
involves leveraging the knowledge gained from a pretrained model on one task
(source task) to improve performance on a related task (target task). Instead of
training a model from scratch on your specific dataset, you utilize a pretrained
model that has already learned valuable representations from a large dataset
related to your task. This saves time and computational resources.
Pretrained models often contain rich feature representations that can be
adapted to your target task, leading to faster convergence and potentially better
performance compared to training from scratch. When you have limited labeled
data for your specific task, transfer learning allows you to leverage the knowledge
from a larger dataset, mitigating the data scarcity issue. You don’t simply copy the
entire pretrained model. Instead, you typically fine-tune its layers, adjusting the
weights and biases toward your specific task using your limited data. This balances
the benefits of pretrained knowledge with the need to adapt to your specific
problem.
Transfer learning offers a valuable toolbox for deep learning practitioners,
enabling faster training, improved performance, and efficient utilization of limited
data. By carefully selecting pretrained models, designing appropriate fine-tuning
strategies, and considering the limitations, you can leverage this technique to
unlock the power of deep learning for your specific tasks. Let's have a look at the
following code, which shows an implementation:
import numpy as np
import keras
from keras import layers
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
layer = keras.layers.Dense(3)
layer.build((None, 4)) # Create the weights
print("weights:", len(layer.weights))
print("trainable_weights:", len(layer.trainable_weights))
print("non_trainable_weights:", len(layer.non_trainable_weights))
layer = keras.layers.BatchNormalization()
layer.build((None, 4)) # Create the weights
print("weights:", len(layer.weights))
print("trainable_weights:", len(layer.trainable_weights))
print("non_trainable_weights:", len(layer.non_trainable_weights))
layer = keras.layers.Dense(3)
layer.build((None, 4)) # Create the weights
layer.trainable = False # Freeze the layer
print("weights:", len(layer.weights))
print("trainable_weights:", len(layer.trainable_weights))
print("non_trainable_weights:", len(layer.non_trainable_weights))
# Make a model with 2 layers
layer1 = keras.layers.Dense(3, activation="relu")
layer2 = keras.layers.Dense(3, activation="sigmoid")
model = keras.Sequential([keras.Input(shape=(3,)), layer1, layer2])
# Freeze the first layer
layer1.trainable = False
# Keep a copy of the weights of layer1 for later reference
initial_layer1_weights_values = layer1.get_weights()
# Train the model
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.random((2, 3)), np.random.random((2, 3)))
# Check that the weights of layer1 have not changed during training
final_layer1_weights_values = layer1.get_weights()
np.testing.assert_allclose(
initial_layer1_weights_values[0], final_layer1_weights_values[0]
)
np.testing.assert_allclose(
initial_layer1_weights_values[1], final_layer1_weights_values[1]
)
inner_model = keras.Sequential(
[
keras.Input(shape=(3,)),
keras.layers.Dense(3, activation="relu"),
keras.layers.Dense(3, activation="relu"),
]
)
model = keras.Sequential(
[
keras.Input(shape=(3,)),
inner_model,
keras.layers.Dense(3, activation="sigmoid"),
]
)
model.trainable = False # Freeze the outer model
assert inner_model.trainable == False  # All layers in `model` are now frozen
base_model = keras.applications.Xception(
weights='imagenet', # Load weights pre-trained on ImageNet.
input_shape=(150, 150, 3),
include_top=False) # Do not include the ImageNet classifier at the top.
base_model.trainable = False
inputs = keras.Input(shape=(150, 150, 3))
# We make sure that the base_model is running in inference mode here,
# by passing `training=False`.
x = base_model(inputs, training=False)
# Convert features of shape `base_model.output_shape[1:]` to vectors
x = keras.layers.GlobalAveragePooling2D()(x)
# A Dense classifier with a single unit (binary classification)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(),
loss=keras.losses.BinaryCrossentropy(from_logits=True),
metrics=[keras.metrics.BinaryAccuracy()])
# model.fit(new_dataset, epochs=20, callbacks=..., validation_data=...)
# Unfreeze the base model
base_model.trainable = True
# It's important to recompile your model after you make any changes
# to the `trainable` attribute of any inner layer, so that your changes
# are taken into account.
model.compile(optimizer=keras.optimizers.Adam(1e-5),
# Very low learning rate
loss=keras.losses.BinaryCrossentropy(from_logits=True),
metrics=[keras.metrics.BinaryAccuracy()])
# Train end-to-end. Be careful to stop before you overfit!
# model.fit(new_dataset, epochs=10, callbacks=..., validation_data=...)
tfds.disable_progress_bar()
train_ds, validation_ds, test_ds = tfds.load(
"cats_vs_dogs",
# Reserve 10% for validation and 10% for test
split=["train[:40%]", "train[40%:50%]", "train[50%:60%]"],
as_supervised=True, # Include labels
)
print(f"Number of training samples: {train_ds.cardinality()}")
print(f"Number of validation samples: {validation_ds.cardinality()}")
print(f"Number of test samples: {test_ds.cardinality()}")
plt.figure(figsize=(10, 10))
for i, (image, label) in enumerate(train_ds.take(9)):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image)
    plt.title(int(label))
    plt.axis("off")
resize_fn = keras.layers.Resizing(150, 150)
train_ds = train_ds.map(lambda x, y: (resize_fn(x), y))
validation_ds = validation_ds.map(lambda x, y: (resize_fn(x), y))
test_ds = test_ds.map(lambda x, y: (resize_fn(x), y))
augmentation_layers = [
layers.RandomFlip("horizontal"),
layers.RandomRotation(0.1),
]
def data_augmentation(x):
    for layer in augmentation_layers:
        x = layer(x)
    return x
train_ds = train_ds.map(lambda x, y: (data_augmentation(x), y))
from tensorflow import data as tf_data
batch_size = 64
train_ds = train_ds.batch(batch_size).prefetch(tf_data.AUTOTUNE).cache()
validation_ds = validation_ds.batch(batch_size).prefetch(tf_data.AUTOTUNE).cache()
test_ds = test_ds.batch(batch_size).prefetch(tf_data.AUTOTUNE).cache()
for images, labels in train_ds.take(1):
    plt.figure(figsize=(10, 10))
    first_image = images[0]
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        augmented_image = data_augmentation(np.expand_dims(first_image, 0))
        plt.imshow(np.array(augmented_image[0]).astype("int32"))
        plt.title(int(labels[0]))
        plt.axis("off")
base_model = keras.applications.Xception(
weights="imagenet", # Load weights pre-trained on ImageNet.
input_shape=(150, 150, 3),
include_top=False,
) # Do not include the ImageNet classifier at the top.
# Freeze the base_model
base_model.trainable = False
# Create new model on top
inputs = keras.Input(shape=(150, 150, 3))
# Pre-trained Xception weights require that input be scaled
# from (0, 255) to a range of (-1., +1.); the rescaling layer
# outputs: `(inputs * scale) + offset`
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(inputs)
# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x) # Regularize with dropout
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
model.summary(show_trainable=True)
model.compile(
optimizer=keras.optimizers.Adam(),
loss=keras.losses.BinaryCrossentropy(from_logits=True),
metrics=[keras.metrics.BinaryAccuracy()],
)
epochs = 2
print("Fitting the top layer of the model")
model.fit(train_ds, epochs=epochs, validation_data=validation_ds)
# Unfreeze the base_model. Note that it keeps running in inference mode
# since we passed `training=False` when calling it. This means that
# the batchnorm layers will not update their batch statistics.
# This prevents the batchnorm layers from undoing all the training
# we've done so far.
base_model.trainable = True
model.summary(show_trainable=True)
model.compile(
optimizer=keras.optimizers.Adam(1e-5), # Low learning rate
loss=keras.losses.BinaryCrossentropy(from_logits=True),
metrics=[keras.metrics.BinaryAccuracy()],
)
epochs = 1
print("Fitting the end-to-end model")
model.fit(train_ds, epochs=epochs, validation_data=validation_ds)
print("Test dataset evaluation")
model.evaluate(test_ds)
In the realm of deep learning, pretrained models stand as powerful allies, offering
a significant head start for tackling new tasks. By leveraging their prelearned
knowledge, you can accelerate training, enhance performance, and overcome data
scarcity challenges. These are deep neural networks already trained on large,
diverse datasets for general tasks like image recognition or NLP. They act as a
foundation upon which you build further capabilities. Transfer learning is the core
technique in which you utilize a pretrained model as a starting point for your
specific task.
Fine-tuning involves adjusting the weights and biases of the pretrained model,
typically in the later layers, using your own task-specific data. This adapts the
general features extracted earlier to your specific problem. Train your model with
your data, but only update the weights of the chosen layers (fine-tuning) while
keeping the earlier layers (pretrained) frozen (not updated). Fine-tuning offers
several advantages.
them effectively. GANs are a powerful but complex tool. Understanding their core
concepts, strengths, limitations, and best practices is crucial for successful
implementation and achieving your desired generative outcomes.
$$\text{Mean squared error} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
4. Data preprocessing and augmentation: Prepare your training data
carefully and consider data augmentation to improve the quality and
diversity of generated samples.
5. Monitoring and visualization: Continuously monitor the training process
and visualize generated outputs to identify potential issues and assess
progress.
2. Dropout: During training, some neurons are randomly deactivated, forcing
the network to learn robust features that don’t rely on specific activations.
3. Early stopping: Training is stopped when the model’s performance on a
validation set starts to deteriorate, preventing further overfitting.
4. Data augmentation: Artificially increasing the size and diversity of your
training data can improve model generalizability (a short Keras sketch
combining dropout and early stopping follows this list).
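As a minimal sketch, assuming a small synthetic binary-classification dataset (none of these names or values come from the book's own listings), dropout and early stopping can be combined in Keras as follows:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
# Synthetic data for illustration only.
X_train = np.random.random((200, 20))
y_train = np.random.randint(0, 2, size=(200, 1))
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),                      # randomly deactivate 30% of the units during training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Stop training once validation loss stops improving (early stopping).
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=16, callbacks=[early_stop])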
Regularization can make the optimization landscape smoother, with fewer local
minima, making it easier for optimization algorithms to find the global minimum.
Optimization algorithms can influence the effectiveness of regularization. For
example, using a learning rate that is too high can negate the benefits of
regularization. There’s no one-size-fits-all approach. The best combination of
regularization and optimization techniques depends on your specific problem,
data, and computational resources. Experimentation is crucial. Let's have a look at
the linear regression code given below:
import mglearn as ml
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from numpy import genfromtxt
dataset = genfromtxt('→https://raw.githubusercontent.com/m-mehdi/tutorials/main/bo
X = dataset[:, :-1]
y = dataset[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
lr = LinearRegression().fit(X_train, y_train)
print(f"Linear Regression-Training set score: {lr.score(X_train, y_train):.2f}"
print(f"Linear Regression-Test set score: {lr.score(X_test, y_test):.2f}")
==================================Output===================================
===========================================================================
Try different techniques and combinations to find what works best for your task.
Monitor your model’s performance on both training and validation data to avoid
overfitting. By understanding the roles of regularization and optimization, you can
make informed decisions to train deep learning models that are both accurate and
generalizable, performing well on unseen data and avoiding the pitfalls of
overfitting. Remember, the journey to optimal performance often involves iterative
experimentation and careful tuning of these essential elements. Commonly used
optimization algorithms include gradient descent, momentum, RMSprop, and Adam;
the following listings continue with ridge and lasso regularization on the same dataset:
import mglearn as ml
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from numpy import genfromtxt
dataset = genfromtxt('→https://raw.githubusercontent.com/m-mehdi/tutorials/main/bo
X = dataset[:, :-1]
y = dataset[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.7).fit(X_train, y_train)
print(f"Ridge Regression-Training set score: {ridge.score(X_train, y_train):
print(f"Ridge Regression-Test set score: {ridge.score(X_test, y_test):.2f}")
===================================Output==================================
===========================================================================
import mglearn as ml
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from numpy import genfromtxt
dataset = genfromtxt('→https://raw.githubusercontent.com/m-mehdi/tutorials/main/bo
X = dataset[:, :-1]
y = dataset[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
from sklearn.linear_model import Lasso
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print(f"Lasso Regression-Training set score: {lasso.score(X_train, y_train):
print(f"Lasso Regression-Test set score: {lasso.score(X_test, y_test):.2f}")
===============================Lasso Regression=============================
===========================================================================
Fig. 7.10: Batch normalization.
process more stable and allowing the network to learn faster.
2. Improved performance: The stabilized learning process often leads to
better overall performance, achieving higher accuracy compared to models
without BatchNorm.
3. Less sensitivity to initialization: BatchNorm makes neural networks less
sensitive to the choice of initial weights, easing the training process and
reducing the risk of getting stuck in bad local minima.
4. Reduced gradient vanishing/exploding: Normalization can help mitigate
the issue of vanishing or exploding gradients, which can hinder training in
deep networks. We apply a batch normalization layer as follows for a
minibatch:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$
$$y_i = \gamma \hat{x}_i + \beta = \mathrm{BN}_{\gamma,\beta}(x_i)$$
BatchNorm is a valuable tool for training deep neural networks, offering faster
convergence, improved performance, and reduced sensitivity to initialization. By
understanding its core principles, benefits, and considerations, you can effectively
leverage BatchNorm to achieve better results in your deep learning projects. The
working mechanism of BatchNorm is given below, followed by a short Keras sketch:
Calculate mean and standard deviation: For each layer and each batch of
data, BatchNorm computes the mean and standard deviation of the
activations across that batch.
Normalize activations: Each activation in the batch then has the mean
subtracted from it and is divided by the standard deviation.
Scale and shift: To preserve information, the normalized activations are
multiplied by learned scale and shift parameters (gamma and beta), allowing
the network to adapt to the normalization step.
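As a minimal sketch (not one of the book's own listings), a BatchNormalization layer can be placed between a dense layer and its activation in Keras; the input dimensionality of 20 is an assumption, and gamma and beta are the learned scale and shift parameters described above:
from tensorflow import keras
from tensorflow.keras import layers
bn_model = keras.Sequential([
    keras.Input(shape=(20,)),                 # assumed input dimensionality
    layers.Dense(64),
    layers.BatchNormalization(),              # normalize, then scale (gamma) and shift (beta)
    layers.Activation("relu"),
    layers.Dense(1),
])
bn_model.compile(optimizer="adam", loss="mse")
bn_model.summary()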
7.15.2 Best Practices of BatchNorm
1. Batch size: BatchNorm relies on accurate statistics within each batch, so very
small batch sizes can make these estimates noisy; moderate to large batch
sizes generally work better.
2. Hyperparameter tuning: Tuning the learning rate and adjusting the scale
and shift parameters can be crucial for optimal performance.
3. Minibatch statistics: BatchNorm uses statistics from the current minibatch,
which can be an approximation of the population statistics. This might lead
to slight inconsistencies during training and inference.
4. Alternative normalization techniques: Other normalization techniques like
layer normalization and group normalization exist, offering different trade-
offs and potentially better performance in specific scenarios.
safety net against exploding gradients and aiding in stable and efficient training.
By understanding its principles, benefits, and considerations, you can effectively
implement gradient clipping to enhance the performance and robustness of your
deep learning models. The three main gradient clipping approaches are clipping by
value, clipping by norm, and clipping by global norm.
Let's dive deeper into gradient clipping with the following Python code.
import tensorflow as tf
from tensorflow.keras import Model, layers
import numpy as np
import tensorflow_datasets as tfds
# Hyperparameters
num_classes = 10 # total classes (0-9 digits).
num_features = 784 # data features (img shape: 28*28).
# Training Parameters
learning_rate = 0.001
training_steps = 1000
batch_size = 32
display_step = 100
# Network Parameters
# MNIST image shape is 28*28px, we will then handle 28 sequences of 28 timesteps for every sample.
num_input = 28 # number of sequences.
timesteps = 28 # timesteps.
num_units = 32 # number of neurons for the LSTM layer.
print(tf.__version__)
import neptune
run = neptune.init_run(project='common/tf-keras-integration', api_token='ANONYMOUS')
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Convert to float32.
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)
# Reshape images into sequences of 28 timesteps with 28 features each (28*28 pixels).
x_train, x_test = x_train.reshape([-1, 28, 28]), x_test.reshape([-1, 28, 28])
# Normalize images value from [0, 255] to [0, 1].
x_train, x_test = x_train / 255., x_test / 255.
# Use tf.data API to shuffle and batch data.
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.repeat().shuffle(5000).batch(batch_size).prefetch(1)
# Create LSTM Model.
class Net(Model):
    # Set layers.
    def __init__(self):
        super(Net, self).__init__()
        # RNN (LSTM) hidden layer.
        self.lstm_layer = layers.LSTM(units=num_units)
        self.out = layers.Dense(num_classes)

    # Set forward pass.
    def __call__(self, x, is_training=False):
        # LSTM layer.
        x = self.lstm_layer(x)
        # Output layer (num_classes).
        x = self.out(x)
        if not is_training:
            # tf cross entropy expects logits without softmax, so only
            # apply softmax when not training.
            x = tf.nn.softmax(x)
        return x
# Build LSTM model.
network = Net()
# Cross-Entropy Loss.
# Note that this will apply 'softmax' to the logits.
def cross_entropy_loss(x, y):
    # Convert labels to int64 for the tf cross-entropy function.
    y = tf.cast(y, tf.int64)
    # Apply softmax to logits and compute cross-entropy.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=x)
    # Average loss across the batch.
    return tf.reduce_mean(loss)
# Accuracy metric.
def accuracy(y_pred, y_true):
    # Predicted class is the index of the highest score in the prediction vector (i.e. argmax).
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32), axis=-1)
# Adam optimizer.
optimizer = tf.optimizers.Adam(learning_rate)
# Optimization process.
def run_optimization(x, y):
    # Wrap computation inside a GradientTape for automatic differentiation.
    with tf.GradientTape() as tape:
        # Forward pass.
        pred = network(x, is_training=True)
        # Compute loss.
        loss = cross_entropy_loss(pred, y)
    # Variables to update, i.e. trainable variables.
    trainable_variables = network.trainable_variables
    # Compute gradients.
    gradients = tape.gradient(loss, trainable_variables)
    # Clip-by-value on all trainable gradients.
    gradients = [tf.clip_by_value(grad, clip_value_min=-1.0, clip_value_max=1.0) for grad in gradients]
    # Update weights following gradients.
    optimizer.apply_gradients(zip(gradients, trainable_variables))
# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):
    # Run the optimization to update W and b values.
    run_optimization(batch_x, batch_y)
    if step % display_step == 0:
        pred = network(batch_x, is_training=True)
        loss = cross_entropy_loss(pred, batch_y)
        acc = accuracy(pred, batch_y)
        run['monitoring/logs/loss'].log(loss)
        run['monitoring/logs/acc'].log(acc)
        print("step: %i, loss: %f, accuracy: %f" % (step, loss, acc))
Some of the best practices of gradient clipping approaches are given below:
Alternative techniques: Other techniques like gradient normalization and
adaptive learning rates can also help address exploding gradients; a short
sketch of norm-based clipping is given below.
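As a minimal, self-contained sketch (the toy gradient tensors are hypothetical and do not come from the listing above), clipping by norm and by global norm can be performed with TensorFlow's built-in operations:
import tensorflow as tf
# Toy gradients for illustration only.
gradients = [tf.constant([3.0, 4.0]), tf.constant([0.3, 0.4])]
# Per-tensor clipping: rescale each gradient so its L2 norm is at most 1.0.
clipped_by_norm = [tf.clip_by_norm(g, clip_norm=1.0) for g in gradients]
# Global clipping: rescale all gradients together so their joint norm is at most 1.0.
clipped_global, global_norm = tf.clip_by_global_norm(gradients, clip_norm=1.0)
print([g.numpy() for g in clipped_by_norm])
print([g.numpy() for g in clipped_global], global_norm.numpy())
In a training loop, either list of clipped gradients would then be passed to optimizer.apply_gradients, exactly as the clip-by-value variant is used in the listing above.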
Summary
The ANN is inspired by the human brain, and neural networks are interconnected
layers of artificial neurons that process information. Each neuron performs a
simple computation, and the connections between them determine the overall
behavior of the network. These networks learn by adjusting the connections
between neurons based on training data. The perceptrons and activation functions
serve as fundamental elements in constructing neural networks, paving the way
for complex learning and decision-making capabilities. Let’s delve into their
individual roles and how they work together. Imagine a perceptron as a simple
neuron-like structure that receives input signals, processes them, and generates
an output. It’s the building block of neural networks, responsible for basic
computations and information flow. TensorFlow is a powerful and versatile open-
source software library, primarily used for developing and deploying machine
learning and deep learning models. It provides a flexible and efficient platform
for various tasks from image recognition and NLP to self-driving cars and scientific
computing. Keras is a high-level API for building and training deep learning
models, developed and maintained by Google. It sits on top of powerful libraries
like TensorFlow and Theano, providing a simpler and more user-friendly interface
for deep learning tasks.
CNNs have emerged as champions in the field of image and video analysis.
Their unique architecture, inspired by the human visual system, allows them to
excel at tasks like image recognition, object detection, and video classification.
Let’s delve into the core concepts and capabilities of CNNs. CNNs play a crucial
role in image processing, excelling in various tasks due to their unique architecture
inspired by the human visual system. Let’s delve deeper into how CNNs contribute
to image processing and explore specific applications. In the fascinating world of
image processing, CNN architecture serves as the blueprint for CNNs, dictating
the flow of information and enabling them to excel at tasks like image
classification, object detection, and image segmentation.
The pooling and dropout layers play distinct yet essential roles in boosting
performance and preventing overfitting. Here’s a breakdown of their individual
functionalities and how they work together. RNNs emerge as a powerful tool for
processing sequential data, where the order of elements matters. Unlike
traditional neural networks that handle independent inputs, RNNs introduce a key
difference compared to feedforward neural networks: internal memory. This
memory allows them to retain information from previous inputs and use it to
process the current input, enabling them to capture dependencies and context
within sequential data. The sequential data emerges as a unique type of
information where the order of elements is crucial. Unlike independent data
points, understanding sequential data requires considering the past and
anticipating the future within its inherent structure. Imagine a sentence, a song, a
stock market timeline, or a video clip. These all represent sequential data, where
each element (word, note, price point, frame) carries intrinsic meaning and
influences those that follow. Processing sequential data effectively requires
methods that capture these dependencies and context.
The LSTM and GRU networks stand out as powerful tools for handling
sequential data, where order matters. Both are special types of RNNs, designed
to overcome a major limitation of traditional RNNs: the vanishing gradient
problem. This problem hinders their ability to learn long-term dependencies within
sequences. Seq2seq (sequence-to-sequence) models bridge the gap between
different sequences. They emerge as a powerful and versatile tool for tasks involving the
transformation of one sequence of data into another. From translating
languages to generating captions for images, these models excel at capturing the
relationships and context within sequences, enabling them to perform various
impressive tasks. Transfer learning is about building on existing knowledge for
faster progress. It is a powerful technique that allows you to
leverage knowledge gained from one task to improve performance on a
related one. Imagine training a dog to fetch a specific toy; you wouldn’t start from
scratch each time it encounters a new object. Similarly, transfer learning enables
models to “remember” what they’ve learned previously and adapt it to new
situations, significantly accelerating the learning process and improving results.
Fine-tuning involves modifying the weights of a pretrained model to adapt it to
a new task. Typically, the lower layers of the model, which capture general
features, are frozen, while the higher layers, responsible for more specific learning,
are fine-tuned on your own dataset.
The generative adversarial networks (GANs) stand out as a powerful and
fascinating technique for generating new data, like images, text, or music. Imagine
creating realistic portraits of people who never existed, or composing music in the
style of your favorite artist – that’s the kind of magic GANs can achieve!
Vanilla GAN: The original GAN architecture, with separate generator and
discriminator networks.
Deep convolutional GAN (DCGAN): Leverages convolutional layers in both
the generator and discriminator, particularly effective for image generation.
Wasserstein GAN (WGAN): Improves training stability by using a different loss
function and gradient penalty.
Generative adversarial networks with gradient penalty (GAN-GP):
Combines aspects of DCGAN and WGAN for improved stability and performance.
StyleGAN: Utilizes style transfer techniques to generate highly diverse and
realistic images.
Regularization and optimization are two fundamental techniques that
work hand-in-hand to improve the performance and generalizability of your
models. Batch normalization (BatchNorm) is a powerful technique
that improves the training speed and stability of neural networks. It acts like a
magic ingredient, smoothing the training process and often leading to better
performance. Gradient clipping is a crucial technique for
preventing exploding gradients, a phenomenon that can hinder the training
process and lead to unstable or even diverging models. Imagine training a dog:
you wouldn’t pull too hard on the leash, as it could make them resist or even run
away. Similarly, gradient clipping helps you “control” the learning process by
setting reasonable limits on the changes made to your model’s weights.
Exercise (MCQs)
1.
In a convolutional neural network (CNN), what does the “pooling layer” do?
A) Normalizes activations in the previous layer
B) Reduces the dimensionality of the feature maps
C) Softmax
D) Linear
3.
What is the main difference between gradient descent and Adam, two
popular optimization algorithms for neural networks?
A) Adam uses adaptive learning rates, while gradient descent has a fixed rate.
B) Adam is faster for large datasets, while gradient descent is better for small
datasets.
C) Adam requires less tuning of hyperparameters compared to gradient
descent.
D) Adam is more prone to overfitting than gradient descent.
4.
What are the main challenges associated with training generative adversarial
networks (GANs)?
A) Selecting the right architecture for both the generator and discriminator
B) Ensuring stable training and avoiding mode collapse
D)
All of the above
6.
C) Early stopping
7.
How can you interpret the weights of a trained neural network to understand
what it has learned?
A) By directly analyzing the weight values
B) Using visualization techniques like saliency maps
What are the ethical considerations involved in using deep learning models,
especially those trained on large datasets?
A) Bias and fairness in decision-making
B) Explainability and interpretability of model predictions
What are the latest advancements and research directions in the field of deep
learning?
A) Explainable AI (XAI) for interpretable models
B) Continual learning for adapting to new data streams
D) Neuromorphic computing for more efficient hardware
Answer Key
1. b) Reduces the dimensionality of the feature maps
2. d) Linear
3. a) Adam uses adaptive learning rates, while gradient descent has a fixed
rate.
4. a) To improve accuracy by reducing overfitting
5. d) All of the above
6. d) Data augmentation with label smoothing
7. d) All of the above
8. d) All of the above
9. d) All of the above
10. c) Transformers for natural language processing and beyond
4. ______________ is a valuable tool for training deep neural networks, offering
faster convergence, improved performance, and reduced sensitivity to
initialization.
5. Gradient clipping sets a_____________ for the magnitude of gradients.
Answers
1. Neural networks
2. Perceptrons
3. Pooling layer
4. BatchNorm
5. Maximum threshold
Chapter 8 Specialized Applications and Case
Studies
import nltk
# downloading and installing required libraries
nltk.download('maxent_ne_chunker')
nltk.download('punkt')
nltk.download('words')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
from nltk.tokenize import sent_tokenize, word_tokenize
text = """Natural Language Processing (NLP) is a field of artificial intelligence (AI) that de
sentences = sent_tokenize(text)
print(sentences)
words = word_tokenize(text)
print(words)
===============================Output======================================
['Natural Language Processing (NLP) is a field of artificial intelligence (AI) that
deals \nwith the interaction between computers and human language.', 'Its
goal is to enable computers \nto understand, interpret, and manipulate
natural language in a way that is similar to how \nhumans do.', 'Its primary
goal is to enable computers to understand, interpret, and generate \nhuman
language in a manner that is both meaningful and useful.', 'This includes
written text, \nspoken language, and even sign language.']
['Natural', 'Language', 'Processing', '(', 'NLP', ')', 'is', 'a', 'field', 'of', 'artificial',
'intelligence', '(', 'AI', ')', 'that', 'deals', 'with', 'the', 'interaction', 'between',
'computers', 'and', 'human', 'language', '.', 'Its', 'goal', 'is', 'to', 'enable',
'computers', 'to', 'understand', ',', 'interpret', ',', 'and', 'manipulate', 'natural',
'language', 'in', 'a', 'way', 'that', 'is', 'similar', 'to', 'how', 'humans', 'do', '.',
'Its', 'primary', 'goal', 'is', 'to', 'enable', 'computers', 'to', 'understand', ',',
'interpret', ',', 'and', 'generate', 'human', 'language', 'in', 'a', 'manner', 'that',
'is', 'both', 'meaningful', 'and', 'useful', '.', 'This', 'includes', 'written', 'text', ',',
'spoken', 'language', ',', 'and', 'even', 'sign', 'language', '.']
===========================================================================
The above code will tokenize the words and sentences. It shows that NLP is a
rapidly growing field with the potential to revolutionize the way we interact with
computers and information. NLP techniques can analyze the structure and
semantics of language to understand the intended meaning behind words and
sentences. NLP can be used to create text, translate languages, and even write
different kinds of creative content. We can use NLP in some of the fields below:
concise summaries of longer texts while preserving the key information and
main points. This is useful for quickly extracting important information from
large documents or articles.
5. Automatic language identification: Identify the language a piece of text is
written in. Understanding the meaning and intent behind user queries or
commands. This involves tasks like intent recognition, slot filling, and
dialogue management in conversational systems.
6. Speech recognition and text-to-speech: Convert speech to text and vice
versa. Converting spoken language into text. This is the technology behind
virtual assistants like Siri, Alexa, and Google Assistant.
7. Named entity recognition (NER): Identifying and classifying named entities
mentioned in text into predefined categories such as names of persons,
organizations, and locations.
8. Text generation: Generating human-like text based on given input or
prompts. This can be used for various applications such as chatbots, content
generation, and dialogue systems.
9. Question answering: Automatically answering questions posed in natural
language based on a given context or knowledge base. This includes tasks
like reading comprehension and FAQ systems.
10. Text mining: Extracting useful insights and patterns from large volumes of
text data. This includes techniques such as text clustering, topic modeling,
and trend analysis.
11. Part-of-speech (POS) tagging: It helps improve the performance of
sequence labeling tasks by incorporating word embeddings as features in
machine learning models. Let's have a look at the following code:
===================================Output==================================
'TO'), ('understand', 'VB'), (',', ','), ('interpret', 'VB'), (',', ','), ('and', 'CC'),
('manipulate', 'VB'), ('natural', 'JJ'), ('language', 'NN'), ('in', 'IN'), ('a', 'DT'),
('way', 'NN'), ('that', 'WDT'), ('is', 'VBZ'), ('similar', 'JJ'), ('to', 'TO'), ('how',
'WRB'), ('humans', 'NNS'), ('do', 'VBP'), ('.', '.'), ('Its', 'PRP$'), ('primary', 'JJ'),
('goal', 'NN'), ('is', 'VBZ'), ('to', 'TO'), ('enable', 'JJ'), ('computers', 'NNS'), ('to',
'TO'), ('understand', 'VB'), (',', ','), ('interpret', 'VB'), (',', ','), ('and', 'CC'),
('generate', 'VB'), ('human', 'JJ'), ('language', 'NN'), ('in', 'IN'), ('a', 'DT'),
('manner', 'NN'), ('that', 'WDT'), ('is', 'VBZ'), ('both', 'DT'), ('meaningful', 'JJ'),
('and', 'CC'), ('useful', 'JJ'), ('.', '.'), ('This', 'DT'), ('includes', 'VBZ'), ('written',
'VBN'), ('text', 'NN'), (',', ','), ('spoken', 'JJ'), ('language', 'NN'), (',', ','), ('and',
'CC'), ('even', 'RB'), ('sign', 'JJ'), ('language', 'NN'), ('.', '.')]
===========================================================================
NLP uses various machine learning techniques, such as deep learning, to analyze
and process language data. These methods are used to analyze the patterns and
relationships between different elements of language. NLP incorporates
knowledge of grammar, syntax, semantics, and pragmatics to understand the
nuances of human language. NLP can help break down language barriers and
facilitate communication between people and machines. NLP can automate tasks
that currently require human intervention such as analyzing customer feedback or
translating documents. NLP can help organizations make better decisions by
providing insights from large amounts of textual data. Like every coin, however, NLP
has two sides: alongside its many strengths, it still faces a few challenges, some of
which are given below:
8.1.1 Tokenization
Tokenization is a fundamental task in NLP that involves breaking down a text into
smaller units called tokens. These tokens could be words, subwords, characters, or
even phrases, depending on the specific requirements of the task at hand. The
process of tokenization plays a crucial role in many NLP tasks because it forms the
basis for further analysis and processing. These tokens can be individual words,
characters, sentences, or even phrases, depending on the specific task and chosen
technique. Computers cannot directly understand the meaning of continuous text.
Tokenization transforms it into a format that machines can process and analyze.
Tokenization lays the groundwork for various NLP tasks like sentiment analysis,
machine translation, and text classification. Tokenization also comes with its own
challenges, some of which are given below:
humans through natural language.” would be tokenized into [“NLP is a
fascinating field.”, “It involves the interaction between computers and
humans through natural language.”]. This divides the text into individual
sentences at full stops, exclamation marks, or question marks.
3. Subword tokenization: Subword tokenization breaks down words into
smaller meaningful units called subwords or morphemes. This is particularly
useful for handling out-of-vocabulary words and dealing with languages with
complex morphology. Techniques like byte pair encoding and WordPiece are
commonly used for subword tokenization.
4. Character tokenization: In character tokenization, each character in the
text becomes a separate token. This approach is useful when analyzing text
at a very fine-grained level or when dealing with languages with complex
scripts. This breaks down the text into individual characters, which is useful
for certain tasks like spelling correction or character-level language models.
5. Phrasal tokenization: Phrasal tokenization involves grouping consecutive
words into phrases or chunks based on predefined rules or patterns. This
can be useful for tasks like named entity recognition or chunking.
6. N-gram tokenization: This creates sequences of n consecutive words,
useful for tasks like language modeling and machine translation (see the
short sketch after this list).
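As a minimal sketch (not one of the book's own listings), character-level and n-gram tokenization can be illustrated with NLTK on a short sample sentence:
import nltk
from nltk import ngrams
from nltk.tokenize import word_tokenize
nltk.download('punkt')
sample = "NLP is a fascinating field."
char_tokens = list(sample)                               # character tokenization
word_bigrams = list(ngrams(word_tokenize(sample), 2))    # n-gram tokenization with n = 2
print(char_tokens)
print(word_bigrams)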
Word embeddings are a type of word representation in NLP that aims to capture
the semantic meaning of words in a continuous vector space. Traditional methods
of representing words, such as one-hot encoding or bag-of-words, lack the ability
to capture semantic relationships between words and often result in high-
dimensional and sparse representations. Word embeddings address these
limitations by representing words as dense vectors in a continuous vector space,
where similar words are mapped to nearby points. These embeddings are learned
from large corpora of text using unsupervised or semi-supervised techniques, such
as neural network models. Word embeddings are a powerful technique for
representing words as numerical vectors. These vectors capture the semantic
meaning and relationships between words, allowing machines to understand
language nuances beyond just the individual words themselves. Word embeddings
are a fundamental building block for many NLP tasks. By encoding semantic
meaning and relationships, they empower machines to understand and process
language in a way that is closer to how humans do. Several popular methods and
algorithms exist for generating word embeddings, such as:
context. Developed by researchers at Google, Word2Vec is a shallow neural
network model that learns word embeddings by predicting the context of
words within a window of text. It consists of two main architectures:
Continuous Bag of Words (CBOW) and Skip-gram. CBOW predicts the target
word based on its context words, while Skip-gram predicts context words
given a target word.
2. GloVe (global vectors for word representation): GloVe is a method that
uses global word co-occurrence statistics, capturing semantic relationships
based on how often words appear together across the entire corpus. GloVe
is an unsupervised learning algorithm for obtaining word embeddings by
factorizing the co-occurrence matrix of words in a corpus. It leverages global
statistical information about the entire corpus to learn word representations.
3. FastText: This algorithm was developed by Facebook AI Research, and
FastText is an extension of Word2Vec that takes into account subword
information. Instead of learning embeddings only for complete words,
FastText constructs embeddings for character n-grams and averages them
to obtain the representation for each word. This approach is particularly
useful for handling out-of-vocabulary words and morphologically rich
languages (a brief Word2Vec training sketch follows this list).
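As a minimal sketch, assuming the gensim library is available (it is not used elsewhere in this chapter's listings), Word2Vec embeddings can be trained on a tiny toy corpus as follows:
from gensim.models import Word2Vec
# Toy corpus of pre-tokenized sentences (illustration only).
sentences = [["nlp", "is", "fun"], ["word", "embeddings", "capture", "meaning"],
             ["nlp", "uses", "word", "embeddings"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 selects skip-gram
print(model.wv["nlp"][:5])            # first five dimensions of the "nlp" vector
print(model.wv.most_similar("nlp"))   # nearest neighbours in the embedding space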
relationships between words in different languages.
1. Sentiment analysis: Analyze the sentiment of text data by
understanding the emotions associated with words and their
relationships.
2. Text classification: Categorize text data into specific groups based on
the meaning encoded in the word embeddings.
3. Question answering: Answer questions accurately by understanding
the context and relationships between words in the query and the text
data.
due to their ability to capture long-range dependencies. LSTM is a special
type of RNN designed to address the vanishing gradient problem, allowing
them to learn long-term dependencies in sequences.
3. Gated recurrent units (GRUs): GRUs are another variant of RNNs that
simplify the architecture compared to LSTMs by combining the input and
forget gates into a single update gate. While they are conceptually similar to
LSTMs, GRUs have fewer parameters and are computationally more efficient,
making them suitable for applications with limited computational resources.
4. Transformer models: Transformers are a recent advancement in sequence
modeling that have gained widespread popularity in NLP. They rely on self-
attention mechanisms to capture global dependencies between input and
output sequences. Transformers consist of an encoder-decoder architecture,
where the encoder processes the input sequence and the decoder generates
the output sequence. Models like BERT (bidirectional encoder
representations from transformers) and GPT (generative pretrained
transformer) have achieved state-of-the-art performance on various NLP
tasks by leveraging transformer architectures. These models use an
attention mechanism to focus on the most relevant parts of the sequence
when processing each element, overcoming limitations of RNNs in handling
long sequences.
5. Convolutional neural networks (CNNs): While primarily used for image
processing, CNNs can also be applied to sequence modeling tasks in NLP.
CNNs operate on fixed-size input windows and learn hierarchical feature
representations by applying convolutional filters across the input sequence.
They are particularly effective for tasks like text classification and sentiment
analysis.
Overall we can say that the sequence modeling plays a vital role in modern NLP. By
considering the order and context of data points, these models enable machines
to understand complex relationships and perform various NLP tasks with greater
accuracy and effectiveness.
like:
1. Machine translation: Understand the context of a sentence in one
language and translate it accurately to another, preserving meaning and
structure.
2. Sentiment analysis: Analyze the sentiment of a text by considering not
just individual words but also their sequence and how they influence
each other’s meaning.
3. Speech recognition: Convert spoken language into text by
understanding the sequence of sounds and their relationships.
4. Text summarization: Identify and extract the main points from a piece
of text by considering the flow and sequence of ideas.
1. Word embeddings for time series: In NLP, words or tokens can be treated
as time-series data, especially in tasks like sentiment analysis or topic
modeling over time. Word embeddings, such as Word2Vec or GloVe, can
capture semantic relationships between words in the context of time. By
analyzing changes in word embeddings over time, it’s possible to forecast
future trends in language usage or sentiment.
2. Language models: Large pretrained language models like GPT or BERT have
been used for time-series forecasting in NLP. By fine-tuning these models on
historical text data, they can generate predictions about future text
sequences. For example, they can predict the next word in a sentence or
generate entire paragraphs of text based on past patterns.
3. Temporal convolutional networks (TCNs): TCNs are neural network
architectures designed for sequential data processing, including time-series
data. They use convolutional layers with causal padding to capture temporal
dependencies in the input sequence. TCNs have been applied to text data for
tasks like language modeling and text generation, making them suitable for
time-series forecasting in NLP.
4. Recurrent neural networks (RNNs) and long short-term memory (LSTM)
networks: RNNs and LSTMs are commonly used for sequential data
processing including time-series forecasting. In NLP, these architectures can
be adapted to model the temporal dynamics of text data and make
predictions about future sequences. For example, they can be trained to
predict the next word in a sentence or the sentiment of future text.
5. Attention mechanisms: Attention mechanisms, commonly used in
transformer architectures like BERT and GPT, can be leveraged for time-
series forecasting in NLP. These mechanisms allow the model to focus on
relevant parts of the input sequence, which is useful for capturing temporal
patterns in text data. By attending to historical text sequences, the model
can make predictions about future trends or language usage.
There is no doubt that NLP is powerful, but we still face some challenges in model
training, such as:
Autoregressive (AR): This component takes into account the impact of past
values of the time series on the forecast. It considers how many past values
(called lags) are statistically significant in influencing the current value. In
other words, the AR component represents the relationship
between the current value of the series and its past values. It models the
dependency of the current observation on its lagged (past) values. The “p”
parameter determines the number of lagged observations included in the
model, where p, d, and q stand for:
p: Number of autoregressive terms (lag order)
d: Degree of differencing needed to achieve stationarity
q: Number of moving average terms
Choosing the appropriate p, d, and q values is crucial for accurate forecasts.
Various statistical tests and information criteria are used to identify the best
fitting model.
Integrated (I): The I component represents the differencing of the time
series to make it stationary. Stationarity is a key assumption in ARIMA
modeling, as it ensures that the statistical properties of the series remain
constant over time. The “d” parameter determines the order of differencing
required to achieve stationarity. This component deals with nonstationary
time series data where the mean, variance, or seasonality changes over time.
Differencing is applied to the data to achieve stationarity, making it
statistically stable for analysis and forecasting.
Moving average (MA): The MA component represents the dependency
between the current observation and a linear combination of past error
terms (residuals). It models the noise or random fluctuations in the time
series. The “q” parameter determines the number of lagged residuals
included in the model. This component considers the average of past errors
(the difference between predicted and actual values) to improve the forecast
by accounting for random fluctuations in the data.
ARIMA models are typically denoted by the notation ARIMA(p, d, q), where "p"
represents the AR order, "d" the differencing order, and "q" the MA order; a
minimal fitting sketch is given below.
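As a minimal sketch, assuming the statsmodels library and a synthetic random-walk series (neither appears in the book's own listings), an ARIMA(p, d, q) model can be fitted and used for forecasting as follows:
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
# A simple non-stationary random walk as a stand-in for a real time series.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))
model = ARIMA(series, order=(1, 1, 1))   # p = 1 AR term, d = 1 difference, q = 1 MA term
fitted = model.fit()
forecast = fitted.forecast(steps=10)     # forecast the next 10 points
print(forecast)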
While ARIMA models are primarily used for numerical data, they can be applied
to certain aspects of text data analysis, particularly when dealing with time-series
trends in textual information. For example, ARIMA models could be used to
forecast the frequency of certain keywords or phrases in a text corpus over time.
This could be useful for tasks such as analyzing trends in social media
conversations, monitoring changes in public opinion, or forecasting demand for
specific products or services based on textual data. We can use the ARIMA model
for predicting future sales figures, forecasting stock prices, estimating customer
demand, and analyzing economic trends.
However, it’s important to note that ARIMA models may not be directly
applicable to all aspects of text data analysis, especially when dealing with
unstructured textual information. In such cases, other techniques such as NLP and
machine learning may be more appropriate for extracting insights and making
predictions from text data.
ARIMA models are a valuable tool for time series forecasting, offering a robust
and interpretable approach. However, it is important to be aware of their
limitations and to consider alternative methods for nonstationary or complex data
or for long-term forecasting needs; there are also situations in which the ARIMA
algorithm is simply not a suitable choice.
Prophet and neural networks are two different approaches to time-series
forecasting, each with its own strengths, weaknesses, and applications. Prophet
provides a simple yet powerful framework for forecasting series with strong
seasonal patterns and special events, while neural networks offer flexibility and
scalability for modeling complex dependencies in sequential data. The choice
between these approaches depends on factors such as the nature of the data, the
presence of seasonal patterns, and the computational resources available for
model training and deployment.
and missing data
3. Intuitive model diagnostics and visualization tools for analyzing forecast
results.
4. Built-in support for modeling holidays and special events that impact the
time series.
Prophet is relatively easy to use and requires minimal data preprocessing,
making it accessible to users with varying levels of expertise. It provides a
powerful yet user-friendly interface for time-series forecasting, making it
suitable for both beginners and experienced practitioners.
Neural network: Neural networks, particularly recurrent neural networks
(RNNs) and their variants like long short-term memory (LSTM) networks, are
a class of deep learning models capable of learning complex patterns and
dependencies in sequential data. When applied to time-series forecasting,
neural networks offer several advantages:
1. Ability to capture nonlinear relationships and complex patterns in the
data.
2. Flexibility in modeling various types of time-series data, including both
univariate and multivariate series.
3. Scalability to handle large-scale datasets and high-dimensional input
features.
4. Capability to automatically extract relevant features from raw data,
reducing the need for manual feature engineering.
Pros:
– Simple and user-friendly: Easy to use and understand, requiring minimal data preprocessing and code.
– Interpretable model: Provides insights into the factors influencing the forecast, such as trend, seasonality, and holidays.
– Handles seasonality and holidays: Can automatically capture and model seasonal patterns and holiday effects.
– Fast and efficient: Requires less computational resources compared to neural networks.
Cons:
– Limited flexibility: Not as flexible as neural networks in capturing complex nonlinear relationships in data.
– May struggle with nonstationary data: May not perform well with data that exhibits significant trends or changes in variance over time.
– Limited feature engineering: Offers limited options for incorporating additional features beyond the provided model components.
The →Tab. 8.2 shows some of the pros and cons of neural networks.
Pros:
– High flexibility: Capable of capturing complex nonlinear relationships in data, making them suitable for diverse forecasting tasks.
– Can handle nonstationary data: Able to learn from various types of data, including nonstationary data.
– Incorporation of additional features: Can be combined with other features beyond the time series data to improve accuracy.
Cons:
– Complexity and difficulty of use: Can be complex to set up, requiring more expertise in data science and machine learning.
– Interpretability: Can be difficult to interpret the inner workings and reasoning behind the model’s predictions.
– Computational cost: Training neural networks can be computationally expensive and resource-intensive.
– Data requirements: Often require larger amounts of data to achieve optimal performance.
The best choice between Prophet and neural networks depends on several factors:
1. Data characteristics:
1. If your data exhibits seasonality, holidays, and limited non-linearity,
Prophet might be a good choice for its simplicity and interpretability.
2. If your data is complex, nonstationary, and requires capturing intricate
patterns, a neural network might be more suitable due to its flexibility.
2. Project requirements:
1. Speed and ease of use might be priorities if time constraints are tight
and interpretability is crucial.
2. Focus on high accuracy and capturing intricate patterns might
outweigh complexity concerns if resources allow.
3. Additional options:
1. Hybrid approaches: Combining Prophet with another algorithm like
XGBoost can leverage the interpretability of Prophet while improving its
ability to handle non-linearity.
2. Advanced neural network architectures: Explore specific neural
network architectures like LSTMs or Transformers designed for time-
series forecasting and capable of handling complex patterns.
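To make the Prophet workflow described above concrete, here is a minimal sketch (illustrative only, assuming the prophet package, formerly fbprophet, and a hypothetical daily DataFrame with the ds/y columns Prophet expects):
import pandas as pd
from prophet import Prophet

# Hypothetical daily data; Prophet expects a column "ds" (dates) and "y" (values).
df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=90, freq="D"),
    "y": [float(i % 7) + 0.1 * i for i in range(90)],   # weekly pattern plus trend
})

m = Prophet()                                  # additive model: trend + seasonality (+ holidays)
m.fit(df)

future = m.make_future_dataframe(periods=30)   # extend 30 days beyond the observed data
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
The fitted model decomposes the series into trend and seasonal components and returns point forecasts (yhat) together with uncertainty intervals.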
Recommender systems are a type of information filtering system that aim to
predict user preferences or interests and recommend items (such as products,
movies, music, and articles) that are likely to be of interest to them. These systems
play a crucial role in various online platforms and services, helping users discover
relevant content and improving user engagement and satisfaction. NLP offers a
powerful toolkit for enhancing recommender systems by extracting meaningful
insights from textual data, leading to more accurate, personalized, and insightful
recommendations for users. Recommender systems are widely used in e-
commerce platforms, streaming services, social media, news websites, and other
online platforms to personalize user experiences, increase user engagement, and
drive business revenue. The choice of a specific recommender system depends on
factors such as the characteristics of the data, the available features, the scalability
requirements, and the desired level of recommendation accuracy. There are
several types of recommender systems, each employing different algorithms and
techniques:
Content-based filtering: This approach recommends items similar to
those the user has liked or interacted with in the past. Content-based
filtering can be divided into several techniques:
1. Text similarity: Analyze the similarity between user preferences (e.g.,
reviews and product descriptions) and available items based on
keywords, topics, or semantic meaning extracted through NLP
techniques like word embeddings.
2. Named entity recognition (NER): Identify and extract relevant entities
like products, locations, or people from text data, enabling
recommendations based on specific user interests mentioned in past
interactions. Let’s have a look at the code below:
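The listing below is a minimal sketch (assuming NLTK and its standard punkt, averaged_perceptron_tagger, maxent_ne_chunker, and words resources; it is illustrative rather than the only way to obtain such results) that produces part-of-speech tags and named-entity chunks of the form shown in the output that follows:
import nltk

# Download the required NLTK resources (assumed to be available in this environment).
for resource in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(resource)

text = ("Natural Language Processing (NLP) is a field of artificial intelligence (AI) "
        "that deals with the interaction between computers and human language. "
        "Its goal is to enable computers to understand, interpret, and manipulate "
        "natural language in a way that is similar to how humans do. Its primary goal "
        "is to enable computers to understand, interpret, and generate human language "
        "in a manner that is both meaningful and useful. This includes written text, "
        "spoken language, and even sign language.")

tokens = nltk.word_tokenize(text)   # split the text into word tokens
tagged = nltk.pos_tag(tokens)       # part-of-speech tagging (NN, VBZ, JJ, ...)
tree = nltk.ne_chunk(tagged)        # named-entity chunking (e.g., ORGANIZATION)
print(tree)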
=========================Output====================================
(S
Natural/JJ
Language/NNP
Processing/NNP
(/(
(ORGANIZATION NLP/NNP)
)/)
is/VBZ
a/DT
field/NN
of/IN
artificial/JJ
intelligence/NN
(/(
AI/NNP
)/)
that/IN
deals/NNS
with/IN
the/DT
interaction/NN
between/IN
computers/NNS
and/CC
human/JJ
language/NN
./.
Its/PRP$
goal/NN
is/VBZ
to/TO
enable/JJ
computers/NNS
to/TO
understand/VB
, /,
interpret/VB
, /,
and/CC
manipulate/VB
natural/JJ
language/NN
in/IN
a/DT
way/NN
that/WDT
is/VBZ
similar/JJ
to/TO
how/WRB
humans/NNS
do/VBP
./.
Its/PRP$
primary/JJ
goal/NN
is/VBZ
to/TO
enable/JJ
computers/NNS
to/TO
understand/VB
, /,
interpret/VB
, /,
and/CC
generate/VB
human/JJ
language/NN
in/IN
a/DT
manner/NN
that/WDT
is/VBZ
both/DT
meaningful/JJ
and/CC
useful/JJ
./.
This/DT
includes/VBZ
written/VBN
text/NN
, /,
spoken/JJ
language/NN
, /,
and/CC
even/RB
sign/JJ
language/NN
./.)
===================================================================
3. Deep learning-based recommender systems: Deep learning models, such
as neural networks, can be used to learn complex patterns and
representations from user-item interaction data. Deep learning-based
recommender systems can automatically extract features from raw data and
capture intricate relationships between users and items, leading to more
accurate recommendations.
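As an illustrative sketch (an assumption for this discussion rather than code from a particular library), a deep learning-based recommender can be reduced to learned user and item embeddings whose dot product predicts a preference score:
import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    """A minimal embedding-based recommender: score = dot(user_vector, item_vector)."""
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, user_ids, item_ids):
        u = self.user_emb(user_ids)
        v = self.item_emb(item_ids)
        return (u * v).sum(dim=1)          # predicted preference score per (user, item) pair

# Hypothetical interaction batch: four (user, item, rating) triples.
model = MatrixFactorization(n_users=100, n_items=500)
users = torch.tensor([0, 1, 2, 3])
items = torch.tensor([10, 20, 30, 40])
ratings = torch.tensor([5.0, 3.0, 4.0, 1.0])

loss = nn.functional.mse_loss(model(users, items), ratings)
loss.backward()    # gradients would drive an optimizer step during training
print(loss.item())
In a full system, such a model would be trained over many interaction batches and could be extended with textual features extracted through NLP.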
NLP is very good in terms of performance and accuracy, but it still has some
challenges, which are given below in →Table 8.3:
Benefits:
– Improved accuracy and personalization: By incorporating nuanced insights from text data, NLP can enhance the accuracy and personalization of recommendations.
– Handling diverse textual data: NLP allows systems to understand various forms of user input, including reviews, social media posts, or search queries.
– Discovery of hidden patterns: NLP techniques can unveil hidden patterns within text data, leading to unexpected and valuable recommendations for users.
Challenges:
– Data quality and bias: The quality and potential biases within textual data can impact the accuracy and fairness of recommendations.
– Computational cost: NLP techniques can be computationally expensive, requiring powerful hardware and efficient algorithms.
– Semantic ambiguity: Language can be ambiguous, and NLP models might misinterpret the meaning or intent of user-generated text.
1. Object detection and recognition: Computer vision systems can detect and
recognize objects within images or videos. This capability is used in various
applications such as autonomous vehicles, surveillance systems, and
augmented reality (AR). Object detection is also essential in retail for
inventory management and in healthcare for identifying anatomical
structures in medical imaging. Computer vision algorithms can detect and
track objects in images and videos. This technology is used in various
applications including security surveillance, traffic monitoring, and robotics.
2. Image classification: Image classification involves categorizing images into
predefined classes or categories. This application is widely used in content
moderation, where images are classified as safe or unsafe for certain
audiences. Image classification is also employed in agriculture for identifying
crop diseases, in manufacturing for quality control, and in e-commerce for
visual search.
3. Facial recognition: Facial recognition technology identifies and verifies
individuals based on their facial features. It has applications in security and
law enforcement for identifying suspects or verifying identities. Facial
recognition is also used in access control systems, user authentication in
mobile devices, and personalized marketing. Facial recognition systems use
computer vision to identify individuals from images or videos. This
technology is used in various applications including security systems, social
media platforms, and law enforcement.
4. Medical image analysis: Computer vision plays a crucial role in analyzing
medical images such as X-rays, MRIs, and CT scans. It assists radiologists and
healthcare professionals in diagnosing diseases, detecting abnormalities,
and planning treatments. Medical image analysis techniques include image
segmentation, tumor detection, and organ localization. Computer vision is
used in medical imaging to analyze X-rays, CT scans, and MRIs to detect
abnormalities and aid in diagnosis.
5. Gesture recognition: Gesture recognition systems interpret human
gestures and movements from images or video sequences. These systems
are used in human-computer interaction, sign language recognition, and
virtual reality (VR) applications. Gesture recognition enables users to control
devices, interact with virtual environments, and communicate nonverbally.
6. Document analysis: Document analysis techniques extract and interpret
information from text documents, handwritten forms, and printed materials.
Optical character recognition converts scanned documents into editable text.
Document layout analysis identifies structural elements like headings,
paragraphs, and tables. Document analysis is applied in document
management systems, digital archives, and automated form processing.
7. Autonomous vehicles: Computer vision is a key technology in autonomous
vehicles for perceiving the surrounding environment and making driving
decisions. It enables vehicles to detect lane markings, traffic signs,
pedestrians, and other vehicles. Computer vision algorithms process data
from cameras, LiDAR, and radar sensors to navigate safely and
autonomously.
8. Visual inspection: Visual inspection systems detect defects and anomalies in
manufactured products during the production process. These systems
inspect surfaces, textures, colors, and shapes to identify deviations from
quality standards. Visual inspection is used in industries such as automotive
manufacturing, electronics assembly, and pharmaceutical production.
9. Self-driving car: Computer vision is essential for self-driving cars to perceive
their surroundings, identify objects like lanes, traffic signs, and pedestrians,
and navigate safely.
10. Augmented reality: AR overlays digital information onto the real world.
Computer vision is used in AR applications to track the user’s environment
and position digital elements accordingly.
11. Virtual reality: VR creates an immersive, computer-generated environment.
Computer vision can be used in VR applications to track the user’s
movements and interact with the virtual world.
12. Drone navigation: Drones use computer vision to navigate their
surroundings, avoid obstacles, and track targets.
13. Robot vision: Robots use computer vision to perceive their environment,
interact with objects, and complete tasks.
14. Quality control: Computer vision is used in quality control applications to
inspect products for defects.
15. Retail: Computer vision is used in retail applications to track inventory,
automate checkout processes, and provide personalized recommendations
to customers.
Object detection and segmentation are two important tasks in computer vision,
both involving the identification and localization of objects within images or
videos. Object detection and segmentation are fundamental tasks in computer
vision with numerous applications including autonomous driving, surveillance,
medical imaging, and AR. These tasks enable machines to understand and interact
with visual data, paving the way for a wide range of intelligent applications and
services.
through a CNN.
Faster R-CNN: Faster R-CNN is a two-stage object detection framework
that uses a region proposal network to generate candidate object
regions, followed by a detection network to refine the proposals and
classify objects. It achieves high accuracy by jointly optimizing region
proposals and object detection.
YOLO (You Only Look Once): YOLO is another real-time object
detection method that divides the input image into a grid of cells and
predicts bounding boxes and class probabilities for each grid cell. It
processes the entire image in a single forward pass through a CNN,
making it faster than two-stage methods like Faster R-CNN.
Object segmentation: Object segmentation is the task of segmenting or
partitioning an image into multiple regions, each corresponding to a distinct
object or object instance. Unlike object detection, which identifies the
presence of objects and their bounding boxes, object segmentation provides
pixel-level masks for each object in the image. Common techniques for
object segmentation include:
Mask R-CNN: Mask R-CNN extends faster R-CNN by adding a branch for
predicting segmentation masks alongside bounding boxes and class
probabilities. It generates pixel-wise masks for each object instance in
the image, enabling precise segmentation of objects with complex
shapes and overlapping instances.
U-Net: U-Net is a fully convolutional network (FCN) architecture
designed for biomedical image segmentation but widely used in other
domains as well. It consists of an encoder-decoder structure with skip
connections that preserve spatial information at different scales. U-Net
is known for its effectiveness in segmenting objects from limited training
data.
Semantic segmentation: Semantic segmentation assigns a class label
to each pixel in the image, without distinguishing between different
instances of the same class. It provides a dense pixel-wise classification
of the entire image, allowing for scene understanding and pixel-level
analysis. Techniques such as FCNs and DeepLab are commonly used for
semantic segmentation tasks. Let’s have a look at the code below:
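The following is a minimal sketch of semantic segmentation (assuming torchvision with pretrained DeepLabV3 weights and a hypothetical local image file; it is an illustration, not code tied to a specific dataset):
import torch
from torchvision import models, transforms
from PIL import Image

# Load a pretrained DeepLabV3 semantic segmentation model (weight download assumed).
model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("street_scene.jpg").convert("RGB")   # hypothetical input image
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    output = model(batch)["out"][0]    # shape: (num_classes, H, W)

mask = output.argmax(0)                # per-pixel class labels (the segmentation mask)
print(mask.shape, mask.unique())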
from nltk.corpus import wordnet   # (assumed import for this WordNet snippet)
syn = wordnet.synsets('dog')[0]   # (assumed setup) first synset of 'dog'
# Get hypernyms (more general terms) for 'dog'
hypernyms = syn.hypernyms()
print("Hypernyms of 'dog': ", [h.lemmas()[0].name() for h in hypernyms])
# Get Antonym
synsets = wordnet.synsets('good')
antonym = None
# Search for an antonym in all synsets/lemmas
for syn in synsets:
for lemma in syn.lemmas():
if lemma.antonyms():
antonym = lemma.antonyms()[0].name()
break
if antonym:
break
if antonym:
print("Antonym of 'good': ", antonym)
else:
print("No antonym found for 'good'")
===============================Output==============================
===================================================================
Reinforcement learning (RL) is a learning paradigm in which an agent learns through
trial and error by receiving feedback from the environment. RL offers a powerful
approach to training agents to make optimal decisions in dynamic and interactive
environments. With its potential to learn and adapt without explicit instructions, RL
is transforming various fields and holds promise for even broader applications in
the future. RL algorithms, such as Q-learning, deep Q-networks (DQN), policy
gradient methods, and actor-critic methods, learn to optimize the agent’s policy
through iterative interactions with the environment. RL has applications in various
domains, including robotics, game playing, autonomous systems,
recommendation systems, and finance, among others. Here’s a breakdown of key
aspects:
1. Agent: The learning entity that interacts with the environment and makes
decisions. The agent is the entity that interacts with the environment. It
observes the state of the environment, selects actions, and receives rewards
or penalties based on its actions. The agent’s goal is to learn a policy – a
mapping from states to actions – that maximizes cumulative rewards over
time.
2. Environment: The system or world the agent interacts with, providing
feedback through rewards and penalties. The environment represents the
external system or process with which the agent interacts. It is defined by a
set of states, actions, and transition dynamics. The environment also
provides feedback to the agent in the form of rewards or penalties based on
its actions.
3. Action: The choices the agent can make within the environment. An action is
a decision or choice made by the agent that affects the state of the
environment. Actions can be discrete (e.g., selecting from a finite set of
options) or continuous (e.g., specifying a value within a continuous range).
4. Reward: The feedback signal the environment provides to the agent,
indicating the goodness or badness of its actions. A reward is a scalar
feedback signal provided by the environment to the agent after each action.
It indicates the immediate desirability or quality of the action taken by the
agent. The agent’s goal is to learn a policy that maximizes the cumulative
reward over time.
5. Policy: The agent’s strategy for choosing actions is based on the current
state of the environment. A policy is a mapping from states to actions that
defines the agent’s behavior. It specifies the action the agent should take in
each state to maximize expected cumulative rewards. Policies can be
deterministic or stochastic, depending on whether they directly specify
actions or provide a probability distribution over actions.
6. State: A state represents the current situation or configuration of the
environment. It contains all the relevant information needed for the agent to
make decisions. States can be discrete or continuous, depending on the
nature of the environment.
7. Value functions: The value function estimates the expected cumulative
reward that an agent can achieve from a given state or state-action pair. It
quantifies the desirability of being in a particular state or taking a particular
action and is used to guide the agent’s decision-making process.
8. Exploration and exploitation: RL involves a trade-off between exploration
(trying out new actions to discover potentially better strategies) and
exploitation (selecting actions that are known to yield high rewards based on
current knowledge). Balancing exploration and exploitation is crucial for
effective learning.
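These components can be seen together in a minimal agent-environment interaction loop. The sketch below assumes the gymnasium package and uses a random policy purely for illustration:
import gymnasium as gym

env = gym.make("CartPole-v1")          # the environment
state, info = env.reset()              # initial state observation

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()                              # agent picks an action (random policy)
    state, reward, terminated, truncated, info = env.step(action)   # environment returns feedback
    total_reward += reward                                          # accumulate the reward signal
    done = terminated or truncated

print("Episode return:", total_reward)
env.close()
A learning algorithm such as Q-learning or a policy gradient method replaces the random action selection with a policy that improves from this reward feedback.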
1. Trial and error: Through interacting with the environment and receiving
rewards, the agent learns by trial and error. It gradually improves its policy
by selecting actions that lead to higher rewards over time.
2. No explicit instructions: Unlike supervised learning, RL agents don’t receive
explicit instructions on how to perform the task. They learn solely through
the reward feedback mechanism.
3. Delayed gratification: RL agents need to consider the long-term
consequences of their actions, not just the immediate reward, to achieve
optimal results.
# (assumed) imports and environment setup needed to run this listing; the code
# below uses the Gymnasium-style API (reset() returns (state, info), step() returns five values)
import math
import random
import matplotlib
import matplotlib.pyplot as plt
from collections import namedtuple, deque
from itertools import count
import gymnasium as gym
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
env = gym.make("CartPole-v1")
# set up matplotlib
is_ipython = 'inline' in matplotlib.get_backend()
if is_ipython:
from IPython import display
plt.ion()
# if GPU is to be used
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Transition = namedtuple('Transition',
('state', 'action', 'next_state', 'reward'))
class ReplayMemory(object):
def __init__(self, capacity):
self.memory = deque([], maxlen=capacity)
def push(self, *args):
"""Save a transition"""
self.memory.append(Transition(*args))
def sample(self, batch_size):
return random.sample(self.memory, batch_size)
def __len__(self):
return len(self.memory)
class DQN(nn.Module):
def __init__(self, n_observations, n_actions):
super(DQN, self).__init__()
self.layer1 = nn.Linear(n_observations, 128)
self.layer2 = nn.Linear(128, 128)
self.layer3 = nn.Linear(128, n_actions)
# Called with either one element to determine next action, or a batch
# during optimization. Returns tensor([[left0exp,right0exp]…]).
def forward(self, x):
x = F.relu(self.layer1(x))
x = F.relu(self.layer2(x))
return self.layer3(x)
# BATCH_SIZE is the number of transitions sampled from the replay buffer
# GAMMA is the discount factor
# EPS_START is the starting value of epsilon
# EPS_END is the final value of epsilon
# EPS_DECAY controls the rate of exponential decay of epsilon, higher means a slower decay
# TAU is the update rate of the target network
# LR is the learning rate of the ``AdamW`` optimizer
BATCH_SIZE = 128
GAMMA = 0.99
EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 1000
TAU = 0.005
LR = 1e-4
# Get number of actions from gym action space
n_actions = env.action_space.n
# Get the number of state observations
state, info = env.reset()
n_observations = len(state)
policy_net = DQN(n_observations, n_actions).to(device)
target_net = DQN(n_observations, n_actions).to(device)
target_net.load_state_dict(policy_net.state_dict())
optimizer = optim.AdamW(policy_net.parameters(), lr=LR, amsgrad=True)
memory = ReplayMemory(10000)
steps_done = 0
def select_action(state):
global steps_done
sample = random.random()
eps_threshold = EPS_END + (EPS_START - EPS_END) * \
math.exp(-1. * steps_done / EPS_DECAY)
steps_done += 1
if sample > eps_threshold:
with torch.no_grad():
# t.max(1) will return the largest column value of each row.
# second column on max result is index of where max element was
# found, so we pick action with the larger expected reward.
return policy_net(state).max(1).indices.view(1, 1)
else:
return torch.tensor([[env.action_space.sample()]], device=device, dtype=torch.long)
episode_durations = []
def plot_durations(show_result=False):
plt.figure(1)
durations_t = torch.tensor(episode_durations, dtype=torch.float)
if show_result:
plt.title('Result')
else:
plt.clf()
plt.title('Training…')
plt.xlabel('Episode')
plt.ylabel('Duration')
plt.plot(durations_t.numpy())
# Take 100 episode averages and plot them too
if len(durations_t) >= 100:
means = durations_t.unfold(0, 100, 1).mean(1).view(-1)
means = torch.cat((torch.zeros(99), means))
plt.plot(means.numpy())
plt.pause(0.001) # pause a bit so that plots are updated
if is_ipython:
if not show_result:
display.display(plt.gcf())
display.clear_output(wait=True)
else:
display.display(plt.gcf())
def optimize_model():
if len(memory) < BATCH_SIZE:
return
transitions = memory.sample(BATCH_SIZE)
# Transpose the batch (see →https://stackoverflow.com/a/19343/3343043 for
# detailed explanation). This converts batch-array of Transitions
# to Transition of batch-arrays.
batch = Transition(*zip(*transitions))
# Compute a mask of non-final states and concatenate the batch elements
# (a final state would've been the one after which simulation ended)
non_final_mask = torch.tensor(tuple(map(lambda s: s is not None,
batch.next_state)), device=device, dtype=torch.bool)
non_final_next_states = torch.cat([s for s in batch.next_state
if s is not None])
state_batch = torch.cat(batch.state)
action_batch = torch.cat(batch.action)
reward_batch = torch.cat(batch.reward)
# Compute Q(s_t, a) - the model computes Q(s_t), then we select the
# columns of actions taken. These are the actions which would've been taken
# for each batch state according to policy_net
state_action_values = policy_net(state_batch).gather(1, action_batch)
# Compute V(s_{t+1}) for all next states.
# Expected values of actions for non_final_next_states are computed based
# on the "older" target_net; selecting their best reward with max(1).values
# This is merged based on the mask, such that we'll have either the expected
# state value or 0 in case the state was final.
next_state_values = torch.zeros(BATCH_SIZE, device=device)
with torch.no_grad():
next_state_values[non_final_mask] = target_net(non_final_next_states).max(1).values
# Compute the expected Q values
expected_state_action_values = (next_state_values * GAMMA) + reward_batch
# Compute Huber loss
criterion = nn.SmoothL1Loss()
loss = criterion(state_action_values, expected_state_action_values.unsqueeze(1))
# Optimize the model
optimizer.zero_grad()
loss.backward()
# In-place gradient clipping
torch.nn.utils.clip_grad_value_(policy_net.parameters(), 100)
optimizer.step()
if torch.cuda.is_available():
num_episodes = 600
else:
num_episodes = 50
for i_episode in range(num_episodes):
# Initialize the environment and get its state
state, info = env.reset()
state = torch.tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
for t in count():
action = select_action(state)
observation, reward, terminated, truncated, _ = env.step(action.item())
reward = torch.tensor([reward], device=device)
done = terminated or truncated
if terminated:
next_state = None
else:
next_state = torch.tensor(observation, dtype=torch.float32, device=device).unsqueeze(0)
# Store the transition in memory
memory.push(state, action, next_state, reward)
# Move to the next state
state = next_state
# Perform one step of the optimization (on the policy network)
optimize_model()
# Soft update of the target network's weights
# θ′ ← τ θ + (1 −τ)θ′
target_net_state_dict = target_net.state_dict()
policy_net_state_dict = policy_net.state_dict()
for key in policy_net_state_dict:
target_net_state_dict[key] = policy_net_state_dict[key]*TAU + target_net_state_dict[key]*(1-TAU)
target_net.load_state_dict(target_net_state_dict)
if done:
episode_durations.append(t + 1)
plot_durations()
break
print('Complete')
plot_durations(show_result=True)
plt.ioff()
plt.show()
8.8 Q-learning
Q-learning is a popular RL algorithm used for learning optimal policies in Markov
decision processes, particularly in settings where the agent does not have an explicit
model of the environment. It is a model-free, value-based algorithm that learns an action-
value function (Q-function) to determine the quality of taking a particular action in
a given state. Q-learning is a specific model-free RL algorithm used to train an
agent to make optimal decisions in an environment. It belongs to a family of
algorithms known as value-based methods that learn by estimating the long-term
value of taking specific actions in different states. Over time, as the agent explores
the environment and receives feedback, the Q-values converge to the optimal
action-values, indicating the expected cumulative rewards of taking each action in
each state. The agent then follows the optimal policy by selecting actions with the
highest Q-values in each state.
Q-learning is particularly well-suited for discrete and deterministic
environments with finite state and action spaces. However, it can also be extended
to handle continuous and stochastic environments through function
approximation methods and experience replay techniques. Despite its simplicity,
Q-learning has been successfully applied in various domains including robotics,
game playing, and autonomous systems. Overall, Q-learning is a fundamental and
versatile algorithm in the field of RL. Its simplicity and off-policy learning
capabilities make it a popular choice for various applications. However, it’s crucial
to address the challenges of exploration, convergence, and dimensionality when
implementing Q-learning in complex tasks. Some of the terminologies used in
Q-learning are:
1. Q-value (Q(s, a)): This represents the estimated future reward an agent
expects to receive by taking action “a” in state “s.”
2. State (s): The current situation or configuration the agent is in.
3. Action (a): The possible choices the agent can make in a given state.
4. Reward (r): The feedback signal the environment provides after the agent
takes an action.
Update Q-value: The agent updates the Q-value for the current state-action
pair (s, a) using the Bellman equation:
Q(s, a) = Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
where α is the learning rate, controlling the step size of the updates. It
controls the weight given to the new information (learning from the current
experience).
γ is the discount factor, determining the importance of future rewards
(balancing immediate and long-term benefits).
r + γ max_{a'} Q(s', a') is the target value, representing the expected
cumulative reward of taking action a in state s and then following the
optimal policy thereafter.
Termination: Repeat steps 3 (interaction) to 5 (termination) until a
termination condition is met (e.g., a maximum number of iterations or
convergence of the Q-values).
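A minimal tabular Q-learning sketch that implements this update rule is shown below; it uses a hypothetical five-state chain environment rather than any specific library:
import numpy as np

# Tiny deterministic chain environment: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward +1 and ends the episode.
n_states, n_actions = 5, 2
alpha, gamma, epsilon, episodes = 0.1, 0.9, 0.1, 500

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for _ in range(episodes):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection (exploration vs. exploitation)
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Bellman update: Q(s, a) += alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))   # learned action-values; "right" actions dominate on this chain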
The following →Tab. 8.4 shows the benefits and challenges of Q-learning.
Benefits:
– Model-free: Doesn’t require a detailed model of the environment, making it applicable to various situations.
– Off-policy learning: Can learn from data collected using different policies, allowing for efficient exploration of the environment.
– Simple and efficient: Easy to understand and implement, making it a popular choice for RL applications.
Challenges:
– Exploration vs. exploitation: Balancing exploration of new actions with exploiting known good actions remains crucial.
– Convergence: In complex environments, convergence to an optimal policy can be slow or even impossible.
– Curse of dimensionality: With large state and action spaces, the number of Q-values to learn can become very high, making learning inefficient.
control tasks, and robotics. It has demonstrated strong performance and sample
efficiency compared to traditional RL algorithms, paving the way for further
advancements in deep RL. Here are some steps for working with DQN:
in the Q-network.
Benefits:
– Scalability: Handles large state and action spaces effectively due to function approximation with neural networks.
– Improved performance: Can achieve higher performance on complex tasks compared to traditional Q-learning.
– Sample efficiency: Learns effectively from a smaller amount of data due to experience replay.
Challenges:
– Complexity: Designing and training deep neural networks requires more expertise and computational resources.
– Exploration vs. exploitation: Balancing exploration and exploitation remains crucial for optimal learning.
– Hyperparameter tuning: Tuning hyperparameters of the neural network and learning algorithm is crucial for achieving good performance.
1. Policy (π): Represents the probability distribution over possible actions the
agent can take in a given state.
2. State (s): The current situation or configuration the agent is in.
3. Action (a): The possible choices the agent can make in a given state.
4. Reward (r): The feedback signal the environment provides after the agent
takes an action.
They have been successfully applied to a wide range of RL tasks including robotics,
game playing, and NLP. Examples of policy gradient methods include REINFORCE,
actor-critic methods, and proximal policy optimization (PPO). The policy gradient
methods offer a powerful approach to RL, enabling agents to learn effective
policies for complex tasks. However, addressing the challenges of variance, sample
efficiency, and hyperparameter tuning is critical for successful application. The
following steps show how the policy gradient method works:
J(θ) = E_{πθ}[ ∑_{t=0}^{T} γ^t r_t ]
where r_t is the reward received at time step t, T is the time horizon, and γ is
the discount factor that determines the importance of future rewards.
Step 3. Gradient ascent: Policy gradient methods use gradient ascent to
update the policy parameters θ in the direction of the gradient of the
objective function J(θ). The gradient of J(θ) with respect to the policy
parameters is given by
∇_θ J(θ) = E_{πθ}[ ∑_{t=0}^{T} ∇_θ log π_θ(a_t | s_t) · G_t ]
where G_t is the return from time step t onward (also called the return or
advantage).
Step 4. Policy update: The policy parameters are updated using stochastic
gradient ascent:
θ ← θ + α ∇_θ J(θ)
The benefits and challenges of policy gradient methods are given in the table below:
Benefits:
– Direct policy optimization: They directly optimize the policy, which can be more efficient than learning state-action values, especially in large state spaces.
– Policy interpretability: In some cases, the learned policy can be interpreted, providing insights into the agent’s decision-making process.
– Versatility: They can be applied to various tasks and environments, including continuous action spaces.
Challenges:
– High variance: Estimating the policy gradient can be noisy and lead to unstable learning, requiring careful implementation and techniques like variance reduction.
– Sample efficiency: They can be sample-inefficient, meaning they may require a large amount of data to learn effectively.
– Hyperparameter tuning: Tuning hyperparameters of the learning algorithm and policy network is crucial for achieving good performance.
1. REINFORCE: This is a simple policy gradient method that directly uses the
product of the reward and the gradient of the log-probability of the chosen
action to update the policy.
2. Actor-critic methods: These methods combine an actor (policy network)
that takes actions and a critic (value network) that estimates the value of the
current state. The critic’s feedback is then used to improve the actor’s policy.
3. Proximal policy optimization: This advanced method addresses issues like
the policy gradient vanishing problem and ensures that the updated policy
remains close to the original one, leading to more stable learning.
import gym
env = gym.make('CartPole-v1')
env.observation_space
env.action_space
import numpy as np
class LogisticPolicy:
def __init__(self, θ, α, γ):
# Initialize parameters θ, learning rate α and discount factor γ
self.θ = θ
self.α = α
self.γ = γ
def logistic(self, y):
# definition of logistic function
return 1 / (1 + np.exp(-y))
def probs(self, x):
# returns probabilities of two actions
y = x @ self.θ
prob0 = self.logistic(y)
return np.array([prob0, 1 - prob0])
def act(self, x):
# sample an action in proportion to probabilities
probs = self.probs(x)
action = np.random.choice([0, 1], p=probs)
return action, probs[action]
def grad_log_p(self, x):
# calculate grad-log-probs
y = x @ self.θ
grad_log_p0 = x - x * self.logistic(y)
grad_log_p1 = - x * self.logistic(y)
return grad_log_p0, grad_log_p1
def grad_log_p_dot_rewards(self, grad_log_p, actions, discounted_rewards):
# dot grads with future rewards for each action in episode
return grad_log_p.T @ discounted_rewards
def discount_rewards(self, rewards):
# calculate temporally adjusted, discounted rewards
discounted_rewards = np.zeros(len(rewards))
cumulative_rewards = 0
for i in reversed(range(0, len(rewards))):
cumulative_rewards = cumulative_rewards * self.γ + rewards[i]
discounted_rewards[i] = cumulative_rewards
return discounted_rewards
def update(self, rewards, obs, actions):
# calculate gradients for each action over all observations
grad_log_p = np.array([self.grad_log_p(ob)[action] for ob, action in zip(obs, actions)])
assert grad_log_p.shape == (len(obs), 4)
# calculate temporally adjusted, discounted rewards
discounted_rewards = self.discount_rewards(rewards)
# gradients times rewards
dot = self.grad_log_p_dot_rewards(grad_log_p, actions, discounted_rewards)
# gradient ascent on parameters
self.θ += self.α * dot
def run_episode(env, policy, render=False):
observation = env.reset()
totalreward = 0
observations = []
actions = []
rewards = []
probs = []
done = False
while not done:
if render:
env.render()
observations.append(observation)
action, prob = policy.act(observation)
observation, reward, done, info = env.step(action)
totalreward += reward
rewards.append(reward)
actions.append(action)
probs.append(prob)
return totalreward, np.array(rewards), np.array(observations), np.array(actions), np.array(probs)
def train(θ, α, γ, Policy, MAX_EPISODES=1000, seed=None, evaluate=False):
# initialize environment and policy
env = gym.make('CartPole-v0')
if seed is not None:
env.seed(seed)
episode_rewards = []
policy = Policy(θ, α, γ)
# (reconstructed) training loop: run episodes, update the policy, record returns
for i in range(MAX_EPISODES):
total_reward, rewards, observations, actions, probs = run_episode(env, policy)
policy.update(rewards, observations, actions)
episode_rewards.append(total_reward)
return episode_rewards, policy
# from gym.wrappers.monitoring.video_recorder import VideoRecorder
# for reproducibility
GLOBAL_SEED = 0
np.random.seed(GLOBAL_SEED)
episode_rewards, policy = train(θ=np.random.rand(4),
α=0.002,
γ=0.99,
Policy=LogisticPolicy,
MAX_EPISODES=2000,
seed=GLOBAL_SEED,
evaluate=True)
import matplotlib.pyplot as plt
plt.plot(episode_rewards)
Summary
Natural language processing (NLP) is a subfield of artificial intelligence (AI)
concerned with enabling computers to understand and process human language.
It aims to bridge the gap between human communication and machine
understanding, allowing computers to interact with us more naturally and perform
tasks involving human language. Its primary goal is to enable computers to
understand, interpret, and generate human language in a way that is both
meaningful and contextually relevant. Tokenization and word embeddings are
fundamental concepts in NLP that work together to enable machines to
understand and process human language. Breaking down text into smaller units
such as words or sentences is the process of tokenization. In NLP, sequence
modeling plays a crucial role in tasks that involve analyzing and processing
sequential data such as text, speech, and even protein sequences in
bioinformatics. Unlike traditional machine learning methods that treat data points
as independent, sequence models consider the order and context of elements
within the sequence. ARIMA models are a class of statistical models commonly
used for time series analysis and forecasting. They can handle data that exhibits
nonstationarity, meaning the statistical properties of the data (such as mean and
variance) change over time, by differencing the series until those properties
(mean, variance, and autocorrelation) remain approximately constant over time. Prophet is an
open-source forecasting tool developed by Facebook’s Core Data Science team. It
is designed to handle time series data with strong seasonal effects and multiple
seasonality. Prophet uses an additive model where different components of the
time series (trend, seasonality, and holiday effects) are modeled separately and
combined to make predictions. Both Prophet and neural networks are powerful
tools for time series forecasting, but they differ in their approach, strengths, and
weaknesses.
Collaborative filtering (CF) is a technique used in recommender systems to
predict the preferences of a user based on the preferences of similar users or
items. It’s a powerful tool for personalizing recommendations across various
domains like suggesting products, movies, music, or even news articles. The
underlying idea is to leverage the collective wisdom of a group of users to infer
preferences for individual users.
Exercise (MCQs)
1.
C) Text summarization
D) Sentiment analysis
2.
What is the process of converting words into their base form called?
A) Tokenization
B) Lemmatization
C) Stemming
D) Normalization
3.
C) Algorithm
D) Language
4.
C) Cyclicity
D) Random error
7.
What type of model uses past observations to predict future values without
explicitly identifying trends or seasonality?
A) ARIMA model
B) Exponential smoothing model
C) Linear regression model
8.
The mean squared error (MSE) is a commonly used metric to evaluate the
performance of a time series forecast. A lower MSE indicates:
A) A higher deviation between predicted and actual values.
B) A better fit between predicted and actual values.
C) No relationship between predicted and actual values.
D) The forecast is always accurate.
9.
C) Hybrid recommender system
D) Demographic filtering
12.
C) Both a and b
D) Neither a nor b
14.
C) Discovery of new items and products
Answers Key
1. b) Image recognition
2. c) Stemming
3. b) The
4. c) Represent text as a frequency distribution of words
5. c) Recurrent neural network (RNN)
6. c) Cyclicity (not a universal component, only present in some time series)
7. d) Naïve forecast model (assumes future values are equal to the last
observed value)
8. b) A better fit between predicted and actual values (lower error means
predictions are closer to actual values)
9. d) Predicting daily website traffic with a significant weekly pattern (ARIMA
models can capture seasonality)
10. d) Overfitting the model to the training data (a challenge in all machine
learning tasks, not specific to time series forecasting)
11. d) Demographic filtering (not a common type)
12. c) Similarities between users based on their past interactions
13. c) Both a and b (cold start affects both new users and new items)
14. b) Collaborative filtering only (used to reduce dimensionality in user-item
matrix)
15. d) Eliminating the need for human intervention (not a complete
replacement, humans still play a role in system design and optimization)
environment, states, actions, and rewards. How do these components
interact with each other in the learning process?
5. Compare and contrast two different reinforcement learning algorithms such
as Q-learning and DQN. Discuss their strengths and weaknesses in different
scenarios.
6. Reinforcement learning is often used in situations where the environment is
partially observable or dynamic. How do RL algorithms handle these
complexities? What are some challenges and potential solutions for learning
in such environments?
7. Explain the steps involved in the time-series forecasting process, starting
with data collection and preprocessing to model selection, evaluation, and
interpretation. What are some important considerations and challenges at
each step?
Answers
1. word embeddings
2. breaking down
3. Sentence tokenization
4. GloVe
5. RNNs
Index