Production Engineering From DevOps To MLOps
Abstract
This book takes a DevOps approach to MLOps and uniquely positions MLOps as an extension of well-established DevOps principles, using real-world use cases. It leverages multiple DevOps concepts and methodologies such as CI/CD and software testing. It also demonstrates additional MLOps concepts such as continuous training (CT), which extends CI/CD to CI/CD/CT in order to build, operationalize, and monitor ML models.
Contents
Production Engineering from DevOps to MLOps
Overview
Supporting this work
Building software with build tools
Building software using continuous integration
How does version control software contribute to DevOps?
Change management
Building releases
Summary
Microservices and Docker containers
Kubernetes
Summary
11 - Testing ML Models
Functional testing
Unit testing
Integration Testing
Acceptance testing and system testing
Regression Testing
Load Performance Testing
Canary testing and A/B testing
Multi-Armed Bandit Testing (MAB)
Shadow Mode Testing - Champion-Challenger Paradigm
User Interface testing
Summary
Production Engineering from DevOps to MLOps
An open source book written by Sebastien Donadio and Arnab Bose. Published
under a Creative Commons license and free to read online here. All code licensed
under an MIT license. Contributions are most welcome.
This book is also available on Leanpub (PDF, EPUB).
Overview
This book takes a DevOps approach to MLOps and uniquely positions MLOps as an extension of well-established DevOps principles, using real-world use cases. It leverages multiple DevOps concepts and methodologies such as CI/CD and software testing. It also demonstrates additional MLOps concepts such as continuous training (CT), which extends CI/CD to CI/CD/CT in order to build, operationalize, and monitor ML models.
Bull, and as an IT Credit Risk Manager with Société Générale in France. Sebastien has taught various computer science and financial engineering courses over the past fifteen years at a variety of academic institutions, including the University of Versailles, Columbia University's Fu Foundation School of Engineering and Applied Science, the University of Chicago, and the NYU Tandon School of Engineering. Courses taught include Computer Architecture, Parallel Architecture, Operating Systems, Machine Learning, Advanced Programming, Real-time Smart Systems, Computing for Finance in Python, and Advanced Computing for Finance. Sebastien holds a Ph.D. in High Performance Computing Optimization, an MBA in Finance and Management, and an MSc in Analytics from the University of Chicago. His main passion is technology, but he is also a scuba diving instructor and an experienced rock climber.
Book structure
This book takes a DevOps approach to MLOps and uniquely positions MLOps as an extension of well-established DevOps principles, using real-world use cases. It leverages multiple DevOps concepts and methodologies such as CI/CD and software testing. It also demonstrates additional MLOps concepts such as continuous training (CT), which extends CI/CD to CI/CD/CT in order to build, operationalize, and monitor ML models.
We lead readers through building a full DevOps/ML infrastructure using a collection of real-world case studies. The book details the principles, starting with DevOps and moving to the domain of MLOps, which focuses on operationalizing and monitoring ML models.
Chapter 2: Understanding the Cloud for DevOps, introduces how we can use the cloud in the DevOps approach.
Chapter 3: Building Software by Understanding the Whole Toolchain, covers how to build software and how to enrich it with libraries.
Chapter 4: Introducing MLOps, gives an insight into the challenges of operationalizing Machine Learning models.
Chapter 5: Preparing the Data, gives insight into the importance of data in Machine Learning models.
Chapter 6: Using a Feature Store, explains how a feature store promotes reusability to develop robust Machine Learning models quickly.
Chapter 7: Building ML Models, describes how to manage a Machine Learning model.
Chapter 8: Understanding ML Pipelines, depicts the importance of data feedback loops.
Chapter 9: Interpreting and Explaining ML Models, discusses how to dissect an ML model from the perspectives of explainability and interpretability.
Chapter 10: Building Containers and Managing Orchestration, presents the use of containerized applications.
Chapter 11: Testing ML Models, describes how testing improves software quality; it is a critical part of the DevOps/MLOps process.
Chapter 12: Monitoring ML Models, illustrates the different ways an ML model can underperform in production.
Chapter 13: Evaluating Fairness, explains what bias and fairness are in ML models.
Chapter 14: Exploring Antifragility and ML Model Environmental Impact, covers antifragility, how it can make your ML model more robust, and the environmental impact of your ML model.
To start this chapter, we will define what DevOps is.
Defining DevOps
In this section, we will describe in depth what DevOps is. We will talk about where it started and what problems it solves. We will review how much impact it has on the company structure and what its benefits are.
safeguards risks causing the system to become unstable, which is against their
mission statement.
The answer to this conundrum is DevOps, which unifies all parties involved in software development and deployment (business users, developers, test engineers, security engineers, and system administrators) into a single, highly automated workflow to deliver high-quality software quickly while maintaining the integrity and stability of the entire system.
The DevOps solution is to:
- determine the guiding principles, expectations, and priorities;
- work together to solve problems inside and between teams;
- automate routine and repetitive tasks to make time for more complex work;
- measure every item put into production and incorporate feedback into your work;
- promote a more successful culture of working effectively together across varied abilities and specialized knowledge, and share the facts with everyone concerned.
• Operations engages developers, which enhances system stability. More frequent deliveries inject less unpredictability into the system, reducing the probability of a catastrophic failure. Even better, rather than being released at odd hours or on weekends, these smaller releases can be done during the day, when everyone is at work and ready to handle issues.
• Test engineers create a test environment with automated provisioning that
is almost exactly like the production environment, which leads to more
accurate testing and an improved ability to forecast the performance of
future releases. Like other groups, test engineers benefit from teamwork
and automation to boost productivity.
• Product managers get faster feedback. They can adapt software features
faster to their clients’ needs.
• Business owners and executives see that DevOps enables the company to produce high-quality products considerably more quickly than rivals that rely on conventional software development techniques, which boosts revenue and enhances brand value. High-quality developers, system administrators, and test engineers want to work in the most advanced, productive environment available, which is another factor in the ability to attract and retain top personnel. Finally, senior executives spend less time intervening in inter-departmental disagreements when developers, operations, and QA collaborate, giving them more time to define the specific business goals that everyone is now working together to achieve.
Figure 1.1: The DevOps workflow
Figure 1.2: Software and operating system
Figure 1.2 represents the link between software, operating system, and hardware.
An operating system performs three primary tasks: managing the computer’s
resources, such as processor, memory, and storage; creating a user interface; and
running and supporting application software.
An Operating System (OS) has several main functionalities:
- Interface for sharing hardware resources. The OS manages hardware resources by providing the software layer that controls them. Displaying a character on the monitor after the user presses a key, controlling the movement of a mouse, and storing a program in memory are examples of resource handling that an OS can perform.
- Process scheduling. When using a computer, we need many programs running in parallel. Indeed, when we use our favorite browser, we may also need to see the time or play music in the background. To do so, the OS schedules these different processes to run on the underlying hardware.
- Memory management. Hardware has different levels of memory hierarchy. Processors perform calculations using operands stored in registers. Registers are very limited, so data is stored in memory. Memory is divided into different levels; level 1 (named L1) is the closest to the processor, so the L1 latency is much lower than that of all the other levels.
- Data access/storage. When we create Machine Learning models, we need a large amount of data. Data is usually stored on a hard disk or another storage device. The OS is in charge of organizing data storage by using files and directories and by setting rules on how to organize the storage unit.
- Communication. Lastly, the last function is to communicate with the outside world. If a computer wants to communicate with another one, the OS organizes the input/output of a given architecture.
The OS is critical when running any software. Figure 1.3 shows all the functions that software can use when running on an OS. Suppose we want to run software that executes a machine learning model: the data will be located on a hard disk (File Management and Device Management), loaded into memory (Memory Management), and processed (Processor Management). If the data comes from real-time streaming, such as financial data, it arrives through the network (Network and Communication Management).
We can see a Central Processing Unit (aka processor or CPU) divided into cores. These cores have registers and different levels of memory hierarchy. They share the same L3 cache, which allows the cores to share data. Every core can access memory and handle input/output with external devices such as network cards or storage units.
It is critical to understand how a system works because, when we design software, efficiency, reliability, and scalability will be part of our clients' requirements. Once we know how an OS works, what is the next step? Choosing an operating system.
The next section will talk about the different types of operating systems available
on the market.
They are the main competitors of Windows on personal devices. macOS was released in 2001 and iOS was released six years later. The main difference between the two is that one is built for computers and the other for mobile devices.
- Linux (Unix-based system)
A family of open-source Unix-like operating systems known as Linux is based on the Linux kernel, which Linus Torvalds initially made available in 1991. There are many different Linux distributions (such as Red Hat, Ubuntu, Debian, Fedora, ...). On personal devices, Linux is far from being as widely used as the two previous OSs, but it is open source and easily configurable, and it is the most used OS in production environments.
- Android
Android was created in 2003 and bought by Google in 2005. Google decided to make this OS open source. Android is mainly used on mobile devices that are not Apple devices.
Figure 1.5: Market share of OS for production environment
drivers for many devices and pieces of hardware
- low maintenance cost: unlike other OSs, the maintenance cost is much lower
The Linux kernel is the most stable and secure. Any software going to production will have to run on this OS, so it is important to learn the basics of how to interact with it. That is why, in this book, we chose to introduce you to this system.
In the next section, we will review the basic commands of a Linux operating system and learn how to automate tasks.
a flag. With the -h switch, we access the help pages for the majority of Linux commands. Flags are frequently optional. The input we provide to a command to enable appropriate operation is known as an argument or parameter. The argument can be anything you write in the terminal, although it is typically a file path. Flags can be passed using a single hyphen (-) or a double hyphen (--), and the command will process arguments in the order you pass them to it.
Let's start with the following commands:
- ls lists the contents of a directory.
- pwd returns the current directory path.
- cd changes the current directory to another one. Many options will help you navigate the Linux file system.
- rm deletes files and mv moves files.
- man displays the help for a command line.
- touch modifies the access and modification times of a file.
- chmod changes the access rights of files.
- sudo allows running a command as admin (or root).
- top or htop lists the processes running on a machine.
- echo prints a string on the screen (it is used to debug or monitor the steps of a script).
- cat reads the content of a file.
- ps returns the list of all the processes run by a shell.
- kill sends a signal to a process. It is usually used to terminate a process.
- ping is a network command to test network connectivity with another network interface.
- vi or vim is a text editor.
- history lists all the previous command lines.
- which returns the full path of a program.
- less helps you inspect a file backward and forward.
- head or tail displays the first/last lines of a file.
- grep matches lines against a regular expression (or a string).
- whoami displays the current user.
- wc displays the word count of a file.
- find searches for files in directories.
- wget downloads a file from a URL.
The list above is not exhaustive but it will help you to bootstrap your work for
scripting and automating some tasks.
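As a minimal illustration of how a few of these commands can be combined in a shell session (the URL and file name below are hypothetical placeholders, not from the book):
wget -q https://example.com/data.csv    # download a file from a URL (placeholder URL)
head -n 5 data.csv                       # display the first five lines of the file
wc -w data.csv                           # count the words in the file
grep "error" data.csv                    # print the lines matching a string
find . -name "*.csv"                     # search for .csv files under the current directory
ps                                       # list the processes run by this shell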
bash script1.sh
or by changing the access right to the file:
chmod a+x script1.sh
./script1.sh
Either way, it will display the same string: This book is useful
If you would like to add more commands to this file, you can simply edit it again and add more instructions. For example:
#!/bin/bash
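# The lines below are an illustrative sketch, not the book's original script:
# they extend script1.sh with a few more commands.
echo "This book is useful"
pwd        # print the current directory
whoami     # print the user running the script
ls *.sh    # list the shell scripts in the current directory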
Summary
In this chapter, we learned what DevOps is, we saw how operating systems work and how they can help in the DevOps process, and we learned how to create scripts to automate tasks. In the next chapter, we will talk about the cloud and learn how to use it in the context of DevOps.
Virtualizing hardware for DevOps
Virtualization allows using a machine's full capacity by splitting its resources among many users and environments. Let's consider two physical servers with distinct services: a web server and a mail server. Only a small portion of each server's operational capacity is being utilized. Since these services have been tested and running for many years, we do not want to interrupt them or attempt an upgrade of the machine, which could potentially prevent these services from running. Virtualization allows consolidating the two services onto one physical server: they will run in two different virtualized environments, leaving the second server free for other services.
The virtual environments—the things that use those resources—and the actual
resources are divided by software known as hypervisors. The majority of busi-
nesses virtualize using hypervisors, which may either be loaded directly into
hardware (like a server) or on top of an operating system, such as a laptop.
Your physical resources are divided up by hypervisors so that virtual environ-
ments can utilize them.
Resources are divided between the various virtual environments and the physical environment. Users interact with the virtual environment and run their computations there (on what is typically called a guest machine or virtual machine). A virtual machine is stored as a single data file, and like any digital file, it may be transferred between computers, opened on either one, and expected to function as intended.
The hypervisor transfers the request to the actual system and caches the modifi-
cations when a user or program sends an instruction in the virtual environment
that requires more resources from the physical environment; everything happens
at practically native speed (particularly if the request is sent through an open
source hypervisor based on KVM, the Kernel-based Virtual Machine).
There are different types of hypervisors as shown in Figure 2.1. The selection
of the type will depend on the business needs.
The types shown in Figure 2.1 are data, desktop, server, OS, and network virtualization, each illustrated with a schema in the figure.
livery and testing phases of the software development process. The DevOps
teams may test and develop using systems and devices that are comparable to
those used by end users thanks to virtualization. In this manner, testing and
development are expedited and take less time. The program may also be tested
in virtual live situations before deployment. Real-time testing is made easier
by the team’s ability to monitor the results of any modification made to the
program. The quantity of computer resources is decreased by doing these op-
erations in virtualized settings. The quality of the product is raised thanks to
this real-time testing. The time needed to retest and rebuild the program for
production is less when working in a virtual environment. As a result, virtu-
alization eliminates the DevOps team’s additional work while assuring quicker
and more dependable delivery.
Virtualization in DevOps has various benefits, some of which include:
• Less effort required
There is no need to locally upgrade the virtualization-related hardware and
software because these changes are regularly made by the virtualization service
providers. A company’s IT team may concentrate on other crucial tasks, saving
the business money and time.
• Testing environment
We can create a local testing environment with virtualization. It is possible to
test software in this environment in several different ways. No data will be lost
even if a server fails. As a result, dependability is improved, and software may
be evaluated in this virtual environment before being deployed in real time.
• Energy-saving
Because virtual machines are used throughout the virtualization process rather
than local servers or software, it reduces power or energy consumption. This
energy is conserved, lowering the cost, and the money saved may be used for
other beneficial activities.
• Increasing hardware efficiency
The demand for physical systems is reduced by virtualization. As a result, power
consumption and maintenance expenses are decreased. Memory and CPU use
is better utilized.
Virtualization implementation difficulties
Virtualization in DevOps has many benefits, but it also has certain drawbacks
or restrictions.
• Time commitment
Even if less time is spent on development and testing, virtualization itself still takes a lot of time to set up and operate.
• Security hazard
The virtualization procedure increases the risk of a data breach since remote
access and virtualizing desktops or programs are not particularly secure options.
• Understanding of infrastructure
The IT team has to have experience with virtualization to deal with it. Therefore, if a business wants to start using virtualization and DevOps, either the current staff must be trained or new employees are needed, which takes time and is expensive.
In this section, we saw the advantages of virtualization for DevOps. We are now
going to talk about the cloud for DevOps.
resources. Not only is it difficult to get hardware, but it is also complicated to find technologists to manage hardware and operating systems.
There are different ways of using a cloud: private cloud, community cloud, public cloud, and hybrid cloud. These types range from in-house hardware, which can be considered a private network, to the public cloud, where the computing resources are owned by corporations or institutions.
• Infrastructure as a Service (IaaS)
IaaS is a model for cloud computing services. It provides users with internet
access to computer resources in the “cloud,” a virtualized environment. It pro-
vides bandwidth, load balancers, IP addresses, network connections, virtual
server space, and other computer infrastructure. The assortment of comput-
ers and networks that make up the pool of hardware resources are frequently
scattered throughout several data centers. This increases the redundancy and
dependability of IaaS. IaaS (Infrastructure as a Service) is a complete computing solution and is one option for small businesses looking to cut expenditure on their IT infrastructure: every year, significant expenses are incurred for upkeep and for the acquisition of new parts such as hard drives, network connections, and external storage devices.
Hypervisor
A piece of hardware, firmware, or software known as a “hypervisor” enables
the setup and use of virtual machines on computers (VM). Each virtual system
is known as a guest machine, and a host machine is a computer on which a
hypervisor runs one or more virtual machines. The hypervisor treats resources
like CPU, memory, and storage as a pool that may be easily shared between
existing guests or new virtual machines. The independence of the guest VMs
from the host hardware allows hypervisors to maximize the utilization of a
system’s resources and increase IT mobility. Since it makes it possible for them
to be quickly moved between many computers, the hypervisor is sometimes
referred to as a virtualization layer. Multiple virtual machines can run on a
single physical server.
Figure 2.2 represents two types of hypervisors:
• Native or bare-metal (type-1) hypervisors
To manage guest operating systems and control hardware, these hypervisors
work directly on the host’s hardware. They are thus occasionally referred to as
“bare-metal hypervisors.” The most common server-based setups for this type
of hypervisor are corporate data centers.
• Hosted (type-2) hypervisors
These hypervisors run on a conventional operating system (OS), much like other computer programs. A guest operating system runs as a process on the host, and type-2 hypervisors insulate guest operating systems from the host operating system. A type-2 hypervisor is suited to individual users who want to run several operating systems on a personal computer.
Figure 2.2: Type of hypervisors
Cloud service providers offer different hypervisors in various geographies and availability zones. Figure 2.3 shows how public clouds organize their regions.
Creating a cloud solution
The region in which the program will operate must first be selected. If your instance doesn't need to be close to another one, you can choose any region you prefer; we recommend that if you use your instance from the US, you build it in a US region. We will build the following components of the system:
- An Amazon EC2 instance: a virtual server on Amazon's Elastic Compute Cloud (EC2) that runs applications on the architecture of Amazon Web Services (AWS). While EC2 is a service that enables corporate subscribers to run application programs in a computing environment, AWS is a full and dynamic cloud computing platform. It may be used to build a virtually endless number of virtual machines (VMs). An instance type with the right CPU, memory, storage, and networking resources will need to be selected.
- An Amazon S3 bucket: scalability, data accessibility, security, and speed are all features of Amazon S3. For a variety of use cases, including data lakes, websites, mobile apps, backup and restore, archives, business applications, IoT devices, and big data analytics, Amazon S3 enables the storage and protection of any quantity of data. To suit your specific commercial, organizational, and compliance demands, you may optimize, manage, and configure data access using Amazon S3's administrative tools.
Concretely, the steps to create an instance will depend on when you read this book, since cloud consoles evolve. Here are the high-level steps to create your cloud solution (a command-line sketch follows the list):
1. You need to create an AWS account. If you just want to try how AWS works, we recommend getting a trial account. Once your account is created, you need to log in to start the next step.
2. You select your region and then the type of AWS component you want to build. In this example, you need to select EC2.
3. When you start setting up your new EC2 instance, you are required to choose the Amazon Machine Image (AMI). AWS provides many operating systems; we recommend using the same operating system you used in the section Virtualizing hardware for DevOps.
4. You need to select the type of instance. This reserves hardware for the instance you are creating. Some tiers are smaller and cheap to use.
5. You must choose in which Virtual Private Cloud (VPC) and which subnet inside your VPC you wish to create your instance. It is preferable to decide and arrange this before starting the instance. A VPC is a sub-cloud within your cloud tenancy. For instance, if you need to create many independent applications, you can choose to split your tenancy into several VPCs (aka sub-clouds).
6. You have to choose a couple of other options which are less important. Then you will be able to create an S3 bucket to have enough storage for your application. The S3 bucket is not required if you don't need more space.
7. Once the EC2 instance is created, you will be guided to create a key to use with an SSH terminal. We recommend creating a new key and using it to connect to this instance.
8. As the last step of creating the instance, it is advisable to set up the inbound communication rules. For instance, if you want to run a Python notebook on this EC2 instance, you will need to open port 8888 so it is reachable from outside.
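For readers who prefer the command line, the same setup can be sketched with the AWS CLI; the AMI ID, key name, security group, subnet, and bucket name below are hypothetical placeholders, and the exact values depend on your account and region:
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro --key-name my-devops-key --security-group-ids sg-0123456789abcdef0 --subnet-id subnet-0123456789abcdef0    # launch a small EC2 instance
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8888 --cidr 0.0.0.0/0    # open port 8888 so a notebook is reachable from outside
aws s3 mb s3://my-devops-mlops-bucket    # create an S3 bucket for extra storage (bucket names must be globally unique)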
In this section, we learned how to create a computing instance in the cloud. Because more and more DevOps/MLOps tools are cloud-based, getting familiar with cloud functions is critical. We recommend reading some online tutorials from the cloud providers; they all offer an extensive set of tutorials that can help you learn how to use the cloud through examples. We will now summarize what we learned in this chapter.
Summary
In this chapter, we learned how to create a machine in the cloud, how cloud companies organize their regions, and what a private cloud is. The next chapter will focus on how to build software and how to use libraries.
this machine that represent the state of this machine (which is the decoding
algorithm). This disk rotates to decode the Enigma messages.
Figure 3.1: Representation of the state machine of Alan Turing from the movie
The Imitation Game
Turing Machine
In a 1937 publication, English mathematician Alan Turing developed an abstract computing system that could compute real numbers with an unlimited amount of memory and a limited number of configurations or states. This computing system, which came to be known as the Turing machine, is regarded as one of the primary concepts in theoretical computer science. In turn, Alan Turing is frequently referred to as the inventor of the modern computer and the father of computer science.
State Machine
There were about 100 revolving drums in this electro-mechanical device. When
a character from an encoded message was fed into the device, it set off a series
of events in which the cylinders rotated and changed states, with the state of
the next cylinder being dependent on the state of the preceding.
The processing units of all computer devices are powered by current solid-state
systems, which are comparable to this process. A state machine essentially
comprises states and transitions. Consider a straightforward binary system,
where there are only two possible states: on and off. A state is essentially any
situation of a system that depends on earlier inputs and responds to later inputs.
The machine can compute or execute an algorithm thanks to these sequences,
which are conditioned by a finite set of states; these sequences represent the
machine’s program.
computers bearing his name. Von Neumann laid the groundwork for contem-
porary electronic stored-program computers by fusing a state machine with a
memory unit.
The Harvard Architecture followed the Von Neumann Architecture. It was founded on the idea that there should be distinct buses for data and instruction traffic, and it was primarily created to get around the Von Neumann Architecture's bottleneck. Unlike in the Von Neumann Architecture, there are distinct physical memory addresses for instructions and data, and instructions can be executed in one cycle.
Today's computing devices, such as desktop computers, laptops, and smartphones, are all modeled on these architectures. We could continue the history of computers up to the architectures we use today, but in summary: the design stayed the same while becoming more complex, adding cores and levels of memory cache, with the goal of increasing parallelism to get a more performant machine. The point is that it became far too complicated to talk to the machine directly. The need for a translator between humans and machines was greater than ever: the compiler was born, and that is what we introduce in the next section.
Generating an executable with compilers
Software systems are collections of programs in different languages. In addition to their native instruction sets, computers "speak" a layering of languages, starting with assembly and moving increasingly away from the machine's native language toward high-level languages that are more easily understood by humans. Humans may structure complicated systems using languages better suited for such tasks, such as C, C++, Java, and Python. All abstractions require more labor and logic to function, and with this added logic there is frequently a trade-off between performance, complexity, and implementation using various software engineering methodologies.
What connects the hardware and software? All data within a CPU is expressed
in binary, which is the only language a CPU can understand. In essence, it is the
machine as seen by the programmer. The compiler enables us to overcome an
impedance mismatch between how people think about creating software systems
and how computers compute and store data. They are crucial for accelerating
program runtime, as well. Computer code created in one programming language
(the source language) is translated into another language by compilers (the
target language). They convert computer code between higher-level and lower-
level languages. They may produce machine code, assembly languages, object
code, and intermediate code. John Mauchly developed Short Code (aka Brief Code) in 1949; the ability to translate algebraic equations into Boolean expressions was a feature unique to this compiler. Then came COBOL, Lisp, Fortran, and C: more languages meant more compilers. Compilers continued to
advance in intelligence by becoming better at enhancing the abstraction of what
developers wished to convey with effective hardware execution. To enhance
software engineering, new programming paradigms were introduced. Python
and Java opened up object-oriented programming to all users in the 1990s.
A compiler can be divided into three phases, as represented in Figure 3.4 (a command-line sketch follows the list):
• The front-end phase deals with the pre-processing of the source code. It validates the syntax and grammar of the source code and produces an abstract syntax tree.
• The middle-end phase receives the abstract syntax tree resulting from the parsing done in the previous phase. Many passes perform code optimizations, which can target different objective functions; the most important one is optimizing the runtime of the code, while another example is optimizing energy consumption.
• The back-end phase deals with the selection of the target, which can be a target machine, a virtual machine, or simply another language. This phase can perform register allocation and instruction scheduling.
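As a rough, hedged illustration of these phases from the command line (gcc's stages do not map one-to-one onto the front/middle/back ends, and the file name reuses the toto.c example from later in this chapter), gcc lets you stop after each stage:
gcc -E toto.c -o toto.i    # preprocessing only (input to the front end)
gcc -S toto.i -o toto.s    # compile to assembly (where optimizations happen)
gcc -c toto.s -o toto.o    # assemble into an object file
gcc toto.o -o toto.exe     # link into the final executable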
Figure 3.4: Compiler phases
Dennis M. Ritchie, an American computer scientist, created the C programming
language at Bell Laboratories in the early 1970s. Since its inception, hundreds
of C compilers have been developed for various operating systems and types of
architecture. Thirteen years later, the GNU Compiler debuted with the express
purpose of being an open cooperation. The structure below demonstrates how
this C compiler creates code that can be executed on an x86 Linux kernel.
the C language, the function main).
The linker part is critical in software engineering because it is the part in charge of bringing in external code (libraries). Any language can import libraries from other developers, helping them code faster and produce more reliable code by reusing code that has already been tested. C and C++ typically link against static libraries at build time, while Python loads dynamic (shared) libraries at runtime. We will now see the difference between these two types of libraries.
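As a minimal, hedged sketch of the linker at work (the library name and source files are hypothetical), a static library can be built with ar and linked with gcc like this:
gcc -c mathutils.c -o mathutils.o        # compile the library source into an object file
ar rcs libmathutils.a mathutils.o        # archive the object file into a static library
gcc -c toto.c -o toto.o                  # compile the program that uses the library
gcc toto.o -L. -lmathutils -o toto.exe   # link the program against libmathutils.a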
Building software with build tools
A build system is a group of software tools that are used to streamline the build
process, which is roughly defined as the process of “translating” source code
files into executable binary code files. Even though numerous build systems
have been “developed” and put to use for more than three decades, most of
them still use the same fundamental methods that Make first used to introduce
the directed acyclic graph (DAG). The traditional GNU Make, CMake, QMake,
Ninja, Waf, and many others are still in use today. We will show you how to
create C projects using some of these well-liked build systems in this section.
Let's review two build systems:
• GNU Make: Using the instructions in the makefile, GNU Make builds projects. A makefile must be created to instruct GNU Make on how to build a project.
This example creates an executable toto.exe by compiling the source toto.c.
# Makefile
# CC selects the compiler.
CC = gcc
# The target 'exe' rebuilds toto.exe from toto.c (the recipe line must start with a tab).
exe: toto.c
	$(CC) toto.c -o toto.exe
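Assuming the Makefile and toto.c sit in the same directory, the build is run with:
make    # or 'make exe'; exe is the first (default) target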
• CMake: The adaptable, open-source CMake framework is used to handle
the build process, which is independent of the operating system and com-
piler. Unlike many cross-platform solutions, CMake is designed to be used
in conjunction with the native build environment. Simple configuration
files called CMakeLists.txt files are used to produce common build files
(such as Makefiles on Unix and projects/workspaces on Windows MSVC).
CMake can produce a native build environment that can assemble exe-
cutable binaries, compile source code, build libraries, produce wrappers,
and more. Multiple builds from the same source tree are consequently sup-
ported by CMake because it allows both in-place and out-of-place builds.
Both static and dynamic library builds are supported by CMake. CMake
also creates a cache file that is intended for use with a graphical editor,
which is a useful feature. For instance, CMake locates include files, li-
braries, and executables as it runs and may come across optional build
directives. This data is captured and stored in the cache, which the user
is free to edit before the native build files are generated. Source man-
agement is also made simpler by CMake scripts since they consolidate
the build script into a single file with a better-organized, understandable
structure.
This example creates an executable toto.exe by compiling the source toto.c.
cmake_minimum_required(VERSION 3.9.1)   # minimum CMake version required
project(CMakeToto)                      # name of the project
add_executable(toto.exe toto.c)         # build the executable toto.exe from toto.c
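A typical out-of-place build with this CMakeLists.txt (the build directory name is our choice) looks like:
mkdir build && cd build    # keep generated files out of the source tree
cmake ..                   # generate a native build system (a Makefile on Linux)
make                       # compile toto.c into toto.exe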
We could give you more examples of different build tools, but they all follow the same principle. Now that we know how to build software, we would like to know how to automate the build.
- Deployment options. A CI tool should make deployment straightforward.
- Integration options. A CI tool should connect to other project-related software and services.
- Security and safety. Whether it is open source or commercial, a good CI tool should not increase the risk of data being compromised.
A CI tool must meet the goals of the project and company while also being
technically competent.
We can list a few tools, though this list is far from exhaustive:
- Jenkins is one of the most popular free, open-source CI programs and is frequently used in software engineering. It is a Java-based, server-based CI program that needs a web server to run; it makes automated builds and testing simple.
- Atlassian's Bamboo is a server-based CI and deployment platform with an easy-to-use drag-and-drop user interface. Developers who already use other Atlassian services (such as Jira) frequently choose this tool. Bamboo enables the creation of new branches automatically and their merging after testing. Continuous deployment and delivery are simple to do using this technology.
- GitLab CI is a free continuous integration tool with open-source code. For projects hosted on GitLab, this highly scalable solution is simple to install and configure thanks to the GitLab API. GitLab CI is capable of deploying builds in addition to testing and building projects. This tool highlights the areas where the development process needs to be improved.
- CircleCI is a platform for continuous integration and delivery. It may be used in the cloud or locally and supports a variety of programming languages. Automated testing, building, and deployment are simple with this program. Numerous customization tools are included in its simple user interface. With CircleCI, developers can swiftly decrease the number of problems and raise the quality of their apps.
- TravisCI has no requirement for a server because the service is hosted in the cloud. Additionally, TravisCI has an enterprise-focused on-premises version. The fact that this utility automatically backs up the most recent build each time you run a new one is one of its nicest features.
Figure 3.7: CI tools
Figure 3.7 shows the market share between the different CI solutions.
In the rest of this chapter, we are going to choose Jenkins to build our example.
system of your choice (for us, it will be the GitHub website)
- add a build step and specify where the CMake file is
Once all the steps are done, you can save the project and test the project to see
if Jenkins starts the build by getting the file from the GitHub repo and building
the software using CMake.
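As a hedged sketch of what that build step could run (we assume a generic "Execute shell" build step rather than a dedicated CMake plugin, and that Jenkins has already checked out the repository into the job workspace):
mkdir -p build && cd build    # out-of-place build directory inside the workspace
cmake ..                      # generate build files from the repo's CMakeLists.txt
make                          # compile the project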
We will now talk about the missing part of this continuous integration process,
the source version control system.
two unconnected pieces of work that must then be carefully untangled and re-
vised. Developers that have never used version control may have added versions
to their files, sometimes with suffixes like “final” or “latest,” and then dealt with
a new final version afterward. You may have code blocks that are commented
out because you wish to remove certain functionality but save the code in case
you need it in the future. Version control offers a solution to these issues.
Version control software is a crucial component of the day-to-day professional
activities of the contemporary software team. The enormous benefit version
control provides them even on tiny solo projects is often recognized by individual
software engineers who are used to working with a good version control system
in their teams. Many developers wouldn’t even think about working without
version control systems for non-software tasks once they become acclimated to
their potent advantages.
Version control systems have seen significant advancements, some of which are
superior to others. SCM (Source Code Management) tools or RCS are other
names for VCS (Version Control System). Git is one of the most widely used
VCS tools available right now. Git belongs to the DVCS category of Distributed
VCSs; more on that later. Git is a free and open-source VCS system, like many
of the most well-known ones on the market right now. The following are the main advantages of version control, regardless of what the tools are labeled or the technology employed:
1. A complete long-term modification history for each file. This refers to every change made over time by various individuals. Changes include adding and removing files as well as altering content. The handling of file renaming and moving differs among VCS applications. This history should also contain the modification's author, the date, and any documented justification. Going back to older revisions makes it possible to analyze defects' root causes, which is crucial for resolving problems with software that is more than a few years old. If the application is still being developed, almost anything may be considered an "earlier version."
2. Merging and branching. Team members need to work simultaneously, and even those working alone can benefit from being able to focus on several streams of change. By defining a "branch" in VCS systems, developers may keep many streams of work distinct from one another while still having the opportunity to merge them back together to guarantee that their changes do not conflict. Many software development teams use a branching technique for every feature, every release, or both. Teams may choose from a range of workflow options when determining how to utilize a VCS's branching and merging capabilities.
3. Traceability. Root cause analysis and other forensics benefit from the ability to track every change made to the program, link it to project management and bug-tracking tools like Jira, and annotate each change with a note outlining its goal. When reading the code and attempting to understand what it does and why it was written in a certain way, having the annotated history of the code at your fingertips helps developers make changes that are correct, aesthetically pleasing, and consistent with the system's planned long-term design. This is especially important when working with older code, as it helps developers anticipate future work.
The list of VCSs is also pretty long, from CVS to SVN to Mercurial and Git, and we could write many comparisons. We just want to show what the landscape looks like nowadays.
Figure 3.9: Git workflow
Figure 3.9 shows how Git is used in companies that develop software for their business. We have a main branch named Master, which is typically the default branch where the latest code can be found. Other branches support the development of new releases: a Develop branch is where the development of new features happens. Once this development branch is stable and we are ready to create a release, we merge the Develop branch into the Release branch. When code is in production and we have a critical problem to solve, we cannot wait for the next release; we use a Hotfix branch to change the code as soon as possible.
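As a minimal command-line sketch of this workflow (branch names loosely follow the figure; the feature and release names are hypothetical):
git checkout develop                  # start from the Develop branch
git checkout -b feature/new-parser    # create a feature branch (hypothetical name)
# ... commit work on the feature ...
git checkout develop
git merge feature/new-parser          # bring the feature back into Develop
git checkout -b release/1.0 develop   # cut a release branch from Develop
git checkout master
git merge release/1.0                 # merge the release into Master for production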
To ease your work in creating the whole DevOps toolchain, we are recommending
using GitHub. This website will walk you through how to create an initial git
repo and then how to make some changes to the code.
Change management
Change management is a method for transforming an organization’s objectives,
procedures, or technology. Change management implements techniques for im-
plementing, controlling, and adapting to change. To make it an effective process,
change management must evaluate how adjustments or replacements will affect
processes, systems, and workers. Plan, test, communicate, schedule, implement, document, and evaluate each change. Change management requires documentation
to establish an audit trail and assure compliance with internal and external
controls, including regulations.
Each change request must be reviewed for its impact on the project. Change
management is crucial in this process. Senior executives in charge of change
control must assess how a change in one part of the project might influence
other areas and the overall project.
It is important to put in place a set of metrics helping to monitor the changes:
- Scope. Change requests must be examined for scope impact.
- Schedule. Change requests must be evaluated for impact on the project timeline.
- Costs. Change proposals must be examined for financial impact. Overages on project activities may rapidly increase project expenses, since labor is the biggest expenditure.
- Quality. Change requests must be considered for impact on project quality. Rushing may cause faults; therefore, accelerating the project timetable can impair quality.
- HR. Change requests must be reviewed for extra or specialized effort. The project manager may lose crucial resources when the timetable changes.
- Communications. Appropriate stakeholders must be notified of approved change requests.
- Risk. Change requests must be risk-assessed. Even slight adjustments might cause logistical, budgetary, or security issues.
In the change management culture, there are three important steps:
1- Unfreeze: where all the actors (tech, business, or other stakeholders) decide
why and how they need to change the current state
2- Change the system: the people in charge of making changes operate
3- Freeze: once the changes have been done, we evaluate if the state is better or
if it needs to be rolled back.
After this section, we know how to make changes and we know the process for making them. We will now apply it in the context of software releases.
Building releases
After a set of changes, once a scope of features has been implemented, we may
decide to release our changes to production. We first need to ensure that the
change we will deliver will not damage the system (and of course, the stakehold-
ers of the system). For that, we will test if the system still performs correctly
after the set of changes we want to build.
As we saw earlier in this chapter, the first step in building a release is to stage the changes going to production. For that, we can use a specific branch from the revision control software. We will then use a build system to compile and link the code and obtain an executable.
Variety of Tests
Planning and output analysis are necessary for testing. To verify various activities
and requirements, DevOps experts may conduct various tests. Tests are carried
out to check the software’s integration, performance, accessibility, and usefulness
rather than only to identify flaws in the source code and gauge its accuracy. As
a result, testing proceeds in the order shown in the figure below, with early unit
testing of the application used to verify the smallest testable components of the
program.
Remember that the execution, scope, length, and data of functional tests might
change. DevOps workers need to have a strong knowledge of their testing
methodology before diving into any of these categories. They must realize that high-level DevOps testing is done without consideration for code structure analysis, which is usually done by the code's developers. By using tests that don't require knowledge of the program's internal architecture, the DevOps approach to black-box testing avoids these restrictions.
Summary
We will now conclude this chapter by summarizing what we discovered. We first
learned how to talk to a machine by using a compiler. We learned how to build
software by using a build system. Then we saw how to use CI software such as
Jenkins to automate the process of the build by using a version control system.
This chapter closed the first part of this book about DevOps, we will now start
a new adventure by learning what MLOps is. The next chapter will finally
introduce this topic.
Motivation
The jury is out on how many data science projects do not make it to produc-
tion, with numbers hovering in the high 80s [1]. Whichever way you look at it,
this implies a low percentage of production operationalization of artificial intel-
ligence (AI)/machine learning (ML) models. That is in contrast to predictions
and societal hope that AI will improve our lives. So why this low percentage?
Different reasons contribute to this such as inadequate data management, siloed
enterprise organizations, not understanding ML technical debt, among others.
Machine learning operations (MLOps) principles help manage these hurdles for
ML model operationalization in production.
AI/ML model operationalization is similar to, but not the same as, software deployment. It is similar because you are writing and managing model code; this is where DevOps principles that have proven to manage hurdles in software production deployment are useful. Concepts such as scripting, task automation, and CI/CD outlined in the previous chapters are cornerstones in MLOps.
It is different because an ML model is not just code; it also has an important data dimension that determines the parameter values. This added complexity has implications for data tracking and means that, for the same (Python or R or C++) algorithm code, you can have different model parameters. MLOps manages this data complexity, and we will go into more detail on this in the next chapter.
ML Operationalization Complexities
There are multiple complexities around ML model operationalization that need
to be addressed by MLOps (Figure 4.1). ML models deal with a lot of data.
They need to have multiple training runs and experiments with possibly different
models. Once a model is decided upon it needs to be deployed into production
with governance, security and compliance in place. Data assumptions during
training may not hold in production so deployed models need to be continuously
monitored for drift. Models that have “decayed” need to be retrained and
redeployed in a systematic manner.
ML Lifecycle
Using the above complexities as a guide, the desiderata of an MLOps platform are:
1. Integration with data infrastructure - enable easy data tracking and management via:
- Versioning - enforce that any change to the data (for example, outlier removal, imputation) is a version change of the data (see the sketch after this list).
- Location independence (cloud, on-premise, hybrid) - allow seamless access to data irrespective of whether it is on the cloud (for example, Amazon S3, Azure Blob), on-premise storage, or a hybrid combination of both.
- Governance - govern all data validation, transformation, and preparation such that sensitive fields (for example, gender, race, ethnicity) are not used for predictive modeling, and track all ML model releases for audit and reproducibility.
2. Collaboration - enable team management via:
- Team development - ML model building in enterprises is a team sport, so support multi-user development via DevOps concepts such as CI/CD.
- Experimentation in a centralized place - track all ML experimentation so that model building is consistent with the data version, the algorithm version, and the third-party library version that is enforced when the team is using the same ML pipeline.
- Reuse of pipelines - since ML model building is an iterative process, make it simple to change parameters and reuse ML pipelines for multiple experiments.
3. Ease of adoption - enable established best practices via:
- Popular ML libraries/packages - support popular ML libraries (for example, scikit-learn, TensorFlow, PyTorch).
- No steep learning curve - leverage established ML model-building principles so that there is no steep learning curve.
4. Managed deployment - enable easy model deployment via:
- Containerization - make it simple to build containers with a few clicks to deploy ML models.
- Model monitoring - support model monitoring with notifications for issues such as model drift and data drift.
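As one concrete, hedged illustration of the data-versioning requirement (DVC is just one possible tool, not one the book prescribes, and the file name is hypothetical), data versions can be tied to Git commits like this:
dvc init                         # set up DVC inside an existing Git repository
dvc add data/training.csv        # track the dataset; DVC writes data/training.csv.dvc
git add data/training.csv.dvc data/.gitignore
git commit -m "Data version 1: raw training data"
# After outlier removal or imputation, re-add the file to record a new data version.
dvc add data/training.csv
git add data/training.csv.dvc
git commit -m "Data version 2: outliers removed"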
The above wish list is encapsulated in the ML lifecycle workflow in Figure 4.2. It
has 3 main pillars that we call phases - data management, model management
and deployment management. Below we go through each component of the
workflow. Later chapters go into the details for each component.
Figure 4.2: ML model building lifecycle
Data Management/DataOps
This phase deals with data complexities such as versioning and the consequences of feedback loops, such as training data selection bias (Figure 4.3), where the model performs well in one segment of the data and not in others.
2. EDA - perform exploratory data analysis to understand the data version.
3. Verify and Validate Data - critical to verify that the data is as expected
and then validate the data.
4. Feature Engineering - once the data is verified and validated build features
specific to the business problem.
5. Prepare Data - the data is prepared in the correct format for model build-
ing consumption.
Model Management/ModelOps
This phase addresses experimentation with ML model pipelines to determine the correct model to deploy, and complexities such as indirect system influence on models. For example, assume two models work together, one giving the user options and the second showing information for a selected option. Then the behavior of, and updates to, the first model influence the outcomes and selections of the second model, even though the two models are not directly related (Figure 4.4). There are also direct system-influence complexities through signal mixing, such as ensembles and correction cascades, where a model depends on another model/pipeline.
3. Hyperparameter Tuning - part of the ML experiments include trying dif-
ferent hyperparameters.
4. Track Experiments, XAI and Testing - track the outcome of each experi-
ment, compare them side-by-side, run explainability and test the models.
5. Model Card - once the models to be deployed to production are ready,
document them using model card.
Deployment Management
This phase includes the deployment of ML models, production testing, monitoring, and configuration challenges such as keeping modifications tracked and reproducible (Figure 4.5).
ready, deploy and run the model.
2. Production Testing - run any tests that are necessary for A/B testing or
Multi-Armed Bandit Testing in production.
3. Performance Visualization and Model Monitoring - insight into the perfor-
mance of the deployed ML model with dashboard visualization and setup
ML model monitoring to detect performance deterioration.
Agile ML Lifecycle
Agile project management is a good fit for the ML lifecycle due to its iterative
nature. Given the multiple experiments with different data versions, data and
feature engineering code, algorithm code, iterative processes work well. For
example, in the agile ML lifecycle it is common practice to take on technical debt during experimentation with prototypes. Once a model (and its parameters) is declared ready for production, the technical debt is paid off and the code is brought up to good quality.
Let’s go into details of agile ML for data management (Figure 4.6). Once
you verify a specific data version, you do EDA on the data in terms of data
distribution and other statistical characteristics. During EDA you can connect
to different versions of the data per your statistical requirements. Thereafter
you validate the data in terms of filling gaps, scanning for outliers, among others.
Again you can go back to different versions of the data or do more EDA to better
understand the data. Next you develop business and domain specific features
in feature engineering. Based on the EDA outcome, you can explore multiple
features and their characteristics with different data versions. So each step of the
data management phase is executed in an agile manner with multiple iterations.
One reason is that data exploration and feature building is very exploratory and
innovative, so each iteration reveals new information to be used for the pipeline.
Figure 4.6: Agile ML Data Management
Likewise, each step in the model management phase is also iterative as you
explore and discover different combinations of algorithms, hyperparameters and
data features during model building. In the last step, the winning model is
documented using a model card. We go into these details in later chapters.
In the deployment management phase, first you design how you plan to run
your inference service and then deploy. After post deployment feedback (such
as production machine workload), you may go back and redesign the inference
service. Then depending on the outcome of production deployment tests, you
may again redesign (and maybe redeploy) the inference service. And based on
the performance and model monitoring, there may be redesign and/or produc-
tion redeployment.
What is AIOps
Artificial Intelligence for Operations (AIOps) is the use of machine learning to automate and enhance IT operations. This enables IT Ops to separate the signal from the noise, given that there are many sources of information such as machine and application log files.
AIOps generates both descriptive pattern discovery (what happened when the failure occurred) and predictive inference insights (if the last failure pattern repeats, then what are the chances of another failure). An example of the former is using correlation analysis to discover which patterns tend to happen together. An example of the latter is root-cause analysis to identify which patterns are responsible for recurring issues.
AIOps use-case examples range from predictive maintenance of IoT devices and
heavy machinery such as windmills and jet engines to monitoring computer
networks for Denial-of-Service attacks and other security issues.
Summary
In this chapter, we understood the need for MLOps and the multiple challenges
in operationalizing ML models. We covered the concepts and principles of MLOps, how to manage an ML workflow using agile project management, and the difference between MLOps and AIOps. In the next chapter, we look into the intricacies of data preparation for ML model development.
[1] https://venturebeat.com/ai/why-do-87-of-data-science-projects-never-make-
it-into-production/
Figure 5.1: Time on data from academia to industry
Feature Engineering
In contrast to EDA, feature engineering is very much use-case-dependent. This
is where you build custom features that are relevant to the ML model, which in
turn is dependent on the use case. For example, if you are building an ML model
to predict the winner of a tennis match, then features such as double-fault, and
aces served, are relevant. But they are not if you are building an ML model for
credit card fraud detection. Instead, fraud-relevant features such as the number
of transactions in the last three days, and the location of the transactions, are
important. However, for either use case, EDA is valid.
In conclusion, understanding the data details is very important to extract the
most from the data. You should expect to spend the majority of an ML project
time exploring, analyzing, and transforming the data. Given the time spent on
the data, there is a requirement (just like code) to be able to try out different
data variations and roll back to an earlier version, capabilities that are provided
by a versioning system. This underscores the need for data versioning.
Data Versioning
As mentioned above, the time spent on analyzing and trying out different things
with the data needs to be systematic with a lineage. That is provided by a data
versioning system. Typically a data versioning system provides capabilities such
as
1. Origin determination so that you can always go back to the raw data as
it was collected before any custom modifications.
2. Version tracking so that you know which changes/transformations are in
the data that you are using.
3. Rollback so that you have access to previous versions that you can use
when iterating on model building.
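As a rough illustration of these three capabilities, the minimal sketch below hashes a data file for lineage, copies it into a content-addressed store for rollback, and records each version in a local JSON registry. The registry and store paths, and the two helper functions, are hypothetical names for this illustration; in practice a dedicated tool such as DVC, Delta Lake, or lakeFS provides these capabilities out of the box.

```python
import hashlib
import json
import shutil
from pathlib import Path

REGISTRY = Path("data_versions.json")   # hypothetical local registry file
STORE = Path("data_store")              # hypothetical content-addressed store

def register_version(data_file: str, note: str = "") -> str:
    """Snapshot a data file: hash it (origin/lineage), copy it into the store
    (rollback), and record the version in a JSON registry (version tracking)."""
    digest = hashlib.sha256(Path(data_file).read_bytes()).hexdigest()
    STORE.mkdir(exist_ok=True)
    shutil.copy(data_file, STORE / digest)
    registry = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else []
    registry.append({"version": len(registry) + 1, "sha256": digest,
                     "source": data_file, "note": note})
    REGISTRY.write_text(json.dumps(registry, indent=2))
    return digest

def rollback(version: int, target: str) -> None:
    """Restore an earlier snapshot so an experiment can reuse it."""
    registry = json.loads(REGISTRY.read_text())
    shutil.copy(STORE / registry[version - 1]["sha256"], target)
```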
the data is expected to deliver different models with different parameter values.
So the data is the primary driver toward parameters, as we see in detail in the
next section.
optimization techniques such as gradient descent. This is referred to as software
stack 2.0.
In contrast, software stack 1.0 is deterministic coding where you have functions with parameter values written in the code and the function pre- and post-conditions are clear. With software 2.0, parameter values are dependent on optimization techniques driven by the data, so the code behavior is probabilistic, as demonstrated in Figure 5.4.
Data Governance
Given the importance of data in software 2.0, data governance is a critical
component in any enterprise’s data management strategy. Data governance
covers the availability, validation, usability, privacy, and security of the data.
Security and privacy of the data are covered in the next sections of this chapter.
The first part, which includes data sourcing, validation, usability, and transformations, is documented using Datasheets for Datasets [4] (it includes some security and privacy as well). The datasheet approach addresses the needs of
both the data creators and the data consumers, enables communication between
them, and promotes transparency, accountability, and reuse of data.
The datasheet covers the following set of questions (summary) with information
on the dataset -
1. Motivation – why was this dataset created and by whom? Who funded
the project?
2. Composition – what is in the dataset in terms of features? Are there
any recommended train/test splits? Is the data dependent on any other
(external) data?
3. Collection Process – how was the dataset collected and over what time-
frame? If individuals are involved, were they notified about the data
collection, and did they consent to the use of their data?
4. Preprocessing/Cleaning/Labeling – what transformations have been done
to the dataset (related to data versioning discussed previously)? Is the
ETL tool/software used for the transformation available?
5. Uses – what are the intended and not-intended use cases for the dataset?
Has the data been used already for any project?
6. Distribution – how to distribute and use the dataset from IP/legalities
perspective? When will the data be distributed?
7. Maintenance – who supports the dataset and how is it maintained? When will the data be updated?
As you can see, data availability, validation, and usability are covered in the
datasheet. For details on any of the questions above or for example use cases
of the datasheet, please refer to the paper. Now that the data is ready to be
used, you need to verify that there are no privacy violations and that it is stored
securely. In the next section, we discuss data security.
Data Security
Data security is the prevention of data breaches, unauthorized access to data,
and information leaks. At the enterprise level, regulations such as the EU GDPR
[5] (General Data Protection Regulation) encourage enterprises to implement
system-level security protocols such as encryption, incident management, and
network integrity to protect data. Common techniques include access control
using security keys, access monitoring and logging, and data encryption during
transit.
At the individual level, in addition to enterprise security protocols, there are
regulations such as the California Consumer Privacy Act [6] that protect per-
sonal data privacy during the usage, retention, and sharing of data. Outside of
the regulatory framework, there are ways to enforce data privacy as we see in
the next section.
Data Privacy
Data privacy pertains to data that contains an individual’s personally identifi-
able information (PII) and general sensitive data such as healthcare and financial
data. It is also applicable to quasi-identifying data that can uniquely identify an
individual when merging data from different sources such as credit card transac-
tions and cell phone locations. In this section, we focus on techniques to impose
data privacy.
Federated Learning
Federated learning is a model training technique that preserves data privacy because it does not require any data sharing during model training. The raw data (which may contain sensitive information) is kept in its original location. An algorithm that
needs to use different data from different locations is sent to each location (local
server) for local model building. The model is trained locally and the model
parameters are sent to a centralized server to integrate into the global model.
A popular technique for parameter value integration is averaging the different
local values. As illustrated in Figure 5.5, the global model is in a centralized
global server and is sent to the local servers post-parameter update.
In this methodology instead of sending all the data from different locations to
the model, you send the model to different locations where the data resides.
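To make the parameter-averaging step concrete, here is a minimal NumPy sketch of a federated round. The `local_train` function is a stand-in for whatever training each site performs locally, and the synthetic site data is purely illustrative; production systems would also weight the averages by site size and secure the parameter exchange.

```python
import numpy as np

def local_train(global_weights, local_data, lr=0.01, epochs=5):
    """Stand-in for local training at one site: start from the global weights
    and return locally updated weights. The raw data never leaves the site."""
    w = global_weights.copy()
    X, y = local_data
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)   # gradient step for squared error
    return w

def federated_round(global_weights, sites):
    """One round: send the model out, train locally, average the parameters."""
    local_weights = [local_train(global_weights, data) for data in sites]
    return np.mean(local_weights, axis=0)      # simple averaging (FedAvg-style)

# Three sites, each holding private (X, y) data of equal size
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(100, 5)), rng.normal(size=100)) for _ in range(3)]
weights = np.zeros(5)
for _ in range(20):
    weights = federated_round(weights, sites)
```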
Differential Privacy (DP)
In the previous section we saw how privacy is preserved by not sharing the data but keeping it local and sharing only model parameter values. However,
this is not possible for many ML projects where data is sourced from different
locations and data sharing is imperative. Differential privacy (DP) is a math-
ematical definition of privacy that has shown the most promise in research to
preserve privacy during data sharing. It balances learning nothing about an
individual (from the data) and learning useful information about a population.
It is achieved by adding (some) noise or randomness to data transformations
such that conclusions made from the data are independent of any specific data
point.
PyTorch implements DP in a Python package called Opacus [8]. TensorFlow's DP implementation uses DP-Stochastic Gradient Descent (DP-SGD). Both clip gradients and then add random noise to them. Note that there is a privacy vs. accuracy trade-off. As
shown in Figure 5.6 with an increase in privacy (go down the y-axis to lower
vulnerability), accuracy goes down. As with most trade-offs, there is a sweet
spot after which the drop in accuracy does not justify the increase in privacy
guarantees.
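As a rough illustration of the clip-then-add-noise idea (a sketch of the concept, not the actual Opacus or TensorFlow implementation), the snippet below clips each per-example gradient to a maximum norm and adds Gaussian noise calibrated to that bound before averaging; the noise multiplier controls where you sit on the privacy/accuracy trade-off.

```python
import numpy as np

def dp_average_gradient(per_example_grads, max_norm=1.0, noise_multiplier=1.0, seed=0):
    """Clip each example's gradient to max_norm, then add Gaussian noise
    calibrated to the clipping bound before averaging (the DP-SGD idea)."""
    rng = np.random.default_rng(seed)
    clipped = [g * min(1.0, max_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * max_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Example: 32 per-example gradients for a model with 10 parameters
grads = np.random.default_rng(1).normal(size=(32, 10))
private_grad = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1)
```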
Synthetic Data
Another way to address data privacy is to use synthetic data for model build-
ing. In this case, the privacy concerns are based on how the synthetic data is
generated -
1. Using real data - data privacy for the real data is guaranteed through
techniques such as differential privacy and then that data is used to gen-
erate additional synthetic data using ML models. For example, generate
synthetic credit card transaction data using a Generative Adversarial Net-
work (GAN) and a small real dataset.
2. Without any real data - there are no data privacy issues as the entire
dataset is generated with synthetic data developed with simulated models
and using knowledge from subject matter experts. For example, generate
healthcare claims data where the features or the data fields are standard
and their types and values are created using Python libraries such as Faker
[9] as determined by subject matter experts.
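As a small illustration of the second approach, the sketch below uses Faker to generate a purely synthetic, claims-like table. The field names, procedure codes, and value ranges are made up for illustration; in practice they would be defined by subject matter experts.

```python
import random
from faker import Faker

fake = Faker()
Faker.seed(42)
random.seed(42)

def synthetic_claims(n=5):
    """Generate purely synthetic, privacy-free healthcare-claim-like rows."""
    return [{
        "claim_id": fake.uuid4(),
        "member_name": fake.name(),
        "service_date": fake.date_between(start_date="-1y", end_date="today").isoformat(),
        "provider_city": fake.city(),
        "procedure_code": random.choice(["99213", "99214", "80053"]),  # illustrative codes
        "billed_amount": round(random.uniform(50, 2000), 2),
    } for _ in range(n)]

for row in synthetic_claims(3):
    print(row)
```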
Summary
In this chapter, we understood the importance of data not just from a data governance perspective, but also from a model-building perspective in software 2.0. In
addition to data versioning, we also discussed the need to preserve privacy and
security. Now that the data is ready for use in ML model building, we look at
feature development and feature store in the next chapter.
[1] https://pypi.org/project/autoimpute/
[2] Andrej Karpathy, Software 2.0, https://karpathy.medium.com/software-2-0-
a64152b37c35
[3] https://www.kdnuggets.com/2020/12/mlops-why-required-what-is.html
[4] Timnit Gebru et al., Datasheets for Datasets, arXiv:1803.09010, Dec 2021.
[5] General Data Protection Regulation (GDPR) – Official Legal Text (gdpr-info.eu)
[6] Home - California Consumer Privacy Act
[7] https://dataskeptic.com/blog/episodes/2020/differential-privacy-at-the-us-
census
[8] https://github.com/pytorch/opacus
[9] https://faker.readthedocs.io/en/master/
• Understand the motivation of a feature store
• Appreciate how a feature store promotes reusability to develop quick ro-
bust ML models
• View important components of a feature store including automated feature
engineering
• Discover the benefits and challenges of a feature store
We start this chapter with understanding reusability in the model development
part of the ML lifecycle.
Figure 6.1: ML Pipeline Components from Data ingestion to Model Training
Therefore feature engineering holds the promise of reusability among the differ-
ent components. Given that is the case, one way to promote this concept is to
construct and store the features one time, and reuse those features many times.
That is what a feature store does.
Figure 6.2: Feature store data flow for ML model from data ingestion to serving
both training and inference
3. Versioning - need to version features so that different modifications can be
made to assess their impact on ML model training and you can roll back
to earlier versions if required. Akin to data versioning that is discussed in
Chapter 5.
4. Catalog - need to make the features discoverable by indexing the features
with metadata so that other ML applications know what is available. An
example of this discovery is using a natural language-based query retrieval.
This is often done by the Feature Registry. Furthermore, feature stores
can enhance the visibility of existing features by recommending possible
features based on a query of certain attributes and metadata. This em-
powers not-very-experienced data scientists with insights associated with
experienced senior data scientists, thus enhancing the efficiency of the
data science team.
5. Serving - need to support both training (batch mode) and inference (such
as streaming) modes.
6. Governance - need governance (to manage the features and enable reuse)
such as
7. Access - control to decide who gets access to work on which features.
8. Ownership - Identify feature ownership i.e. responsibility for maintaining
and updating features.
9. Regulatory Audit - check for bias/ethics and comply with compliance reg-
ulations to ensure that the developed features are not in violation.
10. Lineage - maintain data source lineage for transparency.
11. Monitoring - need to monitor the incoming data used to create features
and detect any data drifts. We discuss monitoring in Chapter 12.
As indicated in #1 Transformation, an important part of a feature store is au-
tomated feature engineering. There are different ways this can be implemented,
as outlined in the next section.
coefficient matrix is not over-determined and is invertible.
- One-hot encoding - encode each level as a vector where, if there are K levels, the vector is of size K.
- Feature vectors - use vectors to encode each level (vectors are usually of size < K if there are K levels) where the distance (such as Euclidean) between the vectors is semantically determined. Thus semantically similar levels have vectors close to each other by the distance measure. For example, encoding colors with feature vectors will place the blue and azure vectors close to each other in distance.
- The aggregations are the count/frequency of a specific level, i.e. the number of times a level occurs in a given period.
- Categorical ordered (aka ordinal): numbering - assign numbers in ascending or descending order. For example, if an alert has low, medium, and high levels, the corresponding encoding may be 0, 1, or 2 to indicate the order of criticality. The aggregations are the same as for unordered categoricals.
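As a quick illustration of the one-hot, ordinal, and aggregation transformations above, here is a small sketch with pandas and scikit-learn; the column names and levels are toy placeholders. Feature (embedding) vectors with semantic distances would come from a learned embedding rather than from these simple encoders.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"color": ["blue", "azure", "red", "blue"],
                   "alert": ["low", "high", "medium", "low"]})

# One-hot: unordered categorical -> K indicator columns for K levels
color_onehot = pd.get_dummies(df["color"], prefix="color")

# Ordinal: ordered categorical -> 0, 1, 2 reflecting the order of criticality
ordinal = OrdinalEncoder(categories=[["low", "medium", "high"]])
df["alert_code"] = ordinal.fit_transform(df[["alert"]])

# Aggregation: count/frequency of each level across the data
alert_counts = df["alert"].value_counts()

print(color_onehot)
print(df)
print(alert_counts)
```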
Next, we outline the benefits and challenges of a feature store and list 3 popular
open-source feature stores.
1. An additional component to build and maintain - Feature store is another
component in your ML pipeline that you need to build and maintain.
2. Risk of training-serving skew - Feature store is used for both training and
inference, each with different latency requirements. This often results in
different architectures for the two pipelines. Therefore there is always a
risk of training-serving skew where the data transformations/aggregations
are different between training and inference (discussed in Chapter 8).
3. Risk of becoming a feature swamp - adequate governance is required so that a feature store does not become a feature swamp, a dumping ground for features that are never completed and/or never used.
4. Continuous monitoring - the developed features need continuous monitoring to detect any unexpected drift or gaps (Chapter 12).
Open-Source Feature Stores
In this section we introduce 3 open-source feature stores -
1. Feature tools ( https://www.featuretools.com/ ) - an open-source frame-
work for automated feature engineering.
2. Feast ( https://feast.dev/ ) - open-source feature store with support for data located in the cloud and on-premise.
3. Hopsworks Feature Store ( https://www.hopsworks.ai/ ) - open-source feature store that can ingest data from the cloud and on-premise and is a component of the Hopsworks ML platform (which supports ML model training and serving).
There is a list of commercially available feature stores at https://www.featurestore.org/
.
Summary
In this chapter we looked at the motivations and structure of a feature store and
how they are beneficial to ML model development. We also outlined automated
feature engineering and challenges that arise with a feature store and listed
open-source and commercial feature stores that are available in the industry. In
the next chapter, we use these features to build ML models.
[1] https://towardsdatascience.com/mlops-building-a-feature-store-here-are-
the-top-things-to-keep-in-mind-d0f68d9794c6
[2] https://en.wikipedia.org/wiki/Power_transform
[3] https://en.wikipedia.org/wiki/Odds_ratio
7 - Building Machine Learning Models
In this seventh chapter we look at what is involved in building machine learning
models after the data is ready for ingestion (for example from a feature store
discussed in Chapter 6). By the end of this chapter, you will be able to:
• Understand what is meant by algorithm versioning
• Define what is AutoML and its purpose
• Describe an ML model using a model card
We start this chapter with versioning an ML algorithm.
ML Algorithm Versioning
You want to version ML algorithms for repeatability and governance, similar
reasons to data versioning (Chapter 5). But you may be thinking, what do I
need to version when I am using say an algorithm from scikit-learn? To start
with, you may have different candidate algorithms that you want to try, each
corresponding to a different algorithm version.
before training/learning [1] . For example, you can create different structures
of an algorithm listed in Table 7.1 with different hyperparameters. You can use
different versions of the random forest algorithm with different tree depths, or
use different versions of a neural network with different learning rates for the
retail sales prediction example (Figure 7.2).
So you can have combinatorial versioning of algorithms (Figure 7.1) with hy-
perparameters (Figure 7.2). In Chapter 8 we see how this works with data
versioning in ML pipelines.
| Algorithm | Hyperparameters |
| --- | --- |
| Linear Regression | Regularization parameter, … |
| Random forest | Number of estimators |
| Support Vector Machine | C and gamma |
| Neural network | Learning rate, … |
Figure 7.2: Versions with the same algorithm and different hyperparameters
As demonstrated, algorithm versioning helps you keep track of all the moving
parts in ML model building. Note that these are just sample algorithms that
you can use for the retail sales prediction example. There are multiple other
algorithm options including ensembling the aforementioned algorithms. Deter-
mining what is the best algorithm to use necessitates building models with each
and evaluating the model metrics. Sometimes the challenge may be that you
cannot train a model continuously from start to finish for whatever reason be
it time, hardware availability, data, or cost. At the same time, you do not want
to lose the training you have done till then. This is where model checkpointing
comes in handy.
Model Checkpointing
As outlined above, a machine learning model training may be paused for many
reasons, some decided by you such as suspend now and pick it up later to free
up computation resources, others not decided by you such as out-of-memory
or execution crash. Irrespective of the reason, you would want to restart the
training from the last completed point as opposed to restarting from the begin-
ning. This is what model checkpointing allows as demonstrated in Figure 7.3.
Checkpoints A, B and C save the state of a model during training and validation.
Checkpoints B and C pick up from the saved model state in checkpoints A and B, respectively, with each region indicated differently in the loss curves.
Model checkpointing is similar to model save but with a subtle difference. In the
former, you have the model source code so you are only saving the parameter
values. In the latter, you are serializing the complete model - parameter values
and model architecture. ML libraries such as TensorFlow [2] and PyTorch [3]
have APIs for model checkpointing.
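A minimal PyTorch-style sketch is shown below; the model, optimizer, and file path are stand-ins for whatever you are training. Saving the optimizer state along with the epoch is what lets training resume from the last completed point rather than from the beginning.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                   # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def save_checkpoint(path, epoch, loss):
    """Save enough state to resume training from the last completed point."""
    torch.save({"epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
                "loss": loss}, path)

def load_checkpoint(path):
    """Restore model and optimizer state and return where training stopped."""
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"], checkpoint["loss"]

save_checkpoint("checkpoint_A.pt", epoch=5, loss=0.42)
start_epoch, last_loss = load_checkpoint("checkpoint_A.pt")
```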
Armed with algorithm versioning, hyperparameter versioning and model check-
pointing, now you are ready to make a serious impact on a business project.
However, manually determining which combination gives the best result (as evaluated by a business metric) is cumbersome. Automated machine learning (AutoML) automates this search: it builds models with multiple combinations of different algorithms and different hyperparameters and ranks them based on a user-defined model evaluation metric.
Figure 7.3: Model checkpointing during training
with some upskilling.
• Experienced data scientists use AutoML to narrow down the algorithm
and hyperparameter possibilities, including neural network architectures.
And then develop or fine-tune the models themselves. The benefit is that
trained data scientists with math and algorithm backgrounds use AutoML
as an efficiency tool. In the next section we discuss using AutoML to search
neural network architectures.
AutoML Open-source Libraries
Below are examples of open-source AutoML libraries -
• AutoWEKA ( https://www.automl.org/automl/autoweka/ ) - oldest li-
brary released in 2013.
• Auto-sklearn ( https://automl.github.io/auto-sklearn/master/ ) – for ML
models
• Auto Keras ( https://autokeras.com/ ) – for DL models
• TPOT ( https://epistasislab.github.io/tpot/ )
• H2O AutoML ( https://www.h2o.ai/products/h2o-automl/ )
• Ludwig ( https://eng.uber.com/introducing-ludwig/ )
• FLAML ( https://microsoft.github.io/FLAML/ ) – Fast and Lightweight
AutoML
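As a flavor of how little code an AutoML run takes, here is a hedged sketch using FLAML on a synthetic regression problem; the time budget, metric, and dataset are arbitrary choices for illustration.

```python
from flaml import AutoML
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoML()
# Searches over algorithms and hyperparameters within the time budget
automl.fit(X_train=X_train, y_train=y_train,
           task="regression", metric="rmse", time_budget=60)

print(automl.best_estimator)   # which algorithm won the search
print(automl.best_config)      # its hyperparameters
predictions = automl.predict(X_test)
```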
Figure 7.5: ML model development from manual to AutoML
Closely associated with the concepts of AutoML, semi-auto ML and manual
ML is how much code you will have to write when using them. As indicated in
Figure 7.5, this introduces the associated concepts of low-code or no-code that
differentiate from full-code -
Full-code – when you write code for an ML model (for example using libraries
such as scikit-learn, TensorFlow, or PyTorch). Associated with manual ML.
No-code – when you use graphical tools such as drag-and-drop components on
a workspace and connect them with arrows to build a system without writing
any code. Associated with semi-auto ML (no fine-tuning, just use the pre-built
ML model as-is) and AutoML (use the automatically built ML model as-is).
Low-code – when you use similar graphical tools but may have to update settings
in a configuration file or update parameters in a code file or write some basic
scripts. Also associated with both semi-auto ML (update scripts to fine-tune
and train the pre-built ML model with your dataset) and AutoML (update
scripts to change hyperparameters, if allowed by the AutoML framework).
Model Card
Just like a nutrition facts label tells you every detail about a product - how it was made, its constituents, etc. - a model card [6] (introduced by Google) tells you everything about the model. Specifically, a model card outlines
1. Model details - basic information about when the model was built, version,
license details etc.
2. Intended use - what are the different use-cases that the model was built
for and the out-of-scope use-cases.
3. Factors - what are the relevant factors used to build the model.
4. Metrics - what are the performance measures to evaluate model impact
5. Evaluation data - what evaluation dataset (and preprocessing, if any) was
used to analyze the model.
6. Training data - what training dataset (and preprocessing, if any) was used
to build the model.
7. Quantitative analysis - what are the results of model analysis.
8. Ethical considerations - what ethical aspects (bias and fairness, discussed in Chapter 13) were taken care of to build the model.
9. Caveats and recommendations - warnings and recommendations on using
the model.
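There is no single required format for these sections; the sketch below simply records them as a dictionary saved alongside the model artifact, with every field value invented for illustration. Tooling such as Google's Model Card Toolkit or an MLOps platform can generate richer, templated cards.

```python
import json

model_card = {
    "model_details": {"name": "retail-sales-forecast", "version": "1.3.0",
                      "date": "2023-05-01", "license": "MIT"},
    "intended_use": {"in_scope": ["weekly store-level sales forecasting"],
                     "out_of_scope": ["individual customer predictions"]},
    "factors": ["store size", "region", "seasonality"],
    "metrics": {"rmse": 12.4, "mape": 0.08},
    "evaluation_data": "holdout weeks from the last quarter of the sales history",
    "training_data": "three years of sales history, outliers winsorized",
    "quantitative_analysis": "error is higher for newly opened stores",
    "ethical_considerations": "no personally identifiable information used",
    "caveats_and_recommendations": "retrain quarterly; do not extrapolate to new regions",
}

# Store the card next to the model so it travels with the artifact
with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```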
As you can see a model card details everything pertinent and important there
is to know about a model. Now that you have a ML model documented with a
model card, you need to ensure that there is model governance to keep track of
which model is deployed to or updated in production.
Model Governance
Given the possible Cambrian explosion of data versions, hyperparameters, and
algorithm versions that constitute a specific model, it is imperative that en-
terprises require strong ML model governance to keep track of which model is
currently in production and who authorized the last production release. This is
managed using approval workflows.
Approval Workflows are processes designed to ensure the authorized flow of in-
formation and/or artifacts in organizations. Such workflows are very familiar in
everyday enterprise activities such as applications for leave, expense reimburse-
ments, equipment allocation, and the like. There are industry-standard best
practices for these workflows, which each organization modifies to suit its needs
and culture. They have also become commonplace in the world of software
development, for example,
• A developer issues a “Pull Request” (PR) to a Tech Lead / Manager. The
code is reviewed and corrected before being merged into the release branch
of the code repository.
• Managers review unit test/integration test reports before releasing code
for the QA team to test. Additional tests may be required before the code
is deemed ready for QA.
• A Release Manager reviews QA reports before releasing software to a
production system.
Figure 7.6: Model approval workflow based on different roles and responsibilities
Approval Workflows are based on the concepts of Roles and Requests. A work-
flow typically defines a Request (to be performed by a specific Role), and mul-
tiple levels of approvals to be
provided by other Roles. In the first example provided above, the Developer
Role made a Pull Request to be approved by a Manager Role.
In addition, an ideal workflow should be customizable to suit the needs of the
organization and should maintain a log of requests and approvals (by whom and
when), thus providing a complete audit trail.
Let us follow the workflow in Fig. 7.6. Assume a data scientist completes an
ML model. Next, a manager reviews the model and provides feedback. Once
the model is approved by the manager, it is promoted for validation and testing
in quality assurance (QA). In QA, a data analyst (or any QA specialist) tests
the model. If the model has issues, then the data analyst provides feedback
to the data scientist, the model is rebuilt, and the cycle is repeated. Once the
model passes the QA test, the data analyst approves and promotes the model to
production. In production, an IT engineer deploys the model. All the workflow
roles, responsibilities, and subsequent actions are managed, labeled, identified,
and controlled using approval workflows. Such workflows are often managed by
MLOps platforms.
In conclusion, the demonstrated approval workflow maintains a complete log of
approvals (by whom and when) and promotions (dev to QA and QA to pro-
duction) for a production ML model. Such model governance improves the ML
production lifecycle with streamlined processes and accountability in a collabo-
rative environment.
Summary
In this chapter we understand algorithm versioning and discuss AutoML. We
also look at model documentation using a model card and no/low/full code ML
model development. Lastly, we discuss model governance given the different
data versions, hyperparameters, and algorithm versions. In the next chapter,
we discuss ML pipelines that are the backbone of ML experimentation and
model building.
Phases of Experimentation
The iterative nature of ML model building was outlined in the last chapter
where you managed data versions for different experiments. Experimentation
has 3 different phases that form an experimental wheel as defined by Stefan
Thomke [1]:
1. Generate verifiable hypothesis - this is where you have a hypothesis that
using data version 1.1.1 is better than version 4.0 (Figure 5.2) because
including outliers makes the model robust.
2. Run disciplined experiments - this is where you have a platform to run experiments where you build an ML model using different data versions (Figure 5.2) and different algorithm and hyperparameter versions (Figures 7.1 and 7.2) and compare the metrics.
3. Learn Meaningful Insights - use the feedback from the metrics comparison to derive insights, such as that imputation is not important but outlier removal is, and use these insights to generate the next verifiable hypotheses (for example, to use version 2.1.1 in Figure 5.2).
The ability to run the above experimentation wheel is enabled by ML pipelines.
And the knobs to change for an ML experiment span data versioning, hyperparameter versioning, and algorithm/pipeline versioning [2] (Figure 8.1). For
example, you may want to experiment with the same data version in the same
ML pipeline but with different hyperparameters. Another experiment would
be using the same data version with the same algorithm hyperparameter but
different pipeline configurations such as different data mappings or mash-ups
with external data. A third configuration may use the same data version with
different algorithms. As outlined in Chapter 4 on the desiderata of an MLOps
platform, team collaboration with different members running different experi-
ments (some of which will fail fantastically like M2 and not all successful ones
will perform like M1) is important to find that winning model combination.
ML Pipeline
An ML pipeline is a workflow to run the experiments outlined in the previous section,
starting with data ingestion and ending with model delivery. It includes all the
steps necessary to train a model from DataOps and ModelOps in Figure 4.2 such
as data validation, data preprocessing, data mappings (including any external
data mash-ups), model training, model evaluation, and model delivery (ready
for production deployment). Note that this includes defining an ML pipeline
for AutoML discussed in Chapter 7. A workflow example is given in Figure 8.2
-
1. Use a hypothesis to connect to a specific data version.
2. [Optional] Perform additional mapping/mash-ups on the data.
3. [Optional] Generate features - each new feature corresponds to a new
pipeline version.
4. Choose an algorithm and specific hyperparameters. Note that changing
hyperparameters corresponds to a new hyperparameter version.
5. Train an ML model (can be manual or Semi-Auto ML or AutoML from
Chapter 7).
6. Calculate model metrics (training error, if possible out-of-sample valida-
tion error).
7. Deliver the trained model ready to be deployed if needed (to be determined
based on the model metrics) - this completes the experiment.
Figure 8.3: Directed Acyclic Graph (DAG) used in ML pipelines
ML Pipeline Advantages
The technical advantages of ML pipelines are -
1. Independent steps so multiple team members can collaborate on the
pipeline simultaneously.
2. Start from any specific point - Re-run does not have to always start at the
beginning, you can choose a midpoint skipping steps not needed.
3. Reusability promoted by pipeline templates for specific scenarios that al-
ways follow the same sequence - for example, you can create a copy of
a pipeline and reuse for another experiment (i.e. use another data ver-
sion and/or another set of hyperparameters and/or another data map-
ping/external data mash-up).
4. Tracking and versioning of data, model and metrics.
5. Modular development with isolated changes to independent components
that enable faster development time.
6. Visual representation of components / functionalities deployed to produc-
tion.
The primary business advantage provided by ML pipelines is cost reduction due to -
1. Multiple quick iterations via different experiments that give more development time for novel models – something that data scientists love to do = better job satisfaction + retention.
2. Standard processes to update existing models.
3. Systematic methodology to reproduce results.
4. Increased data governance with data and model audit – create a paper trail.
Inference Pipelines
So far in this chapter we have discussed training ML pipelines where you run
different experiments to create the best (from your metrics-perspective) ML
model and deliver it ready for deployment. That model is then deployed to
production and is ready for inferencing. Here you have to be diligent so that
your inference pipeline includes all the data mapping(s) and mash-up(s) and
feature engineering that were done on the raw incoming data in the training
pipeline. As illustrated in Figure 8.4, all the steps prior to model run have to
be identical so that there are no differences between the data used to train the
algorithm to generate the deployed model, and the data used by the same model
for inferencing.
If there is a difference in the data sent to the model then your pipelines have
a Training-Serving skew. Be careful to avoid this skew. Otherwise you violate
one of the basic assumptions in machine learning that the training and inference
data are from the same data distribution.
Note that training and inference pipelines are complementary. The training
pipeline’s primary purpose is to evaluate models and deliver the one with the
best metrics. The inference pipeline’s primary purpose is to generate results
from the model and monitor the model for any drift (more on this in Chapter
12). More on this result generation and feedback for the ML model in the next
section.
Figure 8.4: Complementary Train and Inference ML Pipelines
there is new training data available to deliver an updated model. A recommender system is a good example of collecting training data using a feedback loop - when a user clicks on a recommendation, that click is used as positive (implied) feedback.
In the next section, we outline three popular open-source pipelines for use with
your ML code.
1. Apache Airflow ( https://airflow.apache.org/ ) - originated at Airbnb, this is a workflow engine that manages scheduling and running jobs and data pipelines.
2. Kubeflow Pipelines ( https://www.kubeflow.org/ ) - started as an open-source Google project to manage TensorFlow jobs on Kubernetes.
3. MLflow ( https://mlflow.org/ ) - provides a framework to track experiment
metrics and parameters and visualize and compare them in a browser.
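As an example of the tracking piece, here is a hedged MLflow sketch that logs the hyperparameters, a metric, and the delivered model for one experiment run; the model, parameter names, and metric are arbitrary placeholders.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run(run_name="rf-depth-5"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestRegressor(**params, random_state=0).fit(X_train, y_train)

    mlflow.log_params(params)                              # hyperparameter version
    mlflow.log_metric("test_mse",
                      mean_squared_error(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")               # delivered model artifact
```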
Summary
In this chapter we looked at the motivations behind ML pipelines and the ad-
vantages of using them both from technical and business perspectives. We also
outlined continuous training that extends the DevOps CI/CD methodology to
CT using a data flywheel. Lastly, we present 3 popular open source implemen-
tations for ML pipelines. In the next chapter, we look at model interpretability
and explainability.
[1] Thomke Stefan H., Experimentation Works, Harvard Business Review, 2020
[2] How MLOps Helps CIOs Reduce Operations Risk (xpresso.ai)
[3] J. Collins, Good to Great: Why Some Companies Make the Leap and Others
Don’t, HarperBusiness, 2001.
ML Model Interpretability
Consider that you have built an ML model that outputs the probability of a
patient developing diabetes after 6 months and before 12 months. The inputs
to the model are a variety of healthcare data such as electronic health records,
lab results, number of visits to the emergency department, among others. The
objective is to rank patients from high to low probabilities and determine which
high-probability (high-risk) patients should be contacted for medical interven-
tion/treatment.
The output of the ML model is 0 to 1, with 0 indicating no risk and 1 a certainty
of diabetes. From a business standpoint, you have to determine the threshold
above which you are going to medically intervene and reach out to patients. This
threshold is a function of your organization’s capacity to manage the number
of interventions in a given period, the intensity of the interventions, and the
history of health outcomes of patients with such probability modeling. In other
words, multiple business factors influence the threshold and how you interpret
the output of the ML model. For example, a large hospital group with sizable
resources will have a lower threshold for intervention than a small hospital with
limited resources with a higher threshold. Note that the ML model workings
remain unchanged as you alter your interpretation of the model output based
on external factors.
Consider another example where you build a model to determine the probability that a bank customer asking for a loan will default. Based on this probability you
will determine whether to approve the loan or not (Figure 9.1). The output of
the ML model is 0 to 1 with 0 indicating no risk and 1 a certainty of loan default.
Again, you have to determine how you interpret the output of the ML model
for loan approval. Possibly in 2021, you would have a greater appetite to give
loans, and depending on your bank budget your threshold may be low. In 2022,
given the economic conditions and interest rates, your threshold presumably
will be higher. As in the previous example, the inner structure of the ML model
remains unchanged as you change your threshold based on economic conditions.
ML Model Explainability
Now that you have determined how to interpret your ML model output, let
us take a look at why the ML model is giving a particular output. In other
words, we want to explain why in the first example the ML model is outputting
a particular diabetes probability for say John Doe and likewise in the second
example why the ML model is outputting a particular loan default probability
for say Mary Jane. This is ML model explainability and is key to understand-
ing the inner workings of an ML model. It is also important for prescriptive
analytics - to use the model to determine what action(s) to take to change a
possible outcome. For example, maybe John Doe has a high probability of dia-
betes because of obesity so a possible intervention might be walking 20 miles a
week. Mary Jane may have a low probability of default because she has taken
loans previously that she has successfully paid off, so the bank now has higher
confidence in her model output. A popular ML explainability library is SHAP
that we discuss next.
Shapley Explainability
A popular ML explainability library is SHapley Additive exPlanations (SHAP), developed by Scott Lundberg and Su-In Lee in 2017 [1]. SHAP [2] is based on Shapley values, which come from cooperative game theory. They are named after Lloyd Shapley, who introduced the concept in 1951 and was later awarded the Nobel Prize for it. SHAP also explains individual data instance predictions, adopting techniques from a method called Local Interpretable Model-agnostic Explanations (LIME) [3].
The SHAP framework considers prediction for a data instance as a game. The
difference between the actual instance prediction and the average prediction of
all instances is the gain or loss of an instance from playing the game. SHAP
treats each feature value of a data instance as a “player”, who works with each
other feature values for loss or gain (= the difference between the instance
predicted value and the average value). As different players (feature values)
contribute to the game differently, the Shapley value is the average marginal
contribution by each player (feature value) across all possible coalitions (data
instances).
Note that SHAP explains the output of any ML model (using shap.Explainer( )
or shap.KernelExplainer( ) APIs) including deep learning ( shap.DeepExplainer(
) ) NLP and Computer Vision models ( shap.GradientExplainer( ) ).
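A hedged sketch of the workflow for a tree model is shown below; the dataset is synthetic and the model is a placeholder, but the call pattern (build an explainer, compute SHAP values, plot a global summary) is the typical usage.

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)      # tree-specific explainer
shap_values = explainer.shap_values(X)     # per-instance, per-feature contributions

# Global view of feature impact across all data instances (as in Figure 9.2)
shap.summary_plot(shap_values, X)
```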
The Shapley value is the only attribution method that satisfies properties Effi-
ciency , Symmetry , Dummy, and Additivity , which together may be considered
a definition of a fair payout -
1. Efficiency - for a data instance, the sum of its feature contributions should
be the same as the difference in its prediction from the average.
2. Symmetry - two feature contributions are the same if they have equal
contributions to all possible coalitions.
3. Dummy - a feature has a value of 0 if it does not have any impact on the
predicted value.
4. Additivity - the combined value of a feature contributes to the average. For example, the Shapley value for a feature of a random forest algorithm is the average of the Shapley value of the feature across all trees.
Figure 9.2 is an example of a global view of the feature importance using SHAP
that calculates the marginal contribution of each feature across the data in-
stances (or coalitions as is called in cooperative game theory). In the example,
we have a model to predict the risk of End Stage Renal Disease (ESRD) for an
individual given the features listed on the y-axis. As seen in the diagram, the
x-axis represents the influence of a feature on the model outcome (whatever that
may be) with the vertical line indicating no influence. Points to the left (right)
of the line denote a decrease (increase) in the output with feature (aka player
in game theory) value change. Feature values are codified as low (blue) to high
(red). Some feature contributions are explainable. For example, low (blue) val-
ues of the diagnosis of hypertensive CKD decrease the output (since they are to
the left of the line), in other words, reduce the risk of ESRD. Likewise, high (red)
values of the same feature increase the output, i.e. higher risk of ESRD (since
they are to the right of the line). Some feature values are not clearly explainable.
For example, time since hypertension indicates that while a high feature value
(red) lowers the risk of ESRD, a low value (blue) can increase or decrease the
risk. In conclusion, such a global viewpoint clearly demonstrates the impact of
individual features on the predicted output, thereby enabling users to develop
an actionable plan.
In conclusion, Shapley values are important for ML model explainability that
is algorithm agnostic due to the cooperative game theory approach.
influence may not be clear - is John's HbA1c high because he is overweight, or is he overweight because of high HbA1c? This is answered with counterfactual analysis, which checks whether John would still be at a high risk of diabetes if he were not overweight. Causal inference analysis is done using observational data, for example with a Python package called DoWhy [4]. If the risk is lowered with
weight reduction, then causality is determined. Once causality is determined it
helps with explainability - a factor that contributes to John’s high risk of dia-
betes is weight. So to reduce the risk, reduce weight. To make this actionable,
an example intervention is to walk 20 miles a week (Figure 9.4).
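A rough sketch of the DoWhy workflow for such a question is below. The generated data, column names, and choice of estimation method are purely illustrative; a real analysis would use observational data and a causal graph supplied by domain experts.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Illustrative observational data: treatment, outcome, and one common cause
rng = np.random.default_rng(0)
age = rng.integers(30, 70, size=500)
overweight = (rng.random(500) < (age - 30) / 60).astype(int)
risk = 0.3 * overweight + 0.01 * (age - 30) + rng.normal(0, 0.1, 500)
df = pd.DataFrame({"overweight": overweight, "age": age, "diabetes_risk": risk})

model = CausalModel(data=df, treatment="overweight",
                    outcome="diabetes_risk", common_causes=["age"])
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)   # estimated causal effect of being overweight on the risk
```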
Summary
In this chapter we learned about techniques to interpret and explain ML models,
including causal inference and counterfactuals. In the next chapter, we look at
using containers to deploy ML models.
[1] Scott Lundberg and Su-In Lee, A Unified Approach to Interpreting Model
Predictions , NIPS 2017, https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-
Paper.pdf .
[2] https://github.com/slundberg/shap
[3] https://github.com/marcotcr/lime
[4] https://github.com/py-why/dowhy
Microservices and Docker containers
In this section, we will learn about the advantages of using microservices and
how to combine them with docker containers. We will first learn the pros of
using microservices.
method’s comprehensive approach to data governance. Compliance processes
benefit from microservices’ more precise approach.
• Multilingual technologies
Microservices enable developers to utilize diverse programming languages and technologies without compromising the software architecture. One developer may code app functionality in Java while another uses Python. This adaptability creates "technology-agnostic" teams.
Developers package and deploy microservices using Docker containers in private
and hybrid clouds. Microservices and cloud environments facilitate scalability
and speed-to-market.
One of the main benefits is that developers can access microservices from one cloud location, and cloud-based back-end modifications to one microservice don't affect other microservices.
We will now talk about Docker containers.
Figure 10.2: Docker diagram
Figure 10.2 explains how Docker works. It is divided into the following components:
- Images: serve as the foundation for containers.
- Containers: used to execute applications created from Docker images. We use docker run to construct a container. The docker ps command may be used to see a list of active containers.
- Docker Daemon: the host's background service in charge of creating, executing, and disseminating Docker containers. Clients communicate with this daemon.
- Docker Client: a command-line program that connects the user to the daemon. In a broader sense, there may be other types of clients as well, like Kitematic, which gives users a graphical user interface.
- Docker Hub: a repository (registry) for Docker images. The registry may be seen as a collection of all accessible Docker images. One may run their own Docker registries and utilize them to retrieve images if necessary.
Figure 10.3: High-level workflow for the Docker containerized application life
cycle
Developers start the inner-loop process by writing code. In the inner-loop stage, developers specify everything before sending code to the repository (for example, a source control system such as Git). A commit to the repository then triggers Continuous Integration (CI) and the rest of the procedure.
The inner loop includes “code,” “run,” “test,” and “debug,” plus actions before
executing the software locally. The developer uses Docker to run and test the
program. Next, we’ll outline the inner-loop process.
DevOps is more than a technology or toolset; it’s a philosophy that demands
cultural transformation. People, methods, and technologies speed up and fore-
cast application life cycles. Companies that embrace a containerized workflow
reorganize to fit their people and processes.
DevOps replaces error-prone manual procedures with automation, improving
traceability and repeatable workflows. With on-premises, cloud, and closely
integrated tools, organizations can manage environments more effectively and
save money.
Docker technologies are available at practically every step of your DevOps pro-
cess for Docker applications, from your development box during the inner loop
(code, run, debug) through the build-test-CI phase and the staging and produc-
tion environments.
Quality improvement helps uncover faults early in the development cycle, reduc-
ing repair costs. By putting the environment and dependencies in the image and
delivering the same image across many environments, you encourage removing
environment-specific settings, making deployments more dependable.
Rich data from effective instrumentation (monitoring and diagnostics) helps
guide future priorities and expenditures.
DevOps shouldn’t be a destination. It should be introduced slowly via ade-
quately scoped initiatives to show success, learn, and improve.
Kubernetes
Kubernetes was created in the Google lab to manage containerized applications
in a variety of settings, including physical, virtual, and cloud infrastructure. It
is an open-source technology that aids in the creation and administration of
application containerization. It can automate application deployment, applica-
tion scaling, and application container operations across clusters. It can build
infrastructure that is focused on containers. Let's learn why Kubernetes and DevOps are not two mutually exclusive worlds but rather a perfect pair.
code so they may discover bugs early. As DevOps gained popularity, teams cob-
bled together pipelines from various distinct platforms, requiring customization.
Adding a tool required rebuilding the pipeline, which was inefficient. They found
the solution of using containerization. Containers bundle an application or ser-
vice’s code and dependencies to operate in any software environment. By using
containers to execute microservices, enterprises can design flexible, portable
pipelines. This enabled them to add or alter tools without affecting the overall
process, providing seamless CI/CD pipelines.
As DevOps switched to containerized workloads, orchestration and scalability
issues arose. Kubernetes came then. Kubernetes automates the deployment,
scaling, and administration of containers, allowing companies to handle thou-
sands of containers. Kubernetes delivers resilience, reliability, and scalability to
DevOps initiatives.
Kubernetes' key DevOps characteristics are:
- Everything in Kubernetes may be built "as code," meaning access restrictions, databases, ports, etc. for tools and apps and environment parameters are declarative and saved in a source repository. This code may be versioned so teams can send configuration and infrastructure changes to Kubernetes.
- As enterprises adopt cloud-native initiatives, their teams need tools that can be smoothly integrated across platforms and frameworks. Kubernetes can flexibly orchestrate containers on-premises, at the edge, in the public or private cloud, or between cloud service providers (CSPs).
- Kubernetes' automatic deployment and update rollback capabilities let you deliver upgrades with no downtime. Tools and apps in a container-based CI/CD pipeline are split into microservices. Kubernetes updates individual containers without affecting other processes. Kubernetes makes it easy to test in production to catch flaws and vulnerabilities before deploying updates.
- Kubernetes distributes resources effectively and just as required, lowering overhead costs and optimizing server utilization. Kubernetes boosts development productivity by providing fast feedback and warnings about code, expediting issue fixes, and lowering time to market.
- Kubernetes may operate anywhere in a company's infrastructure. This enables teams to implement it without switching infrastructures.
Let’s now learn more in-depth about the features of Kubernetes.
can execute apps on the cloud. It facilitates the transition from host-centric to
container-centric infrastructure.
Kubernetes Nodes
The following Node server components are required for communication with the Kubernetes master.
Kubelet
This tiny node service relays information to and from the control plane. It reads configuration data from the etcd store and receives directives from the master component. The kubelet maintains the work state and the node server. Network rules, port forwarding, etc. are managed here.
Kubernetes Proxy
This proxy service operates on each node and enables remote hosts to access services. It forwards requests to the proper containers and does load balancing. It makes networking predictable, accessible and separated. It maintains node pods, volumes, secrets, and container health checks.
Figure 10.5: Kubernetes Master Machine components
Summary
In this chapter, we underlined the advantages of using microservices. We described how pairing containers with microservices shortens time to market for software. Then we scaled up by learning how to use Kubernetes. In the next chapter, we will learn how to test ML models.
11 - Testing ML Models
In this chapter, we will talk about testing. Testing ensures software quality and
it is a critical part of the DevOps/MLOps process. You will learn about the
many forms of functional testing and their functions in software deployment.
By using several technologies that are linked with version control software, we
will investigate test automation and unit testing in further detail. By the end of
this chapter you will be capable of doing the following:
- creating unit testing
- creating performance testing
- creating integration testing
- creating UI testing
We will start by introducing what testing is.
Introduction to testing
Software/Model testing checks, monitors, and ensures quality standards across
the whole development operations cycle. In DevOps/MLOps, we are actively
engaged in managing the testing procedures of new software/new model as it is
being built. Testing is essential to make sure that the software complies with
the business requirements established for the application and that it merges
successfully with the current code without altering it or interfering with its
dependencies. When checking if a model still performs as well or better than the
prior version, we set metrics that will allow us to sign off on new modifications.
This set of metrics is linked to the business:
- Confusion matrix: we separate the dataset into training and test datasets. The training dataset is used to train the model while the test dataset is put aside. Once the model is trained, we attempt to predict using the test dataset. After the predictions are divided into a matrix, we can observe how many of our model's predictions are accurate and how many of them are false: true-positive, true-negative, false-positive and false-negative.
- Accuracy: the closeness of the measured findings to the actual values. It reveals how well our classification model can forecast the classes specified in the problem statement.
- Recall/sensitivity/true positive rate: often utilized in situations where finding the true positives is crucial.
- Precision: often utilized in situations where it is crucial to avoid having a large number of false positives.
- Specificity: the true negative rate, which gauges the percentage of real negatives that are accurately classified as such.
- F1-score: gauges the accuracy of a binary classification test. To calculate the score, it takes into account the test's recall and precision.
- Any other metric which makes sense for the business. It could be the money made by a model, the cost saved by a model, or the risk forecasted by a model, etc.
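The standard classification metrics above are a few lines with scikit-learn; the labels and predictions below are placeholder values just to show the calls.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # labels from the held-out test dataset
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions (placeholder values)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))
print("recall     :", recall_score(y_true, y_pred))      # true positive rate
print("precision  :", precision_score(y_true, y_pred))
print("specificity:", tn / (tn + fp))                     # true negative rate
print("F1-score   :", f1_score(y_true, y_pred))
```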
We demonstrated that testing is an important step of code/model deployment. We will now talk in more depth about the various kinds of testing we will encounter.
Functional testing
To assure quality and value, we run tests at various phases of the software development process. Functional testing is carried out throughout this process to ensure that the program complies with the requirements.
Each function is put to the test by filling its arguments with values and comparing the result to what was expected. Applications are built and deployed with functional testing in place. Because of DevOps' cyclical nature, software functions are continuously monitored and evaluated, which leads to source code revisions in subsequent development cycles. Functional testing is crucial throughout this iterative process.
The DevOps process is shown in the chart below. The build and deployment phases of the delivery pipeline described in Figure 11.1 have already been covered in this book, and we can observe that testing is an important part of this workflow. We will now build a deeper understanding of each testing technique, starting with unit testing.
Unit testing
Unit testing exercises a program's whole code base to ensure that every function and component operates as intended. The foundation of unit testing is the examination of the smallest testable components, units, or blocks that make up the program. After assigning values to a function's arguments, a unit test validates the function's return value (a single output). It is an iterative best practice: the more often it is used during development, the more independent and modular the code base becomes. This means that unit testing makes it simple to test and, if required, repair each new function in a developer's branch. If unit tests are not performed throughout development, faults are carried along as branches are merged into new builds and developers keep working on top of them, so issues become more difficult to isolate and repair.
To put it another way, unit testing encourages modularity by dividing the software into smaller programs, each with a particular purpose. Coverage measures the proportion of functional units or blocks that are tested.
As an example of unit testing, we may create a function add(x, y) and call it with arguments so that we can compare the output to an anticipated value. This is essentially what a unit test run does.

def add(x, y):
    return x + y

Thus, calling add(6, 3) should return the number 9. The unit test reports an error if the returned value is different.
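A minimal sketch of how this check can be automated with Python's built-in unittest module follows; in a real project, add would live in its own module and be imported into the test file rather than defined next to the test:

import unittest

def add(x, y):
    return x + y

class TestAdd(unittest.TestCase):
    def test_add_returns_expected_sum(self):
        # Fails and reports an error if add(6, 3) does not return 9.
        self.assertEqual(add(6, 3), 9)

if __name__ == "__main__":
    unittest.main()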
To prevent any output variation that might invalidate the testing analysis, unit testing needs control over the input data used for testing. Therefore, a testing environment that is separate from the development process must be set up, and it is important not to depend on any external components. For instance, if the code relies on parameters fetched from a database, we don't want to use the real database in the unit tests. Instead, we create a mock database (which mimics the behavior of the database) and use this mock object in the unit test.
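As a minimal sketch of this idea, assuming a hypothetical scaled_add function that reads a parameter through a database object's get_parameter method, Python's unittest.mock can stand in for the real database:

from unittest.mock import MagicMock

# Hypothetical function under test: it reads a parameter from a database
# object and uses it in a computation.
def scaled_add(x, y, db):
    factor = db.get_parameter("scale")
    return (x + y) * factor

def test_scaled_add_with_mock_database():
    mock_db = MagicMock()
    # The mock mimics the real database by returning a fixed parameter value.
    mock_db.get_parameter.return_value = 2
    assert scaled_add(6, 3, mock_db) == 18
    mock_db.get_parameter.assert_called_once_with("scale")

test_scaled_add_with_mock_database()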
Now that we have learned how to create a unit test, we will get a higher-level view of how our software/model functions by talking about integration testing.
Integration Testing
Integration testing increases the complexity of testing by combining the modules that were examined in unit testing into larger groupings. One user request requires the cooperation of many software functional units, and these units must interact in a certain order to get the intended result. Integration testing makes sure these components work together as required. In other words, the integration technique checks whether the new modules are compatible with the code base's current modules.
Unlike unit testing, integration testing does not simulate a database with a mock: the integration approach has direct access to the real database. As a consequence, integration tests may not pass if they cannot access resources that were not needed for unit testing but are needed for integration testing.
For unit testing, we used the function add as an example. To keep building on that example for integration testing, we will use a matrix multiplication that relies on both the add function and a multiply function. Integration testing groups many more components together and gives a fuller view of how the system works across a few use cases; a minimal integration test for this example is sketched below.
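This is a minimal sketch, with add, multiply, and matrix_multiply as simple stand-ins for real modules; the integration test checks that the units cooperate correctly on a known use case:

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

def matrix_multiply(a, b):
    # Each output cell combines the two units: a sum of element-wise products.
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc = add(acc, multiply(a[i][k], b[k][j]))
            result[i][j] = acc
    return result

def test_matrix_multiply_integration():
    identity = [[1, 0], [0, 1]]
    m = [[2, 3], [4, 5]]
    # Multiplying by the identity matrix must give back the original matrix.
    assert matrix_multiply(m, identity) == m

test_matrix_multiply_integration()

Unlike a unit test, a real integration test would also exercise the external resources, such as the database, that the combined modules depend on.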
Regression Testing
Regression testing helps determine whether our program consistently delivers correct results as it is run repeatedly over time. A program has succeeded when all of the tests reveal that it is operating properly and according to its design.
To do this, we develop a golden performance reference that serves as a benchmark for how our program should function. Every time we execute the tests, we look for our software to perform at least as well as this ideal reference.
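As a minimal sketch of how such a golden reference could be enforced automatically, the following test compares the current model's metrics against stored benchmark values; the file name, tolerance, and evaluate_model placeholder are illustrative assumptions:

import json

# Hypothetical golden reference produced when the previous model was signed off.
GOLDEN_REFERENCE_FILE = "golden_metrics.json"
TOLERANCE = 0.01  # allowed degradation before the regression test fails

def evaluate_model():
    # Placeholder for the real evaluation pipeline; it would return the
    # business metrics discussed earlier (accuracy, F1-score, ...).
    return {"accuracy": 0.91, "f1": 0.88}

def test_no_regression_against_golden_reference():
    with open(GOLDEN_REFERENCE_FILE) as f:
        golden = json.load(f)
    current = evaluate_model()
    for metric, reference_value in golden.items():
        # The new model must perform at least as well as the golden
        # reference for every metric, within the tolerance.
        assert current[metric] >= reference_value - TOLERANCE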
Canary testing
In canary testing, a new software version or a new feature is tested with actual users in a live setting. It is accomplished by pushing some code updates live to a limited number of end users.
The new code, or “canary”, only affects a tiny number of people, so its overall effect is minimal. If the new code turns out to be problematic or causes issues for users, the modifications can be rolled back.
A small number of end users act as a test group for new code during canary testing. These users, like the canary in the coal mine, are not aware
that they are aiding in the early detection of application issues. Monitoring
software notifies the development team if a code update is problematic so they
may correct it before it is made available to more people, reducing the chance
that the experience for all users will be negatively impacted.
A canary release is an excellent approach for introducing small code modifications connected to new features or a new software version. Because the code is made available to real users in production, the development team can rapidly assess whether the changes have the intended or anticipated effects.
Via canary deployment, developers move a small portion of users to the new features in a new version. Exposing only a portion of the total user population to the new software limits the impact of any problems, and developers can more easily roll back a problematic release without it harming the user base as a whole.
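In practice the traffic split is usually handled by the load balancer or the deployment platform, but as a minimal sketch of the idea, the function below routes a small, stable fraction of users to the canary version (the 5% threshold and hashing scheme are illustrative assumptions):

import hashlib

CANARY_PERCENT = 5  # expose roughly 5% of users to the new version

def route_to_canary(user_id: str) -> bool:
    # Hash the user id so the same user consistently sees the same version.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < CANARY_PERCENT

# Decide which version serves a given request.
version = "canary" if route_to_canary("user-42") else "stable"
print(version)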
A/B testing
A/B testing, often known as split testing, is a method for determining which version of something helps a person or organization achieve a goal more successfully. A/B testing is often used in web development to make sure that modifications to a website or page component are driven by facts and not just one person's opinion.
A/B tests are anonymous experiments in which the subjects are not made aware that a test is being run. In a typical A/B test on a web page, version A serves as the control, whereas version B serves as the variation. During the test period, half of the website visitors get version A of the page, which has no modifications, and the other half receive version B, which has a change intended to increase a certain statistic such as clickthrough rate, conversion, engagement, or time spent on the page. By analyzing end-user behavior collected during the test period, it is determined whether the control or the variation performed better for the targeted aim.
The difference between A/B testing and canary testing is that in A/B testing, the two versions are both known to be functional: A/B testing focuses on checking the preference of the user. Canary testing, by contrast, tests whether a new feature will work well; if the new feature doesn't work, it is easy to roll back to the previous version.
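To judge whether the difference observed between the two versions is more than random noise, the collected counts are usually run through a statistical test. Below is a minimal sketch using a chi-squared test on hypothetical conversion counts:

from scipy.stats import chi2_contingency

# Hypothetical counts collected during the test period:
# [converted, did not convert] for each version.
version_a = [120, 880]   # control
version_b = [150, 850]   # variation

# Chi-squared test of independence on the 2x2 contingency table.
chi2, p_value, dof, expected = chi2_contingency([version_a, version_b])

if p_value < 0.05:
    print("The difference in conversion rate is statistically significant.")
else:
    print("No significant difference between version A and version B.")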
Figure 11.3: Multi-Armed Bandit testing
Figure 11.4: Difference between Virtual Machines and Docker containers
Summary
In this chapter, we reinforced the importance of testing in the DevOps/MLOps workflow. We learned how to create a unit test and an integration test, and talked about the other specific tests that ensure software and model quality. In the next chapter, we will talk about the monitoring step of the workflow.
understand how this may happen.
As illustrated in Figure 12.1, you can think of an ML model as having data inputs X and output y. In terms of probabilities, there is the input probability P(X), the input-conditional output probability P(y|X), and the output marginal probability P(y).
so that a statistical comparison can be made to verify whether the distribution is significantly different from the training data distribution.
The production and training data may be different for the following reasons:
1. Sampling bias - it could be that the training data that was sampled from
a population is biased and is not a true representation of the underly-
ing population distribution, as demonstrated in Figure 12.2. Therefore,
during production, the ML model is exposed to a population distribution
that is different from the training data distribution. For example, using
cartoon pictures of dogs to train a model to identify the type of dog while
production data includes real-life pictures of dogs.
Figure 12.2: Biased training sample distribution different from the test sample
2. Non-stationary environment - an ML production model that receives outside data may be exposed to a non-stationary environment where the data characteristics (such as mean and variance) change with time (as illustrated in Figure 12.3) and the data processing does not correct for it. For example, trending data such as movie ticket sales over the years comes from a non-stationary environment. Note that there are data processing techniques available to transform non-stationary data into stationary data, but they are beyond the scope of this chapter.
Figure 12.3: Non-stationary feature
Now that we understand why production data may be different, in the next sections we cover the characteristics of those differences, how to detect them and, importantly, how to correct for them so that the ML model regains performance in production.
Figure 12.4: Covariate shift with different distribution in training (before) and
production (after)
An example of covariate shift is when an image ML model is developed to detect cars using black-and-white pictures and the production data contains colored images of the same cars as in the training data. Another example of covariate shift is when a spoken English speech recognition algorithm that detects what is being said is trained using an Australian accent and used with an American accent. A third example is when a disease detection algorithm using patient data is trained with data from 20- and 30-year-olds and used on Medicare (ages 65 or older) population data.
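Covariate shift in an individual feature can be flagged by statistically comparing its training and production distributions, for example with a two-sample Kolmogorov-Smirnov test. The sketch below uses synthetic data as a stand-in for real feature values:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic feature values standing in for training and production data.
training_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
production_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted mean

# Two-sample Kolmogorov-Smirnov test compares the two distributions.
statistic, p_value = ks_2samp(training_feature, production_feature)

if p_value < 0.05:
    print("Covariate shift detected for this feature.")
else:
    print("No significant shift detected for this feature.")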
Correct Covariate Shift
Once you detect the feature(s) with covariate shift, you have two options to compensate for the shift:
1. Drop the feature(s) - this is a hard-line but simple approach where you rebuild the model without the feature. The downside is that if this is an important feature, removing it from the model may reduce accuracy.
2. Retrain the model - you retrain the model with the shifted production data included in the updated training data. If there are not many shifted data points, you may have to assign higher weights to them during training, as sketched below. Note that the downside to this approach is that if the feature returns to its earlier distribution, you will detect another covariate shift and need to redo the training.
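As a minimal sketch of the retraining option, assuming a scikit-learn classifier and synthetic data standing in for the original and shifted samples, the shifted points can be emphasized through the sample_weight argument (the weight of 5.0 is an illustrative choice):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: the original training set and a smaller batch of
# production points collected after the covariate shift.
X_train = rng.normal(0.0, 1.0, size=(500, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_shifted = rng.normal(0.8, 1.0, size=(50, 3))
y_shifted = (X_shifted[:, 0] > 0.8).astype(int)

X = np.vstack([X_train, X_shifted])
y = np.concatenate([y_train, y_shifted])

# Give the few shifted points a higher weight so the retrained model
# pays more attention to the new distribution.
weights = np.concatenate([np.ones(len(y_train)), np.full(len(y_shifted), 5.0)])

model = LogisticRegression(max_iter=1000)
model.fit(X, y, sample_weight=weights)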
Detect Prior Probability Shift
Prior probability shift can be detected using a methodology called the Population Stability Index (PSI). PSI is a comparative measure of how much a variable has changed between two samples. It divides the training data output into bins and uses those bins to compare against the production data output [5]. Therefore, if the production data output has a different distribution, the PSI detects it.
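A minimal sketch of a PSI computation is shown below, with synthetic scores standing in for the real training and production outputs; the bin count and the commonly quoted 0.2 alert threshold are heuristics rather than fixed rules:

import numpy as np

def population_stability_index(expected, actual, bins=10):
    # Bin edges are derived from the training (expected) output distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Keep production values inside the training range so every point is binned.
    actual = np.clip(actual, edges[0], edges[-1])
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid division by zero and log of zero for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct))

rng = np.random.default_rng(0)
train_scores = rng.beta(2, 5, size=10_000)  # synthetic training outputs
prod_scores = rng.beta(3, 4, size=10_000)   # synthetic production outputs
psi = population_stability_index(train_scores, prod_scores)
print(f"PSI = {psi:.3f}")  # values above roughly 0.2 are commonly treated as a shift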
after some time. During the transient phase, there may be unexpected changes to the model performance. In such circumstances, the statistical techniques discussed above may not be able to detect them. Instead, such changes are detectable by monitoring model performance for each data instance, as described below.
Figure 12.6: Model Performance monitoring using bounds determined at training
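As a minimal sketch of per-instance monitoring with bounds determined at training, assume a model for which an absolute error can be computed for each instance; the three-sigma bound and the print-based alert are illustrative choices:

import numpy as np

rng = np.random.default_rng(0)

# Per-instance absolute errors observed on the validation set at training time
# (synthetic values standing in for real validation errors).
validation_errors = np.abs(rng.normal(loc=0.0, scale=1.0, size=1000))

# Bound determined at training: mean plus three standard deviations.
upper_bound = validation_errors.mean() + 3 * validation_errors.std()

def monitor_instance(prediction: float, actual: float) -> float:
    # Check a single production data instance against the training-time bound.
    error = abs(prediction - actual)
    if error > upper_bound:
        # A real system would raise an alert in the monitoring stack instead.
        print(f"ALERT: error {error:.2f} exceeds bound {upper_bound:.2f}")
    return error

monitor_instance(prediction=12.0, actual=4.0)  # triggers the alert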
attack on the production machine, this may result in increased latency where
the ML model is not responding in time.
Summary
In this chapter, we discussed why ML model monitoring is required and the concepts behind detection and correction. We also outlined the different reasons that an ML model may perform differently than expected in production. In the next chapter, we look at ML model fairness.
[1] https://towardsdatascience.com/to-monitor-or-not-to-monitor-a-model-is-there-a-question-c0e312e19d03
[2] https://en.wikipedia.org/wiki/Chi-squared_test
[3] https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
[4] https://towardsdatascience.com/mlops-model-monitoring-prior-probability-shift-f64abfa03d9a
[5] https://towardsdatascience.com/mlops-model-monitoring-prior-probability-shift-f64abfa03d9a
13 - Evaluating Fairness
In this thirteenth chapter, we discuss ML model fairness. This is an important topic for ML ethics, specifically bias. Now that you know how to build, deploy, test, and monitor ML models, you may need to ensure that your ML model is fair. Specifically, in this chapter you will be able to:
• Understand what bias and fairness are in ML models
• Learn how to detect bias in ML models
• Take action to mitigate bias so that your ML model is fair
• Analyze the trade-off between fairness and accuracy
In the first section, we discuss bias and fairness.
ML Model Bias
There are different sources of bias in ML models. We start with the most common source, data bias:
1. Data bias - when the data is biased towards or against a particular idea or group of phenomena, the trained ML model inherits that bias. For example, assume you are building an ML model to target financial asset management advisors to buy your company's mutual funds. The ML model determines which advisors will buy the fund based on their profile and geographical location, among other attributes. If most of the advisors in California tend to buy your funds relative to other states, then your data is biased from a geographical perspective. Consequently, during inference, the ML model is likely to indicate that California-based advisors are going to buy your mutual fund. In reality, that may not happen, and your ML model may overestimate your mutual fund selling success for California, leading to a high false positive rate (i.e. the ML model estimates yes to buying funds but the actual answer is no).
2. Algorithmic bias - when you are building an ML model, you can choose to overfit or underfit the training data. If the algorithm overfits a dataset, the inference will likely have high variance, with a lot of false positives and false negatives (i.e. ML model estimates that do not match actuals). For example, an ML model for cross-selling to retail customers may focus on a specific customer segment and overfit on that segment. The prediction of whether to cross-sell to a customer from the overfit segment may then have high precision (i.e. the ML model estimate matches the actual), but the ML model may also have a lot of false positives and false negatives for customers from a different segment.
3. Business bias - as we saw in Chapter 8, different business and global circumstances can change the interpretation of an ML model's output. For example, during high inflation, loan applications can be subjected to higher approval thresholds than during low inflation.
In the next section, we outline how to detect ML model bias.
Table 13.1: Chronic disease prediction ML model output binned by age and
gender for bias detection
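A binned view along the lines of Table 13.1 can be produced with a few lines of pandas. The sketch below uses hypothetical predictions and protected attributes, bins age, and compares the positive prediction rate across groups; large gaps between groups are a signal of potential bias worth investigating:

import pandas as pd

# Hypothetical model outputs with the attributes used for binning.
df = pd.DataFrame({
    "age": [25, 68, 45, 70, 33, 81, 52, 29],
    "gender": ["F", "M", "F", "F", "M", "M", "F", "M"],
    "predicted_disease": [0, 1, 0, 1, 0, 1, 1, 0],
})

# Bin age and compute the positive prediction rate per (age bin, gender) group.
df["age_bin"] = pd.cut(df["age"], bins=[0, 40, 65, 120],
                       labels=["<40", "40-65", "65+"])
rates = df.groupby(["age_bin", "gender"], observed=True)["predicted_disease"].mean()
print(rates)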
How does ML Model Fairness affect Accuracy?
Bias correction often mitigates (but does not eliminate) ML model bias. This comes at a cost (remember the adage: there is no free lunch), and the cost is ML model accuracy. Think of it this way - if you had all the details for a specific
group of retail customers, then you would know them like family. Any ML model
built using that data would have a near-perfect prediction of their likes/dislikes.
But the model is very unfair since you are biased towards those customers. To
make it fair, you need to give up some information (forget something about the
customers in that group) that would reduce the ML model accuracy. That is
the trade-off.
Once you have your ML model ready, identify the area(s) where you can make the ML model fairer. Quantify fairness before you make any changes, and quantify accuracy using your defined metric. Then start making the ML model fairer in steps and calculate the accuracy at each step. You will notice that as you make your ML model fairer, you are likely giving up some accuracy. In other words, fairness and accuracy form a Pareto pair, as illustrated in Figure 13.2.
Figure 13.2: Pareto curve of fairness vs accuracy
Summary
In this chapter, we understood ML model fairness and bias, the reasons for ML model bias, and how to detect and correct that bias. We also discussed the trade-off involved when correcting for bias and making an ML model fairer. In the next chapter, we look at how to make an ML model robust to failures using antifragility.
14 - Exploring Antifragility and ML Model Environmental Impact
Congratulations on reaching this last chapter - by now you know how to build, deploy, test, detect bias in, and monitor ML models. In this fourteenth chapter, we discuss techniques to make your ML model robust and to understand the environmental impact of ML models. In this chapter you will be able to:
• Understand antifragility and how it can be used to make your ML model
robust
• Determine the environmental impact of training your ML model and how
you can help
We start with the concept of antifragility.
What is Antifragility?
Antifragility, as the name suggests, is the opposite of fragility. So what does that mean? Let's start with fragility: it describes things or systems that break down when they are subject to randomness, in the form of vibrations or something else. So what is the opposite of this phenomenon? It is not that things or systems can handle randomness; that is called robustness. The opposite is that, instead of breaking down, systems become stronger when subject to randomness. This is called antifragility.
Machines are fragile - they do not do well when encountering randomness. Hu-
mans, on the other hand, are antifragile. When we manage unforeseen situa-
tions (i.e. randomness) we gain additional experience and become stronger. In
the next section, we discuss the association between antifragility and ML models.
In the next section, we go through the principles of chaos engineering to make your ML model production system stronger.
Figure 14.1: Plot to prioritize system failures in terms of how often they happen (Likelihood) and their influence (Impact)
Failures or fault lines exposed by these tests, once corrected, are going to make your ML model production system stronger. That is antifragility leading to robust ML models and systems.
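As a small illustration of how such a prioritization could be scripted, the sketch below ranks hypothetical failure scenarios by the product of their likelihood and impact so that the highest-risk chaos experiments are run first:

# Hypothetical failure scenarios scored by how often they happen (likelihood)
# and how much they hurt (impact), both on a 1-5 scale.
failure_scenarios = [
    {"name": "feature store unavailable", "likelihood": 2, "impact": 5},
    {"name": "model server pod restart", "likelihood": 4, "impact": 2},
    {"name": "upstream schema change", "likelihood": 3, "impact": 4},
]

# Run the chaos experiments in order of likelihood x impact, highest first.
for scenario in sorted(failure_scenarios,
                       key=lambda s: s["likelihood"] * s["impact"],
                       reverse=True):
    priority = scenario["likelihood"] * scenario["impact"]
    print(f"{priority:>2}  {scenario['name']}")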
In the next section, we discuss the environmental impact of training and running
your ML models.
Summary
In this last chapter we look at a toolkit based on antifragility that helps improve
ML model system resiliency and makes them highly available. We also discussed
the environmental impact of training ML models and how to calculate them.