
Production Engineering from DevOps to MLOps

Arnab Bose and Sebastien Donadio

Abstract
This book takes a DevOps approach to MLOps and uniquely posi-
tions how MLOps is an extension of well-established DevOps principles
using real-world use cases. It leverages multiple DevOps concepts and
methodologies such as CI/CD and software testing. It also demonstrates
the additional concepts from MLOps such as continuous training that
expands CI/CD/CT to build, operationalize and monitor ML models.

Contents

Production Engineering from DevOps to MLOps
  Overview
  Supporting this work

About the authors

About this book
  Who this book is for
  Book structure
  What this book covers

1 - Getting to know the DevOps universe
  The DevOps workflow
  The DevOps state-of-the-art
  Using operating systems for DevOps
  Scripting and automation
  Summary

2 - Understanding cloud computing for DevOps
  Virtualizing hardware for DevOps
  Cloud for DevOps
  Summary

3 - Building software by understanding the whole toolchain
  How does a computer work?
  Bridging Hardware and Software
  Building software with build tools
  Building software using continuous integration
  How does version control software contribute to DevOps?
  Change management
  Building releases
  Summary

4 - Introducing Machine Learning Operations (MLOps)
  Motivation
  ML Operationalization Complexities
  ML Lifecycle
  Agile ML Lifecycle
  What is AIOps?
  Summary

5 - Preparing the Data
  Time Spent on Data
  Data Versioning
  Software Stack 2.0
  Data Governance
  Data Security
  Data Privacy
  Summary

6 - Using a Feature Store
  What is Reusable in ML Model Development?
  What is a Feature Store?
  Summary

7 - Building Machine Learning Models
  ML Algorithm Versioning
  Automated Machine Learning (AutoML)
  Model Card
  Model Governance
  Summary

8 - Understanding Machine Learning Pipelines
  Phases of Experimentation
  ML Pipeline
  Summary

9 - Interpreting & Explaining Machine Learning Models
  ML Model Interpretability
  ML Model Explainability
  Summary

10 - Building Containers and Managing Orchestration
  Microservices and Docker containers
  Kubernetes
  Summary

11 - Testing ML Models
  Functional testing
  Unit testing
  Integration Testing
  Acceptance testing and system testing
  Regression Testing
  Load Performance Testing
  Canary testing and A/B testing
  Multi-Armed Bandit Testing (MAB)
  Shadow Mode Testing - Champion-Challenger Paradigm
  User Interface testing
  Summary

12 - Monitoring Machine Learning Models
  Why is ML Model Monitoring Needed?
  Why Production Data may be different from Training Data
  ML Model Feature Different in Production Data from Training Data - Covariate Shift
  ML Model Output Different in Production Data from Training Data - Prior Probability Shift
  ML Model Conditional Output Different in Production Data from Training Data - Concept Shift
  Monitor Transient ML Model Performance Changes
  System Health Operational Monitoring
  Detect System Health Change
  Summary

13 - Evaluating Fairness
  What are Bias and Fairness?
  ML Model Bias
  ML Model Bias Detection
  ML Model Bias Correction
  How does ML Model Fairness affect Accuracy?
  Summary

14 - Exploring Antifragility and ML Model Environmental Impact
  What is Antifragility?
  Chaos Engineering - Antifragility with ML Models
  Experimentation with Chaos Engineering
  Environmental Impact of ML Models
  Summary
Production Engineering from DevOps to MLOps
An open source book written by Sebastien Donadio and Arnab Bose. Published
under a Creative Commons license and free to read online here. All code licensed
under an MIT license. Contributions are most welcome.
This book is also available on Leanpub (PDF, EPUB).


Overview
This book takes a DevOps approach to MLOps and uniquely positions how
MLOps is an extension of well-established DevOps principles using real-world
use cases. It leverages multiple DevOps concepts and methodologies such as
CI/CD and software testing. It also demonstrates the additional concepts from
MLOps such as continuous training that expands CI/CD/CT to build, opera-
tionalize and monitor ML models.
• Beginner-friendly yet comprehensive. From well-established DevOps principles
up to operationalizing and monitoring ML models, a lot of topics are covered
in a simple and approachable way. No prior knowledge needed!
• Easy to follow. Real-world use cases lead you step by step toward building a
full DevOps/ML infrastructure.

Supporting this work


A lot of time and energy went into this content. If you find it useful and want
to support this effort, here’s what you can do:
• Buy it under one of the available formats (see above). Any financial
contribution would be much appreciated.
• Spread the word about it.
Thanks in advance for your support!

About the authors


Arnab Bose is Chief Scientific Officer at Abzooba, a data analytics company,
and Clinical Associate Professor at the University of Chicago, where he teaches
Advanced Linear Algebra, Machine Learning, Time Series Forecasting, MLOps and Health
Analytics. He is a twenty-year industry veteran focused on machine learning and
deep learning models for unstructured and structured data. Arnab has extensive
industry experience in using data to influence behavioral outcomes in healthcare,
retail, finance, and automated vehicle control.
Arnab is an avid writer and has published numerous papers for conferences and
academic journals, as well as book chapters. He enjoys public speaking and
has given talks on data analytics at universities and industry conferences in the
US, Australia, and India. He serves on the board of the financial engineering
graduate program at University of Southern California. Arnab holds MS and
PhD degrees in electrical engineering from University of Southern California
and a BS in electrical engineering from the Indian Institute of Technology at
Kharagpur, India. Arnab enjoys music, sports and runs in half-marathons.
Sebastien Donadio has nearly two decades of experience in the fields of high-
performance computing, software design and development, and financial com-
puting. Currently an architect in Bloomberg's Office of the CTO, he has a wide
variety of professional experience, including serving as CTO of an FX/Crypto
trading shop, head of software engineering at HC Technologies, quantitative
trading strategy software developer at Sun Trading, partner at high-frequency
trading hedge fund AienTech, and a technological leader in creating operating
systems for the Department of Defense. He also has research experience with
Bull, and as an IT Credit Risk Manager with Société Générale in France. Se-
bastien has taught various computer science and financial engineering courses
over the past fifteen years at a variety of academic institutions, including the
University of Versailles, Columbia University’s Fu Foundation School of Engi-
neering and Applied Science, University of Chicago and NYU Tandon School of
Engineering. Courses taught include: Computer Architecture, Parallel Architec-
ture, Operating Systems, Machine Learning, Advanced Programming, Real-time
Smart Systems, Computing for Finance in Python, and Advanced Computing
for Finance. Sebastien holds a Ph.D. in High Performance Computing Opti-
mization, an MBA in Finance and Management, and an MSc in Analytics from
the University of Chicago. His main passion is technology, but he is also a scuba
diving instructor and an experienced rock-climber.

About this book


Thank you for choosing this book. I hope reading it will be both beneficial
and enjoyable for you.

Who this book is for


This book targets three different personas. First, data engineers and DevOps
engineers who manage ML data and model platforms, deploy ML models into
production, and monitor them. Second, full-stack data scientists who not
only build ML models but work on the end-to-end stack of the ML lifecycle
starting with data ingestion to production deployment and monitoring. Third,
project managers who need to understand the intricacies of the different steps
in taking an ML model to production.

Book structure
This book takes a DevOps approach to MLOps and uniquely positions how
MLOps is an extension of well-established DevOps principles using real-world
use cases. It leverages multiple DevOps concepts and methodologies such as
CI/CD and software testing. It also demonstrates the additional concepts from
MLOps such as continuous training that expands CI/CD/CT to build, opera-
tionalize and monitor ML models.
We lead readers through building a full DevOps/ML infrastructure using a
collection of real-world case studies, going into the details of the principles,
starting from DevOps and moving to the domain of MLOps, which focuses on
operationalizing and monitoring ML models.

What this book covers


Chapter 1: Introducing DevOps, gives an overview of what DevOps is. It also
introduces what an operating system is and how to work with it.
Chapter 2: Understanding cloud computing for DevOps, introduces how we can use
the cloud in the DevOps approach.
Chapter 3: Building software by understanding the whole toolchain, covers how
to build software and how to enrich it with libraries.
Chapter 4: Introducing MLOps, gives an insight into the challenges of
operationalizing Machine Learning models.
Chapter 5: Preparing the Data, gives insight into the importance of data in
Machine Learning models.
Chapter 6: Using a Feature Store, explains how a feature store promotes
reusability to develop robust Machine Learning models quickly.
Chapter 7: Building ML Models, describes how to manage a Machine Learning
model.
Chapter 8: Understanding ML Pipelines, depicts the importance of data feedback
loops.
Chapter 9: Interpreting and Explaining ML Models, discusses how to dissect an
ML model from the perspectives of explainability and interpretability.
Chapter 10: Building Containers and Managing Orchestration, presents the use
of containerized applications.
Chapter 11: Testing ML Models, describes how testing improves software quality;
it is a critical part of the DevOps/MLOps process.
Chapter 12: Monitoring ML Models, illustrates the different ways an ML model
can underperform in production.
Chapter 13: Evaluating Fairness, explains what bias and fairness are in ML
models.
Chapter 14: Exploring Antifragility and ML Model Environmental Impact, covers
antifragility, how it can be used to make your ML models robust, and the
environmental impact of your ML models.

1 - Getting to know the DevOps universe


This chapter opens the first part of the book by talking about DevOps. We will
define DevOps and demonstrate that it is widely used and critical for any
software development and deployment effort. Then we will talk about the
importance of knowing operating systems, give a few examples, and use Linux to
create scripts that automate tasks.
By the end of this chapter, you will be able to:
- describe the DevOps process
- select an operating system for your needs
- create a script to automate tasks
To start this chapter, we will define what DevOps is.

Defining DevOps
In this section, we will describe in depth what DevOps is. We will talk about
where it started and what problems it solves. We will review how much impact
it has on the company structure and its benefits.

The DevOps word


Patrick Debois, a Belgian software engineer who later rose to prominence as
one of the movement's best-known figures, first used the term “DevOps” in 2009. “DevOps” was created
by fusing the terms “development” and “operations”. These two words help
to figure out exactly what individuals mean when they use DevOps. Notably,
DevOps isn’t a method, a tool, or a rule. DevOps is sometimes referred to as
a “culture”. In addition, we refer to the “DevOps movement” when discussing
issues like adoption rates and potential future trends, and the “DevOps envi-
ronment” when describing an IT organization that has embraced a DevOps
culture.

The DevOps origin


System engineers or IT experts made up a large portion of the original DevOps
team. They developed best practices to manage software within a company,
such as configuration management, system monitoring, automated provisioning,
and the toolchain approach.
Another origin for DevOps was the Agile methodology. Agile software devel-
opment recommends close cooperation of customers, product management, de-
velopers, and software quality/testing to fill in the gaps. By quickly iterating
toward a better product, it helps get a product more tested with a reduced time
to market and more in line with the client’s needs. The way the app and sys-
tems interact is a fundamental part of the value proposition to the clients. This
iterative approach allows product teams to include clients’ feedback. Therefore
DevOps can be considered an extension of Agile.

The DevOps solution


Developers and system administrators agree that their business-side clients fre-
quently push them in various directions. Business customers expect change—
new functions, new offerings, and fresh sources of income as soon as possible.
They also desire a system that is reliable and unaffected by outages or disrup-
tions. Companies are therefore faced with the dilemma of either maintaining a
stable but stagnant environment or providing changes fast while managing an
unstable production environment. Neither option is deemed suitable by business
leaders. And, more importantly, neither enables a firm to give its customers the
best solutions possible.
The developers’ goal is to produce software at an accelerated rate. On the other
side, Operations is aware that making changes quickly and without enough
safeguards risks causing the system to become unstable, which is against their
mission statement.
The answer to this conundrum is DevOps, which unifies all parties involved
in software development and deployment—business users, developers, test engi-
neers, security engineers, and system administrators into a single, highly auto-
mated workflow to deliver high-quality software quickly while maintaining the
integrity and stability of the entire system.
The DevOps solution is to:
- determine the guiding principles, expectations, and priorities
- work together to solve problems inside and between teams
- automate routine and repetitive tasks to make time for more complex work
- measure every item put into production and incorporate feedback into your work
- promote a culture of working effectively together across varied abilities and
specialized knowledge, and share the facts with everyone concerned

The DevOps efficiency


DevOps encompasses several complementary practices:
- collaboration: developers and IT work together
- automation: large portions of the end-to-end software development and
deployment process are automated by DevOps using toolchains
- continuous integration: DevOps is Agile-based; developers are required to
frequently integrate their work with that of other developers, so integration
problems and disagreements become apparent considerably sooner than in
waterfall development
- continuous delivery: when a feature is ready, it is delivered to the
production environment
- continuous testing: software quality is intrinsic to the DevOps design;
developers create ways to validate their code and the testing team validates
their features independently
- continuous monitoring: tech teams continuously evaluate software availability
and performance to increase stability; continuous monitoring makes it possible
to immediately pinpoint the source of problems, proactively avoid outages, and
lessen user problems
- quick repair: bug fixes should be deployed into the production environment
very quickly

The DevOps adoption


Most companies have adopted the DevOps process. It is a win-win solution for
all participants:
• Programmers benefit greatly from automated provisioning since it allows
them to set up their development environments without the need for pa-
perwork, protracted approval processes, or lost time while waiting for IT
to supply a server. The way developers work is altered when they can
quickly provision a working environment complete with all the necessary
tools (computing power, storage, network, and apps). They possess far
greater creativity and originality. It’s much simpler to experiment with
various choices, run various scenarios, and properly test their code.
• Operations engages with developers, which enhances system stability. More
frequent deliveries inject less unpredictability into the system, reducing
the probability of a fatal failure. Even better, rather than being released
at odd hours or on weekends, these more limited releases may be done
throughout the day, when everyone is at work and ready to handle issues.
• Test engineers create a test environment with automated provisioning that
is almost exactly like the production environment, which leads to more
accurate testing and an improved ability to forecast the performance of
future releases. Like other groups, test engineers benefit from teamwork
and automation to boost productivity.
• Product managers get faster feedback. They can adapt software features
faster to their clients’ needs.
• Business owners and executives see that DevOps enables the company to produce
high-quality products considerably more quickly than rivals that rely on
conventional software development techniques, which boosts revenue and
enhances brand value. High-quality developers, system administrators, and
test engineers want to work in the most advanced, productive environment
available, which is another factor in the capacity to attract and keep top
personnel. Finally, senior executives spend less time intervening in
inter-departmental disagreements when developers, operations, and QA
collaborate, giving them more time to define the business goals that everyone
is now working together to achieve.

The DevOps workflow


Figure 1.1 represents the DevOps workflow that we will follow in this book. The
initial state is the code itself: any feature or bug fix is made through a code
change. This code is versioned by version control software such as Git, SVN, or
Mercurial. This version of the code is then compiled and unit tested to be sure
that the most recent changes did not impact other parts of the software.
Artifact control is the following step; this part is also used for models, and
its goal is to manage data and metadata. The Test stage comes next: the
software is tested for performance, integration, and regression, to ensure that
the software we release is better in terms of features and has fewer bugs. Once
all the tests have passed, we can release the software to production. The
software is then monitored, and bugs are reported and included in a backlog
with the rest of the features to be implemented.

Figure 1.1: The DevOps workflow
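To make this workflow concrete, here is a minimal, hypothetical sketch of the
stages expressed as shell commands. The repository URL, the make targets, and
the deploy script are placeholders standing in for whatever toolchain your
project uses; they are not tools prescribed by this book.

#!/bin/bash
set -e  # stop the pipeline at the first failing stage

# 1. Code: fetch the latest change from version control (placeholder URL).
git clone https://example.com/myteam/myapp.git
cd myapp

# 2. Build and unit test (the actual commands depend on your build tool).
make build
make unit-test

# 3. Artifact control: package the build output so it can be versioned.
tar -czf myapp-1.0.tar.gz build/

# 4. Test: integration, regression, and performance suites.
make integration-test

# 5. Release and monitor: deploy the artifact (placeholder deploy script).
./deploy.sh myapp-1.0.tar.gz production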

The DevOps state-of-the-art


We could write at much greater length to demonstrate that DevOps has been a key
part of the development process of successful firms by increasing software
quality, reducing time to market, and improving system stability and security,
but many books have already been written for this purpose. All the major
publishers have at least one book on this topic, and they all explain the
advantages and the implementation of a DevOps system.
The evidence is unambiguous: DevOps is here to stay. It has been successful in
uniting business users, developers, test engineers, security engineers, and system
administrators into a unified workflow aimed at satisfying client needs.
The goal of this book is not just to talk about DevOps but to give you more
in-depth knowledge on how to use it. In the following section, we will start
by describing how an operating system works and why it is very important to
know the basics of the operating system for DevOps.

Using operating systems for DevOps


In the prior section, we talked about the DevOps principles. We know that
DevOps is an efficient culture for software development. The first question is
now: where do we start? Because software runs on operating systems, it is
important to know what an operating system is and why it is so important in the
DevOps process.

How does an operating system work?


As we know, the DevOps process's goal is to improve software quality and time
to market. When we build machine learning software, or any software, it will
run on an operating system.

Figure 1.2: Software and operating system
Figure 1.2 represents the link between software, operating system, and hardware.
An operating system performs three primary tasks: managing the computer’s
resources, such as processor, memory, and storage; creating a user interface; and
running and supporting application software.
An operating system (OS) has several main functionalities:
- Interface for sharing hardware resources: the OS manages hardware resources
by providing the software layer that controls them. Displaying a character on
the monitor after the user presses a key, controlling the movement of a mouse,
and storing a program in memory are examples of resource handling that an OS
can perform.
- Process scheduling: when using a computer, we need many programs running in
parallel. When we use our favorite browser, we may also need to see the time
or play music in the background. To do so, the OS schedules these different
processes to run on the underlying hardware.
- Memory management: hardware has different levels of memory hierarchy.
Processors perform calculations using operands stored in registers. Registers
are very limited, so data is stored in memory. Memory is divided into
different levels; level 1 (named L1) is the closest to the processor, and
therefore L1 latency is much lower than that of all the other levels.
- Data access/storage: when we create Machine Learning models, we need a large
amount of data. Data is usually stored on a hard disk or other storage
device. The OS is in charge of organizing data storage using files and
directories and sets the rules for how the storage unit is organized.
- Communication: the last function is to communicate with the outside world.
If a computer wants to communicate with another one, the OS organizes the
input/output of a given architecture.
The OS is critical when running any software. Figure 1.3 shows all the functions
that software can use when running on an OS. Consider software running a
machine learning model: the data will be located on a hard disk (File
Management and Device Management), loaded into memory (Memory Management), and
then processed (Processor Management). If the data comes from a real-time
stream, such as financial data, it arrives through the network
(Network/Communication Management).

Figure 1.3: function of the operating system
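These OS functions can be observed directly from a Linux shell. The commands
below are standard utilities shown purely as an illustration; the exact output
depends on your machine.

ps aux | head -n 10    # processor management: list running processes and their CPU usage
free -h                # memory management: show used and available memory
df -h                  # file/device management: show disk usage per mounted filesystem
ip addr show           # network/communication management: list the network interfaces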


It is expected that an operating system will run every process fairly. It means
that the operating system will give enough processing time to all of them. To do
so, it is important to understand how a computer is designed. We recommend
reading the book Computer Architecture written by John Hennessy and David
Patterson. Figure 1.4 shows a computer architecture.

Figure 1.4: Computer architecture

We can see a Central Processing Unit (aka processor or CPU) divided into cores.
These cores have registers and different memory hierarchies. They share the
same L3 cache which allows these cores to share data. Every core can access
memory and handle input/output with external devices such as network cards
or storage units.
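On a Linux machine, you can inspect this architecture yourself; lscpu reports
the core count and the L1/L2/L3 cache sizes described above, and the other
commands below are standard utilities whose output will differ from machine to
machine.

lscpu     # CPU model, number of cores, and cache hierarchy (L1/L2/L3 sizes)
nproc     # number of processing units available
lsblk     # block devices (storage units) attached to the machine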
It is critical to understand how a system works, since efficiency, reliability,
and scalability will be part of our clients' requirements when we design
software. Once we know how an OS works, what is the next step? Choosing an
operating system.
The next section will talk about the different types of operating systems available
on the market.

What OS should we choose?


Choosing an OS is an important part of the process when we build the technology
side of a company. Most of the time, when joining a firm, this choice may not
be available, but as we will see in Chapter 2, with the cloud this can be a
question we have to answer even if the company has already chosen an OS for its
employees.
The goal of this section is not to get into a debate about which OS is better;
that debate cannot be settled here. What matters is identifying the needs for
which we would choose one system or another.
We will first describe what we should look for when comparing OSs:
- Community and historical footprint: knowing when a system was created helps
gauge its reliability. If the system is widely used and still maintained, its
maturity should be high enough to trust it.
- File structure and memory management: it is important to know how data is
stored and how well a system scales when handling a large quantity of data.
- Configuration (registry, databases, files): because needs differ, it is
critical to know how configurable a system is. For instance, a system
requiring ultra-low networking latency will have a different configuration
than a system running a high-scale web server.
- Interfaces and command line: an operating system is made for users. The way
we interact with the system to automate tasks and create software is also a
critical part of the choice of an OS.
We will compare the following operating systems:
- Microsoft Windows: it was first created as a graphical interface for
Microsoft's DOS operating system in 1985. In 1995, Windows replaced MS-DOS
and became an OS in its own right. Windows was the most used OS in the world
for many years.
- macOS and iOS: they are the main competitors of Windows on personal devices.
macOS was released in 2001 and iOS six years later. The main difference
between the two is that one is built for computers and the other for mobile
devices.
- Linux (Unix-based systems): a family of open-source Unix-like operating
systems based on the Linux kernel, which Linus Torvalds first made available
in 1991. There are many different Linux distributions (such as Red Hat,
Ubuntu, Debian, Fedora, ...). On the desktop, Linux is far from being as
widely used as the two previous OSs, but it is open source, easily
configurable, and the most used OS in production environments.
- Android: Android was created in 2003 and bought by Google in 2005. Google
decided to make this OS open source. Android is mainly used on mobile devices
that are not Apple devices.
Figure 1.5: Market share of OS for production environment


Figure 1.5 shows that Linux has the largest market share as the OS for
production environments. The benefits of using Linux for servers are the
following:
- free and open source: we can see and modify the code easily, and it is easy
to configure for specific business needs
- reliability: server uptime is unbeatable; Linux rarely crashes and does not
need to be restarted for a configuration change
- security: Linux is among the most secure systems in the industry,
implementing security mechanisms for data and processes
- hardware support: there are drivers for many devices and pieces of hardware
- low maintenance cost: unlike other OSs, the maintenance cost is much lower
The Linux kernel is very stable and secure, and software going to production
will most often have to run on this OS. It is important to learn the basics of
how to interact with this operating system, which is why we chose to introduce
it in this book.
In the next section, we will review the basic commands of a Linux operating
system and learn how to automate tasks.

Scripting and automation


Without a doubt, since human intelligence is necessary at every stage of the
system development life cycle, human interaction cannot be eliminated from
computer-based systems. To minimize issues in the final result, automation is
key in the DevOps process. Automation was not created by computer scientists
and has been around in the industry for centuries. The goal of automation is to
reduce time to market, improve quality, save time, increase consistency, reduce
labor, and lower cost. Looking at these benefits, we can already observe that
they are similar to the benefits of DevOps. Without
automation, DevOps will not be as efficient as it is. We can automate all the
different phases of DevOps: software building, software configuration, software
deployment, software testing, software monitoring, and alerting.
Why are we talking about scripting and automation in this same chapter? By
using scripts, we can systematically sequence several tasks; scripting is a
form of programming. What scripting languages can we use?
- Python is a general-purpose, open-source language. It is one of the most used
languages in the world today and is platform agnostic.
- Shell scripting is one of the most popular and well-supported options on
Unix-like systems; Linux systems support Bash and other shells, while
PowerShell plays a similar role on Windows and is also available on Mac and
Linux.
- Ruby, Perl, and JavaScript can also be considered scripting language
solutions.
For this section, we chose to work with shell scripting. There is no single
shell scripting language to start with: shell scripting can be done using sh,
bash, csh, or tcsh. The most used shell is Bash, an interpreted scripting
language for Unix-based computers.

Starting with the basic Linux command lines


The goal of this section is to present the most used Linux command lines. A
Linux command is an application or utility that runs from the command line, an
interface that receives lines of text and converts them into instructions for
your computer. A graphical user interface (GUI) is just an abstraction over
command-line applications: for instance, a command is carried out every time
you click the “X” to close a window. We pass options to the command we execute
by using flags. With the -h switch, we can access the help pages of the
majority of Linux commands. Flags are frequently optional. The input we provide
to a command so it operates properly is known as an argument or parameter; an
argument can be anything you write in the terminal, although it is typically a
file path. Flags are introduced with single hyphens (-) or double hyphens (--),
and the command processes arguments in the order you pass them to it.
Let’s start with the following commands:
- ls lists the contents of a directory
- pwd returns the current directory path
- cd changes the current directory to another one; its options let you navigate
the Linux file system
- rm deletes files and mv moves files
- man displays the help for a command
- touch modifies the access and modification times of a file
- chmod changes the access rights of files
- sudo runs a command as administrator (root)
- top or htop lists the processes running on a machine
- echo prints a string on the screen (often used to debug or monitor the steps
of a script)
- cat reads the content of a file
- ps returns the list of all the processes run by a shell
- kill sends a signal to a process; it is usually used to terminate a process
- ping is a network command that tests connectivity with another network
interface
- vi or vim is a text editor
- history lists all the previous command lines
- which returns the full path of a program
- less helps you inspect a file backward and forward
- head or tail displays the first/last lines of a file
- grep matches lines against a regular expression (or a string)
- whoami displays the current user
- wc displays the word count of a file
- find searches for files in directories
- wget downloads a file from a URL
The list above is not exhaustive, but it will help you bootstrap your work of
scripting and automating some tasks.
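Here is a short illustration of flags and arguments with a few of the commands
above; the file and directory names are placeholders for this example.

ls -l /var/log                     # -l is a flag, /var/log is the argument
grep -i "error" /var/log/syslog    # -i makes the string match case-insensitive
find . -name "*.sh"                # list all shell scripts under the current directory
wc -l /etc/passwd                  # count the number of lines in a file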

Writing your first script to automate a task


Let’s assume you want to start scripting a command line. We are going to use
the simplest command: echo.
echo "This book is useful"
If you type this command at the command line, it will display the string passed
as an argument:
This book is useful
If we want to have this command line in a script, we need to create a file and
add the command to it:
vim script1.sh to create and edit the file
#!/bin/bash

echo "This book is useful"


We can run this script in two ways: either by running
bash script1.sh
or by changing the access rights of the file and executing it directly:
chmod a+x script1.sh

./script1.sh
It will display the same string: This book is useful
If you would like to add more commands to this file, it is possible: we just
need to edit the file again and add more instructions.
#!/bin/bash

echo "This book is useful to know how to count to 5"

for (( counter = 0 ; counter <= 5 ; counter++ ))
do
    echo -n "$counter "
done
echo
If you run this script, you will have the following output:
This book is useful to know how to count to 5
0 1 2 3 4 5
By combining the commands from the first part of this section, you have
countless ways of scripting any functionality you think is necessary for your
DevOps operations; one example sketch follows below.
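As an example, here is a minimal sketch that combines several of the commands
above into a small monitoring script; the 80% threshold and the /var/log path
are arbitrary example values, not recommendations.

#!/bin/bash
# Report root filesystem usage and list the largest log files above a threshold.

threshold=80
usage=$(df / | tail -n 1 | awk '{ print $5 }' | tr -d '%')

echo "Root filesystem usage: ${usage}%"

if (( usage > threshold ))
then
    echo "Warning: disk usage is above ${threshold}%"
    # Show the five largest files under /var/log to investigate.
    find /var/log -type f -exec du -h {} + 2>/dev/null | sort -rh | head -n 5
fi

Save it to a file and run it with bash, or make it executable with chmod a+x,
exactly as we did with script1.sh.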

Summary
In this chapter, we learned what DevOps is, saw how operating systems work and
how they help in the DevOps process, and learned how to create scripts to
automate tasks. In the next chapter, we will talk about the cloud and learn how
to use it in the context of DevOps.

2 - Understanding cloud computing for DevOps


In this second chapter, we cover virtualization and the cloud. By the end of
this chapter, you will be able to:
- create a virtualized machine
- create a computing instance
- create a storage bucket
- understand cloud regions
- configure a virtual cloud
In the first section, we will study virtualization and we will explain why DevOps
can benefit from it.

Virtualizing hardware for DevOps
Virtualization allows using a machine's full capacity by splitting its resources
among many users and environments. Let's consider two physical servers with
distinct services: a web server and a mail server. Only a small portion of each
server's operational capacity is being utilized. Since these services have been
tested and run for many years, we do not want to interrupt them or attempt an
upgrade of these machines, which could potentially prevent the services from
running. Virtualization allows us to gather the two services onto one physical
server, running in two different virtualized environments, and leave the second
server for other services.
The virtual environments—the things that use those resources—and the actual
resources are divided by software known as hypervisors. The majority of busi-
nesses virtualize using hypervisors, which may either be loaded directly into
hardware (like a server) or on top of an operating system, such as a laptop.
Your physical resources are divided up by hypervisors so that virtual environ-
ments can utilize them.
Resources are divided between the various virtual environments and the physical
environment. Users interact with the virtual environment and do their computing
there (typically called a guest machine or virtual machine). A virtual machine
is stored as a single data file, and like any digital file, it may be
transferred between computers, opened on either one, and expected to function
as intended.
When a user or program sends an instruction in the virtual environment that
requires more resources from the physical environment, the hypervisor relays
the request to the physical system and caches the changes; everything happens
at practically native speed (particularly if the request goes through an open
source hypervisor based on KVM, the Kernel-based Virtual Machine).
There are different types of hypervisors as shown in Figure 2.1. The selection
of the type will depend on the business needs.

Figure 2.1: Hypervisor types (Data, Desktop, Server, OS, Network)


• Data: different sources of data can be combined into a single virtual source
of data
• Desktop: desktop environments can be hosted centrally and used from many
different machines
• Server: a single server can handle multiple virtual servers and services
• Operating system: a machine can run multiple instances of operating systems
• Network: virtual networks can be created within a physical network
The use of virtualization is essential in DevOps. DevOps automates the de-
livery and testing phases of the software development process. The DevOps
teams may test and develop using systems and devices that are comparable to
those used by end users thanks to virtualization. In this manner, testing and
development are expedited and take less time. The program may also be tested
in virtual live situations before deployment. Real-time testing is made easier
by the team’s ability to monitor the results of any modification made to the
program. The quantity of computer resources is decreased by doing these op-
erations in virtualized settings. The quality of the product is raised thanks to
this real-time testing. The time needed to retest and rebuild the program for
production is less when working in a virtual environment. As a result, virtu-
alization eliminates the DevOps team’s additional work while assuring quicker
and more dependable delivery.
Virtualization in DevOps has various benefits, some of which include:
• Less effort required
There is no need to locally upgrade the virtualization-related hardware and
software because these changes are regularly made by the virtualization service
providers. A company’s IT team may concentrate on other crucial tasks, saving
the business money and time.
• Testing environment
We can create a local testing environment with virtualization. It is possible to
test software in this environment in several different ways. No data will be lost
even if a server fails. As a result, dependability is improved, and software may
be evaluated in this virtual environment before being deployed in real time.
• Energy-saving
Because virtual machines are used throughout the virtualization process rather
than local servers or software, it reduces power or energy consumption. This
energy is conserved, lowering the cost, and the money saved may be used for
other beneficial activities.
• Increasing hardware efficiency
The demand for physical systems is reduced by virtualization. As a result, power
consumption and maintenance expenses are decreased. Memory and CPU use
is better utilized.
Virtualization implementation difficulties
Virtualization in DevOps has many benefits, but it also has certain drawbacks
or restrictions.
• Time commitment
Even if less time is spent on development and testing, virtualization itself
takes significant time to set up and operate.
• Security hazard
The virtualization procedure increases the risk of a data breach, since remote
access and virtualized desktops or programs are not particularly secure options.
• Understanding of infrastructure
The IT team has to have experience with virtualization to deal with it.
Therefore, if a business wants to start using virtualization and DevOps, either
the current staff must be trained or new employees hired, which takes time and
is expensive.
In this section, we saw the advantages of virtualization for DevOps. We will
now create a virtualized machine locally, and then talk about the cloud for
DevOps.

Creating a virtualized machine on your computer


In this part, we will use an example of a type-2 hypervisor: VMware
Workstation Player. You can get some information on this website:
https://www.vmware.com/products/workstation-player.html
Once you download the software, you will install it on your local machine.
We are using this hypervisor because a free version is available to evaluate it:
- Launch the installer once VMware Workstation Player has been downloaded, then
continue the installation procedure. You will have the opportunity to install
an enhanced keyboard driver, even though you won't initially require it.
- Most likely, you already know which Linux OS you wish to try. Some Linux
distributions, but not all, are well suited to running in a virtual machine;
every 32-bit and 64-bit distribution works in a virtual computer. If you do
not know which to pick, you can go to the Ubuntu website and download an iso
file containing the Ubuntu distribution.
- Once you have downloaded the iso file, you can create a new virtual machine.
- Select the default option, Installer disc image, then browse to the iso file.
- You will have to select a few options, and then your virtual machine will be
ready to run.
Once all these steps are done, you have a Linux machine running on your macOS
or Windows environment. You will be able to do a lot of testing without any
fear of impacting your machine.
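If you prefer a command-line route instead of the VMware graphical installer, a
hypervisor such as QEMU can create a comparable virtual machine. This is only a
sketch; the file names, memory size, and disk size below are placeholder values.

# Create a 20 GB virtual disk image for the guest (placeholder size).
qemu-img create -f qcow2 ubuntu-vm.img 20G

# Boot the Ubuntu installer iso with 2 GB of RAM and 2 virtual CPUs.
qemu-system-x86_64 -m 2048 -smp 2 -cdrom ubuntu.iso -drive file=ubuntu-vm.img,format=qcow2 -boot d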
We have learned how to create a virtualized machine on your computer; we will
now learn how to create one in the cloud. In the next section, we will talk
about the cloud for DevOps.

Cloud for DevOps


Getting hardware has become very challenging. The supply chain crisis made
hardware deliveries complex, yet the demand for storage and computation power
kept rising and will keep growing. Companies have a large amount of data to
store and process. Because processing data and transactions is critical for
them, companies need to find third-party solutions such as cloud computing and
storage. Cloud computing is a cheaper solution than acquiring in-house hardware
resources. Not only is it difficult to get hardware, it is also complicated to
find technologists to manage hardware and operating systems.
There are different ways of using a cloud:
- private cloud
- community cloud
- public cloud
- hybrid cloud
These different types of cloud range from in-house hardware, which can be
considered a private network, to the public cloud, where the computing
resources are owned by corporations or institutions.

Benefits of Cloud Computing


With cloud computing, you may use services as you need them and only pay for
what you need. Without a lot of internal resources, we can manage IT operations
as an outsourced unit. Additionally, hiring engineers is rather expensive, and
putting together a technical team can take a lot of time.
The key benefits of cloud computing are:
- users spend less on computing and IT infrastructure, which improves
performance
- there are fewer complications with uptime
- the program is instantly accessible for updates
- operating system compatibility is enhanced
- backup and recovery
- scalability and effectiveness
- improved data security as a result of more storage capacity
The three major cloud computing offerings are:
• Software as a Service (SaaS)
A software distribution strategy known as SaaS involves vendors or service
providers hosting software and making it accessible to customers online (inter-
net). SaaS is becoming a more popular delivery paradigm as the core technology
for Web Services or Service Oriented Architecture (SOA). The internet makes
this service accessible to users all around the world. Traditionally, you had to
pay for the program upfront and install it on your computer. With SaaS, by
contrast, users pay a monthly subscription fee online rather than purchasing the
product. Anyone who needs access to a particular piece of software, whether it
be one or two people or tens of thousands of workers in an organization, can
sign up to become a user. SaaS is compatible with any device that has access
to the internet. Just a few of the crucial tasks that SaaS can assist you with
include accounting, sales, invoicing, and budgeting.
• Platform as a Service (PaaS)
Developers may construct apps and services on a platform and in an environment
made available by PaaS. This service is accessible online since it is hosted in the
cloud. PaaS services receive constant updates and feature additions. Businesses,
web developers, and software developers may all benefit from PaaS. It functions
as a platform for the creation of applications. It involves application deployment,
testing, collaboration, hosting, and maintenance as well as software support and
management.
• Infrastructure as a Service (IaaS)
IaaS is a model for cloud computing services. It provides users with internet
access to computer resources in the “cloud,” a virtualized environment. It pro-
vides bandwidth, load balancers, IP addresses, network connections, virtual
server space, and other computer infrastructure. The assortment of comput-
ers and networks that make up the pool of hardware resources are frequently
scattered throughout several data centers. This increases the redundancy and
dependability of IaaS. IaaS is a complete computing solution and an option for
small businesses looking to cut expenditures on their IT infrastructure, since
significant costs are otherwise incurred every year for upkeep and the
acquisition of new parts such as hard drives, network connections, and external
storage devices.

Hypervisor
A piece of hardware, firmware, or software known as a “hypervisor” enables
the setup and use of virtual machines (VMs) on computers. Each virtual system
is known as a guest machine, and a host machine is a computer on which a
hypervisor runs one or more virtual machines. The hypervisor treats resources
like CPU, memory, and storage as a pool that may be easily shared between
existing guests or new virtual machines. The independence of the guest VMs
from the host hardware allows hypervisors to maximize the utilization of a
system’s resources and increase IT mobility. Since it makes it possible for them
to be quickly moved between many computers, the hypervisor is sometimes
referred to as a virtualization layer. Multiple virtual machines can run on a
single physical server.
Figure 2.2 represents two types of hypervisors:
• Native or bare-metal (type-1) hypervisors
To manage guest operating systems and control hardware, these hypervisors
work directly on the host’s hardware. They are thus occasionally referred to as
“bare-metal hypervisors.” The most common server-based setups for this type
of hypervisor are corporate data centers.
• Hosted (type-2) hypervisors
These hypervisors run on top of a conventional operating system (OS), much like
other computer programs. A guest operating system runs as a process on the
host, and type-2 hypervisors insulate guest operating systems from the host
operating system. A type-2 hypervisor is a good fit for individual users who
want to run several operating systems on a personal computer.

Figure 2.2: Types of hypervisors
Cloud service providers offer different hypervisors in various geographies and
availability zones. Figure 2.3 shows how public clouds organize their regions.

Figure 2.3: Availability zones and regions


There are three major cloud service providers: Amazon Web Services (AWS),
Google Cloud Platform (GCP), and Microsoft Azure.
Depending on the price and the managed services you are familiar with, you
may choose between different providers.
The emphasis in this part will be on AWS.
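As a quick illustration, assuming the AWS CLI is installed and your credentials
are configured, you can list the regions and the availability zones of a region
from the command line; the region name below is only an example.

aws ec2 describe-regions --output table                                  # regions available to your account
aws ec2 describe-availability-zones --region us-east-1 --output table    # availability zones of one region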

Creating a cloud solution
The region in which the program will operate must first be selected. If your
instance doesn't need to be close to another one, you can choose any region you
prefer. We would recommend that if you are using your instance in the US, you
build it in a US region. We will build the following components of the system:
- An Amazon EC2 instance: a virtual server on Amazon's Elastic Compute Cloud
(EC2) that runs applications on the Amazon Web Services (AWS) architecture.
While EC2 is a service that enables corporate subscribers to run application
programs in a computing environment, AWS is a full and dynamic cloud
computing platform. EC2 may be used to create a virtually unlimited number of
virtual machines (VMs). You will need to select from a variety of instance
types with different CPU, memory, storage, and networking resource options.
- An Amazon S3 bucket: scalability, data accessibility, security, and speed are
all features of Amazon S3. For a variety of use cases, including data lakes,
websites, mobile apps, backup and restore, archives, business applications,
IoT devices, and big data analytics, Amazon S3 enables the storage and
protection of any quantity of data. To suit your unique commercial,
organizational, and compliance demands, you may optimize, manage, and
configure data access using Amazon S3's administrative tools.
Concretely, the exact steps to create an instance will depend on when you read
this book. Here are the high-level steps to create your cloud solution (a
command-line sketch using the AWS CLI follows the list):
1. You need to create an AWS account. If you just want to try how AWS works,
we recommend starting with the free tier. Once your account is created, you
need to log on to this account to start the next step.
2. You select your region and then the type of AWS component you want to
build. In this example, you need to select EC2.
3. When you start setting up your new EC2 instance, you are required to choose
the Amazon Machine Image (AMI). AWS provides many operating systems. We
would recommend using the same operating system you used in the section
Virtualizing hardware for DevOps.
4. You need to select the type of instance, which reserves hardware for the
instance you are creating. Some tiers are smaller and cheap to use.
5. You must choose which Virtual Private Cloud (VPC) and which subnets inside
your VPC you wish to create your instance in. It is preferable to decide and
arrange this prior to starting the instance. A VPC is a sub-cloud within
your cloud tenancy; for instance, if you need to create many independent
applications, you can choose to split your tenancy into several VPCs (aka
sub-clouds).
6. You have to choose a couple of other options which are less important. Then
you will be able to create an S3 bucket to have enough storage for your
application. The S3 bucket is not required if you don't need more space.
7. Once the EC2 instance is created, you will be guided to create a key to use
with an SSH terminal. We would recommend creating a new key and using it to
connect to this instance.
8. As the last step of creating the instance, it is advisable to set up the
inbound communication rules. For instance, if you want to run a Python
notebook on this EC2 instance, you will need to open port 8888 so it is
reachable from outside.
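For readers who prefer the command line over the console, here is a hedged
sketch of the same flow using the AWS CLI; the AMI ID, key pair name, security
group ID, and bucket name are placeholders that you would replace with your own
values.

# Create a key pair and save the private key locally (placeholder name).
aws ec2 create-key-pair --key-name my-devops-key --query 'KeyMaterial' --output text > my-devops-key.pem
chmod 400 my-devops-key.pem

# Launch a small EC2 instance (the AMI and security group IDs are placeholders).
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro \
    --key-name my-devops-key --security-group-ids sg-0123456789abcdef0

# Create an S3 bucket for extra storage (bucket names must be globally unique).
aws s3 mb s3://my-devops-example-bucket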
In this section, we learned how to create a computing instance in the cloud.
Because more and more DevOps/MLOps tools are cloud-based, getting familiar with
cloud services is critical. We would recommend reading some online tutorials
from the cloud providers; they all have an exhaustive list of tutorials that
can help you create examples and learn how to use the cloud. We will now
summarize what we learned in this chapter.

Summary
In this chapter, we learned how to create a machine in the cloud, how cloud
companies organize their regions, and what a private cloud is. The next chapter
will focus on how to build software and how to use libraries.

3 - Building software by understanding the whole toolchain
In this chapter, we will talk about all the steps involved in building software.
By the end of this chapter, you will be able to:
- use a compiler to create an executable
- tell the difference between dynamic and static libraries
- automate software builds using continuous integration software
- understand file version control
In the first section, we will review how a computer works. Then we will explain
how to talk to a computer by using a compiler. We will talk about the libraries
that help programmers build software, then about how to automate builds, and to
finish this part, we will learn about managing source code.
We will start by studying how a computer works.

How does a computer work?


It is best to start at the beginning if you want to grasp how a computer functions.
Alan Turing, who created a device to decode the Enigma code, was portrayed
by Benedict Cumberbatch in Morten Tyldum's 2014 film The Imitation Game.
The Enigma machine could render all communications between the Axis nations
incomprehensible to the Allies. In large part because of the machine created
by Alan Turing, the Allies were able to interpret these signals and win the war.
Figure 3.1 shows this device as depicted in the film. We can observe disks on
this machine that represent its state (which encodes the decoding algorithm).
These disks rotate to decode the Enigma messages.

Figure 3.1: Representation of the state machine of Alan Turing from the movie
The Imitation Game

Turing Machine
In a 1937 publication, the English mathematician Alan Turing described an
abstract computing system that could compute real numbers using an unlimited
amount of memory and a limited number of configurations or states. This
computing system, which came to be known as the Turing machine, is regarded
as one of the primary concepts in theoretical computer science. In turn, Alan
Turing is frequently referred to as the inventor of the modern computer and the
father of computer science.

State Machine
There were about 100 revolving drums in this electro-mechanical device. When
a character from an encoded message was fed into the device, it set off a series
of events in which the cylinders rotated and changed states, with the state of
each cylinder depending on the state of the preceding one.
The processing units of all computer devices are powered by current solid-state
systems, which are comparable to this process. A state machine essentially
comprises states and transitions. Consider a straightforward binary system,
where there are only two possible states: on and off. A state is essentially any
situation of a system that depends on earlier inputs and responds to later inputs.
The machine can compute or execute an algorithm thanks to these sequences,
which are conditioned by a finite set of states; these sequences represent the
machine’s program.

From the Von Neumann Architecture to the Harvard Architecture


John von Neumann was a Hungarian-American mathematician who contributed
to the creation of several significant ideas in foundational and applied mathe-
matics. His 1945 paper served as the foundation for the architecture of the

computers bearing his name. Von Neumann laid the groundwork for contem-
porary electronic stored-program computers by fusing a state machine with a
memory unit.

Figure 3.2: Von Neumann architecture
As shown in Figure 3.2, von Neumann’s architecture consists of a stored-program
computer, where instruction data and program data are stored in the same
memory. The central processing unit (CPU) is made up of the control unit and
the arithmetic/logic unit.
By supplying the timing and control signals that the other computer components
need and instructing them on how to respond to the program instructions, the
control unit enables the transition between states. The relevant data is processed
by the arithmetic/logic unit (ALU), which enables the execution of arithmetic
(addition, subtraction, multiplication, and division) and logic (and, or, not)
operations.
The computer may then store both the data to be processed and the program
instructions in the memory unit. A crucial component of Von Neumann’s archi-
tecture is the memory unit, which allows us to transmit a variety of program
instructions directly to a computer. Earlier computer architecture models had
fixed programs with a specific function, in striking contrast to this design.

The Harvard architecture followed the Von Neumann architecture. It was founded
on the idea that there should be distinct buses for data and instruction traffic. It
was primarily created to get around the Von Neumann architecture's bottleneck.
Unlike the Von Neumann architecture, there is a distinct physical memory
address space for instructions and data, so instruction fetch and data access can
proceed in parallel.
Today's computing devices such as desktop computers, laptops, and smartphones
are all modeled on a modified Harvard architecture. We could continue the
history of computers up to the architectures we use today, but the summary is
that the design stayed the same while growing more complex, adding cores and
levels of memory cache. The goal was to increase parallelism to get a more
performant machine. The point is that it became far too complicated to talk to
the machine directly. The need for a translator between humans and machines
was higher than ever: the compiler was born, and that's what we are going to
introduce in the next section.

Bridging Hardware and Software


As we previously saw, the architecture of computers became more and more
complex to speed up applications. We started getting architectures with many
cores (many CPU units) and many memory locations. The non-uniform memory
access (NUMA) architecture became mainstream on all the computers and
servers we are using today.

Figure 3.3: NUMA architecture


Figure 3.3 represents the high-level concept of the NUMA architecture: we can
see many cores (equivalent to the ALU of the CPU), the interconnection between
the memory caches of all these cores, and the way they access the main memory.
Of course, we could still use assembly code to talk to this machine as we did for
many years, but using compilers is much easier. We will now learn how to talk
to such an architecture by using compilers.

Generating an executable with compilers
Software systems are collections of programs in different languages. Beyond
their native instruction sets, computers speak a layering of languages, starting
with assembly and moving increasingly away from the machine's native language
toward high-level languages that are more easily understood by humans. Humans
may structure complicated systems using languages better suited for such tasks,
such as C, C++, Java, and Python. Every abstraction requires more labor and
logic to function, and with this added logic there is frequently a trade-off between
performance, complexity, and implementation across the various software
engineering methodologies.
What connects the hardware and software? All data within a CPU is expressed
in binary, which is the only language a CPU can understand. In essence, it is the
machine as seen by the programmer. The compiler enables us to overcome the
impedance mismatch between how people think about creating software systems
and how computers compute and store data. Compilers are also crucial for
accelerating program runtime. They translate computer code written in one
programming language (the source language) into another language (the target
language), converting code between higher-level and lower-level languages. They
may produce machine code, assembly language, object code, or intermediate
code. John Mauchly developed Short Code (also known as Brief Code) in 1949;
the ability to translate algebraic expressions was a feature unique to this
compiler. Then came COBOL, Lisp, Fortran, and C; more languages meant more
compilers. Compilers continued to grow smarter, becoming better at reconciling
the abstractions developers wished to express with efficient hardware execution.
To enhance software engineering, new programming paradigms were introduced.
Python and Java opened up object-oriented programming to a broad audience
in the 1990s.
A compiler can be divided into three phases, as represented in Figure 3.4 (a small
Python illustration follows the figure):
• the front-end phase deals with the pre-processing of the source code. It
validates the syntax and grammar of the source code and produces an
abstract syntax tree.
• the middle-end phase receives the abstract syntax tree resulting from the
parsing done in the previous phase. Many passes perform code optimizations,
which can target different objective functions. The most important one is
optimizing the runtime of the code; another example is optimizing energy
consumption.
• the back-end phase deals with the selection of the target, which can be a
target machine, a virtual machine, or simply another language. This phase
performs register allocation and instruction scheduling.

Figure 3.4: Compiler phases
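As a small illustration of these phases, Python's standard library exposes pieces
of its own toolchain: ast.parse builds the abstract syntax tree (the front end)
and dis shows the bytecode generated for the CPython virtual machine (the
back end). This is only an analogy to Figure 3.4, not the GNU toolchain itself.
import ast
import dis

source = "x = 1 + 2\nprint(x * 3)"

# Front end: parse the source into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # indent requires Python 3.9+

# Back end: generate code for the target (here, CPython bytecode).
code = compile(source, "<example>", "exec")
dis.dis(code)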
Dennis M. Ritchie, an American computer scientist, created the C programming
language at Bell Laboratories in the early 1970s. Since its inception, hundreds
of C compilers have been developed for various operating systems and types of
architecture. Thirteen years later, the GNU Compiler debuted with the express
purpose of being an open collaboration. Figure 3.5 shows how this C compiler
creates code that can be executed on an x86 Linux kernel.

Figure 3.5: GNU C compiler script


Figure 3.5 shows that the GNU C compiler is a script that makes several tool
calls. The first one, the C preprocessor (also known as cpp), uses the source files
and headers to perform simple transformations such as macro expansion. The C
compiler (cc) then handles the generation of assembly code for the x86
architecture, as we previously described. The next utility, the assembler (as),
creates a relocatable object file from the assembly code.
To create a single executable file or library, the linker is responsible for
assembling libraries and relocatable object files and assigning addresses.
The executable file format will be parsed by the operating system. This is
an ELF (Executable and Linkable Format) file on Linux, as opposed to a PE
(Portable Executable) file on Windows. Every operating system has a loader,
which decides which software components from the disk go where in memory.
The executable's virtual address space is allotted at this point. Then execution
begins at the entry point (for the C language, the function main).
The linker is critical in software engineering because it is the part in charge of
bringing in external code (libraries). Any language can import libraries from
other developers, helping them code faster and produce more reliable software
by reusing code that has already been tested. C and C++ programs can link
against static libraries, while Python typically relies on dynamically loaded
libraries. We will now see the difference between these two types of libraries.

Static vs. Dynamic Linking


In software engineering, many executable applications rely on external code
libraries, which are frequently provided by outside sources such as an operating
system vendor. There are two ways that a program can integrate this code to
deal with these external dependencies: by statically linking in all the code and
creating a standalone binary, or by dynamically linking in the external code and
requiring the operating system to inspect the executable file to determine which
libraries are required to run the program and load them separately.
Static linking will be used by the linker to combine the application code and
dependencies into a single binary object. There is no way for a program to
benefit from the same library being used by numerous applications because this
binary object has all the dependencies included, necessitating the loading of
each piece of code separately at runtime. For instance, the Glibc library is used
by several programs on Linux. These programs would waste a lot of memory
storing the same dependent library again if they were statically linked. Static
linking offers a key benefit: even when using objects retrieved from an external
library, the compiler and linker may collaborate to optimize all function calls.
Dynamic linking enables the linker to produce a smaller binary file by replacing
the locations of the required libraries with stubs. When an application starts,
the dynamic linker loads each required library by loading the relevant shared
object from disk. The dependency won't be loaded into memory until it is
required. When many active processes make use of the same library, the memory
allocated to the library's code may be shared among them. But this effectiveness
comes at a price: library functions are accessed indirectly through the Procedure
Linkage Table (PLT). This indirection may result in extra cost, particularly if a
short library function is called frequently. To reduce this expense, typical
latency-sensitive systems such as high-frequency trading (HFT) systems will
employ static linking wherever possible.
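To see dynamic linking in action from Python, the standard ctypes module can
load a shared object at runtime and call a function from it. The sketch below
assumes a Linux machine where the C math library is available as libm.so.6;
the library name would differ on other operating systems.
import ctypes

# Load the shared C math library at runtime (Linux-specific file name).
libm = ctypes.CDLL("libm.so.6")

# Declare the signature of cos() so ctypes marshals arguments correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0, computed by the dynamically loaded library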
Building software is a sequence of a few steps, which can be highly complex.
It is important to know how to automate this sequence of building software.
Build tools help to perform this process. We will now talk about how to build
software with build tools.

Building software with build tools
A build system is a group of software tools that are used to streamline the build
process, which is roughly defined as the process of "translating" source code
files into executable binary code files. Even though numerous build systems
have been developed and put to use for more than three decades, most of them
still rely on the same fundamental approach that Make introduced: describing
the build as a directed acyclic graph (DAG) of targets and dependencies. The
traditional GNU Make, CMake, QMake, Ninja, Waf, and many others are still
in use today. We will show you how to create C projects using some of these
popular build systems in this section.
Let's review two build systems: - GNU Make: Using the instructions in the
makefile, GNU Make builds projects. A makefile must be created to instruct
GNU Make on how to build a project.
This example creates an executable toto.exe by compiling the source toto.c.
# Makefile
CC = gcc

# Recipe lines must be indented with a tab character.
toto.exe: toto.c
	$(CC) toto.c -o toto.exe
• CMake: The adaptable, open-source CMake framework is used to handle
the build process, which is independent of the operating system and com-
piler. Unlike many cross-platform solutions, CMake is designed to be used
in conjunction with the native build environment. Simple configuration
files called CMakeLists.txt files are used to produce common build files
(such as Makefiles on Unix and projects/workspaces on Windows MSVC).
CMake can produce a native build environment that can assemble exe-
cutable binaries, compile source code, build libraries, produce wrappers,
and more. Multiple builds from the same source tree are consequently sup-
ported by CMake because it allows both in-place and out-of-place builds.
Both static and dynamic library builds are supported by CMake. CMake
also creates a cache file that is intended for use with a graphical editor,
which is a useful feature. For instance, CMake locates include files, li-
braries, and executables as it runs and may come across optional build
directives. This data is captured and stored in the cache, which the user
is free to edit before the native build files are generated. Source man-
agement is also made simpler by CMake scripts since they consolidate
the build script into a single file with a better-organized, understandable
structure.
This example creates an executable toto.exe by compiling the source toto.c.
cmake_minimum_required(VERSION 3.9.1)
project(CMakeToto)
add_executable(toto.exe toto.c)
We could give more examples of different build tools, but they all follow the
same principle. Now that we know how to build software, we would like to know
how to automate the build.

Building software using continuous integration


According to the development approach known as continuous integration (CI),
every time a piece of code is changed in a software program, a build should be
created, followed by testing. The purpose of this idea is to solve the issue of
discovering bugs late in the build lifecycle. Continuous integration was
introduced to make sure that code updates and builds are never carried out in
isolation, as opposed to developers working alone and not integrating often
enough. Continuous integration is a crucial step in any software development
process, and it helps the software development team answer questions such as
the following.
Systems can occasionally become so complicated that each component has many
interfaces. Always make sure that all the software components function har-
moniously with one another in such situations. The code may be simply too
sophisticated if the continuous integration process continues to fail.
The majority of test cases will almost always verify that the code complies with
acceptable coding standards. This is an excellent time to determine whether the
code complies with all the necessary coding standards by doing an automated
test following the automated build.
If the test cases don’t cover the necessary functionality of the code, testing the
code is useless. So it’s always a good idea to make sure that the test cases that
are prepared cover all of the application’s important scenarios.
Figure 3.6 illustrates the steps of the continuous integration process. The source
code initially lives in a version control system (see the version control section
later in this chapter). Once a developer modifies the code, the continuous
integration process triggers a build. If the build succeeds, the software testing
phase starts. Any errors in these phases will be reported to developers. If there
are no errors, the software will be deployed to the right environment.

Figure 3.6: Continuous integration process


We may pick from a variety of continuous integration technologies. There are
so many that it is difficult to choose the finest one. What you should look for
are the following criteria: - a strong ecosystem. A CI tool tries to expedite
project release and eliminate additional development labor. - compatibility with
clouds. A good CI tool should make it simple to transport data to and from the
cloud. - deployment options. A CI tool should make deployment straightforward.
- integration options. A CI tool should connect to other project-related software
and services. - security and safety. Whether it is open source or commercial,
a good CI tool should not increase the risk of getting data compromised.
A CI tool must meet the goals of the project and company while also being
technically competent.
We can list a few tools, though this list is far from exhaustive: - Jenkins is one
of the most popular free open-source CI programs and is frequently used in
software engineering. It is a Java-based, server-based CI program that needs a
web server to run; it makes automated builds and testing simple. - Atlassian's
Bamboo is a server-based CI and deployment platform with an easy-to-use
drag-and-drop user interface. Developers who already use other Atlassian
services (such as Jira) frequently choose this tool. Bamboo enables the creation
of new branches automatically and their merging after testing. Continuous
deployment and delivery are simple to do with this technology. - GitLab CI is
a free continuous integration tool with open-source code. For projects hosted
on GitLab, this highly scalable solution is simple to install and configure thanks
to the GitLab API. GitLab CI is capable of deploying builds in addition to
testing and building projects. This tool highlights the areas where the
development process needs to be improved. - CircleCI is a platform for
continuous integration and delivery. It may be used in the cloud or locally and
supports a variety of programming languages. Automated testing, building, and
deployment are simple with this program. Numerous customization tools are
included in its simple user interface. With CircleCI, developers can swiftly
decrease the number of problems and raise the quality of their apps. - TravisCI
has no requirement for a server because the service is hosted in the cloud.
Additionally, TravisCI has an enterprise-focused on-premises version. The fact
that this utility automatically backs up the most recent build each time you run
a new one is one of its nicest features.

Figure 3.7: CI tools
Figure 3.7 shows the market share of the different CI solutions.
In the rest of this chapter, we choose Jenkins to build our example.

Using Jenkins for continuous integration


Jenkins may be used as a server on many different operating systems, most
notably Linux but also Windows, macOS, and several Unix variants. It may be
executed on the Oracle JRE or OpenJDK and requires a Java 8 VM or above.
Jenkins typically operates as a Java servlet inside a Jetty application server.
Other Java application servers, including Apache Tomcat, can run it. Jenkins
has also been adapted to run inside a Docker container; the Docker Hub online
repository contains read-only Jenkins images.
Jenkins can use a large variety of plugins. Plugins simplify the integration of
other development tools into the Jenkins environment, enhance build and source
code management, and broaden the capabilities of the Jenkins Web UI. One of
the most common uses of plugins is the provision of integration points for CI/CD
sources and destinations.
Let's learn how to link our build software, CMake, with Jenkins: - ensure that
CMake can build the software by running it manually. For that, go to the
directory containing the source and the CMakeLists.txt file (the file with the
directives describing the project's source files and targets), then type cmake .
followed by make. If the software builds, you can proceed to the next step. -
log in to Jenkins. If you install Jenkins on your computer, you will be able to
use a web browser to access the default port 8080 at http://localhost:8080 -
create a new item and choose a freestyle project - link the project to the version
control system of your choice (for us, it will be the GitHub website) - add a
build step and specify where the CMake file is
Once all the steps are done, you can save the project and run it to check that
Jenkins starts the build by fetching the files from the GitHub repo and building
the software using CMake.
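If you prefer to drive Jenkins from a script rather than the web UI, the
third-party python-jenkins package wraps the server's REST API. The sketch
below is an assumption-heavy example: the job name, user, and API token are
placeholders, and it presumes a freestyle job like the one described above
already exists on the server.
import jenkins  # pip install python-jenkins

# Placeholders: replace with your Jenkins URL, user, and API token.
server = jenkins.Jenkins("http://localhost:8080",
                         username="admin", password="my-api-token")
print("Connected to Jenkins", server.get_version())

# Trigger the freestyle project configured to pull from GitHub and run CMake.
server.build_job("cmake-toto-build")   # hypothetical job name

# Inspect the job afterwards, e.g., its last completed build.
info = server.get_job_info("cmake-toto-build")
print("Last completed build:", info["lastCompletedBuild"])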
We will now talk about the missing part of this continuous integration process,
the source version control system.

How does version control software contribute to DevOps?


The process of monitoring and controlling changes to software code is known as
version control, commonly referred to as source control. Software technologies
called version control systems assist software development teams in tracking
changes to source code over time. They are especially helpful for DevOps teams
since they enable them to speed up deployments and cut down on development
time.
Every change to the code is recorded by version control software in a particular
form of a database. If a mistake is made, programmers may go back in time
and review prior iterations of the code to help repair it while causing the least
amount of interruption to the entire team.
Software engineers constantly write new source code and modify current source
code while working in teams. The code for a project, application, or piece of
software is often arranged in a “file tree” or folder structure. Each developer may
make modifications in various locations within the file tree, thus one developer
may be working on a new feature while another edits the code to address an
unrelated problem.
Version control helps teams deal with problems of this kind by tracking the
particular modifications made by each contributor and avoiding conflicts across
ongoing activities. A concurrent developer may make changes to the software
that are incompatible with those made in another area of the program. This issue
has to be identified and fixed systematically without preventing other engineers
from doing their work.
Without dictating a certain method to work, good version control software sup-
ports a developer’s workflow. Instead of imposing restrictions on what operating
system or toolchain developers must use, it should also operate on any platform.
Instead of the annoying and cumbersome process of file locking, which grants
the go-ahead to one developer at the price of obstructing the development of
others, excellent version control systems permit a seamless and continuous flow
of changes to the code.
Without any kind of version control, software development teams frequently
encounter issues like not knowing which changes have been made and are
accessible to users, or the creation of incompatible modifications between two
unconnected pieces of work that must then be carefully untangled and revised.
Developers who have never used version control may have added versions to
their files, sometimes with suffixes like "final" or "latest," and then dealt with
a new final version afterward. You may have code blocks that are commented
out because you wish to remove certain functionality but keep the code in case
you need it in the future. Version control offers a solution to these issues.
Version control software is a crucial component of the day-to-day professional
activities of the contemporary software team. Individual software engineers who
are used to working with a good version control system in their teams often
recognize the enormous benefit version control provides even on tiny solo
projects. Once they become accustomed to its powerful advantages, many
developers wouldn't even think about working without version control, even for
non-software tasks.
Version control systems have seen significant advancements, some of which are
superior to others. SCM (Source Code Management) tools or RCS are other
names for VCS (Version Control System). Git is one of the most widely used
VCS tools available right now. Git belongs to the DVCS category of Distributed
VCSs; more on that later. Git is a free and open-source VCS system, like many
of the most well-known ones on the market right now. The following are the
main advantages of version control, regardless of what the tools are called or
the technology employed.
1. A complete long-term modification history for each file. This refers to every
change made over time by various individuals. Changes include adding and
removing files as well as altering content. The handling of file renaming and
moving differs among VCS applications. This history should also contain the
author of the modification, the date, and any documented justification. Going
back to older revisions makes it possible to analyze the root causes of defects,
which is crucial for resolving problems in software that is more than a few years
old. If the application is still being developed, almost anything may be
considered an "earlier version."
2. Branching and merging. Team members should be able to work
simultaneously, but even those working alone benefit from being able to focus
on several streams of change. By defining a "branch" in a VCS, developers may
keep many streams of work distinct from one another while still having the
opportunity to merge them back together and guarantee that their changes do
not conflict. Many software development teams use a branching technique for
every feature, every release, or both. Teams may choose from a range of
workflow options when determining how to utilize a VCS's branching and
merging capabilities.
3. Traceability. Root cause analysis and other forensics benefit from the ability
to trace every change made to the program, link it to project management and
bug-tracking tools like Jira, and annotate each change with a note outlining its
goal and intent. When reading the code and attempting to understand what it
is doing and why it is written in a certain way, having the annotated history
of the code at your fingertips helps developers make changes that are correct,
consistent, and aligned with the system's planned long-term design. This is
especially important when working with older code and anticipating future
work on it.
The list of VCSs is also pretty long, from CVS to SVN to Mercurial and Git,
and we could write many comparisons. We just want to show what the
landscape looks like nowadays.

Figure 3.8: The most used VCS


Figure 3.8 shows the dominance of Git in the market. Git was created by the
Linux creator, Linus Torvalds, in 2005 and has since never ceased spreading into
companies and organizations. GitHub and Bitbucket are cloud-based hosting
services that let you manage Git repositories.
When using git, we are not tied to one given template. However, the most
widely used workflow is the Gitflow workflow.

Figure 3.9: Git workflow
Figure 3.9 shows how git is used in companies that develop software for their
business. We have a main branch named Master, the default branch where the
latest stable code can be found. The other branches help develop new releases:
the Develop branch is where development of new features takes place. Once this
development branch is stable and we are ready to create a release, we merge the
development branch into the Release branch. When code is in production and
we have a critical problem to solve, we cannot wait for the next release; we use
the Hotfix branch to change the code as soon as possible.
To ease your work in creating the whole DevOps toolchain, we recommend using
GitHub. The website will walk you through how to create an initial git repo
and then how to make some changes to the code.
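The branching model in Figure 3.9 maps onto a handful of git commands. The
sketch below wraps them with Python's subprocess module; the branch and
feature names are illustrative only, and it assumes you are inside an existing
repository with at least one commit.
import subprocess

def git(*args):
    """Run a git command and fail loudly on a non-zero exit code."""
    subprocess.run(["git", *args], check=True)

# Long-lived development branch created off the main branch.
git("checkout", "-b", "develop")

# Work on a feature in its own branch, then fold it back into develop.
git("checkout", "-b", "feature/login-page", "develop")
# ... edit files here ...
git("add", "-A")
git("commit", "-m", "Add login page")
git("checkout", "develop")
git("merge", "--no-ff", "feature/login-page")

# When develop is stable, cut a release branch for final stabilization.
git("checkout", "-b", "release/1.0", "develop")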

Change management
Change management is a method for transforming an organization's objectives,
procedures, or technology. It implements techniques for introducing, controlling,
and adapting to change. To be effective, change management must evaluate how
adjustments or replacements will affect processes, systems, and workers. Changes
must be planned, tested, communicated, scheduled, implemented, documented,
and evaluated. Change management requires documentation to establish an
audit trail and to assure compliance with internal and external controls,
including regulations.
Each change request must be reviewed for its impact on the project. Change
management is crucial in this process. Senior executives in charge of change
control must assess how a change in one part of the project might influence
other areas and the overall project.
It is important to put in place a set of metrics that help monitor the changes:
- Scope. Change requests must be examined for scope impact.
- Schedule. Change requests must be evaluated for impact on the project timeline.
- Costs. Change proposals must be examined for financial impact. Overruns on
project activities may rapidly increase project expenses since labor is the biggest
expenditure.
- Quality. Change requests must be considered for impact on project quality.
Rushing may cause faults, so accelerating the project timetable can impair quality.
- HR. Change requests must be reviewed for extra or specialized effort. The
project manager may lose crucial resources when the timetable changes.
- Communications. Appropriate stakeholders must be notified of approved
change requests.
- Risk. Change requests must be risk-assessed. Even slight adjustments might
cause logistical, budgetary, or security issues.
In the change management culture, there are three important steps:
1. Unfreeze: all the actors (tech, business, or other stakeholders) decide why and
how they need to change the current state.
2. Change the system: the people in charge of making the changes carry them out.
3. Freeze: once the changes have been made, we evaluate whether the new state
is better or whether it needs to be rolled back.
After this section, we know how to make changes and we know the process for
making them. We will now apply it in the context of software releases.

Building releases
After a set of changes, once a scope of features has been implemented, we may
decide to release our changes to production. We first need to ensure that the
change we will deliver will not damage the system (and of course, the stakehold-
ers of the system). For that, we will test if the system still performs correctly
after the set of changes we want to build.
As we saw earlier in this chapter, the first step of building a release is to stage
the changes going to production. For that, we can use a specific branch from the
revision control software. We then use a build system to compile and link the
code to obtain an executable.
Variety of Tests
Planning and output analysis are necessary for testing. To verify various
activities and requirements, DevOps experts may conduct various tests. Tests
are carried out not only to identify flaws in the source code and gauge its
accuracy, but also to check the software's integration, performance, accessibility,
and usefulness. As a result, testing proceeds in sequence, starting with unit
testing of the application to verify the smallest testable components of the
program.
Remember that the execution, scope, length, and data of functional tests might
change. DevOps practitioners need to have a strong knowledge of their testing
methodology before diving into any of these categories. They must realize that
high-level DevOps testing is done without analyzing the code structure, which
is usually the developers' job. By using tests that don't require knowledge of
the program's internal architecture, the DevOps approach to black-box testing
avoids these restrictions.

Summary
We will now conclude this chapter by summarizing what we discovered. We first
learned how to talk to a machine by using a compiler. We learned how to build
software by using a build system. Then we saw how to use CI software such as
Jenkins to automate the process of the build by using a version control system.
This chapter closes the first part of this book about DevOps. We will now start
a new adventure by learning what MLOps is. The next chapter introduces this
topic.

4 - Introducing Machine Learning Operations (MLOps)
In this fourth chapter, we introduce the motivation and concepts of MLOps. By
the end of this chapter, you will be able to: - Appreciate the need for MLOps -
Get an insight into the challenges in operationalizing ML models - Understand
the concepts and principles of MLOps - Manage ML workflow using agile project
management
In the first section, we introduce the motivation of MLOps and its similarity to
DevOps.

Motivation
The jury is out on how many data science projects do not make it to produc-
tion, with numbers hovering in the high 80s [1] . In whatever way you look at it,
this implies a low percentage of production operationalization of artificial intel-
ligence (AI)/machine learning (ML) models. That is in contrast to predictions
and societal hope that AI will improve our lives. So why this low percentage?
Different reasons contribute to this such as inadequate data management, siloed
enterprise organizations, not understanding ML technical debt, among others.
Machine learning operations (MLOps) principles help manage these hurdles for
ML model operationalization in production.
AI/ML model operationalization is similar but not the same as software deploy-
ment. Similar because you are writing and managing model code. This is where
DevOps principles that have proven to manage hurdles in software production
deployment are useful. Concepts such as scripting, task automation and CI/CD
outlined in the previous chapters are cornerstones in MLOps.

Different because an ML model is not just code but has an important data
dimension that determines the parameter values. This added complexity has
implications for data tracking and explains how, for the same (Python or R or
C++) algorithm code, you can end up with different model parameters. MLOps
manages this data complexity, and we go into more detail on this in the next
chapter.

ML Operationalization Complexities
There are multiple complexities around ML model operationalization that need
to be addressed by MLOps (Figure 4.1). ML models deal with a lot of data.
They need to have multiple training runs and experiments with possibly different
models. Once a model is decided upon it needs to be deployed into production
with governance, security and compliance in place. Data assumptions during
training may not hold in production so deployed models need to be continuously
monitored for drift. Models that have “decayed” need to be retrained and
redeployed in a systematic manner.

Figure 4.1: Complexities in machine learning (ML) model operationalization

ML Lifecycle
Using the above complexities as guides, the desiderata of an MLOps platform
are:
1. Integration with data infrastructure - enable easy data tracking and
management via:
   - Versioning - enforce that any changes to the data (for example, outlier
     removal, imputation) should be a version change of the data.
   - Location independence (cloud, on-premise, hybrid) - allow seamless access
     to data irrespective of whether it is on the cloud (for example, Amazon S3,
     Azure blob), on-premise storage or a hybrid combination of both.
   - Governance - govern all data validation, transformations and preparation
     such that sensitive fields (for example, gender, race, ethnicity) are not used
     for predictive modeling, and track all ML model releases for audit and
     reproducibility.
2. Collaboration - enable team management via:
   - Team development - ML model building in enterprises is a team sport, so
     support multi-user development via DevOps concepts such as CI/CD.
   - Experimentation in a centralized place - track all ML experimentation so
     that model building is consistent with the data version, the algorithm
     version and the third-party library version that is enforced when the team
     is using the same ML pipeline.
   - Reuse of pipelines - since ML model building is an iterative process, make
     it simple to change parameters and reuse ML pipelines for multiple
     experiments.
3. Ease of adoption - enable established best practices via:
   - Popular ML libraries/packages - support popular ML libraries (for example
     scikit-learn, Tensorflow, Pytorch).
   - Not-steep learning curve - leverage established ML model building
     principles such that there is no steep learning curve.
4. Managed deployment - enable easy model deployment via:
   - Containerization - make it simple to build containers with a few clicks to
     deploy ML models.
   - Model monitoring - support model monitoring with notifications such as
     model drift and data drift.
The above wish list is encapsulated in the ML lifecycle workflow in Figure 4.2. It
has 3 main pillars that we call phases - data management, model management
and deployment management. Below we go through each component of the
workflow. Later chapters go into the details for each component.

Figure 4.2: ML model building lifecycle

Data Management/DataOps
This phase deals with data complexities such as versioning and the consequences
of feedback loops, such as training data selection bias (Figure 4.3), where the
model performs well on one segment of the data and not on others.

Figure 4.3: Data complexity in ML model operation


The components that comprise this phase are:
1. Connect to Data Version - data plays an important role in ML model
building, therefore using the right version of the data is important.

2. EDA - perform exploratory data analysis to understand the data version.
3. Verify and Validate Data - critical to verify that the data is as expected
and then validate the data.
4. Feature Engineering - once the data is verified and validated build features
specific to the business problem.
5. Prepare Data - the data is prepared in the correct format for model build-
ing consumption.

Model Management/ModelOps
This phase addresses experimentation with ML model pipelines to determine
the correct model to deploy, as well as complexities such as indirect system
influence on models. For example, assume two models that work together, one
giving the user options and the second showing information for a selected option.
Then the behavior of, and updates to, the first model influence the outcomes and
selections from the second model, even though the two models are not directly
related (Figure 4.4). There are also direct system influence complexities through
signal mixing, such as ensembles and correction cascades where a model depends
on another model/pipeline.

Figure 4.4: ML model indirect dependency via external user interaction


The components that comprise this phase are:
1. Determine Algorithm - depending on the business problem and data
characteristics, you need to determine the algorithm to use.
2. Design Pipeline - once an algorithm is determined, put together the pipeline
that will use the data and the algorithm.
3. ML Experiments - post design of the pipeline, run the pipeline for different
experiments with different data versions and algorithms.
4. Hyperparameter Tuning - part of the ML experiments includes trying
different hyperparameters.
5. Track Experiments, XAI and Testing - track the outcome of each experiment,
compare them side-by-side, run explainability and test the models.
6. Model Card - once the models to be deployed to production are ready,
document them using a model card.

Deployment Management
This phase includes the deployment of ML models, production testing,
monitoring, and configuration challenges such as ensuring that modifications
are tracked and reproducible (Figure 4.5).

Figure 4.5: ML model production configuration challenges


The components that comprise this phase are:
1. Define and Configure Inference Service - configure where the model is going
to run in production as an inference service.
2. Deploy and Run Inference Service - once the production environment is
ready, deploy and run the model.
3. Production Testing - run any tests that are necessary, such as A/B testing or
Multi-Armed Bandit testing in production.
4. Performance Visualization and Model Monitoring - gain insight into the
performance of the deployed ML model with dashboard visualization and set up
ML model monitoring to detect performance deterioration.

Agile ML Lifecycle
Agile project management is a good fit for the ML lifecycle due to its iterative
nature. Given the multiple experiments with different data versions, data and
feature engineering code, and algorithm code, iterative processes work well. For
example, in the agile ML lifecycle it is common practice to take on technical debt
during experimentation with prototypes. Once a model (and its parameters) is
declared ready for production, the technical debt is paid down and the code is
brought up to production quality.
Let’s go into details of agile ML for data management (Figure 4.6). Once
you verify a specific data version, you do EDA on the data in terms of data
distribution and other statistical characteristics. During EDA you can connect
to different versions of the data per your statistical requirements. Thereafter
you validate the data in terms of filling gaps, scanning for outliers, among others.
Again you can go back to different versions of the data or do more EDA to better
understand the data. Next you develop business and domain specific features
in feature engineering. Based on the EDA outcome, you can explore multiple
features and their characteristics with different data versions. So each step of the
data management phase is executed in an agile manner with multiple iterations.
One reason is that data exploration and feature building is very exploratory and
innovative, so each iteration reveals new information to be used for the pipeline.

Figure 4.6: Agile ML Data Management
Likewise, each step in the model management phase is also iterative as you
explore and discover different combinations of algorithms, hyperparameters and
data features during model building. In the last step, the winning model is
documented using a model card. We go into these details in later chapters.
In the deployment management phase, first you design how you plan to run
your inference service and then deploy. After post deployment feedback (such
as production machine workload), you may go back and redesign the inference
service. Then depending on the outcome of production deployment tests, you
may again redesign (and maybe redeploy) the inference service. And based on
the performance and model monitoring, there may be redesign and/or produc-
tion redeployment.

What is AIOps
Artificial Intelligence for IT Operations (AIOps) is the use of machine learning to
automate and enhance IT operations. It enables IT Ops to separate the signal
from the noise, given that there are many sources of information such as machine
and application log files.
AIOps generates both descriptive pattern discovery (what happened when the
failure occurred) and predictive inference insights (if the last failure pattern
repeats, what are the chances of another failure?). An example of the former is
using correlation analysis to discover which patterns tend to happen together;
an example of the latter is root-cause analysis to identify which patterns are
responsible for recurring issues.

AIOps use-case examples range from predictive maintenance of IoT devices and
heavy machinery such as windmills and jet engines to monitoring computer
networks for Denial-of-Service attacks and other security issues.

Summary
In this chapter, we understood the need for MLOps and the multiple challenges
in operationalizing ML models. We covered the concepts and principles of
MLOps, how to manage an ML workflow using agile project management, and
the difference between MLOps and AIOps. In the next chapter, we look into
the intricacies of data preparation for ML model development.

[1] https://venturebeat.com/ai/why-do-87-of-data-science-projects-never-make-it-into-production/

5 - Preparing the Data


In this fifth chapter, we understand the importance of data in ML model devel-
opment and ways to protect data privacy. By the end of this chapter, you will
be able to:
• Appreciate the importance of data in ML models
• See the need for data versioning
• Understand how to ensure data privacy
We start this chapter with the importance of data and versioning it.

Time Spent on Data


In academia, a lot of emphasis is placed on the mechanics of algorithms and
the steps toward model building, for good reasons. Students need to understand
the fundamentals and the math behind the algorithms so that they know when
to use what strategy. In contrast, the industry places a lot more emphasis on
data and very little on algorithms, again for good reasons (Figure 5.1). The
industry needs to extract information and insights from data that benefit the
business. The actionable outcome is what drives business more than which
algorithms were used.

Figure 5.1: Time on data from academia to industry

Exploratory Data Analysis (EDA)


As explained in the last chapter, a good start with data analysis is to check
for statistical metrics such as mean, median, standard deviation, missing data,
and correlation. If the data has multiple categories, check for imbalance where
categories have unequal proportions of data points. This data exploration step
is EDA. Note that EDA is a statistical approach that is valid across all types of
data and independent of the business use case, unlike feature engineering, which
we discuss next.
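Before moving on, here is a minimal EDA pass with pandas; the file name and
the label column are assumptions for illustration only.
import pandas as pd

df = pd.read_csv("training_data.csv")      # placeholder file name

print(df.describe())                       # mean, std, quartiles per numeric column
print(df.isna().sum())                     # missing values per column
print(df.select_dtypes("number").corr())   # pairwise correlations

# Check for class imbalance if there is a categorical target column.
print(df["label"].value_counts(normalize=True))   # assumed label column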

Feature Engineering
In contrast to EDA, feature engineering is very much use-case-dependent. This
is where you build custom features that are relevant to the ML model, which in
turn depends on the use case. For example, if you are building an ML model
to predict the winner of a tennis match, then features such as double faults and
aces served are relevant. But they are not if you are building an ML model for
credit card fraud detection. Instead, fraud-relevant features such as the number
of transactions in the last three days and the locations of the transactions are
important. However, for either use case, EDA is valid.
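As a sketch of such use-case-specific feature engineering, the snippet below
derives the two fraud-related features mentioned above from a hypothetical
transactions table with card_id, timestamp, and merchant_city columns.
import pandas as pd

tx = pd.read_csv("transactions.csv", parse_dates=["timestamp"])  # hypothetical data

# Feature 1: number of transactions per card in the last three days.
cutoff = tx["timestamp"].max() - pd.Timedelta(days=3)
tx_count_3d = (tx[tx["timestamp"] >= cutoff]
               .groupby("card_id").size().rename("tx_count_3d"))

# Feature 2: number of distinct transaction locations per card.
n_locations = tx.groupby("card_id")["merchant_city"].nunique().rename("n_locations")

features = pd.concat([tx_count_3d, n_locations], axis=1).fillna(0)
print(features.head())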
In conclusion, understanding the data details is very important to extract the
most from the data. You should expect to spend the majority of an ML project's
time exploring, analyzing, and transforming the data. Given the time spent on
the data, there is a requirement (just like for code) to be able to try out different
data variations and roll back to an earlier version, capabilities that are provided
by a versioning system. This underscores the need for data versioning.

Data Versioning
As mentioned above, the time spent on analyzing and trying out different things
with the data needs to be systematic with a lineage. That is provided by a data
versioning system. Typically a data versioning system provides capabilities such
as
1. Origin determination so that you can always go back to the raw data as
it was collected before any custom modifications.
2. Version tracking so that you know which changes/transformations are in
the data that you are using.
3. Rollback so that you have access to previous versions that you can use
when iterating on model building.

Figure 5.2: Data versioning example


The above concepts are illustrated in Figure 5.2. You start with the original data
as it was acquired/collected. Post verification of the data, there are different
options to validate the data such as removal of outliers, imputation to fill miss-
ing data, mean centering, and winsorizing, among others. There are different
strategies for data imputation starting from mean imputation to least squares
to nearest neighbors. Mean centering means that you subtract the mean value
of the data from each data point. This moves the data to be centered around
the mean while leaving the distribution unchanged. On the other hand, win-
sorizing means you remove the extreme values of the data to limit the effect
of possible outliers. While some operations are commutative (remove outliers,
mean centering), others are dependent on the sequence. Most importantly, not
all operations may be needed.
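These validation options are simple to express in code. The sketch below applies
mean centering and a 5%/95% winsorization to a toy numeric column with plain
numpy; each variant would then be saved and tracked as its own data version.
import numpy as np

values = np.array([2.0, 3.5, 4.0, 5.0, 120.0])   # toy column with one extreme value

# Mean centering: shift the data so it is centered around zero,
# leaving the shape of the distribution unchanged.
centered = values - values.mean()

# Winsorizing: clip extreme values to chosen percentiles (here 5% and 95%)
# to limit the effect of possible outliers.
low, high = np.percentile(values, [5, 95])
winsorized = np.clip(values, low, high)

print(centered)
print(winsorized)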
At the outset, it is not clear what will work best from the perspective of model
training and model metrics. So you may have to try different options with
different sequencing. Data versioning keeps this exercise manageable with data
lineage that allows you to switch between different versions. Each version of

the data is expected to deliver different models with different parameter values.
So the data is the primary driver toward parameters, as we see in detail in the
next section.

Image data versioning


Unlike structured data, changes between versions of image data do not include
changes to the underlying image (Figure 5.3a). Instead, they incorporate changes
in the bounding boxes or image segmentation, as demonstrated in Figure 5.3b.
One approach often utilized is to store such meta-information separately from
the underlying image and to version that meta-data. For example, store the
bounding boxes' information as meta-data separate from the underlying image,
as shown in Figure 5.3c. When the bounding boxes are changed or relabeled,
update the meta-data version.

Panels (a) original image and (b) image segmentation are shown in the figure;
(c) the segmentation meta-data is the following JSON:
{"file_name": "000000332351.png", "image_id": 332351,
 "segments_info": [
   {"id": 8732838, "category_id": 70, "iscrowd": 0, "bbox": [45, 0, 379, 628], "area": 170095},
   {"id": 3882584, "category_id": 176, "iscrowd": 0, "bbox": [0, 0, 425, 323], "area": 29935},
   {"id": 2306905, "category_id": 190, "iscrowd": 0, "bbox": [0, 302, 425, 338], "area": 46272},
   {"id": 4539746, "category_id": 199, "iscrowd": 0, "bbox": [362, 254, 63, 369], "area": 11737}]}
Figure 5.3: Image versioning (a) original image (b) image segmentation (c)
segmentation meta-data.
Source: https://cocodataset.org
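One possible way to version such meta-data, while keeping the underlying image
untouched, is to store the annotations in their own JSON file and write out a new
file whenever labels or bounding boxes change. The file names and versioning
scheme below are illustrative only.
import json
from pathlib import Path

meta_path = Path("000000332351_meta_v1.json")    # hypothetical meta-data file
meta = json.loads(meta_path.read_text())

# Relabel one segment; the image file itself is never modified.
meta["segments_info"][0]["category_id"] = 72

# Persist the change as a new meta-data version for lineage and rollback.
new_path = Path("000000332351_meta_v2.json")
new_path.write_text(json.dumps(meta, indent=2))
print("Created", new_path)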

Software Stack 2.0


As Andrej Karpathy put it, "Gradient descent can write better code than you.
I'm sorry." This is quite a direct way of saying that the data is responsible for
the parameter values. We identify the algorithm to use and determine its
specification (i.e., the number of parameters to estimate). That generates a shell
with initial parameter values. The data then populates those parameter values
using optimization techniques such as gradient descent. This is referred to as
software stack 2.0.
In contrast, software stack 1.0 is deterministic coding, where functions have
parameter values written directly in the code and the function pre- and
post-conditions are clear. Software 2.0 parameter values are instead determined
by optimization techniques driven by the data, so the code behavior is
probabilistic, as demonstrated in Figure 5.4.

Figure 5.4: Software stack 2.0, where the software parameters are determined by data
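A tiny numpy sketch makes the point: we choose the specification (a line with
two parameters), and gradient descent lets the data fill in the parameter values.
The synthetic data below hides a slope of 3 and a bias of 1.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 1.0 + rng.normal(0, 0.5, 200)   # data generated with slope 3, bias 1

# Software 2.0: the "shell" is y_hat = w * x + b; the data determines w and b.
w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    error = (w * x + b) - y
    w -= lr * 2 * np.mean(error * x)   # gradient of mean squared error w.r.t. w
    b -= lr * 2 * np.mean(error)       # gradient of mean squared error w.r.t. b

print(round(w, 2), round(b, 2))        # close to the values hidden in the data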

Data Governance
Given the importance of data in software 2.0, data governance is a critical
component in any enterprise’s data management strategy. Data governance
covers the availability, validation, usability, privacy, and security of the data.
Security and privacy of the data are covered in the next sections of this chapter.
The first part, which includes data sourcing, validation, usability, and
transformations, is documented using the Datasheets for Datasets approach [4]
(it covers some security and privacy aspects as well). The datasheet approach
addresses the needs of both the data creators and the data consumers, enables
communication between them, and promotes transparency, accountability, and
reuse of data.
The datasheet covers the following set of questions (summary) with information
on the dataset -
1. Motivation – why was this dataset created and by whom? Who funded
the project?
2. Composition – what is in the dataset in terms of features? Are there
any recommended train/test splits? Is the data dependent on any other
(external) data?
3. Collection Process – how was the dataset collected and over what timeframe?
If individuals are involved, were they notified about the data collection, and did
they consent to the use of their data?
4. Preprocessing/Cleaning/Labeling – what transformations have been done
to the dataset (related to data versioning discussed previously)? Is the
ETL tool/software used for the transformation available?
5. Uses – what are the intended and not-intended use cases for the dataset?
Has the data been used already for any project?
6. Distribution – how to distribute and use the dataset from IP/legalities
perspective? When will the data be distributed?
7. Maintenance – who supports the dataset and how is it maintained? When
will the data be updated?
As you can see, data availability, validation, and usability are covered in the
datasheet. For details on any of the questions above or for example use cases
of the datasheet, please refer to the paper. Now that the data is ready to be
used, you need to verify that there are no privacy violations and that it is stored
securely. In the next section, we discuss data security.

Data Security
Data security is the prevention of data breaches, unauthorized access to data,
and information leaks. At the enterprise level, regulations such as the EU GDPR
[5] (General Data Protection Regulation) push enterprises to implement
system-level security protocols such as encryption, incident management, and
network integrity controls to protect data. Common techniques include access
control using security keys, access monitoring and logging, and data encryption
in transit.
At the individual level, in addition to enterprise security protocols, there are
regulations such as the California Consumer Privacy Act [6] that protect per-
sonal data privacy during the usage, retention, and sharing of data. Outside of
the regulatory framework, there are ways to enforce data privacy as we see in
the next section.

Data Privacy
Data privacy pertains to data that contains an individual’s personally identifi-
able information (PII) and general sensitive data such as healthcare and financial
data. It is also applicable to quasi-identifying data that can uniquely identify an
individual when merging data from different sources such as credit card transac-
tions and cell phone locations. In this section, we focus on techniques to impose
data privacy.

Federated Learning
A model training technique for preserving data privacy is federated learning that
does not need any data sharing during model training. The raw data (that may
contain sensitive information) is kept in the original location. An algorithm that
needs to use different data from different locations is sent to each location (local
server) for local model building. The model is trained locally and the model
parameters are sent to a centralized server to integrate into the global model.
A popular technique for parameter value integration is averaging the different
local values. As illustrated in Figure 5.5, the global model is in a centralized
global server and is sent to the local servers post-parameter update.
In this methodology, instead of sending all the data from different locations to the model, you send the model to the different locations where the data resides.
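A minimal sketch of the parameter-integration step (commonly called federated averaging) is shown below. The helper function local_train is a hypothetical placeholder for client-side training; only the averaging logic is shown.

import numpy as np

def federated_round(global_params, local_datasets, local_train):
    """One round of federated learning with simple parameter averaging."""
    local_params = []
    for data in local_datasets:
        # Each client starts from the current global model and trains locally;
        # only the resulting parameters leave the client, never the raw data.
        params = local_train(np.copy(global_params), data)
        local_params.append(params)
    # The central server integrates the local updates by averaging them.
    return np.mean(local_params, axis=0)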

Figure 5.5: Federated learning


The general characteristics of federated learning are -
1. Training data is not iid (independent and identically distributed) – given
that the training data is local to multiple clients.
2. Training data is unbalanced – non-uniform use of training data by different
clients.
3. Training is massively distributed - the number of participating clients is generally larger than the average number of data points per client.
4. Training has limited communication – offline or on slow or inexpensive
media to send the parameter values to a centralized server.
The other methodology for data privacy that is quite popular is differential privacy.

Differential Privacy (DP)
In the previous section we saw how privacy is preserved by not sharing the data but keeping it local and sharing only model parameter values. However,
this is not possible for many ML projects where data is sourced from different
locations and data sharing is imperative. Differential privacy (DP) is a math-
ematical definition of privacy that has shown the most promise in research to
preserve privacy during data sharing. It balances learning nothing about an
individual (from the data) and learning useful information about a population.
It is achieved by adding (some) noise or randomness to data transformations
such that conclusions made from the data are independent of any specific data
point.

The mathematical definition of DP is as follows - assume 2 datasets D and D' that differ by a single data point. A randomized transformation M (linear or nonlinear) provides (ε, δ)-DP when, for every possible outcome S, the results on D and D' are bounded by Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D') ∈ S] + δ. ε is the strength of the privacy guarantee, with a lower value indicating a higher privacy guarantee. δ is the probability that the ε bound does not hold and is generally set to be the inverse of the dataset size, rounded to the nearest order of magnitude.
There is a question of where to add the noise that divides DP into 2 strategies -
1. Global DP - randomness added to the data transformation for the entire
dataset. This adds less noise to the dataset but requires a trusted data
curator to handle the original dataset.
2. Local DP - randomness added to each data point. This requires no data
curator but does add more noise to the dataset.
An example of Global DP is noise added to the output of a database query. The
DB administrator is the data curator who queries the database, handles the
original data, and then adds noise to the query output. Another example of Global DP is the US 2020 census data [7].
An example of local DP is a toy survey where each student is asked if she/he is attending the University of Chicago. In response, each student follows this logic (which adds noise to each data value):
1. Flip a coin.
2. If tails, respond truthfully.
3. If heads, flip again and respond “Yes” if heads and “No” if tails.
So if we know the fraction Y of students who said “Yes” to attending the University of Chicago, then we can estimate the true proportion p as p = 2Y - 1/2, since half of the responses are truthful and the other half are a random “Yes”/“No” with equal probability.
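A small simulation of this randomized-response scheme (a form of local DP) shows how the true proportion can be recovered from the noisy answers; the numbers are illustrative.

import numpy as np

rng = np.random.default_rng(42)
true_p = 0.30                      # true fraction attending the university
n = 100_000
truth = rng.random(n) < true_p     # each student's true answer

first_flip_heads = rng.random(n) < 0.5
random_answer = rng.random(n) < 0.5        # second flip: heads -> "Yes"
responses = np.where(first_flip_heads, random_answer, truth)

observed_yes = responses.mean()
estimated_p = 2 * observed_yes - 0.5       # invert the noise: p = 2Y - 1/2
print(round(observed_yes, 3), round(estimated_p, 3))  # roughly 0.40 and 0.30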
PyTorch implements DP in a Python package called Opacus [8]. TensorFlow implements it in TensorFlow Privacy, which uses DP-Stochastic Gradient Descent (DP-SGD). Both clip per-example gradients and then add random noise to them. Note that there is a privacy vs accuracy trade-off. As shown in Figure 5.6, with an increase in privacy (going down the y-axis to lower vulnerability), accuracy goes down. As with most trade-offs, there is a sweet spot after which the drop in accuracy does not justify the increase in privacy guarantees.
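The core DP-SGD idea - clip each per-example gradient and then add calibrated noise before averaging - can be sketched independently of any library as follows. The clipping norm and noise multiplier are illustrative hyperparameters; in practice you would rely on Opacus or TensorFlow Privacy, which also track the privacy budget.

import numpy as np

def dp_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip per-example gradients and add Gaussian noise (DP-SGD style)."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping threshold.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=per_example_grads[0].shape
    )
    return noisy_sum / len(per_example_grads)   # noisy average gradient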

Figure 5.6: DP vs accuracy during model training


Source: https://blog.tensorflow.org/2020/06/introducing-new-privacy-testing-library.html

Synthetic Data
Another way to address data privacy is to use synthetic data for model build-
ing. In this case, the privacy concerns are based on how the synthetic data is
generated -

1. Using real data - data privacy for the real data is guaranteed through
techniques such as differential privacy and then that data is used to gen-
erate additional synthetic data using ML models. For example, generate
synthetic credit card transaction data using a Generative Adversarial Net-
work (GAN) and a small real dataset.
2. Without any real data - there are no data privacy issues as the entire
dataset is generated with synthetic data developed with simulated models
and using knowledge from subject matter experts. For example, generate
healthcare claims data where the features or the data fields are standard
and their types and values are created using Python libraries such as Faker [9] as determined by subject matter experts, as sketched below.
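A minimal sketch of generating such fully synthetic records with Faker might look as follows; the claim fields and value ranges are hypothetical and would in practice be specified by subject matter experts.

import random
from faker import Faker

fake = Faker()
Faker.seed(0)
random.seed(0)

def synthetic_claim():
    # Field names and value ranges are illustrative placeholders.
    return {
        "patient_name": fake.name(),
        "date_of_service": fake.date_between(start_date="-1y", end_date="today"),
        "diagnosis_code": fake.bothify(text="?##.#", letters="ABCDE"),
        "claim_amount": round(random.uniform(50, 5000), 2),
        "provider_city": fake.city(),
    }

claims = [synthetic_claim() for _ in range(1000)]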

Summary
In this chapter, we understand the importance of data not just from a data governance perspective, but also from a model-building perspective in software 2.0. In
addition to data versioning, we also discussed the need to preserve privacy and
security. Now that the data is ready for use in ML model building, we look at
feature development and feature store in the next chapter.

[1] https://pypi.org/project/autoimpute/
[2] Andrej Karpathy, Software 2.0, https://karpathy.medium.com/software-2-0-a64152b37c35
[3] https://www.kdnuggets.com/2020/12/mlops-why-required-what-is.html
[4] Timnit Gebru et al., Datasheets for Datasets, arXiv:1803.09010, Dec 2021.
[5] General Data Protection Regulation (GDPR) – Official Legal Text, https://gdpr-info.eu/
[6] California Consumer Privacy Act (CCPA)
[7] https://dataskeptic.com/blog/episodes/2020/differential-privacy-at-the-us-census
[8] https://github.com/pytorch/opacus
[9] https://faker.readthedocs.io/en/master/

6 - Using a Feature Store


In this sixth chapter, we continue from the last chapter to look at ways data
is used in ML model building and cover what a feature store is and the role it
plays in ML model development. By the end of this chapter, you will be able
to:

• Understand the motivation of a feature store
• Appreciate how a feature store promotes reusability to develop robust ML models quickly
• View important components of a feature store including automated feature
engineering
• Discover the benefits and challenges of a feature store
We start this chapter with understanding reusability in the model development
part of the ML lifecycle.

What is Reusable in ML Model Development?


A good practice in software engineering is to write reusable code and leverage
existing modules/functions/components. Extending that into ML model devel-
opment, what part of the ML model development (sans the performance metrics
and model delivery) is reusable? Looking at Figure 6.1, let’s go through each
module to find out -
1. Connect to a Data Version - data connectors are standard components
that are reusable from an engineering perspective, but this is not specific
to ML model development.
2. Data Verification - very specific to the data schema and often not reusable.
3. Data Validation - very specific to the data schema and often not reusable.
4. EDA - very specific to the data characteristics, but the statistical concepts
used for EDA (such as mean/variance) can be reused - this is related to
data discovery that is part of but not specific to ML model development.
For example, data discovery is needed also for data visualization.
5. Feature Engineering - this is about developing features that are character-
istics of the data (Chapter 5). However, these features may be useful in
other business problems (and ML models) that require the same data. For
example, a feature that aggregates hourly data to daily data can be used
in multiple different types of forecasting models. Therefore, the potential
for reuse.
6. Algorithmic Fine-tuning - this is specific to the algorithm and the data
characteristics, hence not reusable.

Figure 6.1: ML Pipeline Components from Data ingestion to Model Training
Therefore feature engineering holds the promise of reusability among the differ-
ent components. Given that is the case, one way to promote this concept is to
construct and store the features one time, and reuse those features many times.
That is what a feature store does.

What is a Feature Store?


A feature store is a repository for features that includes feature development and feature cataloging. This promotes feature discovery and reuse across different ML applications. A feature store connects to the data ingestion and is the interface between an ML algorithm and data. The basic components of a feature store are shown in Figure 6.2 and discussed below [1].

Figure 6.2: Feature store data flow for ML model from data ingestion to serving
both training and inference

Feature Store Components and Functionalities


The 3 major components of a feature store are
1. An offline feature store - this is for batch processing of features that do
not have strict latency requirements. For example, the offline store can be implemented as a distributed file system (ADLS, S3, etc.) or a data warehouse (Redshift, Snowflake, BigQuery, Azure Synapse, etc.).
2. An online feature store - this is for real-time processing of features serving
a feature vector as input for an ML model with strict latency requirements,
usually of the order of milliseconds. For example, the online store is ideally
implemented as a key-value store (e.g. Redis, MongoDB, HBase, Cassandra,
etc.) for fast lookups depending on the latency requirement.
3. A Feature Registry - this is to store feature metadata with lineage to be
used by both offline & online feature stores. It is a utility to discover all
the features available in the store and information about how they were
generated. It may include a search/ recommendation capability to enable
easy discovery of features.
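To make the offline/online distinction concrete, here is a deliberately simplified, library-agnostic sketch: the offline store holds historical feature values queried in batch for training, while the online store is a key-value lookup that returns the latest feature vector for one entity at inference time. The entities and feature names are illustrative.

import pandas as pd

# Offline store: historical feature values, queried in batch for training.
offline_store = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "event_date": ["2023-01-01", "2023-02-01", "2023-02-01"],
    "avg_daily_spend": [12.5, 14.1, 48.0],
    "num_visits_30d": [3, 5, 11],
})

# Online store: latest feature vector per entity, served with low latency.
online_store = {
    1: {"avg_daily_spend": 14.1, "num_visits_30d": 5},
    2: {"avg_daily_spend": 48.0, "num_visits_30d": 11},
}

def get_online_features(customer_id):
    # In production this would be a Redis/Cassandra lookup, not a Python dict.
    return online_store[customer_id]

print(get_online_features(2))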
With the above main feature store components, the functionalities supported
are:
1. Transformation - need to develop features from the incoming data. This
includes automated feature engineering that we talk about in the next
section.
2. Storage - need to store the features such that they can be used at any
time.

3. Versioning - need to version features so that different modifications can be
made to assess their impact on ML model training and you can roll back
to earlier versions if required. Akin to data versioning that is discussed in
Chapter 5.
4. Catalog - need to make the features discoverable by indexing the features
with metadata so that other ML applications know what is available. An
example of this discovery is using a natural language-based query retrieval.
This is often done by the Feature Registry. Furthermore, feature stores
can enhance the visibility of existing features by recommending possible
features based on a query of certain attributes and metadata. This gives less-experienced data scientists access to insights developed by experienced senior data scientists, thus enhancing the efficiency of the data science team.
5. Serving - need to support both training (batch mode) and inference (such
as streaming) modes.
6. Governance - need governance to manage the features and enable reuse, such as:
• Access - control to decide who gets access to work on which features.
• Ownership - identify feature ownership, i.e. responsibility for maintaining and updating features.
• Regulatory Audit - check for bias/ethics and comply with regulations to ensure that the developed features are not in violation.
• Lineage - maintain data source lineage for transparency.
7. Monitoring - need to monitor the incoming data used to create features and detect any data drifts. We discuss monitoring in Chapter 12.
As indicated in #1 Transformation, an important part of a feature store is au-
tomated feature engineering. There are different ways this can be implemented,
as outlined in the next section.

Automated Feature Engineering


Aggregations and transformations are two popular feature engineering techniques that can be automated for both continuous and categorical variables.
Let’s review each data type with its transformation and aggregation techniques:
• Continuous - Transformations (univariate): absolute value, imputation, mean centering, winsorizing, smoothing/averaging, binning, and changing the scale of the data using, for example, log, inverse, or power [2] transformations. Transformations (bivariate): the difference between 2 variables, odds ratio [3]. Aggregations: mean, median, standard deviation, variance.
• Categorical unordered (aka nominal) - Transformations: dummy encoding - assign numbers to the levels and ensure that if there are K levels you encode using only K-1 new variables (this ensures that, for algorithms such as linear regression, the coefficient matrix is not over-determined and is invertible); one-hot encoding - encode each level as a vector of size K if there are K levels; feature vectors - encode each level as a vector (usually of size smaller than K) where the distance (such as Euclidean) between the vectors is semantically determined, so semantically similar levels have vectors close to each other by the distance measure (for example, encoding colors with feature vectors will place the blue and azure vectors close to each other). Aggregations: count/frequency of a specific level, the number of times a level occurs in a given period.
• Categorical ordered (aka ordinal) - Transformations: numbering - assign numbers in ascending or descending order; for example, if an alert has low, medium, and high levels, the corresponding encoding may be 0, 1, or 2 to indicate the order of criticality. Aggregations: the same as for unordered categorical.
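A short pandas sketch of a few of these transformations and aggregations (the column names are illustrative):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=48, freq="H"),
    "sales": np.random.default_rng(0).gamma(2.0, 50.0, 48),
    "alert_level": np.random.default_rng(1).choice(["low", "medium", "high"], 48),
})

# Continuous: log transform and daily aggregations of hourly data.
df["log_sales"] = np.log1p(df["sales"])
daily = df.set_index("timestamp")["sales"].resample("D").agg(["mean", "std"])

# Categorical unordered: dummy encoding with K-1 columns (drop_first=True).
dummies = pd.get_dummies(df["alert_level"], prefix="alert", drop_first=True)

# Categorical ordered: map levels to numbers reflecting criticality.
df["alert_ordinal"] = df["alert_level"].map({"low": 0, "medium": 1, "high": 2})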
Next, we outline the benefits and challenges of a feature store and list 3 popular
open-source feature stores.

Benefits of a Feature Store


Using a feature store with the aforementioned components and functionalities
has multiple benefits as follows -
1. The well-defined interface between data and algorithm - feature store sits
between the data repository and algorithm and provides a well-defined
interface between the two. This makes it easy to add/remove data sources
and swap algorithms.
2. Centralized repository of features - feature building is not easy and takes
time. Feature store provides a central location where all the features are
generated and stored. This helps in writing robust code using pre-built
and tested features.
3. Promote reuse of features and collaboration - feature stores make it easy
to search for and reuse pre-built and tested features. Furthermore, it helps
in crowdsourcing features across the enterprise - you can leverage existing
features to build new features and store them in the feature store.
4. Training for junior data scientists - who have access to insightful features
developed by experienced senior data scientists.
5. Reduced time-to-market - with the reusability of built and tested features,
you can focus on fine-tuning the algorithm and building ML pipelines
(discussed in Chapter 8) for more iterations, thereby reducing the time-to-
market of your ML model.

Challenges of a Feature Store


While the aforementioned benefits are attractive, they do come with challenges
-

1. An additional component to build and maintain - Feature store is another
component in your ML pipeline that you need to build and maintain.
2. Risk of training-serving skew - Feature store is used for both training and
inference, each with different latency requirements. This often results in
different architectures for the two pipelines. Therefore there is always a
risk of training-serving skew where the data transformations/aggregations
are different between training and inference (discussed in Chapter 8).
3. Risk of becoming a feature swamp - adequate governance is required so that a feature store does not become a feature swamp, i.e. a dumping ground for features that are incomplete and/or never used.
4. Continuous monitoring - need continuous monitoring of the developed features to detect any unexpected drift or gaps (Chapter 12).
Open-Source Feature Stores
In this section we introduce 3 open-source feature stores -
1. Featuretools (https://www.featuretools.com/) - an open-source framework for automated feature engineering.
2. Feast (https://feast.dev/) - an open-source feature store with support for data located in the cloud and on-premise.
3. Hopsworks Feature Store (https://www.hopsworks.ai/) - an open-source feature store that can ingest data from the cloud and on-premise and is a component of the Hopsworks ML platform (which supports ML model training and serving).
There is a list of commercially available feature stores at https://www.featurestore.org/.

Summary
In this chapter we looked at the motivations and structure of a feature store and
how they are beneficial to ML model development. We also outlined automated
feature engineering and challenges that arise with a feature store and listed
open-source and commercial feature stores that are available in the industry. In
the next chapter, we use these features to build ML models.

[1] https://towardsdatascience.com/mlops-building-a-feature-store-here-are-the-top-things-to-keep-in-mind-d0f68d9794c6
[2] https://en.wikipedia.org/wiki/Power_transform
[3] https://en.wikipedia.org/wiki/Odds_ratio

7 - Building Machine Learning Models
In this seventh chapter we look at what is involved in building machine learning
models after the data is ready for ingestion (for example from a feature store
discussed in Chapter 6). By the end of this chapter, you will be able to:
• Understand what is meant by algorithm versioning
• Define what AutoML is and its purpose
• Describe an ML model using a model card
We start this chapter with versioning an ML algorithm.

ML Algorithm Versioning
You want to version ML algorithms for repeatability and governance, for reasons similar to data versioning (Chapter 5). But you may be thinking, what do I need to version when I am using, say, an algorithm from scikit-learn? To start
with, you may have different candidate algorithms that you want to try, each
corresponding to a different algorithm version.

Figure 7.1: Versions with different algorithms


For example, consider a brick-and-mortar retail sales prediction model that
incorporates weather. The input features (say from a feature store from Chapter
6) may be historical sales, temperature, precipitation, and day-of-week. These
features may be used with any of the following models: linear regression, random
forest, support vector machines, and neural networks, as illustrated in Figure
7.1. Each experimental run with the same data version (from Chapter 5) uses
a new algorithm type and version.
Another example of versioning is with different hyperparameters for the same
algorithm. Note that a hyperparameter is a parameter whose value is defined before training/learning [1]. For example, you can create different structures
of an algorithm listed in Table 7.1 with different hyperparameters. You can use
different versions of the random forest algorithm with different tree depths, or
use different versions of a neural network with different learning rates for the
retail sales prediction example (Figure 7.2).
So you can have combinatorial versioning of algorithms (Figure 7.1) with hy-
perparameters (Figure 7.2). In Chapter 8 we see how this works with data
versioning in ML pipelines.
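A minimal scikit-learn sketch of such combinatorial versioning, where each (algorithm, hyperparameter) pair is treated as a separate version, might look as follows; the logging here is just a list, whereas in practice it would go to an experiment tracker.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)

candidates = [
    ("ridge_v1", Ridge(alpha=0.1)),
    ("ridge_v2", Ridge(alpha=10.0)),
    ("rf_v1", RandomForestRegressor(max_depth=3, random_state=0)),
    ("rf_v2", RandomForestRegressor(max_depth=10, random_state=0)),
]

experiment_log = []
for version, model in candidates:
    # Each run corresponds to one algorithm version with fixed hyperparameters.
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    experiment_log.append({"algorithm_version": version, "r2": round(score, 3)})

print(sorted(experiment_log, key=lambda r: r["r2"], reverse=True))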

Algorithm | Hyperparameters
Linear Regression | regularization parameter, …
Random forest | number of estimators, tree depth, …
Support Vector Machine | C, gamma
Neural network | learning rate, …

Table 7.1: Algorithms and corresponding hyperparameters.

Figure 7.2: Versions with the same algorithm and different hyperparameters
As demonstrated, algorithm versioning helps you keep track of all the moving
parts in ML model building. Note that these are just sample algorithms that
you can use for the retail sales prediction example. There are multiple other
algorithm options including ensembling the aforementioned algorithms. Deter-
mining what is the best algorithm to use necessitates building models with each
and evaluating the model metrics. Sometimes the challenge may be that you
cannot train a model continuously from start to finish for whatever reason, be it time, hardware availability, data, or cost. At the same time, you do not want
to lose the training you have done till then. This is where model checkpointing
comes in handy.

Model Checkpointing
As outlined above, a machine learning model training may be paused for many
reasons, some decided by you such as suspend now and pick it up later to free
up computation resources, others not decided by you such as out-of-memory
or execution crash. Irrespective of the reason, you would want to restart the
training from the last completed point as opposed to restarting from the begin-
ning. This is what model checkpointing allows as demonstrated in Figure 7.3.
Checkpoints A, B and C save the state of a model during training and validation.
Checkpoints B and C pick up from the saved model states at checkpoints A and B, respectively, with each region indicated differently in the loss curves.
Model checkpointing is similar to model save but with a subtle difference. In the
former, you have the model source code so you are only saving the parameter
values. In the latter, you are serializing the complete model - parameter values
and model architecture. ML libraries such as TensorFlow [2] and PyTorch [3]
have APIs for model checkpointing.
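A minimal PyTorch checkpointing sketch, roughly following the pattern in the tutorial referenced above (the model, optimizer, and file path are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save a checkpoint: parameter values plus enough state to resume training.
torch.save({
    "epoch": 5,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "loss": 0.42,
}, "checkpoint.pt")

# Later (or after a crash): restore and resume from the saved state.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1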
Armed with algorithm versioning, hyperparameter versioning and model check-
pointing, now you are ready to make a serious impact on a business project.
However, manually determining which combination gives the best result (as
evaluated by a business metric) is cumbersome. Automated machine learning
is an automated methodology that does this automatically. In other words,
it builds models with multiple combinations of different algorithms and differ-
ent hyperparameters and ranks them based on a user-defined model evaluation
metric.

Figure 7.3: Model checkpointing during training

Automated Machine Learning (AutoML)


AutoML is automated ML model development spanning data ingestion, feature extraction and selection, feature engineering, hyperparameter optimization, and algorithm selection. AutoML automates the processes in Figure
7.1 and Figure 7.2 as illustrated in Figure 7.4. The benefit of such automation
is that a lot of the glue code needed for the aforementioned functionalities is
already available to you. The outcome of AutoML is ranked ML models (to be
precise ML pipelines that we look at in Chapter 8).

AutoML Usage and Benefits


AutoML is used in the industry in a variety of ways -
• Business and other analysts use AutoML to develop models without going
into the algorithm or hyperparameter selection details - these users are
sometimes referred to as citizen data scientists. The benefit of AutoML is
democratizing ML model usage. Not all employees in an enterprise need
an intensive math and algorithm code development background. AutoML
empowers citizen data scientists who may not have the intensive ML data
science background to impact enterprise productivity using ML models with some upskilling.
• Experienced data scientists use AutoML to narrow down the algorithm
and hyperparameter possibilities, including neural network architectures.
And then develop or fine-tune the models themselves. The benefit is that
trained data scientists with math and algorithm backgrounds use AutoML
as an efficiency tool. In the next section we discuss using AutoML to search
neural network architectures.

Figure 7.4: Automated Machine Learning (AutoML) steps

AutoML for Neural Architecture Search (NAS)


Neural Architecture Search is where you use AutoML to search for the best
(as defined by a loss function) neural network architecture from a set of possi-
bilities. For example, Google implements NAS using reinforcement learning or
evolutionary algorithms (such as genetic algorithms) to search the vast space of
possible neural networks to design neural network architectures.
A restriction of AutoML including NAS is that its search space comprises pre-
defined algorithms and hyperparameters that may be subject to the designer’s
bias. To remove this restriction, Google defined a concept called AutoML-Zero
[4] that uses evolutionary algorithms to start from empty programs and use
basic math operations as building blocks to construct ML code. Next, we list
popular open-source AutoML libraries.

AutoML Open-source Libraries
Below are examples of open-source AutoML libraries -
• AutoWEKA (https://www.automl.org/automl/autoweka/) - the oldest library, released in 2013.
• Auto-sklearn (https://automl.github.io/auto-sklearn/master/) – for ML models
• AutoKeras (https://autokeras.com/) – for DL models
• TPOT (https://epistasislab.github.io/tpot/)
• H2O AutoML (https://www.h2o.ai/products/h2o-automl/)
• Ludwig (https://eng.uber.com/introducing-ludwig/)
• FLAML (https://microsoft.github.io/FLAML/) – Fast and Lightweight AutoML
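As a quick illustration, a hedged sketch with FLAML might look as follows; the task, metric, and time budget are illustrative choices, and the data is synthetic.

from flaml import AutoML
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoML()
# FLAML searches over algorithms and their hyperparameters within the time budget.
automl.fit(X_train, y_train, task="regression", time_budget=60, metric="r2")

print(automl.best_estimator)   # name of the best learner found
print(automl.best_config)      # its hyperparameters
predictions = automl.predict(X_test)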

ML Model Development - Full/Low/No Code


Using AutoML reveals an interesting spectrum in ML model development. At
one end of the spectrum is automated ML development as defined in this chapter.
At the other end of the spectrum is manual ML development where a data sci-
entist works with the data and algorithms and trains them (maybe with model
checkpointing) using open-source libraries such as scikit-learn, TensorFlow, Py-
Torch, among others. Additionally, there is an interesting methodology in the
middle that we discuss next.
As shown in Figure 7.5, there is a semi-automated ML between the AutoML
and manual ML model development where you have off-the-shelf pre-trained
models available for use. Such pre-trained models target domain-specific use
cases. Their usage is expected in enterprises with data science teams who want
accelerators to solve a business problem. These enterprises do not have the
luxury of starting from scratch due to the competitive landscape but have data
scientists who can improve and fine-tune the pre-trained models. Let us explain
using a couple of examples. Assume two pre-trained models - an employee churn
propensity model to assign a probability to an employee to leave a company and
a patient infection propensity model to assign a probability to a hospital patient
to develop a site infection. Both models are developed using domain-specific
data collected from multiple sources that give them reasonable predictive power.
The employee churn model uses data from different enterprises with varying
numbers of employees. The patient infection model uses data from different
hospitals of different sizes (as determined by the number of beds). Both these
models have reasonable accuracy out of the box when applied to a specific
enterprise or a specific hospital. However, to increase their accuracy for that
specific company or hospital, the respective data science teams can fine-tune
the models using transfer learning [5] with the company or hospital-specific
dataset.

Figure 7.5: ML model development from manual to AutoML
Closely associated with the concepts of AutoML, semi-auto ML and manual
ML is how much code you will have to write when using them. As indicated in
Figure 7.5, this introduces the associated concepts of low-code or no-code that
differentiate from full-code -
Full-code – when you write code for an ML model (for example using libraries
such as scikit-learn, TensorFlow, or PyTorch). Associated with manual ML.
No-code – when you use graphical tools such as drag-and-drop components on
a workspace and connect them with arrows to build a system without writing
any code. Associated with semi-auto ML (no fine-tuning, just use the pre-built
ML model as-is) and AutoML (use the automatically built ML model as-is).
Low-code – when you use similar graphical tools but may have to update settings
in a configuration file or update parameters in a code file or write some basic
scripts. Also associated with both semi-auto ML (update scripts to fine-tune
and train the pre-built ML model with your dataset) and AutoML (update
scripts to change hyperparameters, if allowed by the AutoML framework).

Model Card
Just like a nutrition facts label tells you every detail about a product - how it was made, its constituents, etc. - a model card [6] (introduced by Google) tells you everything about the model. Specifically, a model card
outlines
1. Model details - basic information about when the model was built, version,
license details etc.
2. Intended use - what are the different use-cases that the model was built for and the out-of-scope use-cases.
3. Factors - what are the relevant factors used to build the model.
4. Metrics - what are the performance measures to evaluate model impact.
5. Evaluation data - what evaluation dataset (and preprocessing, if any) was
used to analyze the model.
6. Training data - what training dataset (and preprocessing, if any) was used
to build the model.
7. Quantitative analysis - what are the results of model analysis.
8. Ethical considerations - what ethical aspects (bias and fairness, discussed in Chapter 13) were taken care of to build the model.
9. Caveats and recommendations - warnings and recommendations on using
the model.
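A model card can be stored as structured metadata next to the model artifact. The sketch below is illustrative only - the field values are hypothetical and it does not follow any particular toolkit's schema.

model_card = {
    # Each key mirrors one of the model card sections listed above.
    "model_details": {"name": "retail_sales_forecaster", "version": "1.3.0", "license": "internal"},
    "intended_use": {"in_scope": ["weekly store-level forecasts"], "out_of_scope": ["pricing decisions"]},
    "factors": ["store location", "seasonality", "weather"],
    "metrics": {"rmse": 412.7, "mape": 0.083},
    "evaluation_data": {"source": "sales_2022_holdout", "preprocessing": "same pipeline as training"},
    "training_data": {"source": "sales_2018_2021", "preprocessing": "imputation, outlier removal"},
    "quantitative_analysis": {"mape_by_region": {"north": 0.079, "south": 0.091}},
    "ethical_considerations": "No personally identifiable information used.",
    "caveats_and_recommendations": "Retrain quarterly; not validated for new store openings.",
}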
As you can see, a model card details everything pertinent and important there is to know about a model. Now that you have an ML model documented with a model card, you need to ensure that there is model governance to keep track of
which model is deployed to or updated in production.

Model Governance
Given the possible Cambrian explosion of data versions, hyperparameters, and
algorithm versions that constitute a specific model, it is imperative that enterprises enforce strong ML model governance to keep track of which model is
currently in production and who authorized the last production release. This is
managed using approval workflows.
Approval Workflows are processes designed to ensure the authorized flow of in-
formation and/or artifacts in organizations. Such workflows are very familiar in
everyday enterprise activities such as applications for leave, expense reimburse-
ments, equipment allocation, and the like. There are industry-standard best
practices for these workflows, which each organization modifies to suit its needs
and culture. They have also become commonplace in the world of software
development, for example,
• A developer issues a “Pull Request” (PR) to a Tech Lead / Manager. The
code is reviewed and corrected before being merged into the release branch
of the code repository.
• Managers review unit test/integration test reports before releasing code
for the QA team to test. Additional tests may be required before the code
is deemed ready for QA.
• A Release Manager reviews QA reports before releasing software to a
production system.

Figure 7.6: Model approval workflow based on different roles and responsibilities
Approval Workflows are based on the concepts of Roles and Requests. A work-
flow typically defines a Request (to be performed by a specific Role), and mul-
tiple levels of approvals to be
provided by other Roles. In the first example provided above, the Developer
Role made a Pull Request to be approved by a Manager Role.
In addition, an ideal workflow should be customizable to suit the needs of the
organization and should maintain a log of requests and approvals (by whom and
when), thus providing a complete audit trail.
Let us follow the workflow in Fig. 7.6. Assume a data scientist completes an
ML model. Next, a manager reviews the model and provides feedback. Once
the model is approved by the manager, it is promoted for validation and testing
in quality assurance (QA). In QA, a data analyst (or any QA specialist) tests
the model. If the model has issues, then the data analyst provides feedback
to the data scientist, the model is rebuilt, and the cycle is repeated. Once the
model passes the QA test, the data analyst approves and promotes the model to
production. In production, an IT engineer deploys the model. All the workflow
roles, responsibilities, and subsequent actions are managed, labeled, identified,
and controlled using approval workflows. Such workflows are often managed by
MLOps platforms.
In conclusion, the demonstrated approval workflow maintains a complete log of
approvals (by whom and when) and promotions (dev to QA and QA to pro-
duction) for a production ML model. Such model governance improves the ML
production lifecycle with streamlined processes and accountability in a collabo-
rative environment.

Summary
In this chapter we understand algorithm versioning and discuss AutoML. We
also look at model documentation using a model card and no/low/full code ML
model development. Lastly, we discuss model governance given the different
data versions, hyperparameters, and algorithm versions. In the next chapter,
we discuss ML pipelines that are the backbone of ML experimentation and
model building.

[1] https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)
[2] https://www.tensorflow.org/api_docs/python/tf/train/CheckpointManager
[3] https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
[4] https://ai.googleblog.com/2020/07/automl-zero-evolving-code-that-learns.html
[5] https://en.wikipedia.org/wiki/Transfer_learning
[6] M. Mitchell et al., Model Cards for Model Reporting, Association for Computing Machinery, 2019, https://arxiv.org/pdf/1810.03993.pdf

8 - Understanding Machine Learning Pipelines


This eighth chapter pulls together data from Chapter 5 (including feature store
from Chapter 6) and ML algorithms and model building from Chapter 7 and
introduces experimentation using machine learning (ML) pipelines. By the end
of this chapter, you will be able to:
• Understand why ML pipelines are important
• Understand what to ensure when productionizing pipelines
• How to create data feedback loops
• Be cognizant of available open-source ML pipeline tools
We start this chapter with the notion of experimentation that is at the heart of
iterative ML model development.

Phases of Experimentation
The iterative nature of ML model building was outlined in the last chapter
where you managed data versions for different experiments. Experimentation
has 3 different phases that form an experimental wheel as defined by Stefan
Thomke [1]:
1. Generate verifiable hypothesis - this is where you have a hypothesis that
using data version 1.1.1 is better than version 4.0 (Figure 5.2) because
including outliers makes the model robust.

2. Run disciplined experiments - this is where you have a platform to run experiments where you build an ML model using different data versions
(Figure 5.2) and different algorithm and hyperparameter versions (Figure
7.1 and Figure 7.2) and compare the metrics.
3. Learn meaningful insights - use the feedback from the metrics comparison to derive insights, such as imputation is not important but outlier removal is, and use these insights to generate the next verifiable hypothesis (for example, to use version 2.1.1 in Figure 5.2).
The ability to run the above experimentation wheel is enabled by ML pipelines.
And the knobs to change for an ML experiment span data versioning, hyperpa-
rameter versioning, and algorithm/pipeline versioning [2] (Figure 8.1). For
example, you may want to experiment with the same data version in the same
ML pipeline but with different hyperparameters. Another experiment would
be using the same data version with the same algorithm hyperparameter but
different pipeline configurations such as different data mappings or mash-ups
with external data. A third configuration may use the same data version with
different algorithms. As outlined in Chapter 4 on the desiderata of an MLOps
platform, team collaboration with different members running different experi-
ments (some of which will fail fantastically like M2 and not all successful ones
will perform like M1) is important to find that winning model combination.

Figure 8.1 - ML experiments

ML Pipeline
An ML pipeline is a workflow to run the experiments outlined in the previous section
starting with data ingestion and ending with model delivery. It includes all the
steps necessary to train a model from DataOps and ModelOps in Figure 4.2 such as data validation, data preprocessing, data mappings (including any external
data mash-ups), model training, model evaluation, and model delivery (ready
for production deployment). Note that this includes defining an ML pipeline
for AutoML discussed in Chapter 7. A workflow example is given in Figure 8.2
-
1. Use a hypothesis to connect to a specific data version.
2. [Optional] Perform additional mapping/mash-ups on the data.
3. [Optional] Generate features - each new feature corresponds to a new
pipeline version.
4. Choose an algorithm and specific hyperparameters. Note that changing
hyperparameters corresponds to a new hyperparameter version.
5. Train an ML model (can be manual or Semi-Auto ML or AutoML from
Chapter 7).
6. Calculate model metrics (training error, if possible out-of-sample valida-
tion error).
7. Deliver the trained model ready to be deployed if needed (to be determined
based on the model metrics) - this completes the experiment.

Figure 8.2: ML pipeline example


Pipelines need orchestration so that the components are executed in the correct
order. In the Hidden Technical Debt in Machine Learning paper, the authors
found that one of the reasons for the failure of ML projects is the brittle and
often non-portable scripts that glue together the different components of an ML workflow.
This is mitigated by ML Pipelines that abstract the glue code and manage the
orchestration in a standardized way.
Pipeline tools manage the flow of tasks through a directed graph representation
of the task dependencies (Figure 8.3). Directed graphs ensure that task depen-
dencies are honored. They are acyclic (no cycles) so that the orchestration is
not going back to a previously completed task, and are directed such that the
execution flows in one direction.

Figure 8.3: Directed Acyclic Graph (DAG) used in ML pipelines

ML Pipeline Advantages
The technical advantages of ML pipelines are -
1. Independent steps so multiple team members can collaborate on the
pipeline simultaneously.
2. Start from any specific point - Re-run does not have to always start at the
beginning, you can choose a midpoint skipping steps not needed.
3. Reusability promoted by pipeline templates for specific scenarios that al-
ways follow the same sequence - for example, you can create a copy of
a pipeline and reuse for another experiment (i.e. use another data ver-
sion and/or another set of hyperparameters and/or another data map-
ping/external data mash-up).
4. Tracking and versioning of data, model and metrics.
5. Modular development with isolated changes to independent components
that enable faster development time.
6. Visual representation of components / functionalities deployed to produc-
tion.

The primary business advantage provided by ML pipelines is cost reduction due to -
1. Multiple quick iterations via different experiments that give more development time for novel models – something that data scientists love to do, leading to better job satisfaction and retention.
2. Standard processes to update existing models.
3. Systematic methodology to reproduce results.
4. Increased data governance with data and model audit – creating a paper trail.

Inference Pipelines
So far in this chapter we have discussed training ML pipelines where you run
different experiments to create the best (from your metrics-perspective) ML
model and deliver it ready for deployment. That model is then deployed to
production and is ready for inferencing. Here you have to be diligent so that
your inference pipeline includes all the data mapping(s) and mash-up(s) and
feature engineering that were done on the raw incoming data in the training
pipeline. As illustrated in Figure 8.4, all the steps prior to model run have to
be identical so that there are no differences between the data used to train the
algorithm to generate the deployed model, and the data used by the same model
for inferencing.
If there is a difference in the data sent to the model then your pipelines have
a Training-Serving skew. Be careful to avoid this skew. Otherwise you violate
one of the basic assumptions in machine learning that the training and inference
data are from the same data distribution.
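One common way to reduce this risk is to package the data transformations and the model as a single artifact so that the exact same preprocessing runs in both pipelines. A scikit-learn sketch (the feature names are illustrative):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.DataFrame({
    "temperature": [21.0, 25.5, 18.2, 30.1],
    "day_of_week": ["mon", "tue", "sat", "sun"],
    "sales": [120.0, 150.0, 300.0, 280.0],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["temperature"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["day_of_week"]),
])

# The pipeline is trained and deployed as one object, so inference data
# flows through the identical preprocessing used during training.
pipeline = Pipeline([("preprocess", preprocess), ("model", RandomForestRegressor(random_state=0))])
pipeline.fit(train[["temperature", "day_of_week"]], train["sales"])

new_data = pd.DataFrame({"temperature": [27.3], "day_of_week": ["wed"]})
print(pipeline.predict(new_data))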
Note that training and inference pipelines are complementary. The training
pipeline’s primary purpose is to evaluate models and deliver the one with the
best metrics. The inference pipeline’s primary purpose is to generate results
from the model and monitor the model for any drift (more on this in Chapter
12). More on this result generation and feedback for the ML model in the next
section.

Figure 8.4: Complementary Train and Inference ML Pipelines

Data Feedback and Flywheel


As you can see in the Inference pipeline above, there is a “store results + data”
component. This component constitutes data feedback and stores new data
input to the model during inference. The stored data serves as new training
data that is used in the training pipeline.
There are 2 types of data feedback in ML models -
1. Implicit feedback - a user clicks on an online advertisement implying the
ad was useful.
2. Explicit feedback - a user rates a movie in an online streaming service.
The motivation behind data feedback is to institute a cycle of model retraining
and data collection also known as data flywheel (Figure 8.5). The flywheel
concept is from Jim Collins [3] and it depicts a cycle that is enabled by all
the different components and leads to improvement.
From a ML perspective, the data flywheel concept is tied to Continuous Training
(CT). CT forms the third part of a CI/CD/CT pipeline where a ML model is
retrained with new training data. The need to retrain is enabled by different
triggers such as data drift (Chapter 12) or a heuristic time window such as
the first of each month. The data flywheel ensures that when CT is triggered there is new training data available to deliver an updated model. A recommender system is a good example of collecting training data using a feedback loop - when a user clicks on a recommendation, that click is used as positive (implicit) feedback.
In the next section, we outline three popular open-source pipelines for use with
your ML code.

Figure 8.5: Data Flywheel

Open Source Pipeline Implementations


There are multiple open-source pipeline implementations and we list 3 of them in
this chapter. Note that most of these implementations use YAML files to define
the pipeline components and dependencies (including external library versions).
An example of such a YAML file pipeline definition is in Figure 8.6.
name: spacy-example
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip
  - pip:
    - mlflow>=1.0
    - spacy==2.2.3
Figure 8.6: YAML environment file from the spaCy example in MLflow.
The popular pipeline frameworks are -
1. Apache Airflow (https://airflow.apache.org/) - originated at Airbnb, this is a workflow engine that manages scheduling and running jobs and data pipelines.
2. Kubeflow Pipelines (https://www.kubeflow.org/) - started as an open-source project at Google to manage TensorFlow jobs on Kubernetes.
3. MLflow (https://mlflow.org/) - provides a framework to track experiment metrics and parameters and visualize and compare them in a browser.
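For example, MLflow's tracking API can log the parameters and metrics of each experiment run so they can later be compared in the browser UI. A minimal sketch (the parameter and metric names are illustrative):

import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

for n_estimators in (50, 200):
    with mlflow.start_run():
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
        accuracy = cross_val_score(model, X, y, cv=5).mean()
        mlflow.log_param("n_estimators", n_estimators)   # hyperparameter version
        mlflow.log_metric("cv_accuracy", accuracy)        # experiment metric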

Summary
In this chapter we looked at the motivations behind ML pipelines and the ad-
vantages of using them both from technical and business perspectives. We also
outlined continuous training that extends the DevOps CI/CD methodology to
CT using a data flywheel. Lastly, we presented 3 popular open-source implementations for ML pipelines. In the next chapter, we look at model interpretability
and explainability.

[1] Thomke Stefan H., Experimentation Works, Harvard Business Review, 2020
[2] How MLOps Helps CIOs Reduce Operations Risk (xpresso.ai)
[3] J. Collins, Good to Great: Why Some Companies Make the Leap and Others
Don’t, HarperBusiness, 2001.

9 - Interpreting & Explaining Machine Learning Models
This ninth chapter discusses how to dissect an ML model from the perspectives
of explainability and interpretability. Until now, you have learned about build-
ing models using data versions (Chapter 5), feature stores (Chapter 6), ML
algorithms (Chapter 7), and experimentations with ML pipelines (Chapter 8).
Now that you have a built model, in this chapter you will be able to:
• Assess how to interpret the output of an ML model
• Understand how to explain the output of an ML model
• Use techniques such as causal inference and counterfactuals for ML model
explainability
We start this chapter with ML model interpretability.

ML Model Interpretability
Consider that you have built an ML model that outputs the probability of a
patient developing diabetes after 6 months and before 12 months. The inputs
to the model are a variety of healthcare data such as electronic health records, lab results, number of visits to the emergency department, among others. The
objective is to rank patients from high to low probabilities and determine which
high-probability (high-risk) patients should be contacted for medical interven-
tion/treatment.
The output of the ML model is 0 to 1, with 0 indicating no risk and 1 a certainty
of diabetes. From a business standpoint, you have to determine the threshold
above which you are going to medically intervene and reach out to patients. This
threshold is a function of your organization’s capacity to manage the number
of interventions in a given period, the intensity of the interventions, and the
history of health outcomes of patients with such probability modeling. In other
words, multiple business factors influence the threshold and how you interpret
the output of the ML model. For example, a large hospital group with sizable
resources will have a lower threshold for intervention than a small hospital with
limited resources with a higher threshold. Note that the ML model workings
remain unchanged as you alter your interpretation of the model output based
on external factors.
Consider another example where you build a model to determine the probability that a bank customer applying for a loan will default. Based on this probability you
will determine whether to approve the loan or not (Figure 9.1). The output of
the ML model is 0 to 1 with 0 indicating no risk and 1 a certainty of loan default.
Again, you have to determine how you interpret the output of the ML model
for loan approval. Possibly in 2021, you would have a greater appetite to give
loans, and depending on your bank budget your threshold may be low. In 2022,
given the economic conditions and interest rates, your threshold presumably
will be higher. As in the previous example, the inner structure of the ML model
remains unchanged as you change your threshold based on economic conditions.
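In code, this interpretation step is simply a business-chosen threshold applied to the model's probability output; the probabilities and thresholds below are illustrative.

import numpy as np

default_probabilities = np.array([0.05, 0.22, 0.41, 0.67, 0.88])  # model output per applicant

threshold_2021 = 0.6   # greater risk appetite: reject only at higher risk levels
threshold_2022 = 0.3   # tighter conditions: reject at lower risk levels

approve_2021 = default_probabilities < threshold_2021
approve_2022 = default_probabilities < threshold_2022
print(approve_2021.sum(), approve_2022.sum())  # 3 approvals vs 2 approvals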

Figure 9.1: Bank loan default probability interpretation using a threshold determined by external factors

ML Model Explainability
Now that you have determined how to interpret your ML model output, let
us take a look at why the ML model is giving a particular output. In other
words, we want to explain why in the first example the ML model is outputting
a particular diabetes probability for say John Doe and likewise in the second example why the ML model is outputting a particular loan default probability
for say Mary Jane. This is ML model explainability and is key to understand-
ing the inner workings of an ML model. It is also important for prescriptive
analytics - to use the model to determine what action(s) to take to change a
possible outcome. For example, maybe John Doe has a high probability of dia-
betes because of obesity so a possible intervention might be walking 20 miles a
week. Mary Jane may have a low probability of default because she has taken
loans previously that she has successfully paid off, so the bank now has higher
confidence in her model output. A popular ML explainability library is SHAP
that we discuss next.

Shapley Explainability
A popular ML explainability library is SHapley Additive exPlanations (SHAP), developed by Scott Lundberg and Su-In Lee in 2017 [1]. SHAP [2] is based on Shapley values, which come from cooperative game theory. They are named after Lloyd Shapley, who introduced the concept in 1951 and was later awarded the Nobel Memorial Prize in Economic Sciences. SHAP also explains individual data instance predictions, adopting techniques from a method called Local Interpretable Model-agnostic Explanations [3] (LIME).
The SHAP framework considers prediction for a data instance as a game. The
difference between the actual instance prediction and the average prediction of
all instances is the gain or loss of an instance from playing the game. SHAP
treats each feature value of a data instance as a “player”, who works with each
other feature values for loss or gain (= the difference between the instance
predicted value and the average value). As different players (feature values)
contribute to the game differently, the Shapley value is the average marginal
contribution by each player (feature value) across all possible coalitions (data
instances).
Note that SHAP explains the output of any ML model (using the shap.Explainer() or shap.KernelExplainer() APIs), including deep learning (shap.DeepExplainer()) and NLP and Computer Vision models (shap.GradientExplainer()).
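A typical usage sketch with the shap package is shown below; the model and dataset are placeholders, and the beeswarm and force plots produce the kinds of global and local views shown in Figure 9.2 and Figure 9.3.

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.Explainer(model, X.iloc[:100])  # background sample for the explainer
shap_values = explainer(X.iloc[:200])            # SHAP values for a batch of instances

shap.plots.beeswarm(shap_values)                 # global view, as in Figure 9.2
shap.plots.force(shap_values[0])                 # local force plot, as in Figure 9.3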
The Shapley value is the only attribution method that satisfies the properties Efficiency, Symmetry, Dummy, and Additivity, which together may be considered a definition of a fair payout -
1. Efficiency - for a data instance, the sum of its feature contributions should
be the same as the difference in its prediction from the average.
2. Symmetry - two feature contributions are the same if they have equal
contributions to all possible coalitions.
3. Dummy - a feature has a value of 0 if it does not have any impact on the
predicted value.
4. Additivity - for a combined game, a feature's contribution is the sum of its contributions to the individual sub-games. For example, the Shapley value for a feature of a random forest algorithm
is the average of the Shapley value of the feature across all trees.
Figure 9.2 is an example of a global view of feature importance using SHAP that aggregates the marginal contribution of each feature across the data instances. In the example,
we have a model to predict the risk of End Stage Renal Disease (ESRD) for an
individual given the features listed on the y-axis. As seen in the diagram, the
x-axis represents the influence of a feature on the model outcome (whatever that
may be) with the vertical line indicating no influence. Points to the left (right)
of the line denote a decrease (increase) in the output with feature (aka player
in game theory) value change. Feature values are codified as low (blue) to high
(red). Some feature contributions are explainable. For example, low (blue) val-
ues of the diagnosis of hypertensive CKD decrease the output (since they are to
the left of the line), in other words, reduce the risk of ESRD. Likewise, high (red)
values of the same feature increase the output, i.e. higher risk of ESRD (since
they are to the right of the line). Some feature values are not clearly explainable.
For example, time since hypertension indicates that while a high feature value
(red) lowers the risk of ESRD, a low value (blue) can increase or decrease the
risk. In conclusion, such a global viewpoint clearly demonstrates the impact of
individual features on the predicted output, thereby enabling users to develop
an actionable plan.

Figure 9.2: SHAP feature importance


In contrast to the aforementioned global view, Shapley values also generate a
local instance-specific view as illustrated in Figure 9.3. In SHAP terminology this is known as a force plot and demonstrates why a particular data instance
has a specific risk value (0.81), and what the contribution of each feature is
to change that risk value from the average. For example, the feature value
lab_creatinine_serum_avg is increasing the risk value (from the average) for
this specific instance.

In conclusion, Shapley values are important for ML model explainability that
is algorithm agnostic due to the cooperative game theory approach.

Figure 9.3: SHAP force plot

Causality, Counterfactuals, and Interventions


While ML model explainability demonstrates the impact of a feature on the
output, the implicit assumption is that a specific feature influences or causes
the change in the output (and not the other way around). For some use cases,
this may be simple to determine such as hypertensive CKD and risk of ESRD.
But there are use cases where this is not a given, such as a nation’s inflation
and unemployment rate (does high inflation cause high unemployment or the
other way around?). This is where causality and counterfactuals are important.
A counterfactual is akin to a what-if analysis to determine what would happen
to an ML model prediction for a data instance if the value of a feature was
different than what it is. Counterfactuals are executed using causal inference,
i.e. determine the output after purposely changing the value(s) of a specific
feature(s). In terms of statistical notation, an ML model outputs a prediction y = f(x) for a specific data instance x. For counterfactual analysis, we want to know the output f(x') for the modified instance x', where x' is the same instance with one feature x_j changed to a new value v (written x_j := v). This causal inference determines if the output changes in response to the feature change. If yes, then causality is determined, the output is explained by the input(s), and we know how to develop an actionable plan.
This actionable plan is an intervention to change the data instance feature based
on the counterfactual analysis.
Let us walk through an example. Imagine that John Doe has a high risk of dia-
betes as determined using a measurable clinical variable called HbA1c. HbA1c
prediction is done using an ML model that uses weight as one of the features. John is overweight. Note that when you start ML model development, the direction of influence may not be clear - is John's HbA1c high because he is overweight, or is he overweight because of high HbA1c? This is answered with counterfactual
analysis that checks whether, if John were not overweight, he would still be at a high risk of diabetes. Causal inference analysis is done using observational data, for example with a Python package called DoWhy [4]. If the risk is lowered with
weight reduction, then causality is determined. Once causality is determined it
helps with explainability - a factor that contributes to John’s high risk of dia-
betes is weight. So to reduce the risk, reduce weight. To make this actionable,
an example intervention is to walk 20 miles a week (Figure 9.4).
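A hedged sketch of this kind of analysis with DoWhy might look as follows; the simulated data and variable names are purely illustrative, and a real analysis requires a carefully specified causal graph and refutation tests.

import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 5000
age = rng.normal(50, 10, n)
overweight = rng.random(n) < 0.3 + 0.004 * (age - 50)          # boolean treatment
hba1c_high = (rng.random(n) < 0.1 + 0.25 * overweight + 0.002 * (age - 50)).astype(int)
df = pd.DataFrame({"age": age, "overweight": overweight, "hba1c_high": hba1c_high})

# Age is modeled as a common cause (confounder) of weight status and HbA1c.
model = CausalModel(data=df, treatment="overweight", outcome="hba1c_high",
                    common_causes=["age"])
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand,
                                 method_name="backdoor.propensity_score_matching")
print(estimate.value)   # estimated causal effect of being overweight on high HbA1c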

Figure 9.4: Causality and intervention

Summary
In this chapter we learned about techniques to interpret and explain ML models,
including causal inference and counterfactuals. In the next chapter, we look at
using containers to deploy ML models.

[1] Scott Lundberg and Su-In Lee, A Unified Approach to Interpreting Model Predictions, NIPS 2017, https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
[2] https://github.com/slundberg/shap
[3] https://github.com/marcotcr/lime
[4] https://github.com/py-why/dowhy

10 - Building Containers and Managing Orchestration
This chapter will introduce the use of containerized applications, and then we
will talk about process orchestration. These tools facilitate the DevOps process.
By the end of this chapter, you will be capable of:
• Creating an instance of Kubernetes
• Understanding the process to update ML models in production

Microservices and Docker containers
In this section, we will learn about the advantages of using microservices and
how to combine them with docker containers. We will first learn the pros of
using microservices.

Benefiting from microservices


For many years, microservices have been a part of software design. With this
method, services stay linked yet operate separately from one another as a
server-side approach to development. More and more developers use microservices
to increase performance, scalability, and maintainability. With conventional
architectural styles, it is difficult for different teams to work on individual
services without influencing overall operations.
Microservices are an architectural design pattern used in software development
that structures applications as a group of loosely coupled services, making it
simpler for developers to create and grow systems. The traditional monolithic
architectural approach, which views software development as a single entity, is
different from the microservices architectural approach.
The microservices approach divides software development into smaller, au-
tonomous “chunks,” each of which carries out a distinct service or function.
Integration, API management, and cloud deployment technologies are all used
by microservices.
Microservices arose out of necessity: as applications get bigger and more
complex, developers need a fresh approach to development, one that enables them
to easily extend programs as user wants and requirements change.
The benefits of using microservices are:
• Scalability
Microservices scale better than monoliths. Developers may expand certain
services instead of a whole program and handle specific activities and requests
more efficiently. Because developers focus on particular services, there is less
wasted effort. Microservices accelerate development since developers focus on
the individual services that need deployment or debugging, and faster
development cycles bring software to market sooner.
• Data Security
Microservices interact through secure APIs, which may offer greater data se-
curity than monolithic methods. Developers manage data security since teams
operate in silos (but microservices are constantly linked). As data security be-
comes a bigger problem in software development, microservices might help.
• Data Governance
Microservices enable more responsibility when dealing with data governance
standards like GDPR and HIPAA. Some teams may struggle with the monolithic

method’s comprehensive approach to data governance. Compliance processes
benefit from microservices’ more precise approach.
• Multilingual technologies
Microservices enable developers to utilize diverse programming languages and
technologies without compromising the software architecture. One developer may
code application functionality in Java while another uses Python. This
adaptability creates “technology-agnostic” teams.
Developers package and deploy microservices using Docker containers in private
and hybrid clouds. Microservices and cloud environments facilitate scalability
and speed-to-market.
One of the main benefits is that developers can access microservices from a
single cloud location, and cloud-based back-end modifications to one
microservice don't affect the other microservices.
We will now talk about Docker containers.

Leveraging docker containers


In a microservices architecture, Docker containers isolate applications from the
host environment. Docker packages applications into containers; each container
holds the standardized executable components, including source code and OS
libraries, needed to operate a microservice in any environment. Docker
containers help allocate and share resources. They automate application
deployment as portable, self-sufficient containers in the cloud and on-premise
Linux and Windows platforms. Their main advantage is that they are cheaper and
more efficient than virtual machines. They standardize development and
production environments and enable Continuous Integration.

Figure 10.1: Difference between Virtual Machines and Docker containers

Figure 10.2: Docker diagram
Figure 10.2 explains how Docker works. It is divided into the following
components:
• Images: serve as the foundation for containers.
• Containers: used to execute applications created from Docker images. We use
docker run to construct a container. The docker ps command may be used to see a
list of active containers.
• Docker Daemon: the host's background service in charge of creating, executing,
and disseminating Docker containers. Clients communicate with this daemon.
• Docker Client: a command-line program that connects the user to the daemon.
In a broader sense, there may be other types of clients as well, such as
Kitematic, which gives users a graphical user interface.
• Docker Hub: a repository for Docker images. The registry may be seen as a
collection of all accessible Docker images. One may also run one's own Docker
registries and use them to retrieve images if necessary.
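The docker run and docker ps commands above can also be driven programmatically. As an illustrative sketch (assuming Docker is installed and the daemon is running locally; the image name is arbitrary), the Docker SDK for Python talks to the daemon in the same way the command-line client does:

```python
import docker

# Connect the client to the local Docker daemon (the background service
# described above) using environment defaults such as DOCKER_HOST.
client = docker.from_env()

# Pull an image from a registry such as Docker Hub.
client.images.pull("python:3.11-slim")

# Equivalent of `docker run`: start a container from the image.
container = client.containers.run(
    "python:3.11-slim",
    command=["python", "-c", "print('hello from a container')"],
    detach=True,
)

# Equivalent of `docker ps`: list running containers.
for c in client.containers.list():
    print(c.short_id, c.image.tags, c.status)

# Fetch the container logs and clean up.
print(container.logs().decode())
container.remove(force=True)
```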

Figure 10.3: High-level workflow for the Docker containerized application life
cycle
Developers start the inner-loop process by developing code. In the inner-loop
stage, developers specify everything before sending code to the repository (for
example, a source control system such as Git). Committing to the repository
triggers Continuous Integration (CI) and the rest of the procedure.
The inner loop includes “code,” “run,” “test,” and “debug,” plus the actions
that come before executing the software locally. The developer uses Docker to
run and test the program. Next, we'll outline the inner-loop process.
DevOps is more than a technology or toolset; it's a philosophy that demands
cultural transformation. People, processes, and technologies together speed up
application life cycles and make them more predictable. Companies that embrace
a containerized workflow reorganize their people and processes to fit it.
DevOps replaces error-prone manual procedures with automation, improving
traceability and repeatable workflows. With on-premises, cloud, and closely
integrated tools, organizations can manage environments more effectively and
save money.
Docker technologies are available at practically every step of your DevOps pro-
cess for Docker applications, from your development box during the inner loop
(code, run, debug) through the build-test-CI phase and the staging and produc-
tion environments.
Quality improvement helps uncover faults early in the development cycle, reduc-
ing repair costs. By putting the environment and dependencies in the image and
delivering the same image across many environments, you encourage removing
environment-specific settings, making deployments more dependable.
Rich data from effective instrumentation (monitoring and diagnostics) helps
guide future priorities and expenditures.
DevOps shouldn’t be a destination. It should be introduced slowly via ade-
quately scoped initiatives to show success, learn, and improve.

Kubernetes
Kubernetes was created at Google to manage containerized applications in a
variety of settings, including physical, virtual, and cloud infrastructure. It
is an open-source technology that aids in the creation and administration of
application containerization. It can automate application deployment,
application scaling, and application container operations across clusters, and
it can build container-focused infrastructure. Let's learn why Kubernetes and
DevOps are not two mutually exclusive worlds but rather a perfect pair.

Pairing Kubernetes and DevOps


As we saw in Chapter 1, DevOps drives a company’s software development and
deployment. It connects previously siloed development and operational teams.
It mixes development and operations processes and workflows and offers them
a common infrastructure and toolset. This enables one team to see the other’s

code so they may discover bugs early. As DevOps gained popularity, teams cob-
bled together pipelines from various distinct platforms, requiring customization.
Adding a tool required rebuilding the pipeline, which was inefficient. The
solution they found was containerization. Containers bundle an application or
service's code and dependencies to operate in any software environment. By using
containers to execute microservices, enterprises can design flexible, portable
pipelines. This enabled them to add or alter tools without affecting the overall
process, providing seamless CI/CD pipelines.
As DevOps switched to containerized workloads, orchestration and scalability
issues arose. This is where Kubernetes came in. Kubernetes automates the deployment,
scaling, and administration of containers, allowing companies to handle thou-
sands of containers. Kubernetes delivers resilience, reliability, and scalability to
DevOps initiatives.
Kubernetes' key DevOps characteristics are:
• Everything in Kubernetes may be built “as code”: access restrictions,
databases, ports, and other settings for tools and apps, as well as environment
parameters, are declarative and saved in a source repository. This code may be
versioned so teams can push configuration and infrastructure changes to
Kubernetes.
• As enterprises adopt cloud-native initiatives, their teams need tools that can
be smoothly integrated across platforms and frameworks. Kubernetes can flexibly
orchestrate containers on-premises, at the edge, in the public or private cloud,
or between cloud service providers (CSPs).
• Kubernetes' automatic deployment and update rollback capabilities let you
deliver upgrades with no downtime. Tools and apps in a container-based CI/CD
pipeline are split into microservices, and Kubernetes updates individual
containers without affecting other processes. Kubernetes also makes it easy to
test in production to catch flaws and vulnerabilities before deploying updates.
• Kubernetes distributes resources effectively and only as required, lowering
overhead costs and optimizing server utilization. Kubernetes boosts development
productivity by providing fast feedback and warnings about code, expediting
issue fixes, and lowering time to market.
• Kubernetes may operate anywhere in a company's infrastructure. This enables
teams to implement it without switching infrastructures.
Let’s now learn more in-depth about the features of Kubernetes.

Deep diving into Kubernetes


Kubernetes has the following main features:
• Developing, integrating, and deploying new software
• Container-based infrastructure
• Application-centric management
• Automatically scaling infrastructure
• A consistent environment across development, testing, and production
• Loosely coupled infrastructure, where each part can function independently
• Higher resource usage density
• Predictably built infrastructure
The ability of Kubernetes to execute applications across clusters of physical
and virtual machine infrastructure is one of its primary features. Additionally,
it can execute applications in the cloud. It facilitates the transition from
host-centric to container-centric infrastructure.

Figure 10.4: Kubernetes Master Machine components

etcd
It contains cluster-wide configuration information. It is a distributed,
high-availability key-value store. As it may contain sensitive information,
only the Kubernetes API server may access it directly; other components read
and write it through the API server.

API Server
The API server exposes the Kubernetes API and handles all cluster operations.
Different tools and libraries can connect with the API server thanks to its
interface. Kubeconfig is a package that, along with the server-side tools, is
used for this communication.

Controller Manager
This component runs most of the cluster's controllers. It is a daemon that
collects and sends information to the API server in a non-terminating loop. It
watches the cluster's shared state and works to bring the current state toward
the required state. The replication, endpoint, namespace, and service account
controllers are critical examples. The controller manager manages nodes,
endpoints, and other cluster resources.

Scheduler
This component of the Kubernetes master is the master's workload-distribution
service. It monitors the load on cluster nodes and places workloads on nodes
with available resources. In other words, the scheduler allocates pods to
available nodes and manages the workload.

Kubernetes Nodes
The following node server components are required for Kubernetes master
communication.

Docker
Each node needs Docker to execute application containers in an isolated,
lightweight operating environment.

Kubelet
This small node service relays information to and from the control plane. It
reads configuration data from the etcd store and receives directives from the
master components. The kubelet maintains the work state of the node server and
manages network rules, port forwarding, and related tasks.

Kubernetes Proxy
This proxy service runs on each node and enables remote hosts to access
services. It forwards requests to the proper containers and performs load
balancing. It makes networking predictable, accessible, and isolated. It also
keeps track of the node's pods, volumes, secrets, and container health checks.

Master-node structure in Kubernetes
The Kubernetes Master and Node components are shown in Figure 10.5.

Figure 10.5: Kubernetes Master and Node structure

Summary
In this chapter, we underlined the advantages of using microservices. We
described how pairing containers with microservices shortens the time to market
for software. Then we scaled up by learning how to use Kubernetes. In the next
chapter, we will learn how to test ML models.

11 - Testing ML Models
In this chapter, we will talk about testing. Testing ensures software quality and
it is a critical part of the DevOps/MLOps process. You will learn about the
many forms of functional testing and their functions in software deployment.
By using several technologies that are linked with version control software, we
will investigate test automation and unit testing in further detail. By the end
of this chapter you will be capable of doing the following:
• Creating unit tests
• Creating performance tests
• Creating integration tests
• Creating UI tests
We will start by introducing what testing is.

Introduction to testing
Software/Model testing checks, monitors, and ensures quality standards across
the whole development operations cycle. In DevOps/MLOps, we are actively
engaged in managing the testing procedures of new software/new model as it is
being built. Testing is essential to make sure that the software complies with
the business requirements established for the application and that it merges
successfully with the current code without altering it or interfering with its
dependencies. When checking if a model still performs as well or better than the
prior version, we set metrics that will allow us to sign off on new modifications.
This set of metrics is linked to the business:
• Confusion matrix: we separate the dataset into training and test datasets. The
training dataset is used to train the model while the test dataset is put aside.
Once the model is trained, we attempt to predict using the test dataset. When
the results are arranged in a matrix, we can observe how many of our model's
predictions are accurate and how many are false: true positives, true negatives,
false positives, and false negatives.
• Accuracy: the closeness of the measured findings to the actual values. It
reveals how well our classification model can forecast the classes specified in
the problem statement.
• Recall/sensitivity/true positive rate: often utilized in situations where
catching every true case is crucial.
• Precision: often utilized in situations where it is crucial to avoid a large
number of false positives.
• Specificity: the true negative rate, which gauges the percentage of real
negatives that are accurately classified as such.
• F1-score: gauges the accuracy of a binary classification test. To calculate
the score, it takes into account the test's recall and precision.
• Any other metric which makes sense for the business: the money made by a
model, the cost saved by a model, the risk forecasted by a model, and so on.
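As a hedged illustration (not code from the book), the classification metrics above can be computed with scikit-learn; the true and predicted labels below are hypothetical test-set values:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels from the held-out test dataset (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Confusion matrix: rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

print("accuracy   :", accuracy_score(y_true, y_pred))
print("recall/TPR :", recall_score(y_true, y_pred))
print("precision  :", precision_score(y_true, y_pred))
print("specificity:", tn / (tn + fp))   # true negative rate
print("F1-score   :", f1_score(y_true, y_pred))
```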
We demonstrated that testing is an important step of code/model deployment. We
will now talk more in-depth about the various types of testing we will
encounter.

Functional testing
To assure quality and value, we run tests at various phases of the software
development process. Functional testing is carried out throughout the software
development process to ensure that the program complies with the requirements.

Each function is put to the test by having its arguments filled up with values,
then the result is compared to what was forecasted. Applications are built and
deployed with functional testing in place. As a result of DevOps’ cyclical nature,
software functions are continuously monitored and evaluated, which leads to a
source code revision for subsequent development cycles. In the whole iterative
process, functional testing is crucial.
The DevOps process is shown in the chart below. The construction and deploy-
ment phases of the delivery pipeline described in Figure 11.1 have been covered
in this book. We can observe that testing is an important part of this workflow.

Figure 11.1: DevOps workflow


In Figure 11.2, we show the order of the various tests we will present in this
chapter.

Figure 11.2: Sequence of the various tests


We can now further analyze each sort of functional testing with a comprehensive

understanding of the testing technique. We will start by talking about unit
testing.

Unit testing
A program’s whole code base is tested using unit testing to ensure that every
function and each component is operating as intended. The foundation of unit
testing is the examination of the smallest testable components, units, or blocks
that make up the program. As a result, after assigning values to the function’s
arguments, unit testing validates the function return (which is a single output).
It is an iterative best practice that makes the code base more independent and
modular the more often it is used during development. This indicates that unit
testing makes it simple to test and, if required, repair each new function in
the developers' branch. If unit tests are not performed throughout development,
issues become more difficult to isolate and repair as branches are merged into
new builds and developers keep building on top of undetected faults.
To put it another way, unit testing creates modularity and divides the software
into smaller programs, each with a particular purpose. Coverage is the
proportion of functional units or blocks that are exercised by the tests.
As an example of unit testing, we may create a function add(x, y) and call it
with arguments to compare the output to an anticipated value. This would
essentially be a unit test run.

def add(x, y):
    return x + y

Thus, calling add(6, 3) should return the number 9. The unit test would report
an error if the value were different.
To prevent any output variation that might invalidate the testing analysis, unit
testing needs control over the input data utilized for testing. Therefore, a testing
environment that is separate from the development process must be set up. It is
important not to link in any external components. For instance, if the code
depends on parameters fetched from a database, we don't want to use the database
in the unit tests. Instead, we create a mock database (which mimics the behavior
of the real database) and use this mock object in the unit test.
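Below is a minimal sketch of how these unit tests could look with pytest and unittest.mock; the fetch_threshold function and its database interface are hypothetical examples, not code from the book:

```python
from unittest.mock import MagicMock

def add(x, y):
    return x + y

def fetch_threshold(db):
    # Reads a tuning parameter that normally lives in a database.
    return db.get_parameter("threshold")

def test_add():
    # The smallest testable unit: one function, known inputs, expected output.
    assert add(6, 3) == 9

def test_fetch_threshold_with_mock_db():
    # The real database is replaced by a mock so the test stays isolated.
    mock_db = MagicMock()
    mock_db.get_parameter.return_value = 0.5
    assert fetch_threshold(mock_db) == 0.5
    mock_db.get_parameter.assert_called_once_with("threshold")
```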
We have learned how to create a unit test; we will now get a higher-level view
of how our software/model functions by talking about integration testing.

Integration Testing
By integrating the modules that were examined in unit testing into larger group-
ings, integration testing increases the complexity of testing software. One user
request requires the cooperation of many software functional units. This proce-
dure must be followed in a certain order to get the intended result. Integration

testing makes sure these components work together as required. In other words,
the integration technique checks to see whether the new modules are compatible
with the code base’s current modules.
Unlike unit testing, integration testing does not simulate a database with a
mock object; the integration approach has direct access to the database. As a
consequence, integration tests may not pass if they cannot access resources.
These resources were not needed for unit testing but are needed for integration
testing.
For unit testing, we used the function add as an example. To keep building on
this example for integration testing, we will use a matrix multiplication that
combines the add function with a multiply function, as shown in the sketch
below.
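A hedged sketch of such an integration test follows; the matrix_multiply function is an illustrative composition of the add and multiply units:

```python
def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

def matrix_multiply(a, b):
    # Composes the units above: each output cell is a sum of products.
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] = add(result[i][j], multiply(a[i][k], b[k][j]))
    return result

def test_matrix_multiply_integration():
    # Integration test: verifies the units work together end to end.
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    assert matrix_multiply(a, b) == [[19, 22], [43, 50]]
```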
Integration testing groups together many more components and gives a fuller view
of how the system works on a few use cases. We will now talk about acceptance
testing.

Acceptance testing and system testing


Since system testing and acceptance testing are both conducted from the
viewpoint of the user, we shall discuss them together because of their
similarities and interdependence. One verifies the consistency and usability of
the program, while the other verifies user engagement and its usefulness.
To assess how effectively all the components work together to meet the objectives
of the various requirements, system testing is done on a full and integrated
codebase that includes all of the functional needs of the application. System
testing, as opposed to integration tests, focuses on the results of the processes
themselves rather than how the modules interact. System testing assures that
results are accurate and dependable following business needs.
The goal of acceptance testing, often known as user acceptability testing, is
to ensure that the program is usable by users in real-world situations. The
user experience is the main emphasis of this testing, which makes sure that
several functional features of the application are evaluated to ensure smooth
and consistent user interaction.
We know how to test what was fixed or added in terms of features, and we will
now learn how to use regression testing to check that no prior features have
been broken.

Regression Testing
Regression testing aids in determining whether our program consistently delivers
when it is run repeatedly. When all of the tests reveal that a program is
operating properly and according to its design, it has succeeded.

To do this, we develop a golden performance reference that serves as a bench-
mark for how our program should function. We’ll be looking for our software
to perform just as well as its ideal reference every time we execute it.

Load Performance Testing


This form of testing is not functional: it assesses an application's stability,
speed, scalability, and responsiveness under a certain workload (e.g., data or
request volume). Functional needs are thus disregarded in favor of the
software's capacity.
The main goal of performance testing is to evaluate how well a system performs
under severe use scenarios including massive amounts of data, several concur-
rent requests, etc. As a result, application and computing system performance
limitations are investigated during load and performance testing. Load tests are
laborious and time-consuming; they are often planned for midnight execution
when the crew is off duty.

Canary testing and A/B testing


Canary testing and A/B testing are in some ways similar. We will first talk
about Canary testing.

Canary testing
In canary testing, a new software version or a new feature is tested with actual
users in a live setting. It is accomplished by pushing some code updates live to
a limited number of end users.
The new code or “canary” only affects a tiny number of people, therefore its
overall effect is minimal. If the new code turns out to be problematic or causes
issues for users, the modifications may be rolled back.
A small number of end users act as a test group for new code during program
canary testing. These users, like the canary in the coal mine, are not aware
that they are aiding in the early detection of application issues. Monitoring
software notifies the development team if a code update is problematic so they
may correct it before it is made available to more people, reducing the chance
that the experience for all users will be negatively impacted.
A canary release is an excellent approach to introducing small code modifications
connected to the introduction of new features or a new software version. Because
the code is made available to real users in production, the development team
can assess if the changes have the intended or anticipated effects rapidly.
Developers may move a small portion of users to new features in a new version
via canary deployment. The impacts of any problems with the new software
are limited by merely exposing a portion of the total user population to it, and

developers may more easily roll back a problematic release without having it
harm the user base as a whole.

How does the canary test work?


Canary testing adheres to a methodical, step-by-step methodology like other
kinds of software testing. The following are the steps:
Step 1: The testing users are chosen by the development team. This group is a
minor portion of the user base, but it is large enough to provide data that will
support useful statistical analysis. Users are not aware that they are part of a
test population.
Step 2: Developers set up a testing environment that runs concurrently with the
existing live environment. Additionally, they configure the system load balancer
to direct user requests from the chosen canary testers to the new environment.
Step 3: Developers start the canary test by sending the test users' requests to
the new environment. They keep an eye on the testers for as long as necessary to
make sure the new version is performing as planned.
Step 4: If the new software feature or version satisfies the deployment
requirements, it may be made available to all users. If it turns out that the
new version has flaws, performs badly, or causes some other problem for
consumers, the testers are sent back to the software's original version.
Step 5: The team publishes the program to a larger audience after fixing any
found flaws.
Canary testing makes it simple to validate new software or a new feature in an
existing application. Before being made available to a broader user popu-
lation, the performance of the code may be carefully watched. The danger of
widespread bad performance or negative user experiences is drastically reduced
since the canary is only deployed to a limited number of users. Additionally,
alterations or additions may be swiftly removed if it turns out that they cause
the program to run poorly, have problems, or cause unfavorable user feedback.
We will now talk about A/B testing.

A/B testing
A/B testing, often known as split testing, is a method for determining which
version of something helps a person or organization achieve a goal more
successfully. To make sure that modifications to a website or page component
are driven by facts and not just one person’s opinion, A/B testing is often used
in web development.
A/B tests are anonymous research in which the subjects are not made aware
that a test is being run. Version A serves as the control in a typical A/B test on
a Web page, whereas Version B serves as the variation. During the test period,

half of the website visitors get version A of the page, which has no modifications,
and the other half receive version B, which has a change intended to increase
a certain statistic like clickthrough rate, conversion, engagement, or time spent
on the page. It is determined if the control or the variation performed better
for the targeted aim by analyzing end-user behavior that is collected during the
test period [a] [a] .
The difference between A/B testing and canary testing is that in the former, the
two versions are both known to be functional. A/B testing focuses on checking
the preference of the user, while canary testing checks whether a new feature
works as intended; if the new feature doesn't work, it is easy to roll back to
the previous version.

Multi-Armed Bandit Testing (MAB)


The challenge with A/B testing from the previous section is that there is no way
to prioritize a specific model (say A or B) based on its performance while in
production. This is what can be done with multi-armed bandit testing (MAB).
MAB is a simplified version of reinforcement learning that balances exploration
and exploitation. Assume you have chosen a model for production deployment.
Exploration is when you are exploring other models and comparing their perfor-
mance vis-a-vis your production model. Exploitation is when you are focused
only on the production model to squeeze out the best performance.
In MAB testing, the routing of production data is governed by a router that
analyzes the performance of the models under test. So for a scenario akin to
A/B testing, the router would evaluate the performances of A and B models.
Based on the evaluation result, the majority of the inference data is routed
to the best-performing model. Minority subsets are randomly routed to the
remaining model(s), as illustrated in Figure 11.3. This dynamic strategy is a
balance between exploitation (the majority of the data to the best-performing
model) and exploration (the minority of the data to the other models). The
phrase multi-armed comes from the scenario where you are always evaluating
and exploring multiple models.
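As an illustration of the routing logic (not a production implementation), the following sketch uses a simple epsilon-greedy strategy, one common way to implement an MAB router; the model names and reward probabilities are made up:

```python
import random

class EpsilonGreedyRouter:
    """Routes most traffic to the best-performing model (exploitation)
    and a small random fraction to the other models (exploration)."""

    def __init__(self, model_names, epsilon=0.1):
        self.epsilon = epsilon
        self.rewards = {name: 0.0 for name in model_names}
        self.counts = {name: 0 for name in model_names}

    def choose_model(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.rewards))   # explore
        # Exploit: pick the model with the best average reward so far.
        return max(self.rewards,
                   key=lambda m: self.rewards[m] / max(self.counts[m], 1))

    def record_outcome(self, model_name, reward):
        # reward could be 1.0 for a correct prediction, 0.0 otherwise.
        self.counts[model_name] += 1
        self.rewards[model_name] += reward

router = EpsilonGreedyRouter(["model_A", "model_B"])
for request in range(1000):
    model = router.choose_model()
    # Stand-in for serving the request and evaluating the prediction.
    reward = 1.0 if random.random() < (0.8 if model == "model_A" else 0.6) else 0.0
    router.record_outcome(model, reward)
print(router.counts)   # most traffic should end up on model_A
```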
In the next section, we discuss the case when you do not have multiple models
to test in production but a single update to an existing production model.

Figure 11.3: Multi-Armed Bandit testing

Shadow Mode Testing - Champion-Challenger Paradigm


Sometimes you may not have the luxury to have multiple production-ready
models for A/B or MAB testing. You may have only a single update to the
current production model. But you still want to be sure that your model will
perform as expected in production before a complete switchover. Even better,
you want to make sure that this new update has better model performance. In
the industry, a champion-challenger paradigm is used for this approach. The
incumbent production model is the champion and the new updated model is
the challenger. Until the challenger proves itself (i.e. has a better performance),
the champion is not dislodged.
A method to determine if the challenger is worthy of the crown is shadow mode
testing. In this mode, the challenger is deployed in the “shadow” of the champion
and a minority of production data is routed to the challenger (see Figure 11.4).
If the data can be copied (say in models such as risk analysis of bank loan
applications) then both models receive and evaluate the same data. If the data
cannot be copied (say user web interface) then users with similar behaviors
such as clicking on the same link are divided between the champion and the
challenger. In either scenario, the results are logged and offline analysis is done
to determine the winner. If the challenger performs better than the champion
then a new champion is crowned and a complete switchover is done to the new
model.
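A minimal sketch of the shadow-mode routing described above follows; the champion and challenger functions are stand-ins for real model endpoints, and the logged scores would feed the offline analysis:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def champion_predict(request):
    # Stand-in for the incumbent production model.
    return 0.70

def challenger_predict(request):
    # Stand-in for the updated model deployed in shadow mode.
    return 0.72

def serve(request):
    # The champion's output is what the user actually receives.
    champion_score = champion_predict(request)

    # A copy of the same request goes to the challenger; its output is only
    # logged for offline comparison and is never returned to the user.
    challenger_score = challenger_predict(request)
    log.info("request=%s champion=%.2f challenger=%.2f",
             request, champion_score, challenger_score)

    return champion_score

serve({"loan_amount": 25000, "income": 80000})
```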

Figure 11.4: Shadow mode testing with the champion-challenger paradigm

User Interface testing


In the past, testing a user interface always required manual intervention, and
it was very difficult to automate this part. The Selenium library allowed some
automation by scripting the actions performed on a given user interface. UI
testing is a large part of testing applications that have a user interface. For
MLOps it matters less, since the target of testing is to ensure that the model
will work in production.

Summary
In this chapter, we reinforced the importance of testing in the DevOps/MLOps
workflow. We learned how to create a unit test and an integration test, and
talked about the other specific tests that ensure software and model quality. In
the
next chapter, we will talk about the monitoring step of the workflow.

12 - Monitoring Machine Learning Models


In this twelfth chapter we highlight post-production responsibilities after a
model has been tested and deployed. Specifically, in this chapter you will be
able to:
• Appreciate the need for ML model monitoring
• Understand the different ways that an ML model can underperform in production
• Decide on methodologies to detect and perform a root cause analysis of model
performance issues
We start this chapter highlighting the need for monitoring ML models.

Why is ML Model Monitoring Needed?


You need to monitor an ML model in production because the performance of
the model may deviate from the performance observed during training. Let us

understand how this may happen.
As illustrated in Figure 12.1, you can think of an ML model as having data
inputs X and output y . In terms of probabilities, there is the input probability
P(X) , the input-conditional output probability P(y|X), and the output marginal
probability P(y) .

Figure 12.1: Input-output relation in an ML model


The model performance deterioration may be due to a change in any of the 3
probabilities outlined in Figure 12.1. Starting with P(X) , a data shift in dis-
tribution between training and production is known as a covariate shift. Note
that the other probabilities do not change. This shift is a violation of a key
assumption in ML model development that states that production data is from
the same distribution as the training data. In other words, the patterns identi-
fied by the developed ML model during training are valid in the future in the
not-yet-seen production data. In covariate shift, this is no longer valid and we
look at it in the next section.
If P(y) is different in production from training (and the other probabilities
remain unchanged), then that shift is known as the prior probability shift and
we look at it later in this chapter. Finally, if P(y|X) (with the other probabilities
unchanged) is different in production from training, that is known as concept
shift and is discussed later in the chapter.
As stated, any of the above may cause unexpected model performance. To detect
such a change in model performance, we monitor model metrics in production
and compare them to expected values measured during training. We discuss
this approach later in this chapter.
Another reason for the model performance deterioration may be the system
health (the machine on which the ML model is running) and we also look at
how to monitor that as well.

Why Production Data may be different from Training Data


In this section, we look at the reasons why the production data may be different
from the training data. Before that, note that while training data is in batches
(and mini-batches), production data may be streaming. To compare them, you
need to collect a significant amount of streaming production data over time

so that a statistical comparison can be made to verify if the distribution is
significantly different from the training data distribution.
The production and training data may be different because of the following
reasons -
1. Sampling bias - it could be that the training data that was sampled from
a population is biased and is not a true representation of the underly-
ing population distribution, as demonstrated in Figure 12.2. Therefore,
during production, the ML model is exposed to a population distribution
that is different from the training data distribution. For example, using
cartoon pictures of dogs to train a model to identify the type of dog while
production data includes real-life pictures of dogs.

Figure 12.2: Biased training sample distribution different from the test sample
2. Non-stationary environment - an ML production model that is receiving
outside data may be exposed to a non-stationary environment where the
data characteristics (such as mean and variance) are changing with time
(as illustrated in Figure 12.3) and the data processing does not correct
for it. For example, trending data such as movie ticket sales over the
years is data from a non-stationary environment. Note that there are
data processing techniques available to convert non-stationary data to
stationary data, but they are beyond the scope of this chapter.

Figure 12.3: Non-stationary feature
Now that we understand why production data may be different, in the
next sections we cover characteristics of those differences, how to detect them
and importantly, how to correct for them such that the ML model regains
performance in production.

ML Model Feature Different in Production Data from Training Data - Covariate Shift
As we explained earlier, covariate shift happens when P(X) in Figure 12.1
changes between training and production. An example is illustrated in Figure
12.4 [1].

Figure 12.4: Covariate shift with different distribution in training (before) and
production (after)
An example of covariate shift is when an image ML model is developed to detect
cars using black-and-white pictures and the production data contains colored
images of the same cars as in the training data. Another example of covariate
shift is when a spoken English speech recognition algorithm to detect what is
being said is trained using an Australian accent and used with an American
accent. A third example is when a disease detection algorithm using patient data
is trained with data from 20- and 30-year-olds and used on Medicare (ages 65 or
older) population data.

Detect Covariate Shift


There are different statistical tests and machine learning models that can be
used to detect covariate shifts in the ML features. In this chapter, we outline
different statistical tests for the distribution comparison of different types of
features as follows -
1. Categorical feature - you can use the chi-squared [2] test that uses the
contingency table.
2. Numerical feature - you can use the Kolmogorov-Smirnov [3] test to compare
distributions.
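As a hedged sketch (not code from the book), both tests are available in scipy.stats; the training and production samples below are synthetic placeholders:

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

rng = np.random.default_rng(0)

# Numerical feature: Kolmogorov-Smirnov test on training vs production values.
train_values = rng.normal(loc=0.0, scale=1.0, size=5000)
prod_values = rng.normal(loc=0.5, scale=1.0, size=5000)   # shifted mean
ks_stat, ks_p = ks_2samp(train_values, prod_values)
print(f"KS p-value: {ks_p:.4f}")   # a small p-value suggests the distributions differ

# Categorical feature: chi-squared test on a training vs production contingency table.
#                cat_A  cat_B  cat_C
contingency = [[500,   300,   200],    # training counts
               [350,   450,   200]]    # production counts
chi2, chi_p, dof, _ = chi2_contingency(contingency)
print(f"Chi-squared p-value: {chi_p:.4f}")
```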

Correct Covariate Shift
Once you detect the feature(s) with covariate shift, you have 2 options to com-
pensate for the shift -
1. Drop the feature(s) - this is more of a hard-line but simple approach where
you rebuild the model without the feature. The downside to this approach
is that if this is an important feature, then removing it from the model
may reduce the accuracy.
2. Retrain the model - you retrain the model with the shifted production data
as the updated training data. If there are not many shifted data points,
then you may have to assign higher weights to them during training. Note
that the downside to this approach is that if the feature returns to the
earlier distribution, you will detect another covariate shift and need to
redo the training.

ML Model Output Different in Production Data from Training Data - Prior Probability Shift
When the distribution of the output/target variable P(y) (Figure 12.1) changes
between training and production, this is called prior probability shift (aka target
shift). In this case, the prior assumption that, say, a particular percentage of
the output variable belongs to a specific category is violated, as shown in
Figure 12.5 [4].

Figure 12.5: Prior Probability Shift - change in output variable distribution


An example of a prior probability shift is when fraudulent credit transactions
go down from 5% in training data to 2% in production data. Another example
is an ML model trained to detect email spam with 25% of the training data but
50% of the production data containing spam.

Detect Prior Probability Shift
Prior probability shift can be detected using a methodology called Population
Stability Index (PSI). PSI is a comparative measure of how much a variable has
changed between two samples. It divides the training data output into bins and
uses the bins to compare with production data output [5]. Therefore, if the
production data output has a different distribution, then the PSI detects it.
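A minimal sketch of a PSI calculation follows; the binning scheme and the 0.2 rule of thumb are common conventions rather than a fixed standard, and the score samples are synthetic:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time output sample (expected) and a
    production output sample (actual), using bins set on the training data."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range production values
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Avoid division by zero / log(0) for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) *
                        np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(1)
train_scores = rng.beta(2, 5, size=10000)   # training-time model outputs
prod_scores = rng.beta(2, 3, size=10000)    # shifted production outputs
psi = population_stability_index(train_scores, prod_scores)
print(f"PSI = {psi:.3f}")   # a common rule of thumb flags PSI > 0.2 as a large shift
```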

Correct Prior Probability Shift


Once you detect a prior probability shift in the output, you will need to
retrain the model with the shifted production data as the updated training data.
Note that the downside to this approach is that if the output returns to the
earlier distribution, you will detect another shift and need to redo the
training.

ML Model Conditional Output Different in Production Data from Training Data - Concept Shift
When the ML model input and output data distributions do not change, but the
conditional output probability P(y|X) (Figure 12.1) changes, that is a concept
drift. For example, loan approvals during Q4 2022 with high-interest rates and
fear of pending recession may be different than Q4 2019 before the pandemic.
Note that the input (data of loan applicants) and output (approval/denials of
loans) data distribution do not change.

Detect Concept Shift


Concept shift is tricky to detect given the input and output data distributions
remain unchanged. One way to detect it is to maintain a “golden dataset” where
you have the expected result values (regression test). Run the golden dataset
through the current ML model and compare the new results with the expected
results. If there is a significant change detected, that is an indicator of a concept
shift.

Correct Concept Shift


As with the other drifts, a simple solution to a concept drift is to retrain the
ML model with the new production data.

Monitor Transient ML Model Performance Changes


Data drift and prior probability shifts can happen suddenly or gradually over
time. As long as there are significant data points in production that can be
compared with historical data, the drifts are detectable using the techniques
discussed above. However, there may be transients due to sudden distribution
changes in P(X) or P(y), and the data returns to where it was before the drift

after some time. During the transient phase, there may be unexpected changes
to the model performance. In such circumstances, the statistical techniques
discussed above may not be able to detect them. Instead, they are detectable
by monitoring model performance for each data instance as described below.

Detect Model Performance Change


During ML model development with training data, the evaluation metrics of
the model are tracked to determine acceptable limits. Examples of such metrics
are given in Table 12.1. The tracking of the metrics enables us to determine
the mean and standard deviation. Thereafter in production, monitoring metrics
can be set such that if the model performance is beyond n standard deviation
from the mean, then a flag is raised. In the example in Figure 12.6, the accuracy
metric performance bound is set to be +/- 2 standard deviations from the mean.

Regression Models              | Classification Models
Root Mean Squared Error        | F1 score
R-squared                      | Precision-Recall
Mean Absolute Percentage Error | Receiver Operating Characteristic

Table 12.1: Regression and Classification Model Evaluation Metrics.

Figure 12.6: Model Performance monitoring using bounds determined at train-
ing
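As an illustrative sketch, the bound check can be as simple as the following; the training accuracies and limits are placeholder values:

```python
import numpy as np

# Accuracy observed across validation folds / training runs (placeholder values).
training_accuracy = np.array([0.91, 0.93, 0.92, 0.90, 0.92, 0.93, 0.91])
mean, std = training_accuracy.mean(), training_accuracy.std()

# Performance bounds set to +/- 2 standard deviations, as in Figure 12.6.
lower, upper = mean - 2 * std, mean + 2 * std

def check_production_accuracy(accuracy):
    # Raise a monitoring flag if the production metric leaves the expected band.
    if not (lower <= accuracy <= upper):
        print(f"ALERT: accuracy {accuracy:.3f} outside [{lower:.3f}, {upper:.3f}]")
    else:
        print(f"OK: accuracy {accuracy:.3f} within expected bounds")

check_production_accuracy(0.92)   # within bounds
check_production_accuracy(0.84)   # raises a monitoring flag
```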

Correct Model Performance Change


If the ML model performance monitoring detects deterioration but the data drift
detectors show nothing, then the drifts were possibly transient and are no
longer present. There is not much you can do in such situations except monitor
how frequently this happens and make a business decision on what to do when the
data drifts.

System Health Operational Monitoring


The production machine on which the ML model is running may have multiple
issues with its processor and memory. For example, if there is a
processor-intensive process running on the same machine and starving the ML
model, then the ML model may not respond to input data in time. Likewise, the
same phenomenon may occur if there is a memory-hungry process running on
the same production machine. Additionally, if there is an issue with the rate
of input data in the machine due to outside interference or a denial-of-service

attack on the production machine, this may result in increased latency where
the ML model is not responding in time.

Detect System Health Change


The machine on which the ML model is running needs to be monitored for usage
and latency as follows -
1. Amount of data over time - this is to ensure that the data ingestion (espe-
cially for streaming data) is not overwhelming the memory of the machine.
2. Memory usage - to ensure that the ML model is not being starved of
memory by checking that the memory usage is within operational limits.
3. Output latency - to track the latency of the ML model response by mon-
itoring the time to output and checking that it is within expected limits
set during training.
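A hedged sketch of such checks using the psutil package is shown below; the limits and the predict function are placeholders, and item 1 (data volume) would be tracked similarly with a counter on the ingestion path:

```python
import time
import psutil

MEMORY_LIMIT_PERCENT = 85      # operational limits (assumed values)
LATENCY_LIMIT_SECONDS = 0.25

def predict(features):
    # Stand-in for the production ML model inference call.
    time.sleep(0.01)
    return 0.5

def monitored_predict(features):
    # 2. Memory usage: check the machine is within operational limits.
    memory_percent = psutil.virtual_memory().percent
    if memory_percent > MEMORY_LIMIT_PERCENT:
        print(f"ALERT: memory usage at {memory_percent}%")

    # 3. Output latency: time the model response against the expected limit.
    start = time.perf_counter()
    result = predict(features)
    latency = time.perf_counter() - start
    if latency > LATENCY_LIMIT_SECONDS:
        print(f"ALERT: latency {latency:.3f}s exceeds limit")
    return result

monitored_predict({"feature_1": 1.0})
```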

Correct System Health Change


Once a machine issue is detected, it is best to shut down the ML model and
attempt to remove the cause of the machine issue. For example, shut down any
runaway processes that may be hogging the processor and/or memory.

Summary
In this chapter we discussed why ML model monitoring is required, and the
concepts behind detection and correction. We also outlined different reasons
that an ML model may perform differently than expected in production. In the
next chapter, we look at ML model fairness.

[1] https://towardsdatascience.com/to-monitor-or-not-to-monitor-a-model-is-there-a-question-c0e312e19d03
[2] https://en.wikipedia.org/wiki/Chi-squared_test
[3] https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
[4] https://towardsdatascience.com/mlops-model-monitoring-prior-probability-shift-f64abfa03d9a
[5] https://towardsdatascience.com/mlops-model-monitoring-prior-probability-shift-f64abfa03d9a

13 - Evaluating Fairness
In this thirteenth chapter, we discuss ML model fairness. This is an important
topic for ML ethics, specifically bias. Now that you know how to build, deploy,

test, and monitor ML models, you may need to ensure that your ML model is
fair. Specifically, in this chapter you will be able to:
• Understand what is bias and what is fairness in ML models
• Learn how to detect bias in ML models
• Take action to mitigate bias so that your ML model is fair
• Analyze the trade-off between fairness and accuracy
In the first section, we discuss bias and fairness.

What are Bias and Fairness?


Bias is discriminating against a particular idea, group, or phenomenon. For
example, you may be biased against a specific team in your favorite sport - you
want to win against them and do not want them to do better than your favorite
team. Another example is that you may be biased towards your favorite aunt
or uncle versus the other relatives.
Fairness is to ensure there is no bias, that is no unequal harm or any preferential
treatment to a particular idea, group, or phenomenon. Looking at the previous
examples, for you to be fair you need to remove the bias against that specific
team you want your team to win against and feel the same towards all relatives.
As you can imagine, in our daily interactions and society in general there are a
lot of biases.
The aforementioned regular biases that exist get captured in data. The same
data is used to train ML models and their predictions are based on data pat-
terns that have an underlying bias(es). Though the ML models pick up these
underlying bias patterns, note that such pattern recognition is key. Without
such patterns, ML models may not be effective. For example, a dataset with no
patterns and no bias is white noise, and therefore it has no predictive power. So
it is difficult to remove all bias (as per definition) and deploy completely fair ML
models. The objective is to ensure that your ML models have no harmful bias
that discriminates against a particular idea, group, or phenomenon in society.

ML Model Bias
There are different sources of bias in ML models. We start with the most popular
data bias -
1. Data bias - when the data is biased towards or against a particular idea,
or group of phenomena, the trained ML model inherits that bias. For
example, assume you are building an ML model to target financial as-
set management advisors to buy your company’s mutual funds. The ML
model determines which advisors will buy the fund based on their profile,
and geographical location, among others. If most of the advisors in Cal-
ifornia tend to buy your funds relative to other states, then your data is
biased from a geographical perspective. Consequently, during inference,

the ML model is likely to indicate that California-based advisors are going
to buy your mutual fund. In reality, that may not happen and your ML
model may overestimate your mutual fund selling success for California
leading to high false positive (i.e. ML model estimates yes to buying funds
but the actual is a no).
2. Algorithmic bias - when you are building an ML model, you can choose
to overfit or underfit on training data. If the algorithm overfits a dataset,
then the inference for that dataset will likely have a high variance with
a lot of false positives and false negatives (i.e. ML model estimates do
not match actuals). For example, an ML model cross-selling to retail
customers may focus on a specific customer segment and overfit on that
segment. Therefore the prediction of whether to cross-sell to a customer from the
overfit segment may have a high precision (i.e. the ML model estimate
matches the actual). But the ML model may also have a lot of false
positives and false negatives for customers from a different segment.
3. Business bias - we have seen how different business and global circum-
stances can change the interpretation of an ML model output in Chapter
8. For example, during high inflation loan applications can be subjected
to higher approval thresholds than during low inflation.
In the next section, we outline how to detect an ML model bias.

ML Model Bias Detection


To detect if an ML model has any bias, first use the related business case to
determine what group, idea, or phenomena the model may be biased against.
Then test the ML model with inference data such that you can verify if there is
any bias. For example, assume you are building an ML model to ascertain the
risk of chronic disease for an individual. You determine that there can be two
sources of bias: gender and age. Therefore you need to set up a test such that
you can analyze the ML model output by gender and by age.
As you see in Table 13.1, the inference data is binned by age group and also by
gender. If there is a bias, it should be noticeable in the ML model output metrics
such as precision and/or recall. From the table, you can conclude that there is
no gender bias as the metrics for male (M) and female (F) have no significant
difference, unlike the metrics for the age bins. You can see that there is an
indication of age bias as the model seems to predict disease risk for a higher
number of individuals in older age groups. The consequence of this bias is that
the age groups 65 years and older have high false positives and corresponding
low precision for chronic disease detection (note: precision = true_positive /
(true_positive + false_positive) ).

Table 13.1: Chronic disease prediction ML model output binned by age and
gender for bias detection
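As a hedged sketch, such a binned analysis can be done with pandas and scikit-learn; the data below is hypothetical and constructed so that the gender metrics match while the 65+ age bin shows the high false positives discussed above:

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score

# Hypothetical inference results with the attributes to audit (age bin, gender).
results = pd.DataFrame({
    "age_bin": ["<45", "<45", "45-64", "45-64", "65+", "65+", "65+", "65+"] * 25,
    "gender":  ["M", "F", "M", "F", "M", "F", "M", "F"] * 25,
    "y_true":  [0, 0, 1, 1, 0, 0, 1, 1] * 25,
    "y_pred":  [0, 0, 1, 1, 1, 1, 1, 1] * 25,
})

# Bin the output metrics by each attribute under audit, as in Table 13.1.
for attribute in ["gender", "age_bin"]:
    print(f"\nMetrics by {attribute}:")
    for value, group in results.groupby(attribute):
        precision = precision_score(group["y_true"], group["y_pred"], zero_division=0)
        recall = recall_score(group["y_true"], group["y_pred"], zero_division=0)
        print(f"  {value:>6}: precision={precision:.2f} recall={recall:.2f}")
```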

ML Model Bias Correction


Given that you have detected ML model bias, as discussed earlier there can be
three sources of this bias. Business bias is easy to detect if you compare the
model output and check if the interpretation of that output has changed over
time.
To detect if there is data bias you should perform exploratory data analysis to
check if the data is imbalanced for a specific category as outlined in Chapter
5. This can be corrected by the first method below. In contrast, detecting
algorithmic bias can get tricky - if there is a bias against a specific category and
the data is not imbalanced, then algorithmic bias may be a cause. This can be
corrected by the second and third methods below -
1. Data sampling - correct for any training data imbalance (fewer data points)
in a specific category by either over- or under-sampling. Likewise, you
can simulate the same effect by changing the weights for each data point,
giving higher weight to the data instances belonging to categories with
fewer data.
2. ML Model Optimization - change the cost function constraints used dur-
ing training to influence the model parameters calculations. For example,
include false positive cost constraints such that the false positive is com-
parable across all the categories.
3. Postprocessing - use the ROC curve for different categories to select dif-
ferent thresholds such that the metrics (false positive, false negative, F1
score) are comparable across all the categories.

How does ML Model Fairness affect Accuracy?
Bias correction often mitigates (but does not eliminate) ML model bias. This comes
at a cost (remember the adage: there is no free lunch). And the cost is ML
model accuracy. Think of it this way - if you had all the details for a specific
group of retail customers, then you would know them like family. Any ML model
built using that data would have a near-perfect prediction of their likes/dislikes.
But the model is very unfair since you are biased towards those customers. To
make it fair, you need to give up some information (forget something about the
customers in that group) that would reduce the ML model accuracy. That is
the trade-off.
Once you have your ML model ready, identify the area(s) where you can make
the ML model fair. Quantify fairness before you make any changes. Quantify
accuracy using your defined metric. Start making the ML model fairer in steps.
Calculate the accuracy for each step. You will notice that as you make your ML
model fairer, you are likely giving up on accuracy. In other words, fairness and
accuracy form a Pareto trade-off, as illustrated in Figure 13.2.

Figure 13.2: Pareto curve of fairness vs accuracy

Summary
In this chapter, we understood ML model fairness and bias, the reason for ML
model bias, and how to detect and correct the bias. We also discussed the trade-
off when correcting for bias and making an ML model fairer. In the next chapter,
we look at how to make an ML model robust to failures using anti-fragility.

14 - Exploring Antifragility and ML Model Envi-
ronmental Impact
Congratulations on reaching this last chapter - by now you know how to build,
deploy, test, detect bias, and monitor ML models. In this fourteenth chapter,
we discuss techniques on how to make your ML model robust and to understand
the environmental impact of ML models. In this chapter you will be able to:
• Understand antifragility and how it can be used to make your ML model
robust
• Determine the environmental impact of training your ML model and how
you can help
We start with the concept of antifragility.

What is Antifragility?
Antifragility as the name suggests is the opposite of fragility. So what does that
mean? Let’s start with fragility - it means things or systems that break down
when they are subject to randomness, in the form of vibrations or something
else. So what is the opposite of this phenomenon - well, it is not that things or
systems can handle randomness. That is called robustness. The opposite is that
instead of breaking down, systems become stronger when subject to randomness.
This is called antifragility.
Machines are fragile - they do not do well when encountering randomness. Humans,
on the other hand, are antifragile. When we manage unforeseen situations (i.e.,
randomness), we gain additional experience and become stronger. In the next
section, we discuss the association between antifragility and ML models.

Chaos Engineering - Antifragility with ML Models


The concept is to use a human (antifragile) and machine (fragile) tandem to
make ML models antifragile. This technique is called chaos engineering [1] and
it has an open-source toolkit [2]. It started at Netflix [3], where engineers improved
the resiliency of their systems by developing a tool called Chaos Monkey. They
imagined a scenario where monkeys would be let loose in their data center and
would randomly disable production servers. The test was to check whether Netflix
could survive this scenario without any impact on the customer.
Chaos Monkey later expanded to inject random failures and abnormal conditions
across different parts of a production system (for example, the production database)
and to test the ability of the system to perform under such duress. Once such
conditions were successfully tested, the systems were not only robust but antifragile,
given that the failures and/or abnormal scenarios exposed fault lines that were
plugged to make the system stronger. Chaos Monkey is available as an open-source
tool on GitHub [4].

In the next section, we go through the principles of chaos engineering to make your
ML model production system stronger.

Experimentation with Chaos Engineering


The idea is to use controlled experiments to uncover fault lines in the ML model
production system. This typically uncovers issues in hidden places such as -
1. Infrastructure - platform or databases used in production systems.
2. Applications - bugs in ML model code and outside dependencies such as
remote server connections.
3. Practice or process - bug tracking to ensure that found issues are plugged
and tested.
You need to run the experiments with a controlled “blast radius” such that if
issues are uncovered their impact is limited to within the radius and they do
not bring down the whole system.
Before you run a controlled chaos experiment, you need to establish a hypothesis
on the possible failures and prioritize them. One way to prioritize is to label
each failure in terms of its likelihood (how often it is likely to happen) and
impact (how badly the failure can damage system operations). Use a grid as
in Figure 14.1 to understand where each failure lies.
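
As a small illustration, here is a minimal sketch of such a prioritization. The failure
list and the 1-5 likelihood/impact scores are illustrative placeholders, not from the book.

failures = [
    {"name": "feature database unavailable",  "likelihood": 4, "impact": 5},
    {"name": "model server out of memory",    "likelihood": 2, "impact": 4},
    {"name": "stale model artifact deployed", "likelihood": 3, "impact": 3},
]

# Rank by a simple likelihood x impact score; the top entries are the first
# candidates for a controlled chaos experiment.
for f in sorted(failures, key=lambda f: f["likelihood"] * f["impact"], reverse=True):
    print(f["name"], "priority =", f["likelihood"] * f["impact"])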
After you prioritize the failures, run controlled chaos experiments (a minimal
sketch follows this list) such as -
1. Infrastructure - randomly bring down servers within a set group such that
you understand and control the failure.
2. Applications - randomly disable outside services communicating with the
ML model - for example, disable a database that is storing ML model
parameters.
3. Process - repeat failures that were discovered earlier and were supposed to
have been fixed, to verify that the ML model system is robust to them.
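
The sketch below is a minimal, self-contained illustration of a controlled,
application-level experiment with a small blast radius. Everything in it - the
simulated parameter store, the cached-fallback behaviour, and the steady-state
check - is a stand-in for your real system and does not use the Chaos Toolkit APIs.

class ParameterStore:
    """Simulated dependency (e.g., a database holding model parameters)."""
    def __init__(self):
        self.available = True

    def load(self):
        if not self.available:
            raise ConnectionError("parameter store unreachable")
        return {"weights": [0.1, 0.2]}

class ModelService:
    """Simulated ML model service with a cached fallback (the behavior under test)."""
    def __init__(self, store):
        self.store = store
        self.cached = store.load()  # cache parameters at startup

    def predict(self, x):
        try:
            params = self.store.load()
        except ConnectionError:
            params = self.cached     # degrade gracefully instead of failing
        return sum(w * v for w, v in zip(params["weights"], x))

def steady_state_ok(service):
    """Steady-state hypothesis: the service still answers prediction requests."""
    try:
        service.predict([1.0, 2.0])
        return True
    except Exception:
        return False

# Controlled experiment with a limited blast radius: one dependency, one service.
store = ParameterStore()
service = ModelService(store)
assert steady_state_ok(service), "system unhealthy before the experiment - abort"

store.available = False               # inject the failure (application-level chaos)
print("survived failure:", steady_state_ok(service))

store.available = True                # always roll back the injected failure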

Figure 14.1: Plot to prioritize system failures in terms of how often they happen (Likelihood) and their influence (Impact)
Failures or fault lines exposed by these tests, once corrected, make your ML model
production system stronger. That is antifragility leading to robust ML models and
systems.
In the next section, we discuss the environmental impact of training and running
your ML models.

Environmental Impact of ML Models


The last topic in this book is the environmental impact of ML models. While
we are all aware of the amazing achievements of ML models, we are not privy to
their environmental impact. Training ML models requires storage and process-
ing power to analyze vast amounts of data and run multiple experiments across
days and weeks. This is quantified in terms of carbon emissions.
It is recommended that when you are building pipelines and training ML models,
you also calculate the carbon emissions. CodeCarbon [5] is a software package
that integrates seamlessly into a Python codebase. The solution was jointly
developed by Mila, a world leader in AI research based in Montreal, GAMMA,
BCG's global data science and AI team, Haverford College in Pennsylvania,
and Comet.ml, an MLOps solution provider. It estimates the amount of carbon
dioxide (CO2) produced by the computing resources used to execute the code.
The objective is to incentivize developers to optimize their code efficiency in
terms of carbon emissions.
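
A minimal usage sketch, assuming CodeCarbon is installed (pip install codecarbon)
and using a placeholder training function and project name:

from codecarbon import EmissionsTracker

def train_model():
    # stand-in for your real training loop
    return sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="demo-training-run")
tracker.start()
try:
    train_model()
finally:
    emissions = tracker.stop()   # estimated emissions for the tracked block, in kg CO2eq
print(f"Estimated emissions: {emissions} kg CO2eq")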
Additionally, if you are using a cloud provider for ML model training, you can
use a CO2 calculator [6] to choose the least carbon-emitting data center. One
of the factors in the calculation is the power grid supplying the cloud provider's
data center, i.e., whether the energy is generated from renewable (solar, hydro,
wind) or nonrenewable (coal) sources.

Summary
In this last chapter, we looked at chaos engineering, a practice based on antifragility
that helps improve the resiliency and availability of ML model systems. We also
discussed the environmental impact of training ML models and how to estimate it.

[1] Miles R., Learning Chaos Engineering. O'Reilly, 2019.
[2] https://chaostoolkit.org/
[3] https://netflixtechblog.com/the-netflix-simian-army-16e57fbab116
[4] https://github.com/netflix/chaosmonkey
[5] https://pypi.org/project/codecarbon/
[6] Machine Learning CO2 Impact Calculator (mlco2.github.io)
