Name: Drishti Dhingani
Batch: HN1
SAP Id: 60003200075
Tools and Technology Used to implement the Practical
Practical 1: Setting up a Version Control System (VCS) for ML Projects:
Using Git for version control:
Git is a version control system that you download onto your computer. It is essential if you want to
collaborate with other developers on a coding project or work on your own project.
To check whether you already have Git installed on your computer, type the command git --version in
the terminal.
If Git is already installed, the command prints the version you have. If it is not installed, visit the Git
website and follow the download instructions to install the correct version for your operating system.
What is a repository?
You can think of a repository (aka a repo) as a “main folder”, everything associated with a specific
project should be kept in a repo for that project. Repos can have folders within them, or just be separate
files. You will have a local copy (on your computer) and an online copy (on GitHub) of all the files in
the repository.
The workflow
The GitHub workflow can be summarised by the “commit-pull-push” mantra.
1. Commit
▪ Once you’ve saved your files, you need to commit them - this means the changes you have made to
files in your repo will be saved as a version of the repo, and your changes are now ready to go up
on GitHub (the online copy of the repository).
2. Pull
▪ Now, before you send your changes to GitHub, you need to pull, i.e. make sure you are completely
up to date with the latest version of the online version of the files - other people could have been
working on them even if you haven’t. You should always pull before you start editing and before
you push.
3. Push
▪ Once you are up to date, you can push your changes - at this point in time your local copy and the
online copy of the files will be the same.
Each file on GitHub has a history, so instead of having many files
like Dissertation_1st_May.R, Dissertation_2nd_May.R, you can have only one and, by exploring its
history, you can see what it looked like at different points in time.
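For illustration only, one commit-pull-push cycle might look like this in the terminal (the file name, commit message, and branch name are placeholders; your repository may use a different default branch):
git add Dissertation.R
git commit -m "Update analysis section"
git pull origin main
git push origin main
Pulling before pushing keeps your local copy in step with the online copy, so the push applies cleanly.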
Practical 2: Creating a Continuous Integration (CI) Pipeline
Jenkins Pipeline is a suite of plugins that supports implementing and integrating continuous delivery
pipelines into Jenkins. Pipeline provides an extensible set of tools for modeling simple-to-complex
delivery pipelines "as code" via the Pipeline DSL.
This section describes how to get started with creating your Pipeline project in Jenkins and introduces
you to the various ways that a Jenkinsfile can be created and stored.
Prerequisites
To use Jenkins Pipeline, you will need:
• Jenkins 2.x or later (older versions back to 1.642.3 may work but are not recommended)
• Pipeline plugin, [2] which is installed as part of the "suggested plugins" (specified when running
through the Post-installation setup wizard after installing Jenkins).
Read more about how to install and manage plugins in Managing Plugins.
Defining a Pipeline
Both Declarative and Scripted Pipeline are DSLs to describe portions of your software delivery
pipeline. Scripted Pipeline is written in a limited form of Groovy syntax.
Relevant components of Groovy syntax will be introduced as required throughout this documentation,
so while an understanding of Groovy is helpful, it is not required to work with Pipeline.
A Pipeline can be created in one of the following ways:
• Through Blue Ocean - after setting up a Pipeline project in Blue Ocean, the Blue Ocean UI
helps you write your Pipeline’s Jenkinsfile and commit it to source control.
• Through the classic UI - you can enter a basic Pipeline directly in Jenkins through the classic
UI.
• In SCM - you can write a Jenkinsfile manually, which you can commit to your project’s source
control repository. [3]
The syntax for defining a Pipeline with either approach is the same, but while Jenkins supports entering
Pipeline directly into the classic UI, it is generally considered best practice to define the Pipeline in
a Jenkinsfile which Jenkins will then load directly from source control.
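As a hedged illustration of the "Pipeline as code" idea, a minimal Declarative Jenkinsfile might look like the sketch below (the stage names and echo steps are placeholders only, not a prescribed pipeline):
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                echo 'Building the project...'
            }
        }
        stage('Test') {
            steps {
                echo 'Running tests...'
            }
        }
    }
}
Committing a file like this to the repository root lets Jenkins load the pipeline directly from source control, as recommended above.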
Practical 3: Containerization with Docker
Docker is a platform for running applications in an isolated environment called a "container" (or Docker
container). Applications like Jenkins can be downloaded as read-only "images" (or Docker images),
each of which is run in Docker as a container. A Docker container is a "running instance" of a Docker
image. A Docker image is stored permanently (it changes only when updated images are published), whereas
containers are stored temporarily. Learn more about these concepts in Getting Started, Part 1:
Orientation and setup in the Docker documentation.
Due to Docker’s fundamental platform and container design, a Docker image for a given application,
such as Jenkins, can be run on any supported operating system or cloud service also running Docker.
Supported operating systems include macOS, Linux and Windows, and supported cloud services
include AWS and Azure.
Installing Docker
To install Docker on your operating system, follow the instructions in the Guided Tour prerequisites.
Alternatively, visit Docker Hub, and select the Docker Community Edition suitable for your operating
system or cloud service. Follow the installation instructions on their website.
Minimum hardware requirements:
Software requirements:
Java: see the Java Requirements page
Web browser: see the Web Browser Compatibility page
For Windows operating system: Windows Support Policy
There are several Docker images of Jenkins available.
Use the recommended official jenkins/jenkins image from the Docker Hub repository. This image
contains the current Long-Term Support (LTS) release of Jenkins, which is production-ready. However,
this image doesn’t contain the Docker CLI and is not bundled with the frequently used Blue Ocean plugins
and their features. To use the full power of Jenkins and Docker, you may want to go through the
installation process described below.
Windows
The Jenkins project provides a Linux container image, not a Windows container image. Be sure that
your Docker for Windows installation is configured to run Linux Containers rather than Windows
Containers. Refer to the Docker documentation for instructions to switch to Linux containers.
1. Open up a command prompt window.
2. Create a bridge network in Docker
3. Run Docker image
4. Customize the official Jenkins Docker image by creating a Dockerfile and building a new image from it.
5. Run your own myjenkins-blueocean:2.426.1-1 image as a container in Docker.
6. Proceed to the Setup wizard.
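As a hedged illustration of steps 2 and 5 (the network name, volume name, and port mappings are assumptions; the image tag is the one named in step 5), the commands might look roughly like this in a command prompt:
docker network create jenkins
docker run --name jenkins-blueocean --detach ^
  --network jenkins ^
  --publish 8080:8080 --publish 50000:50000 ^
  --volume jenkins-data:/var/jenkins_home ^
  myjenkins-blueocean:2.426.1-1
Here 8080 is the Jenkins web UI port and 50000 is used by inbound agents; the named volume keeps Jenkins data across container restarts.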
Practical 4: Orchestrating ML Workflows with Kubernetes.
Kubernetes
Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of
containerized applications.
A Kubernetes cluster adds a new automation layer to Jenkins. Kubernetes makes sure that resources
are used effectively and that your servers and underlying infrastructure are not overloaded.
Kubernetes' ability to orchestrate container deployment ensures that Jenkins always has the right
amount of resources available.
Hosting Jenkins on a Kubernetes Cluster is beneficial for Kubernetes-based deployments and dynamic
container-based scalable Jenkins agents. Here, we see a step-by-step process for setting up Jenkins on
a Kubernetes Cluster.
Setup Jenkins On Kubernetes
For setting up a Jenkins Cluster on Kubernetes, we will do the following:
1. Create a Namespace
2. Create a service account with Kubernetes admin permissions.
3. Create local persistent volume for persistent Jenkins data on Pod restarts.
4. Create a deployment YAML and deploy it.
5. Create a service YAML and deploy it.
Kubernetes Jenkins Deployment
Let’s get started with deploying Jenkins on Kubernetes.
Step 1: Create a Namespace for Jenkins. It is good to categorize all the DevOps tools as a separate
namespace from other applications.
Step 2: Create a 'serviceAccount.yaml' file and copy the following admin service account manifest.
Step 3: Create 'volume.yaml' and copy the following persistent volume manifest.
Step 4: Create a Deployment file named 'deployment.yaml' and copy the following deployment
manifest.
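The manifests referred to in the steps above are not reproduced here. As a minimal, hedged sketch (the namespace name, labels, and image tag are assumptions), Step 1 and the Deployment of Step 4 might look roughly like this:
kubectl create namespace devops-tools
deployment.yaml (illustrative sketch only):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
  namespace: devops-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jenkins-server
  template:
    metadata:
      labels:
        app: jenkins-server
    spec:
      containers:
        - name: jenkins
          image: jenkins/jenkins:lts
          ports:
            - containerPort: 8080
The manifest is then deployed with kubectl apply -f deployment.yaml, and the service manifest of Step 5 would expose port 8080 in a similar way.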
Practical 5: Model Packaging and Deployment with TensorFlow Serving
TensorFlow is basically a software library for numerical computation using data flow graphs where:
• nodes in the graph represent mathematical operations.
• edges in the graph represent the multidimensional data arrays (called tensors) communicated between
them. (Please note that a tensor is the central unit of data in TensorFlow.)
TensorFlow APIs
TensorFlow provides multiple APIs (Application Programming Interfaces). These can be classified into
2 major categories:
Low level API:
• complete programming control
• recommended for machine learning researchers
• provides fine levels of control over the models
• TensorFlow Core is the low level API of TensorFlow.
High level API:
• built on top of TensorFlow Core
• easier to learn and use than TensorFlow Core
• make repetitive tasks easier and more consistent between different users
1. Installing TensorFlow
An easy-to-follow guide for TensorFlow installation is available here: Installing TensorFlow. Once
installed, you can verify the installation by running this command in the Python interpreter:
import tensorflow as tf
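A quick, assumption-free way to confirm which release is installed is to print the version string:
Python
import tensorflow as tf
# print the installed TensorFlow version to confirm the installation succeeded
print(tf.__version__)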
2. The Computational Graph
Any TensorFlow Core program can be divided into two discrete sections:
• Building the computational graph. A computational graph is nothing but a series of TensorFlow
operations arranged into a graph of nodes.
• Running the computational graph. To actually evaluate the nodes, we must run the computational graph
within a session. A session encapsulates the control and state of the TensorFlow runtime.
Now, let us write our very first TensorFlow program to understand the above concepts:
Python
# importing tensorflow
import tensorflow as tf
# TensorFlow 2.x executes eagerly by default; switch to graph mode so the
# session-based example below runs as written
tf.compat.v1.disable_eager_execution()
# creating nodes in computation graph
node1 = tf.constant(3, dtype=tf.int32)
node2 = tf.constant(5, dtype=tf.int32)
node3 = tf.add(node1, node2)
# create tensorflow session object
sess = tf.compat.v1.Session()
# evaluating node3 and printing the result
print("sum of node1 and node2 is:", sess.run(node3))
# closing the session
sess.close()
Output: sum of node1 and node2 is: 8
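The practical's title also covers packaging and deployment with TensorFlow Serving. As a brief, hedged sketch (the toy model, directory path, and version folder are purely illustrative), a model can be exported in the SavedModel format, laid out in the versioned directory structure that TensorFlow Serving loads:
Python
import tensorflow as tf

# a tiny illustrative model standing in for a trained one
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def __call__(self, x):
        return 2.0 * x

# export version "1" of the model in the SavedModel format,
# the on-disk layout that TensorFlow Serving reads and serves over REST/gRPC
tf.saved_model.save(Doubler(), "serving_models/my_model/1")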
Practical 6: Experiment Tracking and Management
MLFlow:
What is MLflow?: Stepping into the world of Machine Learning (ML) is an exciting journey, but it
often comes with complexities that can hinder innovation and experimentation. MLflow is a solution to
many of these issues in this dynamic landscape, offering tools and simplifying processes to streamline
the ML lifecycle and foster collaboration among ML practitioners. Whether you’re an individual
researcher, a member of a large team, or somewhere in between, MLflow provides a unified platform to
navigate the intricate maze of model development, deployment, and management. MLflow aims to
enable innovation in ML solution development by streamlining otherwise cumbersome logging,
organization, and lineage concerns that are unique to model development. This focus allows you to
ensure that your ML projects are robust, transparent, and ready for real-world challenges. Read on to
discover the core components of MLflow and understand the unique advantages it brings to the complex
workflows associated with model development and management.
Core Components of MLflow
MLflow, at its core, provides a suite of tools aimed at simplifying the ML workflow. It is tailored to
assist ML practitioners throughout the various stages of ML development and deployment. Despite its
expansive offerings, MLflow’s functionalities are rooted in several foundational components:
• Tracking: MLflow Tracking provides both an API and UI dedicated to the logging of parameters,
code versions, metrics, and artifacts during the ML process. This centralized repository captures
details such as parameters, metrics, artifacts, data, and environment configurations, giving teams
insight into their models’ evolution over time. Whether working in standalone scripts, notebooks,
or other environments, Tracking facilitates the logging of results either to local files or a server,
making it easier to compare multiple runs across different users.
• Model Registry: A systematic approach to model management, the Model Registry assists in
handling different versions of models, discerning their current state, and ensuring smooth
productionization. It offers a centralized model store, APIs, and UI to collaboratively manage an
MLflow Model’s full lifecycle, including model lineage, versioning, aliasing, tagging, and
annotations.
• MLflow Deployments for LLMs: This server, equipped with a set of standardized APIs, streamlines
access to both SaaS and OSS LLM models. It serves as a unified interface, bolstering security
through authenticated access, and offers a common set of APIs for prominent LLMs.
• Evaluate: Designed for in-depth model analysis, this set of tools facilitates objective model
comparison, be it traditional ML algorithms or cutting-edge LLMs.
• Prompt Engineering UI: A dedicated environment for prompt engineering, this UI-centric
component provides a space for prompt experimentation, refinement, evaluation, testing, and
deployment.
• Recipes: Serving as a guide for structuring ML projects, Recipes, while offering recommendations,
are focused on ensuring functional end results optimized for real-world deployment scenarios.
• Projects: MLflow Projects standardize the packaging of ML code, workflows, and artifacts, akin to
an executable. Each project, be it a directory with code or a Git repository, employs a descriptor or
convention to define its dependencies and execution method.
By integrating these core components, MLflow offers an end-to-end platform, ensuring efficiency,
consistency, and traceability throughout the ML lifecycle.
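As a brief, hedged sketch of the Tracking component (the experiment name, parameter values, and metric are purely illustrative), logging a run from Python looks roughly like this:
Python
import mlflow

# point runs at a named experiment; it is created automatically if it does not exist
mlflow.set_experiment("demo-experiment")

with mlflow.start_run():
    # log hyperparameters and a result metric for this run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 10)
    mlflow.log_metric("accuracy", 0.93)
Each run is recorded by the Tracking backend (local files or a tracking server), so results from different users and runs can be compared in the MLflow UI.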
Why Use MLflow?: The machine learning (ML) process is intricate, comprising various stages, from
data preprocessing to model deployment and monitoring. Ensuring productivity and efficiency
throughout this lifecycle poses several challenges:
• Experiment Management: It’s tough to keep track of the myriad experiments, especially when
working with files or interactive notebooks. Determining which combination of data, code, and
parameters led to a particular result can become a daunting task.
• Reproducibility: Ensuring consistent results across runs is not trivial. Beyond just tracking code
versions and parameters, capturing the entire environment, including library dependencies, is
critical. This becomes even more challenging when collaborating with other data scientists or when
scaling the code to different platforms.
• Deployment Consistency: With the plethora of ML libraries available, there’s often no standardized
way to package and deploy models. Custom solutions can lead to inconsistencies, and the crucial
link between a model and the code and parameters that produced it might be lost.
• Model Management: As data science teams produce numerous models, managing, testing, and
continuously deploying these models becomes a significant hurdle. Without a centralized platform,
managing model lifecycles becomes unwieldy.
• Library Agnosticism: While individual ML libraries might offer solutions to some of the challenges,
achieving the best results often involves experimenting across multiple libraries. A platform that
offers compatibility with various libraries while ensuring models are usable as reproducible “black
boxes” is essential.
MLflow addresses these challenges by offering a unified platform tailored for the entire ML lifecycle.
Its benefits include:
• Traceability: With tools like the Tracking Server, every experiment is logged, ensuring that teams
can trace back and understand the evolution of models.
• Consistency: Be it accessing models through the MLflow Deployments for LLMs or structuring
projects with MLflow Recipes, MLflow promotes a consistent approach, reducing both the learning
curve and potential errors.
• Flexibility: MLflow’s library-agnostic design ensures compatibility with a wide range of machine
learning libraries. It offers comprehensive support across different programming languages, backed
by a robust REST API, CLI, and APIs for Python, R, and Java.
By simplifying the complex landscape of ML workflows, MLflow empowers data scientists and
developers to focus on building and refining models, ensuring a streamlined path from experimentation
to production.
Practical 7: Continuous Deployment (CD) for ML Models
Continuous integration has changed the way we develop software. But a CI environment is different
from production, and synthetic tests are not always enough to reveal problems. Some issues only appear
when they hit production, and by that time, the damage is already done. Canary deployments allow us
to test the waters before jumping in.
What Is Canary Deployment
In software engineering, canary deployment is the practice of making staged releases. We roll out a
software update to a small part of the users first, so they may test it and provide feedback. Once the
change is accepted, the update is rolled out to the rest of the users.
Canary deployments show us how users interact with application changes in the real world. As in blue-
green deployments, the canary strategy offers no-downtime upgrades and easy rollbacks. Unlike blue-
green, canary deployments are smoother, and failures have limited impact.
Releases vs. Deployments
A canary release is an early build of an application. Splitting stable and development branches is a
widespread strategy in the open-source world. Many projects use an odd/even numbering scheme to
separate stable from the non-stable version. Often companies publish canary versions of their products,
hoping that tech-savvy or power users want to download and try them out. Examples of companies
canarying their applications are Mozilla and their nightly and beta versions of Firefox, and Google, with
its canary release channel for Chrome.
In a canary deployment, on the other hand, we install the update in our systems and split the users into
two groups. A small percentage of them will go to the canary while the rest stay on the old version, as
a control.
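As a toy sketch of that user split (the hashing scheme and the 10% share are assumptions, not a prescribed implementation), a deterministic canary routing decision could look like this in Python:
Python
import hashlib

def routes_to_canary(user_id: str, canary_percent: int = 10) -> bool:
    """Deterministically assign a fixed share of users to the canary version."""
    # hash the user id into one of 100 buckets; the same user always lands in the same bucket
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent

# users in the first 10 buckets see the canary; everyone else stays on the stable version
print(routes_to_canary("user-42"))
Keeping the assignment deterministic means a given user always sees the same version, which makes feedback and A/B comparisons meaningful.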
Benefits of Canary Deployments
• A/B testing: we can use the canary to do A/B testing. In other words, we present two alternatives to the
users and see which gets better reception.
• Capacity test: it’s impossible to test the capacity of a large production environment. With canary
deployments, capacity tests are built-in. Any performance issues we have in our system will begin to
crop up as we slowly migrate the users to the canary.
• Feedback: we get invaluable input from real users.
• No cold-starts: new systems can take a while to start up. Canary deployments slowly build up
momentum to prevent cold-start slowness.
• No downtime: like blue-green deployments, a canary deployment doesn’t generate downtime.
• Easy rollback: if something goes wrong, we can easily roll back to the previous version.
• Canary releases: as long as we have some way of remotely updating software, we can do canary
releases. App stores are a great example of this. Both Google Play and Apple’s App Store
support staged rollouts. This feature lets us push updates in waves, to a set percent of users at a
time.
• Rolling canaries: we have numerous tools like AWS CodeDeploy, Chef, Puppet, or Docker to
help us perform rolling updates.
• Side-by-side canaries: the cloud allows us to create and tear down hardware and services on
demand. We have tools like Terraform, Ansible, or AWS CloudFormation to define
infrastructure using code.
• CI/CD: when we add continuous delivery and deployment into the mix, we get one of the most
effective patterns for shipping out code.
Practical 8: Monitoring and Alerting
The Fiddler web debugger tool:
The Fiddler tool helps you debug web applications by capturing network traffic between the Internet
and test computers. The tool enables you to inspect incoming and outgoing data to monitor and
modify requests and responses before the browser receives them. Fiddler also includes a powerful
event-based scripting subsystem, which you can extend by using any .NET Framework language.
Fiddler and the HTTP replay options can help you troubleshoot client-side issues with web applications
by making an offline copy of the test site. With these tools, you can create offline images of the
browsing experience and then package and analyse the results to obtain more detailed debug
information. To download the Fiddler add-on, go to the Internet Explorer add-ons page. For more
information about how to troubleshoot by using Fiddler and related tools, refer to the Fiddler documentation.
First Steps with Fiddler Everywhere on Windows
This section describes how to install and start using Fiddler Everywhere on Windows.
• First, you will go through the installation and configuration steps.
• Next, you'll create a Fiddler account so that you can move on to using the web-debugging tool.
• Finally, you will see how to capture, inspect, and modify traffic.
• Create an account with administrative rights, which you'll need for capturing and decoding
HTTPS traffic.
• Provide an active Internet connection with access to the following URLs:
• https://*.telerik.com/
• https://*.getfiddler.com/
• https://fiddler-backend-production.s3-accelerate.amazonaws.com
• https://recaptcha.net
The Telerik site (where the Fiddler authentication form resides) uses different CDNs to load various
components, styles, and cookies related to its user interface. Having limited internet access can cause
the site not to load correctly.
Ensure that Fiddler Everywhere’s proxy port is open and unrestricted by a firewall/security tool. The
default port is 8866, but you can change it from Settings > Connections > Fiddler listens on port.
Step 1: Install Fiddler Everywhere on Your Machine
You'll first have to install the latest version of Fiddler Everywhere on your machine.
1. Download and install the latest version of Fiddler Everywhere.
Fiddler Everywhere for Windows offers you the flexibility to select the installation scope based on
your preferences. You can opt for a per user installation, limiting access to the currently logged-in
user, or choose a per machine installation, allowing all users on the host machine to use Fiddler
Everywhere. Note that simultaneous execution of multiple instances of Fiddler Everywhere on the
same host machine is not supported.
When Fiddler Everywhere is installed per machine, each individual user must log into Fiddler
Everywhere with their credentials, and the generated data won't be accessible to other users.
Step 2: Create Your Fiddler Account
In this step, you'll register by creating your unified Telerik account.
1. Launch the Fiddler Everywhere application. Follow the Sign in or create an account link.
2. Create an account using email and password or using the Sign in with Google option.
The Fiddler Everywhere Enterprise subscription plan supports SSO login. Get in touch with
our support for detailed instructions on configuring your company-specific SSO.
3. Enter the requested profile information in the Telerik form.
4. Check your inbox, open the confirmation email, and complete your account activation.
5. Return to the Fiddler Everywhere application and choose whether to become a trial user or
purchase a subscription plan by selecting either the Start Free Trial or the BUY NOW link.
After successful login into Fiddler Everywhere, you will see your personal and license information
within the Home pane.
Step 3: Interact with the Captured Traffic
You can now take advantage of the Fiddler Everywhere capabilities - capture, inspect, save & share,
import & export, mock, and modify HTTPS traffic.
1. Capture HTTPS traffic through the preferred capturing mode.
2. Inspect the captured data.
3. Save, share, or export the captured HTTPS traffic.
4. Modify a session through the Composer.
5. Mock client and server behaviour through the Rules tab.