# Volcano vgpu device plugin for Kubernetes Example
## Prerequisites
1. The NVIDIA GPU driver has been successfully installed.
2. The NVIDIA Container Toolkit has been installed, and `default-runtime` is set to `nvidia` in `/etc/docker/daemon.json` (remember to restart the docker service after the change).
3. Kubernetes has been properly installed and is functioning normally.
## Volcano Installation
1. Make sure the Volcano version is v1.9.0 or later.
2. You can follow the Volcano installation documentation: https://volcano.sh/en/docs/v1-9-0/installation/
```bash
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm repo update
helm install volcano volcano-sh/volcano --version 1.9.0 -n volcano-system --create-namespace
```
3. Check that all pods are in the Running state.
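For example:

```bash
kubectl get pods -n volcano-system
```

All pods in the `volcano-system` namespace should report `Running`.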
## Volcano-vgpu-device-plugin Installation
1. You can follow the volcano-vgpu-device-plugin installation documentation: https://github.com/Project-HAMi/volcano-vgpu-device-plugin?tab=readme-ov-file#enabling-gpu-support-in-kubernetes
Enable vgpu support in the Volcano scheduler by editing its configmap:

```bash
kubectl edit cm -n volcano-system volcano-scheduler-configmap
```
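The key change is enabling vgpu support in the `deviceshare` plugin. A sketch of what the relevant part of `volcano-scheduler.conf` should look like after the edit (the plugin list follows the upstream README and may differ in your installation):

```yaml
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: deviceshare
        arguments:
          deviceshare.VGPUEnable: true # enable vgpu support
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
```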
Save the following manifest to a local file named `volcano-vgpu-device-plugin.yml`:

```yaml
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: volcano-device-plugin
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["nodes/status"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "update", "patch", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
subjects:
  - kind: ServiceAccount
    name: volcano-device-plugin
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: volcano-device-plugin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: volcano-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: volcano-device-plugin
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      # This annotation is deprecated. Kept here for backward compatibility.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: volcano-device-plugin
    spec:
      tolerations:
        # This toleration is deprecated. Kept here for backward compatibility.
        # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
        - key: CriticalAddonsOnly
          operator: Exists
        - key: volcano.sh/gpu-memory
          operator: Exists
          effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      serviceAccount: volcano-device-plugin
      containers:
        - name: volcano-device-plugin
          image: docker.io/projecthami/volcano-vgpu-device-plugin:v1.9.4
          args: ["--device-split-count=10"]
          lifecycle:
            postStart:
              exec:
                command: ["/bin/sh", "-c", "cp -f /k8s-vgpu/lib/nvidia/* /usr/local/vgpu/"]
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOOK_PATH
              value: "/usr/local/vgpu"
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
              add: ["SYS_ADMIN"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: lib
              mountPath: /usr/local/vgpu
            - name: hosttmp
              mountPath: /tmp
        - name: monitor
          image: docker.io/projecthami/volcano-vgpu-device-plugin:v1.9.4
          command:
            - /bin/bash
            - -c
            - volcano-vgpu-monitor
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"
            - name: NVIDIA_MIG_MONITOR_DEVICES
              value: "all"
            - name: HOOK_PATH
              value: "/tmp/vgpu"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
              add: ["SYS_ADMIN"]
          volumeMounts:
            - name: dockers
              mountPath: /run/docker
            - name: containerds
              mountPath: /run/containerd
            - name: sysinfo
              mountPath: /sysinfo
            - name: hostvar
              mountPath: /hostvar
            - name: hosttmp
              mountPath: /tmp
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
            type: Directory
        - name: lib
          hostPath:
            path: /usr/local/vgpu
            type: DirectoryOrCreate
        - name: hosttmp
          hostPath:
            path: /tmp
            type: DirectoryOrCreate
        - name: dockers
          hostPath:
            path: /run/docker
            type: DirectoryOrCreate
        - name: containerds
          hostPath:
            path: /run/containerd
            type: DirectoryOrCreate
        - name: usrbin
          hostPath:
            path: /usr/bin
            type: Directory
        - name: sysinfo
          hostPath:
            path: /sys
            type: Directory
        - name: hostvar
          hostPath:
            path: /var
            type: Directory
```
Then apply it:

```bash
kubectl create -f volcano-vgpu-device-plugin.yml
```
2. Check that the volcano-device-plugin pods are in the Running state.
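For example (the DaemonSet above labels its pods with `name: volcano-device-plugin`):

```bash
kubectl get pods -n kube-system -l name=volcano-device-plugin
```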
3. Check the node status.
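Once the plugin has registered, the GPU node should advertise vgpu resources in its allocatable list. A quick way to check (the `volcano.sh/vgpu-*` resource names follow the upstream README; the reported number depends on `--device-split-count` and the number of physical GPUs):

```bash
kubectl describe node <your-gpu-node> | grep volcano.sh
```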
## Running VGPU Jobs
1. Run a demo vgpu job (the container spec below, including the image and the `volcano.sh/vgpu-*` resource limits, is an illustrative example based on the upstream README):

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
spec:
  schedulerName: volcano
  containers:
    - name: cuda-container                        # illustrative container name
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # any CUDA-capable image works
      command: ["sleep", "infinity"]
      resources:
        limits:
          volcano.sh/vgpu-number: 1    # number of vgpus requested
          volcano.sh/vgpu-memory: 3000 # device memory limit in MiB
EOF
```
2. Check the pod status.
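For example:

```bash
kubectl get pod gpu-pod1
```

The pod should reach the `Running` state once the Volcano scheduler has allocated a vgpu for it.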
3. Run a single command to check that it is working.
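One way to verify from inside the pod (assuming the container image ships `nvidia-smi`; the memory reported should match the `volcano.sh/vgpu-memory` limit rather than the full device memory):

```bash
kubectl exec -it gpu-pod1 -- nvidia-smi
```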
## Monitor
1. You can access the metrics endpoint of the Volcano scheduler from inside the cluster. For example:

```bash
curl -vvv volcano-scheduler-service.volcano-system:8080/metrics
```
2. You can also change the Volcano scheduler service from `ClusterIP` to `NodePort`, which allows external access to the metrics endpoint.
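A minimal sketch, assuming the service name from the example above:

```bash
kubectl patch svc volcano-scheduler-service -n volcano-system -p '{"spec": {"type": "NodePort"}}'
```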