
Volcano vgpu device plugin for Kubernetes Example

Prerequisites

1. GPU driver has been successfully installed.


2. Nvidia-container-toolkit has been installed. Make sure default-runtime is set to nvidia in /etc/docker/daemon.json (remember to restart the docker service after the change); see the example after this list.

3. Kubernetes has been properly installed and is functioning normally.
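
For reference, a minimal /etc/docker/daemon.json with nvidia as the default runtime might look like the following; the runtime path shown is the usual nvidia-container-toolkit install location and may differ on your system:

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

After editing, restart docker, for example with sudo systemctl restart docker.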

Volcano Installation
1. Make sure the Volcano version is v1.9.0 or later.
2. You can follow the volcano installation documentation: https://volcano.sh/en/docs/v1-9-0/installation/
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm repo update
helm install volcano volcano-sh/volcano --version 1.9.0 -n volcano-system --create-namespace

3. Check that all the Volcano pods are in the Running state.
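
For example (the namespace matches the helm command above):

kubectl get pods -n volcano-system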

Volcano-vgpu-device-plugin Installation
1. You can follow the volcano-vgpu-device-plugin installation documentation: https://github.com/Project-HAMi/volcano-vgpu-device-plugin?tab=readme-ov-file#enabling-gpu-support-in-kubernetes

Edit the Volcano scheduler configmap to enable vGPU scheduling:

kubectl edit cm -n volcano-system volcano-scheduler-configmap
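
A sketch of the resulting volcano-scheduler.conf, following the upstream documentation linked above (your existing actions, tiers and plugins may differ; the relevant change is enabling the deviceshare plugin with deviceshare.VGPUEnable):

```yaml
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: deviceshare
        arguments:
          deviceshare.VGPUEnable: true   # enable vGPU sharing
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
```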


Save volcano-vgpu-device-plugin.yml locally:

```yaml
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: volcano-device-plugin
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["nodes/status"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "update", "patch", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-device-plugin
subjects:
  - kind: ServiceAccount
    name: volcano-device-plugin
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: volcano-device-plugin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: volcano-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: volcano-device-plugin
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      # This annotation is deprecated. Kept here for backward compatibility
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: volcano-device-plugin
    spec:
      tolerations:
        # This toleration is deprecated. Kept here for backward compatibility
        # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
        - key: CriticalAddonsOnly
          operator: Exists
        - key: volcano.sh/gpu-memory
          operator: Exists
          effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      serviceAccount: volcano-device-plugin
      containers:
        - name: volcano-device-plugin
          image: docker.io/projecthami/volcano-vgpu-device-plugin:v1.9.4
          args: ["--device-split-count=10"]
          lifecycle:
            postStart:
              exec:
                command: ["/bin/sh", "-c", "cp -f /k8s-vgpu/lib/nvidia/* /usr/local/vgpu/"]
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOOK_PATH
              value: "/usr/local/vgpu"
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
              add: ["SYS_ADMIN"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: lib
              mountPath: /usr/local/vgpu
            - name: hosttmp
              mountPath: /tmp
        - name: monitor
          image: docker.io/projecthami/volcano-vgpu-device-plugin:v1.9.4
          command: ["/bin/bash", "-c", "volcano-vgpu-monitor"]
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"
            - name: NVIDIA_MIG_MONITOR_DEVICES
              value: "all"
            - name: HOOK_PATH
              value: "/tmp/vgpu"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
              add: ["SYS_ADMIN"]
          volumeMounts:
            - name: dockers
              mountPath: /run/docker
            - name: containerds
              mountPath: /run/containerd
            - name: sysinfo
              mountPath: /sysinfo
            - name: hostvar
              mountPath: /hostvar
            - name: hosttmp
              mountPath: /tmp
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
            type: Directory
        - name: lib
          hostPath:
            path: /usr/local/vgpu
            type: DirectoryOrCreate
        - name: hosttmp
          hostPath:
            path: /tmp
            type: DirectoryOrCreate
        - name: dockers
          hostPath:
            path: /run/docker
            type: DirectoryOrCreate
        - name: containerds
          hostPath:
            path: /run/containerd
            type: DirectoryOrCreate
        - name: usrbin
          hostPath:
            path: /usr/bin
            type: Directory
        - name: sysinfo
          hostPath:
            path: /sys
            type: Directory
        - name: hostvar
          hostPath:
            path: /var
            type: Directory
```

kubectl create -f volcano-vgpu-device-plugin.yml

2. Check that the volcano-device-plugin pods are in the Running state.
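
For example, using the label from the DaemonSet above:

kubectl get pods -n kube-system -l name=volcano-device-plugin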


3. Check the node status and confirm that vGPU resources are now registered on the node.
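
For example (replace <node-name> with a GPU node; with this plugin the advertised resources typically include volcano.sh/vgpu-number and volcano.sh/vgpu-memory, depending on the plugin version):

kubectl describe node <node-name> | grep volcano.sh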

Running VGPU Jobs


1. Run a demo vGPU job. A minimal example is sketched below; the container image and command are illustrative, and the volcano.sh/vgpu-* resource names follow the upstream plugin documentation:

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
spec:
  schedulerName: volcano
  containers:
    - name: cuda-container              # illustrative container
      image: ubuntu:20.04               # any image works for the allocation test
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          volcano.sh/vgpu-number: 1     # number of vGPUs requested
          volcano.sh/vgpu-memory: 3000  # optional: device memory limit in MB
EOF
```

2. Check the pod status.
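
For example, the pod should reach the Running state once a vGPU has been allocated:

kubectl get pod gpu-pod1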

3. Run a single command inside the pod to check whether it is working.
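
For example, assuming the demo pod above and that nvidia-smi is available inside the container (the NVIDIA runtime normally injects it); with the vGPU hook in place, the memory reported typically reflects the volcano.sh/vgpu-memory limit rather than the whole card:

kubectl exec -it gpu-pod1 -- nvidia-smi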


Monitor
1. You can access the metrics interface of the Volcano scheduler inside the cluster. For example: curl -vvv volcano-scheduler-service.volcano-system:8080/metrics
2. You can also change the Volcano service from ClusterIP mode to NodePort mode, which will allow external access to the metrics interface.
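
For example, a sketch of switching the service type (service name taken from the curl example above; NodePort exposes the metrics port on every node):

kubectl patch service volcano-scheduler-service -n volcano-system -p '{"spec": {"type": "NodePort"}}'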
