PARAM Siddhi-AI System Manual Ver1.0
Version 1.0
February 2022
©Copyright Notice
Copyright © 2021 Centre for Development of Advanced Computing (C-DAC)
No part of this publication may be copied without the express written permission of C-DAC.
®Trademark
CDAC, CDAC logo, NSM logo are trademarks or registered trademarks.
Other brands and product names mentioned in this manual may be trademarks or
registered trademarks of their respective companies and are hereby acknowledged.
Getting Help
For technical assistance, please send an email to [email protected]
DISCLAIMER
The information contained in this document is subject to change without notice.
C-DAC shall not be liable for errors contained herein or for incidental or consequential
damages in connection with the performance or use of this manual.
PREFACE
The PARAM Siddhi-AI supercomputer manual is intended for users who want to pursue research on cutting-edge, high-performance computing (HPC) technologies and tools that combine the power of Artificial Intelligence (AI). This document aims to onboard researchers on the PARAM Siddhi-AI system.
The content of the user manual has been divided into the following chapters:
1. Prerequisites
Chapter 1 includes the prerequisites for the users of the PARAM Siddhi-AI system.
2. Target Audience
Chapter 2 describes for whom this user guide is intended and who will get the
maximum benefit from the PARAM Siddhi-AI system.
3. Introduction to PARAM Siddhi-AI System
Chapter 3 introduces the PARAM Siddhi-AI system to show how it provides a
perfect platform for HPC and AI researchers to leverage the power of GPU-based
clusters to scale up their applications with a massive amount of data.
4. PARAM Siddhi-AI System Architecture and Configuration
Chapter 4 lists the specification of the PARAM Siddhi-AI system with its detailed
architecture and configuration.
5. PARAM Siddhi-AI Accessing Mode
Chapter 5 contains the information about the different accessing modes of the
PARAM Siddhi-AI system when users try to connect it from inside or outside the
C-DAC premise.
Appendix
This section includes the sample job submission scripts, NPSF support details,
references, the VPN client manual, FAQs, online resources for further reading, and
guidance on acknowledging the PARAM Siddhi-AI system under the National
Supercomputing Mission (NSM) in publications.
Contents
1. Prerequisites
2. Target Audience
3. Introduction to PARAM Siddhi-AI System
4. PARAM Siddhi-AI System Architecture and Configuration
4.1 Login/Compile Node
4.2 Compute Nodes
4.3 Hardware Specification
4.4 Software Specifications
4.5 Packages Installed
5. PARAM Siddhi-AI System Accessing Mode
5.1 Access Mode
5.1.1 Connect to C-DAC VPN
5.1.2 Login to PARAM Siddhi-AI System
5.2 Password Change
5.3 Data Transfer between local machine and PARAM Siddhi-AI System
6. PARAM Siddhi-AI Computing Environment
6.1 Screen Utility
6.2 Nvidia-smi
7. Applications on PARAM Siddhi-AI System
7.1 Setting up Conda Environment
7.2 Working with Jupyter Notebook
7.2.1 Jupyter Notebook access from a local system inside the C-DAC network
7.2.2 Jupyter Notebook access from a local system outside the C-DAC network
7.3 Running an MPI application
7.4 CUDA Aware MPI
7.5 Working with Compilers
7.6 Working with CUDA
7.7 Working with Enroot Containers
8. Job Queueing System on PARAM Siddhi-AI System
8.1 Components of Batch Processing Systems
8.2 Job
8.3 Job Submission and Resource Utilization
8.3.1 Non-interactive (batch) mode
8.3.2 Interactive mode
8.4 Information on jobs
8.5 Controlling the jobs
8.6 Monitoring the jobs
Appendix
'A' Sample Job Submission Script
'B' Support Details
'C' References
'D' VPN Client Manual
'E' Frequently Asked Questions
'F' Acknowledging the National Supercomputing Mission in publications
LIST OF FIGURES
Figure 7.11: A glimpse of GPUDirect RDMA for CUDA Aware MPI - I
Figure 7.12: A glimpse of GPUDirect RDMA for CUDA Aware MPI- II
Figure 8.1: Status of resources
LIST OF TABLES
Table 4.1: Hardware Specifications of one compute node
Table 4.2: Software Specification
Table 4.3: Packages installed
1. Prerequisites
The primary prerequisites of the PARAM Siddhi-AI system include basic knowledge of
programming in the Linux environment. The expected secondary prerequisites are listed below:
➢ Program editor
➢ Jupyter Notebook
➢ List of some useful resources:
Conda Environment
https://docs.conda.io/en/latest/
Jupyter Notebook
https://jupyter.org/
CUDA programming
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
http://www.pgroup.com/doc/pgicudafortug.pdf
Programming tutorials
https://www.python.org/
https://www.r-project.org/
https://www.mathworks.com/help/matlab/programming-and-data-types.html
Introduction to Supercomputing
http://homepages.math.uic.edu/~jan/mcs572/
2. Target Audience
3. Introduction to PARAM Siddhi-AI System
This chapter introduces the user to the PARAM Siddhi-AI system. The PARAM Siddhi-AI
supercomputer was established under the National Supercomputing Mission (NSM) at C-DAC and
developed jointly with the support of the Department of Science and Technology (DST) and the
Ministry of Electronics and Information Technology (MeitY). This supercomputer emphasizes the
convergence of High-Performance Computing and Artificial Intelligence (HPC-AI). It achieved a
global ranking of 62 in the TOP500 list of the most powerful non-distributed computer systems in
the world, with 6.5 PF DP performance and 210 PF AI performance. The PARAM Siddhi-AI
supercomputer is built on the NVIDIA DGX A100 SuperPOD reference architecture. With its
enhanced computing capabilities, it supports the application of HPC and AI in areas such as image
processing, video analytics, brain computing, speech technologies, robotics, virtual reality,
accelerated computing, and graphics virtualization, along with traditional scientific computing.
We believe that the HPC-AI user community will benefit from the PARAM Siddhi-AI system,
which offers state-of-the-art HPC-AI facilities for running their applications.
4. PARAM Siddhi-AI System Architecture and Configuration
The PARAM Siddhi-AI system is a 42-node cluster of NVIDIA DGX A100 systems with HDR
InfiniBand interconnect, a peak computing power of 210 PF (AI) and 6.5 PF (DP), and a sustained
computing power of 4.62 PF (DP). The NVIDIA DGX A100 systems are the compute nodes,
allocated through the SLURM scheduler, and 10.5 PiB of Lustre PFS-based storage is available as
shared storage on all the nodes. Each node has 8 NVIDIA A100 Tensor Core GPUs, a dual-socket
AMD EPYC 7742 64C 2.25 GHz CPU, 320 GB of GPU memory, and 1 TB of system memory.
The cluster consists of two types of nodes:
• Login/Compile Node
• Compute Nodes
4.1 Login/Compile Node
This is the entry point to the system, where users write, compile, and build their applications.
Additionally, users can submit and track their jobs from this node. For example, here,
login-siddhi is the login and compile node.
4.2 Compute Nodes
These are the nodes on which the applications run after the user submits a job. The resource
manager clients run on these nodes, and there is no direct access to them other than through the
resource manager.
4.3 Hardware Specification
Component Specification
CPU AMD EPYC 7742 64C 2.25 GHz
CPU Cores 128 cores (dual socket, each with 64 cores) [256 cores with Hyper-Threading]
L3 Cache 256 MB
System Memory (RAM) 1 TB
GPU NVIDIA A100-SXM4
GPU Memory 40 GB
Total No. of GPUs per node 8
Storage 10.5 PiB PFS-based storage
Networking Mellanox ConnectX-6 VPI (InfiniBand HDR)
Table 4.1: Hardware Specifications of one compute node
4.4 Software Specifications
OS Ubuntu 20.04.2 LTS (DGX OS 5.0.5)
Kernel 5.4.0-80-generic
CUDA 10.1
NVIDIA Driver Version 450.142.00
NVIDIA NGC Support https://ngc.nvidia.com/signin
Table 4.2: Software Specification
4.5 Packages Installed
Software Version
Python 3.8, 3.9
HPC-X 2.9
HPC-SDK 20.7
CUDA 11.0.2, 11.2.0, 11.3.0, 11.4.0
Enroot 3.3.1
Chakshu (Multi-Cluster Monitoring Platform)
Table 4.3: Packages installed
Note: Please refer to Chapter 7 for details on using these packages in your programs.
5. PARAM Siddhi-AI System Accessing Mode
5.1 Access Mode
SSH-based access has been provided for the PARAM Siddhi-AI system, which allows users to log
in to the system through a two-stage process: first connect to the C-DAC VPN, and then log in to
the login node over SSH.
5.1.1 Connect to C-DAC VPN
Users are required to install a VPN client (FortiClient) on their local laptop/computer to connect
to the PARAM Siddhi-AI system. Please refer to the FortiClient installation-cum-user manual in
Appendix 'D' for the appropriate installation instructions. Upon successful installation of the VPN
client (FortiClient), users may connect to the VPN as per the steps below:
A) Start the FortiClient application and then enter “User ID” and “password” provided at the time
of account creation through an e-mail sent by [email protected]
Figure 5.1 : FortiClient Login Screen
B) Click on Connect button. After successfully establishing the VPN connection, you will see
the below screen.
5.1.2 Login to PARAM Siddhi-AI System
Note: Step 2 must be followed only after successfully establishing the VPN connection.
The login node is the primary gateway to the rest of the cluster, including the job scheduler (Slurm).
You may submit jobs to the queue, which will run when the required resources are available. It is
advisable not to run programs directly on the login node. Instead, the login node is used to submit
jobs, transfer data, and compile source code. (If your compilation takes more than a few minutes,
you should submit the compilation as a job to be run on the cluster.) Secure Shell (SSH) access to
the PARAM Siddhi-AI login/compile node is possible from Windows and Linux machines as per
the steps given below:
A) Accessing from Windows machine
It is recommended to use MobaXterm or PuTTY as an SSH client on a Windows machine to log in
to the PARAM Siddhi-AI system. The free Home Edition of MobaXterm and the free PuTTY client
can be downloaded from the following URLs:
https://mobaxterm.mobatek.net/download.html
or
https://the.earth.li/~sgtatham/putty/latest/w64/putty.exe
Figure 5.3 : Access through MobaXterm
• Access through Putty
After entering the user’s login password, you will get the shell of the login node
B) Accessing from Linux machine
Use the ssh command from a terminal on your local Linux machine to connect to the login node.
Example:
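A representative command (replace <username> with the login name provided to you):
$ ssh <username>@login-siddhi.pune.cdac.in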
Once the command is executed, you will be prompted for the password. After entering the user
password, you will get the shell of the login node as shown below.
Note: If you are unable to connect with the PARAM Siddhi-AI system after following the
above steps, please refer to the appendix FAQ for troubleshooting.
5.2 Password Change
Password setting criteria:
The password should be a minimum of eight (8) characters in length. In addition, the password
should contain at least two upper-case letters, two lower-case letters, two numerals, and two special
characters. The password is valid for six months, and one cannot re-use/repeat the last three
(previous) passwords.
5.3 Data Transfer between local machine and PARAM Siddhi-AI System
Users need to have the data and applications related to their project/research work on the PARAM
Siddhi-AI system. To store the data, a home directory has been made available to each user under
the path /home. While /home is common to all users, each user gets a directory named after their
username under /home where they can store their data.
However, there is a limit to the storage provided to the users, which is defined by a quota on these
directories. All users are allotted the same quota by default. When users wish to transfer data from
their local system (laptop/desktop) to the PARAM Siddhi-AI system, they can use the following
methods and tools from Windows and Linux machines.
A) Windows Machine
• Through MobaXterm
Users can copy small files from/to their local machine and PARAM Siddhi-AI system using the
MobaXterm tool by following the below steps :
i) To upload files from your local system (laptop/desktop) to the PARAM Siddhi-AI cluster, click on
the upload button as shown in the image below.
Figure 5.8 : Data upload through MobaXterm
Example :
See the below image to upload DIRAC-19.0-Source.tar file from the local system to PARAM Siddhi-AI
Cluster.
ii) To download files from the PARAM Siddhi-AI system to your local system, follow the steps shown in
below image:
Figure 5.10: Data download through MobaXterm
Example :
Select the file “DIRAC-19.0-Source.tar” file from the left panel and click on the download selected files.
It will get downloaded on your local system.
• Through WinSCP
Users can copy small files from/to their local machine and PARAM Siddhi-AI system using the
WinSCP tool by following the below steps :
Figure 5.11 : Data transfer through WinSCP
ii) After establishing the connection, to upload files from your local system to the PARAM
Siddhi-AI system or to download files from the remote system to your local system, drag and
drop the files to the respective location.
Figure 5.12: Upload/download from the remote system to your local system using WinSCP.
B) Linux Machine
Use the below commands to copy data/file(s) to/from your local Linux system and PARAM
Siddhi-AI system.
i) Upload
Example :
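A representative upload command, mirroring the download example below (replace suresht with your own user name and test.txt with your file):
$ scp test.txt [email protected]:~/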
Figure 5.13: Upload from local Linux system to PARAM Siddhi-AI system using command
line
ii) Download
Example :
$ scp [email protected]:~suresht/test.txt .
Figure 5.14: Download from PARAM Siddhi-AI to local Linux system using command line.
However, the above methods should only be used to transfer small files. If you wish to transfer a
large file or data set, we recommend uploading it to the web (for example, on GitHub) and
downloading it directly on the PARAM Siddhi-AI system with a tool such as wget or git clone.
6. PARAM Siddhi-AI Computing Environment
This chapter introduces the PARAM Siddhi-AI system's computing environment and the utilities
that help the user work with the PARAM Siddhi-AI HPC cluster.
6.1 Screen Utility
The screen utility lets you start long-running processes in a terminal session that keeps running
even after you disconnect from the login node. To start a screen session, type:
$ screen
This will open a screen session, create a new window, and start a shell in that window.
Named screen sessions are useful when you run multiple screen sessions. To create a named
session, run the screen command with the following arguments:
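For example (the session name is your choice):
$ screen -S <session name>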
You can detach from the screen session at any time by pressing Ctrl+a (press Ctrl and a together),
releasing both keys, and then pressing d:
Ctrl+a d
The program running in the screen session will continue to run after you detach from the session.
To re-attach to a screen session that is already attached in another terminal, use:
$ screen -x
Below are some of the most common commands for managing screen sessions. To resume a
detached screen session, use:
$ screen -r
In case you have multiple screen sessions running on your machine, you will need to append the
screen session ID after the -r option.
When you start a new screen session, it creates a single window with a shell in it. You can have
multiple windows inside a screen session.
To create a new window with a shell, type Ctrl+a c; the first available number from the
range 0...9 will be assigned to it.
To list the currently running screen sessions, use:
$ screen -ls
To resume a specific screen session, use the following command:
$ screen -r <session id>
6.2 Nvidia-smi
The NVIDIA System Management Interface (nvidia-smi) is a command-line utility built on top
of the NVIDIA Management Library (NVML). It is intended to aid in the management and
monitoring of NVIDIA GPU devices.
$ nvidia-smi
The output of this command is shown in figure 6.1.
7. Applications on PARAM Siddhi-AI System
This chapter presents the HPC and AI libraries/packages installed on the PARAM Siddhi-AI
system and the ways users can install additional packages themselves.
7.1 Setting up Conda Environment
A Conda environment is a directory containing a user's specific collection of installed Conda
packages. Conda allows users to manage dependencies and isolate projects that depend on different
versions of packages. One can create, export, list, remove, and update environments with different
package versions installed in them. Switching or moving between environments is called activating
and deactivating an environment. For example, you may have one environment with Python 3.7
and its dependencies for one project and another environment with Python 3.8 and its dependencies
for a second project. If you change one environment, your other environments are not affected.
In addition, you can easily activate or deactivate environments. The PARAM Siddhi-AI system
allows users to work in Conda environments efficiently. Users can install a Conda environment and
other Machine Learning/Deep Learning frameworks/packages in their home directory by following
the steps below:
I. Log into the login-siddhi node
II. Download Conda installation script as below:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
III. Create a temporary build directory
$ mkdir `pwd`/tmp
$ export TMPDIR=`pwd`/tmp
IV. Run Conda Script. This command installs the Conda environment in your home directory
under the Conda folder.
$ sh Miniconda3-latest-Linux-x86_64.sh -b -p Conda -u
V. Activate Conda environment after successful installation
$ source Conda/bin/activate
VI. Now your environment is ready to install any packages as per your requirement
For example :
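(base)$ conda install numpy
(base)$ pip install torch
(These are only illustrative package names; install whatever your project needs.)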
VII. To test the environment, you can run your python script as below:
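For example (assuming a script named hello.py in the current directory):
(base)$ python3 hello.py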
Note: You may find a quick Conda cheat sheet at below URL:
https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/con
da-cheatsheet.pdf
7.2 Working with Jupyter Notebook
Jupyter Notebook can be run from within your Conda environment. First, activate the environment:
$ source Conda/bin/activate
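If Jupyter is not already available in the environment, it can be installed first, for example with:
(base)$ conda install jupyter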
Then launch the Jupyter Notebook server on the login node, specifying a port of your choice. For example,
(base)$ jupyter notebook --ip=0.0.0.0 --port=8892 --allow-root --no-browser &
Figure 7.1: Screen after launching Jupyter notebook showing the token number
Note: The token number displayed on the screen (as shown in figure 7.1) will later be used for
logging in to the Jupyter notebook through your local web browser.
7.2.1 Jupyter Notebook access from a local system inside the C-DAC network:
I. Type the below address in your local system browser to access Jupyter notebook.
http://login-siddhi.pune.cdac.in:<port number>
For example,
http://login-siddhi.pune.cdac.in:8892
II. Enter the token number (as shown in Figure 7.1) when asked by the jupyter home
page as shown in figure 7.2
Figure 7.2 : Jupyter home page to enter the token
III. Upon successful login, you may start developing code using Jupyter notebook. Figure
7.3 shows the GUI of the Jupyter notebook
7.2.2 Jupyter Notebook Access from a local system outside the C-DAC network
Follow the command below for tunneling between the login-siddhi node and your local machine:
$ ssh -f -N -L <tunnel port>:<PARAM Siddhi-AI system>:<jupyter notebook port number> <PARAM Siddhi-AI system user login name>@<PARAM Siddhi-AI system>
Example:
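A representative command, assuming the notebook was started on port 8892 (as in figure 7.1) and 5400 is chosen as the local tunnel port:
$ ssh -f -N -L 5400:login-siddhi.pune.cdac.in:8892 <username>@login-siddhi.pune.cdac.in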
Note: Replace with appropriate <PARAM Siddhi-AI system user login name> and enter the user
password that is provided to you.
Example:
http://localhost:5400
Note: One can specify any local port number as Tunnel Port (>1024) instead of 5400.
Figure 7.4: Tunneling Process using Putty
Start Session (at top) from left column and specify Host name: login-siddhi.pune.cdac.in
Figure 7.5: Starting session using Putty
Figure 7.6: Terminal in your local machine to connect with the PARAM Siddhi-AI System
After successful login to the gateway system (on the terminal in your local machine), start the
browser on your local machine and type the URL as below:
http://localhost:5400
Enter the token number displayed on the screen (as shown in figure 7.1) in the Jupyter
notebook login page on the local machine’s web browser.
Figure 7.7: Token authentication screen of Jupyter
7.3 Running an MPI application
The Message Passing Interface (MPI) is a standardized means of exchanging messages between
multiple nodes running a parallel program across the cluster in a distributed-memory manner. On
PARAM Siddhi-AI, MPI is already installed, packaged in the NVIDIA HPC-X bundle, which
improves application performance and scalability. Users have to follow the steps below to run an
MPI application from their working directory:
First, set up the HPC-X environment:
$ source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh
Then compile and run the program.
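As a minimal sketch (assuming an MPI source file named hello_mpi.c), the program can be compiled and launched with the HPC-X wrappers as follows:
$ mpicc hello_mpi.c -o hello_mpi
$ mpirun -np 2 ./hello_mpi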
7.4 CUDA Aware MPI
The Message Passing Interface (MPI) is a standard API for communicating data via messages between
distributed processes and is commonly used in HPC to build applications that can scale to multi-node
computer clusters. Newer MPI implementations are fully compatible with CUDA, which is designed
for parallel computing on a single node or across multiple nodes. To accelerate an MPI application
with GPUs, or to enable an existing single-node multi-GPU application to scale across multiple nodes,
CUDA-aware MPI provides a way to scale up a multi-GPU application easily and efficiently. On the
PARAM Siddhi-AI supercomputer, CUDA-aware MPI is enabled and implemented through OpenMPI.
With a regular MPI implementation, only pointers to host memory can be passed to MPI. But if
we combine an MPI program with CUDA, we need to send GPU buffers instead of host buffers.
Without CUDA-aware MPI, we need to stage GPU buffers through host memory,
using cudaMemcpy as shown in the code in figure 7.9:
//MPI rank 0
cudaMemcpy(s_buf_h,s_buf_d,size,cudaMemcpyDeviceToHost);
MPI_Send(s_buf_h,size,MPI_CHAR,1,100,MPI_COMM_WORLD);
//MPI rank 1
MPI_Recv(r_buf_h,size,MPI_CHAR,0,100,MPI_COMM_WORLD, &status);
cudaMemcpy(r_buf_d,r_buf_h,size,cudaMemcpyHostToDevice);
Figure 7.9 : Sending buffers from Host to device and device to host
This is not necessary with a CUDA-aware MPI library; the GPU buffers can be passed directly to
MPI, as in the code in figure 7.10.
//MPI rank 0
MPI_Send(s_buf_d,size,MPI_CHAR,1,100,MPI_COMM_WORLD);
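//MPI rank 1 (the receive mirrors figure 7.9, but posts directly into the device buffer)
MPI_Recv(r_buf_d,size,MPI_CHAR,0,100,MPI_COMM_WORLD, &status);
Figure 7.10: Passing GPU buffers directly to MPI with CUDA-aware MPI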
CUDA Aware MPI works based on GPUDirect RDMA. The GPUDirect RDMA feature,
introduced with CUDA 5.0, supports Remote Direct Memory Access (RDMA), with which buffers
can be sent directly from GPU memory to a network adapter without staging through host
memory, as shown in figures 7.11 and 7.12.
Figure 7.11: A glimpse of GPUDirect RDMA for CUDA Aware MPI - I (Source:
https://developer.nvidia.com/blog/parallelforall/wpcontent/uploads/2013/03/GPUDirectRDMA.p
ng)
Fig 7.12 : A glimpse of GPUDirect RDMA for CUDA Aware MPI - II (Source :
https://developer.nvidia.com/blog/parallelforall/wpcontent/uploads/2013/03/GPUDirectP2P.png)
With CUDA Aware MPI, users can write programs without explicitly copying buffers between host
and device, and achieve high-bandwidth, low-latency communication with NVIDIA GPUs.
7.5 Working with Compilers
In PARAM Siddhi-AI, the NVIDIA HPC SDK version 21.7 is installed. The HPC SDK is a
comprehensive suite of compilers and libraries. Users are required to set up the HPC SDK
environment as below before using any of the available compilers:
$ source /opt/nvidia/env.sh
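For example, a C program can be compiled with the NVIDIA C compiler and then executed (a sketch assuming a source file named hello.c):
$ nvc hello.c -o hello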
$ ./hello
7.6 Working with CUDA
The PARAM Siddhi-AI system has the following versions of CUDA installed. Users can choose
any version of CUDA as per their application by adding the corresponding line below to their
job script.
• CUDA 11.0.2
$ source /opt/cuda-11.0.2/env.sh
• CUDA 11.2.0
$ source /opt/cuda-11.2.0/env.sh
• CUDA 11.3.0
$ source /opt/cuda-11.3.0/env.sh
• CUDA 11.4.0
$ source /opt/nvidia/env.sh
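After sourcing one of the environment scripts above, the corresponding nvcc can typically be used to build CUDA programs, for example (assuming a source file named vector_add.cu):
$ nvcc vector_add.cu -o vector_add
$ ./vector_add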
7.7 Working with Enroot Containers
Enroot is a simple, unprivileged tool from NVIDIA for turning container images (for example,
Docker images) into lightweight sandboxes in which applications can be run on the cluster.
Useful Links
• https://github.com/NVIDIA/enroot/blob/master/doc/usage.md
• https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf
Steps to use Enroot
I. On a login node, start a screen session (see Section 6.1), then import the required container
image. For example, to import an Ubuntu image, use the command below:
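$ enroot import docker://ubuntu
This produces a squashfs image file (for example, ubuntu.sqsh) in the current directory.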
II. Use the command below to request 1 node and 2 GPUs for 1 hour of resource time on the
PARAM Siddhi-AI system:
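A representative allocation command, following the core-to-GPU ratio described in Chapter 8 (32 cores per GPU):
$ srun --nodes=1 --ntasks-per-node=64 --gres=gpu:A100-SXM4:2 --time=01:00:00 --pty bash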
Once the compute node shell has been granted, create the Enroot container by following the
next step.
Once you have an Enroot image, you generally want to create an Enroot container
for running your application within it. For this, you need to use the enroot create
command to expand the Enroot image into a proper file system stored locally.
This command creates an Enroot container out of an Enroot image. It uses the same
name as the Enroot image without any extension by default.
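For example, the image imported above can be expanded into a container as follows (ubuntu.sqsh is the file produced by enroot import):
$ enroot create ubuntu.sqsh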
Note: If you need two containers out of a single image, you need to run this
command twice; please read the official Enroot documentation
https://github.com/NVIDIA/enroot/blob/master/doc/usage.md to check how to
assign different names to different Enroot containers. You will have one such file
system for every Enroot container you create.
C) Running Software Inside an Existing Enroot Container with start
I. Once you have an Enroot container, you can run an application within the boundaries of that
container (i.e., with the software stack defined by that container) using the enroot start
command; see the examples after this list.
II. If you need to run something as root inside the container, you can use the --root option.
Remember: you are root only inside the container, not on the machine where the container is
running.
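For example (assuming a container named ubuntu created as above; the command to run inside it is your choice):
$ enroot start ubuntu
$ enroot start --root ubuntu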
All users of the PARAM Siddhi-AI system are subscribed to the NVIDIA NGC portal:
https://ngc.nvidia.com/signin
Create an account on this portal. The catalog of available NVIDIA NGC containers can be
consulted here:
https://ngc.nvidia.com/catalog/containers
To import (pull, in Docker terminology) these containers, you need an API key associated with
your NVIDIA NGC account. You can generate your API key here:
https://ngc.nvidia.com/setup/api-key. For the rest of this section, we will refer to your generated
API key as <API_KEY>.
To configure Enroot for using your API key, create the file enroot/.credentials within your
$HOME and append the following line to it:
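The line follows the standard netrc format used by Enroot (replace <API_KEY> with your own key):
machine nvcr.io login $oauthtoken password <API_KEY>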
After doing this, you can import containers from Nvidia NGC. For example, the latest
TensorFlow container can be imported as indicated below.
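A representative import command (the tag shown is only an example; pick the release you need from the NGC catalog):
$ enroot import 'docker://$oauthtoken@nvcr.io#nvidia/tensorflow:21.07-tf2-py3'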
8. Job Queueing System on PARAM Siddhi-AI System
The Batch Processing System facilitates the execution of a series of programs or jobs without
human intervention. Batch jobs are set up so that they can run to completion without manual
intervention, in contrast to interactive programs, which require manual user input. These batch
jobs take a set of data files as input, process the data, and produce a set of output data files. This
operating environment is termed a Batch Processing System.
8.1 Components of Batch Processing Systems
The Batch Processing System of a cluster comprises two primary components, as mentioned below:
Compile/Login nodes
These nodes provide an entry point to the system for users to write, compile and build their
applications. Additionally, the user can submit and track their jobs from these nodes. Depending
on the size of the Cluster, the number of such nodes may vary from one to many. For example,
here, login-siddhi is the login and compile node.
Compute nodes
These nodes are the ones on which the application runs after the user submits the job. These are
the nodes on which resource manager clients run. There is no direct access to these nodes other
than through the resource manager.
8.2 Job
A job can be defined as an entity through which a user specifies the parameters needed to execute
an application in batch mode. These are usually specified in a job file, which typically contains
details such as the resources requested (nodes, cores, GPUs), the wall-clock time limit, the output
and error file names, and the commands to run (see the sample script in Section 8.3.1).
8.3 Job Submission and Resource Utilization
If the application has been built successfully on the PARAM Siddhi-AI System, the next step in
the process to run the application is getting ready with a resource manager-based job submission
script. The job submission script is a shell script based on the resource manager present on the
system. The user submits the batch job command file to Slurm to run the application. The job
description file contains the details like the computational resources requested, the wall time of the
job, the job queue, etc., along with the application, its path, and the environment for its execution.
Jobs handled by the Slurm queuing system are of two types, namely: Non-interactive (batch)
mode and Interactive mode.
8.3.1 Non-interactive (batch) mode
The job is submitted to the queuing system using a job script containing the resource request and the
commands necessary to execute the job. Please refer to the sample SLURM-based job submission
script below (for example, the name of the script is myjob1.sh).
##############################################################
#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
cd $SLURM_SUBMIT_DIR
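# Run your application from here; for example (my_application is a hypothetical executable name):
# ./my_application
##############################################################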
Note:
i) Use 1 GPU for testing your script.
ii) For reserving one GPU, please reserve 32 CPU cores of a node; the same rule extends to
multiple GPUs. For example, to reserve 2 GPUs, you should reserve 64 cores, and so on.
Submit the job with the sbatch command. Example:
$ sbatch myjob1.sh
Note: Replace myjob1.sh with your job script.
8.3.2 Interactive mode
In interactive mode, the user gets access to a terminal on one of the compute nodes, allowing
commands to be executed directly on that node (available for the specified wall time). The job
should be started inside a screen session (see Section 6.1) so that it survives disconnections.
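For example, an interactive shell on a compute node with 1 GPU (and the corresponding 32 cores) for one hour can be requested with a command along these lines:
$ srun --nodes=1 --ntasks-per-node=32 --gres=gpu:A100-SXM4:1 --time=01:00:00 --pty bash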
When a job is submitted in interactive mode, the user lands on the assigned compute node upon the
allocation of resources. Once you get the shell, you can run your commands directly on that node.
• Note: For reserving two GPUs, please reserve 64 CPU cores of a node; the same rule extends to
multiple GPUs (for reserving 3 GPUs, please reserve 96 cores, and so on).
• Note: Wall time: the default wall time is 1 hour and the maximum wall time is 7 days (168 hours).
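The estimated start time of a pending job can be checked, for example, with squeue's --start option:
$ squeue --start -j <job id>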
Note: Please use the above command to get the scheduler's estimate of when your pending/idle
job will start running. It is, of course, just the scheduler's best estimate, given current conditions,
and the actual time a job starts might be earlier or later than that depending on factors such as
the behavior of currently running jobs, the submission of new jobs, and hardware issues, etc.
• List all running jobs for a user:
squeue -u <username> -t RUNNING
• To cancel all the pending jobs for a user:
scancel -t PENDING -u <username>
8.6 Monitoring the jobs
The following section gives an illustration of the output of three commands, namely (1) sinfo,
(2) squeue, and (3) scontrol, used to analyze the status of jobs and resources.
1) sinfo: Displays the status of the nodes and partitions (queues). In the example output, one node
(scn35) in the testp queue is idle, while the other queues share six nodes (scn7, scn9, scn10, scn15,
scn30, scn36) that are currently in use by running jobs. By combining this with the Linux watch
command, we can make a simple display that refreshes periodically.
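For example, the following commands show the resource status once and then as a display that refreshes every 10 seconds:
$ sinfo
$ watch -n 10 sinfo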
Figure 8.2: List of jobs running/queued
Notice that the first job is in state PD (pending) and is waiting for a node to become available.
The second job is in state R (running) and is executing on node scn7.
Figure 8.3: Display of status of the job in the specified format
3) squeue --user=<username>: Displays running and pending jobs for an individual user.
The projected start time of the job can be queried by executing the below command:
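One way to do this is with squeue's --start option, for example:
$ squeue --start --user=<username>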
4) scontrol: This is the administrative tool used to view and/or modify the job state.
For example:
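$ scontrol show job <job id>
This displays the full record of the specified job, including its state, allocated nodes, and time limits.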
Appendix
'A' Sample Job Submission Script
1) Batch script for single node single GPU (1 GPU, hence 32 cores): use the sample script shown
in Section 8.3.1.
Note:
• --ntasks-per-node specifies the number of cores.
• Use 1 GPU for testing your script.
• For reserving one GPU, please reserve 32 CPU cores of a node; the same rule extends to multiple
GPUs (for example, to reserve 16 GPUs, please reserve 512 cores).
2) Batch script for single node multi-GPU (2 GPUs, hence 64 cores):
############################################################
#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=64
#SBATCH --gres=gpu:A100-SXM4:2
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
cd $SLURM_SUBMIT_DIR
############################################################
3) Batch script for multi-node and multi-GPU (2 nodes, hence 16 GPUs and 512 cores):
Since 1 node has 8 GPUs, --gres=gpu:A100-SXM4:8 in the script below requests the 8 GPUs of
each node. The script also requests two nodes, so 8 * 2 = 16 GPUs will be allocated. Similarly,
--ntasks-per-node=256 means 256 * 2 = 512 cores will be allocated.
######################################################
#! /bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=256
#SBATCH --gres=gpu:A100-SXM4:8
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
cd $SLURM_SUBMIT_DIR
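# Launch the application across the allocated nodes from here, for example with the
# MPI launcher from HPC-X (a sketch; my_mpi_app is a hypothetical executable):
# mpirun -np $SLURM_NTASKS ./my_mpi_app
######################################################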
4) Batch script for Python program on single node Single GPU
######################################################
#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
cd $SLURM_SUBMIT_DIR
python3 hello.py
##########################################################
5) Batch script to run Python program in conda environment on single node
Single GPU
######################################################
#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
cd $SLURM_SUBMIT_DIR
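# Activate the Conda environment before running the Python program
# (assumes Miniconda was installed in $HOME/Conda as described in Section 7.1)
source ~/Conda/bin/activate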
python3 hello.py
##########################################################
‘B’ Support Details
It has been a long guide; we appreciate your patience if you have read it in one go up to this point.
We do understand that many things may have been skipped and the guide may not be complete. If
you need any further information/assistance, please feel free to contact us at [email protected].
HAPPY SUPERCOMPUTING!!
‘C’ References
1. http://unixhelp.ed.ac.uk
2. http://www.ee.surrey.ac.uk/Teaching/Unix/
3. http://unixhelp.ed.ac.uk/vi/
4. https://documentation.help/PuTTY/documentation.pdf
5. https://winscp.net/eng/docs/start
6. https://mobaxterm.mobatek.net/documentation.html
7. https://docs.nvidia.com/dgx/pdf/dgxa100-user-guide.pdf
8. https://www.nvidia.com/en-in/data-center/nvidia-ampere-gpu-architecture/
9. https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
10. https://www.dataquest.io/blog/jupyter-notebook-tutorial/
11. https://ngc.nvidia.com/catalog/all
12. https://github.com/NVIDIA/enroot
13. https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf
14. https://developer.nvidia.com/networking/hpc-x
15. https://developer.nvidia.com/hpc-sdk
16. https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/
17. https://support.ceci-hpc.be/doc/_contents/SubmittingJobs/SlurmFAQ.html
‘D’ VPN Client Manual
1. Follow the instructions below to manually install the Fortinet SSL VPN client on Windows.
Download the installer from:
URL: https://forticlient.com/downloads
2. Once the file is downloaded, right-click on the file, select “run” (as an administrator or
equivalent) and proceed as follows:
3. Click on “Next”
4. Select “Secure Remote Access” for SSL-VPN and IPSec-VPN and press “Next”
5. Click on “Next”
Note: Click “Next” to install to the default folder or click “Change” to choose another
folder.
6. Click on “Install”
7. Installation is done and click on Finish button.
Configuring Fortinet SSL-VPN client On Windows OS
The client is now installed and needs to be configured. To configure the client, click on
Windows Start/All Programs/FortiClient/FortiClient SSL VPN and the following box opens.
You are now ready to configure the connection.
9. Enter a "Connection Name" (could be anything) and the "Remote Gateway" (hackathon.cdac.in)
as mentioned below:
10. Enter the CDAC provided username, password and press connect:
After entering the username and password, you will receive a token on your email ID.
The connection is in process.
12. SSL-VPN Connection will get established after getting the appropriate IP address.
13. Now, you can access the C-DAC Pune unified network.
Procedure for installing Fortinet SSL VPN Client on Linux
• Open a terminal.
• Check the Linux distribution architecture with # uname -a (for example: 32-bit LSB).
• Go to the folder where the file has been downloaded and extract it with
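For example (replace the archive name with the file you downloaded):
$ tar -xvf <downloaded file>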
3) Use CDAC provided username and password in the login window
After entering the username and password, you will receive a token on your email ID.
• At this stage, the connection is established.
Procedure for installing Fortinet SSL VPN Client
on MacOS
• Click on Install to begin the installation.
3) Configure FortiClient
• Open the installed FortiClient VPN Application.
• Fill up the connection details as depicted in the image below.
• The following settings are to be filled in:
I. VPN: Select SSL-VPN
II. Connection Name: Name of the connection
III. Description: Describe the connection purpose
IV. Remote Gateway: hackathon.cdac.in
V. Leave remaining as default
• Save the connection.
4) Connect to C-DAC Network via SSL VPN Client
• Open the Fortinet VPN Client for establishing connection.
• Use CDAC provided username and password to connect to
VPN
After entering the username and password, you will receive a token on your email ID.
• Wait for 1-2 minutes for the SSL-VPN connection to establish.
‘E’ Frequently Asked Questions
This chapter lists the frequently asked questions about the PARAM Siddhi-AI system. Users can
search for their query in the FAQ, and if the query is not listed or the explanation seems
insufficient, the user can raise a request with NPSF support.
Q1. I have successfully configured the FortiClient but am still unable to log in. Where should
I get help?
If you are unable to connect with the PARAM Siddhi-AI system after following the procedures
explained in chapter 5, please share snapshots of the following information and send an e-mail
to [email protected] :
Q3. What are the available versions of CUDA in the PARAM Siddhi-AI system?
CUDA 11.0.2, 11.2.0, 11.3.0, and 11.4.0 are installed; see Section 7.6 for how to select a version in
your job script.
Q4. When will my job start?
In the PARAM Siddhi-AI system, the scheduler allocates the resources to the jobs and estimates
when a job will start running. A pending job also shows the start time assigned by the scheduler's
best estimate based upon the given current conditions. The actual time a job starts might be earlier
or later than that, depending on factors such as the behavior of currently running jobs, the
submission of new jobs, hardware issues, etc. (refer to Chapter 8, Section 8.4).
Q5. How can I get my job to start early?
First of all, make sure you only request the resources you need. The more you ask, the longer
you may have to wait. Then, try and make your job flexible in terms of resources.
Q6. How do I cancel a submitted job?
Use the scancel <jobid> command with the jobid of the job you want to cancel.
To set up the environment for building and running MPI applications (see Section 7.3), source:
$ source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh
Q7. What are the special features the PARAM Siddhi-AI system has for AI/ML/DL model
training?
PARAM Siddhi-AI is equipped with Tensor Core GPU nodes. A tensor is a mathematical object
that describes the relationship between other mathematical objects linked together; the matrix is
the most popular tensor used for most deep learning operations such as image processing. The
Siddhi-AI system's NVIDIA Ampere architecture provides a huge performance boost and delivers
new precisions to cover the full spectrum required from research to production: FP32, Tensor
Float 32 (TF32), FP16, INT8, INT4, and bfloat16. In addition, with Tensor Cores enabled, it
dramatically accelerates throughput and reduces AI training times.
Q8. After logging into the login-siddhi node, how do I start running my first AI/ML/DL
program?
You can start working with Conda environment and enroot containers, as explained in chapter 7.
You can install any AI/ML/DL packages of your choice in Conda environment to match your
application. Similarly, you can pull the ready containers of pre-trained models from Nvidia NGC.
Q9. How do I perform multi GPU training for large neural networks?
PARAM Siddhi-AI system supports ‘Single node Multi GPU’ and ‘Multi node Multi GPU.’ You
can install Horovod (https://horovod.ai/) for distributed deep learning training in Conda
environment.
Q10. How do I run my application with AI frameworks like TensorFlow, Caffe, PyTorch,
MXNet, and cuDNN?
Since AI frameworks and cuDNN are highly dependent on each other, they are version-specific
too. Therefore, we recommend installing any AI framework (TensorFlow, Caffe, PyTorch,
MXNet) in a Conda environment and using the cuDNN installed with these frameworks. Also,
you can choose any CUDA version installed on PARAM Siddhi-AI for your application.
‘F’ Acknowledging the National Supercomputing Mission in publications
If you use the supercomputers and services provided under the National Supercomputing Mission,
Government of India, please let us know of any published results, including Student Thesis,
Conference Papers, Journal Papers, and patents obtained.
Also, please submit the copies of dissertations, reports, reprints, and URLs in which “PARAM
Siddhi-AI under the National Supercomputing Mission, Government of India” is acknowledged
to: