
PARAM Siddhi-AI User Manual

Version 1.0
February 2022

National PARAM Supercomputing Facility


Centre for Development of Advanced Computing, Pune

©Copyright Notice
Copyright © 2021 Centre for Development of Advanced Computing

All Rights Reserved.

Any technical documentation that is made available by C-DAC (Centre for Development of Advanced Computing) is the copyrighted work of C-DAC and is owned by C-DAC. This technical documentation is being delivered to you as is, and C-DAC makes no warranty as to its accuracy or use. Any use of the technical documentation or the information contained therein is at the risk of the user. C-DAC reserves the right to make changes without prior notice.

No part of this publication may be copied without the express written permission of C-DAC.

®Trademark
CDAC, the CDAC logo, and the NSM logo are trademarks or registered trademarks.

Other brands and product names mentioned in this manual may be trademarks or
registered trademarks of their respective companies and are hereby acknowledged.

Getting Help
For technical assistance, please send an email to [email protected]

DISCLAIMER
The information contained in this document is subject to change without notice.

C-DAC shall not be liable for errors contained herein or for incidental or consequential
damages in connection with the performance or use of this manual.
PREFACE

The PARAM Siddhi-AI supercomputer manual is intended for users who want to pursue research with
cutting-edge high-performance computing (HPC) technologies and tools that combine the power of
Artificial Intelligence (AI). This document aims to onboard researchers onto the PARAM Siddhi-AI
system.
The content of the user manual is divided into the following chapters:
1. Prerequisites
Chapter 1 includes the prerequisites for the users of the PARAM Siddhi-AI system.

2. Target Audience
Chapter 2 describes the audience for whom this user guide is intended and who will get the
maximum benefit from the PARAM Siddhi-AI system.
3. Introduction to PARAM Siddhi-AI System
Chapter 3 introduces the PARAM Siddhi-AI system to show how it provides a
perfect platform for HPC and AI researchers to leverage the power of GPU-based
clusters to scale up their applications with a massive amount of data.
4. PARAM Siddhi-AI System Architecture and Configuration
Chapter 4 lists the specification of the PARAM Siddhi-AI system with its detailed
architecture and configuration.
5. PARAM Siddhi-AI Accessing Mode

Chapter 5 contains the information about the different accessing modes of the
PARAM Siddhi-AI system when users try to connect it from inside or outside the
C-DAC premise.

6. PARAM Siddhi-AI Computing Environment

Chapter 6 describes the overall computing environment of the system that helps users
submit their jobs.

7. Application Environment on PARAM Siddhi-AI System

Chapter 7 presents the HPC and AI libraries/packages installed on the system and
guides users in installing their own packages for their research.

8. Job Queueing on PARAM Siddhi-AI System
Chapter 8 details submitting jobs on the PARAM Siddhi-AI system using the SLURM
scheduler and monitoring them.

Appendix
This section includes the sample job submission scripts, NPSF support details,
references, the VPN client manual, FAQs, online resources for further reading, and
guidance on acknowledging the PARAM Siddhi-AI system under the National
Supercomputing Mission (NSM) in publications.
Contents
1. Prerequisites
2. Target Audience
3. Introduction to PARAM Siddhi-AI System
4. PARAM Siddhi-AI System Architecture and Configuration
4.1 Login/Compile Node
4.2 Compute Nodes
4.3 Hardware Specification
4.4 Software Specifications
4.5 Packages Installed
5. PARAM Siddhi-AI System Accessing Mode
5.1 Access Mode
5.1.1 Connect to C-DAC VPN
5.1.2 Login to PARAM Siddhi-AI System
5.2 Password Change
5.3 Data Transfer between local machine and PARAM Siddhi-AI System
6. PARAM Siddhi-AI Computing Environment
6.1 Screen Utility
6.2 Nvidia-smi
7. Applications on PARAM Siddhi-AI System
7.1 Setting up Conda Environment
7.2 Working with Jupyter Notebook
7.2.1 Jupyter Notebook access from a local system inside the C-DAC network
7.2.2 Jupyter Notebook access from a local system outside the C-DAC network
7.3 Running an MPI application
7.4 CUDA Aware MPI
7.5 Working with Compilers
7.6 Working with CUDA
7.7 Working with Enroot Containers
8. Job Queueing System on PARAM Siddhi-AI System
8.1 Components of Batch Processing Systems
8.2 Job
8.3 Job Submission and Resource Utilization
8.3.1 Non-interactive (batch) mode
8.3.2 Interactive mode
8.4 Information on jobs
8.5 Controlling the jobs
8.6 Monitoring the jobs
Appendix
‘A’ Sample Job Submission Script
‘B’ Support Details
‘C’ References
‘D’ VPN Client Manual
‘E’ Frequently Asked Questions
‘F’ Acknowledging the National Supercomputing Mission in publications

LIST OF FIGURES

Figure 4.1: Architecture diagram of the PARAM Siddhi-AI System


Figure 5.1: FortiClient Login Screen
Figure 5.2: FortiClient successful connection screen
Figure 5.3: Access through MobaXterm
Figure 5.4: Login screen in MobaXterm
Figure 5.5: Terminal of login node in MobaXterm
Figure 5.6: Access through Putty
Figure 5.7: Login screen in Putty
Figure 5.8: Data upload through MobaXterm
Figure 5.9: Data browse through MobaXterm
Figure 5.10: Data download through MobaXterm
Figure 5.11: Data transfer through WinSCP
Figure 5.12: Upload or download from the remote system to your local system using WinSCP.
Figure 5.13: Upload from local Linux system to PARAM Siddhi-AI system using command line.
Figure 5.14: Download from PARAM Siddhi-AI to local Linux system using command line.
Figure 6.1: Output of nvidia-smi command
Figure 7.1: Screen after launching Jupyter notebook showing the token number
Figure 7.2: Jupyter home page to enter the token
Figure 7.3: GUI of Jupyter notebook at local system's browser
Figure 7.4: Tunneling Process using Putty
Figure 7.5: Starting session using Putty
Figure 7.6: Terminal in your local machine to connect with the PARAM Siddhi-AI System
Figure 7.7: Token authentication screen of Jupyter
Figure 7.8: Dashboard of Jupyter
Figure 7.9: Sending buffers from Host to device and device to host
Figure 7.10: CUDA Aware MPI communication without copying buffer between host and device

Figure 7.11: A glimpse of GPUDirect RDMA for CUDA Aware MPI - I
Figure 7.12: A glimpse of GPUDirect RDMA for CUDA Aware MPI- II
Figure 8.1: Status of resources

Figure 8.2: List of jobs running/queued

Figure 8.3: Display of status of the job in the specified format

Figure 8.4: User wise display of job

LIST OF TABLES

Table 4.1 Hardware Specifications


Table 4.2 Software Specification
Table 4.3: Packages installed
Table 8.1: Description of job state

1. Prerequisites

The primary prerequisite for using the PARAM Siddhi-AI system is basic knowledge of
programming in the Linux environment. The expected secondary prerequisites are listed below:

➢ Working knowledge of terminal/shell/command-line interface

➢ Basic knowledge of file and directory handling Linux commands


Ex. ls, cd, mkdir, cp, mv, rm, chmod, tar, gzip etc.

➢ Use of the text editors


Ex. GNU nano, vi, vim, emacs

➢ Program editors
Ex. Jupyter Notebook

➢ Knowledge of file transfer tools


Ex. scp, sftp

➢ Knowledge of Artificial Intelligence, Machine Learning, Deep Learning

➢ Working knowledge of Artificial Intelligence and scientific computing


Framework
Ex. TensorFlow, Keras, PyTorch, MXNet

➢ Practical knowledge of AI packages


Ex. cuDNN, SciPy, Pandas, NumPy, scikit-learn, pip

➢ Working knowledge of the virtual environment


Ex. Virtualenv, Conda, venv

➢ Basic understanding of GPU Programming using CUDA

➢ Knowledge of Docker containers

➢ Use of data visualization tools


Ex. Matplotlib, Seaborn, Gnuplot, Plotly

➢ Coding in high-level languages


Ex. Python, R, Fortran, C, C++

➢ Familiarity with parallel programming (threads, shared memory, multi-GPU, multi-node)
Ex. MPI, OpenMP, Horovod, NCCL

➢ List of some useful resources :

Conda Environment
https://docs.conda.io/en/latest/

Jupyter Notebook
https://jupyter.org/

CUDA programming
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
http://www.pgroup.com/doc/pgicudafortug.pdf

GPU Computing with Python for Deep Learning


https://developer.nvidia.com/how-to-cuda-python

Deep Learning Frameworks


https://www.tensorflow.org/
https://keras.io/
https://pytorch.org/
https://mxnet.apache.org/versions/1.7.0/

Deep Learning GPU-Optimised pre-trained models


https://www.nvidia.com/en-in/gpu-cloud/

Distributed Deep Learning


https://horovod.ai/
https://github.com/horovod/horovod
https://eng.uber.com/horovod/

Programming tutorials
https://www.python.org/
https://www.r-project.org/
https://www.mathworks.com/help/matlab/programming-and-data-types.html

Parallel programming tutorials


https://computing.llnl.gov/tutorials/mpi/
https://computing.llnl.gov/tutorials/openMP/
http://www.citutor.org

Introduction to Supercomputing
http://homepages.math.uic.edu/~jan/mcs572/

Supercomputing in India- National Supercomputing Mission


https://dst.gov.in/india-emerging-leader-supercomputing
https://nsmindia.in/

2. Target Audience

This user manual focuses on the following researchers:

➢ HPC researchers who want to harness the power of Artificial Intelligence.
➢ Scientists and researchers who wish to use the PARAM Siddhi-AI system, a dense GPU-
enabled HPC/AI system, for carrying out their scientific and engineering work.
➢ AI researchers who wish to train complex networks on HPC systems and would like to
know how to use the PARAM Siddhi-AI system.

3. Introduction to PARAM Siddhi-AI System

This chapter introduces the user to the PARAM Siddhi-AI system. The PARAM Siddhi-AI
supercomputer was established under the National Supercomputing Mission (NSM) at C-DAC and
developed jointly with the support of the Department of Science and Technology (DST) and the
Ministry of Electronics and Information Technology (MeitY). This supercomputer emphasizes the
convergence of High-Performance Computing and Artificial Intelligence (HPC-AI). It achieved a
global ranking of 62 in the TOP500 list of the most powerful non-distributed computer systems in
the world, with 6.5 PF double-precision (DP) performance and 210 PF AI performance. The PARAM
Siddhi-AI supercomputer is built on the NVIDIA DGX A100 SuperPOD reference architecture. With
its enhanced computing capabilities, it supports the application of HPC and AI in areas such as
image processing, video analytics, brain computing, speech technologies, robotics, virtual reality,
accelerated computing, and graphics virtualization, along with traditional scientific computing.

PARAM Siddhi-AI: Convergence of HPC and AI


PARAM Siddhi-AI incorporates Graphics Processing Unit (GPU) clusters to boost processing
power. These are purpose-built processors that can run deep learning models, central to most AI
applications, an order of magnitude faster than traditional CPUs.

Optimizations and Scalability with AI Software Suites


The PARAM Siddhi-AI system's hardware is complemented by a rich software library suite in
addition to popular AI frameworks such as TensorFlow, MXNet, Caffe, Theano, and Torch. This
manual will help researchers implement HPC and AI applications together on India's indigenously
built supercomputer. In addition, it aims to motivate researchers to explore AI and HPC research
together in new directions.

We believe that the HPC-AI user community will benefit from the PARAM Siddhi-AI
system, which offers state-of-the-art HPC-AI facilities for running their applications.

4. PARAM Siddhi-AI System Architecture and Configuration

The PARAM Siddhi-AI System is a 42-node cluster of NVIDIA DGX A100 systems with an HDR
InfiniBand interconnect, a peak computing power of 210 PF (AI) and 6.5 PF (DP), and a sustained
computing power of 4.62 PF (DP). NVIDIA DGX A100 systems serve as the compute nodes,
allocated through the SLURM scheduler, and 10.5 PiB of Lustre PFS-based storage has been made
available as shared storage on all the nodes. Each node has 8 NVIDIA A100 Tensor Core
GPUs, a dual-socket AMD EPYC 7742 64C 2.25 GHz CPU, 320 GB of GPU memory, and 1 TB of
system memory.

Figure 4.1 Architecture diagram of the PARAM Siddhi-AI System


The PARAM Siddhi-AI System comprises two types of nodes:

• Login/compile Node
• Compute Nodes

4.1 Login/Compile Node
This is an entry point to the system for users to write, compile and build their applications.
Additionally, the user can submit and track their jobs from this node. For example, here, login-
siddhi is the login and compile node.

4.2 Compute Nodes


These nodes are the ones on which the application runs after the user submits the job. There is no
direct access to compute nodes other than through the SLURM scheduler.

4.3 Hardware Specification

Component Specification
CPU AMD EPYC 7742 64C 2.25 GHz
CPU Cores 128 cores (dual socket, each with 64 cores)
[256 cores with Hyper-Threading]
L3 Cache 256 MB
System Memory (RAM) 1 TB
GPU NVIDIA A100-SXM4
GPU Memory 40 GB per GPU
Total No. of GPUs per node 8
Storage 10.5 PiB PFS-based storage
Networking Mellanox ConnectX-6 VPI (InfiniBand HDR)
Table 4.1: Hardware specifications of one compute node

4.4 Software Specifications

Software Specifications
OS Ubuntu 20.04.2 LTS (DGXOS 5.0.5)
Kernel 5.4.0-80-generic
CUDA 10.1
NVIDIA Driver Version 450.142.00
NVIDIA NGC Support https://ngc.nvidia.com/signin
Table 4.2 Software Specification

4.5 Packages Installed

Software Version
Python 3.8, 3.9
HPC-X 2.9
HPC-SDK 20.7
CUDA 11.0.2, 11.2.0, 11.3.0, 11.4.0
Enroot 3.3.1
Chakshu (Multi-Cluster Monitoring Platform)
Table 4.3: Packages installed
Note: Please refer to chapter 7 for details on using these packages in your program.

5. PARAM Siddhi-AI System Accessing Mode

5.1 Access Mode

SSH-based access has been provided for the PARAM Siddhi-AI system, which allows
users to log in to the system through a two-stage process:

Step 1: Connect to C-DAC VPN


Step 2: Log in to PARAM Siddhi-AI System
Note: Users need not connect through the VPN if accessing the system from within the C-DAC premises.

5.1.1 Connect to C-DAC VPN

Users are required to install a VPN client (FortiClient) on their local laptop/computer to connect
to the PARAM Siddhi-AI system. Please refer to the FortiClient installation and user manual in
Annexure ‘D’ for the appropriate installation instructions. Upon successful installation of the VPN
client (FortiClient), users may connect to the VPN as per the steps below:
A) Start the FortiClient application and then enter the “User ID” and “password” provided at the time
of account creation through an e-mail sent by [email protected]

Figure 5.1 : FortiClient Login Screen

B) Click on the Connect button. After the VPN connection is successfully established, you will see
the screen below.

Figure 5.2: FortiClient Successful connection screen

Note: Step 2 must be followed only after successfully establishing the VPN connection.

5.1.2 Login to PARAM Siddhi-AI System

The login node is the primary gateway to the rest of the cluster, including the job scheduler (SLURM).
You may submit jobs to the queue, and they will run when the required resources are available. It is
advisable not to run programs directly on the login node. Instead, the login node is used to
submit jobs, transfer data, and compile source code. (If your compilation takes more than a few
minutes, you should submit the compilation as a job into the queue to be run on the cluster.) The
PARAM Siddhi-AI login/compile node can be accessed over Secure Shell (SSH) from Windows
and Linux machines as per the steps given below:
A) Accessing from Windows machine
It is recommended to use the MobaXterm or PuTTY software tools as an SSH client on a Windows machine to
log in to the PARAM Siddhi-AI system. The free Home Edition of each can be downloaded from the following
URLs:

https://mobaxterm.mobatek.net/download.html

or

https://the.earth.li/~sgtatham/putty/latest/w64/putty.exe

• Access through MobaXterm

Figure 5.3 : Access through MobaXterm

Figure 5.4: Login screen in MobaXterm


After entering the user’s login password, you will get the shell of the login node

Figure 5.5: Terminal of login node in MobaXterm

• Access through Putty

Figure 5.6 : Access through putty

Figure 5.7: Login screen in Putty

After entering the user’s login password, you will get the shell of the login node

B) Accessing from Linux machine:


Users can log in to the PARAM Siddhi-AI system using the ssh command from their Linux-based local
laptop/desktop by following the steps below:

$ ssh login-siddhi.pune.cdac.in -X -l <user name>

Example :

$ ssh login-siddhi.pune.cdac.in -X -l suresht

Once the command is executed, you will be prompted for a password. After entering the password,
you will get the shell of the login node as shown below.
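For convenience, you may optionally add an entry for the login node to your local ~/.ssh/config file. This is only a sketch; the host alias siddhi is an arbitrary placeholder:

Host siddhi
    HostName login-siddhi.pune.cdac.in
    User <user name>
    ForwardX11 yes

With this entry in place, the login command shortens to: $ ssh siddhi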

Note: If you are unable to connect with the PARAM Siddhi-AI system after following the
above steps, please refer to the appendix FAQ for troubleshooting.

5.2 Password Change


Use the command “passwd” to change your password, and follow the password criteria given below while
changing it.
Syntax: passwd <user name>

Password setting criteria:

The password should be a minimum of eight (8) characters in length. In addition, the password should contain
at least two upper-case letters, two lower-case letters, two numerals, and two special characters.
A password is valid for six months, and the last three (previous) passwords cannot be re-used.

5.3 Data Transfer between local machine and PARAM Siddhi-AI System

Users need to keep the data and applications related to their project/research work on the PARAM
Siddhi-AI system. For this purpose, storage has been made available under the “home” directory,
whose path is “/home”. While this directory is common to all users, each user gets their own
directory, named after their username, under /home/, where they can store their data.

However, there is a limit to the storage provided to each user, which is defined as a quota on these
directories. All users are allotted the same quota by default. When users wish to transfer data from
their local system (laptop/desktop) to the PARAM Siddhi-AI system, they can use various methods and
tools from Windows and Linux machines.

A) Windows Machine
• Through MobaXterm
Users can copy small files from/to their local machine and PARAM Siddhi-AI system using the
MobaXterm tool by following the below steps :

i) To upload files from your local system (laptop/desktop) to the PARAM Siddhi-AI cluster, click on the
upload button as shown in the image below.

Figure 5.8 : Data upload through MobaXterm
Example :
See the below image to upload DIRAC-19.0-Source.tar file from the local system to PARAM Siddhi-AI
Cluster.

Figure 5.9 : Data browse through MobaXTerm

ii) To download files from the PARAM Siddhi-AI system to your local system, follow the steps shown in
below image:

Figure 5.10: Data download through MobaXterm
Example :
Select the “DIRAC-19.0-Source.tar” file in the left panel and click on “Download selected files”.
It will be downloaded to your local system.

• Through WinSCP
Users can copy small files from/to their local machine and PARAM Siddhi-AI system using the
WinSCP tool by following the below steps :

WinSCP can be downloaded from the URL: https://winscp.net/eng/download.php

Figure 5.11 : Data transfer through WinSCP

Note: Replace <user name> and <password> with the credentials provided to you.

ii) After establishing the connection, drag and drop files between the panels to upload files from your
local system to the PARAM Siddhi-AI system or to download files from the remote system to your local
system.

Figure 5.12: Upload/download from the remote system to your local system using WinSCP.
B) Linux Machine
Use the commands below to copy data/file(s) between your local Linux system and the PARAM
Siddhi-AI system.

i) Upload

$ scp filename.txt <user name>@<remote-system-ip>:<path-to-destination-directory>

Example :

$ scp myfile.txt [email protected]:~/

Figure 5.13: Upload from local Linux system to PARAM Siddhi-AI system using command
line

ii) Download

$ scp <user name>@<remote-system-ip>:<path-of-the-files-to-be-downloaded> .

Example :

$ scp [email protected]:~suresht/test.txt .

Figure 5.14: Download from PARAM Siddhi-AI to local Linux system using command line.

Note: Replace <user name> and <password> with the credentials provided to you.

However, the above methods should only be used for transferring small files. If you wish to transfer a
large file or data set, we recommend uploading it to the web (for example, on GitHub or another hosting
service) and downloading it directly onto the PARAM Siddhi-AI system using a method such as wget or
git clone, as illustrated below.
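For instance, after logging in to the login node, a data set or repository could be fetched directly; the URLs below are placeholders, not real resources:

$ wget https://example.com/dataset.tar.gz
$ git clone https://github.com/<user>/<repository>.git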

6. PARAM Siddhi-AI Computing Environment

This chapter introduces the PARAM Siddhi-AI system's computing environment. It introduces
the user to work with the PARAM Siddhi-AI HPC cluster.

6.1 Screen Utility


Screen, or GNU Screen, works as a terminal multiplexer. It allows users to start a screen session
and open any number of virtual terminals inside that session. The main advantage of the Screen utility
is that processes running in Screen will continue to run even when their terminal is not visible, for
example if the user’s connection drops and the SSH session is terminated while performing a
long-running task on a remote machine. Thus, a process started with Screen can be detached from the
current session and then reattached later. In other words, the session is detached, but the process
that was originally started from Screen is still running and managed by the Screen utility itself.
Follow the steps below to work with the Screen utility:
A) Starting Linux Screen

To start a screen session, type screen in your console:

$ screen

This will open a screen session, create a new window, and start a shell in that window.

B) Starting Named Screen Session

Named screen sessions are useful when you run multiple screen sessions. To create a named
session, run the screen command with the following arguments:

$ screen -S <session name>

Example: screen -S namd

C) Detach from Linux Screen Session

You can detach from the screen session at any time by pressing Ctrl and a together, releasing both
keys, and then pressing d:

Ctrl+a d

The program running in the screen session will continue to run after you detach from the session

D) Re-join the screen

You can re-join the screen at any time by typing:

$ screen -x

Below are some most common commands for managing Linux Screen Windows:

Ctrl+a c   Create a new window (with shell)

Ctrl+a "   List all windows

Ctrl+a 0   Switch to window 0 (by number)

Ctrl+a X   Close the current region

E) Reattach to a Linux Screen

To resume your screen session, use the following command:

$ screen -r

In case you have multiple screen sessions running on your machine, you will need to append the
screen session ID after the -r switch.

F) Working with Linux Screen Windows

When you start a new screen session, it creates a single window with a shell in it. You can have
multiple windows inside a screen session.

To create a new window with a shell, type Ctrl+a c; the first available number from the
range 0-9 will be assigned to it.

G) Find the session ID

$ screen -ls

H) To resume a specific screen session, use the following command:

$ screen -r <session id>
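As a quick illustration, a typical Screen workflow might look like the sketch below; the session name train and the script train.py are placeholders:

$ screen -S train            # start a named session
$ python3 train.py           # launch a long-running task inside the session
# press Ctrl+a d to detach; the task keeps running
$ screen -ls                 # list sessions and find the session id/name
$ screen -r train            # reattach to the session later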

6.2 Nvidia-smi
The NVIDIA System Management Interface (nvidia-smi) is a command-line utility built on top
of the NVIDIA Management Library (NVML). It is intended to aid in the management and
monitoring of NVIDIA GPU devices.
$ nvidia-smi
The output of this command is shown in figure 6.1.

Figure 6.1: Output of nvidia-smi command
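In addition to the default summary view, nvidia-smi can report selected fields or be refreshed periodically, which is convenient for watching GPU utilization during a job. For example:

$ nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv
$ watch -n 2 nvidia-smi      # refresh the summary view every 2 seconds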

7. Applications on PARAM Siddhi-AI System

This chapter presents the HPC and AI libraries/packages installed on the PARAM Siddhi-AI system,
along with ways for users to install their own.
7.1 Setting up Conda Environment
A Conda environment is a directory containing a user's specific collection of installed Conda
packages. Conda allows users to manage dependencies and isolate projects that use different
package versions. One can create, export, list, remove, and update environments with different
package versions installed in them. Switching or moving between environments is called activating
and deactivating the environment. For example, you may have one environment with Python 3.7
and its dependencies for one project and another environment with Python 3.8 and its dependencies
for a second project. If you change one environment, your other environments are not affected.
In addition, you can easily activate or deactivate environments. The PARAM Siddhi-AI system allows
users to work efficiently in Conda environments. Users can install Conda and other Machine
Learning/Deep Learning frameworks/packages in their home directory by following the steps below:
I. Log into the login-siddhi node
II. Download Conda installation script as below:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
III. Create a temporary build directory
$ mkdir `pwd`/tmp
$ export TMPDIR=`pwd`/tmp
IV. Run Conda Script. This command installs the Conda environment in your home directory
under the Conda folder.
$ sh Miniconda3-latest-Linux-x86_64.sh -b -p Conda -u
V. Activate Conda environment after successful installation
$ source Conda/bin/activate
VI. Now your environment is ready to install any packages as per your requirement

(base) $ conda install <package name>

For example :

(base) $ conda install tensorflow-gpu

VII. To test the environment, you can run your python script as below:

(base) $ python test.py
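You can also create separate named environments for different projects. As a sketch, a hypothetical environment named myproject with a specific Python version could be created and used as follows:

(base) $ conda create -n myproject python=3.8
(base) $ conda activate myproject
(myproject) $ conda install numpy pandas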

Note: You may find a quick Conda cheat sheet at below URL:
https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf

7.2 Working with Jupyter Notebook


The Jupyter Notebook is an open-source web application used to create and share documents
containing live code, equations, visualizations, and text. The Jupyter Notebook is not installed
with the base Python package; therefore, if users want to work with it, they need to launch it
through Conda. The Jupyter Notebook App is a server-client application that allows editing and
running notebook documents via a web browser. The PARAM Siddhi-AI system allows users to access
the Jupyter notebook from their local system while it is set up and running on the PARAM Siddhi-AI
cluster.
To access it through the browser of your local system, follow the steps below:
I. Log into the PARAM Siddhi-AI system and activate the Conda environment (follow section
7.1 to install Conda in your working directory)

$ source Conda/bin/activate

II. Start the jupyter notebook by executing the below command

(base)$ jupyter notebook --ip=0.0.0.0 --port=<port no> --allow-root --no-browser &

Note: You can specify any local port address (>1024)

For example,
(base)$ jupyter notebook --ip=0.0.0.0 --port=8892 --allow-root --no-browser &

Figure 7.1: Screen after launching Jupyter notebook showing the token number

Note: The token number displayed on the screen (as shown in figure 7.1) will later be
used for logging into the Jupyter notebook through your local web browser.

7.2.1 Jupyter Notebook access from a local system inside the C-DAC network:

I. Type the below address in your local system browser to access Jupyter notebook.

http://login-siddhi.pune.cdac.in:<port number>

For example,
http://login-siddhi.pune.cdac.in:8892
II. Enter the token number (as shown in Figure 7.1) when asked by the jupyter home
page as shown in figure 7.2

Figure 7.2 : Jupyter home page to enter the token
III. Upon successful login, you may start developing code using Jupyter notebook. Figure
7.3 shows the GUI of the Jupyter notebook

Figure 7.3: GUI of Jupyter notebook at local system's browser

7.2.2 Jupyter Notebook Access from a local system outside the C-DAC network

A. From Linux Machine

Use the command below to create an SSH tunnel between the login-siddhi node and your local machine:

$ ssh -f -N -L <tunnel port>:<PARAM Siddhi-AI system>:<jupyter notebook port number> <PARAM Siddhi-AI system user login name>@<PARAM Siddhi-AI system>

Example:

$ ssh -f -N -L 5400:login-siddhi.pune.cdac.in:8892 [email protected]

Note: Replace <PARAM Siddhi-AI system user login name> with your own login name and enter the user
password that is provided to you.

Then open the URL below in your local system's browser:

http://localhost:<tunnel port>

Example:

http://localhost:5400

Note: One can specify any local port number as Tunnel Port (>1024) instead of 5400.

B. From Windows Machine

I. Start PuTTY and, in the left column, expand 'SSH'.

II. Click on 'Tunnels', specify the Source port as 5400, specify the 'Destination' as
login-siddhi.pune.cdac.in:8892, and click 'Add'. After that, select the line 'L5400
login-siddhi.pune.cdac.in:8892' in the list of forwarded ports.

Figure 7.4: Tunneling Process using Putty

Go to 'Session' (at the top of the left column) and specify the Host Name: login-siddhi.pune.cdac.in

Figure 7.5: Starting session using Putty

Figure 7.6: Terminal in your local machine to connect with the PARAM Siddhi-AI System

After successfully logging in to the gateway system (on the terminal of your local machine), start the
browser on your local machine and type the URL below:

http://localhost:5400

Enter the token number displayed on the screen (as shown in figure 7.1) in the Jupyter
notebook login page in the local machine's web browser.

Figure 7.7: Token authentication screen of Jupyter

Then, run your application here :

Figure 7.8: Dashboard of Jupyter

7.3 Running an MPI application

The Message Passing Interface (MPI) is a standardized means of exchanging messages between
multiple processes of a parallel program running across the nodes of a cluster in a distributed-memory
manner. On PARAM Siddhi-AI, MPI is already installed and packaged in the NVIDIA HPC-X suite,
which improves application performance and scalability. Users have to follow the steps below
to run their MPI application from their working directory:

I. Set the MPI environment in your working directory

$ source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh

II. Compile program

####Replace hello with your program name

$ mpicc hello.c -o hello

III. Run program

####Replace hello with your program name

$ mpirun -mca pml ucx -x UCX_NET_DEVICES -np 4 ./hello

7.4 CUDA Aware MPI

Message Passing Interface (MPI) is a standard API for communicating data via messages between
distributed processes, commonly used in HPC to build applications that can scale to multi-node
computer clusters. Newer versions of MPI are fully compatible with CUDA and are designed for parallel
computing on a single node or across multiple nodes. To accelerate an MPI application with GPUs, or to
enable an existing single-node multi-GPU application to scale across multiple nodes, CUDA-aware MPI
provides a feature to easily and efficiently scale up a multi-GPU application. In the PARAM
Siddhi-AI supercomputer, CUDA-aware MPI is enabled and implemented through Open MPI.
With a regular MPI implementation, only pointers to host memory can be passed to MPI. But if
we combine an MPI program with CUDA, we need to send GPU buffers instead of host buffers.
Without CUDA-aware MPI, we need to stage GPU buffers through host memory,
using cudaMemcpy, as shown in the code in figure 7.9:

//MPI rank 0
cudaMemcpy(s_buf_h,s_buf_d,size,cudaMemcpyDeviceToHost);
MPI_Send(s_buf_h,size,MPI_CHAR,1,100,MPI_COMM_WORLD);

//MPI rank 1
MPI_Recv(r_buf_h,size,MPI_CHAR,0,100,MPI_COMM_WORLD, &status);
cudaMemcpy(r_buf_d,r_buf_h,size,cudaMemcpyHostToDevice);
Figure 7.9 : Sending buffers from Host to device and device to host
This is not necessary with a CUDA-aware MPI library; the GPU buffers can be passed directly to
MPI, as in the code in figure 7.10.

//MPI rank 0
MPI_Send(s_buf_d,size,MPI_CHAR,1,100,MPI_COMM_WORLD);

//MPI rank n-1


MPI_Recv(r_buf_d,size,MPI_CHAR,0,100,MPI_COMM_WORLD, &status);
Figure 7.10: With CUDA Aware MPI communication without copying buffer between host and
device

CUDA-aware MPI works on the basis of GPUDirect RDMA. The GPUDirect RDMA feature,
introduced with CUDA 5.0, supports Remote Direct Memory Access (RDMA), with which buffers
can be sent directly from GPU memory to a network adapter without staging through host
memory, as shown in figure 7.11 and figure 7.12.

Figure 7.11: A glimpse of GPUDirect RDMA for CUDA Aware MPI - I (Source:
https://developer.nvidia.com/blog/parallelforall/wpcontent/uploads/2013/03/GPUDirectRDMA.p
ng)

Fig 7.12 : A glimpse of GPUDirect RDMA for CUDA Aware MPI - II (Source :
https://developer.nvidia.com/blog/parallelforall/wpcontent/uploads/2013/03/GPUDirectP2P.png)

With CUDA-aware MPI, users can write programs without copying buffers between host and
device, achieving high-bandwidth, low-latency communication with NVIDIA GPUs.

7.5 Working with Compilers

In PARAM Siddhi-AI, the NVIDIA HPC SDK version 21.7 is installed. HPC-SDK is a
comprehensive suite of compilers and libraries. Users are required to set up the HPC-SDK
environment as below before using any available compilers:

$ source /opt/nvidia/env.sh

A) Compilation and run with Pgfortran compiler

$ pgfortran -o hello hello.f90

$ ./hello

B) Compilation and run with pgf77 compiler

$ pgf77 hello.f90 -o hello

$ ./hello

C) Compilation and run with pgcc compiler

$ pgcc hello.c -o hello

$ ./hello

D) Compilation and run with pgc++ compiler

$ pgc++ hello.cpp -o hello

$ ./hello

7.6 Working with CUDA

The PARAM Siddhi-AI system has the following versions of CUDA installed. Users
can choose any version of CUDA as per their application by adding the corresponding
line below to their job script.

• CUDA 11.0.2

$ source /opt/cuda-11.0.2/env.sh

• CUDA 11.2.0

$ source /opt/cuda-11.2.0/env.sh

• CUDA 11.3.0

$ source /opt/cuda-11.3.0/env.sh

• CUDA 11.4.0

$ source /opt/nvidia/env.sh
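After sourcing one of the environments above, the CUDA compiler driver nvcc is available. As a simple sketch, a CUDA source file (hello.cu is a placeholder name) can then be compiled and run as follows:

$ nvcc hello.cu -o hello_cuda
$ ./hello_cuda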

7.7 Working with Enroot Containers

Enroot is a tool to turn traditional container images into unprivileged sandboxes. It is a
lightweight, performant container runtime with built-in NVIDIA GPU support. Besides adding
no performance overhead, one benefit Enroot brings is that, without any additional overhead for
administrators, users can enter their containers as unprivileged users. This means users do not
need to worry about file permissions or user rights and can focus only on their core work, with
no risk of accidentally running as root and deleting a file system.

Features of Enroot

• Enroot supports Docker images and is highly configurable
• It is fast at downloading containers and creating images, and quick to start
• It has built-in GPU support through libnvidia-container
• It works well with SLURM and application servers (like JupyterHub)
• Enroot is a convenient way to use NVIDIA's containers and resources on NGC

Useful Links

• https://github.com/NVIDIA/enroot/blob/master/doc/usage.md
• https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf

Steps to use Enroot

A) Creating an Enroot Container Image with import

If there is a container image in a container registry (e.g., Docker Hub or NVIDIA NGC)
that you want to use for your job, the first step is to import that image
into an Enroot image on the system. The command enroot import can be used.

I. On a login node, start screen. For example, to import an Ubuntu image, use the command
below:

$ enroot import docker://ubuntu

This command imports an existing container image from an existing container repository
(e.g., Docker Hub or NVIDIA NGC) and creates an image that Enroot can interpret/read
(i.e., an Enroot image). The newly created Enroot image has the same name as the
imported image but with the .sqsh extension. This image can then be used to create
Enroot containers. This step only needs to be performed once, as many containers can
be created out of that image.

II. Use the command below to request 1 node with 2 GPUs for 1 hour of
resource time on the PARAM Siddhi-AI system:

$ srun -N 1 -c 64 --gres=gpu:A100-SXM4:2 -t 01:00:00 --pty /bin/bash

Once the compute node shell is granted, create the Enroot container by
following the next step.

B) Creating an Enroot Container with create

Once you have an Enroot image, you generally want to create an Enroot container
for running your application within it. For this, you need to use the enroot create
command to expand the Enroot image into a proper file system stored locally.

$ enroot create ubuntu.sqsh

This command creates an Enroot container out of an Enroot image. It uses the same
name as the Enroot image without any extension by default.

Note: If you need two containers out of a single image, you need to run this
command twice; please read the official Enroot documentation
https://github.com/NVIDIA/enroot/blob/master/doc/usage.md to check how to
assign different names to different Enroot containers. You will have one such file
system for every Enroot container you create.

C) Running Software Inside an existing Enroot Container with start

I. Once you have an Enroot container, you can run an application within the
boundaries of that container (i.e., with the software stack defined by that
container). Use the enroot start command.

$ enroot start ubuntu

II. If you need to run something as root inside the container, you can use the --root
option. Remember: you are root only inside the container, not on the machine
where the container is running.

$ enroot start --root ubuntu

Note: For more information, you can also refer to NVIDIA_Containers.pdf (slurm.schedmd.com).

D) Using NVIDIA NGC for pre-trained models containers

All users of the PARAM Siddhi-AI system are subscribed to the NVIDIA NGC portal:

https://ngc.nvidia.com/signin

Create an account on this portal. The catalog of available Nvidia NGC containers can
be consulted here:

https://ngc.nvidia.com/catalog/containers.

To import (pull if using docker terminology) these containers, you need an API key
associated with your Nvidia NGC account.

You can generate your API key here: https://ngc.nvidia.com/setup/api-key. For the rest
of this section, we will refer to your generated API key as <API_KEY>.

To configure Enroot for using your API key, create the file enroot/.credentials within your
$HOME and append the following line to it:

machine nvcr.io login $oauthtoken password <API_KEY>

where <API_KEY> is the key generated as described above.
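As a sketch, assuming the credentials file location described above, the entry could be added from the login node as follows (replace <API_KEY> with your actual key; the single quotes keep the shell from expanding $oauthtoken):

$ mkdir -p $HOME/enroot
$ echo 'machine nvcr.io login $oauthtoken password <API_KEY>' >> $HOME/enroot/.credentials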

After doing this, you can import containers from Nvidia NGC. For example, the latest
TensorFlow container can be imported as indicated below.

$ enroot import docker://nvcr.io#nvidia/tensorflow:20.12-tf1-py3
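The imported image can then be turned into a container and started in the same way as before. The .sqsh file name below reflects how enroot import typically names imported images and may differ on your system:

$ enroot create nvidia+tensorflow+20.12-tf1-py3.sqsh
$ enroot start nvidia+tensorflow+20.12-tf1-py3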

8. Job Queueing System on PARAM Siddhi-AI System

The Batch Processing System facilitates the execution of a series of programs or jobs without
human intervention. The batch jobs or programs are set up so that they can be run to completion
without manual intervention. This is in contrast to interactive programs requiring manual user
input. These batch jobs or programs take a set of data files as input, process the data, and
produce a set of output data files. This operating environment is termed a Batch Processing
System.

8.1 Components of Batch Processing Systems

Batch Processing System of a Cluster comprises two primary components as mentioned below:

Compile/Login nodes

These nodes provide an entry point to the system for users to write, compile and build their
applications. Additionally, the user can submit and track their jobs from these nodes. Depending
on the size of the Cluster, the number of such nodes may vary from one to many. For example,
here, login-siddhi is the login and compile node.

Compute nodes

These nodes are the ones on which the application runs after the user submits the job. These are
the nodes on which resource manager clients run. There is no direct access to these nodes other
than through the resource manager.

8.2 Job
A Job can be defined as an entity through which a user specifies different parameters like the ones given
below to execute the application in the batch mode. These are usually specified in the job file which
contains the following details:

• Type and number of compute nodes


• Wall time
• Job notifications requirement
• Path for error, output files etc.
• Application

8.3 Job Submission and Resource Utilization

If the application has been built successfully on the PARAM Siddhi-AI system, the next step in
running the application is to prepare a resource manager-based job submission script. The job
submission script is a shell script based on the resource manager present on the system. The user
submits the batch job command file to SLURM to run the application. The job description file
contains details such as the computational resources requested, the wall time of the job, and the
job queue, along with the application, its path, and the environment for its execution.

Jobs handled by the Slurm queuing system are of two types, namely: Non-interactive (batch)
mode and Interactive mode.

8.3.1 Non-interactive (batch) mode

The job is submitted to the queuing system using a job script containing the resource request and the
commands necessary to execute the job. Please refer to the sample SLURM-based job submission script
below (here, the script is named myjob1.sh).

##############################################################
#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

echo "Starting at `date`"


echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"

cd $SLURM_SUBMIT_DIR

###################### Command to run your MPI program######################


source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh
mpirun -mca pml ucx -x UCX_NET_DEVICES -np 32 ./hello
##########################################################################

Replace the ./hello with your executable.

Note:
i) Use 1 GPU for testing your script.
ii) For reserving one GPU, please reserve 32 CPU cores of a node; the same ratio extends to
multiple GPUs. For example, to reserve 2 GPUs, you should reserve 64 cores, and so on.

sbatch: submits the job to the batch queuing system.

$ sbatch <job script>

Example:
$ sbatch myjob1.sh
Note: Replace myjob1.sh with your job script

1) To check the status of the job

squeue --job <job id>

2) Delete the job

scancel <job id>
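Optionally, if SLURM job accounting is enabled on the system (an assumption, as it is not described in this manual), information about completed jobs can be retrieved with sacct:

sacct -j <job id>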

8.3.2 Interactive mode

The user gets access to a terminal on one of the compute nodes, allowing commands to
be executed interactively on the terminal (available for the specified wall time). The job needs to be
started using a screen session.

A) The command for interactive job submission requesting 1 GPU:

##########################################################################

srun --nodes=1 --ntasks-per-node=32 --gres=gpu:A100-SXM4:1 --time=00:10:00 --pty /bin/bash

#########################################################################

When a job is submitted in interactive mode, the user lands on the assigned compute node upon the
allocation of resources. Once you get the shell:

• Start screen in the shell using the screen command

• Join the screen and run the computation

• Note: For reserving two GPUs, please reserve 64 CPU cores of a node and the same will
extend to multiple GPUs (for reserving 3 GPUs please reserve 96 cores, and so on).

• Please give “gres” as “gpu:A100-SXM4:<no of GPUs per node>”, i.e. gres=gpu:A100-SXM4:1

• Note: Wall time: the default wall time is 1 hour and the maximum wall time is 7 days (168 hours)

• A maximum of 8 jobs per user can be in the running state

8.4 Information on jobs

The commands below are useful for displaying information about jobs:

• List all current jobs for a user:


$ squeue -u <username>

• Projected start time of the job


squeue -u <user name> --start

Note: Please use the above command to get the scheduler's estimate of when your pending/idle
job will start running. It is, of course, just the scheduler's best estimate, given current conditions,
and the actual time a job starts might be earlier or later than that depending on factors such as
the behavior of currently running jobs, the submission of new jobs, and hardware issues, etc.
• List all running jobs for a user:
squeue -u <username> -t RUNNING

• List all pending jobs for a user:


squeue -u <username> -t PENDING

• List detailed information for a job (useful for troubleshooting):


scontrol show jobid -dd <jobid>
8.5 Controlling the jobs

The commands below are useful for controlling jobs:

• To cancel one job:


scancel <jobid>

• To cancel all the jobs for a user:


scancel -u <username>

• To cancel all the pending jobs for a user:
scancel -t PENDING -u <username>

• To cancel one or more jobs by name:


scancel --name myJobName

• To hold a particular job from being scheduled:


scontrol hold <jobid>

• To release a particular job to be scheduled:


scontrol release <jobid>

8.6 Monitoring the jobs

The following section gives an illustration of the output of three commands, namely
(1) sinfo, (2) squeue, and (3) scontrol, used to analyze the status of jobs.

1) sinfo : Lists out the status of resources in the system

Figure 8.1: Status of resources

We can see that one node (scn35) in the testp queue is idle. The other queues share 6 nodes
(scn7, scn9, scn10, scn15, scn30, scn36), which are currently in use for a running job. By combining
this with the Linux watch command, we can make a simple display that refreshes periodically, as
shown below.
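For example (the 10-second refresh interval is arbitrary):

$ watch -n 10 sinfo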

2) squeue: Lists out the Jobs running/queued in the system

Figure 8.2: List of jobs running/queued

Notice that the first job is in state PD (pending) and is waiting for a node to become available.
The second job is in state R (running) and is executing on the scn7 node.

Job state          Description

PD (Pending)       The job is waiting in a queue for allocation of resources
R (Running)        The job is currently allocated to a node and is running
CG (Completing)    The job is finishing but some processes are still active
CD (Completed)     The job has completed successfully
F (Failed)         The job failed with a non-zero exit value
TO (Terminated)    The job was terminated by SLURM after reaching its runtime limit
S (Suspended)      A running job has been stopped with its resources released to other jobs
ST (Stopped)       A running job has been stopped with its resources retained
Table 8.1: Description of job state

Figure 8.3: Display of status of the job in the specified format
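The exact format string used for figure 8.3 is not reproduced here; as an illustration, squeue accepts a -o/--format option for customized output, for example:

$ squeue -o "%.10i %.12j %.8u %.8T %.10M %.6D %R"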

3) squeue --user=username : Displays running and pending jobs per individual user

Figure 8.4: User wise display of job

The projected start time of the job can be queried by executing the below command:

$ squeue --start -j < job id number >

4) scontrol : This is the administrative tool used to view and/or modify jobs state.

$ scontrol show job < job id number >

For example

$ scontrol show job 1367

Appendix

‘A’ Sample Job Submission Script

1) Batch script for single node Single GPU (1 GPUs, 32 cores):


############################################################
#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

echo "Starting at `date`"


echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"

cd $SLURM_SUBMIT_DIR

###################### Command to run your MPI program######################


source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh
mpirun -mca pml ucx -x UCX_NET_DEVICES -np 32 ./hello

############################################################

Note:
• --ntasks-per-node specifies the number of cores
• Use 1 GPU for testing your script
• For reserving one GPU, please reserve 32 CPU cores of a node; the same ratio extends to multiple
GPUs (for example, to reserve 16 GPUs, please reserve 512 cores).

• Please give “gres” as “gpu:A100-SXM4:<no of GPUs per node>”, i.e. gres=gpu:A100-SXM4:1

2) Batch script for single node multiGPU (2 GPUs , hence 64 cores ):
############################################################
#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=64
#SBATCH --gres=gpu:A100-SXM4:2
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

echo "Starting at `date`"


echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"

cd $SLURM_SUBMIT_DIR

###################### Command to run your MPI program######################


source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh
mpirun -mca pml ucx -x UCX_NET_DEVICES -np 64 ./hello

############################################################

3) Batch script for multinode and multiGPU (2 nodes, hence 16 GPUs and
512 cores) :

Since each node has 8 GPUs, --gres=gpu:A100-SXM4:8 in the script below requests 8 GPUs
per node. The script also requests two nodes, so 8 * 2 = 16 GPUs will be allocated.
Similarly, --ntasks-per-node=256 means 256 * 2 = 512 cores will be allocated.

######################################################

#! /bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=256
#SBATCH --gres=gpu:A100-SXM4:8
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

echo "Starting at `date`"


echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"

cd $SLURM_SUBMIT_DIR

###################### Command to run your MPI program######################


source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh
mpirun -mca pml ucx -x UCX_NET_DEVICES -np 512 ./hello
##########################################################
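
As a convenience, the total task count can be taken from the scheduler instead of being hard-coded: SLURM exports it as SLURM_NTASKS (here 2 * 256 = 512), so the launch line in the script above can equivalently be written as:

mpirun -mca pml ucx -x UCX_NET_DEVICES -np $SLURM_NTASKS ./hello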

4) Batch script for a Python program on a single node, single GPU

######################################################

#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

echo "Starting at `date`"


echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"

cd $SLURM_SUBMIT_DIR

python3 hello.py
##########################################################
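
The script above assumes a Python program named hello.py in the submission directory. A minimal sketch of such a program, shown only to illustrate the workflow (the file name and its contents are hypothetical), could be:

# hello.py - prints basic information about the job environment (illustrative only)
import os
import socket

print("Hello from host:", socket.gethostname())
print("SLURM job id:", os.environ.get("SLURM_JOBID", "not set"))
print("GPUs visible to this job:", os.environ.get("CUDA_VISIBLE_DEVICES", "not set"))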

5) Batch script to run a Python program in a conda environment on a single node, single GPU

######################################################

#! /bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --time=00:10:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

echo "Starting at `date`"


echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"

cd $SLURM_SUBMIT_DIR

#################conda environment path ################################

source <Path of conda>/Conda/bin/activate

conda activate <conda environment name>

###conda activate tf24

python3 hello.py

##########################################################
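
The conda environment referenced in the script must exist before the job is submitted. It can be created once from the login node; the environment name tf24, the Python version, and the package shown are only examples:

$ source <Path of conda>/Conda/bin/activate
$ conda create -n tf24 python=3.8
$ conda activate tf24
$ pip install tensorflow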

‘B’ Support Details

It has been a long guide, and we appreciate your patience if you have read it this far in one go. We understand that some topics may have been covered only briefly and that the guide may not be complete. If you need any further information or assistance, please feel free to contact us at:


System Support : [email protected]

Application Support : [email protected]

HAPPY SUPERCOMPUTING!!

With best wishes,

National PARAM Supercomputing Facility (NPSF)

Systems Administration Team,

Centre for Development of Advanced Computing(C-DAC),

Panchavati, Pashan, Pune - 411008

‘C’ References

1. http://unixhelp.ed.ac.uk
2. http://www.ee.surrey.ac.uk/Teaching/Unix/
3. http://unixhelp.ed.ac.uk/vi/
4. https://documentation.help/PuTTY/documentation.pdf
5. https://winscp.net/eng/docs/start
6. https://mobaxterm.mobatek.net/documentation.html
7. https://docs.nvidia.com/dgx/pdf/dgxa100-user-guide.pdf
8. https://www.nvidia.com/en-in/data-center/nvidia-ampere-gpu-architecture/
9. https://docs.conda.io/projects/conda/en/4.6.0/downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
10. https://www.dataquest.io/blog/jupyter-notebook-tutorial/
11. https://ngc.nvidia.com/catalog/all
12. https://github.com/NVIDIA/enroot
13. https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf
14. https://developer.nvidia.com/networking/hpc-x
15. https://developer.nvidia.com/hpc-sdk
16. https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/
17. https://support.ceci-hpc.be/doc/_contents/SubmittingJobs/SlurmFAQ.html

‘D’ VPN Client Manual

Installation and Configuration of Fortinet SSL-VPN Client for Windows

1. Follow the instructions below to manually install the Fortinet SSL VPN client on Windows. Download the installer from:
URL: https://forticlient.com/downloads
2. Once the file is downloaded, right-click on the file, select "Run" (as an administrator or equivalent) and proceed as follows:
3. Click on “Next”

4. Select “Secure Remote Access” for SSL-VPN and IPSec-VPN and press “Next”

5. Click on “Next”
Note: Click “Next” to install to the default folder or click “Change” to choose another
folder.

6. Click on “Install”

7. The installation is complete; click on the "Finish" button.

Configuring Fortinet SSL-VPN client On Windows OS

The client is now installed and needs to be configured. To configure the client, click on
Windows Start/All Programs/Forticlient/FortiClient SSL VPN and the following box
opens. You are now ready to configure the connection.

8. Click on “Configure VPN”

9. Enter the "Connection Name" (could be anything) and the "Remote Gateway" (hackathon.cdac.in) as shown below:

10. Enter the C-DAC provided username and password and press "Connect":

Example: username: avishkar1

After entering the username and password, you will receive a token on your email ID.

The connection is in progress.

11. The SSL-VPN connection will be established after an appropriate IP address is obtained.

12. You can now access the C-DAC Pune unified network.

Procedure for Installing Fortinet SSL VPN Client on Linux-Based Systems

1) Download the FortiClient for Linux provided by NetOps through e-mail.

2) Steps to run FortiClient SSLVPN for Linux

• Open a terminal
• Check the Linux distribution architecture with # uname -a (for example: 32-bit LSB)

• Go to the folder where the file has been downloaded and extract it with

# tar -xzvf forticlientsslvpn_linux<version>.tar.gz

• Open the FortiClient folder

• Run ./forticlientsslvpn

• When run for the first time, the user must have administrative privileges

• Enter the password of the admin user

3) Use the C-DAC provided username and password in the login window

Example: User: avishkar1

• A warning window like the following will open. Click on “Continue”

After entering the username and password, you will receive a token on your email ID.

• At this stage, the connection is established.

Procedure for Installing Fortinet SSL VPN Client on macOS

1) Download the FortiClient installer file for macOS

• Visit the URL https://www.forticlient.com/downloads to download the FortiClient for macOS.

• Click on "Download for MacOS"

2) Install the FortiClient


• Double-click on the downloaded file (or select it and hit Enter) to start the setup.

• Click on Install to begin the installation.

• Follow the on-screen instructions for a successful installation of FortiClient.

3) Configure FortiClient

• Open the installed FortiClient VPN Application.

• Click on Configure VPN to set up the VPN client.

• Fill in the connection details as depicted in the image below.

• Fill in the following settings:
I. VPN: Select SSL-VPN
II. Connection Name: Name of the connection
III. Description: Describe the connection purpose
IV. Remote Gateway: hackathon.cdac.in
V. Leave the remaining fields at their defaults
• Save the connection.

4) Connect to the C-DAC Network via the SSL VPN Client
• Open the Fortinet VPN client to establish the connection.

• Use the C-DAC provided username and password to connect to the VPN.

After entering the username and password, you will receive a token on your email ID.

• Wait for 1-2 minutes for the SSL-VPN connection to establish.

‘E’ Frequently Asked Questions

This appendix lists frequently asked questions about the PARAM Siddhi-AI system. Users can search for their query here, and if the query is not listed or the explanation seems insufficient, they can raise a request with NPSF support.

Q1. I have successfully configured the FortiClient but am still unable to log in. Where should
I get help?

If you are unable to connect to the PARAM Siddhi-AI system after following the procedures
explained in chapter 5, please share snapshots of the following information and send an e-mail
to [email protected]:

1. VPN Connection status page


2. VPN Connection configuration page
3. VPN Client version
4. Local system’s OS Version
5. https://www.whatismyip.com (in browser)
6. ping google.com
7. netstat -rn
8. ping 10.208.38.94
9. telnet login-siddhi.pune.cdac.in 22
10. nslookup login-siddhi.pune.cdac.in
11. ssh <username>@login-siddhi.pune.cdac.in

Q2. Which operating system is installed on the PARAM Siddhi-AI system?

DGX OS (based on Ubuntu 20.04)

Q3. What are the available versions of CUDA on the PARAM Siddhi-AI system?

CUDA 11.0.2, 11.2.0, 11.3.0, 11.4.0

Q4. When will my pending job start?

In the PARAM Siddhi-AI system, the scheduler allocates resources to jobs and estimates when each job will start running. A pending job is therefore shown with a start time, which is the scheduler's best estimate under current conditions. The actual start time may be earlier or later, depending on factors such as the behavior of currently running jobs, the submission of new jobs, hardware issues, etc. (refer to Ch. 8, Section 8.4).

Q5. How can I get my job to start early?

First of all, make sure you request only the resources you need: the more you ask for, the longer you may have to wait. Then, try to make your job flexible in terms of resources.

Q6. How do I cancel a job?

Use the scancel <jobid> command with the job id of the job you want to cancel.

Q7. Which mpirun should I use?

Use the mpirun provided with the HPC-X toolkit; source its environment first:

$ source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh

Q8. What special features does the PARAM Siddhi-AI system have for AI/ML/DL model
training?

PARAM Siddhi-AI is equipped with Tensor Core GPU nodes. A tensor is a mathematical object
that describes the relationship between other mathematical objects linked together; the matrix is
the most popular tensor used for most deep learning operations such as image processing. The
Siddhi-AI system contains GPUs based on the NVIDIA Ampere architecture, which provides a huge
performance boost and delivers new precisions to cover the full spectrum required from research
to production: FP32, TensorFloat-32 (TF32), FP16, INT8, INT4, and bfloat16. In addition, with
Tensor Cores enabled, it dramatically accelerates throughput and reduces AI training times.

Q9. After logging into the login-siddhi node, how do I start running my first AI/ML/DL
program?

You can start working with Conda environments and enroot containers, as explained in chapter 7.
You can install any AI/ML/DL packages of your choice in a Conda environment to match your
application. Similarly, you can pull ready-made containers of pre-trained models from NVIDIA NGC.

Q10. How do I perform multi-GPU training for large neural networks?

The PARAM Siddhi-AI system supports 'single node, multi-GPU' and 'multi-node, multi-GPU' training. You can install Horovod (https://horovod.ai/) in a Conda environment for distributed deep learning training; a minimal sketch is shown below.
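
A minimal sketch of Horovod-style data-parallel training, assuming Horovod with the PyTorch backend has been installed in your Conda environment (the model, data, and file name train.py are purely illustrative):

# train.py - minimal Horovod data-parallel sketch (illustrative only)
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()                                   # one process per GPU
torch.cuda.set_device(hvd.local_rank())      # pin this process to its local GPU

model = nn.Linear(32, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)   # identical start on all workers

x = torch.randn(64, 32).cuda()
y = torch.randn(64, 1).cuda()
for step in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                          # gradients are averaged across all GPUs
    optimizer.step()
    if hvd.rank() == 0:
        print("step", step, "loss", loss.item())

Such a script is typically launched with horovodrun -np <number of GPUs> python3 train.py, or with mpirun inside a batch script similar to those in Appendix A.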

Q11. How do I run my application with AI frameworks like TensorFlow, Caffe, PyTorch,
MXNet, and cuDNN?

Since AI frameworks and cuDNN are highly dependent on each other, they are also version-specific.
Therefore, we recommend installing any AI framework (TensorFlow, Caffe, PyTorch, MXNet) in a
Conda environment and using the cuDNN installed along with that framework. You can also choose
any CUDA version installed on PARAM Siddhi-AI for your application.

Q12. Which GPU resides in the PARAM Siddhi-AI system?

NVIDIA A100 GPUs.

‘F’ Acknowledging the National Supercomputing Mission in publications

If you use the supercomputers and services provided under the National Supercomputing Mission,
Government of India, please let us know of any published results, including student theses,
conference papers, journal papers, and patents obtained.

Please acknowledge the National Supercomputing Mission and National PARAM Supercomputing Facility (NPSF) as given below:

We acknowledge National Supercomputing Mission (NSM) for providing computing resources of ‘PARAM Siddhi-AI’, under National PARAM Supercomputing Facility (NPSF), C-DAC, Pune and supported by the Ministry of Electronics and Information Technology (MeitY) and Department of Science and Technology (DST), Government of India.

Also, please submit the copies of dissertations, reports, reprints, and URLs in which “PARAM
Siddhi-AI under the National Supercomputing Mission, Government of India” is acknowledged
to:

HoD HPC I&E,
National PARAM Supercomputing Facility (NPSF),
Centre for Development of Advanced Computing,
CDAC Innovation Park,
S.N. 34/B/1,
Panchavati, Pashan,
Pune – 411008
Maharashtra

Communication of your achievements using resources provided by the National Supercomputing Mission will help the mission measure outcomes and gauge future requirements. This will also help in further augmentation of resources at a given site of the National Supercomputing Mission.
