
GUJARAT INFORMATICS LIMITED

Block No. 2, 2ND Floor, C & D Wing, Karmayogi Bhavan, Sector -10A, GANDHINAGAR - 382010

Revised RFP (Corrigendum-03 dated 16.05.2025)
&
Responses to Bid Queries dated 16.05.2025

Procurement of GPU Compute solution for Gujarat State Data Center, Gandhinagar (GeM No. GEM/2025/B/5824041 dated 16.01.2025)

Please find below Corrigendum-3 dated 16.05.2025.

For more details visit www.gil.gujarat.gov.in

The bidder shall submit their queries, if any, within 07 days from the date of publication of Corrigendum-03. No further clarifications regarding Corrigendum-03 or any earlier published corrigendum shall be entertained by the Tenderer after this period. In addition, no new queries shall be accepted during this 7-day window either.

Responses to pre-bid queries and Revised RFP (Corrigendum-3 dated 16.05.2025)

Procurement of GPU Compute solution for Gujarat State Data Center, Gandhinagar (GeM No. GEM/2025/B/5824041 dated 16.01.2025)

Each entry below gives the Sr#, the Reference (Clause/Page), the existing clause (Points of Clarification Requested), the bidder's Suggestion/Clarification with Justification, and the Response to the pre-bid query.
1. Reference: Corrigendum-2, SLA & Penalties (Operational Penalties), Page 23
Existing clause: Timeline for resolution is within 4 hours from the time the call is logged / reported to the Bidder/OEM.
Suggestion: We request GIL to amend this to NBD (Next Business Day) resolution for hardware-related issues during working days, as local sparing of such high-end GPU hardware is not feasible for OEMs.
Response: Please refer Corrigendum-03.

2. Reference: Corrigendum-2, 10. Minimum Technical Specification - Master Node, Training Nodes, Inferencing Nodes, Pages 24, 27, 28
Existing clause: Certification: Undertaking from Server OEM for compatibility of the proposed server with the GPU under the quoted Inference Node must be submitted (duly signed by authorized signatory, mentioning the bid reference).
Suggestion: We would like to draw GIL's attention to these clauses, which were relaxed in Corrigendum-2. The relaxation will allow unproven systems to be quoted that are neither tested nor certified by GPU OEMs, nor benchmarked with audited results submitted to reputed reference sites such as MLCommons.org, which is maintained by leading GPU and server OEMs. This relaxation risks unproven infrastructure being proposed, resulting in undesired, uncertain results; some bidders would simply extrapolate and submit undertakings, putting the complete bid at risk.
Response: As per the RFP and the corrigenda published from time to time.

3. Reference: as above.
Existing clause: Benchmarks: If not listed on MLCommons, the bidder shall be required to submit a benchmark report for the make/model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, and should refer to the bidder and bid details.
Response: As per the RFP and the corrigenda published from time to time.

4. Reference: as above.
Existing clause: Benchmarks: If not listed on spec.org, the bidder shall be required to submit a benchmark report / logs for the make/model (same configuration) of the quoted server as part of bid submission.
Response: As per the RFP and the corrigenda published from time to time.

5. Reference: Corrigendum-2, 10. Minimum Technical Specification - Master Node, Page 24 & Inferencing Nodes, Page 27
Existing clause: 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Suggestion: We request GIL to clarify what "at both ends" means here.
Response: "At both ends" refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make and model of the existing SAN and storage.

6. Reference: Corrigendum-2, 10. Minimum Technical Specification - Nodes for AI Training, Page 25
Existing clause: Network: a) InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Suggested modification: Network: a) InfiniBand / Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: Network cards, whether InfiniBand or Ethernet, are rated in Gbps and not GBps; Gbps is already used in the Training Nodes section. We request GIL to amend accordingly. Reference: https://www.nvidia.com/content/dam/en-zz/Solutions/networking/ethernet-adapters/connectx-7-datasheet-Final.pdf
Response: Please refer Corrigendum-03.
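Because the Gbps/GBps distinction recurs throughout these queries, a minimal illustrative sketch (Python, not part of the RFP) of the bit-to-byte conversion behind the correction, ignoring protocol overhead:

    # Illustrative only: convert a link rate in Gbps to an approximate data rate in GB/s.
    def gbps_to_gbytes_per_sec(gbps: float) -> float:
        return gbps / 8.0  # 8 bits per byte; real throughput is lower due to protocol overhead

    for rate in (200, 400):
        print(f"{rate} Gbps ~ {gbps_to_gbytes_per_sec(rate):.0f} GB/s")
    # 200 Gbps ~ 25 GB/s; 400 Gbps ~ 50 GB/s - i.e. "200GBps" would be eight times the NIC line rate.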

7. Reference: Corrigendum-2, 10. Minimum Technical Specification - AI Training Node, Page 25 & Inferencing Nodes, Page 27
Existing clause: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives.
Suggested modification: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives OR M.2 960 GB NVMe drives.
Justification: The current changes in Corrigendum-2 restrict OEM participation; this was allowed earlier in Corrigendum-1. We request GIL to relax the clause for wider OEM participation, without impacting node performance, since operating systems are small in size and do not need to be mixed with performance/capacity drives and large drives.
Response: Please refer Corrigendum-03.

8. Reference: Corrigendum-2, 10. Minimum Technical Specification - AI Training Nodes, Page 25
Existing clause: System Network: the following networks are required: 1. NDR InfiniBand / Ethernet (400GBps or higher) for compute communication; 2. InfiniBand / Ethernet (200GBps or higher) for storage delivery.
Suggested modification: The following networks are required: 1. NDR InfiniBand / Ethernet (400Gbps or higher) for compute communication; 2. InfiniBand / Ethernet (200Gbps or higher) for storage delivery.
Justification: Network cards are not rated in GBps, as noted and linked in the earlier point; please refer again: https://www.nvidia.com/content/dam/en-zz/Solutions/networking/ethernet-adapters/connectx-7-datasheet-Final.pdf
Response: Please refer Corrigendum-03.
9. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inferencing Nodes, Page 27
Existing clause: Power Requirement: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and fan.
Suggestion: We request GIL to define N+N; if N=1, this would mean an N+1 redundant hot-swappable power supply is to be offered.
Response: Please refer Corrigendum-03.

10. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inference Node, Page 27
Existing clause: GPU Communication: 2 x Accelerators per node, each with minimum 140 GB or higher GPU memory per accelerator.
Suggestion: We request GIL to open up this clause for wider OEM participation, as the current clause is restrictive in nature. Proposed alternatives: 2 x accelerators per node, each with minimum 140 GB or higher GPU memory; OR 4 x accelerators per node, each with minimum 94 GB or higher GPU memory; OR 4 x accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs offered across 6 GPU nodes for inferencing); OR 8 x accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs offered across 3 GPU nodes for inferencing).
Justification: We request GIL to consider the suggested changes, as not all leading server and GPU OEMs have these ratings and configurations available and benchmarked. This will also give GIL better options for rack space, power and cooling in the data center, without any compromise on performance.
Response: As per the RFP and the corrigenda published from time to time. However, the bidder may propose 2 or more accelerator cards per node (in the Inference Node); the number of GPUs and the quantity of inference nodes may change accordingly, provided the total requirement of cores and memory is provided as mentioned in the RFP and the corrigenda published from time to time.

11. Reference: Nodes for AI Training (Total Qty: 4 sets) - Benchmarks, Page 27
Existing clause: DLRM-dcnv2 - 3.6 minutes or less on a single node; GNN - 7.8 minutes or less on a single node; ResNet - 12.1 minutes or less on a single node; U-Net3D - 11.6 minutes or less on a single node.
Suggested modification: We request GIL to revise the following benchmarks, as available on https://mlcommons.org/: DLRM-dcnv2 - 3.75 minutes or less on a single node; GNN - please clarify whether the RGAT benchmark is meant for GNN; ResNet - 13.25 minutes or less on a single node; U-Net3D - 12.42 minutes or less on a single node.
Justification: We request the amendment to enable wider OEM participation in the bid.
Response: Please refer to the last Corrigendum-2 dated 15.03.2025. A tolerance of 25% is allowed on the benchmarks for the AI training model. If GNN is not available, RGAT is accepted instead of GNN.
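As a quick illustration (a sketch, not part of the RFP or the response), the timings requested above all fall within the 25% tolerance that the response cites:

    # Check the requested single-node timings against a 25% tolerance on the RFP targets.
    rfp_minutes = {"DLRM-dcnv2": 3.6, "ResNet": 12.1, "U-Net3D": 11.6}
    requested_minutes = {"DLRM-dcnv2": 3.75, "ResNet": 13.25, "U-Net3D": 12.42}
    for name, asked in requested_minutes.items():
        limit = rfp_minutes[name] * 1.25  # RFP target plus 25% tolerance
        verdict = "within tolerance" if asked <= limit else "exceeds tolerance"
        print(f"{name}: requested {asked} min, allowed up to {limit:.2f} min -> {verdict}")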

12. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inference Node, Page 28
Existing clause: OS Support: The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. The quoted OS should be under enterprise support from the OEM with premium or the highest level of support.
Suggestion: We request GIL to clarify which of these is to be quoted as part of the solution, for better clarity.
Response: Please refer Corrigendum-03.

13. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inference Node, Page 28
Existing clause: OS Support: Supply should include DC edition unlimited Guest OS licenses.
Suggestion: We request GIL to define the user base, as the ask of unlimited Guest OS licenses will force bidders to quote the highest level of subscription license, escalating the cost, which may not be utilized by GIL for years to come. Defining the user licence count will keep it optimized.
Response: Please refer Corrigendum-03.
14. Reference: Storage Nodes, External Storage, Page 30
Existing clause: The solution should be PFS (Parallel File System) based and delivered with 1PB (all NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
Suggested modification: The solution should be PFS (Parallel File System) OR NFS (Network File System) based and delivered with 1PB (all NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
Justification: We request GIL to refer to Corrigendum-1, where an NFS solution was allowed with the required performance. In Corrigendum-2, NFS has been removed and PFS introduced with unreasonably high performance numbers that have no direct relationship with the overall training or inferencing node throughput and network calculations. We request GIL to relax this restrictive clause for wider OEM participation.
Response: Please refer Corrigendum-03.

15. Reference: Storage Nodes, External Storage, Page 30
Existing clause: Performance: minimum 120 GBps read and minimum 60 GBps write, aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future. IOPS: minimum 8,00,000. 1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs. 2. NVMe storage offered must be certified with the proposed GPU OEM. Front-end connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per the proposed solution.
Suggested modification: (As per Corrigendum-1) Performance: 28 GBps read and 14 GBps write from day one, scalable up to >100 GBps read/write combinations with a scale-out architecture and additional controllers/nodes in the future; OR Performance: minimum 120 GBps read and minimum 60 GBps write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Justification: We request GIL to allow an NFS solution, as all current data is on NFS; converting/migrating from NFS to PFS will take considerable time and further delay the productivity of the AI/ML setup. Relaxing this restrictive clause will enable wider OEM participation with established results and deployments of AI/ML solutions; even NVIDIA has no bias towards PFS, as both perform well for AI/ML environments. The current specifications are OEM-specific and limit participation by leading OEMs. Please refer again to the link shared for a detailed view, which clearly shows the NVIDIA SuperPOD recommendation. As per NVIDIA's documentation for DGX SuperPOD (Storage Architecture - NVIDIA DGX SuperPOD: Next Generation Scalable Infrastructure for AI Leadership, Reference Architecture featuring NVIDIA DGX H200), the storage performance is: single SU* (256 GPUs) aggregate system read = 125 GBps; single SU* (256 GPUs) aggregate system write = 62 GBps. (*A single Scalable Unit (SU) in an NVIDIA DGX SuperPOD consists of 32 DGX systems, each with 8 H200 GPUs.) Hence, for a 56-GPU system (4 training nodes with 8 GPUs per node and 12 inference servers with 2 GPUs per node), the maximum throughput needed is approximately: aggregate system read = 28 GBps; aggregate system write = 14 GBps.
Response: Please refer Corrigendum-03.
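The 28/14 GBps figures above follow from scaling the single-SU SuperPOD numbers linearly by GPU count; a minimal sketch of that arithmetic, using only the figures quoted in the query:

    # Linear scaling of SuperPOD single-SU storage throughput to the bid's GPU count.
    su_gpus, su_read_gbps, su_write_gbps = 256, 125, 62
    bid_gpus = 4 * 8 + 12 * 2           # 4 training nodes x 8 GPUs + 12 inference nodes x 2 GPUs = 56
    scale = bid_gpus / su_gpus          # ~0.22
    print(f"read  ~ {su_read_gbps * scale:.1f} GBps")   # ~27.3 GBps, rounded to 28 in the query
    print(f"write ~ {su_write_gbps * scale:.1f} GBps")  # ~13.6 GBps, rounded to 14 in the query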

16. Reference: Master Node, Page 24, Network section
Existing clause: InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Suggested modification: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity"; the stated GBps is not the correct rating for the HBAs/NICs available from leading OEMs.
Response: Please refer Corrigendum-03.

17. Reference: Nodes for AI Training, Page 25, Network section
Existing clause: d) Required InfiniBand / 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Suggested modification: Required 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: As per the RFP and the corrigenda published from time to time.

18. Reference: Nodes for AI Training, Page 25, Network section
Existing clause: f) Required switch with 64 non-blocking ports, with aggregate data throughput up to 51.2 Tb/s, and required compatible cables of appropriate length to connect all 8 (compute communication) IB NDR / Ethernet ports of all nodes in non-blocking mode.
Suggested modification: Required two switches, each with 64 non-blocking ports and a 1RU or 2RU form factor, with aggregate data throughput up to 51.2 Tb/s, and required compatible cables of appropriate length to connect all 8 (compute communication) IB NDR / Ethernet ports of all nodes in non-blocking mode.
Justification: Chassis-based switches can be very power-hungry; 1RU or 2RU form-factor switches consume around 2000 W depending on the number of transceivers used. Also, with two switches, redundancy can be built into the back-end GPU connectivity so that failure of one switch still allows GPU-to-GPU traffic to flow at 400G.
Response: The bidder shall have to deploy the required quantity of switches with the said functionality to complete the solution.

19. Reference: Nodes for AI Training, Page 26, System Network section
Existing clause: 2. InfiniBand / Ethernet (200GBps or higher) for storage delivery.
Suggested modification: Ethernet (200Gbps or higher) for storage delivery.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: Please refer Corrigendum-03.

20. Reference: Inference Node, Page 28, System Network section
Existing clause: 4. InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Suggested modification: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: Please refer Corrigendum-03.

21. Reference: Networking Switch, Page 28
Existing clause: Min. two or required nos. of switch with 48 x 10G SFP+ and 8 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the Cluster Communication and Perimeter N/W. Switch must support the MLAG/MCLAG feature.
Suggested modification: Min. four or higher nos. of switch with 48 x 10/25G SFP+ and 6 x 100G QSFP ports or higher to connect to the core for all of "Master Node", "Node for AI Training" and "Inference Node", forming the Cluster Communication and Perimeter N/W. Switch must support MLAG/MCLAG and EVPN multihoming features.
Justification: Ideally, perimeter traffic and cluster orchestration traffic should be physically separated. Across all servers, there are 46 perimeter ports and 46 orchestration ports; an additional layer of segregation can be done using VLAN, VxLAN and VRFs. Secondly, the 25G standard is becoming highly common in data-center use cases; having 25G switch ports allows adding servers with 25G ports instead of buying new switches. Additionally, 6 x 100G is adequate to uplink towards the core/spine switch. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response: Please refer Corrigendum-03. The bidder must deploy the required quantity of switches with the same or higher functionality to meet the solution requirements. The switch count shall be adjusted (increased/decreased) based on the actual port availability per device while maintaining the specified speed and functionality. The EVPN multihoming feature can be provided in case it is required to complete the solution.
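One way to read the "minimum four" figure (a sketch under the assumptions stated in the justification: 46 perimeter ports, 46 orchestration ports, 48 usable access ports per switch, and a redundant pair per physically separated network):

    import math

    perimeter_ports, orchestration_ports, ports_per_switch = 46, 46, 48
    # One 48-port switch covers each network's access ports...
    per_network = max(math.ceil(perimeter_ports / ports_per_switch),
                      math.ceil(orchestration_ports / ports_per_switch))  # = 1
    # ...and doubling each for MLAG/MCLAG-style redundancy gives two per network.
    total_switches = per_network * 2 * 2  # two networks, redundant pair each
    print(total_switches)  # 4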
22. Reference: Networking Switch, Page 28
Existing clause: Min. one or required nos. of switch with 48 x 1G RJ45, 4 x 25G SFP28 and 2 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the In-band and BMC/Out-of-band Management N/W.
Suggested modification: Min. two or higher nos. of switch with 48 x 1G RJ45, 4 x 10G SFP28 and 2 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the In-band and BMC/Out-of-band Management N/W.
Justification: There are a total of 23 servers (Master, Training and Inferencing). In addition, there will be other devices such as switches, firewalls, storage and load balancers. These devices will be spread across multiple racks given the power constraint of each rack and the limit on the number of devices it can hold. Having a single OOB switch can become an issue for cable laying across racks, including the distances involved. To ensure all bidders can follow appropriate data-center design standards, a minimum of two OOB switches will be helpful / necessary.
Response: Please refer Corrigendum-03. The bidder shall deploy the required quantity of switches with equal or higher capability to meet the solution's needs. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

23. Reference: Networking Switch, Page 28
Existing clause: Min. one or required nos. of switch with 32 x 100GbE QSFP ports, or one switch with 64 x 100GbE QSFP ports, to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the User N/W. Switch must support the MLAG/MCLAG feature.
Suggested modification: Min. two or higher nos. of switch with 32 x 100GbE QSFP ports, or one switch with 64 x 100GbE QSFP ports, to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the User N/W. Switch must support MLAG/MCLAG and EVPN multihoming features.
Justification: A server usually has a 2 x 100G NIC rather than a single port. Moreover, single connectivity to one switch (instead of an HA pair) is a single point of failure: user traffic is lost if a switch, transceiver or fibre cable goes bad. Specifying two switches that can operate in HA ensures all bidders comply with a minimal baseline. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response: Please refer Corrigendum-03. The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. Increase/decrease the count of switches as per the ports available with the mentioned speed and functionality to complete the solution.

24. Reference: Corrigendum-2, SLA & Penalties (Operational Penalties), Page 23
Existing clause: Timeline for resolution is within 4 hours from the time the call is logged / reported to the Bidder/OEM.
Suggestion: We request GIL to amend this to NBD (Next Business Day) resolution for hardware-related issues during working days, as local sparing of such hardware is not feasible for OEMs.
Response: Please refer Corrigendum-03.

25. Reference: Corrigendum-2, 10. Minimum Technical Specification - Master Node, Training Nodes, Inferencing Nodes, Pages 24, 27, 28
Existing clause: Certification: Undertaking from Server OEM for compatibility of the proposed server with the GPU under the quoted Inference Node must be submitted (duly signed by authorized signatory, mentioning the bid reference).
Suggestion: We would like to draw GIL's attention to these clauses, which were relaxed in Corrigendum-2. The relaxation will allow unproven systems to be quoted that are neither tested nor certified by GPU OEMs, nor benchmarked with audited results submitted to reputed reference sites such as MLCommons.org, which is maintained by leading OEMs. This relaxation risks unproven infrastructure being proposed, resulting in undesired, uncertain results; there are OEMs who would simply extrapolate and submit undertakings, putting the complete bid at risk.
Response: As per the RFP and the corrigenda published from time to time.

26. Reference: Corrigendum-2, 10. Minimum Technical Specification - Master Node, Training Nodes, Inferencing Nodes, Pages 24, 27, 28
Existing clause: Benchmarks: If not listed on MLCommons, the bidder shall be required to submit a benchmark report for the make/model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, and should refer to the bidder and bid details.
Response: As per the RFP and the corrigenda published from time to time.

27. Reference: as above.
Existing clause: Benchmarks: If not listed on spec.org, the bidder shall be required to submit a benchmark report / logs for the make/model (same configuration) of the quoted server as part of bid submission.
Response: As per the RFP and the corrigenda published from time to time.

28. Reference: Corrigendum-2, 10. Minimum Technical Specification - Master Node, Page 24 & Inferencing Nodes, Page 27
Existing clause: 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Suggestion: We request GIL to clarify what "at both ends" means here. Which existing SAN switch do we need to integrate with? Kindly share the make/model number of the SAN switch.
Response: "At both ends" refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make and model of the existing SAN switches and storage.

29. Reference: Corrigendum-2, 10. Minimum Technical Specification - Nodes for AI Training, Page 25
Existing clause: Network: a) InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Suggested modification: Network: a) InfiniBand / Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: Network cards, whether InfiniBand or Ethernet, are rated in Gbps and not GBps; Gbps is already used in the Training Nodes section. We request GIL to amend accordingly. Reference: https://www.nvidia.com/content/dam/en-zz/Solutions/networking/ethernet-adapters/connectx-7-datasheet-Final.pdf
Response: Please refer Corrigendum-03.

30. Reference: Corrigendum-2, 10. Minimum Technical Specification - AI Training Node, Page 25 & Inferencing Nodes, Page 27
Existing clause: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives.
Suggested modification: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives OR M.2 960 GB NVMe drives.
Justification: The current changes in Corrigendum-2 restrict OEM participation; this was allowed earlier in Corrigendum-1. We request GIL to relax the clause for wider OEM participation, without impacting node performance, since operating systems are small in size and do not need to be mixed with performance/capacity drives and large drives.
Response: Please refer Corrigendum-03.

31. Reference: Corrigendum-2, 10. Minimum Technical Specification - AI Training Nodes, Page 25
Existing clause: System Network: the following networks are required: 1. NDR InfiniBand / Ethernet (400GBps or higher) for compute communication; 2. InfiniBand / Ethernet (200GBps or higher) for storage delivery.
Suggested modification: The following networks are required: 1. NDR InfiniBand / Ethernet (400Gbps or higher) for compute communication; 2. InfiniBand / Ethernet (200Gbps or higher) for storage delivery.
Justification: Network cards are not rated in GBps, as noted and linked in the earlier point; please refer again: https://www.nvidia.com/content/dam/en-zz/Solutions/networking/ethernet-adapters/connectx-7-datasheet-Final.pdf
Response: Please refer Corrigendum-03.

32. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inferencing Nodes, Page 27
Existing clause: Power Requirement: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and fan.
Suggestion: We request GIL to define N+N; if N=1, this would mean an N+1 redundant hot-swappable power supply is to be offered.
Response: Please refer Corrigendum-03.

33. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inference Node, Page 27
Existing clause: GPU Communication: 2 x Accelerators per node, each with minimum 140 GB or higher GPU memory per accelerator.
Suggestion: We request GIL to open up this clause for wider OEM participation, as the current clause is restrictive in nature. Proposed alternatives: 2 x accelerators per node, each with minimum 140 GB or higher GPU memory; OR 4 x accelerators per node, each with minimum 94 GB or higher GPU memory; OR 4 x accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs offered across 6 GPU nodes for inferencing); OR 8 x accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs offered across 3 GPU nodes for inferencing).
Justification: We request GIL to consider the suggested changes, as not all leading server and GPU OEMs have these ratings and configurations available and benchmarked. This will also give GIL better options for rack space, power and cooling in the data center, without any compromise on performance.
Response: As per the RFP and the corrigenda published from time to time. However, the bidder may propose 2 or more accelerator cards per node (in the Inference Node); the number of GPUs and the quantity of inference nodes may change accordingly, provided the total requirement of cores and memory is provided as mentioned in the RFP and the corrigenda published from time to time.

34. Reference: Nodes for AI Training (Total Qty: 4 sets) - Benchmarks, Page 27
Existing clause: DLRM-dcnv2 - 3.6 minutes or less on a single node; GNN - 7.8 minutes or less on a single node; ResNet - 12.1 minutes or less on a single node; U-Net3D - 11.6 minutes or less on a single node.
Suggested modification: We request GIL to revise the following benchmarks, as available on https://mlcommons.org/: DLRM-dcnv2 - 3.75 minutes or less on a single node; GNN - please clarify whether the RGAT benchmark is meant for GNN; ResNet - 13.25 minutes or less on a single node; U-Net3D - 12.42 minutes or less on a single node.
Justification: We request the amendment to enable wider OEM participation in the bid.
Response: Please refer to the last Corrigendum-2 dated 15.03.2025. If GNN is not available, RGAT is accepted instead of GNN.

35. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inference Node, Page 28
Existing clause: OS Support: The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. The quoted OS should be under enterprise support from the OEM with premium or the highest level of support.
Suggestion: We request GIL to clarify which of these is to be quoted as part of the solution, for better clarity.
Response: Please refer Corrigendum-03.

36. Reference: Corrigendum-2, 10. Minimum Technical Specification - Inference Node, Page 28
Existing clause: OS Support: Supply should include DC edition unlimited Guest OS licenses.
Suggestion: We request GIL to define the user base, as the ask of unlimited Guest OS licenses will force bidders to quote the highest level of subscription license, escalating the cost, which may not be utilized by GIL for years to come. Defining the user licence count will keep it optimized.
Response: Please refer Corrigendum-03.

37. Reference: Storage Nodes, External Storage, Page 30
Existing clause: Performance: minimum 120 GBps read and minimum 60 GBps write, aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future. IOPS: minimum 8,00,000. 1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs. 2. NVMe storage offered must be certified with the proposed GPU OEM. Front-end connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per the proposed solution.
Suggested modification: (As per Corrigendum-1) Performance: 28 GBps read and 14 GBps write from day one, scalable up to >100 GBps read/write combinations with a scale-out architecture and additional controllers/nodes in the future; OR Performance: minimum 120 GBps read and minimum 60 GBps write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Justification: We request GIL to allow an NFS solution, as all current data is on NFS; converting/migrating from NFS to PFS will take considerable time and further delay the productivity of the AI/ML setup. Relaxing this restrictive clause will enable wider OEM participation with established results and deployments of AI/ML solutions; even NVIDIA has no bias towards PFS, as both perform well for AI/ML environments. The current specifications are OEM-specific and limit participation by leading OEMs. Please refer again to the link shared for a detailed view, which clearly shows the NVIDIA SuperPOD recommendation. As per NVIDIA's documentation for DGX SuperPOD (Storage Architecture - NVIDIA DGX SuperPOD: Next Generation Scalable Infrastructure for AI Leadership, Reference Architecture featuring NVIDIA DGX H200), the storage performance is: single SU* (256 GPUs) aggregate system read = 125 GBps; single SU* (256 GPUs) aggregate system write = 62 GBps. (*A single Scalable Unit (SU) in an NVIDIA DGX SuperPOD consists of 32 DGX systems, each with 8 H200 GPUs.) Hence, for a 56-GPU system (4 training nodes with 8 GPUs per node and 12 inference servers with 2 GPUs per node), the maximum throughput needed is approximately: aggregate system read = 28 GBps; aggregate system write = 14 GBps.
Response: Please refer Corrigendum-03.

38. Reference: Master Node, Page 24, Network section
Existing clause: InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Suggested modification: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity"; the stated GBps is not the correct rating for the HBAs/NICs available from leading OEMs.
Response: Please refer Corrigendum-03.

39. Reference: Nodes for AI Training, Page 25, Network section
Existing clause: d) Required InfiniBand / 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Suggested modification: Required 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: As per the RFP and the corrigenda published from time to time.

40. Reference: Nodes for AI Training, Page 25, Network section
Existing clause: f) Required switch with 64 non-blocking ports, with aggregate data throughput up to 51.2 Tb/s, and required compatible cables of appropriate length to connect all 8 (compute communication) IB NDR / Ethernet ports of all nodes in non-blocking mode.
Suggested modification: Required two switches, each with 64 non-blocking ports and a 1RU or 2RU form factor, with aggregate data throughput up to 51.2 Tb/s, and required compatible cables of appropriate length to connect all 8 (compute communication) IB NDR / Ethernet ports of all nodes in non-blocking mode.
Justification: Chassis-based switches can be very power-hungry; 1RU or 2RU form-factor switches consume around 2000 W depending on the number of transceivers used. Also, with two switches, redundancy can be built into the back-end GPU connectivity so that failure of one switch still allows GPU-to-GPU traffic to flow at 400G.
Response: The bidder must deploy the required quantity of switches with the same or higher functionality to meet the solution requirements. The switch count shall be adjusted (increased/decreased) based on the actual port availability per device while maintaining the specified speed and functionality.

41. Reference: Nodes for AI Training, Page 26, System Network section
Existing clause: 2. InfiniBand / Ethernet (200GBps or higher) for storage delivery.
Suggested modification: Ethernet (200Gbps or higher) for storage delivery.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: Please refer Corrigendum-03.

42. Reference: Inference Node, Page 28, System Network section
Existing clause: 4. InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Suggested modification: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: The External Storage specification states "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: Please refer Corrigendum-03.

43. Reference: Networking Switch, Page 28
Existing clause: Min. two or required nos. of switch with 48 x 10G SFP+ and 8 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the Cluster Communication and Perimeter N/W. Switch must support the MLAG/MCLAG feature.
Suggested modification: Min. four or higher nos. of switch with 48 x 10/25G SFP+ and 6 x 100G QSFP ports or higher to connect to the core for all of "Master Node", "Node for AI Training" and "Inference Node", forming the Cluster Communication and Perimeter N/W. Switch must support MLAG/MCLAG and EVPN multihoming features.
Justification: Ideally, perimeter traffic and cluster orchestration traffic should be physically separated. Across all servers, there are 46 perimeter ports and 46 orchestration ports; an additional layer of segregation can be done using VLAN, VxLAN and VRFs. Secondly, the 25G standard is becoming highly common in data-center use cases; having 25G switch ports allows adding servers with 25G ports instead of buying new switches. Additionally, 6 x 100G is adequate to uplink towards the core/spine switch. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response: Please refer Corrigendum-03. The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

44. Reference: Networking Switch, Page 28
Existing clause: Min. one or required nos. of switch with 48 x 1G RJ45, 4 x 25G SFP28 and 2 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the In-band and BMC/Out-of-band Management N/W.
Suggested modification: Min. two or higher nos. of switch with 48 x 1G RJ45, 4 x 10G SFP28 and 2 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the In-band and BMC/Out-of-band Management N/W.
Justification: There are a total of 23 servers (Master, Training and Inferencing). In addition, there will be other devices such as switches, firewalls, storage and load balancers. These devices will be spread across multiple racks given the power constraint of each rack and the limit on the number of devices it can hold. Having a single OOB switch can become an issue for cable laying across racks, including the distances involved. To ensure all bidders can follow appropriate data-center design standards, a minimum of two OOB switches will be helpful / necessary.
Response: Please refer Corrigendum-03. The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

45. Reference: Networking Switch, Page 28
Existing clause: Min. one or required nos. of switch with 32 x 100GbE QSFP ports, or one switch with 64 x 100GbE QSFP ports, to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the User N/W. Switch must support the MLAG/MCLAG feature.
Suggested modification: Min. two or higher nos. of switch with 32 x 100GbE QSFP ports, or one switch with 64 x 100GbE QSFP ports, to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the User N/W. Switch must support MLAG/MCLAG and EVPN multihoming features.
Justification: A server usually has a 2 x 100G NIC rather than a single port. Moreover, single connectivity to one switch (instead of an HA pair) is a single point of failure: user traffic is lost if a switch, transceiver or fibre cable goes bad. Specifying two switches that can operate in HA ensures all bidders comply with a minimal baseline. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response: Please refer Corrigendum-03. The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

46. Reference: Supply of the Hardware including Licenses and OEM Warranty Certificate
Existing clause: T1 = T + 60 days from the date of issuance of the contract over GeM.
Suggested modification: T1 = T + 90 days from the date of issuance of the contract over GeM.
Justification: The OEM delivery timeline for hardware takes a minimum of 3 months. We therefore request the authority to kindly look into this and amend the clause.
Response: As per the RFP and the corrigenda published from time to time.
47. Reference: POC to meet the benchmark as mentioned in this RFP document
Suggestion: We request the authority to kindly provide the POC evaluation criteria.
Response: Please refer to the RFP and the corrigenda published from time to time.
48. Reference: After successful POC, i.e. successful demonstration of the benchmarks as mentioned in the RFP
Suggestion: We request the authorities to kindly elaborate the detailed benchmark criteria for a successful demonstration and to provide the scripts / tests used to perform the benchmark testing.
Response: Will be shared at the time of the POC.

49. Submission date extension: We request the authority to kindly extend the bid submission date by 20 working days from the date of publication of the query responses.
Response: Please refer Corrigendum-03.

50. Site visit required: We request the authority to kindly approve a site survey to study: possibilities of integration; integration / networking with the existing infrastructure; and the overall power requirement against needs.
Response: Please refer to the RFP and the corrigenda published from time to time. Bidder can si

51. Reference: Page 24
Existing clause: 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Clarification sought: Number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response: "At both ends" refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make and model of the existing SAN switches and storage.

52. Reference: Page 26
Existing clause: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and fan.
Suggested modification: Change to: appropriately rated and energy-efficient, redundant (N+1) or better hot-swappable power supply.
Justification: This is in line with point 9 on page 18, which mentions N+1 redundant power supplies. Dense GPU servers typically come with N+1 redundancy and do not come with hot-swappable fans, hence the request for change.
Response: Please refer Corrigendum-03.

53. Reference: Page 27
Existing clause: 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Clarification sought: Number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response: "At both ends" refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make and model of the existing SAN switches and storage.
54. Reference: Page 25
Existing clause: c) Minimum 1 no. of 1 GbE port and 2 nos. of 10 GbE or higher (fiber/copper) ports.
Suggested modification: Change to: c) Minimum 1 no. of 1 GbE port and 4 nos. of 10 GbE or higher (fiber/copper) ports.
Justification: Wider participation, and better performance and redundancy.
Response: The bidder can quote a product on the higher side, meeting the requirements, to complete the solution.
55. Reference: Pages 27 and 29
Existing clause: AI Training and Inference Nodes - If not listed on MLCommons, the bidder shall be required to submit a benchmark report for the make/model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, and should refer to the bidder and bid details.
Suggested modification: Kindly change it to: "If not listed on MLCommons, the bidder shall be required to submit a benchmark report OR give an undertaking to meet the benchmark target timings during the POC (with up to 25% tolerance as mentioned in the RFP) for the make/model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, and should refer to the bidder and bid details."
Justification: Kindly allow us to give an undertaking to meet the benchmark target timings during the POC (with up to 25% tolerance as mentioned in the RFP).
Response: Please refer to the RFP and the corrigenda published from time to time.
56. Reference: Page 30
Existing clause: The proposed storage array should be configured with no single point of failure, including required controllers, cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers/nodes.
Suggested modification: The proposed storage array should be configured with no single point of failure, including required controllers, cache (if applicable), power supply, cooling fans, etc. It should be scalable up to 12 additional controllers/nodes.
Justification: Some storage solutions do not come with cache in the controllers, hence the request for change.
Response: As per the RFP and the corrigenda published from time to time.
57. Reference: Page 30
Existing clause: IOPS: minimum 8,00,000.
Suggested modification: IOPS: minimum 8,00,000 read.
Justification: AI storage systems require high read performance, hence the suggested change.
Response: Please refer Corrigendum-03.
58. Reference: Page 30
Existing clause: NVMe storage offered must be certified with the proposed GPU OEM.
Suggested modification: NVMe storage offered must be certified/compatible with the proposed GPU server OEM.
Justification: To ensure a wider choice of storage solutions. Kindly approve.
Response: As per the RFP and the corrigenda published from time to time.
59. Reference: Page 27 (Page 26 - Scalability, Cluster and Management Hardware and Software; Pages 27 and 29 - Cluster Management & Scheduler and Hardware)
Suggestion: Page 26 already mentions "Scalability, Cluster and Management Hardware and Software". Kindly remove the clause "Cluster Management & Scheduler and Hardware" for the Training and Inference nodes, since the features asked for are proprietary.
Justification: Keeping only the requirement of "Scalability, Cluster and Management Hardware and Software" will help in providing a uniform cluster tool.
Response: As per the RFP and the corrigenda published from time to time.

60. Reference: Page 28
Existing clause: Networking Switch: (2) Min. one or required nos. of switch with 48 x 1G RJ45, 4 x 25G SFP28 and 2 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the In-band and BMC/Out-of-band Management N/W. Required cables of appropriate length and transceivers should be supplied. The switch should have a redundant power supply. 5 years comprehensive onsite warranty.
Suggested modification: Change to: (2) Min. one or required nos. of switch with 48 x 1G RJ45 and 4 x 25G SFP28 or 50G SFP56 ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the In-band and BMC/Out-of-band Management N/W. Required cables of appropriate length and transceivers should be supplied. The switch should have a redundant power supply. 5 years comprehensive onsite warranty.
Response: Please refer Corrigendum-03. The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

61. Existing clause: The solution should be PFS (Parallel File System) based and delivered with 1PB (all NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
Suggested modification: The solution should be PFS (Parallel File System) or NFS over RDMA based and delivered with 1PB (all NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
Justification: NFS over RDMA is a superior choice to PFS for AI/ML workloads using GPUDirect due to its lower latency, higher throughput, and improved scalability. By leveraging RDMA's direct memory-to-memory transfer, NFS over RDMA reduces latency and overhead, making it well suited to the massive data transfers required by AI/ML workloads. Additionally, NFS is a standardized protocol, simplifying management and integration with existing infrastructure. In contrast, PFS, while designed for high-performance computing, can be complex to set up and manage, and may not scale as well as NFS over RDMA. When combined with GPUDirect, NFS over RDMA enables faster data transfer, reduced latency, and improved overall performance by offloading data-transfer tasks from the CPU, allowing it to focus on compute-intensive tasks.
Response: Please refer Corrigendum-03.

63. Existing clause: 1PB (NVMe) usable post RAID 6 or better configuration.
Suggested modification: 1PB (NVMe TLC drives) usable post RAID 6 or better configuration.
Justification: TLC NVMe drives are well suited to AI/ML workloads due to their high capacity, lower cost, improved performance, and increased endurance, offering a better balance of performance and capacity while significantly outperforming traditional HDDs and SATA SSDs. This makes them a practical choice for AI/ML applications that require rapid data access, large datasets, and frequent data writes.
Response: As per the RFP and the corrigenda published from time to time.

65. Existing clause: Performance: minimum 120 GBps read and minimum 60 GBps write, aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Suggested modification: Performance: minimum 40 GBps read and minimum 15 GBps write, aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Justification: According to the validated architecture jointly developed by NVIDIA and NetApp, a read performance of 45 GBps is more than sufficient to support 128 GPUs. Given that the current requirement is for 32 GPUs, the proposed performance provides ample headroom for future expansion, easily supporting up to 64 GPUs without compromising performance, and ensures a scalable, future-proof infrastructure that can grow with evolving needs. Reference: https://docs.netapp.com/us-en/netapp-solutions/ai/aipod_nv_validation_sizing.html#solution-validation
Response: Please refer Corrigendum-03.

66. Existing clause / Suggestion: IOPS: minimum 8,00,000.
Justification: Throughput is a more critical metric than IOPS for AI/ML workloads, as these workloads typically involve processing large datasets with sequential data-access patterns and require high-throughput storage to transfer data quickly, whereas IOPS measures small I/O operations, which is less relevant for AI/ML workloads that prioritize high-speed data transfer.
Response: Please refer Corrigendum-03.

67. Existing clause: 1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs. - Response: Query not clear.
68. Existing clause: 2. NVMe storage offered must be certified with the proposed GPU OEM. - Response: Query not clear.
69. (No clause quoted.) - Response: Query not clear.
70. Existing clause: Front-end connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per the proposed solution. - Response: Query not clear.

71. Reference: Additional points
Suggestion: Security: 1. The offered storage solution must provide tamperproof snapshots of the data, with the capability to automatically create snapshots and expire them by defining a retention period; a minimum of 1000 snapshots must be supported. 2. The offered storage solution must support a native or add-on solution to identify ransomware attacks, take autonomous actions to protect the data from ransomware attacks, report the attack to administrators, and offer recovery capabilities to administrators.
Justification: Including tamperproof snapshots and ransomware protection in the storage specifications for AI/ML workloads is crucial to ensure data integrity, version control, compliance, and rapid recovery. AI/ML workloads are data-intensive and require high-performance storage, making it essential to protect data from unauthorized modification, deletion, and ransomware attacks. Tamperproof snapshots provide a reliable way to track changes and maintain version control, while ransomware protection provides real-time detection and prevention of attacks, enabling rapid recovery and minimizing downtime. Furthermore, these features are critical for regulated industries such as healthcare and finance, where strict data protection and retention policies are mandatory.
Response: As per the RFP and the corrigenda published from time to time.

72. Reference: Page 24
Existing clause: 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Clarification sought: Number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response: "At both ends" refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make and model of the existing SAN switches and storage.

73. Reference: Page 26
Existing clause: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and fan.
Suggested modification: Change to: appropriately rated and energy-efficient, redundant (N+1) or better hot-swappable power supply.
Justification: This is in line with point 9 on page 18, which mentions N+1 redundant power supplies. Dense GPU servers typically come with N+1 redundancy and do not come with hot-swappable fans, hence the request for change.
Response: Please refer Corrigendum-03.

74. Reference: Page 27
Existing clause: 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Clarification sought: Number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response: "At both ends" refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make and model of the existing SAN switches and storage.

75. Reference: Page 25
Existing clause: c) Minimum 1 no. of 1 GbE port and 2 nos. of 10 GbE or higher (fiber/copper) ports.
Suggested modification: Change to: c) Minimum 1 no. of 1 GbE port and 4 nos. of 10 GbE or higher (fiber/copper) ports.
Justification: Wider participation, and better performance and redundancy.
Response: The bidder can quote a product on the higher side, meeting the requirements, to complete the solution.

76. Reference: Pages 27 and 29
Existing clause: AI Training and Inference Nodes - If not listed on MLCommons, the bidder shall be required to submit a benchmark report for the make/model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, and should refer to the bidder and bid details.
Suggested modification: Kindly change it to: "If not listed on MLCommons, the bidder shall be required to submit a benchmark report OR give an undertaking to meet the benchmark target timings during the POC (with up to 25% tolerance as mentioned in the RFP) for the make/model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, and should refer to the bidder and bid details."
Justification: Kindly allow us to give an undertaking to meet the benchmark target timings during the POC (with up to 25% tolerance as mentioned in the RFP).
Response: Please refer to the RFP and the corrigenda published from time to time.

77. Reference: Page 30
Existing clause: The proposed storage array should be configured with no single point of failure, including required controllers, cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers/nodes.
Suggested modification: The proposed storage array should be configured with no single point of failure, including required controllers, cache (if applicable), power supply, cooling fans, etc. It should be scalable up to 12 additional controllers/nodes.
Justification: Some storage solutions do not come with cache in the controllers, hence the request for change.
Response: The proposed storage array should be configured with no single point of failure, including required controllers, cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers/nodes. If controller-based cache is unavailable, alternative acceleration mechanisms (e.g., NVMe, distributed caching, tiered memory) should be supported.

78. Reference: Page 30
Existing clause: IOPS: minimum 8,00,000.
Suggested modification: IOPS: minimum 8,00,000 read.
Justification: AI storage systems require high read performance, hence the suggested change.
Response: Please refer Corrigendum-03.
79. Reference: Page 30
Existing clause: NVMe storage offered must be certified with the proposed GPU OEM.
Suggested modification: NVMe storage offered must be certified/compatible with the proposed GPU server OEM.
Justification: To ensure a wider choice of storage solutions. Kindly approve.
Response: As per the RFP and the corrigenda published from time to time.
Page 26 - Scalability, Cluster and
Page 26 already mentions Scalability, Cluster and Management Hardware and software. Kindly remove the clause of 'Cluster Management Having only the requirement of 'Scalability, Cluster and Management Hardware and software', will help in As per RFP and time to time published
80 27 Management Hardware and software and Page 27 and 29 - Cluster Management
& Scheduler and hardware ', for the Training and Inference nodes since the features asked are proprietary. providing a uniform cluster tool. corrigendum
& Scheduler and hardware
Sr. 81 (Clause No. 5 under "Eligibility Conditions")
Points of Clarification: The OEM should have executed similar GPU setup for min 3 clients in last 5 Years in India as on date of bid submission. Out of which One client deployment should be One project having similar works total value of INR 125 Cr.
Justification: Kindly note that the above-mentioned clause is a clear violation of Office Memorandum P-45014/33/2021-BE-II (E-64737) dated 20th December 2022 & P-45021/121/2018-(B.E.-II) dated 20th June 2019 issued by DPIIT, which clearly cites "common examples of restrictive and discriminatory conditions against the local suppliers", and Sub-clause 'e' of 'Clause 1' in ANNEXURE-A expressly states: "Excessive past experience requirement, not commensurate with the proven experience expected from Bidder for successful execution of contract."
In addition to the above, we would like to highlight that the above pre-qualification condition deviates from the General Financial Rules (GFR), 2017, and from Clause b (Particular Construction Experience and Key Production Rates) of sub-clause 2(iii) (Pre-qualification Criteria) on Page Nos. 33 and 34 under Chapter 3 of the Manual for Procurement of Works 2022 issued by DoE. Clause b of sub-clause (iii) states that:
"The applicant should have: 1. successfully completed or substantially completed similar works during the last seven years ending the last day of the month previous to the one in which applications are invited, which should be either of the following:
1.1 Three similar completed works costing not less than the amount equal to 40 (forty) percent of the estimated cost; or
1.2 Two similar completed works costing not less than the amount equal to 50 (fifty) percent of the estimated cost; or
1.3 One similar completed work costing not less than the amount equal to 80 (eighty) percent of the estimated cost."
In view of the above, it is pertinent to mention here that clause no. 5 of the Eligibility Conditions takes away the opportunity to participate from potential OEMs who have strong experience in deploying GPU clusters, which will limit the competition. We therefore request you to please modify the clause in order to avoid restrictive participation and provide a fair opportunity to all.
Response: Please refer Corrigendum-03.

Sr. 82 (Clause No. 5 under "Eligibility Conditions")
Points of Clarification: The OEM should have executed similar GPU setup for min 3 clients in last 5 Years in India as on date of bid submission. Out of which One client deployment should be One project having similar works total value of INR 125 Cr. Note: Similar works means SITC OF GPU ACCELERATED with multiple GPU Node.
Suggestion/Justification: It may please be noted that Clause No. 5 under "Eligibility Conditions", related to the pre-qualification criteria for participating OEMs, appears to be contrary to the General Financial Rules (GFR), 2017. The clause, under Particular Construction Experience and Key Production Rates, states that the applicant must have:
1. Successfully completed or substantially completed similar works during the last seven years ending on the last day of the month previous to the one in which applications are invited, in either of the following ways:
1.1 Three similar completed works costing not less than 40% of the estimated cost; or
1.2 Two similar completed works costing not less than 50% of the estimated cost; or
1.3 One similar completed work costing not less than 80% of the estimated cost.
In light of the above, we respectfully request that Clause No. 5 be reviewed and suitably amended in alignment with the GFR, 2017, to provide a fair and inclusive opportunity for all eligible participants. Furthermore, since the proposed solution involves multiple vendors, we kindly request an extension of the bid submission deadline by at least 7 additional days to accommodate the coordination and compliance requirements effectively.
Response: Please refer Corrigendum-03.

Sr. 83 (1. Eligibility Conditions, Point no 4.1, Page 16)
Points of Clarification: The bidder should have experience of set up of GPU server based solution with Cumulative 15 nos. of GPUs in last 5 Years in India.
Suggestion/Justification: This eligibility point 4.1 contradicts eligibility point 4.0 in the revised RFP (Corrigendum-2). Therefore, we request to consider bidders' experience of supply & implementation of higher Core Servers as covered in point 4.0 OR remove this clause.
Response: Please refer Corrigendum-03.
Sr. 84
Points of Clarification: Supply of the Hardware including Licenses and OEM Warranty Certificate. T1 = T + 60 days from the date of issuance of contract over GeM.
Suggestion: Supply of the Hardware including Licenses and OEM Warranty Certificate. T1 = T + 90 days from the date of issuance of contract over GeM.
Response: As per RFP and time to time published corrigendum.
Sr. 85 (8. IMPLEMENTATION TIMELINES & PENALTIES)
Points of Clarification: Installation, commissioning & integration of GPU servers at GSDC along with HLD, LLD documents. T2 = T1 + 30.
Suggestion: Installation, commissioning & integration of GPU servers at GSDC along with HLD, LLD documents. T2 = T1 + 60 days.
Response: As per RFP and time to time published corrigendum.
Sr. 86
Points of Clarification: Deployment of required Skilled Resource at GSDC. T2 + 7 Days.
Suggestion: Deployment of required Skilled Resource at GSDC. T2 + 30 Days.
Response: As per RFP and time to time published corrigendum.
Sr. 87
Points of Clarification: Overall (Sr. no. 2 to 6) Penalty CAP not to be more than 10% of the total GeM order value for IMPLEMENTATION TIMELINES & PENALTIES.
Suggestion: We request a Maximum Penalty Capping @ 5% of the total GeM order value for Implementation Timelines & Penalties.
Response: As per RFP and time to time published corrigendum.
Sr. 88 (b. SLA for Uptime (99.741%))
Points of Clarification: Uptime of solution <= 99.741%. In case of failure of proposed solution and non-maintaining targeted value, 0.5% of Billable Quarterly O&M and Manpower payment for every hourly delay or part thereof, proportionately in resolution; with max cap of 10% of GeM order value.
Suggestion: We request you to revise this penalty clause as: Uptime of solution <= 99.741%. In case of failure of proposed solution and non-maintaining targeted value, 0.05% of Billable Quarterly O&M and Manpower payment for every hourly delay or part thereof, proportionately in resolution; with max cap of 10% of O&M and Manpower cost/value.
Response: As per RFP and time to time published corrigendum.

Sr. 89 (Page 24)
Points of Clarification: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Suggestion/Clarification: Clarification sought - number of 32G FC ports required per server, and whether card-level redundancy is required or not. Also, regarding the existing FC ports in the DC, it is mentioned that 16G SFPs are available; however, as per the RFP you have asked for 32G SFPs at both ends. Kindly confirm if we can use 16G, or whether it is mandatory to use 32G SFPs at both sides.
Response: "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for make & model of the existing SAN switches and storage.

Sr. 90 (Page 17, SOW)
Points of Clarification: Bidder has to deploy proposed solution for inference and AI training model.
Suggestion/Clarification: We are assuming that we need to set up the required infra at the GIL site, but all the AI/ML workloads and use cases will be the responsibility of GIL; the bidder has no role to play in it once the underlying infra is ready.
Response: Please refer manpower clause of the RFP and time to time published corrigendum.

Sr. 91 (Page 26)
Points of Clarification: Appropriate rated and energy efficient, redundant (N+N) hot swappable power supply and FAN.
Suggestion: Change to "Appropriate rated and energy efficient, redundant (N+1) or better hot swappable power supply".
Justification: This is in line with point 9 of page 18, where N+1 redundant power supplies are mentioned. Dense GPU servers typically come with N+1 redundancy and do not come with hot-swappable fans, hence the request for change.
Response: Please refer Corrigendum-03.

Sr. 92 (Page 27)
Points of Clarification: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Suggestion/Clarification: Clarification sought - number of 32G FC ports required per server, and whether card-level redundancy is required or not. Also, regarding the existing FC ports in the DC, it is mentioned that 16G SFPs are available; however, as per the RFP you have asked for 32G SFPs at both ends. Kindly confirm if we can use 16G, or whether it is mandatory to use 32G SFPs at both sides.
Response: "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for make & model of the existing SAN switches and storage.

Sr. 93 (Page 17)
Points of Clarification: The proposed solution should support sharing of GPU across multiple virtual environments and containers.
Suggestion/Clarification: Kindly clarify on the below points: 1. Which hypervisor will be used for VM-based workloads? 2. Do we need to provision hypervisor licenses, or will GIL provide the same? 3. Will containers and VMs co-exist on the same hardware?
Response: The proposed solution should support GPU virtualization, enabling the efficient sharing of GPU resources across multiple virtual machines and containerized environments. It should be compatible with industry-standard hypervisors and container orchestration platforms, supporting vGPU or GPU passthrough mechanisms. The solution should enable shared GPU resources across multiple VMs and containerized applications instead of dedicating them to a single instance, allow simultaneous access for multiple workloads, support vGPU or GPU passthrough for fractional GPU allocation, and be compatible with industry-standard hypervisors and container platforms such as VMware vSphere, Microsoft Hyper-V, KVM, Docker, and Kubernetes.
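Illustrative note (not part of the RFP or of the response above): the following is a minimal sketch of how a containerized AI workload can request a GPU on a Kubernetes cluster of the kind described in this clarification. It assumes the official Python kubernetes client and a GPU device plugin (for example NVIDIA's) are already installed; the namespace, image name and pod name are placeholders, not values from this bid.

```python
# Minimal sketch: submit a pod that requests one GPU through the device-plugin
# resource "nvidia.com/gpu" (a vGPU/MIG slice or a full passthrough card,
# depending on how the cluster is configured). All names are placeholders.
from kubernetes import client, config

def submit_gpu_pod(namespace: str = "ai-workloads") -> None:
    config.load_kube_config()  # reads the local kubeconfig

    container = client.V1Container(
        name="inference",
        image="example.org/inference:latest",       # hypothetical image
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"}           # request exactly one GPU
        ),
    )
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-inference-demo"),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)

if __name__ == "__main__":
    submit_gpu_pod()
```

The same resource-limit mechanism is what lets the scheduler share a pool of GPUs across multiple containerized workloads instead of dedicating a whole node to a single instance.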

Onsite replacement of faulty hardware and skills support directly from hardware OEM is utmost important
Server, Storage & Switch OEM must have local service support depot in Gujarat preferably in Gandhinagar/Ahmedabad since last 5 years As per RFP and time to time published
94 New Clause to be incorporated in solution led bids wherein uptime and SLA are paramount and OEM skills and support on-site is mandatory
as on date of RFP release date. corrigendum
and important.
Both Bidder and OEM should be mandatorily registered under Indian Companies Act 1956, Act 2013 for As per RFP and time to time published
95 New Clause to be incorporated All OEMs (Hardware & Software) must be a company registered in India under the Companies Act 1956, Act 2013
Indian laws to be applicable on these entities and to make them accountable under Indian Judicial Laws. corrigendum
Security Features:
1. Immutable Silicon Root of Trust with component integration like network card, GPU etc.
2. Secure Recovery - Ability to rollback firmware
3. FIPS 140-2 validation
Some basic Security Features are defined in the RFP, however, these addl. Parameters are more relevant As per RFP and time to time published
96 New Clause to be incorporated 4. One-button Secure Erase
and critical for system security. corrigendum
5. Common Criteria certification
6. Advanced Encryption Standard (AES) and Triple Data Encryption Standard (3DES) on browser
7. Support for Commercial National Security Algorithms (CNSA)
8. Secure Configuration Lock
Security Features:
1. Trusted Platform Module 2.0
2. Secure Firmware
3. Detect and recover for BIOS tamper-free updates
No Security Features are defined for AI Nodes in the RFP, hence these addl. parameters are must required As per RFP and time to time published
97 New Clause to be incorporated 4. Secure Recovery - Ability to rollback firmware
and critical for system security. corrigendum
5. ACPI 6.3 Compliant
6. UEFI 2.8
7. SMBIOS 3.4 or later
8. Malicious Code Free design" (to be certified by OEM)
Security Features:
1. Immutable Silicon Root of Trust with component integration like network card, GPU etc.
2. Secure Recovery - Ability to rollback firmware
3. FIPS 140-2 validation
Some basic Security Features are defined in the RFP, however, these addl. Parameters are more relevant As per RFP and time to time published
98 New Clause to be incorporated 4. One-button Secure Erase
and critical for system security. corrigendum
5. Common Criteria certification
6. Advanced Encryption Standard (AES) and Triple Data Encryption Standard (3DES) on browser
7. Support for Commercial National Security Algorithms (CNSA)
8. Secure Configuration Lock
Sr. 99
Points of Clarification: Latest Generation Intel® Xeon® platinum or AMD Epyc scalable processors with Minimum Dual 32-Core, with Min 2 X GPU Accelerator.
Suggestion: Latest Generation Intel® Xeon® platinum or AMD Epyc scalable processors with Minimum Dual 32-Core, with Min 2 X GPU Accelerator. Total Qty of Inference servers must contain at least 24 x Accelerators.
Justification: Requirement mentions min. 2 x Accelerators per Node. Please clarify, can we offer more Accelerators per Node? Presently the RFP clause is contradictory, as it mentions min. 2 x accelerators but the Inference Node Qty is fixed. This will be a disadvantage for a vendor offering more than 2 x Accelerators per Inference Node. Request you to mention the total accelerators required for the Inference Solution.
Response: As per RFP and time to time published corrigendum. However, Bidder may propose 2 or higher accelerator cards per node (in Inference Node) and the number of inference nodes may change accordingly. However, total requirement of core and memory may be provided as mentioned in RFP and time to time published corrigendum.

External Storage
Minimum Technical Specifications In case PFS (Parallel File System) is asked IOPS: minimum 8,00,000 Self-certification of PFS by Storage OEM for proposed GPU will suffice the interoperability and performance
100 please refer Corrigendum-03.
Storage Nodes IOPS: minimum 8,00,000 NVMe Storage offered must be either certified with the proposed GPU OEM or must be self-certified by Storage OEM for proposed GPU. requirements of GIL.
NVMe Storage offered must be certified with the proposed GPU OEM
External Storage
Minimum Technical Specifications There are no references to IOPS in NVIDIA SUPERPOD documents and AI NFS Storage will have very high
101 In case NFS based Storage is asked IOPS: minimum 8,00,000 READ please refer Corrigendum-03.
Storage Nodes number of READ IOPS, hence IOPS requirement needs to be READ IOPS.
IOPS: minimum 8,00,000
External Storage
In case NFS based Storage is asked
Minimum Technical Specifications The proposed storage array should be configured with no single point of failure, including controllers (at least 2 controllers per disk tier), As per RFP and time to time published
102 The proposed storage array should be configured with no single point of failure, 2 Controllers are industry standard, please keep it to 2.
Storage Nodes cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers / nodes. corrigendum
including controllers (at least 3 controllers per disk tier), cache, power supply,
cooling fans, etc. It should be scalable up to 12 additional controllers / nodes.

External Storage
Performance: 40GBps 100% Read from day one and scalable up to 80GBps 100% Read with a scale-out architecture and additional Performance in NFS Storage is better measured on Read throughput and also these numbers are available
Minimum Technical Specifications In case NFS based Storage is asked
103 controllers/nodes in the future. with all Storage OEMs. please refer Corrigendum-03.
Storage Nodes Performance: 20GBps Read/Write from day one and scalable up to 60GBps with a
Data Availability: 99.9999% data availability guarantee on proposed Storage model duly certified by Storage OEM. Data Availability Guarantee of Six-Nines (99.9999%) is practically must for this critical infrastructure.
scale-out architecture and additional controllers/nodes in the future.
External Storage
Minimum Technical Specifications Vendor shall ensure that concurrent failure of at-least 4 disks can be handled without any kind of downtime and vendor shall configure the As per RFP and time to time published
104 In case NFS based Storage is asked Required feature for giving better resilience and performance
Storage Nodes erasure code accordingly corrigendum
New Clause to be incorporated

1. Offered storage system shall be able to create native immutable snapshots for offered solution.

External Storage 2. Offered storage system shall have capability for creating the immutable snapshot copies at both primary and DR location through
Minimum Technical Specifications As per RFP and time to time published
105 In case NFS based Storage is asked replication engine shall provide the flexibility for having different retention period for each location. Required feature for giving better protection to storage
Storage Nodes corrigendum
New Clause to be incorporated .
3. After defining the expiration period, it shall not be possible to reduce the expiration time. However, if the business need arises, then
expiration period shall be shortened only through Dual authorization and through different set of authorized users only.

1. Each offered file services front-end controller shall have minimum of 256GB memory and minimum 32 number of CPU cores.
External Storage
Minimum Technical Specifications 2. Each front-end controller shall also be offered with 2 x 100Gbps Ethernet Front-end ports and shall also have 2 x 100Gbps backend As per RFP and time to time published
106 In case NFS based Storage is asked Minimum hardware to be proposed so that everyone proposes enough resources for performance.
Storage Nodes ports for disk connectivity. corrigendum
New Clause to be incorporated
3. Every front-end controller shall have dual physical CPUs.

1. Offered Storage platform shall support NFS nconnect feature for increasing the NFS performance. It shall allow at-least 16 x TCP
connections between each client and storage platform.

2. Offered storage platform shall support NFS over RDMA and multi-pathing feature for increasing the NFS performance while connecting
External Storage
Minimum Technical Specifications the client to storage system. As per RFP and time to time published
107 In case NFS based Storage is asked Some advanced NFS features.
Storage Nodes corrigendum
New Clause to be incorporated
3. Multi-pathing shall be able to work in conjunction with nconnect and NFS over RDMA.

4. Offered Storage platform shall also support byte range file locking for both NFS V 3.x and NFS 4.1.

1. Offered storage system shall provide the functionality of disaster recovery by replicating the required path or directory to DR or peer
location.
External Storage
Minimum Technical Specifications 2. Offered storage system shall ensure that data path between Primary and DR location is encrypted. Vendor shall offer required Software As per RFP and time to time published
108 In case NFS based Storage is asked Disaster Recovery capabilities
Storage Nodes / License or hardware for achieving the required functionality. corrigendum
New Clause to be incorporated

3. Offered storage system shall support one to many and many to one replication so that one site can replicate to more than 1 DR site or
replication peer and multiple Primary sites can replicate to single DR location.
1. Offered storage system replication analytics engine shall have capability to showcase the overall RPO between sites and shall also
External Storage showcase the RPO miss report so that required bandwidth, if required, planning can be done.
Minimum Technical Specifications As per RFP and time to time published
109 In case NFS based Storage is asked Disaster Recovery capabilities
Storage Nodes corrigendum
New Clause to be incorporated 2. Offered storage system replication shall also show the overall data transfer over the replication connection so that, if required,
bandwidth planning can be done for achieving the required RPO.
1. Offered storage system shall support Quality of service natively and shall be able to assign IOPS and required bandwidth for a given
protected path / share.
External Storage
Minimum Technical Specifications As per RFP and time to time published
110 In case NFS based Storage is asked 2. It shall be possible to enable the quality of services within each provisioned tenant. QOS features
Storage Nodes corrigendum
New Clause to be incorporated
3. Offered quality of service functionality shall work in conjunction with quota management so that it can also be assigned for used
capacity and provisioned capacity.
1. Offered storage platform shall support and enabled with 256-bit-AES-XTS encryption and shall support both internal and external key
management
External Storage
Minimum Technical Specifications As per RFP and time to time published
111 In case NFS based Storage is asked Encryption Features
Storage Nodes 2. Encryption object module shall be FIPS 140-2 validated. corrigendum
New Clause to be incorporated
3. Offered storage system shall also be supplied with FIPS enabled drives.

Hypervisor Integration with AI Solutions


• The Hypervisor should support integration with AI solutions to enable seamless building, deployment, and management of AI workloads.
• It should leverage both virtualization and containerization technologies and be officially supported and certified by the AI solution
provider.

Hypervisor Management Tools


• The proposed solution should include robust hypervisor management tools that simplify the deployment, scaling, and operational
management of AI workloads, thereby reducing operational complexity.

GPU Acceleration Support


• The hypervisor should support GPU acceleration to allow efficient utilization of GPU resources during AI training and inference.
112 Hypervisor As per RFP and time to time published corrigendum
• This will ensure high performance and scalability of AI applications.

Comprehensive GPU Reporting Capabilities


• The solution should have the capability to generate comprehensive reports for GPU usage, performance metrics, compliance status,
health monitoring, forecasting, and capacity planning across AI workloads.

High Availability (HA) Support


• The system should support High Availability for VM migration in case of a physical server failure. All virtual machines on the failed server
should be capable of migrating automatically to another physical server running the same virtualization software.
• Additionally, the solution should support HA for VMs utilizing passthrough PCIe devices or NVIDIA/other vendor vGPUs.

1. The Hypervisor should support integrating with AI solutions to help build, deploy and manage AI workloads, leveraging the benefits
of Virtualization and containerization. The Hypervisor should be supported and certified by AI Solution.

2. Solution should include hypervisor management tools, simplify the deployment, management and scaling of AI workloads, reducing
operation complexity.

3. Hypervisor should support GPU acceleration, enabling efficient utilization of GPU resources for AI training and inference, ensuring
high performance and scalability. As per RFP and time to time published
113 Hypervisor
corrigendum
4. The solution should provide capability of generating reports for GPU usage, performance, compliance, health, forecasting, capacity,
across AI workload.

5. Should support HA for migration of VMs in case one server fails all the Virtual machines running on that server shall be able to
migrate to another physical server running same virtualization software. Should support HA for VMs with a passthrough PCIe device or a
NVIDIA / other vGPUs.

The OEM should have executed similar GPU setup for min 3 clients in last 5 Years
in India as on date of bid submission. Out of which One client deployment should The OEM should have executed similar GPU setup for min 3 clients in last 5 Years in India as on date of bid submission. Out of which One
be One project having similar works total client deployment should be One project having similar works total value of INR 20 Cr. or Total of 50 Cr from 2 Customers. As per RFP and time to time published
114 Eligibility Conditions This is required for wider participation and considering the budgeting of this project.
value of INR 125 Cr. Note: Similar works means SITC of Project which includes GPU corrigendum
Note: Similar works means SITC OF GPU ACCELERATED with multiple GPU Node.
ACCELERATED with multiple GPU Node.
Master Node
Hardware RAID 0,1, 5, 6, 10, 50, 60 with 4GB cache Flash based cache protection Hardware RAID 0,1, 5, 6, 10, 50, 60 with 4GB cache Flash based cache protection module should be included, should support Gen 4/5.0 Every OEM has different architecture so please revise as requested for wider OEM participation as it is As per RFP and time to time published
115 Storage Controller
module should be included, should support Gen 5.0 PCIe NVMe PCIe. restricting us. corrigendum.
Following N/W are required:
Following N/W are required:
a) Infiniband / Ethernet (200GBps or higher) as required for quoted storage Ethernet connectivity provides better throughput and performance , also ethernet is widely used and
a) Ethernet (200GBps or higher) as required for quoted storage delivery to nodes
delivery to nodes available with all leading OEMs while Infiniband is less used and available with some specific OEMs only so
Master Node b) Ethernet (100Gbps or higher) for User delivery Clarification:
116 b) Ethernet (100Gbps or higher) for User delivery please revise accordingly.
Network c) Ethernet (10GbE or higher) for cluster orchestration Ethernet connectivity is already allowed.
c) Ethernet (10GbE or higher) for cluster orchestration
d) Ethernet (10GbE or higher) for perimeter connectivity
d) Ethernet (10GbE or higher) for perimeter connectivity 1G connectivity is mostly required for oob connectivity and inband is a use case for telemetry data so please
e)Ethernet (1GbE or higher) for in-band/ oob management
e) Ethernet (1GbE or higher) for in-band management revise for wider OEM participation

Dedicated IPMI 2.0 compliant management LAN port having support for system health monitoring, event log access, Virtual media over
net- work, and Virtual KVM (KVM over IP). All required licenses to use IPMI features should be included. Licenses shall be
perpetual/subscription base for entire contract period to use.

Server management software should also provide below capabilities:

•The management tool should be able to provide global resource pooling and policy management to enable policy-based automation and
capacity planning with Zero-touch repository manager and self-updating firmware system, Automated hardware configuration and
Operating System deployment to multiple servers
Dedicated IPMI 2.0 compliant management LAN port having support for system
Servers ( specially like master nodes) play critical role in the Data centers as multiple applications dependo
Master Node health monitoring, event log access, Virtual media over net- work, and Virtual * Virtual IO management / stateless computing and Server management software should provide capaibility to view health, inventory for As per RFP and time to time published
117 on it and external storage also connects to it due to this end to end server management sofwtare with
Server Management KVM (KVM over IP). All required licenses to use IPMI features should be included. third-party compute, network, storage, integrated systems, virtualization, and containers. corrigendum.
mentioned features would be needed so please revise the clause acordingly
Licenses shall be perpetual/subscription base for entire contract period to use.
* The management software should participate in server provisioning, device discovery, inventory, diagnostics, monitoring, fault
detection, auditing, and statistics collection and should provide an alert in case the system is not part of OEM Hardware Compatibility list
& should provide anti counterfeit.
*The proposed management solution should provide proactive security & software advisory alerts and should outline the fixes required to
address the issues and analyze current configurations & identify potential issues due to driver & firmware incompatibility

* The proposed solution should have customizable dashboard to show overall faults / health / inventory for all managed infrastructure.
With option to create unique dashboards for individual users. The user should have flexibility to select names for dashboards and widgets
(ex:- health, utilization etc.)
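Illustrative note (not part of the RFP or of the response above): a minimal sketch of how the IPMI 2.0 management LAN port described in this clause is typically polled for health and event-log data, using the standard ipmitool CLI from Python. The BMC address and credentials below are placeholders, not values from this bid.

```python
# Minimal sketch: poll a server BMC over the IPMI 2.0 LAN interface with
# ipmitool (power state, sensor readings, system event log).
# All host and credential values are placeholders.
import subprocess

def ipmi(host: str, user: str, password: str, *args: str) -> str:
    """Run one ipmitool command against the BMC over lanplus and return stdout."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    bmc_host, bmc_user, bmc_pass = "10.0.0.50", "admin", "changeme"  # placeholders
    print(ipmi(bmc_host, bmc_user, bmc_pass, "chassis", "power", "status"))
    print(ipmi(bmc_host, bmc_user, bmc_pass, "sdr", "elist"))  # sensor readings
    print(ipmi(bmc_host, bmc_user, bmc_pass, "sel", "list"))   # system event log
```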

ACPI 6.4 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled
min ACPI 6.2 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, Every server architecture on different compliances get certified jointly by OEM and third parties due to that As per RFP and time to time published
118 Security Features within the BIOS for secure cryptographic key generation, SMBIOS 3.5 or later,
SMBIOS 3.5 or later, Malicious Code Free design" (to be certified by OEM). please revise with mentioned so that every OEM can participate corrigendum
Malicious Code Free design" (to be certified by OEM).

Min Dual 56-core latest Gen Intel® Xeon® platinum or AMD Epyc scalable As per RFP and time to time published
Min Dual 60 core latest Gen Intel® Xeon® platinum (5th Gen or higher) or AMD Epyc (Turin or higher) scalable processors , with Min 8 X
AI Training processors, with Min 8 X GPU Accelerators. Providing 500TF or Higher Double corrigendum.
119 GPU Accelerators. providing Defining processors's generation will standardize the type of processors offered by every OEM for equal
Processors & performance (per node, minimum) Precision Tensor FP64 / TF64 Performance, 31 PetaFlops or Higher FP8 However, Bidder can quote higher side compute.
500TF or Higher Double Precision Tensor FP64 / TF64 Performance, 31 PetaFlops or Higher FP8 performance with sparsity. participation also considering Training nodes require high compute performance so cores needs to be
performance with sparsity.
updated as per the processor's generation. So, please revise accordingly.
a)Minimum 8 nos of InfiniBand NDR ports or Ethernet (400Gb/s or higher) for
compute communication for internode communication,
b)1 nos. of port for BMC (dedicated LAN port), a)Minimum 8 nos of SuperNIC’s with minimum 8 arm cores capable of Supporting Infiniband / Ethernet (400Gb/s or higher) for compute
c)Minimum 1 no. of 1 GbE port and 2 nos of 10 GbE or higher (Fiber/Copper) communication for internode communication GPU training nodes in AI architecture requires SuperNICs for east-west communication across the nodes
port. b)1 nos. of port for BMC (dedicated LAN port), and DPUs are reqired for North-South communication.
AI Training d)Required InfiniBand / 200G or higher Ethernet 2 x twin-port HCA as required c)Minimum 2 nos of 10 GbE or higher (Fiber/Copper) port. Also RoCEv2 is very important & must needed critical protocol used for GPU to GPU communication across As per RFP and time to time published
120
Network for quoted storage delivery to node. d)Required 200G or higher DPU’s with 2 x twin-port 200G as required for quoted storage delivery to node network switches which along with PFC, ECN make sures to provide lossless , low latency , high bandwidth corrigendum
e)Addi onally, 1 nos of 100GbE or higher Ethernet (Fibre). e)Required switch with 64 non-blocking ports with aggregate data throughput up to 51.2 Tb/s and required compatible cables of communication, So please update this point accordingly and allow wider OEM participation
f)Required switch with 64 non-blocking ports with aggregate data throughput up appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode. The
to 51.2 Tb/s and required compatible cables of appropriate length to connect all 8 switch should provide RoCE v2 / equivalent, PFC, ECN, Telemetry capabilities to run the setup
(compute communication) nos. of IB NDR ports / Ethernet of all nodes in
non-blocking mode.
AI Training • For Operating System: minimum 1.92 TB NVMe drives • For Operating System: minimum 2x 960 GB M.2 SSD Boot drives Every OEM's architecture as per their testing and availability of drives is different so please update this
121 please refer Corrigendum-03.
Internal Storage • For Data: Minimum 8 * 3.84 TB U.2 or EDSFF NVMe drives • For Data: Minimum 8 * 3.84 TB U.2 Gen5 NVMe drives clause for us to participate.

AI Training Appropriate rated and energy efficient, redundant (N+N) hot swappable power Appropriate rated and energy efficient, redundant (N+1) hot swappable power supply and FANs. In case of power failure, the system should High availability and tolerance to power supply failure is very important for a critical server having large datasets
122 please refer corrigendum-03.
Power requirements supply and FAN be able to sustain 3 power supplies failure with GPU throttling no less than 60% dependent on them for training so please update the clause for allowing us to participate

Following N/W are required:


Following N/W are required:
1. SuperNIC’s with(400GBps or higher) for compute communication GPU training nodes in AI architecture requires SuperNICs for east-west communication across the nodes
1.NDR Infiniband / Ethernet (400GBps or higher) for compute communication
AI Training 2. DPU’s (2x 200Gbps or higher) for storage delivery and DPUs are reqired for North-South communication and also for storage communication. So , please As per RFP and time to time published
123 2.Infiniband /Ethernet (200GBps or higher) for storage delivery
System Network 3.DPU’s ( 2x200 Gbps) for cluster orchestration revise the point to standardize the AI computing node connectivity and communication architecture and corrigendum
3.Ethernet (Min 10 GbE or higher) for cluster orchestration
4.Ethernet (10 GbE or higher) for perimeter connectivity allowing every OEM to participate equally
4.Ethernet (10 GbE or higher) for perimeter connectivity
5.Ethernet (1GbE) for in/oob-band management
5.Ethernet (1GbE) for in-band management

Rack Servers should be certified by GPU Controller / Accelerator OEM, the


AI Training As per RFP and time to time published
124 Certificate or listing of offered Server model in GPU Controller / Accelerator OEM please generalise this clause for wider OEM participation as it is restricting us to participate
Certification Corrigendum.
website must be submitted along with bid.
Rack Servers should be certified by GPU Controller / Accelerator OEM, the Certificate or listing of offered Server model in GPU Controller /
Accelerator OEM website or undertaking to be published by time of award must be submitted along with bid.

Bidder needs to submit proof of the quoted GPU meeting these Mlcommons
training benchmarks at the time of bidding
Bidder needs to submit proof of the quoted GPU meeting these Mlcommons training benchmarks at the time of bidding
or
or Since AI training models are quite new to the market and their benchmarks are getting updated during Please refer to RFP and time to time published
125 If not listed on Mlcommons, bidder shall be required to submit
If not listed on Mlcommons, bidder shall be required to submit GPU Accelerator’s test results for the Make/Model (same GPU’s ) of course of time so please generalise this clause for wider OEM participation as it is restricting us to participate Corrigendum.
benchmark report for the Make/Model (same configuration) of the
quoted server as part of bid submission.
quoted server as part of bid submission. The submission should be on
OEM letterhead duly signed and referring the bidder and bid details.

Offered Nodes should be listed under ML Commons Training (4.0 or


higher) for the mentioned Benchmarks, supporting published link to be shared
during bid submission. Offered Nodes should be listed under ML Commons Training (4.0 or
Or higher) for the mentioned Benchmarks, supporting published link to be shared during bid submission.
AI Training If not listed on Mlcommons, bidder shall be required to submit benchmark report Or
Benchmarks for the Make/Model (same configuration) of the quoted server as part of bid If not listed on Mlcommons, bidder shall be required to submit
submission. The submission should be on OEM letterhead duly signed and undertaking to be provided that quoted systems Make/Model is certified by the GPU accelerator OEM The submission should be on OEM
referring the bidder and bid details. letterhead duly signed and referring the bidder and bid details.
Specifications: Specifications:
a)BERT - 5.3 minutes or less on single node a)BERT - 5.3 minutes or less on single node Since AI training models are quite new to the market and their benchmarks are getting updated during Please refer to RFP and time to time published
126
b)DLRM-dcnv2 – 3.6 minutes or less on single node b)DLRM-dcnv2 – 3.6 minutes or less on single node course of time so please generalise this clause for wider OEM participation as it is restrcting us to participate Corrigendum.
c)GNN – 7.8 minutes or less on single node c)GNN – 7.8 minutes or less on single node
d)Llama 2 70B –24.7 minutes or less on single node d)Llama 2 70B –24.7 minutes or less on single node
e)ResNet – 12.1 minutes or less on single node e)ResNet – 12.1 minutes or less on single node
f)RetinaNet – 34.3 minutes or less on single node f)RetinaNet – 34.3 minutes or less on single node
g)Stable Diffusion – 41.4 minutes or less on single node U-Net3D – 11.6 minutes g)Stable Diffusion – 41.4 minutes or less on single node U-Net3D – 11.6 minutes or less on single node
or less on single node
Up to 25% tolerance shall be accepted on aforementioned benchmarks during POC
Up to 25% tolerance shall be accepted on aforementioned benchmarks during
POC.
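(Illustrative reading of the tolerance, not an RFP clause: with the 25% tolerance, the single-node BERT target of 5.3 minutes, for example, corresponds to an accepted POC time of up to roughly 5.3 x 1.25 = 6.6 minutes; the same calculation applies to each of the other benchmark targets listed above.)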
As per RFP and time to time published
corrigendum.
However, Bidder may propose 2 or higher
accelerator card per node (In inference Node)
2 x Accelerators per node, each with minimum 140GB or higher GPU per Type of GPUs are defined as per the different models/architecture of different Server OEMs so equivalent and numbers of inference node quantity may be
Inference Node 2 x Accelerators per node, each with minimum 90 GB or higher GPU per Accelerator.
127 Accelerator.Should support Tensor core/Matrix core, CUDA / Stream Processors/ type of server having respective GPU needs to be updated to allow all OEMs to participate so request you to change accordingly. However, total requirement
Number of GPUs and GPU Communication Should support Tensor core/Matrix core, CUDA / Stream Processors/ openCL /ROCm with Accelerators.
openCL /ROCm with Accelerators. please as it is restrcting to participate of core and memory may be provided a s
mentioned in RFP and time to time published
corrigendum.

Inference Node For Operating System: minimum 2*1.92 TB M.2 NVMe drives Minimum 4 * 3.84 Request you to please update the clause for wider OEM participation as different server OEMs has different
128 For Operating System: minimum 2*1.92 TB NVMe drives Minimum 4 * 3.84 TB U.2 /U.3 or EDSFF NVMe drives please refer Corrigendum-03.
Internal Storage TB U.2 or EDSFF NVMe drives types of drives certified and tested.

ACPI 6.4 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled
Inference Node Min ACPI 6.2 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, Every server architecture on different compliances get certified jointly by OEM and third parties due to that As per RFP and time to time published
129 within the BIOS for secure cryptographic key generation, SMBIOS 3.5 or later,
Security Features SMBIOS 3.5 or later, Malicious Code Free design" (to be certified by OEM). please revise with mentioned so that every OEM can participate corrigendum
Malicious Code Free design" (to be certified by OEM)

Inference Node 4 x PCIe Gen 5.0 x 16 FH FL Slots. All slots must operate at PCI Gen 5.0 speed when Every OEM has different architecture so please revise as requested for wider OEM participation as it is As per RFP and time to time published
130 4 x PCIe Gen 5.0 x 16 FH FL Slots. All slots must operate at PCI Gen 4.0/5.0 speed when fully populated
PCI Express interface fully populated restricting us. corrigendum

Inference Node Appropriate Motherboard and chipset. Must support PCIe Gen 5.0 and Every OEM has different architecture so please revise as requested for wider OEM participation as it is As per RFP and time to time published
131 Appropriate Motherboard and chipset. Must support PCIe Gen 4.0/5.0 and compatible with selected processors and GPUs.
Mother Board compatible with selected processors and GPUs. restricting us. corrigendum

1.Min. Two or required Nos. of Switch with 48 *10G SFP+ and 8 x 100G
QSFP ports to connect all of “Master Node”, “Node for AI Training” and
“Inference Node” to form Cluster Communication and Perimeter N/W.
Switch must support of MLAG /MCLAG feature.
Switch must support EVPN – VxLAN based network. Required cables of
appropriate length, transceivers should be supplied. Switches(s) should
have redundant Power Supply. 5 Years Comprehensive Onsite Warranty
2.Min. One or required Nos. of Switch with 48 *1G RJ45, 4* 25G SFP28
Inference Node and 2 x 100G QSFP ports to connect all of “Master Node”, “Node for AI = Please clarify this switch’s usage as if this switch is required for master node connectivity also then it should have 400G switch ports also As per RFP and time to time published
132 need clarification
Networking Switch Training” and “Inference Node” to form In-band and BMC/Out-of-band in it corrigendum
Management N/W. Required cables of appropriate length, transceivers
should be supplied. Switch should have redundant Power Supply. 5
Years Comprehensive Onsite Warranty.
3.Min. One or required Nos. of Switch with 32 * 100GbE QSFP ports or
One Nos. of Switch with 64* 100GbE QSFP ports to connect all of
“Master Node”, “Node for AI Training” and “Inference Node” to form
User N/W. Switch must support of MLAG /MCLAG feature.

Image segmentation (medical) 3D-Unet-99


Throughput for single NODE inference (99% offline) = 09 Samples/s or higher
NLP
Bert-99
Throughput for single NODE inference (99% offline) = 10000 Samples/s or higher
Bidder needs to submit proof of the quoted GPUs being listed in
Recommendation dlrm-v2-99 Bidder needs to submit proof of the quoted GPUs being listed in MLperf inferencing benchmarks at the time of bid submission.
MLperf inferencing benchmarks at the time of bid submission.
Throughput for single NODE inference (99% offline) = 85000 Samples/s or higher
Or
LLM Summarization gptj-99
Since AI models are quite new to the market and their benchmarks are getting updated during course of Please refer to RFP and time to time published
133 Throughput for single NODE inference (99% offline) = 2500 Tokens/s or higher Or
If not listed on MLperf, bidder shall be required to submit time so please generalise this clause for wider OEM participation as it is restrcting us to participate Corrigendum.
Image Classification ResNet
benchmark report for the Make/Model(same configura on) of
Throughput for single NODE inference (99% offline) = 105000 Samples /s or higher If not listed on MLperf, bidder shall be required to submit GPU Accelerator’s test results for the Make/ Model (same GPU’s ) of the quoted
the quoted should be on OEM letterhead duly signed and
Object Detection RetinaNet server as part of bid submission.
referring the bidder and bid details.
Throughput for single NODE inference (99% offline) = 1500 Samples /s or higher
Image Generation Stable Diffusion-XL
Throughput for single NODE inference (99% offline) = 1 Samples/s or higher
Up to 25% tolerance shall be accepted on aforementioned benchmarks during
POC.

Min Dual 56-core latest Gen Intel® Xeon® platinum or AMD Epyc
Kindly help to amend the clause as "Min Dual 56-core latest Gen Intel® Xeon® platinum or AMD Epyc
scalable processors, with Min 8 X GPU Accelerators. providing
scalable processors, with Min 8 X GPU Accelerators. providing 500TF or Higher Double Precision Tensor FP32 / TF32 Performance, 31 Suggested changes helps to participate in the bid process and this would give the department to make the As per RFP and time to time published
134 Nodes for AI training: Total Qty-4 sets. Page no.25 500TF or Higher Double Precision Tensor FP64 / TF64 Performance, 31 PetaFlops
PetaFlops or Higher FP8 performance with sparsity" bid more competitive and also help to participate respective other OEM’s in the bid process. corrigendum.
or Higher FP8 performance with sparsity
Revised RFP (Corrigendum-03)
Bid for GPU Compute for AI / ML use at GSDC

1. Eligibility Conditions:

Sr. Specific Requirement Documents required


No.
1. The bidder should be a company registered in India ● Certificate of Incorporation
under the Companies Act 1956, Act 2013 or a ● Memorandum and
partnership registered under the India Partnership Article of
Act 1932, or a Partnership firm registered under the association
Limited Liability Partnership Act 2008 with their ● Registered Partnership Deed
registered office in India in operation for the last ● Copy of PAN card
three years ● Copies of relevant GST
registration certificates.
2. The bidder should have average Minimum Annual ● Audited profit and loss
Turnover of Rs. 25 crores in 3 years out of last 5 statement and balance sheet
financial years from the last date of bid submission ● Auditor certificate clearly
with positive net worth. specifying the turnover and
positive net worth.
3 The OEM should have average Annual Turnover ● Audited profit and loss
of minimum Rs. 250 crores for the last five financial statement and balance sheet
years from the last date of bid submission with ● Auditor certificate clearly
positive net worth. specifying the turnover and
In case a Make in India OEM participates directly positive net worth.
in this bid as the sole bidder, the OEM-specific
eligibility criteria shall not be applicable. In such
cases, only the bidder’s turnover, relevant
experience, and past performance shall be
evaluated for qualification for Make in India
OEM. The Make in India OEM must, however,
comply with all technical and commercial
requirements as specified in the RFP.
3.1 The Bidder Should have technical support center
in Ahmedabad / Gandhinagar, Gujarat. If the The Bidder should submit valid
bidder is not having any technical support center in Proof (or)
Ahmedabad / Gandhinagar, Gujarat, then bidder Bidder should submit Self-
should submit a letter of undertaking to open the declaration duly Signed and
office in Gujarat within 30 days from the date of stamped by the authorized
issue of work order if (s) he is awarded the work Signatory in format described in
RFP.
4. The bidder should have experience in setting up
GPU or CPU core that meet the following criteria Copy of Work Order along with
from any central or state Government / PSU/ Completion / Go-Live certificate.
Listed Company/BFSI sector in India or Global
experience in Tier-04 Datacentre within the last five years as of the bid submission deadline:
• One project having total value of INR 40 Cr (should have min 45 GPU / 800 core), or
• Two projects having total value of INR 25 Cr (should have min 28 GPU / 500 core), or
• Three projects having total value of INR 20 Cr (should have min 22 GPU / 400 core)
For Global experience, the bidder has to submit the client's Tier-04 datacenter certificate along with the client's official mail id, which should reflect on the client's official website for due verification. Copy of Work Order along with Completion / Go-Live certificate.
Note:
GPU Experience means GPU Installation in
multiple server Nodes.
CPU core experience means cores installed in
Server Nodes.
4.1 The bidder should have experience of set up of Copy of Work Order along with
GPU server base solution with Cumulative 15 Completion / Go-Live certificate.
nos. of GPUs in last 5 Years in India.
5. The OEM should have executed similar GPU Copy of Work Order along with
setup for min 3 clients from any central or state Completion / Go-Live certificate.
Government / PSU/ Listed Company/BFSI sector
in India or Global experience in Tier-04
Datacentre in last 5 Years as on date of bid For Global experience bidder has
submission. Out of which One client deployment to submit client’s tier-04 datacenter
should be One project having similar works total certificate along with client official
value of INR 125 Cr. mail id, which should reflect on
Note: Similar works means SITC OF GPU client official website for due
ACCELERATED with multiple GPU Node. verification. Copy of Work Order
In case a Make in India OEM participates directly along with Completion / Go-Live
in this bid as the sole bidder, the OEM-specific certificate.
eligibility criteria shall not be applicable. In such
cases, only the bidder’s turnover, relevant
experience, and past performance shall be
evaluated for qualification for Make in India OEM.
The Make in India OEM must, however, comply
with all technical and commercial requirements
as specified in the RFP.
6. The bidder should provide the authorization In Case of SI, should submit
certificate from the OEM for Manufacturer Authorization Form.
a. Quoting the requirement and subsequent
support for Hardware and Software (and) In Case of OEM, Letter of
b. Proposed GPUs solution will not be End of Declaration on their letter head
Life (EOL) for 5 years from the date of
installation
7. Neither OEM nor bidder should be blacklisted from Certificate of Undertaking for Non-
supplying equipment to any blacklisting from supplying
Government/PSU/BFSI within India in the past. equipment to any
Government/PSU/BFSI within India
in the past.
8. A Power of Attorney / Board Resolution in the Original Power of Attorney / Board
name of the person signing the bid document. Resolution Copy on a non-judicial
stamp paper.
1. All details and the supportive documents for the above should be uploaded in the GeM
bid. Bidder has to submit OEM MAF for Proposed hardware, Software and license part.
2. Bidder’s experience and bidder’s turnover criteria will not be considered from the GeM bid.
However, the bidder must match the eligibility criteria, experience, bidder’s turnover criteria, etc.
as mentioned above (& in this document), which will be considered for evaluation. EMD
and PBG should be submitted by the bidder as per GeM.
3. Bid Evaluation Method – Lowest Price L1 based on Technical evaluation and / or PoC
(Proof of Concept) Testing.
4. Bidder has to submit end-to-end OEM make and model details of proposed solution at
the time of bid submission for further evaluation.

1.1 Criteria for Bid Evaluation

A three-stage procedure will be adopted for evaluation of proposals as follows:


1) Pre-Qualification or Eligibility Condition
2) PoC (Proof of Concept) Testing of the benchmarks mentioned in this RFP, for those who comply with the Pre-Qualification or Eligibility Criteria
3) Financial bid opening for those who comply in the PoC (of the benchmark tests mentioned in this RFP)

1.2 Technical Presentation cum Proof-of-Concept (PoC)


Technical Bids of only those Bidders / OEMs who meet the “Pre-Qualification or Eligibility
Condition” criteria shall be considered for further Technical Evaluation.
The bidders who qualify the pre-qualification shall be invited to demonstrate their
proposed solution through a POC to the Authorities of the Tender Evaluation Panel (and
potentially other representatives of the Authorities) and GSDC. The demo will be conducted
in alphabetical order of bidder name. The timeline for the POC is as follows:

Sr. No. | POC Timeline | Work Type | Remarks
1 | T0 | Date of intimation to bidder for delivery of minimum proposed solution for POC to pre-qualified/eligible bidders | -
2 | T1 = T0 + 20 Days | Delivery of minimum proposed solution to achieve mentioned Benchmark (in RFP) by each Bidder (in case of a complete end to end solution from the same OEM quoted by multiple bidders, OEM-wise POC will be conducted) | If the OEM / bidder fails to deliver the proposed minimum solution within 15 days, they will be considered disqualified.
3 | T2 | Intimation of PoC demonstration schedule to bidders | -
4 | T3 = T2 + 20 Days | Installation, commissioning and completion of POC testing of minimum proposed solution to achieve mentioned benchmark (in RFP) | If the PoC is not completed within 20 days from the POC request date, the Bidders/OEM will be disqualified.

• If multiple bidders have proposed the same complete End to End OEM solution, the result of that PoC shall apply to all such bidders.
• In the event the proposed solution of a bidder fails in the PoC, bidders quoting this solution will be considered disqualified.
• Bidders must arrange and deploy all necessary infrastructure and accessories, including hardware, software, cables, racks, operating systems, and any other items required to conduct the PoC.
• GSDC shall provide only space, power, and cooling. All other logistical and PoC execution-related costs shall be borne entirely by the bidder/OEM.
• Upon completion of the PoC, the bidder must lift all deployed equipment from GSDC premises at their own cost.
• Bidders must demonstrate the required benchmarks as specified in the RFP during the PoC.
• Only those bidders who successfully qualify the PoC will be eligible for financial bid opening.
• The Tenderer reserves the right to accept or reject any or all bids in case it is not satisfied with the outcome of the PoC testing & benchmark required.
• The Tenderer reserves the right to accept or reject any or all bids or to re-tender at the Tenderer’s sole discretion without assigning any reasons to anybody whatsoever.

2. Clarification on Bidding Documents

• A prospective Bidder requiring any clarification of the bidding documents may seek clarifications by submitting queries on email Id: [email protected], [email protected] prior to the date of the Pre-Bid Meeting.

• The Tenderer will discuss the queries received from the interested bidders in the Pre-Bid Meeting and respond by uploading the clarifications on the website https://gil.gujarat.gov.in.

• No further or new clarification whatsoever shall be entertained after the Pre-Bid Meeting.

• The interested bidder should send the queries as per the following format:

Bidder’s Request For Clarification


Name of Organization submitting Request | Name & position of person submitting request | Address of organization including phone, fax, email points of contact
Sr. No. | Bidding Document Reference (Clause / page) | Content of RFP requiring Clarification | Points of Clarification required
1
2
3

3. Scope of Work:

1. GPU Servers shall be supplied, installed, configured, tested and commissioned along with
necessary software’s, OS’s and license’s at GSDC located at Gandhinagar, Gujarat.
2. Bidder has to deploy the proposed solution for inference and AI training model.
3. All software and library licenses to be provided in the name of DST/ DIT, Government of
Gujarat.
4. Gujarat GPU Compute for AI / ML solution must have rack mounted computing platform-
based computer servers, either as rack or blade server design housed in its suitable chassis.
5. The proposed solution should support for sharing of GPU across multiple virtual
environments and containers. Required license should be available from day one. Bidder to
ensure premium level or highest level of OEM support to meet SLA for all OEM provided
software and libraries.
6. MLops practices and principles should be followed under training model. If required, Bidder
can use appropriate tool for the same without any additional cost to tenderer.
7. The bidder shall submit the detailed documentation on the implementation and deployment.
8. The solution should support remote console access as per GSDC policy to all the servers
for cluster server health monitoring at Fast Ethernet or better access speed (an illustrative
health-monitoring sketch is given after this list).
9. The servers/chassis/enclosures should be populated fully with N+1 redundant power
supplies of the suitable capacity rating available for the proposed model with the supplier.
Failure of one of the Power supplies should not throttle the Compute nodes. In case the
offered Power Supplies cannot take the HPL load of all the Compute Nodes in the chassis,
lower number of Compute Nodes per chassis may be proposed.
10.The bidder will have to supply Server Rack along with provision of iPDU, TOR Switch, patch
panel, cables, SFP modules, any other active/passive components etc. to host the GPU
Cluster with GPUs at GSDC. Any other component required for the solution proposed by the
supplier has to be incorporated for completion of the Solution.
11.Onsite comprehensive annual maintenance with warranty and OEM support for 5 Years from
the date of completion of Functional Acceptance Test (Onsite warranty will include those
sites to which the item supplied under the contract is moved, in case of migration of the
equipment). Warranty should include but not limited to - On-going Firmware updates,
Proactive bug fixes, Preventive Maintenance, Parts replacement, etc.
12.After completing the installation and integration, the bidder will demonstrate the compliance
of the RFP and provide required training to the GSDC /TPA for executing FAT and further
Operation.
13.All the items as required under this RFP should be delivered in a single lot.
14.The bidder shall be fully responsible for the manufacturer’s warranty for all equipment,
accessories, spare parts etc. against any defects arising from design, material,
manufacturing, workmanship, or any act or omission of the manufacturer / bidder or any
defect that may develop under normal use of supplied equipment during the warranty period.
15.The bidder shall replace any faulty hard disk at no cost; the department will not return the
faulty disk after replacement with a new disk.
16.The bidder should provide complete support for the required solution asked for in the RFP,
with back-to-back support from the OEM.
17.The bidder should provide a Support / Escalation Matrix and portal details for logging tickets for
any failure/performance incidents. There should also be a mechanism whereby all licenses
can be viewed on the portal.
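
For illustration of the kind of server/GPU health monitoring referred to in point 8 above, the following minimal sketch polls basic per-GPU health counters. It assumes NVIDIA GPUs and the open-source pynvml bindings, neither of which is mandated by this RFP; the alert threshold is a hypothetical value.

# Illustrative only: per-GPU health snapshot using NVIDIA's NVML bindings (pynvml).
# Assumes NVIDIA GPUs and the pynvml package; tool choice and threshold are not RFP requirements.
import pynvml

TEMP_ALERT_C = 85  # hypothetical alert threshold, not an RFP value

def gpu_health_snapshot():
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # GPU / memory utilisation in %
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes used / total
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            print(f"GPU{i} {name}: util={util.gpu}% "
                  f"mem={mem.used / mem.total:.0%} temp={temp}C"
                  + ("  <-- ALERT" if temp >= TEMP_ALERT_C else ""))
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    gpu_health_snapshot()

Such a snapshot could feed into the GSDC monitoring setup; the actual monitoring tooling remains the bidder's choice, subject to the RFP requirements.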

Manpower for Hand Holding Support

• The successful bidder will have to depute 2 (two) technical resources as below to provide hand-
holding support for the contract period.
1. System Administrator

o Total 5+ years of experience


o Proficiency in Linux (e.g., Ubuntu, CentOS, RHEL)
o Familiarity with cluster management tools like SLURM, Kubernetes
o Understanding of high-speed interconnects - InfiniBand, Ethernet
o Experienced in configuring network topologies for low-latency, high-
throughput AI workloads.
o Knowledge of GPUs (e.g., NVIDIA A100, H100), accelerators, and their
deployment
o Awareness of storage technologies and their AI workload implications (e.g.,
NVMe, SSDs, and parallel storage).
o Experience with configuration management tools like Ansible

2. AI / ML Deployment engineer

o Proficiency in Python and corresponding AI libraries – NumPy, SciKit, Pandas,
CUDA Python, CuGraph, CuML etc.
o Experience with containerization (Docker) and orchestration (Kubernetes)
tools
o Hands-on experience with AI training frameworks like TensorFlow, PyTorch
and deployment frameworks like Triton (a minimal training-loop sketch is
given at the end of this section)
o Familiarity with deploying and scaling large language models (e.g., pre-
training, fine-tuning, serving & inference pipelines).
o Proficiency in data preprocessing, feature engineering, and handling large-
scale datasets
o Experience implementing MLOps pipelines for automating model lifecycle
management
o Experience with cloud services and APIs

• The deputed manpower will have to remain present during normal office hours of GSDC (9 AM
to 7 PM) on working days and support GSDC for day-to-day maintenance and effective
GPU infrastructure utilization.
• If required, the manpower will have to remain present on holiday(s) or after office hours based
on the requirements of GSDC.
• The bidder shall provide backup resources in case the deputed manpower is absent
or on leave. The backup resource deputed shall be aware of the tasks and responsibilities being
carried out during that period at GSDC and should be able to execute the tasks with minimum
on-call support.
• The manpower will have to report to the GSDC authority. The bidder shall submit proof of
attendance certified by the GSDC authority along with the invoice for the payment process.
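
For context on the AI / ML deployment engineer's role, the sketch below shows a minimal, self-contained PyTorch training loop of the kind such a resource would be expected to adapt, containerise and run on the GPU cluster. The model, synthetic data and hyper-parameters are purely illustrative and are not part of the RFP requirements.

# Illustrative only: minimal PyTorch training loop on synthetic data.
# Model, data and hyper-parameters are placeholders, not RFP-mandated values.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Synthetic dataset: 1,024 random samples across 10 classes.
x = torch.randn(1024, 64)
y = torch.randint(0, 10, (1024,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=128, shuffle=True
)

for epoch in range(3):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss = {loss.item():.4f}")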

4. Warranty Support: As part of the warranty services bidder shall provide:

I. Bidder shall provide a comprehensive on‐site free warranty for 5 years from the date of
acceptance of FAT (Final Acceptance Test) for proposed solution.
II. Bidder shall also obtain the 5 years OEM support (ATS/AMC) on all hardware and other
equipment for providing OEM support during the warranty period.
III. Bidder shall provide the comprehensive manufacturer's warranty and support in respect of
proper design, quality and workmanship of all hardware, equipment, Software, Licenses,
accessories etc. covered by the bid. Bidder must warrant all hardware, equipment,
accessories, spare parts, software etc. procured and implemented as per this bid against
any manufacturing defects during the warranty period.
IV. Bidder shall provide the performance warranty in respect of performance of the installed
hardware and software to meet the performance requirements and service levels in the bid.
V. Bidder is responsible for sizing and procuring the necessary hardware and software licenses
as per the performance requirements provided in the bid. During the warranty period bidder,
shall replace or augment or procure higher‐level new equipment or additional licenses at no
additional cost in case the procured hardware or software is not adequate to meet the service
levels.
VI. Mean Time Between Failures (MTBF): If during the contract period any equipment has a
hardware failure on four or more occasions in a period of less than three months, it shall be
replaced by equivalent or higher-level new equipment by the bidder at no cost. For any delay
in making available the replacement and repaired equipment for inspection, delivery of
equipment, commissioning of the systems, or acceptance tests / checks on a per-site basis,
DST/GIL/DIT reserves the right to charge a penalty.
VII. During the warranty period the bidder shall maintain the systems and repair / replace at the
installed site, at no charge, all defective components that are brought to the bidder's notice.
VIII. The bidder shall as far as possible repair/ replace the equipment at site.
IX. Warranty should not become void, if DST/GIL/DIT buys, any other supplemental hardware
from a third party and installs it within these machines under intimation to the bidder.
However, the warranty will not apply to such supplemental hardware items installed.
X. The bidder shall carry out quarterly Preventive Maintenance (PM), including cleaning of
interior and exterior, of all hardware, if any, and should maintain proper records at each site
for such PM. Failure to carry out such PM will be a breach of warranty and the warranty
period will be extended by the period of delay in PM.
XI. Bidder shall monitor warranties to check adherence to preventive and repair maintenance
terms and conditions.
XII. Bidder shall ensure that the warranty complies with the agreed Technical Standards,
Security Requirements, Operating Procedures, and Recovery Procedures.
XIII. Bidder shall have to stock and provide adequate onsite and offsite spare parts and spare
component to ensure that the uptime commitment as per SLA is met.
XIV. Any component that is reported to be down on a given date should be either fully repaired
or replaced by temporary substitute (of equivalent configuration) within the time frame
indicated in the Service Level Agreement (SLA).
XV. Bidder shall develop and maintain an inventory database to include the registered hardware
warranties.
XVI. To provide warranty support effectively, the OEM should have a spare depot in India and will be
asked to deliver spares as per SLA requirements.
1. All supplied items must conform to the detailed technical specifications as mentioned in
this document.
2. Install the equipment, obtain user acceptance and submit a copy of user acceptance to
designated authority.
3. The agreement stipulates that the vendor shall maintain the system with an uptime of
99.741%. Further, the bidder is responsible for providing comprehensive warranty and
support (24x7) for a period of 5 years from the date of successful completion of FAT.
4. The Bidder shall be responsible for providing all material, equipment and services
specified or otherwise, which are required to fulfill the intent of ensuring operability,
maintainability and the reliability of the complete work covered under this specification.
5. The manufacturer shall provide support for installation, commissioning, spares and
technical support in Gujarat.
6. All supporting equipment and tools shall be arranged by the vendor.
7. Unpacking of goods shall be done in front of a GIL/GSDC, Gandhinagar official, and any
damage shall be the sole responsibility of the vendor.
8. Delivery of goods: packing, unpacking, transportation, loading, unloading, octroi, insurance
and any other taxes and duties shall be included in the bid price.
9. All liabilities such as human injury, incidents, etc. pertain to the bidder's scope. The bidder
will be solely responsible for executing insurance for the said work as mentioned in this
RFP.
10. All safety precautions should be taken by the bidder as per industrial practice with utmost
care. In any case, the tenderer will not be liable for any obligation or issue arising under
this project.

5. Lack of Information to Bidder:

• The Bidder shall be deemed to have carefully examined all RFP documents to its entire
satisfaction. Any lack of information shall not in any way relieve the Bidder of its
responsibility to fulfil its obligations under the Contract.

6. Payment Terms:

1. No advance payment will be made to the bidder.


2. 70% of the Capex cost shall be paid within 30 days after supply of the proposed product and
software, including the licenses mentioned in the RFP, of the complete solution.
3. 20% of the Capex cost shall be paid within 30 days after successful FAT (Final
Acceptance Test) of the complete solution, duly certified by GSDC and counter-
signed/approved by the authority.
4. 10% of the Capex cost shall be paid after due acceptance by GSDC and Go-Live.
5. Cost of O&M support and Manpower Cost (D, as mentioned in the financial breakup) for 5 years
will be equally distributed across 20 quarters and paid on a quarterly basis after FAT (an
illustrative calculation is given after the note below).
Note: Bidder has to submit invoices along with necessary legitimate supporting documents
failing which invoices submitted are liable to be rejected/not accepted.
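
A worked example of the payment schedule above is sketched below. The contract values used are purely hypothetical placeholders; the actual amounts will follow the GeM order.

# Illustrative only: milestone-wise payment split as per the payment terms above.
# All monetary figures are hypothetical placeholders.
capex = 10_00_00_000   # A + B + C, hypothetical figure in Rs.
opex = 2_00_00_000     # D (O&M + manpower for 5 years), hypothetical figure in Rs.

on_supply = 0.70 * capex          # 70% after supply of product, software and licenses
on_fat = 0.20 * capex             # 20% after successful FAT
on_go_live = 0.10 * capex         # 10% after acceptance and Go-Live
per_quarter_opex = opex / 20      # O&M + manpower spread equally over 20 quarters

print(f"On supply     : Rs. {on_supply:,.0f}")
print(f"On FAT        : Rs. {on_fat:,.0f}")
print(f"On Go-Live    : Rs. {on_go_live:,.0f}")
print(f"Quarterly O&M : Rs. {per_quarter_opex:,.0f}")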

7. Final ACCEPTANCE TEST:

To be carried out based on, but not limited to, the following:

• GIL and GSDC reserve the right to inspect goods and services supplied as per the scope
of this RFP document. The cost of all such tests shall be borne by the Vendor. Any inspected
goods that fail to conform to the specification will be rejected, and the Vendor shall have to replace
the rejected goods as per the contract specification without any financial implication to
GIL/DIT.
• After successful installation of the System in accordance with the requirements as mentioned
in the Schedule of Requirement, the POC shall be executed.
• The successful bidder has to complete the SITC of the proposed complete solution and execute the POC
to meet the benchmark as mentioned in this RFP document. All costs with respect to executing
the POC shall be borne by the successful bidder.
• If the POC does not meet the benchmark, the bidder shall lift the deployed complete solution from GSDC
without any cost to the Tenderer. No payment will be made on failure of the POC.
• Only after a successful POC, i.e. successful demonstration of the benchmark as mentioned in the
RFP, shall the bidder proceed to the Final Acceptance Test.
• After successful installation of the System in accordance with the requirements as mentioned
in the Schedule of Requirement, the Final Acceptance Test will be conducted. GSDC or its
designated agency shall thoroughly review all aspects of the solution as per the requirements of the RFP.
After successful testing, the Acceptance Test Certificate will be issued by GIL/DIT and a member
of GSDC or its designated agency to the Bidder. The Bidder shall submit the certificate to
GIL/DIT for the further payment process.
• The date on which the Final Acceptance certificate is issued shall be deemed to be the date of
successful commissioning and Go-Live of the System.
• Any delay by the successful bidder in the POC or Acceptance Testing shall render the
successful bidder liable to the imposition of appropriate penalties.
• The Bidder is required to update the details of the hardware installed in the Assets Master, or as
decided by GIL and a member GSDC Officer, before completion of FAT.
• GIL/GSDC and/or an outside agency nominated by DST will conduct an acceptance test on
the hardware after completion of installation and commissioning of the hardware by the vendor.
The acceptance test shall comprise tests to verify conformity with the technical
requirements/specifications and performance. In case GIL/GSDC is not satisfied with the
above, the vendor will upgrade / replace the items with an equal or higher model after due
approval of the GSDC team without any extra cost. The exact details of the acceptance test will be
mutually decided after the installation of the hardware.

8. IMPLEMENTATION TIMELINES & PENALTIES:

The successful bidder has to complete the installation, configuration, commissioning and integration, with
acceptance of the ordered work, within the time period(s) specified in the table below. However, in
case of any delay solely on the part of the successful bidder, the TENDERER reserves the right to levy the
appropriate penalties as per the table below:

IMPLEMENTATION TIMELINES & PENALTIES FOR PROPOSED GPU Cluster with GPUs AT
GSDC
S/n | Work type | Time Limit for Execution | Penalty for Delay | Maximum Penalty
1 | Submission of PBG | Within 15 Days from the date of issuance of GEM contract | EMD may be forfeited and the contract, or part thereof, may be terminated | -
2 | Supply of the Hardware including Licenses and OEM Warranty Certificate | T1 = T + 60 days from the date of issuance of contract over GEM | 0.5% of Capex value of delayed/pending work per week or part thereof | 10% of GEM order value
3 | Installation, commissioning & integration of GPU servers at GSDC along with HLD, LLD documents | T2 = T1 + 30 | 0.5% of Capex value of delayed/pending work per week or part thereof | 10% of GEM order value
4 | POC to meet the benchmark as mentioned in this RFP document | T3 = T2 + 30 days | 0.1% of Capex value of delayed/pending work per week or part thereof. In case of delay of more than 2 (two) weeks after the defined milestone, the POC shall be treated as failed, the contract shall be terminated and the PBG may be forfeited. | 10% of GEM order value
5 | Final Acceptance Testing (FAT) | T3 = T2 + 15 days | 0.5% of Capex value of delayed/pending work per week or part thereof | 10% of GEM order value
6 | Deployment of required Skilled Resource at GSDC | T3 + 7 Days | Rs. 10000/- per day | Rs. 250000/-
7 | Training | 10 Days from T3 | Rs. 10000/- per day | Rs. 250000/-

Overall Penalty Cap (Sr. No. 2 to 6): the overall penalty cap shall not be more than 10% of the total GEM order value for Implementation Timelines & Penalties.

Note:
• Material supplied, installed and commissioned as per this Bid/contract should be covered under
warranty for a period of five years from the date of FAT acceptance.
• T = Date of issuance of contract over GEM.
• In case any fault arises in the installed items during the warranty period of 5 years, the bidder is
required to either repair the faulty items or install a replacement (complying with the RFP
specification) for the faulty material without any additional cost to the Tenderer.
• The aforesaid penalty cap will not be applicable in case of any severe impact/incident/outage at GSDC
resulting in loss to the Government of Gujarat.
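
A hedged illustration of how the weekly delay penalty and the 10% overall cap in the table above interact is given below; the Capex and GEM order values used are hypothetical placeholders.

# Illustrative only: weekly delay penalty with the 10% overall cap from the table above.
# All monetary figures are hypothetical placeholders.
capex_of_delayed_work = 5_00_00_000   # Rs., value of the delayed/pending work
gem_order_value = 12_00_00_000        # Rs., total GEM order value
weekly_rate = 0.005                   # 0.5% of Capex value per week or part thereof

def delay_penalty(weeks_delayed: int) -> float:
    raw_penalty = weekly_rate * capex_of_delayed_work * weeks_delayed
    cap = 0.10 * gem_order_value      # overall penalty cap of 10% of GEM order value
    return min(raw_penalty, cap)

for weeks in (1, 4, 60):
    print(f"{weeks:>2} week(s) delay -> penalty Rs. {delay_penalty(weeks):,.0f}")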

9. SLA & Penalties


a. Operational Penalty:

• The successful bidder shall repair / replace all faulty material covered under the warranty
within the shortest possible time, thus ensuring minimum downtime, failing which the applicable
penalty will be imposed. In case of failure of the appliance / solution more than 3 consecutive
times for the same issue within any single quarter during the contract period, the bidder shall
be bound to replace the product at no cost to DST / GIL / DIT.
• The successful bidder shall be responsible for maintaining the desired performance and
availability of the system/services.
• The successful bidder should ensure prompt service support during the warranty period.
• The timeline for resolution is within NBD (Next Business Day) from the time the call is logged /
reported to the Bidder/OEM. If the successful bidder fails to resolve the call as specified above,
a penalty of Rs. 5000 per hour or part thereof, proportionately, will be imposed for each delayed
hour, which will be recovered against the Performance Bank Guarantee or the billable
quarterly invoice amount submitted by the successful bidder.
• Downtime will be calculated from the time the complaint is logged with the service in-charge of the
successful Bidder (via email/call/written letter) till the GSDC's authorized / nominated
employee acknowledges the repair / service completion.
b. SLA for Uptime (99.741%)

SLA | Target | Penalties in case of breach in SLA
Uptime of solution | >= 99.741% | No penalty
Uptime of solution | < 99.741% | In case of failure of the proposed solution and non-maintenance of the targeted value, 0.5% of the billable quarterly O&M and Manpower payment for every hour of delay or part thereof, proportionately, in resolution; with a maximum cap of 10% of the GEM order value.

• SLA will be calculated on a quarterly basis. However, the final penalty deduction on the quarterly
payment (i.e., 4 x 3-quarter SLA report penalty) will be applied during the O&M and Manpower
quarterly payment (an illustrative downtime calculation is given below).
• The bidder has to ensure 365x24x7 support for SLA calculation.
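
For reference, the sketch below converts the 99.741% quarterly uptime target into an allowable downtime figure; the 91-day quarter length is an assumption for illustration only, and SLA measurement itself remains 365x24x7 as stated above.

# Illustrative only: allowable downtime per quarter for a 99.741% uptime target.
# The 91-day quarter length is an assumption made purely for this illustration.
uptime_target = 0.99741
hours_per_quarter = 91 * 24

allowed_downtime_hours = (1 - uptime_target) * hours_per_quarter
print(f"Allowed downtime per quarter: {allowed_downtime_hours:.2f} hours "
      f"({allowed_downtime_hours * 60:.0f} minutes)")
# Roughly 5.7 hours per 91-day quarter before the uptime SLA is breached.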

c. Manpower related SLA and Penalties:


1. Availability of the min required manpower should be 100%. The agency has to implement
the attendance system and share the attendance report of each person deployed as part
of team on monthly basis with the GSDC.

2. Replacement of a profile by the agency (only one replacement per technical profile –
with equal or higher qualification and experience – would be permitted per year)

3. Prior Intimated Leave of absence will be allowed: If a resource proceeding on leave or


becoming absent is replaced with a resource approved by authority, then such
substitution will not be treated as absence.

For every SLA non-compliance reported and proved, there shall be a penalty as given below:

# | SLA | Timelines / Event | Applicable Penalty
2 | Replacement of resources by the agency on formal submission of resignation by the resource in the company | There should be a minimum 15 days' overlap between the newly deployed resource and the replaced resource. | No penalty on timely replacement. Rs. 5000/- per resource per day for each day of delay from the stated timelines.
3 | The deployed resources shall not be engaged in any activity other than that assigned by the TENDERER | - | Penalty of Rs. 50,000 per resource may be imposed on breach of SLA. Consecutive breach 03 times may lead to termination of the contract.
4 | Absence without prior approval from the TENDERER and no backup resource arranged | - | Penalty of Rs. 5000/- per resource per day shall be imposed.
10. Minimum Technical Specification:

Master Node: (07 Nodes)

Components | Minimum Specifications

Processors | Latest Generation Intel® Xeon® Platinum or AMD EPYC scalable processors with minimum Dual 56-Core.
Mother Board | OEM supported motherboard and chipset.
System Memory | 1 TB DDR4 or higher SDRAM with ECC Advance.
Internal Storage | For Operating System: Minimum 2 x 3.84 TB capacity hot-swap Enterprise NVMe SSDs. For Data: Minimum 4 x 3.84 TB capacity hot-swap Enterprise NVMe SSDs.
HBA Card | 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Storage Controller | Hardware RAID 0, 1, 5, 6, 10, 50, 60 with 4GB cache; a flash-based cache protection module should be included; should support Gen 5.0 PCIe NVMe.
Network | The following networks are required:
  a) InfiniBand / Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes
  b) Ethernet (100Gbps or higher) for User delivery
  c) Ethernet (10GbE or higher) for cluster orchestration
  d) Ethernet (10GbE or higher) for perimeter connectivity
  e) Ethernet (1GbE or higher) for in-band management
External Port | One VGA port, 2 or more USB ports. Dedicated LAN port for the Management Interface.
Server Management | Dedicated IPMI 2.0 compliant management LAN port with support for system health monitoring, event log access, virtual media over network, and virtual KVM (KVM over IP). All licenses required to use IPMI features should be included. Licenses shall be perpetual/subscription based for use over the entire contract period.
Power Supply | Appropriately rated and energy-efficient, redundant (N+1) hot-swappable power supply (mandatory) and fan (optional).
Failure Alerting Mechanism | Should be able to alert on upcoming failures of the maximum number of components such as processor, memory, HDDs, expansion cards, etc.
OS Support | The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. However, the bidder shall deliver Ubuntu Linux version 22 or higher, delivered with enterprise-level support from the OEM at premium or the highest level of support available.
Hypervisor | Hypervisor with the Enterprise-level highest license and support should be provided from day one.
Software Support | All necessary and required software, SDKs, libraries and tools to cater for and run the AI/ML workload should be provided from day 1.
Scalability, Cluster and Management Hardware and Software | The system should be scalable with a multi-node cluster. Software support & cluster tools and management hardware, software and licenses to be supplied along with the product.
Warranty & Support | 5 years comprehensive onsite warranty.
Security Features | The system should support Secure Firmware Updates, Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, secure storage space and self-encrypting drives. ACPI 6.4 compliant, UEFI 2.8, SMBIOS 3.5 or later, Malicious Code Free design (to be certified by OEM).
Performance Benchmarks | 1. SPECrate2017_fp_base > 690; 2. SPECrate2017_int_base > 530. The System OEM must have listed the SPEC benchmark score on www.spec.org for the same node model with the same CPU configuration and a memory configuration of at least 1TB. Or, if not listed on spec.org, the bidder shall be required to submit a benchmark report / logs for the Make/Model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, referring to the bidder and bid details.
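
As an illustrative companion to the IPMI-based server-management requirement above, the sketch below polls chassis and sensor state through the standard ipmitool CLI from Python. The host, credentials and the choice of ipmitool are assumptions for illustration only, not RFP mandates.

# Illustrative only: out-of-band health polling via IPMI 2.0 using ipmitool.
# Host/credentials are placeholders; assumes ipmitool is installed on the polling host.
import subprocess

BMC_HOST = "10.0.0.10"      # hypothetical BMC address
BMC_USER = "admin"          # hypothetical credentials
BMC_PASS = "changeme"

def ipmi(*args: str) -> str:
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
           "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(ipmi("chassis", "status"))   # power state and fault indications
    print(ipmi("sdr", "list"))         # sensor data records: temperatures, fans, PSUs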

Nodes for AI training: Total Qty-4 sets.

Components | Minimum Specifications

Processors & performance (per node, minimum) | Minimum Dual 56-core latest Gen Intel® Xeon® Platinum or AMD EPYC scalable processors, with minimum 8 x GPU Accelerators, providing 500 TF or higher double-precision Tensor FP64 / TF64 performance, and 31 PetaFlops or higher FP8 performance with sparsity.
Number of GPUs and GPU Communication | 8 x Accelerators per node, each with minimum 140 GB or higher memory per Accelerator. Minimum 900 GB/s bidirectional communication bandwidth per GPU. Should support Tensor core/Matrix core, CUDA / Stream Processors / OpenCL / ROCm with Accelerators.
Multi Instance GPU | Capability to support partitioning of a single GPU into multiple GPU instances, where both memory and compute of the GPU are divided into multiple instances.
System Memory | The system should be configured with minimum 2TB DDR5 RAM with all slots populated in a balanced configuration for maximum bandwidth.
Network |
  a) Minimum 8 nos. of InfiniBand NDR ports or Ethernet (400Gb/s or higher) for compute communication for internode communication,
  b) 1 no. of port for BMC (dedicated LAN port),
  c) Minimum 1 no. of 1 GbE port and 2 nos. of 10 GbE or higher (Fiber/Copper) ports,
  d) Required InfiniBand / 200G or higher Ethernet 2 x twin-port HCAs as required for quoted storage delivery to the node,
  e) Additionally, 1 no. of 100GbE or higher Ethernet (Fibre),
  f) Required switch with 64 non-blocking ports with aggregate data throughput up to 51.2 Tb/s and required compatible cables of appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode.
Internal Storage | For Operating System: Minimum capacity of 2 x 960 GB NVMe / M.2 NVMe drives. For Data: Minimum 8 x 3.84 TB U.2 / U.3 or EDSFF NVMe drives.
Security Features | The system should support Secure Firmware Updates, Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, secure storage space and self-encrypting drives.
Power requirements | Appropriately rated and energy-efficient, redundant (N+1) hot-swappable power supply (mandatory) and fan (optional).
System Network | The following networks are required:
  1. NDR InfiniBand / Ethernet (400Gbps or higher) for compute communication
  2. InfiniBand / Ethernet (200Gbps or higher) for storage delivery
  3. Ethernet (min 10 GbE or higher) for cluster orchestration
  4. Ethernet (10 GbE or higher) for perimeter connectivity
  5. Ethernet (1GbE) for in-band management
OS Support | The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. However, the bidder shall deliver Ubuntu Linux version 22 or higher, delivered with enterprise-level support from the OEM at premium or the highest level of support available.
AI Enterprise Software | AI Enterprise software & subscription, or equivalent, for each and every GPU to be included from day one of installation. The software stack is to be supported by the GPU OEM for 5 years for each system. All necessary and required software, SDKs, libraries and tools to cater for and run the AI/ML workload should be provided from day 1. The bidder is to ensure that enterprise-level OEM support & SLA is available for all OEM-provided software and libraries. The licenses required must be included and shall be perpetual/subscription based for the entire contract period with no scaling restrictions. Some of the basic SDKs/libraries/containers to be used in the system are:
  a. CUDA toolkit,
  b. CUDA tuned Neural Network (cuDNN) Primitives,
  c. TensorRT Inference Engine,
  d. CUDA tuned BLAS (cuBLAS),
  e. CUDA tuned Sparse Matrix Operations (cuSPARSE),
  f. Multi-GPU Communications (NCCL),
  g. Industry SDKs – NVIDIA Merlin, DeepStream, ISAAC, Nemo, Morpheus,
  h. Rapids, Tao, TensorRT, Triton Inference.
Software Support | All necessary and required software, SDKs, libraries and tools to cater for and run the AI/ML workload should be provided from day 1. Comprehensive software frameworks for the following should be provided:
  a) Accelerated ML and data processing
  b) LLM pre-training, fine-tuning & guard railing
  c) Microservices-enabled framework for API-based LLM model deployment & serving
  d) End-to-end flows for conversational AI - ASR, NMT, TTS
  e) Video, audio and image processing pipelines
  In addition, customizable pre-built reference workflows for generative AI use cases shall also be covered as part of the software offerings.
Scalability, Cluster and Management Hardware and Software | The system should be scalable with a multi-node cluster. Software support & cluster tools and management hardware, software and licenses to be supplied along with the product.
Certification | Rack servers should be certified by the GPU Controller / Accelerator OEM; the certificate, or listing of the offered server model on the GPU Controller / Accelerator OEM website, must be submitted along with the bid.
Warranty & Support | 5 years comprehensive warranty with Enterprise-level Highest/Premium Support. OEM Enterprise-level Highest/Premium Support should reflect on the OEM portal. All quoted products including GPUs should not be End of Support until 5 years from the date of issue of the bid. The product quoted should be manufactured in the current year.
Cluster Management & Scheduler and hardware | Cluster management for system provisioning and monitoring needs to be included. The Cluster Manager must support multiple hardware vendors. The Cluster Manager must allow for the easy deployment and management of servers across multiple data centers, the public cloud, and edge locations as a single shared infrastructure through a single interface. All necessary hardware, software and necessary licenses should be provided from day 1.
Benchmarks | The bidder needs to submit proof of the quoted GPU meeting these MLCommons training benchmarks at the time of bidding. Offered nodes should be listed under MLCommons Training (4.0 or higher) for the mentioned benchmarks, and the supporting published link is to be shared during bid submission.
  Or
  If not listed on MLCommons, the bidder shall be required to submit a benchmark report for the Make/Model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, referring to the bidder and bid details.
  Specifications:
  a) BERT – 5.3 minutes or less on a single node
  b) DLRM-dcnv2 – 3.6 minutes or less on a single node
  c) GNN – 7.8 minutes or less on a single node
  d) Llama 2 70B – 24.7 minutes or less on a single node
  e) ResNet – 12.1 minutes or less on a single node
  f) RetinaNet – 34.3 minutes or less on a single node
  g) Stable Diffusion – 41.4 minutes or less on a single node
  h) U-Net3D – 11.6 minutes or less on a single node
  Up to 25% tolerance shall be accepted on the aforementioned benchmarks during the POC.
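
To illustrate how the NCCL-based multi-GPU communication called for above is typically exercised on such training nodes, the sketch below shows a minimal PyTorch DistributedDataParallel setup. It assumes launch via torchrun and is an illustration only, not a mandated benchmark, framework or tool.

# Illustrative only: minimal multi-GPU training setup using the NCCL backend.
# Assumes launch via: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # NCCL handles GPU-to-GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])      # gradients all-reduced over NCCL
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
        loss = model(x).square().mean()
        optimizer.zero_grad()
        loss.backward()                              # triggers the NCCL all-reduce of gradients
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()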

Inference Node: Total Qty-12 sets.

Components | Minimum Specifications

Processors & performance (per node, minimum) | Latest Generation Intel® Xeon® Platinum or AMD EPYC scalable processors with minimum Dual 32-Core, with minimum 2 x GPU Accelerators. The bidder should configure the Head / Master / Management Node in (1+1) HA mode to deliver the solution.
Number of GPUs and GPU Communication | 2 x Accelerators per node, each with minimum 140 GB or higher GPU memory per Accelerator. Should support Tensor core/Matrix core, CUDA / Stream Processors / OpenCL / ROCm with Accelerators.
Multi Instance GPU | Capability to support partitioning of a single GPU into multiple GPU instances, where both memory and compute of the GPU are divided into multiple instances.
System Memory | The system should be configured with minimum 2TB DDR5 RAM with all slots populated in a balanced configuration for maximum bandwidth.
Internal Storage | For Operating System: Minimum capacity of 2 x 960 GB NVMe / M.2 NVMe drives. For Data: Minimum 4 x 3.84 TB U.2 / U.3 or EDSFF NVMe drives.
HBA Card | 32 Gbps Host Bus Adaptor with required SFPs (at both ends) for connecting with the existing SAN switch and storage.
Security Features | The system should support Secure Firmware Updates, Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, secure storage space and self-encrypting drives. ACPI 6.4 compliant, UEFI 2.8, SMBIOS 3.5 or later, Malicious Code Free design (to be certified by OEM).
Power requirements | Appropriately rated and energy-efficient, redundant (N+1) hot-swappable power supply (mandatory) and fan (optional).
PCI Express interface | 4 x PCIe Gen 5.0 x16 FH FL slots. All slots must operate at PCIe Gen 5.0 speed when fully populated.
Mother Board | Appropriate motherboard and chipset. Must support PCIe Gen 5.0 and be compatible with the selected processors and GPUs.
System Network | The following networks are required:
  1. Ethernet (10 GbE or higher) for cluster orchestration
  2. Ethernet (10 GbE or higher) for perimeter connectivity
  3. Ethernet (1GbE or higher) for in-band management
  4. InfiniBand / Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes
  5. Minimum 1 x 100 GbE Ethernet port for the User Network
  6. 1 no. of port for BMC (dedicated LAN port)
Networking Switch |
  1. Minimum two, or required nos. of, switches with 48 x 10/25G or higher SFP+ and 6 or more x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the Cluster Communication and Perimeter network. The switch must support the MLAG / MC-LAG feature and EVPN–VxLAN based networking. Required cables of appropriate length and transceivers should be supplied. The switch(es) should have redundant power supplies. 5 years comprehensive onsite warranty.
  2. Minimum one, or required nos. of, switches with 48 x 1G RJ45, 4 x 10/25G or higher SFP28 or higher and 2 x 100G QSFP ports to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the In-band and BMC/Out-of-band Management network. Required cables of appropriate length and transceivers should be supplied. The switch should have redundant power supplies. 5 years comprehensive onsite warranty.
  3. Minimum two, or required nos. of, switches with 32 x 100GbE QSFP ports, or one, or required nos. of, switches with 64 x 100GbE QSFP ports, to connect all of "Master Node", "Node for AI Training" and "Inference Node" to form the User network. The switch must support the MLAG / MC-LAG feature and EVPN–VxLAN based networking. Required cables of appropriate length and transceivers should be supplied. The switch should have redundant power supplies. 5 years comprehensive onsite warranty.
  Note: The bidder must deploy the required quantity of switches with the same or higher functionality to meet the solution requirements. The switch count shall be adjusted (increased/decreased) based on the actual port availability per device while maintaining the specified speed and functionality.
OS Support | The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. However, the bidder shall deliver Ubuntu Linux version 22 or higher, delivered with enterprise-level support from the OEM at premium or the highest level of support available. The quoted model should be certified for RHEL and Ubuntu OS; the same shall be verifiable from the OS OEM's website. The supply should include DC edition unlimited Guest OS licenses.
Hypervisor | Hypervisor with the Enterprise-level highest license and support available should be provided from day one.
Virtual GPU | Support for virtual GPU to share a physical GPU across multiple VMs. The required license should be provided from day one. The bidder is to ensure that enterprise-level OEM support & SLA is available for all OEM-provided software and libraries.
AI Enterprise Software | AI Enterprise software & subscription, or equivalent, for each and every GPU to be included from day one. The bidder is to ensure that enterprise-level OEM support & SLA is available for all OEM-provided software and libraries. All necessary and required software, SDKs, libraries and tools to cater for and run the AI/ML workload should be provided from day 1. Comprehensive software frameworks for the following should be provided:
  a) Accelerated ML and data processing
  b) Microservices-enabled framework for API-based LLM model deployment & serving
  c) End-to-end flows for conversational AI - ASR, NMT, TTS
  d) Video, audio and image processing pipelines
  In addition, customizable pre-built reference workflows for generative AI use cases shall also be covered as part of the software offerings.
Scalability, Cluster and Management Hardware and Software | The system should be scalable with a multi-node cluster. Software support & cluster tools and management hardware, software and licenses to be supplied along with the product.
Certification | An undertaking from the server OEM for compatibility of the proposed server with the GPU under the quoted Inference node must be submitted (duly signed by the authorized signatory, mentioning the bid reference).
Warranty & Support | 5 years comprehensive warranty with Enterprise-level Highest/Premium Support. OEM Enterprise-level Highest/Premium Support should reflect on the OEM portal. All quoted products including GPUs should not be End of Support until 5 years from the date of issue of the bid. The product quoted should be manufactured in the current year.
Cluster Management & Scheduler and hardware | Cluster management for system provisioning and monitoring needs to be included. The Cluster Manager must support multiple hardware vendors. The Cluster Manager must allow for easy deployment and management as a single shared infrastructure through a single interface. All necessary hardware, software and necessary licenses should be provided from day 1.
Inference Benchmarks | The bidder needs to submit proof of the quoted GPUs being listed in MLPerf inferencing benchmarks at the time of bid submission. Or, if not listed on MLPerf, the bidder shall be required to submit a benchmark report for the Make/Model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, referring to the bidder and bid details.
  Image segmentation (medical) – 3D-UNet-99: Throughput for single NODE inference (99% offline) = 09 Samples/s or higher
  NLP – BERT-99: Throughput for single NODE inference (99% offline) = 10000 Samples/s or higher
  Recommendation – DLRM-v2-99: Throughput for single NODE inference (99% offline) = 85000 Samples/s or higher
  LLM Summarization – GPT-J-99: Throughput for single NODE inference (99% offline) = 2500 Tokens/s or higher
  Image Classification – ResNet: Throughput for single NODE inference (99% offline) = 105000 Samples/s or higher
  Object Detection – RetinaNet: Throughput for single NODE inference (99% offline) = 1500 Samples/s or higher
  Image Generation – Stable Diffusion-XL: Throughput for single NODE inference (99% offline) = 1 Sample/s or higher
  Up to 25% tolerance shall be accepted on the aforementioned benchmarks during the POC.
Performance Benchmarks | 1. SPECrate2017_fp_base > 690; 2. SPECrate2017_int_base > 530. The System OEM must have listed the SPEC benchmark score on www.spec.org for the same node model with the same CPU configuration and a memory configuration of at least 1TB. If not listed on spec.org, the bidder shall be required to submit a benchmark report / logs for the Make/Model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed, referring to the bidder and bid details.
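
As an illustration of the API-based model serving that the inference nodes are expected to host (for example via Triton Inference Server, one of the frameworks named above), a minimal client sketch is given below. The server URL, model name and tensor names/shapes are hypothetical placeholders, not RFP requirements.

# Illustrative only: querying a model hosted on Triton Inference Server over HTTP.
# Assumes the tritonclient package; URL, model name and tensor names/shapes are placeholders.
import numpy as np
import tritonclient.http as httpclient

TRITON_URL = "localhost:8000"     # hypothetical endpoint
MODEL_NAME = "demo_classifier"    # hypothetical model registered on the server

client = httpclient.InferenceServerClient(url=TRITON_URL)

# Prepare a single input tensor matching the (hypothetical) model signature.
batch = np.random.rand(1, 64).astype(np.float32)
infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(model_name=MODEL_NAME, inputs=[infer_input])
print(response.as_numpy("OUTPUT__0"))   # hypothetical output tensor name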

Storage Nodes
External Storage:
• The solution should be PFS (Parallel File System) or NFS (Network File System) based and delivered with 1PB (all NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
• The proposed storage array should be configured with no single point of failure, including required controllers, cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers/nodes.
• 1PB (NVMe) usable post RAID 6 or better configuration.
• The storage should be distributed, with the namespace consistent across nodes.
• Performance: Minimum 32 GBps read and minimum 16 GBps write aggregated from day one, scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
• IOPS: minimum 8,00,000.
• 1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs. 2. The NVMe storage offered must be certified with the proposed GPU OEM.
• Front-end connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per the proposed solution.

Specifications: 42U Server Rack


Sr. No. | Parameter | Minimum Specifications
1 | Form Factor with Width & Depth | 42U Server Rack should be 800mm (Width) x 1400mm (Depth).
2 | Cabinet Type & Construction | Rack frame should be robust and made of a welded steel frame that offers strong and sturdy support for installation of 19" equipment and accessories. Rack frame made of steel profile and connected with horizontal profiles for width and depth. Depth support channel with adjustable mounting slots.
3 | Cable Entry | Top and bottom panels with cable entry facility with brush.
4 | Mounting Angle | The 19" mounting angles should be provided, 2 nos. each on the front and rear side of the rack, adjustable to full depth. 19" mounting angles made of 2mm thick steel, providing better mounting flexibility and maximizing usable mounting space.
5 | "U" Identification | "U" numbering should be provided on the 19" mounting rails such that these unique numbers are visible even after the equipment is mounted.
6 | PDU Provision | Each rack should have provision for installation of two PDUs with toolless mounting, to be connected to two different sources individually.
7 | Cable Manager Provision | Each rack should have 4 horizontal 1U closed-type cable managers.
8 | Side Panels | Side panels shall be horizontally split steel panels. The side panels should be easily detachable with locking provision.
9 | Door | Front and rear doors should be perforated; both should be at least 80% hexagonally perforated (holes). Front & rear doors should open to a minimum of 138 degrees to allow easy access to the interior.
10 | Door Perforation | Hexagonally perforated single front door will be lockable, and a handle lock & key should be provided.
11 | Door Lock | Hexagonally perforated dual rear door will be lockable, and a 3-point lock should be provided.
12 | Castor | Rack should be with a plinth of 800 mm W, 100 mm H and 1400 mm D. The rack shall not have an external height > 2060 mm including the plinth.
13 | Load Bearing | Minimum load bearing capacity supported by the base frame should be a static load of at least 1200 Kg.
14 | Powder Coating | Rack shall be pre-treated and powder coated. The powder coating process shall be RoHS compliant. Powder coating thickness shall be 80 to 100 microns. The colour of the powder coat shall be black.
15 | PDU | Each rack shall be provided with 3 nos. of 3-phase 63A PDUs, IEC C19 x 12 SKT (per socket IEC C19 x 4 socket + 63A D-curve DP MCB) x 3 + 16 sq mm 5-core 3.5 m FRLS cable with 5-pin 63A industrial plug (2 nos. vertical and 1 no. horizontal).
16 | Shelf | 1 no. heavy-duty shelf for keeping the display & keyboard.
17 | Door Construction | All racks & doors are inherently grounded to the rack frame. Both the front and rear doors should be designed with quick-release hinges allowing quick and easy detachment without the use of tools. The front door of the unit should be field reversible so that it may open from either side.
18 | Statutory Standard | 100% assured compatibility with all equipment conforming to the DIN 41494 / EIA 310-D standard (general industrial standard for equipment).
19 | Certification | The rack shall be from an OEM having ISO 9001:2008, ISO 14001:2004, ISO 45001:2018 & ISO 50001:2018 (certificates to be submitted along with compliance).
21 | Warranty | 5 years onsite comprehensive warranty.

Indicative Diagram

Note:
• Bidders should refer to the indicative diagram for reference and propose their own solution
to meet the requirement, ensuring minimum / no failure accordingly.
• The bidder has to conduct site visits in advance (before the bid submission date) during working
days and hours to assess the rack positioning. Based on this assessment, they should quote
their solution in the bid submission.
• In addition, the bidder has to connect the Management and Inference nodes with the existing storage at
GSDC as follows:

Existing Storage
• NetApp FAS8300 with Total ports 4 and Used ports 2
• Hitachi VSP 5600 with Total ports 64 and Used ports 64
A CISCO MDS 9710 SAN Switch with 16 Gbps SFPs (having a port capacity of 32 Gbps) is used
to connect with the Storage. Port details are as below.

SAN Fabric Name | Total Ports | Used Ports | Available Ports
GSDC-Fabric-1 | 289 | 249 | 40
GSDC-Fabric-2 | 289 | 256 | 33

• For any additional requirement of ports over and above the aforementioned available ports,
the bidder shall provide a SAN switch of the same or higher configuration, compatible to connect
the inference node and management node with the existing storage, to complete the solution without
any additional cost to the tenderer.
• The bidder has to ensure that the proposed management and inference node solution is
compatible with the aforementioned storage and switch. All necessary accessories, cabling,
hardware, software, and licenses should be considered accordingly.

• Please find below the tentative layout diagram for installation of racks.


3. PRICE BID SCHEDULE (On GEM):

Sr. No. | Description | Cost including GST (Rs.)
1 | GPU Compute for AI / ML use at GSDC:
    A. Inclusive of all the required hardware, software and necessary licenses required to make the solution fully functional.
    B. As per the scope of work and the functional and technical requirements, including racks, cables & all other accessories (including active & passive components), installation, testing, commissioning and training, etc.
    C. Cost of comprehensive annual maintenance with warranty and OEM support for 5 years.
    D. Cost of O&M (including two skilled resources) for a period of 5 years. |

Total cost (Rs.) |

Note:

• CAPEX cost includes A, B, and C. OPEX cost includes D.
• L1 will be the lowest sum total of rates of all line items including GST as per GeM GTC.
• TENDERER/GIL may negotiate the prices with the L1 Bidder under each item/head offered by the
Bidder.
• The L1 Bidder shall share the item-wise cost breakup with the tenderer for future reference
for scalability and additional components within the solution.
• Enterprise-level highest license and support for the complete solution should be provided from
day one.
• RA has been enabled in the GEM Bid.
Please submit the undertaking letter as per Ministry of Finance Memorandum No.: F.
No.6/18/2019-PPD dated 23.07.2020 & Office Memorandum No.: F.18/37/2020-PPD dated
08.02.2021, as per the Proforma given below, on OEM letterhead as well as on the bidder's
letterhead.

On Letterhead of Bidder

Sub: Undertaking as per Office Memorandum No.: F. No.6/18/2019-PPD dated 23.07.2020 &
Office Memorandum No.: F.18/37/2020-PPD dated 08.02.2021 published by Ministry of
Finance, Dept. of Expenditure, Public Procurement division

Ref: Bid Number: ________________________________

I have read the clause regarding restriction on procurement from a bidder of a country that shares a
land border with India. I certify that we as a bidder, and the quoted products from the following OEMs,
are not from such a country, or, if from such a country, the quoted products' OEM has been registered
with the competent authority. I hereby certify that these quoted products and their OEM fulfil all
requirements in this regard and are eligible to be considered for procurement for Bid
number_______________________.

No. Item Category Quoted Make & Model

In case I’m supplying material from a country which shares a land border with India, I will provide
evidence for valid registration by the competent authority, otherwise GIL/End user Dept. reserves
the right to take legal action on us.
(Signature)
Authorized Signatory of M/s <<Name of Company>>
On Letterhead of OEM

Sub: Undertaking as per Office Memorandum No.: F. No.6/18/2019-PPD dated 23.07.2020 &
Office Memorandum No.: F.18/37/2020-PPD dated 08.02.2021 published by Ministry of
Finance, Dept. of Expenditure, Public Procurement division

Ref: Bid Number: ____________________________________

Dear Sir,

I have read the clause regarding restriction on procurement from a bidder of a country that shares a
land border with India. I certify that our quoted product and our company are not from such a country,
or if from such a country, our quoted product and our company have been registered with the
competent authority. I hereby certify that these quoted products and our company fulfills all
requirements in this regard and is eligible to be considered for procurement for Bid
number_______________________.

No. Item Category Quoted Make & Model

In case I’m supplying material from a country which shares a land border with India, I will provide
evidence for valid registration by the competent authority; otherwise GIL/End user Dept. reserves
the right to take legal action on us.

(Signature)
Authorized Signatory of M/s <<Name of Company>>
