Revised GEM Bid 16th May 2025
Block No. 2, 2nd Floor, C & D Wing, Karmayogi Bhavan, Sector-10A, Gandhinagar - 382010
The bidder shall submit their queries, if any, within 07 days from the date of publication of Corrigendum-03. No further clarifications regarding Corrigendum-03 or any earlier published corrigendum shall be entertained by the Tenderer after this period. In addition, no new queries shall be accepted beyond this 7-day window.

Sr. No. 5 — Corrigendum 2: 10. Minimum Technical Specification, Master Node (Page-24) & Inferencing Nodes (Page-27)
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: We request GIL to clarify what "(at both ends)" means here.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy.

Sr. No. 7 — Corrigendum 2: 10. Minimum Technical Specification, AI Training Node (Page-25) & Inferencing Nodes (Page-27)
Existing clause: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives.
Modification requested: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives OR M.2 960 GB NVMe drives.
Justification: The changes in Corrigendum 2 restrict OEM participation; this was allowed earlier in Corrigendum 1. We request GIL to relax the clause for wider OEM participation, without any impact on node performance, since operating-system drives are small and do not need to be mixed with performance/capacity drives and larger drives.
Response: Please refer to Corrigendum-03.

Sr. No. 10 — Corrigendum 2: 10. Minimum Technical Specification, Inference Node, Page-27
Existing clause: GPU Communication: 2 x Accelerators per node, each with minimum 140 GB or higher GPU memory per Accelerator.
Modification requested: We request GIL to open up this clause for wider OEM participation, as the current clause is restrictive in nature. Suggested options:
2 x Accelerators per node, each with minimum 140 GB or higher GPU memory
OR
4 x Accelerators per node, each with minimum 94 GB or higher GPU memory
OR
4 x Accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs to be offered across 6 GPU nodes for inferencing)
OR
8 x Accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs to be offered across 3 GPU nodes for inferencing)
Justification: We request GIL to consider the suggested changes, as not all leading server and GPU OEMs have the specified ratings and configurations available and benchmarked. This will also give GIL better options for rack space, power and cooling in the Data Center, without any compromise on performance.
Response: As per the RFP and the corrigenda published from time to time. However, the Bidder may propose 2 or more accelerator cards per node (in the Inference Node); the number of GPUs and the number of inference nodes may be changed accordingly, provided the total requirement of cores and memory is met as mentioned in the RFP and the corrigenda published from time to time.

Sr. No. 11 — Nodes for AI Training (Total Qty: 4 sets), Benchmarks, Page-27
Existing clause: DLRM-dcnv2 – 3.6 minutes or less on single node; GNN – 7.8 minutes or less on single node; ResNet – 12.1 minutes or less on single node; U-Net3D – 11.6 minutes or less on single node.
Modification requested: We request GIL to revise the following benchmarks, as available on https://mlcommons.org/: DLRM-dcnv2 – 3.75 minutes or less on single node; GNN – please clarify whether this refers to the RGAT benchmark; ResNet – 13.25 minutes or less on single node; U-Net3D – 12.42 minutes or less on single node.
Justification: We request this amendment for wider OEM participation in the bid.
Response: Please refer to the last Corrigendum-2 dated 15.03.2025. A tolerance of 25% is allowed on the benchmarks for the AI training model. If GNN is not available, RGAT is accepted instead of GNN.

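As an aside, the requested figures all sit inside the 25% tolerance stated in the response. A minimal, illustrative check, assuming the tolerance is applied to the single-node times quoted in this query (the calculation is not part of the RFP):

    # Assumed: 25% tolerance applied to the Corrigendum-2 single-node benchmark times.
    baseline = {"DLRM-dcnv2": 3.6, "ResNet": 12.1, "U-Net3D": 11.6}      # minutes
    requested = {"DLRM-dcnv2": 3.75, "ResNet": 13.25, "U-Net3D": 12.42}  # minutes

    for model, base in baseline.items():
        limit = base * 1.25                      # upper bound with 25% tolerance
        status = "within" if requested[model] <= limit else "outside"
        print(f"{model}: requested {requested[model]} min, allowed up to {limit:.2f} min -> {status} tolerance")
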
Sr. No. 12 — Corrigendum 2: 10. Minimum Technical Specification, Inference Node, Page-28
Existing clause: OS Support: The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. The quoted OS should be under Enterprise support from the OEM with premium or highest level of support.
Query: We request GIL to clarify which of these is to be quoted as part of the solution, for better clarity.
Response: Please refer to Corrigendum-03.

Sr. No. 13 — Corrigendum 2: 10. Minimum Technical Specification, Inference Node, Page-28
Existing clause: OS Support: Supply should include DC edition unlimited Guest OS licenses.
Query: We request GIL to define the user base, as the requirement of unlimited Guest OS licenses will force bidders to quote the highest level of subscription licensing, escalating the cost, which may not be utilized by GIL for years to come. Defining the user licenses will make the solution optimized.
Response: Please refer to Corrigendum-03.

Sr. No. 15 — Storage Nodes, External Storage, Page-30
Existing clause: Performance: Min 120 GBps Read and Min 60 GBps Write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future. IOPS: minimum 8,00,000. 1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs. 2. NVMe Storage offered must be certified with the proposed GPU OEM. Front-End Connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per proposed solution.
Modification requested: (As per Corrigendum 1) Performance: 28 GBps Read and 14 GBps Write from day one and scalable up to >100 GBps read/write combined, with a scale-out architecture and additional controllers/nodes in the future. OR Performance: Min 120 GBps Read and Min 60 GBps Write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Justification: We request GIL to allow an NFS solution, as all current data is on NFS; converting and migrating NFS to PFS will take a lot of time and will further delay the productivity of the AI/ML setup. Relaxing this restrictive clause will help wider OEM participation with established results and deployments of AI/ML solutions. Even NVIDIA does not have any bias towards PFS, as both perform well for AI/ML environments; the current specifications are OEM-specific and limit participation by leading OEMs. Please refer to the link shared again herewith for a detailed view, which clearly shows the NVIDIA SuperPOD recommendation as well. As per NVIDIA documentation for the DGX SuperPOD, available in "Storage Architecture — NVIDIA DGX SuperPOD: Next Generation Scalable Infrastructure for AI Leadership, Reference Architecture Featuring NVIDIA DGX H200", the storage performance is as under: Single SU* (256 GPUs) aggregate system read = 125 GBps; Single SU* (256 GPUs) aggregate system write = 62 GBps. *A single Scalable Unit (SU) in an NVIDIA DGX SuperPOD consists of 32 DGX systems, each with 8 H200 GPUs. Hence, for a 56-GPU system (4 training nodes with 8 GPUs per node, and 12 inference servers with 2 GPUs per node), the maximum throughput needed is: aggregate system read = 28 GBps; aggregate system write = 14 GBps.
Response: Please refer to Corrigendum-03.

Sr. No. 16 — Master Node, Page-24, Network Section
Existing clause: InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Modification requested: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: External Storage mentions "Front-End Connectivity: 200GbE or higher Ethernet connectivity"; the stated GBps is not the correct rating, as the HBAs/NICs available from leading OEMs are rated in Gbps.
Response: Please refer to Corrigendum-03.

Sr. No. 17 — Nodes for AI Training, Page-25, Network Section
Existing clause: d) Required InfiniBand / 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Modification requested: Required 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Justification: External Storage mentions "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 18 — Nodes for AI Training, Page-25, Network Section
Existing clause: f) Required switch with 64 non-blocking ports with aggregate data throughput up to 51.2 Tb/s and required compatible cables of appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode.
Modification requested: Required two switches, each with 64 non-blocking ports and a 1RU or 2RU form factor, with aggregate data throughput up to 51.2 Tb/s, and required compatible cables of appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode.
Justification: Chassis-based switches can be very power hungry; 1RU or 2RU form-factor switches consume around 2000 W depending on the number of transceivers used. Also, with 2 switches, redundancy can be built into the back-end GPU connectivity, so that failure of one switch still allows GPU-to-GPU traffic to flow at 400G.
Response (Clarification): The bidder shall have to deploy the required quantity of switches with the said functionality to complete the solution.

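As context for the 51.2 Tb/s figure in this clause, the arithmetic below shows how it follows from 64 ports at 400G, assuming the aggregate is counted bidirectionally (as switch vendors commonly quote it); this is an illustrative check, not part of the RFP.

    # Assumed: 64 x 400G ports (IB NDR / 400GbE), aggregate counted full-duplex.
    ports = 64
    port_speed_gbps = 400
    aggregate_tbps = ports * port_speed_gbps * 2 / 1000   # x2 for both directions
    print(aggregate_tbps)                                  # 51.2
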
Sr. No. 20 — Inference Node, Page-28, System Network Section
Existing clause: 4. InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Modification requested: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: External Storage mentions "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: Please refer to Corrigendum-03.

Sr. No. 21 — Networking Switch, Page-28
Existing clause: Min. Two or required Nos. of Switch with 48 x 10G SFP+ and 8 x 100G QSFP ports to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form Cluster Communication and Perimeter N/W. Switch must support MLAG/MCLAG feature.
Modification requested: Min. Four or higher Nos. of Switch with 48 x 10/25G SFP+ and 6 x 100G QSFP ports or higher to connect to the Core for all of “Master Node”, “Node for AI Training” and “Inference Node” to form Cluster Communication and Perimeter N/W. Switch must support MLAG/MCLAG and EVPN-Multihoming features.
Justification: Ideally, it is better to physically separate the perimeter traffic and the cluster orchestration traffic. Across all servers, the total perimeter ports are 46 and the orchestration node ports are 46; an additional layer of segregation can be done using VLAN, VxLAN and VRFs. Secondly, the 25G standard is becoming highly common in DC use cases; having 25G as a switch port option allows adding servers with 25G ports instead of buying new switches. Additionally, 6 x 100G is adequate for uplinks towards the Core/Spine switch. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response (Clarification): The bidder must deploy the required quantity of switches with the same or higher functionality to meet the solution requirements. The switch count shall be adjusted (increased/decreased) based on the actual port availability per device while maintaining the specified speed and functionality. The EVPN-Multihoming feature can be provided in case it is required to complete the solution.

Sr. No. 23 — Networking Switch, Page-28
Existing clause: Min. One or required Nos. of Switch with 32 x 100GbE QSFP ports, or one No. of Switch with 64 x 100GbE QSFP ports, to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form User N/W. Switch must support MLAG/MCLAG feature.
Modification requested: Min. Two or higher Nos. of Switch with 32 x 100GbE QSFP ports, or one No. of Switch with 64 x 100GbE QSFP ports, to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form User N/W. Switch must support MLAG/MCLAG and EVPN-Multihoming features.
Justification: A server usually has a 2 x 100G NIC, instead of 1. Moreover, single connectivity to a switch (instead of HA) is a single point of failure: user-traffic connectivity is lost if a switch, transceiver or fibre cable goes bad. Therefore, specifying two switches that can operate in HA is a good option to ensure all bidders comply with a minimal baseline. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response (Clarification): The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN-Multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. Increase/decrease the count of switches as per the ports available with the mentioned speed and functionality to complete the solution.

Sr. No. 24 — Corrigendum-2, SLA & Penalties (Operational Penalties), Page-23
Existing clause: Timeline for resolution is within 4 hours from the time the call is logged / reported to the Bidder/OEM.
Query: We request GIL to amend this to NBD (Next Business Day) resolution for hardware-related issues during working days, as local sparing of such hardware is not feasible for OEMs.
Response: Please refer to Corrigendum-03.

Sr. No. 25
Existing clause: Certification: An undertaking from the Server OEM for compatibility of the proposed server with the GPU under the quoted Inference node must be submitted (duly signed by an authorized signatory, mentioning the Bid reference).
Query: We would like to draw GIL's attention to these clauses, which have been relaxed in Corrigendum 2 and will allow unproven systems to be quoted — systems that are neither tested nor certified by GPU OEMs, nor benchmarked with audited results submitted to reputed reference sites such as MLCommons.org, which is referred to and maintained by leading OEMs. This relaxation risks unproven infrastructure being proposed, resulting in undesired, uncertain results. There are OEMs who would simply extrapolate and submit undertakings, putting the complete bid at risk.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 28 — Corrigendum 2: 10. Minimum Technical Specification, Master Node (Page-24) & Inferencing Nodes (Page-27)
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: We request GIL to clarify what "(at both ends)" means here. Which existing SAN switch do we need to integrate with? Kindly share the make/model number of the SAN switch.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy.

Sr. No. 29 — Corrigendum 2: 10. Minimum Technical Specification, Nodes for AI Training, Page-25
Existing clause: Network: a) Infiniband / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Modification requested: Network: a) Infiniband / Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: Network cards, whether InfiniBand or Ethernet, are rated in Gbps and NOT GBps; the same can be noted in the Training nodes, where Gbps is specified. We request GIL to amend appropriately; refer to https://www.nvidia.com/content/dam/en-zz/Solutions/networking/ethernet-adapters/connectx-7-datasheet-Final.pdf
Response: Please refer to Corrigendum-03.

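To illustrate the unit point raised in this query (a hedged, illustrative calculation only): adapters are rated in gigabits per second, so a figure written in GBps overstates the line rate by a factor of eight.

    # Assumed: a 200 Gb/s adapter, as referenced in the training-node clause.
    link_rate_gbps = 200
    raw_gigabytes_per_s = link_rate_gbps / 8     # bits -> bytes
    print(raw_gigabytes_per_s)                   # 25.0 GB/s raw line rate, before protocol overhead
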
Sr. No. 30 — Corrigendum 2: 10. Minimum Technical Specification, AI Training Node (Page-25) & Inferencing Nodes (Page-27)
Existing clause: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives.
Modification requested: Internal Storage: For Operating System: minimum 1.92 TB NVMe drives OR M.2 960 GB NVMe drives.
Justification: The changes in Corrigendum 2 restrict OEM participation; this was allowed earlier in Corrigendum 1. We request GIL to relax the clause for wider OEM participation, without any impact on node performance, since operating-system drives are small and do not need to be mixed with performance/capacity drives and larger drives.
Response: Please refer to Corrigendum-03.

Sr. No. 32 — Corrigendum 2: 10. Minimum Technical Specification, Inferencing Nodes, Page-27
Existing clause: Power Requirement: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and FAN.
Query: We request GIL to define N+N; with N=1, this would mean an N+1 redundant hot-swappable power supply is to be offered.
Response: Please refer to Corrigendum-03.

Sr. No. 33 — Corrigendum 2: 10. Minimum Technical Specification, Inference Node, Page-27
Existing clause: GPU Communication: 2 x Accelerators per node, each with minimum 140 GB or higher GPU memory per Accelerator.
Modification requested: We request GIL to open up this clause for wider OEM participation, as the current clause is restrictive in nature. Suggested options:
2 x Accelerators per node, each with minimum 140 GB or higher GPU memory
OR
4 x Accelerators per node, each with minimum 94 GB or higher GPU memory
OR
4 x Accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs to be offered across 6 GPU nodes for inferencing)
OR
8 x Accelerators per node, each with minimum 140 GB or higher GPU memory (total 24 GPUs to be offered across 3 GPU nodes for inferencing)
Justification: We request GIL to consider the suggested changes, as not all leading server and GPU OEMs have the specified ratings and configurations available and benchmarked. This will also give GIL better options for rack space, power and cooling in the Data Center, without any compromise on performance.
Response: As per the RFP and the corrigenda published from time to time. However, the Bidder may propose 2 or more accelerator cards per node (in the Inference Node); the number of GPUs and the number of inference nodes may be changed accordingly, provided the total requirement of cores and memory is met as mentioned in the RFP and the corrigenda published from time to time.

Sr. No. 35 — Corrigendum 2: 10. Minimum Technical Specification, Inference Node, Page-28
Existing clause: OS Support: The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. The quoted OS should be under Enterprise support from the OEM with premium or highest level of support.
Query: We request GIL to clarify which of these is to be quoted as part of the solution, for better clarity.
Response: Please refer to Corrigendum-03.

Sr. No. 36 — Corrigendum 2: 10. Minimum Technical Specification, Inference Node, Page-28
Existing clause: OS Support: Supply should include DC edition unlimited Guest OS licenses.
Query: We request GIL to define the user base, as the requirement of unlimited Guest OS licenses will force bidders to quote the highest level of subscription licensing, escalating the cost, which may not be utilized by GIL for years to come. Defining the user licenses will make the solution optimized.
Response: Please refer to Corrigendum-03.

Sr. No. 37 — Storage Nodes, External Storage, Page-30
Existing clause: Performance: Min 120 GBps Read and Min 60 GBps Write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future. IOPS: minimum 8,00,000. 1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs. 2. NVMe Storage offered must be certified with the proposed GPU OEM. Front-End Connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per proposed solution.
Modification requested: (As per Corrigendum 1) Performance: 28 GBps Read and 14 GBps Write from day one and scalable up to >100 GBps read/write combined, with a scale-out architecture and additional controllers/nodes in the future. OR Performance: Min 120 GBps Read and Min 60 GBps Write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Justification: We request GIL to allow an NFS solution, as all current data is on NFS; converting and migrating NFS to PFS will take a lot of time and will further delay the productivity of the AI/ML setup. Relaxing this restrictive clause will help wider OEM participation with established results and deployments of AI/ML solutions. Even NVIDIA does not have any bias towards PFS, as both perform well for AI/ML environments; the current specifications are OEM-specific and limit participation by leading OEMs. Please refer to the link shared again herewith for a detailed view, which clearly shows the NVIDIA SuperPOD recommendation as well. As per NVIDIA documentation for the DGX SuperPOD, available in "Storage Architecture — NVIDIA DGX SuperPOD: Next Generation Scalable Infrastructure for AI Leadership, Reference Architecture Featuring NVIDIA DGX H200", the storage performance is as under: Single SU* (256 GPUs) aggregate system read = 125 GBps; Single SU* (256 GPUs) aggregate system write = 62 GBps. *A single Scalable Unit (SU) in an NVIDIA DGX SuperPOD consists of 32 DGX systems, each with 8 H200 GPUs. Hence, for a 56-GPU system (4 training nodes with 8 GPUs per node, and 12 inference servers with 2 GPUs per node), the maximum throughput needed is: aggregate system read = 28 GBps; aggregate system write = 14 GBps.
Response: Please refer to Corrigendum-03.

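For reference, the scaling step in the justification above can be reproduced directly from the quoted SuperPOD figures. A minimal sketch, assuming the bidder's own tally of 56 GPUs (4 x 8 training GPUs plus 12 x 2 inference GPUs) and linear scaling of the published per-SU numbers:

    # Figures as quoted in the query (NVIDIA DGX SuperPOD reference architecture).
    su_gpus, su_read_gbps, su_write_gbps = 256, 125, 62
    cluster_gpus = 4 * 8 + 12 * 2                          # 56 GPUs in the proposed setup
    read_needed = su_read_gbps * cluster_gpus / su_gpus
    write_needed = su_write_gbps * cluster_gpus / su_gpus
    print(round(read_needed, 1), round(write_needed, 1))   # ~27.3 / ~13.6 GBps, i.e. roughly the 28 / 14 quoted
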
Sr. No. 38 — Master Node, Page-24, Network Section
Existing clause: InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Modification requested: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: External Storage mentions "Front-End Connectivity: 200GbE or higher Ethernet connectivity"; the stated GBps is not the correct rating, as the HBAs/NICs available from leading OEMs are rated in Gbps.
Response: Please refer to Corrigendum-03.

Sr. No. 39 — Nodes for AI Training, Page-25, Network Section
Existing clause: d) Required InfiniBand / 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Modification requested: Required 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node.
Justification: External Storage mentions "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 40 — Nodes for AI Training, Page-25, Network Section
Existing clause: f) Required switch with 64 non-blocking ports with aggregate data throughput up to 51.2 Tb/s and required compatible cables of appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode.
Modification requested: Required two switches, each with 64 non-blocking ports and a 1RU or 2RU form factor, with aggregate data throughput up to 51.2 Tb/s, and required compatible cables of appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode.
Justification: Chassis-based switches can be very power hungry; 1RU or 2RU form-factor switches consume around 2000 W depending on the number of transceivers used. Also, with 2 switches, redundancy can be built into the back-end GPU connectivity, so that failure of one switch still allows GPU-to-GPU traffic to flow at 400G.
Response (Clarification): The bidder must deploy the required quantity of switches with the same or higher functionality to meet the solution requirements. The switch count shall be adjusted (increased/decreased) based on the actual port availability per device while maintaining the specified speed and functionality.

Sr. No. 42 — Inference Node, Page-28, System Network Section
Existing clause: 4. InfiniBand / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes.
Modification requested: Ethernet (200Gbps or higher) as required for quoted storage delivery to nodes.
Justification: External Storage mentions "Front-End Connectivity: 200GbE or higher Ethernet connectivity".
Response: Please refer to Corrigendum-03.

Sr. No. 43 — Networking Switch, Page-28
Existing clause: Min. Two or required Nos. of Switch with 48 x 10G SFP+ and 8 x 100G QSFP ports to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form Cluster Communication and Perimeter N/W. Switch must support MLAG/MCLAG feature.
Modification requested: Min. Four or higher Nos. of Switch with 48 x 10/25G SFP+ and 6 x 100G QSFP ports or higher to connect to the Core for all of “Master Node”, “Node for AI Training” and “Inference Node” to form Cluster Communication and Perimeter N/W. Switch must support MLAG/MCLAG and EVPN-Multihoming features.
Justification: Ideally, it is better to physically separate the perimeter traffic and the cluster orchestration traffic. Across all servers, the total perimeter ports are 46 and the orchestration node ports are 46; an additional layer of segregation can be done using VLAN, VxLAN and VRFs. Secondly, the 25G standard is becoming highly common in DC use cases; having 25G as a switch port option allows adding servers with 25G ports instead of buying new switches. Additionally, 6 x 100G is adequate for uplinks towards the Core/Spine switch. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response (Clarification): The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN-Multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

Sr. No. 44 — Networking Switch, Page-28
Existing clause: Min. One or required Nos. of Switch with 48 x 1G RJ45, 4 x 25G SFP28 and 2 x 100G QSFP ports to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form In-band and BMC/Out-of-band Management N/W.
Modification requested: Min. Two or higher Nos. of Switch with 48 x 1G RJ45, 4 x 10G SFP28 and 2 x 100G QSFP ports to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form In-band and BMC/Out-of-band Management N/W.
Justification: There are a total of 23 servers (Master, Training and Inferencing). In addition, there will be other devices such as switches, firewalls, storage and load balancers. These devices will be spread across multiple racks, considering the power constraint of each rack and the limit on the number of devices it can hold. Having one OOB switch can become an issue with respect to cable laying across the racks, including the distances involved. Therefore, to ensure that all bidders can follow appropriate DC design standards, a minimum of two OOB switches will be helpful / necessary.
Response (Clarification): The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN-Multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

Sr. No. 45 — Networking Switch, Page-28
Existing clause: Min. One or required Nos. of Switch with 32 x 100GbE QSFP ports, or one No. of Switch with 64 x 100GbE QSFP ports, to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form User N/W. Switch must support MLAG/MCLAG feature.
Modification requested: Min. Two or higher Nos. of Switch with 32 x 100GbE QSFP ports, or one No. of Switch with 64 x 100GbE QSFP ports, to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form User N/W. Switch must support MLAG/MCLAG and EVPN-Multihoming features.
Justification: A server usually has a 2 x 100G NIC, instead of 1. Moreover, single connectivity to a switch (instead of HA) is a single point of failure: user-traffic connectivity is lost if a switch, transceiver or fibre cable goes bad. Therefore, specifying two switches that can operate in HA is a good option to ensure all bidders comply with a minimal baseline. Multihoming allows the switches to form a setup similar to MC-LAG without needing to directly interconnect them.
Response (Clarification): The bidder shall have to deploy the required quantity of switches with the said or higher functionality to complete the solution. EVPN-Multihoming is not mandatory for the solution; the feature can be provided in case it is required to complete the solution. The switch count may be adjusted (increased/decreased) based on actual device port availability, provided the specified speed and functionality are maintained.

Sr. No. 46
Existing clause: Supply of the Hardware including Licenses and OEM Warranty Certificate: T1 = T+60 days from the date of issuance of contract over GEM.
Modification requested: T1 = T+90 days from the date of issuance of contract over GEM.
Justification: OEM delivery timelines for hardware take a minimum of 3 months; hence we request the authority to kindly look into this and amend the clause.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 47
Existing clause: POC to meet the benchmark as mentioned in this RFP document.
Query: We request the Authority to kindly provide the POC evaluation criteria.
Response: Please refer to the RFP and the corrigenda published from time to time.

Sr. No. 48
Existing clause: After successful POC, i.e. successful demonstration of the benchmark as mentioned in the RFP.
Query: We request the Authorities to kindly elaborate the detailed benchmark criteria for a successful demo, and also to provide the scripts / tests to perform the benchmark testing.
Response (Clarification): Will be shared at the time of the POC.

Sr. No. 49 — Submission Date Extension
Query: We request the authority to kindly grant an extension of the bid submission date of 20 working days from the date of publication of the query responses.
Response: Please refer to Corrigendum-03.

Sr. No. 50 — Require Site Visit
Query: We request the Authority to kindly provide approval for a site survey covering:
- study of the possibilities of integration
- understanding of integration / networking with the existing infrastructure
- overall power requirement against needs
Response: Please refer to the RFP and the corrigenda published from time to time. Bidder can si…

Sr. No. 51 — Page 24
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: Clarification sought on the number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy.

Sr. No. 52 — Page 26
Existing clause: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and FAN.
Modification requested: Change to: Appropriately rated and energy-efficient, redundant (N+1) or better hot-swappable power supply.
Justification: This is in line with point 9 of page 18, where N+1 redundant power supplies are mentioned. Dense GPU servers typically come with N+1 redundancy, and they do not come with hot-swappable fans, hence the request for change.
Response: Please refer to Corrigendum-03.

Sr. No. 53 — Page 27
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: Clarification sought on the number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy.

Sr. No. 61
Existing clause: The solution should be PFS (Parallel File System) based and delivered with 1PB (All NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
Modification requested: The solution should be PFS (Parallel File System) or NFS over RDMA based and delivered with 1PB (All NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
Justification: NFS over RDMA is a superior choice to PFS for AI/ML workloads using GPUDirect due to its lower latency, higher throughput and improved scalability. By leveraging RDMA's direct memory-to-memory transfer, NFS over RDMA reduces latency and overhead, making it ideal for the massive data transfers required in AI/ML workloads. Additionally, NFS is a standardized protocol, simplifying management and integration with existing infrastructure. In contrast, PFS, while designed for high-performance computing, can be complex to set up and manage, and may not scale as well as NFS over RDMA. When combined with GPUDirect, NFS over RDMA enables faster data transfer, reduced latency and improved overall performance by offloading data-transfer tasks from the CPU, allowing it to focus on compute-intensive tasks.
Response: Please refer to Corrigendum-03.

Sr. No. 63
Existing clause: 1PB (NVMe) usable post RAID 6 or better configuration.
Modification requested: 1PB (NVMe TLC drives) usable post RAID 6 or better configuration.
Justification: TLC NVMe drives are well suited for AI/ML workloads due to their high capacity, lower cost, improved performance and increased endurance, offering a better balance of performance and capacity while significantly outperforming traditional HDDs and SATA SSDs. This makes them a practical choice for AI/ML applications that require rapid data access, large datasets and frequent data writes.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 65
Existing clause: Performance: Min 120 GBps Read and Min 60 GBps Write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Modification requested: Performance: Min 40 GBps Read and Min 15 GBps Write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future.
Justification: According to the validated architecture jointly developed by NVIDIA and NetApp, a read performance of 45 GBps is more than sufficient to support 128 GPUs. Given that the current requirement is for 32 GPUs, the proposed solution will provide ample headroom for future expansion, easily supporting up to 64 GPUs without compromising performance. This ensures a scalable and future-proof infrastructure that can grow with evolving needs. Please refer to: https://docs.netapp.com/us-en/netapp-solutions/ai/aipod_nv_validation_sizing.html#solution-validation
Response: Please refer to Corrigendum-03.

Sr. No. 66
Existing clause: IOPS: minimum 8,00,000.
Modification requested: IOPS: minimum 8,00,000.
Justification: Throughput is a more critical metric than IOPS for AI/ML workloads, as these workloads typically involve processing large datasets with sequential data-access patterns, requiring high-throughput storage that can transfer data quickly; IOPS measures small I/O operations, which is less relevant for AI/ML workloads that prioritize high-speed data transfer.
Response: Please refer to Corrigendum-03.

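To make the IOPS-versus-throughput distinction in this justification concrete, the illustrative arithmetic below shows how the same IOPS figure maps to very different throughputs depending on I/O size; the I/O sizes are assumptions for illustration only.

    # Assumed example I/O sizes; the 8,00,000 figure is the RFP's IOPS minimum.
    iops = 800_000
    for io_size_kb in (4, 128):
        throughput_gb_s = iops * io_size_kb / (1024 * 1024)   # KB -> GB per second
        print(f"{iops:,} IOPS at {io_size_kb} KB per I/O ~= {throughput_gb_s:.1f} GB/s")
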
Sr. No. 67 — Existing clause: 1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs. Response: Query not clear.
Sr. No. 68 — Existing clause: 2. NVMe Storage offered must be certified with the proposed GPU OEM. Response: Query not clear.
Sr. No. 69 — Response: Query not clear.
Sr. No. 70 — Existing clause: Front-End Connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per proposed solution. Response: Query not clear.

Sr. No. 71 — Additional points
Proposed new clause: Security: 1. The offered storage solution must offer tamperproof snapshots for the data, with the capability to automatically create snapshots and expire them by defining a retention period; a minimum of 1000 snapshots must be supported. 2. The offered storage solution must support a native or add-on solution to identify ransomware attacks, take autonomous actions to protect the data from ransomware attacks, report the attack to administrators, and offer recovery capabilities to the administrators.
Justification: Including tamperproof snapshots and ransomware protection in the storage specifications for AI/ML workloads is crucial to ensure data integrity, version control, compliance and rapid recovery. AI/ML workloads are data-intensive and require high-performance storage solutions, making it essential to protect data from unauthorized modifications, deletions and ransomware attacks. Tamperproof snapshots provide a reliable way to track changes and maintain version control, while ransomware protection ensures real-time detection and prevention of attacks, enabling rapid recovery and minimizing downtime. Furthermore, these features are critical for regulated industries, such as healthcare and finance, where strict data protection and retention policies are mandatory.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 72 — Page 24
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: Clarification sought on the number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy.

Sr. No. 73 — Page 26
Existing clause: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and FAN.
Modification requested: Change to: Appropriately rated and energy-efficient, redundant (N+1) or better hot-swappable power supply.
Justification: This is in line with point 9 of page 18, where N+1 redundant power supplies are mentioned. Dense GPU servers typically come with N+1 redundancy, and they do not come with hot-swappable fans, hence the request for change.
Response: Please refer to Corrigendum-03.

Sr. No. 74 — Page 27
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: Clarification sought on the number of 32G FC ports required per server, and whether card-level redundancy is required or not.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy.

Sr. No. 75 — Page 25
Existing clause: c) Minimum 1 no. of 1 GbE port and 2 nos. of 10 GbE or higher (Fiber/Copper) ports.
Modification requested: Change to: c) Minimum 1 no. of 1 GbE port and 4 nos. of 10 GbE or higher (Fiber/Copper) ports.
Justification: Wider participation, and better performance and redundancy.
Response (Clarification): The bidder can quote the product on the higher side, meeting the requirements to complete the solution.

Sr. No. 78 — Page 30
Existing clause: IOPS: minimum 8,00,000.
Modification requested: IOPS: minimum 8,00,000 Read.
Justification: AI storage systems require high read performance, hence the suggested change.
Response: Please refer to Corrigendum-03.

Sr. No. 79 — Page 30
Existing clause: NVMe Storage offered must be certified with the proposed GPU OEM.
Modification requested: NVMe Storage offered must be certified/compatible with the proposed GPU Server OEM.
Justification: To ensure a wider choice of storage solutions. Kindly approve.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 80 — Page 27
Existing clauses: Page 26 - Scalability, Cluster and Management Hardware and software; Page 27 and 29 - Cluster Management & Scheduler and hardware.
Query: Page 26 already mentions "Scalability, Cluster and Management Hardware and software". Kindly remove the clause "Cluster Management & Scheduler and hardware" for the Training and Inference nodes, since the features asked for are proprietary.
Justification: Having only the requirement of "Scalability, Cluster and Management Hardware and software" will help in providing a uniform cluster tool.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 81 — Clause No. 5 under "Eligibility Conditions"
Existing clause: The OEM should have executed a similar GPU setup for a minimum of 3 clients in the last 5 years in India as on the date of bid submission, out of which one client deployment should be one project having similar works of total value of INR 125 Cr.
Query: Kindly note that the above-mentioned clause is a clear violation of Office Memorandum P-45014/33/2021-BE-II (E-64737) dated 20th December 2022 & P-45021/121/2018-(B.E.-II) dated 20th June 2019 issued by DPIIT, which clearly cites "common examples of restrictive and discriminatory conditions against the local suppliers"; Sub-clause 'e' of 'Clause 1' in ANNEXURE-A expressly states: "Excessive past experience requirement, not commensurate with the proven experience expected from Bidder for successful execution of contract". In addition, we would like to highlight that the above pre-qualification condition deviates from the GFR rules. General Financial Rules (GFR), 2017, Clause b, i.e. Particular Construction Experience and Key Production Rates, of sub-clause 2(iii), i.e. Pre-qualification Criteria, on Page Nos. 33 and 34 under Chapter 3 of the Manual for Procurement of Works 2022 issued by DoE, states that: "The applicant should have: 1. successfully completed or substantially completed similar works during the last seven years ending on the last day of the month previous to the one in which applications are invited, being either of the following: 1.1 Three similar completed works costing not less than the amount equal to 40 (forty) percent of the estimated cost; or 1.2 Two similar completed works costing not less than the amount equal to 50 (fifty) percent of the estimated cost; or 1.3 One similar completed work costing not less than the amount equal to 80 (eighty) percent of the estimated cost." In view of the above, it is pertinent to mention that Clause No. 5 of the Eligibility Conditions takes away the opportunity to participate from potential OEMs who have strong experience in deploying GPU clusters, which will limit competition. We therefore request you to please modify the clause in order to avoid restrictive participation and to provide a fair opportunity to all.
Response: Please refer to Corrigendum-03.

Sr. No. 82 — Clause No. 5 under "Eligibility Conditions"
Existing clause: The OEM should have executed a similar GPU setup for a minimum of 3 clients in the last 5 years in India as on the date of bid submission, out of which one client deployment should be one project having similar works of total value of INR 125 Cr. Note: Similar works means SITC OF GPU ACCELERATED with multiple GPU Nodes.
Query: It may please be noted that Clause No. 5 under "Eligibility Conditions", related to the pre-qualification criteria for participating OEMs, appears to be contrary to the General Financial Rules (GFR), 2017. The relevant clause, under Particular Construction Experience and Key Production Rates, states that the applicant must have: 1. Successfully completed or substantially completed similar works during the last seven years ending on the last day of the month previous to the one in which applications are invited, in either of the following ways: 1.1 Three similar completed works costing not less than 40% of the estimated cost; or 1.2 Two similar completed works costing not less than 50% of the estimated cost; or 1.3 One similar completed work costing not less than 80% of the estimated cost. In light of the above, we respectfully request that Clause No. 5 be reviewed and suitably amended in alignment with the GFR, 2017, to provide a fair and inclusive opportunity for all eligible participants. Furthermore, since the proposed solution involves multiple vendors, we kindly request an extension of the bid submission deadline by at least 7 additional days to accommodate the coordination and compliance requirements effectively.
Response: Please refer to Corrigendum-03.

Sr. No. 89 — Page 24
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: Clarification sought on the number of 32G FC ports required per server, and whether card-level redundancy is required or not. Also, regarding the existing FC ports in the DC, it is mentioned that 16G SFPs are available; however, as per the RFP, 32G SFPs are asked for at both ends. Kindly confirm whether we can use 16G SFPs, or whether it is mandatory to use 32G SFPs at both ends.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make & model of the existing SAN switches and storage.

Sr. No. 90 — Page 17, SOW: Bidder has to deploy the proposed solution for inference and AI training model
Query: We assume that we need to set up the required infrastructure at the GIL site, but that all AI/ML workloads and use cases will be the responsibility of GIL; the bidder has no role to play in them once the underlying infrastructure is ready.
Response (Clarification): Please refer to the manpower clause of the RFP and the corrigenda published from time to time.

Sr. No. 91 — Page 26
Existing clause: Appropriately rated and energy-efficient, redundant (N+N) hot-swappable power supply and FAN.
Modification requested: Change to: Appropriately rated and energy-efficient, redundant (N+1) or better hot-swappable power supply.
Justification: This is in line with point 9 of page 18, where N+1 redundant power supplies are mentioned. Dense GPU servers typically come with N+1 redundancy, and they do not come with hot-swappable fans, hence the request for change.
Response: Please refer to Corrigendum-03.

Sr. No. 92 — Page 27
Existing clause: 32 Gbps Host Bus Adaptor with required SFP (at both ends) for connecting with existing SAN switch and storage.
Query: Clarification sought on the number of 32G FC ports required per server, and whether card-level redundancy is required or not. Also, regarding the existing FC ports in the DC, it is mentioned that 16G SFPs are available; however, as per the RFP, 32G SFPs are asked for at both ends. Kindly confirm whether we can use 16G SFPs, or whether it is mandatory to use 32G SFPs at both ends.
Response (Clarification): "Both ends" in the clause refers to the 32 Gbps SFPs required for the existing SAN switch at GSDC and the SFPs (if required) for the supplied servers, maintaining card-level redundancy. Please refer to the corrigendum for the make & model of the existing SAN switches and storage.

Sr. No. 93 — Page 17
Existing clause: The proposed solution should support sharing of GPU across multiple virtual environments and containers.
Query: Kindly clarify the points below: 1. Which hypervisor will be used for VM-based workloads? 2. Do we need to provision hypervisor licenses, or will GIL provide the same? 3. Will containers and VMs co-exist on the same hardware?
Response (Clarification): The proposed solution should support GPU virtualization, enabling efficient sharing of GPU resources across multiple virtual machines and containerized environments, and should be compatible with industry-standard hypervisors and container orchestration platforms, supporting vGPU or GPU passthrough mechanisms. The solution should enable shared GPU resources across multiple VMs and containerized applications instead of dedicating them to a single instance, allow simultaneous access for multiple workloads, support vGPU or GPU passthrough for fractional GPU allocation, and be compatible with industry-standard hypervisors and container platforms such as VMware vSphere, Microsoft Hyper-V, KVM, Docker, and Kubernetes.

Sr. No. 94 — New clause to be incorporated
Proposed new clause: Server, Storage & Switch OEMs must have had a local service support depot in Gujarat, preferably in Gandhinagar/Ahmedabad, for at least the last 5 years as on the RFP release date.
Justification: Onsite replacement of faulty hardware and skills support directly from the hardware OEM are of utmost importance in solution-led bids, where uptime and SLAs are paramount and on-site OEM skills and support are mandatory and important.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 95 — New clause to be incorporated
Proposed new clause: All OEMs (Hardware & Software) must be companies registered in India under the Companies Act 1956 / Companies Act 2013.
Justification: Both Bidder and OEM should be mandatorily registered under the Indian Companies Act 1956 / 2013, so that Indian laws apply to these entities and they are accountable under the Indian judicial system.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 100 — External Storage, Minimum Technical Specifications, Storage Nodes
Existing clause: IOPS: minimum 8,00,000. NVMe Storage offered must be certified with the proposed GPU OEM.
Modification requested (in case PFS (Parallel File System) is asked): IOPS: minimum 8,00,000. NVMe Storage offered must be either certified with the proposed GPU OEM or self-certified by the Storage OEM for the proposed GPU.
Justification: Self-certification of the PFS by the Storage OEM for the proposed GPU will suffice for the interoperability and performance requirements of GIL.
Response: Please refer to Corrigendum-03.

Sr. No. 101 — External Storage, Minimum Technical Specifications, Storage Nodes
Existing clause: IOPS: minimum 8,00,000.
Modification requested (in case NFS-based storage is asked): IOPS: minimum 8,00,000 READ.
Justification: There are no references to IOPS in the NVIDIA SuperPOD documents, and AI NFS storage will have a very high number of READ IOPS; hence the IOPS requirement should be stated as READ IOPS.
Response: Please refer to Corrigendum-03.

Sr. No. 102 — External Storage, Minimum Technical Specifications, Storage Nodes
Existing clause (in case NFS-based storage is asked): The proposed storage array should be configured with no single point of failure, including controllers (at least 3 controllers per disk tier), cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers / nodes.
Modification requested: The proposed storage array should be configured with no single point of failure, including controllers (at least 2 controllers per disk tier), cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers / nodes.
Justification: Two controllers are the industry standard; please keep it at 2.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 103 — External Storage, Minimum Technical Specifications, Storage Nodes
Existing clause (in case NFS-based storage is asked): Performance: 20 GBps Read/Write from day one and scalable up to 60 GBps with a scale-out architecture and additional controllers/nodes in the future.
Modification requested: Performance: 40 GBps 100% Read from day one and scalable up to 80 GBps 100% Read with a scale-out architecture and additional controllers/nodes in the future. Data Availability: 99.9999% data availability guarantee on the proposed storage model, duly certified by the Storage OEM.
Justification: Performance of NFS storage is better measured on read throughput, and these numbers are available with all storage OEMs. A data availability guarantee of six nines (99.9999%) is practically a must for this critical infrastructure.
Response: Please refer to Corrigendum-03.

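For context on the "six nines" figure proposed in this query, the quick arithmetic below (illustrative only) shows the downtime budget it implies.

    # 99.9999% availability expressed as allowed downtime per year.
    availability = 0.999999
    seconds_per_year = 365 * 24 * 3600
    downtime_seconds = (1 - availability) * seconds_per_year
    print(round(downtime_seconds, 1))   # ~31.5 seconds of downtime per year
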
Sr. No. 104 — External Storage, Minimum Technical Specifications, Storage Nodes
Proposed new clause (in case NFS-based storage is asked): Vendor shall ensure that concurrent failure of at least 4 disks can be handled without any kind of downtime, and vendor shall configure the erasure code accordingly.
Justification: Required feature for better resilience and performance.
Response: As per the RFP and the corrigenda published from time to time.

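As background to the erasure-code wording in this suggestion, the sketch below relates fault tolerance to capacity overhead; the k+m splits shown are assumptions for illustration, not values from the RFP.

    # With k data + m parity fragments, any m fragments can be lost,
    # so surviving 4 concurrent disk failures needs m >= 4.
    def usable_fraction(k: int, m: int) -> float:
        return k / (k + m)

    for k, m in ((8, 4), (16, 4)):   # example layouts only
        print(f"EC {k}+{m}: tolerates {m} failures, usable capacity {usable_fraction(k, m):.0%}")
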
Sr. No. 105 — External Storage, Minimum Technical Specifications, Storage Nodes; new clause to be incorporated (in case NFS-based storage is asked)
Proposed new clause: 1. The offered storage system shall be able to create native immutable snapshots for the offered solution. 2. The offered storage system shall have the capability to create immutable snapshot copies at both the primary and DR locations through a replication engine, and shall provide the flexibility of having a different retention period for each location. 3. After the expiration period is defined, it shall not be possible to reduce the expiration time; however, if a business need arises, the expiration period shall be shortened only through dual authorization and through a different set of authorized users.
Justification: Required feature for giving better protection to the storage.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 106 — External Storage, Minimum Technical Specifications, Storage Nodes; new clause to be incorporated (in case NFS-based storage is asked)
Proposed new clause: 1. Each offered file-services front-end controller shall have a minimum of 256 GB memory and a minimum of 32 CPU cores. 2. Each front-end controller shall also be offered with 2 x 100 Gbps Ethernet front-end ports and shall also have 2 x 100 Gbps back-end ports for disk connectivity. 3. Every front-end controller shall have dual physical CPUs.
Justification: Minimum hardware to be proposed so that every bidder proposes enough resources for performance.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 107 — External Storage, Minimum Technical Specifications, Storage Nodes; new clause to be incorporated (in case NFS-based storage is asked)
Proposed new clause: 1. The offered storage platform shall support the NFS nconnect feature for increasing NFS performance; it shall allow at least 16 TCP connections between each client and the storage platform. 2. The offered storage platform shall support NFS over RDMA and multipathing for increasing NFS performance when connecting clients to the storage system. 3. Multipathing shall be able to work in conjunction with nconnect and NFS over RDMA. 4. The offered storage platform shall also support byte-range file locking for both NFS v3.x and NFS v4.1.
Justification: Some advanced NFS features.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 108 — External Storage, Minimum Technical Specifications, Storage Nodes; new clause to be incorporated (in case NFS-based storage is asked)
Proposed new clause: 1. The offered storage system shall provide disaster-recovery functionality by replicating the required path or directory to a DR or peer location. 2. The offered storage system shall ensure that the data path between the primary and DR locations is encrypted; the vendor shall offer the required software / licenses or hardware to achieve this functionality. 3. The offered storage system shall support one-to-many and many-to-one replication, so that one site can replicate to more than one DR site or replication peer, and multiple primary sites can replicate to a single DR location.
Justification: Disaster-recovery capabilities.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 113 — Hypervisor
Proposed clause: 1. The Hypervisor should support integration with AI solutions to help build, deploy and manage AI workloads, leveraging the benefits of virtualization and containerization; the Hypervisor should be supported and certified by the AI solution. 2. The solution should include hypervisor management tools that simplify the deployment, management and scaling of AI workloads, reducing operational complexity. 3. The Hypervisor should support GPU acceleration, enabling efficient utilization of GPU resources for AI training and inference and ensuring high performance and scalability. 4. The solution should provide the capability of generating reports for GPU usage, performance, compliance, health, forecasting and capacity across AI workloads. 5. It should support HA for migration of VMs: in case one server fails, all the virtual machines running on that server shall be able to migrate to another physical server running the same virtualization software; it should support HA for VMs with a passthrough PCIe device or NVIDIA / other vGPUs.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 114 — Eligibility Conditions
Existing clause: The OEM should have executed a similar GPU setup for a minimum of 3 clients in the last 5 years in India as on the date of bid submission, out of which one client deployment should be one project having similar works of total value of INR 125 Cr. Note: Similar works means SITC OF GPU ACCELERATED with multiple GPU Nodes.
Modification requested: The OEM should have executed a similar GPU setup for a minimum of 3 clients in the last 5 years in India as on the date of bid submission, out of which one client deployment should be one project having similar works of total value of INR 20 Cr., or a total of INR 50 Cr. from 2 customers. Note: Similar works means SITC of a project which includes GPU ACCELERATED with multiple GPU Nodes.
Justification: This is required for wider participation and considering the budgeting of this project.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 115 — Master Node, Storage Controller
Existing clause: Hardware RAID 0, 1, 5, 6, 10, 50, 60 with 4GB cache; a flash-based cache protection module should be included; should support Gen 5.0 PCIe.
Modification requested: Hardware RAID 0, 1, 5, 6, 10, 50, 60 with 4GB cache; a flash-based cache protection module should be included; should support Gen 4/5.0 NVMe PCIe.
Justification: Every OEM has a different architecture, so please revise as requested for wider OEM participation, as the current clause is restrictive.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 116 — Master Node, Network
Existing clause: Following N/W are required: a) Infiniband / Ethernet (200GBps or higher) as required for quoted storage delivery to nodes; b) Ethernet (100Gbps or higher) for User delivery; c) Ethernet (10GbE or higher) for cluster orchestration; d) Ethernet (10GbE or higher) for perimeter connectivity; e) Ethernet (1GbE or higher) for in-band management.
Modification requested: Following N/W are required: a) Ethernet (200GBps or higher) as required for quoted storage delivery to nodes; b) Ethernet (100Gbps or higher) for User delivery; c) Ethernet (10GbE or higher) for cluster orchestration; d) Ethernet (10GbE or higher) for perimeter connectivity; e) Ethernet (1GbE or higher) for in-band / OOB management.
Justification: Ethernet connectivity provides better throughput and performance; Ethernet is also widely used and available with all leading OEMs, while InfiniBand is less used and available only with some specific OEMs, so please revise accordingly. 1G connectivity is mostly required for OOB connectivity, and in-band is used for telemetry data, so please revise for wider OEM participation.
Response (Clarification): Ethernet connectivity is already allowed.

Sr. No. 117 — Master Node, Server Management
Existing clause: Dedicated IPMI 2.0 compliant management LAN port having support for system health monitoring, event log access, virtual media over network, and virtual KVM (KVM over IP). All required licenses to use IPMI features should be included. Licenses shall be perpetual/subscription based for the entire contract period of use.
Modification requested: The existing IPMI clause as above, plus the following:
• The management tool should be able to provide global resource pooling and policy management to enable policy-based automation and capacity planning, with a zero-touch repository manager and self-updating firmware system, and automated hardware configuration and operating-system deployment to multiple servers.
• Virtual IO management / stateless computing; the server management software should provide the capability to view health and inventory for third-party compute, network, storage, integrated systems, virtualization, and containers.
• The management software should participate in server provisioning, device discovery, inventory, diagnostics, monitoring, fault detection, auditing, and statistics collection; it should provide an alert in case the system is not part of the OEM Hardware Compatibility List, and should provide anti-counterfeit protection.
• The proposed management solution should provide proactive security & software advisory alerts, outline the fixes required to address the issues, and analyze current configurations to identify potential issues due to driver and firmware incompatibility.
• The proposed solution should have a customizable dashboard to show overall faults / health / inventory for all managed infrastructure, with an option to create unique dashboards for individual users; the user should have the flexibility to select names for dashboards and widgets (e.g. health, utilization, etc.).
Justification: Servers (especially master nodes) play a critical role in the data center, as multiple applications depend on them and the external storage also connects to them; hence end-to-end server management software with the mentioned features is needed. Please revise the clause accordingly.
Response: As per the RFP and the corrigenda published from time to time.

Sr. No. 118 (Master Node, Security Features)
Existing Clause: ACPI 6.4 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, SMBIOS 3.5 or later, Malicious Code Free design (to be certified by OEM).
Requested Change: Min ACPI 6.2 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, SMBIOS 3.5 or later, Malicious Code Free design (to be certified by OEM).
Justification: Every server architecture gets certified on different compliances jointly by the OEM and third parties; due to that, please revise as mentioned so that every OEM can participate.
GIL Response: As per RFP and time to time published corrigendum.
Sr. No. 119 (AI Training, Processors & Performance, per node, minimum)
Existing Clause: Min Dual 56-core latest Gen Intel® Xeon® Platinum or AMD EPYC scalable processors, with Min 8 x GPU Accelerators, providing 500TF or Higher Double Precision Tensor FP64 / TF64 Performance, 31 PetaFlops or Higher FP8 performance with sparsity.
Requested Change: Min Dual 60-core latest Gen Intel® Xeon® Platinum (5th Gen or higher) or AMD EPYC (Turin or higher) scalable processors, with Min 8 x GPU Accelerators, providing 500TF or Higher Double Precision Tensor FP64 / TF64 Performance, 31 PetaFlops or Higher FP8 performance with sparsity.
Justification: Defining the processor's generation will standardize the type of processors offered by every OEM for equal participation. Also, considering that training nodes require high compute performance, the core count needs to be updated as per the processor's generation. So, please revise accordingly.
GIL Response: As per RFP and time to time published corrigendum. However, Bidder can quote higher side compute.
Sr. No. 120 (AI Training, Network)
Existing Clause:
a) Minimum 8 nos. of InfiniBand NDR ports or Ethernet (400Gb/s or higher) for compute communication for internode communication,
b) 1 no. of port for BMC (dedicated LAN port),
c) Minimum 1 no. of 1 GbE port and 2 nos. of 10 GbE or higher (Fiber/Copper) ports,
d) Required InfiniBand / 200G or higher Ethernet 2 x twin-port HCA as required for quoted storage delivery to node,
e) Additionally, 1 no. of 100GbE or higher Ethernet (Fibre),
f) Required switch with 64 non-blocking ports with aggregate data throughput up to 51.2 Tb/s and required compatible cables of appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode.
Requested Change:
a) Minimum 8 nos. of SuperNICs with minimum 8 ARM cores capable of supporting InfiniBand / Ethernet (400Gb/s or higher) for compute communication for internode communication,
b) 1 no. of port for BMC (dedicated LAN port),
c) Minimum 2 nos. of 10 GbE or higher (Fiber/Copper) ports,
d) Required 200G or higher DPUs with 2 x twin-port 200G as required for quoted storage delivery to node,
e) Required switch with 64 non-blocking ports with aggregate data throughput up to 51.2 Tb/s and required compatible cables of appropriate length to connect all 8 (compute communication) nos. of IB NDR ports / Ethernet of all nodes in non-blocking mode. The switch should provide RoCE v2 / equivalent, PFC, ECN and telemetry capabilities to run the setup.
Justification: GPU training nodes in AI architecture require SuperNICs for east-west communication across the nodes, and DPUs are required for north-south communication. Also, RoCEv2 is a very important and critical protocol used for GPU-to-GPU communication across network switches, which along with PFC and ECN ensures lossless, low latency, high bandwidth communication. So please update this point accordingly and allow wider OEM participation.
GIL Response: As per RFP and time to time published corrigendum.
Sr. No. 121 (AI Training, Internal Storage)
Existing Clause:
• For Operating System: minimum 1.92 TB NVMe drives
• For Data: Minimum 8 * 3.84 TB U.2 or EDSFF NVMe drives
Requested Change:
• For Operating System: minimum 2 x 960 GB M.2 SSD boot drives
• For Data: Minimum 8 * 3.84 TB U.2 Gen5 NVMe drives
Justification: Every OEM's architecture, as per their testing and availability of drives, is different, so please update this clause for us to participate.
GIL Response: Please refer Corrigendum-03.
Sr. No. 122 (AI Training, Power Requirements)
Existing Clause: Appropriate rated and energy efficient, redundant (N+N) hot swappable power supply and FAN.
Requested Change: Appropriate rated and energy efficient, redundant (N+1) hot swappable power supply and FANs. In case of power failure, the system should be able to sustain 3 power supply failures with GPU throttling no less than 60%.
Justification: High availability and tolerance to power supply failure is very important for critical servers having large datasets dependent on them for training, so please update the clause to allow us to participate.
GIL Response: Please refer Corrigendum-03.
Sr. No. 125 (Benchmark Proof Submission)
Existing Clause: Bidder needs to submit proof of the quoted GPU meeting these MLCommons training benchmarks at the time of bidding, or, if not listed on MLCommons, bidder shall be required to submit a benchmark report for the Make/Model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed and referring the bidder and bid details.
Requested Change: Bidder needs to submit proof of the quoted GPU meeting these MLCommons training benchmarks at the time of bidding, or, if not listed on MLCommons, bidder shall be required to submit GPU Accelerator's test results for the Make/Model (same GPUs) of the quoted server as part of bid submission.
Justification: Since AI training models are quite new to the market and their benchmarks are getting updated over the course of time, please generalise this clause for wider OEM participation as it is restricting us from participating.
GIL Response: Please refer to RFP and time to time published Corrigendum.
Sr. No. 128 (Inference Node, Internal Storage)
Existing Clause: For Operating System: minimum 2*1.92 TB NVMe drives; Minimum 4 * 3.84 TB U.2 or EDSFF NVMe drives.
Requested Change: For Operating System: minimum 2*1.92 TB M.2 NVMe drives; Minimum 4 * 3.84 TB U.2/U.3 or EDSFF NVMe drives.
Justification: Request you to please update the clause for wider OEM participation as different server OEMs have different types of drives certified and tested.
GIL Response: Please refer Corrigendum-03.
Sr. No. 129 (Inference Node, Security Features)
Existing Clause: ACPI 6.4 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, SMBIOS 3.5 or later, Malicious Code Free design (to be certified by OEM).
Requested Change: Min ACPI 6.2 Compliant, UEFI 2.8, Support for Trusted Platform Module enabled within the BIOS for secure cryptographic key generation, SMBIOS 3.5 or later, Malicious Code Free design (to be certified by OEM).
Justification: Every server architecture gets certified on different compliances jointly by the OEM and third parties; due to that, please revise as mentioned so that every OEM can participate.
GIL Response: As per RFP and time to time published corrigendum.
Sr. No. 130 (Inference Node, PCI Express Interface)
Existing Clause: 4 x PCIe Gen 5.0 x16 FH FL Slots. All slots must operate at PCIe Gen 5.0 speed when fully populated.
Requested Change: 4 x PCIe Gen 5.0 x16 FH FL Slots. All slots must operate at PCIe Gen 4.0/5.0 speed when fully populated.
Justification: Every OEM has a different architecture, so please revise as requested for wider OEM participation, as the current clause is restricting us.
GIL Response: As per RFP and time to time published corrigendum.
Sr. No. 131 (Inference Node, Mother Board)
Existing Clause: Appropriate Motherboard and chipset. Must support PCIe Gen 5.0 and be compatible with selected processors and GPUs.
Requested Change: Appropriate Motherboard and chipset. Must support PCIe Gen 4.0/5.0 and be compatible with selected processors and GPUs.
Justification: Every OEM has a different architecture, so please revise as requested for wider OEM participation, as the current clause is restricting us.
GIL Response: As per RFP and time to time published corrigendum.
Sr. No. 132 (Inference Node, Networking Switch)
Existing Clause:
1. Min. Two or required Nos. of Switch with 48 x 10G SFP+ and 8 x 100G QSFP ports to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form Cluster Communication and Perimeter N/W. Switch must support MLAG/MCLAG feature. Switch must support EVPN-VxLAN based network. Required cables of appropriate length and transceivers should be supplied. Switch(es) should have redundant Power Supply. 5 Years Comprehensive Onsite Warranty.
2. Min. One or required Nos. of Switch with 48 x 1G RJ45, 4 x 25G SFP28 and 2 x 100G QSFP ports to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form In-band and BMC/Out-of-band Management N/W. Required cables of appropriate length and transceivers should be supplied. Switch should have redundant Power Supply. 5 Years Comprehensive Onsite Warranty.
3. Min. One or required Nos. of Switch with 32 x 100GbE QSFP ports or One No. of Switch with 64 x 100GbE QSFP ports to connect all of “Master Node”, “Node for AI Training” and “Inference Node” to form User N/W. Switch must support MLAG/MCLAG feature.
Query: Please clarify this switch's usage; if this switch is also required for master node connectivity, then it should also have 400G switch ports in it.
Justification: Need clarification.
GIL Response: As per RFP and time to time published corrigendum.
Sr. No. 134 (Nodes for AI training: Total Qty-4 sets, Page no. 25)
Existing Clause: Min Dual 56-core latest Gen Intel® Xeon® Platinum or AMD EPYC scalable processors, with Min 8 x GPU Accelerators, providing 500TF or Higher Double Precision Tensor FP64 / TF64 Performance, 31 PetaFlops or Higher FP8 performance with sparsity.
Requested Change: Kindly help to amend the clause as "Min Dual 56-core latest Gen Intel® Xeon® Platinum or AMD EPYC scalable processors, with Min 8 x GPU Accelerators, providing 500TF or Higher Double Precision Tensor FP32 / TF32 Performance, 31 PetaFlops or Higher FP8 performance with sparsity".
Justification: The suggested changes help us to participate in the bid process; this would enable the department to make the bid more competitive and would also help other respective OEMs to participate in the bid process.
GIL Response: As per RFP and time to time published corrigendum.
Revised RFP (Corrigendum-03)
Bid for GPU Compute for uses AI / ML at GSDC
1. Eligibility Conditions:
• If multiple bidders have proposed the same complete End to End OEM solution, the result of that PoC shall apply to all such bidders.
• In the event the proposed solution of a bidder fails in the PoC, bidders quoting this solution will be considered disqualified.
• Bidders must arrange and deploy all necessary infrastructure and accessories, including hardware, software, cables, racks, operating systems, and any other items required to conduct the PoC.
• GSDC shall provide only space, power, and cooling. All other logistical and PoC execution-related costs shall be borne entirely by the bidder/OEM.
• Upon completion of the PoC, the bidder must lift all deployed equipment from GSDC premises at their own cost.
• Bidders must demonstrate the required benchmarks as specified in the RFP during the PoC.
• Only those bidders who successfully qualify the PoC will be eligible for financial bid opening.
• The Tenderer reserves the right to accept or reject any or all bids in case it is not satisfied with the outcome of the PoC testing & benchmarks required.
• The Tenderer reserves the right to accept or reject any or all bids or to re-tender at the Tenderer's sole discretion without assigning any reasons to anybody whatsoever.
• A prospective Bidder requiring any clarification of the bidding documents may seek clarifications by submitting queries on email Id: [email protected], [email protected] prior to the date of the Pre-Bid Meeting.
• Tenderer will discuss the queries received from the interested bidders in the Pre-Bid Meeting and respond to the clarifications by uploading them on the website https://gil.gujarat.gov.in.
• No further or new clarification whatsoever shall be entertained after the Pre-Bid Meeting.
• The interested bidder should send the queries as per the following format:
3. Scope of Work:
1. GPU Servers shall be supplied, installed, configured, tested and commissioned along with necessary software, OS and licenses at GSDC located at Gandhinagar, Gujarat.
2. Bidder has to deploy the proposed solution for inference and AI training models.
3. All software and library licenses to be provided in the name of DST/ DIT, Government of
Gujarat.
4. Gujarat GPU Compute for AI / ML solution must have rack mounted computing platform-
based computer servers, either as rack or blade server design housed in its suitable chassis.
5. The proposed solution should support sharing of GPU across multiple virtual environments and containers. Required license should be available from day one. Bidder to
ensure premium level or highest level of OEM support to meet SLA for all OEM provided
software and libraries.
6. MLops practices and principles should be followed under training model. If required, Bidder
can use appropriate tool for the same without any additional cost to tenderer.
7. The bidder shall submit the detailed documentation on the implementation and deployment.
8. The solution should support remote console access as per GSDC policy to all the servers
for cluster server's health monitoring at Fast Ethernet or better access speed.
9. The servers/chassis/enclosures should be populated fully with N+1 redundant power
supplies of the suitable capacity rating available for the proposed model with the supplier.
Failure of one of the Power supplies should not throttle the Compute nodes. In case the
offered Power Supplies cannot take the HPL load of all the Compute Nodes in the chassis,
lower number of Compute Nodes per chassis may be proposed.
10.The bidder will have to supply Server Rack along with provision of iPDU, TOR Switch, patch
panel, cables, SFP modules, any other active/passive components etc. to host the GPU
Cluster with GPUs at GSDC. Any other component required for the solution proposed by the
supplier has to be incorporated for completion of the Solution.
11.Onsite comprehensive annual maintenance with warranty and OEM support for 5 Years from
the date of completion of Functional Acceptance Test (Onsite warranty will include those
sites to which the item supplied under the contract is moved, in case of migration of the
equipment). Warranty should include but not limited to - On-going Firmware updates,
Proactive bug fixes, Preventive Maintenance, Parts replacement, etc.
12.After completing the installation and integration, the bidder will demonstrate the compliance
of the RFP and provide required training to the GSDC /TPA for executing FAT and further
Operation.
13.All the items as required under this RFP should be delivered in a single lot.
14.The bidder shall be fully responsible for the manufacturer’s warranty for all equipment,
accessories, spare parts etc. against any defects arising from design, material,
manufacturing, workmanship, or any act or omission of the manufacturer / bidder or any
defect that may develop under normal use of supplied equipment during the warranty period.
15.The bidder shall replace the faulty hard disk at no cost; the department will not return the faulty disk after replacement with a new disk.
16.The bidder should provide entire support of the required solution asked in the RFP and back-
to-back support from the OEM.
17.The bidder should provide Support/Escalation Matrix & Portal details for logging tickets for any failure/performance incidents. Also, there has to be a mechanism wherein all licenses are showcased on the portal.
• Successful bidder will have to depute 2 (two) technical manpower as below to provide hand holding support for the contract period.
1. System Administrator
2. AI / ML Deployment Engineer
• The deputed manpower will have to remain present during normal office hours of GSDC (9 AM to 7 PM) during working days and support GSDC for day-to-day maintenance and handling effective GPU infrastructure utilization.
• If required, the manpower will have to remain present on holiday(s) or after office hours based on the requirements of GSDC.
• The bidder shall have to provide backup resources in case the deputed manpower is absent or on leave. The backup resource deputed shall be aware of the tasks and responsibilities being carried out during that period at GSDC and should be able to execute the tasks with minimum on-call support.
• The manpower will have to report to the GSDC authority. The bidder shall submit proof of attendance certified by the GSDC authority along with the Invoice for the payment process.
I. Bidder shall provide a comprehensive on‐site free warranty for 5 years from the date of
acceptance of FAT (Final Acceptance Test) for proposed solution.
II. Bidder shall also obtain the 5 years OEM support (ATS/AMC) on all hardware and other
equipment for providing OEM support during the warranty period.
III. Bidder shall provide the comprehensive manufacturer's warranty and support in respect of
proper design, quality and workmanship of all hardware, equipment, Software, Licenses,
accessories etc. covered by the bid. Bidder must warrant all hardware, equipment,
accessories, spare parts, software etc. procured and implemented as per this bid against
any manufacturing defects during the warranty period.
IV. Bidder shall provide the performance warranty in respect of performance of the installed
hardware and software to meet the performance requirements and service levels in the bid.
V. Bidder is responsible for sizing and procuring the necessary hardware and software licenses
as per the performance requirements provided in the bid. During the warranty period, the bidder
shall replace or augment or procure higher‐level new equipment or additional licenses at no
additional cost in case the procured hardware or software is not adequate to meet the service
levels.
VI. Mean Time between Failures (MTBF): If during contract period, any equipment has a
hardware failure on four or more occasions in a period of less than three months, it shall be
replaced by equivalent or higher‐level new equipment by the bidder at no cost. For any delay
in making available the replacement and repaired equipment for inspection, delivery of
equipment or for commissioning of the systems or for acceptance tests / checks on a per
site basis, DST/GIL/DIT reserves the right to charge a penalty.
VII. During the warranty period, the bidder shall maintain the systems and repair / replace at the
installed site, at no charge, all defective components that are brought to the bidder's notice.
VIII. The bidder shall as far as possible repair/ replace the equipment at site.
IX. Warranty should not become void, if DST/GIL/DIT buys, any other supplemental hardware
from a third party and installs it within these machines under intimation to the bidder.
However, the warranty will not apply to such supplemental hardware items installed.
X. The bidder shall carry out quarterly Preventive Maintenance (PM), including cleaning of
interior and exterior, of all hardware, if any, and should maintain proper records at each site
for such PM. Failure to carry out such PM will be a breach of warranty and the warranty
period will be extended by the period of delay in PM.
XI. Bidder shall monitor warranties to check adherence to preventive and repair maintenance
terms and conditions.
XII. Bidder shall ensure that the warranty complies with the agreed Technical Standards,
Security Requirements, Operating Procedures, and Recovery Procedures.
XIII. Bidder shall have to stock and provide adequate onsite and offsite spare parts and spare
component to ensure that the uptime commitment as per SLA is met.
XIV. Any component that is reported to be down on a given date should be either fully repaired
or replaced by temporary substitute (of equivalent configuration) within the time frame
indicated in the Service Level Agreement (SLA).
XV. Bidder shall develop and maintain an inventory database to include the registered hardware
warranties.
XVI. To provide warranty support effectively, the OEM should have a spare depot in India and will be
asked to deliver spares as per SLA requirement.
1. All supplied items must conform to the detailed technical specifications as mentioned in
this document.
2. Install the equipment, obtain user acceptance and submit a copy of user acceptance to
designated authority.
3. The agreement stipulates that the vendor shall maintain the system uptime. An uptime of
99.741% is required. Further, the bidder is responsible for providing comprehensive warranty
and support (24x7) for the period of 5 years from the date of successful completion of FAT.
4. The Bidder shall be responsible for providing all material, equipment and services
specified or otherwise, which are required to fulfill the intent of ensuring operability,
maintainability and the reliability of the complete work covered under this specification.
5. Manufacturer shall provide and support installation, commissioning, spares and technical support in Gujarat.
6. All supporting equipment and tools shall be arranged by the vendor himself.
7. Unpacking of goods shall be done in front of the GIL/GSDC officer, Gandhinagar official, and any damage shall be the sole responsibility of the vendor.
8. Delivery of goods: packing, unpacking, transportation, loading, unloading, Octroi, insurance and any other taxes and duties shall be included in the bid price.
9. All liabilities, such as human injury, incidents, etc., pertain to the bidder's scope. The bidder will be solely responsible to execute insurance for the said work as mentioned in this RFP.
10. All safety precautions should be taken by the bidder as per industrial practice with utmost care. In any case, the Tenderer will not be liable for any obligation for any issue arising under this project.
• The Bidder shall be deemed to have carefully examined all RFP documents to its entire satisfaction. Any lack of information shall not in any way relieve the Bidder of its responsibility to fulfil its obligation under the Contract.
6. Payment Terms:
• GIL and GSDC reserve the right to inspect goods and services supplied as per the scope of this RFP document. The cost of all such tests shall be borne by the Vendor. Any inspected goods that fail to conform to the specification will be rejected, and the Vendor shall have to replace the rejected goods as per the contract specification without any financial implication to GIL/DIT.
• After successful installation of the System in accordance with the requirements as mentioned in the Schedule of Requirement, POC shall be executed.
• Successful bidder has to complete the SITC of the proposed complete solution and execute the POC to meet the benchmark as mentioned in this RFP document. All costs with respect to executing the POC shall be borne by the successful bidder.
• If the POC does not meet the benchmark, the bidder shall lift the deployed complete solution from GSDC without any cost to the Tenderer. No payment will be made on failure of the POC.
• After successful POC, which is the successful demonstration of the benchmark as mentioned in the RFP, only then shall the bidder go for the Final Acceptance Test.
• After successful installation of the System in accordance with the requirements as mentioned in the Schedule of Requirement, the Final Acceptance Test will be conducted. The GSDC or designated agency shall thoroughly review all aspects of the solution as per the requirements of the RFP. After successful testing, the Acceptance Test Certificate will be issued by GIL/DIT and a member of GSDC or its designated agency to the Bidder. The Bidder shall submit the certificate to GIL/DIT for the further payment process.
• The date on which the Final Acceptance certificate is issued shall be deemed to be the date of successful commissioning and Go-Live of the System.
• Any delay by the successful bidder in the POC or Acceptance Testing shall render the successful bidder liable to the imposition of appropriate Penalties.
• Bidder is required to update the details of Hardware installed in the Assets Master, or as decided by GIL and a member of GSDC, before completion of FAT.
• GIL/GSDC and/or an outside agency nominated by DST will conduct an acceptance test on the hardware after completion of installation and commissioning of hardware by the vendor. The acceptance test shall comprise tests to verify conformity to technical requirements/specifications and performance. In case GIL/GSDC is not satisfied with the above, the vendor will upgrade/replace the items with an equal or higher model after due approval of the GSDC team without any extra cost. The exact details of the acceptance test will be mutually decided after the installation of hardware.
Successful bidder has to complete the Installation, Configuration, Commissioning and Integration with Acceptance of the ordered work within the time period(s) specified in the below table. However, in case of any delay solely on the part of the successful bidder, the TENDERER reserves the right to levy the appropriate penalties as per the below table:
IMPLEMENTATION TIMELINES & PENALTIES FOR PROPOSED GPU Cluster with GPUs AT
GSDC
S/n 1: Submission of PBG
Time Limit for Execution: Within 15 Days from date of issuance of GEM contract
Penalty for Delay: EMD may be forfeited and the contract or part thereof may be terminated
Maximum Penalty: -

S/n 2: Supply of the Hardware including Licenses and OEM Warranty Certificate
Time Limit for Execution: T1 = T + 60 days from the date of issuance of contract over GEM
Penalty for Delay: 0.5% of Capex value of delayed/pending work per week or part thereof
Maximum Penalty: 10% of GEM order value

S/n 3: Installation, commissioning & integration of GPU servers at GSDC along with HLD, LLD documents
Time Limit for Execution: T2 = T1 + 30
Penalty for Delay: 0.5% of Capex value of delayed/pending work per week or part thereof
Maximum Penalty: 10% of GEM order value

S/n 4: POC to meet the benchmark as mentioned in this RFP document
Time Limit for Execution: T3 = T2 + 30 days
Penalty for Delay: 0.1% of Capex value of delayed/pending work per week or part thereof. In case of delay for more than 2 (two) weeks after the defined milestone, the POC shall be treated as failed, the contract shall be terminated and PBG may be forfeited.
Maximum Penalty: 10% of GEM order value

S/n 5: Final Acceptance Testing (FAT)
Time Limit for Execution: T3 = T2 + 15 days
Penalty for Delay: 0.5% of Capex value of delayed/pending work per week or part thereof
Maximum Penalty: 10% of GEM order value

S/n 6: Deployment of required Skilled Resource at GSDC
Time Limit for Execution: T3 + 7 Days
Penalty for Delay: Rs. 10000/- per day
Maximum Penalty: Rs. 250000/-

S/n 7: Training
Time Limit for Execution: 10 Days from T3
Penalty for Delay: Rs. 10000/- per day
Maximum Penalty: Rs. 250000/-

Overall Penalty Cap (S/n 2 to 6): The overall Penalty Cap for IMPLEMENTATION TIMELINES & PENALTIES shall not be more than 10% of the total GEM order value.
Note:
• Material supplied, installed and commissioned as per this Bid/contract should be covered under the warranty for a period of five years from the date of FAT acceptance.
• T = Date of issuance of contract over GEM.
• In case any fault arises in the installed items during the warranty period of 5 years, the bidder is required to either repair the faulty items or install a replacement (complying with the RFP specification) for the faulty material without any additional cost to the Tenderer.
• The aforesaid penalty cap will not be applicable for any severe impact/incident/outage at GSDC resulting in loss to the Government of Gujarat.
• The successful bidder shall repair/replace all faulty material covered under the warranty within the shortest possible time, thus ensuring minimum downtime, failing which the applicable penalty will be imposed. In case of failure of an appliance/solution more than 3 consecutive times for the same issue within any single quarter during the contract period, the bidder would be bound to replace the product at no cost to DST/GIL/DIT.
• The successful bidder shall be responsible for maintaining the desired performance and availability of the system/services.
• The successful bidder should ensure prompt service support during the warranty period.
• Timeline for resolution is within NBD (Next Business Day) from the time the call is logged/reported to the Bidder/OEM. If the successful bidder fails to resolve the call as specified above, a penalty of Rs. 5000 per delayed hour or part thereof will be imposed proportionately, which will be recovered against the Performance Bank Guarantee or the billable quarterly invoice amount submitted by the successful bidder (an illustrative calculation is sketched below).
• Downtime will be calculated from the time the complaint is logged with the service in-charge of the Successful Bidder (via email/call/written letter) till the GSDC's authorized/nominated employee acknowledges the repair/service completion.
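For reference, the following is a minimal Python sketch of how the weekly delay penalty and the per-hour resolution penalty described above could be computed; the Capex value, GEM order value and delay figures used in the example are illustrative assumptions, not values from this RFP.

# Illustrative sketch only: the rupee figures below are assumptions, not contract values.
import math

def delay_penalty(capex_of_delayed_work, weeks_delayed, gem_order_value):
    # 0.5% of the Capex value of delayed/pending work per week (or part thereof),
    # capped at 10% of the GEM order value (overall cap for S/n 2 to 6).
    weeks = math.ceil(weeks_delayed)          # "or part thereof" rounds up to whole weeks
    return min(0.005 * capex_of_delayed_work * weeks, 0.10 * gem_order_value)

def resolution_penalty(hours_beyond_nbd):
    # Rs. 5000 per delayed hour (or part thereof) beyond the Next Business Day deadline.
    return 5000 * math.ceil(hours_beyond_nbd)

# Assumed example: Rs. 10 crore of delayed supply, 3.5 weeks late, Rs. 50 crore GEM order,
# and one ticket resolved 6.5 hours after the NBD deadline.
print(delay_penalty(10e7, 3.5, 50e7))   # 0.5% x 10 crore x 4 weeks = Rs. 20,00,000
print(resolution_penalty(6.5))          # Rs. 5000 x 7 hours = Rs. 35,000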
b. SLA for Uptime (99.741%)
• SLA will be calculated on a quarterly basis; however, the final penalty deduction on the quarterly payment (i.e., the SLA report penalty of each quarter) will be applied during the O&M and Manpower quarterly payments.
• Bidder has to ensure 24x7x365 support for SLA calculation (an indicative downtime-allowance calculation is sketched below).
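As an indicative reading of the 99.741% uptime figure, the short sketch below converts it into an allowed-downtime budget per quarter; the 91.25-day quarter length is an assumption used only for illustration.

# Illustrative sketch: allowed downtime per quarter at 99.741% uptime.
SLA_UPTIME = 0.99741
QUARTER_HOURS = 91.25 * 24                     # assumed quarter length (~2190 hours)

allowed_hours = QUARTER_HOURS * (1 - SLA_UPTIME)
print(f"Allowed downtime per quarter: {allowed_hours:.2f} hours "
      f"(about {allowed_hours * 60:.0f} minutes)")
# Roughly 5.67 hours (~340 minutes) of cumulative downtime per quarter before the SLA is breached.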
2. Replacement of a profile by the agency (only one replacement per technical profile –
with equal or higher qualification and experience – would be permitted per year)
For every SLA non-compliance reported and proved, there shall be a penalty as given below:
Note:
The bidder must deploy the required quantity of switches with the
same or higher functionality to meet the solution requirements.
The switch count shall be adjusted (increased/decreased) based
on the actual port availability per device while maintaining the
specified speed and functionality.
OS Support: The system should support the latest version of Red Hat Enterprise Linux / Ubuntu Linux / RHEL AI / Red Hat OpenShift AI server. However, the bidder shall deliver Ubuntu Linux version 22 or higher, which should be delivered with Enterprise support from the OEM with premium or the highest level of support available. The quoted model should be certified for RHEL and Ubuntu OS; the same shall be verifiable from the OS OEMs' website.
Hypervisor: Supply should include DC edition unlimited Guest OS licenses. Hypervisor with the Enterprise level highest license and support available should be provided from day one.
Virtual GPU: Support for virtual GPU to share a physical GPU across multiple VMs; the required license should be included from day one. Bidder to ensure that enterprise level OEM support & SLA is available for all OEM provided software and libraries.
AI Enterprise Software: AI Enterprise software & subscription or equivalent for each and every GPU to be included from day one. Bidder to ensure that enterprise level OEM support & SLA is available for all OEM provided software and libraries. All necessary and required software, SDKs, libraries and tools to cater for and run the AI/ML workload should be provided from day 1. Comprehensive software frameworks for the following should be provided:
a) Accelerated ML and data processing
b) Microservices enabled framework for API based LLM model deployment & serving
d) End to End flows for conversational AI - ASR, NMT, TTS
e) Video, Audio and Image processing pipelines
Performance Benchmarks:
1. SPECrate2017_fp_base > 690
2. SPECrate2017_int_base > 530
The System OEM must have listed the SPEC benchmark score on www.spec.org for the same node model with the same CPU configuration and a memory configuration of at least 1TB.
If not listed on spec.org, the bidder shall be required to submit a benchmark report / logs for the Make/Model (same configuration) of the quoted server as part of bid submission. The submission should be on OEM letterhead, duly signed and referring the bidder and bid details.
Storage Nodes (External Storage)
The solution should be PFS (Parallel File System) OR NFS (Network File System) based and delivered with 1PB (All NVMe) usable post RAID 6/equivalent or better protection, expandable up to 2PB in the same file system.
The proposed storage array should be configured with no single point of failure, including required controllers, cache, power supply, cooling fans, etc. It should be scalable up to 12 additional controllers/nodes.
1PB (NVMe) usable post RAID 6 or better configuration. The storage should be distributed with a namespace consistent across nodes.
Performance: Min 32 GBps Read and Min 16 GBps Write aggregated from day one and scalable up to 200% with a scale-out architecture and additional controllers/nodes in the future. IOPS: minimum 8,00,000.
1. Storage must offer NVIDIA GPUDirect Storage connectivity to GPUs.
2. NVMe Storage offered must be certified with the proposed GPU OEM.
Front-End Connectivity: 200GbE or higher Ethernet connectivity compatible with all nodes as per the proposed solution.
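As a rough aid for sizing, the sketch below shows one way the 1PB-usable RAID 6 requirement could translate into raw NVMe capacity; the drive size and RAID group width are illustrative assumptions, not values from this RFP.

# Illustrative sizing sketch: assumed drive size and RAID 6 group width.
import math

USABLE_TB = 1000          # 1 PB usable, taken here as 1000 TB
DRIVE_TB = 15.36          # assumed NVMe drive capacity
GROUP_WIDTH = 10          # assumed RAID 6 group: 8 data + 2 parity drives

usable_fraction = (GROUP_WIDTH - 2) / GROUP_WIDTH          # RAID 6 loses 2 drives per group
raw_tb_needed = USABLE_TB / usable_fraction                # 1250 TB raw for 1000 TB usable
groups = math.ceil(raw_tb_needed / (DRIVE_TB * GROUP_WIDTH))
drives = groups * GROUP_WIDTH

print(f"Raw capacity needed: {raw_tb_needed:.0f} TB -> about {drives} x {DRIVE_TB} TB "
      f"drives in {groups} RAID 6 groups (spares and metadata overhead excluded)")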
3. Cable Entry: Top and Bottom Panel with cable entry facility with Brush.
4. Mounting Angle: The 19" mounting angles should be provided, 2 Nos. on the front and rear side of the Rack, and should be adjustable to full depth. 19" Mounting Angles made up of Steel of 2mm thickness with better mounting flexibility and maximised usable mounting space.
5. "U" Identification: "U" numbering should be provided on the 19" mounting rails such that these unique numbers are visible after mounting of the equipment also.
6. PDU Provision: Each rack should have provision for installation of two PDUs with toolless mounting, to be connected to two different sources individually.
7. Cable Manager Provision: Each rack should have 4 horizontal 1U closed-type cable managers.
8. Side Panels: Side Panels shall be covered with horizontally split steel panels. The side panels should be easily detachable with locking provision.
9. Door: Front and Rear doors should be perforated; both front and rear doors should be at least 80% hexagonal perforated (holes). Front & Rear Doors should open to a minimum of 138 degrees to allow easy access to the interior.
10. Door Perforation: Hexagonal Perforated Single Front Door will be lockable, and a handle lock & key should be provided.
11. Door Lock: Hexagonal Perforated Dual Rear Door will be lockable, and a 3-point lock should be provided.
12. Castor: Rack should be with a Plinth of 800 MM W, 100 MM H and 1400 MM D. The rack shall not have an external height greater than 2060 mm including the Plinth.
13. Load Bearing: Minimum load bearing capacity supported by the Base Frame should be a static load of at least 1200 Kg.
14. Powder Coating: Rack shall be pre-treated and powder coated. The powder coating process shall be ROHS compliant. Powder coating thickness shall be 80 to 100 microns. The colour of the powder coat shall be Black.
15. PDU: Each rack shall be provided with 3 Nos. of 3-Phase 63A PDU, IEC C19 x 12 SKT (per socket IEC C19 x 4 socket + 63A D Curve DP MCB) x 3 + 16 sq mm 5-core 3.5 m FRLS cable with 5-pin 63A industrial plug (2 Nos. vertical and 1 No. horizontal).
16. Shelf: 1 No. Heavy Duty Shelf for keeping the Display & Keyboard.
17. Door Construction: All Racks & Doors are inherently grounded to the Rack Frame. Both the front and rear doors should be designed with quick-release hinges allowing quick and easy detachment without the use of tools. The front door of the unit should be field reversible so that it may open from either side.
18. Statutory Standard: 100% assured compatibility with all equipment conforming to DIN 41494 / EIA 310-D standard (general industrial standard for equipment).
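As a rough check of the rack power envelope implied by item 15 above, the sketch below estimates the apparent power available from one 3-phase 63A PDU; the 415 V line-to-line supply voltage and the derating factor are assumptions, not RFP values.

# Illustrative sketch: apparent power of one 3-phase 63A rack PDU.
import math

V_LINE_TO_LINE = 415      # assumed nominal 3-phase supply voltage (volts)
CURRENT_A = 63            # PDU current rating from the rack specification (amps)
DERATING = 0.8            # assumed continuous-load derating factor

kva_per_pdu = math.sqrt(3) * V_LINE_TO_LINE * CURRENT_A / 1000
print(f"Per PDU: {kva_per_pdu:.1f} kVA nameplate, "
      f"{kva_per_pdu * DERATING:.1f} kVA at {DERATING:.0%} continuous load")
# Roughly 45.3 kVA nameplate per PDU (~36.2 kVA derated); the rack specifies three such PDUs.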
Indicative Diagram
Note:
• Bidders should refer to the indicative diagram for reference and propose their own solution to meet the requirement, ensuring minimum / no failure accordingly.
• Bidder has to conduct site visits in advance (before the bid submission date) during working days and hours to assess the rack positioning. Based on this assessment, they should quote their solution in the bid submission.
• In addition, Bidder has to connect the Management and Inference nodes with the existing storage at GSDC as follows:
Existing Storage
• NetApp FAS8300 with Total ports 4 and Used ports 2
• Hitachi VSP 5600 with Total ports 64 and Used ports 64
CISCO MDS 9710 SAN Switch with 16 Gbps SFP (having port capacity of 32 Gbps) is used to connect with the Storage. Port details are as below.
• For any additional requirement of ports over and above the aforementioned available ports, the bidder shall provide a SAN switch with the same or higher configuration, compatible to connect the inference node and management node with the existing storage to complete the solution without any additional cost to the tenderer.
• The bidder has to ensure the proposed management and inference node solution is compatible with the aforementioned storage and switch. All necessary accessories, cabling, hardware, software, and licenses should be considered accordingly.
Sr. No. 1: GPU Compute for uses AI / ML at GSDC:
A. Inclusive of all the required hardware, Software and necessary Licenses required to make the solution fully functional.
B. As per the Scope of work, functional and technical requirement, including racks, cable & all other accessories (including active & passive components), Installation, testing, commissioning and training etc.
C. Cost of Comprehensive Annual Maintenance with warranty and OEM support for 5 years.
D. Cost of O&M (including two skilled resources) for a period of 5 Years.
Cost including GST (Rs.):
Note:
On Letterhead of Bidder
Sub: Undertaking as per Office Memorandum No.: F. No.6/18/2019-PPD dated 23.07.2020 &
Office Memorandum No.: F.18/37/2020-PPD dated 08.02.2021 published by Ministry of
Finance, Dept. of Expenditure, Public Procurement division
I have read the clause regarding restriction on procurement from a bidder of a country that shares a
land border with India. I certify that we as a bidder and the quoted products from the following OEMs
are not from such a country, or, if from such a country, the quoted products' OEMs have been registered
with the competent authority. I hereby certify that these quoted products and their OEMs fulfil all
requirements in this regard and are eligible to be considered for procurement for Bid
number_______________________.
In case I’m supplying material from a country which shares a land border with India, I will provide
evidence for valid registration by the competent authority, otherwise GIL/End user Dept. reserves
the right to take legal action on us.
(Signature)
Authorized Signatory of M/s <<Name of Company>>
On Letterhead of OEM
Sub: Undertaking as per Office Memorandum No.: F. No.6/18/2019-PPD dated 23.07.2020 &
Office Memorandum No.: F.18/37/2020-PPD dated 08.02.2021 published by Ministry of
Finance, Dept. of Expenditure, Public Procurement division
Dear Sir,
I have read the clause regarding restriction on procurement from a bidder of a country that shares a
land border with India. I certify that our quoted product and our company are not from such a country,
or, if from such a country, our quoted product and our company have been registered with the
competent authority. I hereby certify that these quoted products and our company fulfil all
requirements in this regard and are eligible to be considered for procurement for Bid
number_______________________.
In case I’m supplying material from a country which shares a land border with India, I will provide
evidence for valid registration by the competent authority; otherwise GIL/End user Dept. reserves
the right to take legal action on us.
(Signature)
Authorized Signatory of M/s <<Name of Company>>