
AI Laboratory Infrastructure Proposal
Comparative Analysis & Strategic Recommendations

Centralized Server vs. Individual Desktop Systems

A Comprehensive Technical and Financial Evaluation

Cosmos College of Management and Technology


Executive Overview
Two Strategic Approaches for AI Lab Infrastructure

🖥️ Option 1: Centralized AI Server
Total Investment (One-Time Setup): NPR 26L - 32L
Single enterprise-grade server with NVIDIA A100 GPU supporting multiple thin client terminals for simultaneous access.
Key Benefits:
✓ Serves 20-30 students simultaneously
✓ Enterprise-grade AI performance
✓ Centralized management & updates
✓ Cost-effective per student

💻 Option 2: Individual Desktop Workstations
Per Machine Cost: NPR 2.9L - 4.6L (×10 machines = NPR 29L - 46L)
Independent high-performance workstations with RTX 4070 GPU providing dedicated resources to each user.
Key Benefits:
✓ Dedicated resources per student
✓ No resource contention
✓ Customizable environments
✓ System redundancy

Technical Specifications
Component-by-Component Comparison

GPU
Centralized Server: NVIDIA A100 40GB PCIe (Enterprise Grade)
• 312 TFLOPS tensor performance
• 40GB HBM2 memory
• Multi-Instance GPU support
NPR 12L - 17L
Individual Desktop: NVIDIA RTX 4070 12GB (Consumer Grade)
• 29 TFLOPS tensor performance
• 12GB GDDR6X memory
• Ada Lovelace architecture
NPR 90K - 1.5L

CPU
Centralized Server: AMD Threadripper PRO 5975WX
• 32 cores / 64 threads
• 128 PCIe lanes
• ECC memory support
NPR 2.25L - 4.6L
Individual Desktop: Intel i7-13700K / Ryzen 9 7900X
• 16-24 cores
• Consumer platform
• High single-core speed
NPR 70K - 1.1L

RAM
Centralized Server: 256GB DDR4-3200 ECC
• Error-correcting memory
• Registered DIMMs
• Maximum reliability
NPR 3.6L
Individual Desktop: 32GB DDR5-5200
• Standard memory
• Latest DDR5 tech
• Non-ECC
NPR 42K

Storage
Centralized Server: 4TB NVMe (2×2TB)
• Samsung 990 PRO
• 7,450 MB/s read
• PCIe Gen 4.0
NPR 70K
Individual Desktop: 1TB NVMe + 2TB HDD
• PCIe Gen 4.0 SSD
• Secondary HDD storage
• Balanced solution
NPR 28K - 45K

Power Supply
Centralized Server: Corsair AX1600i (1600W)
• 80+ Titanium
• Server-grade reliability
NPR 1.05L
Individual Desktop: 750W Modular PSU
• 80+ Gold
• Standard desktop
NPR 19K - 25K
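To put the 40GB vs. 12GB VRAM gap in practical terms, the sketch below estimates the memory needed just to hold a model's weights in FP16 (2 bytes per parameter). The model names and the weights-only simplification are illustrative assumptions rather than measured figures; full fine-tuning needs several times more memory once gradients and optimizer states are added, which is why the benchmark section later assumes LoRA, gradient checkpointing, or quantization on the smaller card.

```python
def weights_only_vram_gb(params_billion, bytes_per_param=2):
    """Memory (GB) needed just to hold model weights at FP16/BF16 precision.

    This is a lower bound: gradients, optimizer states, and activations add
    several times this amount during full fine-tuning.
    """
    return params_billion * 1e9 * bytes_per_param / 1e9

# Illustrative model sizes (assumptions, not proposal benchmarks).
for name, size_b in [("BERT-Large (0.34B)", 0.34),
                     ("LLaMA-7B", 7.0),
                     ("LLaMA-13B", 13.0)]:
    need = weights_only_vram_gb(size_b)
    fits_4070 = "yes" if need <= 12 else "no"
    fits_a100 = "yes" if need <= 40 else "no"
    print(f"{name:<20} ~{need:4.1f} GB weights | RTX 4070 (12 GB): {fits_4070} | A100 (40 GB): {fits_a100}")
```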


AI Performance Analysis
Real-World AI/ML Task Performance

Deep Learning Training
Centralized Server (A100): Excellent
• Train GPT-style models up to 10B params
• ResNet-50: ~8 min/epoch on ImageNet
• BERT-Large: 3-4 hours full training
• Optimized for FP16/TF32
Individual Desktop (RTX 4070): Good
• Limited to ~3B parameter models
• ResNet-50: ~25 min/epoch
• BERT-Base: 8-10 hours
• Suitable for learning & small projects

Model Size Capacity
Centralized Server (A100): 40GB VRAM
• Large language models (LLaMA-13B)
• Vision transformers (ViT-Huge)
• Multi-modal models
• Batch size: 128-512+
Individual Desktop (RTX 4070): 12GB VRAM
• Small-medium models only
• Pre-trained model fine-tuning
• Limited by memory
• Batch size: 16-64

Concurrent Users
Centralized Server (A100): 20-30 Users
• MIG technology for GPU partitioning
• JupyterHub multi-user support
• Resource scheduling & queuing
• Shared dataset access
Individual Desktop (RTX 4070): 1 User
• Single user per machine
• No resource sharing
• Independent sessions
• Isolated environments

Data Processing
Centralized Server (A100): 32 Cores
• Massive parallel preprocessing
• 256GB RAM for large datasets
• Process ImageNet in minutes
• Handle video/audio datasets
Individual Desktop (RTX 4070): 16-24 Cores
• Good parallel processing
• 32GB RAM adequate for most tasks
• Standard dataset handling
• May require data streaming

Inference/Deployment
Centralized Server (A100): Production Ready
• Deploy multiple models simultaneously
• Real-time API serving
• High-throughput inference
• Industry-standard pipeline
Individual Desktop (RTX 4070): Development Only
• Testing & prototyping
• Single model serving
• Limited throughput
• Not for production

Research Capability
Centralized Server (A100): Advanced Research
• Publication-grade experiments
• Novel architecture testing
• Large-scale ablation studies
• Competitive benchmark results
Individual Desktop (RTX 4070): Educational Research
• Course projects
• Small-scale experiments
• Limited ablations
• Learning-focused
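The 20-30 concurrent-user figure assumes MIG partitioning on the A100 combined with a multi-user notebook layer such as JupyterHub. As a hedged illustration only (not the proposal's actual configuration), a jupyterhub_config.py along these lines could pin each student's session to one MIG slice; the PAM authenticator, the per-user CPU/RAM limits, and the MIG device UUIDs shown are placeholder assumptions, and mem_limit/cpu_limit are enforced only by spawners that support them.

```python
# jupyterhub_config.py -- illustrative sketch; limits and MIG UUIDs are placeholders.
c = get_config()  # noqa: F821  (injected by JupyterHub when loading this file)

# Authenticate students against local Linux accounts (JupyterHub's default PAM authenticator).
c.JupyterHub.authenticator_class = "jupyterhub.auth.PAMAuthenticator"

# Per-user resource caps; enforcement depends on the chosen spawner.
c.Spawner.cpu_limit = 2
c.Spawner.mem_limit = "8G"

# Hypothetical mapping of lab accounts to A100 MIG slices. Exporting
# CUDA_VISIBLE_DEVICES with a MIG device UUID confines that user's ML
# frameworks to a single GPU partition.
MIG_SLICES = {
    "student01": "MIG-00000000-1111-2222-3333-444444444444",
    "student02": "MIG-00000000-5555-6666-7777-888888888888",
}

def assign_mig_slice(spawner):
    uuid = MIG_SLICES.get(spawner.user.name)
    if uuid:
        spawner.environment["CUDA_VISIBLE_DEVICES"] = uuid

c.Spawner.pre_spawn_hook = assign_mig_slice
```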


Comprehensive Cost Analysis


5-Year Total Cost of Ownership (12 Students)

Centralized Server
• Server Hardware (GPU, CPU, RAM, storage, etc.): NPR 26.05L - 31.73L
• Thin Clients (12 units, basic terminals @ 15K-20K each): NPR 1.8L - 2.4L
• Networking Infrastructure (10GbE switches, cabling, rack): NPR 50K - 1L
• Annual Electricity (~2kW continuous operation): NPR 50K - 1L/year
• Annual Maintenance (support, repairs, upgrades): NPR 50K - 1L/year
5-Year Total Cost: NPR 33L - 41L (cost per student: NPR 2.75L - 3.42L)

Individual Desktops (12 units)
• Desktop Hardware (12 units @ NPR 2.92L - 4.61L each): NPR 35L - 55L
• Monitors & Peripherals (12 × displays @ 15K-20K + peripherals): NPR 2L - 3L
• Basic Networking (standard gigabit switches): NPR 50K - 1L
• Annual Electricity (12 × 400W machines @ 8hrs/day): NPR 1L - 2L/year
• Annual Maintenance (multiple machine servicing): NPR 50K - 1.5L/year
5-Year Total Cost: NPR 45L - 76L (cost per student: NPR 3.75L - 6.33L)

💰 Cost Savings: The centralized server approach offers 27-46% lower total cost of ownership over 5 years, saving NPR 12L - 35L compared to the individual desktop setup for 12 students.
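The 5-year totals follow from the line items above; a minimal sketch of that arithmetic, using the lower bound of each range (in NPR lakhs), is shown below for transparency. The function name and the choice of lower bounds are illustrative assumptions; the upper bounds of the published ranges may reflect rounding or recurring-cost assumptions not spelled out in the table.

```python
def five_year_tco(hardware, peripherals, networking,
                  electricity_per_year, maintenance_per_year, years=5):
    """One-time costs plus recurring costs over the planning horizon (NPR lakhs)."""
    return (hardware + peripherals + networking
            + years * (electricity_per_year + maintenance_per_year))

# Lower bounds of the ranges listed above, in NPR lakhs.
server_low = five_year_tco(hardware=26.05, peripherals=1.8, networking=0.5,
                           electricity_per_year=0.5, maintenance_per_year=0.5)
desktop_low = five_year_tco(hardware=35.0, peripherals=2.0, networking=0.5,
                            electricity_per_year=1.0, maintenance_per_year=0.5)

print(f"Centralized server, 5-year low estimate: NPR {server_low:.2f}L "
      f"({server_low / 12:.2f}L per student across 12 students)")
print(f"Individual desktops, 5-year low estimate: NPR {desktop_low:.2f}L "
      f"({desktop_low / 12:.2f}L per student)")
```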


Advantages & Limitations


Comprehensive Evaluation

Centralized Server

✅ Advantages
✓ Cost Efficiency: 60% lower TCO per student
✓ Enterprise Performance: A100 GPU for research-grade AI
✓ High Utilization: Shared resources maximize value
✓ Easy Management: Single system to maintain
✓ Scalability: Add users without hardware cost
✓ Centralized Data: Simplified backup & security
✓ Energy Efficient: Lower power consumption
✓ Professional Training: Real cluster computing experience
✓ Research Capable: Publication-quality experiments

⚠️ Limitations
⚠ Single Point of Failure: Downtime affects all users
⚠ Resource Queue: Peak-time contention
⚠ Network Dependent: Requires robust infrastructure
⚠ Complexity: Needs skilled administration
⚠ Initial Investment: Higher upfront cost
⚠ Learning Curve: Users need cluster training

Individual Desktops

✅ Advantages
✓ Dedicated Access: No sharing or waiting
✓ Independence: Each student fully autonomous
✓ Redundancy: One failure doesn't affect others
✓ Customization: Individual software setup
✓ Simple Management: Standard IT practices
✓ Offline Capable: Works without network
✓ Incremental Growth: Buy machines as needed
✓ Familiar Interface: Standard desktop experience

⚠️ Limitations
⚠ High Cost: 2-3x more expensive overall
⚠ Limited GPU: Only 12GB VRAM per machine
⚠ Underutilization: Idle resources wasted
⚠ Multiple Maintenance: Every machine must be serviced separately
⚠ Higher Energy: 2-3x power consumption
⚠ Space Intensive: Large lab footprint
⚠ Research Limited: Can't train large models
⚠ Inconsistency: Varied configurations


Ideal Use Case Scenarios


Which Option Fits Your Needs?

🖥️ Choose Centralized Server

Perfect For:
✓ Large Classes: 15-30 students per batch
✓ AI/ML Programs: Dedicated AI courses
✓ Research Projects: Graduate-level research
✓ Deep Learning: Training transformer models
✓ Computer Vision: ImageNet-scale datasets
✓ NLP Tasks: Large language model fine-tuning
✓ Workshop Sessions: Intensive training programs
✓ Publication Goals: Research paper experiments
✓ Budget Conscious: Maximum value per student
✓ Professional Training: Industry-standard tools

Real-World Examples:
• Training GPT-style models
• Object detection with YOLO/R-CNN
• Medical image analysis
• Recommendation systems
• Time series forecasting at scale

💻 Choose Individual Desktops

Perfect For:
✓ Small Groups: 5-10 students maximum
✓ Mixed Usage: Gaming, 3D, and AI together
✓ Individual Projects: Diverse student work
✓ Development Focus: Software engineering
✓ Pre-trained Models: Using existing models
✓ Application Development: Building AI apps
✓ Self-paced Learning: Flexible schedules
✓ Offline Requirements: No network dependency
✓ Custom Environments: Individual setups
✓ Incremental Growth: Start small, expand later

Real-World Examples:
• Using pre-trained BERT/ResNet
• Building chatbot applications
• Image classification projects
• Data analysis with pandas
• Web app development with ML
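Several of the desktop-side examples (using pre-trained BERT/ResNet, image classification projects) boil down to fine-tuning an existing model, which does fit comfortably in 12 GB of VRAM. The sketch below shows the idea with PyTorch and a recent torchvision; the dataset path, class count, batch size, and learning rate are placeholder assumptions, not a prescribed setup.

```python
# Minimal fine-tuning sketch (assumptions: PyTorch + a recent torchvision,
# an ImageFolder-style dataset at ./data/train, and 10 target classes).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from ImageNet-pretrained weights and replace only the classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)
model = model.to(device)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("./data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(3):  # a few epochs often suffice when starting from pretrained weights
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```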


Performance Benchmarks
Real-World Training Time Comparisons

Image Classification: ResNet-50 on ImageNet (1.2M images, 1000 classes)
Centralized (A100): ~8 minutes/epoch; complete training 7-8 hours; batch size 512
Desktop (RTX 4070): ~25 minutes/epoch; complete training 22-25 hours; batch size 128

NLP - Language Model: BERT-Large (340M parameters)
Centralized (A100): 3-4 hours; fine-tuning 30-45 minutes; fits entirely in VRAM
Desktop (RTX 4070): Won't fit; BERT-Base only, 8-10 hours; gradient checkpointing required

Object Detection: YOLOv8 on COCO (118K images, 80 classes)
Centralized (A100): 4-5 hours; high-res training at 1024×1024; multiple models in parallel
Desktop (RTX 4070): 10-12 hours; standard resolution 640×640; single model only

Generative AI: Stable Diffusion fine-tuning
Centralized (A100): 2-3 hours; full-precision training; large batch processing
Desktop (RTX 4070): 6-8 hours; mixed precision required; small batches

LLM Fine-tuning: LLaMA-7B (7 billion parameters)
Centralized (A100): 5-6 hours; full model fits; LoRA or full fine-tuning
Desktop (RTX 4070): Not possible; 4-bit quantization might work; very limited capability

Multi-User Scenario: 10 students training different models simultaneously
Centralized (A100): Supported; MIG partitioning; each student gets a dedicated slice; job scheduling
Desktop (RTX 4070): Not applicable; would need 10 separate machines; total cost 29-46 lakhs; high maintenance

📊 Performance Summary: The A100 GPU is approximately 3-4× faster for training and can handle 3-4× larger models than the RTX 4070, while supporting 20-30 users simultaneously.
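The desktop column above repeatedly relies on memory-saving techniques (mixed precision, gradient checkpointing, small batches, 4-bit quantization). As a minimal sketch of one of them, the PyTorch snippet below shows a mixed-precision training step; the toy model, batch size, and learning rate are placeholder assumptions, and gradient checkpointing or quantization would require additional, model-specific support.

```python
# Mixed-precision training step sketch (assumptions: PyTorch with a CUDA GPU;
# the model, optimizer, and dummy batch are placeholders, not the benchmarked setups).
import torch

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.ReLU(),
                            torch.nn.Linear(4096, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales the loss so FP16 gradients don't underflow

def train_step(inputs, targets):
    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible ops in FP16, roughly halving activation memory on a 12 GB card
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# Dummy batch just to exercise the step.
x = torch.randn(128, 1024, device="cuda")
y = torch.randint(0, 10, (128,), device="cuda")
print(train_step(x, y))
```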


Strategic Recommendation
Data-Driven Decision for Cosmos College

🎯 Recommended: Centralized AI Server
Based on comprehensive analysis of technical requirements, cost-effectiveness, scalability, and educational objectives, the Centralized AI Server approach is the optimal choice for Cosmos College.

Why This Recommendation?

• 60% Cost Savings Over 5 Years
• 20-30 Simultaneous Users
• 3-4× Faster Performance

✅ Strategic Benefits
✓ Market Leadership: Position as Nepal's premier AI education institution
✓ Research Excellence: Enable publication-quality research projects
✓ Industry Alignment: Match corporate AI infrastructure standards
✓ Graduate Programs: Support advanced AI/ML degree programs
✓ Competitive Edge: Attract top students and faculty
✓ Future-Proof: Scalable without major reinvestment

💡 Educational Impact
✓ Real-World Skills: Cluster computing & job scheduling
✓ Large-Scale Projects: Train state-of-the-art models
✓ Collaborative Work: Team-based AI projects
✓ Resource Management: Professional development practices
✓ Industry Tools: Same stack as Google, OpenAI
✓ Career Ready: Direct industry applicability

Return on Investment (ROI)


• Immediate: Support 3-4× more students than the individual desktop approach
• Year 1: Establish reputation for cutting-edge AI education
• Year 2-3: Attract research grants and industry partnerships
• Year 4-5: Graduate students placed in top AI companies
• Long-term: Sustainable, scalable infrastructure for decades


Implementation Roadmap
4-Month Deployment Plan

Phase 1: Procurement (Weeks 1-4)
• Hardware procurement
• Vendor selection
• Import coordination
• Quality verification
• Warranty documentation

Phase 2: Infrastructure (Weeks 5-8)
• Server assembly & testing
• Network setup (10GbE)
• Rack installation
• Thin client deployment
• Power & cooling setup
