📣 Just announced: Amazon Web Services (AWS) now offers fractional vGPU solutions accelerated by NVIDIA L4 Tensor Core GPUs, delivering scalable and cost-effective options for a variety of accelerated workloads. Discover how you can leverage precise, on-demand GPU resources for natural language processing, graphics workloads, game streaming, and more ➡️ https://lnkd.in/g3eV8rBF
NVIDIA just shrunk the AI supercomputer to the size of a book, and it might change how developers build and test large models. The new DGX Spark is a compact AI supercomputer that fits on a desk, built around NVIDIA’s Grace Blackwell GB10 Superchip. That’s a big deal for developers stuck between consumer GPUs running out of VRAM and cloud costs that keep climbing. Could this mark the start of a new “local-first” AI era? Read the full article to see how DGX Spark fits into NVIDIA’s broader strategy: https://lnkd.in/esRiaRNP #NVIDIA #AICompute #GenerativeAI
The future of fine-tuning is open and scalable. With Unsloth AI, NVIDIA Blackwell GPUs, and NVIDIA DGX Spark, developers can fine-tune massive LLMs locally, then scale the same workflows seamlessly into NVIDIA DGX Cloud and NVIDIA Cloud Partners for enterprise workloads. This collaboration between an open source project and NVIDIA engineering is redefining what’s possible in efficient AI model customization. #AI #MachineLearning #DeepLearning #LLM #GenAI #OpenSource #NVIDIA #DGXSpark #Blackwell #DGXCloud #AIDevelopment #FineTuning #CloudComputing #AIEcosystem Lee D. Brian Carpenter Paul Abruzzo Dane Aconfora NVIDIA AI 👉 Learn more:
Microsoft Azure launches its first production cluster with NVIDIA GB300 Blackwell Ultra GPUs, scaling AI with new VMs to cut training times for huge models.
Microsoft’s new supercomputing NVL72 cluster of NVIDIA GB300s, with 4,600+ GPUs and featuring next-gen NVIDIA InfiniBand, for OpenAI workloads #AI #ArtificialIntelligence #NVIDIA #OpenAI https://lnkd.in/g8JYU7Gp
Microsoft has launched its first large-scale production cluster with over 4,600 NVIDIA GB300 NVL72 GPUs, designed specifically for OpenAI workloads. I found it interesting that this setup utilizes the next-generation NVIDIA InfiniBand network for improved connectivity and performance. What are your thoughts on the potential impact of such advancements in AI infrastructure on the industry?
Meta has open sourced their CTran library, which natively works with AMD & NVIDIA GPUs 🚀. Previously, if you wanted multiple NVIDIA GPUs to work together on a workload, you had to use the NVIDIA NCCL library. Although NCCL's source code is public, it does not have an open governance model, does not have open CI, employs a "code dump" update model, is not GitHub-first, and rarely accepts external contributions. Likewise, if you wanted multiple AMD GPUs to work together on a workload, you had to use RCCL, AMD's delayed fork of NVIDIA's NCCL. With CTran, there is one unified library, and new algorithms like Bruck's can be added in a way that shares code across different AI GPU types. Furthermore, Meta has open sourced NCCLX (NCCL extended), their production-tested collective library that powered all Llama training and builds on the unified CTran library. Meta is the creator and main maintainer of PyTorch and is well trusted in the open source community. NVIDIA remains the leader in collective libraries, but Jensen must not take that for granted given the sharply increased competition in the open source collective communication space. Just as TRTLLM moved to GitHub-first development when facing heavy competition from SGLang/vLLM, Jensen should seriously consider moving NCCL to a GitHub-first open development model given the competition on the collective front too. To draw a parallel with the inference engine world, collective communication libraries are moving from the 2021 "FasterTransformer" era to the 2025 "SGLang/vLLM/TRTLLM" era. The main competitors in the collective library space include China's DeepEP library, AMD's new MORI, AMD's upcoming MORI-CCL, Meta's CTran & NCCLX, and NVIDIA's NCCL (which has released its new NCCL Device API, new GPU-Initiated Networking, etc.). Competition breeds innovation! 🚀
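To make the "collective library" idea concrete, here is a minimal pure-Python sketch of the classic ring all-reduce pattern that libraries like NCCL, RCCL, and CTran implement over real GPU interconnects. Everything here is illustrative: each "rank" is just a Python list and a send is a list copy, not an actual NVLink or InfiniBand transfer.

```python
# Illustrative sketch only: bufs[r] stands for rank r's local buffer.
# Real collective libraries run this pattern over NVLink / InfiniBand.

def ring_all_reduce(bufs: list[list[int]]) -> list[list[int]]:
    """Sum-all-reduce via the ring algorithm.

    Phase 1 (reduce-scatter): in each of world-1 steps, rank r forwards
    chunk (r - step) % world to its right neighbor, which accumulates it.
    Phase 2 (all-gather): the fully reduced chunks circulate once more.
    """
    world, n = len(bufs), len(bufs[0])
    chunk = n // world  # assumes n divisible by world, for brevity

    def exchange(step: int, phase: str) -> None:
        # Compute all messages first so sends within a step don't interfere.
        msgs = []
        for r in range(world):
            c = (r - step) % world if phase == "reduce" else (r + 1 - step) % world
            msgs.append(((r + 1) % world, c, bufs[r][c * chunk:(c + 1) * chunk]))
        for dst, c, data in msgs:
            for i, v in enumerate(data):
                if phase == "reduce":
                    bufs[dst][c * chunk + i] += v   # accumulate partial sums
                else:
                    bufs[dst][c * chunk + i] = v    # overwrite with final chunk

    for step in range(world - 1):
        exchange(step, "reduce")   # reduce-scatter phase
    for step in range(world - 1):
        exchange(step, "gather")   # all-gather phase
    return bufs
```

With 3 ranks holding [1, 2, 3], [10, 20, 30], [100, 200, 300], every rank ends up with [111, 222, 333] after 2 * (world - 1) = 4 neighbor exchanges. The point of algorithm variety (ring, tree, Bruck's) is trading step count against message size per step, which is exactly the kind of code CTran aims to share across GPU vendors.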
Democratizing Collective Communication - The release of Meta’s CTran and NCCLX is a major step toward open, high-performance collective communication across heterogeneous GPUs, and it's just the beginning. What's really unlocking this shift? User buffer registration and symmetric memory, recently supported in PyTorch and NCCL. Registering user buffers enables zero-copy, memory-semantics communication across GPUs, which is critical for scale-up performance. Symmetric memory (the same virtual address across GPUs) enables one-shot collectives and GPU-initiated networking with ultra-low latency. These features reduce vendor lock-in and open the door for more flexible, efficient collective libraries across NVIDIA, AMD, and beyond. This is how we move from opaque, vendor-bound collectives to a new era of transparent, fast, and community-driven infrastructure. Shortcut to SemiAnalysis: https://lnkd.in/dGuEE2Se #PyTorch #GPUcomputing #OpenSourceAI #HPC #AIInfrastructure #LLMTraining #SymmetricMemory
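To see why symmetric memory helps latency, here is a toy sketch (assumption-laden: plain lists stand in for GPU buffers mapped at the same virtual address on every device) contrasting a one-shot all-reduce, where each rank loads every peer's buffer directly, with the neighbor-exchange count a ring algorithm needs:

```python
# Toy model: bufs[r] stands for rank r's buffer. With symmetric memory,
# every rank can dereference every peer's buffer directly, so the whole
# all-reduce collapses into one step of loads plus a local sum.

def one_shot_all_reduce(bufs: list[list[float]]) -> list[list[float]]:
    """Each rank reads all peers' buffers directly and sums locally."""
    world, n = len(bufs), len(bufs[0])
    totals = [sum(bufs[r][k] for r in range(world)) for k in range(n)]
    # Every rank ends up holding the same reduced result.
    return [list(totals) for _ in range(world)]

def ring_exchange_steps(world: int) -> int:
    """Latency-bound neighbor exchanges a ring all-reduce needs
    (reduce-scatter plus all-gather phases)."""
    return 2 * (world - 1)
```

At world = 8 a ring pays 14 latency-bound steps while the one-shot version is a single load-and-sum round, which is why one-shot collectives win for the small messages typical of inference. Real one-shot kernels exist in NCCL and PyTorch symmetric memory, but this code is purely illustrative, not their implementation.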
Meta just made AI training way more open and flexible!

Until now, NVIDIA and AMD GPUs didn’t really “talk” to each other directly when training AI models. Each used its own closed system: NVIDIA’s NCCL and AMD’s delayed fork, RCCL.

Meta just open-sourced CTran, a new communication library that works across both NVIDIA and AMD GPUs, one unified system instead of two separate worlds. That means:
✅ Easier collaboration across different GPU brands
✅ Faster innovation from the open-source community
✅ Less vendor lock-in for AI infrastructure buyers
✅ Same benefits whether you’re using only NVIDIA, only AMD, or both

Meta also released NCCLX, the production-grade version that powered all Llama training.

In short: GPUs from different vendors can now “speak the same language.” It’s a big shift, from a world of closed silos to open, interoperable AI compute. Competition = progress. 💪
AI needs more open solutions. This release from Meta will drive further adoption of multi-architecture infrastructure. At FlexAI, we are all in on openness and choice, offering multi-cloud and multi-architecture solutions that let customers get the best fit for their workload needs.
🚀 Deploying Open Models on Serverless GPUs — Highlights from Microsoft Reactor (APAC)

Recently, I tuned in to “Run Open Models on Serverless GPUs”, part of Microsoft Reactor’s AI Apps & Agents Dev Days series. The session, led by Anthony Shaw (Microsoft) and Stephen McCullough (NVIDIA), explored how developers can seamlessly deploy GPT-OSS models using Azure Container Apps and NVIDIA NIM microservices.

💡 Key takeaways:
- Serverless GPUs offer true on-demand scalability, automatically spinning up resources when needed and scaling down to zero when idle.
- Per-second billing means you only pay for the compute time you use, ideal for both experimentation and production.
- NVIDIA NIM brings plug-and-play inference microservices with standard APIs, eliminating the hassle of GPU driver and dependency management.
- Flexible performance: NVIDIA T4 GPUs for real-time inference and A100 GPUs for high-performance workloads.
- Agentic workflows are easier than ever, combining open models, reasoning, and tool-calling to build scalable, cost-efficient AI agents.

This collaboration between Azure’s serverless GPU infrastructure and NIM’s pre-optimized microservices bridges the gap between experimentation and enterprise-grade AI deployment.

🌐 If you’re working with open-source LLMs or building AI agents, this session is worth checking out; it’s a clear example of how infrastructure innovation is making advanced AI more accessible and efficient.

#Azure #NVIDIA #Serverless #AI #LLMs #MachineLearning #DevDays #MicrosoftReactor
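As a back-of-the-envelope illustration of the per-second billing point, here is a tiny cost sketch. The rates below are made-up placeholders for illustration only, not real Azure or NVIDIA pricing:

```python
import math

# Placeholder rates for illustration; check the provider's price sheet.
SERVERLESS_RATE_PER_SECOND = 0.001   # hypothetical $/s for a serverless GPU
DEDICATED_RATE_PER_HOUR = 3.60       # hypothetical $/h for an always-on GPU VM

def serverless_cost(active_seconds: float) -> float:
    """Scale-to-zero: you pay only for seconds the container actually runs."""
    return active_seconds * SERVERLESS_RATE_PER_SECOND

def dedicated_cost(active_seconds: float) -> float:
    """A dedicated VM bills whole hours whether it is busy or idle."""
    return math.ceil(active_seconds / 3600) * DEDICATED_RATE_PER_HOUR
```

Under these placeholder rates, a 90-second inference burst costs $0.09 on the serverless model versus $3.60 for the minimum billable hour on a dedicated VM, which is why scale-to-zero is attractive for bursty experimentation.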