NVIDIA Nemotron Speech for Developers

NVIDIA Nemotron™ Speech is a family of GPU-accelerated open models for automatic speech recognition (ASR), text-to-speech (TTS), neural machine translation (NMT), and speech-to-speech across ~40 languages. Deployed through the NVIDIA Riva library for optimized inference, the speech models are fully customizable and run across all clouds, in data centers, at the edge, and on embedded devices. As speech becomes the primary interface for AI agents, these models enable organizations to build voice-first AI agents that understand and respond naturally across dozens of languages.

How Nemotron Speech Works

Speech and translation AI microservices convert spoken words into text (speech recognition), written language into spoken words (speech synthesis), and spoken or written words from one language to another (translation). The pretrained models are trained on vast datasets and can be fine-tuned on custom datasets to accelerate the development of domain-specific models. Fully containerized, these microservices are optimized for both real-time performance and high-throughput offline processing, on premises or in the cloud, and can quickly scale to hundreds or thousands of parallel streams.
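As a rough illustration of how the three stages chain into speech-to-speech translation, here is a minimal Python sketch. Every name in it (SpeechPipeline, fake_recognize, fake_translate, fake_synthesize, run_streams) is a hypothetical stand-in invented for this example, not a Riva or NIM API. The stub stages only pass bytes and text around so the sketch runs without any service, and asyncio.gather stands in for handling many streams concurrently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List

# Hypothetical stage signatures (assumptions for illustration only,
# not the signatures of any NVIDIA client library).
Recognize = Callable[[bytes], Awaitable[str]]        # audio -> text (ASR)
Translate = Callable[[str, str], Awaitable[str]]     # text, target language -> text (NMT)
Synthesize = Callable[[str], Awaitable[bytes]]       # text -> audio (TTS)


@dataclass
class SpeechPipeline:
    recognize: Recognize
    translate: Translate
    synthesize: Synthesize

    async def speech_to_speech(self, audio: bytes, target_lang: str) -> bytes:
        """Chain ASR, then NMT, then TTS for one audio clip."""
        text = await self.recognize(audio)
        translated = await self.translate(text, target_lang)
        return await self.synthesize(translated)


# Stub stages so the sketch is runnable offline. A real deployment would
# replace each of these with a call to the corresponding microservice.
async def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


async def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


async def run_streams(
    pipeline: SpeechPipeline, clips: List[bytes], target_lang: str
) -> List[bytes]:
    """Process many clips concurrently; results keep the input order."""
    tasks = (pipeline.speech_to_speech(clip, target_lang) for clip in clips)
    return await asyncio.gather(*tasks)


def demo(num_streams: int = 100) -> List[bytes]:
    pipeline = SpeechPipeline(fake_recognize, fake_translate, fake_synthesize)
    clips = [f"utterance {i}".encode("utf-8") for i in range(num_streams)]
    return asyncio.run(run_streams(pipeline, clips, "es"))
```

Calling demo() returns one output per input clip, in input order. Keeping each stage behind a plain callable means a stage can be swapped (for example, dropping the synthesize step for speech-to-text translation) without touching the others.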

A workflow diagram showing speech and translation AI microservices

Quick-Start Guide

Get step-by-step instructions for deploying pretrained models and interacting with them.

Get Started

Learn to Add Voice to Your Apps

Learn how to add real-time speech recognition, translation, and natural-sounding text-to-speech to your applications using NVIDIA NIM™ for production-ready voice experiences.

Read Blog

Tutorial

Read how to build a voice-powered AI agent that combines the NVIDIA Nemotron streaming ASR model with multimodal retrieval-augmented generation (RAG), safety guardrails, and long-context reasoning.

Read Blog

Video Demo

Learn how to build a voice-enabled AI agent that integrates an NVIDIA Nemotron ultra-low-latency streaming model for real-time automatic speech recognition.

Watch Video

Ways to Get Started With Nemotron Speech

Use the right tools and technologies to build and deploy fully customizable, multilingual speech and translation AI applications.

Try

Experience Nemotron Speech through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free through build.nvidia.com.

Try Now

Deploy

Get a free license to try NVIDIA AI Enterprise for 90 days using your existing infrastructure.

Request a 90-Day License

Development Starter Kits

Start developing your speech and translation AI application with Nemotron Speech by accessing tutorials, notebooks, forums, release notes, and comprehensive documentation.

Automatic Speech Recognition

Achieve high transcription accuracy for Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Portuguese, Russian, and Spanish with state-of-the-art models pretrained on thousands of hours of audio on NVIDIA supercomputers.
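Transcription accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The sketch below is a minimal, generic implementation for checking a model against your own test set. The function name word_error_rate and the simple lowercase-and-split normalization are assumptions made for this example, not part of any NVIDIA tooling, and production evaluations usually apply language-specific text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Normalization here is only lowercasing and whitespace splitting,
    which is an illustrative simplification.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of an extra word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "turn on the kitchen lights" with the hypothesis "turn on kitchen light" gives one deletion and one substitution over five reference words, a WER of 0.4.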

Text-to-Speech

Customize TTS pipelines for English, German, Italian, Mandarin, and Spanish to get the voice and intonation you want.

Neural Machine Translation

Integrate highly accurate text-to-text, speech-to-text, or speech-to-speech translation for up to 32 languages into your conversational application pipelines.

Nemotron Speech and NVIDIA Riva Learning Library

More Resources

Explore the Community

Get Training and Certification

Meet the Program for Startups

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Always consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instructions and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.
