Get started with Live API

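The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users.

Live API overview

The Live API offers a comprehensive set of features, such as voice activity detection, tool use and function calling, session management (for managing long-running conversations), and ephemeral tokens (for secure client-side authentication).

This page gets you up and running with example applications and basic code samples.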

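Example applications

Check out the following example applications to see how the Live API is used for end-to-end use cases:

Live audio starter app on AI Studio, which uses JavaScript libraries to connect to the Live API and stream bidirectional audio through your microphone and speakers.
Live API Python cookbook, which uses Pyaudio to connect to the Live API.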

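Partner integrations

If you prefer a simpler development process, you can use Daily or LiveKit. These third-party partner platforms have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications.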

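Before you begin building

There are two important decisions to make before you begin building with the Live API: choosing a model and choosing an implementation approach.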

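Choose a model

If you're building an audio-based use case, your choice of model determines the audio generation architecture used to create the audio response:

Native audio with Gemini 2.5 Flash: This option provides the most natural and realistic-sounding speech and better multilingual performance. It also enables advanced features such as affective (emotion-aware) dialogue, proactive audio (where the model can decide to ignore or respond to certain inputs), and "thinking". Native audio is supported by the following models:
- gemini-2.5-flash-preview-native-audio-dialog
- gemini-2.5-flash-exp-native-audio-thinking-dialog
Half-cascade audio with Gemini 2.0 Flash: This option, available with the gemini-2.0-flash-live-001 model, uses a cascaded model architecture (native audio input and text-to-speech output). It offers better performance and reliability in production environments, especially with tool use.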
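
The model choice above can be kept in one small lookup so that the rest of an application never hard-codes a model ID. The following is a minimal sketch, not part of the SDK: the `LIVE_MODELS` table and the `pick_live_model` helper are hypothetical names introduced here for illustration, and the model IDs are the ones listed on this page.

```python
# Hypothetical lookup table: audio architecture -> model IDs listed on this page.
LIVE_MODELS = {
    "native": [
        "gemini-2.5-flash-preview-native-audio-dialog",
        "gemini-2.5-flash-exp-native-audio-thinking-dialog",
    ],
    "half-cascade": ["gemini-2.0-flash-live-001"],
}


def pick_live_model(architecture: str, thinking: bool = False) -> str:
    """Return a model ID for the given audio architecture.

    `thinking` is only meaningful for the native audio architecture;
    it is ignored for half-cascade.
    """
    try:
        candidates = LIVE_MODELS[architecture]
    except KeyError:
        raise ValueError(f"unknown architecture: {architecture!r}") from None
    if architecture == "native" and thinking:
        return candidates[1]
    return candidates[0]
```

For example, `pick_live_model("half-cascade")` selects the model this page suggests for production workloads that rely on tool use.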

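Choose an implementation approach

When integrating with the Live API, you need to choose one of the following implementation approaches:

Server-to-server: Your backend connects to the Live API using WebSockets. Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API.
Client-to-server: Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend.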
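
The forwarding step in the server-to-server approach can be sketched without any network code. The example below is an illustration under stated assumptions, not a real integration: `FakeUpstream` is a hypothetical stand-in for a Live API session, `relay` is a hypothetical helper, and a `None` item in the queue is an assumed convention for "the client closed its stream".

```python
import asyncio


class FakeUpstream:
    """Hypothetical stand-in for a Live API session; records what it receives."""

    def __init__(self):
        self.received = []

    async def send(self, chunk: bytes) -> None:
        self.received.append(chunk)


async def relay(client_chunks: asyncio.Queue, upstream) -> int:
    """Forward chunks from the client queue to the upstream, in order.

    Stops when the sentinel None arrives and returns the number of
    chunks that were forwarded.
    """
    forwarded = 0
    while True:
        chunk = await client_chunks.get()
        if chunk is None:  # assumed sentinel: client closed the stream
            return forwarded
        await upstream.send(chunk)
        forwarded += 1


async def demo() -> list:
    """Simulate a client that sends three chunks and then disconnects."""
    queue = asyncio.Queue()
    upstream = FakeUpstream()
    for chunk in (b"a", b"b", b"c"):
        queue.put_nowait(chunk)
    queue.put_nowait(None)
    await relay(queue, upstream)
    return upstream.received
```

In a real backend, the queue would be fed by the client connection and `upstream.send` would be replaced by a call on the actual Live API session. In the client-to-server approach this relay disappears entirely, which is why that approach needs a different way to authenticate the client.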

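Get started

This example reads a WAV file, sends it in the correct format, and saves the received data as a WAV file.

You can send audio by converting it to 16-bit PCM, 16kHz, mono format, and you can receive audio by setting AUDIO as the response modality. The output uses a sample rate of 24kHz.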
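
The arithmetic behind these formats is easy to check with the standard library alone. The sketch below assumes raw mono 16-bit PCM throughout; the helper names are hypothetical, and the 100 ms chunk length is an arbitrary illustrative value rather than an API requirement.

```python
import io
import wave

INPUT_RATE = 16000   # Hz, input sample rate stated above
OUTPUT_RATE = 24000  # Hz, output sample rate stated above
SAMPLE_WIDTH = 2     # bytes per sample for 16-bit PCM


def pcm_duration_seconds(pcm: bytes, rate: int) -> float:
    """Duration of mono 16-bit PCM data at the given sample rate."""
    return len(pcm) / (rate * SAMPLE_WIDTH)


def chunk_pcm(pcm: bytes, rate: int = INPUT_RATE, chunk_ms: int = 100):
    """Yield successive slices of mono 16-bit PCM, each at most chunk_ms long."""
    chunk_bytes = rate * SAMPLE_WIDTH * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def pcm_to_wav_bytes(pcm: bytes, rate: int = OUTPUT_RATE) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container, entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buffer.getvalue()
```

One second of input audio is 16000 samples of 2 bytes each, so it occupies 32000 bytes and splits into ten 3200-byte chunks at 100 ms per chunk. Passing the wrong rate to `pcm_to_wav_bytes` does not raise an error; the file simply plays back at the wrong speed, which is why the output rate matters when saving responses.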

Python

# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
import wave
from google import genai
from google.genai import types
import soundfile as sf
import librosa

client = genai.Client(api_key="GEMINI_API_KEY")

# Half cascade model:
# model = "gemini-2.0-flash-live-001"

# Native audio output model:
model = "gemini-2.5-flash-preview-native-audio-dialog"

config = {
  "response_modalities": ["AUDIO"],
  "system_instruction": "You are a helpful assistant and answer in a friendly tone.",
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:

        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format='RAW', subtype='PCM_16')
        buffer.seek(0)
        audio_bytes = buffer.read()

        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        wf = wave.open("audio.wav", "wb")
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)  # Output is 24kHz

        async for response in session.receive():
            if response.data is not None:
                wf.writeframes(response.data)

            # Un-comment this code to print audio data info
            # if response.server_content.model_turn is not None:
            #      print(response.server_content.model_turn.parts[0].inline_data.mime_type)

        wf.close()

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

// Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';  // npm install wavefile
const { WaveFile } = pkg;

const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });
// WARNING: Do not use API keys in client-side (browser based) applications
// Consider using Ephemeral Tokens instead
// More information at: https://ai.google.dev/gemini-api/docs/ephemeral-tokens

// Half cascade model:
// const model = "gemini-2.0-flash-live-001"

// Native audio output model:
const model = "gemini-2.5-flash-preview-native-audio-dialog"

const config = {
  responseModalities: [Modality.AUDIO], 
  systemInstruction: "You are a helpful assistant and answer in a friendly tone."
};

async function live() {
    const responseQueue = [];

    async function waitMessage() {
        let done = false;
        let message = undefined;
        while (!done) {
            message = responseQueue.shift();
            if (message) {
                done = true;
            } else {
                await new Promise((resolve) => setTimeout(resolve, 100));
            }
        }
        return message;
    }

    async function handleTurn() {
        const turns = [];
        let done = false;
        while (!done) {
            const message = await waitMessage();
            turns.push(message);
            if (message.serverContent && message.serverContent.turnComplete) {
                done = true;
            }
        }
        return turns;
    }

    const session = await ai.live.connect({
        model: model,
        callbacks: {
            onopen: function () {
                console.debug('Opened');
            },
            onmessage: function (message) {
                responseQueue.push(message);
            },
            onerror: function (e) {
                console.debug('Error:', e.message);
            },
            onclose: function (e) {
                console.debug('Close:', e.reason);
            },
        },
        config: config,
    });

    // Send Audio Chunk
    const fileBuffer = fs.readFileSync("sample.wav");

    // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
    const wav = new WaveFile();
    wav.fromBuffer(fileBuffer);
    wav.toSampleRate(16000);
    wav.toBitDepth("16");
    const base64Audio = wav.toBase64();

    // If already in correct format, you can use this:
    // const fileBuffer = fs.readFileSync("sample.pcm");
    // const base64Audio = Buffer.from(fileBuffer).toString('base64');

    session.sendRealtimeInput(
        {
            audio: {
                data: base64Audio,
                mimeType: "audio/pcm;rate=16000"
            }
        }

    );

    const turns = await handleTurn();

    // Combine audio data strings and save as wave file
    const combinedAudio = turns.reduce((acc, turn) => {
        if (turn.data) {
            const buffer = Buffer.from(turn.data, 'base64');
            const intArray = new Int16Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / Int16Array.BYTES_PER_ELEMENT);
            return acc.concat(Array.from(intArray));
        }
        return acc;
    }, []);

    const audioBuffer = new Int16Array(combinedAudio);

    const wf = new WaveFile();
    wf.fromScratch(1, 24000, '16', audioBuffer);  // output is 24kHz
    fs.writeFileSync('audio.wav', wf.toBuffer());

    session.close();
}

async function main() {
    await live().catch((e) => console.error('got error', e));
}

main();

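What's next

Read the full Live API capabilities guide for key capabilities and configurations, including voice activity detection and native audio features.
Read the tool use guide to learn how to integrate the Live API with tools and function calling.
Read the session management guide for managing long-running conversations.
Read the ephemeral tokens guide for secure authentication in client-to-server applications.
For more information about the underlying WebSockets API, see the WebSockets API reference.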