This demo operates by capturing a recording, then running a voice pipeline on it.
Run via:
python -m examples.voice.static.main
- We create a `VoicePipeline`, set up with a custom workflow. The workflow runs an Agent, but it also has some custom responses if you say the secret word.
- When you speak, audio is forwarded to the voice pipeline. When you stop speaking, the agent runs.
- The pipeline is run with the audio, which causes it to:
  - Transcribe the audio.
  - Feed the transcription to the workflow, which runs the agent.
  - Stream the output of the agent to a text-to-speech model.
  - Play the audio.
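
The four pipeline stages above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the demo's actual code: `transcribe`, `workflow`, `synthesize`, and `run_pipeline` are hypothetical stand-ins for the speech-to-text model, the agent workflow, the text-to-speech model, and the pipeline run, and the "audio" is plain UTF-8 bytes so the example runs without a microphone or any model.

```python
import asyncio
from typing import AsyncIterator, Callable


def transcribe(audio: bytes) -> str:
    """Hypothetical speech-to-text stage: this toy treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")


async def workflow(transcription: str) -> AsyncIterator[str]:
    """Hypothetical agent workflow: streams its reply one word at a time."""
    for word in f"You said: {transcription}".split():
        yield word


def synthesize(text: str) -> bytes:
    """Hypothetical text-to-speech stage: encodes text instead of producing audio."""
    return text.encode("utf-8")


async def run_pipeline(audio: bytes, play: Callable[[bytes], None]) -> None:
    """Transcribe, run the workflow, and play each chunk as soon as it is synthesized."""
    text = transcribe(audio)
    async for chunk in workflow(text):
        play(synthesize(chunk))


def demo() -> list[bytes]:
    """Run the toy pipeline and collect what would have been played."""
    played: list[bytes] = []
    asyncio.run(run_pipeline(b"tell me a joke", played.append))
    return played
```

The point of the sketch is the streaming shape: each chunk of agent output is synthesized and played as it arrives, so playback can start before the agent has finished its full reply.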
Some suggested examples to try:
- Tell me a joke (the assistant tells you a joke)
- What's the weather in Tokyo? (will call the `get_weather` tool and then speak)
- Hola, como estas? (will hand off to the Spanish agent)
- Tell me about dogs. (will respond with the hardcoded "you guessed the secret word" message)
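
The secret-word behaviour in the last example might look something like the sketch below. It rests on assumptions: the secret word is taken to be "dog" (inferred only from the "Tell me about dogs" prompt), the check is a case-insensitive substring match, and `respond` and `run_agent` are hypothetical names rather than part of the demo or the library.

```python
from typing import Callable

# Assumption: inferred from the "Tell me about dogs" example, not confirmed by the demo.
SECRET_WORD = "dog"


def respond(transcription: str, run_agent: Callable[[str], str]) -> str:
    """Return the hardcoded message if the secret word was spoken, else defer to the agent."""
    if SECRET_WORD in transcription.lower():
        return "You guessed the secret word!"
    return run_agent(transcription)
```

Checking the transcription before calling the agent means the hardcoded reply skips the agent run entirely, which is why that prompt behaves differently from the joke, weather, and Spanish examples.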