CosyVoice2

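1. Download CosyVoice2 with git (downloading the ZIP also works, but a GitHub ZIP does not include submodule contents such as third_party/Matcha-TTS)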

git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If you failed to clone the submodule due to network failures, please run the following command until success
cd CosyVoice
git submodule update --init --recursive
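
If the submodule clone fails silently, third_party/Matcha-TTS is left as an empty directory. The following is a minimal sketch of a check for that; the helper name `submodule_is_populated` is made up for illustration, and the default path is taken from the `sys.path.append` line used in step 6.

```python
from pathlib import Path


def submodule_is_populated(repo_root, submodule="third_party/Matcha-TTS"):
    """Return True if the submodule directory exists and is not empty."""
    path = Path(repo_root) / submodule
    # An uninitialized submodule is either missing or an empty directory.
    return path.is_dir() and any(path.iterdir())
```

Calling `submodule_is_populated(".")` from inside the CosyVoice directory should return True after a successful `git submodule update --init --recursive`.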

 

2. Install conda

Install Conda: please see https://docs.conda.io/en/latest/miniconda.html

3. Create a conda environment for CosyVoice

conda create -n cosyvoicev2 -y python=3.10
conda activate cosyvoicev2
# pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

4. Download the models

# Download the models with git; make sure git lfs is installed
mkdir pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd

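 Create pretrained_models inside the CosyVoice directory (the mkdir above). The downloads are:

  • CosyVoice2-0.5B
    The CosyVoice2 model, with 500 million (0.5B) parameters. Compared with the base version, it may offer better synthesis quality, multilingual support, or timbre expressiveness, and suits scenarios that need high-fidelity speech generation.

  • CosyVoice-300M
    The base CosyVoice model, with 300 million (300M) parameters. It provides the core speech-synthesis capability and suits general-purpose speech generation.

  • CosyVoice-300M-SFT
    SFT (Supervised Fine-Tuning) means the model has undergone supervised fine-tuning. It is typically built on the base model and optimized for specific tasks or data, so it may perform better in particular domains.

  • CosyVoice-300M-Instruct
    Instruct means the model is instruction-tuned: natural-language instructions can control attributes such as style and emotion, which improves interactivity and controllability.

  • CosyVoice-ttsfrd
    Despite the name, this does not appear to be a separate synthesis model. The logs later in this post show ttsfrd being loaded as the text frontend engine (ttsfrd.TtsFrontendEngine), with WeTextProcessing as the fallback, so this repository most likely provides text-frontend resources that affect pronunciation accuracy and dialect handling.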
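
To see at a glance which downloads completed and which loader each directory is meant for, a small inventory helper can be useful. This is only a sketch: `cosyvoice2.yaml` is the file name that appears in the error log in section 8.2, while `cosyvoice.yaml` for the 300M models is an assumption, and the function names are invented for illustration.

```python
from pathlib import Path

# cosyvoice2.yaml is named in the section 8.2 error log;
# cosyvoice.yaml is an assumed name for the CosyVoice (v1) config.
CONFIG_TO_TYPE = {
    "cosyvoice2.yaml": "CosyVoice2",
    "cosyvoice.yaml": "CosyVoice",
}


def detect_model_type(model_dir):
    """Return the loader name suggested by the config file, or None."""
    model_dir = Path(model_dir)
    for config_name, model_type in CONFIG_TO_TYPE.items():
        if (model_dir / config_name).is_file():
            return model_type
    return None


def inventory(root="pretrained_models"):
    """Map each subdirectory of root to its detected model type."""
    root = Path(root)
    if not root.is_dir():
        return {}
    return {
        p.name: detect_model_type(p)
        for p in sorted(root.iterdir())
        if p.is_dir()
    }
```

A directory that maps to None either failed to download (for example, git lfs was missing) or, like CosyVoice-ttsfrd, is not a model directory at all.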

5. Start the web service; --model_dir selects the model to use

python webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M

python webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M-Instruct

 Then open http://localhost:50000/ in a browser.
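
Model loading can take a while, so the page may not respond immediately. A minimal sketch for checking whether anything is listening on the port yet, using only the standard library (the helper name `port_is_open` is illustrative, and the defaults mirror the `--port 50000` used above):

```python
import socket


def port_is_open(host="localhost", port=50000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False
```

This only confirms that a process accepts connections on the port; it does not verify that the process is the CosyVoice web UI.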


6. Calling from code

import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio


cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)

# NOTE if you want to reproduce the results on https://funaudiollm.github.io/cosyvoice2, please add text_frontend=False during inference
# zero_shot usage
prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# save zero_shot spk for future usage
assert cosyvoice.add_zero_shot_spk('希望你以后能够做的比我还好呦。', prompt_speech_16k, 'my_zero_shot_spk') is True
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '', '', zero_shot_spk_id='my_zero_shot_spk', stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
cosyvoice.save_spkinfo()

# fine grained control, for supported control, check cosyvoice/tokenizer/tokenizer.py#L248
for i, j in enumerate(cosyvoice.inference_cross_lingual('在他讲述那个荒诞故事的过程中,他突然[laughter]停下来,因为他自己也被逗笑了[laughter]。', prompt_speech_16k, stream=False)):
    torchaudio.save('fine_grained_control_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# instruct usage
for i, j in enumerate(cosyvoice.inference_instruct2('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '用四川话说这句话', prompt_speech_16k, stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# bistream usage, you can use generator as input, this is useful when using text llm model as input
# NOTE you should still have some basic sentence split logic because llm can not handle arbitrary sentence length
def text_generator():
    yield '收到好友从远方寄来的生日礼物,'
    yield '那份意外的惊喜与深深的祝福'
    yield '让我心中充满了甜蜜的快乐,'
    yield '笑容如花儿般绽放。'
for i, j in enumerate(cosyvoice.inference_zero_shot(text_generator(), '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
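
The note above says some basic sentence-splitting logic is still needed when feeding generated text. The following is one possible sketch, not the project's own splitter: it cuts after common Chinese and ASCII clause punctuation and hard-splits anything longer than `max_chars`. The function name, the punctuation set, and the default limit of 30 are all assumptions.

```python
import re

# Zero-width split point right after clause-ending punctuation.
# The ASCII period is left out on purpose so decimals like 3.14 stay intact.
_SPLIT_AFTER = re.compile(r"(?<=[,。!?;,!?;])")


def split_text(text, max_chars=30):
    """Yield chunks that end at punctuation, none longer than max_chars."""
    for piece in _SPLIT_AFTER.split(text):
        piece = piece.strip()
        # Fall back to a hard cut when a clause has no punctuation for too long.
        while len(piece) > max_chars:
            yield piece[:max_chars]
            piece = piece[max_chars:]
        if piece:
            yield piece
```

Because `split_text(...)` is a generator, it could be passed in place of `text_generator()` in the call above.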

C:\Users\Administrator\anaconda3\envs\cosyvoice\python.exe D:\LLM\CosyVoice\CosyVoice\test.py 
failed to import ttsfrd, use WeTextProcessing instead
Traceback (most recent call last):
  File "C:\Users\Administrator\anaconda3\envs\cosyvoice\lib\pydoc.py", line 439, in safeimport
    module = __import__(path)
  File "D:\LLM\CosyVoice\CosyVoice\cosyvoice\flow\flow_matching.py", line 17, in <module>
    from matcha.models.components.flow_matching import BASECFM
ModuleNotFoundError: No module named 'matcha.models'; 'matcha' is not a package

During handling of the above exception, another exception occurred:

Fix

pip install Matcha-TTS

Install the Matcha-TTS dependencies in the conda environment

(cosyvoice) D:\LLM\CosyVoice\CosyVoice\third_party\Matcha-TTS>pip install -r requirements.txt 
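
The message "'matcha' is not a package" means Python found something named matcha that is a plain module rather than a package with submodules. A small diagnostic sketch using the standard library (the helper name `describe_import` is made up for illustration):

```python
import importlib.util


def describe_import(name):
    """Return 'missing', 'module', or 'package' for a top-level import name."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return "missing"
    if spec is None:
        return "missing"
    # Only packages have submodule search locations.
    if spec.submodule_search_locations is not None:
        return "package"
    return "module"
```

If `describe_import("matcha")` returns "module", inspecting `importlib.util.find_spec("matcha").origin` shows which file is shadowing the real package.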

7. Deploy with Docker (failed to run)

cd runtime/python
docker build -t cosyvoice:v2.0 .
# change pretrained_models/CosyVoice-300M to pretrained_models/CosyVoice-300M-Instruct if you want to use instruct inference

# for grpc usage
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v2.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/grpc && python server.py --port 50000 --max_conc 4 --model_dir pretrained_models/CosyVoice-300M-Instruct && sleep infinity"
cd grpc && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>

# for fastapi usage
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v2.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/fastapi && python server.py --port 50000 --model_dir pretrained_models/CosyVoice-300M-Instruct && sleep infinity"
cd fastapi && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>

 

 

 

8.1 Generation error: "The system cannot find the file specified"

Try installing ffmpeg:

https://www.gyan.dev/ffmpeg/builds/

https://github.com/BtbN/FFmpeg-Builds/releases
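
After unpacking ffmpeg, its bin directory has to be on PATH for other programs to find it. A minimal sketch for confirming that from Python, assuming the missing file in the error really is the ffmpeg executable (the helper name `find_tool` is illustrative):

```python
import shutil


def find_tool(name="ffmpeg", search_path=None):
    """Return the full path of an executable, or None if it is not found.

    search_path defaults to the PATH environment variable.
    """
    return shutil.which(name, path=search_path)
```

If `find_tool()` returns None inside the activated conda environment, PATH has not picked up the ffmpeg install yet; reopening the terminal after editing PATH is usually required on Windows.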

 

8.2 Dialects cannot be read aloud

Install the ttsfrd dependency

(cosyvoicev2) D:\LLM\CosyVoice\CosyVoice\pretrained_models\CosyVoice-ttsfrd>pip install ttsfrd_dependency-0.1-py3-none-any.whl
Processing d:\llm\cosyvoice\cosyvoice\pretrained_models\cosyvoice-ttsfrd\ttsfrd_dependency-0.1-py3-none-any.whl
Installing collected packages: ttsfrd-dependency
Successfully installed ttsfrd-dependency-0.1

 Install ttsfrd

(cosyvoicev2) D:\LLM\CosyVoice\CosyVoice\pretrained_models\CosyVoice-ttsfrd>pip install ttsfrd
Collecting ttsfrd
  Downloading ttsfrd-0.1.0-py3-none-any.whl.metadata (291 bytes)
Downloading ttsfrd-0.1.0-py3-none-any.whl (1.3 kB)
Installing collected packages: ttsfrd
Successfully installed ttsfrd-0.1.0

After installation, startup fails with an error

(cosyvoicev2) D:\LLM\CosyVoice\CosyVoice>python webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M-Instruct
C:\Users\Administrator\anaconda3\envs\cosyvoicev2\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2025-06-29 16:52:14,383 INFO input frame rate=50
C:\Users\Administrator\anaconda3\envs\cosyvoicev2\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(
Traceback (most recent call last):
  File "D:\LLM\CosyVoice\CosyVoice\webui.py", line 188, in <module>
    cosyvoice = CosyVoice(args.model_dir)
  File "D:\LLM\CosyVoice\CosyVoice\cosyvoice\cli\cosyvoice.py", line 41, in __init__
    self.frontend = CosyVoiceFrontEnd(configs['get_tokenizer'],
  File "D:\LLM\CosyVoice\CosyVoice\cosyvoice\cli\frontend.py", line 65, in __init__
    self.frd = ttsfrd.TtsFrontendEngine()
AttributeError: module 'ttsfrd' has no attribute 'TtsFrontendEngine'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\LLM\CosyVoice\CosyVoice\webui.py", line 191, in <module>
    cosyvoice = CosyVoice2(args.model_dir)
  File "D:\LLM\CosyVoice\CosyVoice\cosyvoice\cli\cosyvoice.py", line 152, in __init__
    raise ValueError('{} not found!'.format(hyper_yaml_path))
ValueError: pretrained_models/CosyVoice-300M-Instruct/cosyvoice2.yaml not found!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\LLM\CosyVoice\CosyVoice\webui.py", line 193, in <module>
    raise TypeError('no valid model_type!')
TypeError: no valid model_type!

The cause was that the wrong ttsfrd package had been installed, namely the one pulled from PyPI by this command:

(cosyvoicev2) D:\LLM\CosyVoice\CosyVoice\pretrained_models\CosyVoice-ttsfrd>pip install ttsfrd

Run pip uninstall ttsfrd to remove the wrong version.

No ttsfrd build is available for Windows.

 Use WeTextProcessing instead of ttsfrd; the drawback is that dialects cannot be spoken.
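
The AttributeError in the log above suggests a quick way to tell the wrong ttsfrd package from a usable one: check whether the installed module exposes `TtsFrontendEngine`. A minimal sketch (the helper name `module_provides` is invented for illustration):

```python
import importlib


def module_provides(module_name, attr):
    """Return True if the module imports cleanly and has the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)
```

`module_provides("ttsfrd", "TtsFrontendEngine")` returning False means either ttsfrd is not installed (so the WeTextProcessing fallback is used) or the installed package is the wrong one and should be uninstalled.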

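9. Final solution for dialects: use the prepackaged all-in-one bundle

It can output Sichuanese.

API usage

promoteAudio/sichuan.wav is a sample Sichuanese audio clip.

prompt_text is the transcript of the sample Sichuanese audio.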

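# Assumption: the bundle's web UI is reachable at this address; adjust the URL to your setup.
from gradio_client import Client, file

client = Client("http://localhost:50000/")
result = client.predict(
    tts_text=command,  # command holds the text to synthesize
    mode_checkbox_group="自然语言控制",  # UI mode: natural-language control
    prompt_text="要不要嘛  要不要得 你要爪子  老子数到三 把你龟儿卡撕烂  莫挨老子  个人爬 滚",
    prompt_wav_upload=file('promoteAudio/sichuan.wav'),
    prompt_wav_record=file('https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'),
    instruct_text="用四川话说这段话",  # "say this passage in Sichuanese"
    seed=0,
    stream="false",
    speed=1,
    api_name="/generate_audio",
)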

Web usage
