0. Summary
| tts | Speed | Quality | Deployment difficulty | Overall |
|---|---|---|---|---|
| espeak | Blazing fast | Poor. Robotic timbre, but just about usable | Very easy | ⭐️ |
| edge-tts | Very fast | Very good, but it is Microsoft's online service and needs a network connection | Very easy | ⭐️⭐️⭐️ |
| sherpa-onnx | Fast | Fairly good. Some polyphone errors and quite a few unnatural passages | Medium | ⭐️⭐️⭐️⭐️ |
| piper | Fast | Mediocre. Sounds like a human voice, but very unnatural | Easy | ⭐️⭐️⭐️ |
| melo-tts | Fast | Mediocre. Sounds like a human voice, but very unnatural | Medium | ⭐️⭐️⭐️⭐️⭐️ |
| chattts | Somewhat slow | Very good | Looks easy, but local deployment did not succeed | ⭐️⭐️⭐️ |
| cosyvoice | Somewhat slow | Very good | Medium | ⭐️⭐️⭐️ |
1. espeak
On macOS, `brew install espeak`; on Linux, `apt-get install espeak`.

Usage is equally simple:

```shell
espeak -v zh 天气不错
```

It runs very fast, but the voice sounds extremely "robotic".
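For scripting, espeak can also be driven from Python through `subprocess`. A minimal sketch; the helper names and defaults below are my own, not part of espeak:

```python
# Calling the espeak CLI from Python; helper names/defaults are illustrative.
import shutil
import subprocess

def espeak_cmd(text, voice="zh", wpm=175):
    # -v selects the voice/language, -s sets the rate in words per minute
    return ["espeak", "-v", voice, "-s", str(wpm), text]

def say(text, voice="zh", wpm=175):
    subprocess.run(espeak_cmd(text, voice, wpm), check=True)

if shutil.which("espeak"):  # only speak when the binary is installed
    say("天气不错")
```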
2. edge-tts
Sample code for edge-tts (Microsoft's service; requires network access):
```python
import asyncio
import os
import sys

import edge_tts
from pydub import AudioSegment, playback

async def read(text):
    tts = edge_tts.Communicate(text=text, voice='zh-CN-YunxiNeural', rate='+5%')
    # remove any leftover file from a previous run, then save under a fixed name
    if os.path.exists('temp.mp3'):
        os.remove('temp.mp3')
    await tts.save('temp.mp3')

asyncio.run(read(sys.argv[1]))
playback.play(AudioSegment.from_mp3('temp.mp3'))
```
3. parler-tts
A recent Hugging Face release; voice prompts let you control the speaker's pitch, speed, gender, background noise level, emotional tone, and so on.
It currently supports English only; hopefully Chinese support will follow.
To use MPS on a Mac you need a torch nightly build:

```shell
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
Test code:

```python
# demo.py
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

# use CUDA or Apple-silicon MPS when available
if torch.cuda.is_available():
    device = "cuda:0"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# path to the model downloaded earlier
model_path = "<model download dir>/parler-tts/parler_tts_mini_v0.1"
model = ParlerTTSForConditionalGeneration.from_pretrained(model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# text to synthesize
prompt = "Hey, how are you doing today?"
# description of the desired voice
description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."
# description = "an old man"

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
4. sherpa
Reference documentation: https://k2-fsa.github.io/sherpa/sherpa.pdf
Sample code for sherpa-onnx is below; the model can be downloaded from https://hf-mirror.com/csukuangfj/vits-zh-aishell3
```python
import soundfile as sf
import sherpa_onnx

def write(text, output_filename, sid=10, provider='cpu'):
    tts_config = sherpa_onnx.OfflineTtsConfig(
        model=sherpa_onnx.OfflineTtsModelConfig(
            vits=sherpa_onnx.OfflineTtsVitsModelConfig(
                model='vits-aishell3.onnx',
                lexicon='lexicon.txt',
                tokens='tokens.txt',
            ),
            provider=provider,
        ),
        rule_fsts='number.fst',
        max_num_sentences=2,
    )
    audio = sherpa_onnx.OfflineTts(tts_config).generate(text, sid=sid)
    sf.write(
        output_filename,
        audio.samples,
        samplerate=audio.sample_rate,
        subtype="PCM_16",
    )
```
On GPU it cannot be installed directly with pip. First install onnxruntime-gpu (for Jetson, see https://www.elinux.org/Jetson_Zoo), then locate the libonnxruntime_providers_***.so files and create symlinks:

```shell
sudo ln -s /usr/local/libXXX.so /usr/lib/libXXX.so
```
Then install it following https://k2-fsa.github.io/sherpa/onnx/python/index.html:

```shell
git clone https://github.com/k2-fsa/sherpa-onnx
export SHERPA_ONNX_CMAKE_ARGS="-DSHERPA_ONNX_ENABLE_GPU=ON"
cd sherpa-onnx
python3 setup.py install
```
5. pyttsx3 and pyttsx4

Install (on macOS, pyobjc is also required):

```shell
pip install pyttsx3
pip install pyttsx4
pip install -U pyobjc
```

Usage is very simple:

```python
import pyttsx3
pyttsx3.speak("I will speak this text")
```

pyttsx4 works the same way:

```python
import pyttsx4
engine = pyttsx4.init()
engine.say('this is an english text to voice test.')
engine.runAndWait()
```
6. Melo-tts
See https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#linux-and-macos-install
It is fast and sounds decent. If it fails to run, check your PyTorch version first. So far this offers the best balance of quality and speed.
```shell
git clone https://hub.nuaa.cf/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
```
Then install unidic and unidic-lite, and copy the dicdata folder from the unidic-lite package into the unidic directory.
Next, install nltk_data under your home directory: https://hub.nuaa.cf/nltk/nltk_data
Then set `export HF_ENDPOINT=https://hf-mirror.com`.
Finally run `melo-ui`, or run `python melo/app.py` in the repo directory, to open the web page.

Command line:

```shell
melo "text-to-speech 领域近年来发展迅速" zh.wav -l ZH
```
Using the Python API:

```python
from melo.api import TTS

model = TTS(language='ZH', device='auto')
model.tts_to_file("text-to-speech 领域近年来发展迅速", 1, 'zh.wav')
```
There is also a C++ port of MeloTTS; see https://github.com/apinge/MeloTTS.cpp/blob/ZH_MIX_EN/README.zh-CN.md and https://github.com/zhaohb/MeloTTS-OV/tree/speech-enhancement-and-npu, which look faster still.
7. piper
Supports many languages and is optimized for the Raspberry Pi: https://github.com/rhasspy/piper#running-in-python.
piper is extremely fast and sounds acceptable.
Installation on macOS is slightly unusual:

```shell
pip install piper-phonemize-cross
pip install piper-tts --no-deps
```

Voices can be downloaded here: https://github.com/rhasspy/piper/blob/master/VOICES.md
Usage is very simple:

```shell
echo '请问有什么可以帮助您的?' | piper --model m.onnx --output_file welcome.wav
```
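The same CLI can be driven from Python by piping text into piper's stdin. A sketch; the helper names are mine, and it assumes the `piper` binary plus a downloaded .onnx voice:

```python
# Piping text into the piper CLI from Python; helper names are illustrative.
import shutil
import subprocess

def piper_cmd(model, out_wav):
    # same flags as the shell one-liner above
    return ["piper", "--model", model, "--output_file", out_wav]

def synth(text, model="m.onnx", out_wav="welcome.wav"):
    subprocess.run(piper_cmd(model, out_wav), input=text.encode("utf-8"), check=True)

if shutil.which("piper"):  # only run when the binary is installed
    synth("请问有什么可以帮助您的?")
```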
8. voiceapi
This is a streaming system built on sherpa-onnx. Code: https://github.com/ruzhila/voiceapi. Download the models listed on the page into the model folder (silero_vad.onnx goes under silero_vad), then run `python app.py` to start the web page. You can also call the HTTP interface:

```shell
curl -X POST "http://localhost:8000/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "sid": 0,
    "samplerate": 16000
  }' -o helloworld.wav
```
The streaming interface is documented in the project repository.
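The same /tts call can be made from Python with requests; a sketch assuming the voiceapi server above is running on localhost:8000 (the helper names are mine):

```python
# POSTing to voiceapi's /tts endpoint from Python; helper names are
# illustrative, and the server from the repo above must be running.
def tts_payload(text, sid=0, samplerate=16000):
    # mirrors the JSON body shown in the curl example
    return {"text": text, "sid": sid, "samplerate": samplerate}

def save_tts(text, out_path, base_url="http://localhost:8000"):
    import requests  # pip install requests
    resp = requests.post(f"{base_url}/tts", json=tts_payload(text), timeout=60)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw wav bytes
```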
9. F5-tts
The original is at https://github.com/SWivid/F5-TTS; an ONNX version is at https://github.com/DakeQQ/F5-TTS-ONNX. Only the ONNX version is covered here.
10. chat-tts
A collection of derivative projects: https://github.com/libukai/Awesome-ChatTTS/tree/en
ONNX version: https://github.com/ZillaRU/ChatTTS-ONNX/tree/main
Usage is simple: install with `pip install ChatTTS`, then download the assets from https://huggingface.co/2Noise/ChatTTS/tree/main/asset
```python
import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=True)  # Set to True for better performance

texts = ["你好"]
wavs = chat.infer(texts)
torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)
```
11. coqui-ai
Multilingual and currently very popular: https://github.com/coqui-ai/TTS
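A minimal sketch of coqui's Python API; the model name is just one entry from `tts --list_models` (any listed model works), and nothing here is verified against a local install:

```python
# Sketch of the coqui-ai TTS Python API; the model name is an example
# from `tts --list_models`, not a recommendation.
import importlib.util

def coqui_synth(text, out_path, model_name="tts_models/zh-CN/baker/tacotron2-DDC-GST"):
    from TTS.api import TTS  # pip install TTS
    tts = TTS(model_name=model_name)   # downloads the model on first use
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path

if importlib.util.find_spec("TTS"):  # only run when the package is installed
    coqui_synth("天气不错", "coqui_out.wav")
```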
12. cosyvoice & cosyvoice2
13. tortoise-tts
Original version:
There is a C++ version: https://github.com/balisujohn/tortoise.cpp.git
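For reference, the original project's Python API looks roughly like this, sketched from memory of the tortoise-tts README; treat the preset name and the 24 kHz output rate as unverified assumptions:

```python
# Rough sketch of the original tortoise-tts API; the preset name and
# sample rate are assumptions, double-check against the README.
def tortoise_synth(text, out_path="tortoise_out.wav", preset="fast"):
    import torchaudio
    from tortoise.api import TextToSpeech  # pip install tortoise-tts
    tts = TextToSpeech()
    gen = tts.tts_with_preset(text, preset=preset)  # tensor of audio samples
    torchaudio.save(out_path, gen.squeeze(0).cpu(), 24000)
    return out_path
```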