Online speech-to-text over HTTP with sherpa-onnx

This post doesn't create technology; it only passes it along!

Preface

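A client recently asked for voice input that is converted to text and then used as the text input to a large language model. I tested Whisper and sherpa-onnx models: Whisper was too slow, while sherpa-onnx struck a good balance between speed and accuracy, so I used it for this deployment test.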

Implementation

import wave
import numpy as np
import sherpa_onnx
from flask import Flask, request, jsonify
import io

app = Flask(__name__)

# Paths to the streaming zipformer model files (adjust model_dir to your setup)
model_dir = '/home/project_python/whisper/sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12'
encoder_model_path = f'{model_dir}/encoder-epoch-20-avg-1-chunk-16-left-128.onnx'
decoder_model_path = f'{model_dir}/decoder-epoch-20-avg-1-chunk-16-left-128.onnx'
joiner_model_path = f'{model_dir}/joiner-epoch-20-avg-1-chunk-16-left-128.onnx'
tokens_path = f'{model_dir}/tokens.txt'
recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    encoder=encoder_model_path,
    decoder=decoder_model_path,
    joiner=joiner_model_path,
    tokens=tokens_path,
    sample_rate=16000,
    feature_dim=80,
    provider="cpu"
)

def read_wave(wave_data):
    """
    Args:
      wave_data:
        Bytes of a wave file. It should be single channel and each sample should
        be 16-bit. Its sample rate does not need to be 16kHz.
    Returns:
      Return a tuple containing:
       - A 1-D array of dtype np.float32 containing the samples, which are
       normalized to the range [-1, 1].
       - sample rate of the wave file
    """

    with wave.open(io.BytesIO(wave_data), 'rb') as f:
        assert f.getnchannels() == 1, f.getnchannels()
        assert f.getsampwidth() == 2, f.getsampwidth()  # it is in bytes
        num_samples = f.getnframes()
        samples = f.readframes(num_samples)
        samples_int16 = np.frombuffer(samples, dtype=np.int16)
        samples_float32 = samples_int16.astype(np.float32)

        samples_float32 = samples_float32 / 32768
        return samples_float32, f.getframerate()

@app.route('/transcribe', methods=['POST'])
def transcribe():
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'}), 400

    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400

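    if file:
        # Decode the uploaded WAV bytes into float32 samples in [-1, 1]
        wave_data = file.read()
        samples, sample_rate = read_wave(wave_data)

        # Feed the whole utterance into a fresh stream
        streams = []
        s = recognizer.create_stream()
        s.accept_waveform(sample_rate, samples)
        # Append 0.66 s of silence so the final frames of speech get decoded
        tail_paddings = np.zeros(int(0.66 * sample_rate), dtype=np.float32)
        s.accept_waveform(sample_rate, tail_paddings)
        s.input_finished()
        streams.append(s)

        # Keep decoding until no stream has enough buffered frames left
        while True:
            ready_list = []
            for s in streams:
                if recognizer.is_ready(s):
                    ready_list.append(s)
            if len(ready_list) == 0:
                break
            recognizer.decode_streams(ready_list)
        # One recognized text per stream (a single entry here)
        results = [recognizer.get_result(s) for s in streams]
        return jsonify({'results': results})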

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8079)
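
In the listing above, read_wave uses assert to enforce the mono, 16-bit requirement, so an unsupported upload surfaces as a server error rather than a clear message. The following is a minimal sketch of one way to check the format up front, using only the standard library. The helper name check_wave_format and its error messages are assumptions made for illustration; they are not part of sherpa-onnx or of the original code.

```python
import io
import wave
from typing import Optional


def check_wave_format(wave_data: bytes) -> Optional[str]:
    """Return None if the bytes are a mono 16-bit WAV, else an error message.

    Hypothetical helper: it mirrors the two assertions in read_wave but
    reports the problem as a string instead of raising AssertionError.
    """
    try:
        with wave.open(io.BytesIO(wave_data), "rb") as f:
            if f.getnchannels() != 1:
                return f"expected 1 channel, got {f.getnchannels()}"
            if f.getsampwidth() != 2:  # sample width is reported in bytes
                return f"expected 16-bit samples, got {8 * f.getsampwidth()}-bit"
    except (wave.Error, EOFError) as e:
        # wave.Error: malformed header; EOFError: data too short to parse
        return f"not a valid WAV file: {e}"
    return None
```

Inside transcribe(), such a helper could be called right after file.read(); if it returns a message, the handler could respond with jsonify({'error': message}) and status 400 before read_wave is reached.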

Testing with Postman

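The test file is a clip of the song “后会无期” that I sang myself. My pronunciation is not very standard, and the overall recognition quality was mediocre.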
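
For repeatable tests without Postman, the same request can be sent from a script. The sketch below uses only the Python standard library to build a multipart/form-data body with a form field named file (the field name the /transcribe handler reads) and post it. The function names build_multipart and transcribe_file, the default URL http://127.0.0.1:8079/transcribe (assuming the server runs locally on the port used above), and the file name sample.wav are assumptions for illustration.

```python
import json
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex  # random separator unlikely to occur in the audio
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, url: str = "http://127.0.0.1:8079/transcribe") -> dict:
    """POST a WAV file to the endpoint and return the decoded JSON response."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("file", "audio.wav", f.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (requires the Flask server to be running):
# print(transcribe_file("sample.wav")["results"])
```

On success the server replies with a JSON object whose results key holds a list of recognized strings, matching the jsonify call in the handler.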
