AISHELL-2 中文语音数据库

AISHELL-2是面向 Mandarin ASR 的一个大型开放源语音数据库,包含1000小时的清洁朗读语音数据,用于学术研究。数据覆盖12个领域,由1991位不同口音的中国发言人录制,文本准确率超过96%。此外,还提供了适用于工业应用的改进流程,支持多种先进的ASR技术。AISHELL-2旨在促进研究社区的转移学习和鲁棒ASR研究,以及帮助行业构建实际系统和产品。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Abstract

AISHELL-1 is by far the largest open-source speech corpus available for Mandarin speech recognition research. It was released with a baseline system containing solid training and testing pipelines for Mandarin ASR. In AISHELL-2, 1000 hours of clean read-speech data from iOS is published, which is free for academic usage. On top of AISHELL-2 corpus, an improved recipe is developed and released, containing key components for industrial applications, such as Chinese word segmentation, flexible vocabulary expension and phone set transformation etc. Pipelines support various state-of-the-art techniques, such as time-delayed neural networks and Lattic-Free MMI objective funciton. In addition, we also release dev and test data from other channels(Android and Mic). For research community, we hope that AISHELL-2 corpus can be a solid resource for topics like transfer learning and robust ASR. For industry, we hope AISHELL-2 recipe can be a helpful reference for building meaningful industrial systems and products.

Index Terms: Speech recognition, Mandarin ASR, Industrial Speech Recognition

Introduction

Automatic Speech Recognition (ASR) is a major application domain in the bloom of Artificial Intelligence (AI). Huge effort has been made from both research community and industry to improve ASR system performance. Among all solutions proposed, deep learning approach has been dominating for the last half decade. Given enough data, neural network (NN) models generally perform better in terms of recognition accuracy, and turn out to be more robust. From industrial perspective, accessing and collecting large amount of speech data has become easier than ever before, with emerging market of smart phones and various other smart devices. However, on the other hand, research community still has limited-access to real-world application data. As a result, improvements in research community do not always scale well to industrial scenarios. In computer vision, there are many high quality free data sets which transform research efforts into industrial applications, such as ImageNet [1] and COCO [2]. In Mandarin ASR, although there are corpus like thchs30 [3] and hkust [4], a large-scale high-quality free corpus is still needed.

AISHELL-2 is a 1000-hour Mandarin Chinese Speech Corpus. 718 hours are from AISHELL-ASR0009-[ZH-CN] and 282 hours are from AISHELL-ARS0010-[ZH-CN]. The speech utterance contains 12 domains, including keywords, voice command, smart home, autonomous driving, industrial production, etc.The recording was put in quiet indoor environment, using 3 different devices in parallel: high fidelity microphone (44.1kHz, 16-bit); Android-system mobile phone (16kHz, 16-bit), iOS-system mobile phone (16kHz, 16-bit). AISHELL-2 choose audio data record by iOS-system.1991 speakers from different accent areas in China were participate in this recording. The manual transcription accuracy rate is above 96%, through professional speech annotation and strict quality inspection.( This database is free for academic research, not in the commerce, if without permission. )

希尔贝壳中文普通话语音数据库AISHELL-2的语音时长为1000小时,其中718小时来自AISHELL-ASR0009-[ZH-CN],282小时来自AISHELL-ASR0010-[ZH-CN]。录音文本涉及唤醒词、语音控制词、智能家居、无人驾驶、工业生产等12个领域。录制过程在安静室内环境中, 同时使用3种不同设备: 高保真麦克风(44.1kHz,16bit);Android系统手机(16kHz,16bit);iOS系统手机(16kHz,16bit)。AISHELL-2采用iOS系统手机录制的语音数据。1991名来自中国不同口音区域的发言人参与录制。经过专业语音校对人员转写标注,并通过严格质量检验,此数据库文本正确率在96%以上。(支持学术研究,未经允许禁止商用。)

1000小时 | 1000 Hours

1991人中文普通话

1991 speakers in the recording

语音识别实验

声纹实验

Speech & Speaker Recognition evaluation

Kaldi系统应用

merged with Kaldi system

Kaldi recipe

希尔贝壳—专注于人工智能大数据和技术的创新北京希尔贝壳科技有限公司成立于2017年,是一家专注人工智能大数据和技术服务的创新公司。针对家居、车载、机器人等语音智能产品做精准场景语音数据并输出方案。利用机器学习平台,在语音数据评测、辅助转写、数据分析、智能语音客服等场景业务建立了领先的核心技术体系。http://www.aishelltech.com/aishell_2

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值