tensorflow_end2end_speech_recognition: a TensorFlow implementation of end-to-end speech recognition (CTC, Attention, and MTL training)

332 files in total

py: 207

yml: 59

txt: 30

tensorflow

end-to-end

speech-recognition

beam-search

automatic-speech-recognition

Uploaded 2021-02-06 07:34:18, 809KB ZIP

Archive contents: tensorflow_end2end_speech_recognition (332 files)

.gitignore 244B

LICENSE 1KB

README.md 2KB

README.md 1022B

README.md 228B

LDC93S1.phn 535B

attention_seq2seq.py 31KB

train_multitask_ctc.py 26KB

train_ctc.py 25KB

beam_search_decoder_from_tensorflow.py 25KB

train_attention.py 22KB

train_ctc_temp.py 21KB

finetune_ctc_dialog.py 20KB

train_joint_ctc_attention.py 20KB

train_attention.py 20KB

train_attention.py 18KB

multitask_ctc.py 18KB

blstm.py 18KB

train_ctc.py 18KB

train_student_cnn_xe.py 17KB

train_multitask_ctc.py 17KB

train_attention.py 17KB

train_ctc.py 17KB

student_ctc.py 16KB

save_ctc_prob.py 16KB

train_multitask_ctc.py 16KB

attention_layer.py 16KB

ctc.py 15KB

joint_ctc_attention.py 15KB

train_ctc.py 15KB

probs.py 15KB

ctc.py 14KB

test_encoder.py 13KB

beam_search_decoder.py 13KB

ctc.py 13KB

attention.py 12KB

attention_decoder.py 12KB

lstm.py 11KB

test_attention.py 10KB

cldnn_wang.py 10KB

test_ctc.py 10KB

decode_multitask_ctc.py 10KB

eval_ensemble4_ctc.py 10KB

bn_lstm.py 10KB

ctc.py 10KB

plot_ctc_posterior.py 9KB

test_joint_ctc_attention.py 9KB

decode_ctc.py 9KB

eval_ctc.py 9KB

decode_ctc.py 9KB

edit_distance.py 9KB

attention.py 9KB

vgg_blstm.py 9KB

vgg_lstm.py 9KB

attention.py 9KB

dynamic_decoder.py 9KB

test_multitask_ctc.py 9KB

plot_attention_weights.py 9KB

lstm.py 9KB

plot_ctc_prob.py 8KB

plot_multitask_ctc_prob.py 8KB

ctc.py 8KB

plot_ctc_prob.py 8KB

model_base.py 8KB

decode_attention.py 8KB

decode_multitask_ctc.py 8KB

joint_ctc_attention.py 8KB

decode_attention.py 8KB

eval_multitask_ctc.py 8KB

multitask_ctc.py 8KB

attention.py 7KB

decode_ctc.py 7KB

bn_blstm_ctc.py 7KB

vgg_wang.py 7KB

eval_attention.py 7KB

plot_ctc_prob.py 7KB

pyramidal_blstm.py 7KB

eval_ctc.py 7KB

ctc.py 7KB

eval_ensemble8_ctc.py 7KB

eval_student.py 6KB

cnn_zhang.py 6KB

data.py 6KB

eval_multitask_ctc.py 6KB

eval_ctc_temp.py 6KB

eval_ctc.py 6KB

eval_attention.py 6KB

eval_framewise.py 6KB

beam_search_decoder.py 6KB

test_ctc_decoder.py 6KB

test_tf_qrnn_work.py 6KB

plot_ctc_posterior.py 6KB

qrnn.py 6KB

gru.py 6KB

test_load_dataset_ctc.py 5KB

student_cnn_compact_ctc.py 5KB

student_cnn_ctc.py 5KB

xe.py 5KB


## TensorFlow Implementation of End-to-End Speech Recognition

### Requirements
- TensorFlow >= 1.3.0
- tqdm >= 4.14.0
- python-Levenshtein >= 0.12.0
- setproctitle >= 1.1.10
- seaborn >= 0.7.1

### Corpus
#### [TIMIT](https://catalog.ldc.upenn.edu/LDC93S1)
- Phone (39, 48, 61 phones)
- Character

#### [LibriSpeech](http://www.openslr.org/12/)
- Phone (under implementation)
- Character
- Word

#### [CSJ (Corpus of Spontaneous Japanese)](http://pj.ninjal.ac.jp/corpus_center/csj/en/)
- Phone (under implementation)
- Japanese kana characters (about 150 classes)
- Japanese kanji characters (about 3000 classes)

The following corpora will be added in the future:
- Switchboard
- WSJ
- [AMI](http://groups.inf.ed.ac.uk/ami/corpus/)

This repository does not include pre-processing code. Pre-processing is based on [asr_preprocessing](https://github.com/hirofumi0810/asr_preprocessing); see that repository if you need to pre-process a corpus.

### Model
#### Encoder
- BLSTM
- LSTM
- BGRU
- GRU
- VGG-BLSTM
- VGG-LSTM
- Multi-task BLSTM
  - An additional CTC layer can be attached to an arbitrary layer.
- Multi-task LSTM
- VGG

#### Connectionist Temporal Classification (CTC) [\[Graves+ 2006\]](http://dl.acm.org/citation.cfm?id=1143891)
- Greedy decoder
- Beam search decoder
- Beam search decoder w/ CharLM (under implementation)

##### Options
- Frame-stacking [\[Sak+ 2015\]](https://arxiv.org/abs/1507.06947)
- Multi-GPU training (synchronous)
- Splicing
- Down sampling (under implementation)

#### Attention Mechanism
##### Decoder
- Greedy decoder
- Beam search decoder (under implementation)

##### Attention type
- Bahdanau's content-based attention
- Bahdanau's normed content-based attention (under implementation)
- Location-based attention
- Hybrid attention
- Luong's dot attention
- Luong's scaled dot attention (under implementation)
- Luong's general attention
- Luong's concat attention
- Baidu's attention (under implementation)

###### Options
- Sharpening
- Temperature regularization in the softmax layer (output posteriors)
- Joint CTC-Attention [\[Kim 2016\]](https://arxiv.org/abs/1609.06773)
- Coverage (under implementation)

### Usage
Please refer to the docs for each corpus:
- TIMIT
- LibriSpeech
- CSJ

### License
MIT

### Contact
[email protected]
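
### Example: greedy CTC decoding

The greedy decoder listed under CTC above can be illustrated with a minimal pure-Python sketch. This is not the repository's implementation: the function name `ctc_greedy_decode`, its arguments, and the toy scores are hypothetical and chosen only for illustration. It assumes the blank label is the last class index, which is the convention used by TensorFlow 1.x CTC ops. The idea is to pick the best class per frame, collapse consecutive repeats, and then drop blanks.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, blank_index):
    """Greedy (best-path) CTC decoding for a single utterance.

    frame_scores: list of per-frame score lists, shape (time, num_classes).
    blank_index:  index of the CTC blank label (assumed: num_classes - 1).
    Returns the decoded label sequence as a list of ints.
    """
    # 1. Take the highest-scoring class at every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove the blank label.
    return [label for label in collapsed if label != blank_index]


# Toy input: 6 frames, 3 classes (labels 0 and 1, blank = 2).
scores = [
    [0.1, 0.2, 0.7],  # blank
    [0.8, 0.1, 0.1],  # 0
    [0.7, 0.2, 0.1],  # 0 (repeat, collapsed with the previous frame)
    [0.1, 0.1, 0.8],  # blank (separates the two occurrences of 0)
    [0.6, 0.3, 0.1],  # 0
    [0.2, 0.7, 0.1],  # 1
]
decoded = ctc_greedy_decode(scores, blank_index=2)
print(decoded)  # [0, 0, 1]
```

The blank frame between the two runs of label 0 is what lets CTC emit the same label twice in a row. A beam search decoder keeps several candidate paths per frame instead of only the best one.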