ConWST: Non-native multi-source knowledge distillation for low resource speech translation

W Zhu, H Jin, JW Chen, L Luo, J Wang, Q Lu, A Li
International Conference on Cognitive Systems and Signal Processing, 2021, Springer
Abstract
In the absence of source speech information, most end-to-end speech translation (ST) models show unsatisfactory results. For low-resource non-native speech translation, we therefore propose a self-supervised bidirectional distillation processing system. It improves ST performance by using a large amount of untagged speech and text in a complementary way, without adding source information. The framework is based on an attentional Seq2seq model, which uses wav2vec2.0 pre-training to guide the Conformer encoder in reconstructing the acoustic representation. The decoder generates target tokens by fusing out-of-domain embeddings. We investigate the use of byte pair encoding (BPE) and compare it with several fusion techniques. Under the ConWST framework, we conducted experiments on transcription from Swahili to English. The experimental results show that transcription under this framework outperforms the baseline model and appears to be one of the best-performing transcription methods.
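The core idea described in the abstract, a frozen pre-trained wav2vec 2.0 model guiding a Conformer encoder to reconstruct acoustic representations without source-language labels, can be sketched roughly as below. This is a minimal illustration, not the paper's actual system: it assumes the HuggingFace transformers checkpoint facebook/wav2vec2-base as the teacher and torchaudio's Conformer as the student, and the model sizes, projection layer, and crude frame truncation are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchaudio.models import Conformer
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Frozen, pre-trained wav2vec 2.0 acting as the teacher (checkpoint name is an assumption).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
teacher = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

# Small Conformer student encoder over log-Mel features; all sizes are illustrative.
student = Conformer(
    input_dim=80,
    num_heads=4,
    ffn_dim=256,
    num_layers=4,
    depthwise_conv_kernel_size=31,
)
proj = nn.Linear(80, teacher.config.hidden_size)  # map student features to teacher dim


def acoustic_distillation_loss(fbank, fbank_lengths, waveform):
    """MSE between frozen teacher representations and projected student outputs.

    fbank: (1, T, 80) log-Mel features; waveform: (num_samples,) 16 kHz audio.
    The simple length truncation below stands in for a proper frame alignment.
    """
    with torch.no_grad():
        inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
        target = teacher(inputs.input_values).last_hidden_state  # (1, T_teacher, 768)
    out, _ = student(fbank, fbank_lengths)                        # (1, T_student, 80)
    T = min(target.shape[1], out.shape[1])
    return nn.functional.mse_loss(proj(out[:, :T]), target[:, :T])


# Example call with dummy data: one second of audio and 100 feature frames.
loss = acoustic_distillation_loss(
    torch.randn(1, 100, 80), torch.tensor([100]), torch.randn(16000)
)
loss.backward()  # gradients flow only into the Conformer encoder and the projection
```

The BPE targets mentioned in the abstract would typically come from a subword tokenizer (for example SentencePiece) trained on the English side; that step is omitted here.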