Intro to Auto Speech Recognition -- How ML Learns Speech-to-Text
What is Auto Speech Recognition (ASR)?
"Hello, how are you?"
Voice
↓
Speech Waveform
↓
Feature from Audio (e.g., Spectrogram)
↓
Auto ML
↓
Text
Automatic Speech Recognition (ASR)
The technology of converting speech into written form (speech-to-text) so that a human can read the text and interpret its meaning.
1/7
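The waveform-to-feature step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used in the talk: the 25 ms frame length follows the editor's note for this deck, while the 10 ms hop, the Hamming window, the 16 kHz sample rate, and the synthetic 440 Hz test tone are all assumptions made for the example.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Slice a 1-D waveform into overlapping frames (one row per frame)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i of the index matrix selects samples [i*hop_len, i*hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def log_spectrogram(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return a (frames x frequency bins) log-power spectrogram."""
    frames = frame_signal(signal, sample_rate, frame_ms, hop_ms)
    window = np.hamming(frames.shape[1])          # taper each frame to reduce leakage
    spectrum = np.fft.rfft(frames * window, axis=1)
    power = np.abs(spectrum) ** 2
    return np.log(power + 1e-10)                  # small offset avoids log(0)

# Assumed test input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440.0 * t)

features = log_spectrogram(waveform, sample_rate)
print(features.shape)  # one feature vector per frame
```

Each row of `features` is the feature vector for one speech frame; a real system would typically go on to mel filterbanks or MFCCs, which are left out here.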
How to Understand Speech: Approach by Educated Human
• Reading Spectrogram
source: Step by step through a spectrogram, https://www.youtube.com/watch?v=lfZ6XSRaRR8
2/7
How to Understand Speech: Approach by AI
• Statistical ASR System
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model
↓
Pronunciation Model
↓
Language Model
↓
Text
• End-to-End Neural ASR System
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model with DNN
↓
Language Model with DNN
↓
Text
(Graves and Jaitly, 2014, “Towards End-to-End Speech Recognition with Recurrent Neural Networks”, ICML)
(Chan et al., 2016, “Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition”, ICASSP)
3/7
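To make the end-to-end idea more concrete: the Graves and Jaitly paper cited above trains its network with Connectionist Temporal Classification (CTC), where the network emits one label distribution per frame, including a special blank label. The sketch below shows only greedy CTC decoding (collapse repeats, then drop blanks) on hand-made probabilities. The alphabet, the blank index, and the probability values are hypothetical, and this is not the decoder used in either cited paper (Listen, Attend and Spell uses attention rather than CTC).

```python
import numpy as np

BLANK = 0                                      # assumed index of the CTC blank label
ALPHABET = ["<blank>", "h", "e", "l", "o"]     # hypothetical toy alphabet

def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the best label per frame, merge repeated labels, then remove blanks."""
    best_path = np.argmax(frame_probs, axis=1)
    decoded = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)

# Hand-made per-frame distributions whose best path is: h h _ e l _ l o o
path = [1, 1, 0, 2, 3, 0, 3, 4, 4]
frame_probs = np.full((len(path), len(ALPHABET)), 0.05)
frame_probs[np.arange(len(path)), path] = 0.8  # each row sums to 1.0

text = ctc_greedy_decode(frame_probs, ALPHABET)
print(text)
```

The blank between the two "l" frames is what lets the decoder keep a genuine double letter instead of merging it.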
Challenges in a System Using Only an Acoustic Model
Word: “Probably”
Dictionary pronunciation: p r aa b ax b l iy
Actual pronunciations (many common ways to pronounce):
• p r aa b iy
• p r aw l uh
• p r aa l iy
• p aa b uh b l iy
• p ow ih
• p aa iy
• p r ah b iy
(Preethi Jyothi, 2017, “Automatic Speech Recognition - An Overview”)
Language Model: chooses between candidate words that sound the same
I ___ play tennis.
• probably (p r aa b iy)
• probability (p r aa b iy)
4/7
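The role of the language model in the "I ... play tennis" example can be sketched with a toy bigram model. Every count below is invented purely for illustration (no corpus was consulted), and add-one smoothing is an assumed choice; the point is only that word context, not acoustics, separates "probably" from "probability".

```python
import math

# Hypothetical bigram and unigram counts, made up for this illustration.
BIGRAM_COUNTS = {
    ("i", "probably"): 20,
    ("probably", "play"): 15,
    ("i", "probability"): 1,
    ("play", "tennis"): 30,
}
UNIGRAM_COUNTS = {"i": 100, "probably": 40, "probability": 40, "play": 60, "tennis": 30}
VOCAB_SIZE = len(UNIGRAM_COUNTS)

def bigram_log_prob(words):
    """Log-probability of a word sequence under an add-one smoothed bigram model."""
    total = 0.0
    for prev, word in zip(words, words[1:]):
        count = BIGRAM_COUNTS.get((prev, word), 0)
        total += math.log((count + 1) / (UNIGRAM_COUNTS[prev] + VOCAB_SIZE))
    return total

def pick_candidate(left, candidates, right):
    """Choose the candidate word that makes the whole sentence most likely."""
    return max(candidates, key=lambda w: bigram_log_prob(left + [w] + right))

best = pick_candidate(["i"], ["probably", "probability"], ["play", "tennis"])
print(best)
```

In a full recognizer this score would be combined with the acoustic and pronunciation model scores rather than used alone.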
Beyond Text – Insights
Insight
• Topic Classification
• Semantic Parsing and Question Answering
• Customer Segmentation / Prioritization
• Summary Generation
Voice
↓
Sound Waveform
↓
Feature
↓
Auto ML
↓
Text
↓
Insights!
5/7
Current Challenges in ASR
• Noisy real-life conversation with multiple speakers
• Robustness to variations in age and accent
• Integration of effort across multiple dialects with transfer learning
• Embedded ASR systems running locally on mobile devices without an internet connection
• Bad channel conditions (intermittently dropping voice)
(Preethi Jyothi, 2018, “State-of-the-Art in Speech Technologies”)
6/7
Great Study Materials on Speech Recognition on YouTube
Deep Learning Lecture Series
1. “CS231N Winter 2016”
Convolutional Neural Networks for Visual Recognition by Stanford University (Andrej Karpathy ver.), 16 videos
2. “CS231N Spring 2017”
Convolutional Neural Networks for Visual Recognition by Stanford University, 16 videos
3. “CS224N Winter 2017”
Natural Language Processing with Deep Learning by Stanford University, 19 videos
Auto Speech Recognition Sessions
1. Automatic Speech Recognition - An Overview
Presenter is Preethi Jyothi, IIT Bombay in Sep 2017
2. State-of-the-Art in Speech Technologies
Presenter is Preethi Jyothi, IIT Bombay in Jan 2018
3. Lecture 2 | Word Vector Representations: word2vec
Lecture 2 in Natural Language Processing with Deep Learning
4. Step by step through a spectrogram
Lecturer is Andy McMillin, Clinical Associate Professor at Portland State University
7/7

Editor's Notes

  • #3: The signal is cut into short windows because, within about 25 milliseconds, the speech signal is approximately stationary. Starting from the raw speech waveform, we generate tiny slices called speech frames; each speech frame is represented by a feature.
  • #4: Amy Costanza Smith
  • #8: Academic Research Summit, which was co-organized by Microsoft Research, was held at the International Institute of Information Technology (IIIT) Hyderabad on the 24th and 25th of January 2018.