Speech Recognition

The presentation discusses speech recognition, including its history from the 1960s to present day, the basic structure of speech recognition systems, and common types and models. It covers applications in areas like car systems, medical documentation, aircraft, and education. Challenges are also outlined such as handling noise, overlapping speech, and differences between languages. The future of speech recognition is explored with DARPA projects aiming for improved translation accuracy.

PowerPoint Presentation on
SPEECH RECOGNITION
Submitted To: Prof. Neha Mathur
Submitted By: Anisha Mittal
CONTENTS
 Introduction
 History
 Types Of Speech Recognition
 Models Of Speech Recognition
 Applications
 Drawbacks and Challenges
 Future Of Speech Recognition
INTRODUCTION
Speech recognition means talking to a computer and having it
recognize what we are saying. It is also known as Automatic
Speech Recognition (ASR), Computer Speech Recognition, or
Speech to Text (STT): the computer understands the spoken
input and performs any required task.
 Speech Recognition is the process of converting an
acoustic signal, captured by a microphone, to a set of
words.
 The recognized words can be an end in themselves, as in
applications such as command and control, data entry,
and document preparation.
 They can also serve as the input to further linguistic
processing in order to achieve speech understanding.
BASIC STRUCTURE
HISTORY
 Around since the 1960s, ASR has seen steady, incremental
improvement over the years.
 It has benefited greatly from the increased processing speed
of computers in the last decade, entering the marketplace in
the mid-2000s.
 Early systems were based on acoustic phonetics and worked
with small vocabularies to identify isolated words.
 Over the years, vocabularies have grown and ASR systems
have become statistics-based.
 They now have large vocabularies and can recognize
continuous speech.
TYPES OF SPEECH RECOGNITION
1. Speaker-dependent: trained on one user's voice; commonly
used for dictation software.
2. Speaker-independent: works without per-user training; more
commonly used in telephone applications.
MODELS OF SPEECH RECOGNITION
 First, digital sampling is done. When you speak, you create
vibrations in the air, and an Analog-to-Digital Converter
(ADC) translates this analog wave into digital data that the
computer can understand. To do this, it samples the sound
by taking precise measurements of the wave at frequent
intervals.
 The system then filters the sampled signal to remove
unwanted noise.
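The sampling and filtering steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16 kHz sample rate, 16-bit depth, sine-wave input, and moving-average filter are choices made for the example, not details given in the slides, and the function names are hypothetical.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate=16000, bits=16):
    """Measure a sine tone at regular intervals and round each
    measurement to a signed integer, roughly as an ADC would."""
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    return [
        round(max_level * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]

def moving_average(samples, width=5):
    """Crude noise filter (assumed for illustration): each output is
    the mean of the most recent `width` samples, or fewer at the start."""
    out = []
    running = 0
    for i, s in enumerate(samples):
        running += s
        if i >= width:
            running -= samples[i - width]    # drop the sample leaving the window
        out.append(running / min(i + 1, width))
    return out

digital = sample_and_quantize(freq_hz=440, duration_s=0.01)
smoothed = moving_average(digital)
print(len(digital), "samples; first values:", digital[:4])
```

A higher sample rate or bit depth captures the wave more precisely at the cost of more data to process.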
ACOUSTIC MODEL
 Next the signal is divided into small segments as short as a
few hundredths of a second, or even thousandths in the
case of plosive consonant sounds -- consonant stops
produced by obstructing airflow in the vocal tract -- like
"p" or "t."
 The program then matches these segments to
known phonemes in the appropriate language.
 A phoneme is the smallest element of a language -- a
representation of the sounds we make and put together to
form meaningful expressions.
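As a rough sketch of the segment-and-match idea above, the Python below cuts a signal into short overlapping frames and labels a frame with the closest stored template. The 25 ms frame, 10 ms step, the two features (energy and zero-crossing rate), and the template values are all assumptions made for illustration; practical recognizers use richer features and statistical models.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Cut the signal into overlapping segments a few hundredths of a second long."""
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]

def frame_features(frame):
    """Two toy features: average energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / len(frame))

def nearest_phoneme(features, templates):
    """Return the label whose template is closest (squared Euclidean distance)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: distance(features, templates[label]))

# Hypothetical templates: (energy, zero-crossing rate) per phoneme label.
templates = {"s": (0.1, 0.5), "a": (0.5, 0.05)}
frames = split_into_frames([0] * 1600, sample_rate=16000)
print(len(frames), "frames of", len(frames[0]), "samples")
print(nearest_phoneme((0.45, 0.1), templates))
```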
LANGUAGE MODEL
 The program examines phonemes in the context of the
other phonemes around them.
 The program then determines what the user was probably
saying and either outputs it as text or issues a computer
command.
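The use of context can be illustrated with a toy bigram model. The sketch below applies the same idea at the word level rather than the phoneme level: it counts which words follow which in a tiny corpus and picks the candidate transcription with the higher score. The corpus, the add-one smoothing, and the function names are assumptions made for the example, not the method of any particular recognizer.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count single words and adjacent word pairs; "<s>" marks sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_score(words, unigrams, bigrams):
    """Sum of log bigram probabilities with add-one smoothing."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + words
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

def pick_best(candidates, unigrams, bigrams):
    """Choose the candidate sentence the model finds most probable."""
    return max(candidates, key=lambda c: log_score(c.lower().split(), unigrams, bigrams))

corpus = [
    "they left their coats over there",
    "their car is over there",
    "put it over there",
]
unigrams, bigrams = train_bigrams(corpus)
best = pick_best(
    ["put their coats over there", "put there coats over their"],
    unigrams,
    bigrams,
)
print(best)
```

Both candidates sound identical, so only the surrounding words let the program prefer one spelling over the other.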
APPLICATIONS OF SPEECH RECOGNITION

1. In-Car Systems - Typically a manual control input, for example by means
of a finger control on the steering-wheel, enables the speech recognition
system and this is signaled to the driver by an audio prompt. Following the
audio prompt, the system has a "listening window" during which it may
accept a speech input for recognition.

2. Medical documentation - In the health care sector, speech
recognition can be implemented in the front end or back end of the medical
documentation process. Front-end speech recognition is where the provider
dictates into a speech-recognition engine, the recognized words are
displayed as they are spoken, and the person dictating is responsible for
editing and signing off on the document. Back-end or deferred speech
recognition is where the provider dictates into a digital dictation system,
the voice is routed through a speech-recognition machine, and the recognized
draft document is routed along with the original voice file to the editor,
where the draft is edited and the report is finalized.

3. High-performance fighter aircraft - Substantial efforts have been
devoted in the last decade to the test and evaluation of speech recognition
in fighter aircraft. Of particular note has been the US program in speech
recognition for a variety of aircraft platforms. In these programs, speech
recognizers have been operated successfully in fighter aircraft, with
applications including setting radio frequencies, commanding an autopilot
system, setting steer-point coordinates and weapons release parameters, and
controlling flight displays.

4. Usage in education and daily life - Speech recognition can be useful
for learning a second language. It can teach proper pronunciation, in
addition to helping a person develop fluency in their speaking skills.
DRAWBACKS
 Low signal-to-noise ratio - The program needs to "hear" the words
spoken distinctly, and any extra noise introduced into the sound will interfere
with this.
 Overlapping speech - Current systems have difficulty separating
simultaneous speech from multiple users.
 Intensive use of computer power.
 Homonyms - e.g. "there" and "their," "air" and "heir," "be" and "bee."
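Signal-to-noise ratio is usually quoted in decibels, computed from the average power of the speech and of the noise. The short sketch below shows the standard formula, 10 times the base-10 logarithm of the power ratio; the sample values are made up for illustration.

```python
import math

def average_power(samples):
    """Mean of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(average_power(signal) / average_power(noise))

# Illustrative values: speech samples ten times larger in amplitude than the noise.
speech = [10, -10, 10, -10]
noise = [1, -1, 1, -1]
print(round(snr_db(speech, noise), 1), "dB")
```

A lower figure means the noise is closer in strength to the speech, which is the condition the first drawback describes.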
MAJOR CHALLENGES
 Making a system that can flawlessly handle roadblocks like
slang, dialects, accents and background noise.
 The different grammatical structures used by languages
can also pose a problem. For example, Arabic sometimes
uses single words to convey ideas that are entire
sentences in English.
FUTURE OF SPEECH RECOGNITION
 The Defense Advanced Research Projects Agency (DARPA) has three
teams of researchers working on Global Autonomous Language
Exploitation (GALE), a program that will take in streams of
information from foreign news broadcasts and newspapers and
translate them.
 It hopes to create software that can instantly translate between two
languages with at least 90 percent accuracy.
 DARPA is also funding an R&D effort called TRANSTAC to enable
soldiers to communicate more effectively with civilian populations in
non-English-speaking countries.
THANK YOU!!!
