Speech Recognition

The presentation discusses speech recognition, including its history from the 1960s to present day, the basic structure of speech recognition systems, and common types and models. It covers applications in areas like car systems, medical documentation, aircraft, and education. Challenges are also outlined such as handling noise, overlapping speech, and differences between languages. The future of speech recognition is explored with DARPA projects aiming for improved translation accuracy.

PowerPoint Presentation

On
SPEECH RECOGNITION
Submitted To: Prof. Neha Mathur
Submitted By: Anisha Mittal
CONTENTS
 Introduction
 History
 Types Of Speech Recognition
 Models Of Speech Recognition
 Applications
 Drawbacks and Challenges
 Future Of Speech Recognition
INTRODUCTION
Speech recognition basically means talking to a computer and
having it recognize what we are saying. It is also known as
Automatic Speech Recognition (ASR), Computer Speech
Recognition or Speech-to-Text (STT): the computer understands
the voice input and performs any required task.
 Speech Recognition is the process of converting an
acoustic signal, captured by a microphone, to a set of
words.
 The recognized words can be an end in themselves, as for
applications such as commands and control, data entry,
and document preparation.
 They can also serve as the input to further linguistic
processing in order to achieve speech understanding.
BASIC STRUCTURE
HISTORY
 Around since the 1960s, ASR has seen steady, incremental
improvement over the years.
 It has benefitted greatly from the increased processing speed of
computers in the last decade, entering the marketplace in
the mid-2000s.
 Early systems were acoustic phonetics-based and worked
with small vocabularies to identify isolated words.
 Over the years, vocabularies have grown while ASR systems
have become statistics-based.
 They now have large vocabularies and can recognize
continuous speech.
TYPES OF SPEECH RECOGNITION
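1. Speaker-dependent: the system is trained on one particular
user's voice; commonly used for dictation software.
2. Speaker-independent: the system works for any speaker without
prior training; more commonly used in telephone applications.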
MODELS OF SPEECH RECOGNITION
 First, digital sampling is done. When you speak, you
create vibrations in the air, and an analog-to-digital
converter (ADC) translates this analog wave into digital
data that the computer can understand. To do this, it samples
the sound by taking precise measurements of the wave at
frequent intervals.
 The system then filters the digitized sound to remove unwanted noise.
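The sampling and filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16 kHz sample rate, the 16-bit depth, the 440 Hz test tone and the moving-average filter are all choices made for the example, not details taken from the presentation, and the function names are hypothetical.

```python
import math

def sample_wave(freq_hz, duration_s, sample_rate_hz):
    """Measure a sine wave at evenly spaced instants, as an ADC would."""
    n = round(duration_s * sample_rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def quantize(samples, bits=16):
    """Map each measurement in [-1, 1] to a signed integer of the given width."""
    max_level = 2 ** (bits - 1) - 1
    return [round(s * max_level) for s in samples]

def moving_average(samples, window=5):
    """Very simple noise filter: mean of the current and previous samples."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        chunk = samples[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Assumed example values: a 440 Hz tone, 10 ms long, sampled at 16 kHz.
digital = quantize(sample_wave(440, 0.01, 16000))
smoothed = moving_average(digital)
```

A moving average is about the simplest possible smoothing filter; a real recognizer would more likely use filters designed for the frequency range of speech.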
ACOUSTIC MODEL
 Next the signal is divided into small segments as short as a
few hundredths of a second, or even thousandths in the
case of plosive consonant sounds (consonant stops
produced by obstructing airflow in the vocal tract), like
"p" or "t".
 The program then matches these segments to
known phonemes in the appropriate language.
 A phoneme is the smallest unit of sound in a language that can
distinguish one word from another. Phonemes represent the
sounds we make and put together to form meaningful expressions.
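As a rough sketch of the two steps above (cutting the signal into short segments, then matching each segment to a known phoneme), the following Python code splits samples into overlapping frames and picks the closest entry from a small template table. The 25 ms frame length, the 10 ms hop, the two toy features and the template values are all assumptions made for illustration; practical systems use richer features and statistical models rather than a nearest-template lookup.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=25, hop_ms=10):
    """Cut the signal into short overlapping segments (frames)."""
    frame_len = sample_rate_hz * frame_ms // 1000
    hop_len = sample_rate_hz * hop_ms // 1000
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames

def frame_features(frame):
    """Toy features: average energy and zero-crossing rate of one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / max(len(frame) - 1, 1))

def nearest_phoneme(features, templates):
    """Return the template name with the smallest squared distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda name: dist(features, templates[name]))

# Hypothetical templates: a vowel is loud with few zero crossings,
# a hissing "s" is quieter with many zero crossings.
TEMPLATES = {"a": (0.5, 0.05), "s": (0.1, 0.6)}
```

Calling `nearest_phoneme(frame_features(frame), TEMPLATES)` on each frame yields one phoneme guess per segment, which is the kind of sequence the language model then works on.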
LANGUAGE MODEL
 The program examines phonemes in the context of the
other phonemes around them.
 The program then determines what the user was probably
saying and either outputs it as text or issues a computer
command.
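One common way to decide what the user was "probably saying" is to score candidate word sequences by how often neighboring words occur together. The sketch below uses a bigram model with add-one smoothing to choose between sound-alike words. The counts are invented for the example and the function names are hypothetical; the presentation does not say which statistical model a given system uses.

```python
import math
from itertools import product

# Hypothetical bigram counts; "<s>" marks the start of a sentence.
BIGRAM_COUNTS = {
    ("<s>", "their"): 4, ("<s>", "there"): 6,
    ("their", "car"): 8, ("there", "car"): 1,
    ("car", "is"): 9,
    ("is", "their"): 1, ("is", "there"): 7,
}

def bigram_log_prob(words, counts, vocab_size):
    """Log-probability of a word sequence under an add-one smoothed bigram model."""
    total = 0.0
    prev = "<s>"
    for word in words:
        pair_count = counts.get((prev, word), 0)
        prev_count = sum(c for (first, _), c in counts.items() if first == prev)
        total += math.log((pair_count + 1) / (prev_count + vocab_size))
        prev = word
    return total

def best_sentence(candidates_per_slot, counts):
    """Try every combination of candidate words and keep the most probable one."""
    vocab = {w for pair in counts for w in pair}
    return max(
        product(*candidates_per_slot),
        key=lambda words: bigram_log_prob(words, counts, len(vocab)),
    )

# The acoustic model cannot tell "their" from "there"; context can.
slots = [["their", "there"], ["car"], ["is"], ["their", "there"]]
print(" ".join(best_sentence(slots, BIGRAM_COUNTS)))  # their car is there
```

Trying every combination is only practical for short examples like this one; with large vocabularies a recognizer has to search the candidates more selectively.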
APPLICATIONS OF SPEECH RECOGNITION

1. In-car systems - Typically a manual control input, for example by means
of a finger control on the steering wheel, enables the speech recognition
system, and this is signaled to the driver by an audio prompt. Following the
audio prompt, the system has a "listening window" during which it may
accept a speech input for recognition.

2. Medical documentation - In the health care sector, speech
recognition can be implemented in the front end or the back end of the medical
documentation process. Front-end speech recognition is where the provider
dictates into a speech-recognition engine, the recognized words are
displayed as they are spoken, and the provider is responsible for editing and
signing off on the document. Back-end or deferred speech recognition is
where the provider dictates into a digital dictation system, the voice is
routed through a speech-recognition machine, and the recognized draft
document is routed along with the original voice file to the editor, where
the draft is edited and the report finalized.
3. High-performance fighter aircraft - Substantial efforts have been
devoted in the last decade to the test and evaluation of speech recognition
in fighter aircraft. Of particular note have been US programs in speech
recognition for a variety of aircraft platforms. In these programs, speech
recognizers have been operated successfully in fighter aircraft, with
applications including setting radio frequencies, commanding an autopilot
system, setting steer-point coordinates and weapons release parameters, and
controlling flight displays.

4. Usage in education and daily life - Speech recognition can be useful
for learning a second language. It can teach proper
pronunciation, in addition to helping a person develop fluency in their
speaking skills.
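The "listening window" described for in-car systems can be pictured as a small state machine: a button press opens the window and plays a prompt, and speech is accepted only while the window is open. The class below is a minimal sketch. The 5-second window, the "beep" prompt, the one-utterance-per-press rule and all names are assumptions for illustration, not details of any real car system.

```python
class ListeningWindow:
    """Accept speech only for a limited time after a manual control input."""

    def __init__(self, window_s=5.0):
        self.window_s = window_s
        self.opened_at = None  # None means the recognizer is not listening

    def press_button(self, now_s):
        """Manual control input: open the window and return the audio prompt."""
        self.opened_at = now_s
        return "beep"

    def accept_speech(self, now_s):
        """Return True if speech arriving at now_s falls inside the window."""
        if self.opened_at is None:
            return False
        accepted = (now_s - self.opened_at) <= self.window_s
        self.opened_at = None  # assumption: one utterance per button press
        return accepted

window = ListeningWindow(window_s=5.0)
window.press_button(now_s=10.0)
print(window.accept_speech(now_s=12.0))  # True: spoken 2 s after the prompt
```

Timestamps are passed in explicitly so the behavior is easy to follow and test; a real system would read them from a clock.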
DRAWBACKS
 Low signal-to-noise ratio - The program needs to "hear" the words
spoken distinctly, and any extra noise introduced into the sound interferes
with this.

 Overlapping speech - Current systems have difficulty separating
simultaneous speech from multiple users.

 Intensive use of computing power.

 Homophones - Words that sound alike but are spelled differently,
e.g. "there" and "their," "air" and "heir," "be" and "bee."
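The signal-to-noise ratio in the first drawback is usually expressed in decibels as 10 * log10(signal power / noise power). The short sketch below computes it for two lists of samples. The sample values and function names are made up for the example, and in practice the noise power has to be estimated, since the noise is rarely available on its own.

```python
import math

def power(samples):
    """Average power of a signal: the mean of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(signal) / power(noise))

# Assumed example: speech ten times larger in amplitude than the noise.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
ratio = snr_db(speech, noise)  # about 20 dB
```

A lower value means the words are harder to "hear": at 0 dB the noise carries as much power as the speech itself.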
MAJOR CHALLENGES
 Making a system that can flawlessly handle roadblocks like
slang, dialects, accents and background noise.

 The different grammatical structures used by languages
can also pose a problem. For example, Arabic sometimes
uses single words to convey ideas that are entire
sentences in English.
FUTURE OF SPEECH RECOGNITION
 The Defense Advanced Research Projects Agency (DARPA) has three
teams of researchers working on Global Autonomous Language
Exploitation (GALE), a program that will take in streams of
information from foreign news broadcasts and newspapers and
translate them.

 It hopes to create software that can instantly translate between two
languages with at least 90 percent accuracy.

 DARPA is also funding an R&D effort called TRANSTAC to enable
soldiers to communicate more effectively with civilian populations in
non-English-speaking countries.
THANK YOU!!!
