Speech Recognition

By
Chhatbar Jay (14mecc03)
Lokender Sekhawat (14mecc08)

What is Speech Recognition?


Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.
It is also known as automatic speech recognition or computer speech recognition: the computer understands a voice and performs the required task.
Speech Recognition (SR) is the ability to translate dictation or spoken words to text.

Where can it be used?


Dictation
System control/navigation
Commercial/Industrial applications
Voice dialing

Block diagram of speech recognition

Speech Modeling
Acoustic Model
An acoustic model is created by taking audio recordings of speech and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech.

Language Model
Language modeling is used in many natural language processing applications, such as speech recognition. A language model tries to capture the properties of a language and to predict the next word in a speech sequence.
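
To make "predict the next word" concrete, here is a minimal sketch of a bigram language model in Python. It is only an illustration: the toy corpus and the function names `train_bigram_model` and `predict_next` are assumptions made for this example, and real recognizers use far larger corpora and more elaborate models.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, made up for illustration only.
corpus = [
    "switch on the tv",
    "switch on the lamp",
    "switch off the tv",
    "switch to channel nine",
]
model = train_bigram_model(corpus)
print(predict_next(model, "switch"))  # "on" follows "switch" most often
print(predict_next(model, "the"))     # "tv" follows "the" most often
```

A recognizer can use such counts to prefer word sequences that are likely in the language when the audio alone is ambiguous.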

TYPES OF VOICE RECOGNITION
There are two types of speech recognition. One is called speaker-dependent and the other is speaker-independent. Speaker-dependent software is commonly used for dictation, while speaker-independent software is more commonly found in telephone applications.
Speaker-dependent software works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

TYPES OF VOICE RECOGNITION (CONTD.)
Speaker-independent software is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses can't ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.
Speaker-independent speech recognition engines generally deal with this by limiting the grammars they use. By using a smaller list of recognized words, the speech engine is more likely to correctly recognize what a speaker said.
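
The effect of a small grammar can be sketched in Python. This example is an assumption-laden illustration, not how a real engine works: it compares text strings with the standard-library `difflib.get_close_matches`, whereas an actual recognizer compares acoustic evidence. The word list `GRAMMAR` and the helper `snap_to_grammar` are hypothetical names.

```python
import difflib

# A deliberately small grammar: the only words the system accepts.
GRAMMAR = ["on", "off", "tv", "fridge", "door"]


def snap_to_grammar(heard, grammar=GRAMMAR, cutoff=0.6):
    """Map a noisy hypothesis to the closest allowed word, or None."""
    matches = difflib.get_close_matches(heard.lower(), grammar, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(snap_to_grammar("of"))     # closest allowed word is "off"
print(snap_to_grammar("frige"))  # closest allowed word is "fridge"
print(snap_to_grammar("xyz"))    # nothing close enough, so None
```

With only five candidates, a slightly wrong hypothesis still lands on the intended word. With tens of thousands of candidates, many more words would be nearly as close, and errors become more likely.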

How do humans do it?

Articulation produces sound waves, which the ear conveys to the brain for processing

How might computers do it?

Acoustic waveform

Acoustic signal

Digitization
Acoustic analysis of the speech signal
Language interpretation

Speech recognition

DIFFERENT PROCESSES INVOLVED
Digitization
Converting the analogue signal into a digital representation

Signal processing
Separating speech from background noise

Phonetics
Dealing with variability in human speech

Phonology
Recognizing individual sound distinctions (similar phonemes)
Phonology is the systematic use of sound to encode meaning in any spoken human language

DIFFERENT PROCESSES INVOLVED (CONTD.)
Lexicology and syntax
Lexicology is the part of linguistics that studies words: their nature and meaning, their elements, the relations between words, word groups and the whole lexicon.

Semantics and pragmatics
Semantics is concerned with meaning
Pragmatics is concerned with bridging the explanatory gap between sentence meaning and speaker's meaning

Digitization
Analogue to digital conversion
Sampling and quantizing

Sampling is converting a continuous signal into a discrete signal

Quantizing is the process of approximating a continuous range of values by a relatively small set of discrete values
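
Sampling and quantizing can be sketched in a few lines of Python. This is an illustrative sketch only: the 440 Hz test tone, the 8000 Hz sample rate, the 3-bit depth and the function names `sample` and `quantize` are all assumptions chosen to keep the example small.

```python
import math


def sample(signal, duration_s, sample_rate_hz):
    """Sampling: read a continuous-time function at evenly spaced instants."""
    n = round(duration_s * sample_rate_hz)
    return [signal(i / sample_rate_hz) for i in range(n)]


def quantize(samples, bits):
    """Quantizing: snap values in [-1.0, 1.0] to 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    quantized = []
    for x in samples:
        x = max(-1.0, min(1.0, x))        # clip to the valid range
        index = round((x + 1.0) / step)   # index of the nearest level
        quantized.append(-1.0 + index * step)
    return quantized


def tone(t):
    """A 440 Hz sine wave standing in for the analogue signal."""
    return math.sin(2 * math.pi * 440 * t)


samples = sample(tone, duration_s=0.001, sample_rate_hz=8000)
coarse = quantize(samples, bits=3)
for original, approx in zip(samples, coarse):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Raising the sample rate captures more instants per second, and raising the bit depth shrinks the gap between each original value and its quantized approximation.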

Use filters to measure energy levels at various points on the frequency spectrum
Knowing the relative importance of different frequency bands (for speech) makes this process more efficient
E.g. high-frequency sounds are less informative, so they can be sampled using a broader bandwidth (log scale)
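
The idea of measuring energy in frequency bands that widen on a log scale can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain discrete Fourier transform written with the standard library, rectangular bands, and made-up parameters (8000 Hz sample rate, a 1500 Hz test tone, five bands from 125 Hz to 4000 Hz). The function names are hypothetical, and practical front ends use faster transforms and smoother filter shapes.

```python
import cmath
import math


def power_spectrum(samples):
    """Power at each DFT bin from 0 up to half the sample rate."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_band_edges(low_hz, high_hz, n_bands):
    """Band edges spaced evenly on a log scale, so bands widen with frequency."""
    ratio = (high_hz / low_hz) ** (1.0 / n_bands)
    return [low_hz * ratio ** b for b in range(n_bands + 1)]


def band_energies(samples, sample_rate_hz, edges_hz):
    """Sum the spectral power that falls inside each band."""
    n = len(samples)
    energies = [0.0] * (len(edges_hz) - 1)
    for k, power in enumerate(power_spectrum(samples)):
        freq = k * sample_rate_hz / n
        for b in range(len(energies)):
            if edges_hz[b] <= freq < edges_hz[b + 1]:
                energies[b] += power
                break
    return energies


rate = 8000
frame = [math.sin(2 * math.pi * 1500 * i / rate) for i in range(256)]
edges = log_band_edges(125.0, 4000.0, 5)  # roughly 125, 250, 500, 1000, 2000, 4000 Hz
energies = band_energies(frame, rate, edges)
print([round(e) for e in edges])
print(energies.index(max(energies)))  # index of the band holding the 1500 Hz tone
```

Each band is about twice as wide as the one below it, so the high frequencies are summarized by fewer, broader measurements than the low frequencies.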

Separating speech from background noise
Noise-cancelling microphones
Two mics, one facing the speaker, the other facing away
Ambient noise is roughly the same for both mics

Knowing which bits of the signal relate to speech
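
Both ideas can be sketched in Python under strong simplifying assumptions. The first function assumes the ambient noise reaches both mics identically and perfectly aligned, so plain subtraction removes it; real systems have to estimate and adapt to the difference between the two mics. The second function assumes that speech is simply louder than the background, using a hypothetical energy threshold. All names and sample values are made up for illustration.

```python
def cancel_noise(front_mic, rear_mic):
    """Subtract the rear (noise-only) mic from the front (speech + noise) mic."""
    if len(front_mic) != len(rear_mic):
        raise ValueError("both microphone signals must have the same length")
    return [f - r for f, r in zip(front_mic, rear_mic)]


def is_speech(frame, threshold):
    """Flag a frame as speech when its mean energy exceeds the threshold."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold


# Made-up signals: the front mic hears speech plus noise, the rear mic only noise.
speech = [0.0, 0.5, -0.5, 0.25]
noise = [0.1, -0.2, 0.05, 0.1]
front = [s + n for s, n in zip(speech, noise)]
rear = list(noise)

cleaned = cancel_noise(front, rear)
print([round(x, 3) for x in cleaned])

print(is_speech([0.5, -0.5, 0.4, -0.4], threshold=0.01))    # loud frame
print(is_speech([0.01, -0.01, 0.0, 0.01], threshold=0.01))  # quiet frame
```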

Process of speech recognition


[Figure sequence: sensor inputs S1, S2, ..., SK, ..., SN feed three stages: speaker recognition, speech recognition, and parsing and arbitration. Example command: "Switch on Channel 9".]

Who is speaking?
Speaker recognition: Annie, David, Cathy
Authentication

What is he saying?
Speech recognition: On, Off, TV, Fridge, Door
Understanding

What is he talking about?
Parsing and arbitration: Switch, to, channel, nine
Inferring and execution
Channel->TV
Dim->Lamp
On->TV,Lamp
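
The keyword-to-device mapping in the inferring step (Channel->TV, Dim->Lamp, On->TV,Lamp) can be sketched as a small lookup with arbitration by intersection. This is one possible reading of the diagram, not the authors' stated algorithm. The names `KEYWORD_TO_DEVICES` and `infer_devices` are hypothetical.

```python
# Mapping taken from the slide: each keyword points at the devices it can refer to.
KEYWORD_TO_DEVICES = {
    "channel": {"TV"},
    "dim": {"Lamp"},
    "on": {"TV", "Lamp"},
}


def infer_devices(words):
    """Intersect the device sets of all known keywords in the command."""
    candidates = None
    for word in words:
        devices = KEYWORD_TO_DEVICES.get(word.lower())
        if devices is None:
            continue  # word carries no device information
        candidates = set(devices) if candidates is None else candidates & devices
    return candidates or set()


print(infer_devices(["Switch", "to", "channel", "nine"]))  # only the TV has channels
print(infer_devices(["Switch", "on"]))                     # ambiguous: TV or Lamp
print(infer_devices(["on", "channel"]))                    # "channel" settles it: TV
```

When more than one device remains, the system would need extra context (or a question back to the user) before executing the command.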

Framework of Voice Recognition


[Figure: sensor inputs S1, S2, ..., SK, ..., SN feed three blocks: face recognition, gesture recognition, and parsing and arbitration. Stage labels: authentication, understanding, inferring and execution.]

Speaker Recognition

Definition
It is the method of recognizing a person based on their voice
It is one of the forms of biometric identification

Depends on speaker-specific characteristics.

Generic Speaker Recognition System


[Figure: generic speaker recognition system. Blocks: speech signal, analysis frames, preprocessing, feature extraction, feature vector, pattern matching against a speaker model, score.]
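
The pattern matching and score blocks can be sketched in Python. This is a minimal illustration, assuming each speaker model is a single stored feature vector and that the score is cosine similarity. Real systems use richer statistical models, and the three-number vectors, the 0.9 threshold and the function names here are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Score in [-1, 1]: how closely two feature vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(feature_vector, speaker_models, threshold=0.9):
    """Return (best matching name or None, best score)."""
    best_name, best_score = None, -1.0
    for name, model in speaker_models.items():
        score = cosine_similarity(feature_vector, model)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # no enrolled speaker is close enough
    return best_name, best_score


# Hypothetical enrolled speaker models (feature values are invented).
speaker_models = {
    "Annie": [0.9, 0.1, 0.3],
    "David": [0.2, 0.8, 0.5],
    "Cathy": [0.4, 0.4, 0.9],
}

print(identify_speaker([0.85, 0.15, 0.25], speaker_models))  # closest to Annie
print(identify_speaker([1.0, -1.0, 0.0], speaker_models))    # rejected as unknown
```

The threshold is what turns identification into authentication: a voice that does not score high enough against any stored model is rejected rather than assigned to the nearest speaker.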

ADVANTAGES
Helps people with disabilities
Organizations: increases productivity, reduces costs and errors.
Lower operational costs
Advances in technology will allow consumers and businesses to implement speech recognition systems at a relatively low cost.
Cell-phone users can dial pre-programmed numbers by voice command.
Users can trade stocks through a voice-activated trading system.
Speech recognition technology can also replace touch-tone dialing, making it possible to target customers who speak different languages.

DISADVANTAGES
Difficult to build a perfect system.
Conversations involve more than just words (non-verbal communication, stutters, etc.).
Every human being differs in voice, mouth, and speaking style.

Filtering background noise is a task that can be difficult even for humans to accomplish.

Future of Speech Recognition


Accuracy will become better and better.
Dictation speech recognition will gradually become accepted.
Small hand-held writing tablets for computer speech recognition dictation and data entry will be developed, as faster processors and more memory become available.
Greater use will be made of "intelligent systems" which will attempt to guess what the speaker intended to say, rather than what was actually said, as people often misspeak and make unintentional mistakes.
Microphones and sound systems will be designed to adapt more quickly to changing background noise levels and different environments, with better recognition of extraneous material to be discarded.


THANK YOU
