Speech Recognition
By
Chhatbar Jay (14mecc03)
Lokender Sekhawat (14mecc08)
What is Speech Recognition?
Speech recognition is the ability of a
machine or program to identify words and
phrases in spoken language and convert
them to a machine-readable format.
Also known as automatic speech recognition (ASR) or computer speech recognition, it allows a computer to understand voice input and perform the required task.
Speech Recognition (SR) is the ability to translate dictation or spoken words into text.
Where can it be used?
Dictation
System control/navigation
Commercial/Industrial applications
Voice dialing
Block diagram of speech
recognition
Speech Modeling
Acoustic Model
An acoustic model is created by taking audio recordings of speech and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech.
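As a rough illustration (not from the slides), the sketch below assumes MFCC-style feature frames and their aligned phone labels are already available, and fits one Gaussian mixture per phone as a stand-in for those statistical representations.

```python
# Minimal sketch: per-phone Gaussian mixtures as the "statistical
# representations" of sounds. Assumes feature frames (e.g. MFCCs) have
# already been extracted and aligned to phone labels from transcriptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_acoustic_model(frames, phone_labels, n_components=4):
    """frames: (N, D) feature array; phone_labels: N phone names."""
    labels = np.asarray(phone_labels)
    model = {}
    for phone in set(phone_labels):
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        model[phone] = gmm.fit(frames[labels == phone])
    return model

def most_likely_phone(model, frame):
    """Score one feature frame against every phone's mixture."""
    return max(model, key=lambda p: model[p].score(frame.reshape(1, -1)))
```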
Language Model
Language modeling is used in many natural language processing applications, such as speech recognition. A language model tries to capture the properties of a language and to predict the next word in a speech sequence.
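A toy illustration of next-word prediction (the corpus and phrases below are hypothetical), using simple bigram counts:

```python
# Minimal bigram language model: count word pairs in a corpus and use
# the counts to predict the most likely next word in a spoken sequence.
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently seen follower of `word`, if any."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

lm = train_bigram_lm(["switch on the tv", "switch off the lamp"])
print(predict_next(lm, "switch"))   # -> "on" (first of the top-count followers)
```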
TYPES OF VOICE
RECOGNITION
There are two types of speech recognition. One is called speaker-dependent and the other is speaker-independent. Speaker-dependent software is commonly used for dictation, while speaker-independent software is more commonly found in telephone applications.
Speaker-dependent software works by learning the
unique characteristics of a single person's voice, in a
way similar to voice recognition. New users must first
train the software by speaking to it, so the computer
can analyze how the person talks. This often means
users have to read a few pages of text to the computer
before they can use the speech recognition software.
TYPES OF VOICE RECOGNITION (CONTD.)
Speaker-independent software is designed to
recognize anyone's voice, so no training is involved.
This means it is the only real option for applications
such as interactive voice response systems where
businesses can't ask callers to read pages of text
before using the system. The downside is that
speaker-independent software is generally less
accurate than speaker-dependent software.
Speech recognition engines that are speaker
independent generally deal with this fact by limiting
the grammars they use. By using a smaller list of
recognized words, the speech engine is more likely
to correctly recognize what a speaker said.
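The sketch below illustrates this grammar-limiting idea with a hypothetical phrase list: a raw (possibly misrecognized) hypothesis is snapped to the closest allowed phrase.

```python
# Minimal sketch of a restricted grammar: pick the closest entry from a
# short list of allowed phrases instead of searching an open vocabulary.
import difflib

GRAMMAR = ["switch on channel nine", "turn off the tv", "dim the lamp"]

def recognize(raw_hypothesis, grammar=GRAMMAR, cutoff=0.6):
    """Return the allowed phrase closest to the raw hypothesis, if any."""
    matches = difflib.get_close_matches(raw_hypothesis.lower(), grammar,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(recognize("switch on channel nein"))   # -> "switch on channel nine"
```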
How do humans do it?
Articulation produces
sound waves which
the ear conveys to the brain
for processing
How might computers do it?
Acoustic waveform
Acoustic signal
Digitization
Acoustic analysis of the speech
signal
Language interpretation
Speech recognition
DIFFERENT PROCESSES
INVOLVED
Digitization
Converting analogue signal into digital
representation
Signal processing
Separating speech from background noise
Phonetics
Variability in human speech
Phonology
Recognizing individual sound distinctions
(similar phonemes)
Phonology is the systematic use of sound to encode
meaning in any spoken human language
DIFFERENT PROCESSES
INVOLVED(CONTD.)
Lexicology and syntax
Lexicology is the part of linguistics which
studies words: their nature and meaning,
their elements, relations between words,
word groups, and the whole lexicon.
Semantics and pragmatics
Semantics deals with the meaning of words and sentences
Pragmatics is concerned with bridging the
explanatory gap between sentence meaning
and speaker's meaning
Digitization
Analogue to digital conversion
Sampling and quantizing
Sampling is converting a continuous signal into a discrete signal
Quantizing is the process of approximating a continuous range of
values by a finite set of discrete levels (see the code sketch below)
Use filters to measure energy levels for various
points on the frequency spectrum
Knowing the relative importance of different
frequency bands (for speech) makes this
process more efficient
E.g. high frequency sounds are less informative,
so can be sampled using a broader bandwidth
(log scale)
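A small sketch of sampling and quantizing (the sample rate and bit depth below are illustrative choices, not taken from the slides):

```python
# Minimal sketch of digitization: sample a continuous signal at a fixed
# rate and quantize each sample to a fixed number of bits (numpy only).
import numpy as np

def digitize(signal_fn, duration_s, sample_rate=8000, bits=8):
    """signal_fn: maps time in seconds to amplitude in [-1, 1]."""
    times = np.arange(0, duration_s, 1.0 / sample_rate)      # sampling
    samples = np.array([signal_fn(t) for t in times])
    levels = 2 ** bits
    quantized = np.round((samples + 1) / 2 * (levels - 1))   # quantizing
    return quantized.astype(np.int32)

# Example: a 440 Hz tone sampled at 8 kHz with 8-bit resolution.
tone = digitize(lambda t: np.sin(2 * np.pi * 440 * t), duration_s=0.01)
print(len(tone), tone[:5])   # 80 samples, each an integer in 0..255
```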
Separating speech from background
noise
Noise cancelling microphones
Two mics, one facing speaker, the other facing
away
Ambient noise is roughly the same for both mics
Knowing which bits of the signal relate to
speech
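A minimal sketch of the two-microphone idea, assuming the rear microphone hears mostly the ambient noise (the gain factor is a hypothetical calibration value):

```python
# Minimal sketch of two-microphone noise cancellation: the rear-facing
# microphone picks up mostly ambient noise, so subtracting it from the
# front (speaker-facing) signal suppresses the shared background.
import numpy as np

def cancel_noise(front_mic, rear_mic, noise_gain=1.0):
    """Both inputs are equal-length sample arrays; noise_gain is an
    assumed calibration for how loudly the rear mic hears the noise."""
    front = np.asarray(front_mic, dtype=float)
    rear = np.asarray(rear_mic, dtype=float)
    return front - noise_gain * rear
```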
Process of speech recognition
[Block diagram, repeated over several slides: utterances from speakers S1, S2, ..., SK, ..., SN (e.g. "Switch on Channel 9") pass through Speaker Recognition and Speech Recognition blocks into parsing and arbitration.]
Who is speaking? Speaker recognition handles authentication, identifying the user (e.g. Annie, David, Cathy).
What is he saying? Speech recognition handles understanding, spotting command words such as On, Off, TV, Fridge, Door.
What is he talking about? Parsing and arbitration handle inferring and execution, mapping the recognized words (e.g. "Switch, to, channel, nine") to a target device with rules such as Channel -> TV, Dim -> Lamp, On -> TV/Lamp.
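A minimal sketch of that inferring/arbitration step, using keyword rules that mirror the illustrative examples in the diagram (Channel -> TV, Dim -> Lamp):

```python
# Minimal sketch: map recognized words to a target device using simple
# keyword rules (the rules echo the examples in the diagram above).
RULES = {"channel": "TV", "dim": "Lamp"}

def arbitrate(words):
    """Return the inferred device and the full command string."""
    words = [w.lower() for w in words]
    device = next((RULES[w] for w in words if w in RULES), None)
    return device, " ".join(words)

print(arbitrate("Switch to channel nine".split()))
# -> ("TV", "switch to channel nine")
```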
Framework of Voice Recognition
[Block diagram: the same framework generalizes to inputs S1, S2, ..., SK, ..., SN processed by Face Recognition and Gesture Recognition blocks feeding parsing and arbitration, again covering authentication, understanding, and inferring and execution.]
Speaker Recognition
Definition
It is the method of recognizing a person based on his voice
It is one of the forms of biometric identification
Depends on speaker-specific characteristics.
Generic Speaker Recognition System
[Block diagram: the speech signal is preprocessed and split into analysis frames; feature extraction produces feature vectors; pattern matching against a stored speaker model yields a score.]
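A minimal sketch of the pattern-matching/score step, assuming each enrolled speaker is stored as a single averaged feature template (cosine similarity stands in for the scoring block in the diagram):

```python
# Minimal sketch: compare an utterance's averaged feature vector against
# one stored template per enrolled speaker and return the best match.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_speaker(utterance_frames, speaker_models):
    """utterance_frames: (N, D) features; speaker_models: name -> (D,) template."""
    test_vector = np.mean(utterance_frames, axis=0)
    scores = {name: cosine(test_vector, template)
              for name, template in speaker_models.items()}
    return max(scores, key=scores.get), scores
```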
ADVANTAGES
Helps people with disabilities use computers and devices
Organizations - Increases productivity, reduces costs and
errors.
Lower operational costs
Advances in technology will allow consumers and
businesses to implement speech recognition systems at
a relatively low cost.
Cell-phone users can dial pre-programmed numbers by voice
command.
Users can trade stocks through a voice-activated trading
system.
Speech recognition technology can also replace touch-tone
dialing, resulting in the ability to serve customers who speak
different languages
DISADVANTAGES
Difficult to build a perfect system.
Conversations involve more than just words (non-verbal
communication, stutters, etc.)
Every human being differs in voice, mouth shape, and
speaking style.
Filtering background noise is a task that can even be
difficult for humans to accomplish.
Future of Speech Recognition
Accuracy will become better and better.
Dictation speech recognition will gradually become
accepted.
Small hand-held writing tablets for computer speech
recognition dictation and data entry will be
developed, as faster processors and more memory
become available.
Greater use will be made of "intelligent systems"
which will attempt to guess what the speaker
intended to say, rather than what was actually said,
as people often misspeak and make unintentional
mistakes.
Microphone and sound systems will be designed to
adapt more quickly to changing background noise
levels and different environments, and to better
recognize and discard extraneous material.
THANK YOU