Applications PDF
Vishu R Viswanathan
TI Fellow, Director, Speech Technologies Lab
DSP Solutions R&D Center
Texas Instruments, Dallas, Texas
[email protected]
[Figure: speech coder block diagram. Sampled speech s(n) -> Analyzer -> x(n) -> Encoder -> y(n) -> Channel or Medium -> y(n) -> Decoder -> x(n) -> Synthesizer -> s(n)]
[Figure: coder complexity (MIPS, memory) from low to high, across signal types: human speech, music, sound effects]
ITU Standards
Coder      Rate (kb/s)    Approach
G.711      64             Mu/A-law
G.726      16-40          ADPCM
G.728      16             LD-CELP
G.729      8              CS-ACELP
G.723.1    5.3/6.3        MP/ACELP
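The G.711 entry can be illustrated with a small sketch. It shows where the 64 kb/s figure comes from (8000 samples/s at 8 bits per sample) and the shape of the mu-law companding curve. This is a minimal illustration that assumes the continuous mu-law formula with mu = 255. G.711 itself specifies a segmented approximation of this curve, so the code below is not a bit-exact G.711 codec, and all function names are hypothetical.

```python
import math

MU = 255.0  # assumed mu value for the continuous mu-law curve


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Invert mu_law_compress for a companded value in [-1, 1]."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value in [-1, 1] to a code in 0..255."""
    return min(255, int((y + 1.0) / 2.0 * 256))


def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of a sample-by-sample coder in kb/s."""
    return sample_rate_hz * bits_per_sample / 1000.0
```

Calling `bit_rate_kbps(8000, 8)` reproduces the 64 kb/s rate listed for G.711. The compression step gives small amplitudes finer resolution than a uniform 8-bit quantizer would.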
Wideband Standards
Coder      Rate (kb/s)    Approach
G.722      48, 56, 64     SB-ADPCM
G.722.1    24, 32         Transform
ITU WB     16, 24         ACELP
AMR WB     6.60-23.85     ACELP
VMR WB     1.0-13.3       ACELP
[Figure: speech recognizer block diagram. Feature Extraction -> Acoustic Scoring -> Decoding, supported by Acoustic Models, Language Models, and Dictionary and Rules]
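[Figure: speech recognition operating conditions. Input ranges from clean speech (handheld) to noisy speech (hands-free); complexity (MIPS, memory) ranges from low to high; deployment is server based, distributed, or client based]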
March 2004, Vishu Viswanathan
Performance & Robustness
Performance
Recognition Accuracy: Word error rate (WER) or task completion rate
Performance must be high enough for user acceptance
Robustness Issues
Training versus operational condition differences
Background noise: level of the noise and its variability (usually additive)
Channel variability: different microphones, different telephone circuits, handheld, hands-free, and handheld-hands-free operation (usually convolutive)
Recognizer must have means to compensate for noise and channel variabilities
Out-of-vocabulary rejection capability
Speaker dialect and accent variability (handled by speaker adaptation)
User Interface: Very important for the success of an application
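The word error rate named above is commonly computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes that common definition and whitespace tokenization. The function name is hypothetical, and the slides do not specify a scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For the reference "call john at home" and the hypothesis "call joan home", there is one substitution and one deletion against four reference words, so `word_error_rate` returns 0.5. WER can exceed 1.0 when the hypothesis contains many insertions.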
Noise Suppression
Playback Enhancement
Acoustic Echo Cancellation
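[Figure: acoustic echo cancellation block diagram. The signal x(n) passes through the echo channel H(z) to produce y(n); the microphone signal is v(n) = u(n) + y(n) + n0(n); an estimate of H(z) is applied to x(n) and the result is subtracted from v(n) to form the error signal e(n)]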
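The echo cancellation structure can be sketched with an adaptive FIR filter. The example below assumes a normalized LMS (NLMS) update, which is one common choice. The slides do not say which adaptation algorithm is used, and the reading of x(n) as the far-end signal, u(n) as near-end speech, and n0(n) as noise is an assumption based on the diagram labels. All function and parameter names are hypothetical.

```python
import random


def simulate_microphone(x, h, u=None, n0=None):
    """Build v(n) = u(n) + y(n) + n0(n), where y(n) is x(n) filtered by h."""
    v = []
    for n in range(len(x)):
        y = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        near_end = u[n] if u is not None else 0.0
        noise = n0[n] if n0 is not None else 0.0
        v.append(near_end + y + noise)
    return v


def nlms_echo_canceller(x, v, num_taps=8, step=0.5, eps=1e-8):
    """Adapt an FIR estimate of the echo path; return (e, w).

    e is the error signal e(n) = v(n) - y_hat(n), and w is the final
    tap vector, which serves as the estimate of the echo channel.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # recent x samples, newest first
    e = []
    for xn, vn in zip(x, v):
        buf = [xn] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))  # echo estimate
        en = vn - y_hat
        norm = sum(b * b for b in buf) + eps  # input energy, regularized
        gain = step * en / norm
        w = [wi + gain * bi for wi, bi in zip(w, buf)]
        e.append(en)
    return e, w
```

A quick check under idealized conditions (white-noise x(n), no near-end speech, no noise): generate `x` with `random.Random(0).uniform(-1.0, 1.0)`, build `v = simulate_microphone(x, [0.6, -0.3, 0.1])`, and run `nlms_echo_canceller(x, v)`. The taps should approach the simulated echo path and e(n) should decay toward zero. With near-end speech present, a practical canceller would also need double-talk handling, which this sketch omits.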