IJARCET, Vol. 4, Issue 7, pp. 3067-3072
IV. AUTOMATIC SPEECH RECOGNITION (ASR):

1. Basic Principle:

ASR systems operate in two phases. First is a training phase, during which the system learns the reference patterns representing the different speech sounds (e.g. phrases, words, phones) that constitute the vocabulary of the application. Each reference is learned from spoken examples and stored either as templates obtained by some averaging method or as models that characterize the statistical properties of the pattern [6]. Second is a recognizing phase, during which an unknown input pattern is identified by comparison against the set of references.

The goal of automatic speaker recognition is to analyze, extract, characterize and recognize information about the speaker's identity. A speaker recognition system may be viewed as working in four stages:

a. Analysis
b. Feature extraction
c. Modeling
d. Testing

a. Speech analysis:

Speech data contains different types of information that reveal the speaker's identity. This includes speaker-specific information due to the vocal tract, the excitation source and behavioural features. The physical structure and dimensions of the vocal tract, as well as the excitation source, are unique to each speaker. This uniqueness is embedded in the speech signal during speech production and can be used for speaker recognition.

b. Feature Extraction Technique:

Feature extraction is the most important part of speech recognition, since it plays the key role in separating one speech from another: every utterance has individual characteristics embedded in it [6]. These characteristics can be extracted by a wide range of feature extraction techniques proposed and successfully exploited for speech recognition tasks. However, the extracted features should meet certain criteria when dealing with the speech signal:

a. The extracted speech features should be easy to measure.
b. They should not be susceptible to mimicry.
c. They should show little fluctuation from one speaking environment to another.
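As a minimal sketch of the framing-and-feature step that precedes any such extraction, the following example splits a signal into overlapping frames and computes a per-frame log-energy feature. The frame length of 450 samples and overlap of 300 samples follow the values used by the system described later in this paper; the random test signal and the log-energy feature itself are illustrative assumptions, not the feature set of a real recognizer:

```python
import numpy as np

def frame_signal(signal, frame_len=450, overlap=300):
    """Split a 1-D signal into overlapping frames.

    With a 300-sample overlap the frame advance is 150 samples,
    i.e. consecutive frames share two-thirds of their length.
    """
    step = frame_len - overlap
    n_frames = 1 + max(0, (len(signal) - frame_len) // step)
    return np.stack([signal[i * step : i * step + frame_len]
                     for i in range(n_frames)])

def log_energy_features(frames):
    """One simple feature per frame: log energy. A stand-in for the
    cepstral features a real recognizer would compute."""
    return np.log(np.sum(frames.astype(float) ** 2, axis=1) + 1e-10)

signal = np.random.randn(8000)       # one second of audio at 8 kHz
frames = frame_signal(signal)        # shape: (n_frames, 450)
feats = log_energy_features(frames)  # one feature value per frame
```

With an 8000-sample signal this yields 51 frames, each advancing 150 samples past the previous one.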
The goal of speech recognition is for a machine to be able to "hear", "understand" and "act upon" spoken information. The earliest speech recognition systems were attempted in the early 1950s at Bell Laboratories.

V. SYSTEM DESIGN & IMPLEMENTATION:
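The two-phase principle of Section IV (learn reference templates by averaging spoken examples, then identify an unknown input against the set of references) can be sketched as a toy template matcher. The two-dimensional feature vectors, the vocabulary, and the Euclidean nearest-template rule are all simplifying assumptions for illustration, not the statistical (HMM-based) modelling used by the actual system:

```python
import numpy as np

def train_templates(examples):
    """Training phase: average the feature vectors of the spoken
    examples of each word into a single reference template."""
    return {word: np.mean(np.stack(vecs), axis=0)
            for word, vecs in examples.items()}

def recognize(templates, unknown):
    """Recognizing phase: identify the unknown pattern as the
    reference template it lies closest to (Euclidean distance)."""
    return min(templates, key=lambda w: np.linalg.norm(templates[w] - unknown))

# Toy 2-D "feature vectors" for a two-word vocabulary, two examples each.
examples = {
    "yes": [np.array([1.0, 0.1]), np.array([0.9, 0.0])],
    "no":  [np.array([0.0, 1.0]), np.array([0.1, 0.9])],
}
templates = train_templates(examples)
print(recognize(templates, np.array([0.8, 0.2])))  # prints "yes"
```

Statistical models replace the stored templates with per-word probability models and replace the distance comparison with a maximum-likelihood decision, but the train-then-match structure is the same.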
The system implements speech-to-text conversion using isolated word recognition with a vocabulary of ten words (digits 0 to 9) and statistical modelling (HMMs) for machine speech recognition. In the training phase, the uttered digits are recorded using 16-bit pulse code modulation (PCM) at a sampling rate of 8 kHz and saved as wave files using sound recorder software. We use MATLAB's wavread command to convert the .wav files to speech samples. In a noisy environment, a speech signal generally consists of noise-speech-noise, so recognizing the actual speech within the given samples is important. We divided the speech signal into frames of 450 samples each, with an overlap of 300 samples, i.e. two-thirds of a frame length. The speech is separated from the pauses using voice activity detection (VAD) techniques, which are discussed in detail later in the paper. The system performs speech analysis and synthesis using the linear predictive coding (LPC) method. From the LPC coefficients we obtain the weighted cepstral coefficients and cepstral time derivatives, which form the characteristic vector for a frame. The system then performs vector quantization using a vector codebook, and the resulting vectors form the observation sequence. For each word in the vocabulary, the system builds an HMM and trains it during the training phase. The training steps, from VAD to HMM building, are performed using PC-based C programs. We load the resulting HMM models onto an FPGA for the recognition phase. In the recognition phase, the speech is acquired in real time from the microphone through a codec and stored in the FPGA's memory. These speech samples are preprocessed, and the probability of the observation sequence is calculated for each model. The uttered word is recognized based on maximum likelihood estimation.

VI. APPLICATIONS OF SPEECH TO TEXT SYSTEM:

The application field of STT is expanding fast, while the quality of STT systems is also increasing steadily. Speech synthesis systems are becoming more affordable for ordinary customers, which makes them more suitable for everyday use and more cost effective. Some uses of STT are described below [5].

1. Aid to Vocally Handicapped

A hand-held, battery-powered synthetic speech aid can be used by vocally handicapped people to express their words. The device has a specially designed keyboard which accepts the input and converts it into the required speech in the blink of an eye.

2. Source of Learning for Visually Impaired

Listening is an important skill for people who are blind. Blind individuals rely on their ability to hear or listen to gain information quickly and efficiently. Students use their sense of hearing not only to gain information from books on tape or CD, but also to assess what is happening around them.

3. Games and Education

Synthesized speech can also be used in many educational institutions, in fields of study as well as in sports. A teacher may tire at some point, but a computer with a speech synthesizer can teach the whole day with the same performance, efficiency and accuracy.

4. Telecommunication and Multimedia

STT systems make it possible to access vocal information over the telephone. Queries to such information retrieval systems can be put through the user's voice (with the help of a speech recognizer) or through the telephone keypad. Synthesized speech may also be used to speak out short text messages in mobile phones.

5. Man-Machine Communication

Speech synthesis can be used in several kinds of human-machine interactions and interfaces. For example, in warning and alarm systems, clocks and washing machines, synthesized speech may be used to give more exact information about the current situation [5]. Speech signals are far better than warning lights or buzzers, since they enable a person to react more quickly even when an obstacle blocks the light from view.

6. Voice Enabled E-mail

Voice-enabled e-mail uses voice recognition and speech synthesis technologies to enable users to access their e-mail from any telephone. The subscriber dials a phone number to access a voice portal; then, to collect their e-mail messages, they press a couple of keys and perhaps say a phrase like "Get my e-mail." Speech synthesis software converts the e-mail text to a voice message, which is played back over the phone. Voice-enabled e-mail is especially useful for mobile workers, because it makes it possible for them to access their messages easily from virtually anywhere (as long as they can get to a phone), without having to invest in expensive equipment such as laptop computers or personal digital assistants.

VII. CONCLUSION:

In this paper, we discussed the topics relevant to the development of STT systems. Speech-to-text conversion can be effective and efficient for its users if it produces natural speech, which can be achieved through several modifications to the system. The system is useful for hearing- and speech-impaired people to interact with other people in society. Speech-to-text synthesis is a critical research and application area in the field of multimedia interfaces. This paper gathers important references to literature related to the endogenous variations of the speech signal and their importance in automatic speech recognition. A database has been created from words and syllables of various domains. The desired speech is produced by the concatenative speech synthesis approach. Speech synthesis is advantageous for people who are visually handicapped. This paper gives a clear and simple step-by-step overview of the working of a speech-to-text (STT) system. The system takes input from the microphone in the form of voice, then preprocesses that data and converts it into a text format
displayed on the PC. The user types the input string, and the system reads it from the database or data store where the words, phones, diphones and triphones are stored. In this paper, we presented the development of an existing STT system by adding a spell-checker module to it for different languages. There are many speech-to-text (STT) systems available in the market, and much improvement is going on in research to make the speech more effective and more natural, with stress and emotions.

VIII. ACKNOWLEDGMENT:

The authors would like to thank the Director/Principal Dr. Vinod Chowdhary, Prof. Aade K. U. and Prof. Bhope V. P., Savitribai Phule Pune University, for their useful discussions and suggestions during the preparation of this technical paper.

IX. REFERENCES:

[1] Sanjib Das, "Speech Recognition Technique: A Review", International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 2, Issue 3, May-Jun 2012.

[2] Sneha K. Upadhyay, Vijay N. Chavda, "Intelligent system based on speech recognition with capability of self learning", International Journal For Technological Research In Engineering, ISSN (Online): 2347-4718, Vol. 1, Issue 9, May 2014.

[3] Deepa V. Jose, Alfateh Mustafa, Sharan R., "A Novel Model for Speech to Text Conversion", International Refereed Journal of Engineering and Science (IRJES), ISSN (Online): 2319-183X, Vol. 3, Issue 1, January 2014.

[4] B. Raghavendhar Reddy, E. Mahender, "Speech to Text Conversion using Android Platform", International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 1, January-February 2013.

[5] Kaveri Kamble, Ramesh Kagalkar, "A Review: Translation of Text to Speech Conversion for Hindi Language", International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064, Vol. 3, Issue 11, November 2014.

[6] Santosh K. Gaikwad, Bharti W. Gawali, Pravin Yannawar, "A Review on Speech Recognition Technique", International Journal of Computer Applications (0975-8887), Vol. 10, No. 3, November 2010.

[7] Penagarikano, M., Bordel, G., "Speech-to-text translation by a non-word lexical unit based system", Proceedings of the Fifth International Symposium on Signal Processing and Its Applications (ISSPA '99), Vol. 1, pp. 111-114, 1999.

[8] Olabe, J. C., Santos, A., Martinez, R., Munoz, E., Martinez, M., Quilis, A., Bernstein, J., "Real time text-to-speech conversion system for Spanish", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '84), Vol. 9, pp. 85-87, March 1984.

[9] Kavaler, R., et al., "A Dynamic Time Warp Integrated Circuit for a 1000-Word Recognition System", IEEE Journal of Solid-State Circuits, Vol. SC-22, No. 1, pp. 3-14, February 1987.

[10] Aggarwal, R. K., Dave, M., "Acoustic modelling problem for automatic speech recognition system: advances and refinements (Part II)", International Journal of Speech Technology (2011) 14:309-320.

[11] Ostendorf, M., Digalakis, V., Kimball, O. A. (1996), "From HMM's to segment models: a unified view of stochastic modeling for speech recognition", IEEE Transactions on Speech and Audio Processing, 4(5), 360-378.

[12] Yasuhisa Fujii, Yamamoto, K., Nakagawa, S., "Automatic speech recognition using hidden conditional neural fields", ICASSP 2011, pp. 5036-5039.

[13] Mohamed, A. R., Dahl, G. E., Hinton, G., "Acoustic Modelling using Deep Belief Networks", submitted to IEEE Transactions on Audio, Speech, and Language Processing, 2010.

[14] Sorensen, J., Allauzen, C., "Unary data structures for language models", INTERSPEECH 2011.

[15] Kain, A., Hosom, J. P., Ferguson, S. H., Bush, B., "Creating a speech corpus with semi-spontaneous, parallel conversational and clear speech", Tech Report CSLU-11-003, August 2011.