Unit – IV

Applications of Neural Networks

Dr L K Sharma, Rungta College of Engineering and Technology, Bhilai (CG)


What is Pattern Recognition (PR)?
It is the study of how machines can
- observe the environment
- learn to distinguish patterns of interest from their background
- make sound and reasonable decisions about the categories of the patterns.

What is a pattern?
Watanabe [163] defines a pattern as
“the opposite of a chaos; it is an entity, vaguely defined, that could be given a name”

Other Patterns
Insurance, credit card applications - applicants are characterized by
- # of accidents, make of car, year of model
- Income, # of dependents, credit worthiness, mortgage amount
Dating services
- Age, hobbies, income, etc. establish your “desirability”
Web documents
- Keyword-based descriptions (e.g., documents containing “terrorism” or “Osama” are different from those containing “football” or “NFL”).
Housing market
- Location, size, year, school district, etc.

Pattern Class
A collection of “similar” (not necessarily identical) objects
- Inter-class variability (figure: characters that look similar)
- Intra-class variability (figure: the letter “T” in different typefaces)

Pattern Class Model
Different descriptions, which are typically mathematical in form, for each class/population (e.g., a probability density such as a Gaussian)
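As a minimal sketch of such a class model, assume each class is described by a one-dimensional Gaussian density whose mean and variance are estimated from labelled training samples. A new pattern is then assigned to the class whose density is highest at that point, assuming equal class priors. The function names and the sample values are hypothetical.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and (population) variance of a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def gaussian_pdf(x, mean, var):
    """Value of the 1-D normal density N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def train(labelled):
    """labelled maps a class name to its training samples; returns one model per class."""
    return {label: fit_gaussian(xs) for label, xs in labelled.items()}

def classify(x, models):
    """Pick the class whose density is highest at x (equal class priors assumed)."""
    return max(models, key=lambda label: gaussian_pdf(x, *models[label]))

# Hypothetical single-feature measurements for two classes
models = train({"A": [1.0, 1.2, 0.8, 1.1], "B": [3.0, 3.4, 2.8, 3.1]})
print(classify(1.3, models))  # prints A
print(classify(2.9, models))  # prints B
```

With unequal priors, each density would simply be multiplied by its class prior before taking the maximum.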

Classification vs Clustering
- Classification (known categories): also called recognition or supervised classification
- Clustering (creation of new categories): also called unsupervised classification
(Figure: on one side, patterns sorted into the known categories “A” and “B”; on the other, patterns grouped into newly formed clusters.)
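To contrast the two settings, the sketch below groups unlabelled one-dimensional values with k-means, one common clustering algorithm chosen here only as an example (the slide does not prescribe an algorithm). No class labels are supplied; the algorithm forms the groups itself. Initialising the centres with the first k points is a simplifying assumption that keeps the result deterministic.

```python
def kmeans_1d(points, k, iterations=20):
    """Cluster 1-D points into k groups; returns (centres, assignment per point)."""
    centres = list(points[:k])  # simple deterministic initialisation (assumption)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre
        assignment = [min(range(k), key=lambda c: abs(p - centres[c])) for p in points]
        # Update step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = sum(members) / len(members)
    return centres, assignment

data = [1.0, 9.0, 1.2, 8.8, 0.9, 9.1]
centres, groups = kmeans_1d(data, k=2)
print(groups)  # [0, 1, 0, 1, 0, 1]
```

The centres settle at about 1.03 and 8.97. A classifier, by contrast, would be given the category of each training point in advance and would only have to assign new points to those known categories.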
Pattern Recognition
Key Objectives:

- Process the sensed data to eliminate noise
- Hypothesize the models that describe each class population (e.g., recover the process that generated the patterns).
- Given a sensed pattern, choose the best-fitting model for it and then assign it to the class associated with the model.

Emerging PR Applications
- Speech recognition. Input: speech waveforms. Output: spoken words, speaker identity.
- Non-destructive testing. Input: ultrasound, eddy current, acoustic emission waveforms. Output: presence/absence of flaw, type of flaw.
- Detection and diagnosis of disease. Input: EKG, EEG waveforms. Output: types of cardiac conditions, classes of brain conditions.
- Natural resource identification. Input: multispectral images. Output: terrain forms, vegetation cover.
- Aerial reconnaissance. Input: visual, infrared, radar images. Output: tanks, airfields.
- Character recognition (page readers, zip code, license plate). Input: optical scanned image. Output: alphanumeric characters.

Emerging PR Applications (cont’d)
- Identification and counting of cells. Input: slides of blood samples, micro-sections of tissues. Output: type of cells.
- Inspection (PC boards, IC masks, textiles). Input: scanned image (visible, infrared). Output: acceptable/unacceptable.
- Manufacturing. Input: 3-D images (structured light, laser, stereo). Output: identify objects, pose, assembly.
- Web search. Input: key words specified by a user. Output: text relevant to the user.
- Fingerprint identification. Input: image from fingerprint sensors. Output: owner of the fingerprint, fingerprint classes.
- Online handwriting retrieval. Input: query word written by a user. Output: occurrence of the word in the database.

Main PR Areas
Template matching
- The pattern to be recognized is matched against a stored
template while taking into account all allowable pose (translation
and rotation) and scale changes.
Statistical pattern recognition
- Focuses on the statistical properties of the patterns (i.e.,
probability densities).
Structural Pattern Recognition
- Describe complicated objects in terms of simple primitives and
structural relationships.
Syntactic pattern recognition
- Decisions consist of logical rules or grammars.
Artificial Neural Networks
- Inspired by biological neural network models.
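Template matching can be illustrated on a small binary image: slide the stored template over every allowed translation and keep the offset with the most agreeing pixels. This minimal sketch handles translation only; rotation and scale changes would need additional transformed copies of the template. The image and the "T" template are made-up examples.

```python
def match_score(image, template, top, left):
    """Count pixels where the template agrees with the image at offset (top, left)."""
    return sum(
        1
        for r, row in enumerate(template)
        for c, value in enumerate(row)
        if image[top + r][left + c] == value
    )

def best_match(image, template):
    """Try every translation that keeps the template inside the image."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    candidates = [
        (match_score(image, template, top, left), (top, left))
        for top in range(ih - th + 1)
        for left in range(iw - tw + 1)
    ]
    return max(candidates)  # (best score, (top, left))

# A 3x3 "T" template hidden in a 5x6 binary image (hypothetical data)
template = [[1, 1, 1],
            [0, 1, 0],
            [0, 1, 0]]
image = [[0, 0, 0, 0, 0, 0],
         [0, 0, 1, 1, 1, 0],
         [0, 0, 0, 1, 0, 0],
         [0, 0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0, 0]]
print(best_match(image, template))  # (9, (1, 2))
```

A perfect score equals the number of template pixels (9 here); the small intra-class variability that template matching assumes shows up as the requirement that good matches score close to that maximum.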
Statistical Pattern Recognition

Recognition: Pattern → Preprocessing → Feature extraction → Classification
Training: Patterns + Class labels → Preprocessing → Feature selection → Learning
Structural Pattern Recognition
Describe complicated objects in terms of simple primitives and
structural relationships.
Decision-making when features are non-numeric or structural

(Figure: a hierarchical tree that decomposes a scene into an object and a background, and each of these into simpler sub-patterns and primitives.)
Syntactic Pattern Recognition

Recognition: Pattern → Preprocessing → Primitive and relation extraction → Syntax (structural) analysis
Training: Patterns + Class labels → Preprocessing → Primitive selection → Grammatical (structural) inference

Describe patterns using deterministic grammars or formal languages
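As a toy illustration of describing patterns with a formal grammar, suppose a contour is encoded as a string of primitives, with "a" for an upward stroke and "b" for a downward stroke (hypothetical primitives). The grammar S → a S b | a b generates exactly the strings a^n b^n, i.e., "peaks" whose rising and falling sides have equal length. A small recursive-descent parser then decides whether a primitive string belongs to that pattern class.

```python
def parse_S(s, i=0):
    """Try to derive S from s[i:]; return the index after the match, or None.

    Grammar (hypothetical): S -> 'a' S 'b' | 'a' 'b'
    """
    if i < len(s) and s[i] == "a":
        # Alternative 1: a S b
        j = parse_S(s, i + 1)
        if j is not None and j < len(s) and s[j] == "b":
            return j + 1
        # Alternative 2: a b
        if i + 1 < len(s) and s[i + 1] == "b":
            return i + 2
    return None

def is_peak(s):
    """True when the whole primitive string is generated by S."""
    return parse_S(s) == len(s)

print(is_peak("aabb"), is_peak("aab"))  # True False
```

Training in the syntactic approach corresponds to grammatical inference: learning productions like these from example strings of each class rather than writing them by hand.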


Chromosome Grammars

(Figures: image of human chromosomes; hierarchical-structure description of a submedian chromosome.)
Artificial Neural Networks
Massive parallelism is essential for complex pattern recognition tasks (e.g., speech and image recognition)
- Humans take only a few hundred ms for most cognitive tasks; this suggests parallel computation
Biological networks attempt to achieve good performance via dense interconnection of simple computational elements (neurons)
- Number of neurons ≈ 10^10–10^12
- Number of interconnections/neuron ≈ 10^3–10^4
- Total number of interconnections ≈ 10^14

Artificial Neural Nodes
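Nodes in neural networks are nonlinear, typically analog. A node with inputs x1, …, xd and weights w1, …, wd produces the output

$$Y = f\left(\sum_{i=1}^{d} w_i x_i - \theta\right)$$

where θ is an internal threshold and f is the node's nonlinearity.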
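A single node of this kind, computing a nonlinear function of the weighted input sum minus an internal threshold, can be sketched in a few lines. This sketch assumes a logistic (sigmoid) function for the nonlinearity and uses made-up weights and threshold; a hard-limiting step function is the other common choice.

```python
import math

def neuron(x, w, theta):
    """Output of one node: f(sum_i w_i * x_i - theta) with a logistic f."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-activation))

# Hypothetical weights and threshold for a node with d = 3 inputs
print(neuron([1.0, 0.0, 1.0], w=[0.5, -0.4, 0.5], theta=1.0))  # 0.5
```

Here the weighted sum exactly equals the threshold, so the activation is 0 and the logistic output sits at its midpoint, 0.5.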

Multilayer Perceptron

Feed-forward nets with one or more hidden layers between the input and output nodes
A three-layer net can generate arbitrarily complex decision regions
(Figure: d inputs → first hidden layer with NH1 units → second hidden layer with NH2 units → c outputs.)
These nets can be trained by the back-propagation training algorithm
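Why hidden layers matter can be seen on the XOR problem, whose two classes cannot be separated by a single straight line. In this sketch the weights are set by hand (not learned by back-propagation) and every node uses a step nonlinearity: one hidden node acts as OR, the other as AND, and the output fires when OR is true but AND is not.

```python
def step(v):
    """Hard-limiting nonlinearity: 1 if v > 0, else 0."""
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    """Two-input network with one hidden layer of two nodes (hand-set weights)."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR and not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The loop prints the XOR truth table (0 0 0, 0 1 1, 1 0 1, 1 1 0). Each hidden node draws one line in the input plane, and the output node combines the two half-planes into a region no single node could produce.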

Comparing Pattern Recognition Models
Template Matching

- Assumes very small intra-class variability

- Learning is difficult for deformable templates

Structural / Syntactic

- Primitive extraction is sensitive to noise


- Describing a pattern in terms of primitives is difficult

Statistical

- Assumption of density model for each class

Artificial Neural Network

- Parameter tuning and local minima in learning
Speech Recognition



Definition

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words.

The recognised words can be an end in themselves, as in applications such as command and control, data entry, and document preparation.

They can also serve as the input to further linguistic processing in order to achieve speech understanding.



Speech Processing

Signal processing:
- Convert the audio wave into a sequence of feature vectors
Speech recognition:
- Decode the sequence of feature vectors into a sequence of words
Semantic interpretation:
- Determine the meaning of the recognized words
Dialog Management:
- Correct errors and help get the task done
Response generation:
- Decide which words to use to maximize user understanding
Speech synthesis (Text to Speech):
- Generate synthetic speech from a ‘marked-up’ word string
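The signal-processing step, turning the audio wave into a sequence of feature vectors, usually starts by cutting the signal into short overlapping frames. The sketch below assumes 16 kHz audio with 25 ms frames and a 10 ms hop (typical values, not taken from this unit), and computes only two very simple features per frame: log energy and zero-crossing rate. Practical recognisers use richer features such as MFCCs.

```python
import math

def frames(signal, frame_len=400, hop=160):
    """Split samples into overlapping frames (400 = 25 ms, 160 = 10 ms at 16 kHz)."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def frame_features(frame):
    """Return [log energy, zero-crossing rate] for one frame."""
    energy = sum(s * s for s in frame)
    log_energy = math.log(energy + 1e-10)  # small constant avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [log_energy, crossings / (len(frame) - 1)]

def feature_vectors(signal):
    """Convert a waveform into a sequence of feature vectors, one per frame."""
    return [frame_features(f) for f in frames(signal)]

# One second of a synthetic 440 Hz tone sampled at 16 kHz
rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
vectors = feature_vectors(tone)
print(len(vectors))  # 98
```

The recogniser then decodes this sequence of vectors rather than the raw samples; the framing turns 16,000 samples per second into about 100 compact vectors.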


What you can do with Speech Recognition

Transcription

- dictation, information retrieval

Command and control

- data entry, device control, navigation, call routing

Information access

- airline schedules, stock quotes, directory assistance

Problem solving

- travel planning, logistics




Transcription and Dictation

Transcription is transforming a stream of human speech into computer-readable form

- Medical reports, court proceedings, notes

- Indexing (e.g., broadcasts)

Dictation is the interactive composition of text

- Reports, correspondence, etc.



Speech recognition and understanding

Sphinx system

- speaker-independent

- continuous speech

- large vocabulary

ATIS system

- air travel information retrieval

- context management



Speech Recognition and Call Centres

Automate services, lower payroll

Shorten time on hold

Shorten agent and client call time

Reduce fraud

Improve customer service



Applications related to Speech Recognition

Speech Recognition
Figure out what a person is saying.
Speaker Verification
Authenticate that a person is who she/he claims to be.
Limited speech patterns
Speaker Identification
Assign an identity to the voice of an unknown person.
Arbitrary speech patterns
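The difference between verification (a one-to-one comparison with the claimed speaker) and identification (a one-to-many search over enrolled speakers) can be sketched with fixed-length "voiceprint" vectors. How such vectors are extracted from speech is outside this sketch; the vectors, the speaker names, the cosine similarity measure and the threshold are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def verify(enrolled, claimed_id, sample, threshold=0.9):
    """Verification: accept only if the sample is close enough to the claimed speaker."""
    return cosine(enrolled[claimed_id], sample) >= threshold

def identify(enrolled, sample):
    """Identification: return the enrolled speaker whose voiceprint is most similar."""
    return max(enrolled, key=lambda name: cosine(enrolled[name], sample))

# Hypothetical enrolled voiceprints
enrolled = {"asha": [0.9, 0.1, 0.3], "ravi": [0.2, 0.8, 0.5]}
sample = [0.85, 0.15, 0.35]
print(identify(enrolled, sample))        # asha
print(verify(enrolled, "ravi", sample))  # False
```

Verification needs an accept/reject threshold because the answer is yes or no for one claimed identity; identification always returns the best candidate, so a real system would usually add a rejection threshold for unknown speakers as well.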



Kinds of Speech Recognition Systems

Speech recognition systems can be characterised by many parameters, such as speaking mode, vocabulary size and speaker dependence.

An isolated-word (discrete) speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not.
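Because an isolated-word recogniser can rely on those pauses, it can cut the input into words before recognising each one. Below is a minimal sketch of such endpoint detection. It assumes per-frame energies have already been computed and that a fixed energy threshold separates speech from silence; the threshold and the minimum pause length are hypothetical tuning values.

```python
def segment_words(energies, threshold, min_pause=3):
    """Return (start, end) frame indices of speech segments separated by pauses.

    A segment ends only after at least `min_pause` consecutive frames below
    `threshold`; shorter dips are treated as part of the same word.
    """
    segments = []
    start = None        # first frame of the current word, if inside one
    last_voiced = None  # most recent frame at or above threshold
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= min_pause:
            segments.append((start, last_voiced))
            start = None
    if start is not None:  # word still open at the end of the input
        segments.append((start, last_voiced))
    return segments

# Hypothetical frame energies: two words separated by a pause
energies = [0, 0, 5, 6, 1, 7, 0, 0, 0, 0, 8, 9, 7, 0, 0]
print(segment_words(energies, threshold=4))  # [(2, 5), (10, 12)]
```

A continuous recogniser cannot rely on this shortcut, since words run together without reliable silences and segmentation has to be decided jointly with recognition.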



A TIMELINE OF SPEECH RECOGNITION

1870s Alexander Graham Bell invents the telephone while trying to develop a speech recognition system for deaf people.
1936 AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder (Dudley, Riesz and Watkins). This machine was demonstrated at the 1939 World's Fairs by experts who used a keyboard and foot pedals to play the machine and emit speech.
1969 John Pierce of Bell Labs said automatic speech recognition would not be a reality for several decades because it requires artificial intelligence.



Early 70s

Early 1970s The Hidden Markov Modeling (HMM) approach to speech recognition was invented by Lenny Baum of the Institute for Defense Analyses in Princeton and shared with several ARPA (Advanced Research Projects Agency) contractors, including IBM.
HMM is a complex mathematical pattern-matching strategy that was eventually adopted by all the leading speech recognition companies, including Dragon Systems, IBM, Philips, AT&T and others.
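The core HMM computation is the probability that a model produced an observed sequence, and it is computed with the forward algorithm. The two-state model below, with discrete observation symbols 0 and 1, is entirely made up. In a recogniser there would typically be one such model per word or phone, and the observations would be (quantised) acoustic feature vectors.

```python
def forward(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM via the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical two-state model
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
print(forward([0, 1], start, trans, emit))  # about 0.2156
```

Recognition then amounts to evaluating the observed sequence under each word's model and choosing the word whose model gives the highest probability.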



70+

1971 DARPA (Defense Advanced Research Projects Agency) established the Speech Understanding Research (SUR) program to develop a computer system that could understand continuous speech. Lawrence Roberts, who initiated the program, spent $3 million per year of government funds for 5 years. Major SUR project groups were established at CMU, SRI, MIT's Lincoln Laboratory, Systems Development Corporation (SDC), and Bolt, Beranek, and Newman (BBN). It was the largest speech recognition project ever.
1978 The popular toy "Speak and Spell" by Texas Instruments was introduced. Speak and Spell used a speech chip, which led to huge strides in the development of more human-like digital synthesis sound.


80+

1982 Covox founded. The company brought digital sound (via The Voice Master, Sound Master and The Speech Thing) to the Commodore 64, Atari 400/800, and finally to the IBM PC in the mid ‘80s.
1982 Dragon Systems was founded by speech industry pioneers Drs. Jim and Janet Baker. Dragon Systems is well known for its long history of speech and language technology innovations and its large patent portfolio.
1984 SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.



90s
1993 Covox sells its products to Creative Labs, Inc.
1995 Dragon released discrete-word, dictation-level speech recognition software. It was the first time dictation speech recognition technology was available to consumers. IBM and Kurzweil followed a few months later.
1996 Charles Schwab is the first company to devote resources towards developing a speech recognition IVR (interactive voice response) system, with Nuance. The program, Voice Broker, allows up to 360 simultaneous customers to call in and get quotes on stocks and options; it handles up to 50,000 requests each day. The system was found to be 95% accurate and set the stage for other companies such as Sears, Roebuck and Co., United Parcel Service of America Inc., and E*Trade Securities to follow in their footsteps.
1996 BellSouth launches the world's first voice portal, called Val and later Info By Voice.



95+
1997 Dragon introduced "NaturallySpeaking", the first "continuous speech" dictation software available (meaning you no longer need to pause between words for the computer to understand what you're saying).
1998 Lernout & Hauspie bought Kurzweil. Microsoft invested $45 million in Lernout & Hauspie to form a partnership that would eventually allow Microsoft to use their speech recognition technology in its systems.
1999 Microsoft acquired Entropic, giving Microsoft access to what was known as the "most accurate speech recognition system".



2000

2000 Lernout & Hauspie acquired Dragon Systems for approximately $460 million.

2000 TellMe introduces the first world-wide voice portal.

2000 NetBytel launched the world's first voice enabler, which includes an online ordering application with real-time Internet integration for Office Depot.



2000s

2001 ScanSoft closes acquisition of Lernout & Hauspie speech and language assets.
2003 ScanSoft ships Dragon NaturallySpeaking 7 Medical, lowering healthcare costs through highly accurate speech recognition.
2003 ScanSoft closes deal to distribute and support IBM ViaVoice Desktop products.



Signal Variability
Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal.

The acoustic realisations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear.

These phonetic variabilities are exemplified by the acoustic differences of the phoneme /t/ in two, true, and butter in English.

At word boundaries, contextual variations can be quite dramatic: devo andare sounds like devandare in Italian.


What is a speech recognition system?

Speech recognition is generally used as a human-computer interface for other software. When it functions in this role, three primary tasks need to be performed:

Pre-processing, the conversion of spoken input into a form the recogniser can process.

Recognition, the identification of what has been said.

Communication, to send the recognised input to the application that requested it.



How is pre-processing performed?

To understand how the first of these functions is performed, we must examine:

Articulation, the production of the sound.

Acoustics, the stream of the speech itself.

Auditory perception, what characterises the ability to understand spoken input.



Neural networks
"if speech recognition systems could learn speech knowledge automatically and represent this knowledge in a parallel distributed fashion for rapid evaluation … such a system would mimic the function of the human brain, which consists of several billion simple, inaccurate and slow processors that perform reliable speech processing" (Waibel and Hampshire, 1989).

An artificial neural network is a computer program that attempts to emulate the biological functions of the human brain. Neural networks are excellent classification systems, and have been effective with noisy, patterned, variable data streams containing multiple, overlapping, interacting and incomplete cues (Markowitz, 1995).



Neural networks
Neural networks do not require the complete specification of a problem, learning instead through exposure to large amounts of example data. Neural networks comprise an input layer, one or more hidden layers, and one output layer. The way in which the nodes and layers of a network are organised is called the network's architecture.

The allure of neural networks for speech recognition lies in their superior classification abilities.

Considerable effort has been directed towards the development of networks to do word, syllable and phoneme classification.


The Phonetic Typewriter
Developed for Finnish (a phonetic language, written as it is said).
Trained on one speaker, it can generalise to others.

A neural network is trained to cluster together similar sounds, which are then labelled with the corresponding character.

When recognising speech, the sounds uttered are allocated to the closest corresponding output, and the character for that output is printed.

• requires a large dictionary of minor variations to correct the general mechanism

• noticeably poorer performance on speakers it has not been trained on
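The phonetic typewriter is usually described as being built on Kohonen's self-organising map. Leaving the map training itself aside, the labelling and recognition steps described above can be sketched as follows: each trained node is labelled with the character most often mapped to it, and each new sound is printed as the label of its closest node. The node vectors and two-dimensional "sound features" are hypothetical stand-ins for real acoustic spectra.

```python
from collections import Counter

def nearest_node(nodes, x):
    """Index of the node vector closest (squared Euclidean distance) to x."""
    return min(range(len(nodes)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(nodes[k], x)))

def label_nodes(nodes, samples, chars):
    """Give each node the character that occurs most often among samples mapped to it."""
    votes = [Counter() for _ in nodes]
    for x, ch in zip(samples, chars):
        votes[nearest_node(nodes, x)][ch] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def type_out(nodes, labels, utterance):
    """Print-out step: map each incoming sound to its closest node's character."""
    return "".join(labels[nearest_node(nodes, x)] or "?" for x in utterance)

# Hypothetical trained node vectors and labelled training sounds
nodes = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
samples = [[0.0, 1.0], [0.2, 0.8], [0.9, 0.1], [0.5, 0.6]]
chars = ["a", "a", "s", "m"]
labels = label_nodes(nodes, samples, chars)
print(labels)                                                # ['a', 's', 'm']
print(type_out(nodes, labels, [[0.85, 0.2], [0.15, 0.85]]))  # sa
```

The listed weakness on unseen speakers follows directly from this design: a new voice shifts the feature vectors, so sounds can land nearer to a neighbouring node with a different label.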



The Phonetic Typewriter (cont’d)
a a a ah h æ æ ø ø e e e

o a a h r æ l ø y y j i

o o a h r r r g g y j i

o o m a r m n m n j i i

l o u h v vm n n h hj j j

l u v v p d d t r h hi j

. . u v tk k p p p r k s

. . v k pt t p t p h s s


Character Recognition with Neural Networks

Recognition of both printed and handwritten characters is a typical domain where neural networks have been successfully applied.



A multilayer feedforward network can be used for printed character recognition.

For simplicity, you can limit your task to the recognition of digits from 0 to 9. Each digit is represented by a 5 × 9 bit map.

In commercial applications, where a better resolution is required, at least 16 × 16 bit maps are used.



Bit maps for digit recognition

1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45



How do we choose the architecture of a neural network?

The number of neurons in the input layer is decided by the number of pixels in the bit map. The bit map in our example consists of 45 pixels, and thus we need 45 input neurons.

The output layer has 10 neurons, one neuron for each digit to be recognized.
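With the pixels numbered row by row from 1 to 45, as in the bit map above, each digit becomes a 45-element input vector. The desired output is a 10-element vector containing a single 1. The bit pattern drawn for "1" below and the convention that output index k stands for digit k are assumptions made for illustration.

```python
# Hypothetical 5 x 9 bit map of the digit "1" ('#' = black pixel = 1)
DIGIT_1 = [
    "..#..",
    ".##..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    ".###.",
]

def to_input_vector(bitmap):
    """Flatten the 9 rows of 5 pixels, row by row, into 45 inputs (pixel 1 first)."""
    return [1 if ch == "#" else 0 for row in bitmap for ch in row]

def target_vector(digit):
    """Desired outputs: 1 for the neuron of the given digit, 0 for the other nine."""
    return [1 if k == digit else 0 for k in range(10)]

x = to_input_vector(DIGIT_1)
print(len(x), sum(x))    # 45 12
print(target_vector(1))  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Pixel n of the numbered grid ends up at position n - 1 of the list, so the input neurons line up one-to-one with the pixel numbers.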



How do we determine an optimal number of hidden neurons?

• Complex patterns cannot be detected by a small number of hidden neurons; however, too many of them can dramatically increase the computational burden.

• Another problem is overfitting. The greater the number of hidden neurons, the greater the ability of the network to recognise existing patterns. However, if the number of hidden neurons is too big, the network might simply memorise all training examples and then generalise poorly to new ones.
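The computational burden grows directly with the number of hidden neurons h. Assuming a fully connected 45-h-10 network with one bias per hidden and output neuron (an assumption about the network's exact form), the number of adjustable weights is (45 + 1)h + (h + 1)10 = 56h + 10. A few example sizes:

```python
def weight_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases in a fully connected network with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

for h in (2, 5, 10, 20):
    print(h, weight_count(45, h, 10))
```

This prints 122, 290, 570 and 1130 weights for 2, 5, 10 and 20 hidden neurons. Going from 5 to 20 hidden neurons therefore roughly quadruples the number of parameters to be trained, and every extra parameter is additional capacity the network could spend on memorising the training set.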



Neural network for printed digit recognition

(Figure: a network with 45 input neurons, a hidden layer of 5 neurons and an output layer of 10 neurons; the bit map of a digit is applied to input neurons 1–45, and the desired output is 1 for the neuron of that digit and 0 for the other nine.)
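To make the training step concrete, here is a compact back-propagation sketch in plain Python for a 45-5-10 network (5 hidden neurons, the size drawn in the slide's figure) with sigmoid units and a squared-error loss. The learning rate, number of epochs, random initialisation and the single training pattern are assumptions; a real digit recogniser would be trained on bit maps of all ten digits, and usually on noisy variants of them too.

```python
import math
import random

def sigmoid(v):
    """Logistic activation used by every neuron in this sketch."""
    return 1.0 / (1.0 + math.exp(-v))

class DigitNet:
    """Fully connected 45-5-10 network trained with plain back-propagation."""

    def __init__(self, n_in=45, n_hidden=5, n_out=10, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry: the bias (its input is fixed at 1.0)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        """Return (hidden activations, output activations) for input vector x."""
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient step on E = 0.5 * sum((y - t)^2); returns E before the step."""
        h, y = self.forward(x)
        xb, hb = list(x) + [1.0], h + [1.0]
        # Output error terms: dE/dnet_k = (y_k - t_k) * y_k * (1 - y_k)
        d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
        # Hidden error terms, propagated back through the (not yet updated) w2
        d_hid = [hj * (1 - hj) * sum(dk * self.w2[k][j] for k, dk in enumerate(d_out))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            self.w2[k] = [w - lr * dk * v for w, v in zip(self.w2[k], hb)]
        for j, dj in enumerate(d_hid):
            self.w1[j] = [w - lr * dj * v for w, v in zip(self.w1[j], xb)]
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))

    def predict(self, x):
        """Most active output neuron (assuming output k stands for digit k)."""
        _, y = self.forward(x)
        return max(range(len(y)), key=lambda k: y[k])

# Hypothetical training pattern: the digit "1" on the 5 x 9 grid, target neuron 1
rows = ["..#..", ".##..", "..#..", "..#..", "..#..",
        "..#..", "..#..", "..#..", ".###."]
x = [1 if ch == "#" else 0 for row in rows for ch in row]
t = [1 if k == 1 else 0 for k in range(10)]

net = DigitNet()
losses = [net.train_step(x, t) for _ in range(300)]
print(losses[0] > losses[-1], net.predict(x))  # True 1
```

The hidden-layer error terms are computed before the output weights are changed, which is what makes this ordinary gradient descent on the squared error rather than an approximation of it.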
