
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND

MACHINE LEARNING

MINI PROJECT

LIP READING USING A DEEP LEARNING MODEL

AND OPENCV

BY
ASHWIN KUMAR C (221501014)
HARIHARAN A (221501035)
ABSTRACT

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches
separated the problem into two stages: designing or learning visual features, and prediction. More recent deep
lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However,
existing work on models trained end-to-end performs only word classification, rather than sentence-level
sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton
& Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous
communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length
sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the
connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is
the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features
and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy on the sentence-level, overlapped
speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art
accuracy (Gergen et al., 2016).

KEY POINTS:

• Automated lipreading

• Classification with deep learning

• Sequence prediction in speech recognition


INTRODUCTION

Lipreading, the process of interpreting spoken language by visually observing the movements of a speaker's
mouth, is an essential aspect of human communication and speech comprehension. This skill is particularly
significant in situations where auditory cues are limited or absent, such as in noisy environments or for
individuals with hearing impairments. The importance of lipreading is underscored by the well-documented
McGurk effect (McGurk & MacDonald, 1976), a perceptual phenomenon where conflicting auditory and visual
speech signals result in the perception of a completely different phoneme. This effect highlights the complex
interplay between visual and auditory information in speech perception and underscores the challenges
involved in accurately interpreting spoken language through lipreading alone.

Lipreading is inherently a difficult task for humans, especially in the absence of contextual information. The
subtle and often ambiguous nature of lip movements, compounded by the fact that many visual speech cues
(such as those made by the lips, tongue, and teeth) are latent or partially obscured, makes accurate lipreading
a formidable challenge. Pioneering studies by Fisher (1968) and Woodward & Barber (1960) revealed that
certain visual phonemes, or "visemes," are frequently confused with one another, leading to a high rate of
errors in human lipreading. Fisher (1968), for example, identified five categories of visual phonemes from a list
of 23 initial consonant phonemes that are commonly mistaken for each other when viewed without auditory
input. These errors are often asymmetrical, with similar patterns observed for final consonant phonemes as
well.
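
As a rough illustration of why visual-only decoding is ambiguous, the snippet below groups a few phonemes into viseme classes whose members look nearly identical on the lips. The grouping shown is a simplified, illustrative assumption and does not reproduce Fisher's exact 1968 categories.

```python
# Illustrative viseme grouping (a simplified assumption, not Fisher's exact categories):
# phonemes within a group look nearly identical on the lips, which is why
# purely visual decoding frequently confuses them.
VISEME_GROUPS = {
    "bilabial":    ["p", "b", "m"],   # lips pressed together
    "labiodental": ["f", "v"],        # lower lip against upper teeth
    "dental":      ["th", "dh"],      # tongue tip visible between teeth
    "rounded":     ["w", "r"],        # rounded/protruded lips
}

def viseme_of(phoneme: str) -> str:
    """Return the viseme group of a phoneme, or 'other' if ungrouped."""
    for group, phonemes in VISEME_GROUPS.items():
        if phoneme in phonemes:
            return group
    return "other"

print(viseme_of("b"))  # -> "bilabial": visually hard to distinguish from "p" or "m"
```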

As a result, human lipreading performance is generally poor, even among individuals with extensive experience
in lipreading. Research by Easton & Basala (1982) demonstrated that hearing-impaired individuals achieve an
accuracy of only 17±12% when attempting to lipread a limited set of 30 monosyllabic words, and 21±11% for a
set of 30 compound words. These findings underscore the inherent limitations of human lipreading,
particularly when contextual information is scarce.

Given these challenges, there is a compelling need to automate the lipreading process, leveraging the power of
modern technology to overcome the limitations of human perception. The potential applications of automated
lipreading are vast and varied, encompassing areas such as improved hearing aids, silent dictation in public
spaces, enhanced security measures, robust speech recognition in noisy environments, biometric identification,
and the processing of silent films.

However, automating lipreading is a complex task, primarily due to the need to extract and interpret
spatiotemporal features from video sequences. Unlike traditional approaches, which often separate feature
extraction and prediction into distinct stages, recent advances in deep learning have enabled the development
of models that can learn these features end-to-end. Despite this progress, most
existing models have been limited to word-level classification, lacking the ability to perform sentence-level
sequence prediction—a crucial requirement for practical lipreading applications.
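
Since our project pairs the deep learning model with OpenCV, the sketch below illustrates one way the mouth region could be cropped from each video frame before feature learning. It is only a minimal illustration: LipNet's actual preprocessing uses facial landmark detection, and the "lower third of the face" heuristic, the crop size, and the video path here are assumptions.

```python
import cv2
import numpy as np

# Rough mouth-region extraction with OpenCV (a sketch, not LipNet's exact
# preprocessing). The crop heuristic, output size, and video path are illustrative.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_crops(video_path: str, size=(100, 50)):
    """Yield one resized grayscale mouth crop per frame of the video."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest detected face
        mouth = gray[y + 2 * h // 3 : y + h, x : x + w]      # lower third of the face
        yield cv2.resize(mouth, size)
    cap.release()

# Hypothetical usage:
# frames = np.stack(list(mouth_crops("speaker1_utterance.mpg")))  # shape (T, 50, 100)
```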

In this context, we introduce LipNet, a novel end-to-end model designed to address the challenges of sentence-
level lipreading. To the best of our knowledge, LipNet is the first model capable of simultaneously learning
spatiotemporal visual features and a sequence model to make sentence-level predictions. Drawing inspiration
from advancements in automatic speech recognition (ASR), LipNet operates at the character level and employs
spatiotemporal convolutional neural networks (STCNNs), recurrent neural networks (RNNs), and the
connectionist temporal classification (CTC) loss function (Graves et al., 2006) to achieve this goal.
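
To make this pipeline concrete, the following is a simplified PyTorch sketch of an STCNN followed by a bidirectional GRU and a CTC head. The layer sizes, frame dimensions, and vocabulary are illustrative assumptions and do not reproduce LipNet's published hyperparameters.

```python
import torch
import torch.nn as nn

class LipReader(nn.Module):
    """Simplified STCNN + bidirectional GRU + CTC head (illustrative sizes)."""
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        # Spatiotemporal convolutions: kernels span time as well as space.
        self.stcnn = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # pool only spatially
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        self.gru = nn.GRU(input_size=64 * 12 * 25, hidden_size=hidden,
                          num_layers=2, bidirectional=True, batch_first=True)
        # One extra output for the CTC blank symbol.
        self.fc = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, x):                      # x: (B, 1, T, 50, 100)
        feats = self.stcnn(x)                  # (B, 64, T, 12, 25)
        B, C, T, H, W = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(B, T, C * H * W)
        out, _ = self.gru(feats)               # (B, T, 2*hidden)
        return self.fc(out).log_softmax(-1)    # per-frame character log-probs

# Training-step sketch with the CTC loss (Graves et al., 2006); inputs are dummies.
model = LipReader(vocab_size=27)               # 26 letters + space (assumed vocabulary)
ctc = nn.CTCLoss(blank=27)                     # blank is the last index
video = torch.randn(2, 1, 75, 50, 100)         # 2 clips of 75 frames, 50x100 mouth crops
log_probs = model(video).permute(1, 0, 2)      # CTC expects (T, B, vocab+1)
targets = torch.randint(0, 27, (2, 20))        # dummy character indices
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 75, dtype=torch.long),
           target_lengths=torch.full((2,), 20, dtype=torch.long))
loss.backward()
```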

Our empirical evaluation on the GRID corpus (Cooke et al., 2006), one of the few publicly available datasets for
sentence-level lipreading, demonstrates the effectiveness of LipNet. The model achieves a remarkable 95.2%
sentence-level word accuracy on an overlapped speakers split—a benchmark task widely used in the lipreading
research community. This performance not only surpasses the previous state-of-the-art accuracy of 86.4%
reported by Gergen et al. (2016) for word-level classification but also generalizes well to unseen speakers,
achieving an accuracy of 88.6%.
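
Sentence-level word accuracy can be computed as one minus the word error rate, i.e. one minus the word-level edit distance between the reference and the predicted sentence, divided by the reference length. A minimal sketch follows; the example sentences are hypothetical GRID-style commands, not actual model output.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance via dynamic programming (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(dp[j] + 1,            # deletion
                      dp[j - 1] + 1,        # insertion
                      prev + (r != h))      # substitution (or match)
            prev, dp[j] = dp[j], cur
    return dp[-1]

def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER, computed on whitespace-tokenized words."""
    ref, hyp = reference.split(), hypothesis.split()
    return 1.0 - edit_distance(ref, hyp) / len(ref)

# Hypothetical GRID-style example (not real model output):
print(word_accuracy("place blue at f two now", "place blue at f two now"))  # 1.0
print(word_accuracy("place blue at f two now", "place blue in f two now"))  # ~0.833
```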

Furthermore, we compare LipNet's performance with that of hearing-impaired individuals tasked with
lipreading the same sentences from the GRID corpus. On average, these individuals achieve an accuracy of
52.3%, while LipNet attains 1.69 times that accuracy, underscoring the model's potential to outperform
human lipreaders in practical applications.

To further understand LipNet's decision-making process, we apply saliency visualization techniques (Zeiler &
Fergus, 2014; Simonyan et al., 2013) to interpret the model's learned behavior. These visualizations reveal that
LipNet focuses on phonologically important regions in the video, confirming that the model is effectively
capturing the relevant visual cues for accurate lipreading. Additionally, by analyzing intra-viseme and inter-
viseme confusion matrices at the phoneme level, we observe that the majority of LipNet's errors occur within
viseme categories, suggesting that while the model is highly effective, some ambiguities remain when
contextual information is insufficient for disambiguation.
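
The saliency technique of Simonyan et al. (2013) can be sketched as computing the gradient of the predicted scores with respect to the input pixels; frames and regions with large gradient magnitude are the ones that most influence the prediction. The sketch below assumes a model such as the LipReader sketch shown earlier and is illustrative, not LipNet's exact visualization code.

```python
import torch
import torch.nn as nn

def saliency_map(model: nn.Module, video: torch.Tensor) -> torch.Tensor:
    """Gradient-based saliency (Simonyan et al., 2013): |d score / d pixel|.

    `video` has shape (1, 1, T, H, W); returns a (T, H, W) map whose large
    values mark the frames and pixels that most affect the model's output.
    """
    video = video.clone().requires_grad_(True)
    log_probs = model(video)                 # (1, T, vocab+1)
    # Sum the most confident class score at every time step and
    # backpropagate it to the input pixels.
    score = log_probs.max(dim=-1).values.sum()
    score.backward()
    return video.grad.abs().squeeze(0).squeeze(0)   # (T, H, W)

# Hypothetical usage with the LipReader sketch above:
# sal = saliency_map(model, torch.randn(1, 1, 75, 50, 100))
# sal.mean(dim=(1, 2)) then ranks frames by how strongly they drive the prediction.
```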

In summary, LipNet represents a significant advancement in the field of automated lipreading, offering a
powerful tool for sentence-level speech interpretation from visual input. Its high accuracy, ability to generalize
across speakers, and interpretability make it a promising solution for a wide range of practical applications in
communication, security, and beyond.
