Int. J. Advanced Networking and Applications
Volume: 10 Issue: 01 Pages: 3728-3734 (2018) ISSN: 0975-0290
Performance Analysis of Audio and Video
Synchronization using Spreaded Code Delay
Measurement Technique
A.Thenmozhi
PG scholar, Department of ECE, Anna University, Chennai
Email: thenmozhi94.a@gmail.com
Dr.P.Kannan
Professor & HOD, Department of ECE, Anna University, Chennai
Email: deepkannan@yahoo.co.in
-------------------------------------------------------------------ABSTRACT---------------------------------------------------------------
Audio and video synchronization plays an important role in speech recognition and multimedia
communication. Audio-video sync is a significant problem in live video conferencing, caused by the variable
delays introduced by the various hardware components and software environments. The objective of
synchronization is to preserve the temporal alignment between the audio and video signals. This paper
proposes audio-video synchronization using a spreading-code delay measurement technique. The performance
of the proposed method is evaluated on a home database, achieving 99% synchronization efficiency. The audio-visual
signature technique provides a significant reduction in audio-video sync problems and enables an effective
performance analysis of audio and video synchronization. This paper also implements an audio-video synchronizer
and analyses its performance in terms of synchronization efficiency, audio-video time drift and
audio-video delay. The simulation is carried out using MATLAB and Simulink. The system
automatically estimates and corrects the timing relationship between the audio and video signals and
maintains the Quality of Service.
Keywords-Audio spreading codes, Hamming distance correlation, Spectrograph, Synchronization, Video spreading
codes.
--------------------------------------------------------------------------------------------------------------------------------------------------
Date of Submission: April 17, 2018 Date of Acceptance: June 23, 2018
--------------------------------------------------------------------------------------------------------------------------------------------------
1. INTRODUCTION
Audio and video synchronization is defined as the
relative temporal alignment between the sound (audio)
and the image (video) during transmission and reception.
It is also known as audio-video sync, A/V sync or
Audio/Video sync. Lip synchronization (lip sync or lip
synch) refers to synchronizing the voice with the lip
movements. Humans can detect a misalignment between
the audio and the corresponding video presentation of
less than 100 ms, which is why lip sync errors are noticeable.
Lip sync is a significant problem in the digital television industry,
filming, music, video games and multimedia applications.
It is corrected and maintained by audio-video
synchronizers. In multimedia technology, audio and
video synchronization plays an important role in
keeping audio and video streams aligned. With the
advancement of interactive multimedia applications,
distinct multimedia services such as content-on-demand,
visual collaboration, video telephony, distance
education and e-learning are in huge demand. In
multimedia applications, audio-visual streams are
stored, transmitted, received and broadcast. During an
interaction, the timing relations between the audio and video
streams have to be preserved in order to provide the finest
perceptual quality.
2. PROPOSED METHODOLOGY
The proposed framework automatically measures and
maintains the synchronization between audio
and video using audio-visual spreading codes. Fig.1
shows the proposed framework for audio and video
synchronization based on audio-visual spreading codes.
During transmission, the audio and video signals are
processed individually. The audio spreading code is
extracted from the spectrograph of the input audio, which
is broken up into chunks. The spectrogram is a visual
way of representing the spectrum of sounds and can be
used to display the spoken word phonetically; it is also
called a spectral waterfall, voice gram or voiceprint. The
video spreading code is computed from the absolute
difference between consecutive video frames: the input
video is broken up into frames, and the frame differences
are reduced to a coarse absolute-difference image. The audio-visual
signature (A/V sync signature) is based on the content and
does not change excessively. It is an authentication
mechanism formed by taking a hash of the original
audio-video streams. The robust hash filters out small
changes introduced by signal processing and keeps the audio-
visual spreading codes small. It is based on the difference
between successive audio and video frames. Within the
communication network, the audio-video streams
encounter different signal processing steps, namely audio
compression, video compression, format conversion, audio
down-sampling, video down-sampling and so on, and the
relative temporal alignment between the audio and video
signals may be altered.
Fig.1. Audio-Video sync using audio-visual spreading codes.
Fig.2. Audio-Video Synchronizer.
At the detector, the processed audio and video spreading codes are
extracted from the processed audio and video streams.
During synchronization, the processed audio and video
spreading codes are compared with the corresponding
input audio and video signatures using Hamming distance
correlation. The output of the Hamming distance comparison is used to
estimate the temporal misalignment between the audio-
visual streams. Finally, the measured delays are used to
correct the relative misalignment between the audio-visual
streams.
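The comparison and delay-estimation step can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB/Simulink implementation; the code length, the number of frames and the lag window are assumed values. It slides one spreading-code sequence over the other and returns the lag with the lowest mean Hamming distance, which is the estimated A/V misalignment in frames.

    import numpy as np

    def hamming_distance(a, b):
        # Number of differing bits between two equal-length binary codes.
        return int(np.count_nonzero(a != b))

    def estimate_delay(input_codes, processed_codes, max_lag=30):
        # Slide the processed codes over the input codes and return the
        # lag (in frames) that minimizes the mean Hamming distance.
        best_lag, best_score = 0, float("inf")
        for lag in range(-max_lag, max_lag + 1):
            dists = [hamming_distance(input_codes[i], processed_codes[i + lag])
                     for i in range(len(input_codes))
                     if 0 <= i + lag < len(processed_codes)]
            if dists and np.mean(dists) < best_score:
                best_lag, best_score = lag, float(np.mean(dists))
        return best_lag

    rng = np.random.default_rng(0)
    codes = rng.integers(0, 2, size=(100, 16))   # 100 frames, 16-bit codes
    delayed = np.roll(codes, 5, axis=0)          # simulate a 5-frame delay
    print(estimate_delay(codes, delayed))        # prints 5

The sign of the recovered lag tells the time-alignment stage which stream to delay.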
The test content used for the performance assessment of
the system consisted of A/V clips of a variety of content
types such as scripted dramas, animation programs, music
concerts, news programs, sports and live music. The input
audio and video are taken from the recorded dataset for
audio and video synchronization. The input video is
divided into frames. A low-pass filter (LPF) passes
low-frequency signals and attenuates frequencies above its
cutoff frequency. It suppresses high pitches and removes
short-term fluctuations in the audio-video signals, producing
a smoother form of the signal.
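As a concrete illustration of the low-pass filtering stage, the sketch below applies a zero-phase Butterworth filter with SciPy. The paper does not specify the filter design, so the Butterworth type, the fourth order and the 3.4 kHz cutoff are assumptions.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def lowpass(x, cutoff_hz, fs_hz, order=4):
        # Pass frequencies below cutoff_hz and attenuate those above it.
        b, a = butter(order, cutoff_hz / (fs_hz / 2))   # normalized cutoff
        return filtfilt(b, a, x)                        # zero-phase filtering

    fs = 8000                                  # assumed 8 kHz audio rate
    t = np.arange(fs) / fs                     # one second of samples
    noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(fs)
    smooth = lowpass(noisy, 3400, fs)          # removes short-term fluctuations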
An analog-to-digital converter (ADC) converts an input
analog voltage or current to a digital number representing the
magnitude of that voltage or current. It converts a continuous-time,
continuous-amplitude analog signal into a discrete-time,
discrete-amplitude digital signal. The delay line introduces a
specific delay into the audio and video signal transmission
paths. The synchronizer is a variable audio delay used to
correct and maintain the audio and video
synchronization (timing).
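A minimal sketch of the ADC and delay-line behaviour described above; the 8-bit resolution and the ±1 V input range are assumptions, since the paper does not give the converter's parameters.

    import numpy as np

    def adc(x, n_bits=8, v_range=1.0):
        # Uniformly quantize a continuous-amplitude signal into 2**n_bits
        # discrete levels, mimicking an analog-to-digital converter.
        levels = 2 ** n_bits
        clipped = np.clip(x, -v_range, v_range)
        scaled = (clipped + v_range) / (2 * v_range) * (levels - 1)
        return np.round(scaled).astype(int)

    def delay_line(x, delay_samples):
        # Introduce a fixed delay on the transmission path by prepending
        # zeros; the synchronizer varies such a delay to keep A/V aligned.
        return np.concatenate([np.zeros(delay_samples), x])

    digital = adc(np.sin(np.linspace(0, 2 * np.pi, 48)))
    shifted = delay_line(digital, 10)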
3. EXPERIMENTAL RESULTS
A/V CONTENT
The test content used for the performance assessment of
the system consisted of 5-second A/V clips of a variety of
content types such as scripted dramas, talk programs,
sports and live music. Fig.3 shows the input audio and
video taken from the recorded dataset for audio and
video synchronization. The frame number is given as
input for synchronization purposes.
Fig.3. Input audio and video.
VIDEO FRAME
The input video is divided into frames to generate the
video spreading codes. The video is converted at 30
frames/second, and 74 frames in total are available in
the input video. Fig.4 shows the frame conversion
of the input video.
Fig.4. The frame conversion.
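The frame conversion can be sketched with OpenCV as below; the file name "input.avi" is a placeholder for the recorded dataset, not its actual file name.

    import cv2  # OpenCV

    def video_to_frames(path):
        # Read a video file and return its frames as a list of BGR images.
        frames, cap = [], cv2.VideoCapture(path)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
        cap.release()
        return frames

    frames = video_to_frames("input.avi")
    print(len(frames))   # e.g. 74 frames for a 5-second clip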
AUDIO SPREADING CODES GENERATION
The audio spreading code is primarily based on projecting
a coarse representation of the spectrograph onto random vectors.
Fig.5 shows the audio spreading code generation using
the spectrograph.
Fig.5. The audio spreading codes extraction using
spectrogram.
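A minimal Python sketch of this step projects a log-spectrogram onto fixed pseudo-random vectors and keeps the signs as code bits; the FFT window size, the 16-bit code length and the random seed are illustrative assumptions.

    import numpy as np
    from scipy.signal import spectrogram

    def audio_spreading_codes(audio, fs, n_bits=16, seed=0):
        # Coarse spectrogram: one column of band energies per audio chunk.
        f, t, S = spectrogram(audio, fs, nperseg=256, noverlap=128)
        S = np.log1p(S)                            # compress dynamic range
        # Project onto fixed random vectors; the same seed must be used at
        # the transmitter and at the detector so the codes are comparable.
        R = np.random.default_rng(seed).standard_normal((n_bits, S.shape[0]))
        return (R @ S > 0).astype(np.uint8).T      # one n_bits code per chunk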
VIDEO SPREADING CODES GENERATION
The video spreading code is based on a coarse illustration of
the difference image between two consecutive frames. Fig.6
shows the video spreading code generation.
Fig.6. The video spreading codes generation.
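Correspondingly, a sketch of the video spreading code: the absolute difference of consecutive frames is averaged over a coarse grid and thresholded into bits. The 4x4 grid and the median threshold are assumptions, not the paper's parameters.

    import numpy as np

    def video_spreading_codes(frames, grid=(4, 4)):
        # One binary code per consecutive-frame pair, derived from a
        # coarse absolute-difference image.
        codes = []
        for prev, cur in zip(frames[:-1], frames[1:]):
            diff = np.abs(cur.astype(np.int16) - prev.astype(np.int16))
            if diff.ndim == 3:
                diff = diff.mean(axis=2)           # collapse color channels
            gh, gw = diff.shape[0] // grid[0], diff.shape[1] // grid[1]
            coarse = diff[:gh * grid[0], :gw * grid[1]] \
                .reshape(grid[0], gh, grid[1], gw).mean(axis=(1, 3))
            codes.append((coarse > np.median(coarse)).astype(np.uint8).ravel())
        return codes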
HAMMING VIDEO AND HAMMING AUDIO
The Hamming distance correlation is used to calculate the
temporal misalignment between the audio-visual streams, from
which the quality of the audio-video synchronization can be
measured. Fig.7 shows the Hamming video correlation and
Fig.8 shows the Hamming audio correlation.
Fig.7. The Hamming video.
Fig.8. The Hamming audio.
Table.1. The Hamming distance for video and audio.

INPUT IMAGE   ENCODE IMAGE   HAMMING VIDEO   HAMMING AUDIO
1010 1010 0 1
1010 1010 0 1
1010 1010 0 1
1010 1010 1 1
1110 1110 0 1
1010 1010 0 1
1010 1010 0 1
1110 1110 0 1
1110 1101 2 1
1010 1010 0 1
1110 1110 0 1
1110 1010 1 1
1001 1001 0 1
1001 1001 0 1
1010 1100 2 1
1001 1101 1 1
1110 1110 0 1
1110 1110 0 1
1011 1001 1 1
1000 1000 0 1
1000 1000 0 1
1011 1010 1 1
1011 1011 0 1
From Table 1, the Hamming distances for the video and
audio codes can be read off. The Hamming distance counts the
bit positions in which two binary codes differ; the same measure
underlies Hamming error-correction codes, which detect and
correct bit errors. Here it is used to find the misalignment
between the audio and video streams.
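For instance, the distances in Table 1 follow directly from counting differing bit positions, as this small helper illustrates:

    def hamming(a: str, b: str) -> int:
        # Count the bit positions in which the two codes differ.
        return sum(x != y for x, y in zip(a, b))

    print(hamming("1110", "1101"))   # 2, as in Table 1: codes misaligned
    print(hamming("1010", "1010"))   # 0: codes match, streams aligned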
TIME ALIGNMENT
The estimated relative misalignment is used to restore the
alignment between the audio and video streams that
was present before processing. It aligns the audio and
video frames appropriately. Fig.9 shows the
relative time alignment between the audio and video
streams. The decoder recovers the video frame that was
given as input in Fig.3 with proper time alignment
between the input and processed video frames. Fig.10
shows the decoded input video frame.
Fig.9. The audio-video stream time alignment.
Fig.10. The decoded video frame.
AUDIO – VIDEO SYNCHRONIZATION
Fig.11 shows the audio and video synchronization using the
signatures. A/V sync using spreading codes provides
perfect synchronization between the corresponding audio
and video streams. Finally, the reliability measures, along
with the estimated delays, can be used to detect and correct
the relative misalignment between the A/V streams, and thus
to maintain the audio-video sync accuracy.
Fig.11. The audio – video synchronization.
AV SIGNALS
The test content used for the performance assessment of
the system consisted of 5-second A/V clips. Fig.12
shows the input audio taken from the recorded dataset
for audio and video synchronization; the audio is processed
every 10 msec.
Fig.12 Input audio signal.
Fig.13 Input video signal.
Fig.13 shows the input video taken from the recorded
dataset for audio and video synchronization. Each
video frame plays for 3 msec. The input video is divided into
50 frames/second, and 74 frames in total are
available in the input video.
NOISE REMOVAL
Fig.14 shows the audio low-pass-filtered output. The filter
passes the frequencies below the cutoff frequency while the
high frequencies in the input signal are attenuated.
Fig.14 Audio low-pass filter output.
Fig.15 Video low-pass filter output.
Fig.15 shows the video low-pass-filtered output. The filter
likewise passes the frequencies below the cutoff frequency and
attenuates the high frequencies in the input signal.
ANALOG TO DIGITAL CONVERSION
Fig. 16 Audio Analog to Digital converter.
Fig.16 shows the audio analog-to-digital converter output.
The ADC converts the analog audio signal into a digital
signal representing the amplitude of the voltage.
Fig.17 Video Analog to Digital converter.
Fig.17 shows the video analog-to-digital converter output.
The ADC converts the analog video signal into a digital
signal representing the amplitude of the voltage.
A/V SYNCHRONIZATION
The objective of synchronization is to line up the
audio and video signals, which are processed individually.
Fig. 18 A/V signal Synchronization.
Fig.18 shows the A/V signal synchronization. It aligns
both the audio and video signals, guaranteeing that the
audio and video streams match after processing.
4. PERFORMANCE ANALYSIS OF AUDIO-
VIDEO SYNCHRONIZATION
SYNCHRONIZATION EFFICIENCY
Synchronization efficiency measures, as a percentage, the
consistency between the source A/V content and the
processed (target) A/V content. If the synchronization
efficiency is high, the audio and video are perfectly
synchronized; otherwise, the synchronization is poor. It is
expressed as

η = Pout / Pin (1)

where
η = the synchronization efficiency,
Pout = the synchronized A/V stream,
Pin = the unsynchronized A/V stream.
AUDIO–VIDEO TIME DRIFT
Time drift is defined as the amount of time by which the
audio departs from perfect synchronization with the video:
a positive value indicates that the audio leads the video,
while a negative value indicates that the audio lags the video.
The audio-video time drift can be represented as

tA/V = tr - tp (2)

where
tA/V = the A/V time drift,
tr = the source time,
tp = the deviation time.
AUDIO TO VIDEO DELAY
The audio-to-video delay refers to the relative time-alignment
delay between the audio and video streams. The
amount of visual data is much larger than the amount of
audio data, so the delays imposed on the audio and video
streams are typically unequal. The remedy is to add fixed
delays so that the audio matches the video delay. Finally,
the estimated delays are used to correct the relative
misalignment between the audio and video streams. The
A/V delay is given as

D = t ± t0 (3)

where
D = the audio-video delay,
t = the audio/video time,
t0 = the extra audio/video time.
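As a worked illustration of equations (1)-(3), the snippet below evaluates all three metrics; the numbers are illustrative only, chosen to echo Table 2 rather than measured, and the '+' branch of equation (3) is taken.

    def sync_metrics(p_in, p_out, t_r, t_p, t, t0):
        efficiency = p_out / p_in     # (1): fraction of stream synchronized
        drift = t_r - t_p             # (2): positive = audio leads the video
        delay = t + t0                # (3): base time plus extra fixed delay
        return efficiency, drift, delay

    # 99 of 100 stream units synchronized; 16 ms drift and 16 ms delay:
    print(sync_metrics(100, 99, 0.016, 0.0, 0.016, 0.0))
    # -> (0.99, 0.016, 0.016)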
Table.2. Performance analysis for audio and video synchronization.

PARAMETER                       A/V CONTENT
Synchronization efficiency      99 %
Audio-video sync time drift     16 ms
Audio-to-video delay            16 ms
From Table 2, it can be seen that the synchronization
efficiency is very high, while the audio-video sync time
drift and the audio-to-video delay are very small.
5. CONCLUSION
Thus the audio and video synchronization using the spreading-code
technique was implemented and its performance
was analyzed. The proposed system automatically
estimates and preserves the synchronization between the
audio and video streams, and it maintains the perceptual
quality of audio and video. The method provides high
accuracy and low computational complexity. The
experimental results show that the process is reliable,
quite simple, and applicable to real-world multimedia
applications as well as offline applications. It is suitable
for content distribution networks, communication networks
and traditional broadcast networks. In future work, the
proposed framework will be developed with modified
structures to provide a vast improvement in real-time
applications. Future work may also include improving the
signature matching and thus increasing the synchronization
rate. We also want to automate the detection of time drift
to achieve a completely unsupervised synchronization
process.
6. ACKNOWLEDGMENT
First, I thank the Lord Almighty for giving me the knowledge
to complete this work. I would like to thank my professors,
colleagues, family and friends who encouraged and helped
me in preparing this paper.
Author Details
A. Thenmozhi (S. Anbazhagan)
completed the B.E. (Electronics and
Communication Engineering) in 2016
at Anna University, Chennai. She has
published two papers in national and
international conference proceedings. Her areas of interest
include Electronic System Design, Signal Processing,
Image Processing and Digital Communication.
Dr. P. Kannan (Pauliah Nadar
Kannan) received the B.E. degree from
Manonmaniam Sundarnar University,
Tirunelveli, India, in 2000, the M.E.
degree from Anna University,
Chennai, India, in 2007, and the Ph.D.
degree from Anna University, Chennai, Tamil Nadu,
India, in 2015. He has been a Professor with the Department
of Electronics and Communication Engineering, PET
Engineering College, Vallioor, Tirunelveli District, Tamil
Nadu, India. His current research interests include
computer vision, biometrics, and Very Large Scale
Integration architectures.
