21AIE315 - AI IN SPEECH PROCESSING
DISTINCTIVE SPEAKER IDENTIFICATION
USING CNN
TEAM 13
CH.EN.U4AIE21011 – Chennu Chaitanya
CH.EN.U4AIE21016 – Hemanth
CH.EN.U4AIE21052 – Sai Akshay
ABSTRACT:
CNNVoiceDetect applies deep learning to speaker identification tasks, utilizing a Convolutional Neural Network
(CNN) based model.
The study lays a foundation for further research in voice-based authentication systems and establishes a reliable way
to identify speakers from audio data.
The project aims to create an efficient and reliable speaker identification system applicable in various domains such
as security and speech recognition.
The CNN architecture incorporates residual blocks for feature extraction and pooling layers for dimensionality
reduction, enabling efficient processing of audio data. CNNVoiceDetect thus offers a complete solution for speaker
identification built on deep learning methods.
Future research may involve exploring optimization tactics, increasing speaker diversity in the dataset, and
implementing the model in real-world settings for enhanced security and user authentication.
PROBLEM STATEMENT
The problem addressed in this research is the development of a speaker identification system capable of
accurately distinguishing between different speakers based on audio samples.
Traditional methods often face challenges such as limited scalability, susceptibility to noise, and dependence
on handcrafted features.
Addressing these limitations, CNNVoiceDetect aims to leverage deep learning techniques to overcome these
challenges and achieve robust speaker identification performance across diverse datasets and environmental
conditions.
OBJECTIVE
The aim of the project is to develop and evaluate deep learning models, particularly Convolutional Neural
Networks (CNNs), for speaker identification and verification tasks.
The project aims to address the challenges of speaker recognition under various conditions, including noisy
environments and unconstrained audio data. By leveraging large-scale datasets such as VoxCeleb1 and
VoxCeleb2, the project seeks to achieve high accuracy rates in identifying speakers and distinguishing between
them.
Additionally, the project aims to explore different methodologies for fine-tuning pre-trained models and
evaluating their performance using metrics such as accuracy, Equal Error Rate (EER), and precision.
Ultimately, the goal is to contribute to the advancement of speaker recognition technology, with potential
applications in voice authentication, security systems, and other audio-based tasks.
LITERATURE REVIEW:
Deep Speaker Embeddings for Short-Duration Speaker Verification (2017)
Objective: Leverage deep neural networks for speaker verification with short-duration recordings, comparing deep embeddings with traditional i-vectors and advocating for treating speech as images for improved recognition.
Technology Used: Deep neural networks, specifically convolutional and fully-connected attention models, that learn speaker embeddings directly from time-frequency speech representations.

Unraveling Adversarial Examples against Speaker Identification – Techniques for Attack Detection and Victim Model Classification (2024)
Objective: Address the threat posed by adversarial examples to speaker recognition systems.
Technology Used: LightResNet34 and ECAPA-TDNN for attack classification and detection; Support Vector Machines (SVMs) with features extracted from the wav2vec 2.0 model.

Speaker Recognition for Multi-Speaker Conversations Using X-Vectors (2019)
Objective: Address speaker recognition in multi-speaker conversations by combining deep neural network (DNN) embeddings, specifically x-vectors, with speaker diarization techniques.
Technology Used: Deep Neural Networks (DNNs), particularly x-vectors, known for their effectiveness in both speaker recognition and diarization tasks.
VoxCeleb2: Deep Speaker Recognition (2018)
Objective: Speaker recognition under noisy and unconstrained conditions.
Technology Used: Deep neural network models trained on the VoxCeleb2 dataset that effectively recognize speaker identities from voice under various conditions.

Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020 (2020)
Objective: Present Clova's baseline system for the VoxCeleb Speaker Recognition Challenge 2020, focusing on ResNet-based models.
Technology Used: ResNet architecture, incorporating techniques such as self-attentive pooling (SAP) and Attentive Statistics Pooling (ASP) for feature aggregation.

Fine-Tuning wav2vec2 for Speaker Recognition (2021)
Objective: Explore applying the wav2vec2 framework to speaker recognition instead of speech recognition.
Technology Used: The wav2vec2 framework, which was originally designed for speech recognition.
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers (2020)
Objective: Propose a unified model for speaker-attributed automatic speech recognition (SA-ASR) that addresses the challenges posed by overlapped speech.
Technology Used: An end-to-end SA-ASR model that jointly performs speaker counting, multi-speaker speech recognition, and speaker identification.

Strategies for Improving Speaker Discrimination in Target Speech Extraction (2020)
Objective: Enhance the speaker discrimination capability of SpeakerBeam for target speech extraction.
Technology Used: Time-domain implementation of SpeakerBeam (TD-SpeakerBeam), utilization of spatial features, and multi-task learning with SI-loss.
PROPOSED WORK:
Data Collection: The primary dataset utilized in this study is VoxCeleb2, a large-scale audio-visual speaker recognition
dataset consisting of over a million utterances from more than 6,000 speakers, collected from open-source media.
Additionally, noise samples are obtained from various sources to augment the dataset for robustness testing.
Data Preprocessing: Audio data undergoes preprocessing to ensure uniformity and compatibility with the model. This includes
resampling audio files to a consistent sample rate of 16 kHz, segmenting longer audio recordings into shorter clips, and augmenting data
by adding noise to simulate real-world conditions.
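As a concrete illustration of this step, the following minimal sketch (assuming the librosa library; the 3-second clip length and 10 dB signal-to-noise ratio are illustrative choices, not values taken from this project) resamples a recording to 16 kHz, segments it into fixed-length clips, and mixes in noise:

import numpy as np
import librosa

TARGET_SR = 16000      # consistent 16 kHz sample rate, as described above
CLIP_SECONDS = 3       # assumed fixed clip length for segmentation

def load_and_segment(path, clip_seconds=CLIP_SECONDS):
    """Resample an audio file to 16 kHz and split it into fixed-length clips."""
    audio, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    clip_len = TARGET_SR * clip_seconds
    n_clips = len(audio) // clip_len   # drop the short trailing remainder
    return [audio[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def add_noise(clip, noise, snr_db=10.0):
    """Mix a noise sample into a clip at a given signal-to-noise ratio (dB)."""
    noise = np.resize(noise, clip.shape)   # tile or crop the noise to clip length
    clip_power = np.mean(clip ** 2) + 1e-10
    noise_power = np.mean(noise ** 2) + 1e-10
    scale = np.sqrt(clip_power / (noise_power * 10 ** (snr_db / 10)))
    return clip + scale * noise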
Model Development: The core of the methodology involves the development of CNN-based models for speaker recognition. The
model architecture consists of multiple layers of convolutional, pooling, and fully connected layers, designed to extract relevant features
from input audio spectrograms and classify them into speaker identities. The model architecture is based on prior research and
experimentation to optimize performance.
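A minimal Keras-style sketch of this kind of architecture is shown below; the layer sizes, input spectrogram shape, and number of speakers are illustrative assumptions rather than the exact configuration used in the project:

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SPEAKERS = 1251          # assumed; set to the number of enrolled speakers
INPUT_SHAPE = (64, 300, 1)   # assumed (mel bands, time frames, channels)

def build_speaker_cnn():
    """Stacked conv/pool layers over spectrograms, then fully connected layers."""
    model = models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),                # pooling for dimensionality reduction
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),  # fully connected feature layer
        layers.Dense(NUM_SPEAKERS, activation="softmax"),  # speaker identities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model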
PROPOSED WORK:
The overall pipeline: Data Collection & Preprocessing → Dataset Generation → Model Training → Model Evaluation → Prediction.
PROPOSED WORK:
Model Training: The residual_block function allows for the effective training of deep neural networks by enabling the learning of
complex features while mitigating the vanishing gradient problem. It promotes better information flow and gradient propagation through
the network, leading to improved performance and convergence during training. The convolutional layers within the residual block are
responsible for learning spatial hierarchies of features within the input data. Activation functions introduce non-linearity into the model,
enabling it to learn complex mappings between inputs and outputs.
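A minimal sketch of such a residual_block, assuming a Keras functional-style implementation (the project's exact filter sizes and normalization details may differ):

from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """Two convolutional layers plus a skip connection to ease gradient flow."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)                       # non-linearity for complex mappings
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Project the shortcut when the shape changes so the addition is valid.
    if stride != 1 or shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(shortcut)
    y = layers.Add()([shortcut, y])            # skip connection aids gradient propagation
    return layers.ReLU()(y)

The skip connection lets gradients bypass the convolutional layers during backpropagation, which is what allows deeper stacks of such blocks to converge.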
Model Evaluation: The trained models are evaluated using various metrics such as accuracy, Equal Error Rate (EER), and
precision-recall curves. Evaluation is performed on separate validation and test datasets to assess the generalization capability of the
models.
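For the EER in particular, a minimal sketch of how it can be computed from verification trial scores, assuming scikit-learn (the toy labels and scores below are purely illustrative):

import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the operating point where false-accept and false-reject rates meet."""
    fpr, tpr, _ = roc_curve(labels, scores)   # labels: 1 = genuine, 0 = impostor
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))     # threshold where the two rates cross
    return (fpr[idx] + fnr[idx]) / 2

labels = np.array([1, 1, 0, 0, 1, 0])         # toy genuine/impostor trials
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.6, 0.3])
print(f"EER: {equal_error_rate(labels, scores):.2%}")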
RESULTS:
The results obtained from the conducted experiments showcase the performance of the proposed CNN-based models for
speaker recognition.
The models achieved a commendable accuracy of 95.4% on the test dataset, indicating their ability to effectively identify
speakers. Despite the high accuracy, the models exhibited a 7% Equal Error Rate (EER), suggesting a moderate level of
error in distinguishing between genuine and impostor speakers.
Additionally, the precision of the models was measured at 87%, highlighting their capability to correctly identify true
positives while minimizing false positives.
The model ultimately predicts speaker identities for unseen samples from the test dataset in near real time.
CONCLUSION:
The project successfully implements speaker identification using deep neural networks, showcasing the effectiveness of
deep embeddings compared to traditional methods.
The model achieves high accuracy and demonstrates the potential for real-world applications in speaker recognition
systems.
Robustness against variations in speech recordings and background noise levels.
Scalability to accommodate a larger number of speakers and datasets.
Adaptability for fine-tuning or retraining with additional data.
REFERENCES:
[1] Gautam Bhattacharya, Md Jahangir Alam, Patrick Kenny (2017). Deep Speaker Embeddings for Short-Duration Speaker Verification. In Proc. Interspeech 2017.
[2] Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak (2024). Unraveling Adversarial Examples against Speaker Identification – Techniques for Attack Detection and Victim Model Classification. arXiv:2402.19355. https://arxiv.org/html/2402.19355v1
[3] David Snyder, Daniel Garcia-Romero, Gregory Sell (2019). Speaker Recognition for Multi-Speaker Conversations Using X-Vectors. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Joon Son Chung, Arsha Nagrani, Andrew Zisserman (2018). VoxCeleb2: Deep Speaker Recognition. In Proc. Interspeech 2018.
[5] Hee Soo Heo, Bong-Jin Lee, Jaesung Huh, Joon Son Chung (2020). Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020. In International Conference on Text, Speech, and Dialogue (pp. 423-436). Cham: Springer International Publishing.
[6] Nik Vaessen, David A. van Leeuwen (2022). Fine-Tuning wav2vec2 for Speaker Recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://ieeexplore.ieee.org/document/9413520/
[7] Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang (2021). In 2021 IEEE Spoken Language Technology Workshop (SLT).
[8] Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki (2020).
[9] Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, Boris Ginsburg (2020). arXiv:2010.12653 [eess.AS], 23 Oct 2020.
Thank You