
International Journal of Computer Science Trends and Technology (IJCST) – Volume 10 Issue 3, May-Jun 2022

RESEARCH ARTICLE OPEN ACCESS

Automatic Speaker Recognition: A Survey


Rizwan K Rahim [1], Tharikh Bin Siyad [2], Muhammed Ameen M.A [3], Muhammed Salim K.T [4], Selin M [5]
[1]-[5] Computer Science and Engineering, APJ Abdul Kalam Technological University – India

ABSTRACT
Speaker recognition is the task of identifying persons from their voices. Recently, deep learning has dramatically revolutionized speaker recognition. This paper reviews several major subtasks of speaker recognition, including speaker verification, identification, and robust speaker recognition, with a focus on deep learning-based methods. An Automatic Speaker Recognition system is a biometric system that identifies and verifies people using the voice as a discriminatory feature. Automatic Speaker Recognition (ASR) using an Autoencoder is discussed here. This paper presents the deep learning methodologies for ASR, followed by different feature extraction techniques. Then the Autoencoder technology, its working and architecture, and how ASR works using deep learning are discussed. Finally, robust speaker recognition is surveyed from the perspectives of domain adaptation and speech enhancement, the two major approaches to dealing with domain mismatch and noise problems.
Keywords- Deep Learning, Automatic Speaker Recognition, Auto-encoder, Feature Extraction, MFCC.

I. INTRODUCTION

An Automatic Speaker Recognition (ASR) system is a non-invasive biometric system because it uses the voice as a discriminatory feature; it also shows great versatility during evaluation, since the process only requires that the user speaks, which constitutes a natural act of human behavior [3]. It is known that a speaker's voice contains personal traits of the speaker, given the speaker's unique pronunciation organs and speaking manner, e.g. the unique vocal tract shape, larynx size, accent, and rhythm. Therefore, it is possible to identify a speaker from his/her voice automatically via a computer system. This technology is termed automatic speaker recognition, which is the core topic of this paper. Speaker recognition is a fundamental task of speech processing and finds wide application in real-world scenarios. For example, it is used for the voice-based authentication of personal smart devices, such as cellular phones, vehicles, and laptops. It guarantees the transaction security of bank trading and remote payment. It has been widely applied to forensics for investigating whether a suspect is guilty or not, and to surveillance and automatic identity tagging. It is important in audio-based information retrieval for broadcast news, meeting recordings, and telephone calls. It can also serve as a frontend of automatic speech recognition (ASR) for improving the transcription performance of multi-speaker conversations [4].

Here, the reader gets a comprehensive overview of deep learning-based speaker recognition methods in terms of the vital subtasks and research topics, including speaker identification, speaker diarization, and robust speaker recognition. From this study, we hope to provide a useful resource for the speaker recognition community. The main contributions of this article are to summarize deep learning-based feature extraction techniques for speaker verification and identification; to give an overview of deep learning-based speaker diarization, with an emphasis on recent supervised, end-to-end, and online diarization; and to survey robust speaker recognition from the perspectives of domain adaptation and speech enhancement, the two major approaches to dealing with domain mismatch and noise problems.

Many studies have proposed techniques to improve the accuracy of ASR in noisy and reverberant conditions. One approach is to enhance the noisy features by applying noise removal techniques; others designed discriminative, handcrafted features that are more robust against noise and reverberation. Many works also propose adapting the acoustic models to noisy conditions. For deep learning frameworks, various architectures have been investigated to find better systems, such as recurrent neural networks (RNN) and convolutional neural networks (CNN).




But here, we deal with Automatic Speaker Recognition using the auto-encoder technology [5]. Among feature extraction techniques, Mel Frequency Cepstral Coefficients (MFCC) predominate. Even though MFCC is the most cited and used, there are some robust feature extraction techniques that can work more accurately and efficiently.

1.1 Overview and scope

This summary outlines three major research branches of speaker recognition: speaker verification, speaker identification, and robust speaker recognition. Robust speaker recognition deals with the challenges of noise and domain mismatch. The topics of the overview are organized as in Fig. 1.1 and are characterized briefly as follows.

Speaker verification aims at verifying whether an utterance is pronounced by a hypothesized speaker based on his/her pre-recorded utterances [6]. Speaker verification algorithms can be classified into stage-wise and end-to-end ones. A stage-wise speaker verification system usually consists of a front-end for the extraction of speaker features and a back-end for the similarity calculation between speaker features. The front-end transforms an utterance in the time domain or time-frequency domain into a high-dimensional feature vector, and it accounts for much of the recent advance in deep learning-based speaker recognition.

The back-end first computes a similarity score between enrollment and test speaker features and then compares the score with a threshold:

f(x_e, x_t; w) ≥ ξ ⇒ accept H0, otherwise accept H1 ………(1)

where f(·) indicates a function for calculating the similarity, w stands for the parameters of the back-end, x_e and x_t are the enrollment and test speaker features respectively, ξ is the threshold, H0 represents the hypothesis of x_e and x_t belonging to the same speaker, and H1 is the opposite hypothesis of H0. One of the major responsibilities of the back-end is to compensate for channel variability and reduce interference, e.g. language mismatch. Most back-ends thus aim at alleviating interference, which belongs to the problem of robust speaker recognition.
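As a minimal illustration of Eq. (1), the sketch below scores a pair of speaker embeddings and thresholds the result. The choice of cosine similarity for f(·), the toy embedding values, and the threshold ξ = 0.7 are illustrative assumptions, not settings from this paper.

```python
import numpy as np

def cosine_similarity(x_e, x_t):
    """f(x_e, x_t): cosine similarity between enrollment and test features."""
    return float(np.dot(x_e, x_t) / (np.linalg.norm(x_e) * np.linalg.norm(x_t)))

def verify(x_e, x_t, xi=0.7):
    """Accept H0 (same speaker) when the score reaches the threshold xi."""
    score = cosine_similarity(x_e, x_t)
    return "H0: same speaker" if score >= xi else "H1: different speakers"

# Toy enrollment/test embeddings standing in for front-end speaker features.
x_e = np.array([0.9, 0.1, 0.4])
x_t = np.array([0.8, 0.2, 0.5])
print(verify(x_e, x_t))
```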

Fig 1.1 Overview of deep learning-based speaker recognition

II. DEEP LEARNING

Deep learning, also called a Deep Neural Network, comprises many layers with various neurons in each layer. These layers can vary from a few to thousands, and each layer may further comprise thousands of neurons (processing units). The simplest function in a neuron is to multiply each input value by its allocated weight and sum up the results [8].
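A small sketch of this weighted-sum operation follows; the weights, bias, and ReLU activation are assumptions made for the example, not values from the paper.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Multiply each input by its allocated weight, sum, and apply ReLU."""
    z = np.dot(inputs, weights) + bias
    return max(0.0, z)  # ReLU activation

inputs = np.array([0.5, -1.2, 3.0])   # toy input values
weights = np.array([0.8, 0.1, -0.4])  # allocated weights
print(neuron(inputs, weights, bias=0.2))
```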


Deep learning approaches enable us to solve many problems. In the future, it is foreseeable that theories will emerge to explain deep learning's performance. Meanwhile, its capacity for unsupervised learning will be enhanced, since there are millions of pieces of data in the world and it is not practical to add labels to all of them. It is also predicted that neural network structures will become more complex so that they can extract more semantically significant features [7]. What is more, deep learning will be combined with reinforcement learning, and these combined benefits can be used to accomplish more tasks.

Deep learning models usually use hierarchical structures to connect their layers [9]. The output of a lower layer can be considered as the input of a higher layer via simple linear or nonlinear computations. These models can transform low-level features of the data into high-level abstract features. Owing to this characteristic, deep learning models can be more powerful than shallow machine learning models in feature representation.

There are many kinds of deep learning technologies that we can use for ASR programs. Even though the most cited and used one is the Convolutional Neural Network (CNN), it has several drawbacks. So we use the auto-encoder as the deep learning technology in this work.

III. METHODOLOGY

A systematic workflow of the proposed automatic speaker recognition system is shown in Fig. 3.1. Given an input speech signal, voice activity detection is performed to identify speech presence or speech absence in the given speech signal [10]. An auto-encoder is used to denoise the noisy input and enhance the quality and intelligibility of distorted speech signals. Then audio feature vectors are extracted and used to train the models using a Gaussian mixture model. Lastly, the network recognizes the speaker by testing the sample against the trained model.

Fig 3.1 Systematic workflow of the proposed system

3.1. VOICE ACTIVITY DETECTION

Voice Activity Detection is a strategy used in speech processing to recognize speech presence or speech absence in audio. This procedure processes the speech signals to rule out the silence fraction; otherwise, the training might be biased [11]. The Long Term Spectral Divergence (LTSD) algorithm [22] was used, concurrently with a noise compression script from SoX, to perform this task. The LTSD algorithm breaks an utterance into overlapping frames and gives each frame a score for the probability that it contains voice activity. The probabilities are then used to extract all the durations with voice activity.
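To make the frame-based idea concrete, here is a minimal VAD sketch. It substitutes a simple per-frame energy threshold for the full LTSD statistic, and the frame length, hop size, and threshold are illustrative assumptions (25 ms / 10 ms frames at 16 kHz), not the paper's settings.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def energy_vad(x, threshold=1e-3):
    """Flag frames whose mean energy exceeds a threshold as voice activity."""
    frames = frame_signal(x)
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold  # boolean mask, one entry per frame

# Toy signal: near-silence followed by a louder "speech" burst.
rng = np.random.default_rng(0)
signal = np.concatenate([0.001 * rng.standard_normal(8000),
                         0.1 * rng.standard_normal(8000)])
print(energy_vad(signal).astype(int))
```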




3.2. AUTO-ENCODER

Speech enhancement (SE) aims to improve the quality and intelligibility of degraded speech signals, which may be corrupted by background noise, interference, and recording equipment [12]. SE strategies are generally used as pre-processing in various audio-related applications, such as speech communication, automatic speech recognition (ASR), speaker recognition, hearing assistance, and cochlear implants. Denoising Autoencoders (DAE) have been widely explored in the field of speech signal processing: prior work documented the usefulness of DAE for dereverberation and distant-talking speech recognition, and [10] investigated the performance of DAE for unsupervised domain adaptation in speech emotion recognition.

The auto-encoder allows our speaker verification system to quickly adapt to the release of any new models of smart speakers. As a next step, we plan to extend the exploration beyond smart speakers to other fields in the industry where labeled speakers are scarce but unlabeled data is abundant.

An autoencoder-based semi-supervised curriculum learning scheme is proposed to automatically accumulate unlabeled data and iteratively update the corpus during training. This training technique allows us to (1) progressively grow the training corpus by using unlabeled data and rectifying previous labels at run-time; and (2) improve robustness when generalizing to varied conditions, such as out-of-domain and text-independent speaker verification tasks. This approach rapidly adapts the speaker verification system to an unseen new domain in which no labeled data is available [13]. It is also discovered that a denoising autoencoder can considerably enhance clustering accuracy when it is trained on a carefully selected subset of speakers.
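To make the DAE idea concrete, below is a minimal denoising autoencoder sketch in PyTorch (an assumed framework; the paper does not specify an implementation). The layer sizes, noise level, and training loop are illustrative, and random vectors stand in for the spectral speech frames a real SE front-end would process.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Encoder compresses the noisy input; decoder reconstructs the clean signal."""
    def __init__(self, dim=257, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Toy training loop: learn to map noisy frames back to clean frames.
model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.randn(32, 257)                   # stand-in for clean spectral frames
noisy = clean + 0.1 * torch.randn_like(clean)  # additive noise corruption

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(noisy), clean)  # reconstruct clean from noisy
    loss.backward()
    optimizer.step()
```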
discrete Fourier transform, mel frequency, and
3.3. FEATURE EXTRACTION

An Automatic Speaker Recognition (ASR) system is a non-invasive biometric system because it uses the voice as a discriminatory feature; it also shows great versatility during examination, since the process only requires that the user speaks, which constitutes a natural act of human behaviour [3]. The voice carries six levels of information, from the spectral (lower) level to the semantic (upper) level, and the complexity of the information extraction procedure rises proportionally with the level at which one works [14]. In real circumstances, the voice can be corrupted by all kinds of noise, such as public transport sound, channel distortion, and even reverberation, so it is important to use techniques that are reliable in noisy conditions.

Feature extraction plays a crucial role in training a model. It is essential to extract a set of features from audio signals; the extracted features are provided as input to the classifier. In speech recognition, the feature vector represents the speech waveform. There are various feature extraction strategies available, such as MFCC, delta MFCC, LPCC, PCA, etc. [15].

Mel Frequency Cepstral Coefficients (MFCC)

Mel Frequency Cepstral Coefficients (MFCC) is one of the most cited and used methods in the speech processing community. It is based on a simulation of cochlear auditory capability, with a filterbank uniformly spaced on the Mel frequency scale; when mapped back to the linear frequency scale, the spacing between filters is linear within the first 1000 Hz.

The Fourier transformation of the time-domain audio signal into the frequency domain is called a spectrum. Using the fast Fourier transform, the samples of each frame are converted into the frequency domain, i.e. a spectrum. The Mel scale value for a frequency f is found using the equation:

Mel(f) = 2595 log10(1 + f/700)

The log magnitude of the Mel-filtered spectrum is called the mel spectrum. The Discrete Cosine Transform (DCT) is applied to the mel spectrum, and the mel frequency cepstral coefficient features are computed. The computation of MFCC features comprises several phases: pre-processing, framing, windowing, estimation of the discrete Fourier transform, mel frequency warping, and the inverse discrete Fourier transform.

Functionally, this scheme is based on the introductory process of windowing and overlapping; then the signal power spectrum is estimated and distributed into sub-bands through a Mel filterbank, after which it is logarithmically compressed, and finally the Discrete Cosine Transform (DCT) is applied to concentrate the information in the first coefficients.
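This pipeline can be reproduced in a few lines with an off-the-shelf library; the sketch below uses librosa (an assumed tool, not one the paper names), with a synthetic tone standing in for real speech and 13 coefficients as a typical choice.

```python
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 220 * t)  # synthetic 220 Hz tone as a stand-in for speech

# Framing, windowing, FFT, Mel filterbank, log compression, and DCT
# are all handled internally by librosa.feature.mfcc.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
print(mfcc.shape)  # (13, number_of_frames)

# The Mel mapping itself: Mel(f) = 2595 * log10(1 + f / 700)
print(2595 * np.log10(1 + 1000 / 700))  # roughly 1000 mel at 1 kHz
```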




Fig 4.1 Process of MFCC feature extraction

Fig 4.2 Scaled Filterbank on Mel Frequency

3.4. SPEAKER RECOGNITION

Speaker recognition is a technique used to automatically recognize a speaker from a recording of their voice or speech utterance. It has evolved into an economical and reliable method for person identification and verification. This paper presents the development of an automatic speaker recognition system that incorporates the classification and recognition of speakers. Four classifier models, namely Support Vector Machines, K-Nearest Neighbors, Multilayer Perceptrons (MLP), and Random Forest (RF), are trained using the WEKA data mining tool [16]. Auto-WEKA is used to select the best classifier model together with its best hyper-parameters. The performance of each model is assessed in WEKA using 10-fold cross-validation, with RMSE, Accuracy, Precision, and Recall used as the evaluation measurements.
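The paper runs this evaluation in WEKA; as a rough Python equivalent (a sketch assuming scikit-learn and toy data, not the authors' setup), the snippet below evaluates the same four model families with 10-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Toy feature matrix standing in for per-utterance MFCC-derived features.
X, y = make_classification(n_samples=300, n_features=13, n_classes=3,
                           n_informative=8, random_state=0)

models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```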

Speaker recognition is again divided into two phases: Speaker Verification and Speaker Enrollment.

3.4.1. Recognition: This is a one-on-one matching process. Here we have prior information that the claimed identity is speaker "X" (this is the recognition phase), so the voice is matched against speaker "X"'s voiceprint only (which we obtained in the enrollment phase). Based on the degree of similarity, we can set the threshold for matching.

3.4.2. Speaker Enrollment: In this phase, when a new user comes into the system, their voice samples are stored, the d-vector is calculated for all the samples, and an average is taken and stored as that user's voiceprint, so that the next time the same user comes we can match against this stored voiceprint. Longer voice samples help to capture features better, and more samples help to capture the variation in the user's voice. A good voice sample falls in the range of 3-5 seconds. Speaker Enrollment is also known as Speaker Identification.
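A minimal sketch of these two phases follows, assuming d-vector embeddings are already extracted by some front-end; the averaging, cosine scoring, and the 0.7 threshold are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def enroll(sample_dvectors):
    """Enrollment: average the d-vectors of a user's samples into a voiceprint."""
    return np.mean(sample_dvectors, axis=0)

def recognize(test_dvector, voiceprint, threshold=0.7):
    """Recognition: one-on-one match of a test voice against the claimed voiceprint."""
    score = np.dot(test_dvector, voiceprint) / (
        np.linalg.norm(test_dvector) * np.linalg.norm(voiceprint))
    return score >= threshold

rng = np.random.default_rng(1)
samples = rng.standard_normal((5, 128))             # five enrollment d-vectors for speaker "X"
voiceprint = enroll(samples)
test = samples[0] + 0.1 * rng.standard_normal(128)  # a new utterance claimed to be "X"
print(recognize(test, voiceprint))
```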

Figure 4.4 shows both Speaker Verification and Speaker Identification.

Fig 4.4 Speaker Verification and Speaker Identification


IV. CONCLUSION

This paper presented a light introduction to an Automatic Speaker Recognition system using deep learning and to the present state of the field. In the proposed system, the first phase is voice activity detection, performed by the Long Term Spectral Divergence (LTSD) algorithm. We then discussed the auto-encoder technology as the denoising or speech enhancement technique used in the ASR system. We introduced the feature extraction procedure, the most important phase in the system; a well-known feature extraction method is used, namely MFCC (Mel Frequency Cepstral Coefficients). Finally, we looked into the speaker verification procedures and their classifications.

V. REFERENCES

[1] T. Kinnunen and H. Li, "An overview of text-independent speaker recognition: From features to supervectors," Speech Communication, vol. 52, no. 1, pp. 12-40, 2010.

[2] R. P. Ramachandran, K. R. Farrell, R. Ramachandran, and R. J. Mammone, "Speaker recognition: general classifier approaches and data fusion methods," Pattern Recognition, vol. 35, no. 12, pp. 2801-2821, 2002.

[3] E. Campbell, J. Lara, and G. Hernández-Sierra, "Feature extraction of automatic speaker recognition, analysis and evaluation in a real environment," 2018.

[4] P. Červa, J. Silovský, J. Zdánský, J. Nouza, and L. Seps, "Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives," Speech Communication, vol. 55, pp. 1033-1046, 2013.

[5] D. Yu and J. Li, "Recent progresses in deep learning based acoustic models," IEEE/CAA Journal of Automatica Sinica, 2017.

[6] R. Das and S. Prasanna, "Speaker verification from short utterance perspective: A review," IETE Technical Review, vol. 35, pp. 1-19, 2017.

[7] I. H. Sarker, "Machine learning: Algorithms, real-world applications and research directions," SN Computer Science, vol. 2, article 160, 2021.

[8] S. Walczak and N. Cerpa, "Artificial neural networks," in Encyclopedia of Physical Science and Technology, 3rd ed., 2003.

[9] J. Ma, M. K. Yu, S. Fong, K. Ono, E. Sage, B. Demchak, R. Sharan, and T. Ideker, "Using deep learning to model the hierarchical structure and function of a cell."

[10] D. S. Jat, ..., C. Singh, "Voice activity detection-based home automation system for people with special needs," in Intelligent Speech Signal Processing, 2019.

[11] Z.-H. Tan, A. K. Sarkar, and N. Dehak, "rVAD: An unsupervised segment-based robust voice activity detection method."

[12] C. Yu, R. E. Zezario, S.-S. Wang, J. Sherman, Y.-Y. Hsieh, X. Lu, H.-M. Wang, and Y. Tsao, "Speech enhancement based on denoising autoencoder with multi-branched encoders."

[13] S. Zheng, G. Liu, H. Suo, and Y. Lei, "Autoencoder-based semi-supervised curriculum learning for out-of-domain speaker verification," Machine Intelligence Technology, Alibaba Group.

[14] K. Adnan and R. Akbar, "An analytical study of information extraction from unstructured and multidimensional big data," Journal of Big Data, vol. 6, article 91, 2019.

[15] V. A. Kherdekar and S. A. Naik, "Convolution neural network model for recognition of speech for words used in mathematical expression," 2021.

[16] T. B. Mokgonyane, T. J. Sefara, T. I. Modipa, M. M. Mogal, and M. J. Manamela, "Automatic speaker recognition system based on machine learning algorithms," 2019.

[17] D. Ferbrache, "Passwords are broken: the future shape of biometrics," Biometric Technology Today, vol. 2016, no. 3, pp. 5-7, 2016.

[18] L. Hamid, "Biometric technology: not a password replacement, but a complement," Biometric Technology Today, vol. 2015, no. 6, pp. 7-10, 2015.

[19] N. Singh, R. Khan, and R. Shree, "Applications of speaker recognition," Procedia Engineering, vol. 38, pp. 3122-3126, 2012.

[20] A. Larcher, K. A. Lee, B. Ma, and H. Li, "Text-dependent speaker verification: Classifiers, databases and RSR2015," Speech Communication, vol. 60, pp. 56-77, 2014.

[21] E. Aliyu, O. Adewale, and A. Adetunmbi, "Development of a text-dependent speaker recognition system," International Journal of Computer Applications, vol. 69, no. 16, 2013.

[22] E. Variani, X. Lei, E. McDermott, I. Lopez-Moreno, and J. Gonzalez-Dominguez, "Deep neural networks for small footprint text-dependent speaker verification," in Proc. ICASSP, 2014, pp. 4052-4056.

