A Novel Biometric Identification System Based On Fingertip Electrocardiogram and Speech Signals

Article history: Available online 10 November 2021

Keywords: Biometric identification; Biometric recognition; CNN; Fingertip ECG; Speech

Abstract

In this research work, we propose a one-dimensional Convolutional Neural Network (CNN) based biometric identification system that combines speech and ECG modalities. The aim is to find an effective identification strategy while enhancing both the confidence and the performance of the system. In our first approach, we developed a voting-based ECG and speech fusion system that improves the overall performance compared to conventional methods. In the second approach, we developed a robust rejection algorithm to prevent unauthorized access to the fusion system. We also present a newly developed ECG spike and inconsistent beat removal algorithm to detect and eliminate the problems caused by portable fingertip ECG devices and patient movements. Furthermore, we achieved a system that can work with only one authorized user by adding a Universal Background Model to our algorithm. In the first approach, the proposed fusion system achieved a 100% accuracy rate for 90 people, taking the average of 3-fold cross-validation. In the second approach, using 90 people as genuine classes and 26 people as imposter classes, the proposed system achieved 92% accuracy in identifying genuine classes and 96% accuracy in rejecting imposter classes.

© 2021 Elsevier Inc. All rights reserved. https://doi.org/10.1016/j.dsp.2021.103306
G. Guven, U. Guz and H. Gürkan Digital Signal Processing 121 (2022) 103306
accuracy of identification. In this paper, we present an original work addressing the use of ECG signals with speech signals for human identification purposes.

In our previous work, an ECG-based authentication system was proposed using three different feature extraction methods: Autocorrelation and Discrete Cosine Transform (AC/DCT), Mel-frequency cepstrum coefficients (MFCC), and the QRS complex of the ECG signal. In that research, we presented the average frame recognition rate of each individual [14]. In our other work, we developed a dry-contact portable fingertip ECG device that can collect ECG signals from the thumbs of the left and right hands. In addition, we analyzed our existing system's performance by increasing the number of people in the fingertip ECG dataset collected with the proposed device [11].

In this work, we propose an improved identification system that combines the benefits of the ECG and speech modalities. The speech signals increase the system's performance, whereas the ECG signal increases the confidence of the system. Furthermore, we propose an algorithm that prevents unauthorized users' access while at the same time identifying the genuine class within the system's response time. We have added a UBM dataset to the algorithm so that it can work even if there is only one registered authorized user. Thus, the proposed system contains practical solutions for identification and verification tasks in a single algorithm. Moreover, we present a newly developed ECG spike and inconsistent beat removal algorithm to detect and eliminate the unwanted signals that degrade system performance.

The organization of this paper is as follows: Section 2 presents the related work on ECG-based and speech-based biometric identification systems. In Section 3, the methodology of the pre-processing stage, feature extraction algorithms, post-processing stage, and classifier is given. Section 4 contains the details of the proposed system. Section 5 gives information about the ECG and speech databases and shows the experimental results. The discussion and conclusion are presented in Sections 6 and 7, respectively.

2. Related works

Before presenting the details of the fingertip ECG and speech-based identification system employed in this study, we first review the literature on ECG-based biometric identification systems and speech-based identification systems, respectively.

2.1. ECG-based identification systems

In recent years, there has been a shift towards the use of medical signals for biometric purposes due to high security demands, and among all medical signals, the ECG signal has been accepted as the most prominent [10]. The first study using the ECG for biometric identification was presented by L. Biel et al. [15] in 2001, where a 12-lead ECG monitor was used to collect 360 biometric features for each person. Identification was achieved with 100% accuracy among 22 subjects.

Discriminant analysis was proposed by M. Kyoso and A. Uchiyama [16] to improve the performance of the ECG-based system. To perform the identification process, they suggested extracting four feature parameters: P wave duration, PQ interval, QRS interval, and QT interval, which are not affected by the R-R interval or the electrodes' condition.

Z. Zhang and D. Wei [17] improved M. Kyoso and A. Uchiyama's work and suggested new fiducial features: QRS, Q, R, and S durations; Q, R, S, and T amplitudes; ST segments; QRS area; and PR interval. In their system, ECG records were passed through Principal Component Analysis (PCA) to be transformed into a new coordinate system such that the greatest variance of any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so forth. The ECG data with reduced dimensionality were then trained with a discriminant method using Bayes' theorem, and their system achieved an 85.3% identification rate over lead-I ECG signals of 60 subjects.

In previous research, analytic features were used to capture local information in a heartbeat signal. The performance of such systems depends on the accurate detection of fiducial points or the discriminant amplitude of the features. Therefore, W. Yongjin et al. [13] suggested an appearance-based feature extraction method capturing the holistic patterns in a heartbeat signal. Their approach depends on estimating and comparing the significant coefficients of the discrete cosine transform of the autocorrelated heartbeat signals. The solution they demonstrated was tested on ECG data from two public databases, PTB and MIT-BIH, and achieved 94.47% and 97.8% accuracy rates, respectively.

Z. Zhao and L. Yang [12] introduced a new non-fiducial feature extraction method based on matching pursuit (MP) sparse decomposition for ECG identification. In their system, the R locations of the ECG signals, detected using a wavelet-based QRS delineation method, were passed through the MP sparse decomposition algorithm. By finding the best matching feature projections of multidimensional ECG data onto the span of an overcomplete dictionary, a Support Vector Machine (SVM) classifier was trained and achieved a 95.3% identification rate for 20 people.

X. Lei et al. [18] proposed a deep learning feature-based identification system which reduces the dependence of algorithm accuracy on the origin and length of the ECG signals. Unlike other methods, both fiducial and non-fiducial ECG features were used in their system. A 1-D CNN algorithm was developed to obtain temporal points of the ECG signals and the DCT of the segmented ECG vectors. Their system was tested on a publicly available database using a backpropagation neural network (BPNN) and a non-linear SVM as classifiers and achieved a 99.33% accuracy rate for 100 subjects.

L. Wieclaw et al. [19] proposed a biometric identification system based on deep learning techniques. They constructed a fingertip ECG-based acquisition system which obtained the signals from the subject's right- and left-hand fingers using Ag/AgCl electrodes. Using the fingertip ECG of 18 subjects, their system achieved 96% identification performance.

H. Gürkan et al. [20] proposed a 2-D CNN-based biometric identification system. Their system works by finding the QRS complex of each person in the database and putting it into a 256 x 256 QRS image matrix. By designing a proper 2-D CNN architecture for the QRS matrix, they achieved an identification rate of 98.08% for a database consisting of 46 people.

R. Srivastva et al. [21] proposed an ensemble learning technique by gathering four fine-tuned models, i.e., "ResNet" and "DenseNet", into one stacking model, i.e., "PlexNet". PlexNet thus takes advantage of transfer learning, yielding a novel model for ECG biometrics that is both robust and secure. By combining two publicly available databases, PTB and CYBHI, 176 subjects with two recorded sessions each were added to a mixed dataset. Then, using the mixed dataset, 2-D ECG images of size 150 x 150, each containing three heartbeats, were fed forward through and used to train the four paths of PlexNet separately. The best identification performance, 99.66%, was obtained when 63 people in the CYBHI dataset were used.

Y. Zhang et al. [22] proposed a solution for three impediments to fingertip ECG: the impact of variation in the ECG measurement system, the high computational complexity of traditional CNN models, and the lack of sufficient fingertip samples. Therefore, they suggested using a recurrence plot (RP), which reflects the motion laws and the information of the nonlinear signal in the high-
Table 1
1-D CNN architecture.

Fig. 3. P-QRS-T interval of a person.

3.3. Post-processing stage

3.3.1. Vector quantization
Vector quantization (VQ) is used to prevent the overfitting problem in machine learning. It eliminates redundant features so that correlated data do not confuse the classifier. In the ECG-based identification process, the VQ algorithm was applied only to the dataset used in the training process, and 16 significant P-QRS-T features were found for each person, because the ECG signal of a person does not vary much. In the speech-based identification, the VQ process was applied to both the training and test data, and 32 significant MFCC features were found for every 10 seconds of a person's speech signal. The following equations show how the algorithm proceeds for the speech-based identification system:

N = rt / (ft − (ft × ovl))    (1)

where N represents the number of MFCC features found in a specific response time, rt the response time of the algorithm (10 s in our application), ft the MFCC framing time (0.02 s in our application), and ovl the frame overlapping percentage (0.65 in our application).

k = 2^⌊log2(N / cal)⌋    (2)

In equation (2), cal is the calibration value (30 in our application) and k is the number of significant MFCC features that we want to find by using the VQ algorithm.

X′ = a + (X − Xmin)(b − a) / (Xmax − Xmin)    (3)

In the proposed method, b and a are equal to 1 and 0, respectively.

3.4. 1-D convolutional neural network classifier

Artificial Neural Networks (ANN) have become popular over a wide range of data with non-linearity features during the last decade. They provide complex decisions, through increasing numbers of hidden layers, where classical machine learning algorithms cannot. A CNN is a feed-forward ANN with additional convolutional and subsampling layers [35]. We can train either a massive 2-D visual database or a 1-D speech or ECG database with a proper training process by selecting appropriate convolutional and hidden layers. The proposed classifier contains two independent 1-D CNN architectures. Both architectures are summarized in Table 1.

In the ECG-based biometric identification problem, both 1-D CNN-based [36–38] and 2-D CNN-based [21,22,39] models have been proposed in the literature in recent years. Although all of these works, in which the ECG signal is used as a single modality, show high biometric identification performance, 2-D CNN-based models provide slightly better identification performance than 1-D CNN-based models. However, 1-D CNN-based models have less computational complexity than 2-D CNN-based models [35]: when an N×N image is convolved with a K×K kernel in the two-dimensional convolution operation, the computational complexity is O(N²K²), whereas it is O(NK) in the one-dimensional convolution operation [35]. Therefore, 1-D CNN-based models can be preferred in low-cost and real-time applications.
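As a concrete check of Eqs. (1)–(3), a minimal Python sketch is given below. The floor in Eq. (2) is an assumption adopted here to reproduce the 32 significant MFCC features reported above; the paper's exact rounding rule is not recoverable from the extracted text.

```python
import math

def mfcc_frame_count(rt: float, ft: float, ovl: float) -> float:
    """Eq. (1): number of MFCC frames produced in a response window.

    rt  -- response time in seconds (10 s in the paper)
    ft  -- MFCC framing time in seconds (0.02 s in the paper)
    ovl -- frame overlap fraction (0.65 in the paper)
    """
    return rt / (ft - ft * ovl)

def vq_codebook_size(n: float, cal: float = 30) -> int:
    """Eq. (2): number of significant MFCC features kept by VQ,
    rounded down to a power of two (the floor is an assumption made
    to reproduce the 32 features reported in the paper)."""
    return 2 ** math.floor(math.log2(n / cal))

def min_max_scale(x, a: float = 0.0, b: float = 1.0):
    """Eq. (3): min-max normalization of a feature vector into [a, b]."""
    x_min, x_max = min(x), max(x)
    return [a + (xi - x_min) * (b - a) / (x_max - x_min) for xi in x]

n = mfcc_frame_count(rt=10, ft=0.02, ovl=0.65)  # about 1428.6 frames
k = vq_codebook_size(n)                          # 32 significant MFCC features
```

With the values stated in the text (rt = 10 s, ft = 0.02 s, ovl = 0.65, cal = 30), this yields N ≈ 1428.6 and k = 32, matching the number of significant MFCC features quoted in Section 3.3.1.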
In the proposed method (shown in Fig. 4), both the ECG and speech signals are assumed to be recorded by an acquisition system with both a microphone and an instrumentation amplifier. In the data acquisition stage, filters were applied to both signals to eliminate baseline wander, muscle noise, power-line noise, and other unwanted frequency components. The minimum framing time was considered to be 10 seconds for the testing phase. No time constraints were set on the signals to be trained in our experiments. Half of the speech signals were assigned to the testing folder and the other half to the training folder. ECG signals recorded in the first week/month were assigned to the testing folder, whereas the second records, taken a week/month later, were assigned to the training folder. Then, both the ECG and speech signals were passed through the pre-processing stage: inconsistent beats and spikes were filtered out of the ECG signal, whereas the speech signals were separated from their silent parts. After that, MFCC features were extracted from the speech signals, and the P-QRS-T features were extracted from the ECG signals. Then, the VQ algorithm was applied to the P-QRS-T features in the training folder, and 16 significant P-QRS-T features were found. For the speech signals in the training and testing folders, all MFCC features found in 10 seconds were passed through VQ, and 32 significant MFCC features were extracted. Two 1-D CNN classifiers were trained using the features in the training folder. Then, features from the testing folder were given as input to the CNN algorithm to obtain probabilistic scores. The higher the score, the higher the chance that the corresponding class ID is correct. The number of scores we get is equal to the number of classes we trained. If there is no class to which an unauthorized user can be rejected, the proposed system will assign that user to one known (genuine) class. Therefore, class ID 0 is defined as an imposter rejection class, and the proposed system assigns unauthorized users to this class. This newly appointed class can also be called a Universal Background Model (UBM). When the ECG threshold value (Th_ECG) is higher than zero, we add a class to which unauthorized users are rejected for the ECG-based identification algorithm. The same process was also applied for the speech-based identification algorithm when the value of the speech threshold (Th_Speech) is higher than zero. The ECG UBM and speech UBM were also passed through the same process as the genuine classes in the training stage, up to the vector quantization. For the ECG UBM, we found 1 significant P-QRS-T feature for each ECG signal in the UBM database, consisting of 152 people. For the speech UBM, we found 32 significant MFCC features for every person's speech signal, and a total of 4320 significant MFCC features were found for 135 people. After that, both 1-D CNNs were trained using the features of both the genuine classes and the additional UBM class (Class ID 0).

Fig. 4. Block diagram of the proposed biometric recognition system.

Two main experiments were done in the evaluation of the proposed system. One of them was made with the threshold value set to zero, and the other was made by taking the threshold value at the point where the False Acceptance Rate
Table 2
Properties of test and train sets.
Table 3
Average accuracy (%) of 3-fold cross validation.
Table 4
Classification performance of the proposed method for each fold.
Table 5
Identification accuracy for different score fusion algorithms.
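Table 5 compares score fusion algorithms. As an illustration of a voting-style fusion with an imposter-rejection (UBM) class, as described in Section 4, a minimal Python sketch follows. The decision rule here (per-modality thresholding, agreement between modalities, and a combined-score tie-break) is a hypothetical reconstruction for illustration, not the paper's exact rule.

```python
def fuse_decisions(ecg_scores, speech_scores, th_ecg=0.0, th_speech=0.0):
    """Hypothetical voting fusion of two per-class score dictionaries.

    Class 0 is the UBM/imposter-rejection class. A modality casts a
    vote for its top-scoring class only when that score clears the
    modality's rejection threshold; otherwise it votes for class 0.
    """
    votes = []
    for scores, th in ((ecg_scores, th_ecg), (speech_scores, th_speech)):
        cls = max(scores, key=scores.get)
        votes.append(cls if scores[cls] >= th else 0)
    # Agreement wins outright; on disagreement, fall back to the class
    # with the highest combined score (an assumption, not the paper's rule).
    if votes[0] == votes[1]:
        return votes[0]
    combined = {c: ecg_scores.get(c, 0) + speech_scores.get(c, 0)
                for c in set(ecg_scores) | set(speech_scores)}
    return max(combined, key=combined.get)
```

For example, with the rejection thresholds reported later in the paper (0.815 for ECG, 0.53 for speech), two confident, agreeing modalities return the shared class ID, while two sub-threshold score sets fall back to the imposter-rejection class 0.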
Table 6
Accuracy rates of the system tested with 90 genuine and 26 imposter people.

Proposed system | # of subjects | Genuine accuracy (%) | Imposter accuracy (%) | Equal error rate (%)
ECG | 90 | 71.08 | 71.05 | 28.92
Speech | 90 | 86.48 | 87.82 | 12.95
Fusion | 90 | 91.68 | 96.05 | 6.82

used both imposter and genuine people for testing the system. In the performance evaluation, the ECG and speech (rejection) thresholds were calculated by finding the intersection of the FRR (genuine) and FAR (imposter), where both of them are minimized (shown in Fig. 7 and Fig. 8). The thresholds at the intersection point were found to be 0.815 and 0.53 for the ECG (rejection) threshold and the speech (rejection) threshold, respectively. After that, the performance of the system was evaluated using the obtained rejection thresholds, and the results are given in Table 6.

Decreasing the threshold provides convenience, whereas increasing the threshold provides better security; however, it forces users to make multiple attempts to be accepted into the system. It must be chosen wisely, taking into account the application to be used.

In Table 6, the Genuine Accuracy and Imposter Accuracy were found by the following equations:

Genuine Accuracy = (Number of genuine classes correctly identified) / (Number of genuine attempts)    (4)

Imposter Accuracy = (Number of imposter classes correctly rejected) / (Number of imposter attempts)    (5)

In Fig. 7 and Fig. 8, the False Acceptance Rate (FAR) and False Rejection Rate (FRR) were found by the following equations:

FAR = (Number of access attempts by unauthorized users incorrectly accepted) / (Number of unauthorized users' attempts)    (6)

FRR = (Number of false rejections of genuine users) / (Number of genuine users' attempts)    (7)

In Fig. 7, it can easily be seen that the system accepted all the imposter users when the value of Th_ECG was low. Increasing the value of Th_ECG improved the imposter rejection accuracy but decreased the accuracy of correctly identifying genuine users. The same holds for the speech signal (shown in Fig. 8). However, it differs at one point: increasing the speech UBM dataset improved the imposter rejection accuracy significantly while the value of Th_Speech was still low. In our research, we increased the speech UBM dataset to 250 people and saw that it decreased the FAR (imposter) to 42% while Th_Speech was at '0', whereas it did not affect the genuine accuracy. This indicates that increasing the speech UBM dataset significantly improved the imposter rejection performance. However, the same could not be said for the ECG system, because when we increased the ECG UBM dataset to 550 people, it only decreased the FAR (imposter) to 90% while Th_ECG was at '0'. It also decreased the accuracy for genuine users in the ECG system. Therefore, we suggest that rather than increasing the ECG UBM dataset, the ECG UBM should be split into multiple (rejection) classes, which does not affect the genuine accuracy rate.

Tables 7 and 8 show the accuracy rates of the system with respect to changes in the learning rate and batch size of the CNN, respectively. The CNN was optimized according to our classification problem. The hyperparameters used in the training of the CNN were fine-tuned by making performance comparisons [43]. For our application, we concluded that stochastic gradient descent with momentum (SGDM) was a better optimizer than ADAM and RMSProp due to its performance and learning speed. The SGDM optimizer started to learn the model with a minimum learning rate of 0.02, whereas ADAM and RMSProp started to learn the model with a minimum learning rate of 0.005. Learning the model was completed in approximately 72 minutes with ADAM and RMSProp, whereas it took 57 minutes with SGDM. Under the condition of increasing the imposter rejection accuracy of the fusion system, the optimum parameters for the learning rate and batch size of the CNN were found to be 0.01 and 128, respectively.

In the third experiment, we enlarged the ECG dataset by adding to our system the remaining ECG signals measured at the palm in the CYBHi database [40] and the ECG signals measured at the arm in the ECG-ID database [44]. Our ECG database was increased to 226 people; 176 randomly selected people were used as genuine classes, whereas 50 people served as imposter classes. As for the speech database, we selected 361 people from the "train-clean-360" dataset of LibriSpeech, which consists of 921 people. We randomly selected 176 people for genuine classes and 50 people for imposter classes, whereas the remaining 135 people were used for constructing the UBM. The experiment was conducted using the previous (rejection) thresholds, and the results were compared with new threshold values found from the intersection of the FRR and FAR. The results given in Table 9 show that the (rejection) threshold values change whenever the dataset is enlarged, so a suitable decision rule must be set in order to balance both the imposter rejection and genuine acceptance accuracy.

In the fourth experiment, the effect of the time constraint was examined on 90 people's fingertip ECG and speech data in the first fold. Using the databases from the previous experiments, we changed the response time of the system to 1, 3, 5, 10, 20, and 60 seconds, respectively. If the length of the ECG or speech signals of some people did not meet the required time constraint, we evaluated the system with the maximum signal length each had. The experiment was conducted for each class trained with sufficient features without time constraints. In the performance evaluation, we tested the system by giving the features to the system in a specific time range, and the imposter rejection and genuine identification accuracy rates are given in Table 10 and Table 11.

In the fifth experiment, we exchanged the LibriSpeech database for the RedDots database [45], which has 62 speakers, including 49 male and 13 female speakers from 21 countries. The records were taken through mobile crowd-sourcing to benefit from a potentially wider population and greater diversity. The language was also English; however, the dataset was constructed to increase the difficulty of the identification task: it contains background noises such as the sound of a musical instrument, mouse-clicking sounds, and the voices of crowds in the background, as well as speakers whose pronunciation of the spoken language was poor. The database was used to simulate the performance of the proposed algorithm for speech signals recorded in noisy environments, and the results are given in Table 12.

In the final experiment, 1, 3, and 5 people were randomly selected as genuine classes, whereas the remaining classes among the 116 people were selected as imposter classes and used for training the proposed system. In addition to varying the number of genuine classes, we exchanged the person/people in the genuine class/classes three times from the database. Then we conducted a test to evaluate the average genuine identification accuracy and the average imposter rejection accuracy of the proposed system, and the results are given in Table 13.

6. Discussion

To the best of our knowledge, there is no research work in the literature which combines the ECG and speech modalities for biometric identification purposes. Therefore, we discuss the identification results of the relevant research in that area with respect
Table 7
Accuracy rates of the system tested with 90 genuine and 26 imposter people by changing the learning rates of
the CNN (the batch size is 128).
Genuine accuracy (%) Imposter accuracy (%)
Learning rate 0.01 0.005 0.001 0.0005 0.01 0.005 0.001 0.0005
ECG 71.09 66.07 70.50 72.87 71.05 76.32 73.03 73.03
Speech 86.48 87.57 85.66 86.18 87.82 88.71 87.82 87.07
Fusion 91.68 93.47 89.90 91.49 96.05 93.42 93.42 93.42
Table 8
Accuracy rates of the system tested with 90 genuine and 26 imposter people by changing the batch size of the
CNN (the learning rate is 0.01).
Genuine accuracy (%) Imposter accuracy (%)
Batch size 64 128 256 512 64 128 256 512
ECG 70.10 71.09 76.04 73.66 71.05 71.05 75.66 73.68
Speech 84.14 86.48 87.70 88.17 85.14 87.82 88.11 87.37
Fusion 91.09 91.68 91.29 95.25 96.05 96.05 90.13 94.74
Table 9
Accuracy rates of the system tested with 176 genuine and 50 imposter people.
Previous rejection thresholds New rejection thresholds
Proposed Genuine Imposter Genuine Imposter Equal error
system accuracy accuracy accuracy accuracy rate (%)
(%) (%) (%) (%)
ECG 78.74 60.31 73.24 72.76 27.03
Speech 59.85 99.09 85.83 88.56 13.09
Fusion 74.61 100.0 89.54 96.10 10.30
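The rejection thresholds in Tables 6 and 9 come from the intersection of the FAR and FRR curves (Eqs. (6)–(7)). A minimal Python sketch of that threshold sweep might look as follows; the score lists in the usage example are illustrative toy values, not the paper's actual CNN scores.

```python
def far_frr(imposter_scores, genuine_scores, th):
    """Eqs. (6)-(7): FAR = fraction of imposter attempts accepted,
    FRR = fraction of genuine attempts rejected, at threshold th.
    An attempt is accepted when its score is >= th."""
    far = sum(s >= th for s in imposter_scores) / len(imposter_scores)
    frr = sum(s < th for s in genuine_scores) / len(genuine_scores)
    return far, frr

def eer_threshold(imposter_scores, genuine_scores, steps=1000):
    """Sweep thresholds in [0, 1] and return the one where |FAR - FRR|
    is smallest (the FAR/FRR intersection used for the rejection
    thresholds), plus the corresponding rates."""
    best = None
    for i in range(steps + 1):
        th = i / steps
        far, frr = far_frr(imposter_scores, genuine_scores, th)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, th, far, frr)
    _, th, far, frr = best
    return th, far, frr

# Toy example: 4 imposter and 4 genuine scores; FAR and FRR cross at 0.25.
th, far, frr = eer_threshold([0.1, 0.2, 0.3, 0.6], [0.9, 0.8, 0.7, 0.4])
```

In the toy example the sweep lands between the highest imposter score that must be rejected and the lowest genuine score that must be accepted, where FAR = FRR = 0.25 (an equal error rate of 25%).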
Table 14
Comparison of the biometric identification systems in the literature.

Author | Year | DB | Modality | NS | Results (%) | RT
L. Biel et al. [15] | 2001 | Private | ECG | 20 | IDR 98 | NA
Z. Zhang et al. [17] | 2006 | NA | ECG | 60 | IDR 85.3 | 10 s
W. Yongjin et al. [13] | 2007 | PTB / MIT-BIH | ECG | 13 | IDR 74.45 / 74.95 | 1 HB
Z. Zhao et al. [12] | 2011 | QT | ECG | 20 | IDR 95.3 | NA
H. Yu et al. [24] | 2015 | TIMIT | Speech | 20 | IDR 97.6 | ~9 s
N. Almaadeed et al. [25] | 2015 | Grid | Speech | 34 | IDR 97.5 | NA
N. Aboelenien et al. [26] | 2016 | CHAINS | Speech | 36 | IDR 91 | 2 s
X. Lei et al. [18] | 2016 | PTB | ECG | 100 | IDR 99.3 | NA
L. Wieclaw et al. [19] | 2017 | Private | Fingertip ECG | 18 | IDR 96 | NA
H. Gürkan et al. [20] | 2019 | MIT-BIH Arr. | ECG | 46 | IDR 99.3 | NA
A. Imran et al. [27] | 2019 | MOOC | Speech | 119 | IDR 93.37 / 94.44 / 94.64 | 3 s / 5 s / 7 s
S. El-Moneim et al. [28] | 2020 | Chinese Mandarin Corpus | Speech | 5 | IDR 98.7 | NA
R. Srivastva et al. [21] | 2021 | CYBHI | Fingertip ECG | 63 | IDR 99.66 | 3 HB
Y. Zhang et al. [22] | 2021 | CYBHI | Fingertip ECG | 60 | IDR 98.77 | 3 HB
The proposed method (ECG modality) | 2021 | CYBHI + Private | Fingertip ECG | 90 | IDR 90.17 | 10 s
The proposed method (speech modality) | 2021 | LibriSpeech | Speech | 90 | IDR 97.87 | 10 s
The proposed method (ECG & speech fusion) | 2021 | LibriSpeech + CYBHI + Private | Speech + Fingertip ECG | 90 | IDR 99.97 | 10 s

where DB refers to database; NS, the number of subjects; IDR, the identification rate; RT, the response time, indicating the frame length of the ECG or speech signals used while evaluating the system; Private refers to a database constructed by the authors; NA indicates that the information is not available or computable; HB refers to heartbeat.
The proposed speech-based identification system outperformed the other works given in Table 14 in terms of accuracy, number of subjects, and identification performance. Although the accuracy rates decreased when the system was set up for verification, they did not decrease as much as in our identification system that uses only ECG signals. In addition, the contribution of the speech signals to the performance of our proposed fusion-based identification system is relatively high. This causes our fusion system to become vulnerable to spoofing attacks when the weight of the speech signals relative to the ECG signals is increased, which is a major drawback of using speech signals. In order to eliminate the drawbacks which arise from the ECG and speech modalities alone, we combined the two modalities to increase the performance of the overall system. Thus, while the speech signals increase the identification performance, the ECG signals increase the confidence of the proposed system because of their liveness detection.

7. Conclusion

In this research, we introduced a fusion-based system that combines the ECG and speech modalities for identification and verification tasks in a single algorithm. The proposed fusion algorithm was developed to work in real-time security applications with multiple users. In the algorithm, we provided a solution for the degradation of fingertip ECG signals caused by patients' movements. The proposed fusion method works on the principle of a voting method, looking at each independent system's outcome.

The first experiment's results showed that the proposed fusion-based system achieved a 100% accuracy rate for 90 people when there was no imposter rejection feature. The second experiment's results showed that our algorithm rejected the imposter classes with a 96.05% accuracy rate and accepted and identified genuine classes with a 91.68% accuracy rate for 90 people. In the third experiment, we enlarged both the ECG and speech databases, evaluated the system performance with 176 genuine and 50 imposter classes, and achieved an 89.54% accuracy rate for the genuine classes and 96.1% imposter rejection accuracy. In the fourth experiment, we compared both the imposter and genuine class accuracy rates by changing the response time of the proposed system. In the fifth experiment, the speech dataset was exchanged for the RedDots database, giving 48 genuine classes and 14 imposter classes; in this case, the proposed method achieved 73.52% genuine identification accuracy and 85.36% imposter rejection accuracy. In the final experiment, we showed that the proposed system also works when only a few people are registered.

It should be noted that when the proposed system is set up as a verification system, it begins to reject unauthorized people. However, the identification accuracy of the proposed system decreases for genuine people. Also, we provide users a choice of how much security or convenience is needed for the proposed system according to the application.

It is concluded that the performance of the proposed fusion system is higher than the performances of the other works that use a single type of modality.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research work was supported by the Coordination Office for Scientific Research Projects, FMV ISIK University (Project Number: 14A203) and the Scientific Research Projects Unit, Bursa Technical University (Project Number: 181N14).

References

[1] S. Chauhan, A. Arora, A. Kaul, A survey of emerging biometric modalities, Proc. Comput. Sci. 2 (2010) 213–218, https://doi.org/10.1016/j.procs.2010.11.027.
[2] A. Fratini, M. Sansone, P. Bifulco, M. Cesarelli, Individual identification via electrocardiogram analysis, Biomed. Eng. Online 14 (1) (2015) 78, https://doi.org/10.1186/s12938-015-0072-y.
[3] J. Ribeiro Pinto, J. Cardoso, A. Lourenco, Evolution, current challenges, and future possibilities in ECG biometrics, IEEE Access 6 (2018) 34746–34776, https://doi.org/10.1109/ACCESS.2018.2849870.
[4] W. Wójcik, K. Gromaszek, M. Junisbekov, Face recognition: issues, methods and alternative applications, 2016, https://doi.org/10.5772/62950.
[5] M. Bassiouni, A machine learning technique for person identification using ECG signals, IOSR J. Appl. Phys. 1 (2016) 37.
[6] J. Arteaga-Falconi, H. Al Osman, A. El Saddik, ECG authentication for mobile devices, IEEE Trans. Instrum. Meas. 65 (2015) 1–10, https://doi.org/10.1109/TIM.2015.2503863.
[7] N. Samarin, D. Sannella, A key to your heart: biometric authentication based on ECG signals, arXiv:1906.09181.
[8] T.W. Shen, W.J. Tompkins, Y.H. Hu, One-lead ECG for identity verification, in: Proceedings of the Second Joint 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society, in: Engineering in Medicine and Biology, vol. 1, 2002, pp. 62–63.
[9] W.-H. Jung, S.-G. Lee, ECG identification based on non-fiducial feature extraction using window removal method, Appl. Sci. 7 (2017) 1205, https://doi.org/10.3390/app7111205.
[10] B. Noureddine, R. Fournier, A. Naït-ali, F. Reguig, A novel biometric authentication approach using ECG and EMG signals, J. Med. Eng. Technol. 39 (2015) 1–13, https://doi.org/10.3109/03091902.2015.1021429.
[11] G. Guven, H. Gürkan, U. Guz, Biometric identification using fingertip electrocardiogram signals, Signal Image Video Process. 12 (2018) 933–940, https://doi.org/10.1007/s11760-018-1238-4.
[12] Z. Zhao, L. Yang, ECG identification based on matching pursuit, in: 2011 4th International Conference on Biomedical Engineering and Informatics (BMEI), vol. 2, 2011, pp. 721–724.
[13] W. Yongjin, A. Foteini, D. Hatzinakos, K. Plataniotis, Analysis of human electrocardiogram for biometric recognition, EURASIP J. Adv. Signal Process. (2008), https://doi.org/10.1155/2008/148658.
[14] H. Gürkan, U. Guz, S. Yarman, A novel biometric authentication approach using electrocardiogram signals, in: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2013, pp. 4259–4262.
[15] L. Biel, O. Pettersson, L. Philipson, P. Wide, ECG analysis: a new approach in human identification, IEEE Trans. Instrum. Meas. 50 (3) (2001) 808–812.
[16] M. Kyoso, A. Uchiyama, Development of an ECG identification system, in: 2001 Conference Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 4, 2001, pp. 3721–3723, https://doi.org/10.1109/IEMBS.2001.1019645.
[17] Z. Zhang, D. Wei, A new ECG identification method using Bayes' theorem, in: TENCON 2006 - 2006 IEEE Region 10 Conference, 2006, pp. 1–4.
[18] X. Lei, Y. Zhang, Z. Lu, Deep learning feature representation for electrocardiogram identification, in: 2016 IEEE International Conference on Digital Signal Processing (DSP), 2016, pp. 11–14.
[19] L. Wieclaw, Y. Khoma, P. Fałat, D. Sabodashko, V. Herasymenko, Biometric identification from raw ECG signal using deep learning techniques, in: 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), vol. 1, 2017, pp. 129–133.
[20] H. Gürkan, A. Hanilci, ECG based biometric identification method using QRS images and convolutional neural network, Pamukkale Üniv. Müh. Bilim. Derg. 26 (2) (2020) 318–327.
[21] R. Srivastva, A. Singh, Y. Singh, PlexNet: a fast and robust ECG biometric system for human recognition, Inf. Sci. 558 (2021), https://doi.org/10.1016/j.ins.2021.01.001.
[22] Y. Zhang, Z. Zhao, D. Yanjun, X. Zhang, Y. Zhang, Human identification driven by deep CNN and transfer learning based on multiview feature representations of ECG, Biomed. Signal Process. Control 68 (2021) 102689, https://doi.org/10.1016/j.bspc.2021.102689.
[23] S. Marinov, Text dependent and text independent speaker verification systems: technology and applications, Högskolan i Skövde.
[24] H. Yu, Z. Ma, M. Li, J. Guo, Histogram transform model using MFCC features for text-independent speaker identification, in: Conference Record - Asilomar Conference on Signals, Systems and Computers 2015, 2015, pp. 500–504.
[25] N. Almaadeed, A. Aggoun, A. Amira, Speaker identification using multimodal neural networks and wavelet analysis, IET Biom. 4 (2015) 18–28, https://doi.org/10.1049/iet-bmt.2014.0011.
[26] N. Aboelenien, K. Amin, M. Ibrahim, M.M. Hadhoud, Improved text-independent speaker identification system for real time applications, in: Japan-Egypt International Conference on Electronics, Communications and Computers (JEC-ECC), 2016, pp. 58–62.
[27] A. Imran, Z. Kastrati, T. Svendsen, A. Kurti, Text-independent speaker ID employing 2D-CNN for automatic video lecture categorization in a MOOC setting, in: 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), 2019, pp. 273–277.
[28] S. El-Moneim, M. Nassar, M. Dessouky, N. Ismail, A. El-Fishawy, F. Abd El-Samie, Text-independent speaker recognition using LSTM-RNN and speech enhancement, Multimed. Tools Appl. 79 (2020) 24013–24028, https://doi.org/10.1007/s11042-019-08293-7.
[29] J. Pan, W.J. Tompkins, A real-time QRS detection algorithm, IEEE Trans. Biomed. Eng. 32 (1985) 230–236.
[30] J. Sohn, S. Kim, W. Sung, A statistical model based voice activity detector, IEEE Signal Process. Lett. 6 (1999) 1–3, https://doi.org/10.1109/97.736233.
[31] S.B. Davis, P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech Signal Process. (1980) 357–366.
[32] S. Chakroborty, G. Saha, Feature selection using singular value decomposition and QR factorization with column pivoting for text-independent speaker identification, Speech Commun. 52 (9) (2010) 693–709, https://doi.org/10.1016/j.specom.2010.04.002.
[33] N. Almaadeed, A. Aggoun, A. Amira, Text-independent speaker identification using vowel formants, J. Signal Process. Syst. 82 (3) (2016) 345–356, https://doi.org/10.1007/s11265-015-1005-5.
[34] M. Xu, L.-Y. Duan, J. Cai, L.-T. Chia, C. Xu, Q. Tian, HMM-based audio keyword generation, in: K. Aizawa, Y. Nakamura, S. Satoh (Eds.), Advances in Multimedia Information Processing - PCM 2004, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 566–574.
[35] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, D.J. Inman, 1D convolutional neural networks and applications: a survey, Mech. Syst. Signal Process. 151 (2021) 107398, https://doi.org/10.1016/j.ymssp.2020.107398.
[36] Q. Zhang, D. Zhou, X. Zeng, HeartID: a multiresolution convolutional neural network for ECG-based biometric human identification in smart health applications, IEEE Access 5 (2017) 11805–11816, https://doi.org/10.1109/ACCESS.2017.2707460.
[37] E.J. da Silva Luz, G.J.P. Moreira, L.S. Oliveira, W.R. Schwartz, D. Menotti, Learning deep off-the-person heart biometrics representations, IEEE Trans. Inf. Forensics Secur. 13 (5) (2018) 1258–1270, https://doi.org/10.1109/TIFS.2017.2784362.
[38] R. Donida Labati, E. Muñoz, V. Piuri, R. Sassi, F. Scotti, Deep-ECG: convolutional neural networks for ECG biometric recognition, Pattern Recognit. Lett. 126 (2019) 78–85 (special issue: Robustness, Security and Regulation Aspects in Current Biometric Systems).
[39] N. Bento, D. Belo, H. Gamboa, ECG biometrics using spectrograms and deep neural networks, Int. J. Mach. Learn. Comput. 10 (2020) 259–264.
[40] H. Plácido da Silva, A. Lourenco, A. Fred, N. Raposo, M. Aires-de Sousa, Check your biosignals here: a new dataset for off-the-person ECG biometrics, Comput. Methods Programs Biomed. 113 (2014) 503–514, https://doi.org/10.1016/j.cmpb.2013.11.017.
[41] G. Guven, Fingertip ECG signal based biometric recognition system, Master's Thesis, FMV ISIK University, 2016 (Supervisor: Associate Professor Hakan Gürkan, Co-supervisor: Associate Professor Umit Guz).
[42] V. Panayotov, G. Chen, D. Povey, S. Khudanpur, Librispeech: an ASR corpus based on public domain audio books, in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5206–5210.
[43] M.A. Ozdemir, O.K. Cura, A. Akan, Epileptic EEG classification by using time-frequency images for deep learning, Int. J. Neural Syst. 31 (08) (2021) 2150026, https://doi.org/10.1142/S012906572150026X.
[44] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals, Circulation 101 (23) (2000) e215–e220, https://doi.org/10.1161/01.CIR.101.23.e215.
[45] K.A. Lee, A. Larcher, H. Aronowitz, G. Wang, P. Kenny, The RedDots challenge: towards characterizing speakers from short utterances, in: Interspeech 2016, San Francisco, 2016, https://sites.google.com/site/thereddotsproject/reddots-challenge.

Gokhan Guven graduated from Isik University, Engineering Faculty, Department of Electrical-Electronics Engineering, Istanbul, Turkey, in 2014. He received the M.S. degree in Electronics Engineering from Isik University, Graduate School of Science, Istanbul, Turkey, in 2016. He has been pursuing the Ph.D. degree in Electronics Engineering at Isik University, School of Graduate Studies, since 2016. He was a research and teaching assistant at Isik University, Engineering Faculty, Department of Electrical-Electronics Engineering between 2014 and 2017. He has been working as a senior researcher at the Department of Information Technologies, The Scientific and Technological Research Council of Turkey (TUBITAK) since 2017. His research areas include speech processing and bio-signal processing.
Umit Guz received the B.S. degree in Electronics Engineering from Istanbul University, College of Engineering, Turkey, in 1994, and the M.S. and Ph.D. degrees in Electronics Engineering from the Institute of Science, Istanbul University, Turkey, in 1997 and 2002, respectively. He was awarded a post-doctoral research fellowship by the Scientific and Technological Research Council of Turkey (TUBITAK) in 2006. He was accepted as an international research fellow by the SRI (Stanford Research Institute)-International, Speech Technology and Research (STAR) Laboratory, Menlo Park, CA, USA, in 2006. He was awarded a J. William Fulbright post-doctoral research fellowship, USA, in 2007. He was accepted as an international research fellow by the International Computer Science Institute (ICSI), Speech Group at the University of California at Berkeley, Berkeley, CA, USA, in 2007 and 2008. He worked as an Assistant Professor and an Associate Professor in the Department of Electrical-Electronics Engineering, Engineering Faculty at Isik University, Istanbul, from 2008 to 2013 and from 2013 to 2019, respectively. He has been a full-time professor in the Department of Electrical-Electronics Engineering, Faculty of Engineering and Natural Sciences at Isik University, Sile, Istanbul, Turkey, since 2019. His research interests cover speech processing, automatic speech recognition, natural language processing, machine learning, and bio-signal processing.

Hakan Gürkan received the B.S., M.S., and Ph.D. degrees in Electronics and Communication Engineering from Istanbul Technical University, Turkey, in 1994, 1998, and 2005, respectively. He was a Research Assistant in the Department of Electronics Engineering, Engineering Faculty, Isik University, Istanbul, Turkey, from 1998 to 2005. He then worked as an Assistant Professor and an Associate Professor in the Department of Electrical-Electronics Engineering, Engineering Faculty at Isik University, Istanbul, from 2009 to 2014 and from 2014 to 2017, respectively. In 2018, he joined the Electrical-Electronics Engineering Department at Bursa Technical University, where he is currently working as a Professor; he is also head of the Department of Electrical-Electronics Engineering at Bursa Technical University. His main research areas include biometric identification, machine learning, pattern recognition, and biomedical and speech signal modeling, representation, and compression.