Deep Learning-based CCTV Footage Super-resolution for Human Subject Recognition
Maahi Rajpoot
Faculty of Engineering
Indian Institute of Science
Bangalore – 560 012 (INDIA)
November, 2023
Declaration of Originality
I, Maahi Rajpoot, with SR No. 13-19-03-19-52-22-1-21866 hereby declare that the material
presented in the project report titled
Deep Learning-based CCTV Footage Super-resolution for Human Subject Recognition
represents original work carried out by me at Synopsys India Pvt. Ltd. as part of the project
credit requirements of the Master of Technology (Online) degree in ELECTRONICS AND
COMMUNICATION at the Indian Institute of Science, between August 2022 and July 2025.
• I have not committed any plagiarism of intellectual property. I have clearly indicated and
referenced the contributions of others.
• I have understood that any false claim will result in severe disciplinary action.
• I have understood that the work may be screened for any form of academic misconduct.
In our capacities as internal project guide and faculty mentor of the above-mentioned work, we
certify that the above statements are true to the best of our knowledge, and we have carried out
due diligence to ensure the originality of the report.
ACKNOWLEDGEMENTS
I would like to express my heartfelt gratitude to everyone who has played a pivotal
role in the successful completion of my MTech project thesis on Deep
Learning-based CCTV Footage Super-resolution for Human Subject Recognition
at the Indian Institute of Science, Bangalore.
I would also like to extend my gratitude to Deep Shekhar, my internal guide, for
his invaluable advice and encouragement during the course of this project. His
technical expertise and practical insights have been instrumental in addressing key
challenges and achieving the project objectives.
A special thanks goes out to all those who have directly or indirectly contributed
to this endeavor, including my family and friends, for their constant support and
encouragement. Their unwavering belief in my abilities has been a source of
strength throughout this journey.
Abstract
Experiments are conducted on two datasets: the DroneSURF Dataset and the
CVBL-CCTV-Face Dataset. The DroneSURF Dataset includes images captured
from drones, presenting challenges such as varying altitudes, angles, and lighting
conditions. The CVBL-CCTV-Face Dataset comprises facial images collected
under various conditions, including different poses, expressions, and lighting
environments. These datasets provide a comprehensive benchmark for testing the
effectiveness of super-resolution techniques.
Contents
Abstract vi
Contents viii
List of Figures x
List of Tables xi
Abbreviations xii
1 Introduction 1
1.1 History of Face Recognition . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Early Development . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2 Linear Algebra Techniques . . . . . . . . . . . . . . . . . . . . 6
1.1.3 Statistical Approaches . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.5 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Principal Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Related Work 15
2.1 Convolutional Neural Network Models . . . . . . . . . . . . . . . . . 15
2.2 Advanced CNN Models with Novel Loss Functions . . . . . . . . . . . 16
2.3 Machine Learning and Hybrid Models . . . . . . . . . . . . . . . . . . 16
2.4 Specialized Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Our Contribution to the Field . . . . . . . . . . . . . . . . . . . . . . 18
2.5.1 Dataset Creation . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.2 Face Recognition Model Comparison . . . . . . . . . . . . . . 18
2.5.3 Practical Applications . . . . . . . . . . . . . . . . . . . . . . 19
3 Dataset 20
3.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5 Results 27
5.1 Illustrative Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.2 Mathematical Result . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.1 DroneSURF Dataset . . . . . . . . . . . . . . . . . . . . . . . 28
5.3 Additional Face Recognition Models . . . . . . . . . . . . . . . . . . . 28
5.3.1 Dlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3.2 ArcFace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3.3 FaceNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Bibliography 37
List of Figures
List of Tables
3.1 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Abbreviations
SR Super-Resolution
ESRGAN Enhanced Super-Resolution Generative Adversarial Network
VDSR Very Deep Super-Resolution
FSRCNN Fast Super-Resolution Convolutional Neural Network
SRCNN Super-Resolution Convolutional Neural Network
GAN Generative Adversarial Network
PSNR Peak Signal-to-Noise Ratio
SSIM Structural Similarity Index Measure
DIP Deep Image Prior
RCAN Residual Channel Attention Network
Chapter 1
Introduction
The core objective of this project is to explore how super-resolution techniques can
be integrated with deep learning-based facial recognition models to improve human
subject identification in low-resolution CCTV footage. In particular, this study
focuses on using super-resolution techniques to enhance image quality and,
subsequently, improve the performance of the FaceNet model, a state-of-the-art
deep learning-based facial recognition system. FaceNet has gained widespread
popularity due to its ability to map facial images into a compact Euclidean space,
where the distances between points directly correspond to the similarity of faces.
This feature makes FaceNet particularly suitable for human identification tasks, as
it enables highly accurate recognition, even with subtle differences in facial
features.
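To make this concrete, the sketch below shows how a distance threshold over FaceNet-style embeddings can be turned into a match decision. The 512-dimensional vectors and the 1.1 threshold are illustrative placeholders, not values taken from this project.

```python
# Minimal sketch of distance-based identification over FaceNet-style
# embeddings. The 512-D vectors and the 1.1 threshold are illustrative
# placeholders, not values taken from this project.
import numpy as np

def same_person(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 1.1) -> bool:
    """Declare a match when the Euclidean distance between embeddings is small."""
    return float(np.linalg.norm(emb_a - emb_b)) < threshold

rng = np.random.default_rng(0)
probe = rng.normal(size=512)                        # embedding of a probe face
gallery = probe + rng.normal(scale=0.01, size=512)  # near-duplicate identity
print(same_person(probe, gallery))                  # True: small distance
```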
The DroneSURF dataset [6] is used for evaluating the performance of these
super-resolution models. This dataset consists of images captured by drones,
offering a unique challenge due to the dynamic nature of the environment in which
the images are collected. The DroneSURF dataset includes various factors that
can impact the quality of the images, such as varying altitudes, angles, lighting
conditions, and poses of individuals. These variations make it an ideal testbed for
evaluating the robustness of super-resolution models in real-world surveillance
scenarios. The dataset provides a diverse set of facial images, collected under
different conditions, including varying lighting environments, facial expressions,
and orientations. As such, it serves as a comprehensive benchmark for assessing
how well super-resolution techniques can enhance the performance of facial
recognition systems in complex, uncontrolled environments.
The significance of this research lies in its potential to address one of the most
persistent problems in surveillance and security systems—accurate human
identification from low-resolution footage. By combining advanced super-resolution
techniques with deep learning-based facial recognition models, this project aims to
develop a solution that can significantly improve recognition accuracy, even in
challenging conditions. The success of this approach could lead to the development
of more reliable and effective surveillance systems, particularly for defense
applications, where the need for precise identification is critical.
Furthermore, the findings of this research could have broader implications beyond
surveillance. Improved facial recognition accuracy in low-resolution footage could
benefit a wide range of applications, including law enforcement, border control,
crowd management, and even personal security systems. With the rapid expansion
of CCTV systems and the increasing reliance on video-based surveillance for public
safety, the ability to enhance image quality and improve recognition performance is
more important than ever. This research, therefore, not only addresses a critical
gap in current surveillance systems but also contributes to the broader field of
computer vision and machine learning, particularly in the context of real-world
applications where image quality can vary significantly.
1.1 History of Face Recognition
Face recognition, as a field of study, has a rich history that spans several decades,
marked by advancements in technology and the evolution of scientific
understanding. The journey from manual methods to the sophisticated deep
learning algorithms used today showcases the rapid progress made in automating
human subject identification.
The roots of face recognition can be traced back to the 1960s and 1970s when the
field was in its infancy. Early efforts at face recognition involved manually
extracting key facial features from photographs. Researchers would measure
distances between facial landmarks such as the eyes, nose, and mouth. These
measurements were then used for comparing different facial features and making
identifications. This manual process was time-consuming and often lacked the
precision necessary for reliable recognition, especially in large datasets.
Nevertheless, this period laid the foundation for more advanced computational methods.
One of the significant early milestones was the creation of the first computer-based
face recognition system in 1965 by Woodrow W. Bledsoe. Bledsoe’s approach
involved encoding facial features by manually plotting points on a face and storing
them as data, which was then compared to a database of known faces. Though
rudimentary by today’s standards, this work introduced the concept of using a
computer for facial analysis and set the stage for future developments.
The 1980s and 1990s marked the shift toward more mathematical and algorithmic
approaches to face recognition. Researchers began to explore the application of
linear algebra techniques to facial recognition. A key breakthrough in this era was
the Eigenfaces method, built on the low-dimensional face characterization of
Sirovich and Kirby (1987) and formalized by Turk and Pentland in 1991, which
introduced the concept of dimensionality reduction using Principal Component
Analysis (PCA). The Eigenfaces method involved projecting high-dimensional
facial data into a lower-dimensional space, effectively capturing the most
significant variations in facial features while discarding noise. This allowed for
more efficient processing of facial data and improved the system’s ability to handle
large datasets.
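As an illustration of the Eigenfaces idea, the following sketch uses scikit-learn's PCA to project flattened face images into a low-dimensional space; the random data and the choice of 50 components are placeholders standing in for a real face dataset.

```python
# Illustrative sketch of the Eigenfaces idea with scikit-learn's PCA. Random
# data stands in for flattened grayscale face images; 50 components is an
# arbitrary illustrative choice.
import numpy as np
from sklearn.decomposition import PCA

n_faces, h, w = 200, 32, 32
faces = np.random.rand(n_faces, h * w)  # each row is one flattened face image

pca = PCA(n_components=50)              # keep the 50 strongest "eigenfaces"
coeffs = pca.fit_transform(faces)       # 1024-D images -> 50-D coefficients
print(coeffs.shape)                     # (200, 50)

# Recognition then compares faces in the 50-D space, e.g. nearest neighbour;
# inverse_transform approximately reconstructs a face from its coefficients.
approx = pca.inverse_transform(coeffs[:1])
```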
These advancements were crucial because they addressed some of the fundamental
challenges in face recognition systems—variability in facial appearance and
background conditions. However, the reliance on handcrafted features and linear
models still limited the flexibility and scalability of these systems, especially when
faced with highly diverse datasets.
The late 1990s and early 2000s saw the advent of machine learning algorithms that
pushed face recognition into a new era. Instead of relying solely on linear
transformations, machine learning approaches used algorithms that could learn
from data and improve over time. Support Vector Machines (SVMs), introduced
by Vapnik in 1995, became popular for face classification tasks due to their ability
to handle high-dimensional data efficiently. SVMs worked by finding the optimal
hyperplane that best separates different classes of data in a higher-dimensional
space, making them particularly well-suited for face recognition applications.
Alongside SVMs, neural networks began to gain traction in face recognition tasks.
The introduction of multi-layer perceptrons (MLPs) allowed researchers to train
deeper models capable of learning more complex relationships between facial
features. Although these models were limited by computational power and data
availability, they set the stage for the next generation of face recognition systems
based on deep learning.
The true revolution in face recognition came with the advent of deep learning
techniques, particularly Convolutional Neural Networks (CNNs), which have since
become the gold standard for many computer vision tasks, including face
recognition. CNNs, introduced by Yann LeCun and his colleagues in the late
1980s, became widely popular after the success of AlexNet in the 2012 ImageNet
competition. CNNs are able to automatically learn hierarchical feature
representations from raw image data, eliminating the need for manual feature
extraction, which was a major limitation in earlier methods.
This shift towards deep learning significantly improved the accuracy and
scalability of face recognition systems. Several landmark models in face recognition
emerged in the 2010s, including DeepFace (2014), developed by Facebook, which
utilized a deep CNN architecture to achieve human-level accuracy in face
recognition tasks. Another important model was FaceNet (2015), developed by
Google, which introduced a novel approach called triplet loss to train the network.
This method ensured that images of the same person were embedded closer
together in feature space, while images of different people were placed farther
apart. This led to a significant improvement in both the accuracy and
generalization of face recognition systems.
Other important deep learning models in this space include VGGFace, OpenFace,
and ArcFace, all of which have achieved state-of-the-art performance in face
recognition tasks. These models leverage vast amounts of labeled data and
advanced neural network architectures to achieve unprecedented accuracy in both
controlled and real-world settings.
1.2 Motivation
The motivation behind this project stems from the growing need to enhance
surveillance and security systems in increasingly complex environments. As
urbanization increases and more CCTV cameras are deployed worldwide, the
reliance on video surveillance for public safety and law enforcement is stronger
than ever. However, one of the most persistent challenges faced by these systems is
the low quality of video footage, especially in large-scale surveillance systems that
capture footage from distant or obscured angles, in low-light conditions, or at high
altitudes. In such cases, the low resolution of the video footage makes it difficult to
accurately recognize and identify individuals, which is crucial for security purposes.
Furthermore, the findings from this research are expected to have broader
implications for surveillance systems worldwide, where the need for efficient,
high-accuracy recognition from low-resolution video footage is becoming
increasingly critical. Whether used for tracking suspects in criminal investigations,
managing crowds at public events, or improving security at critical infrastructure
sites, the ability to enhance the recognition accuracy of facial recognition systems
will play a crucial role in strengthening public safety and national security in the
years to come.
1.3 Principal Contributions
This research aims to tackle one of the most pressing issues in modern surveillance
systems: improving human subject recognition accuracy in low-resolution CCTV
footage. To this end, we present several key contributions to the field, each of
which adds significant value to the current state of research and provides practical
solutions for enhancing the performance of facial recognition systems in real-world
conditions.
• Fusion Model Development: Based on the findings from the previous ex-
periments, this study proposes a fusion model that combines the strengths
of super-resolution techniques with the best-performing face recognition mod-
els. The fusion approach optimizes the recognition accuracy by leveraging the
complementary strengths of each individual component. The experimental re-
sults indicate that the fusion model significantly outperforms the standalone
models, making it a promising approach for real-world surveillance systems.
This contribution provides a new avenue for future research and development
in facial recognition and image enhancement technologies.
also includes the performance of the proposed fusion model, demonstrating its
superiority over individual models in terms of recognition accuracy.
Chapter 2
Related Work
The earliest notable model, DeepFace [11], utilized a 9-layer deep CNN architecture
for face verification tasks. DeepFace [11] incorporated multi-task learning and
supervised pre-training, achieving near-human performance on the Labeled Faces in
the Wild (LFW) [12] benchmark. However, it faced challenges in generalizing
across different demographics, poses, illumination conditions, and ages.
Following DeepFace [11], VGGFace [13] employed a deeper CNN [10] architecture
based on the VGG-16 model, trained on the VGGFace dataset [13] containing over
2.6 million images. VGGFace [13] improved accuracy but struggled with variations
in pose, illumination, and expression.
Dlib [7] Face Recognition combines machine learning algorithms, including a face
detection pipeline based on Histogram of Oriented Gradients (HOG) features and
a face recognition model using deep metric learning techniques. It can be trained
on various datasets, including LFW [12] and CIFAR-10 [17]. However, enhancing
Dlib’s capability to handle large-scale datasets with diverse facial variations and
improving its scalability for complex face recognition tasks remains a challenge.
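For illustration, a minimal sketch of this pipeline using dlib's public Python API is given below; the two .dat model files are standard dlib downloads, and the image path is a placeholder.

```python
# Hedged sketch of the Dlib pipeline: HOG-based detection followed by a
# 128-D deep-metric-learning descriptor. The .dat model files must be
# downloaded separately from dlib.net; the image path is a placeholder.
import dlib

detector = dlib.get_frontal_face_detector()  # HOG + linear SVM face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

img = dlib.load_rgb_image("frame.jpg")       # placeholder input frame
for box in detector(img, 1):                 # 1 = upsample once for small faces
    landmarks = predictor(img, box)
    descriptor = encoder.compute_face_descriptor(img, landmarks)
    print(len(descriptor))                   # 128-D embedding used for matching
```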
DeepID3 [18] extends the DeepID [19] models by incorporating deeper layers and
attention mechanisms to enhance feature learning and discriminative power.
Trained on datasets like LFW [12], YouTube Faces [20], and MegaFace [21],
DeepID3 [18] achieved high accuracy. Nevertheless, it faced challenges in handling
variations in pose, illumination, and expression, as well as scalability issues in
large-scale face recognition scenarios.
OpenFace [22] utilizes deep neural networks (DNNs) [23] for face detection, facial
landmark localization, and face verification, employing a multi-task learning
approach. Despite its versatility, optimizing OpenFace [22] for real-time
performance on resource-constrained devices and enhancing its accuracy in
handling diverse facial variations and environmental conditions remain areas for
improvement.
Our research makes significant contributions to the field of face recognition and
surveillance technology in three key areas: dataset creation, face recognition model
comparison, and practical applications.
The insights gained from our research have practical implications for enhancing
security and surveillance operations, particularly in defense and public safety
applications. By providing a detailed evaluation of face recognition models and a
high-quality dataset, we contribute to the development of more effective
surveillance technologies that can better protect public safety and national
security.
Chapter 3
Dataset
3.1 Dataset
The DroneSURF [6] dataset is another critical resource utilized in this project,
specifically for testing the robustness of super-resolution techniques in facial
recognition. This dataset consists of images captured by drones, providing a unique
perspective on surveillance footage. The use of drones introduces several challenges
that traditional CCTV footage does not, such as varying altitudes, dynamic
angles, and unpredictable lighting conditions. These factors make the dataset
particularly useful for evaluating how well facial recognition systems can adapt to
real-world scenarios where environmental conditions are constantly changing.
A key strength of the DroneSURF dataset is its wide range of video content,
capturing over 200 videos across 58 unique subjects. This variety ensures that the
models trained and tested on it can generalize well to different environments and
conditions. With a large number of annotated faces (over 786,000), the dataset
provides a comprehensive benchmark for evaluating the effectiveness of image
enhancement techniques, including super-resolution, in improving the accuracy of
facial recognition in low-resolution images.
Through the use of the DroneSURF dataset, we aim to push the boundaries of
facial recognition systems, ensuring that they can perform accurately and reliably
even in less-than-ideal conditions.
Table 3.1: Characteristics of the DroneSURF dataset
Chapter 4
The architecture has three modules: the first module performs face detection and
cropping using MTCNN [27] (Figure 4.1(a)); the second performs super-resolution
to enhance the image (Figure 4.1(b)); and the third performs face recognition using
the FaceNet [9] model (Figure 4.1(c)).
4.1.1 Face Detection and Cropping
The process begins with an input image containing multiple faces. Multi-task
Cascaded Convolutional Networks (MTCNN) [27] are employed to detect faces
within the input image due to their high accuracy and efficiency in face detection
tasks. MTCNN [27] works by running a series of neural networks to first identify
potential face regions and then refine these detections to pinpoint the exact
locations of the faces. The output of this stage is the detected faces, which are
cropped from the input image, resulting in several low-resolution (LR) facial
images ready for further processing.
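A minimal sketch of this detection-and-cropping stage, assuming the MTCNN implementation from the facenet-pytorch package, is shown below; the frame path and crop size are illustrative and may differ from the configuration used in the project.

```python
# Sketch of the face detection and cropping module using facenet-pytorch's
# MTCNN. The frame path and crop size are illustrative assumptions.
from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(keep_all=True, image_size=160)  # detect all faces, crop to 160x160

frame = Image.open("cctv_frame.jpg")          # placeholder input frame
boxes, probs = mtcnn.detect(frame)            # bounding boxes and confidences
faces = mtcnn(frame)                          # aligned low-resolution face crops
if faces is not None:
    print(faces.shape)                        # (num_faces, 3, 160, 160)
```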
4.1.2 Super-Resolution
The low-resolution facial images (LR) obtained from the MTCNN [27] detection
phase are then fed into a super-resolution model. This model can be one of the
advanced super-resolution techniques such as GFPGAN [1], CodeFormer [2],
Unpaired SR [3], ESRGAN [4], or Real-ESRGAN [5]. Each of these models applies deep
learning algorithms to enhance the resolution of the input images, producing
high-quality super-resolved (SR) images. The super-resolution process involves
upscaling the images and recovering high-frequency details that were lost in the
low-resolution version. The output of this stage is the super-resolved (SR) facial
images, which exhibit significantly improved clarity and detail compared to the
original low-resolution images.
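As a hedged example of this stage, the sketch below applies GFPGAN [1] through its published Python interface; the weight file, upscale factor, and architecture arguments are assumptions based on the library's documented usage rather than project-specific settings, and the other SR models listed above would slot into the pipeline the same way.

```python
# Hedged sketch of the SR stage via GFPGAN's Python interface; the weight
# file, upscale factor, and architecture arguments are assumptions, not
# project-specific settings.
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.3.pth",  # pretrained weights, downloaded separately
    upscale=2,                    # 2x upscaling of the cropped face
    arch="clean",
    channel_multiplier=2,
)

lr_face = cv2.imread("lr_face.jpg")  # low-resolution crop from MTCNN (BGR)
_, restored_faces, _ = restorer.enhance(
    lr_face, has_aligned=False, only_center_face=False, paste_back=True
)
cv2.imwrite("sr_face.jpg", restored_faces[0])  # save the super-resolved face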
4.1.3 Face Recognition
The final stage involves facial recognition using the FaceNet [9] deep neural
network. Both the super-resolved (SR) images and the high-resolution (HR)
reference images are passed through the FaceNet [9] network to perform facial
recognition. FaceNet [9] processes these images to generate their corresponding
embeddings, which are fixed-size vectors that capture the essential features of the
faces. For the super-resolved images, the embeddings generated (SR Embedding)
represent the enhanced facial details, while the high-resolution images generate
their own set of embeddings (HR Embedding). These embeddings are then
compared to compute distance scores, which quantify the similarity between the
SR and HR embeddings. Lower distance scores indicate higher similarity,
facilitating accurate recognition of individuals. This process ensures that even
faces enhanced from low-resolution images can be accurately recognized,
demonstrating the effectiveness of combining super-resolution with advanced facial
recognition models.
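The following sketch illustrates this matching step with the pretrained FaceNet-style network from facenet-pytorch; the random tensors are placeholders standing in for the SR and HR face crops produced by the earlier stages.

```python
# Sketch of the matching stage: compare SR and HR embeddings by Euclidean
# distance. Random tensors stand in for the crops from the earlier stages.
import torch
from facenet_pytorch import InceptionResnetV1

facenet = InceptionResnetV1(pretrained="vggface2").eval()  # FaceNet-style network

sr_face = torch.randn(1, 3, 160, 160)  # placeholder super-resolved face tensor
hr_face = torch.randn(1, 3, 160, 160)  # placeholder high-resolution reference

with torch.no_grad():
    sr_emb = facenet(sr_face)          # (1, 512) SR embedding
    hr_emb = facenet(hr_face)          # (1, 512) HR embedding

distance = torch.dist(sr_emb, hr_emb).item()  # lower distance => same identity
print(f"SR-HR embedding distance: {distance:.3f}")
```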
GPUs: NVIDIA RTX 4090 GPUs with CUDA support for accelerated
computation.
Storage: High-capacity SSDs for fast data access and storage of large datasets
and model checkpoints.
CUDA and cuDNN: NVIDIA CUDA and cuDNN libraries for GPU
acceleration.
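Assuming a PyTorch-based setup, a quick sanity check that the CUDA/cuDNN stack described above is visible to the framework might look like this:

```python
# Quick sanity check, assuming a PyTorch-based setup, that the CUDA/cuDNN
# stack described above is visible to the framework.
import torch

print(torch.cuda.is_available())                # True when CUDA is configured
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))        # e.g. "NVIDIA GeForce RTX 4090"
    print(torch.backends.cudnn.is_available())  # cuDNN acceleration present
```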
Chapter 5
Results
5.1 Illustrative Result
The results of our experiments demonstrate the effectiveness of the proposed face
recognition enhancement process for CCTV footage. Initially, low-resolution video
frames with multiple individuals are used. The system detects and crops individual
faces, which are then enhanced to higher resolution. These enhanced faces are
identified with unique IDs and matching scores, showcasing the improved accuracy
and reliability of face recognition. This process highlights the importance of
resolution enhancement in surveillance footage, significantly boosting recognition
performance from low-quality video inputs.
5.2 Mathematical Result
The results of the experiments are presented in terms of the recognition accuracy of
the FaceNet [9] model on both regular and super-resolved images for each dataset.
5.2.1 DroneSURF Dataset
For the DroneSURF Dataset [6], GFPGAN [1] significantly outperformed the other
models, achieving an accuracy of 69.16%. CodeFormer [2] also showed notable
improvement (59.56%), while ESRGAN [4] and Real-ESRGAN [5] showed relatively
lower accuracy, indicating variability in performance across different datasets.
5.3 Additional Face Recognition Models
Two other models were also evaluated for face recognition; their performance is
given in the table below.
5.3.1 Dlib
5.3.2 ArcFace
5.3.3 FaceNet
FaceNet [9] outperformed both ArcFace [8] and Dlib [7], showing superior
performance in embedding generation with high accuracy and consistency.
Chapter 6
6.1 Conclusion
The findings of this work have implications for a variety of industries, including
public safety, defense, and private
sector security.
6.2 Future Work
While the results presented in this study were promising, they were based on
controlled datasets that, although challenging, do not fully replicate the
complexity of real-world surveillance environments. Future research should focus
on conducting field trials using real-world CCTV footage from various urban,
industrial, and defense environments. This will help assess the practical
applicability and performance of these techniques under more varied and
unpredictable conditions, such as changing weather, crowded scenes, or low-light
environments.
For example, noise reduction techniques, particularly those based on deep learning,
can be employed to remove artifacts introduced by low-light conditions or
poor-quality sensors. Contrast enhancement methods could help reveal subtle
facial features that are critical for recognition. These complementary techniques
could be used in a multi-step pipeline, where super-resolution is followed by
additional image enhancement steps to maximize recognition accuracy.
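A minimal sketch of such a multi-step pipeline is shown below; the individual steps are hypothetical stand-ins for a learned denoiser, one of the SR models discussed earlier, and a contrast routine such as CLAHE.

```python
# Minimal sketch of a multi-step enhancement pipeline. The three steps are
# hypothetical stand-ins: a real system would wrap a learned denoiser, an
# SR model, and a contrast routine such as CLAHE.
from typing import Callable, List
import numpy as np

Step = Callable[[np.ndarray], np.ndarray]

def run_pipeline(image: np.ndarray, steps: List[Step]) -> np.ndarray:
    """Apply each enhancement step in order before recognition."""
    for step in steps:
        image = step(image)
    return image

def denoise(img: np.ndarray) -> np.ndarray:
    return img  # placeholder for a deep-learning denoiser

def super_resolve(img: np.ndarray) -> np.ndarray:
    return img  # placeholder for GFPGAN, Real-ESRGAN, etc.

def enhance_contrast(img: np.ndarray) -> np.ndarray:
    return img  # placeholder for CLAHE or a similar method

frame = np.zeros((64, 64, 3), dtype=np.uint8)  # dummy cropped face
enhanced = run_pipeline(frame, [denoise, super_resolve, enhance_contrast])
```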
Facial recognition systems could also benefit from integrating multiple recognition
modalities, such as combining facial recognition with other biometrics, like gait
analysis, voice recognition, or license plate recognition. A hybrid system that uses
different recognition techniques can provide more accurate and reliable results by
cross-validating information from multiple sources.
In addition to enhancing accuracy, future work should also focus on improving the
scalability of these techniques for large-scale deployments. Surveillance systems
often involve monitoring vast areas with numerous cameras, which results in a high
volume of video data. Processing this data in real-time or even offline for
recognition requires highly scalable solutions that can handle large amounts of
data efficiently.
Developing scalable algorithms that can process video frames in parallel or using
distributed computing resources would enable the use of super-resolution and
facial recognition technologies in large-scale urban and defense applications. These
solutions must strike a balance between accuracy, efficiency, and cost-effectiveness.
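As a sketch of one such approach, the snippet below fans video frames out across a process pool; recognize_faces is a hypothetical stand-in for the full detection, super-resolution, and recognition pipeline.

```python
# Hedged sketch of parallel frame processing with a process pool.
# recognize_faces is a hypothetical stand-in for the detection,
# super-resolution, and recognition pipeline described in Chapter 4.
from concurrent.futures import ProcessPoolExecutor

def recognize_faces(frame_path: str) -> str:
    # Placeholder: load the frame, run the pipeline, return a summary.
    return f"processed {frame_path}"

if __name__ == "__main__":
    frame_paths = [f"frames/frame_{i:05d}.jpg" for i in range(100)]
    with ProcessPoolExecutor(max_workers=8) as pool:
        for result in pool.map(recognize_faces, frame_paths):
            print(result)  # aggregate matches, write to a database, etc.
```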
6.3 Summary
Bibliography
[1] X. Wang, Y. Li, H. Zhang, and Y. Shan, “Towards real-world blind face restora-
tion with generative facial prior,” in Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR), 2021.
[2] S. Zhou, K. C. K. Chan, C. Li, and C. C. Loy, “Towards robust blind face
restoration with codebook lookup transformer,” in Advances in Neural Informa-
tion Processing Systems (NeurIPS), 2022.
[4] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, C. C. Loy, Y. Qiao, and
X. Tang, “Esrgan: Enhanced super-resolution generative adversarial networks,”
in Proceedings of the European Conference on Computer Vision Workshops
(ECCVW), 2018.
[8] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Additive angular margin
loss for deep face recognition,” in Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), 2019.
[11] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap
to human-level performance in face verification,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[15] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “Sphereface: Deep hyper-
sphere embedding for face recognition,” in Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2017.
[17] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech-
nical Report, University of Toronto, 2009.
[18] Y. Sun, D. Liang, X. Wang, and X. Tang, “Deepid3: Face recognition with very
deep neural networks,” arXiv preprint arXiv:1502.00873, 2015.
[19] Y. Sun, X. Wang, and X. Tang, “Deep learning face representation by joint
identification-verification,” in Advances in Neural Information Processing Sys-
tems, 2014, pp. 1988–1996.
[25] D. Yi, Z. Lei, S. Liao, and S. Z. Li, “Learning face representation from scratch,”
arXiv preprint arXiv:1411.7923, 2014.
[27] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment
using multi-task cascaded convolutional networks,” IEEE Signal Processing Let-
ters, vol. 23, no. 10, pp. 1499–1503, 2016.