Detail-Preserving Face Sketch Synthesis

This document summarizes a research paper on using generative adversarial learning for detail-preserving face sketch synthesis. The paper proposes a new framework that uses a modified high-resolution network as the generator to transform face photos to sketches. In addition to an adversarial loss, the framework includes a detail loss to force synthesized sketches to have details similar to the corresponding photos, and a style loss to give the sketches a style resembling hand-drawn sketches. Experimental results showed the proposed approach achieved superior performance over state-of-the-art methods on objective metrics and visual perception for detail preservation in face sketch synthesis.

Neurocomputing 438 (2021) 107–121

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Generative adversarial learning for detail-preserving face sketch synthesis
Weiguo Wan a, Yong Yang b, Hyo Jong Lee c,⇑
a School of Software and Internet of Things Engineering, Jiangxi University of Finance and Economics, Nanchang 330032, China
b School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330032, China
c Division of Computer Sciences and Engineering, CAIIT, Jeonbuk National University, Jeonju 54896, South Korea

Article info

Article history:
Received 19 August 2020
Revised 28 November 2020
Accepted 8 January 2021
Available online 18 January 2021
Communicated by Zidong Wang

Keywords:
Face sketch synthesis
Detail-preserving
Generative adversarial learning
High-resolution network
Face sketch recognition

Abstract

Face sketch synthesis aims to generate a face sketch image from a corresponding photo image and has wide applications in law enforcement and digital entertainment. Despite the remarkable achievements in face sketch synthesis, most existing works focus on facial content transfer at the expense of facial detail information. In this paper, we present a new generative adversarial learning framework that focuses on detail preservation for realistic face sketch synthesis. Specifically, the high-resolution network is modified to serve as the generator that transforms a face image from the photograph domain to the sketch domain. In addition to the common adversarial loss, we design a detail loss that forces the synthesized face sketch images to have details close to those of the corresponding photo images. Furthermore, a style loss is adopted to constrain the synthesized face sketch images to have a vivid sketch style like that of hand-drawn sketch images. Experimental results demonstrate that the proposed approach achieves superior performance compared with state-of-the-art approaches, in both visual perception and objective evaluation. Specifically, the proposed approach achieves higher FSIM values (0.7345 and 0.7080) and Scoot values (0.5317 and 0.5091) than most comparison methods on the CUFS and CUFSF datasets, respectively.

© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

⇑ Corresponding author.
E-mail address: [email protected] (H.J. Lee).

https://doi.org/10.1016/j.neucom.2021.01.050
0925-2312/© 2021 The Author(s). Published by Elsevier B.V.

1. Introduction

In recent years, face sketch synthesis has attracted significant attention in computer vision and pattern recognition, owing to its important applications in law enforcement and digital entertainment [1]. In many criminal cases, only limited information about the suspects is available because surveillance videos are of low quality or missing. In these situations, sketches drawn by artists from the recollections of eyewitnesses are usually used as a substitute for identifying the suspects. For example, Fig. 1 shows real instances of forensic face sketches released by the FBI. Once the police obtain the sketches, they can narrow down the list of possible suspects by searching the face datasets of law enforcement departments or surveillance camera footage with the sketches [2]. Moreover, sketch-style face images are also applied in animation production and used as avatars on social media [3].

Fig. 1. Examples of face sketches for real law enforcement applications.

In face sketch-to-photo recognition, the large modality discrepancy between face sketches and digital photos makes it more difficult to directly match face sketch images to photo images than in homogeneous face recognition [4,5]. Various approaches have been put forward to solve the modality gap problem in face sketch recognition, such as modality-invariant feature extraction [6], common subspace projection [7], and photo-to-sketch synthesis [8–12]. Among them, face sketch synthesis is the most commonly used; it transforms the face modality from photo to sketch by utilizing photo–sketch image pairs as training data, thereby reducing the modality discrepancy. After face sketches are generated from digital photos, face sketch recognition can be performed with conventional homogeneous face recognition methods.

Existing face sketch synthesis approaches can be roughly classified into two categories: traditional exemplar-based approaches and deep learning-based approaches. Exemplar-based approaches search the training photos for several nearest photo patches for each patch in the test photo. The reconstruction weights are then calculated by approximating the test photo patch with the nearest photo patches, and are utilized to construct the target sketch patch [13]. Exemplar-based methods are easy to understand and implement, but the search for nearest photo patches is time-consuming, and the synthesized sketch images suffer from block artifacts. Recently, deep learning-based methods have
become new trends in the face sketch synthesis field. These methods aim at training a deep model that can generate face sketches from face photos without requiring training photo–sketch pairs during testing. The transformation process of deep learning-based approaches is fast; however, the synthesized sketch images usually lack facial details and contain serious noise [14].

In order to address the aforementioned challenges, we present a face photo-to-sketch synthesis approach based on a generative adversarial learning framework. First, we modify the high-resolution network [15], which maintains high-resolution representations through the whole process, to generate realistic face sketch images. Then, a detail loss function, calculated between the Laplacian of Gaussian (LoG) responses of the input photo image and the predicted sketch image, is proposed to constrain the generator to produce face sketch images with more detailed information. In addition, the pseudo sketch feature is utilized to calculate a style loss so that the synthesized face sketch image has a style similar to that of the training sketch images. Experimental results indicate that the proposed face sketch synthesis method achieves superior performance in both visual perception and objective evaluation.

1.1. Contributions

The main contributions of our work in this paper are summarized as follows:

1) We construct a new generative adversarial learning framework for face sketch synthesis. Our method is capable of generating unabridged facial content and preserving the face details of the input photo image.
2) We modify the high-resolution network to merge the high-level feature maps gradually, and utilize it as the generator to synthesize realistic face sketch images. The gradual merging strategy has been experimentally shown to preserve more detail information from source photos.
3) A LoG-based detail loss is designed to cope with the lack of facial details in synthesized sketches. To the best of our knowledge, this is the first time this idea has been employed to constrain the detail information in image transformation tasks.
4) The effectiveness of the proposed method is qualitatively and quantitatively validated on multiple public datasets. Compared with the state-of-the-art approaches, our results have abundant facial details and a vivid sketch style.

The remaining parts of this paper are organized as follows: Section 2 introduces the related works, while Section 3 introduces our face sketch synthesis framework in detail. Section 4 describes the experiments on multiple face sketch datasets and evaluates the performance of the proposed face sketch synthesis approach. Section 5 concludes the paper and recommends future works.

2. Related works

In this section, we survey the representative works related to face sketch synthesis, and briefly introduce the Generative Adversarial Network (GAN).

2.1. Face sketch synthesis

Traditional exemplar-based face photo-to-sketch synthesis approaches generate a face sketch image by linear combination of training sketch patches. The exemplar-based methods mainly consist of two steps: neighbor patch selection in the training photo–sketch image pairs, and linear combination with reconstruction weights. In the first step, for each test photo patch, several closest training photo patches are chosen from neighboring positions. In the second step, a weight vector is computed between the test photo patch and the chosen training photo patches. After that, the target sketch patch is obtained by weighted averaging of the training sketch patches that correspond to the chosen training photo patches. The exemplar-based face sketch synthesis approach was first put forward by Tang et al. [16], in which all of the training photo–sketch pairs were utilized to generate a target sketch. Liu et al. [17] suggested performing face photo-to-sketch generation at the image patch level, where the reconstruction coefficients are computed with locally linear embedding. Gao et al. [18] proposed a face sketch synthesis method based on an embedded hidden Markov model. To consider the dependency relationship between neighboring sketch patches, Wang et al. [19] employed the Markov random fields (MRF) algorithm to represent the neighbor constraints among adjacent sketch patches. It chose only the best sketch patch from the training data for each test photo patch. Later, Zhou et al. [20] utilized a Markov weight fields (MWF) model to select multiple nearest patches to generate the sketch patch. To speed up the sketch generation process, Song et al. [21] transformed the face sketch generation task into a spatial sketch denoising (SSD) problem, and obtained real-time performance on a GPU. Peng et al. [22] presented a face photo-sketch synthesis approach based on superpixel segmentation. Zhang et al. [23] developed a robust face sketch generation approach that can produce arbitrarily stylistic sketches with only a single sketch example. Wang et al. [24] conducted the neighbor sketch selection and reconstruction weight calculation with a Bayesian framework. Wang et al. [13] also proposed a random sampling and locality constraint (RSLCR) based synthesis method, in which the training photo and sketch patches are randomly sampled, and the locality constraint is utilized to calculate the reconstruction weight coefficients. Recently, Zhang et al. [25] designed a dual-transfer framework to preserve the identity-specific information with inter- and intra-domain transfer processes. While the exemplar-based methods can synthesize facial components well, most of them are time-consuming and suffer from blocking effects around patch boundaries.

Deep learning-based methods aim at learning a deep network model that can rapidly generate a face sketch image from the face photo image in the testing phase. Zhang et al. [26] first put forward a deep learning-based face sketch generation approach with fully convolutional networks (FCN). Reference [11] modified the convolutional neural network (CNN) with a multi-layer perceptron convolutional layer to generate sketch images. Sheng et al. [27] proposed another CNN-based method that utilized enhanced cross-layer cost aggregation and 3D PatchMatch to extract the feature maps used to generate face sketch images. Instead of generating the sketch image directly, Jiang et al. [12] put forward the learning of the residual map between the face photo and its corresponding sketch image. To improve the quality of the synthesized sketches, some studies proposed to train deep networks with the assistance of facial components [28–30]. With the great success of GAN in image-to-image transformation, researchers have explored the use of GAN in face sketch synthesis and achieved impressive performance. While the deep learning-based face sketch generation approaches are good at preserving identity, facial structures and details are usually lost during the synthesis phase.

To cope with the weaknesses of the traditional exemplar-based and deep learning-based face sketch synthesis methods, we put forward a novel face sketch synthesis framework that takes texture details and sketch style into consideration, so that the synthesized face sketches not only preserve the facial details of the input face photos but also have a vivid style like that of the real face sketches in the training dataset.

2.2. Generative adversarial networks

GAN was first devised by Goodfellow et al. [31] to generate realistic images by learning the distribution of the training images with a game-theoretic min–max optimization framework. It has made great achievements in numerous computer vision applications, such as image super-resolution [32], image style transfer [33], image restoration [34], and texture synthesis [35]. Two neural networks are trained alternately in GAN: a generative model G that generates new samples similar to the training data, and a discriminative model D that distinguishes generated samples from real ones. They are trained with an adversarial loss, which compels the generated images to have a distribution similar to that of the real images in the training data. In order to add more constraints to the generation process, various derivatives of GAN have been developed, such as conditional GAN (CGAN) [36] and auxiliary classifier GAN (ACGAN) [37]. They usually take extra information (e.g. labels, attention, and attributes) as auxiliary input to serve as specific conditions. Due to the great generation capability of GAN, numerous GAN-based face sketch synthesis methods have been proposed. Wang et al. [38] applied a back-projection procedure to the sketches first synthesized with a GAN (BPGAN). Gao et al. [39] took advantage of face labels and a compositional loss to add facial details for sketch portrait generation. Zhang et al. [40] utilized multidomain adversarial learning to synthesize high-quality sketch images. In [41], Zhu et al. proposed a GAN-based deep collaborative face photo-sketch synthesis framework with two opposite networks that utilize the mutual interaction of two opposite mappings. The previous methods require a large number of paired photo–sketch images as training data, which are difficult to obtain. To handle this problem, Zhu et al. [42] developed the cycle-consistent adversarial networks (CycleGAN) to deal with unpaired image-to-image translation tasks. Based on CycleGAN, Wang et al. [43] applied multiple adversarial networks to feature maps with different resolutions. Chen et al. [44] presented a semi-supervised learning (SSL) framework to address the limited number of training photo–sketch pairs for face sketch synthesis, which obtained impressive performance in the wild. Zhu et al. [45] employed a knowledge transfer framework for training a face photo-sketch synthesis model with a small set of photo-sketch pairs.

It can be seen that GAN has become an important deep learning framework for face sketch synthesis research and has attracted more and more attention. For GAN, the design of the generator is most critical. In this study, a modified high-resolution network is proposed as the generator to produce realistic face sketch images.

3. The proposed method

This section presents a generative adversarial learning framework for face photo-to-sketch synthesis. The overall architecture of the proposed method is first introduced, and then the designed generator and discriminator structures are described in detail. Afterwards, the loss functions employed for training the proposed networks are defined.

3.1. Overall architecture

This paper aims to design a framework that is able to synthesize a realistic face sketch image with abundant facial details. Fig. 2 shows the overall architecture, which mainly consists of a GAN deep learning framework. In this work, we first improve the high-resolution network and use it as the generator G, which takes a face photo as the input image and outputs a corresponding sketch image with rich facial details. Then, a naive CNN is used as the discriminator D to distinguish the synthesized sketches from real sketches. Assuming $(x, y)$ represents a photo–sketch image pair sampled from the training data, $\hat{y} = G(x)$ is the target face sketch image. The objective of GAN for face sketch synthesis can be represented as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_x(x)}\left[\log\left(1 - D(G(x))\right)\right] + \mathbb{E}_{y \sim P_y(y)}\left[\log D(y)\right] \tag{1}$$

where $P_x(x)$ and $P_y(y)$ are the data probability distributions of the training photo and sketch images. $G(x)$ is the face sketch image synthesized by the generator G. $D(G(x))$ and $D(y)$ are the outputs of the discriminator D whose inputs are the synthesized face sketch image $G(x)$ and the real face sketch image $y$, respectively.

Fig. 2. Illustration of the proposed face sketch synthesis method.

To train the proposed face sketch synthesis framework, three more loss functions are employed besides the adversarial loss: the detail loss, the style loss, and the total variation loss. For simplicity, Fig. 2 only shows the detail loss and the style loss. In the detail loss, the LoG feature maps of the input photo and synthesized sketch images are extracted, and their difference is calculated as the loss value. In the style loss, the K nearest photo-sketch pairs are selected based on the VGG19 feature similarity between the input photo and the training photos. Then, the distance between feature maps of
synthesized sketch and pseudo feature maps constructed from the K selected neighbor sketches measures the style loss value.

3.2. Detailed generator and discriminator networks

3.2.1. Generator network

In the proposed method, the modified high-resolution network is utilized as the basic structure of the generator. The original high-resolution network was first proposed for human pose estimation [46], and was further applied to other computer vision tasks, e.g. face landmark detection, semantic segmentation, object detection, and image classification [15]. The main characteristics of the high-resolution network are that it maintains the high-resolution representation throughout the whole network, and repeatedly conducts multi-scale fusion across parallel convolutions from different stages. Motivated by its remarkable success in high-resolution representation, we employ the high-resolution network as the generator for face sketch synthesis, and modify its head structure in order to gradually aggregate the feature maps from low resolution to high resolution.

Fig. 3 shows the basic high-resolution network, which in the proposed approach is composed of four stages. The 1st stage consists of high-resolution convolutions. From the 2nd stage onward, multi-resolution blocks are placed, each composed of a multi-resolution group convolution and a multi-resolution convolution. In the multi-resolution group convolution, the input channels are divided into several subsets of channels, and the convolutions are performed separately over each subset at different spatial resolutions. In the multi-resolution convolution, the input channels are divided into several subsets with multiple resolutions, and the output channels are also divided into several subsets with multiple resolutions. The input and output subsets are connected in a fully connected fashion, and the connections include three different types: 1) regular convolution; 2) upsampling with bilinear interpolation; and 3) downsampling with 2-strided 3 × 3 convolutions. The output channel maps of each subset are a summation of the outputs from each subset of the input channel maps.

In the original high-resolution network, the outputs of the different-resolution convolutions at the final stage are rescaled and concatenated at once, which leads to a loss of detail information in the lower-resolution feature maps. Instead, we upsample and merge them gradually, as compared in Fig. 4(a) and (b). The gradual merging strategy helps to preserve more facial details in the synthesized sketch image. As depicted in Algorithm 1, starting from the lowest resolution, we first interpolate the feature map up a scale and concatenate it with the higher-resolution feature map. Afterward, two convolutional layers with kernel sizes of 1 × 1 and 3 × 3 are applied, and the number of channels of the output feature map is the same as that of the higher-resolution feature map. At the end, the high-resolution output is obtained. Moreover, we replace the batch normalization

Fig. 3. The basic structure of the high-resolution network. The horizontal direction corresponds to the stage of the high-resolution network, while the vertical direction
represents the resolution of the feature maps.


Fig. 4. The fusion strategies of high-level feature maps in (a) the original high-resolution network, and (b) our method. Here, © indicates the concatenation operation.
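
The two fusion strategies in Fig. 4 can be compared at the level of tensor shapes. The following is a minimal sketch, not the authors' implementation: it assumes four branches whose shapes follow the rule stated for the generator (224 × 224 × 32 at the highest resolution, with the size halved and the channels doubled at each lower resolution). The helper names `branch_shapes`, `fuse_at_once`, and `fuse_gradually` are hypothetical, and no convolution is actually computed; only shapes are tracked.

```python
# Shape-level sketch of the two fusion strategies (no real convolutions).
# Shapes are (channels, height, width). Branch count and sizes are assumptions.

def branch_shapes(num_branches=4, base_channels=32, base_size=224):
    """Return shapes ordered from lowest to highest resolution (F_1 ... F_N)."""
    shapes = [(base_channels * 2 ** i, base_size // 2 ** i, base_size // 2 ** i)
              for i in range(num_branches)]
    return shapes[::-1]

def fuse_at_once(shapes):
    """Original head: rescale every branch to the highest resolution, then concatenate."""
    _, height, width = shapes[-1]
    return (sum(c for c, _, _ in shapes), height, width)

def fuse_gradually(shapes):
    """Modified head: upsample one scale, concatenate, convolve, and repeat."""
    current = shapes[0]
    trace = [current]
    for higher in shapes[1:]:
        channels, height, width = current
        upsampled = (channels, height * 2, width * 2)        # step 1: interpolate up a scale
        assert upsampled[1:] == higher[1:], "branches must differ by a factor of 2"
        concatenated = (channels + higher[0],) + higher[1:]  # step 2: concatenate
        # step 3: 1x1 and 3x3 convolutions reduce channels to the higher branch's count
        current = (higher[0],) + concatenated[1:]
        trace.append(current)
    return current, trace

shapes = branch_shapes()
fused, trace = fuse_gradually(shapes)
print(fuse_at_once(shapes))   # shape of the one-shot concatenation
print(trace)                  # shape after each gradual merging step
```

Under these assumptions, the one-shot strategy hands a single wide tensor to the output layers, whereas the gradual strategy interleaves convolutions with each upsampling step, so every lower-resolution map is refined alongside its higher-resolution neighbor before the next merge.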

in the original high-resolution network with instance normalization, which has been proven to achieve better performance in image-to-image transformation tasks [47].

Considering the small number of training photo-sketch pairs, we reduce the size of the high-resolution network model. The detailed instantiation of our generator network is described here. The 1st stage is composed of three residual units, and the feature map in each unit has the shape of 224 × 224 × 32. With every resolution reduction, the spatial size of the feature map is halved and the number of channels is doubled. There are 1, 3, and 2 multi-resolution blocks in the 2nd, 3rd, and 4th stages, respectively. Each branch in the multi-resolution group convolution contains three residual units, and each unit contains two convolution layers with a filter size of 3 × 3 in each resolution.

Algorithm 1. Modified feature map fusion strategy.

Input: Feature maps of the different-resolution convolutions at the final stage, $F_1, F_2, \ldots, F_N$, ordered from the lowest to the highest resolution (N is the number of resolutions).
for t = 1 to N − 1 do
    1. Interpolate the feature map up a scale: $F_t = \mathrm{upsample}(F_t)$
    2. Concatenate it with the higher-resolution feature map: $\hat{F}_t = \mathrm{concatenate}(F_t, F_{t+1})$
    3. Conduct convolutions on the concatenated feature map: $F_{t+1} = \mathrm{convolute}(\hat{F}_t)$
end for
Output: Fused high-resolution feature map $F_N$.

3.2.2. Discriminator network

The basic structure of the discriminator network includes five convolutional layers with stride 2, kernel size 3, and padding 1. The numbers of filters are 32, 32, 64, 64, and 128, respectively. Each convolutional layer is followed by batch normalization and ReLU activation. After the basic discriminator network, a convolutional layer with an output size of 7 × 7 × 1 and a Sigmoid activation layer are applied to predict probability scores between 0 and 1, which are utilized to distinguish whether the observed sketch image is real or fake. The detailed structure and parameter settings of the discriminator network can be seen in Table 1.

3.3. Loss functions

This part introduces the loss functions employed for training the proposed GAN model. Assume that $x$ and $y$ are the training photo–sketch pair, $z$ is the selected neighbor sketch image, and $\hat{y}$ is the synthesized sketch.

1) Adversarial loss for generator is a basic loss in GAN, which aims at guiding the generator to produce data that look real and are able to deceive the discriminator. It can be represented as:

$$L_{adv\_G} = \mathbb{E}_{x \sim P_{photo}(x)}\left[\left(D(G(x)) - 1\right)^2\right] \tag{2}$$

where $L_{adv\_G}$ is the adversarial loss for the generator, and $P_{photo}(x)$ is the data probability distribution of the training photo images.

2) Detail loss is designed to measure the facial detail discrepancy. Existing face sketch synthesis methods rarely pay attention to detail preservation, and thus inevitably produce blur and artifacts. In this work, we propose a detail loss to increase the detailed information in the resultant sketch images. In order to avoid the influence of the deformation in the hand-drawn sketch image, we calculate the mean square error between the LoG maps of the predicted sketch image and the input photo image, instead of the ground-truth hand-drawn sketch. That is, although the face photo and its synthesized sketch belong to different modalities, their facial details should be similar, as shown in Fig. 5. Therefore, the detail loss can constrain the synthesized image to have detail information close to that of the source image. Eq. (3) gives the definition of the detail loss:

$$L_{detail}(x, \hat{y}) = \left\| \mathrm{LoG}(x) - \mathrm{LoG}(\hat{y}) \right\|_2^2 \tag{3}$$

where $L_{detail}$ represents the detail loss, and LoG denotes the Laplacian of Gaussian filtering process.

3) Style loss is utilized to enforce the generated face sketch images to have a style similar to that of the real hand-drawn face sketch images. It also helps face sketch recognition, because the modality discrepancy is reduced after the face photo is transformed into the sketch domain. The high-level feature maps extracted by the pre-trained VGG deep model are commonly used to represent image style [33]. In this paper, we adopt the pseudo-sketch feature [44] to measure the style loss. The flowchart of the style loss is displayed in Fig. 6, and the formula of the style loss is as follows:

Table 1
Architecture of the discriminator network, where "Conv" denotes a convolution layer.

Layer         Conv             Conv           Conv           Conv           Conv          Conv
Kernel size   3 × 3            3 × 3          3 × 3          3 × 3          3 × 3         3 × 3
Stride        2                2              2              2              2             1
Output size   112 × 112 × 32   56 × 56 × 32   28 × 28 × 64   14 × 14 × 64   7 × 7 × 128   7 × 7 × 1
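
The output sizes in Table 1 follow from the standard convolution size formula, out = floor((in + 2 · padding − kernel) / stride) + 1. The sketch below checks this arithmetic for a 224 × 224 input. The function name `conv_output_size` is hypothetical, and a padding of 1 for the last layer is an assumption, since the text states the padding only for the first five layers.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# (filters, stride) per layer, taken from Table 1; padding=1 on the last layer is assumed.
layers = [(32, 2), (32, 2), (64, 2), (64, 2), (128, 2), (1, 1)]

size = 224  # training images are resized to 224 x 224
output_sizes = []
for filters, stride in layers:
    size = conv_output_size(size, stride=stride)
    output_sizes.append((size, size, filters))

for shape in output_sizes:
    print(" x ".join(str(v) for v in shape))
```

Each stride-2 layer halves the spatial size (224, 112, 56, 28, 14, 7), and the final stride-1 layer keeps the 7 × 7 grid while reducing the channels to one score per location.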

Fig. 5. Comparison of the facial details between face photo and sketch. (a) Photo image and its facial details extracted with LoG filter; (b) sketch image and its facial details
extracted with LoG filter.
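
The idea behind Fig. 5 and the detail loss of Eq. (3) can be illustrated on a toy image. This is a minimal pure-Python sketch, not the authors' code: the kernel size (5) and sigma (1.0) are assumptions because the text does not give the LoG parameters, and the function names `log_kernel`, `filter2d`, and `detail_loss` are hypothetical. The loss is returned as a sum of squares, as in Eq. (3); dividing by the number of pixels gives the mean square error described in the text.

```python
import math

def log_kernel(size=5, sigma=1.0):
    """Laplacian-of-Gaussian kernel, shifted to zero mean so flat regions map to 0."""
    half = size // 2
    kernel = [[(x * x + y * y - 2 * sigma ** 2) / sigma ** 4
               * math.exp(-(x * x + y * y) / (2 * sigma ** 2))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter2d(image, kernel):
    """Slide the kernel over the image ('valid' region only, no padding)."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)]

def detail_loss(photo, sketch, size=5, sigma=1.0):
    """Squared L2 distance between the LoG maps of a photo and a sketch, as in Eq. (3)."""
    kernel = log_kernel(size, sigma)
    log_photo = filter2d(photo, kernel)
    log_sketch = filter2d(sketch, kernel)
    return sum((p - s) ** 2
               for row_p, row_s in zip(log_photo, log_sketch)
               for p, s in zip(row_p, row_s))

# Toy 8x8 grayscale "photo" with a vertical edge, and two candidate sketches.
photo = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
brighter = [[v + 0.3 for v in row] for row in photo]   # same edges, different tone
flat = [[0.5] * 8 for _ in range(8)]                   # edge lost entirely

print(detail_loss(photo, brighter))  # near zero: details preserved
print(detail_loss(photo, flat))      # clearly positive: details missing
```

Because the kernel has zero mean, a uniform change in tone leaves the LoG map unchanged, which is why the loss responds to missing edges rather than to the overall brightness difference between a photo and a sketch.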

Fig. 6. Illustration of the style loss. The face feature maps are extracted with the VGG19 network. For each feature map batch of input photo, the nearest batch is selected from
feature map batches of neighbor photos at the same position. Then, the corresponding feature map batches of sketches are used to construct the pseudo sketch feature map.
The distance between the pseudo sketch feature map and the feature map of the synthesized sketch forms the style loss.
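
The patch-matching procedure described in the caption of Fig. 6 can be sketched for a single feature layer as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: real feature maps would come from VGG19, whereas here each patch is a short flattened list, and the function names `squared_distance`, `pseudo_sketch_patches`, and `style_loss_single_layer` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flattened patches."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def pseudo_sketch_patches(photo_patches, neighbor_photo_patches, neighbor_sketch_patches):
    """For each position j, pick the sketch patch of the neighbor whose photo patch is closest."""
    pseudo = []
    for j, patch in enumerate(photo_patches):
        best = min(range(len(neighbor_photo_patches)),
                   key=lambda k: squared_distance(patch, neighbor_photo_patches[k][j]))
        pseudo.append(neighbor_sketch_patches[best][j])
    return pseudo

def style_loss_single_layer(synth_patches, pseudo):
    """Sum over positions of squared distances, i.e. the inner sum of Eq. (4) for one layer."""
    return sum(squared_distance(s, p) for s, p in zip(synth_patches, pseudo))

# Toy data: J = 2 patch positions, K = 2 neighbor pairs, patches flattened to length 2.
photo_patches = [[1.0, 0.0], [0.0, 1.0]]
neighbor_photo_patches = [
    [[0.9, 0.1], [1.0, 1.0]],   # neighbor 0: close to the input at position 0
    [[0.0, 0.0], [0.1, 0.9]],   # neighbor 1: close to the input at position 1
]
neighbor_sketch_patches = [
    [[5.0, 5.0], [6.0, 6.0]],
    [[7.0, 7.0], [8.0, 8.0]],
]
synth_patches = [[5.0, 4.0], [8.0, 8.0]]

pseudo = pseudo_sketch_patches(photo_patches, neighbor_photo_patches, neighbor_sketch_patches)
print(pseudo)                                          # [[5.0, 5.0], [8.0, 8.0]]
print(style_loss_single_layer(synth_patches, pseudo))  # 1.0
```

The match is made between photo features, but the target is taken from the corresponding sketch features, so the synthesized sketch is pulled toward the style of real sketches without requiring pixel-level alignment with any one of them.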

$$L_{style}(x, \hat{y}) = \sum_{i=1}^{5} \sum_{j=1}^{J} \left\| \Psi_j\left(\Phi_i(\hat{y})\right) - \Psi'_j\left(\Phi_i(x)\right) \right\|_2^2 \tag{4}$$

where $L_{style}$ represents the style loss; $\Phi_i(\cdot)$ is the extracted feature map at the first channel of the ith group of convolution layers in the VGG19 network, which has 5 groups of convolution layers; $\Psi_j(\Phi_i(\hat{y}))$ is a k × k patch centered at a point j of the feature map $\Phi_i(\hat{y})$; J is the total number of patches; and $\Psi'_j(\Phi_i(x))$ is the feature map patch of the neighbor sketch whose corresponding photo has the best-matching feature map in this batch with the input photo $x$.

4) Total variation loss is commonly used in deep learning-based style transfer tasks, which is able to eliminate noise and artifacts


in the target images [48–50]. To further improve the quality of the generated face sketch image, we apply the total variation loss to the output sketch of the generator:

$$L_t(\hat{y}) = \sum_{m,n} \left[ \left(\hat{y}_{m+1,n} - \hat{y}_{m,n}\right)^2 + \left(\hat{y}_{m,n+1} - \hat{y}_{m,n}\right)^2 \right] \tag{5}$$

where $\hat{y}_{m,n}$ is the pixel value at (m, n) of the synthesized face sketch image.

The total loss function for training the generator is formed by a weighted combination of the aforesaid loss functions:

$$L_G = \lambda_1 L_{adv\_G} + \lambda_2 L_{style} + \lambda_3 L_{detail} + \lambda_4 L_t \tag{6}$$

where the parameters $\lambda$ are the weighting coefficients, which are set as [1] to scale the loss values to the same order of magnitude.

5) Adversarial loss for discriminator is used to train the discriminator network in GAN. The well-trained discriminator is able to distinguish the true sketch image from a generated pseudo sketch image. The loss function for the discriminator is calculated on the generated sketch $G(x)$ and the real sketch $z$, which we randomly select from the neighbor sketch images:

$$L_D = \frac{1}{2}\,\mathbb{E}_{z \sim P_{sketch}(z)}\left[\left(D(z) - 1\right)^2\right] + \frac{1}{2}\,\mathbb{E}_{x \sim P_{photo}(x)}\left[D(G(x))^2\right] \tag{7}$$

where $L_D$ is the adversarial loss for the discriminator, and $P_{sketch}(z)$ is the data probability distribution of the training sketch images.

4. Experimental results

4.1. Datasets and implementation details

We evaluate the proposed method on two public face sketch datasets: CUFS [19] and CUFSF [51], which comprise 606 and 1143 photo–sketch pairs, respectively. The photo–sketch pairs in the CUFS dataset are well aligned; however, CUFSF is a more challenging dataset, because its sketches have serious deformation and are not well aligned with the photos. Moreover, the photos in the CUFSF dataset are captured under various lighting conditions. Fig. 7 shows examples of photo-sketch pairs in the CUFS and CUFSF datasets.

The experiment settings and implementation details of the proposed method are introduced here. For convenient comparison, we select 268 photo–sketch pairs in the CUFS dataset and 250 photo–sketch pairs in the CUFSF dataset for training, following the data splitting rule commonly used in the face sketch synthesis literature [13] [38]. The remaining photo-sketch pairs are used for testing. The photo and sketch images of the CUFS and CUFSF datasets have a size of 200 × 250, and they are resized to 224 × 224 when fed into the network for training. The deep network models in this work are implemented with PyTorch and trained on an NVIDIA Titan X GPU. The number of training epochs is 40, and the Adam optimizer [52] with a learning rate of 0.002 is employed for network training.

To evaluate the performance of different face sketch synthesis methods, apart from the visual comparison, two image quality evaluation indices were employed for objectively assessing the synthesized face sketches. Feature similarity image measurement (FSIM) [53] is a frequently used index for evaluating image quality, which has high consistency with human visual perception and can capture the similarity between low-level features of the synthesized image and the ground-truth image. Let M and N be the synthesized face sketch image and its ground-truth hand-drawn sketch image, respectively. The FSIM value of M and N can be represented as:

$$\mathrm{FSIM}(M, N) = \frac{\sum S_{PC} \cdot S_G \cdot \max\left(PC(M), PC(N)\right)}{\sum \max\left(PC(M), PC(N)\right)} \tag{8}$$

where PC is the image phase congruency, and $S_{PC}$ and $S_G$ are defined as follows:

$$S_{PC} = \frac{2\,PC(M) \cdot PC(N) + T_1}{PC^2(M) + PC^2(N) + T_1} \tag{9}$$

$$S_G = \frac{2\,GM(M) \cdot GM(N) + T_2}{GM^2(M) + GM^2(N) + T_2} \tag{10}$$

where GM is the image gradient magnitude, and $T_1$ and $T_2$ are positive constants that increase the stability of $S_{PC}$ and $S_G$.

In addition, Fan et al. [54] proposed a new image quality evaluation index specifically for face sketch synthesis, named structure co-occurrence texture (Scoot), which measures the perceptual similarity between face sketches by simultaneously considering the co-occurrence texture statistics and block-level spatial structure. The Scoot value of M and N can be represented as:

Fig. 7. Examples of the CUFS dataset (first three columns) and CUFSF dataset (last two columns).

W. Wan, Y. Yang and Hyo Jong Lee Neurocomputing 438 (2021) 107–121

$$\mathrm{Scoot}(M, N) = \frac{1}{1 + \left\| \tilde{\Psi}(M') - \tilde{\Psi}(N') \right\|_2} \tag{11}$$

where $M'$ and $N'$ are the quantized M and N, respectively, and $\tilde{\Psi}(\cdot)$ represents the operation of calculating the average feature over the different orientation vectors.

4.2. Ablation study

The proposed face sketch synthesis method integrates the modified high-resolution network with several elaborate loss functions. To verify the effectiveness of these components, two groups of experiments were conducted on the CUFS dataset, with emphasis on evaluating the proposed feature map fusion strategy and the detail loss.

4.2.1. Effectiveness of the modified fusion strategy

The first experiment fixed all of the loss functions and tested the performance of the different components in the modified high-resolution network. The images in the first row of Fig. 8 illustrate the face sketches synthesized with the original fusion strategy and with our modified one, from which we can observe that less noise and finer detail information are achieved with the proposed feature map fusion strategy. From Table 2, it can be seen that the performance improves as more components are added to the generator network.

4.2.2. Effectiveness of the proposed detail loss

The second experiment fixed the modified high-resolution network and tested the performance of the loss functions in our method. The images in the second row of Fig. 8 depict the input photo and the results synthesized without the style loss and detail loss, without the detail loss, and with all of the aforementioned losses, respectively. From the visual comparison, the style loss improves the sketch style of the generated results, while the detail loss preserves significantly more facial detail from the source photo. Table 3 indicates that each loss function contributes positively to training the proposed generator network. Please note that the first adversarial loss in Table 3 includes a simple mean squared error loss, because the adversarial loss alone cannot train the GAN model well.

4.3. Comparison with state-of-the-art face sketch synthesis methods

To demonstrate the performance of the proposed face sketch synthesis method, six state-of-the-art approaches are compared,

Fig. 8. Results of face sketch synthesis with different fusion strategies and different loss functions.


Table 2
The ablation study of the components in the modified high-resolution network on the CUFS dataset (✓ = component used).

High-resolution network   Modified fusion strategy   Instance normalization   FSIM     Scoot
✓                         -                          -                        0.7273   0.5123
✓                         ✓                          -                        0.7316   0.5268
✓                         ✓                          ✓                        0.7345   0.5317

Table 3
The ablation study of the loss functions on the CUFS dataset (✓ = loss used).

Adversarial loss   Style loss   Detail loss   Total variation loss   FSIM     Scoot
✓                  -            -             -                      0.7081   0.4750
✓                  ✓            -             -                      0.7290   0.5236
✓                  ✓            ✓             -                      0.7313   0.5258
✓                  ✓            ✓             ✓                      0.7345   0.5317
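
The loss terms ablated in Table 3 are combined as in Eqs. (5)–(7). As a rough illustration only, the following plain-Python sketch evaluates the total variation loss on a small image stored as nested lists, the weighted generator objective, and the least-squares discriminator loss. The function names, the all-ones default weights, and the treatment of border pixels (only pixels with both a lower and a right neighbor are summed) are assumptions made for this example; the authors' PyTorch implementation and the weighting coefficients they actually used are not reproduced here.

```python
def total_variation_loss(img):
    """Eq. (5): squared differences to the lower and right neighbor, summed.

    `img` is a 2-D list of pixel values. Border handling is an assumption:
    only pixels that have both neighbors contribute.
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for m in range(rows - 1):
        for n in range(cols - 1):
            total += (img[m + 1][n] - img[m][n]) ** 2
            total += (img[m][n + 1] - img[m][n]) ** 2
    return total


def generator_loss(l_adv_g, l_style, l_detail, l_tv,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Eq. (6): weighted sum of the four generator loss terms.

    The default weights are placeholders, not the values from the paper.
    """
    lam1, lam2, lam3, lam4 = weights
    return lam1 * l_adv_g + lam2 * l_style + lam3 * l_detail + lam4 * l_tv


def discriminator_loss(d_real, d_fake):
    """Eq. (7): least-squares loss, with expectations replaced by batch means.

    `d_real` holds D(z) for real sketches, `d_fake` holds D(G(x)).
    """
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term
```

A perfectly flat image gives a total variation loss of zero, and a discriminator that outputs 1 for every real sketch and 0 for every generated one gives a discriminator loss of zero, which matches the intent of the two terms.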

including exemplar-based methods (MWF [20], SSD [21], and RSLCR [13]), and deep learning-based methods (FCN [25], BPGAN [38], and SSL [44]). Figs. 9 and 11 present visual comparisons between these methods and the proposed method on the CUFS and CUFSF datasets, respectively. From them we can observe that the MWF method leads to blurring effects around the image edge area, because of the weighted averaging of multiple neighbor sketches. Due to the denoising process in the SSD method, the detailed information of the sketch image is weakened together with the noise. The RSLCR method obtains better results, but the weighted averaging over randomly selected sketch patches results in blurring as well, as in the MWF method. The FCN method is able to generate an intact facial structure, but a serious noise problem exists in the synthesized sketch images. The BPGAN method conducts a back-projection step on the sketches synthesized by the GAN model, and can generate clear and clean face sketch images. However, the sketch images produced by the BPGAN method show some deformation compared with the input photo images, especially on the CUFSF dataset. The SSL method obtains excellent results and has the closest performance to our method, except for some fine-grained facial components, such as eyes and wrinkles. In summary, the exemplar-based methods usually suffer from block and blur effects, while the deep learning-based methods show noise and deformation problems. Comparatively, the proposed method generates the most realistic face sketch images with abundant facial details, unabridged facial content,

Fig. 9. Comparisons of the synthesized face sketch images by different methods on the CUFS dataset.


Fig. 10. Local cut-outs from Fig. 9, shown in the same arrangement as Fig. 9.

and less noise. To compare the sketch images synthesized by the different methods more clearly, we cut sub-images from Fig. 9 and display them in Fig. 10. It can be observed that the proposed approach preserves the detailed information of the face photo images, such as hair, eyelids, and wrinkles, particularly well.

Table 4 shows the average FSIM and Scoot indices that we calculated as an objective evaluation on the CUFS and CUFSF datasets. It indicates that our approach achieved the highest index values on the CUFS dataset, and comparable results on the challenging CUFSF dataset. The objective evaluation results further support the effectiveness of our approach.

Fig. 11. Comparisons of the synthesized face sketch images by different methods on the CUFSF dataset.


Table 4
Objective evaluation of the synthesized sketch images by different methods on the CUFS and CUFSF datasets. The best values are marked with an asterisk.

Dataset   Metric   MWF      SSD      RSLCR    FCN      BPGAN    SSL       Proposed
CUFS      FSIM     0.7145   0.6959   0.6966   0.6936   0.6899   0.7256    0.7345*
CUFS      Scoot    0.4833   0.4544   0.4499   0.4527   0.4680   0.4878    0.5317*
CUFSF     FSIM     0.7029   0.6824   0.6650   0.6624   0.6814   0.7159*   0.7080
CUFSF     Scoot    0.4882   0.4687   0.4531   0.4378   0.4936   0.5038    0.5091*
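
The FSIM and Scoot scores in Table 4 follow Eqs. (8)–(11). The sketch below illustrates only the final pooling steps in plain Python, under several assumptions: the phase congruency and gradient magnitude maps are taken as already computed and flattened into lists, the Scoot feature vectors are taken as given, the norm in Eq. (11) is read as the Euclidean norm, and the default constants `t1` and `t2` are placeholders rather than values stated in this paper. The function names are hypothetical.

```python
import math


def similarity(a, b, t):
    """Shared form of Eqs. (9) and (10): (2ab + t) / (a^2 + b^2 + t)."""
    return (2.0 * a * b + t) / (a * a + b * b + t)


def fsim(pc_m, pc_n, gm_m, gm_n, t1=0.85, t2=160.0):
    """Eq. (8): phase-congruency-weighted average of S_PC * S_G.

    All four inputs are flat lists of per-pixel values with equal length;
    at least one phase congruency value must be positive.
    """
    numerator = 0.0
    denominator = 0.0
    for pm, pn, gm, gn in zip(pc_m, pc_n, gm_m, gm_n):
        weight = max(pm, pn)
        numerator += similarity(pm, pn, t1) * similarity(gm, gn, t2) * weight
        denominator += weight
    return numerator / denominator


def scoot_from_features(feat_m, feat_n):
    """Eq. (11): 1 / (1 + Euclidean distance between averaged features)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_m, feat_n)))
    return 1.0 / (1.0 + dist)
```

Both scores equal 1 when the two inputs are identical and decrease as the feature maps diverge, which is why higher values in Table 4 indicate sketches closer to the hand-drawn ground truth.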

Moreover, a user study was conducted to support the performance measurement of the different face sketch synthesis methods. The face sketches synthesized by the different methods in Fig. 9 were rearranged randomly, and the method names were hidden before the sketches were shown to the participants. Twenty subjects participated in the user study, all of whom had prior experience in image processing. The subjects were asked to evaluate the synthesized face sketches from various aspects, e.g. facial details, sketch style, etc., and to rate the synthesized face sketches of the compared methods from 1 to 7. The statistical results of the user study are shown in Fig. 12. From the figure, it can be seen that the proposed method achieves the highest mean opinion score of 6.7, which indicates the superiority of the proposed method.

Fig. 12. User study results on the synthesized face sketches of Fig. 9.

4.4. Face sketch synthesis on real-world photos

In the previous section, the test photos for the sketch synthesis experiments were captured under the same controlled environment as the training photos, with a simple background, uniform illumination, and normal expression. In real-world situations, however, the background, illumination, and expression may vary. To demonstrate the robustness of our approach in the real-world case, we conducted the face sketch synthesis experiment on photo images from the CelebA face dataset [55]. The test photo images are aligned according to the coordinates of the centers of the two eyes, and cropped to a size of 100 × 125. The lower image resolution, compared to the training data, makes the synthesis task more challenging.

The RSLCR method [13] and the SSL method [44], as representatives of exemplar-based and deep learning-based face sketch synthesis methods, respectively, are compared with our

Fig. 13. Face sketch synthesis results on real-world photos from the CelebA face dataset.


Fig. 14. Comparison results with state-of-the-art GAN-based methods. The first three rows are from the CUFS dataset and the last row is from the CUFSF dataset.

approach. Fig. 13 displays the comparison results, from which we can see that the sketches synthesized by the RSLCR method are blurred and over-smoothed. The main reason is that the large differences between the training photos and the test photos make the test photo patches hard to approximate with exemplar-based methods. In contrast, deep learning-based methods such as SSL and our method are little influenced by the discrepancy between the training photos and the test photos. From the visual comparisons, our results have better sketch style and more facial details, e.g. skin texture and the teeth area. The experiment on real-world photos indicates the robustness and generalization of our proposed method.

4.5. Comparison with state-of-the-art GAN-based methods

To further demonstrate the superiority of the proposed face sketch synthesis approach, more recent GAN-based methods were compared, including CGAN [36], CycleGAN [42], and CA-GAN [39]. The visual comparisons of the synthesized sketches are shown in Fig. 14. The CGAN method causes various degrees of deformation and noise in the synthesized face sketches. The results of the CycleGAN method have serious artifacts and thus look unnatural. The CA-GAN method captures the overall face structure well; however, some details are lost, especially for the photos from the CUFS dataset. By contrast, our results achieve the best performance in terms of style transfer and detail preservation.

4.6. Generalizability

To indicate the generalizability of the proposed method, we added experiments on non-frontal faces and on images without human faces; the synthesized results are shown in Fig. 15. In this figure, the first row displays the non-frontal face images and their corresponding sketch images, and the second row displays the images without human faces and their corresponding sketch images.

Fig. 15. Synthesized sketches on non-frontal faces and images without human faces.

Fig. 16. Face sketch recognition results by different methods on the (a) CUFS and (b) CUFSF datasets.

Table 5
Face sketch recognition accuracies (%) of VGGFace2 on the CUFS and CUFSF datasets.

           CUFS                          CUFSF
Methods    Rank-1   Rank-5   Rank-10    Rank-1   Rank-5   Rank-10
Photo      74.5     91.1     95.0       40.5     66.5     74.1
MWF        67.2     87.0     92.6       17.1     37.6     48.2
SSD        62.7     84.3     91.7       23.9     45.0     55.8
RSLCR      72.8     91.4     95.3       27.2     46.9     56.8
FCN        67.5     89.3     93.5       32.3     57.4     68.1
BPGAN      67.8     87.3     93.5       20.9     42.5     52.6
SSL        83.1     95.6     97.6       42.8     70.4     78.2
Proposed   83.4     96.2     97.9       46.5     70.6     78.4
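
Rank-n accuracies such as those in Table 5 can be computed from feature vectors by nearest-neighbor matching with the Euclidean distance. The following plain-Python sketch is an illustration under stated assumptions, not the authors' evaluation code: the features are assumed to be already extracted (for example by a face recognition network, which is not shown), probe i is assumed to correspond to gallery entry i, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_n_accuracy(probe_feats, gallery_feats, n):
    """Percentage of probes whose true match is among the n nearest
    gallery features.

    Assumes probe_feats[i] and gallery_feats[i] belong to the same identity.
    """
    hits = 0
    for i, probe in enumerate(probe_feats):
        # Gallery indices sorted from nearest to farthest for this probe.
        order = sorted(range(len(gallery_feats)),
                       key=lambda j: euclidean(probe, gallery_feats[j]))
        if i in order[:n]:
            hits += 1
    return 100.0 * hits / len(probe_feats)
```

By construction the accuracy is non-decreasing in n and reaches 100% once n equals the gallery size, which is why the Rank-10 columns are always at least as high as the Rank-1 columns.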

From Fig. 15, it can be observed that the proposed method achieves impressive results even in these challenging scenarios.

4.7. Face sketch recognition

Face sketch recognition between synthesized face sketches and original hand-drawn sketches is an alternative way to quantitatively assess the performance of face sketch synthesis approaches [?,13]. Higher face sketch recognition accuracy indicates better synthesized sketch quality and a more effective sketch generation method. To conduct the face sketch recognition experiments, VGGFace2 [56] was adopted to extract facial features from the sketch images, and the Euclidean distance was used to measure the feature similarity. In this work, the sketches synthesized by the different methods, or the face photos, were taken as probe images and matched against gallery images consisting of the corresponding hand-drawn sketches. For the CUFS dataset, 338 synthesized sketches or face photos were used as the probe set, and the corresponding ground-truth sketches drawn by the artist were taken as

the gallery set. For the CUFSF dataset, 944 synthesized sketches or face photos were taken as the probe set, and the corresponding hand-drawn sketch images were taken as the gallery set.

Fig. 16 shows the face sketch recognition results on the CUFS and the CUFSF datasets. In comparison, our method obtained the highest recognition accuracies on both datasets, of 100% and 94.17% on the CUFS and CUFSF datasets at Rank-50, respectively. Here, Rank-n represents the recognition accuracy of the top-n best matches. Table 5 displays the exact recognition accuracies at Rank-1, Rank-5, and Rank-10. The proposed method achieved the best values among the synthesis methods on all three indices. These recognition results demonstrate that our synthesized face sketch images, with more facial details and better sketch style, improve face sketch recognition performance. In addition, compared to direct face sketch recognition without the synthesis process, transferring face images from the photo domain to the sketch domain with the proposed method does improve the face sketch recognition performance.

4.8. Future scopes

With the development of deep learning technology, face sketch synthesis has achieved excellent performance in generating vivid face sketch images. However, there are still shortcomings that are not addressed very well in the related research topics: 1) The existing works mainly focus on frontal face transformation, and the results on non-frontal faces are still unsatisfactory. More attention and effort can be paid to this problem; 2) Semi-supervised and unsupervised face sketch synthesis methods will become a trend for handling the lack of training photo-sketch pair data; 3) Although the performance of face sketch recognition has been improved by transforming the face photo to the sketch domain, the recognition accuracy is not as high as that of normal face recognition. More factors, e.g. facial attributes and identity awareness, can be considered to further boost the accuracy of face sketch recognition.

5. Conclusions

To handle the loss of facial details in generated face sketches, we proposed a detail-preserving face photo-to-sketch synthesis approach based on GAN in this paper. Firstly, we modified the high-resolution network to gradually fuse the feature maps from different resolutions. In addition, we designed a loss function named detail loss to force the generated face sketch image to preserve more detailed information from the photo image. We utilized the style loss and total variation loss to further improve the performance of our approach. The experimental results on the CUFS, CUFSF, and CelebA face datasets indicate that the face sketch images synthesized by our approach not only have better facial detail information, but also lead to higher objective evaluation indices and face recognition accuracy, compared to the existing face sketch synthesis approaches. In future, we will explore face sketch synthesis with unpaired data, so that the vast number of photo images in normal face datasets can be used to train face sketch synthesis networks.

CRediT authorship contribution statement

Weiguo Wan: Methodology, Writing - original draft. Yong Yang: Investigation. Hyo Jong Lee: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study has been supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (GR2019R1D1A3A03103736), and by the National Natural Science Foundation of China (62072218), and by the Natural Science Foundation of Jiangxi Province (20192ACB20002 and 20192ACBL21008), and by the Talent project of Jiangxi Thousand Talents Program (jxsq2019201056), and by the Project of the Education Department of Jiangxi Province (GJJ200541), and by the Postdoctoral Research Projects of Jiangxi Province (2020KY44).

References

[1] M. Zhang, R. Wang, X. Gao, J. Li, D. Tao, Dual-transfer face sketch–photo synthesis, IEEE Trans. Image Process. 28 (2) (2019) 642–657.
[2] N. Wang, M. Zhu, J. Li, B. Song, Z. Li, Data-driven vs. model-driven: fast face sketch synthesis, Neurocomputing 257 (2017) 214–221.
[3] M. Zhang, J. Li, N. Wang, X. Gao, Compositional model-based sketch generator in facial entertainment, IEEE Trans. Cybern. 48 (3) (2018) 904–915.
[4] W. Wan, Y. Gao, H.J. Lee, Transfer deep feature learning for face sketch recognition, Neural Comput. Appl. 31 (12) (2019) 9175–9184.
[5] H. Cheraghi, H.J. Lee, Sp-net: A novel framework to identify composite sketch, IEEE Access 7 (2019) 131749–131757.
[6] Y. Jin, J. Lu, Q. Ruan, Coupled discriminative feature learning for heterogeneous face recognition, IEEE Trans. Inf. Forensics Secur. 10 (3) (2015) 640–652.
[7] J. Huo, Y. Gao, Y. Shi, W. Yang, H. Yin, Heterogeneous face recognition by margin-based cross-modality metric learning, IEEE Trans. Cybern. 48 (6) (2018) 1814–1826.
[8] M. Zhang, N. Wang, Y. Li, X. Gao, Neural probabilistic graphical model for face sketch synthesis, IEEE Trans. Neural Networks Learn. Syst. 31 (7) (2020) 2623–2637.
[9] M. Zhang, N. Wang, Y. Li, R. Wang, X. Gao, Face sketch synthesis from coarse to fine, in: Thirty-Second AAAI Conference on Artificial Intelligence, pp. 7558–7565.
[10] W. Wan, H.J. Lee, A joint training model for face sketch synthesis, Appl. Sci. 9 (9) (2019) 1731.
[11] L. Jiao, S. Zhang, L. Li, F. Liu, W. Ma, A modified convolutional neural network for face sketch synthesis, Pattern Recogn. 76 (2018) 125–136.
[12] J. Jiang, Y. Yu, Z. Wang, X. Liu, J. Ma, Graph-regularized locality-constrained joint dictionary and residual learning for face sketch synthesis, IEEE Trans. Image Process. 28 (2) (2018) 628–641.
[13] N. Wang, X. Gao, J. Li, Random sampling for fast face sketch synthesis, Pattern Recogn. 76 (2018) 215–227.
[14] M. Zhang, N. Wang, Y. Li, X. Gao, Deep latent low-rank representation for face sketch synthesis, IEEE Trans. Neural Networks Learn. Syst. 30 (10) (2019) 3109–3123.
[15] K. Sun, Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X. Wang, W. Liu, J. Wang, High-resolution representations for labeling pixels and regions, arXiv preprint arXiv:1904.04514.
[16] X. Tang, X. Wang, Face photo recognition using sketch, in: Proceedings. International Conference on Image Processing.
[17] Q. Liu, X. Tang, H. Jin, H. Lu, S. Ma, A nonlinear approach for face sketch synthesis and recognition, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05).
[18] X. Gao, J. Zhong, J. Li, C. Tian, Face sketch synthesis algorithm based on e-hmm and selective ensemble, IEEE Trans. Circ. Syst. Video Technol. 18 (4) (2008) 487–496.
[19] X. Wang, X. Tang, Face photo-sketch synthesis and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 31 (11) (2008) 1955–1967.
[20] H. Zhou, Z. Kuang, K.-Y.K. Wong, Markov weight fields for face sketch synthesis, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2012, pp. 1091–1097.
[21] Y. Song, L. Bao, Q. Yang, M.-H. Yang, Real-time exemplar-based face sketch synthesis, in: European Conference on Computer Vision, Springer, 2014, pp. 800–813.
[22] C. Peng, X. Gao, N. Wang, J. Li, Superpixel-based face sketch–photo synthesis, IEEE Trans. Circ. Syst. Video Technol. 27 (2) (2015) 288–299.
[23] S. Zhang, X. Gao, N. Wang, J. Li, Robust face sketch style synthesis, IEEE Trans. Image Process. 25 (1) (2015) 220–232.
[24] N. Wang, X. Gao, L. Sun, J. Li, Bayesian face sketch synthesis, IEEE Trans. Image Process. 26 (3) (2017) 1264–1274.


[25] M. Zhang, R. Wang, X. Gao, J. Li, D. Tao, Dual-transfer face sketch–photo synthesis, IEEE Trans. Image Process. 28 (2) (2018) 642–657.
[26] L. Zhang, L. Lin, X. Wu, S. Ding, L. Zhang, End-to-end photo-sketch generation via fully convolutional representation learning, in: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015, pp. 627–634.
[27] B. Sheng, P. Li, C. Gao, K.-L. Ma, Deep neural representation guided face sketch synthesis, IEEE Trans. Visualiz. Comput. Graph. 25 (12) (2018) 3216–3230.
[28] D. Zhang, L. Lin, T. Chen, X. Wu, W. Tan, E. Izquierdo, Content-adaptive sketch portrait generation by decompositional representation learning, IEEE Trans. Image Process. 26 (1) (2016) 328–339.
[29] J. Yu, S. Shi, F. Gao, D. Tao, Q. Huang, Composition-aided face photo-sketch synthesis, arXiv preprint arXiv:1712.00899.
[30] S. Zhang, R. Ji, J. Hu, Y. Gao, C.-W. Lin, Robust face sketch synthesis via generative adversarial fusion of priors and parametric sigmoid, IJCAI (2018) 1163–1169.
[31] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
[32] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
[33] Z. Xu, M. Wilber, C. Fang, A. Hertzmann, H. Jin, Learning from multi-domain artistic images for arbitrary style transfer, arXiv preprint arXiv:1805.09987.
[34] X. Yu, Y. Qu, M. Hong, Underwater-gan: Underwater image restoration via conditional generative adversarial network, in: International Conference on Pattern Recognition, Springer, 2018, pp. 66–75.
[35] W. Xian, P. Sangkloy, V. Agrawal, A. Raj, J. Lu, C. Fang, F. Yu, J. Hays, Texturegan: controlling deep image synthesis with texture patches, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8456–8465.
[36] P. Isola, J.-Y. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
[37] A. Odena, C. Olah, J. Shlens, Conditional image synthesis with auxiliary classifier gans, in: International Conference on Machine Learning, 2017, pp. 2642–2651.
[38] N. Wang, W. Zha, J. Li, X. Gao, Back projection: an effective postprocessing method for gan-based face sketch synthesis, Pattern Recogn. Lett. 107 (2018) 59–65.
[39] F. Gao, S. Shi, J. Yu, Q. Huang, Composition-aided sketch-realistic portrait generation, arXiv preprint arXiv:1712.00899.
[40] S. Zhang, R. Ji, J. Hu, X. Lu, X. Li, Face sketch synthesis by multidomain adversarial learning, IEEE Trans. Neural Networks Learn. Syst. 30 (5) (2018) 1419–1428.
[41] M. Zhu, N. Wang, X. Gao, J. Li, Z. Li, Face photo-sketch synthesis via knowledge transfer, IJCAI (2019) 1048–1054.
[42] J.-Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[43] L. Wang, V. Sindagi, V. Patel, High-quality facial photo-sketch synthesis using multi-adversarial networks, in: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), IEEE, 2018, pp. 83–90.
[44] C. Chen, W. Liu, X. Tan, K.-Y. K. Wong, Semi-supervised learning for face sketch synthesis in the wild, in: Asian Conference on Computer Vision, Springer, 2018, pp. 216–231.
[45] M. Zhu, J. Li, N. Wang, X. Gao, A deep collaborative framework for face photo–sketch synthesis, IEEE Trans. Neural Networks Learn. Syst. 30 (10) (2019) 3096–3108.
[46] K. Sun, B. Xiao, D. Liu, J. Wang, Deep high-resolution representation learning for human pose estimation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5693–5703.
[47] Z. Xu, X. Yang, X. Li, X. Sun, The effectiveness of instance normalization: a strong baseline for single image dehazing, arXiv preprint arXiv:1805.03305.
[48] C. Li, M. Wand, Combining markov random fields and convolutional neural networks for image synthesis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2479–2486.
[49] P. Kaur, H. Zhang, K. Dana, Photo-realistic facial texture transfer, in: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2019, pp. 2097–2105.
[50] H. Zhang, K. Dana, Multi-style generative network for real-time transfer, in: Proceedings of the European Conference on Computer Vision (ECCV), 2019, pp. 349–365.
[51] W. Zhang, X. Wang, X. Tang, Coupled information-theoretic encoding for face photo-sketch recognition, in: CVPR 2011, IEEE, 2011, pp. 513–520.
[52] D. P. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv preprint arXiv:1412.6980.
[53] L. Zhang, L. Zhang, X. Mou, D. Zhang, Fsim: a feature similarity index for image quality assessment, IEEE Trans. Image Process. 20 (8) (2011) 2378–2386.
[54] D.-P. Fan, S. Zhang, Y.-H. Wu, Y. Liu, M.-M. Cheng, B. Ren, P.L. Rosin, R. Ji, Scoot: a perceptual metric for facial sketches, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 5612–5622.
[55] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3730–3738.
[56] Q. Cao, L. Shen, W. Xie, O.M. Parkhi, A. Zisserman, Vggface2: a dataset for recognising faces across pose and age, in: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), IEEE, 2018, pp. 67–74.

Weiguo Wan received the B.S. degree in mathematics and applied mathematics from Jiangxi Normal University, Nanchang, China, in 2014 and Ph.D. degree in computer science and engineering from Jeonbuk National University, Jeonju, South Korea, in 2020. He is currently a Lecturer with the School of Software and Internet of Things Engineering, Jiangxi University of Finance and Economics, Nanchang, China. His current research interests include computer vision, deep learning, face sketch synthesis and recognition, and remote sensing image fusion.

Yong Yang (M’13–SM’16) received the Ph.D. degree from Xi’an Jiaotong University, Xi’an, China, in 2005. From 2009 to 2010, he was a Post-Doctoral Research Fellow with Chonbuk National University, Jeonju, South Korea. He is currently a Full Professor and the Vice Dean with the School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, China. His current research interests include image fusion and super resolution, image processing and analysis, and pattern recognition. He is a Senior Member of CCF. He received the title of Jiangxi Province Young Scientist in 2012 and was selected as the Jiangxi Province Thousand and Ten Thousand Talent in 2015.

Hyo Jong Lee (M’91) received the B.S., M.S., and Ph.D. degrees in computer science from The University of Utah, USA, where he was involved in computer graphics and parallel processing. He is currently a Professor with the Division of Computer Science and Engineering and the Director of the Center for Advanced Image and Information Technology, Chonbuk National University, Jeonju, South Korea. His research interests include image processing, medical imaging, parallel algorithms, deep learning, and brain science.
