Application of deep learning in image recognition
Lingyun Li
Rose-Hulman Institute of Technology, Computer Science
5500 Wabash Ave, Terre Haute, IN 47803
[email protected]
Abstract. Deep learning is a relatively new field of machine learning that uses neural network
models to extract features by simulating human brain cognition. Image recognition is an
essential subject in deep learning and has made significant progress in recent years. During
training, the computer does the work, and no human needs to take part in the process to obtain
good image recognition results. This paper analyzes the application of deep learning in image
recognition, discusses the necessary steps of image recognition and the basic principle of the
neural network, and presents some problems and popular deep learning models used in image
recognition.
1. Introduction
With the development of imaging technology, every moment of our lives contains a great deal of image
information, but the speed at which the human visual system processes images is very limited and
falls far short of the rate at which images are produced. People therefore began to use computers to
identify images more accurately and to obtain image data that the human eye cannot see. In image
classification, recognition, and related problems, deep learning methods can simulate how the human
brain responds to images and analyze the image data more thoroughly. Deep learning performs very
well in large-scale image processing and has been widely used in many fields. Compared with
other neural network structures, the convolutional neural network requires relatively few parameters,
which enables it to be widely used. This article introduces deep learning techniques in the field of
image recognition. The second part introduces the principle and development of deep learning. The
third part introduces the three steps of image recognition: preprocessing, feature extraction,
and classification and recognition. The fourth part introduces a standard deep learning model for
image recognition. Finally, two neural networks that have achieved good rankings in the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) are introduced.
2.1. Principle
Deep learning is a new field extended from machine learning. It aims to build neural networks that
can analyze and learn from data like the human brain, and it comprises a series of new algorithms
made practical by improvements in hardware computational power.[1] Deep learning is a form of
data-based learning. The learning process is generally understood as follows: the computer runs over
the data iteratively and updates the parameters between the layers of the deep learning network to
make the training result
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
CISAI 2020 IOP Publishing
Journal of Physics: Conference Series 1693 (2020) 012128 doi:10.1088/1742-6596/1693/1/012128
closer to the real value. The ultimate goal of deep learning is to learn the internal rules or trends of
the sample data and to extract information such as characters and images, so that machines can
analyze and learn like the human brain and can recognize characters, images, and other information.
In recent years, researchers in many countries have made breakthroughs in deep learning, especially
in its application to image recognition, where it can effectively solve many difficulties in
recognizing intricate patterns. Deep learning has thus effectively promoted the rapid development of
image recognition technology.
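The iterative parameter update described above can be illustrated with a minimal sketch. It assumes
a one-parameter linear model y = w * x, a mean-squared-error loss, and plain gradient descent; the
function name fit_weight, the learning rate, and the toy data are illustrative choices and are not
taken from this paper.

```python
def fit_weight(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # initial parameter value (assumed)
    n = len(xs)
    for _ in range(epochs):
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # move the parameter against the gradient
    return w


# Toy data generated from y = 3x; the fitted weight should approach 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = fit_weight(xs, ys)
print(round(w, 3))
```

Each pass moves w a little closer to the value that reproduces the targets, which is the sense in
which training makes the result "closer to the real value". A deep network repeats the same idea
for many parameters across many layers.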
2.2. Development
The development of machine learning is divided into two periods: the shallow learning period and the
deep learning period. At present, most classification, regression, and other learning algorithms
have a shallow structure. A shallow structure usually contains no more than two nonlinear feature
layers; typical shallow structures include logistic regression (LR) and the support vector machine
(SVM). SVM is one of the most successful classification models.
recognition. The image recognition framework is usually divided into three stages: preprocessing,
feature extraction, and classification and recognition.
3.1. Preprocessing
In image recognition, preprocessing is the first step. Before the algorithm is applied, the input
images are processed to obtain inputs that are clearer or better suited to the algorithm.
In preprocessing, the image data is read first. The data is stored in binary form, as 0s and 1s.
A color image is usually stored in a computer as a three-dimensional matrix whose dimensions are the
width, the height, and the color channel of the image. This three-dimensional matrix can be divided
into three two-dimensional matrices, namely R, G and B. The elements of each matrix represent the
brightness of R, G or B at the corresponding position of the image and range from 0 to 255. One
preprocessing method for color images is normalization. To some extent, this method can be
understood as rescaling the pixel values from the range 0-255 to the range 0-1, which narrows the
spread of the values and eases subsequent computation.
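The channel split and the normalization step can be sketched as follows. This is a minimal
illustration in plain Python, assuming the image is held as a nested list of (R, G, B) triples with
8-bit values; the function names split_channels and normalize and the 2 x 2 sample image are
hypothetical.

```python
def split_channels(image):
    """Split an H x W x 3 nested list into three H x W matrices (R, G, B)."""
    return tuple(
        [[pixel[c] for pixel in row] for row in image] for c in range(3)
    )


def normalize(channel):
    """Rescale 8-bit intensities from [0, 255] to [0.0, 1.0]."""
    return [[value / 255.0 for value in row] for row in channel]


# A hypothetical 2 x 2 color image; each pixel is an (R, G, B) triple.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (51, 102, 204)],
]
red, green, blue = split_channels(image)
print(normalize(red))  # red-channel values scaled to [0, 1]
```

In practice an array library would replace the nested lists, but the operation is the same: every
element of each channel matrix is divided by 255.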
parameters of the classifier. With the optimal parameters, the machine can classify the input data into a
particular category accurately.
4.3. Pooling
The pooling (subsampling) layer follows the convolution layer and aims to simplify the output of the
preceding feature maps. For example, each neuron in the pooling layer might be the sum of the
neurons in a 2 x 2 region of the convolution layer. Another commonly used pooling method, L2
pooling, takes the square root of the sum of squares of all neurons in the 2 x 2 region of the
convolutional layer as the output. Although the methods differ mathematically, they all aim to
simplify the output information from the convolution layer.
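The two pooling rules described above can be sketched in a few lines. The example assumes
non-overlapping 2 x 2 regions (stride 2) on a single feature map stored as a nested list; the
helper names pool2x2 and l2_norm and the sample feature map are illustrative only.

```python
import math


def pool2x2(feature_map, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2 x 2 block of a 2-D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            block = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            pooled_row.append(reduce_fn(block))
        pooled.append(pooled_row)
    return pooled


def l2_norm(block):
    """Square root of the sum of squares of the block's values."""
    return math.sqrt(sum(v * v for v in block))


feature_map = [
    [1, 2, 0, 0],
    [2, 4, 3, 4],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(pool2x2(feature_map, sum))      # summation pooling
print(pool2x2(feature_map, l2_norm))  # L2 pooling
```

Either rule turns the 4 x 4 map into a 2 x 2 map, so the layer that follows receives a quarter of
the values. Passing the built-in max as reduce_fn would give max pooling with the same helper.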
5.1. AlexNet
The most significant breakthrough of deep learning in the ILSVRC challenge occurred in 2012, when
Hinton's research team used AlexNet, a new variant of the convolutional network, to win the
ImageNet image classification task. Hinton's team used AlexNet to reduce the top-5 error rate to
15.315 percent, far below the second-place error rate. [7] Below is the architecture of AlexNet.
5.2. VGG
One improvement of VGG16 over AlexNet is that it replaces the larger convolution kernels in AlexNet
(11x11, 5x5) with successive 3x3 convolution kernels. Several stacked layers with the smaller 3x3
kernel take the place of one layer with a larger kernel: two stacked 3x3 layers cover the same 5x5
receptive field, and three cover 7x7. On the one hand, this reduces the number of parameters and
therefore the overall computation workload; on the other hand, the additional layers provide more
nonlinear mappings.
A small convolution kernel is a vital characteristic of VGG. Although VGG imitates the network
structure of AlexNet, it does not adopt the original convolution kernel sizes of AlexNet. Instead, it
reduces the size of the convolution kernel and increases the number of convolution layers to achieve
the same performance.
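The parameter saving from small kernels can be checked with a short calculation. The sketch below
assumes stride-1 convolutions, equal input and output channel counts, and no bias terms; the
function names and the channel count of 64 are illustrative assumptions rather than the exact VGG
configuration.

```python
def conv_weights(kernel_size, channels, layers=1):
    """Weights in `layers` stacked square convolutions (biases ignored)."""
    return layers * kernel_size * kernel_size * channels * channels


def receptive_field(kernel_size, layers):
    """Receptive field of stacked stride-1 convolutions of equal kernel size."""
    return 1 + layers * (kernel_size - 1)


channels = 64  # assumed equal input and output channel count
# Two 3x3 layers see a 5x5 window; three see a 7x7 window.
for layers, big_kernel in [(2, 5), (3, 7)]:
    assert receptive_field(3, layers) == big_kernel
    stacked = conv_weights(3, channels, layers)
    single = conv_weights(big_kernel, channels)
    print(f"{layers} x 3x3: {stacked} weights; "
          f"1 x {big_kernel}x{big_kernel}: {single} weights")
```

Under these assumptions the stacked 3x3 layers need 18 C^2 or 27 C^2 weights where a single 5x5 or
7x7 layer needs 25 C^2 or 49 C^2, and each extra layer also adds one more nonlinearity.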
6. Conclusion
Deep learning mainly refers to learning the internal rules and levels of representation of sample
data. Its ultimate goal is to enable machines to analyze and learn like humans and to recognize
data such as images and language accurately. Deep learning is also a relatively complex class of
machine learning algorithms. This paper has therefore discussed the application of deep learning
in image recognition and has mainly analyzed the convolutional network, which is often used in image
recognition, with the ultimate purpose of promoting the rapid development of the field of image
recognition.
Acknowledgement
This research work was based on the support of 2020 Summer ATEE-Plan Online Research Program
and the guidance of Prof. Yiyun Shi.
References
[1] Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Deep learning, Review Insight.
[2] Vikramaditya Jakkula, Tutorial on Support Vector Machine (SVM), School of EECS,
Washington State University, Pullman 99164.
[3] William S Noble, What is a support vector machine? 2006 Nature Publishing Group.
[4] Kim Esbensen, Paul Geladi, Principal Component Analysis, Research Group for Chemometrics,
Institute of Chemistry, Umeå University, S-901 87 Umeå (Sweden).
[5] Muhammad Imran Razzak, Saeeda Naz and Ahmad Zaib, Deep Learning for Medical Image
Processing: Overview, Challenges and Future.
[6] Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, Andrew D. Back, Face Recognition: A
Convolutional Neural Network Approach.
[7] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep
Convolutional Neural Networks.