
Brief introduction of MobileNetV1 / V2 / V3 lightweight networks

developpaper.com/brief-introduction-of-mobilenetv1-v2-v3-lightweight-network

July 31, 2020

The MobileNet series, from Google, is a very important family of lightweight networks.
MobileNetV1 uses depthwise separable convolutions to build a lightweight network.
MobileNetV2 proposes the innovative inverted residual with linear bottleneck unit;
although it has more layers, both the accuracy and the speed of the overall network are
improved. MobileNetV3 uses AutoML techniques and manual fine-tuning to build an even
more lightweight network.

  Source: Xiaofei's Algorithm Engineering Notes (WeChat official account)

MobileNetV1

Paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Paper address: http://arxiv.org/pdf/1704.04861.pdf

Introduction

MobileNet constructs a very lightweight, low-latency model based on depthwise separable
convolutions, and the model size can be further controlled through two hyperparameters.
The model can be deployed on mobile and embedded devices, which gives it great practical
significance.

Depthwise Separable Convolution

Suppose the input of a standard convolution is a $D_F \times D_F \times M$ feature map
$\mathbb{F}$ and the output is a $D_F \times D_F \times N$ feature map $\mathbb{G}$, with
a convolution kernel $\mathbb{K}$ of size $D_K \times D_K \times M \times N$. The output
feature map is then computed as:

$$\mathbb{G}_{k,l,n} = \sum_{i,j,m} \mathbb{K}_{i,j,m,n} \cdot \mathbb{F}_{k+i-1,\,l+j-1,\,m}$$

The computation cost is:

$$D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$$

The cost depends multiplicatively on the input dimension $M$, the output dimension $N$,
the kernel size $D_K$ and the feature map size $D_F$.

MobileNet reduces this cost through the depthwise separable convolution, which factorizes
the standard convolution into a depthwise convolution and a $1 \times 1$ pointwise
convolution, each followed by BN and ReLU. In the depthwise convolution, each input
channel has its own convolution kernel. For the same input, the output feature map of
the depthwise convolution is computed as:

$$\hat{\mathbb{G}}_{k,l,m} = \sum_{i,j} \hat{\mathbb{K}}_{i,j,m} \cdot \mathbb{F}_{k+i-1,\,l+j-1,\,m}$$

Here $\hat{\mathbb{K}}$ is the $D_K \times D_K \times M$ depthwise convolution kernel;
the $m_{th}$ filter of $\hat{\mathbb{K}}$ is applied to the $m_{th}$ channel of the input
$\mathbb{F}$ to produce the $m_{th}$ channel of the output $\hat{\mathbb{G}}$. The
computation cost of the depthwise convolution is:

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F$$

Although the depthwise convolution is efficient, it does not combine information across
input channels, so an additional layer is needed to linearly combine its outputs: a
$1 \times 1$ pointwise convolution generates the new feature map. Together the two form
the depthwise separable convolution, whose computation cost is:

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$$

The ratio of the depthwise separable cost to the standard convolution cost is:

$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$

MobileNet uses $3 \times 3$ depthwise separable convolutions, so the computation cost is
reduced by 8-9 times compared with standard convolutions, with only a slight drop in
accuracy.
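As a sanity check on the formulas above, a small helper (illustrative only, not from the paper's code) can compare the two costs for a typical MobileNetV1 layer:

```python
# FLOP estimates for standard vs. depthwise separable convolution.
# D_F: feature map side, D_K: kernel side, M: input channels, N: output channels.

def standard_conv_flops(d_f, d_k, m, n):
    # D_K * D_K * M * N * D_F * D_F
    return d_k * d_k * m * n * d_f * d_f

def separable_conv_flops(d_f, d_k, m, n):
    depthwise = d_k * d_k * m * d_f * d_f  # one D_K x D_K filter per input channel
    pointwise = m * n * d_f * d_f          # 1x1 convolution combining channels
    return depthwise + pointwise

# A typical layer: 14x14 feature map, 3x3 kernel, 512 -> 512 channels.
std = standard_conv_flops(14, 3, 512, 512)
sep = separable_conv_flops(14, 3, 512, 512)
print(std / sep)  # about 8.8, i.e. the inverse of 1/N + 1/D_K^2
```

With $N = 512$ and $D_K = 3$ the ratio is $1/(1/512 + 1/9) \approx 8.8$, which is where the "8-9 times" figure comes from.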

Network Structure and Training

The structure of MobileNet is shown in Table 1. Except for the first layer, all layers are
depthwise separable convolutions, and every layer except the final fully connected layer
is followed by BN and ReLU, for a total of 28 layers.

The paper notes that the efficiency of a network cannot be judged by the computation cost
alone; it also depends on how the operations are implemented. As Table 2 shows, most of
MobileNet's computation and parameters lie in the pointwise convolutions, which have
efficient implementations on both CPU and GPU devices. The paper also describes the
training settings in some detail; interested readers can consult the original text.

Width Multiplier: Thinner Models

Although MobileNet is already very lightweight, the width multiplier $\alpha$ can shrink
it further. The input and output dimensions of each layer become $\alpha M$ and
$\alpha N$, and the scaled computation cost is:

$$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$$

For $\alpha \in (0,1]$, the computation cost scales roughly by $\alpha^2$, letting users
trade off accuracy against speed for the task at hand.

Resolution Multiplier: Reduced Representation

MobileNet can also scale the model through the resolution multiplier $\rho$. Combined with
the width multiplier $\alpha$, the scaled computation cost is:

$$D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$$

For $\rho \in (0,1]$, the computation cost scales by roughly $\rho^2$.
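The effect of both multipliers can be seen directly in a small cost calculator (helper names are ours, not the paper's):

```python
# Depthwise separable FLOPs with width multiplier alpha and resolution multiplier rho.

def separable_flops(d_f, d_k, m, n, alpha=1.0, rho=1.0):
    m, n = int(alpha * m), int(alpha * n)  # thin every layer by alpha
    d_f = int(rho * d_f)                   # shrink the feature map by rho
    return d_k * d_k * m * d_f * d_f + m * n * d_f * d_f

base       = separable_flops(14, 3, 512, 512)
half_width = separable_flops(14, 3, 512, 512, alpha=0.5)
half_res   = separable_flops(14, 3, 512, 512, rho=0.5)
print(half_width / base)  # close to 0.25 = alpha^2 (the pointwise term dominates)
print(half_res / base)    # exactly 0.25 = rho^2
```

Resolution scaling gives exactly $\rho^2$ because both terms contain $D_F^2$, while width scaling is only approximately $\alpha^2$ because the depthwise term scales linearly in $\alpha$.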

The paper also compares the effects of the depthwise separable convolution, the width
multiplier and the resolution multiplier.

Experiments
The experiments in the MobileNet paper are very thorough, with performance comparisons on
a variety of tasks; only part of the results are listed here, and the details can be
found in the original text. Unfortunately, there are no inference-time comparisons.

The full-convolution version of MobileNet is compared against the depthwise separable
version.

The effect of width scaling is compared against directly removing the last five
$14 \times 14 \times 512$ depthwise separable layers.

The effect of different width multipliers is compared.

The effect of different resolution multipliers is compared.

Conclusion

MobileNet uses depthwise separable convolutions to construct a lightweight network,
reducing the parameter count and computation cost by roughly 8 times without a
significant drop in accuracy, which is of great practical significance.

MobileNetV2

Paper: MobileNetV2: Inverted Residuals and Linear Bottlenecks

Paper address: http://arxiv.org/pdf/1801.04381.pdf

Introduction

MobileNetV2 proposes a new layer unit, the inverted residual with linear bottleneck. The
structure resembles a residual network unit and contains a shortcut, but its input and
output dimensions are small: internally, a pointwise convolution first expands the
dimension, a depthwise convolution then extracts features, and a linear mapping finally
reduces the dimension back down. This preserves network performance while making the
network lighter.

Linear Bottlenecks

The key information in the high-dimensional features of a neural network is distributed
sparsely and can be represented by compact low-dimensional features, so in theory the
dimension of the operating space can be reduced by reducing the dimension of the layer
outputs. However, nonlinear activations in the layer can break this theory, so the
nonlinear operation on the low-dimensional features is removed.

From the properties of ReLU, wherever the output is non-zero the layer acts as a linear
transformation of the input space: part of the input space undergoes a linear change, and
the network only processes these non-zero outputs. Since the key information of a feature
is usually non-zero after ReLU, ReLU can be regarded as a linear operation on the key
(low-dimensional) information.

In the paper, a two-dimensional input is linearly mapped up to $d$ dimensions by a matrix
$T$, passed through a ReLU nonlinearity, and then mapped back to two dimensions by
$T^{-1}$. The visualization shows that the lower the dimension, the more information ReLU
loses. This suggests that if the input features of a nonlinear operation are to be
compressible into lower-dimensional features, the dimension of the input features must be
large enough to preserve the complete information through the nonlinearity.

Assuming the key information output by a layer can be represented by low-dimensional
features, a linear bottleneck can be used to extract it, as shown in Fig. 2c: a pointwise
convolution after the depthwise convolution reduces the dimension, but no nonlinear
activation is used after the reduction; only the high-dimensional features are activated
nonlinearly. Fig. 2d is structure c shifted by one stage; the two together form a
complete MobileNetV2 bottleneck: first a pointwise convolution increases the dimension,
then a depthwise convolution extracts features, and finally a pointwise convolution
reduces the dimension. The ratio of the dimension increase is called the expansion ratio.
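The expand-depthwise-project layout can be sized with a back-of-the-envelope parameter count (a sketch of ours that ignores BN parameters; not the official implementation):

```python
# Parameter count of one inverted residual block:
# 1x1 expansion -> 3x3 depthwise -> 1x1 linear projection.

def inverted_residual_params(c_in, c_out, expansion=6, k=3):
    hidden = c_in * expansion
    expand = c_in * hidden      # 1x1 pointwise, raises the dimension
    depthwise = k * k * hidden  # one k x k filter per hidden channel
    project = hidden * c_out    # 1x1 pointwise, linear bottleneck (no ReLU after)
    return expand + depthwise + project

# e.g. a stride-1 block with 24 input/output channels and expansion ratio 6
print(inverted_residual_params(24, 24))
```

Note that almost all parameters sit in the two pointwise convolutions; the depthwise layer is nearly free, which is why expanding 6x inside the block stays affordable.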

Inverted residuals

MobileNetV2's residual block is similar to ResNet's; the point is better gradient
back-propagation and feature reuse. The difference is that MobileNetV2's shortcut
connects the bottleneck features, i.e. the features with the smaller dimension. As
described above, the low-dimensional features already contain all the necessary
information, while the expansion layer is merely a means of realizing the nonlinear
transformation.

The operations and input/output sizes of the residual block are shown in Table 1.
Although there is one more pointwise convolution than in MobileNetV1, this structure
allows much smaller input and output dimensions, and the comparison in Table 3 shows that
MobileNetV2 uses less memory. The expansion ratio also admits several structural
variants: a ratio of 1 makes the expansion an identity mapping, while a ratio smaller
than 1 turns the unit into a classical ResNet-style residual block.

Model Architecture

The MobileNetV2 unit comes in two types: stride = 1 and stride = 2.

The overall structure of MobileNetV2 is shown in Table 2. It is built by stacking the
structure of Fig. 4d, with an ordinary convolution as the first layer. In addition, the
width multiplier and resolution multiplier can be used to trade off accuracy against
latency.

Experiments

The paper compares the performance of MobileNetV2 with other networks on image
classification.

The paper compares the performance of MobileNetV2 with other networks on object
detection.

The paper compares the performance of MobileNetV2 with other networks on semantic
segmentation.

In addition, the paper verifies the improvement brought by the inverted residual with
linear bottleneck.

Conclusions

MobileNetV2 builds its lightweight network on the inverted residual with linear
bottleneck. The overall structure, including the inverted residual and the expansion
layer, is quite innovative, and the analysis of linear bottlenecks is also very
enlightening. To this day, many on-device algorithms still use MobileNetV2 as their
backbone network.

MobileNetV3

Paper: Searching for MobileNetV3

Paper address: http://arxiv.org/pdf/1905.02244.pdf

Introduction

MobileNetV3 is built with AutoML and then optimized by manual fine-tuning. Platform-aware
NAS and NetAdapt are used for global search and local (layer-wise) search respectively,
while manual tuning adjusts the structure of the network's first and last layers, adds an
SE module to the bottleneck, and proposes the computationally efficient h-swish nonlinear
activation.

Network Search

MobileNetV3 first uses MnasNet's platform-aware NAS to search the structure of each
block, largely following MnasNet's settings. Platform-aware NAS uses the weighted product
of accuracy and measured latency, $ACC(m) \times [LAT(m)/TAR]^w$, as the optimization
objective to approach Pareto optimality (accuracy and latency cannot both be improved at
the same time). In practice, it was found that for small models accuracy changes much
more sharply with latency, so $w = -0.07$ was changed to $w = -0.15$ to increase the
penalty on latency increases.
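The multi-objective reward above is simple enough to write out directly; the accuracy and latency values below are made up for illustration:

```python
# Platform-aware NAS objective: ACC(m) * [LAT(m) / TAR] ** w.
# TAR is the target latency; w < 0 penalizes models slower than the target.

def nas_objective(acc, lat, tar, w=-0.07):
    return acc * (lat / tar) ** w

# At exactly the target latency, the reward equals the raw accuracy.
print(nas_objective(0.75, lat=80.0, tar=80.0))  # 0.75
# A more negative w (the small-model setting) penalizes slow models harder.
print(nas_objective(0.75, lat=120.0, tar=80.0, w=-0.07))
print(nas_objective(0.75, lat=120.0, tar=80.0, w=-0.15))
```

Since `lat / tar > 1` for a slow model and `w` is negative, the reward drops below the raw accuracy, and dropping `w` from -0.07 to -0.15 makes that drop steeper.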

  After the preliminary network search, the paper uses NetAdapt to adjust the network
layer by layer, complementing MnasNet's search method. The specific steps of NetAdapt are
as follows:

1. Start from the seed network found by the MnasNet-style search.
2. Generate a new set of proposals, each representing a modification of the seed network
that must reduce latency by at least $\delta = 0.01$ times the previous latency.
3. For each proposal, initialize the parameters from the trained model of the previous
step, randomly initialize any missing parameters, and finetune for $T = 10000$ steps to
get an approximate accuracy.
4. Select the best proposal according to the chosen metric.
5. Repeat steps 2-4 until the target latency is reached.

The original NetAdapt uses latency as the metric in step 4; the paper changes it to the
ratio of accuracy change to latency change, $\frac{\Delta Acc}{|\Delta Latency|}$, which
achieves a better trade-off. The proposals must still satisfy the latency constraint of
step 2. In addition to NetAdapt's original modification of convolution kernels, the
proposals include the following two types:

Reduce the size of any expansion layer


Reduce the size of all bottlenecks of the same size
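The selection rule in step 4 can be sketched as a toy loop body; the proposal names and numbers below are invented for illustration:

```python
# NetAdapt's modified step 4: among proposals that each reduce latency,
# pick the one maximizing delta_acc / |delta_latency|.

def best_proposal(proposals):
    # each proposal: (name, delta_acc, delta_latency) with delta_latency < 0
    return max(proposals, key=lambda p: p[1] / abs(p[2]))

proposals = [
    ("shrink expansion layer 7",        -0.002, -3.0),
    ("shrink bottlenecks of size 80",   -0.001, -1.0),
    ("shrink expansion layer 12",       -0.004, -5.0),
]
print(best_proposal(proposals)[0])  # shrink expansion layer 7
```

All three toy proposals lose some accuracy, but the first loses the least accuracy per millisecond of latency saved, so the ratio metric selects it.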

Redesigning Expensive Layers

After obtaining the search results, the paper finds that the first and last layers of the
network are relatively expensive, so these layers are modified specifically.

  The modification of the last few layers is shown in Fig. 5: the average pooling is
moved earlier, so that the subsequent projections to high dimensions operate on a
$1 \times 1$ feature map instead of a $7 \times 7$ one, saving a lot of time. Since
moving the average pooling forward already saves considerable computation, the depthwise
and pointwise convolutions of the previous bottleneck (the step that produced the
320-dimensional features between the 160-dimensional bottleneck and the 1280-dimensional
layer in order to reduce computation) are no longer needed and are removed outright,
saving further computation. This improvement brings a speedup of 7 milliseconds (11%).

  For the first layers, typical networks use a 32-filter $3 \times 3$ convolution. The
paper argues these filters are redundant; experiments show that reducing them to 16
filters does not affect accuracy and brings a 2-millisecond speedup. The h-swish
nonlinearity proposed in the paper is used for the activation, performing no worse than
the other functions.

Nonlinearities

Swish, a substitute for ReLU, can significantly improve accuracy. It is defined as:

$$\text{swish}(x) = x \cdot \sigma(x)$$

Because swish contains the sigmoid function, it is not well optimized on mobile devices,
so the sigmoid is replaced with the piecewise-linear approximation
$\frac{\text{ReLU6}(x+3)}{6}$, giving:

$$\text{h-swish}(x) = x \, \frac{\text{ReLU6}(x+3)}{6}$$
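Both activations are easy to write in pure Python, which makes the approximation concrete:

```python
# swish and its piecewise-linear approximation h-swish, as defined above.
import math

def swish(x):
    return x / (1.0 + math.exp(-x))  # x * sigmoid(x)

def h_swish(x):
    relu6 = min(max(x + 3.0, 0.0), 6.0)  # ReLU6(x + 3)
    return x * relu6 / 6.0

# h-swish is exactly 0 for x <= -3 and exactly x for x >= 3;
# in between it tracks swish closely.
print(h_swish(-4.0), h_swish(4.0))
print(swish(1.0), h_swish(1.0))
```

Since `min`, `max` and a multiply are all cheap on mobile hardware, h-swish avoids the `exp` in the sigmoid while staying close to swish everywhere.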

The visualization in Fig. 6 shows that the curves of swish and h-swish are very close.
The deeper the network, the cheaper the nonlinear operations become (the feature map size
keeps halving), so h-swish is only used in the second half of the network.

Large squeeze-and-excite

MobileNetV3's bottleneck adds an SE module to the V2 bottleneck, with the SE ratio fixed
at 0.25. The paper mentions that this differs from MnasNet, where the ratio is fixed at
1/4 of the expansion layer; to me, however, there seems to be no difference — please let
me know if you see one.

MobileNetV3 Definitions

MobileNetV3 comes in two versions: MobileNetV3-Large and MobileNetV3-Small.

Experiments

The experiments in the paper are very thorough; only the main results for some tasks are
shown here, and the rest can be found in the original text.

The paper compares the performance of MobileNetV3 with other networks on image
classification.

The paper compares the performance of MobileNetV3 with other networks on object
detection.

Conclusion
MobileNetV3 first uses AutoML to obtain an optimal network structure and then reaches its
final accuracy through partial manual modifications. Although the network is not obtained
purely through search, the experimental results hold up, and its improvements are well
worth studying and borrowing from.

Conclusion

The MobileNet family is a very important family of lightweight networks. MobileNetV1 uses
depthwise separable convolutions to construct a lightweight network. MobileNetV2 proposes
the innovative inverted residual with linear bottleneck unit; although it has more
layers, both the accuracy and the speed of the overall network are improved. MobileNetV3
uses AutoML techniques and manual fine-tuning to build an even more lightweight network.

   

If this article helped you, please give it a like or a read. For more content, please
follow the WeChat official account.
