Go Wider: An Efficient Neural Network for Point
Cloud Analysis via Group Convolutions
Can Chen, Luca Zanotti Fragonara * and Antonios Tsourdos
School of Aerospace, Transport and Manufacturing, Cranfield University, Cranfield MK43 0AL, UK;
[email protected] (C.C.); [email protected] (A.T.)
* Correspondence: [email protected]
Received: 13 February 2020; Accepted: 26 March 2020; Published: 1 April 2020
Abstract: In order to achieve better performance in point cloud analysis, many researchers
apply deep neural networks using stacked Multi-Layer-Perceptron (MLP) convolutions over an
irregular point cloud. However, applying these dense MLP convolutions over a large number
of points (e.g., in autonomous driving applications) leads to limitations due to the computation and
memory capabilities. To achieve higher performance while decreasing the computational complexity,
we propose a deep-wide neural network, named ShufflePointNet, which exploits fine-grained
local features and also reduces redundancies using group convolution and a channel shuffle operation.
Unlike conventional operations that directly apply MLPs to the high-dimensional features of a point
cloud, our model goes “wider” by splitting the features into groups of smaller depth in advance,
with each MLP computation applied only to a single group, which significantly
reduces complexity and computation. At the same time, we allow communication between groups by
shuffling the feature channels to capture fine-grained features. We further discuss how the multi-branch
method for wider neural networks is also beneficial to feature extraction for point clouds.
We present extensive experiments for the shape classification task on the ModelNet40 dataset and for the semantic
segmentation task on the large-scale ShapeNet part, S3DIS and KITTI datasets. Finally, we carry out an
ablation study and compare our model to other state-of-the-art algorithms to show its efficiency in
terms of complexity and accuracy.
Keywords: point cloud; semantic segmentation; shape classification; feature shuffle; neural network
1. Introduction
Point cloud processing is an increasingly important task for a wide range of applications such as
environment perception for autonomous driving [1–4], virtual reconstruction [5], and augmented reality
(AR) [6]. It belongs to a branch of deep learning that is typically referred to as geometric deep learning [7].
However, it is still challenging to analyse and extract useful features of the underlying shape
representation efficiently, because of the large number of points and their unstructured distribution.
On the other hand, point cloud data can provide very accurate geometric information about a
three-dimensional object.
Considering the remarkable success and advantages of Convolutional Neural Networks (CNNs),
various authors first tried to apply them to regular grid structures obtained by voxelizing an unordered
point cloud, as CNNs only work on regular grid data structures [8,9]. However, this intuitive way of
applying deep learning to point clouds leads to high memory and computation requirements due to
the naturally sparse and irregular structure of the data. PointNet [10] treats the point cloud directly as a set
of unordered points, leveraging a Multi-Layer-Perceptron (MLP) network and a symmetric function
(e.g., max pooling) to exploit global features and make the output invariant to point permutations.
The main drawback is that local information is not used in the stacked MLP layers. In order to solve
this issue, PointNet++ [11] proposes a hierarchical neural network joining PointNet with sampling
and grouping layers to capture local representations within the data. Similarly, DGCNN [12] extracts
local features by introducing an edge convolution operation on points and on the edges connecting each
point to its corresponding neighbours. Chen et al. [13] propose an attention-aware neural network to
learn local features by assigning different attention coefficients to neighbouring points. Another
CNN-like approach is PointCNN [14], which manages to transform an unordered point cloud into a latent
canonical order by using a χ-convolutional operator. RS-CNN [15] and ConvPoint [16] attempt to learn
irregular CNN-like filters to capture local features for point clouds. Our model applies feature shuffling
operations to the PointNet++ [11] backbone, achieving lower model computation while retaining high accuracy.
As can be seen in Figure 1, it achieves the best trade-off between accuracy and model complexity.
[Figure 1: three scatter plots of FLOPs (M), forward time (ms) and parameters (M) versus overall accuracy (%), comparing PointNet, PointNet++, DGCNN, PointCNN and ShufflePointNet.]
Figure 1. Measurement of accuracy, parameters, floating point operations (FLOPs) and forward
time. Our proposed ShufflePointNet (magenta diamond marker) tends to stay in the
bottom-right corner of each plot, meaning that our model combines high accuracy with low model complexity.
We notice that the modern neural networks proposed for geometric deep learning [10–13] heavily
rely on dense MLPs used to build repeated block structures with different numbers of filters.
On the other hand, the elimination of redundancies within these dense MLP operations is rarely
addressed for point cloud analysis, although it is a relatively common topic in the computer vision
domain [17–21]. Within the point cloud processing domain, many applications (e.g., autonomous
driving) normally have to process large-scale point clouds, which makes the model complexity increase
significantly. Besides this, the MLP operation is not a spatial convolution, which limits the local
feature extraction capabilities of the overall model. As a result, applying a large number of deep MLP
convolutions easily leads to overfitting. However, the MLP operation is efficient in dealing with
unstructured data (e.g., point clouds, social networks), as regular CNNs are only suited to data
having a standard grid structure. On the other hand, these dense and deep 1 × 1 convolutions are not
efficient in terms of computation and memory. Therefore, we draw attention to the use of sparse MLP
convolutions over irregular point clouds, which can leverage the advantages of MLP convolutions
while also reducing their redundancies.
In the area of point cloud analysis, there are also other methods proposed to decrease model
complexity. For instance, the models proposed in [14,15] attempt to develop efficient CNN-like kernels
to reduce the use of MLP-like operations. Conversely, PointNet++ [11] leverages the use of Farthest
Point Sampling (FPS) to downsample the points in advance before applying the PointNet architecture
to the features. However, only few researchers studied the impact of channel extension and wider
neural networks. Inspired by group convolutions as introduced by AlexNet and ShuffleNet [20,22],
we primarily focus on building a deep-wide neural network to reduce redundancy and also achieve
high performance. The main contributions of this paper are summarized as follows:
2. Related Work
3. Model Structure
In this section, we explain our model architecture in more detail: firstly, we introduce the process of
constructing a directed graph for local feature representation, as shown in Figure 2. Then, we present
the group convolution and channel shuffle operations, as shown in Figure 3. Finally, we explain our MLP Shuffled
Group Convolution unit (SGC unit for short) and the overall model architecture in detail (see Figures 4
and 5, respectively).
We define $X = \{x_i \in \mathbb{R}^F,\ i = 1, 2, \dots, N\}$ as the raw point set and the input to our model, where F is
the dimension of the point-cloud representation, N is the number of points, and $x_i$ is typically the 3D
coordinate of each point in a Cartesian reference system. The dimension F is not necessarily limited
to 3, because it is possible to use other features for each point, such as color, intensity, or the normal
component with respect to the surface.
In the rest of the paper, we will keep referring to F as the number of features, but the reader
should keep in mind that F has twice the dimension when taking into consideration the local feature
and using an augmented feature space.
Figure 2. The process of construction of a directed acyclic graph for local feature extraction. On the left
is the raw point cloud data, on the right, the graph-like structure extracted from part of the point cloud
is represented. xi indicates the central point, eij are the directed edges connecting center point and its
corresponding neighbouring points xij .
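A minimal NumPy sketch of this construction is shown below. It gathers the k nearest neighbours of every point with a brute-force distance computation (written for clarity) and forms the directed edges; the edge definition $e_{ij} = x_{ij} - x_i$ is assumed here for illustration and is not necessarily the exact feature used in the implementation.

```python
import numpy as np

def build_knn_edges(points, k):
    """Sketch of the local graph construction of Figure 2 (assumed edge definition).

    points: (N, 3) array of xyz coordinates.
    Returns neighbour indices (N, k) and edge vectors e_ij = x_ij - x_i of shape (N, k, 3).
    """
    # Pairwise squared Euclidean distances, (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # k nearest neighbours of every centre point (including the point itself).
    idx = np.argsort(dist2, axis=-1)[:, :k]          # (N, k)
    neighbours = points[idx]                          # (N, k, 3)
    edges = neighbours - points[:, None, :]           # directed edges e_ij
    return idx, edges

pts = np.random.rand(1024, 3).astype(np.float32)
idx, e = build_knn_edges(pts, k=20)
print(idx.shape, e.shape)  # (1024, 20) (1024, 20, 3)
```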
[Figure 3: group convolution with channel shuffle — the N × K × F input feature map is split along the channel dimension into groups of depth F/g, an MLP {F/g} is applied to each group, and the output channels are shuffled.]
[Figure 4: the MLP Shuffled Group Convolution (SGC) unit.]
[Figure 5 diagram: classification branch — FPS & k-NN grouping, SGC 1 {64, 128}, SGC 2 {128, 256}, MLP {1024} and max pooling; segmentation branch — additional FPS & k-NN + SGC stages, feature interpolation and fully connected layers.]
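A minimal NumPy sketch of the split–MLP–shuffle operation of Figure 3 is given below; the shapes, the group count and the ReLU activation are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def grouped_mlp_with_shuffle(features, weights, groups):
    """Minimal sketch of a grouped 1x1 (MLP) convolution followed by channel shuffle.

    features: (N, K, F) array -- N centre points, K neighbours, F feature channels.
    weights:  list of `groups` matrices, each of shape (F // groups, F_out // groups).
    groups:   number of channel groups.
    """
    N, K, _ = features.shape
    # Split the channels into `groups` slices of depth F // groups.
    chunks = np.split(features, groups, axis=-1)
    # Apply a separate (narrower) MLP to each group: this is where the computation
    # saving comes from, since every matmul only sees F / groups channels.
    outputs = [np.maximum(chunk @ w, 0.0) for chunk, w in zip(chunks, weights)]  # ReLU
    out = np.concatenate(outputs, axis=-1)                    # (N, K, F_out)
    # Channel shuffle: interleave channels so that the next grouped layer
    # sees channels coming from every group.
    F_out = out.shape[-1]
    out = out.reshape(N, K, groups, F_out // groups)
    out = out.transpose(0, 1, 3, 2).reshape(N, K, F_out)
    return out

# Example: 1024 points, 20 neighbours, 64 input channels, 2 groups, 128 output channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 20, 64)).astype(np.float32)
w = [rng.standard_normal((32, 64)).astype(np.float32) * 0.1 for _ in range(2)]
y = grouped_mlp_with_shuffle(x, w, groups=2)
print(y.shape)  # (1024, 20, 128)
```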
Figure 5. ShufflePointNet architecture: The architecture uses PointNet++ as the backbone and contains
the classification structure (top branch) and the semantic segmentation structure (bottom branch) for point
clouds. In detail, N and F indicate the number of input points and the corresponding number of feature channels,
respectively, and k is the number of neighbours. In addition, the numbers in braces {} indicate
the MLP filters. The classification model recursively applies Furthest Point Sampling (FPS) and a
Shuffled Group Convolution (SGC) module to capture downsampled local features, followed by a
shared fully connected layer to generate a global feature, which is used to predict classification scores.
The segmentation model extends the classification model with two more SGC modules and a feature
interpolation module to obtain a category score for each point.
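For reference, a straightforward NumPy sketch of the Furthest Point Sampling step used at each stage is given below; it is a greedy O(N·m) version written for clarity, not an optimized implementation.

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Greedy Furthest Point Sampling (FPS): pick m well-spread centroids from N points.

    points: (N, 3) array; returns the indices of the m sampled points.
    """
    N = points.shape[0]
    selected = np.zeros(m, dtype=np.int64)
    # Distance from every point to the closest already-selected centroid.
    min_dist = np.full(N, np.inf)
    selected[0] = 0                                   # start from an arbitrary point
    for i in range(1, m):
        last = points[selected[i - 1]]
        d = np.sum((points - last) ** 2, axis=-1)
        min_dist = np.minimum(min_dist, d)
        selected[i] = int(np.argmax(min_dist))        # farthest point from the current set
    return selected

pts = np.random.rand(4096, 3)
centroids = farthest_point_sampling(pts, 1024)
print(centroids.shape)  # (1024,)
```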
$$
f(x) = \frac{\sum_{i=1}^{k} \omega_i(x)\, f_i}{\sum_{i=1}^{k} \omega_i(x)} \qquad (3)
$$
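A minimal NumPy sketch of this interpolation is given below; the inverse-squared-distance weights $\omega_i(x) = 1/d(x, x_i)^2$ over the k = 3 nearest known points are assumed here, following the usual PointNet++-style choice.

```python
import numpy as np

def interpolate_features(query_xyz, known_xyz, known_feat, k=3, eps=1e-8):
    """Inverse-distance-weighted feature interpolation of Equation (3).

    query_xyz: (N, 3) points to upsample to; known_xyz: (M, 3) downsampled points
    carrying features known_feat of shape (M, C).
    """
    diff = query_xyz[:, None, :] - known_xyz[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)                      # (N, M)
    idx = np.argsort(dist2, axis=-1)[:, :k]                 # (N, k) nearest known points
    nearest_d2 = np.take_along_axis(dist2, idx, axis=-1)    # (N, k)
    w = 1.0 / (nearest_d2 + eps)                            # assumed weights w_i(x)
    w = w / np.sum(w, axis=-1, keepdims=True)               # normalise: Eq. (3) denominator
    return np.einsum('nk,nkc->nc', w, known_feat[idx])      # (N, C)

known = np.random.rand(256, 3); feats = np.random.rand(256, 64)
query = np.random.rand(2048, 3)
print(interpolate_features(query, known, feats).shape)  # (2048, 64)
```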
We then concatenate the interpolated point features with the corresponding abstraction-level point
features, as shown in Figure 6, and apply an MLP convolution to fuse them together. Next, four
upsampling layers with filters (256, 256), (256, 256), (256, 128) and (128, 128, 128), respectively, are employed
to extract fine-grained features for each point. Similarly to the classification structure, the global feature of
the point cloud is obtained by max pooling; we then duplicate the global feature N times and
finally apply four multi-layer perceptron (MLP) layers (256, 256, 128, s) with dropout probability
0.6 to transform the global feature into s category scores. The number of points N is 2048, 4096 and 16,384
for the ShapeNet part dataset [33], the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [34] and
the KITTI dataset [35], respectively, and s indicates the number of classes.
[Figure 6: interpolation of point features followed by concatenation with the corresponding abstraction-level features.]
4. Experiments
In this section, we carry out comprehensive experiments aimed at evaluating the performance
of ShufflePointNet when dealing with shape classification and segmentation tasks. In order to
demonstrate the effectiveness of our model, we highlight the trade-off between accuracy and
complexity of different recent state-of-the-art methods with respect to our model. Finally, we carry
out an ablation study to investigate different model settings. Implementation details for training are
available in our GitHub repository: https://github.com/FrankCAN/ShufflePointNet.
4.1. Classification
4.1.1. Dataset
We evaluate our classification model on the ModelNet40 benchmark [36]. It contains 9843 training
models and 2468 testing models that are classified into 40 classes. We first uniformly downsampled all
models from the original 2048 points to 1024 points, and then normalized them into a sphere of radius
one. Besides this, in order to improve overall robustness, we further augmented the training data
by means of rotations, by randomly scaling the sampled points, and by jittering the position of every
point using Gaussian noise with zero mean and a 0.01 standard deviation.
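A minimal sketch of this augmentation pipeline is given below; only the 0.01 jitter standard deviation comes from the text above, while the rotation axis, scaling range and clipping value are illustrative assumptions.

```python
import numpy as np

def augment_point_cloud(points, scale_low=0.8, scale_high=1.25, sigma=0.01, clip=0.05):
    """Sketch of the training-time augmentation: rotation, random scaling and jitter.

    points: (N, 3) array, already normalised to the unit sphere.
    """
    # Random rotation about the up axis (assumed to be y here).
    theta = np.random.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    points = points @ rot.T
    # Random uniform scaling of the whole shape.
    points = points * np.random.uniform(scale_low, scale_high)
    # Jitter every point with zero-mean Gaussian noise (std 0.01), clipped for stability.
    noise = np.clip(sigma * np.random.randn(*points.shape), -clip, clip)
    return points + noise

pts = np.random.rand(1024, 3) - 0.5
print(augment_point_cloud(pts).shape)  # (1024, 3)
```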
4.1.3. Results
Table 1 compares the performance of ShufflePointNet for classification to recent state-of-the-art
models in terms of accuracy and model complexity. Specifically, we use as metrics the mean
per-class accuracy (mA%), the overall accuracy (OA%), the number of model parameters (million),
the FLOPs (million), and the forward time (ms) on the ModelNet40 benchmark dataset [36]. It is worth
mentioning that, compared to the backbone structure PointNet++, our model outperforms it by 1.4% in
terms of accuracy, whilst also significantly reducing the number of parameters, FLOPs and forward
time by 41%, 75% and 47.8%, respectively. These results strongly support the effectiveness of
our deep-wide model as a reliable solution combining high accuracy with lower computational complexity.
Our model achieves comparable performance to PointCNN [14], which is also a
point-based convolution method operating on unordered point clouds. PointCNN attempts to permute
the neighbouring points into a canonical order by learning a feature transformer, after which
standard CNNs can be applied. However, the learned ordering is far from ideal due to the fact
that the positions of the neighbours are unordered and unpredictable, while ShufflePointNet uses MLP
(Multi-Layer-Perceptron) convolutions over center points and corresponding neighbours, which is
naturally invariant to point ordering. In addition, the deep and wide architecture allows sufficient
features to be exploited with less computation. However, the receptive field of a CNN is much
larger than that of an MLP, which explains why PointCNN can still achieve remarkable performance
by leveraging powerful CNNs over an approximate ordering of the neighbours.
It is also possible to observe that PointCNN [14] has more FLOPs but a shorter forward time, while
SpecGCN [39] has similar FLOPs to PointNet but requires a much longer inference time. These results
confirm our hypothesis that FLOPs are only an indirect estimate of model complexity and cannot be
used as the sole metric of the computational efficiency of a deep learning model.
As shown in Table 2, when we split the feature channels into two groups and disable the channel shuffling
operation, the overall accuracy drops significantly to 91.9%. We attribute this to the fact that,
without shuffling, the center point features and the corresponding neighbouring point features
are captured separately and are only concatenated before the max pooling layer. As a result,
without shuffling, only the last MLP layer contributes to exploiting local features from both the center
points and their corresponding neighbouring points.
Table 3 shows different settings for the number of points, the low-level edge features e_ij, and the grouping
methods. It indicates that adding neighbour features does not improve the local feature representation,
whereas edge features connecting the center point to its corresponding neighbourhood are beneficial
for better results. In addition, k-NN searching slightly outperforms radius searching. We believe the
reason is that k-NN searching is more efficient when the layout of the dataset can be more or less treated as
a uniform distribution.
4.2.5. Results
The mean Intersection over Union (mIoU) [10] is used as the metric to evaluate segmentation
performance. The IoU of a shape is calculated by averaging the IoUs of all the parts belonging to that
shape's category; the mIoU is then computed as the mean of the IoUs over all shapes in the testing dataset.
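A simplified sketch of the per-shape IoU computation is given below; the handling of parts that are absent from both the prediction and the ground truth is an assumption, and the exact evaluation script may differ.

```python
import numpy as np

def shape_iou(pred, gt, part_labels):
    """Per-shape IoU averaged over the parts of that shape's category.

    pred, gt: (N,) integer part labels for one shape.
    part_labels: iterable of part ids belonging to this shape's category.
    """
    ious = []
    for part in part_labels:
        inter = np.sum((pred == part) & (gt == part))
        union = np.sum((pred == part) | (gt == part))
        # Assumed convention: a part absent from both prediction and ground truth
        # counts as IoU = 1 so that it does not penalise the shape.
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))

# The dataset mIoU is then the mean of shape_iou over all shapes in the test set.
```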
For the semantic part segmentation task, Table 4 indicates that our segmentation model achieves
competitive results on the ShapeNet part dataset [33]. It is the best performing model in four categories,
similarly to the DGCNN and SGPN state-of-the-art models. Figure 7a illustrates some of the segmented
shapes, whilst Figure 7b visualizes the errors of our prediction results compared to the ground truth
results. We present the ground truth on the left side and our predictions on the right side of the image.
Table 4. Part segmentation results (mIoU, %) on the ShapeNet part dataset [33].
Method          avg  air. bag  cap  car  cha. ear. gui. kni. lam. lap. mot. mug  pis. roc. ska. tab.
kd-net [23]     82.3 82.3 74.6 74.3 70.3 88.6 73.5 90.2 87.2 81.0 94.9 57.4 86.7 78.1 51.8 69.9 80.3
Kc-Net [38]     83.7 82.8 81.5 86.4 77.6 90.3 76.8 91.0 87.2 84.5 95.5 69.2 94.4 81.6 60.1 75.2 81.3
PointNet [10]   83.7 83.4 78.7 82.5 74.9 89.6 73.0 91.5 85.9 80.8 95.3 65.2 93.0 81.2 57.9 72.8 80.6
3DmFV [42]      84.3 82.0 84.3 86.0 76.9 89.9 73.9 90.8 85.7 82.6 95.2 66.0 94.0 82.6 51.5 73.5 81.8
RSNet [43]      84.9 82.7 86.4 84.1 78.2 90.4 69.3 91.4 87.0 83.5 95.4 66.0 92.6 81.8 56.1 75.8 82.2
PointNet++ [11] 85.1 82.4 79.0 87.7 77.3 90.8 71.8 91.0 85.9 83.7 95.3 71.6 94.1 81.3 58.7 76.4 82.6
DGCNN [12]      85.1 84.2 83.7 84.4 77.1 90.9 78.5 91.5 87.3 82.9 96.0 67.8 93.3 82.6 59.7 75.5 82.0
SGPN            85.8 80.4 78.6 78.8 71.5 88.6 78.0 90.9 83.0 78.8 95.8 77.8 93.8 87.4 60.1 92.3 89.4
OURS            85.1 82.7 83.6 86.2 77.3 90.3 78.7 90.9 87.4 84.6 94.6 68.5 94.5 82.9 51.3 73.4 82.3
Figure 7. Visualization of semantic part segmentation results. (a) visualizes some samples: airplane
(top), guitar (middle), and table (bottom), while (b) visualizes the predictions and corresponding errors
(right) compared to ground truth (left).
Tables 5 and 6 show the results on the S3DIS and KITTI datasets, respectively. It is worth noticing
that our segmentation model is also efficient on very large-scale point clouds, which shows its great
potential for autonomous driving applications.
5. Conclusions
In this paper, we propose a deep-wide neural network, named ShufflePointNet, to learn local
representations for point cloud data. The experiments carried out show state-of-the-art performance for
both the shape classification and semantic segmentation tasks on various datasets. The success of our
model verifies that a deep-wide neural network can achieve high performance whilst reducing model
complexity, which is crucial for real-time applications.
In the future, we plan to apply our model to various real-world applications, such as environment
perception for autonomous aerial or ground vehicles, which typically need to process very large-scale
point cloud data. In addition to geometric representations, we also plan to extend our model to other
unstructured data (e.g., social networks). Furthermore, developing an efficient CNN-like operation
for unstructured data analysis is also a possible strategy to improve our model.
Author Contributions: Conceptualization, C.C.; methodology, C.C.; software, C.C.; validation, C.C. and L.Z.F.;
formal analysis, C.C.; investigation, C.C. and L.Z.F.; resources, A.T.; data curation, C.C.; writing—original draft
preparation, C.C.; writing—review and editing, C.C., L.Z.F., and A.T.; visualization, C.C.; supervision, L.Z.F. and
A.T.; project administration, L.Z.F.; funding acquisition, A.T. All authors have read and agreed to the published
version of the manuscript.
Funding: The HumanDrive project is a CCAV/Innovate UK funded R&D project (Project ref: 103283) led by
Nissan and supported by world-class experts from nine other industry and academic organisations (Hitachi,
Horiba MIRA, Aimsun, Atkins, University of Leeds, Cranfield University, Highways England, SBD Automotive
Ltd. and the Transport Systems Catapult). The project is developing an advanced vehicle control system, designed
to allow the vehicle to emulate a “natural” human driving style using machine learning, and is developing
Artificial Intelligence to enhance user comfort, safety and experience. For more information, see www.humandrive.co.uk.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June
2018; pp. 4490–4499.
2. Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum pointnets for 3d object detection from rgb-d data.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT,
USA, 18–22 June 2018; pp. 918–927.
3. Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3d proposal generation and object detection
from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–8.
4. Liu, Z.; Chen, H.; Di, H.; Tao, Y.; Gong, J.; Xiong, G.; Qi, J. Real-Time 6D Lidar SLAM in Large Scale Natural
Terrains for UGV. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China,
26–30 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 662–667.
5. Bruder, G.; Steinicke, F.; Nüchter, A. Poster: Immersive point cloud virtual environments. In Proceedings of
the 2014 IEEE Symposium on 3D User Interfaces (3DUI), Minneapolis, MN, USA, 29–30 March 2014; IEEE:
Piscataway, NJ, USA, 2014; pp. 161–162.
6. Wei, J.; Ye, G.; Mullen, T.; Grundmann, M.; Ahmadyan, A.; Hou, T. Instant Motion Tracking and Its
Applications to Augmented Reality. arXiv 2019, arXiv:1907.06796.
7. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond
euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42. [CrossRef]
8. Maturana, D.; Scherer, S. Voxnet: A 3d convolutional neural network for real-time object recognition.
In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
Hamburg, Germany, 28 September–2 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 922–928.
9. Wang, D.Z.; Posner, I. Voting for Voting in Online Point Cloud Object Detection. Robot. Sci. Syst. 2015, 1,
10-15607.
10. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and
segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Honolulu, HI, USA, 21–26 July 2017; Volume 1, p. 4.
11. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric
space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA,
4–9 December 2017; pp. 5099–5108.
12. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on
point clouds. arXiv 2018, arXiv:1801.07829.
13. Chen, C.; Fragonara, L.Z.; Tsourdos, A. GAPNet: Graph Attention based Point Neural Network for
Exploiting Local Feature of Point Cloud. arXiv 2019, arXiv:1905.08705.
14. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution On X-Transformed Points.
In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada,
3–8 December 2018; pp. 828–838.
15. Liu, Y.; Fan, B.; Xiang, S.; Pan, C. Relation-Shape Convolutional Neural Network for Point Cloud
Analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seoul,
Korea, 27 October–3 November 2019; pp. 8895–8904.
16. Boulch, A. ConvPoint: Continuous convolutions for point cloud processing. Comput. Graph. 2020, 88, 24–34.
[CrossRef]
17. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern recognition, Honolulu, HI, USA,
21–26 July 2017; pp. 1492–1500.
18. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H.
Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017,
arXiv:1704.04861.
19. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear
bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake
City, UT, USA, 18–22 June 2018; pp. 4510–4520.
20. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for
mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
21. Zhang, T.; Qi, G.J.; Xiao, B.; Wang, J. Interleaved group convolutions. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4373–4382.
22. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural
networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA,
3–6 December 2012; pp. 1097–1105.
23. Klokov, R.; Lempitsky, V. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models.
In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017;
pp. 863–872.
24. Bentley, J.L. Multidimensional binary search trees used for associative searching. Commun. ACM 1975,
18, 509–517. [CrossRef]
25. Riegler, G.; Osman Ulusoy, A.; Geiger, A. Octnet: Learning deep 3d representations at high resolutions.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA,
21–26 July 2017; pp. 3577–3586.
26. Qi, C.R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and multi-view cnns for object
classification on 3d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5648–5656.
27. Wang, C.; Pelillo, M.; Siddiqi, K. Dominant set clustering and pooling for multi-view 3d object recognition.
In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017;
Volume 12.
28. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs.
arXiv 2013, arXiv:1312.6203.
29. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized
spectral filtering. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona,
Spain, 5–10 December 2016; pp. 3844–3852.
30. Zhang, Y.; Rabbat, M. A Graph-CNN for 3D Point Cloud Classification. In Proceedings of the 2018
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada,
15–20 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 6279–6283.
31. Landrieu, L.; Simonovsky, M. Large-scale point cloud semantic segmentation with superpoint graphs.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT,
USA, 18–22 June 2018; pp. 4558–4567.
32. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design.
In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September
2018; pp. 116–131.
33. Yi, L.; Kim, V.G.; Ceylan, D.; Shen, I.; Yan, M.; Su, H.; Lu, C.; Huang, Q.; Sheffer, A.; Guibas, L. A scalable
active framework for region annotation in 3d shape collections. ACM Trans. Graph. (TOG) 2016, 35, 210.
[CrossRef]
34. Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3d semantic parsing
of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1534–1543.
35. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013,
32, 1231–1237. [CrossRef]
36. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for
volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern recognition,
Boston, MA, USA, 7–12 June 2015; pp. 1912–1920.
37. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
38. Shen, Y.; Feng, C.; Yang, Y.; Tian, D. Mining point cloud local structures by kernel correlation and graph
pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City,
UT, USA, 18–22 June 2018; pp. 4548–4557.
39. Wang, C.; Samari, B.; Siddiqi, K. Local spectral graph convolution for point set feature learning.
In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September
2018; pp. 52–66.
40. Atzmon, M.; Maron, H.; Lipman, Y. Point convolutional neural networks by extension operators. arXiv 2018,
arXiv:1803.10091.
41. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3d object detection network for autonomous driving.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA,
21–26 July 2017; pp. 1907–1915.
42. Ben-Shabat, Y.; Lindenbaum, M.; Fischer, A. 3d point cloud classification and segmentation using 3d
modified fisher vector representation for convolutional neural networks. arXiv 2017, arXiv:1711.08241.
43. Huang, Q.; Wang, W.; Neumann, U. Recurrent slice networks for 3d segmentation of point clouds.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT,
USA, 18–22 June 2018; pp. 2626–2635.
44. Tchapmi, L.; Choy, C.; Armeni, I.; Gwak, J.; Savarese, S. Segcloud: Semantic segmentation of 3d point clouds.
In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October
2017; IEEE: Piscataway, NJ, USA, 2017; pp. 537–547.
45. Wang, S.; Suo, S.; Ma, W.C.; Pokrovsky, A.; Urtasun, R. Deep parametric continuous convolutional neural
networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake
City, UT, USA, 18–22 June 2018; pp. 2589–2597.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).