FCNN
Networks
Abstract. The Fourier domain is used in computer vision and machine
learning because image analysis tasks in the Fourier domain are analogous
to spatial domain methods but are achieved using different operations.
Convolutional Neural Networks (CNNs) use machine learning to achieve
state-of-the-art results in many computer vision tasks. One of the main
limiting aspects of CNNs is the computational cost of updating a large
number of convolution parameters. Further, in the spatial domain, larger
images take substantially longer than smaller images to train on CNNs due
to the operations involved in convolution methods. Consequently, CNNs are
often not a viable solution for large-image computer vision tasks. In this
paper a Fourier Convolution Neural Network (FCNN) is proposed whereby
training is conducted entirely within the Fourier domain. The advantage
offered is a significant speed-up in training time without loss of
effectiveness. Using the proposed approach, larger images can therefore be
processed within viable computation time. The FCNN is fully described and
evaluated. The evaluation was conducted using the benchmark Cifar10 and
MNIST datasets, and a bespoke fundus retina image dataset. The results
demonstrate that convolution in the Fourier domain gives a significant
speed-up without adversely affecting accuracy. For simplicity the proposed
FCNN concept is presented in the context of a basic CNN architecture;
however, the FCNN concept has the potential to improve the speed of any
neural network system involving convolution.
1 Introduction
Convolutional Neural Networks (CNNs) [1] are a popular, state-of-the-art,
deep learning approach to computer vision with a wide range of
applications in domains where data can be represented in terms of
three-dimensional matrices, for example image and video analysis.
Historically, CNNs were first applied to image data in the context of
handwriting recognition [2]. Since then the viability of CNNs, and deep
learning in general, has been facilitated, alongside theoretical
improvements, by significant recent advancements in the availability of
processing power. For example, Graphics Processing Units (GPUs) allow us
to deal with the heavy computation required by convolution.
However, there are increasingly larger datasets to which we wish to
apply deep learning [3] and, in the case of deep learning, a growing
desire to increase the depth of the networks used in order to achieve
better results [4,5]. This increases not only memory utilisation
requirements but also computational complexity. In the case of CNNs, the
most computationally expensive element is the calculation of the spatial
convolutions. The convolution is typically conducted using a traditional
sliding window approach across the data matrix together with the
application of a kernel function of some kind [6]. However, this
convolution is computationally expensive, which in turn means that CNNs
are often not viable for large-image computer vision tasks. To address
this issue, this paper proposes the idea of using the Fourier domain.
More specifically, this paper proposes the Fourier Convolution Neural
Network (FCNN), whereby training is conducted entirely in the Fourier
domain. The advantage offered is a significant speed-up in training time
without loss of effectiveness. Using the FCNN, images are processed and
represented in the Fourier domain, to which a convolution mechanism is
applied in a manner similar to that used in more traditional CNN
techniques. The proposed approach reduces the complexity, especially in
the context of larger images, and consequently provides a significant
increase in network efficiency.
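To make the cost of the sliding window approach concrete, the following is
a minimal NumPy sketch, not the implementation used in this paper. The
function name `sliding_window_convolve` and the multiply-add counter are
illustrative assumptions. The sketch computes a "valid" convolution (no
padding) and counts the multiply-add operations performed.

```python
import numpy as np

def sliding_window_convolve(image, kernel):
    """'Valid' 2-D convolution by an explicit sliding window (illustrative only)."""
    rows, cols = image.shape
    k_rows, k_cols = kernel.shape
    flipped = kernel[::-1, ::-1]  # flip so this is convolution, not correlation
    out = np.zeros((rows - k_rows + 1, cols - k_cols + 1))
    multiply_adds = 0
    for p in range(out.shape[0]):
        for q in range(out.shape[1]):
            window = image[p:p + k_rows, q:q + k_cols]
            out[p, q] = np.sum(window * flipped)
            multiply_adds += k_rows * k_cols  # one multiply-add per kernel entry
    return out, multiply_adds

# Hypothetical toy data: a 32 x 32 image of ones and a 3 x 3 kernel of ones.
image = np.ones((32, 32))
kernel = np.ones((3, 3))
out, multiply_adds = sliding_window_convolve(image, kernel)
```

For an n × n image and a k × k kernel the count is (n − k + 1)² k², so
doubling the image side roughly quadruples the work for every kernel in
every layer.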
The underlying intuition is given by the Convolution Theorem, which
states that for two functions $\kappa$ and $u$, we have
$$\mathcal{F}(\kappa * u) = \mathcal{F}(\kappa) \odot \mathcal{F}(u) \qquad (1)$$
where $\mathcal{F}$ denotes the Fourier transform, $*$ denotes convolution
and $\odot$ denotes the Hadamard (pointwise) product. This allows
convolution to be calculated more efficiently using Fast Fourier
Transforms (FFTs). Since convolution corresponds to the Hadamard product
in the Fourier domain, and given the efficiency of the Fourier transform,
this method involves significantly fewer computational operations than
the sliding kernel spatial method, and is therefore much faster [7].
Working in the Fourier domain is less intuitive as we cannot visualise
the filters learned by our Fourier convolution; this is a common problem
with CNN techniques and is beyond the scope of this paper. While the
Fourier domain is frequently used in the context of image processing and
analysis [8,9,10], there has been little work directed at adopting the
Fourier domain with respect to CNNs. FFTs, such as the Cooley-Tukey
algorithm [11], have been applied in the context of neural networks for
image [12] and time series [13] analysis, but these applications date
from the embryonic stage of CNNs and, at that time, the improvement was
minimal.
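The Convolution Theorem can be checked numerically. The sketch below is
an illustration under stated assumptions: it uses NumPy's `np.fft.fft2` as
a stand-in for the FFT, takes the kernel to be the same size as the image,
and uses circular (periodic) convolution, which is the form of convolution
that the discrete Fourier transform diagonalises. The helper
`circular_convolve2d` is a hypothetical name introduced for this example.

```python
import numpy as np

def circular_convolve2d(kernel, image):
    """Direct circular (periodic) 2-D convolution of two equally sized arrays."""
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for p in range(rows):
        for q in range(cols):
            for a in range(rows):
                for b in range(cols):
                    out[p, q] += kernel[a, b] * image[(p - a) % rows, (q - b) % cols]
    return out

rng = np.random.default_rng(0)
u = rng.standard_normal((8, 8))      # stand-in "image"
kappa = rng.standard_normal((8, 8))  # stand-in kernel, same size as the image

lhs = np.fft.fft2(circular_convolve2d(kappa, u))  # F(kappa * u)
rhs = np.fft.fft2(kappa) * np.fft.fft2(u)         # F(kappa) Hadamard F(u)
theorem_holds = np.allclose(lhs, rhs)
```

The direct version needs four nested loops, while the right-hand side is a
single elementwise product once both transforms are available.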
The concept of using the Fourier domain for CNN operations has been
previously proposed [7,14,15]. In both [7] and [14] the speed-up of
convolution in the Fourier domain was demonstrated. Down-sampling within
the Fourier domain was used in [15], where the ability to retain more
spatial information and obtain faster convergence was demonstrated.
However, the processes proposed in [7,14,15] involved interchanges
between the Fourier and spatial domains at both the training and testing
stages, which added significant complexity. The FFT is the
computationally intensive part of the process, and FFTs and inverse FFTs
needed to be applied for each convolution, giving rise to an undesired
computational overhead. In the case of the proposed FCNN the data is
converted to the Fourier domain before the process starts, and remains in
the Fourier domain; no inverse FFTs are required at any point.
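The difference in transform overhead can be illustrated with a small
NumPy sketch. It is a simplification under stated assumptions: only the
convolution operations are modelled (no activation, pooling or dense
layers), kernels are the same size as the data, and the names
`interchange_stack` and `fourier_only_stack` are hypothetical. It does not
reproduce the methods of [7,14,15] or the FCNN layers themselves.

```python
import numpy as np

def interchange_stack(u, kernels):
    """Per-layer round trip: FFT in, Hadamard product, inverse FFT out."""
    data_transforms = 0
    for kappa in kernels:
        u_hat = np.fft.fft2(u)                # spatial -> Fourier
        z_hat = np.fft.fft2(kappa) * u_hat    # Hadamard product
        u = np.real(np.fft.ifft2(z_hat))      # Fourier -> spatial
        data_transforms += 2                  # counts transforms of the data only
    return u, data_transforms

def fourier_only_stack(u, kernels_hat):
    """Single FFT of the input, then only Hadamard products."""
    u_hat = np.fft.fft2(u)
    data_transforms = 1
    for kappa_hat in kernels_hat:
        u_hat = kappa_hat * u_hat             # kernels already live in the Fourier domain
    return u_hat, data_transforms

rng = np.random.default_rng(0)
u = rng.standard_normal((16, 16))
kernels = [rng.standard_normal((16, 16)) for _ in range(3)]
kernels_hat = [np.fft.fft2(kappa) for kappa in kernels]

spatial_out, n_interchange = interchange_stack(u, kernels)
fourier_out, n_fourier = fourier_only_stack(u, kernels_hat)
same_result = np.allclose(np.fft.fft2(spatial_out), fourier_out)
```

For three stacked convolutions the round-trip version transforms the data
six times and the Fourier-only version once, while both describe the same
linear operation.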
The layout of the rest of the paper is as follows. In §2 we present our
method of implementation of the specific layers that constitute our
FCNNs, and in §3 we present our experimental results. In §4 and §5 we
present a discussion together with conclusions concerning the abilities
of the FCNN.
2 The Fourier Convolution Neural Network (FCNN) Approach
The FCNN was implemented using the deep learning frameworks Keras [16]
and Theano [17]. Theano is the machine learning backend of Keras, and
this backend was used to code the Fourier layers. The Theano FFT function
was used to convert our training and test data. The Theano FFT function
is a tensor representation of the multi-dimensional Cooley-Tukey
algorithm. It computes the n-dimensional discrete Fourier transform over
any number of axes of an m-dimensional array using the FFT. The
multi-dimensional discrete Fourier transform used is defined as:
$$A_{kl} = \sum_{\ell_1=0}^{m-1} \sum_{\ell_2=0}^{n-1} a_{\ell_1 \ell_2} \, e^{-2\pi i \left( \frac{\ell_1 k}{m} + \frac{\ell_2 l}{n} \right)} \qquad (2)$$
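As a sanity check of the definition in (2), the sketch below evaluates the
double sum directly and compares it with an FFT routine. NumPy's
`np.fft.fft2` is used here as an assumed stand-in for the Theano FFT
function, and `dft2_direct` is a hypothetical helper written for this
illustration.

```python
import numpy as np

def dft2_direct(a):
    """Evaluate the 2-D discrete Fourier transform of (2) term by term."""
    m, n = a.shape
    A = np.zeros((m, n), dtype=complex)
    for k in range(m):
        for l in range(n):
            for l1 in range(m):
                for l2 in range(n):
                    phase = -2j * np.pi * (l1 * k / m + l2 * l / n)
                    A[k, l] += a[l1, l2] * np.exp(phase)
    return A

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 6))  # small non-square array so m and n differ
matches = np.allclose(dft2_direct(a), np.fft.fft2(a))
```

The direct evaluation needs on the order of (mn)² operations, which is the
cost the FFT avoids.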
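1. $\tilde{\kappa}_i = \mathcal{F}(\kappa_i)$, $i = 1, \dots, N_{\kappa}$
2. $\tilde{u}_i = \mathcal{F}(u_i)$, $i = 1, \dots, N_{u}$
3. $\tilde{z}_{i,j} = \tilde{\kappa}_i \odot \tilde{u}_j$, $i = 1, \dots, N_{\kappa}$, $j = 1, \dots, N_{u}$
4. $z_{i,j} = \mathcal{F}^{-1}(\tilde{z}_{i,j})$, $i = 1, \dots, N_{\kappa}$, $j = 1, \dots, N_{u}$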
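The four steps above can be sketched in NumPy as follows. This is a
minimal illustration, not the Theano code used for the experiments: it
assumes kernels are zero-padded to the image size, treats the convolution
as circular, and introduces the hypothetical helpers `pad_to` and
`fourier_convolve_all`.

```python
import numpy as np

def pad_to(kernel, shape):
    """Zero-pad a small kernel to the image shape (top-left aligned)."""
    padded = np.zeros(shape)
    padded[:kernel.shape[0], :kernel.shape[1]] = kernel
    return padded

def fourier_convolve_all(kernels, images):
    """kernels: (N_kappa, H, W); images: (N_u, H, W). Returns (z_hat, z)."""
    kernels_hat = np.fft.fft2(kernels)                  # step 1, over the last two axes
    images_hat = np.fft.fft2(images)                    # step 2
    z_hat = kernels_hat[:, None] * images_hat[None, :]  # step 3, all (i, j) pairs
    z = np.real(np.fft.ifft2(z_hat))                    # step 4, back to the spatial domain
    return z_hat, z

rng = np.random.default_rng(0)
images = rng.standard_normal((2, 8, 8))

identity = pad_to(np.array([[1.0]]), (8, 8))  # delta at (0, 0): leaves the image unchanged
shift = np.zeros((8, 8))
shift[1, 0] = 1.0                             # delta at (1, 0): shifts the image down one row
kernels = np.stack([identity, shift])

z_hat, z = fourier_convolve_all(kernels, images)
```

Here `z[i, j]` holds the convolution of kernel i with image j, so the
output has shape (N_kappa, N_u, H, W). Step 4 is the inverse transform
that a network remaining in the Fourier domain would skip.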
Fig. 1. Our layer initially contains an X × Y × Z voxel. The truncation runs through the X-axis
of the Fourier data (thus truncating the Y and Z axes).
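One way to picture pooling by truncation in the Fourier domain is the
following NumPy sketch. It rests on assumptions that are not confirmed by
the text above: the spectrum is centred with `np.fft.fftshift`, the
central low-frequency block is kept, and only a single 2-D slice is
handled. The function name `fourier_pool` is hypothetical.

```python
import numpy as np

def fourier_pool(u_hat, out_shape):
    """Keep the central (low-frequency) block of a centred 2-D spectrum."""
    rows, cols = u_hat.shape
    out_rows, out_cols = out_shape
    centred = np.fft.fftshift(u_hat)      # move the zero frequency to the centre
    top = rows // 2 - out_rows // 2
    left = cols // 2 - out_cols // 2
    cropped = centred[top:top + out_rows, left:left + out_cols]
    return np.fft.ifftshift(cropped)      # zero frequency back to index (0, 0)

u = np.ones((8, 8))                       # constant toy image
u_hat = np.fft.fft2(u)
pooled_hat = fourier_pool(u_hat, (4, 4))

# Rescale by the ratio of sizes so that intensities are comparable.
scale = pooled_hat.size / u_hat.size
pooled = np.real(np.fft.ifft2(pooled_hat)) * scale
```

In this sketch the pooled spectrum can be passed straight to the next
Fourier layer; the inverse transform is applied only to inspect the
result.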
3 Evaluation
The evaluation was conducted using an Nvidia K40c GPU that contains 2880
CUDA cores and comes with the Nvidia CUDA Deep Neural Network library
(cuDNN) for GPU learning. For the evaluation both the computation time
and the accuracy of the layers in the spatial and Fourier domains were
compared. The FCNN and its spatial counterpart were trained using the
three datasets introduced above: MNIST, Cifar10 and Kaggle fundus images.
Each dataset was used to evaluate different aspects of the proposed FCNN.
The MNIST dataset allows us to compare high-level accuracy while
demonstrating the speed-up of doing convolutions in the Fourier domain.
The Cifar10 dataset was used to show that the FCNN can learn a more
complicated classification task to the same degree as a spatial CNN with
the same number of filters. The results are presented below in terms of
speed, accuracy and propagation loss. Finally, the large Kaggle fundus
dataset was used to show that the FCNN is better suited than spatial CNNs
to dealing with larger images, because of the nature of the Fourier
convolutions.
Table 1. Computation time for the convolution of a single image of varying size, using both
Fourier and spatial convolution layers.
Table 2. Computation time for pooling an image of the given size using: (i) Down-sampling, (ii)
Max pooling and (iii) Fourier pooling.
Fig. 2. Comparison of pooling using: (i) down-sampling (col. 1), (ii) max-pooling (col. 2) and
(iii) Fourier pooling (col. 3).
Fig. 3. Fourier pooling of a fundus image. Top-left) Original fundus image; Bottom-left) normal
max-pooling and then resizing to original size; Top-right) Fourier pooling, back to spatial
domain and resize to original size; Bottom-right) Fourier pooling, embed in a zero matrix and
convert back to spatial domain.
Fig. 4. Training on the MNIST dataset: top) FCNN; bottom) Spatial CNN. Dark blue, black and
red are validation values, lighter colours are training values.
Fig. 5. Training on the Cifar10 dataset: top) FCNN; bottom) Spatial CNN. Dark blue, black and
red are validation values, lighter colours are training values.
results are presented in Figures 4 and 5 using network one. The fundus
training was carried out on network two and epoch speeds were recorded
(see Table 3). The accuracy achieved on the MNIST and Cifar10 test sets
using the FCNN is only marginally below that of the spatial CNN, but the
results are achieved with a significant speed-up. The MNIST training was
twice as fast on the FCNN in comparison to the spatial CNN, and the
Cifar10 dataset was trained 6 times faster. This is due to the Cifar10
dataset containing slightly larger images than MNIST and demonstrates how
our FCNN scales better to large images.
Table 3. Computation time in seconds for an epoch of re-sized fundus images. One epoch is
60,000 training images.
4 Discussion
The proposed FCNN technique allows training to be conducted entirely in
the Fourier domain; in other words, only one FFT is required throughout
the whole process. The additional computation time required for the FFT
is recovered through the resulting speed-up of the convolution. Compared
to the spatial approach, the evaluation results show an increase in
efficiency that grows rapidly with image size. Given a more complex
network, or a dataset of larger images, the benefit would be even more
pronounced.
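As a rough, hedged estimate of why the saving grows with image size,
consider a single n × n image and a single k × k kernel, ignoring constant
factors, channels and memory traffic. These are standard
order-of-magnitude operation counts rather than measurements from this
evaluation:

```latex
\begin{align*}
C_{\text{spatial}} &\approx (n-k+1)^2 k^2 = O(n^2 k^2)
  && \text{sliding window, per convolution} \\
C_{\text{FFT}} &= O(n^2 \log n)
  && \text{one 2-D FFT of the data} \\
C_{\odot} &= n^2
  && \text{Hadamard product, per convolution in the Fourier domain}
\end{align*}
```

Under these assumptions, once the data has been transformed, each Fourier
convolution costs on the order of n² complex multiplications regardless
of kernel size, whereas the sliding window cost carries the extra factor
k².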
The results presented demonstrate that, using the same layer structure,
training time with the Fourier representation was considerably less than
with a spatial representation. The analogous Fourier domain convolutions
and more spatially accurate pooling method allowed accuracy to be
retained on both datasets introduced. It was conjectured that the higher
accuracy achieved using the proposed FCNN on the Cifar10 dataset was due
to the larger Fourier domain kernels within the Fourier convolution
layer. Due to the Fourier kernel size, the network has more parameters
than in the case of spatial window kernels. This allowed for more degrees
of freedom when learning features of the images.
The lower accuracy of the FCNN on the MNIST dataset is likely due to the
network being trained on very small images. This creates boundary issues
and information loss in the Fourier domain when converting from the
spatial domain. This is particularly relevant with respect to smaller
images; it is much less of an issue in larger images. Hence, when dealing
with larger images we would expect no reduction in accuracy in the
Fourier domain while achieving the speed-ups shown. To combat the
boundary issues, we could consider boundary conditions with respect to
all of our Fourier layers, as is done in the spatial case.
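The boundary effect can be seen in a small NumPy sketch. It is an
illustration under assumptions, not part of the FCNN implementation: a
pointwise product of unpadded spectra corresponds to circular convolution,
in which the kernel wraps around the image edges, whereas zero-padding
both arrays before the transform gives the ordinary linear convolution.
The function names are hypothetical.

```python
import numpy as np

def fft_convolve_circular(u, kappa):
    """Hadamard product of unpadded spectra: the kernel wraps around the edges."""
    return np.real(np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(kappa, s=u.shape)))

def fft_convolve_zero_padded(u, kappa):
    """Zero-pad to the full linear-convolution size before transforming."""
    shape = (u.shape[0] + kappa.shape[0] - 1, u.shape[1] + kappa.shape[1] - 1)
    u_hat = np.fft.fft2(u, s=shape)          # s zero-pads the input to `shape`
    kappa_hat = np.fft.fft2(kappa, s=shape)
    return np.real(np.fft.ifft2(u_hat * kappa_hat))

u = np.ones((4, 4))      # tiny toy image, where edges dominate
kappa = np.ones((3, 3))  # toy 3 x 3 kernel

circular = fft_convolve_circular(u, kappa)
linear = fft_convolve_zero_padded(u, kappa)
```

On this 4 × 4 example every circular output equals 9 because the kernel
always wraps onto image pixels, while the zero-padded result falls to 1 at
the corner. The smaller the image, the larger the share of outputs
affected by the wrap-around.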
5 Conclusion
This paper has proposed the idea of Fourier Convolution Neural Networks
(FCNNs), which offer run-time advantages, especially during training. The
reported performance results were comparable with standard CNNs but with
the added advantage of a significant speed increase. As a consequence the
FCNN approach can be used to classify image sets featuring large images,
which is often not viable using spatial CNNs. The FCNN layers are not
specific to any architecture and therefore can be extended to any network
using convolution, pooling and dense layers. This is the case for the
vast majority of neural network architectures. For future work the
authors intend to investigate how the Fourier layers can be optimised and
implemented with respect to other network architectures that have
achieved state-of-the-art accuracies [4,5]. The authors speculate that,
given the efficiency advantage offered by FCNNs, they could be used to
address classification tasks directed at larger images, and in much
shorter time frames, than would be possible using standard CNNs.
6 Acknowledgement
The authors would like to acknowledge everyone in the Centre for Re-
search in Image Analysis (CRiA) imaging team at the Institute of Age-
ing and Chronic Disease at the University of Liverpool and the Fight for
Sight charity who have supported this work through funding.
References
1. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep
convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q.
Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105.
Curran Associates, Inc., 2012.
2. Y. Le Cun, B. Boser, J. S. Denker, R. E. Howard, W. Hubbard, L. D. Jackel, and D.
Henderson. Handwritten digit recognition with a back-propagation network. In Advances in
Neural Information Processing Systems 2, pages 396–404. Citeseer, 1990.
3. Kaggle. Kaggle datasets. https://www.kaggle.com/datasets.
4. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image
recognition. CoRR, abs/1512.03385, 2015.
5. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions.
In Computer Vision and Pattern Recognition (CVPR), 2015.
6. Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun.
OverFeat: Integrated recognition, localization and detection using convolutional networks.
CoRR, abs/1312.6229, 2013.
7. Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, and
Yann LeCun. Fast convolutional nets with fbfft: A GPU performance evaluation, 2015.
8. Tony F. Chan and Chiu-Kwong Wong. Total variation blind deconvolution. IEEE Transactions
on Image Processing, 7(3):370–375, 1998.
9. Nico Persch, Ahmed Elhayek, Martin Welk, Andrés Bruhn, Sven Grewenig, Katharina Böse,
Annette Kraegeloh, and Joachim Weickert. Enhancing 3-D cell structures in confocal and
STED microscopy: a joint model for interpolation, deblurring and anisotropic smoothing.
Measurement Science and Technology, 24(12):125703, 2013.
10. Bryan M. Williams, Ke Chen, and Simon P. Harding. A new constrained total variational
deblurring model and its fast algorithm. Numerical Algorithms, 69(2):415–441, 2015.
11. James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex
Fourier series. Mathematics of Computation, 19(90):297–301, 1965.
12. Patrizio Campisi and Karen Egiazarian. Blind Image Deconvolution. CRC Press, 2007.
13. Himanshu Gothwal, Silky Kedawat, and Rajesh Kumar. Cardiac arrhythmias detection in an ECG
beat signal using fast Fourier transform and artificial neural network. Journal of Biomedical
Science and Engineering, 4:289–296, 2011.
14. Michael Mathieu, Mikael Henaff, and Yann LeCun. Fast training of convolutional networks
through FFTs, 2014.
15. Oren Rippel, Jasper Snoek, and Ryan P. Adams. Spectral representations for convolutional
neural networks, 2015.
16. François Chollet. Keras. https://github.com/fchollet/keras, 2015.
17. Theano Development Team. Theano: A Python framework for fast computation of
mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.
18. Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.
19. Alex Krizhevsky. Learning multiple layers of features from tiny images.
https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
20. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani,
M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural
Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
21. Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward
neural networks. In Proceedings of the International Conference on Artificial Intelligence
and Statistics (AISTATS10). Society for Artificial Intelligence and Statistics, 2010.