Report on Efficient Convolution Techniques in CNNs
1. Introduction
Convolutional Neural Networks (CNNs) are the backbone of deep learning in computer vision, but
standard convolutions are computationally expensive. Efficient convolution techniques aim to
reduce computation, memory, and power usage while maintaining model performance. This report,
based on Dr. Dayal Kumar Behera’s lecture (SCE, KIIT DU), covers three main efficiency
strategies: Spatially Separable, Depthwise Separable, and FFT-based convolutions.
2. Standard Convolution
In standard 2D convolution, each output channel is produced by convolving all input channels using
K×K kernels. This results in high computational cost proportional to the number of channels and
kernel size.
Parameters = Cout × Cin × K²
Computational Cost = H × W × Cin × Cout × K²
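As a minimal sketch, the two formulas above translate directly into a cost-counting helper (the function name and the example layer sizes are illustrative choices, not from the lecture):

```python
def conv2d_cost(h, w, c_in, c_out, k):
    """Parameter and multiply-accumulate (MAC) counts for a standard
    KxK convolution producing an HxW output (stride 1, 'same' padding)."""
    params = c_out * c_in * k * k          # Parameters = Cout x Cin x K^2
    macs = h * w * c_in * c_out * k * k    # Cost = H x W x Cin x Cout x K^2
    return params, macs

# Example: a 3x3 layer on a 56x56 feature map with 128 -> 256 channels.
params, macs = conv2d_cost(56, 56, 128, 256, 3)
# params = 294,912 weights; macs ~ 0.92 billion multiply-accumulates,
# which illustrates why standard convolution dominates CNN compute.
```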
3. Spatially Separable Convolution
If a 2D kernel can be decomposed into two 1D filters (K(x, y) = f(x)·g(y)), convolution can be
performed in two steps: horizontal and vertical. This reduces the cost from O(K²) to O(2K), resulting
in fewer computations, especially for large kernels.
Efficiency Gain: O(K²) → O(2K)
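The decomposition can be checked numerically. The sketch below (NumPy, with a naive loop-based convolution written for clarity, not speed) uses a Sobel kernel, which is rank-1 and therefore factors exactly into a column filter f(x) and a row filter g(y); applying the two 1D passes reproduces the full 2D result:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' sliding-window convolution (CNN-style, no kernel flip)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel factors as K(x, y) = f(x) . g(y): K^2 multiplies -> 2K multiplies per pixel.
col = np.array([[1.0], [2.0], [1.0]])    # f(x): vertical smoothing
row = np.array([[1.0, 0.0, -1.0]])       # g(y): horizontal gradient
sobel = col @ row                        # full 3x3 kernel

img = np.random.default_rng(0).random((8, 8))
full = conv2d_valid(img, sobel)                        # one 2D pass
two_pass = conv2d_valid(conv2d_valid(img, col), row)   # two 1D passes
assert np.allclose(full, two_pass)
```

Note that only rank-1 kernels (Gaussian, Sobel, box filters) decompose exactly; an arbitrary learned kernel generally does not.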
4. Depthwise Separable Convolution
Depthwise separable convolution, introduced in the MobileNet and Xception architectures, splits convolution into two stages:
1. Depthwise Convolution: applies one K×K filter per input channel.
2. Pointwise Convolution: combines the depthwise outputs via a 1×1 convolution.
This dramatically reduces the number of parameters and operations, yielding roughly 8–9× fewer multiply-accumulates for 3×3 kernels.
Cost = HW(CinK² + CinCout)
Reduction Ratio = (1/Cout + 1/K²)
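The cost and reduction-ratio formulas above can be verified with a few lines of arithmetic (the layer sizes below are illustrative):

```python
def standard_cost(h, w, c_in, c_out, k):
    """MACs for a standard KxK convolution."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_cost(h, w, c_in, c_out, k):
    """MACs for depthwise (one KxK filter per channel) + pointwise (1x1 mix),
    i.e. Cost = HW(Cin*K^2 + Cin*Cout)."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

std = standard_cost(56, 56, 128, 256, 3)
sep = depthwise_separable_cost(56, 56, 128, 256, 3)
ratio = sep / std   # equals 1/Cout + 1/K^2 = 1/256 + 1/9, i.e. ~8.7x cheaper
```

For 3×3 kernels the ratio is dominated by the 1/K² = 1/9 term, which is where the often-quoted ~8× saving comes from.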
5. FFT (Fast Fourier Transform) Convolution
Using the convolution theorem, FFT convolution transforms both image and kernel into the
frequency domain, performs element-wise multiplication, and applies an inverse FFT. This reduces
computational complexity from O(HWK²) to O(N log N), where N = HW is the number of pixels, making it effective for large kernels.
x * h = F⁻¹(F(x) · F(h))
Complexity: O(N log N)
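The convolution theorem above can be demonstrated directly with NumPy's FFT routines. This sketch zero-pads both signals to the full output size so the circular convolution computed in the frequency domain matches the linear one (sizes are illustrative; FFT only pays off once kernels get large):

```python
import numpy as np

def fft_convolve2d(x, h):
    """Linear 2D convolution via x * h = F^-1(F(x) . F(h)),
    zero-padded to avoid circular wrap-around."""
    out_shape = (x.shape[0] + h.shape[0] - 1, x.shape[1] + h.shape[1] - 1)
    X = np.fft.rfft2(x, out_shape)
    H = np.fft.rfft2(h, out_shape)
    return np.fft.irfft2(X * H, out_shape)

rng = np.random.default_rng(0)
x = rng.random((32, 32))
h = rng.random((9, 9))

# Reference: direct 'full' convolution as a sum of shifted, scaled copies of x.
direct = np.zeros((40, 40))
for i in range(9):
    for j in range(9):
        direct[i:i + 32, j:j + 32] += h[i, j] * x

assert np.allclose(fft_convolve2d(x, h), direct)
```

In practice deep-learning frameworks select FFT-based kernels automatically when the kernel is large enough for the O(N log N) transforms to beat direct O(HWK²) computation.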
6. Comparison of Convolution Techniques
Method              | Concept                 | Parameters           | Complexity                 | Use Case
Standard            | Full kernel per output  | Cin·Cout·K²          | O(HW·Cin·Cout·K²)          | General CNNs
Spatially Separable | Decompose 2D kernel     | 2K per kernel        | O(2K) per output pixel     | Gaussian filters
Depthwise Separable | Channel-wise + 1×1 mix  | Cin·K² + Cin·Cout    | O(HW(Cin·K² + Cin·Cout))   | MobileNet, Xception
FFT-based           | Frequency-domain conv   | Depends on FFT size  | O(N log N)                 | Large kernels
7. Conclusion
Efficient convolutional operations are vital for optimizing CNN architectures. Spatially separable
convolutions simplify symmetric filters, depthwise separable convolutions reduce redundancy for
lightweight networks, and FFT-based convolutions accelerate large kernel computations. These
methods collectively enable real-time and embedded AI applications with minimal computational
resources.