Convolutional Neural Networks (CNNs) in R
Last Updated: 23 Jul, 2025
Convolutional Neural Networks (CNNs) are a specialized type of neural network designed to process and analyze visual data. They are particularly effective for tasks involving image recognition and classification due to their ability to automatically and adaptively learn spatial hierarchies of features.
CNN Architecture Components
A typical CNN is assembled from the following components:
- Convolutional Layers: The core building blocks of CNNs, convolutional layers apply a set of filters (or kernels) to the input image. These filters slide over the image, performing element-wise multiplication with the input data and summing the results to produce a feature map. This process helps in capturing local patterns and textures in the image.
- Activation Functions: After convolution, activation functions such as ReLU (Rectified Linear Unit) are applied to introduce non-linearity into the model. This helps the network learn more complex patterns.
- Pooling Layers: Pooling (or subsampling) layers reduce the dimensionality of the feature maps, retaining the most important information while discarding less significant details. Max pooling, which selects the maximum value from a set of values, is a common technique.
- Fully Connected Layers: After several convolutional and pooling layers, the high-level features are flattened and passed through fully connected layers (dense layers). These layers combine the features to make predictions or classifications.
- Dropout Layers: Dropout is a regularization technique that helps prevent overfitting by randomly dropping units during training, which forces the network to learn more robust features.
- Output Layer: The final layer produces the output, such as class scores in a classification task or bounding box coordinates in object detection.
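The convolution, activation, and pooling steps described above can be sketched in base R. This is a minimal illustration on a toy 6x6 image, not how keras implements these layers; the helper names conv2d_valid, relu, and max_pool are hypothetical and were chosen for this example only.

```r
# Valid (no padding), stride-1 2D convolution of a matrix with a kernel.
# As in most deep learning libraries, the kernel is not flipped
# (strictly speaking this is cross-correlation).
conv2d_valid <- function(image, kernel) {
  kh <- nrow(kernel)
  kw <- ncol(kernel)
  out <- matrix(0, nrow(image) - kh + 1, ncol(image) - kw + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      patch <- image[i:(i + kh - 1), j:(j + kw - 1)]
      out[i, j] <- sum(patch * kernel)  # element-wise multiply, then sum
    }
  }
  out
}

# ReLU keeps positive values and replaces negatives with zero
relu <- function(x) pmax(x, 0)

# Non-overlapping max pooling; rows/columns that do not fill a window are dropped
max_pool <- function(x, size = 2) {
  out <- matrix(0, nrow(x) %/% size, ncol(x) %/% size)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      rows <- ((i - 1) * size + 1):(i * size)
      cols <- ((j - 1) * size + 1):(j * size)
      out[i, j] <- max(x[rows, cols])
    }
  }
  out
}

# Toy 6x6 image: dark left half (0), bright right half (1)
image <- matrix(rep(c(0, 0, 0, 1, 1, 1), each = 6), nrow = 6)
# Vertical-edge kernel: right column minus left column
kernel <- matrix(c(-1, -1, -1, 0, 0, 0, 1, 1, 1), nrow = 3)

feature_map <- relu(conv2d_valid(image, kernel))  # 4x4, strongest response at the edge
pooled <- max_pool(feature_map, size = 2)         # 2x2 summary of the feature map
print(feature_map)
print(pooled)
```

The feature map responds only where the kernel straddles the dark-to-bright boundary, and pooling halves each spatial dimension while keeping that response.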
Applications of CNNs
- Image Recognition and Classification: CNNs are widely used for identifying and categorizing objects within images. For instance, they can classify images into different categories such as animals, vehicles, or scenes.
- Object Detection: CNNs can be employed to detect objects within an image and determine their locations. This is useful for applications like autonomous driving, where identifying and locating objects is crucial.
- Image Segmentation: In segmentation tasks, CNNs divide an image into segments and classify each segment. This is used in medical imaging for segmenting organs or tumors, or in autonomous vehicles for distinguishing road lanes and obstacles.
- Facial Recognition: CNNs can recognize and verify individuals based on facial features, which is used in security systems and social media applications.
- Style Transfer: CNNs can apply artistic styles from one image to another, creating visually appealing effects by learning the style features from a reference image and transferring them to a target image.
- Generative Models: CNNs are used in generative models like Generative Adversarial Networks (GANs) to create new images or data that mimic the distribution of a training dataset.
To set up a Convolutional Neural Network (CNN) in the R programming language, you need to install and configure the necessary packages and preprocess your dataset.
Step 1: Installing the required packages
You need to install the keras and tensorflow packages, which provide the tools for building and training CNN models in R. These packages interface with TensorFlow, a popular deep learning framework.
R
# Install the Keras package from CRAN
install.packages("keras")
# Load the Keras library
library(keras)
# Install TensorFlow and Keras Python packages
# This command installs TensorFlow and Keras for Python and sets up the necessary environment
install_keras()
# Verify that TensorFlow loads and can create a tensor
library(tensorflow)
tf$constant("Hello TensorFlow!")
Output:
tf.Tensor(b'Hello TensorFlow!', shape=(), dtype=string)
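Before reinstalling anything, it can be useful to check which of the required R packages are already present. The following is a small base R sketch; the variable names required, installed, and missing_pkgs are illustrative, and the check only covers the R packages, not the Python backend that install_keras() sets up.

```r
# R packages this tutorial relies on
required <- c("keras", "tensorflow")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report anything that still needs install.packages()
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```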
Step 2: Loading and Preprocessing Datasets
For demonstration purposes, we’ll use the MNIST dataset, a classic dataset for image classification tasks. MNIST consists of handwritten digits (0-9) and is commonly used for benchmarking image processing models.
R
# Load the keras library
library(keras)
# Load the MNIST dataset
mnist <- dataset_mnist()
# Split into training and testing datasets
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
- dataset_mnist(): This function loads the MNIST dataset, a classic dataset used for training image processing systems.
- mnist: This variable now contains the MNIST dataset, which includes both the training and test datasets. It is structured as a list with training data and test data, including images and labels.
- x_train: Contains the images for training. It is a 3D array with dimensions (number of training samples, image height, image width). Each image is a grayscale image of size 28x28 pixels.
- y_train: Contains the labels for the training images. It is a vector of integers representing the digit each image corresponds to (0-9).
- x_test: Contains the images for testing. It has the same structure as x_train, but for the test dataset.
- y_test: Contains the labels for the test images, similar to y_train but for the test dataset.
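To get a feel for this layout without downloading the data, the sketch below builds a small random stand-in with the same structure as x_train and y_train. The names x_demo and y_demo and the sample count of 5 are assumptions made for illustration; the values are random noise, not real digits.

```r
set.seed(42)
n <- 5  # hypothetical number of stand-in samples (MNIST has 60,000 training images)

# Same layout as x_train: (samples, height, width), integer intensities 0..255
x_demo <- array(sample(0:255, n * 28 * 28, replace = TRUE), dim = c(n, 28, 28))
# Same layout as y_train: one integer label (0-9) per sample
y_demo <- sample(0:9, n, replace = TRUE)

print(dim(x_demo))              # samples, height, width
first_image <- x_demo[1, , ]    # indexing the first axis yields one 28x28 matrix
print(dim(first_image))
print(range(x_demo))            # raw pixel values lie between 0 and 255
print(table(y_demo))            # number of stand-in samples per digit
```

The same calls (dim(), x_train[1, , ], range(), table()) can be applied to the real x_train and y_train once the dataset is loaded.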
Preprocess the Data
Reshape and Normalize Images.
R
# Reshape the images to (28, 28, 1) and normalize pixel values to the range [0, 1]
x_train <- array_reshape(x_train, c(nrow(x_train), 28, 28, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), 28, 28, 1))
x_train <- x_train / 255
x_test <- x_test / 255
- array_reshape(): This function reshapes the data to the specified dimensions. It is used here to adjust the shape of the images for compatibility with the CNN model.
- x_train and x_test: Initially, these are 3D arrays with dimensions (number of samples, image height, image width). MNIST images are 28x28 pixels in size. The target shape passed to array_reshape() has four components:
- nrow(x_train): Number of training samples (nrow(x_test) gives the number of test samples).
- 28, 28: Image dimensions (height and width).
- 1: Number of channels (grayscale images have one channel).
- Normalization: Pixel values in images typically range from 0 to 255. Normalizing these values to the range [0, 1] helps in improving the convergence and performance of the neural network.
- Division by 255: Each pixel value is divided by 255, which scales the pixel values from the original range [0, 255] to the new range [0, 1].
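The effect of these two steps can be checked on a small stand-in array in base R. This sketch assumes a hypothetical array x_demo with three random images. It uses dim<- instead of array_reshape(); the two agree here only because the new axis is a trailing axis of length 1. In general array_reshape() fills in row-major order while dim<- is column-major, so they are not interchangeable.

```r
set.seed(1)
n <- 3  # hypothetical stand-in sample count
x_demo <- array(sample(0:255, n * 28 * 28, replace = TRUE), dim = c(n, 28, 28))

# Append a channel axis of length 1; element order is unchanged,
# so x4[i, j, k, 1] equals x_demo[i, j, k]
x4 <- x_demo
dim(x4) <- c(dim(x_demo), 1)

# Scale pixel intensities from [0, 255] to [0, 1]
x4 <- x4 / 255

print(dim(x4))    # samples, height, width, channels
print(range(x4))  # all values now lie within [0, 1]
```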
Convert Labels to One-Hot Encoding
R
one_hot_encode <- function(labels, num_classes) {
  # Create a matrix of zeros
  encoded_labels <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Set the appropriate index to 1 for each label
  for (i in seq_along(labels)) {
    encoded_labels[i, labels[i] + 1] <- 1
  }
  return(encoded_labels)
}
# Apply the custom one-hot encoding function
y_train <- one_hot_encode(y_train, 10)
y_test <- one_hot_encode(y_test, 10)
- one_hot_encode(labels, num_classes): A custom function that converts class labels into a one-hot encoded matrix.
- labels: A vector of class labels, where each label is an integer indicating the class (e.g., 0, 1, 2, ..., 9).
- num_classes: The total number of classes (e.g., 10 for the MNIST dataset).
- encoded_labels: A matrix initialized with zeros, where the number of rows equals the number of labels, and the number of columns equals the number of classes.
- encoded_labels[i, labels[i] + 1] <- 1: For each label, the matrix is updated to set the corresponding index to 1. The +1 is necessary because R uses 1-based indexing, while the labels are 0-based.
- y_train: The training labels are converted to a one-hot encoded matrix with 10 columns (one for each class).
- y_test: The test labels are similarly converted to a one-hot encoded matrix.
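The loop above can also be written without explicit iteration by indexing the matrix with a two-column (row, column) matrix. The sketch below is one possible vectorized variant; the function name one_hot_vectorized and the sample labels are hypothetical. The keras package also exports to_categorical(), which should produce an equivalent matrix for 0-based integer labels.

```r
one_hot_vectorized <- function(labels, num_classes) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Each row of the index matrix is a (row, column) pair; +1 shifts 0-based labels
  encoded[cbind(seq_along(labels), labels + 1)] <- 1
  encoded
}

labels_demo <- c(5, 0, 4, 1, 9)  # hypothetical labels for illustration
encoded_demo <- one_hot_vectorized(labels_demo, 10)
print(encoded_demo)
```

For 60,000 labels the difference in speed is small, so the choice between the loop and the indexed assignment is mostly a matter of readability.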
Step 3: Summary of the Dataset
Next, check the structure of the preprocessed data.
R
# Check dimensions and type of data
str(x_train)
str(y_train)
Output:
num [1:60000, 1:28, 1:28, 1] 0 0 0 0 0 0 0 0 0 0 ...
num [1:60000, 1:10] 0 1 0 0 0 0 0 0 0 0 ...
- str(x_train): Displays the structure of the x_train object. For an image dataset like MNIST, it will show that x_train is a 4D array with dimensions (number of samples, image height, image width, number of channels). It will also display the type of data contained in the array.
- str(y_train): Displays the structure of the y_train object. For one-hot encoded labels, it will show that y_train is a matrix with dimensions (number of samples, number of classes), where each row contains a one-hot encoded vector.
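Beyond str(), a few quick checks can confirm that the one-hot matrix is well formed. The sketch below runs them on a hypothetical three-row stand-in (y_demo); the same expressions can be applied to y_train.

```r
# Hypothetical stand-in for a one-hot label matrix: labels 5, 0, 4 over 10 classes
y_demo <- matrix(0, nrow = 3, ncol = 10)
y_demo[cbind(1:3, c(5, 0, 4) + 1)] <- 1

# Each row should contain exactly one 1
rows_ok <- all(rowSums(y_demo) == 1)
print(rows_ok)

# Recover the original 0-based labels from the column positions
decoded <- max.col(y_demo, ties.method = "first") - 1
print(decoded)

# Per-class counts, useful for spotting class imbalance
class_counts <- colSums(y_demo)
names(class_counts) <- 0:9
print(class_counts)
```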
Step 4: Building a CNN Model
Now we will build the CNN model.
R
# Initialize the model
model <- keras_model_sequential()
# Add convolutional layers
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  # Flatten the output from convolutional layers
  layer_flatten() %>%
  # Add fully connected layers
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 10, activation = 'softmax') # Output layer for 10 classes
- keras_model_sequential(): Initializes a sequential model, which is a linear stack of layers where you add layers one by one.
- layer_conv_2d(): Adds a 2D convolutional layer.
- filters: Number of filters (32, 64, 128) used to detect features.
- kernel_size: Size of the convolutional kernel (3x3).
- activation: Activation function used (ReLU in this case).
- input_shape: Shape of the input data for the first layer (28x28 pixels, 1 channel).
- layer_max_pooling_2d(): Adds a max pooling layer to reduce the spatial dimensions (height and width) of the feature maps.
- pool_size: Size of the pooling window (2x2).
- layer_flatten(): Flattens the 3D output from the convolutional layers into a 1D vector. This is necessary before passing the data to fully connected (dense) layers.
- layer_dense(): Adds a fully connected (dense) layer.
- units: Number of neurons in the layer (128 in the first dense layer, 10 in the output layer).
- activation: Activation function used (ReLU for the first dense layer, softmax for the output layer).
- layer_dropout(): Adds a dropout layer with a dropout rate of 0.5. This helps prevent overfitting by randomly setting a fraction of input units to zero during training.
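The feature-map sizes and parameter counts of this architecture can be worked out by hand. The sketch below does the arithmetic in base R, assuming the Keras defaults that the code above does not override: "valid" padding and stride 1 for convolutions, and a pooling stride equal to the pool size. The helper names are hypothetical.

```r
# Output size of a "valid" convolution with stride 1
conv_out <- function(size, kernel) size - kernel + 1
# Output size of non-overlapping pooling (partial windows are dropped)
pool_out <- function(size, pool) size %/% pool
# Weights per filter (kernel x kernel x input channels) plus one bias, times filters
conv_params <- function(kernel, in_ch, filters) (kernel * kernel * in_ch + 1) * filters
# One weight per input plus one bias, times units
dense_params <- function(inputs, units) (inputs + 1) * units

size <- 28
channels <- 1
total <- 0
for (filters in c(32, 64, 128)) {
  total <- total + conv_params(3, channels, filters)
  size <- pool_out(conv_out(size, 3), 2)
  channels <- filters
  cat(sprintf("after conv(%d) + pool: %d x %d x %d\n", filters, size, size, channels))
}

flat <- size * size * channels  # length of the vector produced by layer_flatten()
total <- total + dense_params(flat, 128) + dense_params(128, 10)
cat("flattened length:", flat, "\n")
cat("total trainable parameters:", total, "\n")
```

Under these assumptions the spatial size shrinks from 28 to 13, 5, and finally 1, so the flattened vector has 128 elements and the model has 110,474 trainable parameters. Calling summary(model) on the real model is the way to confirm these figures.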
Step 5: Compile and train the Model
Now we will compile and train the model.
R
model %>% compile(
  optimizer = optimizer_adam(),
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)
history <- model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 64,
  validation_split = 0.2
)
Output:
[Figure: Training the model]
- optimizer_adam(): Specifies the Adam optimizer, which is an adaptive learning rate optimization algorithm. Adam combines the advantages of two other popular optimizers: AdaGrad and RMSProp.
- 'categorical_crossentropy': This is the loss function used for multi-class classification problems where the labels are one-hot encoded. It measures how well the predicted probabilities match the actual distribution of the labels.
- c('accuracy'): Specifies the metric(s) to be evaluated during training and testing.
- x_train: The training data (images) used to train the model.
- y_train: The one-hot encoded labels corresponding to x_train.
- epochs = 10: Specifies the number of times the entire training set will be passed through the model during training. In this case, the model will be trained for 10 epochs.
- batch_size = 64: Determines the number of samples processed before the model is updated. A batch size of 64 means that the model weights will be updated after processing 64 samples.
- validation_split = 0.2: The fraction of the training data to be used as validation data. In this case, 20% of the training data will be reserved for validation to monitor the model's performance on unseen data during training.
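The settings above determine how many weight updates the model receives, and the loss can be reproduced by hand on a tiny example. The sketch below is base R arithmetic, not keras code; the function categorical_crossentropy and the two-sample matrices are hypothetical illustrations. Note that Keras takes the validation samples from the end of the supplied data, before any shuffling.

```r
# Number of weight updates implied by the training settings
n_total <- 60000          # MNIST training images
validation_split <- 0.2
batch_size <- 64
epochs <- 10

n_val <- n_total * validation_split               # samples held out for validation
n_train <- n_total - n_val                        # samples actually used for fitting
steps_per_epoch <- ceiling(n_train / batch_size)  # batches (updates) per epoch
total_updates <- steps_per_epoch * epochs
cat("updates per epoch:", steps_per_epoch, " total:", total_updates, "\n")

# Categorical cross-entropy for one-hot targets, averaged over samples
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)  # clip to avoid log(0)
  mean(-rowSums(y_true * log(y_pred)))
}

# Two hypothetical samples over three classes
y_true <- rbind(c(1, 0, 0), c(0, 1, 0))
y_pred <- rbind(c(0.7, 0.2, 0.1), c(0.1, 0.8, 0.1))
loss <- categorical_crossentropy(y_true, y_pred)
print(loss)
```

Because the targets are one-hot, only the predicted probability of the correct class contributes to each sample's loss: here the mean of -log(0.7) and -log(0.8).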
Step 6: Evaluate the Model
Now we will evaluate the model's performance on the test set.
R
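# Evaluate on the held-out test set
score <- model %>% evaluate(x_test, y_test)
print(score)
# Print evaluation results; [[ ]] indexing works for both a named list and a named vector
cat('Test loss:', score[["loss"]], '\n')
cat('Test accuracy:', score[["accuracy"]], '\n')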
Output:
$accuracy
[1] 0.9872
$loss
[1] 0.05498604
Test loss: 0.05498604
Test accuracy: 0.9872
- x_test: The test data (images) used to evaluate the model. This should be in the same format as the training data, typically a 4D array with shape (number of samples, height, width, channels).
- y_test: The one-hot encoded labels corresponding to x_test. This should be a 2D matrix with shape (number of samples, number of classes).
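The accuracy reported by evaluate() corresponds to the fraction of samples whose highest-probability class matches the true class. The sketch below reproduces that calculation in base R on hypothetical matrices (probs and truth); with a trained model, the probabilities would come from model %>% predict(x_test).

```r
# Hypothetical predicted probabilities for 4 samples and 3 classes
probs <- rbind(
  c(0.8, 0.1, 0.1),
  c(0.2, 0.7, 0.1),
  c(0.3, 0.3, 0.4),
  c(0.6, 0.3, 0.1)
)
# One-hot ground truth: true classes 0, 1, 2, 1
truth <- rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(0, 1, 0))

# Column of the largest value per row, shifted to 0-based class labels
pred_class <- max.col(probs, ties.method = "first") - 1
true_class <- max.col(truth, ties.method = "first") - 1

accuracy <- mean(pred_class == true_class)
print(accuracy)

# Confusion table: rows are true classes, columns are predicted classes
confusion <- table(true = true_class, predicted = pred_class)
print(confusion)
```

A confusion table like this shows which digits are mistaken for which, something the single accuracy figure hides.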
Step 7: Visualization of training and validation accuracy and loss
Now we will plot the training and validation accuracy and loss for each epoch.
R
# Plot training & validation accuracy values
plot(history$metrics$accuracy, type = 'l', col = 'blue', ylim = c(0, 1), xlab = 'Epoch',
     ylab = 'Accuracy', main = 'Model Accuracy')
lines(history$metrics$val_accuracy, type = 'l', col = 'red')
legend("bottomright", legend = c("Training Accuracy", "Validation Accuracy"),
       col = c("blue", "red"), lty = 1)
# Plot training & validation loss values
plot(history$metrics$loss, type = 'l', col = 'blue',
     ylim = c(0, max(history$metrics$loss, history$metrics$val_loss)), xlab = 'Epoch',
     ylab = 'Loss', main = 'Model Loss')
lines(history$metrics$val_loss, type = 'l', col = 'red')
legend("topright", legend = c("Training Loss", "Validation Loss"),
       col = c("blue", "red"), lty = 1)
Output:
[Figures: Model Accuracy and Model Loss, training vs. validation by epoch]
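The same metrics can be read numerically, for example to find the epoch with the lowest validation loss. The sketch below uses a hypothetical list, metrics_demo, shaped like history$metrics; its numbers are made up for illustration and are not results from the model above.

```r
# Hypothetical stand-in shaped like history$metrics (illustrative values only)
metrics_demo <- list(
  loss         = c(0.30, 0.12, 0.08, 0.06, 0.05),
  val_loss     = c(0.15, 0.09, 0.07, 0.08, 0.09),
  accuracy     = c(0.91, 0.96, 0.975, 0.981, 0.985),
  val_accuracy = c(0.955, 0.972, 0.979, 0.978, 0.977)
)

# Epoch at which the validation loss is lowest
best_epoch <- which.min(metrics_demo$val_loss)
cat("lowest validation loss at epoch", best_epoch, "\n")

# Gap between validation and training loss per epoch;
# a gap that keeps growing suggests the model is starting to overfit
gap <- metrics_demo$val_loss - metrics_demo$loss
print(round(gap, 3))
```

With a real training run, replacing metrics_demo with history$metrics gives the corresponding figures.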
Conclusion
CNNs are powerful models for image recognition and classification tasks. With the `keras` package in R, you can build, train, and evaluate CNNs effectively. The above example demonstrates how to implement a CNN for classifying handwritten digits from the MNIST dataset. By experimenting with different architectures and hyperparameters, you can optimize your CNN model for various image-related tasks.