Convolutional Neural Networks (CNNs) in R

Last Updated : 23 Jul, 2025

Convolutional Neural Networks (CNNs) are a specialized type of neural network designed to process and analyze visual data. They are particularly effective for tasks involving image recognition and classification due to their ability to automatically and adaptively learn spatial hierarchies of features.

CNN Architecture Components

The main components of a CNN architecture are:

Convolutional Layers: The core building blocks of CNNs, convolutional layers apply a set of filters (or kernels) to the input image. These filters slide over the image, performing element-wise multiplication with the input data and summing the results to produce a feature map. This process helps in capturing local patterns and textures in the image.
Activation Functions: After convolution, activation functions such as ReLU (Rectified Linear Unit) are applied to introduce non-linearity into the model. This helps the network learn more complex patterns.
Pooling Layers: Pooling (or subsampling) layers reduce the spatial size (height and width) of the feature maps, keeping the strongest responses while discarding less significant detail and reducing computation. Max pooling, which keeps the maximum value within each window (for example, each 2x2 region), is a common technique.
Fully Connected Layers: After several convolutional and pooling layers, the high-level features are flattened and passed through fully connected layers (dense layers). These layers combine the features to make predictions or classifications.
Dropout Layers: Dropout is a regularization technique that helps prevent overfitting by randomly dropping units during training, which forces the network to learn more robust features.
Output Layer: The final layer produces the output, such as class scores in a classification task or bounding box coordinates in object detection.
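To make the convolution, ReLU and pooling steps concrete, here is a minimal base-R sketch. It is only an illustration, not how keras implements these layers internally. It assumes a single-channel input, a stride of 1, no padding ("valid" convolution) and non-overlapping 2x2 pooling. The names `conv2d_valid`, `relu` and `max_pool_2x2` are hypothetical helpers, and the kernel is a hand-picked edge detector rather than a learned filter. Like most deep learning libraries, it computes cross-correlation, so the kernel is not flipped.

```r
# Hypothetical helper: "valid" 2D convolution (cross-correlation, stride 1,
# no padding) of a single-channel image with a single kernel
conv2d_valid <- function(img, kernel) {
  kh <- nrow(kernel); kw <- ncol(kernel)
  out_h <- nrow(img) - kh + 1
  out_w <- ncol(img) - kw + 1
  out <- matrix(0, nrow = out_h, ncol = out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      patch <- img[i:(i + kh - 1), j:(j + kw - 1), drop = FALSE]
      # Element-wise multiply the patch by the kernel, then sum
      out[i, j] <- sum(patch * kernel)
    }
  }
  out
}

# ReLU: replace negative values with zero
relu <- function(x) pmax(x, 0)

# Hypothetical helper: 2x2 max pooling with stride 2 (an odd last row/column is dropped)
max_pool_2x2 <- function(fmap) {
  out_h <- nrow(fmap) %/% 2
  out_w <- ncol(fmap) %/% 2
  out <- matrix(0, nrow = out_h, ncol = out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      window <- fmap[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)]
      out[i, j] <- max(window)
    }
  }
  out
}

# Toy 6x6 "image" with a vertical edge: dark left half, bright right half
img <- cbind(matrix(0, 6, 3), matrix(1, 6, 3))

# 3x3 vertical-edge kernel: every row is (-1, 0, 1)
kernel <- matrix(c(-1, 0, 1), nrow = 3, ncol = 3, byrow = TRUE)

fmap <- relu(conv2d_valid(img, kernel))  # 4x4 feature map
pooled <- max_pool_2x2(fmap)             # 2x2 after pooling
print(fmap)
print(pooled)
```

Because this kernel responds to dark-to-bright transitions from left to right, each row of the feature map is 0, 3, 3, 0: the filter fires only where its window straddles the edge. Pooling then halves each spatial dimension while keeping those peak responses.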

Applications of CNNs

Image Recognition and Classification: CNNs are widely used for identifying and categorizing objects within images. For instance, they can classify images into different categories such as animals, vehicles, or scenes.
Object Detection: CNNs can be employed to detect objects within an image and determine their locations. This is useful for applications like autonomous driving, where identifying and locating objects is crucial.
Image Segmentation: In segmentation tasks, CNNs divide an image into segments and classify each segment. This is used in medical imaging for segmenting organs or tumors, or in autonomous vehicles for distinguishing road lanes and obstacles.
Facial Recognition: CNNs can recognize and verify individuals based on facial features, which is used in security systems and social media applications.
Style Transfer: CNNs can apply artistic styles from one image to another, creating visually appealing effects by learning the style features from a reference image and transferring them to a target image.
Generative Models: CNNs are used in generative models like Generative Adversarial Networks (GANs) to create new images or data that mimic the distribution of a training dataset.

To build CNNs in the R programming language, you need to install and configure the necessary packages and then preprocess your dataset.

Step 1: Installing the required packages

You need to install the keras and tensorflow packages, which provide the tools for building and training CNN models in R. These packages interface with TensorFlow, a popular deep learning framework.

# Install the Keras package from CRAN
install.packages("keras")

# Load the Keras library
library(keras)

# Install the TensorFlow and Keras Python packages and set up the Python
# environment they run in (this only needs to be run once per machine)
install_keras()

# Verify the installation by creating a TensorFlow constant
library(tensorflow)
tf$constant("Hello TensorFlow!")

Output:

tf.Tensor(b'Hello TensorFlow!', shape=(), dtype=string)

Step 2: Loading and Preprocessing Datasets

For demonstration purposes, we’ll use the MNIST dataset, a classic dataset for image classification tasks. MNIST consists of handwritten digits (0-9) and is commonly used for benchmarking image processing models.

# Load the keras library
library(keras)

# Load the MNIST dataset
mnist <- dataset_mnist()

# Split into training and testing datasets
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

dataset_mnist(): This function loads the MNIST dataset, a classic dataset used for training image processing systems.
mnist: This variable now contains the MNIST dataset, which includes both the training and test datasets. It is structured as a list with training data and test data, including images and labels.
x_train: Contains the training images as a 3D array with dimensions (number of training samples, image height, image width), that is, 60000 x 28 x 28. Each image is a 28x28 grayscale image with pixel values from 0 to 255.
y_train: Contains the labels for the training images. It is a vector of integers representing the digit each image corresponds to (0-9).
x_test: Contains the images for testing. It has the same structure as x_train, but for the test dataset.
y_test: Contains the labels for the test images, similar to y_train but for the test dataset.

Preprocess the Data

Reshape and Normalize Images.

# Reshape the images to (28, 28, 1) and normalize pixel values to the range [0, 1]
x_train <- array_reshape(x_train, c(nrow(x_train), 28, 28, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), 28, 28, 1))

x_train <- x_train / 255
x_test <- x_test / 255

array_reshape(): This function reshapes the data to the specified dimensions. It is used here to adjust the shape of the images for compatibility with the CNN model.
x_train and x_test: Initially, these are 3D arrays with dimensions (number of samples, image height, image width). MNIST images are 28x28 pixels in size.
nrow(x_train): Number of training samples.
28, 28: Image dimensions (height and width).
1: Number of channels (grayscale images have one channel).
Normalization: Pixel values in images typically range from 0 to 255. Scaling them to the range [0, 1] puts all inputs on a small, consistent scale, which generally helps the network converge faster and more reliably.
Division by 255: Each pixel value is divided by 255, which scales the pixel values from the original range [0, 255] to the new range [0, 1].
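If you want to check the preprocessing logic without downloading MNIST, the same two steps can be sketched in base R on a small synthetic batch. The random pixel values below are purely illustrative stand-ins for real images. Because only a trailing axis of size 1 is appended, base R's column-major `dim<-` produces the same element layout here as `array_reshape()`, which uses row-major (C-style) ordering. For reshapes that merge or split existing axes (for example, flattening each 28x28 image into a 784-element vector) the two orderings differ, which is why the article uses `array_reshape()`.

```r
set.seed(42)

# Synthetic stand-in for MNIST: 5 "images" of 28x28 integer pixels in [0, 255]
n <- 5
x <- array(sample(0:255, n * 28 * 28, replace = TRUE), dim = c(n, 28, 28))

# Append a channel axis of size 1: (samples, height, width, channels)
x4 <- x
dim(x4) <- c(dim(x), 1)

# Scale pixel values from [0, 255] to [0, 1]
x4 <- x4 / 255

str(x4)
range(x4)
```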

Convert Labels to One-Hot Encoding

one_hot_encode <- function(labels, num_classes) {
  # Create a matrix of zeros
  encoded_labels <- matrix(0, nrow = length(labels), ncol = num_classes)
  
  # Set the appropriate index to 1 for each label
  for (i in seq_along(labels)) {
    encoded_labels[i, labels[i] + 1] <- 1
  }
  
  return(encoded_labels)
}

# Apply the custom one-hot encoding function
y_train <- one_hot_encode(y_train, 10)
y_test <- one_hot_encode(y_test, 10)

one_hot_encode(labels, num_classes): A custom function that converts class labels into a one-hot encoded matrix.
labels: A vector of class labels, where each label is an integer indicating the class (e.g., 0, 1, 2, ..., 9).
num_classes: The total number of classes (e.g., 10 for the MNIST dataset).
encoded_labels: A matrix initialized with zeros, where the number of rows equals the number of labels, and the number of columns equals the number of classes.
encoded_labels[i, labels[i] + 1] <- 1: For each label, the matrix is updated to set the corresponding index to 1. The +1 is necessary because R uses 1-based indexing, while the labels are 0-based.
y_train: The training labels are converted to a one-hot encoded matrix with 10 columns (one for each class).
y_test: The test labels are similarly converted to a one-hot encoded matrix.
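The loop above is easy to follow, but the same encoding can be written without an explicit loop by indexing the matrix with a two-column (row, column) index matrix. This is a base-R sketch; `one_hot_vectorized` is a hypothetical name. The keras package also provides `to_categorical()`, which performs this conversion.

```r
one_hot_vectorized <- function(labels, num_classes) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Each row of the index matrix is (row number, label + 1);
  # the +1 shifts 0-based labels to R's 1-based column indices
  encoded[cbind(seq_along(labels), labels + 1)] <- 1
  encoded
}

labels <- c(5, 0, 4, 1, 9)
one_hot_vectorized(labels, 10)
```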

Step 3: Summary of the Dataset

Next, check the structure of the preprocessed data.

# Check dimensions and type of data
str(x_train)
str(y_train)

Output:

num [1:60000, 1:28, 1:28, 1] 0 0 0 0 0 0 0 0 0 0 ...
num [1:60000, 1:10] 0 1 0 0 0 0 0 0 0 0 ...

str(x_train): Displays the structure of the x_train object. For an image dataset like MNIST, it will show that x_train is a 4D array with dimensions (number of samples, image height, image width, number of channels). It will also display the type of data contained in the array.
str(y_train): Displays the structure of the y_train object. For one-hot encoded labels, it will show that y_train is a matrix with dimensions (number of samples, number of classes), where each row contains a one-hot encoded vector.

Step 4: Building a CNN Model

Next, build the CNN model.

# Initialize the model
model <- keras_model_sequential() 

# Add convolutional layers
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = 'relu', 
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  
  # Flatten the output from convolutional layers
  layer_flatten() %>%
  
  # Add fully connected layers
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 10, activation = 'softmax')  # Output layer for 10 classes

keras_model_sequential(): Initializes a sequential model, which is a linear stack of layers where you add layers one by one.
layer_conv_2d(): Adds a 2D convolutional layer.
filters: Number of filters (32, 64, 128) used to detect features.
kernel_size: Size of the convolutional kernel (3x3).
activation: Activation function used (ReLU in this case).
input_shape: Shape of the input data for the first layer (28x28 pixels, 1 channel).
layer_max_pooling_2d(): Adds a max pooling layer to reduce the spatial dimensions (height and width) of the feature maps.
pool_size: Size of the pooling window (2x2).
layer_flatten(): Flattens the 3D output from the convolutional layers into a 1D vector. This is necessary before passing the data to fully connected (dense) layers.
layer_dense(): Adds a fully connected (dense) layer.
units: Number of neurons in the layer (128 in the first dense layer, 10 in the output layer).
activation: Activation function used (ReLU for the first dense layer, softmax for the output layer).
layer_dropout(): Adds a dropout layer with a dropout rate of 0.5. This helps prevent overfitting by randomly setting a fraction of input units to zero during training.
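It can help to trace how the tensor shape changes through this stack. The base-R sketch below assumes the keras defaults for these layers: stride 1 and "valid" (unpadded) convolution, so each spatial size becomes input - kernel + 1, and max pooling with a stride equal to the pool size, so each spatial size becomes floor(input / pool). It also counts trainable parameters as (kernel height x kernel width x input channels + 1) x filters for convolutions and (inputs + 1) x units for dense layers, where the +1 is the bias. The script is illustrative; in practice `summary(model)` reports the same information.

```r
# Trace output shapes and trainable parameter counts for the model above.
# Assumes keras defaults: stride 1 and "valid" padding for conv layers,
# stride equal to the pool size for max pooling.
h <- 28; w <- 28; ch <- 1
total_params <- 0

conv_filters <- c(32, 64, 128)
for (f in conv_filters) {
  params <- (3 * 3 * ch + 1) * f           # weights + one bias per filter
  h <- h - 3 + 1; w <- w - 3 + 1; ch <- f  # 3x3 "valid" convolution
  cat(sprintf("conv_2d (%d filters): %d x %d x %d, params = %d\n",
              f, h, w, ch, params))
  total_params <- total_params + params
  h <- h %/% 2; w <- w %/% 2               # 2x2 max pooling
  cat(sprintf("max_pooling_2d: %d x %d x %d\n", h, w, ch))
}

flat <- h * w * ch                         # layer_flatten()
cat(sprintf("flatten: %d\n", flat))

dense_units <- c(128, 10)                  # layer_dropout() adds no parameters
n_in <- flat
for (u in dense_units) {
  params <- (n_in + 1) * u                 # weights + one bias per unit
  cat(sprintf("dense (%d units): params = %d\n", u, params))
  total_params <- total_params + params
  n_in <- u
}
cat(sprintf("Total trainable parameters: %d\n", total_params))
```

Under these assumptions, the third pooling layer reduces the feature map to 1 x 1 x 128, so the flattened vector has 128 elements and the model has 110,474 trainable parameters in total.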

Step 5: Compile and train the Model

Next, compile and train the model.

model %>% compile(
  optimizer = optimizer_adam(),
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 64,
  validation_split = 0.2
)

optimizer_adam(): Specifies the Adam optimizer, which is an adaptive learning rate optimization algorithm. Adam combines the advantages of two other popular optimizers: AdaGrad and RMSProp.
'categorical_crossentropy': This is the loss function used for multi-class classification problems where the labels are one-hot encoded. It measures how well the predicted probabilities match the actual distribution of the labels.
c('accuracy'): Specifies the metric(s) to be evaluated during training and testing.
x_train: The training data (images) used to train the model.
y_train: The one-hot encoded labels corresponding to x_train.
epochs = 10: Specifies the number of times the entire dataset will be passed through the model during training. In this case, the model will be trained for 10 epochs.
batch_size = 64: Determines the number of samples processed before the model is updated. A batch size of 64 means that the model weights will be updated after processing 64 samples.
validation_split = 0.2: The fraction of the training data held out as validation data. Here, 20% of the training samples (Keras takes them from the end of the data, before shuffling) are not used to update the weights and instead monitor the model's performance on unseen data after each epoch.
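To see what 'categorical_crossentropy' measures, here is a small base-R sketch for a single example. The softmax in the output layer turns raw scores (logits) into probabilities, and the loss is the negative log of the probability assigned to the true class. The logits are made-up numbers, and `softmax` and `categorical_crossentropy` here are hypothetical helpers, not the keras functions; the clipping step mirrors the small epsilon keras applies to keep `log()` finite. Keras averages this per-sample loss over the samples in each batch.

```r
# Numerically stable softmax: subtracting the max does not change the result
softmax <- function(logits) {
  z <- exp(logits - max(logits))
  z / sum(z)
}

# Cross-entropy between a one-hot label vector and predicted probabilities
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  # Clip probabilities away from 0 and 1 so log() stays finite
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)
  -sum(y_true * log(y_pred))
}

logits <- c(2.0, 1.0, 0.1)   # illustrative raw scores for 3 classes
probs  <- softmax(logits)
y_true <- c(1, 0, 0)         # one-hot label: the true class is the first

probs
categorical_crossentropy(y_true, probs)  # equals -log(probs[1])

# Steps per epoch in the fit() call above: holding out 20% of the
# 60,000 training images leaves 48,000, processed in batches of 64
ceiling(48000 / 64)
```

A confident, correct prediction drives the loss toward 0, while assigning a low probability to the true class makes it large.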

Step 6: Evaluate the Model

Next, evaluate the model's performance on the test set.

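# Evaluate the trained model on the test set
score <- model %>% evaluate(x_test, y_test)
print(score)

# Print evaluation results; [[ ]] works whether score is a named list or a named vector
cat('Test loss:', score[["loss"]], '\n')
cat('Test accuracy:', score[["accuracy"]], '\n')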

Output:

$accuracy
[1] 0.9872

$loss
[1] 0.05498604

Test loss: 0.05498604
Test accuracy: 0.9872

x_test: The test data (images) used to evaluate the model. This should be in the same format as the training data, typically a 4D array with shape (number of samples, height, width, channels).
y_test: The one-hot encoded labels corresponding to x_test. This should be a 2D matrix with shape (number of samples, number of classes).
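The accuracy reported by `evaluate()` is the fraction of test images whose highest-probability class matches the true label. Assuming you have a matrix of predicted probabilities (for example from `predict(model, x_test)`, with one row per image and one column per digit), the same figure can be computed in base R by comparing the column position of each row's maximum. The small matrices below are made-up values for illustration.

```r
# Made-up predicted probabilities for 4 images over 3 classes
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.3, 0.3, 0.4,
                  0.6, 0.3, 0.1),
                nrow = 4, byrow = TRUE)

# One-hot true labels: classes 1, 2, 2, 1 (1-based column positions)
y_true <- matrix(c(1, 0, 0,
                   0, 1, 0,
                   0, 1, 0,
                   1, 0, 0),
                 nrow = 4, byrow = TRUE)

# Predicted and true classes are the column positions of each row's maximum
# (subtract 1 to map them to 0-based digit labels such as MNIST's 0-9)
pred_class <- max.col(probs, ties.method = "first")
true_class <- max.col(y_true, ties.method = "first")

accuracy <- mean(pred_class == true_class)
accuracy
```

Three of the four predictions match, so the accuracy here is 0.75.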

Step 7: Visualizing Training and Validation Accuracy and Loss

Next, plot the training and validation accuracy and loss values recorded in history during training.

# Plot training & validation accuracy values
plot(history$metrics$accuracy, type = 'l', col = 'blue', ylim = c(0, 1), xlab = 'Epoch', 
     ylab = 'Accuracy', main = 'Model Accuracy')
lines(history$metrics$val_accuracy, type = 'l', col = 'red')
legend("bottomright", legend = c("Training Accuracy", "Validation Accuracy"), 
       col = c("blue", "red"), lty = 1)

# Plot training & validation loss values
plot(history$metrics$loss, type = 'l', col = 'blue', 
     ylim = c(0, max(history$metrics$loss, history$metrics$val_loss)), xlab = 'Epoch', 
     ylab = 'Loss', main = 'Model Loss')
lines(history$metrics$val_loss, type = 'l', col = 'red')
legend("topright", legend = c("Training Loss", "Validation Loss"), 
       col = c("blue", "red"), lty = 1)

Output:

[Figure: Model Loss, training (blue) and validation (red) loss by epoch]

Conclusion

CNNs are powerful models for image recognition and classification tasks. With the `keras` package in R, you can build, train, and evaluate CNNs effectively. The above example demonstrates how to implement a CNN for classifying handwritten digits from the MNIST dataset. By experimenting with different architectures and hyperparameters, you can optimize your CNN model for various image-related tasks.

arun_2810
