Convolutional Variational Autoencoder in Tensorflow
Last Updated : 24 Apr, 2025
In the age of Generative AI, generative models are central to learning and synthesizing the complex data distributions found in a dataset. Combining convolutional layers with a Variational Autoencoder gives one such generative model. In this article, we discuss the Convolutional Variational Autoencoder (CVAE) and implement it in TensorFlow.
Convolutional Variational Autoencoder
A Convolutional Variational Autoencoder is a generative model that combines the strengths of convolutional neural networks and variational autoencoders. A Variational Autoencoder (VAE) is an unsupervised learning model that learns a latent representation of data by encoding each input into a probability distribution and then reconstructing the input from a sample of that distribution, which lets the model generate new, similar data points. In a CVAE, the encoder and decoder are built from convolutional layers, which are adept at capturing spatial hierarchies within data and are therefore well suited to image-related tasks. CVAEs also use variational inference, which introduces probabilistic elements into the encoding-decoding process. Instead of producing a fixed latent representation, the encoder outputs a probability distribution over the latent space, so the model learns a range of possible representations for each input rather than a single deterministic one. The key working principles are discussed below:
Convolutional Layers: CVAE uses convolutional layers to efficiently capture spatial hierarchies and local patterns within images. This lets the model recognize features at different scales and gives a robust representation of the input data.
Variational Inference: Variational inference lets the CVAE capture uncertainty in the latent space. The encoder outputs a probability distribution rather than a single deterministic latent vector, which gives a richer picture of the data distribution and lets the model explore diverse regions of the latent space.
Reparameterization Trick: Sampling directly from the learned latent distribution would block gradients. The trick rewrites the sample as z = mu + sigma * epsilon, with epsilon drawn from a standard normal distribution, so the randomness is isolated in epsilon and gradients can be backpropagated through mu and sigma during training.
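The reparameterization trick can be illustrated with a minimal NumPy sketch. It only shows the sampling arithmetic, not gradient computation. The names z_mu and z_log_sigma follow the sampling code later in the article; the numeric values and the reparameterize helper are made up for illustration.

```python
import numpy as np

def reparameterize(z_mu, z_log_sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    # All randomness lives in eps, so z is a deterministic function
    # of z_mu and z_log_sigma once eps has been drawn.
    eps = rng.standard_normal(size=z_mu.shape)
    return z_mu + np.exp(z_log_sigma) * eps

rng = np.random.default_rng(0)
z_mu = np.array([[0.5, -1.0]])          # hypothetical encoder mean (latent_dim = 2)
z_log_sigma = np.array([[-2.0, -2.0]])  # hypothetical log standard deviation
z = reparameterize(z_mu, z_log_sigma, rng)
print(z.shape)  # (1, 2)
```

Because the noise is an input rather than part of the computation of mu and sigma, a framework such as TensorFlow can differentiate z with respect to both.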
Convolutional Variational Autoencoder in Tensorflow
Import required libraries
First, we import the required Python libraries: NumPy, Matplotlib, TensorFlow and Keras. We disable eager execution so that TensorFlow builds and runs the model in graph mode, and we clear the Keras backend session to start from a clean slate.
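The setup code is not shown in this section. A minimal sketch of it, assuming TensorFlow 2.x, might look like the following. The alias k for the Keras backend matches the sampling code later in the article; the exact import lines are an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import backend as k

# Run in graph mode instead of TensorFlow 2's default eager mode.
tf.compat.v1.disable_eager_execution()

# Drop any state left over from previously built Keras models.
k.clear_session()
```

Eager execution has to be disabled at program start, before any models or tensors are created.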
Next, we load the MNIST dataset, convert the pixel values to float32 and scale them to the range [0, 1]. We then reshape every image to a fixed shape of (28, 28, 1). Finally, we define a small function (get_images_1_to_10) that selects one random image for each of the labels 0 to 9.
Python3
# load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
image_shape = (28, 28, 1)
latent_dim = 2

# change datatype, scale to [0, 1] and reshape data
x_train = x_train.astype('float32') / 255.
x_train = x_train.reshape((x_train.shape[0],) + image_shape)
x_test = x_test.astype('float32') / 255.
x_test = x_test.reshape((x_test.shape[0],) + image_shape)
Fetch one image of each digit
Python3
# function to fetch 10 images of label 0 to 9
def get_images_1_to_10(x_train, y_train):
    selected_x, selected_y = [], []
    for i in range(10):
        number_index = np.where(y_train == i)[0]
        random_index = np.random.choice(len(number_index), 1, replace=False)
        select_index = number_index[random_index]
        selected_x.append(x_train[select_index[0]])
        selected_y.append(y_train[select_index][0])
    return (
        np.array(selected_x, dtype="float32").reshape((len(selected_x),) + image_shape),
        np.array(selected_y, dtype="float32"),
    )

# select random 10 image of labeled 0 to 9
selected_x, selected_y = get_images_1_to_10(x_train, y_train)
An autoencoder has three parts: an encoder, a bottleneck (latent space) layer, and a decoder (output). In the convolutional VAE, the encoder and decoder are built from two-dimensional convolutional layers, and the bottleneck is a variational layer whose size is set by latent_dim. Let's build the model layer by layer to gain better insight into its structure.
Encoder Layer
The encoder consists of one input layer followed by two convolutional layers and dense layers.
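The encoder code is not shown in this section, so the following is only a minimal sketch of the description above, assuming TensorFlow 2.x. The filter counts, kernel sizes, strides and dense layer width are assumptions. Only the input shape (28, 28, 1), latent_dim = 2 and the output names z_mu and z_log_sigma come from the article's code.

```python
import tensorflow as tf

image_shape = (28, 28, 1)
latent_dim = 2

# Input layer followed by two convolutional layers (hyperparameters are illustrative).
encoder_input = tf.keras.layers.Input(shape=image_shape)
x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(encoder_input)  # 14x14x32
x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)  # 7x7x64
conv_shape = tuple(x.shape[1:])  # remembered so a decoder can mirror it

# Dense layers that produce the two parameters of the latent distribution.
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(32, activation="relu")(x)
z_mu = tf.keras.layers.Dense(latent_dim, name="z_mu")(x)
z_log_sigma = tf.keras.layers.Dense(latent_dim, name="z_log_sigma")(x)

encoder = tf.keras.Model(encoder_input, [z_mu, z_log_sigma], name="encoder")
```

Calling encoder.summary() prints the layer shapes, which is a quick way to confirm that each strided convolution halves the spatial size.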
Bottleneck or Latent Layer: This is the most important layer in any autoencoder. Here we define the sampling function, which draws a latent vector from the distribution described by the dense-layer outputs z_mu and z_log_sigma, and wrap it in a Lambda layer.
Python3
# sampling function for latent layer
def sampling(args):
    z_mu, z_log_sigma = args
    # epsilon in simple normal distribution
    epsilon = k.random_normal(shape=(k.shape(z_mu)[0], latent_dim), mean=0., stddev=1.)
    return z_mu + k.exp(z_log_sigma) * epsilon

z = tf.keras.layers.Lambda(sampling, output_shape=(latent_dim,))([z_mu, z_log_sigma])
Decoder Layer
We first define a reshaping layer and then stack the transposed convolutional layers. The decoder uses four transposed convolution ("deconvolution") layers. You can define a deeper model for better results, but each added layer increases the model's complexity and execution time, and more complex models need more machine resources.
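The decoder code is not shown in this section either. The sketch below is one possible reading of the description, assuming TensorFlow 2.x: a dense layer, a reshape, and four transposed convolutions that grow a 7x7 feature map back to 28x28. The feature-map shape (7, 7, 64), filter counts and strides are assumptions chosen for illustration.

```python
import tensorflow as tf

latent_dim = 2
conv_shape = (7, 7, 64)  # assumed shape of the encoder's last feature map

decoder_input = tf.keras.layers.Input(shape=(latent_dim,))
# Expand the latent vector and reshape it into a small feature map.
x = tf.keras.layers.Dense(conv_shape[0] * conv_shape[1] * conv_shape[2], activation="relu")(decoder_input)
x = tf.keras.layers.Reshape(conv_shape)(x)

# Four transposed convolutions; the two with strides=2 double the spatial size.
x = tf.keras.layers.Conv2DTranspose(64, 3, strides=1, padding="same", activation="relu")(x)  # 7x7
x = tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)  # 14x14
x = tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)  # 28x28
# Sigmoid keeps pixel values in [0, 1], matching the scaled input images.
decoder_output = tf.keras.layers.Conv2DTranspose(1, 3, padding="same", activation="sigmoid")(x)

decoder = tf.keras.Model(decoder_input, decoder_output, name="decoder")
```

The sigmoid output is a deliberate choice here because the article uses binary cross-entropy as the reconstruction loss, which expects values between 0 and 1.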
VAE models are generally not trained with a standard built-in loss function; most of the time they use a custom loss. Here we use a simple custom loss function that combines a reconstruction loss and a KL divergence loss.
We use binary cross-entropy as the reconstruction loss.
keras.metrics.binary_crossentropy is applied to the flattened input data x and the flattened reconstructed output z_decoded.
The KL divergence loss measures how far the learned latent distribution diverges from a second distribution, here the standard normal prior.
The formula used in the code involves the mean and log-variance of the latent distribution.
The total loss is the sum of the reconstruction loss and the KL divergence loss.
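The loss code itself is not shown in this section. As a rough NumPy sketch of the two terms listed above, the function below adds a per-sample binary cross-entropy to the closed-form KL divergence between N(mu, exp(log_var)) and a standard normal. The name vae_loss and the parameter z_log_var are hypothetical, and the sum is unweighted; the article's own code may scale the terms differently.

```python
import numpy as np

def vae_loss(x, x_decoded, z_mu, z_log_var, eps=1e-7):
    """Per-sample VAE loss: reconstruction (BCE) + KL divergence."""
    x = x.reshape(len(x), -1)
    # Clip to avoid log(0) for saturated pixels.
    x_decoded = np.clip(x_decoded.reshape(len(x_decoded), -1), eps, 1.0 - eps)
    # Binary cross-entropy, averaged over the pixels of each image.
    recon = -np.mean(x * np.log(x_decoded) + (1.0 - x) * np.log(1.0 - x_decoded), axis=-1)
    # Closed-form KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dimensions.
    kl = -0.5 * np.sum(1.0 + z_log_var - np.square(z_mu) - np.exp(z_log_var), axis=-1)
    return recon + kl

# Toy check: uniform grey images and a latent distribution equal to the prior.
x_demo = np.full((2, 28, 28, 1), 0.5)
loss = vae_loss(x_demo, x_demo, np.zeros((2, 2)), np.zeros((2, 2)))
print(loss)  # both entries equal ln(2), about 0.693, because the KL term is zero
```

When the latent distribution matches the prior exactly, the KL term vanishes and only the reconstruction term remains, which is what the toy check shows.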
Let's train the model with a batch size of 64 for four epochs. Training for more epochs is recommended for better results.
The input data is used as both the input x and the target y. This is common in autoencoder setups, where the goal is to reconstruct the input and the primary objective is to learn a compact representation of the input data, also referred to as the latent space.
By training the autoencoder with the same input and target data, the model learns to encode the relevant features of the input and decode them to reconstruct the original data. This is particularly useful for tasks where capturing the inherent structure and patterns in the data is the primary goal.
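To make the "same array as input and target" idea concrete, here is a small stand-alone sketch, assuming TensorFlow 2.x. It trains a deliberately tiny dense autoencoder on random data, not the article's CVAE, and reuses only the batch size of 64 and the four epochs mentioned above. The model and data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Random stand-in for image data: 256 flattened 28x28 "images" in [0, 1].
x = np.random.default_rng(0).random((256, 784)).astype("float32")

# A tiny dense autoencoder with a 2-unit bottleneck (illustrative only).
inputs = tf.keras.layers.Input(shape=(784,))
code = tf.keras.layers.Dense(2, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(code)
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# The same array is passed as both the input and the target.
history = autoencoder.fit(x, x, batch_size=64, epochs=4, verbose=0)
print(len(history.history["loss"]))  # 4, one loss value per epoch
```

With real images, the recorded loss values would be expected to fall from epoch to epoch as the bottleneck learns a useful code.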
Artificial Digit Image Generation
Model training is complete. Evaluating a CVAE is slightly different from evaluating other models. The usual model metrics, such as classification accuracy, say little about a generative model. The most practical way to evaluate it here is to put the original and the reconstructed images side by side and check how clear and similar they are. Note also that the values reported by our custom VAE loss are low because of how the loss is scaled (by a factor on the order of 1000), so they say little on their own. So, let's set the loss values aside and see how clear the reconstructed images are.
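The snippet that follows calls a plot_image helper whose definition does not appear in this section. One possible implementation is sketched below. The signature is inferred from the calls plot_image(images, labels, title=...), and everything else (layout, figure size, returning the figure) is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_image(images, labels, title):
    """Draw a row of grayscale digits with their labels (hypothetical helper)."""
    fig, axes = plt.subplots(1, len(images), figsize=(len(images), 1.8))
    fig.suptitle(title)
    for ax, image, label in zip(np.atleast_1d(axes), images, labels):
        ax.imshow(np.squeeze(image), cmap="gray")  # (28, 28, 1) -> (28, 28)
        ax.set_title(str(int(label)), fontsize=8)
        ax.axis("off")
    return fig

# Stand-in data with the same shapes as test_x and test_y.
demo_x = np.random.default_rng(0).random((10, 28, 28, 1))
demo_y = np.arange(10, dtype="float32")
fig = plot_image(demo_x, demo_y, title="Demo digits")
```

In a script, call plt.show() after plotting to display the figures; in a notebook they render automatically.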
Python3
# select random 10 image of labeled 0 to 9 from test dataset
test_x, test_y = get_images_1_to_10(x_test, y_test)
gen_x = cvae.predict(test_x)
plot_image(test_x, test_y, title="Original unique Test Digits")
plot_image(gen_x, test_y, title="Artificial Generated Digits")
Output:
Original vs Generated Images
From the output, we can see that the model reconstructs the digits well: there is only a slight difference from the original images. For more accurate reconstructions, train for more epochs and use more advanced loss reduction techniques.
Benefits of using Convolutional Variational Autoencoder (CVAE)
Some of the advantages of using a CVAE are listed below:
Image Generation: CVAE excels in generating realistic and diverse images by learning a probabilistic representation of the data distribution. It can create novel samples which retain the inherent characteristics of the training dataset.
Latent Space Exploration: The probabilistic nature of the latent space in CVAE allows for smooth and continuous transitions between different data points which enables meaningful exploration of the latent space, facilitating interpolation between diverse samples.
Uncertainty Modeling: Unlike deterministic autoencoders, CVAEs provide a measure of uncertainty in their predictions which makes them particularly valuable in applications where understanding the model's confidence is crucial like medical image analysis or anomaly detection.
Robustness to Variations: CVAEs are robust to variations in input data, which makes them suitable for tasks involving data with inherent variability, such as facial recognition under different lighting conditions or style transfer in images.
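The latent space exploration mentioned in the list above can be sketched with plain NumPy. The example interpolates linearly between two latent vectors. The two codes z_a and z_b are made-up values for a 2-dimensional latent space, and interpolate_latent is a hypothetical helper.

```python
import numpy as np

def interpolate_latent(z_start, z_end, steps):
    """Return `steps` points evenly spaced on the line from z_start to z_end."""
    weights = np.linspace(0.0, 1.0, steps)[:, None]  # shape (steps, 1)
    return (1.0 - weights) * z_start + weights * z_end

z_a = np.array([-1.5, 0.5])  # hypothetical latent code of one digit
z_b = np.array([2.0, -1.0])  # hypothetical latent code of another digit
path = interpolate_latent(z_a, z_b, steps=8)
print(path.shape)  # (8, 2)
```

Each row of path could then be passed to a trained decoder to render the intermediate images, which is where the smooth transitions between digits would become visible.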
Conclusion
We can conclude that our CVAE model is well structured, but real-world, complex datasets require a deeper and more advanced architecture and at least 20 epochs of training for better results.