
MEDIA CAPTION GENERATOR USING DEEP LEARNING TECHNIQUES

RAHUL N MANESH
SARANG C SANTHOSH
SHYAMKUMAR S
SREEHARI E S
CONTENTS
INTRODUCTION
OBJECTIVES
ARCHITECTURE
METHODOLOGY
ALGORITHMS USED
MODEL
FLOWCHART
RESULTS
TASK IDENTIFICATION & ALLOCATION
CONCLUSION
INTRODUCTION
In modern communication, images and videos are key tools for conveying messages and
narratives effectively.

Descriptive captions greatly improve accessibility, aiding visually impaired individuals and
diverse audiences.

The media caption generator uses deep learning techniques to automatically craft contextually
appropriate captions.

Captions are presented as text overlays and converted into audio files for a comprehensive
accessibility approach.
OBJECTIVES
Detailed Image Descriptions

Spatial Awareness

Object Recognition

Affordability and Accessibility

Emphasis on Audio Cues


LITERATURE SURVEY
https://www.freecodecamp.org/news/building-an-image-caption-generator-with-deep-learning-in-tensorflow-a142722e9b1f/
ARCHITECTURE
METHODOLOGY
MODULE 1: DATA COLLECTION AND PREPROCESSING -

Data cleaning: replace '-' with ' ' in the caption text.
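
A minimal cleaning sketch (assuming the captions are plain Python strings; the helper name and the extra lowercasing/punctuation steps are illustrative, not necessarily the project's exact pipeline):

import string

def clean_caption(caption):
    # Replace '-' with ' '; lowercasing and punctuation removal are assumed extra steps
    caption = caption.replace('-', ' ').lower()
    caption = caption.translate(str.maketrans('', '', string.punctuation))
    return ' '.join(w for w in caption.split() if w.isalpha())

print(clean_caption("A black-and-white dog jumps over a log."))
# -> a black and white dog jumps over a log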

MODULE 2: FEATURE EXTRACTION -

The most informative features are extracted from each image by selecting and combining variables
into features, effectively reducing the amount of data: each image is converted into a fixed-length
informative vector using a CNN.
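
A sketch of this feature extraction step with a pretrained CNN (assuming Keras/TensorFlow and InceptionV3, which gives a 2048-dimensional vector per image; the exact backbone used in the project may differ):

import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

# Pretrained ImageNet weights; drop the classification head, keep the pooled features
base = InceptionV3(weights='imagenet')
encoder = Model(inputs=base.input, outputs=base.layers[-2].output)

def extract_features(img_path):
    img = image.load_img(img_path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return encoder.predict(x, verbose=0).flatten()  # fixed-length vector, shape (2048,)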

MODULE 3: LOADING THE TRAINING SET AND DATA GENERATOR MODEL -

To train the model, we use the 6000+ training images, generating the input and output sequences
in batches.
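
A sketch of the batch generator (assuming precomputed image features, a fitted Keras Tokenizer, and a fixed maximum caption length; the names and batch size are illustrative):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def data_generator(captions, features, tokenizer, max_len, vocab_size, batch_size=64):
    X_img, X_seq, y = [], [], []
    while True:
        for img_id, caps in captions.items():
            for cap in caps:
                seq = tokenizer.texts_to_sequences([cap])[0]
                # Each caption yields several (prefix -> next word) training pairs
                for i in range(1, len(seq)):
                    X_img.append(features[img_id])
                    X_seq.append(pad_sequences([seq[:i]], maxlen=max_len)[0])
                    y.append(to_categorical(seq[i], num_classes=vocab_size))
                    if len(y) == batch_size:
                        yield [np.array(X_img), np.array(X_seq)], np.array(y)
                        X_img, X_seq, y = [], [], []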
MODULE 4: TESTING THE MODEL AND EVALUATING -

The model is trained and caption predictions are generated. The image captioning model is
evaluated using metrics such as BLEU.
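
A sketch of BLEU scoring with NLTK (the tokenized reference and generated captions below are made-up examples):

from nltk.translate.bleu_score import corpus_bleu

# One list of tokenized reference captions per test image, one hypothesis per image
references = [[['a', 'dog', 'runs', 'on', 'the', 'beach']]]
hypotheses = [['a', 'dog', 'is', 'running', 'on', 'the', 'beach']]

bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))
bleu2 = corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0))
print('BLEU-1: %.3f  BLEU-2: %.3f' % (bleu1, bleu2))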

MODULE 5: TEXT TO VOICE -

Finally, the generated captions are converted to speech using the Python library gTTS
(Google Text-to-Speech).
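
A minimal gTTS sketch (the caption string and output file name are placeholders):

from gtts import gTTS

caption = "a dog is running on the beach"
tts = gTTS(text=caption, lang='en')
tts.save("caption.mp3")  # audio file served alongside the on-screen caption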
ALGORITHMS USED
1. CNN (Convolutional Neural Network) -
A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm that takes an input
image, assigns importance (learnable weights and biases) to various aspects and objects in the
image, and differentiates one from another.

2. LSTM (Long Short-Term Memory) -


Long Short-Term Memory (LSTM) networks are a type of recurrent neural network
capable of learning order dependence in sequence prediction problems. This is a
behavior required in complex problem domains like machine translation, speech
recognition, and more.
MODEL
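
A minimal sketch of a merge-style captioning model in Keras, combining the CNN image feature with an LSTM over the partial caption (the vocabulary size, maximum caption length, feature dimension, and layer sizes below are assumptions, not the project's exact settings):

from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size, max_len, feat_dim = 8000, 34, 2048  # illustrative values

# Image branch: compress the CNN feature vector to the decoder dimension
img_in = Input(shape=(feat_dim,))
img_vec = Dense(256, activation='relu')(Dropout(0.5)(img_in))

# Text branch: embed the partial caption and run it through an LSTM
seq_in = Input(shape=(max_len,))
seq_vec = LSTM(256)(Dropout(0.5)(Embedding(vocab_size, 256, mask_zero=True)(seq_in)))

# Merge the two branches and predict the next word
decoder = Dense(256, activation='relu')(add([img_vec, seq_vec]))
out = Dense(vocab_size, activation='softmax')(decoder)

model = Model(inputs=[img_in, seq_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')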
FLOWCHART
RESULTS
TASK IDENTIFICATION & ALLOCATION

SARANG C SANTHOSH - BACKEND DEVELOPMENT

SREEHARI E S - FRONTEND DEVELOPMENT

SHYAMKUMAR S - QA AND DOCUMENTATION

RAHUL N MANESH - QA AND DOCUMENTATION


CONCLUSION

Real-Time Accessibility

Continuous Improvement

Fostering Independence
THANK YOU
