0% found this document useful (0 votes)
52 views18 pages

Image Captioning with RNN and CNN

Uploaded by

Rubak Daniel W
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
52 views18 pages

Image Captioning with RNN and CNN

Uploaded by

Rubak Daniel W
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

IMAGE CAPTIONING BOT

Made By ~
Aman Bahuguna (18BCS2441)
Deepak Yadav (18BCS2446)
Gyan Ranjan Kumar (18BCS2431)
TABLE OF CONTENT

• Project Description
• Datasets
• Models: RNN + CNN
• Architecture details
• Evaluation problems
• Results
INTRODUCTION
What do you See in the Picture?
Well some of you
might say “A white
dog in a grassy area”,
some may say “White
dog with brown spots”
and yet some others
might say “A dog on
grass and some pink
flowers”.
But, can you write a computer program that takes an image as
input and produces a relevant caption as output?
APPLICATION OF IMAGE CAPTIONING

 Probably can be used in the applications where text is used mostly and with the
use of this we can infer a image in form of text.
 NLP is used extensively in the market now-a-days. For example, summarizing
or gaining insights from a large corpus of text. In the same way, we can use the
same concept to get insights from images as well.
 We can build a 360-degree metastore and make use of it in a wide variety of
business like making user searches more efficient on an e-commerce platform
based on metadata of products, other may be some other things like
recommendations and all.
 We can describe like what happen in a given video segment.
 Can be used to give something back to mankind for visually impaired people.
and many more.
DATASETS
 Flickr8k
 8000 images, each annotated with 5 sentences via AMT
Training Set — 6000 images
Dev Set — 1000 images
Test Set — 1000 images

• A child in a pink dress is climbing up a set of stairs in an


entry way
• A girl going into a wooden building .
• A little girl climbing into a wooden playhouse .
• A little girl climbing the stairs to her playhouse .
• A little girl in a pink dress going into a wooden cabin .
Keras is an open-source software library that provides
a Python interface for artificial neural networks. Keras acts as an
interface for the TensorFlow library.
MODELS: RNN + CNN

How to combine image and and sentence


RNN + CNN:
• Encoder-decoder model
• Multimodal layer
Encoder-decoder model: image caption
Multimodal Layer
ENCODER-DECODER MODEL:
MODEL ARCHITECTURE
ARCHITECTURE DETAILS:
WORD EMBEDDINGS

To encode the words in form of vector using Gloves


FULL MODEL DETAILS
OUTPUT GENERATIONS
OUTPUT
THANK YOU

You might also like