BAPATLA ENGINEERING COLLEGE :: BAPATLA
SECOND SHIFT POLYTECHNIC
Department Of Computer Engineering
RECIPE GENERATION FROM
FOOD IMAGE
Under The Guidance Of:
E. Divya Kanthi
Head Of The Department (Assistant Professor)

Presented By:
M. Rupa : 23273-CM-049
P. Sai Chand : 23273-CM-063
U. Ravi Chandra : 23273-CM-080
A. Revanth : 23273-CM-007
CONTENTS
Abstract
Objective Of The Project
Existing System
Drawbacks of the Existing System
Literature Review
Proposed System
Front-end Design
Conclusion
Abstract
1. The system is designed to automatically generate
recipes from images of food.
2. It takes a food image as input and produces the
ingredients and step-by-step cooking instructions.
3. It uses convolutional neural networks (CNNs) to
understand the image and a Transformer to generate the
recipe text.
4. This system can be useful for home cooks, diet
monitoring, and food identification.
Objective Of The Project
1. Develop a system to generate a full recipe from a food image.
2. Identify ingredients in the dish using image analysis.
3. Generate easy-to-follow cooking steps based on the dish.
4. Help users recognize and cook dishes even without prior cooking knowledge.
5. Simplify cooking and support diet tracking using AI.
Existing System
1. Most recipe recommendation systems depend on text or
voice search, not images.
2. Some apps suggest recipes from typed ingredients, but not
visual input.
3. Few systems, like Pic2Recipe, explore image-based recipe
generation.
4. Real-world use and accuracy of such systems are still
limited.
Drawbacks Of The Existing System
1. Rely on text or voice input, not visual input.
2. Food recognition from images is often inaccurate.
3. No proper link between image analysis and recipe creation.
4. Few datasets connect images with full recipes.
5. Existing models (e.g., Pic2Recipe) show low real-world
accuracy.
Literature Review
1. Inverse Cooking: uses transformers to predict ingredients and generate
recipes directly from food images.
2. FIRE: applies an end-to-end transformer model to jointly predict
ingredients and cooking instructions.
3. AdaMine: uses adaptive triplet mining to learn strong cross-modal
embeddings for image-to-recipe retrieval.
4. Vision-Language Cross-Modal Transformers: combine Vision
Transformers and BERT for advanced image-text learning.
5. GPT-4V: is a multimodal version of GPT-4 that processes images and
generates fluent recipes using vision-language understanding.
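Triplet mining of the kind AdaMine uses is built on a margin-based triplet loss over image and recipe embeddings. A minimal sketch of that loss, using toy 2-D vectors (the distance function and margin value here are illustrative, not AdaMine's exact configuration):

```python
import math

def euclidean(u, v):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss: pull the matching recipe embedding
    (positive) toward the image embedding (anchor), and push a
    non-matching one (negative) at least `margin` farther away."""
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

# Toy 2-D embeddings: the matching recipe is already much closer
# than the non-matching one, so the loss is zero.
img = [0.0, 0.0]
matching_recipe = [0.1, 0.0]
other_recipe = [1.0, 0.0]
loss = triplet_loss(img, matching_recipe, other_recipe)
```

Adaptive mining then focuses training on the triplets that still produce a non-zero loss, which is what makes the learned cross-modal embedding strong.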
Proposed System
1. The user uploads a food image.
2. A CNN model extracts important visual features from the
image.
3. A model predicts ingredients, and a Transformer generates the
recipe.
4. The system shows ingredients and step-by-step cooking
instructions.
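The four steps above can be sketched as a simple pipeline. The functions below are stand-in stubs (the returned ingredients and feature values are placeholders); in the real system they would be a trained CNN encoder, an ingredient classifier, and a Transformer decoder:

```python
def extract_features(image_bytes):
    # Stand-in for a CNN encoder: map raw image bytes to a feature vector.
    return [len(image_bytes) % 7, len(image_bytes) % 5]

def predict_ingredients(features):
    # Stand-in for the ingredient prediction model.
    return ["tomato", "onion", "pasta"]

def generate_recipe(ingredients):
    # Stand-in for the Transformer decoder that writes the instructions.
    return [f"Step {i + 1}: prepare the {ing}."
            for i, ing in enumerate(ingredients)]

def recipe_from_image(image_bytes):
    # Full pipeline: image -> features -> ingredients -> recipe steps.
    features = extract_features(image_bytes)
    ingredients = predict_ingredients(features)
    return ingredients, generate_recipe(ingredients)

ingredients, steps = recipe_from_image(b"fake-image-data")
```

The point of the sketch is the data flow: the image is encoded once, the ingredient list is produced from the encoding, and the recipe text is generated conditioned on those ingredients.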
Front-end Design
1. The user opens the app and sees the upload interface.
2. The user uploads a food image.
3. The image is sent to the backend.
4. The frontend waits for the response.
5. Predicted ingredients and recipe are displayed.
6. The user sees results with proper feedback (like loading or
error messages).
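The upload-and-wait flow above can be sketched as follows. The backend call is mocked so the flow runs standalone; the response fields and the error behaviour are assumptions for illustration, not the app's real API:

```python
def call_backend(image_bytes):
    # In the real app this would be an HTTP POST of the image to the
    # backend; here it is mocked so the flow can run standalone.
    if not image_bytes:
        raise ValueError("empty upload")
    return {"ingredients": ["rice", "egg"],
            "steps": ["Boil rice.", "Fry egg."]}

def handle_upload(image_bytes):
    print("Loading...")                     # waiting feedback to the user
    try:
        result = call_backend(image_bytes)  # send image to the backend
    except ValueError as exc:
        return {"error": str(exc)}          # error feedback on failure
    return result                           # ingredients + steps to display

response = handle_upload(b"fake-image")
```

Keeping the loading and error feedback in one handler mirrors step 6: the user always sees either results or a clear error message, never a silent failure.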
Conclusion
1. The system generates recipes from food images using
CNNs and Transformers.
2. It predicts ingredients and provides step-by-step
cooking instructions.
3. Useful for home cooks, bloggers, and health-conscious
users.
4. Future improvements: regional foods, diet options,
voice/camera input, and multi-language support.