SMART INDIA HACKATHON 2024
TITLE PAGE
• Problem Statement ID - SIH1604
• Problem Statement Title - Conversational image recognition chatbot
• Theme - Smart Automation
• PS Category - Software
• Team ID-
• Team Omega
IMAGE RECOGNITION CHATBOT
• The app includes a chatbot with sections such as cooking, studying, and travelling, where users select their topic.
• Users can upload pictures, and the chatbot analyzes them to answer the user's questions.
• It provides personalized responses based on user data using deep learning.
• The chatbot can be controlled with voice commands.
• With a single click of the power button, users can take a picture, select a section, and ask a question by voice; the chatbot answers by voice.
• The chatbot delivers responses in both voice and text in the user's native language.
• It can convert uploaded images into videos for multiple purposes.
• Google APIs are used to ensure accurate and up-to-date information.
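The flow described in the bullets above (picture, section, spoken question, spoken answer in the user's language) can be sketched roughly as follows. This is only an illustration: the class name ChatbotPipeline, its field names, and the stub functions are hypothetical placeholders, and in a real build each stage would call the corresponding external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ChatbotPipeline:
    # Every stage is a hypothetical stand-in for a real service call.
    speech_to_text: Callable[[bytes], str]           # audio -> question text
    detect_objects: Callable[[bytes], List[str]]     # image -> object labels
    answer: Callable[[str, str, List[str]], str]     # section, question, labels -> reply
    translate: Callable[[str, str], str]             # reply, language code -> localized reply
    text_to_speech: Callable[[str, str], bytes]      # localized reply, language code -> audio

    def handle(self, image: bytes, audio: bytes,
               section: str, language: str) -> Tuple[str, bytes]:
        """Run one request and return the reply as (text, audio)."""
        question = self.speech_to_text(audio)
        labels = self.detect_objects(image)
        reply = self.answer(section, question, labels)
        localized = self.translate(reply, language)
        return localized, self.text_to_speech(localized, language)


# Stub stages so the sketch runs without any external service.
demo = ChatbotPipeline(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    detect_objects=lambda image: ["pizza"],
    answer=lambda section, question, labels: f"[{section}] I see {', '.join(labels)}.",
    translate=lambda text, lang: text if lang == "en" else f"({lang}) {text}",
    text_to_speech=lambda text, lang: text.encode("utf-8"),
)

reply_text, reply_audio = demo.handle(b"image-bytes", b"What is this?", "cooking", "hi")
```

Keeping each stage behind a plain callable means one service can be swapped or mocked without touching the rest of the flow.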
@SIH Idea submission- Template 2
TECHNICAL APPROACH
• Use Flask for the app interface and Node.js for the backend.
• Integrate Dialogflow for chatbot conversations.
• Use datasets such as COCO and pretrained models such as YOLO to analyze uploaded images.
• Use Google Speech-to-Text for voice input and Text-to-Speech for responses.
• Fetch real-time information using Google APIs.
• Translate responses using the Google Translate API.
• Convert images into videos with FFmpeg.
Technologies:
Frontend: Flask
Backend: Node.js
Chatbot: Dialogflow
Image Processing: COCO dataset, YOLO model
Voice: Google Speech-to-Text, Text-to-Speech
Multilingual: Google Translate API
Video Conversion: FFmpeg
Cloud: Google Cloud with Kubernetes
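As one illustration of the image-to-video step, the sketch below builds an FFmpeg command that turns a numbered image sequence into an MP4 slideshow. The function name slideshow_command, the file pattern, and the default timings are assumptions made for this example; the slides do not specify how the conversion is configured.

```python
from typing import List


def slideshow_command(pattern: str, output: str,
                      seconds_per_image: float = 2.0, fps: int = 25) -> List[str]:
    """Build (but do not run) an FFmpeg command for an image slideshow.

    pattern is a numbered-file pattern such as "img%03d.jpg" (assumed naming).
    """
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return [
        "ffmpeg",
        "-y",                                        # overwrite the output file if it exists
        "-framerate", f"{1 / seconds_per_image:g}",  # input rate: one image every N seconds
        "-i", pattern,
        "-c:v", "libx264",                           # H.264 video
        "-r", str(fps),                              # output frame rate
        "-pix_fmt", "yuv420p",                       # pixel format most players accept
        output,
    ]


command = slideshow_command("img%03d.jpg", "slideshow.mp4")
# To actually run it (requires FFmpeg to be installed):
#   import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with user-supplied file names.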
FEASIBILITY AND VIABILITY
Feasibility and viability:
• Combining NLP and image recognition is achievable using current deep learning models and frameworks.
• Significant training data, AI expertise, and cloud resources are needed for development and integration.
• Development and maintenance costs include cloud infrastructure, model training, and updates.
• High potential for applications in industries like e-commerce and healthcare.
• Cost of scaling: as usage grows, cloud infrastructure and computing expenses may rise significantly.
• Risks include being overtaken by technological advances and securing user adoption.
• The market for chatbots is competitive, but few solutions integrate image recognition, which offers a unique value proposition.
Challenges and Risks:
• The chatbot may have trouble recognizing images because of differences in quality and lighting, and it must also protect user data.
• It depends on external services like Google, which might fail or change unexpectedly.
• Translating between multiple languages can be hard, and the chatbot might not always have the right information for every image.
Strategies to overcome:
• Improve image recognition by regularly upgrading the system and training it with a wider variety of images.
• Safeguard user data with advanced encryption and security practices.
IMPACT AND BENEFITS
Positive impact:
• Enhance learning with quick help for school tasks.
• Improve travel experiences by easily identifying landmarks and translating signs.
• Make informed decisions with personalized insights for products and equipment without external assistance.
• Save time and improve daily life with efficient solutions for various tasks.
Negative impact:
• Automation may reduce the need for human customer service roles, leading to job losses.
• Over-reliance on automated systems could reduce human interaction and adaptability.
Benefits:
• Empowers individuals with disabilities or language barriers.
• Allows users to instantly access information 24/7.
• Opens new markets with innovative services like virtual assistance in retail or healthcare, creating new revenue streams.
• Users benefit from increased productivity, as the chatbot offers quick, accurate answers that let them resolve issues.
RESEARCH AND REFERENCES
References
[1] Stanford CS224N Custom Project. report015.pdf (stanford.edu)
[2] Siri. Siri - Apple.
[3] Google Assistant. Google Assistant, your own personal Google.
[4] Visual Dialog. arXiv:1611.08669 (arxiv.org)
[5] Microsoft COCO: Common Objects in Context. COCO - Common Objects in Context (cocodataset.org)
[6] Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155. Also: [...] for neural dialogue generation. arXiv preprint arXiv:1701.06547.
[7] Harris, Z. S. (1954). Distributional structure. Word, 10(2-3):146–162
[8] maskrcnn-benchmark. https://github.com/facebookresearch/maskrcnn-benchmark
[9] arXiv:1805.08318
[10] NLTK. http://www.nltk.org/
[11] Torch. Torch | Scientific computing for LuaJIT.
[12] Django. The web framework for perfectionists with deadlines | Django (djangoproject.com)