Research Project

The document outlines a web developer's project focused on integrating generative AI for image analysis, including automatic image descriptions and inappropriate content detection. The developer plans to create a functional model while also preparing a backup video presentation if needed, with a structured timeline and risk mitigation strategies. Feedback from peers highlights the project's relevance and encourages further exploration of ethical implications and community engagement.

Uploaded by

bigflyspam

TOPIC

Generative AI. I know AI is all the hype right now, but I have a practical
reason to want to learn more about it. As a web developer, I sometimes
build features that allow users to upload images. I would like to explore
how AI can help me to automatically get a description of the image
uploaded, give it appropriate tags, and even warn the webmaster if an
inappropriate image is being uploaded.

FORMAT
I will strive to code a functional model, but I fear that it might be too
time-consuming. If I can't finish it, I'll default to a 5-minute video showcasing
my findings and the code I have so far.

EXPECTED LEARNINGS
The project aims to accomplish a comprehensive comparison of various
generative AI models available for different types of data generation.
Through this project, I hope to gain a deeper understanding of the
strengths, weaknesses, and applications of different generative AI models.
Additionally, I aim to explore the feasibility of integrating generative AI for
image analysis and business analytics in web development.

WEEKLY DELIVERABLES
Week 8: Obtain feedback and adjust the research project accordingly
Week 10: Start learning about integrating open-source AI models into a
website
Week 11: Start coding the functional model
Week 12: Polish and submit the functional model (web app)

RISKS
1. Risk: Complexity and time-consuming development.
   o Mitigation Plan: Break down tasks into smaller
     components, allocate sufficient learning time, seek
     guidance from experts, and have a backup plan for a
     shorter video presentation if needed.
2. Risk: Insufficient knowledge or skills.
   o Mitigation Plan: Dedicate time to self-learning, leverage
     online resources, engage in online communities, and
     allocate time for hands-on practice.
3. Risk: Technical difficulties and bugs.
   o Mitigation Plan: Follow best practices in software
     development, conduct systematic testing, seek
     feedback, and actively address any issues or errors.

FEEDBACK

Aaron Welles
I really like your project idea about Generative AI! It's cool that you're trying to
link it with your web development work, that seems like a useful thing to be able
to do. Your plan to code a functional model sounds challenging but interesting. I
admire your optimism in taking on such a task, but I think it's smart to have a
backup plan of a video presentation.
The learning goals you've outlined seem really comprehensive and your week-by-
week plan gives you a clear path to follow. I notice that you've thought about
risks as well, which is great. Just a simple suggestion, maybe you could
consider having some check-in points to see how you're progressing
with your plans? It might help you stay on track or make any
adjustments if you need to. All the best with your project, I'm looking forward
to seeing how it turns out!

Felipe Goncalves
It's great to see that you have a practical reason for wanting to explore this field
as a web developer dealing with image uploads. Your idea of using AI to
automatically generate image descriptions, tags, and flag inappropriate content
is seriously cool!
I know coding a functional model might take some time, but your backup plan
with a video presentation is a smart move. Stay flexible and adapt as needed!
Your expected learnings, like understanding the strengths and weaknesses of
different generative AI models, are super valuable. It'll definitely help you
integrate AI into your web development projects. Your weekly deliverables are
well-planned, and I like that you've set aside time for feedback and adjustments.
You've also got solid mitigation plans for potential risks, like breaking tasks down
and seeking guidance. Consider thinking about the ethical implications of
using generative AI for image analysis. Also, stay updated on current
events and advancements in the field. It'll add an extra edge to your
research.
Your project is well-structured and shows you've got a clear vision. Good luck,
and I'm excited to see your progress!

Akinlolu Emmanuel Aluko

Hi Julien, Generative AI has indeed been a blessing and, I would say, a burden in
our lives these days. My team and I almost presented the same topic but had to
change it at the last minute. From the look of things, I am very sure it is going
to be an interesting project, and I would love to learn from it.

From this feedback, I started looking for communities on Discord and other social
media platforms that are interested in AI.

EXPLORATION (Week 10)

This week I've been researching image-to-text AI libraries.

After quite a lot of research, I realized it was hard to find free resources.
After a while, I found Tesseract.js, which can read the text in an image and
output it as plain text. That is really useful for extracting text from
images that you would normally not be able to copy and paste.

Tesseract.js Library:
https://github.com/naptha/tesseract.js/blob/master/docs/intro.md

Build A Javascript OCR App Tutorial (3-year-old video, most of the code
was deprecated): https://www.youtube.com/watch?v=a1I3tcALTlc

I followed the documentation on their GitHub closely and was able to
make a rudimentary local website.
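For reference, here is a minimal sketch of what the text extraction call can look like. It is not the exact code of my local website. It assumes the Tesseract.js version 5 style of API, where `createWorker` takes the language directly (earlier versions need separate loading and initialization calls). The helper name `extractText` and the file name in the usage comment are made up for illustration.

```javascript
// Minimal sketch, assuming the Tesseract.js v5-style API
// (createWorker / recognize / terminate).
// The createWorker factory is passed in as a parameter so this helper does
// not care how the library was loaded (npm package or <script> tag).
async function extractText(createWorker, image, lang = 'eng') {
  const worker = await createWorker(lang); // loads the OCR engine and language data
  try {
    const { data } = await worker.recognize(image);
    return data.text.trim();
  } finally {
    await worker.terminate(); // always free the worker, even on error
  }
}

// Usage in Node, assuming `npm install tesseract.js` and a local image
// called receipt.png (hypothetical file):
//   const { createWorker } = require('tesseract.js');
//   extractText(createWorker, 'receipt.png').then(console.log);
```

Terminating the worker in a `finally` block matters on a website, because each worker holds the engine and language data in memory.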
From this, I learned that the technology used is called OCR for Optical
Character Recognition. It is a technology that converts printed or
handwritten text into machine-readable text. It involves preprocessing the
image, identifying text regions, segmenting characters, extracting
features, recognizing characters using algorithms or models, and refining
the results. It enables digitizing printed documents, automating data
entry, supporting text search, and enhancing accessibility.
The general process of OCR can be broken down into the following steps:
1. Preprocessing: The input image is preprocessed to enhance its
quality and make it easier to extract the text. This may involve
operations such as noise removal, contrast adjustment, and
image binarization (converting the image to black and white).
2. Text Localization: The OCR system identifies the regions in the
image that contain text. This can be done using techniques like
edge detection, contour analysis, or connected component
analysis.
3. Character Segmentation: In this step, individual characters are
separated from each other. If the text is handwritten or cursive,
this can be a challenging task due to characters being connected.
Segmentation techniques like projection profiles, contour
analysis, or neural networks are applied to separate characters.
4. Feature Extraction: Once the characters are isolated, their
features are extracted to represent them in a numerical form.
These features may include aspects such as stroke width, shape,
or the presence of specific lines or curves.
5. Character Recognition: The extracted features are compared
with a pre-trained database of characters or a statistical model.
This comparison is used to determine the most likely character
for each extracted feature set. Machine learning algorithms like
neural networks, hidden Markov models (HMM), or support vector
machines (SVM) are commonly used for character recognition.
6. Post-processing: After character recognition, additional
techniques are applied to refine the results. This may involve
error correction, spell-checking, and context-based analysis to
improve accuracy.
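To make two of these steps more concrete, here is a toy sketch in plain JavaScript of binarization (step 1) and character segmentation with a vertical projection profile (step 3). This is not how Tesseract.js implements them internally; the function names, the fixed threshold of 128, and the tiny sample image are all assumptions for illustration.

```javascript
// Step 1 (preprocessing): binarize a grayscale image.
// `gray` is a 2D array of brightness values from 0 (black) to 255 (white).
// Returns a 2D array where 1 marks an "ink" pixel and 0 marks background.
function binarize(gray, threshold = 128) {
  return gray.map((row) => row.map((v) => (v < threshold ? 1 : 0)));
}

// Step 3 (segmentation): vertical projection profile.
// Counts ink pixels in each column; empty columns are gaps between characters.
function columnProfile(binary) {
  const width = binary[0].length;
  const profile = new Array(width).fill(0);
  for (const row of binary) {
    for (let x = 0; x < width; x++) profile[x] += row[x];
  }
  return profile;
}

// Turns the profile into [start, end] column ranges (inclusive),
// one range per run of non-empty columns.
function segmentColumns(profile) {
  const segments = [];
  let start = -1;
  profile.forEach((count, x) => {
    if (count > 0 && start === -1) start = x;
    if (count === 0 && start !== -1) {
      segments.push([start, x - 1]);
      start = -1;
    }
  });
  if (start !== -1) segments.push([start, profile.length - 1]);
  return segments;
}

// Toy 3x7 image: two dark blobs separated by light columns.
const gray = [
  [10, 20, 250, 240, 30, 250, 250],
  [15, 250, 250, 245, 25, 20, 250],
  [12, 18, 250, 250, 30, 250, 250],
];
const segments = segmentColumns(columnProfile(binarize(gray)));
console.log(segments); // two segments: columns 0-1 and columns 4-5
```

A fixed threshold only works on clean, evenly lit images, and a projection profile fails as soon as characters touch, which is why handwritten or cursive text is so much harder to segment.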
Tesseract.js is one of the many libraries that use OCR. It's 17 MB in
development, but I don't know how big it will be in production (17 MB is
rather large for a website).

EXPLORATION (Week 11)

This week I got interested in TensorFlow.js, but spoiler alert, it was way
above my level.
TensorFlow Official Tutorial: https://www.tensorflow.org/js/tutorials
TensorFlow.js Quick Start (by Fireship, but it’s a 5-year-old video and uses
Angular): https://www.youtube.com/watch?v=Y_XM3Bu-4yc

TensorFlow is a popular open-source library used for machine learning and
deep learning tasks. In simple terms, it helps computers learn from data
and make predictions or perform tasks automatically. It's like a toolkit that
provides various tools and functions to build and train different types of
machine learning models. The problem is, you need quite extensive
knowledge of machine learning to understand this library, and I do not
have that yet.
TensorFlow includes a high-level API called Keras, which is its default
interface for building models. Keras simplifies the process of building
neural networks (you know? That thing we’ve all seen in YouTube videos but
still don’t understand what it is) by providing a user-friendly and
intuitive interface. It allows developers to define, configure, and train
their deep learning models using just a few lines of code.
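To make the words "neural network" a bit less mysterious, here is a minimal sketch in plain JavaScript of what a single layer computes. This is not Keras or TensorFlow code, and the weights are made up rather than trained; a library like Keras builds, trains, and runs many layers like this for you.

```javascript
// Activation function: keeps positive values, replaces negatives with 0.
const relu = (x) => Math.max(0, x);

// One "dense" layer: every output is a weighted sum of all inputs plus a
// bias, passed through an activation function.
// weights[j][i] is the weight connecting input i to output j.
function dense(inputs, weights, biases, activation = relu) {
  return weights.map((row, j) => {
    const sum = row.reduce((acc, w, i) => acc + w * inputs[i], biases[j]);
    return activation(sum);
  });
}

// A tiny 2-layer network with made-up (untrained) weights:
// 2 inputs -> 2 hidden values -> 1 output.
function tinyNetwork(inputs) {
  const hidden = dense(inputs, [[1, -1], [0.5, 0.5]], [0, -1]);
  return dense(hidden, [[2, 1]], [0.5], (x) => x); // linear output layer
}

console.log(tinyNetwork([3, 1])); // [ 5.5 ]
```

Training a network means adjusting those weights and biases automatically until the outputs match known examples, which is the part the library takes care of.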
TensorFlow.js is a version of TensorFlow specifically designed to run
machine learning models in web browsers or on Node.js. It allows
developers to bring the power of TensorFlow to web applications, enabling
the execution of machine learning tasks directly in the browser without the
need for server-side computation. It is quite interesting, because until
now, I thought that we needed a big computer to run those kinds of models.
To be honest, it is mostly practical for small neural networks, but that is
still enough to do a wide range of tasks like image classification, speech
recognition, object detection, etc.
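For my image tagging idea, the last step of image classification is turning the model's raw scores into tags. Here is a minimal sketch of that step in plain JavaScript. It does not call TensorFlow.js, and the labels, scores, and 0.2 confidence cutoff are hypothetical values chosen only for illustration.

```javascript
// Softmax: turns raw scores into probabilities that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Keeps the labels whose probability reaches minConfidence, highest first.
function topTags(scores, labels, minConfidence = 0.2) {
  return softmax(scores)
    .map((p, i) => ({ label: labels[i], confidence: p }))
    .filter((t) => t.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}

// Hypothetical labels and raw model scores, for illustration only.
const labels = ['cat', 'dog', 'car', 'beach'];
const tags = topTags([2.0, 1.0, -1.0, 0.1], labels);
console.log(tags.map((t) => t.label)); // [ 'cat', 'dog' ]
```

The same pattern could be used to warn the webmaster: if a label such as "inappropriate" passes the cutoff, flag the upload for review instead of publishing it.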