Help needed: I am planning to build a question bank system using an LLM (Large Language Model). The idea is to upload photos of students’ practice exercises into the system. However, I’ve encountered a problem: in subjects like math, where questions often include geometric diagrams or other images, I don’t know how to extract information from these images into the database.
I would like to know whether it’s possible to use LLMs to process these images. Ideally, I want to keep the original images so that they can be included later when generating practice materials (such as PDFs or other formats).
I don’t have a programming background and plan to rely entirely on LLMs to assist with the development. I hope to receive some advice or guidance on how to proceed.
@sam_hong
With the latest Gemini 2.5 models, you can upload the images of the questions and ask Gemini to extract the required text. You can also ask Gemini for a bounding box around the “figure” in each question, then use the bounding box coordinates to crop the figure out of the larger image and store it accordingly.
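For the cropping step, here is a minimal sketch of turning a bounding box into pixel coordinates. It assumes the model returns the box as `[ymin, xmin, ymax, xmax]` on a 0 to 1000 grid, which is how I understand Gemini usually reports boxes, but please check the actual response you get. The function name `box_to_pixels` and the `scale` parameter are just my own naming for illustration.

```python
def box_to_pixels(box_2d, width, height, scale=1000):
    """Convert a normalized [ymin, xmin, ymax, xmax] box into
    (left, top, right, bottom) pixel coordinates.

    Assumption: coordinates are on a 0..scale grid (1000 by default).
    """
    ymin, xmin, ymax, xmax = box_2d

    def to_px(value, size):
        # Scale to pixels, then clamp so the box never leaves the image.
        return max(0, min(size, round(value * size / scale)))

    return (
        to_px(xmin, width),
        to_px(ymin, height),
        to_px(xmax, width),
        to_px(ymax, height),
    )


# Example: a 2000x1000 pixel photo with a figure in the upper middle.
crop_box = box_to_pixels([100, 200, 500, 800], width=2000, height=1000)
print(crop_box)
```

The resulting tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop()` expects, so if you use Pillow you could pass it straight in and save the cropped figure as its own file.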
Hi @sam_hong,
Your solution is a two-step approach.
Utilize Gemini for Text & Image Processing:
Gemini’s multimodal capabilities allow it to analyze both text and images. To process math questions with images, you can upload the images to a platform like Google Cloud Storage or a similar service. Then, use Gemini’s API to send the image URLs along with a prompt instructing the model to extract and interpret the math question from the image.
You can use the Gemini Vision model, which is designed for image prompting and can interpret diagrams and charts embedded in images.
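To make the extraction step a bit more concrete, here is a rough sketch of the prompt side and of reading the model's answer. It does not call the Gemini API itself; it only shows one possible prompt and how a JSON reply could be parsed. The prompt wording, the field names `question_text` and `figures`, and the helper `parse_reply` are all assumptions made for illustration, not anything Gemini prescribes.

```python
import json

# Hypothetical prompt asking for a fixed JSON shape so the reply is easy to store.
PROMPT = (
    "Extract the math question from this image. "
    "Reply with JSON only, using the keys 'question_text' (string) and "
    "'figures' (list of objects with 'label' and 'box_2d')."
)

FENCE = "`" * 3  # models often wrap JSON replies in a Markdown code fence


def parse_reply(reply_text):
    """Turn the model's text reply into a plain dictionary."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    return {
        "question_text": data["question_text"],
        "figures": data.get("figures", []),  # no figures is a valid case
    }


# Example with a made-up reply, standing in for a real API response.
sample = FENCE + 'json\n{"question_text": "Find x.", "figures": []}\n' + FENCE
print(parse_reply(sample))
```

Asking for a fixed JSON shape is a design choice: it keeps the text and the figure locations together, so the next step can store them without any manual copying.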
Store Extracted Data in a Database
Once Gemini processes the images and extracts the math questions, the next step is to store this information in a database. You can use platforms like Google Firestore or Airtable, which offer user-friendly interfaces suitable for non-coders. Design your database schema to include fields such as the question text, answer options, correct answer, and any associated images. This structured storage allows for easy retrieval and management of your question bank.
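As an illustration of such a schema, here is a small sketch using SQLite, which ships with Python. This is a stand-in I chose so the example runs without any account setup; it is not Firestore or Airtable, but the same fields would carry over. The table, column, and function names are my own assumptions. Images are kept as files on disk and only their paths are stored, so the original figures stay available for generating PDFs later.

```python
import json
import sqlite3


def create_schema(conn):
    # Lists (options, image paths) are stored as JSON text for simplicity.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS questions (
            id INTEGER PRIMARY KEY,
            question_text TEXT NOT NULL,
            answer_options TEXT,
            correct_answer TEXT,
            image_paths TEXT
        )
        """
    )


def add_question(conn, question_text, answer_options, correct_answer, image_paths):
    cur = conn.execute(
        "INSERT INTO questions (question_text, answer_options, correct_answer, image_paths) "
        "VALUES (?, ?, ?, ?)",
        (question_text, json.dumps(answer_options), correct_answer, json.dumps(image_paths)),
    )
    conn.commit()
    return cur.lastrowid


def get_question(conn, question_id):
    row = conn.execute(
        "SELECT question_text, answer_options, correct_answer, image_paths "
        "FROM questions WHERE id = ?",
        (question_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "question_text": row[0],
        "answer_options": json.loads(row[1]),
        "correct_answer": row[2],
        "image_paths": json.loads(row[3]),
    }


# Example run against an in-memory database (nothing is written to disk).
conn = sqlite3.connect(":memory:")
create_schema(conn)
qid = add_question(conn, "Find angle ABC.", ["30", "45", "60"], "45", ["figures/q1_fig1.png"])
print(get_question(conn, qid))
```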
I hope these instructions help in developing your project.