Document Image Analysis
Domain: Banking
Project difficulty level: Basic
Problem Statement:
Documents are an important part of many enterprises in a variety of sectors,
including law, finance, and technology. Automatic understanding of documents such as
invoices, contracts, and resumes is lucrative, opening up many new business
opportunities.
Over the last four decades, there has been a lot of research into document image
processing and comprehension. Work in the field has been applied in a variety of
domains, including office automation, forensics, and digital libraries, and includes
preprocessing, physical and logical layout analysis, optical and intelligent character
recognition (OCR/ICR), graphics analysis, form processing, signature verification, and
writer identification. Several capable document processing and analysis
options are available.
The goal of document image analysis is to recognize text and graphics components in
images and extract the desired information in the same way as a human would.
4. Create a web portal to submit the document and display the results on the screen.
5. Provide an edit option on the screen so that users can correct any mistakes made by the system.
6. Expose the solution through an API as well.
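The API requirement above could be sketched as follows. This is a minimal illustration that uses only the Python standard library (wsgiref); the /analyze route, the port, and the analyze_document function are assumptions made for this example and are not part of the brief. A real submission would replace analyze_document with the actual OCR and extraction model, and may prefer a web framework.

```python
import json
from wsgiref.simple_server import make_server


def analyze_document(image_bytes):
    """Hypothetical placeholder for the real OCR/extraction model."""
    return {"size_bytes": len(image_bytes), "fields": {}}


def app(environ, start_response):
    """Minimal WSGI endpoint: POST /analyze with the document as the request body."""
    is_post = environ.get("REQUEST_METHOD") == "POST"
    if not is_post or environ.get("PATH_INFO") != "/analyze":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "use POST /analyze"}).encode("utf-8")]
    # Read exactly as many bytes as the client declared.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    payload = json.dumps(analyze_document(body)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [payload]


def serve(port=8000):
    """Run the sketch locally; the port is an arbitrary choice."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling serve() starts a local server; the web portal can then post the uploaded document to /analyze and render the returned JSON in an editable form.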
Dataset:
You have to collect your own dataset for this project for the Indian subcontinent, design
your solution based on it, and create a repo for the dataset.
Alternatively, use the following dataset to create an initial app: Dataset
• Safe: It can be used without causing harm.
• Testable: It can be tested at the code level.
• Maintainable: It can be maintained, even as your codebase grows.
• Portable: It works the same in every environment (operating system).
• You have to maintain your code on GitHub.
• You have to keep your GitHub repo public so that anyone can check your code.
• You have to maintain a proper readme file for the project.
• You should include the basic workflow and execution steps of the entire project in the
readme file on GitHub.
• Follow the coding standards: https://www.python.org/dev/peps/pep-0008/
Database:
• You are supposed to use the given database for this project, which is a Cassandra
database.
• https://astra.dev/ineuron
Cloud:
• You can use any cloud platform, such as AWS, Azure, or GCP, to host the entire
solution.
API Details or User Interface:
• You have to expose your complete solution as an API or create a user
interface for testing your model. Either option is acceptable.
Logging:
• Logging is a must for every action performed by your code. Use the Python logging
library for this.
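A minimal sketch of such a logging setup is shown below. The helper name get_logger, the default file name app.log, and the message format are assumptions chosen for illustration; only the standard logging module is used.

```python
import logging


def get_logger(name, log_file="app.log"):
    """Return a logger that appends timestamped records to log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the handler only once, so repeated calls do not duplicate records.
    if not logger.handlers:
        handler = logging.FileHandler(log_file)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Each module can then call get_logger(__name__) once and record actions such as logger.info("document received") or logger.exception("OCR failed") inside an except block.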
Ops Pipeline:
• If possible, try to use an AI ops pipeline for project delivery, e.g. DVC, MLflow,
SageMaker, Azure Machine Learning studio, Jenkins, CircleCI, Azure DevOps,
TFX, Travis CI.
Deployment:
• You can host your model on a cloud platform, on edge devices, or locally, but
you must justify the choice in your system design.
Solutions Design:
• You have to submit complete solution design strategies in the HLD and LLD documents.
System Architecture:
• You have to submit a system architecture design in your wireframe document and
architecture document.
Latency for model response:
• You have to measure the response time of your model for a given input from the
dataset.
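One way to take this measurement is sketched below. The function name measure_latency, the predict callable, and the number of runs are assumptions for illustration; predict stands in for whatever function runs the model on one input.

```python
import statistics
import time


def measure_latency(predict, sample, runs=5):
    """Time predict(sample) several times and summarise the results in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        predict(sample)
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
        "runs": runs,
    }
```

Repeating the call several times and reporting both the mean and the maximum gives a fairer picture than a single timing, because the first call often includes one-time costs such as model loading.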
Optimization of solutions:
• Try to optimize your solution at the code level and the architecture level, and
mention all of these optimizations in your final submission.
• List the test cases for your project.
Submission requirements:
High-level Document:
You have to create a high-level design (HLD) document for your project. You can refer to
the sample HLD from the link below.
Sample link:
HLD Document Link
Low-level document:
You have to create a low-level design (LLD) document for your project. You can refer to
the sample LLD from the link below.
Sample link
LLD Document Link
Architecture:
You have to create an architecture design document for your project. You can refer to
the sample architecture document from the link below.
Sample link
Architecture sample link
Wireframe:
You have to create a wireframe design document for your project. You can refer to
the sample wireframe from the link below.
Demo link
Wireframe Document Link
Project code:
You have to submit the GitHub repo containing your code in your dashboard at the time of
the final submission of your project.
Demo link
Project code sample link
Demo link
DPR sample link