02 Use Prebuilt Document Intelligence Models
• Many forms and documents that your business handles are common across
disparate companies in different sectors.
• For example, most companies use invoices and receipts. Microsoft Azure AI
Document Intelligence includes prebuilt models so you can handle common
document types easily.
• You work for a company that conducts polls for private companies and political
parties.
• In this module, you'll learn about the capabilities of the prebuilt models in
Azure AI Document Intelligence and how to use them.
Introduction
Learning objectives
• Identify business problems that you can solve by using prebuilt models in
Azure AI Document Intelligence.
• Analyze forms by using the General Document, Read, and Layout models.
• Analyze forms by using financial, ID, and tax prebuilt models.
Understand prebuilt models
What are prebuilt models?
• The general approach used in AI solutions is to provide a large quantity of
sample data and then train an optimized model by trying different data
features, parameters, and statistical treatments.
• The combination that best predicts the values that interest you constitutes the
trained model, which you can then use to predict values from new data.
• Many of the forms that businesses use from day to day are of a few common
types.
• For example, most businesses issue or receive invoices and receipts. Any
business that has employees in the United States must use the W-2 tax
declaration form.
• You also often have more general documents that you might want to extract
data from.
• For these cases, Microsoft has helped by providing prebuilt models. Prebuilt
models are already trained on large numbers of their target form type.
• If you want to use Document Intelligence to extract data from one of these
common forms or documents, you can choose to use a prebuilt model and you
don't have to train your own.
• General document model. Extract text, keys, values, entities and selection
marks from documents.
• To select the right model for your requirements, you must understand these
features:
• Text extraction. All the prebuilt models extract lines of text and words from
hand-written and printed text.
• Entities. Text that includes common, more complex data structures can be
extracted as entities. Entity types include people, locations, and dates.
Features of prebuilt models
• Selection marks. Spans of text that indicate a choice can be extracted by
some models as selection marks. These marks include radio buttons and
check boxes.
• Tables. Many models can extract tables in scanned forms, including the data
contained in cells, the number of columns and rows, and column and row
headings. Tables with merged cells are supported.
• Fields. Models trained for a specific form type identify the values of a fixed
set of fields. For example, the Invoice model includes CustomerName and
InvoiceTotal fields.
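As a rough illustration of how fixed fields can be read from a result, the sketch below pulls CustomerName and InvoiceTotal out of a hand-written dictionary shaped like the documents and fields section of an analysis response. The sample values and the helper name field_value are invented for illustration, and the key names (valueString, valueCurrency, confidence) are assumptions to verify against a real response for your API version.

```python
# Hand-written sample shaped like part of an invoice analysis result.
# All values are illustrative; this is not output from the service.
sample_result = {
    "documents": [
        {
            "docType": "invoice",
            "fields": {
                "CustomerName": {
                    "type": "string",
                    "valueString": "Contoso Ltd",
                    "confidence": 0.95,
                },
                "InvoiceTotal": {
                    "type": "currency",
                    "valueCurrency": {"currencySymbol": "$", "amount": 110.0},
                    "confidence": 0.97,
                },
            },
        }
    ]
}


def field_value(fields, name):
    """Return (value, confidence) for a named field, or (None, 0.0) if absent."""
    field = fields.get(name)
    if field is None:
        return None, 0.0
    # Assumption: typed values live under a key such as valueString or valueCurrency.
    value_key = "value" + field["type"][0].upper() + field["type"][1:]
    return field.get(value_key), field.get("confidence", 0.0)


invoice_fields = sample_result["documents"][0]["fields"]
customer, customer_conf = field_value(invoice_fields, "CustomerName")
total, total_conf = field_value(invoice_fields, "InvoiceTotal")
print(customer, customer_conf)
print(total["amount"], total_conf)
```

Returning the confidence alongside the value lets the calling code decide when a field should be sent for manual review.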
Input requirements
• The prebuilt models are very flexible but you can help them to return accurate
and helpful results by submitting one clear photo or high-quality scan for each
document.
• You must also comply with these requirements when you submit a form for
analysis:
• The file must be in JPEG, PNG, BMP, TIFF, or PDF format. Additionally, the
Read model can accept Microsoft Office files.
• The file must be smaller than 500 MB for the standard tier, and 4 MB for
the free tier.
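The requirements above can be checked before a file is submitted. The following is a minimal sketch, assuming 1 MB means 1024 x 1024 bytes and using a hypothetical helper named check_input. It covers only the image and PDF formats listed above and does not model the extra Microsoft Office formats that the Read model accepts.

```python
import os

# Limits taken from the requirements listed above.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}
MAX_BYTES = {"standard": 500 * 1024 * 1024, "free": 4 * 1024 * 1024}


def check_input(filename, size_bytes, tier="standard"):
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    # The file must be smaller than the limit, so an equal size also fails.
    if size_bytes >= MAX_BYTES[tier]:
        problems.append(f"file too large for the {tier} tier")
    return problems


print(check_input("invoice.pdf", 1_000_000))
print(check_input("form.docx", 1_000_000))
print(check_input("scan.png", 5 * 1024 * 1024, tier="free"))
```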
Compare prebuilt models
• Use this table to select the best prebuilt model to support your business
requirements.
• In the following units you'll learn further details about each model and how to
set them up in Azure AI Document Intelligence.
• If you have an industry-specific or unique form type that you use often, you
might be able to obtain more reliable and predictable results by using a
custom model.
• However, custom models take time to develop because you must invest the
time and resources to train them on example forms before you can use them.
• The larger the number of example forms you provide for training, the better
the model will be at predicting form content accurately.
Try out prebuilt models with Azure AI Document Intelligence
Studio
• Azure AI Document Intelligence is designed as a web service you can call
using code in your custom applications.
• However, it's often helpful to explore the models and how they behave with
your forms visually.
• You can choose any of the prebuilt models in Azure AI Document Intelligence
Studio.
• Microsoft provides some sample documents for use with each model or you
can add your own documents and analyze them.
Calling prebuilt models by using APIs
• Because Azure AI Document Intelligence implements RESTful web services, you
can use web service calls from any language that supports them.
• Microsoft also provides SDKs for several languages, including:
• Java.
• Python.
• JavaScript.
• Whenever you want to call Azure AI Document Intelligence, you must start by
connecting and authenticating with the service in your Azure subscription. To
make that connection, you need:
• The service endpoint. This value is the URL where the service is published.
• The API key. This value is a unique key that grants access.
• You obtain both of these values from the Azure portal.
• Because the service can take a few seconds to respond, it's best to use
asynchronous calls to submit a form and then obtain results from the analysis:
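The submit-then-poll pattern described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the only way to call the service: the URL path, the api-version value, and the Operation-Location response header follow the REST API as published for version 2023-07-31 and may differ for your resource, and the function names are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: replace with the API version your resource supports.
API_VERSION = "2023-07-31"


def build_analyze_request(endpoint, api_key, model_id, document_url):
    """Build the POST request that submits a document URL for analysis."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{model_id}:analyze?api-version={API_VERSION}"
    )
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,  # the API key from the Azure portal
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def analyze(endpoint, api_key, model_id, document_url, poll_seconds=2.0):
    """Submit a document, then poll the operation URL until analysis finishes."""
    request = build_analyze_request(endpoint, api_key, model_id, document_url)
    with urllib.request.urlopen(request) as response:
        # The service accepts the job and returns the URL to poll in a header.
        operation_url = response.headers["Operation-Location"]

    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": api_key}
    )
    while True:
        with urllib.request.urlopen(poll) as response:
            payload = json.load(response)
        if payload.get("status") in ("succeeded", "failed"):
            return payload
        time.sleep(poll_seconds)
```

A call such as analyze(endpoint, key, "prebuilt-invoice", document_url) would return the final JSON payload. The SDKs wrap the same two steps in a poller object, which is usually simpler to maintain than hand-written polling.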
Use the General Document, Read, and
Layout models
• If you want to extract text, languages, and other information from documents
with unpredictable structures, you can use the read, general document, or
layout models.
• You want to know if Azure AI Document Intelligence can analyze and extract
values from these documents.
• Here, you'll learn about the prebuilt models that Microsoft provides for general
documents.
Using the read model
• The Azure AI Document Intelligence read model extracts printed and
handwritten text from documents and images. It's used to provide text
extraction in all the other prebuilt models.
• The read model can also detect the language that a line of text is written in
and classify whether it's handwritten or printed text.
• For multi-page PDF or TIFF files, you can use the pages parameter in your
request to specify a page range for the analysis.
• The read model is ideal if you want to extract words and lines from documents
with no fixed or predictable structure.
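To make the pages parameter and the line output concrete, here is a small sketch. The URL path and api-version value are assumptions based on the REST API as published for version 2023-07-31, the function names are hypothetical, and the sample dictionary is hand-written to resemble the pages and lines section of a read result.

```python
from urllib.parse import urlencode


def read_analyze_url(endpoint, pages=None, api_version="2023-07-31"):
    """Build the analyze URL for the read model, optionally limited to a page range."""
    query = {"api-version": api_version}
    if pages:
        query["pages"] = pages  # for example "1-3" for the first three pages
    return (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"prebuilt-read:analyze?{urlencode(query)}"
    )


def lines_of_text(analyze_result):
    """Collect the text of every extracted line, page by page."""
    return [
        line["content"]
        for page in analyze_result.get("pages", [])
        for line in page.get("lines", [])
    ]


# Hand-written sample; the values are illustrative only.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "lines": [{"content": "Poll results"}, {"content": "Region: North"}],
        }
    ]
}

print(read_analyze_url("https://example.cognitiveservices.azure.com", pages="1-3"))
print(lines_of_text(sample))
```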
Using the general document model
• The general document model extends the functionality of the read model by
adding the detection of key-value pairs, entities, selection marks, and tables.
• The model can extract these values from structured, semi-structured, and
unstructured documents.
• The general document model is the only prebuilt model to support entity
extraction.
• It can recognize entities such as people, organizations, and dates and it runs
against the whole document, not just key-value pairs.
• This approach ensures that, when structural complexity has prevented the
model from extracting a key-value pair, an entity can be extracted instead.
• Remember, however, that sometimes a single piece of text might return both a
key-value pair and an entity.
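Key-value pairs from the general document model can be flattened into a dictionary for downstream use. The sketch below works on a hand-written sample shaped like the keyValuePairs section of a result. The helper name, the sample values, and the 0.5 confidence threshold are illustrative assumptions, and a key with no detected value is represented as None.

```python
# Hand-written sample; this is not output from the service.
sample_result = {
    "keyValuePairs": [
        {"key": {"content": "Name:"}, "value": {"content": "Avery Smith"}, "confidence": 0.92},
        {"key": {"content": "Date:"}, "value": {"content": "2023-05-01"}, "confidence": 0.88},
        # A key can be detected without a value, for example an empty box.
        {"key": {"content": "Signature:"}, "confidence": 0.41},
    ]
}


def key_value_pairs(analyze_result, min_confidence=0.5):
    """Map key text to value text, skipping pairs below the confidence threshold."""
    pairs = {}
    for pair in analyze_result.get("keyValuePairs", []):
        if pair.get("confidence", 0.0) < min_confidence:
            continue
        value = pair.get("value") or {}
        pairs[pair["key"]["content"]] = value.get("content")
    return pairs


print(key_value_pairs(sample_result))
print(key_value_pairs(sample_result, min_confidence=0.4))
```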
• The types of entities you can detect include people, organizations, locations,
and dates.
• The layout model is a good choice when you need rich information about the
structure of a document.
• Tables can have complicated structures with or without headers, cells that span
columns or rows, and incomplete columns or rows.
• The layout model can handle all of these difficulties to extract the complete
document structure.
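One way to work with an extracted table is to rebuild it as a grid of rows and columns. The sketch below assumes that each cell carries rowIndex, columnIndex, content, and optional rowSpan and columnSpan values, as in recent versions of the REST response. The sample table and the helper name table_to_grid are invented for illustration.

```python
# Hand-written sample shaped like one entry of a result's tables list.
sample_table = {
    "rowCount": 2,
    "columnCount": 3,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "kind": "columnHeader", "content": "Region"},
        {
            "rowIndex": 0,
            "columnIndex": 1,
            "kind": "columnHeader",
            "columnSpan": 2,  # a merged header cell covering two columns
            "content": "Responses",
        },
        {"rowIndex": 1, "columnIndex": 0, "content": "North"},
        {"rowIndex": 1, "columnIndex": 1, "content": "120"},
        {"rowIndex": 1, "columnIndex": 2, "content": "98"},
    ],
}


def table_to_grid(table):
    """Rebuild a row-by-column grid, repeating merged cell text across its span."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        for row_offset in range(cell.get("rowSpan", 1)):
            for col_offset in range(cell.get("columnSpan", 1)):
                row = cell["rowIndex"] + row_offset
                col = cell["columnIndex"] + col_offset
                grid[row][col] = cell["content"]
    return grid


for row in table_to_grid(sample_table):
    print(row)
```

Repeating the text of a merged cell across its span is a design choice that keeps every row the same length. Cells missing from an incomplete row simply stay as empty strings.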
Using the layout model
• For example, each table cell is extracted with:
• Selection marks are extracted with their bounding box, a confidence indicator,
and whether they're selected or not.
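Reading selection marks from a layout result might look like the sketch below. It assumes that each page lists its marks under selectionMarks with a state of "selected" or "unselected", a confidence value, and a polygon for the bounding region. Older API versions name some of these keys differently, and the sample data and helper name are hand-written for illustration.

```python
# Hand-written sample; the coordinates and confidences are illustrative only.
sample_result = {
    "pages": [
        {
            "pageNumber": 1,
            "selectionMarks": [
                {
                    "state": "selected",
                    "confidence": 0.99,
                    "polygon": [1.0, 1.0, 1.2, 1.0, 1.2, 1.2, 1.0, 1.2],
                },
                {
                    "state": "unselected",
                    "confidence": 0.97,
                    "polygon": [1.0, 2.0, 1.2, 2.0, 1.2, 2.2, 1.0, 2.2],
                },
            ],
        }
    ]
}


def selection_states(analyze_result):
    """Return (page number, is selected, confidence) for every selection mark."""
    states = []
    for page in analyze_result.get("pages", []):
        for mark in page.get("selectionMarks", []):
            is_selected = mark["state"] == "selected"
            states.append((page["pageNumber"], is_selected, mark["confidence"]))
    return states


print(selection_states(sample_result))
```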
Use financial, ID, and tax
models
• Azure AI Document Intelligence includes some prebuilt models that are trained
on common form types.
• You can use these models to obtain the values of common fields from invoices,
receipts, business cards, and more.
• In your polling company, invoices and receipts are often submitted as photos
or scans of the paper documents.
• You want to know if Azure AI Document Intelligence can get this information
into your databases more efficiently than manual data entry.
• Here, you'll learn about the prebuilt models that handle financial, identity, and
tax documents.
Using the invoice model
• Your business both issues invoices and receives them from partner
organizations.
• There might be many different formats on paper or in digitized forms and some
will have been scanned poorly at odd angles or from creased paper.
• The invoice model extracts amounts such as the unit price, the quantity of
items, the tax incurred, and the line total.
Using the receipt model
• Receipts have similar fields and structures to invoices, but they record
amounts paid instead of amounts charged.
• The W-2 form has more than 14 boxes and describes the employee's earnings in a
year.
a) Read model.
b) General document model.
c) ID document model.
2. You are using the prebuilt layout model to analyze a document with many checkboxes. You
want to find out whether each box is checked or empty. What object should you use in the
returned JSON code?
a) Selection marks.
b) Bounding boxes.
c) Confidence indicators.
3. You submit a Word document to the Azure AI Document Intelligence general document model
for analysis but you receive an error. The file is A4 size, contains 1 MB of data, and is not
password-protected. How should you resolve the error?