Document Classification with LayoutLMv3
In this tutorial, we will explore the task of document classification using layout
information and image content. We will use the LayoutLMv3 model, a state-of-the-art
model for this task, and PyTorch Lightning, a lightweight PyTorch wrapper for high-
performance training.
We will start by preparing the dataset and data loaders, followed by building and training
the model. We will then evaluate the performance of our model and analyze the results
using a confusion matrix. Finally, we will explore ways to improve the performance of the
model on specific classes. By the end of this tutorial, you will have a good understanding
of how to use LayoutLMv3 for document classification and how to leverage PyTorch
Lightning to train and evaluate deep learning models.
In this tutorial, we will be using a Jupyter Notebook to run the code. If you prefer to follow along, you can access the notebook here: open the notebook
Notebook Setup
We will begin by installing wkhtmltopdf [1], a utility that can convert HTML files into images:
%%bash
wget -q https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox_0.12.6-1.bionic_amd64.deb
cp wkhtmltox_0.12.6-1.bionic_amd64.deb /usr/bin
apt -qq install /usr/bin/wkhtmltox_0.12.6-1.bionic_amd64.deb
Next, we will proceed to install all the necessary libraries:
!pip install -qqq transformers==4.27.2 --progress-bar off
!pip install -qqq pytorch-lightning==1.9.4 --progress-bar off
!pip install -qqq torchmetrics==0.11.4 --progress-bar off
!pip install -qqq imgkit==1.2.3 --progress-bar off
!pip install -qqq easyocr==1.6.2 --progress-bar off
!pip install -qqq Pillow==9.4.0 --progress-bar off
!pip install -qqq tensorboardX==2.5.1 --progress-bar off
!pip install -qqq huggingface_hub --progress-bar off
!pip install -qqq --upgrade --no-cache-dir gdown
The essential libraries for this tutorial are:
* transformers: We'll be using the implementation of LayoutLMv3 from this library for our model.
* pytorch-lightning: It will help us fine-tune our model.
* torchmetrics: This library provides us with various metrics for classification and other tasks.
* easyocr: We'll be using this library to run OCR on the document images.
Let's add all imports that we'll use:
from transformers import LayoutLMv3FeatureExtractor, LayoutLMv3TokenizerFast, LayoutLMv3Processor, LayoutLMv3ForSequenceClassification
from tqdm import tqdm
import torch
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from sklearn.model_selection import train_test_split
import imgkit
import easyocr
import torchvision.transforms as T
from pathlib import Path
import matplotlib.pyplot as plt
import os
import cv2
from typing import List
import json
from torchmetrics import Accuracy
from huggingface_hub import notebook_login
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
%matplotlib inline
pl.seed_everything(42)
The last line sets the seed for PyTorch Lightning to 42. Setting a seed ensures that the random number generator used by PyTorch Lightning (and the underlying PyTorch framework) produces the same sequence of random numbers each time the code is run.
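As a quick illustration (a minimal sketch, separate from the tutorial pipeline), re-seeding with the same value makes random draws repeatable:

import torch
import pytorch_lightning as pl

# Re-seeding with the same value makes the subsequent random draws identical.
pl.seed_everything(42)
first_draw = torch.rand(3)

pl.seed_everything(42)
second_draw = torch.rand(3)

assert torch.equal(first_draw, second_draw)  # same seed, same numbers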
Data
The data is from Kaggle - Financial Documents Clustering [2]. It contains HTML documents (tables) from the publicly available Hexaware Technologies financial annual reports [3]. It has 5 categories:

* Income Statements (317 files)
* Balance Sheets (282 files)
* Cash Flows (36 files)
* Notes (702 files)
* Others (1236 files)
Download and extract an exact copy of the Kaggle files from my Google Drive:
!gdown 1MZXonmajLPKSzhZ2dt-Cd2RTSSYFHy0
!unzip -q financial-documents.zip
!mv "TableClassifierQuaterlywithNotes" "documents"
Convert HTML to Images
The documents are in HTML format, which is not usable for our model. We'll convert
them to images.
First, let's change the folder names to "snake case":
for dir in Path("documents").glob("*"):
dir.rename(str(dir).lower().replace(" ", "_"))
List (Path("“documents").glob("*"))
[PosixPath('documents/notes'),
 PosixPath('documents/cash_flow'),
 PosixPath('documents/balance_sheets'),
 PosixPath('documents/income_statement'),
 PosixPath('documents/others')]
We need a directory for the converted images of each class of documents:
for dir in Path("documents") .glob("*"):
image_dir = Path(f"images/{dir.name}")
image_dir.mkdir(exist_ok=True, parents=True)
To convert the HTML files to images, we'll be utilizing the imgkit package:
def convert_html_to_image(file_path: Path, images_dir: Path, scale: float = 1.0) -> Path:
    file_name = file_path.with_suffix(".jpg").name
    save_path = images_dir / file_path.parent.name / f"{file_name}"
    imgkit.from_file(str(file_path), str(save_path), options={
        'quiet': '', 'format': 'jpeg'
    })
    image = Image.open(save_path)
    width, height = image.size
    image = image.resize((int(width * scale), int(height * scale)))
    image.save(str(save_path))
    return save_path
document_paths = list(Path("documents").glob("*/*"))
for doc_path in tqdm(document_paths):
    convert_html_to_image(doc_path, Path("images"), scale=0.8)
Let's look at a sample document image:
image_paths = sorted(list(Path("images").glob("*/*.jpg")))

image = Image.open(image_paths[0]).convert("RGB")
width, height = image.size
image
[Sample document image: a balance sheet table listing assets, equity, and liabilities as at 31st March]
EasyOCR
EasyOCR is a Python library for optical character recognition (OCR), which is the process
of extracting text from images. EasyOCR uses deep learning models to recognize text and
can handle a wide range of font styles, sizes, and orientations.
reader = easyocr.Reader(["en"])
We'll feed our sample document into the EasyOCR reader and see what it detects:
image_path = image_paths[0]
ocr_result = reader.readtext(str(image_path))
The ocr_result has the following format:
text box coordinates [x,y], text, confidence
Here's the first row from the result:
([[279, 13], [327, 13], [327, 27], [279, 27]], 'In lacs)', 0.46634036192148154)
We'll examine the OCR output overlaid on top of the document image:
def create_bounding_box(bbox_data):
    xs = []
    ys = []
    for x, y in bbox_data:
        xs.append(x)
        ys.append(y)

    left = int(min(xs))
    top = int(min(ys))
    right = int(max(xs))
    bottom = int(max(ys))

    return [left, top, right, bottom]
font_path = Path(cv2.__path__[0]) / "qt/fonts/DejaVuSansCondensed.ttf"
font = ImageFont.truetype(str(font_path), size=12)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(28, 28))

left_image = Image.open(image_path).convert("RGB")
right_image = Image.new("RGB", left_image.size, (255, 255, 255))

left_draw = ImageDraw.Draw(left_image)
right_draw = ImageDraw.Draw(right_image)

for i, (bbox, word, confidence) in enumerate(ocr_result):
    box = create_bounding_box(bbox)
    left_draw.rectangle(box, outline="blue", width=2)
    left, top, right, bottom = box
    left_draw.text((right + 5, top), text=str(i + 1), fill="red", font=font)
    right_draw.text((left, top), text=word, fill="black", font=font)

ax1.imshow(left_image)
ax2.imshow(right_image)
ax1.axis("off");
ax2.axis("off");
[Figure: OCR bounding boxes drawn over the document (left) and the recognized words rendered at their detected positions (right)]
We define a helper function create_bounding_box() that takes the text box corner coordinates from the OCR result. The function finds the minimum and maximum values of the x and y coordinates and returns the resulting bounding box as a list in the format [left, top, right, bottom].
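Applied to the first OCR row shown earlier, the helper collapses the four corner points into a single rectangle (a quick worked example using the function defined above):

# Corner points of the first detected word ("In lacs)") from the OCR output.
bbox_data = [[279, 13], [327, 13], [327, 27], [279, 27]]

# min/max over x and y give the axis-aligned box.
print(create_bounding_box(bbox_data))  # [279, 13, 327, 27]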
We can extract the OCR (Optical Character Recognition) result from each image and then
save the results in JSON files:
for image_path in tqdm(image_paths):
    ocr_result = reader.readtext(str(image_path), batch_size=16)

    ocr_page = []
    for bbox, word, confidence in ocr_result:
        ocr_page.append({
            "word": word, "bounding_box": create_bounding_box(bbox)
        })

    with image_path.with_suffix(".json").open("w") as f:
        json.dump(ocr_page, f)
LayoutLMv3
LayoutLMv3 [4] is a state-of-the-art pre-trained language model developed by Microsoft Research Asia. It is designed to handle document analysis tasks that require understanding of both text and layout information, such as document classification, information extraction, and question answering.
The model is built on top of the transformer architecture and trained on massive
amounts of annotated document images and text. LayoutLMv3 is capable of recognizing
and encoding both the textual content and the visual layout of a document, allowing it to
provide superior performance on document analysis tasks.
You can use LayoutLMv3 for various tasks, such as document classification, named entity
recognition, and question answering. To use LayoutLMv3, you can fine-tune the pre-trained model on your specific task with a small amount of task-specific data. Hugging Face Transformers and PyTorch provide easy-to-use APIs that will allow us to fine-tune LayoutLMv3 for document classification.
Preprocessing
LayoutLMv3 uses text, bounding boxes, and images as input. To prepare all of them, we can use the LayoutLMv3Processor. The processor combines OCR, tokenization, and image preprocessing.
feature_extractor = LayoutLMv3FeatureExtractor(apply_ocr=False)
tokenizer = LayoutLMv3TokenizerFast.from_pretrained(
    "microsoft/layoutlmv3-base"
)
processor = LayoutLMv3Processor(feature_extractor, tokenizer)
The LayoutLMv3FeatureExtractor uses Tesseract OCR as the default option. However, Tesseract OCR was very slow during my experiments. Instead, we'll use a custom OCR engine (EasyOCR). Consider Google Cloud Vision or Amazon Textract if you require a faster and more accurate OCR solution.
We'll apply the processor to the sample document. LayoutLMv3 requires that each
bounding box be normalized to be on a 0-1000 scale. We'll need the image width and
height scale for that:
image_path = image_paths[0]
image = Image.open(image_path).convert("RGB")
width, height = image.size

width_scale = 1000 / width
height_scale = 1000 / height
Next, we'll take the OCR and extract words and bounding boxes:
def scale_bounding_box(box: List[int], width_scale: float = 1.0, height_scale: float = 1.0) -> List[int]:
    return [
        int(box[0] * width_scale),
        int(box[1] * height_scale),
        int(box[2] * width_scale),
        int(box[3] * height_scale)
    ]
json_path = image_path.with_suffix(".json")
with json_path.open("r") as f:
    ocr_result = json.load(f)

words = []
boxes = []
for row in ocr_result:
    boxes.append(scale_bounding_box(row["bounding_box"], width_scale, height_scale))
    words.append(row["word"])

len(words), len(boxes)

(174, 174)
We define the function scale_bounding_box() to apply the image scale to each bounding box. Next, we iterate over each row of the OCR results stored in ocr_result, extract the bounding box coordinates and word text for each recognized text region, and scale the bounding box coordinates using scale_bounding_box().
encoding = processor(
    image,
    words,
    boxes=boxes,
    max_length=512,
    padding="max_length",
    truncation=True,
    return_tensors="pt"
)

print(f"""
input_ids:  {list(encoding["input_ids"].squeeze().shape)}
word boxes: {list(encoding["bbox"].squeeze().shape)}
image data: {list(encoding["pixel_values"].squeeze().shape)}
image size: {image.size}
""")
input_ids: [512]
word boxes: [512, 4]
image data: [3, 224, 224]
image size: (819, 1195)
We have three pieces of information: input_ids from the tokenizer, bbox for the
bounding boxes, and pixel_values for the image. Let's have a look at the encoded
image:
image_data = encoding["pixel_values"][0]
transform = T.ToPILImage()
transform(image_data)
The image encoding is a 3-dimensional array of shape (channels, height, width). Next, we convert the tensor to a PIL image object using a transformation from torchvision.
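We can also decode the input_ids back into text to double-check that the OCR words made it into the encoding (a small sketch; skip_special_tokens removes the padding and special tokens):

# Decode the token ids back to a string to inspect what the tokenizer produced.
decoded_text = processor.tokenizer.decode(
    encoding["input_ids"].squeeze(),
    skip_special_tokens=True
)
print(decoded_text[:200])  # the first OCR words of the sample document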
Model
Let's create an instance of LayoutLMv3:
model = LayoutLMv3ForSequenceClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=2
)
The sequence classification model is loaded from the microsoft/layoutlmv3-base checkpoint. We set num_labels to 2, which indicates we'll use it for binary classification.
We can run the encoded document through the model and look at the predictions:
outputs = model(**encoding)
outputs.logits

tensor([[0.2644, 0.2629]], grad_fn=<AddmmBackward0>)
Naturally, our model is untrained and lacks the ability to comprehend the documents in
our dataset. Let's train it!
Training
To fine-tune LayoutLMv3, we will utilize PyTorch Lightning. This is what we'll do:

* Split the data into training and testing subsets
* Create a PyTorch Dataset
* Create data loaders
* Define a LightningModule
* Use the Trainer from PyTorch Lightning to train our model
Let's start by preparing the data:
train_images, test_images = train_test_split(image_paths, test_size=.2)
DOCUMENT_CLASSES = sorted(list(map(
    lambda p: p.name,
    Path("images").glob("*")
)))

DOCUMENT_CLASSES
['balance_sheets',
 'cash_flow',
 'income_statement',
 'notes',
 'others']
First, we split the document images into train and test subsets. Next, we extract the
document classes from the document image directory names. This allows us to create a
mapping from document image to its class.
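For example, the label of any document is simply the index of its parent directory name in DOCUMENT_CLASSES (a short sketch of the mapping the dataset below relies on):

# Map an image path to its integer label via the parent directory name.
sample_path = train_images[0]
label = DOCUMENT_CLASSES.index(sample_path.parent.name)
print(sample_path.parent.name, "->", label)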
We have everything needed to create a PyTorch Dataset:
class DocumentClassificationDataset(Dataset):

    def __init__(self, image_paths, processor):
        self.image_paths = image_paths
        self.processor = processor

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, item):
        image_path = self.image_paths[item]
        json_path = image_path.with_suffix(".json")
        with json_path.open("r") as f:
            ocr_result = json.load(f)

        with Image.open(image_path).convert("RGB") as image:
            width, height = image.size
            width_scale = 1000 / width
            height_scale = 1000 / height

            words = []
            boxes = []
            for row in ocr_result:
                boxes.append(scale_bounding_box(
                    row["bounding_box"],
                    width_scale,
                    height_scale
                ))
                words.append(row["word"])

            encoding = self.processor(
                image,
                words,
                boxes=boxes,
                max_length=512,
                padding="max_length",
                truncation=True,
                return_tensors="pt"
            )

        label = DOCUMENT_CLASSES.index(image_path.parent.name)

        return dict(
            input_ids=encoding["input_ids"].flatten(),
            attention_mask=encoding["attention_mask"].flatten(),
            bbox=encoding["bbox"].flatten(end_dim=1),
            pixel_values=encoding["pixel_values"].flatten(end_dim=1),
            labels=torch.tensor(label, dtype=torch.long)
        )
The class takes two arguments:

* image_paths: a list of paths to document images
* processor: an instance of the LayoutLMv3Processor class
The __len__ method returns the number of images in the dataset, and the __getitem__ method loads and preprocesses the image and OCR results at a given index.
We can now create datasets and data loaders for the train and test documents:
train_dataset = DocumentClassificationDataset(train_images, processor)
test_dataset = DocumentClassificationDataset(test_images, processor)

train_data_loader = DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=2
)

test_data_loader = DataLoader(
    test_dataset,
    batch_size=8,  # assuming the same batch size as the training loader
    shuffle=False,
    num_workers=2
)
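Indexing into the dataset returns a dictionary of tensors with the shapes the model expects (a quick inspection sketch, not required for training):

# Inspect a single preprocessed example.
sample = train_dataset[0]
for key, value in sample.items():
    print(key, list(value.shape))
# input_ids [512], attention_mask [512], bbox [512, 4],
# pixel_values [3, 224, 224], labels [] (a scalar)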
Let's implement a LightningModule using PyTorch Lightning. This will wrap all the components and allow us to train our model:
class ModelModule(pl.LightningModule):
    def __init__(self, n_classes: int):
        super().__init__()
        self.model = LayoutLMv3ForSequenceClassification.from_pretrained(
            "microsoft/layoutlmv3-base",
            num_labels=n_classes
        )
        self.model.config.id2label = {k: v for k, v in enumerate(DOCUMENT_CLASSES)}
        self.model.config.label2id = {v: k for k, v in enumerate(DOCUMENT_CLASSES)}
        self.train_accuracy = Accuracy(task="multiclass", num_classes=n_classes)
        self.val_accuracy = Accuracy(task="multiclass", num_classes=n_classes)

    def forward(self, input_ids, attention_mask, bbox, pixel_values, labels=None):
        return self.model(
            input_ids,
            attention_mask=attention_mask,
            bbox=bbox,
            pixel_values=pixel_values,
            labels=labels
        )

    def training_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        bbox = batch["bbox"]
        pixel_values = batch["pixel_values"]
        labels = batch["labels"]

        output = self(input_ids, attention_mask, bbox, pixel_values, labels)
        self.log("train_loss", output.loss)
        self.log(
            "train_acc",
            self.train_accuracy(output.logits, labels),
        )
        return output.loss

    def validation_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        bbox = batch["bbox"]
        pixel_values = batch["pixel_values"]
        labels = batch["labels"]

        output = self(input_ids, attention_mask, bbox, pixel_values, labels)
        self.log("val_loss", output.loss)
        self.log(
            "val_acc",
            self.val_accuracy(output.logits, labels),
            on_step=False,
            on_epoch=True
        )
        return output.loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.model.parameters(), lr=0.00001)  # 1e-5
        return optimizer
The __init__ method initializes the LayoutLMv3 model for sequence classification with a specified number of classes, and sets up the accuracy metric for both training and validation.
The forward method takes input tensors (input_ids, attention_mask, bbox, and pixel_values) and an optional labels tensor (only used during training), and returns the model output.
The training_step and validation_step methods define the training and validation
steps respectively. In each method, the input tensors are passed through the model, and
the loss and accuracy are logged. The configure_optimizers method defines an Adam
optimizer used for training.
Let's create an instance of our ModelModule:

model_module = ModelModule(len(DOCUMENT_CLASSES))
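Before handing things over to the Trainer, we can optionally push a single batch through the module to confirm everything is wired up (a sanity-check sketch, not part of the training loop):

# Run one batch through the module: with labels provided we get a loss
# and one logit per document class.
sample_batch = next(iter(train_data_loader))
output = model_module(
    sample_batch["input_ids"],
    sample_batch["attention_mask"],
    sample_batch["bbox"],
    sample_batch["pixel_values"],
    sample_batch["labels"]
)
print(output.loss, output.logits.shape)  # scalar loss, [batch_size, n_classes]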
We'll use TensorBoard to track the training progress:

%load_ext tensorboard
%tensorboard --logdir lightning_logs
Finally, we need to set up the PyTorch Lightning Trainer:
model_checkpoint = ModelCheckpoint(
    filename="{epoch}-{step}-{val_loss:.4f}", save_last=True, save_top_k=3, monitor="val_loss"
)

trainer = pl.Trainer(
    accelerator="gpu",
    precision=16,
    devices=1,
    max_epochs=5,
    callbacks=[
        model_checkpoint
    ]
)
The ModelCheckpoint callback saves the model's weights during training, using a naming format that includes the epoch number, training step, and validation loss; it keeps the three best checkpoints by validation loss along with the last one.
The Trainer will use a single GPU, mixed precision (16 bit) training, and train for 5 epochs.
Let's train:
trainer.fit(model_module, train_data_loader, test_data_loader)
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name           | Type                                 | Params
-------------------------------------------------------------------
0 | model          | LayoutLMv3ForSequenceClassification | 125 M
1 | train_accuracy | MulticlassAccuracy                   | 0
2 | val_accuracy   | MulticlassAccuracy                   | 0
-------------------------------------------------------------------
125 M     Trainable params
0         Non-trainable params
125 M     Total params
251.843   Total estimated model params size (MB)
Once training is complete, we load the best checkpoint and push the fine-tuned model to the Hugging Face Hub:

trained_model = ModelModule.load_from_checkpoint(
    model_checkpoint.best_model_path,
    n_classes=len(DOCUMENT_CLASSES),
    local_files_only=True
)

notebook_login()

trained_model.model.push_to_hub(
    "layoutlmv3-financial-document-classification"
)
Once the model is uploaded, we can easily download it using its name or ID. We will load
the model from the Hub and put it on the GPU for inference:
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

model = LayoutLMv3ForSequenceClassification.from_pretrained(
    "curiousily/layoutlmv3-financial-document-classification"
)
model = model.eval().to(DEVICE)
We'll write a function to do inference for a single document image:
def predict_document_image(
    image_path: Path,
    model: LayoutLMv3ForSequenceClassification,
    processor: LayoutLMv3Processor
):
    json_path = image_path.with_suffix(".json")
    with json_path.open("r") as f:
        ocr_result = json.load(f)

    with Image.open(image_path).convert("RGB") as image:
        width, height = image.size
        width_scale = 1000 / width
        height_scale = 1000 / height

        words = []
        boxes = []
        for row in ocr_result:
            boxes.append(
                scale_bounding_box(
                    row["bounding_box"],
                    width_scale,
                    height_scale
                )
            )
            words.append(row["word"])

        encoding = processor(
            image,
            words,
            boxes=boxes,
            max_length=512,
            padding="max_length",
            truncation=True,
            return_tensors="pt"
        )

    with torch.inference_mode():
        output = model(
            input_ids=encoding["input_ids"].to(DEVICE),
            attention_mask=encoding["attention_mask"].to(DEVICE),
            bbox=encoding["bbox"].to(DEVICE),
            pixel_values=encoding["pixel_values"].to(DEVICE)
        )

    predicted_class = output.logits.argmax()
    return model.config.id2label[predicted_class.item()]
This function takes an image path as input, opens the image, loads the saved OCR results, scales the bounding boxes based on the image size, and preprocesses the image and text data using the previously defined processor. The preprocessed data is then sent to the model for inference on the GPU. Finally, the function returns the predicted class label for the input image.
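For example, predicting a single test document and comparing it against its true class (a quick usage sketch):

# Classify one test document and compare with the directory-derived label.
sample_path = test_images[0]
print("predicted:", predict_document_image(sample_path, model, processor))
print("actual:   ", sample_path.parent.name)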
We can now execute the function on all test documents:
labels = []
predictions = []

for image_path in tqdm(test_images):
    labels.append(image_path.parent.name)
    predictions.append(
        predict_document_image(image_path, model, processor)
    )
Given that the dataset is imbalanced, relying solely on accuracy as the evaluation metric
may not provide a complete picture of the model's performance. Therefore, we will use a
confusion matrix to gain deeper insights:
cm = confusion_matrix(labels, predictions, labels=DOCUMENT_CLASSES)

cm_display = ConfusionMatrixDisplay(
    confusion_matrix=cm,
    display_labels=DOCUMENT_CLASSES
)

cm_display.plot()
cm_display.ax_.set_xticklabels(DOCUMENT_CLASSES, rotation=45)
cm_display.figure_.set_size_inches(16, 8)
plt.show();
There is some confusion between the two most represented classes - others and notes. Could you create an improved model that makes more accurate predictions for those?
[Figure: confusion matrix of the predictions over the five document classes]
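One way to quantify the per-class behaviour beyond the confusion matrix (a hedged suggestion, not part of the original walkthrough) is scikit-learn's classification report, which makes the notes/others confusion easy to measure:

from sklearn.metrics import classification_report

# Per-class precision, recall and F1 are more informative than plain accuracy
# on an imbalanced dataset like this one.
print(classification_report(labels, predictions, labels=DOCUMENT_CLASSES, zero_division=0))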
References
1. wkhtmltopdf - a tool to render HTML into PDF/images
2. Financial Documents Clustering
3. Hexaware Technologies financial annual reports
4. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking