ImageNet is the most popular image dataset, the one on which the 2012 breakthrough of deep learning took place; the handwritten digit dataset MNIST is the oldest and most commonly used dataset in machine learning; and the Vision Transformer (ViT - blog and paper) is one of the most recent deep learning models, applying the revolutionary transformer architecture, first successfully used in natural language processing, to computer vision.
In this demo app, we'll pair the oldest and the most popular image datasets with one of the latest deep learning models and show you:
- How to use the Facebook DeiT model, a ViT model pre-trained on ImageNet, for image classification on iOS;
- How to train another ViT model on MNIST and convert it to TorchScript to use on iOS for handwritten digit recognition.
Prerequisites:

- PyTorch 1.10 (Optional)
- Python 3.8 (Optional)
- iOS CocoaPods LibTorch-Lite 1.10.0
- Xcode 12 or later
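If you want to confirm the Python-side prerequisites before starting, a one-line version check (a convenience, not part of the demo's scripts) does the job:

```
import torch

# expect 1.10.x, to match the LibTorch-Lite 1.10.0 pod used on the iOS side
print(torch.__version__)
```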
To use a pre-trained Facebook DeiT model and convert it to TorchScript, first install PyTorch 1.10, then install timm using `pip install timm==0.3.2`, and finally run the following script:

```
python convert_deit.py
```
This will generate the quantized, scripted model named `fbdeit.pt`, which can also be downloaded here. Note that the quantization code in the script reduces the model size from 346MB to 89MB.
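For reference, the heart of the conversion is only a few lines. The sketch below is a simplified reading of what `convert_deit.py` does (load DeiT-base from Torch Hub, dynamically quantize its Linear layers, then script and mobile-optimize), not a verbatim copy of the script:

```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# load the pre-trained DeiT-base model from Torch Hub (this is why timm is needed)
model = torch.hub.load('facebookresearch/deit:main',
                       'deit_base_patch16_224', pretrained=True)
model.eval()

# dynamic quantization of the Linear layers is what shrinks the model from ~346MB to ~89MB
quantized_model = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)

# script, optimize for mobile, and save
ts_model = torch.jit.script(quantized_model)
optimized_model = optimize_for_mobile(ts_model)
optimized_model.save("fbdeit.pt")
```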
To train and convert your own DeiT model on ImageNet, first follow the instructions under Data Preparation and Training in the DeiT repo, then simply run the following code after `model` is trained:
```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# script the trained model, optimize it for mobile, and save it
ts_model = torch.jit.script(model)
optimized_torchscript_model = optimize_for_mobile(ts_model)
optimized_torchscript_model.save("fbdeit.pt")
```
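Before wiring the file into Xcode, it can be worth a quick sanity check that the saved model loads and runs; a minimal check, assuming the standard 224x224 DeiT input, might look like:

```
import torch

model = torch.jit.load("fbdeit.pt")
# dummy 224x224 RGB input; DeiT-base outputs scores for the 1000 ImageNet classes
output = model(torch.rand(1, 3, 224, 224))
print(output.shape)  # expect torch.Size([1, 1000])
```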
The official PyTorch iOS HelloWorld example app uses MobileNet v2 for image classification. To run the converted `fbdeit.pt` on iOS, first follow the steps in the HelloWorld example to make sure it works (the LibTorch version used in the Podfile needs to be 1.10, consistent with the PyTorch version used to generate the model file).
Then, drag and drop the `fbdeit.pt` model generated or downloaded in Step 1 into the HelloWorld Xcode project.
Finally, change the line of code in the project's `ViewController.swift` file from:

```
if let filePath = Bundle.main.path(forResource: "model", ofType: "pt"),
```

to:

```
if let filePath = Bundle.main.path(forResource: "fbdeit", ofType: "pt"),
```
Run the app in Xcode and you'll see the same image classification result.
To test run the iOS ViT4MNIST demo app, follow the steps below:
In a Terminal, with PyTorch 1.10 and einops (`pip install einops`) installed, run:

```
python mnist_vit.py
```
The model definition in `vit_pytorch.py` and the training code in `mnist_vit.py` are mostly taken from the blog here. After the training, which takes about 20 minutes on a MacBook Pro, the model is saved as `vit4mnist.pt` and then dynamic-quantized, converted to TorchScript, optimized, and saved as `vit4mnist.ptl`, which should be the same as the one already added in the app project.
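For orientation, here is a compressed sketch of the pipeline those two files implement: build a small ViT for 28x28 single-channel MNIST images, train it, then quantize, script, optimize, and save for the lite interpreter. The ViT constructor arguments and training hyperparameters below are illustrative placeholders, not the exact values used in `mnist_vit.py`:

```
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile
from torchvision import datasets, transforms
from vit_pytorch import ViT  # assumed class name from the vit_pytorch.py file mentioned above

# illustrative configuration for 28x28 grayscale MNIST digits (10 classes)
model = ViT(image_size=28, patch_size=7, num_classes=10, channels=1,
            dim=64, depth=6, heads=8, mlp_dim=128)

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=True, download=True, transform=transforms.ToTensor()),
    batch_size=100, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):  # illustrative epoch count
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()

# post-training: save the float model, then quantize/script/optimize for iOS
model.eval()
torch.save(model.state_dict(), "vit4mnist.pt")
quantized = torch.quantization.quantize_dynamic(
    model, qconfig_spec={nn.Linear}, dtype=torch.qint8)
ts = torch.jit.script(quantized)
opt = optimize_for_mobile(ts)
opt._save_for_lite_interpreter("vit4mnist.ptl")  # .ptl for LibTorch-Lite
```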
Run the commands below:
```
cd ViT4MNIST
pod install
open ViT4MNIST.xcworkspace/
```
Select an iOS simulator or device in Xcode to run the app. Some example results are as follows: