{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "id": "copyright" }, "outputs": [], "source": [ "# Copyright 2020 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "title" }, "source": [ "# Custom training and batch prediction\n", "\n", "\n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "overview:custom" }, "source": [ "## Overview\n", "\n", "\n", "This tutorial demonstrates how to use the Vertex AI SDK for Python to train and deploy a custom image classification model for batch prediction.\n", "\n", "Learn more about [Custom training](https://cloud.google.com/vertex-ai/docs/training/custom-training) and [Vertex AI Batch Prediction](https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions)." ] }, { "cell_type": "markdown", "metadata": { "id": "objective:custom,training,online_prediction" }, "source": [ "### Objective\n", "\n", "In this tutorial, you learn to use Vertex AI Training to create a custom trained model and use Vertex AI Batch Prediction to do a batch prediction on the trained model.\n", "\n", "\n", "Create a custom-trained model from a Python script in a Docker container using the Vertex AI SDK for Python, and then do a prediction on the deployed model by sending data. Alternatively, you can create custom-trained models using `gcloud` command-line tool, or online using the Cloud Console.\n", "\n", "This tutorial uses the following Google Cloud ML services:\n", "\n", "- Vertex AI Training\n", "- Vertex AI Batch Prediction\n", "- Vertex AI Model resource\n", "\n", "\n", "The steps performed include:\n", "\n", "- Create a Vertex AI custom job for training a TensorFlow model.\n", "- Upload the trained model artifacts as a model resource.\n", "- Make a batch prediction." ] }, { "cell_type": "markdown", "metadata": { "id": "dataset:custom,cifar10,icn" }, "source": [ "### Dataset\n", "\n", "The dataset used for this tutorial is the [cifar10 dataset](https://www.tensorflow.org/datasets/catalog/cifar10) from [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview). 
The version of the dataset you will use is built into TensorFlow. The trained model predicts which of ten classes an image belongs to: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck." ] }, { "cell_type": "markdown", "metadata": { "id": "costs" }, "source": [ "### Costs\n", "\n", "This tutorial uses billable components of Google Cloud (GCP):\n", "\n", "* Vertex AI\n", "* Cloud Storage\n", "\n", "Learn about [Vertex AI\n", "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", "Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "3b1ffd5ab768" }, "source": [ "## Get started" ] }, { "cell_type": "markdown", "metadata": { "id": "aae9ca040eab" }, "source": [ "### Install Vertex AI SDK for Python and other required packages\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "23e23ce735c9" }, "outputs": [], "source": [ "! pip3 install --upgrade google-cloud-aiplatform \\\n", " google-cloud-storage \\\n", " pillow \\\n", " numpy " ] }, { "cell_type": "markdown", "metadata": { "id": "ff555b32bab8" }, "source": [ "### Restart runtime (Colab only)\n", "\n", "To use the newly installed packages, you must restart the runtime on Google Colab." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "f09b4dff629a" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " import IPython\n", "\n", " app = IPython.Application.instance()\n", " app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "54c5ef8a8f43" }, "source": [ "
\n", "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "id": "92e68cfc3a90" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "Authenticate your environment on Google Colab.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "46604f70e831" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "107c51893a64" }, "source": [ "### Set Google Cloud project information and initialize Vertex AI SDK for Python\n", "\n", "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "294fe4e5a671" }, "outputs": [], "source": [ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", "LOCATION = \"us-central1\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "ddbea904fbe5" }, "source": [ "#### Create a Cloud Storage bucket\n", "\n", "Create a storage bucket to store intermediate artifacts such as datasets." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "751138cf3bd5" }, "outputs": [], "source": [ "BUCKET_URI = f\"gs://your-bucket-name-{PROJECT_ID}-unique\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "58cb4f5895f0" }, "source": [ "**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5e1288505682" }, "outputs": [], "source": [ "! 
gsutil mb -l $LOCATION -p $PROJECT_ID $BUCKET_URI" ] }, { "cell_type": "markdown", "metadata": { "id": "fb03963cdb69" }, "source": [ "#### Initialize Vertex AI SDK for Python\n", "\n", "Initialize the Vertex AI SDK for Python for your project." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "a4f61991b160" }, "outputs": [], "source": [ "from google.cloud import aiplatform\n", "\n", "aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)" ] }, { "cell_type": "markdown", "metadata": { "id": "setup_vars" }, "source": [ "### Import libraries and define constants" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9dcd3eedadfb" }, "outputs": [], "source": [ "import os" ] }, { "cell_type": "markdown", "metadata": { "id": "container:training,prediction" }, "source": [ "### Set pre-built containers\n", "\n", "Vertex AI provides pre-built containers to run training and prediction.\n", "\n", "For the latest list, see [Pre-built containers for training](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) and [Pre-built containers for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1u1mr18jlugv" }, "outputs": [], "source": [ "TRAIN_VERSION = \"tf-cpu.2-9\"\n", "DEPLOY_VERSION = \"tf2-cpu.2-9\"\n", "\n", "TRAIN_IMAGE = \"us-docker.pkg.dev/vertex-ai/training/{}:latest\".format(TRAIN_VERSION)\n", "DEPLOY_IMAGE = \"us-docker.pkg.dev/vertex-ai/prediction/{}:latest\".format(DEPLOY_VERSION)" ] }, { "cell_type": "markdown", "metadata": { "id": "tutorial_start:custom" }, "source": [ "# Tutorial\n", "\n", "Now you are ready to start creating your own custom-trained model with CIFAR10."
 ] }, { "cell_type": "markdown", "metadata": { "id": "train_custom_model" }, "source": [ "## Train a model\n", "\n", "There are two ways you can train a custom model using a container image:\n", "\n", "- **Use a Google Cloud prebuilt container**. If you use a prebuilt container, you will additionally specify a Python package to install into the container image. This Python package contains your code for training a custom model.\n", "\n", "- **Use your own custom container image**. If you use your own container, the container needs to contain your code for training a custom model." ] }, { "cell_type": "markdown", "metadata": { "id": "train_custom_job_args" }, "source": [ "### Define the command args for the training script\n", "\n", "Prepare the command-line arguments to pass to your training script.\n", "- `args`: The command-line arguments to pass to the corresponding Python module. In this example, they will be:\n", " - `\"--epochs=\" + EPOCHS`: The number of epochs for training.\n", " - `\"--steps=\" + STEPS`: The number of steps (batches) per epoch.\n", " - `\"--distribute=\" + TRAIN_STRATEGY`: The training distribution strategy to use for single or distributed training.\n", " - `\"single\"`: single device.\n", " - `\"mirror\"`: all GPU devices on a single compute instance.\n", " - `\"multi\"`: all GPU devices on all compute instances." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1npiDcUtlugw" }, "outputs": [], "source": [ "JOB_NAME = \"custom_job_unique\"\n", "MODEL_DIR = \"{}/{}\".format(BUCKET_URI, JOB_NAME)\n", "\n", "\n", "TRAIN_STRATEGY = \"single\"\n", "\n", "EPOCHS = 20\n", "STEPS = 100\n", "\n", "CMDARGS = [\n", " \"--epochs=\" + str(EPOCHS),\n", " \"--steps=\" + str(STEPS),\n", " \"--distribute=\" + TRAIN_STRATEGY,\n", "]" ] }, { "cell_type": "markdown", "metadata": { "id": "taskpy_contents" }, "source": [ "#### Training script\n", "\n", "In the next cell, you will write the contents of the training script, `task.py`. 
In summary:\n", "\n", "- Gets the directory in which to save the model artifacts from the environment variable `AIP_MODEL_DIR`. This variable is set by the training service.\n", "- Loads the CIFAR10 dataset from TF Datasets (tfds).\n", "- Builds a model using the TF.Keras model API.\n", "- Compiles the model (`compile()`).\n", "- Sets a training distribution strategy according to the argument `args.distribute`.\n", "- Trains the model (`fit()`) with epochs and steps according to the arguments `args.epochs` and `args.steps`.\n", "- Saves the trained model (`save(MODEL_DIR)`) to the specified model directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "72rUqXNFlugx" }, "outputs": [], "source": [ "%%writefile task.py\n", "# Single, Mirror and Multi-Machine Distributed Training for CIFAR-10\n", "\n", "import tensorflow_datasets as tfds\n", "import tensorflow as tf\n", "from tensorflow.python.client import device_lib\n", "import argparse\n", "import os\n", "import sys\n", "tfds.disable_progress_bar()\n", "\n", "parser = argparse.ArgumentParser()\n", "parser.add_argument('--lr', dest='lr',\n", " default=0.01, type=float,\n", " help='Learning rate.')\n", "parser.add_argument('--epochs', dest='epochs',\n", " default=10, type=int,\n", " help='Number of epochs.')\n", "parser.add_argument('--steps', dest='steps',\n", " default=200, type=int,\n", " help='Number of steps per epoch.')\n", "parser.add_argument('--distribute', dest='distribute', type=str, default='single',\n", " help='distributed training strategy')\n", "args = parser.parse_args()\n", "\n", "print('Python Version = {}'.format(sys.version))\n", "print('TensorFlow Version = {}'.format(tf.__version__))\n", "print('TF_CONFIG = {}'.format(os.environ.get('TF_CONFIG', 'Not found')))\n", "print('DEVICES', device_lib.list_local_devices())\n", "\n", "# Single Machine, single compute device\n", "if args.distribute == 'single':\n", " if tf.config.list_physical_devices('GPU'):\n", " strategy = 
tf.distribute.OneDeviceStrategy(device=\"/gpu:0\")\n", " else:\n", " strategy = tf.distribute.OneDeviceStrategy(device=\"/cpu:0\")\n", "# Single machine, multiple compute devices\n", "elif args.distribute == 'mirror':\n", " strategy = tf.distribute.MirroredStrategy()\n", "# Multiple machines, multiple compute devices\n", "elif args.distribute == 'multi':\n", " strategy = tf.distribute.MultiWorkerMirroredStrategy()\n", "\n", "# Multi-worker configuration\n", "print('num_replicas_in_sync = {}'.format(strategy.num_replicas_in_sync))\n", "\n", "# Preparing dataset\n", "BUFFER_SIZE = 10000\n", "BATCH_SIZE = 64\n", "\n", "def make_datasets_unbatched():\n", " # Scaling CIFAR10 data from [0, 255] to [0., 1.]\n", " def scale(image, label):\n", " image = tf.cast(image, tf.float32)\n", " image /= 255.0\n", " return image, label\n", "\n", " datasets, info = tfds.load(name='cifar10',\n", " with_info=True,\n", " as_supervised=True)\n", " return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE).repeat()\n", "\n", "\n", "# Build the Keras model\n", "def build_and_compile_cnn_model():\n", " model = tf.keras.Sequential([\n", " tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),\n", " tf.keras.layers.MaxPooling2D(),\n", " tf.keras.layers.Conv2D(32, 3, activation='relu'),\n", " tf.keras.layers.MaxPooling2D(),\n", " tf.keras.layers.Flatten(),\n", " tf.keras.layers.Dense(10, activation='softmax')\n", " ])\n", " model.compile(\n", " loss=tf.keras.losses.sparse_categorical_crossentropy,\n", " optimizer=tf.keras.optimizers.SGD(learning_rate=args.lr),\n", " metrics=['accuracy'])\n", " return model\n", "\n", "# Train the model\n", "NUM_WORKERS = strategy.num_replicas_in_sync\n", "# Here the batch size scales up by number of workers since\n", "# `tf.data.Dataset.batch` expects the global batch size.\n", "GLOBAL_BATCH_SIZE = BATCH_SIZE * NUM_WORKERS\n", "MODEL_DIR = os.getenv(\"AIP_MODEL_DIR\")\n", "\n", "train_dataset = 
make_datasets_unbatched().batch(GLOBAL_BATCH_SIZE)\n", "\n", "with strategy.scope():\n", " # Model building and compiling need to be within\n", " # `strategy.scope()`.\n", " model = build_and_compile_cnn_model()\n", "\n", "model.fit(x=train_dataset, epochs=args.epochs, steps_per_epoch=args.steps)\n", "model.save(MODEL_DIR)" ] }, { "cell_type": "markdown", "metadata": { "id": "train_custom_job" }, "source": [ "### Train the model\n", "\n", "Define your custom training job on Vertex AI.\n", "\n", "Use the `CustomTrainingJob` class to define the job, which takes the following parameters:\n", "\n", "- `display_name`: The user-defined name of this training pipeline.\n", "- `script_path`: The local path to the training script.\n", "- `container_uri`: The URI of the training container image.\n", "- `requirements`: The list of Python package dependencies of the script.\n", "- `model_serving_container_image_uri`: The URI of a container that can serve predictions for your model — either a prebuilt container or a custom container.\n", "\n", "Use the `run` function to start training, which takes the following parameters:\n", "\n", "- `args`: The command line arguments to be passed to the Python script.\n", "- `replica_count`: The number of worker replicas.\n", "- `model_display_name`: The display name of the `Model` if the script produces a managed `Model`.\n", "- `machine_type`: The type of machine to use for training.\n", "- `accelerator_type`: The hardware accelerator type.\n", "- `accelerator_count`: The number of accelerators to attach to a worker replica.\n", "\n", "The `run` function creates a training pipeline that trains and creates a `Model` object. After the training pipeline completes, the `run` function returns the `Model` object."
 ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mxIxvDdglugx" }, "outputs": [], "source": [ "job = aiplatform.CustomTrainingJob(\n", " display_name=JOB_NAME,\n", " script_path=\"task.py\",\n", " container_uri=TRAIN_IMAGE,\n", " requirements=[\"tensorflow_datasets==1.3.0\"],\n", " model_serving_container_image_uri=DEPLOY_IMAGE,\n", ")\n", "\n", "MODEL_DISPLAY_NAME = \"model_unique\"\n", "\n", "# Start the training\n", "\n", "model = job.run(\n", " model_display_name=MODEL_DISPLAY_NAME,\n", " args=CMDARGS,\n", " replica_count=1,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "make_prediction" }, "source": [ "## Make a batch prediction request\n", "\n", "Send a batch prediction request to your trained model. Batch prediction runs directly against the `Model` resource, so you don't need to deploy the model to an endpoint." ] }, { "cell_type": "markdown", "metadata": { "id": "get_test_item:test" }, "source": [ "### Get test data\n", "\n", "Download images from the CIFAR dataset and preprocess them.\n", "\n", "#### Download the test images\n", "\n", "Download the provided set of images from the CIFAR dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "E1EQBPGnlugz" }, "outputs": [], "source": [ "# Download the images\n", "! gsutil -m cp -r gs://cloud-samples-data/ai-platform-unified/cifar_test_images ." ] }, { "cell_type": "markdown", "metadata": { "id": "prepare_test_item:test,image" }, "source": [ "#### Preprocess the images\n", "Before you can run the data through batch prediction, you need to preprocess it to match the format that your custom model defined in `task.py` expects.\n", "\n", "`x_test`:\n", "Normalize (rescale) the pixel data by dividing each pixel by 255. This replaces each single byte integer pixel with a 32-bit floating point number between 0 and 1.\n", "\n", "`y_test`:\n", "You can extract the labels from the image filenames. 
Each image's filename format is \"image_{LABEL}_{IMAGE_NUMBER}.jpg\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cl59KGnXlugz" }, "outputs": [], "source": [ "import numpy as np\n", "from PIL import Image\n", "\n", "# Load image data\n", "IMAGE_DIRECTORY = \"cifar_test_images\"\n", "\n", "image_files = [file for file in os.listdir(IMAGE_DIRECTORY) if file.endswith(\".jpg\")]\n", "\n", "# Decode JPEG images into numpy arrays\n", "image_data = [\n", " np.asarray(Image.open(os.path.join(IMAGE_DIRECTORY, file))) for file in image_files\n", "]\n", "\n", "# Scale and convert to expected format\n", "x_test = [(image / 255.0).astype(np.float32).tolist() for image in image_data]\n", "\n", "# Extract labels from image name\n", "y_test = [int(file.split(\"_\")[1]) for file in image_files]" ] }, { "cell_type": "markdown", "metadata": { "id": "b1e29665076f" }, "source": [ "#### Prepare data for batch prediction\n", "Before you can run the data through batch prediction, you need to save the data into one of a few possible formats.\n", "\n", "For this tutorial, use JSONL as it's compatible with the 3-dimensional list that each image is currently represented in. To do this:\n", "\n", "1. In a file, write each instance as JSON on its own line.\n", "2. 
Upload this file to Cloud Storage.\n", "\n", "For more details on batch prediction input formats: https://cloud.google.com/vertex-ai/docs/predictions/batch-predictions#batch_request_input" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3e6b04d29c3b" }, "outputs": [], "source": [ "import json\n", "\n", "BATCH_PREDICTION_INSTANCES_FILE = \"batch_prediction_instances.jsonl\"\n", "\n", "BATCH_PREDICTION_GCS_SOURCE = (\n", " BUCKET_URI + \"/batch_prediction_instances/\" + BATCH_PREDICTION_INSTANCES_FILE\n", ")\n", "\n", "# Write instances as JSONL\n", "with open(BATCH_PREDICTION_INSTANCES_FILE, \"w\") as f:\n", " for x in x_test:\n", " f.write(json.dumps(x) + \"\\n\")\n", "\n", "# Upload to Cloud Storage bucket\n", "! gsutil cp $BATCH_PREDICTION_INSTANCES_FILE $BATCH_PREDICTION_GCS_SOURCE\n", "\n", "print(\"Uploaded instances to: \", BATCH_PREDICTION_GCS_SOURCE)" ] }, { "cell_type": "markdown", "metadata": { "id": "send_prediction_request:image" }, "source": [ "### Send the prediction request\n", "\n", "To make a batch prediction request, call the model object's `batch_predict` method with the following parameters: \n", "- `instances_format`: The format of the batch prediction request file: \"jsonl\", \"csv\", \"bigquery\", \"tf-record\", \"tf-record-gzip\" or \"file-list\"\n", "- `predictions_format`: The format of the batch prediction response file: \"jsonl\", \"csv\" or \"bigquery\"\n", "- `job_display_name`: The human readable name for the prediction job.\n", "- `gcs_source`: A list of one or more Cloud Storage paths to your batch prediction requests.\n", "- `gcs_destination_prefix`: The Cloud Storage path that the service will write the predictions to.\n", "- `model_parameters`: Additional filtering parameters for serving prediction results.\n", "- `machine_type`: The type of machine to use for prediction.\n", "- `accelerator_type`: The hardware accelerator 
type.\n", "- `accelerator_count`: The number of accelerators to attach to a worker replica.\n", "- `starting_replica_count`: The number of compute instances to initially provision.\n", "- `max_replica_count`: The maximum number of compute instances to scale to. In this tutorial, only one instance is provisioned.\n", "\n", "### Compute instance scaling\n", "\n", "You can specify a single instance (or node) to process your batch prediction request. This tutorial uses a single node, so the variables `MIN_NODES` and `MAX_NODES` are both set to `1`.\n", "\n", "If you want to use multiple nodes to process your batch prediction request, set `MAX_NODES` to the maximum number of nodes you want to use. Vertex AI autoscales the number of nodes used to serve your predictions, up to the maximum number you set. Refer to the [pricing page](https://cloud.google.com/vertex-ai/pricing#prediction-prices) to understand the costs of autoscaling with multiple nodes.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1cf1076178fc" }, "outputs": [], "source": [ "MIN_NODES = 1\n", "MAX_NODES = 1\n", "\n", "# The name of the job\n", "BATCH_PREDICTION_JOB_NAME = \"cifar10_batch_prediction_unique\"\n", "\n", "# Folder in the bucket to write results to\n", "DESTINATION_FOLDER = \"batch_prediction_results\"\n", "\n", "# The Cloud Storage bucket to upload results to\n", "BATCH_PREDICTION_GCS_DEST_PREFIX = BUCKET_URI + \"/\" + DESTINATION_FOLDER\n", "\n", "# Make SDK batch_predict method call\n", "batch_prediction_job = model.batch_predict(\n", " instances_format=\"jsonl\",\n", " predictions_format=\"jsonl\",\n", " job_display_name=BATCH_PREDICTION_JOB_NAME,\n", " gcs_source=BATCH_PREDICTION_GCS_SOURCE,\n", " gcs_destination_prefix=BATCH_PREDICTION_GCS_DEST_PREFIX,\n", " model_parameters=None,\n", " starting_replica_count=MIN_NODES,\n", " max_replica_count=MAX_NODES,\n", " machine_type=\"n1-standard-4\",\n", " sync=True,\n", ")" ] }, { 
"cell_type": "markdown", "metadata": { "id": "df3b3d1bdd24" }, "source": [ "### Retrieve batch prediction results\n", "When the batch prediction is done processing, you can finally view the predictions stored at the Cloud Storage path you set as output. The predictions will be in a JSONL format, which you indicated when you created the batch prediction job. The predictions are located in a subdirectory starting with the name prediction. Within that directory, there is a file named prediction.results-xxxx-of-xxxx.\n", "\n", "Let's display the contents. You will get a row for each prediction. The row is the softmax probability distribution for the corresponding CIFAR10 classes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2f10b13b2b88" }, "outputs": [], "source": [ "RESULTS_DIRECTORY = \"prediction_results\"\n", "RESULTS_DIRECTORY_FULL = RESULTS_DIRECTORY + \"/\" + DESTINATION_FOLDER\n", "\n", "# Create missing directories\n", "os.makedirs(RESULTS_DIRECTORY, exist_ok=True)\n", "\n", "# Get the Cloud Storage paths for each result\n", "! 
gsutil -m cp -r $BATCH_PREDICTION_GCS_DEST_PREFIX $RESULTS_DIRECTORY\n", "\n", "# Get most recently modified directory\n", "latest_directory = max(\n", " (\n", " os.path.join(RESULTS_DIRECTORY_FULL, d)\n", " for d in os.listdir(RESULTS_DIRECTORY_FULL)\n", " ),\n", " key=os.path.getmtime,\n", ")\n", "\n", "# Get downloaded results in directory\n", "results_files = []\n", "for dirpath, subdirs, files in os.walk(latest_directory):\n", " for file in files:\n", " if file.startswith(\"prediction.results\"):\n", " results_files.append(os.path.join(dirpath, file))\n", "\n", "# Consolidate all the results into a list\n", "results = []\n", "for results_file in results_files:\n", " # Read each downloaded results file\n", " with open(results_file, \"r\") as file:\n", " results.extend([json.loads(line) for line in file.readlines()])" ] }, { "cell_type": "markdown", "metadata": { "id": "962b5a11fdae" }, "source": [ "### Evaluate results\n", "\n", "You can then run a quick evaluation on the prediction results:\n", "\n", "1. `np.argmax`: Convert each list of confidence levels to a label\n", "2. Compare the predicted labels to the actual labels\n", "3. Calculate `accuracy` as `correct/total`\n", "\n", "To improve the accuracy, try training for a higher number of epochs."
 ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "UywuX7fRlugz" }, "outputs": [], "source": [ "y_predicted = [np.argmax(result[\"prediction\"]) for result in results]\n", "\n", "correct = sum(y_predicted == np.array(y_test))\n", "total = len(y_predicted)\n", "print(\n", " f\"Correct predictions = {correct}, Total predictions = {total}, Accuracy = {correct/total}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "cleanup:custom" }, "source": [ "# Cleaning up\n", "\n", "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can delete the individual resources you created in this tutorial:\n", "\n", "- Training Job\n", "- Model\n", "- Cloud Storage Bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NNmebHf7lug0" }, "outputs": [], "source": [ "# Warning: Setting this to true will delete everything in your bucket\n", "delete_bucket = False\n", "\n", "# Delete the training job\n", "job.delete()\n", "\n", "# Delete the model\n", "model.delete()\n", "\n", "if delete_bucket:\n", " ! gsutil -m rm -r $BUCKET_URI" ] } ], "metadata": { "colab": { "name": "sdk-custom-image-classification-batch.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }