{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "ijGzTHJJUCPY" }, "outputs": [], "source": [ "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "VEqbX8OhE8y9" }, "source": [ "# Sheet Music Analysis with Gemini\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \"Google
Run in Colab\n", "
\n", "
\n", " \n", " \"Google
Run in Colab Enterprise\n", "
\n", "
\n", " \n", " \"GitHub
View on GitHub\n", "
\n", "
\n", " \n", " \"Vertex
Open in Vertex AI Workbench\n", "
\n", "
\n", " \n", " \"Google
Open in Cloud Skills Boost\n", "
\n", "
\n", "\n", "
\n", "\n", "Share to:\n", "\n", "\n", " \"LinkedIn\n", "\n", "\n", "\n", " \"Bluesky\n", "\n", "\n", "\n", " \"X\n", "\n", "\n", "\n", " \"Reddit\n", "\n", "\n", "\n", " \"Facebook\n", " \n" ] }, { "cell_type": "markdown", "metadata": { "id": "5ad877ea09dd" }, "source": [ "| Author |\n", "| --- |\n", "| [Holt Skinner](https://github.com/holtskinner) |" ] }, { "cell_type": "markdown", "metadata": { "id": "CkHPv2myT2cx" }, "source": [ "## Overview\n", "\n", "[Sheet Music](https://en.wikipedia.org/wiki/Sheet_music) is the primary form of music notation used by composers and performers across the world. These pages contain information about the lyrics, pitches, rhythms, composer, text author, composition date, among others.\n", "\n", "This notebook illustrates using Gemini to extract this metadata from sheet music PDFs.\n", "\n", "These prompts and documents were demonstrated in the [Google Cloud Next 2024 session \"What's next with Gemini: Driving business impact with multimodal use cases\"](https://www.youtube.com/watch?v=DqH1R9Pk5RI).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "r11Gu7qNgx1p" }, "source": [ "## Getting Started\n" ] }, { "cell_type": "markdown", "metadata": { "id": "No17Cw5hgx12" }, "source": [ "### Install Google Gen AI SDK for Python" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tFy3H3aPgx12" }, "outputs": [], "source": [ "%pip install --upgrade -q google-genai PyPDF2" ] }, { "cell_type": "markdown", "metadata": { "id": "dmWOrTJ3gx13" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NyKGtVQjgx13" }, "outputs": [], "source": [ "import sys\n", "\n", "# Additional authentication is required for Google Colab\n", "if \"google.colab\" in sys.modules:\n", " # Authenticate user to Google Cloud\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "DF4l8DTdWgPY" }, "source": [ "### Set Google Cloud project information and create client\n", "\n", "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", "\n", "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Nqwi-5ufWp_B" }, "outputs": [], "source": [ "# Use the environment variable if the user doesn't provide Project ID.\n", "import os\n", "\n", "from google import genai\n", "\n", "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", "\n", "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"global\")\n", "\n", "client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "jXHfaVS66_01" }, "source": [ "### Import libraries\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lslYAvw37JGQ" }, "outputs": [], "source": [ "import json\n", "\n", "from IPython.display import Markdown, display\n", "import PyPDF2\n", "from google.genai.types import (\n", " GenerateContentConfig,\n", " GoogleSearch,\n", " HarmBlockThreshold,\n", " HarmCategory,\n", " Part,\n", " SafetySetting,\n", " Tool,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "FTMywdzUORIA" }, "source": [ "### Load the Gemini model\n", "\n", "Gemini is a multimodal model that supports multimodal prompts. You can include text, image(s), PDFs, audio, and video in your prompt requests." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7b76801cc9e4" }, "outputs": [], "source": [ "MODEL_ID = \"gemini-2.0-flash\" # @param {type: \"string\"}\n", "\n", "config = GenerateContentConfig(\n", " temperature=1.0,\n", " max_output_tokens=8192,\n", " safety_settings=[\n", " SafetySetting(\n", " category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,\n", " threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,\n", " )\n", " ],\n", " system_instruction=\"You are an expert in musicology and music history.\",\n", " tools=[Tool(google_search=GoogleSearch())],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "Wy75sLb-yjNn" }, "source": [ "## Extract Structured Metadata from Sheet Music PDF\n", "\n", "For this example, we will be using the popular classical music book [24 Italian Songs and Arias of the 17th and 18th Centuries](https://imslp.org/wiki/24_Italian_Songs_and_Arias_of_the_17th_and_18th_Centuries_(Various)), and extracting metadata about each song in the book." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0ed417af1e5c" }, "outputs": [], "source": [ "sheet_music_pdf_uri = \"gs://github-repo/use-cases/sheet-music/24ItalianSongs.pdf\"\n", "\n", "sheet_music_extraction_prompt = \"\"\"The following document is a book of sheet music. Your task is to output structured metadata about every piece of music in the document.\n", "\n", "Correct any mistakes that are in the document and fill in missing information when not present in the document.\n", "\n", "Include the following details:\n", "\n", "Title\n", "Composer with lifetime\n", "Tempo Marking\n", "A description of the piece\n", "\"\"\"\n", "\n", "# Send to Gemini\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " sheet_music_extraction_prompt,\n", " # Load file directly from Google Cloud Storage\n", " Part.from_uri(\n", " file_uri=sheet_music_pdf_uri,\n", " mime_type=\"application/pdf\",\n", " ),\n", " ],\n", " config=config,\n", ")\n", "\n", "# Display results\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "29b40b5e0cb0" }, "source": [ "You can see that Gemini extracted all of the relevant fields from the document." ] }, { "cell_type": "markdown", "metadata": { "id": "db3765c7645d" }, "source": [ "### Song Identification with Audio\n", "\n", "Now, let's try something more challenging, identifying a song being performed based on the sheet music. We have an audio clip of Holt Skinner performing one of the songs in the book, and we will ask Gemini to identify it based on the sheet music PDF." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "61ea3f2f1c4a" }, "outputs": [], "source": [ "song_identification_prompt = \"\"\"Based on the sheet music PDF, what song is in the audio clip? Explain how you made the decision.\"\"\"\n", "\n", "# Load PDF file\n", "pdf_part = Part.from_uri(\n", " file_uri=sheet_music_pdf_uri,\n", " mime_type=\"application/pdf\",\n", ")\n", "\n", "audio_part = Part.from_uri(\n", " file_uri=\"gs://github-repo/use-cases/sheet-music/24ItalianClip.mp3\",\n", " mime_type=\"audio/mpeg\",\n", ")\n", "\n", "# Send to Gemini\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[pdf_part, audio_part, song_identification_prompt],\n", " config=config,\n", ")\n", "\n", "# Display results\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "9730e8a5628b" }, "source": [ "### Edit PDF Metadata\n", "\n", "Next, we'll use the output from Gemini to edit the metadata of a PDF containing one song, which can make it easier to organize this file in sheet music applications.\n", "\n", "We'll adjust the prompt slightly and set the [`response_mime_type`](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini#:~:text=in%20the%20list.-,responseMimeType,-(Preview)) to get the response in JSON format." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "97e2a06cc762" }, "outputs": [], "source": [ "sheet_music_pdf_uri = \"gs://github-repo/use-cases/sheet-music/SebbenCrudele.pdf\"\n", "output_file_name = \"SebbenCrudele.pdf\"\n", "\n", "sheet_music_extraction_prompt = \"\"\"The following document is a piece of sheet music. Your task is to output structured metadata about the piece of music in the document. Correct any mistakes that are in the document and fill in missing information when not present in the document.\n", "\n", "Output the data in the following JSON format:\n", "\n", "{\n", " \"/Title\": \"Title of the piece\",\n", " \"/Author\": \"Composer(s) of the piece\",\n", " \"/Subject\": \"Music Genre(s) in a comma separated list\",\n", "}\n", "\n", "\"\"\"\n", "\n", "# Load file directly from Google Cloud Storage\n", "file_part = Part.from_uri(\n", " file_uri=sheet_music_pdf_uri,\n", " mime_type=\"application/pdf\",\n", ")\n", "\n", "config.response_mime_type = \"application/json\"\n", "\n", "# Send to Gemini\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[sheet_music_extraction_prompt, file_part],\n", " config=config,\n", ")\n", "\n", "# Display results\n", "display(Markdown(response.text))\n", "\n", "new_metadata = json.loads(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "8e997cb6affc" }, "source": [ "Next, we'll download the PDF from the GCS Bucket and edit the metadata using the [`PyPDF2`](https://pypdf2.readthedocs.io/en/3.x/) library." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "879f827c537a" }, "outputs": [], "source": [ "! gcloud storage cp {sheet_music_pdf_uri} {output_file_name}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e81759999d78" }, "outputs": [], "source": [ "def edit_pdf_metadata(file_path: str, new_metadata: dict) -> None:\n", " \"\"\"Edits metadata of a PDF file.\n", "\n", " Args:\n", " file_path (str): Path to the PDF file.\n", " new_metadata (dict): Dictionary containing the new metadata fields and values.\n", " Example: {'/Author': 'John Doe', '/Title': 'My Report'}\n", " \"\"\"\n", " with open(file_path, \"rb\") as pdf_file:\n", " pdf_reader = PyPDF2.PdfReader(pdf_file)\n", " pdf_writer = PyPDF2.PdfWriter()\n", "\n", " pdf_writer.clone_reader_document_root(pdf_reader)\n", " pdf_writer.add_metadata(new_metadata)\n", "\n", " with open(file_path, \"wb\") as out_file:\n", " pdf_writer.write(out_file)\n", "\n", "\n", "edit_pdf_metadata(output_file_name, new_metadata)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jtbQo-df1TTu" }, "outputs": [], "source": [ "pdf_reader = PyPDF2.PdfReader(output_file_name)\n", "print(pdf_reader.metadata)" ] } ], "metadata": { "colab": { "name": "sheet_music.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }