Getting Started with Google Gemini with Python: API Integration and Model Capabilities
Last Updated: 23 Jul, 2025
Google released Gemini, its first natively multimodal family of models, in December 2023 in three sizes: Ultra, Pro, and Nano. Since each Gemini model is designed for a specific set of use cases, the family is adaptable and runs well on a variety of platforms, from data centers to on-device deployments.
Gemini models combine and comprehend text, code, images, audio, and video with ease because they were designed from the ground up for multimodality. They can generate code from many kinds of input, produce both text and images, and understand and carry out multilingual tasks.
Let us take a deep dive into the Gemini models:
Gemini Ultra
Gemini Ultra is the largest model, designed for highly complex tasks. With support for several languages, it is optimized for high-quality output on complicated tasks such as reasoning and coding. The model natively comprehends sequences of text, images, and audio. Combined with AlphaCode 2, it achieves state-of-the-art coding performance, and its advanced analytical capabilities let it perform well on competition-grade problem sets.
It is the first model to outperform human experts on the MMLU (Massive Multitask Language Understanding) benchmark, which tests world knowledge and problem-solving ability across 57 subjects, including arithmetic, physics, history, law, medicine, and ethics.
Gemini Pro
Gemini Pro is the best model for overall performance across a wide range of tasks, and it is natively multimodal.
With a context window of up to two million tokens, the 1.5 Pro model offers the longest context window of any large-scale foundation model to date. Achieving near-perfect recall on long-context retrieval tasks across several modalities, it opens up new possibilities for processing vast volumes of documents, thousands of lines of code, hours of audio and video, and more.
Gemini 1.5 Pro can use text, images, audio, and video to carry out highly complex reasoning tasks.
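To get a feel for the scale of a two-million-token window, here is a rough back-of-the-envelope sketch. The words-per-token ratio below is a common rule of thumb for English text, not Gemini's actual tokenizer ratio; use the API's token-counting facilities for real numbers.

```python
# Rough estimate of what a 2-million-token context window holds.
# WORDS_PER_TOKEN is an assumed rule-of-thumb average for English text,
# not a figure from Gemini's tokenizer.
WORDS_PER_TOKEN = 0.75

context_tokens = 2_000_000
approx_words = int(context_tokens * WORDS_PER_TOKEN)
print(f"~{approx_words:,} words fit in a {context_tokens:,}-token window")
# → ~1,500,000 words fit in a 2,000,000-token window
```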
Gemini Flash
The lightweight Gemini Flash variant is optimized for speed and efficiency. In addition to multimodal reasoning and a long context window of up to one million tokens, it is also competitively priced.
The primary characteristic of Gemini Flash is its speed. For the vast majority of enterprise and developer use cases, it has an average first-token latency of under one second.
Additionally, 1.5 Flash delivers quality comparable to larger models at a far lower cost. It can process hundreds of thousands of words or lines of code, as well as hours' worth of audio and video.
With Flash's default one-million-token context window, you can handle codebases with over 30,000 lines of code, one hour of video, or eleven hours of audio.
Gemini Nano
Gemini Nano is one of the most efficient models for on-device tasks. It is optimized to provide quick responses on devices, with or without a data network.
It provides richer and clearer descriptions of images and their contents. With its speech transcription feature, you can speak instead of typing, because it understands what you are saying. It also provides text summarization, turning emails, documents, and messages into clear, succinct summaries.
Get started with the Gemini API: Python
1. Run the code in Google Colab. Open Google Colab, create a new notebook, and install the following dependency. The Python SDK for the Gemini API is contained in the google-generativeai package. Install it using pip.
Python
!pip install -q -U google-generativeai
2. Import the necessary packages. Note that the to_markdown function is used to format the output generated by the model.
Python
import pathlib
import textwrap
import google.generativeai as genai
from IPython.display import display
from IPython.display import Markdown
def to_markdown(text):
    text = text.replace('•', ' *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
# Used to securely store your API key
from google.colab import userdata
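To see what the to_markdown helper does, note that textwrap.indent with predicate=lambda _: True prefixes every line, including blank ones, with '> ', which renders as a Markdown blockquote:

```python
import textwrap

# Same indenting logic as the to_markdown helper above: the predicate
# returns True for every line, so blank lines are quoted too.
text = "First line\n\nSecond line"
quoted = textwrap.indent(text, '> ', predicate=lambda _: True)
print(quoted)
# > First line
# >
# > Second line
```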
3. Set up your API key: Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.
A popup will appear to search for and select a Google Cloud project.

If your list is empty and you don't have any project set up, go to the Google Cloud console and create a new project.
4. Add your API key to Colab: In Colab, add the key to the secrets manager under the "🔑" tab in the left panel. Give it the name GOOGLE_API_KEY.
Once you have the API key, pass it to the SDK. You can do this in two ways:
a) Put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there).
b) Pass the key to genai.configure(api_key=...).
Python
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
genai.configure(api_key=GOOGLE_API_KEY)
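Option (a) can be sketched outside Colab with only the standard library. The resolve_api_key helper below is hypothetical, not part of the SDK; it is shown only to illustrate the fallback order between an explicit key and the environment variable:

```python
import os

# Hypothetical helper (not part of the SDK): prefer an explicitly passed
# key, otherwise fall back to the GOOGLE_API_KEY environment variable.
def resolve_api_key(explicit_key=None):
    return explicit_key or os.environ.get("GOOGLE_API_KEY")

os.environ["GOOGLE_API_KEY"] = "demo-key"   # stand-in value for illustration
print(resolve_api_key())            # → demo-key
print(resolve_api_key("override"))  # → override
```

With a real key resolved either way, both paths end with genai.configure(api_key=...).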
5. List models: Now we're ready to call a Gemini model. Use list_models to see the available Gemini models:
Python
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)
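Each item returned by list_models has a name and a supported_generation_methods list. The filtering logic above can be tried offline with stand-in records; the records below only mimic the shape of the real results, and the model names are illustrative, not fetched from the API:

```python
from dataclasses import dataclass, field

# Stand-in records mimicking the shape of genai.list_models() results;
# these are illustrative, not fetched from the API.
@dataclass
class ModelInfo:
    name: str
    supported_generation_methods: list = field(default_factory=list)

models = [
    ModelInfo("models/gemini-1.5-flash", ["generateContent", "countTokens"]),
    ModelInfo("models/embedding-001", ["embedContent"]),
]

# Same filter as above: keep only models that support generateContent.
for m in models:
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)  # → models/gemini-1.5-flash
```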
a) Generate text from text inputs
1. For text-only prompts, use the gemini-1.5-flash model:
Python
model = genai.GenerativeModel('gemini-1.5-flash')
2. The generate_content method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. The available models only support text and images as input, and text as output. You can pass a prompt string to the GenerativeModel.generate_content method:
Python
response = model.generate_content("What is the meaning of life?")
In simple cases, the response.text accessor is all you need. To display formatted Markdown text, use the to_markdown function:
Python
to_markdown(response.text)
Output:
Output text by model
b) Generate text from image and text inputs
1. The GenerativeModel.generate_content API is designed to handle multimodal prompts and returns a text output. Upload any image to Colab.
2. Use the gemini-1.5-flash model and pass the image to the model with generate_content. To provide both text and images in a prompt, pass a list containing the strings and images:
Python
import PIL.Image
img = PIL.Image.open('image.jpg')
img
model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(["Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping.", img], stream=True)
response.resolve()
Output:
image.jpeg (used in the example)
3. Print the output using response.text.
Python
to_markdown(response.text)
Output:
This is the output