Getting Started with Google Gemini with Python: API Integration and Model Capabilities
Last Updated: 23 Jul, 2025
Google released Gemini, its first natively multimodal family of models, in December 2023 in three sizes: Ultra, Pro, and Nano. Since each Gemini model is designed for a specific set of use cases, the family is adaptable and runs well on a variety of platforms, from data centers to on-device deployments.
Gemini models combine and comprehend text, code, images, audio, and video with ease because they were designed from the ground up for multimodality. They can generate code from many kinds of input, produce both text and images, and understand and carry out multilingual tasks.
Let us take a deep dive into the Gemini models:
Gemini Ultra
Gemini Ultra is the largest model, designed for highly complex tasks. With support for several languages, it is optimized for high-quality output on complicated tasks such as reasoning and coding. The model natively comprehends sequences of text, images, and audio. Combined with AlphaCode 2, it achieves state-of-the-art coding performance, and its advanced analytical capabilities let it perform well on competition-grade problem sets.
It is the first model to outperform human experts on the MMLU (Massive Multitask Language Understanding) benchmark, which tests world knowledge and problem-solving ability across 57 subjects, including arithmetic, physics, history, law, medicine, and ethics.
Gemini Pro
Gemini Pro is the best model for overall performance across a wide range of tasks, and it is natively multimodal.
With a context window of up to two million tokens, the 1.5 Pro model offers the longest context window of any large-scale foundation model to date. Achieving near-perfect recall on long-context retrieval tasks across several modalities, it opens up new possibilities for processing vast volumes of documents, thousands of lines of code, hours of audio and video, and more.
Gemini 1.5 Pro can use text, images, audio, and video to carry out highly complex reasoning tasks.
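To get a feel for the scale of a two-million-token window, here is a rough back-of-the-envelope sketch. The words-per-token ratio below is a common rule of thumb for English text, not Gemini's actual tokenizer ratio; use the API's token-counting facilities for real numbers.

```python
# Rough estimate of what a 2-million-token context window holds.
# WORDS_PER_TOKEN is an assumed rule-of-thumb average for English text,
# not a figure from Gemini's tokenizer.
WORDS_PER_TOKEN = 0.75

context_tokens = 2_000_000
approx_words = int(context_tokens * WORDS_PER_TOKEN)
print(f"~{approx_words:,} words fit in a {context_tokens:,}-token window")
# → ~1,500,000 words fit in a 2,000,000-token window
```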
Gemini Flash
The lightweight Gemini Flash variant is optimized for speed and efficiency. In addition to multimodal reasoning and a long context window of up to one million tokens, it is also competitively priced.
The primary characteristic of Gemini Flash is its speed. For the vast majority of enterprise and developer use cases, it has an average first-token latency of under one second.
Additionally, 1.5 Flash delivers quality comparable to larger models at a far lower cost. It can process hundreds of thousands of words or lines of code, as well as hours' worth of audio and video.
With Flash's default one-million-token context window, you can handle codebases with over 30,000 lines of code, one hour of video, or eleven hours of audio.
Gemini Nano
Gemini Nano is one of the most efficient models for on-device tasks. It is optimized to provide quick responses on devices, with or without a data network.
It provides richer and clearer descriptions of images and their contents. With its speech transcription feature, you can speak instead of typing, because it understands what you are saying. It also provides text summarization, turning emails, documents, and messages into clear, succinct summaries.
Get started with the Gemini API: Python
1. Run the code in Google Colab. Open Google Colab, create a new notebook, and install the following dependency. The Python SDK for the Gemini API is contained in the google-generativeai package. Install it using pip.
Python
!pip install -q -U google-generativeai
2. Import the necessary packages. Note that the to_markdown function is used to format the output generated by the model.
Python
import pathlib
import textwrap
import google.generativeai as genai
from IPython.display import display
from IPython.display import Markdown
def to_markdown(text):
    text = text.replace('•', ' *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
# Used to securely store your API key
from google.colab import userdata
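To see what the to_markdown helper does, note that textwrap.indent with predicate=lambda _: True prefixes every line, including blank ones, with '> ', which renders as a Markdown blockquote:

```python
import textwrap

# Same indenting logic as the to_markdown helper above: the predicate
# returns True for every line, so blank lines are quoted too.
text = "First line\n\nSecond line"
quoted = textwrap.indent(text, '> ', predicate=lambda _: True)
print(quoted)
# > First line
# >
# > Second line
```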
3. Set up your API key: Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.
A popup will appear to search for and select a Google Cloud project.

If your list is empty and you don't have any project set up, go to the Google Cloud console and create a new project.
4. Add your API key to Colab: In Colab, add the key to the secrets manager under the "🔑" tab in the left panel. Give it the name GOOGLE_API_KEY.
Once you have the API key, pass it to the SDK. You can do this in two ways:
a) Put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there).
b) Pass the key to genai.configure(api_key=...).
Python
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
genai.configure(api_key=GOOGLE_API_KEY)
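Option (a) can be sketched outside Colab with only the standard library. The resolve_api_key helper below is hypothetical, not part of the SDK; it is shown only to illustrate the fallback order between an explicit key and the environment variable:

```python
import os

# Hypothetical helper (not part of the SDK): prefer an explicitly passed
# key, otherwise fall back to the GOOGLE_API_KEY environment variable.
def resolve_api_key(explicit_key=None):
    return explicit_key or os.environ.get("GOOGLE_API_KEY")

os.environ["GOOGLE_API_KEY"] = "demo-key"   # stand-in value for illustration
print(resolve_api_key())            # → demo-key
print(resolve_api_key("override"))  # → override
```

With a real key resolved either way, both paths end with genai.configure(api_key=...).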
5. List models: Now we're ready to call a Gemini model. Use list_models to see the available Gemini models:
Python
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)
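Each item returned by list_models has a name and a supported_generation_methods list. The filtering logic above can be tried offline with stand-in records; the records below only mimic the shape of the real results, and the model names are illustrative, not fetched from the API:

```python
from dataclasses import dataclass, field

# Stand-in records mimicking the shape of genai.list_models() results;
# these are illustrative, not fetched from the API.
@dataclass
class ModelInfo:
    name: str
    supported_generation_methods: list = field(default_factory=list)

models = [
    ModelInfo("models/gemini-1.5-flash", ["generateContent", "countTokens"]),
    ModelInfo("models/embedding-001", ["embedContent"]),
]

# Same filter as above: keep only models that support generateContent.
for m in models:
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)  # → models/gemini-1.5-flash
```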
a) Generate text from text inputs
1. For text-only prompts, use the gemini-1.5-flash model:
Python
model = genai.GenerativeModel('gemini-1.5-flash')
2. The generate_content method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. The available models only support text and images as input, and text as output. You can pass a prompt string to the GenerativeModel.generate_content method:
Python
response = model.generate_content("What is the meaning of life?")
In simple cases, the response.text accessor is all you need. To display formatted Markdown text, use the to_markdown function:
Python
to_markdown(response.text)
Output:
Output text by model
b) Generate text from image and text inputs
1. The GenerativeModel.generate_content API is designed to handle multimodal prompts and returns a text output. Upload any image to Colab.
2. Use the gemini-1.5-flash model and pass the image to the model with generate_content. To provide both text and images in a prompt, pass a list containing the strings and images:
Python
import PIL.Image
img = PIL.Image.open('image.jpg')
img
model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(["Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping.", img], stream=True)
response.resolve()
Output:
image.jpeg (used in the example)
3. Print the output using response.text.
Python
to_markdown(response.text)
Output:
This is the output