{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "R2-i8jBl9GRH" }, "source": [ "![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)\n", "\n", "# Introduction to Redis with Python\n", "\n", "This notebook introduces [Redis](https://redis.io) and its standard Python client, [redis-py](https://redis-py.readthedocs.io/en/stable/). We will walk through setting up Redis, the most common commands and data structures, and how to store vector embeddings alongside other data.\n", "\n", "## Let's Begin!\n" ] }, { "cell_type": "markdown", "metadata": { "id": "2UG2-tksuPpl" }, "source": [ "## Environment Setup" ] }, { "cell_type": "markdown", "metadata": { "id": "ojfRVcU2uPpr" }, "source": [ "### Install Python Dependencies" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Tpqa4wdIuPps", "outputId": "98d012ed-c460-4d46-bcb2-fdbd86589454" }, "outputs": [], "source": [ "%pip install -q redis numpy" ] }, { "cell_type": "markdown", "metadata": { "id": "NBlbUrB27QQs" }, "source": [ "### Install Redis Stack\n", "\n", "The JSON examples in this notebook rely on the JSON module that is bundled with\n", "Redis Stack. **Make sure you have a Redis Stack\n", "instance available before continuing.**"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Colab\n", "Use the shell script below to add the Redis APT repository, install [Redis Stack](https://redis.io/docs/getting-started/install-stack/), and start the server in the background." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "aKMKXPY2j8Gt", "outputId": "c6cb3b64-0f4c-46cf-df34-7a25a70f4f2d" }, "outputs": [], "source": [ "%%sh\n", "# NBVAL_SKIP\n", "curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg\n", "echo \"deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main\" | sudo tee /etc/apt/sources.list.d/redis.list\n", "sudo apt-get update > /dev/null 2>&1\n", "sudo apt-get install -y redis-stack-server > /dev/null 2>&1\n", "redis-stack-server --daemonize yes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Alternative Environments\n", "There are several other ways to get a Redis Stack instance running:\n", "1. In the cloud: deploy a [free Redis Cloud instance](https://redis.com/try-free/). If you already run your\n", "own Redis Enterprise deployment, that works too.\n", "2. On your own machine: follow the [installation docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/) for your OS.\n", "3. With Docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the Redis Connection URL\n", "\n", "By default, this notebook connects to a local Redis Stack instance. **If you have your own Redis Cloud or Redis Enterprise instance**, replace the `REDIS_PASSWORD`, `REDIS_HOST`, and `REDIS_PORT` values with your own, or set them as environment variables."
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "dyPfCO3pkB7M" }, "outputs": [], "source": [ "import os\n", "\n", "# Replace the values below with your own if using a Redis Cloud instance\n", "REDIS_HOST = os.getenv(\"REDIS_HOST\", \"localhost\") # ex: \"redis-18374.c253.us-central1-1.gce.cloud.redislabs.com\"\n", "REDIS_PORT = os.getenv(\"REDIS_PORT\", \"6379\") # ex: 18374\n", "REDIS_PASSWORD = os.getenv(\"REDIS_PASSWORD\", \"\") # ex: \"1TNxTEdYRDgIDKM2gDfasupCADXXXX\"\n", "\n", "# If TLS/SSL is enabled on the endpoint, use rediss:// as the URL prefix\n", "REDIS_URL = f\"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hello World Redis\n", "\n", "Now let's connect to the Redis database and get a basic feel for the most common\n", "commands and data structures." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import redis\n", "import json\n", "import numpy as np\n", "\n", "from time import sleep\n", "\n", "# Connect with the Redis Python client\n", "client = redis.Redis.from_url(REDIS_URL)\n", "\n", "client.ping()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.dbsize() # should be 0 on a fresh database" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Redis, at its core, is a simple key/value store. It supports a number of interesting\n", "and flexible data structures that can solve a variety of business and operational\n", "problems." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Strings\n", "\n", "The basic string data type can be accessed using the `set` and `get` methods. You can also place a\n", "TTL policy (expiration) on any key in Redis."
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.set(\"hello\", \"world\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'world'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.get(\"hello\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.delete(\"hello\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "client.set(\"hello\", \"world\")\n", "client.expire(\"hello\", time=3)\n", "\n", "sleep(4)\n", "\n", "# should be EMPTY\n", "client.get(\"hello\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hashes\n", "Hashes are collections of key/value pairs that are grouped together. 
Field values are stored as binary-safe strings,\n", "so one hash can hold text, numbers (stored in their string form), and raw bytes such as a serialized vector.\n", "\n", "You can think of a hash as a flat (one-level-deep) Python dictionary.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "obj = {\n", "    \"user\": \"john\",\n", "    \"age\": 45,\n", "    \"job\": \"dentist\",\n", "    \"bio\": \"long form text of john's bio\",\n", "    \"user_embedding\": np.array([0.3, 0.4, -0.8], dtype=np.float32).tobytes()  # store the vector as raw float32 bytes\n", "}" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.hset(\"user:john\", mapping=obj)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{b'user': b'john',\n", " b'age': b'45',\n", " b'job': b'dentist',\n", " b'bio': b\"long form text of john's bio\",\n", " b'user_embedding': b'\\x9a\\x99\\x99>\\xcd\\xcc\\xcc>\\xcd\\xccL\\xbf'}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.hgetall(\"user:john\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.delete(\"user:john\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JSON\n", "With the JSON capability enabled (included in Redis Stack), Redis can also act as a document database, filling a role similar to MongoDB.\n", "You can store nested, structured JSON data directly in Redis and read or update parts of it by path."
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# set a JSON obj\n", "obj = {\n", "    \"user\": \"john\",\n", "    \"metadata\": {\n", "        \"age\": 45,\n", "        \"job\": \"dentist\",\n", "    },\n", "    \"user_embedding\": [0.3, 0.4, -0.8]\n", "}\n", "\n", "client.json().set(\"user:john\", \"$\", obj)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'user': 'john',\n", " 'metadata': {'age': 45, 'job': 'dentist'},\n", " 'user_embedding': [0.3, 0.4, -0.8]}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get user JSON obj\n", "client.json().get(\"user:john\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the length of the embedding array (one result per path match)\n", "client.json().arrlen(\"user:john\", \"$.user_embedding\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[b'user', b'metadata', b'user_embedding']]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the top-level keys of the object\n", "client.json().objkeys(\"user:john\", \"$\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# delete user JSON\n", "client.delete(\"user:john\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lists\n", "Lists store ordered sequences of values, such as the messages in an LLM\n", "conversation or the items in a work queue."
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# append items to the tail (right end) of a list\n", "client.rpush(\"messages:john\", *[\n", "    json.dumps({\"role\": \"user\", \"content\": \"Hello what can you do for me?\"}),\n", "    json.dumps({\"role\": \"assistant\", \"content\": \"Hi, I am a helpful virtual assistant.\"})\n", "])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'role': 'user', 'content': 'Hello what can you do for me?'},\n", " {'role': 'assistant', 'content': 'Hi, I am a helpful virtual assistant.'}]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read every item (index 0 through -1, the last item) and parse the JSON\n", "[json.loads(msg) for msg in client.lrange(\"messages:john\", 0, -1)]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count items in the list\n", "client.llen(\"messages:john\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'{\"role\": \"assistant\", \"content\": \"Hi, I am a helpful virtual assistant.\"}'" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pop the last item (right end) of the list and push it onto the head of another list\n", "# (RPOPLPUSH is deprecated since Redis 6.2; client.lmove(..., src=\"RIGHT\", dest=\"LEFT\") is equivalent)\n", "client.rpoplpush(\"messages:john\", \"read_messages:john\")" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[b'{\"role\": \"assistant\", \"content\": \"Hi, I am a helpful virtual assistant.\"}']" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.lrange(\"read_messages:john\", 0, -1)" ] }, { "cell_type": "code", "execution_count": 
23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# list cleanup\n", "client.delete(\"messages:john\", \"read_messages:john\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pipelines\n", "Any Redis commands can be batched in a pipeline, which sends them to the server together and reads all the replies at once.\n", "This saves one network round trip per command, which adds up quickly when issuing many commands." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# transaction=False sends a plain pipeline without wrapping it in MULTI/EXEC\n", "with client.pipeline(transaction=False) as pipe:\n", "    for i in range(50):\n", "        pipe.json().set(f\"user:{i}\", \"$\", obj)\n", "    # execute the batch in a single round trip\n", "    pipe.execute()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "50" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.dbsize()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# clean up! FLUSHALL deletes every key in every database, so avoid it on shared instances\n", "client.flushall()" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 0 }