Introduction

RIOT-X is a command-line utility to get data in and out of Redis. It supports Redis Cloud and Redis Software and includes the following features:

  • Files (CSV, JSON, XML, Parquet)

  • Databases

  • Data Generators

  • Replication (Redis → Redis)

RIOT-X is supported by Redis, Inc. To report bugs, request features, or receive assistance, please file an issue or contact your Redis account team.

Install

RIOT-X can be installed on Linux, macOS, and Windows and runs as a standalone tool that connects remotely to a Redis database. It does not need to run locally on a Redis server.

Homebrew (macOS & Linux)

brew install redis/tap/riotx

Scoop (Windows)

scoop bucket add redis https://github.com/redis/scoop.git
scoop install riotx

Manual Installation (All Platforms)

Download the pre-compiled binary from RIOT-X Releases, uncompress it, and copy it to the desired location.

riotx-0.6.3.zip requires Java 11 or greater to be installed.

riotx-standalone-0.6.3-*.zip includes its own Java runtime and does not require a Java installation.

Docker

You can run RIOT-X as a Docker image:

docker run riotx/riotx [OPTIONS] [COMMAND]

Concepts

RIOT-X is essentially an ETL tool where data is extracted from the source system, transformed (see Processing), and loaded into the target system.

[Figure: RIOT-X architecture]

Redis URI

RIOT-X follows the Redis URI specification, which supports standalone, sentinel and cluster Redis deployments with plain, SSL, TLS and unix domain socket connections.

You can use the host:port shorthand for redis://host:port.
Redis Standalone

redis :// [[username :] password@] host [:port][/database] [?[timeout=timeout[d|h|m|s|ms|us|ns]] [&clientName=clientName] [&libraryName=libraryName] [&libraryVersion=libraryVersion] ]

Redis Standalone (SSL)

rediss :// [[username :] password@] host [: port][/database] [?[timeout=timeout[d|h|m|s|ms|us|ns]] [&clientName=clientName] [&libraryName=libraryName] [&libraryVersion=libraryVersion] ]

Redis Sentinel

redis-sentinel :// [[username :] password@] host1[:port1] [, host2[:port2]] [, hostN[:portN]] [/database] [?[timeout=timeout[d|h|m|s|ms|us|ns]] [&sentinelMasterId=sentinelMasterId] [&clientName=clientName] [&libraryName=libraryName] [&libraryVersion=libraryVersion] ]

You can provide the database, password and timeouts within the Redis URI. For example redis://localhost:6379/1 selects database 1.
Timeout Units

d: Days
h: Hours
m: Minutes
s: Seconds
ms: Milliseconds
us: Microseconds
ns: Nanoseconds

Batching

Processing in RIOT-X is done in batches: a fixed number of records is read from the source, processed, and written to the target. The default batch size is 50, which means that an execution step reads 50 items at a time from the source, processes them, and finally writes them to the target. If the source or target is Redis, each batch is read or written in a single command pipeline to minimize the number of roundtrips to the server.

You can change the batch size (and hence pipeline size) using the --batch option. The optimal batch size in terms of throughput depends on many factors like record size and command types (see Redis Pipeline Tuning for details).
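The read-process-write loop can be sketched in Python (a conceptual sketch of the batching behavior, not RIOT-X internals; run_step and its arguments are illustrative names):

```python
def chunks(records, batch_size=50):
    """Yield successive fixed-size batches, mirroring the --batch option."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def run_step(source, process, write_batch, batch_size=50):
    # Each batch is processed and written in one go; with a Redis target,
    # the whole batch would share a single command pipeline.
    written = 0
    for batch in chunks(source, batch_size):
        write_batch([process(record) for record in batch])
        written += len(batch)
    return written
```

With 120 input records and the default batch size of 50, this performs three writes (50, 50, and 20 records).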

Multi-threading

By default processing happens in a single thread, but it is possible to parallelize processing by using multiple threads. In that configuration, each chunk of items is read, processed, and written in a separate thread of execution. This is different from partitioning where items would be read by multiple readers. Here, only one reader is being accessed from multiple threads.

To set the number of threads, use the --threads option.

Multi-threading example
riotx db-import "SELECT * FROM orders" --jdbc-url "jdbc:postgresql://host:port/database" --jdbc-user appuser --jdbc-pass passwd --threads 3 hset --keyspace order --key order_id

Importing

When importing data into Redis (file-import, db-import, faker) the following options allow for field-level processing and filtering.

Processing

Processors allow you to create/update/delete fields using the Spring Expression Language (SpEL).

Examples
--proc field1="'foo'"

Generate a field named field1 containing the string foo

--proc temp="(temp-32)*5/9"

Convert from Fahrenheit to Celsius

--proc name='remove("first").concat(remove("last"))'

Concatenate first and last fields and delete them

--proc field2=null

Delete field2

Input fields are accessed by name (e.g. field3=field1+field2).
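The processor semantics above (create, update, or delete a field on each record) can be sketched in Python, where plain functions stand in for SpEL expressions (a conceptual sketch; apply_procs is an illustrative name, not RIOT-X code):

```python
def apply_procs(record, procs):
    """Apply field=expression processors to a record (a dict),
    mimicking --proc semantics: a null result deletes the field."""
    for field, expr in procs.items():
        value = expr(record)
        if value is None:
            record.pop(field, None)   # --proc field=null deletes the field
        else:
            record[field] = value
    return record

# Equivalent of: --proc field1="'foo'" temp="(temp-32)*5/9" field2=null
procs = {
    "field1": lambda r: "foo",
    "temp": lambda r: (r["temp"] - 32) * 5 / 9,
    "field2": lambda r: None,
}
```

Applying these processors to {"temp": 212, "field2": "x"} yields field1="foo", temp=100.0, and no field2.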

Processors have access to the following context variables and functions:

date

Date parsing and formatting object. Instance of Java SimpleDateFormat.

number

Number parsing and formatting object. Instance of Java DecimalFormat.

faker

Faker object.

redis

Redis commands object. Instance of Lettuce RedisCommands. The replicate command exposes 2 command objects named source and target.

geo

Convenience function that takes a longitude and a latitude to produce a RediSearch geo-location string in the form longitude,latitude (e.g. location=#geo(lon,lat))

Processor example
riot file-import --proc epoch="#date.parse(mydate).getTime()" location="#geo(lon,lat)" name="#redis.hget('person1','lastName')" ...
Faker processor example
riotx file-import http://storage.googleapis.com/jrx/beers.csv --header --proc fakeid="#faker.numerify('########')" hset --keyspace beer --key fakeid

You can register your own variables using --var.

Custom variable example
riotx file-import http://storage.googleapis.com/jrx/lacity.csv --var rnd="new java.util.Random()" --proc randomInt="#rnd.nextInt(100)" --header hset --keyspace event --key Id

Filtering

Filters allow you to exclude records that don’t match a SpEL boolean expression.

For example this filter will only keep records where the value field is a series of digits:

riot file-import --filter "value matches '\\d+'" ...
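Since SpEL's matches operator tests the whole string, the filter above behaves like a full regex match. A Python sketch of the semantics (keep is an illustrative name):

```python
import re

def keep(record, pattern=r"\d+"):
    """Equivalent of --filter "value matches '\\d+'": keep records whose
    value field is entirely digits (SpEL 'matches' anchors the whole string)."""
    return re.fullmatch(pattern, str(record.get("value", ""))) is not None
```

A record with value "12345" passes; "12a" or a missing value field is excluded.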

Exporting

When exporting data from Redis, the following options allow for filtering.

Key Filtering

Key filtering can be done through multiple options in RIOT-X:

--key-pattern

Glob-style pattern used for scan and keyspace notification registration.

--key-type

Type of keys to consider for scan and keyspace notification registration.

--key-include & --key-exclude

Glob-style pattern(s) to further filter keys on the client (RIOT-X) side, i.e. after they are received through scan or keyspace notifications.

--mem-limit

Ignore keys whose memory usage exceeds the given limit. For example --mem-limit 10mb skips keys over 10 MB in size.
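The client-side include/exclude filtering can be sketched in Python using fnmatch, whose glob syntax is close to, though not identical to, Redis glob patterns (a conceptual sketch; client_side_filter is an illustrative name):

```python
from fnmatch import fnmatchcase

def client_side_filter(keys, include=(), exclude=()):
    """Sketch of --key-include / --key-exclude: glob filters applied on the
    RIOT-X side after keys arrive via SCAN or keyspace notifications."""
    result = []
    for key in keys:
        # If include patterns are given, the key must match at least one.
        if include and not any(fnmatchcase(key, p) for p in include):
            continue
        # A key matching any exclude pattern is dropped.
        if any(fnmatchcase(key, p) for p in exclude):
            continue
        result.append(key)
    return result
```

For keys ["beer:1", "beer:2", "user:1"], include=["beer:*"] keeps the two beer keys, and exclude=["user:*"] drops the user key.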

Usage

You can launch RIOT-X with the following command:

riotx

This will show usage help, which you can also get by running:

riotx --help

--help is available on any command:

riotx COMMAND --help

Run the following command to give riotx TAB completion in the current shell:

source <(riotx generate-completion)

Man Page

Data Generation

RIOT-X includes two commands for data generation:

generate

Generate Redis data structures

faker

Import data from Datafaker

Data Structure Generator

The gen command generates Redis data structures, including JSON and TimeSeries.

riot gen [OPTIONS]
Example
riotx gen --type string hash json timeseries

Faker Generator

The faker command generates data using Datafaker.

riot faker [OPTIONS] EXPRESSION... [REDIS COMMAND...]

where EXPRESSION is a Faker expression field in the form field="expression".

To show the full usage, run:

riot faker --help

You must specify at least one Redis command as a target.

Redis connection options apply to the root command (riot) and not to subcommands.

In this example the Redis options will not be taken into account:

riot faker id="numerify '####'" hset -h myredis.com -p 6380
Keys

Keys are constructed from input records by concatenating the keyspace prefix and key fields.
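Key construction can be sketched as follows (a conceptual sketch; build_key is an illustrative name, and the : separator matches the keys shown in the examples below):

```python
def build_key(record, keyspace, key_fields, separator=":"):
    """Concatenate the keyspace prefix and key field values, as in
    'hset --keyspace person --key id' producing keys like person:123."""
    parts = [keyspace] + [str(record[field]) for field in key_fields]
    return separator.join(parts)
```

For a record {"id": 123} with keyspace person and key field id, this yields person:123.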

[Figure: key mapping]
Import into hashes
riotx faker id="numerify '##########'" firstName="name.first_name" lastName="name.last_name" address="address.full_address" hset --keyspace person --key id
Import into sets
riotx faker name="GameOfThrones.character" --count 1000 sadd --keyspace got:characters --member name
Data Providers

Faker offers many data providers. Most providers don’t take any arguments and can be called directly:

Simple Faker example
riot faker firstName="name.first_name"

Some providers take parameters:

Parameter Faker example
riot faker lease="number.digits '2'"

Here are a few sample Faker expressions:

  • regexify '(a|b){2,3}'

  • regexify '\\.\\*\\?\\+'

  • bothify '????','false'

  • name.first_name

  • name.last_name

  • number.number_between '1','10'

Refer to Datafaker Providers for a list of providers and their corresponding documentation.

Databases

RIOT-X includes two commands for interaction with relational databases:

db-import

Import database tables into Redis

db-export

Export Redis data structures to a database

Drivers

RIOT-X relies on JDBC to interact with databases. It includes JDBC drivers for the most common database systems:

Oracle

jdbc:oracle:thin:@myhost:1521:orcl

SQL Server

jdbc:sqlserver://[serverName[\instanceName][:portNumber]][;property=value[;property=value]]

MySQL

jdbc:mysql://[host]:[port][/database][?properties]

Postgres

jdbc:postgresql://host:port/database

Snowflake

jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>

SQLite

jdbc:sqlite:path_to_sqlite_file

Db2

jdbc:db2://host:port/database

For non-included databases, place the JDBC driver jar under the lib directory.

Database Import

The db-import command imports data from a relational database into Redis.

Ensure RIOT-X has the relevant JDBC driver for your database. See the Drivers section for more details.
riot db-import --jdbc-url <jdbc url> -u <Redis URI> SQL [REDIS COMMAND...]

To show the full usage, run:

riot db-import --help

You must specify at least one Redis command as a target.

Redis connection options apply to the root command (riot) and not to subcommands.

The keys that will be written are constructed from input records by concatenating the keyspace prefix and key fields.

[Figure: key mapping]
PostgreSQL Import Example
riotx db-import "SELECT * FROM orders" --jdbc-url "jdbc:postgresql://host:port/database" --jdbc-user appuser --jdbc-pass passwd hset --keyspace order --key order_id
Import from PostgreSQL to JSON strings
riotx db-import "SELECT * FROM orders" --jdbc-url "jdbc:postgresql://host:port/database" --jdbc-user appuser --jdbc-pass passwd set --keyspace order --key order_id

This will produce Redis strings that look like this:

{
  "order_id": 10248,
  "customer_id": "VINET",
  "employee_id": 5,
  "order_date": "1996-07-04",
  "required_date": "1996-08-01",
  "shipped_date": "1996-07-16",
  "ship_via": 3,
  "freight": 32.38,
  "ship_name": "Vins et alcools Chevalier",
  "ship_address": "59 rue de l'Abbaye",
  "ship_city": "Reims",
  "ship_postal_code": "51100",
  "ship_country": "France"
}

Snowflake Import

The snowflake-import command uses a Snowflake STREAM object to track changes (CDC) to a table and read them into a Redis data structure such as a hash or JSON. The Snowflake STREAM is created and managed by RIOT-X. The user credentials you provide must have the ability to create a stream in the database and schema specified by the fully qualified object name.

Side effects and limitations
  • SAMPLE_DATABASE.SAMPLE_SCHEMA.DATA_TABLE_changestream will be created or replaced. For security, this can be created in a different schema than the table you are importing from by specifying --cdc-schema.

  • riotx:offset:SAMPLE_DATABASE.SAMPLE_SCHEMA.DATA_TABLE_changestream - this key is stored in the destination Redis database and is used to track the stream offset. If RIOT-X fails in the middle of copying data from the stream, it resumes copying from this offset when restarted. Removing this offset key from Redis causes RIOT-X to recreate the stream at time "NOW". If the --snapshot-mode INITIAL option is specified, the stream also includes the initial table data plus changes going forward. If you do not want initial table data to be included, specify --snapshot-mode NEVER.

  • snowflake-import currently works on tables and materialized views

The basic usage is:

riotx snowflake-import [TABLE] [OPTIONS] [REDIS COMMAND...]

The recommended minimal necessary permissions for a snowflake role and user to run this command are:

CREATE OR REPLACE ROLE riotx_cdc
COMMENT = 'minimum cdc role for riotx';

-- replace compute_wh with the name of the warehouse you want to use
GRANT USAGE, OPERATE ON WAREHOUSE compute_wh TO ROLE riotx_cdc;

-- replace tb_101.raw_pos_cdc with the name of a database and schema for RIOT to create the stream in
CREATE OR REPLACE SCHEMA tb_101.raw_pos_cdc;
GRANT USAGE ON SCHEMA tb_101.raw_pos_cdc TO ROLE riotx_cdc;

-- replace tb_101 with the name of the database RIOT needs to read out of
GRANT USAGE ON DATABASE tb_101 TO ROLE riotx_cdc;

-- replace tb_101.raw_pos with the name of the schema RIOT needs to read out of
GRANT USAGE ON SCHEMA tb_101.raw_pos TO ROLE riotx_cdc;

-- replace with the name of the table(s) you want to read from
GRANT SELECT ON TABLE tb_101.raw_pos.incremental_order_header TO ROLE riotx_cdc;
GRANT REFERENCE_USAGE ON TABLE tb_101.raw_pos.incremental_order_header TO ROLE riotx_cdc;
ALTER TABLE tb_101.raw_pos.INCREMENTAL_ORDER_HEADER SET CHANGE_TRACKING = TRUE;

GRANT SELECT ON FUTURE TABLES IN SCHEMA tb_101.raw_pos_cdc TO ROLE riotx_cdc;
GRANT CREATE TABLE ON SCHEMA tb_101.raw_pos_cdc TO ROLE riotx_cdc;
GRANT CREATE STREAM ON SCHEMA tb_101.raw_pos_cdc TO ROLE riotx_cdc;
GRANT SELECT ON FUTURE STREAMS IN SCHEMA tb_101.raw_pos_cdc TO ROLE riotx_cdc;

CREATE OR REPLACE USER riotx_cdc
    DEFAULT_ROLE = 'riotx_cdc'
    DEFAULT_WAREHOUSE = 'compute_wh'
    PASSWORD = '{{PASSWORD}}';

GRANT ROLE riotx_cdc TO USER riotx_cdc;

For the full usage, run:

riotx snowflake-import --help
Example: CDC to Hashes

This command uses the example db, schema and table names from the minimal role setup above.

riotx snowflake-import \
      tb_101.raw_pos.incremental_order_header \
      --snapshot-mode INITIAL \
      --role riotx_cdc \
      --warehouse compute_wh \
      --cdc-schema raw_pos_cdc \
      --jdbc-url "jdbc:snowflake://abcdefg.abc12345.snowflakecomputing.com" \
      --jdbc-user databaseuser \
      --jdbc-pass databasepassword \
      --repeat 10s \
      hset \
      --keyspace orderheader \
      --key order_id

Here --snapshot-mode INITIAL includes the initial table data, --repeat 10s sleeps 10 seconds after each CDC import before repeating, and --key order_id is the column to use as the key.

The command above imports CDC data from the Snowflake table tb_101.raw_pos.incremental_order_header into Redis hashes in the keyspace orderheader.

If you only need to do a one-time import of data from Snowflake, you can use the db-import command. It reads all rows returned by your SQL query and writes them to Redis. For more information see the db-import command.

Example: One-time Import
riotx db-import \
      "SELECT * FROM SAMPLE_DATABASE.SAMPLE_SCHEMA.DATA_TABLE" \
      --jdbc-url "jdbc:snowflake://abcdefg.abc12345.snowflakecomputing.com" \
      --jdbc-driver net.snowflake.client.jdbc.SnowflakeDriver \
      --jdbc-user databaseuser \
      --jdbc-pass databasepassword \
      hset \
      --keyspace datatable \
      --key data_id # column name to use as id

This command performs a one-time import from Snowflake using the db-import command.

Database Export

Use the db-export command to read from a Redis database and write to a SQL database.

Ensure RIOT-X has the relevant JDBC driver for your database. See the Drivers section for more details.

The general usage is:

riot db-export --jdbc-url <jdbc url> SQL

To show the full usage, run:

riot db-export --help
Example: export to PostgreSQL
riotx db-export "INSERT INTO mytable (id, field1, field2) VALUES (CAST(:id AS SMALLINT), :field1, :field2)" --jdbc-url "jdbc:postgresql://host:port/database" --jdbc-user appuser --jdbc-pass passwd --key-pattern "gen:*" --key-regex "gen:(?<id>.*)"

Files

RIOT-X includes two commands to work with files in various formats:

file-import

Import data from files

file-export

Export Redis data structures to files

File Import

The file-import command reads data from files and writes it to Redis.

The basic usage for file imports is:

riot file-import [OPTIONS] FILE... [REDIS COMMAND...]

To show the full usage, run:

riot file-import --help

RIOT-X will try to determine the file type from its extension (e.g. .csv or .json), but you can specify it with the --type option.

Gzipped files are supported; the extension before .gz determines the file type (e.g. myfile.json.gz is treated as JSON).

Examples
  • /path/file.csv

  • /path/file-*.csv

  • /path/file.json

  • http://data.com/file.csv

  • http://data.com/file.json.gz

Use - to read from standard input.

Amazon S3 and Google Cloud Storage buckets are supported.

Importing from Amazon S3
riotx file-import s3://riotx/beers.json --s3-region us-west-1 hset --keyspace beer --key id
Importing from Google Cloud Storage
riotx file-import gs://riotx/beers.json hset --keyspace beer --key id
Data Structures

If no REDIS COMMAND is specified, it is assumed that the input file(s) contain Redis data structures serialized as JSON or XML. See the File Export section to learn about the expected format and how to generate such files.

Example
riotx file-import /tmp/redis.json
Redis Commands

When one or more `REDIS COMMAND`s are specified, these commands are called for each input record.

Redis client options apply to the root command (riot) and not to Redis commands.

In this example Redis client options will not be taken into account:

riot file-import my.json hset -h myredis.com -p 6380

Redis command keys are constructed from input records by concatenating keyspace prefix and key fields.

[Figure: key mapping]
Import into hashes with keyspace blah:<id>
riot file-import my.json hset --keyspace blah --key id
Import into JSON
riotx file-import http://storage.googleapis.com/jrx/es_test-index.json json.set --keyspace elastic --key _id
Import into hashes and set TTL on the key
riot file-import my.json hset --keyspace blah --key id expire --keyspace blah --key id
Import into hashes in keyspace blah:<id> and set TTL and add each id to a set named myset
riot file-import my.json hset --keyspace blah --key id expire --keyspace blah --key id sadd --keyspace myset --member id
Delimited (CSV)

The default delimiter character is comma (,). It can be changed with the --delimiter option.

If the file has a header, use the --header option to automatically extract field names. Otherwise specify the field names using the --fields option.
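The --header and --fields behavior can be sketched with Python's csv module (a conceptual sketch, not RIOT-X's parser; read_delimited is an illustrative name):

```python
import csv
import io

def read_delimited(text, delimiter=",", header=True, fields=None):
    """Sketch of delimited file reading: --header takes field names from
    the first row; otherwise --fields supplies them explicitly."""
    if header:
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    else:
        reader = csv.DictReader(io.StringIO(text), fieldnames=fields,
                                delimiter=delimiter)
    return list(reader)
```

With a header row, "id,name\n321,Fireside Chat\n" yields a record whose id field is "321"; without one, the field names come from the fields argument.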

Let’s consider this CSV file:

Table 1. beers.csv

row  abv    ibu  id   name                  style               brewery  ounces
1    0.079  45   321  Fireside Chat (2010)  Winter Warmer       368      12.0
2    0.068  65   173  Back in Black         American Black Ale  368      12.0
3    0.083  35   11   Monk’s Blood          Belgian Dark Ale    368      12.0

The following command imports this CSV into Redis as hashes using beer as the key prefix and id as primary key.

riotx file-import http://storage.googleapis.com/jrx/beers.csv --header hset --keyspace beer --key id

This creates hashes with keys beer:321, beer:173, …​

This command imports a CSV file into a geo set named airportgeo with airport IDs as members:

riotx file-import http://storage.googleapis.com/jrx/airports.csv --header --skip-limit 3 geoadd --keyspace airportgeo --member AirportID --lon Longitude --lat Latitude
Fixed-Length (Fixed-Width)

Fixed-length files can be imported by specifying the width of each field using the --ranges option.

riotx file-import http://storage.googleapis.com/jrx/accounts.fw --type fw --ranges 1 9 25 41 53 67 83 --header hset --keyspace account --key Account
JSON

The expected format for JSON files is:

[
  {
    "...": "..."
  },
  {
    "...": "..."
  }
]
JSON import example
riotx file-import /tmp/redis.json

JSON records are trees with potentially nested values that may need to be flattened, for example when the target is a Redis hash.

To that end, RIOT-X uses a field naming convention to flatten JSON objects and arrays:

Table 2. Nested object

{ "field": { "sub": "value" } } flattens to field.sub=value

Table 3. Array

{ "field": [1, 2, 3] } flattens to field[0]=1 field[1]=2 field[2]=3
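The flattening convention shown in Tables 2 and 3 can be sketched in Python (a minimal sketch; flatten is an illustrative name):

```python
def flatten(obj, prefix=""):
    """Flatten nested JSON into hash fields using the naming convention:
    dots for nested objects, [i] suffixes for array positions."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}[{i}]"))
    else:
        flat[prefix] = obj
    return flat
```

So {"field": {"sub": "value"}} becomes {"field.sub": "value"}, and {"field": [1, 2, 3]} becomes {"field[0]": 1, "field[1]": 2, "field[2]": 3}.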

XML

Here is a sample XML file that can be imported by RIOT-X:

<?xml version="1.0" encoding="UTF-8"?>
<records>
    <trade>
        <isin>XYZ0001</isin>
        <quantity>5</quantity>
        <price>11.39</price>
        <customer>Customer1</customer>
    </trade>
    <trade>
        <isin>XYZ0002</isin>
        <quantity>2</quantity>
        <price>72.99</price>
        <customer>Customer2c</customer>
    </trade>
    <trade>
        <isin>XYZ0003</isin>
        <quantity>9</quantity>
        <price>99.99</price>
        <customer>Customer3</customer>
    </trade>
</records>
XML Import Example
riotx file-import http://storage.googleapis.com/jrx/trades.xml hset --keyspace trade --key id
Parquet

RIOT-X supports Parquet files.

Parquet file import example
riotx file-import s3://riotx/userdata1.parquet --s3-region us-west-1 hset --keyspace user --key id

File Export

The file-export command reads data from a Redis database and writes it to a JSON or XML file, potentially gzip-compressed.

The general usage is:

riot file-export [OPTIONS] FILE

To show the full usage, run:

riot file-export --help
JSON
Export to JSON
riotx file-export /tmp/redis.json
Sample JSON-export file
[
  {
    "key": "string:615",
    "ttl": -1,
    "value": "value:615",
    "type": "STRING"
  },
  {
    "key": "hash:511",
    "ttl": -1,
    "value": {
      "field1": "value511",
      "field2": "value511"
    },
    "type": "HASH"
  },
  {
    "key": "list:1",
    "ttl": -1,
    "value": [
      "member:991",
      "member:981"
    ],
    "type": "LIST"
  },
  {
    "key": "set:2",
    "ttl": -1,
    "value": [
      "member:2",
      "member:3"
    ],
    "type": "SET"
  },
  {
    "key": "zset:0",
    "ttl": -1,
    "value": [
      {
        "value": "member:1",
        "score": 1.0
      }
    ],
    "type": "ZSET"
  },
  {
    "key": "stream:0",
    "ttl": -1,
    "value": [
      {
        "stream": "stream:0",
        "id": "1602190921109-0",
        "body": {
          "field1": "value0",
          "field2": "value0"
        }
      }
    ],
    "type": "STREAM"
  }
]
Export to compressed JSON
riotx file-export /tmp/beers.json.gz --key-pattern beer:*
XML
Export to XML
riotx file-export /tmp/redis.xml
Parquet
Parquet file export example
riotx file-export beers.parquet --parquet-field ounces=DOUBLE abv=DOUBLE id=INT32

Memcached Replication

The memcached-replicate command reads data from a source Memcached database and writes to a target Memcached database.

riotx memcached-replicate SOURCE TARGET [OPTIONS]

For the full usage, run:

riotx memcached-replicate --help
Example
riotx memcached-replicate mydb.cache.amazonaws.com:11211 mydb-12211.redis.com:12211 --source-tls

Redis Import

The redis-import command reads data from a Redis database and writes it to another Redis database.

The basic usage is:

riotx redis-import [OPTIONS] [REDIS COMMAND...]

For the full usage, run:

riotx redis-import --help
Example: migrate hashes to JSON
riotx redis-import --target-uri redis://localhost:6380 --key-pattern 'hash:*' --key-regex 'hash:(?<id>.+)' json.set --keyspace doc --key id --remove

Replication

The replicate command reads data from a source Redis database and writes to a target Redis database.

[Figure: replication architecture]

The replication mechanism is as follows:

  1. Identify source keys to be replicated using scan and/or keyspace notifications depending on the replication mode.

  2. Read data associated with each key using dump or type-specific commands.

  3. Write each key to the target using restore or type-specific commands.
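The three steps can be sketched with in-memory stand-ins for the source and target databases (a conceptual sketch; real replication issues SCAN, DUMP/PTTL, and RESTORE over the network, and the dict-based stores here are illustrative):

```python
def replicate_snapshot(source, target):
    """Step 1: scan source keys; step 2: read the serialized value and TTL
    (as DUMP and PTTL would); step 3: write each to the target (as RESTORE
    would). 'source' and 'target' map key -> (payload, ttl_ms)."""
    replicated = 0
    for key in list(source.keys()):      # scan
        payload, ttl = source[key]       # dump + ttl
        target[key] = (payload, ttl)     # restore with ttl
        replicated += 1
    return replicated
```

After running this against a two-key source, the target holds identical (payload, ttl) pairs.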

The basic usage is:

riot replicate [OPTIONS] SOURCE TARGET

where SOURCE and TARGET are Redis URIs.

For the full usage, run:

riot replicate --help
To replicate a Redis logical database other than the default (0), specify the database in the source Redis URI. For example riot replicate redis://source:6379/1 redis://target:6379 replicates database 1.

Replication Mode

Replication starts with identifying keys to be replicated from the source Redis database. The --mode option specifies how RIOT-X identifies those keys:

  • by iterating over keys with a key scan (--mode scan)

  • by subscribing to keyspace notifications (--mode liveonly)

  • by both (--mode live)

Scan

This key reader scans for keys using the Redis SCAN command:

SCAN cursor [MATCH pattern] [COUNT count] [TYPE type]
MATCH pattern

configured with the --key-pattern option

TYPE type

configured with the --key-type option

COUNT count

configured with the --scan-count option

INFO: In cluster mode keys are scanned in parallel across cluster nodes.

The status bar shows progress with a percentage of keys that have been replicated. The total number of keys is estimated when the replication process starts and it can change by the time it is finished, for example if keys are deleted or added during replication.

Scan replication example
riotx replicate redis://source redis://target
Live

The key notification reader listens for key changes using keyspace notifications.

Make sure the source database has keyspace notifications enabled using:

  • redis.conf: notify-keyspace-events = KEA

  • CONFIG SET notify-keyspace-events KEA

For more details see Redis Keyspace Notifications.

Live replication example
riotx replicate --mode live redis://source redis://target

The live replication mechanism does not guarantee data consistency. Redis sends keyspace notifications over pub/sub, which does not provide guaranteed delivery. RIOT-X may therefore miss some notifications, for example in case of network failures.

Also, depending on the type, size, and rate of change of data structures on the source, RIOT-X may not be able to keep up with the change stream. For example, if a big set is repeatedly updated, RIOT-X needs to read the whole set on each update and transfer it to the target database. With a big enough set, RIOT-X could fall behind and the internal queue could fill up, leading to updates being dropped.

For potentially problematic migrations it is recommended to perform some preliminary sizing using Redis statistics and bigkeys/memkeys in tandem with --mem-limit. If you need assistance, please contact your Redis account team.

Replication Types

RIOT-X offers two different mechanisms for reading and writing keys:

  • Dump & restore (default)

  • Data structure replication (--struct)

Dump & Restore

The default replication mechanism is Dump & Restore:

  1. Scan for keys in the source Redis database. If live replication is enabled the reader also subscribes to keyspace notifications to generate a continuous stream of keys.

  2. Reader threads iterate over the keys to read corresponding values (DUMP) and TTLs.

  3. Reader threads enqueue key/value/TTL tuples into the reader queue, from which the writer dequeues key/value/TTL tuples and writes them to the target Redis database by calling RESTORE and EXPIRE.

Data Structure Replication

There are situations where Dump & Restore cannot be used, for example:

  • The target Redis database does not support the RESTORE command (Redis Enterprise CRDB)

  • Incompatible DUMP formats between source and target (Redis 7.0)

In those cases you can use a data structure-specific replication strategy: each key is introspected to determine its type, and the corresponding type-specific read/write commands are used.

Type        Read      Write
Hash        HGETALL   HSET
JSON        JSON.GET  JSON.SET
List        LRANGE    RPUSH
Set         SMEMBERS  SADD
Sorted Set  ZRANGE    ZADD
Stream      XRANGE    XADD
String      GET       SET
TimeSeries  TS.RANGE  TS.ADD

This replication strategy is more intensive in terms of CPU, memory, and network for all machines involved (source Redis, target Redis, and the RIOT-X machine). Adjust the number of threads, batch size, and queue size accordingly.
Type-based replication example
riotx replicate --struct redis://source redis://target
Live type-based replication example
riotx replicate --struct --mode live redis://source redis://target

Compare

Once replication is complete, RIOT-X performs a verification step by reading keys in the source database and comparing them against the target database.

The verification step happens automatically after the scan is complete (snapshot replication), or for live replication when keyspace notifications have become idle.

Verification can also be run on-demand using the compare command:

riot compare SOURCE TARGET [OPTIONS]

The output looks like this:

Verification failed (type: 225,062, missing: 485,450)
missing

Number of keys in source but not in target.

type

Number of keys with mismatched types (e.g. hash vs string).

value

Number of keys with mismatched values.

ttl

Number of keys with mismatched TTLs, i.e. where the difference is greater than the tolerance (specified with --ttl-tolerance).

There are two comparison modes, available through the --compare option (or --quick for the compare command):

Quick (default)

Compare key types and TTLs.

Full

Compare key types, TTLs, and values.

To show which keys differ, use the --show-diffs option.
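The quick and full comparison modes can be sketched in Python (a conceptual sketch; the per-key record layout of (type, ttl_ms, value) and the ttl_tolerance_ms default are illustrative assumptions):

```python
def compare(source, target, full=False, ttl_tolerance_ms=100):
    """Sketch of the compare step. Quick mode checks key types and TTLs;
    full mode also checks values. Returns counts per mismatch category."""
    result = {"missing": 0, "type": 0, "ttl": 0, "value": 0}
    for key, (ktype, ttl, value) in source.items():
        if key not in target:
            result["missing"] += 1       # in source but not in target
            continue
        ttype, tttl, tvalue = target[key]
        if ktype != ttype:
            result["type"] += 1          # e.g. hash vs string
        elif abs(ttl - tttl) > ttl_tolerance_ms:
            result["ttl"] += 1           # TTL difference beyond tolerance
        elif full and value != tvalue:
            result["value"] += 1         # full mode only
    return result
```

Given a source with three keys where one is absent from the target, one has a different type, and one has a different value, full comparison reports missing=1, type=1, value=1.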

Performance

Performance tuning is an art, but RIOT-X offers some options to identify potential bottlenecks. In addition to the --batch and --threads options, the --dry-run option disables writing to the target Redis database so that you can tune the reader in isolation. Add it to your existing replicate command line to compare replication speeds with and without writing to the target Redis database.

Stats

The stats command analyzes the Redis database and displays keyspace statistics as well as keys that could be problematic during a live replication.

The basic usage is:

riotx stats [OPTIONS]

For the full usage, run:

riotx stats --help
--mem <size>

Memory usage threshold above which a key is considered big.

--rate <size>

Write bandwidth above which a key is considered problematic.

Example: Keys over 3MB in memory usage and 10MB in write rate
riotx stats --mem 3mb --rate 10mb

Stream

Stream Import

The stream-import command reads data from a stream and writes it to Redis.

The basic usage is:

riotx stream-import STREAM...

For the full usage, run:

riotx stream-import --help
Example: Import stream into hashes
riotx stream-import stream:beers --idle-timeout 1s hset --keyspace beer --key id

Stream Export

The stream-export command enables Redis CDC to a Redis stream.

riotx stream-export SOURCE TARGET [OPTIONS]

For the full usage, run:

riotx stream-export --help
Example: Export stream to another Redis instance
riotx stream-export redis://localhost:6379 redis://localhost:6380 --mode live

redis-cli -p 6380 xread COUNT 3 STREAMS stream:export 0-0
1) 1) "stream:export"
   2) 1) 1) "1718645537588-0"
         2)  1) "key"
             2) "order:4"
             3) "time"
             4) "1718645537000"
             5) "type"
             6) "hash"
             7) "ttl"
             8) "-1"
             9) "mem"
            10) "136"
            11) "value"
            12) "{\"order_date\":\"2024-06-13 22:19:35.143797\",\"order_id\":\"4\"}"

Cookbook

Here are various recipes using RIOT-X.

Observability

RIOT-X exposes several metrics over a Prometheus endpoint that can be useful for troubleshooting and performance tuning.

Getting Started

The riotx-dist repository includes a Docker Compose configuration that sets up Prometheus and Grafana.

git clone https://github.com/redis-field-engineering/riotx-dist.git
cd riotx-dist
docker compose up

Prometheus is configured to scrape the host every second.

You can access the Grafana dashboard at localhost:3000.

Now start RIOT-X with the following command:

riotx replicate ... --metrics

This will enable the Prometheus metrics exporter endpoint and will populate the Grafana dashboard.
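You can also verify the endpoint without Grafana by scraping it directly (by default it listens on port 8080). Below is a small sketch of parsing the Prometheus text exposition format; it uses a hard-coded sample instead of a live HTTP scrape so it is self-contained, and it ignores metric labels for brevity:

```python
# Sample scrape output in the Prometheus text exposition format.
# A live check would fetch http://localhost:8080/metrics instead.
sample = """\
# HELP riotx_replication_lag_seconds Replication end-to-end latency
# TYPE riotx_replication_lag_seconds summary
riotx_replication_lag_seconds_count 1500.0
riotx_replication_lag_seconds_sum 12.5
"""

def parse_metrics(text):
    """Parse unlabeled samples into a name -> value dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE metadata
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

m = parse_metrics(sample)
print(m["riotx_replication_lag_seconds_count"])  # prints: 1500.0
```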

Configuration

Use the --metrics* options to enable and configure metrics:

--metrics

Enable metrics

--metrics-jvm

Enable JVM and system metrics

--metrics-redis

Enable command latency metrics. See https://github.com/redis/lettuce/wiki/Command-Latency-Metrics#micrometer

--metrics-name=<name>

Application name tag that will be applied to all metrics

--metrics-port=<int>

Port that Prometheus HTTP server should listen on (default: 8080)

--metrics-prop=<k=v>

Additional properties to pass to the Prometheus client. See https://prometheus.github.io/client_java/config/config/

Metrics

Below you can find a list of all metrics declared by RIOT-X.

riotx dashboard replication
Replication Metrics
Name Type Description

riotx_replication_bytes_total

Counter

Number of bytes replicated (needs memory usage with --mem-limit)

riotx_replication_lag_seconds

Summary

Replication end-to-end latency

riotx_replication_read_latency_seconds

Summary

Replication read latency

spring_batch_chunk_write_seconds

Timer

Batch writing duration

spring_batch_item_process_seconds

Timer

Item processing duration

spring_batch_item_read_seconds

Timer

Item reading duration

spring_batch_job_active_seconds

Timer

Active jobs

spring_batch_job_launch_count_total

Counter

Job launch count

spring_batch_redis_key_event_queue_capacity

Gauge

Gauge reflecting the remaining capacity of the queue

spring_batch_redis_key_event_queue_size

Gauge

Gauge reflecting the size (depth) of the queue

spring_batch_redis_key_scan_total

Counter

Number of keys scanned

spring_batch_redis_operation_seconds

Timer

Operation execution duration

spring_batch_redis_read_chunk

Gauge

Gauge reflecting the chunk size of the reader

spring_batch_redis_read_queue_capacity

Gauge

Gauge reflecting the remaining capacity of the queue

spring_batch_redis_read_queue_size

Gauge

Gauge reflecting the size (depth) of the queue

JVM Metrics

Use the --metrics-jvm option to enable the following additional metrics:

riotx dashboard jvm
Name Type Description

jvm_buffer_count_buffers

Gauge

An estimate of the number of buffers in the pool

jvm_buffer_memory_used_bytes

Gauge

An estimate of the memory that the Java virtual machine is using for this buffer pool

jvm_buffer_total_capacity_bytes

Gauge

An estimate of the total capacity of the buffers in this pool

jvm_gc_concurrent_phase_time_seconds

Timer

Time spent in concurrent phase

jvm_gc_live_data_size_bytes

Gauge

Size of long-lived heap memory pool after reclamation

jvm_gc_max_data_size_bytes

Gauge

Max size of long-lived heap memory pool

jvm_gc_memory_allocated_bytes_total

Gauge

Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next

jvm_gc_memory_promoted_bytes_total

Counter

Count of positive increases in the size of the old generation memory pool before GC to after GC

jvm_gc_pause_seconds

Timer

Time spent in GC pause

jvm_memory_committed_bytes

Gauge

The amount of memory in bytes that is committed for the Java virtual machine to use

jvm_memory_max_bytes

Gauge

The maximum amount of memory in bytes that can be used for memory management

jvm_memory_used_bytes

Gauge

The amount of used memory

jvm_threads_daemon_threads

Gauge

The current number of live daemon threads

jvm_threads_live_threads

Gauge

The current number of live threads including both daemon and non-daemon threads

jvm_threads_peak_threads

Gauge

The peak live thread count since the Java virtual machine started or peak was reset

jvm_threads_started_threads_total

Counter

The total number of application threads started in the JVM

jvm_threads_states_threads

Gauge

The current number of threads

process_cpu_time_ns_total

Counter

The "cpu time" used by the Java Virtual Machine process

process_cpu_usage

Gauge

The "recent cpu usage" for the Java Virtual Machine process

process_start_time_seconds

Gauge

Start time of the process since unix epoch.

process_uptime_seconds

Gauge

The uptime of the Java virtual machine

system_cpu_count

Gauge

The number of processors available to the Java virtual machine

system_cpu_usage

Gauge

The "recent cpu usage" of the system the application is running in

system_load_average_1m

Gauge

The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time

Changelog

You can use RIOT-X to stream change data from a Redis database.

Streaming to stdout
riotx file-export --mode live
{"key":"gen:1","type":"string","time":1718050552000,"ttl":-1,"memoryUsage":300003376}
{"key":"gen:3","type":"string","time":1718050552000,"ttl":-1,"memoryUsage":300003376}
{"key":"gen:6","type":"string","time":1718050552000,"ttl":-1,"memoryUsage":300003376}
...
Streaming to a file
riotx file-export export.json --mode live
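Each change record is one line of JSON, so the live stream is easy to post-process. A minimal Python sketch of filtering the records, e.g. keeping only string keys; the field names follow the sample output above, and the hard-coded lines stand in for reading stdin or the export file:

```python
import json

# Sample change records as emitted by file-export --mode live
# (first line copied from the output above, second is made up).
lines = [
    '{"key":"gen:1","type":"string","time":1718050552000,"ttl":-1,"memoryUsage":300003376}',
    '{"key":"sess:2","type":"hash","time":1718050553000,"ttl":3600,"memoryUsage":1024}',
]

# Decode each line and keep only records for string keys.
records = [json.loads(line) for line in lines]
strings = [r for r in records if r["type"] == "string"]

print([r["key"] for r in strings])  # prints: ['gen:1']
```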

ElastiCache Migration

This recipe contains step-by-step instructions to migrate an ElastiCache (EC) database to Redis Cloud or Redis Software.

The following scenarios are covered:

  • One-time (snapshot) migration

  • Online (live) migration

It is recommended to read the Replication section to familiarize yourself with its usage and architecture.

Setup

Prerequisites

For this recipe you will require the following resources:

  • AWS ElastiCache: Primary Endpoint in case of Single Master and Configuration Endpoint in case of Clustered EC. Refer to this link to learn more

  • Redis Cloud or Redis Software

  • An Amazon EC2 instance to run RIOT-X

Keyspace Notifications

For a live migration you need to enable keyspace notifications on your ElastiCache instance (see AWS Knowledge Center).

Migration Host

To run the migration tool we will need an EC2 instance.

You can either create a new EC2 instance or leverage an existing one if available. In the example below we first create an instance on AWS. The most common scenario is to access an ElastiCache cluster from an Amazon EC2 instance in the same Amazon Virtual Private Cloud (Amazon VPC). We used Ubuntu 16.04 LTS for this setup, but you can choose any Ubuntu or Debian distribution.

SSH to this EC2 instance from your laptop:

ssh -i <private key file> <AWS EC2 instance>

Install redis-cli on this new instance by running this command:

sudo apt update
sudo apt install -y redis-tools

Use redis-cli to check connectivity with the ElastiCache database:

redis-cli -h <ec primary endpoint> -p 6379

Ensure that the above command allows you to connect to the remote ElastiCache database successfully.

Installing RIOT-X

Let’s install RIOT-X on the EC2 instance we set up previously. For this we’ll follow the steps in Manual Installation.

Performing Migration

We are now all set to begin the migration process. The options you will use depend on your source and target databases, as well as the replication mode (snapshot or live).

ElastiCache Single Master → Redis
riotx replicate source:port target:port
Live ElastiCache Single Master → Redis
riotx replicate source:port target:port --mode live

In case ElastiCache is configured with AUTH TOKEN enabled, you need to pass --source-tls as well as --source-pass option:

riotx replicate source:port target:port --source-tls --source-pass <password>
ElastiCache Cluster → Redis
riotx replicate source:port target:port --source-cluster
--source-cluster is required whenever cluster mode is enabled on the source ElastiCache deployment. Note that the source database is always specified first and the target database second after the replicate command; this applies to all scenarios.
ElastiCache Single Master → Redis (with specific database index)
riotx replicate redis://source:port/db target:port
ElastiCache Single Master → Redis with OSS Cluster
riotx replicate source:port target:port --target-cluster
Live ElastiCache Cluster → Redis with OSS Cluster
riotx replicate source:port target:port --source-cluster --target-cluster --mode live

Important Considerations

  • It is recommended to test the migration in UAT before production use.

  • Once the migration is complete, ensure that application traffic is redirected to the new Redis endpoint successfully.

  • It is recommended to perform the migration during low-traffic hours to reduce the chance of data loss.

Connectivity Test

The ping command can be used to test connectivity to a Redis database.

riotx ping [OPTIONS]

For the full usage, run:

riotx ping --help

The command prints statistics like these:

riotx ping -h localhost --unit microseconds
[min=491, max=14811, percentiles={99.9=14811, 90.0=1376, 95.0=2179, 99.0=14811, 50.0=741}]
[min=417, max=1286, percentiles={99.9=1286, 90.0=880, 95.0=1097, 99.0=1286, 50.0=606}]
[min=382, max=2244, percentiles={99.9=2244, 90.0=811, 95.0=1036, 99.0=2244, 50.0=518}]
...
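The statistics above are latency percentiles over repeated round trips. As a rough illustration of how such figures relate to raw samples, here is a generic nearest-rank percentile sketch in Python; this is a common textbook estimator, not necessarily the one the ping command uses, and the sample latencies are taken loosely from the output above:

```python
def percentile(samples, p):
    """Nearest-rank percentile (generic method) over a list of samples."""
    s = sorted(samples)
    rank = max(1, round(p / 100 * len(s)))  # 1-based rank
    return s[rank - 1]

# Round-trip latencies in microseconds (illustrative values).
latencies_us = [491, 741, 606, 1376, 2179, 14811, 518, 880]

print({p: percentile(latencies_us, p) for p in (50, 90, 99)})
# prints: {50: 741, 90: 2179, 99: 14811}
```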

Best Practices

This section contains best practices and recipes for various RIOT-X use cases.

Replication Performance Tuning

replication architecture

The replicate command reads from a source Redis database and writes to a target Redis database.

Replication Bottleneck

To optimize throughput it is necessary to understand the two main possible scenarios:

Slow Producer

In this scenario the reader cannot read from the source as fast as the writer can write to the target. The writer is starved, so we should look into ways to speed up the reader.

Slow Consumer

In this scenario the writer cannot keep up with the reader, so we should look into optimizing writes.

There are two ways to identify which scenario we fall into:

No-op writer

With the --dry-run option the replication process uses a no-op writer instead of a Redis writer. If throughput with --dry-run is similar to throughput without it, the writer is not the bottleneck; follow the steps below to improve reader throughput.

Reader queue utilization

Using the Grafana dashboard you can monitor reader queue depth. Low queue utilization means the writer can keep up with the reader; queue utilization close to 100% means writes are slower than reads.
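Queue utilization can be derived from the reader-queue gauges listed in the Metrics section; a small Python sketch of the calculation (the sample gauge values are made up):

```python
# The reader queue gauges exported by RIOT-X (names from the Metrics section):
#   spring_batch_redis_read_queue_size      - size (depth) of the queue
#   spring_batch_redis_read_queue_capacity  - remaining capacity of the queue
# Utilization = size / (size + remaining capacity).
def queue_utilization(size, remaining_capacity):
    total = size + remaining_capacity
    return size / total if total else 0.0

# Sample values: 9000 queued items with 1000 slots left means the queue
# is 90% utilized, i.e. writes are barely keeping up with reads.
print(queue_utilization(9000, 1000))  # prints: 0.9
```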

Reader

To improve reader performance tweak the options below until you reach optimal throughput.

--read-threads

How many value reader threads to use in parallel (default: 1).

--read-batch

Number of values each reader thread should read in a single pipelined call (default: 50).

--read-queue

Capacity of the reader queue (default: 10000). When the queue is full the threads wait for space to become available. Increase this value if you have peaky traffic on the source database causing fluctuating reader throughput.

--source-pool

Number of Redis connections to the source database (default: 8). Keep in sync with the number of read threads to have a dedicated connection per thread.

Writer

To improve writer performance you can tweak the following options:

--batch

Number of items written in a single network round-trip to the Redis server (i.e. number of commands in the pipeline).

--threads

How many write operations can be performed concurrently (default: 1).

--target-pool

Number of Redis connections to the target database (default: 8). Keep in sync with the number of threads to have a dedicated connection per thread.

System Requirements

Operating System

RIOT-X works on all major operating systems but has been tested at scale on Linux x86-64 platforms.

CPU

CPU usage by RIOT-X varies greatly depending on the specific replication settings and data structures at play. You can monitor CPU usage with the supplied Grafana dashboard (process_cpu_usage metric).

Disk

RIOT-X has no specific disk requirements since all state is kept in memory.

Memory

Memory requirements for RIOT-X itself are light. Being JVM-based, its default initial heap size depends on available system memory and the operating system.

If you have very intensive replication requirements you will need to increase the JVM heap size. To estimate the worst-case memory requirement you can use the formula keySize * queueSize, where:

keySize

average key size as reported by the MEMORY USAGE command

queueSize

Redis reader queue capacity configured with the --read-queue option

Conversely, if you need to minimize memory used by RIOT-X you can lower the reader queue size, possibly at the expense of reader throughput.
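The worst-case estimate above is simple arithmetic; a quick sketch, where the 4 KB average key size is an assumed example value and the queue size is the --read-queue default:

```python
# Worst-case heap headroom for queued values: keySize * queueSize.
def worst_case_heap_bytes(avg_key_size_bytes, read_queue_capacity):
    return avg_key_size_bytes * read_queue_capacity

# Example: 4 KB average key size (as reported by MEMORY USAGE)
# with the default --read-queue capacity of 10000.
estimate = worst_case_heap_bytes(4096, 10_000)
print(f"{estimate / 2**20:.0f} MiB")  # prints: 39 MiB
```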

Network

RIOT-X replication is essentially a network bridge between the source and target Redis databases, so the underlying network is crucial for overall throughput; a 10 Gigabit network is the recommended minimum. Network latency also impacts replication (and other RIOT-X use cases). Make sure the host running RIOT-X offers minimal latency to both the source and target databases. You can test latency using the ping command.

CRDB

Active/Active Redis databases (CRDB) require special consideration. If your target database is a CRDB deployment you need to use the data-structure replication type (--struct), as the RESTORE command is not supported in CRDB.

FAQ

  1. Logs are cut off or missing

    This could be due to concurrency issues in the terminal when refreshing the progress bar and displaying logs. Try running with job option --progress log.

  2. Unknown options: '--keyspace', '--key'

    You must specify one or more Redis commands with import commands (file-import, faker, db-import).

  3. ERR DUMP payload version or checksum are wrong

    Redis 7 DUMP format is not backwards compatible with previous versions. To replicate between different Redis versions, use Type-Based Replication.

  4. ERR Unsupported Type 0

    The target database is most likely CRDB in which case you need to use type-based replication (--struct option).

  5. Process gets stuck during replication and eventually times out

    This could be due to big keys clogging the replication pipes, in which case it might be hard to catch the offending key(s). Try running the same command with --info and --progress log so that all errors are reported. Check the database with redis-cli --bigkeys and/or use reader options to filter these keys out.

  6. NOAUTH Authentication required

    This issue occurs when you fail to supply the --pass <password> parameter.

  7. ERR The ID argument cannot be a complete ID because xadd-id-uniqueness-mode is strict

    This usually happens in Active/Active (CRDB) setups where stream message IDs cannot be copied over to the target database. Use the --no-stream-id option to disable ID propagation.

  8. ERR Error running script…​ This Redis command is not allowed from scripts

    This can happen with Active/Active (CRDB) databases because the MEMORY USAGE command is not allowed to run from a Lua script. Use the --mem-limit -1 option to disable memory usage measurement.

  9. java.lang.OutOfMemoryError: Java heap space

    The RIOT-X JVM ran out of memory. If you are running db-import this could be due to a large result set being loaded upfront. Use the --fetch option to set a fixed fetch size (e.g. --fetch 1000). Otherwise increase the max JVM heap size (export JAVA_OPTS="-Xmx8g") or reduce RIOT-X memory usage by lowering threads, batch, read-batch and read-queue.