Introduction
Redis Input/Output Tools (RIOT) is a command-line utility to get data in and out of any Redis-compatible database like Redis OSS, Redis Cluster, Redis Enterprise, Redis Cloud, or Amazon ElastiCache.
RIOT includes the following features:
- File import/export (CSV, JSON, XML)
- Relational database import/export
- Data generation using Faker or random data structures
- Data migration from one Redis database to another
- Live replication between two Redis databases
Quick Start
This section helps you get started with RIOT.
Install
RIOT can be installed on Linux, macOS, and Windows. It is a standalone tool that connects to a Redis database remotely, so it does not need to run on the Redis server itself.
macOS (Homebrew):
brew install redis-developer/tap/riot
Windows (Scoop):
scoop bucket add redis-developer https://github.com/redis-developer/scoop.git
scoop install riot
Manual installation: download the pre-compiled binary from the releases page, uncompress it, and copy it to the desired location.
Docker:
docker run fieldengineering/riot [OPTIONS] [COMMAND]
Files
RIOT can import from and export to files in various formats:
- Delimited (CSV, TSV, PSV)
- Fixed-length (also known as fixed-width)
- JSON and JSONL (JSON Lines)
- XML
File Import
The file-import command reads data from files and writes it to Redis.
The basic usage for file imports is:
riot -h <host> -p <port> file-import FILE... [REDIS COMMAND...]
To show the full usage, run:
riot file-import --help
You must specify at least one Redis command as a target.
Redis connection options apply to the root command (riot) and not to subcommands, so they must be placed before file-import. Redis options placed after the subcommand will not be taken into account.
The keys that will be written are constructed from input records by concatenating the keyspace prefix and key fields.
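The key construction rule can be sketched in a few lines of Python. This is an illustration, not RIOT's implementation: build_key is a hypothetical helper, the ':' separator is inferred from keys such as blah:<id> and beer:321 in this documentation, and joining several key fields with the same separator is an assumption.

```python
def build_key(record: dict, keyspace: str, key_fields: list, separator: str = ":") -> str:
    """Hypothetical helper: the keyspace prefix followed by each key field's value.

    Assumes ':' as the separator (as in keys like beer:321) and that multiple
    key fields are joined with the same separator.
    """
    parts = [keyspace] + [str(record[field]) for field in key_fields]
    return separator.join(parts)


record = {"id": 321, "name": "Fireside Chat (2010)", "style": "Winter Warmer"}
key = build_key(record, "blah", ["id"])
print(key)  # blah:321
```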
Import into hashes with keys blah:<id>:
riot file-import my.json hset --keyspace blah --keys id
Import into JSON documents with keys elastic:<_id>:
riot file-import https://storage.googleapis.com/jrx/es_test-index.json json.set --keyspace elastic --keys _id
Import into hashes and set a TTL on each key:
riot file-import my.json hset --keyspace blah --keys id expire --keyspace blah --keys id
Import into hashes with keys blah:<id>, set a TTL on each key, and add each id to a set named myset:
riot file-import my.json hset --keyspace blah --keys id expire --keyspace blah --keys id sadd --keyspace myset --members id
Paths
Paths can include wildcard patterns.
RIOT will try to determine the file type from its extension (e.g. .csv or .json), but you can specify it with the --filetype option.
Gzipped files are supported and the extension before .gz is used (e.g. myfile.json.gz → JSON type).
Examples:
- /path/file.csv
- /path/file-*.csv
- /path/file.json
- http://data.com/file.csv
- http://data.com/file.json.gz
Use - to read from standard input.
For AWS S3 buckets you can specify access and secret keys as well as the region for the bucket.
riot file-import s3://my-bucket/path/file.json --s3-region us-west-1 --s3-access xxxxxx --s3-secret xxxxxx
For Google Cloud Storage you can specify credentials and project id for the bucket:
riot file-import gs://my-bucket/path/file.json --gcs-key-file key.json --gcs-project-id my-gcp-project
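The type detection described in this section (use the extension, looking past a trailing .gz) can be sketched as follows. The extension table and detect_file_type are hypothetical. They only illustrate the rule, and RIOT's own type names and supported extensions may differ. A location without a recognizable extension, such as - for standard input, yields no type, which is presumably where --filetype becomes necessary.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical extension-to-type table for illustration; RIOT's own names may differ.
FILE_TYPES = {
    ".csv": "csv", ".tsv": "tsv", ".psv": "psv",
    ".json": "json", ".jsonl": "jsonl", ".xml": "xml",
}


def detect_file_type(location: str) -> Tuple[Optional[str], bool]:
    """Return (file type or None, gzipped?) based on the path's extension."""
    # urlparse keeps local paths intact and extracts the path part of s3://, gs://, http:// URLs
    path = PurePosixPath(urlparse(location).path)
    gzipped = path.suffix.lower() == ".gz"
    if gzipped:
        path = path.with_suffix("")  # myfile.json.gz -> myfile.json
    return FILE_TYPES.get(path.suffix.lower()), gzipped


print(detect_file_type("http://data.com/file.json.gz"))  # ('json', True)
```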
Delimited
The default delimiter character is comma (,). It can be changed with the --delimiter option.
If the file has a header, use the --header option to automatically extract field names. Otherwise specify the field names using the --fields option.
Let’s consider this CSV file:
row | abv | ibu | id | name | style | brewery | ounces |
---|---|---|---|---|---|---|---|
1 | 0.079 | 45 | 321 | Fireside Chat (2010) | Winter Warmer | 368 | 12.0 |
2 | 0.068 | 65 | 173 | Back in Black | American Black Ale | 368 | 12.0 |
3 | 0.083 | 35 | 11 | Monk’s Blood | Belgian Dark Ale | 368 | 12.0 |
The following command imports this CSV into Redis as hashes using beer as the key prefix and id as primary key.
riot file-import https://storage.googleapis.com/jrx/beers.csv --header hset --keyspace beer --keys id
This creates hashes with keys beer:321, beer:173, …
This command imports a CSV file into a geo set named airportgeo with airport IDs as members:
riot file-import https://storage.googleapis.com/jrx/airports.csv --header --skip-limit 3 geoadd --keyspace airportgeo --members AirportID --lon Longitude --lat Latitude
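To make the roles of --header, --fields, and --delimiter concrete, here is a Python sketch using the standard csv module on the first two beer rows. It only approximates what the import does (every value is read as a string, and the beer:<id> keys mimic hset --keyspace beer --keys id); read_delimited is a hypothetical helper.

```python
import csv
import io


def read_delimited(text, delimiter=",", header=True, fields=None):
    """Parse delimited text into one dict per row.

    header=True  -> field names come from the first row (like --header)
    header=False -> field names must be supplied (like --fields)
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=None if header else fields, delimiter=delimiter)
    return list(reader)


beers_csv = (
    "row,abv,ibu,id,name,style,brewery,ounces\n"
    "1,0.079,45,321,Fireside Chat (2010),Winter Warmer,368,12.0\n"
    "2,0.068,65,173,Back in Black,American Black Ale,368,12.0\n"
)
rows = read_delimited(beers_csv)
# Mimic "hset --keyspace beer --keys id": one hash per row, keyed beer:<id>
hashes = {f"beer:{row['id']}": row for row in rows}
print(list(hashes))  # ['beer:321', 'beer:173']
```

For a pipe-delimited file without a header, the same helper would be called with delimiter="|", header=False, and an explicit list of field names.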
Fixed-Length
Fixed-length files can be imported by specifying where each field is located in the line using the --ranges option.
riot file-import https://storage.googleapis.com/jrx/accounts.fw --ranges 1 9 25 41 53 67 83 --header hset --keyspace account --keys Account
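The --ranges values can be read as column positions. The sketch below assumes each number is the 1-based start column of a field, that each field ends where the next one begins, and that the last field runs to the end of the line. This interpretation is an assumption, and parse_fixed_width is a hypothetical helper.

```python
def parse_fixed_width(line, starts):
    """Split a fixed-width line given 1-based start columns (assumed meaning of --ranges)."""
    values = []
    for i, start in enumerate(starts):
        # Each field ends just before the next one starts; the last runs to the end of the line.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else len(line)
        values.append(line[start - 1:end].strip())  # drop the padding spaces
    return values


print(parse_fixed_width("ACC2    Bob     7", [1, 9, 17]))  # ['ACC2', 'Bob', '7']
```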
JSON
The expected format for JSON files is:
[
{
"...": "..."
},
{
"...": "..."
}
]
riot file-import https://storage.googleapis.com/jrx/beers.json hset --keyspace beer --keys id
JSON records are trees with potentially nested values that need to be flattened when the target is, for example, a Redis hash.
To that end, RIOT uses a field naming convention to flatten JSON objects and arrays:
{ "field": { "sub": "value" } } → field.sub=value
{ "field": [1, 2, 3] } → field[0]=1 field[1]=2 field[2]=3
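The flattening convention can be sketched as a short recursive function. It assumes dot notation for nested object fields and [index] suffixes for array elements. RIOT's exact handling of edge cases (empty arrays or objects, keys that contain dots) is not covered.

```python
def flatten(value, prefix="", out=None):
    """Flatten nested dicts/lists into a single-level dict of path -> leaf value.

    Assumed convention: object fields joined with '.', array elements as [index].
    """
    if out is None:
        out = {}
    if isinstance(value, dict):
        for name, child in value.items():
            flatten(child, f"{prefix}.{name}" if prefix else name, out)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flatten(child, f"{prefix}[{index}]", out)
    else:
        out[prefix] = value
    return out


print(flatten({"field": {"sub": "value"}}))  # {'field.sub': 'value'}
print(flatten({"field": [1, 2, 3]}))         # {'field[0]': 1, 'field[1]': 2, 'field[2]': 3}
```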
XML
Here is a sample XML file that can be imported by RIOT:
<?xml version="1.0" encoding="UTF-8"?>
<records>
<trade>
<isin>XYZ0001</isin>
<quantity>5</quantity>
<price>11.39</price>
<customer>Customer1</customer>
</trade>
<trade>
<isin>XYZ0002</isin>
<quantity>2</quantity>
<price>72.99</price>
<customer>Customer2c</customer>
</trade>
<trade>
<isin>XYZ0003</isin>
<quantity>9</quantity>
<price>99.99</price>
<customer>Customer3</customer>
</trade>
</records>
riot file-import https://storage.googleapis.com/jrx/trades.xml hset --keyspace trade --keys id
Regular Expressions
In addition to general processing you can perform field extraction using regular expressions.
riot file-import --regex name="(?<first>\w+)\/(?<last>\w+)" ...
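Field extraction with named groups can be sketched in Python as below. Python's re module spells named groups (?P<name>...), whereas the command above uses the Java-style (?<name>...). Whether RIOT requires the pattern to match the whole field, and whether it keeps the source field, are assumptions here (partial match allowed, source field kept); extract is a hypothetical helper.

```python
import re

# Same pattern as the command above, in Python's named-group syntax
NAME_REGEX = re.compile(r"(?P<first>\w+)/(?P<last>\w+)")


def extract(record, field, pattern):
    """Hypothetical helper: add the named groups matched in record[field] as new fields."""
    match = pattern.search(str(record.get(field, "")))
    return {**record, **match.groupdict()} if match else dict(record)


print(extract({"name": "Jane/Doe"}, "name", NAME_REGEX))
# {'name': 'Jane/Doe', 'first': 'Jane', 'last': 'Doe'}
```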
File Export
The file-export command reads data from a Redis database and writes it to a JSON or XML file, optionally gzip-compressed.
The general usage is:
riot -h <host> -p <port> file-export FILE
To show the full usage, run:
riot file-export --help
Export to JSON:
riot file-export /tmp/redis.json
Sample JSON export file:
[
{
"key": "string:615",
"ttl": -1,
"value": "value:615",
"type": "STRING"
},
{
"key": "hash:511",
"ttl": -1,
"value": {
"field1": "value511",
"field2": "value511"
},
"type": "HASH"
},
{
"key": "list:1",
"ttl": -1,
"value": [
"member:991",
"member:981"
],
"type": "LIST"
},
{
"key": "set:2",
"ttl": -1,
"value": [
"member:2",
"member:3"
],
"type": "SET"
},
{
"key": "zset:0",
"ttl": -1,
"value": [
{
"value": "member:1",
"score": 1.0
}
],
"type": "ZSET"
},
{
"key": "stream:0",
"ttl": -1,
"value": [
{
"stream": "stream:0",
"id": "1602190921109-0",
"body": {
"field1": "value0",
"field2": "value0"
}
}
],
"type": "STREAM"
}
]
Export keys matching beer:* to a gzip-compressed JSON file:
riot file-export /tmp/beers.json.gz --scan-match beer:*
Export to XML:
riot file-export /tmp/redis.xml
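Choosing gzip compression from the file name, and the shape of the exported records, can be sketched with Python's standard library. The record layout (key, ttl, value, type) is taken from the sample export above. open_text is a hypothetical helper, and the file is written to a temporary directory purely for illustration.

```python
import gzip
import json
import os
import tempfile


def open_text(path, mode):
    """Open a text file, transparently gzip-compressing when the name ends with .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


# Two records in the same layout as the sample export
records = [
    {"key": "string:615", "ttl": -1, "value": "value:615", "type": "STRING"},
    {"key": "zset:0", "ttl": -1, "value": [{"value": "member:1", "score": 1.0}], "type": "ZSET"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "redis.json.gz")
    with open_text(path, "w") as f:
        json.dump(records, f, indent=2)
    with open_text(path, "r") as f:
        restored = json.load(f)

print(restored == records)  # True
```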
Dump Import
RIOT can import Redis data structure files in JSON or XML format. See the File Export section to generate such files.
riot dump-import /tmp/redis.json
Generators
RIOT includes two data generators that can be used to quickly mock up a dataset in Redis.
Faker
The faker-import command generates data using Datafaker.
riot -h <host> -p <port> faker-import SPEL... [REDIS COMMAND...]
where SPEL is a Spring Expression Language field in the form field="expression".
To show the full usage, run:
riot faker-import --help
You must specify at least one Redis command as a target.
Redis connection options apply to the root command (riot) and not to subcommands, so they must be placed before faker-import. Redis options placed after the subcommand will not be taken into account.
Keys
Keys are constructed from input records by concatenating the keyspace prefix and key fields.
riot faker-import id="index" firstName="name.firstName" lastName="name.lastName" address="address.fullAddress" hset --keyspace person --keys id
riot faker-import name="gameOfThrones.character" --count 1000 sadd --keyspace got:characters --members name
Data Providers
Faker offers many data providers. Most providers don’t take any arguments and can be called directly:
riot faker-import firstName="name.firstName"
Some providers take parameters:
riot faker-import lease="number.digits(2)"
Refer to Datafaker Providers for complete documentation.
Built-in Fields
In addition to the Faker fields specified with field="expression", you can use these built-in fields:
- index: current iteration number.
- thread: current thread id. Useful for multithreaded data generation.
riot faker-import --count 8000 --threads 8 id="thread*10000+index" firstName="name.firstName" lastName="name.lastName" address="address.fullAddress" hset --keyspace person --keys id
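The expression id="thread*10000+index" keeps ids from different threads apart as long as each (thread, index) pair is unique and index stays below 10000. The sketch below checks this for 8 threads and 8000 records, assuming threads are numbered from 0 and each thread counts its own index from 0. If index is instead global across threads, the ids are still unique because the pairs remain distinct. generate_ids is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

COUNT = 8000
THREADS = 8


def generate_ids(thread: int, per_thread: int) -> list:
    # Assumption: threads are numbered 0..THREADS-1 and each counts its own index from 0.
    return [thread * 10000 + index for index in range(per_thread)]


with ThreadPoolExecutor(max_workers=THREADS) as pool:
    chunks = pool.map(generate_ids, range(THREADS), [COUNT // THREADS] * THREADS)
    ids = [i for chunk in chunks for i in chunk]

print(len(ids), len(set(ids)))  # 8000 8000
```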
Databases
RIOT can import from and export to databases.
Database Import
The db-import command imports data from a relational database into Redis.
Ensure RIOT has the relevant JDBC driver for your database. See the Database Drivers section for more details.
riot -h <redis host> -p <redis port> db-import --url <jdbc url> SQL [REDIS COMMAND...]
To show the full usage, run:
riot db-import --help
You must specify at least one Redis command as a target.
Redis connection options apply to the root command (riot) and not to subcommands, so they must be placed before db-import. Redis options placed after the subcommand will not be taken into account.
The keys that will be written are constructed from input records by concatenating the keyspace prefix and key fields.
Import into hashes:
riot db-import "SELECT * FROM orders" --url "jdbc:postgresql://host:port/database" --username appuser --password passwd hset --keyspace order --keys order_id
Import into strings:
riot db-import "SELECT * FROM orders" --url "jdbc:postgresql://host:port/database" --username appuser --password passwd set --keyspace order --keys order_id
This will produce Redis strings that look like this:
{
"order_id": 10248,
"customer_id": "VINET",
"employee_id": 5,
"order_date": "1996-07-04",
"required_date": "1996-08-01",
"shipped_date": "1996-07-16",
"ship_via": 3,
"freight": 32.38,
"ship_name": "Vins et alcools Chevalier",
"ship_address": "59 rue de l'Abbaye",
"ship_city": "Reims",
"ship_postal_code": "51100",
"ship_country": "France"
}
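Conceptually, db-import turns each result row into a record whose column names become field names. The sketch below imitates that with Python's built-in sqlite3 module in place of PostgreSQL and a trimmed-down orders table. The hash variant stores each column as a string field, and the string variant serializes the whole row as JSON, as in the sample above. This is an illustration only; RIOT itself reads through JDBC.

```python
import json
import sqlite3

# sqlite3 stands in for the JDBC source; the table is a trimmed-down version of "orders".
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like mappings of column name -> value
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, freight REAL, ship_city TEXT)")
conn.execute("INSERT INTO orders VALUES (10248, 'VINET', 32.38, 'Reims')")

hashes, strings = {}, {}
for row in conn.execute("SELECT * FROM orders"):
    record = dict(row)
    key = f"order:{record['order_id']}"                   # --keyspace order --keys order_id
    hashes[key] = {k: str(v) for k, v in record.items()}  # hset: one string field per column
    strings[key] = json.dumps(record)                      # set: whole row serialized as JSON

print(strings["order:10248"])
```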
Database Export
Use the db-export command to read from a Redis database and write to a SQL database.
The general usage is:
riot -h <redis host> -p <redis port> db-export --url <jdbc url> SQL
To show the full usage, run:
riot db-export --help
riot db-export "INSERT INTO mytable (id, field1, field2) VALUES (CAST(:id AS SMALLINT), :field1, :field2)" --url "jdbc:postgresql://host:port/database" --username appuser --password passwd --scan-match "gen:*" --key-regex "gen:(?<id>.*)"
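The named parameters in the INSERT statement (:id, :field1, :field2) are filled from each key's fields, with id captured from the key name by --key-regex. The sketch below imitates this with sqlite3, which also accepts :name placeholders, and an in-memory dict standing in for the gen:* hashes. It assumes the captured group is merged into the hash fields before the insert.

```python
import re
import sqlite3

KEY_REGEX = re.compile(r"gen:(?P<id>.*)")  # Python spelling of "gen:(?<id>.*)"
# In-memory stand-in for the hashes matched by --scan-match "gen:*"
redis_hashes = {
    "gen:1": {"field1": "a", "field2": "b"},
    "gen:2": {"field1": "c", "field2": "d"},
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id SMALLINT, field1 TEXT, field2 TEXT)")
sql = "INSERT INTO mytable (id, field1, field2) VALUES (CAST(:id AS SMALLINT), :field1, :field2)"

for key, fields in redis_hashes.items():
    match = KEY_REGEX.fullmatch(key)
    if match:
        # Assumed: regex groups are merged with the hash fields to bind the named parameters
        conn.execute(sql, {**fields, **match.groupdict()})

rows = conn.execute("SELECT id, field1, field2 FROM mytable ORDER BY id").fetchall()
print(rows)  # [(1, 'a', 'b'), (2, 'c', 'd')]
```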
Database Drivers
RIOT relies on JDBC to interact with databases. It includes JDBC drivers for the most common database systems:
- Oracle: jdbc:oracle:thin:@myhost:1521:orcl
- SQL Server: jdbc:sqlserver://[serverName[\instanceName][:portNumber]][;property=value[;property=value]]
- MySQL: jdbc:mysql://[host]:[port][/database][?properties]
- PostgreSQL: jdbc:postgresql://host:port/database
For non-included databases you must install the corresponding JDBC driver under the
Replication
Most Redis migration tools available today are offline in nature. For example, migrating data from AWS ElastiCache to Redis Enterprise Cloud means backing up your ElastiCache data to an AWS S3 bucket and importing it into Redis Enterprise Cloud using its UI.
Redis has a replication command called REPLICAOF but it is not always available (see ElastiCache restrictions). Instead, RIOT implements client-side replication using dump & restore or type-based read & write. Both snapshot and live replication modes are supported.
Please note that RIOT is NEITHER recommended NOR officially supported by Redis, Inc.
Usage
riot <source> replicate <target> --mode <snapshot|live|compare> [--type] [OPTIONS]
For the full usage, run:
riot replicate --help
Replication Mode
Live
This mode builds upon snapshot replication by listening for changes on the source Redis database. Whenever a key is modified, its corresponding value is read and propagated to the target Redis database.
Live replication relies on keyspace notifications for capturing these changes.
Make sure the source database has keyspace notifications enabled, using either:
- redis.conf: notify-keyspace-events KA
- CONFIG SET notify-keyspace-events KA
For more details see Redis Keyspace Notifications.
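Live replication has to turn these notifications into keys to re-read. Redis publishes keyspace events (the K flag) on channels named __keyspace@<db>__:<key>, with the event name as the message. The sketch parses such channel names and collapses repeated updates to the same key. The deduplication step and keys_to_replicate are assumptions for illustration, not a description of RIOT's internals.

```python
import re

# Keyspace events arrive on channels named __keyspace@<db>__:<key>,
# with the event name (set, hset, expire, del, ...) as the message.
KEYSPACE_CHANNEL = re.compile(r"__keyspace@(?P<db>\d+)__:(?P<key>.*)", re.DOTALL)


def keys_to_replicate(notifications, db=0):
    """Return distinct keys (first-seen order) whose value must be re-read from the source."""
    keys = []
    for channel, _event in notifications:
        match = KEYSPACE_CHANNEL.fullmatch(channel)
        if match and int(match.group("db")) == db:
            keys.append(match.group("key"))
    return list(dict.fromkeys(keys))  # assumed: repeated updates to one key collapse into one read


notifications = [
    ("__keyspace@0__:user:1", "hset"),
    ("__keyspace@0__:user:2", "set"),
    ("__keyspace@0__:user:1", "expire"),
    ("__keyspace@1__:other", "set"),
]
print(keys_to_replicate(notifications))  # ['user:1', 'user:2']
```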
The live replication mechanism does not guarantee data consistency. Redis sends keyspace notifications over pub/sub, which does not provide guaranteed delivery, so RIOT can miss some notifications, for example in case of network failures. Also, depending on the type, size, and rate of change of data structures on the source, RIOT may not be able to keep up with the change stream. For example, if a big set is repeatedly updated, RIOT needs to read the whole set on each update and transfer it over to the target database. With a big-enough set, RIOT could fall behind and the internal queue could fill up, leading to updates being dropped. For those potentially problematic migrations it is recommended to perform some preliminary sizing using Redis statistics and
Replication Type
Dump & Restore
The default replication mechanism is Dump & Restore:
- Scan for keys in the source Redis database. If live replication is enabled, the reader also subscribes to keyspace notifications to generate a continuous stream of keys.
- Reader threads iterate over the keys to read corresponding values (DUMP) and TTLs.
- Reader threads enqueue key/value/TTL tuples into the reader queue, from which the writer dequeues them and writes them to the target Redis database by calling RESTORE and EXPIRE.
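The three steps above form a producer/consumer pipeline, sketched below with Python threads and a bounded queue. In-memory dicts stand in for the source and target databases, and the comments mark where SCAN, DUMP plus a TTL lookup, and RESTORE/EXPIRE would be issued. The bounded queue illustrates how a slow writer makes readers wait. All names and sizes are hypothetical.

```python
import queue
import threading

# In-memory stand-ins: key -> (opaque DUMP payload, TTL in ms, -1 meaning no expiry)
source_db = {f"key:{i}": (f"payload-{i}".encode(), -1 if i % 2 else 60_000) for i in range(100)}
target_db = {}

DONE = object()                       # sentinel each reader enqueues when it runs out of keys
work_queue = queue.Queue(maxsize=10)  # bounded: readers block while the writer catches up


def reader(keys):
    for key in keys:
        payload, ttl = source_db[key]   # stands in for DUMP plus a TTL lookup
        work_queue.put((key, payload, ttl))
    work_queue.put(DONE)


def writer(reader_count):
    finished = 0
    while finished < reader_count:
        item = work_queue.get()
        if item is DONE:
            finished += 1
            continue
        key, payload, ttl = item
        target_db[key] = (payload, ttl)  # stands in for RESTORE (and EXPIRE when ttl > 0)


keys = list(source_db)  # stands in for SCAN on the source
READERS = 3
threads = [threading.Thread(target=reader, args=(keys[i::READERS],)) for i in range(READERS)]
threads.append(threading.Thread(target=writer, args=(READERS,)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(target_db))  # 100
```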
Type-Based Replication
There are situations where Dump & Restore cannot be used, for example:
- The target Redis database does not support the RESTORE command (Redis Enterprise CRDB)
- Incompatible DUMP formats between source and target (Redis 7.0)
In those cases you can use another replication strategy called Type-Based Replication where each key is introspected to determine the type of data structure and which read/write commands to use:
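Type | Read | Write |
---|---|---|
Hash | HGETALL | HSET |
JSON | JSON.GET | JSON.SET |
List | LRANGE | RPUSH |
Set | SMEMBERS | SADD |
Sorted Set | ZRANGE | ZADD |
Stream | XRANGE | XADD |
String | GET | SET |
TimeSeries | TS.RANGE | TS.ADD |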
This replication strategy is more CPU-, memory-, and network-intensive, both for the machines running RIOT and for the source and target Redis databases. Adjust the number of threads, batch size, and queue size accordingly.
Snapshot replication with a batch size of 10:
riot -h source -p 6379 replicate -h target -p 6380 --batch 10
Live replication:
riot -h source -p 6379 replicate -h target -p 6380 --mode live
Type-based replication:
riot -h source -p 6379 replicate -h target -p 6380 --type
Live type-based replication:
riot -h source -p 6379 replicate -h target -p 6380 --type --mode live
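Type-based replication has to rebuild each value with type-specific write commands. The sketch below generates those commands for the core types, keyed by the type names the TYPE command returns. The write command per type is an assumption mirroring common usage, module types (JSON, TimeSeries) and streams are not covered, and deleting the target key first (so that RPUSH, SADD, HSET, or ZADD do not merge into stale data) is a design choice of the sketch rather than documented RIOT behavior.

```python
# Assumed write command per core Redis type, keyed by the names TYPE returns.
WRITE_COMMANDS = {
    "string": "SET",
    "hash": "HSET",
    "list": "RPUSH",
    "set": "SADD",
    "zset": "ZADD",
}


def write_commands(key, redis_type, value):
    """Return the commands that would recreate `value` under `key` on the target."""
    if redis_type not in WRITE_COMMANDS:
        raise NotImplementedError(f"sketch does not cover type {redis_type!r}")
    if redis_type == "string":
        args = [value]
    elif redis_type == "hash":
        args = [item for pair in value.items() for item in pair]  # field1, value1, field2, ...
    elif redis_type == "zset":
        # value is a list of (member, score) pairs; ZADD expects the score first
        args = [item for member, score in value for item in (score, member)]
    else:
        args = list(value)  # list elements or set members
    # Delete first so the write rebuilds the value instead of merging into a stale key.
    return [["DEL", key], [WRITE_COMMANDS[redis_type], key, *args]]


print(write_commands("z", "zset", [("a", 1.0), ("b", 2.0)]))
# [['DEL', 'z'], ['ZADD', 'z', 1.0, 'a', 2.0, 'b']]
```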
Progress Reporting
Each process (scan iterator and/or event listener in case of live replication) has a corresponding status bar that shows the process name and its progress:
- Scanning: percentage of keys that have been replicated ⇒ replicated / total. The total number of keys is calculated when the process starts and can change by the time it finishes (for example if keys are deleted or added during the replication), so the progress bar is only a rough indicator.
- Listening: progress is indefinite as the total number of keys is unknown.
Compare
Once replication is complete, RIOT performs a verification step by iterating over keys in the source database and comparing values and TTLs between source and target databases.
The verification step happens automatically after the scan is complete (snapshot replication), or for live replication when keyspace notifications have become idle (see Usage section).
Verification can also be run on demand using the compare mode:
riot <source> replicate --mode compare <target>
The output looks like this:
missing: 123, type: 54, value: 7, TTL: 19
- missing: number of keys only present in the source database
- type: number of keys with mismatched data structure type
- value: number of keys with mismatched value
- TTL: number of keys with mismatched TTL, i.e. whose TTL difference is greater than the tolerance (configurable with --ttl-tolerance)
There are two comparison modes, selected with the --compare-mode option:
- Quick (default): compares key types
- Full: compares key types, values, and TTLs
To show which keys are different, use the --show-diffs option:
riot <source> replicate <target> --show-diffs
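The counters in the compare output can be illustrated with a small Python function over in-memory snapshots that map each key to (type, value, TTL in milliseconds). The 100 ms default tolerance, the order in which mismatches are checked (a key is counted once, under the first mismatch found), and the quick/full switch are assumptions for illustration; compare is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class CompareReport:
    missing: int = 0
    type_mismatch: int = 0
    value_mismatch: int = 0
    ttl_mismatch: int = 0


def compare(source, target, ttl_tolerance_ms=100, full=True):
    """Count differences between two snapshots of key -> (type, value, ttl_ms).

    full=False mimics the quick mode (types only); full=True also checks values and TTLs.
    """
    report = CompareReport()
    for key, (src_type, src_value, src_ttl) in source.items():
        if key not in target:
            report.missing += 1
            continue
        dst_type, dst_value, dst_ttl = target[key]
        if src_type != dst_type:
            report.type_mismatch += 1
        elif full and src_value != dst_value:
            report.value_mismatch += 1
        elif full and abs(src_ttl - dst_ttl) > ttl_tolerance_ms:
            report.ttl_mismatch += 1
    return report


source = {
    "a": ("string", "1", -1),
    "b": ("hash", {"f": "v"}, -1),
    "c": ("string", "x", 5000),
    "d": ("list", ["1"], -1),
    "e": ("string", "y", -1),
}
target = {
    "b": ("hash", {"f": "w"}, -1),
    "c": ("string", "x", 5050),  # within tolerance
    "d": ("set", ["1"], -1),
    "e": ("string", "y", 9000),
}
print(compare(source, target))
```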
Performance Tuning
Performance tuning is an art, but RIOT offers some options to identify potential bottlenecks.
In addition to the --batch and --threads options, the --dry-run option disables writing to the target Redis database so that you can tune the reader in isolation.
Add that option to your existing replicate command line to compare replication speeds with and without writing to the target Redis database:
riot <source> replicate <target> --dry-run
Architecture
RIOT is essentially an ETL tool where data is extracted from the source system, transformed (see Processing), and loaded into the target system.
Batching
Processing in RIOT is done in batches: a fixed number of records is read from the source, processed, and written to the target.
The default batch size is 50, which means that an execution step reads 50 items at a time from the source, processes them, and finally writes them to the target.
If the target is Redis, each batch is written in a single pipeline (Redis Pipelining) to minimize the number of round trips to the server.
You can change the batch size (and hence pipeline size) using the --batch option.
The optimal batch size in terms of throughput depends on many factors like record size and command types (see Redis Pipeline Tuning for details).
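The batching loop can be sketched in Python: items are grouped into chunks of at most the batch size, each chunk is processed, and a stand-in writer counts one round trip per chunk, as a pipelined writer would. CountingPipelineWriter is hypothetical. With 120 records and a batch size of 50 it performs three round trips instead of 120.

```python
from itertools import islice


def batches(items, size=50):
    """Yield consecutive lists of at most `size` items (50 mirrors the documented default)."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


class CountingPipelineWriter:
    """Hypothetical stand-in for a pipelined Redis writer: one round trip per batch."""

    def __init__(self):
        self.round_trips = 0
        self.written = []

    def write(self, batch):
        self.round_trips += 1
        self.written.extend(batch)


records = [{"id": i} for i in range(120)]
writer = CountingPipelineWriter()
for batch in batches(records, size=50):
    processed = [dict(r, key=f"order:{r['id']}") for r in batch]  # the "process" step
    writer.write(processed)

print(writer.round_trips)  # 3 (batches of 50, 50, and 20)
```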
Multi-threading
It is possible to parallelize processing by using multiple threads. In that configuration, each chunk of items is read, processed, and written in a separate thread of execution. This is different from partitioning where items would be read by multiple readers. Here, only one reader is being accessed from multiple threads.
To set the number of threads, use the --threads option.
riot db-import "SELECT * FROM orders" --url "jdbc:postgresql://host:port/database" --username appuser --password passwd --threads 3 hset --keyspace order --keys order_id
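The difference from partitioning can be sketched in Python: several threads share one reader, and a lock makes each read hand out a disjoint chunk, which the thread then processes and writes on its own. SynchronizedReader and the doubling step are hypothetical stand-ins.

```python
import threading
from itertools import islice


class SynchronizedReader:
    """One shared reader; the lock guarantees each item goes to exactly one thread."""

    def __init__(self, items):
        self._iterator = iter(items)
        self._lock = threading.Lock()

    def read_chunk(self, size):
        with self._lock:
            return list(islice(self._iterator, size))


shared_reader = SynchronizedReader(range(1000))
written = []
written_lock = threading.Lock()


def worker(batch_size=50):
    while chunk := shared_reader.read_chunk(batch_size):  # read a chunk
        processed = [item * 2 for item in chunk]           # process it (placeholder step)
        with written_lock:                                 # write it
            written.extend(processed)


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(written))  # 1000
```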
Processing
RIOT lets you transform incoming records using processors.
These processors allow you to create/update/delete fields using the Spring Expression Language (SpEL).
For example, import commands like file-import, db-import, and faker-import have a --proc option that allows field-level processing:
- field1='foo' → generate a field named field1 containing the string foo
- temp=(temp-32)*5/9 → convert from Fahrenheit to Celsius
- name=remove(first).concat(remove(last)) → concatenate first and last fields and delete them
- field2=null → delete field2
Input fields are accessed by name (e.g. field3=field1+field2).
Processors have access to the following context variables and functions:
- date: date parsing and formatting object. Instance of Java SimpleDateFormat.
- redis: Redis commands object. Instance of Lettuce RedisCommands.
- geo: convenience function that takes a longitude and a latitude to produce a RediSearch geo-location string in the form longitude,latitude (e.g. location=#geo(lon,lat))
riot file-import --proc epoch="#date.parse(mydate).getTime()" location="#geo(lon,lat)" name="#redis.hget('person1','lastName')" ...
You can register your own variables using --var.
riot file-import https://storage.googleapis.com/jrx/lacity.csv --var rnd="new java.util.Random()" --proc randomInt="#rnd.nextInt(100)" --header hset --keyspace event --keys Id
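To make the processor examples concrete, here is a plain-Python equivalent of the expressions listed above, applied to one record. It is a sketch for illustration only: RIOT evaluates SpEL, not Python, and geo here merely mirrors the documented longitude,latitude output.

```python
def geo(lon, lat):
    """Mirror of the #geo helper: a 'longitude,latitude' string."""
    return f"{lon},{lat}"


def process(record):
    out = dict(record)
    out["field1"] = "foo"                             # field1='foo'
    out["temp"] = (out["temp"] - 32) * 5 / 9          # temp=(temp-32)*5/9
    out["name"] = out.pop("first") + out.pop("last")  # name=remove(first).concat(remove(last))
    out.pop("field2", None)                           # field2=null
    out["location"] = geo(out["lon"], out["lat"])     # location=#geo(lon,lat)
    return out


result = process({"temp": 212, "first": "Ada", "last": "Lovelace", "field2": "x", "lon": -118.24, "lat": 34.05})
print(result["temp"], result["name"], result["location"])  # 100.0 AdaLovelace -118.24,34.05
```

One difference to keep in mind: Python's / always yields a float, whereas SpEL follows Java semantics, so (temp-32)*5/9 on an integer temp may truncate the result.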
Filtering
Filters allow you to exclude records that don’t match a SpEL boolean expression.
For example, this filter will only keep records where the value field is a series of digits:
riot file-import --filter "value matches '\\d+'" ...
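The filter keeps a record only when the whole value is a series of digits: SpEL's matches operator, like Java's String.matches, is understood here to require a full match. A Python equivalent therefore uses re.fullmatch, with re.ASCII so that \d matches only 0-9 as in Java's default. keep is a hypothetical helper, and treating a missing field as a non-match is an assumption.

```python
import re

DIGITS = re.compile(r"\d+", re.ASCII)


def keep(record):
    """Keep the record only if its whole 'value' field is digits (missing field: dropped)."""
    return DIGITS.fullmatch(str(record.get("value", ""))) is not None


records = [{"value": "123"}, {"value": "12a"}, {"value": ""}, {"other": "1"}]
kept = [r for r in records if keep(r)]
print(kept)  # [{'value': '123'}]
```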
Replication
The basic replication mechanism is as follows:
- Identify source keys to be replicated using scan and/or keyspace notifications depending on the replication mode.
- Read data associated with each key using dump or type-specific commands.
- Write each key to the target using restore or type-specific commands.
Cookbook
Here are various recipes using RIOT.
Ping
The ping command can be used to test connectivity to a Redis database.
riot -h <host> -p <port> ping <options>
When the command is complete you will see statistics like these:
[min=0, max=19, percentiles={50.0=1, 90.0=3, 95.0=6, 99.0=10, 99.9=17}]
Migrating from ElastiCache
This recipe contains step-by-step instructions to migrate an ElastiCache (EC) database to Redis Enterprise (RE).
The following scenarios are covered:
- One-time (snapshot) migration
- Online (live) migration
It is recommended to read the Replication section to familiarize yourself with its usage and architecture.
Setup
Prerequisites
For this recipe you will require the following resources:
- AWS ElastiCache: the Primary Endpoint for a single-master deployment, or the Configuration Endpoint for a clustered EC deployment. Refer to the AWS ElastiCache documentation to learn more.
- Redis Enterprise: hosted in the cloud or on-premises
- An Amazon EC2 instance
Keyspace Notifications
For a live migration you need to enable keyspace notifications on your ElastiCache instance (see AWS Knowledge Center).
Migration Host
To run the migration tool we will need an EC2 instance.
You can either create a new EC2 instance or use an existing one if available. In this example we create a new instance on AWS. The most common scenario is to access an ElastiCache cluster from an Amazon EC2 instance in the same Amazon Virtual Private Cloud (Amazon VPC). We used Ubuntu 16.04 LTS for this setup, but you can choose any Ubuntu or Debian distribution.
SSH to this EC2 instance from your laptop:
ssh -i <private key file> <AWS EC2 Instance>
Install redis-cli on this new instance by running these commands:
sudo apt update
sudo apt install -y redis-tools
Use redis-cli to check connectivity with the ElastiCache database:
redis-cli -h <ec primary endpoint> -p 6379
Make sure this command connects to the remote ElastiCache database successfully.
Installing RIOT
Let’s install RIOT on the EC2 instance we set up previously. For this we’ll follow the steps in Manual Installation.
Performing Migration
We are now all set to begin the migration process. The options you will use depend on your source and target databases, as well as the replication mode (snapshot or live).
EC Single Master → RE
riot -h <source EC host> -p <source EC port> replicate -h <target RE host> -p <target RE port> --pass <RE password>
Live EC Single Master → RE
riot -h <source EC host> -p <source EC port> replicate --mode live -h <target RE host> -p <target RE port> --pass <RE password>
In case ElastiCache is configured with AUTH TOKEN enabled, you need to pass
EC Cluster → RE
riot -h <source EC host> -p <source EC port> --cluster replicate -h <target RE host> -p <target RE port> --pass <RE password>
--cluster is an important parameter used ONLY for ElastiCache whenever cluster-mode is enabled.
Note that in all scenarios the source database is specified first, and the target database is specified after the replicate command.
EC Single Master → RE (with specific db index)
riot -h <source EC host> -p <source EC port> --db <index> replicate -h <target RE host> -p <target RE port> --pass <RE password>
Important Considerations
- As stated earlier, this tool is not officially supported by Redis, Inc.
- It is recommended to test the migration in UAT before using it in production.
- Once the migration is complete, ensure that application traffic is successfully redirected to the Redis Enterprise endpoint.
- It is recommended to perform the migration during periods of low traffic to reduce the risk of data loss.
Frequently Asked Questions
- Logs are cut off or missing
  This could be due to concurrency issues in the terminal when refreshing the progress bar and displaying logs. Try running with the job option --progress log.
- Unknown options: '--keyspace', '--keys'
  You must specify one or more Redis commands with import commands (file-import, faker-import, db-import).
- ERR DUMP payload version or checksum are wrong
  The Redis 7 DUMP format is not backwards compatible with previous versions. To replicate between different Redis versions, use Type-Based Replication.
- Process gets stuck during replication and eventually times out
  This could be due to big keys clogging the replication pipes. In these cases it might be hard to catch the offending key(s). Try running the same command with --info and --progress log so that all errors are reported. Check the database for big keys with redis-cli --bigkeys and/or use reader options to filter these keys out.
- NOAUTH Authentication required
  This issue occurs when you fail to supply the --pass <password> parameter.