New – Export Amazon DynamoDB Table Data to Your Data Lake in Amazon S3, No Code Writing Required
Hundreds of thousands of AWS customers have chosen Amazon DynamoDB for mission-critical workloads since its launch in
2012. DynamoDB is a nonrelational managed database that allows you to store a virtually infinite amount of data and retrieve it
with single-digit-millisecond performance at any scale.
To get the most value out of this data, customers had to rely on AWS Data Pipeline, Amazon EMR, or other solutions based on
DynamoDB Streams. These solutions typically require building custom applications with high read throughput, resulting in
expensive maintenance and operational costs.
Today we are launching a new functionality that allows you to export DynamoDB table data to Amazon Simple Storage Service
(Amazon S3) – no code writing required.
This is a new native feature of DynamoDB, so it works at any scale without you having to manage servers or clusters, and it allows you to export data across AWS Regions and accounts to any point in time in the last 35 days, at per-second granularity. Plus, it doesn’t affect the read capacity or the availability of your production tables.
Once your data is exported to S3 — in DynamoDB JSON or Amazon Ion format — you can query or reshape it with your favorite
tools such as Amazon Athena, Amazon SageMaker, and AWS Lake Formation.
In this article, I’ll show you how to export a DynamoDB table to S3 and query it via Amazon Athena with standard SQL.
You can start by selecting a table in the DynamoDB console and clicking Export to S3 in the Streams and exports tab.
Unless you’ve already enabled continuous backups, you must enable them on the next page by clicking Enable PITR.
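If you prefer the CLI, continuous backups can also be enabled with a single command. Here’s a sketch, where TABLE_NAME is a placeholder for your table.
Bash
# Enable point-in-time recovery (PITR) for the table
aws dynamodb update-continuous-backups \
    --table-name TABLE_NAME \
    --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true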
You provide your bucket name in Destination S3 bucket, for example s3://my-dynamodb-export-bucket. Keep in mind that your bucket could also be in another account or another AWS Region.
Feel free to have a look at Additional settings, where you can configure a specific point in time, the output format, and the encryption key. I’m going to use the default settings.
Now you can confirm the export request by clicking Export.
The export process begins and you can monitor its status in the Streams and exports tab.
Once the export process completes, you’ll find a new AWSDynamoDB folder in your S3 bucket and a sub-folder corresponding to
the Export ID.
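To see what was written, you can list the exported objects with the AWS CLI. This is a quick sketch; the bucket name matches the one chosen above, and the exported data files land under a data/ sub-folder.
Bash
# List everything under the AWSDynamoDB folder created by the export
aws s3 ls s3://my-dynamodb-export-bucket/AWSDynamoDB/ --recursive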
If you prefer the AWS CLI, you can request the same export with the export-table-to-point-in-time command.
Bash
aws dynamodb export-table-to-point-in-time \
    --table-arn TABLE_ARN \
    --s3-bucket BUCKET_NAME \
    --export-time 1596232100 \
    --s3-prefix demo_prefix \
    --export-format DYNAMODB_JSON
{
    "ExportDescription": {
        "ExportArn": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/TABLE_NAME/export/EXPORT_ID",
        "ExportStatus": "IN_PROGRESS",
        "StartTime": 1596232631.799,
        "TableArn": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/TABLE_NAME",
        "ExportTime": 1596232100.0,
        "S3Bucket": "BUCKET_NAME",
        "S3Prefix": "demo_prefix",
        "ExportFormat": "DYNAMODB_JSON"
    }
}
After requesting an export, you’ll have to wait until the ExportStatus is COMPLETED. You can check it with the list-exports command.
Bash
aws dynamodb list-exports
{
    "ExportSummaries": [
        {
            "ExportArn": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/TABLE_NAME/export/EXPORT_ID",
            "ExportStatus": "COMPLETED"
        }
    ]
}
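If you’d rather track a single export, you can poll its status with describe-export. Here’s a small sketch; the ARN is a placeholder for the ExportArn returned above.
Bash
# Poll the export status until it is no longer IN_PROGRESS
EXPORT_ARN="arn:aws:dynamodb:REGION:ACCOUNT_ID:table/TABLE_NAME/export/EXPORT_ID"
while true; do
    STATUS=$(aws dynamodb describe-export \
        --export-arn "$EXPORT_ARN" \
        --query 'ExportDescription.ExportStatus' \
        --output text)
    echo "ExportStatus: $STATUS"
    [ "$STATUS" != "IN_PROGRESS" ] && break
    sleep 30
done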
You’ll find many gz-compressed objects in your S3 bucket, each containing a text file with multiple JSON objects, one per line. These JSON objects correspond to your DynamoDB items, wrapped into an Item field, and their structure depends on which export format you chose.
In the export process above, I’ve chosen DynamoDB JSON, and items in my sample table represent users of a simple game, so a
typical object looks like the following.
JSON
{
    "Item": {
        "id": {
            "S": "my-unique-id"
        },
        "name": {
            "S": "Alex"
        },
        "coins": {
            "N": "100"
        }
    }
}
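If you want to peek at a few items before setting up any tooling, you can stream one of the exported objects to your terminal. In this sketch, the object key is a placeholder for one of the compressed files in the data/ folder.
Bash
# Stream one exported object, decompress it, and show the first three items
aws s3 cp s3://my-dynamodb-export-bucket/AWSDynamoDB/EXPORT_ID/data/OBJECT_KEY.json.gz - \
    | gunzip -c \
    | head -n 3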
I’d recommend using AWS Glue crawlers to autodiscover the schema of your data and to create a virtual table in your AWS Glue
catalog.
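For example, here’s a minimal sketch of that approach with the AWS CLI; the crawler name, IAM role, and Glue database name are placeholders to replace with your own, and the S3 path points at the data/ sub-folder of the export.
Bash
# Create a crawler over the export's data/ prefix (name, role, database, and path are placeholders)
aws glue create-crawler \
    --name ddb-export-crawler \
    --role AWSGlueServiceRole-ddb-export \
    --database-name ddb_exports \
    --targets '{"S3Targets":[{"Path":"s3://my-dynamodb-export-bucket/AWSDynamoDB/EXPORT_ID/data/"}]}'

# Run it once; the discovered table appears in the ddb_exports database
aws glue start-crawler --name ddb-export-crawler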
But you could also define a virtual table manually with a CREATE EXTERNAL TABLE statement.
SQL
CREATE EXTERNAL TABLE IF NOT EXISTS ddb_exported_table (
    Item struct<id:struct<S:string>,
                name:struct<S:string>,
                coins:struct<N:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-dynamodb-export-bucket/AWSDynamoDB/{EXPORT_ID}/data/'
TBLPROPERTIES ('has_encrypted_data'='true');
Now you can query it with regular SQL, or even define new virtual tables with Create Table as Select (CTAS) queries.
With the DynamoDB JSON format, your query looks like this.
SQL
SELECT
    Item.id.S as id,
    Item.name.S as name,
    Item.coins.N as coins
FROM ddb_exported_table
ORDER BY cast(coins as integer) DESC;
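You can also run the same query from the AWS CLI. This is just a sketch: the Glue database name and the results bucket are placeholders from the earlier examples.
Bash
# Start the query (database and results bucket are placeholders)
QUERY_ID=$(aws athena start-query-execution \
    --query-string "SELECT Item.id.S as id, Item.name.S as name, Item.coins.N as coins FROM ddb_exported_table ORDER BY cast(coins as integer) DESC" \
    --query-execution-context Database=ddb_exports \
    --result-configuration OutputLocation=s3://my-athena-results-bucket/ \
    --query 'QueryExecutionId' --output text)

# Once the query has succeeded (check with: aws athena get-query-execution), fetch the results
aws athena get-query-results --query-execution-id "$QUERY_ID"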