Since its release in 2012, DynamoDB, AWS' nonrelational database service, has offered developers a way to store key-value pairs in the cloud without managing any underlying infrastructure. But some important DynamoDB features were lacking, including backup and table management, which left the service at risk of falling behind in the database market.
In the past, users had to back up DynamoDB data in a roundabout way. Amazon suggested they use AWS Data Pipeline to spin up Amazon Elastic MapReduce (EMR) clusters that read data from DynamoDB and stored it in Amazon Simple Storage Service (S3). This cumbersome process often left developers with backups that were difficult to test and, in some cases, unusable. It was also difficult to synchronize tables across regions. IT had to either rely on one table as a master or support multiple versions and manually resolve conflicting write operations.
Fortunately, Amazon recognized these problems and unveiled two major DynamoDB features at re:Invent 2017: Global Tables and Backup and Restore. Global Tables and built-in backups enable developers to build multiregion, resilient applications. These new DynamoDB features ease tasks that were previously difficult, and every development team can -- and should -- try to implement them.
DynamoDB Global Tables
While there typically aren't many issues that impact multiple availability zones (AZs) within a single AWS region, there are even fewer that impact multiple regions at once. Most large companies use multiple AWS regions to boost redundancy and reduce latency; the closer your data is to end users, the quicker they can access your content. Previously, DynamoDB would automatically balance loads across multiple AZs for increased availability, but that didn't help developers span databases across multiple regions.
DynamoDB Global Tables changes that. After developers create a new DynamoDB table, they can enable the table for global access. This action is only available on empty tables, so developers must enable Global Tables before they add any data to the table.
Once Global Tables is enabled, developers can choose multiple regions as table hosts. Each additional region creates a complete table replica that synchronizes with the others. Developers can then write data to the table in any of the selected regions, and the data synchronizes across all of them. Global Tables are similar to a multiprimary SQL data store but replicate across completely different regions.
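As a rough illustration of the same step through the API rather than the console, the sketch below wraps the DynamoDB `create_global_table` call. It assumes an identically named, empty table with streams enabled already exists in each listed region. The helper name `enable_global_table`, the table name and the regions are hypothetical, and the `client` argument is expected to be a boto3 DynamoDB client.

```python
def enable_global_table(client, table_name, regions):
    """Link identically named regional tables into one global table.

    Assumption: an empty table called `table_name`, with streams enabled,
    already exists in every region listed in `regions`.
    """
    if len(set(regions)) != len(regions):
        raise ValueError("each region may appear only once")

    # One replica entry per region that should host a copy of the table.
    replication_group = [{"RegionName": region} for region in regions]

    response = client.create_global_table(
        GlobalTableName=table_name,
        ReplicationGroup=replication_group,
    )
    return response["GlobalTableDescription"]["GlobalTableStatus"]


# Hypothetical usage with boto3 (not executed here):
#   import boto3
#   client = boto3.client("dynamodb", region_name="us-east-1")
#   enable_global_table(client, "Orders", ["us-east-1", "eu-west-1"])
```

Passing the client in, instead of creating it inside the function, keeps the region choice with the caller and makes the helper easy to exercise against a stub.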
Keep in mind that DynamoDB is eventually consistent by default, meaning a write typically propagates to all copies within milliseconds but isn't guaranteed to be visible to every reader immediately. With Global Tables, changes could take several seconds to propagate, as the service must copy data between geographically distant regions instead of just within one region. If two regions update the same item before replication catches up, DynamoDB keeps only the most recent write and discards the other. This makes it incredibly important to write to a given item from only one location at a time.
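To see why concurrent writes from two regions are risky, consider this small simulation of last-writer-wins reconciliation, the policy AWS documents for Global Tables. It does not call DynamoDB. The `Write` type, the region names, the values and the timestamps are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str        # region whose replica accepted the write
    value: str         # new value for the item
    timestamp_ms: int  # hypothetical time the write was accepted


def reconcile(writes):
    """Return the single write that survives under last-writer-wins."""
    return max(writes, key=lambda w: w.timestamp_ms)


# Two regions update the same item before replication catches up.
east = Write("us-east-1", "status=shipped", 1_000)
west = Write("us-west-2", "status=cancelled", 1_250)

winner = reconcile([east, west])
# The earlier update is silently discarded; only the later one remains.
lost = [w for w in (east, west) if w != winner]
```

In this sketch the `us-east-1` update disappears without an error, which is the outcome that routing all writes for a given item through one region avoids.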
DynamoDB Backup and Restore
Amazon's previous, cumbersome backup method could take hours to run. Developers would have to spin up an EMR cluster and use valuable read capacity. Now, with the native DynamoDB features for backup, the process is nearly instantaneous and doesn't require additional read capacity.
Developers can use the Backup tab in AWS Management Console or issue an API call to initiate a backup. DynamoDB will have two types of backups: On-Demand backups and automated backups through Point-in-Time Restore (PITR). The latter is currently in limited preview. Automated backups spin up and delete automatically, while On-Demand backups remain until a developer removes them. If you plan to keep backups long term -- as you might want to do before a data purge, for example -- On-Demand backups might be the better option.
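For the API route, a minimal sketch might look like the following. It wraps the DynamoDB `create_backup` call and gives each backup a timestamped name so that On-Demand backups kept long term stay distinguishable. The helper names and the naming scheme are assumptions, and `client` is expected to be a boto3 DynamoDB client.

```python
from datetime import datetime, timezone


def backup_name(table_name, now=None):
    """Build a timestamped backup name (naming scheme is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return f"{table_name}-{now:%Y%m%dT%H%M%SZ}"


def create_on_demand_backup(client, table_name, now=None):
    """Start an On-Demand backup and return its ARN."""
    response = client.create_backup(
        TableName=table_name,
        BackupName=backup_name(table_name, now),
    )
    return response["BackupDetails"]["BackupArn"]


# Hypothetical usage with boto3 (not executed here):
#   import boto3
#   arn = create_on_demand_backup(boto3.client("dynamodb"), "Orders")
```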
When automated PITR becomes available in 2018, developers will have an even easier method to create backups and protect against accidental data removal. Until then, developers can use Lambda with Amazon CloudWatch to create their own semiautomated backups. Amazon suggests a Lambda function with a small Python script, but any method that calls the backup API on a schedule will work. Be sure to remove older backups, as AWS charges $0.10 per GB per month for backup storage.
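The cleanup half of such a script could be sketched as below. It uses the DynamoDB `list_backups` and `delete_backup` calls to remove On-Demand backups older than a retention window, and it estimates storage cost from the $0.10 per GB per month figure quoted above. The function names, the `Orders` table and the 30-day retention period are hypothetical. This is not Amazon's suggested script.

```python
from datetime import datetime, timedelta, timezone

PRICE_PER_GB_MONTH = 0.10  # figure quoted in the article; check current pricing


def monthly_backup_cost(total_gb):
    """Estimate the monthly storage charge for retained backups."""
    return round(total_gb * PRICE_PER_GB_MONTH, 2)


def find_expired_backups(client, table_name, cutoff):
    """Collect ARNs of On-Demand backups created before `cutoff`."""
    arns = []
    request = {
        "TableName": table_name,
        "TimeRangeUpperBound": cutoff,
        "BackupType": "USER",
    }
    while True:
        page = client.list_backups(**request)
        arns.extend(s["BackupArn"] for s in page.get("BackupSummaries", []))
        last_arn = page.get("LastEvaluatedBackupArn")
        if not last_arn:
            return arns
        # Ask for the next page of results.
        request["ExclusiveStartBackupArn"] = last_arn


def prune_backups(client, table_name, keep_days, now=None):
    """Delete backups older than `keep_days` and return their ARNs."""
    now = now or datetime.now(timezone.utc)
    expired = find_expired_backups(client, table_name, now - timedelta(days=keep_days))
    for arn in expired:
        client.delete_backup(BackupArn=arn)
    return expired


def lambda_handler(event, context):
    """Entry point for a scheduled Lambda invocation (hypothetical values)."""
    import boto3  # provided by the Lambda Python runtime

    client = boto3.client("dynamodb")
    return {"deleted": prune_backups(client, "Orders", keep_days=30)}
```

Collecting the expired ARNs before deleting anything keeps pagination independent of the deletions. As a worked figure, 50 GB of retained backups comes to about $5 per month at the quoted rate.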
Developers can initiate restore operations through AWS Management Console or the API. A backup always restores to a new table and never replaces an existing one. To bring data back into the original table, developers must migrate it themselves, which also means they can perform partial restores of individual items by copying specific items from the newly created table to the original.
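A partial restore of this kind could be sketched as follows, assuming the backup has already been restored to a new table. The helper reads selected items from the restored table with `get_item` and writes them to the original with `put_item`. The function name, table names and key attribute are hypothetical, and `client` is expected to be a boto3 DynamoDB client.

```python
def copy_items(client, source_table, target_table, keys):
    """Copy the items identified by `keys` from one table to another.

    Each key uses DynamoDB's attribute-value format, for example
    {"OrderId": {"S": "1001"}}. Returns the number of items copied.
    """
    copied = 0
    for key in keys:
        found = client.get_item(
            TableName=source_table,
            Key=key,
            ConsistentRead=True,
        )
        item = found.get("Item")
        if item is None:
            continue  # key not present in the restored table
        # put_item overwrites any item with the same key in the target.
        client.put_item(TableName=target_table, Item=item)
        copied += 1
    return copied


# Hypothetical usage with boto3 (not executed here):
#   import boto3
#   copy_items(boto3.client("dynamodb"), "Orders-restored", "Orders",
#              [{"OrderId": {"S": "1001"}}])
```

Because `put_item` replaces the whole item, anything written to the original item after the backup was taken is overwritten by the restored copy.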