Amazon cloud database and data analytics expand
AWS expands its Redshift data warehouse capabilities, including managed storage and query acceleration, at the re:Invent 2019 conference, alongside a new managed Cassandra database service.
Amazon Web Services is quite clear about it: it wants organizations of all sizes, with nearly any use case, to run databases in the cloud.
At the AWS re:Invent 2019 conference in Las Vegas, the cloud giant outlined the Amazon cloud database strategy, which hinges on wielding multiple purpose-built offerings for different use cases.
AWS also revealed new services on Dec. 3, the first day of the conference, including the Amazon Managed Apache Cassandra Service, a supported cloud version of the popular Cassandra NoSQL database. The vendor also unveiled several new features for the Amazon Redshift data warehouse, providing enhanced data management and analytics capabilities.
"Quite simply, Amazon is looking to provide one-stop shopping for all data management and analytics needs on AWS," said Carl Olofson, an analyst at IDC. "For those who are all in for AWS, this is all good. For their competitors, such as Snowflake competing with Redshift and DataStax competing with the new Cassandra service, this will motivate a stronger competitive effort."
Amazon cloud database strategy
AWS CEO Andy Jassy, in his keynote, detailed the rationale behind Amazon's cloud database strategy and why one database isn't enough.
"A lot of companies primarily use relational databases for every one of their workloads, and the day of customers doing that has come and gone," Jassy said.
There is too much data, cost and complexity involved in using a relational database for all workloads. That has sparked demand for purpose-built databases, according to Jassy.
For example, Jassy noted that ride-sharing company Lyft tracks millions of drivers and geolocation coordinates, a data set that isn't a good fit for a relational database.
For the Lyft use case and others like it, there is a need for a fast, low-latency key-value store, which is why AWS has the DynamoDB database. For workloads that require sub-microsecond latency, an in-memory database is best, and that is where ElastiCache fits in. For those looking to connect data across multiple big data sets, a graph database is a good option, which is what the Amazon Neptune service delivers. DocumentDB, on the other hand, is a document database, and is intended for those who work with documents and JSON.
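To make the key-value access pattern concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not DynamoDB or its API: the class name `DriverLocationStore`, the `driver-42` key and the coordinates are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


class DriverLocationStore:
    """Toy key-value store: one key per driver, latest position as the value.

    Hypothetical illustration; a real service would add persistence,
    partitioning and replication.
    """

    def __init__(self) -> None:
        self._positions: Dict[str, Coordinate] = {}

    def put(self, driver_id: str, position: Coordinate) -> None:
        # Overwrite the previous value; no joins or fixed schema are involved.
        self._positions[driver_id] = position

    def get(self, driver_id: str) -> Optional[Coordinate]:
        # Single-key lookup, the access pattern a key-value store is built for.
        return self._positions.get(driver_id)


store = DriverLocationStore()
store.put("driver-42", (36.1699, -115.1398))
store.put("driver-42", (36.1716, -115.1391))  # newer reading replaces the old one
latest = store.get("driver-42")
```

Every read and write touches exactly one key, which is why this kind of workload maps more naturally onto a key-value store than onto relational tables.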
"Swiss Army knives are hardly ever the best solution for anything other than the most simple tasks," Jassy said, referring to the classic multi-purpose tool. "If you want the right tool for the right job that gives you differentiated performance productivity and customer experience, you want the right purpose-built database for that job."
Amazon Managed Apache Cassandra Service
While AWS offers many different databases as part of the Amazon cloud database strategy, one variety it did not possess was Apache Cassandra, a popular open source NoSQL database.
It's challenging to manage and scale Cassandra, which is why Jassy said he sees a need for a managed version running as an AWS service. The Amazon Managed Apache Cassandra Service launched as a preview on Dec. 3, with general availability set for sometime in 2020.
With the managed service there are no clusters for users to manage, and the platform provides single-digit millisecond latency, Jassy noted. He added that existing Cassandra tools and drivers will all work, making it easier for users to migrate on-premises Cassandra workloads to the cloud.
Amazon Redshift enhancements
AWS also detailed a series of moves at the conference that enhance its Redshift data warehouse platform. Among the new features Jassy talked about was Lake House, which enables data queries not just in local Redshift nodes but also across multiple data lakes and S3 cloud storage buckets.
"Not surprisingly, as people start querying across both Redshift and S3 they also want to be able to query across their operational databases where a lot of important data sets live," Jassy said. "So today, we just released something called federated query which now enables users to query across Redshift, S3 and our relational database services."
Storage and compute for a data warehouse are closely related, but there is often a need to scale them independently. To that end, AWS announced, as part of the Amazon cloud database strategy, its new Redshift RA3 instances with managed storage. Jassy explained that as users exhaust the storage available on a local Redshift instance, the RA3 service will move the less frequently accessed data over to S3.
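The tiering idea Jassy described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how RA3 is implemented: the `TieredStorage` class is hypothetical, a plain dictionary stands in for S3, and least-recently-used eviction stands in for "less frequently accessed."

```python
from collections import OrderedDict
from typing import Dict


class TieredStorage:
    """Toy two-tier store: a small local tier that spills cold blocks to a remote tier."""

    def __init__(self, local_capacity: int) -> None:
        self.local_capacity = local_capacity
        # Ordered from least recently used (front) to most recently used (back).
        self.local: "OrderedDict[str, bytes]" = OrderedDict()
        # Hypothetical stand-in for S3.
        self.remote: Dict[str, bytes] = {}

    def _spill_if_full(self) -> None:
        # Move the coldest blocks out until the local tier fits its capacity.
        while len(self.local) > self.local_capacity:
            block_id, data = self.local.popitem(last=False)
            self.remote[block_id] = data

    def write(self, block_id: str, data: bytes) -> None:
        self.remote.pop(block_id, None)  # avoid keeping a stale remote copy
        self.local[block_id] = data
        self.local.move_to_end(block_id)
        self._spill_if_full()

    def read(self, block_id: str) -> bytes:
        if block_id in self.local:
            self.local.move_to_end(block_id)  # mark as recently used
            return self.local[block_id]
        data = self.remote.pop(block_id)  # raises KeyError for unknown blocks
        self.local[block_id] = data       # promote back to the local tier
        self._spill_if_full()
        return data


storage = TieredStorage(local_capacity=2)
storage.write("a", b"block-a")
storage.write("b", b"block-b")
storage.read("a")               # "a" is now hotter than "b"
storage.write("c", b"block-c")  # local tier is full, so "b" spills to the remote tier
```

In this sketch the local tier's size is fixed while the remote tier grows without limit, which loosely mirrors the idea of scaling storage separately from compute.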
As data is spread across different resources, the need to accelerate query performance grows. Jassy introduced the new Advanced Query Accelerator (AQUA) for Redshift to help meet that challenge.
Jassy said that AQUA provides an innovative, hardware-accelerated cache to improve query performance. With AQUA, AWS has built a high-speed cache architecture on top of S3 that scales out in parallel across many nodes. Each node hosts custom-designed AWS processors to speed up operations.
"This makes your processing so much faster that you can actually do the compute on the raw data without having to move it," Jassy said.