Amazon's PartiQL query language eyes all data sources

PartiQL, a multisource query language developed internally at Amazon, is now open sourced under Apache 2.0, and may benefit enterprises with complex data-management needs.

Enterprises have another option for data management tasks with PartiQL, an open-source query language that AWS says can tap into data spread across multiple types of information stores in a unified fashion.

The move comes after a proliferation of AWS data management service options. PartiQL could also help customers wrangle a problem AWS itself has experienced.

Broad distribution of information among many types of databases, from standard relational ones to NoSQL engines and graph databases, hampers many enterprises, AWS said in a blog post. While each type of data store has its purpose, the result is a multitude of query languages, each tightly coupled to its own kind of store, according to the company.

Amazon created PartiQL to meet internal demand for multi-source data queries across structured, semi-structured and unstructured data. Those needs came from various corners of Amazon's business, including its retail arm.

Amazon released PartiQL's tutorial materials, specification and reference implementation under the Apache 2.0 license. AWS has used PartiQL internally for services such as S3 Select, Glacier Select and Redshift Spectrum, and it was adopted as a query language for Quantum Ledger Database, which AWS launched last year.

PartiQL is compatible with standard SQL, which means enterprises can use existing queries with PartiQL in conjunction with SQL query processors. It also treats nested data as a first-class citizen, and doesn't require a predefined schema to be imposed on a data set. PartiQL does contain SQL extensions, but they are minimal and simple for DBAs and developers to understand, AWS claims. Finally, PartiQL is data format-independent, which means one query applies to JSON, ORC, CSV and other data formats.
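
To make the nested-data and schema-optional claims concrete, here is a minimal sketch. The comment shows a query in the style of PartiQL's published tutorial, and the Python beneath it reproduces by hand what such a query is meant to return. The `employees` data, the field names and the `unnest_security_projects` helper are assumptions made up for illustration; this is not the PartiQL reference implementation or any AWS API.

```python
# Illustrative PartiQL-style query (shown as a comment, not executed here):
#
#   SELECT e.name AS employeeName, p.name AS projectName
#   FROM employees AS e, e.projects AS p
#   WHERE p.name LIKE '%security%'
#
# The second FROM item ranges over the array nested inside each employee,
# so nested data is queried without flattening it into a separate table.

# Hypothetical sample data: records need not share the same fields
# (the last one has no "title"), since no schema is declared up front.
employees = [
    {"id": 3, "name": "Bob Smith", "title": None,
     "projects": [{"name": "Redshift Spectrum querying"},
                  {"name": "Redshift security"},
                  {"name": "Aurora security"}]},
    {"id": 4, "name": "Susan Smith", "title": "Dev Mgr", "projects": []},
    {"id": 6, "name": "Jane Smith",
     "projects": [{"name": "Redshift security"}]},
]


def unnest_security_projects(rows):
    """Plain-Python equivalent of the query above: pair every employee
    with each of its nested projects and keep the ones whose name
    contains 'security'."""
    result = []
    for e in rows:                          # FROM employees AS e
        for p in e.get("projects", []):     # , e.projects AS p
            if "security" in p["name"]:     # WHERE p.name LIKE '%security%'
                result.append({"employeeName": e["name"],
                               "projectName": p["name"]})
    return result


if __name__ == "__main__":
    for row in unnest_security_projects(employees):
        print(row)
```

In principle the same query text would apply whether the employee records were stored as JSON, ORC or CSV, which is the format independence AWS describes; the Python version, by contrast, is tied to in-memory dictionaries.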

Amazon takes on established idea with PartiQL

This certainly isn't the first attempt at one SQL-centric query language for all data, said Doug Henschen, an analyst with Constellation Research in Cupertino, Calif. Presto is supported by Starburst, a spinoff of Teradata, and it also underpins AWS Athena, while Apache Drill, which was inspired by Google's Dremel system, is entirely open source.

But with PartiQL, Amazon appears to want its own standard for a one-size-fits-all query language.

"With its cloud adoption and the array of services it's including in the initiative, it may well succeed, at least where AWS customers are concerned," Henschen said.

The true test of PartiQL's success will be its adoption beyond the AWS ecosystem, particularly given the criticism lodged at AWS for its alleged habit of taking from open source projects without contributing enough back.

"We'll also have to see whether query performance management and SLA capabilities are in the mix," Henschen said. Starburst, for one, pushes enterprise-grade capabilities and performance guarantees as part of its support for Presto, he added.

Indeed, for PartiQL, the devil is in the details of its performance, said Curt Monash, a longtime database industry watcher and founder of Monash Research in Acton, Mass.

"Versions of the idea have been around since the 1980s, but the trick is to get good performance, or else to write queries so simple that performance won't be a problem," he said.
