It was something of a sleeper amid a rowdy group of new data management technologies, but NewSQL is new again.
NewSQL databases put special emphasis on SQL transactions, with atomicity, consistency and other processing attributes. At the same time, they distribute data workloads in a new style pioneered by Google, Yahoo and others. While attention poured onto Hadoop and NoSQL in the early years of this decade, NewSQL database alternatives plugged away quietly.
Fast forward to today, and the raucous NoSQL and Hadoop crew has taken on more and more SQL traits. For this and other reasons, NewSQL may be getting a second look in some quarters. As with the more ballyhooed Hadoop and NoSQL alternatives, global data distribution, real-time analytics and machine learning are the key drivers.
Among the NewSQL players -- each with a different technology twist -- are Clustrix Inc., NuoDB Inc., Splice Machine and VoltDB Inc. Each has been busy enhancing its NewSQL database. Clustrix recently increased its support for cluster resiliency. NuoDB improved its query optimizer. VoltDB expanded a program offering access to its enterprise version for development and testing. And Splice Machine launched its database platform on the Amazon Web Services Marketplace.
Enter machine learning
For its part, at last month's Strata Data Conference in New York, NewSQL vendor MemSQL Inc. said it would improve machine learning library support for its MemSQL 6 database engine. The enhancements will include real-time model scoring, image recognition and K-means clustering in SQL, according to Gary Orenstein, chief marketing officer at MemSQL.
Machine learning in real time is an objective for personalization, recommendation engines and other web-based applications that connect predictive analytics with transactional operations -- something Orenstein said is of growing interest to mainstream big data teams that are well versed in SQL database capabilities.
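As a rough illustration of what model scoring in SQL can look like, the sketch below applies a small linear model with a single join-and-aggregate query. It is not MemSQL's implementation: it uses Python's built-in sqlite3 module as a stand-in engine, and the table names (features, weights), the column names and the score_in_sql helper are all hypothetical.

```python
import sqlite3


def score_in_sql(feature_rows, weight_rows, intercept):
    """Score a linear model inside the database with one SQL query.

    feature_rows: iterable of (user_id, feature_name, value)
    weight_rows:  iterable of (feature_name, weight)
    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (user_id INTEGER, name TEXT, value REAL)")
    conn.execute("CREATE TABLE weights (name TEXT PRIMARY KEY, weight REAL)")
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", feature_rows)
    conn.executemany("INSERT INTO weights VALUES (?, ?)", weight_rows)

    # The model is just data: joining features to weights and summing the
    # products gives each user's score without moving rows out of the database.
    rows = conn.execute(
        """
        SELECT f.user_id, ? + SUM(f.value * w.weight) AS score
        FROM features AS f
        JOIN weights AS w ON w.name = f.name
        GROUP BY f.user_id
        ORDER BY f.user_id
        """,
        (intercept,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    features = [(1, "visits", 3.0), (1, "purchases", 1.0),
                (2, "visits", 10.0), (2, "purchases", 0.0)]
    weights = [("visits", 0.5), ("purchases", 2.0)]
    for user_id, score in score_in_sql(features, weights, intercept=1.0):
        print(user_id, score)
```

Because the weights live in an ordinary table, a retrained model can be swapped in with an UPDATE, and new transactions are scored by the same query as soon as they are written.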
"We want to bring more machine learning into the database," he said. "A lot of the machine learning algorithms can be implemented in very streamlined fashion with SQL."
Enhancements to NewSQL databases mirror recent NoSQL developments. Databases in the NoSQL camp began life as purpose-built engines, stripped of SQL. As they pursue broader use cases, databases like Cassandra and MongoDB have added features, including forms of SQL support.
Even as NoSQL poster child MongoDB was preparing an initial public offering that it made this month, there were signs of a long-anticipated consolidation among the many NoSQL players. During the summer, sales performance management company CallidusCloud acquired NoSQL vendor OrientDB, and online gambling site Bet365 purchased Basho's Riak NoSQL database technology assets.
Analytics in the mix
"It's hard to bucket NoSQL products into specific use cases any longer," said Gartner analyst Nick Heudecker. "Before, you could look at something like Cassandra and say 'it's good for write-intensive use cases' or something like MongoDB and say 'it's good for read-intensive use cases.'"
That's changing as different NoSQL databases move to the same markets and become less differentiated, Heudecker suggested. "Now you see them handling mixed workloads that include analytics, even including graph analytics," he said.
But, when analytics is in the mix and pilots go into production, SQL has appeal. That's true not just for NoSQL systems, Heudecker said, but for the Hadoop ecosystem, as well. As yet another example of SQL's abiding influence, he pointed to streaming data vendor Confluent's recent Kafka framework update, which added SQL support.
"You can build applications using NoSQL or Hadoop, but at the end of the day, you have to be sure you did something with the data," Heudecker said. "The lingua franca is SQL. No platform can be finished until it has a robust SQL interface."
After the Cambrian explosion
Orenstein has watched as both Hadoop and NoSQL vendors have begun to build out SQL interfaces to systems that are non-SQL at heart. That was a key design factor for the MemSQL database, which first appeared in 2013.
He noted, with some irony, that Google -- the company that helped pioneer work in modern NoSQL stores, and which had the original inspiration for what came to be called Hadoop -- followed up its work on NoSQL key-value stores and distributed file systems with Spanner, a distributed transactional relational database. Initiated a decade ago and detailed by Google in a research paper published in 2012, Spanner was first used internally, before being released commercially as Google Cloud Spanner last May.
"The Google File System and MapReduce can be credited for the Cambrian explosion of Apache Hadoop data analytics and storage projects," Orenstein said. "But, by 2007 and the emergence of Hadoop 1.0, Google was already focused on Spanner, a SQL database."
More organizations are similarly turning to SQL again now, Orenstein claimed. "The world is coming back full circle," he said. "In the same way Google started focusing on Spanner, we see the industry today with less emphasis on Hadoop and more emphasis on transactional systems that scale. The Hadoop platform wasn't designed for transactions."
Orenstein sees SQL as a gateway -- a way to more easily move data pilots into systems operations. "Now, people are realizing they have to do more than create wonderful models of data; they have to create things they can put into production," he said.
Spanner in the works
Google gave a type of endorsement to NewSQL databases when it formally released Google Cloud Spanner as a product.
While scaling for web applications, Cloud Spanner also offers value in its support for relational consistency in a way familiar to SQL developers, according to a user at a French online financial services company.
"What we wanted was the strong consistency of SQL," said Raphael Simon, CTO and co-founder of Shine.fr, which is set to launch this fall as a web bank that automates the administrative work of freelancers.
"SQL is a universal language," he said. "When you place something in the database, you want to be sure the value will be true. With NoSQL, you may do a write, but you are not sure when it will be true. With banking transactions, it is very important that the data is consistent."
A further advantage of Google Cloud Spanner is integration with Google Cloud and its associated tools, Simon added. "With NoSQL, there is often a lot of integration left for the developer to do."
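The consistency concern Simon describes for banking transactions can be sketched with an atomic transfer: either both account updates take effect or neither does. The example uses Python's built-in sqlite3 module on a single node, so it shows only transactional atomicity, not Cloud Spanner's distributed behavior or API. The accounts table and the open_bank, transfer and balances helpers are hypothetical.

```python
import sqlite3


def open_bank(initial_balances):
    """Create an in-memory accounts table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", initial_balances.items())
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move funds atomically; return True on commit, False on rollback."""
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # An overdraft violates the CHECK constraint, which also undoes
            # the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    bank = open_bank({"alice": 100, "bob": 50})
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The second transfer fails partway through, after the credit has been written, yet no reader ever sees the half-finished state. That guarantee is what a distributed SQL database has to preserve across machines.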
Some believe Google Cloud Spanner confirms the existence of a sector that was in search of clearer definition.
"For some while, the NewSQL companies had difficulty articulating their value," Gartner's Heudecker said. "But, now, with Google Cloud Spanner, there is something with which they can compare their databases."
But comparisons are becoming more difficult in the NoSQL world, he added. "Now there are fair amounts of overlapping capabilities with NoSQL companies' products."
At the same time, Heudecker noted, NoSQL capabilities are being submerged along with SQL in larger database vendors' offerings. He cited the so-called multimodel database Azure Cosmos DB, released by Microsoft earlier this year, as an example.