The open source Presto SQL query engine is continuing to move forward, with big users such as Uber, Alibaba and Facebook using the technology at ever-growing scale.
At the PrestoCon Day virtual event on March 24, Presto users and developers gathered to discuss how the technology is being used and where it is headed in the future.
Presto is a SQL query engine originally developed by Facebook and now run as an open source project under the governance of the Presto Foundation, which itself is operated by the Linux Foundation. Until December 2020, there were two distinct projects using the Presto name: PrestoDB, run by the Presto Foundation, and PrestoSQL, which its backers, including Starburst and Varada, rebranded as Trino.
With the split and the confusion of two different Presto versions now in the past, the Presto project is looking to highlight its successes. While Presto is still strongly influenced by Facebook, it has found enterprise support and adoption as well, with vendors such as Ahana, which provides a managed Presto service that runs on AWS.
Presto users detail benefits
During a user session at the virtual PrestoCon Day event, several Ahana customers detailed how they use Presto. Among those users is revenue management platform vendor Carbon, which is headquartered in New York City. Jordan Hoggart, data engineer at Carbon, described how the company moved from Amazon Athena to PrestoDB managed by Ahana. Hoggart explained that Athena is basically an implementation of Presto, albeit one that has been customized by Amazon. Hoggart said Athena didn't provide the scalability that Carbon needed to process multiple types of data queries, with different sets of parameters.
"With Ahana, another thing we could do was experiment with using different clusters for different workloads. With Athena we were stuck with one queue that served everything," Hoggart said. "Whereas now if we want, we can spin up a couple different clusters that have different configurations."
Meanwhile, B2B e-commerce marketplace vendor Cartona, based in Giza, Egypt, is also using Ahana's supported version of Presto.
During another user session, Omar Mohamed, senior data engineer at Cartona, explained that the vendor was encountering challenges analyzing data spread across multiple sources, including transactional and analytics databases.
Mohamed noted that approximately 200,000 events were coming into Cartona's data sources every 12 hours. Expecting to keep growing and to handle even more data in the coming months and years, the vendor decided to use Presto to enable fast queries across the disparate data sources.
"So now we are able to join data queries across our different databases without having to copy or ingest data," Mohamed said. "It's all done in Presto, which saved us hours of planning and manual work."
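To make the idea of joining across databases concrete, here is a minimal sketch of what such a federated query can look like. Presto addresses tables as catalog.schema.table, so a single statement can reference tables from more than one configured data source. The catalog names (`postgresql`, `hive`), the schemas, tables and columns, and the helper `fetch_joined_rows` are all hypothetical and chosen for illustration; they are not taken from Cartona's deployment.

```python
# Hypothetical catalogs, schemas, tables and columns, for illustration only.
# Presto resolves each table through its catalog, so one statement can join
# a transactional source with an analytics source without copying data.
FEDERATED_QUERY = """
SELECT o.order_id, o.total, e.event_type
FROM postgresql.public.orders AS o
JOIN hive.analytics.events AS e
  ON o.order_id = e.order_id
WHERE e.event_date >= DATE '2021-01-01'
"""


def fetch_joined_rows(cursor, sql=FEDERATED_QUERY):
    """Run the query on any DB-API 2.0 style cursor and return all rows."""
    cursor.execute(sql)
    return cursor.fetchall()
```

The function accepts any cursor object that exposes `execute` and `fetchall`, so the same code could be pointed at a Presto connection from a Python client library or at a stub in a test.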
Ride-share giant Uber is one of the biggest contributors to and users of Presto. In a technical session, Girish Baliga, engineering manager at Uber, noted that over the last six months Uber has established Presto as its de facto SQL query engine for most data and analytics applications. Baliga emphasized that Uber, like many others that use Presto, benefits significantly from Facebook's continued contributions and testing of Presto at scale.
"Facebook does most of the work; we still have to do some original work here because we use different technologies, so we do some additional testing," Baliga said. "But the stability and scalability is really sussed out by the Facebook testing process."
The state of Presto, from Facebook's perspective
In a keynote session, Biswapesh Chattopadhyay, tech lead for data infrastructure and compute at Facebook, outlined the state of Presto and where it is headed in terms of technical direction.
One of the new capabilities that the community has been working on is a process called "auto-awesomize," which automatically configures and tunes Presto for optimal operational deployment.
"There is less and less patience for people to actually spend hours fine-tuning their query performance because they need to run ad hoc queries very quickly and you know time is money," Chattopadhyay said.
He noted that users just want their query engines to automatically figure out how to run a query quickly against any given data set. Presto is developing the auto-awesomize capabilities with advanced query execution technology as well as history-based optimizations that learn from past queries.
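The article does not describe how Presto implements history-based optimization, so the following is only a toy sketch of the general idea: remember how many rows an input produced in past runs, and use that record to make a planning decision next time. The class `QueryHistory`, the function `choose_build_side`, the fingerprint strings and the default estimate are all assumptions made for illustration, not Presto code.

```python
from collections import defaultdict
from statistics import mean


class QueryHistory:
    """Toy store of observed row counts, keyed by an input fingerprint."""

    def __init__(self):
        self._observed = defaultdict(list)

    def record(self, fingerprint, rows):
        # Called after a query runs, with the row count actually seen.
        self._observed[fingerprint].append(rows)

    def estimate(self, fingerprint, default):
        # Use the average of past observations, or a fallback guess.
        seen = self._observed.get(fingerprint)
        return mean(seen) if seen else default


def choose_build_side(history, left, right, default=1_000_000):
    """Pick the input expected to be smaller as the hash-join build side."""
    left_rows = history.estimate(left, default)
    right_rows = history.estimate(right, default)
    return left if left_rows <= right_rows else right


history = QueryHistory()
history.record("orders", 5_000_000)
history.record("events", 200_000)
print(choose_build_side(history, "orders", "events"))
```

With no history, both inputs fall back to the same default and the choice is arbitrary; once real row counts have been recorded, the smaller input is chosen without any manual tuning by the user.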
Presto is sometimes compared with Apache Spark, which also provides a query engine, as well as a distributed parallel processing framework for running queries at massive scale. There is an effort now to enable Presto to run on top of Spark, using Presto as the query engine and Spark as the underlying framework for parallel processing.
"Presto-on-Spark is something that I'm really excited about because we see it essentially breaking down the scalability limits of Presto," Chattopadhyay said.