Alluxio 2.9 provides users of the open source data orchestration platform with new capabilities designed to scale deployments securely across multiple environments.
The vendor, based in San Mateo, Calif., develops its namesake data orchestration platform with both open source community and enterprise editions.
Alluxio enables organizations to access data from different storage locations, including on premises and cloud, and then execute queries on top for data analytics, business operations and machine learning applications.
The new release, which became generally available on Nov. 16, follows the Alluxio 2.8 update in May that introduced an enhanced policy engine for data management.
The new update builds on the previous version by introducing cross-environment synchronization features that let organizations more easily control and update multiple Alluxio clusters. Alluxio has also now integrated multi-tenant isolation and policy controls, which enable different groups to more easily use the same cluster.
Alluxio competes against a number of different data orchestration technologies that help organizations bring disparate sources of data together, including the open source Apache Hop platform, Denodo and K2View.
"Alluxio offers a distinct solution that connects any compute engine to any data store in any location," said Eckerson analyst Kevin Petrie. "This helps enterprises run advanced analytics projects in hybrid and multi-cloud environments."
Multi-tenant isolation can be a boost for data orchestration
Among the new features in Alluxio 2.9 is multi-tenant isolation, which makes it easier for different teams to use the same Alluxio instance.
The updates also make it easier for multiple tenants to use their own storage and compute, but still share metadata, Petrie said.
That means a data science team might have a sandbox to train machine learning models, and a business intelligence team might have a separate platform to manage operational dashboards, Petrie said. The two teams can isolate resources to simplify chargeback while still sharing metadata to help one another.
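The arrangement Petrie describes can be illustrated with a small toy model. The following is a minimal sketch, not Alluxio's API: the `Tenant` and `SharedCatalog` classes, the `write` function, the quota figures and the file paths are all hypothetical names chosen for illustration. It assumes each team is charged only for its own writes while every write is recorded in a catalog that all teams can read.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """A team with its own storage quota (hypothetical model, not Alluxio's API)."""
    name: str
    quota_bytes: int
    used_bytes: int = 0


@dataclass
class SharedCatalog:
    """Metadata visible to every tenant: path -> (owner, size)."""
    entries: dict = field(default_factory=dict)


def write(tenant: Tenant, catalog: SharedCatalog, path: str, size: int) -> None:
    # Isolation: a write is charged only to the writing tenant's quota.
    if tenant.used_bytes + size > tenant.quota_bytes:
        raise RuntimeError(f"{tenant.name} exceeded its quota")
    tenant.used_bytes += size
    # Sharing: the metadata entry is published to the common catalog.
    catalog.entries[path] = (tenant.name, size)


catalog = SharedCatalog()
data_science = Tenant("data-science", quota_bytes=100)
bi = Tenant("bi", quota_bytes=50)

write(data_science, catalog, "/models/train.parquet", 80)
write(bi, catalog, "/dashboards/sales.parquet", 30)
```

In this sketch, per-tenant usage stays separate, which is what would make chargeback straightforward, while either team can look up the other's data sets in the shared catalog.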
"It's like having two soccer games on two fields next to one another," Petrie said. "They stay off one another's turf. But the referees share metadata, meaning they enforce the same rules and update one another on the score."
Cross-cluster synchronization expands data orchestration capabilities
The new version is a significant step forward for the scalability of the vendor's platform, said Adit Madan, director of product management at Alluxio.
The cross-cluster feature marks the first time multiple instances of Alluxio can be easily deployed in a way that makes them aware of each other so that they can be managed together, Madan said.
Previously, a common architecture for Alluxio was for each business unit within an organization to have its own copy, which isolated each unit from nearby units that might be accessing the same data lake store. The complexity was further compounded when Alluxio instances were deployed across multiple cloud environments, Madan said.
With the update, each of the different instances that are running -- whether for different business units of the same company in one cloud or across multiple clouds -- can be synchronized. With cross-cluster synchronization, organizations can get a consistent view of data regardless of the environment they're operating in.
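As a rough illustration of what a consistent view across instances means, consider the toy model below. It is a sketch under stated assumptions, not a description of how Alluxio implements synchronization: the `Cluster` class, the `connect` helper, the version numbers and the path are hypothetical. It assumes each instance notifies its peers of every metadata change and that the newest version of an entry wins.

```python
class Cluster:
    """One deployment with a local metadata view (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}  # path -> version number
        self.peers = []

    def update(self, path, version):
        # Apply the change locally, then notify every peer cluster.
        self.metadata[path] = version
        for peer in self.peers:
            peer.receive(path, version)

    def receive(self, path, version):
        # Keep only the newest version so stale notices are ignored.
        if version > self.metadata.get(path, -1):
            self.metadata[path] = version


def connect(*clusters):
    # Make every cluster aware of all the others.
    for cluster in clusters:
        cluster.peers = [p for p in clusters if p is not cluster]


cloud_a = Cluster("cloud-a")
cloud_b = Cluster("cloud-b")
connect(cloud_a, cloud_b)

cloud_a.update("/lake/orders", 1)
cloud_b.update("/lake/orders", 2)
```

After both updates, the two instances in this sketch hold the same metadata for the shared path, regardless of which one received the change first.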
As part of enabling easier multi-cloud operations, Alluxio is also releasing an open source Kubernetes operator. The new operator includes configuration options that help users deploy and run Alluxio in Kubernetes environments running in the cloud.
Looking forward, Alluxio plans to fill a number of gaps in the platform, the vendor said.
Alluxio does not currently have a SaaS platform. Rather, organizations deploy the technology onto the cloud on their own.
Madan said there is likely to be an Alluxio SaaS offering at some point in the future. The next major step for the vendor, however, will be the Alluxio 3.0 annual release in the first half of 2023, which will focus on improving usability for large data sets.