
Yellowbrick Manager embraces Kubernetes for data warehouse

Yellowbrick is building out a new unified control plane to help users manage distributed cloud data warehouse deployments. The vendor also advanced its data lake integration.

Yellowbrick Data is looking to make it easier for its users to manage data warehouses across distributed cloud and on-premises deployments with its new Yellowbrick Manager technology.

The vendor, based in Palo Alto, Calif., introduced Yellowbrick Manager on April 6. A technology preview that users can try out is coming in early May, according to the vendor, with general availability set for the second half of 2021. Yellowbrick develops a hybrid data warehouse that can run both on premises and in the cloud.

Yellowbrick Manager provides a unified control system that uses the Kubernetes container orchestration system to enable users to manage and control both cloud and on-premises deployments with enhanced performance capabilities.

Alongside the Yellowbrick Manager preview, the vendor is also set to release an update to its namesake Yellowbrick data warehouse release 5 in early May. The update brings data lake integration enhancements, including native support for cloud object storage such as Amazon S3 and Azure Data Lake Storage Gen2 (ADLS).

John Santaferraro, research director at Enterprise Management Associates, said recent research conducted by his firm shows that many enterprises have multiple data platforms under management. This scattered situation is further complicated by the fact that one data platform can have hundreds of instances spread across different physical locations on premises and in multiple clouds.

IT professionals are tired of the cost and complexity of trying to manage too many systems and are looking for unified analytics platforms that they can consolidate, simplifying administration and speeding time to insight, Santaferraro noted.

"Yellowbrick's move to make data stored in object storage from Amazon, Microsoft, and MinIO accessible through their platform is critical and aligns with the trend toward consolidation," Santaferraro said. "That Yellowbrick can also manage all their hybrid and multi-cloud instances in a single pane of glass amounts to less complexity and lower cost for their customers."

Yellowbrick Manager embraces Kubernetes

The key enabler for Yellowbrick Manager is the cloud-native architecture of Kubernetes.

"Kubernetes provides that common framework that allows us to deploy our containerized software, on our own hardware, but also in the public cloud or even at the network edge," Yellowbrick CTO Mark Cusack said.

Cusack explained that the common Kubernetes-based deployment approach lets Yellowbrick Manager provide a single unified control plane. With it, users can provision new data warehouse instances in different clouds, and manage and monitor existing deployments.

Yellowbrick Manager introduces an improved Load Assistant that helps users specify a data source.

Yellowbrick set to improve data lake integration

As part of the Yellowbrick Manager release, Yellowbrick is preparing the Yellowbrick Data Warehouse 5.2 upgrade with improved data lake integration components.

Cusack noted that a key piece of the update is native object storage support within Yellowbrick. The Yellowbrick data warehouse will connect directly to both cloud and on-premises object stores for loading data. Among the most widely used forms of object storage in the cloud is the Amazon S3 service, which Yellowbrick had supported before, but through a more cumbersome approach.

Cusack said Yellowbrick had used a "somewhat inefficient connector" to get access to S3 in the past. With the new native integration, he said, data can be loaded into Yellowbrick directly from object storage systems like S3 more efficiently than with the connector model.

"You can do parallel loading from single files or multiple files and users get a very high-performance load from object storage," he said.
