Hitachi Pentaho integration aims to subdue unstructured data

Hitachi Pentaho data integration positions Hitachi Content Platform as an object-based data lake from which cleansed data can be sent to multiple cloud targets.

Hitachi Content Platform object storage is getting an AI-powered boost from Pentaho data integration software.

Hitachi Pentaho 8.2 embeds Pentaho's data analytics and management in Hitachi Content Platform, turning it into a data lake to stage the movement of cleansed data to storage on hybrid clouds.

Customers can connect to Hitachi object storage directly through Pentaho and drag and drop objects between it and other data platforms. They can also delete, update and write data from within Hitachi object storage.

Arik Pelkey, a senior director of product marketing for Hitachi Pentaho, said data centers can now build "data pipelines" from structured and unstructured data.

"Broadly speaking, we are facilitating easier data management by way of being able to connect with other types of infrastructure," Pelkey said.

Estimates of unstructured data growth vary by analyst firm, but the trajectory is consistently upward, driven by an explosion of data from email, IoT sensors, video and text.

Integrating Pentaho with Hitachi object storage separates compute from storage, which allows data to be analyzed across multiple clouds, said Matt Aslett, research vice president of data, AI and analytics at 451 Research.

"Many enterprises are leaving the data where it resides and spinning up a separate layer of compute to analyze the data as required, rather than move it all into a monolithic platform" for analysis, Aslett said.

Vehicle information company Carfax started to implement a Hitachi Pentaho stack in stages about two months ago. One objective was to improve governance of its data across operations in Canada, Europe and the U.S., Carfax Director of Data Management Andrew Buffone said.

Carfax runs its Hitachi stack on a private hosted cloud that connects to both Amazon Web Services and Microsoft Azure at the Pentaho tier. Buffone said the initial use case is analysis and preparation of source-side data.

"We took an iterative approach. We're still in flight on certain components of the Hitachi stack -- we don't have the [hyper-converged infrastructure] component (Hitachi Unified Compute Platform hyper-convergence). We only have about 10% of our data footprint going through the Pentaho now, but our plan is to scale up to about 90% by the June timeframe," Buffone said.

The Hitachi-Pentaho integration has been several years in the making. Hitachi acquired Pentaho in 2015 and launched the Hitachi Vantara brand in 2017, combining Hitachi storage (formerly Hitachi Data Systems), Hitachi Insight IoT technologies and Pentaho.

Pelkey said the Pentaho integration illustrates how Hitachi is working to strengthen ties between its analytics software and Hitachi Vantara's broader storage portfolio. "There is a strong mandate for more collaboration across all the different product teams at Hitachi. This is just one example of the direction we're going in."