
New Qubole Pipelines Service targets streaming data

Data lake vendor Qubole unveiled a data pipeline service, a new tool designed to enable organizations to manage their increasing amounts of streaming data.

Qubole unveiled a new offering designed to help organizations develop streaming data pipelines.

The data lake vendor, founded in 2011 and based in Santa Clara, Calif., offers a cloud-based open data lake platform designed to foster analytics and machine learning. Its latest offering, Qubole Pipelines Service, was released Aug. 18 and adds to the capabilities of its open data lake by enabling organizations to manage streaming data from varied sources on a single platform so they can quickly turn that real-time data into action.

With augmented intelligence, machine learning and the internet of things generating ever more streaming data, organizations will increasingly need platforms that can quickly extract, load and transform that data so they can use it to make the data-driven decisions that move their business forward.

Qubole Pipelines Service -- which is now in open beta testing after being in dark beta testing until Aug. 18 -- addresses the specific need created by ever-increasing amounts of data coming in at ever-faster rates from an ever-rising number of sources. According to Qubole co-founder and CTO Joydeep Sen Sarma, it both makes streaming data more accessible and lowers the complexity of moving streaming data from one place to another.

"We're now a data lake company, and one of the things we do is manage the data lake for our customers," he said. "So some of [the impetus for Qubole Pipelines Service] is to provide a way to ingest data, and part of that is streaming data, so what this offering does is it allows users to add streaming to their sources."

Streaming data, meanwhile, is an important area of focus for vendors specializing in data management, according to analysts.


And given the explosion in streaming data, it's critical for organizations to find efficient ways to harness their streaming data without devoting copious amounts of time and resources to the process.

"It's absolutely a need," said Mike Leone, senior analyst at Enterprise Strategy Group. "It's actually a big reason why [our] research shows that data processing is the aspect of the data pipeline that causes the most delays."

"Organizations are struggling to keep up with the speed at which insights are required," he continued. "Trying to leverage real-time, streaming data exacerbates the problem."

Similarly, Kevin Petrie, vice president of research at Eckerson Group, said the ability to capture streaming data can be a significant asset.

"Tools like Qubole Pipelines help enterprises eliminate slow, duplicative and resource-hungry batch processing with real-time streams," he said. "They also help more users create streaming pipelines without coding. As a result, analysts can start contextualizing and analyzing business events faster, and even generating automated actions to capitalize on those events."


Among the features in Qubole Pipelines Service are:

  • an accelerated development cycle that includes built-in connectors and a code-generation wizard that help customers develop a data pipeline without needing to write code;
  • a stream processing engine that uses Apache Spark Structured Streaming to help developers build and deploy streaming applications;
  • comprehensive operational management, including application programming interfaces and user interfaces so engineers can manage streaming applications and get continuous insights; and
  • data management capabilities using Qubole's ACID framework to improve efficiency.
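
The article does not describe Qubole's own APIs, so the following is only a hedged, stdlib-only Python toy of the micro-batch model that Spark Structured Streaming uses by default: each trigger reads only the records that arrived since the last committed offset, folds them into running state and then advances the offset. The class `MicroBatchAggregator`, its `trigger` method and the sensor data are hypothetical illustrations, not Spark's or Qubole's interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical record on the source stream: (device_id, reading).
Record = Tuple[str, float]


@dataclass
class MicroBatchAggregator:
    """Toy micro-batch engine (illustrative only, not a real library API).

    Keeps a running count and sum per key, plus the offset of the last
    record it has consumed, which stands in for a streaming checkpoint.
    """
    committed_offset: int = 0
    counts: Dict[str, int] = field(default_factory=dict)
    sums: Dict[str, float] = field(default_factory=dict)

    def trigger(self, log: List[Record]) -> int:
        """Process only records appended since the last commit.

        Returns the number of records consumed in this micro-batch.
        """
        new_records = log[self.committed_offset:]
        for key, value in new_records:
            self.counts[key] = self.counts.get(key, 0) + 1
            self.sums[key] = self.sums.get(key, 0.0) + value
        # Advancing the committed offset means a later trigger on the same
        # log skips these records instead of counting them twice.
        self.committed_offset += len(new_records)
        return len(new_records)

    def averages(self) -> Dict[str, float]:
        """Current per-key average, derived from the running state."""
        return {k: self.sums[k] / self.counts[k] for k in self.counts}


if __name__ == "__main__":
    log: List[Record] = [("sensor-a", 20.0), ("sensor-b", 30.0)]
    engine = MicroBatchAggregator()
    print(engine.trigger(log))   # 2: both records form the first micro-batch
    log.append(("sensor-a", 22.0))  # new data arrives on the stream
    print(engine.trigger(log))   # 1: only the new record is read
    print(engine.trigger(log))   # 0: nothing new, state unchanged
    print(engine.averages())     # {'sensor-a': 21.0, 'sensor-b': 30.0}
```

Because each trigger touches only new records, the work per update scales with the incoming data rather than the full history. That is the contrast with rerunning a batch job that Petrie draws earlier in the article.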

And while vendors such as Confluent and Databricks have developed offerings to help harness streaming data, Sen Sarma said Qubole Pipelines Service marks the vendor's first significant foray into the emerging market.

"We did not have well-advertised streaming capabilities before this," he said. "Largely, we have been used for data engineering, for interactive SQL, for BI-type applications, for data mining, but we didn't have a strong streaming offering. Now, our customers can start writing streaming applications on Qubole."

The analysts, meanwhile, said Qubole Pipelines Service is a strong addition to the vendor's platform.

"Qubole rightly understands that data streaming is a mandatory design pattern for the modern enterprise," Petrie said. "Data lake ecosystems need stream ingestion and processing in order to operate efficiently and meet today's low-latency requirements."

And Leone said that Qubole's new feature is at the forefront of innovation in maintaining efficient data pipelines.

"This represents an emerging area for vendors to help with the next wave of data-driven maturity," he said. "Leveraging a data lake is really just the start. How can organizations keep up with the amount of management and orchestration that is required to maintain efficient data pipelines? This is where Qubole's announcement really shines."

Next for Qubole, according to Sen Sarma, is continuing to make data ingestion easier.

In addition, he said that a focus for Qubole is reducing the complexity of its existing tools. The vendor's platform currently caters mostly to hardcore application developers, but just as Qubole Pipelines Service has low-code templates and connectors that begin to make the platform accessible to a wider audience, adding more low-code and no-code capabilities is part of the product roadmap.

"Where we want to end up soon is enabling people who are not expert engineers," Sen Sarma said. "We're not quite there yet, but we're definitely going to get there. Broadly, we are in the business of making data lakes highly accessible, cost efficient, less complex to operate and less complex to use."
