What makes up an analytics pipeline?

Analytics pipelines were traditionally hidden away, but that is changing as more organizations focus on agility for their data. Learn what makes up a successful analytics pipeline.

By

Lisa Morgan
Ben Cole, Executive Editor

Published: 21 Nov 2022

In today's data-driven economy, companies can't afford to have data-related issues, but many still do. Despite the exploding volume of data organizations continue to amass, they're still having trouble accessing and using that data.

To accelerate the speed and accuracy of data analytics insights, data engineers are constructing data analytics pipelines -- or data pipelines -- to operationalize data.

What is a data analytics pipeline?

An analytics pipeline streamlines data flow to improve the speed and quality of insights. As with a continuous integration/continuous delivery (CI/CD) pipeline used by a DevOps team, the speed advantage of an analytics pipeline hinges on automating tasks.

"If the owner of a finance group asks me for a cash flow report, I may have to extract the data manually [and] update that record myself," said Dan Maycock, principal of engineering and analysis at hop farm Loftus Labs. "When I'm manually extracting data every time it's requested, it doesn't happen as frequently. If I have a pipeline, that's happening automatically."

According to Pieter Vanlperen, managing partner at PWV Consultants, a process modernization consultancy, data governance, data quality, data usability and categorization also require at least some automation in the analytics pipeline, depending on how advanced the pipeline is.
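
To make the data quality piece concrete, here is a small Python sketch of an automated check that a pipeline stage might run before passing rows downstream. The rules and the field names (`account`, `amount`) are hypothetical and chosen only for illustration; they do not come from any of the sources quoted here.

```python
def validate_row(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    if not row.get("account"):
        problems.append("missing account")
    amount = row.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(amount, (int, float)) or isinstance(amount, bool):
        problems.append("amount is not numeric")
    return problems


def split_by_quality(rows):
    """Separate clean rows from rejected ones, keeping the reasons."""
    clean, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            clean.append(row)
    return clean, rejected


rows = [
    {"account": "sales", "amount": 120.0},
    {"account": "", "amount": 80.0},
    {"account": "payroll", "amount": "n/a"},
]
clean, rejected = split_by_quality(rows)
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejection reasons alongside each row, rather than silently dropping bad records, gives whoever owns the source data something to act on.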

Having more than one analytics pipeline is common, as each may serve a different purpose. Colleen Tartow, director of engineering at Starburst Data, a distributed SQL query engine platform provider, said data engineering is critical to pipeline function because pipelines are often complex and vary in maturity.

"You could have a straightforward cloud-native pipeline using a modern data stack, or you could have a data center-based infrastructure that requires constant management alongside the actual data pipeline itself," she said.

Maycock uses one pipeline to transport data from its original source to a central repository and another pipeline to transport data from the central repository to a map, BI tool or data model.
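
As a rough illustration of that two-stage arrangement, the Python sketch below uses an in-memory list as the source and a dictionary as the central repository. The function names (`ingest_to_repository`, `publish_to_bi`), the record fields and the aggregation are all hypothetical stand-ins, not a description of Maycock's actual setup.

```python
from collections import defaultdict


def ingest_to_repository(source_rows, repository):
    """Pipeline 1: copy raw rows from a source into the central repository."""
    for row in source_rows:
        # Key by a hypothetical unique id so reruns overwrite, not duplicate.
        repository[row["id"]] = dict(row)
    return len(source_rows)


def publish_to_bi(repository):
    """Pipeline 2: shape repository data into a summary a BI tool could read."""
    totals = defaultdict(float)
    for row in repository.values():
        totals[row["account"]] += row["amount"]
    return dict(totals)


source = [
    {"id": 1, "account": "sales", "amount": 120.0},
    {"id": 2, "account": "sales", "amount": 80.0},
    {"id": 3, "account": "payroll", "amount": -150.0},
]
repository = {}
ingest_to_repository(source, repository)
ingest_to_repository(source, repository)  # a rerun leaves the same three rows
cash_flow = publish_to_bi(repository)
print(cash_flow)
```

Separating the two stages means the reporting side can be rebuilt or rerun from the repository without touching the original source again.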

"In the early 2000s when I started, you were pretty much on your own building and maintaining [pipelines], but that isn't the case anymore," he said.

Chart showing the 5 analytics modes

Other benefits of an analytics pipeline

Analytics pipelines can help organizations achieve higher levels of agility and resiliency, especially when they're built iteratively.

"The idea is that you're iterating on your designs through the canvas on which the pipeline is built. The benefit is higher productivity," said Arvind Prabhakar, CTO of StreamSets, a DataOps platform provider.

Analytics pipelines, like CI/CD pipelines, also provide visibility across the engineering and operations functions, which enables continuous feedback loops, faster iteration and quicker issue resolution. According to Prabhakar, the previous generation of platforms and tooling treated data operations as hidden workloads.

"In this new world of DataOps where every end point, every pipeline is [potentially] the weakest link, you need the ability to constantly monitor and manage because the pipelines themselves are a reflection of how your data architecture is evolving," Prabhakar said.

And cross-functional visibility into the analytics pipeline can help enable process improvements. Data observability makes sure business needs and processes are modeled in the analytics pipeline as well, Prabhakar said.

"These pipelines are not just artifacts of the design choices that data engineers made," he said. "They actually reflect business processes that are engrained in the fabric of the enterprise's data architecture."

Analytics pipeline scalability

Scalability is essential so the data analytics pipeline can adapt to growing data volumes. It is not the only consideration, however: The pipeline must also integrate with the analytics capabilities that already exist in the data architecture.

When building a scalable data analytics pipeline, consider both input data and output data. Knowing the context and volume of the input data helps determine the format in which to store it and the technology to use. For output data, consider the end users. Data analysts rely heavily on this information, so the output data must be accessible and transparent for them.

Also consider how much data the analytics pipeline can ingest. Infrastructure must be able to handle a sudden change in data volume, for example, due to business growth. One option is to set up the pipeline in the cloud to allow for further flexibility and, ultimately, scalability.
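
One general technique for coping with variable input volume, wherever the pipeline runs, is to consume data in fixed-size batches so memory use stays bounded as the input grows. The sketch below is a minimal example of that idea; the `batched` and `ingest` helpers and the batch size of 1,000 are assumptions made for illustration.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def ingest(rows, batch_size=1000):
    """Consume rows batch by batch; returns (batch_count, row_count)."""
    batch_count = 0
    row_count = 0
    for batch in batched(rows, batch_size):
        # A real pipeline would write the batch to storage here.
        batch_count += 1
        row_count += len(batch)
    return batch_count, row_count


result = ingest(range(2500), batch_size=1000)
print(result)
```

Because `ingest` only ever holds one batch at a time, the same code handles 2,500 rows or 25 million; only the running time changes.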

Challenges with creating an analytics pipeline

The point of an analytics pipeline is to expedite the delivery of data, but a common obstacle is the data itself.

"I might have built a pipeline, but I really don't have any more information because the data warehouse or the data lake I built is so poorly governed that it's a swamp," Vanlperen said.

He said poor governance can quickly make data unusable. It's important to understand which data sources matter most and refine them so they are useful, he said.

The diversity of data sources can also be problematic.

"Every software platform can have its own API and their own data model [because] there's not necessarily a role in software development specifying how data is presented to a data pipeline or an ETL platform," Maycock said. "Being able to connect to and extract data, depending on how foreign that platform is, can be somewhat difficult, as well as being able to access the information in a consistent way."
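
One common way to cope with that variety is to write a small adapter per source that maps its records onto a single shared shape. The sketch below assumes two made-up platforms, `crm` and `billing`, with invented field names; a real integration would call each platform's actual API rather than take a dictionary literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """The shared shape every adapter must produce."""
    customer_id: str
    email: str


def from_crm(record):
    # Hypothetical CRM payload: {"Id": 17, "EmailAddress": "A@Example.com"}
    return Customer(str(record["Id"]), record["EmailAddress"].strip().lower())


def from_billing(record):
    # Hypothetical billing payload: {"cust": {"ref": "17", "mail": "a@example.com "}}
    cust = record["cust"]
    return Customer(str(cust["ref"]), cust["mail"].strip().lower())


ADAPTERS = {"crm": from_crm, "billing": from_billing}


def normalize(source_name, record):
    """Route a raw record to the adapter registered for its source."""
    if source_name not in ADAPTERS:
        raise ValueError(f"no adapter registered for source {source_name!r}")
    return ADAPTERS[source_name](record)


a = normalize("crm", {"Id": 17, "EmailAddress": "A@Example.com"})
b = normalize("billing", {"cust": {"ref": "17", "mail": "a@example.com "}})
print(a == b)
```

Everything downstream of `normalize` sees only `Customer` objects, so adding a new platform means writing one more adapter rather than changing the rest of the pipeline.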

Another issue organizations face is that no one is responsible for understanding the full inventory of what data is available in-house and from third-party sources. Some argue that's a telltale sign of needing a chief data officer or at least someone responsible for understanding and operationalizing data.

"Ten years ago, the data engineer was expected to know everything, and they were given a big docket which contained all the specifications of the data infrastructures," Prabhakar said. "Now, the data engineer has no clue of where the data is coming from, who owns it [or] where it originated, let alone the schema, structure and semantics."

Ten years ago, data engineers and operations personnel also often worked in silos. That should no longer be the case, because disconnects between groups can create friction that slows value delivery. Cross-functional disconnect can also hurt business operations. For example, if the analytics pipeline starts losing 10% of its data, the downstream analytics results become dubious.

"When you talk about continuous operations, the goal of the pipeline is to establish a tight feedback loop between the data engineers and the operators," Prabhakar said. "You want the pipelines to automatically start raising a flag that something has changed."
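
A minimal sketch of such a flag, assuming the pipeline can report how many records entered and left a given stage, might look like the following. The `check_record_loss` function and its 5% default tolerance are illustrative assumptions, not features of any product mentioned in this article.

```python
def check_record_loss(records_in, records_out, max_loss_ratio=0.05):
    """Return (ok, loss_ratio) for one pipeline stage.

    max_loss_ratio is an assumed tolerance; choose one that suits the data.
    """
    if records_in < 0 or records_out < 0:
        raise ValueError("record counts must be non-negative")
    if records_in == 0:
        return True, 0.0  # nothing came in, so nothing could be lost
    loss_ratio = max(records_in - records_out, 0) / records_in
    return loss_ratio <= max_loss_ratio, loss_ratio


ok, loss = check_record_loss(records_in=1000, records_out=900)
if not ok:
    print(f"ALERT: stage dropped {loss:.0%} of records")
```

With 1,000 records in and 900 out, the stage has lost 10% of its data, which exceeds the assumed tolerance and triggers the alert. In practice the alert would go to both the data engineers and the operators rather than to standard output.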

Analytics pipelines are essential for any insight-driven organization. When designed and implemented well, they can help a company meet its strategic goals sooner.
