Organizations large enough to have one or more data teams typically have a mix of data scientists, data engineers and data analysts on those teams. However, as companies become increasingly digital, they must be able to use massive amounts of data intelligently, quickly and at scale. Achieving all that may require the addition of a DataOps engineer who can help the company operationalize its data.
DataOps is an Agile approach to data management that focuses on improving communication between data managers and data users in different parts of the organization and on automating the data flows between them. The goal of DataOps is to quickly deliver business value from data. It's also common in a DataOps framework to automate different portions of the data pipeline to improve data usability and value.
The DataOps engineer helps make that all happen.
DataOps engineer vs. data engineer
Some believe data engineer and DataOps engineer roles are interchangeable, even though the latter title is relatively new. The common thread between the two is making data available for use by data scientists, data analysts and others.
Those who perceive the DataOps role as distinctly different typically describe it as a type of architect role. Robert Eichelman, senior technical architect of cloud and big data at ManpowerGroup Global, said DataOps engineers support the data sourcing and utilization cycle by defining and supporting the workspace process and technologies that others use to source, transform and manifest data.
"In most cases, this position knows how the data is utilized across the organization, but they usually never work with it directly," said Eichelman.
Brian Ray, global data science lead at digital transformation firm Maven Wave, said although data engineering is not new, it's reached a point of complexity where organizations need someone focused on the underlying infrastructure to make it more scalable, especially in the cloud. DataOps engineers also oversee the monitoring of data pipelines and infrastructure as well as the governance of the whole data engineering operation.
"DataOps engineers are architects of the enterprise's information stack," said Dave Mariani, co-founder and chief technology officer of AtScale, which offers online analytical processing solutions for enterprise analytics. "In the digital age we live in, data is the difference between winning and losing for enterprises."
DevOps is the model for DataOps
The DevOps movement began about 20 years ago, arising from the realization that the speed of business was outpacing software delivery. To address that gap, many software teams adopted an Agile methodology, but there were still disconnects between the development and operations teams that created the need for DevOps.
Fast forward two decades, and DevOps teams now adopt continuous integration/continuous delivery, which involves building automated pipelines. Data teams have followed a similar trend with the emergence of automated pipelines.
DevOps and DataOps share several other key similarities:
- an iterative approach that enables greater agility;
- continuous improvement instead of building rigid monoliths;
- cross-functional collaboration instead of waterfall handoffs;
- baked-in security, compliance and governance; and
- a constant feedback loop.
"The DataOps engineer provides data engineers with guidance and design support around workflows and information pipelines and flows, code reviews, all new processes and workflows around utilizing data," Eichelman said. "[They] also help select the tools the overall team uses."
Arbitrarily selected tools and technology can cripple an organization's ability to be flexible, he said.
Alicia Frame, director of graph data science at graph database provider Neo4j, said the lack of a DataOps engineer is a sign of an immature team.
"You have some really well-meaning data scientists with a Ph.D. trying to write ETL code, but if they make a mistake in the joins somewhere along the way, they've messed up the data pipeline because it's not their area of expertise," Frame said.
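The kind of join mistake Frame describes is often guarded against with explicit checks in the pipeline code. The following is a minimal sketch in Python of one such guard, not a method attributed to anyone quoted here: the function name `checked_join` and the sample `orders` and `customers` records are hypothetical, and the sketch assumes the right-hand table should contain exactly one row per key.

```python
from collections import Counter


def checked_join(left, right, key):
    """Join two lists of dicts on `key`, failing loudly on bad keys.

    Hypothetical helper for illustration: it assumes the right-hand
    side is a lookup table with exactly one row per key value.
    """
    # Duplicate keys on the lookup side would silently multiply rows.
    counts = Counter(row[key] for row in right)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"right side has duplicate keys: {duplicates}")

    lookup = {row[key]: row for row in right}

    # Unmatched keys on the left would silently drop or blank out rows.
    missing = [row[key] for row in left if row[key] not in lookup]
    if missing:
        raise ValueError(f"left keys with no match: {missing}")

    # Safe to merge: one output row per input row on the left.
    return [{**row, **lookup[row[key]]} for row in left]


# Illustrative sample data, not taken from any real system.
orders = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 2, "amount": 35},
]
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]

joined = checked_join(orders, customers, "customer_id")
```

A check like this turns a silent data-quality problem into an immediate failure, which is generally easier to catch in an automated pipeline than a subtly wrong downstream report.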
How to hire a DataOps engineer
The best way to hire the "right" DataOps engineer is to understand what your organization wants to accomplish, what constraints exist and how the DataOps engineer role complements other members of the data team.
"I think often when companies are trying to hire DataOps engineers, they don't really know what the role is because it's a newer role, so they don't know what to look for," Frame said.
It's also imperative to consult other members of the data team and ask what specific skills a DataOps engineer should have to work with your pipeline.
"[The role] is often hired without the input of the IT department, who could say, 'This is the architecture. These are the things you should probably consider,'" Frame said.
According to Eichelman, companies tend to make the following three mistakes when hiring DataOps engineers:
- failing to hire people with strong communication and people skills;
- moving a data engineer into a DataOps position without compelling evidence that the person is uniquely qualified to take on enterprise data management; and
- giving the job to someone who lacks the requisite code-level development skills across multiple languages, tools and architectures.
"The [DataOps engineer] role evolved from problems encountered with data engineering role mistakes and missteps," said Eichelman.
Many organizations with the DataOps engineer role still make major mistakes because there are few blueprinted process solutions, he added. And it can be hard for organizations to influence others across all levels of the enterprise.
The DataOps engineer is a relatively new role that's growing in importance as organizations attempt to operationalize more data. While data scientists and data analysts can help the company drive more business value from data, they need to pull in data sets from different data sources and use them at scale in a governed way. In short, what the DataOps engineer does tends to fall outside the skill sets of other members of the data team.