DataOps -- sometimes described as "DevOps for data" -- is still a new concept for many organizations. As with DevOps, the principal goal of a DataOps framework is to generate more value for the business more efficiently, in this case by accelerating BI and advanced analytics processes.
The approach incorporates Agile software development principles and uses automated testing, containerization, orchestration and monitoring to speed the production of data pipelines for BI and analytics applications. Like Agile, DataOps also often entails major organizational and cultural change, including breaking down silos across IT operations and software development teams and encouraging line-of-business stakeholders to work with data engineers, data scientists and BI analysts. A successful DataOps framework not only speeds the delivery of data insights to the business, but also improves data quality.
"A good DataOps function enables data scientists, analysts and BI developers to work with well-formed and well-structured data sets without having to worry about the technical intricacies of how the data got to them," explained Stephen Lynch, head of DevOps and DataOps at Chetwood Financial Limited in Chester, U.K.
Still, as IT and business intelligence operations embark on DataOps for their BI initiatives, they may feel that best practices for this new discipline run contrary to the traditional data management techniques they have long practiced. After all, one of the tenets of DataOps is that BI teams and other data analysts should spend more time analyzing data and less time worrying about where it comes from and how it gets to them, Lynch said.
But the DataOps approach doesn't discount the value of data quality, stressed Nick Heudecker, a research vice president and analyst at Gartner. It assumes that an organization already has in place strong data governance rules and sound documentation procedures. With governance rules and documentation accounted for, there are many benefits to implementing a DataOps methodology to help accelerate BI and analytics applications, he said. Chief among them is an ability to keep up with the demands of a business environment where data is increasing -- and increasingly critical to business success.
"As data and analytics teams become critical to supporting more diverse, complex and mission-critical business processes, many are challenged with scaling the work they do in delivering data to support a range of consumers and use cases," Heudecker said. The constant pressure to deliver high-quality data insights faster in the face of constant change demands that data and analytics leaders rethink how their teams are organized and how they work.
"Traditional waterfall-oriented methodologies aren't meeting the need -- the distance between requirements definition and delivery of value is too great, the time required too long, and too many critical tasks get lost or degraded across role and team silos," he explained. "DataOps techniques can address these challenges through a more agile, collaborative and change-friendly approach to building and managing data delivery pipelines."
DataOps framework: The five principal elements
There are five principal elements of a DataOps framework, explained Daniel Skidmore, senior director of DataOps at Overstock in Salt Lake City, Utah.
- Communication. "DataOps brings local teams, development and operations in communication earlier in the development process, thus avoiding the creation of data silos," Skidmore said.
- Pipeline integration. There are two types of data systems or pipelines in BI initiatives: systems of record and systems of innovation. "Systems of record/value are the established systems with detailed and established rules. They are usually slower to accept change," Skidmore said.
Systems of innovation often involve businesses that need something faster than what the systems of record can provide. So, they set up their own data systems that are more agile and self-service. The goal is to get systems of record and innovation integrated, but this can be challenging, Skidmore noted.
"By introducing Agile methods, DataOps attempts to help with the integration of the two pipelines. This helps to prevent silos," he said.
- A physical data model deployed in Docker containers. "This breaks the monolith into data model containers, and aids in consistent test data creation or generation," Skidmore said.
- Agile development mindset. Skidmore said data teams need to migrate from a traditional waterfall project approach to an Agile one, which can help encourage innovation and speed deployment time.
- Shorter feedback loops. This is a big challenge, Skidmore stressed. "Shrinking the time for communicating between different teams is critical for DataOps success and the adoption of Agile development. Much of the typical delay is not process-related, but due to waiting for feedback from different individuals and teams," he said.
DataOps challenges and how to address them
Organizations implementing a DataOps framework must understand that its challenges will be different for every IT and business intelligence organization, Heudecker said.
"It's important to identify where your bottlenecks are when it comes to data delivery and tackle each bottleneck in turn," Heudecker said, adding that the problem likely isn't technology. "You can't buy DataOps. In most cases, different parties of data producers and consumers simply need better access to each other, guided by unified metrics and shared business goals and outcomes."
To achieve success, Heudecker said organizations should evolve from a top-down "command and control" governance approach to one that is more context-sensitive, encourages innovation, is connected to value and supports distributed groups of users.
Data teams should also become more accessible to the new range of data users, Heudecker said. "This may mean introducing emerging roles, like data product managers, to collaborate and coordinate across data producers and consumers. It may also mean embedding data-centric roles, like data engineers, within business units."
In terms of metrics, "we're still researching what kinds of metrics are meaningful as data delivery styles evolve," Heudecker noted. Early ideas fall into two categories: production-related metrics (such as delivery timeliness, mean time to recovery and data quality) and customer-related metrics (such as data sharing ability).
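To make two of the production-related metrics concrete, here is a minimal Python sketch of how a team might compute delivery timeliness and mean time to recovery from its own pipeline logs. The `Delivery` and `Incident` records and both function names are hypothetical, chosen for illustration; they do not come from Gartner or from any specific DataOps tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Delivery:
    due: datetime        # when consumers expected the data set (assumed field)
    delivered: datetime  # when it actually landed (assumed field)


@dataclass
class Incident:
    failed_at: datetime     # pipeline failure detected (assumed field)
    recovered_at: datetime  # pipeline healthy again (assumed field)


def on_time_rate(deliveries):
    """Share of deliveries that landed at or before their due time."""
    if not deliveries:
        return 0.0
    on_time = sum(1 for d in deliveries if d.delivered <= d.due)
    return on_time / len(deliveries)


def mean_time_to_recovery(incidents):
    """Average outage duration across all recorded incidents."""
    if not incidents:
        return timedelta(0)
    total = sum((i.recovered_at - i.failed_at for i in incidents), timedelta(0))
    return total / len(incidents)
```

In practice, the inputs would likely come from an orchestrator's run history or an incident tracker rather than hand-built records, and what counts as "on time" would depend on the service levels agreed with data consumers.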
As to technology, Heudecker recommended "the monitoring of data pipelines and the automation of data delivery." Essentially, this brings in elements of the familiar DevOps toolkit, such as continuous integration and continuous deployment, that are common in application development circles.
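As one illustration of what automated pipeline monitoring can look like, the Python sketch below runs a simple quality check on each batch of records and stops delivery if the check fails. The function names, the rule set (a minimum row count and required non-null fields) and the record layout are all assumptions made for this example, not a method attributed to Heudecker or to a particular product.

```python
def check_batch(rows, required_fields, min_rows=1):
    """Return a list of human-readable issues found in a batch of dict records."""
    issues = []
    if len(rows) < min_rows:
        issues.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        for field in required_fields:
            # A missing key and an explicit None are both treated as a gap.
            if row.get(field) is None:
                issues.append(f"row {index}: missing {field}")
    return issues


def run_step(rows, required_fields):
    """Pass the batch downstream only if it is clean; otherwise fail loudly."""
    issues = check_batch(rows, required_fields)
    if issues:
        raise ValueError("; ".join(issues))
    return rows
```

A check like this would typically run as a step in the team's orchestration or continuous integration setup, so that a bad batch triggers an alert instead of quietly reaching analysts.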
As parting advice, Skidmore noted that getting the technology right is the easiest part.
"Changing the culture to accept Agile principles and getting everyone on the same page is the hardest challenge by far," Skidmore said. "If you can get all the involved teams to buy into the plan, then success will be much easier to achieve."