Organizations that seek more value from data have many strategies from which to choose. Be sure to understand the options and their respective limitations to pick the right data architecture.
Organizations shouldn't overlook data needs and data strategy when shopping for tools. If they do, they could make suboptimal technology choices and underestimate the importance of data governance, security and privacy, said Srujan Akula, CEO of The Modern Data Company, which provides a data operating system.
"Professionals should prioritize communication, involve stakeholders and ensure a comprehensive understanding of their organization's objectives and requirements before implementing any data architecture solution," Akula said. In addition, staff training and skills development are crucial parts of technology adoption.
Data mesh is the latest chapter in the evolution of data architectures. Data analytics architectures started with data warehouses and then evolved into data lakes; data mesh is the third generation for organizations to consider.
"Data mesh addresses both the needs of scale and variety of data, as well as the speed of deriving insights from these systems," said Ravi Mayuram, CTO of open source NoSQL database company Couchbase.
This article explores what data mesh is and how it differs from other common approaches, including data warehouses, data lakes and data fabrics. It also provides practical advice for organizations that implement a data mesh approach.
What is data mesh?
Data mesh addresses the challenges of scaling data and analytics in complex organizations. Data mesh is a decentralized data architecture that organizes data by domains and is predominantly focused on people and process. Zhamak Dehghani, CEO of Nextdata, pioneered the concept while she was at technology consultancy Thoughtworks.
It has four core principles:
- Domain ownership of data. Domain teams own their data and grant data access.
- Data as a product. Domain teams are responsible for the data's quality.
- Self-service. A self-serve platform lets teams publish, find and use data without relying on a central data team.
- Data governance. Governance enables trust in the data mesh through transparency of ownership and usage and provides a framework for accountability of the data products.
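To make the four principles more concrete, the following is a minimal, hypothetical Python sketch, not how Nextdata, Thoughtworks or any vendor implements data mesh. The class names `DataProduct` and `MeshCatalog`, the quality-check callables and the usage log are illustrative assumptions. The sketch maps each principle to one mechanism: a product records its owning domain, the domain supplies its own quality checks, a catalog offers self-service discovery, and every read is checked against the owner's grants and logged for accountability.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DataProduct:
    """Hypothetical data product published and owned by one domain team."""
    name: str
    domain: str  # domain ownership: the team accountable for this product
    owner: str   # named owner, giving transparency for governance
    rows: list[dict]
    quality_checks: list[Callable[[list[dict]], bool]] = field(default_factory=list)
    allowed_consumers: set[str] = field(default_factory=set)

    def passes_quality(self) -> bool:
        # Data as a product: the owning domain defines and runs its own checks.
        return all(check(self.rows) for check in self.quality_checks)


class MeshCatalog:
    """Hypothetical self-service catalog where domains register their products."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}
        self.usage_log: list[tuple[str, str]] = []  # (consumer, product) pairs

    def publish(self, product: DataProduct) -> None:
        # Only products that pass their domain's quality checks are published.
        if not product.passes_quality():
            raise ValueError(f"{product.name} failed its quality checks")
        self._products[product.name] = product

    def discover(self, domain: Optional[str] = None) -> list[str]:
        # Self-service: consumers browse products without filing a request.
        return sorted(
            p.name for p in self._products.values()
            if domain is None or p.domain == domain
        )

    def read(self, consumer: str, name: str) -> list[dict]:
        product = self._products[name]
        # Domain ownership: the owning domain, not a central team, grants access.
        if consumer not in product.allowed_consumers:
            raise PermissionError(f"{consumer} has no access to {name}")
        # Governance: record who used which product, for accountability.
        self.usage_log.append((consumer, name))
        return product.rows


# Example usage with made-up data.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    rows=[{"order_id": 1, "amount": 42.0}],
    quality_checks=[lambda rows: all(r["amount"] >= 0 for r in rows)],
    allowed_consumers={"finance-analytics"},
)

catalog = MeshCatalog()
catalog.publish(orders)
print(catalog.discover("sales"))                          # ['orders_daily']
print(catalog.read("finance-analytics", "orders_daily"))  # [{'order_id': 1, 'amount': 42.0}]
```

In a real organization, each of these mechanisms would typically be backed by a catalog, access-management and observability tooling rather than in-memory Python objects; the sketch only shows where responsibility sits under each principle.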
This approach contrasts with centralized data teams and structures, in which a central team tries to solve all the problems, said Lior Gavish, CTO of data observability vendor Monte Carlo Data. Data mesh should help businesses scale their data teams. "How can we enable a lot of different teams to use data effectively and independently of each other?" Gavish said.
Data mesh vs. data warehouse
A data warehouse tends to be monolithic and loads data into a single environment, functioning as a repository of data that supports analytics and decision-making. A data mesh enables a distributed environment where data doesn't have to move to supply business value. A data warehouse and a data mesh are not mutually exclusive, because a data warehouse can be part of a data mesh.
The philosophy behind a data warehouse is to create a single version of the truth and centralize it under IT's control. The data warehouse is the data platform; it is where users store and build data products.
"Data mesh focuses on more of an organizational mindset that treats data as first-class products owned by individual domains," said Dipankar Mazumdar, developer advocate at Dremio, an open data lake solution provider.
There are downsides to the data warehouse approach.
"Monolithic data drives complex change management processes [and] creates protracted ramp-up times for new technical people," said Jon Osborn, field CTO at data pipeline automation company Ascend.io. "[It also] feeds a never-ending engineering backlog with requests that should be self-serviced."
Data mesh vs. data lake
Like a data warehouse, a data lake centralizes data storage and processing, though a data lake can store both structured and unstructured data, primarily in file or object storage. It too can become part of a data mesh.
"The data mesh concept relies on a mesh layer that weaves operational data sources and domain-specific data lakes together," Mayuram said.
Fundamentally, when assessing the data lake approach, the data mesh approach or a combination of both, a data leader must understand whether an architecture for managing distributed data is appropriate for the organization. Large organizations with complex architectures can suffer from data silos and accessibility issues, which make integrating data across different sources daunting, said Bob Audet, a partner and data management leader at Guidehouse, a consulting, digital and managed services firm.
"Data consumers and data curators cannot find the right data, which makes it hard to stay ahead of the competition and keep pace with rapidly changing business needs," Audet said.
Data mesh vs. data fabric
The goal of a data fabric is to integrate disparate sources and provide a centralized, holistic view of an organization's data assets. This contrasts with data mesh's focus on decentralized data ownership and architecture. Both aim to support diverse use cases for the data at the organization.
"Each domain or business unit has its ownership of its own data products, which are managed and governed locally," Mazumdar said, describing data mesh. "This means that data is treated as a product and that domain teams are responsible for the quality, governance and lifecycle of their own data products."
The data fabric approach to data management creates a unified, integrated view of data across the organization. It's built on the idea that data should be easily accessible and discoverable, and organized in a way that makes it easy to combine and analyze. Data fabric is typically implemented via a combination of technologies, such as data integration, metadata management and data virtualization tools.
"Data fabric ... [is] the first technology strand that truly begins to de-silo application data -- an advance that's been long awaited," said Sylvie Veilleux, advisory board member at the nonprofit Data Collaboration Alliance, and former Dropbox CIO. "The modern data ecosystem is incredibly complex, connecting every kind of pipeline from databases to data lakes."
Data fabric uses an architecture to establish a connection between the data and metadata that exist in organizational silos, Veilleux said. With data fabric, permission-based systems control access to data, whereas in data mesh, functional owners control the data and access to it. This means a data mesh domain owner does not need permission from a central control authority to share its data.
This is "a crucial step toward ending the age-old practice of making endless copies of even sensitive data," Veilleux said.
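The difference in who controls access can be sketched in a few lines. The following hypothetical Python example is a simplification under stated assumptions, not a description of any data fabric or data mesh product: `CentralAccessControl` assumes a single central authority (named `central-governance` here, an invented identifier) that must approve every grant, while `DomainAccessControl` assumes each dataset has one owning domain that manages grants for that dataset itself.

```python
from __future__ import annotations


class CentralAccessControl:
    """Hypothetical fabric-style model: one central authority holds every grant."""

    def __init__(self) -> None:
        self.grants: set[tuple[str, str]] = set()  # (user, dataset) pairs

    def grant(self, granter: str, user: str, dataset: str) -> None:
        # Every grant, for every dataset, must come from the central authority.
        if granter != "central-governance":
            raise PermissionError("only the central authority can grant access")
        self.grants.add((user, dataset))

    def can_read(self, user: str, dataset: str) -> bool:
        return (user, dataset) in self.grants


class DomainAccessControl:
    """Hypothetical mesh-style model: each domain grants access to its own data."""

    def __init__(self, dataset_owners: dict[str, str]) -> None:
        self.dataset_owners = dataset_owners  # dataset -> owning domain
        self.grants: dict[str, set[str]] = {d: set() for d in dataset_owners}

    def grant(self, domain: str, user: str, dataset: str) -> None:
        # The owning domain decides; no central approval step is involved.
        if self.dataset_owners.get(dataset) != domain:
            raise PermissionError(f"{domain} does not own {dataset}")
        self.grants[dataset].add(user)

    def can_read(self, user: str, dataset: str) -> bool:
        return user in self.grants.get(dataset, set())


# Example usage with made-up names.
central = CentralAccessControl()
central.grant("central-governance", "analyst", "orders_daily")

mesh = DomainAccessControl({"orders_daily": "sales"})
mesh.grant("sales", "analyst", "orders_daily")  # sales owns the dataset, so it grants directly

print(central.can_read("analyst", "orders_daily"))  # True
print(mesh.can_read("analyst", "orders_daily"))     # True
```

Both models end with the same analyst able to read the same dataset; what differs is who is allowed to make that decision.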
Advice for practitioners
There is no single perfect data mesh implementation. Organizations can benefit from even simple or partial implementations, according to Osborn.
"A working mesh strategy will produce more approachable data and allow more fingers on keyboards to use the data," Osborn said. "Analysts, data scientists, report builders and, potentially, businesspeople will be able to participate. Plan for it."
All data strategies rest on underlying assumptions that must hold true for them to work as intended. To avoid preventable mistakes, organizations must understand these assumptions. According to Osborn, the three core data mesh assumptions are as follows:
- Technology and business domain experts are available to define and construct meaningful data domains.
- Data pipeline and data sharing capabilities and technologies exist within the environment. A lack of maturity here will devolve into an underwhelming and time-consuming build-it-yourself model.
- A functional governance strategy can help define and communicate standards and other expectations.