What is data governance and why does it matter?
Data governance (DG) is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.
A well-designed data governance program typically includes a governance team, a steering committee that acts as the governing body, and a group of data stewards. They work together to create the standards and policies for governing data, as well as implementation and enforcement procedures that are primarily carried out by the data stewards. Ideally, executives and other representatives from an organization's business operations take part, in addition to the IT and data management teams.
While data governance is a core component of an overall data management strategy, organizations need to focus on the expected business benefits of a governance program for it to be successful, independent consultant Nicola Askham wrote in a January 2022 blog post. Eric Hirschhorn, chief data officer at The Bank of New York Mellon Corp., made the same point in a session during the 2022 Enterprise Data World Digital conference. "Outcomes can't just be good governance," he said. "Outcomes have to be running better businesses."
This comprehensive guide to data governance further explains what it is, how it works, the business benefits it provides, best practices and the challenges of governing data. You'll also find an overview of data governance software and related technologies that can aid in the governance process. Throughout the guide, hyperlinks point to related articles that cover the topics being addressed in more depth.
Why data governance matters
Without effective data governance, data inconsistencies in different systems across an organization might not get resolved. For example, customer names may be listed differently in sales, logistics and customer service systems. That could complicate data integration efforts and create data integrity issues that affect the accuracy of business intelligence (BI), enterprise reporting and analytics applications. In addition, data errors might not be identified and fixed, further affecting BI and analytics accuracy.
Poor data governance can also hamper regulatory compliance initiatives. That could cause problems for companies that need to comply with the increasing number of data privacy and protection laws, such as the European Union's GDPR and the California Consumer Privacy Act (CCPA). An enterprise data governance program typically includes the development of common data definitions and standard data formats that are applied in all business systems, boosting data consistency for both business and compliance uses.
Data governance goals and benefits
A key goal of data governance is to break down data silos in an organization. Such silos commonly build up when individual business units deploy separate transaction processing systems without centralized coordination or an enterprise data architecture. Data governance aims to harmonize the data in those systems through a collaborative process, with stakeholders from the various business units participating.
Another data governance goal is to ensure that data is used properly, both to avoid introducing data errors into systems and to block potential misuse of personal data about customers and other sensitive information. That can be accomplished by creating uniform policies on the use of data, along with procedures to monitor usage and enforce the policies on an ongoing basis. In addition, data governance can help to strike a balance between data collection practices and privacy mandates.
Besides more accurate analytics and stronger regulatory compliance, the benefits that data governance provides include improved data quality; lower data management costs; and increased access to needed data for data scientists, other analysts and business users. Ultimately, data governance can help improve business decision-making by giving executives better information. Ideally, that will lead to competitive advantages and increased revenue and profits.
Who's responsible for data governance?
In most organizations, various people are involved in the data governance process. That includes business executives, data management professionals and IT staffers, as well as end users who are familiar with relevant data domains in an organization's systems. These are the key participants and their primary governance responsibilities.
Chief data officer. The chief data officer (CDO) -- if there is one -- is often the senior executive who oversees a data governance program and has high-level responsibility for its success or failure. The CDO's role includes securing approval, funding and staffing for the program; playing a lead role in setting it up; monitoring its progress; and acting as an advocate for it internally. If an organization doesn't have a CDO, another C-suite executive will usually serve as an executive sponsor and handle the same functions.
Data governance manager and team. In some cases, the CDO or an equivalent executive -- the director of enterprise data management, for example -- may also be the hands-on data governance program manager. In others, organizations appoint a data governance manager or lead specifically to run the program. Either way, the program manager typically heads a data governance team that works on the program full time. Sometimes more formally known as the data governance office, it coordinates the process, leads meetings and training sessions, tracks metrics, manages internal communications and carries out other management tasks.
Data governance committee. The governance team usually doesn't make policy or standards decisions, though. That's the responsibility of the data governance committee or council, which is primarily made up of business executives and other data owners. The committee approves the foundational data governance policy and associated policies and rules on things like data access and usage, plus the procedures for implementing them. It also resolves disputes, such as disagreements between different business units over data definitions and formats.
Data stewards. The responsibilities of data stewards include overseeing data sets to keep them in order. They're also in charge of ensuring that the policies and rules approved by the data governance committee are implemented and that end users comply with them. Workers with knowledge of particular data assets and domains are generally appointed to handle the data stewardship role. That's a full-time job in some companies and a part-time position in others. There can also be a mix of IT and business data stewards.
Data architects, data modelers and data quality analysts and engineers are usually part of the governance process, too. In addition, business users and analytics teams must be trained on data governance policies and data standards, so they can avoid using data in erroneous or improper ways. You can learn more about data governance roles and responsibilities and how to structure a governance program in a related article by technology writer George Lawton.
Components of a data governance framework
A data governance framework consists of the policies, rules, processes, organizational structures and technologies that are put in place as part of a governance program. It also spells out things such as a mission statement for the program, its goals and how its success will be measured, as well as decision-making responsibilities and accountability for the various functions that will be part of the program. An organization's governance framework should be documented and shared internally, so it's clear to everyone involved -- upfront -- how the program will work.
On the technology side, data governance software can be used to automate aspects of managing a governance program. While data governance tools aren't a mandatory framework component, they support program and workflow management, collaboration, development of governance policies, process documentation, the creation of data catalogs and other functions. They can also be used in conjunction with data quality, metadata management and master data management (MDM) tools.
Data governance implementation
Data governance should be a strategic initiative for organizations. In an article on creating a data governance strategy, Donald Farmer, principal of consultancy TreeHive Strategy, recommended a series of steps to take, including the following to-do items:
- identify data assets and existing informal governance processes;
- increase the data literacy and skills of end users; and
- decide how to measure the success of a governance program.
Before implementing a data governance framework, another step cited by Farmer is identifying the owners or custodians of different data assets across an enterprise and getting them -- or designated surrogates -- involved in the governance program. The CDO, executive sponsor or dedicated data governance manager then takes the lead in creating the program's structure, working to staff the data governance team, identify data stewards and formalize the governance committee.
Once the structure is in place, the real work of governing data begins. The data governance policies and data standards must be developed, along with rules that define how data can be used by authorized personnel. In addition, a set of controls and audit procedures are needed to ensure ongoing compliance with internal policies and external regulations and to guarantee that data is used in a consistent way across applications. The governance team should also document where data comes from, where it's stored and how it's protected from misuse and security attacks.
Data governance initiatives usually also include the following elements:
- Data mapping and classification. Mapping the data in systems helps document data assets and how data flows through an organization. Different data sets can then be classified based on factors such as whether they contain personal information or other sensitive data. The classifications influence how data governance policies are applied to individual data sets.
- Business glossary. A business glossary contains definitions of business terms and concepts used in an organization -- for example, what constitutes an active customer. By helping to establish a common vocabulary for business data, business glossaries can aid governance efforts.
- Data catalog. Data catalogs collect metadata from systems and use it to create an indexed inventory of available data assets that includes information on data lineage, search functions and collaboration tools. Information about data governance policies and automated mechanisms for enforcing them can also be built into catalogs. Read about the key steps for building a data catalog in an article by Anne Marie Smith, vice president of education and chief methodologist at data management consulting firm EWSolutions.
Best practices for managing data governance initiatives
Because data governance typically imposes restrictions on how data is handled and used, it can become controversial in organizations. A common concern among IT and data management teams is that they'll be seen as the "data police" by business users if they lead data governance programs. To promote business buy-in and avoid resistance to governance policies, experienced data governance managers and industry consultants recommend that programs be business-driven, with data owners involved and the data governance committee making the decisions on standards, policies and rules.
Training and education on data governance is a necessary component of initiatives, particularly to familiarize business users and data analysts with data usage rules, privacy mandates and their own responsibility for helping to keep data sets consistent. Ongoing communication with corporate executives, business managers and end users about the progress of a data governance program is also a must, through a combination of reports, email newsletters, workshops and other outreach methods.
Communication and training are part of a set of seven data governance best practices outlined by Farmer in a second article. Some of the others include applying data security and privacy rules as close to the source system as possible, putting appropriate governance policies in place at every level of an organization and reviewing governance policies on a regular basis.
Gartner analyst Saul Judah recommends an adaptive data governance approach that applies different governance policies and styles to individual business processes. He also has listed these seven foundations for successfully governing data and analytics applications:
- a focus on business value and organizational outcomes;
- internal agreement on data accountability and decision rights;
- a trust-based governance model that relies on data lineage and curation;
- transparent decision-making that hews to a set of ethical principles;
- risk management and data security included as core governance components;
- ongoing education and training, with mechanisms to monitor their effectiveness; and
- a collaborative culture and governance process that encourages broad participation.
Professional associations that promote best practices in data governance processes include DAMA International and the Data Governance Professionals Organization. The Data Governance Institute, an organization founded in 2003 by then-consultant Gwen Thomas, has published a data governance framework template and a variety of guidance on governance best practices. Some of the information is openly available on its website, while other materials can be accessed only by paid members. Similar guidance is also available elsewhere -- for example, in the DataManagementU online library maintained by EWSolutions.
Data governance challenges
Often, the early steps in data governance efforts can be the most difficult because different parts of an organization commonly have diverging views of key data entities, such as customers or products. These differences must be resolved as part of the data governance process -- for example, by agreeing on common data definitions and formats. That can be a fraught and fractious undertaking, which is why the data governance committee needs a clear dispute-resolution procedure.
Other common challenges that organizations face on data governance include the following.
Demonstrating its business value. Without upfront documentation of a data governance initiative's expected business benefits, getting it approved, funded and supported can be a struggle. In her January 2022 blog post, Askham said business executives want to know what's in it for them at the outset of a governance program. "If you can't answer that in a way that they really are interested in and benefits them, they're just not going to be interested," she wrote.
On an ongoing basis, demonstrating business value requires the development of quantifiable metrics, particularly on data quality improvements. That could include the number of data errors resolved on a quarterly basis and the revenue gains or cost savings that result from them. Other common data quality metrics measure accuracy and error rates in data sets and related attributes, such as data completeness and consistency. Read more about the close ties between data governance and data quality, plus other kinds of metrics that can also be used to show the value of a governance program.
Supporting self-service analytics. The shift to self-service BI and analytics has created new data governance challenges by putting data in the hands of more users in organizations. Governance programs must make sure data is accurate and accessible for self-service users, but also ensure that those users -- business analysts, executives and citizen data scientists, among others -- don't misuse data or run afoul of data privacy and security restrictions. Streaming data that's used for real-time analytics further complicates those efforts.
Governing big data. The deployment of big data systems also adds new governance needs and challenges. Data governance programs traditionally focused on structured data stored in relational databases, but now they must deal with the mix of structured, unstructured and semistructured data that big data environments typically contain, as well as a variety of data platforms, including Hadoop and Spark systems, NoSQL databases and cloud object stores. Also, sets of big data are often stored in raw form in data lakes and then filtered as needed for analytics uses, further complicating data governance.
Key data governance pillars
Data governance programs are underpinned by several other facets of the overall data management process. Most notably, these facets include the following:
- Data stewardship. As discussed earlier, a data steward is responsible for a portion of an organization's data. Data stewards also help implement and enforce data governance policies. Often, they're data-savvy business users who are subject matter experts in their domains. Data stewards collaborate with data quality analysts, database administrators and other data management professionals. They also work with business units to identify data requirements and issues.
- Data quality. Data quality improvement is one of the biggest driving forces behind data governance activities. Data accuracy, completeness and consistency across systems are crucial hallmarks of successful governance initiatives. Data cleansing, also known as data scrubbing, fixes data errors and inconsistencies, and it also correlates and removes duplicate instances of the same data elements to harmonize how customers or products are listed in different systems. Data quality tools provide those capabilities through data profiling, parsing and matching functions, among other features. Get tips on managing data quality improvement efforts in an article by Chris Foot, a senior strategist and consultant at IT services provider RadixBay.
- Master data management. MDM is another data management discipline that's closely associated with data governance processes. MDM initiatives establish a master set of data on customers, products and other business entities to help ensure that the data is consistent in different systems across an organization. As a result, MDM naturally dovetails with data governance. Like governance programs, though, MDM efforts can create controversy in organizations because of differences between departments and business units on how to format master data. In addition, MDM's complexity has limited its adoption, as compared with data governance. But the combination of the two has led to a shift toward smaller-scale MDM projects driven by data governance goals.
Data governance is also related to information governance, which focuses more broadly on how information is used overall in an organization. At a high level, data governance can be viewed as a component of information governance, but they're generally considered to be separate disciplines with similar aims. Get an explanation of how data and information governance differ in an article by Lawton.
Data governance use cases
Effective data governance is at the heart of managing the data used in operational systems, as well as the BI and analytics applications fed by data warehouses, data marts and data lakes. It's also a particularly important component of digital transformation initiatives, and it can aid in other corporate processes, such as risk management, business process management, and mergers and acquisitions.
As data uses continue to expand and new technologies emerge, data governance is likely to see even wider application. For example, efforts are underway to apply data governance processes to machine learning algorithms and other AI tools. Also, high-profile data breaches and laws like GDPR and CCPA have made building privacy protections into data governance policies a central part of governance efforts.
Data governance vendors and tools
Data governance tools are available from various vendors. That includes major IT vendors, such as Oracle, SAP and SAS Institute Inc., as well as data management specialists like Alation, ASG Technologies, Ataccama, Collibra, Informatica, OneTrust, Precisely, Quest Software, Semarchy, Syniti and Talend. In most cases, the governance tools are offered as part of larger suites that also incorporate metadata management features and data lineage functionality.
Many of the data governance and metadata management platforms include data catalog software, too. It's also available as a standalone product from Alation, Alex Solutions, Atlan, Data.world, Hitachi Vantara, IBM, OvalEdge and numerous other vendors, as well as cloud platform market leaders AWS, Google and Microsoft.