Data governance (DG) is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.
A well-designed data governance program typically includes a governance team, a steering committee that acts as the governing body, and a group of data stewards. They work together to create the standards and policies for governing data, as well as implementation and enforcement procedures that are primarily carried out by the data stewards. Executives and other representatives from an organization's business operations take part, in addition to the IT and data management teams.
While data governance is a core component of an overall data management strategy, organizations should focus on the desired business outcomes of a governance program instead of the data itself, Gartner analyst Andrew White wrote in a December 2019 blog post. This comprehensive guide to data governance further explains what it is, how it works, the business benefits it provides and the challenges of governing data. You'll also find an overview of data governance software and related tools.
Why data governance matters
Without effective data governance, data inconsistencies in different systems across an organization might not get resolved. For example, customer names may be listed differently in sales, logistics and customer service systems. That could complicate data integration efforts and create data integrity issues that affect the accuracy of business intelligence (BI), enterprise reporting and analytics applications. In addition, data errors might not be identified and fixed, further affecting BI and analytics accuracy.
Poor data governance can also hamper regulatory compliance initiatives, which could cause problems for companies that need to comply with new data privacy and protection laws, such as the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). An enterprise data governance program typically results in the development of common data definitions and standard data formats that are applied in all business systems, boosting data consistency for both business and compliance uses.
Data governance goals and benefits
A key goal of data governance is to break down data silos in an organization. Such silos commonly build up when individual business units deploy separate transaction processing systems without centralized coordination or an enterprise data architecture. Data governance aims to harmonize the data in those systems through a collaborative process, with stakeholders from the various business units participating.
Another data governance goal is to ensure that data is used properly, both to avoid introducing data errors into systems and to block potential misuse of personal data about customers and other sensitive information. That can be accomplished by creating uniform policies on the use of data, along with procedures to monitor usage and enforce the policies on an ongoing basis. In addition, data governance can help to strike a balance between data collection practices and privacy mandates.
Besides more accurate analytics and stronger regulatory compliance, the benefits that data governance provides include improved data quality; lower data management costs; and increased access to needed data for data scientists, other analysts and business users. Ultimately, data governance can help improve business decision-making by giving executives better information. Ideally, that will lead to competitive advantages and increased revenue and profits. Read more about the benefits of a successful data governance strategy and how to build one in an article by data management consultant Andy Hayler.
Who's responsible for data governance?
In most organizations, various people are involved in the data governance process. They include business executives, data management professionals and IT staffers, as well as end users who are familiar with relevant data domains in an organization's systems. These are the key participants and their primary governance responsibilities.
Chief data officer
The chief data officer (CDO), if there is one, often is the senior executive who oversees a data governance program and has high-level responsibility for its success or failure. The CDO's role includes securing approval, funding and staffing for the program, playing a lead role in setting it up, monitoring its progress and acting as an advocate for it internally. If an organization doesn't have a CDO, another C-suite executive usually will serve as an executive sponsor and handle the same functions.
Data governance manager and team
In some cases, the CDO or an equivalent executive -- a director of enterprise data management, for example -- may also be the hands-on data governance program manager. In others, organizations appoint a data governance manager or lead specifically to run the program. Either way, the program manager typically heads a data governance team that works on the program full time. Sometimes more formally known as the data governance office, it coordinates the process, leads meetings and training sessions, tracks metrics, manages internal communications and carries out other management tasks.
Data governance committee
The governance team usually doesn't make policy or standards decisions, though. That's the responsibility of the data governance committee or council, which is primarily made up of business executives and other data owners. The committee approves the foundational data governance policy and associated policies and rules on things like data access and usage, plus the procedures for implementing them. It also resolves disputes, such as disagreements between different business units over data definitions and formats.
Data stewards
The responsibilities of data stewards include overseeing data sets to keep them in order. They're also in charge of ensuring that the policies and rules approved by the data governance committee are implemented and that end users comply with them. Workers with knowledge of particular data assets and domains are generally appointed to handle the data stewardship role. That's a full-time job in some companies and a part-time position in others; there can also be a mix of IT and business data stewards.
Data architects, data modelers and data quality analysts and engineers are also part of the governance process. In addition, business users and analytics teams must be trained on data governance policies and data standards so they can avoid using data in erroneous or improper ways. You can learn more about data governance roles and responsibilities and how to structure a governance program in a related article.
Components of a data governance framework
A data governance framework consists of the policies, rules, processes, organizational structures and technologies that are put in place as part of a governance program. It also spells out things such as a mission statement for the program, its goals and how its success will be measured, as well as decision-making responsibilities and accountability for the various functions that will be part of the program. An organization's governance framework should be documented and shared internally to show how the program will work, so that it's clear to everyone involved upfront.
On the technology side, data governance software can be used to automate aspects of managing a governance program. While data governance tools aren't a mandatory framework component, they support program and workflow management, collaboration, development of governance policies, process documentation, the creation of data catalogs and other functions. They can also be used in conjunction with data quality, metadata management and master data management (MDM) tools.
Data governance implementation
The initial step in implementing a data governance framework involves identifying the owners or custodians of the different data assets across an enterprise and getting them or designated surrogates involved in the governance program. The CDO, executive sponsor or dedicated data governance manager then takes the lead in creating the program's structure, working to staff the data governance team, identify data stewards and formalize the governance committee.
Once the structure is finalized, the real work begins. The data governance policies and data standards must be developed, along with rules that define how data can be used by authorized personnel. Moreover, a set of controls and audit procedures is needed to ensure ongoing compliance with internal policies and external regulations and to make sure that data is used in a consistent way across applications. The governance team should also document where data comes from, where it's stored and how it's protected from mishaps and security attacks.
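As a rough illustration of how usage rules and an audit trail fit together, the following Python sketch checks each data request against a rule and records the decision. Everything in it is hypothetical: the `UsageRule` and `PolicyEngine` classes, the role and purpose names, and the `customer_master` data set are invented for the example and do not come from any particular governance tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageRule:
    """Hypothetical rule: which roles may use a data set, and for which purposes."""
    dataset: str
    allowed_roles: frozenset
    allowed_purposes: frozenset


@dataclass
class PolicyEngine:
    rules: dict = field(default_factory=dict)      # data set name -> UsageRule
    audit_log: list = field(default_factory=list)  # one entry per decision

    def add_rule(self, rule: UsageRule) -> None:
        self.rules[rule.dataset] = rule

    def check(self, user: str, role: str, dataset: str, purpose: str) -> bool:
        rule = self.rules.get(dataset)
        # Deny by default when no rule covers the data set.
        allowed = (
            rule is not None
            and role in rule.allowed_roles
            and purpose in rule.allowed_purposes
        )
        # Record every decision, allowed or not, so it can be audited later.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset,
            "purpose": purpose,
            "allowed": allowed,
        })
        return allowed


engine = PolicyEngine()
engine.add_rule(UsageRule(
    dataset="customer_master",
    allowed_roles=frozenset({"data_steward", "marketing_analyst"}),
    allowed_purposes=frozenset({"reporting", "data_quality"}),
))

print(engine.check("ana", "marketing_analyst", "customer_master", "reporting"))  # True
print(engine.check("raj", "contractor", "customer_master", "reporting"))         # False
print(len(engine.audit_log))                                                     # 2
```

The deny-by-default choice is one possible design. A real program would take its rules from the policies approved by the governance committee rather than from hard-coded values.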
Data governance initiatives usually also include the following elements.
Data mapping and classification. Mapping the data in systems helps document data assets and how data flows through an organization. Different data sets can then be classified based on factors such as whether they contain personal information or other sensitive data. The classifications influence how data governance policies are applied to individual data sets.
Business glossary. A business glossary contains definitions of business terms and concepts used in an organization -- for example, what constitutes an active customer. By helping to establish a common vocabulary for business data, business glossaries can aid governance efforts.
Data catalog. Data catalogs collect metadata from systems and use it to create an indexed inventory of available data assets that includes information on data lineage, search functions and collaboration tools. Information about data governance policies and automated mechanisms for enforcing them can also be built into catalogs. Consultant Anne Marie Smith details the key steps for building a data catalog.
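To make the classification and catalog elements above more concrete, here is a minimal Python sketch. It assumes a deliberately naive rule (a data set is treated as personal if any column name contains a hint such as "email") and a toy in-memory catalog with search and lineage lookups. The hint list, class names and sample data sets are all assumptions made for illustration; commercial catalog products work differently and far more thoroughly.

```python
from dataclasses import dataclass, field

# Hypothetical column-name hints that suggest personal data.
PERSONAL_DATA_HINTS = ("name", "email", "phone", "address", "birth", "ssn")


def classify_dataset(columns):
    """Return 'personal' if any column name contains a hint, else 'internal'."""
    for column in columns:
        lowered = column.lower()
        if any(hint in lowered for hint in PERSONAL_DATA_HINTS):
            return "personal"
    return "internal"


@dataclass
class CatalogEntry:
    name: str
    source_system: str
    columns: list
    upstream: list = field(default_factory=list)  # lineage: direct parent data sets
    classification: str = ""

    def __post_init__(self):
        # Classify automatically unless a steward supplied a label.
        if not self.classification:
            self.classification = classify_dataset(self.columns)


class DataCatalog:
    def __init__(self):
        self.entries = {}

    def register(self, entry):
        self.entries[entry.name] = entry

    def search(self, term):
        """Names of entries whose name or column names contain the term."""
        term = term.lower()
        return sorted(
            e.name for e in self.entries.values()
            if term in e.name.lower() or any(term in c.lower() for c in e.columns)
        )

    def lineage(self, name):
        """Walk upstream links and list every ancestor of a data set."""
        seen, stack = [], list(self.entries[name].upstream)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                if parent in self.entries:
                    stack.extend(self.entries[parent].upstream)
        return seen


catalog = DataCatalog()
catalog.register(CatalogEntry("crm_customers", "CRM", ["customer_id", "full_name", "email"]))
catalog.register(CatalogEntry("orders", "ERP", ["order_id", "customer_id", "amount"]))
catalog.register(CatalogEntry("sales_mart", "warehouse", ["customer_id", "total_amount"],
                              upstream=["orders", "crm_customers"]))

print(catalog.entries["crm_customers"].classification)  # personal
print(catalog.search("customer"))                       # ['crm_customers', 'orders', 'sales_mart']
print(sorted(catalog.lineage("sales_mart")))            # ['crm_customers', 'orders']
```

Name-based matching will miss personal data stored under uninformative column names and will not propagate a classification along lineage, so a sketch like this could at most flag candidates for a data steward to review.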
Best practices for managing data governance initiatives
To the extent that data governance may impose strictures on how data is handled and used, it can become controversial in organizations. A common concern among IT and data management teams is that they'll be seen as the "data police" by business users if they lead data governance programs. To promote user buy-in and avoid resistance to governance policies, experienced data governance managers and industry consultants recommend that programs be business-driven, with data owners involved and the governance committee making the decisions on standards, policies and rules.
"Only by agreeing to corporate-wide data governance with responsibility by business units will the foundations be laid for successful data management across the enterprise," Hayler wrote in an article about the need to eliminate incompatible data silos.
Training and education on data governance is a necessary component of initiatives, particularly to familiarize business users and data analysts with data usage rules, privacy mandates and their responsibility for helping to keep data sets consistent. Ongoing communication with corporate executives, business managers and end users about the progress of a data governance program is also a must, via a combination of reports, email newsletters, workshops and other outreach methods.
In a report published in October 2019, Gartner analyst Saul Judah listed these seven foundations for successfully governing data and analytics applications:
- a focus on business value and organizational outcomes;
- internal agreement on data accountability and decision rights;
- a trust-based governance model that relies on data lineage and curation;
- transparent decision-making that hews to a set of ethical principles;
- risk management and data security included as core governance components;
- ongoing education and training, with mechanisms to monitor their effectiveness; and
- a collaborative culture and governance process that encourages broad participation.
Professional associations that promote best practices in data governance processes include DAMA International and the Data Governance Professionals Organization. The Data Governance Institute, an organization founded in 2004 by then-consultant Gwen Thomas, published a data governance framework template and a variety of guidance on governance best practices. It's no longer active, but the information is still available on its website. Similar guidance is also available elsewhere -- for example, in the Data Management University online library maintained by consultancy EWSolutions.
Data governance challenges
Often, the early steps in data governance efforts are the most difficult because different parts of an organization typically have diverging views of key enterprise data entities, such as customers or products. These differences must be resolved as part of the data governance process -- for example, by agreeing on common data definitions and formats. That can be a fraught and fractious undertaking, which is why the data governance committee needs a clear dispute-resolution procedure.
Other common challenges that organizations face in data governance include the following.
Demonstrating its business value. That often starts at the very beginning: "It can be a real struggle to get your data governance initiative approved in the first place," data governance consultant and trainer Nicola Askham wrote in a September 2019 blog post. To help build a business case for a data governance program, Askham recommended that proponents document data quality horror stories and tie the expected outcomes of the program to specific corporate priorities.
On an ongoing basis, demonstrating business value requires the development of quantifiable metrics, particularly on data quality improvements. That could include the number of data errors resolved on a quarterly basis and the revenue gains or cost savings that result from them. Other common data quality metrics measure accuracy and error rates in data sets and related attributes such as data completeness and consistency. Read more about the close ties between data governance and data quality, plus other kinds of metrics that can also be used to show the value of a governance program.
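The data quality metrics mentioned above can be computed in many ways. The Python sketch below shows one simple set of definitions for completeness, error rate and cross-system consistency. The formulas, field names, validation rules and sample records are assumptions chosen for illustration, not a standard.

```python
def completeness(records, required_fields):
    """Share of required field values that are present and non-empty."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for name in required_fields
        if record.get(name) not in (None, "")
    )
    return filled / total


def error_rate(records, validators):
    """Share of records that fail at least one validation rule."""
    if not records:
        return 0.0
    failed = sum(
        1 for record in records
        if not all(check(record) for check in validators)
    )
    return failed / len(records)


def consistency(system_a, system_b, field_name):
    """Share of keys found in both systems whose field value is identical."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 1.0
    matches = sum(
        1 for key in shared
        if system_a[key].get(field_name) == system_b[key].get(field_name)
    )
    return matches / len(shared)


# Hypothetical sample data.
customers = [
    {"id": 1, "name": "Acme Corp", "email": "ops@acme.example", "country": "US"},
    {"id": 2, "name": "Globex", "email": "", "country": "DE"},
    {"id": 3, "name": "", "email": "info@initech.example", "country": "XX"},
    {"id": 4, "name": "Umbrella", "email": "hq@umbrella.example", "country": "GB"},
]
VALID_COUNTRIES = {"US", "DE", "GB"}
validators = [
    lambda r: r["country"] in VALID_COUNTRIES,         # known country code
    lambda r: r["email"] == "" or "@" in r["email"],   # empty or plausible email
]
sales = {1: {"name": "Acme Corp"}, 2: {"name": "Globex"}}
support = {1: {"name": "ACME Corporation"}, 2: {"name": "Globex"}, 5: {"name": "Hooli"}}

print(completeness(customers, ["name", "email"]))  # 0.75
print(error_rate(customers, validators))           # 0.25
print(consistency(sales, support, "name"))         # 0.5
```

Tracked quarter over quarter, figures like these give a governance team a quantifiable trend to report, although tying them to revenue gains or cost savings still requires business input.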
Supporting self-service analytics. The self-service BI and analytics movement has created new data governance challenges by putting data in the hands of more users in organizations. Governance programs must make sure data is accurate and accessible for self-service users, while also ensuring that those users -- business analysts, executives and citizen data scientists, among others -- don't misuse data or run afoul of data privacy and security restrictions. Streaming data that's used for real-time analytics further complicates those efforts.
Governing big data. The deployment of big data systems also adds new governance needs and challenges. Data governance programs traditionally focused on structured data stored in relational databases, but now they must deal with the mix of structured, unstructured and semi-structured data that big data environments typically contain, as well as a variety of data platforms, including Hadoop and Spark systems, NoSQL databases and cloud object stores. Also, sets of big data are often stored in raw form in data lakes and then filtered as needed for analytics uses. A related article offers more details on the challenges and advice on best practices for big data governance.
Key data governance pillars
Data governance programs are underpinned by several other facets of the overall data management process. Most notably, they include the following:
- Data stewardship. As discussed earlier, an essential responsibility of the data steward is to be accountable for a portion of an organization's data, with job duties in areas such as data quality, security and usage. Teams of data stewards typically are formed to help guide and execute the implementation of data governance policies. Often, they're data-savvy business users who are subject matter experts in their domains, although data steward can also be an IT position. Data stewards collaborate with data quality analysts, database administrators and other data management professionals, while also working with business units to identify data requirements and issues. In his December 2019 blog post, Gartner's White also pointed to an emerging need for analytics stewardship that would handle similar functions specifically for analytics systems, calling it "a missing link in analytics, BI and data science."
- Data quality. Data quality improvement is one of the biggest driving forces behind data governance activities. Data accuracy, completeness and consistency across systems are crucial hallmarks of successful governance initiatives. Data cleansing, also known as data scrubbing, is a common data quality element. It fixes data errors and inconsistencies and also correlates and removes duplicate instances of the same data elements, thus harmonizing the various ways in which the same customer or product may be listed in systems. Data quality tools provide those capabilities through data profiling, parsing and matching functions, among other features. Get tips on managing data quality efforts in an article by managed services strategist and consultant Chris Foot.
- Master data management. MDM is another data management discipline that's closely associated with data governance processes. MDM initiatives establish a master set of data on customers, products and other business entities to help ensure that the data is consistent in different systems across an organization. As a result, MDM naturally dovetails with data governance. Like governance programs, though, MDM efforts can create controversy in organizations because of differences between departments and business units on how to format master data. In addition, MDM's complexity has limited its adoption compared to data governance. But the combination of the two has led to a shift toward smaller-scale MDM projects driven by data governance goals, as explained in a separate article.
Data governance use cases
Effective data governance is at the heart of managing the data used in operational systems and the BI and analytics applications fed by data warehouses, data marts and data lakes. It's also a particularly important component of digital transformation initiatives, and it can aid in other corporate processes, such as risk management, business process management, and mergers and acquisitions. As data uses continue to expand and new technologies emerge, data governance is likely to gain even wider application. For example, efforts are underway to apply data governance processes to machine learning algorithms and other AI tools. Also, high-profile data breaches and laws like GDPR and CCPA have made data protection and privacy more central to governance efforts. Compliance with the GDPR and CCPA privacy requirements is another new use case for data governance -- Hayler offers advice on building privacy protections into governance policies to meet those requirements.
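The parsing and matching functions described earlier under data quality, which harmonize the different ways the same customer is listed across systems, can be illustrated with a small Python sketch. It assumes a very simple matching rule: names are lowercased, stripped of punctuation and of a hypothetical list of legal suffixes, and records that share the resulting key are grouped as likely duplicates. Real data quality tools use much richer matching logic.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def match_key(name):
    """Normalize a name: lowercase, drop punctuation, drop legal suffixes.

    The pattern keeps only ASCII letters, digits and whitespace, so this
    sketch is not suitable for names written with other characters.
    """
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    core = [token for token in tokens if token not in LEGAL_SUFFIXES]
    # Fall back to all tokens if the name consists only of suffix words.
    return " ".join(core or tokens)


def group_duplicates(records):
    """Group records whose names normalize to the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record["name"])].append(record)
    return dict(groups)


# Hypothetical listings of customers in three systems.
records = [
    {"system": "sales", "name": "Acme Corp."},
    {"system": "logistics", "name": "ACME Corporation"},
    {"system": "customer_service", "name": "Acme, Inc."},
    {"system": "sales", "name": "Globex Ltd"},
]

groups = group_duplicates(records)
for key, members in sorted(groups.items()):
    print(key, [m["system"] for m in members])
# acme ['sales', 'logistics', 'customer_service']
# globex ['sales']
```

Exact matching on a normalized key is easy to explain to business users but will miss misspellings and abbreviations, which is one reason candidate matches are usually reviewed before records are merged.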
Data governance vendors and tools
Data governance tools are available from various vendors. They include major IT vendors, such as IBM, Informatica, Information Builders, Oracle, SAP and SAS Institute, as well as data management specialists like Adaptive, ASG Technologies, Ataccama, Collibra, Erwin, Infogix and Talend. In most cases, the governance tools are offered as part of larger suites that also incorporate metadata management features and data lineage functionality.
Data catalog software is included in many of the data governance and metadata management platforms, too. It's also available as a stand-alone product from vendors such as Alation, Alteryx, Boomi, Cambridge Semantics and Data.world. Learn more about the features that data catalog software offers, including its governance-related capabilities.