Companies constantly struggle to collect and analyze trusted data on which to base business decisions. Study after study has shown that senior executives distrust their own data, despite sizable investments in technology.
Larger organizations have many different computer systems. An oil company I consulted for acknowledged that it ran more than 600 significant applications, and just one was its corporate global ERP system. These different applications -- ranging from marketing to logistics -- frequently need to refer to master data, such as customer, product, asset and location. In other words, the applications need access to trusted data that crosses organizational boundaries.
Different parts of an organization have different needs. If, for example, a company sells cans of beverages, then the marketing department cares about the particular brand of beverage, its packaging, its price and special offers, while the logistics department cares about the dimensions and weight of the can and how many cans will fit on a pallet. The same product, therefore, needs to be classified and categorized in different ways.
In reality, a single standard set of product data -- or even customer data -- is rare. In fact, 15 was the median number of competing systems generating master data in an enterprise, according to a 2013 survey by analyst firm The Information Difference, where I am the CEO. Five years later, the firm's survey yielded almost exactly the same number.
The cost of duplication
Some important data, such as supplier data or credit data provided by external sources, isn't even within the corporate firewall. And big data from new sources like sensors, mobile phone towers and e-commerce sites only adds to the complexity.
Duplication of data as well as the consequent lack of consistency and quality can be costly. In the medical industry, a 2018 survey of U.S. healthcare managers by Black Book Market Research found that duplicate patient data added $1,950 to the cost of each inpatient stay, and one-third of all denied insurance claims were due to inaccurate patient information. IBM estimated that poor data cost U.S. companies a staggering $3.1 trillion in 2016.
In a project I consulted on, a single pricing error on just one product in the corporate ERP system across a particular region went unnoticed for months until it was discovered by a master data integration project. Fixing that one error alone paid for the cost of the company's $25 million project rollout.
These kinds of data integration benefits stem partly from technology, but also from business ownership of data. In the past, far too many companies assumed that the quality and consistency of trusted data in their IT systems were the responsibility of the IT department. However, very few IT managers have the authority to tell lines of business to change their processes to reconcile the competing versions of data held by different departments.
This mismatch between responsibility and authority has contributed to the rise of business-led data governance structures -- typically comprising a small core team, a steering committee and data stewards -- charged with ensuring data quality, consistency and accuracy. Ownership of data returns to where it belongs: the business lines themselves.
Synergistic and supportive
Three categories of complementary tools support data governance efforts:
- Data integration. The original data integration tools focused on taking files of data and moving them about from system to system, sometimes merging the data from different systems based on business rules. Later versions performed the same role but in real time rather than batch mode.
- Data quality. These tools complement data integration tools by focusing on fixing data quality at the source. They can profile data to help identify potential issues and apply a range of algorithms to detect incomplete records and common typing errors as well as identify potential matches between records. Recently, the matching algorithms in these tools have been bolstered by AI techniques that, for example, observe a human domain expert assessing possible duplicate records and learn over time which rules to apply to mimic that expert.
- Master data management. The idea is to gather data together, such as customer and product information, from underlying source systems and construct a single "golden copy" of key data that can be used to feed into data warehouses and analytics applications. Even if the data can't be realistically standardized at the source, these tools, for instance, can map the differences in product classifications and use business rules to decide which systems have the most trusted data. A customer record updated last week, for example, may be more reliable than one that hasn't been touched in two years.
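The matching and survivorship logic described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the record fields, the crude name normalization and the "most recently updated record wins" rule are all assumptions made for the example.

```python
from datetime import date

def normalize(name: str) -> str:
    """Crude normalization to catch common variants (case, punctuation, spacing)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def golden_record(records: list[dict]) -> dict:
    """Among likely duplicates, keep the most recently updated record
    (a simple recency-based survivorship rule)."""
    return max(records, key=lambda r: r["last_updated"])

# Hypothetical customer records from two source systems
crm = {"name": "Acme Corp.", "city": "Dallas", "last_updated": date(2024, 5, 1)}
erp = {"name": "ACME CORP", "city": "Dallas, TX", "last_updated": date(2022, 3, 15)}

# Flag the pair as a likely duplicate, then build the golden copy
if normalize(crm["name"]) == normalize(erp["name"]):
    golden = golden_record([crm, erp])
```

Real tools use far more sophisticated matching (phonetic codes, edit distance, learned models) and field-level survivorship, but the principle is the same: detect likely duplicates, then apply business rules to decide which source to trust.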
After industry consolidation in recent years, some large vendors now offer data integration, data quality and master data management technologies in one platform, while others partner to provide a complete package of functionality. As more of these technologies are deployed in the cloud, almost all of the tools now offer cloud as well as on-premises capabilities.
Best practices have shown that a combination of strong data governance and the application of the latest technologies can significantly improve the underlying issue of consistency and quality of trusted data in large organizations. According to an Information Difference survey of 101 large companies in a range of industries, the real-world benefits derived from master data and integration projects included the following, per comments from respondents: "improvement in day sales outstanding," "met regulatory challenges," "improved conversion ratios of campaigns," and "cost avoidance savings of $20 million."
By addressing and solving underlying data issues, the right data management and integration tools in combination with well-conceived data governance practices can yield significant ROI.