Why organizations need a solid data governance strategy

The flood of data flowing into data warehouses, data lakes and other systems makes effective data governance a must for successful business analytics initiatives.

When I started working as a database administrator in 1983, IT was all about centralization. Data was safely kept on the corporate mainframe, and only those programmers who had the skills to navigate prerelational databases could access it. Nearly four decades later, it's all about data democratization -- and the corresponding need for a strong data governance strategy.

Back in the day, business analysts had to go cap in hand to the IT department because they didn't know how to navigate an IMS or IDMS database and wouldn't have been granted access even if they could. The IT department printed out monthly reports and distributed them, like Moses descending from the mountain with tablets of stone. There was no real notion of, or requirement for, data governance.

With the advent of the PC, the balance of power shifted radically. Suddenly, business users had access to spreadsheets and could create their own calculations and analyses, even if corporate data was still mostly out of reach. Then came client-server computing and a rush to decentralize that data, which brought giddying new possibilities but also confusion, as different departments used different versions of data and fought over whose version was correct.

Data analytics could now be done by business analysts and executives. But without agreement on the legitimacy of the data sources, chaos ensued. The mushrooming data silos hampered business intelligence (BI) and analytics efforts, and the first glimmers of a need to better govern data came to light.

The dawn of data governance

IT initially responded to those challenges with the data warehouse, which gathered up data from disconnected transaction systems for the sole purpose of enabling analytics. In addition, clever BI and reporting tools appeared that made it easier to manipulate, join and summarize raw tables of transaction data for analysis -- maybe even by downloading the data to spreadsheets.

Sure, the original data was still stored in different applications and formats. But with enough effort, a data warehouse could be coaxed into making sense of it all by providing consolidated data in multiple dimensions, such as customer, product, asset and location. However, to actually produce consistent sets of data, the inconsistencies of the underlying systems had to be resolved.

[Figure: Survey results on enterprise data governance programs]

Master data management (MDM) was born and, alongside it, the concept of a data governance strategy arose. Business users were encouraged or cajoled into deciding which classifications of customers and products were "golden records" to be held aloft across the enterprise and which were to be cast into the wilderness of department-specific, local terminology. This frequently was -- and still is -- an acrimonious process, with different departments and data owners arguing over the best way to classify data.
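As a rough illustration of what that agreement produces, the sketch below maps department-specific labels onto an agreed enterprise classification and sets aside anything it can't resolve for human review. The segment names, the mapping table and the function names are all hypothetical, invented for this example; a real MDM tool does far more than a lookup.

```python
# Hypothetical mapping from local, department-specific terms
# to the agreed enterprise ("golden") customer segments.
LOCAL_TO_GOLDEN = {
    "retail": "Retail",
    "consumer": "Retail",
    "b2c": "Retail",
    "wholesale": "Wholesale",
    "trade": "Wholesale",
}


def to_golden(local_term):
    """Return the golden segment for a local term, or None if unknown."""
    return LOCAL_TO_GOLDEN.get(local_term.strip().lower())


def reconcile(records):
    """Split records into those mapped to a golden segment and those left unresolved."""
    mapped, unresolved = [], []
    for record in records:
        golden = to_golden(record["segment"])
        if golden is None:
            unresolved.append(record)  # needs a human decision
        else:
            mapped.append({**record, "segment": golden})
    return mapped, unresolved


# Made-up records from three departments
sample = [
    {"id": 1, "segment": " B2C "},
    {"id": 2, "segment": "Trade"},
    {"id": 3, "segment": "VIP"},
]
mapped, unresolved = reconcile(sample)
```

The unresolved list is where the arguments start: someone has to decide whether "VIP" is a new golden segment or just local terminology.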

Organizations also began creating enterprise data governance programs, typically with a data governance council that sets data policies and data stewards who oversee data and ensure that the policies are implemented. Some corporate cultures suit this approach more than others. Highly centralized companies are used to having things dictated from on high, but decentralized ones often rail against that and struggle to keep within ordained data governance boundaries. Analysts in such companies think of themselves as freedom fighters, whereas IT managers may regard them as data terrorists.

Data management strategy breakdowns

It seems clear that, in a lot of companies, the freedom fighters are now in the ascendant. A sign of this is the growing market for self-service data preparation tools. These products can access data from a wide variety of sources, including traditional databases, big data systems, packaged business applications, Excel and systems outside the corporate firewall. They enable some data quality techniques, such as data profiling, and empower business users and data scientists to set up data transformations and manipulate data to their heart's content. Extracts, transformations and data scrubbing can also be automated via repeatable workflow processes.
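To give a sense of what data profiling means in practice, here is a minimal sketch in plain Python that summarizes a single column: how many values are missing, how many distinct values there are and which one occurs most often. The function name and the sample records are assumptions made up for this example and don't reflect how any particular data preparation product works.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Made-up customer extract with typical inconsistencies
customers = [
    {"id": 1, "segment": "Retail"},
    {"id": 2, "segment": "retail"},
    {"id": 3, "segment": ""},
    {"id": 4, "segment": "Wholesale"},
    {"id": 5, "segment": "Retail"},
]

profile = profile_column(customers, "segment")
```

Even on five rows, the profile exposes a blank value and two spellings of the same segment, which is exactly the kind of inconsistency that governance policies are meant to settle.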

Such a market simply wouldn't exist if corporate data warehouses and MDM systems were doing their job. Data quality checks and transformations are supposed to happen before data is fed into a data warehouse. The trouble is that the data warehouse has been stretched beyond its natural limits. Data now comes from so many sources and in such volumes that traditional data management approaches are breaking down.

E-commerce systems may generate web traffic logs of such size that normal databases can't handle the processing. Sensors on vehicles and machinery -- airplanes, cars, smart meters, elevators, pipelines and so forth -- transmit huge volumes of streaming data. All this is in addition to corporate transaction data, plus data from business partners and data brokers.

With that much data coming at you, who has time for meetings to discuss the merits of different customer classification hierarchies or the attributes of a particular data asset as part of an MDM project?

The benefits of effective data governance

Nonetheless, corporations need to take back control of this fast-flowing stream of data if they are to make sense of it for enterprise business analytics uses. Doing so also helps boost data security and compliance with GDPR, the California Consumer Privacy Act and other new laws that regulate how companies use personal data.

Developing a data governance strategy may not be a sexy topic, but it's at the heart of what needs to be done from data collection and processing to analytics applications. Data lakes can become data swamps if there's no way to peer into their depths and bring some order to the data pooled there. And all the pretty charts that analysts create with BI and analytics tools mean nothing if you can't agree on what the underlying data means -- and whether it's trustworthy.

In the absence of some governance structure, we'll just be back to the old days, with analysts waving their charts at each other and arguing over whose data is correct. Putting the data genie back in the bottle is difficult, but in all too many organizations, things now feel chaotic rather than managed. It's not about imposing rules from the top down but about embedding data management and analytics discipline throughout the layers of the organization. Otherwise, valuable business insights may be overlooked, and potential competitive advantages lost.

A 2018 McKinsey report found that high-performing companies were twice as likely as others to say they had a strong data governance strategy, and more than twice as likely to say they had a clear and well-understood data strategy overall. The survey-based report also reckoned that the gap between the high performers and the pack was growing rapidly. Time is of the essence if you are to fully exploit analytics opportunities and gain business benefits from them.
