Data warehouses facilitate data storage and support the integration, summarization and transformation of data so that it is easier to analyze for business intelligence. Teams can get the most out of their data warehouses by adopting new strategies that take advantage of cloud architectures.
The nature and use of data warehouses have shifted massively with the rise of the cloud and new kinds of data infrastructure for data integration, storage and management. Even as some enterprises mull the use of data lakes for storing everything, data warehouses still offer a lot of value in providing faster access and a more coherent structure for regular analytics. In addition, new tools for streaming analytics, data preparation and master data management can help organizations adopt better data warehouse strategies. Here are six strategies that can help enterprises make the most of the new cloud data warehouse landscape.
1. Identify process bottlenecks
Adam Nathan, CEO and founder of the Bartlett System, a data analytics consultancy, has been involved in implementing BI systems for more than 15 years. Although he has seen incremental progress over that period, he said that recent advances in cloud services may fundamentally alter the way BI pros tap into data warehouses to benefit BI.
"The bottleneck for getting access to data, cleaning it, prepping it and integrating it with different data sources has really been the province of data engineers in overstretched IT teams," he said.
Adding to this challenge is the fact that the owners of the data -- those who truly know the semantic value of the information -- are cut off from the data itself.
"Somebody who barely has the time to help and only sort-of understands my data is the very person who can't get me what I need quickly enough," Nathan said.
2. Empower citizen data engineers
Teams can now conduct more data preparation activities in SQL, including those for big data and semi-structured data. According to Nathan, data engineering has been democratized as SQL skills have become ubiquitous.
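As a rough illustration of SQL-centric preparation of semi-structured data, the sketch below flattens a few JSON records into a table and summarizes them with a plain SQL query. The records, the `timesheet` table and the `hours_by_department` function are hypothetical, and an in-memory SQLite database stands in for a cloud data warehouse, many of which can query JSON directly in SQL without the Python flattening step.

```python
import json
import sqlite3

# Hypothetical semi-structured records; note the third one has no "hours" field.
RAW_EVENTS = [
    '{"employee": {"id": 1, "dept": "HR"}, "hours": 38.5}',
    '{"employee": {"id": 2, "dept": "Finance"}, "hours": 41.0}',
    '{"employee": {"id": 3, "dept": "HR"}}',
]


def hours_by_department(raw_events):
    """Flatten nested JSON into rows, then let SQL do the aggregation."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE timesheet (employee_id INTEGER, dept TEXT, hours REAL)")
    for line in raw_events:
        event = json.loads(line)
        db.execute(
            "INSERT INTO timesheet VALUES (?, ?, ?)",
            # A missing "hours" field becomes NULL rather than an error.
            (event["employee"]["id"], event["employee"]["dept"], event.get("hours")),
        )
    return db.execute(
        "SELECT dept, COALESCE(SUM(hours), 0) FROM timesheet GROUP BY dept ORDER BY dept"
    ).fetchall()
```

Calling `hours_by_department(RAW_EVENTS)` returns one row per department, with missing values ignored by the SQL `SUM`.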
As the tools have become easier to use and require fewer discrete skills, there is less need for centralized experts familiar with multiple languages and technologies. Nathan believes this is leading to the rise of citizen data engineers -- mirroring similar trends in application development, analytics and other domains.
It is also getting easier to share data. For example, a Snowflake user can expose data sets to data consumers -- partners, customers, vendors, fellow knowledge workers -- in discrete, curated sets.
"If I'm an analyst with decent SQL skills in HR, I can curate and share my group's data without needing outside support," Nathan said.
This is important because it removes IT as a bottleneck during data preparation, which gives IT teams more time to focus on staging the raw data.
3. Set up curation management
Next up, teams need to simplify the way the right data gets into the data warehouses in the first place. With IT out of the picture, an organization can start thinking about its data as a collection of individual, curated, mastered and certified data sets coming out of each group in an enterprise.
Someone who wants HR data can go to the data sets HR shares. HR experts, who understand how the data was collected and why, keep this data up to date, manage the stewardship and can provide the appropriate context for users in other departments who want to use this data for different analytics.
"Each department is essentially offering its data value from its own storefront," Nathan said.
Others can request permission to access the data but are not able to alter it. This kind of framework also makes it easier to integrate data from a given department with other data sets available across the enterprise and even with external data sets, which tools like Snowflake are great at sharing.
A curation management strategy shifts data quality oversight from one central department to each individual team, which can provide closer oversight of a smaller piece. As a result, data consumers can trust that the best HR data is coming out of the HR share site.
"It's a smaller burden, which is more manageable," Nathan said.
4. Establish contracts
Distributing curation also creates a new challenge -- data sets need to stay consistent, and any changes to them must be made with extreme care and attention to security.
"This kind of governance is a skill -- and poorly implemented governance on data can become a free-for-all, which is probably the biggest risk," Nathan said.
Data management teams need to work with each department to help craft data contracts establishing service-level agreements for the data they provide. Contracts help level-set everyone's expectations of reliability, cleanliness and timeliness.
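One way to picture such a contract is as a small, machine-checkable record. The sketch below is only an illustration: the `DataContract` fields, the thresholds and the `check_contract` function are hypothetical and do not come from any particular tool. It checks two of the expectations mentioned above, timeliness and cleanliness.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract a department publishes alongside a shared data set."""
    dataset: str
    owner: str
    required_columns: tuple
    max_staleness: timedelta   # timeliness expectation
    max_null_fraction: float   # cleanliness expectation


def check_contract(contract, rows, last_refreshed, now=None):
    """Return a list of violations; an empty list means the contract is met."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # Timeliness: the data set must have been refreshed recently enough.
    if now - last_refreshed > contract.max_staleness:
        violations.append("stale: last refreshed %s" % last_refreshed.isoformat())

    # Cleanliness: each required column must be mostly populated.
    for column in contract.required_columns:
        missing = sum(1 for row in rows if row.get(column) is None)
        fraction = missing / len(rows) if rows else 1.0
        if fraction > contract.max_null_fraction:
            violations.append("%s: %.0f%% missing" % (column, fraction * 100))

    return violations
```

A check like this could run each time a department refreshes its share, so that consumers learn about a breach before they build a report on the data.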
"This will probably pain IT because of the governance issues and a reduced role, but it's a good problem to work through," Nathan said.
5. Consider different perspectives
"Data warehouses have the dubious reputation of being large, unwieldy data stores that are difficult to navigate, hence making them unsuitable for real-time analytics and decision-making," Avneet Dugal, vice president of Global Insights and Data at Capgemini, said.
One challenge she sees is that teams try to move all data possible into the data warehouse. Moving large volumes of data to another platform and rebuilding empirically trusted data is a costly exercise. Instead, organizations can make the data easier to view and use by organizing it based on business focus -- supply chain, finance or marketing.
Dugal also finds it helpful to build "delta" updates as part of core processing capabilities. This makes it easier to surface data changes to various analytics use cases and reduces the need to churn through all the data to include the last day's updates.
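A common way to implement this kind of delta update is a "watermark" that records the latest change already loaded. The sketch below is a minimal illustration under stated assumptions: the `orders` table, its `updated_at` column (ISO-formatted text, so string comparison matches time order) and the function names are hypothetical, and two in-memory SQLite databases stand in for the source system and the warehouse.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the hypothetical orders table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    return db


def load_delta(source, warehouse, last_watermark):
    """Copy only rows changed since last_watermark; return the new watermark."""
    changed = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()

    # INSERT OR REPLACE keys on the primary key, so an updated row overwrites
    # its earlier version instead of creating a duplicate.
    warehouse.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, updated_at) VALUES (?, ?, ?)",
        changed,
    )
    warehouse.commit()

    # If nothing changed, keep the old watermark.
    return changed[-1][2] if changed else last_watermark
```

Each run passes in the watermark returned by the previous run, so only the rows touched since then are read and written.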
6. Streamline data workflows
It's also important to consider some of the gaps between the strategic and tactical levels of management, according to Alex Bekker, head of the data analytics department at ScienceSoft, an international IT consulting and software development company.
One aspect of this is to set up a well-crafted data governance framework to ensure the data warehouse ingests high-quality data that is processed and stored securely and is only accessed according to user roles.
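The role-based part of such a framework can be sketched as a deny-by-default lookup. Everything named here (the roles, the data set names and both functions) is hypothetical; in practice, a data warehouse enforces this through its own grant system rather than application code.

```python
# Hypothetical role-to-permission map, for illustration only.
ROLE_GRANTS = {
    "hr_analyst": {"hr.headcount": {"read", "write"}},
    "finance_analyst": {
        "hr.headcount": {"read"},            # may read HR's share, not alter it
        "finance.ledger": {"read", "write"},
    },
}


def is_allowed(role, dataset, action):
    """Deny by default: unknown roles, data sets or actions get no access."""
    return action in ROLE_GRANTS.get(role, {}).get(dataset, frozenset())


def read_dataset(role, dataset, store):
    """Return the data set only if the role holds a read grant on it."""
    if not is_allowed(role, dataset, "read"):
        raise PermissionError("%s may not read %s" % (role, dataset))
    return store[dataset]
```

Here a finance analyst can read the HR headcount data but cannot write to it, which mirrors the idea of departments sharing data that others can use but not change.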
It is also helpful to choose data warehouse software with vast integration capabilities, such as prebuilt data source connectors and open APIs, to ensure data warehouse scalability. This helps add new data sources to address changing business needs.
Another aspect is to automate data warehouse maintenance and administration activities around integration, quality, security and backups. This reduces data warehouse operational costs and ensures high performance and availability.
Veronica Zhai, principal analytics technical product manager at Fivetran, also recommends centralizing key business logic in one place. For example, key business logic such as "what is net revenue?" should be defined once, in code, in a version-controlled place where all analysts and business users can reuse that piece of code. This also saves time and ensures consistency in reporting.
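As a minimal sketch of that idea, the module below defines net revenue in exactly one function that every report imports. The formula used here (gross sales minus discounts and refunds) is an assumption for illustration, since each business defines the metric differently, and the function and field names are hypothetical.

```python
from decimal import Decimal


def net_revenue(gross, discounts=Decimal("0"), refunds=Decimal("0")):
    """The single shared definition (assumed formula): gross - discounts - refunds."""
    return gross - discounts - refunds


def revenue_by_region(orders):
    """Example report that reuses net_revenue instead of redefining it."""
    totals = {}
    for order in orders:
        region = order["region"]
        totals[region] = totals.get(region, Decimal("0")) + net_revenue(
            order["gross"], order["discounts"], order["refunds"]
        )
    return totals
```

If the definition changes, for example to exclude taxes, the change is made once in `net_revenue`, reviewed through version control, and every report that calls it picks up the new logic.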