Data forecast for 2022: Data quality and cloud convergence
Expect data quality to be a top area of investment and activity next year as the need to trust data for operations, insight and machine learning will only continue to grow.
After nearly two years of dealing with pandemic-related challenges such as remote work and resource constraints, 2022 will be a pivotal year for organizations to figure out how to continue to optimize operations with data.
Data is the basis on which organizations make decisions with business intelligence and data analytics. It drives operations and is the foundation upon which AI and machine learning literally learn what to do.
Yet despite the central role that data plays in the success and day-to-day operations of many organizations, it hasn't always been given the importance it deserves -- but that could change in 2022.
Data quality becomes central to data management in 2022
Not all data is equal. There can be problems with data lineage, format, timeliness and accuracy that affect the usefulness of data. It's a topic that goes by different names including data health, data hygiene and data quality.
"The big focus, the No. 1 data-centric area that will be getting the most significant investment over the next 12 to 18 months is data quality," said Mike Leone, analyst at Enterprise Strategy Group.
Data quality involves bringing together all the attributes of data and making sure the data can be trusted and useful to power insights and business outcomes.
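To make those attributes concrete, here is a minimal Python sketch of what automated checks for completeness, format and timeliness might look like. It is not drawn from any vendor or analyst quoted here. The field names (`customer_id`, `email`, `updated_at`), the 30-day freshness window and the simplified email pattern are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
import re

# Hypothetical rule: a plausible-looking email address (illustrative, not RFC-complete).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_record(record, now, max_age=timedelta(days=30)):
    """Return a list of data quality issues found in one record."""
    issues = []
    # Completeness: every expected field is present and non-empty.
    for field in ("customer_id", "email", "updated_at"):
        if not record.get(field):
            issues.append(f"missing {field}")
    # Format: the email field matches the expected pattern.
    email = record.get("email")
    if email and not EMAIL_RE.match(email):
        issues.append("bad email format")
    # Timeliness: the record was updated within the allowed window (assumed 30 days).
    updated_at = record.get("updated_at")
    if updated_at and now - updated_at > max_age:
        issues.append("stale record")
    return issues


def quality_score(records, now):
    """Share of records with no detected issues, from 0.0 to 1.0."""
    if not records:
        return 0.0
    clean = sum(1 for r in records if not check_record(r, now))
    return clean / len(records)


# Made-up sample data: one clean record and three with a single issue each.
now = datetime(2022, 1, 1, tzinfo=timezone.utc)
records = [
    {"customer_id": "c1", "email": "a@example.com", "updated_at": now - timedelta(days=2)},
    {"customer_id": "c2", "email": "not-an-email", "updated_at": now - timedelta(days=2)},
    {"customer_id": "c3", "email": "b@example.com", "updated_at": now - timedelta(days=90)},
    {"customer_id": "", "email": "c@example.com", "updated_at": now - timedelta(days=1)},
]
print(quality_score(records, now))
```

A single score like this is only a starting point. Accuracy and lineage, the other attributes that affect usefulness, generally require comparison against a trusted reference source or metadata that simple per-record rules cannot supply.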
Lack of trust in data due to potential data quality issues is a primary concern for Christal Bemont, CEO of data integration vendor Talend.
Bemont said it's important for organizations to treat data as an asset in order to successfully enable the business. Talend conducted a survey in May 2021 that found 60% of IT executives don't always trust the data they use.
"Treating data as an asset that can be measured, trusted and acted on will provide healthy data for businesses to make critical decisions that drive business outcomes," Bemont said.
Data fragmentation will continue to be a challenge
Meanwhile, Jitesh Ghai, chief product officer at enterprise cloud data management vendor Informatica, predicted that data fragmentation will be the biggest challenge chief data officers face next year as they pursue their digital transformation efforts.
Findings from the second annual Informatica Global CDO Survey, released on Dec. 9, revealed that 79% of organizations are using more than 100 data sources, with 30% using more than 1,000 sources. A driver for data fragmentation is that organizations are using hybrid and multi-cloud infrastructure -- a trend that will continue in 2022.
"Acceleration to the cloud will continue in 2022, and hybrid cloud will become the norm as companies are no longer asking 'why move to the cloud?' but 'how fast can we move?'" Ghai said. "It is critical that data leaders invest in the right technologies that enable them to manage data efficiently in a hybrid and multi-cloud environment."
Rise of table formats for data lakes
Among the nascent trends that emerged in 2021 and are likely to turn into a bigger movement in 2022 is the idea of bringing database table formats to cloud data lakes.
"Data lakes are rising to prominence, and structured data is transitioning to new formats," said Haoyuan Li, founder and CEO of data orchestration vendor Alluxio. "In 2022, open source projects like Apache Iceberg or Apache Hudi will replace more traditional Hive warehouses in cloud-native environments, enabling Presto and Spark workloads running more efficiently on a large scale."
Technology convergence -- data lakehouses and hydroanalytic data platforms
Table format technology for data lakes is helping to enable the further convergence of data warehouses with data lakes.
Matt Aslett, analyst at Ventana Research, said that in 2022 he expects to see the continued convergence of data warehouse, data lake and data streaming technologies to create analytic data platforms enabling organizations to collect and analyze all types of operations-generated information.
"This is driving the evolution of what we are calling hydroanalytic data platforms, which apply structured data management and processing functionality previously found in data warehouses to data stored in low-cost cloud data lakes," Aslett said.
The concept of the data lakehouse, which was first created by Databricks, is one such form of hydroanalytic data platform.
Overall, while 2022 is likely to bring continued convergence of data technologies in the cloud, convergence alone is not the sole answer for all the challenges of data. Organizations will also need to define what data quality means to them, wherever data exists, as the number of data sources proliferates.
Enterprise Strategy Group is a division of TechTarget.