Getting the full benefit of a cloud data lake requires both moving data into it and ensuring the quality of that data.
For global payment and money transfer services vendor Western Union, data quality is a core element of its cloud data lake efforts.
Over the last two years, Western Union has embarked on an effort to consolidate its data warehouses in the cloud.
With locations and customers around the world, Western Union has amassed a large volume of data that it uses to improve its own business. Helping to lead Western Union's data efforts is the company's chief data officer, Thomas Mazzaferro.
"We've done a lot of consolidation throughout the last 18 months," Mazzaferro said, noting that the company has migrated more than 20 petabytes and now has more than 90% of its data in the cloud.
Western Union's path to data quality
Mazzaferro explained that the company consolidated multiple data warehouses onto a single data lake. Western Union is using AWS as the data lake, with Snowflake to enable its cloud data architecture.
To move data, Mazzaferro and his team use Talend and its suite of data tools for data ingestion; extract, transform and load (ETL); and data quality.
Mazzaferro noted that because Western Union operates internationally, it needs to be able to understand data quality wherever the data resides.
Mazzaferro said that with Talend, Western Union can bring data quality metrics to a centralized location and visualize the results.
"Talend enables data to be visible because if you don't have data in the right place, you can't visualize it properly," Mazzaferro said. "Talend really helps to streamline and optimize our processes and our capabilities to support our customers."
Defining data quality
For Mazzaferro, the first component of an effective data quality strategy is the ability to actually measure data. With that ability comes a need for metrics to understand the data being used by the organization's processes and applications.
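As a rough illustration of what measuring data can look like, the Python sketch below computes two common data quality metrics, completeness and validity, over a handful of records. It is not Western Union's or Talend's implementation; the record layout, the field names and the rule that a valid amount is a positive number are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class QualityMetrics:
    total: int
    completeness: float  # share of records with every required field filled in
    validity: float      # share of records whose amount is a positive number


def measure_quality(records, required_fields):
    """Compute simple quality metrics for a list of dict records (hypothetical schema)."""
    total = len(records)
    if total == 0:
        # Nothing to measure; treat an empty set as fully clean.
        return QualityMetrics(0, 1.0, 1.0)
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    valid = sum(
        isinstance(r.get("amount"), (int, float)) and r["amount"] > 0 for r in records
    )
    return QualityMetrics(total, complete / total, valid / total)


# Assumed sample data: one record is missing a country, one has a negative amount.
sample = [
    {"id": 1, "country": "US", "amount": 120.0},
    {"id": 2, "country": "", "amount": 75.5},
    {"id": 3, "country": "DE", "amount": -10.0},
    {"id": 4, "country": "IN", "amount": 42.0},
]
metrics = measure_quality(sample, ["id", "country", "amount"])
print(metrics)
```

Metrics like these are the kind of figures that could be collected per source and rolled up to a central dashboard, though the thresholds and rules would depend on each organization's own processes.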
Beyond the ability to measure data, an effective data quality strategy also involves accountability. Mazzaferro emphasized that it's important to align ownership of data with the people who are accountable for a given set of information or processes.
Finally, when problems arise with data quality, he said it's important to put together a plan designed by both business and technology teams to fix the process or improve the overall data flow.
Data quality is not a measurement that Western Union conducts in real time.
Mazzaferro said his team doesn't want to slow down data flows in production or take any action that could decrease real-time performance for users.
That said, he noted that Western Union looks at data quality in what he referred to as "near real time," which could be minutes or hours after data moves. According to Mazzaferro, this enables the company to fix potential data quality problems quickly.
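One way to picture that separation is a write path that only records incoming batches and a deferred pass that checks them afterward. The Python sketch below is a minimal illustration under that assumption, not a description of Western Union's pipeline; the function names, the `amount` field and the 10% null-rate threshold are hypothetical.

```python
from collections import deque

pending = deque()  # batches that have landed but have not been checked yet


def ingest(batch_id, rows, landed_at):
    """Write path: record the batch and return immediately, with no quality checks."""
    pending.append({"id": batch_id, "rows": rows, "landed_at": landed_at})


def run_quality_pass(now, max_null_rate=0.1):
    """Deferred path: check every pending batch and report the ones that fail."""
    issues = []
    while pending:
        batch = pending.popleft()
        rows = batch["rows"]
        nulls = sum(1 for row in rows if row.get("amount") is None)
        null_rate = nulls / len(rows) if rows else 0.0
        if null_rate > max_null_rate:
            issues.append({
                "batch": batch["id"],
                "null_rate": null_rate,
                # How long after landing the problem was detected, in seconds.
                "lag_seconds": now - batch["landed_at"],
            })
    return issues


# Timestamps are plain seconds so the example stays deterministic.
ingest("b1", [{"amount": 10.0}, {"amount": 12.5}], landed_at=0)
ingest("b2", [{"amount": None}, {"amount": 3.0}], landed_at=60)
issues = run_quality_pass(now=300)
print(issues)
```

Because `ingest` does no validation, the cost of checking never lands on the production flow; the trade-off is the detection lag, which in this sketch is reported alongside each issue.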
"Our focus for the new year is to scale, expand, modernize and improve our business policies through data-driven insights and through data quality," he said.