Storing predictive analytics data can be tricky: Avoid these 5 errors

Looking into the future is getting easier with predictive analytics, but only when your storage resources are able to meet the demanding needs of the data those tools generate.

As predictive analytics technology evolves, its accuracy and reliability are steadily increasing. Humans, however, are as error-prone as ever. This can lead to problems when storing predictive analytics data.

There are numerous benefits to using predictive analytics, and the technology is gaining traction across the enterprise. However, its use also means huge amounts of data are being amassed and must be stored. Data storage vendors are increasingly tailoring their products for data generated by predictive analytics tools, but because most storage decisions are still made by people within the organization, this doesn't eliminate the possibility of mistakes.

If you're concerned about the potential for errors, there are steps you can take to avoid them. Below is a rundown of the top five predictive analytics storage traps managers fall into and tips for staying out of them.

1. Failing to create a data infrastructure and strategy that fully utilizes cloud technologies

Organizations that fail to use cloud resources to their full potential are generally unable to unite disparate data sets for analysis in real time. In marketing, for instance, this failure diminishes the capacity to drive customer growth and acquisition.

"Not having a data infrastructure and strategy that leverages cloud technologies is the biggest mistake organizations are making in storing predictive analytics data," said Todd Paris, managing director and Google Marketing Platform Alliance leader for the Deloitte Digital division of Deloitte Consulting. "The value of predictive analytics data is best realized by having connected data sets in the cloud that enable a holistic view of the customer."

Paris said he believes organizations should invest in a data infrastructure and strategy that uses cloud technologies to enable a safe, secure and agile environment to derive actionable insights.

"Data ownership is critical in the current landscape," he said. "With the growing regulations around data and privacy, organizations must have a strategy to harness their own insights to drive growth, personalization and develop predictive analytics to better know and market to their customers."

2. Prematurely implementing data management tools without first establishing a data storage strategy

When organizations with predictive analytics storage needs leap before planning, data is likely to be incorrectly stored in multiple locations across the organization where it can become misaligned, duplicated or decayed.

"This 'bad data' leads to chaos, confusion and mistrust of the analytical process and outcomes, and [it] ultimately negatively impacts the organization's bottom line," said Kim Kaluba, senior manager for data management solutions at analytics software provider SAS, based in Cary, N.C.

A better approach for storing predictive analytics data is developing a strategy that's grounded in the who, what, when, where, why and how of the data being saved.

"Organizations need to justify why the data is being stored," Kaluba said. "Data must have a purpose and a benefit to the analytical process."

Next, address where and how to store the data. A data governance plan will ensure the correct values, accurate definitions and precise time intervals are being applied to the stored data, Kaluba said.

Finally, establish a single open repository for all data rules, definitions, accessibility factors and ownership.

"The repository provides transparency into both the data and analytical process, which leads to trust and moves the organization into the elite status of being a 'data-driven' organization," Kaluba said.
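
The governance steps above can be illustrated with a minimal Python sketch. It assumes a single in-code repository of rules (definition, owner, allowed values) and an expected time interval between records. The field names, owners and the one-hour interval are hypothetical and are not drawn from SAS or any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """One repository entry: what a field means, who owns it, what is valid."""
    definition: str
    owner: str
    allowed: Optional[frozenset] = None  # None means any value is accepted


# Hypothetical repository: the single place where data rules live.
RULES = {
    "region": FieldRule("Sales region code", "marketing-data",
                        frozenset({"NA", "EU", "APAC"})),
    "score": FieldRule("Predicted churn probability", "analytics"),
}
EXPECTED_INTERVAL = timedelta(hours=1)  # assumed interval between records


def validate(records):
    """Return (index, problem) pairs for records that break a rule."""
    problems = []
    previous_ts = None
    for i, rec in enumerate(records):
        # Check every governed field against its repository entry.
        for name, rule in RULES.items():
            if name not in rec:
                problems.append((i, f"missing field: {name}"))
            elif rule.allowed is not None and rec[name] not in rule.allowed:
                problems.append((i, f"invalid {name}: {rec[name]!r}"))
        # Check that records arrive at the agreed time interval.
        ts = rec["timestamp"]
        if previous_ts is not None and ts - previous_ts != EXPECTED_INTERVAL:
            problems.append((i, "unexpected time interval"))
        previous_ts = ts
    return problems


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    sample = [
        {"timestamp": start, "region": "EU", "score": 0.42},
        {"timestamp": start + timedelta(hours=1), "region": "MARS", "score": 0.10},
        {"timestamp": start + timedelta(hours=3), "region": "NA"},
    ]
    for index, problem in validate(sample):
        print(index, problem)
```

Keeping the rules in one structure means the same definitions drive both documentation and validation, which is one way to get the transparency Kaluba describes.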

3. Underestimating performance requirements

Traditional unstructured storage systems were often designed under the assumption that only a small percentage of file data would be active at any given time. Predictive analytics workloads not only increase the amount of data accessed and the performance requirements, but also fuel the creation and storage of even more data, further burdening the system. The same is true of machine learning and deep learning.

Select an architecture designed to deliver high performance and scale, while controlling infrastructure costs, said Scott Sinclair, a senior analyst at Enterprise Strategy Group, a technology research and advisory firm in Milford, Mass.

"Often, but not always, these solutions feature scale-out architectures with the ability to quickly integrate new hardware as it becomes available," he said. "Look for solutions that offer high metadata performance with a rich set of metadata tools and APIs to support your organization's data security and governance requirements, while also supporting your analytics team's need to quickly locate the right data."

4. Assuming predictive analytics data can be stored in exactly the same way as source data

It's important to recognize that the source systems from which data is drawn to power predictive analytics are typically designed for operational business purposes, said Clark Richey, CTO of data analytics company FactGem, based in Columbus, Ohio.

"Now that we are repurposing this data for predictive analytics, we have to consider how we want to analyze the data and use a data storage mechanism that fits with our analytic approach," he said.

Begin by framing the analytics problem. Specifically, what is the data model that most closely matches business requirements and meets the needs of predictive analysis?

"Once you have this model, determine what data storage mechanism fits the model and best facilitates the analytics you need to perform," Richey explained. "Then, go out and harvest the source data and map that into the storage system you have identified."

5. Failing to properly secure stored data

If you're playing fast and loose with analytics data, the hammer is going to come down hard on you, warned Zohar Pinhasi, founder and CEO of MonsterCloud, a managed security services provider based in Hollywood, Fla.

"It's imperative that you know in advance how you'll store this information securely," he said. "Failure to do so could do irreparable damage to your brand, but also could land you on the receiving end of a hefty fine, thanks to the GDPR regulation in the EU."

To protect data generated by predictive analytics, always encrypt it.

"Any predictive analytics data being stored needs to be protected, and, furthermore, you should limit access to only essential employees," Pinhasi said.
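
As a rough illustration of limiting access to essential employees, the following Python sketch gates reads and writes behind an allow-list and records every attempt. The class name, user names and keys are hypothetical. Encryption is deliberately not shown, since it should come from a vetted cryptography library rather than hand-written code.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedStore:
    """Holds analytics results and releases them only to allow-listed users."""
    allowed_users: frozenset
    _data: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _check(self, user, action, key):
        # Record every attempt, granted or not, before deciding.
        granted = user in self.allowed_users
        self.audit_log.append((user, action, key, granted))
        if not granted:
            raise PermissionError(f"{user} may not {action} {key}")

    def put(self, user, key, value):
        self._check(user, "write", key)
        self._data[key] = value

    def get(self, user, key):
        self._check(user, "read", key)
        return self._data[key]


if __name__ == "__main__":
    store = ProtectedStore(allowed_users=frozenset({"alice"}))
    store.put("alice", "churn_scores", [0.1, 0.9])
    print(store.get("alice", "churn_scores"))
    try:
        store.get("mallory", "churn_scores")
    except PermissionError as err:
        print("denied:", err)
```

The audit log is an added design choice in this sketch: keeping a record of denied attempts makes it easier to show who tried to reach the data and when.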
