10 critical multi-cloud and edge computing storage questions

Combining edge-to-cloud computing with data storage is riddled with complexities. Luckily, there are steps to take that can help avoid a storage management disaster.

Managing enterprise storage can be a complicated and resource-intensive process. Even more challenging is implementing a multi-cloud environment, which can increase the complexity tenfold. If edge computing storage is added to the mix, management can turn into an IT nightmare, with data coming in from all directions and spread across multiple platforms and geographic locations.

Before diving into the multi-cloud and edge storage fray, CIOs and other IT decision-makers should ask themselves a series of important questions. Since data management is at the core of storage, data collection, transfer and retention are just some of the many aspects to consider.

1. Where will the data be generated and collected?

In a multi-cloud/edge computing storage environment, data can be generated by users, applications or devices and originate from desktops, laptops, smartphones, IoT monitors or other systems. In some cases, the data is collected close to where it's generated, but often it is sent elsewhere. For example, sales reps might use a mobile app to submit orders to a web application hosted on a cloud service, where the data is also collected and stored. Compare this to a manufacturing plant full of IoT sensors that send their data to a nearby edge system to be temporarily stored and analyzed in real time.

2. What types of data will be generated and how much?

IT cannot plan storage for a multi-cloud/edge environment without knowing the types and amounts of data to expect. Will it be structured, semistructured or unstructured? Will it include video files, graphic files or word processing files? What about the amount of data? Teams must know how much data and what types of data will be coming in the short and long term, no matter where the data is generated or collected. They must also be able to account for any mechanisms that could affect storage, such as jumbo-size files or deduplication processes.
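
As a rough illustration of the sizing question above, the following Python sketch projects stored capacity from an initial footprint, a daily ingest rate and a deduplication ratio. The function name, the parameters and the sample figures are assumptions made for this example, not values from any particular environment, and real growth is rarely this linear.

```python
def projected_capacity_tb(initial_tb, daily_ingest_gb, days, dedup_ratio=1.0):
    """Estimate physical capacity (TB) needed after `days` of steady ingest.

    dedup_ratio is logical-to-physical: 2.0 means deduplication halves
    the stored footprint. All figures are hypothetical planning inputs.
    """
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    # Decimal units: 1 TB = 1,000 GB.
    logical_tb = initial_tb + (daily_ingest_gb * days) / 1000.0
    return logical_tb / dedup_ratio


# Short-term (90 days) versus long-term (3 years) view of the same workload.
short_term = projected_capacity_tb(10, 50, 90, dedup_ratio=2.0)
long_term = projected_capacity_tb(10, 50, 3 * 365, dedup_ratio=2.0)
print(f"90 days: {short_term} TB, 3 years: {long_term} TB")
```

Running the same estimate per data type (video, documents, sensor readings) with different deduplication ratios would give a more realistic picture than a single blended number.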

3. What data will be retained and for how long?

In many cases, data is collected with the expectation that it will need to be retained indefinitely or at least for a long time. In some situations, however, only part of the data must be retained, or the data must be kept for only a short time. For example, raw IoT data collected on an edge system might be needed only long enough to perform the necessary analytics. Afterward, the raw data can be discarded and only the results of the analysis retained.
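
The raw-versus-results example can be expressed as a simple retention rule. The sketch below is only one way to model it: the `Record` class, the `"raw"` and `"result"` labels and the seven-day window are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    kind: str          # hypothetical labels: "raw" or "result"
    created: datetime


# Assumed retention windows; None means "keep indefinitely".
RETENTION = {"raw": timedelta(days=7), "result": None}


def is_expired(record, now):
    """Return True if the record has outlived its retention window."""
    # Unknown kinds fall back to None, so they are kept by default.
    window = RETENTION.get(record.kind)
    if window is None:
        return False
    return now - record.created > window


def prune(records, now):
    """Keep only the records that are still inside their retention window."""
    return [r for r in records if not is_expired(r, now)]


now = datetime(2024, 1, 31)
records = [
    Record("raw", datetime(2024, 1, 1)),      # older than 7 days: dropped
    Record("raw", datetime(2024, 1, 30)),     # recent: kept
    Record("result", datetime(2024, 1, 1)),   # results are kept indefinitely
]
kept = prune(records, now)
print(len(kept))
```

Treating unknown data kinds as "keep" is a deliberately conservative default here; a real policy would likely be driven by compliance requirements instead.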

4. What data will be transferred and how much?

Not all collected data will necessarily need to be transferred to another platform. In some cases, only a subset of data must be moved or only data that has been aggregated, cleansed or transformed in some other way. It's even possible that no data will need to be moved. Much will depend on where data will be processed and analyzed after it is collected. IT must be able to plan storage for both the collected data and transferred data, which requires a full understanding of the types and amounts of data to be moved, as well as what data will be retained or deleted.

5. When will the data be transferred and how often?

IT must also know when data will be transferred and how those transfers will be carried out. This gets into the basic scheduling of moving data from one platform to another. For example, data might need to be periodically copied from a data center to a cloud platform or moved from one cloud platform to another. Storage requirements can vary depending on the transfer schedule and the amounts of data involved. IT must also take into account whether data will move between platforms in one direction or both, because the direction of flow can affect storage requirements.
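
One way to sanity-check a transfer schedule is to estimate how long a given volume takes to move over the available link and compare that with the transfer window. The sketch below is a back-of-the-envelope calculation; the function names, the 80% link efficiency and the sample figures are assumptions, and real throughput depends on latency, protocol overhead and provider limits.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimate hours needed to move `data_gb` over a `link_mbps` link.

    efficiency is the assumed fraction of nominal bandwidth actually
    achieved (a hypothetical planning figure, not a measured value).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000          # decimal units: 1 GB = 8,000 Mb
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(data_gb, link_mbps, window_hours, efficiency=0.8):
    """Return True if the transfer is expected to finish inside the window."""
    return transfer_hours(data_gb, link_mbps, efficiency) <= window_hours


# Hypothetical nightly copy: 500 GB over a 1 Gbps link in a 2-hour window.
hours = transfer_hours(500, 1000)
print(f"{hours:.2f} hours, fits window: {fits_window(500, 1000, 2)}")
```

If a transfer does not fit its window, the data waits at the source, so the source platform needs enough capacity to hold the backlog until the next run.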

6. Where will the transferred data be stored and for how long?

In a multi-cloud/edge computing storage environment, data can move from an edge system to a cloud platform; a cloud platform to a data center; a data center to a cloud platform; or any combination of these three. IT must have a clear picture of the entire data flow, from each endpoint to each destination. Knowing what data will be transferred, how much data there is and when the data will be moved provides only part of that picture. IT must also understand where that data will be hosted and how long it will be there, whether on a cloud platform, in a private data center or somewhere else.

7. How will the data be managed and stored?

Data management includes both moving and storing data. Planners should identify what tools will be used to transfer data and the extent to which processes will be automated. They should also determine how the data movement will be orchestrated and how the data will be stored. Various tools and processes can have both a direct and indirect effect on storage. For example, depending on the types of data transformations that are performed, staging areas might be needed to temporarily host data, perhaps requiring a substantial amount of storage. In addition, solutions such as database management systems and NoSQL data stores each have their own storage requirements.

8. How will disaster recovery be implemented and practiced?

When planning storage for a multi-cloud environment, IT must consider how disaster recovery (DR) strategies will be implemented. Such processes as backing up data or maintaining redundant data usually translate into additional storage requirements. If teams must also incorporate edge systems into their environment, their DR strategies could become even more complicated, especially as the number of edge systems grows. What happens, for example, if one of those systems goes down? How does failover occur? Where are workloads redirected? Does each edge system need to have its own DR plan in place? All these variables can have an effect on how IT plans the storage necessary to support an effective DR strategy.
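
To make the additional storage requirements of backups and redundancy concrete, the sketch below adds up the extra capacity a DR scheme might need on top of primary data. The model (full replicas, retained full backups and daily incrementals sized by a change rate) and every number in it are simplifying assumptions for illustration only.

```python
def dr_storage_tb(primary_tb, replicas, full_backups, daily_change_rate,
                  incrementals):
    """Estimate extra capacity (TB) consumed by a simple DR scheme.

    replicas          -- number of full redundant copies kept elsewhere
    full_backups      -- number of full backups retained
    daily_change_rate -- fraction of primary data changed per day (0 to 1)
    incrementals      -- number of daily incremental backups retained
    """
    if not 0 <= daily_change_rate <= 1:
        raise ValueError("daily_change_rate must be between 0 and 1")
    replica_tb = primary_tb * replicas
    full_tb = primary_tb * full_backups
    incremental_tb = primary_tb * daily_change_rate * incrementals
    return replica_tb + full_tb + incremental_tb


# Hypothetical site: 20 TB primary, 1 replica, 2 full backups,
# 5% daily change, 14 retained incrementals.
extra = dr_storage_tb(20, 1, 2, 0.05, 14)
print(f"Additional DR capacity: {extra:.1f} TB")
```

Repeating the estimate for each edge system shows how quickly DR overhead grows as the number of edge locations increases.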

9. How will structural and workflow changes be incorporated?

Organizations implement multi-cloud strategies because they increase flexibility, while offering greater portability among heterogeneous environments. When edge computing storage systems are added to the mix, this flexibility and portability can be threatened unless IT is prepared to handle changes such as cloud services being added and removed, workloads being migrated between platforms or new edge systems being integrated into the structure. To properly plan storage for the long term, IT needs to understand how edge integration will be managed and how changing conditions will be addressed across the entire multi-cloud/edge environment.

10. What are the security and compliance requirements?

One of the challenges that comes with a multi-cloud environment is ensuring that data is secure and privacy is protected, regardless of where the data resides or how it is moved. At the same time, a multi-cloud storage environment can make it easier to meet compliance requirements because of its inherent flexibility. Edge computing storage, however, promises to complicate matters in almost any situation, especially when IoT devices are involved. IT must take every precaution possible to ensure that the data cannot be compromised and that the organization is meeting all its compliance requirements, no matter where that data resides or how it moves from one platform to the next.
