Most organizations have sprawling, hard-to-manage data footprints. The management challenges stem from disconnected data sources, inconsistent or nonexistent labeling and metadata guidelines, and local control over data access policies. These problems worsen once organizations move data to multiple cloud services, at which point multi-cloud data management becomes a requirement.
In its recent survey on data management trends, CompTIA found a significant increase in cloud usage over the last five years. Companies reported that the share of their data residing on cloud services grew from 35% in 2015 to 52% in 2020.
Managing data in the cloud becomes more difficult as organizations spread applications and data across multiple providers and cloud regions. A recent Gartner survey found that 80% of respondents use more than one cloud service provider.
Multi-cloud data management challenges
Multi-cloud environments create many challenges for effectively managing an organization's information assets, including:
- data integration across disparate platforms;
- application access to repositories and databases in different cloud environments, including the need to run cloud-based data analytics workloads that can pull data from sources both on premises and on multiple cloud services; and
- comprehensive data governance, including the consistent application of data access policies, user security controls, metadata and data quality standards across clouds.
A TDWI survey of business and technical managers responsible for enterprise data environments validated these issues. When respondents were asked to rank the biggest barriers to cloud data management, the most frequently cited factors were privacy and security, particularly for personally identifiable information; governance, regulatory compliance and usage tracking or monitoring; and migration, integration and adaptation to new cloud services.
These challenges can also serve as criteria when evaluating the features and capabilities of cloud data management products and services.
Core multi-cloud data management requirements and features
Like most emerging IT product categories, multi-cloud data management is too immature and dynamic to have an agreed-upon definition or feature set. Many vendors offer evolutionary extensions of traditional data management products for backup, archiving and indexing, rather than products designed from scratch for multi-cloud data management.
Undeterred by the categorical ambiguity, we offer a list of foundational multi-cloud data management features. Look for products and services with the following five capabilities:
- Have a software-defined architecture that separates the centralized management control plane from the data footprint on multiple cloud platforms and services.
- Support standard access protocols for networked storage, databases, cloud services and disk volumes.
- Support consistent data access, usage, security, performance and tiering policies across clouds.
- Provide consolidated monitoring with customized reports and visualizations on content, resource usage, user access and regulatory compliance.
- Include a task automation engine that supports multiple programmatic interfaces.
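To make the first capability concrete, here is a minimal Python sketch of a control plane kept separate from the data planes it manages. It assumes a hypothetical `DataPlaneAdapter` interface that each cloud or on-premises system implements; `RetentionPolicy`, `InMemoryAdapter` and `ControlPlane` are illustrative names, not any vendor's API, and the in-memory adapter stands in for code that would call a real provider's API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class RetentionPolicy:
    name: str
    min_days: int  # minimum retention period in days (illustrative policy type)


class DataPlaneAdapter(Protocol):
    """What the control plane needs from each cloud or on-premises system (hypothetical)."""
    name: str

    def apply_retention(self, policy: RetentionPolicy) -> str: ...


@dataclass
class InMemoryAdapter:
    # Stand-in for a real provider adapter; a real one would translate the
    # policy into that provider's native settings through its own API.
    name: str
    applied: list = field(default_factory=list)

    def apply_retention(self, policy: RetentionPolicy) -> str:
        self.applied.append(policy)
        return f"{self.name}: retain >= {policy.min_days}d ({policy.name})"


class ControlPlane:
    """Holds policy definitions centrally and pushes the same one to every data plane."""

    def __init__(self) -> None:
        self._adapters: list[DataPlaneAdapter] = []

    def register(self, adapter: DataPlaneAdapter) -> None:
        self._adapters.append(adapter)

    def enforce(self, policy: RetentionPolicy) -> list[str]:
        # One definition, applied identically to every registered environment.
        return [adapter.apply_retention(policy) for adapter in self._adapters]
```

Because policies live only in the control plane, adding another cloud means writing one more adapter rather than redefining every policy; that is the practical payoff of the split.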
More specifically, you'll want a product that facilitates and automates data portability between cloud environments. It should support task automation via a domain-specific language, command-line interface or API and facilitate integration with external process toolchains, such as continuous integration/continuous delivery and infrastructure as code.
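Portability automation usually starts with a plan: which objects must move so that a target matches a source. The sketch below assumes each environment can export an inventory mapping object keys to content digests; `plan_sync`, the JSON inventory files and the command-line flags are hypothetical. Keeping the planning step pure and idempotent makes it safe to run repeatedly from a CI/CD pipeline.

```python
from __future__ import annotations

import argparse
import json
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_copy: tuple[str, ...]    # keys missing or changed at the target
    to_delete: tuple[str, ...]  # keys present only at the target (when pruning)


def plan_sync(source: dict[str, str], target: dict[str, str],
              prefix: str = "", prune: bool = False) -> SyncPlan:
    """Compare two inventories (object key -> content digest) and return the
    actions needed to make the target match the source under `prefix`.

    Once the plan has been applied, re-running it yields an empty plan.
    """
    src = {k: d for k, d in source.items() if k.startswith(prefix)}
    tgt = {k: d for k, d in target.items() if k.startswith(prefix)}
    to_copy = tuple(sorted(k for k, d in src.items() if tgt.get(k) != d))
    to_delete = tuple(sorted(k for k in tgt if k not in src)) if prune else ()
    return SyncPlan(to_copy, to_delete)


def main(argv: list[str] | None = None) -> int:
    # Hypothetical CLI: each positional argument is a JSON {key: digest} inventory.
    parser = argparse.ArgumentParser(description="Plan a cross-cloud sync.")
    parser.add_argument("source_inventory")
    parser.add_argument("target_inventory")
    parser.add_argument("--prefix", default="")
    parser.add_argument("--prune", action="store_true")
    args = parser.parse_args(argv)
    with open(args.source_inventory) as f:
        source = json.load(f)
    with open(args.target_inventory) as f:
        target = json.load(f)
    plan = plan_sync(source, target, args.prefix, args.prune)
    json.dump({"copy": list(plan.to_copy), "delete": list(plan.to_delete)},
              sys.stdout, indent=2)
    print()
    # A non-zero exit code signals drift, so a pipeline step can fail or trigger a copy job.
    return 1 if plan.to_copy or plan.to_delete else 0


if __name__ == "__main__":
    sys.exit(main())
```

A pipeline step might run `python plan_sync.py source.json target.json --prefix reports/` (file names hypothetical) and hand any non-empty plan to the product's copy or replication job.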
It also should facilitate storage optimization by analyzing parameters such as data use, movement, classifications, metadata and storage efficiency to make placement decisions that improve performance and lower costs.
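A placement decision can be as simple as a rule over usage statistics and classification. This sketch assumes the thresholds, the tier names and the per-GB prices shown, all of which are illustrative placeholders rather than real provider pricing; `DatasetStats`, `recommend_tier` and `monthly_savings` are hypothetical names. A production optimizer would also weigh retrieval, egress and early-deletion fees.

```python
from dataclasses import dataclass

# Illustrative per-GB monthly prices, NOT real provider pricing.
PRICE_PER_GB_MONTH = {"hot": 0.020, "warm": 0.010, "cool": 0.004, "archive": 0.001}


@dataclass(frozen=True)
class DatasetStats:
    name: str
    size_gb: float
    reads_last_30d: int
    days_since_last_access: int
    classification: str  # hypothetical labels, e.g. "regulated", "internal"
    current_tier: str


def recommend_tier(s: DatasetStats) -> str:
    """Rule-based placement; every threshold here is an assumption to tune per workload."""
    if s.reads_last_30d >= 100 or s.days_since_last_access <= 7:
        return "hot"
    if s.days_since_last_access <= 30:
        return "warm"
    # Assumed policy: regulated data must stay quickly retrievable for audits,
    # so it is never pushed colder than "cool".
    if s.days_since_last_access <= 180 or s.classification == "regulated":
        return "cool"
    return "archive"


def monthly_savings(s: DatasetStats) -> float:
    """Storage-cost difference if the dataset moves to the recommended tier.
    Ignores retrieval, egress and early-deletion fees."""
    new_tier = recommend_tier(s)
    return s.size_gb * (PRICE_PER_GB_MONTH[s.current_tier] - PRICE_PER_GB_MONTH[new_tier])
```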
You'll want your data management tool to provide centralized monitoring and visibility of the data footprint, its use and its uniqueness. The monitoring tool should enable multiple analytical views of the storage footprint, reporting on system usage by storage tier, cloud environment, cloud storage service and local storage system. It ought to include a graphical interface with dashboards and custom visualizations of metrics such as resource use and trends, data categories and metadata taxonomy. In terms of uniqueness, it should be able to distinguish between redundant copies and sole originals.
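Uniqueness and footprint reports can both be derived from a single inventory. The sketch below assumes an inventory scanner has already recorded each object's location, tier, size and SHA-256 digest; `InventoryRecord` and the helper functions are hypothetical. Objects are treated as copies when their digests match, which is a content-based heuristic rather than any provider's feature.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRecord:
    cloud: str       # e.g. "aws", "azure", "on-prem"
    tier: str        # storage tier or class
    path: str
    size_bytes: int
    sha256: str      # content digest, assumed to come from an inventory scan


def footprint_by(records: list[InventoryRecord], attr: str) -> dict[str, int]:
    """Total bytes grouped by one attribute, e.g. "cloud" or "tier"."""
    totals = defaultdict(int)
    for r in records:
        totals[getattr(r, attr)] += r.size_bytes
    return dict(totals)


def split_by_uniqueness(records: list[InventoryRecord]):
    """Sole originals have a digest seen exactly once; the rest are identical copies."""
    counts = Counter(r.sha256 for r in records)
    sole = [r for r in records if counts[r.sha256] == 1]
    redundant = [r for r in records if counts[r.sha256] > 1]
    return sole, redundant


def reclaimable_bytes(records: list[InventoryRecord]) -> int:
    """Bytes freed if only one copy of each distinct content were kept."""
    sizes_by_digest = defaultdict(list)
    for r in records:
        sizes_by_digest[r.sha256].append(r.size_bytes)
    return sum(sizes[0] * (len(sizes) - 1) for sizes in sizes_by_digest.values())
```

Replicas you keep deliberately for disaster recovery will also show up as redundant, so in practice they would be tagged or excluded before any cleanup.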
It should have a central management system and control plane for defining and enforcing consistent security policies across multiple environments. It should support enforcement of data sovereignty regulations, such as the ability to restrict data subsets to specific cloud regions. It should also provide policy templates that comply with other data protection and management regulations.
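Data sovereignty rules can be expressed as data that one control plane evaluates identically for every cloud. This sketch assumes a hypothetical `RESIDENCY_POLICY` mapping each classification to the (cloud, region) pairs where it may reside; the classification labels are invented and the region names serve only as labels. Classifications absent from the policy are treated as unrestricted, an assumption a stricter deployment might invert to default-deny.

```python
from dataclasses import dataclass

# Hypothetical residency policy: classification -> allowed (cloud, region) pairs.
RESIDENCY_POLICY: dict[str, frozenset[tuple[str, str]]] = {
    "eu-personal-data": frozenset({("aws", "eu-central-1"), ("azure", "westeurope")}),
    "us-health-records": frozenset({("aws", "us-east-1")}),
}


@dataclass(frozen=True)
class Placement:
    dataset: str
    classification: str
    cloud: str
    region: str


def is_allowed(p: Placement, policy=RESIDENCY_POLICY) -> bool:
    """Classifications missing from the policy are treated as unrestricted (assumption)."""
    allowed = policy.get(p.classification)
    return allowed is None or (p.cloud, p.region) in allowed


def violations(placements: list[Placement], policy=RESIDENCY_POLICY) -> list[Placement]:
    """Every placement that breaks the residency policy, in input order."""
    return [p for p in placements if not is_allowed(p, policy)]
```

The same check can gate a replication request before it runs, not just audit existing data, which is where a central control plane adds value over per-cloud settings.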
The product or service you choose should facilitate high availability and redundancy for the control plane and selected data subsets or categories. It should have modules for disaster recovery, backup and archiving of data, and it should support localized replicas in selected cloud regions for low-latency access while guaranteeing data consistency between replicas. You'll also want to ensure it offers multiple storage classes or tiers across multiple cloud services, such as high availability, high performance, warm, cool and deep archive.
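Consistency between replicas can at least be audited by comparing content digests across regions. The sketch assumes each region reports an inventory of object keys and digests; `replica_problems` is a hypothetical helper. A check like this verifies consistency at a point in time but does not guarantee it, and replication still in flight may show up as transient drift.

```python
def replica_problems(replicas: dict[str, dict[str, str]],
                     required_regions: set[str]) -> dict[str, list[str]]:
    """replicas maps region -> {object key: content digest}.

    Reports, per object key, missing replicas in required regions and replicas
    whose digests disagree. An empty result means every required region holds
    an identical copy of every object seen in any region.
    """
    problems: dict[str, list[str]] = {}
    all_keys = set().union(*(objs.keys() for objs in replicas.values()))
    for key in sorted(all_keys):
        issues = []
        for region in sorted(required_regions):
            if key not in replicas.get(region, {}):
                issues.append(f"missing in {region}")
        # Distinct digests across all regions holding this key.
        digests = {objs[key] for objs in replicas.values() if key in objs}
        if len(digests) > 1:
            issues.append(f"{len(digests)} divergent versions")
        if issues:
            problems[key] = issues
    return problems
```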
Keep it simple
IT managers must carefully consider their organization's needs and goals before evaluating multi-cloud data management products and services. Although our feature list is extensive, many organizations won't need every feature and can make compromises.
In short, keep it simple: Don't make the solution more complicated than the problem. When buying a multi-cloud data management product or service, that translates into the following advice:
- Don't buy features you don't need. For example, organizations that want a multi-cloud system for unstructured data might be better off with a cloud-capable NAS system that can replicate data between edge filer appliances and cloud file services. Similarly, if backup and archiving software has become central to your data management process, look for products that support multiple clouds and provide centralized management of data storage, use and security policies.
- Make sure the product's functionality can grow as requirements change. Some products specialize in providing visibility of data storage, use and classifications across multiple environments. However, even if that's your primary short-term concern, consider future needs, such as whether the system can be expanded to provide storage optimization, replication and policy enforcement.
- Make data management part of a broader strategy to limit the risk of cloud lock-in. Look for products that minimize the friction in moving data between on-premises and cloud infrastructure, and prefer those with the broadest support for cloud platforms. Even if AWS is currently your only cloud provider, consider how a data management product for multiple clouds can facilitate expansion to other environments that are better suited to a particular workload or use case.
- Evaluate data management holistically. Although cloud backup and DR are currently the most common use cases, look for products that also address application workloads, with support for primary storage volumes, database replication and NAS filers. Visibility and central manageability become critical capabilities once workloads spread from on premises to one or more cloud environments.