Copy data management is the management of copies of information held outside primary systems. Copy data can be used for data protection purposes, such as snapshots or backups, or to seed test and development environments for ongoing application development. Other uses include testing upgrades of existing applications and using the data for mining or analytics.
As we examine products from the leading copy data management vendors in the marketplace, we can see several distinct categories and implementation models. Some vendors sell physical appliances, including storage. Others sell virtual appliances, either alongside the physical appliance or as the main delivery mechanism. Some products also support public cloud, running instances of the software as a cloud virtual machine (VM).
Most of these products are from startups; however, the breadth of copy data management systems is widening as traditional vendors add more data management features to their platforms. Here, we dig deeper into the products from the six leading copy data management vendors.
Actifio Inc. uses a scale-out object store at the heart of its product. Actifio's Virtual Data Pipeline feature ingests data from both virtual and physical platforms. The Actifio Sky platform supports traditional applications such as Oracle, Microsoft SQL Server, Microsoft Exchange and SAP, with additional support for virtual environments, including VMware vSphere.
Sky ingests data at a block level, while still retaining application consistency. After an initial full copy of data is taken, subsequent ingests of incremental data are used to enable the creation of synthetic application images. These are made available through product features that enable instant application mounting, cloning or restoration.
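The idea of building a synthetic image from one full copy plus later incrementals can be illustrated with a minimal sketch. It is not Actifio's implementation; the block map, the `synthesize_image` function and the sample data are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical model: an image is a map of block number -> block contents.
Blocks = Dict[int, bytes]


def synthesize_image(full: Blocks, incrementals: List[Blocks], upto: int) -> Blocks:
    """Build a point-in-time image from the full copy plus the first `upto` incrementals."""
    image = dict(full)  # start from the initial full copy, leaving it untouched
    for delta in incrementals[:upto]:
        image.update(delta)  # changed blocks overwrite older versions
    return image


full_copy = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
incrementals = [
    {1: b"bbbb"},               # first incremental: block 1 changed
    {2: b"cccc", 3: b"DDDD"},   # second incremental: block 2 changed, block 3 added
]

latest = synthesize_image(full_copy, incrementals, upto=2)
earlier = synthesize_image(full_copy, incrementals, upto=1)
```

Because each incremental holds only changed blocks, any earlier point in time can be reassembled without taking another full copy.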
Actifio Sky is delivered in two versions: Actifio Sky Basic and Actifio Sky Advanced. The Basic version provides features to take VM backups, replicate them to the public cloud, perform instant recoveries, and run as either a VM or physical appliance. Basic only covers data protection for databases running within VMware vSphere and Microsoft Hyper-V hypervisor environments, and only offers basic cloud instance protection. The Basic version also has no database awareness.
The Advanced version extends the capabilities of Basic with full support for databases and cloud database integration. Sky Advanced also provides physical backup, NAS backup and database cloning capabilities. It includes Sky DB, Actifio's scalable database cloning and recovery product. It supports the consistent capture of data from traditional databases, providing virtual copies for developers to use through self-service automation tools, including Jenkins, Chef, Puppet, GitHub, Ansible and SaltStack.
Actifio claims near-instant recovery times with 100 TB-plus databases. Sky DB is available as a virtual appliance in Amazon Machine Image format in Amazon Web Services (AWS), as an Open Virtual Appliance (OVA) in VMware or in Virtual Hard Disk (VHD) format in Hyper-V, and it is also supported on Oracle Cloud and IBM Bluemix.
The Actifio Sky platform supports multiple public cloud environments, including AWS, Microsoft Azure, Oracle Cloud, IBM Bluemix and Google Cloud. In AWS, Actifio offers two products -- Actifio One for VM instance migration into AWS and Actifio Sky for AWS.
Catalogic Software offers a software-only copy data management system for managing copy orchestration and automation processes.
ECX, Catalogic's flagship product, provides in-place data management, which means it takes snapshots, replicas and clones of supported storage arrays, which currently include products from Pure Storage Inc., IBM, NetApp Inc. and Dell EMC (Unity). ECX takes advantage of existing storage and application APIs to ensure data is captured with application consistency, as opposed to crash copy consistency, which would be achieved with the storage platform alone.
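The difference between a crash-consistent and an application-consistent copy can be shown with a toy model. This is only a sketch and does not use any ECX or storage array API; `ToyDatabase`, `quiesced` and `array_snapshot` are hypothetical names.

```python
from contextlib import contextmanager


class ToyDatabase:
    """Hypothetical application that buffers writes in memory before flushing to storage."""

    def __init__(self):
        self.storage = {}  # what the storage array can see
        self.pending = {}  # in-memory writes not yet on storage

    def write(self, key, value):
        self.pending[key] = value

    def flush(self):
        self.storage.update(self.pending)
        self.pending.clear()


@contextmanager
def quiesced(db):
    """Flush buffered writes so storage reflects a consistent application state."""
    db.flush()
    yield db
    # A real integration would resume normal application I/O here.


def array_snapshot(db):
    """Stand-in for an array snapshot: it copies only what is already on storage."""
    return dict(db.storage)


db = ToyDatabase()
db.write("order-1", "paid")

crash_consistent = array_snapshot(db)    # taken without involving the application
with quiesced(db):
    app_consistent = array_snapshot(db)  # taken after the application flushed its writes
```

In this model the array-only snapshot misses the buffered write, while the snapshot coordinated with the application captures it.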
ECX offers three main features.
Automated copy delivery. The production and use of copies is highly automated and can be used to quickly generate, for example, data for nonproduction purposes, such as test and development.
User self-service. ECX provides access to a user portal for end users to manage their own data copies within constraints established by the IT department.
Application awareness. ECX integrates with common applications to provide application-consistent copies. These applications include Oracle Recovery Manager, including support for Real Application Clusters and Automatic Storage Management; Microsoft SQL Server; and SAP HANA. This support is irrespective of the hardware platform, and it is managed in conjunction with the storage array.
Version 2.6 of ECX extends application support to SQL Server, SAP HANA, Epic electronic health records systems and InterSystems Cache, all running on physical hosts with shared storage. This makes the platform unique in supporting specific domain applications -- in this case, healthcare.
ECX can be deployed as a virtual appliance or within a Docker container.
Catalogic recently announced a partnership with Pure Storage that will use ECX code running within a Pure Storage array to provide tight integration for copy management services between the two platforms.
ECX is priced per controller and is available through channel organizations. A 30-day trial is available through the Catalogic website.
Using extensive research into the copy data management market, TechTarget editors focused on vendors that offer well-integrated and automated copy management systems. Our research included data from TechTarget surveys, as well as reports from other respected research firms, including Gartner.
Cohesity Inc. was founded in June 2013 by Mohit Aron, one of the original founders of Nutanix Inc. The company has taken in $160 million in funding, including $90 million in April 2017.
The underlying Cohesity technology is a scale-out file system that enables a high number of snapshot images and the ability to retrieve data from any image with no impact on performance. Cohesity's software, known as DataPlatform, is available in three forms -- as dedicated hardware appliance nodes and in two software-only versions.
Three hardware node models are currently available, with a minimum of four per physical server block. Storage capacity scales from 6 TB of HDD and 800 GB of flash per node (C2100 model) to 24 TB of HDD and 1.6 TB of flash per node (C2500). CPU performance starts at dual six-core Intel Xeon E5-2603 processors, scaling to eight E5-2630 eight-core processors. System memory scales from 64 GB to 256 GB.
Cohesity prices its hardware on a per node basis, with all software licenses included.
DataPlatform Virtual Edition enables customers to include Cohesity running on vSphere as part of a remote office/branch office (ROBO)-edge data protection environment. This includes the ability to replicate data into a core data center for additional off-site protection.
Cloud Edition is a cluster-deployed version of the Cohesity software that runs in the public cloud and provides the same protection and data management benefits as the on-premises product. Data can be replicated between the core and the cloud for off-site protection and to enable features such as data migration.
Unlike products from the other copy data management vendors discussed here, DataPlatform supports a scale-out file system. This means customers can migrate existing file servers or filers to DataPlatform and benefit from a consolidated architecture.
DataPlatform 4.0, released in April 2017, introduced object storage capabilities using the Amazon Simple Storage Service API. It also has the ability to take NAS backups and protect existing file system data with write once, read many protection. Coverage of physical servers and storage has been extended using Pure Storage's FlashArray//M.
Physical and virtual appliances are priced on a per node basis. The company's DataProtect software is licensed by capacity.
Commvault Systems Inc. started out as a backup software provider, but has steadily increased the features of what it now calls its Data Platform. As with other products, Data Platform can provide recovery points to applications using full and incremental backups to deliver synthetic restores.
From a VM perspective, Commvault Data Platform supports on-premises server virtualization and public cloud providers, including AWS and Microsoft Azure. These features enable customers to use the public cloud for application recovery and to run applications in the cloud as part of the migration process.
Looking at data protection, Commvault provides snapshot management for a wide range of storage vendors, enabling in-place management of data copies. Automation features enable repeated tasks to be turned into workflows, with many common tasks already preconfigured.
Moving to content management, Commvault Data Platform provides features for data retention and compliance, enabling enterprise search and e-discovery and email archiving. Commvault manages traditional databases, including Oracle, Microsoft SQL Server, IBM DB2 and MySQL, with support available for open source databases, including PostgreSQL and MongoDB.
Licensing is feature-based and dependent on the chosen components. These are grouped into three main categories: Solution Set, which provides entry-level backup and recovery features; Platform Licensing, which addresses a wider data management set of functions; and Capacity Licensing, which provides access to all the features based on capacity consumption.
The current release of the platform is simply called Commvault 11, as the company has rebranded its previous Simpana brand.
Time-limited trials and demos of Commvault are available through the company's website.
Delphix Corp.'s platform differs from the systems offered by the other copy data management vendors surveyed here in that it only supports the management of database images.
The Delphix engine is deployed as a software VM, with the 5.0 release supported on any x86 platform running either VMware vSphere or Kernel-based Virtual Machine (KVM). The Virtualization Engine currently supports native ingestion of data from Oracle, SQL Server, DB2, SAP, SAP ASE, PostgreSQL and MySQL databases.
The Virtualization Engine stores data on the Delphix file system, an implementation that maps data at a block level. This provides the capability to significantly reduce the amount of physical data stored in the system. Data is both deduplicated and compressed, resulting in a target 3:1 saving. Once an initial full copy has been stored, further application updates are stored incrementally, enabling synthetic, full copies of a database to be restored to any previous point in time.
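How block-level deduplication and compression reduce physical storage can be sketched with a content-addressed block store. This is an illustration under assumed names (`BlockStore`, `recipe`), using SHA-256 and zlib from the Python standard library; it does not describe how the Delphix file system is actually built.

```python
import hashlib
import zlib


class BlockStore:
    """Hypothetical store that keeps each unique block once, in compressed form."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> compressed block

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.chunks:  # deduplication: skip blocks already stored
            self.chunks[digest] = zlib.compress(block)
        return digest

    def get(self, digest: str) -> bytes:
        return zlib.decompress(self.chunks[digest])

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


store = BlockStore()
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # the third block repeats the first
recipe = [store.put(block) for block in blocks]   # ordered digests describe the copy
logical = sum(len(block) for block in blocks)
restored = [store.get(digest) for digest in recipe]
```

The savings in this toy example depend entirely on the sample data and say nothing about the 3:1 target quoted above.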
Data is made available for restore or for application developers through a self-service interface. The Virtualization Engine implements role-based access controls that regulate what end users can and cannot access.
The Virtualization Engine also supports data replication to a second Virtualization Engine for backup and disaster recovery. This could be used as an alternative to vendor-specific replication tools and to provide a consistent interface for heterogeneous database environments.
Delphix provides the capability to obfuscate personal or other confidential data through the Delphix Data Masking Engine. This additional piece of software can be run as a VM or in the public cloud. Two masking scenarios are provided: Persistent masking ensures all the copies of application data are always masked, while flex masking enables both masked and unmasked copies to be made available to teams with the appropriate permissions.
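The two masking scenarios can be contrasted in a short sketch. The function names, the `mode` values and the hash-based token are assumptions for illustration only and do not reflect the Delphix Data Masking Engine's interface or algorithms.

```python
import hashlib


def mask_value(value: str) -> str:
    """Replace a value with a deterministic token (illustrative, not production masking)."""
    return "masked-" + hashlib.sha256(value.encode()).hexdigest()[:8]


def mask_record(record: dict, sensitive: set) -> dict:
    return {k: mask_value(v) if k in sensitive else v for k, v in record.items()}


def provision_copy(record: dict, sensitive: set, mode: str, can_view_unmasked: bool = False) -> dict:
    """Return the copy a team receives under the given masking mode."""
    if mode == "persistent":
        # Every copy is masked, regardless of who asks.
        return mask_record(record, sensitive)
    if mode == "flex":
        # Unmasked copies go only to teams with the appropriate permission.
        return dict(record) if can_view_unmasked else mask_record(record, sensitive)
    raise ValueError(f"unknown masking mode: {mode}")


customer = {"name": "Ada Lovelace", "city": "London"}
fields = {"name"}

persistent_copy = provision_copy(customer, fields, "persistent", can_view_unmasked=True)
flex_trusted = provision_copy(customer, fields, "flex", can_view_unmasked=True)
flex_default = provision_copy(customer, fields, "flex")
```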
A free trial of the Delphix software is available online through the Delphix community forum.
Founded in 2014, and with nearly $300 million in investments, Rubrik Inc. has quickly risen in the market to be one of the major providers of copy management systems.
The company initially sold a scale-out storage appliance that offered data protection facilities for virtual server environments. With the release of version 4.0 of the Rubrik platform -- code-named Alta -- in June 2017, the company has further extended its capabilities outside of the data center with public cloud and edge offerings.
Rubrik appliances offer five configurations of scale-out nodes, with the ability to mix and match node configurations as required. The entry model R334 appliance has three Intel eight-core Haswell processors, with 192 GB of dynamic RAM (DRAM) and 37.2 TB of hybrid storage (36 TB HDD, 1.2 TB SSD). At the high end, the R3410 model offers four eight-core Haswell processors, 256 GB of DRAM and 121.6 TB of storage capacity (120 TB HDD, 1.6 TB SSD).
Rubrik software is also available for Cisco Unified Computing System, with configurations supported on UCS C220 and UCS C240 systems. Each node type supports two levels of storage -- 96 or 120 TB for C220, or 384 or 480 TB for C240 -- with a minimum of four nodes required.
The latest Rubrik features expand the software to run on the public cloud and as a virtual appliance for remote and branch offices. The ROBO appliance enables users to do local restores, while feeding backups into the core data center, either as a way to provide recoverability for a single site or to implement off-site backups.
Cloud support enables users to restore VMs as instances into the public cloud; for example, to build out test and development environments.
Support for nonvirtualized environments has been extended with the introduction of functionality to take backup data from Oracle databases, in addition to the previously supported SQL Server. Alta also introduces hypervisor support for Microsoft Hyper-V and Nutanix Acropolis Hypervisor.
Rubrik is licensed per physical appliance or based on capacity for virtual appliance instances. Cloud deployments are licensed per cloud node, with a minimum of four nodes and an incremental cost per additional node.