
WANdisco launches automated Hadoop data migrator for AWS

WANdisco LiveData Migrator moves Hadoop data while avoiding downtime. Although niche, it clears the hurdle of migrating Hadoop data while that data is undergoing active changes.

WANdisco wants to let organizations continue to work at full productivity even while a migration is happening in the background.

This week, WANdisco launched LiveData Migrator, a tool that moves Hadoop data in a non-disruptive manner. The cloud-based service automates the entire migration project, moving HDFS data from any source to any Hadoop-compatible file system.

WANdisco claims it's simple to use, without the need for engineers or consultants. The customer just needs to install a small, on-premises VM for LiveData Migrator to read the environment. The tool also ensures that any changes at the data source are replicated to the target environment throughout the migration, removing downtime for the duration of the project.

LiveData Migrator's non-disruptive migration capability is powered by WANdisco's patented consensus algorithm. End users can continue to work on data that is in the middle of migration, and the data is immediately available at the target environment as soon as it lands, before the entire data set arrives. WANdisco's technology ensures all changes midflight are captured and make it to the target.
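The pattern described above, bulk-copying data while journaling changes that land mid-flight and replaying them at the target, can be sketched in a few lines. This is a hypothetical illustration only: the `LiveMigrator` class, its journal, and the dict-backed "file systems" are invented for the example, not WANdisco's actual patented consensus algorithm.

```python
# Hypothetical sketch of migrating data while capturing live changes.
# Not WANdisco's implementation -- just the general journal-and-replay idea.

class LiveMigrator:
    def __init__(self, source, target):
        self.source = source   # dict standing in for the source file system
        self.target = target   # dict standing in for the target file system
        self.journal = []      # changes captured for replay at the target

    def record_change(self, path, contents):
        # Writes at the source keep working during migration; each write
        # is applied locally and appended to the journal.
        self.source[path] = contents
        self.journal.append((path, contents))

    def migrate(self):
        # Phase 1: bulk-copy the data present when migration starts.
        for path, contents in list(self.source.items()):
            self.target[path] = contents
        # Phase 2: replay journaled changes so the target converges
        # without ever pausing the source.
        while self.journal:
            path, contents = self.journal.pop(0)
            self.target[path] = contents

source = {"/data/a": "v1", "/data/b": "v1"}
target = {}
m = LiveMigrator(source, target)
m.record_change("/data/a", "v2")   # a change lands before the copy finishes
m.migrate()
print(target)                      # target reflects the mid-flight change
```

A real system would also have to order concurrent writes consistently across nodes, which is where a consensus algorithm comes in; the sketch sidesteps that by being single-threaded.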

Hadoop is an open source distributed processing framework for managing data for big data applications using scalable clusters. It supports workloads such as predictive analytics, data mining and machine learning. Given the framework's nature, enterprises that use it tend to accumulate massive amounts of data in Hadoop clusters.

For organizations that have all that Hadoop data sitting on-premises, such as GoDaddy, migrating it to the cloud can be challenging, said WANdisco CEO David Richards. GoDaddy, an internet domain registrar and web hosting company, as well as a WANdisco customer, was stuck on-premises because its Hadoop data went through millions of changes per second. It couldn't afford to take anything offline to perform a migration.

"Moving the data wasn't the challenge; it's capturing the new and changing data. They can't pause for a month," Richards said.

Richards said GoDaddy was an example of the type of customer LiveData Migrator is designed for: those that need to move massive amounts of data from on-premises to cloud without downtime. He described LiveData Migrator as removing a roadblock to digital transformation for these customers. Although the most common use case will be Hadoop-to-cloud, which includes AWS, Google Cloud, Microsoft Azure and Alibaba, the tool supports Hadoop-to-Hadoop and cloud-to-cloud migrations as well.

LiveData Migrator is sold on a freemium model, where the first 5 TB of data moved is free. WANdisco doesn't publicly post what the cost is after that, but Richards said it can be as low as 13 cents per GB. As for migration speed, Richards said it depends on factors such as network connection and changes per second at the source cluster.

WANdisco LiveData Migrator is not unique in providing non-disruptive data migration. Komprise and StrongBox StrongLink have the capability built into their intelligent data management platforms, along with other features such as cost prediction and archiving.


LiveData Migrator's uniqueness instead lies in how it packages Hadoop data migration as a fully hands-off service, said Merv Adrian, research vice president of data and analytics at Gartner. For most organizations, moving Hadoop data to the cloud is going to be a one-time move, so they're not going to want to pay for a subscription tool to enable it. Moving data under active change is delicate, and those same organizations don't want to use their best IT people on what is ultimately a one-and-done project. This is LiveData Migrator's strength, Adrian said. It handles everything in the background and doesn't require expertise from the customer.

"It's as close to a silver bullet as you can find for this type of project," Adrian said.

Despite the one-and-done nature of the Hadoop data migration use case, Adrian said WANdisco LiveData Migrator will likely see plenty of use over the next few years. CFOs see the depreciating value of their hardware and realize they can save money by moving to the cloud. According to his research at Gartner, database platform-as-a-service adoption jumped from 7% to about 33% over the last four years. Although this growth is slowing, there's still plenty of wave to ride as companies embrace the cloud further, he said.

Even in the face of COVID-19, which has put some IT initiatives on hold for many organizations, Adrian doesn't expect data migration to slow down. In times of economic uncertainty, companies want to shift from capex to opex. They want predictable costs and immediate financial results. At the same time, they want their best, smartest people working on money-making projects rather than simply keeping the lights on, Adrian said.

"We're going to see more migration, not less. In economic times like these, historically, projects with immediate payback move to the front of the queue, and migration is one of those," Adrian said.

Next Steps

Next Pathway paves the way for Hadoop migration

WANdisco introduces Hive metadata migration to Databricks
