Microsoft boosts Azure big data strategy with Hadoop distro

Microsoft Azure's HDInsight has a new, company-backed distribution of Hadoop, which should reassure customers interested in hybrid big data deployments.

Microsoft has made two significant moves to advance its Azure big data strategy, including the release of a new, company-backed distribution of the open source Hadoop framework.

Hadoop, along with the Kafka messaging framework and Spark analytics engine, is central to HDInsight, the Azure big data service Microsoft released in 2013.

The move is good for customers because Microsoft will be able to offer native integrations with more Azure services, such as the Cosmos DB distributed database and its Synapse analytics platform, the company said in a blog post. In addition, Microsoft can deliver better support and faster upgrades with its own distribution, according to the blog.

No pricing changes will occur as part of the transition, and Microsoft's new distribution of Hadoop and Spark is 100% open source and compatible with the main line's latest version, the company said.

Customers who use HDInsight in conjunction with distributions of Hortonworks software should find comfort in Microsoft's Azure big data news, said Doug Henschen, an analyst at Constellation Research. HDInsight was originally based on the Hortonworks Data Platform.

In late 2018, Hortonworks merged with rival Hadoop vendor Cloudera. Today, the merged company is focused more on the Cloudera Data Platform, a cloud-based service available on Azure and other public clouds.

Microsoft, like rivals Google and AWS, offers an array of big data services.

With the new distribution, "Microsoft is essentially offering belt-and-suspenders assurance that there will be an on-premises alternative for Azure customers interested in a hybrid complement to the HDInsight service," Henschen said. "It's also investing in the future of HDInsight and offering tighter integration, whether in the cloud or on premises."

HDInsight competes with the likes of Google Cloud Dataproc and AWS Elastic MapReduce. The market for Hadoop-based services in general has been in flux as newer alternatives for wrangling big data have emerged. But there remains a large on-premises Hadoop installed base, which drives continued market need for hybrid deployment options.

Azure database picture deepens

In other Azure big data news, Microsoft said Azure Database for PostgreSQL Hyperscale is now generally available. The service is based on technology from Citus Data, a startup that Microsoft acquired in January 2019.

Citus' technology builds on top of the PostgreSQL open source database to make it a distributed store. The company's idea wasn't novel, given the earlier advances of vendors such as Aster Data Systems and Greenplum, but PostgreSQL is popular enough that it's available natively today on most major cloud platforms.

One exception to that trend is Oracle Cloud Infrastructure (OCI), which is unsurprising, since a common use case for PostgreSQL is Oracle database compatibility and workload migration.

The availability of a high-scale, Oracle-compatible, native Azure database service such as PostgreSQL Hyperscale may add a new dynamic to Oracle and Microsoft's partnership on interoperability between Azure and OCI. The companies have pushed the idea that joint customers want to tap into Azure services while also tying back to Oracle Autonomous Database instances running inside OCI.

"Microsoft is clearly offering this service as a scalable, open source alternative to popular databases like Oracle," Henschen said.
