Inside the MapR Hadoop distribution for managing big data

The MapR Hadoop distribution replaces HDFS with its proprietary file system, MapR-FS, which is designed to provide more efficient management of data, reliability and ease of use.

The MapR Converged Data Platform supports big data storage and processing through the Apache collection of Hadoop products, as well as its added-value components. These components from MapR Technologies provide several enterprise-grade proprietary tools to better manage and ensure the resiliency and reliability of data in the Hadoop cluster.

These platform components include MapR File System (MapR-FS); MapReduce; and MapR Control System, the product's user interface. The MapR Hadoop distribution includes a complete implementation of the Hadoop APIs, enabling the product to be fully compatible with the Hadoop ecosystem.

MapR-FS is the company's proprietary replacement for the Hadoop Distributed File System, and it is written in C++ rather than Java, the language of Apache HDFS. Unlike HDFS, which follows the write-once-read-many paradigm, MapR-FS is a fully read/write file system that complies with POSIX (Portable Operating System Interface).

Because MapR supports the industry-standard NFS protocol, users can mount a MapR cluster and run any file-based application directly on the data residing in the cluster. Data from nearly any source can therefore be processed, and standard tools can access data in the cluster directly, without modification.
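As a rough illustration of what read/write POSIX access over an NFS mount allows, the sketch below uses only Python's built-in file calls to create a file, overwrite part of it in place and then append to it. The function name, file name and record format are hypothetical, and the example runs against a temporary local directory; pointing it at a cluster would mean passing the path where that cluster is mounted, which is an assumption not covered here.

```python
import os
import tempfile


def append_and_patch(root: str) -> str:
    """Write a file, then modify it in place using ordinary POSIX file I/O."""
    # "events.log" is a hypothetical file name used only for this sketch.
    path = os.path.join(root, "events.log")

    # Create the file with plain open()/write(); no Hadoop client library is involved.
    with open(path, "w") as f:
        f.write("status=PENDING\n")

    # Overwrite bytes in the middle of the existing file (a random write).
    # A write-once-read-many file system does not permit this step.
    with open(path, "r+") as f:
        f.seek(len("status="))
        f.write("DONE   ")  # padded to the same width as "PENDING"

    # Append a new record to the same file.
    with open(path, "a") as f:
        f.write("status=NEW\n")

    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # A temporary local directory stands in for the NFS mount point.
    with tempfile.TemporaryDirectory() as root:
        print(append_and_patch(root))
```

The same function would work unchanged on any directory that behaves like a POSIX file system, which is the point of exposing the cluster through NFS: existing file-based tools need no Hadoop-specific code.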

Additionally, unlike other Hadoop distributions, MapR can process distributed files, database tables and event streams all in the same cluster of nodes. This lets organizations run operational tools such as Apache HBase and analytic tools such as Hive or Impala on one cluster, reducing hardware and operational costs.

The latest version of MapR, 5.1, also includes MapR Streams, an event streaming system for big data. It is designed to support highly scalable, real-time streaming of big data from producers to consumers on the converged platform. MapR claims it's the only big data streaming system to support global event replication at Internet of Things scale and reliability.

Other features of MapR's Converged Data Platform include:

  • MapR Snapshots that offer improved data protection by capturing point-in-time snapshots for both files and tables on demand, as well as at regularly scheduled intervals. 
  • Encryption of data transmitted to, from and within a cluster, as well as strong authorization mechanisms that are designed to improve data security while enabling administrators to have better control over what actions individual users are authorized to perform.
  • Out-of-the-box, easily configurable mirroring capability that supports disaster recovery.

MapR Hadoop distribution editions

MapR offers Converged Community Edition, an unlimited free-to-use version, and Converged Enterprise Edition, a subscription-based version intended for organizations with business continuity requirements. The Enterprise version includes advanced multi-tenancy capabilities, consistent snapshots, high availability and disaster recovery features, as well as 24/7 commercial support and support for other modules and engines.

MapR offers several training options for the distribution, including free online, on-demand training, as well as fee-based instructor-led training and certifications.

The products can be downloaded and installed on a local server using the GUI installer. The Community Edition is free to use and the Enterprise Edition can be downloaded and used for a 30-day trial period.

The distribution provides a sandbox version that's a self-contained virtual machine, which includes tutorials and demo applications, enabling users to get started quickly with Hadoop and Spark.

MapR in the Cloud lets users deploy in cloud environments, including Azure, Google Cloud Platform and Amazon Web Services.

The MapR Hadoop distribution also provides several quick-start solutions, which include prebuilt, templated environments that support use-case scenarios, including self-service data exploration, real-time security log analytics, time series analytics, genome sequencing, data warehouse optimization and analytics, and a recommendation engine.

MapR runs on several Linux distributions, including Red Hat, CentOS, SUSE and Ubuntu. Hardware requirements include a 64-bit CPU and a minimum of 4 GB of memory; production environments require additional memory.
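A quick preflight check against those minimums might look like the sketch below. It is not a MapR tool. The function names are hypothetical, the list of 64-bit machine names is an assumption (the requirement above says only "64-bit CPU") and "4 GB" is read as 4 × 10^9 bytes. The memory lookup relies on `os.sysconf`, so it is meant for Linux hosts.

```python
import os
import platform

# Assumption: "4 GB" is interpreted as 4 * 10**9 bytes.
MIN_MEMORY_BYTES = 4 * 10**9

# Assumption: machine names treated as 64-bit for this sketch.
SIXTY_FOUR_BIT_MACHINES = {"x86_64", "amd64"}


def meets_minimum(machine: str, memory_bytes: int) -> bool:
    """Return True if the CPU name is 64-bit and memory meets the minimum."""
    is_64bit = machine.lower() in SIXTY_FOUR_BIT_MACHINES
    return is_64bit and memory_bytes >= MIN_MEMORY_BYTES


def local_memory_bytes() -> int:
    """Total physical memory as reported by the kernel (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    ok = meets_minimum(platform.machine(), local_memory_bytes())
    print("meets minimum requirements" if ok else "below minimum requirements")
```

The kernel usually reports somewhat less memory than is physically installed, so a host with exactly 4 GB may fall just under the threshold. Treat the result as a hint rather than a verdict.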

MapR Hadoop licensing and support  

To use MapR products, users are required to agree to the terms of the company's end-user license.  

While all users can access a variety of online resource material, Premium Support adds Web and email support and a custom portal. It also provides training, urgent bug fixes, follow-the-sun support and 24/7 phone support for priority 1 issues.

Premium+ Support adds priority queuing of tickets, single-point-of-contact support and options for on-site or remote dedicated support. Contact MapR for support pricing.
