Cloudera Get Started

Bring yourself up to speed with our introductory content

  • Hadoop Distributed File System options for big data

    Because big data can scale to petabytes of capacity, organizations are looking to manage it in ways that are easier and less expensive than traditional scale-out NAS. Object storage and software-defined storage are frequently mentioned as big data tools. Both can add the intelligence required to analyze data while taking advantage of low-cost disk storage.

    An object storage system handles files differently from a traditional file system. Servers locate objects by unique identifiers, and objects carry far richer metadata than file systems typically attach. Because an object can be retrieved without the storage system knowing its physical location, objects can be geographically dispersed, which makes object storage a good fit for very large data stores or data kept in a cloud.
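The lookup-by-identifier model described above can be sketched in a few lines of Python. This is a hypothetical in-memory store for illustration only, not any particular product's API:

```python
import uuid

class ObjectStore:
    """Minimal in-memory sketch of an object store: data is addressed
    by a unique identifier rather than a directory path, and rich
    metadata travels with each object."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        # The store, not the client, assigns the identifier; callers
        # never need to know where the bytes physically live.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = {"data": data, "metadata": metadata}
        return object_id

    def get(self, object_id):
        return self._objects[object_id]["data"]

    def metadata(self, object_id):
        return self._objects[object_id]["metadata"]

store = ObjectStore()
oid = store.put(b"sensor readings", content_type="text/csv", region="eu-west")
print(store.metadata(oid)["region"])  # prints: eu-west
```

Because clients only ever present an identifier, the store is free to place or move the bytes anywhere, which is what enables the geographic dispersal described above.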

    Software-defined storage has many forms and use cases, but it applies to big data when used to pool and manage data across off-the-shelf commodity hardware. Because the management and analytics happen in software appliances, the hardware can be cheap, deep disk without bells and whistles.

    Perhaps the best-known option is the Apache Hadoop Distributed File System (HDFS), a Java-based file system designed for use in Hadoop clusters. HDFS currently scales to 200 petabytes and can support single Hadoop clusters of 4,000 nodes. It delivers storage performance, large scale and low cost at the same time, a combination most enterprise arrays cannot match.

    In this chapter of "Tools to Tackle Big Data Troubles," we look at some core HDFS features, three commercial HDFS distributions, and other Hadoop storage-related tools and their applications.

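As a rough illustration of how HDFS achieves that scale, it splits each file into fixed-size blocks (128 MB by default) and replicates every block across several cluster nodes (three copies by default). The toy Python sketch below shows the idea; the round-robin placement is a simplification, since real HDFS placement is rack-aware:

```python
def split_into_blocks(file_size, block_size=128 * 1024**2):
    """Split a file of file_size bytes into HDFS-style fixed-size
    blocks; the last block may be smaller than block_size."""
    blocks = []
    offset = 0
    while offset < file_size:
        blocks.append(min(block_size, file_size - offset))
        offset += block_size
    return blocks

def place_replicas(num_blocks, nodes, replication=3):
    """Toy round-robin placement of each block's replicas on distinct
    nodes (a simplification of HDFS's rack-aware placement policy)."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(300 * 1024**2)  # a 300 MB file
print(len(blocks))                         # prints: 3  (128 + 128 + 44 MB)
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Replication is what lets HDFS run on cheap commodity disks: losing any single node leaves two intact copies of every block it held.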

Evaluate Cloudera Vendors & Products

Weigh the pros and cons of technologies, products and projects you are considering.


Manage Cloudera

Learn to apply best practices and optimize your operations.

  • Commercial Hadoop distributors bring HDFS improvements

    These three commercial distributions of Hadoop are alternative options for big data storage that can bypass data protection and performance problems common with HDFS.

  • Big data initiatives get huge boost from new technologies

    Big data initiatives can help companies improve operational efficiency, create new revenue and gain a competitive advantage. But traditional data processing often can't deal with the mountains of structured, semi-structured and unstructured data that needs to be mined for value. That leaves big data initiatives hungry for new tools and technologies to ease and speed data processing and predictive analytics functions.

    In this e-book, get insight on useful tools for big data projects. The first chapter provides real-world examples of organizations using SQL-on-Hadoop engines to simplify the process of querying and analyzing Hadoop data. The second defines Spark -- including its capabilities and limitations -- and offers advice on deploying, managing and using the big data processing engine. And the third chapter focuses on using the open source R analytical programming language and commercial tools such as SAS and IBM SPSS to run analytical applications against Hadoop data sets.


  • Will Oracle Big Data Discovery meet expectations?

    Oracle Big Data Discovery may be what organizations need to address big data challenges, but the all-in-one product is still in its infancy.
