December 01, 2017
Born at Cloudera, the MPP query engine known as Apache Impala has become a top-level Apache open source project. It's one of several tools bringing SQL-style interactivity to big data analytics.
May 03, 2017
Datos IO now offers data protection and recovery for Microsoft SQL in the cloud. The RecoverX 2.0 software also supports Cloudera and Hortonworks platforms.
January 24, 2017
Fuzzy Logix (like fuzzy logic, but with an X, get it?) has announced availability of its analytics suite DB Lytix on the Cloudera Enterprise 5 data platform. So what is this? Essentially it's a ...
June 07, 2016
Privacy campaigners have reacted with alarm to the news that the Home Office is experimenting with Hadoop to bring together its data sets. Is the open source technology fit for government IT purpose?
Cloudera Get Started
Bring yourself up to speed with our introductory content
Because big data can scale to petabytes of capacity, organizations are looking to manage it in ways that are easier and less expensive than traditional scale-out NAS. Object storage and software-defined storage are frequently mentioned as big data tools. Both can add the intelligence required for analyzing data while taking advantage of low-cost disk storage.
An object storage system handles files differently than a traditional file system. Servers use unique identifiers to find objects, which use metadata in a far more detailed way than file systems do. The unique identifiers mean objects can be geographically dispersed because they can be retrieved without the storage system knowing their physical location. That makes objects a good choice for large data stores or data stored in a cloud.
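The identifier-plus-metadata model described above can be sketched in a few lines of Python. This is a toy, in-memory illustration of the concept, not any real object storage product's API; the class and method names are invented for the example.

```python
import uuid

# Toy model of an object store: each object is kept under a globally
# unique identifier together with rich metadata. A client retrieves an
# object by ID alone and never needs to know where the bytes physically live,
# which is what allows real object stores to disperse data geographically.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # object ID -> (metadata, payload)

    def put(self, payload, **metadata):
        object_id = str(uuid.uuid4())  # location-independent unique identifier
        self._objects[object_id] = (metadata, payload)
        return object_id

    def get(self, object_id):
        metadata, payload = self._objects[object_id]
        return metadata, payload

store = ObjectStore()
oid = store.put(b"sensor readings", content_type="text/csv", region="eu-west")
metadata, payload = store.get(oid)  # lookup by ID, not by file path
print(metadata["region"])           # -> eu-west
```

Contrast this with a file system, where the caller must supply a hierarchical path that is tied to a specific volume and directory layout.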
Software-defined storage has many forms and use cases, but it applies to big data when used to pool and manage data across off-the-shelf commodity hardware. Because the management and analytics happen in software appliances, the hardware can be cheap, deep disk without bells and whistles.
Perhaps the best-known option available is the Apache Hadoop Distributed File System (HDFS), a Java-based file system designed for use in Hadoop clusters. HDFS currently scales to 200 petabytes and can support single Hadoop clusters of 4,000 nodes. It combines large capacity, high performance and low cost, a combination most enterprise arrays cannot deliver simultaneously.
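How HDFS reaches that scale cheaply can be sketched conceptually: it splits a file into fixed-size blocks and replicates each block across several commodity data nodes. The snippet below is a simplified model of that idea, not the real HDFS API; the block size, node names and round-robin placement are illustrative assumptions (real HDFS defaults to 128 MB blocks, 3x replication, and rack-aware placement).

```python
BLOCK_SIZE = 8   # bytes here for illustration; real HDFS defaults to 128 MB
REPLICATION = 3  # HDFS's default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    # A file is stored as a sequence of fixed-size blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=REPLICATION):
    # Simple round-robin placement; real HDFS uses rack-aware policies.
    placement = {}
    for i, _block in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"a" * 20
blocks = split_into_blocks(data)
print(len(blocks))  # 20 bytes at 8 bytes per block -> 3 blocks
layout = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
print(layout[0])    # ['node1', 'node2', 'node3']
```

Because any replica can serve a read and lost replicas are re-created from surviving copies, the cluster tolerates the failure of cheap individual disks and nodes.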
In this chapter of "Tools to Tackle Big Data Troubles," we look at some core HDFS features, three HDFS commercial distributions and other Hadoop storage-related tools and their related applications.
Evaluate Cloudera Vendors & Products
Weigh the pros and cons of technologies, products and projects you are considering.
Learn about the features and components of the Microsoft R family of predictive analytics tools, which includes Microsoft R Open, Microsoft R Client and Microsoft R Server.
Be it open source or commercial technology, software designed to ensure proper data governance in Hadoop data lakes is proliferating. But like many big data systems, these tools are still maturing.
The Hadoop ecosystem is both a horn of plenty and a grab bag of data technology. This podcast sorts through some recent news of streaming technologies and Hadoop in the cloud.
Learn to apply best practices and optimize your operations.
These three commercial distributions of Hadoop are alternative options for big data storage that can bypass data protection and performance problems common with HDFS.
Big data initiatives can help companies improve operational efficiency, create new revenue and gain a competitive advantage. But traditional data processing often can't deal with the mountains of structured, semi-structured and unstructured data that needs to be mined for value. That leaves big data initiatives hungry for new tools and technologies to ease and speed data processing and predictive analytics functions.
In this e-book, get insight on useful tools for big data projects. The first chapter provides real-world examples of organizations using SQL-on-Hadoop engines to simplify the process of querying and analyzing Hadoop data. The second defines Spark -- including its capabilities and limitations -- and offers advice on deploying, managing and using the big data processing engine. And the third chapter focuses on using the open source R analytical programming language and commercial tools such as SAS and IBM SPSS to run analytical applications against Hadoop data sets.
Oracle Big Data Discovery may be what organizations need to address big data challenges, but the all-in-one product is still in its infancy.