Hadoop alternatives now offer data center-grade storage
Sponsored by SearchStorage.com
This chapter is included in the Tools to tackle big data problems E-Book.
Data is growing at record rates with no signs of slowing. But what good is having petabytes of data if you can't gain business advantage from it? Accurate data analysis can deliver strong business results, but it requires the right tools and techniques. Effective data analytics requires strategies for storing and managing large volumes of structured and unstructured data, plus a method of analyzing that data to unlock its business value.
Data lakes are strongly associated with Hadoop, using the open source software as a replacement for traditional data warehouses. Hadoop clusters run on commodity hardware and can hold structured, unstructured and semi-structured data. This makes Hadoop a good choice for log files, web clickstreams, sensor data, social media posts and other data generated by big data applications. Until recently, Hadoop alternatives were few and far between.
Still, Hadoop implementations that are not well planned can produce data swamps instead of data lakes. Hadoop was not developed to run on shared storage, so storage vendors must tweak their arrays to support the Hadoop Distributed File System (HDFS), a limitation that has fostered the rise of Hadoop alternatives. In addition, Hadoop lacks the built-in data governance that many data warehouse tools provide, a gap that Hadoop alternatives can bridge.