Tools to tackle big data problems
Storage for big data often consists of scale-out NAS or object storage, and many organizations look to commodity hardware as a cost-effective way of capturing petabytes of information. One of the most challenging big data problems is that big data storage systems must perform well enough to enable real-time analysis. Big data analytics often requires processes and people with specific skill sets, but there are software tools for analytics disciplines such as predictive analytics, data mining, text analytics and statistical analysis.
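As a minimal illustration of one of the analytics disciplines named above, basic statistical analysis of a numeric column can be done with nothing more than Python's standard library. The sample values below are made up for the sketch.

```python
# Minimal sketch of statistical analysis over a numeric sample.
# The response-time values are hypothetical, not from any real dataset.
import statistics

response_times_ms = [120, 135, 128, 150, 3200, 142, 138]

mean = statistics.mean(response_times_ms)
median = statistics.median(response_times_ms)
stdev = statistics.stdev(response_times_ms)

# A large gap between mean and median flags outliers worth investigating.
print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
```

In practice the same summary statistics would be computed at scale by a distributed engine, but the reasoning (compare mean against median to spot skew) is the same.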
Because big data can scale to petabytes of capacity, organizations are looking for ways to manage it all that are easier and less expensive than traditional scale-out NAS. Object storage and software-defined storage are frequently mentioned as tools that can help remedy big data problems. Both can add the intelligence required for analyzing data and take advantage of low-cost disk storage.
Data lakes can help manage those big data problems, but here is what you need to know before making the leap. Data lakes are strongly associated with Hadoop and use the open source software as a replacement for traditional data warehouses. Hadoop clusters are based on commodity hardware and can hold structured, unstructured and semi-structured data. This makes Hadoop a good choice for log files, web clickstreams, sensor data, social media posts and other types of applications that produce big data, but there are drawbacks to keep in mind.
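To make "semi-structured" concrete: a web clickstream line has some recognizable shape but no fixed schema, and part of the analytics work is turning such lines into structured records. The sketch below shows that step in plain Python; the log format is hypothetical.

```python
# Minimal sketch: turning semi-structured clickstream lines into
# structured records -- the kind of data a Hadoop cluster might hold.
# The key=value log format here is hypothetical.
import json

raw_lines = [
    "2016-03-01T10:15:02 user=alice page=/home referrer=google",
    "2016-03-01T10:15:09 user=alice page=/pricing referrer=/home",
    "2016-03-01T10:16:44 user=bob page=/docs referrer=-",
]

def parse_click(line):
    """Split one clickstream line into a structured dict."""
    timestamp, *fields = line.split()
    record = {"timestamp": timestamp}
    for field in fields:
        key, _, value = field.partition("=")
        record[key] = value
    return record

records = [parse_click(line) for line in raw_lines]
print(json.dumps(records[0]))
```

At Hadoop scale the same parse function would typically run inside a MapReduce or Spark job over files in HDFS, but the per-line logic is unchanged.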
CHAPTERS AVAILABLE FOR FREE ACCESS
Data is growing at record rates with no signs of slowing. But what good is having petabytes of data if you can't gain business advantage from it? Accurate analysis of data can produce real business gains, but it requires the right tools and techniques. Effective data analytics requires strategies for storing and managing large volumes of structured and unstructured data and a method of analyzing it to unlock its business value.
Data lakes use Hadoop as a replacement for traditional data warehouses, running on commodity hardware that can hold structured, unstructured and semi-structured data from log files, web clickstreams, sensors and social media. Until recently, Hadoop alternatives were few and far between.
Still, Hadoop implementations that are not well planned can produce data swamps instead of lakes. Hadoop was not developed to run on shared storage, and storage vendors must tweak their arrays to support the Hadoop Distributed File System, fostering the rise of Hadoop alternatives. Also, Hadoop does not have the built-in data governance that many data warehouse tools offer, another gap that Hadoop alternatives aim to bridge.