February 22, 2018
In big data news, we find Google TPUs, or Tensor Processing Units, offered as a cloud service, while LinkedIn is open sourcing a Hadoop test simulator called Dynamometer.
January 03, 2018
Is this the post-Hadoop era? Not in the eyes of Hadoop 3.0 backers, who see the latest update to the big data framework succeeding in machine learning applications and cloud systems.
August 30, 2017
In this Talking Data podcast, TechTarget editors discuss Hadoop's future, IBM's decision to resell the Hortonworks distribution of the open source technology and other big data issues.
January 20, 2017
News roundup: A flawed Adobe extension was secretly installed on 30 million Chrome browsers. Plus, the Mirai author has been identified; Google releases security details; and more.
HDFS Get Started
Bring yourself up to speed with our introductory content
The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. Continue Reading
Because big data can scale to petabytes of capacity, organizations are looking to manage it in ways that are easier and less expensive than traditional scale-out NAS. Object storage and software-defined storage are frequently mentioned as big data tools. Both can add the intelligence required for analyzing data and take advantage of low-cost disk storage.
An object storage system handles files differently than a traditional file system. Servers use unique identifiers to find objects, which use metadata in a far more detailed way than file systems do. The unique identifiers mean objects can be geographically dispersed because they can be retrieved without the storage system knowing their physical location. That makes objects a good choice for large data stores or data stored in a cloud.
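To make the identifier-plus-metadata idea concrete, here is a minimal in-memory sketch. It is illustrative only and does not represent any vendor's product or API: the class names `ToyObjectStore` and `StoredObject`, and the metadata keys `source` and `region`, are hypothetical. It shows a flat namespace in which each object is retrieved by a unique ID rather than a directory path, and in which objects can be looked up by their metadata.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object's payload plus free-form descriptive metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical flat-namespace store: no directories, only IDs."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        # The caller gets back an opaque unique ID, not a location.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id):
        # Retrieval needs only the ID; where the bytes live is hidden.
        return self._objects[object_id]

    def find(self, **criteria):
        # Return the IDs of objects whose metadata matches every criterion.
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ToyObjectStore()
oid = store.put(b"sensor reading", source="meter-17", region="eu")
print(store.get(oid).data)
print(store.find(region="eu") == [oid])
```

A real object store would spread the ID-to-location mapping across many servers; the dictionary here simply stands in for that lookup.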
Software-defined storage has many forms and use cases, but it applies to big data when used to pool and manage data across off-the-shelf commodity hardware. Because the management and analytics happen in software appliances, the hardware can be cheap, deep disk without bells and whistles.
Perhaps the best-known option available is the Apache Hadoop Distributed File System (HDFS), a Java-based file system designed to be used in Hadoop clusters. HDFS currently scales to 200 petabytes and can support single Hadoop clusters of 4,000 nodes. It delivers storage performance at large scale and low cost, a combination of three goals that most enterprise arrays cannot meet simultaneously.
In this chapter of "Tools to Tackle Big Data Troubles," we look at some core HDFS features, three HDFS commercial distributions and other Hadoop storage-related tools and their related applications. Continue Reading
Capturing and capitalizing on vast amounts of data about customers and products can help a business adapt and even thrive. But implementing big data or IoT means creating or adjusting IT resources to handle the burden. With these emerging IT workload types, storage takes on a critically important role. But can a single storage system do the job? A business will need to determine the types of data its IoT and big data projects will collect. Gathering many tiny data files that arrive simultaneously, for instance, will not require the same type of storage that collecting fewer, larger files will. Object storage for big data and IoT may be the answer in certain situations. Other conditions might call for a network file system, Fibre Channel or other types of resources. Making the right decisions when it comes to storage will be an important factor in determining whether an IoT or big data initiative succeeds or fails. Continue Reading
Evaluate HDFS Vendors & Products
Weigh the pros and cons of technologies, products and projects you are considering.
As part of its Big Data Cloud Service, Oracle provides a set of internal and external tools designed to help users efficiently deploy and manage Hadoop-based big data systems. Continue Reading
Has the Hadoop elephant left the room? At NBC, ad analytics have evolved in Hadoop style, but with Spark and S3 at the core, as discussed at the Big Data Innovation Summit in Boston. Continue Reading
KNIME offers open source data analytics, reporting and integration tools, as well as commercial software that can help build more efficient workflows. Continue Reading
Learn to apply best practices and optimize your operations.
As the full force of the internet of things comes to bear for organizations, it's raising questions about how to handle the resulting volumes of big data. Expert Andy Hayler has some advice. Continue Reading
With Hadoop reaching several 10-year milestones, proponents laud the big data framework for making organizations more data-driven. And there's some merit in what they're saying. Continue Reading
Increasingly, IT administrators are integrating data center-grade storage systems with Hadoop -- ones that come with the required data protection, security and governance built in. Continue Reading
Problem Solve HDFS Issues
We’ve gathered up expert advice and tips from professionals like you so that the answers you need are always available.
Using HDFS technology as a data analysis platform may be insufficient for your storage needs. Explore the challenges storage administrators may encounter and find out how to address them. Continue Reading
Data is an upside and a downside to the Internet of Things. Many companies are eager to make IoT products or add IoT capabilities to their devices, and some don't go beyond that. But taking IoT from cool toy to useful tool means doing something with all the data IoT applications produce.
In the cover story of this issue of Business Information, executive editor Craig Stedman shares stories from companies that are implementing IoT applications and capturing the data they create. Businesses that have made the decision to invest in the IoT describe the changes they made to their organizational structure and technology infrastructure to be ready for the onslaught of data from connected devices. For example, one company using IoT-enabled equipment, Rockwell Automation Inc., now uses two databases to store all the incoming information.
Manufacturing companies such as Rockwell had a bit of a jump on the IoT. In another feature, executive editor David Essex writes about how sensors laid the foundation for IoT applications. But that doesn't mean adopting full-blown IoT is easy for manufacturers. "It can be hard to get wireless connectivity into manufacturing facilities that are laden with concrete walls and heavy iron pipes and machinery," writes Essex. One thing is certain: IoT capabilities are going to be an investment for any company, and it's one that more and more are willing to make.
Also in this issue, Essex talks with Phil Crannage, core systems director at British Gas, about a project he's leading to move the U.K. energy provider to smart meters by 2020. Our look at an emerging technology or term -- What's the Buzz? -- tackles the hype and reality of data storytelling. And Stedman returns with a column on the diminishment of MapReduce. Continue Reading
In the business intelligence and analytics world, data lakes are their own region -- one in which today's multifarious forms of information can be stored in their native forms until used -- and cheaply at that. But these vast storage repositories, which are based on open source Apache Hadoop, are not for those seeking rest and recreation. They take serious work -- and often sought-after skills -- to build and maintain.
In this guide, SearchDataManagement peers across several types of data lakes to discover how different organizations today are implementing them. First, editor Craig Stedman talks to three companies that have taken the dive -- and learns about the challenges and benefits presented to each. Next, reporter Jack Vaughan tells one executive's story -- how a Hadoop-based system opened new doors for his company. Finally, Vaughan quizzes Forrester analyst Mike Gualtieri on whether data lakes can compete with data warehouses to quench organizations' thirst for storing and analyzing business data. Continue Reading