6 essential big data best practices for businesses

These best practices can help businesses put their big data strategy on the right track to meet analytics needs and produce the expected business benefits.

Over the past decade, big data management and analytics tools have been transformational technologies for companies of all sizes, in various industries. For example, retailers now have insight into their entire supply chain in fine detail. Manufacturers can monitor and manage the performance of thousands of components and machines in their factories. Marketers can analyze every customer touchpoint, from website visits to phone calls and purchases.

Yet I still hear a lot of confusion about how to get the best out of big data architectures. I'm going to describe six big data best practices you should bear in mind -- if you like, six discussion topics you can bring to the table when the broader topic of investing in big data technologies arises in your organization. These are not overly technical in nature. Remember that big data is a business asset, not just a technical resource. In fact, let's start there.

1. Focus on business needs, not technology

Technology, especially in the field of big data analytics, is advancing at a rapid pace. Data management and analytics teams can now deal with volumes of data and analytics complexity that just a few years ago were beyond all but the most advanced companies and government agencies. We can get carried away by the technology itself, assuming that if a new capability exists, there must be an advantage to using it.

For example, many businesses tell vendors and consultants that they want to do real-time analytics on data. But if we dig into what this means, we often find two problems that are not technical at all.

First, data is generated and collected at a much finer level of detail than many business users can understand or work with. Second, even if big data systems can deliver actionable analytics as data is collected or changes, the business often cannot make relevant decisions at that speed. As a result, business executives and workers find their actions lagging behind the data analysis, which means that, to a certain extent, you have incurred unnecessary costs.

Such a mismatch between the flow of data and the cadence of business decisions can also leave users feeling stressed and overloaded with information that gets in the way of doing their job well. When dealing with requests for real-time analytics in big data environments, it's worth asking whether "right-time analytics" would better suit the rhythm of the business.
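To make the idea of right-time analytics more concrete, here is a minimal Python sketch, assuming a hypothetical feed of sales events and a business that reviews results once a day. Instead of reacting to every event as it arrives, it rolls the raw events up into one total per store per day, the granularity at which decisions are actually made. The event layout and the `rollup_by_day` function are illustrative assumptions, not part of any particular product.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (ISO 8601 timestamp, store ID, sale amount).
events = [
    ("2021-03-01T09:15:00", "store-1", 120.0),
    ("2021-03-01T17:40:00", "store-1", 80.0),
    ("2021-03-02T11:05:00", "store-1", 200.0),
    ("2021-03-01T10:00:00", "store-2", 50.0),
]

def rollup_by_day(events):
    """Sum raw events into one total per (day, store) to match a daily decision cadence."""
    totals = defaultdict(float)
    for timestamp, store, amount in events:
        day = datetime.fromisoformat(timestamp).date()
        totals[(day, store)] += amount
    return dict(totals)

daily = rollup_by_day(events)
for (day, store), total in sorted(daily.items()):
    print(day, store, total)
```

Changing the bucketing key, for example to the hour or the week, is enough to match a different decision cadence, while the raw events remain available for finer-grained analysis later.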

Figure: Big data best practices chart. Organizations should adopt these best practices as part of their big data initiatives.

2. Collecting lots of data is a good thing, not a problem

Many data scientists and analysts complain of feeling overwhelmed by data and see big data as part of that problem. For sure, you shouldn't swamp even experienced analytics professionals with more data than they can comfortably take in and make sense of.

Nevertheless, not all data has to be analyzed by humans. Machine learning algorithms and enterprise AI tools can take advantage of big data volumes that data science teams couldn't handle on their own.

Also, even if you decide not to do real-time analytics, it can still prove valuable to collect and store all that streaming data for future use. Down the road, data scientists may find patterns in what is then historical data that can be used to detect potential business problems or opportunities. They could then deliver alerts and notifications to help improve business decisions.
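The article doesn't prescribe a specific technique for finding patterns in historical data. As one simple stand-in, the sketch below, assuming a hypothetical series of daily order counts, builds a baseline (mean and standard deviation) from stored history and raises an alert when a new value falls more than three standard deviations from it. The function names and the threshold of 3.0 are illustrative choices; production systems may need more robust models that account for trends and seasonality.

```python
import statistics

def build_baseline(history):
    """Summarize stored historical values as (mean, sample standard deviation)."""
    return statistics.mean(history), statistics.stdev(history)

def check_for_alert(value, baseline, threshold=3.0):
    """Return an alert message if value lies more than `threshold` std devs from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return None  # no variation in history, so a z-score is undefined
    z = (value - mean) / stdev
    if abs(z) > threshold:
        return f"ALERT: {value} is {z:.1f} standard deviations from the historical mean"
    return None

# Hypothetical history of daily order counts pulled from long-term storage.
history = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = build_baseline(history)
print(check_for_alert(101, baseline))  # None: within the normal range
print(check_for_alert(130, baseline))  # prints an alert message
```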

The volume of big data overwhelms us only if we let it. Your organization's big data strategy should focus on effectively delivering the most appropriate analytics for business decision-making now, while also storing, governing and managing data for use cases and analytics scenarios you may not even know about yet.

3. Use data visualization to enable data discovery and analysis

When working with information at scale, our visual capacity is unmatched for making sense of it all. Even people who don't have the coding skills to write a clustering algorithm, or the ability to describe how it works, can easily pick out a clutch of close data points in a chart generated by that algorithm. And people who couldn't find outliers in a large data set programmatically can easily spot a few values that just don't fit the visual pattern they're seeing. With appropriate data visualizations, we're all natural data analysts.

Not all visualizations are simple and easy to grasp, of course. But with big data, well-designed visual representations of the data and analytics results help business users understand it and, consequently, use it more effectively in decision-making. This particularly holds true for predictive analytics applications, where interpreting the details of the data can be very technical, even when the larger picture of future trends and probabilities is highly relevant to business goals.

With such patterns of discovery in mind, your big data strategy should include suitable data visualization tools, along with relevant training for both analysts and business users.

4. Iterate on structuring big data to match specific applications

By its nature, big data must be managed at scale, but you should also recognize that it's very diverse. For example, audio recordings of customer support calls might be stored in a big data environment, perhaps along with product images, relevant social media content, various types of documents and more traditional data, such as transactions and operational records.

The uses of this data are therefore also very diverse. You simply can't work out in advance all the possible use cases and business requirements. Similarly, you can't develop all those analytics scenarios in a single project. Over time, you'll discover new uses for sets of big data as your analytics team develops, business needs change and technology advances.

Future-proofing is one of the great advantages of data lakes and big data platforms such as Hadoop and Spark: You don't need to structure the data when you first process and store it. Instead, the data can be left in its native format and then filtered, transformed and organized as needed for each new analytics application.

Such an iterative approach should be an essential component of your long-term strategic thinking on big data. Remember: It's a marathon, not a sprint.
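Leaving data in its native format and imposing structure only when it is read is often called schema-on-read. The toy Python sketch below illustrates the idea without using any Hadoop or Spark APIs: mixed raw records are stored untouched as JSON lines, and each application applies its own projection at read time. The record layout and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw records land in storage untouched, one JSON document per line; fields vary by source.
RAW_LINES = [
    '{"type": "transaction", "customer": "c1", "amount": 42.5, "date": "2021-03-01"}',
    '{"type": "support_call", "customer": "c1", "audio_uri": "calls/0001.wav", "minutes": 7}',
    '{"type": "transaction", "customer": "c2", "amount": 10.0, "date": "2021-03-02"}',
]

def read_with_schema(lines, record_type, fields):
    """Apply a schema at read time: keep one record type and project the requested fields."""
    for line in lines:
        record = json.loads(line)
        if record.get("type") == record_type:
            # Fields a record lacks come back as None instead of failing the read.
            yield {name: record.get(name) for name in fields}

# Two hypothetical applications read the same raw data with different structures.
revenue = list(read_with_schema(RAW_LINES, "transaction", ["customer", "amount"]))
call_load = list(read_with_schema(RAW_LINES, "support_call", ["customer", "minutes"]))
print(revenue)
print(call_load)
```

A new application later needs only a new projection, not a reload or restructuring of the stored data.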

5. Consider the cloud for deployments of big data systems

With an incremental approach to managing data and the need to store very large volumes of it for possible future uses, you may worry about the cost of keeping so much data around. Cloud services can really help here, keeping storage costs from becoming an expensive barrier to your big data strategy.

For one thing, cloud platform vendors price data storage as a commodity, often making it much cheaper than buying and maintaining your own on-premises storage devices. In addition, they take on much of the work of data security, availability, backup and restore, replication and archiving on your behalf, although you remain responsible for how your own data is configured and accessed. A big data platform in the cloud likely has not only more processing capacity, but also better tools and a more experienced staff supporting it than your organization can afford on its own.

6. Govern data for both compliance and usability

In today's regulatory environment, strong data governance is no longer optional: It must be a primary consideration in your big data strategy. Whether you need to deal with general data security and privacy legislation such as the European Union's GDPR, or vertical regulations such as HIPAA for healthcare information in the U.S., regulatory compliance represents a key motivation for governing your data well.

Does that sound negative? Is data governance really just to ensure we don't break the law? In fact, well-governed data is also a better resource for big data analytics applications. Partly, this comes down to a matter of confidence. If you administer data carefully within a regulatory framework, data scientists and analysts feel freer to explore and experiment with new, and potentially innovative, usage scenarios. Moreover, companies generally find that well-governed data -- cataloged, described, secured and deployed in a thoughtful manner -- is easier to work with, too.
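As a rough illustration of how cataloging and securing data can make it easier to work with, here is a minimal Python sketch, assuming a hypothetical in-house catalog: each dataset entry records an owner, a description, column definitions and which columns are tagged as personal data, and analysts get a view with those columns masked. `DatasetEntry` and `analyst_view` are illustrative names, not the API of any real catalog tool, and masking alone does not make a data set compliant with GDPR or HIPAA.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """Hypothetical catalog entry describing a dataset and tagging its sensitive columns."""
    name: str
    owner: str
    description: str
    columns: dict  # column name -> plain-language description
    sensitive: set = field(default_factory=set)  # columns tagged as personal data

def analyst_view(entry, rows):
    """Return copies of rows with every column tagged as sensitive in the catalog masked."""
    return [
        {col: ("<redacted>" if col in entry.sensitive else value) for col, value in row.items()}
        for row in rows
    ]

orders = DatasetEntry(
    name="orders",
    owner="sales-data-team",
    description="One row per customer order, loaded daily.",
    columns={"order_id": "Order identifier", "email": "Customer email", "amount": "Order total"},
    sensitive={"email"},
)
rows = [{"order_id": 1, "email": "a@example.com", "amount": 9.5}]
print(analyst_view(orders, rows))
```

Because the sensitivity tags live in the catalog rather than in each analysis, analysts can explore the data set freely while the masking rule is defined once.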

Put these big data best practices into action

As you can see, there are a lot of relevant issues to work through when considering and developing a big data strategy. IT, data management and analytics leaders need to have these conversations with business decision-makers -- because as we have seen again and again, technologies are not enough on their own. As I said above, big data is a business asset. Without business-focused analytics, it may be a wasted one.
