Tip

6 essential big data best practices for businesses

These best practices can help data leaders create an effective big data strategy that meets an organization's analytics needs and delivers valuable business benefits.

Big data management and analytics tools are transformative technologies for companies of all sizes across various industries. For example, big data environments give retailers detailed insights into their entire supply chain. Manufacturers can monitor and manage all the production equipment in their factories. Marketers in these and other industries can analyze every customer touchpoint, from website visits to phone calls, emails, chats and purchases.

Yet there are still lots of questions -- and confusion -- about how to get the most out of big data architectures. The following are six best practices that data management and analytics leaders should adopt when their organization decides to invest in big data technologies.

1. Focus on business needs, not the technology

Thanks to big data technologies, data management and analytics teams can handle data volumes and complex analytics applications that previously were beyond all but the most advanced companies and government agencies. However, organizations can get carried away by the technology, assuming that there must be an advantage to using any new tools or capabilities.

For example, many businesses want to implement real-time analytics applications. Analyzing data in real time as it's created and updated enables organizations to gain immediate insights into customer behavior, market trends and operational performance. But two business-related problems often make that a challenge:

  1. Data is generated and collected at a level of detail that many business users don't require.
  2. Even if big data systems deliver actionable real-time analytics, business processes and workflows don't enable users to make decisions at that pace. As a result, the actions of business executives and workers lag behind the data analysis.

This mismatch between the flow of data and the cadence of business decisions can overload users with information that just gets in their way as they try to do their jobs. It also leads to unnecessary spending on analytics technology, when less immediate "right-time analytics" might better suit business rhythms.
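To make the right-time idea concrete, here's a minimal sketch in plain Python of aggregating raw events into fixed decision-cadence windows instead of pushing every event to users as it arrives. The event data and 15-minute window size are hypothetical examples, not anything prescribed in the article.

```python
from datetime import datetime, timedelta

def bucket_events(events, window_minutes=15):
    """Group (timestamp, value) events into fixed 'right-time' windows.

    Rather than surfacing each event in real time, values are summed
    per window so the output matches the pace of business decisions.
    """
    window = timedelta(minutes=window_minutes)
    buckets = {}
    for ts, value in events:
        # Align each event to the start of its window.
        start = datetime.min + ((ts - datetime.min) // window) * window
        buckets.setdefault(start, []).append(value)
    return {start: sum(vals) for start, vals in sorted(buckets.items())}

# Hypothetical sales events arriving within a single half-hour.
events = [
    (datetime(2026, 2, 1, 9, 4), 120.0),
    (datetime(2026, 2, 1, 9, 11), 80.0),
    (datetime(2026, 2, 1, 9, 22), 200.0),
]
summary = bucket_events(events)
```

The window size is the knob that aligns analytics delivery with business rhythms: a 15-minute window suits an operations dashboard, while a daily window might suit executive reporting.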

Big data is a valuable business asset, but it may well be a wasted one without strong use cases to justify deployments.

2. Incorporate AI into big data applications in sensible ways

The ways AI is transforming data management and analytics processes should factor into big data strategies. For example, AI tools can automate data preparation tasks and extract insights from text, images and other unstructured or semistructured data. Generative AI chatbots enable users to explore and analyze data through conversational natural language queries. They can also suggest issues to investigate in data sets and recommend appropriate data visualizations.

However, the same pattern seen with real-time analytics also applies to AI. Agentic AI is a particular case in point. It's a technology with great promise: AI systems that autonomously explore data, execute tasks and deliver insights without explicit human direction. Vendors are embedding AI agents into data platforms and analytics tools, and numerous organizations have launched pilot and proof-of-concept projects. Yet many of those projects have failed to reach production use.

Organizations that do well with agentic AI treat it as a tool for achieving desired business outcomes. They identify specific analytics workflows where agentic automation can deliver measurable business value, then redesign those workflows to accommodate how agents function rather than just replace existing human activities with AI ones.

It's the same with incorporating other AI technologies into big data applications. Data management and analytics leaders should first ask about business needs, current pain points and how AI could streamline internal processes and improve decision-making. Technology choices follow from the answers they get.

3. Collect lots of data for both current and future analytics uses

While the massive data volumes commonly collected in big data systems enable new types of analytics applications, data scientists and analysts often feel overwhelmed by all that data. Even experienced analytics professionals shouldn't be swamped with more data than they can comfortably work with. Indeed, many data lakes where big data is stored have become more like swamps, with sprawling data sets that are difficult to manage and analyze effectively.

However, collecting and using all that data doesn't have to be a problem. Data science teams can use AI tools and machine learning algorithms to analyze big data volumes that are too large for conventional analytics techniques. The way AI models learn also strengthens the case for broad data collection: a large data repository provides the context that enables AI models and agents to understand an organization's business well enough to recommend useful actions.

Data can also still be valuable even if it isn't used immediately. A comprehensive big data strategy collects data both to support business decision-making now and to be available for future analytics use cases and scenarios. Down the road, for example, data scientists might find patterns in consolidated streaming data sets that help them detect business problems or opportunities.

But don't collect data indiscriminately or manage it haphazardly. While storage is relatively cheap, managing large amounts of data requires time and attention. Data sets without solid lineage documentation or data quality controls due to a lack of data management resources are potential liabilities. Use input from business leaders to focus the collection process on data with immediate or expected future value, excluding data deemed not useful.
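One way to keep collected data from becoming a liability is to capture lineage metadata at ingestion time. The sketch below shows the idea in plain Python; the field names and example values are illustrative, not a standard lineage schema or any particular catalog tool's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataSetRecord:
    """Minimal lineage record captured when a data set is ingested.

    Field names here are hypothetical examples of the documentation a
    data management team might require before data enters the lake.
    """
    name: str
    source: str                # where the data originated
    ingested_at: str           # ISO timestamp of ingestion
    transformations: list = field(default_factory=list)

    def add_transformation(self, step: str):
        # Append each processing step so the lineage stays auditable.
        self.transformations.append(step)

record = DataSetRecord(
    name="clickstream_raw",
    source="web_event_collector",
    ingested_at=datetime.now(timezone.utc).isoformat(),
)
record.add_transformation("dropped malformed rows")
record.add_transformation("anonymized user IDs")
```

Even a record this simple answers the two questions that turn data lakes into swamps when they go unanswered: where did this data come from, and what has been done to it?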

4. Apply rigorous controls to track and manage data

Big data is typically diverse, with a variety of structured, unstructured and semistructured data types. For example, audio files of customer support calls might be stored in a big data environment alongside related product images, documents and social media content, as well as business data such as transactions and account records. Big data systems also commonly contain data from sensors, emails, videos, logs and external data sources.

This varied data also has diverse uses. Most organizations don't identify all the possible use cases for their big data environments in advance. Even if they do, they can't develop all the required analytics applications simultaneously.

This reality initially made data lakes attractive. They enable raw data to be stored in native formats and structured as needed for specific analytics uses. Yet the promise of data lakes proved difficult to fulfill in practice. Organizations often lose track of what their data lake contains, heightening the swamp effect. Many also can't reliably track where data originated, how it was ingested or how it's been transformed.

Newer data lakehouse architectures address these issues by combining the storage flexibility of data lakes with the more rigorous data management functions of traditional data warehouses. Open table formats, such as Apache Iceberg and Delta Lake, add transactional consistency and data versioning to previously ungoverned data storage. Data managers can maintain audit trails, enforce access controls and evolve schemas without disrupting analytics operations.
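The versioning that open table formats provide can be illustrated conceptually with a toy sketch in plain Python. This is not the actual Apache Iceberg or Delta Lake API; it only demonstrates the underlying idea that each commit produces a queryable snapshot, which is what enables audit trails and "time travel" queries over table history.

```python
import copy

class VersionedTable:
    """Toy model of table versioning, the concept behind snapshots in
    open table formats such as Apache Iceberg and Delta Lake.
    A conceptual sketch only, not either project's real interface.
    """
    def __init__(self):
        self._snapshots = []   # immutable history of committed states
        self._current = []

    def commit(self, rows):
        # Each commit appends rows and records a full snapshot, so
        # earlier versions stay readable for audits and debugging.
        self._current = self._current + list(rows)
        self._snapshots.append(copy.deepcopy(self._current))

    def read(self, version=None):
        """Read the latest state, or a specific committed version."""
        if version is None:
            return self._current
        return self._snapshots[version]

table = VersionedTable()
table.commit([{"id": 1, "amount": 10}])
table.commit([{"id": 2, "amount": 25}])
```

In real lakehouse systems, snapshots are tracked through metadata files rather than full copies, which keeps versioning cheap even at big data scale.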

5. Govern data for regulatory compliance and increased usability

In today's regulatory environment, strong data governance isn't optional. Organizations face a growing number of general data security and privacy laws, such as the EU's GDPR and the California Consumer Privacy Act. Some companies also must comply with industry regulations, such as HIPAA, which protects healthcare information in the U.S.

AI regulations are also now a factor to consider. For example, new provisions of the EU AI Act scheduled to take effect in August 2026 require qualifying organizations deploying AI systems classified as high-risk to meet a set of data governance and management requirements, as well as risk management and human oversight obligations. Similar laws are progressing in many other jurisdictions.

As a result, data governance processes that support regulatory compliance efforts are an essential component of big data strategies. However, effective governance does more than just ensure an organization doesn't break the law. Well-governed data is also a better resource for analytics applications.

Partly, this is a matter of confidence. If data is carefully administered within a governance framework, data scientists and analysts feel freer to explore and experiment with new analytics scenarios that could spur business innovation. Data that's properly defined, cataloged, secured and managed is also easier to work with and more likely to produce accurate analytics results.

6. Balance cost, data sovereignty and other issues in the cloud

In most enterprises, the cloud is now the default IT infrastructure model for new systems and applications. But cloud deployments pose new data management issues, especially when big data environments span multiple cloud providers and geographic regions.

While multi-cloud strategies offer resilience and the ability to choose data platforms and tools that best fit individual applications, they can increase processing costs and complicate data governance and management. Data sovereignty is also now a pressing concern in many cloud implementations. Governments worldwide are asserting jurisdiction over personal data within their borders by mandating local storage and restricting cross-border data transfers, among other measures. Local restrictions are also being applied to the data used in AI applications.

As a result, a hybrid cloud approach isn't merely convenient but necessary for many organizations. In hybrid deployments, cloud systems are often used for most applications, while on-premises infrastructure handles data workloads that must remain local due to privacy or AI regulations, as well as applications running on hard-to-replace legacy systems.
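In practice, routing data between cloud and on-premises targets often comes down to a residency policy check at ingestion time. The following is a minimal sketch of that logic; the region codes, storage target names and policy mapping are hypothetical examples, not real provider identifiers.

```python
def choose_storage(data_region, sovereign_regions, default="cloud-us-east"):
    """Route a data set to a storage location that satisfies residency rules.

    sovereign_regions maps region codes subject to data sovereignty
    mandates to in-region (often on-premises) storage targets. All
    names here are illustrative placeholders.
    """
    if data_region in sovereign_regions:
        # A residency mandate applies: keep the data in-region.
        return sovereign_regions[data_region]
    # No mandate: use the organization's default cloud region.
    return default

# Hypothetical policy: EU data stays on-premises in Frankfurt,
# Indian data stays in an in-country cloud region.
policies = {"eu": "onprem-frankfurt", "in": "cloud-mumbai"}
```

Centralizing this decision in one policy table, rather than scattering it across pipelines, makes it easier to update routing when new sovereignty laws take effect.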

Data and IT leaders should balance all these factors -- data needs, cost efficiency, regulatory compliance, operational resilience and the flexibility to adapt systems and applications as business requirements change -- when they design cloud-based big data environments.

Editor's note: This article was updated in February 2026 for timeliness and to add new information.

Donald Farmer is a data strategist with 30-plus years of experience, including as a product team leader at Microsoft and Qlik. He advises global clients on data, analytics, AI and innovation strategy, with expertise spanning from tech giants to startups.

Next Steps

Top trends in big data for enterprises

Benefits of using big data for businesses

Big data challenges and how to address them

Big data analytics and business intelligence: A comparison
