Big data management
Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.
Top Stories
-
Tip
17 Oct 2022
Make data usability a priority on data quality for big data
To help make big data analytics applications more effective, IT teams must augment conventional data quality processes with measures aimed at improving data usability for analysts.
-
News
02 Feb 2022
Onehouse emerges with managed Apache Hudi data lake service
One of the original creators of the Hudi project at Uber has launched a new company set to bring a managed service to market to operationalize cloud data lakes.
-
Feature
17 Apr 2020
Common data lake challenges and how to overcome them
Managing the data contained in your enterprise data lake presents many challenges. From the amount of data to data inconsistencies, here are some solutions to common issues.
-
Feature
16 Mar 2020
How to build an effective streaming data architecture
Data architecture can be tricky when it comes to real-time analytics. Clear objectives and scalability are important factors when determining the streaming data architecture you need.
-
News
04 Mar 2020
Microsoft boosts Azure big data strategy with Hadoop distro
Microsoft Azure's HDInsight has a new, company-backed distribution of Hadoop, which should reassure customers interested in hybrid big data deployments.
-
News
25 Feb 2020
Databricks Data Ingestion Network brings data to lakehouse
Databricks Ingest entered a public preview in a move by Databricks to enable a lakehouse that combines the best features of the data lake and data warehouse models.
-
Feature
20 Feb 2020
3 of the top use cases for graph databases
Graph databases establish many unique relationships between data points. These unusual relationships are beneficial in many use cases, but here are the top three.
-
Tip
19 Feb 2020
3 cloud-based data management challenges
Your organization has data in the cloud, but there are still concerns about how you and your data protection vendor manage that data to maintain performance and control costs.
-
Tip
10 Feb 2020
6 best practices on data governance for big data environments
Efforts to govern big data must corral a mix of structured and unstructured data. That's a challenge for most organizations. These six action items will help.
-
Infographic
10 Feb 2020
NoSQL database comparison to help you choose the right store
With so many NoSQL databases to choose from, how can you tell which is best for your data set? Check out this comparison of some of the most popular NoSQL databases.
-
Feature
06 Feb 2020
Breaking down data silos with strong data governance
Not having a single source of truth can be a huge issue for data professionals. But strong data governance policies can prevent data silos and lead to better-quality data.
-
Tip
28 Jan 2020
Should you host your data lake in the cloud?
On premises or in the cloud: What's the better place for your data lake? Here are some things to consider before deciding where to deploy a big data environment.
-
News
14 Jan 2020
Former Hortonworks leader named new Cloudera CEO
Cloudera finds a new leader, pulling the former CEO of Hortonworks back into the fold to help set the direction for the big data Hadoop vendor as it moves forward in 2020.
-
Feature
13 Jan 2020
Analytics demands add loftier goals to data warehouse strategies
As the concept of storing data and the technologies needed to do it evolve, companies with set goals in mind are building their data warehouses to maximize analytics outcomes.
-
Feature
18 Dec 2019
Q&A: The challenges of cloud-native data management
Cloud-native data management isn't usually the first thing on anyone's mind, acknowledges Portworx co-founder and CTO Goutham Rao -- but find out how it's become part of organizations' journey.
-
Feature
10 Dec 2019
How to streamline feature engineering for machine learning
Structured data is necessary in machine learning, but sifting through data is time consuming. Streamlining the feature engineering process can help data scientists be more productive.
-
Feature
10 Dec 2019
Building enterprise-grade AI: Sberbank and AI Telekom
Machine learning and artificial intelligence are growing beyond adolescence to prove business value. Practical applications at Sberbank and AI Telekom suggest getting machine learning data in one place, where data scientists can experiment.
-
Feature
06 Dec 2019
Top database cloud migration considerations for enterprises
Many organizations are switching to cloud databases and big data platforms. But understanding which option best meets your data needs is an important first step.
-
Feature
21 Nov 2019
Future of MongoDB could be brighter than other NoSQL engines
Top NoSQL database MongoDB, once thought of as a revolution in data management, has come back to Earth some, but it still has a chance to stay ahead of competitors.
-
Feature
29 Oct 2019
How enterprises navigate GDPR data management rules
For businesses that operate in the EU, complying with GDPR has to be a top priority. And at many of them, much of the compliance burden falls on the data management staff.
-
News
18 Oct 2019
Databricks contributes Delta Lake to the Linux Foundation
Databricks has found a new home at the Linux Foundation for its open source Delta Lake data lake project, in a bid to help grow a broader community and accelerate adoption.
-
Tip
08 Oct 2019
7 steps to a successful data lake implementation
Flooding a Hadoop cluster with data that isn't well organized and managed can stymie analytics efforts. Take these steps to help make your data lake accessible and usable.
-
Feature
26 Sep 2019
Worried about storing unstructured data? You're not alone
Most data being created today is unstructured, and storage pros often find themselves struggling to keep up. Luckily, efficient unstructured data storage is still possible.
-
News
25 Sep 2019
Cloudera Data Platform gives big data users multi-cloud path
Cloudera released a big data platform combining its technologies and ones from Hortonworks, initially in the AWS cloud but with multi-cloud support to come.
-
Feature
20 Aug 2019
Enterprise data marketplace aims to ease self-service chaos
Self-service data preparation can duplicate work and slow down analytics. One possible fix: an internal marketplace where users can 'shop' for data assets.
-
Feature
15 Aug 2019
What you need to know about Cloudera vs. AWS for big data
Enterprises in need of a big data platform must run some analytics of their own to choose a vendor. AWS' integration between services can't be beat, but is Cloudera a better choice?
-
Feature
13 Aug 2019
Data management roles: Data architect vs. data engineer, others
Veteran data pro Michael Bowers differentiates between key data management positions, including their salaries and which ones can add the most business value.
-
News
07 Aug 2019
MapR collapse into HPE harbinger of big data tech trough of despair?
The collapse of big data pioneer MapR into HPE could be the fate of an also-ran. But might it also be a sign of the crash of a Hadoop-related meteor shower that included Hortonworks and Cloudera?
-
News
06 Aug 2019
HPE buys MapR assets to fuel AI applications
Longtime independent big data vendor MapR goes out of business, selling technology and intellectual property to HPE. The move marks the continuing decline of the Hadoop market.
-
News
30 Jul 2019
Hitachi Vantara updates Pentaho 8.3 to expand DataOps vision
Hitachi Vantara's new Pentaho update brings DataOps capabilities for data management to help organizations derive better data insights.
-
News
26 Jul 2019
Cloudera open source route seeks to keep big data alive
Inspired by the IBM-Red Hat model, Cloudera goes the open source route to broaden its market as demand for Hadoop weakens and the vendor takes on big competitors like AWS.
-
Feature
02 Jul 2019
Container technologies promise more agility for big data apps
Along with the ability to provide greater agility and flexibility for big data applications, containers can play a role in IT strategy that drives real-time decision-making.
-
News
28 Jun 2019
PostgreSQL database specialist EnterpriseDB gets new backing
EnterpriseDB is looking to push its database further with help from new financial backers. The deal sees Postgres originator Michael Stonebraker coming on board as technical adviser.
-
News
27 Jun 2019
Growth of dark data shows need for better classification
Dark data causes all sorts of problems, but there are ways to mitigate it. However, customers continue to add storage as a bandage solution, because it's cheaper and simpler.
-
News
27 Jun 2019
Cloud data management finding its place as volumes soar sky high
Is a cloud data management platform right for your organization? Experts discuss its benefits and drawbacks as more data moves to the cloud for a variety of uses.
-
News
18 Jun 2019
MongoDB Atlas cloud service adds data lake, touts multi-cloud
MongoDB released an S3-compatible data lake its developer legions can quickly query. But word of MongoDB Atlas use on Google's cloud shows there are clouds to sow beyond AWS.
-
Feature
13 Jun 2019
Microservices and big data start to get closer
Microservices are riding a wave of user interest, leading to changes in IT operations. ThoughtWorks expert Zhamak Dehghani discusses what that means for big data.
-
News
31 May 2019
MapR's future in jeopardy, layoffs loom
It's right there in a MapR letter to California's labor department: A leader in the Hadoop market is desperately seeking funding after poor sales of its promising data platform.
-
Opinion
24 May 2019
GDPR privacy concerns still brewing on law's first birthday
The first year of the much-debated EU data protection rule was subdued. High-profile fines for privacy breaches have yet to come, but regulators are starting to take action.
-
Feature
21 May 2019
Inside view of Tibco integration architecture planning
Tibco's acquisitions of well-regarded, small software specialists such as SnappyData are part of a drive toward what it calls 'connected intelligence.' CTO Nelson Petracek provides background.
-
News
13 May 2019
Red Hat OpenShift Operators target AI, big data workloads
Speedy deployment of machine learning jobs may not arrive too soon. But container platform enhancements from Red Hat aim to quicken the uptake of such innovations.
-
Feature
09 May 2019
Data modeling software tackles glut of new data sources
Data modeling platforms are starting to incorporate features to automate data-handling processes, but IT must still address entity resolution, data normalization and governance.
-
News
07 May 2019
ProvenDB brings blockchain applications to world of MongoDB
Blockchain is intriguing technology, but carries with it high system overhead. ProvenDB adds blockchain to MongoDB in an effort to gain acceptable performance.
-
News
30 Apr 2019
Snowflake CEO Bob Muglia talks cloud data warehouse evolution
In this Q&A, now-former Snowflake CEO Bob Muglia discusses the vendor's decision to embrace cloud data warehousing and how the industry is changing as more data moves to the cloud.
-
Feature
29 Apr 2019
A future data scientist needs business, deep learning skills
As automation grows, data scientists will focus more on business needs, strategic oversight and deep learning and less on model creation and other routine tasks.
-
Feature
29 Apr 2019
Wayfair charts open source components course to growth
Teams at Wayfair mix new open source tools to power customer-facing apps. In such shops, tech leaders like Ben Clark must deftly maneuver an obstacle course of data components.
-
Feature
24 Apr 2019
Most in-demand data science skills include ML, Python
Experts detail the skills employers want most in data scientists -- notably machine learning and programming languages -- and why often the most valuable expertise comes with time.
-
News
17 Apr 2019
Google takes a run at enterprise cloud data management
New Google Cloud boss Thomas Kurian is putting databases and data management at the forefront at Google. The vendor has forged key data deals, showing a more mature Google Cloud.
-
Feature
17 Apr 2019
4 factors to consider in a Hadoop distributions comparison
Examine the key characteristics to evaluate when comparing Hadoop distributions, focusing on enterprise features, subscription options and deployment models.
-
News
15 Apr 2019
Kafka at center of new event processing infrastructure
Events are as important as data in emerging applications underlying many e-commerce efforts. Streams of events tell a company what motivates customers to use online products.
-
News
04 Apr 2019
Tools manage performance for big data cloud applications
Tools such as Unravel and Pepperdata offer a way to measure performance of big data cloud applications, which may aid companies with on-premises configuration issues.
-
News
01 Apr 2019
Tools fill gaps in predictive modeling and machine learning
Machine learning platforms can bring the lone data scientist into the overall workflow. Updates to tooling also let that data scientist use familiar development interfaces.
-
Tip
29 Mar 2019
5 things to know about deploying big data systems in data containers
Planning for security and container APIs and watching out for infrastructure sprawl are among the issues to be aware of before deploying big data in containers.
-
News
25 Mar 2019
Facebook alumni forge own paths to big data analytics tools
Startups Interana and Rockset differ in their approaches to providing new query capabilities on fast-arriving big data. Both are led by technologists who started at Facebook.
-
News
20 Mar 2019
Aparavi update adds better data classification, more clouds
Aparavi updates Active Archive with customizable data classification and tagging capabilities for easier future retrieval and improved metadata search.
-
News
12 Mar 2019
Aerospike database garners Spark, Kafka connectors
Apache Kafka and Apache Spark connectors ease use of the Aerospike NoSQL data store in high-speed applications such as analytics that are becoming more broadly supported.
-
News
11 Mar 2019
Data catalog software takes on data lakes, privacy laws
Data catalogs form a hub for managing enterprise data. New products focus on machine learning and AI add-ons that help automate aspects of data governance.
-
News
11 Mar 2019
Users find data preparation tools vital to BI strategies
Data preparation isn't the sexiest topic, but it's critically important to IT and business users, according to a study by Dresner Advisory Services.
-
Feature
25 Feb 2019
Explore Hadoop distributions to manage big data
Discover the uses of Hadoop distributions and the first steps in evaluating these products, as well as how the merger of rivals Cloudera and Hortonworks affects the market.
-
News
14 Feb 2019
Originators form group to boost Presto SQL query engine
The Presto engine arose as an alternative to Hive for big data queries. Now, the Presto Software Foundation has formed to promote the SQL query software's virtues.
-
News
01 Feb 2019
Open source cloud databases battle software 'strip mining'
Cloud giants like AWS have adopted open source databases, causing Confluent, MongoDB and others to guard their assets the best way they know how: licensing.
-
News
01 Feb 2019
Cloud data management, security top of mind for government
Federal government data officers grapple with cloud data management, weighing lower cost and efficiencies against security threats and vendor lock-in.
-
Feature
01 Feb 2019
Cloud data warehouse makes inroads as users spurn admin tasks
Overlooked in the run-up to Hadoop, data warehouses have found new life off premises. Cloud-based data warehouses find favor with teams that want to reduce warehouse administration.
-
Feature
23 Jan 2019
Advantages of graph databases: Easier data modeling, analytics
Graph databases are finding a place in analytics applications at organizations that need to be able to map and understand the connections in large and varied data sets.
-
Feature
16 Jan 2019
Southern Water’s centralised data team geared for silo busting
Southern Water has centralised its data specialists and overhauled its data management and business intelligence technology to support business decision-making at scale.
-
News
15 Jan 2019
Cloudera and Hortonworks combo to push CDP, machine learning
Two wunderkinds of Hadoop have formalized their merger. Cloudera and Hortonworks say they will place special focus on AI as they chart the stand-alone vendor's future.
-
News
28 Dec 2018
Data management trends for 2019: Governance, DataOps, cloud
Better data governance, increased cloud use and wider DataOps adoption head the list of trends for data management teams to plan for in 2019, IT analysts say.
-
Podcast
19 Dec 2018
Open source support was central to 2018 data deals
Mergers and acquisitions unsettled the big data status quo in 2018. Open source support made these couplings a bit different than those of the past, Talking Data podcasters said.
-
Tip
17 Dec 2018
SQL Server 2019 improves Linux, container support
The SQL Server 2019 release includes new big data integration features, a collection of database engine enhancements and improved Linux and container support.
-
News
14 Dec 2018
Data companies vie to fill spots in AWS cloud data lineup
Third-party vendors that offer data platforms to AWS users tout hedges against cloud lock-in. But they must both compete and collaborate with the cloud leader.
-
Tip
20 Nov 2018
Trifacta data prep tool helps blend disparate data sources
Handling diverse data sources usually consumes precious developer time. That led healthcare CRM company SymphonyRM to hand the data prep task to business analysts.
-
News
12 Oct 2018
MarkLogic Data Hub Service aims to ease cloud use of NoSQL DBMS
MarkLogic rolled out a cloud-service version of its NoSQL database management system, a move designed to make the technology more cost-effective for cloud users.
-
Feature
11 Oct 2018
Cloud buoys data microservices -- for on-premises systems, too
Data in a microservices architecture is percolating anew. This news analysis looks at IBM Cloud Private for Data and other means to harmonize data in public and private locations.
-
News
04 Oct 2018
Cloudera-Hortonworks merger narrows Hadoop users' options
Hadoop users will have fewer choices as big data rivals Cloudera and Hortonworks unite. But the new company may be more competitive with AWS and Google.
-
Podcast
27 Sep 2018
Big data platform broadens place in analytics architecture
Big data platforms stumbled a bit getting out of the prototyping stage. But a view from the Strata Data Conference in New York sees broader use in the offing.
-
Opinion
21 Sep 2018
5 trends driving the big data evolution
The speedy evolution of big data technologies is connected to five trends, including practical applications of machine learning and cheap, abundantly available compute resources.
-
News
13 Sep 2018
Containers key for Hortonworks alliance on big data hybrid
Hortonworks is joining with Red Hat and IBM to work together on a hybrid big data architecture format that will run using containers both in the cloud and on premises.
-
News
16 Aug 2018
Chief data officer skills tested by AI tech blitz
A reporter's notebook from a recent MIT symposium provides insights on chief data officer needs, as the AI wave starts to hit CDOs and affects them more than most other tech trends.
-
News
08 Aug 2018
Confluent Platform 5.0 aims to mainstream Kafka streaming
Confluent Platform updates seek to bring data streaming with Apache Kafka to a wider audience. A new GUI and user-defined functions are part of the 5.0 release.
-
News
01 Aug 2018
Alluxio adds connectors for multi-cloud data migration
Updated Alluxio open source storage software homes in on portability across multiple big data object stores. Applications access Alluxio as mountable file storage.
-
News
27 Jul 2018
BigQuery ML moves machine learning into Google BigQuery
Google is enabling BigQuery users to build SQL-based machine learning models inside the cloud data warehouse via a BigQuery ML technology now out in beta.
-
News
25 Jul 2018
Qlik-Podium acquisition aims to boost BI data management
With its acquisition of Podium Data, Qlik seeks to amplify its enterprise BI data management capabilities and raise the level of competition with Tableau and Power BI.
-
News
19 Jul 2018
Focus, scope and spotting opportunity are key to role of CDO
Chief data officers and experts see the CDO role as changing to a more strategic orientation -- especially finding key opportunities in vast troves of data.
-
News
16 Jul 2018
Chief data officer role: Searching for consensus
The chief data officer role is about many things -- regulations, innovation, AI and more. Consultant Randy Bean discussed the matter ahead of an MIT symposium on the topic.
-
Feature
09 Jul 2018
eHarmony hooks up with Redis NoSQL database for hot storage
The Redis key-value store finds use in a system to match would-be romantic partners on dating site eHarmony, which employs a variety of NoSQL databases to make love click online.
-
Tip
06 Jul 2018
When to choose an S3 big data environment over HDFS storage
Selecting a storage service for big data in the cloud can be challenging. Expert David Loshin explains usage patterns that could lead organizations to Amazon Simple Storage Service.
-
Feature
03 Jul 2018
Labeled data brings machine learning applications to life
The types of data being collected for analytics use are increasing, but traditional structured data is a good match for machine learning. Gartner's Svetlana Sicular explains why.
-
Feature
02 Jul 2018
GPU cloud tools take complexity out of machine learning infrastructure
While talk of AI on GPUs is abuzz, actually building a machine learning infrastructure remains a dark art. A startup's PaaS is looking to automate parts of the process.
-
News
27 Jun 2018
MongoDB 4.0, Stitch aim to grow NoSQL apps in cloud, on-prem
NoSQL vendor MongoDB upgraded its database software with ACID support, while also releasing a serverless platform intended to simplify application development.
-
News
25 Jun 2018
Hadoop data lake architecture tests IT on data integration
Hortonworks users talk about building Hadoop data lakes to support new applications -- and the challenges their teams face in ingesting and refining data for end users.
-
News
18 Jun 2018
Hortonworks cloud options grow via Google, Microsoft, IBM
Hortonworks now supports Google Cloud Storage and has also broadened cloud deals with Microsoft and IBM, aiming to increase cloud uses of its big data platform.
-
Feature
07 Jun 2018
Google Cloud data lake fuels cloud payment processing flow
To create a cloud payment processing system, Global Payments first had to deploy a data lake in the Google Cloud. Getting quick user feedback was another early step.
-
Tip
05 Jun 2018
Why Spark DataFrame, lazy evaluation models outpace MapReduce
Learn how the Spark DataFrame execution plan works and why its lazy evaluation model helps the processing engine to avoid the performance issues inherent in Hadoop MapReduce.
-
Podcast
01 Jun 2018
Starburst finds new worlds to conquer with SQL query engine
Relational databases may have hit a wall of late, but the SQL query engine seems poised for wider growth. Starburst, a retro startup of sorts, is among those looking to take it wider still.
-
Feature
22 May 2018
AP uses data.world platform to spread data journalism
Data journalism reporters need tools that deliver quick context for stories on deadline. A data collaboration platform from data.world helps the Associated Press meet such needs.
-
News
18 Apr 2018
DataWorks 18: Hortonworks styles itself 3.0 with a ‘DataPlane’ service
Hortonworks has used its DataWorks Summit in Berlin to announce a data governance “studio” plug-in to its DataPlane service.
-
News
17 Apr 2018
Teradata applies time series analytics tools to IoT data
Data warehouse pioneer Teradata looks to ease IoT data analysis with capabilities that address skills gaps in time series analytics techniques, which are coming more to the fore.
-
Opinion
09 Apr 2018
IoT, edge computing spawn new security issues
As real-time big data increasingly hitches up to the internet of things, edge computing power and fog nodes, a whole new layer of security threats emerges.
-
Feature
09 Apr 2018
IT teams take big data security issues into their own hands
Data security needs to be addressed upfront in deployments of big data systems -- and users are likely to find they have to build some security capabilities themselves.
-
Feature
09 Apr 2018
Information architecture applied to big data streaming, AI
New technologies challenge data professionals, but taking a step back helps with hurdles. In this interview, consultant William McKnight takes a measured look at data streaming, GDPR and AI.
-
Feature
06 Apr 2018
Cloud workloads, data lakes challenge information architecture
Data management options are expanding; cloud workloads are an example. That means changing your approach to information architecture, says data management expert William McKnight.