Data aggregation is any process whereby data is gathered and expressed in a summary form. When data is aggregated, atomic data rows -- typically gathered from multiple sources -- are replaced with totals or summary statistics computed from those rows. Aggregate data is typically found in a data warehouse, as it can provide answers to analytical questions and also dramatically reduce the time to query large sets of data.
Data aggregation is often used to provide statistical analysis for groups of people and to create useful summary data for business analysis. Aggregation is often done on a large scale, through software tools known as data aggregators. Data aggregators typically include features for collecting, processing and presenting aggregate data.
Data aggregation can enable analysts to access and examine large amounts of data in a reasonable time frame. A row of aggregate data can represent hundreds, thousands or even more atomic data records. Once the data is aggregated, it can be queried quickly, because a query no longer has to spend processing cycles reading every underlying atomic data row and aggregating it in real time.
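The difference between querying a stored aggregate and re-scanning atomic rows can be sketched in a few lines of Python. This is only an illustration: the order rows, the `build_summary` helper and the `total_for` query function are hypothetical names invented for the example, not part of any particular data aggregator.

```python
from collections import defaultdict

# Hypothetical atomic rows: (country, order_amount)
atomic_rows = [
    ("US", 120.0),
    ("US", 80.0),
    ("DE", 50.0),
    ("DE", 70.0),
    ("DE", 30.0),
]


def build_summary(rows):
    """Scan the atomic rows once and keep one aggregate row per country."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for country, amount in rows:
        summary[country]["count"] += 1
        summary[country]["total"] += amount
    return dict(summary)


# Computed once, ahead of query time.
summary_table = build_summary(atomic_rows)


def total_for(country):
    """Answer a query from the aggregate row without touching atomic rows."""
    return summary_table[country]["total"]
```

Each call to `total_for` is a single lookup, however many atomic rows sit behind the aggregate row. The trade-off is that the summary has to be rebuilt or updated when the atomic data changes.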
As the amount of data stored by organizations continues to expand, the most important and frequently accessed data can benefit from aggregation, which makes it feasible to access that data efficiently.
What does data aggregation do?
Data aggregators summarize data from multiple sources. They provide capabilities for multiple aggregate measurements, such as sum, average and counting.
Examples of aggregate data include the following:
- Election results by state or county. Individual voter records are not presented, just the vote totals by candidate for the specific region.
- Average age of customer by product. Each individual customer is not identified, but for each product, the average age of the customer is saved.
- Number of customers by country. Instead of examining each customer, a count of the customers in each country is presented.
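Two of the examples above, number of customers by country and average age of customer by product, can be sketched with Python's standard library. The customer rows and the helper names `count_by` and `average_by` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical atomic customer records.
customers = [
    {"country": "US", "product": "A", "age": 30},
    {"country": "US", "product": "B", "age": 40},
    {"country": "DE", "product": "A", "age": 50},
    {"country": "FR", "product": "A", "age": 40},
]


def count_by(rows, key):
    """Count how many rows fall into each group."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


def average_by(rows, key, value):
    """Average a numeric field within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: mean(values) for group, values in groups.items()}


customers_by_country = count_by(customers, "country")
average_age_by_product = average_by(customers, "product", "age")
```

Neither result contains an individual customer: each output row stands for a whole group.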
Data aggregation can also result in a similar effect to data anonymization -- as individual data elements with personally identifiable details are combined and replaced with a summary representing a group as a whole. An example of this is creating a summary that shows the aggregate average salary for employees by department, rather than browsing through individual employee records with salary data.
The data being aggregated does not need to be numeric. You can, for example, count the occurrences of any non-numeric data element.
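As a small illustration of counting a non-numeric element, the sketch below tallies text labels with `collections.Counter`. The subscription plan names are invented for the example.

```python
from collections import Counter

# Hypothetical non-numeric atomic values: one subscription plan per customer.
plans = ["basic", "premium", "basic", "trial", "basic"]

# Counter tallies how often each distinct label occurs.
plan_counts = Counter(plans)

# The most frequent label together with its count.
most_common_plan = plan_counts.most_common(1)[0]
```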
Before aggregating, it is crucial that the atomic data is analyzed for accuracy and that there is enough data for the aggregation to be useful. For example, counting votes when only 5% of results are available is not likely to produce a relevant aggregate for prediction.
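One way to guard against aggregating too little data is to check coverage before computing the total. The sketch below is an assumption-laden illustration: the function name `total_if_sufficient` and the default 80% threshold are hypothetical, and a suitable threshold depends on the use case.

```python
def total_if_sufficient(reported, expected_count, min_coverage=0.8):
    """Sum the reported values only when enough of them are available.

    reported       -- values received so far (e.g., per-precinct vote totals)
    expected_count -- how many values are expected in total
    min_coverage   -- hypothetical threshold; the fraction that must be present

    Returns (total, coverage); total is None when coverage is too low.
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    coverage = len(reported) / expected_count
    if coverage < min_coverage:
        return None, coverage
    return sum(reported), coverage


# Only 2 of 40 precincts (5%) have reported, so no total is produced.
early_total, early_coverage = total_if_sufficient([100, 200], 40)

# All 4 of 4 precincts have reported, so the total is returned.
final_total, final_coverage = total_if_sufficient([100, 200, 300, 400], 4)
```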
How do data aggregators work?
Data aggregators work by combining atomic data from multiple sources, processing the data for new insights and presenting the aggregate data in a summary view. Furthermore, data aggregators usually provide the ability to track data lineage and can trace back to the underlying atomic data that was aggregated.
Collection. First, data aggregation tools may extract data from multiple sources, storing it in large databases as atomic data. The data may be extracted from a range of sources, including internet of things (IoT) devices, such as the following:
- social media communications;
- news headlines;
- personal data and browsing history from IoT devices; and
- call centers, podcasts, etc. (through speech recognition).
Processing. Once the data is extracted, it is processed. The data aggregator will identify the atomic data that is to be aggregated. The data aggregator may apply predictive analytics, artificial intelligence (AI) or machine learning algorithms to the collected data for new insights. The aggregator then applies the specified statistical functions to aggregate the data.
Presentation. The aggregated data is presented to users in a summarized format, and the summary itself is new data that can be analyzed further. The quality of the statistical results depends on the accuracy and completeness of the underlying atomic data.
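The three stages can be sketched as a tiny Python pipeline. Everything here is hypothetical: the two in-memory sources, the field names and the functions `collect`, `process` and `present` are invented for illustration, and a real data aggregator would read from external systems and offer many more statistical functions. The `lineage` list is one simple way to trace an aggregate back to the atomic rows that produced it.

```python
from collections import defaultdict

# Hypothetical sources holding atomic employee records.
SOURCE_A = [
    {"id": "a1", "dept": "sales", "salary": 50000},
    {"id": "a2", "dept": "eng", "salary": 90000},
]
SOURCE_B = [
    {"id": "b1", "dept": "sales", "salary": 70000},
    {"id": "b2", "dept": "eng", "salary": 110000},
]


def collect(*sources):
    """Collection: merge the rows from every source into one atomic store."""
    return [row for source in sources for row in source]


def process(rows, key, value):
    """Processing: average a field per group and record contributing row ids."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: {
            "average": sum(r[value] for r in members) / len(members),
            "lineage": [r["id"] for r in members],
        }
        for group, members in groups.items()
    }


def present(summary):
    """Presentation: one summary line per group, sorted by group name."""
    return [
        f"{group}: average {stats['average']:.0f} from {len(stats['lineage'])} rows"
        for group, stats in sorted(summary.items())
    ]


summary = process(collect(SOURCE_A, SOURCE_B), "dept", "salary")
report = present(summary)
```

The report shows only the average salary per department, while `summary["sales"]["lineage"]` still lists the ids of the atomic rows behind that average.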
Data aggregation may be performed manually or through the use of data aggregators. However, data aggregation is often performed on a large-scale basis, which makes manual aggregation less feasible. Furthermore, manual aggregation risks accidental omission of crucial data sources and patterns.
Uses for data aggregation
Data aggregation can be helpful for many disciplines, such as finance and business strategy decisions, product planning, product and service pricing, operations optimization and marketing strategy creation. Users may include data analysts, data scientists, data warehouse administrators and subject matter experts.
Aggregated data is commonly used for statistical analysis to obtain information about particular groups based on specific demographic or behavioral variables, such as age, profession, education level or income.
For business analysis purposes, data can be aggregated into summaries that help leaders make well-informed decisions. User data can be aggregated from multiple sources, such as social media communications, browsing history from IoT devices and other personal data, to give companies critical insights into consumers.