Data gravity is the ability of a body of data to attract applications, services and other data.
The force of gravity, in this context, can be thought of as the way software, services and business logic are drawn to data relative to its mass (the amount of data). The larger the amount of data, the more applications, services and other data will be attracted to it and the more quickly they will be drawn.
In practical terms, the farther and more frequently data has to move, the more latency and bandwidth cost it adds to a workload. It therefore makes sense to amass data in one place and to locate the applications and services that use it nearby. This is one reason why internet of things (IoT) applications should be hosted as close as possible to where their data is generated and stored.
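A rough way to see why moving data is costly is to estimate how long a bulk transfer takes. The sketch below is a simplified model, not part of McCrory's definition: it assumes an idealized link where transfer time is the data size in bits divided by the bandwidth, plus one round-trip latency, and it ignores protocol overhead, congestion and retransmissions. The function name and the example figures are hypothetical.

```python
def transfer_time_seconds(
    data_bytes: float,
    bandwidth_bits_per_s: float,
    round_trip_latency_s: float = 0.0,
) -> float:
    """Idealized time to move a block of data over a network link.

    Assumption (simplified model): time = bits / bandwidth + one round trip.
    Protocol overhead, congestion and TCP windowing are ignored.
    """
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    if data_bytes < 0:
        raise ValueError("data size cannot be negative")
    # Convert bytes to bits, divide by link speed, then add the latency term.
    return (data_bytes * 8) / bandwidth_bits_per_s + round_trip_latency_s


if __name__ == "__main__":
    TB = 10**12   # decimal terabyte, in bytes
    GBPS = 10**9  # 1 gigabit per second, in bits per second
    # Hypothetical example: move 10 TB over a 1 Gbit/s link with 50 ms round-trip latency.
    seconds = transfer_time_seconds(10 * TB, 1 * GBPS, round_trip_latency_s=0.05)
    print(f"{seconds / 3600:.1f} hours")  # prints "22.2 hours"
```

Under these assumptions, a single 10 TB move over a 1 Gbit/s link takes roughly 22 hours, and a workload that repeats such transfers pays that cost each time. Keeping compute next to the data avoids paying it at all.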
Hyperconvergence is a term that's often used to illustrate the concept of data gravity. In a hyperconverged infrastructure, compute, networking and virtualization resources are tightly integrated with data storage within a commodity hardware box. The greater the amount of data, and the more other data is connected to it, the more value the data has for analytics.
The history of data gravity
IT expert Dave McCrory coined the term data gravity as an analogy to the way physical objects with more mass naturally attract objects with less mass.
According to McCrory, data gravity is moving to the cloud. As more internal and external business data is moved to the cloud or generated there, data analytics tools are also increasingly cloud-based. His explanation of the term distinguishes between naturally occurring data gravity and similar effects created by external forces such as legislation, throttling and manipulative pricing, which McCrory calls artificial data gravity.
McCrory recently released the Data Gravity Index, a report that measures, quantifies and predicts the intensity of data gravity for Forbes Global 2000 enterprises across 53 metros and 23 industries. The report includes a patent-pending formula for data gravity and a methodology based on thousands of attributes describing the presence of Global 2000 enterprises in each location, along with per-location variables including:
- Gross domestic product (GDP)
- Number of employees
- Technographic data
- IT spend
- Average bandwidth and latency
- Data flows