The massive data sets, complex processing capabilities and advanced analytical models in the current digital business landscape create the perfect storm of opportunity for data and analytics. After languishing for decades, graph approaches are being embraced by analysts, data scientists and data management professionals. Graph technology is a sort of catch-all phrase that includes graph theory, graph analytics and graph data management.
IT executives have a growing interest in graphs, as there is a basic understanding that graph technology is somehow different from existing solutions. Data and analytics leaders are being asked to provide guidance regarding how graph technology can be used, but many still don't have a complete understanding.
Here are the basics of what data and analytics leaders wishing to use graph technology for analytics, business intelligence and data science solutions must understand about the technology and its use cases.
What does graph mean?
Graph represents the next major evolutionary step to enhance analytics delivery. As data volumes grow, traditional analytics often fails to address complex business operations, delivery and analysis problems. Graph technology helps find unknown relationships in data that are not being identified or analyzed through traditional means.
When the broad term graph is introduced as a topic, it often blurs three separate topics: graph theory, graph analytics and graph data management.
- Graph theory is the branch of mathematics that models logical or physical objects as nodes and the relationships between them as edges, which makes it possible to identify paths, links and networks among those objects. This approach can be applied to almost anything: molecules, telephone lines, delivery routes, manufacturing processes and more.
- Graph analytics is the use of graph theory to discover the nodes, edges and data links that can be assigned semantic properties. Analysts can then address difficult-to-resolve issues where traditional analytics tools and solutions cannot reach conclusions. Frequently in traditional analysis solutions, users and analysts may identify false connections in data, but these can be evaluated in graph analytics with node and edge relationships.
- A graph database is a specific storage approach for data resulting from graph analytics. A popular use case for graph analytics output is populating a knowledge graph, which is a model in data that represents a common use of collected knowledge or data sets representing a commonly held concept.
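To make the node-and-edge vocabulary concrete, here is a minimal sketch in Python of a graph stored as an adjacency list, with a breadth-first search that finds the fewest-hop path between two nodes. The delivery network, its node names and the `shortest_path` function are hypothetical illustrations, not taken from any particular graph product.

```python
from collections import deque

# Hypothetical delivery network: each key is a node, and each list
# holds the nodes it is directly connected to (its edges).
ROUTES = {
    "warehouse": ["hub_a", "hub_b"],
    "hub_a": ["warehouse", "store_1"],
    "hub_b": ["warehouse", "store_1", "store_2"],
    "store_1": ["hub_a", "hub_b"],
    "store_2": ["hub_b"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return a fewest-hop path as a list, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])   # each queue entry is a path explored so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None                # goal is not connected to start


print(shortest_path(ROUTES, "warehouse", "store_2"))
```

The same structure works whether the nodes are stores, molecules or people; only the meaning attached to the nodes and edges changes.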
While the infrastructure and terminology are often confused, ultimately the output from graph analytics emerges through visualization tools, knowledge graphs, specific applications and even some advanced dashboard capabilities of business intelligence tools. Graph is also often used in all three parts to make systems run more efficiently and even support data management in a dynamic approach. In this way, there is a direct link between graph theory and analysis, and analysis can always use graph databases.
Why are graphs gaining traction now?
Graph analysis, processing and even data stores have existed for decades, but graph technology is extremely demanding in terms of processing and data management requirements. It has always required significant resources, and systems generally couldn't handle the requirements until recently.
As data volumes grow and companies seek new ways to use that information to drive business results, the types and composition of problems become more varied. Traditional analytics have often failed to address some of the new, complex problems facing businesses, many of which require new technologies and approaches for specific use cases.
In 2020, the pandemic caused an immediate need to change the way data is used. But the lessons learned in the health crisis can be applied permanently to business processes across multiple entities. The use of graph theory, graph data stores and even graph programming languages enables significantly enhanced analytics and decisions while accelerating delivery with new tools.
There is growing support for graphs in tools and database platforms, and their improved scalability is necessary in real-world apps. In parallel, there is an ever-greater sense of urgency for graphs because of ever-increasing complexity in data and the need for more adaptability with globalization, competition and digitalization.
When should graph technology be used?
As interest in graph grows, stakeholders are asking data and analytics leaders, "Is graph better than what we already do?"
Most existing analytics practices are based on relational concepts that have existed for more than 40 years. Relational concepts still work, but only when there is some agreement on how data will be captured, how it will be used and the type of reporting or analysis that will be completed.
It is possible to complete graph analytics in a relational database, but the use of graph-specific data, languages and analysis engines can significantly decrease the time-to-delivery for more complex requirements.
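As a rough illustration of graph analysis inside a relational database, the sketch below stores edges in an ordinary table and uses a recursive SQL query to find every node reachable from a starting point. It relies on Python's built-in `sqlite3` module; the `edge` table, its columns and the `reachable_in_sql` function are assumptions made for this example.

```python
import sqlite3


def reachable_in_sql(edges, start):
    """Return the set of nodes reachable from `start` by following
    directed edges, computed with a recursive common table expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?                      -- seed row: the starting node
            UNION                         -- UNION drops duplicates, so cycles terminate
            SELECT edge.dst
            FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach WHERE node <> ?
        """,
        (start, start),
    ).fetchall()
    conn.close()
    return {row[0] for row in rows}


# Hypothetical edges, including a cycle a -> b -> c -> a.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
print(reachable_in_sql(EDGES, "a"))
```

This works, but each new question (shortest path, hop limits, weighted edges) needs more hand-written recursive SQL, which is the kind of effort a graph-specific language is meant to reduce.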
Using a data store with a graph language can typically reduce 8,000 to 10,000 lines of code to fewer than 400. Consider all the programming time, debugging time, production implementation testing and more saved by using complementary technology. On the other hand, consider the time it would take to develop graph mathematical operations on top of languages and platforms that were built for a different purpose.
Graph technology permits flexibility for all potential interpretations of data, while relational data and analytics represent specifically designated conclusions emerging from a consensus or compromised subset of interpretations. In other words, a graph is more interested in how things relate to each other than forcing them into isolated topics.
Graph databases are specifically useful when there is a demand for frequent reconfiguration of the same data across multiple analytical models, especially in cases where multiple teams use the same data. For example, graph analytics is an essential component of many COVID-19 contact tracing projects, as it enables scientists to track and analyze information about many people at once and the nature of their many connections.
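The contact tracing case can be sketched as a hop-limited traversal: starting from one person, find everyone connected within a given number of contacts. The names, the contact list and the `contacts_within` function below are invented for illustration and do not describe any real tracing system.

```python
from collections import deque


def contacts_within(contacts, person, max_hops):
    """Return {name: hops} for everyone linked to `person`
    through at most `max_hops` contacts."""
    # Build an undirected adjacency map from the list of contact pairs.
    adjacency = {}
    for a, b in contacts:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    hops = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue                      # do not expand past the hop limit
        for other in adjacency.get(current, ()):
            if other not in hops:
                hops[other] = hops[current] + 1
                queue.append(other)
    del hops[person]                      # report contacts only, not the person
    return hops


# Hypothetical contact pairs.
CONTACTS = [("ana", "ben"), ("ben", "cy"), ("cy", "dee"), ("eve", "fay")]
print(contacts_within(CONTACTS, "ana", 2))
```

Changing `max_hops` or the starting person reuses the same data for a different question, which is the kind of frequent reconfiguration described above.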
If the demand for data use and analysis has a somewhat consistent conclusion and a targeted or exclusively valid interpretation, relational technology is made for that purpose. Graph technology will also succeed, but it is not necessary when relational concepts are adequate.
Relational technology continues to be the best solution when highly specified, repeatable and nonvolatile analytics are deployed, if for no other reasons than lower cost, more prevalent technical familiarity and the inherent stability of the resulting analysis outcomes.
Mark Beyer is a Research Vice President and Distinguished Analyst at Gartner, Inc. Mr. Beyer covers broad data architecture solutions including data management and its intersection with use cases and data governance issues. Gartner analysts will provide additional analysis on data and analytics trends at the Gartner Data & Analytics Summits 2021, taking place virtually May 4-6 in the Americas.