Definition

graph database

By Alexander S. Gillis, Technical Writer and Editor

Published: May 31, 2024

What is a graph database?

A graph database, also referred to as a semantic database, is a software application designed to store, query and modify network graphs. A network graph is a visual construct that consists of nodes and edges. Each node represents an entity, such as a person, and each edge represents a connection or relationship between two nodes.

Graph data models have been around in some variation for a long time. A family tree, for example, is a simple graph: each person is a node and each parent-child link is an edge.

The mathematics behind graphs, known as graph theory, is often credited to 18th-century mathematician Leonhard Euler. Using databases to map relationships digitally became popular in business around 2015, when increased compute power, in-memory computing and agreed-upon standards moved the concept from academia to real-world uses in business and enterprise computing.

Graph databases store and represent data in the form of nodes, edges and properties. Nodes represent entities, edges represent the relationships between entities, and properties are the attributes associated with nodes and edges that provide additional context. Nodes are typically stored as records, with edges represented as pointers between nodes and properties attached to the nodes and edges they describe.

Graph databases are well suited for analyzing interconnections, which is why they're being used to mine data from social media. Graph databases are also useful for working with data in business disciplines that involve complex relationships and dynamic schema, such as supply chain management, identifying the source of an Internet Protocol telephony issue and creating "Customers who bought this also looked at ..." recommendation engines.

How do graph databases work?

Graph databases are structured around a graph model rather than the tables of traditional relational databases. They differ in their structure, in how they manage and store information, and in how they're queried.

The structure of a graph database

Graph databases work using a graph model, which consists of nodes, edges and properties that form the graph structure. Nodes represent entities like people, places or things, and edges represent relationships between entities. Properties are the attributes -- stored as additional metadata -- associated with edges and nodes.
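The node, edge and property model described above can be sketched as a small in-memory data structure. This is a minimal illustration, not how any particular graph database stores data; the class names (`Node`, `Edge`, `PropertyGraph`), the node ids and the property values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, with free-form properties."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship between two nodes, with its own properties."""
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory graph: nodes keyed by id, outgoing edges kept per source node."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, label, target, **properties):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.outgoing[source].append(Edge(source, label, target, properties))

    def neighbors(self, node_id, label=None):
        """Return ids reachable over one outgoing edge, optionally filtered by label."""
        return [e.target for e in self.outgoing[node_id]
                if label is None or e.label == label]


graph = PropertyGraph()
graph.add_node("chris", kind="person", name="Chris")
graph.add_node("xyz", kind="company", name="XYZ")
# The "role" property value is invented purely for illustration.
graph.add_edge("chris", "worksFor", "xyz", role="engineer")

print(graph.neighbors("chris", label="worksFor"))
```

Keeping each node's outgoing edges next to the node means that following a relationship is a direct lookup rather than a join across tables.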

Graph databases are traditionally classified as a type of NoSQL database and come in several types. One example is the triple store, which uses a special index that stores information about nodes, edges and the relationships between them in groups of three.

A triple, which is also referred to as an assertion, has three main fields: a subject, a predicate and an object. Each subject, predicate or object is represented by a uniform resource identifier (URI).

How information is indexed

In a triple store, the first field in the database holds the URI for the subject, the second field holds the URI for the predicate, and the third field holds the URI for the object. While there are several strategies that graph databases can use for storing triples, most use an index that abbreviates the three primary fields to ?s, ?p and ?o.

For example, if the visual construct for a graph is given as follows:

An example of a visual construct for a graph database. — Graph models consist of nodes, edges and properties that form a graph.

Then the index will look like this:

Row	?s	?p	?o
1	:Chris	:marriedTo	:Sheila
2	:Chris	:brotherOf	:Tony
3	:Chris	:listensTo	:RockMusic
4	:Sheila	:listensTo	:RockMusic
5	:Sheila	:sisterInLawTo	:Tony
6	:Tom	:worksFor	:XYZ
7	:Tom	:hasPet	:Rover
8	:Chris	:drives	:Audi
9	:Chris	:colleagueOf	:Tom
10	:Chris	:worksFor	:XYZ
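As a rough sketch, the index above can be held as a list of (subject, predicate, object) tuples, with lookup tables that map a value to the rows containing it. The helper name `build_index` and the choice of 1-based row numbers are assumptions made to mirror the table; real triple stores use more elaborate index layouts.

```python
from collections import defaultdict

# The ten triples from the index above, as (subject, predicate, object) tuples.
TRIPLES = [
    (":Chris", ":marriedTo", ":Sheila"),
    (":Chris", ":brotherOf", ":Tony"),
    (":Chris", ":listensTo", ":RockMusic"),
    (":Sheila", ":listensTo", ":RockMusic"),
    (":Sheila", ":sisterInLawTo", ":Tony"),
    (":Tom", ":worksFor", ":XYZ"),
    (":Tom", ":hasPet", ":Rover"),
    (":Chris", ":drives", ":Audi"),
    (":Chris", ":colleagueOf", ":Tom"),
    (":Chris", ":worksFor", ":XYZ"),
]


def build_index(triples, key_position):
    """Map the value at key_position (0 = ?s, 1 = ?p, 2 = ?o) to 1-based row numbers."""
    index = defaultdict(list)
    for row, triple in enumerate(triples, start=1):
        index[triple[key_position]].append(row)
    return dict(index)


by_subject = build_index(TRIPLES, 0)
by_object = build_index(TRIPLES, 2)

print(by_subject[":Chris"])     # rows whose subject is :Chris
print(by_object[":RockMusic"])  # rows whose object is :RockMusic
```

With these tables, finding every row that mentions a given subject or object is a single dictionary lookup instead of a scan of all rows.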

Nodes might also be stored in clusters. Clustering in graph databases is the process of identifying groups, also called communities, of nodes that are densely connected to one another. Nodes within a cluster are more closely related to each other than to nodes outside the cluster.
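The simplest notion of a cluster is a connected component: a group of nodes that can all reach one another. The sketch below finds such groups in a hypothetical friendship graph (the names and edges are invented). It is only a starting point, since practical community detection usually looks for dense subgroups inside a connected graph as well.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes so that every node in a group is reachable from the others."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        components.append(group)
    return components


# Hypothetical undirected edges: two friend groups with no link between them.
EDGES = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("dan", "eve")]
clusters = connected_components(EDGES)

print(sorted(sorted(c) for c in clusters))
```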

How information in a graph database is queried

Graph algorithms are used to analyze the relationships of interconnected graph data. They perform tasks like finding patterns, shortest connected paths and distance between vertices, as well as the vertices' importance and clustering.
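One of the tasks mentioned above, finding the shortest connected path, can be illustrated with a breadth-first search. The sketch below is a generic textbook approach rather than the algorithm of any specific product. It treats a subset of the relationships from the triple index shown earlier as undirected, unlabeled edges, which is a simplifying assumption.

```python
from collections import defaultdict, deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an unweighted, undirected graph.

    Returns the nodes on one shortest path, or None if goal is unreachable.
    """
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers from goal to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Relationships from the example index, with predicates and direction dropped.
EDGES = [
    (":Chris", ":Sheila"), (":Chris", ":Tony"), (":Sheila", ":Tony"),
    (":Chris", ":Tom"), (":Tom", ":Rover"), (":Tom", ":XYZ"), (":Chris", ":XYZ"),
]

print(shortest_path(EDGES, ":Sheila", ":Rover"))
```

The length of the returned path minus one is the distance between the two vertices, measured in hops.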

Graph query languages are programming languages used to interact with graph databases. They let users add, edit, retrieve and analyze graph data, and can efficiently process complex queries. For example, the ISO/IEC 39075:2024 standard, published in April 2024, defines the Graph Query Language (GQL), including data structures and basic operations on property graphs.

Each triple in a graph database is stored only once in the index. As with relational databases, a straight lookup query is a simple process. If the query is for what information is known about Chris, the indexer only needs to read the rows whose subject is :Chris, which are rows 1-3 and 8-10 in the index above.

The real power and speed of a graph database come from indexing combinations of triples. The following are a few examples:

If the query is for who Chris is married to, the indexer will look for the predicate :marriedTo among the rows whose subject is :Chris and then retrieve the matching object. Chris is married to Sheila.
If the query is to identify everyone who listens to the same kind of music as Chris, the indexer will first ask { :Chris :listensTo ?o } and identify :RockMusic as the object. It will then ask { ?s :listensTo :RockMusic } to find every subject with that object.

The second step of that query matches :RockMusic in rows 3 and 4. The subject in row 3 is Chris himself, so the subject in row 4 is the other person who listens to rock music. It turns out to be Sheila, Chris' wife.
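The two example queries can be reproduced with a small pattern matcher in which `None` plays the role of a variable such as ?s or ?o. This is a sketch under stated assumptions: the function name `match` is hypothetical, and it scans every triple rather than consulting an index, so it shows the logic of the queries and not the performance of a real triple store.

```python
# The ten triples from the example index, as (subject, predicate, object) tuples.
TRIPLES = [
    (":Chris", ":marriedTo", ":Sheila"),
    (":Chris", ":brotherOf", ":Tony"),
    (":Chris", ":listensTo", ":RockMusic"),
    (":Sheila", ":listensTo", ":RockMusic"),
    (":Sheila", ":sisterInLawTo", ":Tony"),
    (":Tom", ":worksFor", ":XYZ"),
    (":Tom", ":hasPet", ":Rover"),
    (":Chris", ":drives", ":Audi"),
    (":Chris", ":colleagueOf", ":Tom"),
    (":Chris", ":worksFor", ":XYZ"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Query 1: { :Chris :marriedTo ?o }
spouse = [o for _, _, o in match(TRIPLES, s=":Chris", p=":marriedTo")]

# Query 2, step 1: { :Chris :listensTo ?o }
genres = [o for _, _, o in match(TRIPLES, s=":Chris", p=":listensTo")]

# Query 2, step 2: { ?s :listensTo <genre> }, leaving out Chris himself.
fans = sorted({
    s
    for genre in genres
    for s, _, _ in match(TRIPLES, p=":listensTo", o=genre)
    if s != ":Chris"
})

print(spouse, genres, fans)
```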

Types of graph databases

There are several types of graph databases, including the following:

Knowledge graphs. Knowledge graphs like the one above focus on the semantic aspects of data and store information in triples.
Property graphs. Property graphs model relationships among data points using nodes and edges, both of which can carry properties that describe the entity or relationship in detail.
Resource Description Framework graphs. RDF graphs express data as statements made up of subject, predicate and object parts. They're used to manage interlinked resources and metadata, and conform to World Wide Web Consortium standards.
Triple stores. Triple stores hold data in the subject-predicate-object triple format. They're used for storing and querying data, including semantic relationships.
Hypergraph databases. Hypergraph models support hyperedges, which are relationships that can connect more than two nodes at once. They're used with graphs that have complex many-to-many relationships.

Use cases for graph databases

Current use cases for graph databases include the following:

Enabling data analysts to federate data sets without having to create and run complex queries that join combinations of tables together, as in the relational database model.
Helping developers create the back end for voice assistants by mapping possible user questions to correct answers.
Examining direct connections to identify potential indirect connections for recommendation engines.
Detecting cases of fraud by quickly analyzing patterns and connections between entities such as users, accounts or transactions. Anomalous behavior is identified by detecting unusual patterns in the graph.
Modeling relationships between entities on social media platforms, such as users, posts, comments and likes. This enables more efficient querying and analysis of social connections and more personalized social media features.
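The recommendation use case, in which direct connections are examined to surface indirect ones, can be sketched as a two-hop walk over a customer-product graph. The purchase data and the function name `also_bought` are invented for illustration; a production recommendation engine would weigh many more signals.

```python
from collections import Counter, defaultdict

# Hypothetical (customer, product) purchase edges.
PURCHASES = [
    ("ann", "tent"), ("ann", "stove"),
    ("bob", "tent"), ("bob", "lantern"),
    ("cat", "tent"), ("cat", "lantern"), ("cat", "stove"),
    ("dan", "kayak"),
]


def also_bought(purchases, product):
    """Rank other products by how many buyers of `product` also bought them."""
    bought_by = defaultdict(set)  # product -> customers (direct connections)
    basket = defaultdict(set)     # customer -> products (direct connections)
    for customer, item in purchases:
        bought_by[item].add(customer)
        basket[customer].add(item)

    # Two hops: product -> its buyers -> their other products.
    counts = Counter()
    for customer in bought_by[product]:
        for item in basket[customer] - {product}:
            counts[item] += 1

    # Sort by count (descending), then by name, so ties are deterministic.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


print(also_bought(PURCHASES, "tent"))
```

The same two-hop pattern underlies friend-of-a-friend suggestions on social platforms: start at one node, follow its direct edges, then follow the edges of the nodes reached.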

A chart comparing graph databases to relational databases. — From nodes and edges to their top use cases, graph databases differ from relational databases in several ways.

Future of graph databases

Graph database tools like Amazon Neptune and Neo4j are becoming increasingly available. In addition, they offer benefits such as enabling more flexible data modeling, efficient querying, real-time insights and simplified data integration. As such, it's becoming easier for developers to add graph analytics to existing applications while still maintaining data integrity, scalability and consistency.

Graph databases are expected to play an increased role in areas such as machine learning, Bayesian analysis, data science and artificial intelligence, as well as help to manage enterprise data and data interchange over the next decade.

Improvements in data federation could also shape this type of database. When knowledge graphs can be easily federated, one database will be able to determine that it needs data it doesn't have and automatically retrieve that data from other knowledge graphs. With this ability, it's possible that federation can help developers create blockchains that use relevant metadata to authenticate transactions in banking, finance, voting and smart contracts.

