
graph database

A graph database, also referred to as a semantic database, is a software application designed to store, query and modify network graphs. A network graph is a visual construct that consists of nodes and edges. Each node represents an entity (such as a person) and each edge represents a connection or relationship between two nodes. 

Graph databases have been around in some variation for a long time. For example, a family tree is a very simple graph database.

The concept of using databases to map relationships digitally started seeing popular usage in business around 2015, when increased compute power, in-memory computing and agreed-upon standards moved the concept from academia to real-world uses in business and enterprise computing.

Graph databases are well-suited for analyzing interconnections, which is why there has been a lot of interest in using graph databases to mine data from social media. Graph databases are also useful for working with data in business disciplines that involve complex relationships and dynamic schema, such as supply chain management, identifying the source of an IP telephony issue and creating "customers who bought this also looked at..." recommendation engines.

The concept behind graphing a database is often credited to 18th-century mathematician Leonhard Euler.

The structure of a graph database

Traditionally classified as a type of NoSQL database, graph databases are sometimes referred to as triple stores. That's because this type of database uses a special index that stores information about nodes, edges and the relationship between them in groups of three.

A triple, which may also be referred to as an assertion, has three main fields: a subject, a predicate and an object. Each subject, predicate or object is represented by a uniform resource identifier (URI).

How information is indexed

In a triple store, the first field in the database holds the URI for the subject, the second field holds the URI for the predicate and the third field holds a URI for the object.  While there are a number of different strategies that graph databases may use for storing triples, most use an index that abbreviates the three primary fields to {?s, ?p, ?o}. 

For example, if the visual construct for a graph is given as follows:

[Figure: Nodes and edges]

Then the index will look like this:

Row  ?s      ?p              ?o
1    :Bob    :marriedTo      :Julie
2    :Bob    :brotherOf      :Steve
3    :Bob    :listensTo      :RockMusic
4    :Julie  :listensTo      :RockMusic
5    :Julie  :sisterInLawTo  :Steve
6    :Jim    :worksFor       :IBM
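
As a rough illustration, the index above can be sketched in Python as a list of (subject, predicate, object) tuples. The names TRIPLES and build_subject_index are hypothetical, and a real triple store would hold full URIs and use more elaborate index structures; this sketch only shows the shape of the data.

```python
from collections import defaultdict

# The six triples from the index, as (subject, predicate, object) tuples.
# Short names stand in for full URIs (an assumption made for readability).
TRIPLES = [
    (":Bob", ":marriedTo", ":Julie"),
    (":Bob", ":brotherOf", ":Steve"),
    (":Bob", ":listensTo", ":RockMusic"),
    (":Julie", ":listensTo", ":RockMusic"),
    (":Julie", ":sisterInLawTo", ":Steve"),
    (":Jim", ":worksFor", ":IBM"),
]


def build_subject_index(triples):
    """Map each subject to the 1-based row numbers where it appears."""
    index = defaultdict(list)
    for row, (subject, _predicate, _obj) in enumerate(triples, start=1):
        index[subject].append(row)
    return dict(index)


subject_index = build_subject_index(TRIPLES)
print(subject_index[":Bob"])  # rows in which :Bob is the subject
```

Grouping rows by subject is only one possible choice; a store could just as well keep similar lookups keyed by predicate or by object.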

How information in a graph database is queried

Each triple in a graph database only gets stored once in the index. As in a relational database, a straight lookup query in a graph database is a simple process.

  • If the query is for what information is known about Bob, the indexer only needs to search rows 1-3 of the database.

The real power and speed of a graph database comes from indexing combinations of triples. Here are a few examples:

  • If the query is for who Bob is married to, the indexer will look for the predicate :marriedTo in rows 1-3 and then retrieve the matching object.  (Bob is married to Julie.) 
  • If the query is to identify everyone who listens to the same kind of music as Bob, the indexer will first ask { :Bob :listensTo ?o } and identify :RockMusic as the object. 

In the second query, the indexer then asks { ?s :listensTo :RockMusic }, which matches rows 3 and 4. The subject in row 3 is Bob himself, so the subject in row 4 is the other person who listens to rock music. (It turns out to be Julie, Bob's wife.)
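
The two queries above can be sketched in Python as pattern matching over the example triples. This is a minimal sketch, not how any particular product works: the match function and its None-as-variable convention are hypothetical, and it scans every row linearly where a real graph database would consult its indexes.

```python
# The example triples as (subject, predicate, object) tuples.
TRIPLES = [
    (":Bob", ":marriedTo", ":Julie"),
    (":Bob", ":brotherOf", ":Steve"),
    (":Bob", ":listensTo", ":RockMusic"),
    (":Julie", ":listensTo", ":RockMusic"),
    (":Julie", ":sisterInLawTo", ":Steve"),
    (":Jim", ":worksFor", ":IBM"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples that fit the pattern; None plays the role of ?s, ?p or ?o."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Query 1: { :Bob :marriedTo ?o }
spouses = [o for _, _, o in match(TRIPLES, s=":Bob", p=":marriedTo")]

# Query 2, step 1: { :Bob :listensTo ?o } finds what Bob listens to.
genres = {o for _, _, o in match(TRIPLES, s=":Bob", p=":listensTo")}

# Query 2, step 2: { ?s :listensTo <genre> } finds other listeners, excluding Bob.
fellow_listeners = sorted({
    s
    for genre in genres
    for s, _, _ in match(TRIPLES, p=":listensTo", o=genre)
    if s != ":Bob"
})

print(spouses, fellow_listeners)
```

The second query is a combination of two patterns that share a value (:RockMusic), which is the kind of lookup the article describes as benefiting from indexed combinations of triples.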

Types of graph databases

Historically, graph databases have been divided into two categories: property graphs, which store nodes and edges along with properties attached to them, and knowledge graphs like the one above, which focus on the semantic aspects of data and store information in triples. Generally speaking, indexing strategies for both types are similar.

It is expected that over time, knowledge graphs and property graphs will merge and the architectural distinctions between these two types of graph databases will fade away.

Use cases for graph databases

Current use cases for graph databases include the following:

  • Allow data analysts to federate data sets without having to create and run complex queries that join combinations of tables together, as in the relational database model.
  • Help developers create the back end for voice assistants by mapping possible user questions to correct answers. 
  • Identify clusters of events that are connected in unusual ways to detect fraud.
  • Examine direct connections to identify potential indirect connections for recommendation engines. 
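
The recommendation use case in the last item can be sketched as a two-hop walk: from a customer to the products they bought, then to other customers who bought the same products, then to what else those customers bought. The data and the recommend function below are hypothetical and exist only to illustrate the idea; a production engine would add weighting and filtering that this sketch omits.

```python
from collections import Counter

# Hypothetical (customer, product) purchase edges, invented for illustration.
BOUGHT = [
    ("alice", "tent"),
    ("alice", "stove"),
    ("bob", "tent"),
    ("bob", "lantern"),
    ("carol", "tent"),
    ("carol", "lantern"),
    ("carol", "stove"),
]


def recommend(customer, edges):
    """Suggest products reached through customers with overlapping purchases."""
    # Direct connections: what this customer already bought.
    owned = {p for c, p in edges if c == customer}
    # Customers who share at least one product with this customer.
    peers = {c for c, p in edges if p in owned and c != customer}
    # Indirect connections: products the peers bought that this customer has not.
    counts = Counter(p for c, p in edges if c in peers and p not in owned)
    return [product for product, _ in counts.most_common()]


print(recommend("alice", BOUGHT))
```

Products are ranked by how many peers bought them, a simple heuristic chosen here for brevity.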

Future of graph databases

Graph databases are expected to play a major role over the next decade in areas as diverse as machine learning, Bayesian analysis, data science and artificial intelligence, as well as in helping to manage enterprise data and data interchange.

One of the most significant impacts on this type of database will be improvements in data federation. When knowledge graphs can be easily federated, one database will be able to determine that it needs data it doesn’t have and automatically retrieve that data from another knowledge graph. With this ability, it is likely that federation will help developers create blockchains that use relevant metadata to authenticate transactions in banking, finance, voting and smart contracts.


This was last updated in March 2020
