The typical image that springs to mind when you hear the word database is probably a table where each row is a separate record and the column headings are the field names -- a spreadsheet, basically. Once the data is too complicated to fit into a single table, we move on to relational databases -- multiple tables linked by connected fields.
Setting up a relational database requires a person who understands data structures. And if new information is added, or new relationships become important, the database administrator will need to change the structure of the database and, most likely, update the user interface as well.
So what do you do if you have a data set where you can't map out the relationships ahead of time? Where instead of being connected by a single data point, people can be connected by things you can't predict in advance?
Maybe two people are on the same baseball team or like the same types of books or live in the same city. Adding each of those items as a separate field and creating new relationships for them can be an extremely time-consuming, never-ending task for a database administrator.
One solution is a graph database.
In a graph database, any data point can be connected to any other data point, and the connections can be established at any time by business users without the need for database administrators to rewrite the entire schema.
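To make that idea concrete, here is a minimal in-memory sketch in Python. It is not how any particular graph database product works; the `TinyGraph` class, the relationship labels and the sample names are all hypothetical, chosen only to show that a new kind of connection can be added without redefining a schema first.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory graph: nodes are names, edges carry a free-form label."""

    def __init__(self):
        # node -> set of (relationship label, neighbor)
        self.edges = defaultdict(set)

    def connect(self, a, label, b):
        # Store the edge in both directions so it can be walked from either end.
        self.edges[a].add((label, b))
        self.edges[b].add((label, a))

    def neighbors(self, node, label=None):
        # All nodes one hop away, optionally filtered by relationship label.
        return sorted(n for (l, n) in self.edges[node] if label is None or l == label)


g = TinyGraph()
g.connect("Alice", "PLAYS_FOR", "Ravens baseball team")
g.connect("Bob", "PLAYS_FOR", "Ravens baseball team")

# A new kind of relationship is introduced later, with no schema change.
g.connect("Alice", "LIVES_IN", "Boston")
g.connect("Carol", "LIVES_IN", "Boston")

# Who else plays on Alice's team?
teammates = [p for p in g.neighbors("Ravens baseball team", "PLAYS_FOR") if p != "Alice"]
print(teammates)
```

In a relational design, the `LIVES_IN` link would typically mean a new column or join table; here it is just another labeled edge.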
Graph databases are designed to be scalable, making them well suited to today's big data applications. They're also fast: users can move quickly along a chain of connections, which helps businesses find insights sooner.
"In a regular relational database, if I want to add something or change a relationship, I need to do a lot of planning," said Karen Panetta, IEEE fellow and dean of graduate education, school of engineering, at Tufts University. "A graph database allows you to add new relationships as you go along."
Here are the top use cases for graph databases.
Fraud and anomalies
Fraud detection is one of the most powerful use cases for graph databases right now, Panetta said.
Traditional approaches to fraud detection rely on simple checklists. A transaction is suspicious if it's over a certain amount or involves entities on government watchlists, for example. This simplistic approach can miss more subtle fraud attempts, but a database designed to spot unusual connections between transactions might be able to pick them up.
For example, there might be many e-commerce purchases from different accounts -- but all from the same IP address or cluster of IP addresses. Or there may be several cash withdrawals for the same amount of money from the same neighborhood, followed by cash deposits the same day to different accounts.
Each of these individual transactions might not raise any red flags, but a cluster of related transactions would be cause for concern.
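The shared-IP pattern can be sketched in a few lines of Python. This is a simplified illustration rather than a production fraud rule: the purchase records, the `suspicious_ips` helper and the threshold of three accounts are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical purchase records: (account, ip_address)
purchases = [
    ("acct-1", "203.0.113.7"),
    ("acct-2", "203.0.113.7"),
    ("acct-3", "203.0.113.7"),
    ("acct-4", "198.51.100.20"),
    ("acct-5", "192.0.2.44"),
    ("acct-5", "192.0.2.44"),
]


def suspicious_ips(records, min_accounts=3):
    """Return {ip: sorted accounts} for IPs shared by at least min_accounts accounts."""
    accounts_by_ip = defaultdict(set)
    for account, ip in records:
        # Each (account, ip) pair is an edge between an account node and an IP node.
        accounts_by_ip[ip].add(account)
    return {
        ip: sorted(accounts)
        for ip, accounts in accounts_by_ip.items()
        if len(accounts) >= min_accounts
    }


flagged = suspicious_ips(purchases)
print(flagged)
```

No single purchase in the sample looks unusual on its own; only the IP node with several distinct accounts attached stands out. The repeat purchases from `acct-5` are not flagged, because they all come from one account.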
"A big money laundering scheme might use one person's name, another person's Social Security number and a third person's address," Panetta said. "How do you pick that up? Well, the structure of the graph allows you to pop these things up as anomalies. It allows us to explore relationships that don't make sense."
And it's not just shopping or bank fraud that can be detected this way. In cybersecurity, companies defending themselves from hackers look for clusters of events that are connected in unusual ways.
Cybersecurity vendor Brinqa, for example, switched to the Neo4j graph database system when the relational databases the company used were reaching the limits of their flexibility.
"Our platform was dynamic, but it wasn't dynamic enough to handle all types of situations," said Syed Abdur Rahman, director of products at Brinqa. "With graph databases, you can define the schema on the fly. You can define the nodes and relationships. You don't have to define it up ahead. You can do it as you're bringing in the data."
Unusual connections can be a positive thing too. Today's advanced recommendation engines suggest music, books, movies, clothes and other products and services based on connections to other transactions. They can look beyond simple, direct connections.
Yes, people who buy dog food may also be likely to buy dog collars. But maybe they're also interested in comfortable walking shoes or couch slipcovers.
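One way to picture looking beyond direct connections is a two-hop walk: from a shopper to the products they bought, then to other shoppers who bought the same products, then to what else those shoppers bought. The Python sketch below uses plain co-purchase counting, which is far simpler than a real recommendation engine; the `baskets` data and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical purchase history: shopper -> set of products
baskets = {
    "ann": {"dog food", "dog collar", "walking shoes"},
    "ben": {"dog food", "walking shoes", "couch slipcover"},
    "cat": {"dog food", "dog collar"},
    "dan": {"cat litter"},
}


def recommend(shopper, baskets, top_n=3):
    """Rank products bought by shoppers who share at least one purchase."""
    mine = baskets[shopper]
    scores = Counter()
    for other, items in baskets.items():
        # Skip the shopper themselves and anyone with no purchase in common.
        if other == shopper or not (mine & items):
            continue
        # Count each product the shopper does not already own.
        for item in items - mine:
            scores[item] += 1
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


suggestions = recommend("cat", baskets)
print(suggestions)
```

For the shopper `cat`, who has only bought dog food and a dog collar, the walk surfaces walking shoes and a couch slipcover, neither of which is directly related to pet supplies.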
Recommendation engines are starting to show up in a lot of different places, not just in streaming apps and e-commerce sites.
"Graph databases are better than relational for 90% of emerging enterprise projects," said Paul Taylor, founder and CEO of Fabric.
Replacing a traditional relational database with a graph database can also reduce the need for middleware, he said. "Graphs are powerful foundations."
But that doesn't mean they work for every use case, he said.
"Graphs are less suited for heavy-on-write applications where data needs to be queried only a few times throughout its lifecycle," he said.
Privacy regulations like Europe's GDPR and the California Consumer Privacy Act require companies to be able to bring together all the personal data they've collected on individuals and delete it on request. Since companies often store this information in different data silos, this can be a challenging task.
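Treating records and identifiers as nodes turns the silo problem into a graph traversal. The Python sketch below is only an illustration under stated assumptions: the silo names, record IDs, identifier formats and the `records_for` helper are all invented, and a real system would need safeguards against identifiers that are shared by more than one person, such as a family phone number.

```python
from collections import defaultdict, deque

# Hypothetical records from three silos; each lists the identifiers it holds.
records = {
    "crm:17": {"email:pat@example.com", "phone:555-0100"},
    "billing:903": {"phone:555-0100", "card:tok_abc"},
    "support:44": {"card:tok_abc"},
    "crm:18": {"email:sam@example.com"},
}


def records_for(identifier, records):
    """Breadth-first walk from one identifier to every linked record."""
    # Build the reverse index: identifier -> records that mention it.
    by_identifier = defaultdict(set)
    for record_id, identifiers in records.items():
        for ident in identifiers:
            by_identifier[ident].add(record_id)

    found, seen_ids = set(), {identifier}
    queue = deque([identifier])
    while queue:
        ident = queue.popleft()
        for record_id in by_identifier[ident]:
            if record_id in found:
                continue
            found.add(record_id)
            # Follow every other identifier on this record.
            for linked in records[record_id]:
                if linked not in seen_ids:
                    seen_ids.add(linked)
                    queue.append(linked)
    return sorted(found)


to_delete = records_for("email:pat@example.com", records)
print(to_delete)
```

The support ticket never mentions the email address, yet the walk reaches it through the phone number and card token shared with the billing record.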
But it's not just compliance requirements that are causing companies to want to link disparate data sets together. Wearable device companies, IoT vendors, healthcare companies and financial firms all have a need for this technology, said Justin Richie, data science director at Nerdery, a digital services consultancy.
"What graph databases are used for most is real-time data synchronization," he said.
Graph databases are still in their infancy, but more applications are on the way, Tufts University's Panetta said.
Future use cases for graph databases will include advancing AI to the next level, she predicted.
"The way we've been doing AI with data right now is with old-fashioned relational databases," she said. But AI is based on relationships. "Graph databases will help build better AI systems."