Some in the database software industry are skeptical of the proposition that graph databases are vital to AI application development. After all, nearly everything seems to have an AI angle these days.
But, for the Neo4j graph database, AI has proved a fertile field. The company counts Caterpillar and eBay among users that have made it part of their AI applications.
The connection between AI and graph databases is genuine, according to Adrian Bowles, analyst at Storm Insights. When you look at the inner architecture of AI, he said, you find a clear need to understand interrelationships in data.
"In AI, the way you represent the knowledge that you are handling is very important," Bowles said. "A lot of that is about trying to understand how data fits in."
Enter systems such as Neo4j Inc.'s graph database, structured to help uncover the relationships between data points.
Graph data forensics
Bowles contrasted the graph data structure -- typically based on triplestores in which data nodes are connected via edges to other related nodes -- with more familiar relational databases, which, despite their name, have a roundabout way of handling relationships.
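To make the node-and-edge idea concrete, here is a minimal Python sketch. It is not Neo4j code and uses no database at all; the names and relationships are hypothetical. Facts are stored as (subject, relationship, object) triples, and a question about how two things are connected becomes a walk along the edges rather than a series of table joins.

```python
from collections import defaultdict

# Hypothetical facts, each stored as a (subject, relationship, object) triple.
TRIPLES = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "WORKS_AT", "Acme"),
    ("Alice", "OWNS", "Van 7"),
    ("Van 7", "REGISTERED_TO", "Acme"),
]


def build_adjacency(triples):
    """Index triples by subject so each node lists its outgoing edges."""
    adjacency = defaultdict(list)
    for subject, relationship, obj in triples:
        adjacency[subject].append((relationship, obj))
    return adjacency


def find_paths(adjacency, start, goal, path=None):
    """Depth-first search returning every cycle-free path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for _relationship, neighbor in adjacency.get(start, []):
        if neighbor not in path:  # skip nodes already on this path
            paths.extend(find_paths(adjacency, neighbor, goal, path))
    return paths


adjacency = build_adjacency(TRIPLES)
paths = find_paths(adjacency, "Alice", "Acme")
for p in paths:
    print(" -> ".join(p))
```

In a relational design, the same question would typically require joining a people table, a vehicles table and one or more link tables; here the relationships are the data being traversed.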
Consider the forensic work shown in TV mysteries, in which index cards, mug shots and the like are pinned to corkboards and connected with string, forming graphs.
"Nobody watches a detective show and expects to see them putting things in rows and columns as relational databases do," Bowles said. "That's not how people think."
The connection between graph databases and AI has been noted by others besides Neo4j, which faces considerable competition in a still-new field. For example:
- IBM Watson, which had a large role in bringing AI back to the fore after many years in near-hibernation, includes a knowledge graph as part of its overarching design for putting data in context.
- Lisp language pioneer Franz Inc. offers the AllegroGraph database for work in AI and expert knowledge use cases.
- Cambridge Semantics provides the parallelized AnzoGraph database for analytics and semantic data processing.
Machine learning models
Philip Rathle, vice president of products at Neo4j, cites machine learning, one of the most active subsets of AI, as an area that can benefit from the Neo4j graph database.
"If you look at machine learning use cases, what people do is take a bunch of attributes and then define them as inputs for training the machine learning model," he said. "These inputs are mostly individual, disconnected pieces of data."
An important focus of AI applications today -- and often an obstacle to successful implementations -- is correctly selecting attributes for model training. That process, Rathle said, is enhanced by use of graph data.
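One way to picture graph-enhanced attribute selection is to derive model inputs from the connections themselves. The Python sketch below is an illustration only, not a description of how Neo4j or any of its users does this; the edge list and feature names are hypothetical. It computes a few simple relationship-based attributes for a pair of nodes that could sit alongside the usual disconnected inputs.

```python
# Hypothetical undirected "is connected to" relationships.
EDGES = [
    ("ann", "bob"),
    ("ann", "cara"),
    ("bob", "cara"),
    ("cara", "dev"),
]


def build_neighbors(edges):
    """Map each node to the set of nodes it is directly connected to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def pair_features(neighbors, a, b):
    """Return connection-based attributes for the node pair (a, b)."""
    common = neighbors[a] & neighbors[b]
    union = neighbors[a] | neighbors[b]
    return {
        "degree_a": len(neighbors[a]),
        "degree_b": len(neighbors[b]),
        "common_neighbors": len(common),
        # Jaccard similarity: shared neighbors over all neighbors of either node.
        "jaccard": len(common) / len(union) if union else 0.0,
    }


neighbors = build_neighbors(EDGES)
features = pair_features(neighbors, "ann", "dev")
print(features)
```

Each value in the resulting dictionary is a single number, so it can be appended to an ordinary feature row for model training.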
"The important thing is finding out how the data is connected, and graphs are good for that," he said.
Rathle spoke in a recent interview, after beta version 3.5 of the Neo4j graph database was released. The database includes full-text indexing for natural language processing jobs and new graph algorithms for random walk, DeepWalk and other unsupervised learning methods, he said.
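For readers unfamiliar with the random walk technique, the following Python sketch shows the basic idea on a small in-memory graph. It does not use Neo4j's algorithm library, and the graph, function name and parameters are assumptions made for illustration. DeepWalk-style methods generate many such walks and then treat them like sentences to learn a vector for each node; only the walk-generation step is shown here.

```python
import random

# Hypothetical graph as an adjacency list: node -> directly connected nodes.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}


def random_walk(graph, start, length, rng):
    """Walk up to `length` nodes, picking a uniformly random neighbor each step."""
    walk = [start]
    while len(walk) < length:
        options = graph.get(walk[-1], [])
        if not options:  # dead end: stop early
            break
        walk.append(rng.choice(options))
    return walk


rng = random.Random(42)  # fixed seed so repeated runs produce the same walks
walks = [random_walk(GRAPH, node, 5, rng) for node in GRAPH]
for walk in walks:
    print(" ".join(walk))
```

Nodes that appear near each other in many walks tend to be closely connected in the graph, which is the signal an unsupervised method can learn from.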
Rathle said the Neo4j graph database has gained performance enhancements with native indexing for faster query sorting and a dedicated transaction memory subsystem for handling large data writes.
Meanwhile, the Neo4j graph database has proved a useful engine for an Adobe system that ties together members of its Behance creative community, according to a developer who has been involved in part of its evolution. Adobe has increasingly focused on schema and system maintainability.
Adobe teams tried a few NoSQL databases before settling on the Neo4j graph database to power personalized content feeds alerting Behance members to other users' activity, said David Fox, software engineer at Adobe. Such social media feeds arguably remain one of the prime areas of AI application development.
Fox said the graph database format is an efficient way to map relationships between data points. Earlier, Adobe employed a MongoDB document database and then a Cassandra column-oriented key-value store.
The MongoDB implementation required a considerable number of servers, and setting up schema led to problems with performance, Fox said.
"With Cassandra, data reads were fast, but maintainability and the data model became issues," he said.
"We found the database hard to maintain as the data grew." He said ongoing bug fixes tended to consume a fair amount of developer time, and adding features was difficult. Moving to a graph format led to considerably reduced data size.
Fox said Neo4j enabled IT teams to reduce the working data set from 50 TB to 40 GB, and a three-instance cluster of Neo4j is standing in where 48 Cassandra instances once served.
Fox said he hadn't yet reviewed the latest version of the Neo4j graph database but is interested in interface visualization improvements that could better enable line-of-business users to analyze Behance data.
The jumps Adobe had to make in designs for advanced systems are not unique among fast-moving companies today. Starts, stops and flexibility are bywords as data development teams wrestle with new approaches. And, although progress may be measured and moderate, graph databases appear more and more popular.