Neo4j adds vector search to improve generative AI outputs

The graph database vendor aims to improve semantic search and generative AI applications by enabling customers to better access and use unstructured data such as text and images.

Neo4j added vector search and vector storage to its core database capabilities to help customers get better results from semantic searches and generative AI applications.

In addition, the vendor aims to reduce AI "hallucinations" -- inaccurate and misleading responses generated by AI -- by adding vector search and storage as core capabilities.

Vector search is a way to search unstructured data, such as text and images, by assigning it a numerical representation to give it structure. Once assigned a numerical value, the unstructured data can be used in semantic searches so users can find similar data using approximate nearest neighbor algorithms and eventually model that similar data to inform decisions.
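To make the idea concrete, here is a minimal sketch of similarity search over vectors. It is not Neo4j's implementation. The three-dimensional vectors and the document texts are invented for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and the search is an exact, brute-force comparison rather than the approximate nearest neighbor index a database would use to get a similar ranking faster.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: each piece of text is represented by a made-up vector.
documents = {
    "graph databases store nodes and relationships": [0.9, 0.1, 0.0],
    "recipes for sourdough bread": [0.0, 0.2, 0.9],
    "querying connected data with Cypher": [0.8, 0.3, 0.1],
}

def nearest(query_vector, k=2):
    """Rank every document by similarity to the query and keep the top k."""
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# A made-up query vector standing in for an embedded question about graph data.
query = [0.9, 0.15, 0.0]
results = nearest(query)
print(results)
```

With these invented numbers, the two graph-related texts rank above the bread recipe, even though the query shares no keywords with them.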

Keyword searches likewise attempt to discover similar data. Vector searches, however, provide faster and more relevant results, according to Neo4j.

In addition, customers can improve the accuracy of generative AI models and semantic searches by using vectors to index previously unstructured data. Language models and semantic searches tend to favor recent data. Indexing lets users bring in data that might otherwise be ignored, which improves the accuracy of both.

Based in San Mateo, Calif., Neo4j is a graph database vendor whose platform enables customers to access and use data in different ways than traditional relational databases.

Graph databases simplify connections between data points, enabling them to connect with more than one other data point at a time to more quickly discover and combine data from multiple sources and speed the process of turning data into insights and actions. Relational databases enable data points to connect to just one other data point at a time.
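As a rough illustration of how connected data points can be followed across several hops, the sketch below stores a tiny graph as a Python dictionary and walks it breadth-first. The node names and the `reachable` helper are hypothetical. A graph database stores and traverses relationships natively and would not work this way internally.

```python
from collections import deque

# Hypothetical graph: each node maps to the nodes it connects to directly.
graph = {
    "customer:alice": ["order:1001", "account:savings"],
    "order:1001": ["product:laptop"],
    "account:savings": [],
    "product:laptop": ["supplier:acme"],
    "supplier:acme": [],
}

def reachable(start, max_hops):
    """Breadth-first search: map each node within max_hops of start to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

connections = reachable("customer:alice", max_hops=2)
print(connections)
```

Starting from one customer, a single traversal reaches the order, the account and the product two hops away, while the supplier three hops out stays outside the limit.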

TigerGraph is another graph database specialist, while tech giants including AWS and Oracle also offer graph databases.

New capabilities

In June, Neo4j unveiled an integration with Google's Vertex AI that enables users to improve their knowledge graphs with generative AI.

Through the integration, Neo4j customers can now use natural language to interact with knowledge graphs rather than code, use Vertex AI to transform unstructured data into knowledge graphs, enrich existing knowledge graphs with generative AI and validate responses from large language models (LLMs) to ensure hallucinations don't result in decisions based on bad data.

Building on the generative AI capabilities added through its integration with Vertex AI -- as well as through ongoing relationships with OpenAI, Microsoft and AWS -- Neo4j on August 22 added vector search to its core database capabilities.

Vector search isn't, on its own, a generative AI capability. But it improves the accuracy of both generative AI models and semantic searches.

For that reason, vector search is a significant addition to Neo4j's core capabilities, according to Doug Henschen, an analyst at Constellation Research.

"There's clearly a broad sense -- among database vendors and customers alike -- that vector search capabilities should be a feature within the databases that customers are already using to manage their data," he said. "This feature will give Neo4j customers an opportunity to inform and improve the accuracy of semantic search and generative AI capabilities."

Neo4j is not alone, however, in adding vector search, Henschen continued.

Given that vector search improves the accuracy of semantic searches and generative AI applications -- and that LLMs, such as ChatGPT and Google Bard, sometimes hallucinate and are subject to security risks -- many database vendors are making vector search a core capability.

Henschen noted that Alibaba, AWS, Cassandra, Cockroach Labs, DataStax and Dremio are among those that have already added vector search to their database capabilities and that more vendors have vector search capabilities in development.

"In announcing this feature, Neo4j is joining a rapidly expanding group of database and data platform companies that have recently made, or are about to make, vector search-related announcements," he said.

One of the key decisions Neo4j had to make was whether to make vector search and storage core capabilities of its existing database or develop a new database specializing in vector search and storage, according to Sudhir Hasbe, the vendor's chief product officer.

After recognizing the growing interest in generative AI, Neo4j canvassed about a dozen of its major customers for input on how to incorporate generative AI.

Customers told the vendor they wanted to use natural language to ask questions of their knowledge graphs, Hasbe said. They wanted Neo4j to be agnostic in terms of integrating with generative AI vendors. They wanted their generative AI models to have the long-term memories provided by vector storage rather than be trained using only recent data.

And they wanted it all in one place.

"We had to ask whether a vector database should be a different category or whether it should be a feature of an existing database," Hasbe said. "Based on feedback, it made sense to make vector search a feature of our database wherein you can take explicit relationships and implicit similarities and combine them for a single use case. Keeping different environments didn't seem like the right solution."
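One way to picture combining explicit relationships with implicit similarities is the sketch below. It is an assumption-laden illustration, not Neo4j's method: the `purchased` mapping stands in for relationships stored in a graph, the two-dimensional `embeddings` are invented stand-ins for stored vectors, and `recommend` is a hypothetical helper.

```python
import math

# Explicit relationships (hypothetical): which customer bought which product.
purchased = {
    "alice": ["trail shoes"],
    "bob": ["espresso machine"],
}

# Implicit similarity (hypothetical): made-up vectors describing each product.
embeddings = {
    "trail shoes": [0.9, 0.1],
    "running socks": [0.8, 0.3],
    "espresso machine": [0.1, 0.9],
    "coffee grinder": [0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend(customer, k=1):
    """Follow the customer's purchase relationships, then rank unowned products by vector similarity."""
    owned = set(purchased[customer])
    scored = []
    for product, vector in embeddings.items():
        if product in owned:
            continue
        # Score each candidate by its closest match among the products already bought.
        best = max(cosine(vector, embeddings[item]) for item in owned)
        scored.append((best, product))
    scored.sort(reverse=True)
    return [product for _, product in scored[:k]]

alice_pick = recommend("alice")
bob_pick = recommend("bob")
print(alice_pick, bob_pick)
```

The relationship lookup decides whose history matters, and the vector comparison decides which unowned product is closest to it. Both steps run against one data store here, which is the single-environment arrangement Hasbe describes.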

An organization's datasets are displayed in a graph database from Neo4j.

Future plans

Following its recent integration with Vertex AI and the addition of vector search and storage, Neo4j plans to continue adding generative AI functionality, according to Hasbe.

Vector search and storage were built on top of the integration with Vertex AI, adding similarity searches and vector storage to the unstructured data conversion capabilities provided by Google.

One of its main objectives is to remain platform agnostic by working as closely with OpenAI's generative AI systems, Azure OpenAI and AWS Bedrock as it does with Vertex AI, Hasbe noted.

"We want to make the capabilities available everywhere," he said.

Another objective may be to develop a text-to-Cypher tool, Hasbe continued.

Cypher is Neo4j's graph query language. Just as vendors including Monte Carlo and Dremio have built text-to-SQL tools that convert natural language to SQL code, Neo4j is considering developing a tool that converts natural language to Cypher code.

Beyond generative AI, Neo4j is focused on integrating with all the major cloud data platforms, according to Hasbe.

"All of our customers are moving to the cloud, so making sure that all of our offerings are on all cloud platforms and completely integrated with all the cloud providers as well as the leading data platforms is a big priority," he said.

Henschen, meanwhile, noted that a multi-cloud focus is important for Neo4j.

Specifically, he said he'd like Neo4j to add to the multi-cloud and cross-cloud capabilities of AuraDB, the vendor's fully managed graph database service.

In addition, he noted that Neo4j would be wise to continue adding functionality to Graph Data Science -- the vendor's machine learning tool -- to enable more use of generative AI.

"Support for vector search is a start," he said. "But there's more to do to take advantage of generative AI to support and democratize development, optimization, analytics and more."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
