Researchers around the world are racing to find ways to combat the coronavirus pandemic.
Among their many challenges is the need to link and correlate disparate data sets to derive insights. Graph data science is playing a key role in this work, enabling researchers to create knowledge graphs that bring together research findings and even infection data about the coronavirus.
Graph database vendor Neo4j, based in San Mateo, Calif., is helping researchers with its Graph4Good project, which uses Neo4j's namesake graph database for research about COVID-19, the disease caused by the coronavirus.
On April 8, Neo4j took its efforts a step further with the release into general availability of its Neo4j for Graph Data Science system, which combines the Graph Data Science Library, Neo4j Bloom for visualization, the core Neo4j graph database, plus premium support.
Graph Data Science is a layer on top of the Neo4j graph database that brings together machine learning algorithms and data science to predict and analyze relationships. It's a capability that is of particular interest to Alexander Jarasch, head of data and knowledge management at the German Center for Diabetes Research (called the Deutsches Zentrum für Diabetesforschung, or DZD) in Neuherberg, Germany.
Building the COVID-19 knowledge graph
The DZD studies various diseases, with a focus on diabetes. People with diabetes have a higher risk of infection and death from COVID-19, Jarasch noted.
"Why we are doing graphs is because at some points, diseases are connected to each other," he said. "This is one of the most critical reasons why I'm using Neo4j."
While some diseases are connected to each other, not all data is connected. Much scientific research data in Germany and around the world is isolated, Jarasch noted.
"So we have a highly connected space that is currently not connected at all," he said.
DZD is using Neo4j and graph database technology to connect multiple sources of research that otherwise would remain disconnected.
"For the coronavirus we have a data set of more than 40,000 publications and nobody's able to read all of them," Jarasch explained. "So we made a knowledge graph out of it so that people can automatically analyze and learn something from the research literature."
Graph data science and COVID-19
The DZD has started to use the Neo4j Graph Data Science Library for some of its raw patient data for diabetes research and is now set to use it for COVID-19 research as well.
"This [Graph Data Science Library] is the next step now, since we have the database ready and we have the connections that we wanted to have," Jarasch said. "So the next step is now to run some algorithms on the graph to find some interesting information."
Jarasch described the Graph Data Science capability as a "hypothesis generator" for testing whether certain connections are possible. He said he is hopeful that researchers can discover connections that haven't been seen before in the data set. A key challenge is to build connections that are relevant. As with any type of data science, the quality of the data is critical.
"I am not using the term big data; rather, I'm using the word smart data," he said. "So it doesn't have to necessarily be a big data set, as long as you have data sources that are relevant and then we're able to run some intelligent algorithms to find a new hypothesis."
How Neo4j Graph Data Science works
The Graph Data Science Library uses the data that is already stored in the Neo4j graph database to make inferences and derive insights, said Alicia Frame, lead product manager for data science at Neo4j.
The core Neo4j database is useful for transactional reads and writes and enables users to run graph-based queries, Frame said. Users come to the database with a question and know how to write the query that will get the right answer. The Graph Data Science Library, by contrast, is a workspace that takes the transactional graph, projects it into memory and uses optimized data structures that support the execution of graph algorithms.
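The idea of projecting a stored graph into memory and then running an algorithm on it can be illustrated with a minimal Python sketch. This is not the Neo4j API: the record format, the relationship types and the function names `project` and `degree_centrality` are all assumptions made for illustration, and degree centrality simply stands in for the more sophisticated algorithms a real library would offer.

```python
from collections import defaultdict

# Hypothetical relationship records, as they might be read out of a database.
# All node names and relationship types here are made up for illustration.
stored_relationships = [
    ("study_1", "INVESTIGATES", "diabetes"),
    ("study_1", "INVESTIGATES", "covid19"),
    ("study_2", "INVESTIGATES", "covid19"),
    ("study_3", "INVESTIGATES", "covid19"),
    ("study_3", "CITES", "study_1"),
]


def project(relationships, rel_type):
    """Build an in-memory, undirected adjacency map for one relationship type."""
    adjacency = defaultdict(set)
    for source, kind, target in relationships:
        if kind == rel_type:
            adjacency[source].add(target)
            adjacency[target].add(source)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Count the neighbors of each node, a very simple graph algorithm."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


# Project only the INVESTIGATES relationships, then analyze the projection.
investigates_graph = project(stored_relationships, "INVESTIGATES")
scores = degree_centrality(investigates_graph)
most_connected = max(scores, key=scores.get)
print(most_connected, scores[most_connected])
```

The stored records are left untouched; the algorithm works only on the projected copy, which is the separation between transactional storage and analytics that Frame describes.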
"So you can think of it like machine learning tools that can use one the data that you've got stored in Neo4j," Frame said.
Frame said she sees a number of key areas where the Graph Data Science Library can be useful in efforts to combat COVID-19. One of them is contact tracing, which means identifying all the connection points an infected person has had, although contact tracing in a densely populated place like New York City can be of limited use because people have so many contacts.
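Contact tracing of this kind maps naturally onto a graph traversal. The sketch below is a plain Python illustration rather than anything from Neo4j: the contact records, the names and the `max_hops` cutoff are all hypothetical. It uses a breadth-first search to list everyone within a given number of contact steps of an infected person.

```python
from collections import deque

# Hypothetical contact records: each pair means two people were in contact.
contacts = [
    ("ana", "ben"),
    ("ben", "cho"),
    ("cho", "dee"),
    ("ana", "eli"),
    ("fay", "gus"),
]


def build_contact_graph(pairs):
    """Turn contact pairs into an undirected adjacency map."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def trace_contacts(graph, infected, max_hops):
    """Return {person: hops} for everyone within max_hops of the infected person."""
    hops = {infected: 0}
    queue = deque([infected])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in graph.get(person, ()):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)
    del hops[infected]  # report contacts only, not the starting person
    return hops


graph = build_contact_graph(contacts)
exposed = trace_contacts(graph, "ana", max_hops=2)
print(sorted(exposed.items()))
```

The hop limit matters for the reason Frame gives: in a dense contact network, the set of people reached grows quickly with each additional hop.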
In New York in particular, resource allocation and forecasting for health care providers is a critical need, and Graph Data Science can help, Frame said. An algorithm can be used to help determine when a particular health care provider is likely to be overwhelmed and then recommend an appropriate alternative, she said.
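One way such a recommendation could work is to compare providers by how much the communities they serve overlap. The following Python sketch is a simplified illustration under stated assumptions, not Neo4j's implementation: the provider names, service areas, load figures and the 0.9 load threshold are all invented, and Jaccard similarity is just one common overlap measure.

```python
# Hypothetical data: the areas each provider serves and its current load (0 to 1).
served_areas = {
    "clinic_a": {"north", "east", "center"},
    "clinic_b": {"north", "east"},
    "clinic_c": {"south", "center"},
    "clinic_d": {"north", "east", "center", "west"},
}
load = {"clinic_a": 0.97, "clinic_b": 0.60, "clinic_c": 0.40, "clinic_d": 0.95}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_alternative(provider, served_areas, load, max_load=0.9):
    """Pick the most similar provider whose load is below max_load, if any."""
    candidates = [
        (jaccard(served_areas[provider], areas), name)
        for name, areas in served_areas.items()
        if name != provider and load[name] < max_load
    ]
    if not candidates:
        return None
    return max(candidates)[1]


alternative = recommend_alternative("clinic_a", served_areas, load)
print(alternative)
```

In this toy data the provider with the greatest overlap is itself near capacity, so it is filtered out and the next most similar provider is recommended instead. A production system would likely consider the wider graph structure rather than direct overlap alone.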
"I can use a similarity algorithm to say, 'Oh, I know this provider is likely to be overwhelmed and I need to find the [best] provider for them based on the global topology to recommend as an alternative to their patient community,'" Frame said.