michelangelus - Fotolia
Apache Rya matures open source triple store database
Open source triple store database technology used by the U.S. Navy moves forward as a stable, mature project at the Apache Software Foundation.
The open source Apache Rya database effort is continuing to move forward as it reaches a new level of project maturity and acceptance.
Rya (pronounced "ree-uh") is an RDF (resource description framework) triple store database. The project started at the U.S. government's Laboratory for Telecommunication Sciences with an initial research paper published in 2012.
The project joined the Apache Software Foundation (ASF) in 2015 as an incubated project, and in September 2019 achieved what is known as Top-Level Project status. The Top-Level status is an indication and validation of the project's maturity, code quality and community. The ASF is home to Hadoop, Spark and other widely used database and data management programs.
Among Rya's users is the U.S. Navy, which is using the open source technology in a number of efforts, including one for autonomous drones. The project got started because there was a need for a scalable RDF triple store and no existing system that met all requirements, said Adina Crainiceanu, vice president of Apache Rya and associate professor of computer science at the U.S. Naval Academy.
Apache Rya triple store
The name Triple Store refers to the structure of the database table. The three items (the "triple") are subject, predicate and object. The triple store works with RDF, which is a W3C (World Wide Web Consortium) standard that was originally for creating metadata for pages on the internet that can be used to express how one page connects to another. RDF is now used in a variety of ways because of its simple format.
An example of the triple store format can be found in the statement "the sky Is blue," in which "sky" is the subject, "is" acts as the predicate and "blue" is the object.
Adina CrainiceanuVice president of Apache Rya and associate professor of computer science at the U.S. Naval Academy
"You can think of each element of the triple store as columns, and then each triple is a row and the entire database is one table," Crainiceanu explained. "You can express a large variety of data in this format and I think the simplicity of the format is the appeal of it."
The SPARQL query language is used for querying RDF. According to Crainiceanu, the complexity in implementing a triple store comes in how to make SPARQL queries fast, because you have so much data and the format is simple. That's where Rya fits in, trying to make SPARQL queries fast for triple store databases. From an architecture standpoint, Rya can be deployed on top of Apache Accumulo, which is a distributed key/value store and can optionally also use MongoDB as a data store.
Using Apache Rya
One Apache Rya user is Modus Operandi, based in Melbourne, Fla. The data management vendor, which serves U.S. defense and intelligence agencies, was one of the early adopters of Rya.
Modus Operandi has deployed Rya using both the Apache Accumulo and MongoDB backends, for storing triples to express extracted concepts, entities and relationships for the Modus Operandi Analyst Workbench product, said Kim Ziehlke, the company's principal systems engineer. Ziehlke said Modus Operandi employs what she referred to as "living intelligence" that is highly linked, enriched and collaboratively sourced data that is up to date, relevant, rated for trustworthiness and presented in a form that enables orchestrated human and machine analysis.
"Rya is scalable and provides a great option for use of semantic triple storage needs," Ziehlke said. "Overall performance has been great for load and query speed."
Bringing Rya to the Apache Software Foundation has been a positive experience for Crainiceanu, she said, though she acknowledged that building a community is hard as it takes time to bring in developers who want to contribute and help grow an open source project. That said, she emphasized that mentors in the incubation experience at Apache helped the project, providing needed advice to help grow the community and mature the project.
Apache Rya's priority now is continuing to build the community as well as the technology. Crainiceanu said that a key focus for future releases of Rya is standardization. One such effort is with Rya's implementation of the SPARQL standard, which has gone through multiple iterations in recent years. The goal for a future Rya update is to make sure it conforms to the latest SPARQL standard.