By definition, a multi-model database supports multiple data models for different use cases and user needs. Among the popular multi-model options is ArangoDB, from the open source database vendor of the same name.
ArangoDB 3.6, released into general availability Jan. 8, brings a series of updates to the multi-model database platform, including faster queries and faster overall database operations. Also new is OneShard, a feature from the San Mateo, Calif.-based vendor that keeps a database's data on a single node while synchronously replicating it to other nodes for fault tolerance.
For Kaseware, based in Denver, ArangoDB has been a core element since the company was founded in 2016, enabling the law enforcement software vendor's case management system.
"I specifically sought out a multi-model database because for me, that simplified things," said Scott Baugher, the co-founder, president and CTO of Kaseware, and a former FBI special agent. "I had fewer technologies in my stack, which meant fewer things to keep updated and patched."
Kaseware uses ArangoDB as a document, key/value, and graph database. Baugher noted that the one other database the company uses is ElasticSearch, for its full-text search capabilities. Kaseware uses ElasticSearch because until fairly recently, ArangoDB did not offer full-text search capabilities, he said.
"If I were starting Kaseware over again now, I'd take a very hard look at eliminating ElasticSearch from our stack as well," Baugher said. "I say that not because ElasticSearch isn't a great product, but it would allow me to even further simplify my deployment stack."
Adding OneShard to ArangoDB 3.6
With OneShard, users gain a new option for database distribution. The feature is aimed at users whose data is small enough to fit on a single node but who still need fault tolerance, which requires replicating the data across multiple nodes, said Joerg Schad, head of engineering and machine learning at ArangoDB.
"ArangoDB will basically colocate all data on a single node and hence offer local performance and transactions as queries can be evaluated on a single node," Schad said. "It will still replicate the data synchronously to achieve fault tolerance."
Baugher said he'll be taking a close look at OneShard.
He noted that Kaseware now uses ArangoDB's "resilient single" database setup, which in his view is similar, but less robust.
"One main benefit of OneShard seems to be the synchronous replication of the data to the backup or failover databases versus the asynchronous replication used by the active failover configuration," Baugher said.
Baugher added that OneShard also allows database reads to happen from any database node. This contrasts with active failover, in which reads are limited to the currently active node.
"So for read-heavy applications like ours, OneShard should not only offer performance benefits, but also let us make better use of our standby nodes by having them respond to read traffic," he said.
More performance gains in ArangoDB 3.6
The ArangoDB 3.6 multi-model database also provides users with faster query execution thanks to a new feature for subquery optimization. Schad explained that when writing queries, it is a typical pattern to build a complex query out of multiple simple queries.
"With the improved subquery optimization, ArangoDB optimizes and processes such queries more efficiently by merging them into one which especially improves performance for larger data sizes up to a factor of 28x," he said.
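To make the idea concrete, here is a rough Python sketch of why merging simple queries into one can pay off as data grows. It is a conceptual illustration only and does not reflect ArangoDB's optimizer or query language; the sample data and the function names `per_row_subquery` and `merged_query` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data; field names are illustrative only.
users = [{"id": 1, "name": "ana"}, {"id": 2, "name": "ben"}]
orders = [
    {"user": 1, "total": 10},
    {"user": 2, "total": 5},
    {"user": 1, "total": 7},
]

def per_row_subquery(users, orders):
    """Run a separate inner query for every outer row."""
    result = []
    for u in users:
        # The inner "subquery" rescans all orders for each user.
        matching = [o["total"] for o in orders if o["user"] == u["id"]]
        result.append({"name": u["name"], "totals": matching})
    return result

def merged_query(users, orders):
    """Answer the same question with one pass over each input."""
    totals_by_user = defaultdict(list)
    for o in orders:
        totals_by_user[o["user"]].append(o["total"])
    return [
        {"name": u["name"], "totals": totals_by_user.get(u["id"], [])}
        for u in users
    ]

# Both strategies return the same rows; only the amount of work differs.
assert per_row_subquery(users, orders) == merged_query(users, orders)
```

The per-row version does work proportional to users times orders, while the merged version touches each input once, which is why the gap widens with larger data sets.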
The new database release also enables parallel execution of queries to further improve performance. Schad said that if a query requires data from multiple nodes, ArangoDB 3.6 can run the operations on those nodes concurrently. The result, according to Schad, is an improvement of 30% to 40% for queries involving data across multiple nodes.
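The general pattern behind this kind of parallelism is often called scatter-gather: send the same request to every node at once, then combine the partial results. The Python sketch below shows that pattern under simplifying assumptions; the in-memory `NODES` dictionary and the functions `query_node` and `scatter_gather` are hypothetical stand-ins, not ArangoDB APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "nodes", each holding part of the data.
NODES = {
    "node_a": [3, 8, 1],
    "node_b": [7, 2],
    "node_c": [9, 4, 6],
}

def query_node(name, threshold):
    """Stand-in for a per-node query: return values above the threshold."""
    return [v for v in NODES[name] if v > threshold]

def scatter_gather(threshold):
    """Query all nodes concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        # One task per node; list() collects every partial result.
        parts = list(pool.map(lambda n: query_node(n, threshold), NODES))
    return sorted(v for part in parts for v in part)

print(scatter_gather(5))  # [6, 7, 8, 9]
```

With real network calls, the total wait approaches the slowest single node rather than the sum of all nodes, which is where the gain for multi-node queries comes from.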
Scalability improvements will be at the top of the agenda for the next release of ArangoDB, he said.
"For the upcoming 3.7 release, we are already working on improving the scalability even further for larger data sizes and larger clusters," Schad said.