As applications consume and create data during normal operations, they often need to keep data that would otherwise expire from a cache or other in-memory layer so it can be used later. Traditionally, this wasn't a major issue for enterprises that ran monolithic applications backed by static relational databases.
However, when large web companies discovered that their relational databases could no longer support huge data volumes, they turned to Hadoop, HBase and other NoSQL technologies for petabyte-scale data processing. Meanwhile, the growing use of microservices in dynamic web applications has made it possible to compartmentalize data storage into independently managed modules that use different storage technologies and formats.
The concept of polyglot persistence revolves around this coexistence of multiple data storage types within a single application system. However, the benefits of introducing polyglot persistence into a microservices architecture come with an equal share of challenges, and this article covers both.
The benefits of polyglot persistence for microservices
Because microservices are distributed by default, the supporting architecture should handle data flow between disparate parts of the system using well-defined rules that resolve data conflicts and inconsistencies as they arise.
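One example of such a well-defined rule is last-writer-wins (LWW) conflict resolution. The sketch below is a minimal illustration, assuming each copy of a record carries a write timestamp and the ID of the service that wrote it; the class, function and service names are hypothetical. LWW is simple and deterministic, but it silently discards the older of two concurrent writes, so it only suits data where that loss is acceptable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedValue:
    value: str
    timestamp: float  # when the write happened, e.g. epoch seconds
    node_id: str      # service or replica that produced the write


def resolve_lww(a: VersionedValue, b: VersionedValue) -> VersionedValue:
    """Last-writer-wins: keep the later write. Exact timestamp ties are broken
    by node_id so every replica deterministically picks the same winner."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


def merge_replicas(
    left: dict[str, VersionedValue], right: dict[str, VersionedValue]
) -> dict[str, VersionedValue]:
    """Merge two replicas key by key, applying the same rule everywhere."""
    merged = dict(left)
    for key, incoming in right.items():
        merged[key] = resolve_lww(merged[key], incoming) if key in merged else incoming
    return merged


# Hypothetical example: two services hold diverging copies of order state.
orders_copy = {"order-1": VersionedValue("shipped", 1700000010.0, "orders-svc")}
billing_copy = {
    "order-1": VersionedValue("paid", 1700000005.0, "billing-svc"),
    "order-2": VersionedValue("pending", 1700000007.0, "billing-svc"),
}
merged = merge_replicas(orders_copy, billing_copy)
print(merged["order-1"].value)  # shipped (the later write wins)
```

Because the rule depends only on the data, merging in either order produces the same result, which is what lets independent services converge without coordinating.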
Relational databases are great for applications that handle highly structured data like financial balances, employee directories, health records and insurance information. Business-grade relational database management systems such as Oracle Database and MySQL support ACID (atomicity, consistency, isolation and durability) transactions, which help keep critical transaction data accurate and valid even when an operation fails partway through.
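Atomicity is easiest to see in a balance transfer. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a business-grade RDBMS; the table, account names and `transfer` helper are hypothetical. The credit runs before the debit on purpose: when the debit would overdraw the account, the CHECK constraint fails and the whole transaction, including the credit, is rolled back.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint encodes a business rule: balances may never go negative.
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed; the credit above was rolled back too


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = open_ledger()
ok_small = transfer(conn, "alice", "bob", 30)   # succeeds
ok_large = transfer(conn, "bob", "alice", 500)  # fails: bob would go negative
print(balances(conn))  # {'alice': 70, 'bob': 50}
```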
However, as organizations modernize software-based operations with complex designs such as microservice-based architecture, it becomes critical to give software teams an array of storage options. A relational database is built primarily for structured data with a predefined schema; if the data is unstructured or the data requirements aren't clear, as can happen in complex microservice environments, a strictly relational database may reach the limits of its capabilities.
For many modern cloud-native and mobile-first applications, it's worth adopting polyglot persistence, which lets teams meet the data requirements of individual applications and their underlying services. There are many reasons to adopt a polyglot database approach, including the flexibility to choose a different database for each need, such as subsecond latency or high-volume data analytics.
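In practice, this usually means each microservice owns its data and picks the store that fits it. The sketch below assumes two hypothetical services: an order service with strictly structured data (Python's sqlite3 stands in for a relational database) and a catalog service with loosely structured product data (a dict of JSON documents stands in for a document database such as MongoDB). Neither service reads the other's store directly.

```python
import json
import sqlite3
from typing import Any, Optional


class OrderService:
    """Owns strictly structured data, so it uses a relational store."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL)"
        )

    def place_order(self, order_id: str, total_cents: int) -> None:
        with self._db:  # one transaction per order
            self._db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))

    def total(self, order_id: str) -> Optional[int]:
        row = self._db.execute(
            "SELECT total_cents FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


class CatalogService:
    """Owns loosely structured product data, so it uses a document-style store."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert_product(self, sku: str, doc: dict[str, Any]) -> None:
        # No fixed schema: each product document may carry different fields.
        self._docs[sku] = json.dumps(doc)

    def product(self, sku: str) -> Optional[dict[str, Any]]:
        raw = self._docs.get(sku)
        return json.loads(raw) if raw is not None else None


orders = OrderService()
catalog = CatalogService()
catalog.upsert_product("tee-1", {"name": "T-shirt", "sizes": ["S", "M", "L"]})
catalog.upsert_product("mug-1", {"name": "Mug", "capacity_ml": 350})
orders.place_order("o-100", 1999)
print(orders.total("o-100"), catalog.product("mug-1")["capacity_ml"])  # 1999 350
```

Swapping either backend for a production database changes only the inside of one service, which is the independence that makes polyglot persistence practical in a microservices architecture.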
Polyglot persistence also commoditizes data management. Rather than scaling up onto larger, more expensive servers, many modern databases scale out across clusters of inexpensive commodity hardware and rely on an intelligent, powerful data processing layer to coordinate them. They typically replicate data across nodes and keep standby copies ready in case of failure, so data remains available despite problems with the underlying hardware or individual instances.
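Scaling out with standby copies can be illustrated with a toy partitioned store. This is a minimal in-memory sketch, not how any particular database works: the `ShardedStore` class, node names and replica count are hypothetical. Each key is hashed to a primary node and also written to the next node in line, so a read can fall back to the replica when the primary fails.

```python
import hashlib


class ShardedStore:
    """Toy scale-out store: keys are spread across cheap nodes, and each key is
    written to `replicas` consecutive nodes so a copy survives a node failure."""

    def __init__(self, node_names: list[str], replicas: int = 2) -> None:
        if not 1 <= replicas <= len(node_names):
            raise ValueError("replicas must be between 1 and the number of nodes")
        self.node_names = list(node_names)
        self.data: dict[str, dict] = {name: {} for name in node_names}
        self.replicas = replicas
        self.down: set[str] = set()

    def owners(self, key: str) -> list[str]:
        # Stable hash; Python's built-in hash() for str is salted per process.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.node_names)
        return [self.node_names[(start + i) % len(self.node_names)] for i in range(self.replicas)]

    def mark_down(self, name: str) -> None:
        self.down.add(name)

    def put(self, key: str, value) -> None:
        for name in self.owners(key):
            if name not in self.down:
                self.data[name][key] = value

    def get(self, key: str):
        for name in self.owners(key):  # try the primary first, then its replicas
            if name not in self.down and key in self.data[name]:
                return self.data[name][key]
        raise KeyError(key)


store = ShardedStore(["node-a", "node-b", "node-c"], replicas=2)
store.put("user:42", {"name": "Ada"})
store.mark_down(store.owners("user:42")[0])  # simulate a failed primary
print(store.get("user:42"))  # {'name': 'Ada'}, served by the surviving replica
```

Placing keys by hash modulo node count is a simplification; production systems typically use consistent hashing or range partitioning so that adding a node moves only a fraction of the keys.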
Challenges of polyglot persistence
Polyglot persistence can be either a blessing or a curse for businesses with large-scale data management needs. For all the benefits, it comes with its own set of downsides in terms of consistency, operational cost and data management complexity.
For one, maintaining data consistency across multiple storage locations and formats requires highly proactive storage management. Otherwise, the result can be data silos in which individual teams are stuck working with isolated collections of potentially outdated data. Because data is constantly updated, make sure the individual data stores stay in sync, and consider consistency models such as eventual consistency, in which copies may briefly diverge but converge once all updates have propagated.
The heightened attention that a distributed data storage approach requires also risks placing a heavy burden on database administrators, not to mention the extra licensing and maintenance costs of running multiple databases. When dealing with disparate storage technologies, teams need expertise in each one. Further, each database requires its own operational work, such as managing backups, replication and clustering, which calls for a greater focus on continuous training.
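A common way to keep stores in sync with eventual consistency is to record every change in the source-of-truth store as an event and apply those events to downstream copies. The sketch below assumes a hypothetical catalog service whose outbox queue stands in for a message broker or change-data-capture stream, and a search-index read model that lags behind until events arrive. Versions make the consumer idempotent, so duplicate or stale deliveries are ignored.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProductChanged:
    sku: str
    name: str
    version: int  # monotonically increasing per write


class CatalogWriteModel:
    """Source of truth; every change is also appended to an outbox."""

    def __init__(self) -> None:
        self.products: dict[str, str] = {}
        self.outbox: deque = deque()
        self._version = 0

    def rename(self, sku: str, name: str) -> None:
        self._version += 1
        self.products[sku] = name
        self.outbox.append(ProductChanged(sku, name, self._version))


class SearchReadModel:
    """Downstream copy (e.g. a search index) that converges as events arrive."""

    def __init__(self) -> None:
        self.names: dict[str, str] = {}
        self.applied_version: dict[str, int] = {}

    def apply(self, event: ProductChanged) -> None:
        # Skip stale or duplicate deliveries so replays are safe.
        if event.version <= self.applied_version.get(event.sku, 0):
            return
        self.names[event.sku] = event.name
        self.applied_version[event.sku] = event.version


def deliver(outbox: deque, target: SearchReadModel, max_events: Optional[int] = None) -> None:
    delivered = 0
    while outbox and (max_events is None or delivered < max_events):
        target.apply(outbox.popleft())
        delivered += 1


writer, search = CatalogWriteModel(), SearchReadModel()
writer.rename("tee-1", "T-shirt")
writer.rename("tee-1", "Organic T-shirt")
deliver(writer.outbox, search, max_events=1)
stale = search.names["tee-1"]  # "T-shirt": the read model is briefly behind
deliver(writer.outbox, search)
print(stale, "->", search.names["tee-1"])  # T-shirt -> Organic T-shirt
```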
Database options that support polyglot persistence
There are many strong options to consider, particularly among NoSQL databases, when it comes to supporting polyglot persistence for microservices. While each team should carefully evaluate its own specific needs, here are some examples of the range of options available today:
Redis, Aerospike. Key-value databases store each record as a value retrieved by a unique key, which keeps lookups fast and lets them scale out easily. They work best for large volumes of simple records accessed by key, such as cached data, session state and user profiles.
MongoDB. A popular NoSQL document database that stores data as flexible, JSON-like documents. MongoDB is a good option for data that changes frequently and doesn't require a strict schema, such as product listings and descriptions on an e-commerce site.
Elasticsearch. A search and analytics engine designed for full-text search, often used to power search boxes on e-commerce sites. Elasticsearch also excels at analyzing log data, which is primarily text-based.
InfluxDB. A time-series database built to ingest streaming data in real time and make it easy to analyze, often used for stock tickers and other real-time monitoring applications.
Neo4j. A graph database that is highly capable of analyzing connections between various data points. It is often used for fraud detection to identify suspicious activity across large samples of financial data.
MariaDB. A relational database rooted in MySQL whose ColumnStore engine adds columnar storage for fast analytical queries and efficient data storage. MariaDB is a community-driven project, and its MySQL heritage is an advantage for teams that already run MySQL databases and prioritize compatibility.
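To make the differences between these data models concrete, here is a minimal in-memory sketch of two of them. It is not the API of Elasticsearch or Neo4j; all function names and sample data are hypothetical. The first part builds an inverted index, the basic structure behind full-text search engines, which maps each term to the documents containing it. The second part uses a breadth-first traversal over a small graph of accounts and the devices or cards they share, the kind of connection analysis used in fraud detection.

```python
import re
from collections import defaultdict, deque


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def linked_nodes(edges: list[tuple[str, str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from `start` via shared links."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


products = {
    "p1": "Red cotton T-shirt",
    "p2": "Blue cotton hoodie",
    "p3": "Red ceramic mug",
}
index = build_inverted_index(products)
print(sorted(search_all_terms(index, "red cotton")))  # ['p1']

# Edges connect accounts to the devices and cards they used.
edges = [("acct-1", "device-9"), ("acct-2", "device-9"), ("acct-2", "card-7"),
         ("acct-3", "card-7"), ("acct-4", "device-5")]
ring = sorted(n for n in linked_nodes(edges, "acct-1") if n.startswith("acct-"))
print(ring)  # ['acct-2', 'acct-3']
```

In a relational database, the second query would need a self-join for every hop; graph databases make such multi-hop traversals a first-class operation, which is why they suit this kind of analysis.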