3 ways to prep microservices applications for big data

It's easy for the expansion of big data to spin out of control in a distributed architecture. Think carefully about database management to avoid performance-killing bottlenecks.

Twain Taylor, Twain Taylor Consulting

Published: 16 Oct 2018

With today's data-driven applications, a solid data management strategy can mean the difference between a development project's success and failure. In a microservices application, numerous services consume data, and that data is distributed across multiple locations and varies in type. That's why it's essential to prepare microservices applications to handle big data.

Industry experts typically classify big data using the 3Vs model: volume, variety and velocity. This categorization holds up whether your business runs enterprise applications with large data sets or is a consumer-facing startup with tens of thousands of users on a social app. The microservices model sits at the intersection of big data management and application architecture because it influences how front-end applications interact with back-end data services.

Unlike a monolithic application, a microservices architecture creates numerous granular services, each of which requires access to data. These services can cause bottlenecks in the network as the number of requests peaks. A single request can touch many databases and network ports before the application returns a response.

Use decentralized data stores

In monolithic applications, data is stored centrally in a single location and a single database -- usually, a relational database. This centralization makes data management easy because all of your data resides in one place and you know precisely where to look if something goes wrong. Additionally, if you run out of storage, you can simply buy more. However, this strategy is expensive in the long run: big data keeps growing, and that growth creates additional complexities in data management.

Because everything sits in one centralized location, a monolithic application treats all of your data the same way, whether it's business-critical data or unimportant, rarely used metadata. With microservices, follow a decentralized model for big data storage instead. In this approach, multiple databases store your data according to its importance and type. Frequently, businesses use different types of databases for specific purposes.

For critical data that needs 100% consistency -- think financial transactions, usernames and passwords, and other high-value data -- you need a relational database. However, a relational database isn't the best option for data that changes every second. For transient data or data predominantly in text format, a NoSQL document store database, like MongoDB, is a better choice. The same holds true for data that's required for machine learning algorithms where accuracy isn't as important as quick throughput. Use an in-memory data store to cache frequently used data.
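The caching advice above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production cache: the `TTLCache` class, its `get_or_load` method and the 30-second expiry in the usage note are hypothetical names and values, and a plain in-process dictionary stands in for a real in-memory data store.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for an in-memory data store."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, object]] = {}

    def get_or_load(self, key: str, loader: Callable[[], object]) -> object:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            # Fresh hit: serve from memory and skip the backing database.
            return entry[1]
        # Miss or expired entry: read from the backing database via loader().
        value = loader()
        self._items[key] = (now + self._ttl, value)
        return value
```

A service would wrap its slower database read in the loader, for example `cache.get_or_load("user:42:preferences", load_preferences)` with `cache = TTLCache(ttl_seconds=30)`. The short expiry is the trade-off: reads get faster, but a cached value can be slightly out of date until it expires.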

Consider NoSQL and key-value databases

For example, in a flight booking application, users rarely access customer profile and login data during a session. This type of data is a good candidate for a relational database. In contrast, data about flight fares and schedules is accessed frequently, changes in real time and is central to the user experience. It's bad practice to store these differing data types in the same database.

Therefore, a relational database would ultimately be a bad choice for a flight search application because of the real-time data involved. Instead, use a NoSQL database that is dynamic, faster and less costly -- even though it doesn't enforce immediate data consistency. Finally, data about a user's search history or booking history -- which can be used to target ads -- isn't critical data, so use a database that can crunch large volumes of data in aggregate without causing latency in other databases. A simple key-value database, like Redis, is a good choice for this.
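One way to make the flight booking split explicit is a small routing table that records which kind of database owns each data set. The sketch below is illustrative only: the data set names, the `Store` enum and the `store_for` helper are assumptions made for this example, not part of any real booking system.

```python
from enum import Enum


class Store(Enum):
    RELATIONAL = "relational database"
    NOSQL = "NoSQL database"
    KEY_VALUE = "key-value database"


# Hypothetical data set names for the flight booking example.
ROUTING = {
    "customer_profile": Store.RELATIONAL,  # critical, rarely accessed in a session
    "login": Store.RELATIONAL,
    "flight_fares": Store.NOSQL,           # changes in real time
    "flight_schedules": Store.NOSQL,
    "search_history": Store.KEY_VALUE,     # non-critical, processed in aggregate
    "booking_history": Store.KEY_VALUE,
}


def store_for(data_set: str) -> Store:
    """Return the store assigned to a data set, failing loudly if none is."""
    try:
        return ROUTING[data_set]
    except KeyError:
        raise ValueError(f"no store assigned for data set {data_set!r}") from None
```

Raising an error for an unassigned data set, rather than falling back to a default database, forces the team to make a deliberate storage decision for every new kind of data.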

Remember the one-database-per-service model

A microservices application enables you to provision one database per service. This capability gives each service full access to the data it needs, and you can manage that database so it delivers peak performance, unhampered by other services and databases. Of course, not every service and database can follow this practice. Some databases must be shared by multiple services, and many services will access data from more than one database. Follow the decentralized data model to give each service adequate on-demand access to databases.
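The one-database-per-service idea can be sketched with Python's built-in `sqlite3` module. The two service classes and their table layouts are hypothetical, and a separate in-memory SQLite database stands in for whatever database each service would really own.

```python
import sqlite3
from typing import Optional


class BookingService:
    """Hypothetical service that owns a private bookings database."""

    def __init__(self) -> None:
        # Each connect(":memory:") call creates a separate, private database.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE bookings (id INTEGER PRIMARY KEY, flight TEXT NOT NULL)"
        )

    def add_booking(self, flight: str) -> int:
        with self._db:  # commits on success, rolls back on error
            cur = self._db.execute(
                "INSERT INTO bookings (flight) VALUES (?)", (flight,)
            )
        return cur.lastrowid

    def count(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]


class ProfileService:
    """Hypothetical service that owns a private profiles database."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE profiles (username TEXT PRIMARY KEY, email TEXT NOT NULL)"
        )

    def add_profile(self, username: str, email: str) -> None:
        with self._db:
            self._db.execute(
                "INSERT INTO profiles VALUES (?, ?)", (username, email)
            )

    def email_of(self, username: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT email FROM profiles WHERE username = ?", (username,)
        ).fetchone()
        return row[0] if row else None
```

Neither class can see the other's tables, so a schema change or a heavy query in one service leaves the other untouched. Any data the services need from each other would have to travel through their public methods or APIs.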

When you work with big data, you ideally want options when you choose a database. Large volumes of data with varying complexity make a one-size-fits-all approach to big data management impractical. A microservices architecture gives you the flexibility to build a big data platform that's unique to your application. Your business can decide if it's worth spending more money on critical data that enables important features, like a flight search tool in a flight booking app, and less money on data that's for informational purposes, like user activities. In a monolithic architecture, it's difficult to separate databases and to have the application interact with different types of databases simultaneously. With microservices applications, this is more than just possible -- it's the norm.
