The emergence of globally scalable online services for social networks, streaming content, news distribution and retail significantly changed the requirements for application infrastructure and software architectures. One of the most substantial transformations came in the way systems store, organize and access data.
Legacy relational database management systems (RDBMSes), such as Oracle Database, are a poor match for web apps that require distributed, scale-out cluster infrastructure. NoSQL databases are a better fit for loosely coupled designs, in which application data and executable code are spread across multiple machines and data centers. NoSQL has its origins in the open source community and cloud-native development, and IaaS providers have since built various NoSQL database types to target different data and use cases.
Pros and cons
The benefits of NoSQL databases include the following:
- ability to handle a variety of data types;
- higher performance and lower latency;
- ideal for unstructured data, such as text, images, audio and video;
- better fit for loosely coupled systems that scale horizontally;
- well suited for time series or other streaming data, such as event logs and IoT data;
- availability of different forms of NoSQL systems to match different unstructured data models; and
- access to a wide variety of open source or low-cost implementations that are cheaper to procure and operate than a sophisticated RDBMS.
However, these benefits come at a cost. For example, RDBMSes ensure immediate consistency and reliability with the ACID model: atomicity, consistency, isolation and durability. NoSQL databases follow the BASE model: basically available, soft state and eventual consistency. Also, these nonrelational databases lack built-in mechanisms to check data integrity; such checks must be implemented in external application code. Lastly, there is typically no support for complex SQL operations, such as compound select statements or table joins.
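The last two trade-offs can be illustrated with a minimal Python sketch in which plain dictionaries stand in for two NoSQL collections. The collection names (`customers`, `orders`), the field names and the helper functions are hypothetical and not tied to any particular product; the point is only that the application, not the database, enforces the reference check and performs the join.

```python
# Toy stand-ins for two NoSQL collections (hypothetical names and fields).
customers = {
    "c1": {"name": "Ada"},
    "c2": {"name": "Grace"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 40.0},
    "o2": {"customer_id": "c2", "total": 15.5},
}


def validate_order(order, customers):
    """Integrity check done in application code: the store itself
    would accept an order that points at a missing customer."""
    if order.get("customer_id") not in customers:
        raise ValueError("order references an unknown customer")
    total = order.get("total")
    if not isinstance(total, (int, float)) or total < 0:
        raise ValueError("order total must be a non-negative number")


def insert_order(order_id, order):
    """Validate first, then write; nothing is stored if validation fails."""
    validate_order(order, customers)
    orders[order_id] = order


def orders_with_customer_names():
    """Application-side join: look up each order's customer by key."""
    return [
        {
            "order_id": order_id,
            "name": customers[order["customer_id"]]["name"],
            "total": order["total"],
        }
        for order_id, order in sorted(orders.items())
    ]
```

In a relational database, a foreign key constraint and a join clause would cover both functions. Here, every code path that writes orders has to remember to call the validation step.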
NoSQL database categories
NoSQL is best thought of not as a particular type of database but as a category with several variants, as follows:
- Key-value store: Also known as a hash table, this storage paradigm organizes data as a collection of records indexed by a key or hash value that points to one or more data objects or records. It is similar to a dictionary, in that the value stored under each key can vary in size and number of items rather than having a fixed length.
- In-memory cache: A type of key-value store designed to fit entirely within system RAM. This accelerates performance and potentially reduces cost by removing the need to scale an entire database just to handle a particular application feature or scenario.
- Document store: Although it is a type of key-value database, the values in a document store follow a predefined hierarchical structure that embeds metadata about the stored contents. Documents are often encoded in text formats, such as XML, YAML or JSON, or stored in binary formats, such as Microsoft Office files or PDFs.
- Search database: A specialized document store in which the document indices can be sharded and distributed across multiple nodes to provide massive scalability and accelerate the retrieval of particular entries.
- Column-based store: This store organizes data by columns rather than rows. Columns are grouped into families of related data that are accessed together.
- Graph database: This type of database does away with the common row-column structure in favor of a collection of items and their relationships to each other.
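The differences between these categories are easier to see side by side. The following is a rough Python sketch that models several of them with built-in data structures. All keys, field names and the `neighbors` and `shard_for` helpers are illustrative assumptions, not the API or storage layout of any real product.

```python
import hashlib

# Key-value store: each key maps to a value of any shape or length.
key_value = {
    "session:42": "logged-in",
    "cart:42": ["sku-1", "sku-7", "sku-9"],
}

# Document store: the value is a self-describing, hierarchical document.
documents = {
    "user:42": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "London", "country": "UK"},
    }
}

# Column-based store: values are grouped by column family, so related
# columns can be read together without loading the whole row.
column_families = {
    "profile": {"user:42": {"name": "Ada", "city": "London"}},
    "activity": {"user:42": {"last_login": "2024-01-01", "visits": 17}},
}

# Graph database: items (nodes) plus the relationships (edges) between them.
nodes = {"ada", "grace", "linus"}
edges = [("ada", "FOLLOWS", "grace"), ("grace", "FOLLOWS", "linus")]


def neighbors(node, relation):
    """Return the nodes reachable from `node` over one `relation` edge."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def shard_for(key, num_shards):
    """Search-style sharding: a stable hash of the key picks the node
    that holds its index entry."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

For example, `neighbors("ada", "FOLLOWS")` returns `["grace"]`, and `shard_for("user:42", 4)` always maps the same key to the same one of four shards. An in-memory cache would look like the key-value dictionary above, with the added constraint that everything must fit in RAM.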
NoSQL database comparison
As cloud infrastructure became a popular option for deploying web applications, AWS, Microsoft and Google Cloud built NoSQL services and offerings to better suit different data types and use cases. While particular details of each product will vary, the cloud vendors' offerings for each type of NoSQL database are provided in the following table:
As the NoSQL database comparison table indicates, there are several popular open source and commercial offerings in each NoSQL database type. Each third-party option has particular features and strengths not necessarily present in the cloud alternative. For example, MongoDB can be configured so replicated data is immediately consistent for reads rather than eventually consistent.
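The difference between immediately and eventually consistent reads can be sketched with a toy model. The `ReplicatedStore` class below is a hypothetical in-memory simulation written for illustration only. It is not MongoDB's API, and real replication protocols are considerably more involved.

```python
class ReplicatedStore:
    """Toy primary/replica pair. Writes land on the primary at once and
    reach the replica only when sync() runs (simulated replication lag)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []

    def write(self, key, value):
        """Apply the write to the primary and queue it for the replica."""
        self.primary[key] = value
        self._pending.append((key, value))

    def sync(self):
        """Apply queued writes to the replica, in order."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read(self, key, consistent=False):
        """consistent=True reads the primary (immediately consistent);
        consistent=False reads the replica (eventually consistent)."""
        source = self.primary if consistent else self.replica
        return source.get(key)


store = ReplicatedStore()
store.write("stock", 5)
fresh = store.read("stock", consistent=True)  # 5: sees the write at once
stale = store.read("stock")                   # None: replica has not caught up
store.sync()
caught_up = store.read("stock")               # 5: replica is now consistent
```

The trade-off in this sketch is that consistent reads all go to a single node, while eventually consistent reads can be spread across replicas at the risk of returning stale data.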
However, the overarching difference is the deployment model: privately managed -- on-premises or colocation infrastructure -- versus a cloud service. The choice hinges on whether an organization prefers self-managed, highly configurable and controlled software or a managed cloud service that removes upfront capital expense and ongoing infrastructure management overhead.