How to select the right IoT database architecture
To choose the best database architecture for an organization's IoT initiative, IoT technologists must understand the basics of all the database options.
Organizations have many options to choose from when designing an IoT database, but technologists must decide the best fit by evaluating the different IoT database architectures, such as static vs. streaming and SQL vs. NoSQL.
The right IoT database depends on the requirements of each IoT project. The first step in selecting a database is to factor in the critical characteristics of the IoT deployment. IoT technologists must determine the types of data to be stored and managed; the data flow; the functional requirements for analytics, management and security; and the performance and business requirements.
After identifying the organization's requirements for a database, IT admins must assess the IoT database architectures and how each will support or hinder those requirements.
Understand static and streaming IoT database architectures
Start by understanding the fundamental distinction between static and streaming databases. Static databases, also known as batch databases, manage data at rest. The data that users need to access is stored and managed by a database management system (DBMS). Users submit queries to the DBMS and receive responses; the query language is typically, but not always, SQL. A streaming database handles data in motion. Data constantly streams through the database while a continuous series of queries runs against it, typically in a language specific to the streaming database. The streaming database's output may ultimately be stored elsewhere, such as in the cloud, and accessed via standard query mechanisms.
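The distinction can be illustrated with a minimal Python sketch. It is not taken from any of the products discussed in this article: the sensor readings, device names and the 85-degree threshold are hypothetical, SQLite stands in for the static DBMS, and a plain generator stands in for a streaming engine.

```python
import sqlite3

# Hypothetical sensor readings: (device_id, temperature in degrees Celsius).
READINGS = [("pump-1", 70.0), ("pump-1", 74.0), ("pump-2", 55.0), ("pump-1", 90.0)]


def static_average(readings, device_id):
    """Data at rest: store everything first, then answer a query on demand."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id TEXT, temperature REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    row = conn.execute(
        "SELECT AVG(temperature) FROM readings WHERE device_id = ?", (device_id,)
    ).fetchone()
    conn.close()
    return row[0]


def streaming_alerts(readings, threshold):
    """Data in motion: one standing rule evaluated as each reading arrives."""
    for device_id, temperature in readings:
        if temperature > threshold:
            yield f"{device_id} exceeded {threshold}: {temperature}"


print(static_average(READINGS, "pump-1"))
print(list(streaming_alerts(iter(READINGS), threshold=85.0)))
```

In the static path, the question is asked after the data has landed. In the streaming path, the question exists first and each reading is checked against it as it passes through.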
Streaming databases are typically distributed to handle the scale and load requirements of vast volumes of data. Currently, there is a range of commercial and open source streaming databases, including Google Cloud Dataflow, Microsoft StreamInsight, Azure Stream Analytics, IBM InfoSphere Streams and Amazon Kinesis. Open source systems are largely Apache projects and include Apache Spark Streaming, which is commercially backed by Databricks; Apache Flink, backed by Data Artisans; Apache Kafka, backed by Confluent; and Apache Storm, which Twitter originally open sourced. Organizations mainly use streaming databases for real-time decision-making and to meet near-instantaneous latency requirements.
However, organizations can still benefit from standard query techniques and schemas, which is why many streaming databases also include a static database component. These unified databases combine the strengths of both approaches: the real-time capabilities of a streaming database and the flexibility of a static database's query process and schema. For most IoT applications, the best choice is a unified database that offers both streaming and static capabilities. Most popular vendors' offerings include both types for this reason.
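As a rough illustration of the unified idea, the Python sketch below puts a streaming path and a static path behind one object. The UnifiedPipeline class, its method names and the sample readings are all hypothetical, and SQLite is only a stand-in for the static component; real unified products are distributed systems with far richer query support.

```python
import sqlite3


class UnifiedPipeline:
    """Hypothetical example: one ingest path feeds both real-time rules and stored data."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE readings (device_id TEXT, ts INTEGER, temperature REAL)"
        )

    def ingest(self, device_id, ts, temperature):
        """Streaming path: persist the reading and evaluate the standing rule."""
        self.conn.execute(
            "INSERT INTO readings VALUES (?, ?, ?)", (device_id, ts, temperature)
        )
        return temperature > self.threshold

    def max_temperature(self, device_id):
        """Static path: an ad hoc SQL query over everything stored so far."""
        row = self.conn.execute(
            "SELECT MAX(temperature) FROM readings WHERE device_id = ?", (device_id,)
        ).fetchone()
        return row[0]


pipeline = UnifiedPipeline(threshold=85.0)
alerts = [
    pipeline.ingest("pump-1", ts, temp)
    for ts, temp in [(1, 70.0), (2, 91.0), (3, 80.0)]
]
print(alerts)
print(pipeline.max_temperature("pump-1"))
```

The alert decision is made at ingest time, while the stored rows remain available for questions nobody thought to ask in advance.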
Explore more nuanced database architectures
Time series databases are, in many respects, based on the same technology as streaming databases, but the two were developed with slightly different focuses. Time series databases are more tactical. They typically involve implementing specific indexing techniques over NoSQL databases with the goal of enabling high-performance event processing. Streaming databases are more comprehensive, enabling a broader portfolio of data analyses, such as machine learning or windowing.
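Windowing, one of the analyses mentioned above, groups events into time intervals before aggregating them. The Python sketch below shows a tumbling (fixed, non-overlapping) window average. The function name, the 60-second window and the sample events are assumptions for illustration, and the events are processed from a finite list for simplicity rather than from a live stream.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Average (timestamp, value) events per fixed, non-overlapping window."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Every event belongs to exactly one window, keyed by the window's start time.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Hypothetical events: (seconds since start, sensor value).
EVENTS = [(0, 10.0), (20, 14.0), (65, 20.0), (110, 30.0), (125, 40.0)]

print(tumbling_window_averages(EVENTS, window_seconds=60))
```

A production streaming engine would also have to handle late or out-of-order events and emit each window's result as soon as the window closes, which this sketch does not attempt.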
SQL vs. NoSQL?
SQL databases are relational and feature static schemas that describe how the information is organized. This makes them highly manageable. However, they can be difficult to scale effectively. NoSQL databases are nonrelational, typically don't enforce a fixed schema, and are generally promoted as more scalable and better performing than SQL databases.
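The practical effect of a schema can be seen in a small Python sketch. SQLite plays the relational database here, and a plain Python list stands in for a schemaless document store; the table definition, field names and sample records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema states up front what a valid reading looks like.
conn.execute(
    "CREATE TABLE readings ("
    " device_id TEXT NOT NULL,"
    " temperature REAL NOT NULL)"
)


def insert_relational(conn, record):
    """Return True if the record satisfies the schema, False if it is rejected."""
    try:
        conn.execute(
            "INSERT INTO readings (device_id, temperature) VALUES (?, ?)",
            (record.get("device_id"), record.get("temperature")),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def insert_document(store, record):
    """A schemaless store accepts whatever shape it is given."""
    store.append(dict(record))
    return True


document_store = []
good = {"device_id": "pump-1", "temperature": 71.5}
bad = {"device_id": "pump-2", "humidity": 40}  # no temperature field

print(insert_relational(conn, good), insert_relational(conn, bad))
print(insert_document(document_store, good), insert_document(document_store, bad))
```

The relational table rejects the malformed record at write time, while the document store accepts it and leaves any consistency checks to the application that later reads the data.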
Some tech professionals might think that a NoSQL database would be the obvious choice because scalability is essential for many IoT uses. But scalability and performance are only two factors that technologists need to consider when selecting databases. A critical factor in many scenarios is ease of integration into existing systems, where SQL is more effective. Many IoT tools and systems assume SQL. This is particularly true in industrial environments that are based on older message protocols or industrial automation platforms.
The ability to create and manage schemas is also a plus. Although technologists might find schema development constraining, information must be organized. Putting in the effort to develop schemas up front saves the significant effort otherwise needed later to organize data in a schemaless environment.
The choice between SQL and NoSQL can also complicate the combination of static and streaming databases. In theory, a static or streaming database could be either SQL or NoSQL. In practice, each database product is built as one or the other. IoT technologists interested in a particular unified database may find their SQL vs. NoSQL decision driven by the design of that database.
Whether an organization should choose a SQL or NoSQL database depends on the broader set of functional and technical requirements, particularly scalability, performance and the need to integrate into legacy systems.