IoT data storage: Top technologies and challenges, explained

IoT data storage spans devices, edge facilities, data centers and cloud, using block, object and file technologies to manage high-volume, high-velocity data throughout its lifecycle.

When it comes to dealing with the data generated by IoT systems, it's not just a matter of slapping some extra hard drives in the data center or attaching a few more S3 buckets and being good to go. IoT data storage has specific needs, and IT professionals must carefully plan where and how they'll meet those needs.

What is IoT data storage?

IoT data storage is the systematic collection, organization and maintenance of data generated by connected devices across various storage systems, from edge facilities to central databases and data lakes. It ensures appropriate retention, security and accessibility throughout the data lifecycle.

The IoT data lifecycle mirrors the general transactional data lifecycle in some respects, with a strong focus on how data storage at various stages reflects its intended use. Data is stored one way during acquisition and another way for archival retention -- if it isn't going to be purged. Further, data is stored differently for immediate, high-intensity use and differently again for medium-term, less intense use.
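One way to picture the lifecycle stages described above is as a rule that maps a record's age to a storage tier. The sketch below is illustrative only: the tier names, the `tier_for` function and the three time windows are assumptions made for this example, not values drawn from any particular IoT platform.

```python
from datetime import timedelta
from enum import Enum


class Tier(Enum):
    HOT = "hot"          # immediate, high-intensity use
    WARM = "warm"        # medium-term, less intense use
    ARCHIVE = "archive"  # long-term retention
    PURGE = "purge"      # eligible for deletion


# Hypothetical thresholds; real values depend on the use case and on
# any retention rules that apply to the data.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)
ARCHIVE_WINDOW = timedelta(days=365 * 7)


def tier_for(age: timedelta, must_retain: bool) -> Tier:
    """Pick a storage tier from a record's age and its retention flag."""
    if age <= HOT_WINDOW:
        return Tier.HOT
    if age <= WARM_WINDOW:
        return Tier.WARM
    if must_retain and age <= ARCHIVE_WINDOW:
        return Tier.ARCHIVE
    return Tier.PURGE
```

A record that is 30 days old would land in the warm tier under these assumed windows, while a 400-day-old record would be archived only if it is flagged for retention.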

IoT adds its own challenges because it's as much about where as how. At each stage of the broader IoT data lifecycle, from collection to deletion or archiving, IT must consider where data will be stored just as carefully as how it will be kept while there.

IoT ecosystem diagram
An IoT system collects data from sensors installed in IoT devices and transfers that data through an IoT gateway so it can be analyzed by an application or back-end system.

Types of IoT data storage

There are four types of IoT data storage: on a device, at an edge facility, in a data center or in the cloud.

Device

Because IoT systems revolve around connected devices, whether it's an implanted medical sensor or a shipping container, the first location where IoT data is stored is on the device itself. Storage might be limited to a single data point, such as a 4-bit temperature reading or a multi-megapixel image from a satellite-mounted camera. IT often has little control over this part of the storage process, although if the device has available capacity, the IoT platform might permit IT to manage data retention on it.

Edge

Many IoT systems are built to send data to either a controller or an aggregation unit located in an edge data center. At the edge, data can be preprocessed in various ways and then sent -- raw, condensed or otherwise modified -- onward to a cloud or data center for use. Edge data centers have limited space available, which constrains how much data they can hold for any length of time.
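As a rough illustration of what "condensed" can mean at the edge, the sketch below groups raw readings into fixed time windows and keeps only a summary of each. The `Summary` record, the `condense` function and the 60-second default window are hypothetical choices for this example; real aggregators might downsample, filter or compress instead.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Summary:
    window_start: int  # first second covered by the window
    count: int
    minimum: float
    maximum: float
    average: float


def condense(readings, window_seconds=60):
    """Group (timestamp, value) pairs into fixed windows and summarize each."""
    buckets = {}
    for timestamp, value in readings:
        start = timestamp - (timestamp % window_seconds)
        buckets.setdefault(start, []).append(value)
    return [
        Summary(start, len(vals), min(vals), max(vals), mean(vals))
        for start, vals in sorted(buckets.items())
    ]
```

Sending one summary per window instead of every reading trades detail for a smaller payload, which is the kind of decision edge preprocessing forces.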

Data center

If IoT data is stored on-premises, it typically flows into a primary data center for analysis and use, both for short-term, high-intensity activities such as transaction processing or real-time process control and for medium-term, lower-intensity tasks like analyzing historical trends. Long-term archival storage of data might also be kept in the data center, but it's more commonly housed off-site, in another facility or in the cloud.

Cloud

A lot of IoT data flows from devices or edge aggregators directly into cloud platforms, whether under the control of the enterprise or under the control of an IoT service provider.

Technologies for IoT data storage

Any type of storage could be suitable, depending on the IoT system's needs regarding data velocity, volume and variety, as well as the phase in the data lifecycle when the storage is used.

Block

Block storage is the lowest-level access to storage space and is best where the highest-speed access to data is required, such as in real-time transaction processing or real-time device control. A relational database usually manages data in block storage, with access via direct storage device addressing.

Object

Object storage is higher in the data hierarchy, offering space to store files (objects), along with their metadata. It doesn't organize objects into a directory hierarchy; instead, each object lives in a flat namespace and is identified by a unique ID. Although not as fast as block storage, object storage is very IoT-friendly, especially for handling unstructured data. Access is through an API using the object ID.
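The access pattern can be sketched with a toy in-memory store. The `ObjectStore` class below is a stand-in written for this example and does not mirror the API of any real object storage service; it only shows data and metadata being stored together and retrieved by ID.

```python
import uuid


class ObjectStore:
    """Toy in-memory stand-in for an object store (not a real service API)."""

    def __init__(self):
        self._objects = {}  # flat namespace: object ID -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        """Store the data with its metadata and return a new unique ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str):
        """Return the (data, metadata) pair stored under the given ID."""
        return self._objects[object_id]
```

A caller keeps the ID returned by `put` and presents it later to `get`; there is no folder path to navigate.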

File

File storage has more structure than object storage. It enables hierarchical folder structures containing named files and other ways of organizing information. File storage also allows metadata to be associated with files, with access via APIs or system utilities, using file names or file IDs.

Any storage technology might be used in an edge or central data center, or in a cloud, though object storage tends to dominate in cloud platforms.

Technologies for IoT data management

Holding data is only one part of data storage; managing that data is another. IoT systems use all the major data management technologies available, including the following:

  • Time series databases. These excel at holding a series of data points generated by the same device at different times -- for example, a series of temperatures from a thermostat or a series of spectral fingerprints generated by a gas chromatograph. Each data point is structured identically and is distinguished from others by value and by timestamp or sequence number. These databases are ideal for use cases like trending analysis and anomaly detection.
  • Streaming databases. Like time series databases, streaming databases are intended to capture a continuous flow of data, but there's no expectation that the data is structured. These are often used to capture streams of images, videos or sounds.
  • Relational databases. These store highly structured data. They are most often used in the middle portion of an IoT lifecycle, where structured data derived from unstructured feeds is analyzed for non-real-time purposes.
  • NoSQL databases. These are for structured, semistructured and unstructured data, and their growth can be unbounded -- that is, you can keep adding more data to them. However, NoSQL databases don't make the same kinds of promises that relational databases do regarding data operations -- the so-called "ACID constraints" of atomicity, consistency, isolation and durability.
  • Data lakes. Like NoSQL databases, data lakes are designed to hold structured, semistructured and unstructured data alike. However, data lakes aren't databases in the strict sense, nor are they engines for analysis. Rather, they serve as repositories where raw data can be collected in a way that makes it easy to access data from multiple sources in one location. Data is then analyzed and queried using other tools, with the data lake acting as a powerful front end for object storage.
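To make the time series case in the list above concrete, the sketch below stores identically structured readings keyed by device and timestamp, then flags values that sit far from the mean. It uses Python's built-in `sqlite3` module purely as a convenient stand-in, not as a time series database, and the table layout, function names and two-standard-deviation threshold are assumptions for illustration.

```python
import sqlite3
from statistics import mean, pstdev


def load_readings(rows):
    """Store (device_id, ts, value) rows in an in-memory table keyed by time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        " device_id TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"
        " value REAL NOT NULL,"
        " PRIMARY KEY (device_id, ts))"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


def anomalies(conn, device_id, threshold=2.0):
    """Return (ts, value) pairs more than `threshold` std devs from the mean."""
    series = conn.execute(
        "SELECT ts, value FROM readings WHERE device_id = ? ORDER BY ts",
        (device_id,),
    ).fetchall()
    if not series:
        return []
    values = [value for _, value in series]
    center, spread = mean(values), pstdev(values)
    if spread == 0:
        return []
    return [(ts, v) for ts, v in series if abs(v - center) > threshold * spread]
```

Purpose-built time series databases add features such as retention policies and downsampling, but the core idea of ordering identical records by time is the same.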

Challenges and key considerations for IoT data storage

IoT data challenges are often the same fundamental challenges as those of any big data problem because so many IoT systems generate big data.

Volume

It can be difficult to provide storage in each part of the infrastructure -- from devices to the edge, data center and cloud -- that can handle the volume of data generated. Space limitations at the device and edge levels make handling large data volumes challenging, often leading to the practice of storing as little as possible locally and relying on centralized storage. Even when capacity is sufficient, the expense of providing that capacity can become a second problem.
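A back-of-the-envelope calculation can show how quickly volume adds up. The functions and figures below are hypothetical and ignore compression, indexing overhead and protocol framing, so treat the results as an order-of-magnitude guide only.

```python
SECONDS_PER_DAY = 86_400


def daily_bytes(devices, bytes_per_reading, readings_per_second):
    """Estimate the raw bytes a fleet of devices generates per day."""
    return devices * bytes_per_reading * readings_per_second * SECONDS_PER_DAY


def retained_bytes(per_day, retention_days, replicas=1):
    """Estimate the capacity needed to keep that data, including copies."""
    return per_day * retention_days * replicas
```

Under these assumptions, 1,000 devices each sending one 100-byte reading per second produce 8.64 GB per day, and keeping 30 days of that data in three copies needs roughly 778 GB.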

Velocity

IoT systems not only generate large amounts of data but can do so very quickly. For example, a well-instrumented airplane tire might produce gigabytes of data during a single flight, whereas an engine on the same flight could produce gigabytes every second. Therefore, storage systems must be capable of ingesting data at the same speed as devices supply it, which poses a challenge for both the storage systems and the networks connecting them to IoT devices and controllers. Additionally, they must support analytics using the stored data at whatever speed the use case demands, ranging from near-real-time analysis to immediate response from, for example, industrial robots.
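The consequence of storage ingesting more slowly than devices produce can be sketched with a small simulation of a bounded buffer. The `simulate` function and its parameters are invented for this example; real systems might apply backpressure, spill to disk or sample the stream rather than drop data.

```python
from collections import deque


def simulate(produced_per_tick, drained_per_tick, buffer_size, ticks):
    """Return (stored, dropped) counts for a bounded ingest buffer."""
    buffer = deque()
    stored = dropped = 0
    for _ in range(ticks):
        # Devices hand over new records; anything beyond capacity is lost.
        for _ in range(produced_per_tick):
            if len(buffer) < buffer_size:
                buffer.append(1)
            else:
                dropped += 1
        # Storage drains as many records as it can write in this tick.
        for _ in range(min(drained_per_tick, len(buffer))):
            buffer.popleft()
            stored += 1
    return stored, dropped
```

When production outpaces draining, the buffer fills and the drop count grows every tick; when draining keeps up, nothing is lost.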

Variety

IoT storage systems and data management systems need to handle a variety of data types being generated. At the edge, this is usually not a problem because each device typically communicates only with its own controllers or aggregators. In data centers and the cloud, however, data from multiple systems can be combined with non-IoT data for analysis, so storage and data management there must accommodate many formats at once.
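One common way to cope with mixed data types is to wrap every payload in a shared record shape before storing it. The envelope fields and the `to_envelope` function below are assumptions made for this sketch, not a standard format.

```python
import json


def to_envelope(source: str, payload) -> dict:
    """Wrap a payload of any supported type in a common record shape."""
    if isinstance(payload, (bytes, bytearray)):
        # Binary data such as images is hex-encoded for this toy example.
        kind, body = "binary", bytes(payload).hex()
    elif isinstance(payload, (dict, list)):
        kind, body = "structured", json.dumps(payload, sort_keys=True)
    else:
        kind, body = "text", str(payload)
    return {"source": source, "kind": kind, "body": body}
```

Downstream tools can then route records by `kind` without needing to know which device family produced them.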

Because it combines all the challenges of other enterprise data storage and management with some complications of its own, IoT data storage and management requires careful consideration each time a new IoT project is launched.

John Burke is CTO and a research analyst at Nemertes Research. Burke joined Nemertes in 2005 with nearly two decades of technology experience. He has worked at all levels of IT, including as an end-user support specialist, programmer, system administrator, database specialist, network administrator, network architect and systems architect.
