IoT data collection: When time is of the essence
Data collection is an essential part of IoT deployments, but the data must also be accurate and useful to operations. Timestamping events in nanoseconds helps ensure no anomaly goes undetected.
A nanosecond is an extraordinarily small division of time. There are as many nanoseconds in one second as there are seconds in 31.7 years. Just let that sink in.
In technology, nanosecond precision is valuable for monitoring specific, high-speed and low-latency processes, such as memory access times and gaps between network packets. Even in these cases, values are typically expressed in tens or hundreds of nanoseconds and are often normalized to milliseconds or seconds for regular use.
Within IoT, very few measurements truly need nanosecond precision for daily use. The most precise analog-to-digital converters sample signals around 1 million times per second, and most off-the-shelf sensors work at much lower sampling rates, such as 16 kHz, or 16,000 samples per second.
At those sampling rates, consecutive samples are microseconds apart, so a single nanosecond isn't even on the radar. Even so, the extra timestamp precision is useful for IoT use cases and can help teams detect operational anomalies.
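To put those sampling rates in perspective, the gap between consecutive samples can be worked out directly. The short Python sketch below is illustrative only; the function name `sample_period_ns` is our own, and the two rates are the ones quoted above.

```python
NS_PER_SECOND = 1_000_000_000


def sample_period_ns(rate_hz: int) -> float:
    """Return the time between consecutive samples, in nanoseconds."""
    return NS_PER_SECOND / rate_hz


for label, rate_hz in [("16 kHz sensor", 16_000), ("1 MHz converter", 1_000_000)]:
    print(f"{label}: {sample_period_ns(rate_hz):,.0f} ns between samples")
# 16 kHz sensor: 62,500 ns between samples
# 1 MHz converter: 1,000 ns between samples
```

Even the faster device leaves 1,000 nanoseconds between readings, which is why nanosecond resolution matters more for labeling when something happened than for the sampling itself.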
If nanoseconds aren't always used, why do the best IoT and time series data offerings support nanosecond precision? Knowing the differences between discrete signals and events can help IT teams figure out what data they're missing without the use of Precision Time Protocol -- and why such granular data collection is important.
What are discrete signals?
Discrete signals are sequences of digital information that represent some continuous information in the physical realm. For example, a smart power meter attached to a building's energy management system measures continuous analog signals, such as voltage, frequency and current, and outputs the measured values at very precise time intervals and very high rates.
IoT regularly uses discrete signals. Every temperature, pressure, humidity and location data point in an IoT application is a sample from a specific point in time. Quite often, and usually behind the scenes, nanosecond precision is lost through summarization, aggregation and compression techniques.
Developers and engineers design these techniques to minimize the number of samples but still maintain an accurate and useful representation of the original signal. One example of this is the swinging door algorithm. This downsampling technique buffers samples and outputs sequences of samples that represent an optimized -- but visually and mathematically similar -- trend line of the original information.
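As a rough illustration of the idea, here is a minimal Python sketch of one common formulation of swinging door compression. It is a sketch under stated assumptions, not a reference implementation: the function name `swinging_door`, the `deviation` parameter and the sample data are our own, timestamps are assumed to be strictly increasing, and production historians typically add refinements such as a maximum time between stored points.

```python
def swinging_door(samples, deviation):
    """Downsample (time, value) pairs with a swinging-door style corridor test.

    A sample is dropped while at least one straight line from the last kept
    point passes within +/- deviation of every sample seen since that point.
    Assumes strictly increasing timestamps and deviation > 0.
    """
    if len(samples) <= 2:
        return list(samples)

    kept = [samples[0]]
    anchor_t, anchor_v = samples[0]
    slope_hi = float("inf")    # steepest slope still inside the corridor
    slope_lo = float("-inf")   # shallowest slope still inside the corridor
    prev = samples[0]

    for t, v in samples[1:]:
        dt = t - anchor_t
        new_hi = min(slope_hi, (v + deviation - anchor_v) / dt)
        new_lo = max(slope_lo, (v - deviation - anchor_v) / dt)
        if new_lo > new_hi:
            # The doors have swung past each other: keep the previous
            # sample and restart the corridor from it.
            kept.append(prev)
            anchor_t, anchor_v = prev
            dt = t - anchor_t
            slope_hi = (v + deviation - anchor_v) / dt
            slope_lo = (v - deviation - anchor_v) / dt
        else:
            slope_hi, slope_lo = new_hi, new_lo
        prev = (t, v)

    kept.append(samples[-1])  # always keep the final sample
    return kept


# Hypothetical data: a steady ramp, a sudden step, then the ramp resumes.
raw = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 10.0), (5, 11.0), (6, 12.0)]
print(swinging_door(raw, deviation=0.5))
# [(0, 0.0), (3, 3.0), (4, 10.0), (6, 12.0)]
```

Seven samples shrink to four, and the points on either side of the step survive with their original timestamps intact. The trade-off is that the discarded samples cannot be recovered later.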
In time series data, events are the "so what" of discrete signals. In the electrical meter example, a voltage loss or current drop is considered an event. If one views the highest-resolution trend of these discrete signals, the trend line might show a visible anomaly, such as a spike or drop.
The X axis in this example shows that the waveform is measured on a timeline relative to t0, and for our purposes, we can think of the major divisions as nanoseconds. Between 200 and 300 nanoseconds from t0, there is a change in the waveform -- an event that we would want to flag for future investigation. This is where everything comes together.
On its own, t0+200ns has no meaning outside of this specific chart. It is only when we substitute t0 with an actual timestamp that the anomaly is truly put in context. For example, if we know that t0 represents Saturday, October 26, 1985, at 01:35:18.667 a.m. GMT, we can calculate the exact time of the anomaly and label the event with this GMT timestamp as 499138518667000200. This very large, very precise number is the number of nanoseconds between the Unix epoch (midnight GMT on January 1, 1970) and our event.
Once we have the precise instant of the anomaly, we can learn more about it by putting it in context with other events and discrete signals. To get this information, we may query what the waveform looked like before the anomaly, or what a separate discrete signal from the same meter or connected equipment looked like. The work of gathering all of this contextual information and sequencing it into a series of events illustrates why nanosecond precision matters, especially in scientific, engineering and IoT use cases.
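The conversion from a calendar timestamp plus a nanosecond offset to nanoseconds since the Unix epoch can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; `to_epoch_ns` is a hypothetical helper name of our own, and it uses integer arithmetic throughout because floating-point seconds cannot hold this many significant digits.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ns(moment: datetime, extra_ns: int = 0) -> int:
    """Nanoseconds from the Unix epoch to `moment`, plus an extra offset.

    datetime only resolves microseconds, so the sub-microsecond part of
    the event time is passed separately as `extra_ns`.
    """
    delta = moment - UNIX_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000 + extra_ns


# t0 from the example: October 26, 1985, 01:35:18.667 GMT
t0 = datetime(1985, 10, 26, 1, 35, 18, 667_000, tzinfo=timezone.utc)
event_ns = to_epoch_ns(t0, extra_ns=200)  # the anomaly at t0 + 200 ns
print(event_ns)
# 499138518667000200
```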
A specific sequence of events, as abstracted from the discrete signals, is key to effective operational analysis. Whether teams conduct forensics and post-mortems, or predict and test hypothetical scenarios, knowing event order is critical. Teams should know exactly what happened first and what happened last. If two events happened at the same time, was it precisely at the same time, or did one occur slightly before or after another?
If timing precision is blurred by snapping to the second or aggregating to the minute, teams could lose the ability to confidently understand cause and effect between events -- and there are industries where nanosecond precision matters, where things happen at the speed of light.
If multiple systems generate the data that requires sequencing, precisely synchronizing time between those systems, even down to the millisecond, is a big challenge. Fortunately, techniques for sequencing events derived from discrete signals generated on separate computers have been developed for financial and telecommunications systems and are now accurate to the sub-millisecond level.
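A small Python sketch shows how this blurring plays out. The event names and timestamps below are invented purely for illustration, and `order` and `distinct_instants` are hypothetical helpers of our own. The events are stored in arrival order, which differs from the order in which they actually occurred.

```python
NS_PER_SECOND = 1_000_000_000

# Hypothetical events in arrival order: (name, nanoseconds since epoch).
events = [
    ("breaker_trip", 499_138_518_667_000_450),
    ("voltage_drop", 499_138_518_667_000_200),
    ("current_spike", 499_138_518_667_000_310),
]


def order(events, resolution_ns):
    """Event names sorted by timestamp truncated to `resolution_ns`.

    Python's sort is stable, so events that collapse onto the same
    truncated timestamp simply stay in arrival order.
    """
    ranked = sorted(events, key=lambda event: event[1] // resolution_ns)
    return [name for name, _ in ranked]


def distinct_instants(events, resolution_ns):
    """Count how many different timestamps remain after truncation."""
    return len({ts // resolution_ns for _, ts in events})


print(order(events, 1), distinct_instants(events, 1))
# ['voltage_drop', 'current_spike', 'breaker_trip'] 3
print(order(events, NS_PER_SECOND), distinct_instants(events, NS_PER_SECOND))
# ['breaker_trip', 'voltage_drop', 'current_spike'] 1
```

At nanosecond resolution the voltage drop clearly comes first. Once the timestamps are snapped to the second, all three events share one instant and the apparent order is just an accident of arrival.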
Precision Time Protocol is a good start toward addressing timing challenges, and we're hopeful that the evolution of this or similar technologies will eventually make nanosecond-precision synchronization the standard.
It is impossible to recover timestamp precision once it is lost within a data pipeline. It is tough for teams to know whether they'll need this information in the future, so if IoT processes and pipelines can handle it, include those extra digits.
Brian Gilmore is the director of IoT and emerging technology at InfluxData, the creators of InfluxDB. He has focused the last decade of his career on working with organizations around the world to drive the unification of industrial and enterprise IoT with machine learning, cloud and other transformational technology trends.