Do you grab just a Crescent adjustable wrench or two, or do you need to haul around an entire set of combination or open-end wrenches in both metric and imperial sizes?
More topically in the world of IoT, what is the best tool for the job when it comes to data management: a general-purpose tool or one highly tuned to a specific domain of needs? In IoT, there are streams of transactions, streams of sensor data, data about the device (metadata), and context about the streams and the environment of the IoT or smart device. The challenge lies in the streams, which can represent extreme velocity and volume, though typically not variety, in an environment that is often communication-constrained or power-constrained.
It is a whole lot easier to grab your handy Crescent wrench when you are hiking 10 minutes in the heat of the day to your dock to fix your ladder and cannot remember what size bolts it has. You have an adaptable solution in your pocket. It may not fit in the tight spots and it may cause some extra wear on the bolt, but it gets the job done. Then again, in this use case around water, you might not want to rough up the bolt. You might need the best tool for the use case, which here means a well-fitting fixed-size wrench.
In the world of streaming data in IoT, the goal is to maximize the amount of information, not just data, available about the device within the capabilities of the system. To do this, you need to understand the “physics” of the underlying system to know the difference between data and information. Let’s consider the following example.
Data is not always information
Let’s start with a set of numbers: 98.2, 98.7, 99.2, 98.9 and 98.6. These values were established at t1 through t5 respectively. Which of these numbers represent information?
From the point of view of an operations engineer, if this data came from a thermocouple and the times are equally spaced, you could argue that only the first, third and fifth numbers convey information. The second value, 98.7, lies exactly halfway between 98.2 and 99.2, and the fourth, 98.9, lies exactly halfway between 99.2 and 98.6, so both can be recovered by linear interpolation between their neighbors. There is information in the second and fourth numbers, not about the values, but about the fact of measurement. If that fact is otherwise captured because there is no indication of measurement failure, then only three measurements are needed to capture the available information. There is even more we can do to trim the data without losing information, including applying generic lossless compression once all knowledge of the underlying physics has been exploited.
The validity of this relies on some important assumptions tied to the physics of the situation. First, the sampling interval between tn and tn+1 should be well below the time constant of the underlying process. Second, the thermocouple must be both precise enough for this scale and accurate enough to represent the actual temperature. If both assumptions hold, then calculations or statistics derived from a value taken at any time on this plot are valid and appropriate for any statistical or operational application.
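The thermocouple argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular historian product works: the sample times 1 through 5 stand in for equally spaced t1 through t5, and the function names and the tolerance `tol` are hypothetical choices made for this example.

```python
# Assumption: equally spaced sample times 1..5 stand in for t1..t5.
times = [1, 2, 3, 4, 5]
values = [98.2, 98.7, 99.2, 98.9, 98.6]


def keep_indices(times, values, tol=1e-6):
    """Return indices of samples that linear interpolation cannot rebuild."""
    n = len(values)
    if n <= 2:
        return list(range(n))
    kept = [0]  # always keep the first sample
    for i in range(1, n - 1):
        k, end = kept[-1], i + 1
        slope = (values[end] - values[k]) / (times[end] - times[k])
        # Does a straight line from the last kept sample to sample i+1
        # pass within tol of every sample in between?
        fits = all(
            abs(values[k] + slope * (times[j] - times[k]) - values[j]) <= tol
            for j in range(k + 1, end)
        )
        if not fits:
            kept.append(i)  # sample i carries information; keep it
    kept.append(n - 1)  # always keep the last sample
    return kept


def rebuild(t, kept_times, kept_values):
    """Linearly interpolate a value at time t from the kept samples."""
    pairs = list(zip(kept_times, kept_values))
    for (t0, v0), (t1, v1) in zip(pairs, pairs[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the kept time range")


kept = keep_indices(times, values)
kept_times = [times[i] for i in kept]
kept_values = [values[i] for i in kept]
rebuilt = [rebuild(t, kept_times, kept_values) for t in times]
print(kept)  # [0, 2, 4]: the first, third and fifth samples
```

Three stored samples are enough to rebuild all five readings to within the tolerance. That only holds while the assumptions about sampling interval, precision and accuracy hold.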
Let’s look at another series of data: 98.2, 98.7, 99.2, 98.9, 98.6. These values were established at t1 through t5 respectively. I know what you are probably thinking here: is this a cut-and-paste error? No, this is fundamentally different information, because in this case the values are automated daily deposits to a savings account based on a percentage of your balance. Each piece of data in this series conveys important information, and throwing out any value would be throwing out real money. Furthermore, these are discrete measurements, not part of a continuous system. The “physics” of a thermocouple and a bank deposit are completely different. To someone without the context, they look exactly the same.
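To make the contrast concrete, here is a small sketch of what happens if the same "keep the first, third and fifth values" shortcut is applied to the deposits. The variable names are hypothetical, and because the article does not state a currency, the amounts are treated as generic units. Python's `decimal` module is used so the sums are exact.

```python
from decimal import Decimal

# The same five numbers, now read as discrete daily deposits.
deposits = [Decimal(s) for s in ("98.2", "98.7", "99.2", "98.9", "98.6")]

# Every deposit counts toward the balance.
total = sum(deposits)

# Keeping only the first, third and fifth values (indices 0, 2, 4),
# as was safe for the thermocouple, silently drops two deposits.
kept_total = sum(deposits[i] for i in (0, 2, 4))
missing = total - kept_total

print(total, kept_total, missing)  # 493.6 296.0 197.6
```

Interpolation cannot recover the dropped amounts in any meaningful sense, because each deposit is an independent event and not a sample of a continuous signal.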
For decades, data management systems referred to as historians have been optimized to exploit knowledge of the continuous signals that typically come from sensors, in order to optimize the storage and subsequent use of information in operations. Properly tuned, a historian can capture and serve all relevant information for critical operations and IoT with a fraction of the data and resources that might otherwise be needed. This allows for a more efficient, timely and extensive picture of the physical operations. With communication constraints and data volumes in the zettabytes, we must remember that we don’t live in a world of infinite resources. For financial transactions, use a highly redundant transactional database that guarantees atomicity, consistency, isolation and durability (ACID), where every piece of data is information.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.