For many organizations, generating data from IoT sensors is easy. The hard part is doing something with that data quickly enough to drive real-time business processes. This is why finding the right software solutions for ingesting, processing and acting upon data is essential for nearly every successful IoT use case.
Consider the requirements for creating a luggage management system for an airline: It starts with collecting streaming sensor data on the current status of all flights, but that is only a small part of the story. The system also requires access to data on the expected arrival and departure times of all flights, the amount of luggage on each flight, the availability of gates at all relevant airports, and the available staffing and equipment at each location.
The luggage management system must also receive updates on changes to flight times, ground crew staffing, equipment and weather. With this information, the system can incorporate flight status changes into its overall model and reallocate ground resources to prevent takeoff delays and minimize how long arriving passengers wait for their luggage.
The system also needs access to scheduling and reservation data to plan for future requirements and, when necessary, to update customers on the status of their luggage.
Creating this type of IoT application, which combines sensor data with data from other sources, is challenging for most organizations. IoT systems that collect and process data to drive real-time business processes can't rely on repeated, time-consuming API calls to source datastores such as ERP and CRM applications. Each call adds latency, and these datastores may also limit the number or type of API calls that can be made.
Instead, the data must be cached where the IoT platform can process it without unnecessary delay. Furthermore, IoT systems that handle terabytes or even petabytes of data must be able to scale to those volumes without degrading performance.
As developers look to create such high-performance, massively scalable applications, they should consider solutions that enable the following strategies for ingesting, processing and acting upon data to drive real-time business processes at scale.
Digital integration hub: A flexible data aggregation layer for IoT
One approach to solving these challenges is what Gartner calls a digital integration hub (DIH) architecture. A DIH architecture creates a common data access layer for aggregating and processing data from multiple on-premises and cloud-based sources and streaming data feeds.
By caching the required data in memory, a DIH architecture gives multiple applications access to a single view of the ingested data and lets them process it at in-memory speeds. When an in-memory data grid sits at the core of the DIH, the system automatically synchronizes data changes made by the applications back to the source datastores. This can sharply reduce the number of calls to siloed datastores, along with the delays caused by waiting for data to be retrieved from them. The in-memory data grid may also allow processing to run where the data resides, without moving data over the network between DIH server nodes, which further improves performance.
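To make the caching and synchronization behavior concrete, here is a minimal Python sketch that assumes a single process and a hypothetical `SourceStore` class standing in for a rate-limited system of record such as an ERP API. The hypothetical `WriteThroughCache` bulk-loads the data once, serves reads from memory and writes every change back to the source. A real in-memory data grid distributes this across nodes and handles failures and concurrency; this sketch does neither.

```python
class SourceStore:
    """Hypothetical stand-in for a siloed system of record, such as an ERP API.

    It counts calls so we can see how many round trips the cache avoids.
    """

    def __init__(self, records):
        self._records = dict(records)
        self.calls = 0

    def read_all(self):
        self.calls += 1
        return dict(self._records)

    def write(self, key, value):
        self.calls += 1
        self._records[key] = value


class WriteThroughCache:
    """Hypothetical in-memory access layer: load once, read from RAM, write through."""

    def __init__(self, source):
        self._source = source
        # One bulk load replaces many per-read API calls to the source.
        self._data = source.read_all()

    def get(self, key):
        # Served from memory; the source is not contacted.
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
        # Write-through: synchronize the change back to the system of record.
        self._source.write(key, value)


erp = SourceStore({"gate:B12": "available", "gate:B14": "occupied"})
hub = WriteThroughCache(erp)
for _ in range(1000):
    hub.get("gate:B12")
hub.put("gate:B14", "available")
print(erp.calls)  # 2: one bulk load plus one write-through
```

After 1,000 reads and one update, the source has been called only twice. Products such as Apache Ignite offer write-through to an underlying store as a configuration option; this sketch deliberately uses no product API.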
In the airline luggage management example above, the streaming IoT sensor data is simply another source system for the DIH and can feed directly into its cache. Once in the DIH, the streaming data can be combined with other relevant data, such as staffing, flight schedules, gate availability and weather conditions, and the aggregated data can be updated in real time as new data arrives. Because all the ingested data is held in the DIH's memory cache, processing can also happen in real time.
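The luggage scenario can be sketched the same way. The Python below assumes hypothetical inputs (a flight schedule mapping each flight to a gate and bag count, and a per-gate count of baggage handlers) and a made-up staffing rule of one handler per 50 bags. Each sensor event updates the aggregated flight view held in memory and immediately returns any handler shortfall, with no call back to the source systems.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

BAGS_PER_HANDLER = 50  # hypothetical staffing rule, for illustration only


@dataclass
class FlightView:
    """Aggregated, in-memory view combining schedule, staffing and sensor data."""
    flight: str
    gate: str
    bags: int
    handlers_at_gate: int
    eta_minutes: Optional[int] = None


class LuggageHub:
    def __init__(self, schedule: Dict[str, Tuple[str, int]], staffing: Dict[str, int]):
        # schedule: flight -> (gate, bag count); staffing: gate -> handler count
        self.views = {
            flight: FlightView(flight, gate, bags, staffing.get(gate, 0))
            for flight, (gate, bags) in schedule.items()
        }

    def on_staffing_change(self, gate: str, handlers: int) -> None:
        # Reference data changes are merged into the same in-memory views.
        for view in self.views.values():
            if view.gate == gate:
                view.handlers_at_gate = handlers

    def on_sensor_event(self, flight: str, eta_minutes: int) -> int:
        # Streaming status update: refresh the view, then return the handler shortfall.
        view = self.views[flight]
        view.eta_minutes = eta_minutes
        needed = math.ceil(view.bags / BAGS_PER_HANDLER)
        return max(needed - view.handlers_at_gate, 0)


hub = LuggageHub({"XY123": ("B12", 180), "XY456": ("C3", 90)}, {"B12": 2, "C3": 2})
print(hub.on_sensor_event("XY123", 25))  # 2 more handlers needed at B12
hub.on_staffing_change("B12", 4)
print(hub.on_sensor_event("XY123", 20))  # 0
```

Because schedule, staffing and sensor data live in one in-memory structure, a staffing change and a status event both act on the same view without a round trip to any source system.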
Modern digital integration hub architectures are being implemented using an in-memory computing platform. These platforms pool the available RAM and compute of a server cluster and keep data in RAM to eliminate the delays of accessing data stored in disk-based databases. Using the MapReduce programming model, the in-memory computing platform distributes processing across the cluster for massively parallel processing (MPP) and minimizes or eliminates movement of data across the network before processing. An in-memory computing platform also supports large-scale IoT use cases because the compute cluster, which serves as the caching layer of the DIH architecture, can be scaled out simply by adding nodes.
Whether deployed as a standalone in-memory database or as an in-memory data grid inserted between an existing application and its data layer, in-memory computing platforms can improve application performance by up to 1,000 times compared with disk-based solutions.
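The colocated map-reduce pattern can be illustrated with a toy cluster. This is an assumption-laden simulation, not how any particular platform works internally: each "node" is just a Python list, threads stand in for servers, and records are assigned to partitions by a stable hash of their key. The map function runs against each partition where it lives, and only the small partial results are combined in the reduce step. The function and class names are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition_for(key: str, num_nodes: int) -> int:
    # Stable hash (unlike Python's salted hash()) so a key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class SimulatedCluster:
    """Toy stand-in for a data grid: each list plays the role of one node's partition."""

    def __init__(self, num_nodes: int):
        self.partitions = [[] for _ in range(num_nodes)]

    def load(self, records):
        for key, value in records:
            self.partitions[partition_for(key, len(self.partitions))].append((key, value))

    def map_reduce(self, map_fn, reduce_fn):
        # Map: run map_fn against each partition in place (threads stand in for nodes).
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            partials = list(pool.map(map_fn, self.partitions))
        # Reduce: only the small per-node results are combined.
        return reduce(reduce_fn, partials)


def bags_per_airport(partition):
    totals = Counter()
    for airport, bags in partition:
        totals[airport] += bags
    return totals


cluster = SimulatedCluster(num_nodes=4)
cluster.load([("JFK", 120), ("LHR", 80), ("JFK", 60), ("SFO", 45), ("LHR", 20)])
totals = cluster.map_reduce(bags_per_airport, lambda a, b: a + b)
print(dict(totals))
```

Partitioning by key means all records for one airport land on the same partition, so each per-airport total is computed entirely on one node and the reduce step only merges small results. Python threads do not give true CPU parallelism because of the global interpreter lock; here they only mark where a real platform would run work on separate machines.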
Achieving performance and scale for IoT
Another critical area for developers to consider as they build their IoT application environments is performance and scale. The approach described below applies to self-contained IoT applications in which all sensor data flows into a single datastore.
For decades, organizations have been forced to rely on a bifurcated data infrastructure. Separating online analytical processing (OLAP) systems from online transactional processing (OLTP) systems ensures that analytics won't affect operational systems. However, this separation requires a time-consuming extract, transform and load (ETL) process to periodically copy data from the OLTP system to the OLAP system, and periodic ETL can become an obstacle to implementing real-time IoT use cases.
To solve this challenge, organizations are implementing hybrid transactional/analytical processing (HTAP). HTAP runs predefined analytics directly on operational data without impacting system performance. It also lowers long-term costs by reducing or eliminating the need for a separate OLAP system.
HTAP can be used to ensure performance and scale for IoT use cases built on a single datastore that demands instant analysis, such as at-home patient monitoring or navigation systems. It is particularly useful for IoT applications that incorporate continuous learning.
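The core HTAP idea, running analytics on the same live data that transactions write to rather than on an ETL copy, can be shown with Python's built-in `sqlite3` module and an in-memory database. SQLite is not an HTAP platform and offers none of the distribution or workload isolation discussed here; it is used only because it ships with Python. The table and function names are hypothetical and follow the at-home patient monitoring example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE readings (
           patient_id TEXT NOT NULL,
           ts INTEGER NOT NULL,
           heart_rate INTEGER NOT NULL)"""
)


def record_reading(patient_id: str, ts: int, heart_rate: int) -> None:
    # Transactional path: each reading is committed as soon as it arrives.
    with conn:
        conn.execute(
            "INSERT INTO readings (patient_id, ts, heart_rate) VALUES (?, ?, ?)",
            (patient_id, ts, heart_rate),
        )


def patients_above(threshold: float, since_ts: int):
    # Analytical path: aggregate over the same live table, with no ETL copy.
    return conn.execute(
        """SELECT patient_id, AVG(heart_rate)
           FROM readings
           WHERE ts >= ?
           GROUP BY patient_id
           HAVING AVG(heart_rate) > ?
           ORDER BY patient_id""",
        (since_ts, threshold),
    ).fetchall()


for patient_id, ts, hr in [("p1", 100, 88), ("p1", 110, 120), ("p1", 120, 130),
                           ("p2", 100, 70), ("p2", 110, 72)]:
    record_reading(patient_id, ts, hr)
print(patients_above(100, since_ts=100))
```

A reading committed by `record_reading` is visible to the very next `patients_above` query, which is the property that periodic ETL into a separate OLAP system cannot provide.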
For example, to detect emerging fraud strategies, a mobile payments system needs a continuous learning capability that examines new potential fraud vectors and updates a machine learning model with new data in real time. Gartner refers to machine-learning-powered HTAP as in-process HTAP.
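Continuous learning of this kind can be sketched as online logistic regression: the model takes one stochastic gradient descent step each time a transaction is labeled, so it adapts without a batch retraining job. This is a swapped-in illustrative technique, not a description of any specific product, and the three features (a normalized amount, a new-device flag and a foreign-IP flag) are hypothetical.

```python
import math


class OnlineFraudModel:
    """Logistic regression updated one labeled event at a time (online SGD)."""

    def __init__(self, num_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * num_features
        self.bias = 0.0
        self.lr = learning_rate

    def score(self, features):
        """Return the estimated probability that a transaction is fraudulent."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        # Numerically stable sigmoid for large positive or negative z.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def update(self, features, is_fraud: bool) -> None:
        """Take one gradient step on the log loss for a single labeled event."""
        error = (1.0 if is_fraud else 0.0) - self.score(features)
        self.bias += self.lr * error
        self.weights = [w + self.lr * error * x for w, x in zip(self.weights, features)]


# Hypothetical features: [normalized amount, new-device flag, foreign-IP flag]
model = OnlineFraudModel(num_features=3)
suspicious, normal = [0.9, 1.0, 1.0], [0.1, 0.0, 0.0]
for _ in range(300):  # labeled events arriving over time
    model.update(suspicious, is_fraud=True)
    model.update(normal, is_fraud=False)
print(round(model.score(suspicious), 3), round(model.score(normal), 3))
```

After a few hundred labeled events, the model scores the suspicious pattern well above the normal one. In an HTAP setting, the same update would run against transactions as they land in the operational store rather than against a nightly export.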
In-memory computing platforms have proven to be an ideal foundation for enabling HTAP. By running real-time analytics on the operational data in RAM with massively parallel processing, an in-memory computing platform can deliver the performance at scale required for in-process HTAP.
Real-time IoT use cases can’t be implemented if application developers don’t create an infrastructure that supports real-time data aggregation and analysis at scale. DIH architectures and HTAP, powered by an in-memory computing platform, offer developers a path to solving this challenge.
In addition, IoT use cases built on a digital integration hub architecture or HTAP foundation can be developed using commodity servers and open source software solutions such as Apache Ignite for the in-memory computing platform, Apache Kafka for streaming, Apache Spark for analytics and Kubernetes for containers. These solutions enable developers to build, test, go into production and scale their IoT applications, knowing they can achieve the performance they require without a prohibitive upfront cost.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are those of the writers and do not necessarily convey the thoughts of IoT Agenda.