Companies in nearly every industry have turned to digital transformation to unlock massive amounts of data and drive real-time business processes, such as 360-degree views of the business and its customers. A fundamental technical challenge for these businesses has been aggregating data from siloed data sources, including live operational data, in a way that allows the data to be accessed in real time without degrading application or data source performance.
IoT use cases based on real-time analysis of large amounts of sensor data add another layer of complexity: the flood of incoming data must be aggregated in real time with other data sources without letting ingestion bottlenecks slow down the entire system. Consider a predictive maintenance application for a fleet of airplanes, trucks or even industrial washing machines. Real-time sensor data collected from the remote machines, such as location, heat, speed and fluid levels, must be aggregated with historical data on each device's past behavior, as well as other sources such as weather conditions, maintenance records, account and contract information, response requirements, replacement-part availability and more.
Distributed in-memory computing platforms have now become a standard and cost-effective approach to enabling enterprises to solve the data aggregation challenge with the performance and scalability they need to drive real-time business processes.
Distributed in-memory computing platforms
A distributed in-memory computing platform deployed on a cluster of servers pools the available RAM and CPUs of the cluster to create a high-performance cache. The platform can be deployed on-premises, in a public or private cloud or in a hybrid environment, and can function as an in-memory data grid (IMDG). An IMDG can be deployed between one or many existing applications and one or many data sources. A defined subset of data from the datastores is maintained in the in-memory cache. By distributing the data and computation across the individual IMDG cluster nodes and using massively parallel processing (MPP), organizations can achieve real-time data access even with terabytes of in-memory data.
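The core idea, hash-partitioning data across nodes and running computation on each partition in parallel, can be sketched in a few lines. This is a minimal single-process toy, not a real IMDG: actual platforms distribute the partitions across separate servers and move the computation to the data. The class and method names here are illustrative, not taken from any product.

```python
from concurrent.futures import ThreadPoolExecutor

class PartitionedCache:
    """Toy in-memory data grid: keys are hash-partitioned across 'nodes'
    (plain dicts standing in for cluster members)."""

    def __init__(self, num_nodes=4):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash-based partitioning decides which node owns a key.
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

    def map_reduce(self, map_fn, reduce_fn, initial):
        # MPP-style: each node processes only its own partition in parallel,
        # then the partial results are combined into one answer.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = pool.map(lambda node: map_fn(node.values()), self.nodes)
        result = initial
        for partial in partials:
            result = reduce_fn(result, partial)
        return result

cache = PartitionedCache(num_nodes=4)
for i in range(1000):
    cache.put(f"sensor-{i}", {"temp": 20 + i % 15})

# Aggregate across all partitions without moving data to a central point.
total = cache.map_reduce(
    map_fn=lambda values: sum(v["temp"] for v in values),
    reduce_fn=lambda a, b: a + b,
    initial=0,
)
```

Because each node only touches its own slice of the data, adding nodes grows both storage capacity and aggregate compute throughput, which is what makes the approach viable at terabyte scale.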
Distributed in-memory computing platforms can support a variety of APIs including ANSI SQL-99, key-value, Java, C++, .NET, JDBC/ODBC, REST, PHP, MapReduce, Scala, Groovy and Node.js. Developers can also write custom APIs to access the IMDG cache. A synchronization layer, or change data capture layer, between the data sources and the IMDG ensures that the data in the in-memory cache is constantly updated as changes are made to the underlying datastores.
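The synchronization layer can be pictured as a consumer of change events emitted by the source datastore (in practice these usually come from the database's transaction log, via CDC tooling). The sketch below is a hypothetical, simplified event shape, not the format of any particular CDC product:

```python
class SyncLayer:
    """Toy change-data-capture layer: replays datastore change events
    into the in-memory cache so cached data stays current."""

    def __init__(self, cache):
        self.cache = cache  # the IMDG stand-in, here just a dict

    def on_change(self, event):
        # event is an assumed shape: {"op": "upsert"|"delete", "key": ..., "value": ...}
        if event["op"] == "upsert":
            self.cache[event["key"]] = event["value"]
        elif event["op"] == "delete":
            self.cache.pop(event["key"], None)

cache = {}
sync = SyncLayer(cache)
# As the source database changes, the cache converges to the same state.
sync.on_change({"op": "upsert", "key": "order:42", "value": {"status": "shipped"}})
sync.on_change({"op": "upsert", "key": "order:43", "value": {"status": "new"}})
sync.on_change({"op": "delete", "key": "order:43"})
```

The important property is that applications read from the cache at memory speed while the change stream, not the applications, carries the cost of keeping the cache consistent with the source systems.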
How a digital integration hub helps overcome data silo limitations
In-memory computing platforms deployed as described above are a key component of a digital integration hub, also known as an API platform, smart data hub or smart operational datastore. With a digital integration hub, the IMDG of the in-memory computing platform aggregates a subset of data from multiple source systems at the same time, including relational and NoSQL databases, data warehouses, data lakes, SaaS applications and streaming data from IoT sensors or other streaming sources. This data may reside in public or private clouds, on-premises datacenters or mainframes. The aggregated data in the digital integration hub can then be accessed by any number of business applications at in-memory speeds.
A digital integration hub can be cost-effectively scaled to meet the demands of IoT applications that rely on large amounts of data collected from IoT endpoints because the in-memory computing platform makes it easy to scale: capacity is added simply by adding nodes to the cluster, which the IMDG automatically recognizes. The hub also allows new applications to access the aggregated data while avoiding the challenges many developers face with direct API access to source data systems. Source datastore APIs may have limited functionality, and the API calls may be expensive, which results in high costs, limited ability to access the data in real time and difficulty scaling the solution. The digital integration hub allows developers to reduce the number and type of API calls to the source systems by providing a broad range of APIs for accessing the data in the hub instead.
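Scale-out works because the grid rebalances its partitions when a node joins. The toy below makes the mechanism visible by naively re-placing every key; a real IMDG would recognize the new node automatically and move only the affected partitions. All names are illustrative.

```python
class ElasticGrid:
    """Toy illustration of scale-out: adding a node triggers a rebalance
    that redistributes keys across the enlarged cluster."""

    def __init__(self, num_nodes=2):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _index(self, key):
        return hash(key) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._index(key)][key] = value

    def get(self, key):
        return self.nodes[self._index(key)].get(key)

    def add_node(self):
        # Collect all entries, grow the cluster, then re-place every key
        # under the new partition map. (Real grids move only the partitions
        # that change owner, so rebalancing cost stays proportional.)
        entries = [(k, v) for node in self.nodes for k, v in node.items()]
        self.nodes.append(dict())
        for node in self.nodes:
            node.clear()
        for k, v in entries:
            self.put(k, v)

grid = ElasticGrid(num_nodes=2)
for i in range(100):
    grid.put(f"device-{i}", i)

grid.add_node()  # cluster grows from 2 to 3 nodes; all data remains reachable
```

From the application's point of view nothing changes after the rebalance: the same keys resolve to the same values, just spread over more hardware.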
How HTAP delivers real-time performance
Most datacenters today still rely on separate online transactional processing (OLTP) and online analytical processing (OLAP) systems to ensure analytics won't affect operational systems. However, this approach requires a time-consuming extract, transform and load (ETL) process to periodically copy data from the OLTP system to the OLAP system. This time delay is a clear obstacle to driving real-time business processes based on data in the OLAP system, which may house critical historical data but is continually out of sync with the OLTP system.
An in-memory computing platform can address this challenge by providing the speed and scale necessary for hybrid transactional/analytical processing (HTAP), also called hybrid operational/analytical processing (HOAP): the ability to perform real-time analytics on the operational dataset without impacting the performance of the operational datastore. By running real-time analytics on the operational data in RAM with MPP, the in-memory computing platform, whether deployed as an IMDG or an in-memory database, can deliver the performance at scale required for HTAP, supporting both real-time transactional and analytical processing against the same operational datastore. HTAP also has the long-term cost benefit of reducing or eliminating the need for a separate OLAP system.
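The HTAP pattern, transactional writes and analytical reads against one live in-memory store with no ETL hop between them, can be illustrated with SQLite's in-memory mode as a single-process stand-in. This is only a conceptual sketch; a real HTAP platform distributes the data and the query execution across a cluster.

```python
import sqlite3

# One in-memory store serves both workloads: no separate OLAP copy,
# so analytics always sees the current operational data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, temp REAL)")

# Transactional side: ingest operational sensor data as it arrives.
rows = [(f"dev-{i % 4}", 20.0 + i) for i in range(8)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
conn.commit()

# Analytical side: aggregate over the live operational dataset,
# with no ETL delay between ingestion and analysis.
avg_by_device = dict(conn.execute(
    "SELECT device_id, AVG(temp) FROM readings GROUP BY device_id"
).fetchall())
```

The contrast with the OLTP-plus-OLAP setup is that the analytical query above runs against the rows committed a moment earlier, whereas an ETL-fed OLAP system would only see them after the next scheduled load.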
The ability of an in-memory computing platform to power a digital integration hub enables architects and developers to achieve the performance and scale required for their most complex, data-intensive IoT applications.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.