The internet of things presents enterprises with unprecedented business opportunities, but also with major application performance challenges. The amount of data generated by large-scale IoT applications can overwhelm traditional application architectures. As a result, IoT platforms are often built on cloud infrastructure and are increasingly deployed in hybrid and multi-cloud environments. Still, the need for real-time response means data must be ingested and analyzed at massive scale as it arrives.
Consider a company manufacturing perishable food products that relies on a fleet of thousands of refrigerated delivery trucks for distribution. The company wants to optimize delivery routes, monitor the conditions inside the trucks to ensure product quality, and collect truck performance data to enable predictive maintenance. The IoT platform must continually ingest, process and analyze the data from all the various sensors and other data streams in real time.
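To make the monitoring requirement concrete, here is a minimal sketch of one such check: flagging a truck whose average temperature over its last few readings drifts above a limit. The `Reading` and `RollingTempMonitor` names, the three-reading window and the 4 °C threshold are all illustrative assumptions, not details from the scenario above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One temperature sample from a truck (hypothetical record shape)."""
    truck_id: str
    temp_c: float


class RollingTempMonitor:
    """Keeps the last `window` readings per truck and checks their average."""

    def __init__(self, window=3, max_avg_c=4.0):
        self.window = window
        self.max_avg_c = max_avg_c
        self._recent = {}  # truck_id -> deque of recent temperatures

    def ingest(self, reading):
        """Record a reading; return True if the truck should raise an alert."""
        buf = self._recent.setdefault(reading.truck_id, deque(maxlen=self.window))
        buf.append(reading.temp_c)
        # Only alert once a full window is available, to avoid noisy start-up.
        return len(buf) == self.window and sum(buf) / len(buf) > self.max_avg_c
```

In a real platform this logic would run continuously against the incoming stream for every truck; the sketch only shows the per-reading decision.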
For such an application to be successful, the company must build a high-performance data infrastructure. The most practical, cost-effective approach to doing this is with today’s innovative open source technologies for in-memory computing (IMC), stream processing and continuous machine learning.
As they do for other industries, open source offerings provide IoT platform and application developers the following key benefits:
- Innovation — The rapid evolution of IoT plus a highly competitive business environment require enterprise systems that can quickly incorporate the latest innovations. The large and active communities supporting the top open source projects ensure rapid innovation.
- Cost — Creating an IoT platform requires unavoidable expenditures on software development, storage and compute. Open source software systems provide enterprises with at least one development area where they can adopt proven, reliable software and rely on commodity hardware, avoiding the high upfront cost of proprietary technologies.
- Enterprise-grade support — Today, many open source systems are available as fully supported enterprise versions that enable companies to control development costs while still receiving the support they need for deploying in production environments.
Today, the top open source technologies that developers can use to achieve the performance and scalability they need for their IoT initiatives include Apache Ignite, Apache Kafka and Kubernetes, each described in turn below.
To ensure application performance, the Apache Ignite in-memory data grid (IMDG) can be inserted between the application and data layers of new or existing applications without major changes to either. The Ignite IMDG distributes in-memory caching and compute across a cluster of commodity servers deployed on-premises, in private or public clouds, or in a hybrid environment. The recently released GridGain Community Edition is an open source, hardened version of Ignite that is ideal for production environments.
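The idea of slotting an in-memory layer between the application and the data layer can be illustrated with a small read-through, write-through cache. This is a single-process sketch in plain Python, not Ignite's API; `SlowStore` and `ReadThroughCache` are hypothetical names, and a real data grid would distribute the in-memory map across many servers.

```python
class SlowStore:
    """Stand-in for the existing data layer (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the slow layer is actually hit

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class ReadThroughCache:
    """Sits between the application and the store without changing either."""

    def __init__(self, store):
        self.store = store
        self._mem = {}

    def get(self, key):
        if key not in self._mem:
            value = self.store.get(key)  # cache miss: read through to the store
            if value is None:
                return None
            self._mem[key] = value
        return self._mem[key]

    def put(self, key, value):
        self.store.put(key, value)  # write-through keeps the store authoritative
        self._mem[key] = value
```

The application calls `get` and `put` on the cache exactly as it would on the store, which is why neither side needs major changes.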
The combined memory and compute power of the Ignite cluster can be used for massively parallel processing and in-memory data storage. The cluster can be scaled out simply by adding nodes, with data automatically rebalanced across them. The performance of this architecture can eliminate the need for separate transactional (OLTP) and analytical (OLAP) databases. Separate OLTP and OLAP databases require an extract, transform and load process to copy the data to the analytics database, which introduces unacceptable delays for many use cases.
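One well-known way to get this kind of low-disruption rebalancing is rendezvous (highest-random-weight) hashing: each key goes to the node with the highest hash score, so adding a node only pulls over the keys that node now wins. The sketch below illustrates that technique in general terms; it is not a description of Ignite's internals, and the node and key names are made up.

```python
import hashlib


def owner(key, nodes):
    """Return the node that owns `key` under rendezvous hashing."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def moved_keys(keys, before, after):
    """Keys whose owner changes when the node list goes from `before` to `after`."""
    return [k for k in keys if owner(k, before) != owner(k, after)]


keys = [f"truck-{i}" for i in range(1000)]
before = ["node-1", "node-2", "node-3"]
after = before + ["node-4"]
moved = moved_keys(keys, before, after)
# Every key that moved now lives on the new node; the rest stayed put.
print(len(moved), all(owner(k, after) == "node-4" for k in moved))
```

With three nodes growing to four, roughly a quarter of the keys would be expected to move, and none of them shuffle between the existing nodes.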
For greenfield applications, Ignite can be used as an in-memory database. The Ignite Persistent Store feature provides backup and recovery capabilities and allows companies to trade off infrastructure costs and application performance. With the Persistent Store feature, the active data set can be larger than the available RAM. The entire operational data set is kept on disk, while only a user-defined subset of data is maintained in RAM. This infrastructure can be built using an underlying storage layer which can use spinning disks, solid-state drives, Flash, 3D XPoint or other storage-class memory technologies.
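The split between a full data set on disk and a smaller subset in RAM can be sketched as a two-tier store. The example below is an assumption-laden illustration: a plain dictionary stands in for durable storage, and the RAM subset is chosen by least-recently-used eviction, which is one possible policy rather than the one the Persistent Store feature necessarily uses. `TieredStore` is a hypothetical name.

```python
from collections import OrderedDict


class TieredStore:
    """Full data set on 'disk', bounded hot subset in RAM (LRU, by assumption)."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.disk = {}            # stand-in for the durable storage layer
        self.ram = OrderedDict()  # hot subset, oldest entry first

    def put(self, key, value):
        self.disk[key] = value    # every write reaches the durable tier
        self._touch(key, value)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)  # mark as recently used
            return self.ram[key]
        value = self.disk[key]         # RAM miss: fall back to disk (KeyError if absent)
        self._touch(key, value)
        return value

    def _touch(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)  # evict the least recently used entry
```

Raising `ram_capacity` buys faster reads at higher infrastructure cost, which is the trade-off the paragraph above describes.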
Many IoT use cases, such as self-driving cars and smart city traffic systems, also benefit from a continuous learning capability that enables a machine learning model to be automatically and continually updated without human intervention. Apache Ignite features integrated, distributed machine learning libraries that have been optimized for massively parallel processing. This enables each machine learning algorithm to run locally against the operational data residing on the nodes of the IMC cluster, which allows for the continuous updating of machine learning models without impacting system performance, even at petabyte scale.
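The essence of continuous learning is that the model is updated one observation at a time, without retraining from scratch. As a deliberately tiny illustration, the sketch below maintains a running mean and standard deviation using Welford's algorithm and uses them to flag outliers. It is plain single-node Python, not Ignite's machine learning library, and the `OnlineStats` name and the three-sigma rule are assumptions chosen for the example.

```python
import math


class OnlineStats:
    """Running mean and variance, updated per observation (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation; 0.0 until at least two observations."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomaly(self, x, z=3.0):
        """True if x lies more than z standard deviations from the mean."""
        return self.n > 1 and self.std > 0 and abs(x - self.mean) > z * self.std
```

Each node in a cluster could keep such state next to the data it holds and refresh it as new readings arrive, which is the locality idea the paragraph describes.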
IoT use cases produce streams of data from multiple sources. Apache Kafka is an open source system for publishing and subscribing to streams of records, storing those streams durably, and processing them as they occur. Kafka can be used as a real-time streaming data pipeline that reliably moves data across all the systems and applications comprising the IoT platform. Like Apache Ignite, Kafka runs on a distributed compute cluster that can span multiple data centers. Apache Ignite users can use the native Kafka integration between the two products to ingest data streams from Kafka for processing and analysis. Users of GridGain, an enterprise-grade system based on Apache Ignite, can use the Confluent-certified GridGain Kafka connector, which offers more functionality than the native connector found in Ignite. These integrations make it easy to incorporate Kafka stream processing into an in-memory computing architecture, which provides high-performance data processing and analysis.
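The publish-and-subscribe model described here rests on an append-only log in which each group of consumers tracks its own read position. The sketch below shows that idea in memory, with a single partition and no durability; it is not the Kafka client API, and `TopicLog` and the group names are hypothetical.

```python
class TopicLog:
    """Append-only record log with an independent read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # group name -> offset of the next record to read

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next batch for `group` and advance that group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch
```

Because offsets are kept per group, an analytics consumer and a maintenance consumer can each read the full stream at their own pace, which is what lets one pipeline feed several downstream systems.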
Kubernetes is an open source system for automating the deployment, scaling and management of applications that have been containerized with Docker or other container technologies. Kubernetes ensures consistency across a server cluster wherever it is deployed: on-premises, in a public or private cloud, or in a hybrid environment. Open APIs enable Kubernetes to manage Ignite and Kafka resources and automatically scale an IoT in-memory computing-based cluster. For IoT applications, this increased ease of management can dramatically reduce complexity and errors and save development time, enabling IT teams to keep their focus on more strategic activities.
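Automatic scaling of this kind typically comes down to a simple proportional rule. The Kubernetes Horizontal Pod Autoscaler documentation describes the core calculation as the current replica count multiplied by the ratio of the observed metric to its target, rounded up, with no change when the ratio is within a small tolerance of 1. The sketch below reproduces only that arithmetic; the function name, the replica bounds and the example numbers are assumptions, and the real controller applies further rules that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave the cluster alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Four nodes averaging 90% CPU against a 60% target would scale out to six.
print(desired_replicas(4, 90, 60))
```

For a stateful system such as an in-memory data grid, adding or removing replicas also triggers data rebalancing, so scaling limits are usually set more conservatively than for stateless services.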
Enterprises that want to ensure successful IoT initiatives and derive the maximum ROI from them should include open source technologies when designing their application architectures. Proven, reliable open source systems can lower development, deployment and maintenance costs, while also supporting increased innovation. And the open APIs on which these technologies are based make it easier for enterprises to integrate them with their existing systems, another important potential area of cost savings compared to proprietary systems. Specifically, the combination of Apache Ignite, Apache Kafka and Kubernetes enables enterprises to cost-effectively build and deploy a high-performance, massively scalable data platform that can support demanding IoT applications.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.