Geographic information systems pioneer Esri faced a storage decision when developing a new cloud-based service to collect and analyze streams of data drawn from the IoT sensors of its customers.
Esri engineers built the ArcGIS Analytics for IoT service on containers, using popular open source tools such as Kubernetes, Kafka, Elasticsearch and Spark. The team also could have used open source technologies for the data storage but soon found that commercial software from container storage specialist Portworx made more sense.
Adam Mollenkopf, Esri's real-time and big data capability lead, estimated the team saved six months of development time by using Portworx Enterprise to provide replication, resilience, high availability, data encryption and disaster recovery for the stateful container workloads that span multiple cloud availability zones.
Portworx designed the Enterprise product explicitly to provide persistent storage for container-based applications. The startup's software-defined storage integrates with all major Kubernetes distributions and orchestration systems, including the Microsoft Azure Kubernetes Service (AKS) that Esri uses with its new IoT service.
Portable across clouds
Portability was an important consideration for Esri engineers. Mollenkopf said that, while the IoT service currently runs on Azure, the team has also verified it on AWS and plans to enable it on other public and private clouds in the future.
Esri's new software as a service (SaaS) -- which is due to go live by the end of the first quarter -- collects data from IoT sensors and writes it to Kafka stream-processing software. The application then maps the data into a Spark-based platform that Esri specialized for geospatial analytics to enable customers to receive alerts and detect patterns in real time as events of interest happen. Esri's SaaS application stores information in an Elasticsearch database so customers can go back later and replay scenarios on a map.
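The ingest, analyze and store flow described above can be pictured with a small, self-contained sketch. It is not Esri's code: plain Python lists stand in for the Kafka stream and the Elasticsearch store, a simple bounding-box test stands in for the Spark-based geospatial analytics, and every name in it (SensorEvent, BoundingBox, process_stream, replay) is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensorEvent:
    """One hypothetical roadside-sensor reading."""
    sensor_id: str
    lon: float
    lat: float
    timestamp: float
    vehicle_count: int


@dataclass
class BoundingBox:
    """A rectangular area of interest (stand-in for real geospatial analytics)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, event: SensorEvent) -> bool:
        return (self.min_lon <= event.lon <= self.max_lon
                and self.min_lat <= event.lat <= self.max_lat)


def process_stream(events: List[SensorEvent], area: BoundingBox,
                   threshold: int, store: List[SensorEvent]) -> List[SensorEvent]:
    """Persist every event, and return those that should trigger an alert."""
    alerts = []
    for event in events:
        store.append(event)  # keep everything so scenarios can be replayed later
        if area.contains(event) and event.vehicle_count >= threshold:
            alerts.append(event)
    return alerts


def replay(store: List[SensorEvent], start: float, end: float) -> List[SensorEvent]:
    """Return stored events in a time window, oldest first."""
    window = [e for e in store if start <= e.timestamp <= end]
    return sorted(window, key=lambda e: e.timestamp)
```

In this sketch, an alert fires as each event arrives, while the stored copy supports the later replay on a map that the article describes.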
One beta tester collects data from sensors along the side of various roads to track the number of cars that pass by. Other potential use cases include municipal snow-plowing and sanitation operations, mobile workforce tracking and autonomous vehicles.
Esri uses the Microsoft cloud's AKS to provision the containers and Azure Premium SSD Managed Disks to provide the persistent storage. The Portworx software runs as an agent on each node in the Kubernetes cluster to intercept storage requests from the containers. Portworx Enterprise virtualizes and manages the back-end storage and ensures that data is replicated across cloud availability zones.
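To illustrate what replicating across availability zones means in practice, here is a minimal sketch of zone-aware replica placement. It is an assumption-laden illustration, not Portworx's actual placement logic: the function name place_replicas, the zone labels and the node names are all hypothetical.

```python
from typing import Dict, List


def place_replicas(nodes_by_zone: Dict[str, List[str]], replicas: int) -> List[str]:
    """Choose nodes for a volume's replicas, spreading them across zones.

    Hypothetical policy: take one node from each zone in turn, and only
    reuse a zone once every zone already holds a replica.
    """
    zones = sorted(zone for zone, nodes in nodes_by_zone.items() if nodes)
    total_nodes = sum(len(nodes_by_zone[zone]) for zone in zones)
    if replicas < 1 or replicas > total_nodes:
        raise ValueError("replica count must be between 1 and the number of nodes")

    chosen: List[str] = []
    depth = 0  # index of the node to take from each zone on this pass
    while len(chosen) < replicas:
        for zone in zones:
            nodes = nodes_by_zone[zone]
            if depth < len(nodes) and len(chosen) < replicas:
                chosen.append(nodes[depth])
        depth += 1
    return chosen
```

With three zones and three replicas, each copy lands in a different zone, so the loss of any single zone leaves two copies available.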
"We needed a consistent, unified way to deal with resilience on stateful applications," Mollenkopf said.
Open source storage alternative
The alternative that Esri considered would have been a "daunting task," according to Mollenkopf. He said engineers could have enabled replication and DR on a piecemeal basis with Elasticsearch, Kafka and any other open source tools they use. The team also would have had to track the steady stream of updates and new versions to the open source software.
Mollenkopf said that without Portworx, Esri might have had to offer lower service levels or reduced guarantees against data loss -- neither of which would sit well with the customers that Esri plans to target with the new ArcGIS Analytics for IoT service, including federal and local governments and agencies, telecom providers, and oil and gas companies.
Prior to the new Analytics for IoT, Esri offered a more traditional on-premises application that required customers to put together the necessary server and storage infrastructure to ensure reliability and high availability. Mollenkopf said many companies found that too hard or did not understand how to do it. But customers will now have the option to let Esri spin up a new Kubernetes cluster and take care of the resilience as part of the quality of service, he said.
"We're basically licensing X number of nodes of Portworx, and then we cover the cost of that through the fees that we charge for our SaaS," Mollenkopf said.
IoT use case
The Esri service is designed to handle whatever velocity and volume of data the IoT sensors produce. Mollenkopf said that Kubernetes will automatically scale the containers and nodes, and Portworx Autopilot can dynamically resize the persistent volumes and add disks without having to restart the containers. Esri uses open source Stork software to orchestrate periodic backups of the Kubernetes cluster and its data to Azure Blob object storage.
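The idea of resizing a persistent volume as it fills up can be sketched as a simple threshold rule. This is an illustration only, not Portworx Autopilot's implementation or configuration format: the function name next_volume_size and the default threshold, growth and ceiling values are assumptions chosen for the example.

```python
def next_volume_size(capacity_gib: float, used_gib: float,
                     usage_threshold_pct: float = 80.0,
                     growth_pct: float = 50.0,
                     max_gib: float = 1000.0) -> float:
    """Return the size a volume should have after one evaluation of the rule.

    Hypothetical rule: when usage reaches the threshold, grow the volume by
    a fixed percentage, but never beyond a configured ceiling.
    """
    if capacity_gib <= 0:
        raise ValueError("capacity must be positive")

    usage_pct = 100.0 * used_gib / capacity_gib
    if usage_pct < usage_threshold_pct:
        return capacity_gib  # enough headroom, leave the volume alone

    grown = capacity_gib * (1 + growth_pct / 100.0)
    return min(grown, max_gib)
```

A controller would evaluate a rule like this periodically and request the new size from the storage layer, which is what lets volumes grow without a container restart.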
The IoT use case is becoming a popular one for Portworx. The startup's co-founder and CTO, Gou Rao, estimated that 25% of the company's 200 customers use Portworx Enterprise in connection with a container-based IoT application or service.
Portworx was the only storage player that Mollenkopf could find dedicated to container storage when the team did its analysis three years ago. He said Esri did initial ROI calculations but soon stopped after realizing Portworx was "such a no-brainer" in solving their problems with stateful workloads.
"Running stateful applications on Kubernetes remains a challenge today," Mollenkopf said. "So enabling Portworx on our Kubernetes clusters helps to ease the burden."