The pros and cons of pull and push models for processing IoT data
Architects of industrial IoT systems face a crucial decision when choosing which model of data flow to use in their design. Their first option, the “push” model, can be simpler to build, reducing the amount of time to release an initial system on which to build enterprise IoT applications. The alternative, a “pull” model, adopts a more centralized approach to data management, reducing the amount of logic required in each application making use of the IoT data.
Advantages of push
In the “push” model, incoming data is streamed to all users and integrated systems. Each application and downstream system that receives the data applies its own rules for deciding which data is valid and how to clean the incoming flow before use. For simpler systems whose core purpose is real-time alerting, the “push” model can be the right choice. Each application can watch for a specific type of data and trigger actions when set limits are crossed. Rules can be added to an application to transform or ignore data from certain sensors if problems are found, and none of those rules change the data flowing to any other application.
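As a rough illustration of the push flow described above, the following minimal Python sketch streams every reading to every subscriber and leaves validation to each application. The names `PushBroker`, `AlertApp` and `Reading`, the sensor IDs and the 5.0 °C limit are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    temperature_c: float


class PushBroker:
    """Hypothetical broker: every reading is streamed to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Reading], None]] = []

    def subscribe(self, handler: Callable[[Reading], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, reading: Reading) -> None:
        # No central filtering: clean and dirty data alike reach all consumers.
        for handler in self._subscribers:
            handler(reading)


class AlertApp:
    """Hypothetical consumer that keeps its own validity rules and alert limit."""

    def __init__(self, limit_c: float, ignored_sensors: Set[str] = frozenset()) -> None:
        self.limit_c = limit_c
        self.ignored_sensors = set(ignored_sensors)
        self.alerts: List[Reading] = []

    def handle(self, reading: Reading) -> None:
        if reading.sensor_id in self.ignored_sensors:
            return  # local rule; it has no effect on any other application
        if reading.temperature_c > self.limit_c:
            self.alerts.append(reading)


broker = PushBroker()
cold_chain = AlertApp(limit_c=5.0, ignored_sensors={"s2"})  # knows s2 is faulty
dashboard = AlertApp(limit_c=5.0)                           # has no such rule
broker.subscribe(cold_chain.handle)
broker.subscribe(dashboard.handle)

broker.publish(Reading("s1", 7.5))
broker.publish(Reading("s2", 99.0))  # bad reading from the faulty sensor
broker.publish(Reading("s1", 3.0))
```

In this sketch, `cold_chain` ignores the faulty sensor while `dashboard` still raises an alert on it, which shows both the independence of per-application rules and how the same bad reading can be treated differently across consumers.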
The power of pull
In the “pull” model, downstream systems using incoming IoT data must ask for what they want, and pull it from a single source: the system of record. Rules about data cleaning and transformation are operationalized at the system level rather than within each application. This system of record provides a single source for everything that happens in the system — including what events have resulted in flagging data as clean or dirty. Rather than stream all incoming data to all users and applications, each consumer only receives the data they specifically request.
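The pull flow can be sketched in the same spirit. The example below is only an illustration under stated assumptions: `SystemOfRecord`, its `ingest` and `query` methods, and the plausibility range of -50 °C to 60 °C are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoredReading:
    reading_id: int
    sensor_id: str
    temperature_c: float
    dirty: bool = False


class SystemOfRecord:
    """Hypothetical central store: cleaning rules live here, not in each app."""

    # Assumed plausibility range, used only for this illustration.
    MIN_C = -50.0
    MAX_C = 60.0

    def __init__(self) -> None:
        self._readings: List[StoredReading] = []
        # Event log of (action, reading_id, reason), kept alongside the data.
        self.events: List[Tuple[str, int, str]] = []

    def ingest(self, sensor_id: str, temperature_c: float) -> int:
        reading_id = len(self._readings)
        dirty = not (self.MIN_C <= temperature_c <= self.MAX_C)
        self._readings.append(
            StoredReading(reading_id, sensor_id, temperature_c, dirty)
        )
        self.events.append(("ingested", reading_id, ""))
        if dirty:
            self.events.append(
                ("flagged_dirty", reading_id, "outside plausible range")
            )
        return reading_id

    def query(self, sensor_id: str, include_dirty: bool = False) -> List[StoredReading]:
        # Consumers receive only what they request; dirty data is withheld by default.
        return [
            r for r in self._readings
            if r.sensor_id == sensor_id and (include_dirty or not r.dirty)
        ]


record = SystemOfRecord()
record.ingest("s1", 4.0)
record.ingest("s1", 999.0)  # implausible value, flagged centrally on ingest
record.ingest("s2", 3.0)

clean_s1 = record.query("s1")
all_s1 = record.query("s1", include_dirty=True)
```

Because the rule is applied once at the source, every consumer calling `query` sees the same cleaned view, and the `events` list records why a reading was flagged.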
For IoT systems whose purpose is to go beyond simple alerts and become learning systems, where operations are not just monitored but also optimized and mined for additional revenue opportunities, data quality is critical. For these systems, the “pull” model for data sharing makes it easier to keep dirty data from reaching downstream applications and jeopardizing data quality across your enterprise environment.
Clean, trusted data from a central source is important for systems with integrated analytics and machine learning tools. It also matters in highly regulated industries like food and healthcare, where certain data must be collected and reported to government agencies. In IoT especially, hardware, firmware and software bugs will cause errors in incoming data.
In the “push” model, once you push dirty data to all users, you’ve lost operational control of it. You can’t retract it and you can’t clean it. If you have a chain of custody, you may have virtual breadcrumbs that let you find the data once you realize it’s dirty, but you’d have to find and clean it everywhere it went. That’s difficult because a “push” model also encourages you to save copies of the data across your system. Once data has been distributed “shotgun-style” across multiple data stores, fixing a problem in one store doesn’t fix the same problem in another, so finding and cleaning every dirty copy across your IoT system becomes virtually impossible.
For example, a company operating a fleet of refrigerated tractor-trailers may need to track cargo temperature continuously throughout each journey and answer regulators’ questions about it. A unified system of record provides a definitive source of truth for answering those questions. With a “pull” model, a fleet manager can report where a truck stopped, what time the doors were opened and at what time the temperature in the cargo hold went outside the range set by the regulatory standard. When reports show unexpected results, or a device malfunctions, the manager can also track down when a sensor failed and what data from it was used by downstream systems (throwing off calculations), and then clean this data retroactively. In a system based on a “push” model, alerts will be sent when specific triggers are activated (e.g., temperature too high), but the chain of events leading to the situation may be difficult to determine, and any false alarms will be hard to explain to regulators who have access to the pushed, dirty data.
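To make the retroactive cleaning step in the refrigerated-trailer example concrete, here is a small Python sketch. Everything in it is assumed for illustration: the `TempRecord` type, the `front` and `rear` sensor names, the minute-based timestamps and the 5.0 °C limit are hypothetical and do not reflect any actual regulatory standard.

```python
from dataclasses import dataclass
from typing import List

MAX_ALLOWED_C = 5.0  # assumed limit for this example only


@dataclass
class TempRecord:
    minute: int          # minutes since departure (simplified timestamp)
    sensor_id: str
    temperature_c: float
    dirty: bool = False


def flag_sensor_window(records: List[TempRecord], sensor_id: str,
                       start_minute: int, end_minute: int) -> int:
    """Mark one sensor's readings in [start, end] as dirty; return how many changed."""
    changed = 0
    for r in records:
        in_window = start_minute <= r.minute <= end_minute
        if r.sensor_id == sensor_id and in_window and not r.dirty:
            r.dirty = True
            changed += 1
    return changed


def excursions(records: List[TempRecord]) -> List[TempRecord]:
    """Return clean readings above the assumed limit."""
    return [r for r in records
            if not r.dirty and r.temperature_c > MAX_ALLOWED_C]


# A single system of record holding the journey's readings.
records = [
    TempRecord(0, "front", 3.0),
    TempRecord(10, "front", 3.5),
    TempRecord(10, "rear", 40.0),   # rear sensor starts failing
    TempRecord(20, "rear", 41.0),
    TempRecord(20, "front", 6.2),   # genuine excursion, e.g. doors opened
    TempRecord(30, "front", 4.0),
]

before = excursions(records)
flagged = flag_sensor_window(records, "rear", 10, 20)
after = excursions(records)
```

Since the readings live in one place, flagging the failed sensor once changes what every later report sees: in this sketch only the genuine excursion remains after cleaning.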
A design to match your goals
When evaluating which model of IoT data flow, “pull” or “push,” is more appropriate for your production system, the most important step is to match the pros and cons of each to your overall goals. A system focused on real-time alerts and independent applications, without aspirations for machine learning, can be brought online more quickly through a simple “push” design. For systems that are expected to generate increasing value over time and enable deeper insights into the enterprise, that integrate with back-end CRM, ERP and other critical systems, or that operate in highly regulated environments, a “pull” model is likely to provide the most flexible, compliant and long-lived system.