This is the second part of a four-part series. Start with the first post here.
When considering how to architect for scale with edge computing solutions, it’s important to talk about both hardware and software in a system-level context. As a rule of thumb, needs on both fronts get more and more complex the closer you get to the device edge.
I created the chart below (Figure 1) to help visualize this dynamic. The blue line represents hardware complexity and the green line represents software complexity. The X-axis represents the continuum from the cloud down through various edge categories, ultimately ending at the device edge in the physical world.
Hardware gets custom faster than software as you approach the device edge
There are a few key trends in this continuum that affect architecture and design decisions for IoT and edge computing. From a hardware perspective, as you move into remote field edges you need to consider extended thermal support for running 24/7 in sealed networking cabinets, and possibly telco-specific equipment certifications.
Following the blue line further left from a traditional data center, note how hardware complexity starts to grow even faster than software complexity. As you approach IoT and edge gateway-class compute, you begin to see needs for very specific I/O and connectivity protocols; many OS choices spanning Linux and Windows, which I call OS soup; increasing ruggedization; specific shapes and form factors; and industry-specific features and certifications, such as Class I, Division 2 for hazardous locations.
The sharp ramp in complexity at the embedded and control edge
There’s a key inflection point for complexity at the embedded or control edge: either the hardware becomes so constrained that software has to be embedded, losing the flexibility of virtualization and containerization, or the software requires a real-time operating system (RTOS) to meet deterministic response needs, such as in programmable logic controllers on a factory floor or electronic control units in a vehicle. I call this inflection point the thin compute edge, and from there down to the device edge the complexity curve ramps sharply until you’re basically building custom hardware for every connected product.
Software consistency can be extended to the thin compute edge
Meanwhile, the software complexity curve, represented by the green line in Figure 1, stays flatter a little longer. It remains consistent with established IT standards from the cloud down through the telco edge and on-premises data centers until the first significant bump occurs with the aforementioned OS soup. The curve then stays relatively flat until you hit resource-constrained devices at the thin compute edge.
This inflection point is driven by total available memory, not CPU processing capability, and these days it’s generally about 512MB: enough to accommodate an OS and a minimum set of containerized applications that serve a meaningful purpose. Using virtualization and containerization to maintain software-defined flexibility from the cloud out to all the thin compute edges comes with a tax on footprint; however, this is a worthwhile tradeoff wherever a given device can support it. Eventually the software complexity curve reaches parity with the hardware curve at the extreme device edge, and at that point you’re creating custom embedded software for every device too.
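To make the memory-driven inflection point concrete, here is a minimal Python sketch that suggests a software model for a device from its memory and real-time needs. The 512MB threshold is the rule of thumb from the paragraph above, not a hard limit; `DeviceProfile`, `software_model`, and the sample devices are hypothetical illustrations. CPU cores are recorded but deliberately ignored, reflecting the point that memory, not processing capability, drives the decision.

```python
from dataclasses import dataclass

# Rule of thumb from the text: about 512 MB of total memory is enough for an OS
# plus a minimal set of containerized applications. Tune for your own stack.
THIN_COMPUTE_EDGE_MB = 512


@dataclass
class DeviceProfile:
    name: str
    total_memory_mb: int
    cpu_cores: int              # recorded but ignored: memory drives the decision
    needs_hard_real_time: bool  # e.g. PLC- or ECU-style deterministic control


def software_model(device: DeviceProfile) -> str:
    """Suggest a software deployment model for a device (hypothetical helper)."""
    if device.needs_hard_real_time:
        # Deterministic response needs call for an RTOS regardless of memory.
        return "embedded (RTOS)"
    if device.total_memory_mb >= THIN_COMPUTE_EDGE_MB:
        # Enough headroom for an OS plus containerized workloads.
        return "containerized"
    return "embedded"


if __name__ == "__main__":
    for d in [
        DeviceProfile("gateway", 4096, 4, False),
        DeviceProfile("smart-sensor", 256, 2, False),
        DeviceProfile("plc", 1024, 2, True),
    ]:
        print(f"{d.name}: {software_model(d)}")
```

In practice the threshold shifts with the size of your OS image and container runtime, so treat the constant as a starting point to measure against rather than a fixed rule.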
Key considerations for edge computing solutions
We’ve established that both hardware and software inherently get more complex the closer you get to the device edge. Software stays consistent a little longer, all the way down to the thin compute edge, where available memory becomes a constraint and you have to go embedded. Here are some key considerations for developing edge computing infrastructure.
Extend cloud-native principles, such as platform-independent, loosely coupled microservice software architecture, as close to the thin compute edge as possible. In doing so, you can maintain more consistent software practices across more edges, even when you inevitably need to go more custom on the hardware. The opportunity to bridge the software-hardware complexity gap near the thin compute edge with more consistent software tools is represented by the yellow bar in Figure 1. Further, abstracting software into individual microservices, such as discrete functions, as much as possible lets you easily migrate workloads up and down the edge-to-cloud continuum as needed. For example, in an initial deployment you may start by running an AI model in the cloud for simplicity, but as your data volume grows you may find that you need to push that model down to a compute node closer to the device edge so you can act on data in the moment and backhaul only meaningful data for retention or further batch analysis.
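The point about migrating workloads can be sketched as a simple placement rule. In this minimal Python example, assuming a hypothetical policy that a single workload should not use more than half of a site’s uplink, the same inference service runs in the cloud during a small pilot and moves to the edge once its data volume outgrows the backhaul or the cloud round trip exceeds its latency budget. `Workload`, `Site`, `place`, and all numbers (4 Mbps per camera, a 100 Mbps uplink, an 80 ms round trip) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    data_rate_mbps: float     # raw data produced by the devices feeding the workload
    latency_budget_ms: float  # how quickly a result must be acted on


@dataclass
class Site:
    uplink_mbps: float             # WAN capacity available for backhaul
    cloud_rtt_ms: float            # round-trip time to the cloud region
    max_uplink_share: float = 0.5  # hypothetical policy cap for a single workload


def place(workload: Workload, site: Site) -> str:
    """Return "edge" or "cloud" for a workload (illustrative rule, not a standard)."""
    if workload.data_rate_mbps > site.uplink_mbps * site.max_uplink_share:
        return "edge"  # backhauling raw data would saturate the uplink
    if site.cloud_rtt_ms > workload.latency_budget_ms:
        return "edge"  # the cloud round trip is too slow to act in the moment
    return "cloud"


if __name__ == "__main__":
    site = Site(uplink_mbps=100, cloud_rtt_ms=80)
    pilot = Workload("defect-detection", data_rate_mbps=2 * 4.0, latency_budget_ms=500)
    scaled = Workload("defect-detection", data_rate_mbps=40 * 4.0, latency_budget_ms=500)
    print(place(pilot, site))   # cloud: 8 Mbps fits under the 50 Mbps cap
    print(place(scaled, site))  # edge: 160 Mbps exceeds the cap
```

Because the workload is packaged as a discrete, platform-independent service, only the placement decision changes as the deployment grows; the service itself moves unmodified.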
Leverage open interoperability frameworks like EdgeX Foundry for your various edge computing deployments. The EdgeX framework extends cloud-native design principles all the way to the thin compute edge, providing flexibility while also unifying an open ecosystem of both commercial and open source value-add around its open API. Furthermore, there will be embedded commercial variants that compress the discrete platform microservices into a tiny C-based binary so the code can run on highly constrained devices or serve use cases that need deterministic real-time behavior. There are inherent physics involved in the tradeoff between flexibility and performance, but even these compressed variants will still be able to take advantage of much of the plug-in value-add within the EdgeX ecosystem, such as device services and application services for south- and north-bound data transmission. In all cases, the open, vendor-neutral EdgeX API lets you evolve solutions more readily with microservices written by third parties in the broader ecosystem.
Make sure your edge hardware is robust enough to handle the demands of the physical world for the deployed use case. A $30 maker board is great for proof-of-concept (PoC) projects on the bench; however, it costs more than $100 once you fully package it in an enclosure at low volume, and it may well fail in a typical rugged field deployment because it wasn’t designed for those environments.
Speaking of robustness, consider leveraging virtualization, automated workload management and orchestration tools, and redundant hardware to provide fault tolerance in mission-critical use cases. This probably isn’t something you’ll care about if your edge solution is monitoring a connected cat toy, but it’s certainly worth considering if downtime in your factory costs thousands, if not tens of thousands, of dollars a minute.
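A quick calculation shows why redundancy pays off when downtime is that expensive. This Python sketch compares one edge node with a redundant pair, assuming independent failures and instant failover; both assumptions are optimistic, and shared power or a shared cabinet would reduce the benefit. The 99.9% node availability and the $10,000-per-minute downtime cost are hypothetical inputs picked from the range mentioned above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def parallel_availability(node_availability: float, nodes: int) -> float:
    """Availability of N redundant nodes, assuming independent failures and instant failover."""
    return 1 - (1 - node_availability) ** nodes


def expected_downtime_cost(availability: float, cost_per_minute: float) -> float:
    """Expected yearly downtime cost in dollars for a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR * cost_per_minute


if __name__ == "__main__":
    node_availability = 0.999  # hypothetical: one edge node is up 99.9% of the time
    cost_per_minute = 10_000   # hypothetical factory downtime cost in dollars
    for n in (1, 2):
        a = parallel_availability(node_availability, n)
        cost = expected_downtime_cost(a, cost_per_minute)
        print(f"{n} node(s): availability {a:.6f}, expected downtime cost ${cost:,.0f}/year")
```

With these inputs the expected annual downtime cost drops from about $5.3 million for a single node to about $5,300 for the pair, which easily covers the price of a second box.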
Overprovision the hardware you deploy in the field in terms of I/O and compute capability. As long as you use software-defined technology as much as possible, by extending cloud-native software design principles to capable edge devices, and your deployed devices have the necessary physical I/O and compute headroom, you can continuously update your edge functionality in the field as your needs inevitably evolve. If you don’t deploy the right I/O for future-proofing, you’re going to spend money on a truck roll, which typically costs upwards of $750. In other words, how much does that maker board really cost?
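The maker-board question can be answered with simple arithmetic. This Python sketch adds expected truck rolls to the unit cost, using the roughly $100 packaged price and the $750 truck-roll figure from the text; the $450 ruggedized gateway and the expected truck-roll counts are hypothetical assumptions for illustration only.

```python
TRUCK_ROLL_USD = 750  # from the text: truck rolls typically cost upwards of $750


def lifetime_cost(unit_cost: float, expected_truck_rolls: float) -> float:
    """Unit hardware cost plus expected field-service visits over the deployment life."""
    return unit_cost + expected_truck_rolls * TRUCK_ROLL_USD


if __name__ == "__main__":
    # Packaged maker board: more than $100 per the text; one visit is a hypothetical guess.
    maker = lifetime_cost(unit_cost=100, expected_truck_rolls=1.0)
    # Hypothetical ruggedized gateway with I/O headroom, so fewer visits are expected.
    rugged = lifetime_cost(unit_cost=450, expected_truck_rolls=0.1)
    print(f"maker board: ${maker:,.0f}  rugged gateway: ${rugged:,.0f}")
```

Under these assumptions a single site visit turns the $30 board into roughly an $850 deployment, more than the hypothetical ruggedized unit’s $525, and that is a lower bound given that truck rolls cost upwards of $750.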
Speaking of truck rolls, developers often overlook device management when starting an IoT project because their first concern is naturally their application. It’s important to think about device management from the start, including not only how the health of your infrastructure will be monitored on an ongoing basis, but also how your deployed devices will be updated in the field at scale. When you’re running a PoC with one or a few devices, it’s easy enough to remote into each device individually and manage it from the command line, but try that with thousands, let alone tens of thousands or millions, of deployed devices. And the last thing you want to be doing is driving out to the sticks with USB sticks to update devices one by one.
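One common way to update devices at scale without touching them individually is a staged (canary) rollout: push an update to a small slice of the fleet, check health, and widen the rollout only if the failure rate stays low. The Python sketch below shows that control flow under stated assumptions; `push_update` and `is_healthy` are hypothetical callbacks standing in for whatever your device management platform provides, and the 1%/10%/100% waves and 2% failure threshold are illustrative defaults, not settings from any specific product.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple


def rollout_waves(
    device_ids: Sequence[str], fractions: Sequence[float] = (0.01, 0.10, 1.0)
) -> Iterator[Sequence[str]]:
    """Yield batches covering successively larger cumulative fractions of the fleet.

    The last fraction should be 1.0 so that every device is eventually included.
    """
    done = 0
    for fraction in fractions:
        target = min(len(device_ids), max(1, math.ceil(len(device_ids) * fraction)))
        if target > done:
            yield device_ids[done:target]
            done = target


def staged_update(
    device_ids: Sequence[str],
    push_update: Callable[[str], None],  # hypothetical: send the update to one device
    is_healthy: Callable[[str], bool],   # hypothetical: post-update health check
    max_failure_rate: float = 0.02,
) -> Tuple[List[str], bool]:
    """Update wave by wave; stop early if a wave's failure rate exceeds the threshold."""
    attempted: List[str] = []
    for wave in rollout_waves(device_ids):
        for device in wave:
            push_update(device)
        failures = sum(1 for device in wave if not is_healthy(device))
        attempted.extend(wave)
        if failures / len(wave) > max_failure_rate:
            return attempted, False  # halt and investigate before widening the rollout
    return attempted, True


if __name__ == "__main__":
    fleet = [f"dev-{i:05d}" for i in range(1000)]
    print([len(w) for w in rollout_waves(fleet)])  # [10, 90, 900]
    done, ok = staged_update(fleet, push_update=lambda d: None, is_healthy=lambda d: True)
    print(len(done), ok)  # 1000 True
```

The key design choice is that a bad update stops after the first small wave rather than reaching the whole fleet, which is exactly the situation where a manual, device-by-device recovery would be most painful.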
Consider whether the infrastructure will be running on a LAN or a WAN relative to the subscriber devices that access it; note the break point in Figure 1. This distinction makes a big difference in how much downtime a given use case can tolerate.
Modularize your hardware designs as much as possible, including with field-upgradable components. Note, however, that modularization can affect cost and reliability, because modular connections tend to be more failure-prone due to corrosion and vibration. In fact, it’s advisable to balance modularity by soldering down certain components, such as memory, on edge hardware that will run 24/7 in harsh environments.
Make sure your edge hardware has appropriate long-term support — typically a minimum of five years beyond the ship date. This applies to both the hardware and available supported OS options.
In general, plan for the flexibility to handle OS soup at thinner compute edges, as well as both x86 and Advanced RISC Machines (ARM) based hardware. In Figure 1, the device edge is pretty much all ARM. This is another reason to leverage edge application frameworks that are platform-independent in terms of both silicon and OS.
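Platform-independent frameworks hide most architecture differences, but packaging and update pipelines still need to pick the right build for each device. This Python sketch normalizes the names that `platform.machine()` reports (for example `x86_64` on Linux, `AMD64` on Windows, `aarch64` or `armv7l` on ARM Linux) into a small set of architecture labels; the artifact naming scheme and the `artifact_name` helper are hypothetical.

```python
import platform
from typing import Optional

# Names that platform.machine() commonly reports, mapped to container-style
# architecture labels. Extend the table for any other hardware you ship.
_ARCH_ALIASES = {
    "x86_64": "amd64",   # Linux and macOS on x86-64
    "amd64": "amd64",    # Windows reports "AMD64"
    "aarch64": "arm64",  # 64-bit ARM Linux
    "arm64": "arm64",    # macOS and Windows on 64-bit ARM
    "armv7l": "arm/v7",  # 32-bit ARMv7 Linux
    "armv6l": "arm/v6",  # 32-bit ARMv6 Linux
}


def normalize_arch(machine: str) -> str:
    """Map a raw machine string to a normalized architecture label."""
    try:
        return _ARCH_ALIASES[machine.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine!r}") from None


def artifact_name(service: str, version: str, machine: Optional[str] = None) -> str:
    """Build a per-architecture artifact file name (hypothetical naming scheme)."""
    arch = normalize_arch(machine if machine is not None else platform.machine())
    return f"{service}-{version}-{arch.replace('/', '-')}.tar.gz"


if __name__ == "__main__":
    print(artifact_name("edge-agent", "1.2.0", "armv7l"))  # edge-agent-1.2.0-arm-v7.tar.gz
```

Failing loudly on an unknown architecture is deliberate: shipping the wrong binary to a remote device is one more way to earn a truck roll.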
Make sure to invest in root of trust (RoT) down to the silicon level. RoT silicon, such as a Trusted Platform Module (TPM), enables your device to attest that it is what it says it is and, with secure boot, that it is running the software it should be running. This RoT is foundational for any good defense-in-depth security strategy. On the usability side of security, Intel and ARM’s collaboration on secure device onboarding is an important effort to facilitate trusted late binding of ownership to devices in a multi-party supply chain. This effort is gaining steam, including FIDO’s recent decision to launch an IoT track and make secure device onboarding the first standardization effort within that track.
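Attestation of running software typically rests on measured boot: each boot stage is hashed and folded into a Platform Configuration Register (PCR) with an extend operation, so a change to any stage yields a different final value. The Python sketch below illustrates only that hash chain by modeling a SHA-256 PCR in software; it does not use a real TPM API, and on actual hardware the extends happen inside the TPM and a remote verifier checks a signed quote against known-good values. The stage contents are hypothetical placeholders.

```python
import hashlib
from typing import Sequence

PCR_SIZE = 32  # a SHA-256 PCR bank holds 32-byte values


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_boot_chain(stages: Sequence[bytes]) -> bytes:
    """Fold each boot stage into a PCR, in boot order.

    The sketch starts from all zeros, as static PCRs do after a reset.
    """
    pcr = bytes(PCR_SIZE)
    for image in stages:
        pcr = extend(pcr, image)
    return pcr


if __name__ == "__main__":
    golden = measure_boot_chain([b"bootloader v1", b"kernel v5", b"edge-runtime v2"])
    tampered = measure_boot_chain([b"bootloader v1", b"kernel v5-patched", b"edge-runtime v2"])
    print(golden.hex())
    print("matches known-good value:", tampered == golden)  # False
```

Order matters too: measuring the same stages in a different sequence produces a different value, which is why a verifier compares against a known-good value for the exact boot chain. Secure boot, which refuses to run unsigned stages, is a separate mechanism that complements this measurement.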
Stay tuned for the next installments of this series, in which I’ll dig deeper into the edge topic with pointers on sizing edge workloads, my three rules for edge and IoT scale, and eventually how we scale to the grail.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are those of the writers and do not necessarily convey the thoughts of IoT Agenda.