https://www.techtarget.com/searchapparchitecture/tip/The-basics-benefits-and-risks-of-cell-based-architecture
Classic failover strategies, such as reverting to backup databases or servers, are expensive to design, test and maintain -- but so are cascading failures that can bring down entire systems.
Cell-based architecture (CBA) is an emerging design approach that aims to remedy this issue by eliminating single points of failure. Companies like Slack and products like Amazon's Prime Video have already migrated to a cellular approach in response to outage incidents and increased traffic and data demands. But every architectural decision comes with tradeoffs, so it's important to weigh the high availability and scalability of a cell-based approach against the complexity and costs it can bring to large application systems.
CBA decomposes a software system into a large collection of cells, each holding a partial or complete copy of the system's application services and data components. In the case of microservices-based applications, each cell encompasses one or more microservices that operate according to defined business logic.
A cell is an isolated, independent unit of a software system. It contains a set of services that logically belong together. A user profile, for example, could be stored in a database and have related create, read and edit services. Cells also contain all the dependencies their services need to run, such as the database that holds the user information in those profiles.
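To make the profile example concrete, here is a minimal Python sketch of one cell, assuming an in-memory dictionary stands in for the cell's database. The class name `ProfileCell` and the `handle` method are hypothetical; `handle` plays the role of the cell's single entry point. The create, read and edit services live in the same cell because they all operate on the same profile data.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileCell:
    """Hypothetical self-contained cell: its services and its data live together."""
    cell_id: str
    # The cell's own data store; an in-memory dict stands in for a database.
    profiles: dict = field(default_factory=dict)

    # Services grouped in this cell because they operate on the same data.
    def create_profile(self, user_id: str, name: str) -> dict:
        if user_id in self.profiles:
            raise ValueError(f"profile {user_id} already exists")
        self.profiles[user_id] = {"user_id": user_id, "name": name}
        return dict(self.profiles[user_id])

    def read_profile(self, user_id: str) -> dict:
        return dict(self.profiles[user_id])

    def edit_profile(self, user_id: str, name: str) -> dict:
        self.profiles[user_id]["name"] = name
        return dict(self.profiles[user_id])

    # Single entry point: every request enters and leaves the cell here.
    def handle(self, request: dict) -> dict:
        op = request["op"]
        if op == "create":
            return self.create_profile(request["user_id"], request["name"])
        if op == "read":
            return self.read_profile(request["user_id"])
        if op == "edit":
            return self.edit_profile(request["user_id"], request["name"])
        raise ValueError(f"unsupported operation: {op}")


cell_a = ProfileCell("cell-a")
cell_a.handle({"op": "create", "user_id": "u1", "name": "Ada"})
print(cell_a.handle({"op": "read", "user_id": "u1"}))  # {'user_id': 'u1', 'name': 'Ada'}
```

Because each `ProfileCell` owns its own store, a second cell created the same way shares no data with the first, which is what lets cells be deployed, backed up and duplicated separately.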
Cells need to be backed up, deployed and created separately. Each cell has a single access point through which data enters and leaves, typically a gateway that follows the API gateway pattern. Because a cell is independent and contains everything needed to deploy a service, it can be duplicated and reused during times of peak demand. The routing layer can reroute traffic to new cells to cover the load and includes the following:
The routing layer can also include a load balancer, which works in conjunction with the management plane to observe heavy loads or replace systems that are no longer operating.
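A routing layer can be sketched as a component that maps a partition key, such as a user ID, to one healthy cell, while the management plane marks cells down or adds new ones. This sketch is illustrative only: `CellRouter` and its methods are hypothetical names, and it swaps in rendezvous (highest-random-weight) hashing as the assignment strategy, which the article does not prescribe. It also assumes any healthy cell can serve a given key, as is the case when cells are stateless or their data is replicated.

```python
import hashlib


def _score(key: str, cell_id: str) -> int:
    # Stable hash; the built-in hash() is salted per process for strings.
    digest = hashlib.sha256(f"{key}:{cell_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class CellRouter:
    """Hypothetical routing layer: sends each partition key to one healthy cell."""

    def __init__(self, cell_ids):
        self.healthy = set(cell_ids)

    # Called by the (hypothetical) management plane when health checks fail.
    def mark_down(self, cell_id: str) -> None:
        self.healthy.discard(cell_id)

    # Called when the management plane brings a new or recovered cell online.
    def add_cell(self, cell_id: str) -> None:
        self.healthy.add(cell_id)

    def route(self, key: str) -> str:
        if not self.healthy:
            raise RuntimeError("no healthy cells available")
        # Rendezvous hashing: the key goes to the healthy cell with the highest
        # score, so removing a cell only moves the keys that cell owned.
        return max(self.healthy, key=lambda cell_id: _score(key, cell_id))
```

The design choice here is stability: when one cell is lost, users on the remaining cells keep their assignments. A lookup table maintained by the management plane is another option when keys must be pinned to specific cells.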
The management and routing isolation features of a cell-based approach work together to create new possibilities. For example, multiple redundant cells can enable high availability by segregating data across geographic regions or availability zones (AZs). A large enterprise might place cells in different regions of a public cloud to provide customers with faster responses. In terms of resiliency, the load balancer and management plane can keep a service running even when a cell is lost. Because cells are independent, a service failing in one cell also does not have to take down the rest of the application.
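As a rough illustration of spreading redundant cells across AZs, the sketch below places each cell's copies in distinct zones and, during an outage, picks a copy in a surviving zone. The function names, zone labels and the two-copies-per-cell default are assumptions for illustration, not a prescribed layout.

```python
def place_cells(cell_ids, zones, copies=2):
    """Assign each cell's redundant copies to distinct zones (hypothetical helper)."""
    if copies > len(zones):
        raise ValueError("need at least as many zones as copies per cell")
    placement = {}
    for i, cell_id in enumerate(cell_ids):
        # Consecutive zones starting at an offset, so copies never share a zone
        # and load is spread round-robin across all zones.
        placement[cell_id] = [zones[(i + c) % len(zones)] for c in range(copies)]
    return placement


def serving_zone(placement, cell_id, down_zones):
    """Return the first zone holding a copy of this cell that is still up."""
    for zone in placement[cell_id]:
        if zone not in down_zones:
            return zone
    return None  # every copy of this cell sits in a failed zone


layout = place_cells(["c1", "c2", "c3"], ["az-a", "az-b", "az-c"])
print(serving_zone(layout, "c1", down_zones={"az-a"}))  # az-b
```

Because no two copies of a cell share a zone, losing any single zone still leaves every cell with at least one serving copy, as long as each cell has two or more copies.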
The most common implementations of cell-based architecture combine cloud computing and autoscaling, where a cell typically implements one or more RESTful services. Yet there are other ways to construct a cell-based architecture. Cells can run on separate physical servers, or on virtual servers on the same machine, with routers, firewalls and IP security segregating an existing network. Alternatively, cells could live on the same machine and use different permissions, processes or user IDs to achieve isolation. These examples show that cell-based architecture is a pattern, one that can be implemented with a variety of segregation methods. Companies using microservices or Kubernetes may find that what they are already doing is close to a cell-based approach.
Cell-based architecture provides a vision for building large, highly reliable applications with several key advantages. Some significant benefits are the following:
Before implementing a cell-based architecture, it is important to consider the challenges that arise from adding yet another layer of infrastructure. Here are a few things to expect:
The concept of a cell-based architecture originally emerged as a way to address cascading errors and failover problems within complex application systems. Systems that run at a global or internet scale are especially good candidates for cell-based architecture, as sheer scale requires redundancy and scalability.
Cell-based architecture is a natural fit for organizations looking to align business services with the internet services they expose -- and to build and deploy web services in the cloud. Setting up the components and CI/CD pipeline can require a fair amount of work at the outset. As an organization grows, however, the separation of concerns can accelerate the development of new cells and help keep deployments clean and simple.
For growing organizations, moving to a cellular structure requires a cleanup of backdoors, side checks, redundancies and hand-rolled SQL in order to get past a legacy big ball of mud architecture. While teams may still contend with residual issues within cells, they have full authority to fix them.
Amazon uses cell-based architecture to deliver video with Prime Video, which lets it adjust cell routing to balance load, create new cells when demand is high and take an underperforming cell out of rotation. Cells serve up video and are stateless, so if one gets stuck in an infinite loop, the system automatically detects the problem, reroutes traffic, shuts the cell down and creates a new one.
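The detect, reroute and replace cycle described for Prime Video can be sketched as a simple reconciliation loop. This is a hypothetical model, not Amazon's implementation: it assumes cells report heartbeats along with their active request count, treats a missed heartbeat as a stuck cell, and uses an arbitrary timeout and per-cell load threshold. Removing a cell from `rotation` stands in for shutting it down.

```python
import itertools
from dataclasses import dataclass


@dataclass
class StatelessCell:
    cell_id: str
    last_heartbeat: float
    active_requests: int = 0


class ManagementPlane:
    """Hypothetical control loop for stateless cells (thresholds are assumptions)."""

    def __init__(self, heartbeat_timeout=5.0, max_load_per_cell=100):
        self.heartbeat_timeout = heartbeat_timeout
        self.max_load_per_cell = max_load_per_cell
        self.rotation = {}  # cell_id -> StatelessCell currently receiving traffic
        self._ids = itertools.count(1)

    def launch_cell(self, now):
        cell = StatelessCell(f"cell-{next(self._ids)}", last_heartbeat=now)
        self.rotation[cell.cell_id] = cell
        return cell.cell_id

    def heartbeat(self, cell_id, now, active_requests):
        cell = self.rotation[cell_id]
        cell.last_heartbeat = now
        cell.active_requests = active_requests

    def reconcile(self, now):
        """One pass: replace stuck cells, then scale out if average load is too high."""
        events = []
        for cell_id, cell in list(self.rotation.items()):
            # A cell stuck in, say, an infinite loop stops sending heartbeats.
            if now - cell.last_heartbeat > self.heartbeat_timeout:
                del self.rotation[cell_id]      # take it out of rotation
                new_id = self.launch_cell(now)  # stateless, so a fresh cell is equivalent
                events.append(("replaced", cell_id, new_id))
        total = sum(c.active_requests for c in self.rotation.values())
        if self.rotation and total / len(self.rotation) > self.max_load_per_cell:
            events.append(("scaled_out", self.launch_cell(now)))
        return events
```

Because the cells hold no state, a replacement cell is interchangeable with the one it replaces, which is what makes this kind of automatic recycling safe.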
Slack migrated to a cell-based architecture after a 2021 incident involving a service outage. Slack decided to treat its AZs, which can have outages, as cells and build software to enable failover and routing when a cell goes down. To do that, it had to isolate the code within each AZ, creating a silo or, in other words, a cell. As a result, when an AZ fails, Slack users should no longer see an outage.
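One way to picture this approach is a router that treats each AZ as a cell and can drain a failing zone by setting its traffic weight to zero. The sketch below is a hypothetical model with made-up zone names; it is not Slack's actual tooling. Each request ID is hashed to a stable point and assigned to a zone in proportion to the zones' weights.

```python
import hashlib


class ZoneRouter:
    """Hypothetical router that treats each availability zone as a cell."""

    def __init__(self, zones):
        self.weights = {zone: 1.0 for zone in zones}

    def drain(self, zone):
        # Failover: stop sending new requests to the failing zone (cell).
        self.weights[zone] = 0.0

    def undrain(self, zone):
        self.weights[zone] = 1.0

    def route(self, request_id):
        live = [(z, w) for z, w in sorted(self.weights.items()) if w > 0]
        if not live:
            raise RuntimeError("all zones drained")
        # Map the request to a stable point in [0, total weight) and pick the
        # zone whose weight interval contains it (weighted selection).
        digest = hashlib.sha256(request_id.encode()).digest()
        total = sum(w for _, w in live)
        point = int.from_bytes(digest[:8], "big") / 2**64 * total
        for zone, weight in live:
            if point < weight:
                return zone
            point -= weight
        return live[-1][0]  # guard against floating-point rounding
```

Draining a zone sends all of its traffic to the remaining zones, and because routing is deterministic, undraining the zone restores the original assignment.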
Matt Heusser is managing director at Excelon Development, where he recruits, trains and conducts software testing and development. The initial lead organizer of the Great Lakes Software Excellence Conference and lead editor of "How to Reduce the Cost of Software Testing," Heusser served a term on the board of directors for the Association for Software Testing.
30 Sep 2024