The case for Kubernetes
Manual provisioning and legacy tooling struggle at scale. Kubernetes adds self-healing, autoscaling and unified observability to reduce downtime and operational risk.
For many companies, application deployment still means packaging applications into containers with tools like Docker and running them on dedicated legacy servers, traditional virtual machines (VMs) or virtual private servers (VPS). This works until it doesn't.
Kubernetes enables better container orchestration with self-healing provisioning and a standardized observability and configuration management stack. Learn how Kubernetes helps teams manage the issues created by manual provisioning, reactive scaling and increasingly complex deployments.
The operational limits of manual provisioning
Consider this example:
When traffic spikes, the host crashes. This could be a classic case of under-provisioning of compute resources. The typical fix is to purchase or spin up additional machines with the same specs, deploy the exact same version of the application on them with identical configurations and distribute traffic evenly between them using a load balancer.
For the moment, everything is working as expected; when one server unexpectedly goes down, the load balancer automatically redirects traffic to the other healthy servers.
But then, another series of incidents occurs. One container might have a memory leak, resulting in an OOM kill. The second could be stuck in an infinite retry loop due to a failed database connection. The third might be running on a node that is about to undergo maintenance.
Despite these failures, the load balancer keeps sending traffic to all these problematic servers. Why? Because the health checks in most VM and VPS setups can't detect these application-level failures.
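The gap can be sketched in Python. A TCP connect check, which is roughly what a basic load balancer probe does, passes as long as the port accepts connections; an application-level check has to get a healthy response from the application itself. The `/healthz` endpoint name below is an assumed convention for illustration, not part of the original setup:

```python
import socket
import urllib.request


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Infrastructure-level check: only proves the port accepts connections.

    A process stuck in a retry loop or leaking memory can still pass this.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def liveness_check(url: str, timeout: float = 2.0) -> bool:
    """Application-level check: the app must answer its health endpoint with 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection failures and HTTP error statuses
        return False
```

A server whose process is wedged will keep passing `tcp_check` while failing `liveness_check`, which is exactly the blind spot described above.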
A small team of sysadmins and DevOps engineers can handle a few crashes, but managing simultaneous failures like these through manual provisioning and constant reconfiguration is unsustainable.
These operational gaps cascade. What happens when a new version of the software needs to be rolled out quickly? The team will have to rebuild and redeploy containers across all their servers. Attempting this while ensuring zero downtime becomes even more complex.
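As a rough sketch of what "zero downtime" demands, a rolling update has to take servers out of rotation in small batches, deploy the new version, verify health and only then move on. The function below is a hypothetical illustration of that loop, not a production rollout tool; `deploy` and `health_ok` are assumed callables supplied by the caller:

```python
def rolling_update(servers, deploy, health_ok, max_unavailable=1):
    """Update servers in small batches so the rest keep serving traffic.

    deploy(server)    -- pushes the new version to one server (assumed callable)
    health_ok(server) -- returns True once the server passes its health check
    """
    for i in range(0, len(servers), max_unavailable):
        batch = servers[i:i + max_unavailable]
        for server in batch:
            deploy(server)
        for server in batch:
            # Halt the rollout on the first unhealthy server instead of
            # propagating a bad release to the whole fleet.
            if not health_ok(server):
                raise RuntimeError(f"rollout halted: {server} failed health check")
    return servers
```

Doing this by hand across a fleet, while also handling the failure cases, is the coordination burden an orchestrator is meant to absorb.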
Server provisioning with Ansible, Puppet, and Chef: The configuration drift problem
A logical response to this chaos is to adopt one of these configuration management tools -- Ansible, Puppet or Chef -- to automate repetitive provisioning tasks and enable workflows such as rollbacks, health checks and change approvals via scripts.
However, these tools weren't built with containers in mind. Rather, they were designed for managing static and predictable infrastructure. As the number of servers increases, scripts multiply quickly.
What starts as a clean and manageable playbook evolves into a large collection of interdependent scripts that are more prone to human errors, difficult to test and risky to modify. Overall, maintaining them becomes a full-time job in itself.
There is also the issue of configuration drift. Puppet and Chef make a good attempt to address it as their agents run continuously on servers, detecting and correcting deviations from the desired state.
Containers, however, are ephemeral and immutable by design. They are constantly being spun up, scaled out and replaced. These tools operate at the host level and were designed for mutable infrastructure, which needs continuous correction.
Puppet and Chef's operating model assumes servers that persist for months. Containers persist for minutes. This mismatch means they have limited visibility into what is running inside individual containers.
For example, a hotfix applied to a running container during an incident or a dependency that got silently updated on just some nodes can cause supposedly identical environments to begin to diverge in ways the tooling simply cannot see. The dashboard reports green while the infrastructure tells a different story.
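What container orchestrators add is a continuous reconciliation loop: compare a declared desired state against the observed actual state and emit corrective actions, replacing drifted instances rather than patching them in place. Below is a minimal sketch of one reconciliation pass, with resources modeled as plain dicts purely for illustration:

```python
def reconcile(desired: dict, actual: dict) -> list:
    """One pass of a desired-state control loop.

    Returns the actions that would bring the actual state back in line
    with the declared spec. Drifted resources are replaced, not patched,
    mirroring the immutable-infrastructure model containers assume.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("replace", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

Because the loop runs continuously against what is actually running, the silent divergence described above gets detected and corrected instead of accumulating behind a green dashboard.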
Impact on businesses
At scale, unplanned downtime translates directly into wasted budget, slower release cycles and compounding operational risk. This is the ceiling that most growing engineering teams hit.
The question CIOs and IT leaders face is not whether their current setup has limitations; it does. The question is how long they can afford to keep hitting that ceiling.
This is where container orchestration and Kubernetes specifically change the equation.
Kubernetes: The de facto standard
Kubernetes was built to solve these exact problems and has become the de facto standard for container orchestration. Where manual provisioning fails, Kubernetes self-heals. Where config management loses visibility, Kubernetes maintains the desired state at the container level, reducing downtime.
For organizations running containerized workloads at any meaningful scale, Kubernetes is the de facto choice, and using an alternative could be costing you.
Already using an alternative? Here's why it might not be enough
There is a good chance your organization is currently managing containerized workloads with one of the following tools.
Docker Swarm
Docker Swarm, built on Docker, offers a lightweight and straightforward way to manage a swarm of Docker nodes using a manager-worker architecture across hosts.
Despite its strengths, Swarm struggles with handling much larger and more complex deployments due to its lack of advanced scheduling and third-party integrations needed in enterprise-grade environments.
The release of Docker Engine v29 broke Swarm's storage plugins and raised minimum API requirements, causing service outages. Many organizations are now gradually migrating away from Swarm to Kubernetes before the next breaking change forces their hand.
Amazon ECS
ECS removes the operational burden that comes with running containers at scale. It is built for convenience, not complexity. When your workloads begin to grow, the cracks begin to show. ECS on EC2 requires manual instance sizing; without careful rightsizing, fragmented capacity inflates your bill.
The moment your team needs to set up advanced traffic management, fine-grained autoscaling, custom scheduling, or service mesh capabilities, you are either building expensive workarounds or reaching for tools that were designed for Kubernetes in the first place.
There is also the risk of vendor lock-in: any future migration away from AWS becomes a costly undertaking.
HashiCorp Nomad
Nomad’s workload-agnostic scheduler orchestrates containers alongside raw binaries, Java apps and VMs, all within the same cluster. This flexibility comes at a cost.
Nomad has a much smaller ecosystem, with fewer monitoring integrations, security scanners and prebuilt operators. And with HashiCorp's $6.4 billion acquisition by IBM, Nomad's roadmap is now driven by IBM's enterprise priorities.
Rather than betting your container strategy on a platform that treats containers as just one of many workload types, controlled by a single vendor with an uncertain open-source future, it's best to consider an alternative solution.
Heroku and proprietary PaaS platforms
Heroku and other PaaS platforms abstract away the complexity of deployment. Teams don't have to touch servers, load balancers or deployment scripts. For startups and small teams, this is revolutionary.
But such simplicity comes with an invisible constraint: you give up control over deep infrastructure decisions.
That trade-off might not matter for small teams and simple applications, but larger teams that need to choose their runtime, customize their network topology, build advanced observability or run sidecar containers hit a wall fast.
6 reasons why every CIO should consider Kubernetes
Decision-makers must evaluate whether adopting Kubernetes is the right strategic investment for their teams, and what the real cost of not adopting an orchestration layer looks like at scale.
When not to go with Kubernetes
Kubernetes thrives in many scenarios, but might not be the best solution in every case. Some example cases where a Kubernetes alternative would be preferable include the following.
Stateful workloads with heavy persistence needs
While Kubernetes provides features such as StatefulSets and Persistent Volumes for managing stateful applications, such as databases and message queues, configuring them can get complex and introduces challenges around data replication, failover procedures and backups. Managed database services such as Amazon RDS, Cloud SQL and DynamoDB can handle this more reliably.
Small-scale or simple workloads
For startups or small teams with low-traffic applications, Kubernetes can be overkill.
Legacy monolithic applications
Legacy applications are often monoliths that were not designed to run as microservices, and they would require significant refactoring to run well on Kubernetes.
Teams without platform engineering capacity
Without dedicated engineers, a poorly maintained cluster will generate more outages than it prevents.
Wisdom Ekpotu is a DevOps engineer and technical writer focused on building infrastructure with cloud-native technologies.