An optimum resource pool consists of homogeneous resources, which applications are specifically designed to consume. However, hosting resources are rarely homogeneous, so container and Kubernetes users have long faced the problem of how to steer applications away from sub-optimal or incompatible resources.
Efficient use of a resource pool is a constant problem, with an ever-evolving set of remedies; Kubernetes taints and tolerations are one of the most recent.
Kubernetes scheduling options
In Kubernetes, containers are grouped into pods, which provide abstractions of the required hosting resources for the containers they hold. Kubernetes pods are assigned to nodes, which provide actual hosting resources, such as VMs.
The original and popular way to align Kubernetes pods with nodes involves labels, which identify specific resource features of nodes. Admins create a matching set of specifications for the applications' pods. Upon deployment, properly labeled nodes are preferred through affinities, which enact further constraints on which nodes the pods will run on.
Affinity works as a kind of gravitational pull to attract pods to nodes. The problem with affinity -- for some users -- is that there are too many suitable nodes, so scheduling is still complex. To avoid management difficulties when using labels and affinities, admins must precisely characterize node capabilities and pod requirements. Taints are the opposite of affinity; they're an anti-gravity mechanism that repels pods and prevents certain pods from being scheduled on "tainted" nodes.
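As a sketch of how labels and affinity work together, consider a hypothetical `disktype` label. The node name, label and image below are illustrative, not part of any standard:

```yaml
# Hypothetical pod spec: prefer nodes labeled disktype=ssd,
# applied after labeling a node with:
#   kubectl label nodes node1 disktype=ssd
apiVersion: v1
kind: Pod
metadata:
  name: ssd-preferring-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: app
    image: registry.example.com/app:latest  # placeholder image
```

Because this uses the preferred (soft) form of node affinity, the pod still schedules onto an unlabeled node if no `ssd` node has capacity -- the "gravitational pull" described above, rather than a hard constraint.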
Tolerations are refinements of the Kubernetes taint process; taints apply to nodes, but tolerations apply to pods. Admins apply a series of taints to nodes to reference certain features that render those nodes sub-optimal to host pods. If a given pod or pod set does not tolerate all of a node's taints, that pod or set will not be scheduled on the tainted node. The devil is in the details, and taints and tolerations are only one of several ways to guide pod-to-node deployments.
Kubernetes taints and tolerations process
Affinities aren't an effective way to reserve classes of nodes for specific types of pods and ensure that node capacity is available for critical applications. The taints-and-tolerations approach is a better strategy for this, as taints can drive other pods away from reserved nodes. It does, however, require careful administration.
Both Kubernetes taints and tolerations are based on keys, which are abstract terms that restrict scheduling. A key can represent a physical resource, such as a GPU, or it can reflect a reservation of a group of resources. Admins use identifiers, such as "NoGPU" or "CoreAppsOnly," to taint nodes, and give pods tolerations that list each key they accept. For instance, pods that don't need a GPU, or that are part of a "CoreApp" designation, would have those tolerations specified.
This prevents pods from using a node that's compatible in technical terms but violates an administrative capacity plan. A non-GPU application can run with or without a GPU, but if GPU nodes are scarce, it's better if applications that don't need one are scheduled elsewhere. Admins can achieve this with Kubernetes taints and tolerations.
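One common way to express this capacity plan -- sketched here with hypothetical key names, node names and images -- is to taint the GPU nodes so that only pods carrying a matching toleration can land on them:

```yaml
# Hypothetical reservation of GPU nodes, after tainting with:
#   kubectl taint nodes gpu-node1 gpu=reserved:NoSchedule
# Pods without a matching toleration are steered to other nodes.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "reserved"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest  # placeholder image
```

Note that the toleration doesn't force this pod onto a GPU node; it only permits the placement. Guaranteeing the placement requires affinity as well, as discussed below.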
On the nodal side, there are three levels of taint:
- NoSchedule. Unless the taint is tolerated, the pod cannot be scheduled for deployment on the node.
- PreferNoSchedule. The scheduler avoids -- but doesn't disallow -- scheduling pods that don't tolerate the taint.
- NoExecute. Pods that don't tolerate the taint will be evicted if they are already running on the node.
Pods can specify toleration only; there are no degrees of toleration.
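The three levels above are expressed as taint effects on the node, while the pod's side is a simple match. A sketch, using a hypothetical `reserved` key:

```yaml
# Hypothetical taints illustrating the three effects:
#   kubectl taint nodes node1 reserved=core:NoSchedule        # hard exclusion
#   kubectl taint nodes node2 reserved=core:PreferNoSchedule  # soft exclusion
#   kubectl taint nodes node3 reserved=core:NoExecute         # exclusion plus eviction
# A pod's toleration is binary -- it either matches a taint or it doesn't:
tolerations:
- key: "reserved"
  operator: "Equal"
  value: "core"
  effect: "NoExecute"
  tolerationSeconds: 300  # optional, NoExecute only: tolerate for 5 minutes before eviction
```

The optional tolerationSeconds field is the closest thing to a "degree" of toleration, and it applies only to the NoExecute effect: the pod remains bound to the tainted node for that long, then is evicted.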
Condition-based Kubernetes taints
Even a little bit of anarchy in Kubernetes taint and toleration management will create errors and even encourage people to game the approach. If application developers or users can define tolerations, everything will tolerate everything -- and, ultimately, taints will have no effect. Similarly, it's possible for zealous IT operations admins to overprotect a node with taints until nothing can run on it.
The above examples are for static taints, but admins can adjust or replace taints to reflect changes in policy, status or configuration. One of the goals behind this taints-and-tolerations concept is to reflect changes in conditions of nodes or infrastructure that should result in changes in hosting policies. IT admins can test node conditions, but the process is a bit unwieldy when applied at the scheduling level.
It's possible to taint nodes based on current conditions, and these condition-based taints are one reason for the NoExecute, or eviction, capability. Kubernetes -- as of version 1.17 -- automatically taints nodes based on the nodal resource state, and the scheduler checks for taints, rather than for node conditions. This enables admins to change the NoSchedule or NoExecute status of a taint based on either node conditions or some external policy factor. For example, a pod might be scheduled onto a node because the taint wasn't present during the initial pod-to-node mapping process, but current conditions might require the eviction of any pods that don't tolerate the taint to make room for another class of pods.
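Kubernetes applies these condition-based taints under well-known keys, such as node.kubernetes.io/not-ready and node.kubernetes.io/unreachable, and pods can tolerate them for a bounded time. A sketch of a pod that rides out a short node outage before eviction:

```yaml
# Tolerations for condition-based taints that Kubernetes applies
# automatically. This pod stays on a node for up to 10 minutes after
# the node is marked not-ready or unreachable, then is evicted.
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 600
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 600
```

Without such tolerations, Kubernetes adds default ones with a shorter grace period, so pods on a failed node are normally evicted automatically after a few minutes.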
Kubernetes taints and tolerations aren't designed to exclude other options, especially labels and affinities. For example, to make a set of nodes and pods exclusive to each other, combine labels and affinities with taints and tolerations. However, the more complex the deployment and redeployment policies are, the easier it is to make a serious mistake that either opens too many nodes or prevents some pods from running anywhere at all.
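The exclusive-pool pattern can be sketched as follows; the `pool` key, node names and image are hypothetical:

```yaml
# Hypothetical dedicated node pool: nodes are both labeled and tainted:
#   kubectl label nodes core-node1 pool=core
#   kubectl taint nodes core-node1 pool=core:NoSchedule
# The taint keeps other pods off these nodes; the required affinity
# keeps these pods off all other nodes.
apiVersion: v1
kind: Pod
metadata:
  name: core-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: pool
            operator: In
            values:
            - core
  tolerations:
  - key: "pool"
    operator: "Equal"
    value: "core"
    effect: "NoSchedule"
  containers:
  - name: app
    image: registry.example.com/core-app:latest  # placeholder image
```

Both halves are needed: the taint alone still lets these pods schedule elsewhere, and the affinity alone still lets other pods consume the reserved nodes.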
Condition-based taints are the strongest feature in the taints-and-tolerations model because they enable sweeping changes in deployment policy. However, administer Kubernetes taints and tolerations carefully, as misused features can be a gateway to mistakes.