Komodor is a Kubernetes management platform that empowers everyone from Platform engineers to Developers to stop firefighting, simplify operations and proactively improve the health of their workloads and infrastructure.
Proactively detect & remediate issues in your clusters & workloads.
Easily operate & manage K8s clusters at scale.
Reduce costs without compromising on performance.
Empower developers with self-service K8s troubleshooting.
Simplify and accelerate K8s migration for everyone.
Fix things fast with AI-powered root cause analysis.
Automate and optimize AI/ML workloads on K8s
Easily manage Kubernetes Edge clusters
Explore our K8s guides, e-books and webinars.
Learn about K8s trends & best practices from our experts.
Listen to K8s adoption stories from seasoned industry veterans.
The missing UI for Helm – a simplified way of working with Helm.
Visualize Crossplane resources and speed up troubleshooting.
Validate, clean & secure your K8s YAMLs.
Navigate the community-driven K8s ecosystem map.
Your single source of truth for everything regarding Komodor’s Platform.
Keep up with all the latest feature releases and product updates.
Leverage Komodor’s public APIs in your internal development workflows.
Get answers to any Komodor-related questions, report bugs, and submit feature requests.
Kubernetes 101: A comprehensive guide
Expert tips for debugging Kubernetes
Tools and best practices
Kubernetes monitoring best practices
Understand Kubernetes & Container exit codes in simple terms
Exploring the building blocks of Kubernetes
Cost factors, challenges and solutions
Kubectl commands at your fingertips
Understanding K8s versions & getting the latest version
Rancher overview, tutorial and alternatives
Kubernetes management tools: Lens vs alternatives
Troubleshooting and fixing 5xx server errors
Solving common Git errors and issues
Who we are, and our promise for the future of K8s.
Have a question for us? Write us.
Come aboard the K8s ship – we’re hiring!
Hear’s what they’re saying about Komodor in the news.
Kubernetes taints are a feature that allows nodes (physical or virtual machines) in a Kubernetes cluster to repel a set of pods. In other words, they ensure that only certain pods can schedule onto certain nodes. This is achieved by marking the nodes with a taint, which then repels pods that do not tolerate the taint.
The concept of tolerations complements taints in Kubernetes. A toleration is a feature that allows a pod to schedule onto a node with a matching taint. Tolerations and taints work together to ensure that pods are not scheduled onto inappropriate nodes.
The taints and tolerations concept in Kubernetes is very useful in multi-tenant environments, where you don’t want certain pods to use resources from other pods. It can also be used when you have special hardware that needs to be reserved for specific tasks, and you don’t want other pods to use it.
This is part of a series of articles about Kubernetes management
In a Kubernetes cluster, there might be nodes that are dedicated to specific tasks. For example, you might have a set of nodes that are dedicated to running database pods, and you don’t want other types of pods to use these nodes.
By applying taints to these dedicated nodes, you can ensure that only pods with the corresponding tolerations can be scheduled onto them. This is a powerful way to manage resources in a Kubernetes cluster.
Another common use case for Kubernetes Taints is in situations where you have nodes with special hardware. For example, you might have nodes with GPUs that are needed for machine learning tasks.
In such cases, you can apply taints to these nodes and ensure that only pods with the appropriate tolerations can be scheduled onto them. This prevents other pods from using the special hardware, ensuring that it is reserved for the tasks that need it.
Taint-based evictions are used to gracefully remove pods from nodes under certain conditions. This is particularly useful when you need to clear a node for maintenance, upgrade hardware, or rebalance the workload. When a taint is applied to a node with the NoExecute effect, any pods that do not tolerate the taint are evicted immediately or after a specified grace period. This mechanism ensures that pods are not abruptly terminated, allowing for a controlled shutdown process.
NoExecute
Related content: Read our guide to cluster autoscaler
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better use Kubernetes taints and tolerations:
Clearly label nodes to simplify taint and toleration management.
Maintain thorough documentation to understand and manage configurations easily.
Avoid over-tainting to prevent scheduling issues and inefficiencies.
Reserve nodes for critical workloads with specific taints and tolerations.
Avoid overlapping tolerations for better resource allocation and predictability.
While Kubernetes Taints and tolerations are powerful features for managing resources in a Kubernetes cluster, they are not the only tools available. Kubernetes also supports pod anti-affinity and node affinity, which provide additional ways to control how pods are scheduled onto nodes.
Pod anti-affinity allows you to prevent certain pods from being scheduled onto the same node. This can be useful in scenarios where you want to ensure high availability of your applications.
Node affinity allows you to specify that certain pods should be scheduled onto certain nodes. This can be useful when you have nodes with special hardware or when you want to ensure that pods are scheduled onto nodes in a specific geographic region.
It is important to note that taints and tolerations don’t ensure pods will be scheduled on the nodes, they only specify that specific pods can run on specific nodes and reject all the others. Affinity, by contrast, can enforce that pods will run on only on specific nodes.
Taints are applied to nodes and tolerations are applied to pods. When a pod with a toleration for a certain taint is scheduled, it can run on a node with that taint.
To assign a taint to a node, you can use the kubectl taint command. Here’s an example:
kubectl taint
kubectl taint nodes node1 key=value:NoSchedule
In this example, node1 is the name of the node we’re tainting, key=value is the taint, and NoSchedule is the effect, which means that no new pods will be scheduled onto this node unless they tolerate the taint.
node1
key=value
NoSchedule
Note: You can use tags for key=value, for example: environment=dev
To assign a toleration to a pod, add the tolerations section to your pod specification:
tolerations
apiVersion: v1 kind: Pod metadata: name: kubenode spec: containers: - name: my-container image: nginx:latest tolerations: - key: "key" operator: "Equal" value: "value" effect: "NoSchedule"
In this pod spec, the pod kubenode will tolerate the taint key=value:NoSchedule.
kubenode
key=value:NoSchedule
Now that we understand how to use Kubernetes taints and tolerations, let’s look at some best practices that will help you manage cluster resources more effectively.
Labels are key/value pairs (also known as tags by cloud providers) that can be attached to Kubernetes objects, including nodes and pods. They can be used to organize and to select subsets of objects. When it comes to taints and tolerations, labels are very helpful. By clearly and descriptively labeling your nodes, you can better understand the purpose of each node and why it might have certain taints.
This practice also makes it easier to apply taints and tolerations. For example, if you have a group of nodes that are dedicated to running a specific type of workload, you can label these nodes and then apply a taint to all nodes with that label in one command.
Keeping a record of your taints and tolerations is critical for maintaining a healthy and efficient Kubernetes cluster. As your cluster grows, it can become difficult to remember why certain taints and tolerations were applied.
Documentation helps to prevent misunderstandings and mistakes and makes it easier for new team members to understand your cluster configuration. It’s also useful for troubleshooting. If a pod is not being scheduled as expected, your documentation could help you determine if a missing or incorrect toleration is the cause.
It can be tempting to apply taints to all nodes in your cluster to control where pods are scheduled. However, this can lead to problems. For example, if all nodes are tainted, it may be impossible to schedule some pods, causing them to remain pending indefinitely.
Instead, try to use taints sparingly and strategically. Remember that the purpose of taints is not to prevent pods from being scheduled, but to ensure that the right pods are scheduled on the right nodes.
Taints and tolerations can be used to ensure that critical workloads are given priority on certain nodes. For example, you might have a group of nodes that are reserved for critical workloads. By applying a taint to these nodes, you can prevent other pods from being scheduled on them.
Then, you can add a toleration to your critical pods that allows them to be scheduled on these nodes. This helps to ensure that your critical workloads always have the resources they need to run efficiently.
While it’s possible for a pod to tolerate multiple taints, it’s best to avoid this if possible. Overlapping tolerations can make it more difficult to predict where pods will be scheduled and can lead to inefficient use of resources.
Instead of using overlapping tolerations, try to design your taints and tolerations so that each pod tolerates a unique set of taints. This will make it easier to manage your cluster and ensure that resources are used effectively.
Komodor has launched a new feature, Node Termination Enrichment, targeting two main user groups: Ops teams and Developers/Data teams. For Ops teams, this feature enables understanding the impact of actions taken inside and outside Kubernetes (K8s) clusters on the availability of applications and services. This is crucial as their actions often affect the cluster without a clear link between the actions and their consequences. Developers and Data teams, on the other hand, can use this feature to quickly identify infrastructure-related failures in their applications and data engineering jobs, allowing them to determine whether to address issues themselves or escalate them. This feature brings significant value in identifying and addressing application failures due to infrastructure changes or issues.
From a technical standpoint, the Node Termination Enrichment feature enhances Komodor’s capabilities in several ways. It extends coverage beyond K8s clusters into the infrastructure layer, enriches troubleshooting of availability issues by correlating them with node terminations, and expands value in cases like orphaned pod terminations. Key features include identification of cloud providers (AWS, GKE, Azure) with enriched metadata (OS, region, zone), determination of termination reasons (such as spot interruption or autoscaling events), and analysis of the impact on services, jobs, and pods.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.
Share:
and start using Komodor in seconds!