
Node Affinity: Key Concepts, Examples, and Troubleshooting


What is Node Affinity?

Node affinity is one of the mechanisms Kubernetes provides for controlling where pods are scheduled. It lets you define nuanced conditions that influence which Kubernetes nodes are preferred to run a specific pod.

The Kubernetes scheduler can deploy pods automatically to Kubernetes nodes, without further instructions. However, in many cases you might want to define that pods should run only on specific nodes in a cluster, or avoid running on specific nodes. For example:

  • A pod might need specialized hardware such as an SSD drive or a GPU
  • Pods that communicate frequently might need to be collocated with each other
  • Pods that have high computational requirements might need to be confined to specific nodes

Node affinity, inter-pod affinity, and anti-affinity can help you support these and other use cases, defining flexible rules that govern which pods will schedule to which nodes.

Scheduling in Kubernetes is the process of choosing the right node to run a pod or set of pods.

Understanding node affinity requires a basic understanding of scheduling, the automated process that places pods on nodes. The default Kubernetes scheduler is kube-scheduler, but it is also possible to use a custom scheduler.

A basic Kubernetes scheduling approach is to use the node selector, available in all Kubernetes versions since 1.0. The node selector lets users define labels (key-value pairs) on nodes, which can then be matched when scheduling pods.

You specify nodeSelector in the PodSpec as a set of key-value pairs. When a key-value pair exactly matches a label defined on a node, the node selector matches the associated pod to that node. You can add a label to a node using this command:

kubectl label nodes <node-name> <key>=<value>

A node selector appears in the PodSpec as follows:

spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    <key>: <value>

The node selector is the preferred method of matching pods to nodes for simpler use cases in small Kubernetes clusters. However, this method can become inadequate for more complex use cases in larger clusters. Kubernetes affinity gives administrators a much higher degree of control over the scheduling process.
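To make the snippet above concrete, here is a sketch of a complete pod manifest using a node selector. The disktype: ssd label is an assumption for illustration; substitute whatever label you applied to your nodes with kubectl label.

```yaml
# Sketch: this pod schedules only to nodes labeled disktype=ssd.
# The label key/value are assumptions; use the labels in your cluster.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd
```

If no node carries a matching label, the pod stays Pending, which is the main limitation node affinity's soft rules are designed to address.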



Kubernetes Node Selector vs. Node Affinity

In Kubernetes, you can create flexible definitions to control which pods should be scheduled to which nodes. Two such mechanisms are node selectors and node affinity. Both of them are defined in the pod template.

Both node selectors and node affinity can use Kubernetes labels—metadata assigned to a node. Labels allow you to specify that a pod should schedule to one of a set of nodes, which is more flexible than manually attaching a pod to a specific node.

The difference between node selector and node affinity can be summarized as follows:

  • A node selector is defined via the nodeSelector field in the pod template. It contains a set of key-value pairs that specify labels. The Kubernetes scheduler checks if a node has these labels to determine if it is suitable for running the pod.
  • Node affinity is an expressive language that uses soft and hard scheduling rules, together with logical operators, to enable more granular control over pod placement.

Node Affinity, Inter-Pod Affinity and Anti-affinity

Kubernetes Node Affinity

Node affinity defines under which circumstances a pod should schedule to a node. There are two types of node affinity:

  • Hard affinity—also known as required node affinity. Defined in the pod template under spec:affinity:nodeAffinity:requiredDuringSchedulingIgnoredDuringExecution. This specifies conditions that a node must meet for a pod to schedule to it.
  • Soft affinity—also known as preferred node affinity. Defined in the pod template under spec:affinity:nodeAffinity:preferredDuringSchedulingIgnoredDuringExecution. This specifies conditions that a node should preferably meet; if no node meets them, the pod can still be scheduled (as long as hard affinity criteria are met).

Both types of node affinity use logical operators including In, NotIn, Exists, and DoesNotExist.

It is a good idea to define both hard and soft affinity rules for the same pod. This makes scheduling more flexible and easier to control across a range of operational situations.
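As a sketch, combining both rule types in one pod template might look like this. The label keys disktype and region, and the weight value, are assumptions for illustration:

```yaml
# Sketch of combined hard and soft node affinity (label keys are assumptions).
spec:
  affinity:
    nodeAffinity:
      # Hard rule: the pod will only ever schedule to nodes with disktype=ssd.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      # Soft rule: among eligible nodes, prefer those labeled region=us-east-1.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: region
            operator: In
            values:
            - us-east-1
```

The scheduler filters nodes by the required rule first, then uses the preferred rule's weight to rank the remaining candidates.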

Kubernetes Inter-Pod Affinity

Inter-pod affinity lets you specify that certain pods should only schedule to a node together with other pods. This enables various use cases where collocation of pods is important, for performance, networking, resource utilization, or other reasons.

Pod affinity works similarly to node affinity:

  • Supports hard and soft affinity via the spec:affinity:podAffinity field of the pod template.
  • Uses the same logical operators.
  • The Kubernetes scheduler evaluates pods currently running on a node, and if they meet the conditions, it schedules the new pod on the node.
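A hedged sketch of a pod affinity rule: the pod below asks to be placed on a node that already runs a pod labeled app=cache. The app=cache label is an assumption; topologyKey defines the scope of "together" (here, the same node).

```yaml
# Sketch: colocate this pod with pods labeled app=cache (label is an assumption).
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cache
        # Same node; use topology.kubernetes.io/zone to colocate
        # within the same availability zone instead.
        topologyKey: kubernetes.io/hostname
```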

Kubernetes Anti-Affinity

Anti-affinity is a way to define under which conditions a pod should not be scheduled to a node. Common use cases include:

  • Avoiding single point of failure—when distributing a service across multiple pods, it is important to ensure each pod runs on a separate node. Anti-affinity can be used to achieve this.
  • Preventing competition for resources—certain pods might require ample system resources. Anti-affinity can be used to place them away from other resource-hungry pods.

Kubernetes provides the spec:affinity:podAntiAffinity field in the pod template, which allows you to prevent pods from scheduling with each other. You can use the same operators to define criteria for pods that the current pod should not be scheduled with.
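For example, the following sketch (assuming the workload's pods carry the label app=web) keeps replicas on separate nodes to avoid a single point of failure:

```yaml
# Sketch: no two pods labeled app=web may share a node (label is an assumption).
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web
        # One pod per hostname: each replica lands on a different node.
        topologyKey: kubernetes.io/hostname
```

Note that a required anti-affinity rule like this caps the replica count at the number of eligible nodes; use the preferred variant if you want a best-effort spread instead.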

Note that there is no corresponding “node anti-affinity” field. To define which nodes a pod should not schedule to, use the Kubernetes taints and tolerations feature (learn more in our guide to Kubernetes nodes).

Quick Tutorial: Assigning Pods to Nodes Using Node Affinity

Let’s see how to use node affinity to assign pods to specific nodes. The code examples below are adapted from the official Kubernetes documentation.

Prerequisites

To run this tutorial, you need to have:

  • A Kubernetes cluster with at least two nodes in addition to the control plane.
  • One of the nodes should have an SSD drive and be labeled accordingly (for example, kubectl label nodes <node-name> disktype=ssd), and the other should not. You can also use two machines with some other difference between them, but then you’ll have to adjust the labels and affinity rules in the examples below.
  • kubectl command-line installed on your local machine and communicating with the cluster.
  • Kubernetes control plane with version v1.10 or higher.

Schedule a Pod Using Required Node Affinity

The following manifest specifies a required node affinity, which states that pods created from this template should only be scheduled to a node that has a disk type of ssd.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent

A few important points about this code:

  • The spec:affinity:nodeAffinity section defines the affinity rules for the pod
  • This manifest uses requiredDuringSchedulingIgnoredDuringExecution, which specifies required node affinity
  • The matchExpressions field specifies the affinity rules. In this case, the pod should schedule on a node if disktype is matched to ssd using the operator In.

Here is how to create a pod using this manifest and verify it is scheduled to an appropriate node:

1. Create a pod based on the manifest using this code:

kubectl apply -f https://k8s.io/examples/pods/pod-nginx-required-affinity.yaml

2. Use the following command to check where the pod is running:

kubectl get pods --output=wide

3. The output will look similar to this. Check that the node the pod scheduled on is the one running the SSD drive:

NAME     READY     STATUS    RESTARTS   AGE    IP           NODE
nginx    1/1       Running   0          13s    10.200.0.4   worker0


Schedule a Pod Using Preferred Node Affinity

The following manifest specifies a preferred node affinity, which states that pods created from this template should preferably be scheduled to a node that has a disk type of ssd, but if this criterion does not exist, the pod can still be scheduled.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent


A few important points about this code:

  • The spec:affinity:nodeAffinity section defines the affinity rules for the pod
  • This manifest uses preferredDuringSchedulingIgnoredDuringExecution, which specifies preferred node affinity
  • Just like in the previous example, the matchExpressions field specifies the affinity rules, stating that the pod should schedule on a node if disktype is matched to ssd using the operator In (but this rule is only “preferred” and not mandatory)

To create a pod using this manifest and verify it is scheduled to an appropriate node, use the same instructions as above—apply the manifest, run get pods and check it was scheduled on the node with the SSD drive.

Kubernetes Node Affinity Errors

Pod Remains Pending Due to Affinity or Anti-Affinity Rules

In some cases, a pod will remain pending and fail to schedule due to overly strict affinity or anti-affinity rules. If a pod is pending and you suspect it is due to affinity rules, you can query its affinity rules:

kubectl get pod <PENDING_POD_NAME> -ojson | jq '.spec.affinity.podAntiAffinity'

Replace podAntiAffinity with podAffinity or nodeAffinity if that is the type of affinity applied to the pod.

The output looks something like this:

{
  "requiredDuringSchedulingIgnoredDuringExecution": [
    {
      "labelSelector": {
        "matchExpressions": [
          {
            "key": "app",
            "operator": "In",
            "values": [
              "nginx"
            ]
          }
        ]
      },
      "topologyKey": "kubernetes.io/hostname"
    }
  ]
}

How to diagnose issues based on the affinity configuration:

  • Pay attention to requiredDuringSchedulingIgnoredDuringExecution—if this appears at the top, it indicates that affinity rules are required. Consider changing them to preferred, which will allow the pod to schedule even if the conditions are not met.
  • Check the matchExpressions section, identify which rule specifies pod affinity, and verify whether the required pods actually run on the intended nodes.
  • Check for errors in your affinity rules.
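In addition to inspecting the affinity spec, the scheduler records events on the pending pod that usually state why scheduling failed. A sketch (the pod name is a placeholder, and the exact event wording varies by Kubernetes version):

```shell
# Show the pod's events; look for FailedScheduling messages such as
# "node(s) didn't match Pod's node affinity/selector".
kubectl describe pod <PENDING_POD_NAME>

# Or pull just the events for that pod:
kubectl get events --field-selector involvedObject.name=<PENDING_POD_NAME>
```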

Volume-Node Affinity Conflict

A common error is that pods, which have node affinity rules, fail to schedule because the node is scheduled to one cloud availability zone (AZ) but the pod’s PersistentVolumeClaim (PVC) binds to a PersistentVolume (PV) in a different zone. This could also happen if there is another difference between the criteria of the node and the PV, which violates the affinity rules.

When this happens, the pod will always fail to schedule to a node, even though the node is in the correct availability zone (or meets the other affinity rules).

To diagnose a volume-node affinity conflict, run two commands:

  • kubectl describe nodes <node-name>—this will show you the current status of the node you want the pod to schedule to. Alternatively, you can view node details via the management console of your cloud provider.
  • kubectl get pv—this will show you a list of PVs in the cluster. Identify the relevant PV and then run kubectl describe pv <pv-name> to see its details.
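To compare the two zones directly, you can query the PV's node affinity and the node's zone label. This is a sketch: the names are placeholders, and it assumes your cluster uses the standard topology.kubernetes.io/zone well-known label (older clusters may use failure-domain.beta.kubernetes.io/zone instead).

```shell
# Zone(s) the PV is pinned to, recorded under the PV's nodeAffinity:
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'

# Zone the node is in (dots in the label key are escaped for jsonpath):
kubectl get node <node-name> -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'
```

If the two zones differ, you have found the conflict.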

Once you identify a volume-node conflict, there are two ways to fix the issue:

  • Delete the PV and PVC and recreate them in the same AZ as the node—run kubectl delete pvc <pvc-name> and kubectl delete pv <pv-name>, then recreate them in the correct AZ.
  • Delete the pod deployment and recreate it in the same AZ as the PV—run kubectl delete deployment <deployment-name> and re-apply the deployment in the correct AZ.

These fixes will help with the most basic affinity conflicts, but in many cases these conflicts involve multiple moving parts in your Kubernetes cluster, and will be very difficult to diagnose and resolve without dedicated troubleshooting tools. This is where Komodor comes in.

Solving Kubernetes Node Errors Once and for All with Komodor

Kubernetes troubleshooting relies on the ability to quickly contextualize the problem with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating service-level incidents with other events happening in the underlying infrastructure.

Komodor can help with our new ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can rapidly:

  • See service-to-node associations
  • Correlate service and node health issues
  • Gain visibility over node capacity allocations, restrictions, and limitations
  • Identify “noisy neighbors” that use up cluster resources
  • Keep track of changes in managed clusters
  • Get fast access to historical node-level event data

Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues, acting as a single source of truth (SSOT) for all of your K8s troubleshooting needs. Komodor provides:

  • Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when. 
  • In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs, and more. All within one pane of glass with easy drill-down options.
  • Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system. 
  • Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.

If you are interested in checking out Komodor, use this link to sign up for a Free Trial
