Kubernetes Nodes – The Complete Guide

A Kubernetes node is a physical or virtual machine that runs pods in a Kubernetes cluster. Each node is managed by the control plane and typically runs kubelet, a container runtime, and kube-proxy. Nodes provide the CPU, memory, storage, and networking resources that Kubernetes uses to run containerized workloads.

There are two types of nodes:

  • Control plane nodes: these run the Kubernetes control plane, which manages the entire cluster. A cluster must have at least one control plane node, and may have several for redundancy. Control plane nodes run the core components that manage the cluster, including the API server, scheduler, controller manager, and etcd.
  • Worker nodes: these are the nodes on which you run containerized application workloads. Each worker node runs the kubelet, an agent that lets the Kubernetes control plane manage the node. Organizations use worker nodes to run a wide variety of workloads, as a core component of modern DevOps processes.

Kubernetes Nodes vs Pods

A Kubernetes node is a single machine in a cluster, and it also serves as an abstraction. Instead of managing specific physical or virtual machines, you can treat the nodes as a pool of CPU and RAM on which to run containerized workloads. When an application is deployed to the cluster, Kubernetes distributes its pods across the nodes. If a node becomes unavailable, workload controllers can recreate the affected pods on other nodes.

A Kubernetes pod is the smallest deployable unit in a Kubernetes cluster. A pod includes one or more containers, and operators can attach additional resources to a pod, such as storage volumes. Pods are ephemeral by design: they are disposable, and when a pod managed by a controller such as a Deployment fails, it is replaced by an identical one. Each pod has its own IP address, allowing it to communicate with other pods on the same node or on other nodes.

| Concept | What it is | Managed by | Example |
|---|---|---|---|
| Node | Machine that provides compute resources | Control plane + kubelet | VM, bare-metal server |
| Pod | Smallest deployable workload unit | Scheduler + kubelet | One app container plus sidecar |
| Container | Runtime unit inside a pod | Container runtime | NGINX, Redis, app process |

Kubernetes Node vs Pod

The Kubernetes scheduler, which runs as part of the control plane, finds the eligible worker nodes for each new pod and assigns the pod to one of them. Workload resources such as Deployments contain a pod template along with the number of replicas to run, and the pod specification can constrain which types of nodes are eligible. When a node fails, or runs short of resources and evicts a pod, the controller that owns the pod creates a replacement, which the scheduler places on another suitable node.
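
The node filtering step described above can be illustrated with a small Python model. This is a minimal sketch, not the scheduler's actual implementation: the `Node` class, the `eligible_nodes` function, and the node names are hypothetical, and only readiness, CPU, and memory are considered.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_millicores: int  # allocatable CPU not yet requested by other pods
    free_memory_mib: int      # allocatable memory not yet requested by other pods
    ready: bool = True


def eligible_nodes(nodes, cpu_request_millicores, memory_request_mib):
    """Return the names of Ready nodes that can fit the pod's resource requests."""
    return [
        n.name
        for n in nodes
        if n.ready
        and n.free_cpu_millicores >= cpu_request_millicores
        and n.free_memory_mib >= memory_request_mib
    ]


# Hypothetical cluster state used only for illustration.
nodes = [
    Node("node-a", free_cpu_millicores=500, free_memory_mib=1024),
    Node("node-b", free_cpu_millicores=2000, free_memory_mib=4096),
    Node("node-c", free_cpu_millicores=4000, free_memory_mib=8192, ready=False),
]

# node-a is too small and node-c is not Ready, so only node-b qualifies.
print(eligible_nodes(nodes, cpu_request_millicores=1000, memory_request_mib=2048))  # ['node-b']
```

The real scheduler applies many more filters (taints, affinity, volumes, ports) and then scores the remaining nodes before choosing one.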

 
Tips from the expert

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. He’s a big believer in dev empowerment and moving fast, has worked at eBay, Forter and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps”, an avid public speaker that loves talking about things such as cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

In my experience, here are tips that can help you better manage Kubernetes nodes:

Regularly update nodes

Keep your nodes updated with the latest security patches and Kubernetes versions.

Monitor node health

Use tools like Prometheus and Grafana to monitor node health and performance.

Implement node taints and tolerations

Control pod placement on nodes using taints and tolerations.

Use node affinity and anti-affinity

Define rules to influence pod scheduling based on node labels.

Balance workloads

Distribute workloads evenly across nodes to avoid overloading.

Kubernetes Node Components

Here are the primary software components that run on every Kubernetes node:

| Component | Runs on | What it does | Useful command |
|---|---|---|---|
| kubelet | Every node | Ensures pods and containers are running as expected | systemctl status kubelet |
| Container runtime | Every node | Runs containers through CRI, usually containerd or CRI-O | crictl ps |
| kube-proxy | Usually every node | Maintains network rules for Kubernetes Services | kubectl get pods -n kube-system -o wide |
| CNI plugin | Every node | Handles pod networking | kubectl get pods -n kube-system |

Kubernetes node components

kubelet

The kubelet is a software agent that runs on Kubernetes nodes and communicates with the cluster control plane. It allows the control plane to monitor the node, see what it is running, and deliver instructions to the container runtime.

When the scheduler assigns a pod to a node, the kubelet on that node receives the pod’s specification (PodSpec) through the API server. The kubelet reads the details of the containers in the PodSpec and instructs the container runtime to pull the images from the registry and run the containers. From that point onwards, the kubelet is responsible for ensuring these containers are healthy and maintaining them according to the declarative configuration.

kube-proxy

kube-proxy enables networking on Kubernetes nodes, with network rules that allow communication between pods and entities outside the Kubernetes cluster. kube-proxy either forwards traffic directly or leverages the operating system packet filtering layer.

kube-proxy has historically supported three modes: iptables, ipvs, and userspace. The userspace mode was deprecated and then removed in Kubernetes v1.26, so it should not be used. iptables, the default mode on Linux, is suitable for clusters of moderate size, but it evaluates Service rules sequentially, which can slow routing as the number of Services grows. ipvs can support a larger number of Services because it looks up rules in hash tables instead of walking a sequential list. Recent Kubernetes releases also provide an nftables mode.

Container runtime

The container runtime, such as containerd or CRI-O, is the software component responsible for running containers on the node. Kubernetes itself does not start or stop containers or manage the basic container lifecycle; it delegates that work to the runtime. The kubelet interfaces with any container runtime that supports the Container Runtime Interface (CRI), giving it instructions according to the needs of the Kubernetes cluster.

Kubernetes no longer includes dockershim, which was removed in Kubernetes v1.24. Modern clusters typically use CRI-compatible runtimes such as containerd or CRI-O. If an environment still depends on Docker Engine as the node runtime, it requires a CRI-compatible adapter such as cri-dockerd.

Understanding Kubernetes Node Status

You can use the kubectl command line to view the status of a Kubernetes node.

kubectl describe node [node-name]

Here is an example of the status returned by a node:

Name:               kubernetes-node-861h
Role:
Labels:             kubernetes.io/arch=amd64
                    kubernetes.io/os=linux
                    kubernetes.io/hostname=kubernetes-node-861h
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:
CreationTimestamp:  Mon, 04 Sep 2017 17:13:23 +0800
Phase:
Conditions:
  ...
Addresses:          10.240.115.55,104.197.0.26
Capacity:
  ...
Allocatable:
  ...
System Info:
  ...

The most important parts of a node status report are Addresses, Conditions, Capacity/Allocatable, and System Info. The report also shows the node’s taints, which tell the Kubernetes scheduler which pods may be placed on the node: only pods with a matching toleration can be scheduled onto a tainted node. You can read more about node affinity, taints, and tolerations below.

| Node status / condition | What it means | What to check first |
|---|---|---|
| Ready | Node can accept pods | Confirm capacity and workload health |
| NotReady | Node cannot accept pods | kubelet, runtime, network, resource pressure |
| Unknown | Control plane stopped hearing from node | Node heartbeat, network, kubelet |
| DiskPressure | Node is low on disk | Image garbage collection, logs, ephemeral storage |
| MemoryPressure | Node is low on memory | Pod requests/limits, OOM events |
| PIDPressure | Too many processes | Process limits, runaway workloads |
| NetworkUnavailable | Node networking is not configured properly | CNI plugin, routes, kube-proxy |

Kubernetes node status explained

Addresses

The Addresses section of the node status report can represent the hostname, as reported by the kernel of the node, the external IP of the node, and the internal IP that is routable within the cluster. The way these fields are displayed depends on whether the node is a bare-metal machine or a compute instance running in the cloud.

Conditions

The Conditions section of the node status report looks like this:

...
Conditions:
  Type            Status   LastHeartbeatTime                LastTransitionTime               Reason             Message
  ----            ------   -----------------                ------------------               ------             -------
  OutOfDisk       Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
  MemoryPressure  Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure    Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
  Ready           Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
...

Here are some of the common conditions that appear in a node status report:

  • Ready: True if the node is healthy and ready to accept pods, and False if the node is not healthy and cannot run new pods. Unknown means that the node controller has not heard from the node within the node monitor grace period (40 seconds by default in older Kubernetes releases; check the default for your version).
  • DiskPressure: True if the node is close to running out of disk space.
  • MemoryPressure: True if the node is close to running out of memory.
  • PIDPressure: True if too many processes are running on the node, as reported by the kernel.
  • NetworkUnavailable: True if the node does not have networking configured properly.
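
The way these conditions are read can be summarized in a short Python sketch. It is only an illustration under simple assumptions: the `node_problems` function is hypothetical, and it takes a plain dictionary of condition type to status rather than a real API object.

```python
# For these conditions, a status of "True" signals a problem.
# Ready is the opposite: "True" is the healthy state.
PROBLEM_WHEN_TRUE = ("DiskPressure", "MemoryPressure", "PIDPressure", "NetworkUnavailable")


def node_problems(conditions):
    """Return a list of problems for a dict of condition type -> "True"/"False"/"Unknown"."""
    problems = []
    ready = conditions.get("Ready", "Unknown")
    if ready == "False":
        problems.append("NotReady")
    elif ready == "Unknown":
        problems.append("Ready=Unknown (no recent heartbeat)")
    for condition in PROBLEM_WHEN_TRUE:
        if conditions.get(condition) == "True":
            problems.append(condition)
    return problems


healthy = {"Ready": "True", "DiskPressure": "False", "MemoryPressure": "False"}
silent = {"Ready": "Unknown", "DiskPressure": "Unknown", "MemoryPressure": "Unknown"}
full_disk = {"Ready": "False", "DiskPressure": "True"}

print(node_problems(healthy))    # []
print(node_problems(silent))     # ['Ready=Unknown (no recent heartbeat)']
print(node_problems(full_disk))  # ['NotReady', 'DiskPressure']
```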

Capacity and Allocatable

The Capacity and Allocatable sections of the node status report look like this:

...
Capacity:
  cpu:        2
  hugePages:  0
  memory:     4046788Ki
  pods:       110
Allocatable:
  cpu:        1500m
  hugePages:  0
  memory:     1479263Ki
  pods:       110
...

These parameters reflect the node’s available resources, which determine how many pods can run on the node:

  • Capacity: the total amount of compute resources on the node
  • Allocatable: the amount of compute resources available to regular pods, which is the capacity minus what is reserved for the operating system, Kubernetes system daemons, and eviction thresholds
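
To make the difference concrete, the following Python sketch computes how much of the sample node above is held back from regular pods. It is a minimal illustration: the helper names are hypothetical, and the parsers only handle the quantity formats that appear in the sample (whole cores, the "m" millicore suffix, and the "Ki" suffix).

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "2" or "1500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_to_kib(quantity: str) -> int:
    """Convert a memory quantity with a Ki suffix, such as "4046788Ki", to KiB."""
    if not quantity.endswith("Ki"):
        raise ValueError("this sketch only handles the Ki suffix")
    return int(quantity[:-2])


# Values copied from the sample Capacity and Allocatable output.
capacity = {"cpu": "2", "memory": "4046788Ki"}
allocatable = {"cpu": "1500m", "memory": "1479263Ki"}

reserved_cpu_m = cpu_to_millicores(capacity["cpu"]) - cpu_to_millicores(allocatable["cpu"])
reserved_memory_kib = memory_to_kib(capacity["memory"]) - memory_to_kib(allocatable["memory"])

print(reserved_cpu_m)       # 500
print(reserved_memory_kib)  # 2567525
```

In this sample, 500 millicores and roughly 2.4 GiB of memory are not available to regular pods.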

System Info

The System Info section of the node status report looks like this:

...
System Info:
  Machine ID:                 8e025a21a4254e11b028584d9d8b12c4
  System UUID:                349075D1-D169-4F25-9F2A-E886850C47E3
  Boot ID:                    5cd18b37-c5bd-4658-94e0-e436d3f110e0
  Kernel Version:             4.4.0-31-generic
  OS Image:                   Debian GNU/Linux 8 (jessie)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://1.12.5
  Kubelet Version:            v1.6.9+a3d1dfa6f4335
  Kube-Proxy Version:         v1.6.9+a3d1dfa6f4335
ExternalID:                   15233045891481496305
Non-terminated Pods:          (9 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------  ----  ------------  ----------  ---------------  -------------
  ...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests   Memory Limits
  ------------  ----------    ---------------   -------------
  900m (60%)    2200m (146%)  1009286400 (66%)  5681286400 (375%)
Events:
  ...

This provides useful information about hardware and software on the node, including:

  • Operating system
  • Kernel version
  • Version of kubelet and kube-proxy
  • Container runtime details
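
The sample output above also ends with an Allocated resources section. Its percentages appear to be the summed pod requests and limits expressed as a share of the node's allocatable resources, which is why limits can exceed 100 percent. The Python sketch below reproduces the sample figures under that assumption; the function name is hypothetical, and truncating to a whole percent is an assumption that happens to match the sample.

```python
def percent_of_allocatable(amount: int, allocatable: int) -> int:
    """Share of allocatable as a whole percentage, truncated toward zero."""
    return amount * 100 // allocatable


allocatable_cpu_m = 1500                # 1500m, from the Allocatable sample
allocatable_mem_bytes = 1479263 * 1024  # 1479263Ki, from the Allocatable sample

print(percent_of_allocatable(900, allocatable_cpu_m))             # 60
print(percent_of_allocatable(2200, allocatable_cpu_m))            # 146
print(percent_of_allocatable(1009286400, allocatable_mem_bytes))  # 66
print(percent_of_allocatable(5681286400, allocatable_mem_bytes))  # 375
```

A limits figure above 100 percent means the node is overcommitted: pods could not all use their full limits at the same time.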

How Many Kubernetes Nodes Should be in a Cluster?

Here are three criteria you can use to determine the optimal number of nodes in your Kubernetes cluster:

  1. Performance: each node adds compute and memory resources to the cluster, so more nodes let you run more workloads or give each workload more resources. Some nodes might add special hardware resources, such as high-speed storage or graphics processing units (GPUs). A rule of thumb is to provision about 20% more compute resources than the expected workloads need, to allow for peaks and node failures.
  2. High availability: additional nodes in a cluster can enable high availability strategies, such as running multiple replicas of the same pod on different nodes. You can also add control plane nodes for redundancy, because a single control plane node is a single point of failure.
  3. Bare metal or virtual machines (VMs): you can add nodes to the cluster by adding more physical machines or by running additional VMs on the same bare metal machine. In the cloud, when using services like Amazon EC2, all resources are virtualized. Nodes that run as VMs on the same physical host share a failure domain: if the host fails, all the VMs (nodes) running on it shut down. However, VMs are generally more cost-effective.

There is no universal ideal number of Kubernetes nodes. The right node count depends on workload resource needs, availability requirements, failure-domain design, autoscaling strategy, and cloud-provider quotas. As of Kubernetes v1.36, Kubernetes supports clusters with up to 5,000 nodes, 110 pods per node, 150,000 total pods, and 300,000 total containers.
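
The 20% rule of thumb mentioned above can be turned into a rough node count estimate. The Python sketch below is only a starting point under simplifying assumptions: all nodes are the same size, only CPU and memory matter, and the function name and example figures are hypothetical.

```python
import math


def nodes_needed(total_cpu_cores, total_memory_gib,
                 node_allocatable_cpu_cores, node_allocatable_memory_gib,
                 headroom=0.20):
    """Estimate the node count for the summed pod requests plus a headroom fraction."""
    cpu_nodes = total_cpu_cores * (1 + headroom) / node_allocatable_cpu_cores
    memory_nodes = total_memory_gib * (1 + headroom) / node_allocatable_memory_gib
    # The scarcer resource decides how many nodes are required.
    return math.ceil(max(cpu_nodes, memory_nodes))


# Hypothetical workload: 40 cores and 160 GiB of requests,
# on nodes with 7.5 cores and 28 GiB allocatable each.
print(nodes_needed(40, 160, 7.5, 28))  # 7
```

In this example memory is the limiting resource (about 6.9 nodes' worth with headroom), so the estimate rounds up to 7 nodes.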

Kubernetes Node Sizing and Autoscaling

Choosing the right number of Kubernetes nodes is only part of the planning process. You also need to decide how large each node should be, how much spare capacity to keep available, and how the cluster should scale when workloads change.

There is no single ideal node size for every Kubernetes cluster. Smaller nodes can reduce the impact of a single node failure, because fewer pods are affected when one node goes down. Larger nodes can improve resource efficiency and reduce operational overhead, but they also increase the blast radius of a failure. The right choice depends on your workload patterns, availability requirements, pod density, cloud provider limits, and cost strategy.

As a practical starting point, teams should evaluate node sizing across four areas:

| Node sizing factor | What to consider |
|---|---|
| Workload resource needs | CPU, memory, storage, GPU, and network requirements for the pods running on the node |
| Pod density | How many pods can safely run on one node without creating scheduling, networking, or resource-pressure issues |
| Failure impact | How many workloads would be disrupted if a single node became unavailable |
| Cost efficiency | Whether the cluster is wasting capacity or frequently running out of allocatable resources |

Kubernetes supports large clusters, but there are still recommended scale boundaries. As of Kubernetes v1.36, Kubernetes is designed for clusters with no more than 5,000 nodes, 110 pods per node, 150,000 total pods, and 300,000 total containers. These limits are useful when designing high-scale clusters, but most teams should focus first on efficient node pools, resource requests, and autoscaling behavior.

Fewer large nodes vs. more small nodes

| Approach | Advantages | Tradeoffs | Best fit |
|---|---|---|---|
| Fewer large nodes | Better bin packing, fewer nodes to manage, potentially lower overhead | Larger failure impact if a node goes down | Stable workloads with predictable resource usage |
| More small nodes | Smaller failure impact, more flexible workload distribution | More node-level overhead and more objects to manage | Mixed workloads, high-availability services, variable demand |
| Specialized node pools | Better isolation for specific workloads such as GPU, memory-heavy, or storage-heavy pods | More scheduling complexity and potential unused capacity | AI/ML workloads, databases, regulated workloads, performance-sensitive apps |
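
The failure-impact tradeoff can be quantified with simple arithmetic. The Python sketch below compares two hypothetical clusters with the same total capacity, assuming equally sized nodes and that all work from a failed node is rescheduled onto the survivors; the function names are illustrative only.

```python
def capacity_lost_per_node_failure(node_count: int) -> float:
    """Fraction of cluster capacity lost when one of `node_count` equal nodes fails."""
    return 1 / node_count


def surviving_utilization(node_count: int, utilization_before: float) -> float:
    """Utilization of the remaining nodes after one failure, if all work is rescheduled."""
    return utilization_before * node_count / (node_count - 1)


# Same total capacity: 4 large nodes versus 16 small nodes, both 70% utilized.
print(capacity_lost_per_node_failure(4))          # 0.25
print(capacity_lost_per_node_failure(16))         # 0.0625
print(round(surviving_utilization(4, 0.70), 3))   # 0.933
print(round(surviving_utilization(16, 0.70), 3))  # 0.747
```

With four large nodes, a single failure pushes the survivors to about 93% utilization, while sixteen small nodes stay near 75%.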

How autoscaling affects node planning

Kubernetes node planning should also account for autoscaling. The Cluster Autoscaler can add or remove nodes from preconfigured node groups. When pods cannot be scheduled because the current cluster does not have enough resources, the autoscaler can add nodes to the node group that best fits the pending pods. When nodes are underused and workloads can safely run elsewhere, it can remove unnecessary nodes.

For autoscaling to work well, pods need accurate CPU and memory requests. If requests are too low, Kubernetes may pack too many pods onto a node and create resource pressure. If requests are too high, the autoscaler may add more nodes than necessary and increase infrastructure costs.
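
The effect of inaccurate requests can be seen in a toy packing model. The Python sketch below uses first-fit packing on CPU requests only; this is a deliberate simplification, not how the scheduler or the Cluster Autoscaler actually places pods, and the function name and figures are hypothetical.

```python
def nodes_for_pods(pod_cpu_requests_m, node_allocatable_cpu_m):
    """Count the nodes needed to place pods by CPU request (millicores), first-fit."""
    free = []  # remaining CPU on each node opened so far
    for request in pod_cpu_requests_m:
        if request > node_allocatable_cpu_m:
            raise ValueError("pod request exceeds node allocatable CPU")
        for i, remaining in enumerate(free):
            if remaining >= request:
                free[i] = remaining - request
                break
        else:
            # No existing node has room, so a new node is needed.
            free.append(node_allocatable_cpu_m - request)
    return len(free)


# Twelve pods that each really use about 250m, on nodes with 1500m allocatable.
print(nodes_for_pods([250] * 12, 1500))   # 2   accurate requests
print(nodes_for_pods([1000] * 12, 1500))  # 12  inflated requests waste capacity
print(nodes_for_pods([100] * 12, 1500))   # 1   understated requests overpack one node
```

In the last case all twelve pods land on one node even though their real usage (about 3000m) is twice what the node can provide.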

A strong node autoscaling setup usually includes:

  • Separate node pools for different workload types
  • Accurate resource requests and limits
  • Enough spare capacity for traffic spikes and node failures
  • Pod disruption budgets for important workloads
  • Clear rules for scaling down underused nodes
  • Monitoring for unschedulable pods, resource pressure, and noisy neighbors

Node sizing is not a one-time decision. As workloads grow, teams should regularly review node utilization, pod density, failed scheduling events, and autoscaler activity to make sure the cluster has enough capacity without wasting compute.

What are Node Selector and Node Affinity?

Kubernetes allows you to flexibly control which nodes should run your pods. It is possible to manually assign a pod to a node, but in most cases, you will define a mechanism that allows Kubernetes to dynamically assign pods to nodes. Two of these mechanisms are node selectors and node affinity.

Both node selectors and node affinity are closely tied to Kubernetes labels. A label is a key-value pair of metadata that you can attach to a Kubernetes resource, which lets you identify and manage it.

Node Selector

A node selector lets you specify which nodes a pod can be deployed on. The Kubernetes scheduler reads the pod specification, searches for eligible nodes, and assigns the pod to one of them.

The simplest type of node selection is the nodeSelector field of the podSpec. It is a set of key-value pairs, which lets you define labels that the node needs to match in order to be eligible to run the pod. This is known as a label selector.
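
The matching rule behind nodeSelector is simple enough to express in a few lines of Python. This is a minimal sketch of the logic rather than Kubernetes code: the function name is hypothetical, and the disktype label is an illustrative example.

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """True if every key-value pair in the selector is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


ssd_node = {"kubernetes.io/os": "linux", "disktype": "ssd"}
hdd_node = {"kubernetes.io/os": "linux", "disktype": "hdd"}
selector = {"disktype": "ssd"}

print(matches_node_selector(ssd_node, selector))  # True
print(matches_node_selector(hdd_node, selector))  # False
print(matches_node_selector(hdd_node, {}))        # True: an empty selector matches any node
```

Every pair in the selector must match, so adding more pairs can only narrow the set of eligible nodes.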

Node Affinity

Node affinity provides a more expressive language for defining which nodes a pod can run on. You can define:

  • Hard rules that a node must satisfy, built from label expressions with operators such as In, NotIn, Exists, and DoesNotExist, where all expressions in a rule must match
  • Soft rules indicating a preference for a certain type of node, but allowing the scheduler to place a pod even if the preference cannot be met
  • With the related inter-pod affinity and anti-affinity feature, rules that take into account the labels of other pods already running on a node, enabling you to colocate pods or keep them apart

Node affinity is conceptually similar to nodeSelector – it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
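
The difference between hard and soft affinity rules can be sketched in Python. This is a simplified model under stated assumptions: it handles a single set of required expressions (all must match), scores preferences by summing weights, and ignores everything else the scheduler considers. The function names, node names, and zone values are hypothetical.

```python
def expr_matches(labels, key, operator, values=()):
    """Evaluate one label expression against a node's labels."""
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def pick_node(nodes, required, preferred):
    """Filter by required expressions, then pick the node with the highest preference score."""
    candidates = {
        name: labels for name, labels in nodes.items()
        if all(expr_matches(labels, *expr) for expr in required)
    }
    if not candidates:
        return None  # hard rules cannot be met, so the pod would stay Pending

    def score(labels):
        return sum(weight for weight, expr in preferred if expr_matches(labels, *expr))

    return max(candidates, key=lambda name: score(candidates[name]))


nodes = {
    "node-a": {"topology.kubernetes.io/zone": "zone-1", "disktype": "hdd"},
    "node-b": {"topology.kubernetes.io/zone": "zone-1", "disktype": "ssd"},
    "node-c": {"topology.kubernetes.io/zone": "zone-2", "disktype": "ssd"},
}
required = [("topology.kubernetes.io/zone", "In", ("zone-1",))]
preferred = [(10, ("disktype", "In", ("ssd",)))]

print(pick_node(nodes, required, preferred))  # node-b
```

If node-b were removed, the pod would still be placed on node-a, because the SSD preference is soft. If no node were in zone-1, the function would return None, because the zone rule is hard.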

Taints and Tolerations

Taints are the opposite of affinity: a taint marks a node as one that repels a certain set of pods. Depending on the taint’s effect, pods that do not tolerate the taint are either kept off the node entirely or placed there only when no better option exists. A node can have one or more taints defined on it.

You can define tolerations in pod templates to indicate that, despite a taint, you want to allow (but not require) the pod to run on nodes that have a matching taint.

You can use taints and tolerations to ensure pods are not scheduled onto nodes that are not appropriate for them.
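
The matching between taints and tolerations can be modeled in a few lines of Python. This is a minimal sketch of the scheduling-time check, assuming taints and tolerations are plain dictionaries; the function names and the dedicated=gpu taint are hypothetical examples. PreferNoSchedule taints are treated as soft and do not block scheduling here.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if a single toleration matches a single taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # With Exists, an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(node_taints, pod_tolerations) -> bool:
    """True if every hard taint on the node is tolerated by at least one toleration."""
    return all(
        any(tolerates(toleration, taint) for toleration in pod_tolerations)
        for taint in node_taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
exact = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
any_value = {"key": "dedicated", "operator": "Exists"}

print(can_schedule([gpu_taint], []))           # False
print(can_schedule([gpu_taint], [exact]))      # True
print(can_schedule([gpu_taint], [any_value]))  # True
```

A toleration only permits placement on the tainted node. To also steer the pod toward that node, it is typically combined with a node selector or node affinity rule.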

Common Kubernetes Node Errors

Kubernetes node errors indicate an issue on a machine participating in a Kubernetes cluster, which can affect its ability to run and manage pods. Below are two common errors and what you can do about them.

Kubelet Stopped Posting Node Status (Kubernetes Node Not Ready)

If a node has a NotReady status for more than five minutes, the status of the pods running on it becomes Unknown, and new pods may get stuck in the ContainerCreating status.

How to identify the issue

  • Run the command kubectl get nodes and see if the node status is NotReady
  • To check if pods are being moved to other nodes, run the command kubectl get pods and see if pods have the status ContainerCreating

Resolving the issue
In some cases, this issue will be resolved on its own if the node is able to recover or the user reboots it. If this doesn’t happen, you can remove the failed node from the cluster using the kubectl delete node command.

Kubelet Node Not Found

This error indicates that kubelet is not running properly on the node, so it cannot participate in the Kubernetes cluster.

How to identify the issue
Run systemctl status kubelet and look for the message node [node-name] not found

Resolving the issue
A common way to resolve this issue is to reset the node using the kubeadm reset command, use kubeadm to create a new token, and then use the new token in a kubeadm join command.

Troubleshooting Kubernetes Node Issues with Komodor’s AI SRE Platform

Node issues rarely stay isolated. A NotReady node, DiskPressure condition, failed kubelet, noisy neighbor, or container runtime problem can quickly turn into pod failures, delayed deployments, failed scheduling events, and service-level incidents.

Komodor helps teams troubleshoot Kubernetes node issues by connecting node health, workload behavior, recent changes, events, alerts, logs, and ownership context in one place. Instead of investigating node problems manually across kubectl, dashboards, logs, alerts, and tickets, teams can use Komodor’s AI SRE platform to detect, investigate, and remediate issues faster.

With Komodor, teams can:

  • Detect node-related issues before they cascade into wider service incidents
  • Correlate node conditions with affected pods, deployments, workloads, and services
  • Identify resource pressure, noisy neighbors, and capacity constraints across clusters
  • Investigate NotReady nodes, failed scheduling events, kubelet issues, and runtime problems with full cluster context
  • Understand whether a node issue was caused by a deployment, configuration change, infrastructure change, or resource bottleneck
  • Use Klaudia, Komodor’s agentic AI technology, to accelerate root cause analysis and get guided remediation steps
  • Reduce MTTR by giving platform, DevOps, and SRE teams one place to investigate Kubernetes incidents

For example, if pods are stuck in ContainerCreating because a node is NotReady, Komodor can help connect the pod symptoms to the underlying node condition, recent cluster changes, related events, and impacted services. If a node is under MemoryPressure or DiskPressure, Komodor can help teams understand which workloads are consuming resources and whether the issue is isolated to one node, one node pool, or a larger cluster-wide pattern.

This is especially useful in multi-cluster environments, where node issues can be difficult to prioritize manually. Komodor gives teams a centralized view of Kubernetes health and reliability, while Klaudia helps investigate complex incidents, surface likely root causes, and recommend the next best action.

FAQs About Kubernetes Nodes

What is a Kubernetes node?

A Kubernetes node is a physical or virtual machine that runs workloads in a Kubernetes cluster. Nodes provide the CPU, memory, storage, and networking resources that pods need to run. Each node is managed by the Kubernetes control plane.

What components run on a Kubernetes node?

A Kubernetes node usually runs three main components: kubelet, a container runtime, and kube-proxy. Kubelet communicates with the control plane and makes sure pods are running as expected. The container runtime runs the containers, and kube-proxy helps manage network traffic for Kubernetes Services.

What is the difference between a node and a pod?

A node is the machine that provides compute resources. A pod is the smallest deployable workload unit in Kubernetes. In simple terms, pods run on nodes, and nodes provide the infrastructure that pods need.

What is the difference between a control plane node and a worker node?

A control plane node runs the components that manage the cluster, such as the API server, scheduler, controller manager, and etcd. A worker node runs application workloads. In production environments, the control plane and worker nodes are often separated for reliability and scalability.

What does Node NotReady mean?

Node NotReady means Kubernetes does not consider the node healthy enough to run workloads. This can happen because of kubelet problems, network issues, container runtime failures, resource pressure, or communication problems between the node and the control plane.

What causes DiskPressure on a node?

DiskPressure happens when a node is running low on available disk space. Common causes include too many container images, large logs, high ephemeral storage usage, or workloads writing too much data to the node filesystem. Kubernetes may evict pods if disk pressure becomes severe.

What causes MemoryPressure on a node?

MemoryPressure happens when a node is running low on available memory. This is often caused by workloads using more memory than expected, missing or inaccurate memory requests and limits, memory leaks, or too many pods running on the same node.

How many pods can run on a Kubernetes node?

The default recommended limit is usually 110 pods per node. The actual number can vary depending on cluster configuration, cloud provider limits, networking setup, workload resource usage, and operational requirements. Most teams should size nodes based on real CPU, memory, storage, and networking needs rather than aiming for the maximum pod count.

How many nodes should a Kubernetes cluster have?

Kubernetes can support very large clusters, but the right number of nodes depends on workload size, availability needs, cost strategy, cloud provider quotas, and operational complexity. Instead of choosing a fixed node count, teams should design around workload requirements, failure domains, autoscaling behavior, and resource efficiency.

Is it better to use fewer large nodes or more small nodes?

Fewer large nodes can improve resource efficiency and reduce management overhead, but each node failure can affect more workloads. More small nodes can reduce the impact of a single node failure and improve workload distribution, but they add more node-level overhead. The best choice depends on workload patterns, availability requirements, and cost goals.

What is Kubernetes node autoscaling?

Kubernetes node autoscaling adjusts the number of nodes in a cluster based on workload demand. When pods cannot be scheduled because there is not enough available capacity, autoscaling can add nodes. When nodes are underused and workloads can safely move elsewhere, autoscaling can remove unnecessary nodes.

What are node taints used for?

Taints are applied to nodes to prevent pods from being scheduled there unless the pods have matching tolerations. They are commonly used for dedicated nodes, GPU nodes, infrastructure nodes, or workloads that need special isolation.

Why do pods stay in Pending?

Pods can stay in Pending when Kubernetes cannot find a suitable node to run them. Common reasons include insufficient CPU or memory, node taints without matching tolerations, strict node affinity rules, unavailable node pools, or cluster autoscaling problems.

Why do pods get stuck in ContainerCreating?

Pods may get stuck in ContainerCreating when the assigned node has problems pulling images, mounting volumes, connecting to the container runtime, configuring networking, or communicating with required services. Checking node events, pod events, kubelet logs, and container runtime status can help identify the cause.

How can AI SRE tools help with Kubernetes node issues?

AI SRE tools can help by correlating node health, pod behavior, recent changes, alerts, logs, and events in one place. For node issues like NotReady, DiskPressure, failed scheduling, or noisy neighbors, an AI SRE platform can help teams find the likely root cause faster and identify the next remediation steps.