Disrupted? Why You Need Pod Disruption Budgets and How to Use Them

What Are Kubernetes Pod Disruption Budgets (PDBs)? 

Kubernetes Pod Disruption Budgets (PDBs) are a feature designed to ensure that a specified minimum number of pods are always running for an application, even during voluntary disruptions such as upgrades or maintenance.

By defining a PDB, developers can set policies that limit the number of pods that can be simultaneously disrupted, maintaining the application’s availability and reliability. A PDB applies to pods that match criteria defined by the user, typically through label selectors. It specifies either the minimum number of pods that must stay available (minAvailable) or the maximum number that may be unavailable (maxUnavailable) during voluntary disruptions. Either value can be given as an absolute count or as a percentage.

This mechanism helps in preventing the application from becoming unavailable or underperforming due to an insufficient number of running pods, ensuring continuous operation and service reliability, especially in production environments.

How Do Pod Disruption Budgets Work?

Pod Disruption Budgets define a set of conditions that must be met before Kubernetes can safely perform voluntary disruptions on pods. 

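When an operation such as a node drain during an upgrade or a cluster scale-down needs to evict pods, each eviction request is checked against the PDB that selects those pods. If the eviction would violate the PDB, either by making too many pods unavailable or by dropping below the minimum availability threshold, the request is refused, and tools such as kubectl drain retry until the eviction can proceed without breaking the budget. PDBs are enforced only for evictions made through the Eviction API; deleting a pod or its controller directly bypasses them.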

This process ensures that critical applications maintain their required level of service even during maintenance activities. For example, if a PDB specifies that at least three instances of a particular service must be available at all times, Kubernetes will not evict pods from this service if doing so would reduce the number of available instances below three.

Similarly, if a maximum unavailability is defined, Kubernetes respects this limit and orchestrates pod disruptions in a way that does not exceed it. This selective disruption management allows for cluster operations and maintenance while preserving application stability and availability.
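
To make the check concrete, the fragment below sketches the status a PDB with minAvailable: 3 might report when exactly three matching pods are healthy. The field names are those of the PodDisruptionBudget status; the values are illustrative assumptions, not output from a real cluster.

```yaml
# Illustrative status for a PDB with minAvailable: 3 (values are assumed).
status:
  expectedPods: 3        # pods counted by this budget
  currentHealthy: 3      # matched pods that are currently healthy
  desiredHealthy: 3      # minimum healthy pods the budget requires
  disruptionsAllowed: 0  # currentHealthy - desiredHealthy; evictions are refused at 0
```

If a fourth replica became healthy, currentHealthy would rise to 4 and disruptionsAllowed to 1, so a single eviction could proceed.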

Pod Disruption Budget Benefits 

Implementing PDBs in Kubernetes clusters provides the following benefits:

  • Improved application resilience: Ensures that critical services maintain their desired availability levels during planned disruptions. This is particularly beneficial for high-traffic applications and services that require strict uptime guarantees. By limiting the number of simultaneous pod disruptions, PDBs help avoid service degradation or downtime.
  • Smoother cluster operations: Provides a safety net that allows system administrators to perform necessary updates, scaling, or node replacements with minimal risk of inadvertently causing outages. This improves operational efficiency and increases confidence in the stability and reliability of the Kubernetes environment.
  • Easier maintenance: Grants a high level of control over disruption management, making it easier for admins to avoid issues during voluntary disruptions.

Example: How to Use Pod Disruption Budgets

Here’s an overview of how to configure, manage and monitor PDBs in Kubernetes. 

Configure a PDB 

To configure a Pod Disruption Budget, you define it in a YAML file. This file outlines the rules that Kubernetes must follow when performing voluntary disruptions, ensuring that the specified number of pods remains available. Here is an example of a simple PDB configuration:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: exampleapp

In this configuration, minAvailable specifies the minimum number of pods that must remain available at all times, ensuring at least one instance of the application is always running. The selector uses matchLabels to identify which pods fall under this PDB’s policy, targeting those with the app: exampleapp label. This setup prevents Kubernetes from evicting all instances during operations like updates or node maintenance.

To adapt to different scenarios or more complex applications, you might opt for specifying maxUnavailable instead of minAvailable. This approach allows a certain percentage or number of pods to be unavailable during disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-pdb-flexible
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: myflexibleapp

Here, maxUnavailable allows up to a quarter of targeted pods to be disrupted simultaneously. This is particularly useful for applications that can tolerate temporary reductions in capacity without affecting overall functionality or performance.

Monitor and Manage PDBs

To manage and monitor Pod Disruption Budgets, you can use the kubectl command-line tool. For example, to list the PDBs in the current namespace, run the kubectl get pdb command (add --all-namespaces to list them across the cluster). The output gives a concise overview of each PDB: its name, its minAvailable or maxUnavailable setting, the number of disruptions currently allowed, and its age.

For more detailed information about a specific PDB, such as its configuration and status, use this command:

kubectl describe pdb <pdb-name>

The describe pdb command outputs detailed information including the selector criteria, current status of pods matching the PDB criteria (e.g., how many are currently available), and events related to the PDB. This is useful for troubleshooting or ensuring that your disruption budgets are configured correctly.

Best Practices for Using Pod Disruption Budgets

Here are some of the ways that you can ensure the availability and stability of your Kubernetes applications with PDBs.

Use Selectors Correctly

Selectors define the scope of a PDB, targeting the pods it will protect. They function by matching labels assigned to pods, enabling precise control over which pods fall under the PDB’s policy. This precision matters in both directions: a selector that is too broad can block node drains on behalf of pods that do not need protection, while a selector that matches no pods protects nothing.

To maximize the effectiveness of selectors, use clear and consistent labeling strategies for your pods. Labels should accurately reflect the role, environment, and other significant characteristics of each pod. This enables more efficient resource management and enhances the clarity and maintainability of your Kubernetes configurations. 
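
As a sketch of a narrowly scoped selector, the PDB below covers only pods that carry two labels. The label keys and values (app: checkout, tier: frontend) and the PDB name are hypothetical; substitute the labels your own pods use.

```yaml
# Hypothetical labels and name, shown only to illustrate selector scope.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-frontend-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:        # a pod must carry BOTH labels to be covered
      app: checkout
      tier: frontend
```

Running kubectl get pods -l app=checkout,tier=frontend with the same labels is a quick way to confirm which pods the selector actually matches.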

Use Percentage-Based Disruption Budgets 

Instead of specifying a fixed number of pods that must remain available, you can define the required availability as a percentage. This approach automatically adjusts to the size of your deployment, ensuring that the proportion of available pods remains constant, regardless of how your application scales.

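For example, setting minAvailable to 60% means that at least 60% of the targeted pods must always be available. When the percentage does not map to a whole number of pods, Kubernetes rounds up: with 7 pods, 60% is 4.2, so 5 pods must stay available. This method is particularly useful in dynamic environments where application workloads change or scale frequently.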
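
A minimal sketch of a percentage-based budget, reusing the app: exampleapp label from the earlier example. The PDB name is hypothetical, and the pod counts in the comments are assumed purely for the arithmetic.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: exampleapp-pdb-percent   # hypothetical name
spec:
  minAvailable: "60%"            # quoted so it is read as a string percentage
  selector:
    matchLabels:
      app: exampleapp
# With 10 matching pods: 6 must stay available, so up to 4 may be evicted.
# With 5 matching pods:  3 must stay available, so up to 2 may be evicted.
```

Because the threshold is recomputed from the current replica count, the same manifest keeps working as the application scales up or down.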

Integrate PDBs with Higher-Level Objects

Integrating PDBs with Kubernetes workload objects like Deployments, StatefulSets, and ReplicaSets enhances application resilience. These objects manage the pods’ desired state, including their number and lifecycle. When a PDB is used in conjunction with these controllers, it ensures that voluntary disruptions do not cause the number of available replicas to fall below the specified threshold.

This setup automates the enforcement of availability policies during node maintenance, and it simplifies management because the controllers automatically replace evicted pods. Note that rolling updates are governed by the workload’s own update strategy (for example, the maxUnavailable and maxSurge settings of a Deployment’s RollingUpdate strategy) rather than by the PDB: pods that are unavailable during a rollout count against the budget, but the rollout itself is not limited by it. Choose both settings so that a rollout and a node drain happening at the same time still leave enough pods available.
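
The pairing can be sketched as follows: the Deployment’s strategy settings control how many pods a rollout may take down, while the PDB limits evictions such as node drains. The replica count, names, thresholds, and image reference are all assumptions chosen for illustration.

```yaml
# Assumed workload: a 4-replica Deployment labeled app=exampleapp.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: exampleapp
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # limits unavailability during rollouts
      maxSurge: 1
  selector:
    matchLabels:
      app: exampleapp
  template:
    metadata:
      labels:
        app: exampleapp   # must match the PDB selector below
    spec:
      containers:
        - name: exampleapp
          image: registry.example.com/exampleapp:1.0   # placeholder image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: exampleapp-pdb
spec:
  minAvailable: 3         # limits unavailability during evictions (e.g. node drains)
  selector:
    matchLabels:
      app: exampleapp
```

With four replicas and minAvailable: 3, one pod at a time can be evicted when all four are healthy.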

Prepare for Involuntary Disruptions 

Pod Disruption Budgets are designed to protect against voluntary disruptions, but it’s essential to consider involuntary disruptions such as hardware failures or network issues. These events can impact your application’s availability outside the scope of PDBs. 

To mitigate these risks, ensure your Kubernetes cluster is sufficiently resilient. This involves strategies like distributing pods across multiple nodes or availability zones, which helps prevent a single point of failure from affecting your application’s overall availability. Monitoring and alerting systems can provide early warnings for involuntary disruptions, allowing you to take action before they impact your services. 
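
One way to distribute pods as described is with topology spread constraints. The fragment below is a sketch that assumes pods labeled app: exampleapp and nodes carrying the standard zone and hostname labels; it belongs inside a pod template spec (for example, under spec.template.spec of a Deployment).

```yaml
# Fragment of a pod template spec; the label value is an assumption.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # spread across availability zones
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: exampleapp
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname        # spread across nodes
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: exampleapp
```

ScheduleAnyway treats the spread as a preference; DoNotSchedule would make it a hard requirement, at the cost of pods staying pending when the constraint cannot be met.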

Consistently Monitor the PDBs 

Monitoring PDBs is essential for ensuring they function as intended and provide the expected level of protection against disruptions. Regular monitoring allows you to verify that the PDB settings align with your application’s availability requirements and to adjust them as those requirements evolve. 

Use kubectl get pdb to list PDBs and their current status. The ALLOWED DISRUPTIONS column shows how many pods can currently be evicted; a value of 0 means that further evictions of the protected pods are blocked until more pods become healthy.

In addition to command-line tools, consider integrating Kubernetes monitoring solutions that can track PDB status changes and alert you to potential issues. 
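
As one possible approach, the sketch below is a Prometheus alerting rule that fires when a PDB has allowed no disruptions for a sustained period. It assumes Prometheus and kube-state-metrics are installed in the cluster; the group name, alert name, duration, and severity are hypothetical choices.

```yaml
# Assumes kube-state-metrics is exporting PDB metrics to Prometheus.
groups:
  - name: pdb-alerts                # hypothetical group name
    rules:
      - alert: PDBDisruptionsBlocked
        # Allowed disruptions for the PDB, as reported by kube-state-metrics.
        expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
        for: 15m                    # assumed threshold; tune to your maintenance windows
        labels:
          severity: warning
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} allows no disruptions"
```

A PDB that sits at zero allowed disruptions for long periods often points to too few replicas or unhealthy pods, and it will stall node drains until that changes.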

Solving Kubernetes Node Errors Once and for All with Komodor

Kubernetes troubleshooting relies on the ability to quickly contextualize the problem with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating service-level incidents with other events happening in the underlying infrastructure.

Komodor can help with its ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can rapidly:

  • See service-to-node associations
  • Correlate service and node health issues
  • Gain visibility over node capacity allocations, restrictions, and limitations
  • Identify “noisy neighbors” that use up cluster resources
  • Keep track of changes in managed clusters
  • Get fast access to historical node-level event data

Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues. As the leading Continuous Kubernetes Reliability Platform, Komodor is designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.

Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.

By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.

If you are interested in checking out Komodor, use this link to sign up for a Free Trial