Karpenter is an open-source autoscaler for Kubernetes nodes, which can improve the efficiency and cost-effectiveness of running workloads on Kubernetes clusters. It was originally developed by Amazon Web Services (AWS), is licensed under the permissive Apache License 2.0, and has over 300 GitHub contributors.
Unlike traditional Kubernetes autoscalers, which resize pre-defined node groups, Karpenter provisions nodes directly in response to pods that cannot be scheduled, so applications get the nodes they need. Karpenter addresses cluster management challenges such as over-provisioning and underutilization by dynamically provisioning the right size and type of resources based on application needs.
The original Karpenter project works in the AWS environment. However, forks of the project are available for Azure and other cloud environments.
Get Karpenter at the official GitHub repo
Get Karpenter for Azure here
This is part of a series of articles about Kubernetes management
Key Features of Karpenter
Karpenter offers the following capabilities.
Rapid Scaling
Karpenter can quickly scale nodes in a Kubernetes cluster by making fast decisions to launch additional nodes when needed. It watches for pods that the scheduler cannot place and provisions capacity for them directly through the cloud provider, rather than waiting for a node group to resize, which reduces latency in resource allocation. This capacity to respond rapidly to workload demands helps maintain application performance and service reliability.
Resource Optimization
Karpenter focuses on deploying the most appropriate compute resources based on workload characteristics. By evaluating the resource requests and scheduling constraints of pending pods, it selects instance types that fit them, which minimizes waste and reduces costs. It also revisits these decisions over time, consolidating workloads onto fewer or cheaper nodes when the cluster's needs change.
Flexible Provisioning
Karpenter offers flexible resource provisioning, allowing users to specify requirements such as instance types, zones, and procurement options. The flexibility extends to adjusting these specifications dynamically, accommodating the changing needs of applications. This allows organizations to optimize their infrastructure in real time, aligning application requirements and cost-efficiency.
Node Lifecycle Management
Node lifecycle management with Karpenter is automated to handle various tasks, including node creation, updating, and eventual decommissioning. This process minimizes manual interventions and lowers the risk of human error. Karpenter’s intelligent lifecycle management also helps in maintaining cluster health and efficiency, ensuring nodes are only operational when needed and gracefully decommissioning them after their useful life.
How Does Karpenter Work?
Karpenter operates by integrating tightly with the Kubernetes API, continuously monitoring cluster state and workload demands. The process can be simplified as follows:
- Demand detection: Karpenter watches the Kubernetes API for pods that the scheduler has marked unschedulable, together with their resource requests and the capacity of existing nodes. This gives it a current view of the cluster and of what pending workloads need.
- Decision making: Karpenter evaluates whether existing node capacity can meet those demands. If not, it determines the type and size of nodes to provision, taking into account the pods' scheduling constraints (resource requests, node selectors, affinities, tolerations) and the policies defined in its configuration.
- Node provisioning: Karpenter interfaces with cloud providers’ APIs to launch new nodes. It selects the appropriate instance types, taking into account factors like cost, availability, and performance. This ensures that the new nodes align with the cluster’s requirements.
- Node integration: Once the new nodes are provisioned, they are integrated into the Kubernetes cluster. Karpenter ensures that these nodes are properly configured and ready to handle workloads immediately.
- Continuous optimization: Karpenter continuously monitors the cluster’s performance and resource utilization. If it detects underutilized nodes or opportunities for cost savings, it can decommission nodes gracefully, reallocating workloads as needed. This dynamic adjustment helps maintain an optimal balance between performance and cost-efficiency.
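One way to see the signal that starts this process is to count pods stuck in the Pending phase. The following is a minimal sketch, assuming kubectl is configured for the cluster; the function name pending_pod_count is hypothetical and not part of Karpenter. Note that Pending is a superset of unschedulable, since it also covers pods that are scheduled but still pulling images.

```shell
# Count pods in the Pending phase across all namespaces.
# Pods the scheduler cannot place stay Pending, and those are what
# prompt Karpenter to consider launching new nodes.
pending_pod_count() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending \
    --no-headers 2>/dev/null | grep -c .
}
```

Running pending_pod_count before and after a scale-up gives a rough picture of how quickly new capacity absorbs the backlog.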
Tips from the expert
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better utilize Karpenter:
Implement priority-based scheduling:
Integrate priority classes with Karpenter to ensure that critical workloads receive resources before less important ones. This can help in maintaining the performance of essential services during high demand.
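As a rough illustration, the snippet below writes a PriorityClass manifest to a file. The class name critical-services and the value 100000 are placeholders chosen for this example. Priority itself is enforced by the Kubernetes scheduler; Karpenter's part is to provision capacity for whatever remains unschedulable.

```shell
# Write a PriorityClass manifest to a local file.
# The name and value below are illustrative, not recommended settings.
cat > priority-class.yaml <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-services
value: 100000
globalDefault: false
description: "Workloads that should be scheduled ahead of batch jobs."
EOF
```

Apply the file with kubectl apply -f priority-class.yaml, then reference the class from a pod template by setting priorityClassName: critical-services in the pod spec.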
Utilize mixed instance types:
Configure Karpenter to use a mix of instance types and sizes to optimize costs and availability. This strategy reduces dependency on specific instance types that might be in short supply.
Enable spot instance fallbacks:
Configure Karpenter to fall back to on-demand instances when spot instances are unavailable. This ensures that your applications remain resilient and available even during spot market fluctuations.
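The two tips above (mixed instance types and an on-demand fallback) can be sketched in a single NodePool. This is an illustrative manifest, assuming the karpenter.sh/v1beta1 API and an existing EC2NodeClass named default; the NodePool name flexible and the CPU limit are placeholders. When a NodePool allows both capacity types, Karpenter prioritizes spot and can use on-demand when spot capacity is not available.

```shell
# Write a NodePool that allows several instance categories and both
# capacity types. Names and limits are illustrative.
cat > nodepool-flexible.yaml <<'EOF'
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: flexible
spec:
  template:
    spec:
      requirements:
        # Allow both capacity types so on-demand is available as a fallback.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Several instance categories widen the pool of candidate types.
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
EOF
```

Apply it with kubectl apply -f nodepool-flexible.yaml once the referenced EC2NodeClass exists.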
Use predictive scaling:
Implement predictive scaling models to anticipate future workloads and scale nodes proactively. This can reduce latency in resource allocation and improve overall application performance.
Optimize pod distribution:
Configure Karpenter to optimize pod distribution across nodes to minimize latency and maximize resource utilization. This can be achieved by fine-tuning node selectors and affinity rules.
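Besides node selectors and affinity rules, topology spread constraints are another common way to influence distribution. The sketch below writes a Deployment that spreads replicas evenly across zones; the name web, the replica count, and the CPU request are placeholders. Karpenter takes such pod-level constraints into account when it chooses where to launch nodes.

```shell
# Write a Deployment whose pods are spread across availability zones.
# All names and sizes are illustrative.
cat > spread-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        # Keep the per-zone pod count within 1 of every other zone.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 500m
EOF
```

Apply it with kubectl apply -f spread-deployment.yaml.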
Karpenter vs. Cluster Autoscaler
While both Karpenter and the traditional Kubernetes Cluster Autoscaler aim to manage node scaling, they differ in their approaches and capabilities:
Scaling Mechanism
Cluster Autoscaler scales pre-defined node groups in response to pod scheduling events. It typically adds nodes when there are pending pods that cannot be scheduled due to resource constraints.
Karpenter also reacts to unschedulable pods, but provisions individual nodes directly through the cloud provider instead of resizing node groups. It can choose a different instance type and size for each batch of pending pods, which tends to result in faster and better-fitting scaling decisions.
Flexibility
Cluster Autoscaler is limited to adding and removing nodes based on predefined policies and thresholds.
Karpenter offers greater flexibility with dynamic provisioning options, allowing users to specify instance types, zones, and procurement strategies. It adapts to changing application needs in real time.
Resource Optimization
Cluster Autoscaler focuses on ensuring pods are scheduled but might not always optimize resource utilization efficiently, potentially leading to over-provisioning.
Karpenter continuously optimizes resources by consolidating workloads onto fewer or cheaper nodes when capacity is underutilized, minimizing waste and reducing costs.
Node Lifecycle Management
Cluster Autoscaler handles basic node management tasks but may require manual interventions for complex scenarios.
Karpenter automates the entire node lifecycle, from creation to decommissioning, reducing the need for manual management and minimizing the risk of human error.
Karpenter Limitations
When evaluating Karpenter, it’s important to be aware of the following limitations.
New Solution Still in Beta
As Karpenter is a relatively new solution and still in its beta phase, users might encounter bugs and feature gaps. The community and development team are actively working on improvements, but early adopters should be prepared for potential stability issues and may need to contribute to testing and feedback processes to help mature the project.
Configuration Complexity
Karpenter’s efficiency is heavily reliant on its configuration process. While it automates many aspects of cluster management, the initial setup and ongoing adjustments require deep knowledge and understanding of both Karpenter and the underlying Kubernetes framework. New users might find the learning curve steep, and misconfigurations can lead to inefficiencies.
Risk of Overspending
While Karpenter is intended to optimize costs by matching resource use to demand accurately, it requires careful configuration and understanding of workload patterns to achieve the expected financial benefits. Users must precisely define parameters to avoid over-provisioning, which can otherwise negate cost savings.
In the event of misconfiguration, costs might escalate quickly due to unnecessary scaling actions. Thus, continuous monitoring and adjustment of configurations are critical when using Karpenter.
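A quick way to reason about this risk is to estimate the spend a NodePool's CPU limit allows. The helper below is a back-of-the-envelope sketch: the function name max_hourly_cost is hypothetical, it assumes every node is the same instance type, and the price you pass in is your own estimate rather than anything Karpenter reports.

```shell
# Rough upper bound on hourly spend for a NodePool with a CPU limit,
# assuming all nodes share one instance type.
# Args: cpu_limit vcpus_per_node price_per_node_hour
max_hourly_cost() {
  awk -v limit="$1" -v vcpus="$2" -v price="$3" 'BEGIN {
    nodes = int((limit + vcpus - 1) / vcpus)   # round up to whole nodes
    printf "%.2f\n", nodes * price
  }'
}
```

For example, max_hourly_cost 1000 2 0.096 prints 48.00: a limit of 1000 CPUs filled with 2-vCPU nodes means 500 nodes, and 0.096 is a placeholder hourly price. Lowering the limit in the NodePool is the most direct way to cap this figure.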
Lack of Awareness of Real-Time Spot Prices
Karpenter factors price into its provisioning decisions, which helps it select cost-efficient compute resources when market conditions are favorable. However, it should not be assumed to track spot prices in real time, so the capacity it selects may not be the cheapest option at that exact moment.
However, reliance on spot instances can introduce risks, especially in volatile markets where availability may suddenly change. Implementing fallback strategies and understanding cloud provider pricing models is essential.
Tutorial: Creating a Kubernetes Cluster and Installing Karpenter
Here’s an overview of how to get started with Karpenter. Code and instructions are adapted from the Karpenter documentation.
Install the Necessary Utilities
Utilities that need to be installed include the AWS CLI, kubectl, eksctl (v0.180.0 or later), and Helm:
- Install the AWS CLI (the commands below use the macOS installer package):
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /
- Install kubectl (the commands below download the Linux amd64 binary):
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
- Install eksctl:
curl --silent --location "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
- Install Helm:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
- Configure the AWS CLI with a user that has sufficient privileges to create an EKS cluster. Verify that the CLI can authenticate properly by running:
aws sts get-caller-identity
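Before moving on, it can help to confirm that every tool is actually on the PATH. The helper below is a small sketch; the function name missing_tools is hypothetical.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running missing_tools aws kubectl eksctl helm prints nothing when all four utilities are installed, and otherwise lists the ones still missing.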
Set Environment Variables
After installing the tools, proceed with these steps:
- Set the Karpenter and Kubernetes versions:
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="0.37.0"
export K8S_VERSION="1.30"
- Set the relevant environment variables:
export AWS_PARTITION="aws"
export MY_CLUSTER="${USER}-karpenter-demo"
export AWS_DEFAULT_REGION="us-west-2"
export MY_AWS_ACCOUNT="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"
export ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
export AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
export GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"
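Several of these variables are filled in by AWS CLI calls, which can leave a variable empty if a call fails. The following sketch reports any variable that is unset or empty; the function name unset_vars is hypothetical, and it uses eval on the names you pass, so only give it trusted variable names.

```shell
# Print the name of every listed variable that is unset or empty.
unset_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"    # indirect lookup that also works in POSIX sh
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}
```

For example, unset_vars KARPENTER_VERSION K8S_VERSION MY_CLUSTER MY_AWS_ACCOUNT ARM_AMI_ID AMD_AMI_ID should print nothing before you continue.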
Create a Kubernetes Cluster
Create a basic cluster with eksctl:
- The commands below first deploy a CloudFormation stack that sets up the infrastructure required by Karpenter, including its IAM roles and policies. They then use eksctl to create a Kubernetes cluster, configuring it to support Karpenter by wiring in those IAM roles and creating a managed node group. Finally, they export and print the cluster endpoint and the Karpenter IAM role ARN.
curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > "${TEMPOUT}" \
&& aws cloudformation deploy \
--stack-name "Karpenter-${MY_CLUSTER}" \
--template-file "${TEMPOUT}" \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides "ClusterName=${MY_CLUSTER}"
eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${MY_CLUSTER}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${MY_CLUSTER}
iam:
  withOIDC: true
  podIdentityAssociations:
  - namespace: "${KARPENTER_NAMESPACE}"
    serviceAccountName: karpenter
    roleName: ${MY_CLUSTER}-karpenter
    permissionPolicyARNs:
    - arn:${AWS_PARTITION}:iam::${MY_AWS_ACCOUNT}:policy/KarpenterControllerPolicy-${MY_CLUSTER}
iamIdentityMappings:
- arn: "arn:${AWS_PARTITION}:iam::${MY_AWS_ACCOUNT}:role/KarpenterNodeRole-${MY_CLUSTER}"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes
managedNodeGroups:
- instanceType: m5.large
  amiFamily: AmazonLinux2
  name: ${MY_CLUSTER}-ng
  desiredCapacity: 2
  minSize: 1
  maxSize: 10
addons:
- name: eks-pod-identity-agent
EOF
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name "${MY_CLUSTER}" --query "cluster.endpoint" --output text)"
export KARPENTER_IAM_ROLE_ARN="arn:${AWS_PARTITION}:iam::${MY_AWS_ACCOUNT}:role/${MY_CLUSTER}-karpenter"
echo "${CLUSTER_ENDPOINT} ${KARPENTER_IAM_ROLE_ARN}"
This prints the cluster endpoint URL followed by the ARN of the Karpenter IAM role.
- Unless your AWS account has already onboarded to EC2 Spot, you will need to create the service-linked role to avoid the ServiceLinkedRoleCreationNotPermitted error:
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true
Install Karpenter
You can install Karpenter using Helm:
helm registry logout public.ecr.aws
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
--set "settings.clusterName=${MY_CLUSTER}" \
--set "settings.interruptionQueue=${MY_CLUSTER}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
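To confirm that the controller came up, you can check the status of its pods. The sketch below assumes the default kubectl get pods column layout (STATUS in the third column) and the app.kubernetes.io/name=karpenter label; the function name karpenter_pods_running is hypothetical.

```shell
# Succeed only if at least one Karpenter pod exists and all are Running.
# Assumes the default "kubectl get pods" columns: NAME READY STATUS ...
karpenter_pods_running() {
  kubectl get pods -n "${KARPENTER_NAMESPACE:-kube-system}" \
    -l app.kubernetes.io/name=karpenter --no-headers 2>/dev/null |
    awk '$3 != "Running" { bad = 1 } END { exit ((NR == 0 || bad) ? 1 : 0) }'
}
```

For example, karpenter_pods_running && echo "Karpenter is up" prints the message only once every controller pod reports Running.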
Create NodePool
To create a default NodePool that can handle different pod shapes, use the following script. This script creates a NodePool and an EC2NodeClass for Karpenter to manage.
The NodePool specifies requirements such as architecture, operating system, and instance types. It sets a limit on the CPU resources and defines policies for node consolidation to optimize resource usage. The EC2NodeClass includes configurations for Amazon Machine Images (AMIs), roles, and security groups, which ensure the new nodes meet the cluster’s security and operational standards.
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${MY_CLUSTER}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${MY_CLUSTER}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${MY_CLUSTER}"
  amiSelectorTerms:
    - id: "${ARM_AMI_ID}"
    - id: "${AMD_AMI_ID}"
EOF
Scale Up the Deployment
Use the following script to deploy and scale up the application. The Deployment resource defines an application called “scaleup” with a placeholder container image.
Initially, the number of replicas is set to zero. The kubectl scale command then increases the number of replicas to three, prompting Karpenter to provision additional nodes if needed. The final command retrieves logs from the Karpenter controller to monitor the scaling activities and ensure that the deployment and node provisioning are functioning as expected.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scaleup
spec:
  replicas: 0
  selector:
    matchLabels:
      app: scaleup
  template:
    metadata:
      labels:
        app: scaleup
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: scaleup
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF
kubectl scale deployment scaleup --replicas 3
kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller
The controller logs should show Karpenter detecting the unschedulable pods and launching nodes for them.
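If you prefer a quick check over reading logs, you can poll how many replicas are ready. This is a minimal sketch, assuming kubectl points at the cluster; the function name ready_replicas is hypothetical.

```shell
# Print how many replicas of a deployment are ready (0 if none yet).
# Kubernetes omits status.readyReplicas while it is zero, so an empty
# result is reported as 0.
ready_replicas() {
  count="$(kubectl get deployment "$1" -o jsonpath='{.status.readyReplicas}' 2>/dev/null)"
  printf '%s\n' "${count:-0}"
}
```

Calling ready_replicas scaleup should eventually print 3. Running kubectl get nodeclaims alongside it lists the nodes Karpenter has launched for the deployment.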
Simplifying Kubernetes Management with Komodor
Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.