What Are Kubernetes CronJobs?
A Kubernetes Job is a workload controller object that performs specific tasks on a cluster. It differs from most controller objects such as Deployments and ReplicaSets, which need to constantly reconcile the current state of the cluster with a desired configuration.
A Job has a much more limited function: it runs pods until they complete a specified task, and then terminates them.
A CronJob is the same as a regular Job, only it creates jobs on a schedule (with a syntax similar to the Linux cron utility). CronJobs are used to perform regularly scheduled tasks such as backups and report generation. You can define the tasks to run at specific times or repeat indefinitely (for example, daily or weekly).
This article is a part of a series on Kubernetes Troubleshooting.
CronJobs vs. Traditional Cron Tasks
Traditional cron tasks and Kubernetes CronJobs both allow you to schedule tasks at specific intervals. However, they operate in different environments and have distinct advantages and drawbacks.
| Aspect | Traditional Cron Tasks | Kubernetes CronJobs |
|---|---|---|
| Environment | Run on a single machine, often a server | Run in a Kubernetes cluster, leveraging containerization |
| Management | Managed through the cron daemon on Unix-like systems | Managed by Kubernetes, utilizing the same mechanisms that handle other Kubernetes objects |
| Configuration | Configured through crontab files, where each line specifies a task and its schedule | Defined using YAML manifests, offering more flexibility and standardization |
| Resource utilization | Directly use the machine's resources, potentially leading to conflicts or overloads if not managed carefully | Isolate resources for each job using Kubernetes resource limits and requests, ensuring better resource management and minimizing conflicts |
| Isolation | Lack of isolation between tasks; all tasks run in the same environment, which may cause dependency conflicts or security issues | Each CronJob runs in its own container, providing strong isolation and avoiding dependency conflicts |
| Monitoring and logging | Basic logging capabilities; monitoring usually requires additional tools or scripts | Integrated with Kubernetes' logging and monitoring tools, allowing for more comprehensive tracking and troubleshooting |
Tips from the expert
Itiel Shwartz, Co-Founder & CTO
In my experience, here are tips that can help you better manage Kubernetes CronJobs:
- Monitor job execution: Use monitoring tools to track CronJob executions and failures.
- Set concurrency policies: Define concurrency policies to control overlapping job executions.
- Use resource requests and limits: Allocate appropriate resources to CronJobs to avoid resource contention.
- Handle job retries: Configure retry policies for failed jobs to ensure task completion.
- Log job output: Enable detailed logging for CronJobs to aid in troubleshooting.
How Do CronJobs Work?
Kubernetes CronJobs function by creating and managing jobs according to a defined schedule. Here’s a detailed look at how they operate:
- Definition: A CronJob is defined using a YAML manifest file. This file includes specifications such as the job name, the schedule, and the container image and commands to run. The schedule is specified in a format similar to the Unix cron syntax.
- Scheduling: The spec.schedule field in the CronJob's manifest uses a five-field cron expression to define the timing and frequency of the job. This expression determines when Kubernetes will create a job from the CronJob template.
- Job creation: At the specified times, the Kubernetes controller creates a Job object from the CronJob template. Each Job object manages one or more pods, which execute the defined task.
- Execution: The job controller starts the pods as defined in the job specification. These pods run the specified container image and execute the commands provided.
- Completion and cleanup: Once the task completes, the job controller marks the job as completed. Kubernetes can be configured to retain or clean up the job history based on the successfulJobsHistoryLimit and failedJobsHistoryLimit fields in the CronJob specification.
- Retry and failure handling: CronJobs can be configured with retry policies to handle failures. If a job fails, Kubernetes can retry the execution based on the specified policy.
- Concurrency policy: CronJobs support concurrency policies that control how overlapping jobs are handled. You can choose to allow concurrent runs, forbid them, or replace running jobs with new ones when the next schedule is due.
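As a rough illustration of this flow, you can follow the chain from CronJob to Job to pod with standard kubectl commands. The CronJob name hello and the job suffix below are hypothetical placeholders, not values from this article:

# Inspect the CronJob and its last scheduled time
kubectl get cronjob hello

# Jobs created from the CronJob; names combine the CronJob name with a scheduling suffix
kubectl get jobs --watch

# Pods started by one of those Jobs (Jobs label their pods with job-name)
kubectl get pods --selector=job-name=hello-28012345

# Container logs for that run
kubectl logs --selector=job-name=hello-28012345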
What Are the Benefits of Kubernetes CronJobs?
Here are a few reasons CronJobs can be highly useful:
- CronJobs run in their own separate containers, letting you run an operation in the exact containers you need. This allows you to lock each CronJob to a specific version of a container, update each cron individually, and customize it with any specific dependencies it needs.
- You can choose how many resources a CronJob receives by setting resource requests – the minimum resources a node must have available for the CronJob's pod to be scheduled on it. You can also set resource limits to avoid overloading the node.
- Jobs created by a CronJob have a built-in retry policy. If a run fails, you can define whether it should run again and how many times it should retry, as shown in the sketch below.
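As a hedged sketch of the last two points, resource requests/limits and a retry limit might be set on the job template like this (the container name, image, and values are illustrative, not taken from this article):

spec:
  jobTemplate:
    spec:
      backoffLimit: 3                  # retry a failed run up to 3 times
      template:
        spec:
          containers:
          - name: nightly-report       # hypothetical container name
            image: busybox:1.28
            command: ["/bin/sh", "-c", "date; echo generating report"]
            resources:
              requests:
                cpu: 100m              # minimum free resources a node needs to schedule the pod
                memory: 128Mi
              limits:
                cpu: 500m              # upper bound, so the job cannot overload the node
                memory: 256Mi
          restartPolicy: OnFailure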
Related content: read our guide to fixing the Kubernetes node not ready error.
Kubernetes CronJobs: Quick Tutorial
How to Create Kubernetes CronJob
Creating a CronJob is very similar to creating a regular Job. We’ll need to define a YAML manifest file that includes the Job name, which containers to run, and commands to execute on the containers.
To create a Kubernetes CronJob:
1. Create a YAML file in a text editor.
nano [mycronjob].yaml
2. The CronJob YAML configuration should look something like this. Pay attention to the spec.schedule field, which defines when and how often the Job should run. We explain the cron schedule syntax in the following section.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello World
          restartPolicy: OnFailure
A few important points about this code:
- The spec.schedule field specifies that the job should run once per day at midnight.
- The containers field under spec.jobTemplate.spec.template.spec specifies which container the CronJob should run – a BusyBox image.
- Within the containers field we define a series of shell commands to run on the container – in this case, printing "Hello World" to the console.
- The restartPolicy can be either Never or OnFailure. If you use the latter, your code needs to be able to handle the possibility of a restart after failure.
3. Create your CronJob in the cluster using this command:
kubectl apply -f [filename].yaml
4. Run the following command to monitor task execution:
kubectl get cronjob --watch
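If you don't want to wait for the next scheduled run, you can also trigger a one-off Job from the CronJob template as a sanity check (the name hello-manual is just an example):

kubectl create job --from=cronjob/hello hello-manual
kubectl get pods --selector=job-name=hello-manual
kubectl logs job/hello-manual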
CronJob Schedule Syntax
The cron schedule syntax used by the spec.schedule field has five fields, each representing one time unit.
| Field | Minutes | Hours | Days in a Month | Months | Weekdays |
|---|---|---|---|---|---|
| Values | 0-59 | 0-23 | 1-31 | 1-12 | 0-6 |
| Example 1 | 0 | 21 | * | * | 4 |
| Example 2 | 0 | 0 | 12 | * | 5 |
| Example 3 | * | */1 | * | * | * |
Here is what each of the examples means:
- Example 1 runs the job at 9 pm every Thursday.
- Example 2 runs the job every Friday at midnight, and also at midnight on the 12th of each month.
- Example 3 runs the job every minute – the */n syntax means "repeat every n units", and because the minutes field is *, a run starts each minute.
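For reference, here are a few more schedule values that follow the same five-field syntax (standard cron semantics, not taken from the examples above):

schedule: "*/15 * * * *"   # every 15 minutes
schedule: "0 3 * * 1-5"    # at 3:00 am, Monday through Friday
schedule: "30 6 1 * *"     # at 6:30 am on the first day of every month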
Kubernetes CronJobs Monitoring and Considerations
One CronJob can serve as the model for various jobs, but you may need to adjust it. Here are some considerations when defining a CronJob.
CronJob Concurrency Policy
CronJobs have embedded concurrency controls (a major difference from Unix cron) that let you disable concurrent execution, although Kubernetes enables concurrency by default. With concurrency enabled, a scheduled CronJob run will start even if the last run is incomplete. Concurrency is not desirable for jobs that require sequential execution.
You can control concurrency by configuring the concurrency policy on CronJob objects. You can set one of three values:
- Allow – permits overlapping runs; this is the default setting.
- Forbid – prevents concurrent runs. Kubernetes skips scheduled starts if the last run hasn’t finished.
- Replace – terminates incomplete runs when the next job is scheduled, allowing the new run to proceed.
You can set the concurrency policy on each CronJob object to create CronJobs that only permit a single run at any time.
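A minimal sketch of a CronJob that forbids overlapping runs might look like this (the name, image, and sleep command are illustrative):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sequential-task          # hypothetical name
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid      # Allow (default) | Forbid | Replace
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sequential-task
            image: busybox:1.28
            # runs longer than the 5-minute interval, so overlapping schedules are skipped
            command: ["/bin/sh", "-c", "sleep 600"]
          restartPolicy: OnFailure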
Starting Deadline
A starting deadline determines whether a scheduled CronJob run is still allowed to start. This concept is specific to Kubernetes: it defines how long after the scheduled time a job run remains eligible to begin. It is useful for jobs with disabled concurrency, where runs cannot always start exactly on schedule.
The startingDeadlineSeconds field controls this value. For example, a starting deadline of 15 seconds allows a limited delay: a run scheduled for 10:00:00 can still start if the previous run finishes at 10:00:14, but it counts as missed if the 15-second window passes before it can start.
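In the manifest this is the startingDeadlineSeconds field on the CronJob spec; a short sketch with an illustrative 15-second deadline:

spec:
  schedule: "0 10 * * *"          # hypothetical: daily at 10:00
  startingDeadlineSeconds: 15     # a delayed run may still start up to 15 seconds late
  concurrencyPolicy: Forbid       # combined with Forbid, runs that miss the deadline are skipped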
Retaining Job History
Two other values are the successful jobs history limit and the failed jobs history limit. They control how many completed jobs of each type Kubernetes retains (by default, three successful and one failed job). You can change these values – higher values keep the history for longer, which is useful for debugging.
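Both limits are simple integer fields on the CronJob spec; for example, keeping more history for debugging might look like this (the values are illustrative):

spec:
  successfulJobsHistoryLimit: 10   # default is 3
  failedJobsHistoryLimit: 5        # default is 1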
CronJob Monitoring
Kubernetes allows you to monitor CronJobs with mechanisms like the kubectl command. The get command provides a CronJob's definition and job run details. The Jobs created by a CronJob carry the CronJob's name with an appended suffix derived from the scheduled start time. After identifying an individual job, you can use a kubectl command to retrieve container logs:
$ kubectl logs job/example-cron-1648239040
Kubernetes CronJobs Errors
Kubernetes Not Scheduling CronJob
This error arises when CronJobs don't fire as scheduled on Kubernetes. Triggering the Job manually shows that it is functioning, yet the Pod for the scheduled cron job never appears.
Sample Scenario
A CronJob is scheduled to fire every 60 seconds on a MicroK8s instance but never runs. The user tries to trigger the Job manually with the following command:
kubectl create job --from=cronjob/demo-cron-job demo-cron-job
While the Job runs after this command, it doesn’t run as scheduled. Here is the manifest of the API object in YAML format:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cron-job
  namespace: {{ .Values.global.namespace }}
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/release-name: {{ .Release.Name }}
    app.kubernetes.io/release-namespace: {{ .Release.Namespace }}
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: demo-cron-job
            image: demoImage
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - /usr/bin/curl -k http://restfulservices/api/demo-job
          restartPolicy: OnFailure
A possible resolution is to restart the entire namespace and redeploy. However, in such a case you should first check the following details:
- Was the CronJob scheduling working at one point and then stopped?
- Do any cron pods show a "Failed" status? If so, check those pods for the reason behind the failure.
- Use the following command to see if the CronJob resource has anything in the events:
kubectl describe cronjob demo-cron-job -n tango
- Does the code that the CronJob runs take more than a minute to complete? In that case, the schedule is too tight and needs loosening.
- The CronJob controller has built-in restrictions, such as no longer scheduling a CronJob once it has missed more than 100 scheduled runs. Check for this and other restrictions in the Kubernetes documentation.
- Do you scale the cluster down when it isn’t in use?
- Are there any third-party webhooks or plugins installed in the cluster? Such webhooks can interfere with pod creation.
- Does the namespace have any jobs created? Use the following command to check:
kubectl get jobs -n tango
If there are several job objects, investigate to see why they didn’t generate pods.
Kubernetes CronJob Stops Scheduling Jobs
This error arises when a CronJob stops scheduling the specified job. It commonly occurs when some part of the job fails consistently several times in a row.
Sample Scenario
The user scheduled a CronJob that functioned for some time before it stopped scheduling new jobs. The job included a step that pulled a container image, and that step failed. The manifest is shown below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app.kubernetes.io/instance: demo-cron-job
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cron
    helm.sh/chart: cron-0.1.0
  name: demo-cron-job
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        spec:
          containers:
          - args:
            - -c
            - npm run script
            command:
            - /bin/sh
            env:
            image:
            imagePullPolicy: Always
            name: cron
            resources: {}
            securityContext:
              runAsUser: 1000
            terminationMessagePath: /dev/demo-termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: "0/30 * * * *"
  successfulJobsHistoryLimit: 3
  suspend: false
status: {}
Here, the spec.restartPolicy specification is set to Never. Hence, the entire Pod fails whenever a container in the Pod fails. However, the manifest doesn't include the .spec.backoffLimit field, which specifies how many times the Job will be retried before it is considered failed. The Job therefore falls back to the default value, which is 6: it tries to pull the container image six times before being marked as a failed job.
Here are some possible resolutions (see the sketch after this list):
- Specify the .spec.backoffLimit field and set a higher value.
- Set the spec.restartPolicy to OnFailure so the Pod stays on the node and only the failing container is rerun.
- Consider setting the imagePullPolicy to IfNotPresent. This won't force re-pulling the image on every job start, unless the image is retagged.
Error Status on Kubernetes Cron Job with Connection Refused
This error arises when the CronJob involves communicating with an API endpoint. If the endpoint doesn’t respond successfully, the job shows an error status.
Sample Scenario
The user has a cron job that calls a REST API endpoint of the application, using the application's container image. The manifest of the job is as follows:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cronjob
  labels:
    app: {{ .Release.Name }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    release: {{ .Release.Name }}
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  startingDeadlineSeconds: 1800
  jobTemplate:
    spec:
      template:
        metadata:
          name: demo-cronjob
          labels:
            app: demo
        spec:
          restartPolicy: OnFailure
          containers:
          - name: demo
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
            command: ["/bin/sh", "-c", "curl http://localhost:8080/demo"]
            readinessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            livenessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            resources:
              requests:
                cpu: 200m
                memory: 4Gi
              limits:
                cpu: 1
                memory: 8Gi
  schedule: "*/40 * * * *"
The user then faces the following error:
curl: (7) Failed to connect to localhost port 8080: Connection refused
Here, the issue is that the user has provided a command and arguments that override the container image's own command and arguments. The command overrides the default entry point of the container, so the application never starts. A possible resolution is to use a shell script that first sets up and runs the REST application, so that the Job can then communicate with its endpoint, as sketched below.
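A hedged sketch of that approach, assuming a hypothetical start script at /app/start.sh and the port from the scenario, could replace the container command like this:

command:
- /bin/sh
- -c
- |
  # Start the REST application in the background (the start script path is hypothetical)
  /app/start.sh &
  # Wait until the endpoint answers, then let the job finish successfully
  until curl -sf http://localhost:8080/demo; do
    echo "waiting for the application to come up..."
    sleep 5
  done
  echo "endpoint responded; job complete"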
Simplifying Kubernetes Management & Troubleshooting With Komodor
Kubernetes troubleshooting is complex and involves multiple components; you might experience errors that are difficult to diagnose and fix. Without the right tools and expertise in place, the troubleshooting process can become stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong – simply because it can – especially across hybrid cloud environments.
This is where Komodor comes in – Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.