Ultimate Guide to Kubernetes Operators and How to Create New Operators

Itiel Shwartz, CTO & co-founder

8 min read February 1st, 2023

What Is a Kubernetes Operator?

A Kubernetes operator is a method of packaging, deploying, and managing a Kubernetes application. An operator uses the Kubernetes API to automate tasks such as deployment, scaling, and management of applications.

Operators are typically implemented as custom controllers that extend the Kubernetes API with new resources, and provide custom logic for managing these resources. For example, a database operator might create a custom resource called Database that represents a database instance, and provide custom logic for creating and managing instances of this resource.

Operators allow you to encode operational knowledge and best practices for running a specific application on Kubernetes into the operator itself, making it easier to deploy and manage complex applications on Kubernetes. They also allow you to use the Kubernetes API and tools such as kubectl to manage your applications, rather than having to use custom scripts or tools.

Operators are an increasingly popular way to deploy and manage applications on Kubernetes, and are used by a growing number of organizations to automate the management of complex applications such as databases, message brokers, and other types of infrastructure.

This is part of a series of articles about Kubernetes Architecture.

4 Problems Kubernetes Operators Can Solve

Kubernetes has many powerful features for deploying and managing applications at scale, but it also has limitations that operators can help address. The main ones are:

Complexity: Complex applications often require a lot of custom configuration and operational knowledge to deploy and manage on Kubernetes. Operators can encode this knowledge into the operator itself, making it easier to deploy and manage these applications.
Custom logic: Kubernetes provides a set of core features that can be used to deploy and manage applications, but it may not always have the specific features needed to manage certain types of applications. Operators can provide custom logic to handle these cases, making it possible to use Kubernetes to manage a wider range of applications.
Custom resource definitions: Some applications require custom resources that are not part of the core Kubernetes API. Operators can create custom resource definitions (CRDs) to represent these resources, and provide custom logic for managing them.
Ongoing management: Applications often require ongoing management, such as updates, backups, and scaling. Operators can provide custom logic to handle these tasks, making it easier to manage applications over time.

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. A big believer in dev empowerment and moving fast, he has worked at eBay, Forter, and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps” and an avid public speaker who loves talking about cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

In my experience, here are tips that can help you better work with Kubernetes Operators:

Use finalizers for cleanup tasks

When a custom resource is deleted, using finalizers ensures that your operator can perform necessary cleanup tasks, such as deleting associated resources or taking final backups, before the resource is completely removed.
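
The finalizer flow described above can be sketched in plain Go. This is a minimal model that uses only the standard library, not a real Kubernetes client: the SampleDB struct, the finalizer key app.example.com/cleanup, and the helper names are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// finalizerName is a hypothetical finalizer key used only in this sketch.
const finalizerName = "app.example.com/cleanup"

// SampleDB is a simplified stand-in for a custom resource's metadata.
type SampleDB struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time // set once deletion has been requested
}

// markDeleted simulates a delete request: the object is kept, but marked.
func markDeleted(db *SampleDB) {
	now := time.Now()
	db.DeletionTimestamp = &now
}

func hasFinalizer(db *SampleDB, name string) bool {
	for _, f := range db.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

func removeFinalizer(db *SampleDB, name string) {
	kept := db.Finalizers[:0]
	for _, f := range db.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	db.Finalizers = kept
}

// reconcileFinalizer adds the finalizer to live resources and, once deletion
// has been requested, runs cleanup before removing the finalizer.
// It returns a short description of the action taken.
func reconcileFinalizer(db *SampleDB, cleanup func(*SampleDB) error) (string, error) {
	if db.DeletionTimestamp == nil {
		if !hasFinalizer(db, finalizerName) {
			db.Finalizers = append(db.Finalizers, finalizerName)
			return "finalizer added", nil
		}
		return "nothing to do", nil
	}
	if !hasFinalizer(db, finalizerName) {
		return "nothing to do", nil
	}
	if err := cleanup(db); err != nil {
		// Keep the finalizer so cleanup is retried on the next pass.
		return "cleanup failed", err
	}
	removeFinalizer(db, finalizerName)
	return "cleanup done, finalizer removed", nil
}

func main() {
	db := &SampleDB{Name: "orders"}
	cleanup := func(d *SampleDB) error {
		fmt.Println("taking final backup of", d.Name)
		return nil
	}

	action, _ := reconcileFinalizer(db, cleanup)
	fmt.Println(action)

	markDeleted(db)
	action, _ = reconcileFinalizer(db, cleanup)
	fmt.Println(action)
}
```

The key design point is that a failed cleanup leaves the finalizer in place, so the resource is not removed until the cleanup eventually succeeds.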

Leverage CRD validation

Use Custom Resource Definition (CRD) validation to enforce schema constraints on custom resources. This helps prevent invalid resource definitions from causing runtime issues and improves overall stability.
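
As a rough illustration, projects scaffolded with kubebuilder or the Go-based Operator SDK usually declare these constraints as comment markers on the Go types, and controller-gen turns them into the CRD's OpenAPI schema. The SampleDBSpec type, its fields, and the limits below are assumptions for this sketch. The Validate method is not part of any library; it only mirrors the markers in plain Go to make the constraints concrete, since in a cluster the API server enforces the generated schema.

```go
package main

import (
	"errors"
	"fmt"
)

// SampleDBSpec is a hypothetical spec for the SampleDB resource.
// The +kubebuilder markers are comments that controller-gen reads to
// generate the validation schema in the CRD.
type SampleDBSpec struct {
	// +kubebuilder:validation:Enum=postgres;mysql
	Engine string `json:"engine"`

	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=5
	Replicas int32 `json:"replicas"`
}

// Validate mirrors the marker constraints above in plain Go.
// It exists only for illustration; it is not called by Kubernetes.
func (s SampleDBSpec) Validate() error {
	if s.Engine != "postgres" && s.Engine != "mysql" {
		return errors.New("engine must be postgres or mysql")
	}
	if s.Replicas < 1 || s.Replicas > 5 {
		return errors.New("replicas must be between 1 and 5")
	}
	return nil
}

func main() {
	specs := []SampleDBSpec{
		{Engine: "postgres", Replicas: 3},
		{Engine: "oracle", Replicas: 3},
		{Engine: "mysql", Replicas: 9},
	}
	for _, s := range specs {
		fmt.Printf("%+v -> %v\n", s, s.Validate())
	}
}
```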

Implement leader election

For high availability, implement leader election in your operators. This ensures that only one instance of your operator handles resource management tasks, preventing conflicts and redundant operations.
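
To make the idea concrete, here is a small in-memory model of lease-based leader election. It is only a sketch under simplifying assumptions: the Lease type, the replica names, and the logical clock in seconds are invented for illustration, and in a real cluster the lease would be stored in the Kubernetes API and handled by a library (controller-runtime, for example, can enable leader election through its manager options).

```go
package main

import (
	"fmt"
	"sync"
)

// Lease is a simplified, in-memory stand-in for a shared lock record.
// In a real cluster this state would live in the Kubernetes API.
type Lease struct {
	mu        sync.Mutex
	holder    string
	expiresAt int64 // logical time in seconds
}

// TryAcquire grants or renews the lease when it is free, expired, or
// already held by id. It reports whether id is the leader afterwards.
func (l *Lease) TryAcquire(id string, now, ttl int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" || l.holder == id || now >= l.expiresAt {
		l.holder = id
		l.expiresAt = now + ttl
		return true
	}
	return false
}

func main() {
	lease := &Lease{}
	const ttl = 15

	fmt.Println("t=0  operator-a leads:", lease.TryAcquire("operator-a", 0, ttl))
	fmt.Println("t=5  operator-b leads:", lease.TryAcquire("operator-b", 5, ttl))
	// operator-a stops renewing; its lease runs out at t=15.
	fmt.Println("t=20 operator-b leads:", lease.TryAcquire("operator-b", 20, ttl))
	fmt.Println("t=21 operator-a leads:", lease.TryAcquire("operator-a", 21, ttl))
}
```

Only the replica for which TryAcquire returns true should run its reconcile logic; the others stay idle and keep retrying so one of them can take over when the leader stops renewing.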

Use versioning for custom resources

Manage changes in your custom resources by versioning your CRDs. This allows you to introduce new features or changes without breaking existing deployments, and enables smoother upgrades.
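
The sketch below shows the kind of conversion logic that versioning implies. The two spec types, the renamed field, and the default storage size are assumptions made up for this example. In a real CRD, each version is listed under spec.versions, and a conversion webhook is typically needed once the schemas differ.

```go
package main

import "fmt"

// defaultStorageGB is an assumed default for the field added in v1.
const defaultStorageGB = 10

// SampleDBSpecV1Alpha1 is a hypothetical older schema.
type SampleDBSpecV1Alpha1 struct {
	Size int32 // number of database instances
}

// SampleDBSpecV1 is a hypothetical newer schema: Size is renamed to
// Replicas and a StorageGB field is added.
type SampleDBSpecV1 struct {
	Replicas  int32
	StorageGB int32
}

// toV1 upgrades an old spec, filling the new field with a default.
func toV1(old SampleDBSpecV1Alpha1) SampleDBSpecV1 {
	return SampleDBSpecV1{Replicas: old.Size, StorageGB: defaultStorageGB}
}

// toV1Alpha1 converts back; StorageGB has no v1alpha1 equivalent and is dropped.
func toV1Alpha1(cur SampleDBSpecV1) SampleDBSpecV1Alpha1 {
	return SampleDBSpecV1Alpha1{Size: cur.Replicas}
}

func main() {
	old := SampleDBSpecV1Alpha1{Size: 3}
	cur := toV1(old)
	fmt.Printf("v1alpha1 %+v -> v1 %+v\n", old, cur)
	fmt.Printf("round trip -> %+v\n", toV1Alpha1(cur))
}
```

Note that the downgrade is lossy in this sketch: a field that exists only in the newer version cannot survive a round trip unless it is preserved somewhere else, which is worth planning for before adding fields.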

Integrate with Prometheus for monitoring

Expose custom metrics from your operators and integrate with Prometheus. This allows you to monitor the health and performance of your operator and the resources it manages, providing valuable insights and alerts.
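
As a minimal sketch of what exposing a custom metric looks like, the following program serves a single counter in the Prometheus text exposition format using only the Go standard library. The Metrics type and the metric name sampledb_reconcile_total are assumptions for illustration; a real operator would more likely use the Prometheus Go client library or the metrics registry provided by its framework.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Metrics holds operator counters. The metric name used below is hypothetical.
type Metrics struct {
	reconciles int64
}

// RecordReconcile increments the counter; safe for concurrent use.
func (m *Metrics) RecordReconcile() { atomic.AddInt64(&m.reconciles, 1) }

// Render returns the counter in the Prometheus text exposition format.
func (m *Metrics) Render() string {
	return fmt.Sprintf("# HELP sampledb_reconcile_total Total reconcile passes.\n"+
		"# TYPE sampledb_reconcile_total counter\n"+
		"sampledb_reconcile_total %d\n", atomic.LoadInt64(&m.reconciles))
}

// ServeHTTP lets Metrics be mounted as an HTTP handler, e.g. on /metrics.
func (m *Metrics) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, m.Render())
}

func main() {
	m := &Metrics{}
	m.RecordReconcile()
	m.RecordReconcile()

	// Call the handler in-process instead of opening a port.
	rec := httptest.NewRecorder()
	m.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

In a running operator, the handler would be registered with something like http.Handle("/metrics", m) and scraped by Prometheus at a regular interval.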

How Operators Manage Kubernetes Applications

Kubernetes operators are designed to manage Kubernetes applications in a more automated and efficient way. They do this by providing domain-specific knowledge and custom logic to handle the deployment and ongoing management of applications on Kubernetes. Here are some of the main ways in which operators manage Kubernetes applications:

Domain-specific knowledge: Operators can provide domain-specific knowledge and custom logic to handle the deployment and management of applications in a specific domain, such as databases or message brokers. This can make it easier to deploy and manage complex applications that require specialized knowledge.
Removing difficult manual tasks: Operators can automate a wide range of tasks that would otherwise be performed manually, such as updates, backups, and scaling. This can make it easier to manage applications over time and reduce the workload of operations teams.
Making it easier to deploy foundation services: Operators can make it easier to deploy and run the foundation services that applications depend on, such as databases, message brokers, and other types of infrastructure. This can save time and effort when deploying applications on Kubernetes.
Providing a consistent way to distribute software: Operators can provide a consistent way to distribute software on Kubernetes clusters, making it easier to deploy applications consistently across multiple clusters.
Reducing support burdens: Operators can help to identify and correct problems with applications, reducing the support burden on operations teams and making it easier to manage applications over time.
Implementing SRE: Operators can help to implement site reliability engineering (SRE) principles in Kubernetes, making it easier to ensure that applications are reliable and available.

Popular Kubernetes Operators

There are many popular Kubernetes operators available that can be used to deploy and manage applications on a Kubernetes cluster. Some of the most popular operators include:

Prometheus

Prometheus is a popular open-source monitoring and alerting system. It is designed to collect metrics from various sources, including Kubernetes, and store them in a time-series database. The Prometheus Operator deploys and manages Prometheus and related components, such as Alertmanager, on a cluster through custom resources. With it, Prometheus can be used to monitor the health and performance of a Kubernetes cluster, and to trigger alerts when certain conditions are met.

Grafana

Grafana is an open-source visualization and analytics platform. It is often used in conjunction with Prometheus to display metrics and provide insights into the performance and health of a system. Grafana provides a range of visualization options, including graphs, gauges, and dashboards, and can be used to monitor a variety of metrics. The Grafana Operator manages Grafana instances, along with resources such as dashboards and data sources, through custom resources.

Elastic Cloud on Kubernetes Operator

Elastic Cloud on Kubernetes (ECK) is an operator for deploying and managing Elasticsearch and Kibana on Kubernetes. It provides an easy way to deploy and manage Elasticsearch clusters on Kubernetes, and includes features such as automatic scaling, rolling updates, and disaster recovery.

RBAC Manager

RBAC Manager is a Kubernetes operator that helps to manage role-based access control (RBAC) in a cluster. It provides a set of custom resources for defining and managing RBAC rules, and includes features such as automatic synchronization of RBAC rules with the cluster state.

What Is the Operator SDK?

The Operator SDK is a toolkit for building Kubernetes operators. It includes a CLI, a set of libraries, and a number of tools that make it easier to develop and maintain operators.

Some of the main components of the Operator SDK are:

CLI: The Operator SDK includes a CLI that provides a number of commands for developing and maintaining operators. These commands can be used to create a new operator project, generate code and manifests, and build and test the operator.
Make build automation tool: The Operator SDK uses the Make build automation tool to manage the build process for operators. Make is a powerful tool that allows you to define a set of build rules for your operator, and then automate the build process using these rules.
Pre-built Make commands: The Operator SDK includes a set of pre-built Make commands that can be used to automate common tasks such as building and testing the operator, and generating manifests.
Operator Lifecycle Manager (OLM): OLM is a separate component of the Operator Framework that the Operator SDK integrates with. It helps to manage the lifecycle of operators in a cluster, and includes a set of controllers that handle tasks such as installing, upgrading, and uninstalling operators, and ensuring that they are running correctly.

An Example: Creating a Simple Kubernetes Operator

To create a Kubernetes operator, you first define a custom resource definition (CRD) for the resource you want to manage. For example, you might define a CRD named SampleDB to represent a sample database instance.

Next, you write a custom controller that watches for instances of the SampleDB resource and performs actions based on changes to these resources. For example, the controller might deploy a database instance when a SampleDB resource is created, or take a backup of the database when the SampleDB resource is updated.

To deploy an operator, you first build the operator using the Operator SDK or another tool. Then you create a deployment in the cluster that runs the operator, and create an instance of the SampleDB resource using kubectl or another tool.

Once the operator is deployed and the SampleDB resource is created, the operator will begin to manage the resource. It can perform tasks such as deploying the database, taking backups, handling upgrades, and simulating failure to test the application's resilience.

Here is an example showing how to define a CRD and write a controller for the new operator:

Step 1: Define the SampleDB custom resource definition (CRD)

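# apiextensions.k8s.io/v1beta1 was removed in Kubernetes 1.22; v1 requires a versions list with a schema.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: sampledbs.app.example.com
spec:
  group: app.example.com
  names:
    kind: SampleDB
    plural: sampledbs
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true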

Step 2: Write the operator controller:

// Entry point for the SampleDB operator.
// Note: this example uses an early, legacy Operator SDK API (sdk.Watch / sdk.Handle).
// Current SDK releases scaffold projects on controller-runtime instead.
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/operator-framework/operator-sdk/pkg/sdk"
	"github.com/operator-framework/operator-sdk/pkg/util/k8sutil"
	sdkVersion "github.com/operator-framework/operator-sdk/version"

	// Project-local package that provides the event handler (placeholder path).
	"github.com/example/app-operator/pkg/controller"
)

func main() {
	fmt.Printf("operator-sdk version: %v\n", sdkVersion.Version)

	// Expose the operator's metrics endpoint.
	sdk.ExposeMetricsPort()

	// API group/version and kind of the custom resource to watch.
	resource := "app.example.com/v1"
	kind := "SampleDB"

	// Namespace to watch, read from the operator's environment.
	namespace, err := k8sutil.GetWatchNamespace()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// How often watched resources are resynced.
	resyncPeriod := time.Duration(5) * time.Second

	// Register the watch and the handler, then start processing events.
	sdk.Watch(resource, kind, namespace, resyncPeriod)
	sdk.Handle(controller.NewHandler())
	sdk.Run(context.TODO())
}

Best Practices for Writing Kubernetes Operators

Use the Operator SDK

There are several reasons why you might use the Operator SDK when developing a Kubernetes operator:

Simplify operator development: The Operator SDK provides a set of tools and libraries that make it easier to develop an operator, including a CLI for generating code and manifests, and a Makefile for automating the build process. This can save time and effort when developing an operator, and reduce the risk of errors.
Improve operator quality: The Operator SDK includes a number of best practices for developing operators, and can help to ensure that your operator is of high quality. It also includes a test framework that can be used to write unit and integration tests for your operator, helping to ensure that it is reliable and maintainable.
Integrate with other operator-related tools: The Operator SDK integrates with other operator-related tools, such as the Operator Lifecycle Manager (OLM) and the Operator Registry. This makes it easier to deploy and manage your operator, and can simplify the process of distributing your operator to other users.
Support for multiple languages: The Operator SDK supports the development of operators in multiple languages, including Go, Ansible, and Helm. This makes it possible to use the language that you are most familiar with when developing your operator.

Avoid Overstuffed Functions

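An overstuffed function is one that tries to do too much in one place, such as a single reconcile function that contains all of an operator's logic. Here are some specific reasons why avoiding overstuffed functions is important: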

Readability: Overstuffed functions are often difficult to read and understand because they contain a large amount of code and logic. This can make it harder for other developers to understand what the function does, and can increase the risk of errors. By avoiding overstuffed functions, you can make your code more readable and easier to understand.
Maintainability: Overstuffed functions are often hard to maintain because they contain a lot of code and logic in a single location. This can make it difficult to modify or update the function, and can increase the risk of breaking the function when making changes. By avoiding overstuffed functions, you can make your code more maintainable and easier to modify.
Testability: Overstuffed functions can be difficult to test because they often contain a lot of code and logic, and may have multiple return points or side effects. This can make it hard to write reliable tests that cover all of the code in the function. By avoiding overstuffed functions, you can make your code more testable and easier to write reliable tests for.
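
One way to apply this, sketched below, is to split decision logic into small functions that each answer a single question and have a thin function combine them. The Spec and Status types, their fields, and the action strings are hypothetical and exist only to keep the example self-contained.

```go
package main

import "fmt"

// Spec and Status are hypothetical, trimmed-down views of a SampleDB resource.
type Spec struct {
	Replicas         int
	BackupEveryHours int
}

type Status struct {
	ReadyReplicas    int
	HoursSinceBackup int
}

// replicaDelta answers one question: how many replicas to add or remove.
func replicaDelta(spec Spec, st Status) int {
	return spec.Replicas - st.ReadyReplicas
}

// backupDue answers one question: is a backup overdue?
func backupDue(spec Spec, st Status) bool {
	return spec.BackupEveryHours > 0 && st.HoursSinceBackup >= spec.BackupEveryHours
}

// plan stays short because the decisions live in the helpers above.
func plan(spec Spec, st Status) []string {
	var actions []string
	switch d := replicaDelta(spec, st); {
	case d > 0:
		actions = append(actions, fmt.Sprintf("scale up by %d", d))
	case d < 0:
		actions = append(actions, fmt.Sprintf("scale down by %d", -d))
	}
	if backupDue(spec, st) {
		actions = append(actions, "take backup")
	}
	return actions
}

func main() {
	spec := Spec{Replicas: 3, BackupEveryHours: 24}
	st := Status{ReadyReplicas: 1, HoursSinceBackup: 30}
	for _, a := range plan(spec, st) {
		fmt.Println(a)
	}
}
```

Each helper can be unit tested with plain values and no cluster, which is the testability benefit described above.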

One Custom Resource Modification at a Time

In Kubernetes, operators use a reconcile loop to manage the state of a custom resource. The loop is triggered whenever the resource, or something it depends on, changes, and optionally on a periodic resync. Each pass checks the current state of the custom resource and makes any necessary changes to bring it into the desired state.

If an operator makes multiple modifications to a custom resource at the same time, it can be difficult to determine the cause of any errors or issues that may occur. This can make it harder to troubleshoot and fix problems with the operator.

On the other hand, if an operator makes only one modification at a time, it is easier to determine the cause of any errors or issues that may occur. This can make it easier to troubleshoot and fix problems with the operator, and can improve its reliability.

Additionally, making one custom resource modification at a time can also improve the performance of the operator. By making fewer changes at once, the operator can avoid overloading the Kubernetes API server and reduce the risk of delays or timeouts.
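
A minimal sketch of this pattern, using only the standard library: each reconcile pass makes at most one change and then asks to be run again. The SampleDB fields, the Result type, and the action strings are assumptions for illustration and do not come from any particular framework.

```go
package main

import "fmt"

// SampleDB tracks which pieces of a hypothetical database instance exist.
type SampleDB struct {
	HasStorage    bool
	HasDeployment bool
	HasService    bool
}

// Result mimics the "run me again" signal a reconcile function returns.
type Result struct {
	Requeue bool
}

// reconcileOnce makes at most one modification per call and requests
// another pass until nothing is left to change.
func reconcileOnce(db *SampleDB) (Result, string) {
	switch {
	case !db.HasStorage:
		db.HasStorage = true
		return Result{Requeue: true}, "created storage"
	case !db.HasDeployment:
		db.HasDeployment = true
		return Result{Requeue: true}, "created deployment"
	case !db.HasService:
		db.HasService = true
		return Result{Requeue: true}, "created service"
	}
	return Result{}, "in desired state"
}

func main() {
	db := &SampleDB{}
	for {
		res, action := reconcileOnce(db)
		fmt.Println(action)
		if !res.Requeue {
			break
		}
	}
}
```

Because every pass maps to a single action, a failure can be traced to exactly one step, and a retry resumes from the first step that is still missing.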

Wrap External Dependencies

External dependencies are libraries or services that an operator relies on to perform its functions. These dependencies can include things like database drivers, HTTP clients, and other types of libraries or services.

If an operator directly depends on external dependencies, it can be difficult to handle errors or issues that may occur with these dependencies. For example, if an external dependency is unavailable or returns an error, the operator may fail or behave unexpectedly.

By wrapping external dependencies in a layer of abstraction, you can create a more robust and reliable operator. The wrapper can handle errors and issues with the external dependencies, and provide a consistent interface for the operator to use. This can make it easier to handle errors and issues with external dependencies, and improve the reliability of the operator.

In addition to improving reliability, wrapping external dependencies can also make your operator more maintainable. By abstracting the dependencies behind a wrapper, you can more easily swap out or update the dependencies without changing the rest of the operator code. This can save time and effort when maintaining the operator, and can make it easier to keep the operator up to date.
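
The wrapping approach can be sketched as a narrow interface plus an adapter. Everything named here is hypothetical: vendorClient stands in for a third-party SDK, and BackupStore, vendorStore, and ErrUnavailable are names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable is the single error the operator logic needs to understand.
var ErrUnavailable = errors.New("backup store unavailable")

// BackupStore is the narrow interface the operator code depends on.
type BackupStore interface {
	Save(name string, data []byte) error
}

// vendorClient stands in for a third-party SDK (hypothetical).
type vendorClient struct {
	down bool
}

func (c *vendorClient) PutObject(bucket, key string, body []byte) error {
	if c.down {
		return errors.New("vendor: connection refused")
	}
	return nil
}

// vendorStore adapts vendorClient to BackupStore and normalizes its errors.
type vendorStore struct {
	client *vendorClient
	bucket string
}

func (s *vendorStore) Save(name string, data []byte) error {
	if err := s.client.PutObject(s.bucket, name, data); err != nil {
		return fmt.Errorf("%w: %v", ErrUnavailable, err)
	}
	return nil
}

// backupDatabase is operator logic; it only sees the BackupStore interface.
func backupDatabase(store BackupStore, dbName string) string {
	err := store.Save(dbName+".bak", []byte("dump"))
	switch {
	case err == nil:
		return "backup saved"
	case errors.Is(err, ErrUnavailable):
		return "store unavailable, will retry"
	default:
		return "backup failed"
	}
}

func main() {
	healthy := &vendorStore{client: &vendorClient{}, bucket: "backups"}
	broken := &vendorStore{client: &vendorClient{down: true}, bucket: "backups"}
	fmt.Println(backupDatabase(healthy, "orders"))
	fmt.Println(backupDatabase(broken, "orders"))
}
```

Swapping the vendor library then means writing a new adapter that satisfies BackupStore, while backupDatabase and its tests stay unchanged.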
