A microservices architecture splits an application into a large number of independent services. Each service is self-contained and handles a single business context. This loosely coupled approach lets each microservice have its own source code base, development language, and team. In addition, the services can be packaged as small container images, deployed to clusters, updated, and scaled independently.

Unlike a monolithic application, where all services are packaged into a single large application that must be scaled as a whole, a microservices architecture lets you scale only the individual services experiencing heavy use. Orchestration platforms like Kubernetes have also emerged in the past decade to help you quickly create, deploy, and manage microservices.

This blog post discusses modern cloud-native scaling methods for microservices and their cloud infrastructure, and shows you how to use them correctly.

Scaling Methods for Microservices

Microservices are packaged as containers and run on virtual machines or bare metal servers. When applications get more popular with new users or other applications connecting to them, their resource usage also increases. However, servers are configured with limited CPU, memory, and storage, and when these resources are used up, scaling is the solution. 

There are two mainstream methods for increasing the amount of available resources: vertical and horizontal scaling.

Vertical Scaling

Figure 1: Vertical scaling (Source: MongoDB)

Vertical scaling is the easiest method. It adds more resources to the server, such as new CPUs, memory units, or hard disks. If adding resources is not an option, moving the application to a larger server is also classified as vertical scaling. 

Horizontal Scaling

Figure 2: Horizontal scaling (Source: MongoDB)

Horizontal scaling is the newer approach and the better fit for microservices: it adds new application instances to the stack. The load is distributed across the old and new instances so that each instance stays within its resource limits.

Microservices run on cloud or on-premises infrastructure, where scaling applies to the applications, the clusters, and the infrastructure. Kubernetes is the de facto platform for deploying containerized microservices. The following sections dive into the different scaling options for Kubernetes and microservices in a modern cloud-native world.

Manual Methods

When applications get busier and consume more and more resources, it is possible to scale the microservices and then the cluster with the following approaches.

Horizontal Scaling of Single Microservices

When a microservice is overloaded and becomes a bottleneck, you can scale it out by increasing the number of instances. In Kubernetes, you can update the replicas field in the Deployment as follows:

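apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  # Number of pod instances to run; selector and pod template omitted for brevity
  replicas: 3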

Alternatively, you can scale imperatively with the kubectl command:

$ kubectl scale --replicas=4 deployment/nginx

When there are more microservice instances, Kubernetes distributes the load across them, which relieves the bottleneck.
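How many replicas to ask for is a sizing question. The following is a minimal sketch, assuming you know the total request rate and have measured how many requests per second a single instance can handle. The function name, the 20% headroom, and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
import math


def replicas_needed(total_rps, rps_per_instance, headroom=0.2):
    """Estimate how many instances are needed to serve total_rps.

    headroom is the fraction of each instance's capacity kept free
    as a safety margin (an assumption, tune it for your service).
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_rps = rps_per_instance * (1 - headroom)
    # Always keep at least one instance running.
    return max(1, math.ceil(total_rps / usable_rps))


if __name__ == "__main__":
    # Hypothetical load: 900 requests/s, 300 requests/s per instance.
    print(replicas_needed(total_rps=900, rps_per_instance=300))
```

With these sample numbers the estimate is four instances, which is the kind of value you would then pass to the replicas field or to kubectl scale.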

Vertical Scaling of a Cluster

Kubernetes clusters run their workloads on worker nodes connected to the Kubernetes control plane. Let's assume there are three nodes in the cluster with 16 GB of memory each. If the nodes are maxed out, vertical scaling means replacing the three nodes with larger ones, such as nodes with 32 GB of memory each. With double the amount of total memory, it is now possible to deploy more applications.

Horizontal Scaling of a Cluster

When the Kubernetes nodes run out of resources, you can add new nodes to the cluster. If you have three nodes with 16 GB of memory each and that is not enough, you can manually add any number of nodes with any configuration to increase the total amount of resources. In other words, you can either add three more nodes with 16 GB of memory each or add a single node with 48 GB of memory.

Horizontal scaling of the cluster brings flexibility regarding node specifications and the number of nodes, which also helps reduce your cloud bill.
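The node arithmetic from the last two sections can be checked with a short sketch. It assumes the full node memory is available to pods, which is a simplification because real nodes reserve some memory for the system; the helper names are hypothetical.

```python
def total_memory_gb(nodes):
    """Sum the memory of all nodes in the cluster, in GB."""
    return sum(nodes)


def largest_pod_gb(nodes):
    """A pod must fit on a single node, so the biggest node is the limit."""
    return max(nodes)


original = [16, 16, 16]                    # three 16 GB nodes
vertical = [32, 32, 32]                    # same nodes replaced by 32 GB ones
horizontal_small = original + [16, 16, 16] # three more 16 GB nodes
horizontal_large = original + [48]         # one additional 48 GB node

if __name__ == "__main__":
    for name, nodes in [
        ("original", original),
        ("vertical", vertical),
        ("horizontal (3 x 16 GB)", horizontal_small),
        ("horizontal (1 x 48 GB)", horizontal_large),
    ]:
        print(name, total_memory_gb(nodes), "GB total,",
              largest_pod_gb(nodes), "GB largest node")
```

All three scaled layouts reach 96 GB in total, but they differ in the largest single pod they can host, which is one reason the choice of node size matters.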

Automated Methods

Adding new nodes to clusters and changing the number of running pods is straightforward. However, these are all manual changes that require human interaction with the cluster. In a modern cloud-native environment, it is burdensome to watch resource usage continuously and scale the moment it changes.

Elastic scaling is the automatic approach of adding and removing compute, memory, storage, and networking infrastructure based on resource usage. The Kubernetes ecosystem offers elastic scaling for both clusters and microservices.

Elastic Scaling of a Cluster 

Cluster autoscaler is the Kubernetes component that automatically adjusts the size of the Kubernetes cluster. When there are unscheduled pods due to resource limitations or affinity rules, cluster autoscaler evaluates whether adding a new node will resolve the issue. If so, it scales the cluster up and opens up space for the unscheduled pods. 

Cluster autoscaler also looks for underutilized nodes whose pods can be moved elsewhere without violating PodDisruptionBudgets, so that those nodes can be removed when scaling down. While using cluster autoscaler, it is critical to ensure that all pods have their resource requests and limits configured, since the autoscaler bases its calculations on the requested resources rather than on actual usage.
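The scale-up decision can be pictured with a heavily simplified sketch. It only compares CPU and memory requests, whereas the real cluster autoscaler simulates the scheduler and also considers affinity rules and other constraints. The class, the function, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int

    def fits_in(self, free):
        """True if this request fits into the given free resources."""
        return (self.cpu_millicores <= free.cpu_millicores
                and self.memory_mib <= free.memory_mib)


def should_add_node(pod_request, free_per_node, new_node_allocatable):
    """Add a node only if the pod fits nowhere today but would fit on a new node."""
    schedulable_now = any(pod_request.fits_in(free) for free in free_per_node)
    return (not schedulable_now) and pod_request.fits_in(new_node_allocatable)


if __name__ == "__main__":
    free = [Resources(500, 1024), Resources(1000, 2048)]  # free capacity per node
    new_node = Resources(4000, 16384)                     # size of a node to add
    pending = Resources(2000, 4096)                       # requests of a pending pod
    print(should_add_node(pending, free, new_node))
```

The sketch also shows why requests matter: a pod without requests looks like it fits anywhere, so it never triggers a scale-up.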

Elastic Scaling of Single Microservices

Kubernetes offers two autoscaling mechanisms for single microservices. 

Kubernetes Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler (HPA) watches pods and compares their actual usage with their resource requests. When the resource usage rises above the configured target values, HPA increases the number of pods, and when it falls below them, HPA decreases the number again. For instance, you can configure HPA to add pods when the average CPU usage exceeds 50% of the requested value:

$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

In order to use HPA efficiently, you need to make sure your pods’ resource request configuration is correct. In addition, it’s possible to use custom metrics in HPA to work with business-critical indicators. 
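The Kubernetes documentation describes the HPA calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces that formula for the 50% CPU target above. The function name and sample values are hypothetical, and details such as stabilization windows and pods that are not ready are left out.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA replica calculation for a utilization metric (percent)."""
    ratio = current_utilization / target_utilization
    # No scaling when usage is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result within the configured --min and --max bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 50))   # 4 pods at 90% of requests -> 8
    print(desired_replicas(4, 52, 50))   # within tolerance -> stays at 4
    print(desired_replicas(8, 200, 50))  # 32 wanted, capped at max -> 10
```

Because utilization is measured against the requested CPU, a request that is set too high or too low shifts every one of these results, which is why the request configuration has to be correct.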

Kubernetes Vertical Pod Autoscaler (VPA) 

Vertical Pod Autoscaler (VPA) watches the actual resource usage of the pods in the cluster and looks for pods whose resource requests are set too high or too low. When there are substantial differences, VPA increases or decreases the resource requests of the pods. The critical drawback of VPA is that resource updates cause pods to restart, possibly on another node.

As a best practice, you should use VPA together with cluster autoscaler since they both try to optimize overall cluster utilization.
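To make the idea of "substantial differences" concrete, here is a toy right-sizing heuristic. It is not the actual VPA recommender, which works on usage histograms. It simply takes the observed peak, adds a margin, and proposes a new request only when it differs enough from the current one. The function name, the 15% margin, and the 25% threshold are assumptions.

```python
def recommend_request(current_request_m, observed_usage_m,
                      margin_percent=15, threshold_percent=25):
    """Return a CPU request in millicores based on observed usage samples."""
    if not observed_usage_m:
        return current_request_m
    peak = max(observed_usage_m)
    # Integer ceiling of peak * (1 + margin) to avoid float rounding.
    target = (peak * (100 + margin_percent) + 99) // 100
    # Only change the request when the difference is substantial.
    if abs(target - current_request_m) * 100 > threshold_percent * current_request_m:
        return target
    return current_request_m


if __name__ == "__main__":
    usage = [200, 250, 300]                 # sampled CPU usage in millicores
    print(recommend_request(1000, usage))   # over-provisioned -> 345
    print(recommend_request(350, usage))    # close enough -> stays at 350
```

Applying a new value means recreating the pod with the updated request, which is the restart drawback mentioned above.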

How to Scale Microservices Correctly?

To scale microservices in a dynamic environment like Kubernetes, you need to ensure that your software architecture and deployment strategies are compatible with the environment. When microservices are scaled, it's critical to ensure that each instance works correctly while the whole swarm works coherently. For instance, let's assume you use a local cache for each pod. Scaling out to more than one instance could lead to different responses to the same request, because each pod's cache may hold different data.

Instead, it’s better to use shared caching systems between pod instances to ensure consistency. You should also implement data governance models and limit collisions between pods to make the swarm of pods work coherently. This means the architecture of your application stack should be properly structured for scaling up and down. 
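The cache problem is easy to reproduce in a small simulation. In this sketch a plain dictionary stands in for both the database and the cache; in a real deployment the shared cache would be an external cache service reachable by all pods. The Pod class and the demo function are hypothetical.

```python
class Pod:
    """A service instance that caches database reads."""

    def __init__(self, name, database, shared_cache=None):
        self.name = name
        self.database = database
        # Use the shared cache if one is given, otherwise a pod-local cache.
        self.cache = shared_cache if shared_cache is not None else {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.database[key]
        return self.cache[key]

    def write(self, key, value):
        self.database[key] = value
        self.cache[key] = value


def demo(shared):
    database = {"price": 10}
    shared_cache = {} if shared else None
    pod_a = Pod("a", database, shared_cache)
    pod_b = Pod("b", database, shared_cache)
    pod_a.read("price")
    pod_b.read("price")        # both pods have now cached the old value
    pod_a.write("price", 12)   # the update goes through pod a only
    return pod_a.read("price"), pod_b.read("price")


if __name__ == "__main__":
    print(demo(shared=False))  # (12, 10): pod b serves a stale value
    print(demo(shared=True))   # (12, 12): both pods agree
```

With local caches, the answer depends on which pod the load balancer picks; with a shared cache, every instance sees the same value.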

Scaling, whether manual or automatic, relies on watching resource usage and metrics, so finding and monitoring what is vital for business continuity is critical. The first metrics to consider are CPU, memory, and storage since they're easy to monitor and integrate into autoscalers. The next step is to consider the Four Golden Signals:

  • Latency: Time taken to respond to a request
  • Traffic: Number of requests in a given time
  • Errors: Rate of failing requests
  • Saturation: Utilization of the service
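As a rough illustration, the four signals can be derived from a window of request records plus one utilization reading. This sketch assumes that HTTP 5xx responses count as errors, that latency is reported as a mean (percentiles are common in practice), and that saturation is CPU usage relative to the CPU limit. All names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int


def golden_signals(requests, window_seconds, cpu_used_m, cpu_limit_m):
    """Compute latency, traffic, errors, and saturation for one time window."""
    count = len(requests)
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_ms": sum(r.duration_ms for r in requests) / count if count else 0.0,
        "traffic_rps": count / window_seconds,
        "error_rate": failed / count if count else 0.0,
        "saturation": cpu_used_m / cpu_limit_m,
    }


if __name__ == "__main__":
    sample = [Request(100, 200), Request(200, 200),
              Request(300, 500), Request(400, 200)]
    print(golden_signals(sample, window_seconds=2, cpu_used_m=400, cpu_limit_m=500))
```

Any of these values can serve as a scaling signal, for example as a custom metric for HPA.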

When the application architecture is suitable for scalability and the correct metrics are utilized, the rest is simply configuring autoscalers and Kubernetes resources to have a cloud-native scalable microservice. However, as mentioned throughout this article, scaling manually or automatically is not straightforward. 

Kubernetes and the distributed applications running in the cluster create a reasonably complicated stack. With the dynamic autoscalers and automated deployments, it becomes even more challenging to troubleshoot and debug without cloud-native tooling. 

