Difficulty: Beginner
Estimated Time: 15 minutes

Scaling Your Applications, Automatically

Almost always there will be more than one instance of each of your applications running on Kubernetes. Multiple instances provide both fault tolerance and additional serving capacity when demand for your service increases. After all, why did you move your applications to a distributed platform like Kubernetes? Because you want to leverage the large pool of CPU, memory and I/O across your cluster. However, as you know, these resources cost money, so you only want your replica count to increase when demand increases. When service demand is low the instances should scale down to save you money, and lessen your carbon footprint.

There are three types of scaling in Kubernetes:

  1. Horizontal Pod Scaling
  2. Cluster Node Scaling
  3. Vertical Pod Scaling

This scenario shows you how to achieve Horizontal Pod Scaling, automatically. While you can scale manually, you really want scaling to respond automatically to demand, so the complete name for this Kubernetes feature is the Horizontal Pod Autoscaler (HPA).
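As a quick illustration, here is the difference between manual scaling and autoscaling, assuming a hypothetical Deployment named my-app (the name and numbers are illustrative, not part of this scenario):

kubectl scale deployment my-app --replicas=5                             # manual: pin the replica count
kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=10    # automatic: create an HPA

The first command fixes the replica count at 5 until you change it again; the second creates an HPA that keeps average CPU utilization near 50% using between 1 and 10 replicas.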

Basic automatic scaling is achieved simply by declaring a CPU utilization threshold along with the minimum and maximum number of Pods to scale between. The HPA monitors the current CPU load metric and triggers scaling events when activity rises above or falls below the threshold over a specified time period. It's essentially a control loop comparing observed metrics against declared states.
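Here is a minimal sketch of that declaration as an HPA manifest, assuming a Deployment named my-app already exists (on older clusters the API group is autoscaling/v1, with a targetCPUUtilizationPercentage field instead of the metrics list):

kubectl apply -f - <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1          # never scale below this count
  maxReplicas: 10         # never scale above this count
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50    # target average CPU utilization across the Pods
EOF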

In the following steps you will learn how to:

  • install the metrics-server for gathering metrics,
  • install a Pod that can be scaled,
  • define the scaling rules and the number of Pods to scale up and down,
  • increase service demand to trigger scaling up,
  • observe scaling up and down (a few of these commands are previewed just below).
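As a preview, installing the metrics-server is typically a one-liner with Helm (the chart location can vary by cluster setup), and demand is often generated with a throwaway busybox Pod that loops requests at a Service, here assumed to be named my-app:

# install the metrics-server from its official Helm chart
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server

# hammer the (assumed) my-app Service from a throwaway Pod
kubectl run load-generator --rm -it --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://my-app; done"

# in a second terminal, watch the HPA react
kubectl get hpa --watch

Watching the HPA in a second terminal shows the replica count climb while the load runs, then settle back down a few minutes after it stops.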

This scenario introduces the fundamental techniques to scale your Pods up and down in a Kubernetes cluster using the Horizontal Pod Autoscaler (HPA). More complex rules can be applied to the HPA triggering logic, and the HPA can reference metrics from other metrics registries such as Prometheus. The HPA uses the standardized Custom Metrics API to reference metrics from these different sources.
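As a hedged sketch of what that looks like, the Resource metric shown earlier can be swapped for a Pods metric in an autoscaling/v2 manifest; the metric name http_requests_per_second is hypothetical and would have to be exposed through a custom metrics adapter such as the Prometheus adapter:

kubectl apply -f - <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second    # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"               # scale to keep ~100 requests/s per Pod
EOF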

Lessons Learned

With these steps you have learned how to:

  • ✔ install the metrics-server for gathering metrics,
  • ✔ install a Pod that can be scaled,
  • ✔ define the scaling rules and the number of Pods to scale up and down,
  • ✔ increase service demand to trigger scaling up,
  • ✔ observe scaling up and down.



Step 1 of 6

Your Kubernetes Cluster

For this scenario, Katacoda has just started a fresh Kubernetes cluster for you. Verify it's ready for your use.

kubectl version --short && \
kubectl get componentstatus && \
kubectl get nodes && \
kubectl cluster-info

The Helm package manager, used for installing applications on Kubernetes, is also available.

helm version --short

Kubernetes Dashboard

You can administer your cluster with the kubectl CLI tool or use the visual Kubernetes Dashboard. Use this script to access the protected Dashboard.

token.sh
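The contents of token.sh are specific to this environment, but on clusters of this vintage such a script usually extracts a service account's bearer token along these lines (a sketch; the account name and namespace are assumptions):

# look up the name of the service account's token secret (pre-1.24 behavior)
SECRET=$(kubectl -n kube-system get serviceaccount default -o jsonpath='{.secrets[0].name}')
# decode the bearer token for the Dashboard login screen
kubectl -n kube-system get secret "$SECRET" -o jsonpath='{.data.token}' | base64 --decode && echo

Paste the decoded token into the Dashboard's login screen to sign in.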