Difficulty: Intermediate
Estimated Time: 30-40 minutes

Chaos Engineering is a way of testing the resiliency of distributed systems. In chaos engineering, we deliberately turn off parts of the system in a controlled manner. The idea is that bugs triggered by failures in a distributed setting are then uncovered under controlled conditions, ideally while engineers are in the office and able to respond. As more of these bugs are fixed, resiliency improves and downtime decreases over time.

Chaos Engineering was first popularized by Netflix's Chaos Monkey, which is integrated with Spinnaker, Netflix's continuous delivery platform, and randomly terminates virtual machine instances and containers. Netflix has used it with great success to increase the resiliency of its systems (source 1, source 2).

In this tutorial we are going to cause chaos in a Kubernetes cluster using kube-monkey, an implementation of Chaos Monkey built specifically for Kubernetes. Once deployed, kube-monkey randomly turns off pods in the cluster in a controlled manner. It uses the Kubernetes API to find pods and delete them, which means its deployment needs special permissions. The easy way is to deploy it in the "kube-system" namespace, but then it is granted far more permissions than it requires. Instead, following the principle of least privilege, we are going to deploy it with only the permissions it needs.
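As a rough sketch of what "only the permissions it needs" could look like, the script below writes a Kubernetes RBAC manifest with a dedicated namespace, a ServiceAccount, a ClusterRole and a ClusterRoleBinding. The names (kube-monkey for the namespace, service account and role) and the exact rule set are assumptions for illustration, not taken from the kube-monkey project: read access to deployments, statefulsets and daemonsets so targets can be discovered, and get/list/delete on pods so they can be killed. The kube-monkey project's own manifests remain the authoritative source for the permissions it actually uses.

```shell
#!/usr/bin/env bash
# Sketch: write a least-privilege RBAC manifest for kube-monkey.
# ASSUMPTION: the names and the rule set below are illustrative only.
set -euo pipefail

cat > kube-monkey-rbac.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: kube-monkey
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-monkey
  namespace: kube-monkey
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-monkey
rules:
  # Read-only access to the workloads kube-monkey may target (assumed set).
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets"]
    verbs: ["get", "list", "watch"]
  # Find pods and delete them; nothing else is granted.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-monkey
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-monkey
subjects:
  - kind: ServiceAccount
    name: kube-monkey
    namespace: kube-monkey
EOF

echo "Wrote kube-monkey-rbac.yaml"
```

With the cluster running, the manifest could then be applied with:

kubectl apply -f kube-monkey-rbac.yaml

A ClusterRole is used here so kube-monkey can reach pods in any namespace; per-namespace Roles and RoleBindings would narrow access further, though kube-monkey would then also need to be limited to those namespaces.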

Chaos Engineering in Kubernetes

Step 1 of 6

Initialization

Before we can get started, we need to initialize the cluster. To do this, run:

./init.sh

This starts a minikube Kubernetes cluster. It also adds a single deployment running an HTTP server, with three replica pods.
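Before moving on, it can be worth confirming that all three replicas are actually up. A minimal sketch, assuming the HTTP server deployment is the only workload in the default namespace: the helper below (count_running is a hypothetical name, not part of the tutorial) counts the pods whose STATUS column in the output of kubectl get pods --no-headers reads Running.

```shell
#!/usr/bin/env bash
# Count pods whose STATUS column is "Running".
# Reads "kubectl get pods --no-headers" output on stdin, whose columns are
# NAME READY STATUS RESTARTS AGE, so STATUS is the third field.
count_running() {
  awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Usage after ./init.sh (assumes no other pods in the default namespace);
# once all replicas are up this is expected to print 3:
#   kubectl get pods --no-headers | count_running
```

Parsing the STATUS column keeps the check simple; kubectl wait --for=condition=Available deployment --all is a stricter alternative that blocks until the deployments report that they are available.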