Chaos engineering using Chaos Mesh
Chaos engineering is the practice of proactively uncovering potential issues in complex systems before they manifest in real-world scenarios, thereby improving system reliability and performance. Chaos Mesh is an open-source, cloud-native chaos engineering platform and a CNCF incubating project.
What kinds of failures Chaos Mesh supports and how it helps
- PodChaos:
  - Simulates pod faults such as pod unavailability or termination.
  - Assists in identifying how downstream services behave during intermittent failures or system downtime.
- NetworkChaos:
  - Simulates network faults.
  - Assists in identifying situations where pods are running but the network is unreachable, slow, or dropping packets.
  - Example: What happens if inter-service communication breaks or experiences delays?
- StressChaos:
  - Simulates stress within containers.
  - Assists in identifying pod behavior when resources are heavily utilized.
  - Example: How are response times impacted when resources are heavily used?
- HTTPChaos:
  - Interrupts the connection.
  - Injects latency into the request or response.
  - Replaces part of the content of HTTP request or response messages.
  - Appends additional content to HTTP request or response messages.
- DNSChaos:
  - Simulates incorrect DNS responses.
  - Assists in identifying scenarios where host names cannot be resolved.
- TimeChaos:
  - Simulates clock offsets.
  - Assists in identifying behaviors when schedules are moved forward or postponed.
  - Example: What happens when scheduled synchronization jobs run earlier or later than planned?
- AzureChaos:
  - Simulates fault scenarios on specified Azure instances.
  - Assists in identifying behaviors when instances are restarting or stopping.
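To make the list more concrete, here is a minimal sketch of how one of these fault types could be declared, in this case a NetworkChaos experiment aimed at the inter-service delay question above. The chaos-k8s namespace and the app: httpd label are assumptions borrowed from the sample application used later in this article, the experiment name is hypothetical, and the latency values are arbitrary.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: httpd-network-delay   # hypothetical name
  namespace: chaos-k8s        # assumed: namespace of the sample app
spec:
  action: delay               # add latency instead of dropping traffic
  mode: all                   # affect every pod matched by the selector
  selector:
    namespaces:
      - chaos-k8s
    labelSelectors:
      app: httpd              # assumed: label of the target pods
  delay:
    latency: '200ms'          # arbitrary base delay
    jitter: '50ms'            # arbitrary variation around the base delay
  duration: '2m'              # the fault is removed after two minutes
```

The other fault types follow the same pattern: a kind, a selector, a mode, and a kind-specific block describing the fault.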
Install Chaos Mesh
Follow the official Chaos Mesh installation steps.
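For orientation, the Helm-based route in the official guide typically comes down to adding the chart repository https://charts.chaos-mesh.org and installing the chaos-mesh/chaos-mesh chart into its own namespace. A values file for that chart might look like the sketch below. It assumes a containerd-based cluster, and the file name values.yaml is hypothetical; check the installation guide for the runtime and socket path that match your nodes.

```yaml
# values.yaml (hypothetical file name), passed to helm install with -f
chaosDaemon:
  # Container runtime used by the cluster nodes (assumed: containerd)
  runtime: containerd
  # Runtime socket the chaos daemon connects to (assumed: default containerd path)
  socketPath: /run/containerd/containerd.sock
dashboard:
  # Require a token to log in to the dashboard
  securityMode: true
```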
Chaos Mesh architecture
Chaos Mesh is built on Kubernetes CRDs (Custom Resource Definitions). Details are available at https://chaos-mesh.org/docs/#architecture-overview
Choose the scope of permissions
If you want to give the account the appropriate permissions for all chaos experiments in the cluster, tick the Cluster scoped checkbox. If you specify a namespace in the Namespace dropdown, the account will only have permissions in the specified namespace.
In summary, there are two options to choose from:
- Cluster scoped: the account has permissions for all chaos experiments in the cluster.
- Namespace scoped: the account has permissions for all chaos experiments in the specified namespace.
More info is available at https://chaos-mesh.org/docs/manage-user-permissions/
Token used by Chaos Mesh to run the experiments
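The account behind such a token is ordinary Kubernetes RBAC. As a rough sketch of the namespace-scoped option, the manifest below creates a service account that can manage chaos experiments only in chaos-k8s. All resource names here are hypothetical, and the exact rules the dashboard generates for you may differ; for the cluster-scoped option, the Role and RoleBinding would become a ClusterRole and ClusterRoleBinding.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaos-manager            # hypothetical account name
  namespace: chaos-k8s
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chaos-manager-role       # hypothetical role name
  namespace: chaos-k8s
rules:
  # Read access to pods so experiments can select their targets
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
  # Full management of Chaos Mesh resources in this namespace
  - apiGroups: ["chaos-mesh.org"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "delete", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: chaos-manager-binding    # hypothetical binding name
  namespace: chaos-k8s
subjects:
  - kind: ServiceAccount
    name: chaos-manager
    namespace: chaos-k8s
roleRef:
  kind: Role
  name: chaos-manager-role
  apiGroup: rbac.authorization.k8s.io
```

A token issued for this service account is what gets pasted into the dashboard login form.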
Sample code from
https://medium.com/nerd-for-tech/chaos-engineering-in-kubernetes-using-chaos-mesh-431c1587ef0a
kubectl create ns chaos-k8s
kubectl config set-context --current --namespace=chaos-k8s
kubectl create deploy httpd --image=httpd --replicas=2
kubectl expose deploy --port=80 httpd
kubectl create deploy nginx --image=nginx --replicas=2
We can create an experiment in the UI, or apply the manifest below with kubectl apply -f.
One-time Chaos Mesh experiment
kind: PodChaos
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: chaos-k8s
  name: pod-fails
spec:
  selector:
    namespaces:
      - chaos-k8s
    labelSelectors:
      app: httpd
  mode: all
  action: pod-failure
  duration: 5m
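kind - what kind of chaos to inject
selector - where the chaos applies
namespaces - which namespaces to target
labelSelectors - which pods to target, matched by label
mode - how many of the matched pods are affected (all of them here)
action - what kind of failure to inject
duration - how long the experiment should run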
Verification
kubectl get pods (note the name of one of the nginx pods)
kubectl exec -it {podid} -- /bin/sh
# curl httpd (while experiment running)
curl: (7) Failed to connect to httpd port 80 after 38 ms: Couldn't connect to server
# curl httpd (when experiment stopped)
<html><body><h1>It works!</h1></body></html>
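The selector, mode, and action fields can be varied to change the blast radius. As a sketch under the same assumptions (the chaos-k8s namespace and the httpd deployment with two replicas), the variant below kills half of the matched pods once instead of failing all of them for five minutes. The experiment name is hypothetical.

```yaml
kind: PodChaos
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: chaos-k8s
  name: pod-kill-half        # hypothetical name
spec:
  selector:
    namespaces:
      - chaos-k8s
    labelSelectors:
      app: httpd
  mode: fixed-percent        # pick a percentage of the matched pods
  value: '50'                # 50% of two replicas, i.e. one pod
  action: pod-kill           # one-shot kill, so no duration is set
```

With one replica left running, curl httpd from the nginx pod should keep succeeding while the Deployment recreates the killed pod, which is a useful contrast to the pod-failure run.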
Scheduled or Cyclic Chaos experiments
kind: Schedule
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: chaos-k8s
  name: http-fail-scheduled
spec:
  schedule: '* * * * *'
  startingDeadlineSeconds: null
  concurrencyPolicy: Forbid
  historyLimit: 1
  type: HTTPChaos
  httpChaos:
    selector:
      namespaces:
        - chaos-k8s
      labelSelectors:
        app: httpd
    mode: all
    target: Response
    delay: 10s
    port: 80
    path: '*'
    method: GET
    duration: 50s
Verification
kubectl get pods (note the name of one of the nginx pods)
kubectl exec -it {podid} -- /bin/sh
# curl httpd -s -o /dev/null -w "%{time_starttransfer}\n"
10.005104 (while experiment running)
# curl httpd -s -o /dev/null -w "%{time_starttransfer}\n"
0.001722 (after experiment run)
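A Schedule can wrap other chaos kinds in the same way: the type field names the kind, and a matching block holds its spec. The sketch below, again assuming the chaos-k8s namespace and the httpd pods, applies CPU stress to one pod for a minute every five minutes, which is one way to explore the response-time question raised for StressChaos. The name and the load figures are hypothetical.

```yaml
kind: Schedule
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: chaos-k8s
  name: cpu-stress-scheduled   # hypothetical name
spec:
  schedule: '*/5 * * * *'      # cron expression: every five minutes
  concurrencyPolicy: Forbid    # skip a run if the previous one is still active
  historyLimit: 3              # keep the last three runs for inspection
  type: StressChaos
  stressChaos:
    selector:
      namespaces:
        - chaos-k8s
      labelSelectors:
        app: httpd
    mode: one                  # stress a single matched pod per run
    stressors:
      cpu:
        workers: 1             # number of stress workers
        load: 80               # target CPU load per worker, in percent
    duration: 1m               # each run ends well before the next one starts
```

Running the same curl timing command during and between runs gives a rough picture of how much the stress affects latency.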
Chaos Mesh workflow
A Workflow lets you inject a series of faults as a single orchestrated experiment.
Currently, Chaos Mesh Workflow supports the following features:
- Serial Orchestration
- Parallel Orchestration
- Customized tasks
- Conditional branch
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: parallel-workflow
  namespace: chaos-k8s
spec:
  entry: entry
  templates:
    - name: entry
      templateType: Serial
      children:
        - work-flow
        - httpd-pod-fail
    - name: work-flow
      templateType: Parallel
    - name: httpd-pod-fail
      templateType: PodChaos
      deadline: 5m
      podChaos:
        action: pod-failure
        selector:
          namespaces:
            - chaos-k8s
          labelSelectors:
            app: httpd
        mode: all
Verification
kubectl get po
NAME                     READY   STATUS    RESTARTS        AGE
httpd-5c98f79dfc-g86ds   1/1     Running   2 (3m37s ago)   13m
httpd-5c98f79dfc-km68s   1/1     Running   2 (3m37s ago)   13m
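The Parallel template in the workflow above lists no children, so it appears to have nothing to run before the pod failure step. As a sketch of what parallel orchestration could look like, the workflow below runs network packet loss and CPU stress against the httpd pods at the same time. The template names, deadlines, and fault values are all hypothetical.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: parallel-faults-sketch   # hypothetical name
  namespace: chaos-k8s
spec:
  entry: entry
  templates:
    - name: entry
      templateType: Parallel     # children start together
      deadline: 3m               # upper bound for the whole parallel step
      children:
        - httpd-network-loss
        - httpd-cpu-stress
    - name: httpd-network-loss
      templateType: NetworkChaos
      deadline: 2m
      networkChaos:
        action: loss
        mode: all
        selector:
          namespaces:
            - chaos-k8s
          labelSelectors:
            app: httpd
        loss:
          loss: '25'             # drop roughly a quarter of the packets
    - name: httpd-cpu-stress
      templateType: StressChaos
      deadline: 2m
      stressChaos:
        mode: all
        selector:
          namespaces:
            - chaos-k8s
          labelSelectors:
            app: httpd
        stressors:
          cpu:
            workers: 1
            load: 80
```

To combine both styles, this Parallel template could itself be listed as a child of a Serial entry, followed by a PodChaos template like httpd-pod-fail.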