Auto scaling in Kubernetes (Part 2)

Using linkerd for auto scaling

posted 2021-08-07 by Thomas Kooi

Kubernetes Autoscaling hpa linkerd ingress-nginx

In the first part of this post, we looked at auto scaling using the metrics server. In part two, we will look at custom metrics, specifically those exposed by linkerd and ingress-nginx, to auto scale based on latency and requests per second.

Auto scaling with linkerd

Linkerd is a lightweight service mesh and a CNCF project. It can give you instant, platform-wide metrics for things such as success rates, latency, and many other traffic-related metrics, without having to change your code.

For the first part of our auto scaling setup, we will use the metrics exposed by the linkerd proxy to scale our deployment up or down. For this to work, you need to have both linkerd and the viz extension installed.

linkerd install | kubectl apply -f -
linkerd viz install | kubectl apply -f -

See the linkerd getting started documentation for detailed instructions on how to install linkerd.
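
After installing, linkerd's built-in health checks can confirm that both the control plane and the viz extension came up correctly:

```shell
# Verify the linkerd control plane is healthy
linkerd check

# Verify the viz extension (including its bundled Prometheus) is healthy
linkerd viz check
```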

Setting up the adapter

The Prometheus adapter is a project that implements the Kubernetes custom metrics API (custom.metrics.k8s.io) and serves metrics from a Prometheus instance. The configuration below points it at the Prometheus that ships with linkerd-viz and adds a rule that turns the proxy's response_latency_ms_bucket histogram into a 99th-percentile latency metric:

prometheus:
  url: http://prometheus.linkerd-viz.svc
  port: 9090
  path: ""

rules:
  custom:
    - seriesQuery: 'response_latency_ms_bucket{namespace!="",pod!=""}'
      resources:
        template: '<<.Resource>>'
      name:
        matches: '^(.*)_bucket$'
        as: "${1}_99th"
      metricsQuery: 'histogram_quantile(0.99, sum(irate(<<.Series>>{<<.LabelMatchers>>, direction="inbound", deployment!="", namespace!=""}[5m])) by (le, <<.GroupBy>>))'
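
One way to deploy the adapter is with its Helm chart, assuming the values above are saved as values.yaml (the release and namespace names here are just examples):

```shell
# Add the chart repository that hosts prometheus-adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the adapter with the custom rules from values.yaml
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring --create-namespace \
  -f values.yaml

# Once the adapter is running, the custom metrics API should respond
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
```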

Example

apiVersion: v1
kind: Namespace
metadata:
  name: scalingtest
  annotations:
    linkerd.io/inject: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: sampleapp
  template:
    metadata:
      labels:
        app: sampleapp
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: sampleapp
        image: nginx
        resources:
          requests:
            cpu: 100m
          limits:
            memory: "128Mi"
            cpu: "600m"
        ports:
        - containerPort: 80
          name: http
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
        startupProbe:
          httpGet:
            path: /
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  selector:
    app: sampleapp
  ports:
  - port: 80
    targetPort: 80

We save this as example.yaml and apply it to the cluster. The commands that follow assume scalingtest is the current namespace (e.g. via kubectl config set-context --current --namespace scalingtest):

$ kubectl apply -f example.yaml 
namespace/scalingtest created
deployment.apps/sampleapp created
service/sampleapp created

$ kubectl get pod
NAME                         READY   STATUS        RESTARTS   AGE
sampleapp-7db7fdcd9d-95czg   2/2     Running       0          4s

Next, we deploy a load generator. For this, we make use of the slow_cooker project from Buoyant, the people behind linkerd.

kubectl run load-generator --image=buoyantio/slow_cooker -- -qps 100 -concurrency 10 http://sampleapp

This will generate up to 1,000 rps against the deployed nginx (slow_cooker's -qps flag is per connection, so 100 qps across 10 concurrent connections). You can follow the logs of the load-generator pod to view various metrics, such as latency.
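
For example, to tail the load generator's periodic latency summaries:

```shell
# slow_cooker periodically prints request counts and latency percentiles
kubectl logs -f load-generator
```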

If you wait a minute or so and run kubectl top pod, you will notice that the CPU usage of the nginx pod has risen.

$ kubectl top pod
NAME                        CPU(cores)   MEMORY(bytes)   
load-generator              128m         5Mi             
sampleapp-7db7fdcd9d-95czg  79m          2Mi         
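
You can also confirm that the linkerd proxy is recording traffic metrics for the deployment, since these are what the autoscaler will consume:

```shell
# Success rate, rps, and latency percentiles as seen by the linkerd proxy
linkerd viz stat deploy/sampleapp -n scalingtest
```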

We will now configure an HPA policy:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta2
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sampleapp
  minReplicas: 1
  maxReplicas: 20
  metrics:
  # Scale based on request latency, linkerd-proxy metric
  - type: Object
    object:
      metric:
        name: response_latency_ms_99th
      describedObject:
        apiVersion: apps/v1
        kind: Deployment
        name: sampleapp
      target:
        type: AverageValue
        averageValue: 1000000m # 1,000,000 milli-units = 1000ms = 1s
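
Once the HPA is applied, you can check that the adapter is serving the metric and watch the autoscaler react. The API path below follows the custom metrics convention for Object metrics on a Deployment:

```shell
# Read the 99th-percentile latency metric the HPA is targeting
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/scalingtest/deployments.apps/sampleapp/response_latency_ms_99th"

# Watch replica counts change as latency crosses the target
kubectl get hpa sampleapp -n scalingtest --watch
```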