posted 2021-08-07 by Thomas Kooi
Kubernetes Autoscaling, part 2
Tags: hpa, linkerd, ingress-nginx
In the first part of this post, we looked at autoscaling using the metrics server. In part two, we will look at using custom metrics, specifically those from linkerd and ingress-nginx, to autoscale based on latency and requests per second.
Linkerd is a lightweight service mesh and a CNCF project. It can give you instant, platform-wide metrics such as success rates, latency, and many other traffic-related metrics, without requiring any changes to your code.
For the first part of our autoscaling setup, we will use the metrics exposed by the linkerd proxy to scale our deployment up or down. For this to work, you need both linkerd and its viz extension installed:
linkerd install | kubectl apply -f -
linkerd viz install | kubectl apply -f -
See the linkerd Getting Started documentation for detailed installation instructions.
The prometheus-adapter project implements the Kubernetes custom metrics APIService (custom.metrics.k8s.io) and backs it with a Prometheus instance. The configuration below (in the format of the adapter's Helm chart values) points it at the linkerd-viz Prometheus and registers a rule that turns the proxy's latency histogram into a 99th-percentile metric:
prometheus:
  url: http://prometheus.linkerd-viz.svc
  port: 9090
  path: ""

rules:
  custom:
  - seriesQuery: 'response_latency_ms_bucket{namespace!="",pod!=""}'
    resources:
      template: '<<.Resource>>'
    name:
      matches: '^(.*)_bucket$'
      as: "${1}_99th"
    metricsQuery: 'histogram_quantile(0.99, sum(irate(<<.Series>>{<<.LabelMatchers>>, direction="inbound", deployment!="", namespace!=""}[5m])) by (le, <<.GroupBy>>))'
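The metricsQuery relies on Prometheus' histogram_quantile(), which estimates a quantile from cumulative _bucket counters by linearly interpolating inside the bucket where the target rank falls. A minimal Python sketch of the idea (the bucket bounds and counts are made up for illustration):

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets,
    roughly mirroring Prometheus' histogram_quantile interpolation."""
    buckets = sorted(buckets, key=lambda b: b[0])
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # quantile falls in the +Inf bucket
            # linear interpolation inside the bucket that crosses the rank
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# hypothetical latency buckets: (upper bound in ms, cumulative count)
buckets = [(10, 400), (50, 900), (100, 990), (math.inf, 1000)]
p99 = histogram_quantile(0.99, buckets)  # rank 990 lands at the 100ms bound
```

The real function also handles edge cases (non-monotonic buckets, missing +Inf), but the interpolation step is the core of how a raw counter histogram becomes the response_latency_ms_99th gauge the HPA will consume.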
apiVersion: v1
kind: Namespace
metadata:
  name: scalingtest
  annotations:
    linkerd.io/inject: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: sampleapp
  template:
    metadata:
      labels:
        app: sampleapp
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: sampleapp
        image: nginx
        resources:
          requests:
            cpu: 100m
          limits:
            memory: "128Mi"
            cpu: "600m"
        ports:
        - containerPort: 80
          name: http
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
        startupProbe:
          httpGet:
            path: /
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  selector:
    app: sampleapp
  ports:
  - port: 80
    targetPort: 80
We apply example.yaml to the cluster:
$ kubectl apply -f example.yaml
namespace/scalingtest created
deployment.apps/sampleapp created
service/sampleapp created
$ kubectl get pod -n scalingtest
NAME READY STATUS RESTARTS AGE
sampleapp-7db7fdcd9d-95czg 2/2 Running 0 4s
Next, we deploy a load generator. For this, we use the slow_cooker project from Buoyant, the people behind linkerd.
kubectl run load-generator -n scalingtest --image=buoyantio/slow_cooker -- -qps 100 -concurrency 10 http://sampleapp
This will generate load against the deployed nginx (slow_cooker's -qps flag is per connection, so 10 concurrent connections at 100 qps each targets roughly 1,000 rps in total). You can follow the logs of the load-generator pod to view various metrics, such as latency.
If you wait a minute or so and run kubectl top pod -n scalingtest, you will notice that the CPU usage of the nginx pod has risen:
$ kubectl top pod -n scalingtest
NAME CPU(cores) MEMORY(bytes)
load-generator 128m 5Mi
sampleapp-7db7fdcd9d-95czg 79m 2Mi
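Values like 79m (millicores) and 2Mi (mebibytes) are Kubernetes resource quantities, and the same notation appears again in the HPA's averageValue below. A rough sketch of how the common suffixes decode (this handles only a few suffixes, not the full quantity grammar):

```python
# Partial decoder for Kubernetes quantity strings; the real parser
# also supports decimal suffixes (k, M, G) and exponent notation.
SUFFIXES = {
    "m": 1e-3,                               # milli: 100m CPU = 0.1 cores
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,   # binary suffixes, used for memory
}

def parse_quantity(s):
    for suffix, factor in SUFFIXES.items():
        if s.endswith(suffix):
            return float(s[: -len(suffix)]) * factor
    return float(s)

cpu = parse_quantity("79m")    # 0.079 cores
mem = parse_quantity("128Mi")  # 134217728 bytes
```

The "m" suffix is unit-agnostic: for CPU it means millicores, while for a custom metric it is simply one-thousandth of whatever unit the metric is in.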
We will now configure an HPA policy:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sampleapp
  minReplicas: 1
  maxReplicas: 20
  metrics:
  # Scale based on request latency, a linkerd-proxy metric
  - type: Object
    object:
      metric:
        name: response_latency_ms_99th
      describedObject:
        apiVersion: apps/v1
        kind: Deployment
        name: sampleapp
      target:
        type: AverageValue
        averageValue: 1000000m # 1000000m = 1000; the metric is in ms, so this targets 1s
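At its core, the HPA controller computes desiredReplicas = ceil(currentReplicas × currentValue / targetValue), skipping the change when the ratio is within a tolerance band (10% by default) to avoid flapping, and clamping the result between minReplicas and maxReplicas. A simplified sketch with made-up numbers, assuming an observed p99 latency of 2500ms against our 1000ms target:

```python
import math

def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Core HPA scaling rule (simplified; the real controller also applies
    stabilization windows, readiness checks, and min/max replica clamping)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: leave the replica count alone
    return math.ceil(current_replicas * ratio)

# hypothetical: 2 replicas, observed p99 of 2500ms vs the 1000ms target
replicas = desired_replicas(2, 2500, 1000)  # ceil(2 * 2.5) = 5
```

Because the result is clamped to maxReplicas, a sustained latency spike can scale this deployment no further than the 20 replicas configured above.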