A short howto
What do we mean by “Controlling a Hetzner Failover IP from Kubernetes”? We mean a Pod that checks whether the failover IP responds to pings and, if it doesn’t, triggers a failover of the IP to another host that should then receive the failover IP’s traffic.
Let’s do this!
Create the docker image for the failover-ip-manager pod:
$ cat Dockerfile
# Usage:
#
# docker run --volume config:/heartbeat/config \
#            --volume logs:/heartbeat/log \
#            -e HEARTBEAT_LOG=STDOUT
# https://hub.docker.com/_/ruby/
FROM ruby:3.0-alpine
MAINTAINER "Tomáš Pospíšek" <tpo_deb@sourcepole.ch>
RUN echo "force update 0" && \
apk update && \
apk upgrade && \
apk --update add git
RUN git clone https://github.com/mrkamel/heartbeat
WORKDIR heartbeat
# install heartbeat dependencies
RUN bundle
COPY heartbeat-api-health-check /
ENTRYPOINT ["sh", "-c"]
CMD ["cd /heartbeat && bin/heartbeat"]
We are using Benjamin Vetter’s nice heartbeat application to do the IP monitoring and the failover for us.
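To make the idea concrete, the "find a host that should take over" step can be sketched in shell. This is only an illustration of the concept, not heartbeat's actual implementation: the function names `is_up` and `pick_target` are hypothetical, and the ping flags assume a Linux ping (iputils or BusyBox).

```shell
#!/bin/sh
# Hypothetical sketch of the monitoring idea; not heartbeat's real code.

# is_up IP: succeed if IP answers a single ping within one second.
# Assumes Linux ping semantics for -c (count) and -W (timeout in seconds).
is_up() {
  ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# pick_target CHECK IP...: print the first IP for which the command
# CHECK succeeds; return non-zero if none of them does.
pick_target() {
  check=$1
  shift
  for ip in "$@"; do
    if "$check" "$ip"; then
      echo "$ip"
      return 0
    fi
  done
  return 1
}
```

Calling `pick_target is_up 1.1.1.1 2.2.2.2` would print the first address that answers a ping. Passing the check in as a parameter keeps the selection logic testable without network access.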
Next let’s create the k8s deployment.
One problem with Hetzner’s failover API is that you first have to register all the IPs that are allowed to access Hetzner’s Robot API. The latter is used to trigger the IP failover.
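One way to see whether the current source IP is allowed in is a read-only query against the Robot webservice. The following is a minimal sketch, assuming `curl` is available and that any answer other than HTTP 200 should count as "no access". The helper names `robot_status` and `access_ok` are hypothetical, and the credentials are placeholders. The setup in this post lets heartbeat itself perform that check; the sketch only illustrates the idea.

```shell
#!/bin/sh
# Hypothetical helpers; the names are not part of heartbeat or of Hetzner's tooling.

# robot_status USER PASSWORD FAILOVER_IP: print the HTTP status code of a
# read-only GET on the failover endpoint of the Robot webservice.
robot_status() {
  curl -s -o /dev/null -w '%{http_code}' -u "$1:$2" \
    "https://robot-ws.your-server.de/failover/$3"
}

# access_ok STATUS: succeed only if STATUS is 200.
# Assumption: every other status is treated as "cannot use the API from here".
access_ok() {
  [ "$1" = "200" ]
}
```

A call such as `access_ok "$(robot_status username password 0.0.0.0)"` would then succeed only when the API answers normally from the current IP.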
Kubernetes clusters are inherently dynamic, so it might happen that the pod that controls the IP gets restarted on a different node with a different IP, or that a new node gets created with a different IP… Thus we need to make sure that:
- the failover-ip-manager pod won’t switch nodes randomly:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: failover-ip-manager
  namespace: [...]
  labels:
    app: failover-ip-manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: failover-ip-manager
  template:
    metadata:
      name: failover-ip-manager
      labels:
        app: failover-ip-manager
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - THE_NODE_WHERE_YOU_WANT_TO_PIN_THE_POD_TO
- we notice if the failover-ip-manager pod, for whatever reason, gets a different IP that maybe isn’t registered with Hetzner…
So let’s make a readinessProbe for k8s that will periodically check whether we can access the Hetzner Robot API:
    spec:
      affinity: [...]
      containers:
      - image: YOUR_IMAGE_HERE
        name: failover-ip-manager
        env:
        - name: HEARTBEAT_LOG
          value: "STDOUT"
        volumeMounts:
        - name: heartbeat-config
          mountPath: /heartbeat/config/heartbeat.yml
          # mount single file
          subPath: heartbeat.yml
        - name: heartbeat-api-health-check-config
          mountPath: /heartbeat/heartbeat-api-health-check.yml
          # mount single file
          subPath: heartbeat-api-health-check.yml
        resources:
          requests:
            memory: 45M # this should suffice
        readinessProbe:
          failureThreshold: 1
          exec:
            command:
            - /heartbeat-api-health-check
          # periodSeconds seems to be broken (see the issue linked in the
          # heartbeat-api-health-check script), so the delay is implemented
          # in heartbeat-api-health-check instead
          #periodSeconds: 21600 # every 6h
          periodSeconds: 10
          timeoutSeconds: 10
And have a script that implements the readinessProbe (it gets installed above via the Dockerfile):
#!/bin/sh
#
# This is meant as a readinessProbe for k8s. It
# checks whether heartbeat can access the
# Hetzner Failover API
#
# We only want to check once every 6 hours whether the API is available.
# This should be done via a k8s readinessProbe, which however doesn't
# work, see https://github.com/kubernetes/kubernetes/issues/99979
# below is a workaround for the readinessProbe bug, in that we implement
# the 6h interval here

# interval in minutes (6h), as expected by find -mmin
check_interval=360

cd /heartbeat

check_and_touch_last_check_result() {
  # you want to mount heartbeat-api-health-check.yml into the image!
  if HEARTBEAT_LOG=STDOUT bin/heartbeat --config heartbeat-api-health-check.yml |
     grep -q "Unable to retrieve the active server ip"; then
    echo 1 > /tmp/last_check_result
    exit 1
  else
    echo 0 > /tmp/last_check_result
    exit 0
  fi
}

if [ -e /tmp/last_check_result ]; then
  # plain [ ... ] instead of [[ ... ]], since the script runs under /bin/sh
  if [ -n "$(find /tmp/last_check_result -mmin +$check_interval -print)" ]; then
    # the cached result is older than the interval: check again
    check_and_touch_last_check_result
  else
    # reuse the cached result
    exit "$(cat /tmp/last_check_result)"
  fi
else
  check_and_touch_last_check_result
fi
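The core of this workaround is the "is the cached result older than N minutes?" test. Isolated, it might look like the sketch below. The function name `cache_is_stale` is hypothetical, and `find -mmin` is assumed to be available (it is not part of POSIX, but GNU and BusyBox find both provide it).

```shell
#!/bin/sh
# Hypothetical helper that isolates the staleness test of the probe script.

# cache_is_stale FILE MINUTES: succeed if FILE does not exist or was last
# modified more than MINUTES minutes ago.
cache_is_stale() {
  if [ ! -e "$1" ]; then
    return 0
  fi
  # find prints the path only if the file is older than the given age
  [ -n "$(find "$1" -mmin +"$2" -print)" ]
}
```

With such a helper, the probe would rerun the expensive check only when `cache_is_stale /tmp/last_check_result 360` succeeds, and otherwise exit with the cached result.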
And finally prepare the heartbeat-api-health-check.yml config file for heartbeat. It makes heartbeat ping an unreachable IP address once and then access Hetzner’s Robot API. If that access is not allowed, the readinessProbe fails and the Pod gets marked as “Ready: False”, which you can get alerted on via your preferred monitoring tool, or maybe just check manually:
base_url: https://robot-ws.your-server.de
basic_auth:
  username: username
  password: password
failover_ip: 0.0.0.0
ping_ip: 0.0.0.1 # invalid IP!!!
ips:
  - ping: 1.1.1.1
    target: 1.1.1.1
  - ping: 2.2.2.2
    target: 2.2.2.2
timeout: 1
tries: 1
only_once: true
dry: true
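For the manual check, the Ready condition can be read with kubectl. The following is a small sketch, assuming kubectl is configured for the cluster and the pod carries the `app: failover-ip-manager` label from the deployment above. The function names `pod_ready_status` and `alert_needed` are hypothetical, as is the namespace in the usage line.

```shell
#!/bin/sh
# Hypothetical helpers for checking the pod's Ready condition by hand.

# pod_ready_status NAMESPACE: print the status of the Ready condition
# ("True" or "False") of the failover-ip-manager pod in NAMESPACE.
pod_ready_status() {
  kubectl get pods -n "$1" -l app=failover-ip-manager \
    -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}'
}

# alert_needed STATUS: succeed unless STATUS is exactly "True", so a
# missing pod (empty output) also counts as a problem.
alert_needed() {
  [ "$1" != "True" ]
}
```

A cron job or a monitoring check could then run something like `alert_needed "$(pod_ready_status my-namespace)" && echo "failover-ip-manager is not Ready"`. With `replicas: 1` the jsonpath query yields a single value.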