JUN 15TH, 2021

Controlling a Hetzner Failover IP from Kubernetes

A short how-to

What do we mean by “Controlling a Hetzner Failover IP from Kubernetes”? We run a Pod that checks whether the failover IP answers pings and, if it doesn't, triggers a failover: the failover IP is rerouted to another host, which then receives the traffic sent to that IP.
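To make the idea concrete, here is a minimal shell sketch of such a check-and-failover step. It is not how heartbeat is implemented (heartbeat, for instance, works with a list of candidate servers, the `ips` list in its config); it only illustrates the principle. It assumes the Robot webservice call `POST /failover/<failover-ip>` with the form parameter `active_server_ip`, and uses the hypothetical variables `ROBOT_USER` and `ROBOT_PASSWORD` for your Robot webservice credentials.

```shell
# Minimal sketch of the check-and-failover idea, NOT heartbeat's code.
# Assumptions: Robot webservice call "POST /failover/<failover-ip>" with the
# form parameter active_server_ip; ROBOT_USER / ROBOT_PASSWORD are
# hypothetical variables holding your Robot webservice credentials.
ROBOT_URL=${ROBOT_URL:-https://robot-ws.your-server.de}

# Succeeds if the given IP answers a single ping within one second.
is_reachable() {
  ping -c 1 -W 1 "$1" > /dev/null 2>&1
}

# Asks the Robot API to route failover IP $1 to the server with main IP $2.
# (curl -d sends the parameter as a form-encoded POST body.)
switch_failover_ip() {
  curl -s -u "$ROBOT_USER:$ROBOT_PASSWORD" \
    -d "active_server_ip=$2" \
    "$ROBOT_URL/failover/$1"
}

# Checks failover IP $1 and moves it to server $2 if it doesn't answer.
check_and_failover() {
  if is_reachable "$1"; then
    echo "ok: $1 is reachable"
  else
    echo "failover: switching $1 to $2"
    switch_failover_ip "$1" "$2"
  fi
}
```

Running, say, `check_and_failover 203.0.113.10 198.51.100.2` periodically would move the (made-up) failover IP 203.0.113.10 to the server 198.51.100.2 as soon as it stops answering pings.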

Let’s do this!

First, create the Docker image for the failover-ip-manager pod:

$ cat Dockerfile
# Usage:
#
#     docker run --volume config:/heartbeat/config \
#                --volume logs:/heartbeat/log \
#                -e HEARTBEAT_LOG=STDOUT \
#                YOUR_IMAGE_HERE


# https://hub.docker.com/_/ruby/
FROM ruby:3.0-alpine

LABEL maintainer="Tomáš Pospíšek <tpo_deb@sourcepole.ch>"

# change the number in the echo to invalidate the build cache
# and re-run the package updates
RUN echo "force update 0" && \
    apk update && \
    apk upgrade && \
    apk add git

RUN git clone https://github.com/mrkamel/heartbeat /heartbeat

WORKDIR /heartbeat

# install heartbeat dependencies
RUN bundle install

# the readinessProbe script; it must be executable in the build context
COPY heartbeat-api-health-check /

ENTRYPOINT ["sh", "-c"]
CMD ["cd /heartbeat && bin/heartbeat"]

We are using Benjamin Vetter’s nice heartbeat application to do the IP monitoring and the failover for us.

Next, let’s create the k8s Deployment.

One complication: the IP failover is triggered via Hetzner’s Robot API, and only IPs that you have registered with Hetzner beforehand are allowed to access that API.
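To find out whether the host you are on is currently allowed to use the Robot API, a quick check along these lines may help. This is a sketch that assumes the Robot webservice call `GET /failover/<failover-ip>` and treats any HTTP status other than 200 (including curl's `000` when no connection could be made) as "not allowed or not working"; `ROBOT_USER`, `ROBOT_PASSWORD` and the function names are hypothetical.

```shell
# Sketch: can this host talk to the Hetzner Robot API?
# Assumptions: "GET /failover/<failover-ip>" on the Robot webservice;
# ROBOT_USER / ROBOT_PASSWORD are hypothetical credential variables.
ROBOT_URL=${ROBOT_URL:-https://robot-ws.your-server.de}

# Prints only the HTTP status code returned for GET /failover/$1.
robot_api_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -u "$ROBOT_USER:$ROBOT_PASSWORD" \
    "$ROBOT_URL/failover/$1"
}

# Succeeds only if the Robot API answered with HTTP 200.
robot_api_allowed() {
  [ "$(robot_api_status "$1")" = "200" ]
}
```

Usage: `robot_api_allowed 203.0.113.10 && echo allowed || echo "not allowed, is this host's IP registered?"`.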

Kubernetes clusters are inherently dynamic: the pod that controls the IP might get restarted on a different node with a different IP, or a new node with a different IP might get created… Thus we need to make sure that:

1. the failover-ip-manager pod won’t switch nodes randomly, by pinning it to one node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: failover-ip-manager
  namespace: [...]
  labels:
    app: failover-ip-manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: failover-ip-manager
  template:
    metadata:
      name: failover-ip-manager
      labels:
        app: failover-ip-manager
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - THE_NODE_WHERE_YOU_WANT_TO_PIN_THE_POD_TO

2. we notice if the failover-ip-manager pod, for whatever reason, ends up with a different IP that might not be registered with Hetzner.

To detect that, let’s add a readinessProbe that periodically checks whether we can access the Hetzner Robot API:

 spec:
   affinity:
     [...]
   containers:
   - image: YOUR_IMAGE_HERE
     name: failover-ip-manager
     env:
       - name: HEARTBEAT_LOG
         value: "STDOUT"
     volumeMounts:
     - name: heartbeat-config
       mountPath: /heartbeat/config/heartbeat.yml
       # mount single file
       subPath: heartbeat.yml
     - name: heartbeat-api-health-check-config
       mountPath: /heartbeat/heartbeat-api-health-check.yml
       # mount single file
       subPath: heartbeat-api-health-check.yml
     resources:
       requests:
         memory: 45M # this should suffice
     readinessProbe:
       failureThreshold: 1
       exec:
         command:
         - /heartbeat-api-health-check
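       # periodSeconds seems to be broken, see
       # https://github.com/kubernetes/kubernetes/issues/99979
       # the 6h delay is implemented in heartbeat-api-health-check instead
       #periodSeconds: 21600 # every 6h
       periodSeconds: 10
       timeoutSeconds: 10
   # assumption: both config files live in Secrets (they contain the
   # Robot credentials); the Secret names here are hypothetical
   volumes:
   - name: heartbeat-config
     secret:
       secretName: heartbeat-config
   - name: heartbeat-api-health-check-config
     secret:
       secretName: heartbeat-api-health-check-config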
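Which IP has to be registered with Hetzner? Assuming the pod's outgoing traffic leaves with its node's address (typical when pod traffic to the internet is SNATed to the node, but not guaranteed for every network setup), it is the IP of the node the pod is pinned to. Here is a sketch to look that up with kubectl. `NAMESPACE` is a placeholder, and whether your public IP shows up as the node's `InternalIP` or `ExternalIP` depends on how the cluster was set up, hence the hypothetical `ADDR_TYPE` knob.

```shell
# Sketch: print the IP of the node the failover-ip-manager pod runs on.
# Assumptions: NAMESPACE is a placeholder for your namespace, and the
# node's public IP is reported under address type ADDR_TYPE (InternalIP
# by default here; use ExternalIP if that is how your nodes report it).
NAMESPACE=${NAMESPACE:-default}
ADDR_TYPE=${ADDR_TYPE:-InternalIP}

# Name of the node the (single) failover-ip-manager pod is scheduled on.
manager_node() {
  kubectl get pod -n "$NAMESPACE" -l app=failover-ip-manager \
    -o jsonpath='{.items[0].spec.nodeName}'
}

# Address of type ADDR_TYPE of node $1.
node_ip() {
  kubectl get node "$1" \
    -o jsonpath="{.status.addresses[?(@.type==\"$ADDR_TYPE\")].address}"
}

manager_node_ip() {
  node=$(manager_node) || return 1
  node_ip "$node"
}
```

Compare the output of `manager_node_ip` with the IPs you have registered for the Robot API.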

The readinessProbe is implemented by the following script, which the Dockerfile above copies into the image as /heartbeat-api-health-check:

#!/bin/sh
#
# This is meant as a readinessProbe for k8s. It
# checks whether heartbeat can access the
# Hetzner Failover API.
#

# We only want to check once every 6 hours whether the API is available.
# This should be done via the readinessProbe's periodSeconds, which however
# doesn't work, see https://github.com/kubernetes/kubernetes/issues/99979
# As a workaround for that bug we implement the 6h interval here and
# return the cached result of the last check in between.
check_interval=360 # minutes, i.e. 6h

cd /heartbeat || exit 1

check_and_touch_last_check_result() {
  # you need to mount heartbeat-api-health-check.yml into the container!
  if HEARTBEAT_LOG=STDOUT bin/heartbeat --config heartbeat-api-health-check.yml 2>&1 | grep -q "Unable to retrieve the active server ip"; then
    echo 1 > /tmp/last_check_result
    exit 1
  else
    echo 0 > /tmp/last_check_result
    exit 0
  fi
}

# re-check if there is no cached result yet or if it is older than
# check_interval minutes, otherwise return the cached result
if [ -e /tmp/last_check_result ]; then
  if [ -n "$(find /tmp/last_check_result -mmin +"$check_interval")" ]; then
    check_and_touch_last_check_result
  else
    exit "$(cat /tmp/last_check_result)"
  fi
else
  check_and_touch_last_check_result
fi
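The caching logic in the script boils down to the question "is the result file missing or older than `check_interval` minutes?". Pulled out into a separate function (a hypothetical refactoring, not part of the script above) it can be tried out on its own:

```shell
# Sketch: decide whether the cached probe result has to be refreshed.
# needs_recheck FILE MINUTES succeeds if FILE is missing or was last
# modified more than MINUTES minutes ago (the same find -mmin test as
# in heartbeat-api-health-check).
needs_recheck() {
  [ ! -e "$1" ] || [ -n "$(find "$1" -mmin +"$2")" ]
}
```

For example, `needs_recheck /tmp/last_check_result 360 && echo "heartbeat would be run now"`.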

Finally, prepare the heartbeat-api-health-check.yml config file for heartbeat. With it, heartbeat pings an unreachable IP address once and then accesses Hetzner’s Robot API. If our IP is not allowed to access the API, the readinessProbe fails and the Pod is marked as “Ready: False”. You can get alerted on that via your preferred monitoring tool, or just check it manually:

base_url: https://robot-ws.your-server.de
basic_auth:
  username: username
  password: password
failover_ip: 0.0.0.0

ping_ip: 0.0.0.1 # invalid IP!!!

ips:
  - ping: 1.1.1.1
    target: 1.1.1.1
  - ping: 2.2.2.2
    target: 2.2.2.2

timeout: 1
tries: 1
only_once: true
dry: true
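For checking manually (or from a cron job), a small helper along these lines reads the pod's `Ready` condition with kubectl. It is a sketch: `NAMESPACE` is a placeholder, and the label `app=failover-ip-manager` is the one from the Deployment above.

```shell
# Sketch: print the Ready condition ("True"/"False") of the
# failover-ip-manager pod. NAMESPACE is a placeholder for your namespace.
NAMESPACE=${NAMESPACE:-default}

manager_ready() {
  kubectl get pod -n "$NAMESPACE" -l app=failover-ip-manager \
    -o jsonpath='{.items[0].status.conditions[?(@.type=="Ready")].status}'
}

# Returns non-zero if the pod is not ready, so it can drive an alert.
check_manager_ready() {
  status=$(manager_ready)
  if [ "$status" = "True" ]; then
    echo "failover-ip-manager is ready"
  else
    echo "failover-ip-manager NOT ready (Ready=$status)" >&2
    return 1
  fi
}
```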
