Chris Graham

Kubernetes on Hetzner Cloud


This website, as well as several of my other projects, runs on a Kubernetes cluster that I have built and deployed on Hetzner Cloud. I do this for a couple of reasons: firstly to develop my experience of running and managing Kubernetes, but also out of a desire to stay towards the cheap, flexible, self-hosted end of hosting options, rather than deploying on a more centralised and often more costly option like AWS or Vercel.

As I spent a reasonable amount of time figuring out how to do this somewhat securely, and with the goal of keeping costs and complexity low, I figured I should write up a guide to what I did.

Key Services / Tools Used

Before I dive into the implementation, here’s a quick list and summary of the key services and tools used:

Hetzner Cloud

https://www.hetzner.com/cloud/

Hetzner has long been my favourite place to host servers; I rented a dedicated server from them for many years when I hosted game servers and never had any issues. It was also very good value. I have since heard great things about their Cloud offering. It’s not as diverse an offering as the big providers (AWS, GCP, etc.), but I don’t need any of that, so it’s a great fit.

Pulumi

https://www.pulumi.com

Pulumi is the infrastructure-as-code tool I opted to use for this. This was mostly to test it out as an alternative to Terraform (which I use at work), with the benefit of being able to define servers in the more familiar Python or TypeScript, avoiding the Terraform markup and some of the more painful points I’ve encountered there, like state diverging for silly reasons.

Ansible

https://docs.ansible.com

Ansible is still my go-to tool for managing packages and OS-level config on servers. It may not be the fastest or cleanest, but it’s simple to write and familiar.

K3s

https://k3s.io

Previously I’ve used Rancher for Kubernetes cluster setup, but RKE1 (which I’ve used in the past) was fairly recently deprecated, its successor RKE2 read as quite complex to configure, and I really don’t need anything fancy out of Kubernetes, so K3s seemed like a nice, easy-to-manage option.

PostgreSQL

https://www.postgresql.org

While I still find MySQL an easier database engine to administer, PostgreSQL seems to be much preferred by the developer community, so I figured I should use it and learn more about how to manage it.

NGINX

https://www.f5.com/products/nginx

NGINX is still my go-to for reverse proxying due to familiarity, and I need very little from it. I could have opted for a Hetzner load balancer and done away with the need to sort this out myself, but I wanted a bit more control over how sites and services resolved, and didn’t want to wrestle with unfamiliar load balancer configuration to do that.

Steps

The process for getting everything set up took several steps, which I’ll go through in order:

Architecture Plan

The first step in my process was deciding how many servers I’d need, and of what size. Given the workloads I’m dealing with are very low, I opted for the smallest and cheapest “Cost-Optimized” servers in Hetzner’s Cloud offering. Obviously you’d want to tweak this to your needs, but it’s fairly easy to scale up the number and size of servers in this setup as needed. Notably, you can’t really scale servers down, as disk sizes can’t be reduced, so if you’re unsure you could always undershoot and adjust accordingly.

One thing to watch out for is that the cheapest server offerings are limited in quantity, and not always available. Luckily, in my case I managed to grab them without any issues when I needed them, but I have seen them be unavailable for a period. An initial Hetzner Cloud account also limits you to 5 servers, and you have to manually request an increase. For me 5 servers was all I needed, so I’ve not had to test how responsive that process is yet, but I’d assume a reasonable request is unlikely to be an issue.

My proposed servers came out to be:

These would be organised something like this:

           +---------------------------------+
           |  Reverse Proxy / Bastion Host   |
           +---------------------------------+
                            |
        +-------------------+-------------------+
        |                   |                   |
+---------------+   +---------------+   +---------------+
| Kubernetes #1 |   | Kubernetes #2 |   | Kubernetes #3 |
+---------------+   +---------------+   +---------------+
        \                   |                   /
         \                  |                  /
          \                 |                 /
           +---------------------------------+
           |       Database / Storage        |
           +---------------------------------+

I decided for a slightly beefier storage server as I figured it would be the most disruptive one to scale up if needed.

Initial Hetzner Cloud Setup

Once I’d figured out what my planned server architecture was I needed to set up the basic Hetzner config. I decided to only request the first CX22 for the proxy server until I was ready to build the rest. Doing this meant quickly setting up a Pulumi account, and then laying out the config for the following:

The latter point was one thing I discovered while building the cluster. I only wanted the one server directly accessible from the internet for security reasons, but this meant that I’d have no way to reach the other servers to configure them, and they’d have no way to reach out to the internet to pull updates. Thankfully, setting up a NAT to make sure all the traffic got to the right places wasn’t too tricky.

I’ll skip the details of how to set up a Pulumi account and project, as that’s well documented by them, but the config I created in TypeScript for these first bits looked something like this:

import * as std from "@pulumi/std";
import * as hcloud from "@pulumi/hcloud";

const primarySSH = new hcloud.SshKey("primary", {
    name: "Primary",
    publicKey: std.file({
        input: "./files/ssh/primary.pub",
    }).then(invoke => invoke.result),
});

const network = new hcloud.Network("network", {
    name: "Primary",
    ipRange: "10.0.0.0/16",
});

const natRoute = new hcloud.NetworkRoute("gateway", {
    networkId: network.id.apply(parseInt),
    destination: "0.0.0.0/0",
    gateway: "10.0.1.1",
});

const subnet = new hcloud.NetworkSubnet("subnet", {
    networkId: network.id.apply(parseInt),
    type: "cloud",
    networkZone: "eu-central",
    ipRange: "10.0.1.0/24",
});

const proxyServer = new hcloud.Server("proxyserver", {
    name: "ProxyServer",
    serverType: "cx22",
    location: "nbg1",
    image: "ubuntu-24.04",
    networks: [{
        networkId: subnet.id.apply(parseInt),
        ip: "10.0.1.1",
    }],
    sshKeys: [primarySSH.id],
    publicNets: [{
        ipv4Enabled: true,
        ipv6Enabled: true,
    }],
}, {
    dependsOn: [subnet],
});

export const proxyIpv4 = proxyServer.ipv4Address;
export const proxyIpv6 = proxyServer.ipv6Address;

When run with Pulumi this loads the SSH key, creates the network, subnet and NAT, and provisions the first server into the subnet. The exports at the end of the script mean it will output the IP addresses assigned to the proxy server during setup. The server is then immediately reachable using the SSH key. Nice!
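Before deploying, it’s easy to sanity-check the addressing plan with Python’s ipaddress module; a quick sketch, with values mirroring the Pulumi config above:

```python
# Sanity-check the addressing plan: values mirror the Pulumi config above,
# so adjust them if you change the ranges.
import ipaddress

network = ipaddress.ip_network("10.0.0.0/16")  # the private network
subnet = ipaddress.ip_network("10.0.1.0/24")   # the cloud subnet
gateway = ipaddress.ip_address("10.0.1.1")     # the proxy/NAT gateway

print(subnet.subnet_of(network))  # the subnet must sit inside the network
print(gateway in subnet)          # the gateway must sit inside the subnet
```

Both checks should print True for the ranges used here.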

Bastion Host

With the first server available to be configured, my next task was to set it up as a bastion host, directing any access to other servers I might set up through the single entry point. This bastion setup would enable me to:

Access to this server would only be via SSH, and to reach the other servers SSH tunneling would be necessary.
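One convenient way to handle that tunneling is a ProxyJump entry in ~/.ssh/config, so every connection to the private range hops through the bastion automatically; a sketch with placeholder addresses (the host aliases here are just examples):

```
Host bastion
    HostName xxx.xxx.xxx.xxx   # the proxy server's public IPv4
    User root

Host 10.0.1.*
    User root
    ProxyJump bastion
```

With this in place, `ssh 10.0.1.11` just works from the outside.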

This server would also be the point of entry for web users, so it needed NGINX and some extra security added.

To do this configuration I created some simple Ansible configuration to apply. As with Pulumi I’ll skip the details of setting up Ansible itself.

I’ll skip over the NGINX configuration aspects for the moment and cover that towards the end once everything else is up and running.

The main tasks are:

My Ansible hosts file (hosts.yml) at this point simply looked like this, with just the Proxy host:

all:
  vars:
    ansible_user: root
  children:
    proxy:
      hosts:
        proxy:
          ansible_host: xxx.xxx.xxx.xxx

and the playbook file (main.yml) like this, with a single play targeting the proxy:

- hosts: proxy
  become: true
  become_user: root
  gather_facts: true
  roles:
    - proxy

The key Ansible tasks for the Proxy playbook (roles/proxy/main.yml) were then:

# Update Apt Cache
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 3600

# Install Key Packages
- name: Install useful packages
  apt:
    name: ['fail2ban', 'ufw', 'nginx']

# Configure Networking
- name: Allow SSH and HTTP/S
  community.general.ufw:
    rule: allow
    port: "{{item}}"
  with_items:
    - '22'
    - '80'
    - '443'

- name: Allow NAT routing
  community.general.ufw:
    rule: allow
    route: true
    interface_in: enp7s0
    interface_out: eth0

- name: Enable UFW
  community.general.ufw:
    state: enabled

- name: Allow IPv4 forwarding
  ansible.posix.sysctl:
    name: net.ipv4.ip_forward
    value: '1'
    sysctl_set: true
    reload: true

- name: Add iptables rule for NAT
  ansible.builtin.iptables:
    table: nat
    chain: POSTROUTING
    out_interface: eth0
    source: 10.0.0.0/16
    jump: MASQUERADE

- name: Copy post-up script
  ansible.builtin.copy:
    src: 10-eth0-post-up
    dest: /etc/networkd-dispatcher/routable.d/10-eth0-post-up
    owner: root
    group: root
    mode: '0755'

# Copy SSH key
- name: Copy SSH key
  copy:
    src: 'id_rsa.pub'
    dest: '/root/.ssh/id_rsa.pub'
    owner: root
    group: root
    mode: '0400'

# Fail2Ban
- name: Configure fail2ban
  copy:
    src: jail.local
    dest: /etc/fail2ban/jail.local
    owner: root
    group: root
    mode: '0644'
  notify:
    - restart fail2ban

The key networking file copied here is the post-up script, which makes sure the NAT iptables config persists across reboots:

#!/bin/bash

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s '10.0.0.0/16' -o eth0 -j MASQUERADE

Unfortunately I can’t really share the config for fail2ban, as it would give away some details about the protections I’ve enabled, but Fail2Ban is pretty straightforward to configure, and there’s nothing especially novel in what I’m doing.
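The restart fail2ban handler referenced by `notify` above isn’t shown; assuming the conventional role layout, a minimal handlers file (e.g. roles/proxy/handlers/main.yml) might look like this, with the restart nginx handler used by the NGINX tasks later on:

```yaml
# Hypothetical handlers for the proxy role; names must match the
# `notify` entries in the tasks.
- name: restart fail2ban
  ansible.builtin.service:
    name: fail2ban
    state: restarted

- name: restart nginx
  ansible.builtin.service:
    name: nginx
    state: restarted
```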

Additional Hetzner Servers

With the bastion host in place, my next task was to create the remaining 4 servers of my cluster in Hetzner Cloud. Three of these servers would be for the Kubernetes cluster, and one would be for storage, including databases. At this stage I also decided to add extra disk space to the storage server in the form of an attached volume so I could use it as an off-site backup. This is entirely optional though, as the default disk on the server should be sufficient for some small databases.

The config to add to Pulumi for this was all pretty simple, and the servers were almost identical to the initial proxy one, bar the names and IP addresses:

// Reuse the SSH key defined earlier for all new servers
const sshKeys = [primarySSH.id];

const k8sServers = [
  {name: "K8sServer1", ip: "10.0.1.11"},
  {name: "K8sServer2", ip: "10.0.1.12"},
  {name: "K8sServer3", ip: "10.0.1.13"},
];

for (const k8sServer of k8sServers) {
    const server = new hcloud.Server(`${k8sServer.name.toLowerCase()}`, {
        name: k8sServer.name,
        serverType: "cx22",
        location: "nbg1",
        image: "ubuntu-24.04",
        networks: [{
            networkId: network.id.apply(parseInt),
            ip: k8sServer.ip,
        }],
        sshKeys: sshKeys,
        publicNets: [{
            ipv4Enabled: false,
            ipv6Enabled: false,
        }],
    }, {
        dependsOn: [network],
    });
}

const storageServer = new hcloud.Server("storageserver", {
    name: "StorageServer",
    serverType: "cx32",
    location: "nbg1",
    image: "ubuntu-24.04",
    networks: [{
        networkId: network.id.apply(parseInt),
        ip: "10.0.1.21",
    }],
    sshKeys: sshKeys,
    publicNets: [{
        ipv4Enabled: false,
        ipv6Enabled: false,
    }],
}, {
    dependsOn: [network],
});

const volume = new hcloud.Volume("backup", {
    name: "Backup",
    size: 150,
    serverId: storageServer.id.apply(parseInt),
    automount: true,
    format: "ext4",
});

Another quick run of Pulumi and all the servers were ready to go!

K3s Cluster

With all the servers now provisioned, the next step was to configure the K8s cluster onto the 3 machines set aside for it. As mentioned at the top, I opted to use K3s as the Kubernetes distribution, as I wanted something very simple to configure and my requirements are very modest.

To achieve this I needed to update my Ansible config, and I opted to use an Ansible Galaxy collection, which meant there was very little to actually set up.

The first change was to update my Ansible hosts file with all the new servers:

all:
  children:
    proxy:
      hosts:
        proxy:
          ansible_host: xxx.xxx.xxx.xxx
    k8s:
      hosts:
        k8s1:
          ansible_host: 10.0.1.11
        k8s2:
          ansible_host: 10.0.1.12
        k8s3:
          ansible_host: 10.0.1.13
      vars:
        ansible_ssh_common_args: '-J root@xxx.xxx.xxx.xxx'
    storage:
      hosts:
        storage:
          ansible_host: 10.0.1.21
      vars:
        ansible_ssh_common_args: '-J root@xxx.xxx.xxx.xxx'

Notice the ansible_ssh_common_args config in the file, which tunnels all Ansible connections through the public bastion/proxy host to the other machines, since they’re not directly reachable.

I also needed to update my playbook file like so (the storage role is covered in the next section):

- hosts: proxy
  become: true
  become_user: root
  gather_facts: true
  roles:
    - proxy
- hosts: k8s
  become: true
  become_user: root
  gather_facts: true
  roles:
    - k8s
- hosts: storage
  become: true
  become_user: root
  gather_facts: true
  roles:
    - storage

Before I could actually configure the cluster there were a few preparatory bits of config to apply to the servers via the playbooks mentioned, namely connecting the servers to the NAT (and making sure the config persisted), and installing some required packages with basic config.

The key tasks for the Kubernetes server playbook (roles/k8s/main.yml) looked something like this:

# Configure the NAT
- name: Ensure a default route via 10.0.0.1
  vars:
    default_gateway: 10.0.0.1

  block:
    - name: Check current default gateway
      ansible.builtin.command: ip route get 8.8.8.8
      register: current_route
      changed_when: false
      ignore_errors: true

    - name: Add default route if missing or different
      ansible.builtin.command: ip route add default via {{ default_gateway }}
      when: >
        (current_route.stdout is not defined) or
        (default_gateway not in current_route.stdout)

- name: Ensure DNS servers are present in systemd-resolved
  vars:
    dns_servers: "xxx.xxx.xxx.1 xxx.xxx.xxx.2"

  block:
    - name: Set DNS line in /etc/systemd/resolved.conf
      ansible.builtin.ini_file:
        path: /etc/systemd/resolved.conf
        section: Resolve
        option: DNS
        value: "{{ dns_servers }}"
      register: dns_line

    - name: Reload systemd-resolved so the new DNS takes effect
      ansible.builtin.service:
        name: systemd-resolved
        state: restarted
        enabled: yes
      when: dns_line is changed

- name: Copy enp7s0 network config
  ansible.builtin.copy:
    src: 10-enp7s0.network
    dest: /etc/systemd/network/10-enp7s0.network
    owner: root
    group: root
    mode: '0644'

# Update Apt Cache
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 3600

# Install Key Packages
- name: Install useful packages
  apt:
    name: ['ufw', 'nfs-common']

# UFW
- name: Allow internal only
  ufw:
    rule: allow
    src: '{{ item }}'
  with_items:
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16

- name: Enable UFW
  ufw:
    state: enabled

The internal-only UFW rules are very broad, covering all the private IP ranges, and could be tightened up, but I consider them acceptable here to keep the setup simple and avoid accidentally blocking access.

The key networking file copied this time is the NAT gateway config below.

Note as well the installation of the nfs-common package, so the Kubernetes cluster can properly access NFS volumes on the storage server.

[Match]
Name=enp7s0

[Route]
Destination=0.0.0.0/0
Gateway=10.0.0.1
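As an aside, with nfs-common installed the nodes can later mount exports from the storage server (set up in the Data Storage section). A hypothetical PersistentVolume/PersistentVolumeClaim pair for the example export might look like this (the names and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.1.21            # the storage server's private IP
    path: /srv/volumes/example   # the export created later
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""   # bind to the static PV, not a dynamic provisioner
  volumeName: example-pv
  resources:
    requests:
      storage: 5Gi
```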

With the servers prepared I could set about getting the cluster itself running. As mentioned, I opted to use an Ansible Galaxy collection, so there was very little to do beyond installing it, writing some very basic config, and running it.

To install the collection I ran this:

ansible-galaxy collection install git+https://github.com/k3s-io/k3s-ansible.git

I then created a very simple hosts file (cluster.yml) specifically for managing the cluster, matching what the collection expects:

As this is a small cluster all 3 nodes are defined in the server section and will be control plane nodes. A larger cluster would want to have some purely agent nodes as well.

k3s_cluster:
  children:
    server:
      hosts:
        10.0.1.11:
        10.0.1.12:
        10.0.1.13:
      vars:
        ansible_ssh_common_args: '-J root@xxx.xxx.xxx.xxx'
  vars:
    ansible_user: root
    k3s_version: v1.35.2+k3s1
    api_endpoint: "{{ hostvars[groups['server'][0]]['ansible_host'] | default(groups['server'][0]) }}"

The K3s version should be updated to a suitable current version as this is almost certainly out of date. The versions available can be found here: https://github.com/k3s-io/k3s/releases

With this in place deploying the cluster was just a matter of running:

ansible-playbook k3s.orchestration.site -i cluster.yml

NOTE: For any updates in the future (say an upgrade of the K8s version) a slightly different command is needed:

ansible-playbook k3s.orchestration.upgrade -i cluster.yml

The collection does take a while to run, but hopefully after it’s successfully completed you have a small Kubernetes cluster ready to go!

Data Storage

With the cluster all set up my penultimate task was to configure the server used for storage, namely a Postgres database and an NFS server for any volumes I might want to attach to the cluster.

Note that as mentioned above if you have no need for a database or volumes you can skip this entirely.

The key tasks for the Storage server playbook (roles/storage/main.yml) looked something like this:

# Configure the NAT
- name: Ensure a default route via 10.0.0.1
  ...

- name: Ensure DNS servers are present in systemd-resolved
  ...

- name: Copy enp7s0 network config
  ...

# Update Apt Cache
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 3600

# Install Key Packages
- name: Install useful packages
  apt:
    name: ['nfs-kernel-server', 'postgresql', 'postgresql-contrib', 'python3-psycopg2']

# Create volumes folder
- name: Create volumes directory
  file:
    path: /srv/volumes
    state: directory

# Postgres Config
- name: Copy Postgres main config
  ansible.builtin.copy:
    src: postgresql.conf
    dest: /etc/postgresql/16/main/postgresql.conf
    owner: postgres
    group: postgres
    mode: '0644'
  notify:
    - restart postgresql

- name: Copy Postgres HBA config
  ansible.builtin.copy:
    src: pg_hba.conf
    dest: /etc/postgresql/16/main/pg_hba.conf
    owner: postgres
    group: postgres
    mode: '0640'
  notify:
    - restart postgresql

- name: Copy Postgres Ident config
  ansible.builtin.copy:
    src: pg_ident.conf
    dest: /etc/postgresql/16/main/pg_ident.conf
    owner: postgres
    group: postgres
    mode: '0640'
  notify:
    - restart postgresql

# Postgres Databases
- name: Create database
  become: true
  become_user: postgres
  community.postgresql.postgresql_db:
    name: database1
    encoding: UTF-8
    lc_collate: en_US.UTF-8
    lc_ctype: en_US.UTF-8
    template: template0
  notify:
    - restart postgresql

# Postgres Users
- include_vars: db_passwords.yml

- name: Create user
  become: true
  become_user: postgres
  community.postgresql.postgresql_user:
    db: database1
    name: user1
    password: '{{ user1_password }}'
    priv: ALL
  notify:
    - restart postgresql

- name: Strip unnecessary database user permissions
  become: true
  become_user: postgres
  community.postgresql.postgresql_user:
    name: user1
    role_attr_flags: NOSUPERUSER,NOCREATEDB
  notify:
    - restart postgresql

- name: Grant CREATE on public schema to db users
  become: true
  become_user: postgres
  community.postgresql.postgresql_privs:
    db: database1
    type: schema
    privs: CREATE
    objs: public
    roles: user1

# NFS Folders
- name: Create example folder
  ansible.builtin.file:
    path: "/srv/volumes/example"
    state: directory
    owner: nobody
    group: nogroup
    mode: '0777'

- name: Export example folder via NFS
  ansible.builtin.lineinfile:
    path: /etc/exports
    line: "/srv/volumes/example 10.0.1.0/24(rw,sync,no_subtree_check)"
    create: yes
    state: present
  notify:
    - reload nfs

The NAT config is exactly the same as for the Kubernetes servers, but snipped out in the config above to save repetition.
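The db_passwords.yml file included above just holds variables like user1_password; one quick way to generate a strong value for it (the variable name mirrors the playbook above):

```python
# Generate a random value for user1_password in db_passwords.yml
# (the file and variable names come from the playbook above).
import secrets

password = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe text
print(f"user1_password: {password}")
```

Remember to keep that file out of version control, or encrypt it with something like Ansible Vault.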

For security reasons I’ve excluded the 3 Postgres config files used, but as elsewhere there’s nothing particularly novel in the config, other than making sure there’s an entry in pg_hba.conf to allow access from local network addresses, which looks like this:

host    all             all             10.0.1.0/24             scram-sha-256

NGINX Proxy

With everything else now set up and ready, I could configure NGINX to start properly accepting and forwarding traffic to the Kubernetes cluster.

To do this we’ll need to add a bit more to the Ansible Proxy playbook (roles/proxy/main.yml):

# Configure NGINX
- name: Configure nginx
  copy:
    src: nginx.conf
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  notify:
    - restart nginx

- name: Remove default sites-enabled entry
  file:
    path: /etc/nginx/sites-enabled/default
    state: absent
  notify:
    - restart nginx

- name: Configure sites-available
  copy:
    src: 'sites-available/{{item}}'
    dest: '/etc/nginx/sites-available/{{item}}'
    owner: root
    group: root
    mode: '0644'
  notify:
    - restart nginx
  with_items:
    - example.com

- name: Configure folders
  file:
    path: '/var/www/{{item}}'
    state: directory
  with_items:
    - example.com

- name: Enable sites
  file:
    src: '/etc/nginx/sites-available/{{item}}'
    dest: '/etc/nginx/sites-enabled/{{item}}'
    owner: root
    group: root
    state: link
  notify:
    - restart nginx
  with_items:
    - example.com

This sets up some default NGINX configuration, and an example site with the domain example.com. Obviously swap that out for whatever domain you’re working with; you can add multiple domains to the list as you wish.

The contents of the nginx.conf file are pretty straightforward and mostly follow defaults and best practices to avoid a few common issues:

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
}

http {
    # Basic Settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging Settings
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    # Default
    server {
        listen 80 default_server;

        server_name _;

        location / {
            return 444;
        }
    }

    upstream backend {
        server 10.0.1.11:80;
        server 10.0.1.12:80;
        server 10.0.1.13:80;
    }

    # Virtual Host Configs
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

You’ll notice I’m only dealing with http here. Setting up HTTPS could be a whole post on its own as there are various things to consider. You’ll almost certainly want to set up TLS config and certificate renewal processes for a complete production environment.
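As a sketch only, assuming certificates obtained via something like certbot at its default Let’s Encrypt paths (which are an assumption here, not part of my setup as described), the TLS variant of a site block could look like:

```nginx
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name example.com www.example.com;

    # Hypothetical certbot-managed certificate paths
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        proxy_pass http://backend;
        include /etc/nginx/proxy_params;
    }
}
```

You’d also want the plain-HTTP block to redirect to HTTPS, and a renewal timer for the certificates.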

The most important lines from a Kubernetes point of view are in the backend upstream section which points to the Kubernetes servers we’ve set up.

The final file needed, for the example.com site itself, is this:

# Main Site
server {
    listen 80;
    listen [::]:80;
    server_name example.com www.example.com;

    root /var/www/example.com;

    gzip on;

    access_log /var/log/nginx/example.access.log;
    error_log /var/log/nginx/example.error.log;

    location / {
        proxy_pass http://backend;

        include /etc/nginx/proxy_params;
    }
}

This is a very minimal setup for an HTTP website that just sends all traffic on to the backend defined in the main config, and sets up some logging to a file specific to this site to aid in debugging issues.
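For reference, the proxy_params file included above ships with the nginx package on Debian/Ubuntu; to the best of my knowledge it sets the usual forwarding headers:

```nginx
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
```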

Summary

This might seem like a lot, but if you follow these steps, filling in the gaps I’ve had to leave out for security reasons, you can have your own cheap Kubernetes cluster up and running pretty easily. For a personal or otherwise low-traffic setup, it strikes a nice balance between cost, control, and operational complexity.

The cluster will have:

All costing ~£30/month assuming availability of Hetzner Cloud servers.