Kubernetes on Hetzner Cloud
This website, as well as several of my other projects, runs on a Kubernetes cluster that I have built and deployed on Hetzner Cloud. I do this for a couple of reasons: firstly to develop my experience of running and managing Kubernetes, but also out of a desire to stay towards the cheap, flexible, self-hosted end of hosting options, rather than deploy on a more centralised and often more costly option like AWS or Vercel.
As I spent a reasonable amount of time figuring out how to do this somewhat securely, and with the goal of keeping costs and complexity low, I figured I should write up a guide to what I did.
Key Services / Tools Used
Before I dive into the implementation here’s a quick list and summary of the key services and tools used:
Hetzner Cloud
https://www.hetzner.com/cloud/
Hetzner has long been my favourite place to host servers: I rented a dedicated server from them for many years when I hosted game servers, never had any issues, and it was very good value. I have since heard great things about their Cloud offering. It’s not as diverse an offering as the big providers (AWS, GCP, etc.), but I don’t need any of that, so it’s a great fit.
Pulumi
Pulumi is the infrastructure-as-code tool I opted to use for this. This was mostly to test it out as an alternative to Terraform (which I use at work), with the benefit of being able to define servers in a more familiar Python or TypeScript, avoiding the Terraform markup and some of the more painful points I’ve encountered there, like state diverging for silly reasons.
Ansible
Ansible is still my go-to tool for managing packages and OS-level config on servers. It may not be the fastest or cleanest, but it’s simple to write and familiar.
K3s
Previously I’ve used Rancher for Kubernetes cluster setup, but RKE1 (which I’ve used in the past) was fairly recently deprecated, its successor RKE2 reads as quite complex to configure, and I really don’t need anything fancy out of Kubernetes, so K3s seemed like a nice, easy-to-manage option.
PostgreSQL
While I still find MySQL an easier database engine to administer, PostgreSQL seems to be much preferred by the developer community, so I figured I should use it and learn more about how to manage it.
NGINX
https://www.f5.com/products/nginx
NGINX is still my go-to for reverse proxying due to familiarity, and I need very little from it. I could have opted for a Hetzner load balancer and done away with the need to sort this out myself, but I wanted a bit more control over how sites and services resolved, and didn’t want to wrestle with unfamiliar load balancer configuration to do that.
Steps
The process for getting everything set up took several steps, which I’ll go through in order:
- Architecture Plan
- Initial Hetzner Cloud Setup
- Bastion Host
- Additional Hetzner Servers
- K3s Cluster
- Data Storage
- NGINX Proxy
Architecture Plan
The first step in my process was deciding how many servers I’d need, and of what size. Given the workloads I’m dealing with are very low, I opted for the smallest and cheapest “Cost-Optimized” servers in Hetzner’s Cloud offering. Obviously you’d want to tweak this to your needs, but it’s fairly easy to scale up the number and size of servers in this setup as needed. Notably, you can’t really scale servers down, as shrinking disk sizes isn’t supported, so if you’re unsure you could always undershoot and adjust accordingly.
One thing to watch out for is that the cheapest server offerings are limited in quantity, and not always available. Luckily in my case I managed to grab them without any issues when I needed them, but I have seen them being unavailable for a period. An initial Hetzner Cloud account also limits you to 5 servers and you have to manually request an increase. For me 5 servers was all I needed so I’ve not had to test how responsive that is yet, but I’d assume a reasonable number is unlikely to be an issue.
My proposed servers came out to be:
- 1x CX22 server as a reverse proxy / bastion host, with an external IP address
- 3x CX22 servers for the Kubernetes cluster
- 1x CX32 server as a storage / database machine
These would be organised something like this:
+---------------------------------+
| Reverse Proxy / Bastion Host |
+---------------------------------+
|
+-------------------+-------------------+
| | |
+---------------+ +---------------+ +---------------+
| Kubernetes #1 | | Kubernetes #2 | | Kubernetes #3 |
+---------------+ +---------------+ +---------------+
\ | /
\ | /
\ | /
+---------------------------------+
| Database / Storage |
+---------------------------------+
I decided on a slightly beefier storage server as I figured it would be the most disruptive one to scale up if needed.
Initial Hetzner Cloud Setup
Once I’d figured out what my planned server architecture was I needed to set up the basic Hetzner config. I decided to only request the first CX22 for the proxy server until I was ready to build the rest. Doing this meant quickly setting up a Pulumi account, and then laying out the config for the following:
- An SSH key to be applied on any machines provisioned
- A simple network and subnet for the servers to sit on and communicate internally
- The first server, that would become the reverse proxy
- A NAT gateway for the other servers to route traffic via the first server so they didn’t need to be exposed to the internet directly
The latter point was one thing I discovered while building the cluster. I only wanted to have the one server directly accessible from the internet for security reasons, but this meant that I’d have no way for the other servers to be reached to configure them, or any way for them to reach out to the internet to pull any updates. Thankfully setting up a NAT to make sure all the traffic got to the right places wasn’t too tricky.
I’ll save the details on how to set up a Pulumi account and project, as that’s well documented by them, but the config I created in TypeScript for these first bits looked something like this:
import * as std from "@pulumi/std";
import * as hcloud from "@pulumi/hcloud";

const primarySSH = new hcloud.SshKey("primary", {
    name: "Primary",
    publicKey: std.file({
        input: "./files/ssh/primary.pub",
    }).then(invoke => invoke.result),
});

const network = new hcloud.Network("network", {
    name: "Primary",
    ipRange: "10.0.0.0/16",
});

const natRoute = new hcloud.NetworkRoute("gateway", {
    networkId: network.id.apply(parseInt),
    destination: "0.0.0.0/0",
    gateway: "10.0.1.1",
});

const subnet = new hcloud.NetworkSubnet("subnet", {
    networkId: network.id.apply(parseInt),
    type: "cloud",
    networkZone: "eu-central",
    ipRange: "10.0.1.0/24",
});

const proxyServer = new hcloud.Server("proxyserver", {
    name: "ProxyServer",
    serverType: "cx22",
    location: "nbg1",
    image: "ubuntu-24.04",
    networks: [{
        networkId: subnet.id.apply(parseInt),
        ip: "10.0.1.1",
    }],
    sshKeys: [primarySSH.id],
    publicNets: [{
        ipv4Enabled: true,
        ipv6Enabled: true,
    }],
}, {
    dependsOn: [subnet],
});

export const proxyIpv4 = proxyServer.ipv4Address;
export const proxyIpv6 = proxyServer.ipv6Address;
When run with Pulumi this loads the SSH key, creates the network, subnet and NAT, and provisions the first server into the subnet. The exports at the end of the script mean it will output the IP addresses assigned to the proxy server during setup. The server is then immediately reachable using the SSH key. Nice!
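Actually applying this is just the standard Pulumi CLI workflow; something like the following previews and applies the stack, then reads back the exported address to SSH in (stack and project names are whatever you chose during Pulumi setup):

```shell
# Preview and apply the infrastructure changes
pulumi up

# Read back the exported proxy address, then SSH straight in
pulumi stack output proxyIpv4
ssh root@"$(pulumi stack output proxyIpv4)"
```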
Bastion Host
With the first server available to be configured, my next task was to set it up as a bastion host, directing any access to other servers I might set up through the single entry point. This bastion setup would enable me to:
- Keep the servers and interface for my Kubernetes cluster inaccessible directly from the internet
- Keep other servers and services I might setup, like databases, safely internal to the network as well
- Allow me to focus most of my security measures in one place
Access to this server would only be via SSH, and to reach the other servers SSH tunneling would be necessary.
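To make that tunneling painless, an entry in ~/.ssh/config using ProxyJump saves typing the jump flags every time. A minimal sketch, assuming the proxy’s public IP (redacted here as in the rest of this post) and the internal addressing from my plan:

```text
Host bastion
    HostName xxx.xxx.xxx.xxx
    User root

# Any internal 10.0.1.* machine hops via the bastion automatically
Host 10.0.1.*
    User root
    ProxyJump bastion
```

With this in place, `ssh 10.0.1.11` transparently routes through the bastion.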
This server would also be the point of entry for web users, so needed NGINX and some extra security added.
To do this configuration I created some simple Ansible configuration to apply. As with Pulumi I’ll skip the details of setting up Ansible itself.
I’ll skip over the NGINX configuration aspects for the moment and cover that towards the end once everything else is up and running.
The main tasks are:
- Install Fail2Ban, UFW and NGINX
- Enable UFW and allow incoming traffic on ports 22, 80 and 443 only
- Do additional configuration on the server for the NAT routing
- Set up Fail2Ban as some basic security against brute force attacks
My Ansible hosts file (hosts.yml) at this point simply looked like this, with just the Proxy host:
all:
  vars:
    ansible_user: root
  children:
    proxy:
      hosts:
        proxy:
          ansible_host: xxx.xxx.xxx.xxx
and the inventory file (main.yml) like this, with a single playbook targeting the proxy:
- hosts: proxy
  become: true
  become_user: root
  gather_facts: true
  roles:
    - proxy
The key Ansible tasks for the proxy playbook (roles/proxy/main.yml) were then:
# Update Apt Cache
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 3600

# Install Key Packages
- name: Install useful packages
  apt:
    name: ['fail2ban', 'ufw', 'nginx']

# Configure Networking
- name: Allow SSH and HTTP/S
  community.general.ufw:
    rule: allow
    port: "{{ item }}"
  with_items:
    - '22'
    - '80'
    - '443'

- name: Allow NAT routing
  community.general.ufw:
    rule: allow
    route: true
    interface_in: enp7s0
    interface_out: eth0

- name: Enable UFW
  community.general.ufw:
    state: enabled

- name: Allow IPv4 forwarding
  ansible.posix.sysctl:
    name: net.ipv4.ip_forward
    value: '1'
    sysctl_set: true
    reload: true

- name: Add iptables rule for NAT
  ansible.builtin.iptables:
    table: nat
    chain: POSTROUTING
    out_interface: eth0
    source: 10.0.0.0/16
    jump: MASQUERADE

- name: Copy post-up script
  ansible.builtin.copy:
    src: 10-eth0-post-up
    dest: /etc/networkd-dispatcher/routable.d/10-eth0-post-up
    owner: root
    group: root
    mode: '0755'

# Copy SSH key
- name: Copy SSH key
  copy:
    src: 'id_rsa.pub'
    dest: '/root/.ssh/id_rsa.pub'
    owner: root
    group: root
    mode: '0400'

# Fail2Ban
- name: Configure fail2ban
  copy:
    src: jail.local
    dest: /etc/fail2ban/jail.local
    owner: root
    group: root
    mode: '0644'
  notify:
    - restart fail2ban
The key networking file copied here is the post-up script, which makes sure the NAT iptables config persists across reboots:
#!/bin/bash
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s '10.0.0.0/16' -o eth0 -j MASQUERADE
Unfortunately I can’t really share my Fail2Ban config, as it would give away some details about the protections I’ve enabled, but Fail2Ban is pretty straightforward to configure, and there’s nothing especially novel in what I’m doing.
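For reference, a generic jail.local enabling the standard sshd jail looks something like the following. To be clear, these are illustrative defaults, not my actual settings:

```ini
[DEFAULT]
bantime  = 1h
findtime = 10m
maxretry = 5

[sshd]
enabled = true
```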
Additional Hetzner Servers
With the bastion host in place, my next task was to create the remaining 4 servers of my cluster in Hetzner Cloud. Three of these would be for the Kubernetes cluster, and one would be for storage, including databases. At this stage I also decided to add extra disk space to the storage server in the form of an attached volume so I could use it as an off-site backup; this is entirely optional though, as the default disk on the server should be sufficient for some small databases.
The config to add to Pulumi for this was all pretty simple, and the servers were almost identical to the initial proxy one, bar names and IP addresses:
const k8sServers = [
    {name: "K8sServer1", ip: "10.0.1.11"},
    {name: "K8sServer2", ip: "10.0.1.12"},
    {name: "K8sServer3", ip: "10.0.1.13"},
];

for (const k8sServer of k8sServers) {
    const server = new hcloud.Server(k8sServer.name.toLowerCase(), {
        name: k8sServer.name,
        serverType: "cx22",
        location: "nbg1",
        image: "ubuntu-24.04",
        networks: [{
            networkId: network.id.apply(parseInt),
            ip: k8sServer.ip,
        }],
        sshKeys: [primarySSH.id],
        publicNets: [{
            ipv4Enabled: false,
            ipv6Enabled: false,
        }],
    }, {
        dependsOn: [network],
    });
}

const storageServer = new hcloud.Server("storageserver", {
    name: "StorageServer",
    serverType: "cx32",
    location: "nbg1",
    image: "ubuntu-24.04",
    networks: [{
        networkId: network.id.apply(parseInt),
        ip: "10.0.1.21",
    }],
    sshKeys: [primarySSH.id],
    publicNets: [{
        ipv4Enabled: false,
        ipv6Enabled: false,
    }],
}, {
    dependsOn: [network],
});

const volume = new hcloud.Volume("backup", {
    name: "Backup",
    size: 150,
    serverId: storageServer.id.apply(parseInt),
    automount: true,
    format: "ext4",
});
Another quick run of Pulumi and all the servers were ready to go!
K3s Cluster
With all the servers now provisioned, the next step was to configure the Kubernetes cluster on the 3 machines set aside for it. As mentioned at the top, I opted to use K3s as the Kubernetes distribution, as I wanted something very simple to configure and my requirements are very modest.
To achieve this I needed to update my Ansible config, and opted to use an Ansible Galaxy collection, which meant I had very little to set up myself.
The first change was to update my Ansible hosts file with all the new servers:
all:
  children:
    proxy:
      hosts:
        proxy:
          ansible_host: xxx.xxx.xxx.xxx
    k8s:
      hosts:
        k8s1:
          ansible_host: 10.0.1.11
        k8s2:
          ansible_host: 10.0.1.12
        k8s3:
          ansible_host: 10.0.1.13
      vars:
        ansible_ssh_common_args: '-J root@xxx.xxx.xxx.xxx'
    storage:
      hosts:
        storage:
          ansible_host: 10.0.1.21
      vars:
        ansible_ssh_common_args: '-J root@xxx.xxx.xxx.xxx'
Notice the ansible_ssh_common_args config in the file, which tunnels all Ansible connections through the public bastion/proxy host to the other machines, since they’re not directly reachable.
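A quick way to confirm the jump config works before running anything substantial is Ansible’s ping module against every host:

```shell
ansible all -i hosts.yml -m ping
```

Each host should come back with "pong" if the tunneling is set up correctly.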
I also needed to update my inventory file like so (the storage playbook will be added in the next section):
- hosts: proxy
  become: true
  become_user: root
  gather_facts: true
  roles:
    - proxy

- hosts: k8s
  become: true
  become_user: root
  gather_facts: true
  roles:
    - k8s

- hosts: storage
  become: true
  become_user: root
  gather_facts: true
  roles:
    - storage
Before I could actually configure the cluster there were a few preparatory bits of config to apply to the servers via the playbooks mentioned, namely connecting the servers to the NAT (and making sure the config persisted), and installing some required packages with basic config.
The key tasks for the Kubernetes server playbook (roles/k8s/main.yml) looked something like this:
# Configure the NAT
- name: Ensure a default route via 10.0.0.1
  vars:
    default_gateway: 10.0.0.1
  block:
    - name: Check current default gateway
      ansible.builtin.command: ip route get 8.8.8.8
      register: current_route
      changed_when: false
      ignore_errors: true

    - name: Add default route if missing or different
      ansible.builtin.command: ip route add default via {{ default_gateway }}
      when: >
        (current_route.stdout is not defined) or
        (default_gateway not in current_route.stdout)

- name: Ensure DNS servers are present in systemd-resolved
  vars:
    dns_servers: "xxx.xxx.xxx.1 xxx.xxx.xxx.2"
  block:
    - name: Set DNS line in /etc/systemd/resolved.conf
      ansible.builtin.ini_file:
        path: /etc/systemd/resolved.conf
        section: Resolve
        option: DNS
        value: "{{ dns_servers }}"
      register: dns_line

    - name: Reload systemd-resolved so the new DNS takes effect
      ansible.builtin.service:
        name: systemd-resolved
        state: restarted
        enabled: yes
      when: dns_line is changed

- name: Copy enp7s0 network config
  ansible.builtin.copy:
    src: 10-enp7s0.network
    dest: /etc/systemd/network/10-enp7s0.network
    owner: root
    group: root
    mode: '0644'

# Update Apt Cache
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 3600

# Install Key Packages
- name: Install useful packages
  apt:
    name: ['ufw', 'nfs-common']

# UFW
- name: Allow internal only
  ufw:
    rule: allow
    src: '{{ item }}'
  with_items:
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16

- name: Enable UFW
  ufw:
    state: enabled
The internal-only UFW rules are very broad, covering all the private IP ranges, and could be tightened up, but they’re acceptable here to keep the setup simple and avoid accidentally blocking access.
The key networking file being copied this time is the NAT gateway config below.
Note as well the installation of the nfs-common package, so the Kubernetes cluster can properly access an NFS volume on the storage server.
[Match]
Name=enp7s0
[Route]
Destination=0.0.0.0/0
Gateway=10.0.0.1
Now the servers were prepared, I could set about getting the cluster itself running. As I opted to use an Ansible Galaxy collection, there was very little to do beyond installing the collection, adding some very basic config, and then running it.
To install the collection I ran this:
ansible-galaxy collection install git+https://github.com/k3s-io/k3s-ansible.git
I then created a very simple hosts file (cluster.yml) specifically for managing the cluster, matching what the collection expects. As this is a small cluster, all 3 nodes are defined in the server section and will be control plane nodes; a larger cluster would want some agent-only nodes as well.
k3s_cluster:
  children:
    server:
      hosts:
        10.0.1.11:
        10.0.1.12:
        10.0.1.13:
      vars:
        ansible_ssh_common_args: '-J root@xxx.xxx.xxx.xxx'
  vars:
    ansible_user: root
    k3s_version: v1.35.2+k3s1
    api_endpoint: "{{ hostvars[groups['server'][0]]['ansible_host'] | default(groups['server'][0]) }}"
The K3s version should be updated to a suitable current version as this is almost certainly out of date. The versions available can be found here: https://github.com/k3s-io/k3s/releases
With this in place deploying the cluster was just a matter of running:
ansible-playbook k3s.orchestration.site -i cluster.yml
NOTE: For any updates in the future (say an upgrade of the K8s version) a slightly different command is needed:
ansible-playbook k3s.orchestration.upgrade -i cluster.yml
The collection does take a while to run, but hopefully after it’s successfully completed you have a small Kubernetes cluster ready to go!
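To actually manage the cluster from my local machine I still needed a kubeconfig. K3s writes one to /etc/rancher/k3s/k3s.yaml on each server node, pointing at 127.0.0.1; something like the following pulls it down via the bastion and repoints it (this assumes the SSH jump setup described earlier, and that kubectl traffic to 10.0.1.11 is routed through a tunnel of some kind, e.g. ssh -L or sshuttle):

```shell
# Copy the kubeconfig from the first control plane node, via the bastion
scp -o ProxyJump=root@xxx.xxx.xxx.xxx \
    root@10.0.1.11:/etc/rancher/k3s/k3s.yaml ~/.kube/config

# The file references 127.0.0.1; point it at the node instead
sed -i 's/127.0.0.1/10.0.1.11/' ~/.kube/config

# With a tunnel in place, confirm all three nodes are Ready
kubectl get nodes
```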
Data Storage
With the cluster all set up my penultimate task was to configure the server used for storage, namely a Postgres database and an NFS server for any volumes I might want to attach to the cluster.
Note that as mentioned above if you have no need for a database or volumes you can skip this entirely.
The key tasks for the Storage server playbook (roles/storage/main.yml) looked something like this:
# Configure the NAT
- name: Ensure a default route via 10.0.0.1
  ...

- name: Ensure DNS servers are present in systemd-resolved
  ...

- name: Copy enp7s0 network config
  ...

# Update Apt Cache
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 3600

# Install Key Packages
- name: Install useful packages
  apt:
    name: ['nfs-kernel-server', 'postgresql', 'postgresql-contrib', 'python3-psycopg2']

# Create volumes folder
- name: Create volumes directory
  file:
    path: /srv/volumes
    state: directory

# Postgres Config
- name: Copy Postgres main config
  ansible.builtin.copy:
    src: postgresql.conf
    dest: /etc/postgresql/16/main/postgresql.conf
    owner: postgres
    group: postgres
    mode: '0644'
  notify:
    - restart postgresql

- name: Copy Postgres HBA config
  ansible.builtin.copy:
    src: pg_hba.conf
    dest: /etc/postgresql/16/main/pg_hba.conf
    owner: postgres
    group: postgres
    mode: '0640'
  notify:
    - restart postgresql

- name: Copy Postgres Ident config
  ansible.builtin.copy:
    src: pg_ident.conf
    dest: /etc/postgresql/16/main/pg_ident.conf
    owner: postgres
    group: postgres
    mode: '0640'
  notify:
    - restart postgresql

# Postgres Databases
- name: Create database
  become: true
  become_user: postgres
  community.postgresql.postgresql_db:
    name: database1
    encoding: UTF-8
    lc_collate: en_US.UTF-8
    lc_ctype: en_US.UTF-8
    template: template0
  notify:
    - restart postgresql

# Postgres Users
- include_vars: db_passwords.yml

- name: Create user
  become: true
  become_user: postgres
  community.postgresql.postgresql_user:
    db: database1
    name: user1
    password: '{{ user1_password }}'
    priv: ALL
  notify:
    - restart postgresql

- name: Strip unnecessary database user permissions
  become: true
  become_user: postgres
  community.postgresql.postgresql_user:
    name: user1
    role_attr_flags: NOSUPERUSER,NOCREATEDB
  notify:
    - restart postgresql

- name: Grant CREATE on public schema to db users
  become: true
  become_user: postgres
  community.postgresql.postgresql_privs:
    db: database1
    type: schema
    privs: CREATE
    objs: public
    roles: user1

# NFS Folders
- name: Create example folder
  ansible.builtin.file:
    path: "/srv/volumes/example"
    state: directory
    owner: nobody
    group: nogroup
    mode: '0777'

- name: Export example folder via NFS
  ansible.builtin.lineinfile:
    path: /etc/exports
    line: "/srv/volumes/example 10.0.1.0/24(rw,sync,no_subtree_check)"
    create: yes
    state: present
  notify:
    - reload nfs
The NAT config is exactly the same as for the Kubernetes servers, but snipped out in the config above to save repetition.
For security reasons I’ve excluded the 3 Postgres config files used, but as elsewhere there’s nothing particularly novel in them, other than making sure there’s an entry in pg_hba.conf to allow access from local network addresses, which looks like this:
host all all 10.0.1.0/24 scram-sha-256
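On the cluster side, the NFS export can then be consumed with a plain NFS PersistentVolume and a matching claim. A minimal sketch (the names example-pv/example-pvc and the 5Gi size are hypothetical; the server IP and path match the storage setup above):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.1.21
    path: /srv/volumes/example
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi
```

The empty storageClassName makes the claim bind statically to the pre-created volume rather than asking a provisioner for one.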
NGINX Proxy
With everything else now set up and ready, I could configure NGINX to start properly accepting and forwarding traffic to the Kubernetes cluster.
To do this we’ll need to add a bit more to the Ansible Proxy playbook (roles/proxy/main.yml):
# Configure NGINX
- name: Configure nginx
  copy:
    src: nginx.conf
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  notify:
    - restart nginx

- name: Remove default sites-enabled entry
  file:
    path: /etc/nginx/sites-enabled/default
    state: absent
  notify:
    - restart nginx

- name: Configure sites-available
  copy:
    src: 'sites-available/{{ item }}'
    dest: '/etc/nginx/sites-available/{{ item }}'
    owner: root
    group: root
    mode: '0644'
  notify:
    - restart nginx
  with_items:
    - example.com

- name: Configure folders
  file:
    path: '/var/www/{{ item }}'
    state: directory
  with_items:
    - example.com

- name: Enable sites
  file:
    src: '/etc/nginx/sites-available/{{ item }}'
    dest: '/etc/nginx/sites-enabled/{{ item }}'
    owner: root
    group: root
    state: link
  notify:
    - restart nginx
  with_items:
    - example.com
This sets up some default NGINX configuration, plus an example site with the domain example.com. Obviously swap that out for whatever domain you’re working with; you can add multiple domains to the list as you wish.
The contents of the nginx.conf file are pretty straightforward, mostly following defaults and best practices to avoid a few common issues:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
}

http {
    # Basic Settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging Settings
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    # Default
    server {
        listen 80 default_server;
        server_name _;

        location / {
            return 444;
        }
    }

    upstream backend {
        server 10.0.1.11:80;
        server 10.0.1.12:80;
        server 10.0.1.13:80;
    }

    # Virtual Host Configs
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
You’ll notice I’m only dealing with http here. Setting up HTTPS could be a whole post on its own as there are various things to consider. You’ll almost certainly want to set up TLS config and certificate renewal processes for a complete production environment.
The most important lines from a Kubernetes point of view are in the backend upstream section which points to the Kubernetes servers we’ve set up.
The final file needed, for the example.com site itself, is this:
# Main Site
server {
    listen 80;
    listen [::]:80;

    server_name example.com www.example.com;
    root /var/www/example.com;

    gzip on;

    access_log /var/log/nginx/example.access.log;
    error_log /var/log/nginx/example.error.log;

    location / {
        proxy_pass http://backend;
        include /etc/nginx/proxy_params;
    }
}
This is a very minimal setup for an HTTP website: it just forwards all traffic to the backend defined in the main config, and logs to files specific to this site to aid in debugging issues.
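The included /etc/nginx/proxy_params file ships with the Ubuntu nginx package and sets the usual forwarding headers, roughly like this (check your own install for the exact contents):

```nginx
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
```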
Summary
This might seem like a lot, but if you follow these steps, with a bit of filling in of gaps I’ve had to remove for security reasons, you can have your own cheap Kubernetes cluster up and running pretty easily. For a personal or otherwise low-traffic setup, it strikes a nice balance between cost, control, and operational complexity.
The cluster will have:
- Infrastructure defined as code using Pulumi and Ansible for reproducible deployments
- Private internal networking with NAT via the Bastion so only one machine is directly exposed to the internet
- A reasonably secure management access method via a single Bastion host
- A minimal Kubernetes cluster with the recommended minimum 3 nodes to form a proper quorum
- Some basic data storage setup for both files and databases
- A minimal HTTP NGINX proxy to direct traffic to the cluster
- Some basic additional security hardening with UFW and Fail2Ban
- Plenty of room for easily expanding according to needs
All costing ~£30/month assuming availability of Hetzner Cloud servers.