Quick repo update: the www_k3s GitHub repo just got a significant firewall overhaul. The old setup used classic iptables, which worked fine but was starting to feel like driving a stick-shift with three pedals. Time to upgrade. The firewall is now fully rewritten in nftables, the modern successor to iptables: merged into the kernel in Linux 3.13 and the default firewall framework on most major distros since around 2019. Here’s why that’s the right call, and a full walkthrough of what the new ruleset actually does. 🧵
⚡ nftables vs. iptables: why it matters
iptables has been the standard Linux packet filter since the late 90s. It works, it’s battle-tested, and approximately one billion Stack Overflow answers reference it. But it’s also showing its age:
- No atomic updates for multi-step changes: a firewall script runs one iptables command after another, so there’s a brief window where your ruleset is only partially applied
- Separate tools for IPv4/IPv6: iptables vs. ip6tables, two separate rule stores to keep in sync
- No native sets: IP blocklists need a separate tool like ipset
- Performance: rule matching is linear, so the more rules a packet has to walk past, the slower it gets
nftables (introduced in Linux 3.13, default on most distros since ~2019) fixes all of this:
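- ✅ Single ruleset for IPv4 + IPv6 (table inet)
- ✅ Atomic rule replacement: the whole ruleset is loaded as one netlink transaction, so it’s either fully applied or not at all
- ✅ Native sets with timeout support (built-in IP blocklists, no ipset needed)
- ✅ Better scalability: rules compile to bytecode for a small in-kernel virtual machine, and sets and maps turn long lists of near-identical rules into a single lookup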
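To make the sets point concrete, here’s a toy Python model (not kernel code). It compares an iptables-style chain that has one rule per blocked address against a single rule that consults a set. The 10,000 addresses from the 198.18.0.0/15 benchmarking range are made up for illustration. Real nftables sets use hash- or tree-based backends, and Python’s built-in set stands in for those here.

```python
import ipaddress

# Made-up blocklist: 10,000 consecutive addresses from 198.18.0.0/15.
BLOCKED = [str(ipaddress.ip_address("198.18.0.0") + i) for i in range(10_000)]
BLOCKED_SET = set(BLOCKED)


def linear_chain_lookup(addr: str, rules: list) -> tuple:
    """iptables-style: one `-s <addr> -j DROP` rule per address, walked top to bottom.
    Returns (matched, rules_evaluated)."""
    for evaluated, rule_addr in enumerate(rules, start=1):
        if rule_addr == addr:
            return True, evaluated
    return False, len(rules)


def set_lookup(addr: str, blocked: set) -> tuple:
    """nftables-style: a single `ip saddr @set drop` rule; membership is one lookup."""
    return addr in blocked, 1


print(linear_chain_lookup("8.8.8.8", BLOCKED))  # (False, 10000): clean traffic walks every rule
print(set_lookup("8.8.8.8", BLOCKED_SET))       # (False, 1)
```

The worst case is the important one. Traffic that is *not* blocked, which is almost all of it, has to pass every rule in the linear version.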
On AlmaLinux 9, nftables is the system default, and the iptables command is just a compatibility layer (iptables-nft) that programs nftables under the hood. So writing iptables syntax in 2025 means your rules get translated into nftables rules anyway; you’re just giving up the nicer syntax, and features like sets and the inet family, in the process.
The ruleset, section by section
The firewall is an Ansible Jinja2 template (roles/common_firewall/templates/fw.nft.j2) that gets rendered per host and applied via nftables. Let’s go through it.
The blocklist sets
set banned4 {
    type ipv4_addr
    flags timeout
    timeout 1d
}
set banned6 {
    type ipv6_addr
    flags timeout
    timeout 1d
}
Two sets, one for IPv4 and one for IPv6, act as a dynamic blocklist. flags timeout lets elements expire, and timeout 1d sets the default lifetime, so every entry added without its own timeout auto-expires after 24 hours. fail2ban writes into these sets when it detects brute-force or port-scan activity. No cron job needed to clean up stale bans. 🧹
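The expiry semantics fit in a few lines of Python. This is a toy stand-in, not how the kernel implements it. The class name, the `now` parameter (seconds on some clock) and the per-element `timeout` argument are illustrative assumptions. The part that mirrors nftables is that an element added without its own timeout inherits the set’s default.

```python
from typing import Dict, Optional

ONE_DAY = 24 * 60 * 60  # `timeout 1d`, in seconds


class TimeoutSet:
    """Toy model of an nftables set declared with `flags timeout` and `timeout 1d`."""

    def __init__(self, default_timeout: int = ONE_DAY):
        self.default_timeout = default_timeout
        self._expires: Dict[str, float] = {}  # element -> absolute expiry time

    def add(self, addr: str, now: float, timeout: Optional[int] = None) -> None:
        # Elements without an explicit timeout inherit the set-wide default.
        ttl = self.default_timeout if timeout is None else timeout
        self._expires[addr] = now + ttl

    def contains(self, addr: str, now: float) -> bool:
        expiry = self._expires.get(addr)
        if expiry is None:
            return False
        if now >= expiry:
            # Expired elements simply disappear; no cleanup job required.
            del self._expires[addr]
            return False
        return True


banned4 = TimeoutSet()
banned4.add("203.0.113.7", now=0)                # ban with the 1-day default
banned4.add("198.51.100.9", now=0, timeout=600)  # ban with a 10-minute override
print(banned4.contains("203.0.113.7", now=3600))   # True: still within the day
print(banned4.contains("198.51.100.9", now=3600))  # False: its 10 minutes are up
```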
Chain: INPUT (the main event)
Default policy is DROP: anything not explicitly allowed is silently discarded.
iif lo accept
Loopback traffic (127.0.0.0/8, ::1) is always accepted. Host-internal communication over localhost (kubectl, kubelet, Prometheus scraping, TCP connections to local databases) goes through here. Unix domain sockets bypass netfilter entirely.
ct state established,related accept
ct state invalid drop
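Connection tracking does the heavy lifting. Packets that belong to an established connection, or are related to one (such as ICMP errors about it), are accepted without checking further rules. Packets in the invalid state are dropped immediately. Conntrack can’t fit these into any valid connection, for example a TCP segment whose sequence numbers fall outside the window.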
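The first-match behavior at the top of this chain can be sketched as a plain function. This is a simplified Python model, not nft itself. Packets are reduced to an input interface and a conntrack state, "eth0" is a hypothetical interface name, and everything below the conntrack rules is collapsed into the chain’s DROP policy.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    iif: str       # input interface, e.g. "lo" or a hypothetical "eth0"
    ct_state: str  # "new", "established", "related" or "invalid"


def input_verdict(pkt: Packet) -> str:
    """First matching rule wins; if nothing matches, the chain policy (drop) applies."""
    if pkt.iif == "lo":
        return "accept"  # iif lo accept
    if pkt.ct_state in ("established", "related"):
        return "accept"  # ct state established,related accept
    if pkt.ct_state == "invalid":
        return "drop"    # ct state invalid drop
    # Only packets starting a new connection get past this point. In the real
    # chain they'd meet the flag checks, blocklists and port rules next; this
    # sketch lets none of them match, so the policy decides.
    return "drop"        # policy drop


for pkt in (Packet("lo", "invalid"), Packet("eth0", "established"),
            Packet("eth0", "invalid"), Packet("eth0", "new")):
    print(pkt, input_verdict(pkt))
```

The ordering is the point. Loopback wins even over the invalid check, and established traffic leaves early. As a result, the more expensive rules below essentially only see packets that open new connections.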
tcp flags & (fin|syn|rst|psh|ack|urg) == fin|psh|urg drop # XMAS (nmap -sX)
tcp flags & (fin|syn|rst|psh|ack|urg) == fin|syn|rst|psh|ack|urg drop # all flags set
tcp flags & (fin|syn|rst|psh|ack|urg) == 0x0 drop # NULL (nmap -sN)
Classic malformed-packet drops. An XMAS scan (nmap -sX) lights up FIN, PSH and URG at once, and some scanners go further and set every flag. Neither combination occurs in legitimate traffic. NULL packets (nmap -sN) have no flags at all, same deal. Both are fingerprinting and evasion tricks. Drop ’em. 🚫
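nft’s `tcp flags & (mask) == value` is a plain bitmask test, which is easy to check in Python. The sketch below uses the standard TCP flag bit values, and the helper name is illustrative. It shows why the comparison value matters. It also shows that the mask keeps the ECN bits (ECE, CWR) from affecting any of the checks.

```python
# TCP flag bits as they appear in the header's flags byte.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
MASK = FIN | SYN | RST | PSH | ACK | URG  # the (fin|syn|rst|psh|ack|urg) part


def flags_match(flags: int, expected: int, mask: int = MASK) -> bool:
    """Python equivalent of nft's `tcp flags & (mask) == expected`."""
    return (flags & mask) == expected


ALL_FLAGS = MASK             # every flag in the mask set
NMAP_XMAS = FIN | PSH | URG  # what `nmap -sX` sends
NMAP_NULL = 0x0              # what `nmap -sN` sends

print(flags_match(NMAP_XMAS, ALL_FLAGS))        # False: an all-flags test misses -sX
print(flags_match(NMAP_XMAS, FIN | PSH | URG))  # True
print(flags_match(SYN | ECE | CWR, SYN))        # True: ECN bits are masked off
```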
ct state new tcp flags & (syn|ack) == syn|ack reject with tcp reset
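Anti-spoofing: a SYN-ACK arriving as a new connection is a reply to a handshake this host never started. Typically that’s backscatter from someone spoofing our address in SYNs sent to a third party, or a scanner probing. Rejecting it with a TCP RST (which is what TCP itself does for segments that match no connection) tells the other end to tear down the half-open connection. That way our address can’t be used to complete a spoofed handshake.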
tcp flags & (syn|ack|fin|rst) == rst limit rate 1/second accept
tcp flags & (syn|ack|fin|rst) == rst drop
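RST flood protection. RST packets are legitimate (they abort connections), but a flood of them is a DoS technique. RSTs on connections conntrack already knows about were accepted by the established rule above, so this pair only throttles stray ones. About 1 per second gets through and the rest are dropped.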
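nft’s `limit rate` is a token-bucket limiter. Here’s a minimal Python token bucket under stated assumptions. The class is illustrative, the burst size is a parameter you pick (nft has its own default burst; see nft(8)), and time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Minimal token bucket: refills `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # within the rate: the `accept` rule matches
        return False      # over the rate: falls through to the `drop` rule


# A flood of 10 stray RSTs within one second, with rate 1/second and burst 1.
bucket = TokenBucket(rate=1, burst=1)
verdicts = [bucket.allow(i / 10) for i in range(10)]
print(verdicts.count(True))  # 1
```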
ip saddr @banned4 drop
ip6 saddr @banned6 drop
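Check the dynamic blocklist sets from fail2ban. These are placed before the K3s exception rules so a banned IP can never sneak through via the pod CIDR exemptions below. (Connections that were already established before the ban are still let through by the conntrack rule above.)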
{% if 'nextcloud' in group_names %}
ip saddr {{ k3s_pod_cidr }} tcp dport 3306 accept
ip saddr {{ k3s_pod_cidr }} tcp dport 6379 accept
{% endif %}
A Jinja2 conditional: this block only appears on Nextcloud hosts. On Nextcloud servers, MariaDB and Redis run as host services (not inside K3s), so pods have to reach them via the host IP. On WordPress hosts, MariaDB runs inside the cluster as a pod, so these rules are unnecessary and are left out.
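The template logic boils down to a membership test on the host’s Ansible groups. Below is a plain-Python stand-in for that Jinja2 block, so it runs without Jinja2 installed. The function name and the inventory group names are hypothetical. 10.42.0.0/16 is assumed as the pod CIDR because it’s K3s’s default cluster CIDR.

```python
from typing import List


def nextcloud_db_rules(group_names: List[str], k3s_pod_cidr: str) -> List[str]:
    """Mirror of `{% if 'nextcloud' in group_names %} ... {% endif %}`:
    returns the nft lines that block would render for one host."""
    if "nextcloud" not in group_names:
        return []  # e.g. WordPress hosts: MariaDB is a pod, nothing to open
    return [
        f"ip saddr {k3s_pod_cidr} tcp dport 3306 accept",  # MariaDB on the host
        f"ip saddr {k3s_pod_cidr} tcp dport 6379 accept",  # Redis on the host
    ]


for groups in (["nextcloud"], ["wordpress"]):  # hypothetical inventory groups
    print(groups, nextcloud_db_rules(groups, "10.42.0.0/16"))
```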
ip saddr {{ server_ipv4 }} tcp dport { 3000, 10254 } accept
ingress-nginx runs with hostNetwork: true, which means it lives in the host’s network namespace. When it forwards a request to Grafana (port 3000) through a Kubernetes Service that points at the host, the source IP is the host’s own public IP, not a pod IP. The same goes for the kubelet’s health probes against ingress-nginx’s own health endpoint on port 10254. So we allow the host’s own IP, but only for these two specific ports rather than a blanket accept. 🎯
ip saddr {
    0.0.0.0/8, 10.0.0.0/8, 127.0.0.0/8, 169.254.0.0/16,
    172.16.0.0/12, 192.168.0.0/16, 224.0.0.0/4, 240.0.0.0/4
} drop
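Drop spoofed sources. The set covers the three RFC 1918 private ranges. It also covers other addresses that should never appear as a source on the public internet: this-network (0.0.0.0/8), loopback, link-local, multicast and the reserved class E space. If a packet claims to come from 10.x.x.x or 192.168.x.x over the public internet, it’s spoofed. The rule has no interface match, so it applies to everything that reaches this point. Loopback traffic was already accepted at the top. The K3s pod CIDR exceptions above are placed before this block precisely so legitimate pod traffic, which lives in 10.x by default, isn’t caught here.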
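Rule order is what makes the pod exception work, because K3s’s default pod CIDR (10.42.0.0/16, assumed here) sits inside 10.0.0.0/8. The Python sketch below models just these two rules with the standard `ipaddress` module. The function and its `pod_exceptions_first` switch are hypothetical, included only to show what would happen if the order were flipped.

```python
import ipaddress

# RFC 1918 part of the bogon set (the real rule also lists 0.0.0.0/8, loopback,
# link-local, multicast and the class E range).
PRIVATE = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
POD_CIDR = ipaddress.ip_network("10.42.0.0/16")  # assumed: K3s default cluster CIDR


def verdict(src: str, dport: int, pod_exceptions_first: bool = True) -> str:
    addr = ipaddress.ip_address(src)
    rules = [
        ("accept", addr in POD_CIDR and dport in (3306, 6379)),  # Nextcloud DB exception
        ("drop", any(addr in net for net in PRIVATE)),            # spoofed-source drop
    ]
    if not pod_exceptions_first:
        rules.reverse()
    for action, matched in rules:  # first match wins, like an nft chain
        if matched:
            return action
    return "continue"  # neither rule matched; evaluation moves on down the chain


print(POD_CIDR.subnet_of(ipaddress.ip_network("10.0.0.0/8")))  # True
print(verdict("10.42.1.5", 3306))                               # accept
print(verdict("10.42.1.5", 3306, pod_exceptions_first=False))   # drop
```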
icmp type { destination-unreachable, echo-request, time-exceeded, parameter-problem } accept
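Allow only the ICMP types we actually need. echo-request lets the host answer ping, and time-exceeded is what our own traceroutes get back. destination-unreachable also carries the “fragmentation needed” messages that Path MTU Discovery relies on. parameter-problem reports malformed headers. Everything else (redirect, router-advertisement, etc.) is dropped.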
icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem,
    nd-router-advert, nd-router-solicit,
    nd-neighbor-solicit, nd-neighbor-advert,
    echo-request, echo-reply } accept
ICMPv6 needs a longer allowlist because IPv6 fundamentally depends on it. Neighbor Discovery Protocol (NDP), the IPv6 equivalent of ARP, runs over ICMPv6. IPv6 routers never fragment packets, so packet-too-big is what keeps Path MTU Discovery working. Without these, IPv6 simply doesn’t work.
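For reference, here are the IANA type numbers behind those names as a small Python lookup. The dictionaries and the helper are illustrative, and each value is nft’s keyword for that type. The redirect types (5 for ICMP, 137 for ICMPv6) are deliberately absent.

```python
# IANA type numbers for the allowed ICMP types, with nft's keyword for each.
ICMPV4_ALLOWED = {
    3: "destination-unreachable",  # code 4 = "fragmentation needed" (Path MTU Discovery)
    8: "echo-request",
    11: "time-exceeded",
    12: "parameter-problem",
}
ICMPV6_ALLOWED = {
    1: "destination-unreachable",
    2: "packet-too-big",
    3: "time-exceeded",
    4: "parameter-problem",
    128: "echo-request",
    129: "echo-reply",
    133: "nd-router-solicit",
    134: "nd-router-advert",
    135: "nd-neighbor-solicit",
    136: "nd-neighbor-advert",
}


def icmp_allowed(ip_version: int, icmp_type: int) -> bool:
    allowed = ICMPV4_ALLOWED if ip_version == 4 else ICMPV6_ALLOWED
    return icmp_type in allowed


print(icmp_allowed(4, 8), icmp_allowed(4, 5))      # ping yes, redirect no
print(icmp_allowed(6, 135), icmp_allowed(6, 137))  # NDP solicit yes, redirect no
```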
ip daddr {{ server_ipv4 }} tcp dport { 80, 443 } ct state new accept
ip daddr {{ server_ipv4 }} udp dport 443 ct state new accept
ip6 daddr {{ server_ipv6 }} tcp dport { 80, 443 } accept
ip6 daddr {{ server_ipv6 }} udp dport 443 accept
Open HTTP, HTTPS, and UDP/443 (QUIC, which HTTP/3 runs over) for the web server, on both IPv4 and IPv6. The UDP rule is what enables HTTP/3: the server advertises it in an Alt-Svc response header, and browsers switch over for subsequent connections.
ip daddr {{ server_ipv4 }} tcp dport 10022 ct state new accept
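SSH on a non-standard port (IPv4 only). Port 22 is just noise at this point: automated scanners hammer it nonstop, and moving off it cuts most of that out of the logs.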
limit rate 2/minute log prefix "nftables DROP IN: " level warn flags all
Rate-limited logging of everything else before the policy drop. You get visibility into what’s being blocked without flooding the kernel log. The prefix makes it easy to grep for in journald.
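Once those lines land in the kernel log, a few lines of Python can turn them into a per-source summary. This is a sketch. It assumes the usual `KEY=value` layout of netfilter log messages (SRC=, PROTO=, DPT=, ...), and the sample lines below are made up. In practice you’d feed it the output of `journalctl -k`.

```python
import re
from collections import Counter

PREFIX = "nftables DROP IN: "
FIELD = re.compile(r"\b(SRC|PROTO|DPT)=(\S+)")


def summarize(lines):
    """Count logged drops per (source, protocol, destination port)."""
    counts = Counter()
    for line in lines:
        if PREFIX not in line:
            continue  # not one of our firewall log lines
        fields = dict(FIELD.findall(line))
        counts[(fields.get("SRC"), fields.get("PROTO"), fields.get("DPT"))] += 1
    return counts


# Made-up sample lines in the kernel's netfilter log format.
SAMPLE = [
    "kernel: nftables DROP IN: IN=eth0 OUT= SRC=198.51.100.23 DST=203.0.113.10 LEN=44 PROTO=TCP SPT=40000 DPT=23 SYN",
    "kernel: nftables DROP IN: IN=eth0 OUT= SRC=198.51.100.23 DST=203.0.113.10 LEN=44 PROTO=TCP SPT=40001 DPT=23 SYN",
    "kernel: nftables DROP IN: IN=eth0 OUT= SRC=192.0.2.8 DST=203.0.113.10 LEN=84 PROTO=ICMP TYPE=13 CODE=0",
    "kernel: eth0: link up",
]

for key, n in summarize(SAMPLE).most_common():
    print(n, *key)
```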
Chain: FORWARD
K3s manages its own KUBE-* chains for service routing; this chain just needs to pass pod-to-pod and pod-to-service traffic within the pod CIDR, and block everything else. The blocklist sets are checked here too, so banned IPs can’t exploit forwarded traffic.
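The explicit accept in this chain matters because several tables hook into forward (this one and kube-proxy’s). nftables runs each base chain in priority order. An accept only ends that one chain, while a drop in any of them is final, so kube-proxy’s accepts can’t override this chain’s DROP policy. Here’s a toy Python model of that. `our_forward` and `kube_forward` are simplified stand-ins for the two chains, and 10.42.0.0/16 is assumed as K3s’s default pod CIDR. By the time traffic reaches forward, kube-proxy’s DNAT has normally already rewritten a Service IP to a pod IP. That is why checking the pod CIDR covers pod-to-service traffic too.

```python
import ipaddress
from typing import Callable, Dict, List

Packet = Dict[str, str]
Chain = Callable[[Packet], str]  # returns "accept" or "drop"

POD_CIDR = ipaddress.ip_network("10.42.0.0/16")  # assumed K3s default pod CIDR


def our_forward(pkt: Packet) -> str:
    """This table's forward chain, simplified: pod-to-pod accept, policy drop."""
    src = ipaddress.ip_address(pkt["src"])
    dst = ipaddress.ip_address(pkt["dst"])
    return "accept" if src in POD_CIDR and dst in POD_CIDR else "drop"


def kube_forward(pkt: Packet) -> str:
    """Stand-in for kube-proxy's chains, which accept the traffic they manage."""
    return "accept"


def hook_verdict(chains: List[Chain], pkt: Packet) -> str:
    """Base chains on one hook run in priority order. An accept only ends the
    current chain; a drop in any chain is final."""
    for chain in chains:
        if chain(pkt) == "drop":
            return "drop"
    return "accept"


print(hook_verdict([our_forward, kube_forward], {"src": "10.42.0.5", "dst": "10.42.1.7"}))     # accept
print(hook_verdict([our_forward, kube_forward], {"src": "10.42.0.5", "dst": "198.51.100.1"}))  # drop
```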
📤 Chain: OUTPUT
Outbound traffic is restricted too, with a default DROP policy. Allowed: DNS, NTP, HTTP/HTTPS for package downloads and ACME cert renewal, SMTP for mail delivery, the K3s API server on 6443, and outbound QUIC. Everything else, including random outbound connections that malware might try to make, is dropped and logged.
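An egress allowlist like this is essentially a set of (protocol, port) pairs. The Python sketch below is hypothetical in its details. The port numbers are the standard ones for the services listed, and since which SMTP port the repo opens isn’t stated, both 25 and 587 are included. The early accept for established/related traffic is also an assumption: a default-DROP output chain needs something like it so that replies to inbound connections can get out.

```python
# Hypothetical egress allowlist of (protocol, destination port) pairs.
OUTPUT_ALLOWED = {
    ("udp", 53), ("tcp", 53),    # DNS
    ("udp", 123),                # NTP
    ("tcp", 80), ("tcp", 443),   # package mirrors, ACME
    ("udp", 443),                # outbound QUIC
    ("tcp", 25), ("tcp", 587),   # SMTP (assumed: either port may be the one in use)
    ("tcp", 6443),               # K3s API server
}


def output_verdict(proto: str, dport: int, ct_state: str = "new") -> str:
    # Assumed early accept so replies to inbound connections can leave the host.
    if ct_state in ("established", "related"):
        return "accept"
    return "accept" if (proto, dport) in OUTPUT_ALLOWED else "drop"  # policy drop


print(output_verdict("tcp", 443))                   # accept
print(output_verdict("tcp", 4444))                  # drop: not on the list
print(output_verdict("tcp", 51515, "established"))  # accept: reply to an inbound connection
```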
Wrap-up
The nftables ruleset does a lot of work in relatively few lines. You get a single file, dual-stack IPv4/IPv6, native dynamic blocklists, connection tracking, and Kubernetes-aware exceptions, all in one atomic unit that gets applied cleanly on every Ansible run. That’s the kind of thing that makes you appreciate having infrastructure-as-code.
The repo is at github.com/aptupgrademe/www_k3s if you want to dig in.
