For a few years now, I’ve been running a small Kubernetes cluster to manage a few services, including this blog.
My setup has always had a strong focus on cost savings and efficient resource usage, so I avoided anything that didn’t offer spending limits or predictable costs. It is just a personal hobby project, after all.
After a few iterations of this setup on DigitalOcean and acquiring some dedicated hardware, I eventually moved to hosting the cluster at home.
Setting up and maintaining a Kubernetes cluster is not a trivial task and requires some level of planning. However, once everything is up and running, it’s generally a very pleasant setup to work with.
One part of the puzzle is storage. For clusters hosted on a large provider like AWS or Google Cloud, you generally have a few storage options available like EBS and S3. With my homelab setup, however, I had to look for options that I could run locally.
Rook is a project that I’ve been following and using for a while to accomplish this. It allows you to run a storage layer on your cluster and provides multiple interfaces: Block Storage, Object Storage (i.e. S3-like), and Distributed Filesystems. It does all this by deploying and managing a Ceph cluster for you.
I’ve primarily used it for Block Storage, but more recently I’ve begun to use its Object Storage gateway (RGW, the RADOS Gateway), which lets you consume storage through an S3-compatible API.
When deployed, the default setup provides you with buckets accessible over a path-based API (e.g. {service_endpoint}/{bucket_name}). However, if you have used an object storage service from a cloud provider, you’ve likely noticed that they generally provide DNS-based bucket access (e.g. {bucket_name}.{service_endpoint}). Similarly, a lot of S3 clients seem to expect this DNS-based API.
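To make the two addressing styles concrete, here is a minimal Python sketch that builds the URL for the same object both ways (AWS calls them path-style and virtual-hosted-style). The bucket name example1 and key index.html are made up for illustration, and the functions only format strings; they don’t talk to any S3 endpoint.

```python
from urllib.parse import quote


def path_style_url(endpoint: str, bucket: str, key: str = "") -> str:
    # The bucket is the first path segment: https://endpoint/bucket/key
    return f"https://{endpoint}/{bucket}/{quote(key)}"


def virtual_hosted_url(endpoint: str, bucket: str, key: str = "") -> str:
    # The bucket is a subdomain of the endpoint: https://bucket.endpoint/key
    return f"https://{bucket}.{endpoint}/{quote(key)}"


if __name__ == "__main__":
    # "example1" and "index.html" are illustrative values only.
    print(path_style_url("buckets.chromabits.com", "example1", "index.html"))
    print(virtual_hosted_url("buckets.chromabits.com", "example1", "index.html"))
```

The second form is the one that needs wildcard DNS, a wildcard certificate, and a gateway that knows how to pull the bucket name out of the hostname.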
From reading the documentation for Ceph’s RGW, it seemed that there is some level of support for this, including serving static websites. So I set out to explore what it would take to get this working with Rook.
I eventually got this working using Rook 1.2 and Ceph Nautilus. Below are my notes on the steps I took.
DNS
For the DNS records themselves, I started by picking out a name for my storage service, and created two DNS entries:
- buckets.chromabits.com: Just points to one of my ingress nodes.
- *.buckets.chromabits.com: A wildcard that also points to my ingress nodes.
Once set up, I verified that random subdomains resolved correctly to my cluster’s ingress endpoints.
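A quick way to repeat this check is to resolve a few random, never-used subdomains. The sketch below assumes Python 3.9+ and uses only the standard library; the helper names are hypothetical, and it simply prints what each name resolves to so you can compare the result against your ingress node addresses.

```python
import secrets
import socket


def random_subdomain(base: str) -> str:
    # e.g. "3f9a1c7e.buckets.chromabits.com"; 4 random bytes -> 8 hex characters
    return f"{secrets.token_hex(4)}.{base}"


def resolve(host: str) -> set[str]:
    # Return every address the name resolves to, or an empty set if lookup fails.
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


if __name__ == "__main__":
    base = "buckets.chromabits.com"
    for _ in range(3):
        host = random_subdomain(base)
        addrs = resolve(host)
        print(f"{host}: {sorted(addrs) if addrs else 'DOES NOT RESOLVE'}")
```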
Certificates
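In order to serve buckets over HTTPS, I needed a wildcard certificate. Fortunately, this is straightforward to set up using cert-manager and Let’s Encrypt, as long as the issuer can complete a DNS-01 challenge, which Let’s Encrypt requires for wildcard certificates.
When creating the Certificate resource, I included the wildcard host in the list of dnsNames: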
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: buckets-chromabits-com
  namespace: rook-ceph
spec:
  secretName: buckets-chromabits-com-tls
  issuerRef:
    name: letsencrypt-prod
  dnsNames:
    - buckets.chromabits.com
    - '*.buckets.chromabits.com'
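One caveat worth keeping in mind: a wildcard entry in a certificate covers exactly one extra label. So example1.buckets.chromabits.com is covered, but a bucket name containing dots (which S3 naming rules allow) produces a hostname the certificate won’t match. The following is a simplified Python sketch of that matching rule, in the spirit of RFC 6125; it ignores partial-label wildcards and other edge cases that real TLS clients handle, and the function names are hypothetical.

```python
def cert_name_matches(pattern: str, hostname: str) -> bool:
    # Compare label by label, case-insensitively, ignoring a trailing dot.
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    # A leading "*" stands in for exactly one label, so label counts must match.
    if len(pattern_labels) != len(host_labels):
        return False
    first, *rest = pattern_labels
    if first == "*":
        return host_labels[0] != "" and rest == host_labels[1:]
    return pattern_labels == host_labels


def covered(dns_names: list[str], hostname: str) -> bool:
    # True if any of the certificate's dnsNames matches the hostname.
    return any(cert_name_matches(name, hostname) for name in dns_names)


if __name__ == "__main__":
    names = ["buckets.chromabits.com", "*.buckets.chromabits.com"]
    for host in ["buckets.chromabits.com",
                 "example1.buckets.chromabits.com",
                 "my.bucket.buckets.chromabits.com"]:
        print(host, covered(names, host))
```

With the dnsNames above, the first two hostnames match and the dotted bucket name does not, which is one reason to stick to dot-free bucket names when using DNS-based access over HTTPS.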
Ingress
Next is routing requests to the RGW service. I already had an ingress set up, so the main challenge was figuring out how to handle requests for a wildcard domain.
After some initial reading, it seemed that Kubernetes Ingress resources didn’t have a way to handle wildcard domains (wildcard hosts only arrived in the Ingress API with Kubernetes 1.18). However, after skimming through some issues on GitHub, I learned that it is possible to configure nginx-ingress to handle this case.
This is done through the nginx.ingress.kubernetes.io/server-alias annotation on the Ingress resource. I set nginx.ingress.kubernetes.io/server-alias to '*.buckets.chromabits.com' in the annotations, and the controller began handling requests for the subdomains.
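For reference, here is roughly what such an Ingress could look like, generated from Python as JSON (kubectl accepts JSON as well as YAML). This is a sketch under assumptions: the Service name rook-ceph-rgw-my-store, port 80, the Ingress name rgw, and the networking.k8s.io/v1beta1 API version are placeholders, so substitute whatever your Rook object store and cluster actually use. TLS relies on the Secret issued for the Certificate above, whose dnsNames cover both the apex and the wildcard.

```python
import json

# Hypothetical values: adjust to your own object store and cluster.
HOST = "buckets.chromabits.com"
WILDCARD = "*.buckets.chromabits.com"
RGW_SERVICE = "rook-ceph-rgw-my-store"  # assumed name of the Rook RGW Service
RGW_PORT = 80                           # assumed port exposed by that Service


def rgw_ingress() -> dict:
    return {
        "apiVersion": "networking.k8s.io/v1beta1",
        "kind": "Ingress",
        "metadata": {
            "name": "rgw",
            "namespace": "rook-ceph",
            "annotations": {
                # ingress-nginx adds this as an extra server_name on the
                # generated server block, so it also answers for the wildcard.
                "nginx.ingress.kubernetes.io/server-alias": WILDCARD,
            },
        },
        "spec": {
            "tls": [{"hosts": [HOST], "secretName": "buckets-chromabits-com-tls"}],
            "rules": [{
                "host": HOST,
                "http": {"paths": [{
                    "path": "/",
                    "backend": {"serviceName": RGW_SERVICE, "servicePort": RGW_PORT},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # Usage (hypothetical file name): python rgw_ingress.py | kubectl apply -f -
    print(json.dumps(rgw_ingress(), indent=2))
```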
Another option would be to modify the ingress manually every time a new bucket is created, but that doesn’t scale well and only seems feasible if you plan to have a small number of buckets.
Rook Configuration
The last step is to configure the RGW to handle requests for these domains.
The documentation mentions that a domain can be set with the rgw dns name option in the daemon’s configuration. However, this didn’t seem like a simple change to make using Rook, so I looked for alternatives.
I eventually learned that it is possible to specify one or more hostnames per zonegroup on the RGW without having to change global settings. So adding the hostnames I needed was just a matter of modifying the default zonegroup.
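Conceptually, once a hostname is registered, the gateway can treat the leftmost part of the Host header as the bucket name. The Python function below is a simplified model of that idea for illustration only; it is not RGW’s actual implementation, and the function name is hypothetical.

```python
from typing import Optional


def bucket_from_host(host: str, hostnames: list[str]) -> Optional[str]:
    # Normalize: drop any ":port" suffix, lowercase, strip a trailing dot.
    host = host.split(":", 1)[0].lower().rstrip(".")
    for name in hostnames:
        name = name.lower().rstrip(".")
        if host == name:
            # The bare endpoint: the bucket (if any) comes from the URL path.
            return None
        suffix = "." + name
        if host.endswith(suffix):
            # "<bucket>.<hostname>": everything before the hostname is the bucket.
            return host[: -len(suffix)]
    # Host not recognized: fall back to path-style handling.
    return None


if __name__ == "__main__":
    configured = ["buckets.chromabits.com"]
    print(bucket_from_host("example1.buckets.chromabits.com", configured))
    print(bucket_from_host("buckets.chromabits.com", configured))
```

This is also why the zonegroup needs the base name (buckets.chromabits.com) rather than the wildcard: the bucket is whatever precedes it.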
I deployed the Rook Toolbox container and used radosgw-admin zonegroup get default to get the configuration of the default zonegroup.
I stored the output in a JSON file and modified the JSON object to include a new hostname under the hostnames key:
radosgw-admin zonegroup get default > default.json
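If you’d rather not edit default.json by hand, the same change can be scripted. This minimal Python sketch assumes the file contains the JSON printed by radosgw-admin zonegroup get default and only touches the hostnames key; the script name add_hostnames.py in the usage comment is made up.

```python
import json
import sys


def add_hostnames(zonegroup: dict, *hostnames: str) -> dict:
    # Append each hostname to the "hostnames" list, skipping ones already present.
    current = zonegroup.setdefault("hostnames", [])
    for name in hostnames:
        if name not in current:
            current.append(name)
    return zonegroup


def main(path: str, *hostnames: str) -> None:
    # Read the exported zonegroup, update it, and write it back in place.
    with open(path) as f:
        zonegroup = json.load(f)
    add_hostnames(zonegroup, *hostnames)
    with open(path, "w") as f:
        json.dump(zonegroup, f, indent=4)


if __name__ == "__main__":
    # Usage: python add_hostnames.py default.json buckets.chromabits.com
    main(sys.argv[1], *sys.argv[2:])
```

Because duplicates are skipped, running it twice leaves the file unchanged, which makes it safe to rerun before applying the zonegroup.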
Once satisfied with the changes, I applied them and committed the updated period before restarting the RGW:
radosgw-admin zonegroup set --infile default.json
radosgw-admin period update --commit
After the RGW came back up, buckets became reachable through their own subdomains (e.g. example1.buckets.chromabits.com)!