The Path to Automated TLS - Part 3: Automated Certificates with Cert-Manager

Andrei Vasiliu
Romanian expat in Italy. Platform Engineer by trade, homelab builder by passion. Documenting every step of building enterprise-grade infrastructure at home.
The Path to Automated TLS - This article is part of a series.
Part 3: This Article

Locking it Down - From HTTP to HTTPS

In the preceding chapters, we established the networking foundation for a production-grade bare-metal Kubernetes platform.

  • In Chapter 1, we implemented MetalLB to provide stable LoadBalancer IPs, solving the primary hurdle of bare-metal service exposure.
  • In Chapter 2, we deployed Traefik using the Gateway API to handle L7 routing and configured Technitium DNS for internal name resolution, successfully routing http://test.dev.thebestpractice.tech to our NGINX test service.

The logical final step is to secure this ingress path. Unencrypted HTTP is unacceptable for a production-grade setup, even within a homelab.

This chapter addresses that gap by implementing automated TLS certificate management. We will deploy Cert-Manager and configure it to perform DNS-01 challenges against a public DNS provider (GCP Cloud DNS) to obtain publicly trusted certificates from Let’s Encrypt. This “split-horizon DNS” approach allows us to secure internal services with valid certificates, completing our platform’s networking stack.

Let’s lock it down.

The Split-Horizon DNS Strategy: Public Challenges, Private Resolution

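To get a publicly trusted certificate from Let’s Encrypt, we must prove we control the domain we’re requesting it for. The best fit here is a DNS-01 challenge: it is the only challenge type Let’s Encrypt accepts for wildcard certificates, and it does not require our services to be reachable from the internet. The challenge involves creating a specific TXT record (under the _acme-challenge label) in our domain’s public DNS zone.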

This presents a classic homelab dilemma:

  • Our cluster uses an internal DNS server (Technitium) that resolves dev.thebestpractice.tech to a private IP (10.20.0.90).
  • Let’s Encrypt needs to verify a TXT record on a public DNS server.

The solution is a split-horizon DNS (or “split-brain”) setup. My primary domain, thebestpractice.tech, is managed by Cloudflare. However, my current Cloudflare plan doesn’t allow for the creation of separate, delegable sub-zones. To work around this while still using the powerful DNS-01 challenge, I will introduce GCP Cloud DNS for a very specific purpose.

We will configure the same domain, dev.thebestpractice.tech, in two different places:

  1. Publicly, on GCP Cloud DNS: This zone will be used only by Cert-Manager to solve the DNS-01 challenges. Let’s Encrypt will query this public zone.
  2. Privately, on Technitium DNS: This zone will continue to serve our internal network, resolving our services to their private IPs.

To make this work, we must delegate the dev.thebestpractice.tech subdomain to GCP by adding NS records for it in the parent zone on Cloudflare, pointing at the nameservers assigned to the Cloud DNS zone.

[Image: GCP Cloud DNS]

[Image: Cloudflare delegation]

This setup gives us the best of both worlds: the security of public validation and the privacy of internal resolution.

Cert-Manager: Your Automated Certificate Authority
#

With our DNS strategy in place, we can now deploy Cert-Manager. This powerful Kubernetes tool automates the entire lifecycle of TLS certificates. It will:

  • Watch for Certificate resources.
  • Communicate with Let’s Encrypt to initiate challenges.
  • Create the necessary TXT records in GCP Cloud DNS using a Service Account.
  • Verify the challenge and retrieve the signed certificate.
  • Store the certificate in a Kubernetes Secret.
  • Automatically renew the certificate before it expires.

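A critical piece of the configuration is telling Cert-Manager to use public DNS servers for its own propagation check. Before asking Let’s Encrypt to validate, Cert-Manager looks up the TXT record itself. If that lookup went through our internal Technitium DNS, which serves its own private copy of the zone, it would never see the public record it created in GCP.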

GitOps Implementation: Deploying Cert-Manager with ArgoCD

As always, we turn to our GitOps repository to declaratively manage the deployment.

Directory Structure

Following our established pattern, the configuration for Cert-Manager is laid out in our GitOps repository.

.
├── base
│   ├── cert-manager
│   │   ├── cert-manager.yaml
│   │   └── values.yaml
│   └── secrets
│       └── cert-manager
│           └── cert-manager-dns-sa.yaml
└── environments
    └── dev
        └── cert-manager
            ├── certificate
            │   ├── ClusterIssuer_letsencrypt.yaml
            │   └── cert-dev-tbp.yaml
            ├── custom-values
            │   └── custom-values.yaml
            └── root-certificate.yaml

1. The Base Application and DNS Resolver Configuration

First, we define the base ArgoCD Application for Cert-Manager. The most important part is in the Helm values, where we configure Cert-Manager to use public DNS resolvers for its validation checks. This ensures it can see the public TXT records it creates in GCP.

environments/dev/cert-manager/custom-values/custom-values.yaml:

crds:
  # This option decides if the CRDs should be installed
  # as part of the Helm installation.
  enabled: true

# Additional command-line flags passed to the cert-manager controller binary.
# The internal network resolves names through a local Technitium DNS server,
# so cert-manager is instructed to use only public recursive nameservers
# when it checks DNS-01 challenge records for Let's Encrypt.
extraArgs:
  - '--dns01-recursive-nameservers-only'
  - '--dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53'
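The base Application itself is not reproduced in this post. As a rough sketch only, an ArgoCD multi-source Application that pulls the chart from the Jetstack Helm repository and layers the two values files from the tree above could look like this. The Git repository URL, chart version, argocd namespace, project name, and sync policy are all placeholders assumed for illustration, not the actual manifest.

```yaml
# Hypothetical sketch of base/cert-manager/cert-manager.yaml.
# Everything marked "assumed" is a placeholder, not taken from the real repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd                      # assumed ArgoCD install namespace
spec:
  project: default                       # assumed project
  sources:
    # Source 1: the upstream Helm chart.
    - repoURL: https://charts.jetstack.io
      chart: cert-manager
      targetRevision: "<chart-version>"  # assumed: pin the version you run
      helm:
        valueFiles:
          # $values resolves to the Git source tagged "ref: values" below.
          - $values/base/cert-manager/values.yaml
          - $values/environments/dev/cert-manager/custom-values/custom-values.yaml
    # Source 2: the GitOps repository that holds the values files.
    - repoURL: https://example.com/your-org/gitops.git   # assumed URL
      targetRevision: main               # assumed branch
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

Later value files override earlier ones, so the environment-specific extraArgs take precedence over anything set in the base values.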

2. The GCP Service Account Secret

Cert-Manager needs credentials to modify DNS records in GCP. We create a GCP Service Account with the “DNS Administrator” role (roles/dns.admin), generate a JSON key, and store it in 1Password.

[Image: GCP DNS service account]

Then, we use an ExternalSecret to sync this key into a Kubernetes Secret in the cert-manager namespace. This process relies on the External Secrets Operator and 1Password integration that I detailed in a previous post. If you haven’t set up this foundation, I highly recommend reading that article first.

base/secrets/cert-manager/cert-manager-dns-sa.yaml:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: cert-manager-dns-sa
  namespace: cert-manager
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: op-cluster-secret-store
  target:
    creationPolicy: Owner
  data:
  - secretKey: service_account.json
    remoteRef:
      key: EXTSEC_1Password_GCP_TBP_DNS_Admin # 1Password item name
      property: password
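For orientation, this is roughly the Secret the External Secrets Operator should produce from the manifest above. It is a sketch of the expected shape, not output copied from the cluster: because no target.name is set, the Secret takes the ExternalSecret's name, and the data value shown is a placeholder.

```yaml
# Sketch of the Secret generated by the ExternalSecret above (do not apply by hand).
apiVersion: v1
kind: Secret
metadata:
  name: cert-manager-dns-sa      # defaults to the ExternalSecret name
  namespace: cert-manager
type: Opaque
data:
  # Key name comes from "secretKey" in the ExternalSecret;
  # the value is the 1Password "password" field, base64-encoded.
  service_account.json: "<base64-encoded JSON key>"   # placeholder
```

The name and key matter because the ClusterIssuer references exactly this pair in its serviceAccountSecretRef.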

3. The ClusterIssuer

This ClusterIssuer is the heart of our setup. It tells Cert-Manager how to issue certificates. We configure it to use the cloudDNS solver, pointing it to our GCP project and the Secret we just created.

environments/dev/cert-manager/certificate/ClusterIssuer_letsencrypt.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  annotations:
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
  name: letsencrypt-cluster-issuer
spec:
  acme:
    email: andrei@thebestpractice.com
    privateKeySecretRef:
      name: letsencrypt-issuer-account-key
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        cloudDNS:
          hostedZoneName: dev-thebestpractice-tech
          project: diesel-polymer-445422-e3 # The GCP Project ID
          serviceAccountSecretRef:
            key: service_account.json
            name: cert-manager-dns-sa
      selector:
        dnsZones:
        # A zone entry matches the zone itself and every name beneath it,
        # including the wildcard, so no separate '*.' entry is needed.
        - dev.thebestpractice.tech
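While iterating on the solver configuration, it can help to point a second issuer at the Let’s Encrypt staging endpoint, which has much more generous rate limits but issues certificates that browsers do not trust. The sketch below is an assumption on my part rather than something deployed in this series; the issuer name and the account-key Secret name are hypothetical, while the solver settings mirror the production issuer above.

```yaml
# Hypothetical staging issuer for dry runs; not part of the original setup.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging-cluster-issuer          # hypothetical name
spec:
  acme:
    email: andrei@thebestpractice.com
    privateKeySecretRef:
      name: letsencrypt-staging-issuer-account-key  # hypothetical name
    # Let's Encrypt staging directory: untrusted certs, relaxed rate limits.
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        cloudDNS:
          hostedZoneName: dev-thebestpractice-tech
          project: diesel-polymer-445422-e3
          serviceAccountSecretRef:
            key: service_account.json
            name: cert-manager-dns-sa
      selector:
        dnsZones:
        - dev.thebestpractice.tech
```

Once a staging certificate issues cleanly, switching the Certificate's issuerRef back to the production issuer triggers a real issuance.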

4. Requesting the Wildcard Certificate

Now we request the wildcard certificate that will secure all services in our dev environment. Cert-Manager will see this resource and begin the DNS-01 challenge process.

environments/dev/cert-manager/certificate/cert-dev-tbp.yaml:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
  name: wildcard.dev.thebestpractice.tech
  namespace: traefik
spec:
  commonName: '*.dev.thebestpractice.tech'
  dnsNames:
    - '*.dev.thebestpractice.tech'
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-cluster-issuer
  renewBefore: 360h0m0s
  secretName: wildcard-dev-thebestpractice-tech-cert-tls

Once the challenge is complete, Cert-Manager will create a secret named wildcard-dev-thebestpractice-tech-cert-tls in the traefik namespace, containing the signed certificate and private key.
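As a sketch of what to expect, the resulting Secret follows the standard Kubernetes TLS layout. The values below are placeholders, and Cert-Manager also adds its own annotations that are omitted here.

```yaml
# Expected shape of the Secret written by Cert-Manager (values are placeholders).
apiVersion: v1
kind: Secret
metadata:
  name: wildcard-dev-thebestpractice-tech-cert-tls
  namespace: traefik
type: kubernetes.io/tls
data:
  tls.crt: "<base64-encoded certificate chain>"   # leaf plus intermediates
  tls.key: "<base64-encoded private key>"
```

The Secret lives in the traefik namespace on purpose: a Gateway listener can reference a certificate Secret in its own namespace without any extra cross-namespace grant.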

[Image: ArgoCD root certificate application]

[Image: certificate OK]

5. Configuring Traefik for TLS

The final step is to tell our Traefik Gateway to use this new certificate. We update the Traefik Helm values to enable the websecure listener on port 8443 and reference the secret created by Cert-Manager.

environments/dev/ingress/traefik/custom-values/override.values.yaml:

# ... other values
gateway:
  enabled: true
  listeners:
    websecure:
      port: 8443
      protocol: HTTPS
      namespacePolicy:
        from: All
      mode: Terminate
      certificateRefs:
        - name: wildcard-dev-thebestpractice-tech-cert-tls
# ... other values
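With the listener in place, a route has to attach to it before anything is served over HTTPS. The following HTTPRoute is a minimal sketch for the NGINX test service. The Gateway name (traefik-gateway, which I am assuming from the chart's default), the route name, the backend Service name, and its namespace and port are assumptions, so adjust them to match the resources from Chapter 2.

```yaml
# Hypothetical HTTPRoute binding the test hostname to the HTTPS listener.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: nginx-test-https          # hypothetical name
  namespace: default              # assumed namespace of the test service
spec:
  parentRefs:
    - name: traefik-gateway       # assumed Gateway name
      namespace: traefik
      sectionName: websecure      # attach only to the TLS listener
  hostnames:
    - test.dev.thebestpractice.tech
  rules:
    - backendRefs:
        - name: nginx             # assumed Service name
          port: 80                # assumed Service port
```

Because the listener's namespacePolicy is set to All, a route in a different namespace from the Gateway is allowed to attach.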

6. Verification: The Padlock Appears

With all the pieces in place, the entire flow is automated.

  1. ArgoCD syncs all our new manifests.
  2. Cert-Manager sees the Certificate resource and starts the DNS-01 challenge with Let’s Encrypt.
  3. It uses the GCP SA credentials to create a TXT record in GCP Cloud DNS.
  4. Let’s Encrypt verifies the record and issues the certificate.
  5. Cert-Manager saves it to the wildcard-dev-thebestpractice-tech-cert-tls secret.
  6. Traefik automatically loads the secret and begins serving traffic over HTTPS.

Now, when we navigate to https://test.dev.thebestpractice.tech, we are greeted with the NGINX welcome page, but this time, it’s served securely with a valid TLS certificate.

[Image: NGINX served over TLS]

Conclusion: A Production-Grade Networking Stack

This three-part series has systematically constructed a complete, production-grade networking stack on a bare-metal Kubernetes cluster. The final platform integrates several key technologies to achieve a level of automation and security on par with enterprise cloud environments:

  • Network Load Balancing: Provided by MetalLB, enabling stable LoadBalancer IP addresses.
  • Intelligent L7 Routing: Managed by Traefik using the modern Gateway API.
  • Split-Horizon DNS: Implemented with Technitium for internal resolution and GCP Cloud DNS for public challenges.
  • Fully Automated TLS: Orchestrated by Cert-Manager to issue and renew publicly trusted certificates from Let’s Encrypt.

With every component managed declaratively through GitOps, the resulting infrastructure is reproducible, version-controlled, and resilient. This architecture transforms a standard homelab into a powerful personal platform, built with the same principles that drive modern production systems.

Stay tuned! Andrei
