Skip to content

Fix BYOC TLS#38

Merged
monrax merged 8 commits intomainfrom
fix/jks
Sep 8, 2025
Merged

Fix BYOC TLS#38
monrax merged 8 commits intomainfrom
fix/jks

Conversation

@monrax
Copy link
Copy Markdown
Collaborator

@monrax monrax commented Aug 28, 2025

This closes #34

This PR attempts to address the following issue:

  • The cert mounted at GRAYLOG_HTTP_TLS_CERT_FILE is presented to both clients outside the cluster for external access, as well as internal clients for traffic between nodes. This means a cert with only a CN field that does not match any of the node names will prevent inter-node secure communication.
  • In addition, "None of the TrustManagers trust this certificate chain" pops up when using a self-signed cert.
  • Thus, when using a self-signed cert, both:
    • Graylog nodes acting as servers does not advertise a certificate with the correct node names.
    • Graylog nodes acting as clients do not even trust the certificate that is presented to them.
  • This causes inputs to not work when BYOC TLS is enabled.

The fix:

  • The cert to be mounted should have both a CN field, and a SAN field with at least two entries: the CN again, but also the internal node names as a wild card like so: DNS:*.graylog-svc.graylog.svc.cluster.local (adjusted to the current service name and namespace)
  • The mounted cert, and the root CA included inside it if that's the case, should be added to the Java Key Store as indicated here (in the "Java Key Store" section).

Implementation details:

  • values.yaml: added updateKeyStore (which defaults to true) and keyStorePass (which defaults to Java's own default "changeit") to graylog.config.tls.byoc to control when to enable updating the Java Key Store.
  • _helpers.tpl: the full Java options passed to the containers will also include "-Djavax.net.ssl.trustStore=/usr/share/graylog/data/cacerts/graylog.jks" (as well as the corresponding password option) whenever both graylog.config.tls.byoc.enabled and graylog.config.tls.byoc.updateKeyStore are set to true.
  • init-script.sh: this is the main bit. In here we perform one check to see if the wildcard *.graylog-svc.graylog.svc.cluster.local is present as a SAN in the cert mounted using the provided TLS k8s secret when graylog.config.tls.byoc.enabled is true . If this is not the case the Graylog application pods won't start up until a valid certificate is provided (valid meaning the cert has the wildcard SAN field). Secondly, if graylog.config.tls.byoc.updateKeyStore is also true the certificate (and any root CA in it) will be added to the Java Key Store automatically.

The part I'm less comfortable with is the one where the application doesn't start up if a cert is given without the DNS:*.graylog-svc.graylog.svc.cluster.local SAN (which are going to be most certificates out there), even if the init container logs tell the user what's going on. I would rather have Graylog use two certs: one for inter-node communication (which we would generate internally automatically) and leave GRAYLOG_HTTP_TLS_CERT_FILE for external access, or something similar (perhaps, even allow an option to skip hostname verification altogether?). However, that would require internal changes to the application, and not an option right now.

Notes for Reviewers

  • The commit history must be preserved - please use the rebase-merge or standard merge option instead of squash-merge
  • Sync up with the author before merging

@monrax monrax marked this pull request as draft August 28, 2025 08:50
@monrax
Copy link
Copy Markdown
Collaborator Author

monrax commented Aug 28, 2025

Turning into draft until feature/fix/patch has been documented.

@monrax monrax force-pushed the fix/jks branch 2 times, most recently from 502aeed to 279d590 Compare September 2, 2025 14:00
@monrax
Copy link
Copy Markdown
Collaborator Author

monrax commented Sep 3, 2025

Validation steps:

  1. Checkout this branch:
git checkout fix/jks
git pull
  1. Install chart:
helm install graylog ./graylog -n graylog --set provider="aws-managed-sc"
  1. Set the service type to LoadBalancer:
helm upgrade graylog ./graylog -n graylog --set graylog.custom.service.type="LoadBalancer" --reuse-values
  1. Get the service hostname/IP address and update the DNS entry corresponding to your domain:
SERVICE_NAME=graylog-svc; kubectl get svc $SERVICE_NAME -n graylog
  1. If you don't already have a certificate, create one with the additional DNS:*.graylog-svc.graylog.svc.cluster.local SAN entry:
openssl req -x509 -sha256 -newkey rsa:2048 -days 365 -keyout server.key -out server.crt -nodes -subj "/CN=<your domain here>" -addext "subjectAltName=DNS:<your domain here>,DNS:*.graylog-svc.graylog.svc.cluster.local"
  1. Create a corresponding k8s secret
kubectl create secret tls my-cert --cert=public.crt --key=private.key -n graylog
  1. Enable TLS
helm upgrade graylog ./graylog -n graylog --reuse-values --set graylog.config.tls.enabled=true --set  graylog.config.tls.secretName="my-cert" --set graylog.config.tls.updateKeyStore=true
  1. Check that you can configure and run inputs:
  • Wait for all pods to restart
  • Go to http://<your domain>:9000/ and verify the certificate corresponds to the one configured above
  • Log in with your credentials, set up an input, and start the input
  • Check the container logs to make sure nothing unexpected shows up

Done!

@monrax monrax marked this pull request as ready for review September 3, 2025 16:17
@williamtrelawny
Copy link
Copy Markdown
Collaborator

williamtrelawny commented Sep 5, 2025

It works!

The only minor thing I'd add is in NOTES.txt - when a user has graylog.config.tls.enabled=true, change the URL on line 28 to say https instead of the default http.

If I was better at Go, I'd go ahead and do this for you, I'm sorry 😢

@monrax
Copy link
Copy Markdown
Collaborator Author

monrax commented Sep 5, 2025

It works!

The only minor thing I'd add is in NOTES.txt - when a user has graylog.config.tls.enabled=true, change the URL on line 28 to say https instead of the default http.

Patched! Please re-validate when convenient.

If I was better at Go, I'd go ahead and do this for you, I'm sorry 😢

No worries -- reviewing that the code works is more important than the actual code 💪

Copy link
Copy Markdown
Collaborator

@williamtrelawny williamtrelawny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀 Awesome work @monrax !!

@williamtrelawny
Copy link
Copy Markdown
Collaborator

I already resolved the merge conflict but it's still telling me there's another conflict, so I'm going to leave it to you @monrax to resolve, sorry... Hope I didn't mess anything up when resolving the first conflict.

@monrax monrax merged commit ba0da4c into main Sep 8, 2025
1 check passed
@monrax monrax mentioned this pull request Sep 8, 2025
@monrax monrax deleted the fix/jks branch December 5, 2025 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix BYO TLS - "None of the TrustManagers trust this certificate chain"

2 participants