Commit 4d858ba

Tezos signer forwarder (#498)
* tezos signer forwarder chart

  The last remaining piece of https://github.com/midl-dev/tezos-on-gke/ to move into tezos-k8s, tezos-signer-forwarder is a terminating pod for ssh tunnels exposing a tezos signing endpoint from an on-prem location.
* support for HA signers
* support for loadbalancerip instead of annotation
* instead of 2 service monitors, relabel the alerts from signer

  enable cold standby
* support for selection of the signer port
* set port for service as well
* set scrape timeout for remote signers to 20s
* add toggle for signer metrics
* name ports in statefulset as well
* make sure signer-forwarder pod restarts when endpoint config changes
* replace janky python injection script with `targetLabels`

  I didn't know about `targetLabels` but it seems more natural to do it this way.
* last part - replace ad-hoc relabeling with proper ServiceMonitor config
* lint
* better explanation for sidecar
* remove namespace from serviceMonitor (bc it's not set anywhere else)
* midl => tezos
* pin alpine to more stable
* add -D and -e to CMD in signerForwarder dockerfile

  does not do anything since we use entrypoint in chart
* move signer forwarder image into tezos_k8s_images
* values: uncomment and make ""
* load balancer ip: uncomment and set to ""
* Update charts/tezos-signer-forwarder/templates/statefulset.yaml

  Co-authored-by: Aryeh Harris <harryttd@users.noreply.github.com>
* simplify enumeration
* Update charts/tezos-signer-forwarder/scripts/signer_exporter.py

  Co-authored-by: Aryeh Harris <harryttd@users.noreply.github.com>
* add readonly for the ssh secrets
* default mode 400 for more config files
* remove range and add enumeration in service
* only expose metrics port in service when enabled in values
* Revert "only expose metrics port in service when enabled in values"

  This reverts commit 49cf3a9.
* grab endpoint port straight from values.yaml instead of going thru a cm
* re-add missing quotes
* revert some of the perm changes to make it work
* add comment why container runs as root
* handle readiness probe timeout just like for the node
* do not hardcode pulumi annotation

---------

Co-authored-by: Aryeh Harris <harryttd@users.noreply.github.com>
1 parent 9d1750c commit 4d858ba


13 files changed: +567 -0 lines changed
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+apiVersion: v2
+name: tezos-signer-forwarder
+description: A chart for tezos-signer-forwarder
+type: application
+version: 0.0.0
+appVersion: "10.0"
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+#!/bin/sh
+
+/usr/sbin/sshd -D -e -p ${TUNNEL_ENDPOINT_PORT}
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+#!/usr/bin/env python
+import os
+from flask import Flask, request, jsonify
+import requests
+
+import logging
+log = logging.getLogger('werkzeug')
+log.setLevel(logging.ERROR)
+
+application = Flask(__name__)
+
+readiness_probe_path = os.getenv("READINESS_PROBE_PATH")
+signer_port = os.getenv("SIGNER_PORT")
+signer_metrics = os.getenv("SIGNER_METRICS") == "true"
+
+# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
+# Configured readiness probe timeoutSeconds is 5s, timeout sync request before that.
+SIGNER_CONNECT_TIMEOUT = 4.5
+
+@application.route('/metrics', methods=['GET'])
+def prometheus_metrics():
+    '''
+    Prometheus endpoint
+    This combines:
+    * the metrics from the signer, which themselves are a combination of the
+      prometheus node-exporter and custom probes (power status, etc)
+    * the `unhealthy_signers_total` metric exported by this script, verifying
+      whether the signer URL configured upstream returns a 200 OK
+    '''
+
+    try:
+        probe = requests.get(f"http://localhost:{signer_port}{readiness_probe_path}", timeout=SIGNER_CONNECT_TIMEOUT)
+    except requests.exceptions.ConnectTimeout:
+        #Timeout connect to node
+        probe = None
+    except requests.exceptions.ReadTimeout:
+        #Timeout read from node
+        probe = None
+    except requests.exceptions.RequestException:
+        probe = None
+    if probe and signer_metrics:
+        try:
+            healthz = requests.get(f"http://localhost:{signer_port}/healthz").text
+        except requests.exceptions.RequestException:
+            healthz = None
+    else:
+        healthz = None
+    return '''# number of unhealthy signers - should be 0 or 1
+unhealthy_signers_total %s
+%s''' % (0 if probe else 1, healthz or "")
+
+if __name__ == "__main__":
+    application.run(host = "0.0.0.0", port = 31732, debug = False)
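The response body assembled at the end of `prometheus_metrics` can be exercised without Flask or a live signer. The following is a minimal sketch, not part of the commit: `render_metrics` is a hypothetical helper name, and the sample `power`/`wired_network` values are made up for illustration. It mirrors only the string formatting and the `0 if probe else 1` logic from the exporter. Note that a `requests.Response` is truthy only when its status code is below 400, so `probe_ok` here stands for "the request completed with a non-error status".

```python
from typing import Optional

HEADER = "# number of unhealthy signers - should be 0 or 1"


def render_metrics(probe_ok: bool, healthz: Optional[str]) -> str:
    """Mirror the body returned by the exporter's /metrics route.

    probe_ok: whether the readiness probe request to the signer succeeded.
    healthz:  text fetched from the signer's /healthz endpoint, if any.
    """
    unhealthy = 0 if probe_ok else 1
    # Same three-part layout as the exporter: comment, gauge, passthrough text.
    return "%s\nunhealthy_signers_total %s\n%s" % (HEADER, unhealthy, healthz or "")


if __name__ == "__main__":
    # Healthy signer with (made-up) passthrough metrics.
    print(render_metrics(True, "power 0\nwired_network 0\n"))
    # Unreachable signer: only the exporter's own metric is emitted.
    print(render_metrics(False, None))
```

When the probe fails, the signer's own metrics disappear from the scrape and only `unhealthy_signers_total 1` remains.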
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+{{/*
+Expand the name of the chart.
+*/}}
+{{- define "tezos-signer-forwarder.name" -}}
+{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
+{{- end }}
+
+{{/*
+Create a default fully qualified app name.
+We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
+If release name contains chart name it will be used as a full name.
+*/}}
+{{- define "tezos-signer-forwarder.fullname" -}}
+{{- if .Values.fullnameOverride }}
+{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
+{{- else }}
+{{- $name := default .Chart.Name .Values.nameOverride }}
+{{- if contains $name $.Release.Name }}
+{{- .Release.Name | trunc 63 | trimSuffix "-" }}
+{{- else }}
+{{- printf "%s-%s" $.Release.Name $name | trunc 63 | trimSuffix "-" }}
+{{- end }}
+{{- end }}
+{{- end }}
+
+{{/*
+Create chart name and version as used by the chart label.
+*/}}
+{{- define "tezos-signer-forwarder.chart" -}}
+{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
+{{- end }}
+
+{{/*
+Common labels
+*/}}
+{{- define "tezos-signer-forwarder.labels" -}}
+helm.sh/chart: {{ include "tezos-signer-forwarder.chart" . }}
+{{ include "tezos-signer-forwarder.selectorLabels" . }}
+{{- if .Chart.AppVersion }}
+app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
+{{- end }}
+app.kubernetes.io/managed-by: {{ .Release.Service }}
+{{- end }}
+
+{{/*
+Selector labels
+*/}}
+{{- define "tezos-signer-forwarder.selectorLabels" -}}
+app.kubernetes.io/name: {{ include "tezos-signer-forwarder.name" . }}
+app.kubernetes.io/instance: {{ .Release.Name }}
+{{- end }}
+
+{{/*
+Create the name of the service account to use
+*/}}
+{{- define "tezos-signer-forwarder.serviceAccountName" -}}
+{{- if .Values.serviceAccount.create }}
+{{- default (include "tezos-signer-forwarder.fullname" .) .Values.serviceAccount.name }}
+{{- else }}
+{{- default "default" .Values.serviceAccount.name }}
+{{- end }}
+{{- end }}
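The branching in the `tezos-signer-forwarder.fullname` helper is easier to follow when written out procedurally. The sketch below approximates it in Python; it is an illustration only, the function and parameter names are hypothetical, and the release names in the usage lines are invented.

```python
def fullname(release_name: str, chart_name: str,
             name_override: str = "", fullname_override: str = "") -> str:
    """Approximate the `tezos-signer-forwarder.fullname` template."""

    def clip(value: str) -> str:
        # Equivalent of `trunc 63 | trimSuffix "-"`: cut to 63 characters,
        # then drop at most one trailing hyphen.
        value = value[:63]
        return value[:-1] if value.endswith("-") else value

    if fullname_override:
        return clip(fullname_override)
    # `default .Chart.Name .Values.nameOverride`
    name = name_override or chart_name
    # `contains $name $.Release.Name`: is the chart name inside the release name?
    if name in release_name:
        return clip(release_name)
    return clip(f"{release_name}-{name}")


if __name__ == "__main__":
    print(fullname("prod", "tezos-signer-forwarder"))
    print(fullname("tezos-signer-forwarder-prod", "tezos-signer-forwarder"))
```

The second call returns the release name unchanged because it already contains the chart name, which avoids names such as `tezos-signer-forwarder-prod-tezos-signer-forwarder`.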
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+{{- if .Values.alertmanagerConfig.enabled }}
+{{- range .Values.signers }}
+{{- if .monitoring_email }}
+{{ $signer := . }}
+{{- range .endpoints }}
+{{- if .alert_when_down }}
+apiVersion: monitoring.coreos.com/v1alpha1
+kind: AlertmanagerConfig
+metadata:
+  name: tezos-signer-{{ $signer.name }}-{{ .alias }}-email
+  labels:
+    {{- toYaml $.Values.alertmanagerConfig.labels | nindent 4 }}
+spec:
+  route:
+    groupBy: ['job']
+    groupWait: 30s
+    groupInterval: 5m
+    repeatInterval: 12h
+    receiver: 'email_{{ $signer.name }}'
+    matchers:
+    - name: service
+      value: tezos-remote-signer-{{ $signer.name }}
+      regex: false
+    - name: alertType
+      value: tezos-remote-signer-alert
+      regex: false
+    - name: tezos_endpoint_name
+      value: {{ .alias }}
+      regex: false
+    continue: false
+
+  receivers:
+  - name: 'email_{{ $signer.name }}'
+    emailConfigs:
+    - to: "{{ $signer.monitoring_email }}"
+      sendResolved: true
+      headers:
+      - key: subject
+        value: '{{`[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}`}}'
+      html: >-
+        {{`{{ if eq .Status "firing" }}
+        Attention Required for Tezos Remote Signer:
+        {{ else }}
+        Resolved Alert for Tezos Remote Signer:
+        {{ end }}
+        {{ range .Alerts -}}
+        {{ .Annotations.summary }}
+        {{ end }}`}}
+      text: >-
+        {{`{{ if eq .Status "firing" }}
+        Attention Required for Tezos Remote Signer:
+        {{ else }}
+        Resolved Alert for Tezos Remote Signer:
+        {{ end }}
+        {{ range .Alerts -}}
+        {{ .Annotations.summary }}
+        {{ end }}`}}
+---
+{{- end }}
+{{- end }}
+{{- end }}
+{{- end }}
+{{- end }}
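The nested `range`/`if` structure above emits one `AlertmanagerConfig` per endpoint, but only for signers that set `monitoring_email` and endpoints that set `alert_when_down`. A rough Python sketch of that selection follows. It assumes `alertmanagerConfig.enabled` is true, the helper name is hypothetical, and the sample values are invented; only the key names (`name`, `monitoring_email`, `endpoints`, `alias`, `alert_when_down`) come from the template.

```python
from typing import Dict, List


def alertmanager_config_names(signers: List[Dict]) -> List[str]:
    """List the AlertmanagerConfig resource names the template would render."""
    names = []
    for signer in signers:
        if not signer.get("monitoring_email"):
            continue  # `{{- if .monitoring_email }}`
        for endpoint in signer.get("endpoints", []):
            if endpoint.get("alert_when_down"):  # `{{- if .alert_when_down }}`
                names.append(
                    f"tezos-signer-{signer['name']}-{endpoint['alias']}-email"
                )
    return names


if __name__ == "__main__":
    # Invented sample values, shaped like `.Values.signers`.
    sample = [
        {
            "name": "baker1",
            "monitoring_email": "ops@example.com",
            "endpoints": [
                {"alias": "pi-a", "alert_when_down": True},
                {"alias": "pi-b", "alert_when_down": False},
            ],
        },
        # No monitoring_email: no AlertmanagerConfig is rendered for this signer.
        {"name": "baker2", "endpoints": [{"alias": "pi-c", "alert_when_down": True}]},
    ]
    print(alertmanager_config_names(sample))
```

An endpoint with `alert_when_down: false` produces no resource, which fits the "enable cold standby" item in the commit message: a standby endpoint can stay down without paging anyone.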
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{{- range .Values.signers }}
+{{- $name := .name }}
+{{- range $i, $endpoint := .endpoints }}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: tezos-signer-forwarder-config-{{ $name }}-{{ $i }}
+data:
+  authorized_keys: "{{ $endpoint.ssh_pubkey }} signer"
+---
+{{- end }}
+{{- end }}
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+{{- if .Values.prometheusRule.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  labels:
+    {{- toYaml .Values.prometheusRule.labels | nindent 4 }}
+  name: tezos-remote-signer-rules
+spec:
+  groups:
+  - name: tezos-remote-signer.rules
+    rules:
+    - alert: SignerPowerLoss
+      annotations:
+        description: 'Remote signer "{{`{{ $labels.tezos_endpoint_name }}`}}" for baker "{{`{{ $labels.tezos_baker_name }}`}}" has lost power'
+        summary: 'Remote signer "{{`{{ $labels.tezos_endpoint_name }}`}}" for baker "{{`{{ $labels.tezos_baker_name }}`}}" has lost power'
+      expr: power{namespace="{{ .Release.Namespace }}"} != 0
+      for: 1m
+      labels:
+        severity: critical
+        alertType: tezos-remote-signer-alert
+    - alert: SignerWiredNetworkLoss
+      annotations:
+        description: 'Remote signer "{{`{{ $labels.tezos_endpoint_name }}`}}" for baker "{{`{{ $labels.tezos_baker_name }}`}}" has lost wired internet connection'
+        summary: 'Tezos remote signer "{{`{{ $labels.tezos_endpoint_name }}`}}" for baker "{{`{{ $labels.tezos_baker_name }}`}}" has lost wired internet connection'
+      expr: wired_network{namespace="{{ .Release.Namespace }}"} != 0
+      for: 1m
+      labels:
+        severity: critical
+        alertType: tezos-remote-signer-alert
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  labels:
+    {{- toYaml .Values.prometheusRule.labels | nindent 4 }}
+  name: tezos-remote-signer-reachability-rules
+spec:
+  groups:
+  - name: tezos-remote-signer.rules
+    rules:
+    - alert: NoRemoteSigner
+      annotations:
+        description: 'Remote signer "{{`{{ $labels.tezos_endpoint_name }}`}}" for baker "{{`{{ $labels.tezos_baker_name }}`}}" is down'
+        summary: 'Remote signer "{{`{{ $labels.tezos_endpoint_name }}`}}" for baker "{{`{{ $labels.tezos_baker_name }}`}}" is down or unable to sign.'
+      expr: unhealthy_signers_total{namespace="{{ .Release.Namespace }}"} != 0
+      for: 1m
+      labels:
+        severity: critical
+        alertType: tezos-remote-signer-alert
+---
+{{- end }}
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+apiVersion: v1
+kind: Secret
+metadata:
+  name: tezos-signer-forwarder-secret-{{ .Values.name }}
+data:
+  ssh_host_ecdsa_key: |
+{{ println .Values.secrets.signer_target_host_key | b64enc | indent 4 -}}
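The `println ... | b64enc` pipeline in the Secret appends a newline to the host key before base64-encoding it, so the decoded key file ends with a trailing newline. A small Python sketch of the same transformation, for illustration only: `encode_host_key` is a hypothetical name and the input string is a placeholder, not a real key.

```python
import base64


def encode_host_key(host_key: str) -> str:
    """Mimic `println <key> | b64enc` from the Secret template."""
    # `println` appends a single newline; `b64enc` then encodes the result.
    return base64.b64encode((host_key + "\n").encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    encoded = encode_host_key("placeholder-key-material")
    # Decoding shows the trailing newline that `println` added.
    print(repr(base64.b64decode(encoded).decode("utf-8")))
```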
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: tezos-remote-signer-ssh-ingress-{{ .Values.name }}
+  annotations:
+{{ toYaml .Values.service_annotations | indent 4 }}
+spec:
+  type: LoadBalancer
+  selector:
+    app.kubernetes.io/name: tezos-signer-forwarder
+  ports:
+  {{- range .Values.signers }}
+  {{- $name := .name }}
+  # undocumented k8s feature to make a service route to different pods
+  # based on the port - allows to reuse the same public ip in all cloud
+  # providers. For it to work, ports need to have names.
+  # https://github.com/kubernetes/kubernetes/issues/24875#issuecomment-794596576
+  {{- range $i, $endpoint := .endpoints }}
+  - port: {{ $endpoint.tunnel_endpoint_port }}
+    name: ssh-{{ trunc 9 $name }}-{{ $i }}
+    targetPort: ssh-{{ trunc 9 $name }}-{{ $i }}
+  {{- end }}
+  {{- end }}
+  # ensures that remote signers can always ssh
+  publishNotReadyAddresses: true
+  {{ if .Values.load_balancer_ip }}
+  loadBalancerIP: {{ .Values.load_balancer_ip }}
+  {{ end }}
+---
+{{- range .Values.signers }}
+apiVersion: v1
+kind: Service
+metadata:
+  name: tezos-remote-signer-{{ .name }}
+  labels:
+    app.kubernetes.io/name: tezos-signer-forwarder
+    tezos_baker_name: {{ .name }}
+spec:
+  selector:
+    app.kubernetes.io/name: tezos-signer-forwarder
+    tezos_baker_name: {{ .name }}
+  ports:
+  - port: {{ .signer_port }}
+    name: signer
+  - port: 31732
+    name: metrics
+  # make sure that the service always targets the same signer, when HA is in use.
+  sessionAffinity: ClientIP
+---
+{{- end }}
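The ingress Service names each port `ssh-{{ trunc 9 $name }}-{{ $i }}`. The template does not say why the name is cut to 9 characters, but Kubernetes limits port names to 15 characters, and `ssh-` plus 9 characters plus `-` plus a one-digit index is exactly 15, so that limit is presumably the reason. The sketch below reproduces the naming in Python so the lengths can be checked; the function name and signer names are hypothetical.

```python
from typing import Dict, List

MAX_PORT_NAME_LEN = 15  # Kubernetes limit on the length of a named port


def ssh_port_names(signers: List[Dict]) -> List[str]:
    """Reproduce `ssh-{{ trunc 9 $name }}-{{ $i }}` for every endpoint."""
    names = []
    for signer in signers:
        for i, _endpoint in enumerate(signer["endpoints"]):
            # `trunc 9` keeps at most the first nine characters of the name.
            names.append(f"ssh-{signer['name'][:9]}-{i}")
    return names


if __name__ == "__main__":
    # Invented signer with a long name and two endpoints.
    sample = [{"name": "verylongbakername", "endpoints": [{}, {}]}]
    for port_name in ssh_port_names(sample):
        print(port_name, len(port_name), len(port_name) <= MAX_PORT_NAME_LEN)
```

Under this reading, the scheme stays within the limit only while the endpoint index is a single digit: a signer with a name of nine or more characters and an eleventh endpoint (index 10) would yield a 16-character port name.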
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+{{- if .Values.serviceMonitor.enabled }}
+{{- range .Values.signers }}
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/name: tezos-signer-forwarder
+  name: tezos-remote-signer-monitoring-{{ .name }}
+spec:
+  endpoints:
+  - port: metrics
+    path: /metrics
+    # default scrape timeout of 10 can be too small for remote raspberry pis
+    scrapeTimeout: "20s"
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: tezos-signer-forwarder
+      tezos_baker_name: {{ .name }}
+  targetLabels:
+  - tezos_baker_name
+  podTargetLabels:
+  - tezos_endpoint_name
+---
+{{- end }}
+{{- end }}
