Commit 7553162

Merge remote-tracking branch 'upstream/main' into traefik-fallback-service

2 parents: 5d3f211 + 6df102b

File tree: 99 files changed, +1646 -1752 lines

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 3 additions & 1 deletion
@@ -18,6 +18,8 @@
 - [ ] Service has placement constraints or is global
 - [ ] Service is restartable
 - [ ] Service restart is zero-downtime
+- [ ] Service is monitored (via prometheus and grafana)
 - [ ] Service is not bound to one specific node (e.g. via files or volumes)
 - [ ] Relevant OPS E2E Test are added
-- [ ] Service's Public URL is included in maintenance mode -->
+- [ ] Service's Public URL is included in maintenance mode
+- [ ] Service's Public URL is included in testing mode -->

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ docs/_build
 /services/monitoring/pgsql_query_exporter_config.yaml
 /services/monitoring/docker-compose.yml
 /services/monitoring/smokeping_prober_config.yaml
-
+services/monitoring/tempo_config.yaml
 
 # Simcore: Contains location of repo.config file on the machine and of the whole config directory
 .config.location

.pre-commit-config.yaml

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,10 @@ repos:
     hooks:
       - id: shellcheck
         name: Shell scripts conform to shellcheck
+  - repo: https://github.com/antonbabenko/pre-commit-terraform
+    rev: v1.89.1 # Get the latest from: https://github.com/antonbabenko/pre-commit-terraform/releases
+    hooks:
+      - id: terraform_fmt
   - repo: local
     hooks:
       - id: run-pylint
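To try the new Terraform formatting hook locally, a minimal sketch (it assumes `pre-commit` and the `terraform` binary are already installed on the machine):

```
# Install the git hooks defined in .pre-commit-config.yaml
pre-commit install

# Run only the terraform_fmt hook against every file in the repo
pre-commit run terraform_fmt --all-files
```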

Makefile

Lines changed: 0 additions & 1 deletion
@@ -71,7 +71,6 @@ down-maintenance: ## Stop the maintenance mode
 	fi \
 ,)
 
-
 # Misc: info & clean
 .PHONY: info info-vars info-local
 info: ## Displays some important info

charts/Makefile

Lines changed: 0 additions & 1 deletion
@@ -49,7 +49,6 @@ helmfile-sync: .check-helmfile-installed helmfile.yaml ## Syncs the helmfile con
 	$(MAKE) -s .helmfile-local-post-install; \
 	fi
 
-
 .PHONY: configure-local-hosts
 configure-local-hosts: ## Adds local hosts entries for the machine
 	@echo "Adding $(MACHINE_FQDN) hosts to /etc/hosts ..."
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+## How to delete volumes with `reclaimPolicy: retain`
+1. Delete the PVC:
+```
+kubectl delete pvc <pvc-name>
+```
+
+2. Verify the PV is `Released`:
+```
+kubectl get pv <pv-name>
+```
+
+3. Manually remove the EBS volume in AWS:
+    1. Go to the AWS console and list EBS volumes
+    1. Filter by tag `ebs.csi.aws.com/cluster=true`
+    1. Identify the volume associated with your PV (check the `kubernetes.io/created-for/pv/name` tag of the EBS volume)
+    1. Verify that the EBS volume is `Available`
+    1. Delete the EBS volume
+
+4. Delete the PV:
+```
+kubectl delete pv <pv-name>
+```
+
+5. Remove finalizers (if necessary).
+If the PV remains in a `Terminating` state, remove its finalizers:
+```
+kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'
+```
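For step 3, the AWS console lookup can also be done from the command line; a minimal sketch, assuming the AWS CLI is configured for the right account and region and that `<pv-name>` is the PV you just released:

```
# Find the EBS volume that backed the PV (the EBS CSI driver tags it)
aws ec2 describe-volumes \
  --filters "Name=tag:kubernetes.io/created-for/pv/name,Values=<pv-name>" \
  --query "Volumes[].{Id:VolumeId,State:State}"

# Delete the volume once its state is "available"
aws ec2 delete-volume --volume-id <volume-id>
```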

charts/aws-ebs-csi-driver/values.yaml.gotmpl

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ image:
   tag: "v1.38.1"
 
 storageClasses:
-  - name: "ebs-sc"
+  - name: "{{ .Values.ebsStorageClassName }}"
     parameters:
       type: "gp3"
     allowVolumeExpansion: true
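Since the storage class name is now templated, it can be worth rendering the chart locally to confirm what `{{ .Values.ebsStorageClassName }}` resolves to; a sketch, where the `name=aws-ebs-csi-driver` selector is an assumed release name:

```
# Render the release and inspect the resulting StorageClass manifest
helmfile -l name=aws-ebs-csi-driver template | grep -B1 -A5 "kind: StorageClass"
```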

charts/longhorn/README.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Longhorn (LH) Knowledge Base
+
+### Can LH be used for critical services (e.g., Databases)?
+
+No. We should not use it for volumes of critical services.
+
+As of now, we should avoid using LH for critical services. Instead, we should rely on easier-to-maintain solutions (e.g., application-level replication [Postgres Operators], S3, etc.). Once we have hands-on experience, extensive monitoring, and the ability to scale LH, we can consider using it for critical services.
+
+LH uses networking to keep replicas in sync, and IO-heavy workloads may easily overload it, leading to unpredictable consequences. Until we can extensively monitor LH and scale it properly on demand, it should not be used for critical or IO-heavy services.
+
+### How does LH decide which node's disk to use as storage?
+
+It depends on the configuration. There are three possibilities, described here:
+* https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+
+When using the `Create Default Disk on Labeled Nodes` option, it relies on the `node.longhorn.io/create-default-disk` Kubernetes node label.
+
+Source: https://longhorn.io/docs/1.8.1/nodes-and-volumes/nodes/default-disk-and-node-config/#customizing-default-disks-for-new-nodes
+
+### Will LH pick up storage from a newly added node?
+
+By default, LH will use storage on all nodes (including newly created ones) where it runs. If `createDefaultDiskLabeledNodes` is configured, it depends on the label of the node.
+
+Sources:
+* https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+* https://longhorn.io/docs/1.8.1/nodes-and-volumes/nodes/default-disk-and-node-config/#customizing-default-disks-for-new-nodes
+
+### Can workloads be run on nodes where LH is not installed?
+
+Workloads can run on nodes without LH as long as LH is not restricted to specific nodes via the `nodeSelector` or `systemManagedComponentsNodeSelector` settings. If LH is configured to run on specific nodes, workloads can only run on those nodes.
+
+Note: There is an [ongoing bug](https://github.com/longhorn/longhorn/discussions/7312#discussioncomment-13030581) where LH raises warnings when workloads run on nodes without LH. However, it still functions correctly.
+
+Source: https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/
+
+### Adding new volumes to (PVs that rely on) LH
+
+Monitor carefully whether LH is capable of handling new volumes. Test the new volume under load (when many read/write operations occur) and ensure LH does not fail due to insufficient resource capacity (e.g., network or CPU). You can also consult the performance section of this README.
+
+LH's minimum recommended resource requirements:
+* https://longhorn.io/docs/1.8.1/best-practices/#minimum-recommended-hardware
+
+### LH's performance / resources
+
+Insights into LH's performance:
+* https://longhorn.io/blog/performance-scalability-report-aug-2020/
+* https://github.com/longhorn/longhorn/wiki/Performance-Benchmark
+
+Resource requirements:
+* https://github.com/longhorn/longhorn/issues/1691
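Regarding the `Create Default Disk on Labeled Nodes` question above, a minimal sketch of opting a node in via that label (it assumes the plain `true` value is enough for your setup; Longhorn also accepts `config` together with a disk-config annotation):

```
# Let Longhorn create its default disk on this node
kubectl label node <node-name> node.longhorn.io/create-default-disk=true

# Show which nodes currently carry the label
kubectl get nodes -L node.longhorn.io/create-default-disk
```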

charts/longhorn/values.yaml.gotmpl

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+# Values documentation:
+# https://github.com/longhorn/longhorn/tree/v1.8.1/chart#values
+
+global:
+  # Warning: updating node selectors (after installation) will cause downtime
+  # https://longhorn.io/docs/archives/1.2.2/advanced-resources/deploy/node-selector/#setting-up-node-selector-after-longhorn-has-been-installed
+  #
+  # Warning: using node selectors will restrict our workloads to the same nodes
+  # https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/#deploy-longhorn-components-only-on-a-specific-set-of-nodes
+  nodeSelector: {}
+  systemManagedComponentsNodeSelector: {}
+
+defaultSettings:
+  replicaAutoBalance: best-effort
+
+  # control on which nodes LH will use disks
+  # use the `node.longhorn.io/create-default-disk` node label for control
+  createDefaultDiskLabeledNodes: true
+  # use a dedicated folder (disk) for storage
+  defaultDataPath: /longhorn
+
+  # https://longhorn.io/docs/1.8.1/best-practices/#minimal-available-storage-and-over-provisioning
+  storageMinimalAvailablePercentage: 10
+
+  # Prevent LH deletion. Set to true if you want to delete LH
+  deletingConfirmationFlag: false
+
+  # allow replicas to be scheduled on the same node
+  replicaSoftAntiAffinity: false
+
+  # we always use dedicated disks. 5% is a good value
+  storageReservedPercentageForDefaultDisk: 5
+
+persistence:
+  # use only for non-critical ops workloads
+  # for critical workloads (e.g. database)
+  # use application replication (e.g. postgres HA operator)
+  defaultClass: false
+
+  # https://longhorn.io/docs/1.8.1/best-practices/#io-performance
+  defaultDataLocality: best-effort
+  defaultClassReplicaCount: 2
+
+  # minimum volume size is 300Mi
+  # https://github.com/longhorn/longhorn/issues/8488
+  defaultFsType: xfs
+
+resources: # https://longhorn.io/docs/1.8.1/best-practices/#minimum-recommended-hardware
+  requests:
+    cpu: 0.5
+    memory: 128Mi
+  limits:
+    cpu: 4
+    memory: 4Gi
+
+ingress:
+  enabled: true
+  className: ""
+  annotations:
+    namespace: {{ .Release.Namespace }}
+    cert-manager.io/cluster-issuer: "cert-issuer"
+    traefik.ingress.kubernetes.io/router.entrypoints: websecure
+    traefik.ingress.kubernetes.io/router.middlewares: traefik-traefik-basic-auth@kubernetescrd,traefik-longhorn-strip-prefix@kubernetescrd # namespace + middleware name
+  tls: true
+  tlsSecret: monitoring-tls
+  host: {{ requiredEnv "K8S_MONITORING_FQDN" }}
+  path: /longhorn
+  pathType: Prefix
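Because `persistence.defaultClass` is `false`, workloads must request Longhorn storage explicitly. A minimal sketch of such a PVC (the claim name is hypothetical, and it assumes the chart keeps its default storage class name `longhorn`):

```
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-longhorn-claim   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn     # must be set explicitly, since defaultClass is false
  resources:
    requests:
      storage: 1Gi               # above the 300Mi minimum noted in the values comments
EOF
```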
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 persistence:
   enabled: true
   size: "1Gi" # minimal size for gp3 is 1Gi
-  storageClass: "ebs-sc"
+  storageClass: "{{ .Values.ebsStorageClassName }}"
