
Commit e985900

Add 4519 to v120 upgrade known issue (#466)
Signed-off-by: Jian Wang <[email protected]>
1 parent 216c41a commit e985900

3 files changed: +224 −7 lines

docs/upgrade/v1-1-2-to-v1-2-0.md

Lines changed: 112 additions & 3 deletions
@@ -298,7 +298,7 @@ If you notice the upgrade is stuck in the **Upgrading System Service** state for
1. Check if the `prometheus-rancher-monitoring-prometheus-0` pod is stuck with the status `Terminating`.

```
-$ kubectl -n cattle-monitoring-system get pods
+$ kubectl -n cattle-monitoring-system get pods
NAME                                          READY   STATUS        RESTARTS   AGE
prometheus-rancher-monitoring-prometheus-0    0/3     Terminating   0          19d
```
@@ -399,15 +399,16 @@ If an upgrade is stuck in an `Upgrading System Service` state for an extended pe

---
-### 8. The `registry.suse.com/harvester-beta/vmdp:latest` image is not available in airgapped environment
+### 8. The `registry.suse.com/harvester-beta/vmdp:latest` image is not available in an air-gapped environment

Harvester does not package the `registry.suse.com/harvester-beta/vmdp:latest` image in the ISO file as of v1.1.0. Windows VMs created before v1.1.0 use this image as a container disk, but the kubelet may remove old images to free up disk space. Once the image is removed, Windows VMs in an air-gapped environment can no longer pull it. You can fix this issue by changing the image to `registry.suse.com/suse/vmdp/vmdp:2.5.4.2` and restarting the Windows VMs.
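
A minimal sketch of that change, assuming a Windows VM named `win10-example` in the `default` namespace (all names are illustrative):

```
# Point the VM's VMDP container disk at the packaged image.
kubectl -n default edit vm win10-example

# In the editor, update the image under spec.template.spec.volumes, for example:
#   - name: vmdp                  # the volume name may differ in your VM
#     containerDisk:
#       image: registry.suse.com/suse/vmdp/vmdp:2.5.4.2

# Afterwards, restart the VM (for example, from the Harvester UI) so the
# updated container disk is used.
```
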
- Related issue:
  - [[BUG] VMDP Image wrong after upgrade to Harvester 1.2.0](https://github.com/harvester/harvester/issues/4534)

---
-### 9. Upgrade stuck in the Post-draining state
+
+### 9. An upgrade is stuck in the Post-draining state

The node might be stuck in the OS upgrade process if you encounter the **Post-draining** state, as shown below.
@@ -483,3 +484,111 @@ After performing the steps above, you should pass post-draining with the next re
- [A potential bug in NewElementalPartitionsFromList which caused upgrade error code 33](https://github.com/rancher/elemental-toolkit/issues/1827)
- Workaround:
  - https://github.com/harvester/harvester/issues/4526#issuecomment-1732853216

---

### 10. An upgrade is stuck in the Upgrading System Service state due to the `customer provided SSL certificate without IP SAN` error in `fleet-agent`

If an upgrade is stuck in the **Upgrading System Service** state for an extended period, follow these steps to investigate the issue:

1. Find the pods related to the upgrade:

```
kubectl get pods -A | grep upgrade
```

Example output:

```
# kubectl get pods -A | grep upgrade
cattle-system      system-upgrade-controller-5685d568ff-tkvxb   1/1   Running   0   85m
harvester-system   hvst-upgrade-vq4hl-apply-manifests-65vv8     1/1   Running   0   87m   // waiting for managedchart to be ready
...
```

2. The pod `hvst-upgrade-vq4hl-apply-manifests-65vv8` logs the following messages in a loop:

```
Current version: 102.0.0+up40.1.2, Current state: WaitApplied, Current generation: 23
Sleep for 5 seconds to retry
```

3. Check the status of all bundles. Note that a couple of bundles are `OutOfSync`:

```
# kubectl get bundle -A
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
...
fleet-local   mcc-local-managed-system-upgrade-controller   1/1
fleet-local   mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd                       0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-monitoring                        0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-monitoring-crd                    0/1                       WaitApplied(1) [Cluster fleet-local/local]
```
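
To see why a particular bundle is out of sync, you can inspect its status conditions, for example:

```
# Bundle name taken from the listing above.
kubectl -n fleet-local get bundle mcc-rancher-monitoring -o yaml
# Look at .status.conditions for the reason the bundle is not applied.
```
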
4. The `fleet-agent-*` pod has the following error log:

```
fleet-agent pod log:

time="2023-09-19T12:18:10Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-local-system/fleet-agent-bootstrap: Post \"https://192.168.122.199/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.122.199 because it doesn't contain any IP SANs"
```
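
The log can be pulled with a command along these lines (a sketch; the `app=fleet-agent` label and the `cattle-fleet-local-system` namespace are the usual Fleet defaults and may differ in your setup):

```
kubectl -n cattle-fleet-local-system logs -l app=fleet-agent --tail=50
```
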
5. Check the `ssl-certificates` setting in Harvester:

From the command line:

```
# kubectl get settings.harvesterhci.io ssl-certificates
NAME               VALUE
ssl-certificates   {"publicCertificate":"-----BEGIN CERTIFICATE-----\nMIIFNDCCAxygAwIBAgIUS7DoHthR/IR30+H/P0pv6HlfOZUwDQYJKoZIhvcNAQEL\nBQAwFjEUMBIGA1UEAwwLZXhhbXBsZS5j...."}
```

From the Harvester Web UI:

![](/img/v1.2/upgrade/known_issues/4519-harvester-settings-ssl-certificates.png)
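
To confirm that this certificate carries only DNS entries and no IP SAN, you can decode it from the setting and print its Subject Alternative Name extension (a sketch; it assumes `jq` is available and OpenSSL 1.1.1 or later):

```
kubectl get settings.harvesterhci.io ssl-certificates -o jsonpath='{.value}' \
  | jq -r '.publicCertificate' \
  | openssl x509 -noout -ext subjectAltName
# Only DNS entries in the output means the certificate has no IP SAN,
# which matches the fleet-agent error above.
```
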
6. Check the `server-url` setting; its value is the VIP:

```
# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME         VALUE
server-url   https://192.168.122.199
```

7. Identify the root cause:

The user configured a self-signed certificate that covers only the FQDN in the `ssl-certificates` setting, while `server-url` still points to the VIP, so the `fleet-agent` pod fails to register.

```
# For example, a self-signed certificate created for example.com and *.example.com:
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout example.key -out example.crt -subj "/CN=example.com" \
  -addext "subjectAltName=DNS:example.com,DNS:*.example.com"

# The outputs are example.crt and example.key.
```

8. Apply the workaround:

Update `server-url` to `https://harv31.example.com`, an FQDN covered by the certificate:

```
# kubectl edit settings.management.cattle.io -n cattle-system server-url
setting.management.cattle.io/server-url edited
...

# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME         VALUE
server-url   https://harv31.example.com
```
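
The same change can be made non-interactively; a sketch, assuming the certificate covers `harv31.example.com`:

```
kubectl -n cattle-system patch settings.management.cattle.io server-url \
  --type merge -p '{"value": "https://harv31.example.com"}'
```
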
After the workaround is applied, Rancher automatically replaces the `fleet-agent` pod, the pod registers successfully, and the upgrade continues.

- Related issue:
  - [[BUG] Upgrade to Harvester 1.2.0 fails in fleet-agent due to customer provided SSL certificate without IP SAN](https://github.com/harvester/harvester/issues/4519)
- Workaround:
  - https://github.com/harvester/harvester/issues/4519#issuecomment-1727132383

---
The new screenshot `4519-harvester-settings-ssl-certificates.png` (123 KB; binary file not shown).

versioned_docs/version-v1.2/upgrade/v1-1-2-to-v1-2-0.md

Lines changed: 112 additions & 4 deletions
@@ -298,7 +298,7 @@ If you notice the upgrade is stuck in the **Upgrading System Service** state for
1. Check if the `prometheus-rancher-monitoring-prometheus-0` pod is stuck with the status `Terminating`.

```
-$ kubectl -n cattle-monitoring-system get pods
+$ kubectl -n cattle-monitoring-system get pods
NAME                                          READY   STATUS        RESTARTS   AGE
prometheus-rancher-monitoring-prometheus-0    0/3     Terminating   0          19d
```
@@ -330,7 +330,7 @@ If you notice the upgrade is stuck in the **Upgrading System Service** state for

---
-### 7. Upgrade stuck in the `Upgrading System Service` state
+### 7. An upgrade is stuck in the `Upgrading System Service` state

If an upgrade is stuck in an `Upgrading System Service` state for an extended period, some system services' certificates may have expired. To investigate and resolve this issue, follow these steps:
@@ -399,15 +399,16 @@ If an upgrade is stuck in an `Upgrading System Service` state for an extended pe

---
-### 8. The `registry.suse.com/harvester-beta/vmdp:latest` image is not available in airgapped environment
+### 8. The `registry.suse.com/harvester-beta/vmdp:latest` image is not available in an air-gapped environment

Harvester does not package the `registry.suse.com/harvester-beta/vmdp:latest` image in the ISO file as of v1.1.0. Windows VMs created before v1.1.0 use this image as a container disk, but the kubelet may remove old images to free up disk space. Once the image is removed, Windows VMs in an air-gapped environment can no longer pull it. You can fix this issue by changing the image to `registry.suse.com/suse/vmdp/vmdp:2.5.4.2` and restarting the Windows VMs.

- Related issue:
  - [[BUG] VMDP Image wrong after upgrade to Harvester 1.2.0](https://github.com/harvester/harvester/issues/4534)

---
-### 9. Upgrade stuck in the Post-draining state
+
+### 9. An upgrade is stuck in the Post-draining state

The node might be stuck in the OS upgrade process if you encounter the **Post-draining** state, as shown below.
@@ -484,3 +485,110 @@ After performing the steps above, you should pass post-draining with the next re
- Workaround:
  - https://github.com/harvester/harvester/issues/4526#issuecomment-1732853216

---

### 10. An upgrade is stuck in the Upgrading System Service state due to the `customer provided SSL certificate without IP SAN` error in `fleet-agent`

If an upgrade is stuck in the **Upgrading System Service** state for an extended period, follow these steps to investigate the issue:

1. Find the pods related to the upgrade:

```
kubectl get pods -A | grep upgrade
```

Example output:

```
# kubectl get pods -A | grep upgrade
cattle-system      system-upgrade-controller-5685d568ff-tkvxb   1/1   Running   0   85m
harvester-system   hvst-upgrade-vq4hl-apply-manifests-65vv8     1/1   Running   0   87m   // waiting for managedchart to be ready
...
```

2. The pod `hvst-upgrade-vq4hl-apply-manifests-65vv8` logs the following messages in a loop:

```
Current version: 102.0.0+up40.1.2, Current state: WaitApplied, Current generation: 23
Sleep for 5 seconds to retry
```

3. Check the status of all bundles. Note that a couple of bundles are `OutOfSync`:

```
# kubectl get bundle -A
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
...
fleet-local   mcc-local-managed-system-upgrade-controller   1/1
fleet-local   mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd                       0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-monitoring                        0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-monitoring-crd                    0/1                       WaitApplied(1) [Cluster fleet-local/local]
```

4. The `fleet-agent-*` pod has the following error log:

```
fleet-agent pod log:

time="2023-09-19T12:18:10Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-local-system/fleet-agent-bootstrap: Post \"https://192.168.122.199/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.122.199 because it doesn't contain any IP SANs"
```

5. Check the `ssl-certificates` setting in Harvester:

From the command line:

```
# kubectl get settings.harvesterhci.io ssl-certificates
NAME               VALUE
ssl-certificates   {"publicCertificate":"-----BEGIN CERTIFICATE-----\nMIIFNDCCAxygAwIBAgIUS7DoHthR/IR30+H/P0pv6HlfOZUwDQYJKoZIhvcNAQEL\nBQAwFjEUMBIGA1UEAwwLZXhhbXBsZS5j...."}
```

From the Harvester Web UI:

![](/img/v1.2/upgrade/known_issues/4519-harvester-settings-ssl-certificates.png)

6. Check the `server-url` setting; its value is the VIP:

```
# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME         VALUE
server-url   https://192.168.122.199
```

7. Identify the root cause:

The user configured a self-signed certificate that covers only the FQDN in the `ssl-certificates` setting, while `server-url` still points to the VIP, so the `fleet-agent` pod fails to register.

```
# For example, a self-signed certificate created for example.com and *.example.com:
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout example.key -out example.crt -subj "/CN=example.com" \
  -addext "subjectAltName=DNS:example.com,DNS:*.example.com"

# The outputs are example.crt and example.key.
```

8. Apply the workaround:

Update `server-url` to `https://harv31.example.com`, an FQDN covered by the certificate:

```
# kubectl edit settings.management.cattle.io -n cattle-system server-url
setting.management.cattle.io/server-url edited
...

# kubectl get settings.management.cattle.io -n cattle-system server-url
NAME         VALUE
server-url   https://harv31.example.com
```

After the workaround is applied, Rancher automatically replaces the `fleet-agent` pod, the pod registers successfully, and the upgrade continues.

- Related issue:
  - [[BUG] Upgrade to Harvester 1.2.0 fails in fleet-agent due to customer provided SSL certificate without IP SAN](https://github.com/harvester/harvester/issues/4519)
- Workaround:
  - https://github.com/harvester/harvester/issues/4519#issuecomment-1727132383

---
