
Fix portal rollout timeout by increasing to 600s and adding startup probe #529

Merged
aurelianware merged 1 commit into main from claude/fix-portal-rollout-timeout-eTHyI on Mar 20, 2026


@aurelianware (Owner)

The portal deployment was timing out at 180s during rollout. Root causes:

  • The HPA scales the portal to 2 replicas, but the 180s timeout had to cover rolling out both
  • The .NET Blazor Server app needs significant startup time on top of the image pull
  • Without a startup probe, the liveness probe could kill pods that were still starting
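A rough timing model shows why 180s was tight. The Python sketch below is an illustration under stated assumptions, not code from this repository: it assumes new pods come up one wave at a time (one pod per wave when maxSurge is 1) and that each wave must finish its image pull and startup before the next begins. The per-pod durations are hypothetical placeholders, not measured values.

```python
import math


def estimate_rollout_seconds(replicas: int, pull_s: float, startup_s: float,
                             max_surge: int = 1) -> float:
    """Rough worst-case duration of a surge-based rolling update.

    Assumes pods are replaced in waves of `max_surge`, and each wave must pull
    the image and finish starting before the next wave is created.
    """
    waves = math.ceil(replicas / max_surge)
    return waves * (pull_s + startup_s)


# Hypothetical per-pod timings on a node without the image cached.
PULL_S = 60
STARTUP_S = 90

estimate = estimate_rollout_seconds(replicas=2, pull_s=PULL_S, startup_s=STARTUP_S)
print(f"estimated rollout: {estimate}s")  # estimated rollout: 300s
print("fits in 180s:", estimate <= 180)   # fits in 180s: False
print("fits in 600s:", estimate <= 600)   # fits in 600s: True
```

Under these made-up numbers two replicas need about 300s, well past 180s but inside 600s. Real pull and startup times depend on the node, the image cache, and the app itself.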

Changes:

  • Increase the kubectl rollout status timeout from 180s to 600s
  • Add a RollingUpdate strategy (maxUnavailable: 0, maxSurge: 1)
  • Add a startupProbe (30 attempts × 5s = 150s startup budget)
  • Remove initialDelaySeconds from the liveness/readiness probes (the startup probe now gates them)
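The 150s figure follows from Kubernetes probe timing. A container gets roughly initialDelaySeconds + failureThreshold × periodSeconds to pass its startup probe before the kubelet kills it, and liveness and readiness probes do not run until the startup probe succeeds. The minimal Python sketch below reproduces that arithmetic using Kubernetes' default field values. `Probe` and `startup_budget_seconds` are hypothetical names for illustration, and probe timeouts are ignored.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Subset of Kubernetes probe timing fields, defaulting to Kubernetes' defaults."""
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3


def startup_budget_seconds(probe: Probe) -> int:
    """Approximate time a container has to pass its startup probe.

    Liveness and readiness probes are held off until the startup probe succeeds,
    so this is the window a slow-starting pod gets. Probe timeouts are ignored.
    """
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


# Values from the PR description: 30 attempts x 5s.
portal_startup = Probe(period_seconds=5, failure_threshold=30)
print(startup_budget_seconds(portal_startup))  # 150

# With every field left at its default, a container gets only about 30s.
print(startup_budget_seconds(Probe()))  # 30
```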

https://claude.ai/code/session_01A95Uah18uxLJpuAR5HShNS

Copilot AI review requested due to automatic review settings March 20, 2026 11:09
aurelianware merged commit 59f267e into main on Mar 20, 2026
54 checks passed
aurelianware deleted the claude/fix-portal-rollout-timeout-eTHyI branch on March 20, 2026 at 11:10
@github-actions

Code Coverage

Package                   | Line Rate          | Branch Rate     | Health
CloudHealthOffice.Portal  | 13%                | 3%              |
CloudHealthOffice.Portal  | 13%                | 3%              |
Summary                   | 13% (2498 / 18662) | 3% (174 / 5968) |

Copilot AI (Contributor) left a comment


Pull request overview

This PR adjusts the Portal’s AKS deployment behavior to reduce rollout failures caused by slow startup and multi-replica rollouts, aligning Kubernetes probe configuration and CI rollout waiting time with the Portal’s startup characteristics.

Changes:

  • Increase the GitHub Actions kubectl rollout status timeout for the Portal from 180s to 600s.
  • Configure a RollingUpdate strategy for the Portal deployment (maxUnavailable: 0, maxSurge: 1).
  • Add a startupProbe and remove initialDelaySeconds from existing liveness/readiness probes.
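What maxUnavailable: 0 and maxSurge: 1 guarantee can be checked by resolving the strategy into pod-count bounds. This Python sketch is illustrative and not taken from the repository. It encodes the documented Kubernetes rule that percentage values round up for maxSurge and down for maxUnavailable. `resolve` and `pod_bounds` are hypothetical helpers.

```python
import math
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Convert an int-or-percent field (e.g. 1 or "25%") into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def pod_bounds(replicas: int, max_surge: IntOrPercent,
               max_unavailable: IntOrPercent) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout.

    Kubernetes rounds maxSurge percentages up and maxUnavailable percentages down,
    and rejects a strategy where both resolve to 0.
    """
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas - unavailable, replicas + surge


# The Portal's strategy as described in the PR, at the HPA's 2 replicas.
print(pod_bounds(2, max_surge=1, max_unavailable=0))  # (2, 3)
```

At 2 replicas the Portal never drops below 2 available pods and runs at most 3 during a rollout. The Kubernetes default of 25%/25% happens to resolve to the same bounds at 2 replicas (`pod_bounds(2, "25%", "25%")` also gives `(2, 3)`), so setting the values explicitly mainly keeps the zero-unavailability behaviour fixed if the replica count grows.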

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • src/portal/CloudHealthOffice.Portal/k8s/portal-deployment.yaml: Adds a RollingUpdate strategy and a startup probe; updates probe timing to rely on startupProbe gating.
  • .github/workflows/deploy-azure-aks.yml: Extends the Portal rollout status wait time to 600s during deployment.
