Claude/fix portal rollout timeout #532
Conversation
Lower both the deployment replicas and HPA minReplicas to 1 to avoid overwhelming a free-tier CosmosDB with concurrent connections. https://claude.ai/code/session_01A95Uah18uxLJpuAR5HShNS
Scale down reference-data-service from 2 to 1 replicas (and HPA min from 2 to 1), and portal HPA min from 2 to 1 to reduce resource usage during initial rollout. https://claude.ai/code/session_01A95Uah18uxLJpuAR5HShNS
encounter-service already had a MongoDB repository implementation (EncounterRepositoryMongo) with conditional logic in Program.cs. Updated its k8s deployment to provide MongoDb__ConnectionString from mongodb-secret instead of the CosmosDB endpoint/key, so the service will use the MongoDB code path.
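The env swap described above might look like this in the deployment manifest; a minimal sketch, assuming the secret key is named `connection-string` (the key name is not taken from the PR):

```yaml
env:
  - name: MongoDb__DatabaseName
    value: "CloudHealthOffice"
  - name: MongoDb__ConnectionString
    valueFrom:
      secretKeyRef:
        name: mongodb-secret     # shared secret referenced by both services
        key: connection-string   # key name is an assumption
```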
claims-scrubbing-service was fully refactored from the @azure/cosmos SDK to the mongodb driver:
- Replaced CosmosClient with MongoClient
- Replaced Container queries with MongoDB Collection find/insertOne
- Replaced Cosmos TTL with MongoDB expireAt field
- Updated health check to use db.command({ ping: 1 })
- Updated config types, env vars, and k8s deployment
Both services now use the shared mongodb-secret, eliminating the need for a separate CosmosDB NoSQL account.
https://claude.ai/code/session_01A95Uah18uxLJpuAR5HShNS
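The call-level swap behind those bullets can be sketched as below. The structural interfaces are stand-ins for the mongodb driver's `Collection`/`Db` types so the snippet is self-contained; the function names are illustrative, not taken from the PR.

```typescript
// Stand-ins for the mongodb driver types (the real code uses Db/Collection).
interface RulesCollection {
  find(filter: object): { toArray(): Promise<object[]> };
}
interface Db {
  command(cmd: object): Promise<unknown>;
}

// Cosmos `container.items.query('SELECT * FROM c WHERE c.active = true')`
// becomes a Mongo `find` with an equivalent filter document.
async function loadActiveRules(rules: RulesCollection): Promise<object[]> {
  return rules.find({ active: true }).toArray();
}

// Health check: `db.command({ ping: 1 })` replaces the Cosmos read probe.
async function isHealthy(db: Db): Promise<boolean> {
  try {
    await db.command({ ping: 1 });
    return true;
  } catch {
    return false;
  }
}
```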
Pull request overview
This PR updates Kubernetes deployment settings to reduce baseline replica counts and migrates the claims-scrubbing service configuration/runtime from Cosmos DB to MongoDB (plus aligns encounter-service’s k8s manifest with MongoDB settings).
Changes:
- Reduce replicas / HPA minReplicas for portal, encounter-service, and reference-data-service.
- Migrate claims-scrubbing-service from Cosmos DB to MongoDB (types/config, runtime client, k8s env vars, dependency swap).
- Update encounter-service k8s to remove Cosmos DB config and use mongodb-secret.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/services/reference-data-service/k8s/reference-data-service-deployment.yaml | Lowers Deployment replicas and HPA minReplicas. |
| src/services/encounter-service/k8s/encounter-service-deployment.yaml | Switches manifest from Cosmos DB env vars to MongoDB and lowers replica counts. |
| src/services/claims-scrubbing-service/src/types.ts | Renames config/health types from Cosmos DB to MongoDB. |
| src/services/claims-scrubbing-service/src/index.ts | Loads MongoDB config from env vars and gates initialization on MongoDB connection string. |
| src/services/claims-scrubbing-service/src/claims-scrubber.ts | Replaces Cosmos DB client usage with MongoDB driver for rules/audit storage + health check. |
| src/services/claims-scrubbing-service/package.json | Removes @azure/cosmos, adds mongodb dependency. |
| src/services/claims-scrubbing-service/k8s/claims-scrubbing-service-deployment.yaml | Updates ConfigMap/env vars from COSMOS_* to MONGODB_*. |
| src/portal/CloudHealthOffice.Portal/k8s/portal-deployment.yaml | Lowers portal HPA minReplicas. |
```diff
   ASPNETCORE_ENVIRONMENT: "Production"
-  CosmosDb__DatabaseName: "CloudHealthOffice"
-  CosmosDb__ContainerName: "Encounters"
+  MongoDb__DatabaseName: "CloudHealthOffice"
```
MongoDb__DatabaseName is set to "CloudHealthOffice" here, while most other service deployments use "cloudhealthoffice". Because MongoDB database names are distinct by string, this can cause the encounter-service to read/write from a different database than the rest of the platform. Align the DB name with the shared convention unless intentional.
Suggested change:

```diff
-  MongoDb__DatabaseName: "CloudHealthOffice"
+  MongoDb__DatabaseName: "cloudhealthoffice"
```
```diff
 spec:
-  replicas: 3
+  replicas: 1
   selector:
```
This reduces the Deployment replica count from 3 to 1 (and HPA minReplicas to 1). If this manifest is used for production, a single replica reduces availability during node drain/rollouts and can cause brief downtime. If the goal is to avoid rollout timeouts due to capacity, consider keeping minReplicas >= 2 and adjusting resources/rollout strategy instead, or document that this manifest is for dev/staging.
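A minimal sketch of the alternative this comment describes: keep two replicas and tighten the rollout so capacity pressure surfaces as a surge pod rather than lost availability (all values are illustrative, not from the PR):

```yaml
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # never drop below baseline during a rollout
      maxSurge: 1              # bring up one extra pod at a time
  progressDeadlineSeconds: 600 # surface a stuck rollout instead of timing out silently
```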
```diff
 spec:
-  replicas: 2
+  replicas: 1
   selector:
```
This lowers replicas and HPA minReplicas from 2 to 1. For a production deployment this removes redundancy and increases risk of downtime during rollouts or node maintenance. If the goal is to address rollout timeouts/capacity, consider keeping minReplicas at 2 and adjusting resource requests/limits or rollout parameters instead, or document that this manifest targets non-prod environments.
```diff
   kind: Deployment
   name: portal
- minReplicas: 2 # Safe with Redis-backed session + token cache
+ minReplicas: 1 # Safe with Redis-backed session + token cache
```
Reducing portal HPA minReplicas from 2 to 1 removes redundancy; a single replica means any rollout/node disruption causes full portal downtime. If this change is to mitigate rollout timeouts due to cluster capacity, consider keeping minReplicas >= 2 for production and addressing the scheduling issue via resources/cluster sizing/rollout strategy, or document that this manifest is intended for smaller non-prod clusters.
Suggested change:

```diff
- minReplicas: 1 # Safe with Redis-backed session + token cache
+ minReplicas: 2 # Production redundancy; use separate non-prod manifest if running a single replica
```
```diff
-      auditContainerName: process.env.COSMOS_AUDIT_CONTAINER || 'validation-audit',
+    mongoDb: {
+      connectionString: process.env.MONGODB_CONNECTION_STRING || '',
+      databaseName: process.env.MONGODB_DATABASE || 'CloudHealthOffice',
```
Defaulting mongoDb.databaseName to "CloudHealthOffice" (and setting the ConfigMap to the same) is inconsistent with most other service manifests, which use "cloudhealthoffice". MongoDB treats database names as distinct strings, so this can split data across two DBs. Consider standardizing the default/configured DB name to the shared value used elsewhere.
Suggested change:

```diff
-      databaseName: process.env.MONGODB_DATABASE || 'CloudHealthOffice',
+      databaseName: process.env.MONGODB_DATABASE || 'cloudhealthoffice',
```
```diff
       timestamp: new Date(),
       expireAt: new Date(Date.now() + 90 * 24 * 60 * 60 * 1000), // 90 days TTL
     };

-    await this.auditContainer.items.create(auditRecord);
+    await this.auditCollection.insertOne(auditRecord);
```
expireAt is being written for a 90-day TTL, but MongoDB will only delete documents automatically if the collection has a TTL index on that field. Consider creating a TTL index on expireAt during initialization (or provisioning it explicitly), otherwise audit records will accumulate indefinitely.
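The fix this comment asks for can be sketched as follows. `buildAuditRecord` is a hypothetical helper mirroring the PR's audit record shape, and the commented-out `createIndex` call uses the mongodb driver's standard TTL-index options.

```typescript
const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

// Hypothetical helper mirroring the PR's audit record shape.
function buildAuditRecord(claimId: string, now: Date = new Date()) {
  return {
    claimId,
    timestamp: now,
    // Only honored if the collection has a TTL index on expireAt.
    expireAt: new Date(now.getTime() + NINETY_DAYS_MS),
  };
}

// During initialization, with auditCollection: Collection from 'mongodb':
//   await auditCollection.createIndex(
//     { expireAt: 1 },
//     { expireAfterSeconds: 0 } // delete at the time stored in expireAt
//   );
```

With `expireAfterSeconds: 0`, MongoDB's TTL monitor deletes each document once the clock passes its `expireAt` value, matching the per-document 90-day intent.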
```yaml
  minReplicas: 1
  maxReplicas: 20
```
Lowering HPA minReplicas to 1 reduces baseline availability and can introduce downtime during maintenance/rollouts. If this is intended as a capacity/rollout-timeout mitigation, consider adding a note about environment scope (dev/staging) or keeping minReplicas >= 2 for production HA.
```diff
-  COSMOS_DATABASE: "CloudHealthOffice"
-  COSMOS_RULES_CONTAINER: "ScrubRules"
-  COSMOS_AUDIT_CONTAINER: "ScrubAudit"
+  MONGODB_DATABASE: "CloudHealthOffice"
```
MONGODB_DATABASE is set to "CloudHealthOffice" here, but most other services use "cloudhealthoffice". If these services are expected to share a single Mongo database, this will create/use a different DB and lead to missing data. Align the configured DB name with the rest of the stack unless the separation is intentional and documented.
Suggested change:

```diff
-  MONGODB_DATABASE: "CloudHealthOffice"
+  MONGODB_DATABASE: "cloudhealthoffice"
```
```diff
   "@azure/identity": "^4.13.0",
   "@azure/keyvault-secrets": "^4.9.0",
   "@azure/storage-blob": "^12.30.0",
+  "mongodb": "^6.12.0",
```
package.json adds mongodb and removes @azure/cosmos, but the repo’s package-lock.json still contains @azure/cosmos and does not include mongodb. This will likely break npm ci/reproducible builds. Regenerate and commit the updated lockfile so dependencies match package.json.
Suggested change:

```diff
-  "mongodb": "^6.12.0",
+  "@azure/cosmos": "^4.0.0",
```
```diff
 import { autoConfigureKeyVault, commonSecretMappings } from './keyvault-config';

 // Auto-configure Key Vault on startup (before loading configuration)
 // Note: This is a module-level flag suitable for Node.js single-threaded execution
 // The service is designed to be initialized once on startup, not concurrently
 let keyVaultConfigured = false;

 // Default configuration (can be overridden by environment variables)
 // Configuration is loaded after Key Vault auto-configuration
 function getConfig(): ClaimsScrubberConfig {
   return {
     kafka: {
       bootstrapServers: process.env.KAFKA_BOOTSTRAP_SERVERS || 'localhost:9092',
       clientId: process.env.KAFKA_CLIENT_ID || 'claims-scrubber',
       inboundTopic: process.env.INBOUND_CLAIMS_TOPIC || 'claims-inbound',
       cleanClaimsTopic: process.env.CLEAN_CLAIMS_TOPIC || 'claims-adjudication',
       flaggedClaimsTopic: process.env.FLAGGED_CLAIMS_TOPIC || 'claims-work-queue',
       rejectedClaimsTopic: process.env.REJECTED_CLAIMS_TOPIC || 'claims-rejected',
       consumerGroupId: process.env.KAFKA_CONSUMER_GROUP || 'claims-scrubber-group',
       sasl: process.env.KAFKA_SASL_USERNAME ? {
         mechanism: (process.env.KAFKA_SASL_MECHANISM || 'scram-sha-512') as 'plain' | 'scram-sha-256' | 'scram-sha-512',
         username: process.env.KAFKA_SASL_USERNAME,
         password: process.env.KAFKA_SASL_PASSWORD || '',
       } : undefined,
       ssl: process.env.KAFKA_SSL === 'true',
     },
     storage: {
       accountName: process.env.STORAGE_ACCOUNT_NAME,
       connectionString: process.env.STORAGE_CONNECTION_STRING,
       containerName: process.env.CLAIMS_CONTAINER || 'claims-archive',
       archivePathPattern: '{claimType}/{status}/{yyyy}/{MM}/{dd}',
     },
-    cosmosDb: {
-      endpoint: process.env.COSMOS_ENDPOINT || '',
-      databaseName: process.env.COSMOS_DATABASE || 'claims-scrubbing',
-      rulesContainerName: process.env.COSMOS_RULES_CONTAINER || 'validation-rules',
-      auditContainerName: process.env.COSMOS_AUDIT_CONTAINER || 'validation-audit',
+    mongoDb: {
+      connectionString: process.env.MONGODB_CONNECTION_STRING || '',
+      databaseName: process.env.MONGODB_DATABASE || 'CloudHealthOffice',
+      rulesCollectionName: process.env.MONGODB_RULES_COLLECTION || 'ScrubRules',
+      auditCollectionName: process.env.MONGODB_AUDIT_COLLECTION || 'ScrubAudit',
     },
```
The service now reads MONGODB_CONNECTION_STRING (and other MONGODB_* vars), but startup still calls autoConfigureKeyVault(commonSecretMappings) and commonSecretMappings currently only maps Cosmos/Storage/Kafka secrets. In environments relying on Key Vault auto-configuration, MongoDB won’t be configured and svc.initialize() will be skipped. Add MongoDB secret mappings (or a Mongo-specific mapping list) so Key Vault can populate MONGODB_* variables.
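A sketch of the missing mapping, assuming a simple secret-name-to-env-var shape; the actual types in keyvault-config and the Key Vault secret names are assumptions, not taken from the repo.

```typescript
// Assumed shape; the repo's keyvault-config types may differ.
interface SecretMapping {
  secretName: string; // name of the secret in Azure Key Vault
  envVar: string;     // environment variable it should populate
}

// Mongo-specific mappings so Key Vault auto-configuration can set MONGODB_* vars.
const mongoSecretMappings: SecretMapping[] = [
  { secretName: 'mongodb-connection-string', envVar: 'MONGODB_CONNECTION_STRING' },
];

// Combine with the existing common mappings at startup.
function withMongoMappings(base: SecretMapping[]): SecretMapping[] {
  return [...base, ...mongoSecretMappings];
}
```

At startup the service would then call something like `autoConfigureKeyVault(withMongoMappings(commonSecretMappings))` so that `svc.initialize()` is no longer skipped in Key Vault-configured environments.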
No description provided.