
Uptime service #73

Merged
kroist merged 3 commits into main from uptime-service
Nov 7, 2025

Conversation


@kroist kroist commented Nov 5, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added uptime monitoring service with configurable endpoint health checks
    • Exposes Prometheus metrics for service availability, response times, and probe timestamps
    • Includes health and metrics HTTP endpoints for monitoring integration
    • Docker-based deployment with automated image builds
  • Chores

    • Updated project configuration for service deployment

@kroist kroist requested a review from JanKuczma November 5, 2025 12:39

coderabbitai bot commented Nov 5, 2025

Walkthrough

Here we observe the introduction of an entirely new uptime-service microservice: a carefully constructed ecosystem comprising configuration management, Prometheus metrics collection, periodic health probing, and HTTP server endpoints, orchestrated through a dedicated Docker build workflow and deployed via GitHub Actions.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| **CI/CD Workflow**<br>`.github/workflows/build-and-push-uptime-service.yml` | New GitHub Actions workflow that accepts a git ref input, checks out the repository, retrieves ref properties, sets up Docker Buildx, authenticates with ECR, and builds/pushes the uptime-service Docker image with SHA and optional `latest` tags. |
| **Workspace Configuration**<br>`ts/pnpm-workspace.yaml` | Adds the exclusion pattern `!uptime-service` to the pnpm packages list, preventing uptime-service from being included in the workspace. |
| **Uptime-Service Setup & Configuration**<br>`ts/uptime-service/.env.example`, `ts/uptime-service/.gitignore`, `ts/uptime-service/package.json` | Introduces environment defaults (PORT, PROBE_INTERVAL, TIMEOUT, ENDPOINTS), standard gitignore patterns (node_modules, .env, logs, build artifacts), and a package.json with ESM configuration, scripts (start/dev), and dependencies on express and prom-client. |
| **Containerization**<br>`ts/uptime-service/Dockerfile` | Multi-stage Dockerfile using the Bun runtime; installs ca-certificates, copies dependencies and source, configures a non-root appuser, exposes port 9615, includes a healthcheck endpoint, and defines a bun entrypoint. |
| **Documentation**<br>`ts/uptime-service/README.md` | Comprehensive documentation covering prerequisites, installation, environment configuration (including the ENDPOINTS JSON structure), run modes, HTTP endpoints (/metrics, /health, /), Prometheus metrics exposition, Grafana dashboard queries, and sample alert rules. |
| **Configuration Logic**<br>`ts/uptime-service/src/config.js` | Loads and validates environment variables with defaults; enforces ENDPOINTS as a non-empty JSON array of endpoint objects with required fields (name, url) and optional fields (method, expectedStatus), validates numeric constraints, checks for duplicate names, and exits on invalid configuration. |
| **Metrics Infrastructure**<br>`ts/uptime-service/src/metrics.js` | Establishes Prometheus metrics: service_up (Gauge), service_response_time_seconds (Histogram), and service_last_probe_timestamp (Gauge), all with service_name and endpoint labels; exports helper functions recordSuccess() and recordFailure() for metric updates. |
| **Health Probing Engine**<br>`ts/uptime-service/src/prober.js` | Implements periodic health checks via startProbing(config): performs concurrent endpoint probes using AbortController for timeout enforcement, records metrics via helper functions, logs UP/DOWN statuses, handles SIGINT/SIGTERM gracefully, and manages interval-based scheduling. |
| **Metrics HTTP Server**<br>`ts/uptime-service/src/server.js` | Establishes an Express HTTP server with endpoints: GET /health (status check with timestamp), GET /metrics (Prometheus metrics from the registry), and GET / (service info); includes error handling for startup failures and metrics generation errors. |
| **Application Entry Point**<br>`ts/uptime-service/src/index.js` | Main entry point that loads configuration, starts the metrics server on the configured port, initiates health probing, logs a startup banner, and implements error handling with fatal exit on failure. |
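For quick reference, the environment variables described above can be combined into a hypothetical `.env` (variable names come from `.env.example`; the endpoint entry and port choice are illustrative, not taken from the repository):

```bash
# Metrics server port (the Dockerfile exposes and healthchecks 9615)
PORT=9615
# Probe frequency in milliseconds
PROBE_INTERVAL=30000
# Per-request timeout in milliseconds
TIMEOUT=5000
# JSON array of endpoints: 'name' and 'url' are required;
# 'method' (default GET) and 'expectedStatus' (default 200) are optional
ENDPOINTS='[{"name":"api","url":"https://example.com/health","method":"GET","expectedStatus":200}]'
```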

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant main as index.js
    participant config as config.js
    participant server as server.js
    participant metrics as metrics.js
    participant prober as prober.js
    participant endpoint as External Endpoint
    participant prometheus as Prometheus Client

    main->>config: loadConfig()
    config-->>main: config (validated)

    main->>server: startMetricsServer(port)
    server->>server: Create Express app
    server-->>main: server listening

    main->>prober: startProbing(config)

    Note over prober: Initial probe + periodic interval

    loop Every PROBE_INTERVAL
        prober->>endpoint: HTTP request with timeout
        alt Success
            endpoint-->>prober: response (status, time)
            prober->>metrics: recordSuccess(name, endpoint, time)
            metrics->>prometheus: Update service_up, response_time, timestamp
        else Timeout/Failure
            prober->>metrics: recordFailure(name, endpoint)
            metrics->>prometheus: Update service_up, timestamp
        end
    end

    Note over prober: SIGINT/SIGTERM received
    prober->>prober: Clear interval & exit gracefully
```
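The probe loop in the diagram can be sketched in plain JavaScript. This is an illustrative reconstruction, not the actual `prober.js`: `probeOnce`, the injected `doFetch`, and the returned result shape are hypothetical, but the AbortController timeout, status comparison, and success/failure recording follow the walkthrough's description.

```javascript
// Sketch of one probe cycle: enforce a timeout with AbortController,
// classify the outcome, and report it to a metrics recorder.
// probeOnce, doFetch, and the result shape are illustrative names.
async function probeOnce(endpoint, timeoutMs, doFetch, metrics) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  const startTime = Date.now();
  try {
    const response = await doFetch(endpoint.url, {
      method: endpoint.method || "GET",
      signal: controller.signal
    });
    const responseTime = (Date.now() - startTime) / 1000; // seconds
    if (response.status === (endpoint.expectedStatus || 200)) {
      metrics.recordSuccess(endpoint.name, endpoint.url, responseTime);
      return { up: true, status: response.status, responseTime };
    }
    metrics.recordFailure(endpoint.name, endpoint.url);
    return { up: false, reason: `status ${response.status}` };
  } catch (error) {
    metrics.recordFailure(endpoint.name, endpoint.url);
    // An aborted fetch surfaces as AbortError; report it as a timeout
    return { up: false, reason: error.name === "AbortError" ? "timeout" : error.message };
  } finally {
    clearTimeout(timeoutId); // tidy the timer on every path
  }
}
```

Probing every endpoint concurrently then amounts to `Promise.all(config.endpoints.map((e) => probeOnce(e, config.timeout, fetch, metrics)))`, rescheduled on each PROBE_INTERVAL tick.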

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Key areas requiring attention:
    • Configuration validation logic in config.js: verify edge cases for JSON parsing, numeric constraints, and duplicate name detection
    • Timeout and error handling in prober.js: review AbortController usage and concurrent probe execution pattern
    • Metrics recording functions (recordSuccess/recordFailure in metrics.js): ensure label consistency and metric accuracy across the probing lifecycle
    • HTTP error handling in server.js: confirm graceful fallback for metrics generation failures and EADDRINUSE port conflicts

Poem

In the digital savanna, a new creature stirs—
The uptime-service emerges, vigilant and sure,
Measuring heartbeats across the network's expanse, 🫀
With metrics precise and probes that advance, 📊
A magnificent beast of Prometheus and Docker's craft. 🐳✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Uptime service' is vague and generic, lacking specific detail about what changes were made or added in this pull request. | Consider a more descriptive title such as 'Add uptime service with health probing and Prometheus metrics' to clearly communicate the scope and nature of the changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |



github-actions bot commented Nov 5, 2025

📊 Coverage Report

📈 Total Coverage Summary

| Type | Covered | Total | Coverage |
|---|---|---|---|
| 📝 Lines | 1772 | 2513 | 🟠 70.51% |
| 📄 Statements | 1772 | 2513 | 🟠 70.51% |
| ⚡ Functions | 117 | 138 | 🟡 84.78% |
| 🔀 Branches | 281 | 300 | 🟡 93.66% |

Coverage Legend

  • ✅ 100% Coverage
  • 🟡 80-99% Coverage
  • 🟠 50-79% Coverage
  • ❌ 0-49% Coverage

📁 File Coverage

📋 Detailed Coverage Report
| File | Lines | Statements | Functions | Branches |
|---|---|---|---|---|
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/actions/deposit.ts | 89.74% | 89.74% | 100% | 85.71% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/actions/newAccount.ts | 86.27% | 86.27% | 100% | 80% |
| blanksquare-monorepo/ts/shielder-sdk/src/actions/types.ts | 100% | 100% | 100% | 100% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/actions/utils.ts | 97.33% | 97.33% | 100% | 90.9% |
| 🟠 blanksquare-monorepo/ts/shielder-sdk/src/actions/withdraw.ts | 78.14% | 78.14% | 87.5% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/chain/contract.ts | 1.8% | 1.8% | 0% | 0% |
| blanksquare-monorepo/ts/shielder-sdk/src/chain/relayer.ts | 35.22% | 35.22% | 33.33% | 50% |
| blanksquare-monorepo/ts/shielder-sdk/src/client/actions.ts | 100% | 100% | 100% | 100% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/client/client.ts | 86.5% | 86.5% | 71.42% | 80% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/client/factories.ts | 97.4% | 97.4% | 100% | 85.71% |
| blanksquare-monorepo/ts/shielder-sdk/src/client/types.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/constants.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/errors.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/index.ts | 0% | 0% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/protocolFees.ts | 18.51% | 18.51% | 14.28% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/referral.ts | 36.84% | 36.84% | 0% | 0% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/accountFactory.ts | 100% | 100% | 100% | 100% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/state/accountOnchain.ts | 94.44% | 94.44% | 100% | 77.77% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/accountRegistry.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/accountStateSerde.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/idManager.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/localStateTransition.ts | 100% | 100% | 100% | 100% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/state/sync/chainStateTransition.ts | 93.75% | 93.75% | 100% | 95% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/sync/historyFetcher.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/sync/synchronizer.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/sync/tokenAccountFinder.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/state/types.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/storage/storageManager.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/storage/storageSchema.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/types.ts | 100% | 100% | 100% | 100% |
| blanksquare-monorepo/ts/shielder-sdk/src/utils.ts | 100% | 100% | 100% | 100% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/utils/errorHandler.ts | 93.33% | 93.33% | 100% | 75% |
| 🟡 blanksquare-monorepo/ts/shielder-sdk/src/utils/waitForTransactionInclusion.ts | 87.5% | 87.5% | 66.66% | 100% |


github-actions bot commented Nov 5, 2025

| Transaction Name | Main | Current | Difference (%) |
|---|---|---|---|
| Deposit (size) | 20255 | 20255 | 0.00000% |
| DepositERC20 (gas) | 1766075 | 1765979 | -0.00544% |
| DepositERC20Fees (gas) | 1774667 | 1774835 | +0.00947% |
| DepositERC20FeesMemo (gas) | 1775481 | 1775517 | +0.00203% |
| DepositERC20Memo (gas) | 1766829 | 1766901 | +0.00408% |
| DepositNative (gas) | 1748353 | 1748257 | -0.00549% |
| DepositNativeFees (gas) | 1757757 | 1757709 | -0.00273% |
| DepositNativeFeesMemo (gas) | 1758535 | 1758559 | +0.00136% |
| DepositNativeMemo (gas) | 1766147 | 1766279 | +0.00747% |
| NewAccount (size) | 22867 | 22867 | 0.00000% |
| NewAccountERC20 (gas) | 1799667 | 1799655 | -0.00067% |
| NewAccountERC20Fees (gas) | 1808367 | 1808451 | +0.00465% |
| NewAccountERC20FeesMemo (gas) | 1792128 | 1792128 | 0.00000% |
| NewAccountERC20Memo (gas) | 1783248 | 1783392 | +0.00808% |
| NewAccountNative (gas) | 1764275 | 1764275 | 0.00000% |
| NewAccountNativeFees (gas) | 1815791 | 1815851 | +0.00330% |
| NewAccountNativeFeesMemo (gas) | 1774540 | 1774492 | -0.00270% |
| NewAccountNativeMemo (gas) | 1765052 | 1765088 | +0.00204% |
| Withdraw (size) | 21573 | 21573 | 0.00000% |
| WithdrawERC20 (gas) | 1867185 | 1867341 | +0.00835% |
| WithdrawERC20Fees (gas) | 1841781 | 1841745 | -0.00195% |
| WithdrawERC20FeesMemo (gas) | 1842559 | 1842559 | 0.00000% |
| WithdrawERC20Memo (gas) | 1833931 | 1833883 | -0.00262% |
| WithdrawNative (gas) | 1867421 | 1867457 | +0.00193% |
| WithdrawNativeFees (gas) | 1826717 | 1826861 | +0.00788% |
| WithdrawNativeFeesMemo (gas) | 1844656 | 1844752 | +0.00520% |
| WithdrawNativeMemo (gas) | 1818212 | 1818164 | -0.00264% |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2ecec6 and 2efe37d.

⛔ Files ignored due to path filters (1)
  • ts/uptime-service/bun.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • .github/workflows/build-and-push-uptime-service.yml (1 hunks)
  • ts/pnpm-workspace.yaml (1 hunks)
  • ts/uptime-service/.env.example (1 hunks)
  • ts/uptime-service/.gitignore (1 hunks)
  • ts/uptime-service/Dockerfile (1 hunks)
  • ts/uptime-service/README.md (1 hunks)
  • ts/uptime-service/package.json (1 hunks)
  • ts/uptime-service/src/config.js (1 hunks)
  • ts/uptime-service/src/index.js (1 hunks)
  • ts/uptime-service/src/metrics.js (1 hunks)
  • ts/uptime-service/src/prober.js (1 hunks)
  • ts/uptime-service/src/server.js (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T11:14:22.103Z
Learnt from: Marcin-Radecki
Repo: Cardinal-Cryptography/blanksquare-monorepo PR: 48
File: tee/docker/Dockerfile:9-10
Timestamp: 2025-09-16T11:14:22.103Z
Learning: The tee/docker/Dockerfile currently runs as root without tini init system - this is a security hardening opportunity that can be addressed in a future PR by installing tini, creating a non-root user, and updating the ENTRYPOINT.

Applied to files:

  • ts/uptime-service/Dockerfile
🧬 Code graph analysis (5)
ts/uptime-service/src/prober.js (2)
ts/uptime-service/src/metrics.js (2)
  • recordSuccess (38-44)
  • recordFailure (49-54)
ts/uptime-service/src/config.js (2)
  • endpoints (21-21)
  • config (7-12)
ts/uptime-service/src/server.js (1)
ts/uptime-service/src/metrics.js (2)
  • register (8-8)
  • register (8-8)
ts/uptime-service/src/metrics.js (1)
ts/uptime-service/src/prober.js (1)
  • responseTime (28-28)
ts/uptime-service/src/config.js (1)
ts/uptime-service/src/index.js (1)
  • config (16-16)
ts/uptime-service/src/index.js (3)
ts/uptime-service/src/config.js (2)
  • config (7-12)
  • loadConfig (5-84)
ts/uptime-service/src/server.js (1)
  • startMetricsServer (11-64)
ts/uptime-service/src/prober.js (1)
  • startProbing (78-110)
🪛 Hadolint (2.14.0)
ts/uptime-service/Dockerfile

[warning] 19-19: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)


[info] 19-19: Avoid additional packages by specifying --no-install-recommends

(DL3015)

⏰ Context from checks skipped due to timeout of 360000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-scheduler-common
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-prover-common
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-scheduler-server
  • GitHub Check: Rust crates checks / Run e2e tests
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-prover-server
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-account
  • GitHub Check: Rust crates checks / Run tests for crates/powers-of-tau
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-scheduler-tee
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-cli
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-setup
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-circuits
  • GitHub Check: Rust crates checks / Run tests for crates/shielder-contract
  • GitHub Check: Rust crates checks / Run tests for crates/halo2-verifier
  • GitHub Check: Rust crates checks / Run tests for crates/shielder_bindings
  • GitHub Check: Rust crates checks / Run lints and format for Rust workspace
  • GitHub Check: Typescript modules checks / Run lints and tests
  • GitHub Check: Typescript modules checks / Run E2E tests on local anvil node (nonzero fee)
  • GitHub Check: Typescript modules checks / Run E2E tests on local anvil node (zero fee)
  • GitHub Check: Measure gas usage / Measure gas and verifier code size

Comment on lines +35 to +39 of `ts/uptime-service/Dockerfile`:

```dockerfile
EXPOSE 9615

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD bun run -e "fetch('http://localhost:9615/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"
```

⚠️ Potential issue | 🔴 Critical

Observe the port-shy service revealing a mismatch. From our hide, the container broadcasts and probes port 9615, yet loadConfig leaves the creature nesting on 9090 by default. Without setting PORT, every healthcheck will cry “unhealthy.” Please align the habitat by exporting the expected port inside the image or updating the runtime default.

```diff
+# Ensure the runtime listens on the probed port
+ENV PORT=9615
 # Expose metrics port
 EXPOSE 9615
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```dockerfile
# Ensure the runtime listens on the probed port
ENV PORT=9615

EXPOSE 9615

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD bun run -e "fetch('http://localhost:9615/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"
```
🤖 Prompt for AI Agents
In ts/uptime-service/Dockerfile around lines 35 to 39, the HEALTHCHECK and
EXPOSE use port 9615 while the app defaults to port 9090, causing healthchecks
to fail; fix by making the image and runtime agree — either export the expected
PORT (e.g., add ENV PORT=9615) so the service listens on 9615, or change the
EXPOSE and HEALTHCHECK to 9090 to match the app default; apply only one approach
and ensure the chosen port is documented in the container runtime config.

Comment on lines +38 to +39 of `ts/uptime-service/Dockerfile`:

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD bun run -e "fetch('http://localhost:9615/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"
```

⚠️ Potential issue | 🔴 Critical

Behold the healthcheck’s misfire. Here the script calls bun run -e, but Bun’s evaluative call is the solitary bun -e, leaving this probe to stumble before it begins. Switching to bun -e lets the sentinel report health as intended.

```diff
-  CMD bun run -e "fetch('http://localhost:9615/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"
+  CMD bun -e "fetch('http://localhost:9615/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD bun -e "fetch('http://localhost:9615/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"
```
🤖 Prompt for AI Agents
In ts/uptime-service/Dockerfile around lines 38 to 39, the HEALTHCHECK uses "bun
run -e" which is incorrect for evaluating inline JS with Bun; change the command
to use "bun -e" (i.e., replace "bun run -e" with "bun -e") so the healthcheck
executes the inline fetch expression correctly and returns proper exit codes.

Comment on lines +8 to +81 of `ts/uptime-service/src/config.js`:

```javascript
  port: parseInt(process.env.PORT || "9090", 10),
  probeInterval: parseInt(process.env.PROBE_INTERVAL || "30000", 10),
  timeout: parseInt(process.env.TIMEOUT || "5000", 10),
  endpoints: []
};

// Parse and validate endpoints
if (!process.env.ENDPOINTS) {
  console.error("ERROR: ENDPOINTS environment variable is required");
  process.exit(1);
}

try {
  const endpoints = JSON.parse(process.env.ENDPOINTS);

  if (!Array.isArray(endpoints)) {
    throw new Error("ENDPOINTS must be a JSON array");
  }

  if (endpoints.length === 0) {
    throw new Error("ENDPOINTS array cannot be empty");
  }

  // Validate each endpoint
  config.endpoints = endpoints.map((endpoint, index) => {
    if (!endpoint.name || typeof endpoint.name !== "string") {
      throw new Error(
        `Endpoint at index ${index} missing required 'name' field`
      );
    }
    if (!endpoint.url || typeof endpoint.url !== "string") {
      throw new Error(
        `Endpoint at index ${index} missing required 'url' field`
      );
    }

    return {
      name: endpoint.name,
      url: endpoint.url,
      method: endpoint.method || "GET",
      expectedStatus: endpoint.expectedStatus || 200
    };
  });

  // Check for duplicate names
  const names = config.endpoints.map((e) => e.name);
  const duplicates = names.filter(
    (name, index) => names.indexOf(name) !== index
  );
  if (duplicates.length > 0) {
    throw new Error(
      `Duplicate endpoint names found: ${duplicates.join(", ")}`
    );
  }
} catch (error) {
  console.error("ERROR: Failed to parse ENDPOINTS:", error.message);
  process.exit(1);
}

// Validate config values
if (config.port < 1 || config.port > 65535) {
  console.error("ERROR: PORT must be between 1 and 65535");
  process.exit(1);
}

if (config.probeInterval < 1000) {
  console.error("ERROR: PROBE_INTERVAL must be at least 1000ms");
  process.exit(1);
}

if (config.timeout < 100) {
  console.error("ERROR: TIMEOUT must be at least 100ms");
  process.exit(1);
}
```

⚠️ Potential issue | 🔴 Critical

A NaN slinks through the underbrush, evading validation. When this habitat receives something like PORT=mystery, parseInt produces NaN, and the later bounds checks at Lines 68-80 never stir—comparisons with NaN are always false. The service then wanders onward clutching NaN into app.listen, which will crash, and into setInterval, which collapses to a zero-delay stampede. Similar ghosts can haunt PROBE_INTERVAL and TIMEOUT. Invite firm sentries with explicit Number.isNaN checks before these values migrate further.

```diff
-  const config = {
-    port: parseInt(process.env.PORT || "9090", 10),
-    probeInterval: parseInt(process.env.PROBE_INTERVAL || "30000", 10),
-    timeout: parseInt(process.env.TIMEOUT || "5000", 10),
-    endpoints: []
-  };
+  const config = {
+    port: Number.parseInt(process.env.PORT ?? "9090", 10),
+    probeInterval: Number.parseInt(process.env.PROBE_INTERVAL ?? "30000", 10),
+    timeout: Number.parseInt(process.env.TIMEOUT ?? "5000", 10),
+    endpoints: []
+  };
+
+  if (Number.isNaN(config.port)) {
+    console.error("ERROR: PORT must be a number");
+    process.exit(1);
+  }
+
+  if (Number.isNaN(config.probeInterval)) {
+    console.error("ERROR: PROBE_INTERVAL must be a number");
+    process.exit(1);
+  }
+
+  if (Number.isNaN(config.timeout)) {
+    console.error("ERROR: TIMEOUT must be a number");
+    process.exit(1);
+  }
```
🤖 Prompt for AI Agents
In ts/uptime-service/src/config.js around lines 8 to 81, the numeric env parses
for PORT, PROBE_INTERVAL and TIMEOUT use parseInt but never check for NaN, so
invalid values (e.g. PORT=mystery) bypass later bounds checks; after parsing
each value (port, probeInterval, timeout) validate with Number.isNaN and if any
is NaN log a clear error (including the env name and raw value) and call
process.exit(1); keep the existing bounds checks afterwards so valid numbers are
still range-validated.
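The review's point generalizes to a small guarded-parsing helper. This sketch uses a hypothetical `parseEnvInt` (the actual config.js inlines these checks); it shows why the NaN check must run before any range comparison, since every comparison against NaN is false:

```javascript
// Parse a numeric env var with a default, rejecting NaN before the
// range check (a bounds check alone never catches bad input, because
// NaN < min and NaN > max are both false). parseEnvInt is an
// illustrative helper, not part of the actual config.js.
function parseEnvInt(name, raw, fallback, { min, max }) {
  const value = Number.parseInt(raw ?? fallback, 10);
  if (Number.isNaN(value)) {
    return { ok: false, error: `${name} must be a number (got "${raw}")` };
  }
  if (value < min || (max !== undefined && value > max)) {
    return { ok: false, error: `${name} out of range` };
  }
  return { ok: true, value };
}
```

In the service, an `ok: false` result would be logged and followed by `process.exit(1)`, matching the existing error-handling style.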

Comment on lines +13 to +61 of `ts/uptime-service/src/prober.js`:

```javascript
  try {
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), timeout);

    const response = await fetch(endpoint.url, {
      method: endpoint.method,
      signal: controller.signal,
      headers: {
        "User-Agent": "uptime-service/1.0"
      }
    });

    clearTimeout(timeoutId);

    const endTime = Date.now();
    const responseTime = (endTime - startTime) / 1000; // Convert to seconds

    // Check if status code matches expected
    if (response.status === endpoint.expectedStatus) {
      recordSuccess(endpoint.name, endpoint.url, responseTime);
      console.log(
        `[${new Date().toISOString()}] ✓ ${endpoint.name} - UP (${
          response.status
        }, ${responseTime.toFixed(3)}s)`
      );
      return true;
    } else {
      recordFailure(endpoint.name, endpoint.url);
      console.log(
        `[${new Date().toISOString()}] ✗ ${endpoint.name} - DOWN (status: ${
          response.status
        }, expected: ${endpoint.expectedStatus})`
      );
      return false;
    }
  } catch (error) {
    recordFailure(endpoint.name, endpoint.url);

    let errorMessage = error.message;
    if (error.name === "AbortError") {
      errorMessage = "timeout";
    }

    console.log(
      `[${new Date().toISOString()}] ✗ ${
        endpoint.name
      } - DOWN (${errorMessage})`
    );
    return false;
```

🧹 Nitpick | 🔵 Trivial

A lingering timeout rustles the canopy. When a probe fails before the allotted interval, the timeout trap remains set, only to spring harmlessly later and abort an already-fallen fetch. Over time, these ghostly timers clutter the understory. Hoist the controller and timeout handle, then clear it in a finally block so every path tidies the terrain.

```diff
-  try {
-    const controller = new AbortController();
-    const timeoutId = setTimeout(() => controller.abort(), timeout);
+  const controller = new AbortController();
+  let timeoutId;
+
+  try {
+    timeoutId = setTimeout(() => controller.abort(), timeout);
     const response = await fetch(endpoint.url, {
       method: endpoint.method,
       signal: controller.signal,
       headers: {
         "User-Agent": "uptime-service/1.0"
       }
     });
-
-    clearTimeout(timeoutId);
+    clearTimeout(timeoutId);
@@
-  } catch (error) {
+  } catch (error) {
     recordFailure(endpoint.name, endpoint.url);
@@
-  }
+  } finally {
+    if (timeoutId) {
+      clearTimeout(timeoutId);
+    }
+  }
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```javascript
  const controller = new AbortController();
  let timeoutId;

  try {
    timeoutId = setTimeout(() => controller.abort(), timeout);
    const response = await fetch(endpoint.url, {
      method: endpoint.method,
      signal: controller.signal,
      headers: {
        "User-Agent": "uptime-service/1.0"
      }
    });
    clearTimeout(timeoutId);

    const endTime = Date.now();
    const responseTime = (endTime - startTime) / 1000; // Convert to seconds

    // Check if status code matches expected
    if (response.status === endpoint.expectedStatus) {
      recordSuccess(endpoint.name, endpoint.url, responseTime);
      console.log(
        `[${new Date().toISOString()}] ✓ ${endpoint.name} - UP (${
          response.status
        }, ${responseTime.toFixed(3)}s)`
      );
      return true;
    } else {
      recordFailure(endpoint.name, endpoint.url);
      console.log(
        `[${new Date().toISOString()}] ✗ ${endpoint.name} - DOWN (status: ${
          response.status
        }, expected: ${endpoint.expectedStatus})`
      );
      return false;
    }
  } catch (error) {
    recordFailure(endpoint.name, endpoint.url);

    let errorMessage = error.message;
    if (error.name === "AbortError") {
      errorMessage = "timeout";
    }

    console.log(
      `[${new Date().toISOString()}] ✗ ${
        endpoint.name
      } - DOWN (${errorMessage})`
    );
    return false;
  } finally {
    if (timeoutId) {
      clearTimeout(timeoutId);
    }
  }
```
🤖 Prompt for AI Agents
In ts/uptime-service/src/prober.js around lines 13 to 61, the AbortController
and timeout handle are created inside the try so the timeout may still fire
after the fetch has completed; hoist the controller and timeoutId declarations
above the try, assign the timeout inside the try before calling fetch, and add a
finally block that checks if timeoutId is set and calls clearTimeout(timeoutId)
(and optionally controller.abort() only when needed) so every execution path
clears the timer and avoids lingering timeouts.


@JanKuczma JanKuczma left a comment


🚀

@kroist kroist merged commit 761de81 into main Nov 7, 2025
36 checks passed
@kroist kroist deleted the uptime-service branch November 7, 2025 09:46