
feat: cache syscall results in health check endpoints #135

Merged
nao1215 merged 6 commits into main from feat/health-cache-syscalls
Mar 13, 2026

Conversation

@nao1215 (Owner) commented Mar 13, 2026

Summary

  • Add lock-free HealthCache that caches disk_free_bytes() and process_rss_bytes() results with a configurable TTL (default 5s), eliminating redundant kernel context switches under high-frequency health polling
  • Add TRUSS_HEALTH_CACHE_TTL_SECS environment variable (0–300, default 5) to control cache duration; set to 0 to disable caching
  • Extract collect_resource_checks() helper to deduplicate ~70 lines of identical check logic between handle_health() and handle_health_ready()
  • Remove unused TransformImageRequestPayload import
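The lock-free cache described above can be sketched roughly as follows. This is a minimal illustration under assumed names (MetricCache, get_or_refresh), not the PR's actual HealthCache API: it caches a single u64 metric, where the real type holds both the disk-free and RSS values.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative lock-free TTL cache for one u64 metric.
pub struct MetricCache {
    value: AtomicU64,
    stored_at_nanos: AtomicU64, // 0 means "never populated"
    ttl_nanos: u64,
}

impl MetricCache {
    pub fn new(ttl_secs: u64) -> Self {
        Self {
            value: AtomicU64::new(0),
            stored_at_nanos: AtomicU64::new(0),
            ttl_nanos: ttl_secs.saturating_mul(1_000_000_000),
        }
    }

    fn now_nanos() -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0)
    }

    /// Returns the cached value if it is still fresh; otherwise calls
    /// `probe` (the syscall) and caches the result. A TTL of 0 disables
    /// caching, so every call probes.
    pub fn get_or_refresh(&self, probe: impl FnOnce() -> u64) -> u64 {
        let now = Self::now_nanos();
        // Acquire pairs with the Release store below, so the value write
        // is visible before the timestamp that publishes it.
        let stored_at = self.stored_at_nanos.load(Ordering::Acquire);
        if stored_at != 0
            && self.ttl_nanos > 0
            && now.saturating_sub(stored_at) < self.ttl_nanos
        {
            return self.value.load(Ordering::Relaxed);
        }
        let fresh = probe();
        self.value.store(fresh, Ordering::Relaxed);
        self.stored_at_nanos.store(now, Ordering::Release);
        fresh
    }
}
```

With a nonzero TTL, repeated calls within the window return the cached value without touching the kernel; with TTL 0, every call falls through to the probe.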

Closes #74

Test plan

  • Unit tests for HealthCache: cached values within TTL, refresh after TTL, TTL=0 bypass, default TTL in ServerConfig
  • Unit tests for TRUSS_HEALTH_CACHE_TTL_SECS env var parsing (valid value, zero)
  • All 913 existing tests pass
  • cargo clippy --tests reports zero warnings

Summary by CodeRabbit

  • New Features

    • Performance-optimized health check caching with configurable TTL (default 5s) via TRUSS_HEALTH_CACHE_TTL_SECS; setting TTL to 0 disables caching
    • Health endpoints now return unified, richer payloads including service, version, uptime, max input pixels, and per-check details
    • Health cache available as a public component for integrations
  • Tests

    • Added tests covering TTL parsing, cache behavior, and default TTL handling

@coderabbitai bot commented Mar 13, 2026

Warning

Rate limit exceeded

@nao1215 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 4 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a0e767b-f5b6-4b99-a5d4-53273c93b9a9

📥 Commits

Reviewing files that changed from the base of the PR and between 6388746 and 0614fa2.

📒 Files selected for processing (3)
  • src/adapters/server/config.rs
  • src/adapters/server/handler.rs
  • src/adapters/server/mod.rs
📝 Walkthrough

Walkthrough

Adds a lock-free HealthCache with configurable TTL, reworks /health and /health/ready to use the cache via a collect_resource_checks helper, and wires HealthCache into ServerConfig and public exports; includes TTL parsing and cache-behavior tests.

Changes

Cohort / File(s) | Summary

  • ServerConfig & env wiring (src/adapters/server/config.rs): Added pub health_cache: Arc<HealthCache> to ServerConfig; initialized in new() and from_env() using TRUSS_HEALTH_CACHE_TTL_SECS; updated Clone, Debug, and PartialEq; added env-parse and TTL tests.
  • HealthCache & handler refactor (src/adapters/server/handler.rs): Introduced HealthCache (an atomic, lock-free TTL cache for disk_free and RSS), DEFAULT_HEALTH_CACHE_TTL_SECS, disk_free and rss accessors, and collect_resource_checks(); refactored handle_health/handle_health_ready to use the cache and emit unified JSON checks and metadata.
  • Public exports & tests (src/adapters/server/mod.rs): Exported HealthCache via pub use handler::{HealthCache, ...}; added tests validating cached RSS/disk_free behavior, TTL refresh semantics, and ServerConfig default TTL wiring.
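The env wiring summarized above (TRUSS_HEALTH_CACHE_TTL_SECS, range 0–300, default 5) could be parsed along these lines. parse_ttl_secs is an illustrative stand-in for the crate's parse_env_u64_ranged helper, whose actual signature is not shown in this thread.

```rust
const DEFAULT_HEALTH_CACHE_TTL_SECS: u64 = 5;

// Hypothetical parser for the TTL env var: missing means default,
// values above 300 seconds are rejected, and 0 is accepted (disables
// caching). `raw` would come from std::env::var(...).ok().
fn parse_ttl_secs(raw: Option<&str>) -> Result<u64, String> {
    match raw {
        None => Ok(DEFAULT_HEALTH_CACHE_TTL_SECS),
        Some(s) => {
            let v: u64 = s
                .trim()
                .parse()
                .map_err(|_| format!("invalid TRUSS_HEALTH_CACHE_TTL_SECS: {s:?}"))?;
            if v > 300 {
                return Err(format!(
                    "TRUSS_HEALTH_CACHE_TTL_SECS out of range 0-300: {v}"
                ));
            }
            Ok(v)
        }
    }
}
```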

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Server as Server (actix)
    participant HealthCache as HealthCache
    participant Storage as Filesystem
    participant Proc as /proc

    Client->>Server: GET /health or /health/ready
    Server->>HealthCache: disk_free(path)?
    alt cached and fresh
        HealthCache-->>Server: cached disk_free
    else expired or missing
        HealthCache->>Storage: statvfs()/disk_free_bytes()
        Storage-->>HealthCache: disk_free_bytes
        HealthCache-->>Server: disk_free (cached)
    end
    Server->>HealthCache: rss()?
    alt cached and fresh
        HealthCache-->>Server: cached rss
    else expired or missing
        HealthCache->>Proc: read /proc/self/status (rss)
        Proc-->>HealthCache: process_rss_bytes
        HealthCache-->>Server: rss (cached)
    end
    Server->>Server: collect_resource_checks(config)
    Server-->>Client: JSON health payload (status, checks, service, version, uptime, ...)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰
I cached the hops of disk and RAM,
Five seconds calm — no syscall jam.
Checks line up, then quietly rest,
A rabbit's cache keeps probes impressed. 🥕

🚥 Pre-merge checks: 5 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change, caching syscall results in the health check endpoints, which directly aligns with the PR's primary objective.
  • Linked Issues Check: ✅ Passed. The implementation meets all acceptance criteria from #74: disk_free_bytes() and process_rss_bytes() are cached with a TTL, the TTL is configurable via an environment variable (0–300 s, default 5 s), and tests verify cache refresh behavior.
  • Out of Scope Changes Check: ✅ Passed. All changes are within scope: the HealthCache implementation, environment variable parsing, health endpoint refactoring with collect_resource_checks(), and removal of an unused import all align with issue #74's objectives.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which exceeds the required 80.00% threshold.


@coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/adapters/server/handler.rs`:
- Around line 557-565: The cached timestamp/value pair must use acquire/release
ordering: in the reader path replace the Relaxed load of disk_free_at with an
Acquire load and keep the disk_free load Relaxed; in the writer keep storing the
disk_free value before publishing the timestamp but change the timestamp store
to Release (use disk_free.store(..., Relaxed) then disk_free_at.store(now,
Release)). Apply the same pattern to the rss/rss_at pair so readers load rss_at
with Acquire and writers publish rss value then rss_at with Release.
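The publish pattern the comment asks for can be shown in isolation. Pair below is a hypothetical two-field cache slot, not code from the PR: the writer stores the value first (Relaxed), then publishes the timestamp with Release; the reader loads the timestamp with Acquire before reading the value, so it never observes a fresh timestamp paired with a stale value.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative value/timestamp pair matching the review's suggested ordering.
struct Pair {
    value: AtomicU64,
    published_at: AtomicU64,
}

impl Pair {
    fn new() -> Self {
        Self {
            value: AtomicU64::new(0),
            published_at: AtomicU64::new(0),
        }
    }

    fn publish(&self, value: u64, now: u64) {
        // Store the payload first; the Release store of the timestamp
        // then acts as the publication point.
        self.value.store(value, Ordering::Relaxed);
        self.published_at.store(now, Ordering::Release);
    }

    fn read(&self) -> (u64, u64) {
        // Acquire pairs with the Release above: once we see the new
        // timestamp, the matching value store is guaranteed visible.
        let at = self.published_at.load(Ordering::Acquire);
        let value = self.value.load(Ordering::Relaxed);
        (value, at)
    }
}
```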

In `@src/adapters/server/mod.rs`:
- Around line 123-124: The import list for the server module is missing
TransformImageRequestPayload which causes unresolved type errors when the s3
feature is enabled; restore TransformImageRequestPayload in the same use block
that currently imports HealthCache, TransformSlot, TransformSourcePayload,
WatermarkSource, etc., so the s3-only tests referencing
TransformImageRequestPayload (used in the test functions guarded by
#[cfg(feature = "s3")]) can compile successfully.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 420cbf7a-9845-44ad-a021-557f94187337

📥 Commits

Reviewing files that changed from the base of the PR and between 2d86a9f and 7a0e06b.

📒 Files selected for processing (3)
  • src/adapters/server/config.rs
  • src/adapters/server/handler.rs
  • src/adapters/server/mod.rs

@coderabbitai bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (1)
src/adapters/server/handler.rs (1)

663-678: Inconsistent response format for 503 status codes.

The draining case (line 660) uses HttpResponse::problem("503 Service Unavailable", body) while resource check failures (line 676) use HttpResponse::json("503 Service Unavailable", body). Both return the same HTTP status but with potentially different Content-Type headers (application/problem+json vs application/json).

If this is intentional (draining is an RFC 7807 "problem" while resource failures are health-check format), consider adding a brief comment. Otherwise, consider using consistent response types for 503 responses.
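For contrast, here is a rough sketch of the two 503 payload shapes the nitpick distinguishes. The JSON bodies and helper names below are illustrative, since the internals of the crate's HttpResponse::problem and HttpResponse::json are not shown in this thread.

```rust
// Hypothetical body builders: an RFC 7807 "problem detail" object served
// as application/problem+json, versus a plain health-check object served
// as application/json. Each returns (body, content_type).
fn problem_body(detail: &str) -> (String, &'static str) {
    (
        format!(
            r#"{{"type":"about:blank","title":"Service Unavailable","status":503,"detail":"{detail}"}}"#
        ),
        "application/problem+json",
    )
}

fn health_body(all_ok: bool) -> (String, &'static str) {
    let status = if all_ok { "ok" } else { "unhealthy" };
    (
        format!(r#"{{"status":"{status}","checks":[]}}"#),
        "application/json",
    )
}
```

Both shapes carry the same HTTP status; the review's point is only that a client parsing 503 responses sees two different media types depending on which branch fired.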

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/adapters/server/handler.rs` around lines 663 - 678, The 503 responses use
inconsistent content types: the draining branch calls HttpResponse::problem("503
Service Unavailable", body) while the resource-check failure branch calls
HttpResponse::json("503 Service Unavailable", body); decide on a consistent
response shape and update the resource-check branch (the code using
collect_resource_checks and the all_ok variable) to match the draining branch or
vice versa, e.g., change the resource failure return to
HttpResponse::problem(...) or change the draining return to
HttpResponse::json(...), and if the difference is intentional, add a brief
comment next to the HttpResponse::problem and HttpResponse::json calls
explaining why RFC7807 problem+json is used for draining but plain
application/json is used for resource failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/adapters/server/config.rs`:
- Around line 1322-1324: Update the from_env rustdoc to document the new
TRUSS_HEALTH_CACHE_TTL_SECS environment variable: add an entry describing
TRUSS_HEALTH_CACHE_TTL_SECS (accepted range 0–300, default value taken from
super::handler::DEFAULT_HEALTH_CACHE_TTL_SECS) so operators know it controls the
health_cache_ttl_secs used when constructing the HealthCache via
super::handler::HealthCache::new; ensure the doc entry matches the
parse_env_u64_ranged call and notes units (seconds) and default behavior when
unset.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ebcff73a-01af-4a69-8541-e12dda39aaeb

📥 Commits

Reviewing files that changed from the base of the PR and between 7a0e06b and c1abe1e.

📒 Files selected for processing (3)
  • src/adapters/server/config.rs
  • src/adapters/server/handler.rs
  • src/adapters/server/mod.rs

@coderabbitai bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (2)
src/adapters/server/config.rs (2)

1329-1331: Add one integration test for the actual wiring.

These tests stop at parse_env_u64_ranged(). A regression in ServerConfig::from_env() or ServerConfig::new() would still pass, so please assert health_cache.ttl_nanos on the constructed config as well.

Example test
#[test]
#[serial]
fn from_env_wires_health_cache_ttl_secs() {
    let _env = ScopedEnv::set("TRUSS_HEALTH_CACHE_TTL_SECS", "10");
    let config = ServerConfig::from_env().unwrap();
    assert_eq!(config.health_cache.ttl_nanos, 10 * 1_000_000_000);
}

Also applies to: 2341-2355

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/adapters/server/config.rs` around lines 1329 - 1331, Add an integration
test that verifies the health cache TTL is actually wired into ServerConfig
instead of stopping at parse_env_u64_ranged; call ServerConfig::from_env() (or
ServerConfig::new() if used in that codepath) with a ScopedEnv setting like
"TRUSS_HEALTH_CACHE_TTL_SECS" and assert the resulting
config.health_cache.ttl_nanos equals the expected seconds * 1_000_000_000,
referencing the HealthCache constructed via super::handler::HealthCache to
ensure the field is propagated.

429-433: Expose TTL, not the cache instance.

ServerConfig::new() hardcodes the default cache TTL, but HealthCache::new is pub(super), so embedders using the public ServerConfig API cannot override this programmatically. A with_health_cache_ttl_secs() builder would keep the knob available without leaking HealthCache as public runtime state.

Possible API shape
pub fn with_health_cache_ttl_secs(mut self, ttl_secs: u64) -> Self {
    self.health_cache = Arc::new(super::handler::HealthCache::new(ttl_secs));
    self
}

Also applies to: 807-809

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/adapters/server/config.rs` around lines 429 - 433, The ServerConfig
currently exposes the HealthCache instance publicly and hardcodes its TTL in
ServerConfig::new(), preventing callers from setting TTL; make the health_cache
field non-public (remove pub) and add a builder method pub fn
with_health_cache_ttl_secs(mut self, ttl_secs: u64) -> Self that replaces
self.health_cache = Arc::new(super::handler::HealthCache::new(ttl_secs)) so
callers can set the TTL without leaking HealthCache; also ensure
ServerConfig::new initializes health_cache with the default TTL and update other
sites that relied on the public field (e.g., the usages around the other
referenced lines) to use the new API or internal accessors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/adapters/server/handler.rs`:
- Around line 770-771: The docstring claims a "writability" probe but the
implementation only calls config.storage_root.is_dir(); either update the
comment to remove "writability" or implement an actual write probe: in the same
handler where the storage health check runs (the function currently using
config.storage_root.is_dir()), attempt a safe write-and-remove (e.g., create a
temporary file under config.storage_root with OpenOptions create+write, flush,
then delete) and return a failure if that write fails; ensure you still check
is_dir() first and handle permission errors cleanly.
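The write probe the comment suggests could look roughly like this. The helper name and probe file name are illustrative, and a production version would want a unique per-process file name to avoid collisions between concurrent health checks.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

// Hypothetical writability probe: verify the directory exists, then
// create, write, flush, and remove a temporary file under it. Any
// failure (permissions, read-only mount, full disk) reports unhealthy.
fn storage_writable(root: &Path) -> bool {
    if !root.is_dir() {
        return false;
    }
    let probe = root.join(".health_write_probe");
    let ok = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(&probe)
        .and_then(|mut f| {
            f.write_all(b"probe")?;
            f.flush()
        })
        .is_ok();
    // Best-effort cleanup; a leftover probe file is harmless.
    let _ = fs::remove_file(&probe);
    ok
}
```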


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 22527b32-6306-4ca5-9385-a09c59b38729

📥 Commits

Reviewing files that changed from the base of the PR and between c1abe1e and 6388746.

📒 Files selected for processing (2)
  • src/adapters/server/config.rs
  • src/adapters/server/handler.rs

@nao1215 nao1215 merged commit 127617d into main Mar 13, 2026
17 checks passed
@nao1215 nao1215 deleted the feat/health-cache-syscalls branch March 13, 2026 06:53

Development

Successfully merging this pull request may close these issues:

  • Cache syscall results in health check endpoints