Skip to content

Commit 61b3b63

Browse files
authored
fix(executor): error logging in HTTP executor (#498)
Log errors with `error` level to improve observability. Adds a styleguide for gemini, to avoid logging things in inefficient way.
1 parent 5576ac1 commit 61b3b63

File tree

4 files changed

+237
-10
lines changed

4 files changed

+237
-10
lines changed

.gemini/styleguide.md

Lines changed: 212 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,212 @@
1+
# Gemini Code Assistant - Style Guide (Performance‑First)
2+
3+
> TL;DR: Treat performance as a feature. Prefer zero/low‑overhead code paths, avoid accidental allocations, and never pay logging costs on hot paths unless explicitly gated.
4+
5+
---
6+
7+
## Quick Facts (Tracing Semantics You Must Know)
8+
9+
- **Default `#[instrument]` level is `debug`.**
10+
- Tracing levels are ordered: `error > warn > info > debug > trace`. Enabling `info` **does not** enable `debug`.
11+
- **Field expressions inside `#[instrument(... fields(...))]` are evaluated at function entry** (macro expansion time for that call site), before any subscriber decides to record or drop the span. If those expressions do heavy work (e.g., `pretty_print`, `format!`, `to_string`, serialization), you **pay that cost** regardless of whether the span is ultimately recorded.
12+
- Therefore, any non‑trivial work inside `fields(...)` is a foot‑gun even when you think "it's only at `debug`."
13+
14+
---
15+
16+
## Blocking Policy (exact)
17+
18+
Gemini must **request changes (block the PR)** when any of the following appear:
19+
20+
- **A. `#[tracing::instrument]` uses the default level (`info`) on hot paths.**
21+
`info` spans are commonly enabled in production. Span creation + any eager field evaluation will run on hot paths. Use `level = "trace"` for hot code and record richer details lazily (see patterns below) when a higher level is actually enabled.
22+
23+
- **B. `#[tracing::instrument]` uses `level = "debug"` (or higher verbosity) and either condition holds:**
24+
1) **`skip_all` is missing**, or
25+
2) **`fields(...)` performs any computation** (function/method calls, formatting, allocation, serialization, cloning, traversals).
26+
Even if the runtime filter disables that `debug` span, those field expressions are still evaluated at function entry - wasted work.
27+
28+
- **C. Any heavy work is performed just to log/trace on a hot path** without a guard (e.g., `tracing::enabled!` or a feature flag).
29+
30+
> **Baseline for hot paths:** `#[instrument(skip_all, level = "trace", fields(... cheap only ...))]` + gated/lazy recording for anything expensive or no instrumentation at all.
31+
32+
---
33+
34+
## Goals
35+
36+
- **Always review PRs through a performance lens.**
37+
- **Block** changes that add avoidable runtime overhead (allocations, stringification, extra clones, unnecessary async/await boundaries, unbounded maps/vecs, etc.).
38+
- **Strict rules for `tracing`** so instrumentation doesn't silently tax hot paths.
39+
- **Review PRs through a "best-practices in Rust" lens.**
40+
41+
---
42+
43+
## Default Review Posture
44+
45+
- If the change touches a **hot path** (request handling, routing, query planning, JSON/IO, allocators, pools), assume **budget = near zero**.
46+
- If a PR adds logging/tracing/metrics:
47+
- Confirm **runtime cost ≈ 0** when disabled or below the current level.
48+
- If cost is non‑trivial, **require gating** (`tracing::enabled!`, feature flags) or relocation to cold paths.
49+
- Prefer **benchmarks or micro‑benchmarks** for anything that plausibly affects p95/p99 latency or memory.
50+
51+
---
52+
53+
## `tracing` Rules (Allowed vs Banned)
54+
55+
### ✅ Allowed (safe‑by‑default)
56+
57+
```rust
58+
use tracing::instrument;
59+
60+
#[instrument(skip_all, level = "trace", fields(user_id = %user.id, req_id = %req.id))]
61+
async fn handle(user: &User, req: &Request) -> Result<Response> {
62+
// Optional event with cheap fields only
63+
tracing::trace!(%user.id, %req.id, "entered handler");
64+
Ok(...)
65+
}
66+
```
67+
68+
- `level = "trace"` on `instrument` for hot paths.
69+
- `skip_all` + **explicit, cheap fields** (`%`/`?` over already‑available, small values).
70+
- No allocations or heavy computation in attribute arguments.
71+
72+
### ❌ Banned (examples)
73+
74+
```rust
75+
// 1) Default level (info) + no skip_all.
76+
#[instrument] // reject
77+
fn hot_path(a: &Big, b: &Bigger) { /* ... */ }
78+
79+
// 2) Debug‑level span with expensive field computation.
80+
#[instrument(level = "debug", fields(details = %self.pretty_print()))] // reject
81+
fn parse(&self) { /* ... */ }
82+
83+
// 3) Any format/to_string/clone/serialize in fields
84+
#[instrument(skip_all, level="info", fields(blob = %format!("{:?}", self.blob)))] // reject
85+
async fn foo(self) { /* ... */ }
86+
```
87+
88+
---
89+
90+
## Patterns for Safe, Lazy Recording
91+
92+
When you truly need expensive fields, **gate them** and **record lazily**:
93+
94+
### Record after `enabled!` check
95+
96+
```rust
97+
use tracing::{self, Level};
98+
99+
#[instrument(skip_all, level = "trace", fields(details = tracing::field::Empty))]
100+
fn process(&self) {
101+
if tracing::enabled!(Level::DEBUG) {
102+
// Compute only when actually enabled at runtime.
103+
let details = self.pretty_print(); // expensive
104+
tracing::Span::current().record("details", &tracing::field::display(details));
105+
}
106+
// ...
107+
}
108+
```
109+
110+
---
111+
112+
## Field Hygiene
113+
114+
- **Cheap only:** ids, small scalars, already‑borrowed references.
115+
- **Forbidden in fields:** `.pretty_print()`, `format!`, `.to_string()`, (de)serialization, cloning big structs, traversing collections.
116+
- Prefer `fields(foo = %id)` over `fields(foo = id.to_string())`.
117+
118+
---
119+
120+
## Reviewer Checklist (what Gemini should look for)
121+
122+
- [ ] Any `#[instrument]`?
123+
- [ ] Uses `skip_all`? If **no****request changes**.
124+
- [ ] Uses `level = "trace"` for hot paths? If default or `level = "info"` on hot code → **request changes**.
125+
- [ ] **No function calls/allocs** inside `fields(...)`? If present → **request changes** with Pattern A/B.
126+
- [ ] New logs/metrics in hot paths?
127+
- [ ] Gated via `enabled!` or a feature flag?
128+
- [ ] New `clone()` / `to_string()` / `collect()` on hot paths? Ask for justification or refactor.
129+
- [ ] Async boundaries added? Avoid splitting critical sections if not needed.
130+
- [ ] Allocations: prefer stack/arena/borrowed views; avoid intermediate `String`/`Vec` unless necessary.
131+
- [ ] Config toggles: if a costly path is optional, require a **feature flag** or runtime knob.
132+
133+
---
134+
135+
## Comment Templates (Gemini → PR)
136+
137+
**1) Instrumentation level & missing `skip_all`**
138+
139+
> Performance check: `#[instrument]` here uses the default level (`info`) and/or doesn't specify `skip_all`.
140+
> This creates spans at common prod filters and may evaluate fields eagerly.
141+
> Please switch to:
142+
> ```rust
143+
> #[instrument(skip_all, level = "trace", fields(... only cheap fields ...))]
144+
> ```
145+
> If you need richer context, record it after an `enabled!` guard.
146+
147+
**2) Expensive field computation**
148+
149+
> The `fields(...)` contains an eager computation (`pretty_print`/`format!`/`to_string`).
150+
> These are evaluated at function entry even if the span isn't ultimately recorded.
151+
> Please either:
152+
> - Gate with `tracing::enabled!(Level::DEBUG)` + `Span::current().record(...)`, or
153+
> - Use a lazy wrapper and still gate if expensive (seePatterns for Safe, Lazy Recording”).
154+
155+
**3) Hotpath logging**
156+
157+
> New logs on a hot path detected. Can we gate by level/feature or move to a colder edge?
158+
> Aim for zero cost when disabled.
159+
160+
---
161+
162+
## QuickFix Snippets Gemini Can Suggest
163+
164+
- Replace:
165+
```rust
166+
#[instrument]
167+
```
168+
With:
169+
```rust
170+
#[instrument(skip_all, level = "trace")]
171+
```
172+
173+
- Replace (eager fields):
174+
```rust
175+
#[instrument(skip_all, level="debug", fields(details = %self.pretty_print()))]
176+
```
177+
With (gated record):
178+
```rust
179+
#[instrument(skip_all, level="trace", fields(details = tracing::field::Empty))]
180+
fn f(&self) {
181+
if tracing::enabled!(tracing::Level::DEBUG) {
182+
let d = self.pretty_print();
183+
tracing::Span::current().record("details", &tracing::field::display(d));
184+
}
185+
}
186+
```
187+
188+
---
189+
190+
## When It's Okay to Use `info`
191+
192+
- Truly **cold** admin/maintenance paths (schema reload, health checks, CLI tools) **may** use `level = "info"` with **cheap** fields only.
193+
- Still **avoid** expensive field computation in attributes; prefer gated record.
194+
195+
---
196+
197+
## Rationale (why we're strict)
198+
199+
- `#[instrument]` **creates spans** and **evaluates its `fields(...)`** at function entry. With the default `info` level, this happens under typical production filters; with `debug`, the field expressions are still evaluated even if the span is dropped later.
200+
- Stringification/pretty printing can dominate latency on tight loops; a handful of these across hot paths quickly becomes a measurable tax.
201+
- The safe baseline (`skip_all`, `level="trace"`, gated recording) keeps the lights on without sacrificing debuggability when you need it.
202+
203+
---
204+
205+
## PR Author Checklist (pre‑submit)
206+
207+
- [ ] No default/`info`/`debug` `#[instrument]` on hot paths unless justified and still `skip_all` + cheap fields only.
208+
- [ ] Every `#[instrument]` has `skip_all`.
209+
- [ ] No function calls/allocs in `fields(...)`.
210+
- [ ] Hot‑path logs/metrics are gated or relocated.
211+
212+
---

bin/router/src/logger/mod.rs

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,6 @@ pub fn configure_logging(config: &LoggingConfig) {
2727
LogFormat::Json => fmt::Layer::<Registry>::default()
2828
.json()
2929
.with_timer(timer)
30-
.with_span_events(FmtSpan::CLOSE)
3130
.boxed(),
3231
LogFormat::PrettyCompact => fmt::Layer::<Registry>::default()
3332
.compact()

lib/executor/src/executors/common.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ pub trait SubgraphExecutor {
1111
&self,
1212
execution_request: HttpExecutionRequest<'a>,
1313
) -> HttpExecutionResponse;
14+
1415
fn to_boxed_arc<'a>(self) -> Arc<Box<dyn SubgraphExecutor + Send + Sync + 'a>>
1516
where
1617
Self: Sized + Send + Sync + 'a,

lib/executor/src/executors/http.rs

Lines changed: 24 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -190,21 +190,30 @@ impl HTTPSubgraphExecutor {
190190
buffer.put_slice(b"}");
191191
buffer.freeze()
192192
}
193+
194+
fn log_error(&self, error: &SubgraphExecutorError) {
195+
tracing::error!(
196+
error = error as &dyn std::error::Error,
197+
"Subgraph executor error"
198+
);
199+
}
193200
}
194201

195202
#[async_trait]
196203
impl SubgraphExecutor for HTTPSubgraphExecutor {
204+
#[tracing::instrument(skip_all, fields(subgraph_name = self.subgraph_name))]
197205
async fn execute<'a>(
198206
&self,
199207
execution_request: HttpExecutionRequest<'a>,
200208
) -> HttpExecutionResponse {
201209
let body = match self.build_request_body(&execution_request) {
202210
Ok(body) => body,
203211
Err(e) => {
212+
self.log_error(&e);
204213
return HttpExecutionResponse {
205214
body: self.error_to_graphql_bytes(e),
206215
headers: Default::default(),
207-
}
216+
};
208217
}
209218
};
210219

@@ -222,10 +231,13 @@ impl SubgraphExecutor for HTTPSubgraphExecutor {
222231
body: shared_response.body,
223232
headers: shared_response.headers,
224233
},
225-
Err(e) => HttpExecutionResponse {
226-
body: self.error_to_graphql_bytes(e),
227-
headers: Default::default(),
228-
},
234+
Err(e) => {
235+
self.log_error(&e);
236+
HttpExecutionResponse {
237+
body: self.error_to_graphql_bytes(e),
238+
headers: Default::default(),
239+
}
240+
}
229241
};
230242
}
231243

@@ -261,10 +273,13 @@ impl SubgraphExecutor for HTTPSubgraphExecutor {
261273
body: shared_response.body.clone(),
262274
headers: shared_response.headers.clone(),
263275
},
264-
Err(e) => HttpExecutionResponse {
265-
body: self.error_to_graphql_bytes(e.clone()),
266-
headers: Default::default(),
267-
},
276+
Err(e) => {
277+
self.log_error(&e);
278+
HttpExecutionResponse {
279+
body: self.error_to_graphql_bytes(e.clone()),
280+
headers: Default::default(),
281+
}
282+
}
268283
}
269284
}
270285
}

0 commit comments

Comments
 (0)