Skip to content

Commit 03ee9d1

Browse files
Use structured logging + add traces & metrics
Changes all of `oxen` to use structured logging, allowing for all `log::` statements to be asynchronously written to a rotating log file as a sequence of newline delimited JSON objects. Configuration is via the `OXEN_LOG_FILE` env var: setting it to `1` or `true` has it write logs in `~/.oxen/logs/`. Setting it to a directory has it write to log files there. Leaving it unset means all logging falls back to writing to STDERR. All `log` statements are now translated to `tracing` Events via the new `tracing-log` dependency. This means that if they're active in a trace, they will be associated with it. `oxen-server` now collects and exposes a Prometheus-compatible metrics endpoint as `/metrics`. This runs on a _different_ port than the main server process. It defaults to `9090` and can be changed with `OXEN_METRICS_PORT`. New trace spans have been added to every HTTP endpoint call in `oxen-rust` and to workspace upload functions.
1 parent 0691949 commit 03ee9d1

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

95 files changed

+1951
-70
lines changed

config_for_tracing_metrics.md

Lines changed: 331 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,331 @@
1+
# Tracing & Metrics Configuration Guide
2+
3+
Oxen ships with structured logging (via the Rust `tracing` crate) and Prometheus-compatible
4+
metrics (via the `metrics` crate). Both are wired into the CLI (`oxen`) and server
5+
(`oxen-server`) binaries out of the box.
6+
7+
---
8+
9+
## Overview
10+
11+
| Concern | Crate | How it works |
12+
|---------|-------|-------------|
13+
| **Structured logging** | `tracing` + `tracing-subscriber` + `tracing-log` | A `tracing-log` bridge captures all legacy `log::` calls and routes them through `tracing`. Two subscriber layers are always registered: a human-readable **stderr** layer, and an optional **JSON file** layer. |
14+
| **Metrics** | `metrics` + `metrics-exporter-prometheus` | All counters and histograms use the `metrics::counter!` / `metrics::histogram!` macros. In the server a Prometheus HTTP exporter is installed automatically. In the CLI the default no-op recorder silently discards all metric calls (zero overhead). |
15+
16+
There are ~580 metric instrumentation points across ~120 source files covering the CLI,
17+
lib (repositories, API client, merkle tree), and server controllers.
18+
19+
---
20+
21+
## Environment Variables Reference
22+
23+
| Variable | Default | Applies to | Description |
24+
|----------|---------|-----------|-------------|
25+
| `RUST_LOG` | `info` | CLI & Server | Filter directive for `tracing` (standard `EnvFilter` syntax). Controls what appears on stderr **and** in log files. |
26+
| `OXEN_LOG_FILE` | *(unset)* | CLI & Server | Opt-in to JSON file logging. Set to `1` / `true` for the default directory (`~/.oxen/logs/`), or set to a directory path. |
27+
| `OXEN_METRICS_PORT` | `9090` | Server only | TCP port for the Prometheus metrics HTTP endpoint (`/metrics`). |
28+
29+
---
30+
31+
## Use Case 1: Stderr Only (Default)
32+
33+
This is the zero-configuration default. All log output goes to stderr in a
34+
human-readable format.
35+
36+
### CLI
37+
38+
```bash
39+
# Run any oxen command — logs appear on stderr at `info` level
40+
oxen status
41+
42+
# Increase verbosity to see debug-level output from liboxen
43+
RUST_LOG=debug oxen add .
44+
45+
# Show only warnings and errors
46+
RUST_LOG=warn oxen push origin main
47+
48+
# Target a specific module
49+
RUST_LOG="warn,liboxen::repositories::add=debug" oxen add .
50+
```
51+
52+
### Server
53+
54+
```bash
55+
# Start the server — info-level logs on stderr by default
56+
oxen-server start
57+
58+
# Debug-level logging for everything
59+
RUST_LOG=debug oxen-server start
60+
61+
# Debug only for server controllers, warn for everything else
62+
RUST_LOG="warn,oxen_server::controllers=debug" oxen-server start
63+
```
64+
65+
### Sample stderr output
66+
67+
```
68+
2026-03-09T12:00:01.234Z INFO liboxen::repositories::add: staging files path="images/"
69+
2026-03-09T12:00:01.567Z DEBUG liboxen::core::v_latest::index::commit_merkle_tree: loading merkle tree commit_id="abc123"
70+
2026-03-09T12:00:02.890Z INFO liboxen::repositories::commits: committed message="add training data"
71+
```
72+
73+
### Filter syntax
74+
75+
`RUST_LOG` uses the standard `EnvFilter` syntax:
76+
77+
| Example | Meaning |
78+
|---------|---------|
79+
| `debug` | Everything at debug and above |
80+
| `warn,liboxen=debug` | Default warn, but debug for `liboxen` |
81+
| `info,liboxen::repositories=trace` | Trace-level for repo ops, info elsewhere |
82+
| `error` | Only errors |
83+
84+
---
85+
86+
## Use Case 2: JSON File Logging
87+
88+
Opt in by setting `OXEN_LOG_FILE`. A rolling daily log file is written in
89+
JSON-per-line format using `tracing-appender`. Stderr output continues as normal.
90+
91+
### Enable with default directory
92+
93+
```bash
94+
# Logs written to ~/.oxen/logs/oxen.YYYY-MM-DD
95+
OXEN_LOG_FILE=1 oxen push origin main
96+
97+
# Same for the server — files named oxen-server.YYYY-MM-DD
98+
OXEN_LOG_FILE=true oxen-server start
99+
```
100+
101+
### Enable with custom directory
102+
103+
```bash
104+
OXEN_LOG_FILE=/var/log/oxen oxen-server start
105+
# Logs written to /var/log/oxen/oxen-server.YYYY-MM-DD
106+
```
107+
108+
The directory is created automatically if it does not exist.
109+
110+
### Sample JSON output
111+
112+
Each line is a self-contained JSON object:
113+
114+
```json
115+
{
116+
"timestamp": "2026-03-09T12:00:01.234567Z",
117+
"level": "INFO",
118+
"target": "liboxen::repositories::add",
119+
"filename": "src/lib/src/repositories/add.rs",
120+
"line_number": 38,
121+
"threadId": 1,
122+
"fields": {
123+
"message": "staging files",
124+
"path": "images/"
125+
},
126+
"span": {
127+
"name": "add"
128+
}
129+
}
130+
```
131+
132+
### Inspecting logs with jq
133+
134+
```bash
135+
# Stream all errors from today's log
136+
cat ~/.oxen/logs/oxen-server.2026-03-09 | jq 'select(.level == "ERROR")'
137+
138+
# Count log entries per level
139+
cat ~/.oxen/logs/oxen-server.2026-03-09 | jq -r '.level' | sort | uniq -c
140+
141+
# Show all entries from a specific module
142+
cat ~/.oxen/logs/oxen-server.2026-03-09 | jq 'select(.target | startswith("liboxen::repositories::push"))'
143+
144+
# Extract timing histograms recorded in spans
145+
cat ~/.oxen/logs/oxen-server.2026-03-09 | jq 'select(.fields.message | contains("duration"))'
146+
```
147+
148+
### Log rotation
149+
150+
Files roll daily by filename suffix (e.g. `oxen-server.2026-03-09`,
151+
`oxen-server.2026-03-10`). Old files are **not** automatically deleted — set up
152+
your own rotation policy (e.g. `logrotate`, a cron job, or manual cleanup).
153+
154+
---
155+
156+
## Use Case 3: Prometheus Metrics (Server)
157+
158+
The server automatically starts a Prometheus HTTP exporter on startup. No
159+
additional configuration is needed for the default port.
160+
161+
### Default (port 9090)
162+
163+
```bash
164+
oxen-server start
165+
# Metrics available at http://localhost:9090/metrics
166+
```
167+
168+
### Custom port
169+
170+
```bash
171+
OXEN_METRICS_PORT=9191 oxen-server start
172+
# Metrics available at http://localhost:9191/metrics
173+
```
174+
175+
### Verify with curl
176+
177+
```bash
178+
curl -s http://localhost:9090/metrics | head -20
179+
```
180+
181+
### Sample metric output
182+
183+
```
184+
# HELP oxen_server_commits_upload_total oxen_server_commits_upload_total
185+
# TYPE oxen_server_commits_upload_total counter
186+
oxen_server_commits_upload_total 42
187+
188+
# HELP oxen_server_commits_upload_duration_seconds oxen_server_commits_upload_duration_seconds
189+
# TYPE oxen_server_commits_upload_duration_seconds summary
190+
oxen_server_commits_upload_duration_seconds{quantile="0.5"} 1.234
191+
oxen_server_commits_upload_duration_seconds{quantile="0.9"} 3.456
192+
oxen_server_commits_upload_duration_seconds{quantile="0.99"} 7.890
193+
oxen_server_commits_upload_duration_seconds_sum 123.456
194+
oxen_server_commits_upload_duration_seconds_count 42
195+
196+
# HELP oxen_errors_total oxen_errors_total
197+
# TYPE oxen_errors_total counter
198+
oxen_errors_total{module="repositories::push",error="RemoteBranchNotFound"} 3
199+
```
200+
201+
### Prometheus scrape config
202+
203+
Add a scrape target to your `prometheus.yml`:
204+
205+
```yaml
206+
scrape_configs:
207+
- job_name: "oxen-server"
208+
scrape_interval: 15s
209+
static_configs:
210+
- targets: ["localhost:9090"]
211+
```
212+
213+
---
214+
215+
## Metrics Without a Collection Endpoint (CLI)
216+
217+
The CLI calls `init_metrics_noop()` at startup. The Rust `metrics` crate's
218+
default behavior when no recorder is installed is to silently discard all
219+
counter/histogram calls. This means:
220+
221+
- **Zero runtime overhead** — macro calls compile to no-ops when no recorder exists.
222+
- **No port is opened** — the CLI never listens on a network socket for metrics.
223+
- All the same `metrics::counter!` / `metrics::histogram!` instrumentation in
224+
the shared `liboxen` library simply does nothing when invoked from the CLI.
225+
226+
There is nothing to configure for this case.
227+
228+
---
229+
230+
## Metric Naming Reference
231+
232+
All metrics follow the pattern `oxen_{component}_{operation}_total` (counters)
233+
or `oxen_{component}_{operation}_duration_seconds` (histograms).
234+
235+
### Counters (`_total`)
236+
237+
| Prefix | Component | Examples |
238+
|--------|-----------|---------|
239+
| `oxen_cli_*` | CLI commands | `oxen_cli_branch_total`, `oxen_cli_delete_remote_total` |
240+
| `oxen_repo_init_*` | Repository init | `oxen_repo_init_init_total`, `oxen_repo_init_init_with_version_total` |
241+
| `oxen_repo_commit_*` | Commits | `oxen_repo_commit_commit_total`, `oxen_repo_commit_head_commit_total`, `oxen_repo_commit_list_total` |
242+
| `oxen_repo_clone_*` | Clone | `oxen_repo_clone_clone_total`, `oxen_repo_clone_deep_clone_url_total` |
243+
| `oxen_repo_checkout_*` | Checkout | `oxen_repo_checkout_checkout_total`, `oxen_repo_checkout_checkout_theirs_total` |
244+
| `oxen_repo_status_*` | Status | `oxen_repo_status_status_total`, `oxen_repo_status_status_from_opts_total` |
245+
| `oxen_repo_diff_*` | Diffs | `oxen_repo_diff_diff_total`, `oxen_repo_diff_diff_commits_total`, `oxen_repo_diff_tabular_total` |
246+
| `oxen_repo_add_*` | Staging | `oxen_repo_add_add_total` |
247+
| `oxen_repo_push_*` | Push | `oxen_repo_push_push_total` |
248+
| `oxen_repo_pull_*` | Pull | `oxen_repo_pull_pull_total` |
249+
| `oxen_repo_merge_*` | Merge | `oxen_repo_merge_merge_total` |
250+
| `oxen_repo_entries_*` | Entry listing | `oxen_repo_entries_get_directory_total`, `oxen_repo_entries_list_directory_total` |
251+
| `oxen_repo_tree_*` | Tree operations | `oxen_repo_tree_get_root_total` |
252+
| `oxen_merkle_*` | Merkle tree internals | `oxen_merkle_from_commit_total`, `oxen_merkle_load_children_total` |
253+
| `oxen_client_*` | API client (HTTP calls) | `oxen_client_commits_post_push_complete_total`, `oxen_client_entries_download_entry_total` |
254+
| `oxen_server_commits_*` | Server commit endpoints | `oxen_server_commits_upload_total`, `oxen_server_commits_create_total` |
255+
| `oxen_server_repositories_*` | Server repo endpoints | `oxen_server_repositories_create_total`, `oxen_server_repositories_delete_total` |
256+
| `oxen_server_branches_*` | Server branch endpoints | `oxen_server_branches_index_total`, `oxen_server_branches_create_total` |
257+
| `oxen_server_workspaces_*` | Server workspace endpoints | `oxen_server_workspaces_create_total`, `oxen_server_workspaces_commit_total` |
258+
| `oxen_server_tree_*` | Server tree endpoints | `oxen_server_tree_create_nodes_total`, `oxen_server_tree_download_tree_total` |
259+
| `oxen_server_diff_*` | Server diff endpoints | `oxen_server_diff_commits_total`, `oxen_server_diff_file_total` |
260+
| `oxen_server_file_*` | Server file endpoints | `oxen_server_file_get_total`, `oxen_server_file_put_total` |
261+
| `oxen_server_merger_*` | Server merge endpoint | `oxen_server_merger_merge_total` |
262+
| `oxen_errors_total` | Error tracking | Labels: `module`, `error` — e.g. `oxen_errors_total{module="repositories::push",error="RemoteBranchNotFound"}` |
263+
264+
### Histograms (`_duration_seconds`)
265+
266+
| Metric | What it measures |
267+
|--------|-----------------|
268+
| `oxen_repo_add_add_duration_seconds` | Time to stage files |
269+
| `oxen_repo_commit_commit_duration_seconds` | Time to create a commit |
270+
| `oxen_repo_clone_clone_duration_seconds` | Time to clone a repository |
271+
| `oxen_repo_checkout_checkout_duration_seconds` | Time to checkout a branch/commit |
272+
| `oxen_repo_push_push_duration_seconds` | Time to push to remote |
273+
| `oxen_repo_pull_pull_duration_seconds` | Time to pull from remote |
274+
| `oxen_repo_merge_merge_duration_seconds` | Time to merge branches |
275+
| `oxen_repo_tree_*_duration_seconds` | Various tree operations (get root, compress, unpack, write) |
276+
| `oxen_merkle_from_commit_duration_seconds` | Time to build merkle tree from commit |
277+
| `oxen_client_import_upload_zip_duration_seconds` | Time to upload a zip import |
278+
| `oxen_client_versions_*_duration_seconds` | Upload/download version file durations |
279+
| `oxen_client_workspaces_files_*_duration_seconds` | Workspace file staging/download durations |
280+
| `oxen_server_commits_upload_duration_seconds` | Time to process a commit upload |
281+
| `oxen_server_commits_upload_tree_duration_seconds` | Time to upload tree data |
282+
| `oxen_server_file_put_duration_seconds` | Time to write a file via API |
283+
| `oxen_server_merger_merge_duration_seconds` | Time to process a server-side merge |
284+
| `oxen_server_workspaces_commit_duration_seconds` | Time to commit a workspace |
285+
| `oxen_server_prune_prune_duration_seconds` | Time to prune a repository |
286+
| `oxen_server_versions_*_duration_seconds` | Version upload/chunk durations |
287+
| `oxen_server_export_download_zip_duration_seconds` | Time to export a zip download |
288+
| `oxen_server_branches_maybe_create_merge_duration_seconds` | Time to create a merge branch |
289+
| `oxen_server_import_*_duration_seconds` | Import processing durations |
290+
291+
---
292+
293+
## Troubleshooting
294+
295+
### No log output appears
296+
297+
- Ensure `RUST_LOG` is set to a level that includes the messages you expect
298+
(e.g. `debug` or `info`). The default is `info`.
299+
- Logs go to **stderr**, not stdout. If piping output, make sure you're
300+
capturing stderr: `oxen status 2>log.txt`
301+
302+
### JSON log files are empty
303+
304+
- Verify `OXEN_LOG_FILE` is set **before** the process starts (not after).
305+
- Check the directory exists and is writable. The directory is auto-created, but
306+
parent directories must already exist.
307+
- The non-blocking writer flushes on process exit. If the process is killed with
308+
`SIGKILL`, the last few log lines may be lost. Use `SIGTERM` for graceful shutdown.
309+
310+
### Metrics endpoint returns nothing / connection refused
311+
312+
- The Prometheus exporter only runs in `oxen-server`, not the CLI.
313+
- Verify the port: `curl -s http://localhost:9090/metrics`. If you changed the
314+
port via `OXEN_METRICS_PORT`, use that port instead.
315+
- Check for port conflicts — another process may already be listening on that port.
316+
Use `lsof -i :9090` to check.
317+
318+
### Metrics show all zeros
319+
320+
- Counters start at zero and only increment when the corresponding code path
321+
executes. If no requests have been made, counters will be zero.
322+
- Histograms only appear after the first observation is recorded.
323+
324+
### Too much log output / performance concerns
325+
326+
- Use a targeted filter: `RUST_LOG="warn,liboxen::repositories=info"` to limit
327+
verbosity to specific modules.
328+
- File logging uses a non-blocking writer with an in-memory buffer, so it has
329+
minimal impact on request latency.
330+
- Metrics collection via the `metrics` crate uses atomic operations and has
331+
negligible overhead.

0 commit comments

Comments
 (0)