harness: align graph runs with structured metrics artifacts#1803
jioffe502 wants to merge 9 commits into NVIDIA:main
Conversation
Replace stdout metric parsing in harness run execution with structured metrics derived from graph pipeline runtime summary artifacts so results are more stable and reviewer-friendly. Preserve existing artifact shape while adding run-mode/auto-tuning config coverage and updating harness tests for batch-first structured sourcing. Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
Delete the legacy harness stream-metric parser and its tests now that harness metrics are sourced from runtime summary artifacts. Keep PTY child-process output streaming for user job-progress visibility while slimming dead parsing code. Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
- harden latest_commit fallback for gitdir and packed-refs environments - expose resolved auto-heuristic graph flags in runtime and harness artifacts - trim effective_tuning payload while preserving key runtime context
Greptile Summary

This PR replaces fragile stdout-scraping metric collection with structured metrics derived from graph pipeline runtime summary artifacts.
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/harness/artifacts.py | Adds worktree-friendly git fallback (_resolve_git_dir, _read_packed_ref, _read_head_commit); five bare except Exception blocks silently discard errors without logging. |
| nemo_retriever/src/nemo_retriever/harness/run.py | Core rework: graph_pipeline subprocess, structured *.runtime.summary.json metrics, _resolve_effective_tuning with undocumented pdf_split_batch_size asymmetry when auto_tuning=True. |
| nemo_retriever/src/nemo_retriever/harness/config.py | Adds run_mode and auto_tuning fields to HarnessConfig with validation against VALID_RUN_MODES and env-var overrides; clean and correct. |
| nemo_retriever/src/nemo_retriever/utils/ray_resource_hueristics.py | Adds _coerce_count helper using float() before int() to handle fractional Ray resource values; minimal, targeted fix. |
| nemo_retriever/harness/test_configs.yaml | Adds run_mode: batch and auto_tuning: true as active defaults; maps directly to new HarnessConfig fields. |
| nemo_retriever/tests/test_harness_config.py | New tests for run_mode env-var override, invalid-mode rejection, and auto_tuning CLI override; adequate coverage of new config surface. |
| nemo_retriever/tests/test_harness_run.py | Updated to cover graph_pipeline command construction, structured metrics, git-metadata fallback, and auto_tuning zeroing behavior; reflects new behavior correctly. |
| nemo_retriever/tests/test_resource_heuristics.py | New test for fractional Ray resource coercion (CPU: 32.0, GPU: 1.0); directly exercises the _coerce_count fix. |
| nemo_retriever/harness/HANDOFF.md | Trimmed to runbook-style ops notes with a key-files table; no code impact. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Op as Operator
    participant H as Harness
    participant GP as graph_pipeline subprocess
    participant SJ as run_id.runtime.summary.json
    participant RJ as results.json
    Op->>H: harness run --dataset jp20
    H->>H: load_harness_config()
    Note over H: YAML + env overrides + CLI → HarnessConfig
    Note over H: run_mode=batch, auto_tuning=true
    H->>H: _build_command() calls _resolve_effective_tuning()
    Note over H: auto_tuning=True → zeros most tuning params
    H->>GP: python -m nemo_retriever.examples.graph_pipeline
    Note over GP: --run-mode batch
    Note over GP: --pdf-extract-tasks 0 (auto)
    Note over GP: --runtime-metrics-dir runtime_metrics/
    GP->>SJ: write run_id.runtime.summary.json
    Note over SJ: {processed_pages, ingestion_only_secs,
    Note over SJ: evaluation_metrics: {recall@5: 0.91}}
    GP-->>H: exit code
    H->>SJ: _read_json_if_exists(runtime_summary_path)
    SJ-->>H: runtime_summary dict (or None)
    H->>H: _build_structured_metrics_payload(runtime_summary)
    H->>H: _evaluate_run_outcome(recall_metrics, evaluation_metrics)
    H->>RJ: write_json(results.json)
    Note over RJ: {success, metrics, summary_metrics,
    Note over RJ: runtime_summary, test_config.tuning}
    RJ-->>Op: artifact
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/harness/artifacts.py
Line: 36-39
Comment:
**Silent exception swallowing without logging**
Five bare `except Exception:` blocks in the new git-metadata fallback path (lines 38, 63, 78, 89–90, 106) all return `None`/`pass` without logging any context. When `last_commit()` silently falls back to `"unknown"`, a misconfigured worktree path or permission error becomes completely invisible. The `no-bare-except` rule requires that non-boundary catch sites at minimum log with context before swallowing.
```suggestion
except Exception as exc: # noqa: BLE001
logger.debug("_resolve_git_dir: failed to read %s: %s", dot_git, exc)
return None
```
Apply the same `logger.debug(...)` pattern at the other four catch sites (`_read_packed_ref` line 63, `_read_head_commit` lines 78 and 89–90, and `last_commit` line 106).
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/harness/run.py
Line: 275
Comment:
**`pdf_split_batch_size` not zeroed when `auto_tuning=True`**
Every tuning field in `_resolve_effective_tuning` is zeroed when `use_auto=True` to signal `graph_pipeline` to auto-tune — except `pdf_split_batch_size`, which always forwards the configured value. The asymmetry is undocumented, making it ambiguous whether this is intentional (the pipeline ignores `0` for this param) or an accidental omission. Add a comment if intentional:
```suggestion
# pdf_split_batch_size is always forwarded explicitly; graph_pipeline does not
# implement auto-tuning for this parameter and the configured default is safe.
"pdf_split_batch_size": cfg.pdf_split_batch_size,
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (5). Last reviewed commit: "docs(harness): slim HANDOFF for ops; res..."
```python
resolved_graph_flags = runtime_resolved_tuning.get("resolved_graph_flags")
if isinstance(resolved_graph_flags, dict):
    final_graph_flags = dict(resolved_graph_flags)
    effective_tuning["graph_pipeline_flags"] = final_graph_flags
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/harness/run.py
Line: 522-525
Comment:
**`*_cpus_per_actor` keys silently dropped from `graph_pipeline_flags` on runtime override**
When `runtime_resolved_tuning` provides a `resolved_graph_flags` dict, `final_graph_flags` is completely replaced. `resolved_graph_flags` (built in `_resolve_runtime_tuning_summary`) includes `nemotron_parse_*` keys but does not include `page_elements_cpus_per_actor`, `ocr_cpus_per_actor`, or `embed_cpus_per_actor`. Those three fields are present in `requested_graph_flags` from `_resolve_effective_tuning` and are actually passed to the subprocess, but disappear from the `effective_tuning.graph_pipeline_flags` artifact after the replacement. Merging instead of replacing preserves them:
```suggestion
if isinstance(resolved_graph_flags, dict):
final_graph_flags = {**final_graph_flags, **dict(resolved_graph_flags)}
effective_tuning["graph_pipeline_flags"] = final_graph_flags
```
How can I resolve this? If you propose a fix, please make it concise.

- add missing trailing newline to CONTRIBUTING.md
- add missing trailing newline to release notes markdown
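The merge-versus-replace distinction behind this fix is plain dict semantics. In this self-contained example, the three `*_cpus_per_actor` keys come from the review comment; `nemotron_parse_batch_size` stands in for the `nemotron_parse_*` keys it mentions, and all values are illustrative:

```python
# Requested flags from _resolve_effective_tuning: includes the CPU-per-actor
# keys that are actually passed to the subprocess.
requested_graph_flags = {
    "page_elements_cpus_per_actor": 2,
    "ocr_cpus_per_actor": 4,
    "embed_cpus_per_actor": 1,
    "nemotron_parse_batch_size": 8,
}
# Runtime-resolved flags from the summary: only nemotron_parse_* keys.
resolved_graph_flags = {"nemotron_parse_batch_size": 16}

# Replacing drops the *_cpus_per_actor keys from the artifact:
replaced = dict(resolved_graph_flags)
assert "ocr_cpus_per_actor" not in replaced

# Merging keeps them while letting runtime-resolved values win:
merged = {**requested_graph_flags, **resolved_graph_flags}
assert merged["ocr_cpus_per_actor"] == 4
assert merged["nemotron_parse_batch_size"] == 16
```

Because later entries in `{**a, **b}` override earlier ones, the runtime summary still wins for any key it does resolve, while keys it never reports survive from the requested flags.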
Resolve conflicts:
- run.py: omit tuning CLI flags when use_heuristics; otherwise pass effective_tuning (auto_tuning zeros vs configured values).
- graph_pipeline.py: keep runtime tuning summary helpers and _count_input_units from main.
Runtime summaries may only return a subset of tuning keys; preserve requested effective values (e.g. auto-tuning CPU sentinels) when overlaying resolved_graph_flags.
…_tuning Remove harness results.json effective_tuning and runtime summary resolved_tuning so this PR stays focused on runtime-summary metrics (vs stdout). Tuning CLI construction unchanged; follow-up can reintroduce tuning capture with a single clear model.
…list Trim graph-refactor migration, git workflow, and obsolete batch/parser notes. Keep operator map, CLI, artifacts, and a short results.json contract. Add maintainer backlog and harness change checklist for agents.
TL;DR
Aligns the retriever harness with the graph pipeline and makes run artifacts easier to trust: metrics come from structured `*.runtime.summary.json` files, not stdout scraping. The harness invokes `graph_pipeline` with the same controls as before for run mode and CLI wiring.

Description

- Builds `graph_pipeline` subprocess commands from harness config (including `run_mode` and resource flags as determined by preset, YAML, env, and `use_heuristics`).
- Keeps `results.json` / session layout compatible with existing consumers.
- Hardens `latest_commit` resolution (gitdir / packed-refs friendly) in harness artifacts helpers.
- `test_config.tuning` continues to record the merged harness configuration used for the run.
- `metrics` and `summary_metrics` contain only performance and evaluation fields from the runtime summary.
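A downstream consumer can sanity-check that artifact contract with a few lines. The top-level field names come from this PR's description and sequence diagram; treating them as the complete required set is an assumption, and `check_results_contract` is a hypothetical helper, not part of the harness:

```python
import json
from pathlib import Path

def check_results_contract(path: Path) -> list[str]:
    """Return top-level fields missing from a harness results.json."""
    results = json.loads(path.read_text())
    # Fields named in the PR: overall outcome, structured metrics, and the
    # raw runtime summary the metrics were sourced from.
    expected = ("success", "metrics", "summary_metrics", "runtime_summary")
    return [key for key in expected if key not in results]
```

An empty return means the artifact carries every expected section; anything else lists what a consumer would find missing.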