Skip to content

refactor(display): consume structured rewrite columns in native shape#144

Open
memadi-nv wants to merge 4 commits into
mainfrom
memadi/feature/maintain-native-shape-for-structured-rewrite-columns
Open

refactor(display): consume structured rewrite columns in native shape#144
memadi-nv wants to merge 4 commits into
mainfrom
memadi/feature/maintain-native-shape-for-structured-rewrite-columns

Conversation

@memadi-nv
Copy link
Copy Markdown
Contributor

@memadi-nv memadi-nv commented May 1, 2026

Summary

Maintain native shape for structured rewrite columns

Motivation

The rewrite pipeline (LLMJudgeColumnConfig, LLMStructuredColumnConfig) already writes plain Python dicts into the dataframe via pydantic model_dump(). The display layer was defensively re-normalizing these columns from three possible shapes — None, pydantic model, or JSON string — which masked upstream bugs and duplicated work.

This PR removes the defensive coercion and trusts the native dict shape throughout the display path.

Changes

src/anonymizer/interface/display.py

  • Removed speculative type-coercion from _extract_judge_scores: no more None short-circuit, model_dump() fallback, or json.loads attempt. The function now expects a plain dict and returns [] otherwise.
  • Removed the same coercion ladder from _normalize_disposition; it now expects a plain dict with "sensitivity_disposition" as a list, filtering entries via isinstance(e, dict) instead of calling getattr(e, "model_dump", dict)().
  • Dropped redundant float(...) wrappers around utility, leakage, and weighted_leakage_rate in _render_scores_section — values are already numeric at this stage.
  • Added a module-level logger and a warning when COL_JUDGE_EVALUATION is present but yields no scores, so unexpected shapes surface instead of silently rendering nothing.
  • Updated docstrings on _extract_judge_scores and _normalize_disposition to document the expected native dict shape.

tests/interface/test_display.py

  • Judge-scores test now derives rubric keys from the real PRIVACY_RUBRIC, QUALITY_RUBRIC, and NATURALNESS_RUBRIC configs (imported from final_judge) instead of hardcoded strings, preventing test-vs-runtime drift if a rubric is renamed.
  • Disposition test fixture is now built via SensitivityDispositionSchema(...).model_dump() with a full EntityDispositionSchema, matching the exact dict that LLMStructuredColumnConfig writes at runtime. Replaces the hand-rolled partial dict.
  • Disposition test uses the COL_SENSITIVITY_DISPOSITION constant instead of a hardcoded "_sensitivity_disposition" key.
  • Added imports for COL_SENSITIVITY_DISPOSITION, the three rubric configs, and the two rewrite schemas.

Net effect

  • display.py: −24 / +18 lines (simpler, one observability signal added).
  • test_display.py: +35 / −21 lines (fixtures pinned to real schemas/constants).

Test plan

  • pytest tests/interface/test_display.py passes locally.
  • Existing e2e rewrite flow renders judge scores and disposition table correctly.
  • Warning log fires when an unexpected judge-evaluation shape reaches the renderer.

Testing

  • make test passes locally
  • make check passes locally (format + lint + typecheck + lock-check)
  • Added/updated tests for changes

Documentation

  • If docs changed: make docs-build passes locally

Related Issues

Closes #70

Signed-off-by: memadi <memadi@nvidia.com>
@memadi-nv memadi-nv requested a review from a team as a code owner May 1, 2026 20:33
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 1, 2026

Greptile Summary

This PR removes the defensive type-coercion ladder (None-check → model_dump() fallback → json.loads attempt) from _extract_judge_scores and _normalize_disposition, replacing it with a single isinstance(raw, dict) guard that trusts the plain-dict shape the upstream pipeline already writes. A targeted logger.warning fires when a dict is present but yields no scores, surfacing upstream regressions rather than silently rendering nothing.

  • display.py: Coercion removed from both extraction helpers; warning guard uses isinstance(judge_raw, dict) (correctly handling np.nan rows without false positives); redundant float() wrappers dropped.
  • test_display.py: Fixtures rebuilt from real schemas (SensitivityDispositionSchema.model_dump()) and real constants (COL_JUDGE_EVALUATION, rubric configs); two new tests verify the NaN-no-warn and malformed-dict-warns contract.

Confidence Score: 5/5

Safe to merge — the display layer now faithfully reflects the dict shape the pipeline already writes, and the warning guard is correctly scoped with isinstance.

The simplification is well-targeted: removing coercion paths that were masking upstream bugs, not removing legitimate defensive guards. The isinstance-based warning correctly distinguishes NaN rows (no warning) from malformed dicts (warning), and both branches are directly covered by new tests. No data loss, no silent breakage for the happy path.

No files require special attention; both changed files are straightforward and covered by the updated tests.

Important Files Changed

Filename Overview
src/anonymizer/interface/display.py Removes defensive type-coercion ladders from _extract_judge_scores and _normalize_disposition, trusting the upstream dict shape. Adds a well-scoped warning (using isinstance rather than pd.isna) for malformed judge dicts.
tests/interface/test_display.py Tests updated to use real constants and real schemas; adds two new focused tests for NaN (no-warn) and malformed-dict (warn) paths that correctly exercise the new isinstance-based warning guard.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[row.get COL_JUDGE_EVALUATION] --> B{isinstance dict?}
    B -- No NaN / None / other --> C[_extract_judge_scores returns empty]
    B -- Yes --> D[_extract_judge_scores iterates keys]
    D --> E{Each value: dict with score?}
    E -- No --> F[skip entry]
    E -- Yes --> G[append name, int score]
    G --> H{judge_scores non-empty?}
    F --> H
    H -- Yes --> I[Render Judge: scores in HTML]
    H -- No --> J[warning: unexpected keys logged]
    J --> K[No Judge section rendered]
    C --> K
Loading

Reviews (4): Last reviewed commit: "address feedback" | Re-trigger Greptile

Comment thread src/anonymizer/interface/display.py Outdated
Comment thread src/anonymizer/interface/display.py Outdated
Comment thread src/anonymizer/interface/display.py
Signed-off-by: memadi <memadi@nvidia.com>
Comment thread src/anonymizer/interface/display.py Outdated
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Comment thread tests/interface/test_display.py
Copy link
Copy Markdown
Collaborator

@lipikaramaswamy lipikaramaswamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from an engineering perspective. I reviewed the display refactor and the updated tests; the change is scoped and aligns the renderer with the native dict shape produced by the rewrite pipeline. One small nit, recommend updating to future proof.

I did not run a live model-backed rewrite flow, so approving assuming you did :)

Signed-off-by: memadi <memadi@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor(display): consume structured rewrite columns in native shape

2 participants