(retriever) fix remote ocr and pe logic to match local behavior #1810

Merged
edknv merged 1 commit into NVIDIA:main from edknv:edwardk/retriever-nim-merge-level
Apr 7, 2026
edknv (Collaborator) commented Apr 7, 2026

Description

Fixes two bugs in the remote (NIM/build endpoint) code paths that caused results to diverge from the local model paths.

  • OCR merge levels: The OCR NIM endpoint defaults to paragraph-level text merging when merge_levels is not specified in the request. The local OCR path uses word-level merging for tables (producing proper pseudo-markdown with individual cells) and paragraph-level merging for everything else. The remote path was not passing merge_levels at all, so table crops fell back to the paragraph-level default.
  • Page-elements score filtering: The local inference path applies a per-class final score filter to remove low-confidence detections. The remote path was missing this step.
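
To illustrate the first fix, here is a minimal sketch of how a per-crop merge_levels list could be built. The function name, the "table" label string, and the flat list-of-labels input are assumptions for illustration; the actual code in ocr.py is not reproduced here.

```python
from typing import List, Sequence


def merge_levels_for_crops(labels: Sequence[str]) -> List[str]:
    """Return one merge level per crop, in the same order as the crops.

    Hypothetical helper: "word" for table crops so individual cells
    survive, "paragraph" for every other element type.
    """
    return ["word" if label == "table" else "paragraph" for label in labels]


if __name__ == "__main__":
    # Label strings here are illustrative, not taken from the repository.
    print(merge_levels_for_crops(["table", "chart", "title"]))
```

Keeping the list aligned index-for-index with the crop list is what lets the batching layer slice both with the same start and end offsets.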

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

@edknv edknv requested review from a team as code owners April 7, 2026 16:54
@edknv edknv requested a review from ChrisJar April 7, 2026 16:54
greptile-apps bot (Contributor) commented Apr 7, 2026

Greptile Summary

This PR fixes two behavioral divergences between the remote (NIM) and local inference code paths. First, it wires merge_levels through invoke_image_inference_batches and populates it per-crop in the OCR path ("word" for tables, "paragraph" for all other elements), so the NIM endpoint no longer defaults to paragraph-level merging for table crops. Second, it applies _apply_final_score_filter in all three remote response format branches of _remote_response_to_detections, bringing remote page-element scoring in line with the local pipeline which already applied this filter after WBF post-processing. Both fixes are minimal, correctly targeted, and handle the graceful-degradation cases (empty YOLOX_PAGE_V3_FINAL_SCORE, absent merge_levels) safely.
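
A per-class score filter of the kind described above might look like the following sketch. It is not the repository's _apply_final_score_filter: the detection dict shape, the function name, the threshold values, and the choice to keep classes that have no configured threshold are all assumptions. Only the short-circuit on an empty threshold table mirrors behavior stated in this review.

```python
from typing import Any, Dict, List, Mapping, Sequence

Detection = Dict[str, Any]  # assumed shape: {"label": str, "score": float, ...}


def filter_by_class_score(
    dets: Sequence[Detection],
    thresholds: Mapping[str, float],
) -> List[Detection]:
    """Drop detections scoring below their class threshold.

    An empty threshold table disables filtering entirely. Classes with
    no entry are kept (an assumption made for this sketch).
    """
    if not thresholds:
        return list(dets)
    kept = []
    for det in dets:
        threshold = thresholds.get(det["label"])
        if threshold is None or det["score"] >= threshold:
            kept.append(det)
    return kept


if __name__ == "__main__":
    # Threshold values are made up for the example.
    sample = [
        {"label": "table", "score": 0.2},
        {"label": "table", "score": 0.9},
        {"label": "title", "score": 0.1},
    ]
    print(filter_by_class_score(sample, {"table": 0.5}))
```

Applying the same function at the end of every response-format branch is what keeps the remote output consistent regardless of which format the endpoint returns.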

Confidence Score: 5/5

Safe to merge; changes are narrowly scoped bug fixes that align remote paths with already-validated local behavior.

No P0 or P1 issues found. The merge_levels length validation is correct and defensive, the per-batch slicing is correct for Sequence types, and _apply_final_score_filter short-circuits safely when YOLOX_PAGE_V3_FINAL_SCORE is empty. All three remote format branches are now consistently patched.
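
The length validation and per-batch slicing mentioned here can be sketched as below. This is a simplified stand-in for invoke_image_inference_batches that only builds the request payloads and makes no network call. The function name, the batch_size parameter, and the bare-string "input" entries are assumptions. The "input" and "merge_levels" keys follow the sequence diagram in this review.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_batch_payloads(
    image_b64_list: Sequence[str],
    batch_size: int,
    merge_levels: Optional[Sequence[str]] = None,
) -> List[Dict[str, Any]]:
    """Split images into batches, slicing merge_levels with the same offsets."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n = len(image_b64_list)
    # Validate up front so a mismatch fails before any request is sent.
    if merge_levels is not None and len(merge_levels) != n:
        raise ValueError(
            f"merge_levels has {len(merge_levels)} entries, expected {n}"
        )
    payloads = []
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        payload: Dict[str, Any] = {"input": list(image_b64_list[start:end])}
        # Omit the key when absent so the endpoint applies its own default.
        if merge_levels is not None:
            payload["merge_levels"] = list(merge_levels[start:end])
        payloads.append(payload)
    return payloads


if __name__ == "__main__":
    for p in build_batch_payloads(["a", "b", "c"], 2, ["word", "paragraph", "word"]):
        print(p)
```

Leaving merge_levels out of the payload when the caller passes None preserves the previous request shape for callers that do not need per-crop control.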

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/nim/nim.py | Adds optional merge_levels param to invoke_image_inference_batches with length validation and correct per-batch slicing into the JSON payload |
| nemo_retriever/src/nemo_retriever/ocr/ocr.py | Computes per-crop merge_levels list (word for tables, paragraph otherwise) and passes it to the remote OCR call, matching local model behavior |
| nemo_retriever/src/nemo_retriever/page_elements/page_elements.py | Applies _apply_final_score_filter in all three remote response format branches, closing the gap with the local pipeline's post-WBF score filtering |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant ocr_page_elements
    participant invoke_image_inference_batches
    participant NIM_OCR_Endpoint
    participant _remote_response_to_detections
    participant _apply_final_score_filter

    Caller->>ocr_page_elements: pages_df, invoke_url
    ocr_page_elements->>ocr_page_elements: build crop_b64s + crop_merge_levels
    note over ocr_page_elements: word for tables, paragraph for others
    ocr_page_elements->>invoke_image_inference_batches: image_b64_list, merge_levels
    invoke_image_inference_batches->>invoke_image_inference_batches: validate len(merge_levels)==n
    loop per batch
        invoke_image_inference_batches->>NIM_OCR_Endpoint: POST {input, merge_levels[start:end]}
        NIM_OCR_Endpoint-->>invoke_image_inference_batches: OCR response
    end
    invoke_image_inference_batches-->>ocr_page_elements: response_items

    Caller->>_remote_response_to_detections: response_json
    _remote_response_to_detections->>_remote_response_to_detections: parse response format
    _remote_response_to_detections->>_apply_final_score_filter: dets (post-WBF)
    note over _apply_final_score_filter: per-class YOLOX_PAGE_V3_FINAL_SCORE filter
    _apply_final_score_filter-->>_remote_response_to_detections: filtered dets
    _remote_response_to_detections-->>Caller: final detections

Reviews (1): Last reviewed commit: "(retriever) fix remote ocr and pe logic ..."

@edknv edknv merged commit a7cd139 into NVIDIA:main Apr 7, 2026
7 of 8 checks passed