Skip to content

Cherry-pick #4168 to main: Fix hello-numpy-cross-val example#4194

Open
nvkevlu wants to merge 3 commits intoNVIDIA:mainfrom
nvkevlu:cherry_pick_4011_4168_4173_to_main
Open

Cherry-pick #4168 to main: Fix hello-numpy-cross-val example#4194
nvkevlu wants to merge 3 commits intoNVIDIA:mainfrom
nvkevlu:cherry_pick_4011_4168_4173_to_main

Conversation

@nvkevlu
Copy link
Collaborator

@nvkevlu nvkevlu commented Feb 13, 2026

Cherry-pick #4168 to main: Fix hello-numpy-cross-val example.

Description

Cherry-pick #4168 to main: Fix hello-numpy-cross-val example.
(#4011 and #4173 already on main; only 4168 changes applied.)

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

Copilot AI review requested due to automatic review settings February 13, 2026 18:34
@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Feb 13, 2026

/build

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 13, 2026

Greptile Overview

Greptile Summary

Cherry-picked PR #4168 to fix the hello-numpy-cross-val example by refactoring the client to properly support cross-site evaluation (CSE) workflow with distinct task handling.

Key Changes:

  • Refactored client.py to handle three task types separately: train, evaluate, and submit_model with explicit branching
  • Added fail-fast error handling for train tasks with missing params to prevent empty responses that would break server aggregation (aligns with custom rule 783565ac-d530-4d49-a8bc-55877cb0a0cd)
  • Introduced last_params tracking to store trained weights for submission during CSE submit_model phase
  • Fixed integration test config key from class_path to path for dict-based model configuration
  • Added comprehensive unit tests verifying helper functions and fail-fast behavior for invalid scenarios

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Cherry-picked fix with clear improvements to error handling, comprehensive test coverage, and proper task separation. Changes follow fail-fast principles per custom instructions, prevent silent failures that could break aggregation, and include both unit and integration tests
  • No files require special attention

Important Files Changed

Filename Overview
examples/hello-world/hello-numpy-cross-val/client.py Refactored client to handle multiple task types (train, evaluate, submit_model) with explicit fail-fast error handling for missing parameters, tracking last_params for CSE workflow
tests/integration_test/recipe_system_test.py Fixed model config key from class_path to path for consistency with dict-based config format
tests/unit_test/recipe/hello_numpy_cross_val_client_test.py New comprehensive unit tests for hello-numpy-cross-val client covering helper functions and fail-fast behavior for invalid train/submit_model scenarios

Flowchart

flowchart TD
    Start([Client receives task]) --> CheckType{Task Type?}
    
    CheckType -->|is_train| ValidateParams{Params valid?}
    ValidateParams -->|No params or missing NUMPY_KEY| RaiseError1[Raise RuntimeError: Train task needs params]
    ValidateParams -->|Valid| Train[Train model: params + 1]
    Train --> SaveParams[Store in last_params]
    SaveParams --> Evaluate1[Evaluate model]
    Evaluate1 --> CheckUpdateType{update_type?}
    CheckUpdateType -->|diff| SendDiff[Send params diff]
    CheckUpdateType -->|full| SendFull[Send full params]
    SendDiff --> End([Continue loop])
    SendFull --> End
    
    CheckType -->|is_evaluate| ValidateEvalParams{Params valid?}
    ValidateEvalParams -->|No| SendEmpty1[Send empty metrics]
    ValidateEvalParams -->|Yes| Evaluate2[Evaluate model]
    Evaluate2 --> SendMetrics[Send metrics only]
    SendMetrics --> End
    SendEmpty1 --> End
    
    CheckType -->|is_submit_model| CheckLastParams{last_params set?}
    CheckLastParams -->|No| RaiseError2[Raise RuntimeError: No local model]
    CheckLastParams -->|Yes| SubmitModel[Submit last_params as FULL]
    SubmitModel --> End
    
    CheckType -->|Other| SendEmpty2[Send empty metrics]
    SendEmpty2 --> End
    
    RaiseError1 --> Stop([Fail fast])
    RaiseError2 --> Stop
Loading

Last reviewed commit: 72120c5

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR cherry-picks PR #4168 to the main branch, fixing the hello-numpy-cross-val example to properly support cross-site evaluation (CSE) by implementing branching logic for different task types.

Changes:

  • Refactored hello-numpy-cross-val client to use branching pattern with is_train(), is_evaluate(), and is_submit_model() instead of assuming all tasks are training tasks
  • Added fail-fast validation that raises RuntimeError when train tasks receive no model params or when submit_model is called without trained params
  • Added comprehensive unit tests for the client's helper functions and fail-fast behavior
  • Fixed integration test that incorrectly used "class_path" instead of "path" for model configuration dict

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tests/unit_test/recipe/hello_numpy_cross_val_client_test.py New unit test file that validates client helper functions (train/evaluate) and tests fail-fast behavior when params are missing
tests/integration_test/recipe_system_test.py Fixed model config dict to use "path" instead of "class_path" (correct for job/config layer)
examples/hello-world/hello-numpy-cross-val/client.py Refactored to use branching pattern for train/evaluate/submit_model tasks with proper error handling and last_params tracking for CSE

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Feb 13, 2026

/build

@nvkevlu
Copy link
Collaborator Author

nvkevlu commented Feb 13, 2026

/build

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant