
Conversation

@ZiyueXu77
Collaborator

Fixes # .

Description

Convert the KM example from JobAPI to Recipe, and add production instructions that use a provisioned HE context.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

Copilot AI review requested due to automatic review settings December 15, 2025 18:10
Contributor

Copilot AI left a comment


Pull request overview

This PR converts the Kaplan-Meier homomorphic encryption example from the JobAPI to the Recipe API, enabling both simulation and production deployment modes. The key changes include:

  • Replaced km_job.py with a new Recipe-based job.py that supports both simulation and production environments
  • Added production deployment infrastructure including provisioning configuration (project.yml) and convenience scripts (start_all.sh)
  • Updated controller initialization to explicitly pass an empty persistor_id parameter
  • Enhanced documentation with comprehensive instructions for both deployment modes

Reviewed changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| examples/advanced/kaplan-meier-he/job.py | New Recipe-based job configuration replacing JobAPI implementation |
| examples/advanced/kaplan-meier-he/km_job.py | Removed old JobAPI implementation |
| examples/advanced/kaplan-meier-he/server.py | Updated controller initialization with explicit persistor_id |
| examples/advanced/kaplan-meier-he/server_he.py | Updated controller initialization with explicit persistor_id |
| examples/advanced/kaplan-meier-he/project.yml | New provisioning configuration for production deployment with CKKS HE scheme |
| examples/advanced/kaplan-meier-he/start_all.sh | Convenience script for starting all production components locally |
| examples/advanced/kaplan-meier-he/README.md | Comprehensive documentation update covering both simulation and production modes |


@greptile-apps
Contributor

greptile-apps bot commented Dec 15, 2025

Greptile Summary

This PR successfully migrates the Kaplan-Meier homomorphic encryption example from the deprecated JobAPI pattern to the modern Recipe API. The changes enable both simulation and production deployment modes.

Key Changes:

  • Replaced km_job.py with job.py that implements KMRecipe class extending the Recipe API
  • Moved client/server scripts from src/ directory to root level (client.py, client_he.py, server.py, server_he.py)
  • Added project.yml with HEBuilder configuration for automated HE context provisioning in production mode
  • Added start_all.sh convenience script for local production testing
  • Enhanced HE context handling to support both base64-encoded files (simulation) and raw binary .tenseal files (production)
  • Updated file path resolution in client scripts to save outputs to correct job directories
  • Extensively updated README with comprehensive production deployment instructions
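The dual-format HE context handling mentioned in the list above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper name `load_he_context_bytes` and the rule "a `.tenseal` suffix means raw binary, anything else is base64 text" are assumptions made for illustration.

```python
import base64
import binascii
from pathlib import Path


def load_he_context_bytes(path: str) -> bytes:
    """Return the serialized HE context stored at `path` as raw bytes.

    Assumption (hypothetical rule): files ending in ".tenseal" hold raw
    binary (production); anything else holds base64 text (simulation).
    """
    data = Path(path).read_bytes()
    if path.endswith(".tenseal"):
        return data
    try:
        # validate=True rejects bytes outside the base64 alphabet
        return base64.b64decode(data.strip(), validate=True)
    except binascii.Error:
        # Not valid base64 after all: fall back to treating it as raw binary
        return data
```

The returned bytes would then presumably be handed to TenSEAL's context deserializer (for example `ts.context_from`). Note that the fallback branch also catches line-wrapped base64, so a stricter implementation may prefer to raise instead.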

Technical Improvements:

  • Production mode now uses NVFlare's SecurityContentService for automatic HE context resolution from startup kits
  • Simulation mode continues to use manually prepared HE context files via prepare_he_context.py
  • Both modes now consistently use CKKS encryption scheme (previously documentation mentioned BFV)
  • Fixed histogram initialization logic in client.py to properly calculate max_hist_idx
  • Added better logging to distinguish cleartext vs ciphertext operations

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - it's a well-structured refactoring that maintains backward compatibility while adding production capabilities
  • The changes are systematic and well-documented. The code properly handles both simulation and production modes with appropriate path resolution and HE context format detection. All logic changes are improvements (histogram initialization fix, better logging) with no breaking changes to the core algorithm.
  • No files require special attention - all changes are clean refactoring and enhancements

Important Files Changed

Filename Overview
examples/advanced/kaplan-meier-he/job.py Converted from JobAPI to Recipe pattern with support for both simulation and production modes via KMRecipe class
examples/advanced/kaplan-meier-he/client_he.py Renamed from src/kaplan_meier_train_he.py, added support for both base64 and raw binary HE context formats, improved logging
examples/advanced/kaplan-meier-he/server_he.py Renamed from src/kaplan_meier_wf_he.py, added support for both base64 and raw binary HE context formats for production mode
examples/advanced/kaplan-meier-he/project.yml New provisioning configuration with HEBuilder using CKKS scheme for production deployment
examples/advanced/kaplan-meier-he/README.md Extensively updated with production mode instructions, HE context management details, and comprehensive deployment guide

Sequence Diagram

sequenceDiagram
    participant Admin as Admin Console
    participant Server as FL Server<br/>(KM_HE Controller)
    participant C1 as Client 1
    participant C2 as Client 2
    participant CN as Client N

    Note over Admin,CN: Production Mode: Provisioned HE context via startup kits
    Note over Admin,CN: Simulation Mode: Manual HE context preparation

    Admin->>Server: Submit KM_HE Job
    
    Note over Server,CN: Round 1: Collect Maximum Histogram Index
    Server->>C1: Empty start message
    Server->>C2: Empty start message
    Server->>CN: Empty start message
    
    C1->>C1: Generate local histogram<br/>from event data
    C2->>C2: Generate local histogram<br/>from event data
    CN->>CN: Generate local histogram<br/>from event data
    
    C1->>Server: max_idx (cleartext)
    C2->>Server: max_idx (cleartext)
    CN->>Server: max_idx (cleartext)
    
    Server->>Server: Aggregate: max_idx_global = max(all indices) + 1
    
    Note over Server,CN: Round 2: Collect Encrypted Histograms
    Server->>C1: max_idx_global (cleartext)
    Server->>C2: max_idx_global (cleartext)
    Server->>CN: max_idx_global (cleartext)
    
    C1->>C1: Normalize histogram to<br/>global length
    C1->>C1: Encrypt with CKKS:<br/>ts.ckks_vector(context, hist)
    C2->>C2: Normalize histogram to<br/>global length
    C2->>C2: Encrypt with CKKS:<br/>ts.ckks_vector(context, hist)
    CN->>CN: Normalize histogram to<br/>global length
    CN->>CN: Encrypt with CKKS:<br/>ts.ckks_vector(context, hist)
    
    C1->>Server: hist_obs, hist_cen (ciphertext)
    C2->>Server: hist_obs, hist_cen (ciphertext)
    CN->>Server: hist_obs, hist_cen (ciphertext)
    
    Server->>Server: Aggregate encrypted vectors:<br/>global = c1 + c2 + ... + cn
    
    Note over Server,CN: Round 3: Distribute Global Encrypted Histograms
    Server->>C1: hist_obs_global, hist_cen_global (ciphertext)
    Server->>C2: hist_obs_global, hist_cen_global (ciphertext)
    Server->>CN: hist_obs_global, hist_cen_global (ciphertext)
    
    C1->>C1: Decrypt: ts.ckks_vector_from(context, data).decrypt()
    C1->>C1: Round floats to integers
    C1->>C1: Unfold to event list
    C1->>C1: Perform KM analysis
    C1->>C1: Save km_curve_fl_he.png & km_global.json
    
    C2->>C2: Decrypt & perform KM analysis
    CN->>CN: Decrypt & perform KM analysis
    
    C1->>Server: Empty response
    C2->>Server: Empty response
    CN->>Server: Empty response
    
    Server->>Admin: Job Complete
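
The three rounds in the diagram can be traced with a small cleartext simulation. This is only a sketch under stated assumptions: the client data is made up, encryption is omitted (the element-wise sum stands in for adding CKKS vectors), the `1e-4` offset merely imitates CKKS approximation error, and the function names are hypothetical rather than taken from client_he.py or server_he.py.

```python
from typing import Dict, List, Tuple

# Toy data: each client holds (time_bin, event_observed) pairs.
# These values are made up for illustration.
CLIENT_DATA: Dict[str, List[Tuple[int, bool]]] = {
    "site-1": [(1, True), (2, False), (3, True)],
    "site-2": [(2, True), (2, True), (5, False)],
}


def local_histograms(records, length):
    """Count observed events and censored records per time bin."""
    hist_obs = [0] * length
    hist_cen = [0] * length
    for t, observed in records:
        (hist_obs if observed else hist_cen)[t] += 1
    return hist_obs, hist_cen


def add_vectors(vectors):
    """Element-wise sum; stands in for adding encrypted vectors."""
    return [sum(col) for col in zip(*vectors)]


def kaplan_meier(hist_obs, hist_cen):
    """Product-limit estimate S(t) for each time bin that has an event."""
    at_risk = sum(hist_obs) + sum(hist_cen)
    survival, curve = 1.0, {}
    for t, (d, c) in enumerate(zip(hist_obs, hist_cen)):
        if d > 0:
            survival *= 1.0 - d / at_risk
            curve[t] = survival
        at_risk -= d + c
    return curve


# Round 1: clients report their largest time bin in cleartext.
max_idx_global = max(max(t for t, _ in recs) for recs in CLIENT_DATA.values()) + 1

# Round 2: clients build histograms of the global length (encryption omitted).
per_client = [local_histograms(recs, max_idx_global) for recs in CLIENT_DATA.values()]
hist_obs_global = add_vectors([obs for obs, _ in per_client])
hist_cen_global = add_vectors([cen for _, cen in per_client])

# Round 3: CKKS decryption yields approximate floats, so round back to counts.
# The 1e-4 offset simulates that approximation error.
noisy = [x + 1e-4 for x in hist_obs_global]
hist_obs_global = [round(x) for x in noisy]

km_curve = kaplan_meier(hist_obs_global, hist_cen_global)
```

The rounding step is the reason the diagram lists "Round floats to integers" after decryption: CKKS is an approximate scheme, so decrypted counts are only close to whole numbers.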

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. examples/advanced/kaplan-meier-he/start_all.sh, line 71 (link)

    style: stop_all.sh script doesn't exist

    Update to mention using the admin console instead:

9 files reviewed, 1 comment

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (4)

  1. examples/advanced/kaplan-meier-he/client_he.py, line 140-141 (link)

    logic: HE scheme mismatch - code uses BFV (ts.bfv_vector) but project.yml provisions CKKS scheme for production. Production mode will fail when it tries to use BFV functions with CKKS context. Either change project.yml scheme to BFV or update client/server code to use scheme-agnostic operations.

  2. examples/advanced/kaplan-meier-he/server_he.py, line 95-97 (link)

    logic: HE scheme mismatch - code uses BFV (ts.bfv_vector_from) but project.yml provisions CKKS scheme for production. Production mode will fail at deserialization. Either change project.yml scheme to BFV or update code to be scheme-agnostic.

  3. examples/advanced/kaplan-meier-he/project.yml, line 50 (link)

    logic: CKKS scheme incompatible with code - client_he.py and server_he.py use BFV-specific functions (ts.bfv_vector, ts.bfv_vector_from) for integer histogram operations. Change to BFV scheme or update code.

  4. examples/advanced/kaplan-meier-he/job.py, line 143 (link)

    logic: string replacement will fail if --workspace_dir doesn't contain "/km/". Use pathlib or explicit path construction instead.
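
    To illustrate the concern in item 4, here is a minimal sketch contrasting substring replacement with explicit path construction. The actual replacement performed in job.py is not shown in this thread, so the `"/km/"` to `"/km_he/"` mapping, both function names, and the `job_name` parameter are hypothetical.

```python
from pathlib import Path


def fragile_variant(workspace_dir: str) -> str:
    # str.replace returns the input unchanged when "/km/" is absent,
    # so the caller gets no signal that nothing was rewritten.
    return workspace_dir.replace("/km/", "/km_he/")


def explicit_variant(workspace_dir: str, job_name: str) -> str:
    # Build the path from its parts instead of editing a substring.
    return str(Path(workspace_dir).parent / job_name)
```

    With a custom directory such as `/data/custom`, `fragile_variant` silently returns the same string, while `explicit_variant` always produces a sibling directory named after the job.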

9 files reviewed, 4 comments


greptile-apps[bot]

This comment was marked as resolved.

Contributor

@greptile-apps greptile-apps bot left a comment


10 files reviewed, no comments


@chesterxgchen
Collaborator

/build

greptile-apps[bot]

This comment was marked as resolved.

@ZiyueXu77
Collaborator Author

/build

Collaborator

@chesterxgchen chesterxgchen left a comment


LGTM. We need to discuss how to avoid requiring provisioning (which is heavy-handed) in order to use HE.

@ZiyueXu77
Collaborator Author

/build

@ZiyueXu77 ZiyueXu77 enabled auto-merge (squash) January 7, 2026 14:41
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR successfully converts the Kaplan-Meier HE example from JobAPI to the Recipe pattern, enabling both simulation and production deployment modes with homomorphic encryption support.

Major Changes:

  • Converted km_job.py to new job.py with Recipe pattern (KMRecipe class)
  • Renamed client/server files (removed src/ directory structure)
  • Added project.yml for production provisioning with HEBuilder (CKKS scheme)
  • Added start_all.sh convenience script for local production testing
  • Enhanced HE context handling to support both simulation (base64-encoded .txt) and production (raw binary .tenseal) formats
  • Updated prepare_he_context.py to generate relinearization keys for CKKS compatibility
  • Comprehensive README update with detailed instructions for both modes

Issues Found:

  • job.py has type hint mismatch: parameters accept None but are typed as str
  • Fragile path manipulation using string replacement instead of os.path
  • Missing os import for proposed path fix

Confidence Score: 3/5

  • This PR is mostly safe to merge but requires fixing critical type hint and path handling issues
  • Score reflects solid architecture changes and comprehensive documentation, but marked down due to: (1) type hint mismatch that could cause runtime issues when None is passed for HE context paths without encryption, (2) fragile string-based path manipulation that will fail with custom paths, (3) missing import. These are fixable issues that don't break core functionality but should be addressed before merge.
  • Primary attention needed for job.py - fix type hints, path handling, and add missing import. All other files are well-implemented.

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| examples/advanced/kaplan-meier-he/job.py | 3/5 | Converted JobAPI to Recipe pattern, added production support. Issues: missing import, type hint mismatch, fragile path manipulation |
| examples/advanced/kaplan-meier-he/client_he.py | 5/5 | Renamed from kaplan_meier_train_he.py, added support for both simulation and production HE context formats |
| examples/advanced/kaplan-meier-he/server_he.py | 5/5 | Renamed from kaplan_meier_wf_he.py, added support for both simulation and production HE context formats |
| examples/advanced/kaplan-meier-he/project.yml | 5/5 | New project configuration for provisioning with HEBuilder using CKKS scheme for production deployment |
| examples/advanced/kaplan-meier-he/README.md | 5/5 | Comprehensive documentation update covering Recipe API, simulation/production modes, and HE context provisioning |

Sequence Diagram

sequenceDiagram
    participant User
    participant job.py
    participant Recipe
    participant FedJob
    participant SimEnv/ProdEnv
    participant Server
    participant Client

    User->>job.py: Run with --encryption flag
    job.py->>job.py: Parse args & determine mode
    
    alt Simulation Mode
        job.py->>job.py: Load HE context from /tmp/nvflare/he_context/*.txt
    else Production Mode
        job.py->>job.py: Use provisioned context filenames
    end
    
    job.py->>Recipe: Create KMRecipe(encryption, num_clients, paths)
    Recipe->>FedJob: Create FedJob(KM or KM_HE)
    
    alt With HE
        Recipe->>Server: Add KM_HE controller
        Recipe->>Client: Add ScriptRunner(client_he.py)
    else Without HE
        Recipe->>Server: Add KM controller
        Recipe->>Client: Add ScriptRunner(client.py)
    end
    
    job.py->>FedJob: Export job
    
    alt Simulation Mode
        job.py->>SimEnv: Create SimEnv(num_clients, workspace)
        job.py->>Recipe: Execute with SimEnv
    else Production Mode
        job.py->>ProdEnv: Create ProdEnv(startup_kit)
        job.py->>Recipe: Execute with ProdEnv
    end
    
    Recipe->>Server: Start workflow
    
    alt With HE (3 rounds)
        Server->>Client: Round 1: Request max index
        Client->>Server: Send max_idx (plaintext)
        Server->>Client: Round 2: Distribute global max_idx
        Client->>Server: Send encrypted histograms
        Server->>Client: Round 3: Send encrypted global histograms
        Client->>Client: Decrypt & perform KM analysis
    else Without HE (2 rounds)
        Server->>Client: Round 1: Request histograms
        Client->>Server: Send local histograms (plaintext)
        Server->>Client: Round 2: Send aggregated histograms
        Client->>Client: Perform KM analysis
    end
    
    Client->>Client: Save km_curve and km_global.json
    Recipe-->>User: Return job status

@greptile-apps
Contributor

greptile-apps bot commented Jan 7, 2026

Additional Comments (3)

examples/advanced/kaplan-meier-he/job.py
Type hints should allow None since line 157 passes None when encryption is disabled

        he_context_path_client: str | None = "/tmp/nvflare/he_context/he_context_client.txt",
        he_context_path_server: str | None = "/tmp/nvflare/he_context/he_context_server.txt",

examples/advanced/kaplan-meier-he/job.py
This string replacement will fail if the user provides a custom --he_context_path that does not contain "he_context_client.txt". Use os.path for safer path manipulation.

        he_context_path_server = os.path.join(os.path.dirname(he_context_path_client), "he_context_server.txt")

examples/advanced/kaplan-meier-he/job.py
Missing os import needed for line 154 path manipulation fix
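
Taken together, the three comments above suggest something along these lines. This is a sketch only: the helper name `resolve_he_context_paths` and its `encryption` parameter are hypothetical, and `Optional[str]` is used in place of `str | None` so the annotation also works on Python versions before 3.10.

```python
import os
from typing import Optional, Tuple


def resolve_he_context_paths(
    encryption: bool,
    he_context_path_client: Optional[str] = "/tmp/nvflare/he_context/he_context_client.txt",
) -> Tuple[Optional[str], Optional[str]]:
    """Return (client_path, server_path), or (None, None) without encryption."""
    if not encryption or he_context_path_client is None:
        return None, None
    # Place the server context next to the client context, whatever its file name.
    server_path = os.path.join(os.path.dirname(he_context_path_client), "he_context_server.txt")
    return he_context_path_client, server_path
```

Because the server path is derived from the directory rather than from the client file name, a custom client path such as `/data/ctx/my_client.bin` still yields `/data/ctx/he_context_server.txt`.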

@ZiyueXu77 ZiyueXu77 merged commit 224d658 into NVIDIA:main Jan 7, 2026
19 of 20 checks passed
@ZiyueXu77 ZiyueXu77 deleted the km_rcp branch January 7, 2026 22:06
@ZiyueXu77 ZiyueXu77 restored the km_rcp branch January 12, 2026 20:53