
[Research] Add fsi-fraud-detection code #4395

Open
holgerroth wants to merge 13 commits into NVIDIA:main from holgerroth:fsi_update

Conversation

@holgerroth (Collaborator) commented Apr 2, 2026

Fixes # .

Description

Add the research/fsi-fraud-detection implementation code to go with the paper Privacy-Preserving Federated Fraud Detection in Payment Transactions with NVIDIA FLARE.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@greptile-apps bot (Contributor) commented Apr 2, 2026

Greptile Summary

This PR adds the complete research/fsi-fraud-detection implementation (synthetic data generation, federated and central training clients, federated statistics, attribution analysis, tests, and documentation) accompanying the paper on privacy-preserving federated fraud detection with NVFlare. Successive review rounds resolved the prior P0/P1 issues: the scaler fitted on the wrong feature set, the broken from utils import path, hardcoded file-count assertions, commented-out os.system installs, a misleading SHAP status message, an orphaned Matplotlib figure, and file truncation under concurrent writes. Three minor P2 items remain:

  • a max(n_batches, 1) guard in train/client.py that violates the project's explicit-error rule for empty DataLoaders;
  • a break in stats/client.py that silently discards all but the first test dataset despite preparing and validating all of them;
  • a bare assert n_classes == 2 in train/utils.py that would surface as a cryptic AssertionError if a low-fraud-rate test set happened to contain only one class.

Confidence Score: 5/5

Safe to merge; all prior P0/P1 issues have been resolved and only minor P2 quality suggestions remain

All remaining findings are P2: a residual silent-division guard that violates a style rule, a break that returns only the first test dataset, and a bare assert that could produce a cryptic error on an unusual but unlikely data condition. None block the correctness of the federated training loop or data pipeline.

Files with remaining findings:

  • research/fsi-fraud-detection/train/client.py (max(n_batches, 1) guard)
  • research/fsi-fraud-detection/stats/client.py (break discards test datasets)
  • research/fsi-fraud-detection/train/utils.py (assert n_classes == 2)
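All three remaining items share the same fix shape under the project's explicit-error rule: fail loudly rather than coerce silently. A minimal sketch of that style, using illustrative function and variable names rather than the PR's actual identifiers:

```python
def batches_per_epoch(n_samples: int, batch_size: int) -> int:
    # Rather than guarding with max(n_batches, 1), treat an empty
    # DataLoader as an explicit, descriptive error.
    if n_samples == 0:
        raise ValueError("training dataset is empty; refusing to train on zero batches")
    return -(-n_samples // batch_size)  # ceiling division


def check_binary_labels(labels) -> None:
    # Rather than a bare `assert n_classes == 2`, raise a message that makes a
    # single-class test split (possible at very low fraud rates) diagnosable.
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError(f"expected binary fraud/non-fraud labels, found classes {classes}")


def evaluate_all(test_sets: dict, evaluate) -> dict:
    # Rather than breaking after the first dataset, evaluate every prepared test set.
    return {name: evaluate(dataset) for name, dataset in test_sets.items()}
```

Each failure mode then surfaces as a ValueError naming the actual condition, instead of a silent coercion or an opaque AssertionError.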

Important Files Changed

| Filename | Overview |
| --- | --- |
| research/fsi-fraud-detection/train/client.py | Federated-learning training client; prior critical bugs addressed; one residual max(n_batches, 1) guard remains that silently masks an empty-DataLoader case, contrary to the custom rule |
| research/fsi-fraud-detection/train/central_train.py | Central training script; hardcoded-assertion and silent-empty-loader fixes applied; prepare_dataset(df_train) is called twice, which is redundant but not incorrect |
| research/fsi-fraud-detection/stats/client.py | NVFlare statistics client; prior scaler/import bugs fixed; loads and validates all test datasets but silently discards every one after the first via a break |
| research/fsi-fraud-detection/train/utils.py | Utilities for loss, attribution, MLflow logging, and metrics persistence; the bare assert n_classes == 2 is fragile and would crash with a cryptic AssertionError on single-class test sets |
| research/fsi-fraud-detection/misc/data.py | Data preparation pipeline; clean, well-structured, no new issues found |
| research/fsi-fraud-detection/main.py | CLI entry point for bulk synthetic data generation; deterministic seeding, checksums, and universal scaling-dataset generation look correct |

Sequence Diagram

sequenceDiagram
    participant S as NVFlare Server
    participant C as train/client.py
    participant D as misc/data.py
    participant M as train/model.py
    participant U as train/utils.py

    Note over C: Startup: load & prepare data
    C->>D: load_csv_data_from_path(train, test, scaling)
    D-->>C: raw DataFrames
    C->>D: prepare_dataset(df_scaling) to fit StandardScaler
    C->>D: prepare_dataset(df_train, scaler)
    C->>D: prepare_dataset(df_test, scaler) for each test file

    loop Federated Rounds
        S->>C: FLModel (global weights)
        C->>M: load_state_dict(global_params)
        Note over C,M: Local training loop
        C->>U: evaluate_on_test_datasets(model, test_tensors)
        U-->>C: accuracy / precision / recall / F1
        C->>U: compute_shapley_values (last round or every N)
        U-->>C: attribution_metrics
        C-->>S: FLModel(DIFF params + metrics)
    end

Reviews (10): Last reviewed commit: "Fix fraud detection metrics save race"

@holgerroth (Collaborator, Author):

/build

@holgerroth holgerroth requested a review from pcnudde April 2, 2026 21:07
chesterxgchen previously approved these changes Apr 2, 2026
@holgerroth (Collaborator, Author):

/build

@holgerroth holgerroth requested a review from chesterxgchen April 3, 2026 00:13
@holgerroth (Collaborator, Author):

/build

chesterxgchen previously approved these changes Apr 3, 2026
@holgerroth (Collaborator, Author):

/build

@holgerroth (Collaborator, Author):

/build

@holgerroth holgerroth requested a review from chesterxgchen April 3, 2026 21:06
Comment on lines +519 to +528
os.makedirs(os.path.dirname(file_path), exist_ok=True)

# Open file for writing with exclusive lock
with open(file_path, "wb") as f:
    # Try to acquire exclusive lock (non-blocking)
    fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    # Save the data
    np.save(f, data)

@greptile-apps bot (Contributor):

P1 File truncated before lock is acquired — data loss under concurrent writes

open(file_path, "wb") truncates the file immediately on entry, before the LOCK_NB lock is checked. If a second caller reaches this method while the first holds the lock, it truncates the file (clearing the first writer's data in progress), then fails the flock call and retries — truncating the file again on each retry. The entire purpose of the retry loop is therefore defeated: every failed lock attempt corrupts whatever the successful writer put there.
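The truncation can be confirmed in isolation. A standalone snippet (the path here is a temporary stand-in for metrics.npy) shows the on-disk contents are gone the moment open(..., "wb") returns, before any flock call could run:

```python
import os
import tempfile

# Create a file with existing contents, standing in for the live metrics file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"existing metrics data")

# Re-opening in "wb" mode truncates immediately: a second writer that goes on
# to fail its flock() call has already destroyed the first writer's data.
with open(path, "wb") as f:
    size_after_open = os.path.getsize(path)

os.unlink(path)
print(size_after_open)  # 0
```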

The standard fix is to write to a temporary file in the same directory, then atomically rename it into place once the write succeeds, which makes the lock entirely unnecessary:

import tempfile

def _safe_save_with_lock(self, data, file_path, **_):
    out_dir = os.path.dirname(file_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    tmp_path = None
    try:
        fd, tmp_path = tempfile.mkstemp(dir=out_dir or ".")
        with os.fdopen(fd, "wb") as f:
            np.save(f, data)
        os.replace(tmp_path, file_path)   # atomic on POSIX
        return True
    except Exception as e:
        if tmp_path is not None and os.path.exists(tmp_path):
            os.unlink(tmp_path)           # don't leave a stray temp file behind
        self.log_error(None, f"Failed to save {file_path}: {e}")
        return False

@holgerroth (Collaborator, Author):

Fixed in ff567d5d1. _safe_save_with_lock() no longer opens metrics.npy in "wb" before coordination; it now writes to a temp file in the same directory and atomically swaps it into place with os.replace(...), so a competing writer cannot truncate the live file during retries. I also added a local mutex around the self.all_metrics update plus save so the shared in-memory snapshot is serialized consistently before each write.
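The pattern that reply describes, a temp file swapped in with an atomic os.replace plus a local mutex serializing the in-memory update and the write, can be sketched as follows; the class and attribute names here are illustrative, not the actual PR code:

```python
import os
import tempfile
import threading

class MetricsStore:
    """Illustrative sketch: serialize in-memory updates and persist atomically."""

    def __init__(self, file_path: str):
        self.file_path = file_path
        self.all_metrics = {}          # shared in-memory snapshot
        self._lock = threading.Lock()  # local mutex from the fix description

    def update_and_save(self, round_num: int, metrics: dict) -> None:
        with self._lock:
            # Mutate the shared snapshot and write it out under one lock,
            # so each on-disk file reflects a consistent state.
            self.all_metrics[round_num] = metrics
            self._atomic_save(repr(self.all_metrics).encode())

    def _atomic_save(self, payload: bytes) -> None:
        out_dir = os.path.dirname(self.file_path) or "."
        os.makedirs(out_dir, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=out_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        os.replace(tmp_path, self.file_path)  # atomic rename; no truncation window
```

Because readers only ever see the file before or after the rename, a crashed or competing writer can no longer leave a half-written or truncated metrics file behind.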
