[Research] Add fsi-fraud-detection code#4395
holgerroth wants to merge 13 commits into NVIDIA:main from
Greptile Summary
This PR adds the complete …
Confidence Score: 5/5
Safe to merge: all prior P0/P1 issues have been resolved, and only minor P2 quality suggestions remain.
All remaining findings are P2: a residual silent division-by-zero guard that violates a style rule, a break that intentionally returns only one test dataset, and a bare assert that could produce a cryptic error under an unusual but unlikely data condition. None of them affects the correctness of the federated training loop or the data pipeline.
Files with remaining P2 findings:
research/fsi-fraud-detection/train/client.py (max(n_batches, 1) guard)
research/fsi-fraud-detection/stats/client.py (break discards test datasets)
research/fsi-fraud-detection/train/utils.py (assert n_classes == 2)
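Two of the P2 items above come down to replacing an implicit fallback with an explicit, descriptive failure. The sketch below illustrates that style only; the helper names mean_batch_loss and check_binary_labels are hypothetical (they are not in the PR), and it assumes the class count in train/utils.py is derived from the label array. A bare assert is also stripped entirely when Python runs with -O, which is another reason to prefer a ValueError.

```python
import numpy as np


def mean_batch_loss(total_loss: float, n_batches: int) -> float:
    """Average loss per batch, failing loudly on an empty epoch.

    Hypothetical alternative to a silent `total_loss / max(n_batches, 1)` guard.
    """
    if n_batches <= 0:
        raise ValueError(
            f"No training batches were produced (n_batches={n_batches}); "
            "check the local data split and batch size."
        )
    return total_loss / n_batches


def check_binary_labels(y: np.ndarray) -> int:
    """Return the number of label classes, raising a clear error unless it is 2.

    Hypothetical alternative to a bare `assert n_classes == 2`.
    """
    classes = np.unique(y)
    n_classes = len(classes)
    if n_classes != 2:
        raise ValueError(
            f"Expected binary fraud labels (2 classes), found {n_classes}: {classes.tolist()}"
        )
    return n_classes
```

With these helpers, a site whose local split produces no batches, or whose labels contain a single class, fails with a message that names the actual problem instead of reporting a misleading loss or an anonymous AssertionError.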
Sequence Diagram
sequenceDiagram
    participant S as NVFlare Server
    participant C as train/client.py
    participant D as misc/data.py
    participant M as train/model.py
    participant U as train/utils.py
    Note over C: Startup: load & prepare data
    C->>D: load_csv_data_from_path(train, test, scaling)
    D-->>C: raw DataFrames
    C->>D: prepare_dataset(df_scaling) to fit StandardScaler
    C->>D: prepare_dataset(df_train, scaler)
    C->>D: prepare_dataset(df_test, scaler) for each test file
    loop Federated Rounds
        S->>C: FLModel (global weights)
        C->>M: load_state_dict(global_params)
        Note over C,M: Local training loop
        C->>U: evaluate_on_test_datasets(model, test_tensors)
        U-->>C: accuracy / precision / recall / F1
        C->>U: compute_shapley_values (last round or every N)
        U-->>C: attribution_metrics
        C-->>S: FLModel(DIFF params + metrics)
    end
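The diagram shows two pieces of arithmetic worth making concrete: a scaler fitted once on the scaling split and reused for train and test data, and a DIFF update (local minus global weights) sent back to the server. The NumPy-only sketch below mirrors both under stated assumptions. The helper names fit_standard_scaler, apply_scaler and compute_diff are hypothetical, parameters are assumed to be dicts of arrays keyed by layer name (like a PyTorch state_dict converted to NumPy), and the scaler reproduces scikit-learn StandardScaler's fit/transform arithmetic (population standard deviation, constant features left with scale 1) rather than calling it. In the actual client these values would be carried in an NVFlare FLModel whose params_type is ParamsType.DIFF.

```python
import numpy as np


def fit_standard_scaler(x_scaling: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-feature mean and scale learned from the scaling split only."""
    mean = x_scaling.mean(axis=0)
    std = x_scaling.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    # Like StandardScaler, give zero-variance features a scale of 1 instead of dividing by 0
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_scaler(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Transform any split (train or each test file) with the shared statistics."""
    return (x - mean) / std


def compute_diff(global_params: dict, local_params: dict) -> dict:
    """DIFF-style update: local weights minus the global weights received this round."""
    mismatched = set(global_params) ^ set(local_params)
    if mismatched:
        raise KeyError(f"Parameter names differ between models: {sorted(mismatched)}")
    return {name: local_params[name] - global_params[name] for name in global_params}
```

Because every site scales with statistics from the same scaling split, feature values mean the same thing across clients, and sending a DIFF lets the server reconstruct each local model as global plus diff.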
Reviews (10): Last reviewed commit: "Fix fraud detection metrics save race"
/build
os.makedirs(os.path.dirname(file_path), exist_ok=True)

# Open file for writing with exclusive lock
with open(file_path, "wb") as f:
    # Try to acquire exclusive lock (non-blocking)
    fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    # Save the data
    np.save(f, data)
File truncated before lock is acquired — data loss under concurrent writes
open(file_path, "wb") truncates the file as soon as it is opened, before the non-blocking exclusive lock (LOCK_EX | LOCK_NB) is even attempted, and because flock() locks are advisory they cannot prevent that truncation. If a second caller reaches this method while the first holds the lock, it truncates the file (erasing the first writer's in-progress data), then fails the flock call and retries, truncating the file again on each retry. The retry loop therefore defeats its own purpose: every failed lock attempt corrupts whatever the successful writer put there.
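The truncation can be reproduced in a single process on Linux, because flock() locks belong to the open file description: a second open() of the same path gets its own description and conflicts with the lock held through the first. The sketch below assumes a Unix system with the fcntl module; the scratch path and the bytes written are arbitrary stand-ins for metrics.npy. It records the file size before and after the second writer's open(..., "wb") and whether that writer's non-blocking lock attempt failed.

```python
import fcntl
import os
import tempfile


def demonstrate_truncation() -> tuple[int, int, bool]:
    """Return (size before 2nd open, size after 2nd open, whether 2nd lock failed)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "metrics.npy")
        with open(path, "wb") as first:
            # First writer acquires the exclusive lock and writes its data
            fcntl.flock(first.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            first.write(b"first writer's data")
            first.flush()
            size_before = os.path.getsize(path)

            # Second writer: "wb" applies O_TRUNC inside open(), before any lock call
            with open(path, "wb") as second:
                size_after = os.path.getsize(path)
                try:
                    fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
                    second_lock_failed = False
                except BlockingIOError:
                    # The lock is correctly refused, but the damage is already done
                    second_lock_failed = True
    return size_before, size_after, second_lock_failed
```

The second writer's lock attempt is refused, yet the file is already empty at that point, which is exactly the window the retry loop cannot protect.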
The standard fix is to write to a temporary file in the same directory, then atomically rename it into place once the write succeeds, which makes the lock unnecessary entirely:
import os
import tempfile

import numpy as np

def _safe_save_with_lock(self, data, file_path, **_):
    out_dir = os.path.dirname(file_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    tmp_path = None
    try:
        # Temp file in the same directory so os.replace() stays on one filesystem
        fd, tmp_path = tempfile.mkstemp(dir=out_dir or ".")
        with os.fdopen(fd, "wb") as f:
            np.save(f, data)
        os.replace(tmp_path, file_path)  # atomic on POSIX
        return True
    except Exception as e:
        # Don't leave a partial temp file behind when the write or rename fails
        if tmp_path is not None and os.path.exists(tmp_path):
            os.remove(tmp_path)
        self.log_error(None, f"Failed to save {file_path}: {e}")
        return False
Fixed in ff567d5d1. _safe_save_with_lock() no longer opens metrics.npy in "wb" mode before coordinating with other writers; it now writes to a temp file in the same directory and atomically swaps it into place with os.replace(...), so a competing writer cannot truncate the live file during retries. I also added a local mutex around the self.all_metrics update and the save, so the shared in-memory snapshot is serialized consistently before each write.
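The reply combines two protections: a process-local mutex held across the self.all_metrics update and the save, and the temp-file-plus-os.replace write. A minimal sketch of that combination follows, assuming metrics are a flat dict stored in metrics.npy as a pickled NumPy object array; the MetricsRecorder class, its record() method and load_metrics() are hypothetical names, not code from the PR. The mutex only serializes threads within one process; across processes, os.replace still guarantees readers see a complete file, with the last writer winning.

```python
import os
import tempfile
import threading

import numpy as np


class MetricsRecorder:
    """Hypothetical helper: thread-safe metric updates with atomic saves."""

    def __init__(self, file_path: str):
        self.file_path = file_path
        self.all_metrics = {}
        self._lock = threading.Lock()

    def record(self, name: str, value: float) -> None:
        # Hold the lock across the update AND the save, so every file written
        # to disk reflects one consistent snapshot of self.all_metrics
        with self._lock:
            self.all_metrics[name] = value
            self._atomic_save(dict(self.all_metrics))

    def _atomic_save(self, data: dict) -> None:
        out_dir = os.path.dirname(self.file_path) or "."
        os.makedirs(out_dir, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as f:
                np.save(f, data)  # the dict is stored as a 0-d object array (pickled)
            os.replace(tmp_path, self.file_path)  # atomic rename on POSIX
        except BaseException:
            # Remove the partial temp file if anything failed before the rename
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise


def load_metrics(file_path: str) -> dict:
    # allow_pickle is required because the dict was saved as an object array
    return np.load(file_path, allow_pickle=True).item()
```

Because each save happens while the lock is held, the final file always contains every update recorded before it, even when several threads call record() at once.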
Fixes # .
Description
Add the research/fsi-fraud-detection implementation code to go with the paper Privacy-Preserving Federated Fraud Detection in Payment Transactions with NVIDIA FLARE.
Types of changes
./runtest.sh.