
Conversation

@fatemetkl fatemetkl commented Nov 18, 2025

PR Type

[Feature | Fix]

Short Description

Clickup Ticket(s): Link

Added a script to the ensemble attack example to facilitate testing the attack on target models in the experiment setup. This PR also removes the meta classifier pipeline's dependency on the target's whole TrainingResult object, since only its synthetic_data is actually required.
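
To make the decoupling concrete, here is a minimal sketch of the new entrypoint shape (simplified; the actual function in run_metaclassifier_training.py takes the Hydra config and more):

from pathlib import Path

import pandas as pd


def run_metaclassifier_training(config, target_model_synthetic_path: Path) -> None:
    # Only the target's synthetic data is loaded; the full TrainingResult
    # object is no longer required by the meta classifier pipeline.
    target_synthetic: pd.DataFrame = pd.read_csv(target_model_synthetic_path)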

Tests Added

The existing tests were updated to reflect these changes.

@fatemetkl fatemetkl changed the title Ft/ensemble experiments Ft/ensemble experiments test script Nov 18, 2025
run_shadow_model_training: true # Set this to false if shadow models are already trained and saved
run_metaclassifier_training: true

target_model: # This is only used for testing the attack on a real target model.
@fatemetkl (Collaborator, Author) commented:

We are attacking the target model tabddpm_21 with the trained metaclassifier.

# The column name in the data to be used for stratified splitting.
column_to_stratify: "trans_type" # Attention: This value is not documented in the original codebase.
folder_ranges: # Specify folder ranges for any of the mentioned splits.
train: [[1, 20]] # Folders to be used for train data collection in the experiments
@fatemetkl (Collaborator, Author) commented:

Model IDs used for training the metaclassifier (attack model).

@coderabbitai coderabbitai bot commented Nov 18, 2025

📝 Walkthrough

This pull request refactors the ensemble attack module's data flow and configuration architecture. The changes migrate from pickle-based persistence and dictionary-structured data to CSV/DataFrame-based persistence with explicit configuration paths. Key modifications:

  1. Introducing a centralized experiment_config.yaml that replaces scattered configuration logic.
  2. Updating the BlendingPlusPlus class to accept a data types file path instead of target data directly.
  3. Refactoring RMIA signal calculation to operate on lists of DataFrames rather than nested dictionaries.
  4. Expanding training hyperparameters significantly (e.g., diffusion iterations: 3 → 200000, batch sizes: 1 → 4096).
  5. Adding new scripts for testing and SLURM job submission.
  6. Simplifying shadow model training to focus on synthetic data output.

The .gitignore is broadened to cover new file patterns, and parameter names are harmonized across modules (e.g., target_data_path → target_model_synthetic_path).
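
In practical terms, the persistence migration swaps pickle round-trips of whole result objects for CSV round-trips of DataFrames. A minimal sketch with illustrative file names:

import pandas as pd

synthetic = pd.DataFrame({"trans_id": [1, 2], "amount": [10.5, 3.2]})

# Old flow (removed): pickle the entire training-result object.
# with open("target_result.pkl", "wb") as f:
#     pickle.dump(train_result, f)

# New flow: persist only the synthetic data as CSV and reload it on demand.
synthetic.to_csv("target_synthetic.csv", index=False)
reloaded = pd.read_csv("target_synthetic.csv")
assert reloaded.equals(synthetic)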

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–75 minutes

Areas requiring extra attention:

  • BlendingPlusPlus API refactoring (src/midst_toolkit/attacks/ensemble/blending.py and all call sites): Verify that the data types file path loading and DataFrame handling are correctly wired through the attack pipeline; validate that all callers in run_metaclassifier_training.py and test_attack_model.py pass the correct path type.
  • RMIA signal calculation refactoring (src/midst_toolkit/attacks/ensemble/rmia/rmia_calculation.py): Confirm that the transition from dict[str, list[Any]] to list[pd.DataFrame] preserves the correct synthetic data extraction and aggregation, especially in shadow vs. target model contexts.
  • Configuration consistency (examples/ensemble_attack/configs/experiment_config.yaml and original_attack_config.yaml): Ensure all new configuration keys (e.g., population_splits, challenge_splits, meta_classifier_model_name, target_synthetic_data_path) are used consistently across run_attack.py, run_metaclassifier_training.py, and test_attack_model.py.
  • Hyperparameter expansion (examples/ensemble_attack/data_configs/trans.json): Review the substantial increases in model capacity (layer expansions, batch sizes, iteration counts) and verify they do not introduce memory or convergence issues in the training pipeline.
  • Test fixture updates (tests/unit/attacks/ensemble/test_meta_classifier.py and test_rmia.py): Validate that mock configurations and test data structures align with the new DataFrame-based API and that synthetic data extraction logic matches production code.
  • Data persistence changes: Trace the flow from pickle-based storage (removed) to CSV-based storage (new) to ensure no data loss or format inconsistencies, particularly in run_metaclassifier_training.py and run_shadow_model_training.py.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Description check — ⚠️ Warning: The PR description is mostly complete, with clear objectives and scope, but the PR Type field is unspecified. Resolution: specify the PR Type as [Feature] or [Fix]; based on the changes, [Feature] appears most appropriate given the new test script.
  • Title check — ❓ Inconclusive: The title is vague and uses the non-descriptive abbreviation "Ft", which doesn't clearly convey the main change of adding a test script and refactoring the meta classifier pipeline. Resolution: revise the title to something more specific, such as "Add test script for ensemble attack meta-classifier evaluation" or "Refactor ensemble attack to use target synthetic data only".
✅ Passed checks (1 passed)
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
      • Create PR with unit tests
      • Post copyable unit tests in a comment
      • Commit unit tests in branch ft/ensemble_experiments


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (19)
examples/ensemble_attack/data_configs/trans.json (1)

23-23: Review the regularization impact of setting dropout to 0.0.

Dropout was reduced from 0.1 to 0.0, completely disabling this regularization mechanism. Combined with the massively expanded network architecture (lines 15-21), this could lead to overfitting on small datasets or reduced generalization.

Verify whether:

  1. This change was intentional as part of the experimental setup
  2. The dataset size or early stopping strategies compensate for the lack of regularization
.gitignore (1)

56-58: Consider scoping the training log patterns more narrowly.

The wildcard patterns *.err and *.out will ignore all files with these extensions across the entire repository. If these are SLURM job logs specific to the ensemble attack examples, consider scoping them more narrowly.

Apply this diff to scope the patterns to the examples directory:

 # Training Logs
-*.err
-*.out
+examples/**/*.err
+examples/**/*.out
examples/ensemble_attack/configs/original_attack_config.yaml (1)

76-80: Document the dramatic increase in computational requirements.

The hyperparameters have been increased by orders of magnitude (e.g., diffusion iterations: 2→200000, synthetic samples: 200→20000, Optuna trials: 10→100). While these production values likely improve attack quality, they will dramatically increase runtime and resource consumption.

Consider:

  1. Adding comments in the config file indicating estimated runtime/resource requirements
  2. Documenting these changes in the PR description or a README
  3. Providing a separate "quick test" configuration with smaller values for development/testing
examples/ensemble_attack/run_test.sh (1)

17-17: Add error handling for virtual environment activation.

The script assumes .venv exists at the repository root without checking. If the virtual environment doesn't exist or activation fails, the script will continue and likely fail with unclear errors.

Apply this diff to add error handling:

 # This script sets up the environment and runs the ensemble attack example.
-source .venv/bin/activate
+if [ ! -f .venv/bin/activate ]; then
+    echo "Error: Virtual environment not found at .venv/"
+    echo "Please create it with: python -m venv .venv"
+    exit 1
+fi
+
+source .venv/bin/activate || { echo "Failed to activate virtual environment"; exit 1; }
examples/ensemble_attack/run_train.sh (2)

17-17: Add error handling for virtual environment activation.

The script assumes .venv exists at the repository root without checking. Consider adding the same error handling recommended for run_test.sh.

Apply this diff:

 # This script sets up the environment and runs the ensemble attack example.
-source .venv/bin/activate
+if [ ! -f .venv/bin/activate ]; then
+    echo "Error: Virtual environment not found at .venv/"
+    echo "Please create it with: python -m venv .venv"
+    exit 1
+fi
+
+source .venv/bin/activate || { echo "Failed to activate virtual environment"; exit 1; }

1-26: Consider consolidating the two SLURM scripts.

run_train.sh and run_test.sh are nearly identical, differing only in job name, time limit, and the Python module executed. This duplication could be reduced by using a parameterized script or a wrapper.

Example approach:

#!/bin/bash
# run_ensemble.sh <train|test>
# Note: SLURM ignores #SBATCH directives that appear after the first
# executable line and does not expand shell variables inside them, so the
# varying options are passed to sbatch directly instead.
MODE=${1:-train}
if [ "$MODE" = "train" ]; then
    TIME=12:00:00
    MODULE=examples.ensemble_attack.run_attack
elif [ "$MODE" = "test" ]; then
    TIME=5:00:00
    MODULE=examples.ensemble_attack.test_attack_model
else
    echo "Usage: $0 <train|test>"
    exit 1
fi

sbatch --time="$TIME" --job-name="ensemble_attack_$MODE" \
    --wrap="python -m $MODULE"  # plus the rest of the shared SBATCH options
examples/ensemble_attack/run_shadow_model_training.py (1)

26-27: Return a Path object instead of a raw config string for consistency

The new flow of saving only train_result.synthetic_data and returning its path looks good and matches the metaclassifier’s needs. However, config.shadow_training.target_synthetic_data_path is almost certainly a string from Hydra, while the function is annotated as returning Path and other call sites treat it as such.

To keep types consistent across the pipeline (including run_attack.main and run_metaclassifier_training), consider converting to Path here:

-    # Save the target model's synthetic data
-    target_model_synthetic_path = config.shadow_training.target_synthetic_data_path
-    target_synthetic_data.to_csv(target_model_synthetic_path, index=False)
-
-    return target_model_synthetic_path
+    # Save the target model's synthetic data
+    target_model_synthetic_path = Path(config.shadow_training.target_synthetic_data_path)
+    target_synthetic_data.to_csv(target_model_synthetic_path, index=False)
+
+    return target_model_synthetic_path

Also applies to: 70-79

examples/ensemble_attack/run_attack.py (1)

69-80: Refresh assertion messages to reflect new names and ensure path type consistency

The control flow around target_model_synthetic_path is correct, but there are two small clarity issues:

  1. The assertion message for the shadows list still references the old name:

    • Line 77: "The attack_data_paths list must contain exactly three elements."
      but the variable is shadow_data_paths.
  2. The assertion error message for the target path still references target_data_path rather than the new synthetic path naming.

Consider updating the messages:

-        assert len(shadow_data_paths) == 3, "The attack_data_paths list must contain exactly three elements."
-        assert target_model_synthetic_path is not None, (
-            "The target_data_path must be provided for metaclassifier training."
-        )
+        assert len(shadow_data_paths) == 3, "The shadow_data_paths list must contain exactly three elements."
+        assert target_model_synthetic_path is not None, (
+            "The target_model_synthetic_path must be provided for metaclassifier training."
+        )

Once run_target_model_training is updated to return a Path (as suggested in its file), target_model_synthetic_path will be a Path in both branches, which keeps the type consistent going into run_metaclassifier_training.

Also applies to: 83-84

tests/unit/attacks/ensemble/test_rmia.py (1)

96-107: Fixture update correctly matches new calculate_rmia_signals API

Including target_synthetic_data directly in rmia_signal_data keeps the fixture aligned with the refactored calculate_rmia_signals(**rmia_signal_data) signature and isolates the tests from any previous TrainingResult dict structure.

If you want to simplify slightly, this line:

target_synthetic_data = MockTrainingResult(syn_data_5.copy()).synthetic_data

could just be:

target_synthetic_data = syn_data_5.copy()

since the namedtuple adds no extra behavior here.

examples/ensemble_attack/configs/experiment_config.yaml (1)

1-107: Config structure matches the example pipeline; clarify a couple of comments and environment assumptions

This new experiment_config.yaml lines up well with the example code:

  • pipeline.* flags drive which stages run in run_attack.main.
  • data_paths.* and data_processing_config.* cover everything run_data_processing and process_split_data need, including the new population_splits / challenge_splits.
  • shadow_training.* defines all paths and knobs used by run_shadow_model_training and run_target_model_training, including target_synthetic_data_path.
  • metaclassifier.* (especially data_types_file_path, model_type, and meta_classifier_model_name) matches how run_metaclassifier_training and the BlendingPlusPlus tests consume config.

Two small clarity improvements you might consider:

  1. final_shadow_models_path comment (Lines 74-81)
    The entries already interpolate ${shadow_training.shadow_models_output_path}, so they are effectively full paths, not paths “relative to shadow_models_output_path” as the comment suggests. Rephrasing the comment (or removing “relative”) would avoid confusion.

  2. Cluster‑specific absolute paths (Lines 13, 27)
    The /projects/midst-experiments/... paths are clearly tailored to your cluster. It may help future users to add a brief note in the header indicating that these should be customized to their environment before running the example.

examples/ensemble_attack/run_metaclassifier_training.py (4)

67-76: Tighten validation of loaded target synthetic data

The existence check on target_model_synthetic_path is good, but:

  • pd.read_csv will never return None, so the assert target_synthetic is not None is effectively redundant.
  • If you want a stronger guard, consider validating that required columns are present and/or the DataFrame is non-empty instead.

For example:

-    target_synthetic = pd.read_csv(target_model_synthetic_path)
-
-    assert target_synthetic is not None, "Target model's synthetic data is missing."
+    target_synthetic = pd.read_csv(target_model_synthetic_path)
+    required_cols = df_meta_train.drop(columns=["trans_id", "account_id"]).columns
+    missing = set(required_cols) - set(target_synthetic.columns)
+    assert not missing, f"Target synthetic data missing expected columns: {missing}"

90-91: Be explicit about account_id expectation or drop it defensively

You assert the presence of "trans_id" but not "account_id", yet you unconditionally drop both:

df_meta_train = df_meta_train.drop(columns=["trans_id", "account_id"])
df_meta_test = df_meta_test.drop(columns=["trans_id", "account_id"])

If some datasets omit account_id, this will raise a KeyError. Either assert its presence as well, or drop it with errors="ignore".

-    df_meta_train = df_meta_train.drop(columns=["trans_id", "account_id"])
-    df_meta_test = df_meta_test.drop(columns=["trans_id", "account_id"])
+    df_meta_train = df_meta_train.drop(columns=["trans_id", "account_id"], errors="ignore")
+    df_meta_test = df_meta_test.drop(columns=["trans_id", "account_id"], errors="ignore")

118-122: Add a sanity check before pickling the trained meta-classifier

It’s possible (e.g., if tuning fails) that blending_attacker.trained_model ends up None, in which case you’d silently pickle None and only fail much later in testing. A small guard here would fail fast:

-    with open(model_path, "wb") as f:
-        pickle.dump(blending_attacker.trained_model, f)
+    assert blending_attacker.trained_model is not None, "Meta-classifier training did not produce a model."
+    with open(model_path, "wb") as f:
+        pickle.dump(blending_attacker.trained_model, f)

126-131: Clarify evaluation comment to reflect new dedicated testing script

The inline comment:

df_original_synthetic=target_synthetic,  # For evaluation only, replace with actual target model during testing.

is now slightly misleading, since the actual target-model testing is handled by examples/ensemble_attack/test_attack_model.py. Consider rephrasing to make it clear this call is only for training-time evaluation and that the separate testing script handles real targets.

examples/ensemble_attack/test_attack_model.py (3)

54-68: Harden challenge/synthetic data loading and column handling

A few small robustness points here:

  • As with the training script, it can be helpful to assert that the CSV files exist and have expected columns before reading/using them, to fail fast with clearer messages.
  • You assert "trans_id" but not "account_id", yet you drop both. If some challenge datasets omit account_id, drop will raise a KeyError.

Suggested tweak:

-    df_test = pd.read_csv(challenge_data_path)
-    y_test = pd.read_csv(challenge_label_path).to_numpy().squeeze()
+    assert challenge_data_path.exists(), f"Challenge data not found at {challenge_data_path}"
+    assert challenge_label_path.exists(), f"Challenge labels not found at {challenge_label_path}"
+    df_test = pd.read_csv(challenge_data_path)
+    y_test = pd.read_csv(challenge_label_path).to_numpy().squeeze()

@@
-    df_test = df_test.drop(columns=["trans_id", "account_id"])
+    df_test = df_test.drop(columns=["trans_id", "account_id"], errors="ignore")

You might also want to assert that y_test.ndim == 1 after squeeze() if downstream code assumes a 1D array.


61-63: Mirror existence checks for target synthetic data

Unlike in run_metaclassifier_training, there’s no existence check for target_synthetic_path here. For symmetry and clearer errors:

-    target_synthetic_path = Path(config.target_model.target_synthetic_data_path)
-    target_synthetic = pd.read_csv(target_synthetic_path)
+    target_synthetic_path = Path(config.target_model.target_synthetic_data_path)
+    assert target_synthetic_path.exists(), f"Target synthetic data not found at {target_synthetic_path}"
+    target_synthetic = pd.read_csv(target_synthetic_path)

120-127: Output path and naming are fine; consider differentiating test vs val

Saving to config.target_model.attack_probabilities_result_path with the filename pattern *_val_pred_proba.npy works, but the _val_ in the name may be slightly confusing for test-time attack probabilities. If you expect both validation and test artifacts to coexist often, consider renaming this to _test_pred_proba.npy to prevent ambiguity.

src/midst_toolkit/attacks/ensemble/blending.py (2)

31-60: Config-driven data_types_file_path is a good decoupling; consider better error messaging

Switching the constructor to take data_types_file_path: Path and loading self.column_types from JSON cleanly separates schema information from runtime data, which fits the rest of the refactor.

You may want to add a small guard to provide clearer diagnostics if the file is missing or malformed, e.g.:

-        with open(data_types_file_path, "r") as f:
-            self.column_types = json.load(f)
+        try:
+            with open(data_types_file_path, "r") as f:
+                self.column_types = json.load(f)
+        except FileNotFoundError as e:
+            raise FileNotFoundError(f"Data types file not found at {data_types_file_path}") from e
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON in data types file at {data_types_file_path}") from e

202-209: Prediction precondition assertion is useful; minor type/typo nits

The new assertion in predict:

assert self.trained_model is not None, (
    "You must call .fit() before .predict() or provide a trained_model, "
    "or assign the trained model to the BlengingPlusPlus object."
)

nicely documents how the class can be used both with internal training and with an externally loaded model (as in the new testing script).

Two small polish points, if you touch this area again:

  • There’s a typo in the message (BlengingPlusPlus → BlendingPlusPlus).
  • The signature types y_test: np.ndarray, but the docstring calls it optional and the code checks if y_test is not None:. For full consistency, you could update the type hint to np.ndarray | None.

Also applies to: 234-237

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45f4689 and 81c0bb3.

📒 Files selected for processing (20)
  • .gitignore (2 hunks)
  • examples/ensemble_attack/configs/experiment_config.yaml (1 hunks)
  • examples/ensemble_attack/configs/original_attack_config.yaml (5 hunks)
  • examples/ensemble_attack/data_configs/trans.json (1 hunks)
  • examples/ensemble_attack/real_data_collection.py (4 hunks)
  • examples/ensemble_attack/run.sh (0 hunks)
  • examples/ensemble_attack/run_attack.py (5 hunks)
  • examples/ensemble_attack/run_metaclassifier_training.py (5 hunks)
  • examples/ensemble_attack/run_shadow_model_training.py (2 hunks)
  • examples/ensemble_attack/run_test.sh (1 hunks)
  • examples/ensemble_attack/run_train.sh (1 hunks)
  • examples/ensemble_attack/test_attack_model.py (1 hunks)
  • src/midst_toolkit/attacks/ensemble/blending.py (7 hunks)
  • src/midst_toolkit/attacks/ensemble/metric_utils.py (1 hunks)
  • src/midst_toolkit/attacks/ensemble/process_split_data.py (1 hunks)
  • src/midst_toolkit/attacks/ensemble/rmia/rmia_calculation.py (7 hunks)
  • src/midst_toolkit/attacks/ensemble/rmia/shadow_model_training.py (1 hunks)
  • src/midst_toolkit/attacks/ensemble/xgboost_tuner.py (1 hunks)
  • tests/unit/attacks/ensemble/test_meta_classifier.py (10 hunks)
  • tests/unit/attacks/ensemble/test_rmia.py (4 hunks)
💤 Files with no reviewable changes (1)
  • examples/ensemble_attack/run.sh
🧰 Additional context used
🧬 Code graph analysis (11)
src/midst_toolkit/attacks/ensemble/xgboost_tuner.py (1)
src/midst_toolkit/attacks/ensemble/metric_utils.py (1)
  • get_tpr_at_fpr (7-28)
examples/ensemble_attack/run_metaclassifier_training.py (1)
src/midst_toolkit/attacks/ensemble/blending.py (1)
  • predict (202-256)
src/midst_toolkit/attacks/ensemble/metric_utils.py (2)
src/midst_toolkit/evaluation/privacy/mia_scoring.py (2)
  • TprAtFpr (216-267)
  • TprFpr (159-213)
tests/unit/evaluation/privacy/test_mia_metrics.py (1)
  • test_tpr_at_fpr_function_bad_ranges (24-37)
examples/ensemble_attack/test_attack_model.py (3)
examples/ensemble_attack/run_shadow_model_training.py (1)
  • run_shadow_model_training (82-136)
src/midst_toolkit/attacks/ensemble/blending.py (3)
  • BlendingPlusPlus (26-256)
  • MetaClassifierType (21-23)
  • predict (202-256)
src/midst_toolkit/attacks/ensemble/data_utils.py (1)
  • load_dataframe (31-52)
examples/ensemble_attack/configs/original_attack_config.yaml (2)
tests/integration/attacks/ensemble/test_shadow_model_training.py (2)
  • test_train_and_fine_tune_tabddpm (135-187)
  • test_train_shadow_on_half_challenge_data (89-131)
src/midst_toolkit/attacks/ensemble/shadow_model_utils.py (2)
  • fine_tune_tabddpm_and_synthesize (158-248)
  • save_additional_tabddpm_config (36-76)
examples/ensemble_attack/data_configs/trans.json (2)
src/midst_toolkit/attacks/ensemble/clavaddpm_fine_tuning.py (2)
  • fine_tune_model (47-136)
  • child_fine_tuning (243-339)
tests/integration/attacks/ensemble/test_shadow_model_training.py (1)
  • test_train_and_fine_tune_tabddpm (135-187)
src/midst_toolkit/attacks/ensemble/process_split_data.py (1)
tests/unit/attacks/ensemble/test_process_data_split.py (1)
  • test_process_split_data (17-67)
examples/ensemble_attack/run_attack.py (4)
src/midst_toolkit/common/random.py (1)
  • set_all_random_seeds (11-55)
examples/ensemble_attack/run_shadow_model_training.py (2)
  • run_target_model_training (18-79)
  • run_shadow_model_training (82-136)
examples/ensemble_attack/run_metaclassifier_training.py (1)
  • run_metaclassifier_training (14-146)
tests/integration/attacks/ensemble/test_shadow_model_training.py (1)
  • cfg (31-33)
tests/unit/attacks/ensemble/test_rmia.py (1)
src/midst_toolkit/attacks/ensemble/rmia/rmia_calculation.py (2)
  • Key (18-20)
  • get_rmia_gower (23-86)
examples/ensemble_attack/configs/experiment_config.yaml (3)
tests/unit/attacks/ensemble/test_shadow_model_utils.py (1)
  • test_save_additional_tabddpm_config (19-54)
tests/integration/attacks/ensemble/test_shadow_model_training.py (2)
  • cfg (31-33)
  • test_train_and_fine_tune_tabddpm (135-187)
src/midst_toolkit/attacks/ensemble/shadow_model_utils.py (1)
  • save_additional_tabddpm_config (36-76)
src/midst_toolkit/attacks/ensemble/blending.py (1)
src/midst_toolkit/attacks/ensemble/metric_utils.py (1)
  • get_tpr_at_fpr (7-28)
🪛 Ruff (0.14.5)
examples/ensemble_attack/test_attack_model.py

48-48: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


89-89: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)
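
As background for these S301 findings: when pickle must be used on semi-trusted files, a restricted unpickler (the pattern from the Python standard library docs) limits what can be instantiated. This sketch allows only a few builtins; for real model objects the allow-list would have to cover the estimator classes and their NumPy internals:

import builtins
import io
import pickle

SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Permit only a small allow-list; everything else is rejected.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps({1, 2, 3})))  # sets are allowed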

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build
  • GitHub Check: run-code-check
  • GitHub Check: integration-tests
  • GitHub Check: unit-tests
🔇 Additional comments (27)
examples/ensemble_attack/data_configs/trans.json (2)

9-12: Removal of data_split_ratios from trans.json is safe.

The concern about data_split_ratios removal is unfounded. The config classes define data_split_ratios with default values [0.7, 0.2, 0.1], and Pydantic will automatically use these defaults if the field is omitted from the JSON configuration. The code will not raise AttributeError or KeyError at runtime—it will simply use the defaults. All usages in train.py and clavaddpm_fine_tuning.py safely access the field through config objects that have this field properly defined with fallback defaults.

Likely an incorrect or invalid review comment.


31-31: Verification confirms scheduler implementation is correct.

The "cosine" scheduler is fully supported:

  • SchedulerType.COSINE = "cosine" is defined in the enum
  • Pydantic's BaseModel automatically converts the JSON string "cosine" to the SchedulerType.COSINE enum value
  • The scheduler parameter is properly propagated from DiffusionConfig.scheduler to GaussianMultinomialDiffusion.__init__(scheduler_type: SchedulerType)
  • The get_named_beta_schedule() function correctly handles SchedulerType.COSINE

No changes required.

examples/ensemble_attack/real_data_collection.py (2)

24-29: Good addition of experiment-specific attack types.

The new enum entries support experimentation with different training data sizes and follow the existing naming conventions consistently.


141-176: Well-designed parameterization of data splits.

The addition of optional population_splits and challenge_splits parameters with sensible defaults improves flexibility while maintaining backward compatibility. The default handling logic is clean and the automatic directory creation is a helpful addition.
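
A minimal sketch of the default-handling pattern described above; the split names and the signature details are illustrative, not the toolkit's actual API:

from pathlib import Path


def collect_population_data_ensemble(
    output_dir: Path,
    population_splits: list[str] | None = None,
    challenge_splits: list[str] | None = None,
) -> None:
    # Fall back to sensible defaults when callers don't override the splits.
    if population_splits is None:
        population_splits = ["train", "dev"]
    if challenge_splits is None:
        challenge_splits = ["final"]
    # Create output directories up front so later writes cannot fail on a
    # missing path.
    for split in population_splits + challenge_splits:
        (output_dir / split).mkdir(parents=True, exist_ok=True)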

src/midst_toolkit/attacks/ensemble/process_split_data.py (1)

170-171: LGTM!

Adding directory creation with parents=True and exist_ok=True is a defensive best practice that ensures the output path exists before writing data files.

src/midst_toolkit/attacks/ensemble/rmia/rmia_calculation.py (1)

23-86: LGTM! Cleaner API with DataFrame lists.

The refactoring from dict-based to list-based model data simplifies the API and improves clarity. Direct iteration over DataFrames is more intuitive than key-based access.
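
For illustration only, the shape of the calling-convention change on toy data (the real signal computation is, of course, more involved):

import pandas as pd

# Before: nested dict keyed by model id, accessed by key.
model_data_old = {"shadow_1": [pd.DataFrame({"amount": [10.0, 12.0]})]}

# After: a flat list of synthetic DataFrames, iterated directly.
model_data_new = [
    pd.DataFrame({"amount": [10.0, 12.0]}),
    pd.DataFrame({"amount": [11.0, 9.0]}),
]

# Direct iteration replaces key-based access into dict[str, list[Any]].
per_model_means = [df["amount"].mean() for df in model_data_new]
print(per_model_means)  # [11.0, 10.0]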

src/midst_toolkit/attacks/ensemble/metric_utils.py (1)

17-20: LGTM! Clearer documentation.

The updated docstring better explains that predictions are confidence values in [0,1] representing membership probability, improving clarity for users of this function.
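
For readers unfamiliar with the metric, the following self-contained sketch shows the semantics of TPR at a fixed FPR; it is an independent illustration, not the toolkit's get_tpr_at_fpr implementation:

import numpy as np
from sklearn.metrics import roc_curve


def tpr_at_fpr(y_true: np.ndarray, y_score: np.ndarray, target_fpr: float = 0.1) -> float:
    # y_score holds membership confidences in [0, 1], as the docstring describes.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    # Report the best TPR achievable while keeping FPR at or below the target.
    return float(tpr[fpr <= target_fpr].max())


y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])
print(tpr_at_fpr(y_true, y_score, target_fpr=0.5))  # 1.0 on this toy data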

src/midst_toolkit/attacks/ensemble/xgboost_tuner.py (1)

12-12: LGTM! Import path refactored to metric_utils.

Moving get_tpr_at_fpr to metric_utils is a more appropriate module organization for metric-related utilities, and aligns with similar changes across the ensemble attack module.

examples/ensemble_attack/run_attack.py (2)

16-16: Deterministic seeding hook looks good

Using set_all_random_seeds(seed=config.random_seed) at the start of main is a solid addition and aligns with the new random_seed field in experiment_config.yaml. This will help keep runs reproducible across all three pipeline stages.

Also applies to: 58-58
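
The toolkit's set_all_random_seeds is not reproduced in this diff; a typical implementation (shown here as an assumption for context) seeds every RNG the pipeline may touch:

import random

import numpy as np
import torch


def set_all_random_seeds(seed: int = 42) -> None:
    # Seed Python, NumPy, and PyTorch (CPU and all CUDA devices) so the
    # three pipeline stages produce reproducible runs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)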


28-37: Passing population_splits / challenge_splits through is aligned with the new config

Wiring population_splits and challenge_splits from config.data_processing_config into collect_population_data_ensemble matches the new experiment_config.yaml structure and keeps the example flexible for different split configurations.

tests/unit/attacks/ensemble/test_rmia.py (1)

155-161: Tests now correctly exercise get_rmia_gower with a list of synthetic DataFrames

The updated tests that build:

  • shadow_synthetic_list from base_data["model_data"][Key.TRAINED_RESULTS.value], and
  • synthetic_data_list from the same source (and from Key.FINE_TUNED_RESULTS.value for the missing categorical case),

and then pass these lists to get_rmia_gower(model_data=...) are well aligned with the new API (model_data: list[pd.DataFrame]).

The assertions still verify:

  • correct call counts,
  • that the ID column is dropped before distance computation, and
  • that sampling uses random_state=base_data["random_seed"].

This gives good coverage of the refactored implementation.

Also applies to: 198-206, 226-232

tests/unit/attacks/ensemble/test_meta_classifier.py (6)

31-42: Mock config now matches new metaclassifier settings

Adding data_types_file_path and meta_classifier_model_name under metaclassifier in mock_config_with_json_path keeps the test config in sync with the real experiment_config.yaml shape and ensures BlendingPlusPlus can be initialized without missing fields.


47-86: Including the ID column in sample_dataframes is consistent with data_types.json expectations

Extending sample_dataframes so that all frames include id_col (matching MOCK_COLUMN_TYPES_CONTENT["id_column_name"]) is the right move. It mirrors how the real pipeline uses an explicit ID column for RMIA/DOMIAS/Gower feature calculation and for metaclassifier training/evaluation.


89-122: BlendingPlusPlus initialization tests correctly exercise data_types_file_path handling

The updated test_init_success and test_init_invalid_type_raises_error:

  • Pass data_types_file_path=mock_config_with_json_path.metaclassifier.data_types_file_path,
  • Verify that open is called once with that path in read mode, and
  • Confirm that the loaded column_types and meta_classifier_type are as expected.

This matches the new constructor contract for BlendingPlusPlus and provides good regression coverage around the JSON schema.


141-231: Meta‑feature preparation tests are aligned with the new column‑types and RMIA wiring

In both _prepare_meta_features tests, passing data_types_file_path into BlendingPlusPlus and then:

  • using MOCK_COLUMN_TYPES_CONTENT["categorical"], ["numerical"], and "id_column_name", and
  • asserting that calculate_rmia_signals receives df_input, shadow_data_collection, categorical_column_names, id_column_name, and id_column_data

nicely validate that the new column‑types–driven code path is exercised correctly.


235-302: Fit‑path tests for LR/XGB correctly incorporate data_types_file_path

Both test_fit_logistic_regression and test_fit_xgboost now initialize BlendingPlusPlus with data_types_file_path and confirm:

  • _prepare_meta_features is invoked once, and
  • the right model type is constructed and trained, with hyperparameters pulled from mock_config_with_json_path.

This keeps the tests in sync with the updated API without changing their behavioral intent.


303-364: Predict‑path tests correctly use the new initialization signature

The predict‑related tests now pass data_types_file_path into BlendingPlusPlus in both the “not fit yet” and full predict flow cases, while still asserting:

  • an AssertionError when predict is called before fit, and
  • correct probability extraction and TPR@FPR computation in the happy path.

These updates maintain strong coverage of the predict API after the constructor change.

examples/ensemble_attack/run_metaclassifier_training.py (4)

6-6: Pandas import aligns with new CSV-based synthetic loading

The added pandas import is appropriate for reading the target model’s synthetic data from CSV; no issues here.


14-28: Decoupling from TrainingResult via target_model_synthetic_path looks good

The new target_model_synthetic_path: Path argument and updated docstring correctly narrow the training dependency to just the target’s synthetic data, which matches the PR objective of removing the need for the full TrainingResult.


96-103: BlendingPlusPlus initialization with data_types_file_path is consistent

Passing data_types_file_path=Path(config.metaclassifier.data_types_file_path) matches the updated BlendingPlusPlus constructor and centralizes column-type config in the JSON file; this wiring looks correct.


136-143: Evaluation output path and naming are coherent

Creating attack_evaluation_result_path and saving to a deterministic filename based on model_type (*_val_pred_proba.npy) is a nice improvement over timestamped paths and should simplify downstream consumption.

examples/ensemble_attack/test_attack_model.py (4)

18-35: Hydra entrypoint and logging of target model ID look good

Using @hydra.main with experiment_config and logging config.target_model.target_model_id on start provides a clear entrypoint and traceability for which target model is under attack.


95-118: BlendingPlusPlus construction and predict usage are consistent with the API

  • Passing data_types_file_path=Path(config.metaclassifier.data_types_file_path) matches the updated constructor.
  • Assigning blending_attacker.trained_model = trained_mataclassifier_model aligns with the new assertion in predict() (“provide a trained_model, or assign the trained model to the BlengingPlusPlus object”).
  • The predict call wires df_test, df_original_synthetic=target_synthetic, df_reference, and id_column_data=test_trans_ids coherently with the training-time usage.

This section looks correct.


133-134: __main__ guard correctly delegates to Hydra-decorated entrypoint

Calling run_metaclassifier_testing() under the if __name__ == "__main__": guard is the standard Hydra pattern and should work as intended.


77-90: Pickle concern does not apply to this closed research pipeline

The review correctly identifies that this code is "fine in a closed research pipeline," and that is precisely the context here. The pickle files are generated internally by run_shadow_model_training(config) during execution—not loaded from untrusted external sources. Config paths derive from developer-controlled YAML files in examples/ensemble_attack/configs/. While Hydra allows CLI overrides, this is standard for research code where the researcher controls the command line and config values. The assumption that shadow model files are trusted is valid in this context.

Likely an incorrect or invalid review comment.

src/midst_toolkit/attacks/ensemble/blending.py (2)

6-7: Importing Path matches new data_types_file_path usage

Adding from pathlib import Path is consistent with the constructor’s new data_types_file_path: Path parameter.


82-86: RMIA integration uses target synthetic data consistently

The _prepare_meta_features docstring and the calculate_rmia_signals call now both refer to synthetic data as df_synthetic / target_synthetic_data, which matches how the callers (fit and predict) pass in the diffusion model’s synthetic data. This should keep RMIA features aligned with the rest of the meta-features.

Also applies to: 100-106

@sarakodeiri (Collaborator) left a comment:

Overall looks great to me. Added very minor comments / questions.

@lotif (Collaborator) left a comment:

LGTM apart from one small thing. Thanks for addressing the comments!

df_test = df_test.drop(columns=["trans_id", "account_id"])
with open(Path(config.metaclassifier.data_types_file_path), "r") as f:
    column_types = json.load(f)
id_column_name = column_types["id_column_name"]
A collaborator commented:

Nice! Thanks for finding this!

@fatemetkl fatemetkl merged commit 1cf674b into main Nov 25, 2025
6 of 7 checks passed
@fatemetkl fatemetkl deleted the ft/ensemble_experiments branch November 25, 2025 19:46
