Ft/ensemble changes #115
base: main
Conversation
… uses target synthetic data.
…rent experimental setups
* Added testing of several targets on multiple GPUs
* Added a comment
challenge_data_path: ${target_model.target_model_directory}/${target_model.target_model_name}/challenge_with_id.csv
challenge_label_path: ${target_model.target_model_directory}/${target_model.target_model_name}/challenge_label.csv
target_attack_artifact_dir: ${base_experiment_dir}/target_${target_model.target_model_id}_attack_artifacts/
This directory was extra and can be removed.
@@ -1,34 +1,36 @@
# Ensemble experiment configuration
The current configuration contains a large number of variables and data paths. While all of them are necessary for the experiment, even detailed comments can’t fully clarify the purpose of each element.
As part of a future refactor, I suggest splitting the configuration into multiple smaller configs, each dedicated to a specific part of the attack (e.g., data‑collection pipeline, training pipeline, testing pipeline). This will introduce some overhead, since users will need to manage several config files and ensure they stay aligned, but the gain in clarity and maintainability is worth it.
I did appreciate the convenience of controlling everything from a single config, as it made running many experiments and adjusting parameters very fast. However, this convenience comes at the cost of readability, which becomes a real issue as other users begin to use it.
Maybe it's worth taking a shot at simply using better variable names within one config. So far, we've tried to align the names with the original attack code, which itself uses vague, non-self-explanatory variable names. If we rename them, the new names should be applied consistently throughout the code.
TL;DR:
The current single config is hard to understand because it mixes many variables and data paths with unclear names inherited from the original attack code. Splitting it into multiple pipeline‑specific configs would improve clarity and maintainability, even if it adds some overhead. Alternatively, improving variable naming within one config could be helpful.
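As a rough sketch of what the split might look like (file names, keys, and values below are hypothetical, not taken from this repo; only the `${...}` interpolation style mirrors the quoted config):

```yaml
# base.yaml (hypothetical): values shared by every pipeline config
base_experiment_dir: ./experiments/ensemble
target_model:
  target_model_directory: ./models
  target_model_name: target_v1

# data_collection.yaml (hypothetical): data-collection pipeline only
challenge_data_path: ${target_model.target_model_directory}/${target_model.target_model_name}/challenge_with_id.csv
challenge_label_path: ${target_model.target_model_directory}/${target_model.target_model_name}/challenge_label.csv

# training.yaml (hypothetical): shadow-model training pipeline only
num_shadow_models: 8
```

Each pipeline would then load only its own file plus the shared base, so a reader of the training config never has to scan past data-collection paths.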
)
population.append(df_real)
population.append(df_real)
This was a bug! Thank you for catching this, Sara!
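A minimal sketch of why the duplicate append matters, using plain lists in place of dataframes (all names here are illustrative, not the repo's actual code): appending the real data twice over-represents real records in the population, which skews anything computed from it.

```python
# Stand-ins for the real and synthetic dataframes (illustrative only).
df_real = ["real_row_1", "real_row_2"]
df_synth = ["synth_row_1"]

# Buggy version: df_real is appended twice, so both entries of the
# population are the same object and real records are double-counted.
population_buggy = []
population_buggy.append(df_real)
population_buggy.append(df_real)  # duplicate append (the bug)

# Fixed version: each data source contributes exactly once.
population_fixed = []
population_fixed.append(df_real)
population_fixed.append(df_synth)

print(population_buggy[0] is population_buggy[1])  # True: same object twice
print(population_fixed[0] is population_fixed[1])  # False
```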
# Load the required dataframes for shadow model training.
# For shadow model training we need master_challenge_train and population data.
# Master challenge is the main training (or fine-tuning) data for the shadow models.
df_master_challenge_train = load_dataframe(
Instead of loading the data here, it is passed to the function.
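A hedged sketch of the refactor described here, with stand-in names (`load_dataframe`, `train_shadow_models_*` are illustrative, not the repo's actual API): the caller loads the dataframe once and passes it in, rather than the function loading it internally.

```python
def load_dataframe(path):
    # Stand-in loader; the real code would read a CSV at `path`.
    return [{"path": path, "row": i} for i in range(3)]

# Before: the function loads its own data, hiding the dependency and
# forcing a re-load in every pipeline that needs the same dataframe.
def train_shadow_models_before(path):
    df_master_challenge_train = load_dataframe(path)
    return len(df_master_challenge_train)

# After: the dataframe is passed in, making the dependency explicit and
# letting several pipelines share one loaded copy.
def train_shadow_models_after(df_master_challenge_train):
    return len(df_master_challenge_train)

df = load_dataframe("challenge_with_id.csv")
print(train_shadow_models_after(df))  # 3
```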
| f"Fine-tuned model {model_id} generated {len(train_result.synthetic_data)} synthetic samples.", | ||
| ) | ||
| attack_data["fine_tuned_results"].append(train_result) | ||
| attack_data["fine_tuned_results"].append(train_result.synthetic_data) |
We only need to save the synthetic data.
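A small sketch of the change, with a hypothetical stand-in for the training result object (the real `train_result` type is not shown in this diff): storing only `synthetic_data` keeps exactly what downstream code uses and avoids holding the rest of the result in memory.

```python
from dataclasses import dataclass, field

@dataclass
class TrainResult:
    # Hypothetical result type: only synthetic_data is needed downstream.
    synthetic_data: list
    model_state: dict = field(default_factory=dict)  # large, unused later

attack_data = {"fine_tuned_results": []}
train_result = TrainResult(
    synthetic_data=["s1", "s2"],
    model_state={"weights": [0.0] * 1000},
)

# Before (dropped): the whole result object, model state included.
# attack_data["fine_tuned_results"].append(train_result)

# After: only the synthetic samples are kept.
attack_data["fine_tuned_results"].append(train_result.synthetic_data)

print(attack_data["fine_tuned_results"][0])  # ['s1', 's2']
```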
PR Type
[Feature | Fix | Documentation | Other ]
Short Description
Clickup Ticket(s): Link(s) if applicable.
Add a short description of what is in this PR.
Tests Added
Describe the tests that have been added to ensure the code's correctness, if applicable.