Skip to content

Conversation

@fatemetkl
Copy link
Collaborator

PR Type

[Feature | Fix | Documentation | Other ]

Short Description

Clickup Ticket(s): Link(s) if applicable.

Add a short description of what is in this PR.

Tests Added

Describe the tests that have been added to ensure the codes correctness, if applicable.

challenge_data_path: ${target_model.target_model_directory}/${target_model.target_model_name}/challenge_with_id.csv
challenge_label_path: ${target_model.target_model_directory}/${target_model.target_model_name}/challenge_label.csv

target_attack_artifact_dir: ${base_experiment_dir}/target_${target_model.target_model_id}_attack_artifacts/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory was extra and can be removed.

@@ -1,34 +1,36 @@
# Ensemble experiment configuration
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current configuration contains a large number of variables and data paths. While all of them are necessary for the experiment, even detailed comments can’t fully clarify the purpose of each element.

As part of a future refactor, I suggest splitting the configuration into multiple smaller configs, each dedicated to a specific part of the attack (e.g., data‑collection pipeline, training pipeline, testing pipeline). This will introduce some overhead, since users will need to manage several config files and ensure they stay aligned, but the gain in clarity and maintainability is worth it.

I did appreciate the convenience of controlling everything from a single config, as it made running many experiments and adjusting parameters very fast. However, this convenience comes at the cost of readability, which becomes a real issue as other users begin to use it.

Maybe it's worth taking a shot at just using better variable names in one config. So far, we've tried to align the names with the original attack code, which also uses very vague and non-self-explanatory variable names. But they should change throughout the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR:
The current single config is hard to understand because it mixes many variables and data paths with unclear names inherited from the original attack code. Splitting it into multiple pipeline‑specific configs would improve clarity and maintainability, even if it adds some overhead. Alternatively, improving variable naming within one config could be helpful.

)

population.append(df_real)
population.append(df_real)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug! Thank you for catching this, Sara!

# Load the required dataframes for shadow model training.
# For shadow model training we need master_challenge_train and population data.
# Master challenge is the main training (or fine-tuning) data for the shadow models.
df_master_challenge_train = load_dataframe(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of loading the data here, it is passed to the function.

f"Fine-tuned model {model_id} generated {len(train_result.synthetic_data)} synthetic samples.",
)
attack_data["fine_tuned_results"].append(train_result)
attack_data["fine_tuned_results"].append(train_result.synthetic_data)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to save the synthetic data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants