Description
Problem Description
The SDGym monthly benchmark execution is being migrated from AWS to Google Cloud Platform (GCP) compute instances. To support this, a new internal helper method, _benchmark_multi_table_compute_gcp, should be introduced to run the multi-table SDGym benchmark on GCP.
Starting with this implementation, the GCP compute instances should also install sdv_enterprise, enabling the inclusion of enterprise synthesizers in benchmark runs.
GCP should be used only for computation. All benchmark artifacts and results must continue to be stored in AWS (S3), consistent with the current benchmark workflow.
Expected behavior
- To avoid conflicts with the existing AWS benchmark logic, create a new internal _benchmark folder.
- Define _benchmark_multi_table_compute_gcp inside this folder.
- The method should use logic similar to that of benchmark_multi_table_aws:
def _benchmark_multi_table_compute_gcp(
output_destination,
credential_filepath,
compute_config=None,
synthesizers=DEFAULT_MULTI_TABLE_SYNTHESIZERS,
sdv_datasets=DEFAULT_MULTI_TABLE_DATASETS,
additional_datasets_folder=None,
limit_dataset_size=False,
compute_quality_score=True,
compute_diagnostic_score=True,
sdmetrics=None,
timeout=None,
):
Parameters
output_destination: An S3 bucket destination, matching the behavior of benchmark_multi_table_aws.
credential_filepath: Path to a JSON file containing all required credentials. Expected structure:
{
"aws": {
"aws_access_key_id": "<key_id>",
"aws_secret_access_key": "<secret_access_key>"
},
"gcp": {
"type": "value",
"project_id": "value",
"private_key_id": "value",
"private_key": "value",
"client_email": "value",
"client_id": "value",
"auth_uri": "value",
"token_uri": "value",
"auth_provider_x509_cert_url": "value",
"client_x509_cert_url": "value",
"universe_domain": "value",
"gcp_project": "value",
"gcp_zone": "value"
},
"sdv": {
"username": "value",
"license_key": "value"
}
}
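As a rough illustration of how the credentials file might be read and checked before any instance is launched, here is a minimal sketch. The names `load_credentials` and `REQUIRED_KEYS` are hypothetical and not part of SDGym, and the subset of `gcp` keys treated as required is an assumption made only for this example.

```python
import json

# Keys each section is expected to contain (hypothetical subset; the full
# structure is the one shown in the issue).
REQUIRED_KEYS = {
    'aws': {'aws_access_key_id', 'aws_secret_access_key'},
    'gcp': {'project_id', 'private_key', 'client_email', 'gcp_project', 'gcp_zone'},
    'sdv': {'username', 'license_key'},
}


def load_credentials(credential_filepath):
    """Load the credentials JSON and verify each section has its expected keys."""
    with open(credential_filepath) as f:
        credentials = json.load(f)

    for section, keys in REQUIRED_KEYS.items():
        if section not in credentials:
            raise ValueError(f"Missing section '{section}' in credentials file.")

        missing = keys - set(credentials[section])
        if missing:
            raise ValueError(f"Section '{section}' is missing keys: {sorted(missing)}")

    return credentials
```

Failing early on a malformed file avoids starting a billable instance that cannot upload its results to S3 or install sdv_enterprise.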
compute_config: A dictionary defining all parameters required to launch and manage compute instances.
- For now, this may default to None, in which case an internal default configuration is used.
- In the future, this may be exposed to users in a manner similar to AWS.
- Default configuration example:
{
"gcp": {
"name_prefix": "sdgym-run",
"machine_type": "n1-standard-8",
"source_image": (
"projects/deeplearning-platform-release/global/images/family/"
"common-cu128-ubuntu-2204-nvidia-570"
),
"gpu_type": "nvidia-tesla-t4",
"gpu_count": 1,
"install_nvidia_driver": False,
"delete_on_success": True,
"delete_on_error": True,
"stop_fallback": True,
},
"aws": {
"name_prefix": "sdgym-run",
"ami": "ami-080e1f13689e07408",
"instance_type": "g4dn.4xlarge",
"volume_size_gb": 100,
},
}
- Other parameters
- All remaining parameters should mirror the behavior of benchmark_multi_table_aws.
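The compute_config fallback described above (None meaning "use the internal default") could be sketched as follows. The helper name `resolve_compute_config` and the per-section merge of user overrides are assumptions for illustration. The default values are copied from the example configuration in this issue.

```python
import copy

# Default values copied from the example configuration in this issue.
DEFAULT_COMPUTE_CONFIG = {
    'gcp': {
        'name_prefix': 'sdgym-run',
        'machine_type': 'n1-standard-8',
        'source_image': (
            'projects/deeplearning-platform-release/global/images/family/'
            'common-cu128-ubuntu-2204-nvidia-570'
        ),
        'gpu_type': 'nvidia-tesla-t4',
        'gpu_count': 1,
        'install_nvidia_driver': False,
        'delete_on_success': True,
        'delete_on_error': True,
        'stop_fallback': True,
    },
    'aws': {
        'name_prefix': 'sdgym-run',
        'ami': 'ami-080e1f13689e07408',
        'instance_type': 'g4dn.4xlarge',
        'volume_size_gb': 100,
    },
}


def resolve_compute_config(compute_config=None):
    """Return the defaults, with any user-provided values overriding them per section."""
    # Deep copy so the module-level defaults are never mutated.
    resolved = copy.deepcopy(DEFAULT_COMPUTE_CONFIG)
    for section, overrides in (compute_config or {}).items():
        resolved.setdefault(section, {}).update(overrides)

    return resolved
```

For example, `resolve_compute_config({'gcp': {'gpu_type': 'nvidia-l4'}})` would switch only the GPU type and keep every other default.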
Additional context
This issue should define the full lifecycle required to launch, run, and terminate a GCP compute instance.
To remain consistent with the existing AWS benchmark workflow, the following requirements apply:
- Use a GPU-enabled GCP instance
- Recommended GPU types:
- nvidia-tesla-t4
- nvidia-l4 (widely available and cost-effective)
- Ensure robust cleanup logic
- Use a trap or equivalent mechanism so that the instance is stopped or deleted if:
- the benchmark completes successfully, or
- the benchmark fails due to an error
- This prevents accidental billing from orphaned instances.
- At the end of this issue, a multi-table benchmark should be run on GCP with sdv_enterprise synthesizers, and the results uploaded to AWS (S3).
- Define the equivalent single-table method.
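One possible "equivalent mechanism" for the cleanup requirement above is a Python try/finally wrapper around the benchmark run. This is only a sketch: `run_with_cleanup` and the three callables are hypothetical, and the reading of `delete_on_success`, `delete_on_error`, and `stop_fallback` (stop the instance whenever it was not deleted) is an assumption rather than something the issue specifies.

```python
def run_with_cleanup(run_benchmark, delete_instance, stop_instance, gcp_config):
    """Run the benchmark and always try to release the instance afterwards.

    All three callables are hypothetical placeholders for the real launch,
    delete, and stop logic.
    """
    succeeded = False
    try:
        result = run_benchmark()
        succeeded = True
        return result
    finally:
        # Pick the deletion flag that matches the outcome of the run.
        flag = 'delete_on_success' if succeeded else 'delete_on_error'
        deleted = False
        if gcp_config.get(flag, True):
            try:
                delete_instance()
                deleted = True
            except Exception:
                # Deletion failed; fall through to the stop fallback below.
                deleted = False

        # Assumed meaning of stop_fallback: stop the instance if it still exists.
        if not deleted and gcp_config.get('stop_fallback', True):
            stop_instance()
```

A try/finally block does not run if the process is killed outright (for example by SIGKILL), so a shell trap in the instance's startup script may still be worth having alongside it.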