
Commit fcf701e

Infra improvements (#66)

* set docker context to root of autogluon-bench project to prepare for copying package setup files to docker
* install agbench according to local agbench version
* use static base dir in docker to increase caching
* Use /home as base dir for dependencies
* require IMDSV2 in instances
* use AWS array job to avoid throttle
* raise lambda error
* use custom metrics with standard metrics
* use custom_configs/ for mounting
* handle empty params and default eval_metric for init
* add metrics support
* update tests
* lint
* update README
1 parent 5bcd5bc commit fcf701e

File tree

26 files changed, +303 -413 lines changed


.dockerignore

Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+*
+!.git/
+!src/
+!pyproject.toml

README.md

Lines changed: 4 additions & 10 deletions

@@ -33,12 +33,6 @@ cd autogluon-bench
 pip install -e ".[tests]"
 ```
 
-For development, please be aware that `autogluon.bench` is installed as a dependency in certain places, such as the [Dockerfile](https://github.com/autogluon/autogluon-bench/blob/master/src/autogluon/bench/Dockerfile) and [Multimodal Setup](https://github.com/autogluon/autogluon-bench/blob/master/src/autogluon/bench/frameworks/multimodal/setup.sh). We made it possible to reflect the development changes by pushing the changes to a remote GitHub branch, and providing the URI when testing on benchmark runs:
-
-```
-agbench run sample_configs/multimodal_cloud_configs.yaml --dev-branch https://github.com/<username>/autogluon-bench.git#<dev_branch>
-```
-
 
 ## Run benchmarks locally
 
@@ -144,11 +138,11 @@ After having the configuration file ready, use the command below to initiate ben
 agbench run /path/to/cloud_config_file
 ```
 
-This command automatically sets up an AWS Batch environment using instance specifications defined in the [cloud config files](https://github.com/autogluon/autogluon-bench/tree/master/sample_configs). It also creates a lambda function named with your chosen `LAMBDA_FUNCTION_NAME`. This lambda function is automatically invoked with the cloud config file you provided, submitting multiple AWS Batch jobs to the job queue (named with the `PREFIX` you provided).
+This command automatically sets up an AWS Batch environment using instance specifications defined in the [cloud config files](https://github.com/autogluon/autogluon-bench/tree/master/sample_configs). It also creates a lambda function named with your chosen `LAMBDA_FUNCTION_NAME`. This lambda function is automatically invoked with the cloud config file you provided, submitting a single AWS Batch job or a parent job for [Array jobs](https://docs.aws.amazon.com/batch/latest/userguide/array_jobs.html) to the job queue (named with the `PREFIX` you provided).
 
-In order for the Lambda function to submit multiple jobs simultaneously, you need to specify a list of values for each module-specific key. Each combination of configurations is saved and uploaded to your specified `METRICS_BUCKET` in S3, stored under `S3://{METRICS_BUCKET}/configs/{BENCHMARK_NAME}_{timestamp}/{BENCHMARK_NAME}_split_{UID}.yaml`. Here, `UID` is a unique ID assigned to the split.
+In order for the Lambda function to submit multiple Array child jobs simultaneously, you need to specify a list of values for each module-specific key. Each combination of configurations is saved and uploaded to your specified `METRICS_BUCKET` in S3, stored under `S3://{METRICS_BUCKET}/configs/{module}/{BENCHMARK_NAME}_{timestamp}/{BENCHMARK_NAME}_split_{UID}.yaml`. Here, `UID` is a unique ID assigned to the split.
 
-The AWS infrastructure configurations and submitted job IDs are saved locally at `{WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}/aws_configs.yaml`. You can use this file to check the job status at any time:
+The AWS infrastructure configurations and submitted job ID are saved locally at `{WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}/aws_configs.yaml`. You can use this file to check the job status at any time:
 
 ```bash
 agbench get-job-status --config-file /path/to/aws_configs.yaml
@@ -272,5 +266,5 @@ agbench clean-amlb-results --help
 Step 3: Run evaluation on multiple cleaned files from `Step 2`
 
 ```
-agbench evaluate-amlb-results --frameworks-run framework_1 --frameworks-run framework_2 --results-dir-input data/results/input/prepared/openml/ --paths file_name_1.csv --paths file_name_2.csv --no-clean-data
+agbench evaluate-amlb-results --frameworks-run framework_1 --frameworks-run framework_2 --results-dir-input data/results/input/prepared/openml/ --paths file_name_1.csv --paths file_name_2.csv --output-suffix benchmark_name --no-clean-data
 ```
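Note on the array-job flow described above: every Batch array child receives the same `config_file` S3 path and must pick out its own split. A minimal sketch of how a child could do this, assuming the uploaded object is a YAML list of per-job configs and that the child reads the `AWS_BATCH_JOB_ARRAY_INDEX` variable AWS Batch sets on array children; the helper name and bucket/key are illustrative, not the actual agbench entrypoint code:

```python
import os

import boto3
import yaml


def load_my_split(bucket: str, key: str) -> dict:
    # AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX (0-based) on array child jobs;
    # a plain, non-array job has no index, so fall back to the first config.
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", 0))
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    configs = yaml.safe_load(body)  # a list of per-job config dicts
    return configs[index]
```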

pyproject.toml

Lines changed: 1 addition & 0 deletions

@@ -108,3 +108,4 @@ xfail_strict = true
 
 [tool.setuptools_scm]
 write_to = "src/autogluon/bench/version.py"
+fallback_version = "0.0.1.dev0"
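`fallback_version` gives setuptools_scm a version to report when no SCM metadata is available, for example when the package is built from a copied source tree. A quick way to check what will be resolved, as a hedged illustration (run from a project checkout; the fallback only applies when git metadata is missing):

```python
from setuptools_scm import get_version

# Resolves the version from git tags when possible; otherwise uses the fallback.
print(get_version(root=".", fallback_version="0.0.1.dev0"))
```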

src/autogluon/bench/Dockerfile

Lines changed: 16 additions & 24 deletions

@@ -2,6 +2,8 @@ ARG AG_BENCH_BASE_IMAGE
 FROM $AG_BENCH_BASE_IMAGE
 
 ENV DEBIAN_FRONTEND=noninteractive
+ENV RUNNING_IN_DOCKER=true
+ENV AGBENCH_BASE=src/autogluon/bench/
 
 # Install essential packages and Python 3.9
 RUN apt-get update && \
@@ -22,48 +24,38 @@ RUN apt-get install -y python3-pip unzip curl git pciutils && \
     rm -rf /var/lib/apt/lists/* /usr/local/aws
 
 # Application-specific steps
-ARG AG_BENCH_DEV_URL
 ARG AG_BENCH_VERSION
 ARG CDK_DEPLOY_REGION
 ARG FRAMEWORK_PATH
 ARG GIT_URI
 ARG GIT_BRANCH
-ARG BENCHMARK_DIR
 ARG AMLB_FRAMEWORK
 ARG AMLB_USER_DIR
 
 WORKDIR /app/
 
-RUN if [ -n "$AG_BENCH_DEV_URL" ]; then \
-        echo "Cloning: $AG_BENCH_DEV_URL" \
-        && AG_BENCH_DEV_REPO=$(echo "$AG_BENCH_DEV_URL" | cut -d "#" -f 1) \
-        && AG_BENCH_DEV_BRANCH=$(echo "$AG_BENCH_DEV_URL" | cut -d "#" -f 2) \
-        && git clone --branch "$AG_BENCH_DEV_BRANCH" --single-branch "$AG_BENCH_DEV_REPO" /app/autogluon-bench \
-        && python3 -m pip install -e /app/autogluon-bench; \
+# Copying necessary files for autogluon-bench package
+COPY . /app/
+COPY ${AGBENCH_BASE}entrypoint.sh /app/
+COPY ${AGBENCH_BASE}custom_configs /app/custom_configs/
+
+# check if autogluon.bench version contains "dev" tag
+RUN if echo "$AG_BENCH_VERSION" | grep -q "dev"; then \
+        # install from local source
+        pip install /app/; \
     else \
-        output=$(pip install autogluon.bench==$AG_BENCH_VERSION 2>&1) || true; \
-        if echo $output | grep -q "No matching distribution"; then \
-            echo -e "ERROR: No matching distribution found for autogluon.bench==$AG_BENCH_VERSION\n\
-To resolve the issue, try 'agbench run <config_file> --dev-branch <autogluon_bench_fork_uri>#<git_branch>'"; \
-            exit 1; \
-        fi; \
+        pip install autogluon.bench==$AG_BENCH_VERSION; \
     fi
 
-COPY entrypoint.sh utils/hardware_utilization.sh $FRAMEWORK_PATH/setup.sh custom_configs/ /app/
-
-RUN chmod +x setup.sh entrypoint.sh hardware_utilization.sh \
+RUN chmod +x entrypoint.sh \
     && if echo "$FRAMEWORK_PATH" | grep -q -E "tabular|timeseries"; then \
        if [ -n "$AMLB_USER_DIR" ]; then \
-            bash setup.sh $GIT_URI $GIT_BRANCH $BENCHMARK_DIR $AMLB_FRAMEWORK $AMLB_USER_DIR; \
+            bash ${AGBENCH_BASE}${FRAMEWORK_PATH}setup.sh $GIT_URI $GIT_BRANCH "/home" $AMLB_FRAMEWORK $AMLB_USER_DIR; \
        else \
-            bash setup.sh $GIT_URI $GIT_BRANCH $BENCHMARK_DIR $AMLB_FRAMEWORK; \
+            bash ${AGBENCH_BASE}${FRAMEWORK_PATH}setup.sh $GIT_URI $GIT_BRANCH "/home" $AMLB_FRAMEWORK; \
        fi; \
     elif echo "$FRAMEWORK_PATH" | grep -q "multimodal"; then \
-        if [ -n "$AG_BENCH_DEV_URL" ]; then \
-            bash setup.sh $GIT_URI $GIT_BRANCH $BENCHMARK_DIR --AGBENCH_DEV_URL=$AG_BENCH_DEV_URL; \
-        else \
-            bash setup.sh $GIT_URI $GIT_BRANCH $BENCHMARK_DIR --AG_BENCH_VER=$AG_BENCH_VERSION; \
-        fi; \
+        bash ${AGBENCH_BASE}${FRAMEWORK_PATH}setup.sh $GIT_URI $GIT_BRANCH "/home" $AG_BENCH_VERSION; \
     fi \
     && echo "CDK_DEPLOY_REGION=$CDK_DEPLOY_REGION" >> /etc/environment

src/autogluon/bench/cloud/aws/batch_stack/lambdas/amlb_configs/__init__.py

Lines changed: 0 additions & 1 deletion
This file was deleted.

src/autogluon/bench/cloud/aws/batch_stack/lambdas/custom_configs/amlb_configs/__init__.py

Whitespace-only changes.

src/autogluon/bench/cloud/aws/batch_stack/lambdas/lambda_function.py

Lines changed: 46 additions & 130 deletions

@@ -2,7 +2,6 @@
 import itertools
 import logging
 import os
-import uuid
 import zipfile
 
 import requests
@@ -18,7 +17,7 @@
 AMLB_DEPENDENT_MODULES = ["tabular", "timeseries"]
 
 
-def submit_batch_job(env: list, job_name: str, job_queue: str, job_definition: str):
+def submit_batch_job(env: list, job_name: str, job_queue: str, job_definition: str, array_size: int):
     """
     Submits a Batch job with the given environment variables, job name, job queue and job definition.
 
@@ -27,17 +26,23 @@ def submit_batch_job(env: list, job_name: str, job_queue: str, job_definition: s
         job_name (str): Name of the job.
         job_queue (str): Name of the job queue.
         job_definition (str): Name of the job definition.
+        array_size (int): Number of jobs to submit.
 
     Returns:
         str: Job ID.
     """
     container_overrides = {"environment": env}
-    response = aws_batch.submit_job(
-        jobName=job_name,
-        jobQueue=job_queue,
-        jobDefinition=job_definition,
-        containerOverrides=container_overrides,
-    )
+    job_params = {
+        "jobName": job_name,
+        "jobQueue": job_queue,
+        "jobDefinition": job_definition,
+        "containerOverrides": container_overrides,
+    }
+    if array_size > 1:
+        job_params["arrayProperties"] = {"size": array_size}
+
+    response = aws_batch.submit_job(**job_params)
+
     logger.info("Job %s submitted to AWS Batch queue %s.", job_name, job_queue)
     logger.info(response)
     return response["jobId"]
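A brief usage sketch of the pattern above (all names are placeholders): with `array_size > 1`, `arrayProperties` turns one `submit_job` call into a single parent job that fans out into indexed children, which is what replaces the previous one-submit-per-config loop and avoids API throttling.

```python
import boto3

aws_batch = boto3.client("batch")

job_params = {
    "jobName": "demo-array-job",  # placeholders throughout
    "jobQueue": "demo-job-queue",
    "jobDefinition": "demo-job-definition",
    "containerOverrides": {
        "environment": [{"name": "config_file", "value": "s3://my-bucket/demo_job_configs.yaml"}]
    },
}
array_size = 8
if array_size > 1:
    # One parent job; AWS Batch fans out children indexed 0..7.
    job_params["arrayProperties"] = {"size": array_size}

response = aws_batch.submit_job(**job_params)
print(response["jobId"])
```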
@@ -88,7 +93,7 @@ def download_dir_from_s3(s3_path: str, local_path: str) -> str:
     return local_path
 
 
-def upload_config(bucket: str, benchmark_name: str, file: str):
+def upload_config(config_list: list, bucket: str, benchmark_name: str):
     """
     Uploads a file to the given S3 bucket.
 
@@ -99,28 +104,9 @@ def upload_config(bucket: str, benchmark_name: str, file: str):
     Returns:
         str: S3 path of the uploaded file.
     """
-    file_name = f'{file.split("/")[-1].split(".")[0]}.yaml'
-    s3_path = f"configs/{benchmark_name}/{file_name}"
-    s3.upload_file(file, bucket, s3_path)
-    return f"s3://{bucket}/{s3_path}"
-
-
-def save_configs(configs: dict, uid: str):
-    """
-    Saves the given dictionary of configs to a YAML file with the given UID as a part of the filename.
-
-    Args:
-        configs (Dict[str, Any]): Dictionary of configurations to be saved.
-        uid (str): UID to be added to the filename of the saved file.
-
-    Returns:
-        str: Local path of the saved file.
-    """
-    benchmark_name = configs["benchmark_name"]
-    config_file_path = os.path.join("/tmp", f"{benchmark_name}_split_{uid}.yaml")
-    with open(config_file_path, "w+") as f:
-        yaml.dump(configs, f, default_flow_style=False)
-    return config_file_path
+    s3_key = f"configs/{benchmark_name}/{benchmark_name}_job_configs.yaml"
+    s3.put_object(Body=yaml.dump(config_list), Bucket=bucket, Key=s3_key)
+    return f"s3://{bucket}/{s3_key}"
 
 
 def download_automlbenchmark_resources():
@@ -217,59 +203,37 @@ def process_benchmark_runs(module_configs: dict, amlb_benchmark_search_dirs: lis
             module_configs["fold_to_run"][benchmark][task] = amlb_task_folds[benchmark][task]
 
 
-def process_combination(configs, metrics_bucket, batch_job_queue, batch_job_definition):
-    """
-    Processes a combination of configurations by generating and submitting Batch jobs.
-
-    Args:
-        combination (Tuple): tuple of configurations to process.
-        keys (List[str]): list of keys of the configurations.
-        metrics_bucket (str): name of the bucket to upload metrics to.
-        batch_job_queue (str): name of the Batch job queue to submit jobs to.
-        batch_job_definition (str): name of the Batch job definition to use for submitting jobs.
-
-    Returns:
-        str: job id of the submitted batch job.
-    """
-    logger.info(f"Generating config with: {configs}")
-    config_uid = uuid.uuid1().hex
-    config_local_path = save_configs(configs=configs, uid=config_uid)
-    config_s3_path = upload_config(
-        bucket=metrics_bucket, benchmark_name=configs["benchmark_name"], file=config_local_path
-    )
-    job_name = f"{configs['benchmark_name']}-{configs['module']}-{config_uid}"
-    env = [{"name": "config_file", "value": config_s3_path}]
-
-    job_id = submit_batch_job(
-        env=env,
-        job_name=job_name,
-        job_queue=batch_job_queue,
-        job_definition=batch_job_definition,
-    )
-    return job_id, config_s3_path
+def get_cloudwatch_logs_url(region: str, job_id: str, log_group_name: str = "aws/batch/job"):
+    base_url = f"https://console.aws.amazon.com/cloudwatch/home?region={region}"
+    job_response = aws_batch.describe_jobs(jobs=[job_id])
+    log_stream_name = job_response["jobs"][0]["attempts"][0]["container"]["logStreamName"]
+    return f"{base_url}#logsV2:log-groups/log-group/{log_group_name.replace('/', '%2F')}/log-events/{log_stream_name.replace('/', '%2F')}"
 
 
 def generate_config_combinations(config, metrics_bucket, batch_job_queue, batch_job_definition):
-    job_configs = {}
-    config.pop("cdk_context")
+    job_configs = []
     if config["module"] in AMLB_DEPENDENT_MODULES:
-        job_configs = generate_amlb_module_config_combinations(
-            config, metrics_bucket, batch_job_queue, batch_job_definition
-        )
+        job_configs = generate_amlb_module_config_combinations(config)
     elif config["module"] == "multimodal":
-        job_configs = generate_multimodal_config_combinations(
-            config, metrics_bucket, batch_job_queue, batch_job_definition
-        )
+        job_configs = generate_multimodal_config_combinations(config)
     else:
         raise ValueError("Invalid module. Choose either 'tabular', 'timeseries', or 'multimodal'.")
 
-    response = {
-        "job_configs": job_configs,
-    }
-    return response
+    benchmark_name = config["benchmark_name"]
+    config_s3_path = upload_config(config_list=job_configs, bucket=metrics_bucket, benchmark_name=benchmark_name)
+    env = [{"name": "config_file", "value": config_s3_path}]
+    job_name = f"{benchmark_name}-array-job"
+    parent_job_id = submit_batch_job(
+        env=env,
+        job_name=job_name,
+        job_queue=batch_job_queue,
+        job_definition=batch_job_definition,
+        array_size=len(job_configs),
+    )
+    return {parent_job_id: config_s3_path}
 
 
-def generate_multimodal_config_combinations(config, metrics_bucket, batch_job_queue, batch_job_definition):
+def generate_multimodal_config_combinations(config):
     common_keys = []
     specific_keys = []
     for key in config.keys():
@@ -278,23 +242,21 @@ def generate_multimodal_config_combinations(config, metrics_bucket, batch_job_qu
         else:
             common_keys.append(key)
 
-    job_configs = {}
     specific_value_combinations = list(
         itertools.product(*(config[key] for key in specific_keys if key in config.keys()))
     ) or [None]
 
+    all_configs = []
     for combo in specific_value_combinations:
         new_config = {key: config[key] for key in common_keys}
         if combo is not None:
             new_config.update(dict(zip(specific_keys, combo)))
+        all_configs.append(new_config)
 
-        job_id, config_s3_path = process_combination(new_config, metrics_bucket, batch_job_queue, batch_job_definition)
-        job_configs[job_id] = config_s3_path
-
-    return job_configs
+    return all_configs
 
 
-def generate_amlb_module_config_combinations(config, metrics_bucket, batch_job_queue, batch_job_definition):
+def generate_amlb_module_config_combinations(config):
     specific_keys = ["git_uri#branch", "framework", "amlb_constraint", "amlb_user_dir"]
     exclude_keys = ["amlb_benchmark", "amlb_task", "fold_to_run"]
     common_keys = []
@@ -308,13 +270,13 @@ def generate_amlb_module_config_combinations(config, metrics_bucket, batch_job_q
         else:
             common_keys.append(key)
 
-    job_configs = {}
     specific_value_combinations = list(
         itertools.product(*(config[key] for key in specific_keys if key in config.keys()))
     ) or [None]
 
     # Iterate through the combinations and the amlb benchmark task keys
     # Generates a config for each combination of specific key and keys in `fold_to_run`
+    all_configs = []
     for combo in specific_value_combinations:
         for benchmark, tasks in config["fold_to_run"].items():
             for task, fold_numbers in tasks.items():
@@ -325,62 +287,16 @@ def generate_amlb_module_config_combinations(config, metrics_bucket, batch_job_q
                 new_config["amlb_benchmark"] = benchmark
                 new_config["amlb_task"] = task
                 new_config["fold_to_run"] = fold_num
-                job_id, config_s3_path = process_combination(
-                    new_config, metrics_bucket, batch_job_queue, batch_job_definition
-                )
-                job_configs[job_id] = config_s3_path
-    return job_configs
+                all_configs.append(new_config)
+
+    return all_configs
 
 
 def handler(event, context):
     """
     Execution entrypoint for AWS Lambda.
     Triggers batch jobs with hyperparameter combinations.
     ENV variables are set by the AWS CDK infra code.
-
-    Sample of cloud_configs.yaml to be supplied by user
-
-    # Infra configurations
-    cdk_context:
-        CDK_DEPLOY_ACCOUNT: dummy
-        CDK_DEPLOY_REGION: dummy
-
-    # Benchmark configurations
-    module: multimodal
-    mode: aws
-    benchmark_name: test_yaml
-    metrics_bucket: autogluon-benchmark-metrics
-
-    # Module specific configurations
-    module_configs:
-        # Multimodal specific
-        multimodal:
-            git_uri#branch: https://github.com/autogluon/autogluon#master
-            dataset_name: melbourne_airbnb
-            presets: medium_quality
-            hyperparameters:
-                optimization.learning_rate: 0.0005
-                optimization.max_epochs: 5
-            time_limit: 10
-
-
-    # Tabular specific
-    # You can refer to AMLB (https://github.com/openml/automlbenchmark#quickstart) for more details
-    tabular:
-        framework:
-            - AutoGluon
-        label:
-            - stable
-        amlb_benchmark:
-            - test
-            - small
-        amlb_task:
-            test: null
-            small:
-                - credit-g
-                - vehicle
-        amlb_constraint:
-            - test
     """
     if "config_file" not in event or not event["config_file"].startswith("s3"):
         raise KeyError("S3 path of config file is required.")
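To make the refactored fan-out concrete, here is a small worked example of the combination expansion (values invented; this assumes list-valued module-specific keys are the ones expanded, as the `itertools.product` calls above suggest):

```python
import itertools

# Invented example: two module-specific keys with two values each.
specific = {"dataset_name": ["airbnb", "petfinder"], "presets": ["medium_quality", "best_quality"]}
common = {"benchmark_name": "demo", "module": "multimodal"}

all_configs = []
for combo in itertools.product(*specific.values()):
    cfg = dict(common)
    cfg.update(dict(zip(specific.keys(), combo)))
    all_configs.append(cfg)

print(len(all_configs))  # 4 -> one parent array job submitted with array_size=4
```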
