
Commit eee9fe0

Make data.download not use Docker with DirectRunner and apache_beam>=2.68.0 (#182)
* rm environment_type DOCKER to make DirectRunner not use Docker with beam>2.67
* fix incorrect dont_sonify input
* Remove unused constant
* adapt readme
1 parent 1f95833 commit eee9fe0
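
The recurring change below is the removal of `"environment_type": "DOCKER"` from each dataset pipeline's options dict. As a hedged sketch of why that helps, using Beam's standard options API rather than the repository's exact wiring: when the key is left unset, each runner falls back to its own default environment, so `DirectRunner` runs in-process while `PortableRunner` can still be pointed at the image in `environment_config`.

```python
# Minimal sketch (standard apache_beam options API; not the repo's exact code).
from apache_beam.options.pipeline_options import PipelineOptions, PortableOptions

pipeline_options = {
    "runner": "DirectRunner",
    "save_main_session": True,
    # "environment_type": "DOCKER",  # removed: pinning DOCKER made
    # apache_beam>=2.68.0 start a container even under DirectRunner
}
options = PipelineOptions.from_dictionary(pipeline_options)
# With the key unset, the runner chooses its own default environment.
print(options.view_as(PortableOptions).environment_type)  # None
```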

7 files changed: +2 -10 lines changed

basic_pitch/data/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ The code and scripts in this section deal with training basic pitch on your own.
 * **--runner**: The method used to run the Beam Pipeline for processing the dataset. Options include `DirectRunner`, running directly in the code process running the pipeline, `PortableRunner`, which can be used to run the pipeline in a docker container locally, and `DataflowRunner`, which can be used to run the pipeline in a docker container on Dataflow.
 * **--timestamped**: If passed, the dataset will be put into a timestamp directory instead of 'splits'.
 * **--batch-size**: Number of examples per tfrecord when partitioning the dataset.
-* **--sdk_container_image**: The Docker container image used to process the data if using `PortableRunner` or `DirectRunner`.
+* **--sdk_container_image**: The Docker container image used to process the data if using `PortableRunner`.
 * **--job_endpoint**: the endpoint where the job is running. It defaults to `embed` which works for `PortableRunner`.
 
 Additional arguments that work with Beam in general can be used as well, and will be passed along and used by the pipeline. If using `DataflowRunner`, you will be required to pass `--temp_location={Path to GCS Bucket}`, `--staging_location={Path to GCS Bucket}`, `--project={Name of GCS Project}` and `--region={GCS region}`.
```
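
The pass-through behavior described in the last line of the diff is the usual `parse_known_args` split. A hedged sketch of that pattern, with the flag set taken from the README above and the defaults assumed; it matches the `main(known_args, pipeline_args)` signature visible in the dataset diffs below.

```python
# Sketch of the known-args / pipeline-args split (defaults are assumptions).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--runner", default="DirectRunner")
parser.add_argument("--timestamped", action="store_true")
parser.add_argument("--batch-size", type=int)
parser.add_argument("--sdk_container_image")
parser.add_argument("--job_endpoint", default="embed")
known_args, pipeline_args = parser.parse_known_args()
# Unrecognized flags such as --temp_location=gs://... land in pipeline_args,
# which is how the Dataflow-only options reach the Beam pipeline.
```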

basic_pitch/data/datasets/guitarset.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -165,7 +165,6 @@ def main(known_args: argparse.Namespace, pipeline_args: List[str]) -> None:
         "save_main_session": True,
         "sdk_container_image": known_args.sdk_container_image,
         "job_endpoint": known_args.job_endpoint,
-        "environment_type": "DOCKER",
         "environment_config": known_args.sdk_container_image,
     }
     pipeline.run(
```
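
The keys that survive the deletion are standard Beam portability options. A small, hedged demonstration of how Beam exposes them, with a placeholder image name:

```python
# Hedged demo: the remaining keys map onto Beam's PortableOptions view.
from apache_beam.options.pipeline_options import PipelineOptions, PortableOptions

opts = PipelineOptions.from_dictionary({
    "job_endpoint": "embed",
    "environment_config": "example.registry/beam-sdk:latest",  # placeholder
})
portable = opts.view_as(PortableOptions)
print(portable.job_endpoint)        # embed
print(portable.environment_config)  # the image; only consulted when a
                                    # Docker environment is actually selected
```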

basic_pitch/data/datasets/ikala.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -165,7 +165,6 @@ def main(known_args: argparse.Namespace, pipeline_args: List[str]) -> None:
         "save_main_session": True,
         "sdk_container_image": known_args.sdk_container_image,
         "job_endpoint": known_args.job_endpoint,
-        "environment_type": "DOCKER",
         "environment_config": known_args.sdk_container_image,
     }
     input_data = create_input_data(known_args.train_percent, known_args.split_seed)
```

basic_pitch/data/datasets/maestro.py

Lines changed: 0 additions & 4 deletions
```diff
@@ -46,8 +46,6 @@ def __init__(self, source: str) -> None:
         self.source = source
 
     def setup(self) -> None:
-        # Oddly enough we dont want to include the gcs bucket uri.
-        # Just the path within the bucket
         self.maestro_remote = mirdata.initialize("maestro", data_home=self.source)
         self.filesystem = beam.io.filesystems.FileSystems()
 
@@ -89,8 +87,6 @@ def setup(self) -> None:
         import apache_beam as beam
         import mirdata
 
-        # Oddly enough we dont want to include the gcs bucket uri.
-        # Just the path within the bucket
         self.maestro_remote = mirdata.initialize("maestro", data_home=self.source)
         self.filesystem = beam.io.filesystems.FileSystems()
         if self.download:
```
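
The deleted comments only described the `data_home` value handed to mirdata; the setup code itself is unchanged. A self-contained sketch of that pattern, with a placeholder bucket path standing in for `self.source`:

```python
# Sketch of the setup pattern above; the gs:// path is a placeholder.
import apache_beam as beam
import mirdata

source = "gs://example-bucket/maestro"  # placeholder for self.source
maestro_remote = mirdata.initialize("maestro", data_home=source)
filesystem = beam.io.filesystems.FileSystems()
```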

basic_pitch/data/datasets/medleydb_pitch.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -164,7 +164,6 @@ def main(known_args: argparse.Namespace, pipeline_args: List[str]) -> None:
         "save_main_session": True,
         "sdk_container_image": known_args.sdk_container_image,
         "job_endpoint": known_args.job_endpoint,
-        "environment_type": "DOCKER",
         "environment_config": known_args.sdk_container_image,
     }
     pipeline.run(
```

basic_pitch/data/datasets/slakh.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -200,7 +200,6 @@ def main(known_args: argparse.Namespace, pipeline_args: List[str]) -> None:
         "save_main_session": True,
         "sdk_container_image": known_args.sdk_container_image,
         "job_endpoint": known_args.job_endpoint,
-        "environment_type": "DOCKER",
         "environment_config": known_args.sdk_container_image,
     }
     pipeline.run(
```

basic_pitch/train.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -277,7 +277,7 @@ def console_entry_point() -> None:
         args.size_evaluation_callback_datasets,
         datasets_to_use,
         dataset_sampling_frequency,
-        args.dont_sonify,
+        args.no_sonify,
         args.no_contours,
         args.weighted_onset_loss,
         args.positive_onset_weight,
```