
Commit 254f3b0

Merge pull request #281 from DataBiosphere/dev
PR for 0.4.10 release

2 parents 8e97456 + 287f648

21 files changed: +227 -78 lines

README.md

Lines changed: 5 additions & 3 deletions

@@ -376,7 +376,8 @@ If your script has dependent files, you can make them available to your script
 by:
 
 * Building a private Docker image with the dependent files and publishing the
-  image to a public site, or privately to Google Container Registry
+  image to a public site, or privately to Google Container Registry or
+  Artifact Registry
 * Uploading the files to Google Cloud Storage
 
 To upload the files to Google Cloud Storage, you can use the

@@ -465,8 +466,9 @@ local directory in a similar fashion to support your local development.
 
 ##### Mounting a Google Cloud Storage bucket
 
-To have the `google-v2` or `google-cls-v2` provider mount a Cloud Storage bucket using
-Cloud Storage FUSE, use the `--mount` command line flag:
+To have the `google-v2` or `google-cls-v2` provider mount a Cloud Storage bucket
+using [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse),
+use the `--mount` command line flag:
 
     --mount RESOURCES=gs://mybucket
 
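To illustrate the mounting behavior documented above: dsub exposes the mount point to the task through an environment variable named after the `--mount` parameter (here `RESOURCES`). The sketch below simulates that environment; the mount path is illustrative, not a value dsub guarantees.

```python
import os

# Simulate what a task launched with `--mount RESOURCES=gs://mybucket`
# might see: the RESOURCES environment variable holds the local mount path.
# The path value here is made up for illustration.
os.environ['RESOURCES'] = '/mnt/data/mount/mybucket'

def resolve_input(relative_path):
    """Resolve a dependent file relative to the mounted bucket."""
    return os.path.join(os.environ['RESOURCES'], relative_path)

print(resolve_input('ref/genome.fa'))
```

A task script would read its dependent files from this path instead of copying them in via Cloud Storage.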

docs/code.md

Lines changed: 7 additions & 5 deletions

@@ -111,9 +111,10 @@ sites such as [Docker Hub](https://hub.docker.com/). Images can be pulled
 from Docker Hub or any container registry:
 
 ```
---image debian:jessie           # pull image implicitly from Docker hub.
---image gcr.io/PROJECT/IMAGE    # pull from GCR registry.
---image quay.io/quay/ubuntu     # pull from Quay.io.
+--image debian:jessie                           # pull image implicitly from Docker hub
+--image gcr.io/PROJECT/IMAGE                    # pull from Google Container Registry
+--image us-central1.pkg.dev/PROJECT/REPO/IMAGE  # pull from Artifact Registry
+--image quay.io/quay/ubuntu                     # pull from Quay.io
 ```
 
 When you have more than a single custom script to run or you have dependent

@@ -123,8 +124,9 @@ store it in a container registry.
 
 A quick way to start using custom Docker images is to use Google Container
 Builder which will build an image remotely and store it in the [Google Container
-Registry](https://cloud.google.com/container-registry/docs/). Alternatively you
-can build a Docker image locally and push it to a registry. See the
+Registry](https://cloud.google.com/container-registry/docs)
+or [Artifact Registry](https://cloud.google.com/artifact-registry/docs).
+Alternatively you can build a Docker image locally and push it to a registry. See the
 [FastQC example](../examples/fastqc) for a demonstration of both strategies.
 
 For information on building Docker images, see the Docker documentation:

docs/compute_resources.md

Lines changed: 5 additions & 3 deletions

@@ -82,8 +82,8 @@ A Compute Engine VM by default has both a public (external) IP address and a
 private (internal) IP address. For batch processing, it is often the case that
 no public IP address is necessary. If your job only accesses Google services,
 such as Cloud Storage (inputs, outputs, and logging) and Google Container
-Registry (your Docker image), then you can run your `dsub` job on VMs without a
-public IP address.
+Registry or Artifact Registry (your Docker image), then you can run your `dsub`
+job on VMs without a public IP address.
 
 For more information on Compute Engine IP addresses, see:
 

@@ -132,7 +132,9 @@ was assigned.**
 The default `--image` used for `dsub` tasks is `ubuntu:14.04` which is pulled
 from Dockerhub. For VMs that do not have a public IP address, set the `--image`
 flag to a Docker image hosted by
-[Google Container Registry](https://cloud.google.com/container-registry/docs).
+[Google Container Registry](https://cloud.google.com/container-registry/docs) or
+[Artifact Registry](https://cloud.google.com/artifact-registry/docs).
+
 Google provides a set of
 [Managed Base Images](https://cloud.google.com/container-registry/docs/managed-base-images)
 in Container Registry that can be used as simple replacements for your tasks.

docs/input_output.md

Lines changed: 17 additions & 4 deletions

@@ -256,15 +256,28 @@ the name of the input parameter must comply with the
 
 ## Requester Pays
 
-To access a Google Cloud Storage
-[Requester Pays bucket](https://cloud.google.com/storage/docs/requester-pays),
-you will need to specify a billing project. To do so, use the `dsub`
-command-line option `--user-project`:
+Unless specifically enabled, a Google Cloud Storage bucket is "owner pays"
+for all requests. This includes
+[network charges](https://cloud.google.com/vpc/network-pricing) for egress
+(data downloads or copies to a different cloud region), as well as
+[retrieval charges](https://cloud.google.com/storage/pricing#retrieval-pricing)
+on files in "cold" storage classes, such as Nearline, Coldline, and Archive.
+
+When [Requester Pays](https://cloud.google.com/storage/docs/requester-pays)
+is enabled on a bucket, the requester must specify a Cloud project to which
+charges can be billed. Use the `dsub` command-line option `--user-project`:
 
 ```
 --user-project my-cloud-project
 ```
 
+The user project specified will be passed for all GCS interactions, including:
+
+- Logging
+- Localization (inputs)
+- Delocalization (outputs)
+- Mount (gcs fuse)
+
 ## Unsupported path formats:
 
 * GCS recursive wildcards (**) are not supported
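The new documentation emphasizes that one `--user-project` flag covers all four GCS interactions. A minimal sketch of such an invocation, as a command-line builder; `build_dsub_command` is a hypothetical helper for illustration, not part of dsub, though the flags it emits are the documented ones:

```python
def build_dsub_command(script, user_project=None, mounts=()):
    """Assemble an illustrative dsub invocation. The single --user-project
    flag is billed for logging, localization, delocalization, and gcsfuse
    mounts alike when Requester Pays buckets are involved."""
    cmd = ['dsub', '--provider', 'google-cls-v2', '--script', script]
    if user_project:
        cmd += ['--user-project', user_project]
    for mount in mounts:
        cmd += ['--mount', mount]
    return cmd
```

For example, `build_dsub_command('my_script.sh', 'my-cloud-project', ['RESOURCES=gs://mybucket'])` produces a command where both the mounted bucket and the I/O paths are accessed under the same billing project.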

dsub/_dsub_version.py

Lines changed: 1 addition & 1 deletion

@@ -26,4 +26,4 @@
 0.1.3.dev0 -> 0.1.3 -> 0.1.4.dev0 -> ...
 """
 
-DSUB_VERSION = '0.4.9'
+DSUB_VERSION = '0.4.10'

dsub/commands/dsub.py

Lines changed: 15 additions & 7 deletions

@@ -180,20 +180,26 @@ def get_credentials(args):
 
 
 def _check_private_address(args):
-  """If --use-private-address is enabled, ensure the Docker path is for GCR."""
+  """If --use-private-address is enabled, Docker path must be for GCR or AR."""
   if args.use_private_address:
     image = args.image or DEFAULT_IMAGE
     split = image.split('/', 1)
-    if len(split) == 1 or not split[0].endswith('gcr.io'):
+    if len(split) == 1 or not (
+        split[0].endswith('gcr.io') or split[0].endswith('pkg.dev')
+    ):
       raise ValueError(
-          '--use-private-address must specify a --image with a gcr.io host')
+          '--use-private-address must specify a --image with a gcr.io or'
+          ' pkg.dev host'
+      )
 
 
 def _check_nvidia_driver_version(args):
   """If --nvidia-driver-version is set, warn that it is ignored."""
   if args.nvidia_driver_version:
-    print('***WARNING: The --nvidia-driver-version flag is deprecated and will '
-          'be ignored.')
+    print(
+        '***WARNING: The --nvidia-driver-version flag is deprecated and will '
+        'be ignored.'
+    )
 
 
 def _google_cls_v2_parse_arguments(args):

@@ -360,8 +366,10 @@ def _parse_arguments(prog, argv):
   parser.add_argument(
       '--user-project',
       help="""Specify a user project to be billed for all requests to Google
-          Cloud Storage (logging, localization, delocalization). This flag exists
-          to support accessing Requester Pays buckets (default: None)""")
+          Cloud Storage (logging, localization, delocalization, mounting).
+          This flag exists to support accessing Requester Pays buckets
+          (default: None)""",
+  )
   parser.add_argument(
       '--mount',
       nargs='*',
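The `_check_private_address` change above widens the allowed image hosts from Container Registry only (`gcr.io`) to Artifact Registry as well (`pkg.dev`). A standalone sketch of the updated predicate, with the raise replaced by a boolean return for easy testing:

```python
DEFAULT_IMAGE = 'ubuntu:14.04'  # default image noted in the dsub docs

def private_address_image_ok(image):
    """Sketch of the updated check: a --use-private-address job must use an
    image hosted on a gcr.io (Container Registry) or pkg.dev (Artifact
    Registry) host, since the VM cannot reach public registries."""
    split = (image or DEFAULT_IMAGE).split('/', 1)
    return len(split) > 1 and (
        split[0].endswith('gcr.io') or split[0].endswith('pkg.dev')
    )
```

In the real command, a failing check raises `ValueError` rather than returning `False`.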

dsub/providers/google_v2_base.py

Lines changed: 23 additions & 8 deletions

@@ -296,30 +296,43 @@ def _get_logging_env(self, logging_uri, user_project):
         'USER_PROJECT': user_project,
     }
 
-  def _get_mount_actions(self, mounts, mnt_datadisk):
+  def _get_mount_actions(self, mounts, mnt_datadisk, user_project):
     """Returns a list of two actions per gcs bucket to mount."""
     actions_to_add = []
     for mount in mounts:
       bucket = mount.value[len('gs://'):]
       mount_path = mount.docker_path
+
+      mount_command = (
+          ['--billing-project', user_project] if user_project else []
+      )
+      mount_command.extend([
+          '--implicit-dirs',
+          '--foreground',
+          '-o ro',
+          bucket,
+          os.path.join(_DATA_MOUNT_POINT, mount_path),
+      ])
+
       actions_to_add.extend([
           google_v2_pipelines.build_action(
              name='mount-{}'.format(bucket),
              enable_fuse=True,
              run_in_background=True,
              image_uri=_GCSFUSE_IMAGE,
              mounts=[mnt_datadisk],
-              commands=[
-                  '--implicit-dirs', '--foreground', '-o ro', bucket,
-                  os.path.join(_DATA_MOUNT_POINT, mount_path)
-              ]),
+              commands=mount_command,
+          ),
           google_v2_pipelines.build_action(
              name='mount-wait-{}'.format(bucket),
              enable_fuse=True,
              image_uri=_GCSFUSE_IMAGE,
              mounts=[mnt_datadisk],
-              commands=['wait',
-                        os.path.join(_DATA_MOUNT_POINT, mount_path)])
+              commands=[
+                  'wait',
+                  os.path.join(_DATA_MOUNT_POINT, mount_path),
+              ],
+          ),
       ])
     return actions_to_add

@@ -418,7 +431,9 @@ def _build_pipeline_request(self, task_view):
     if job_resources.ssh:
       optional_actions += 1
 
-    mount_actions = self._get_mount_actions(gcs_mounts, mnt_datadisk)
+    mount_actions = self._get_mount_actions(
+        gcs_mounts, mnt_datadisk, user_project
+    )
     optional_actions += len(mount_actions)
 
     user_action = 4 + optional_actions
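The core of the provider change is the gcsfuse argument assembly: `--billing-project` is prepended only when a user project was supplied. A standalone sketch of just that list-building step (the `_DATA_MOUNT_POINT` value here is assumed for illustration; the real module defines its own):

```python
import os

_DATA_MOUNT_POINT = '/mnt/data'  # assumed value for illustration

def build_gcsfuse_command(bucket, mount_path, user_project=None):
    """Mirror of the command-list assembly in _get_mount_actions:
    the Requester Pays billing flags come first, then the read-only
    foreground mount of the bucket at its data-disk path."""
    command = ['--billing-project', user_project] if user_project else []
    command.extend([
        '--implicit-dirs',
        '--foreground',
        '-o ro',
        bucket,
        os.path.join(_DATA_MOUNT_POINT, mount_path),
    ])
    return command
```

Keeping the conditional flags at the front leaves the positional arguments (bucket, mount point) at the end of the list in both cases.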

dsub/providers/local/runner.sh

Lines changed: 3 additions & 1 deletion

@@ -153,7 +153,9 @@ function configure_docker_if_necessary() {
 
   # Check that the prefix is gcr.io or <location>.gcr.io
   if [[ "${prefix}" == "gcr.io" ]] ||
-     [[ "${prefix}" == *.gcr.io ]]; then
+     [[ "${prefix}" == *.gcr.io ]] ||
+     [[ "${prefix}" == "pkg.dev" ]] ||
+     [[ "${prefix}" == *.pkg.dev ]]; then
     log_info "Ensuring docker auth is configured for ${prefix}"
     gcloud --quiet auth configure-docker "${prefix}"
   fi
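The shell test above decides whether `gcloud auth configure-docker` is needed based on the registry host prefix. The same predicate can be rendered in Python for clarity; this is a sketch of the logic, not code from the repository:

```python
def needs_gcloud_docker_auth(image):
    """Python rendering of runner.sh's prefix test: docker credentials are
    configured via gcloud only for Container Registry hosts (gcr.io,
    <location>.gcr.io) and Artifact Registry hosts (pkg.dev,
    <location>.pkg.dev)."""
    prefix = image.split('/', 1)[0]
    return (
        prefix == 'gcr.io' or prefix.endswith('.gcr.io')
        or prefix == 'pkg.dev' or prefix.endswith('.pkg.dev')
    )
```

Public registries such as Docker Hub or Quay.io fall through and need no gcloud-managed credentials.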

setup.py

Lines changed: 12 additions & 12 deletions

@@ -14,28 +14,28 @@
     # dependencies for dsub, ddel, dstat
     # Pin to known working versions to prevent episodic breakage from library
     # version mismatches.
-    # This version list generated: 04/13/2023
+    # This version list generated: 12/07/2023
     # direct dependencies
-    'google-api-python-client>=2.47.0,<=2.85.0',
-    'google-auth>=2.6.6,<=2.17.3',
-    'google-cloud-batch==0.10.0',
+    'google-api-python-client>=2.47.0,<=2.109.0',
+    'google-auth>=2.6.6,<=2.25.1',
+    'google-cloud-batch==0.17.5',
     'python-dateutil<=2.8.2',
     'pytz<=2023.3',
-    'pyyaml<=6.0',
-    'tenacity<=8.2.2',
+    'pyyaml<=6.0.1',
+    'tenacity<=8.2.3',
     'tabulate<=0.9.0',
     # downstream dependencies
     'funcsigs==1.0.2',
-    'google-api-core>=2.7.3,<=2.11.0',
-    'google-auth-httplib2<=0.1.0',
+    'google-api-core>=2.7.3,<=2.15.0',
+    'google-auth-httplib2<=0.1.1',
     'httplib2<=0.22.0',
-    'pyasn1<=0.4.8',
-    'pyasn1-modules<=0.2.8',
+    'pyasn1<=0.5.1',
+    'pyasn1-modules<=0.3.0',
     'rsa<=4.9',
     'uritemplate<=4.1.1',
     # dependencies for test code
-    'parameterized<=0.8.1',
-    'mock<=4.0.3',
+    'parameterized<=0.9.0',
+    'mock<=5.1.0',
 ]

test/integration/e2e_dstat.sh

Lines changed: 8 additions & 8 deletions

@@ -35,9 +35,9 @@ function verify_dstat_output() {
 
   # Verify that that the jobs are found and are in the expected order.
   # dstat sort ordering is by create-time (descending), so job 0 here should be the last started.
-  local first_job_name="$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[0].job-name")"
-  local second_job_name="$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[1].job-name")"
-  local third_job_name="$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[2].job-name")"
+  local first_job_name="$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[0].job-name")"
+  local second_job_name="$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[1].job-name")"
+  local third_job_name="$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[2].job-name")"
 
   if [[ "${first_job_name}" != "${RUNNING_JOB_NAME_2}" ]]; then
     1>&2 echo "Job ${RUNNING_JOB_NAME_2} not found in the correct location in the dstat output! "

@@ -87,8 +87,8 @@ function verify_dstat_google_provider_fields() {
 
   for (( task=0; task < 3; task++ )); do
     # Run the provider test.
-    local job_name="$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].job-name")"
-    local job_provider="$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].provider")"
+    local job_name="$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].job-name")"
+    local job_provider="$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].provider")"
 
     # Validate provider.
     if [[ "${job_provider}" != "${DSUB_PROVIDER}" ]]; then

@@ -99,7 +99,7 @@ function verify_dstat_google_provider_fields() {
 
     # For google-cls-v2, validate that the correct "location" was used for the request.
     if [[ "${DSUB_PROVIDER}" == "google-cls-v2" ]]; then
-      local op_name="$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${DSTAT_OUTPUT}" "[0].internal-id")"
+      local op_name="$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${DSTAT_OUTPUT}" "[0].internal-id")"
 
       # The operation name format is projects/<project-number>/locations/<location>/operations/<operation-id>
       local op_location="$(echo -n "${op_name}" | awk -F '/' '{ print $4 }')"

@@ -131,15 +131,15 @@ function verify_dstat_google_provider_fields() {
     util::dstat_yaml_assert_boolean_field_equal "${dstat_out}" "[${task}].provider-attributes.preemptible" "false"
 
     # Check that instance name is not empty
-    local instance_name=$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].provider-attributes.instance-name")
+    local instance_name=$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].provider-attributes.instance-name")
     if [[ -z "${instance_name}" ]]; then
       1>&2 echo " - FAILURE: Instance ${instance_name} for job ${job_name}, task $((task+1)) is empty."
       1>&2 echo "${dstat_out}"
       exit 1
     fi
 
     # Check zone exists and is expected format
-    local job_zone=$(python "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].provider-attributes.zone")
+    local job_zone=$(python3 "${SCRIPT_DIR}"/get_data_value.py "yaml" "${dstat_out}" "[${task}].provider-attributes.zone")
     if ! [[ "${job_zone}" =~ ^[a-z]{1,4}-[a-z]{2,15}[0-9]-[a-z]$ ]]; then
       1>&2 echo " - FAILURE: Zone ${job_zone} for job ${job_name}, task $((task+1)) not valid."
       1>&2 echo "${dstat_out}"
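The test script above repeatedly shells out to `get_data_value.py` to pull fields like `[0].job-name` out of dstat's YAML output. A minimal stand-in for that lookup, to show what the helper is doing with those path strings (this sketch handles only integer indexes and hyphenated keys; the real helper also takes the input-format argument and parses the raw text):

```python
import re

def get_data_value(data, path):
    """Resolve a path such as '[0].job-name' against parsed dstat output
    (a list of task dictionaries). Each '[N]' token indexes a list and
    each bare token looks up a dictionary key."""
    for index, key in re.findall(r'\[(\d+)\]|([\w-]+)', path):
        data = data[int(index)] if index else data[key]
    return data
```

With dstat's sort order being create-time descending, `get_data_value(tasks, '[0].job-name')` returns the most recently created job, which is what the ordering assertions in the script rely on.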
