
Commit 0cbbdb5

Merge pull request #293 from DataBiosphere/dev
PR for 0.4.13 release
2 parents 0c3e313 + 37b4e0e commit 0cbbdb5

11 files changed

+152
-29
lines changed

CONTRIBUTING.md

Lines changed: 19 additions & 11 deletions
@@ -1,4 +1,4 @@
-# How to Contribute
+# Contributing

 We'd love to accept your patches and contributions to this project. There are
 just a few small guidelines you need to follow.
@@ -8,16 +8,24 @@ just a few small guidelines you need to follow.
 Contributions to this project must be accompanied by a Contributor License
 Agreement. You (or your employer) retain the copyright to your contribution,
 this simply gives us permission to use and redistribute your contributions as
-part of the project. Head over to https://cla.developers.google.com/ to see
-your current agreements on file or to sign a new one.
+part of the project. Review and sign our
+[Contributor License Agreement](https://docs.google.com/document/d/1Yc-z59DQKRqiqpVgHcIHC3xhfwm3DoPmpc9eFwr_J8c/edit)
+and send it to our IP team at [email protected].

-You generally only need to submit a CLA once, so if you've already submitted one
-(even if it was for a different project), you probably don't need to do it
-again.
+## How to Contribute

-## Code reviews
+There are many ways to contribute, including:

-All submissions, including submissions by project members, require review. We
-use GitHub pull requests for this purpose. Consult
-[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
-information on using pull requests.
+* **Reporting Bugs:** If you find a bug, please open an issue describing the problem in detail. Include steps to reproduce, expected behavior, and any relevant error messages.
+* **Suggesting Enhancements:** Have an idea for a new feature or improvement? Open an issue to discuss it with us.
+* **Improving Documentation:** Help make our documentation clearer and more helpful by suggesting changes or fixing errors.
+* **Submitting Pull Requests (PRs):** The best way to contribute code is by submitting a pull request. Before you start working on a major change, please open an issue to discuss the proposed change first.
+
+## Pull Request Guidelines
+
+1. **Fork the repository:** Create your own copy of the repository.
+2. **Create a branch:** Make your changes on a new branch.
+3. **Write clear commit messages:** Explain what each commit does.
+4. **Follow coding style:** Match the existing code style as closely as possible.
+5. **Add tests:** Include tests for your changes (if applicable).
+6. **Open a pull request:** Submit your changes for review.

docs/retries.md

Lines changed: 4 additions & 0 deletions
@@ -63,6 +63,10 @@ current attempt will still be treated as a preemptible attempt used.
 Transient failure rates should be much lower in practice than preemption rates
 and more complex retry logic is not clearly more desirable.

+When using the `google-batch` provider, using the `--preemptible` flag will
+cause your tasks to be run on [Spot VMs](https://cloud.google.com/spot-vms).
+Unlike standard GCE preemptible VMs, Spot VMs do not have a 24-hour time limit.
+
 ## Tracking task attempts

 When viewing tasks with `dstat --full` the attempt number will be available

dsub/_dsub_version.py

Lines changed: 1 addition & 1 deletion
@@ -26,4 +26,4 @@
 0.1.3.dev0 -> 0.1.3 -> 0.1.4.dev0 -> ...
 """

-DSUB_VERSION = '0.4.12'
+DSUB_VERSION = '0.4.13'

dsub/providers/batch_dummy.py

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ class AllocationPolicy(object):
   NetworkPolicy = None
   Accelerator = None
   LocationPolicy = None
+  ProvisioningModel = None

 class LogsPolicy(object):
   Destination = None

dsub/providers/google_batch.py

Lines changed: 30 additions & 4 deletions
@@ -23,15 +23,15 @@
 import textwrap
 from typing import Dict, List, Set

+from ..lib import dsub_util
+from ..lib import job_model
+from ..lib import param_util
+from ..lib import providers_util
 from . import base
 from . import google_base
 from . import google_batch_operations
 from . import google_custom_machine
 from . import google_utils
-from ..lib import job_model
-from ..lib import param_util
-from ..lib import providers_util
-

 # pylint: disable=g-import-not-at-top
 try:
@@ -298,6 +298,7 @@ def get_field(self, field: str, default: str = None):
     elif field == 'provider-attributes':
       # TODO: This needs to return instance (VM) metadata
       value = {}
+      value['preemptible'] = google_batch_operations.get_preemptible(self._op)
     elif field == 'events':
       # TODO: This needs to return a list of events
       value = []
@@ -394,16 +395,25 @@ class GoogleBatchJobProvider(google_utils.GoogleJobProviderBase):
   def __init__(
       self, dry_run: bool, project: str, location: str, credentials=None
   ):
+    storage_service = dsub_util.get_storage_service(credentials=credentials)
+
     self._dry_run = dry_run
     self._location = location
     self._project = project
+    self._storage_service = storage_service

   def _batch_handler_def(self):
     return GoogleBatchBatchHandler

   def _operations_cancel_api_def(self):
     return batch_v1.BatchServiceClient().delete_job

+  def _get_provisioning_model(self, task_resources):
+    if task_resources.preemptible:
+      return batch_v1.AllocationPolicy.ProvisioningModel.SPOT
+    else:
+      return batch_v1.AllocationPolicy.ProvisioningModel.STANDARD
+
   def _get_batch_job_regions(self, regions, zones) -> List[str]:
     """Returns the list of regions and zones to use for a Batch Job request.

@@ -743,6 +753,7 @@ def _create_batch_request(
             accelerator_type=job_resources.accelerator_type,
             accelerator_count=job_resources.accelerator_count,
         ),
+        provisioning_model=self._get_provisioning_model(task_resources),
     )

     ipt = google_batch_operations.build_instance_policy_or_template(
@@ -835,6 +846,17 @@ def submit_job(
     requests = []

     for task_view in job_model.task_view_generator(job_descriptor):
+
+      job_params = task_view.job_params
+      task_params = task_view.task_descriptors[0].task_params
+
+      outputs = job_params['outputs'] | task_params['outputs']
+      if skip_if_output_present:
+        # check whether the output's already there
+        if dsub_util.outputs_are_present(outputs, self._storage_service):
+          print('Skipping task because its outputs are present')
+          continue
+
       request = self._create_batch_request(task_view)
       if self._dry_run:
         requests.append(request)
@@ -849,6 +871,10 @@ def submit_job(
       # closely resembles yaml, but can't actually be serialized into yaml.
       # Ideally, we could serialize these request objects to yaml or json.
       print(requests)
+
+    if not requests and not launched_tasks:
+      return {'job-id': dsub_util.NO_JOB}
+
     return {
         'job-id': job_descriptor.job_metadata['job-id'],
         'user-id': job_descriptor.job_metadata.get('user-id'),
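
The skip logic added to `submit_job` can be read as three steps: union the job-level and task-level outputs, skip any task whose outputs already exist, and report a sentinel job ID when nothing is left to submit. The following is a minimal sketch under stated assumptions, not the provider's code: `plan_submission`, `submission_result`, and `outputs_exist` are hypothetical names (`outputs_exist` stands in for `dsub_util.outputs_are_present`, which needs a storage service), outputs are modeled as sets of path strings, and the `'NO_JOB'` string is a placeholder for `dsub_util.NO_JOB`.

```python
from typing import Callable, Dict, List, Set

NO_JOB = 'NO_JOB'  # Placeholder; the real sentinel is dsub_util.NO_JOB.


def plan_submission(
    job_outputs: Set[str],
    tasks: Dict[str, Set[str]],
    skip_if_output_present: bool,
    outputs_exist: Callable[[Set[str]], bool],
) -> List[str]:
  """Returns the IDs of tasks that still need to run."""
  to_run = []
  for task_id, task_outputs in tasks.items():
    # Job-level and task-level outputs are combined with set union.
    outputs = job_outputs | task_outputs
    if skip_if_output_present and outputs_exist(outputs):
      continue
    to_run.append(task_id)
  return to_run


def submission_result(job_id: str, to_run: List[str]) -> Dict[str, str]:
  """Reports the sentinel job ID when every task was skipped."""
  if not to_run:
    return {'job-id': NO_JOB}
  return {'job-id': job_id}


# Hypothetical paths; a task counts as done when all its outputs exist.
existing = {'gs://bucket/a.txt'}
tasks = {'task-1': {'gs://bucket/a.txt'}, 'task-2': {'gs://bucket/b.txt'}}
pending = plan_submission(set(), tasks, True, lambda outs: outs <= existing)
```

Here only `task-2` remains pending. If both tasks had their outputs in place, `submission_result` would return the sentinel instead of a real job ID, which matches the early return added at the end of `submit_job`.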

dsub/providers/google_batch_operations.py

Lines changed: 13 additions & 0 deletions
@@ -99,6 +99,16 @@ def get_status_events(op: batch_v1.types.Job):
   return op.status.status_events


+def get_preemptible(op: batch_v1.types.Job) -> bool:
+  pm = op.allocation_policy.instances[0].policy.provisioning_model
+  if pm == batch_v1.AllocationPolicy.ProvisioningModel.SPOT:
+    return True
+  elif pm == batch_v1.AllocationPolicy.ProvisioningModel.STANDARD:
+    return False
+  else:
+    raise ValueError(f'Invalid provisioning_model value: {pm}')
+
+
 def build_job(
     task_groups: List[batch_v1.types.TaskGroup],
     allocation_policy: batch_v1.types.AllocationPolicy,
@@ -317,6 +327,7 @@ def build_instance_policy(
     disks: List[batch_v1.types.AllocationPolicy.AttachedDisk],
     machine_type: str,
     accelerators: MutableSequence[batch_v1.types.AllocationPolicy.Accelerator],
+    provisioning_model: batch_v1.types.AllocationPolicy.ProvisioningModel,
 ) -> batch_v1.types.AllocationPolicy.InstancePolicy:
   """Build an instance policy for a Batch request.

@@ -325,6 +336,7 @@ def build_instance_policy(
     disks (List[AttachedDisk]): Non-boot disks to be attached for each VM.
     machine_type (str): The Compute Engine machine type.
     accelerators (List): The accelerators attached to each VM instance.
+    provisioning_model (enum): Either SPOT (preemptible) or STANDARD

   Returns:
     An object representing an instance policy.
@@ -334,6 +346,7 @@ def build_instance_policy(
   instance_policy.disks = [disks]
   instance_policy.machine_type = machine_type
   instance_policy.accelerators = accelerators
+  instance_policy.provisioning_model = provisioning_model

   return instance_policy

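
The round trip between the `--preemptible` flag and the Batch provisioning model, as this change wires it up, can be sketched in isolation. This is a standalone illustration rather than dsub code: the `ProvisioningModel` enum below is a local stand-in for `batch_v1.AllocationPolicy.ProvisioningModel` with assumed member values, and `to_provisioning_model` and `is_preemptible` are hypothetical names that mirror `_get_provisioning_model` and `get_preemptible`.

```python
import enum


class ProvisioningModel(enum.Enum):
  """Local stand-in for the Batch enum; names and values are assumptions."""
  UNSPECIFIED = 0
  STANDARD = 1
  SPOT = 2


def to_provisioning_model(preemptible: bool) -> ProvisioningModel:
  """Mirrors _get_provisioning_model: preemptible tasks map to SPOT."""
  if preemptible:
    return ProvisioningModel.SPOT
  return ProvisioningModel.STANDARD


def is_preemptible(pm: ProvisioningModel) -> bool:
  """Mirrors get_preemptible: only SPOT and STANDARD are accepted."""
  if pm == ProvisioningModel.SPOT:
    return True
  if pm == ProvisioningModel.STANDARD:
    return False
  raise ValueError(f'Invalid provisioning_model value: {pm}')
```

One consequence worth noting when reviewing: because any value other than SPOT or STANDARD raises, reading `provider-attributes` for a job whose provisioning model was left unset would likely fail with `ValueError` rather than report `False`.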

setup.py

Lines changed: 3 additions & 2 deletions
@@ -16,9 +16,9 @@
     # version mismatches.
     # This version list generated: 05/03/2024
     # direct dependencies
-    'google-api-python-client>=2.47.0,<=2.127.0',
+    'google-api-python-client>=2.47.0,<=2.131.0',
     'google-auth>=2.6.6,<=2.29.0',
-    'google-cloud-batch<=0.17.18',
+    'google-cloud-batch<=0.17.20',
     'python-dateutil<=2.9.0',
     'pytz<=2024.1',
     'pyyaml<=6.0.1',
@@ -29,6 +29,7 @@
     'google-api-core>=2.7.3,<=2.19.0',
     'google-auth-httplib2<=0.2.0',
     'httplib2<=0.22.0',
+    'protobuf>=3.19.0,<=5.26.0',
     'pyasn1<=0.6.0',
     'pyasn1-modules<=0.4.0',
     'rsa<=4.9',

test/integration/e2e_logging_content.sh

Lines changed: 3 additions & 10 deletions
@@ -61,6 +61,7 @@ EOF
 )"

 run_dsub \
+  --unique-job-id \
   --command '\
     echo -n '"'${STDOUT_MSG%.}'"' && \
     1>&2 echo -n '"'${STDERR_MSG%.}'"'
@@ -73,24 +74,16 @@ echo "Checking output..."
 # Check the results
 readonly STDOUT_RESULT_EXPECTED="$(echo -n "${STDOUT_MSG%.}")"

-# There is a bug with the Batch API where blank lines
-# do not get printed to log files.
-# Temporarily ignore blank lines for the Batch provider.
-diff_args=()
-if [[ "${DSUB_PROVIDER}" == "google-batch" ]]; then
-  diff_args+=("--ignore-blank-lines")
-fi
-
 readonly STDOUT_RESULT="$(gsutil cat "${STDOUT_LOG}")"
-if ! diff "${diff_args[@]}" <(echo "${STDOUT_RESULT_EXPECTED}") <(echo "${STDOUT_RESULT}"); then
+if ! diff <(echo "${STDOUT_RESULT_EXPECTED}") <(echo "${STDOUT_RESULT}"); then
   echo "STDOUT file does not match expected"
   exit 1
 fi

 readonly STDERR_RESULT_EXPECTED="$(echo -n "${STDERR_MSG%.}")"

 readonly STDERR_RESULT="$(gsutil cat "${STDERR_LOG}")"
-if ! diff "${diff_args[@]}" <(echo "${STDERR_RESULT_EXPECTED}") <(echo "${STDERR_RESULT}"); then
+if ! diff <(echo "${STDERR_RESULT_EXPECTED}") <(echo "${STDERR_RESULT}"); then
   echo "STDERR file does not match expected"
   exit 1
 fi

test/integration/e2e_verify_failure_log.sh

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ source "${SCRIPT_DIR}/test_setup_e2e.sh"

 # Run the job
 if JOB_ID=$(run_dsub \
+    --unique-job-id \
     --image gcr.io/no.such.image \
     --command 'echo "Test"' \
     --wait); then

test/integration/unit_flags.google-batch.sh

Lines changed: 66 additions & 0 deletions
@@ -375,6 +375,67 @@ function test_zones() {
 }
 readonly -f test_zones

+function test_preemptible_zero() {
+  local subtest="${FUNCNAME[0]}"
+
+  if call_dsub \
+    --command 'echo "${TEST_NAME}"' \
+    --preemptible 0; then
+
+    # Check that the output contains expected values
+    result=$(grep " provisioning_model:" "${TEST_STDERR}" | awk '{print $2}')
+    if [[ "${result}" != "STANDARD" ]]; then
+      1>&2 echo "provisioning_model was actually ${result}, expected STANDARD"
+      exit 1
+    fi
+    test_passed "${subtest}"
+  else
+    test_failed "${subtest}"
+  fi
+}
+readonly -f test_preemptible_zero
+
+function test_preemptible_off() {
+  local subtest="${FUNCNAME[0]}"
+
+  if call_dsub \
+    --command 'echo "${TEST_NAME}"' \
+    --regions us-central1; then
+
+    # Check that the output contains expected values
+    result=$(grep " provisioning_model:" "${TEST_STDERR}" | awk '{print $2}')
+    if [[ "${result}" != "STANDARD" ]]; then
+      1>&2 echo "provisioning_model was actually ${result}, expected STANDARD"
+      exit 1
+    fi
+    test_passed "${subtest}"
+  else
+    test_failed "${subtest}"
+  fi
+}
+readonly -f test_preemptible_off
+
+function test_preemptible_on() {
+  local subtest="${FUNCNAME[0]}"
+
+  if call_dsub \
+    --command 'echo "${TEST_NAME}"' \
+    --regions us-central1 \
+    --preemptible; then
+
+    # Check that the output contains expected values
+    result=$(grep " provisioning_model:" "${TEST_STDERR}" | awk '{print $2}')
+    if [[ "${result}" != "SPOT" ]]; then
+      1>&2 echo "provisioning_model was actually ${result}, expected SPOT"
+      exit 1
+    fi
+    test_passed "${subtest}"
+  else
+    test_failed "${subtest}"
+  fi
+}
+readonly -f test_preemptible_on
+
 # # Run the tests
 trap "exit_handler" EXIT

@@ -408,3 +469,8 @@ test_neither_region_nor_zone
 test_region_and_zone
 test_regions
 test_zones
+
+echo
+test_preemptible_zero
+test_preemptible_off
+test_preemptible_on
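
The three new tests share one check: pull the value that follows `provisioning_model:` out of the captured dry-run output and compare it with the expected model. For readers more at home in Python than in `grep | awk`, the same extraction might look like the sketch below. The function name `extract_provisioning_models` is hypothetical, and the sample text is invented for illustration; it is not real `dsub` output.

```python
from typing import List


def extract_provisioning_models(dry_run_output: str) -> List[str]:
  """Returns the second field of every line containing ' provisioning_model:'."""
  values = []
  for line in dry_run_output.splitlines():
    # Like grep " provisioning_model:": the key must be preceded by a space.
    if ' provisioning_model:' in line:
      fields = line.split()
      # Like awk '{print $2}': second whitespace-separated field, or empty.
      values.append(fields[1] if len(fields) > 1 else '')
  return values


# Invented sample resembling a printed request; not actual dsub output.
sample = """
instances {
  policy {
    machine_type: "n1-standard-1"
    provisioning_model: SPOT
  }
}
"""
models = extract_provisioning_models(sample)
```

With this sample, `models` holds a single entry, `SPOT`. As in the shell version, a request that printed the field more than once would yield several values, so a strict comparison against a single expected string would fail in that case.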
