Commit dba9b86

Merge pull request #296 from DataBiosphere/dev
PR for 0.5.0 release
2 parents 0cbbdb5 + 88e8b72 commit dba9b86

39 files changed: +182 -479 lines changed

README.md

Lines changed: 36 additions & 50 deletions
@@ -52,7 +52,7 @@ your shell.
5252

5353
#### Install the Google Cloud SDK
5454

55-
While not used directly by `dsub` for the `google-v2` or `google-cls-v2` providers, you are likely to want to install the command line tools found in the [Google
55+
While not used directly by `dsub` for the `google-batch` or `google-cls-v2` providers, you are likely to want to install the command line tools found in the [Google
5656
Cloud SDK](https://cloud.google.com/sdk/).
5757

5858
If you will be using the `local` provider for faster job development,
@@ -156,13 +156,13 @@ You'll get quicker turnaround times and won't incur cloud charges using it.
156156

157157
### Getting started on Google Cloud
158158

159-
`dsub` supports the use of two different APIs from Google Cloud for running
160-
tasks. Google Cloud is transitioning from `Genomics v2alpha1`
161-
to [Cloud Life Sciences v2beta](https://cloud.google.com/life-sciences/docs/reference/rest).
159+
`dsub` currently supports the [Cloud Life Sciences v2beta](https://cloud.google.com/life-sciences/docs/reference/rest)
160+
API from Google Cloud and is developing support for the [Batch](https://cloud.google.com/batch/docs/reference/rest)
161+
API from Google Cloud.
162162

163-
`dsub` supports both APIs with the (old) `google-v2` and (new) `google-cls-v2`
164-
providers respectively. `google-v2` is the current default provider. `dsub`
165-
will be transitioning to make `google-cls-v2` the default in coming releases.
163+
`dsub` supports the v2beta API with the `google-cls-v2` provider.
164+
`google-cls-v2` is the current default provider. `dsub` will be transitioning to
165+
make `google-batch` the default in coming releases.
166166

167167
The steps for getting started differ slightly as indicated in the steps below:
168168

@@ -171,13 +171,14 @@ The steps for getting started differ slightly as indicated in the steps below:
171171

172172
1. Enable the APIs:
173173

174-
- For the `v2alpha1` API (provider: `google-v2`):
174+
- For the `v2beta` API (provider: `google-cls-v2`):
175175

176-
[Enable the Genomics, Storage, and Compute APIs](https://console.cloud.google.com/flows/enableapi?apiid=genomics,storage_component,compute_component&redirect=https://console.cloud.google.com).
176+
[Enable the Cloud Life Sciences, Storage, and Compute APIs](https://console.cloud.google.com/flows/enableapi?apiid=lifesciences.googleapis.com,storage.googleapis.com,compute.googleapis.com&redirect=https://console.cloud.google.com)
177177

178-
- For the `v2beta` API (provider: `google-cls-v2`):
178+
- For the `batch` API (provider: `google-batch`):
179+
180+
[Enable the Batch, Storage, and Compute APIs](https://console.cloud.google.com/flows/enableapi?apiid=batch.googleapis.com,storage.googleapis.com,compute.googleapis.com&redirect=https://console.cloud.google.com).
179181

180-
[Enable the Cloud Life Sciences, Storage, and Compute APIs](https://console.cloud.google.com/flows/enableapi?apiid=lifesciences.googleapis.com,storage_component,compute_component&redirect=https://console.cloud.google.com)
181182

182183
1. Provide [credentials](https://developers.google.com/identity/protocols/application-default-credentials)
183184
so `dsub` can call Google APIs:
@@ -202,10 +203,10 @@ The steps for getting started differ slightly as indicated in the steps below:
202203

203204
1. Run a very simple "Hello World" `dsub` job and wait for completion.
204205

205-
- For the `v2alpha1` API (provider: `google-v2`):
206+
- For the `v2beta` API (provider: `google-cls-v2`):
206207

207208
dsub \
208-
--provider google-v2 \
209+
--provider google-cls-v2 \
209210
--project my-cloud-project \
210211
--regions us-central1 \
211212
--logging gs://my-bucket/logging/ \
@@ -216,10 +217,10 @@ The steps for getting started differ slightly as indicated in the steps below:
216217
Change `my-cloud-project` to your Google Cloud project, and `my-bucket` to
217218
the bucket you created above.
218219

219-
- For the `v2beta` API (provider: `google-cls-v2`):
220+
- For the `batch` API (provider: `google-batch`):
220221

221222
dsub \
222-
--provider google-cls-v2 \
223+
--provider google-batch \
223224
--project my-cloud-project \
224225
--regions us-central1 \
225226
--logging gs://my-bucket/logging/ \
@@ -246,14 +247,13 @@ To this end, `dsub` provides multiple "backend providers", each of which
246247
implements a consistent runtime environment. The current providers are:
247248

248249
- local
249-
- google-v2 (the default)
250-
- google-cls-v2
250+
- google-cls-v2 (the default)
251251
- google-batch (*new*)
252252

253253
More details on the runtime environment implemented by the backend providers
254254
can be found in [dsub backend providers](https://github.com/DataBiosphere/dsub/blob/main/docs/providers/README.md).
255255

256-
### Differences between `google-v2`, `google-cls-v2` and `google-batch`
256+
### Differences between `google-cls-v2` and `google-batch`
257257

258258
The `google-cls-v2` provider is built on the Cloud Life Sciences `v2beta` API.
259259
This API is very similar to its predecessor, the Genomics `v2alpha1` API.
@@ -265,29 +265,15 @@ Details of Cloud Life Sciences versus Batch can be found in this
265265
[Migration Guide](https://cloud.google.com/batch/docs/migrate-to-batch-from-cloud-life-sciences).
266266

267267
`dsub` largely hides the differences between the APIs, but there are a
268-
few difference to note:
269-
270-
- `v2beta` and Cloud Batch are regional services, `v2alpha1` is a global service
271-
272-
What this means is that with `v2alpha1`, the metadata about your tasks
273-
(called "operations"), is stored in a global database, while with `v2beta` and
274-
Cloud Batch, the metadata about your tasks are stored in a regional database. If
275-
your operation/job information needs to stay in a particular region, use the
276-
`v2beta` or Batch API (the `google-cls-v2` or `google-batch` provider), and
277-
specify the `--location` where your operation/job information should be stored.
268+
few differences to note:
278269

279-
- The `--regions` and `--zones` flags can be omitted when using `google-cls-v2` and `google-batch`
270+
- `google-batch` requires jobs to run in one region
280271

281272
The `--regions` and `--zones` flags for `dsub` specify where the tasks should
282-
run. More specifically, this specifies what Compute Engine Zones to use for
283-
the VMs that run your tasks.
284-
285-
With the `google-v2` provider, there is no default region or zone, and thus
286-
one of the `--regions` or `--zones` flags is required.
287-
288-
With `google-cls-v2` and `google-batch`, the `--location` flag defaults to
289-
`us-central1`, and if the `--regions` and `--zones` flags are omitted, the
290-
`location` will be used as the default `regions` list.
273+
run. The `google-cls-v2` provider allows you to specify a multi-region like `US`,
274+
multiple regions, or multiple zones across regions. With the `google-batch`
275+
provider, you must specify either one region or multiple zones within a single
276+
region.
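
As a rough illustration of the `google-batch` constraint described above, the following Python sketch checks whether a list of zones falls within a single region. The helper names `region_of` and `zones_share_one_region` are hypothetical and are not part of `dsub`. The sketch assumes only the usual Compute Engine naming convention, in which a zone name is its region name plus a letter suffix (for example, `us-central1-a` lies in `us-central1`).

```python
from typing import Sequence


def region_of(zone: str) -> str:
    """Return the region part of a zone name.

    Assumes the usual Compute Engine convention: '<region>-<letter>',
    e.g. 'us-central1-a' belongs to region 'us-central1'.
    """
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def zones_share_one_region(zones: Sequence[str]) -> bool:
    """True when every zone in the (non-empty) list is in the same region."""
    return len({region_of(z) for z in zones}) == 1


if __name__ == "__main__":
    # Hypothetical inputs: the first list would suit google-batch,
    # the second spans two regions and would not.
    print(zones_share_one_region(["us-central1-a", "us-central1-b"]))
    print(zones_share_one_region(["us-central1-a", "us-east1-b"]))
```

A check like this could run before submitting a job with `--zones`, so that a list spanning several regions is caught locally rather than at submission time.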
291277

292278
## `dsub` features
293279

@@ -463,15 +449,15 @@ mounting read-only:
463449
[Compute Engine Image](https://cloud.google.com/compute/docs/images) that you
464450
pre-create.
465451

466-
The `google-v2` and `google-cls-v2` providers support these methods of
452+
The `google-cls-v2` and `google-batch` providers support these methods of
467453
providing access to resource data.
468454

469455
The `local` provider supports mounting a
470456
local directory in a similar fashion to support your local development.
471457

472458
##### Mounting a Google Cloud Storage bucket
473459

474-
To have the `google-v2`, `google-cls-v2`, or `google-batch` provider mount a
460+
To have the `google-cls-v2` or `google-batch` provider mount a
475461
Cloud Storage bucket using
476462
[Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse), use the
477463
`--mount` command line flag:
@@ -488,15 +474,15 @@ before using Cloud Storage FUSE.
488474

489475
##### Mounting an existing persistent disk
490476

491-
To have the `google-v2` or `google-cls-v2` provider mount a persistent disk that
477+
To have the `google-cls-v2` or `google-batch` provider mount a persistent disk that
492478
you have pre-created and populated, use the `--mount` command line flag and the
493479
url of the source disk:
494480

495481
--mount RESOURCES="https://www.googleapis.com/compute/v1/projects/your-project/zones/your_disk_zone/disks/your-disk"
496482

497483
##### Mounting a persistent disk, created from an image
498484

499-
To have the `google-v2` or `google-cls-v2` provider mount a persistent disk created from an image,
485+
To have the `google-cls-v2` or `google-batch` provider mount a persistent disk created from an image,
500486
use the `--mount` command line flag and the url of the source image and the size
501487
(in GB) of the disk:
502488

@@ -527,7 +513,7 @@ path using the environment variable.
527513
`dsub` tasks run using the `local` provider will use the resources available on
528514
your local machine.
529515

530-
`dsub` tasks run using the `google`, `google-v2`, or `google-cls-v2` providers can take advantage
516+
`dsub` tasks run using the `google-cls-v2` or `google-batch` providers can take advantage
531517
of a wide range of CPU, RAM, disk, and hardware accelerator (e.g. GPU) options.
532518

533519
See the [Compute Resources](https://github.com/DataBiosphere/dsub/blob/main/docs/compute_resources.md)
@@ -634,14 +620,14 @@ For more details, see [Checking Status and Troubleshooting Jobs](https://github.
634620

635621
The `dstat` command displays the status of jobs:
636622

637-
dstat --provider google-v2 --project my-cloud-project
623+
dstat --provider google-cls-v2 --project my-cloud-project
638624

639625
With no additional arguments, dstat will display a list of *running* jobs for
640626
the current `USER`.
641627

642628
To display the status of a specific job, use the `--jobs` flag:
643629

644-
dstat --provider google-v2 --project my-cloud-project --jobs job-id
630+
dstat --provider google-cls-v2 --project my-cloud-project --jobs job-id
645631

646632
For a batch job, the output will list all *running* tasks.
647633

@@ -673,7 +659,7 @@ By default, dstat outputs one line per task. If you're using a batch job with
673659
many tasks then you may benefit from `--summary`.
674660

675661
```
676-
$ dstat --provider google-v2 --project my-project --status '*' --summary
662+
$ dstat --provider google-cls-v2 --project my-project --status '*' --summary
677663
678664
Job Name Status Task Count
679665
------------- ------------- -------------
@@ -694,25 +680,25 @@ Use the `--users` flag to specify other users, or `'*'` for all users.
694680

695681
To delete a running job:
696682

697-
ddel --provider google-v2 --project my-cloud-project --jobs job-id
683+
ddel --provider google-cls-v2 --project my-cloud-project --jobs job-id
698684

699685
If the job is a batch job, all running tasks will be deleted.
700686

701687
To delete specific tasks:
702688

703689
ddel \
704-
--provider google-v2 \
690+
--provider google-cls-v2 \
705691
--project my-cloud-project \
706692
--jobs job-id \
707693
--tasks task-id1 task-id2
708694

709695
To delete all running jobs for the current user:
710696

711-
ddel --provider google-v2 --project my-cloud-project --jobs '*'
697+
ddel --provider google-cls-v2 --project my-cloud-project --jobs '*'
712698

713699
## Service Accounts and Scope (Google providers only)
714700

715-
When you run the `dsub` command with the `google-v2` or `google-cls-v2`
701+
When you run the `dsub` command with the `google-cls-v2` or `google-batch`
716702
provider, there are two different sets of credentials to consider:
717703

718704
- Account submitting the `pipelines.run()` request to run your command/script on a VM

docs/job_control.md

Lines changed: 6 additions & 6 deletions
@@ -61,22 +61,22 @@ dsub ... --after "${JOB_A}" "${JOB_B}"
6161
Here is the output of a sample run:
6262

6363
```
64-
$ JOBID_A=$(dsub --provider google-v2 --project "${MYPROJECT}" --regions us-central1 \
64+
$ JOBID_A=$(dsub --provider google-cls-v2 --project "${MYPROJECT}" --regions us-central1 \
6565
--logging "gs://${MYBUCKET}/logging/" \
6666
--command 'echo "hello from job A"')
6767
Job: echo--<user>--180924-112256-64
6868
Launched job-id: echo--<user>--180924-112256-64
6969
To check the status, run:
70-
dstat --provider google-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112256-64' --status '*'
70+
dstat --provider google-cls-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112256-64' --status '*'
7171
To cancel the job, run:
72-
ddel --provider google-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112256-64'
72+
ddel --provider google-cls-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112256-64'
7373
7474
$ echo "${JOBID_A}"
7575
echo--<user>--180924-112256-64
7676
7777
$ JOBID_B=... (similar)
7878
79-
$ JOBID_C=$(dsub --provider google-v2 --project "${MYPROJECT}" --regions us-central1 \
79+
$ JOBID_C=$(dsub --provider google-cls-v2 --project "${MYPROJECT}" --regions us-central1 \
8080
--logging "gs://${MYBUCKET}/logging/" \
8181
--command 'echo "job C"' --after "${JOBID_A}" "${JOBID_B}")
8282
Waiting for predecessor jobs to complete...
@@ -86,9 +86,9 @@ Waiting for: echo--<user>--180924-112259-48.
8686
echo--<user>--180924-112259-48: SUCCESS
8787
Launched job-id: echo--<user>--180924-112302-87
8888
To check the status, run:
89-
dstat --provider google-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112302-87' --status '*'
89+
dstat --provider google-cls-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112302-87' --status '*'
9090
To cancel the job, run:
91-
ddel --provider google-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112302-87'
91+
ddel --provider google-cls-v2 --project ${MYPROJECT} --jobs 'echo--<user>--180924-112302-87'
9292
echo--<user>--180924-112302-87
9393
```
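
The sample run above chains jobs with `--after`. The same pattern can be sketched in Python by assembling the `dsub` argument list programmatically. The helper `build_dsub_args` is hypothetical and is not part of `dsub`. It uses only flags that appear in the sample run, and the project, bucket, and job IDs shown are placeholders.

```python
import shlex
from typing import List, Sequence


def build_dsub_args(
    project: str,
    bucket: str,
    command: str,
    after: Sequence[str] = (),
) -> List[str]:
    """Assemble a dsub command line mirroring the sample run above."""
    args = [
        "dsub",
        "--provider", "google-cls-v2",
        "--project", project,
        "--regions", "us-central1",
        "--logging", f"gs://{bucket}/logging/",
        "--command", command,
    ]
    if after:
        # --after takes one or more predecessor job IDs.
        args += ["--after", *after]
    return args


if __name__ == "__main__":
    # Placeholder job IDs; in practice these come from earlier dsub runs.
    args = build_dsub_args(
        "my-cloud-project", "my-bucket", 'echo "job C"',
        after=["job-id-a", "job-id-b"],
    )
    # Print a shell-quoted form for inspection; nothing is executed here.
    print(shlex.join(args))
```

The list could then be handed to `subprocess.run` with its standard output captured to obtain the launched job ID, in the same way the shell example captures `$(dsub ...)`. That step is left out here because it requires `dsub` and cloud credentials.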
9494

docs/providers/README.md

Lines changed: 12 additions & 12 deletions
@@ -10,8 +10,8 @@ implements a consistent runtime environment. The current supported providers
1010
are:
1111

1212
- local
13-
- google-v2 (the default)
14-
- google-cls-v2 (*new*)
13+
- google-cls-v2 (the default)
14+
- google-batch (*new*)
1515

1616
## Runtime environment
1717

@@ -194,13 +194,13 @@ During execution, `runner.sh` writes the following files to record task state:
194194
The `local` provider does not support resource-related flags such as
195195
`--min-cpu`, `--min-ram`, `--boot-disk-size`, or `--disk-size`.
196196

197-
### `google-v2` and `google-cls-v2` providers
197+
### `google-cls-v2` and `google-batch` providers
198198

199-
The `google-v2` and `google-cls-v2` providers share a significant amount of
200-
their implementation. The `google-v2` provider utilizes the Google Genomics
201-
Pipelines API `v2alpha1`
202-
while the `google-cls-v2` provider utilizes the Google Cloud Life Sciences
199+
The `google-cls-v2` and `google-batch` providers share a significant amount of
200+
their implementation. The `google-cls-v2` provider utilizes the Google Cloud Life Sciences
203201
Pipelines API [v2beta](https://cloud.google.com/life-sciences/docs/apis)
202+
while the `google-batch` provider utilizes the Google Cloud
203+
[Batch API](https://cloud.google.com/batch/docs/reference/rest)
204204
to queue a request for the following sequence of events:
205205

206206
1. Create a Google Compute Engine
@@ -282,7 +282,7 @@ its status is `RUNNING`.
282282

283283
#### Logging
284284

285-
The `google-v2` provider saves 3 log files to Cloud Storage, every 5 minutes
285+
The `google-cls-v2` and `google-batch` providers save 3 log files to Cloud Storage every 5 minutes
286286
to the `--logging` location specified to `dsub`:
287287

288288
- `[prefix].log`: log generated by all containers running on the VM
@@ -293,7 +293,7 @@ Logging paths and the `[prefix]` are discussed further in [Logging](../logging.m
293293

294294
#### Resource requirements
295295

296-
The `google-v2` and `google-cls-v2` providers support many resource-related
296+
The `google-cls-v2` and `google-batch` providers support many resource-related
297297
flags to configure the Compute Engine VMs that tasks run on, such as
298298
`--machine-type` or `--min-cores` and `--min-ram`, as well as `--boot-disk-size`
299299
and `--disk-size`. Additional provider-specific parameters are available
@@ -311,12 +311,12 @@ large Docker images are used, as such images need to be pulled to the boot disk.
311311

312312
#### Provider specific parameters
313313

314-
The following `dsub` parameters are specific to the `google-v2` and
315-
`google-cls-v2` providers:
314+
The following `dsub` parameters are specific to the `google-cls-v2` and
315+
`google-batch` providers:
316316

317317
* [Location resources](https://cloud.google.com/about/locations)
318318

319-
- `--location` (`google-cls-v2` only):
319+
- `--location`:
320320
- Specifies the Google Cloud region to which the pipeline request will be
321321
sent and where operation metadata will be stored. The associated dsub task
322322
may be executed in another region if the `--regions` or `--zones`
