
Commit 691c84c

Co-execution client Google Cloud Platform Batch v1.
1 parent 84194b7 commit 691c84c

18 files changed (+1192, -100 lines)

dev-requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ types-pycurl
 types-requests
 types-psutil
 sentry-sdk
+types-google-cloud-ndb
 
 # For release
 build

docs/containers.rst

Lines changed: 158 additions & 0 deletions
@@ -132,6 +132,164 @@ GA4GH TES
 
     GA4GH TES job execution with Conda dependencies for the tool and no message queue.
 
+A Galaxy job configuration (``job_conf.yml``) for using TES with Pulsar and RabbitMQ might look like:
+
+::
+
+    runners:
+      local:
+        load: galaxy.jobs.runners.local:LocalJobRunner
+      pulsar_tes:
+        load: galaxy.jobs.runners.pulsar:PulsarTesJobRunner
+        # RabbitMQ URL from Galaxy server.
+        amqp_url: <amqp_url>
+        # If Pulsar needs to talk to Galaxy at a particular host and port, set that here.
+        #galaxy_url: <galaxy_url>
+
+    execution:
+      default: pulsar_tes
+      environments:
+        local:
+          runner: local
+          local_slots: 1
+        pulsar_tes:
+          runner: pulsar_tes
+          # TES URL to use.
+          tes_url: <tes_url>
+          pulsar_app_config:
+            # This must be the RabbitMQ server, addressed by the host and port
+            # that your TES nodes use to connect to it.
+            message_queue_url: <amqp_url>
+
+    tools:
+    - class: local
+      environment: local
+
+For testing on a MacBook with RabbitMQ installed via Homebrew, Docker Desktop available,
+and a Funnel server with the default configuration running locally, a configuration might look like:
+
+::
+
+    runners:
+      local:
+        load: galaxy.jobs.runners.local:LocalJobRunner
+      pulsar_tes:
+        load: galaxy.jobs.runners.pulsar:PulsarTesJobRunner
+        # RabbitMQ URL from Galaxy server.
+        amqp_url: amqp://guest:guest@localhost:5672//
+        # Tell Pulsar nodes that Galaxy should be accessed on the Docker
+        # host (the MacBook).
+        galaxy_url: http://host.docker.internal:8080/
+
+    execution:
+      default: pulsar_tes
+      environments:
+        local:
+          runner: local
+          local_slots: 1
+        pulsar_tes:
+          runner: pulsar_tes
+          # Funnel listens on port 8000 by default.
+          tes_url: http://localhost:8000
+          pulsar_app_config:
+            message_queue_url: amqp://guest:guest@host.docker.internal:5672//
+
+    tools:
+    - class: local
+      environment: local
+
+
+Google Cloud Platform Batch
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. figure:: pulsar_gcp_coexecution_deployment.plantuml.svg
+
+    GCP Batch job execution with a biocontainer for the tool and no message queue.
+
+.. figure:: pulsar_gcp_deployment.plantuml.svg
+
+    GCP Batch job execution with Conda dependencies for the tool and no message queue.
+
+Pulsar job destination options to configure these scenarios:
+
+.. figure:: job_destination_parameters_gcp.png
+
+A Galaxy job configuration (``job_conf.yml``) for using GCP with Pulsar and RabbitMQ might look like:
+
+::
+
+    runners:
+      local:
+        load: galaxy.jobs.runners.local:LocalJobRunner
+      pulsar_gcp:
+        load: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner
+        # RabbitMQ URL from Galaxy server.
+        amqp_url: <amqp_url>
+        # If Pulsar needs to talk to Galaxy at a particular host and port, set that here.
+        #galaxy_url: <galaxy_url>
+
+    execution:
+      default: pulsar_gcp
+      environments:
+        local:
+          runner: local
+          local_slots: 1
+        pulsar_gcp:
+          runner: pulsar_gcp
+          # GCP Project ID to use (required)
+          project_id: project-id-here
+          # GCP region or zone to use (optional)
+          #region: us-central1
+          # Max walltime to use in seconds (defaults to 60 * 60 * 24)
+          #walltime_limit: 216000
+          # Path to a GCP credentials file.
+          #credentials_file: ~/.config/gcloud/application_default_credentials.json
+          pulsar_app_config:
+            # RabbitMQ URL the execute nodes should use to connect to the AMQP server.
+            message_queue_url: <amqp_url>
+
+    tools:
+    - class: local
+      environment: local
+
+For testing these configurations, John set up a production-ish RabbitMQ server on
+173.255.213.165 with user ``john`` and password ``password`` that is accessible from
+anywhere. John also opened the router ports to expose their MacBook and set Galaxy
+to bind to ``0.0.0.0`` using the ``bind`` option in the ``gunicorn`` section of ``galaxy.yml``.
+
+The job configuration for this test setup looked something like:
+
+::
+
+    runners:
+      local:
+        load: galaxy.jobs.runners.local:LocalJobRunner
+      pulsar_gcp:
+        load: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner
+        amqp_url: "amqp://john:password@173.255.213.165/"
+        # If Pulsar needs to talk to Galaxy at a particular host and port, set that here.
+        galaxy_url: http://71.162.7.202:8080/
+
+    execution:
+      default: pulsar_gcp
+      environments:
+        local:
+          runner: local
+          local_slots: 1
+        pulsar_gcp:
+          runner: pulsar_gcp
+          project_id: tonal-bloom-123435
+          region: us-central1
+          walltime_limit: 216000
+          pulsar_app_config:
+            # RabbitMQ URL the execute nodes should use to connect to the AMQP server.
+            message_queue_url: "amqp://john:password@173.255.213.165/"
+
+    tools:
+    - class: local
+      environment: local
+
+
 AWS Batch
 ~~~~~~~~~~
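The job configurations added above fail in confusing ways when a name or URL is off, so here is a minimal pre-flight check sketch. It is hypothetical and not part of Galaxy or Pulsar: it assumes the YAML has already been loaded into plain Python dicts, and the function names check_job_conf and is_loopback are invented for illustration. It checks that execution.default names a defined environment, that each environment's runner names a defined runner, that tool rules point at defined environments, and that pulsar_app_config.message_queue_url does not use a loopback host, since the TES or GCP Batch nodes have to reach RabbitMQ over the network.

```python
import ipaddress
from urllib.parse import urlsplit

# A deliberately misconfigured variant of the Funnel example: message_queue_url
# points at localhost, which the TES containers cannot use to reach RabbitMQ.
JOB_CONF = {
    "runners": {
        "local": {"load": "galaxy.jobs.runners.local:LocalJobRunner"},
        "pulsar_tes": {
            "load": "galaxy.jobs.runners.pulsar:PulsarTesJobRunner",
            # Galaxy-side URL: localhost is fine here, so it is not checked.
            "amqp_url": "amqp://guest:guest@localhost:5672//",
        },
    },
    "execution": {
        "default": "pulsar_tes",
        "environments": {
            "local": {"runner": "local", "local_slots": 1},
            "pulsar_tes": {
                "runner": "pulsar_tes",
                "tes_url": "http://localhost:8000",
                "pulsar_app_config": {
                    "message_queue_url": "amqp://guest:guest@localhost:5672//",
                },
            },
        },
    },
    "tools": [{"class": "local", "environment": "local"}],
}


def is_loopback(host):
    """Heuristic: True for 'localhost' or any loopback IP literal."""
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # a regular hostname such as host.docker.internal
        return False


def check_job_conf(conf):
    """Return a list of human-readable problems found in a job_conf dict."""
    problems = []
    runners = conf.get("runners", {})
    execution = conf.get("execution", {})
    environments = execution.get("environments", {})

    default = execution.get("default")
    if default not in environments:
        problems.append(f"execution.default {default!r} is not a defined environment")

    for env_id, env in environments.items():
        runner = env.get("runner")
        if runner not in runners:
            problems.append(f"environment {env_id!r} uses undefined runner {runner!r}")
        # Only the URL handed to remote Pulsar nodes must be reachable remotely.
        mq_url = env.get("pulsar_app_config", {}).get("message_queue_url")
        if mq_url and is_loopback(urlsplit(mq_url).hostname):
            problems.append(
                f"environment {env_id!r}: message_queue_url uses a loopback host, "
                "which remote execution nodes cannot reach"
            )

    for tool in conf.get("tools", []):
        if tool.get("environment") not in environments:
            problems.append(f"tool rule {tool!r} points at an undefined environment")
    return problems


for problem in check_job_conf(JOB_CONF):
    print(problem)
```

Run against the dict above, it reports exactly one problem (the loopback message_queue_url); switching that URL to the host.docker.internal form used in the Funnel example clears it.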

docs/galaxy_conf.rst

Lines changed: 4 additions & 3 deletions
@@ -90,10 +90,11 @@ making use of the HTTP transport method:
 .. literalinclude:: files/job_conf_sample_mq_rsync.xml
    :language: xml
 
-Targeting Apache Mesos (Prototype)
-``````````````````````````````````
+Targeting GCP Batch, Kubernetes, or TES
+```````````````````````````````````````
 
-See `commit message <https://github.com/galaxyproject/pulsar/commit/5888810b47da5065f532534b9594704bdd241d03>`_ for initial work on this and `this post on galaxy-dev <http://dev.list.galaxyproject.org/Using-Mesos-to-Enable-distributed-computing-under-Galaxy-tp4662310p4664829.html>`_.
+Check out :ref:`containers` for information on using Pulsar with these
+container-native execution environments.
 
 Generating Galaxy Metadata in Pulsar Jobs
 `````````````````````````````````````````

docs/gen_erd_diagrams.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+import os
+import sys
+
+import erdantic as erd
+
+sys.path.insert(1, os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))
+
+from pulsar.client.container_job_config import GcpJobParams
+
+DOC_SOURCE_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__)))
+class_to_diagram = {
+    GcpJobParams: "job_destination_parameters_gcp",
+}
+
+for clazz, diagram_name in class_to_diagram.items():
+    erd.draw(clazz, out=f"{DOC_SOURCE_DIR}/{diagram_name}.png")
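The script renders Pulsar's GcpJobParams model into job_destination_parameters_gcp.png, but the model's definition is not shown on this page. As a rough illustration of what that diagram documents, here is a hypothetical dataclass built only from the GCP destination options described in containers.rst: project_id is required, while region, walltime_limit and credentials_file are optional, with walltime_limit defaulting to 60 * 60 * 24 = 86400 seconds. It is not Pulsar's GcpJobParams, and the class name GcpDestinationParams and the from_destination helper are invented for this sketch.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Default documented in docs/containers.rst: 60 * 60 * 24 seconds (one day).
DEFAULT_WALLTIME_LIMIT = 60 * 60 * 24


@dataclass
class GcpDestinationParams:
    """Hypothetical stand-in for the GCP destination options (not GcpJobParams)."""

    project_id: str  # required
    region: Optional[str] = None  # optional GCP region or zone
    walltime_limit: int = DEFAULT_WALLTIME_LIMIT  # seconds
    credentials_file: Optional[str] = None  # optional path to a credentials file

    @classmethod
    def from_destination(cls, params: dict) -> "GcpDestinationParams":
        """Build from a job_conf environment dict, ignoring unrelated keys."""
        if "project_id" not in params:
            raise ValueError("project_id is required for GCP Batch destinations")
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in params.items() if k in known})


# Environment dict shaped like the pulsar_gcp example in the docs.
env = {
    "runner": "pulsar_gcp",
    "project_id": "project-id-here",
    "walltime_limit": 216000,
    "pulsar_app_config": {"message_queue_url": "<amqp_url>"},
}
params = GcpDestinationParams.from_destination(env)
print(params.walltime_limit / 3600)  # walltime in hours
```

With the example's walltime_limit of 216000 this prints 60.0, so the test setup allowed 60 hours per job, well above the 24-hour default.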

docs/job_managers.rst

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ the Galaxy mailing list.
 More Options
 -------------------------------
 
-Any manager can override the ``staging_directory`` used by setting this
+Most managers can override the ``staging_directory`` used by setting this
 property in its configuration section.
 
 The ``min_polling_interval: 0.5`` option can be set on any manager to control
