1 change: 1 addition & 0 deletions dev-requirements.txt
@@ -31,6 +31,7 @@ types-pycurl
types-requests
types-psutil
sentry-sdk
types-google-cloud-ndb

# For release
build
158 changes: 158 additions & 0 deletions docs/containers.rst
@@ -132,6 +132,164 @@ GA4GH TES

   GA4GH TES job execution with Conda dependencies for the tool and no message queue.

A Galaxy job configuration (``job_conf.yml``) for using TES with Pulsar and RabbitMQ might look like:

::

    runners:
      local:
        load: galaxy.jobs.runners.local:LocalJobRunner
      pulsar_tes:
        load: galaxy.jobs.runners.pulsar:PulsarTesJobRunner
        # RabbitMQ URL from Galaxy server.
        amqp_url: <amqp_url>
        # If Pulsar needs to talk to Galaxy at a particular host and port, set that here.
        #galaxy_url: <galaxy_url>

    execution:
      default: pulsar_tes
      environments:
        local:
          runner: local
          local_slots: 1
        pulsar_tes:
          runner: pulsar_tes
          # TES URL to use.
          tes_url: <tes_url>
          pulsar_app_config:
            # This is the same RabbitMQ server, but specified as the host and
            # port that your TES nodes will use to connect to it.
            message_queue_url: <amqp_url>

    tools:
    - class: local
      environment: local

For testing on a MacBook with RabbitMQ installed via Homebrew, Docker Desktop available,
and a Funnel server running locally with its default configuration, a configuration might look like:

::

    runners:
      local:
        load: galaxy.jobs.runners.local:LocalJobRunner
      pulsar_tes:
        load: galaxy.jobs.runners.pulsar:PulsarTesJobRunner
        # RabbitMQ URL from Galaxy server.
        amqp_url: amqp://guest:guest@localhost:5672//
        # Communicate to Pulsar nodes that Galaxy should be accessed on the Docker
        # host - the MacBook.
        galaxy_url: http://host.docker.internal:8080/

    execution:
      default: pulsar_tes
      environments:
        local:
          runner: local
          local_slots: 1
        pulsar_tes:
          runner: pulsar_tes
          # Funnel will run on 8000 by default.
          tes_url: http://localhost:8000
          pulsar_app_config:
            message_queue_url: amqp://guest:guest@host.docker.internal:5672//

    tools:
    - class: local
      environment: local


Google Cloud Platform Batch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: pulsar_gcp_coexecution_deployment.plantuml.svg

   GCP Batch job execution with a biocontainer for the tool and no message queue.

.. figure:: pulsar_gcp_deployment.plantuml.svg

   GCP Batch job execution with Conda dependencies for the tool and no message queue.

Pulsar job destination options to configure these scenarios:

.. figure:: job_destination_parameters_gcp.png

A Galaxy job configuration (``job_conf.yml``) for using GCP with Pulsar and RabbitMQ might look like:

::

    runners:
      local:
        load: galaxy.jobs.runners.local:LocalJobRunner
      pulsar_gcp:
        load: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner
        # RabbitMQ URL from Galaxy server.
        amqp_url: <amqp_url>
        # If Pulsar needs to talk to Galaxy at a particular host and port, set that here.
        #galaxy_url: <galaxy_url>

    execution:
      default: pulsar_gcp
      environments:
        local:
          runner: local
          local_slots: 1
        pulsar_gcp:
          runner: pulsar_gcp
          # GCP Project ID to use (required)
          project_id: project-id-here
          # GCP region or zone to use (optional)
          #region: us-central1
          # Max walltime to use in seconds (defaults to 60 * 60 * 24)
          #walltime_limit: 216000
          # GCP Credentials setup.
          #credentials_file: ~/.config/gcloud/application_default_credentials.json
          pulsar_app_config:
            # RabbitMQ URL the execute nodes should use to connect to the AMQP server.
            message_queue_url: <amqp_url>

    tools:
    - class: local
      environment: local

For testing these configurations, John set up a production-ish RabbitMQ server on
173.255.213.165 with user ``john`` and password ``password`` that is accessible from
anywhere. John also opened router ports to expose their MacBook and set Galaxy
to bind to ``0.0.0.0`` using the ``bind`` option in the ``gunicorn`` section of ``galaxy.yml``.
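
In ``galaxy.yml`` that last step looks roughly like the following (a minimal sketch; in
recent Galaxy releases the ``gunicorn`` settings live under the ``gravity`` section):

::

    gravity:
      gunicorn:
        # Listen on all interfaces so the remote Pulsar/GCP Batch nodes can reach Galaxy.
        bind: 0.0.0.0:8080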

The job configuration for this test setup looked something like:

::

    runners:
      local:
        load: galaxy.jobs.runners.local:LocalJobRunner
      pulsar_gcp:
        load: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner
        amqp_url: "amqp://john:password@173.255.213.165/"
        # If Pulsar needs to talk to Galaxy at a particular host and port, set that here.
        galaxy_url: http://71.162.7.202:8080/

    execution:
      default: pulsar_gcp
      environments:
        local:
          runner: local
          local_slots: 1
        pulsar_gcp:
          runner: pulsar_gcp
          project_id: tonal-bloom-123435
          region: us-central1
          walltime_limit: 216000
          pulsar_app_config:
            # RabbitMQ URL the execute nodes should use to connect to the AMQP server.
            message_queue_url: "amqp://john:password@173.255.213.165/"

    tools:
    - class: local
      environment: local


AWS Batch
~~~~~~~~~~

7 changes: 4 additions & 3 deletions docs/galaxy_conf.rst
@@ -90,10 +90,11 @@ making use of the HTTP transport method:
.. literalinclude:: files/job_conf_sample_mq_rsync.xml
   :language: xml

Targeting Apache Mesos (Prototype)
``````````````````````````````````
Targeting GCP Batch, Kubernetes, or TES
```````````````````````````````````````

See `commit message <https://github.com/galaxyproject/pulsar/commit/5888810b47da5065f532534b9594704bdd241d03>`_ for initial work on this and `this post on galaxy-dev <http://dev.list.galaxyproject.org/Using-Mesos-to-Enable-distributed-computing-under-Galaxy-tp4662310p4664829.html>`_.
Check out :ref:`containers` for information on using Pulsar with these
container-native execution environments.
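
As a quick orientation, the corresponding Galaxy runner plugins are loaded like any other
Pulsar runner (a minimal sketch; the TES and GCP Batch class names come from the examples
in :ref:`containers`, while ``PulsarKubernetesJobRunner`` is the assumed name of the
Kubernetes co-execution runner - confirm the names against your Galaxy release):

::

    runners:
      pulsar_k8s:
        load: galaxy.jobs.runners.pulsar:PulsarKubernetesJobRunner
      pulsar_tes:
        load: galaxy.jobs.runners.pulsar:PulsarTesJobRunner
      pulsar_gcp:
        load: galaxy.jobs.runners.pulsar:PulsarGcpBatchJobRunner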

Generating Galaxy Metadata in Pulsar Jobs
`````````````````````````````````````````
16 changes: 16 additions & 0 deletions docs/gen_erd_diagrams.py
@@ -0,0 +1,16 @@
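"""Generate diagrams of Pulsar configuration models for the documentation.

Assuming the ``erdantic`` package (and the Graphviz toolchain it relies on) is
installed, the images can be regenerated with e.g. ``python docs/gen_erd_diagrams.py``.
"""
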
import os
import sys

import erdantic as erd

# Make the repository root importable so ``pulsar`` modules resolve from a source checkout.
sys.path.insert(1, os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))

from pulsar.client.container_job_config import GcpJobParams

DOC_SOURCE_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__)))
class_to_diagram = {
    GcpJobParams: "job_destination_parameters_gcp",
}

# Render a PNG diagram for each configured model class into the docs source directory.
for clazz, diagram_name in class_to_diagram.items():
    erd.draw(clazz, out=f"{DOC_SOURCE_DIR}/{diagram_name}.png")
Binary file added docs/job_destination_parameters_gcp.png
2 changes: 1 addition & 1 deletion docs/job_managers.rst
@@ -158,7 +158,7 @@ the Galaxy mailing list.
More Options
-------------------------------

Any manager can override the ``staging_directory`` used by setting this
Most managers can override the ``staging_directory`` used by setting this
property in its configuration section.
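
For example, in a Pulsar ``app.yml`` this might look like the following (a minimal
sketch; the ``_default_`` manager name and ``queued_python`` type are illustrative):

::

    managers:
      _default_:
        type: queued_python
        # Stage job files under a scratch area instead of Pulsar's default location.
        staging_directory: /scratch/pulsar/staging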

The ``min_polling_interval: 0.5`` option can be set on any manager to control