Member

@jmchilton jmchilton commented Jul 11, 2025

Implements #390.

Co-execution Diagram

From https://pulsar.readthedocs.io/en/latest/containers.html#galaxy-and-shared-file-systems, on the advantages of a Pulsar co-execution client (a container-native paradigm) over implementing a Galaxy job runner (an interface designed for shared file systems and DRMAA-style batch jobs, not containers):

The most glaring disadvantage of not using Pulsar in the above scenarios is that Galaxy must be deployed in the same container with the same mounts as the job execution environment. This prevents leveraging external cloud compute, multi-cloud compute, and makes it unsuitable for common Galaxy use cases such as large public instances, Galaxy’s leveraging institution non-cloud storage, etc… Even within the same cloud - a large shared file system can be an expensive prospect and Pulsar may allow making use of buckets and such more tractable.

@jmchilton jmchilton changed the title from "[WIP] Co-execution client Google Cloud Platform Batch v1" to "[WIP] Co-execution client for Google Cloud Platform Batch v1" Jul 11, 2025
Contributor

@kysrpex kysrpex left a comment

I am not familiar enough with Pulsar to review this PR; but I can use this chance to ask questions and deliver a cleaner ARC client #401 implementation :)

    state = status.state
    return {
        "status": gcp_state_to_pulsar_status(state),
        "complete": "true" if gcp_state_is_complete(state) else "false",  # Ancient John, what were you thinking?
Contributor

For the ARC client implementation I am determining if the job is complete using the Pulsar state rather than the ARC job state.

    def full_status(self):
        pulsar_state = self.get_status()
        return {
            "status": pulsar_state,
            "complete": "true" if manager_status.is_job_done(pulsar_state) else "false",
            # ancient John, what were you thinking? 👀
            "outputs_directory_contents": [],
            # it needs to be defined, otherwise `PulsarOutputs.has_outputs` fails; it is ok that it is empty because
            # ARC is responsible for staging the outputs (Galaxy does not have to collect any outputs)
        }

I get the Pulsar state via a mapping too (similar to gcp_state_to_pulsar_status()). What would be the advantage of using two mappings, gcp_state_to_pulsar_status() and gcp_state_is_complete(), as you are doing for the GCP client implementation? (Or what is my mistake?)

Would I need to use something akin to gcp_state_is_complete()?
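For illustration, the two approaches being compared can be sketched side by side. This is only a sketch under stated assumptions: the backend state names, the status strings, and every function and table name below are hypothetical stand-ins defined locally, not the actual GCP or ARC client code.

```python
# Pulsar-style status strings, defined locally for this sketch (assumed values).
QUEUED, RUNNING = "queued", "running"
COMPLETE, CANCELLED, FAILED, LOST = "complete", "cancelled", "failed", "lost"
TERMINAL_STATUSES = frozenset({COMPLETE, CANCELLED, FAILED, LOST})

# Hypothetical backend state names, for illustration only.
BACKEND_TO_PULSAR = {
    "QUEUED": QUEUED,
    "SCHEDULED": QUEUED,
    "RUNNING": RUNNING,
    "SUCCEEDED": COMPLETE,
    "FAILED": FAILED,
}


def backend_state_to_pulsar_status(state: str) -> str:
    """Map a backend state to a Pulsar-style status; unknown states map to 'lost'."""
    return BACKEND_TO_PULSAR.get(state, LOST)


def is_done_via_pulsar_status(state: str) -> bool:
    """Single-mapping approach: completeness is derived from the mapped status."""
    return backend_state_to_pulsar_status(state) in TERMINAL_STATUSES


# Two-mapping approach: a second, independent table of terminal backend states.
BACKEND_TERMINAL_STATES = frozenset({"SUCCEEDED", "FAILED"})


def is_done_via_backend_state(state: str) -> bool:
    """Two-mapping approach: completeness is decided on the raw backend state."""
    return state in BACKEND_TERMINAL_STATES


def full_status(state: str) -> dict:
    """Build a status payload using the single-mapping approach."""
    return {
        "status": backend_state_to_pulsar_status(state),
        "complete": "true" if is_done_via_pulsar_status(state) else "false",
    }
```

In this sketch the two approaches agree for every state listed in both tables. They can diverge for a state that only one table covers: an unlisted state maps to "lost" and therefore counts as done under the single mapping, while the separate terminal-state table reports it as not done.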

Comment on lines +1058 to +1088
    def __init__(self, destination_params, job_id, client_manager):
        super().__init__(destination_params, job_id, client_manager)
Contributor

The client manager is generated by Galaxy calling pulsar.client.manager.build_client_manager(). For the ARC client implementation I was forced to define a kwarg arc_enabled (and do some extra trickery). I assume this is not how things work. How are you managing to make Galaxy create a PollingJobClientManager rather than a ClientManager when you want to use the polling job client?

def build_client_manager(**kwargs: Dict[str, Any]) -> ClientManagerInterface:
    if 'job_manager' in kwargs:
        return ClientManager(**kwargs)  # TODO: Consider more separation here.
    elif kwargs.get('amqp_url', None):
        return MessageQueueClientManager(**kwargs)
    elif kwargs.get("k8s_enabled") or kwargs.get("tes_url") or kwargs.get("arc_enabled"):
        return PollingJobClientManager(**kwargs)
    else:
        return ClientManager(**kwargs)
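To make the precedence of those branches concrete, here is a self-contained sketch that reproduces the dispatch quoted above with stub classes. The stubs only record their kwargs and are assumptions for illustration; they do not reflect the real Pulsar classes beyond their names.

```python
from typing import Any


class _StubManager:
    """Stand-in that only records the kwargs it was built with."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class ClientManager(_StubManager):
    pass


class MessageQueueClientManager(_StubManager):
    pass


class PollingJobClientManager(_StubManager):
    pass


def build_client_manager(**kwargs: Any) -> _StubManager:
    """Same branch order as the quoted function, applied to the stubs."""
    if "job_manager" in kwargs:
        return ClientManager(**kwargs)
    elif kwargs.get("amqp_url"):
        return MessageQueueClientManager(**kwargs)
    elif kwargs.get("k8s_enabled") or kwargs.get("tes_url") or kwargs.get("arc_enabled"):
        return PollingJobClientManager(**kwargs)
    else:
        return ClientManager(**kwargs)


def manager_name(**kwargs: Any) -> str:
    """Return the class name selected for the given kwargs."""
    return type(build_client_manager(**kwargs)).__name__
```

With this branch order, `manager_name(arc_enabled=True)` selects the polling manager, but adding `amqp_url` or `job_manager` to the same call takes priority over the flag, and a call with no kwargs falls through to `ClientManager`.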

Member Author

Contributor

I did galaxyproject/galaxy@6b65775, which is almost the same. However, I find your approach more attractive.

Could you mention me when you open a PR for this? I'd like to suggest something (merging self.client_manager_kwargs with the superclass' client_manager_kwargs) and adapt galaxyproject/galaxy#20598 to your PR.
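A minimal sketch of what merging a subclass's kwargs with those of its superclass could look like. The class names, the property, and the dictionary keys other than `arc_enabled` are hypothetical and are not taken from Galaxy or Pulsar.

```python
class BaseRunner:
    """Hypothetical base runner exposing kwargs for the client manager."""

    @property
    def client_manager_kwargs(self) -> dict:
        # Placeholder option shared by all runners in this sketch.
        return {"shared_option": "base"}


class ArcRunner(BaseRunner):
    """Hypothetical subclass that extends, rather than replaces, the base kwargs."""

    @property
    def client_manager_kwargs(self) -> dict:
        # Later keys win, so the subclass can also override base values.
        return {**super().client_manager_kwargs, "arc_enabled": True}
```

Merging this way means a subclass only declares what it adds or changes, and options introduced later in the base class are picked up without touching the subclass.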

Member Author

Will do. I'm sorry I have not been very responsive but I do appreciate your attention to the details.

@jmchilton jmchilton force-pushed the gcp branch 11 times, most recently from 56f07ec to 691c84c, September 4, 2025 14:44
@jmchilton jmchilton changed the title from "[WIP] Co-execution client for Google Cloud Platform Batch v1" to "Co-execution client for Google Cloud Platform Batch v1" Sep 4, 2025
@jmchilton jmchilton merged commit 7881005 into galaxyproject:master Sep 4, 2025
10 of 13 checks passed

3 participants