Skip to content

New SeparatedTransferHost that accommodates SFTP-disabled submission nodes#456

Merged
gpetretto merged 14 commits intoMatgenix:developfrom
mcgalcode:feature/transfer-host
Mar 5, 2026
Merged

New SeparatedTransferHost that accommodates SFTP-disabled submission nodes#456
gpetretto merged 14 commits intoMatgenix:developfrom
mcgalcode:feature/transfer-host

Conversation

@mcgalcode
Copy link
Copy Markdown
Contributor

Hi all! I have been using jobflow-remote lately and it is such a joy. Thank you for the work you've put into it, and for the thoughtful designs.

The purpose of this PR is to accommodate the situation where the node that accommodates SSH access and queue access cannot also accommodate file transfer protocols. At our supercomputing cluster, we have separate data-transfer nodes and login nodes (a common arrangement I think). The data-transfer nodes do not allow job submission, and the login nodes do not allow SFTP connections. As a result, the jobflow-remote copy-files-then-submit process cannot occur on either one of these node types. To get around this, I introduced a new type of host which routes command execution and file movement to different nodes (these nodes would canonically be the SFTP-enabled data transfer node and the SLURM/PBS-enabled login node).

This implementation has been working for me, so I thought I would offer it to upstream. I am looking forward to hearing your thoughts on it, and also to potentially be corrected if there is a way to do this already with jobflow-remote configuration that I missed.

Thanks again!
Max

@gpetretto
Copy link
Copy Markdown
Contributor

Hi @mcgalcode, thanks a lot for the feedback and for sharing your implementation. I think it will be useful for other users as well.

We had some internal discussion because we considered if this could be asbtracted to be the base for a more general mix of command execution and tools for transfer. However, since there is no request in that direction at the moment, it would be difficult to make a meaningful abstraction at this time. Thus we think that it would be good to go on with merging your implementation almost as it is and see if there is a need to make a more general split in the future.

Related to the above point, if at some point we have to change the structure of the Host objects one potential problem would be the structure of the configuration file, which is validated by pydantic. Any backward incompatible change may result in an invalid configuration file for the users. For this reason I would ask you if, instead of making the additional transfer an attribute of RemoteWorker, it could be possible to make an additional type of worker (e.g. SeparetedTransferRemoteWorker, or if you can think of a more suitable name) which is a subclass of RemoteWorker, contains the additional transfer option and overrides the type attribute to something like separated_transfer_remote. I hope this will make it easier for us if in the future to handle other potential use cases, if we will ever have a need for them. What do you think?

I will also add a few more comments in the code.

Copy link
Copy Markdown
Contributor

@gpetretto gpetretto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the updates!

I have one last point: I noticed that some of the methods should maybe be switched to the command_host. In fact, due to some limitations in the fabric interface or the sftp commands that can execute, even if they act on the files they are done by calling the execute method rather than the sftp interface. I suppose you still have a full ssh access to the other node, so it probably works fine, but if that was only an sftp server I imagine that those commands may have failed.
What do you think?

@mcgalcode
Copy link
Copy Markdown
Contributor Author

Hi @gpetretto thanks for your review! And sorry to push code with out an accompanying message, I was still thinking things over a bit here.

All your comments make sense to me and I'll make the changes and test them with my set up. I agree that having those bash commands running on the command host makes more sense. I remember thinking about this earlier but I can't recall why I did that in the first place.

Some of the file operations are achieved via shell commands:
	- mkdir
	- rmtree
	- copy
	- move

These commands should be executed on the command host, even
though they achieve file operations. The division between command/transfer
is based on sftp usage, not the nature of the operation.
@gpetretto
Copy link
Copy Markdown
Contributor

Thanks!
I have seen that the change that I suggested abouth the SeparatedTransferWorker subclass is breaking the typing.
I think that the correct approach here would be to have a BaseRemoteWorker that inherits from BaseWorker and contains all the attributes that are now in RemoteWorker and type is left as inherited from BaseWorker. Then both RemoteWorker and SeparatedTransferWorker inherit from BaseRemoteWorker, define their type and implement their version of get_host.
Anyway, I am open to other suggestions if you think that there would be a better way of fixing this.

@mcgalcode
Copy link
Copy Markdown
Contributor Author

@gpetretto Ahh yes I should have run the pre-commit hooks. You're right. I tried to think of a better way than what you described here, but nothing came up. I think it's pretty clean still, and makes sense for the class hierarchy.

@mcgalcode mcgalcode force-pushed the feature/transfer-host branch from d4d86a4 to ebd8c69 Compare February 28, 2026 01:17
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 95.06173% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.85%. Comparing base (e0a3b9d) to head (55a397c).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
src/jobflow_remote/remote/host/separated.py 94.64% 2 Missing and 1 partial ⚠️
src/jobflow_remote/config/base.py 95.65% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #456      +/-   ##
===========================================
+ Coverage    75.62%   75.85%   +0.23%     
===========================================
  Files           51       52       +1     
  Lines         8085     8160      +75     
  Branches      1341     1344       +3     
===========================================
+ Hits          6114     6190      +76     
+ Misses        1531     1527       -4     
- Partials       440      443       +3     
Flag Coverage Δ
all_local_tests 75.66% <95.06%> (+0.23%) ⬆️
all_tests 75.85% <95.06%> (+0.23%) ⬆️
db_tests 71.42% <51.85%> (-0.21%) ⬇️
integration_local_tests 46.01% <51.85%> (+0.03%) ⬆️
integration_remote_tests 26.21% <50.61%> (+0.20%) ⬆️
integration_tests 46.88% <51.85%> (+0.02%) ⬆️
unit_tests 27.35% <95.06%> (+1.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/jobflow_remote/remote/host/__init__.py 100.00% <100.00%> (ø)
src/jobflow_remote/config/base.py 86.50% <95.65%> (+2.37%) ⬆️
src/jobflow_remote/remote/host/separated.py 94.64% <94.64%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gpetretto
Copy link
Copy Markdown
Contributor

gpetretto commented Mar 4, 2026

Everything looks good to me. Thanks! Can you maybe just fix the two linting errors? They seem to be quite pointless in the tests, so I think you can even mark them to be ignored if it is easier than changing the code.

After that I would merge, unless there is anything else that you plan to add.

@mcgalcode
Copy link
Copy Markdown
Contributor Author

Nothing more planned here for now on my end! Linting errors are fixed.

@gpetretto gpetretto merged commit b48199f into Matgenix:develop Mar 5, 2026
20 checks passed
@mcgalcode
Copy link
Copy Markdown
Contributor Author

@gpetretto Thank you for fixing that last issue! Sorry about that, not sure how I missed it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants