New SeparatedTransferHost that accommodates SFTP-disabled submission nodes#456
Conversation
|
Hi @mcgalcode, thanks a lot for the feedback and for sharing your implementation. I think it will be useful for other users as well. We had some internal discussion because we considered if this could be asbtracted to be the base for a more general mix of command execution and tools for transfer. However, since there is no request in that direction at the moment, it would be difficult to make a meaningful abstraction at this time. Thus we think that it would be good to go on with merging your implementation almost as it is and see if there is a need to make a more general split in the future. Related to the above point, if at some point we have to change the structure of the Host objects one potential problem would be the structure of the configuration file, which is validated by pydantic. Any backward incompatible change may result in an invalid configuration file for the users. For this reason I would ask you if, instead of making the additional I will also add a few more comments in the code. |
gpetretto
left a comment
There was a problem hiding this comment.
Thanks a lot for the updates!
I have one last point: I noticed that some of the methods should maybe be switched to the command_host. In fact, due to some limitations in the fabric interface or the sftp commands that can execute, even if they act on the files they are done by calling the execute method rather than the sftp interface. I suppose you still have a full ssh access to the other node, so it probably works fine, but if that was only an sftp server I imagine that those commands may have failed.
What do you think?
|
Hi @gpetretto thanks for your review! And sorry to push code with out an accompanying message, I was still thinking things over a bit here. All your comments make sense to me and I'll make the changes and test them with my set up. I agree that having those bash commands running on the command host makes more sense. I remember thinking about this earlier but I can't recall why I did that in the first place. |
Some of the file operations are achieved via shell commands: - mkdir - rmtree - copy - move These commands should be executed on the command host, even though they achieve file operations. The division between command/transfer is based on sftp usage, not the nature of the operation.
|
Thanks! |
…oteWorker for typing
|
@gpetretto Ahh yes I should have run the pre-commit hooks. You're right. I tried to think of a better way than what you described here, but nothing came up. I think it's pretty clean still, and makes sense for the class hierarchy. |
d4d86a4 to
ebd8c69
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #456 +/- ##
===========================================
+ Coverage 75.62% 75.85% +0.23%
===========================================
Files 51 52 +1
Lines 8085 8160 +75
Branches 1341 1344 +3
===========================================
+ Hits 6114 6190 +76
+ Misses 1531 1527 -4
- Partials 440 443 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Everything looks good to me. Thanks! Can you maybe just fix the two linting errors? They seem to be quite pointless in the tests, so I think you can even mark them to be ignored if it is easier than changing the code. After that I would merge, unless there is anything else that you plan to add. |
|
Nothing more planned here for now on my end! Linting errors are fixed. |
|
@gpetretto Thank you for fixing that last issue! Sorry about that, not sure how I missed it! |
Hi all! I have been using jobflow-remote lately and it is such a joy. Thank you for the work you've put into it, and for the thoughtful designs.
The purpose of this PR is to accommodate the situation where the node that accommodates SSH access and queue access cannot also accommodate file transfer protocols. At our supercomputing cluster, we have separate data-transfer nodes and login nodes (a common arrangement I think). The data-transfer nodes do not allow job submission, and the login nodes do not allow SFTP connections. As a result, the jobflow-remote copy-files-then-submit process cannot occur on either one of these node types. To get around this, I introduced a new type of host which routes command execution and file movement to different nodes (these nodes would canonically be the SFTP-enabled data transfer node and the SLURM/PBS-enabled login node).
This implementation has been working for me, so I thought I would offer it to upstream. I am looking forward to hearing your thoughts on it, and also to potentially be corrected if there is a way to do this already with jobflow-remote configuration that I missed.
Thanks again!
Max