Skip to content

Comments

improved retrievals and adaptions to next pyslk release#48

Merged
observingClouds merged 57 commits intoobservingClouds:mainfrom
neumannd:main
Mar 13, 2025
Merged

improved retrievals and adaptions to next pyslk release#48
observingClouds merged 57 commits intoobservingClouds:mainfrom
neumannd:main

Conversation

@neumannd
Copy link
Collaborator

@neumannd neumannd commented Dec 5, 2024

@neumannd neumannd added the enhancement New feature or request label Dec 5, 2024
@neumannd
Copy link
Collaborator Author

neumannd commented Dec 5, 2024

@observingClouds Can I prevent somehow that tests are generated automatically for all pyslk functions? Setting up all SlkMock functions properly will take a lot of time.

@observingClouds
Copy link
Owner

@neumannd can you point me to some tests? I think I cannot completely follow you. You just need to mock those functions that are actually used within slkspec.

@neumannd
Copy link
Collaborator Author

@observingClouds

You just need to mock those functions that are actually used within slkspec.

That's the main issue I have. It is quite time consuming to set up proper mock functions for this purpose. I am aware that this would be the proper proceedure. But, I am lacking time to finished this task in December. Would you merge anyway?

@observingClouds
Copy link
Owner

I have to say no here as I believe it's crucial for all code additions to pass the tests to maintain the integrity of the project.

@neumannd
Copy link
Collaborator Author

neumannd commented Feb 6, 2025

@observingClouds : The tests succeed now. @doguskbilir Updated the mock functions. Do you have any comments on the current code version?

Additionally, @doguskbilir is working on setting up a pyproject.toml file for the installation of the package. Thus, the old installation method via setup.py could be replaced by it. However, we do this in an extra PR?

@observingClouds
Copy link
Owner

Amazing! Great work! Yes, please do the pyproject.toml update in a separate PR.

@observingClouds observingClouds self-requested a review February 8, 2025 11:46
Copy link
Owner

@observingClouds observingClouds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @neumannd and @doguskbilir for this hard work! I really appreciate it. My main comment is about the complexity of _retrieve_item(). Please restructure it so that flake8 does no longer complain about C901. I think with a few extra functions and classes it will become much more readable and help us in the long run to maintain this package better.

slkspec/core.py Outdated
Comment on lines 700 to 713
"""
{'SKIPPED': {'SKIPPED_TARGET_EXISTS': ['/arch/bm0146/k204221/iow/INDEX.txt']},
'FILES': {'/arch/bm0146/k204221/iow/INDEX.txt': '/home/k204221/tmp/INDEX.txt'}}

# dry run
{'ENVISAGED': {'ENVISAGED': ['/arch/bm0146/k204221/iow/INDEX.txt']},
'FILES': {'/arch/bm0146/k204221/iow/INDEX.txt': '/home/k204221/tmp/abcdef2/INDEX.txt'}}

# after successful retrieval
{'ENVISAGED': {'ENVISAGED': []}, 'FILES': {'/arch/bm0146/k204221/iow/INDEX.txt':
'/home/k204221/tmp/INDEX.txt'}, 'SUCCESS': {'SUCCESS': ['/arch/bm0146/k204221/iow/INDEX.txt']}}

{'FAILED': {'FAILED_NOT_CACHED': ['/arch/bm0146/k204221/iow/iow_data5_001.tar']},
'FILES': {'/arch/bm0146/k204221/iow/iow_data5_001.tar': '/home/k204221/tmp/iow_data5_001.tar'}}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could go into the docstring of a new function?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring contains possible output of the function call four lines above. The following lines contain multiple if-clauses to react on the output. Therefore, I think it is reasonable to please this docstring here and not into the docstring of the function's header.

@neumannd
Copy link
Collaborator Author

@observingClouds The tests do not start running. Any idea what causes this?

@neumannd
Copy link
Collaborator Author

@observingClouds There is are for loops in the code in which recalls are started and their status is checked. E.g. one of the for loops runs as long as there are unfinished recalls. Takes some time to implement this. Would it be possible to skip the tests for now and add them later? I know that not the ideal way but at some point there should be a working slkspec again.

@florianziemen
Copy link

florianziemen commented Feb 25, 2025

Wouldn't that whole recall/retrieve loop logic be something that would have a good place in pyslk, so any client can use it? As a client, I could simply call retrieve_with_recall, and pyslk could handle the rest.

@neumannd
Copy link
Collaborator Author

@florianziemen In principle, I agree with you. But ... ;-) .

This approach is very inefficient when it is done for one or two files. Then, it is better to run the classical retrieval command (or something else).

This approach is very useful when the users want to retrieve many files at once. For the latter purpose we have already a nice script setup with automatic submission of SLURM jobs (https://docs.dkrz.de/doc/datastorage/hsm/retrievals.html#retrieve-more-than-a-handful-of-files).

Additionally, the approach taken here is pseudo-parallelized and hard to debug if something failes. Providing it to the broad community for individual usage might be a bit risky because nobody understands what is going on. We would need to invest much more time to make it mass-compatible.

Therefore, I do not think that it was very useful in pyslk -- even counter-productive if users start using it for getting one file after each other.

However, if this was set up from the scratch with a central thread running (server-like) then this approach might improve the situation of the users. The thread could "accept" newly requested files, put single requests together and organized recalls/retrievals. This approach would match the usage pattern in Python when users work with catalogs. I think that this is too much work in the moment -- particularly with a HSM proxy system at the horizon.

@observingClouds
Copy link
Owner

Hi @neumannd,
I am a bit confused about the tests failing again and needing more work. Were they not passing the other day and close to being finished? I thought only the flake8 issue was remaining.

@neumannd
Copy link
Collaborator Author

@observingClouds The tests of Dogus' PR finished. For this PR, the tests already failed on Friday. I think there is an infinite loop when I run the core functionality with the pyslk Mock functions. At least it looks like this when I run the tests locally.

@florianziemen
Copy link

Additionally, the approach taken here is pseudo-parallelized and hard to debug if something failes. Providing it to the broad community for individual usage might be a bit risky because nobody understands what is going on. We would need to invest much more time to make it mass-compatible.

This does not sound like we'd want to have it in a package like slkspec that @observingClouds provides to everybody without resources for further support.

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 39.10761% with 232 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@6836bab). Learn more about missing BASE report.

Files with missing lines Patch % Lines
slkspec/core.py 39.10% 232 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #48   +/-   ##
=======================================
  Coverage        ?   57.94%           
=======================================
  Files           ?        3           
  Lines           ?      585           
  Branches        ?        0           
=======================================
  Hits            ?      339           
  Misses          ?      246           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@neumannd
Copy link
Collaborator Author

@observingClouds Mocks updated. All current tests succeed. On paper, I have prepared two additional tests which would automatically test the new code part in more detail. These are tests which I ran manually. I need to add these -- but not this and not the next week. I'd prefer if this PR could be merged already before.

@neumannd
Copy link
Collaborator Author

@observingClouds What is this codecov thing? Do I need to care about this?

@neumannd
Copy link
Collaborator Author

Additionally, the approach taken here is pseudo-parallelized and hard to debug if something failes. Providing it to the broad community for individual usage might be a bit risky because nobody understands what is going on. We would need to invest much more time to make it mass-compatible.

This does not sound like we'd want to have it in a package like slkspec that @observingClouds provides to everybody without resources for further support.

@florianziemen @observingClouds I formulated it unclear. The process of getting data from tape itself is hard to debug if a tools does the recall and retrieval for the user in a black box. I don't mean that my code in particular has this issue.

After a meeting with Flo, we found two bugs which I will fix today or tomorrow. There is one general thing to discuss:

If a file cannot be recalled or retrieved, should an exception be thrown directly? Or is it OK to try to get all files first, collect "bad files" (recall or retrieval failed) and announce this in the end. Currently, I do the letter. Additionally, I write a list of all failed recalls and retrievals into one file and provide the path to the file in the error message. I think that it reasonable because then the user directly knows about all tapes/files which are problematic.

@observingClouds Would you agree with this procedure?

@neumannd
Copy link
Collaborator Author

neumannd commented Mar 3, 2025

@neumannd
Copy link
Collaborator Author

neumannd commented Mar 3, 2025

@observingClouds Tests succeed. Code is running also in real use cases. Do you have any more comments?

@neumannd
Copy link
Collaborator Author

neumannd commented Mar 3, 2025

@observingClouds @florianziemen pyslk is available at PyPI now -- starting with version 2.2.9. Therefore, I updated the requirement line for pyslk in the pyproject.toml. pyslk 2.2.10 is the current release and contains a few changes in the documentation. It will be installed at Levante tomorrow.

slkspec/core.py Outdated
return self._file
return self._url

# flake8: noqa: C901
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neumannd can we remove this now that the function got a bit simplified?

@neumannd
Copy link
Collaborator Author

@observingClouds Is there anything which needs to be adapted from your point of view?

@observingClouds
Copy link
Owner

@neumannd, my only remaining comment is to get rid of # flake8: noqa: C901

@neumannd
Copy link
Collaborator Author

@observingClouds Removed it and updated the code.

@neumannd
Copy link
Collaborator Author

@observingClouds adapted CHANGELOG.md

@observingClouds observingClouds merged commit a52124a into observingClouds:main Mar 13, 2025
4 checks passed
@observingClouds
Copy link
Owner

It is done!! 🎉 Thanks for being patient with all my comments!

@observingClouds
Copy link
Owner

If you like I can immediately do the v0.0.3 release

@neumannd
Copy link
Collaborator Author

If you like I can immediately do the v0.0.3 release

@observingClouds Sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants