@TonyB9000 Further comments:
See workflow described above. Unfortunately, it requires a manual step.
No, that is the fundamental limitation described above. It's always going to extract requested files after transferring the tars.
This is essentially the workflow described above, except there is some automation ability using
No, that can't be the case, otherwise
@forsyth2 You wrote "It's always going to extract requested files after transferring the tars."

I thought that "zstash check" (with --keep) would pull over an archive of tar files without extracting any files, and leave the tar files alone. But without supplying specific file patterns, there is no way to determine which tar files to include. To make "--tars" useful, there needs to be a function that converts an extraction (file-matching) pattern to a list of the tar files involved. Even a two-step process could be a big time-saver if one needs only a few of the (say) 200 tar files in an archive.

The real drawback of forcing file extraction is that there is little opportunity to direct the results of each extraction pattern to a different file-system destination. You are at the mercy of the tarred path/file names. I need to be able to trust that a set of patterns has no intersection. This is reasonable for a single archive, but may not hold up across multiple archives.
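To make the requested pattern-to-tar-file conversion concrete, here is a minimal sketch. It is not part of zstash: the `INDEX` rows, the file paths, and the helper `tars_for_pattern` are all hypothetical, and it assumes only that some index records which tar file holds each archived file.

```python
from fnmatch import fnmatchcase

# Hypothetical index rows: (archived file path, tar file that holds it).
# A real archive would supply these from its own index; this is demo data.
INDEX = [
    ("archive/lnd/hist/h0.0001-01.nc", "000000.tar"),
    ("archive/lnd/hist/h0.0001-02.nc", "000001.tar"),
    ("archive/ocn/hist/timeSeries.0001-01.nc", "000001.tar"),
    ("archive/ocn/hist/timeSeries.0001-02.nc", "000002.tar"),
]


def tars_for_pattern(index, pattern):
    """Return the sorted tar names holding any file that matches pattern."""
    return sorted({tar for name, tar in index if fnmatchcase(name, pattern)})


land_tars = tars_for_pattern(INDEX, "archive/lnd/hist/*")
print(land_tars)
```

The resulting list could then be joined into whatever form a tar-selection option expects, which would give the two-step process described above.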
@forsyth2 This would work, in principle. The only issue that remains is as follows:

Suppose both "land" and "ocean" files are (somehow) mixed within some of the same tar files. I first ask for "land", obtain the needed tar files, and separately issue "zstash extract" afterwards. All is well. Then, I want the "ocean" files. If I apply this to the kept ("--keep") tar files, I may extract some, but not all, of the ocean files. If instead I issue "zstash check --hpss=NERSC archive/ocean/hist/timeseriesMonthly", I will certainly receive all needed tar files, but may have redundantly transferred some of them. This is nit-picking, perhaps, but I seek to minimize transfer and extraction cycles.

What would be "ideal", from my perspective, would be the ability to supply a list of patterns (or else, one at a time), obtain the names of the associated tar files, take their union, and then conduct a "zstash check --tars " for just those tar files.

Honestly, it never occurred to me until this moment that I might be able to issue: zstash check --hpss=NERSC <pattern_1> <pattern_2> ... <pattern_n> and have zstash conduct the "union" internally. Then, I would be sure that the arriving tar files contain ALL of the files satisfying the patterns. Is this possible?
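The union idea, and the redundant-transfer concern, can be sketched as follows. This is an illustration under stated assumptions rather than zstash's behavior: the SQLite table `files(name, tar)` is a made-up layout that may not match a real index, and `build_demo_index` and `tars_for_patterns` are hypothetical helpers.

```python
import sqlite3


def build_demo_index():
    """Create an in-memory index with a hypothetical files(name, tar) table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE files (name TEXT, tar TEXT)")
    con.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [
            ("archive/lnd/hist/h0.0001-01.nc", "000000.tar"),
            ("archive/lnd/hist/h0.0001-02.nc", "000001.tar"),
            ("archive/ocn/hist/ts.0001-01.nc", "000001.tar"),
            ("archive/ocn/hist/ts.0001-02.nc", "000002.tar"),
        ],
    )
    return con


def tars_for_patterns(con, patterns):
    """Return the union of tar names over all glob patterns."""
    tars = set()
    for pattern in patterns:
        rows = con.execute(
            "SELECT DISTINCT tar FROM files WHERE name GLOB ?", (pattern,)
        )
        tars.update(tar for (tar,) in rows)
    return tars


con = build_demo_index()
land = tars_for_patterns(con, ["archive/lnd/*"])
both = tars_for_patterns(con, ["archive/lnd/*", "archive/ocn/*"])
# Tar files still to transfer if the land tars were already kept locally.
still_needed = sorted(both - land)
print(still_needed)
```

In this demo data one tar file holds both land and ocean files, so requesting "ocean" after "land" needs only one further tar file; the set difference avoids transferring the shared one a second time.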
@forsyth2 Awesome! Thank you for taking the time to test this. The documentation should mention this as a feature. (If support for a space-delimited list of file patterns is already documented, I'm sure I missed it.)
Question criteria
What is the deadline?
N/A
Describe your question
From @TonyB9000:
Are there any possible answers you came across?
Usage docs are here. I was thinking some combination of --tars and --include could handle this, but the fundamental issue is that there's no way to transfer tars without extracting from them.

Potential workflow:
What machine were you running on?
N/A
Environment
N/A
Minimal Complete Verifiable Example (MCVE)
No response
Relevant log output
No response
Anything else we need to know?
No response