Refactor to resolve tars not deleting when --non-blocking is set #416

Conversation
Action items:

All tests are passing now. Self-review guide from Claude:

Self-Review Guide for Progressive Tar File Deletion Fix

Overview

This diff fixes the issue where tar files weren't being deleted after successful Globus transfers when --non-blocking is used.

Key Changes to Review

1. New Transfer Tracking System
Follow-up: Performance Review for Progressive Tar File Deletion

Performance Concerns to Review

1. Status Check Overhead

Current Implementation:

```python
def delete_successfully_transferred_files(self):
    for batch in self.batches:
        if batch.is_globus and batch.task_id and (batch.task_status != "SUCCEEDED"):
            if self.globus_config and self.globus_config.transfer_client:
                task = self.globus_config.transfer_client.get_task(batch.task_id)
                batch.task_status = task["status"]
```

Issues:

Impact:

Potential Optimizations:
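One optimization along these lines would be to treat SUCCEEDED/FAILED as terminal states and never poll those tasks again. A sketch with illustrative names, not the zstash code; `FakeTransferClient` stands in for globus_sdk's `TransferClient`:

```python
# Hypothetical sketch: cache terminal task statuses so each Globus task is
# polled only until it reaches a terminal state.
TERMINAL = {"SUCCEEDED", "FAILED"}

class FakeTransferClient:
    """Stand-in for a Globus transfer client; counts get_task calls."""
    def __init__(self, statuses):
        self.statuses = statuses  # task_id -> status
        self.calls = 0

    def get_task(self, task_id):
        self.calls += 1
        return {"status": self.statuses[task_id]}

class Batch:
    def __init__(self, task_id):
        self.task_id = task_id
        self.task_status = "SUBMITTED"

def refresh_statuses(batches, client):
    for b in batches:
        if b.task_status in TERMINAL:
            continue  # cached terminal status: no API round-trip
        b.task_status = client.get_task(b.task_id)["status"]

client = FakeTransferClient({"t1": "SUCCEEDED", "t2": "ACTIVE"})
batches = [Batch("t1"), Batch("t2")]
refresh_statuses(batches, client)  # polls both tasks
refresh_statuses(batches, client)  # polls only t2 (t1 is terminal)
print(client.calls)  # 3
```

The second pass skips the succeeded task entirely, so repeated deletion checks stop costing one API call per already-completed batch.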
2. Batch List Growth

Current Implementation:

```python
self.batches: List[TransferBatch] = []
# Grows unbounded throughout the run
```

Issues:

Potential Optimizations:
3. File Deletion Performance

Current Implementation:

```python
def delete_files(self):
    for src_path in self.file_paths:
        if os.path.exists(src_path):
            os.remove(src_path)
```

Issues:

Impact:

Potential Optimizations:
4. Non-Blocking Mode Efficiency

Current Behavior:

```python
# After EVERY hpss_put() call:
if not keep:
    transfer_manager.delete_successfully_transferred_files()
```

Issues:

Example:

This means 5 status checks when only 1-2 would be needed.

Potential Optimizations:
5. Globus Transfer Batching

Current Implementation:

```python
# Creates new batch if last one was submitted
if not transfer_manager.batches or transfer_manager.batches[-1].task_id:
    new_batch = TransferBatch()
```

Questions:

Looking at the code:

Potential Optimizations:
6. Test Performance Impact

New Progressive Deletion Tests:

```shell
dd if=/dev/zero of=zstash_demo/file1.dat bs=1M count=700  # 700 MB
dd if=/dev/zero of=zstash_demo/file2.dat bs=1M count=700  # 700 MB
dd if=/dev/zero of=zstash_demo/file3.dat bs=1M count=700  # 700 MB
```

Issues:

Recommendations:
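If test setup time is a concern, sparse files give the same apparent size almost instantly. This sketch assumes GNU coreutils `truncate` is available; note that all-zero files (sparse or dd-written alike) compress trivially, so this only speeds up test setup, not what the transfer exercises:

```shell
# Sparse test files: 700 MB of apparent size with near-zero disk I/O,
# instead of writing 700 MB of zeros with dd. Paths are illustrative.
mkdir -p zstash_demo
truncate -s 700M zstash_demo/file1.dat   # instant, sparse
truncate -s 700M zstash_demo/file2.dat
truncate -s 700M zstash_demo/file3.dat
ls -lh zstash_demo   # apparent size: 700M each
du -sh zstash_demo   # actual blocks used: near zero
```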
7. Finalization Performance

Current Implementation:

```python
def globus_finalize(transfer_manager: TransferManager, non_blocking: bool = False):
    # Submit any pending transfer_data
    # Wait for most recent transfer
    # Wait for last task (if different)
    # Delete successfully transferred files
```

Issues:

Questions:
Recommended Performance Improvements

Priority 1: High Impact, Low Effort

```python
def delete_successfully_transferred_files(self):
    # Only check batches that haven't been processed yet
    batches_to_check = [b for b in self.batches if b.file_paths]  # Has files to delete
    for batch in batches_to_check:
        # ... existing logic

def delete_files(self):
    for src_path in self.file_paths:
        try:
            os.remove(src_path)
        except FileNotFoundError:
            logger.warning(f"File already deleted: {src_path}")

# In hpss.py::hpss_transfer()
if not keep:
    # Only check every 5 files or if this is the last file
    if (transfer_manager.cumulative_tarfiles_pushed % 5 == 0) or is_last_file:
        transfer_manager.delete_successfully_transferred_files()
```

Priority 2: Medium Impact, Medium Effort
```python
class TransferManager:
    def __init__(self):
        self.batches: List[TransferBatch] = []
        self.last_deletion_check_index: int = 0  # New field

    def delete_successfully_transferred_files(self):
        # Only check batches from last_deletion_check_index forward
        for i in range(self.last_deletion_check_index, len(self.batches)):
            batch = self.batches[i]
            # ... check and delete logic
            if batch.file_paths == []:  # Processed
                self.last_deletion_check_index = i + 1
```

Alternatively, prune fully processed batches:

```python
def delete_successfully_transferred_files(self):
    # ... existing logic
    # Remove fully processed batches
    self.batches = [b for b in self.batches if b.file_paths or not b.task_id]
```
self.batches = [b for b in self.batches if b.file_paths or not b.task_id]Priority 3: Lower Priority / More Investigation Needed
class TransferManager:
def __init__(self):
self.last_status_check_time: float = 0
def delete_successfully_transferred_files(self):
now = time.time()
if now - self.last_status_check_time < 30: # Don't check more than every 30s
return
self.last_status_check_time = now
# ... existing logic
Performance Testing Checklist
Questions to Answer
Force-pushed 064029c to ed3248d.
I've rebased off the latest. I ran the tar deletion test. The AI-generated reviews above are excessively comprehensive, so I will tag Copilot to review this PR and see what it deems relevant.
Pull request overview
Refactors Globus transfer handling to ensure local tar files are deleted after successful transfers even when --non-blocking is used, addressing issue #374.
Changes:
- Introduces a TransferManager/TransferBatch model to track submitted transfers and associated local files for later deletion.
- Threads the transfer manager through create/update → add_files → hpss_put → globus_transfer, and performs deletion based on task status checks.
- Expands the Globus tar deletion integration test coverage (including a new "progressive deletion" scenario).
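For orientation, the tracking model described above might look roughly like the following sketch. Field and method names here are guesses based on the review comments, not the actual zstash source:

```python
# Illustrative sketch of a transfer-tracking model: a batch records the tars
# that travel together in one Globus submission, plus the task that moved them.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TransferBatch:
    is_globus: bool = False
    task_id: Optional[str] = None        # set once the batch is submitted
    task_status: str = "UNSUBMITTED"
    file_paths: List[str] = field(default_factory=list)  # tars awaiting deletion

@dataclass
class TransferManager:
    batches: List[TransferBatch] = field(default_factory=list)

    def get_most_recent_batch(self) -> Optional[TransferBatch]:
        return self.batches[-1] if self.batches else None

mgr = TransferManager()
mgr.batches.append(TransferBatch(is_globus=True, file_paths=["000000.tar"]))
print(mgr.get_most_recent_batch().file_paths)  # ['000000.tar']
```

Once a batch's task reaches SUCCEEDED, its `file_paths` can be deleted and cleared, which is what makes deletion work even when `--non-blocking` keeps submitting new batches.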
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| zstash/update.py | Wires TransferManager through update flow and into finalization. |
| zstash/transfer_tracking.py | New module defining transfer tracking data structures and deletion logic. |
| zstash/hpss_utils.py | Passes TransferManager into hpss_put from tar creation loop. |
| zstash/hpss.py | Refactors deletion logic to rely on TransferManager rather than prior global lists. |
| zstash/globus.py | Refactors Globus state handling to use TransferManager/GlobusConfig instead of module globals. |
| zstash/create.py | Wires TransferManager through create flow and into finalization. |
| tests/integration/bash_tests/run_from_any/test_globus_tar_deletion.bash | Adds verbose logging and a new progressive-deletion integration test scenario. |
Comments suppressed due to low confidence (1)
zstash/hpss.py:15
- The refactor leaves prev_transfers/curr_transfers as unused globals. Since deletion is now handled via TransferManager, keeping these around is confusing and may trigger linting warnings. Consider removing them if they are no longer part of the design.

```python
prev_transfers: List[str] = list()
curr_transfers: List[str] = list()
```
zstash/globus.py
Outdated
```diff
         if mrt.task_status == "ACTIVE":
             logger.info(
-                f"{ts_utc()}: Previous task_id {task_id} Still Active. Returning ACTIVE."
+                f"{ts_utc()}: Previous task_id {mrt.task_id} Still Active. Returning ACTIVE."
             )
             return "ACTIVE"
         elif prev_task_status == "SUCCEEDED":
             # Don't return early - continue to submit the new transfer
```
The ACTIVE-task handling no longer returns early; it now proceeds to submit a new transfer while the previous one is still active. This changes behavior from “single in-flight task” to potentially many concurrent tasks, which can hit Globus service limits and complicate operational debugging. If concurrency is not intended, consider restoring the early return/queueing behavior or adding an explicit cap.
This is the expected behavior of --non-blocking, not an issue.
```diff
     if transfer_type == "put":
         return task_status

     if transfer_type == "get" and task_id:
-        globus_wait(task_id)
+        globus_wait(transfer_manager.globus_config.transfer_client, task_id)
```
globus_transfer() returns task_status, but that variable is initialized to "UNKNOWN" and never updated from the actual task result (blocking mode only updates new_mrt.task_status). As written, the function will always return "UNKNOWN" for both put and get. Consider returning new_mrt.task_status (or setting task_status from the wait result) so callers can rely on the return value.
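A minimal sketch of the fix being suggested, with stand-ins for the real function (the actual `globus_transfer` signature and wait logic differ): return the tracked batch status rather than the never-updated local.

```python
# Hypothetical sketch: propagate the most recent batch's status instead of
# returning the stale "UNKNOWN" local. `Batch` and the boolean `blocking`
# parameter are stand-ins for the PR's types, not the zstash API.
def globus_transfer(new_mrt, blocking):
    task_status = "UNKNOWN"  # the local the review flags as never updated
    if blocking:
        new_mrt.task_status = "SUCCEEDED"  # stand-in for the real wait result
    # Return the tracked status rather than the stale local:
    return new_mrt.task_status

class Batch:
    task_status = "SUBMITTED"

print(globus_transfer(Batch(), blocking=True))   # SUCCEEDED
print(globus_transfer(Batch(), blocking=False))  # SUBMITTED
```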
zstash/create.py
Outdated
```python
            if args.follow_symlinks:
                raise Exception("Archive creation failed due to broken symlink.")
            else:
                raise e
```
Catching FileNotFoundError as e and then using raise e drops the original traceback context. Use a bare raise to re-raise the original exception (or wrap it with raise ... from e if you need a different exception type).
Suggested change:

```diff
-                raise e
+                raise
```
zstash/update.py
Outdated
```python
            if args.follow_symlinks:
                raise Exception("Archive update failed due to broken symlink.")
            else:
                raise e
```
Catching FileNotFoundError as e and then using raise e drops the original traceback context. Use a bare raise to re-raise the original exception (or raise ... from e if you need to add context).
Suggested change:

```diff
-                raise e
+                raise
```
zstash/globus.py
Outdated
```python
        # In non-blocking mode, this ensures index.db and any accumulated tar files complete
        # In blocking mode, this is redundant but harmless
        skip_last_wait: bool = False
        if transfer and transfer.task_id:
            if transfer.task_id == last_task_id:
                skip_last_wait = (
                    True  # No reason to call globus_wait twice on the same task_id
                )
            logger.info(
                f"{ts_utc()}: Waiting for transfer task_id={transfer.task_id} to complete"
            )
            globus_wait(
                transfer_manager.globus_config.transfer_client, transfer.task_id
            )
        if last_task_id and (not skip_last_wait):
            logger.info(
                f"{ts_utc()}: Waiting for last transfer task_id={last_task_id} to complete"
            )
            globus_wait(transfer_manager.globus_config.transfer_client, last_task_id)
```
non_blocking is passed into globus_finalize(), but the finalization path still waits for task completion (globus_wait) regardless. This makes --non-blocking effectively block at the end and can dramatically increase runtime for large transfers. Consider skipping the wait logic when non_blocking=True (or rename/remove the flag if always-wait is intended).
Suggested change:

```python
    # In non-blocking mode, skip waiting here so the caller can return immediately.
    # In blocking mode, this ensures index.db and any accumulated tar files complete.
    if non_blocking:
        logger.info(
            f"{ts_utc()}: Non-blocking mode: skipping wait for Globus "
            f"transfer completion during finalize"
        )
    else:
        skip_last_wait: bool = False
        if transfer and transfer.task_id:
            if transfer.task_id == last_task_id:
                # No reason to call globus_wait twice on the same task_id
                skip_last_wait = True
            logger.info(
                f"{ts_utc()}: Waiting for transfer task_id={transfer.task_id} "
                f"to complete"
            )
            globus_wait(
                transfer_manager.globus_config.transfer_client, transfer.task_id
            )
        if last_task_id and (not skip_last_wait):
            logger.info(
                f"{ts_utc()}: Waiting for last transfer "
                f"task_id={last_task_id} to complete"
            )
            globus_wait(
                transfer_manager.globus_config.transfer_client, last_task_id
            )
```
They should be waiting at the end. globus_finalize is called last by create and update -- Globus transfers should complete before zstash returns.
Addressed relevant code review comments. Confirmed all 6 tar deletion tests still pass. Remaining action items:
This is a very-high-level pseudo-code algorithm of what is happening with the tars:
Confirmed all tests pass on Chrysalis and Perlmutter.
@TonyB9000 If you have the time, this could use at least a high-level code review. I also plan to go over it briefly at Monday's technical discussion.

The issue being resolved is #374, which notified us that tar files were not being deleted when --non-blocking is set. Here, I've refactored the code to remove global variables and clarify the logic flow (i.e., using TransferManager). The test was already added in #404 (which confirms on the

@chengzhuzhang @golaz ^ Just for your awareness. I will discuss more at our next meeting.
@forsyth2 First, a general comment: We never interrupt a tar-file in the middle of tar-file formation. So for clarity, I would employ a fully separate "make_tarfile()" that returns when a size-threshold is reached, and THEN submit that tar-file to a process whose return behavior depends upon "BLOCKING", "transfer success", etc.

It bothers me that "add_files()" does FAR MORE than just "add_files". Properly, it could be called "conduct_all_processing()". The "pseudocode" you supply (really, a call-sequence outline) is thus a bit obscure as well.

My translation (up to the point of my understanding):

Here is what LivChat says about "globus_block_wait()":

Also: The AI suggests the following improvement:

I have yet to thoroughly understand this functionality. It appears to be called only upon RETURN from globus_transfer. The code manipulates "batches" -- but do we ever have more than 3 batches (one subject to transfer, one storing up new tar-files, and one whose transfer is completed, successful or otherwise)? If BLOCKING, this would only be 2, as we never "store up" tar-files.
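The separation suggested at the top of this comment (a tar-building step that only groups files up to a size threshold, leaving submission policy to the caller) could be sketched as follows. All names here are hypothetical, not the zstash API:

```python
# Sketch: group files into tar-sized batches; the caller decides whether the
# subsequent submission blocks, queues, etc. `sizes` maps file -> byte count.
def make_tarfiles(files, sizes, threshold):
    """Yield lists of files, each list's total size reaching `threshold`."""
    batch, total = [], 0
    for f in files:
        batch.append(f)
        total += sizes[f]
        if total >= threshold:
            yield batch
            batch, total = [], 0
    if batch:
        yield batch  # final partial tar

sizes = {"a": 3, "b": 4, "c": 2, "d": 6}
print(list(make_tarfiles(["a", "b", "c", "d"], sizes, threshold=5)))
# [['a', 'b'], ['c', 'd']]
```

With this split, the size-threshold logic is testable on its own, and the "what happens after a tar is full" policy (blocking vs. non-blocking submission) lives entirely in the caller.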
Thanks for the thorough review @TonyB9000. I'm working my way through your suggestions. You're right that add_files() does far more than just adding files. In the process of reviewing logic flows, I noticed what appears to be an error in the code that has persisted for several years: see https://github.com/E3SM-Project/zstash/pull/10/changes#r2898357541

Oh wait, I see now the creation very much relies on variables defined in that function. I guess we're just going to have to go with a complicated function name.
Force-pushed 17286e1 to 67c880a, then 67c880a to f757094.
@TonyB9000 Ok, how's this new commit look?
I'll do another pull so I can see everything in context. Long ago, all of the remote-transfer operations occurred under a function called "add_files". Now, it is all embedded under a function called "construct_tars()". It works, but just feels weird. It is like embedding a process that creates humans, all under a function called "adjust_atoms()".
@TonyB9000 Responding to your email from yesterday, 4/2:

```python
if not transfer_manager.batches or transfer_manager.batches[-1].task_id:
    # Either no batches exist, or the last batch was already submitted
    new_batch = TransferBatch()
    new_batch.is_globus = scheme == "globus"
    transfer_manager.batches.append(new_batch)
    logger.debug(
        f"{ts_utc()}: Created new TransferBatch, total batches: {len(transfer_manager.batches)}"
    )
mrb: Optional[TransferBatch] = transfer_manager.get_most_recent_batch()
```

Correct, it must exist.
```python
update_cumulative_tarfiles_pushed(transfer_manager, transfer_data)
task = submit_transfer_with_checks(
    transfer_manager.globus_config.transfer_client, transfer_data
)
# Update the current batch with the task info
# The batch was already created in hpss_transfer with files added to it
# We just need to mark it as submitted
if transfer_manager.batches:
    # Update these two fields of the most recent batch
    # (which is still available in this function as `mrb`).
    transfer_manager.batches[-1].task_id = task_id
    transfer_manager.batches[-1].task_status = TaskStatus.SUBMITTED
```

No, there is no new transfer manager. We do in fact know we have batches -- hence the:

```python
# This block should be impossible to reach.
# By now, we've ensured that `get_most_recent_batch()` returns a batch,
# and we haven't removed any batches since then,
# so there should always be at least one batch in `batches`.
error_str = "transfer_manager has no batches"
logger.error(error_str)
raise RuntimeError(error_str)
```

Now, you wonder, why bother with this check then? We do so because this is the first time we reference
```shell
git grep -n "TransferManager("
# zstash/create.py:57: transfer_manager: TransferManager = TransferManager()
# zstash/hpss.py:26: transfer_manager = TransferManager()
# zstash/hpss.py:184: transfer_manager = TransferManager()
# zstash/update.py:25: transfer_manager = TransferManager()
```

```python
if not transfer_manager:
    transfer_manager = TransferManager()
```

```shell
git grep -n "hpss_get("
# zstash/extract.py:178: hpss_get(hpss, get_db_filename(cache), cache)
# zstash/extract.py:554: hpss_get(hpss, tfname, cache)
# zstash/hpss.py:173:def hpss_get(
# zstash/ls.py:105: hpss_get(hpss, get_db_filename(cache), cache)
# zstash/update.py:164: hpss_get(hpss, get_db_filename(cache), cache, transfer_manager)
```

So,
```python
if not transfer_manager:
    transfer_manager = TransferManager()
```

```shell
git grep -n "hpss_transfer("
# zstash/hpss.py:15:def hpss_transfer(
# zstash/hpss.py:161: hpss_transfer(
# zstash/hpss.py:187: hpss_transfer(

git grep -A 8 -n "hpss_put("
# zstash/create.py:97: logger.debug(f"{ts_utc()}: calling hpss_put() for {get_db_filename(cache)}")
# zstash/create.py:98: hpss_put(
# zstash/create.py-99- hpss,
# zstash/create.py-100- get_db_filename(cache),
# zstash/create.py-101- cache,
# zstash/create.py-102- keep=args.keep,
# zstash/create.py-103- is_index=True,
# zstash/create.py-104- transfer_manager=transfer_manager,
# zstash/create.py-105- )
# zstash/create.py-106-
# --
# zstash/hpss.py:149:def hpss_put(
# zstash/hpss.py-150- hpss: str,
# zstash/hpss.py-151- file_path: str,
# zstash/hpss.py-152- cache: str,
# zstash/hpss.py-153- keep: bool = True,
# zstash/hpss.py-154- non_blocking: bool = False,
# zstash/hpss.py-155- is_index=False,
# zstash/hpss.py-156- transfer_manager: Optional[TransferManager] = None,
# zstash/hpss.py-157-):
# --
# zstash/hpss_utils.py:139: hpss_put(
# zstash/hpss_utils.py-140- hpss,
# zstash/hpss_utils.py-141- os.path.join(cache, self.tfname),
# zstash/hpss_utils.py-142- cache,
# zstash/hpss_utils.py-143- keep,
# zstash/hpss_utils.py-144- non_blocking,
# zstash/hpss_utils.py-145- is_index=False,
# zstash/hpss_utils.py-146- transfer_manager=transfer_manager,
# zstash/hpss_utils.py-147- )
# --
# zstash/update.py:39: hpss_put(
# zstash/update.py-40- hpss,
# zstash/update.py-41- get_db_filename(cache),
# zstash/update.py-42- cache,
# zstash/update.py-43- keep=args.keep,
# zstash/update.py-44- is_index=True,
# zstash/update.py-45- transfer_manager=transfer_manager,
# zstash/update.py-46- )
# zstash/update.py-47-
```

So, you can see that unlike
Flowchart 1

Call hierarchy & some pseudo-code:

Comments on Flowchart 1:

Overall, Flowchart 1 looks good.

Flowchart 2a (left side)

Call hierarchy & some pseudo-code:

Overall, Flowchart 2a looks good.

Flowchart 2b (right side)

Call hierarchy & some pseudo-code:

Comments on Flowchart 2b:

Otherwise, Flowchart 2b looks good.

Action items for me
The new commit addresses those 2 action items. I haven't rerun the tests, but none of these changes should affect functionality. In particular,
Try writing that in that tiny little diamond :)
I was being creative there. By pointing to the entire "box" for "globus_transfer", it should imply invoking at "start", as there is nowhere else to start. I am trying to figure out how to fit the rest of the logic on that diagram. I'm only barely at the point where "block-wait" would apply.

Thanks @TonyB9000. For our purposes, I'd say the diagrams are most useful if they facilitate the code review. If we think we understand the logic well enough as-is and the tests are passing (which they are, last I checked), then we can probably go ahead and merge the PR. (And do future refactor work in #435.)
I also have performance testing set up in #427, so we can compare performance. I would hope nothing we've changed here would impact performance, but it would be good to check. I'll try to work on performance profiling today, but I'm not sure we have enough data points. I'm not sure when the release candidate deadline has been moved to, but I imagine it will be early next week.
It appears this PR has actually largely improved performance for Globus, but slightly degraded it for no-hpss/local and hpss. Additionally, with a limited number of runs, it's unclear to me how much of the Globus variation is inherent to Globus and not because of this PR.

Performance charts made using code from #427 comparing this branch to main:

Performance charts made using code from #427 for this branch:

Performance profiling setup steps

```bash
cd ~/ez/zstash
git status
# On branch add-performance-profiling
# nothing to commit, working tree clean

# Let's make a new branch with these commits AND the tar deletion commits
git checkout -b profile-refactored-tar-deletion
git log --oneline
# Good, has the correct commits
git fetch upstream issue-374-refactor-tar-deletion
git rebase upstream/issue-374-refactor-tar-deletion
git log --oneline
# Good, has commits from both

nersc_conda
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-profile-tar-deletion-20260403
conda activate zstash-profile-tar-deletion-20260403
pre-commit run --all-files
python -m pip install .

cd tests/performance
emacs generate_performance_data.bash  # Edit parameters
git diff  # Check diff
./generate_performance_data.bash
# ~2 hours to run; note there is the manual step to paste an auth code
# [SUCCESS] All tests completed. Results saved to: /pscratch/sd/f/forsyth/zstash_performance/performance_pr416_20260403/results.csv
# [INFO] Now edit IO paths and run: python visualize_performance.py

emacs visualize_performance.py  # Edit parameters
git diff  # Check diff
pre-commit run --all-files
git add -A
python visualize_performance.py
# Figure 1 (overview) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260403_pr416_vs_pr427.png
# Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_20260403_pr416_vs_pr427.png
# Figure 2 (baseline comparison) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260403_pr416_vs_pr427_vs_baseline.png
# Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_20260403_pr416_vs_pr427_vs_baseline.png
```
@forsyth2 Nice work! Fascinating result. I could be wrong (not sure how the OS, or Python, optimizes the stack) but reducing the call-depth may have a real impact as well.

I might try to run this again today or tomorrow to get another data point at least.

ChatGPT says

Well, that's a bit unfortunate. I find having a bunch of helper functions to be much cleaner/easier to maintain.
Giving it the pseudo-code above, it says:

My plan is to 1) run the performance check a second time on the same exact code to get more performance data, 2) time-permitting, try flattening the call structure in the inner file-processing loop and re-running performance checks. Alternatively, the second item could be done as part of #435 instead.
I think that would be valuable in general, get a sense of how much "natural variation" exists when running the same operations.
I agree. Pulling "common repeat" stuff into natural subroutines is easier to maintain, but incurs a performance burden. Eliminating "hpss_get()" and "hpss_put()", in favor of "hpss_transfer(..., 'get', ...)" and "hpss_transfer(..., 'put', ...)", would eliminate a stack call, but not really change maintainability. How it might improve performance is a question.
@forsyth2 Curious thought. One could have a "code flattener" that takes "well-subroutined" code and rewrites it with subroutines eliminated and code repeated everywhere, "bloating" the code in volume, but just for the purpose of compiling. You would always maintain the code with subroutines intact. Why is this not the practice, if Python calls are so expensive? (Not possible with recursion, of course.)
Hmm ChatGPT has a lot to say on that topic:
Full AI response

Because in Python, that kind of source-level "flattening" usually does not buy enough to justify the cost, and often does not attack the real source of overhead. A few reasons. First, manual or automatic inlining is much more valuable in compiled languages than in Python. In C/C++, when a compiler inlines a function, it can do much more than just remove the call:
In Python, eliminating a function call removes some interpreter/frame overhead, but the body is still executed by the Python interpreter one bytecode at a time. You do not suddenly unlock the same class of deep optimizations a native compiler would. So the payoff is smaller. Second, Python code is highly dynamic, which makes general-purpose source flattening awkward and fragile. A function call in Python is not always just “jump to fixed code”:
A flattener that preserves exact Python semantics in all those cases is hard to make. A flattener that ignores them would silently change behavior. Third, flattening duplicates bytecode and usually hurts maintainability and sometimes runtime characteristics:
So you trade a modest reduction in call overhead for bigger, uglier, harder-to-debug code. Fourth, the better optimization boundary in Python is often not “inline more Python,” but “do less Python.” The common winning moves are:
Those usually dominate any gain from flattening pure Python call structure. Fifth, Python already has better ways to preserve abstraction without paying as much at runtime, depending on the case:
That is a more targeted version of what your colleague is imagining. There is also a deeper point: a “code flattener” is basically trying to be a compiler optimization pass. But CPython is not a traditional optimizing compiler pipeline. It does not take your whole program, prove properties about it, inline aggressively, and emit optimized native code. Since Python remains dynamic at runtime, source rewriting has less leverage. So the answer to “why is this not the practice?” is roughly:
That said, the idea is not crazy. It does exist in neighboring forms:
So the real-world practice is not “flatten all Python before compiling,” but rather:
For your colleague’s idea, the closest practical statement is:
In your tar/transfer workflow, I would only consider this for the tiny per-file inner-loop helpers, not for the orchestration or I/O-heavy layers.
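One of the "more targeted" options the response alludes to, sketched as a micro-example (illustrative, not zstash code): keep the abstraction, but hoist a repeated attribute lookup out of a hot loop.

```python
# `out.append` is re-resolved on every iteration in the first version and
# bound to a local once in the second; same result, fewer lookups.
data = list(range(10_000))

def with_lookup():
    out = []
    for v in data:
        out.append(v)  # attribute lookup on every iteration
    return out

def hoisted():
    out = []
    append = out.append  # bound method looked up once
    for v in data:
        append(v)
    return out

print(hoisted() == with_lookup())  # True
```

This is the kind of local, measurable change that preserves the subroutine structure while trimming interpreter overhead where it actually accumulates.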
@forsyth2 Very nice summary. I'd like to know how much overhead (if any) is incurred by the "hpss_put()"-type call-stack lengthening conducted only to "rename" a function (almost purely a "pass-through"). Experiment: Call simple function "X" 100 times. Then rewrite the code to call function "A" 100 times, where A() calls B(), calls C(), calls D() ... calls X(). Any parameters are simply passed through. Test performance burden as call length increases.
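That experiment is easy to sketch with timeit. Absolute numbers are machine-dependent, but on CPython the pass-through chain reliably costs more per call; all names here are illustrative:

```python
# Time calling x() directly vs. through a chain of pure pass-through wrappers,
# mimicking hpss_put -> hpss_transfer style call-stack lengthening.
import timeit

def x(n):
    return n + 1

def make_chain(depth):
    """Wrap x in `depth` pass-through functions."""
    f = x
    for _ in range(depth):
        def wrapper(n, _inner=f):  # pure pass-through
            return _inner(n)
        f = wrapper
    return f

chain10 = make_chain(10)
assert chain10(7) == 8  # same answer, longer call stack

direct = timeit.timeit(lambda: x(1), number=200_000)
deep = timeit.timeit(lambda: chain10(1), number=200_000)
print(deep > direct)  # expect True on CPython: extra frames cost real time
```

Running this across depths 1, 5, 10, 20 would give exactly the "burden vs. call length" curve the experiment asks for.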
Ok, here's another data point.

Figure 1

Performance profile running on the exact same code as the first run:
Figure 2

Comparing those results to
It seems like again we come to the same conclusion: Globus runs are much faster and the no-HPSS and HPSS runs are a little slower. Considering (1) I believe most users use

Figure 3

Comparing those results instead to the results from the first run:
It appears the Globus runs are not too too different in runtime. Only two runs really stand out in terms of consistency between performance profiles:
Setup

Setup steps:

```bash
# 2026-04-06
# Running on Bebop since Chrysalis is down.
cd ~/ez/zstash
git status
# On branch profile-refactored-tar-deletion
git diff --staged | cat
# diff --git a/tests/performance/generate_performance_data.bash b/tests/performance/generate_performance_data.bash
# index 83bc913..c313bf1 100755
# --- a/tests/performance/generate_performance_data.bash
# +++ b/tests/performance/generate_performance_data.bash
# @@ -15,7 +15,7 @@ set -e
# # Run from Perlmutter, so that we can do both
# # a direct transfer to HPSS & a Globus transfer to Chrysalis
# work_dir=/pscratch/sd/f/forsyth/zstash_performance/
# -unique_id=performance_20260402
# +unique_id=performance_pr416_20260403
# dir_to_copy_from=/global/cfs/cdirs/e3sm/forsyth/E3SMv2/v2.LR.historical_0201/
# subdir0=build/
# diff --git a/tests/performance/visualize_performance.py b/tests/performance/visualize_performance.py
# index 79e5f7f..32266e6 100644
# --- a/tests/performance/visualize_performance.py
# +++ b/tests/performance/visualize_performance.py
# @@ -56,17 +56,19 @@ import pandas as pd
# # The results to show in Fig. 1
# RESULTS_CSV: str = (
# - "/pscratch/sd/f/forsyth/zstash_performance/performance_20260402/results.csv"
# + "/pscratch/sd/f/forsyth/zstash_performance/performance_pr416_20260403/results.csv"
# )
# # The results to compare against in Fig. 2.
# # Set to None to skip Fig. 2.
# -BASELINE_RESULTS_CSV: Optional[str] = None
# +BASELINE_RESULTS_CSV: Optional[str] = (
# + "/pscratch/sd/f/forsyth/zstash_performance/performance_20260402/results.csv"
# +)
# # Output path for the saved figures.
# # Set to None to display interactively instead of saving.
# OUTPUT_PATH: Optional[str] = (
# - "/global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance__20260402_pr427.png"
# + "/global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260403_pr416_vs_pr427.png"
# )
# # ---------------------------------------------------------------------------
nersc_conda
conda activate zstash-pr427-performance-profile-20260402
git add -A
pre-commit run --all-files
git commit -m "Profiling 20260403"
# Edit tests/performance/generate_performance_data.bash
# unique_id=performance_pr416_20260406
python -m pip install .
cd tests/performance
git diff # Check diff
./generate_performance_data.bash
# ~2-2.5 hours to run, note there is the manual step to paste an auth code (about 5-10 minutes into run time)
# [SUCCESS] All tests completed. Results saved to: /pscratch/sd/f/forsyth/zstash_performance/performance_pr416_20260406/results.csv
# [INFO] Now edit IO paths and run: python visualize_performance.py
# First, we'll compare against `main`.
# Edit tests/performance/visualize_performance.py
# RESULTS_CSV => /pscratch/sd/f/forsyth/zstash_performance/performance_pr416_20260406/results.csv
# OUTPUT_PATH => /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260406_pr416_vs_pr427.png
git diff # Check diff
pre-commit run --all-files
git add -A
python visualize_performance.py
# Figure 1 (overview) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260406_pr416_vs_pr427.png
# Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_20260406_pr416_vs_pr427.png
# Figure 2 (baseline comparison) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260406_pr416_vs_pr427_vs_baseline.png
# Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_20260406_pr416_vs_pr427_vs_baseline.png
# Second, we'll compare against the performance profile of the same exact code.
# Edit tests/performance/visualize_performance.py
# BASELINE_RESULTS_CSV => /pscratch/sd/f/forsyth/zstash_performance/performance_pr416_20260403/results.csv
# OUTPUT_PATH => /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260406_pr416_vs_20260403_pr416.png
git diff # Check diff
pre-commit run --all-files
git add -A
python visualize_performance.py
# Figure 1 (overview) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260406_pr416_vs_20260403_pr416.png
# Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_20260406_pr416_vs_20260403_pr416.png
# Figure 2 (baseline comparison) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_20260406_pr416_vs_20260403_pr416_vs_baseline.png
# Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_20260406_pr416_vs_20260403_pr416_vs_baseline.png
```
@TonyB9000 Do you have any comments on the plots above? I think this PR should be good to merge. (And we can do more code cleanup in #435 of course). |
So, I will merge this now. |





Summary
Objectives:

- Ensure tars are deleted when --non-blocking is set.

Issue resolution:

- --non-blocking completes #383 (closed because of significant rebase conflicts), Delete transferred files #405 (closed because of design not working properly)
Big Change
1. Does this do what we want it to do?
Required:
2. Are the implementation details accurate & efficient?
Required:
3. Is this well documented?
Required:
4. Is this code clean?
Required:
If applicable: