Conversation

@forsyth2
Collaborator

@forsyth2 forsyth2 commented Nov 24, 2025

Summary

Objectives:

  • Delete tars eventually, even if --non-blocking is set

Issue resolution:

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Big Change

  • To merge, I will use "Create a merge commit". That is, this change is large enough to require multiple units of work (i.e., it should be multiple commits).

1. Does this do what we want it to do?

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added at least one automated test. Every objective above is represented in at least one test.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

  • Testing: this pull request adds at least one new possible command line option. I have tested using this option with and without any other option that may interact with it.
    • N/A

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

  • Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zstash/conda, not just an import statement.
    • N/A

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.
    • This fix makes zstash behave as expected based on the docs.

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commit checks have passed.

If applicable:

  • Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

@forsyth2 forsyth2 self-assigned this Nov 24, 2025
@forsyth2 forsyth2 added semver: bug Bug fix (will increment patch version) Globus Globus labels Nov 24, 2025
@forsyth2 forsyth2 force-pushed the issue-374-tar-deletion-rebased20251124 branch 2 times, most recently from 2252767 to 552daae Compare November 24, 2025 23:48
@forsyth2
Collaborator Author

Visually compared diffs of #383 and this PR; they appear to match up correctly.

@forsyth2
Collaborator Author

Note the first commit (183f024) is cherry-picked from #404

@forsyth2 forsyth2 force-pushed the issue-374-tar-deletion-rebased20251124 branch from 96dda0f to 7ead900 Compare December 3, 2025 23:44
@forsyth2
Collaborator Author

forsyth2 commented Dec 3, 2025

Merged #404 (the test that fails on main and should pass with this fix), and rebased this PR off the latest main.

@forsyth2
Collaborator Author

forsyth2 commented Dec 3, 2025

Indeed, we now get:

==========================================
TEST RESULTS
==========================================
✓ blocking_non-keep PASSED
✓ non-blocking_non-keep PASSED
✓ blocking_keep PASSED
✓ non-blocking_keep PASSED
==========================================
TEST SUMMARY
==========================================
Total tests: 4
Passed: 4
Failed: 0
==========================================
All globus tar deletion tests completed successfully.

Remaining TODO:

  • Visual code review
  • Run full zstash test suite
  • Formal code review

@forsyth2
Collaborator Author

forsyth2 commented Dec 4, 2025

Run full zstash test suite

Unfortunately, the globus_auth test caught an error. I will need to do some more debugging.

Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/pr405-tar-deletion-20251203/lib/python3.13/site-packages/zstash/ls.py", line 105, in ls_database
    hpss_get(hpss, get_db_filename(cache), cache)
    ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/pr405-tar-deletion-20251203/lib/python3.13/site-packages/zstash/hpss.py", line 173, in hpss_get
    hpss_transfer(hpss, file_path, "get", cache, False)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/pr405-tar-deletion-20251203/lib/python3.13/site-packages/zstash/hpss.py", line 116, in hpss_transfer
    raise RuntimeError(
        "Scheme is 'globus' but no GlobusTransferCollection provided"
    )
RuntimeError: Scheme is 'globus' but no GlobusTransferCollection provided

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/pr405-tar-deletion-20251203/bin/zstash", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/pr405-tar-deletion-20251203/lib/python3.13/site-packages/zstash/main.py", line 75, in main
    ls()
    ~~^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/pr405-tar-deletion-20251203/lib/python3.13/site-packages/zstash/ls.py", line 34, in ls
    matches: List[FilesRow] = ls_database(args, cache)
                              ~~~~~~~~~~~^^^^^^^^^^^^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/pr405-tar-deletion-20251203/lib/python3.13/site-packages/zstash/ls.py", line 107, in ls_database
    raise FileNotFoundError("There was nothing to ls.")
FileNotFoundError: There was nothing to ls.
Expected grep 'file_empty.txt' not found in run1_ls.log. Test failed.

@forsyth2
Collaborator Author

forsyth2 commented Dec 8, 2025

Fix & Chrysalis tests

cd ~/ez/zstash
git status
# On branch improve-globus-refresh

# Handle uncommitted changes
git add -A
git commit -m "Testing" --no-verify

git checkout issue-374-tar-deletion-rebased20251124
lcrc_conda
conda activate pr405-tar-deletion-20251203
git log
# Good, matches the 2 commits of https://github.com/E3SM-Project/zstash/pull/405/commits

git diff 01ffe34230c357145258544ed26d0b64327323ed 06255950a408a2fabb1606ebbbc718ed06c49697 | cat
# Review diff

# Make fixes and then:
pre-commit run --all-files
python -m pip install .

# Main test
cd tests/integration/bash_tests/run_from_any
./test_globus_tar_deletion.bash 20251208_pr405 /home/ac.forsyth2/ez/zstash /home/ac.forsyth2/zstash_tests LCRC_IMPROV_DTN_ENDPOINT yes

That gives:

==========================================
TEST RESULTS
==========================================
✓ blocking_non-keep PASSED
✓ non-blocking_non-keep PASSED
✓ blocking_keep PASSED
✓ non-blocking_keep PASSED
==========================================
TEST SUMMARY
==========================================
Total tests: 4
Passed: 4
Failed: 0
==========================================
All globus tar deletion tests completed successfully.

Run full test suite:

cd ~/ez/zstash
pytest tests/unit/test_*.py
# 1 passed in 1.61s
python -m unittest tests/integration/python_tests/group_by_command/test_*.py
# Ran 69 tests in 61.546s
# OK (skipped=32)
python -m unittest tests/integration/python_tests/group_by_workflow/test_*.py
# Ran 4 tests in 3.378s
# OK

cd tests/integration/bash_tests/run_from_any/
./globus_auth.bash pr405_full_test_try2 chrysalis /home/ac.forsyth2/ez/zstash /home/ac.forsyth2/zstash_tests /global/homes/f/forsyth/zstash_tests /home/f/forsyth/zstash_tests /compyfs/fors729/zstash_tests
# All globus_auth tests completed successfully.
# Good, the fix worked!

cd ~/ez/zstash
cd tests/integration/bash_tests/run_from_chrysalis/
# Revoke consents: https://auth.globus.org/v2/web/consents > Globus Endpoint Performance Monitoring > rescind all
rm ~/.zstash_globus_tokens.json
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
# NERSC_PERLMUTTER_ENDPOINT=6bdc7956-fc0f-4ad2-989c-7aa5ee643a79
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79//global/homes/f/forsyth/zstash/tests/test_database_corruption_setup23 zstash_demo
# Paste an auth code here, but NOT during the database_corruption test.
rm -rf zstash_demo/
time ./database_corruption.bash pr405_test_db
# Success count: 25
# Fail count: 0
# Review: 
# real	8m0.382s

time ./symlinks.sh 
# ./symlinks.sh: line 18: cd: /home/ac.forsyth2/ez/zstash/tests/utils/test_symlinks: No such file or directory
mkdir -p /home/ac.forsyth2/ez/zstash/tests/utils/test_symlinks
time ./symlinks.sh
# real	0m8.316s 
# Good, no errors

cd ~/ez/zstash
git status
rm -rf tests/integration/bash_tests/run_from_chrysalis/workdir/
pre-commit run --all-files
git add -A
git commit -m "Fix gtc handling for get"
git push upstream issue-374-tar-deletion-rebased20251124

Perlmutter tests

cd ~/ez/zstash
git status
# On branch test_unified_1.12.0rc4_perlmutter
# nothing to commit, working tree clean
git fetch upstream issue-374-tar-deletion-rebased20251124
git checkout -b issue-374-tar-deletion-rebased20251124 upstream/issue-374-tar-deletion-rebased20251124

nersc_conda # Activate conda.
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n pr405-tar-deletion-20251208
conda activate pr405-tar-deletion-20251208
pre-commit run --all-files
python -m pip install .

cd ~/ez/zstash
pytest tests/unit/test_*.py
# 1 passed in 0.21s
python -m unittest tests/integration/python_tests/group_by_command/test_*.py

That gives 3 failures. (Note that these failures newly appear because the HPSS tests only work on Perlmutter.)

======================================================================
FAIL: testUpdateCacheHPSS (tests.integration.python_tests.group_by_command.test_update.TestUpdate.testUpdateCacheHPSS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/test_update.py", line 194, in testUpdateCacheHPSS
    self.helperUpdateCache("testUpdateCacheHPSS", HPSS_ARCHIVE)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/test_update.py", line 142, in helperUpdateCache
    self.stop(error_message)
    ~~~~~~~~~^^^^^^^^^^^^^^^
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/base.py", line 141, in stop
    self.fail(error_message)
    ~~~~~~~~~^^^^^^^^^^^^^^^
AssertionError: The zstash cache does not contain expected files.
It has: ['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']

======================================================================
FAIL: testUpdateKeepHPSS (tests.integration.python_tests.group_by_command.test_update.TestUpdate.testUpdateKeepHPSS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/test_update.py", line 187, in testUpdateKeepHPSS
    self.helperUpdateKeep("testUpdateKeepHPSS", HPSS_ARCHIVE)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/test_update.py", line 112, in helperUpdateKeep
    self.stop(error_message)
    ~~~~~~~~~^^^^^^^^^^^^^^^
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/base.py", line 141, in stop
    self.fail(error_message)
    ~~~~~~~~~^^^^^^^^^^^^^^^
AssertionError: The zstash cache does not contain expected files.
It has: ['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']

======================================================================
FAIL: testUpdateNonEmptyHPSS (tests.integration.python_tests.group_by_command.test_update.TestUpdate.testUpdateNonEmptyHPSS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/test_update.py", line 201, in testUpdateNonEmptyHPSS
    self.helperUpdateNonEmpty("testUpdateNonEmptyHPSS", HPSS_ARCHIVE)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/test_update.py", line 166, in helperUpdateNonEmpty
    self.stop(error_message)
    ~~~~~~~~~^^^^^^^^^^^^^^^
  File "/global/u1/f/forsyth/ez/zstash/tests/integration/python_tests/group_by_command/base.py", line 141, in stop
    self.fail(error_message)
    ~~~~~~~~~^^^^^^^^^^^^^^^
AssertionError: The zstash cache zstash_test/zstash does not contain expected files.
It has: ['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']

----------------------------------------------------------------------
Ran 69 tests in 321.441s

FAILED (failures=3)

python -m unittest tests/integration/python_tests/group_by_workflow/test_*.py
# Ran 4 tests in 2.650s
# OK

# Skipping tests/integration/bash_tests/run_from_any/ tests
# `./globus_auth.bash` is cumbersome to run, but passed on Chrysalis
# `./test_globus_tar_deletion.bash` is set up to run a transfer from an endpoint to itself,
# which has sometimes run into problems on Perlmutter. This test passed on Chrysalis.

cd tests/integration/bash_tests/run_from_perlmutter/
time ./follow_symlinks.sh
# real	0m32.711s
# Good, no errors
time ./test_update_non_empty_hpss.bash
# real	0m9.237s
# Good, no errors
time ./test_ls_globus.bash # Had to paste an auth-code
# real	0m43.061s
# Good, no errors

There are 3 HPSS test failures to account for. All other tests appear to be working now.

Once all tests are passing:

  • Do another visual review of the PR.
  • Squash commits into one and write a descriptive commit message.
  • Put into code review.

@forsyth2
Collaborator Author

forsyth2 commented Jan 2, 2026

Claude's review guide:

Code Review Guide: Fix Tar File Deletion After Globus Transfer

Overview

This diff addresses a bug where tar files were not being deleted after successful Globus transfers when --keep was not specified. The fix introduces proper transfer tracking and deferred deletion to handle Globus's asynchronous nature.

Key Changes

1. New Transfer Tracking Module (transfer_tracking.py)

  • Purpose: Centralize state management for transfers
  • Review Points:
    • GlobusTransfer class tracks individual transfer data, task IDs, and status
    • GlobusTransferCollection manages multiple transfers and shared state (endpoints, client)
    • HPSSTransferCollection tracks files queued for deletion
    • delete_transferred_files() and delete_current_files() handle actual deletion logic

Questions to ask:

  • Are the class responsibilities clear and well-separated?
  • Is the deletion logic safe (checks for file existence)?
  • Should transfers use collections.deque as the TODO suggests?

2. Refactored Global State (globus.py)

Before: Used module-level global variables

remote_endpoint = None
local_endpoint = None
transfer_client: TransferClient = None

After: State encapsulated in collection objects passed as parameters

Review Points:

  • All functions now receive gtc (GlobusTransferCollection) parameter
  • globus_activate() now returns GlobusTransferCollection instead of mutating globals
  • globus_transfer() creates new GlobusTransfer records and appends to collection
  • Thread safety: Is this code thread-safe now? (Likely wasn't before either)
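
A minimal sketch of the pattern described above, assuming a simplified stand-in for GlobusTransferCollection. The class, its fields, and the `activate` and `record_transfer` helpers are hypothetical illustrations, not zstash's actual signatures; the point is only that state returned from activation and passed along as `gtc` stays isolated per collection, unlike module-level globals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransferCollectionSketch:
    """Hypothetical stand-in for GlobusTransferCollection."""

    local_endpoint: Optional[str] = None
    remote_endpoint: Optional[str] = None
    # One entry per submitted transfer (task IDs only, for simplicity).
    task_ids: List[str] = field(default_factory=list)


def activate(local: str, remote: str) -> TransferCollectionSketch:
    # Return the state instead of assigning module-level globals.
    return TransferCollectionSketch(local_endpoint=local, remote_endpoint=remote)


def record_transfer(gtc: TransferCollectionSketch, task_id: str) -> None:
    # Every function receives the collection explicitly.
    gtc.task_ids.append(task_id)


gtc_a = activate("local-a", "remote-a")
gtc_b = activate("local-b", "remote-b")
record_transfer(gtc_a, "task-1")
print(gtc_a.task_ids, gtc_b.task_ids)
```

Passing the object explicitly does not by itself make the code thread-safe; it only removes the shared module state.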

3. Transfer Tracking Logic (hpss.py - hpss_transfer())

Critical Section (lines 93-97):

# Only track files for deletion if they should be deleted
# Don't track if keep=True or if it's an index file
if (not keep) and (not is_index):
    htc.curr_transfers.append(file_path)

Review Points:

  • ✅ Correctly excludes index.db from deletion tracking
  • ✅ Respects --keep flag
  • Verify: Should is_index check be redundant with proper keep flag usage?

Deletion Logic (lines 144-151):

if (not keep) and (not is_index):
    if scheme != "globus":
        # For direct HPSS, delete immediately since transfer is synchronous
        delete_current_files(htc)
    elif globus_status == "SUCCEEDED":
        # Note: This is intended to fulfill the default removal of successfully-transferred
        # tar files when keep=False, irrespective of non-blocking status
        delete_transferred_files(htc)

Review Points:

  • Direct HPSS transfers: Deletes immediately (synchronous)
  • Globus transfers: Only deletes if status is "SUCCEEDED"
  • Potential issue: In non-blocking mode, status might still be "ACTIVE" or "UNKNOWN" here
  • Check: Is deletion properly deferred to globus_finalize()?
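
The branches above can be read as a small decision table. The sketch below restates them as a pure function so each case can be checked in isolation; `deletion_action` and its return labels are hypothetical names for illustration, and the "defer" outcome reflects the reviewer's reading that anything not yet "SUCCEEDED" is left for finalization.

```python
def deletion_action(keep: bool, is_index: bool, scheme: str, globus_status: str) -> str:
    """Return which deletion step (if any) would run at this point."""
    if keep or is_index:
        return "none"
    if scheme != "globus":
        # Direct HPSS transfers are synchronous, so delete right away.
        return "delete_current_files"
    if globus_status == "SUCCEEDED":
        return "delete_transferred_files"
    # Globus transfer not (yet) known to have succeeded: defer.
    return "defer"


for status in ("SUCCEEDED", "ACTIVE", "UNKNOWN"):
    print(status, "->", deletion_action(False, False, "globus", status))
```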

4. Finalization Logic (globus.py - globus_finalize())

Key Addition (lines 258-263):

# Clean up tar files that were queued for deletion
if htc.curr_transfers:
    delete_transferred_files(htc)
if htc.prev_transfers:
    delete_transferred_files(htc)

Review Points:

  • Ensures cleanup happens after all transfers complete
  • Waits for transfers before cleanup (via globus_wait() calls)
  • Question: Why delete both curr_transfers and prev_transfers? Is this double-deletion safe?
  • Verify the state machine: curr_transfers → prev_transfers → deletion
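
One way to reason about the two guarded calls is to model the rotation directly. The sketch below assumes that delete_transferred_files removes the files in prev_transfers and then moves curr_transfers into prev_transfers; that behavior is an assumption based on the PR discussion, and `HTCSketch` and `delete_transferred_files_sketch` are hypothetical stand-ins rather than the real implementation.

```python
import os
import tempfile
from typing import List


class HTCSketch:
    """Hypothetical stand-in for HPSSTransferCollection."""

    def __init__(self) -> None:
        self.prev_transfers: List[str] = []  # Transferred; safe to remove
        self.curr_transfers: List[str] = []  # Possibly still in flight


def delete_transferred_files_sketch(htc: HTCSketch) -> None:
    # Assumed behavior: delete the previous batch, then rotate curr into prev.
    for path in htc.prev_transfers:
        if os.path.exists(path):
            os.remove(path)
    htc.prev_transfers = htc.curr_transfers
    htc.curr_transfers = []


workdir = tempfile.mkdtemp()
tars = [os.path.join(workdir, name) for name in ("000000.tar", "000001.tar")]
for tar in tars:
    with open(tar, "w") as f:
        f.write("placeholder")

htc = HTCSketch()
htc.prev_transfers = [tars[0]]
htc.curr_transfers = [tars[1]]

# Mirrors the two guarded calls in the finalization snippet.
if htc.curr_transfers:
    delete_transferred_files_sketch(htc)  # removes tars[0]; tars[1] moves to prev
if htc.prev_transfers:
    delete_transferred_files_sketch(htc)  # removes tars[1]; both queues now empty

remaining = sorted(os.listdir(workdir))
print(remaining, htc.prev_transfers, htc.curr_transfers)
```

Under that assumption, the second call is not a double deletion: it removes the batch that the first call rotated into prev_transfers.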

5. Exception Handling Refactor (create.py, update.py)

Before: Separate code paths for follow_symlinks
After: Single try-except with conditional error message

Review Points:

  • ✅ Reduces code duplication
  • ✅ Proper exception re-raising when follow_symlinks=False
  • Verify: Does this maintain the same behavior?
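
A rough sketch of the single try-except shape, for comparison against the diff. The wrapper name, the `add_files` callable, and the choice to catch FileNotFoundError are all assumptions made for illustration; only the error message is taken from the PR.

```python
from typing import Callable


def run_with_symlink_guard(add_files: Callable[[], None], follow_symlinks: bool) -> None:
    try:
        add_files()
    except FileNotFoundError:
        if follow_symlinks:
            # Broken symlink encountered while following links.
            raise Exception("Archive creation failed due to broken symlink.")
        # Otherwise, re-raise the original exception unchanged.
        raise


def broken() -> None:
    # Simulates hitting a dangling symlink target.
    raise FileNotFoundError("dangling_link")


for flag in (True, False):
    try:
        run_with_symlink_guard(broken, follow_symlinks=flag)
    except Exception as exc:
        print(flag, type(exc).__name__, exc)
```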

Testing Checklist

Functional Tests

  • Tar files deleted after successful Globus transfer (without --keep)
  • Tar files preserved with --keep flag
  • index.db never deleted
  • Non-blocking mode: files deleted after finalization
  • Blocking mode: files deleted immediately after each transfer
  • Direct HPSS transfers still work correctly
  • Multiple tar files in single run handled correctly

Edge Cases

  • Transfer fails: files should NOT be deleted
  • Mixed successful/failed transfers
  • Very long transfers with EXHAUSTED_TIMEOUT_RETRIES
  • Globus endpoint activation failures
  • File already deleted (verify graceful handling)

Regression Tests

  • --follow-symlinks with broken symlink behavior unchanged
  • --dry-run doesn't trigger deletions
  • Database corruption handling unchanged

Potential Issues to Investigate

1. Status Tracking in Non-Blocking Mode

In hpss_transfer(), status might be "UNKNOWN" or "ACTIVE" when deletion logic runs. Files might not get queued for deletion properly.

Check: Does globus_finalize() compensate for this?

2. Double Transfer Data Submission

In globus_finalize(), if there's pending transfer_data, it submits a new transfer. But globus_transfer() already submitted data.

Verify: Is this intentional for batching? Or could it duplicate transfers?

3. State Machine Clarity

The relationship between curr_transfers and prev_transfers is unclear:

  • When does curr become prev?
  • Why delete both in finalize?
  • What happens on consecutive calls?

Suggestion: Add state diagram or clearer comments

4. Error Paths

If globus_finalize() calls sys.exit(1) (line 236), are files left behind?

Check: Should there be cleanup in error paths?

5. Index Database Tracking

Lines commented with # Don't track index.db for deletion - verify this is consistently applied everywhere index.db is transferred.

Code Quality Notes

Strengths

  • ✅ Encapsulation improves over global variables
  • ✅ Type hints added (Optional[GlobusTransferCollection])
  • ✅ Logging messages are detailed and helpful
  • ✅ File existence checks before deletion

Improvements Needed

  • Documentation: Add docstrings to new classes and functions
  • State machine: Document the curr_transfers/prev_transfers lifecycle
  • Magic values: "SUCCEEDED", "ACTIVE" should be constants
  • Consider: Replace List with deque for transfer collections (performance)

Questions for Author

  1. Why is index.db excluded from htc tracking in some calls but the parameter is commented out?
  2. In non-blocking mode, how do you ensure files are deleted if status is still "ACTIVE" when hpss_transfer() returns?
  3. Should globus_wait() be called for ALL submitted transfers, not just the most recent?
  4. What's the expected behavior if a user Ctrl-C's during finalization?

Approval Checklist

  • Logic correctly handles both blocking and non-blocking modes
  • No files deleted when --keep is specified
  • index.db never deleted
  • State management is clear and correct
  • Error paths handle cleanup appropriately
  • Tests cover all new code paths
  • Documentation updated to reflect new behavior

Collaborator Author

@forsyth2 forsyth2 left a comment

Self-review using Claude's code-review guide. Currently (as of commit 4), the entire test suite is passing.

Self-review

Verifications

Verify: Should is_index check be redundant with proper keep flag usage?

Yes, index.db is treated differently than the tar files. keep applies to the tar files. The index.db is always kept.

Potential issue: In non-blocking mode, status might still be "ACTIVE" or "UNKNOWN" here
Check: Is deletion properly deferred to globus_finalize()?

Yes, the test added in #404 appears to confirm this.

Question: Why delete both curr_transfers and prev_transfers? Is this double-deletion safe?
Verify the state machine: curr_transfers → prev_transfers → deletion

I believe this is fine. By this point in the code, we've passed the # Wait for any submitted transfers to complete block and are just trying to clear the transfer queues.

Verify: Does this maintain the same behavior?

I believe so; no symlink test was broken.

Tar files deleted after successful Globus transfer (without --keep)
Tar files preserved with --keep flag
index.db never deleted
Non-blocking mode: files deleted after finalization

Yes, in the test added in #404.

Direct HPSS transfers still work correctly

Yes, no HPSS test was broken.

Very long transfers with EXHAUSTED_TIMEOUT_RETRIES

This is more the domain of #407 (which was recently determined to already be resolved on main).

--follow-symlinks with broken symlink behavior unchanged

Yes, no symlink test broke.

--dry-run doesn't trigger deletions

Verified by visual inspection of --dry-run code path.

Database corruption handling unchanged

Yes, that test didn't break.

In hpss_transfer(), status might be "UNKNOWN" or "ACTIVE" when deletion logic runs. Files might not get queued for deletion properly.
Check: Does globus_finalize() compensate for this?

Yes. By this point in the code, we've passed the # Wait for any submitted transfers to complete block and are just trying to clear the transfer queues.

In globus_finalize(), if there's pending transfer_data, it submits a new transfer. But globus_transfer() already submitted data.
Verify: Is this intentional for batching? Or could it duplicate transfers?

This is explicitly in a # Check if there's any pending transfer data that hasn't been submitted yet block.

When does curr become prev?
Why delete both in finalize?
What happens on consecutive calls?

curr is the list of tars being transferred via Globus right now. prev is the previous list, i.e., the tars that are now ok to delete because they've been successfully transferred. Both are deleted in finalize because globus_wait has completed by then and all queues should be cleared.

If globus_finalize() calls sys.exit(1) (line 236), are files left behind?
Check: Should there be cleanup in error paths?

We probably should not be deleting in the case of error.

Lines commented with # Don't track index.db for deletion - verify this is consistently applied everywhere index.db is transferred.

Yes, otherwise the tests would not be passing.

Why is index.db excluded from htc tracking in some calls but the parameter is commented out?

This is referring to:

    hpss_put(
        hpss,
        get_db_filename(cache),
        cache,
        keep=args.keep,
        is_index=True,
        gtc=gtc,
        # htc=htc,  # Don't track index.db for deletion
    )

It's commented out there to make specific note that we don't need to track index.db, only the tars.

In non-blocking mode, how do you ensure files are deleted if status is still "ACTIVE" when hpss_transfer() returns?

That's the point of the wait logic.

Should globus_wait() be called for ALL submitted transfers, not just the most recent?

No, that would render --non-blocking useless. The point is to wait on blocking transfers and at the very end.

What's the expected behavior if a user Ctrl-C's during finalization?

I imagine we'd want to keep the files since we're not sure they're transferred yet then.

Not explicitly tested/verified

Blocking mode: files deleted immediately after each transfer

The test from #404 does test that the files are deleted, but it doesn't explicitly test that deletion happens after each transfer.

Multiple tar files in single run handled correctly

The test from #404 is only testing one tar file.

Transfer fails: files should NOT be deleted
Mixed successful/failed transfers
Globus endpoint activation failures

How do we force a transfer failure?
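
One possible approach, short of breaking a real transfer, is to stub the transfer client in a unit test so that it reports a failed task. The sketch below is only an illustration under assumptions: `delete_if_succeeded` is a hypothetical helper (not a zstash function), and the stub assumes the status is read from a task document with a "status" key returned by a `get_task` call.

```python
import os
import tempfile
from unittest import mock


def delete_if_succeeded(transfer_client, task_id: str, path: str) -> bool:
    """Hypothetical helper: remove path only when the task has succeeded."""
    task = transfer_client.get_task(task_id)
    if task["status"] == "SUCCEEDED":
        os.remove(path)
        return True
    return False


# Create a placeholder tar file to stand in for a transferred archive.
fd, tar_path = tempfile.mkstemp(suffix=".tar")
os.close(fd)

# The stubbed client always reports a failed task.
fake_client = mock.Mock()
fake_client.get_task.return_value = {"status": "FAILED"}

deleted = delete_if_succeeded(fake_client, "fake-task-id", tar_path)
still_there = os.path.exists(tar_path)
print(deleted, still_there)

os.remove(tar_path)  # clean up the temporary file
```

The same stub could return a different status per call to cover the mixed successful/failed case.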

File already deleted (verify graceful handling)

We'd have to concurrently delete files while the test was transferring data.
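
A simpler alternative might be to remove a tracked file before the deletion step runs, which exercises the same "already gone" path without any concurrency. The sketch below assumes a deletion helper that tolerates missing files; `remove_files_gracefully` is a hypothetical name, not the function in this PR.

```python
import os
import tempfile
from typing import List


def remove_files_gracefully(paths: List[str]) -> List[str]:
    """Hypothetical helper: remove each path, skipping files already gone."""
    removed = []
    for path in paths:
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            # Already deleted elsewhere; nothing to do.
            pass
    return removed


workdir = tempfile.mkdtemp()
present = os.path.join(workdir, "000000.tar")
missing = os.path.join(workdir, "000001.tar")  # never created
with open(present, "w") as f:
    f.write("placeholder")

removed = remove_files_gracefully([present, missing])
print(len(removed), os.listdir(workdir))
```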

Possible improvements

Magic values: "SUCCEEDED", "ACTIVE" should be constants

I can make an enum for this.
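
A sketch of what such an enum could look like. The name `TaskStatus` and the `parse_status` helper are hypothetical; the member values are the status strings mentioned in this PR. Mixing in `str` keeps existing comparisons against plain strings working.

```python
from enum import Enum


class TaskStatus(str, Enum):
    # Values mirror the status strings mentioned in this PR.
    ACTIVE = "ACTIVE"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    INACTIVE = "INACTIVE"
    UNKNOWN = "UNKNOWN"


def parse_status(raw: str) -> TaskStatus:
    """Map a raw status string to the enum, falling back to UNKNOWN."""
    try:
        return TaskStatus(raw)
    except ValueError:
        return TaskStatus.UNKNOWN


status = parse_status("SUCCEEDED")
print(status is TaskStatus.SUCCEEDED, status == "SUCCEEDED")
```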

Consider: Replace List with deque for transfer collections (performance)

That seems reasonable. (There probably aren't that many tars batched together for each transfer, but there's not really a downside to switching to deque).
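
For reference, a small sketch of the queue usage that `collections.deque` is suited to, assuming the transfer list is consumed from the front. The file names are placeholders.

```python
from collections import deque
from typing import Deque, List

curr_transfers: Deque[str] = deque()
for name in ("000000.tar", "000001.tar", "000002.tar"):
    curr_transfers.append(name)

# popleft() is O(1); list.pop(0) shifts every remaining element.
processed: List[str] = []
while curr_transfers:
    processed.append(curr_transfers.popleft())

print(processed, len(curr_transfers))
```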

Possible concerns:

Thread safety: Is this code thread-safe now? (Likely wasn't before either)

@forsyth2 forsyth2 marked this pull request as ready for review January 2, 2026 19:43
Collaborator Author

@forsyth2 forsyth2 left a comment

Adding explanatory notes

- logger.debug(f"{ts_utc()}:Calling globus_activate(hpss)")
- globus_activate(hpss)
+ logger.debug(f"{ts_utc()}:Calling globus_activate()")
+ gtc = globus_activate(hpss)
Collaborator Author

Using a new object to hold state data, rather than using global variables.

# Create and set up the database
logger.debug(f"{ts_utc()}: Calling create_database()")
failures: List[str] = create_database(cache, args)
htc: HPSSTransferCollection = HPSSTransferCollection()
Collaborator Author

Same idea of using objects to store data. Don't confuse with GlobusTransferCollection().

keep=args.keep,
is_index=True,
gtc=gtc,
# htc=htc, # Don't track index.db for deletion
Collaborator Author

We don't need to track index.db because we never delete it.

Comment on lines +301 to +302
if args.follow_symlinks:
    raise Exception("Archive creation failed due to broken symlink.")
Collaborator Author

This is just to make the code easier to read and avoid an identical function call being written out twice (once in each part of the if/else block).

sys.exit(1)


def file_exists(name: str) -> bool:
Collaborator Author

Moved to zstash/globus_utils.py

Comment on lines +17 to +18
# ACTIVE, SUCCEEDED, FAILED, INACTIVE
self.task_status: Optional[str] = None
Collaborator Author

Possible improvement: could replace with an enum

Comment on lines +43 to +44
self.prev_transfers: List[str] = [] # Can remove
self.curr_transfers: List[str] = [] # Still using!
Collaborator Author

Possible improvement: use deque rather than List

logger.debug(f"{ts_utc()}: HPSSTransferCollection initialized")


def delete_transferred_files(htc: HPSSTransferCollection):
Collaborator Author

Used in Globus case: we can delete the previous files and rotate the current transfer list to be the "previous" list.

logger.debug(f"{ts_utc()}: prev_transfers has been set to {htc.prev_transfers}")


def delete_current_files(htc: HPSSTransferCollection):
Collaborator Author

Used in HPSS case: we can delete the current files right away because we don't have to wait long for a transfer.

Comment on lines +296 to +297
if args.follow_symlinks:
    raise Exception("Archive update failed due to broken symlink.")
Collaborator Author

Same update to the follow_symlinks exception wrapping as in create.py

@forsyth2 forsyth2 marked this pull request as draft January 2, 2026 22:02
@forsyth2
Collaborator Author

forsyth2 commented Jan 2, 2026

Converted back to draft. Upon further self-review, need to resolve a few things:

  • This code is only deleting the tars at the end, not after each successful transfer (as would be expected in --non-blocking mode). The test from #404 (Add test for tar deletion) should also be updated to reflect this.
  • The TransferCollection could use some refactoring. It may also be beneficial to implement the PR as two commits: 1) refactoring, 2) actual behavior changes.

@forsyth2
Collaborator Author

forsyth2 commented Jan 3, 2026

Closing in favor of #416

@forsyth2 forsyth2 closed this Jan 3, 2026
Labels

Globus Globus semver: bug Bug fix (will increment patch version)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: tar files are not deleted after successful globus transfer

2 participants