Concurrency fix for Global cache PoC#6832
Draft
jorgee wants to merge 1 commit intoglobal-cache-playgroundfrom
Draft
Concurrency fix for Global cache PoC#6832jorgee wants to merge 1 commit intoglobal-cache-playgroundfrom
jorgee wants to merge 1 commit intoglobal-cache-playgroundfrom
Conversation
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request introduces distributed locking for task work directory creation in Nextflow, enabling safe concurrent execution in environments using a global cache (such as cloud storage). It adds a new
GlobalCacheLockManagerthat uses atomic file creation for distributed locks, updates the S3 file system provider to support atomic file creation withCREATE_NEWsemantics, and refactors locking interfaces for extensibility. Comprehensive tests for the new lock manager are also included.Distributed Locking for Global Cache:
GlobalCacheLockManager(GlobalCacheLockManager.groovy), which implements distributed locks using atomic file creation in cloud storage, and theLockHandleinterface to standardize lock release. ([[1]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-568d1408d477494155d7329f8929018eb157a09264142a10c39ab24f73a782d6R1-R122),[[2]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-5085a999ac12b57fa5d9a066631a888fb8855a1c4bc44c595ca3814979414528R1-R23))TaskProcessorto usesafeCheckAndCreateDir, ensuring that only one pipeline creates a task work directory when using a global cache, preventing race conditions. ([[1]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-3250bcd11fe7bb7aeeaf9839c44ac5d737dc27ddc9a734954789d09eb9dab399L825-R841),[[2]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-3250bcd11fe7bb7aeeaf9839c44ac5d737dc27ddc9a734954789d09eb9dab399R853-R888),[[3]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-3250bcd11fe7bb7aeeaf9839c44ac5d737dc27ddc9a734954789d09eb9dab399R100))Atomic File Creation Support for S3 and Az:
S3FileSystemProviderandS3OutputStreamto support theCREATE_NEWoption, usingIf-None-Match: *precondition for atomic object creation. ThrowsFileAlreadyExistsExceptionif the object already exists, ensuring correct lock semantics. ([[1]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-0fe8c2b4ba2ecc70a4c3409eafb255f6f3a3abb2a0ead25f6290a5b55e5faa38R223-R224),[[2]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-0fe8c2b4ba2ecc70a4c3409eafb255f6f3a3abb2a0ead25f6290a5b55e5faa38L237-R239),[[3]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-0fe8c2b4ba2ecc70a4c3409eafb255f6f3a3abb2a0ead25f6290a5b55e5faa38L261-R263),[[4]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-0fe8c2b4ba2ecc70a4c3409eafb255f6f3a3abb2a0ead25f6290a5b55e5faa38L336-R338),[[5]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-0fe8c2b4ba2ecc70a4c3409eafb255f6f3a3abb2a0ead25f6290a5b55e5faa38L349-R352),[[6]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-9be7f9a49f9f4da9db70ad595bc09e6396b5b5bb1326640548d0c40405c61519R27-R37),[[7]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-9be7f9a49f9f4da9db70ad595bc09e6396b5b5bb1326640548d0c40405c61519R147-R151),[[8]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-9be7f9a49f9f4da9db70ad595bc09e6396b5b5bb1326640548d0c40405c61519R214-R218),[[9]](https://github.com/nextflow-io/nextflow/pull/6832/files#diff-9be7f9a49f9f4da9db70ad595bc09e6396b5b5bb1326640548d0c40405c61519R634-R653))Testing:
GlobalCacheLockManagerTest,S3GlobalCacheLockManagerTest,AzGlobalCacheLockManagerTestandGsGlobalCacheLockManagerTestwith comprehensive tests for lock acquisition, release, concurrency, and error handling to ensure robust distributed locking behavior.