Delete datasets on S3 #8924
Conversation
📝 Walkthrough
Centralizes dataset deletion into a new DatasetService.deleteDataset flow that deletes data from disk and managed S3, adds ManagedS3Service and S3 utilities, refactors path-centric DAOs/clients/routes (including DELETE-with-JSON), updates controllers/routes/upload and datastore services, and tightens UPath behavior and tests.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (3 passed).
Actionable comments posted: 6
🧹 Nitpick comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
86-97: Consider parallelizing prefix deletion.
Line 92 uses Fox.serialCombined to process deletions sequentially. For datasets with many prefixes, this could be slow. Consider whether parallel processing (e.g., Fox.combined) would be more appropriate, especially since S3 operations can benefit from concurrency.
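For illustration, a minimal sketch of what the parallel variant could look like. This assumes Fox.combined collects a collection of Fox values analogously to Fox.serialCombined; deleteS3PathsAtPrefix is a hypothetical stand-in for the existing per-prefix deletion, not the actual service API:

```scala
// Sketch only: start all prefix deletions eagerly so the S3 requests run concurrently,
// then collect them into a single Fox.
def deletePrefixesInParallel(prefixes: Seq[String])(implicit ec: ExecutionContext): Fox[Unit] = {
  val deletions: Seq[Fox[Unit]] = prefixes.map(prefix => deleteS3PathsAtPrefix(prefix))
  Fox.combined(deletions).map(_ => ())
}
```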
frontend/javascripts/admin/rest_api.ts (1)
1381-1385: LGTM! Endpoint updated to match backend route change.
The URL change from /api/datasets/${datasetId}/deleteOnDisk to /api/datasets/${datasetId} correctly aligns with the backend controller refactoring where deleteOnDisk was renamed to delete.
Consider renaming the function from deleteDatasetOnDisk to deleteDataset in a follow-up to match the backend's naming and better reflect that it now handles both on-disk and S3 deletions.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
304-315: Consider error handling for partial failures.
The method correctly splits paths into local and managed S3, then processes each group. However, Fox.serialCombined will fail-fast on the first error for local paths, and if local deletions succeed but S3 deletions fail (or vice versa), the operation will be partially complete.
For idempotent cleanup operations this may be acceptable, but consider whether you need:
- Parallel local path deletion for performance
- All-or-nothing semantics with rollback
- Aggregated error reporting showing all failures
If the current fail-fast behavior is intended, you might want to document this in a comment or method documentation.
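For reference, a condensed sketch of the split-then-delete flow described above. deleteLocalPath is a hypothetical helper standing in for the PathUtils.deleteDirectoryRecursively call on the resolved local directory; the real method is DataSourceService.deletePathsFromDiskOrManagedS3:

```scala
// Sketch of the described flow, not the actual implementation.
def deletePathsFromDiskOrManagedS3Sketch(paths: Seq[UPath])(implicit ec: ExecutionContext): Fox[Unit] = {
  val (localPaths, remotePaths) = paths.partition(_.isLocal)
  val managedS3Paths = remotePaths.filter(managedS3Service.pathIsInManagedS3)
  for {
    _ <- Fox.serialCombined(localPaths)(deleteLocalPath) // fail-fast on the first local error
    _ <- managedS3Service.deletePaths(managedS3Paths)    // single batched call for managed S3
  } yield ()
}
```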
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
438-448: Endpoint host derivation may drop port and assumes credentialName is a URI.
Using new URI(...).getHost loses ports (e.g., MinIO on :9000) and returns null if credentialName isn't a URI literal.
Consider:
- Parse authority instead of host to retain host:port.
- Guard nulls and fail fast with a clear error.
Example:
```diff
- endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
- newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+ val endpointUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
+ val authority = Option(endpointUri.getAuthority).getOrElse(endpointUri.getHost)
+ newBasePath <- UPath.fromString(s"s3://$authority/$s3UploadBucket/$s3ObjectKey").toFox
```
Confirm Datastore.S3Upload.credentialName is always a valid URI and whether non-default ports must be preserved in UPath for your downstream S3 handlers.
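For context, this is how java.net.URI behaves in the cases discussed; this is standard-library behavior, independent of this codebase (the example hostnames are placeholders):

```scala
import java.net.URI

val withPort = new URI("https://minio.example:9000")
withPort.getHost      // "minio.example" – the port is dropped
withPort.getAuthority // "minio.example:9000" – host and port preserved

val shortStyle = new URI("s3://my-bucket/some/key")
shortStyle.getHost    // "my-bucket"

val notAUri = new URI("my-credential-name") // parses as a relative reference
notAUri.getHost       // null – later string interpolation would produce "s3://null/..."
```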
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
428-435: Consider logging and basic input guards for deletePaths.
Add a brief info log (count and first path) and short-circuit empty lists to reduce no-op calls and aid auditing.
```diff
 def deletePaths(): Action[Seq[UPath]] = Action.async(validateJson[Seq[UPath]]) { implicit request =>
   accessTokenService.validateAccessFromTokenContext(UserAccessRequest.webknossos) {
     for {
-      _ <- dataSourceService.deletePathsFromDiskOrManagedS3(request.body)
+      _ <- if (request.body.isEmpty) Fox.successful(()) else {
+        logger.info(s"Deleting ${request.body.size} path(s), first=${request.body.headOption.getOrElse("")}")
+        dataSourceService.deletePathsFromDiskOrManagedS3(request.body)
+      }
     } yield Ok
   }
 }
```
app/models/dataset/Dataset.scala (3)
918-931: Robustness: prefer COALESCE(realPath, path) for prefix search to handle missing realPath.
If realPath is NULL, rows are skipped even if path matches.
```diff
- WHERE starts_with(m.realpath, $absolutePathWithTrailingSlash)
+ WHERE starts_with(COALESCE(m.realpath, m.path), $absolutePathWithTrailingSlash)
```
Ensure your Postgres version supports starts_with(text, text). If not, replace it with path LIKE $prefix || '%' or a range predicate.
1303-1317: Attachments-only paths query: consider excluding pending uploads.
If pending attachments exist, their paths may not be physically present yet.
Add AND NOT uploadToPathIsPending to mirror other reads, unless you deliberately want to delete pending targets too.
1273-1302: Performance: add indexes for path-prefix lookups.
Prefix scans on dataset_mags.realPath/path and dataset_layer_attachments.path will benefit from indexes.
- Create btree indexes with text_pattern_ops or consider pg_trgm for broader LIKE usage:
- CREATE INDEX ON webknossos.dataset_mags (realPath text_pattern_ops);
- CREATE INDEX ON webknossos.dataset_layer_attachments (path text_pattern_ops);
Evaluate with EXPLAIN on your typical prefixes.
Also applies to: 1318-1331
app/models/dataset/DatasetService.scala (1)
537-544: Skip remote call when there's nothing to delete.
Avoid a no-op network roundtrip for virtual datasets when no unique paths exist.
```diff
- _ <- datastoreClient.deletePaths(pathsUsedOnlyByThisDataset)
+ _ <- Fox.runIf(pathsUsedOnlyByThisDataset.nonEmpty) {
+   datastoreClient.deletePaths(pathsUsedOnlyByThisDataset)
+ }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
- app/controllers/DatasetController.scala (1 hunks)
- app/controllers/WKRemoteDataStoreController.scala (2 hunks)
- app/models/dataset/Dataset.scala (3 hunks)
- app/models/dataset/DatasetService.scala (4 hunks)
- app/models/dataset/WKRemoteDataStoreClient.scala (1 hunks)
- app/models/storage/UsedStorageService.scala (1 hunks)
- conf/webknossos.latest.routes (1 hunks)
- frontend/javascripts/admin/rest_api.ts (1 hunks)
- test/backend/UPathTestSuite.scala (1 hunks)
- unreleased_changes/8924.md (1 hunks)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (1 hunks)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (0 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceToDiskWriter.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6 hunks)
- webknossos-datastore/conf/datastore.latest.routes (1 hunks)
💤 Files with no reviewable changes (1)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/Dataset.scala
🧬 Code graph analysis (14)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceToDiskWriter.scala (2)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
runIf(167-176)util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
PathUtils(14-221)ensureDirectoryBox(122-130)
app/models/storage/UsedStorageService.scala (1)
app/models/dataset/DataStore.scala (1)
findOneByName(118-125)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
Full(661-699)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
Fox(28-228)Fox(230-303)serialCombined(93-97)serialCombined(97-109)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
isLocal(30-30)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (2)
pathIsInManagedS3(153-157)deletePaths(77-84)util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
PathUtils(14-221)deleteDirectoryRecursively(184-191)webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
deletePaths(428-437)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4)
S3UriUtils(7-50)hostBucketFromUri(9-20)objectKeyFromUri(38-47)isNonAmazonHost(47-50)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
UPath(55-97)webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/VaultPath.scala (1)
toRemoteUriUnsafe(75-78)
test/backend/UPathTestSuite.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (4)
UPath(55-97)fromStringUnsafe(62-79)startsWith(139-143)startsWith(209-213)
app/models/dataset/WKRemoteDataStoreClient.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (14)
addQueryParam(30-33)addQueryParam(33-36)addQueryParam(36-39)addQueryParam(39-42)addQueryParam(42-45)addQueryParam(45-50)addQueryParam(50-53)addQueryParam(53-56)addQueryParam(56-59)addQueryParam(59-63)addQueryParam(63-68)addQueryParam(68-71)getWithJsonResponse(132-135)deleteJson(238-242)app/controllers/UserTokenController.scala (1)
RpcTokenHolder(31-39)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
deletePaths(77-84)webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
deletePaths(428-437)
app/models/dataset/DatasetService.scala (9)
app/models/annotation/Annotation.scala (2)
findOne(289-296)countAllByDataset(644-656)app/models/storage/UsedStorageService.scala (2)
UsedStorageService(33-227)refreshStorageReportForDataset(211-227)app/models/dataset/Dataset.scala (6)
deleteDataset(709-737)findPathsUsedOnlyByThisDataset(899-918)findPathsUsedOnlyByThisDataset(1303-1318)findDatasetsWithMagsInDir(918-931)findDatasetsWithAttachmentsInDir(1318-1331)findOne(203-210)app/controllers/WKRemoteDataStoreController.scala (1)
deleteDataset(208-215)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
deleteDataset(138-143)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
deletePaths(77-84)app/models/dataset/WKRemoteDataStoreClient.scala (2)
deletePaths(134-141)deleteOnDisk(122-129)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
UPath(55-97)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (8)
toFox(12-12)Fox(28-228)Fox(230-303)s(234-238)s(238-248)s(248-257)shiftBox(310-310)successful(51-54)
app/controllers/DatasetController.scala (5)
util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1)
log(55-62)app/security/UserAwareRequestLogging.scala (1)
log(13-21)app/models/user/User.scala (2)
isAdminOf(56-59)isAdminOf(59-62)app/utils/WkConf.scala (1)
Features(135-155)app/models/dataset/DatasetService.scala (1)
deleteDataset(533-559)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
DataStoreConfig(11-78)DataVaults(61-63)S3Upload(64-68)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1)
PathSchemes(3-9)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4)
S3UriUtils(7-50)hostBucketFromUri(9-20)hostBucketFromUpath(22-26)objectKeyFromUri(38-47)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (8)
UPath(55-97)toRemoteUriUnsafe(122-122)toRemoteUriUnsafe(199-199)getScheme(120-120)getScheme(197-197)fromString(60-60)startsWith(139-143)startsWith(209-213)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
CredentialConfigReader(8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
PathUtils(14-221)ensureDirectory(116-120)app/models/dataset/WKRemoteDataStoreClient.scala (1)
deleteOnDisk(122-129)webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
deleteOnDisk(413-428)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
successful(51-54)failure(56-60)
app/models/dataset/Dataset.scala (3)
app/utils/sql/SqlInterpolation.scala (1)
q(20-39)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
UPath(55-97)util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
toSingleBox(63-81)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
DSRemoteWebknossosClient(57-216)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
ManagedS3Service(28-157)util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
PathUtils(14-221)ensureDirectoryBox(122-130)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk(14-36)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/DatasetService.scala (1)
deleteDatasetFromDB(569-584)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (1)
238-242: LGTM! Well-structured addition.
The new deleteJson method follows the established pattern of the other JSON-based HTTP methods in this file (postJson, putJson, patchJson), ensuring consistency and maintainability.
app/models/dataset/WKRemoteDataStoreClient.scala (2)
129-132: Verify caching behavior is intended.
The lazy val means the base directory path will be cached for the lifetime of this WKRemoteDataStoreClient instance. If the datastore's base directory configuration can change at runtime, this could return stale data. Consider whether a def would be more appropriate, or if the caching is intentional (e.g., the base directory is truly static).
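A generic illustration of the difference, independent of this codebase:

```scala
// lazy val: computed once on first access and cached for the lifetime of the instance.
// def: re-evaluated on every call, so configuration changes are picked up.
class ClientSketch(fetchBaseDir: () => String) {
  lazy val baseDirCached: String = fetchBaseDir()
  def baseDirFresh: String = fetchBaseDir()
}

var dir = "/srv/data-old"
val client = new ClientSketch(() => dir)
client.baseDirCached // "/srv/data-old"
dir = "/srv/data-new"
client.baseDirCached // still "/srv/data-old" (cached)
client.baseDirFresh  // "/srv/data-new"
```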
134-139: LGTM! Proper integration with the new deleteJson method.
The deletePaths method correctly uses the new deleteJson RPC method to send the paths as a JSON payload, maintaining consistency with the overall deletion workflow.
app/models/storage/UsedStorageService.scala (1)
214-214: LGTM! Dependency refactoring improves modularity.
The change from datasetService.dataStoreFor(dataset) to dataStoreDAO.findOneByName(dataset._dataStore.trim) successfully decouples UsedStorageService from DatasetService, improving the service architecture. The .trim() provides defensive handling of potential whitespace.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (3)
77-84: LGTM! Well-structured deletion orchestration.
The deletePaths method properly groups paths by bucket before delegating to deleteS3PathsOnBucket, ensuring efficient batch processing.
122-143: LGTM! Proper pagination handling.
The recursive listKeysAtPrefix correctly handles S3's pagination (max 1000 keys per request) with continuation tokens, ensuring all objects under a prefix are discovered before deletion.
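For readers unfamiliar with the pattern, here is a standalone sketch of continuation-token pagination against the AWS SDK v2 async client. Bucket and prefix are placeholders, and the project's actual listKeysAtPrefix may differ in structure:

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.CollectionConverters._
import scala.jdk.FutureConverters._
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request

// Collects all keys under a prefix, following continuation tokens
// (S3 returns at most 1000 keys per page).
def listAllKeys(client: S3AsyncClient, bucket: String, prefix: String, token: Option[String] = None)(
    implicit ec: ExecutionContext): Future[Seq[String]] = {
  val builder = ListObjectsV2Request.builder().bucket(bucket).prefix(prefix)
  val request = token.fold(builder)(t => builder.continuationToken(t)).build()
  client.listObjectsV2(request).asScala.flatMap { response =>
    val keys = response.contents().asScala.map(_.key()).toSeq
    if (response.isTruncated)
      listAllKeys(client, bucket, prefix, Some(response.nextContinuationToken())).map(keys ++ _)
    else
      Future.successful(keys)
  }
}
```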
153-155: Credential name format is URI-based, not simple identifiers.
The configuration documentation confirms credential.name is an S3 URI path (e.g., s3://bucket/prefix), not a simple identifier. The comment states: "The credentials are selected by uri prefix, so different s3 uri styles may need duplicated credential entries." The startsWith logic correctly matches paths against credential name prefixes to identify managed S3 resources.
Likely an incorrect or invalid review comment.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
183-195: LGTM! Type refinement and new parents accessor improve API clarity.
Narrowing the parent return type to RemoteUPath (line 183) and introducing the parents method (lines 187-195) provide better type safety and a clear way to access the path hierarchy.
209-213: LGTM! Semantic path comparison improvement.
The updated startsWith implementation now correctly checks for parent relationships via parents.contains rather than using string prefix matching. This prevents false positives (e.g., "s3://bucket/pathSomewhereElse" incorrectly matching "s3://bucket/path").
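A minimal illustration of why component-wise comparison matters, shown with plain strings rather than the UPath API:

```scala
val base  = "s3://bucket/path"
val other = "s3://bucket/pathSomewhereElse"

// naive string prefix check: false positive
other.startsWith(base)                 // true

// component-wise check: compares whole path segments
val baseSegments  = base.split("/").toSeq
val otherSegments = other.split("/").toSeq
otherSegments.startsWith(baseSegments) // false, "pathSomewhereElse" != "path"
```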
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)
41-41: LGTM! Proper DI configuration for the new service.
The eager singleton binding of ManagedS3Service ensures it's available for injection into dependent services (like UploadService and DataSourceService) and is instantiated at startup, which is appropriate for a service managing S3 connections.
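For context, the usual Guice pattern for such a binding; a generic sketch, not the project's exact module code:

```scala
import com.google.inject.AbstractModule

class DataStoreModuleSketch extends AbstractModule {
  override def configure(): Unit =
    // instantiate at application startup and reuse the same instance wherever it is injected
    bind(classOf[ManagedS3Service]).asEagerSingleton()
}
```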
unreleased_changes/8924.md (1)
1-5: LGTM! Clear and accurate changelog entries.
The changelog appropriately documents both the new S3 deletion capability and the safety improvement for preventing deletion of shared dataset directories.
test/backend/UPathTestSuite.scala (1)
132-145: LGTM! Valuable test coverage for path component comparison.
These test additions correctly verify that startsWith compares actual path components rather than string prefixes, preventing false positives from substring matches.
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (1)
14-14: LGTM! Appropriate conversion from trait to object.
Converting PathUtils from a trait to an object is a sensible refactor that makes it a proper utility class. The related changes in DataSourceToDiskWriter.scala show the call sites have been updated accordingly.
app/controllers/WKRemoteDataStoreController.scala (1)
212-212: LGTM! Improved separation of concerns.
Delegating dataset deletion logic to datasetService.deleteDatasetFromDB is a good refactor that keeps the controller thin and moves business logic to the service layer.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceToDiskWriter.scala (2)
15-15: LGTM! Correct update for PathUtils object usage.
Removing PathUtils from the trait mixins aligns with its conversion to an object in the related changes.
30-30: LGTM! Correct object method invocation.
The call site correctly uses PathUtils.ensureDirectoryBox as an object method.
webknossos-datastore/conf/datastore.latest.routes (2)
107-107: LGTM! New endpoint for base directory retrieval.
The new baseDirAbsolute endpoint provides the base directory path, likely needed for coordinating dataset deletion operations.
114-114: LGTM! New endpoint for batch path deletion.
The new deletePaths endpoint enables batch deletion of dataset paths, supporting both local and S3 storage deletion as described in the PR objectives.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
28-28: LGTM! New dependency for managed S3 operations.
The new managedS3Service dependency enables S3 path deletion as described in the PR objectives.
317-318: LGTM! Straightforward existence check.
The existsOnDisk method correctly checks for dataset directory existence.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (6)
11-12: LGTM! Updated imports for centralized S3 utilities.
The imports correctly reference the new S3UriUtils for centralized S3 URI parsing.
52-55: LGTM! Correct usage of centralized utility.
Using S3UriUtils.hostBucketFromUri eliminates code duplication and centralizes S3 URI parsing logic.
112-120: LGTM! Consistent utility usage.
The refactored code correctly uses S3UriUtils.objectKeyFromUri throughout the method.
124-128: LGTM! Consistent refactoring.
Correctly uses S3UriUtils.objectKeyFromUri in the listDirectory method.
162-167: LGTM! Consistent refactoring.
Correctly uses S3UriUtils.objectKeyFromUri in the getUsedStorageBytes method.
227-236: LGTM! Consistent refactoring.
Correctly uses S3UriUtils.isNonAmazonHost to determine the endpoint configuration.
app/controllers/DatasetController.scala (1)
584-597: Deletion endpoint looks good; verify downstream effects.
Auth gating (feature flag, isEditableBy, org admin) and logging are solid. Ensure datasetService.deleteDataset also:
- Deletes managed S3/local paths where applicable.
- Refreshes any storage/accounting state post-delete.
If not already covered, add an integration test that deletes a virtual dataset with managed S3 paths and asserts physical deletion plus DB removal.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
360-363: Good: safer directory creation.
The switch to PathUtils.ensureDirectoryBox with failure mapping is correct for permission errors.
532-546: Good: centralize S3 transfers via ManagedS3Service and fail on partial uploads.
Using transferManagerBox and checking failedTransfers aligns with the new S3 abstraction.
579-592: Good: deleteOnDisk now carries datasetId into DatasetDeleter.
Passing datasetId improves auditability and parity with remote delete flows.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
90-94: Confirm exposure of the absolute base dir is acceptable.
The endpoint is gated by UserAccessRequest.webknossos. If that's strictly server-to-server, fine; otherwise consider returning a tokenized ID instead.
413-426: Simplified deleteOnDisk flow looks correct.
The switch to dsRemoteWebknossosClient.getDataSource and a single deleteOnDisk call is cleaner and consistent with the new semantics.
app/models/dataset/Dataset.scala (1)
819-821: LGTM: seed realPath alongside path.
Storing realPath initially equal to path simplifies later updates and lookups.
app/models/dataset/DatasetService.scala (1)
533-558: Verify bucket allowlist enforcement in ManagedS3Service.deletePaths – potential security gap detected.
The datastore-side deletePaths method in ManagedS3Service (lines 77–97) does not explicitly validate that deletions target only managed/configured S3 buckets. The code extracts bucket names from UPaths and deletes without cross-checking against an allowlist:
```scala
val pathsByBucket: Map[Option[String], Seq[UPath]] = paths.groupBy(S3UriUtils.hostBucketFromUpath)
for {
  _ <- Fox.serialCombined(pathsByBucket.keys) { bucket: Option[String] =>
    deleteS3PathsOnBucket(bucket, pathsByBucket(bucket))
  }
}
```
This will attempt deletion from any S3 bucket in the paths provided, relying only on the availability of S3 client credentials. While the webknossos app layer filters paths (lines 305–306 in DataSourceService.deletePathsFromDiskOrManagedS3), the datastore should enforce its own bucket allowlist to prevent accidental/malicious deletion from external buckets.
Required fixes:
- Add explicit bucket validation in deleteS3PathsOnBucket to only permit s3UploadBucketOpt (or configured managed buckets); a sketch of such a guard follows this list
- Add an integration test verifying deletion is rejected for non-managed S3 URIs
- Document the bucket restriction behavior
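A minimal sketch of such a guard, assuming the set of allowed buckets is derived from configuration; the names and error messages are illustrative, and the real check would live in ManagedS3Service.deleteS3PathsOnBucket:

```scala
// Reject any bucket that is not explicitly managed before issuing delete requests.
private def assertBucketIsManaged(bucketOpt: Option[String], allowedBuckets: Set[String])(
    implicit ec: ExecutionContext): Fox[Unit] =
  bucketOpt match {
    case Some(bucket) if allowedBuckets.contains(bucket) => Fox.successful(())
    case Some(bucket) => Fox.failure(s"Refusing to delete from unmanaged S3 bucket: $bucket")
    case None         => Fox.failure("Could not determine S3 bucket for deletion request")
  }
```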
Resolved review comments on:
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala
Actionable comments posted: 1
♻️ Duplicate comments (3)
app/models/dataset/Dataset.scala (2)
871-889: Fix NULL read and UPath parsing failures in mags path query.
The query selects m1.path, which can be NULL according to the database schema, but reads it as String and then parses to UPath, causing runtime failures.
Apply this diff to handle NULL values properly:
```diff
 def findPathsUsedOnlyByThisDataset(datasetId: ObjectId): Fox[Seq[UPath]] =
   for {
     pathsStr <- run(q"""
-        SELECT m1.path FROM webknossos.dataset_mags m1
+        SELECT COALESCE(m1.realPath, m1.path) AS path
+        FROM webknossos.dataset_mags m1
         WHERE m1._dataset = $datasetId
         AND NOT EXISTS (
           SELECT m2.path
           FROM webknossos.dataset_mags m2
           WHERE m2._dataset != $datasetId
           AND (
             m2.path = m1.path
             OR
             m2.realpath = m1.realpath
           )
         )
-      """.as[String])
-    paths <- pathsStr.map(UPath.fromString).toList.toSingleBox("Invalid UPath").toFox
+      """.as[Option[String]])
+    paths <- pathsStr.flatten
+      .filter(_.nonEmpty)
+      .map(UPath.fromString)
+      .toList
+      .toSingleBox("Invalid UPath")
+      .toFox
   } yield paths
```
890-903: Address NULL realpath vulnerability in deletion guard query.
The deletion guard uses starts_with(m.realpath, ...) to find shared data, but realpath is nullable in the database schema. Legacy mags with NULL realpath will be missed, risking accidental removal of shared data.
Apply this diff to use COALESCE as a fallback:
```diff
 def findDatasetsWithMagsInDir(absolutePath: UPath, dataStore: DataStore, ignoredDataset: ObjectId): Fox[Seq[ObjectId]] = {
   // ensure trailing slash on absolutePath to avoid string prefix false positives
   val absolutePathWithTrailingSlash =
     if (absolutePath.toString.endsWith("/")) absolutePath.toString else absolutePath.toString + "/"
   run(q"""
     SELECT d._id
     FROM webknossos.dataset_mags m
     JOIN webknossos.datasets d ON m._dataset = d._id
-    WHERE starts_with(m.realpath, $absolutePathWithTrailingSlash)
+    WHERE starts_with(COALESCE(m.realpath, m.path), $absolutePathWithTrailingSlash)
     AND d._id != $ignoredDataset
     AND d._datastore = ${dataStore.name.trim}
   """.as[ObjectId])
 }
```
app/models/dataset/DatasetService.scala (1)
547-560: Critical: DB deletion not sequenced; failures ignored.
Using _ = at lines 554-556 does not sequence the Fox operations into the for-comprehension. Errors won't propagate and the method may return before deletion completes.
Apply this diff to properly sequence the operations:
```diff
 def deleteDatasetFromDB(datasetId: ObjectId): Fox[Unit] =
   for {
     existingDatasetBox <- datasetDAO.findOne(datasetId)(GlobalAccessContext).shiftBox
     _ <- existingDatasetBox match {
       case Full(dataset) =>
         for {
           annotationCount <- annotationDAO.countAllByDataset(dataset._id)(GlobalAccessContext)
-          _ = datasetDAO
-            .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-            .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+          _ <- datasetDAO.deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+          _ <- usedStorageService.refreshStorageReportForDataset(dataset)
         } yield ()
       case _ => Fox.successful(())
     }
   } yield ()
```
🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)
1275-1289: Consider NULL-safe handling for attachment paths.
While attachment paths appear to be non-nullable in the schema, using Option[String] for database reads provides defensive coding against schema changes or unexpected NULL values.
Apply this diff for safer NULL handling:
```diff
 def findPathsUsedOnlyByThisDataset(datasetId: ObjectId): Fox[Seq[UPath]] =
   for {
     pathsStr <- run(q"""
       SELECT a1.path FROM webknossos.dataset_layer_attachments a1
       WHERE a1._dataset = $datasetId
       AND NOT EXISTS (
         SELECT a2.path
         FROM webknossos.dataset_layer_attachments a2
         WHERE a2._dataset != $datasetId
         AND a2.path = a1.path
       )
-    """.as[String])
-    paths <- pathsStr.map(UPath.fromString).toList.toSingleBox("Invalid UPath").toFox
+    """.as[Option[String]])
+    paths <- pathsStr.flatten
+      .filter(_.nonEmpty)
+      .map(UPath.fromString)
+      .toList
+      .toSingleBox("Invalid UPath")
+      .toFox
   } yield paths
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- app/models/dataset/Dataset.scala (3 hunks)
- app/models/dataset/DatasetService.scala (4 hunks)
- conf/webknossos.latest.routes (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (0 hunks)
💤 Files with no reviewable changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/Dataset.scala
🧬 Code graph analysis (2)
app/models/dataset/Dataset.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
UPath(55-97)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
toFox(12-12)util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
toSingleBox(63-81)
app/models/dataset/DatasetService.scala (5)
app/models/annotation/Annotation.scala (2)
findOne(289-296)countAllByDataset(644-656)app/models/storage/UsedStorageService.scala (2)
UsedStorageService(33-227)refreshStorageReportForDataset(211-227)app/models/dataset/Dataset.scala (6)
deleteDataset(709-737)findPathsUsedOnlyByThisDataset(871-890)findPathsUsedOnlyByThisDataset(1275-1290)findDatasetsWithMagsInDir(890-903)findDatasetsWithAttachmentsInDir(1290-1303)findOne(203-210)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
deletePaths(77-84)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
UPath(55-97)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (4)
conf/webknossos.latest.routes (2)
110-110: LGTM! New endpoint follows established patterns.
The new finishUploadToPaths endpoint follows the established reserve/finish pattern (similar to lines 107-109) and has consistent naming conventions. This addition supports the multi-path upload functionality mentioned in the PR objectives.
106-106: API change properly implemented and verified.
The breaking API change has been handled correctly:
- The old route DELETE /datasets/:datasetId/deleteOnDisk has been completely removed from the webknossos routes
- The new route DELETE /datasets/:datasetId is in place and maps to the updated DatasetController.delete() method
- The frontend API client in rest_api.ts has been updated to call the new endpoint
- All frontend components accessing this functionality use the updated API client
- The backend implementation properly supports both virtual datasets (using deletePaths) and non-virtual datasets (using deleteOnDisk)
No additional client updates are needed within the codebase. The change is complete and consistent.
app/models/dataset/Dataset.scala (1)
1290-1303: LGTM! Proper deletion guard for attachments.
The trailing slash logic correctly prevents false positives from string prefix matching, and the query properly joins with datasets to filter by datastore.
app/models/dataset/DatasetService.scala (1)
537-545: LGTM with dependency on DAO fixes.
The method correctly combines the mags and attachments queries and deduplicates the results. However, its correctness depends on the underlying DAO queries properly handling NULL realpath values (see comments on Dataset.scala).
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
439-449: S3 bucket/endpoint derivation: add explicit error mapping and fix null-safe host extraction.
Two valid concerns, but the suggested diff is syntactically broken:
- Missing error mapping on s3UploadBucketOpt.toFox yields a generic failure message. Add explicit context: ?~> "No managed S3 upload bucket configured (Datastore.S3Upload)."
- URI.getHost() can return null and will fail on methods like endsWith() or in string interpolation. Use Option(credUri.getHost).getOrElse(credUri.getAuthority) instead.
The proposed diff is incorrect: the .flatMap(identity) at the end is a type error. Instead, wrap the entire S3 path construction in the existing for-yield block without attempting to flatten outside:
```diff
- s3UploadBucket <- managedS3Service.s3UploadBucketOpt.toFox
+ s3UploadBucket <- managedS3Service.s3UploadBucketOpt.toFox ?~> "No managed S3 upload bucket configured (Datastore.S3Upload)."
@@
- endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
- newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+ credUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
+ endPointHost = Option(credUri.getHost).getOrElse(credUri.getAuthority)
+ newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
```
♻️ Duplicate comments (2)
app/models/dataset/DatasetService.scala (2)
511-535: Critical: NULL realpath handling remains unfixed.
The past review identified that the deletion guards miss datasets with NULL realpath values:
- Virtual deletion (lines 516-517): findPathsUsedOnlyByThisDataset uses m2.realpath = m1.realpath in SQL, which silently fails when realpath is NULL (NULL comparisons return unknown, not true). This could cause deletion of paths shared by datasets with NULL realpath.
- Non-virtual deletion (lines 528-531): findDatasetsWithMagsInDir uses starts_with(m.realpath, ...), which returns NULL for rows with NULL realpath. Other datasets sharing the directory won't be detected, risking data loss.
Required fix: Update the DAO queries to use COALESCE(realpath, path) or add explicit NULL handling.
547-560: Critical: DB deletion still not sequenced; failures still ignored.
The past review correctly identified that line 554 uses _ = instead of _ <-, which does not sequence the deletion into the Fox chain. The datasetDAO.deleteDataset(...).flatMap(...) Future/Fox executes independently, so:
- Errors won't propagate to the caller
- The method may return before deletion completes
- Storage refresh failures are silently ignored
Required fix:
```diff
 case Full(dataset) =>
   for {
     annotationCount <- annotationDAO.countAllByDataset(dataset._id)(GlobalAccessContext)
-    _ = datasetDAO
-      .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-      .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+    _ <- datasetDAO.deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+    _ <- usedStorageService.refreshStorageReportForDataset(dataset)
   } yield ()
```
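The underlying Scala mechanics, shown here with plain Futures (the same applies to Fox): x = expr in a for-comprehension is just a value binding, so the effect is started but never flatMapped into the chain.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def deleteFromDb(): Future[Unit] = Future.failed(new RuntimeException("delete failed"))

val notSequenced: Future[Unit] = for {
  _ <- Future.successful(())
  _ = deleteFromDb() // fire-and-forget: the result and any errors are dropped
} yield ()

val sequenced: Future[Unit] = for {
  _ <- Future.successful(())
  _ <- deleteFromDb() // flatMapped into the chain, so errors propagate to the caller
} yield ()

Await.ready(notSequenced, 1.second).value // Some(Success(())) – the failure was swallowed
Await.ready(sequenced, 1.second).value    // Some(Failure(java.lang.RuntimeException: delete failed))
```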
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
363-363: Good: safer directory creation with PathUtils.ensureDirectoryBox.
This avoids exceptions and returns proper Fox errors. Consider applying the same pattern in handleUploadChunk for parent dir creation to keep error handling consistent.
```diff
- PathUtils.ensureDirectory(uploadDir.resolve(filePath).getParent)
+ PathUtils.ensureDirectoryBox(uploadDir.resolve(filePath).getParent).toFox ?~> "dataset.import.fileAccessDenied"
```
542-550: Surface clearer info on failed S3 transfers.
The error currently prints a collection object. Consider logging the count and a few sample keys to aid debugging, and include the uploadId/datasetId if available.
```diff
- _ <- Fox.fromBool(failedTransfers.isEmpty) ?~>
-   s"Some files failed to upload to S3: $failedTransfers"
+ _ <- Fox.fromBool(failedTransfers.isEmpty) ?~> {
+   val count = failedTransfers.size()
+   s"Some files failed to upload to S3 (failed count=$count)."
+ }
```
Add metrics for transfer success/failure counts and durations to monitor S3 health over time.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- app/controllers/WKRemoteDataStoreController.scala (2 hunks)
- app/models/dataset/DatasetService.scala (5 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/controllers/WKRemoteDataStoreController.scala (2)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
Fox(28-228)Fox(230-303)app/models/dataset/DatasetService.scala (1)
deleteDatasetFromDB(547-562)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
ManagedS3Service(28-157)util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
PathUtils(14-221)ensureDirectoryBox(122-130)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
deleteOnDisk(14-36)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
UPath(55-97)
app/models/dataset/DatasetService.scala (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
UPath(55-97)app/models/storage/UsedStorageService.scala (2)
UsedStorageService(33-227)refreshStorageReportForDataset(211-227)app/controllers/WKRemoteDataStoreController.scala (1)
deleteDataset(207-214)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
deleteDataset(133-138)app/models/dataset/Dataset.scala (6)
deleteDataset(709-737)findPathsUsedOnlyByThisDataset(871-890)findPathsUsedOnlyByThisDataset(1275-1290)findDatasetsWithMagsInDir(890-903)findDatasetsWithAttachmentsInDir(1290-1303)findOne(203-210)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
deletePaths(77-84)app/models/dataset/WKRemoteDataStoreClient.scala (2)
deletePaths(134-141)deleteOnDisk(122-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
13-13: LGTM! Import cleanup aligns with API refactoring.
The removal of LayerMagLinkInfo from the import statement is consistent with the removal of the fetchPaths method and the shift toward path-centric APIs mentioned in the PR context.
57-211: No action required: all references properly updated.
Verification confirms no callers of the removed fetchPaths method remain in the codebase. The public API change has been cleanly executed with all callers migrated.
app/controllers/WKRemoteDataStoreController.scala (1)
6-6: LGTM! Deletion logic successfully centralized.
The removal of the Full import and the simplification of the deletion endpoint to delegate to DatasetService.deleteDatasetFromDB is a clean refactoring. This centralizes deletion logic and reduces duplication.
Also applies to: 211-211
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
583-596: deleteOnDisk call-site update confirmed complete.
All identified call sites to DatasetDeleter.deleteOnDisk() now use the correct new signature with datasetId as the first parameter. The two call sites in UploadService.scala (lines 583 and 591) and the call site in DataSourceController.scala (line 419) all pass arguments in the correct order matching the function definition. No old-style call sites with different argument ordering were found.
23-24: ManagedS3Service DI wiring is complete and properly configured.
Verification confirms: ManagedS3Service is bound at DataStoreModule line 41, UploadService is bound at line 42, and the constructor parameter at line 100 is correctly added. No test files were found with broken UploadService instantiation; tests use Guice injection, so no updates are needed. The changes are safe to merge.
Actionable comments posted: 5
♻️ Duplicate comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
48-48: Log the actual deduplicated path.
Line 48 logs targetPath, but when retryCount > 0, the actual destination is deduplicatedTargetPath. Log the path that was actually used.
Apply this diff:
- logger.info(s"Successfully moved dataset from $sourcePath to $targetPath.") + logger.info(s"Successfully moved dataset from $sourcePath to $deduplicatedTargetPath.")app/models/dataset/Dataset.scala (1)
871-890: Critical: Query returns wrong paths for non-virtual datasets.
The query selects only m1.path, but for non-virtual datasets the physical path to delete is stored in m1.realPath. This means:
- Non-virtual datasets would return logical paths instead of physical filesystem paths
- Deletion would target the wrong locations
- Could cause data loss or orphaned data
A past reviewer flagged this exact issue and suggested using COALESCE(m1.realPath, m1.path).
Apply this diff to select the actual physical path:
- pathsStrOpts <- run(q""" - SELECT m1.path FROM webknossos.dataset_mags m1 - WHERE m1._dataset = $datasetId - AND m1.path IS NOT NULL + pathsStrOpts <- run(q""" + SELECT COALESCE(m1.realPath, m1.path) AS path + FROM webknossos.dataset_mags m1 + WHERE m1._dataset = $datasetId + AND COALESCE(m1.realPath, m1.path) IS NOT NULL AND NOT EXISTS ( SELECT m2.path FROM webknossos.dataset_mags m2 WHERE m2._dataset != $datasetId AND ( - m2.path = m1.path + m2.path = COALESCE(m1.realPath, m1.path) OR - ( - m2.realpath IS NOT NULL AND m2.realpath = m1.realpath - ) + m2.realpath = COALESCE(m1.realPath, m1.path) ) ) """.as[Option[String]])app/models/dataset/DatasetService.scala (3)
548-557: Good fix: sequence deletion and refresh; errors now propagate.
Replacing _ = ... with <- for both deleteDataset and refreshStorageReport correctly sequences the Fox chain.
511-523: Critical: "used-only-by-this-dataset" can miss shared data (NULL realpath/symlink).
The current DAO logic for mags uses realpath equality that fails with NULL and can delete paths still used by other datasets. Use COALESCE to align with index semantics, or conservatively block when NULLs exist. Also consider validating paths before delete.
Suggested DAO fix (app/models/dataset/Dataset.scala):
```diff
-        AND NOT EXISTS (
-          SELECT m2.path
-          FROM webknossos.dataset_mags m2
-          WHERE m2._dataset != $datasetId
-          AND (
-            m2.path = m1.path
-            OR (
-              m2.realpath IS NOT NULL AND m2.realpath = m1.realpath
-            )
-          )
-        )
+        AND NOT EXISTS (
+          SELECT 1
+          FROM webknossos.dataset_mags m2
+          WHERE m2._dataset != $datasetId
+          AND COALESCE(m2.realpath, m2.path) = COALESCE(m1.realpath, m1.path)
+        )
```
Optional safety here:
```diff
- pathsUsedOnlyByThisDataset = magPathsUsedOnlyByThisDataset ++ attachmentPathsUsedOnlyByThisDataset
+ pathsUsedOnlyByThisDataset = magPathsUsedOnlyByThisDataset ++ attachmentPathsUsedOnlyByThisDataset
+ // extra safety: validate only-local/managed-S3 and existence
+ _ <- validatePaths(pathsUsedOnlyByThisDataset, datastore)
```
Run to confirm DAOs use COALESCE:
```bash
#!/bin/bash
rg -n -C2 'findMagPathsUsedOnlyByThisDataset|COALESCE\\(m2\\.realpath, m2\\.path\\)' app/models/dataset/Dataset.scala
```
525-533: Critical: Deletion guard misses datasets with NULL realpath; use COALESCE in DAO.
findDatasetsUsingDataFromDir relies on the DAO query filtering m.realpath IS NOT NULL and starts_with(m.realpath, ...). Legacy NULL realpaths won't be seen → risk of deleting shared data.
Suggested DAO fix (app/models/dataset/Dataset.scala):
```diff
- WHERE m.realpath IS NOT NULL
-   AND starts_with(m.realpath, $absolutePathWithTrailingSlash)
+ WHERE starts_with(COALESCE(m.realpath, m.path), $absolutePathWithTrailingSlash)
```
Optional: add a conservative pre-check to abort if any mags under the directory have realpath IS NULL, until a backfill/migration makes realpath non-NULL.
Also review attachments symmetry: attachments guard uses a.path only; if attachments ever gain realpath, mirror COALESCE there too.
```bash
#!/bin/bash
rg -n -C3 'findDatasetsWithMagsInDir|starts_with\\(' app/models/dataset/Dataset.scala
rg -n 'COALESCE\\(m\\.realpath, m\\.path\\)' app/models/dataset/Dataset.scala
```
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
9-22: Harden S3 URI parsing: null-safety, regional endpoints, and key normalization.
- Avoid NPEs on uri.getHost.
- Support regional virtual-hosted endpoints (e.g., bucket.s3.us-west-2.amazonaws.com).
- Normalize keys to avoid a leading "/" for AWS S3.
- Make short-style detection scheme-aware (s3:// only).
- Improve non-AWS host detection.
Apply this diff to the existing functions:
```diff
@@
- def hostBucketFromUri(uri: URI): Option[String] = {
-   val host = uri.getHost
-   if (host == null) {
-     None
-   } else if (isShortStyle(uri)) { // assume host is omitted from uri, shortcut form s3://bucket/key
-     Some(host)
-   } else if (isVirtualHostedStyle(uri)) {
-     Some(host.substring(0, host.length - ".s3.amazonaws.com".length))
-   } else if (isPathStyle(uri)) {
-     Some(uri.getPath.substring(1).split("/")(0))
-   } else {
-     None
-   }
- }
+ def hostBucketFromUri(uri: URI): Option[String] = {
+   val hostO = hostOpt(uri)
+   if (isShortStyle(uri)) {
+     hostO
+   } else if (isVirtualHostedStyle(uri)) {
+     hostO.flatMap(bucketFromVirtualHost)
+   } else if (isPathStyle(uri)) {
+     val path = Option(uri.getPath).getOrElse("")
+     path.stripPrefix("/").split("/", 2).headOption.filter(_.nonEmpty)
+   } else None
+ }
@@
- // https://bucket-name.s3.region-code.amazonaws.com/key-name
- private def isVirtualHostedStyle(uri: URI): Boolean =
-   uri.getHost.endsWith(".s3.amazonaws.com")
+ // https://bucket-name.s3.amazonaws.com or https://bucket-name.s3.us-west-2.amazonaws.com
+ private def isVirtualHostedStyle(uri: URI): Boolean =
+   hostOpt(uri).exists(h => VirtualHostedPattern.pattern.matcher(h).matches)
@@
- // https://s3.region-code.amazonaws.com/bucket-name/key-name
- private def isPathStyle(uri: URI): Boolean =
-   uri.getHost.matches("s3(.[\\w\\-_]+)?.amazonaws.com") ||
-     (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
+ // https://s3.amazonaws.com/bucket/key, https://s3.us-west-2.amazonaws.com/bucket/key
+ // or non-AWS S3-compatible hosts (MinIO, Ceph, etc.)
+ private def isPathStyle(uri: URI): Boolean =
+   hostOpt(uri).exists { h =>
+     h.matches("^s3([.-][\\w\\-_]+)?\\.amazonaws\\.com$") ||
+       (!h.contains("amazonaws.com") && h.contains("."))
+   }
@@
- // S3://bucket-name/key-name
- private def isShortStyle(uri: URI): Boolean =
-   !uri.getHost.contains(".")
+ // s3://bucket-name/key-name (short style)
+ private def isShortStyle(uri: URI): Boolean =
+   "s3".equalsIgnoreCase(uri.getScheme) && hostOpt(uri).exists(h => !h.contains("."))
@@
- def objectKeyFromUri(uri: URI): Box[String] =
-   if (isVirtualHostedStyle(uri)) {
-     Full(uri.getPath)
-   } else if (isPathStyle(uri)) {
-     Full(uri.getPath.substring(1).split("/").tail.mkString("/"))
-   } else if (isShortStyle(uri)) {
-     Full(uri.getPath.tail)
-   } else Failure(s"Not a valid s3 uri: $uri")
+ def objectKeyFromUri(uri: URI): Box[String] = {
+   val path = Option(uri.getPath).getOrElse("")
+   if (isVirtualHostedStyle(uri) || isShortStyle(uri)) {
+     Full(path.stripPrefix("/"))
+   } else if (isPathStyle(uri)) {
+     val parts = path.stripPrefix("/").split("/", 2)
+     if (parts.length == 2) Full(parts(1)) else Failure(s"No object key in $uri")
+   } else Failure(s"Not a valid s3 uri: $uri")
+ }
@@
- def isNonAmazonHost(uri: URI): Boolean =
-   (isPathStyle(uri) && !uri.getHost.endsWith(".amazonaws.com")) || uri.getHost == "localhost"
+ def isNonAmazonHost(uri: URI): Boolean =
+   hostOpt(uri).exists(h => (isPathStyle(uri) && !h.endsWith(".amazonaws.com")) || h == "localhost")
```
Add these helpers (append within the object):
```scala
// e.g. bucket.s3.amazonaws.com or bucket.s3.us-west-2.amazonaws.com
private val VirtualHostedPattern = """^(.+?)\.s3([.-][\w-]+)?\.amazonaws\.com$""".r

private def hostOpt(uri: URI): Option[String] =
  Option(uri.getHost).orElse(Option(uri.getAuthority)).map(_.trim).filter(_.nonEmpty)

private def bucketFromVirtualHost(host: String): Option[String] = host match {
  case VirtualHostedPattern(bucket, _) => Some(bucket)
  case _                               => None
}
```
This fixes failures on regional endpoints and ensures keys do not carry a leading slash, which is critical for correct List/Delete behavior. Based on learnings.
Also applies to: 27-39, 40-50
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
39-55: Don't derive bucket/endpoint from credentialName; use the matched S3 credential (name/endpoint/region).
Assuming credentialName is a logical identifier, parsing it as a URI yields a wrong bucket and an invalid endpoint (e.g., https://bucket). Derive from the selected S3AccessKeyCredential instead; use its optional endpoint/region, and parse name only if it is actually a URI.
Apply this refactor:
```diff
@@
- lazy val s3UploadBucketOpt: Option[String] =
-   // by convention, the credentialName is the S3 URI so we can extract the bucket from it.
-   S3UriUtils.hostBucketFromUri(new URI(dataStoreConfig.Datastore.S3Upload.credentialName))
+ private lazy val s3UploadCredentialOpt: Option[S3AccessKeyCredential] =
+   dataStoreConfig.Datastore.DataVaults.credentials
+     .flatMap(new CredentialConfigReader(_).getCredential)
+     .collectFirst {
+       case c @ S3AccessKeyCredential(name, _, _, _, _)
+           if dataStoreConfig.Datastore.S3Upload.credentialName == name => c
+     }
+
+ private lazy val s3UploadBaseUriOpt: Option[URI] =
+   s3UploadCredentialOpt.flatMap { c =>
+     // name might be a UPath/URI root like s3://bucket/prefix or https://minio:9000/bucket
+     tryo(UPath.fromString(c.name).toRemoteUriUnsafe).toOption
+       .orElse(tryo(new URI(c.name)).toOption)
+   }
+
+ lazy val s3UploadBucketOpt: Option[String] =
+   s3UploadBaseUriOpt.flatMap(S3UriUtils.hostBucketFromUri)
@@
- private lazy val s3UploadEndpoint: URI = {
-   // by convention, the credentialName is the S3 URI so we can extract the bucket from it.
-   val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-   new URI(
-     "https",
-     null,
-     credentialUri.getHost,
-     -1,
-     null,
-     null,
-     null
-   )
- }
+ private lazy val s3UploadEndpointOpt: Option[URI] =
+   s3UploadCredentialOpt.flatMap(_.endpoint.map(URI.create))
```
This prevents constructing invalid endpoints and decouples behavior from a brittle config assumption. Based on learnings.
If `credentialName` is indeed a URI in your deployments, please confirm and we can simplify accordingly.
🧹 Nitpick comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
41-41: Extract hardcoded retry limit as a named constant.
The retry limit 15 is hardcoded. Consider extracting it as a private constant (e.g., private val MaxMoveDedupRetries = 15) for clarity and maintainability.
```diff
+ private val MaxMoveDedupRetries = 15
+
  private def deleteWithRetry(sourcePath: Path, targetPath: Path, retryCount: Int = 0)(
      implicit ec: ExecutionContext): Fox[Unit] =
-   if (retryCount > 15) {
+   if (retryCount > MaxMoveDedupRetries) {
      Fox.failure(s"Deleting dataset failed: too many retries.")
```
app/models/dataset/Dataset.scala (2)
892-906: Consider checking both realpath and path for completeness.
The method currently only checks m.realpath to find datasets with mags in a directory. While this is correct for non-virtual datasets (filesystem paths), you might want to also check m.path for safety, especially if virtual datasets could have paths under the same prefix.
If needed, apply this diff to check both fields:
m.realpathto find datasets with mags in a directory. While this is correct for non-virtual datasets (filesystem paths), you might want to also checkm.pathfor safety, especially if virtual datasets could have paths under the same prefix.If needed, apply this diff to check both fields:
run(q""" SELECT d._id FROM webknossos.dataset_mags m JOIN webknossos.datasets d ON m._dataset = d._id - WHERE m.realpath IS NOT NULL - AND starts_with(m.realpath, $absolutePathWithTrailingSlash) + WHERE ( + (m.realpath IS NOT NULL AND starts_with(m.realpath, $absolutePathWithTrailingSlash)) + OR starts_with(m.path, $absolutePathWithTrailingSlash) + ) AND d._id != $ignoredDataset AND d._datastore = ${dataStore.name.trim} """.as[ObjectId])
1293-1306: Add NULL check for consistency and safetySimilar to the mags version, this method should filter out NULL paths to avoid potential issues with the string prefix check.
Apply this diff to add a NULL check:
run(q""" SELECT d._id FROM webknossos.dataset_layer_attachments a JOIN webknossos.datasets d ON a._dataset = d._id - WHERE starts_with(a.path, $absolutePathWithTrailingSlash) + WHERE a.path IS NOT NULL + AND starts_with(a.path, $absolutePathWithTrailingSlash) AND d._id != $ignoredDataset AND d._datastore = ${dataStore.name.trim} """.as[ObjectId])webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
1-7: Add tests for core URI forms.Cover:
- s3://bucket/key
- https://bucket.s3.amazonaws.com/key
- https://bucket.s3.us-west-2.amazonaws.com/key
- https://s3.us-west-2.amazonaws.com/bucket/key
- http://minio.local:9000/bucket/key
Do you want me to scaffold a ScalaTest suite for this?
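Something along these lines could work as a starting point. It assumes ScalaTest's AnyFlatSpec/Matchers, the project's Full box constructor, and the listed package paths; the expected values follow the semantics proposed in this review rather than the current implementation:

```scala
import java.net.URI
import com.scalableminds.util.tools.Full
import com.scalableminds.webknossos.datastore.helpers.S3UriUtils
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class S3UriUtilsTestSuite extends AnyFlatSpec with Matchers {

  "hostBucketFromUri" should "extract the bucket for all supported URI styles" in {
    S3UriUtils.hostBucketFromUri(new URI("s3://bucket/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("https://bucket.s3.amazonaws.com/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("https://bucket.s3.us-west-2.amazonaws.com/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("https://s3.us-west-2.amazonaws.com/bucket/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("http://minio.local:9000/bucket/key")) shouldBe Some("bucket")
  }

  "objectKeyFromUri" should "return the key without a leading slash" in {
    S3UriUtils.objectKeyFromUri(new URI("s3://bucket/some/key")) shouldBe Full("some/key")
    S3UriUtils.objectKeyFromUri(new URI("https://s3.us-west-2.amazonaws.com/bucket/some/key")) shouldBe Full("some/key")
  }
}
```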
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (3)
79-86: Fail fast on invalid/non-S3 paths; avoid aborting all buckets due to one bad path.
Currently, grouping by Option[String] may include None, and the first failure can abort the whole deletion.
```diff
- def deletePaths(paths: Seq[UPath])(implicit ec: ExecutionContext): Fox[Unit] = {
-   val pathsByBucket: Map[Option[String], Seq[UPath]] = paths.groupBy(S3UriUtils.hostBucketFromUpath)
-   for {
-     _ <- Fox.serialCombined(pathsByBucket.keys) { bucket: Option[String] =>
-       deleteS3PathsOnBucket(bucket, pathsByBucket(bucket))
-     }
-   } yield ()
- }
+ def deletePaths(paths: Seq[UPath])(implicit ec: ExecutionContext): Fox[Unit] = {
+   val (nonS3, s3Paths) = paths.partition(p => !p.getScheme.contains(PathSchemes.schemeS3))
+   if (nonS3.nonEmpty) return Fox.failure(s"Non-S3 paths passed to ManagedS3Service: ${nonS3.mkString(", ")}")
+
+   val grouped = s3Paths.groupBy(S3UriUtils.hostBucketFromUpath)
+   grouped.get(None).foreach { invalid =>
+     // surface all invalid paths instead of a generic failure
+     return Fox.failure(s"Could not determine S3 bucket for: ${invalid.mkString(", ")}")
+   }
+   val byBucket: Map[String, Seq[UPath]] = grouped.collect { case (Some(b), ps) => (b, ps) }
+   Fox.serialCombined(byBucket.keys.toSeq)(b => deleteS3PathsOnBucket(Some(b), byBucket(b))).map(_ => ())
+ }
```
This keeps valid buckets progressing even if a caller accidentally passes a bad path and also returns actionable error text.
75-78: `transferManagerBox` is unused.
Remove or use it (e.g., for future multipart/copy ops) to avoid dead code.
155-158: Speed up `pathIsInManagedS3` and limit to S3 credentials.
Avoid repeatedly parsing all credentials; precompute S3 roots and filter to S3 credentials only.
Example:
```scala
private lazy val managedS3Roots: Seq[UPath] =
  globalCredentials.collect {
    case S3AccessKeyCredential(name, _, _, _, _) if name.startsWith("s3://") =>
      UPath.fromStringUnsafe(name).normalize
  }

def pathIsInManagedS3(path: UPath): Boolean =
  path.getScheme.contains(PathSchemes.schemeS3) && managedS3Roots.exists(path.startsWith)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- app/controllers/WKRemoteDataStoreController.scala (2 hunks)
- app/models/dataset/Dataset.scala (3 hunks)
- app/models/dataset/DatasetService.scala (5 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/controllers/WKRemoteDataStoreController.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/Dataset.scala
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (6)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3): DataStoreConfig (11-78), DataVaults (61-63), S3Upload (64-68)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4): S3UriUtils (7-52), hostBucketFromUri (9-22), hostBucketFromUpath (24-28), objectKeyFromUri (40-49)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (8): UPath (55-97), toRemoteUriUnsafe (122-122), toRemoteUriUnsafe (199-199), getScheme (120-120), getScheme (197-197), fromString (60-60), startsWith (139-143), startsWith (209-213)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): CredentialConfigReader (8-64)
- util/src/main/scala/com/scalableminds/util/tools/Box.scala (1): collectFirst (614-644)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3): combined (82-93), fromFuture (40-51), successful (51-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2): PathUtils (14-221), ensureDirectory (116-120)
- app/models/dataset/WKRemoteDataStoreClient.scala (1): deleteOnDisk (122-129)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1): deleteOnDisk (413-428)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2): successful (51-54), failure (56-60)
app/models/dataset/Dataset.scala (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1): UPath (55-97)
- app/utils/sql/SqlInterpolation.scala (1): q (20-39)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2): flatten (269-279), toFox (12-12)
- util/src/main/scala/com/scalableminds/util/tools/Box.scala (1): toSingleBox (63-81)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
- util/src/main/scala/com/scalableminds/util/tools/Box.scala (1): Full (661-699)
app/models/dataset/DatasetService.scala (6)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1): UPath (55-97)
- app/models/annotation/Annotation.scala (2): findOne (289-296), countAllByDataset (644-656)
- app/models/storage/UsedStorageService.scala (2): UsedStorageService (33-227), refreshStorageReportForDataset (211-227)
- app/models/dataset/Dataset.scala (6): deleteDataset (709-737), findMagPathsUsedOnlyByThisDataset (871-892), findAttachmentPathsUsedOnlyByThisDataset (1278-1293), findDatasetsWithMagsInDir (892-906), findDatasetsWithAttachmentsInDir (1293-1306), findOne (203-210)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1): deletePaths (79-86)
- app/models/dataset/WKRemoteDataStoreClient.scala (2): deletePaths (134-141), deleteOnDisk (122-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (1)
app/models/dataset/Dataset.scala (1)
10-10: LGTM: Clean import simplification.
The shift from `DataSourceMagInfo` to `UPath` aligns with the path-centric refactoring for dataset deletion.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- app/controllers/DatasetController.scala (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
app/controllers/DatasetController.scala (5)
- app/telemetry/SlackNotificationService.scala (1): SlackNotificationService (11-59)
- util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (2): log (55-62), logTime (32-48)
- util/src/main/scala/com/scalableminds/util/time/Instant.scala (5): toString (15-15), Instant (14-45), Instant (47-103), now (48-48), logSince (72-75)
- app/utils/WkConf.scala (1): Features (135-155)
- app/models/dataset/DatasetService.scala (1): deleteDataset (511-538)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (1)
app/controllers/DatasetController.scala (1)
38-38: Slack notifier wiring verified and correct.
Import and DI of SlackNotificationService in DatasetController are confirmed. The service is properly defined, bound as an eager singleton in `WebknossosModule.scala`, and consistently used across the application (SqlClient, TaskCreationService, DatasetService, JobController, AnnotationController, and others).
Actionable comments posted: 2
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
428-436: Surface invalid/ignored paths to clients.
If DataSourceService rejects paths (non-managed S3 or outside baseDir), propagate a 400 with details instead of a generic failure/OK. This avoids silent skips. Consider returning per-path results.
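One possible shape for such per-path results, sketched here with invented names (not existing datastore API):
```scala
import play.api.libs.json.{Json, OFormat}
import play.api.mvc.Result
import play.api.mvc.Results.{BadRequest, Ok}

// Illustrative only: per-path outcome so rejected paths are reported, not silently skipped.
case class PathDeletionResult(path: String, deleted: Boolean, reason: Option[String] = None)

object PathDeletionResult {
  implicit val jsonFormat: OFormat[PathDeletionResult] = Json.format[PathDeletionResult]
}

object PathDeletionResponses {
  // Reply with 400 if anything was rejected, otherwise 200 with the full report.
  def toResponse(results: Seq[PathDeletionResult]): Result =
    if (results.exists(!_.deleted)) BadRequest(Json.toJson(results))
    else Ok(Json.toJson(results))
}
```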
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
439-449: Use ManagedS3Service bucket; centralize URI building.
Switching to managedS3Service.s3UploadBucketOpt is correct. Suggest moving endpoint host/URI construction into ManagedS3Service (e.g., a helper returning a UPath base) to avoid duplicating URI parsing here.
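A sketch of such a helper, assuming the bucket and an upload prefix are available inside ManagedS3Service (the names below are illustrative, not existing fields):
```scala
import com.scalableminds.webknossos.datastore.helpers.UPath

object ManagedS3PathHelper {
  // Illustrative: build the S3 upload target as a UPath so callers stop parsing URIs themselves.
  def s3UploadBaseUPath(uploadBucketOpt: Option[String], uploadPrefix: String): Option[UPath] =
    uploadBucketOpt.map(bucket => UPath.fromStringUnsafe(s"s3://$bucket/$uploadPrefix"))
}
```
UploadService could then resolve dataset-specific keys below this base path instead of re-deriving host and bucket from `credentialName`.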
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- app/controllers/WKRemoteDataStoreController.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
🧰 Additional context used
🧬 Code graph analysis (4)
app/controllers/WKRemoteDataStoreController.scala (1)
- app/models/dataset/DatasetService.scala (1): deleteDatasetFromDB (548-562)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5): Fox (28-228), Fox (230-303), serialCombined (93-97), serialCombined (97-109), toFox (12-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1): isLocal (30-30)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (2): pathIsInManagedS3 (155-159), deletePaths (79-86)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2): PathUtils (14-221), deleteDirectoryRecursively (184-191)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1): deletePaths (428-437)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (8)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (4): validateAccessFromTokenContext (87-94), UserAccessRequest (30-30), UserAccessRequest (31-72), webknossos (70-72)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1): dataBaseDir (146-146)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): getDataSource (190-197)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1): deleteOnDisk (14-36)
- app/models/dataset/WKRemoteDataStoreClient.scala (2): deleteOnDisk (122-129), deletePaths (134-141)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1): deletePaths (79-86)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): validateJson (204-209)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1): deletePathsFromDiskOrManagedS3 (300-311)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2): PathUtils (14-221), ensureDirectoryBox (122-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (12)
app/controllers/WKRemoteDataStoreController.scala (3)
6-6: LGTM! Clean import removal.
The `Full` import is no longer needed in this controller since the Box pattern matching has moved to `DatasetService.deleteDatasetFromDB`.
38-51: Excellent refactoring – proper separation of concerns.
Removing `annotationDAO` from the controller is correct since the annotation-count logic now resides in `DatasetService.deleteDatasetFromDB`. This improves maintainability by centralizing deletion logic in the service layer.
202-212: Well-architected delegation to service layer.
The refactoring correctly moves deletion orchestration to `DatasetService.deleteDatasetFromDB`, which now handles dataset lookup, annotation counting, conditional deletion, and storage refresh. The idempotent behavior (returning success for non-existent datasets) aligns with HTTP DELETE semantics.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
27-32: DI addition looks good.
The ManagedS3Service injection integrates cleanly with existing services.
313-314: Tiny nit: helper is fine.
existsOnDisk is straightforward and used correctly in refresh path.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
90-94: baseDirAbsolute endpoint: OK.
Scoped to webknossos context; returning a path string is acceptable.
415-426: deleteOnDisk flow simplification: OK.
Fetching the data source first and delegating to the service is cleaner; auth via webknossos context matches WK→DS usage.
Confirm no external clients rely on the previous deleteDataset(datasetId) permission here.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
23-24: Imports updated for ManagedS3Service: OK.
100-103: Constructor DI: OK.
ManagedS3Service centralization improves cohesion and config handling.
146-147: PathUtils.ensureDirectoryBox usage: good.
Better error mapping vs direct ensureDirectory.
541-548: S3 TransferManager via service: good.
Clear failure if S3 not configured; checks failedTransfers. LGTM.
582-594: Failure cleanup argument order fix: OK.
deleteOnDisk now receives datasetId first; matches signature.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala
...nossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
Some feedback :D
Testing results:
Delete non-virtual datasets, should run through if paths are unique.
Other datasets symlinking into this one should block the deletion
Looks good 👍
Data on S3 buckets not listed with global credentials should never be deleted.
Also seemed good
Delete virtual datasets, should run through. Other datasets referencing the data should still have it afterwards.
I am having trouble setting up a scenario where an S3 mag is shared.
I enabled uploading to S3 and made a successful upload. I then used the new S3 path + credentials for explore-remote, and this also worked. But the dataset is now not rendered 🤔.
The server console says:
backend: 2025-10-29 14:27:27,600 [error] com.scalableminds.webknossos.datastore.services.DSDatasetErrorLoggingService - Error while loading bucket for layer 2-2-1 at BucketPosition(voxelMag1 at (32, 0, 160), bucket at (1,0,5), mag(1, 1, 1), additional coordinates: ), cuboid: Cuboid((32, 0, 160) / (1, 1, 1),32,32,32) for <path removed by me>: Could not read header at zarr.json <~ Failed to read from vault path <~ The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 1TR63AYS5G6NZF1Y, Extended Request ID: <truncated by myself>) (SDK Attempt Count: 1) Stack trace: software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: <truncated by myself>) (SDK Attempt Count: 1)
The rest of the testing went well
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
...nossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala
Oh and I forgot to mention that not all S3 is cleared: the cumsum.json is still there.
Actionable comments posted: 2
♻️ Duplicate comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
39-55: Remove `@tailrec` annotation.
The `@tailrec` annotation cannot be satisfied because the recursive call (line 52) is inside a try-catch block, preventing tail-call optimization. This was already flagged in a previous review.
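If keeping the annotation were preferred instead, one option (a sketch only, with reconstructed names; the real method returns a Fox) is to confine the exception handling to the move itself so the recursive call stays in tail position:
```scala
import java.nio.file.{Files, Path}
import scala.annotation.tailrec
import scala.util.Try

object TrashMoveSketch {
  // Retry the move with a " (n)" suffix; the recursive call is in tail position.
  // Simplification: retries on any move failure, not only "target already exists".
  @tailrec
  def moveToTrashWithRetry(sourcePath: Path, targetPath: Path, retryCount: Int = 0): Either[String, Path] =
    if (retryCount > 15) Left("Deleting dataset failed: too many retries.")
    else {
      val candidate =
        if (retryCount == 0) targetPath
        else targetPath.resolveSibling(s"${targetPath.getFileName} ($retryCount)")
      if (Try(Files.move(sourcePath, candidate)).isSuccess) Right(candidate)
      else moveToTrashWithRetry(sourcePath, targetPath, retryCount + 1)
    }
}
```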
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
300-306: Lock down local deletions to the datastore base directory.
Any caller can submit a `file://` UPath pointing outside the datastore root, and we will happily recurse-delete that absolute path. From the new DELETE endpoint this lets a remote user wipe arbitrary directories (even `/`) on the datastore host. We must validate each local path against `dataBaseDir`, reject anything else, and only then delete either the directory tree or single files. Also fail fast on non-managed remote paths instead of silently skipping them.
Please apply a guard like the following:
```diff
 def deletePathsFromDiskOrManagedS3(paths: Seq[UPath]): Fox[Unit] = {
-    val localPaths = paths.filter(_.isLocal).flatMap(_.toLocalPath)
-    val managedS3Paths = paths.filter(managedS3Service.pathIsInManagedS3)
-    for {
-      _ <- Fox.serialCombined(localPaths)(PathUtils.deleteDirectoryRecursively(_).toFox)
-      _ <- managedS3Service.deletePaths(managedS3Paths)
-    } yield ()
+    val (localCandidates, remoteCandidates) = paths.partition(_.isLocal)
+    val managedS3Paths = remoteCandidates.filter(managedS3Service.pathIsInManagedS3)
+    val disallowedRemote = remoteCandidates.diff(managedS3Paths)
+    val baseDir = dataBaseDir.toAbsolutePath.normalize
+
+    for {
+      _ <- Fox.fromBool(disallowedRemote.isEmpty) ?~>
+        s"Refusing to delete non-managed remote paths: ${disallowedRemote.mkString(", ")}"
+      validatedLocal <- Fox.serialCombined(localCandidates) { upath =>
+        for {
+          local <- upath.toLocalPath.toFox
+          abs = local.toAbsolutePath.normalize
+          _ <- Fox.fromBool(abs.startsWith(baseDir)) ?~>
+            s"Refusing to delete outside datastore base directory ($baseDir): $abs"
+        } yield abs
+      }
+      _ <- Fox.serialCombined(validatedLocal) { abs =>
+        if (Files.isDirectory(abs)) PathUtils.deleteDirectoryRecursively(abs).toFox
+        else tryo(Files.deleteIfExists(abs)).map(_ => ()).toFox
+      }
+      _ <- managedS3Service.deletePaths(managedS3Paths)
+    } yield ()
   }
```
This matches the earlier review feedback but is still unresolved, so please fix before release.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
39-72: Stop interpreting `credentialName` as an S3 URI.
`Datastore.S3Upload.credentialName` is configured as a plain credential ID (e.g. "managed-s3"). The new code assumes it is an S3 URI, immediately parsing it with `new URI(...)` and forcing an endpoint override based on that host. On existing installations this throws `URISyntaxException`, so `s3ClientBox` becomes a Failure and all managed S3 deletions stop working. Even if the parse succeeded, reinterpreting the name changes semantics and breaks compatibility.
Please derive bucket/endpoint/region from the actual `S3AccessKeyCredential` metadata (the optional `region` and `endpoint` fields) and only override the client when those optional fields are populated. The field should remain a plain identifier. For example:
```diff
-  private lazy val s3UploadCredentialsOpt: Option[(String, String)] =
-    dataStoreConfig.Datastore.DataVaults.credentials.flatMap { credentialConfig =>
-      new CredentialConfigReader(credentialConfig).getCredential
-    }.collectFirst {
-      case S3AccessKeyCredential(credentialName, accessKeyId, secretAccessKey, _, _)
-          if dataStoreConfig.Datastore.S3Upload.credentialName == credentialName =>
-        (accessKeyId, secretAccessKey)
-    }
-
-  lazy val s3UploadBucketOpt: Option[String] =
-    S3UriUtils.hostBucketFromUri(new URI(dataStoreConfig.Datastore.S3Upload.credentialName))
-
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI("https", null, credentialUri.getHost, -1, null, null, null)
-  }
+  private lazy val s3UploadCredentialOpt: Option[S3AccessKeyCredential] =
+    dataStoreConfig.Datastore.DataVaults.credentials
+      .flatMap(new CredentialConfigReader(_).getCredential)
+      .collectFirst {
+        case cred @ S3AccessKeyCredential(name, _, _, _, _)
+            if dataStoreConfig.Datastore.S3Upload.credentialName == name => cred
+      }
+
+  private lazy val s3UploadCredentialsOpt: Option[(String, String)] =
+    s3UploadCredentialOpt.map(cred => (cred.identifier, cred.secret))
+
+  private lazy val s3EndpointOverrideOpt: Option[URI] =
+    s3UploadCredentialOpt.flatMap(_.endpoint).flatMap(ep => tryo(new URI(ep)).toOption)
+
+  private lazy val s3Region: Region =
+    s3UploadCredentialOpt.flatMap(_.region).map(Region.of).getOrElse(Region.US_EAST_1)
```
And in the builder only call `endpointOverride`/`forcePathStyle` when `s3EndpointOverrideOpt` is defined:
```diff
-      .crossRegionAccessEnabled(true)
-      .forcePathStyle(true)
-      .endpointOverride(s3UploadEndpoint)
-      .region(Region.US_EAST_1)
+      .crossRegionAccessEnabled(true)
+      .region(s3Region)
+    s3EndpointOverrideOpt.foreach { endpoint =>
+      builder.endpointOverride(endpoint)
+      builder.forcePathStyle(true)
+    }
```
This keeps the existing configuration contract intact and avoids runtime failures.
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
47-47: Consider using the `s` interpolator instead of `f`.
The `f` interpolator is intended for formatted strings with format specifiers (like `%d`, `%s`). Without format specifiers, `s` is more appropriate and avoids unnecessary overhead.
Apply this diff:
```diff
-      if (retryCount == 0) targetPath else targetPath.resolveSibling(f"${targetPath.getFileName} ($retryCount)")
+      if (retryCount == 0) targetPath else targetPath.resolveSibling(s"${targetPath.getFileName} ($retryCount)")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
Applied to files:
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-01-27T12:06:42.865Z
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
🧬 Code graph analysis (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2): PathUtils (14-221), ensureDirectory (116-120)
- util/src/main/scala/com/scalableminds/util/tools/Box.scala (1): Full (661-699)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1): deleteOnDisk (413-428)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2): successful (51-54), failure (56-60)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5): Fox (28-228), Fox (230-303), serialCombined (93-97), serialCombined (97-109), toFox (12-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (3): isLocal (29-29), toLocalPath (111-111), toLocalPath (176-176)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (2): pathIsInManagedS3 (153-157), deletePaths (79-86)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2): PathUtils (14-221), deleteDirectoryRecursively (184-191)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1): deletePaths (428-437)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1): dataBaseDir (146-146)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (6)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3): DataStoreConfig (11-75), DataVaults (58-60), S3Upload (61-65)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1): PathSchemes (3-9)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4): S3UriUtils (7-52), hostBucketFromUri (9-22), hostBucketFromUpath (24-28), objectKeyFromUri (40-49)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (6): UPath (54-96), toRemoteUriUnsafe (121-121), toRemoteUriUnsafe (188-188), fromString (59-59), startsWith (138-142), startsWith (198-205)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): CredentialConfigReader (8-64)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4): toFox (12-12), combined (82-93), fromFuture (40-51), successful (51-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)
2-11: LGTM! Clean refactor of trait dependencies.
The trait signature and imports have been simplified appropriately. Removing `DataSourceToDiskWriter` and adding `PathUtils` aligns well with the new trash-based deletion approach.
24-36: LGTM! Idempotent deletion with clear logging.
The existence check and error handling are correct. Returning `Fox.successful(())` when the path doesn't exist makes the deletion idempotent, which is the right behavior. The logging clearly distinguishes between actual deletion and no-op cases.
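As a schematic illustration of that guard (simplified; the real code returns a Fox and logs):
```scala
import java.nio.file.{Files, Path}
import scala.util.Try

object IdempotentDeleteSketch {
  // A missing dataset directory is treated as success instead of an error.
  def deleteOnDiskIdempotently(dataSourcePath: Path, trashPath: Path): Boolean =
    if (!Files.exists(dataSourcePath)) true // already gone: no-op success
    else Try(Files.move(dataSourcePath, trashPath)).isSuccess
}
```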
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
I would have expected that cumsum.json would not have been uploaded to S3 in the first place, as it was not referenced in the datasource-properties.json 🤔 But yes, I think it's acceptable for the moment that it is not deleted. It is also not very large.
Ok, then no need for me to do this again, as this takes some time?
Oh I see, the error states
This means that the file does not exist, which is always what the error says.
Therefore, I think the root cause is that the exploration thinks it found a valid layer, but this was actually a mag. Thus, the datasource stored in the DB is buggy, and loading the dataset does not work properly.
But this exploration bug is very likely not related to this PR. So I'd say we can make this green, as I did a little retesting showing that it still works, and your latest changes look good?
Yes, I agree :) Also, no need to re-test the linked mags in virtual datasets, I did that today
So, as this bug is unrelated, I opened #9034.
And as the code is good and doing what it should be doing 🟢


Steps to test:
TODOs:
Issues:
`$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)