
Conversation


@frcroth frcroth commented Sep 15, 2025

  • Enables dataset deletion for virtual datasets with data stored on our managed S3.
  • For non-virtual datasets, only the directory in the binaryData directory is deleted.
  • If other datasets have realpaths in that directory, deletion is blocked. (Previously there was a move-and-resymlink logic, which is now removed).

Steps to test:

  • Deleting virtual datasets should run through. Other datasets referencing the same data should still have it afterwards.
  • Deleting non-virtual datasets should run through if their paths are unique.
    • Other datasets symlinking into this one should block the deletion.
  • Enable S3 upload with credentials; datasets should be deletable afterwards.
  • Data on S3 buckets not listed in the global credentials should never be deleted.

TODOs:

  • on datastore side, delete local and managedS3 paths
  • delete virtual datasets (paths used only by them)
  • delete non-virtual datasets (unless any other dataset’s paths point in there)
  • Allow manually deleting “Not yet fully uploaded” datasets (be it via publish or via upload)
  • Re-test delete as cleanup on failed dataset uploads
  • Re-test delete with S3 paths
  • cleanup

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment


coderabbitai bot commented Sep 15, 2025

📝 Walkthrough


Centralizes dataset deletion into a new DatasetService.deleteDataset flow that deletes data from disk and managed S3, adds ManagedS3Service and S3 utilities, refactors path-centric DAOs/clients/routes (including DELETE-with-JSON), updates controllers/routes/upload and datastore services, and tightens UPath behavior and tests.

Changes

Cohort / File(s) Summary
Dataset controller & routes
app/controllers/DatasetController.scala, conf/webknossos.latest.routes, frontend/javascripts/admin/rest_api.ts
Renamed deleteOnDisk → delete with route /datasets/:datasetId; added finishUploadToPaths; frontend DELETE now targets /api/datasets/${datasetId}; controller adds logging, Slack instrumentation and calls datasetService.deleteDataset.
Dataset service & DAOs
app/models/dataset/DatasetService.scala, app/models/dataset/Dataset.scala
Replaced deleteVirtualOrDiskDataset with deleteDataset; added dependencies (DatasetLayerAttachmentsDAO, AnnotationDAO, UsedStorageService); added path-based DAO methods and helpers findDatasetsUsingDataFromDir / deleteDatasetFromDB; updated mags/attachments handling.
WK remote datastore client & controller
app/controllers/WKRemoteDataStoreController.scala, app/models/dataset/WKRemoteDataStoreClient.scala, webknossos-datastore/conf/datastore.latest.routes, webknossos-datastore/app/.../controllers/DataSourceController.scala
Removed getPaths and AnnotationDAO injection; added client getBaseDirAbsolute and DELETE-with-JSON support; added controller endpoints baseDirAbsolute and deletePaths; controller delegates path deletion to DataSourceService.deletePathsFromDiskOrManagedS3 and simplifies delete-on-disk flow.
Managed S3 & S3 utilities
webknossos-datastore/app/.../services/ManagedS3Service.scala, webknossos-datastore/app/.../helpers/S3UriUtils.scala, webknossos-datastore/app/.../datavault/S3DataVault.scala
Introduced ManagedS3Service for grouped/batched deletions and managed-S3 checks; added S3UriUtils for S3 URI parsing; refactored S3DataVault internals to use S3UriUtils.
Datastore RPC & clients
webknossos-datastore/app/.../rpc/RPCRequest.scala, webknossos-datastore/app/.../services/DSRemoteWebknossosClient.scala
Added RPCRequest.deleteJson[T] (DELETE with JSON body); removed fetchPaths from DSRemoteWebknossosClient.
Dataset deletion helper & UPath changes
webknossos-datastore/app/.../helpers/DatasetDeleter.scala, webknossos-datastore/app/.../helpers/UPath.scala
Simplified DatasetDeleter trait (removed mixin), changed deleteOnDisk signature to take datasetId first and implement move-to-trash with retry; narrowed RemoteUPath.parent type and made startsWith compare path semantics.
DataSource service & upload
webknossos-datastore/app/.../services/DataSourceService.scala, webknossos-datastore/app/.../services/uploading/UploadService.scala, webknossos-datastore/app/.../services/DataSourceToDiskWriter.scala
DataSourceService gains managedS3Service, deletePathsFromDiskOrManagedS3, and existsOnDisk; UploadService now uses ManagedS3Service and removes legacy S3 client logic; DataSourceToDiskWriter no longer mixes in PathUtils.
Storage service & utilities
app/models/storage/UsedStorageService.scala, util/src/main/scala/com/scalableminds/util/io/PathUtils.scala, util/src/main/scala/com/scalableminds/util/tools/Fox.scala
UsedStorageService removed DatasetService dependency and now resolves datastores via DAO; PathUtils converted from trait to object; removed redundant Fox imports.
Datastore DI & routes
webknossos-datastore/app/.../DataStoreModule.scala, webknossos-datastore/conf/datastore.latest.routes
Added eager singleton binding for ManagedS3Service; added GET baseDirAbsolute and DELETE deletePaths routes; removed previous invalidateCache route.
Tests & release notes
test/backend/UPathTestSuite.scala, unreleased_changes/8924.md, frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts
Strengthened UPath.startsWith tests; added release note about managed-S3 deletions and symlink-blocking behavior; removed dataset paths E2E test.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Extra attention:
    • DatasetService.deleteDataset / deleteDatasetFromDB: ordering of datastore path deletion vs DB deletion, shared-directory detection, and annotation handling.
    • ManagedS3Service: credential parsing, S3 client/transfer manager lifecycle, pagination and batched delete correctness.
    • DAO SQL/query changes (DatasetMagsDAO / DatasetLayerAttachmentsDAO) for correct path discovery and deduplication.
    • DatasetDeleter.deleteOnDisk retry semantics and interaction with PathUtils move-to-trash.
    • RPCRequest.deleteJson and clients/controllers relying on DELETE-with-body.
    • DI wiring for new constructor params and eager binding.

Possibly related PRs

Suggested reviewers

  • MichaelBuessemeyer

Poem

🐰 I hopped through buckets, keys, and trash,
Nudged tangled symlinks into a stash.
With retries, logs, and a nimble paw,
I cleared disk and S3 without a flaw.
A carrot cheer for tidy data law!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title Check — ✅ Passed: The title "Delete datasets on S3" directly and clearly describes the main change in this pull request. The changeset introduces infrastructure to enable dataset deletion from managed S3 storage, which is precisely what the title conveys. The title is concise, specific, and avoids vague terminology, making it immediately clear to a developer scanning the history what the primary objective is.
  • Linked Issues Check — ✅ Passed: The code changes address the primary objective from issue #8803: enabling the "Delete on disk" action to also remove data from the project's managed S3 storage. Key implementations include the new ManagedS3Service with pathIsInManagedS3 validation to restrict deletion to managed S3 only, updated DatasetService.deleteDataset orchestrating virtual dataset deletion, DataSourceService.deletePathsFromDiskOrManagedS3 partitioning paths between local and S3 deletions, and new DAO methods (findMagPathsUsedOnlyByThisDataset, findDatasetsWithMagsInDir, etc.) to support conflict checking for non-virtual datasets. The route changes and endpoint additions properly expose the new functionality.
  • Out of Scope Changes Check — ✅ Passed: The changes are cohesive and support the PR objective of enabling S3 deletion with proper safeguards. Key in-scope modifications include new S3 utilities (S3UriUtils), ManagedS3Service for managed S3 interaction, updated DatasetDeleter to work with the new deletion approach, DAO methods for conflict detection, and endpoint changes. The removal of legacy getPaths endpoints and move-and-resymlink logic aligns with the intentional refactoring described in the PR. Minor refactorings such as the PathUtils trait-to-object conversion appear necessary for integration with the new deletion infrastructure. All changes directly support or enable the S3 deletion feature.

@frcroth frcroth assigned frcroth and unassigned frcroth Sep 17, 2025
@fm3 fm3 self-assigned this Sep 23, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)

86-97: Consider parallelizing prefix deletion.

Line 92 uses Fox.serialCombined to process deletions sequentially. For datasets with many prefixes, this could be slow. Consider whether parallel processing (e.g., Fox.combined) would be more appropriate, especially since S3 operations can benefit from concurrency.
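The tradeoff is language-agnostic; as a rough sketch (Python asyncio standing in for Fox, with a hypothetical `delete_prefix` as the batched S3 prefix deletion), sequential awaiting mirrors `Fox.serialCombined` while a gather-style variant mirrors `Fox.combined`:

```python
# Sketch only: asyncio stands in for Fox; `delete_prefix` is a hypothetical
# stand-in for one batched S3 prefix deletion.
import asyncio

async def delete_prefix(prefix: str, log: list) -> None:
    await asyncio.sleep(0)  # placeholder for the actual S3 round trip
    log.append(prefix)

async def delete_serial(prefixes: list, log: list) -> None:
    # Fox.serialCombined style: one prefix at a time, fail-fast on first error.
    for p in prefixes:
        await delete_prefix(p, log)

async def delete_concurrent(prefixes: list, log: list) -> None:
    # Fox.combined style: all prefixes in flight at once; S3 tolerates
    # concurrent deletes well, so this can be much faster for many prefixes.
    await asyncio.gather(*(delete_prefix(p, log) for p in prefixes))

serial_log: list = []
concurrent_log: list = []
asyncio.run(delete_serial(["a/", "b/", "c/"], serial_log))
asyncio.run(delete_concurrent(["a/", "b/", "c/"], concurrent_log))
```

Note the concurrent variant loses the deterministic completion order, which only matters if deletion order is significant.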

frontend/javascripts/admin/rest_api.ts (1)

1381-1385: LGTM! Endpoint updated to match backend route change.

The URL change from /api/datasets/${datasetId}/deleteOnDisk to /api/datasets/${datasetId} correctly aligns with the backend controller refactoring where deleteOnDisk was renamed to delete.

Consider renaming the function from deleteDatasetOnDisk to deleteDataset in a follow-up to match the backend's naming and better reflect that it now handles both on-disk and S3 deletions.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)

304-315: Consider error handling for partial failures.

The method correctly splits paths into local and managed S3, then processes each group. However, Fox.serialCombined will fail-fast on the first error for local paths, and if local deletions succeed but S3 deletions fail (or vice versa), the operation will be partially complete.

For idempotent cleanup operations this may be acceptable, but consider whether you need:

  • Parallel local path deletion for performance
  • All-or-nothing semantics with rollback
  • Aggregated error reporting showing all failures

If the current fail-fast behavior is intended, you might want to document this in a comment or method documentation.
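As a sketch of the alternatives (Python; `partition_paths` and `delete_collecting_errors` are hypothetical names, not the actual Scala API), the split plus an aggregated-error variant could look like:

```python
def partition_paths(paths, is_managed_s3):
    # Mirror of the split in deletePathsFromDiskOrManagedS3: local paths vs.
    # paths that live in the project's managed S3 (allowlist-checked).
    local = [p for p in paths if not p.startswith("s3://")]
    managed = [p for p in paths if p.startswith("s3://") and is_managed_s3(p)]
    return local, managed

def delete_collecting_errors(paths, delete_fn):
    # Aggregated error reporting: keep going past failures and report them
    # all at the end, instead of failing fast on the first one.
    errors = []
    for p in paths:
        try:
            delete_fn(p)
        except OSError as exc:
            errors.append((p, exc))
    return errors

def _failing_delete(p):
    # Demo delete that fails for one path.
    if p == "/bad":
        raise OSError("permission denied")

local, managed = partition_paths(["/data/ds1", "s3://bkt/ds2"], lambda p: True)
errors = delete_collecting_errors(["/ok", "/bad", "/ok2"], _failing_delete)
```

For idempotent cleanup, fail-fast plus retry is often simpler than rollback; the aggregated variant mainly improves error visibility.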

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

438-448: Endpoint host derivation may drop port and assumes credentialName is a URI.

Using new URI(...).getHost loses ports (e.g., MinIO on :9000) and returns null if credentialName isn’t a URI literal.

Consider:

  • Parse authority instead of host to retain host:port.
  • Guard nulls and fail fast with a clear error.
    Example:
-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
-            newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+            endpointUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
+            authority = Option(endpointUri.getAuthority).getOrElse(endpointUri.getHost)
+            newBasePath <- UPath.fromString(s"s3://$authority/$s3UploadBucket/$s3ObjectKey").toFox

Confirm Datastore.S3Upload.credentialName is always a valid URI and whether non-default ports must be preserved in UPath for your downstream S3 handlers.
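The host-vs-authority distinction is easy to reproduce outside the JVM; Python's `urllib.parse` shows the same behavior as `java.net.URI` (`hostname` ≈ `getHost`, `netloc` ≈ `getAuthority`):

```python
from urllib.parse import urlparse

# A MinIO-style endpoint with a non-default port (made-up hostname).
endpoint = urlparse("https://minio.example.com:9000")
host = endpoint.hostname    # the port is silently dropped, like URI.getHost
authority = endpoint.netloc  # host:port preserved, like URI.getAuthority
```

Building the `s3://` UPath from `host` would therefore lose `:9000`, while `authority` keeps it.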

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

428-435: Consider logging and basic input guards for deletePaths.

Add a brief info log (count and first path) and short-circuit empty lists to reduce no-op calls and aid auditing.

   def deletePaths(): Action[Seq[UPath]] =
     Action.async(validateJson[Seq[UPath]]) { implicit request =>
       accessTokenService.validateAccessFromTokenContext(UserAccessRequest.webknossos) {
         for {
-          _ <- dataSourceService.deletePathsFromDiskOrManagedS3(request.body)
+          _ <- if (request.body.isEmpty) Fox.successful(()) else {
+                 logger.info(s"Deleting ${request.body.size} path(s), first=${request.body.headOption.getOrElse("")}")
+                 dataSourceService.deletePathsFromDiskOrManagedS3(request.body)
+               }
         } yield Ok
       }
     }
app/models/dataset/Dataset.scala (3)

918-931: Robustness: prefer COALESCE(realPath, path) for prefix search to handle missing realPath.

If realPath is NULL, rows are skipped even if path matches.

-        WHERE starts_with(m.realpath, $absolutePathWithTrailingSlash)
+        WHERE starts_with(COALESCE(m.realpath, m.path), $absolutePathWithTrailingSlash)

Ensure your Postgres version supports starts_with(text, text). If not, replace with path LIKE $prefix || '%' or a range predicate.
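The intent of the COALESCE can be sketched outside SQL (Python; `starts_with_coalesced` is a hypothetical helper mirroring the WHERE clause):

```python
def starts_with_coalesced(realpath, path, prefix):
    # COALESCE(realpath, path): fall back to `path` when realpath is NULL,
    # so rows without a resolved realpath still match the prefix search.
    effective = realpath if realpath is not None else path
    return effective.startswith(prefix)
```

Without the fallback, a row with `realpath IS NULL` silently drops out of the shared-directory check even when its `path` sits inside the directory being deleted.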


1303-1317: Attachments-only paths query: consider excluding pending uploads.

If pending attachments exist, their paths may not be physically present yet.

Add AND NOT uploadToPathIsPending to mirror other reads, unless you deliberately want to delete pending targets too.


1273-1302: Performance: add indexes for path-prefix lookups.

Prefix scans on dataset_mags.realPath/path and dataset_layer_attachments.path will benefit from indexes.

  • Create btree indexes with text_pattern_ops or consider pg_trgm for broader LIKE usage:
    • CREATE INDEX ON webknossos.dataset_mags (realPath text_pattern_ops);
    • CREATE INDEX ON webknossos.dataset_layer_attachments (path text_pattern_ops);
      Evaluate with EXPLAIN on your typical prefixes.

Also applies to: 1318-1331

app/models/dataset/DatasetService.scala (1)

537-544: Skip remote call when there’s nothing to delete.

Avoid a no-op network roundtrip for virtual datasets when no unique paths exist.

-          _ <- datastoreClient.deletePaths(pathsUsedOnlyByThisDataset)
+          _ <- Fox.runIf(pathsUsedOnlyByThisDataset.nonEmpty) {
+            datastoreClient.deletePaths(pathsUsedOnlyByThisDataset)
+          }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e96e381 and 376ef47.

📒 Files selected for processing (24)
  • app/controllers/DatasetController.scala (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • app/models/dataset/Dataset.scala (3 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • app/models/dataset/WKRemoteDataStoreClient.scala (1 hunks)
  • app/models/storage/UsedStorageService.scala (1 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • frontend/javascripts/admin/rest_api.ts (1 hunks)
  • test/backend/UPathTestSuite.scala (1 hunks)
  • unreleased_changes/8924.md (1 hunks)
  • util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (1 hunks)
  • util/src/main/scala/com/scalableminds/util/tools/Fox.scala (0 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceToDiskWriter.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6 hunks)
  • webknossos-datastore/conf/datastore.latest.routes (1 hunks)
💤 Files with no reviewable changes (1)
  • util/src/main/scala/com/scalableminds/util/tools/Fox.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/Dataset.scala
🧬 Code graph analysis (14)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceToDiskWriter.scala (2)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • runIf (167-176)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • ensureDirectoryBox (122-130)
app/models/storage/UsedStorageService.scala (1)
app/models/dataset/DataStore.scala (1)
  • findOneByName (118-125)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
  • Full (661-699)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
  • Fox (28-228)
  • Fox (230-303)
  • serialCombined (93-97)
  • serialCombined (97-109)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • isLocal (30-30)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (2)
  • pathIsInManagedS3 (153-157)
  • deletePaths (77-84)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • deleteDirectoryRecursively (184-191)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deletePaths (428-437)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4)
  • S3UriUtils (7-50)
  • hostBucketFromUri (9-20)
  • objectKeyFromUri (38-47)
  • isNonAmazonHost (47-50)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/VaultPath.scala (1)
  • toRemoteUriUnsafe (75-78)
test/backend/UPathTestSuite.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (4)
  • UPath (55-97)
  • fromStringUnsafe (62-79)
  • startsWith (139-143)
  • startsWith (209-213)
app/models/dataset/WKRemoteDataStoreClient.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (14)
  • addQueryParam (30-33)
  • addQueryParam (33-36)
  • addQueryParam (36-39)
  • addQueryParam (39-42)
  • addQueryParam (42-45)
  • addQueryParam (45-50)
  • addQueryParam (50-53)
  • addQueryParam (53-56)
  • addQueryParam (56-59)
  • addQueryParam (59-63)
  • addQueryParam (63-68)
  • addQueryParam (68-71)
  • getWithJsonResponse (132-135)
  • deleteJson (238-242)
app/controllers/UserTokenController.scala (1)
  • RpcTokenHolder (31-39)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • deletePaths (77-84)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deletePaths (428-437)
app/models/dataset/DatasetService.scala (9)
app/models/annotation/Annotation.scala (2)
  • findOne (289-296)
  • countAllByDataset (644-656)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (33-227)
  • refreshStorageReportForDataset (211-227)
app/models/dataset/Dataset.scala (6)
  • deleteDataset (709-737)
  • findPathsUsedOnlyByThisDataset (899-918)
  • findPathsUsedOnlyByThisDataset (1303-1318)
  • findDatasetsWithMagsInDir (918-931)
  • findDatasetsWithAttachmentsInDir (1318-1331)
  • findOne (203-210)
app/controllers/WKRemoteDataStoreController.scala (1)
  • deleteDataset (208-215)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • deletePaths (77-84)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
  • deletePaths (134-141)
  • deleteOnDisk (122-129)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (8)
  • toFox (12-12)
  • Fox (28-228)
  • Fox (230-303)
  • s (234-238)
  • s (238-248)
  • s (248-257)
  • shiftBox (310-310)
  • successful (51-54)
app/controllers/DatasetController.scala (5)
util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1)
  • log (55-62)
app/security/UserAwareRequestLogging.scala (1)
  • log (13-21)
app/models/user/User.scala (2)
  • isAdminOf (56-59)
  • isAdminOf (59-62)
app/utils/WkConf.scala (1)
  • Features (135-155)
app/models/dataset/DatasetService.scala (1)
  • deleteDataset (533-559)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-78)
  • DataVaults (61-63)
  • S3Upload (64-68)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1)
  • PathSchemes (3-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4)
  • S3UriUtils (7-50)
  • hostBucketFromUri (9-20)
  • hostBucketFromUpath (22-26)
  • objectKeyFromUri (38-47)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (8)
  • UPath (55-97)
  • toRemoteUriUnsafe (122-122)
  • toRemoteUriUnsafe (199-199)
  • getScheme (120-120)
  • getScheme (197-197)
  • fromString (60-60)
  • startsWith (139-143)
  • startsWith (209-213)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • ensureDirectory (116-120)
app/models/dataset/WKRemoteDataStoreClient.scala (1)
  • deleteOnDisk (122-129)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deleteOnDisk (413-428)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • successful (51-54)
  • failure (56-60)
app/models/dataset/Dataset.scala (3)
app/utils/sql/SqlInterpolation.scala (1)
  • q (20-39)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
  • toSingleBox (63-81)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • DSRemoteWebknossosClient (57-216)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • ManagedS3Service (28-157)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • ensureDirectoryBox (122-130)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (14-36)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/DatasetService.scala (1)
  • deleteDatasetFromDB (569-584)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (1)

238-242: LGTM! Well-structured addition.

The new deleteJson method follows the established pattern of other JSON-based HTTP methods in this file (postJson, putJson, patchJson), ensuring consistency and maintainability.

app/models/dataset/WKRemoteDataStoreClient.scala (2)

129-132: Verify caching behavior is intended.

The lazy val means the base directory path will be cached for the lifetime of this WKRemoteDataStoreClient instance. If the datastore's base directory configuration can change at runtime, this could return stale data. Consider whether a def would be more appropriate, or if the caching is intentional (e.g., base directory is truly static).
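The caching concern can be seen in a small sketch (Python's `cached_property` behaves like a Scala `lazy val` here; the `Client` class and paths are made up):

```python
from functools import cached_property

class Client:
    def __init__(self):
        self.remote_calls = 0

    @cached_property
    def base_dir(self):
        # Like a Scala `lazy val`: the remote lookup runs once, then the
        # result is cached for the lifetime of this client instance.
        self.remote_calls += 1
        return "/srv/binaryData"

    def base_dir_fresh(self):
        # Like a `def`: every call re-fetches, picking up config changes.
        self.remote_calls += 1
        return "/srv/binaryData"

client = Client()
client.base_dir
client.base_dir          # served from cache, no second remote call
client.base_dir_fresh()
client.base_dir_fresh()  # each call hits the datastore again
```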


134-139: LGTM! Proper integration with the new deleteJson method.

The deletePaths method correctly uses the new deleteJson RPC method to send the paths as a JSON payload, maintaining consistency with the overall deletion workflow.

app/models/storage/UsedStorageService.scala (1)

214-214: LGTM! Dependency refactoring improves modularity.

The change from datasetService.dataStoreFor(dataset) to dataStoreDAO.findOneByName(dataset._dataStore.trim) successfully decouples UsedStorageService from DatasetService, improving the service architecture. The .trim() provides defensive handling of potential whitespace.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (3)

77-84: LGTM! Well-structured deletion orchestration.

The deletePaths method properly groups paths by bucket before delegating to deleteS3PathsOnBucket, ensuring efficient batch processing.
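Grouping by bucket before issuing batched deletes can be sketched as (Python; `group_by_bucket` is a hypothetical mirror of the grouping step):

```python
def group_by_bucket(s3_paths):
    # Group s3://bucket/key paths by bucket so each bucket gets one series of
    # batched DeleteObjects calls instead of per-object requests.
    grouped = {}
    for path in s3_paths:
        bucket, _, key = path.removeprefix("s3://").partition("/")
        grouped.setdefault(bucket, []).append(key)
    return grouped

groups = group_by_bucket(["s3://a/x/1", "s3://a/y/2", "s3://b/z/3"])
```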


122-143: LGTM! Proper pagination handling.

The recursive listKeysAtPrefix correctly handles S3's pagination (max 1000 keys per request) with continuation tokens, ensuring all objects under a prefix are discovered before deletion.
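The continuation-token loop is the standard S3 listing pattern; a minimal sketch (Python, with `fetch_page` as a stand-in for ListObjectsV2):

```python
def list_all_keys(fetch_page, prefix):
    # Follow continuation tokens until the listing is exhausted; S3 returns
    # at most 1000 keys per page, so one call is not enough in general.
    keys, token = [], None
    while True:
        page = fetch_page(prefix, token)
        keys.extend(page["keys"])
        token = page.get("next_token")
        if token is None:
            return keys

def fake_fetch(prefix, token):
    # Fake two-page listing to exercise the loop.
    if token is None:
        return {"keys": ["a", "b"], "next_token": "t1"}
    return {"keys": ["c"]}

all_keys = list_all_keys(fake_fetch, "datasets/ds1/")
```

Deleting before the listing is complete would miss keys on later pages, hence list-then-delete.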


153-155: Credential name format is URI-based, not simple identifiers.

The configuration documentation confirms credential.name is an S3 URI path (e.g., s3://bucket/prefix), not a simple identifier. The comment states: "The credentials are selected by uri prefix, so different s3 uri styles may need duplicated credential entries." The startsWith logic correctly matches paths against credential name prefixes to identify managed S3 resources.

Likely an incorrect or invalid review comment.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)

183-195: LGTM! Type refinement and new parents accessor improve API clarity.

Narrowing the parent return type to RemoteUPath (line 183) and introducing the parents method (lines 187-195) provide better type safety and a clear way to access the path hierarchy.


209-213: LGTM! Semantic path comparison improvement.

The updated startsWith implementation now correctly checks for parent relationships via parents.contains rather than using string prefix matching. This prevents false positives (e.g., "s3://bucket/pathSomewhereElse" incorrectly matching "s3://bucket/path").
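A sketch of the component-wise comparison (Python; `path_starts_with` is a hypothetical mirror of `UPath.startsWith`, not the actual implementation):

```python
def path_starts_with(child: str, parent: str) -> bool:
    # Compare whole path components, not raw string prefixes, so that
    # ".../pathSomewhereElse" does not count as being under ".../path".
    c = child.rstrip("/").split("/")
    p = parent.rstrip("/").split("/")
    return c[:len(p)] == p
```

A naive string `startswith` would treat `s3://bucket/pathSomewhereElse` as living under `s3://bucket/path`, which is exactly the false positive the change prevents.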

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)

41-41: LGTM! Proper DI configuration for the new service.

The eager singleton binding of ManagedS3Service ensures it's available for injection into dependent services (like UploadService and DataSourceService) and is instantiated at startup, which is appropriate for a service managing S3 connections.

unreleased_changes/8924.md (1)

1-5: LGTM! Clear and accurate changelog entries.

The changelog appropriately documents both the new S3 deletion capability and the safety improvement for preventing deletion of shared dataset directories.

test/backend/UPathTestSuite.scala (1)

132-145: LGTM! Valuable test coverage for path component comparison.

These test additions correctly verify that startsWith compares actual path components rather than string prefixes, preventing false positives from substring matches.

util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (1)

14-14: LGTM! Appropriate conversion from trait to object.

Converting PathUtils from a trait to an object is a sensible refactor that makes it a proper utility class. The related changes in DataSourceToDiskWriter.scala show the call sites have been updated accordingly.

app/controllers/WKRemoteDataStoreController.scala (1)

212-212: LGTM! Improved separation of concerns.

Delegating dataset deletion logic to datasetService.deleteDatasetFromDB is a good refactor that keeps the controller thin and moves business logic to the service layer.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceToDiskWriter.scala (2)

15-15: LGTM! Correct update for PathUtils object usage.

Removing PathUtils from the trait mixins aligns with its conversion to an object in the related changes.


30-30: LGTM! Correct object method invocation.

The call site correctly uses PathUtils.ensureDirectoryBox as an object method.

webknossos-datastore/conf/datastore.latest.routes (2)

107-107: LGTM! New endpoint for base directory retrieval.

The new baseDirAbsolute endpoint provides the base directory path, likely needed for coordinating dataset deletion operations.


114-114: LGTM! New endpoint for batch path deletion.

The new deletePaths endpoint enables batch deletion of dataset paths, supporting both local and S3 storage deletion as described in the PR objectives.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)

28-28: LGTM! New dependency for managed S3 operations.

The new managedS3Service dependency enables S3 path deletion as described in the PR objectives.


317-318: LGTM! Straightforward existence check.

The existsOnDisk method correctly checks for dataset directory existence.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (6)

11-12: LGTM! Updated imports for centralized S3 utilities.

The imports correctly reference the new S3UriUtils for centralized S3 URI parsing.


52-55: LGTM! Correct usage of centralized utility.

Using S3UriUtils.hostBucketFromUri eliminates code duplication and centralizes S3 URI parsing logic.


112-120: LGTM! Consistent utility usage.

The refactored code correctly uses S3UriUtils.objectKeyFromUri throughout the method.


124-128: LGTM! Consistent refactoring.

Correctly uses S3UriUtils.objectKeyFromUri in the listDirectory method.


162-167: LGTM! Consistent refactoring.

Correctly uses S3UriUtils.objectKeyFromUri in the getUsedStorageBytes method.


227-236: LGTM! Consistent refactoring.

Correctly uses S3UriUtils.isNonAmazonHost to determine endpoint configuration.

app/controllers/DatasetController.scala (1)

584-597: Deletion endpoint looks good; verify downstream effects.

Auth gating (feature flag, isEditableBy, org admin) and logging are solid. Ensure datasetService.deleteDataset also:

  • Deletes managed S3/local paths where applicable.
  • Refreshes any storage/accounting state post-delete.

If not already covered, add an integration test that deletes a virtual dataset with managed S3 paths and asserts physical deletion plus DB removal.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

360-363: Good: safer directory creation.

Switch to PathUtils.ensureDirectoryBox with failure mapping is correct for permission errors.


532-546: Good: centralize S3 transfers via ManagedS3Service and fail on partial uploads.

Using transferManagerBox and checking failedTransfers aligns with the new S3 abstraction.


579-592: Good: deleteOnDisk now carries datasetId into DatasetDeleter.

Passing datasetId improves auditability and parity with remote delete flows.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

90-94: Confirm exposure of absolute base dir is acceptable.

Endpoint is gated by UserAccessRequest.webknossos. If that’s strictly server-to-server, fine; otherwise consider returning a tokenized ID instead.


413-426: Simplified deleteOnDisk flow looks correct.

Switch to dsRemoteWebknossosClient.getDataSource and single deleteOnDisk call is cleaner and consistent with new semantics.

app/models/dataset/Dataset.scala (1)

819-821: LGTM: seed realPath alongside path.

Storing realPath initially equal to path simplifies later updates and lookups.

app/models/dataset/DatasetService.scala (1)

533-558: Verify bucket allowlist enforcement in ManagedS3Service.deletePaths – potential security gap detected.

The datastore-side deletePaths method in ManagedS3Service (line 77–97) does not explicitly validate that deletions target only managed/configured S3 buckets. The code extracts bucket names from UPaths and deletes without cross-checking against an allowlist:

val pathsByBucket: Map[Option[String], Seq[UPath]] = paths.groupBy(S3UriUtils.hostBucketFromUpath)
for {
  _ <- Fox.serialCombined(pathsByBucket.keys) { bucket: Option[String] =>
    deleteS3PathsOnBucket(bucket, pathsByBucket(bucket))
  }
}

This will attempt deletion from any S3 bucket in the paths provided, relying only on the availability of S3 client credentials. While the webknossos app layer filters paths (lines 305–306 in DataSourceService.deletePathsFromDiskOrManagedS3), the datastore should enforce its own bucket allowlist to prevent accidental or malicious deletion from external buckets.

Required fixes:

  1. Add explicit bucket validation in deleteS3PathsOnBucket to only permit s3UploadBucketOpt (or configured managed buckets)
  2. Add integration test verifying deletion is rejected for non-managed S3 URIs
  3. Document the bucket restriction behavior
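The allowlist check described in fix 1 can be sketched language-agnostically. The Python below is illustrative only — function and bucket names are hypothetical and stand in for the Scala `deleteS3PathsOnBucket` flow:

```python
def delete_s3_paths(paths_by_bucket, allowed_buckets):
    """Delete grouped S3 paths, refusing any bucket outside the allowlist.

    paths_by_bucket: dict mapping bucket name -> list of object keys.
    allowed_buckets: set of managed bucket names (e.g. the configured
    upload bucket). Deletion from any other bucket is rejected up front.
    """
    unknown = set(paths_by_bucket) - set(allowed_buckets)
    if unknown:
        # Fail the whole request before any deletion happens, so a single
        # foreign path cannot trigger a partial delete.
        raise PermissionError(
            f"refusing to delete from non-managed buckets: {sorted(unknown)}")
    deleted = []
    for bucket, keys in paths_by_bucket.items():
        for key in keys:
            deleted.append((bucket, key))  # real code would call the S3 client here
    return deleted
```

Validating the whole batch before issuing any delete keeps the operation all-or-nothing with respect to the allowlist.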

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
app/models/dataset/Dataset.scala (2)

871-889: Fix NULL read and UPath parsing failures in mags path query.

The query selects m1.path which can be NULL according to the database schema, but reads it as String and then parses to UPath, causing runtime failures.

Apply this diff to handle NULL values properly:

 def findPathsUsedOnlyByThisDataset(datasetId: ObjectId): Fox[Seq[UPath]] =
   for {
     pathsStr <- run(q"""
-         SELECT m1.path FROM webknossos.dataset_mags m1
+         SELECT COALESCE(m1.realPath, m1.path) AS path
+         FROM webknossos.dataset_mags m1
          WHERE m1._dataset = $datasetId
          AND NOT EXISTS (
             SELECT m2.path
             FROM webknossos.dataset_mags m2
             WHERE m2._dataset != $datasetId
             AND (
               m2.path = m1.path
               OR
               m2.realpath = m1.realpath
             )
          )
-             """.as[String])
-     paths <- pathsStr.map(UPath.fromString).toList.toSingleBox("Invalid UPath").toFox
+             """.as[Option[String]])
+     paths <- pathsStr.flatten
+       .filter(_.nonEmpty)
+       .map(UPath.fromString)
+       .toList
+       .toSingleBox("Invalid UPath")
+       .toFox
   } yield paths

890-903: Address NULL realpath vulnerability in deletion guard query.

The deletion guard uses starts_with(m.realpath, ...) to find shared data, but realpath is nullable in the database schema. Legacy mags with NULL realpath will be missed, risking accidental removal of shared data.

Apply this diff to use COALESCE for fallback:

 def findDatasetsWithMagsInDir(absolutePath: UPath,
                                 dataStore: DataStore,
                                 ignoredDataset: ObjectId): Fox[Seq[ObjectId]] = {
   // ensure trailing slash on absolutePath to avoid string prefix false positives
   val absolutePathWithTrailingSlash =
     if (absolutePath.toString.endsWith("/")) absolutePath.toString else absolutePath.toString + "/"
   run(q"""
       SELECT d._id FROM webknossos.dataset_mags m
       JOIN webknossos.datasets d ON m._dataset = d._id
-      WHERE starts_with(m.realpath, $absolutePathWithTrailingSlash)
+      WHERE starts_with(COALESCE(m.realpath, m.path), $absolutePathWithTrailingSlash)
       AND d._id != $ignoredDataset
       AND d._datastore = ${dataStore.name.trim}
      """.as[ObjectId])
 }
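The NULL pitfall both diffs address is plain SQL three-valued-logic semantics and can be reproduced with any database. A minimal sqlite sketch (table, columns, and values are hypothetical stand-ins for `webknossos.dataset_mags`; `LIKE prefix || '%'` plays the role of `starts_with`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mags (dataset TEXT, path TEXT, realpath TEXT)")
# A legacy row whose realpath was never backfilled:
conn.execute("INSERT INTO mags VALUES ('other', '/base/shared/mag1', NULL)")

prefix = "/base/shared/"
# Guard on realpath only: the NULL row vanishes, because any comparison
# with NULL evaluates to UNKNOWN and is filtered out.
missed = conn.execute(
    "SELECT count(*) FROM mags WHERE realpath LIKE ? || '%'", (prefix,)
).fetchone()[0]
# Guard on COALESCE(realpath, path): the legacy row is seen again.
found = conn.execute(
    "SELECT count(*) FROM mags WHERE COALESCE(realpath, path) LIKE ? || '%'",
    (prefix,),
).fetchone()[0]
```

A deletion guard built on the first query would report "no other dataset uses this directory" and proceed, which is exactly the data-loss scenario the COALESCE fix prevents.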
app/models/dataset/DatasetService.scala (1)

547-560: Critical: DB deletion not sequenced; failures ignored.

Using _ = at lines 554-556 does not sequence the Fox operations into the for-comprehension. Errors won't propagate and the method may return before deletion completes.

Apply this diff to properly sequence the operations:

 def deleteDatasetFromDB(datasetId: ObjectId): Fox[Unit] =
   for {
     existingDatasetBox <- datasetDAO.findOne(datasetId)(GlobalAccessContext).shiftBox
     _ <- existingDatasetBox match {
       case Full(dataset) =>
         for {
           annotationCount <- annotationDAO.countAllByDataset(dataset._id)(GlobalAccessContext)
-          _ = datasetDAO
-            .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-            .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+          _ <- datasetDAO.deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+          _ <- usedStorageService.refreshStorageReportForDataset(dataset)
         } yield ()
       case _ => Fox.successful(())
     }
   } yield ()
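The difference between `_ =` and `_ <-` in a Fox for-comprehension mirrors firing off an asynchronous effect without awaiting it. An illustrative Python analogue (not project code — the function names are invented):

```python
import asyncio

async def delete_from_db():
    raise RuntimeError("delete failed")  # stands in for a failing DB call

async def broken_caller():
    # Analogous to `_ = datasetDAO.deleteDataset(...)`: the effect is built
    # but never sequenced into the chain, so its failure is invisible.
    unawaited = delete_from_db()
    unawaited.close()  # discard it; the error is silently lost
    return "returned ok despite the failure"

async def fixed_caller():
    # Analogous to `_ <- datasetDAO.deleteDataset(...)`: awaiting sequences
    # the effect, so the failure propagates to the caller.
    await delete_from_db()
    return "unreachable"
```

The caller of `broken_caller` observes success even though the deletion failed, which is the behavior the review flags.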
🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)

1275-1289: Consider NULL-safe handling for attachment paths.

While attachment paths appear to be non-nullable in the schema, using Option[String] for database reads provides defensive coding against schema changes or unexpected NULL values.

Apply this diff for safer NULL handling:

 def findPathsUsedOnlyByThisDataset(datasetId: ObjectId): Fox[Seq[UPath]] =
   for {
     pathsStr <- run(q"""
          SELECT a1.path FROM webknossos.dataset_layer_attachments a1
          WHERE a1._dataset = $datasetId
          AND NOT EXISTS (
             SELECT a2.path
             FROM webknossos.dataset_layer_attachments a2
             WHERE a2._dataset != $datasetId
             AND a2.path = a1.path
          )
-             """.as[String])
-     paths <- pathsStr.map(UPath.fromString).toList.toSingleBox("Invalid UPath").toFox
+             """.as[Option[String]])
+     paths <- pathsStr.flatten
+       .filter(_.nonEmpty)
+       .map(UPath.fromString)
+       .toList
+       .toSingleBox("Invalid UPath")
+       .toFox
   } yield paths
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 376ef47 and 768402e.

📒 Files selected for processing (4)
  • app/models/dataset/Dataset.scala (3 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (0 hunks)
💤 Files with no reviewable changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/Dataset.scala
🧬 Code graph analysis (2)
app/models/dataset/Dataset.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • toFox (12-12)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
  • toSingleBox (63-81)
app/models/dataset/DatasetService.scala (5)
app/models/annotation/Annotation.scala (2)
  • findOne (289-296)
  • countAllByDataset (644-656)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (33-227)
  • refreshStorageReportForDataset (211-227)
app/models/dataset/Dataset.scala (6)
  • deleteDataset (709-737)
  • findPathsUsedOnlyByThisDataset (871-890)
  • findPathsUsedOnlyByThisDataset (1275-1290)
  • findDatasetsWithMagsInDir (890-903)
  • findDatasetsWithAttachmentsInDir (1290-1303)
  • findOne (203-210)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • deletePaths (77-84)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (4)
conf/webknossos.latest.routes (2)

110-110: LGTM! New endpoint follows established patterns.

The new finishUploadToPaths endpoint follows the established reserve/finish pattern (similar to lines 107-109) and has consistent naming conventions. This addition supports the multi-path upload functionality mentioned in the PR objectives.


106-106: API change properly implemented and verified.

The breaking API change has been handled correctly:

  • The old route DELETE /datasets/:datasetId/deleteOnDisk has been completely removed from webknossos routes
  • The new route DELETE /datasets/:datasetId is in place and maps to the updated DatasetController.delete() method
  • The frontend API client in rest_api.ts has been updated to call the new endpoint
  • All frontend components accessing this functionality use the updated API client
  • Backend implementation properly supports both virtual datasets (using deletePaths) and non-virtual datasets (using deleteOnDisk)

No additional client updates are needed within the codebase. The change is complete and consistent.

app/models/dataset/Dataset.scala (1)

1290-1303: LGTM! Proper deletion guard for attachments.

The trailing slash logic correctly prevents false positives from string prefix matching, and the query properly joins with datasets to filter by datastore.
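The trailing-slash guard matters because raw string prefixes conflate sibling directories. A short illustration (paths hypothetical):

```python
def is_inside(candidate: str, directory: str) -> bool:
    """True iff candidate lies under directory, comparing whole path segments."""
    prefix = directory if directory.endswith("/") else directory + "/"
    return candidate.startswith(prefix)

# Without the trailing slash, "/data/ds1-copy" would wrongly count as
# being inside "/data/ds1" and block (or worse, permit) a deletion:
naive_match = "/data/ds1-copy/mag1".startswith("/data/ds1")
safe_match = is_inside("/data/ds1-copy/mag1", "/data/ds1")
```

This is the same reasoning the query applies by appending `/` before calling `starts_with`.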

app/models/dataset/DatasetService.scala (1)

537-545: LGTM with dependency on DAO fixes.

The method correctly combines mags and attachments queries and deduplicates the results. However, its correctness depends on the underlying DAO queries properly handling NULL realpath values (see comments on Dataset.scala).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

439-449: S3 bucket/endpoint derivation: add explicit error mapping and fix null-safe host extraction.

Two valid concerns, but the suggested diff is syntactically broken:

  1. Missing error mapping on s3UploadBucketOpt.toFox yields a generic failure message. Add explicit context: ?~> "No managed S3 upload bucket configured (Datastore.S3Upload)."

  2. URI.getHost() can return null and will fail on methods like endsWith() or in string interpolation. Use Option(credUri.getHost).getOrElse(credUri.getAuthority) instead.

  3. The proposed diff is incorrect: the .flatMap(identity) at the end is a type error. Instead, wrap the entire S3 path construction in the existing for-yield block without attempting to flatten outside:

-            s3UploadBucket <- managedS3Service.s3UploadBucketOpt.toFox
+            s3UploadBucket <- managedS3Service.s3UploadBucketOpt.toFox ?~> "No managed S3 upload bucket configured (Datastore.S3Upload)."
@@
-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
-            newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+            credUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
+            endPointHost = Option(credUri.getHost).getOrElse(credUri.getAuthority)
+            newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
♻️ Duplicate comments (2)
app/models/dataset/DatasetService.scala (2)

511-535: Critical: NULL realpath handling remains unfixed.

The past review identified that deletion guards miss datasets with NULL realpath values:

  1. Virtual deletion (lines 516-517): findPathsUsedOnlyByThisDataset uses m2.realpath = m1.realpath in SQL, which silently fails when realpath is NULL (NULL comparisons return unknown, not true). This could cause deletion of paths shared by datasets with NULL realpath.

  2. Non-virtual deletion (lines 528-531): findDatasetsWithMagsInDir uses starts_with(m.realpath, ...), which returns NULL for rows with NULL realpath. Other datasets sharing the directory won't be detected, risking data loss.

Required fix: Update the DAO queries to use COALESCE(realpath, path) or add explicit NULL handling.


547-560: Critical: DB deletion still not sequenced; failures still ignored.

The past review correctly identified that line 554 uses _ = instead of _ <-, which does not sequence the deletion into the Fox chain. The datasetDAO.deleteDataset(...).flatMap(...) Future/Fox executes independently, so:

  • Errors won't propagate to the caller
  • The method may return before deletion completes
  • Storage refresh failures are silently ignored

Required fix:

         case Full(dataset) =>
           for {
             annotationCount <- annotationDAO.countAllByDataset(dataset._id)(GlobalAccessContext)
-            _ = datasetDAO
-              .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-              .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+            _ <- datasetDAO.deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+            _ <- usedStorageService.refreshStorageReportForDataset(dataset)
           } yield ()
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

363-363: Good: safer directory creation with PathUtils.ensureDirectoryBox.

This avoids exceptions and returns proper Fox errors. Consider applying the same pattern in handleUploadChunk for parent dir creation to keep error handling consistent.

-            PathUtils.ensureDirectory(uploadDir.resolve(filePath).getParent)
+            PathUtils.ensureDirectoryBox(uploadDir.resolve(filePath).getParent).toFox ?~> "dataset.import.fileAccessDenied"

542-550: Surface clearer info on failed S3 transfers.

The error currently prints a collection object. Consider logging the count and a few sample keys to aid debugging, and include the uploadId/datasetId if available.

-      _ <- Fox.fromBool(failedTransfers.isEmpty) ?~>
-        s"Some files failed to upload to S3: $failedTransfers"
+      _ <- Fox.fromBool(failedTransfers.isEmpty) ?~> {
+        val count = failedTransfers.size()
+        s"Some files failed to upload to S3 (failed count=$count)."
+      }

Add metrics for transfer success/failure counts and durations to monitor S3 health over time.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 768402e and df11bf1.

📒 Files selected for processing (4)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • app/models/dataset/DatasetService.scala (5 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/controllers/WKRemoteDataStoreController.scala (2)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • Fox (28-228)
  • Fox (230-303)
app/models/dataset/DatasetService.scala (1)
  • deleteDatasetFromDB (547-562)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • ManagedS3Service (28-157)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • ensureDirectoryBox (122-130)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (14-36)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
app/models/dataset/DatasetService.scala (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (33-227)
  • refreshStorageReportForDataset (211-227)
app/controllers/WKRemoteDataStoreController.scala (1)
  • deleteDataset (207-214)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • deleteDataset (133-138)
app/models/dataset/Dataset.scala (6)
  • deleteDataset (709-737)
  • findPathsUsedOnlyByThisDataset (871-890)
  • findPathsUsedOnlyByThisDataset (1275-1290)
  • findDatasetsWithMagsInDir (890-903)
  • findDatasetsWithAttachmentsInDir (1290-1303)
  • findOne (203-210)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • deletePaths (77-84)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
  • deletePaths (134-141)
  • deleteOnDisk (122-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)

13-13: LGTM! Import cleanup aligns with API refactoring.

The removal of LayerMagLinkInfo from the import statement is consistent with the removal of the fetchPaths method and the shift toward path-centric APIs mentioned in the PR context.


57-211: No action required—all references properly updated.

Verification confirms no callers of the removed fetchPaths method remain in the codebase. The public API change has been cleanly executed with all callers migrated.

app/controllers/WKRemoteDataStoreController.scala (1)

6-6: LGTM! Deletion logic successfully centralized.

The removal of the Full import and simplification of the deletion endpoint to delegate to DatasetService.deleteDatasetFromDB is a clean refactoring. This centralizes deletion logic and reduces duplication.

Also applies to: 211-211

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

583-596: deleteOnDisk call-site update confirmed complete.

All identified call sites to DatasetDeleter.deleteOnDisk() now use the correct new signature with datasetId as the first parameter. The two call sites in UploadService.scala (lines 583 and 591) and the call site in DataSourceController.scala (line 419) all pass arguments in the correct order matching the function definition. No old-style call sites with different argument ordering were found.


23-24: ManagedS3Service DI wiring is complete and properly configured.

Verification confirms: ManagedS3Service is bound at DataStoreModule line 41, UploadService is bound at line 42, and the constructor parameter at line 100 is correctly added. No test files were found with broken UploadService instantiation; tests use Guice injection so no updates needed. The changes are safe to merge.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)

48-48: Log the actual deduplicated path.

Line 48 logs targetPath, but when retryCount > 0, the actual destination is deduplicatedTargetPath. Log the path that was actually used.

Apply this diff:

-        logger.info(s"Successfully moved dataset from $sourcePath to $targetPath.")
+        logger.info(s"Successfully moved dataset from $sourcePath to $deduplicatedTargetPath.")
app/models/dataset/Dataset.scala (1)

871-890: Critical: Query returns wrong paths for non-virtual datasets

The query selects only m1.path, but for non-virtual datasets the physical path to delete is stored in m1.realPath. This means:

  • Non-virtual datasets would return logical paths instead of physical filesystem paths
  • Deletion would target the wrong locations
  • Could cause data loss or orphaned data

A past reviewer flagged this exact issue and suggested using COALESCE(m1.realPath, m1.path).

Based on learnings

Apply this diff to select the actual physical path:

-      pathsStrOpts <- run(q"""
-           SELECT m1.path FROM webknossos.dataset_mags m1
-           WHERE m1._dataset = $datasetId
-           AND m1.path IS NOT NULL
+      pathsStrOpts <- run(q"""
+           SELECT COALESCE(m1.realPath, m1.path) AS path
+           FROM webknossos.dataset_mags m1
+           WHERE m1._dataset = $datasetId
+           AND COALESCE(m1.realPath, m1.path) IS NOT NULL
            AND NOT EXISTS (
               SELECT m2.path
               FROM webknossos.dataset_mags m2
               WHERE m2._dataset != $datasetId
               AND (
-                m2.path = m1.path
+                m2.path = COALESCE(m1.realPath, m1.path)
                 OR
-                (
-                  m2.realpath IS NOT NULL AND m2.realpath = m1.realpath
-                )
+                m2.realpath = COALESCE(m1.realPath, m1.path)
               )
            )
               """.as[Option[String]])
app/models/dataset/DatasetService.scala (3)

548-557: Good fix: sequence deletion and refresh; errors now propagate.

Replacing _ = ... with <- for both deleteDataset and refreshStorageReport correctly sequences the Fox chain.


511-523: Critical: “used-only-by-this-dataset” can miss shared data (NULL realpath/symlink).

Current DAO logic for mags uses realpath equality that fails with NULL and can delete paths still used by other datasets. Use COALESCE to align with index semantics, or conservatively block when NULLs exist. Also consider validating paths before delete.

Suggested DAO fix (app/models/dataset/Dataset.scala):

- AND NOT EXISTS (
-    SELECT m2.path
-    FROM webknossos.dataset_mags m2
-    WHERE m2._dataset != $datasetId
-    AND (
-      m2.path = m1.path
-      OR (
-        m2.realpath IS NOT NULL AND m2.realpath = m1.realpath
-      )
-    )
- )
+ AND NOT EXISTS (
+    SELECT 1
+    FROM webknossos.dataset_mags m2
+    WHERE m2._dataset != $datasetId
+    AND COALESCE(m2.realpath, m2.path) = COALESCE(m1.realpath, m1.path)
+ )

Optional safety here:

-          pathsUsedOnlyByThisDataset = magPathsUsedOnlyByThisDataset ++ attachmentPathsUsedOnlyByThisDataset
+          pathsUsedOnlyByThisDataset = magPathsUsedOnlyByThisDataset ++ attachmentPathsUsedOnlyByThisDataset
+          // extra safety: validate only-local/managed-S3 and existence
+          _ <- validatePaths(pathsUsedOnlyByThisDataset, datastore)

Run to confirm DAOs use COALESCE:

#!/bin/bash
rg -n -C2 'findMagPathsUsedOnlyByThisDataset|COALESCE\(m2\.realpath, m2\.path\)' app/models/dataset/Dataset.scala

525-533: Critical: Deletion guard misses datasets with NULL realpath; use COALESCE in DAO.

findDatasetsUsingDataFromDir relies on DAO query filtering m.realpath IS NOT NULL and starts_with(m.realpath,...). Legacy NULL realpaths won’t be seen → risk of deleting shared data.

Suggested DAO fix (app/models/dataset/Dataset.scala):

- WHERE m.realpath IS NOT NULL
- AND starts_with(m.realpath, $absolutePathWithTrailingSlash)
+ WHERE starts_with(COALESCE(m.realpath, m.path), $absolutePathWithTrailingSlash)

Optional: add a conservative pre-check to abort if any mags under directory have realpath IS NULL until a backfill/migration makes realpath non-NULL.

Also review attachments symmetry: attachments guard uses a.path only; if attachments ever gain realpath, mirror COALESCE there too.

#!/bin/bash
rg -n -C3 'findDatasetsWithMagsInDir|starts_with\(' app/models/dataset/Dataset.scala
rg -n 'COALESCE\(m\.realpath, m\.path\)' app/models/dataset/Dataset.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)

9-22: Harden S3 URI parsing: null-safety, regional endpoints, and key normalization.

  • Avoid NPEs on uri.getHost.
  • Support regional virtual-hosted endpoints (e.g., bucket.s3.us-west-2.amazonaws.com).
  • Normalize keys to avoid leading “/” for AWS S3.
  • Make short-style detection scheme-aware (s3:// only).
  • Improve non-AWS host detection.

Apply this diff to the existing functions:

@@
-  def hostBucketFromUri(uri: URI): Option[String] = {
-    val host = uri.getHost
-    if (host == null) {
-      None
-    } else if (isShortStyle(uri)) { // assume host is omitted from uri, shortcut form s3://bucket/key
-      Some(host)
-    } else if (isVirtualHostedStyle(uri)) {
-      Some(host.substring(0, host.length - ".s3.amazonaws.com".length))
-    } else if (isPathStyle(uri)) {
-      Some(uri.getPath.substring(1).split("/")(0))
-    } else {
-      None
-    }
-  }
+  def hostBucketFromUri(uri: URI): Option[String] = {
+    val hostO = hostOpt(uri)
+    if (isShortStyle(uri)) {
+      hostO
+    } else if (isVirtualHostedStyle(uri)) {
+      hostO.flatMap(bucketFromVirtualHost)
+    } else if (isPathStyle(uri)) {
+      val path = Option(uri.getPath).getOrElse("")
+      path.stripPrefix("/").split("/", 2).headOption.filter(_.nonEmpty)
+    } else None
+  }
@@
-  // https://bucket-name.s3.region-code.amazonaws.com/key-name
-  private def isVirtualHostedStyle(uri: URI): Boolean =
-    uri.getHost.endsWith(".s3.amazonaws.com")
+  // https://bucket-name.s3.amazonaws.com or https://bucket-name.s3.us-west-2.amazonaws.com
+  private def isVirtualHostedStyle(uri: URI): Boolean =
+    hostOpt(uri).exists(h => VirtualHostedPattern.pattern.matcher(h).matches)
@@
-  // https://s3.region-code.amazonaws.com/bucket-name/key-name
-  private def isPathStyle(uri: URI): Boolean =
-    uri.getHost.matches("s3(.[\\w\\-_]+)?.amazonaws.com") ||
-      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
+  // https://s3.amazonaws.com/bucket/key, https://s3.us-west-2.amazonaws.com/bucket/key
+  // or non-AWS S3-compatible hosts (MinIO, Ceph, etc.)
+  private def isPathStyle(uri: URI): Boolean =
+    hostOpt(uri).exists { h =>
+      h.matches("^s3([.-][\\w\\-_]+)?\\.amazonaws\\.com$") ||
+      (!h.contains("amazonaws.com") && h.contains("."))
+    }
@@
-  // S3://bucket-name/key-name
-  private def isShortStyle(uri: URI): Boolean =
-    !uri.getHost.contains(".")
+  // s3://bucket-name/key-name  (short style)
+  private def isShortStyle(uri: URI): Boolean =
+    "s3".equalsIgnoreCase(uri.getScheme) && hostOpt(uri).exists(h => !h.contains("."))
@@
-  def objectKeyFromUri(uri: URI): Box[String] =
-    if (isVirtualHostedStyle(uri)) {
-      Full(uri.getPath)
-    } else if (isPathStyle(uri)) {
-      Full(uri.getPath.substring(1).split("/").tail.mkString("/"))
-    } else if (isShortStyle(uri)) {
-      Full(uri.getPath.tail)
-    } else Failure(s"Not a valid s3 uri: $uri")
+  def objectKeyFromUri(uri: URI): Box[String] = {
+    val path = Option(uri.getPath).getOrElse("")
+    if (isVirtualHostedStyle(uri) || isShortStyle(uri)) {
+      Full(path.stripPrefix("/"))
+    } else if (isPathStyle(uri)) {
+      val parts = path.stripPrefix("/").split("/", 2)
+      if (parts.length == 2) Full(parts(1)) else Failure(s"No object key in $uri")
+    } else Failure(s"Not a valid s3 uri: $uri")
+  }
@@
-  def isNonAmazonHost(uri: URI): Boolean =
-    (isPathStyle(uri) && !uri.getHost.endsWith(".amazonaws.com")) || uri.getHost == "localhost"
+  def isNonAmazonHost(uri: URI): Boolean =
+    hostOpt(uri).exists(h => (isPathStyle(uri) && !h.endsWith(".amazonaws.com")) || h == "localhost")

Add these helpers (append within the object):

// e.g. bucket.s3.amazonaws.com or bucket.s3.us-west-2.amazonaws.com
private val VirtualHostedPattern = """^(.+?)\.s3([.-][\w-]+)?\.amazonaws\.com$""".r

private def hostOpt(uri: URI): Option[String] =
  Option(uri.getHost).orElse(Option(uri.getAuthority)).map(_.trim).filter(_.nonEmpty)

private def bucketFromVirtualHost(host: String): Option[String] = host match {
  case VirtualHostedPattern(bucket, _) => Some(bucket)
  case _                               => None
}

This fixes failures on regional endpoints and ensures keys do not carry a leading slash, which is critical for correct List/Delete behavior. Based on learnings.

Also applies to: 27-39, 40-50
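The three URI styles the diff distinguishes can be handled with the same case analysis in any language. A hedged Python sketch (the regexes mirror, but are not, the Scala above; host handling via `urlparse` sidesteps the null-host issue that `java.net.URI.getHost` has):

```python
import re
from urllib.parse import urlparse

# bucket.s3.amazonaws.com or bucket.s3.us-west-2.amazonaws.com
VIRTUAL_HOSTED = re.compile(r"^(.+?)\.s3([.-][\w-]+)?\.amazonaws\.com$")
PATH_STYLE = re.compile(r"^s3([.-][\w-]+)?\.amazonaws\.com$")

def parse_s3_uri(uri):
    """Return (bucket, key) for the three common S3 URI styles, else None."""
    p = urlparse(uri)
    host = p.hostname
    key = (p.path or "").lstrip("/")  # normalize away the leading slash
    if host is None:
        return None
    if p.scheme == "s3" and "." not in host:  # short style: s3://bucket/key
        return host, key
    m = VIRTUAL_HOSTED.match(host)            # virtual-hosted (incl. regional)
    if m:
        return m.group(1), key
    if PATH_STYLE.match(host):                # path style: host path is bucket/key
        parts = key.split("/", 1)
        return (parts[0], parts[1]) if len(parts) == 2 else (parts[0], "")
    return None
```

Stripping the leading `/` from the key and matching regional endpoints are the two behaviors the review asks the Scala utility to add.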

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)

39-55: Don’t derive bucket/endpoint from credentialName; use the matched S3 credential (name/endpoint/region).

Assuming credentialName is a logical identifier, parsing it as a URI yields wrong bucket and an invalid endpoint (e.g., https://bucket). Derive from the selected S3AccessKeyCredential instead; use its optional endpoint/region, and parse name only if it is actually a URI.

Apply this refactor:

@@
-  lazy val s3UploadBucketOpt: Option[String] =
-    // by convention, the credentialName is the S3 URI so we can extract the bucket from it.
-    S3UriUtils.hostBucketFromUri(new URI(dataStoreConfig.Datastore.S3Upload.credentialName))
+  private lazy val s3UploadCredentialOpt: Option[S3AccessKeyCredential] =
+    dataStoreConfig.Datastore.DataVaults.credentials
+      .flatMap(new CredentialConfigReader(_).getCredential)
+      .collectFirst {
+        case c @ S3AccessKeyCredential(name, _, _, _, _)
+            if dataStoreConfig.Datastore.S3Upload.credentialName == name => c
+      }
+
+  private lazy val s3UploadBaseUriOpt: Option[URI] =
+    s3UploadCredentialOpt.flatMap { c =>
+      // name might be a UPath/URI root like s3://bucket/prefix or https://minio:9000/bucket
+      tryo(UPath.fromString(c.name).toRemoteUriUnsafe).toOption
+        .orElse(tryo(new URI(c.name)).toOption)
+    }
+
+  lazy val s3UploadBucketOpt: Option[String] =
+    s3UploadBaseUriOpt.flatMap(S3UriUtils.hostBucketFromUri)
@@
-  private lazy val s3UploadEndpoint: URI = {
-    // by convention, the credentialName is the S3 URI so we can extract the bucket from it.
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpointOpt: Option[URI] =
+    s3UploadCredentialOpt.flatMap(_.endpoint.map(URI.create))

This prevents constructing invalid endpoints and decouples behavior from a brittle config assumption. Based on learnings.


If `credentialName` is indeed a URI in your deployments, please confirm and we can simplify accordingly.

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (7)</summary><blockquote>

<details>
<summary>webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)</summary><blockquote>

`41-41`: **Extract hardcoded retry limit as a named constant.**

The retry limit `15` is hardcoded. Consider extracting it as a private constant (e.g., `private val MaxMoveDedupRetries = 15`) for clarity and maintainability.



```diff
+  private val MaxMoveDedupRetries = 15
+
   private def deleteWithRetry(sourcePath: Path, targetPath: Path, retryCount: Int = 0)(
       implicit ec: ExecutionContext): Fox[Unit] =
-    if (retryCount > 15) {
+    if (retryCount > MaxMoveDedupRetries) {
       Fox.failure(s"Deleting dataset failed: too many retries.")
```
app/models/dataset/Dataset.scala (2)

892-906: Consider checking both realpath and path for completeness

The method currently only checks m.realpath to find datasets with mags in a directory. While this is correct for non-virtual datasets (filesystem paths), you might want to also check m.path for safety, especially if virtual datasets could have paths under the same prefix.

If needed, apply this diff to check both fields:

     run(q"""
         SELECT d._id FROM webknossos.dataset_mags m
         JOIN webknossos.datasets d ON m._dataset = d._id
-        WHERE m.realpath IS NOT NULL
-        AND starts_with(m.realpath, $absolutePathWithTrailingSlash)
+        WHERE (
+          (m.realpath IS NOT NULL AND starts_with(m.realpath, $absolutePathWithTrailingSlash))
+          OR starts_with(m.path, $absolutePathWithTrailingSlash)
+        )
         AND d._id != $ignoredDataset
         AND d._datastore = ${dataStore.name.trim}
        """.as[ObjectId])

1293-1306: Add NULL check for consistency and safety

Similar to the mags version, this method should filter out NULL paths to avoid potential issues with the string prefix check.

Apply this diff to add a NULL check:

     run(q"""
         SELECT d._id FROM webknossos.dataset_layer_attachments a
         JOIN webknossos.datasets d ON a._dataset = d._id
-        WHERE starts_with(a.path, $absolutePathWithTrailingSlash)
+        WHERE a.path IS NOT NULL
+        AND starts_with(a.path, $absolutePathWithTrailingSlash)
         AND d._id != $ignoredDataset
         AND d._datastore = ${dataStore.name.trim}
        """.as[ObjectId])
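The trailing slash in `$absolutePathWithTrailingSlash` is load-bearing in these `starts_with` checks: without it, a sibling directory whose name merely extends the prefix would match. A minimal illustration (paths are hypothetical):

```python
def is_inside(path: str, directory: str) -> bool:
    # Append a slash so "/data/ds1" does not accidentally match "/data/ds10/color/1"
    prefix = directory.rstrip("/") + "/"
    return path.startswith(prefix)
```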
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)

1-7: Add tests for core URI forms.

Cover s3://, path-style, and virtual-hosted URIs.

Do you want me to scaffold a ScalaTest suite for this?

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (3)

79-86: Fail fast on invalid/non‑S3 paths; avoid aborting all buckets due to one bad path.

Currently, grouping by Option[String] may include None and the first failure can abort the whole deletion.

-  def deletePaths(paths: Seq[UPath])(implicit ec: ExecutionContext): Fox[Unit] = {
-    val pathsByBucket: Map[Option[String], Seq[UPath]] = paths.groupBy(S3UriUtils.hostBucketFromUpath)
-    for {
-      _ <- Fox.serialCombined(pathsByBucket.keys) { bucket: Option[String] =>
-        deleteS3PathsOnBucket(bucket, pathsByBucket(bucket))
-      }
-    } yield ()
-  }
+  def deletePaths(paths: Seq[UPath])(implicit ec: ExecutionContext): Fox[Unit] = {
+    val (nonS3, s3Paths) = paths.partition(p => !p.getScheme.contains(PathSchemes.schemeS3))
+    if (nonS3.nonEmpty)
+      Fox.failure(s"Non-S3 paths passed to ManagedS3Service: ${nonS3.mkString(", ")}")
+    else {
+      val grouped = s3Paths.groupBy(S3UriUtils.hostBucketFromUpath)
+      // surface all invalid paths instead of a generic failure
+      val invalid = grouped.getOrElse(None, Seq.empty)
+      if (invalid.nonEmpty)
+        Fox.failure(s"Could not determine S3 bucket for: ${invalid.mkString(", ")}")
+      else {
+        val byBucket: Map[String, Seq[UPath]] = grouped.collect { case (Some(b), ps) => (b, ps) }
+        Fox.serialCombined(byBucket.keys.toSeq)(b => deleteS3PathsOnBucket(Some(b), byBucket(b))).map(_ => ())
+      }
+    }
+  }

This keeps valid buckets progressing even if a caller accidentally passes a bad path and also returns actionable error text.
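The group-then-fail pattern suggested above can be sketched outside the Fox machinery; in this illustrative Python version, `bucket_of` is a stand-in for the URI parsing helper:

```python
def group_paths_by_bucket(paths, bucket_of):
    """Partition paths by bucket; fail with all unresolvable paths at once."""
    by_bucket, invalid = {}, []
    for p in paths:
        bucket = bucket_of(p)
        if bucket is None:
            invalid.append(p)
        else:
            by_bucket.setdefault(bucket, []).append(p)
    if invalid:
        # surface every invalid path instead of a generic failure
        raise ValueError(f"Could not determine S3 bucket for: {', '.join(invalid)}")
    return by_bucket
```

Collecting all invalid paths before failing gives the caller actionable error text in one round trip.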


75-78: transferManagerBox is unused.

Remove or use it (e.g., for future multipart/copy ops) to avoid dead code.


155-158: Speed up pathIsInManagedS3 and limit to S3 credentials.

Avoid repeatedly parsing all credentials; precompute S3 roots and filter to S3 credentials only.

Example:

private lazy val managedS3Roots: Seq[UPath] =
  globalCredentials.collect {
    case S3AccessKeyCredential(name, _, _, _, _) if name.startsWith("s3://") =>
      UPath.fromStringUnsafe(name).normalize
  }

def pathIsInManagedS3(path: UPath): Boolean =
  path.getScheme.contains(PathSchemes.schemeS3) && managedS3Roots.exists(path.startsWith)
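Under the same assumption (managed credential names are s3:// roots), the membership test reduces to a scheme check plus a prefix check against the precomputed roots. An illustrative Python sketch:

```python
def is_in_managed_s3(path: str, credential_names) -> bool:
    """True iff path is an s3:// path under one of the managed S3 roots."""
    roots = [n.rstrip("/") for n in credential_names if n.startswith("s3://")]
    if not path.startswith("s3://"):
        return False
    p = path.rstrip("/")
    # match the root itself or any path strictly below it
    return any(p == root or p.startswith(root + "/") for root in roots)
```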
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df11bf1 and d80fec9.

📒 Files selected for processing (6)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • app/models/dataset/Dataset.scala (3 hunks)
  • app/models/dataset/DatasetService.scala (5 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/controllers/WKRemoteDataStoreController.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/Dataset.scala
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-78)
  • DataVaults (61-63)
  • S3Upload (64-68)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4)
  • S3UriUtils (7-52)
  • hostBucketFromUri (9-22)
  • hostBucketFromUpath (24-28)
  • objectKeyFromUri (40-49)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (8)
  • UPath (55-97)
  • toRemoteUriUnsafe (122-122)
  • toRemoteUriUnsafe (199-199)
  • getScheme (120-120)
  • getScheme (197-197)
  • fromString (60-60)
  • startsWith (139-143)
  • startsWith (209-213)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
  • collectFirst (614-644)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
  • combined (82-93)
  • fromFuture (40-51)
  • successful (51-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • ensureDirectory (116-120)
app/models/dataset/WKRemoteDataStoreClient.scala (1)
  • deleteOnDisk (122-129)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deleteOnDisk (413-428)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • successful (51-54)
  • failure (56-60)
app/models/dataset/Dataset.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
app/utils/sql/SqlInterpolation.scala (1)
  • q (20-39)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • flatten (269-279)
  • toFox (12-12)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
  • toSingleBox (63-81)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
  • Full (661-699)
app/models/dataset/DatasetService.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • UPath (55-97)
app/models/annotation/Annotation.scala (2)
  • findOne (289-296)
  • countAllByDataset (644-656)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (33-227)
  • refreshStorageReportForDataset (211-227)
app/models/dataset/Dataset.scala (6)
  • deleteDataset (709-737)
  • findMagPathsUsedOnlyByThisDataset (871-892)
  • findAttachmentPathsUsedOnlyByThisDataset (1278-1293)
  • findDatasetsWithMagsInDir (892-906)
  • findDatasetsWithAttachmentsInDir (1293-1306)
  • findOne (203-210)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • deletePaths (79-86)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
  • deletePaths (134-141)
  • deleteOnDisk (122-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (1)
app/models/dataset/Dataset.scala (1)

10-10: LGTM: Clean import simplification

The shift from DataSourceMagInfo to UPath aligns with the path-centric refactoring for dataset deletion.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc9908c and 6219144.

📒 Files selected for processing (1)
  • app/controllers/DatasetController.scala (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
app/controllers/DatasetController.scala (5)
app/telemetry/SlackNotificationService.scala (1)
  • SlackNotificationService (11-59)
util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (2)
  • log (55-62)
  • logTime (32-48)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
  • toString (15-15)
  • Instant (14-45)
  • Instant (47-103)
  • now (48-48)
  • logSince (72-75)
app/utils/WkConf.scala (1)
  • Features (135-155)
app/models/dataset/DatasetService.scala (1)
  • deleteDataset (511-538)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (1)
app/controllers/DatasetController.scala (1)

38-38: Slack notifier wiring verified and correct.

Import and DI of SlackNotificationService in DatasetController are confirmed. The service is properly defined, bound as an eager singleton in WebknossosModule.scala, and consistently used across the application (SqlClient, TaskCreationService, DatasetService, JobController, AnnotationController, and others).

@fm3 fm3 mentioned this pull request Oct 28, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

428-436: Surface invalid/ignored paths to clients.

If DataSourceService rejects paths (non‑managed S3 or outside baseDir), propagate a 400 with details instead of a generic failure/OK. This avoids silent skips. Consider returning per‑path results.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

439-449: Use ManagedS3Service bucket; centralize URI building.

Switch to managedS3Service.s3UploadBucketOpt is correct. Suggest moving endpoint host/URI construction into ManagedS3Service (e.g., a helper returning UPath base) to avoid duplicating URI parsing here.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6219144 and 04b1de2.

📒 Files selected for processing (8)
  • app/controllers/WKRemoteDataStoreController.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
🧰 Additional context used
🧬 Code graph analysis (4)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/DatasetService.scala (1)
  • deleteDatasetFromDB (548-562)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
  • Fox (28-228)
  • Fox (230-303)
  • serialCombined (93-97)
  • serialCombined (97-109)
  • toFox (12-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • isLocal (30-30)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (2)
  • pathIsInManagedS3 (155-159)
  • deletePaths (79-86)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • deleteDirectoryRecursively (184-191)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deletePaths (428-437)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (8)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (4)
  • validateAccessFromTokenContext (87-94)
  • UserAccessRequest (30-30)
  • UserAccessRequest (31-72)
  • webknossos (70-72)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
  • dataBaseDir (146-146)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • getDataSource (190-197)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (14-36)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
  • deleteOnDisk (122-129)
  • deletePaths (134-141)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • deletePaths (79-86)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
  • deletePathsFromDiskOrManagedS3 (300-311)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • ensureDirectoryBox (122-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (12)
app/controllers/WKRemoteDataStoreController.scala (3)

6-6: LGTM! Clean import removal.

The Full import is no longer needed in this controller since the Box pattern matching has moved to DatasetService.deleteDatasetFromDB.


38-51: Excellent refactoring – proper separation of concerns.

Removing annotationDAO from the controller is correct since the annotation-count logic now resides in DatasetService.deleteDatasetFromDB. This improves maintainability by centralizing deletion logic in the service layer.


202-212: Well-architected delegation to service layer.

The refactoring correctly moves deletion orchestration to DatasetService.deleteDatasetFromDB, which now handles dataset lookup, annotation counting, conditional deletion, and storage refresh. The idempotent behavior (returning success for non-existent datasets) aligns with HTTP DELETE semantics.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)

27-32: DI addition looks good.
The ManagedS3Service injection integrates cleanly with existing services.


313-314: Tiny nit: helper is fine.
existsOnDisk is straightforward and used correctly in refresh path.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

90-94: baseDirAbsolute endpoint: OK.
Scoped to webknossos context; returning a path string is acceptable.


415-426: deleteOnDisk flow simplification: OK.
Fetching the data source first and delegating to service is cleaner; auth via webknossos context matches WK→DS usage.

Confirm no external clients rely on the previous deleteDataset(datasetId) permission here.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

23-24: Imports updated for ManagedS3Service: OK.


100-103: Constructor DI: OK.
ManagedS3Service centralization improves cohesion and config handling.


146-147: PathUtils.ensureDirectoryBox usage: good.
Better error mapping vs direct ensureDirectory.


541-548: S3 TransferManager via service: good.
Clear failure if S3 not configured; checks failedTransfers. LGTM.


582-594: Failure cleanup argument order fix: OK.
deleteOnDisk now receives datasetId first; matches signature.


@MichaelBuessemeyer MichaelBuessemeyer left a comment


Some feedback :D

Testing results:

Delete non-virtual datasets, should run through if paths are unique.
Other datasets symlinking into this one should block the deletion

Looks good 👍

Data on S3 buckets not listed with global credentials should never be deleted.

Also seemed good

Delete virtual datasets, should run through. Other datasets referencing the data should still have it afterwards.

I am having trouble setting up a scenario where an S3 mag is shared.
I enabled uploading to s3. Made a successful upload. Used the new s3 path + credentials for explore remote and this also worked. But the dataset is now not rendered 🤔.
The server console says:

backend: 2025-10-29 14:27:27,600 [error] com.scalableminds.webknossos.datastore.services.DSDatasetErrorLoggingService - Error while loading bucket for layer 2-2-1 at BucketPosition(voxelMag1 at (32, 0, 160), bucket at (1,0,5), mag(1, 1, 1), additional coordinates: ), cuboid: Cuboid((32, 0, 160) / (1, 1, 1),32,32,32) for <path removed by me>: Could not read header at zarr.json <~ Failed to read from vault path <~ The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 1TR63AYS5G6NZF1Y, Extended Request ID: <truncated by myself>) (SDK Attempt Count: 1) Stack trace: software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: <truncated by myself>) (SDK Attempt Count: 1)

The rest of the testing went well

@MichaelBuessemeyer

Oh, and I forgot to mention that not all S3 data is cleared: the segmentation-layer/agglomerates/cumsum.json is not deleted when deleting a remote S3 dataset. Maybe because it is not listed as a used attachment?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)

39-55: Remove @tailrec annotation.

The @tailrec annotation cannot be satisfied because the recursive call (line 52) is inside a try-catch block, preventing tail-call optimization. This was already flagged in a previous review.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)

300-306: Lock down local deletions to the datastore base directory.

Any caller can submit a file:// UPath pointing outside the datastore root, and we will happily recurse-delete that absolute path. From the new DELETE endpoint this lets a remote user wipe arbitrary directories (even /) on the datastore host. We must validate each local path against dataBaseDir, reject anything else, and only then delete either the directory tree or single files. Also fail fast on non-managed remote paths instead of silently skipping them.

Please apply a guard like the following:

   def deletePathsFromDiskOrManagedS3(paths: Seq[UPath]): Fox[Unit] = {
-    val localPaths = paths.filter(_.isLocal).flatMap(_.toLocalPath)
-    val managedS3Paths = paths.filter(managedS3Service.pathIsInManagedS3)
-    for {
-      _ <- Fox.serialCombined(localPaths)(PathUtils.deleteDirectoryRecursively(_).toFox)
-      _ <- managedS3Service.deletePaths(managedS3Paths)
-    } yield ()
+    val (localCandidates, remoteCandidates) = paths.partition(_.isLocal)
+    val managedS3Paths = remoteCandidates.filter(managedS3Service.pathIsInManagedS3)
+    val disallowedRemote = remoteCandidates.diff(managedS3Paths)
+    val baseDir = dataBaseDir.toAbsolutePath.normalize
+
+    for {
+      _ <- Fox.fromBool(disallowedRemote.isEmpty) ?~>
+        s"Refusing to delete non-managed remote paths: ${disallowedRemote.mkString(", ")}"
+      validatedLocal <- Fox.serialCombined(localCandidates) { upath =>
+        for {
+          local <- upath.toLocalPath.toFox
+          abs = local.toAbsolutePath.normalize
+          _ <- Fox.fromBool(abs.startsWith(baseDir)) ?~>
+            s"Refusing to delete outside datastore base directory ($baseDir): $abs"
+        } yield abs
+      }
+      _ <- Fox.serialCombined(validatedLocal) { abs =>
+        if (Files.isDirectory(abs)) PathUtils.deleteDirectoryRecursively(abs).toFox
+        else tryo(Files.deleteIfExists(abs)).map(_ => ()).toFox
+      }
+      _ <- managedS3Service.deletePaths(managedS3Paths)
+    } yield ()
   }

This matches the earlier review feedback but is still unresolved, so please fix before release.
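The containment check at the heart of this guard can be expressed independently of the Fox machinery. A sketch using Python's pathlib for illustration (path names are hypothetical); resolving both paths first is what defeats `..` traversal tricks:

```python
from pathlib import Path

def assert_under_base(candidate: str, base: str) -> Path:
    """Resolve both paths and refuse anything escaping the base directory."""
    base_abs = Path(base).resolve()
    abs_path = Path(candidate).resolve()
    # Path.resolve normalizes ".." segments, so traversal attempts are caught here
    if base_abs != abs_path and base_abs not in abs_path.parents:
        raise PermissionError(f"Refusing to delete outside {base_abs}: {abs_path}")
    return abs_path
```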

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)

39-72: Stop interpreting credentialName as an S3 URI.

Datastore.S3Upload.credentialName is configured as a plain credential ID (e.g. "managed-s3"). The new code assumes it is an S3 URI, immediately parsing it with new URI(...) and forcing an endpoint override based on that host. On existing installations this throws URISyntaxException, so s3ClientBox becomes a Failure and all managed S3 deletions stop working. Even if the parse succeeded, reinterpreting the name changes semantics and breaks compatibility.

Please derive bucket/endpoint/region from the actual S3AccessKeyCredential metadata (the optional region and endpoint fields) and only override the client when those optional fields are populated. The field should remain a plain identifier. For example:

-  private lazy val s3UploadCredentialsOpt: Option[(String, String)] =
-    dataStoreConfig.Datastore.DataVaults.credentials.flatMap { credentialConfig =>
-      new CredentialConfigReader(credentialConfig).getCredential
-    }.collectFirst {
-      case S3AccessKeyCredential(credentialName, accessKeyId, secretAccessKey, _, _)
-          if dataStoreConfig.Datastore.S3Upload.credentialName == credentialName =>
-        (accessKeyId, secretAccessKey)
-    }
-
-  lazy val s3UploadBucketOpt: Option[String] =
-    S3UriUtils.hostBucketFromUri(new URI(dataStoreConfig.Datastore.S3Upload.credentialName))
-
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI("https", null, credentialUri.getHost, -1, null, null, null)
-  }
+  private lazy val s3UploadCredentialOpt: Option[S3AccessKeyCredential] =
+    dataStoreConfig.Datastore.DataVaults.credentials
+      .flatMap(new CredentialConfigReader(_).getCredential)
+      .collectFirst {
+        case cred @ S3AccessKeyCredential(name, _, _, _, _)
+            if dataStoreConfig.Datastore.S3Upload.credentialName == name => cred
+      }
+
+  private lazy val s3UploadCredentialsOpt: Option[(String, String)] =
+    s3UploadCredentialOpt.map(cred => (cred.accessKeyId, cred.secretAccessKey))
+
+  private lazy val s3EndpointOverrideOpt: Option[URI] =
+    s3UploadCredentialOpt.flatMap(_.endpoint).flatMap(ep => tryo(new URI(ep)).toOption)
+
+  private lazy val s3Region: Region =
+    s3UploadCredentialOpt.flatMap(_.region).map(Region.of).getOrElse(Region.US_EAST_1)

And in the builder only call endpointOverride/forcePathStyle when s3EndpointOverrideOpt is defined:

-        .crossRegionAccessEnabled(true)
-        .forcePathStyle(true)
-        .endpointOverride(s3UploadEndpoint)
-        .region(Region.US_EAST_1)
+        .crossRegionAccessEnabled(true)
+        .region(s3Region)
+      s3EndpointOverrideOpt.foreach { endpoint =>
+        builder.endpointOverride(endpoint)
+        builder.forcePathStyle(true)
+      }

This keeps the existing configuration contract intact and avoids runtime failures.

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)

47-47: Consider using s interpolator instead of f.

The f interpolator is intended for formatted strings with format specifiers (like %d, %s). Without format specifiers, s is more appropriate and avoids unnecessary overhead.

Apply this diff:

-          if (retryCount == 0) targetPath else targetPath.resolveSibling(f"${targetPath.getFileName} ($retryCount)")
+          if (retryCount == 0) targetPath else targetPath.resolveSibling(s"${targetPath.getFileName} ($retryCount)")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04b1de2 and 368893d.

📒 Files selected for processing (4)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-01-27T12:06:42.865Z
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
🧬 Code graph analysis (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • ensureDirectory (116-120)
util/src/main/scala/com/scalableminds/util/tools/Box.scala (1)
  • Full (661-699)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deleteOnDisk (413-428)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • successful (51-54)
  • failure (56-60)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
  • Fox (28-228)
  • Fox (230-303)
  • serialCombined (93-97)
  • serialCombined (97-109)
  • toFox (12-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (3)
  • isLocal (29-29)
  • toLocalPath (111-111)
  • toLocalPath (176-176)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (2)
  • pathIsInManagedS3 (153-157)
  • deletePaths (79-86)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-221)
  • deleteDirectoryRecursively (184-191)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
  • deletePaths (428-437)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
  • dataBaseDir (146-146)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-75)
  • DataVaults (58-60)
  • S3Upload (61-65)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1)
  • PathSchemes (3-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (4)
  • S3UriUtils (7-52)
  • hostBucketFromUri (9-22)
  • hostBucketFromUpath (24-28)
  • objectKeyFromUri (40-49)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (6)
  • UPath (54-96)
  • toRemoteUriUnsafe (121-121)
  • toRemoteUriUnsafe (188-188)
  • fromString (59-59)
  • startsWith (138-142)
  • startsWith (198-205)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (4)
  • toFox (12-12)
  • combined (82-93)
  • fromFuture (40-51)
  • successful (51-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)

2-11: LGTM! Clean refactor of trait dependencies.

The trait signature and imports have been simplified appropriately. Removing DataSourceToDiskWriter and adding PathUtils aligns well with the new trash-based deletion approach.


24-36: LGTM! Idempotent deletion with clear logging.

The existence check and error handling are correct. Returning Fox.successful(()) when the path doesn't exist makes the deletion idempotent, which is the right behavior. The logging clearly distinguishes between actual deletion and no-op cases.

@fm3
Copy link
Member

fm3 commented Oct 30, 2025

Thanks for taking the time to look at my PR! I implemented your code feedback :)

Regarding testing: Yes, it is difficult to set up dependent paths on S3 currently. I just re-tested this case by running a small neuron inferral with the webknossos-worker (with its new_publish branch). Worked as expected :)

Still, I’m a little confused by your testing result concerning explore. If I try to explore+add a dataset previously uploaded to S3 via the dataset upload, I get this error (as it should be). Do you remember what exactly you did there?

[screenshot of the error message]

@fm3
Copy link
Member

fm3 commented Oct 30, 2025

Oh, and I forgot to mention that not all S3 data is cleared: the segmentation-layer/agglomerates/cumsum.json is not deleted when deleting a remote S3 dataset. Maybe because it is not listed as a used attachment?

I would have expected that cumsum.json would not have been uploaded to S3 in the first place, as it was not referenced in the datasource-properties.json 🤔 But yes, I think it’s acceptable for the moment that it is not deleted. It is also not very large.

@MichaelBuessemeyer
Copy link
Contributor

🤔 I think the exploration result of the uploaded dataset is different. It is buggy imo: when I explore
<s3_root>/webknossos-uploads/sample_organization/delete_me-69034ce2800100b312b9616b I get a successful exploration, but the mag is interpreted as a layer 🤔.
[screenshot of the exploration result]

Maybe that’s the difference between our test cases?

@MichaelBuessemeyer
Copy link
Contributor

Regarding testing: Yes, it is difficult to set up dependent paths on S3 currently. I just re-tested this case by running a small neuron inferral with the webknossos-worker (with it’s new_publish branch). Worked as expected :)

Ok, then no need for me to do this again, as this takes some time?

@MichaelBuessemeyer
Copy link
Contributor

Oh I see, the error

backend: 2025-10-29 14:27:27,600 [error] com.scalableminds.webknossos.datastore.services.DSDatasetErrorLoggingService - Error while loading bucket for layer 2-2-1 at BucketPosition(voxelMag1 at (32, 0, 160), bucket at (1,0,5), mag(1, 1, 1), additional coordinates: ), cuboid: Cuboid((32, 0, 160) / (1, 1, 1),32,32,32) for <path removed by me>: Could not read header at zarr.json <~ Failed to read from vault path <~ The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 1TR63AYS5G6NZF1Y, Extended Request ID: <truncated by myself>) (SDK Attempt Count: 1) Stack trace: software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: <truncated by myself>) (SDK Attempt Count: 1)

states

The AWS Access Key Id you provided does not exist in our records.

This means that the file does not exist, which is also what the error says:

Could not read header at zarr.json

Therefore, I think the root cause for this is that the exploration thinks it found a valid layer, but this was actually a mag. Thus, the datasource stored in the db is buggy and thus loading the dataset does not work properly.

@MichaelBuessemeyer
Copy link
Contributor

But this exploration bug is very likely not related to this PR. So I’d say we can make this green, as I did a little retesting showing that it still works, and your latest changes look good?

@fm3
Copy link
Member

fm3 commented Oct 30, 2025

Yes, I agree :) Also, no need to re-test the linked mags in virtual datasets, I did that today

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

So, as this bug is unrelated, I opened #9034.

And as the code is good and doing what it should be doing 🟢

Successfully merging this pull request may close these issues: Delete datasets from S3.
