@fm3 fm3 commented Oct 27, 2025

  • When scanning datasources from directories, always add resolved mag paths immediately
  • Introduces realpaths for attachments
  • Report all realpaths that could be determined, scale down logging for those that couldn’t.

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • start wk with some datasets in the binaryData dir and watch the backend logging; it should show realpath scan failures (provoke some by deleting mags or attachments that are referenced in datasource-properties.json files)
  • All realpaths that could be determined should be added to the database

TODOs:

  • immediate paths for datasources
  • simplify layerpath logic
  • attachment realpath column
  • don’t fail for datasource on single realpath failure
  • log failures
  • scan also attachments
  • evolution
  • (blocked) adapt s3-delete code to also consider realpaths of attachments

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Oct 27, 2025

coderabbitai bot commented Oct 27, 2025

📝 Walkthrough

Walkthrough

Renames datastore endpoint to updateRealPaths; introduces RealPathInfo replacing MagPathInfo; adds realPath and hasLocalData columns/migrations for dataset attachments; refactors real-path scanning and resolution across datastore services; requires explicit Vec3Int.toMagLiteral(allowScalar) and updates call sites.

Changes

Cohort / File(s) Change Summary
Controller & Routes
app/controllers/WKRemoteDataStoreController.scala, conf/webknossos.latest.routes
Endpoint renamed updatePaths → updateRealPaths and route mapping updated to match.
Dataset DAOs & Model
app/models/dataset/Dataset.scala
Replaced mag-path updates with real-path updates: updateMagRealPathsForDataset renamed/repurposed to accept Seq[RealPathInfo], added updateAttachmentRealPathsForDataset, select now returns realPath and hasLocalData, with transactional + retry semantics.
Dataset Service & Client
app/models/dataset/DatasetService.scala, app/models/dataset/WKRemoteDataStoreClient.scala, app/models/dataset/DatasetUploadToPathsService.scala
DatasetService gains dependency on DatasetLayerAttachmentsDAO, delegates to new real-path update flows and attachment updater; thumbnail/mag literal calls updated to pass explicit allowScalar.
Database Schema & Migrations
conf/evolutions/145-attachment-realpaths.sql, conf/evolutions/reversions/145-attachment-realpaths.sql, tools/postgres/schema.sql
Added realPath TEXT and hasLocalData BOOLEAN NOT NULL DEFAULT false to webknossos.dataset_layer_attachments; schema version bumped 144→145; evolutions include transaction and precondition checks and a revert script.
Datastore client & DTOs
webknossos-datastore/app/.../services/DSRemoteWebknossosClient.scala
Replaced MagPathInfo with RealPathInfo(path, realPath, hasLocalData); DataSourcePathInfo now includes attachmentPathInfos; reportRealPaths uses Seq[DataSourcePathInfo].
Datastore scanning & services
webknossos-datastore/app/.../services/DataSourceService.scala
Major refactor: added scanRealPaths / scanRealPathsForDataSource, getMagPathInfo, getAttachmentPathInfo; dataSourceFromDir gains resolveMagPaths: Boolean; unified reporting flow and richer logging including per-data-source failures.
Datastore storage resolution
webknossos-datastore/app/.../storage/DataVaultService.scala
resolveMagPath signature changed (removed layerName); relative-path logic simplified to always resolve against dataset dir; mag literal uses allowScalar = false.
Tracingstore adjustments
webknossos-tracingstore/app/.../TSRemoteDatastoreClient.scala, webknossos-tracingstore/app/.../VolumeSegmentIndexBuffer.scala
mag serialization calls updated to toMagLiteral(allowScalar = false) where applicable.
Datastore call-site updates
webknossos-datastore/app/.../controllers/DSLegacyApiController.scala, .../controllers/DataSourceController.scala, .../services/uploading/UploadService.scala
Added resolveMagPaths boolean to dataSourceFromDir call sites (true for scans, false for upload flows); improved uploaded-path validation messaging.
Utility API
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala
toMagLiteral signature changed to require explicit allowScalar: Boolean (default removed).
Changelog
unreleased_changes/9019.md
Added changelog entry documenting attachment realpath registration, improved resilience, and migration reference.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • Dataset DAO transactional updates and retry/isolation semantics (attachment updater).
    • DataSourceService refactor (scanRealPaths, failure aggregation, logging) — ensure correctness and that single failures don't abort dataset-level results.
    • DataVaultService.resolveMagPath behavior change and all call-site updates (validate relative vs. layer-based resolution semantics).
    • Vec3Int.toMagLiteral signature change — confirm all call sites pass explicit allowScalar and effect on serialization.
    • DSRemoteWebknossosClient DTO changes — verify clients/servers agree on JSON formats for RealPathInfo and DataSourcePathInfo.

Possibly related PRs

Suggested reviewers

  • frcroth
  • normanrz

Poem

🐰 I hop through code with nimble feet,
Real paths and attachments now neatly meet.
If one mag trips, the scan carries on,
Data stitched together from dusk till dawn. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Title Check (✅ Passed): The title "Make Realpath Scans More Resilient" accurately reflects the primary objective of this changeset. The PR introduces significant improvements to real-path scanning resilience by ensuring that a single failing path does not prevent storing paths for an entire dataset, adds support for attachment real-paths alongside mag real-paths, and integrates real-path determination into the original datasource scan flow. These changes directly align with the author's stated focus on resilience improvements.
  • Linked Issues Check (✅ Passed): The changeset comprehensively addresses all objectives from linked issue #9018. First, resilience is achieved by introducing scanRealPaths and scanRealPathsForDataSource methods that collect successful path resolutions and failures separately, ensuring individual failures do not block storing the other paths [issue #9018]. Second, base paths are determined early via the new dataSourceFromDir resolveMagPaths flag and the addMagPaths helper, enabling immediate path resolution during the initial datasource scan [issue #9018]. Third, attachment realpaths are fully supported through the new DatasetLayerAttachmentsDAO.updateAttachmentRealPathsForDataset, getAttachmentPathInfo, database schema additions (realPath and hasLocalData columns), and DataSourcePathInfo.attachmentPathInfos [issue #9018]. Fourth, path determination is integrated into the original datasource scan by consolidating the checkInbox flow to compute, report, and log path results together instead of in separate requests [issue #9018].
  • Out of Scope Changes Check (✅ Passed): The vast majority of changes are tightly scoped to the stated objectives. Core in-scope changes include real-path resilience improvements, attachment real-path support, database migrations, and service layer refactoring. The DataVaultService.resolveMagPath signature simplification aligns with the PR's stated "simplify layerpath logic" objective. One consideration is the removal of the default parameter from Vec3Int.toMagLiteral(allowScalar: Boolean = false), which is a breaking API change affecting multiple call sites throughout the codebase. While this change could be justified as ensuring consistent mag literal handling across the system, it is not explicitly mentioned in the PR objectives or description. This represents a minor scope deviation but appears intended to enforce explicit parameter passing for correctness rather than being an unrelated change.
  • Description Check (✅ Passed): The pull request description is clearly related to the changeset and provides meaningful context about the implementation. It outlines the key changes (immediate path addition, attachment realpaths, resilient reporting), provides testing instructions with specific scenarios to validate the behavior, references the related issue (#9018), and includes a detailed checklist confirming PR requirements have been addressed. The description meaningfully conveys the purpose and scope of the changes.
Base automatically changed from refactor-datavault-service to master October 29, 2025 12:01
@fm3 fm3 changed the title WIP: Make Realpath Scans More Resilient Make Realpath Scans More Resilient Oct 30, 2025
@fm3 fm3 marked this pull request as ready for review October 30, 2025 12:19
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)

1262-1283: Deduplicate attachments by real path as well.

We now record realPath for attachments, but the storage query still partitions solely by the original path, so symlinked or otherwise relocated files with differing logical paths will continue to be counted multiple times. Mirror the mags query and fall back to path only when realPath is unavailable.

-              ROW_NUMBER() OVER (
-                PARTITION BY att.path
+              ROW_NUMBER() OVER (
+                PARTITION BY COALESCE(att.realPath, att.path)
                 ORDER BY ds.created ASC
               ) AS rn
@@
-          WHERE ranked.rn = 1
+          WHERE ranked.rn = 1
             AND ranked._organization = $organizationId
             AND ranked._dataStore = $dataStoreId
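To make the suggested partitioning concrete, here is a minimal in-memory Scala sketch of what the ranked query computes once it partitions by COALESCE(realPath, path). AttachmentRow, its fields (datasetCreated, sizeBytes) and the object name are hypothetical simplifications, not the actual DAO types; the real logic stays in SQL.

```scala
object AttachmentDedupSketch {
  // Hypothetical, simplified row: only the columns the ranking needs.
  final case class AttachmentRow(path: String, realPath: Option[String], datasetCreated: Long, sizeBytes: Long)

  // Mirrors ROW_NUMBER() OVER (PARTITION BY COALESCE(realPath, path) ORDER BY created ASC) ... WHERE rn = 1:
  // for each storage key, keep only the row belonging to the earliest-created dataset.
  def deduplicate(rows: Seq[AttachmentRow]): Seq[AttachmentRow] =
    rows
      .groupBy(row => row.realPath.getOrElse(row.path)) // COALESCE(realPath, path)
      .values
      .map(_.minBy(_.datasetCreated)) // ORDER BY created ASC, keep rn = 1
      .toSeq

  // Symlinked copies of the same file are counted once.
  def totalStorage(rows: Seq[AttachmentRow]): Long = deduplicate(rows).map(_.sizeBytes).sum
}
```

Falling back to path keeps attachments without a determined realPath in the count, just without cross-path deduplication.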
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e6cca4 and 5c74222.

📒 Files selected for processing (16)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (2 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (1 hunks)
  • app/models/dataset/WKRemoteDataStoreClient.scala (1 hunks)
  • conf/evolutions/145-attachment-realpaths.sql (1 hunks)
  • conf/evolutions/reversions/145-attachment-realpaths.sql (1 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • tools/postgres/schema.sql (2 hunks)
  • unreleased_changes/9019.md (1 hunks)
  • util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (7 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2024-11-22T17:18:04.217Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
  • app/models/dataset/DatasetService.scala
  • app/models/dataset/DatasetUploadToPathsService.scala
  • app/models/dataset/Dataset.scala
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala
  • app/models/dataset/WKRemoteDataStoreClient.scala
📚 Learning: 2024-11-22T17:19:07.947Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.

Applied to files:

  • app/models/dataset/Dataset.scala
  • tools/postgres/schema.sql
📚 Learning: 2025-05-07T06:17:32.810Z
Learnt from: philippotto
PR: scalableminds/webknossos#8602
File: frontend/javascripts/oxalis/model/volumetracing/volume_annotation_sampling.ts:365-366
Timestamp: 2025-05-07T06:17:32.810Z
Learning: The parameter in applyVoxelMap was renamed from `sliceCount` to `sliceOffset` to better reflect its purpose, but this doesn't affect existing call sites since JavaScript/TypeScript function calls are position-based.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T14:15:05.259Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.

Applied to files:

  • tools/postgres/schema.sql
🧬 Code graph analysis (9)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/DatasetService.scala (1)
  • updateRealPaths (491-503)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (1)
  • additionalCoordinatesKeyPart (65-76)
app/models/dataset/DatasetService.scala (3)
app/models/dataset/Dataset.scala (4)
  • findOneByDataSourceId (430-433)
  • dataSourceId (93-93)
  • updateMagRealPathsForDataset (828-845)
  • updateAttachmentRealPathsForDataset (1174-1191)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
  • shiftBox (312-312)
  • successful (53-56)
  • failure (58-62)
app/controllers/WKRemoteDataStoreController.scala (1)
  • updateRealPaths (196-208)
app/models/dataset/DatasetUploadToPathsService.scala (1)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
app/models/dataset/Dataset.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12)
  • addQueryParam (30-33)
  • addQueryParam (33-36)
  • addQueryParam (36-39)
  • addQueryParam (39-42)
  • addQueryParam (42-45)
  • addQueryParam (45-50)
  • addQueryParam (50-53)
  • addQueryParam (53-56)
  • addQueryParam (56-59)
  • addQueryParam (59-63)
  • addQueryParam (63-68)
  • addQueryParam (68-71)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1)
  • mag (25-25)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12)
  • addQueryParam (30-33)
  • addQueryParam (33-36)
  • addQueryParam (36-39)
  • addQueryParam (39-42)
  • addQueryParam (42-45)
  • addQueryParam (45-50)
  • addQueryParam (50-53)
  • addQueryParam (53-56)
  • addQueryParam (56-59)
  • addQueryParam (59-63)
  • addQueryParam (63-68)
  • addQueryParam (68-71)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (7)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
  • Instant (14-45)
  • Instant (47-103)
  • now (48-48)
  • since (68-68)
  • toString (15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
  • reportDataSources (107-114)
  • reportRealPaths (114-120)
  • nonEmpty (43-43)
  • DataSourcePathInfo (40-44)
  • DataSourcePathInfo (46-48)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayerAttachments.scala (1)
  • allAttachments (19-19)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • resolveMagPath (79-98)
  • resolveMagPath (98-103)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
  • formatDuration (30-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: backend-tests
🔇 Additional comments (10)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (1)

70-70: LGTM! Explicit parameter improves clarity.

The addition of the explicit allowScalar = false parameter aligns with the codebase-wide refactoring to make toMagLiteral calls more explicit. This ensures the mag is always serialized in full "x-y-z" format for the remote datastore query, avoiding ambiguity.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1)

29-29: LGTM! Explicit allowScalar = false ensures consistent key format.

The explicit parameter correctly adapts to the updated toMagLiteral API and ensures segment index keys always use the full "x-y-z" format, which is appropriate for consistent key construction and lookups.

util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)

40-40: API change properly implemented across codebase.

Verification confirms that all 32+ call sites of toMagLiteral() throughout the codebase have been updated with explicit allowScalar boolean parameters. The breaking API change was implemented comprehensively with no incomplete migrations detected.
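For illustration, a minimal sketch of what an explicit allowScalar parameter controls, assuming the scalar form is only emitted for isotropic mags. Vec3 is a hypothetical stand-in for Vec3Int, and the actual implementation may differ in detail.

```scala
object MagLiteralSketch {
  // Hypothetical stand-in for Vec3Int with only the fields needed here.
  final case class Vec3(x: Int, y: Int, z: Int) {
    def isIsotropic: Boolean = x == y && y == z

    // No default value: every call site has to choose a format explicitly.
    // allowScalar = true  -> an isotropic mag may collapse to "2"
    // allowScalar = false -> always the full "x-y-z" form (stable keys, URLs, directory names)
    def toMagLiteral(allowScalar: Boolean): String =
      if (allowScalar && isIsotropic) s"$x" else s"$x-$y-$z"
  }
}
```

Without the default, forgetting the parameter is a compile error rather than a silent format choice, which matters for values used as lookup keys such as the segment index keys.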

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)

79-96: LGTM: Signature simplification and clearer mag literal handling.

The removal of the unused layerName parameter and the simplified relative path resolution logic improve maintainability. The explicit allowScalar = false parameter on line 90 makes it clear that the vec3 notation is intended for the fallback case.
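A minimal sketch of the simplified relative-path rule ("always resolve against the dataset dir"), using plain java.nio paths and strings instead of the codebase's UPath. The scheme check and the object name are assumptions for illustration only.

```scala
import java.nio.file.{Path, Paths}

object MagPathResolutionSketch {
  // Hypothetical helper: a configured mag path is remote (URI scheme like s3:// or https://),
  // an absolute local path, or a path relative to the dataset directory.
  def resolve(datasetDir: Path, configuredPath: String): String =
    if (hasRemoteScheme(configuredPath)) configuredPath // remote paths pass through unchanged
    else {
      val local = Paths.get(configuredPath)
      // Relative paths always resolve against the dataset directory, never against a layer directory.
      val absolute = if (local.isAbsolute) local else datasetDir.resolve(local)
      absolute.normalize.toString
    }

  private def hasRemoteScheme(p: String): Boolean = {
    val schemeEnd = p.indexOf("://")
    schemeEnd > 0 && p.substring(0, schemeEnd).forall(c => c.isLetterOrDigit || c == '+' || c == '-' || c == '.')
  }
}
```

Dropping the layer-relative branch means a relative path means the same thing regardless of which layer references it.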

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

40-54: LGTM: Clean data model for real-path tracking.

The introduction of RealPathInfo with separate path and realPath fields provides clear semantics for tracking both the reference path and the resolved real path. The hasLocalData flag records whether the resolved real path lies inside the dataset directory, which distinguishes data stored with the dataset from data referenced elsewhere (remote storage or a symlink target outside the dataset). The extension of DataSourcePathInfo to include attachmentPathInfos successfully aligns with the PR objective to support attachment real paths.
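A simplified sketch of the DTO shape described above. RealPathInfo's fields and attachmentPathInfos follow the PR summary; the String types, the dataSourceId field and the magPathInfos name are assumptions (the real code likely uses UPath/DataSourceId and play-json formats, omitted here).

```scala
object RealPathDtoSketch {
  // Field names as in the PR summary; types simplified to String/Boolean.
  final case class RealPathInfo(path: String, realPath: String, hasLocalData: Boolean)

  // dataSourceId as String and the name magPathInfos are assumptions of this sketch.
  final case class DataSourcePathInfo(
      dataSourceId: String,
      magPathInfos: Seq[RealPathInfo],
      attachmentPathInfos: Seq[RealPathInfo]
  ) {
    // Only data sources with at least one resolved path need to be reported.
    def nonEmpty: Boolean = magPathInfos.nonEmpty || attachmentPathInfos.nonEmpty
  }
}
```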

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)

70-99: LGTM: Resilient real-path scanning with proper separation of concerns.

The refactored flow correctly separates datasource discovery from real-path resolution, allowing failures in path resolution to be collected without preventing successful datasources from being reported. This achieves the PR objective of making the scan more resilient.


101-128: LGTM: Comprehensive path scanning with proper failure collection.

The implementation correctly processes both mag and attachment paths for each datasource, collecting failures without propagating them. The handling of optional attachments (dataLayer.attachments.map(_.allAttachments).getOrElse(Seq.empty)) is correct, and filtering non-empty path info prevents unnecessary reporting.
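The failure-collection pattern can be sketched as follows. This uses scala.util.Try in place of the codebase's Fox/Box and tryo, and all names are hypothetical; it only shows the idea that each path is resolved independently and failures are gathered instead of propagated.

```scala
import scala.util.{Failure, Success, Try}

object ResilientScanSketch {
  final case class PathResult(path: String, realPath: String)
  final case class ScanResult(resolved: Seq[PathResult], failures: Seq[(String, String)])

  // Every path is resolved independently: a failure is recorded, never rethrown,
  // so the remaining paths of the same data source are still reported.
  def scanPaths(paths: Seq[String])(resolve: String => String): ScanResult = {
    val attempts = paths.map(p => p -> Try(resolve(p)))
    val resolved = attempts.collect { case (p, Success(real)) => PathResult(p, real) }
    val failures = attempts.collect {
      // getMessage may be null, so fall back to the exception class name.
      case (p, Failure(e)) => p -> Option(e.getMessage).getOrElse(e.getClass.getSimpleName)
    }
    ScanResult(resolved, failures)
  }
}
```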


130-146: LGTM: Safe real-path resolution with proper local data detection.

The implementation correctly handles both remote and local paths. For local paths, toRealPath() is wrapped in tryo to safely handle failures (e.g., broken symlinks, non-existent paths). The hasLocalData check using startsWith correctly identifies whether the resolved path is within the dataset directory.
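A minimal sketch of the local branch, using scala.util.Try instead of tryo. Remote paths (returned as-is per the review) are left out. Comparing against the real path of the data source directory as well is an assumption of this sketch, so that a symlinked base directory does not produce false negatives.

```scala
import java.nio.file.Path
import scala.util.Try

object LocalRealPathSketch {
  final case class RealPathInfo(path: String, realPath: String, hasLocalData: Boolean)

  // toRealPath follows symlinks and fails if the target does not exist (e.g. a deleted mag).
  def localRealPathInfo(dataSourceDir: Path, path: Path): Try[RealPathInfo] =
    for {
      real <- Try(path.toRealPath())
      realBase <- Try(dataSourceDir.toRealPath())
    } yield RealPathInfo(
      path.toString,
      real.toString,
      // Path.startsWith compares whole name elements, so /ds-other does not match /ds.
      hasLocalData = real.startsWith(realBase)
    )
}
```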


161-195: LGTM: Enhanced logging provides good observability.

The updated logging includes timing information, real-path scan statistics, and detailed failure reporting. The verbose mode provides helpful per-team breakdowns, and failure details are logged when available, which aids troubleshooting.
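As a rough illustration of "report everything, scale down failure logging", here is a hypothetical summary builder: one compact line by default, capped failure details in verbose mode. The message format, the cap and all names are assumptions and do not reproduce the actual log output (which also includes per-team breakdowns).

```scala
object ScanLogSketch {
  // One summary line always; individual failures only in verbose mode, capped at maxDetails.
  def summary(
      resolvedCount: Int,
      failures: Seq[(String, String)],
      durationMillis: Long,
      verbose: Boolean,
      maxDetails: Int = 5
  ): String = {
    val head = s"Realpath scan took $durationMillis ms: $resolvedCount paths resolved, ${failures.size} failed."
    if (!verbose || failures.isEmpty) head
    else {
      val shown = failures.take(maxDetails).map { case (path, msg) => s"  $path: $msg" }
      val omitted = failures.size - shown.size
      val tail = if (omitted > 0) Seq(s"  ... and $omitted more") else Seq.empty
      (head +: (shown ++ tail)).mkString("\n")
    }
  }
}
```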


265-274: LGTM: Early mag path population supports downstream processing.

The addMagPaths method correctly populates mag paths during datasource loading, ensuring they're available for subsequent real-path scanning. Using layer.mapped with newMags is the appropriate pattern for creating modified layers, and the call to resolveMagPath properly resolves each mag's path.
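A simplified sketch of eager mag path population behind the resolveMagPaths flag (true for scans, false for upload flows, per the PR summary). Mag and Layer are hypothetical stand-ins for the real layer types, and the default <layer>/<mag> layout used when no explicit path is configured is an assumption.

```scala
import java.nio.file.Path

object EagerMagPathsSketch {
  final case class Mag(literal: String, path: Option[String])
  final case class Layer(name: String, mags: List[Mag])

  private def resolveMagPath(dataSourceDir: Path, layer: Layer, mag: Mag): String = {
    // Assumed default layout when no explicit path is configured: <dataSourceDir>/<layer>/<mag>
    val configured = mag.path.getOrElse(s"${layer.name}/${mag.literal}")
    if (configured.contains("://")) configured // remote: keep as-is
    else dataSourceDir.resolve(configured).normalize.toString // absolute paths stay absolute
  }

  // Fill in a resolved path for every mag while the data source is loaded.
  def addMagPaths(dataSourceDir: Path, layers: List[Layer]): List[Layer] =
    layers.map { layer =>
      layer.copy(mags = layer.mags.map(mag => mag.copy(path = Some(resolveMagPath(dataSourceDir, layer, mag)))))
    }

  def load(dataSourceDir: Path, layers: List[Layer], resolveMagPaths: Boolean): List[Layer] =
    if (resolveMagPaths) addMagPaths(dataSourceDir, layers) else layers
}
```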

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)

244-250: Consider refactoring for improved readability.

The conditional assignment pattern is correct, but its formatting makes it harder to follow. Consider this alternative structure:

-            val dataSourceWithMagPaths =
-              if (resolveMagPaths)
-                dataSourceWithAttachments.copy(
-                  dataLayers = addMagPaths(path, dataSourceWithAttachments)
-                )
-              else dataSourceWithAttachments
-            dataSourceWithMagPaths.copy(id)
+            val dataSourceWithIdAndAttachments = dataSourceWithAttachments.copy(id)
+            if (resolveMagPaths)
+              dataSourceWithIdAndAttachments.copy(
+                dataLayers = addMagPaths(path, dataSourceWithAttachments)
+              )
+            else dataSourceWithIdAndAttachments

Or more concisely:

-            val dataSourceWithMagPaths =
-              if (resolveMagPaths)
-                dataSourceWithAttachments.copy(
-                  dataLayers = addMagPaths(path, dataSourceWithAttachments)
-                )
-              else dataSourceWithAttachments
-            dataSourceWithMagPaths.copy(id)
+            val updatedDataLayers = if (resolveMagPaths) addMagPaths(path, dataSourceWithAttachments) else dataSourceWithAttachments.dataLayers
+            dataSourceWithAttachments.copy(id = id, dataLayers = updatedDataLayers)

268-277: Consider adding documentation for the method's purpose.

While the implementation is straightforward, adding a brief comment explaining when and why addMagPaths is called would help future maintainers. For example:

/**
 * Eagerly resolves and populates mag paths for all layers during datasource scanning.
 * This supports early path determination as part of the initial scan rather than as a separate request.
 */
private def addMagPaths(dataSourcePath: Path, dataSource: UsableDataSource): List[StaticLayer] =
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 961de3f and 27c12bc.

📒 Files selected for processing (2)
  • unreleased_changes/9019.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • unreleased_changes/9019.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
  • Instant (14-45)
  • Instant (47-103)
  • now (48-48)
  • since (68-68)
  • toString (15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
  • reportDataSources (107-114)
  • reportRealPaths (114-120)
  • nonEmpty (43-43)
  • DataSourcePathInfo (40-44)
  • DataSourcePathInfo (46-48)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • resolveMagPath (79-98)
  • resolveMagPath (98-103)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
  • formatDuration (30-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)

71-71: Excellent logging and timing improvements.

The addition of detailed timing, realpath scan summaries, and verbose failure reporting significantly improves observability. The per-team breakdown and conditional verbose logging strike a good balance between information and noise.

Also applies to: 81-91, 161-195


101-128: Resilient scanning architecture is well-implemented.

The separation of successful path resolutions from failures (lines 101-106) and the per-datasource scanning (lines 108-128) effectively achieve the PR objective: a single failing mag or attachment no longer prevents storing other paths for the datasource.


130-146: Mag path resolution is correct and handles remote/local cases appropriately.

The method properly distinguishes between remote paths (returned as-is) and local paths (resolved with toRealPath() to handle symlinks). The hasLocalData flag correctly identifies whether the resolved path is within the dataset directory.


148-159: Attachment path handling is sound with defensive absolute path check.

The explicit check on Line 153 that attachment.path.isAbsolute addresses the previous review concern about ensuring attachment paths are absolute before calling toRealPath(). This defensive check prevents incorrect resolution against the current working directory if a relative path were to slip through the upstream resolution logic.
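The defensive check can be sketched in isolation. Try stands in for the codebase's Fox/tryo, and the object name and error message are hypothetical.

```scala
import java.nio.file.Path
import scala.util.{Failure, Try}

object AttachmentRealPathSketch {
  // Reject relative paths before calling toRealPath: a relative path would otherwise be
  // resolved against the JVM's working directory and silently yield a wrong real path.
  def attachmentRealPath(attachmentPath: Path): Try[Path] =
    if (!attachmentPath.isAbsolute)
      Failure(new IllegalArgumentException(s"Attachment path is not absolute: $attachmentPath"))
    else Try(attachmentPath.toRealPath())
}
```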

Based on learnings


Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make realpath scan more resilient, support attachments

2 participants