Make Realpath Scans More Resilient #9019
base: master
📝 Walkthrough
Renames the datastore endpoint to `updateRealPaths`; introduces `RealPathInfo`, replacing `MagPathInfo`; adds `realPath` and `hasLocalData` columns and migrations for dataset attachments; refactors real-path scanning and resolution across the datastore services; requires an explicit `allowScalar` argument to `Vec3Int.toMagLiteral` and updates its call sites.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
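The walkthrough notes that `Vec3Int.toMagLiteral` now needs an explicit `allowScalar` argument, and the comments further down describe `allowScalar = false` as always producing the full "x-y-z" literal. A minimal sketch of that contract follows. It is a hypothetical stand-in, not the real `Vec3Int`; in particular, the behaviour assumed for `allowScalar = true` (an isotropic mag such as 2-2-2 collapsing to "2") is an assumption.

```scala
object MagLiteralSketch {
  // Hypothetical stand-in for com.scalableminds.util.geometry.Vec3Int, reduced to what the sketch needs.
  final case class Vec3Int(x: Int, y: Int, z: Int) {
    def isIsotropic: Boolean = x == y && y == z

    // No default value: every call site has to state which format it wants.
    // allowScalar = false -> always the full "x-y-z" literal.
    // allowScalar = true  -> assumed: isotropic mags collapse to a single number.
    def toMagLiteral(allowScalar: Boolean): String =
      if (allowScalar && isIsotropic) x.toString
      else s"$x-$y-$z"
  }
}
```

Because the parameter has no default, a call site that does not choose a format fails to compile, which is presumably what makes it possible to confirm that every caller was migrated.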
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)
1262-1283: Deduplicate attachments by real path as well.
We now record `realPath` for attachments, but the storage query still partitions solely by the original `path`, so symlinked or otherwise relocated files with differing logical paths will continue to be counted multiple times. Mirror the mags query and fall back to `path` only when `realPath` is unavailable.
```diff
-      ROW_NUMBER() OVER (
-        PARTITION BY att.path
+      ROW_NUMBER() OVER (
+        PARTITION BY COALESCE(att.realPath, att.path)
         ORDER BY ds.created ASC
       ) AS rn
@@
-    WHERE ranked.rn = 1
+    WHERE ranked.rn = 1
       AND ranked._organization = $organizationId
       AND ranked._dataStore = $dataStoreId
```
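The suggested partition key keeps one row per effective path (the real path if known, otherwise the logical path), choosing the attachment from the oldest dataset. The same grouping rule in plain Scala is sketched below so it can be checked in isolation. `AttachmentRow`, its fields and `totalSizeBytes` are hypothetical; the actual deduplication runs in PostgreSQL.

```scala
object AttachmentDedup {
  // Hypothetical row shape: one attachment of one dataset, plus the creation time of that dataset.
  final case class AttachmentRow(path: String, realPath: Option[String], datasetCreated: Long, sizeBytes: Long)

  // Same key as COALESCE(att.realPath, att.path): use the logical path only when no real path is known.
  def effectivePath(row: AttachmentRow): String = row.realPath.getOrElse(row.path)

  // Same effect as ROW_NUMBER() OVER (PARTITION BY <key> ORDER BY ds.created ASC) ... WHERE rn = 1:
  // keep the row from the oldest dataset for every effective path.
  def dedupe(rows: Seq[AttachmentRow]): Seq[AttachmentRow] =
    rows.groupBy(effectivePath).values.map(_.minBy(_.datasetCreated)).toSeq

  // Hypothetical aggregate: each physical file contributes its size once.
  def totalSizeBytes(rows: Seq[AttachmentRow]): Long = dedupe(rows).map(_.sizeBytes).sum
}
```

Two attachments with different logical paths but the same `realPath` collapse into one row, which is exactly the double counting the comment describes.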
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- app/controllers/WKRemoteDataStoreController.scala (1 hunk)
- app/models/dataset/Dataset.scala (4 hunks)
- app/models/dataset/DatasetService.scala (2 hunks)
- app/models/dataset/DatasetUploadToPathsService.scala (1 hunk)
- app/models/dataset/WKRemoteDataStoreClient.scala (1 hunk)
- conf/evolutions/145-attachment-realpaths.sql (1 hunk)
- conf/evolutions/reversions/145-attachment-realpaths.sql (1 hunk)
- conf/webknossos.latest.routes (1 hunk)
- tools/postgres/schema.sql (2 hunks)
- unreleased_changes/9019.md (1 hunk)
- util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1 hunk)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (7 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (1 hunk)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1 hunk)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2024-11-22T17:18:04.217Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
app/models/dataset/DatasetService.scala
app/models/dataset/DatasetUploadToPathsService.scala
app/models/dataset/Dataset.scala
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala
app/models/dataset/WKRemoteDataStoreClient.scala
📚 Learning: 2024-11-22T17:19:07.947Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Applied to files:
app/models/dataset/Dataset.scala
tools/postgres/schema.sql
📚 Learning: 2025-05-07T06:17:32.810Z
Learnt from: philippotto
PR: scalableminds/webknossos#8602
File: frontend/javascripts/oxalis/model/volumetracing/volume_annotation_sampling.ts:365-366
Timestamp: 2025-05-07T06:17:32.810Z
Learning: The parameter in applyVoxelMap was renamed from `sliceCount` to `sliceOffset` to better reflect its purpose, but this doesn't affect existing call sites since JavaScript/TypeScript function calls are position-based.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T14:15:05.259Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.
Applied to files:
tools/postgres/schema.sql
🧬 Code graph analysis (9)
- app/controllers/WKRemoteDataStoreController.scala (1)
  - app/models/dataset/DatasetService.scala (1): updateRealPaths (491-503)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2)
  - util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1): toMagLiteral (40-43)
  - webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (1): additionalCoordinatesKeyPart (65-76)
- app/models/dataset/DatasetService.scala (3)
  - app/models/dataset/Dataset.scala (4): findOneByDataSourceId (430-433), dataSourceId (93-93), updateMagRealPathsForDataset (828-845), updateAttachmentRealPathsForDataset (1174-1191)
  - util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3): shiftBox (312-312), successful (53-56), failure (58-62)
  - app/controllers/WKRemoteDataStoreController.scala (1): updateRealPaths (196-208)
- app/models/dataset/DatasetUploadToPathsService.scala (1)
  - util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1): toMagLiteral (40-43)
- app/models/dataset/Dataset.scala (2)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2): RealPathInfo (50-50), RealPathInfo (52-54)
  - app/utils/sql/SqlInterpolation.scala (2): q (20-39), asUpdate (74-74)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (3)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12): addQueryParam (30-33), addQueryParam (33-36), addQueryParam (36-39), addQueryParam (39-42), addQueryParam (42-45), addQueryParam (45-50), addQueryParam (50-53), addQueryParam (53-56), addQueryParam (56-59), addQueryParam (59-63), addQueryParam (63-68), addQueryParam (68-71)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1): mag (25-25)
  - util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1): toMagLiteral (40-43)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): UPath (54-96), fromLocalPath (80-80)
  - util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1): toMagLiteral (40-43)
- app/models/dataset/WKRemoteDataStoreClient.scala (2)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12): addQueryParam (30-33), addQueryParam (33-36), addQueryParam (36-39), addQueryParam (39-42), addQueryParam (42-45), addQueryParam (45-50), addQueryParam (50-53), addQueryParam (53-56), addQueryParam (56-59), addQueryParam (59-63), addQueryParam (63-68), addQueryParam (68-71)
  - util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1): toMagLiteral (40-43)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (7)
  - util/src/main/scala/com/scalableminds/util/time/Instant.scala (5): Instant (14-45), Instant (47-103), now (48-48), since (68-68), toString (15-15)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7): reportDataSources (107-114), reportRealPaths (114-120), nonEmpty (43-43), DataSourcePathInfo (40-44), DataSourcePathInfo (46-48), RealPathInfo (50-50), RealPathInfo (52-54)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayerAttachments.scala (1): allAttachments (19-19)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2): resolveMagPath (79-98), resolveMagPath (98-103)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): UPath (54-96), fromLocalPath (80-80)
  - util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1): formatDuration (30-82)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): mapped (104-133)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: backend-tests
🔇 Additional comments (10)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (1)
70-70: LGTM! Explicit parameter improves clarity.
The addition of the explicit `allowScalar = false` parameter aligns with the codebase-wide refactoring to make `toMagLiteral` calls more explicit. This ensures the mag is always serialized in full "x-y-z" format for the remote datastore query, avoiding ambiguity.
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1)
29-29: LGTM! Explicit `allowScalar = false` ensures consistent key format.
The explicit parameter correctly adapts to the updated `toMagLiteral` API and ensures segment index keys always use the full "x-y-z" format, which is appropriate for consistent key construction and lookups.
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
40-40: API change properly implemented across codebase.
Verification confirms that all 32+ call sites of `toMagLiteral()` throughout the codebase have been updated with explicit `allowScalar` boolean parameters. The breaking API change was implemented comprehensively, with no incomplete migrations detected.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)
79-96: LGTM: Signature simplification and clearer mag literal handling.
The removal of the unused `layerName` parameter and the simplified relative path resolution logic improve maintainability. The explicit `allowScalar = false` parameter on line 90 makes it clear that the vec3 notation is intended for the fallback case.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
40-54: LGTM: Clean data model for real-path tracking.
The introduction of `RealPathInfo` with separate `path` and `realPath` fields provides clear semantics for tracking both the reference path and the resolved real path. The `hasLocalData` flag is useful for distinguishing between local and remote data. The extension of `DataSourcePathInfo` to include `attachmentPathInfos` successfully aligns with the PR objective to support attachment real paths.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
70-99: LGTM: Resilient real-path scanning with proper separation of concerns.
The refactored flow correctly separates datasource discovery from real-path resolution, allowing failures in path resolution to be collected without preventing successful datasources from being reported. This achieves the PR objective of making the scan more resilient.
101-128: LGTM: Comprehensive path scanning with proper failure collection.
The implementation correctly processes both mag and attachment paths for each datasource, collecting failures without propagating them. The handling of optional attachments (`dataLayer.attachments.map(_.allAttachments).getOrElse(Seq.empty)`) is correct, and filtering non-empty path info prevents unnecessary reporting.
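The pattern these two comments describe (resolve each mag and attachment path independently, keep the successes, collect the failures, and report only data sources with at least one resolved path) can be sketched with `scala.util.Try`. This is an illustration under assumptions: the real code uses the codebase's `Fox`/`tryo` helpers, and `RealPathInfo`, `DataSourcePathInfo`, `ScanFailure` and the `resolve` function below are simplified, hypothetical stand-ins.

```scala
import scala.util.{Failure, Success, Try}

object RealPathScan {
  // Simplified stand-ins; field names follow the review comments, everything else is assumed.
  final case class RealPathInfo(path: String, realPath: String, hasLocalData: Boolean)

  final case class DataSourcePathInfo(
      dataSourceId: String,
      magPathInfos: Seq[RealPathInfo],
      attachmentPathInfos: Seq[RealPathInfo]
  ) {
    def nonEmpty: Boolean = magPathInfos.nonEmpty || attachmentPathInfos.nonEmpty
  }

  final case class ScanFailure(dataSourceId: String, path: String, message: String)

  // Each path is resolved on its own, so one failing path cannot abort the others.
  private def resolveAll(
      dataSourceId: String,
      paths: Seq[String],
      resolve: String => RealPathInfo
  ): (Seq[RealPathInfo], Seq[ScanFailure]) = {
    val results = paths.map(p => p -> Try(resolve(p)))
    val infos = results.collect { case (_, Success(info)) => info }
    val failures = results.collect { case (p, Failure(e)) => ScanFailure(dataSourceId, p, e.getMessage) }
    (infos, failures)
  }

  def scanDataSource(
      dataSourceId: String,
      magPaths: Seq[String],
      attachmentPaths: Seq[String],
      resolve: String => RealPathInfo
  ): (Option[DataSourcePathInfo], Seq[ScanFailure]) = {
    val (magInfos, magFailures) = resolveAll(dataSourceId, magPaths, resolve)
    val (attachmentInfos, attachmentFailures) = resolveAll(dataSourceId, attachmentPaths, resolve)
    val pathInfo = DataSourcePathInfo(dataSourceId, magInfos, attachmentInfos)
    // Only report the data source if at least one path was resolved; failures are always returned.
    (Some(pathInfo).filter(_.nonEmpty), magFailures ++ attachmentFailures)
  }
}
```

The caller can then report the successful path infos and log the collected failures separately, which matches the "collect, don't propagate" behaviour described above.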
130-146: LGTM: Safe real-path resolution with proper local data detection.
The implementation correctly handles both remote and local paths. For local paths, `toRealPath()` is wrapped in `tryo` to safely handle failures (e.g., broken symlinks, non-existent paths). The `hasLocalData` check using `startsWith` correctly identifies whether the resolved path is within the dataset directory.
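The local branch can be sketched with `java.nio.file`: `Path.toRealPath()` follows symlinks and throws for missing targets, so wrapping it in `Try` (standing in for the datastore's `tryo`) turns a broken link into a value, and `Path.startsWith` compares whole path components rather than string prefixes. The object and method names are hypothetical, and resolving the dataset directory as well is a choice made in this sketch (so that a symlinked base directory still compares correctly), not something the comment confirms. Remote paths, which the comment says are returned as-is, are not covered.

```scala
import java.nio.file.Path
import scala.util.Try

object LocalRealPath {
  // Simplified stand-in for RealPathInfo.
  final case class RealPathInfo(path: String, realPath: String, hasLocalData: Boolean)

  // Resolves a local mag or attachment path; broken symlinks and missing files become a Failure.
  def resolveLocal(path: Path, datasetDir: Path): Try[RealPathInfo] =
    Try {
      val real = path.toRealPath()
      // Assumption of this sketch: compare against the resolved dataset directory.
      val datasetReal = datasetDir.toRealPath()
      // Component-wise prefix check: true only if the data physically lives inside the dataset directory.
      RealPathInfo(path.toString, real.toString, hasLocalData = real.startsWith(datasetReal))
    }
}
```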
161-195: LGTM: Enhanced logging provides good observability.
The updated logging includes timing information, real-path scan statistics, and detailed failure reporting. The verbose mode provides helpful per-team breakdowns, and failure details are logged when available, which aids troubleshooting.
265-274: LGTM: Early mag path population supports downstream processing.
The `addMagPaths` method correctly populates mag paths during datasource loading, ensuring they're available for subsequent real-path scanning. Using `layer.mapped` with `newMags` is the appropriate pattern for creating modified layers, and the call to `resolveMagPath` properly resolves each mag's path.
...nossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
Actionable comments posted: 0
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
244-250: Consider refactoring for improved readability.
The conditional assignment pattern is correct but the formatting makes it harder to follow. Consider this alternative structure:
```diff
-    val dataSourceWithMagPaths =
-      if (resolveMagPaths)
-        dataSourceWithAttachments.copy(
-          dataLayers = addMagPaths(path, dataSourceWithAttachments)
-        )
-      else dataSourceWithAttachments
-    dataSourceWithMagPaths.copy(id)
+    val dataSourceWithIdAndAttachments = dataSourceWithAttachments.copy(id)
+    if (resolveMagPaths)
+      dataSourceWithIdAndAttachments.copy(
+        dataLayers = addMagPaths(path, dataSourceWithAttachments)
+      )
+    else dataSourceWithIdAndAttachments
```
Or more concisely:
```diff
-    val dataSourceWithMagPaths =
-      if (resolveMagPaths)
-        dataSourceWithAttachments.copy(
-          dataLayers = addMagPaths(path, dataSourceWithAttachments)
-        )
-      else dataSourceWithAttachments
-    dataSourceWithMagPaths.copy(id)
+    val updatedDataLayers = if (resolveMagPaths) addMagPaths(path, dataSourceWithAttachments) else dataSourceWithAttachments.dataLayers
+    dataSourceWithAttachments.copy(id = id, dataLayers = updatedDataLayers)
```
268-277: Consider adding documentation for the method's purpose.
While the implementation is straightforward, adding a brief comment explaining when and why `addMagPaths` is called would help future maintainers. For example:
```scala
/**
 * Eagerly resolves and populates mag paths for all layers during datasource scanning.
 * This supports early path determination as part of the initial scan rather than as a separate request.
 */
private def addMagPaths(dataSourcePath: Path, dataSource: UsableDataSource): List[StaticLayer] =
```
📜 Review details
📒 Files selected for processing (2)
- unreleased_changes/9019.md (1 hunk)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unreleased_changes/9019.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
  - util/src/main/scala/com/scalableminds/util/time/Instant.scala (5): Instant (14-45), Instant (47-103), now (48-48), since (68-68), toString (15-15)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7): reportDataSources (107-114), reportRealPaths (114-120), nonEmpty (43-43), DataSourcePathInfo (40-44), DataSourcePathInfo (46-48), RealPathInfo (50-50), RealPathInfo (52-54)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2): resolveMagPath (79-98), resolveMagPath (98-103)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): UPath (54-96), fromLocalPath (80-80)
  - util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1): formatDuration (30-82)
  - webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): mapped (104-133)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)
71-71: Excellent logging and timing improvements.
The addition of detailed timing, realpath scan summaries, and verbose failure reporting significantly improves observability. The per-team breakdown and conditional verbose logging strike a good balance between information and noise.
Also applies to: 81-91, 161-195
101-128: Resilient scanning architecture is well-implemented.
The separation of successful path resolutions from failures (lines 101-106) and the per-datasource scanning (lines 108-128) effectively achieve the PR objective: a single failing mag or attachment no longer prevents storing other paths for the datasource.
130-146: Mag path resolution is correct and handles remote/local cases appropriately.
The method properly distinguishes between remote paths (returned as-is) and local paths (resolved with `toRealPath()` to handle symlinks). The `hasLocalData` flag correctly identifies whether the resolved path is within the dataset directory.
148-159: Attachment path handling is sound with defensive absolute path check.
The explicit check on Line 153 that `attachment.path.isAbsolute` addresses the previous review concern about ensuring attachment paths are absolute before calling `toRealPath()`. This defensive check prevents incorrect resolution against the current working directory if a relative path were to slip through the upstream resolution logic.
Based on learnings
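The absolute-path guard discussed in the last comment can be sketched as follows: a relative `java.nio.file.Path` is resolved against the JVM's working directory by `toRealPath()`, so the helper rejects it before resolving anything. This works on `java.nio.file.Path` and `Try` only; the actual code operates on the datastore's own path type and `Fox`, and `resolveAttachment` is a hypothetical name.

```scala
import java.nio.file.Path
import scala.util.{Failure, Try}

object AttachmentRealPath {
  // Hypothetical helper illustrating the guard: only absolute paths are resolved.
  def resolveAttachment(attachmentPath: Path): Try[Path] =
    if (!attachmentPath.isAbsolute)
      // A relative path would silently resolve against the working directory, so fail explicitly.
      Failure(new IllegalArgumentException(s"Attachment path is not absolute: $attachmentPath"))
    else
      // Follows symlinks; a missing file surfaces as a Failure instead of an exception.
      Try(attachmentPath.toRealPath())
}
```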
URL of deployed dev instance (used for testing):
Steps to test:
TODOs:
Issues:
`$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)