@fm3 fm3 commented Oct 28, 2025

URL of deployed dev instance (used for testing):

Steps to test:

  • run yarn enable-storage-scan, then wait for the next scan tick (every 10 minutes, starting 1 minute after startup; configurable in application.conf)
  • after scan, see used storage in dashboard

TODOs:

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases

@fm3 fm3 self-assigned this Oct 28, 2025
@fm3 fm3 added the backend label Oct 28, 2025

coderabbitai bot commented Oct 28, 2025

Walkthrough

Adds per-dataset storage accounting. The backend computes usedStorageBytes and returns it only when the requester's organization matches the dataset's organization; the controllers and service pass the requester's organization to the DAO. The frontend shows a sortable "Used Storage" column for authorized users. A new dbtool subcommand, exposed as an npm script, enables storage scans and resets the last scan.

Changes

Cohort: Controllers
  Files: app/controllers/DatasetController.scala, app/controllers/WKRemoteDataStoreController.scala
  Summary: Pass requesting user's organization into compact dataset query; minor stylistic change using case-class-style construction for UnfinishedUpload.

Cohort: Data model / DAO / Service
  Files: app/models/dataset/Dataset.scala, app/models/dataset/DatasetService.scala
  Summary: Add DatasetCompactInfo.usedStorageBytes: Option[Long]; extend findAllCompactWithSearch(..., requestingUserOrga: Option[String]); add LEFT JOINs/aggregation to compute per-dataset storage and only expose it when requester org matches; service forwards org filter for unfinished uploads.

Cohort: Frontend, dataset table UI
  Files: frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx
  Summary: Add renderStorageColumn() in DatasetRenderer/FolderRenderer; add getIsUserAdminOrDatasetManager() flag; conditionally insert a sortable "Used Storage" column (index 2) displaying usedStorageBytes for authorized users.

Cohort: Frontend, types & context
  Files: frontend/javascripts/types/api_types.ts, frontend/javascripts/dashboard/dataset/dataset_collection_context.tsx
  Summary: Add optional `usedStorageBytes?: number

Cohort: CLI / tooling & scripts
  Files: tools/postgres/dbtool.js, package.json
  Summary: Add enable-storage-scan CLI subcommand that runs SQL to enable storage reporting and reset last scan; expose an npm script enable-storage-scan to invoke it. (CLI insertion appears duplicated in patch.)

Cohort: Changelog
  Files: unreleased_changes/9025.md
  Summary: Add note announcing dashboard dataset used-storage display.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

  • Focus review on:
    • app/models/dataset/Dataset.scala: correctness of new JOINs/aggregation, NULL handling, deduplication and the conditional exposure of usedStorageBytes.
    • Controller → API serialization to ensure storage bytes are not leaked when requester org differs.
    • Frontend gating and insertion point of the new column, and sort/fallback behavior when values are missing.
    • tools/postgres/dbtool.js: duplicated command definitions and SQL safety/idempotence.

Possibly related PRs

Suggested reviewers

  • fm3
  • MichaelBuessemeyer
  • philippotto

Poem

🐇 I hop through rows and SQL streams,
I count the bytes where dataset gleams,
Admins may peek the Size I show,
Scans awake and metrics grow,
A rabbit tallies while the queries hum.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check: Docstring Coverage
  Status: ⚠️ Warning
  Explanation: Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.
  Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check: Title Check
  Status: ✅ Passed
  Explanation: The PR title "Show used storage in dashboard dataset list" directly and accurately summarizes the main objective of the changeset. The modifications across backend and frontend files are primarily focused on adding storage usage information to the dataset table displayed in the dashboard, including new database queries, API fields, UI rendering logic, and a CLI command to enable storage scanning. The title is concise, specific, and clearly communicates the primary change without vague terminology or misleading claims.
Check: Description Check
  Status: ✅ Passed
  Explanation: The pull request description is clearly related to the changeset. It describes the feature being implemented (showing used storage in the dashboard dataset list), provides specific test steps (enabling the storage scan via yarn command and verifying the display in the dashboard), references the contributing issue (#9005), and documents completed checklist items including the changelog entry. The description directly corresponds to the changes across backend, frontend, database tooling, and API types files summarized in the raw summary.


@knollengewaechs knollengewaechs self-assigned this Oct 28, 2025
@fm3 fm3 changed the title from "WIP add usedStorageBytes to dataset compact writes" to "WIP show used storage in dataset list" Oct 29, 2025
@fm3 fm3 changed the title from "WIP show used storage in dataset list" to "Show used storage in dashboard dataset list" Oct 30, 2025
@fm3 fm3 marked this pull request as ready for review October 30, 2025 10:49

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tools/postgres/dbtool.js (1)

423-439: Make organization reset configurable.
Hard-coding _id = 'sample_organization' means this command no-ops on instances whose primary organization uses a different id (typical outside the seeded demo data), so administrators still need to wait for the next scheduler tick to kick off the scan. Please either target all organizations or accept the organization id (or name) as a parameter so the command behaves consistently across environments.
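
One way to read the suggestion above is to make the organization id an optional argument and bind it as a query parameter. The following is only a sketch under stated assumptions: `buildResetScanQuery`, the `ParameterizedQuery` shape, and the table and column names are hypothetical placeholders, not the actual schema or the SQL used by tools/postgres/dbtool.js. Only the `_id` column name comes from the comment above.

```typescript
// Hypothetical helper for illustration only. The table and column names are
// placeholders, not the real schema touched by tools/postgres/dbtool.js.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Without an organization id the statement targets every organization;
// with one, the id is bound as a parameter instead of being hard-coded.
function buildResetScanQuery(organizationId?: string): ParameterizedQuery {
  const base =
    "UPDATE organizations_placeholder SET last_storage_scan_placeholder = NULL";
  if (organizationId === undefined) {
    return { text: base, values: [] };
  }
  return { text: `${base} WHERE _id = $1`, values: [organizationId] };
}
```

Binding the id as a parameter rather than interpolating it into the SQL string also speaks to the SQL-safety concern listed in the review focus points.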

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e6cca4 and c80f5f3.

📒 Files selected for processing (9)
  • app/controllers/DatasetController.scala (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/Dataset.scala (6 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
  • frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (7 hunks)
  • frontend/javascripts/types/api_types.ts (2 hunks)
  • package.json (1 hunks)
  • tools/postgres/dbtool.js (1 hunks)
  • unreleased_changes/9025.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/controllers/DatasetController.scala
  • app/models/dataset/Dataset.scala
🧬 Code graph analysis (4)
app/controllers/WKRemoteDataStoreController.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
app/models/dataset/Dataset.scala (1)
  • dataSourceId (94-94)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (2)
frontend/javascripts/components/fast_tooltip.tsx (1)
  • FastTooltip (54-123)
frontend/javascripts/libs/format_utils.ts (1)
  • formatCountToDataAmountUnit (333-346)
app/models/dataset/DatasetService.scala (1)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (1)
  • now (48-48)
app/models/dataset/Dataset.scala (1)
app/utils/sql/SqlEscaping.scala (1)
  • parseArrayLiteral (28-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (5)
app/controllers/WKRemoteDataStoreController.scala (1)

113-119: Good stylistic improvement!

Using the apply method without new for case class instantiation is idiomatic Scala and improves readability.

package.json (1)

107-108: New script fits nicely.
Registering the CLI entry point here keeps local workflows aligned with the dbtool addition.

app/models/dataset/Dataset.scala (1)

92-348: Storage exposure logic looks solid.
The new usedStorageBytes field plus the conditional Option wrapping ensure per-dataset storage is available while still respecting organization boundaries and hiding non-positive values.
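
The exposure rule described above can be sketched as a small pure function. This is an illustration in TypeScript, not the Scala/SQL code in Dataset.scala; `DatasetStorageRow` and `exposeUsedStorageBytes` are hypothetical names, and the sketch assumes that "non-positive" means zero or below and that a missing scan result is represented as null.

```typescript
// Hypothetical types and names for illustration; not the actual backend code.
interface DatasetStorageRow {
  datasetOrganizationId: string;
  // Aggregated storage for the dataset; null if no scan has reported yet.
  summedStorageBytes: number | null;
}

// Returns the storage figure only if the requester belongs to the dataset's
// organization and the measured value is positive; otherwise returns null.
function exposeUsedStorageBytes(
  row: DatasetStorageRow,
  requestingUserOrga: string | null,
): number | null {
  if (requestingUserOrga === null) return null;
  if (requestingUserOrga !== row.datasetOrganizationId) return null;
  if (row.summedStorageBytes === null || row.summedStorageBytes <= 0) return null;
  return row.summedStorageBytes;
}
```

Keeping the check in one place means the serialization layer never has to decide whether a value may be shown; it only ever sees a number or null.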

unreleased_changes/9025.md (1)

1-2: Changelog entry captured.
Thanks for documenting the dashboard-facing change.

frontend/javascripts/types/api_types.ts (1)

269-300: API surface stays in sync.
Pulling usedStorageBytes into the compact types and converter keeps the frontend aligned with the backend payload.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (1)

425-425: This issue was already flagged in a previous review.

The cached isUserAdminOrDatasetManager field becomes stale when props update, affecting both dataset filtering (line 478) and the visibility of the storage column (line 625). See the previous review comment for the recommended fix using a getter.

Also applies to: 478-478

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c80f5f3 and 09c4b01.

📒 Files selected for processing (2)
  • frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (9 hunks)
  • frontend/javascripts/dashboard/dataset/dataset_collection_context.tsx (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
frontend/javascripts/dashboard/dataset/dataset_collection_context.tsx (1)
frontend/javascripts/libs/react_hooks.ts (1)
  • useWkSelector (292-294)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (2)
frontend/javascripts/components/fast_tooltip.tsx (1)
  • FastTooltip (54-123)
frontend/javascripts/types/globals.d.ts (1)
  • EmptyObject (31-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (5)
frontend/javascripts/dashboard/dataset/dataset_collection_context.tsx (1)

3-3: LGTM!

The storage usage data is correctly retrieved from Redux state and exposed through context. The optional chaining handles the case where activeOrganization might be undefined, and the value is properly included in the useMemo dependency array.

Also applies to: 58-58, 88-88, 259-259, 285-285

frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (4)

11-11: LGTM!

The imports for FastTooltip and formatCountToDataAmountUnit are correctly added to support the storage column rendering.

Also applies to: 26-26


399-401: LGTM!

Returning null for folder rows is appropriate since storage metrics only apply to datasets.


625-641: Storage column implementation is sound, but depends on stale cached value.

The conditional column logic is well-structured with appropriate sorting and rendering. However, it relies on this.isUserAdminOrDatasetManager which is stale (see earlier comment at line 425). Once that issue is fixed, this column will display correctly for all authorized users.


655-655: Good fix!

Correctly changed from an empty object {} to an empty function () => {} to match the expected function type for the editFolder prop.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (1)

425-425: Compute admin/dataset-manager flag from props each time.

isUserAdminOrDatasetManager is captured once during construction, so when isUserAdmin/isUserDatasetManager later update (e.g., after the current user is fetched), filtering and the Size column keep using the stale value and newly privileged users still can't see storage until the component remounts. Derive the flag from this.props on every access instead of caching it.

Apply this diff:

-  isUserAdminOrDatasetManager: boolean = this.props.isUserAdmin || this.props.isUserDatasetManager;
+  get isUserAdminOrDatasetManager(): boolean {
+    return this.props.isUserAdmin || this.props.isUserDatasetManager;
+  }
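
The difference between the cached field and the getter can be reproduced outside React. The sketch below uses two hypothetical plain classes (`CachedFlag`, `DerivedFlag`) in place of the real component, and replaces `props` by assignment to stand in for a props update.

```typescript
interface Props {
  isUserAdmin: boolean;
  isUserDatasetManager: boolean;
}

// Captures the flag once, at construction time.
class CachedFlag {
  props: Props;
  isUserAdminOrDatasetManager: boolean;
  constructor(props: Props) {
    this.props = props;
    this.isUserAdminOrDatasetManager = props.isUserAdmin || props.isUserDatasetManager;
  }
}

// Recomputes the flag from the current props on every access.
class DerivedFlag {
  props: Props;
  constructor(props: Props) {
    this.props = props;
  }
  get isUserAdminOrDatasetManager(): boolean {
    return this.props.isUserAdmin || this.props.isUserDatasetManager;
  }
}

const initialProps: Props = { isUserAdmin: false, isUserDatasetManager: false };
const cached = new CachedFlag(initialProps);
const derived = new DerivedFlag(initialProps);

// Simulate the current user being fetched after the first render.
const updatedProps: Props = { isUserAdmin: true, isUserDatasetManager: false };
cached.props = updatedProps;
derived.props = updatedProps;
```

After the update, the cached instance still reports the value from construction time, while the getter reflects the new props.
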
🧹 Nitpick comments (1)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (1)

625-641: Storage column implementation looks good.

The conditional column insertion and rendering logic is well-structured. The column is properly positioned and includes appropriate sorting.

Consider treating null storage values distinctly in sorting rather than defaulting to 0. Currently, datasets without storage data sort as if they have 0 bytes, which could mix them with datasets that genuinely have 0 bytes measured. You could sort null values to the end:

 sorter: Utils.compareBy<RowRenderer>((rowRenderer) =>
-  isRecordADataset(rowRenderer.data) && rowRenderer.data.usedStorageBytes
-    ? rowRenderer.data.usedStorageBytes
-    : 0,
+  isRecordADataset(rowRenderer.data) && rowRenderer.data.usedStorageBytes != null
+    ? rowRenderer.data.usedStorageBytes
+    : Number.MAX_SAFE_INTEGER,
 ),

This would push unmeasured datasets to the bottom when sorting ascending.
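
A standalone version of this idea, as a minimal sketch: `storageSortKey` and `sortByStorageAscending` are hypothetical helpers, not part of `Utils.compareBy` or the table component. Missing values map to `Number.MAX_SAFE_INTEGER`, so they sort after every measured value in ascending order.

```typescript
// Hypothetical helpers for illustration; not the actual table sorter.
function storageSortKey(usedStorageBytes: number | null | undefined): number {
  // `!= null` is false for both null and undefined, but true for 0.
  return usedStorageBytes != null ? usedStorageBytes : Number.MAX_SAFE_INTEGER;
}

function sortByStorageAscending<T extends { usedStorageBytes?: number | null }>(
  rows: T[],
): T[] {
  // Copy first so the caller's array is left untouched.
  return rows
    .slice()
    .sort((a, b) => storageSortKey(a.usedStorageBytes) - storageSortKey(b.usedStorageBytes));
}
```

Note that the sentinel only places unmeasured rows last for ascending order; when the sort direction is reversed, they move to the top.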

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 09c4b01 and 642a404.

📒 Files selected for processing (1)
  • frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (2)
frontend/javascripts/components/fast_tooltip.tsx (1)
  • FastTooltip (54-123)
frontend/javascripts/libs/format_utils.ts (1)
  • formatCountToDataAmountUnit (333-346)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (1)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (1)

291-297: LGTM! Syntax error has been resolved.

The storage column rendering logic is correct. The conditional check for usedStorageBytes is appropriate, and the tooltip provides helpful context about measurement limitations.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (1)

641-645: Consider explicit null check for clarity.

The sorter condition relies on the falsy nature of 0 to work correctly. While functionally correct, using an explicit != null check would make the intent clearer.

Apply this diff:

 sorter: Utils.compareBy<RowRenderer>((rowRenderer) =>
-  isRecordADataset(rowRenderer.data) && rowRenderer.data.usedStorageBytes
+  isRecordADataset(rowRenderer.data) && rowRenderer.data.usedStorageBytes != null
     ? rowRenderer.data.usedStorageBytes
     : 0,
 ),
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 66e3fa5 and cb8d681.

📒 Files selected for processing (1)
  • frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-27T11:34:16.411Z
Learnt from: knollengewaechs
PR: scalableminds/webknossos#8796
File: frontend/javascripts/viewer/view/action-bar/ai_job_modals/forms/neuron_segmentation_form.tsx:68-77
Timestamp: 2025-08-27T11:34:16.411Z
Learning: The webknossos codebase uses `getDefaultLayerViewConfiguration` to ensure that layer configurations always exist in `datasetConfiguration.layers`, making direct access like `datasetConfiguration.layers[colorLayer.name]` safe across multiple files including api_latest.ts and load_histogram_data_saga.ts.

Applied to files:

  • frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx
🧬 Code graph analysis (1)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (2)
frontend/javascripts/components/fast_tooltip.tsx (1)
  • FastTooltip (54-123)
frontend/javascripts/libs/format_utils.ts (1)
  • formatCountToDataAmountUnit (333-346)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (5)
frontend/javascripts/dashboard/advanced_dataset/dataset_table.tsx (5)

11-11: LGTM: Imports are correct.

The new imports for FastTooltip and formatCountToDataAmountUnit are properly used in the storage column rendering logic.

Also applies to: 26-26


291-297: LGTM: Storage column rendering is well-implemented.

The method properly handles null values, provides helpful context via the tooltip, and formats the storage value appropriately for display.


399-401: LGTM: Folder stub method is appropriate.

Returning null for folders is correct since storage information only applies to datasets.


425-427: LGTM: Getter method correctly addresses previous concern.

This implementation computes the admin/dataset-manager flag from current props on each access, ensuring it stays in sync when props update. This properly addresses the stale-value issue raised in previous reviews.


629-649: Conditional column implementation is solid.

The storage column is appropriately gated behind admin/manager permissions and only shown when the organization has used storage. The column placement and configuration are correct.
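
The gating and placement described above can be sketched as follows. `ColumnSketch` and `withStorageColumn` are hypothetical names, the column objects are simplified stand-ins for the real table column definitions, and "has used storage" is assumed here to mean a value greater than zero. The insertion index of 2 is taken from the walkthrough summary.

```typescript
// Simplified stand-in for a table column definition; hypothetical.
interface ColumnSketch {
  title: string;
  key: string;
}

// Inserts the "Used Storage" column at index 2, but only for admins or
// dataset managers whose organization reports some used storage.
function withStorageColumn(
  baseColumns: ColumnSketch[],
  isUserAdminOrDatasetManager: boolean,
  organizationUsedStorageBytes: number | null | undefined,
): ColumnSketch[] {
  const showColumn =
    isUserAdminOrDatasetManager &&
    organizationUsedStorageBytes != null &&
    organizationUsedStorageBytes > 0;
  if (!showColumn) return baseColumns;
  const columns = baseColumns.slice();
  columns.splice(2, 0, { title: "Used Storage", key: "storage" });
  return columns;
}
```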
