Conversation

@stevenwinship
Contributor

What this PR does / why we need it: We need a way to prevent thousands of files from being uploaded to a single dataset, because of the issues that causes.

Which issue(s) this PR closes: #11275

Special notes for your reviewer:

Suggestions on how to test this: See IT tests

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: Included

Additional documentation: native-api and config

@stevenwinship stevenwinship self-assigned this May 23, 2025
@github-actions github-actions bot added the labels D: Dataset: large number of files (https://github.com/IQSS/dataverse-pm/issues/27), FY25 Sprint 23 (2025-05-07 - 2025-05-21), FY25 Sprint 24 (2025-05-21 - 2025-06-04), GREI 5 Use Cases, Size: 80 (a percentage of a sprint; 56 hours), Type: Feature (a feature request) May 23, 2025
@stevenwinship stevenwinship moved this to In Progress 💻 in IQSS Dataverse Project May 23, 2025
@stevenwinship stevenwinship moved this from In Progress 💻 to Ready for Review ⏩ in IQSS Dataverse Project May 27, 2025
@stevenwinship stevenwinship removed their assignment May 27, 2025
@landreev landreev self-assigned this May 30, 2025
@landreev landreev moved this from Ready for Review ⏩ to In Progress 💻 in IQSS Dataverse Project May 30, 2025
@landreev landreev moved this from In Progress 💻 to In Review 🔎 in IQSS Dataverse Project May 30, 2025
@landreev
Contributor

landreev commented Jun 3, 2025

Just a quick status update:
I did get to look into, build and play with the PR today. Will be adding comments/asking questions tomorrow.

@coveralls

coveralls commented Jun 3, 2025

Coverage: 23.141% (remained the same) when pulling 9cfa6be on 11275-add-limit-to-number-of-dataset-files into 2fb4a0d on develop.

@landreev
Contributor

landreev commented Jun 3, 2025

Very happy to see this implemented.
I may have found a small bug: if the limit is set to 20, and I have a dataset with 19 files, it appears that I can upload 2 more files. I cannot add any files to a dataset that's already at 20. (but please re-test/confirm)
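
For re-testing, the boundary described above can be pinned down with a small sketch. The class and method names are hypothetical and not taken from the PR, and the "naive" variant is only an assumption about how such an overshoot could arise, not a diagnosis of the actual code. The point is that the check has to compare the existing count plus the size of the incoming batch against the limit, rather than looking at the existing count alone.

```java
// Hypothetical sketch; names are illustrative and not taken from the Dataverse code base.
final class FileCountLimitCheck {

    // Looks only at the current count, so a whole batch is let through
    // whenever the dataset is still below the limit.
    static boolean allowsBatchNaive(int existingFiles, int incomingFiles, int limit) {
        return existingFiles < limit;
    }

    // Accounts for the batch size, so the resulting total never exceeds the limit.
    static boolean allowsBatch(int existingFiles, int incomingFiles, int limit) {
        return existingFiles + incomingFiles <= limit;
    }

    public static void main(String[] args) {
        int limit = 20;
        System.out.println(allowsBatchNaive(19, 2, limit)); // true: 21 files would end up in the dataset
        System.out.println(allowsBatch(19, 2, limit));      // false: batch rejected
        System.out.println(allowsBatch(19, 1, limit));      // true: lands exactly on the limit
        System.out.println(allowsBatch(20, 1, limit));      // false: dataset already full
    }
}
```

With a limit of 20, the naive variant reproduces the reported behavior (19 files plus 2 is accepted, while nothing can be added at 20); the batch-aware variant rejects the first case and still allows filling the dataset up to exactly 20.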

@Column(insertable = false, updatable = false) private String dtype;

@Column( nullable = true )
private Integer datasetFileCountLimit;
Contributor

It is super important to have this config setting implemented for both collections AND datasets. In real life, I'm guessing it's going to be a somewhat common case where a specific dataset will need to be given a higher limit, because of some respectable reason. It appears to be working consistently when defined on either level.

However, and this may admittedly be penny-pinching, I'm wondering whether we want this column in the DvObject table, seeing how most DvObjects are files. Please at least consider making it a DvObjectContainer-only element (see dvObjectContainer.storageDriver for an example; it ends up being an extra column in each of the Dataverse and Dataset tables).

Contributor Author

I'm confused about the penny-pinching comment. Columns in the database that are null take up no space and therefore add no pennies to all the DvObject Datafiles.

Member

It does have to get loaded into the objects though. Further - it's just odd that a file has a datasetFileCountLimit, which wouldn't be the case if it's on DvObjectContainer.

Contributor Author

Ok. I'll move it

Contributor

Yeah, phrasing it in terms of objects makes more sense. I just wanted to emphasize that it was possible, even though there is no dedicated table in the db for DvObjectContainer.

Contributor Author

I moved it
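
The arrangement settled on in this review thread can be sketched roughly as follows. This is a simplified plain-Java model, not the real code: the actual Dataverse classes are JPA entities with many more fields, the `owner` link and the `effectiveDatasetFileCountLimit()` method are illustrative assumptions, and the "nearest value wins" lookup order is assumed rather than confirmed by the thread. It shows only that the field lives on containers (collections and datasets), so files never carry it, and that a dataset-level value can override a collection-level one.

```java
// Hypothetical, simplified model; not the real Dataverse entity classes.
abstract class DvObject {
    DvObjectContainer owner; // parent container; null at the root (illustrative field)
}

// Only containers carry the limit, so a DataFile has no such field.
abstract class DvObjectContainer extends DvObject {
    Integer datasetFileCountLimit; // null means "not set at this level"

    // Assumed resolution order: the nearest non-null value wins, walking up the owners.
    Integer effectiveDatasetFileCountLimit() {
        for (DvObjectContainer c = this; c != null; c = c.owner) {
            if (c.datasetFileCountLimit != null) {
                return c.datasetFileCountLimit;
            }
        }
        return null; // no limit configured anywhere in the chain
    }
}

final class Dataverse extends DvObjectContainer { }
final class Dataset extends DvObjectContainer { }
final class DataFile extends DvObject { }

final class FileLimitModelDemo {
    public static void main(String[] args) {
        Dataverse collection = new Dataverse();
        collection.datasetFileCountLimit = 20;

        Dataset inherits = new Dataset();
        inherits.owner = collection;

        Dataset overrides = new Dataset();
        overrides.owner = collection;
        overrides.datasetFileCountLimit = 500;

        System.out.println(inherits.effectiveDatasetFileCountLimit());  // 20
        System.out.println(overrides.effectiveDatasetFileCountLimit()); // 500
    }
}
```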

@cmbz cmbz added the FY25 Sprint 26 (2025-06-18 - 2025-07-02) label Jun 19, 2025
@stevenwinship
Contributor Author

Taking a look at this. Thanks for finding this

@github-actions

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11275-add-limit-to-number-of-dataset-files
ghcr.io/gdcc/configbaker:11275-add-limit-to-number-of-dataset-files

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@stevenwinship stevenwinship removed their assignment Jun 20, 2025
@ofahimIQSS
Contributor

great fix, merging!

@ofahimIQSS ofahimIQSS merged commit 0f3bb64 into develop Jun 24, 2025
25 checks passed
@ofahimIQSS ofahimIQSS deleted the 11275-add-limit-to-number-of-dataset-files branch June 24, 2025 14:43
@github-project-automation github-project-automation bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project Jun 24, 2025
@ofahimIQSS ofahimIQSS removed their assignment Jun 24, 2025
@scolapasta scolapasta moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Jun 24, 2025
@qqmyers
Member

qqmyers commented Jul 1, 2025

Does this make sense:

if (authenticatedUser.isSuperuser() || permissionService.hasPermissionsFor(authenticatedUser, dataset,
        EnumSet.of(Permission.EditDataset))) {

That is, if you can edit the dataset, you can change the limit yourself?
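
To make the question concrete, here is a sketch with stand-in types. None of these are the real Dataverse classes or its permission API, and the superuser-only variant is just one possible stricter reading of the question; the thread does not record what was decided. The first method mirrors the quoted condition, the second shows the alternative.

```java
// Stand-in types for illustration only; not the real Dataverse permission API.
enum Permission { EditDataset, PublishDataset }

final class User {
    final boolean superuser;
    final java.util.Set<Permission> permissionsOnDataset;

    User(boolean superuser, java.util.Set<Permission> permissionsOnDataset) {
        this.superuser = superuser;
        this.permissionsOnDataset = permissionsOnDataset;
    }
}

final class LimitEditPolicy {
    // Mirrors the quoted condition: superusers OR anyone who can edit the dataset.
    static boolean canChangeLimitAsQuoted(User u) {
        return u.superuser || u.permissionsOnDataset.contains(Permission.EditDataset);
    }

    // Assumed stricter alternative: only superusers may change the limit.
    static boolean canChangeLimitSuperuserOnly(User u) {
        return u.superuser;
    }

    public static void main(String[] args) {
        User editor = new User(false, java.util.EnumSet.of(Permission.EditDataset));
        System.out.println(canChangeLimitAsQuoted(editor));      // true
        System.out.println(canChangeLimitSuperuserOnly(editor)); // false
    }
}
```

Under the quoted condition, the same user the limit is meant to constrain can raise it; under the stricter variant, raising it requires a superuser.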

@qqmyers qqmyers moved this from Done 🧹 to Ready for Triage in IQSS Dataverse Project Jul 2, 2025
@qqmyers qqmyers moved this from Ready for Triage to In Progress 💻 in IQSS Dataverse Project Jul 2, 2025
@cmbz cmbz added the Size: 3 (a percentage of a sprint; 2.1 hours) and FY26 Sprint 1 (2025-07-02 - 2025-07-16) labels and removed the Size: 80 (a percentage of a sprint; 56 hours) label Jul 2, 2025
@scolapasta scolapasta moved this from In Progress 💻 to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Jul 3, 2025
@scolapasta scolapasta moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Jul 3, 2025
@stevenwinship stevenwinship moved this from In Progress 💻 to Merged 🚀 in IQSS Dataverse Project Jul 7, 2025
@stevenwinship stevenwinship removed their assignment Jul 7, 2025
@scolapasta scolapasta moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Jul 8, 2025
@cmbz cmbz added the FY26 Sprint 4 (2025-08-13 - 2025-08-27) label Aug 16, 2025
Labels

D: Dataset: large number of files (https://github.com/IQSS/dataverse-pm/issues/27), FY25 Sprint 23 (2025-05-07 - 2025-05-21), FY25 Sprint 24 (2025-05-21 - 2025-06-04), FY25 Sprint 25 (2025-06-04 - 2025-06-18), FY25 Sprint 26 (2025-06-18 - 2025-07-02), FY26 Sprint 1 (2025-07-02 - 2025-07-16), FY26 Sprint 4 (2025-08-13 - 2025-08-27), GREI 5 Use Cases, Size: 3 (a percentage of a sprint; 2.1 hours), Type: Feature (a feature request)

Projects

Status: Done 🧹

Development

Successfully merging this pull request may close these issues.

Feature Request: (internal request) Add quota-like limit on the number of files in a dataset

7 participants