@@ -0,0 +1,5 @@
### Files attached to a Dataset can now be limited by count

Added the ability to set a limit on the number of files that can be uploaded to a Dataset. The limit can be set globally through a JVM setting, or per Collection or Dataset.

See also [the guides](https://dataverse-guide--11359.org.readthedocs.build/en/11359/api/native-api.html#imposing-a-limit-to-the-number-of-files-allowed-to-be-uploaded-to-a-dataset), #11275, and #11359.
54 changes: 53 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -2474,7 +2474,7 @@ When adding a file to a dataset, you can optionally specify the following:
- Whether or not the file is restricted.
- Whether or not the file skips :doc:`tabular ingest </user/tabulardataingest/index>`. If the ``tabIngest`` parameter is not specified, it defaults to ``true``.

Note that when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
Note that when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide. See also :ref:`set-dataset-file-limit-api` for limits on the number of files allowed per Dataset.

In the curl example below, all of the above are specified but they are optional.

@@ -2699,6 +2699,58 @@ In some circumstances, it may be useful to move or copy files into Dataverse's s
Two API calls are available for this use case to add files to a dataset or to replace files that were already in the dataset.
These calls were developed as part of Dataverse's direct upload mechanism and are detailed in :doc:`/developers/s3-direct-upload-api`.

Imposing a limit to the number of files allowed to be uploaded to a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Having thousands of files in a Dataset can cause performance and usability issues; most users would be better served by repackaging the data into fewer, larger bundles. To help prevent these issues, a limit can be set on the number of files that may be uploaded to a Dataset.

The limit can be set installation-wide via the JVM setting :ref:`dataverse.files.default-dataset-file-count-limit`, or per Collection or Dataset.

For an installation-wide limit, set the JVM option:

.. code-block:: bash

  ./asadmin $ASADMIN_OPTS create-jvm-options "-Ddataverse.files.default-dataset-file-count-limit=<limit>"

For Collections, the limit can be set by calling the Create or Update Dataverse Collection API and adding ``datasetFileCountLimit`` (e.g. ``"datasetFileCountLimit": 500``) to the JSON body, as sketched below.
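
As a sketch, assuming the Update Dataverse Collection endpoint (``PUT /api/dataverses/$ID``), an existing Collection alias such as ``root``, and a ``collection.json`` payload that already contains the Collection's other required fields:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=root

  # collection.json is the usual Collection definition, with one added attribute:
  #   "datasetFileCountLimit": 500
  curl -H "X-Dataverse-key:$API_TOKEN" -X PUT "$SERVER_URL/api/dataverses/$ID" --upload-file collection.json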

For Datasets, the limit can be set using the `Update Dataset Files Limit <#setting-the-files-count-limit-on-a-dataset>`_ API described below, passing the limit (e.g. ``500``) in the URL.

Setting a value of -1 will clear the limit at that level. If no limit is set on the Dataset, the hierarchy of parent Collections is checked, and finally the JVM setting.

Once the limit has been reached, further attempts to upload files will return a 400 error response stating that the limit has been reached and including the effective limit.
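
For example (a sketch, reusing the environment variables from the examples below; the exact message text may vary), adding a file once the limit has been reached:

.. code-block:: bash

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F "file=@data.csv" "$SERVER_URL/api/datasets/$ID/add"
  # returns HTTP 400 with a standard Dataverse error body whose message
  # reports that the file count limit (e.g. 500) has been reached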

Please note that superusers are exempt from this limit.

The check uses the value defined on the Dataset first; if not set (value < 1), the parent Collection(s) are checked, and finally the JVM setting.

.. _set-dataset-file-limit-api:

Setting the files count limit on a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To update the number of files allowed for a Dataset without causing a draft version of the Dataset to be created, the following API can be used:

.. note:: To clear the limit, simply set it to -1 or call the DELETE API shown below.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24
  export LIMIT=500

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/datasets/$ID/files/uploadlimit/$LIMIT"

To delete the existing limit:

.. code-block:: bash

  curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/datasets/$ID/files/uploadlimit"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/24/files/uploadlimit/500"

To delete the existing limit:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/datasets/24/files/uploadlimit"

Report the data (file) size of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

14 changes: 14 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2561,6 +2561,20 @@ Notes:
- During startup, this directory will be checked for existence and write access. It will be created for you
if missing. If it cannot be created or does not have proper write access, application deployment will fail.

.. _dataverse.files.default-dataset-file-count-limit:

dataverse.files.default-dataset-file-count-limit
++++++++++++++++++++++++++++++++++++++++++++++++

Configures the maximum number of DataFiles that can be uploaded to a Dataset; an example is shown after the notes below.

Notes:

- This is a default that can be overridden on any Collection or Dataset.
- A value less than 1 will be treated as no limit set.
- Changing this value will not delete any existing files; it only prevents new files from being uploaded.
- Superusers are exempt from this limit.
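
For example, to set an installation-wide default limit of 500 files per Dataset:

.. code-block:: bash

  ./asadmin create-jvm-options "-Ddataverse.files.default-dataset-file-count-limit=500"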

.. _dataverse.files.uploads:

dataverse.files.uploads
82 changes: 82 additions & 0 deletions scripts/search/tests/data/dataset-finch1-fileLimit.json
@@ -0,0 +1,82 @@
{
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Darwin's Finches",
            "typeClass": "primitive",
            "multiple": false,
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Finch, Fiona",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "Birds Inc.",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactEmail",
                  "value": "finch@mailinator.com"
                },
                "datasetContactName": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactName",
                  "value": "Finch, Fiona"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  },
  "datasetFileCountLimit": 100
}
4 changes: 3 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/Dataset.java
@@ -72,7 +72,9 @@
@NamedQuery(name = "Dataset.findByReleaseUserId",
query = "SELECT o FROM Dataset o WHERE o.releaseUser.id=:releaseUserId"),
@NamedQuery(name = "Dataset.countAll",
query = "SELECT COUNT(ds) FROM Dataset ds")
query = "SELECT COUNT(ds) FROM Dataset ds"),
@NamedQuery(name = "Dataset.countFilesByOwnerId",
query = "SELECT COUNT(dvo) FROM DvObject dvo WHERE dvo.owner.id=:ownerId AND dvo.dtype='DataFile'")
})
@NamedNativeQuery(
name = "Dataset.findAllOrSubsetOrderByFilesOwned",
11 changes: 10 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -4082,7 +4082,16 @@ public String save() {
// have been created in the dataset.
dataset = datasetService.find(dataset.getId());

List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(dataset.getOrCreateEditVersion(), newFiles, null, true);
boolean ignoreUploadFileLimits = this.session.getUser() != null && this.session.getUser().isSuperuser();
List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(dataset.getOrCreateEditVersion(), newFiles, null, true, ignoreUploadFileLimits);
if (filesAdded.size() < nNewFiles) {
// Not all files were saved
Integer limit = dataset.getEffectiveDatasetFileCountLimit();
if (limit != null) {
String msg = BundleUtil.getStringFromBundle("file.add.count_exceeds_limit", List.of(limit.toString()));
JsfHelper.addInfoMessage(msg);
}
}
newFiles.clear();

// and another update command:
10 changes: 10 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java
@@ -1077,4 +1077,14 @@ public long getDatasetCount() {
return em.createNamedQuery("Dataset.countAll", Long.class).getSingleResult();
}

/**
*
* @param id - owner id
* @return Total number of datafiles for this dataset/owner
*/
public int getDataFileCountByOwner(long id) {
Long c = em.createNamedQuery("Dataset.countFilesByOwnerId", Long.class).setParameter("ownerId", id).getSingleResult();
return c.intValue(); // ignoring the truncation since the number should never be too large
}

}
1 change: 0 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DvObject.java
@@ -506,7 +506,6 @@ public StorageQuota getStorageQuota() {
public void setStorageQuota(StorageQuota storageQuota) {
this.storageQuota = storageQuota;
}

/**
*
* @param other
27 changes: 23 additions & 4 deletions src/main/java/edu/harvard/iq/dataverse/DvObjectContainer.java
@@ -9,11 +9,9 @@
import edu.harvard.iq.dataverse.util.json.JsonUtil;
import jakarta.json.JsonObject;
import jakarta.json.JsonObjectBuilder;
import jakarta.persistence.CascadeType;
import jakarta.persistence.*;

import java.util.Optional;
import jakarta.persistence.MappedSuperclass;
import jakarta.persistence.OneToOne;
import jakarta.persistence.Transient;

import org.apache.commons.lang3.StringUtils;

@@ -56,6 +54,9 @@ public boolean isEffectivelyPermissionRoot() {

@OneToOne(mappedBy = "dvObjectContainer",cascade={ CascadeType.REMOVE, CascadeType.PERSIST}, orphanRemoval=true)
private StorageUse storageUse;

// Maximum number of files allowed per Dataset; null means no limit is set at this level
@Column( nullable = true )
private Integer datasetFileCountLimit;

public String getEffectiveStorageDriverId() {
String id = storageDriver;
@@ -260,5 +261,23 @@ public PidProvider getEffectivePidGenerator() {
}
return pidGenerator;
}
public Integer getDatasetFileCountLimit() {
return datasetFileCountLimit;
}

public void setDatasetFileCountLimit(Integer datasetFileCountLimit) {
// Negative values are normalized to null, i.e. "no limit set at this level"
this.datasetFileCountLimit = datasetFileCountLimit != null && datasetFileCountLimit < 0 ? null : datasetFileCountLimit;
}

public Integer getEffectiveDatasetFileCountLimit() {
// Resolution order: this object, then parent Collection(s), then the JVM-wide default
if (!isDatasetFileCountLimitSet(getDatasetFileCountLimit()) && getOwner() != null) {
return getOwner().getEffectiveDatasetFileCountLimit();
} else if (!isDatasetFileCountLimitSet(getDatasetFileCountLimit())) {
Optional<Integer> opt = JvmSettings.DEFAULT_DATASET_FILE_COUNT_LIMIT.lookupOptional(Integer.class);
return (opt.isPresent()) ? opt.get() : null;
}
return getDatasetFileCountLimit();
}

public boolean isDatasetFileCountLimitSet(Integer datasetFileCountLimit) {
return datasetFileCountLimit != null && datasetFileCountLimit >= 0;
}
}
38 changes: 32 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java
@@ -201,7 +201,10 @@ public enum Referrer {
private Long maxIngestSizeInBytes = null;
// CSV: 4.8 MB, DTA: 976.6 KB, XLSX: 5.7 MB, etc.
private String humanPerFormatTabularLimits = null;
private Integer multipleUploadFilesLimit = null;
private Integer multipleUploadFilesLimit = null;
// Maximum number of files allowed to be uploaded per dataset
private Integer maxFileUploadCount = null;
private Integer fileUploadsAvailable = null;

//MutableBoolean so it can be passed from DatasetPage, supporting DatasetPage.cancelCreate()
private MutableBoolean uploadInProgress = null;
@@ -393,6 +396,10 @@ public String populateHumanPerFormatTabularLimits() {
return String.join(", ", formatLimits);
}

public Integer getFileUploadsAvailable() {
// -1 signals "no limit" to the UI
return fileUploadsAvailable != null ? fileUploadsAvailable : -1;
}

/*
The number of files the GUI user is allowed to upload in one batch,
via drag-and-drop, or through the file select dialog. Now configurable
@@ -543,17 +550,28 @@ public String initCreateMode(String modeToken, DatasetVersion version, MutableBo
this.maxIngestSizeInBytes = systemConfig.getTabularIngestSizeLimit();
this.humanPerFormatTabularLimits = populateHumanPerFormatTabularLimits();
this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit();

setFileUploadCountLimits(0);
logger.fine("done");

saveEnabled = true;

return null;
}
private void setFileUploadCountLimits(int preLoaded) {
// Effective limit resolved from Dataset -> Collection -> JVM default
this.maxFileUploadCount = this.maxFileUploadCount == null ? dataset.getEffectiveDatasetFileCountLimit() : this.maxFileUploadCount;
// New datasets have no id yet, so fall back to the owning Collection's id
Long id = dataset.getId() != null ? dataset.getId() : dataset.getOwner() != null ? dataset.getOwner().getId() : null;
this.fileUploadsAvailable = this.maxFileUploadCount != null && id != null ?
Math.max(0, this.maxFileUploadCount - datasetService.getDataFileCountByOwner(id) - preLoaded) :
-1;
}

public boolean isQuotaExceeded() {
return systemConfig.isStorageQuotasEnforced() && uploadSessionQuota != null && uploadSessionQuota.getRemainingQuotaInBytes() == 0;
}
public boolean isFileUploadCountExceeded() {
// Superusers and file-replace operations are not subject to the limit
boolean ignoreLimit = this.session.getUser().isSuperuser();
return !ignoreLimit && !isFileReplaceOperation() && fileUploadsAvailable != null && fileUploadsAvailable == 0;
}

public String init() {
// default mode should be EDIT
@@ -604,8 +622,8 @@ public String init() {
}
this.maxIngestSizeInBytes = systemConfig.getTabularIngestSizeLimit();
this.humanPerFormatTabularLimits = populateHumanPerFormatTabularLimits();
this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit();

this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit();
setFileUploadCountLimits(0);
hasValidTermsOfAccess = isHasValidTermsOfAccess();
if (!hasValidTermsOfAccess) {
PrimeFaces.current().executeScript("PF('blockDatasetForm').show()");
@@ -1103,9 +1121,17 @@ public String save() {
}
}
}

boolean ignoreUploadFileLimits = this.session.getUser() != null && this.session.getUser().isSuperuser();
// Try to save the NEW files permanently:
List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(workingVersion, newFiles, null, true);
List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(workingVersion, newFiles, null, true, ignoreUploadFileLimits);
if (filesAdded.size() < nNewFiles) {
// Not all files were saved
Integer limit = dataset.getEffectiveDatasetFileCountLimit();
if (limit != null) {
String msg = BundleUtil.getStringFromBundle("file.add.count_exceeds_limit", List.of(limit.toString()));
JsfHelper.addInfoMessage(msg);
}
}

// reset the working list of fileMetadatas, as to only include the ones
// that have been added to the version successfully: