Commit 47ead36

Merge remote-tracking branch 'IQSS/develop' into AWSv2

2 parents fb9a881 + 2267ef0

35 files changed: +983 −126 lines
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
### Files attached to a Dataset can now be limited by count

Added the ability to set a limit on the number of files that can be uploaded to a Dataset. Limits can be set globally through a JVM setting, or per Collection or Dataset.

See also [the guides](https://dataverse-guide--11359.org.readthedocs.build/en/11359/api/native-api.html#imposing-a-limit-to-the-number-of-files-allowed-to-be-uploaded-to-a-dataset), #11275, and #11359.

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 56 additions & 1 deletion
@@ -1583,6 +1583,9 @@ The fully expanded example above (without environment variables) looks like this

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0?excludeFiles=false"

Note: the data object will contain the fields ``datasetId``, ``datasetPersistentId`` (since `4.18 <https://github.com/IQSS/dataverse/issues/6397>`_), and ``datasetType`` (since `6.7 <https://github.com/IQSS/dataverse/issues/11573>`_).
All of these fields are constant across all versions of a dataset.

The optional ``excludeFiles`` parameter specifies whether the files should be listed in the output (defaults to ``true``). Note that a separate ``/files`` API can be used for listing the files, or a subset thereof, in a given version.

.. code-block:: bash
@@ -2474,7 +2477,7 @@ When adding a file to a dataset, you can optionally specify the following:

- Whether or not the file is restricted.
- Whether or not the file skips :doc:`tabular ingest </user/tabulardataingest/index>`. If the ``tabIngest`` parameter is not specified, it defaults to ``true``.

-Note that when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
+Note that when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide. Also, see :ref:`set-dataset-file-limit-api` for limitations on the number of files allowed per Dataset.

In the curl example below, all of the above are specified but they are optional.
@@ -2699,6 +2702,58 @@ In some circumstances, it may be useful to move or copy files into Dataverse's s

Two API calls are available for this use case to add files to a dataset or to replace files that were already in the dataset.
These calls were developed as part of Dataverse's direct upload mechanism and are detailed in :doc:`/developers/s3-direct-upload-api`.

Imposing a limit to the number of files allowed to be uploaded to a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Having thousands of files in a Dataset can cause issues. Most users would be better off with the data repackaged in fewer, larger bundles. To help curtail these issues, a limit can be set to prevent the number of file uploads from getting out of hand.

The limit can be set installation-wide via the JVM setting :ref:`dataverse.files.default-dataset-file-count-limit`, or set on each Collection or Dataset.

For an installation-wide limit, set the JVM option:

.. code-block:: bash

  ./asadmin $ASADMIN_OPTS create-jvm-options "-Ddataverse.files.default-dataset-file-count-limit=<limit>"

For Collections, the attribute can be controlled by calling the Create or Update Dataverse API and adding ``datasetFileCountLimit=500`` to the JSON body.

For Datasets, the attribute can be set using the `Update Dataset Files Limit <#setting-the-files-count-limit-on-a-dataset>`_ API and passing the query parameter ``fileCountLimit=500``.

Setting a value of -1 will clear the limit for that level. The check uses the value defined on the Dataset first; if no limit is set there (value < 1), the hierarchy of parent Collections is checked, and finally the JVM setting.

Once the effective limit is reached, further uploads are rejected with a 400 error response stating that the limit has been reached and including the effective limit.

Please note that superusers are exempt from this rule.

.. _set-dataset-file-limit-api:

Setting the files count limit on a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To update the number of files allowed for a Dataset without causing a draft version of the Dataset to be created, the following API can be used.

.. note:: To clear the limit, simply set it to -1 or call the DELETE API.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=24
  export LIMIT=500

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/datasets/$ID/files/uploadlimit/$LIMIT"

To delete the existing limit:

.. code-block:: bash

  curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/datasets/$ID/files/uploadlimit"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/24/files/uploadlimit/500"

To delete the existing limit:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/datasets/24/files/uploadlimit"
Report the data (file) size of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/sphinx-guides/source/installation/config.rst

Lines changed: 14 additions & 0 deletions
@@ -2560,6 +2560,20 @@ Notes:

- During startup, this directory will be checked for existence and write access. It will be created for you if missing. If it cannot be created or does not have proper write access, application deployment will fail.

.. _dataverse.files.default-dataset-file-count-limit:

dataverse.files.default-dataset-file-count-limit
++++++++++++++++++++++++++++++++++++++++++++++++

Configures a limit on the maximum number of DataFiles that can be uploaded to a Dataset.

Notes:

- This is a default that can be overridden in any Dataverse Collection or Dataset.
- A value less than 1 is treated as no limit set.
- Changing this value will not delete any existing files. It only prevents new files from being uploaded.
- Superusers are not governed by this rule.

.. _dataverse.files.uploads:

dataverse.files.uploads
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
{
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Darwin's Finches",
            "typeClass": "primitive",
            "multiple": false,
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Finch, Fiona",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "Birds Inc.",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactEmail",
                  "value": "finch@mailinator.com"
                },
                "datasetContactName": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactName",
                  "value": "Finch, Fiona"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  },
  "datasetFileCountLimit": 100
}

src/main/java/edu/harvard/iq/dataverse/Dataset.java

Lines changed: 3 additions & 1 deletion
@@ -72,7 +72,9 @@
     @NamedQuery(name = "Dataset.findByReleaseUserId",
                 query = "SELECT o FROM Dataset o WHERE o.releaseUser.id=:releaseUserId"),
     @NamedQuery(name = "Dataset.countAll",
-                query = "SELECT COUNT(ds) FROM Dataset ds")
+                query = "SELECT COUNT(ds) FROM Dataset ds"),
+    @NamedQuery(name = "Dataset.countFilesByOwnerId",
+                query = "SELECT COUNT(dvo) FROM DvObject dvo WHERE dvo.owner.id=:ownerId AND dvo.dtype='DataFile'")
 })
 @NamedNativeQuery(
     name = "Dataset.findAllOrSubsetOrderByFilesOwned",

src/main/java/edu/harvard/iq/dataverse/DatasetPage.java

Lines changed: 10 additions & 1 deletion
@@ -4082,7 +4082,16 @@ public String save() {
         // have been created in the dataset.
         dataset = datasetService.find(dataset.getId());

-        List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(dataset.getOrCreateEditVersion(), newFiles, null, true);
+        boolean ignoreUploadFileLimits = this.session.getUser() != null ? this.session.getUser().isSuperuser() : false;
+        List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(dataset.getOrCreateEditVersion(), newFiles, null, true, ignoreUploadFileLimits);
+        if (filesAdded.size() < nNewFiles) {
+            // Not all files were saved
+            Integer limit = dataset.getEffectiveDatasetFileCountLimit();
+            if (limit != null) {
+                String msg = BundleUtil.getStringFromBundle("file.add.count_exceeds_limit", List.of(limit.toString()));
+                JsfHelper.addInfoMessage(msg);
+            }
+        }
         newFiles.clear();

         // and another update command:

src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java

Lines changed: 10 additions & 0 deletions
@@ -1077,4 +1077,14 @@ public long getDatasetCount() {
         return em.createNamedQuery("Dataset.countAll", Long.class).getSingleResult();
     }

+    /**
+     * @param id - owner id
+     * @return Total number of DataFiles for this dataset/owner
+     */
+    public int getDataFileCountByOwner(long id) {
+        Long c = em.createNamedQuery("Dataset.countFilesByOwnerId", Long.class).setParameter("ownerId", id).getSingleResult();
+        return c.intValue(); // ignoring the truncation since the number should never be too large
+    }
 }

src/main/java/edu/harvard/iq/dataverse/DvObject.java

Lines changed: 0 additions & 1 deletion
@@ -506,7 +506,6 @@ public StorageQuota getStorageQuota() {
     public void setStorageQuota(StorageQuota storageQuota) {
         this.storageQuota = storageQuota;
     }
-
     /**
      *
      * @param other

src/main/java/edu/harvard/iq/dataverse/DvObjectContainer.java

Lines changed: 23 additions & 4 deletions
@@ -9,11 +9,9 @@
 import edu.harvard.iq.dataverse.util.json.JsonUtil;
 import jakarta.json.JsonObject;
 import jakarta.json.JsonObjectBuilder;
-import jakarta.persistence.CascadeType;
+import jakarta.persistence.*;

 import java.util.Optional;
-import jakarta.persistence.MappedSuperclass;
-import jakarta.persistence.OneToOne;
-import jakarta.persistence.Transient;

 import org.apache.commons.lang3.StringUtils;

@@ -56,6 +54,9 @@ public boolean isEffectivelyPermissionRoot() {

     @OneToOne(mappedBy = "dvObjectContainer", cascade = { CascadeType.REMOVE, CascadeType.PERSIST }, orphanRemoval = true)
     private StorageUse storageUse;
+
+    @Column(nullable = true)
+    private Integer datasetFileCountLimit;

     public String getEffectiveStorageDriverId() {
         String id = storageDriver;

@@ -260,5 +261,23 @@ public PidProvider getEffectivePidGenerator() {
         }
         return pidGenerator;
     }
+    public Integer getDatasetFileCountLimit() {
+        return datasetFileCountLimit;
+    }
+
+    public void setDatasetFileCountLimit(Integer datasetFileCountLimit) {
+        this.datasetFileCountLimit = datasetFileCountLimit != null && datasetFileCountLimit < 0 ? null : datasetFileCountLimit;
+    }
+
+    public Integer getEffectiveDatasetFileCountLimit() {
+        if (!isDatasetFileCountLimitSet(getDatasetFileCountLimit()) && getOwner() != null) {
+            return getOwner().getEffectiveDatasetFileCountLimit();
+        } else if (!isDatasetFileCountLimitSet(getDatasetFileCountLimit())) {
+            Optional<Integer> opt = JvmSettings.DEFAULT_DATASET_FILE_COUNT_LIMIT.lookupOptional(Integer.class);
+            return opt.isPresent() ? opt.get() : null;
+        }
+        return getDatasetFileCountLimit();
+    }
+
+    public boolean isDatasetFileCountLimitSet(Integer datasetFileCountLimit) {
+        return datasetFileCountLimit != null && datasetFileCountLimit >= 0;
+    }
 }

src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java

Lines changed: 32 additions & 6 deletions
@@ -201,7 +201,10 @@ public enum Referrer {
     private Long maxIngestSizeInBytes = null;
     // CSV: 4.8 MB, DTA: 976.6 KB, XLSX: 5.7 MB, etc.
     private String humanPerFormatTabularLimits = null;
     private Integer multipleUploadFilesLimit = null;
+    // Maximum number of files per dataset allowed to be uploaded
+    private Integer maxFileUploadCount = null;
+    private Integer fileUploadsAvailable = null;

     //MutableBoolean so it can be passed from DatasetPage, supporting DatasetPage.cancelCreate()
     private MutableBoolean uploadInProgress = null;

@@ -393,6 +396,10 @@ public String populateHumanPerFormatTabularLimits() {
         return String.join(", ", formatLimits);
     }

+    public Integer getFileUploadsAvailable() {
+        return fileUploadsAvailable != null ? fileUploadsAvailable : -1;
+    }
+
     /*
     The number of files the GUI user is allowed to upload in one batch,
     via drag-and-drop, or through the file select dialog. Now configurable

@@ -543,17 +550,28 @@ public String initCreateMode(String modeToken, DatasetVersion version, MutableBo
         this.maxIngestSizeInBytes = systemConfig.getTabularIngestSizeLimit();
         this.humanPerFormatTabularLimits = populateHumanPerFormatTabularLimits();
         this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit();
+        setFileUploadCountLimits(0);
         logger.fine("done");

         saveEnabled = true;

         return null;
     }
+
+    private void setFileUploadCountLimits(int preLoaded) {
+        this.maxFileUploadCount = this.maxFileUploadCount == null ? dataset.getEffectiveDatasetFileCountLimit() : this.maxFileUploadCount;
+        Long id = dataset.getId() != null ? dataset.getId() : dataset.getOwner() != null ? dataset.getOwner().getId() : null;
+        this.fileUploadsAvailable = this.maxFileUploadCount != null && id != null ?
+                Math.max(0, this.maxFileUploadCount - datasetService.getDataFileCountByOwner(id) - preLoaded) :
+                -1;
+    }

     public boolean isQuotaExceeded() {
         return systemConfig.isStorageQuotasEnforced() && uploadSessionQuota != null && uploadSessionQuota.getRemainingQuotaInBytes() == 0;
     }
+
+    public boolean isFileUploadCountExceeded() {
+        boolean ignoreLimit = this.session.getUser().isSuperuser();
+        return !ignoreLimit && !isFileReplaceOperation() && fileUploadsAvailable != null && fileUploadsAvailable == 0;
+    }

     public String init() {
         // default mode should be EDIT

@@ -604,8 +622,8 @@ public String init() {
         }
         this.maxIngestSizeInBytes = systemConfig.getTabularIngestSizeLimit();
         this.humanPerFormatTabularLimits = populateHumanPerFormatTabularLimits();
         this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit();
+        setFileUploadCountLimits(0);
         hasValidTermsOfAccess = isHasValidTermsOfAccess();
         if (!hasValidTermsOfAccess) {
             PrimeFaces.current().executeScript("PF('blockDatasetForm').show()");

@@ -1103,9 +1121,17 @@ public String save() {
             }
         }
     }
+    boolean ignoreUploadFileLimits = this.session.getUser() != null ? this.session.getUser().isSuperuser() : false;
     // Try to save the NEW files permanently:
-    List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(workingVersion, newFiles, null, true);
+    List<DataFile> filesAdded = ingestService.saveAndAddFilesToDataset(workingVersion, newFiles, null, true, ignoreUploadFileLimits);
+    if (filesAdded.size() < nNewFiles) {
+        // Not all files were saved
+        Integer limit = dataset.getEffectiveDatasetFileCountLimit();
+        if (limit != null) {
+            String msg = BundleUtil.getStringFromBundle("file.add.count_exceeds_limit", List.of(limit.toString()));
+            JsfHelper.addInfoMessage(msg);
+        }
+    }

     // reset the working list of fileMetadatas, as to only include the ones
     // that have been added to the version successfully: