- A new section - Scaling Dataverse with Data Size - has been added to the Admin Guide. It is intended to help administrators configure Dataverse appropriately to handle larger amounts of data.
doc/sphinx-guides/source/developers/big-data-support.rst (1 addition, 1 deletion)

@@ -196,6 +196,6 @@ As described in that document, Globus transfers can be initiated by choosing the

An overview of the control and data transfer interactions between components was presented at the 2022 Dataverse Community Meeting and can be viewed in the `Integrations and Tools Session Video <https://youtu.be/3ek7F_Dxcjk?t=5289>`_ around the 1 hr 28 min mark.

-See also :ref:`Globus settings <:GlobusSettings>`.
+See also :ref:`Globus settings <:GlobusSettings>` and :ref:`globus-stores`.

An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`.
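As a minimal sketch of enabling this experimental framework with Payara's ``asadmin`` (both lines rest on assumptions: that the feature flag follows the usual ``dataverse.feature.<flag>`` convention and that the monitoring option takes a boolean; see :ref:`feature-flags` for the authoritative syntax)::

    # Assumed convention: feature flags are exposed as dataverse.feature.* JVM options
    ./asadmin create-jvm-options "-Ddataverse.feature.globus-use-experimental-async-framework=true"
    # Assumed boolean marking this instance as the one that monitors ongoing transfers
    ./asadmin create-jvm-options "-Ddataverse.files.globus-monitoring-server=true"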
doc/sphinx-guides/source/installation/config.rst (57 additions, 18 deletions)

@@ -1036,15 +1036,18 @@ File Storage

By default, a Dataverse installation stores all data files (files uploaded by end users) on the filesystem at ``/usr/local/payara6/glassfish/domains/domain1/files``. This path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or afterward by reconfiguring the ``dataverse.files.\<id\>.directory`` JVM option described below.

-A Dataverse installation can alternately store files in a Swift or S3-compatible object store, or on a Globus endpoint, and can now be configured to support multiple stores at once. With a multi-store configuration, the location for new files can be controlled on a per-Dataverse collection basis.
+A Dataverse installation can alternately store files in a Swift or S3-compatible object store, or on a Globus endpoint, and can now be configured to support multiple stores at once.

A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web or Globus accessible trusted remote store.

+With a multi-store configuration, the location for new files can be controlled on a per-Dataverse collection or per-dataset basis.
+
+:doc:`/admin/big-data-administration` provides more detail about the pros and cons of different types of storage.

A Dataverse installation can be configured to allow out of band upload by setting the ``dataverse.files.\<id\>.upload-out-of-band`` JVM option to ``true``.

By default, Dataverse supports uploading files via the :ref:`add-file-api`. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).

With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the :ref:`Adding the Uploaded file to the Dataset <direct-add-to-dataset-api>` API call (described in the :doc:`/developers/s3-direct-upload-api` page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.

-The following sections describe how to set up various types of stores and how to configure for multiple stores.
+The following sections describe how to set up various types of stores and how to configure for multiple stores. See also :ref:`choose-store`.
+
+.. _multiple-stores:

Multi-store Basics
++++++++++++++++++
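As an illustrative sketch of a multi-store setup (the store id ``largefiles``, its label, and the directory are hypothetical; ``type``, ``label``, and ``directory`` are the file-store options described below)::

    # Define a second file store alongside the default one
    ./asadmin create-jvm-options "-Ddataverse.files.largefiles.type=file"
    ./asadmin create-jvm-options "-Ddataverse.files.largefiles.label=LargeFiles"
    ./asadmin create-jvm-options "-Ddataverse.files.largefiles.directory=/data/largefiles"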
@@ -1105,6 +1108,8 @@ File stores have one option - the directory where files should be stored. This c

Multiple file stores should specify different directories (which would nominally be the reason to use multiple file stores), but one may share the same directory as the "\-Ddataverse.files.directory" option - this would result in temp files being stored in the /temp subdirectory within the file store's root directory.

+See also :ref:`file-stores`.
+
Swift Storage
+++++++++++++
@@ -1200,6 +1205,8 @@ The Dataverse Software S3 driver supports multi-part upload for large files (ove

**Note:** The Dataverse Project Team is most familiar with AWS S3, and can provide support on its usage with the Dataverse Software. Thanks to community contributions, the application's architecture also allows non-AWS S3 providers. The Dataverse Project Team can provide very limited support on these other providers. We recommend reaching out to the wider Dataverse Project Community if you have questions.

+See also :ref:`s3-stores`.
+
First: Set Up Accounts and Access Credentials
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
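A minimal sketch of an S3 store definition (the store id ``s3store`` and the bucket name are hypothetical; ``type``, ``label``, and ``bucket-name`` are standard S3 store options covered in this section)::

    ./asadmin create-jvm-options "-Ddataverse.files.s3store.type=s3"
    ./asadmin create-jvm-options "-Ddataverse.files.s3store.label=S3Store"
    # Hypothetical bucket; create it and set up credentials as described below
    ./asadmin create-jvm-options "-Ddataverse.files.s3store.bucket-name=my-dataverse-bucket"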
@@ -1430,6 +1437,8 @@ You may provide the values for these via any `supported MicroProfile Config API

2. A non-empty ``dataverse.files.<id>.profile`` will be ignored when no credentials can be found for this profile name.
   The current codebase does not make use of "named profiles" as seen for the AWS CLI, beyond credentials.

+.. _s3-compatible:
+
Reported Working S3-Compatible Storage
######################################
@@ -1516,29 +1525,33 @@ In addition to having the type "remote" and requiring a label, Trusted Remote St

These and other available options are described in the table below.

Trusted remote stores can range from being a static trusted website to a sophisticated service managing access requests and logging activity
-and/or managing access to a secure enclave. See :doc:`/developers/big-data-support` for additional information on how to use a trusted remote store. For specific remote stores, consult their documentation when configuring the remote store in your Dataverse installation.
+and/or managing access to a secure enclave. See :doc:`/admin/big-data-administration` (specifically :ref:`remote-stores`) and :doc:`/developers/big-data-support` for additional information on how to use a trusted remote store. For specific remote stores, consult their documentation when configuring the remote store in your Dataverse installation.

-Note that in the current implementation, activites where Dataverse needs access to data bytes, e.g. to create thumbnails or validate hash values at publication will fail if a remote store does not allow Dataverse access. Implementers of such trusted remote stores should consider using Dataverse's settings to disable ingest, validation of files at publication, etc. as needed.
+Note that in the current implementation, activities where Dataverse needs access to data bytes, e.g. to create thumbnails or validate hash values at publication will fail if a remote store does not allow Dataverse access. Implementers of such trusted remote stores should consider using Dataverse's settings to disable ingest, validation of files at publication, etc. as needed.

Once you have configured a trusted remote store, you can point your users to the :ref:`add-remote-file-api` section of the API Guide.

+dataverse.files.<id>.type ``remote`` **Required** to mark this storage as remote. (none)
+dataverse.files.<id>.label <?> **Required** label to be shown in the UI for this storage. (none)
+dataverse.files.<id>.base-url <?> **Required** All files must have URLs of the form <baseUrl>/* . (none)
+dataverse.files.<id>.base-store <?> **Required** The id of a base store (of type file, s3, or swift). (the default store)
+dataverse.files.<id>.upload-out-of-band ``true`` **Required to be true** Dataverse does not manage file placement ``false``
+dataverse.files.<id>.download-redirect ``true``/``false`` Enable direct download (should usually be true). ``false``
+dataverse.files.<id>.secret-key <?> A key used to sign download requests sent to the remote store. Optional. (none)
+dataverse.files.<id>.public ``true``/``false`` True if the remote store does not enforce Dataverse access controls ``false``
+dataverse.files.<id>.ingestsizelimit <size in bytes> Maximum size of files that should be ingested (none)
+dataverse.files.<id>.url-expiration-minutes <?> If direct downloads and using signing: time until links expire. Optional. 60
+dataverse.files.<id>.remote-store-name <?> A short name used in the UI to indicate where a file is located. Optional. (none)
+dataverse.files.<id>.remote-store-url <?> A URL to an info page about the remote store used in the UI. Optional. (none)
+dataverse.files.<id>.files-not-accessible-by-dataverse ``true``/``false`` True if the file is at the URL provided, false if that is a landing page ``false``
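Pulling the **Required** rows of the table together, a hedged sketch of a minimal trusted remote store (the store id ``trsa`` and the base URL are hypothetical placeholders)::

    ./asadmin create-jvm-options "-Ddataverse.files.trsa.type=remote"
    ./asadmin create-jvm-options "-Ddataverse.files.trsa.label=TrustedRemote"
    # All referenced files must have URLs under this (hypothetical) base URL
    ./asadmin create-jvm-options "-Ddataverse.files.trsa.base-url=https://remotestore.example.org/data"
    # An existing file/s3/swift store used for auxiliary objects
    ./asadmin create-jvm-options "-Ddataverse.files.trsa.base-store=file"
    ./asadmin create-jvm-options "-Ddataverse.files.trsa.upload-out-of-band=true"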
@@ -1578,6 +1591,8 @@ Once you have configured a globus store, or configured an S3 store for Globus ac

for a managed store) - using a microprofile alias is recommended (none)
dataverse.files.<id>.reference-endpoints-with-basepaths <?> A comma separated list of *remote* trusted Globus endpoint id/<basePath>s (none)
dataverse.files.<id>.files-not-accessible-by-dataverse ``true``/``false`` Should be false for S3 Connector-based *managed* stores, true for others ``false``
+dataverse.files.<id>.public ``true``/``false`` True can be used to disable users' ability to restrict/embargo files ``false``
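For illustration only (the endpoint UUID and base path are hypothetical placeholders; the option is the one shown in the row above), referencing a remote trusted Globus endpoint might look like::

    ./asadmin create-jvm-options "-Ddataverse.files.globusr.reference-endpoints-with-basepaths=2791b83e-0000-0000-0000-000000000000/basepath"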
@@ -2804,6 +2818,8 @@ when using it to configure your core name!

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SOLR_PATH``.

+.. _dataverse.solr.min-files-to-use-proxy:
+
dataverse.solr.min-files-to-use-proxy
+++++++++++++++++++++++++++++++++++++
@@ -2815,6 +2831,8 @@ A recommended value would be ~1000 but the optimal value may vary depending on d

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SOLR_MIN_FILES_TO_USE_PROXY``.

+.. _dataverse.solr.concurrency.max-async-indexes:
+
dataverse.solr.concurrency.max-async-indexes
++++++++++++++++++++++++++++++++++++++++++++
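A sketch of setting the proxy threshold via the environment variable named above, using the ~1000 value the hunk context recommends::

    export DATAVERSE_SOLR_MIN_FILES_TO_USE_PROXY=1000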
@@ -4447,6 +4465,8 @@ Notes:

- For larger file upload sizes, you may need to configure your reverse proxy timeout. If using apache2 (httpd) with Shibboleth, add a timeout to the ProxyPass defined in etc/httpd/conf.d/ssl.conf (which is described in the :doc:`/installation/shibboleth` setup).

+.. _:MultipleUploadFilesLimit:
+
:MultipleUploadFilesLimit
+++++++++++++++++++++++++
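Relating to the reverse proxy note above, a hedged sketch of the httpd change (the backend URL and timeout value are illustrative)::

    # In etc/httpd/conf.d/ssl.conf: raise the ProxyPass timeout for large uploads
    ProxyPass / ajp://localhost:8009/ timeout=600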
@@ -4525,6 +4545,8 @@ Examples:

``curl -X PUT -d '{"default":"0", "CSV":"268435456"}' http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit``

+.. _:ZipUploadFilesLimit:
+
:ZipUploadFilesLimit
++++++++++++++++++++
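A sketch following the settings pattern above (the value is illustrative; 1,000 is the documented default for files inside a .zip)::

    curl -X PUT -d 2000 http://localhost:8080/api/admin/settings/:ZipUploadFilesLimit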
@@ -4543,13 +4565,17 @@ By default your Dataverse installation will attempt to connect to Solr on port 8

**Note:** instead of using a database setting, you could alternatively use JVM settings like :ref:`dataverse.solr.host`.

+.. _:SolrFullTextIndexing:
+
:SolrFullTextIndexing
+++++++++++++++++++++

Whether or not to index the content of files such as PDFs. The default is false.

``curl -X PUT -d true http://localhost:8080/api/admin/settings/:SolrFullTextIndexing``

+.. _:SolrMaxFileSizeForFullTextIndexing:
+
:SolrMaxFileSizeForFullTextIndexing
+++++++++++++++++++++++++++++++++++
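By analogy with the curl example above, a hedged sketch for the size limit (a size in bytes is assumed; the value is illustrative)::

    curl -X PUT -d 314572800 http://localhost:8080/api/admin/settings/:SolrMaxFileSizeForFullTextIndexing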
@@ -4571,12 +4597,15 @@ To enable the setting::

curl -X PUT -d true "http://localhost:8080/api/admin/settings/:DisableSolrFacets"

+.. _:DisableSolrFacetsForGuestUsers:

:DisableSolrFacetsForGuestUsers
+++++++++++++++++++++++++++++++

Similar to the above, but will disable the facets for Guest (unauthenticated) users only.

+.. _:DisableSolrFacetsWithoutJsession:
+
:DisableSolrFacetsWithoutJsession
+++++++++++++++++++++++++++++++++
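Assuming the same true/false semantics as :DisableSolrFacets above, a sketch::

    curl -X PUT -d true "http://localhost:8080/api/admin/settings/:DisableSolrFacetsForGuestUsers"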
@@ -5079,6 +5108,8 @@ If you don’t want date facets to be sorted chronologically, set:

``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:ChronologicalDateFacets``

+.. _:CustomZipDownloadServiceUrl:
+
:CustomZipDownloadServiceUrl
++++++++++++++++++++++++++++

@@ -5210,6 +5241,8 @@ A suggested minimum includes author, datasetContact, and contributor, but additi
doc/sphinx-guides/source/installation/prep.rst (1 addition, 1 deletion)

@@ -117,7 +117,7 @@ Decisions to Make

Here are some questions to keep in the back of your mind as you test and move into production:

-- How much storage do I need?
+- How much storage do I need? What is the scale of data I will need to handle (see :doc:`/admin/big-data-administration`)?
- Which features do I want based on :ref:`architecture`?
- How do I want my users to log in to the Dataverse installation? With local accounts? With Shibboleth/SAML? With OAuth providers such as ORCID, GitHub, or Google?
- Do I want to run my app server on the standard web ports (80 and 443) or do I want to "front" my app server with a proxy such as Apache or nginx? See "Network Ports" in the :doc:`config` section.
doc/sphinx-guides/source/user/dataset-management.rst (4 additions, 1 deletion)

@@ -391,14 +391,17 @@ If the bounding box was successfully populated, :ref:`geospatial-search` should

Compressed Files
----------------

-Compressed files in .zip format are unpacked automatically. If a .zip file fails to unpack for whatever reason, it will upload as is. If the number of files inside is more than a set limit (1,000 by default, configurable by the Administrator), you will get an error message and the .zip file will upload as is.
+Depending on the configuration, compressed files in .zip format are unpacked automatically. If a .zip file is not unpacked, it will upload as is.
+
+If the number of files inside is more than a set limit (1,000 by default, configurable by the Administrator), you will get an error message and the .zip file will upload as is.

If the uploaded .zip file contains a folder structure, the Dataverse installation will keep track of this structure. A file's location within this folder structure is displayed in the file metadata as the File Path. When you download the contents of the dataset, this folder structure will be preserved and files will appear in their original locations.

These folder names are subject to strict validation rules. Only the following characters are allowed: the alphanumerics, '_', '-', '.' and ' ' (white space). When a zip archive is uploaded, the folder names are automatically sanitized, with any invalid characters replaced by the '.' character. Any sequences of dots are further replaced with a single dot. For example, the folder name ``data&info/code=@137`` will be converted to ``data.info/code.137``. When uploading through the Web UI, the user can change the values further on the edit form presented, before clicking the 'Save' button.

.. note:: If you upload multiple .zip files to one dataset, any subdirectories that are identical across multiple .zips will be merged together when the user downloads the full dataset.

+If a .zip file is not unpacked and the Zip Previewer is installed (see :ref:`file-previews`), it will be possible for users to view the contents of the zip file and to download individual files from within the .zip.
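To make the sanitization rule above concrete, a shell sketch (illustrative only; the actual logic lives in Dataverse's Java code)::

    # Replace disallowed characters with '.', then collapse runs of dots
    echo "data&info/code=@137" | sed -e 's|[^A-Za-z0-9_. /-]|.|g' -e 's|\.\.*|.|g'
    # prints: data.info/code.137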