Commit 994bbcb

Merge branch 'develop' into 10476-display-on-create-field-option
2 parents 90810ef + ff8a037 commit 994bbcb

43 files changed, with 495 additions and 173 deletions

.env

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 APP_IMAGE=gdcc/dataverse:unstable
 POSTGRES_VERSION=17
 DATAVERSE_DB_USER=dataverse
-SOLR_VERSION=9.3.0
-SKIP_DEPLOY=0
+SOLR_VERSION=9.8.0
+SKIP_DEPLOY=0

.github/workflows/copy_labels.yml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+name: Copy labels from issue to pull request
+
+on:
+  pull_request:
+    types: [opened]
+
+jobs:
+  copy-labels:
+    runs-on: ubuntu-latest
+    name: Copy labels from linked issues
+    steps:
+      - name: copy-labels
+        uses: michalvankodev/[email protected]
+        with:
+          repo-token: ${{ secrets.GITHUB_TOKEN }}

.github/workflows/deploy_beta_testing.yml

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ jobs:
           overwrite: true
 
       - name: Execute payara war deployment remotely
-        uses: appleboy/[email protected].0
+        uses: appleboy/[email protected].1
         env:
           INPUT_WAR_FILE: ${{ env.war_file }}
         with:

conf/solr/schema.xml

Lines changed: 26 additions & 25 deletions
@@ -38,36 +38,37 @@
 catchall "text" field, and use that for searching.
 -->
 
-<schema name="default-config" version="1.6">
+<schema name="default-config" version="1.7">
 <!-- attribute "name" is the name of this schema and is only used for display purposes.
-version="x.y" is Solr's version number for the schema syntax and
+version="x.y" is Solr's version number for the schema syntax and
 semantics. It should not normally be changed by applications.
 
-1.0: multiValued attribute did not exist, all fields are multiValued
+1.0: multiValued attribute did not exist, all fields are multiValued
 by nature
-1.1: multiValued attribute introduced, false by default
-1.2: omitTermFreqAndPositions attribute introduced, true by default
+1.1: multiValued attribute introduced, false by default
+1.2: omitTermFreqAndPositions attribute introduced, true by default
 except for text fields.
 1.3: removed optional field compress feature
 1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
-behavior when a single string produces multiple tokens. Defaults
+behavior when a single string produces multiple tokens. Defaults
 to off for version >= 1.4
-1.5: omitNorms defaults to true for primitive field types
+1.5: omitNorms defaults to true for primitive field types
 (int, float, boolean, string...)
 1.6: useDocValuesAsStored defaults to true.
+1.7: docValues defaults to true, uninvertible defaults to false.
 -->
 
 <!-- Valid attributes for fields:
 name: mandatory - the name for the field
-type: mandatory - the name of a field type from the
+type: mandatory - the name of a field type from the
 fieldTypes section
 indexed: true if this field should be indexed (searchable or sortable)
 stored: true if this field should be retrievable
 docValues: true if this field should have doc values. Doc Values is
 recommended (required, if you are using *Point fields) for faceting,
 grouping, sorting and function queries. Doc Values will make the index
-faster to load, more NRT-friendly and more memory-efficient.
-They are currently only supported by StrField, UUIDField, all
+faster to load, more NRT-friendly and more memory-efficient.
+They are currently only supported by StrField, UUIDField, all
 *PointFields, and depending on the field type, they might require
 the field to be single-valued, be required or have a default value
 (check the documentation of the field type you're interested in for
@@ -82,9 +83,9 @@
 given field.
 When using MoreLikeThis, fields used for similarity should be
 stored for best performance.
-termPositions: Store position information with the term vector.
+termPositions: Store position information with the term vector.
 This will increase storage costs.
-termOffsets: Store offset information with the term vector. This
+termOffsets: Store offset information with the term vector. This
 will increase storage costs.
 required: The field is required. It will throw an error if the
 value does not exist
@@ -102,10 +103,10 @@
 <!-- In this _default configset, only four fields are pre-declared:
 id, _version_, and _text_ and _root_. All other fields will be type guessed and added via the
 "add-unknown-fields-to-the-schema" update request processor chain declared in solrconfig.xml.
-
-Note that many dynamic fields are also defined - you can use them to specify a
+
+Note that many dynamic fields are also defined - you can use them to specify a
 field's type via field naming conventions - see below.
-
+
 WARNING: The _text_ catch-all field will significantly increase your index size.
 If you don't need it, consider removing it and the corresponding copyField directive."
 -->
@@ -115,12 +116,12 @@
 <field name="_version_" type="plong" indexed="false" stored="false"/>
 <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
 
-
-
-
-
-<!-- Start: Dataverse-specific -->
-
+
+
+
+
+<!-- Start: Dataverse-specific -->
+
 <!-- catchall field, containing all other searchable text fields (implemented
 via copyField further on in this schema -->
 <!-- Dataverse solr 7.3.0: for some reason the old text wasn't working so switched to _text_ for copyfields -->
@@ -216,7 +217,7 @@
 <!-- https://redmine.hmdc.harvard.edu/issues/3482 -->
 <!-- 'Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)' http://wiki.apache.org/solr/CommonQueryParameters#sort -->
 <!-- http://stackoverflow.com/questions/13360706/solr-4-0-alphabetical-sorting-trouble/13361226#13361226 -->
-<field name="nameSort" type="alphaOnlySort" indexed="true" stored="true"/>
+<field name="nameSort" type="string" indexed="true" stored="true"/>
 
 <field name="dateSort" type="pdate" indexed="true" stored="true"/>
 
@@ -785,7 +786,7 @@
 <filter class="solr.TrimFilterFactory" />
 <!-- The PatternReplaceFilter gives you the flexibility to use
 Java Regular expression to replace any sequence of characters
-matching a pattern with an arbitrary replacement string,
+matching a pattern with an arbitrary replacement string,
 which may include back references to portions of the original
 string matched by the pattern.
 
@@ -798,8 +799,8 @@
 <!-- https://redmine.hmdc.harvard.edu/issues/3482#note-11 -->
 <!-- <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" /> -->
 </analyzer>
-</fieldType>
-
+</fieldType>
+
 <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
 <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />
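
The switch of `nameSort` from `alphaOnlySort` to `string` in this file may change sort order. The sketch below is only an approximation in Python, not Solr code. It assumes that `alphaOnlySort` lowercases and trims its input (as Solr's stock type of that name does; the pattern-replace filter is commented out in this schema) and that a `string` field sorts on the verbatim value in code point order. The sample names are made up, and whether Dataverse normalizes values before indexing is not visible in this diff.

```python
def alpha_only_sort_key(value: str) -> str:
    """Approximate the assumed alphaOnlySort analysis: lowercase, then trim."""
    return value.lower().strip()


def string_sort_key(value: str) -> str:
    """A StrField is indexed verbatim, so the raw value acts as the sort key."""
    return value


# Hypothetical dataset names, chosen to expose case and whitespace effects.
names = ["beta Study", "Alpha Study", " Gamma Study", "alpha study 2"]

old_order = sorted(names, key=alpha_only_sort_key)
new_order = sorted(names, key=string_sort_key)

print(old_order)
print(new_order)
```

Under these assumptions, the old key ignores case and leading spaces, while the new key places values with leading spaces first and uppercase letters before lowercase ones.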

conf/solr/solrconfig.xml

Lines changed: 21 additions & 71 deletions
@@ -35,52 +35,7 @@
 that you fully re-index after changing this setting as it can
 affect both how text is indexed and queried.
 -->
-<luceneMatchVersion>9.7</luceneMatchVersion>
-
-<!-- <lib/> directives can be used to instruct Solr to load any Jars
-identified and use them to resolve any "plugins" specified in
-your solrconfig.xml or schema.xml (ie: Analyzers, Request
-Handlers, etc...).
-
-All directories and paths are resolved relative to the
-instanceDir.
-
-Please note that <lib/> directives are processed in the order
-that they appear in your solrconfig.xml file, and are "stacked"
-on top of each other when building a ClassLoader - so if you have
-plugin jars with dependencies on other jars, the "lower level"
-dependency jars should be loaded first.
-
-If a "./lib" directory exists in your instanceDir, all files
-found in it are included as if you had used the following
-syntax...
-
-<lib dir="./lib" />
--->
-
-<!-- A 'dir' option by itself adds any files found in the directory
-to the classpath, this is useful for including all jars in a
-directory.
-
-When a 'regex' is specified in addition to a 'dir', only the
-files in that directory which completely match the regex
-(anchored on both ends) will be included.
-
-If a 'dir' option (with or without a regex) is used and nothing
-is found that matches, a warning will be logged.
-
-The example below can be used to load a Solr Module along
-with their external dependencies.
--->
-<!-- <lib dir="${solr.install.dir:../../../..}/modules/ltr/lib" regex=".*\.jar" /> -->
-
-<!-- an exact 'path' can be used instead of a 'dir' to specify a
-specific jar file. This will cause a serious error to be logged
-if it can't be loaded.
--->
-<!--
-<lib path="../a-jar-that-does-not-exist.jar" />
--->
+<luceneMatchVersion>9.11</luceneMatchVersion>
 
 <!-- Data Directory
 
@@ -256,16 +211,9 @@
 is recommended (see below).
 "dir" - the target directory for transaction logs, defaults to the
 solr data directory.
-"numVersionBuckets" - sets the number of buckets used to keep
-track of max version values when checking for re-ordered
-updates; increase this value to reduce the cost of
-synchronizing access to version buckets during high-volume
-indexing, this requires 8 bytes (long) * numVersionBuckets
-of heap space per Solr core.
 -->
 <updateLog>
 <str name="dir">${solr.ulog.dir:}</str>
-<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
 </updateLog>
 
 <!-- AutoCommit
@@ -360,6 +308,21 @@
 -->
 <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>
 
+<!-- Minimum acceptable prefix-size for prefix-based queries.
+
+Prefix-based queries consume memory in proportion to the number of terms in the index
+that start with that prefix. Short prefixes tend to match many many more indexed-terms
+and consume more memory as a result, sometimes causing stability issues on the node.
+
+This setting allows administrators to require that prefixes meet or exceed a specified
+minimum length requirement. Prefix queries that don't meet this requirement return an
+error to users. The limit may be overridden on a per-query basis by specifying a
+'minPrefixQueryTermLength' local-param value.
+
+The flag value of '-1' can be used to disable enforcement of this limit.
+-->
+<minPrefixQueryTermLength>${solr.query.minPrefixLength:-1}</minPrefixQueryTermLength>
+
 <!-- Solr Internal Query Caches
 Starting with Solr 9.0 the default cache implementation used is CaffeineCache.
 -->
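
The rule described in the `minPrefixQueryTermLength` comment can be modelled roughly as follows. This is a sketch of the documented behaviour only, not Solr's implementation; the function name `prefix_query_allowed` is hypothetical. Note that the value added in this commit defaults to `-1`, so enforcement stays off unless `solr.query.minPrefixLength` is set.

```python
def prefix_query_allowed(prefix: str, min_length: int = -1) -> bool:
    """Return True if a prefix term passes the minimum-length check.

    A min_length of -1 is the flag value that disables the check.
    """
    if min_length == -1:
        return True
    # The prefix must meet or exceed the configured minimum length.
    return len(prefix) >= min_length


print(prefix_query_allowed("a"))        # limit disabled by default
print(prefix_query_allowed("a", 2))     # too short when the limit is 2
print(prefix_query_allowed("ab", 2))    # meets the limit
```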
@@ -494,23 +457,6 @@
 -->
 <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
 
-<!-- Use Filter For Sorted Query
-
-A possible optimization that attempts to use a filter to
-satisfy a search. If the requested sort does not include
-score, then the filterCache will be checked for a filter
-matching the query. If found, the filter will be used as the
-source of document ids, and then the sort will be applied to
-that.
-
-For most situations, this will not be useful unless you
-frequently get the same search repeatedly with different sort
-options, and none of them ever use "score"
--->
-<!--
-<useFilterForSortedQuery>true</useFilterForSortedQuery>
--->
-
 <!-- Query Related Event Listeners
 
 Various IndexSearcher related events can trigger Listeners to
@@ -1015,6 +961,10 @@
 <str name="pattern">[^\w-\.]</str>
 <str name="replacement">_</str>
 </updateProcessor>
+<updateProcessor class="solr.NumFieldLimitingUpdateRequestProcessorFactory" name="max-fields">
+<int name="maxFields">1000</int>
+<bool name="warnOnly">true</bool>
+</updateProcessor>
 <updateProcessor class="solr.ParseBooleanFieldUpdateProcessorFactory" name="parse-boolean"/>
 <updateProcessor class="solr.ParseLongFieldUpdateProcessorFactory" name="parse-long"/>
 <updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
@@ -1061,7 +1011,7 @@
 
 <!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
 <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
-processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
+processor="uuid,remove-blank,field-name-mutating,max-fields,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
 <processor class="solr.LogUpdateProcessorFactory"/>
 <processor class="solr.DistributedUpdateProcessorFactory"/>
 <processor class="solr.RunUpdateProcessorFactory"/>
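
The new `max-fields` processor is configured with `maxFields` set to 1000 and `warnOnly` set to true. The sketch below shows what such a configuration might mean in practice, assuming the processor compares the core's current field count against `maxFields` and either logs a warning or rejects the update. The function and exception names are hypothetical, and the exact comparison Solr uses is an assumption.

```python
import logging

logger = logging.getLogger("max-fields-sketch")


class TooManyFieldsError(Exception):
    """Hypothetical error raised when the field limit is enforced."""


def check_field_limit(current_field_count: int,
                      max_fields: int = 1000,
                      warn_only: bool = True) -> bool:
    """Return True if an update may proceed under the assumed rule."""
    if current_field_count <= max_fields:
        return True
    message = (f"core has {current_field_count} fields, "
               f"limit is {max_fields}")
    if warn_only:
        # With warnOnly=true, as in this commit, the update still goes through.
        logger.warning(message)
        return True
    raise TooManyFieldsError(message)
```

With the values from this commit, an over-limit core would only produce a warning; setting `warn_only=False` in the sketch models the stricter mode.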
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Solr 9.8.0 is now the version recommended in our installation guides and used with automated testing. Other libraries Dataverse uses have been updated as well.
+
+For the upgrade instructions section:
+
+[note that 6.6 may contain other solr-related changes, so the instructions may need to contain information merged from multiple release notes!]
+
+If you are upgrading Solr:
+- Install solr-9.8.0 following the instructions from the Installation guide.
+- Run a full reindex to populate the search catalog.
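
The "full reindex" step in the note above could be scripted along these lines. This is a minimal sketch, assuming Dataverse's admin reindex endpoint at `/api/admin/index` and a server on `http://localhost:8080`; the helper names are hypothetical, and the admin API is normally reachable only from localhost.

```python
from urllib.request import urlopen


def reindex_url(base_url: str = "http://localhost:8080") -> str:
    """Build the URL of the assumed full-reindex admin endpoint."""
    return base_url.rstrip("/") + "/api/admin/index"


def start_full_reindex(base_url: str = "http://localhost:8080",
                       timeout: float = 30.0) -> int:
    """Request a full reindex and return the HTTP status code."""
    with urlopen(reindex_url(base_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only the URL is printed here; call start_full_reindex() on a live server.
    print(reindex_url())
```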

doc/release-notes/11095-fix-extcvoc-indexing.md

Lines changed: 1 addition & 1 deletion
@@ -3,5 +3,5 @@ in indexing failure for the dataset (e.g. when the script tried to index both th
 Dataverse has been updated to correctly indicate the need for a multi-valued Solr field in these cases in the call to /api/admin/index/solr/schema.
 Configuring the Solr schema and the update-fields.sh script as usually recommended when using custom metadata blocks will resolve the issue.
 
-The overall release notes should include a Solr update (which hopefully is required by an update to 9.7.0 anyway) and our standard instructions
+The overall release notes should include a Solr update (which hopefully is required by an update to 9.8.0 anyway) and our standard instructions
 should change to recommending use of the update-fields.sh script when using custom metadatablocks *and/or external vocabulary scripts*.
Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 This release fixes a bug that caused Dataverse to generate unnecessary solr documents for files when a file is added/deleted from a draft dataset. These documents could accumulate and potentially impact performance.
 
-Assuming the upgrade to solr 9.7.0 also occurs in this release, there's nothing else needed for this PR. (Starting with a new solr insures the solr db is empty and that a reindex is already required.)
+Assuming the upgrade to solr 9.8.0 also occurs in this release, there's nothing else needed for this PR. (Starting with a new solr insures the solr db is empty and that a reindex is already required.)
 
 

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Bugs that caused 1) guestbook questions to appear along with terms of use/terms of access in the request access dialog when no guestbook was configured, and 2) terms of access to not be shown when using the per-file request access/download menu items have been fixed.
+Text related to configuring the choice to have guestbooks appear when file access is requested or when files are downloaded has been updated to make it clearer that this only affects datasets where guestbooks have been configured.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+The :CustomDatasetSummaryFields setting now allows spaces along with a comma separating field names. In addition, a bug that caused license information to be hidden if there are no values for any of the custom fields specified has been fixed.
