Commit 7db4b0d

Merge branch 'develop' into 11528-display-collection-size-on-add-files-page
2 parents 3488373 + bef0f83 commit 7db4b0d

14 files changed: +353 -160 lines changed
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+A bug introduced in Dataverse 6.8 that makes attempts to replace non-tabular files via the current Dataverse UI fail has been fixed. (The bug would also cause the replace API to fail if an empty dataFileTags array is sent.)
+
+See #11976
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+The API returning information about datasets (`/api/datasets/{id}`) now includes a `locks` field containing a list of the types of all existing locks, e.g. `"locks": ["InReview"]`.
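As a rough illustration of how a client might read the new `locks` field, here is a minimal sketch. The class and method names are hypothetical, and the exact position of `locks` inside the response envelope is not shown in this commit, so the sketch simply scans the raw response text. A real client would more likely use a proper JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Dataverse: pulls lock type names out of a
// dataset JSON response by looking for the "locks" array.
class DatasetLocks {
    // Matches the contents of the "locks" array, e.g. "locks": ["InReview"]
    private static final Pattern LOCKS_ARRAY =
            Pattern.compile("\"locks\"\\s*:\\s*\\[(.*?)\\]", Pattern.DOTALL);
    // Matches one quoted string inside the array
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static List<String> lockTypes(String json) {
        List<String> types = new ArrayList<>();
        Matcher array = LOCKS_ARRAY.matcher(json);
        if (array.find()) {
            Matcher item = QUOTED.matcher(array.group(1));
            while (item.find()) {
                types.add(item.group(1));
            }
        }
        // Empty when the field is absent or the array has no entries
        return types;
    }

    public static void main(String[] args) {
        // Sample text assumed for illustration only
        String sample = "{\"id\": 42, \"locks\": [\"InReview\"]}";
        System.out.println(lockTypes(sample));
    }
}
```

An empty result would mean either no locks or a response from a version that predates the field, so callers may want to treat the two cases the same way.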
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+(assuming the earlier PRs have been merged, there will be a section on indexing improvements already)
+This release also avoids creating unused Solr entries for files in drafts of new versions of published datasets (decreasing the Solr db size and thereby improving performance).
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+It came to our attention that the [Dataverse Uploader GitHub Action](https://guides.dataverse.org/en/6.10/admin/integrations.html#github) was [failing](https://github.com/IQSS/dataverse-uploader/issues/28) with an "unhashable type" error. This has been fixed in a new release, [v1.7](https://github.com/IQSS/dataverse-uploader/releases/tag/v1.7).

doc/sphinx-guides/source/api/search.rst

Lines changed: 1 addition & 2 deletions
@@ -17,7 +17,6 @@ The parameters and JSON response are partly inspired by the `GitHub Search API <
 
 Please note that in Dataverse Software 4.3 and older the "citation" field wrapped the persistent ID URL in an ``<a>`` tag but this has been changed to plaintext. If you want the old value with HTML in it, a new field called "citationHtml" can be used.
 
-
 Parameters
 ----------
 
@@ -27,7 +26,7 @@ Name Type Description
 q string The search term or terms. Using "title:data" will search only the "title" field. "*" can be used as a wildcard either alone or adjacent to a term (i.e. "bird*"). For example, https://demo.dataverse.org/api/search?q=title:data . For a list of fields to search, please see https://github.com/IQSS/dataverse/issues/2558 (for now).
 type string Can be either "dataverse", "dataset", or "file". Multiple "type" parameters can be used to include multiple types (i.e. ``type=dataset&type=file``). If omitted, all types will be returned. For example, https://demo.dataverse.org/api/search?q=*&type=dataset
 subtree string The identifier of the Dataverse collection to which the search should be narrowed. The subtree of this Dataverse collection and all its children will be searched. Multiple "subtree" parameters can be used to include multiple Dataverse collections. For example, https://demo.dataverse.org/api/search?q=data&subtree=birds&subtree=cats .
-sort string The sort field. Supported values include "name", "date" and "relevance". See example under "order".
+sort string The sort field. Supported values include "name", "date", and "score". Sorting by "score" orders by **relevance** and is the default if this parameter is omitted.
 order string The order in which to sort. Can either be "asc" or "desc". For example, https://demo.dataverse.org/api/search?q=data&sort=name&order=asc
 per_page int The number of results to return per request. The default is 10. The max is 1000. See :ref:`iteration example <iteration-example>`.
 start int A cursor for paging through search results. See :ref:`iteration example <iteration-example>`.
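To make the corrected `sort` documentation concrete, the following sketch builds a search URL that sorts by `score` explicitly. The class and method names are hypothetical. The host is the demo server used in the documentation's own examples, and only the `q`, `sort`, and `order` parameters described in the table are used.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Dataverse: assembles a Search API URL.
class SearchUrlExample {
    static URI searchUri(String baseUrl, String query, String sort, String order) {
        // Encode the search term so characters such as ':' or spaces are safe in a URL
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/search?q=" + q
                + "&sort=" + sort + "&order=" + order);
    }

    public static void main(String[] args) {
        // "score" sorts by relevance; per the table it is also the default when sort is omitted
        System.out.println(searchUri("https://demo.dataverse.org", "title:data", "score", "desc"));
    }
}
```

Passing `sort=score` is only needed when the caller wants to be explicit or to combine it with a non-default `order`.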

src/main/java/edu/harvard/iq/dataverse/FileMetadata.java

Lines changed: 4 additions & 4 deletions
@@ -67,15 +67,15 @@
  */
 @Table(indexes = {@Index(columnList="datafile_id"), @Index(columnList="datasetversion_id")} )
 @NamedNativeQuery(
-    name = "FileMetadata.compareFileMetadata",
+    name = "FileMetadata.getDatafilesWithChangedMetadata",
     query = "WITH fm_categories AS (" +
         " SELECT fmd.filemetadatas_id, " +
         " STRING_AGG(dfc.name, ',' ORDER BY dfc.name) AS categories " +
         " FROM FileMetadata_DataFileCategory fmd " +
         " JOIN DataFileCategory dfc ON fmd.filecategories_id = dfc.id " +
         " GROUP BY fmd.filemetadatas_id " +
         ") " +
-        "SELECT fm1.id " +
+        "SELECT fm1.datafile_id AS id " +
         "FROM FileMetadata fm1 " +
         "LEFT JOIN FileMetadata fm2 ON fm1.datafile_id = fm2.datafile_id " +
         " AND fm2.datasetversion_id = ?1 " +
@@ -93,11 +93,11 @@
         " ) " +
         " ) " +
         " )",
-    resultSetMapping = "IdToLongMapping"
+    resultSetMapping = "IdToIntegerMapping"
 )
 /* When this mapping was to Long.class, Postgres was still returning an Integer, causing indexing failures - see #11776 */
 @SqlResultSetMapping(
-    name = "IdToLongMapping",
+    name = "IdToIntegerMapping",
     columns = @ColumnResult(name = "id", type = Integer.class)
 )
 @Entity
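The mapping comment above notes that the native query yields `Integer` values even though callers work with `Long` ids. The body of `populateChangedFileIds` is not part of this diff, so the following is only a sketch of how such a helper might normalize the ids. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the Dataverse implementation: copies numeric query
// results into a List<Long> regardless of whether the driver returned Integer or Long.
class ChangedFileIds {
    static void addAsLongs(List<?> queryResults, List<Long> target) {
        for (Object result : queryResults) {
            // Number covers Integer, Long, BigInteger, etc.; nulls and other types are skipped
            if (result instanceof Number n) {
                target.add(n.longValue());
            }
        }
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        addAsLongs(Arrays.asList(3, 5L, null), ids);
        System.out.println(ids);
    }
}
```

Widening through `Number` keeps the caller independent of the declared `@ColumnResult` type, which is the property the removed inline conversion code was also aiming for.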

src/main/java/edu/harvard/iq/dataverse/datasetutility/OptionalFileParams.java

Lines changed: 1 addition & 1 deletion
@@ -618,7 +618,7 @@ private void replaceFileDataTagsInFile(DataFile df) throws DataFileTagException{
 // --------------------------------------------------
 // Is this a tabular file?
 // --------------------------------------------------
-if (!df.isTabularData()){
+if (!df.isTabularData() && !getDataFileTags().isEmpty()){
     String errMsg = BundleUtil.getStringFromBundle("file.metadata.datafiletag.not_tabular");
 
     throw new DataFileTagException(errMsg);
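The effect of the changed condition can be sketched in isolation. The class and method below are hypothetical stand-ins for the check in `replaceFileDataTagsInFile`: a non-tabular file is now rejected only when tags were actually supplied, so an empty `dataFileTags` array no longer triggers the error.

```java
import java.util.List;

// Hypothetical stand-in for the guard in replaceFileDataTagsInFile.
class TagGuardExample {
    // True when the request should be rejected with the "not tabular" error
    static boolean rejectsTags(boolean isTabular, List<String> tags) {
        return !isTabular && !tags.isEmpty();
    }

    public static void main(String[] args) {
        // Non-tabular file, empty tag list: accepted after the fix
        System.out.println(rejectsTags(false, List.of()));
        // Non-tabular file with a tag: still rejected
        System.out.println(rejectsTags(false, List.of("Survey")));
        // Tabular file with a tag: accepted
        System.out.println(rejectsTags(true, List.of("Survey")));
    }
}
```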

src/main/java/edu/harvard/iq/dataverse/search/IndexServiceBean.java

Lines changed: 23 additions & 34 deletions
@@ -602,10 +602,24 @@ private void doIndexDataset(Dataset dataset, boolean doNormalSolrDocCleanUp) thr
     writeDebugInfo(debug, dataset);
 }
 if (doNormalSolrDocCleanUp) {
+    List<String> solrIdsOfPermissionDocsToDelete = new ArrayList<>();
     try {
         solrIdsOfDocsToDelete = findFilesOfParentDataset(dataset.getId());
         logger.fine("Existing file docs: " + String.join(", ", solrIdsOfDocsToDelete));
         if (!solrIdsOfDocsToDelete.isEmpty()) {
+            if (!latestVersion.isDraft()) {
+                // After publication, we need to delete old draft perm docs
+                // For the first draft, a perm doc will exist for each file
+                // For subsequent drafts, perm docs should only exist for new files/those with changed metadata
+                // This code adds the ids of draft perm docs for all files - if the docs don't exist, Solr will just ignore them
+                for (String fileDocId : solrIdsOfDocsToDelete) {
+                    if (!fileDocId.endsWith(draftSuffix)) {
+                        solrIdsOfPermissionDocsToDelete.add(fileDocId + draftSuffix + discoverabilityPermissionSuffix);
+                    }
+                }
+
+                logger.fine("Existing permission docs: " + String.join(", ", solrIdsOfPermissionDocsToDelete));
+            }
             // We keep the latest version's docs unless it is deaccessioned and there is no
             // published/released version
             // So skip the loop removing those docs from the delete list except in that case
@@ -649,7 +663,7 @@ private void doIndexDataset(Dataset dataset, boolean doNormalSolrDocCleanUp) thr
 logger.fine("Solr docs to delete: " + String.join(", ", solrIdsOfDocsToDelete));
 
 if (!solrIdsOfDocsToDelete.isEmpty()) {
-    List<String> solrIdsOfPermissionDocsToDelete = new ArrayList<>();
+
     for (String file : solrIdsOfDocsToDelete) {
         // Also remove associated permission docs
         solrIdsOfPermissionDocsToDelete.add(file + discoverabilityPermissionSuffix);
@@ -1416,7 +1430,7 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long
 long maxSize = maxFTIndexingSize != null ? maxFTIndexingSize.longValue() : Long.MAX_VALUE;
 
 List<String> filesIndexed = new ArrayList<>();
-final List<Long> changedFileMetadataIds = new ArrayList<>();
+final List<Long> changedFileIds = new ArrayList<>();
 if (datasetVersion != null) {
     List<FileMetadata> fileMetadatas = datasetVersion.getFileMetadatas();
     List<FileMetadata> rfm = new ArrayList<>();
@@ -1427,42 +1441,17 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long
         fileMap.put(released.getDataFile().getId(), released);
     }
 
-    Query query = em.createNamedQuery("FileMetadata.compareFileMetadata", Long.class);
-    query.setParameter(1, dataset.getReleasedVersion().getId());
-    query.setParameter(2, datasetVersion.getId());
-
-    /*
-     * When the query was configured to return Long, it was returning Integer. The query has been changed to return Integer now. The code here is robust if that changes in the future.
-     */
-    List<Object> queryResults = query.getResultList();
-    for (Object result : queryResults) {
-        if (result != null) {
-            // Ensure we're adding Long objects to the list
-            if (result instanceof Integer intResult) {
-                logger.finest("Converted Integer result to Long: " + result);
-                changedFileMetadataIds.add(Long.valueOf(intResult));
-            } else if (result instanceof Long longResult) {
-                // Already a Long, add directly
-                logger.finest("Added existing Long to list: " + result);
-                changedFileMetadataIds.add(longResult);
-            } else {
-                // If it's not a Long, convert it to one via String
-                try {
-                    changedFileMetadataIds.add(Long.valueOf(result.toString()));
-                    logger.finest("Converted non-Long result to Long: " + result + " of type " + result.getClass().getName());
-                } catch (NumberFormatException e) {
-                    logger.warning("Could not convert query result to Long: " + result);
-                }
-            }
-        }
-    }
+    solrIndexService.populateChangedFileIds(
+            dataset.getReleasedVersion().getId(),
+            datasetVersion.getId(),
+            changedFileIds);
     logger.fine(
             "We are indexing a draft version of a dataset that has a released version. We'll be checking file metadatas if they are exact clones of the released versions.");
 } else if (datasetVersion.isDraft()) {
     // Add all file metadata ids to changedFileMetadataIds
-    changedFileMetadataIds.addAll(
+    changedFileIds.addAll(
             fileMetadatas.stream()
-                    .map(FileMetadata::getId)
+                    .map(fm -> fm.getDataFile().getId())
                     .collect(Collectors.toList())
     );
 }
@@ -1526,7 +1515,7 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long
 }
 boolean indexThisFile = false;
 
-if (indexThisMetadata && (isReleasedVersion || changedFileMetadataIds.contains(fileMetadata.getId()))) {
+if (indexThisMetadata && (isReleasedVersion || changedFileIds.contains(datafile.getId()))) {
     indexThisFile = true;
 } else if (indexThisMetadata) {
     // Draft version, file is not new or all file metadata matches the released version
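The draft permission doc cleanup added in the first hunk of this file can be sketched on its own. The suffix values `_draft` and `_permission` and the sample doc ids are assumptions made for illustration, and the class and method names are hypothetical. Only the id construction mirrors the loop in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the id construction used when cleaning up
// draft permission docs after publication.
class DraftPermissionDocIds {
    // Assumed values; the real constants live in IndexServiceBean
    static final String DRAFT_SUFFIX = "_draft";
    static final String PERMISSION_SUFFIX = "_permission";

    static List<String> draftPermissionDocIds(List<String> fileDocIds) {
        List<String> permissionDocIds = new ArrayList<>();
        for (String fileDocId : fileDocIds) {
            // Ids that already carry the draft suffix are skipped, as in the diff
            if (!fileDocId.endsWith(DRAFT_SUFFIX)) {
                permissionDocIds.add(fileDocId + DRAFT_SUFFIX + PERMISSION_SUFFIX);
            }
        }
        return permissionDocIds;
    }

    public static void main(String[] args) {
        // Sample ids are made up for the example
        System.out.println(draftPermissionDocIds(List.of("datafile_7", "datafile_8_draft")));
    }
}
```

Because the code comment in the diff states that Solr ignores delete requests for ids that do not exist, generating an id for every non-draft file doc is a simple way to cover both first drafts and later drafts without querying for which permission docs are present.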

src/main/java/edu/harvard/iq/dataverse/search/SearchPermissionsServiceBean.java

Lines changed: 0 additions & 46 deletions
@@ -92,52 +92,6 @@ public List<String> findDvObjectPerms(DvObject dvObject) {
     return permStrings;
 }
 
-public Map<DatasetVersion.VersionState, Boolean> getDesiredCards(Dataset dataset) {
-    Map<DatasetVersion.VersionState, Boolean> desiredCards = new LinkedHashMap<>();
-    DatasetVersion latestVersion = dataset.getLatestVersion();
-    DatasetVersion.VersionState latestVersionState = latestVersion.getVersionState();
-    DatasetVersion releasedVersion = dataset.getReleasedVersion();
-    boolean atLeastOnePublishedVersion = false;
-    if (releasedVersion != null) {
-        atLeastOnePublishedVersion = true;
-    } else {
-        atLeastOnePublishedVersion = false;
-    }
-
-    if (atLeastOnePublishedVersion == false) {
-        if (latestVersionState.equals(DatasetVersion.VersionState.DRAFT)) {
-            desiredCards.put(DatasetVersion.VersionState.DRAFT, true);
-            desiredCards.put(DatasetVersion.VersionState.DEACCESSIONED, false);
-            desiredCards.put(DatasetVersion.VersionState.RELEASED, false);
-        } else if (latestVersionState.equals(DatasetVersion.VersionState.DEACCESSIONED)) {
-            desiredCards.put(DatasetVersion.VersionState.DEACCESSIONED, true);
-            desiredCards.put(DatasetVersion.VersionState.RELEASED, false);
-            desiredCards.put(DatasetVersion.VersionState.DRAFT, false);
-        } else {
-            String msg = "No-op. Unexpected condition reached: There is no published version and the latest published version is neither " + DatasetVersion.VersionState.DRAFT + " nor " + DatasetVersion.VersionState.DEACCESSIONED + ". Its state is " + latestVersionState + ".";
-            logger.info(msg);
-        }
-    } else if (atLeastOnePublishedVersion == true) {
-        if (latestVersionState.equals(DatasetVersion.VersionState.RELEASED)
-                || latestVersionState.equals(DatasetVersion.VersionState.DEACCESSIONED)) {
-            desiredCards.put(DatasetVersion.VersionState.RELEASED, true);
-            desiredCards.put(DatasetVersion.VersionState.DRAFT, false);
-            desiredCards.put(DatasetVersion.VersionState.DEACCESSIONED, false);
-        } else if (latestVersionState.equals(DatasetVersion.VersionState.DRAFT)) {
-            desiredCards.put(DatasetVersion.VersionState.DRAFT, true);
-            desiredCards.put(DatasetVersion.VersionState.RELEASED, true);
-            desiredCards.put(DatasetVersion.VersionState.DEACCESSIONED, false);
-        } else {
-            String msg = "No-op. Unexpected condition reached: There is at least one published version but the latest version is neither published nor draft";
-            logger.info(msg);
-        }
-    } else {
-        String msg = "No-op. Unexpected condition reached: Has a version been published or not?";
-        logger.info(msg);
-    }
-    return desiredCards;
-}
-
 private boolean hasBeenPublished(Dataverse dataverse) {
     return dataverse.isReleased();
 }
