Skip to content

Conversation

@jo-pol
Copy link
Contributor

@jo-pol jo-pol commented Oct 6, 2025

See #11301
and #11700

How I tested the solution:

On our own VM

Tried cases

  • When logged in as super 2 rows were returned and the choice for host dataverse shown.
  • When logged in without super user rights a single row was returned and the host dataverse not shown.

Saw no performance difference in these cases. The query was executed quickly and only once in both cases.

After solving the merge conflict
Did not retest with the debugger, but performance still was OK.

Also tried /api/mydata/retrieve/collectionList which returned the expected number.

@qqmyers qqmyers moved this to Ready for Triage in IQSS Dataverse Project Oct 6, 2025
@qqmyers qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label Oct 6, 2025
@qqmyers qqmyers added this to the 6.9 milestone Oct 6, 2025
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR updates #11301 and removes the significant performance issue there. I think that, to move forward, this just needs to have a release note, and to get rid of the unused imports. I've made a couple notes about other possible minor performance improvements - I think unless @jo-pol wants to make those or someone on the team thinks they're needed, they probably aren't needed (I'll poll people at standup/triage tomorrow).


import org.apache.commons.lang3.StringUtils;
import org.joda.time.DateTime;
import org.joda.time.Duration;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't used any more.

}

public boolean isHasDataversesToChoose() {
this.hasDataversesToChoose = dataverseService.getDataverseCount() > 1;
Copy link
Member

@qqmyers qqmyers Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW (and so we don't lose it): the original PR has a suggestion that one could query ~ "SELECT 1 FROM Dataverse d limit 2" which would be more efficient that getting a count and still address the issue (is there more than one dataverse so the user can actually make a choice). I'll also note that this form might be easier to extend to look for dataverses where a user has permission to add a dataset in the future. That said, just getting the count is orders of magnitude faster than the problematic query in the original PR, so I think this is fine unless @jo-pol wants to go further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to look for dataverses where a user has permission to add a dataset in the future

That thought crossed my mind too. We now still might show the field while there is nothing to choose.

}

public boolean isHasDataversesToChoose() {
// TODO we actually need to look for dataverses where a user has permission to add a dataset
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to create a new dataset with a user that had no permission to any of the 100K numbered test dataverses. It took a few seconds when searching with 2 digits.
So a variant of DataverseServiceBean.filterDataversesForHosting (on github) without a pattern for the query and exiting after one (or two?) successfull authourisation checks would take way too long for users with permission for too few dataverses.

A more efficient query might be something like

SELECT dv
FROM roleassignment ra
JOIN dataverse      dv
  on dv.id = ra.definitionpoint_id
where ra.assigneeidentifier in ('@user001', ':authenticated-users');

However, RoleAssignment.findAssigneesWithPermissionOnDvObject (on github) is dazzling me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - that query is more complex because it has to look for filedownloads configured on the dataset owning a file, etc. I haven't fully thought it through, but this custom code from QDR might be a reasonable model: https://github.com/QualitativeDataRepository/dataverse/blob/b9c0bdc374518cc499eba564e8ad04ec059bc1ff/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java#L1014-L1051 - it checks for superuser first, and if, the user isn't one, gets the list of roles that include the relevant permission (AddDataset in this case) and looks for a case where the user has that role on something. I think you could further constrain by looking for the roleassignment's dvobject having dtype='Dataverse'. Setting the limit to 2 would then let you see if that user has a choice somewhere. I think this is reasonably performant, but I haven't tested on a very large database (as with the case in this PR, it was orders of magnitude faster than the code I replaced).

Do you want to try that? And maybe cache the result so it isn't called 7 times and call this PR ready to move forward?

Copy link
Contributor Author

@jo-pol jo-pol Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the dazzling query PermissionServiceBean.LIST_ALL_DATAVERSES_USER_HAS_PERMISSION I might have to do something with groups too. That might be more work than @janvanmansum baragained for.

Anyway, even on a simplified solution I get:

jakarta.ejb.EJBException: An exception occurred while creating a query in EntityManager: 
Exception Description: Syntax error parsing [SELECT dv FROM roleassignment ra JOIN dataverse      dv   ON dv.id = ra.definitionpoint_id WHERE ra.assigneeidentifier = '@user001' OR ra.assigneeidentifier = ':authenticated-users' limit 2]. 
[90, 182] The expression is not a valid conditional expression.

The query works in psql.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplified attempt: DANS-KNAW-jp@4fcf8e1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hacked PermissionServiceBean.findPermittedCollections with "limit 2" for when it should have returned 50K instances. That reacted quickly. So I need

permissionService.findPermittedCollections(req, user, Set.of(Permission.CreateDataset))

with a variant of the query in the simplified attempt.

Copy link
Member

@qqmyers qqmyers Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jo-pol - Not sure where my comment went - so ~retyping.
The main issue with the 'simplified attempt' is that it's a native sql query and not JPA. That could be addressed by using createNativeQuery() or using JPA. I suggest the following code (have not tested). It handles all the groups the user might be in and limits which roles are checked, along with keeping your check that the definitionpoints are Dataverses.

I'm not sure if it is more efficient that the findPermittedCollections - certainly looks simpler. The code below does break finding the roles and groups out into separate queries (less efficient?), but it may be more efficient to be checking the role ids (which hopefully are indexed) rather than doing the bit-level or on the permissions. I'm not sure how well postgres can optimize in either case.

        Set<RoleAssignee> ras = new HashSet<>(groupService.groupsFor(req));
        ras.add(user);

        List<String> raIds = ras.stream().map(roas -> roas.getIdentifier()).collect(Collectors.toList());

        List<Long> roleIds = new ArrayList<>();
        
        for (DataverseRole role : roleService.findAll()) {
            if (role.permissions().contains(Permission.AddDataset)) {
                roleIds.add(role.getId());
            }
        }
        // Just check for more than one matching record
        // More efficient than counting all records
        try {
            return em.createQuery(
                "SELECT ra.id FROM RoleAssignment ra " +
                    "JOIN Dataverse dv ON dv.id = ra.definitionPoint.id " +
                    "WHERE ra.assigneeIdentifier IN :assigneeIdentifiers " +
                    "AND ra.role.id IN :roleIds",
            Long.class)
            .setParameter("assigneeIdentifiers", raIds)
            .setParameter("roleIds", roleIds)
            .setMaxResults(2)
            .getResultList().size() > 1;

        } catch (NoResultException e) {
            return false;
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers

That could be addressed by using createNativeQuery()

That is how the permissionService executes the query anyway and how I reused it with limit 2 added. Reusing existing logic reduces maintenance risks IMHO. Updated the cover message with how I tested.

@ofahimIQSS
Copy link
Contributor

discussed during triage - Jim will follow up

@jo-pol jo-pol requested a review from qqmyers October 9, 2025 12:38
@qqmyers qqmyers moved this from Ready for Triage to In Review 🔎 in IQSS Dataverse Project Oct 14, 2025
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jo-pol - I'm going to approve so it goes to Ready for QA. Can you resolve the merge conflicts and make the requested change to the release notes.

@@ -0,0 +1,9 @@
### Restored suppression of the host dataverse field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the release note is what users see, I'd suggest changing the title to "Suppression of the Host Dataverse field" and removing the second (for developers) and third (no longer true) paragraphs.

@github-project-automation github-project-automation bot moved this from In Review 🔎 to Ready for QA ⏩ in IQSS Dataverse Project Oct 15, 2025
@qqmyers qqmyers removed their assignment Oct 15, 2025
jo-pol added a commit to DANS-KNAW-jp/dataverse that referenced this pull request Oct 20, 2025
}
}
return null;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm puzzled how to assemble the URL for testing linkingDataverses with the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, Intellij does not blame me for the code inside getIpRange and setSearchParamValues but github does blame me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following command looks more like the intellij result

 git blame -w -M -C --minimal -L 954,1040 PermissionServiceBean.java

@coveralls
Copy link

Coverage Status

coverage: 23.504% (-0.002%) from 23.506%
when pulling 7376590 on DANS-KNAW-jp:restore-11301
into a696b36 on IQSS:develop.

@qqmyers
Copy link
Member

qqmyers commented Oct 22, 2025

@jo-pol - are you done with changes? If so, we can remove your assignment and let it go into QA.

@cmbz cmbz added FY26 Sprint 8 FY26 Sprint 8 (2025-10-08 - 2025-10-22) FY26 Sprint 9 FY26 Sprint 9 (2025-10-22 - 2025-11-05) labels Oct 22, 2025
@jo-pol jo-pol removed their assignment Oct 23, 2025
@jo-pol
Copy link
Contributor Author

jo-pol commented Oct 23, 2025

@jo-pol - are you done with changes? If so, we can remove your assignment and let it go into QA.

I am done with the changes and unassigned myself. I'd recommend to test linkingDataverses with the API. Don't have a clue to do it myself.

jo-pol added a commit to DANS-KNAW-jp/dataverse that referenced this pull request Oct 23, 2025
cherry pick stopped at merge IQSS/development
@ofahimIQSS ofahimIQSS self-assigned this Oct 23, 2025
@ofahimIQSS ofahimIQSS moved this from Ready for QA ⏩ to QA ✅ in IQSS Dataverse Project Oct 23, 2025
@ofahimIQSS
Copy link
Contributor

Hey @jo-pol, I wasn't able to reproduce the issue on demo where we have trouble editing the host DV - could you please try and reproduce the issue in demo (running v6.8 at the moment) -

Thanks!

@jo-pol
Copy link
Contributor Author

jo-pol commented Oct 27, 2025

could you please try and reproduce the issue in demo (running v6.8 at the moment) -

The following scenario still blocks the deposit page. The top bar and footer are accessible, the deposit form becomes faint and no new draft appears when selecting "My Data".

  • In the user interface click Add Data → New Dataset
  • The cursor is placed in the “Host Dataverse” field.
  • Quickly type: space, back-space, tab
  • Fill in the required fields and click Save.

@jo-pol jo-pol changed the title restored suppressing dataverse field restored suppressing host dataverse field Oct 27, 2025
@ofahimIQSS
Copy link
Contributor

Thank you!
I was able to reproduce the issue - going to try the fix now, will take it from here.

@cmbz cmbz assigned pdurbin and unassigned ofahimIQSS Nov 5, 2025
@cmbz cmbz added the FY26 Sprint 10 FY26 Sprint 10 (2025-11-05 - 2025-11-19) label Nov 5, 2025
@pdurbin
Copy link
Member

pdurbin commented Nov 6, 2025

This seems to work. First, I ran mvn test -Dtest=InReviewWorkflowIT to create an author user and a curator user. A collection that both can create datasets in was also created.

Then I made it so no one can create anything in the root collection without explicit permission:

Screenshot 2025-11-06 at 3 40 31 PM

The author doesn't see the "host dataverse" field since they can only create datasets in the one collection:

Screenshot 2025-11-06 at 3 39 03 PM

For the curator, I created an extra collection and gave them access to create datasets so they see the "host dataverse" field:

Screenshot 2025-11-06 at 3 40 09 PM Screenshot 2025-11-06 at 3 41 44 PM

Merging.

@pdurbin pdurbin merged commit 1ba6926 into IQSS:develop Nov 6, 2025
14 checks passed
@github-project-automation github-project-automation bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project Nov 6, 2025
@pdurbin pdurbin removed their assignment Nov 6, 2025
@jo-pol jo-pol deleted the restore-11301 branch November 10, 2025 09:36
@scolapasta scolapasta moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

FY26 Sprint 8 FY26 Sprint 8 (2025-10-08 - 2025-10-22) FY26 Sprint 9 FY26 Sprint 9 (2025-10-22 - 2025-11-05) FY26 Sprint 10 FY26 Sprint 10 (2025-11-05 - 2025-11-19) Size: 3 A percentage of a sprint. 2.1 hours.

Projects

Status: Done 🧹

Development

Successfully merging this pull request may close these issues.

6 participants