-
Notifications
You must be signed in to change notification settings - Fork 531
restored suppressing host dataverse field #11865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
qqmyers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR updates #11301 and removes the significant performance issue there. I think that, to move forward, this just needs to have a release note, and to get rid of the unused imports. I've made a couple notes about other possible minor performance improvements - I think unless @jo-pol wants to make those or someone on the team thinks they're needed, they probably aren't needed (I'll poll people at standup/triage tomorrow).
|
|
||
| import org.apache.commons.lang3.StringUtils; | ||
| import org.joda.time.DateTime; | ||
| import org.joda.time.Duration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't used any more.
| } | ||
|
|
||
| public boolean isHasDataversesToChoose() { | ||
| this.hasDataversesToChoose = dataverseService.getDataverseCount() > 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW (and so we don't lose it): the original PR has a suggestion that one could query ~ "SELECT 1 FROM Dataverse d limit 2" which would be more efficient that getting a count and still address the issue (is there more than one dataverse so the user can actually make a choice). I'll also note that this form might be easier to extend to look for dataverses where a user has permission to add a dataset in the future. That said, just getting the count is orders of magnitude faster than the problematic query in the original PR, so I think this is fine unless @jo-pol wants to go further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to look for dataverses where a user has permission to add a dataset in the future
That thought crossed my mind too. We now still might show the field while there is nothing to choose.
| } | ||
|
|
||
| public boolean isHasDataversesToChoose() { | ||
| // TODO we actually need to look for dataverses where a user has permission to add a dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to create a new dataset with a user that had no permission to any of the 100K numbered test dataverses. It took a few seconds when searching with 2 digits.
So a variant of DataverseServiceBean.filterDataversesForHosting (on github) without a pattern for the query and exiting after one (or two?) successfull authourisation checks would take way too long for users with permission for too few dataverses.
A more efficient query might be something like
SELECT dv
FROM roleassignment ra
JOIN dataverse dv
on dv.id = ra.definitionpoint_id
where ra.assigneeidentifier in ('@user001', ':authenticated-users');
However, RoleAssignment.findAssigneesWithPermissionOnDvObject (on github) is dazzling me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - that query is more complex because it has to look for filedownloads configured on the dataset owning a file, etc. I haven't fully thought it through, but this custom code from QDR might be a reasonable model: https://github.com/QualitativeDataRepository/dataverse/blob/b9c0bdc374518cc499eba564e8ad04ec059bc1ff/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java#L1014-L1051 - it checks for superuser first, and if, the user isn't one, gets the list of roles that include the relevant permission (AddDataset in this case) and looks for a case where the user has that role on something. I think you could further constrain by looking for the roleassignment's dvobject having dtype='Dataverse'. Setting the limit to 2 would then let you see if that user has a choice somewhere. I think this is reasonably performant, but I haven't tested on a very large database (as with the case in this PR, it was orders of magnitude faster than the code I replaced).
Do you want to try that? And maybe cache the result so it isn't called 7 times and call this PR ready to move forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the dazzling query PermissionServiceBean.LIST_ALL_DATAVERSES_USER_HAS_PERMISSION I might have to do something with groups too. That might be more work than @janvanmansum baragained for.
Anyway, even on a simplified solution I get:
jakarta.ejb.EJBException: An exception occurred while creating a query in EntityManager:
Exception Description: Syntax error parsing [SELECT dv FROM roleassignment ra JOIN dataverse dv ON dv.id = ra.definitionpoint_id WHERE ra.assigneeidentifier = '@user001' OR ra.assigneeidentifier = ':authenticated-users' limit 2].
[90, 182] The expression is not a valid conditional expression.
The query works in psql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplified attempt: DANS-KNAW-jp@4fcf8e1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hacked PermissionServiceBean.findPermittedCollections with "limit 2" for when it should have returned 50K instances. That reacted quickly. So I need
permissionService.findPermittedCollections(req, user, Set.of(Permission.CreateDataset))
with a variant of the query in the simplified attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jo-pol - Not sure where my comment went - so ~retyping.
The main issue with the 'simplified attempt' is that it's a native sql query and not JPA. That could be addressed by using createNativeQuery() or using JPA. I suggest the following code (have not tested). It handles all the groups the user might be in and limits which roles are checked, along with keeping your check that the definitionpoints are Dataverses.
I'm not sure if it is more efficient that the findPermittedCollections - certainly looks simpler. The code below does break finding the roles and groups out into separate queries (less efficient?), but it may be more efficient to be checking the role ids (which hopefully are indexed) rather than doing the bit-level or on the permissions. I'm not sure how well postgres can optimize in either case.
Set<RoleAssignee> ras = new HashSet<>(groupService.groupsFor(req));
ras.add(user);
List<String> raIds = ras.stream().map(roas -> roas.getIdentifier()).collect(Collectors.toList());
List<Long> roleIds = new ArrayList<>();
for (DataverseRole role : roleService.findAll()) {
if (role.permissions().contains(Permission.AddDataset)) {
roleIds.add(role.getId());
}
}
// Just check for more than one matching record
// More efficient than counting all records
try {
return em.createQuery(
"SELECT ra.id FROM RoleAssignment ra " +
"JOIN Dataverse dv ON dv.id = ra.definitionPoint.id " +
"WHERE ra.assigneeIdentifier IN :assigneeIdentifiers " +
"AND ra.role.id IN :roleIds",
Long.class)
.setParameter("assigneeIdentifiers", raIds)
.setParameter("roleIds", roleIds)
.setMaxResults(2)
.getResultList().size() > 1;
} catch (NoResultException e) {
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be addressed by using createNativeQuery()
That is how the permissionService executes the query anyway and how I reused it with limit 2 added. Reusing existing logic reduces maintenance risks IMHO. Updated the cover message with how I tested.
|
discussed during triage - Jim will follow up |
src/main/java/edu/harvard/iq/dataverse/DataverseServiceBean.java
Outdated
Show resolved
Hide resolved
qqmyers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jo-pol - I'm going to approve so it goes to Ready for QA. Can you resolve the merge conflicts and make the requested change to the release notes.
| @@ -0,0 +1,9 @@ | |||
| ### Restored suppression of the host dataverse field | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the release note is what users see, I'd suggest changing the title to "Suppression of the Host Dataverse field" and removing the second (for developers) and third (no longer true) paragraphs.
# Conflicts: # src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
| } | ||
| } | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm puzzled how to assemble the URL for testing linkingDataverses with the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, Intellij does not blame me for the code inside getIpRange and setSearchParamValues but github does blame me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following command looks more like the intellij result
git blame -w -M -C --minimal -L 954,1040 PermissionServiceBean.java
|
@jo-pol - are you done with changes? If so, we can remove your assignment and let it go into QA. |
I am done with the changes and unassigned myself. I'd recommend to test linkingDataverses with the API. Don't have a clue to do it myself. |
cherry pick stopped at merge IQSS/development
|
Hey @jo-pol, I wasn't able to reproduce the issue on demo where we have trouble editing the host DV - could you please try and reproduce the issue in demo (running v6.8 at the moment) - Thanks! |
The following scenario still blocks the deposit page. The top bar and footer are accessible, the deposit form becomes faint and no new draft appears when selecting "My Data".
|
|
Thank you! |




See #11301
and #11700
How I tested the solution:
On our own VM
Tried cases
Saw no performance difference in these cases. The query was executed quickly and only once in both cases.
After solving the merge conflict
Did not retest with the debugger, but performance still was OK.
Also tried
/api/mydata/retrieve/collectionListwhich returned the expected number.