Conversation
|
Getting closer to completion now, but there are a few things I have not yet been able to verify:
|
| self.api_url = settings.DANDI_DOI_API_URL | ||
| self.api_user = settings.DANDI_DOI_API_USER | ||
| self.api_password = settings.DANDI_DOI_API_PASSWORD | ||
| self.api_prefix = settings.DANDI_DOI_API_PREFIX |
There was a problem hiding this comment.
Previously, we would use settings.DANDI_DOI_API_PREFIX or '10.80507'
If api prefix is not set, DOI API operations should be prevented by is_configured() so I think this is appropriate
5316e66 to
73de829
Compare
There was a problem hiding this comment.
todo should still be a celery task, but we should make sure it passes or roll back transaction
There was a problem hiding this comment.
I don't agree that it must pass, or we should roll back the transaction. For starters, we would have to move all of this code into the celery task to ensure that. More practically, why should the outcome of the datacite API call influence this? Handling failure conditions for interacting with the datacite API should be done anyway. It wouldn't help anyone to prevent the deletion of a dandiset due to the datacite API being down (for example). We would just need to ensure that it is eventually deleted.
For example, one way to approach this is to, upon dandiset deletion, place the DOI of the draft version into a table called DeletableDOI (or something). Then, a scheduled celery task could pull from values in this table, call out to datacite to delete the DOI, and then in the same transaction, delete that row (or mark is as "done") so it's not picked up on the next scheduled task.
This would ensure that the DOI eventually gets deleted, even if there is a transient error during interaction with the datacite API. The same procedure could be implemented for DOI creation as well.
There was a problem hiding this comment.
My main concern was not to orphan the doi, but this seems like a reasonable approach
138e463 to
c1a0215
Compare
yarikoptic
left a comment
There was a problem hiding this comment.
just posting an exception we observerd while trying locally. Ideally should be more graceful I think
|
+1 we can handle that more gracefully. This exception occurred because the Dandiset was created by the superuser without a name. So FWIW even if we relax how to_datacite sets that value it will still be an invalid DOI payload. |
a5b9d24 to
3fd6081
Compare
|
@waxlamp @jjnesbitt This item is on our weekly meeting notes "beyond 30 minutes of Roni" so we rarely get to it. Could someone give the initial review? this "feature" was being in the kitchen for way too long with the design doc PR (#2012) and now this grew big and potentially growing conflicts. |
Yes, I am in the process of reviewing it. |
| # Retry with PUT if DOI already exists | ||
| update_url = f'{self.api_url}/{doi}' |
There was a problem hiding this comment.
If the DOI already exists, what does making a PUT request do? Update the metadata?
There was a problem hiding this comment.
I don't agree that it must pass, or we should roll back the transaction. For starters, we would have to move all of this code into the celery task to ensure that. More practically, why should the outcome of the datacite API call influence this? Handling failure conditions for interacting with the datacite API should be done anyway. It wouldn't help anyone to prevent the deletion of a dandiset due to the datacite API being down (for example). We would just need to ensure that it is eventually deleted.
For example, one way to approach this is to, upon dandiset deletion, place the DOI of the draft version into a table called DeletableDOI (or something). Then, a scheduled celery task could pull from values in this table, call out to datacite to delete the DOI, and then in the same transaction, delete that row (or mark is as "done") so it's not picked up on the next scheduled task.
This would ensure that the DOI eventually gets deleted, even if there is a transient error during interaction with the datacite API. The same procedure could be implemented for DOI creation as well.
| draft_version = dandiset.versions.filter(version='draft').first() | ||
| if draft_version and draft_version.doi is not None: |
There was a problem hiding this comment.
I have a couple of suggestions. First, we know a draft version will always exist on a dandiset, as it is created in the same transaction as the dandiset. So we don't need to check that it exists here.
Second, I actually think we should prevent deleting the dandiset until the DOI is generated. Imagine someone creates and then immediately deletes a dandiset, at the same time the celery task to create the DOI is running. This code block would be skipped, and the dandiset would still be deleted, but the DOI could have been created during that time. In that case, we would have "orphaned DOIs" to worry about.
I think it's acceptable to block deletion on draft DOI creation, and raise an appropriate error message from the request.
| # For unpublished dandisets, update or create the draft DOI | ||
| # to keep it in sync with the latest metadata | ||
| if not locked_version.dandiset.embargoed: | ||
| transaction.on_commit( | ||
| lambda: update_draft_version_doi_task.delay(locked_version.id) | ||
| ) | ||
| else: | ||
| logger.debug( | ||
| 'Skipping DOI update for embargoed Dandiset %s.', | ||
| locked_version.dandiset.identifier, | ||
| ) |
There was a problem hiding this comment.
If we're going to be keeping doi metadata up to date anytime a draft version is updated, then I'm even more in favor of what I proposed in my other comment (a table for keeping track of this). Doing it in this fashion is just a disaster waiting to happen in regards to synchronization of metadata.
|
Hi @jjnesbitt , should we wait on you to implement that extra table (then when do you think you would need time) or should we attempt at doing that? |
I can take this on, as I've already begun the implementation. |
03bee6c to
65e8306
Compare
95ef02b to
39dd8a8
Compare
- Dandiset DOI will redirect to the DLP - Example: 10.80507/dandi.000004 - Dandiset DOI is stored in the doi field of the draft version - Dandiset DOI metadata (on Datacite) will match the draft version until first publication - Once a Dandiset is published, the Dandiset DOI metadata will match the latest publication See the design document for more details: dandi#2012
The datacite_client_unsafe bypasses the safety check, but can still be used safely when mocking requests.
39dd8a8 to
988bb7b
Compare
|
@jjnesbitt how do you feel about this PR/work overall -- could/should we get it closer/over the finish line in the near future? |
|
Apologies for the delay, I've gotten back to working on this and have made significant progress. For ease of development I've made a separate branch based off this one, and will open that PR once it's ready (hopefully soon). |
|
@jjnesbitt I'm going to close this since youll be doing a new one. Feel free to reopen if you prefer. |
Example of Dandiset DOI (note -- no version) : 10.80507/dandi.000004
Remaining TODOs:
e2e/tests/dandisetLandingPage.spec.tsto ensure we have DOIsauditrelated records on DOI operationsMay be later as part of this depending on what comes in first