Description
Parent issue: CERNDocumentServer/cds-rdm#440
This issue is part of inveniosoftware/product-rdm#226 to add a GitLab integration. In order to reduce code duplication, the approach will be to adapt invenio-github and turn it into a "generic" module that supports any Version Control System (VCS) as long as it provides the necessary APIs and functionality. Implementations for specific VCSes like GitHub and GitLab will be provided in new contrib files.
Stage 1
Aiming to complete a fully functional, production-ready, well-documented MVP. We will regard it as complete when:
- it provides an end-user experience equivalent to the current GitHub integration, preferably as similar as possible.
- This has some scalability issues but for now we will avoid changing too much
- a clear migration script and guide are available and have been thoroughly tested with existing Zenodo data
- unit tests have been updated
- as many bugs fixed as possible
Work is split between several PRs to make reviewing easier. These will be merged into this repo's (invenio-vcs) master branch. We will only create a release once all the functionality is ready. Before the first release, master may contain incomplete code that cannot be run. For a snapshot of the latest runnable state of the VCS integration, please refer to my fork's master.
- init: set up base invenio-vcs repository, without important code #1
- feat(vcs): service layer #3
- feat(vcs): data layer #4
- feat(vcs): views layer #5
- feat(vcs): provider layer #6
- docs: upgrade and usage guide #7
- test: compat for VCS #8
- compat for new VCS integration invenio-rdm-records#2128
- config: compat with new VCS integration invenio-app-rdm#3162
Todo for stage 1
- GitLab contrib. This is a priority because it is needed to test many of the other features. OAuth in particular is very difficult to test without it.
- OAuth user ID correlation
- i.e. if the VCS provider uses the same OAuth server to authenticate the user as the Invenio instance, we should check the user IDs to make sure they match. This is useful for CDS-RDM where users will be able to link CERN GitLab, which uses the same CERN SSO.
- We could express this through a more versatile hook function that returns whether or not we should accept the authenticated user.
- Update: This can be done relatively easily by configuring a custom info_serializer handler in invenio.cfg. See the example for CDS: feat(vcs): support for new VCS integration CERNDocumentServer/cds-rdm#554
- Sync VCS repositories straight into the vcs_repositories table instead of the OAuth remote user extra_data field.
- This will make querying a lot easier so we can paginate/search on the repository list page, which is currently very slow for users on e.g. GitLab instances where they have membership of thousands of repos due to group membership.
- Check duplication for organisational/team repos if multiple people activate them
- What happens if a user is deleted? How can we transfer the repos?
- Repo name should not be unique individually. It is unique as a tuple of (provider_id,provider,name)
- UI bug with menu not being able to differentiate between multiple dynamically-registered entries
- Unit tests
- Documentation
- Migration script and guide
- Careful testing of DB migration for existing GitHub repos/releases
- Some UI pages have not been adapted and continue to throw errors
- JSONB extra_data in oauthclient
- Correct handling of dependency in InvenioRDM
- We are keeping invenio-vcs as a mandatory dependency for now
- Check permissions
- Notifications on failed/successful archive
- See GitLab integration product-rdm#226 (comment)
- If the release fails, we should still save the draft. We will send a notification to the user so they can edit the draft until the errors are resolved.
- Requires implementing a number of classes for https://github.com/inveniosoftware/invenio-notifications
- Check sync algorithm for race conditions and performance issues.
- Copy over changes from api: optimize sync process with batch task execution invenio-github#197
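As a rough illustration of two of the items above (the composite uniqueness rule and paginated, searchable repository listing), here is a minimal sketch using Python's built-in sqlite3 module. The column set is a simplified assumption and the helper names `create_schema` and `list_repositories` are hypothetical; the real vcs_repositories model in invenio-vcs will differ.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical, simplified columns; not the actual invenio-vcs model.
    conn.execute(
        """
        CREATE TABLE vcs_repositories (
            id INTEGER PRIMARY KEY,
            provider TEXT NOT NULL,     -- e.g. "github" or "gitlab"
            provider_id TEXT NOT NULL,  -- repository ID assigned by the provider
            name TEXT NOT NULL,
            UNIQUE (provider_id, provider, name)
        )
        """
    )


def list_repositories(conn, provider, search="", page=1, size=10):
    """Return one page of repository names for a provider, filtered by substring."""
    offset = (page - 1) * size
    rows = conn.execute(
        "SELECT name FROM vcs_repositories "
        "WHERE provider = ? AND name LIKE ? "
        "ORDER BY name LIMIT ? OFFSET ?",
        (provider, f"%{search}%", size, offset),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO vcs_repositories (provider, provider_id, name) VALUES (?, ?, ?)",
    [
        ("github", "1", "org/repo"),
        ("gitlab", "1", "org/repo"),  # same name and ID, different provider: allowed
        ("gitlab", "2", "org/other"),
    ],
)
```

Because the rows live in a table rather than in a per-user JSON blob, filtering and paging happen in the database, which is the point of moving away from extra_data. Inserting a second row with the same (provider_id, provider, name) raises `sqlite3.IntegrityError`.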
Stage 2
The following features will only be implemented in future PRs once Stage 1 has been fully completed and merged:
- Refresh token support (Done)
- In the existing GitHub impl we use access tokens which are non-expiring by default. This is a security issue in case of a database leak and is recommended against by RFC 6749.
- A PR exists (OAuth2 Token refresh implemented invenio-oauthclient#328) but needs some more work (last commit May 2024)
- Support for private repositories
- "Link-only" OAuth without an option to "sign in with" a remote (Done)
- React + API-based UI for pagination/search of repos, using OpenSearch
- See GitLab integration product-rdm#235 (reply in thread) for details
- Selecting community to directly publish the repo to (especially on community-mandatory instances): GitLab integration product-rdm#235 (comment)
- Propagate permissions so users who have access to a repo also have access to records created from releases
- Correct handling of orphaned repos
- Right now when a user disconnects their VCS account, we disable the hook on all repos they have access to. This is the wrong behaviour, since other connected users may still have access to the repo, so we need to implement something more logical.
- Allow customising the relation type on records created from repos on InvenioRDM.
- Currently, we add the repo as a Related Identifier with a relation type of "is supplement to", which seems a little unusual. We should consider whether this is a reasonable default to have, and probably allow for more easily customising it via a config variable.
- Badge redirect doesn't work if user is not signed in
- When using the config override the OAuth remote app config is not updated as this is initialised before the override function is called. The remote app config is copied not referenced, so when we update our config it doesn't get passed down. This makes it hard to e.g. change the base_url without editing invenio.cfg.
- Licenses are not being synced for GitLab, which makes all records be CC licensed by default. This is slightly tricky to fix due to GitLab API limitations.
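The "copied not referenced" problem with the config override can be reproduced in isolation. The sketch below is only an analogy in plain Python: `config`, `GITLAB_APP`, `init_remote_app` and the example URLs are hypothetical stand-ins, not the actual invenio-oauthclient or invenio-vcs objects.

```python
import copy

# Hypothetical stand-in for the instance config (not a real Invenio config key).
config = {"GITLAB_APP": {"base_url": "https://gitlab.com"}}


def init_remote_app(app_config):
    """Simulate an initialisation step that stores a copy of the config."""
    return copy.deepcopy(app_config)


# The remote app is initialised first and captures the values at that moment.
remote_app = init_remote_app(config["GITLAB_APP"])

# An override applied afterwards only changes the original dict.
config["GITLAB_APP"]["base_url"] = "https://gitlab.example.org"

stale = remote_app["base_url"]            # value captured at init time
current = config["GITLAB_APP"]["base_url"]  # value after the override
```

Here `stale` and `current` differ, which mirrors why a late override never reaches the remote app. Any fix would need to either apply the override before initialisation or have the remote app read from the live config.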
Migration considerations
The current plan for the GitHub to VCS migration is as follows:
- We merge all the code into the invenio-github feature/vcs branch. Once it's ready, we move that branch to a new repository, invenio-vcs. This is also published as the invenio-vcs PyPI package.
- We drop support for invenio-github and remove the mandatory dependency in invenio-rdm-records. However, we continue to allow it as an optional dependency (on the instance level), so users do not have to migrate immediately before the official RDM v14 release.
- To allow for this, we keep the old GitHub bindings in invenio-rdm-records. The GITHUB_RELEASE_CLASS config var (now renamed to VCS_RELEASE_CLASS) can be used to set the old bindings.
- invenio-vcs will be an optional dependency at the instance level. To activate it, it needs to be installed manually, which the documentation will explain how to do. Both invenio-github and invenio-vcs will have checks to ensure they cannot be installed simultaneously.
- invenio-vcs will check if the github_repositories/github_releases tables (which exist on all InvenioRDM instances) are empty. If not, it will migrate the data into the new tables. Otherwise, it will simply initialise the new tables as empty.
- We will need to ensure invenio-github is compatible with the latest invenio-rdm-records (e.g. OAuthClient versions need to be the same).
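The "migrate if the legacy tables are non-empty, otherwise initialise empty" decision from the plan above could look roughly like the following. This is a sketch under assumptions: it uses sqlite3 with made-up columns so that it runs standalone, and `has_legacy_data` and `plan_upgrade` are hypothetical helper names, not part of invenio-vcs.

```python
import sqlite3

# Legacy table names taken from the plan; their columns below are invented.
LEGACY_TABLES = ("github_repositories", "github_releases")


def has_legacy_data(conn):
    """Return True if any legacy GitHub table contains at least one row."""
    for table in LEGACY_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        row = conn.execute(f"SELECT EXISTS (SELECT 1 FROM {table})").fetchone()
        if row[0]:
            return True
    return False


def plan_upgrade(conn):
    """Decide which upgrade path applies to this instance."""
    return "migrate" if has_legacy_data(conn) else "init_empty"


conn = sqlite3.connect(":memory:")
for table in LEGACY_TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")

before = plan_upgrade(conn)  # both legacy tables exist but are empty
conn.execute("INSERT INTO github_repositories (name) VALUES ('org/repo')")
after = plan_upgrade(conn)   # one legacy row is enough to require migration
```

Checking with `EXISTS` avoids counting every row, which matters on instances with a large number of existing GitHub repositories and releases.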
