Make invenio-github support other VCS providers #2

@palkerecsenyi

Description

Parent issue: CERNDocumentServer/cds-rdm#440


This issue is part of inveniosoftware/product-rdm#226 to add a GitLab integration. In order to reduce code duplication, the approach will be to adapt invenio-github and turn it into a "generic" module that supports any Version Control System (VCS) as long as it provides the necessary APIs and functionality. Implementations for specific VCSes like GitHub and GitLab will be provided in new contrib files.

Stage 1

Aiming to complete a fully functional, production-ready, well-documented MVP. We will regard it as complete when:

  • it provides an end-user experience equivalent to the current GitHub integration, preferably as similar as possible.
    • This has some scalability issues but for now we will avoid changing too much
  • a clear migration script and guide are available and have been thoroughly tested with existing Zenodo data
  • unit tests have been updated
  • as many bugs as possible have been fixed

Work is split across several PRs to make reviewing easier. These will be merged into this repo's (invenio-vcs) master branch. We will only create a release once all the functionality is ready. Before the first release, master may contain incomplete, unrunnable code. For a snapshot of the latest runnable state of the VCS integration, please refer to my fork's master.

Todo for stage 1

  • GitLab contrib. This is a priority because many of the other features (e.g. OAuth authentication) are very difficult to test without it.
  • OAuth user ID correlation
    • i.e. if the VCS provider uses the same OAuth server to authenticate the user as the Invenio instance, we should check the user IDs to make sure they match. This is useful for CDS-RDM where users will be able to link CERN GitLab, which uses the same CERN SSO.
    • We could express this through a more versatile hook function that returns whether or not we should accept the authenticated user.
    • Update: This can be done relatively easily by configuring a custom info_serializer handler in invenio.cfg. See the example for CDS: feat(vcs): support for new VCS integration CERNDocumentServer/cds-rdm#554
  • Sync VCS repositories straight into the vcs_repositories table instead of the OAuth remote user extra_data field.
    • This will make querying a lot easier so we can paginate/search on the repository list page, which is currently very slow for users on e.g. GitLab instances where group membership gives them access to thousands of repos.
  • Check duplication for organisational/team repos if multiple people activate them
    • What happens if a user is deleted? How can we transfer the repos?
  • Repo name should not be unique on its own. It is unique only as part of the tuple (provider_id, provider, name)
  • UI bug with menu not being able to differentiate between multiple dynamically-registered entries
    • For example: (screenshot not included)
  • Unit tests
  • Documentation
  • Migration script and guide
  • Careful testing of DB migration for existing GitHub repos/releases
  • Some UI pages have not been adapted and continue to throw errors
  • JSONB extra_data in oauthclient
  • Correct handling of dependency in InvenioRDM
    • We are keeping invenio-vcs as a mandatory dependency for now
  • Check permissions
  • Notifications on failed/successful archive
  • Check sync algorithm for race conditions and performance issues.
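
As a rough illustration of two of the items above (the composite uniqueness of repositories and paginated search on a dedicated table), here is a minimal sketch using the standard-library sqlite3 module. The column names, the helper functions and the use of SQLite are assumptions made for the example; the real vcs_repositories schema and query layer in invenio-vcs may look different.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical, simplified vcs_repositories table."""
    conn.execute(
        """
        CREATE TABLE vcs_repositories (
            id INTEGER PRIMARY KEY,
            provider TEXT NOT NULL,      -- e.g. "github" or "gitlab"
            provider_id TEXT NOT NULL,   -- the repo's ID on that provider
            name TEXT NOT NULL,          -- NOT unique on its own
            UNIQUE (provider_id, provider, name)
        )
        """
    )


def add_repository(conn, provider, provider_id, name):
    """Insert one repository; raises sqlite3.IntegrityError on a duplicate tuple."""
    conn.execute(
        "INSERT INTO vcs_repositories (provider, provider_id, name) VALUES (?, ?, ?)",
        (provider, provider_id, name),
    )


def search_repositories(conn, provider, query="", page=1, page_size=10):
    """Return one page of repository names for a provider, filtered by substring.

    Note: `query` is passed to LIKE as-is, so "%" and "_" in it act as wildcards.
    """
    offset = (page - 1) * page_size
    rows = conn.execute(
        """
        SELECT name FROM vcs_repositories
        WHERE provider = ? AND name LIKE ?
        ORDER BY name
        LIMIT ? OFFSET ?
        """,
        (provider, f"%{query}%", page_size, offset),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout the same repo name can exist under two providers, while the database itself rejects an exact duplicate of the tuple. Pagination happens in the query rather than over a per-user extra_data blob.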

Stage 2

The following features will only be implemented in future PRs once Stage 1 has been fully completed and merged:

Migration considerations

The current plan for the GitHub to VCS migration is as follows:

  • We merge all the code into the invenio-github feature/vcs branch. Once it's ready, we move that branch to a new repository, invenio-vcs. This is also published as the invenio-vcs PyPI package.
  • We drop support for invenio-github and remove the mandatory dependency in invenio-rdm-records. However, we continue to allow it as an optional dependency (on the instance level), so users do not have to migrate immediately before the official RDM v14 release.
    • To allow for this, we keep the old GitHub bindings in invenio-rdm-records. The GITHUB_RELEASE_CLASS config var (now renamed to VCS_RELEASE_CLASS) can be used to set the old bindings.
  • invenio-vcs will be an optional dependency at the instance level. To activate it, it needs to be installed manually, which the documentation will explain how to do. Both invenio-github and invenio-vcs will have checks to ensure they cannot be installed simultaneously.
  • invenio-vcs will check if the github_repositories/github_releases tables (which exist on all InvenioRDM instances) are empty. If not, it will migrate the data into the new tables. Otherwise, it will simply initialise the new tables as empty.
  • We will need to ensure invenio-github is compatible with the latest invenio-rdm-records (e.g. OAuthClient versions need to be the same).
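
The "migrate if the legacy tables are non-empty, otherwise initialise empty" step could look roughly like the sketch below. It uses the standard-library sqlite3 module only to stay self-contained. The legacy column names (github_id, name), the simplified vcs_repositories layout and the function names are assumptions for illustration, and releases are left out for brevity. It also assumes both legacy tables exist, as stated above for all InvenioRDM instances.

```python
import sqlite3

# Legacy table names as given in the migration plan.
LEGACY_TABLES = ("github_repositories", "github_releases")


def table_has_rows(conn, table):
    """Return True if the table contains at least one row.

    `table` only ever comes from the fixed LEGACY_TABLES tuple, never from user input.
    """
    return conn.execute(f"SELECT 1 FROM {table} LIMIT 1").fetchone() is not None


def initialise_vcs_tables(conn):
    """Create the new table, then migrate legacy data if there is any.

    Returns "migrated" or "initialised-empty" so callers can log what happened.
    This sketch is not idempotent: running it twice on non-empty legacy tables
    would copy the rows twice.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS vcs_repositories (
            provider TEXT NOT NULL,
            provider_id TEXT NOT NULL,
            name TEXT NOT NULL
        )
        """
    )
    if not any(table_has_rows(conn, table) for table in LEGACY_TABLES):
        return "initialised-empty"

    # Hypothetical column mapping: every legacy row belongs to the "github" provider.
    conn.execute(
        """
        INSERT INTO vcs_repositories (provider, provider_id, name)
        SELECT 'github', github_id, name FROM github_repositories
        """
    )
    return "migrated"
```

A real migration would presumably run inside a transaction and guard against being applied twice; both are omitted here to keep the decision logic visible.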
