@sgfost sgfost commented May 24, 2025

adds a feature to the CML which allows for:

  • connecting a public GitHub repository with a codebase in the CML
  • pushing a git repository constructed from public CML releases to the connected GitHub remote
  • importing individual releases made on a GitHub repository into the CML as new releases (of a special "imported" type) which can be published/peer reviewed

Setup and management are done by the submitter on a configuration page for each codebase, accessible from the sidebar of a codebase page or from the feature overview page (/github/). The page walks the submitter through connecting a GitHub account with the existing OAuth flow and installing our GitHub app on the same account, then offers two options:

  • connecting with a new repository: allows connecting to a blank github repository, then generates a git repo for a codebase which can be pushed (codebase updates are reflected in the git repo and the push can be repeated). Importing GitHub releases back is also allowed
  • connecting with an existing repository: allows connecting with an existing GitHub repo and importing GitHub releases as codebase releases

configuration

-> https://github.com/comses/infra/wiki/GitHub-Sync

details

new models

  • GithubIntegrationAppInstallation: tracks the github app installation status for a user
  • CodebaseGitRemote: represents a remote repository that should be pushed to and/or checked for new releases
  • GitRefSyncState: represents the state of a git ref (commit/tag/branch) that was generated for a codebase and the state of the push to GitHub
  • ImportedReleaseSyncState: represents the state of a release imported from GitHub. The existence of this record means the release was imported from GitHub and should be treated differently in some cases
  • GitHubIntegrationConfiguration: (admin dashboard-editable) feature settings including overview page content

github APIs

  • GitHubApi: main wrapper for github operations (auth, pushing, creating releases, etc.)
  • GitHubReleaseImporter: handles importing releases from github by downloading the archive, extracting metadata, and creating a new CodebaseRelease

filesystem

  • CodebaseGitRepositoryApi: manages local git repository operations like building and updating a git repo from a codebase's (public) releases
  • FileCategories/CategoryManifestManager: file categorization changed in order to handle imported releases, which are not organized into category directories. Category information is instead kept in a manifest in the db and updated by the submitter
  • ImportedCodebaseReleaseFsApi: sibling filesystem API of the regular CodebaseReleaseFsApi with some special handling and additional functionality
  • adds generated LICENSE and CITATION.cff files to new release packages
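
As an illustration of the manifest idea above, here is a minimal sketch in which categories are stored as a path-to-category mapping rather than being implied by directory layout. The category names, the `group_by_category` helper, and the default category are assumptions made for this example; they are not taken from the actual `FileCategories`/`CategoryManifestManager` code.

```python
from enum import Enum


class FileCategory(Enum):
    # Hypothetical category names, chosen only for illustration
    CODE = "code"
    DATA = "data"
    DOCS = "docs"
    RESULTS = "results"


def group_by_category(manifest, paths, default=FileCategory.CODE):
    """Group file paths by the category recorded in the manifest.

    `manifest` maps a relative path to a category value. Paths missing
    from the manifest fall back to `default`.
    """
    groups = {category: [] for category in FileCategory}
    for path in paths:
        category = FileCategory(manifest.get(path, default.value))
        groups[category].append(path)
    return groups


manifest = {"README.md": "docs", "data/input.csv": "data"}
groups = group_by_category(manifest, ["README.md", "data/input.csv", "model.py"])
```

Because the mapping lives outside the archive, a submitter can recategorize a file without the imported archive itself being rearranged.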

webhooks

library.views.github_sync_webhook: the github app sends webhook events to the server and we respond to installation events to track user installations
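
For readers unfamiliar with GitHub app webhooks, the following is a rough sketch of the two steps such a handler typically performs: checking the `X-Hub-Signature-256` header against the shared webhook secret, then reacting to `installation` events. The function names and the in-memory `installations` store are hypothetical and stand in for whatever `library.views.github_sync_webhook` and `GithubIntegrationAppInstallation` actually do.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header sent with a webhook delivery."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def handle_installation_event(installations: dict, event: str, body: bytes) -> None:
    """Update a (hypothetical) in-memory installation store from a payload."""
    if event != "installation":
        return  # other event types are ignored in this sketch
    payload = json.loads(body)
    login = payload["installation"]["account"]["login"]
    if payload["action"] == "created":
        installations[login] = payload["installation"]["id"]
    elif payload["action"] == "deleted":
        installations.pop(login, None)
```

In a Django view, `event` would come from the `X-GitHub-Event` request header and `body` from the raw request body. The signature must be computed over the raw bytes, not over re-serialized JSON.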

@sgfost sgfost force-pushed the feat/github-sync branch 2 times, most recently from 0e19bb9 to 4374729 Compare June 13, 2025 23:36
@sgfost sgfost force-pushed the feat/github-sync branch from 0f5b4c4 to daa6494 Compare June 30, 2025 20:27
@sgfost sgfost force-pushed the feat/github-sync branch 2 times, most recently from a1bc42c to 63d6eca Compare December 10, 2025 22:52
@sgfost sgfost force-pushed the feat/github-sync branch 2 times, most recently from 83ea70d to 4acff59 Compare December 17, 2025 19:02
@sgfost sgfost force-pushed the feat/github-sync branch 3 times, most recently from 8b2a40d to 924aa14 Compare December 30, 2025 17:31
@sgfost sgfost added the feature label Dec 31, 2025
this API is responsible for managing a local git repository mirror for a
comses codebase. PUBLIC release archives are recorded as commits/tags in
the history. A release branch is created for each release and only
receives further commits when that release's metadata is updated

`build()` and `append_releases()` are the two main API methods which
construct (or rebuild) a git repo and add new releases to the repo,
respectively

`update_release_branch()` will add a new commit containing changes to a
release branch (and update main if they point to the same thing). This
will mainly be used for updating metadata
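
A small sketch of how releases might map onto git refs under the scheme described above: only public releases are included, ordered by version, and each one gets a commit, a tag, and a release branch. The `Release` type, the `plan_refs` helper, and the `v<version>` / `release/<version>` naming are assumptions for illustration only; the real API operates on an actual git repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: str  # dotted numeric version, e.g. "1.2.0"
    is_public: bool


def version_key(version: str) -> tuple:
    # Compare versions numerically so that 1.10.0 sorts after 1.2.0
    return tuple(int(part) for part in version.split("."))


def plan_refs(releases):
    """Return one (commit message, tag, branch) triple per public release, oldest first."""
    public = sorted(
        (r for r in releases if r.is_public),
        key=lambda r: version_key(r.version),
    )
    return [
        (f"Release {r.version}", f"v{r.version}", f"release/{r.version}")
        for r in public
    ]


plan = plan_refs(
    [Release("1.10.0", True), Release("1.2.0", True), Release("2.0.0", False)]
)
```

In this sketch a full rebuild would walk the whole plan, while appending would apply only the entries for releases not yet present in the repository.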
the GithubApi provides access to auth and repository actions

adds 3 huey (async) tasks for creating a mirror, updating a mirror, and
updating metadata for a single release of a mirror
* /github page to describe the integration features
* sidebar element on release detail page will show information about
  integration status for that codebase, and allow users with edit
  permissions to create a new mirror
* use installation access tokens for user repos instead of user access
  tokens. this is a more secure workflow
* add GithubIntegrationAppInstallation model for recording app
  installations (this will need to be created/updated using webhooks)
* CodebaseGitMirror/"mirror" now refers to the local git repository
* ^ can have multiple CodebaseGitRemote's which keep track of all the
  information needed to push to/archive from remote repositories

TODO: re-implement views
this replaces the simple modal form to give better control over the feature
The distinguishing feature is whether the release has a non-null
external_release_package

This will be used to 'archive' or pull in releases made on github for
synced repos

currently, the release assets/package is not stored on the filesystem,
instead relying on an external download url, and being only concerned
with metadata
App installation tokens did not give access to get/create repos on a
user's account. Still trying to avoid storing user access tokens
(https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/generating-a-user-access-token-for-a-github-app)
so as a workaround, we will direct the submitter to create a new bare
repo before continuing

* add handler for webhook events for the github sync app
* add form/wizard for linking a pre-existing github repo (archive only)
allows setting up a push/archive sync that will automatically have the
generated git repo pushed to by providing an empty repo, as well as
setting up an archive only sync by providing any github repo

in both cases, the submitter needs to:
- link their github account with the regular oauth flow
  (so we have a way to match users with a github account)
- install the provided 'GitHub Sync' app on their github account with
  access to any repository that will be synced
and squash migrations

"import(ing)" is the wording I keep finding describes the process the
best

other potential names and their issues:
* publish - same name as the direct publishing, releases need to be
  manually published after they are imported
* pull - git command that is not used
* fetch - git command that is not used
* ingest - ok and similar to import but not quite as clear
* re-order and clarify the steps to set up a sync (app installation
  takes place after creating a repo so that permissions can be
  restricted)
* fix push log to actually show useful information
* when toggling push back on, do a build/update + push on the spot
* better error/success messages
and disallow syncing with private repos for now. adds too much
complexity to manage if we allow this

FAQs are now configurable through the settings model for the feature
along with a beta toggle and whether to enable setting up new syncs

a base set of FAQs is created in the migration
* added back CodebaseReleaseFsApi get_sip_list_url and
  get_originals_list_url helpers. I'm not entirely sure why these got removed
in addition to the submitter of a specific codebase, this is so we can
display this info/wizard thing outside the context of any given codebase

* also give some context to the old external repository url field (just
  a reference)
this solves the issue where imported + published releases were trying to
show the progress without access to the original files by adding a
read-only archive preview for both types of releases
adding tags by string name is unsafe when there are (case-insensitive)
duplicate tags in the db
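
To make the problem concrete, here is one possible way to resolve tag names against rows that may contain case-insensitive duplicates. It is a sketch of the general technique, not necessarily the fix applied in this commit. The `resolve_tags` helper and the "lowest id wins" rule are assumptions.

```python
def resolve_tags(existing, names):
    """Map requested tag names onto existing tag ids, ignoring case.

    `existing` is a list of (id, name) rows. When several rows differ only
    by case, the row with the lowest id is chosen deterministically.
    Returns (resolved_ids, names_with_no_existing_tag).
    """
    by_folded = {}
    for tag_id, name in sorted(existing):
        by_folded.setdefault(name.casefold(), tag_id)  # first (lowest id) wins

    resolved, missing = [], []
    for name in names:
        tag_id = by_folded.get(name.casefold())
        if tag_id is None:
            missing.append(name)
        else:
            resolved.append(tag_id)
    return resolved, missing


rows = [(2, "netlogo"), (1, "NetLogo"), (3, "Python")]
resolved, missing = resolve_tags(rows, ["NETLOGO", "python", "mesa"])
```

Attaching tags by id after a lookup like this avoids depending on which duplicate a name-based lookup happens to return.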
by splitting up into tabs and giving direct access to the feature by
providing a list of the user's submitted models to configure a sync for

* fixed an issue where importing couldn't be set up for unpublished
  codebases
+ add link to github sync main page
coerce_codemeta was incorrectly raising an exception if codemeta=None
(and codebase=None, release=None) was given when it should simply fall
through
- fixed bug preventing creating new drafts when the last release is an
  imported release
- disable requesting review (with note) on imported+published releases
- add instructions in email for making revisions on an imported review
* fixed issue with re-imported releases not updating the release package
  info (downloadurl, tag name, etc.)
* refactor metadata extractor for easier priority changes
in other words, only create draft copies when the release is published.
Unpublished releases (how imported releases come in) should work the
same as drafts

also added some extra safety checks for requesting review on a release
that is already under review or has completed review
on the model detail page as well as in review stuff
rather than opaque automatic syncing when a release is published on
either end of the cml<->github integration, 'imports' and 'pushes' are
manually triggered in a management view

configuration of a github connection is now in a much more streamlined
wizard-style set of steps with better feedback
a simpler 1:1 representation of the git refs (commit/branch/tag) and
associated push state replaces the previous N:1 which would create
multiple records for a single release to represent a connection with a
remote

since remote repo swapping will be rare, this greatly cuts down on
complexity and fixes several bugs at the cost of losing metadata when
swapping remotes

also added support for tracking the main branch with the same
GitRefSyncState model
also, prevent multiple concurrent push jobs and fix up some wording
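
One common way to prevent concurrent push jobs is a non-blocking lock per codebase, sketched below. This is an assumption about the general approach, not the mechanism used in this commit. The `exclusive_push` helper and `PushInProgress` exception are hypothetical, and an in-process `threading.Lock` only works within a single worker process, so a real deployment would need a lock shared across workers (for example in the cache or database).

```python
import threading
from contextlib import contextmanager

_push_locks = {}
_registry_lock = threading.Lock()


class PushInProgress(RuntimeError):
    """Raised when a push for the same codebase is already running."""


@contextmanager
def exclusive_push(codebase_id):
    # Look up (or create) the lock for this codebase under a registry lock
    with _registry_lock:
        lock = _push_locks.setdefault(codebase_id, threading.Lock())
    # Fail fast instead of queueing a second push behind the first
    if not lock.acquire(blocking=False):
        raise PushInProgress(f"push already running for codebase {codebase_id}")
    try:
        yield
    finally:
        lock.release()
```

A push task would wrap its work in `with exclusive_push(codebase_id):` and report `PushInProgress` back to the submitter rather than starting a second push.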
now using a table that visually pairs releases on either side

* also added support for going straight to the github integration setup
  from codebase creation form, for people who would like to import their
  work from there
pending initial seed migration for some of the db-driven content
- incorporate upstream fix in codemeticulous, now install from specific
  commit on SciCodes repo, still pending pypi release
- on a new release, save before generating codemeta since the full
  generation would always fail without a pk
- fix a couple type mismatches in the fs module and settings
also fix some potential type errors in importing
- give hint if no github releases are found for connected repo
- add repo connection success message with info about returning to the
  management page
- fix issue where deactivated remotes couldn't be re-activated for the
  same codebase
- link to/explain how to submit new models straight from github
- spell out more requirements for making new repos and disable
  continuing until "create a new blank repo" is clicked on
based on the last large refactor that involved changing up the data
model