FAQ
Models should be retrained based on newly downloaded data when any of these conditions are true:
- Existing labels have been deleted, renamed, or otherwise modified such that the predicted labels are stale
- New labels have been created and applied to issues/pulls, and those labels should begin being predicted
- The repository has gained a high volume of issues/pulls compared to when it was trained, and prediction accuracy is low
- The predicted labels are not meeting expectations for any other reason
If a model is not retrained under these circumstances:
- A label that has been deleted or renamed can still be predicted, which automatically recreates the label
- New labels will not be predicted
- Prediction accuracy degrades over time
High-volume repositories with stable labels can go years without needing retraining. Because retraining is straightforward and self-service, however, teams are empowered to retrain their models at whatever cadence they find valuable. The results of testing predictions will inform whether a newly trained model should be promoted into use.
Teams may be tempted to use a cron schedule to automate retraining on a regular basis, but this must not be done. Training must remain a human-triggered event, with review of the test data before promotion into usage.
When onboarding, the workflows added to invoke the issue labeler reference "reusable workflows" in the dotnet/issue-labeler repository using the full-length commit SHA of the associated issue-labeler version (see the sketch after this list).
- Reusable workflows can be referenced using either tags or full-length commit SHAs
- GitHub's Security hardening for GitHub Actions documentation recommends pinning to the commit SHA as the most secure approach, and we adhere to that guidance
- Short SHAs are not supported by GitHub in this context; the full-length SHA must be used
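
As a concrete illustration, a caller workflow pinned to a full-length commit SHA looks like the sketch below. This is a hypothetical minimal example: the workflow path `.github/workflows/predict-issues.yml`, the trigger, the SHA, and the version comment are placeholders rather than the actual onboarding output, and a real caller would also pass whatever inputs the reusable workflow defines.

```yaml
# Hypothetical caller workflow. The path, SHA, and version comment below are
# placeholders; onboarding generates the real values.
name: "Labeler: Predict Issue"

on:
  issues:
    types: [opened]

jobs:
  predict:
    # Pin to the full-length (40-character) commit SHA of the issue-labeler
    # version, per GitHub's security hardening guidance. A trailing comment
    # records the human-readable version the SHA corresponds to.
    uses: dotnet/issue-labeler/.github/workflows/predict-issues.yml@0123456789012345678901234567890123456789 # vX.Y.Z
```

Pinning to the SHA makes the reference immutable: unlike a tag, a commit SHA cannot be moved to point at different code.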