
Should labels be reviewed and refined before onboarding?

If feasible, yes; but it's not essential. Because the new labeler can be easily retrained at any time, the suggested approach is to onboard first and retrain once time permits cleaning up existing labels. Training the models typically takes less than 30 minutes.

When should models be retrained?

Models should be retrained based on newly downloaded data when any of these conditions are true:

  1. Existing labels have been deleted, renamed, or otherwise modified such that prediction labels are stale
  2. New labels have been created and applied to issues/pulls, and it's desired for those labels to begin getting predicted
  3. The repository has gained a high volume of issues/pulls since the model was trained, and prediction accuracy is low
  4. The predicted labels are not meeting expectations for any other reason

If a model is not retrained under these circumstances:

  1. A label that has been deleted or renamed can still be predicted, which results in the label being recreated automatically
  2. New labels will not be predicted
  3. Prediction accuracy degrades over time

High-volume repositories with stable labels can go years without needing to be retrained. Because retraining is straightforward and self-service, though, teams are empowered to retrain their models at whatever cadence they find valuable. The results of testing predictions will inform whether a newly trained model should be promoted into use.

Retraining invocation must never be automated

Teams may be tempted to use a cron schedule to automate retraining on a regular basis, but this must not be done. Training must remain a human-triggered event, with review of the test data before promotion into use.
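As a loose illustration of keeping training human-triggered, a training workflow can expose only a manual trigger and deliberately omit any schedule. The workflow name, input, and steps below are hypothetical placeholders, not the actual issue-labeler training workflow:

```yaml
# Hypothetical sketch; the real training workflow and its inputs come from onboarding.
name: Train issue-labeler models

on:
  workflow_dispatch:        # manual, human-initiated runs only
    inputs:
      model:
        description: 'Which model to train (illustrative input)'
        required: false
  # Intentionally no `schedule:` (cron) trigger; a person must start each run
  # so the prediction test results can be reviewed before promotion into use.

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Training steps omitted in this sketch"
```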

Why do references to the issue-labeler's workflows use full-length SHAs?

When onboarding, the workflows that are added to invoke the issue-labeler reference "reusable workflows" in the dotnet/issue-labeler repository using the full-length commit SHA for the associated issue-labeler version (see the sketch after the list below).

  • Reusable workflows can be referenced using either tags or full-length commit SHAs
  • GitHub's Security hardening for GitHub Actions documentation recommends pinning to the commit SHA as the most secure approach, and we adhere to that guidance
  • The short SHA is not supported by GitHub in this context, and the full-length SHA must be used
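For illustration, a caller workflow pinned to a full-length SHA looks roughly like the following; the reusable workflow path and the SHA are placeholders, and onboarding generates the actual references:

```yaml
# Illustration only: the workflow path and commit SHA below are placeholders.
name: Predict issue labels

on:
  issues:
    types: [opened]

jobs:
  predict:
    # Pinned to a full-length (40-character) commit SHA, per GitHub's
    # security-hardening guidance; tags and short SHAs are not used.
    uses: dotnet/issue-labeler/.github/workflows/predict-issues.yml@0123456789abcdef0123456789abcdef01234567
```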

Does the new labeler apply the untriaged label?

No, it does not. The legacy issue labeler had the optional configuration of applying the untriaged label to new issues, but very few repositories opted into that behavior. The recommended approach for automatically applying untriaged is to create a dotnet-policy-service configuration similar to the one at dotnet-api-docs/.github/policies/untriaged-label.yml. Alternatively, a GitHub workflow can be authored to achieve the same functionality.
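As a rough sketch of the workflow-based alternative, the following applies an untriaged label to newly opened issues; the label name and file placement are assumptions, and the dotnet-policy-service configuration remains the recommended approach:

```yaml
# Sketch: apply the untriaged label when an issue is opened.
name: Apply untriaged label

on:
  issues:
    types: [opened]

permissions:
  issues: write

jobs:
  apply-label:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // Add the untriaged label to the issue that triggered this run
            await github.rest.issues.addLabels({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              labels: ["untriaged"]
            });
```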
