Onboarding
Repositories can onboard the issue-label by configuring the necessary settings in the repository and creating the necessary GitHub workflows using the reference examples below. Once the workflows are merged into the repository, only a few steps are necessary to train and promote the models for predictions to begin. The entire process typically takes less than 2 hours for very large repositories.
Using the issue-labeler workflows requires some settings in GitHub to be configured before onboarding can be completed.
Starting from the repository to be onboarded, navigate to Settings > Actions > General (https://github.com/org/repo/settings/actions).
These are the only settings required for running the issue-labeler workflows.
- Choose: Allow enterprise, and select non-enterprise, actions and reusable workflows
- Enable: Allow actions created by GitHub
- Under Allow specified actions and reusable workflows, enter:
  dotnet/issue-labeler/.github/workflows/*
- Click Save
While unrelated to issue-labeler, it is recommended to select Require approval for all external contributors. If a pull request from an external contributor prompts for approval of its workflow runs, review the PR's code thoroughly before approving the run; there are security implications to consider, and pull requests do not typically require such approvals unless they are expected to introduce new GitHub workflows.
Reference documentation:
- Approving workflow runs from public forks - GitHub Docs
- Security hardening for GitHub Actions - GitHub Docs
- Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests | GitHub Security Lab
While unrelated to issue-labeler, it is recommended to disable Allow GitHub Actions to create and approve pull requests unless the repository has explicitly configured a workflow for such purpose.
With the required GitHub Actions settings configured, the issue-labeler can be onboarded by adding the following workflow files into your repository. This is entirely self-service.
This single workflow is manually triggered from the Actions page, and each of the following steps can be enabled or disabled each time it's run.
- Download issues from GitHub
- Download pull requests from GitHub
- Train an issues model
- Train a pulls model
- Test the issues model
- Test the pulls model
If all of these steps are enabled for the run, the single workflow will do all the work necessary to prepare a repository for predicting labels on issues and pull requests. Repositories with around 100,000 issues/pulls typically complete the training process in about 2 hours.
By default, the workflow will save the new data and models into staging slots within the cache.
name: "Labeler: Train Models"
on:
# Dispatched via the Actions UI, stages new models for promotion consideration
# Each step of the workflow can be run independently: Download, Train, and Test
workflow_dispatch:
inputs:
download_issues:
description: "Issues: Download Data"
type: boolean
default: true
train_issues:
description: "Issues: Train Model"
type: boolean
default: true
test_issues:
description: "Issues: Test Model"
type: boolean
default: true
download_pulls:
description: "Pulls: Download Data"
type: boolean
default: true
train_pulls:
description: "Pulls: Train Model"
type: boolean
default: true
test_pulls:
description: "Pulls: Test Model"
type: boolean
default: true
data_limit:
description: "Max number of items to include in the model"
type: number
cache_key_suffix:
description: "The cache key suffix to use for staging data/models (use 'LIVE' to bypass staging)"
type: string
required: true
default: "staging"
jobs:
labeler-train:
permissions:
issues: read
pull-requests: read
actions: write
uses: dotnet/issue-labeler/.github/workflows/train.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
download_issues: ${{ inputs.download_issues }}
train_issues: ${{ inputs.train_issues }}
test_issues: ${{ inputs.test_issues }}
download_pulls: ${{ inputs.download_pulls }}
train_pulls: ${{ inputs.train_pulls }}
test_pulls: ${{ inputs.test_pulls }}
data_limit: ${{ inputs.data_limit && fromJSON(inputs.data_limit) || 0 }}
cache_key_suffix: ${{ inputs.cache_key_suffix }}
label_prefix: "area-"
threshold: 0.40

This workflow can promote issue and/or pull request models into the LIVE cache slot to be used by predictions. The benefit of training new models into a staging slot is that a new model can be tested without disrupting ongoing labeling in the repository. Once a new model is confirmed to meet expectations, it can be promoted.
name: "Labeler: Promote Models"
on:
# Dispatched via the Actions UI, promotes the staged models from
# a staging slot into the prediction environment
workflow_dispatch:
inputs:
promote_issues:
description: "Issues: Promote Model"
type: boolean
required: true
promote_pulls:
description: "Pulls: Promote Model"
type: boolean
required: true
model_cache_key:
description: "The cache key suffix to promote into the 'LIVE' cache"
type: string
required: true
default: "staging"
backup_cache_key:
description: "The cache key suffix to use for backing up the currently promoted model"
type: string
default: "backup"
permissions:
actions: write
jobs:
labeler-promote-issues:
if: ${{ inputs.promote_issues }}
uses: dotnet/issue-labeler/.github/workflows/promote-issues.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
backup_cache_key: ${{ inputs.backup_cache_key }}
labeler-promote-pulls:
if: ${{ inputs.promote_pulls }}
uses: dotnet/issue-labeler/.github/workflows/promote-pulls.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
backup_cache_key: ${{ inputs.backup_cache_key }}

Predict labels for issues as they are opened in the repository. This workflow can also be triggered manually to label ranges of issue numbers.
name: "Labeler: Predict Issue Labels"
on:
# Only automatically predict area labels when issues are originally opened
issues:
types: opened
# Allow dispatching the workflow via the Actions UI, specifying ranges of numbers
workflow_dispatch:
inputs:
issue_numbers:
description: "Issue Numbers (comma-separated list of ranges)"
type: string
model_cache_key:
description: "The cache key suffix to use for loading the model"
type: string
required: true
default: "LIVE"
jobs:
predict-issues:
# Do not run the workflow on forks outside the 'dotnet' org
if: ${{ github.repository_owner == 'dotnet' && (inputs.issue_numbers || github.event.issue.number) }}
permissions:
issues: write
uses: dotnet/issue-labeler/.github/workflows/predict-issues.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
issue_numbers: ${{ inputs.issue_numbers || github.event.issue.number }}
label_prefix: "area-"
threshold: 0.40
default_label: "needs-area-label"

Predict labels for pull requests as they are opened in the repository. This workflow can also be triggered manually to label ranges of pull request numbers.
name: "Labeler: Predict Pull Labels"
on:
# Per the following documentation:
# https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#pull_request_target
#
# The `pull_request_target` event runs in the context of the base of the pull request, rather
# than in the context of the merge commit, as the `pull_request` event does. This prevents
# execution of unsafe code from the head of the pull request that could alter the repository
# or steal any secrets you use in your workflow. This event allows your workflow to do things
# like label or comment on pull requests from forks.
#
# Only automatically predict area labels when pull requests are first opened
pull_request_target:
types: opened
branches:
- 'main'
# Allow dispatching the workflow via the Actions UI, specifying ranges of numbers
workflow_dispatch:
inputs:
pull_numbers:
description: "Pull Numbers (comma-separated list of ranges)"
type: string
model_cache_key:
description: "The cache key suffix to use for loading the model"
type: string
required: true
default: "LIVE"
jobs:
predict-pulls:
# Do not run the workflow on forks outside the 'dotnet' org
if: ${{ github.repository_owner == 'dotnet' && (inputs.pull_numbers || github.event.number) }}
permissions:
pull-requests: write
uses: dotnet/issue-labeler/.github/workflows/predict-pulls.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
pull_numbers: ${{ inputs.pull_numbers || github.event.number }}
label_prefix: "area-"
threshold: 0.40
default_label: "needs-area-label"

Restores the Predictor app and the prediction models from cache, failing if any of the cache entries is missing. This workflow should be called on a daily cron schedule.
name: "Labeler: Cache Retention"
on:
schedule:
- cron: "6 3 * * *" # 3:06 every day (arbitrary time; use a different value in each repository)
workflow_dispatch:
jobs:
cache-retention:
# Do not run the workflow on forks outside the 'dotnet' org
if: ${{ github.repository_owner == 'dotnet' }}
uses: dotnet/issue-labeler/.github/workflows/cache-retention.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1

This workflow rebuilds the Predictor app into the GitHub Actions cache if the app has been evicted and prediction jobs start failing.
name: "Labeler: Build Predictor App"
on:
# Allow dispatching the workflow via the Actions UI
workflow_dispatch:
inputs:
rebuild:
description: "Force a rebuild of the app"
type: boolean
jobs:
build-predictor:
permissions:
actions: write
uses: dotnet/issue-labeler/.github/workflows/build-predictor.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
rebuild: ${{ inputs.rebuild }}

With the required GitHub Actions settings configured and the above workflows merged, the prediction workflows will begin running each time a new issue or pull request is opened. Those workflows will fail until a model has been trained and promoted.
To train the issue and pull models, navigate to the Actions page for the repository and select Labeler: Train Models in the list of workflows on the left.
A blue banner will be displayed indicating, "This workflow has a workflow_dispatch event trigger." Click Run workflow. Leaving all of the inputs on their defaults will conduct the entire download/train/test process for both issues and pull requests.
Click the Run workflow button to start the training process. Progress can be monitored from the workflow run's details page.
Once the workflow completes, the result is a pair of models saved into the GitHub Actions cache using the 'staging' cache key suffix, along with the downloaded data files under the same suffix. The Predictor app is also built and saved into the cache for use by the prediction workflows.
Within the workflow run's details, the test-issues and test-pulls steps can be reviewed for confirming the model can predict labels with acceptable accuracy. Click on test-issues or test-pulls within the labeler-train job.
In the log for the labeler-train / labeler-test-issues / test-issues step, expand the section for Run Tester to see the logs. When the Tester is running, it emits accumulated accuracy results to the log after each issue or pull request tested. By scrolling to the end of this log section, the final accumulated output can be reviewed.
The results show:
- Matches: A (B %) -- The number/percentage of issues/pulls where the prediction matches the existing label
- Mismatches: C (D %) -- The number/percentage of issues/pulls where the prediction does not match the existing label
- No Prediction: E (F %) -- The number/percentage of issues/pulls where no prediction was made, but the existing issue/pull does have an applicable label
- No Existing: G (H %) -- The number/percentage of issues/pulls where a prediction was made, but the existing issue/pull does not have an applicable label
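The four buckets partition every tested item according to whether a prediction was made and whether the item already carries an applicable label. A minimal Python sketch of that partition (illustrative only; the Tester's actual implementation may differ, and all names here are assumptions):

```python
# Illustrative sketch of the Tester's result buckets; not the actual implementation.
def tally(results):
    """results: list of (predicted_label, existing_label) pairs,
    where either side may be None (no prediction / no applicable label)."""
    buckets = {"Matches": 0, "Mismatches": 0, "No Prediction": 0, "No Existing": 0}
    for predicted, existing in results:
        if predicted is None and existing is not None:
            buckets["No Prediction"] += 1
        elif predicted is not None and existing is None:
            buckets["No Existing"] += 1
        elif predicted == existing:
            buckets["Matches"] += 1
        else:
            buckets["Mismatches"] += 1
    total = len(results)
    return {k: (v, round(100 * v / total, 1)) for k, v in buckets.items()}

# Hypothetical labels, for illustration only
results = [("area-gc", "area-gc"), ("area-jit", "area-gc"),
           (None, "area-infra"), ("area-gc", None)]
print(tally(results))  # each bucket maps to a (count, percentage) pair
```

Because the buckets are exhaustive and mutually exclusive, the four percentages always sum to 100%.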
Teams have typically been pleased when the issue-labeler is able to achieve 65%+ Matches, with the remainder split between Mismatches and No Prediction. If your repository's results are less favorable than 65% Matches, it is recommended you review your existing issues' and pulls' labels to ensure they are labeled accurately. After refining labels, the Labeler: Train Models workflow can be re-run to review the new results.
Once models are trained with favorable results, they can be promoted into the LIVE cache entries to be consumed by the prediction workflows. From the Actions page, select Labeler: Promote Models in the list of workflows on the left.
A blue banner will be displayed indicating, "This workflow has a workflow_dispatch event trigger." Click Run workflow. The checkboxes for Issues: Promote Model and Pulls: Promote Model are unchecked by default. Check both boxes and click Run workflow to promote the models trained and staged above into immediate use by the prediction workflows.
The promotion workflow offers the ability to create a backup of any existing 'LIVE' models. If needed, the promotion workflow can promote from the 'backup' key suffix back into 'LIVE'.
The cache retention workflow that was added is configured to run on a daily schedule, ensuring that the Predictor app and trained models are restored from cache at least once daily to prevent cache evictions after 7 days of no use.
It is recommended to manually run the cache retention workflow after onboarding to test the workflow in your repository.
From the Actions page, select Labeler: Cache Retention from the list of workflows on the left. Choose Run workflow and click the Run workflow button.
The Labeler: Predict Issue Labels and Labeler: Predict Pull Labels workflows can be invoked manually through GitHub's Actions page, and they will also run automatically when new issues and pull requests are opened.
When running manually, a comma-separated list of number ranges can be entered, or the field can be left empty to run prediction over all issues/pulls that do not have an appropriate label. After onboarding, if there are issues or pulls that have not already been labeled, these workflows can be run to fill in those gaps and test the results of the issue-labeler over new issues/pulls.
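The exact input syntax is defined by the reusable workflows; assuming the field accepts values shaped like 1-100,250,300-400, parsing could be sketched as:

```python
# Hedged sketch of the assumed comma-separated range format for the
# issue_numbers / pull_numbers dispatch inputs; the workflows' actual
# parser may differ.
def parse_ranges(spec: str) -> list[int]:
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            numbers.extend(range(int(lo), int(hi) + 1))
        else:
            numbers.append(int(part))
    return numbers

print(parse_ranges("1-3,7,10-12"))  # [1, 2, 3, 7, 10, 11, 12]
```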
When running bulk prediction jobs, be aware that GitHub's API rate limit applies and can cause requests for downloading issues/pulls and updating labels to fail. This may cause the job to fail, or to be delayed while a back-off retry strategy is applied (as of v1.0.2). Expect to process about 2,000 issues or pull requests per hour before the rate limit is reached.
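The back-off behavior is internal to the labeler, but a generic exponential back-off around a rate-limited call looks roughly like the following sketch; `fetch_page` and the `RuntimeError` stand-in are hypothetical, not the labeler's actual code:

```python
import time

# Generic exponential back-off sketch for rate-limited API calls.
# Assumption for illustration only; not the labeler's v1.0.2 implementation.
def with_backoff(fetch_page, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return fetch_page()
        except RuntimeError:  # stand-in for a rate-limit error
            if attempt == max_retries - 1:
                raise  # retries exhausted; surface the failure
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

Each retry doubles the wait, which is why a bulk job that hits the rate limit is delayed rather than immediately failed.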