Onboarding
Repositories can onboard the issue-label by configuring the necessary settings in the repository and creating the necessary GitHub workflows using the reference examples below. Once the workflows are merged into the repository, only a few steps are necessary to train and promote the models for predictions to begin. The entire process typically takes less than 2 hours for very large repositories.
Using the issue-labeler workflows requires some settings in GitHub to be configured before onboarding can be completed.
Starting from the repository to be onboarded, navigate to Settings > Actions > General (https://github.com/org/repo/settings/actions).
These are the only settings required for running the issue-labeler workflows.
- Choose: Allow enterprise, and select non-enterprise, actions and reusable workflows
- Enable: Allow actions created by GitHub
- Under Allow specified actions and reusable workflows, enter:
  dotnet/issue-labeler/.github/workflows/*
- Click Save
While unrelated to issue-labeler, it is recommended to select Require approval for all external contributors. If a pull request from an external contributor prompts for approval of its workflow runs, review the PR's code thoroughly before approving the run; there are security implications to consider, and pull requests do not typically require such approvals unless they are expected to introduce new GitHub workflows.
Reference documentation:
- Approving workflow runs from public forks - GitHub Docs
- Security hardening for GitHub Actions - GitHub Docs
- Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests | GitHub Security Lab
While unrelated to issue-labeler, it is recommended to disable Allow GitHub Actions to create and approve pull requests unless the repository has explicitly configured a workflow for such purpose.
With the required GitHub Actions settings configured, the issue-labeler can be onboarded by adding the following workflow files into your repository. This is entirely self-service.
This single workflow is manually triggered from the Actions page, and each of the following steps can be enabled or disabled each time it's run.
- Download issues from GitHub
- Download pull requests from GitHub
- Train an issues model
- Train a pulls model
- Test the issues model
- Test the pulls model
If all of these steps are enabled for the run, the single workflow will do all the work necessary to prepare a repository for predicting labels on issues and pull requests. Repositories with around 100,000 issues/pulls typically complete the training process in about 2 hours.
By default, the workflow will save the new data and models into staging slots within the cache.
name: "Labeler: Train Models"
on:
# Dispatched via the Actions UI, stages new models for promotion consideration
# Each step of the workflow can be run independently: Download, Train, and Test
workflow_dispatch:
inputs:
download_issues:
description: "Issues: Download Data"
type: boolean
default: true
train_issues:
description: "Issues: Train Model"
type: boolean
default: true
test_issues:
description: "Issues: Test Model"
type: boolean
default: true
download_pulls:
description: "Pulls: Download Data"
type: boolean
default: true
train_pulls:
description: "Pulls: Train Model"
type: boolean
default: true
test_pulls:
description: "Pulls: Test Model"
type: boolean
default: true
data_limit:
description: "Max number of items to include in the model"
type: number
cache_key_suffix:
description: "The cache key suffix to use for staging data/models (use 'LIVE' to bypass staging)"
type: string
required: true
default: "staging"
jobs:
labeler-train:
permissions:
issues: read
pull-requests: read
actions: write
uses: dotnet/issue-labeler/.github/workflows/train.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
download_issues: ${{ inputs.download_issues }}
train_issues: ${{ inputs.train_issues }}
test_issues: ${{ inputs.test_issues }}
download_pulls: ${{ inputs.download_pulls }}
train_pulls: ${{ inputs.train_pulls }}
test_pulls: ${{ inputs.test_pulls }}
data_limit: ${{ inputs.data_limit && fromJSON(inputs.data_limit) || 0 }}
cache_key_suffix: ${{ inputs.cache_key_suffix }}
label_prefix: "area-"
threshold: 0.40

This workflow can promote issue and/or pull request models into the LIVE cache slot to be used by predictions. The benefit of training new models into a staging slot is that a new model can be tested without disrupting ongoing labeling in the repository. Once a new model is confirmed to meet expectations, it can be promoted.
name: "Labeler: Promote Models"
on:
# Dispatched via the Actions UI, promotes the staged models from
# a staging slot into the prediction environment
workflow_dispatch:
inputs:
promote_issues:
description: "Issues: Promote Model"
type: boolean
required: true
promote_pulls:
description: "Pulls: Promote Model"
type: boolean
required: true
model_cache_key:
description: "The cache key suffix to promote into the 'LIVE' cache"
type: string
required: true
default: "staging"
backup_cache_key:
description: "The cache key suffix to use for backing up the currently promoted model"
type: string
default: "backup"
permissions:
actions: write
jobs:
labeler-promote-issues:
if: ${{ inputs.promote_issues }}
uses: dotnet/issue-labeler/.github/workflows/promote-issues.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
backup_cache_key: ${{ inputs.backup_cache_key }}
labeler-promote-pulls:
if: ${{ inputs.promote_pulls }}
uses: dotnet/issue-labeler/.github/workflows/promote-pulls.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
backup_cache_key: ${{ inputs.backup_cache_key }}

Predict labels for issues as they are opened in the repository. This workflow can also be triggered manually to label ranges of issue numbers.
name: "Labeler: Predict Issue Labels"
on:
# Only automatically predict area labels when issues are originally opened
issues:
types: opened
# Allow dispatching the workflow via the Actions UI, specifying ranges of numbers
workflow_dispatch:
inputs:
issue_numbers:
description: "Issue Numbers (comma-separated list of ranges)"
type: string
model_cache_key:
description: "The cache key suffix to use for loading the model"
type: string
required: true
default: "LIVE"
jobs:
predict-issues:
# Do not run the workflow on forks outside the 'dotnet' org
if: ${{ github.repository_owner == 'dotnet' && (inputs.issue_numbers || github.event.issue.number) }}
permissions:
issues: write
uses: dotnet/issue-labeler/.github/workflows/predict-issues.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
issue_numbers: ${{ inputs.issue_numbers || github.event.issue.number }}
label_prefix: "area-"
threshold: 0.40
default_label: "needs-area-label"

Predict labels for pull requests as they are opened in the repository. This workflow can also be triggered manually to label ranges of pull request numbers.
name: "Labeler: Predict Pull Labels"
on:
# Per the following documentation:
# https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#pull_request_target
#
# The `pull_request_target` event runs in the context of the base of the pull request, rather
# than in the context of the merge commit, as the `pull_request` event does. This prevents
# execution of unsafe code from the head of the pull request that could alter the repository
# or steal any secrets you use in your workflow. This event allows your workflow to do things
# like label or comment on pull requests from forks.
#
# Only automatically predict area labels when pull requests are first opened
pull_request_target:
types: opened
branches:
- 'main'
# Allow dispatching the workflow via the Actions UI, specifying ranges of numbers
workflow_dispatch:
inputs:
pull_numbers:
description: "Pull Numbers (comma-separated list of ranges)"
type: string
model_cache_key:
description: "The cache key suffix to use for loading the model"
type: string
required: true
default: "LIVE"
jobs:
predict-pulls:
# Do not run the workflow on forks outside the 'dotnet' org
if: ${{ github.repository_owner == 'dotnet' && (inputs.pull_numbers || github.event.number) }}
permissions:
pull-requests: write
uses: dotnet/issue-labeler/.github/workflows/predict-pulls.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
model_cache_key: ${{ inputs.model_cache_key }}
pull_numbers: ${{ inputs.pull_numbers || github.event.number }}
label_prefix: "area-"
threshold: 0.40
default_label: "needs-area-label"

Restores the Predictor app and the prediction models from cache, failing if any of the cache entries is missing. This workflow should be called on a daily cron schedule.
name: "Labeler: Cache Retention"
on:
schedule:
- cron: "6 3 * * *" # 3:06 every day (arbitrary time; use a different value in each repository)
workflow_dispatch:
jobs:
cache-retention:
# Do not run the workflow on forks outside the 'dotnet' org
if: ${{ github.repository_owner == 'dotnet' }}
uses: dotnet/issue-labeler/.github/workflows/cache-retention.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1

This workflow rebuilds the Predictor app into the GitHub Actions cache if the app has been evicted and prediction jobs start failing.
name: "Labeler: Build Predictor App"
on:
# Allow dispatching the workflow via the Actions UI
workflow_dispatch:
inputs:
rebuild:
description: "Force a rebuild of the app"
type: boolean
jobs:
build-predictor:
permissions:
actions: write
uses: dotnet/issue-labeler/.github/workflows/build-predictor.yml@f0c098669828a134c0313adf3f58c1909e555d86 # v1.0.1
with:
rebuild: ${{ inputs.rebuild }}

With the required GitHub Actions settings configured and the above workflows merged, the prediction workflows will begin running each time a new issue or pull request is opened. Those workflows will fail until a model has been trained and promoted.
To train the issue and pull models, navigate to the Actions page for the repository and select Labeler: Train Models in the list of workflows on the left.
A blue banner will be displayed indicating, "This workflow has a workflow_dispatch event trigger." Click Run workflow. Leaving all of the inputs on their defaults will conduct the entire download/train/test process for both issues and pull requests.
Click the Run workflow button to start the training process. Progress can be monitored from the workflow run's details page.
Once the workflow completes, the result is a pair of models saved into the GitHub Actions cache using the 'staging' cache key suffix, along with the downloaded data files under the same suffix. The Predictor app is also built and saved into the cache for use by the prediction workflows.
Within the workflow run's details, the test-issues and test-pulls steps can be reviewed for confirming the model can predict labels with acceptable accuracy. Click on test-issues or test-pulls within the labeler-train job.
In the log for the labeler-train / labeler-test-issues / test-issues step, expand the section for Run Tester to see the logs. When the Tester is running, it emits accumulated accuracy results to the log after each issue or pull request tested. By scrolling to the end of this log section, the final accumulated output can be reviewed.
The results show:
- Matches: A (B %) -- The number/percentage of issues/pulls where the prediction matches the existing label
- Mismatches: C (D %) -- The number/percentage of issues/pulls where the prediction does not match the existing label
- No Prediction: E (F %) -- The number/percentage of issues/pulls where no prediction was made, but the existing issue/pull does have an applicable label
- No Existing: G (H %) -- The number/percentage of issues/pulls where a prediction was made, but the existing issue/pull does not have an applicable label
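The four buckets partition every tested item according to whether a prediction was made and whether the item already carries an applicable label. A minimal Python sketch of that partition (illustrative only; the Tester's actual implementation may differ, and all names here are assumptions):

```python
# Illustrative sketch of the Tester's result buckets; not the actual implementation.
def tally(results):
    """results: list of (predicted_label, existing_label) pairs,
    where either side may be None (no prediction / no applicable label)."""
    buckets = {"Matches": 0, "Mismatches": 0, "No Prediction": 0, "No Existing": 0}
    for predicted, existing in results:
        if predicted is None and existing is not None:
            buckets["No Prediction"] += 1
        elif predicted is not None and existing is None:
            buckets["No Existing"] += 1
        elif predicted == existing:
            buckets["Matches"] += 1
        else:
            buckets["Mismatches"] += 1
    total = len(results)
    return {k: (v, round(100 * v / total, 1)) for k, v in buckets.items()}

# Hypothetical labels, for illustration only
results = [("area-gc", "area-gc"), ("area-jit", "area-gc"),
           (None, "area-infra"), ("area-gc", None)]
print(tally(results))  # each bucket maps to a (count, percentage) pair
```

Because the buckets are exhaustive and mutually exclusive, the four percentages always sum to 100%.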
Teams have typically been pleased when the issue-labeler is able to achieve 65%+ Matches, with the remainder split between Mismatches and No Prediction. If your repository's results are less favorable than 65% Matches, it is recommended you review your existing issues' and pulls' labels to ensure they are labeled accurately. After refining labels, the Labeler: Train Models workflow can be re-run to review the new results.
Once models are trained with favorable results, they can be promoted into the LIVE cache entries to be consumed by the prediction workflows. From the Actions page, select Labeler: Promote Models in the list of workflows on the left.
A blue banner will be displayed indicating, "This workflow has a workflow_dispatch event trigger." Click Run workflow. The checkboxes for Issues: Promote Model and Pulls: Promote Model are unchecked by default. Check both boxes and click Run workflow to promote the models trained and staged above into immediate use by the prediction workflows.
The promotion workflow offers the ability to create a backup of any existing 'LIVE' models. If needed, the promotion workflow can promote from the 'backup' key suffix back into 'LIVE'.
The cache retention workflow that was added is configured to run on a daily schedule, ensuring that the Predictor app and trained models are restored from cache at least once daily to prevent cache evictions after 7 days of no use.
It is recommended to manually run the cache retention workflow after onboarding to test the workflow in your repository.
From the Actions page, select Labeler: Cache Retention from the list of workflows on the left. Choose Run workflow and click the Run workflow button.
The Labeler: Predict Issue Labels and Labeler: Predict Pull Labels workflows can be invoked manually through GitHub's Actions page, and they will also run automatically when new issues and pull requests are opened.
When running manually, a comma-separated list of number ranges can be entered, or the field can be left empty to run prediction over all issues/pulls that do not have an appropriate label. After onboarding, if there are issues or pulls that have not already been labeled, these workflows can be run to fill in those gaps and test the results of the issue-labeler over new issues/pulls.
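The exact input syntax is defined by the reusable workflows; assuming the field accepts values shaped like 1-100,250,300-400, parsing could be sketched as:

```python
# Hedged sketch of the assumed comma-separated range format for the
# issue_numbers / pull_numbers dispatch inputs; the workflows' actual
# parser may differ.
def parse_ranges(spec: str) -> list[int]:
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            numbers.extend(range(int(lo), int(hi) + 1))
        else:
            numbers.append(int(part))
    return numbers

print(parse_ranges("1-3,7,10-12"))  # [1, 2, 3, 7, 10, 11, 12]
```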
When running bulk prediction jobs, be aware that GitHub's API rate limit applies and can cause requests for downloading issues/pulls and updating labels to fail. This may cause the job to fail, or to be delayed while a back-off retry strategy is applied (as of v1.0.2). Expect to process about 2,000 issues or pull requests per hour before the rate limit is reached.
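The back-off behavior is internal to the labeler, but a generic exponential back-off around a rate-limited call looks roughly like the following sketch; `fetch_page` and the `RuntimeError` stand-in are hypothetical, not the labeler's actual code:

```python
import time

# Generic exponential back-off sketch for rate-limited API calls.
# Assumption for illustration only; not the labeler's v1.0.2 implementation.
def with_backoff(fetch_page, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return fetch_page()
        except RuntimeError:  # stand-in for a rate-limit error
            if attempt == max_retries - 1:
                raise  # retries exhausted; surface the failure
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

Each retry doubles the wait, which is why a bulk job that hits the rate limit is delayed rather than immediately failed.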