Commit 0e8bb94

feat(ci): make recording workflow simpler, more parameterizable (llamastack#3169)
# What does this PR do?

Recording tests has become a nightmare. This is the first part of making that process simpler by making it _less_ automatic. I tried to be too clever earlier.

It simplifies the record-integration-tests workflow to use workflow dispatch inputs instead of PR labels. No more opaque stuff. Just go to the GitHub UI and run the workflow with inputs. I will soon add a helper script for this also.

Other things to aid re-running just the small set of things you need to re-record:

- Replaces the `test-types` JSON array parameter with a more intuitive `test-subdirs` comma-separated list. The whole JSON array crap was for matrix.
- Adds a new `test-pattern` parameter to allow filtering tests using pytest's `-k` option

## Test Plan

Note that this PR is in a fork not the source repository.

- Replay tests on this PR are green
- Manually [ran](https://github.com/ashwinb/llama-stack/actions/runs/16998562926) the replay workflow with a test-subdir and test-pattern filter, worked
- Manually [ran](https://github.com/ashwinb/llama-stack/actions/runs/16998556104/job/48195080344) the **record** workflow with a simple pattern, it has worked and updated _this_ PR.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent a6e2c18 commit 0e8bb94

7 files changed: +119 -221 lines changed
.github/actions/run-and-record-tests/action.yml

Lines changed: 13 additions & 7 deletions
@@ -2,9 +2,13 @@ name: 'Run and Record Tests'
 description: 'Run integration tests and handle recording/artifact upload'
 
 inputs:
-  test-types:
-    description: 'JSON array of test types to run'
+  test-subdirs:
+    description: 'Comma-separated list of test subdirectories to run'
     required: true
+  test-pattern:
+    description: 'Regex pattern to pass to pytest -k'
+    required: false
+    default: ''
   stack-config:
     description: 'Stack configuration to use'
     required: true
@@ -35,9 +39,11 @@ runs:
         ./scripts/integration-tests.sh \
           --stack-config '${{ inputs.stack-config }}' \
           --provider '${{ inputs.provider }}' \
-          --test-types '${{ inputs.test-types }}' \
+          --test-subdirs '${{ inputs.test-subdirs }}' \
+          --test-pattern '${{ inputs.test-pattern }}' \
           --inference-mode '${{ inputs.inference-mode }}' \
-          ${{ inputs.run-vision-tests == 'true' && '--run-vision-tests' || '' }}
+          ${{ inputs.run-vision-tests == 'true' && '--run-vision-tests' || '' }} \
+          | tee pytest-${{ inputs.inference-mode }}.log
 
 
     - name: Commit and push recordings
@@ -57,10 +63,10 @@ runs:
             git commit -m "Recordings update from CI"
           fi
 
-          git fetch origin ${{ github.event.pull_request.head.ref }}
-          git rebase origin/${{ github.event.pull_request.head.ref }}
+          git fetch origin ${{ github.ref_name }}
+          git rebase origin/${{ github.ref_name }}
           echo "Rebased successfully"
-          git push origin HEAD:${{ github.event.pull_request.head.ref }}
+          git push origin HEAD:${{ github.ref_name }}
           echo "Pushed successfully"
         else
           echo "No recording changes"
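
The action above now forwards `--test-subdirs` and `--test-pattern` to `scripts/integration-tests.sh`, whose contents are not part of this excerpt. As a rough illustration only, the sketch below shows one way a comma-separated list and a `-k` expression might be turned into pytest arguments. The function name `build_pytest_args` and the `tests/integration/<subdir>` path layout are assumptions, not taken from the script. Note that pytest's `-k` takes a keyword expression (substring matches that can be combined with `and`, `or`, `not`).

```shell
#!/usr/bin/env bash
# Hypothetical sketch; this is NOT the actual scripts/integration-tests.sh.

# Print one pytest argument per line.
#   $1: comma-separated test subdirectories, e.g. "inference,agents"
#   $2: expression for pytest -k (may be empty)
build_pytest_args() {
  local subdirs="$1" pattern="$2" dir
  local -a dirs=()

  # Split on commas; the IFS change applies only to this read.
  IFS=',' read -r -a dirs <<< "$subdirs"

  for dir in "${dirs[@]}"; do
    # Trim leading and trailing whitespace around each entry.
    dir="${dir#"${dir%%[![:space:]]*}"}"
    dir="${dir%"${dir##*[![:space:]]}"}"
    if [ -n "$dir" ]; then
      # Assumed layout: each subdirectory lives under tests/integration/.
      printf '%s\n' "tests/integration/$dir"
    fi
  done

  # Only add -k when a pattern was supplied.
  if [ -n "$pattern" ]; then
    printf '%s\n' "-k" "$pattern"
  fi
}

# Example call: prints the two paths, then -k and the expression.
build_pytest_args "inference,agents" "test_chat"
```

Emitting one argument per line keeps entries with spaces intact if the caller reads them back into an array (for example with `mapfile -t`) before invoking pytest.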

.github/workflows/integration-tests.yml

Lines changed: 10 additions & 21 deletions
@@ -31,35 +31,23 @@ on:
         description: 'Test against a specific provider'
         type: string
         default: 'ollama'
+      test-subdirs:
+        description: 'Comma-separated list of test subdirectories to run'
+        type: string
+        default: ''
+      test-pattern:
+        description: 'Regex pattern to pass to pytest -k'
+        type: string
+        default: ''
 
 concurrency:
   # Skip concurrency for pushes to main - each commit should be tested independently
   group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
   cancel-in-progress: true
 
 jobs:
-  discover-tests:
-    runs-on: ubuntu-latest
-    outputs:
-      test-types: ${{ steps.generate-test-types.outputs.test-types }}
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
-
-      - name: Generate test types
-        id: generate-test-types
-        run: |
-          # Get test directories dynamically, excluding non-test directories
-          # NOTE: we are excluding post_training since the tests take too long
-          TEST_TYPES=$(find tests/integration -maxdepth 1 -mindepth 1 -type d |
-            sed 's|tests/integration/||' |
-            grep -Ev "^(__pycache__|fixtures|test_cases|recordings|non_ci|post_training)$" |
-            sort | jq -R -s -c 'split("\n")[:-1]')
-          echo "test-types=$TEST_TYPES" >> $GITHUB_OUTPUT
 
   run-replay-mode-tests:
-    needs: discover-tests
     runs-on: ubuntu-latest
     name: ${{ format('Integration Tests ({0}, {1}, {2}, client={3}, vision={4})', matrix.client-type, matrix.provider, matrix.python-version, matrix.client-version, matrix.run-vision-tests) }}
 
@@ -90,7 +78,8 @@ jobs:
       - name: Run tests
         uses: ./.github/actions/run-and-record-tests
         with:
-          test-types: ${{ needs.discover-tests.outputs.test-types }}
+          test-subdirs: ${{ inputs.test-subdirs }}
+          test-pattern: ${{ inputs.test-pattern }}
           stack-config: ${{ matrix.client-type == 'library' && 'ci-tests' || 'server:ci-tests' }}
           provider: ${{ matrix.provider }}
           inference-mode: 'replay'
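
The removed `discover-tests` job built a JSON array of test directories for the matrix. With `test-subdirs` now a plain comma-separated string that defaults to `''`, something has to decide what an empty value means. Where that happens is not visible in this excerpt. The sketch below is one possible fallback, reusing the exclusion list from the removed job but joining the result with commas instead of `jq`. The function name `discover_test_subdirs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a default for an empty test-subdirs input.
# The exclusion list is copied from the removed discover-tests job.

# Print the immediate subdirectories of $1 (default: tests/integration)
# as a single comma-separated line, skipping non-test directories.
discover_test_subdirs() {
  local root="${1:-tests/integration}"
  find "$root" -maxdepth 1 -mindepth 1 -type d -exec basename {} \; \
    | grep -Ev '^(__pycache__|fixtures|test_cases|recordings|non_ci|post_training)$' \
    | LC_ALL=C sort \
    | paste -sd, -
}

# Possible usage (TEST_SUBDIRS is an assumed variable name):
#   TEST_SUBDIRS="${TEST_SUBDIRS:-$(discover_test_subdirs)}"
```

Using `paste -sd, -` avoids the `jq` dependency that the JSON array form needed.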
.github/workflows/record-integration-tests.yml

Lines changed: 21 additions & 70 deletions
@@ -1,93 +1,43 @@
+# This workflow should be run manually when needing to re-record tests. This happens when you have
+# - added a new test
+# - or changed an existing test such that a new inference call is made
+# You should make a PR and then run this workflow on that PR branch. The workflow will re-record the
+# tests and commit the recordings to the PR branch.
 name: Integration Tests (Record)
 
 run-name: Run the integration test suite from tests/integration
 
 on:
-  pull_request_target:
-    branches: [ main ]
-    types: [opened, synchronize, labeled]
-    paths:
-      - 'llama_stack/**'
-      - 'tests/**'
-      - 'uv.lock'
-      - 'pyproject.toml'
-      - '.github/workflows/record-integration-tests.yml' # This workflow
-      - '.github/actions/setup-ollama/action.yml'
-      - '.github/actions/setup-test-environment/action.yml'
-      - '.github/actions/run-and-record-tests/action.yml'
   workflow_dispatch:
     inputs:
+      test-subdirs:
+        description: 'Comma-separated list of test subdirectories to run'
+        type: string
+        default: ''
       test-provider:
         description: 'Test against a specific provider'
         type: string
         default: 'ollama'
-
-concurrency:
-  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
-  cancel-in-progress: true
+      run-vision-tests:
+        description: 'Whether to run vision tests'
+        type: boolean
+        default: false
+      test-pattern:
+        description: 'Regex pattern to pass to pytest -k'
+        type: string
+        default: ''
 
 jobs:
-  discover-tests:
-    if: contains(github.event.pull_request.labels.*.name, 're-record-tests') ||
-      contains(github.event.pull_request.labels.*.name, 're-record-vision-tests')
-    runs-on: ubuntu-latest
-    outputs:
-      test-types: ${{ steps.generate-test-types.outputs.test-types }}
-      matrix-modes: ${{ steps.generate-test-types.outputs.matrix-modes }}
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
-
-      - name: Generate test types
-        id: generate-test-types
-        run: |
-          # Get test directories dynamically, excluding non-test directories
-          TEST_TYPES=$(find tests/integration -maxdepth 1 -mindepth 1 -type d -printf "%f\n" |
-            grep -Ev "^(__pycache__|fixtures|test_cases|recordings|post_training)$" |
-            sort | jq -R -s -c 'split("\n")[:-1]')
-          echo "test-types=$TEST_TYPES" >> $GITHUB_OUTPUT
-
-          labels=$(gh pr view ${{ github.event.pull_request.number }} --json labels --jq '.labels[].name')
-          echo "labels=$labels"
-
-          modes_array=()
-          if [[ $labels == *"re-record-vision-tests"* ]]; then
-            modes_array+=("vision")
-          fi
-          if [[ $labels == *"re-record-tests"* ]]; then
-            modes_array+=("non-vision")
-          fi
-
-          # Convert to JSON array
-          if [ ${#modes_array[@]} -eq 0 ]; then
-            matrix_modes="[]"
-          else
-            matrix_modes=$(printf '%s\n' "${modes_array[@]}" | jq -R -s -c 'split("\n")[:-1]')
-          fi
-          echo "matrix_modes=$matrix_modes"
-          echo "matrix-modes=$matrix_modes" >> $GITHUB_OUTPUT
-
-        env:
-          GH_TOKEN: ${{ github.token }}
-
   record-tests:
-    needs: discover-tests
     runs-on: ubuntu-latest
 
     permissions:
       contents: write
 
-    strategy:
-      fail-fast: false
-      matrix:
-        mode: ${{ fromJSON(needs.discover-tests.outputs.matrix-modes) }}
-
     steps:
       - name: Checkout repository
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
         with:
-          ref: ${{ github.event.pull_request.head.ref }}
           fetch-depth: 0
 
       - name: Setup test environment
@@ -96,14 +46,15 @@ jobs:
           python-version: "3.12" # Use single Python version for recording
           client-version: "latest"
           provider: ${{ inputs.test-provider || 'ollama' }}
-          run-vision-tests: ${{ matrix.mode == 'vision' && 'true' || 'false' }}
+          run-vision-tests: ${{ inputs.run-vision-tests }}
           inference-mode: 'record'
 
       - name: Run and record tests
         uses: ./.github/actions/run-and-record-tests
         with:
-          test-types: ${{ needs.discover-tests.outputs.test-types }}
+          test-pattern: ${{ inputs.test-pattern }}
+          test-subdirs: ${{ inputs.test-subdirs }}
           stack-config: 'server:ci-tests' # recording must be done with server since more tests are run
           provider: ${{ inputs.test-provider || 'ollama' }}
           inference-mode: 'record'
-          run-vision-tests: ${{ matrix.mode == 'vision' && 'true' || 'false' }}
+          run-vision-tests: ${{ inputs.run-vision-tests }}
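
Since the record workflow is now triggered only through `workflow_dispatch`, it can also be started from the command line instead of the GitHub UI. The sketch below is not the helper script the commit message promises. It is a minimal illustration that assembles a `gh workflow run` invocation for `record-integration-tests.yml` using the input names defined in this diff. The function name `build_record_cmd` and the `RECORD_CMD` array are hypothetical, and the command is only built, not executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the helper script mentioned in the commit message.

# Populate the global array RECORD_CMD with a `gh workflow run` invocation.
#   $1: branch to run the workflow on (the PR branch)
#   $2: comma-separated test subdirectories (optional)
#   $3: expression for pytest -k (optional)
#   $4: "true" or "false" for vision tests (optional, default "false")
build_record_cmd() {
  local branch="$1" subdirs="${2:-}" pattern="${3:-}" vision="${4:-false}"
  RECORD_CMD=(
    gh workflow run record-integration-tests.yml
    --ref "$branch"
    -f "test-subdirs=$subdirs"
    -f "test-pattern=$pattern"
    -f "run-vision-tests=$vision"
  )
}

# Possible usage (requires an authenticated gh CLI):
#   build_record_cmd "my-pr-branch" "inference" "test_chat"
#   "${RECORD_CMD[@]}"
```

Keeping the command in an array preserves arguments that contain spaces, such as a `-k` expression like `test_chat and not streaming`.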
