Closed
6 changes: 4 additions & 2 deletions .github/actions/run-and-record-tests/action.yml
@@ -68,7 +68,8 @@ runs:
echo "New recordings detected, committing and pushing"
git add tests/integration/recordings/

git commit -m "Recordings update from CI (suite: ${{ inputs.suite }})"
git commit -m "Recordings update from CI (setup: ${{ inputs.setup }}, suite: ${{ inputs.suite }})"

git fetch origin ${{ github.ref_name }}
git rebase origin/${{ github.ref_name }}
echo "Rebased successfully"
@@ -82,7 +83,8 @@ runs:
if: ${{ always() }}
shell: bash
run: |
sudo docker logs ollama > ollama-${{ inputs.inference-mode }}.log || true
sudo docker logs ollama > ollama-${{ inputs.inference-mode }}.log 2>&1 || true
sudo docker logs vllm > vllm-${{ inputs.inference-mode }}.log 2>&1 || true

- name: Upload logs
if: ${{ always() }}
23 changes: 18 additions & 5 deletions .github/workflows/integration-tests.yml
@@ -21,7 +21,6 @@ on:
schedule:
# If changing the cron schedule, update the provider in the test-matrix job
- cron: '0 0 * * *' # (test latest client) Daily at 12 AM UTC
- cron: '1 0 * * 0' # (test vllm) Weekly on Sunday at 1 AM UTC
workflow_dispatch:
inputs:
test-all-client-versions:
@@ -48,24 +47,38 @@ jobs:
fail-fast: false
matrix:
client-type: [library, server]
# Use vllm on weekly schedule, otherwise use test-setup input (defaults to ollama)
setup: ${{ (github.event.schedule == '1 0 * * 0') && fromJSON('["vllm"]') || fromJSON(format('["{0}"]', github.event.inputs.test-setup || 'ollama')) }}
# Use Python 3.13 only on nightly schedule (daily latest client test), otherwise use 3.12
python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
client-version: ${{ (github.event.schedule == '0 0 * * *' || github.event.inputs.test-all-client-versions == 'true') && fromJSON('["published", "latest"]') || fromJSON('["latest"]') }}
setup: [ollama, vllm]
suite: [base, vision]
exclude:
- setup: vllm
suite: vision


steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

# This could in theory be done in the matrix, but it was getting too complex
- name: Update Matrix
id: update-matrix
run: |
REWRITTEN_SUITE="${{ matrix.suite }}"
if [[ "${{ matrix.setup }}" == "vllm" && "${{ matrix.suite }}" == "base" ]]; then
REWRITTEN_SUITE="base-vllm-subset"
fi
echo "suite=${REWRITTEN_SUITE}" >> $GITHUB_OUTPUT
echo "Rewritten suite: ${REWRITTEN_SUITE}"

- name: Setup test environment
uses: ./.github/actions/setup-test-environment
with:
python-version: ${{ matrix.python-version }}
client-version: ${{ matrix.client-version }}
setup: ${{ matrix.setup }}
suite: ${{ matrix.suite }}
suite: ${{ steps.update-matrix.outputs.suite }}
inference-mode: 'replay'

- name: Run tests
@@ -74,4 +87,4 @@ jobs:
stack-config: ${{ matrix.client-type == 'library' && 'ci-tests' || 'server:ci-tests' }}
setup: ${{ matrix.setup }}
inference-mode: 'replay'
suite: ${{ matrix.suite }}
suite: ${{ steps.update-matrix.outputs.suite }}
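As a minimal sketch, the suite rewrite performed by the `Update Matrix` step can be tried locally as a plain shell function. `rewrite_suite` is a hypothetical name introduced here for illustration; the loop mirrors the `setup` and `suite` matrix values and the `exclude` entry from the diff, and nothing here writes to `GITHUB_OUTPUT`.

```shell
#!/usr/bin/env bash
# Sketch only: rewrite_suite is a hypothetical helper, not part of the workflow.
# It maps (setup, suite) to the suite name the later steps would receive.
rewrite_suite() {
  local setup="$1" suite="$2"
  # Same condition as the Update Matrix step: vllm + base uses the subset suite.
  if [[ "$setup" == "vllm" && "$suite" == "base" ]]; then
    suite="base-vllm-subset"
  fi
  printf '%s\n' "$suite"
}

# Walk the matrix from the diff: setup x suite, minus the excluded vllm/vision pair.
for setup in ollama vllm; do
  for suite in base vision; do
    if [[ "$setup" == "vllm" && "$suite" == "vision" ]]; then
      continue
    fi
    printf '%s/%s -> %s\n' "$setup" "$suite" "$(rewrite_suite "$setup" "$suite")"
  done
done
```

Only the `vllm`/`base` combination is renamed; every other combination passes through unchanged.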
2 changes: 1 addition & 1 deletion scripts/integration-tests.sh
@@ -214,7 +214,7 @@ EXCLUDE_TESTS="builtin_tool or safety_with_image or code_interpreter or test_rag

# Additional exclusions for vllm setup
if [[ "$TEST_SETUP" == "vllm" ]]; then
EXCLUDE_TESTS="${EXCLUDE_TESTS} or test_inference_store_tool_calls"
EXCLUDE_TESTS="${EXCLUDE_TESTS} or test_inference_store_tool_calls or test_text_chat_completion_structured_output"
Collaborator

What about adding these to the skips in the test files directly?

Contributor Author

The problem here is the model: our skips in the test files are all based on provider.
I put the skips here so that they apply only in CI; anybody running the integration tests with a more capable model will still be able to use them.

If we can get to the point where this job is running, I'll happily test other models to see if I can get rid of this line altogether.
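
A minimal sketch of the CI-only exclusion described above, assuming the resulting pattern is later handed to pytest as a `-k` expression (that call is not visible in this diff). `build_pytest_pattern` is a hypothetical helper name, and the base exclusion list is shortened for illustration because the hunk header truncates the real one.

```shell
#!/usr/bin/env bash
# Sketch only: build_pytest_pattern is a hypothetical helper, not part of
# scripts/integration-tests.sh. The base list below is shortened for illustration.
build_pytest_pattern() {
  local test_setup="$1"
  local exclude="builtin_tool or safety_with_image or code_interpreter"
  # Extra exclusions apply only to the vllm setup, as in the diff.
  if [[ "$test_setup" == "vllm" ]]; then
    exclude="${exclude} or test_inference_store_tool_calls or test_text_chat_completion_structured_output"
  fi
  # Same shape as PYTEST_PATTERN in the script: a negated "or" chain.
  printf 'not( %s )\n' "$exclude"
}

build_pytest_pattern ollama
build_pytest_pattern vllm
```

Because the extra names are appended in the script rather than marked as skips in the test files, they are filtered out only for runs that go through this script with the `vllm` setup.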

Collaborator

Yes please, CI with a model that passes more tests.

Having a gap between what CI tests and what developers see in the test suite is going to lead to bugs and confusion.

Contributor Author

I've opened an alternative PR that uses qwen3 instead: #3545
I can close whichever one we don't want to go with.

fi

PYTEST_PATTERN="not( $EXCLUDE_TESTS )"