Skip to content

Conversation

@olpipi
Copy link
Collaborator

@olpipi olpipi commented Dec 17, 2025

No description provided.

Copilot AI review requested due to automatic review settings December 17, 2025 09:51
@github-actions github-actions bot added category: WWB PR changes WWB category: continuous batching Continuous batching category: GGUF GGUF file reader labels Dec 17, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive logging infrastructure to test files for CI debugging purposes. The changes introduce timestamped logging utilities (_ts(), _log(), _stage()) and detailed timing measurements to help diagnose test execution issues in the CI environment.

Key changes:

  • Added standardized logging helper functions across all test files with timestamps in UTC
  • Instrumented test execution with timing measurements for model downloads, conversions, and generation steps
  • Added checks for offline mode environment variables to skip tests when network access is unavailable

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tools/who_what_benchmark/tests/test_cli_vlm.py Added logging utilities, timing measurements for optimum-cli exports, and executable checks
tools/who_what_benchmark/tests/test_cli_text.py Added logging utilities with timing for model downloads and conversions during test setup
tools/who_what_benchmark/tests/test_cli_reranking.py Added logging utilities, executable checks, and timing for model exports
tools/who_what_benchmark/tests/test_cli_image.py Added logging utilities, context manager for timing stages, and executable validation
tools/who_what_benchmark/tests/test_cli_embeddings.py Added logging utilities with timing measurements for optimum-cli exports
tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py Added logging utilities, offline mode checks, and timing for dataset loading and generation
tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py Added logging utilities, offline mode checks, and timing for all major test stages

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@olpipi olpipi removed request for sbalandi and sgonorov January 9, 2026 13:11
Copilot AI review requested due to automatic review settings January 9, 2026 13:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 53 to 55
def _truncate(s: str, limit: int = 4000) -> str:
if s is None:
return ""
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature expects a str type, but the implementation checks for None. Consider accepting Optional[str] in the type annotation or remove the None check if it's not needed.

Copilot uses AI. Check for mistakes.

data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]')
with _stage("load_dataset"):
data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True)
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trust_remote_code=True parameter allows execution of arbitrary code from the dataset. This is a security risk. Consider validating the dataset source or using an alternative approach if possible.

Suggested change
data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True)
trust_remote_code = os.getenv("HF_DATASETS_TRUST_REMOTE_CODE", "").lower() in ("1", "true", "yes")
data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=trust_remote_code)

Copilot uses AI. Check for mistakes.

data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True)
with _stage("load_dataset"):
data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True)
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trust_remote_code=True parameter allows execution of arbitrary code from the dataset. This is a security risk. Consider validating the dataset source or using an alternative approach if possible.

Suggested change
data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True)
data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]')

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings January 19, 2026 12:04
@olpipi olpipi requested a review from akashchi as a code owner January 19, 2026 12:04
@github-actions github-actions bot added the category: GHA CI based on Github actions label Jan 19, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

timeout: 120
- name: 'Cacheopt E2E (Part 1)'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Copilot uses AI. Check for mistakes.
timeout: 180
- name: 'Cacheopt E2E (Part 2)'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Copilot uses AI. Check for mistakes.
timeout: 120
- name: 'Cacheopt E2E (Part 1)'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Copilot uses AI. Check for mistakes.
timeout: 180
- name: 'Cacheopt E2E (Part 2)'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Suggested change
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Suggested change
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'

Copilot uses AI. Check for mistakes.
timeout: 45
- name: 'Cacheopt E2E (Part 1)'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Copilot uses AI. Check for mistakes.
timeout: 180
- name: 'Cacheopt E2E (Part 2)'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Suggested change
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py'

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.

Suggested change
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings January 27, 2026 13:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

timeout: 120
- name: 'Cacheopt E2E (Part 1)'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
timeout: 180
- name: 'Cacheopt E2E (Part 2)'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py'
cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
timeout: 120
- name: 'Cacheopt E2E (Part 1)'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
timeout: 180
- name: 'Cacheopt E2E (Part 2)'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Suggested change
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py'

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
timeout: 45
- name: 'Cacheopt E2E (Part 1)'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py'
cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
timeout: 30
- name: 'WWB tests'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s'
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: continuous batching Continuous batching category: GGUF GGUF file reader category: GHA CI based on Github actions category: WWB PR changes WWB do_not_merge do_not_review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant