-
Notifications
You must be signed in to change notification settings - Fork 333
[CI DEBUGGING] #3125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[CI DEBUGGING] #3125
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive logging infrastructure to test files for CI debugging purposes. The changes introduce timestamped logging utilities (_ts(), _log(), _stage()) and detailed timing measurements to help diagnose test execution issues in the CI environment.
Key changes:
- Added standardized logging helper functions across all test files with timestamps in UTC
- Instrumented test execution with timing measurements for model downloads, conversions, and generation steps
- Added checks for offline mode environment variables to skip tests when network access is unavailable
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tools/who_what_benchmark/tests/test_cli_vlm.py | Added logging utilities, timing measurements for optimum-cli exports, and executable checks |
| tools/who_what_benchmark/tests/test_cli_text.py | Added logging utilities with timing for model downloads and conversions during test setup |
| tools/who_what_benchmark/tests/test_cli_reranking.py | Added logging utilities, executable checks, and timing for model exports |
| tools/who_what_benchmark/tests/test_cli_image.py | Added logging utilities, context manager for timing stages, and executable validation |
| tools/who_what_benchmark/tests/test_cli_embeddings.py | Added logging utilities with timing measurements for optimum-cli exports |
| tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py | Added logging utilities, offline mode checks, and timing for dataset loading and generation |
| tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py | Added logging utilities, offline mode checks, and timing for all major test stages |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c043c12 to
0dbaa6b
Compare
0dbaa6b to
3827700
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _truncate(s: str, limit: int = 4000) -> str: | ||
| if s is None: | ||
| return "" |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature expects a str type, but the implementation checks for None. Consider accepting Optional[str] in the type annotation or remove the None check if it's not needed.
|
|
||
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]') | ||
| with _stage("load_dataset"): | ||
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True) |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trust_remote_code=True parameter allows execution of arbitrary code from the dataset. This is a security risk. Consider validating the dataset source or using an alternative approach if possible.
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True) | |
| trust_remote_code = os.getenv("HF_DATASETS_TRUST_REMOTE_CODE", "").lower() in ("1", "true", "yes") | |
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=trust_remote_code) |
|
|
||
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True) | ||
| with _stage("load_dataset"): | ||
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True) |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trust_remote_code=True parameter allows execution of arbitrary code from the dataset. This is a security risk. Consider validating the dataset source or using an alternative approach if possible.
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]', trust_remote_code=True) | |
| data = datasets.load_dataset('THUDM/LongBench', subset, split='test[:32]') |
8e89c2b to
6825c88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| timeout: 120 | ||
| - name: 'Cacheopt E2E (Part 1)' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| timeout: 180 | ||
| - name: 'Cacheopt E2E (Part 2)' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| timeout: 120 | ||
| - name: 'Cacheopt E2E (Part 1)' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| timeout: 180 | ||
| - name: 'Cacheopt E2E (Part 2)' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' | |
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' |
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' | |
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' |
| timeout: 45 | ||
| - name: 'Cacheopt E2E (Part 1)' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| timeout: 180 | ||
| - name: 'Cacheopt E2E (Part 2)' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s' | |
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py' |
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag is duplicated in the pytest command. Remove the duplicate to avoid confusion.
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' | |
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' |
1f489ce to
57ad4dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| timeout: 120 | ||
| - name: 'Cacheopt E2E (Part 1)' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| timeout: 180 | ||
| - name: 'Cacheopt E2E (Part 2)' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -s -v tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| timeout: 120 | ||
| - name: 'Cacheopt E2E (Part 1)' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| timeout: 180 | ||
| - name: 'Cacheopt E2E (Part 2)' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py -s' | |
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_2.py' |
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| timeout: 45 | ||
| - name: 'Cacheopt E2E (Part 1)' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_kv_cache_eviction/test_kv_cache_eviction_1.py -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
| timeout: 30 | ||
| - name: 'WWB tests' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"' | ||
| cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava" -s' |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -s flag appears twice in the pytest command. Remove the duplicate flag to keep the command clean and avoid potential confusion.
57ad4dd to
6da9258
Compare
No description provided.