Shared prepared ci #2872
Walkthrough

This update refactors test fixtures and configuration management for the multi-GPU LoRA TinyLlama model tests. It replaces a function-scoped, autouse fixture for the Triton cache with a module-scoped temporary directory fixture, and centralizes test configuration and dataset preparation into reusable pytest fixtures, streamlining test setup and reducing duplication.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Method
    participant Fixtures as Pytest Fixtures
    participant Preprocess as Preprocessing Subprocess
    Test->>Fixtures: Request sft_prepared_dataset_alpaca_cfg
    Fixtures->>Fixtures: Setup module_temp_dir
    Fixtures->>Preprocess: Run dataset preprocessing
    Preprocess-->>Fixtures: Return prepared dataset config
    Fixtures-->>Test: Provide merged config
    Test->>Test: Merge with test-specific overrides
    Test->>Test: Run training/evaluation with config
```
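The flow in the diagram could be sketched as follows — the fixture name comes from the walkthrough, while the config keys, model id, merge helper, and preprocessing command are hypothetical placeholders, not the project's actual API:

```python
# Sketch of the shared dataset-preparation flow. The fixture name follows
# the walkthrough; the config keys, model id, and preprocessing command
# are hypothetical placeholders.
import subprocess
import sys

import pytest


def merge_cfg(base: dict, overrides: dict) -> dict:
    """Shallow-merge test-specific overrides over the shared base config."""
    return {**base, **overrides}


@pytest.fixture(scope="module")
def sft_prepared_dataset_alpaca_cfg(tmp_path_factory):
    """Tokenize the dataset once per module; every test reuses the output."""
    prepared_path = tmp_path_factory.mktemp("prepared_alpaca")
    cfg = {
        "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed model id
        "dataset_prepared_path": str(prepared_path),
    }
    # Stand-in for the real preprocessing step, which would invoke the
    # project's preprocess CLI in a subprocess with this config on disk.
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return cfg


# Inside a test, the shared config is merged with test-specific overrides:
# cfg = merge_cfg(sft_prepared_dataset_alpaca_cfg, {"learning_rate": 1e-4})
```

Running the preprocessing subprocess inside a module-scoped fixture is what lets every test in the module reuse the same tokenized dataset instead of repeating that step per test.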
Codecov Report

All modified and coverable lines are covered by tests ✅
Description
Adds fixtures to the multi-GPU smoke tests with packing, to avoid redundant tokenization during CI.
Motivation and Context
Dataset processing and tokenization is one of the slower steps in CI. For some test modules, the dataset and model parameters are shared, so the tokenized output is identical across tests, and we really only care about whether the training is functional.
How has this been tested?
Ran the tests; runtime went from ~14 min to ~12 min for the modified test class.
When forcing the tests to fail with `assert False`, I see the following in the stdout logs, indicating it successfully pulled the preprocessed data.

Screenshots (if appropriate)
Types of changes
Social Handles (Optional)