Commit 51745a9

add ci (#1642)

1 parent 2cffdf7 commit 51745a9
File tree

14 files changed, +845 -48 lines changed
.github/workflows/kt-kernel-tests.yml

Lines changed: 104 additions & 0 deletions

@@ -0,0 +1,104 @@
name: PR KT-Kernel Test

on:
  pull_request:
    branches:
      - main
      - develop
    types: [synchronize, labeled]
  workflow_dispatch:

concurrency:
  group: pr-kt-kernel-test-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # =============================================== check changes ====================================================
  check-changes:
    runs-on: ubuntu-latest
    outputs:
      kt_kernel: ${{ steps.filter.outputs.kt_kernel }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Fail if the PR does not have the 'run-ci' label
        if: github.event_name == 'pull_request' && !contains(github.event.pull_request.labels.*.name, 'run-ci')
        run: |
          echo "This pull request does not have the 'run-ci' label. Failing the workflow."
          exit 1

      - name: Fail if the PR is a draft
        if: github.event_name == 'pull_request' && github.event.pull_request.draft == true
        run: |
          echo "This pull request is a draft. Failing the workflow."
          exit 1

      - name: Detect file changes
        id: filter
        uses: dorny/paths-filter@v3
        with:
          filters: |
            kt_kernel:
              - "kt-kernel/**"
              - ".github/workflows/kt-kernel-tests.yml"

  # =============================================== KT-Kernel tests ====================================================
  per-commit-kt-kernel-cpu:
    needs: [check-changes]
    if: always() && !failure() && !cancelled() &&
      (needs.check-changes.outputs.kt_kernel == 'true' || github.event_name == 'workflow_dispatch')
    runs-on: kt-cpu
    continue-on-error: false
    steps:
      - name: Cleanup
        run: |
          sudo rm -rf $GITHUB_WORKSPACE/* || true

      - name: Checkout code
        uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Install KT-Kernel
        run: |
          cd kt-kernel
          bash install.sh build

      - name: Run KT-Kernel CPU tests
        timeout-minutes: 30
        run: |
          cd kt-kernel/test
          python3 run_suite.py --hw cpu --suite default

  # =============================================== finish ====================================================
  pr-test-kt-kernel-finish:
    needs: [check-changes, per-commit-kt-kernel-cpu]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Check all dependent job statuses
        run: |
          # Convert the 'needs' context to a JSON string
          json_needs='${{ toJson(needs) }}'

          # Get a list of all job names from the JSON keys
          job_names=$(echo "$json_needs" | jq -r 'keys_unsorted[]')

          for job in $job_names; do
            # For each job, extract its result
            result=$(echo "$json_needs" | jq -r --arg j "$job" '.[$j].result')

            # Print the job name and its result
            echo "$job: $result"

            # Check for failure or cancellation and exit if found
            if [[ "$result" == "failure" || "$result" == "cancelled" ]]; then
              echo "The above jobs failed."
              exit 1
            fi
          done

          # If the loop completes, all jobs were successful
          echo "All jobs completed successfully"
          exit 0
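The finish job's gating logic can be exercised locally by substituting a hand-written payload for `${{ toJson(needs) }}`. A minimal sketch, assuming `jq` is installed (the payload below is hypothetical, standing in for what GitHub Actions would inject):

```shell
# Hypothetical 'needs' payload, mimicking the shape of ${{ toJson(needs) }}
json_needs='{"check-changes":{"result":"success"},"per-commit-kt-kernel-cpu":{"result":"success"}}'

# Same loop as the workflow step: print each job's result and fail fast
# on any failure or cancellation.
for job in $(echo "$json_needs" | jq -r 'keys_unsorted[]'); do
  result=$(echo "$json_needs" | jq -r --arg j "$job" '.[$j].result')
  echo "$job: $result"
  if [ "$result" = "failure" ] || [ "$result" = "cancelled" ]; then
    echo "The above jobs failed."
    exit 1
  fi
done
echo "All jobs completed successfully"
```

Note that `skipped` deliberately does not trip the check, so the finish job still passes when the CPU job was skipped for an unrelated PR.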

kt-kernel/pyproject.toml

Lines changed: 5 additions & 1 deletion

@@ -30,7 +30,11 @@ dependencies = [
     "black>=25.9.0",
 ]
 
-# No optional dev group needed for formatting; using custom git hooks instead of pre-commit
+[project.optional-dependencies]
+test = [
+    "pytest>=7.0.0",
+    "psutil>=5.9.0",
+]
 
 [project.urls]
 Homepage = "https://github.com/kvcache-ai"

kt-kernel/pytest.ini

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
[pytest]
# Test paths
testpaths = test/per_commit

# File and function naming conventions
python_files = test_*.py
python_classes = Test*
python_functions = test_*

# Markers for hardware backends
markers =
    cpu: CPU backend tests (Intel AMX/AVX512/AVX2)
    cuda: CUDA backend tests (NVIDIA GPUs)
    amd: AMD backend tests (ROCm)
    slow: Slow-running tests (>60 seconds)
    requires_model: Tests requiring model files

# Output options
addopts =
    -v
    --tb=short
    --strict-markers

# Filter warnings
filterwarnings =
    ignore::DeprecationWarning
    ignore::PendingDeprecationWarning
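The markers above let a runner select tests per hardware backend with `-m`. A minimal sketch of that selection, using a hypothetical throwaway test file (assumes pytest is installed; outside the repo, the `--strict-markers` option from the ini does not apply, so unregistered markers only warn):

```shell
# Write a tiny demo test file with one 'cpu' and one 'slow' test.
cat > test_marker_demo.py <<'EOF'
import pytest

@pytest.mark.cpu
def test_cpu_only():
    assert 1 + 1 == 2

@pytest.mark.slow
def test_slow_only():
    assert True
EOF

# -m filters by marker expression: the slow test is deselected here.
summary=$(pytest -q -m cpu -p no:cacheprovider test_marker_demo.py | tail -n 1)
echo "$summary"
```

Expressions compose, so a runner can do `-m "cpu and not slow"` to keep per-commit CI fast while reserving `slow` and `requires_model` tests for nightly jobs.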

kt-kernel/scripts/README.md

Lines changed: 33 additions & 0 deletions

@@ -22,6 +22,8 @@ Convert weights to INT4/INT8 format optimized for AMX inference on CPU. These qu
 - **FP16**: 16-bit floating point
 - **BF16**: BFloat16 format
 
+> **⚠️ Precision Warning:** Quantizing directly from FP8 to INT4/INT8 may cause significant accuracy degradation. For best results, use the original **BF16** model as the source for INT4/INT8 quantization.
+
 ## Basic Usage
 
 ### Quantize BF16 model to INT4

@@ -213,6 +215,37 @@
 - `--dataset`: HuggingFace dataset for calibration
 - `--dataset_split`: Dataset split to use
 
+#### Memory Management (Avoiding OOM)
+
+GPTQ quantization requires additional GPU memory for Hessian matrix computation beyond model weights. Use `--max_gpu_memory` to limit GPU memory usage and offload remaining layers to CPU:
+
+```bash
+python scripts/convert_gpu_weights.py \
+    --model_id /path/to/model \
+    --output_dir /path/to/output \
+    --quant_type W4A16 \
+    --max_gpu_memory "40GiB"
+```
+
+**Recommended settings:**
+
+| GPU VRAM | Suggested `--max_gpu_memory` |
+|----------|------------------------------|
+| 24 GiB   | 14-16 GiB                    |
+| 48 GiB   | 30-35 GiB                    |
+| 80 GiB   | 50-60 GiB                    |
+
+Reserve 40-50% of GPU memory for GPTQ's Hessian matrix computation.
+
+**Options:**
+- `--max_gpu_memory`: Maximum GPU memory for model weights per device (e.g., '40GiB')
+- `--max_cpu_memory`: Maximum CPU memory (default: 1000GiB when `--max_gpu_memory` is set)
+
+**Important:** llmcompressor does not support disk offloading. Ensure your machine has enough GPU + CPU memory to load the entire model. If you still encounter OOM:
+1. Reduce `--num_calibration_samples` (e.g., 256)
+2. Reduce `--max_sequence_length` (e.g., 1024)
+3. Use `--force_cpu` to run entirely on CPU (slower but avoids GPU OOM)
+
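The table's suggestions amount to budgeting roughly 60-70% of total VRAM for model weights. An illustrative back-of-envelope (the 65% factor is an assumption fitted to the table, not a project constant):

```shell
# Illustrative only: derive a --max_gpu_memory budget as ~65% of total VRAM,
# leaving the remainder for GPTQ's Hessian matrix computation.
for vram_gib in 24 48 80; do
  budget=$(( vram_gib * 65 / 100 ))
  echo "${vram_gib} GiB VRAM -> try --max_gpu_memory \"${budget}GiB\""
done
```

This reproduces 15, 31, and 52 GiB for the three GPU sizes above, each inside the table's suggested range.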
 ### Examples
 
 #### Example 1: Quantize Qwen3-Next-80B for Hybrid Inference (W4A16)
