Commit bf28b9d: Initial commit (0 parents)

File tree: 92 files changed, +10216 −0 lines


.claude/settings.local.json

Lines changed: 74 additions & 0 deletions
```json
{
  "permissions": {
    "allow": [
      "WebSearch",
      "WebFetch(domain:trends.google.com)",
      "WebFetch(domain:ollama.com)",
      "WebFetch(domain:www.remio.ai)",
      "WebFetch(domain:www.notebookcheck.net)",
      "WebFetch(domain:linustechtips.com)",
      "WebFetch(domain:bay41.com)",
      "WebFetch(domain:gigcitygeek.com)",
      "WebFetch(domain:llm-tracker.info)",
      "WebFetch(domain:www.houmoai.com)",
      "Bash(python3:*)",
      "WebFetch(domain:www.jeremymorgan.com)",
      "WebFetch(domain:itsfoss.com)",
      "WebFetch(domain:www.stratosphereips.org)",
      "WebFetch(domain:blog.csdn.net)",
      "WebFetch(domain:tinycomputers.io)",
      "WebFetch(domain:medium.com)",
      "WebFetch(domain:zhuanlan.zhihu.com)",
      "WebFetch(domain:singhajit.com)",
      "WebFetch(domain:developer.ridgerun.com)",
      "WebFetch(domain:like2byte.com)",
      "WebFetch(domain:developer.nvidia.com)",
      "WebFetch(domain:www.dfrobot.com)",
      "WebFetch(domain:github.com)",
      "WebFetch(domain:blog.olares.com)",
      "WebFetch(domain:hothardware.com)",
      "WebFetch(domain:www.hardware-corner.net)",
      "WebFetch(domain:www.pugetsystems.com)",
      "WebFetch(domain:jetsonhacks.com)",
      "WebFetch(domain:forums.macrumors.com)",
      "WebFetch(domain:nikolasent.github.io)",
      "WebFetch(domain:siliconbench.radicchio.page)",
      "WebFetch(domain:hyperpc.ae)",
      "WebFetch(domain:seanvosler.medium.com)",
      "WebFetch(domain:community.frame.work)",
      "WebFetch(domain:nishtahir.com)",
      "WebFetch(domain:lmsys.org)",
      "WebFetch(domain:arxiv.org)",
      "Bash(ls:*)",
      "Bash(grep:*)",
      "Bash(python3 -c \":*)",
      "WebFetch(domain:www.reddit.com)",
      "Bash(python scripts/build_data.py)",
      "Bash(sed -i 's/^price_usd: 449$/price_usd: 569/' nvidia-rtx-4060-ti-16gb.md)",
      "Bash(sed -i 's/^price_usd: 999$/price_usd: 1299/' amd-rx-7900-xtx-24gb.md)",
      "Bash(sed -i 's/^price_usd: 1200$/price_usd: 1699/' nvidia-rtx-3090-ti-24gb.md)",
      "Bash(sed -i 's/^price_usd: 1200$/price_usd: 1600/' qualcomm-snapdragon-x-elite-laptop.md)",
      "Bash(sed -i 's/^price_usd: 1299$/price_usd: 1349/' amd-radeon-ai-pro-r9700-32gb.md)",
      "WebFetch(domain:www.localscore.ai)",
      "WebFetch(domain:dev.to)",
      "WebFetch(domain:gigachadllc.com)",
      "WebFetch(domain:www.microcenter.com)",
      "WebFetch(domain:lattice.uptownhr.com)",
      "WebFetch(domain:deepnewz.com)",
      "WebFetch(domain:hostbor.com)",
      "WebFetch(domain:creativestrategies.com)",
      "WebFetch(domain:www.markus-schall.de)",
      "WebFetch(domain:www.koyeb.com)",
      "WebFetch(domain:www.millstoneai.com)",
      "WebFetch(domain:sparecores.com)",
      "WebFetch(domain:www.fingon.iki.fi)",
      "WebFetch(domain:docs.nvidia.com)",
      "WebFetch(domain:qwen.readthedocs.io)",
      "WebFetch(domain:openllmbenchmarks.com)",
      "WebFetch(domain:www.inferless.com)",
      "WebFetch(domain:blog.silexdata.com)",
      "WebFetch(domain:www.databasemart.com)",
      "WebFetch(domain:docs.valdi.ai)"
    ]
  }
}
```

.github/workflows/build.yml

Lines changed: 50 additions & 0 deletions
```yaml
name: Build & Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  validate-and-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install pyyaml

      - name: Build devices.json
        run: python scripts/build_data.py

      - name: Upload artifact
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs

  deploy:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    needs: validate-and-build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
```

CONTRIBUTING.md

Lines changed: 171 additions & 0 deletions
# Contributing Guide

Thank you for contributing! Every real-world benchmark helps the community make better purchasing decisions.

## How to Submit

### 1. Fork & Clone

```bash
git clone https://github.com/sipeed/llmdev.guide.git
cd llmdev.guide
```

### 2. Create a Device File

```bash
cp devices/_template.md devices/your-device-name.md
```

Naming convention: `vendor-model.md`, lowercase with hyphens. Examples:
- `nvidia-jetson-orin-nano-8gb.md`
- `apple-mac-mini-m4-pro-48gb.md`
- `rockchip-rk3588-16gb.md`

### 3. Fill in the Data

Follow the YAML frontmatter format in the template.

**Required fields:**
- `id`: Unique identifier (same as filename without `.md`)
- `name`: Full product name
- `vendor`: Manufacturer
- `device_type`: Dev Board / PCIe Card / USB Accelerator / Mini PC / Server / Module
- `memory_capacity_gb`: Memory capacity in GB
- `memory_bandwidth_gbs`: Memory bandwidth in GB/s
- `price_usd`: Reference price in USD
- `power_watts`: Power consumption under load (W)
- `benchmarks`: At least one Qwen3.5 model benchmark
- `submitted_by`: Your GitHub username
- `date`: Submission date

**Per-benchmark required fields:**
- `model`: Model name (Qwen3.5-9B / Qwen3.5-27B etc.)
- `quant`: Quantization (int4 / fp4 / int8 / fp8 / bf16 / f32)
- `framework`: Inference framework (Ollama / llama.cpp / LM Studio / vendor SDK etc.)
- `decode_tps`: Output generation speed in tokens/s

**Per-benchmark optional fields:**
- `prefill_tps`: Prefill speed in tokens/s (if your tool reports it)
- `context_length`: Context length used during testing
- `image_encode_ms`: Image encoding time in ms (for vision models)
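Putting the fields above together, a complete frontmatter block might look like the following. The template file itself is not shown here, so this is only an illustrative sketch: the field names come from the lists above, while the device and all its numbers are made up.

```yaml
---
id: example-vendor-board-16gb        # hypothetical; matches the filename
name: Example Vendor Board 16GB
vendor: Example Vendor
device_type: Dev Board
memory_capacity_gb: 16
memory_bandwidth_gbs: 102
price_usd: 249
power_watts: 25
benchmarks:
  - model: Qwen3.5-9B
    quant: int4
    framework: llama.cpp
    decode_tps: 11.2
    prefill_tps: 180.5       # optional
    context_length: 4096     # optional
submitted_by: your-github-username
date: 2025-01-01
---
```

Always start from `devices/_template.md`, since the template is the authoritative source for the exact structure.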
### 4. How to Benchmark

Choose the method that works best for you:

#### Easy: Chat & Screenshot

Just run the model in Ollama or LM Studio and note the tokens/s displayed:

```bash
ollama run qwen3.5:9b-q4_K_M
```

Ask a question that generates a long response. Most tools display the generation speed (tokens/s) at the bottom of the output or in the UI. Screenshot this for your evidence.

#### Standard: Ollama Verbose

```bash
ollama run qwen3.5:9b-q4_K_M --verbose
```

This shows both **prompt eval rate** (prefill) and **eval rate** (decode) after each response. Copy these numbers directly.
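If you would rather extract these two numbers from a saved log than copy them by hand, a few lines of Python will do it. This is a minimal sketch, assuming the timing lines are shaped like `prompt eval rate: 180.50 tokens/s`; adjust the patterns if your Ollama version formats its verbose output differently.

```python
import re

def parse_ollama_verbose(log: str) -> dict:
    """Pull prefill/decode speeds out of `ollama run --verbose` timing output.

    Assumes lines shaped like:
        prompt eval rate:     180.50 tokens/s
        eval rate:             30.05 tokens/s
    """
    rates = {}
    m = re.search(r"prompt eval rate:\s*([\d.]+)\s*tokens/s", log)
    if m:
        rates["prefill_tps"] = float(m.group(1))
    # Negative lookbehind so plain "eval rate" does not also match "prompt eval rate".
    m = re.search(r"(?<!prompt )eval rate:\s*([\d.]+)\s*tokens/s", log)
    if m:
        rates["decode_tps"] = float(m.group(1))
    return rates

sample = """
prompt eval rate:     180.50 tokens/s
eval rate:             30.05 tokens/s
"""
print(parse_ollama_verbose(sample))
```

The two extracted values map directly onto the `prefill_tps` and `decode_tps` fields of a benchmark entry.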
#### Advanced: llama-bench

```bash
# Qwen3.5-9B INT4
llama-bench -m qwen3.5-9b-q4_k_m.gguf -p 512 -n 128

# Qwen3.5-27B INT4 (if your device has enough memory)
llama-bench -m qwen3.5-27b-q4_k_m.gguf -p 512 -n 128
```

This gives precise prefill (pp) and decode (tg) speeds with multiple runs averaged.

#### Tips

- **Run the test a few times** and use a representative result (not the first cold run)
- **Ensure stable thermals**: let the device warm up, avoid thermal throttling
- **Test early in the conversation** (short context) for the most comparable results
- If you have a power meter, measure the actual system power draw under load

#### Power Measurement

A USB power meter or wall plug meter is ideal. If not available, use software readings (e.g., `tegrastats` on Jetson, `powermetrics` on Mac) and note the source.

### 5. Provide Evidence

In the markdown body, please include:

- **Test environment**: OS, framework version, model source
- **Screenshot or log output**: Proving the benchmark numbers are real
- **Device photo**: At least one photo of the actual device

Images can be uploaded via GitHub Issues and referenced by URL.

### 6. Submit PR

```bash
git add devices/your-device-name.md
git commit -m "Add benchmark: Device Name"
git push origin main
```

Then create a Pull Request on GitHub.

## Estimation from Other Models

If Qwen3.5 benchmarks are not yet available for your device, you may estimate from other models of **similar architecture and similar size**:

- **Dense → Dense only** (never cross Dense/MoE)
- **MoE → MoE only** (never cross Dense/MoE)
- **Use the closest size** — do not estimate across large size gaps
- **Formula**: `estimated_tps = measured_tps × (source_active_params / target_active_params)`
- Mark with `estimated: true` and `estimated_from: "description"` in the benchmark entry
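As a worked example of the formula, here is the Dense → Dense case of estimating Qwen3.5-27B (27B active) from a Qwen3 32B (32B active) measurement; the measured 12.0 tok/s is a made-up number for illustration.

```python
def estimate_tps(measured_tps: float, source_active_b: float, target_active_b: float) -> float:
    """Scale a measured decode speed by the ratio of active parameter counts."""
    return measured_tps * (source_active_b / target_active_b)

# Hypothetical: 12.0 tok/s measured on Qwen3 32B, target Qwen3.5-27B.
# The scaling factor is 32/27, i.e. the x1.19 listed for this pair in the
# Dense -> Dense table below.
print(round(estimate_tps(12.0, 32, 27), 2))  # ≈ 14.22
```

A benchmark entry derived this way would carry `estimated: true` and, e.g., `estimated_from: "Qwen3 32B int4 measurement"`.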
Estimated values are displayed with an asterisk (*) on the website.

### Approved Estimation Sources

#### Dense → Dense

| Qwen3.5 Target | Active | Approved Source Models | Source Active | Factor |
|----------------|--------|------------------------|---------------|--------|
| **9B** | 9B | Llama 3.1 8B, Qwen3 8B, Gemma 2 9B, DeepSeek-R1-Distill 8B | 8-9B | ×0.89 ~ ×1.00 |
| **27B** | 27B | Qwen3 32B, Qwen 2.5 32B, Gemma 2 27B | 27-32B | ×1.00 ~ ×1.19 |

#### MoE → MoE

| Qwen3.5 Target | Active | Approved Source Models | Source Active | Factor |
|----------------|--------|------------------------|---------------|--------|
| **35B-A3B** | 3B | Qwen3 30B-A3B, GPT-OSS-20B (3.6B active) | 3-3.6B | ×1.00 ~ ×1.20 |
| **122B-A10B** | 10B | GPT-OSS-120B (5.1B active), Mixtral 8x7B (12.9B active) | 5.1-12.9B | ×0.51 ~ ×1.29 |
| **397B-A17B** | 17B | Qwen3 235B-A22B (22B active), DeepSeek R1 671B (37B active) | 17-37B | ×1.29 ~ ×2.18 |

## Validation

CI will automatically check:
- YAML frontmatter format
- Required fields are present
- Values are within reasonable ranges
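To catch problems before opening a PR, you can run the same kind of check locally. The following is a minimal sketch, not the actual CI code in `scripts/`; the sanity ranges are illustrative guesses, and it operates on an already-parsed frontmatter dict (e.g., loaded with pyyaml).

```python
REQUIRED = ["id", "name", "vendor", "device_type", "memory_capacity_gb",
            "memory_bandwidth_gbs", "price_usd", "power_watts",
            "benchmarks", "submitted_by", "date"]

# Illustrative sanity ranges; the real CI may use different bounds.
RANGES = {"memory_capacity_gb": (1, 2048),
          "price_usd": (10, 100_000),
          "power_watts": (1, 5000)}

def validate(front: dict) -> list[str]:
    """Return a list of problems found in a parsed frontmatter dict."""
    errors = [f"missing field: {k}" for k in REQUIRED if k not in front]
    for field, (lo, hi) in RANGES.items():
        v = front.get(field)
        if v is not None and not (lo <= v <= hi):
            errors.append(f"{field}={v} outside [{lo}, {hi}]")
    # Every benchmark entry needs its own required fields.
    for bench in front.get("benchmarks", []):
        for k in ("model", "quant", "framework", "decode_tps"):
            if k not in bench:
                errors.append(f"benchmark missing field: {k}")
    return errors
```

An empty returned list means the frontmatter passes this sketch's checks; anything else points at a field to fix.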
Maintainers will manually review evidence for authenticity.

## FAQ

**Q: My device can't run Qwen3.5-27B. What do I do?**
A: No problem — submit whatever models your device can run. Not being able to run a model is itself valuable information.

**Q: Can I submit data from different frameworks on the same device?**
A: Yes, add multiple entries in `benchmarks` with different `framework` values.

**Q: I can only see one "tokens/s" number, not separate prefill/decode.**
A: That's fine — just fill in `decode_tps`. The `prefill_tps` field is optional. If you want both numbers, try `ollama run --verbose` or `llama-bench`.

**Q: Prices fluctuate a lot. What should I put?**
A: Use the price you paid, or the current mainstream channel price. Note it in the body text.

**Q: I'm not sure about the claimed TOPS figure.**
A: `tops_int8` is optional. If you fill it in, use `tops_note` to explain the methodology (e.g., "GPU only", "sparse", "GPU+DLA").
