# Contributing Guide

Thank you for contributing! Every real-world benchmark helps the community make better purchasing decisions.

## How to Submit

### 1. Fork & Clone

```bash
git clone https://github.com/sipeed/llmdev.guide.git
cd llmdev.guide
```

### 2. Create a Device File

```bash
cp devices/_template.md devices/your-device-name.md
```

Naming convention: `vendor-model.md`, lowercase with hyphens. Examples:
- `nvidia-jetson-orin-nano-8gb.md`
- `apple-mac-mini-m4-pro-48gb.md`
- `rockchip-rk3588-16gb.md`

### 3. Fill in the Data

Follow the YAML frontmatter format in the template.

**Required fields:**
- `id`: Unique identifier (same as filename without `.md`)
- `name`: Full product name
- `vendor`: Manufacturer
- `device_type`: Dev Board / PCIe Card / USB Accelerator / Mini PC / Server / Module
- `memory_capacity_gb`: Memory capacity in GB
- `memory_bandwidth_gbs`: Memory bandwidth in GB/s
- `price_usd`: Reference price in USD
- `power_watts`: Power consumption under load (W)
- `benchmarks`: At least one Qwen3.5 model benchmark
- `submitted_by`: Your GitHub username
- `date`: Submission date

**Per-benchmark required fields:**
- `model`: Model name (Qwen3.5-9B / Qwen3.5-27B, etc.)
- `quant`: Quantization (int4 / fp4 / int8 / fp8 / bf16 / f32)
- `framework`: Inference framework (Ollama / llama.cpp / LM Studio / vendor SDK, etc.)
- `decode_tps`: Output generation speed in tokens/s

**Per-benchmark optional fields:**
- `prefill_tps`: Prefill speed in tokens/s (if your tool reports it)
- `context_length`: Context length used during testing
- `image_encode_ms`: Image encoding time in ms (for vision models)
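Putting the fields together, a minimal frontmatter might look like this (device name and all numbers are illustrative placeholders, not real measurements; `devices/_template.md` remains the authoritative reference):

```yaml
---
id: vendor-example-board-16gb        # hypothetical device, values illustrative
name: Vendor Example Board 16GB
vendor: Vendor
device_type: Dev Board
memory_capacity_gb: 16
memory_bandwidth_gbs: 102.4
price_usd: 249
power_watts: 25
benchmarks:
  - model: Qwen3.5-9B
    quant: int4
    framework: llama.cpp
    decode_tps: 14.2
    prefill_tps: 180.5               # optional
    context_length: 4096             # optional
submitted_by: your-github-username
date: 2025-01-01
---
```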

### 4. How to Benchmark

Choose the method that works best for you:

#### Easy: Chat & Screenshot

Just run the model in Ollama or LM Studio and note the tokens/s displayed:

```bash
ollama run qwen3.5:9b-q4_K_M
```

Ask a question that generates a long response. Most tools display the generation speed (tokens/s) at the bottom of the output or in the UI. Screenshot this for your evidence.

#### Standard: Ollama Verbose

```bash
ollama run qwen3.5:9b-q4_K_M --verbose
```

This shows both **prompt eval rate** (prefill) and **eval rate** (decode) after each response. Copy these numbers directly.
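If you prefer to pull the numbers out of a saved log rather than copy them by eye, standard text tools are enough. A minimal sketch, assuming the stats lines follow Ollama's `prompt eval rate: ... tokens/s` format (this format may change between versions, so check your own output first):

```shell
# Sample stats as printed by `ollama run --verbose` (format assumed;
# verify against your version's actual output before relying on this).
cat > run.log <<'EOF'
prompt eval rate:     812.40 tokens/s
eval rate:            42.10 tokens/s
EOF

# prefill_tps comes from "prompt eval rate", decode_tps from "eval rate".
prefill_tps=$(awk -F': *' '/^prompt eval rate/ {print $2}' run.log | awk '{print $1}')
decode_tps=$(awk -F': *' '/^eval rate/ {print $2}' run.log | awk '{print $1}')
echo "prefill_tps=$prefill_tps decode_tps=$decode_tps"
```

These two values map directly onto the `prefill_tps` and `decode_tps` fields in the frontmatter.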

#### Advanced: llama-bench

```bash
# Qwen3.5-9B INT4
llama-bench -m qwen3.5-9b-q4_k_m.gguf -p 512 -n 128

# Qwen3.5-27B INT4 (if your device has enough memory)
llama-bench -m qwen3.5-27b-q4_k_m.gguf -p 512 -n 128
```

This gives precise prefill (pp) and decode (tg) speeds with multiple runs averaged.

#### Tips

- **Run the test a few times** and use a representative result (not the first cold run)
- **Ensure stable thermals**: let the device warm up, avoid thermal throttling
- **Test early in the conversation** (short context) for the most comparable results
- If you have a power meter, measure the actual system power draw under load

#### Power Measurement

A USB power meter or wall plug meter is ideal. If not available, use software readings (e.g., `tegrastats` on Jetson, `powermetrics` on Mac) and note the source.
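Once you have both a power reading and a decode speed, one handy sanity metric (not a required field) is energy per generated token: watts divided by tokens per second. A sketch with illustrative numbers, not real measurements:

```shell
# J/token = power (W) / decode speed (tokens/s).
# Example readings below are hypothetical.
power_watts=18.5
decode_tps=42.1
awk -v w="$power_watts" -v t="$decode_tps" 'BEGIN {printf "%.2f J/token\n", w / t}'
```

Mentioning this figure in the body text makes it easier to compare efficiency across devices with very different power budgets.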

### 5. Provide Evidence

In the markdown body, please include:

- **Test environment**: OS, framework version, model source
- **Screenshot or log output**: Proving the benchmark numbers are real
- **Device photo**: At least one photo of the actual device

Images can be uploaded via GitHub Issues and referenced by URL.

### 6. Submit PR

```bash
git checkout -b add-your-device-name
git add devices/your-device-name.md
git commit -m "Add benchmark: Device Name"
git push origin add-your-device-name
```

Then create a Pull Request on GitHub from that branch.

## Estimation from Other Models

If Qwen3.5 benchmarks are not yet available for your device, you may estimate from other models of **similar architecture and similar size**:

- **Dense → Dense only** (never cross Dense/MoE)
- **MoE → MoE only** (never cross Dense/MoE)
- **Use the closest size** — do not estimate across large size gaps
- **Formula**: `estimated_tps = measured_tps × (source_active_params / target_active_params)`
- Mark with `estimated: true` and `estimated_from: "description"` in the benchmark entry
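A worked example (all numbers hypothetical): if your device measured Qwen3 8B at 40.0 tok/s, the Qwen3.5-9B estimate is 40.0 × (8 / 9) ≈ 35.6, recorded in the frontmatter as:

```yaml
benchmarks:
  - model: Qwen3.5-9B
    quant: int4
    framework: llama.cpp
    decode_tps: 35.6          # estimated: 40.0 × (8 / 9)
    estimated: true
    estimated_from: "Qwen3 8B int4, llama.cpp, 40.0 tok/s measured on this device"
```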

Estimated values are displayed with an asterisk (*) on the website.

### Approved Estimation Sources

#### Dense → Dense

| Qwen3.5 Target | Active | Approved Source Models | Source Active | Factor |
|----------------|--------|-----------------------|---------------|--------|
| **9B** | 9B | Llama 3.1 8B, Qwen3 8B, Gemma 2 9B, DeepSeek-R1-Distill 8B | 8-9B | ×0.89 ~ ×1.00 |
| **27B** | 27B | Qwen3 32B, Qwen 2.5 32B, Gemma 2 27B | 27-32B | ×1.00 ~ ×1.19 |

#### MoE → MoE

| Qwen3.5 Target | Active | Approved Source Models | Source Active | Factor |
|----------------|--------|-----------------------|---------------|--------|
| **35B-A3B** | 3B | Qwen3 30B-A3B, GPT-OSS-20B (3.6B active) | 3-3.6B | ×1.00 ~ ×1.20 |
| **122B-A10B** | 10B | GPT-OSS-120B (5.1B active), Mixtral 8x7B (12.9B active) | 5.1-12.9B | ×0.51 ~ ×1.29 |
| **397B-A17B** | 17B | Qwen3 235B-A22B (22B active), DeepSeek R1 671B (37B active) | 17-37B | ×1.29 ~ ×2.18 |

## Validation

CI will automatically check:
- YAML frontmatter format
- Required fields are present
- Values are within reasonable ranges

Maintainers will manually review evidence for authenticity.

## FAQ

**Q: My device can't run Qwen3.5-27B, what do I do?**
A: No problem — submit whatever models your device can run. Not being able to run a model is itself valuable information.

**Q: Can I submit data from different frameworks on the same device?**
A: Yes, add multiple entries in `benchmarks` with different `framework` values.

**Q: I can only see one "tokens/s" number, not separate prefill/decode.**
A: That's fine — just fill in `decode_tps`. The `prefill_tps` field is optional. If you want both numbers, try `ollama run --verbose` or `llama-bench`.

**Q: Prices fluctuate a lot, what should I put?**
A: Use the price you paid, or the current mainstream channel price. Note it in the body text.

**Q: I'm not sure about the claimed TOPS figure.**
A: `tops_int8` is optional. If you fill it in, use `tops_note` to explain the methodology (e.g., "GPU only", "sparse", "GPU+DLA").