## Bug Description

A segmentation fault (exit code 139) occurs when attempting to convert the InkubaLM-0.4B model using `mlc_llm convert_weight`.
## Model Information

- Model: `lelapa/InkubaLM-0.4B` from HuggingFace
- Architecture: custom LlamaForCausalLM variant (`VulavulaLlamaForCausalLM`)
- Size: 0.4B parameters, 8 layers
- Special characteristics:
  - Shared tensors between layers (see the inspection sketch below)
  - Custom `auto_map` configuration
  - Custom Python files: `vulavulaslm.py`
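For reference, here is a minimal sketch of how the shared tensors can be confirmed. It assumes the checkpoint ships as `pytorch_model.bin` (the filename is an assumption; the hub layout may differ) and groups parameter names by underlying storage:

```python
# Sketch: detect aliased (shared) tensors in the checkpoint by grouping
# parameter names by their storage pointer. "pytorch_model.bin" is an
# assumed filename for the HF checkpoint.
import torch

state_dict = torch.load(
    "InkubaLM-0.4B/pytorch_model.bin", map_location="cpu", weights_only=True
)

by_storage = {}
for name, tensor in state_dict.items():
    ptr = tensor.untyped_storage().data_ptr()
    by_storage.setdefault(ptr, []).append(name)

# More than one name per storage means the tensors are shared.
for names in by_storage.values():
    if len(names) > 1:
        print("shared:", names)
```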
## Steps to Reproduce

- Download the model from HuggingFace: `lelapa/InkubaLM-0.4B`
- Run:

```bash
python -m mlc_llm convert_weight ./InkubaLM-0.4B --quantization q4f16_1 -o output
```
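If it helps with triage, the same command can be re-run with CPython's built-in faulthandler (a standard Python option, not MLC-specific), which dumps the Python-level stack at the point of the fault. For a crash purely inside native code it may add little, but it can narrow down where conversion dies:

```bash
# Same repro with faulthandler enabled; on SIGSEGV, CPython prints the
# Python-level stack before the process exits.
python -X faulthandler -m mlc_llm convert_weight ./InkubaLM-0.4B \
    --quantization q4f16_1 -o output
```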
## Expected Behavior

Successful weight conversion, or a graceful error message if the architecture is unsupported.
## Actual Behavior

Segmentation fault with exit code 139.
## Environment

- OS: macOS 15.0 (Apple Silicon ARM64)
- MLC LLM version: 0.1.dev0
- Python version: 3.10.18 (conda-forge)
- Devices tested: both `cpu:0` and `metal:0`
Attempted Solutions
- ✅ Tried multiple quantization formats:
q4f16_1
,q0f32
- ✅ Tried CPU device instead of Metal
- ✅ Converted to safetensors format first
- ✅ Removed custom
auto_map
from config - ❌ All attempts result in segfault
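The safetensors attempt was roughly the standard transformers round-trip shown below (a minimal sketch, assuming `save_pretrained` with `safe_serialization=True`; the exact steps used may have differed). Note that safetensors does not store aliased tensors, so transformers de-duplicates shared weights on save:

```python
# Sketch of the "convert to safetensors first" attempt: load the original
# checkpoint with the custom code, then re-save as model.safetensors.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./InkubaLM-0.4B", trust_remote_code=True
)
model.save_pretrained("./InkubaLM-0.4B-safetensors", safe_serialization=True)
```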
## Additional Context

The model loads successfully with `transformers.AutoModelForCausalLM.from_pretrained()` using `trust_remote_code=True`, suggesting the issue is specific to MLC's conversion process.
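For comparison, this is the load that succeeds:

```python
# The same checkpoint loads through transformers, including the custom
# VulavulaLlamaForCausalLM class pulled in via auto_map.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "lelapa/InkubaLM-0.4B", trust_remote_code=True
)
print(type(model).__name__)  # VulavulaLlamaForCausalLM
```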
## Log Output

```
[2025-10-02 17:10:06] INFO llama_model.py:63: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
[2025-10-02 17:10:06] INFO llama_model.py:89: prefill_chunk_size defaults to 2048
!!!!!!! Segfault encountered !!!!!!!
zsh: segmentation fault  python -m mlc_llm convert_weight InkubaLM-0.4B --quantization
```
## Impact

This prevents users from converting LelapaAI's InkubaLM-0.4B African language model to MLC format.