Conversation

atebites-hub

No description provided.

- Upgraded TVM submodule to FFI bump commit (f68651f035)
- Fixed script printer namespace mismatch (node->script)
- Added conditional script printer imports with dummy fallbacks
- Resolved CMake compatibility issues in tokenizers-cpp submodules
- Added mlc_llm console script entry point to pyproject.toml
- Established virtual environment isolation for clean builds
- TVM v0.22 now imports successfully without errors (see the smoke test after this list)
- MLC-LLM CLI functional with TVM v0.22 backend
- Ready for Phase 2: DLPack Type System Migration
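
A quick smoke test run inside the clean virtual environment confirms the import; it only assumes the installed package reports its version via the standard tvm.__version__ attribute.

```python
# Run inside the isolated virtual environment after installing the TVM
# Python package separately (see "Technical fixes" below).
import numpy as np
import tvm

print(tvm.__version__)                            # expect a 0.22-series version string
x = tvm.nd.array(np.arange(4, dtype="float32"))   # basic runtime sanity check
print(x)
```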

Technical fixes:
- C++: script_printer.cc namespace registration fix
- Python: Optional Scriptable import with comprehensive fallback (see the sketch after this list)
- Build: TVM Python package separate installation requirement
- Environment: Virtual environment isolation for reproducibility
- Updated tokenizers-cpp to commit 405aa4fa
- Updated TVM to commit 52a49c82
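
The guarded Scriptable import mentioned above looks roughly like the sketch below; it assumes the upstream symbol is exposed as tvm.runtime.Scriptable, and the fallback body is an illustrative stand-in rather than the exact code in the patch.

```python
# Conditional script printer import with a dummy fallback.
# Assumption: upstream exposes `tvm.runtime.Scriptable`; the fallback class
# is only a sketch of the idea, not the exact patched code.
try:
    from tvm.runtime import Scriptable
except ImportError:

    class Scriptable:
        """Minimal stand-in used when TVM was built without the script printer."""

        def script(self, *args, **kwargs):
            raise RuntimeError(
                "The TVM script printer is unavailable in this build; "
                ".script() cannot be used in this environment."
            )
```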

Hotfix for the TVM C++ dependency.
Phase 1 completed basic TVM v0.22 integration, but model compilation
fails with a segmentation fault during the convert_weight operation.

Root Cause: DLPack type system incompatibility
- TVM v0.22 changed DLTensor → DLNDArray
- TVM v0.22 changed DLManagedTensor → DLManagedNDArray
- MLC-LLM still uses old DLPack types for tensor operations

Impact: Cannot compile Gemma-3-270M or any models
Solution: Phase 2 DLPack migration required immediately
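
One way to probe this from Python is a DLPack round trip between NumPy and the installed TVM. The sketch below assumes the pre-bump API names NDArray.to_dlpack() and tvm.nd.from_dlpack() are still exposed, which is exactly what a DLPack type-system change would break.

```python
# DLPack round-trip probe (assumes the pre-FFI-bump Python API names
# `NDArray.to_dlpack()` / `tvm.nd.from_dlpack()` are still present).
import numpy as np
import tvm

x = tvm.nd.array(np.arange(8, dtype="float16"))
capsule = x.to_dlpack()           # export as a DLPack capsule
y = tvm.nd.from_dlpack(capsule)   # re-import through DLPack
assert np.array_equal(x.numpy(), y.numpy())
```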

Validation: the refactor.md complexity assessment was accurate;
Phase 1 alone is insufficient for full functionality.
… not DLPack

Root cause identified:
- Segfault occurs during TIR static initialization for Gemma3 sliding window attention
- NOT a DLPack type incompatibility, as initially assumed
- Issue is in TIR code generation for sliding window attention mechanisms
- Confirmed: Even q0f16 (no quantization) still segfaults
- Hypothesis: Missing TIR bitwise operations using powers of 2 for sliding window masks

Phase 1: ✅ TVM basic integration successful
Phase 2: 🔴 TIR sliding window operations required (not DLPack migration)

User insight: 'bitwise stuff happens in quantization' is correct, but the issue is
broader: TIR generation for sliding-window attention patterns fails. The mask
pattern in question is sketched below.
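
To make the hypothesis concrete, the sketch below shows a causal sliding-window mask in plain NumPy together with the power-of-two bitwise identity such a lowering can rely on. This is an illustration of the pattern only, not the generated TIR, and the window size 512 is just an example value.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window mask: query i attends to key j iff 0 <= i - j < window."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (i - j >= 0) & (i - j < window)

print(sliding_window_mask(6, 3).astype(int))

# The power-of-two trick: when `window` is a power of two, the modulo used for
# ring-buffer (sliding-window) KV-cache indexing lowers to a single bitwise AND.
window = 512                                    # example value, a power of two
assert (window & (window - 1)) == 0             # power-of-two check
pos = 1234
assert (pos % window) == (pos & (window - 1))   # equivalent index computations
```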