Add Qwen3-VL vision-language model support by nyo16 · Pull Request #442 · elixir-nx/bumblebee

nyo16 · 2026-01-07T00:07:28Z

Summary

This PR adds full support for the Qwen3-VL vision-language model family, enabling image-to-text generation with Bumblebee.

Model: Qwen/Qwen3-VL-2B-Instruct (and other sizes)
Architecture: Qwen3VLForConditionalGeneration

Features

Vision Encoder (`Bumblebee.Vision.Qwen3VLVision`)

3D convolution patch embedding (supports video temporal dimension)
2D spatial rotary position embeddings for accurate spatial understanding
Bilinear interpolation for position embeddings (handles variable image sizes)
Patch merger with spatial reduction (2x2 → 1)
DeepStack feature extraction from layers [5, 11, 17]

Text Decoder

Based on Qwen3 architecture with QK-norm
Visual token substitution (replaces image placeholder tokens with vision embeddings)
DeepStack injection at decoder layers [0, 1, 2]
Full rotary position embedding support

Featurizer (`Bumblebee.Vision.Qwen3VLFeaturizer`)

Image preprocessing with configurable resize
Automatic padding to patch-aligned dimensions
Support for both images and video frames
Outputs flattened patches: {num_patches, channels * temporal * patch_h * patch_w}

DeepStack Implementation

DeepStack provides multi-scale visual information by:

Extracting hidden states from vision encoder layers [5, 11, 17] (1-indexed)
Passing each through separate merger MLPs with postshuffle norm (norm AFTER spatial merge)
Injecting features into text decoder at layers [0, 1, 2]
Formula: hidden_states[visual_mask] += deepstack_features[layer_idx]
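
As an illustration of the formula above, here is a minimal sketch of masked feature addition in plain Elixir lists. It is not the PR's Nx implementation; the module name `DeepStackSketch` and the function `inject/3` are hypothetical, and the real code operates on tensors rather than nested lists.

```elixir
defmodule DeepStackSketch do
  # Adds one feature vector to each hidden state whose position is marked
  # as visual; text positions pass through unchanged.
  #   hidden_states: list of vectors (lists of floats), one per token
  #   visual_mask:   list of booleans, same length as hidden_states
  #   features:      list of vectors, one per visual token, in order
  def inject(hidden_states, visual_mask, features) do
    {out, _unused} =
      hidden_states
      |> Enum.zip(visual_mask)
      |> Enum.map_reduce(features, fn
        {h, true}, [f | rest] -> {add(h, f), rest}
        {h, false}, rest -> {h, rest}
      end)

    out
  end

  # Element-wise sum of two equally sized vectors.
  defp add(a, b), do: Enum.zip_with(a, b, &(&1 + &2))
end

hidden = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mask = [false, true, true]
feats = [[0.5, 0.5], [0.25, 0.25]]

IO.inspect(DeepStackSketch.inject(hidden, mask, feats))
# prints [[1.0, 1.0], [2.5, 2.5], [3.25, 3.25]]
```

Only the two masked positions change, which mirrors how the visual mask restricts the addition to image token positions.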

Infrastructure Changes

Added post_block_hook option to Layers.Transformer.blocks for per-layer injection
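
The idea of a per-layer hook can be sketched independently of Axon. The snippet below is a hypothetical illustration, not the actual signature of the `post_block_hook` option added to `Layers.Transformer.blocks` (the PR text does not show it); `HookSketch.run_blocks/3` and its argument order are assumptions made for the example.

```elixir
defmodule HookSketch do
  # Runs `blocks` (a list of 1-arity functions) in order. After each block,
  # `post_block_hook` receives the block index and the block output, and
  # returns the (possibly modified) state handed to the next block.
  def run_blocks(state, blocks, post_block_hook \\ fn _idx, s -> s end) do
    blocks
    |> Enum.with_index()
    |> Enum.reduce(state, fn {block, idx}, acc ->
      post_block_hook.(idx, block.(acc))
    end)
  end
end

# Four toy "layers" that each double their input.
blocks = List.duplicate(&(&1 * 2), 4)

# Inject an extra value only after layers 0, 1 and 2.
hook = fn idx, s -> if idx in [0, 1, 2], do: s + 1, else: s end

IO.inspect(HookSketch.run_blocks(1, blocks, hook))
# prints 30
```

Keeping the hook optional, with an identity default, means existing callers of the block stack are unaffected.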

Files Changed

New Files

lib/bumblebee/multimodal/qwen3_vl.ex - Main VL model
lib/bumblebee/vision/qwen3_vl_vision.ex - Vision encoder
lib/bumblebee/vision/qwen3_vl_featurizer.ex - Image preprocessing
test/bumblebee/multimodal/qwen3_vl_test.exs - Tests
notebooks/qwen3_vl.livemd - Usage examples

Modified Files

lib/bumblebee.ex - Model/featurizer registrations
lib/bumblebee/layers/transformer.ex - Added post_block_hook option

Test Results

# Unit test with tiny model
mix test test/bumblebee/multimodal/qwen3_vl_test.exs
1 test, 0 failures

# Real model test with image (448x448)
Image: 448x448x3
Patches: 784 → 196 visual tokens (after merge)
Generated: "This image shows a close-up of a single, small, dark-colored object..."
✓ DeepStack injection test with real image PASSED!
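
The patch counts reported above can be reproduced with simple arithmetic. The sketch below assumes a patch size of 16 (inferred from 784 = 28 x 28 patches on a 448 x 448 image), a 2 x 2 merge, and a temporal patch size of 2; the module `PatchMath` is hypothetical and the real values come from the model configuration.

```elixir
defmodule PatchMath do
  # Number of spatial patches for an image whose sides are multiples of patch_size.
  def num_patches(height, width, patch_size) do
    div(height, patch_size) * div(width, patch_size)
  end

  # Visual tokens left after merging each merge_size x merge_size group into one.
  def num_visual_tokens(height, width, patch_size, merge_size) do
    div(num_patches(height, width, patch_size), merge_size * merge_size)
  end

  # Length of one flattened patch: channels * temporal * patch_h * patch_w.
  def patch_dim(channels, temporal, patch_size) do
    channels * temporal * patch_size * patch_size
  end
end

IO.inspect(PatchMath.num_patches(448, 448, 16))          # prints 784
IO.inspect(PatchMath.num_visual_tokens(448, 448, 16, 2)) # prints 196
IO.inspect(PatchMath.patch_dim(3, 2, 16))                # prints 1536
```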

Usage Example

{:ok, model_info} = Bumblebee.load_model({:hf, "Qwen/Qwen3-VL-2B-Instruct"}, type: :bf16)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-VL-2B-Instruct"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Qwen/Qwen3-VL-2B-Instruct"},
  module: Bumblebee.Vision.Qwen3VLFeaturizer)

# Load and process image
image = StbImage.read_file!("photo.jpg")
image_inputs = Bumblebee.apply_featurizer(featurizer, image)

# Build prompt with image placeholder
prompt = """
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>
<|im_start|>assistant
"""

text_inputs = Bumblebee.apply_tokenizer(tokenizer, prompt)
inputs = Map.merge(text_inputs, image_inputs)

# Run inference
outputs = Axon.predict(model_info.model, model_info.params, inputs)

Parameter Loading

All parameters load correctly with no warnings:

Vision encoder: patch_embed, pos_embed, blocks.{0-23}, merger
DeepStack mergers: deepstack_merger_list.{0-2} (9 params total)
Text decoder: embedder, decoder.blocks.{0-27}, output_norm, lm_head


Add support for Qwen3-VL/Qwen2-VL vision-language models with:

- Multimodal model (lib/bumblebee/multimodal/qwen3_vl.ex):
  - Combines vision encoder with Qwen3 text decoder
  - Visual embedding substitution (replaces image/video tokens)
  - Supports both image and video inputs via temporal dimension
  - Uses Qwen3 text model as decoder backbone
- Vision encoder (lib/bumblebee/vision/qwen3_vl_vision.ex):
  - Patch embedding with 3D conv support (temporal + spatial)
  - Uses Layers.Transformer.blocks/2 as per best practices
  - Spatial patch merger with MLP projection
  - Rotary position embeddings (no learned pos embeds)
- Featurizer (lib/bumblebee/vision/qwen3_vl_featurizer.ex):
  - Image and video preprocessing
  - Temporal dimension handling for video frames
  - Bicubic resize and normalization
- Registrations in bumblebee.ex:
  - Qwen2VLForConditionalGeneration architecture
  - Qwen3VLForConditionalGeneration architecture
  - Featurizer and tokenizer mappings

Test outputs match Python reference values to 4 decimal places.

Note: Test is marked @Skip pending upload of tiny-random checkpoint to bumblebee-testing HuggingFace organization.

- Remove "model." prefix from text model HF paths since the loader infers and adds this prefix automatically
- Fix vision encoder FFN layer names (fc1/fc2 -> linear_fc1/linear_fc2)
- Fix vision merger layer names to match Qwen3VL checkpoint structure
- Re-enable QK-norm for text model (Qwen3-VL does use it, unlike Qwen2VL)

The model now loads correctly with all text and vision encoder parameters properly mapped. Only DeepStack merger and position embedding params remain unused (expected - these are optional features).

- Fix process_frame argument order (frame, featurizer) to match pipe usage
- Add automatic image resizing to dimensions compatible with patch_size * merge_size
- Handle different size config formats (height/width vs shortest_edge)
- Update batch_template to handle various size formats

Note: Vision encoder currently requires square images. Non-square support needs grid dimension tracking in patch merger.
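
Resizing to dimensions compatible with patch_size * merge_size amounts to snapping each side to a multiple of that product. The following is a rough sketch under assumptions: it rounds to the nearest multiple, whereas the PR may round up or down, and `AlignSketch` with its functions is a hypothetical name, not the featurizer's code.

```elixir
defmodule AlignSketch do
  # Rounds a side length to the nearest multiple of patch_size * merge_size,
  # never going below one full unit.
  def align(side, patch_size, merge_size) do
    unit = patch_size * merge_size
    max(unit, round(side / unit) * unit)
  end

  # Applies the same alignment to both sides of a {height, width} pair.
  def aligned_size({height, width}, patch_size, merge_size) do
    {align(height, patch_size, merge_size), align(width, patch_size, merge_size)}
  end
end

IO.inspect(AlignSketch.aligned_size({450, 300}, 16, 2))
# prints {448, 288}
```

With a patch size of 16 and a merge size of 2 (both assumed here), every aligned side is a multiple of 32, so the patch grid always divides evenly into 2 x 2 merge groups.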

…n encoder

The vision encoder was producing incorrect image descriptions because it used 1D sequential positions for rotary embedding instead of 2D spatial coordinates.

Changes:
- Implement compute_2d_rotary_embedding/4 that computes separate row and column frequencies for each patch based on its grid position
- Create custom vision_transformer_blocks/5 with 2D rotary support since Layers.Transformer.blocks only supports 1D positions
- Add vision_attention_with_2d_rotary/5 for self-attention with 2D rotary
- Implement apply_2d_rotary_embedding/4, split_rotary/2, rotate_half/1
- Add bilinear interpolation for learned position embeddings to match Python's fast_pos_embed_interpolate (48x48 grid to actual grid size)
- Update parameter mapping for new layer names

The fix ensures the vision encoder correctly captures spatial relationships between image patches, producing descriptions that match Python's output.
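
To make the 1D versus 2D distinction concrete, here is a small sketch of how rotary angles could be derived from a patch's row and column rather than from its flat index. It is an illustration only: `Rotary2DSketch`, the base of 10000, and the half-row, half-column layout are assumptions and do not reproduce compute_2d_rotary_embedding/4 from the PR.

```elixir
defmodule Rotary2DSketch do
  # Row/column coordinates of every patch in a grid, in row-major order.
  def grid_positions(rows, cols) do
    for r <- 0..(rows - 1), c <- 0..(cols - 1), do: {r, c}
  end

  # Rotation angles for one patch: the first half of the angle slots is
  # driven by the row index, the second half by the column index.
  # `dim` is the number of angle slots (assumed to be even).
  def angles({row, col}, dim, base \\ 10_000.0) do
    half = div(dim, 2)
    inv_freq = for i <- 0..(half - 1), do: 1.0 / :math.pow(base, i / half)
    Enum.map(inv_freq, &(row * &1)) ++ Enum.map(inv_freq, &(col * &1))
  end
end

IO.inspect(Rotary2DSketch.grid_positions(2, 2))
# prints [{0, 0}, {0, 1}, {1, 0}, {1, 1}]

IO.inspect(Rotary2DSketch.angles({2, 3}, 4))
```

Two patches in the same column but different rows share their column angles and differ only in their row angles, which a purely sequential 1D position cannot express.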

- Fix vision config loader to handle both embed_dim (Qwen2-VL) and hidden_size (Qwen3-VL) config formats
- Also read intermediate_size directly from config when available
- Update test with correct reference values from Python (transformers 4.57.3)


- Add deepstack_merger function to vision encoder with postshuffle norm
- Extract hidden states from encoder layers and pass through mergers
- Add post_block_hook option to Layers.Transformer.blocks for injection
- Document DeepStack decoder injection as TODO (not critical for function)

- Build text decoder directly to enable post_block_hook usage
- Extract deepstack features from vision encoder output
- Create visual position mask from image/video token IDs
- Inject deepstack features at text decoder layers 0, 1, 2
- Add gated_ffn helper function for Qwen3 architecture

DeepStack adds multi-scale visual information by:
1. Extracting hidden states from vision encoder layers [5, 11, 17]
2. Passing through separate merger MLPs (postshuffle norm)
3. Adding features to visual token positions in decoder layers

- Updated task description to specify Bumblebee Qwen3-VL integration
- Reference: elixir-nx/bumblebee#442
- PR adds full Qwen3-VL vision-language model support with DeepStack


nyo16 added 11 commits January 5, 2026 22:03

Enable Qwen3-VL test with tiny model from HuggingFace

35479af

- Remove @tag :skip from test
- Use roulis/tiny-random-Qwen3VLForConditionalGeneration checkpoint
- Test validates text-only inference matches Python reference values

Remove Qwen2-VL mappings (not tested, different param naming)

c805c32

Qwen2-VL uses different parameter names (mlp.fc1 vs mlp.linear_fc1) so the current implementation only supports Qwen3-VL.

Add Qwen3-VL Livebook with examples and test documentation

b07ac6b

- Interactive example for image description with Qwen3-VL
- Python code to generate tiny test model
- Reference values comparison table (Python vs Elixir)
- Implementation notes on 2D spatial rotary embeddings

Remove appendix from Qwen3-VL Livebook

57553e7

nyo16 wants to merge 11 commits into elixir-nx:main from nyo16:feat/qwen3-vl
