Add Qwen3-VL vision-language model support #442

Open
nyo16 wants to merge 11 commits into elixir-nx:main from nyo16:feat/qwen3-vl

nyo16 (Contributor) commented on Jan 7, 2026

Summary

This PR adds full support for the Qwen3-VL vision-language model family, enabling image-to-text generation with Bumblebee.

Model: Qwen/Qwen3-VL-2B-Instruct (and other sizes)
Architecture: Qwen3VLForConditionalGeneration

Features

Vision Encoder (Bumblebee.Vision.Qwen3VLVision)

  • 3D convolution patch embedding (supports video temporal dimension)
  • 2D spatial rotary position embeddings for accurate spatial understanding
  • Bilinear interpolation for position embeddings (handles variable image sizes)
  • Patch merger with spatial reduction (2x2 → 1)
  • DeepStack feature extraction from layers [5, 11, 17]
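
The 2x2 → 1 spatial reduction in the patch merger can be pictured with a small plain-Elixir sketch. It is not the code from this PR: the module name `PatchMergeSketch`, the row-major patch ordering, and the use of nested lists instead of Nx tensors are all assumptions made for illustration. It groups each 2x2 block of patch vectors and concatenates them into one vector, which in the real model would then be passed through the merger MLP.

```elixir
defmodule PatchMergeSketch do
  # patches: row-major list of feature vectors for a grid_h x grid_w grid.
  # Returns one concatenated vector per merge x merge block (hypothetical helper).
  def merge(patches, grid_h, grid_w, merge \\ 2)
      when rem(grid_h, merge) == 0 and rem(grid_w, merge) == 0 and
             length(patches) == grid_h * grid_w do
    rows = Enum.chunk_every(patches, grid_w)

    for block_rows <- Enum.chunk_every(rows, merge),
        col <- 0..(div(grid_w, merge) - 1) do
      # Take `merge` neighbouring patches from each row of the block,
      # then concatenate their feature vectors.
      block_rows
      |> Enum.flat_map(&Enum.slice(&1, col * merge, merge))
      |> Enum.concat()
    end
  end
end

# A 2x4 grid of one-element "features" becomes two merged vectors.
patches = for i <- 0..7, do: [i]
IO.inspect(PatchMergeSketch.merge(patches, 2, 4))
```

Merging this way divides the token count by four while multiplying the per-token feature width by four before projection.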

Text Decoder

  • Based on Qwen3 architecture with QK-norm
  • Visual token substitution (replaces image placeholder tokens with vision embeddings)
  • DeepStack injection at decoder layers [0, 1, 2]
  • Full rotary position embedding support
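
As a rough illustration of visual token substitution, the sketch below walks a token sequence and swaps the text embedding at every placeholder position for the next vision embedding. It uses plain lists rather than Nx tensors, and the module name `VisualSubstitution` and the placeholder id `99` are made up for the example; the real token id comes from the model configuration.

```elixir
defmodule VisualSubstitution do
  # token_ids: list of integers; text_embeds: one vector per token;
  # vision_embeds: one vector per placeholder token, in sequence order.
  def substitute(token_ids, text_embeds, vision_embeds, image_token_id) do
    if Enum.count(token_ids, &(&1 == image_token_id)) != length(vision_embeds) do
      raise ArgumentError,
            "number of placeholder tokens must match number of vision embeddings"
    end

    {merged, []} =
      token_ids
      |> Enum.zip(text_embeds)
      |> Enum.map_reduce(vision_embeds, fn
        # Placeholder position: consume the next vision embedding.
        {^image_token_id, _text}, [vision | rest] -> {vision, rest}
        # Ordinary text position: keep the text embedding.
        {_id, text}, remaining -> {text, remaining}
      end)

    merged
  end
end

token_ids = [1, 99, 99, 2]
text_embeds = [[0.1], [0.0], [0.0], [0.2]]
vision_embeds = [[9.0], [8.0]]
IO.inspect(VisualSubstitution.substitute(token_ids, text_embeds, vision_embeds, 99))
```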

Featurizer (Bumblebee.Vision.Qwen3VLFeaturizer)

  • Image preprocessing with configurable resize
  • Automatic padding to patch-aligned dimensions
  • Support for both images and video frames
  • Outputs flattened patches: {num_patches, channels * temporal * patch_h * patch_w}
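
To make the output shape concrete, here is a small arithmetic sketch. The constants (patch size 16, temporal patch size 2, merge size 2, 3 channels) are assumptions chosen to be consistent with the 448x448 figures reported under Test Results; the real values come from the checkpoint's preprocessor config. Rounding each dimension up to a multiple of `patch_size * merge_size` is also an assumption about how alignment is done, and `PatchMath` is a hypothetical module name.

```elixir
defmodule PatchMath do
  # Assumed values for illustration only.
  @patch_size 16
  @temporal_patch_size 2
  @merge_size 2
  @channels 3

  # Round a dimension up to the next multiple of patch_size * merge_size.
  def align(dim) do
    unit = @patch_size * @merge_size
    div(dim + unit - 1, unit) * unit
  end

  def summary(height, width) do
    {h, w} = {align(height), align(width)}
    grid_h = div(h, @patch_size)
    grid_w = div(w, @patch_size)
    num_patches = grid_h * grid_w

    %{
      padded_size: {h, w},
      grid: {grid_h, grid_w},
      num_patches: num_patches,
      # Second dimension of the flattened patches tensor.
      patch_dim: @channels * @temporal_patch_size * @patch_size * @patch_size,
      # Tokens left after the 2x2 patch merger.
      visual_tokens: div(num_patches, @merge_size * @merge_size)
    }
  end
end

IO.inspect(PatchMath.summary(448, 448))
```

Under these assumptions a 448x448 image gives a 28x28 grid, so 784 patches of width 1536 and 196 visual tokens after merging.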

DeepStack Implementation

DeepStack provides multi-scale visual information by:

  1. Extracting hidden states from vision encoder layers [5, 11, 17] (1-indexed)
  2. Passing each through separate merger MLPs with postshuffle norm (norm AFTER spatial merge)
  3. Injecting features into text decoder at layers [0, 1, 2]
  4. Formula: hidden_states[visual_mask] += deepstack_features[layer_idx]
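
The masked addition in step 4 can be sketched without Nx as follows. This is only an illustration of the formula, not the PR's implementation: `DeepStackSketch` is a hypothetical module, hidden states are lists of vectors, and the features for one decoder layer are assumed to be supplied as one vector per visual token in sequence order.

```elixir
defmodule DeepStackSketch do
  # hidden_states: list of vectors, one per token
  # visual_mask: list of booleans, true at visual token positions
  # features: one vector per visual token, in sequence order
  def inject(hidden_states, visual_mask, features) do
    {out, []} =
      hidden_states
      |> Enum.zip(visual_mask)
      |> Enum.map_reduce(features, fn
        # Visual position: add the next DeepStack feature element-wise.
        {hidden, true}, [feature | rest] ->
          {Enum.zip_with(hidden, feature, &(&1 + &2)), rest}

        # Text position: leave the hidden state untouched.
        {hidden, false}, remaining ->
          {hidden, remaining}
      end)

    out
  end
end

hidden = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mask = [false, true, true]
features = [[0.5, 0.5], [1.0, -1.0]]
IO.inspect(DeepStackSketch.inject(hidden, mask, features))
```

Text positions pass through unchanged, so the injection only affects the rows selected by the mask.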

Infrastructure Changes

  • Added post_block_hook option to Layers.Transformer.blocks for per-layer injection
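
The idea behind a per-layer hook can be shown with a toy block loop. This sketch does not reproduce the actual signature of the option in `Layers.Transformer.blocks`, which operates on Axon graph nodes; `BlockLoopSketch`, the `fn index, state -> state end` hook shape, and the integer "state" are all assumptions made to keep the example runnable.

```elixir
defmodule BlockLoopSketch do
  # blocks: list of functions state -> state
  # :post_block_hook: fn block_index, state -> state (identity by default)
  def run(state, blocks, opts \\ []) do
    hook = Keyword.get(opts, :post_block_hook, fn _idx, s -> s end)

    blocks
    |> Enum.with_index()
    |> Enum.reduce(state, fn {block, idx}, acc ->
      # Run the block, then give the hook a chance to modify its output.
      hook.(idx, block.(acc))
    end)
  end
end

blocks = [&(&1 + 1), &(&1 * 2), &(&1 - 3)]

# Inject only after the first two blocks, mirroring "layers [0, 1, ...]".
hook = fn
  idx, s when idx in [0, 1] -> s + 100
  _idx, s -> s
end

IO.inspect(BlockLoopSketch.run(1, blocks))
IO.inspect(BlockLoopSketch.run(1, blocks, post_block_hook: hook))
```

Keeping the hook optional with an identity default means callers that do not need injection are unaffected.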

Files Changed

New Files

  • lib/bumblebee/multimodal/qwen3_vl.ex - Main VL model
  • lib/bumblebee/vision/qwen3_vl_vision.ex - Vision encoder
  • lib/bumblebee/vision/qwen3_vl_featurizer.ex - Image preprocessing
  • test/bumblebee/multimodal/qwen3_vl_test.exs - Tests
  • notebooks/qwen3_vl.livemd - Usage examples

Modified Files

  • lib/bumblebee.ex - Model/featurizer registrations
  • lib/bumblebee/layers/transformer.ex - Added post_block_hook option

Test Results

# Unit test with tiny model
mix test test/bumblebee/multimodal/qwen3_vl_test.exs
1 test, 0 failures

# Real model test with image (448x448)
Image: 448x448x3
Patches: 784 → 196 visual tokens (after merge)
Generated: "This image shows a close-up of a single, small, dark-colored object..."
✓ DeepStack injection test with real image PASSED!

Usage Example

{:ok, model_info} = Bumblebee.load_model({:hf, "Qwen/Qwen3-VL-2B-Instruct"}, type: :bf16)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-VL-2B-Instruct"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Qwen/Qwen3-VL-2B-Instruct"},
  module: Bumblebee.Vision.Qwen3VLFeaturizer)

# Load and process image
image = StbImage.read_file!("photo.jpg")
image_inputs = Bumblebee.apply_featurizer(featurizer, image)

# Build prompt with image placeholder
prompt = """
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>
<|im_start|>assistant
"""

text_inputs = Bumblebee.apply_tokenizer(tokenizer, prompt)
inputs = Map.merge(text_inputs, image_inputs)

# Run inference
outputs = Axon.predict(model_info.model, model_info.params, inputs)

Parameter Loading

All parameters load correctly with no warnings:

  • Vision encoder: patch_embed, pos_embed, blocks.{0-23}, merger
  • DeepStack mergers: deepstack_merger_list.{0-2} (9 params total)
  • Text decoder: embedder, decoder.blocks.{0-27}, output_norm, lm_head

nyo16 added 11 commits January 5, 2026 22:03
Add support for Qwen3-VL/Qwen2-VL vision-language models with:

- Multimodal model (lib/bumblebee/multimodal/qwen3_vl.ex):
  - Combines vision encoder with Qwen3 text decoder
  - Visual embedding substitution (replaces image/video tokens)
  - Supports both image and video inputs via temporal dimension
  - Uses Qwen3 text model as decoder backbone

- Vision encoder (lib/bumblebee/vision/qwen3_vl_vision.ex):
  - Patch embedding with 3D conv support (temporal + spatial)
  - Uses Layers.Transformer.blocks/2 as per best practices
  - Spatial patch merger with MLP projection
  - Rotary position embeddings (no learned pos embeds)

- Featurizer (lib/bumblebee/vision/qwen3_vl_featurizer.ex):
  - Image and video preprocessing
  - Temporal dimension handling for video frames
  - Bicubic resize and normalization

- Registrations in bumblebee.ex:
  - Qwen2VLForConditionalGeneration architecture
  - Qwen3VLForConditionalGeneration architecture
  - Featurizer and tokenizer mappings

Test outputs match Python reference values to 4 decimal places.

Note: Test is marked @tag :skip pending upload of tiny-random checkpoint
to bumblebee-testing HuggingFace organization.

- Remove "model." prefix from text model HF paths since the loader
  infers and adds this prefix automatically
- Fix vision encoder FFN layer names (fc1/fc2 -> linear_fc1/linear_fc2)
- Fix vision merger layer names to match Qwen3VL checkpoint structure
- Re-enable QK-norm for text model (Qwen3-VL does use it, unlike Qwen2VL)

The model now loads correctly with all text and vision encoder parameters
properly mapped. Only DeepStack merger and position embedding params remain
unused (expected - these are optional features).

- Fix process_frame argument order (frame, featurizer) to match pipe usage
- Add automatic image resizing to dimensions compatible with patch_size * merge_size
- Handle different size config formats (height/width vs shortest_edge)
- Update batch_template to handle various size formats

Note: Vision encoder currently requires square images. Non-square support
needs grid dimension tracking in patch merger.

…n encoder

The vision encoder was producing incorrect image descriptions because it
used 1D sequential positions for rotary embedding instead of 2D spatial
coordinates.

Changes:
- Implement compute_2d_rotary_embedding/4 that computes separate row and
  column frequencies for each patch based on its grid position
- Create custom vision_transformer_blocks/5 with 2D rotary support since
  Layers.Transformer.blocks only supports 1D positions
- Add vision_attention_with_2d_rotary/5 for self-attention with 2D rotary
- Implement apply_2d_rotary_embedding/4, split_rotary/2, rotate_half/1
- Add bilinear interpolation for learned position embeddings to match
  Python's fast_pos_embed_interpolate (48x48 grid to actual grid size)
- Update parameter mapping for new layer names
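
A minimal sketch of bilinear resampling of a learned position grid, written with plain lists. It assumes sample points are spread evenly over the source grid with both endpoints included, and it uses one number per grid cell where the real model has an embedding vector; `PosEmbedInterp` is a hypothetical module, not the function added in this commit.

```elixir
defmodule PosEmbedInterp do
  # Sample points spread evenly over [0, src - 1], endpoints included.
  def sample_points(_src, 1), do: [0.0]
  def sample_points(src, dst), do: for(i <- 0..(dst - 1), do: i * (src - 1) / (dst - 1))

  # grid: list of rows, each row a list of numbers (src x src).
  def interpolate(grid, dst_h, dst_w) do
    src = length(grid)

    for y <- sample_points(src, dst_h) do
      for x <- sample_points(src, dst_w) do
        {y0, x0} = {trunc(y), trunc(x)}
        {y1, x1} = {min(y0 + 1, src - 1), min(x0 + 1, src - 1)}
        {dy, dx} = {y - y0, x - x0}

        # Weighted sum of the four surrounding grid cells.
        at(grid, y0, x0) * (1 - dy) * (1 - dx) +
          at(grid, y0, x1) * (1 - dy) * dx +
          at(grid, y1, x0) * dy * (1 - dx) +
          at(grid, y1, x1) * dy * dx
      end
    end
  end

  defp at(grid, y, x), do: grid |> Enum.at(y) |> Enum.at(x)
end

IO.inspect(PosEmbedInterp.interpolate([[0.0, 2.0], [4.0, 6.0]], 3, 3))
```

The same weights would apply to a 48x48 source grid; only the source size and the target grid size change.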

The fix ensures the vision encoder correctly captures spatial relationships
between image patches, producing descriptions that match Python's output.
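
The difference between 1D and 2D rotary positions can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: the layout (row frequencies, then column frequencies, repeated to fill the head dimension), the base `theta` of 10000, and the module name `Rotary2DSketch` are all assumed, and a head dimension divisible by 4 is required.

```elixir
defmodule Rotary2DSketch do
  # Inverse frequencies for head_dim / 4 channels; theta is an assumed base.
  def inv_freqs(head_dim, theta \\ 10_000.0) do
    quarter = div(head_dim, 4)
    for i <- 0..(quarter - 1), do: 1.0 / :math.pow(theta, i / quarter)
  end

  # Angles for one patch: row angles, then column angles, with the pair
  # repeated so the result has head_dim entries.
  def angles(row, col, head_dim) do
    freqs = inv_freqs(head_dim)
    half = Enum.map(freqs, &(row * &1)) ++ Enum.map(freqs, &(col * &1))
    half ++ half
  end

  # rotate_half(a ++ b) = -b ++ a, where a and b are the two halves of x.
  def rotate_half(x) do
    {a, b} = Enum.split(x, div(length(x), 2))
    Enum.map(b, &(-&1)) ++ a
  end

  # Rotate one query or key vector according to its grid position.
  def apply_rotary(x, row, col) do
    [x, rotate_half(x), angles(row, col, length(x))]
    |> Enum.zip_with(fn [v, r, t] -> v * :math.cos(t) + r * :math.sin(t) end)
  end

  # Row-major {row, col} coordinates for every patch in a grid.
  def grid_positions(grid_h, grid_w) do
    for r <- 0..(grid_h - 1), c <- 0..(grid_w - 1), do: {r, c}
  end
end

x = [1.0, 2.0, 3.0, 4.0]

for {row, col} <- Rotary2DSketch.grid_positions(2, 2) do
  IO.inspect({row, col, Rotary2DSketch.apply_rotary(x, row, col)})
end
```

With 1D positions, patches directly above and below each other would be a full row width apart; with `{row, col}` coordinates they differ only in the row component.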
- Fix vision config loader to handle both embed_dim (Qwen2-VL) and
  hidden_size (Qwen3-VL) config formats
- Also read intermediate_size directly from config when available

- Update test with correct reference values from Python (transformers 4.57.3)
- Remove @tag :skip from test
- Use roulis/tiny-random-Qwen3VLForConditionalGeneration checkpoint
- Test validates text-only inference matches Python reference values

Qwen2-VL uses different parameter names (mlp.fc1 vs mlp.linear_fc1)
so the current implementation only supports Qwen3-VL.

- Interactive example for image description with Qwen3-VL
- Python code to generate tiny test model
- Reference values comparison table (Python vs Elixir)
- Implementation notes on 2D spatial rotary embeddings

- Add deepstack_merger function to vision encoder with postshuffle norm
- Extract hidden states from encoder layers and pass through mergers
- Add post_block_hook option to Layers.Transformer.blocks for injection
- Document DeepStack decoder injection as TODO (not critical for function)

- Build text decoder directly to enable post_block_hook usage
- Extract deepstack features from vision encoder output
- Create visual position mask from image/video token IDs
- Inject deepstack features at text decoder layers 0, 1, 2
- Add gated_ffn helper function for Qwen3 architecture

DeepStack adds multi-scale visual information by:
1. Extracting hidden states from vision encoder layers [5, 11, 17]
2. Passing through separate merger MLPs (postshuffle norm)
3. Adding features to visual token positions in decoder layers

fire added a commit to V-Sekai-fire/elixir-forge that referenced this pull request Feb 5, 2026
- Updated task description to specify Bumblebee Qwen3-VL integration
- Reference: elixir-nx/bumblebee#442
- PR adds full Qwen3-VL vision-language model support with DeepStack

fire added a commit to V-Sekai-fire/elixir-forge that referenced this pull request Feb 5, 2026
- Convert 6 .exs files to separate Mix apps in apps/ directory:
  * qwen_image_edit_plus - AI image editing
  * qwen3vl_inference - Vision-language inference
  * service_dashboard - Zenoh service monitoring
  * tris_to_quads_livebook - 3D mesh optimization
  * unirig_generation - Character rigging
  * zimage_generation - Fast image generation

- Each app has proper mix.exs, application supervision, CLI interface
- Remove original elixir/ directory after successful migration
- All apps compile successfully with correct dependencies

Complete app isolation and QA

- Move thirdparty dependencies into individual app directories:
  * Optimized-Tris-to-Quads-Converter/ -> tris_to_quads_livebook/thirdparty/
  * UniRig/ -> unirig_generation/thirdparty/
  * RobustSkinWeightsTransferCode/ -> unirig_generation/thirdparty/
  * meshoptimizer/ -> tris_to_quads_livebook/thirdparty/

- Update .gitignore with app-specific thirdparty patterns
- Restore functionality in unirig_generation and tris_to_quads_livebook
- Fix dependency issues (removed zenoh from tris_to_quads_livebook)
- All 6 migrated apps compile successfully
- Apps are now fully isolated with their own dependencies

Move Pythonx uv_init configurations to Elixir config files

- Add config/config.exs files for all apps using Pythonx
- Configure Pythonx.uv_init in Elixir config instead of hardcoded strings
- Remove manual Pythonx.uv_init() calls from application code
- Python environments now initialize automatically at compile time

Apps updated:
- zimage_generation (diffusers/torch)
- tris_to_quads_livebook (Blender/pulp)
- qwen3vl_inference (transformers/torch)
- qwen_image_edit_plus (diffusers/torch)

Convert main functions to Mix tasks for all apps

- Create lib/mix/tasks/ for each app with main/1 function
- Move main logic to Mix.Task.run/1 functions
- Update main modules to call Mix tasks for backward compatibility
- Fix compiler warnings for unused parameters

Apps converted:
- zimage_generation: mix zimage_generation
- tris_to_quads_livebook: mix tris_to_quads_livebook
- qwen3vl_inference: mix qwen3vl_inference
- qwen_image_edit_plus: mix qwen_image_edit_plus

All tasks tested and working. Apps can now be run with standard Mix task syntax.

Add Pythonx uv_init config and Mix task for unirig_generation

- Create config/config.exs with UniRig dependencies (torch, bpy, etc.)
- Move Pythonx.uv_init configuration from hardcoded string to Elixir config
- Create lib/mix/tasks/unirig_generation.ex Mix task
- Update main module to call Mix task for backward compatibility

Now all apps using Pythonx follow the same pattern:
- Python environments initialize automatically via config
- Apps can be run with mix <app_name> tasks
- Backward compatibility maintained with main/1 functions

Reorganize apps/ directory into apps_forge/ and apps_tools/

- Create apps_forge/ for core forge infrastructure apps:
  - forge-client (main client)
  - livebook_executor (Livebook execution service)
  - ra_mailbox (RA mailbox service)
  - service_dashboard (service dashboard)

- Create apps_tools/ for AI/ML tool apps (former exs scripts):
  - qwen_image_edit_plus (image editing)
  - qwen3vl_inference (vision-language inference)
  - tris_to_quads_livebook (3D mesh optimization)
  - unirig_generation (3D rigging)
  - zimage_generation (image generation)

- Remove empty apps/ directory
- All apps tested and working after reorganization
- Git tracked all moves properly as renames

This provides clear separation between core forge infrastructure and AI/ML tools.

Implement Z-Image-Turbo image generation

- Add ZImagePipeline from diffusers for AI image generation
- Load Tongyi-MAI/Z-Image-Turbo model from Hugging Face
- Configure CPU usage to avoid GPU memory issues
- Generate images from text prompts with configurable parameters
- Save generated images as PNG files
- Add proper error handling and logging
- Enable memory efficient attention slicing

Add schedule database files to docs/

- docs/schedule.db: Main schedule database with task tracking
- docs/schedule_archived.db: Archived completed tasks with timestamps
- docs/schedule_completed.db: Completed tasks with completion timestamps

All databases contain empty tables with preserved schemas for task management system.

Fix app output handling and UniRig configuration

- Fix zimage_generation to save outputs in output/ directory (git-ignored)
- Fix unirig_generation to use FBX output format for skeleton generation
- Remove obsolete unirig_generation.ex compatibility file

Prevents generated files from being accidentally committed to git.

Add migration task to schedule database

- Added task 'migrate-apps-tools-001': Migrate completely apps_tools to elixir apps and quality assure so they work
- Feature: App Migration
- Priority: 3 (medium)
- Estimated hours: 16
- Status: Ready to start (0 hours elapsed)

Split migration task into individual app tasks

Replaced single 'migrate-apps-tools-001' task with 5 individual tasks:
- migrate-qwen-image-edit-plus: 4 hours
- migrate-qwen3vl-inference: 6 hours
- migrate-tris-to-quads-livebook: 4 hours
- migrate-unirig-generation: 8 hours
- migrate-zimage-generation: 4 hours

Total estimated hours: 26 (up from 16)
Each app can now be tracked and migrated independently.

Assign unique priority levels to migration tasks

Updated priority levels to ensure no overlaps:
- migrate-unirig-generation: 5 (highest - most complex 3D rigging)
- migrate-qwen3vl-inference: 4 (high - complex inference)
- migrate-zimage-generation: 3 (medium - image generation)
- migrate-qwen-image-edit-plus: 2 (medium-low - image editing)
- migrate-tris-to-quads-livebook: 1 (lowest - utility conversion)

Priorities now range from 1-5 with no duplicates for clear task ordering.

Split each app migration into separate migration and QA tasks

Expanded from 5 to 10 tasks total:
- 5 Migration tasks: migrate-[app] (structural changes)
- 5 QA tasks: qa-[app] (testing and validation)

Time allocation:
- Migration: 75% of original estimate (actual code changes)
- QA: 25% of original estimate (testing and bug fixes)

Total hours remain 26, but now with clearer separation of concerns.
Migration tasks must complete before corresponding QA tasks.

Assign unique priority levels 1-10 to all migration and QA tasks

Priority assignment (10=highest, 1=lowest):
10: migrate-unirig-generation (most complex migration)
9: qa-unirig-generation (QA for most complex)
8: migrate-qwen3vl-inference
7: qa-qwen3vl-inference
6: migrate-zimage-generation
5: qa-zimage-generation
4: migrate-qwen-image-edit-plus
3: qa-qwen-image-edit-plus
2: migrate-tris-to-quads-livebook
1: qa-tris-to-quads-livebook (least complex)

Ensures clear task ordering with no priority conflicts.

Mark zimage_generation migration and QA tasks as completed

Updated based on git commit history (2026-02-04):
- migrate-zimage-generation: 3 hours completed (curr_est = elapsed = 3)
- qa-zimage-generation: 1 hour completed (curr_est = elapsed = 1)

Both tasks now show 0 remaining hours.
Total project remaining: 22 hours (down from 26).

Move completed zimage_generation tasks to schedule_completed

Moved completed tasks from active schedule to completed archive:
- migrate-zimage-generation (3 hours completed)
- qa-zimage-generation (1 hour completed)

Both marked with completion timestamp: 2026-02-05T00:26:41Z
Active schedule now has 8 remaining tasks (22 hours total).

Complete QA for tris_to_quads_livebook app

- Fixed unused variable warning in tris_to_quads_livebook.ex
- Verified app functionality: mesh optimization and MIME handling work correctly
- Marked qa-tris-to-quads-livebook as completed (1 hour)
- Moved completed task to schedule_completed with timestamp 2026-02-05T00:27:57Z

Active tasks remaining: 7 (21 hours total)
Completed tasks: 3 (5 hours total)

Complete directory rename: tris_to_quads_livebook → tris_to_quads

- Renamed directory using git mv (preserves history)
- Updated all Elixir module names: TrisToQuadsLivebook → TrisToQuads
- Updated mix.exs app name and module references
- Updated config.exs Python project name
- Updated schedule database task IDs in both active and completed tables
- Renamed files: tris_to_quads_livebook.ex → tris_to_quads.ex
- Updated Mix task name and documentation
- Verified functionality works correctly after rename

The Optimized-Tris-to-Quads-Converter thirdparty directory is preserved.

Fix task execution order: all migrations before QA tasks

Reorganized priorities to ensure proper workflow:
1. All migration tasks (10-7): migrate-*
2. All QA tasks (6-4): qa-*

This prevents QA from happening before migration for any app.
Execution order now: migrate → migrate → migrate → migrate → qa → qa → qa

Implement complete Blender + Pulp integration for tris_to_quads

- Create tris_to_quads_optimizer.py with full Optimized-Tris-to-Quads-Converter algorithm
- Implement Pulp linear programming optimization with proper constraints
- Add edge validity checking for triangular faces
- Integrate Blender bmesh API for mesh processing
- Add dissolve_edges and select_face_by_sides operations
- Update Mix task to use Pythonx for Blender integration
- Add fallback to demo mode when Blender/Pulp unavailable

The implementation now fully uses Blender and Pulp as required by the Optimized-Tris-to-Quads-Converter.

Update schedule: mark tris_to_quads migration complete and add QA task

- Mark migrate-tris-to-quads as completed (3 hours elapsed)
- Add qa-tris-to-quads task for quality assurance of the tris_to_quads app
- tris_to_quads now fully implements Blender + Pulp optimization as required

Update qwen3vl_inference migration task to include Bumblebee PR #442

- Updated task description to specify Bumblebee Qwen3-VL integration
- Reference: elixir-nx/bumblebee#442
- PR adds full Qwen3-VL vision-language model support with DeepStack

Update tris_to_quads: create triangular test mesh for Blender optimization

- Replace cube mesh with custom triangular mesh for optimization testing
- Triangular faces are required for tris-to-quads conversion algorithm
- Mesh includes multiple connected triangles to test edge optimization

Split tris_to_quads migration: separate basic migration from real asset processing

- Update migrate-tris-to-quads: mark as basic migration (completed)
- Add tris-to-quads-asset-processing: implement GLTF/GLB import and USDc export
- Current implementation only does demo mesh optimization, not real asset processing
- Real functionality needs to import GLTF/GLB files and export as USDc

Move tris-to-quads-asset-processing to top priority

- Set tris-to-quads-asset-processing priority to 1 (highest priority)
- This task implements real GLTF/GLB import and USDc export functionality
- Critical for completing the tris_to_quads app's core asset processing capability

Migrated the task db back to personal schedule.

Migrate tris_to_quads to Elixir app structure and create data_utils utility app

- Complete tris_to_quads migration to proper Elixir app
- Add Blender integration for USDC/GLTF file loading
- Remove legacy heuristic implementation and demo mode
- Make --file parameter required for mesh optimization
- Create separate data_utils app for MIME encoding/decoding utilities
- Clean up codebase and improve error handling

Clean up tris_to_quads app duplication

- Remove duplicate tris_to_quads.ex wrapper module
- Remove unnecessary dependencies (nx, exla, mime, jason)
- Remove escript configuration (no CLI module)
- Remove __pycache__ directory
- App now has clean, minimal structure focused on mix task

Migrate tris_to_quads_livebook to Elixir app structure

- Migrate tris_to_quads functionality from Livebook to Elixir app structure
- Add comprehensive tris_to_quads Elixir app with mix task support
- Support for USDC, USD, GLTF, GLB, and FBX file formats with Blender integration
- Implement timestamped output folder system for organized results
- Add vertex group preservation for bone weights and rigging data
- Enhanced error handling and user feedback in the mix task
- Clean up unnecessary files from previous app structure changes

Revert "Migrate Elixir scripts to independent Mix applications"

This reverts commit bbc3f95027a6357190072516b90015f0ca1df47a.

Revert "Finite World Axiom: Remove problematic third-party tools"

This reverts commit 1107837.