Would you consider adding the ability to load the model onto GPU 1, 2, or 3? For multi-GPU users, this would help us keep everything loaded in VRAM at once.
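As a rough sketch of what I mean: even a simple device-index option would be enough, along the lines of pinning the process to a chosen GPU via the standard `CUDA_VISIBLE_DEVICES` environment variable (the `select_gpu` helper below is just a hypothetical illustration, not an existing API):

```python
import os

def select_gpu(index: int) -> None:
    # Hypothetical helper: restrict this process to a single GPU by index.
    # CUDA-based frameworks then see only that card, exposed as device 0,
    # so the model loads into that GPU's VRAM instead of GPU 0's.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)

select_gpu(1)  # e.g. keep the model on the second GPU
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> "1"
```

Something like this (or a `--gpu N` flag wired to the same mechanism) would let each model stay resident on its own card.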