Hello, I'm really interested in your project, but it takes a lot of VRAM to run. Could you please add 4-bit inference code using bitsandbytes or something similar?