Hi @tuidan, thanks for open-sourcing this work, I really appreciate it. I am wondering whether I can compress chat-based models using this approach, or whether I need to take any additional steps. Looking forward to your reply, thanks.