The progress bar does not update for a long time #9

Thanks for your excellent work! I ran into a problem while trying to use your framework.

I tried to run `offloading.py` and `offloading_TP.py` on a machine with four RTX 4090 GPUs. As shown in the screenshots below, the progress bar has not updated for a long time, even though GPU utilization is close to 100%.
 
![1724296520350](https://github.com/user-attachments/assets/533ddf81-a00c-4dc1-929c-44d8d4769e00)
![image](https://github.com/user-attachments/assets/874bfe69-0663-48c5-ae98-d2600746ef8c)

## The command I used:
`CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=48 torchrun --nproc_per_node=2 test/offloading_TP.py --budget 12288 --prefill 130048 --dataset gs --target llama-7B-128K --on_chip 9 --gamma 16 --target /TriForce/models/Yarn-Llama-2-7b-128k`

## Is there something wrong?
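
In case it helps narrow this down, here is a minimal sketch of how I could check whether the run is actually hung or just very slow. It uses Python's standard `faulthandler` module to print every thread's Python stack at a fixed interval. The helper names `enable_stall_watchdog` and `disable_stall_watchdog` are hypothetical and not part of this repository, and I am assuming the call would be added near the top of `test/offloading_TP.py`.

```python
import faulthandler
import sys


def enable_stall_watchdog(interval_s: float = 600.0, out=sys.stderr) -> None:
    """Periodically dump the Python stack of every thread to `out`.

    Hypothetical helper, not part of the repository. `out` must be a real
    file object (it needs a file descriptor) and must stay open until the
    watchdog is cancelled.
    """
    # repeat=True makes faulthandler dump again after every interval,
    # so successive dumps can be compared to see whether the stack moves.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=out)


def disable_stall_watchdog() -> None:
    """Stop the periodic stack dumps started by enable_stall_watchdog()."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Assumed usage: call this once before the long-running loop starts.
    enable_stall_watchdog(interval_s=600.0)
```

If successive dumps show the same frame every time, the process is probably blocked at that point. If the frames change between dumps, the run is progressing but each progress-bar step is simply slow. Under `torchrun`, each rank would print its own dump to stderr.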
