Hi,
I'm seeing the error above when running `python -u run_text_generation.py --model_arch llama --model_name huggyllama/llama-13b --recent_ratio 0.1 --heavy_ratio 0.1`. Please let me know if you have a fix, as I would really like to use this technique.
Notes on reproducibility:
- I made a small modification to the code for faster debugging: I set `num_hidden_layers=4` in the model config (rough sketch below).
- I'm using transformers 4.44.0.dev0 (this is what gets installed when following the provided install instructions).
- I'm using a conda environment with Python 3.8.
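In case it helps with reproduction, the debugging change was roughly along these lines. This is only a sketch: the actual config-loading path inside run_text_generation.py may differ, and I may have edited the cached config.json instead of doing it programmatically.

```python
# Rough sketch of the debugging change, not an exact diff.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("huggyllama/llama-13b")
config.num_hidden_layers = 4  # truncate the model for faster debugging

# Loading with the truncated config keeps only the first 4 decoder layers;
# the remaining checkpoint weights are ignored (transformers warns about them).
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-13b", config=config)
```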