Thank you very much for your remarkable efforts and significant contributions to the open-source community.
I noticed that the TinyLlama_v1.1 model supports a maximum context length of only 2k. How does TinyLlama_v1.1 propose tokens for the target model when the requested prefill length exceeds this 2k limit?
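To make my question concrete, here is a minimal sketch of what I imagine might be happening: the draft model only ever sees the most recent tokens that fit in its 2k window, with the rest of the prefill dropped. The names (`propose_draft_tokens`, `DRAFT_MAX_CONTEXT`, the `draft_model` callable) are purely my own illustration and assumptions, not your actual code.

```python
# Hypothetical illustration only -- not the repository's implementation.
DRAFT_MAX_CONTEXT = 2048  # TinyLlama_v1.1's context limit

def propose_draft_tokens(prefill_tokens, draft_model, num_draft_tokens=5):
    """Propose draft tokens, truncating the context to the draft model's window.

    `draft_model` stands in for TinyLlama_v1.1: any callable that maps a list
    of token ids to the next token id.
    """
    draft_tokens = []
    for _ in range(num_draft_tokens):
        # Keep only the suffix of the context that fits in the 2k draft window.
        context = (prefill_tokens + draft_tokens)[-DRAFT_MAX_CONTEXT:]
        draft_tokens.append(draft_model(context))
    return draft_tokens
```

Is something like this sliding-window truncation what actually happens when the prefill exceeds 2k, or do you handle it differently (e.g., by disabling speculation for long prompts or extending the draft model's positions)?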