The examples you provide don't mention how to specify parameters like batch size, max input length, etc.
My first question is how to change the max input length. I tried the llama2 example for a RAG use case. Llama 2 should be able to handle 4096 input tokens, but it's limited to 1024 for some reason.
Similarly, although I don't think batching is a good idea on CPU, I'd still like to try batched inference with this package. Is there any documentation on how to configure these settings?