vllm:request_max_num_generation_tokens - The maximum number of tokens a request can generate: the minimum of `max-model-len - prompt length` and `max_tokens` when `max_tokens` is set, otherwise just `max-model-len - prompt length`.
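For illustration, here is a minimal sketch of how this value could be derived. The function name and signature are hypothetical and not vLLM's actual implementation; it only mirrors the formula described above.

```python
from typing import Optional


def request_max_num_generation_tokens(
    max_model_len: int,
    prompt_len: int,
    max_tokens: Optional[int] = None,
) -> int:
    """Upper bound on how many tokens a single request can generate.

    The context window leaves room for at most (max_model_len - prompt_len)
    new tokens; if the request also set max_tokens, that caps it further.
    """
    # Tokens that still fit in the model's context window after the prompt.
    context_budget = max_model_len - prompt_len
    if max_tokens is None:
        return context_budget
    return min(context_budget, max_tokens)


# Example: a 4096-token context, a 1000-token prompt, and max_tokens=512
# gives min(4096 - 1000, 512) = 512; with no max_tokens it would be 3096.
print(request_max_num_generation_tokens(4096, 1000, 512))  # 512
print(request_max_num_generation_tokens(4096, 4000, 512))  # 96
print(request_max_num_generation_tokens(4096, 1000))       # 3096
```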