Below, you'll find sample commands to get started. Alternatively, you can replace the CLI command with docker run ([instructions here](https://docs.vllm.ai/en/latest/deployment/docker.html)) or use [our Pythonic interface](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference), the `LLM` class, for local batch inference. We also recommend checking out the [demo from the Meta team](https://github.com/meta-llama/llama-cookbook/blob/main/getting-started/build_with_llama_4.ipynb) showcasing the 1M-token long-context capability with vLLM.
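For reference, here is a minimal sketch of local batch inference with the `LLM` class. The model name, parallelism setting, and prompts are illustrative placeholders; swap in the checkpoint and hardware configuration you intend to use.

```python
from vllm import LLM, SamplingParams

# Illustrative checkpoint and tensor-parallel size; adjust to your setup.
llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    tensor_parallel_size=8,
)

# A small batch of example prompts.
prompts = [
    "Summarize the key ideas behind long-context attention.",
    "Write a haiku about distributed inference.",
]
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Generate completions for the whole batch in one call.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```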