Conversation
# With all this in place, we are ready to define our high-performance, low-latency
# LFM 2 inference server.

app = modal.App("examples-lfm-snapshot")
🚩 App name uses examples- prefix instead of example-
The app is named "examples-lfm-snapshot" at lfm_snapshot.py:268, but every other app in the llm-serving/ directory uses the "example-" prefix (singular): "example-vllm-inference", "example-vllm-low-latency", "example-sglang-snapshot", etc. The CLAUDE.md guidelines also specify example- prefix with kebab-case. The __main__ block at lfm_snapshot.py:507 correctly references the same "examples-lfm-snapshot" string, so this won't cause a runtime mismatch, but it breaks the naming convention.
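A minimal sketch of the suggested rename, assuming the constant-plus-two-call-sites structure described above (`APP_NAME` is a hypothetical helper introduced here for illustration, not a name from the PR):

```python
# Rename to match the repo's singular "example-" kebab-case convention.
APP_NAME = "example-lfm-snapshot"  # was: "examples-lfm-snapshot"

# Both call sites (the App constructor and the __main__ block) could then
# reference the same constant, so they cannot drift apart:
# app = modal.App(APP_NAME)
```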
MINUTES = 60

MODEL_NAME = os.environ.get("MODEL_NAME", "LiquidAI/LFM2-8B-A1B")
🚩 Model revision is not pinned, unlike other examples
The internal CLAUDE.md guidelines explicitly state: "Always pin model revisions to avoid surprises when upstream repos update". The vllm_low_latency.py example pins MODEL_REVISION and passes --revision to the vLLM CLI (vllm_low_latency.py:69-71, vllm_low_latency.py:276-277). This example does not pin a revision and does not pass --revision in the vLLM command at lines 296-317. If LiquidAI pushes a breaking update to their HuggingFace repo, this example could break silently.
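A hedged sketch of what the pin could look like, modeled on the `MODEL_REVISION` pattern in vllm_low_latency.py; the `MODEL_REVISION` env var and the placeholder SHA below are assumptions for illustration, not values from this PR:

```python
import os

# Pin the model revision so upstream pushes to the HuggingFace repo cannot
# change behavior silently. The default below is a placeholder, not a real
# LiquidAI commit SHA.
MODEL_NAME = os.environ.get("MODEL_NAME", "LiquidAI/LFM2-8B-A1B")
MODEL_REVISION = os.environ.get("MODEL_REVISION", "<pinned-commit-sha>")

# Thread the pin through to the serve command, mirroring vllm_low_latency.py.
vllm_cmd = [
    "vllm", "serve", MODEL_NAME,
    "--revision", MODEL_REVISION,
]
```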
| "--max-cudagraph-capture-size", | ||
| f"{MAX_INPUTS}", |
🚩 --max-cudagraph-capture-size CLI flag may not exist in vLLM v0.15.1
The vLLM serve command at lines 314-315 uses --max-cudagraph-capture-size, but this flag name doesn't appear in any other vLLM CLI invocation in the repo — only as a config dictionary key in gpt_oss_inference.py:136. Other vLLM examples pass cudagraph-related settings differently (e.g., through --compilation-config or not at all). Since the base image is vllm/vllm-openai:v0.15.1 (a future version I can't verify), I can't confirm whether this CLI flag is valid. If it isn't recognized, vLLM will fail to start.
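If the flag is rejected by this vLLM build, one possible fallback (an assumption, not verified against v0.15.1) is to route the setting through `--compilation-config` as JSON, similar to how gpt_oss_inference.py uses a config dictionary; the `cudagraph_capture_sizes` key name below should be checked against the vLLM docs for the pinned version:

```python
import json

MAX_INPUTS = 32  # illustrative value, not taken from this PR

# Hedged alternative: serialize cudagraph settings into the compilation
# config rather than relying on a possibly-nonexistent dedicated CLI flag.
compilation_config = json.dumps({"cudagraph_capture_sizes": [MAX_INPUTS]})
fallback_args = ["--compilation-config", compilation_config]
```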
|
lgtm