
Commit b576904

release lfm2 24b a2b (#1510)
* update model and language
* add licensing note
1 parent 99b1560 commit b576904

File tree

1 file changed: +11 -5 lines changed


06_gpu_and_ml/llm-serving/lfm_snapshot.py

Lines changed: 11 additions & 5 deletions
@@ -3,16 +3,22 @@
 # cmd: ["python", "06_gpu_and_ml/llm-serving/lfm_snapshot.py"]
 # ---
 
-# # Low Latency, Serverless LFM 2 with vLLM and Modal
+# # Low Latency, Serverless LFM2 with vLLM and Modal
 
-# In this example, we show how to serve Liquid AI's [LFM 2 models](https://www.liquid.ai/liquid-foundation-models)
+# In this example, we show how to serve Liquid AI's [LFM2 models](https://www.liquid.ai/liquid-foundation-models)
 # with [vLLM](https://docs.vllm.ai) with low latency and fast cold starts on Modal.
 
-# The LFM 2 models are not vanilla Transformers -- they have a hybrid architecture,
+# The LFM2 models are not vanilla Transformers -- they have a hybrid architecture,
 # discovered via an architecture search that optimized for quality, latency, and memory footprint.
 # Check out their [technical report](https://arxiv.org/abs/2511.23404v1)
 # for more details.
 
+# Here, we run the [24B-A2B variant](https://huggingface.co/LiquidAI/LFM2-24B-A2B) of LFM2,
+# described [here](https://www.liquid.ai/blog/lfm2-24b-a2b). This variant is designed
+# for efficient inference and includes instruction tuning.
+# It is released under the weights-available [LFM 1.0 License](https://huggingface.co/LiquidAI/LFM2-24B-A2B/blob/main/LICENSE),
+# which restricts commercial use for entities with over $10M in revenue.
+
 # This example demonstrates techniques to run inference at high efficiency,
 # including advanced features of both vLLM and Modal.
 # For a simpler introduction to LLM serving, see
@@ -22,7 +28,7 @@
 # which uses a new, low-latency routing service on Modal designed for latency-sensitive inference workloads.
 # This gives us more control over routing, but with increased power comes increased responsibility.
 
-# We also include instructions for cutting cold start times by an order of magnitude using Modal's
+# We also include instructions for cutting cold start times using Modal's
 # [CPU + GPU memory snapshots](https://modal.com/docs/guide/memory-snapshot).
 
 # Fast cold starts are particularly useful for LLM inference applications
@@ -50,7 +56,7 @@
 
 MINUTES = 60
 
-MODEL_NAME = os.environ.get("MODEL_NAME", "LiquidAI/LFM2-8B-A1B")
+MODEL_NAME = os.environ.get("MODEL_NAME", "LiquidAI/LFM2-24B-A2B")
 print(f"Running deployment script for model: {MODEL_NAME}")
 
 vllm_image = (
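The substantive change in this commit is the new default for `MODEL_NAME`, read from the environment with `os.environ.get`. A minimal sketch of that override pattern (the `resolve_model_name` helper is hypothetical, written here to take the environment as an argument so it can be exercised without mutating `os.environ`; the defaults mirror the script before and after this commit):

```python
import os


def resolve_model_name(env: dict, default: str = "LiquidAI/LFM2-24B-A2B") -> str:
    # Hypothetical helper mirroring the script's pattern: prefer an explicit
    # MODEL_NAME environment variable, otherwise fall back to the new default.
    return env.get("MODEL_NAME", default)


# With MODEL_NAME unset, the new 24B-A2B default is used:
print(resolve_model_name({}))  # LiquidAI/LFM2-24B-A2B

# Setting MODEL_NAME restores, e.g., the previous 8B-A1B default:
print(resolve_model_name({"MODEL_NAME": "LiquidAI/LFM2-8B-A1B"}))  # LiquidAI/LFM2-8B-A1B
```

In the actual script the same one-liner runs against the real `os.environ`, so `MODEL_NAME=... python 06_gpu_and_ml/llm-serving/lfm_snapshot.py` selects a different model without editing the file.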
