
Releases: nearai/cvm-compose-files

v0.0.52

06 Mar 23:06


v0.0.50

06 Mar 18:23
8e390a9


Changes

  • Update nearaidev/vllm-proxy-rs image digest across GLM-5, small-models, and Qwen3.5-122B configs (#3)
  • Remove unrecognized vLLM args (--max-cudagraph-capture-size, --stream-interval) from gpt-oss-120b (#4)

v0.0.49

06 Mar 11:34
45c047b


Changes

  • Increase nginx proxy_read_timeout from 300s to 3600s across all CVM configs — fixes timeout errors for long-context inference requests (100K+ tokens) where prefill exceeds 5 minutes (#1)
  • Update gpt-oss vLLM image to newer version (#2)
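The timeout change corresponds to an nginx directive along these lines (a sketch only — the upstream name and surrounding location block here are illustrative, not copied from the repo's actual configs):

```nginx
# Allow long-running inference requests (100K+ token prefill) to finish
# without nginx closing the upstream connection after 5 minutes.
location / {
    proxy_pass http://vllm:8000;   # upstream name is an assumption
    proxy_read_timeout 3600s;      # was 300s
}
```

`proxy_read_timeout` bounds the time between two successive reads from the upstream, so a long prefill with no streamed tokens yet is exactly the case that tripped the old 300s limit.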

v0.0.48

05 Mar 16:10


  • Update gpt-oss vLLM image digest

v0.0.47

05 Mar 09:04


  • Update vllm-proxy-rs to d46dd03 (attestation cache, GPU evidence serialization, retry on failure)

v0.0.46

04 Mar 13:51


  • fix: update all vllm/vllm-openai images to new digest to address CUDA/NCCL crashes
  • fix: reduce gpt-oss-120b GPU memory utilization to 0.90

v0.0.45

04 Mar 13:24


  • fix: reduce gpt-oss-120b GPU memory utilization from 0.95 to 0.90 to address CUDA OOM crashes
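A minimal compose sketch of this change (service name, image tag, and model id are illustrative; the repo pins images by digest):

```yaml
services:
  gpt-oss-120b:
    image: vllm/vllm-openai:latest   # actual config uses a pinned digest
    command: >
      --model openai/gpt-oss-120b
      --gpu-memory-utilization 0.90  # lowered from 0.95 to leave headroom and avoid CUDA OOM
```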

v0.0.44

04 Mar 12:57


Changes

  • feat: Add model-proxy registrar sidecar to all model configs (DeepSeek-V3.1, GLM-5, Qwen3.5-122B, small-models) for automatic endpoint/model registration with the proxy fleet
  • fix: Remove prefill_token_shift and num_draft_tokens from Qwen3-30B speculative config — these params were removed in vLLM v0.16.0

v0.0.43

02 Mar 11:36


Changes

  • Remove LMCache entirely — lmcache image, env vars, and --kv-transfer-config flags removed to fix crashes
  • Upgrade all vLLM images to v0.16.0 (sha256:4801151759655c57606c844662e5213403c032a62d149c7ce61d615759a821ef)
  • GPT-OSS-120B: --max-num-seqs 128→64, --max-num-batched-tokens 8K→16K
  • Qwen3-30B-A3B: --max-num-batched-tokens 16K→24K
  • Qwen3-VL-30B-A3B: add --gpu-memory-utilization 0.95, --max-model-len 32768, --max-num-seqs 64, --max-num-batched-tokens 16K (was completely unconfigured)

Add cloud-api usage reporting & JSON logs

02 Mar 08:01


  • Add CLOUD_API_URL=https://cloud-api.near.ai to all Rust proxy services (small-models, Qwen3.5-122B)
  • Fix MODEL_NAME in GLM-5.yaml: zai-org/GLM-5 → zai-org/GLM-5-FP8 (was causing 404 on usage reporting)
  • Add LOG_FORMAT=json to all proxy services for structured logging
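Taken together, the proxy-service environment for GLM-5 after this release would be shaped like the following (service name is illustrative, values are from the notes above):

```yaml
services:
  vllm-proxy:
    environment:
      - CLOUD_API_URL=https://cloud-api.near.ai
      - MODEL_NAME=zai-org/GLM-5-FP8   # corrected; the old value 404'd on usage reporting
      - LOG_FORMAT=json                # structured logs
```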