Releases: nearai/cvm-compose-files
v0.0.52
Full Changelog: v0.0.51...v0.0.52
v0.0.50
v0.0.49
v0.0.48
- Update gpt-oss vllm image digest
v0.0.47
- Update vllm-proxy-rs to d46dd03 (attestation cache, GPU evidence serialization, retry on failure)
v0.0.46
- fix: update all vllm/vllm-openai images to new digest to address CUDA/NCCL crashes
- fix: reduce gpt-oss-120b GPU memory utilization to 0.90
v0.0.45
- fix: reduce gpt-oss-120b GPU memory utilization from 0.95 to 0.90 to address CUDA OOM crashes
v0.0.44
Changes
- feat: Add model-proxy registrar sidecar to all model configs (DeepSeek-V3.1, GLM-5, Qwen3.5-122B, small-models) for automatic endpoint/model registration with the proxy fleet
- fix: Remove `prefill_token_shift` and `num_draft_tokens` from Qwen3-30B speculative config — these params were removed in vLLM v0.16.0
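For illustration, a minimal sketch of what the trimmed Qwen3-30B service might look like after this fix. The service name, model id, and the remaining speculative settings are assumptions, not taken from the repo; only the two removed parameters come from the release notes.

```yaml
# Hypothetical compose fragment — names and remaining values are assumed.
services:
  qwen3-30b:
    image: vllm/vllm-openai:v0.16.0
    # prefill_token_shift and num_draft_tokens dropped from the JSON below,
    # since vLLM v0.16.0 no longer accepts them.
    command: >
      --model Qwen/Qwen3-30B
      --speculative-config '{"method": "ngram", "num_speculative_tokens": 4}'
```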
v0.0.43
Changes
- Remove LMCache entirely — lmcache image, env vars, and `--kv-transfer-config` flags removed to fix crashes
- Upgrade all vLLM images to v0.16.0 (`sha256:4801151759655c57606c844662e5213403c032a62d149c7ce61d615759a821ef`)
- GPT-OSS-120B: `--max-num-seqs` 128→64, `--max-num-batched-tokens` 8K→16K
- Qwen3-30B-A3B: `--max-num-batched-tokens` 16K→24K
- Qwen3-VL-30B-A3B: add `--gpu-memory-utilization 0.95`, `--max-model-len 32768`, `--max-num-seqs 64`, `--max-num-batched-tokens 16K` (was completely unconfigured)
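Taken together, the Qwen3-VL-30B-A3B flags above might land in a compose file roughly like this. The service name, model id, and GPU wiring are placeholders; only the image digest and the four vLLM flags come from the release notes.

```yaml
# Hypothetical compose sketch — service name, model id, and GPU
# reservation are assumptions; flags and digest are from the changelog.
services:
  qwen3-vl-30b-a3b:
    image: vllm/vllm-openai@sha256:4801151759655c57606c844662e5213403c032a62d149c7ce61d615759a821ef
    command: >
      --model Qwen/Qwen3-VL-30B-A3B
      --gpu-memory-utilization 0.95
      --max-model-len 32768
      --max-num-seqs 64
      --max-num-batched-tokens 16384
```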
Add cloud-api usage reporting & JSON logs
- Add `CLOUD_API_URL=https://cloud-api.near.ai` to all Rust proxy services (small-models, Qwen3.5-122B)
- Fix `MODEL_NAME` in GLM-5.yaml: `zai-org/GLM-5` → `zai-org/GLM-5-FP8` (was causing 404 on usage reporting)
- Add `LOG_FORMAT=json` to all proxy services for structured logging
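The proxy-service changes above can be sketched as a compose fragment. The service name and image tag are placeholders; only the three environment values are from the release notes.

```yaml
# Hypothetical proxy service fragment — service name and image are
# assumed; env values are from the v0.0.43 changelog.
services:
  vllm-proxy:
    image: nearai/vllm-proxy-rs:latest        # placeholder tag
    environment:
      CLOUD_API_URL: https://cloud-api.near.ai  # usage reporting endpoint
      LOG_FORMAT: json                          # structured JSON logging
      MODEL_NAME: zai-org/GLM-5-FP8             # corrected (was zai-org/GLM-5, caused 404s)
```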