--cache_dir not propagated to LLM continuous batching pipeline (regression vs 2025.4 promise) #4230

@korund

Component: LLM continuous batching, src/llm/language_model/continuous_batching/servable_initializer.cpp

OVMS version: 2026.1.0.72cc0624 (OpenVINO backend 2026.1.0, OpenVINO GenAI backend 2026.1.0.0)

Platform: Windows 11, Intel Arc 140T iGPU (OPTIMIZATION_CAPABILITIES includes EXPORT_IMPORT)

Summary

The 2025.4 release notes state:

--cache_dir now enables compilation caching for both classic models and generative pipelines.

In 2026.1 this no longer holds for generative pipelines. When the server is started with --cache_dir <path>, the log line Model cache is enabled: <path> is printed, but the CACHE_DIR property is empty in every OpenVINO Core device-plugin dump that follows, and no .cl_cache / .blob files are written for HttpLLMCalculator nodes. The model is fully recompiled on every restart.

This reproduces both with and without speculative decoding, so the gap is not specific to draft models: it affects any node configured through LLMCalculatorOptions.

Reproduction A — minimal single-model LLM

graph.pbtxt:

node: {
  calculator: "HttpLLMCalculator"
  ...
  node_options: {
    [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
      max_num_seqs: 256,
      device: "GPU",
      models_path: "./",
      enable_prefix_caching: true,
      cache_size: 0,
    }
  }
  ...
}

Start:

ovms.exe --config_path config.json --rest_port 8001 --cache_dir C:/some/empty/dir --log_level DEBUG

Send one request, then stop the server. The cache directory remains empty. The log shows Model cache is enabled: <path>, but the plugin dump reads OpenVINO Core plugin: GPU; plugin configuration: { ... CACHE_DIR: , ... }.
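
To check many DEBUG log lines for this symptom, a small helper can pull the CACHE_DIR value out of a plugin configuration dump. This is only a sketch: extractCacheDir is a hypothetical name, and it assumes the dump is a flat list of "KEY: value" pairs separated by commas inside braces, as in the excerpt above. The real dump format may differ (for example, values containing commas, or other keys that end in CACHE_DIR).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: extract the CACHE_DIR value from one plugin
// configuration dump line. ASSUMPTION: "KEY: value" pairs separated by
// commas, closed by '}'. Returns std::nullopt when the key is absent and
// an empty string when the key is present but has no value.
inline std::optional<std::string> extractCacheDir(std::string_view line) {
    constexpr std::string_view key = "CACHE_DIR:";
    const auto pos = line.find(key);
    if (pos == std::string_view::npos) {
        return std::nullopt;
    }
    // Take everything after the key up to the next ',' or '}'.
    std::string_view rest = line.substr(pos + key.size());
    rest = rest.substr(0, rest.find_first_of(",}"));
    // Trim surrounding spaces.
    const auto first = rest.find_first_not_of(' ');
    if (first == std::string_view::npos) {
        return std::string{};  // key present, value empty
    }
    const auto last = rest.find_last_not_of(' ');
    return std::string(rest.substr(first, last - first + 1));
}
```

An empty (but present) result corresponds to the failing case described here; a non-empty result is what the workaround further down is expected to produce.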

Reproduction B — same plus speculative decoding

Add to the same node:

draft_models_path: "OpenVINO-Qwen3-0.6B-int8-ov",
draft_device: "CPU",

Same outcome: log says cache enabled, plugin CACHE_DIR: is empty, directory stays empty.

Diagnosis

For classic models, --cache_dir flows through src/modelinstance.cpp::setCacheOptions → ieCore.set_property(ov::cache_dir(...)). The LLM continuous-batching path bypasses ModelInstance and builds a ContinuousBatchingPipeline directly in src/llm/language_model/continuous_batching/servable_initializer.cpp. Grepping that file (and its neighbours under src/llm/) finds zero references to cache_dir, CACHE_DIR, ov::cache_dir, or ServerSettings.cache_dir. The global flag value is never read on this code path.

The same pattern was reported for the underlying GenAI pipeline in issue openvinotoolkit/openvino.genai#1992. OVMS LLM nodes route through the same GenAI ContinuousBatchingPipeline, so the surface area overlaps.

Workaround

Setting CACHE_DIR inside plugin_config of the LLM node propagates correctly:

    node_options: {
      [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
        device: "GPU"
        models_path: "./"
        plugin_config: '{"CACHE_DIR": "C:/some/empty/dir"}'
        ...
      }
    }

After this, .cl_cache files appear in the directory and subsequent starts compile materially faster.

Two notes for anyone hitting this:

  • Forward slashes are required in the JSON value on Windows. With backslashes, pbtxt unescapes them once and the JSON parser then fails: [servable_initializer.cpp:203] Error during llm node plugin_config option parsing to JSON: ....
  • For speculative decoding, the same plugin_config likely needs to be set for the draft model as well (not yet verified empirically).
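
For anyone generating graph.pbtxt from a script or tool, the forward-slash requirement can be handled when the plugin_config value is built. The following is a minimal sketch; makeCacheDirPluginConfig is a hypothetical helper, not part of OVMS, and it does not escape quotes or other JSON special characters in the path (Windows paths cannot contain double quotes, but other platforms may need more care).

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: build the plugin_config JSON string for a cache
// directory, replacing backslashes with forward slashes so the value is not
// affected by pbtxt unescaping before it reaches the JSON parser.
inline std::string makeCacheDirPluginConfig(std::string cacheDir) {
    std::replace(cacheDir.begin(), cacheDir.end(), '\\', '/');
    return "{\"CACHE_DIR\": \"" + cacheDir + "\"}";
}
```

The returned string is what goes between the single quotes of plugin_config in the node options shown above.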

Proposed fix

In the LLM servable_initializer.cpp path, when the global --cache_dir is set and plugin_config["CACHE_DIR"] is not already provided, inject it into the pluginConfig map before constructing the ContinuousBatchingPipeline. This would restore the 2025.4 contract for generative pipelines while keeping explicit plugin_config user overrides authoritative.
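
The precedence rule could look roughly like the sketch below. It is not taken from the OVMS sources: injectGlobalCacheDir and the PluginConfig alias are hypothetical, and a std::map of strings stands in for whatever map type the initializer actually uses (likely ov::AnyMap). The point is only the ordering: an explicit plugin_config entry wins, and the global flag fills the gap.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the plugin configuration map (ASSUMPTION: the real code
// likely uses ov::AnyMap rather than a string-to-string map).
using PluginConfig = std::map<std::string, std::string>;

// Hypothetical helper: copy the global --cache_dir value into the plugin
// config unless the user already set CACHE_DIR in plugin_config.
// Returns true when the global value was injected.
inline bool injectGlobalCacheDir(PluginConfig& pluginConfig,
                                 const std::string& globalCacheDir) {
    if (globalCacheDir.empty()) {
        return false;  // --cache_dir was not given: nothing to inject
    }
    // try_emplace (C++17) leaves an existing CACHE_DIR entry untouched.
    return pluginConfig.try_emplace("CACHE_DIR", globalCacheDir).second;
}
```

Such a call would sit just before the ContinuousBatchingPipeline is constructed; whether the draft model's configuration needs the same treatment is still open, as noted in the workaround section.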

Metadata

Labels: bug