--cache_dir not propagated to LLM continuous batching pipeline (regression vs 2025.4 promise) #4230

**Component:** LLM continuous batching, `src/llm/language_model/continuous_batching/servable_initializer.cpp`

**OVMS version:** 2026.1.0.72cc0624 (OpenVINO backend 2026.1.0, OpenVINO GenAI backend 2026.1.0.0)

**Platform:** Windows 11, Intel Arc 140T iGPU (`OPTIMIZATION_CAPABILITIES` includes `EXPORT_IMPORT`)

**Summary**

The 2025.4 release notes state:

> `--cache_dir` now enables compilation caching for both classic models and generative pipelines.

In 2026.1 this no longer holds for generative pipelines. When the server is started with `--cache_dir <path>`, the log line `Model cache is enabled: <path>` is printed, but the `CACHE_DIR` property is empty in every OpenVINO Core device-plugin dump that follows, and no `.cl_cache` / `.blob` files are written for `HttpLLMCalculator` nodes. The model is fully recompiled on every restart.

This reproduces both with and without speculative decoding, so the gap is not specific to draft models; it affects any node configured through `LLMCalculatorOptions`.

**Reproduction A — minimal single-model LLM**

`graph.pbtxt`:

    node: {
      calculator: "HttpLLMCalculator"
      ...
      node_options: {
        [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
          max_num_seqs: 256,
          device: "GPU",
          models_path: "./",
          enable_prefix_caching: true,
          cache_size: 0,
        }
      }
      ...
    }

Start:

    ovms.exe --config_path config.json --rest_port 8001 --cache_dir C:/some/empty/dir --log_level DEBUG

Send one request, then stop the server. The cache directory remains empty. The log shows `Model cache is enabled: <path>`, but the plugin dump reads `OpenVINO Core plugin: GPU; plugin configuration: { ... CACHE_DIR: , ... }`.

**Reproduction B — same plus speculative decoding**

Add to the same node:

    draft_models_path: "OpenVINO-Qwen3-0.6B-int8-ov",
    draft_device: "CPU",

Same outcome: log says cache enabled, plugin `CACHE_DIR:` is empty, directory stays empty.

**Diagnosis**

For classic models, `--cache_dir` flows through `src/modelinstance.cpp::setCacheOptions` → `ieCore.set_property(ov::cache_dir(...))`. The LLM continuous-batching path bypasses `ModelInstance` and builds a `ContinuousBatchingPipeline` directly in `src/llm/language_model/continuous_batching/servable_initializer.cpp`. Grepping that file (and its neighbours under `src/llm/`) finds zero references to `cache_dir`, `CACHE_DIR`, `ov::cache_dir`, or `ServerSettings.cache_dir`. The global flag value is never read on this code path.

The same pattern was reported for the underlying GenAI pipeline in issue openvinotoolkit/openvino.genai#1992. OVMS LLM nodes route through the same GenAI `ContinuousBatchingPipeline`, so the two reports plausibly overlap.

**Workaround**

Setting `CACHE_DIR` inside `plugin_config` of the LLM node propagates correctly:

```
    node_options: {
      [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
        device: "GPU"
        models_path: "./"
        plugin_config: '{"CACHE_DIR": "C:/some/empty/dir"}'
        ...
      }
    }
```

After this, `.cl_cache` files appear in the directory and subsequent starts compile materially faster.

Two notes for anyone hitting this:
- Forward slashes are required in the JSON value on Windows. With backslashes, pbtxt unescapes them once and the JSON parser then fails: `[servable_initializer.cpp:203] Error during llm node plugin_config option parsing to JSON: ...`.
- For speculative decoding, the same `plugin_config` likely needs to be set for the draft model as well (not yet verified empirically).
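To illustrate the forward-slash note, here is a minimal sketch of how a Windows path could be normalized before being placed in the `plugin_config` JSON value. The helper names `toForwardSlashes` and `cacheDirPluginConfig` are hypothetical and not part of OVMS; the sketch also assumes the path contains no quotes or control characters that would need JSON escaping.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not an OVMS function): replace every backslash with a
// forward slash so the path survives pbtxt unescaping and JSON parsing.
std::string toForwardSlashes(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}

// Hypothetical helper: build the JSON text for the plugin_config field.
// Assumption: the path has no characters that require JSON escaping.
std::string cacheDirPluginConfig(const std::string& path) {
    return "{\"CACHE_DIR\": \"" + toForwardSlashes(path) + "\"}";
}
```

Under these assumptions, `cacheDirPluginConfig("C:\\some\\empty\\dir")` yields the same JSON text used in the workaround above.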

**Proposed fix**

In the LLM `servable_initializer.cpp` path, when the global `--cache_dir` is set and `plugin_config["CACHE_DIR"]` is not already provided, inject it into the `pluginConfig` map before constructing the `ContinuousBatchingPipeline`. This would restore the 2025.4 contract for generative pipelines while keeping explicit `plugin_config` user overrides authoritative.
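A rough sketch of the intended precedence follows. It is not the actual OVMS code: `PluginConfig` is a plain `std::map` standing in for whatever map type the initializer really uses, and `applyGlobalCacheDir` is a hypothetical name. It only shows the rule that an explicit `CACHE_DIR` in `plugin_config` wins over the global flag.

```cpp
#include <map>
#include <string>

// Stand-in for the plugin configuration map (assumption: the real code
// uses its own map type; std::map is only for illustration).
using PluginConfig = std::map<std::string, std::string>;

// Hypothetical helper: use the global --cache_dir value as a fallback.
// Returns true only when the global value was actually injected.
bool applyGlobalCacheDir(PluginConfig& pluginConfig, const std::string& globalCacheDir) {
    if (globalCacheDir.empty()) {
        return false;  // --cache_dir was not set; nothing to inject
    }
    // emplace does not overwrite an existing key, so a user-provided
    // plugin_config["CACHE_DIR"] stays authoritative.
    return pluginConfig.emplace("CACHE_DIR", globalCacheDir).second;
}
```

In this sketch the call would happen once, before the pipeline is constructed; whether the draft model needs the same treatment is left open, as noted in the workaround section.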
