Component: LLM continuous batching, src/llm/language_model/continuous_batching/servable_initializer.cpp
OVMS version: 2026.1.0.72cc0624 (OpenVINO backend 2026.1.0, OpenVINO GenAI backend 2026.1.0.0)
Platform: Windows 11, Intel Arc 140T iGPU (OPTIMIZATION_CAPABILITIES includes EXPORT_IMPORT)
Summary
The 2025.4 release notes state:
--cache_dir now enables compilation caching for both classic models and generative pipelines.
In 2026.1 this no longer holds for generative pipelines. When the server is started with --cache_dir <path>, the log line Model cache is enabled: <path> is printed, but the CACHE_DIR property is empty in every OpenVINO Core device-plugin dump that follows, and no .cl_cache / .blob files are written for HttpLLMCalculator nodes. The model is fully recompiled on every restart.
This reproduces both with and without speculative decoding, so the gap is not specific to draft models: it affects any node configured through LLMCalculatorOptions.
Reproduction A — minimal single-model LLM
graph.pbtxt:
node: {
calculator: "HttpLLMCalculator"
...
node_options: {
[type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
max_num_seqs: 256,
device: "GPU",
models_path: "./",
enable_prefix_caching: true,
cache_size: 0,
}
}
...
}
Start:
ovms.exe --config_path config.json --rest_port 8001 --cache_dir C:/some/empty/dir --log_level DEBUG
Send one request, then stop the server. The cache directory remains empty. The log shows Model cache is enabled: <path> but OpenVINO Core plugin: GPU; plugin configuration: { ... CACHE_DIR: , ... }.
Reproduction B — same plus speculative decoding
Add to the same node:
draft_models_path: "OpenVINO-Qwen3-0.6B-int8-ov",
draft_device: "CPU",
Same outcome: log says cache enabled, plugin CACHE_DIR: is empty, directory stays empty.
Diagnosis
For classic models, --cache_dir flows through src/modelinstance.cpp::setCacheOptions → ieCore.set_property(ov::cache_dir(...)). The LLM continuous-batching path bypasses ModelInstance and builds a ContinuousBatchingPipeline directly in src/llm/language_model/continuous_batching/servable_initializer.cpp. Grepping that file (and its neighbours under src/llm/) finds zero references to cache_dir, CACHE_DIR, ov::cache_dir, or ServerSettings.cache_dir. The global flag value is never read on this code path.
The same pattern was reported for the underlying GenAI pipeline in issue openvinotoolkit/openvino.genai#1992. OVMS LLM nodes route through the same GenAI ContinuousBatchingPipeline, so the surface area overlaps.
Workaround
Setting CACHE_DIR inside plugin_config of the LLM node propagates correctly:
node_options: {
[type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
device: "GPU"
models_path: "./"
plugin_config: '{"CACHE_DIR": "C:/some/empty/dir"}'
...
}
}
After this, .cl_cache files appear in the directory and subsequent starts compile materially faster.
Two notes for anyone hitting this:
- Forward slashes are required in the JSON value on Windows. With backslashes, pbtxt unescapes them once and the JSON parser then fails:
[servable_initializer.cpp:203] Error during llm node plugin_config option parsing to JSON: ....
- For speculative decoding, the same plugin_config likely needs to be set for the draft model as well (not yet verified empirically).
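For anyone generating graph.pbtxt from a script, the forward-slash requirement can be handled before the path is written into plugin_config. The following is a minimal sketch, not OVMS code: toForwardSlashes and makeCacheDirPluginConfig are hypothetical helper names, and the sketch assumes the path contains no double quotes (which Windows paths cannot contain anyway).

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: replace every backslash with a forward slash so the
// value survives pbtxt unescaping and JSON parsing unchanged.
inline std::string toForwardSlashes(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}

// Hypothetical helper: build the JSON string that goes into the
// plugin_config field of LLMCalculatorOptions.
// Assumption: cacheDir contains no double quotes, so no JSON escaping is done.
inline std::string makeCacheDirPluginConfig(const std::string& cacheDir) {
    return "{\"CACHE_DIR\": \"" + toForwardSlashes(cacheDir) + "\"}";
}
```

Calling makeCacheDirPluginConfig with C:\some\empty\dir yields the same JSON value used in the workaround above.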
Proposed fix
In the LLM servable_initializer.cpp path, when the global --cache_dir is set and plugin_config["CACHE_DIR"] is not already provided, inject it into the pluginConfig map before constructing the ContinuousBatchingPipeline. This would restore the 2025.4 contract for generative pipelines while keeping explicit plugin_config user overrides authoritative.
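The precedence rule described here can be sketched as follows. This is an illustration under stated assumptions, not a patch against OVMS: PluginConfig is a plain std::map standing in for whatever property map servable_initializer.cpp actually passes to ContinuousBatchingPipeline, and injectGlobalCacheDir is a hypothetical name.

```cpp
#include <map>
#include <string>

// Assumption: a simplified stand-in for the real plugin configuration map.
using PluginConfig = std::map<std::string, std::string>;

// Hypothetical helper: copy the global --cache_dir value into the node's
// plugin configuration unless the user already supplied CACHE_DIR there.
// Returns true only when the global value was injected.
inline bool injectGlobalCacheDir(PluginConfig& pluginConfig,
                                 const std::string& globalCacheDir) {
    if (globalCacheDir.empty()) {
        return false;  // --cache_dir was not set; nothing to inject
    }
    // emplace() leaves an existing key untouched, so an explicit
    // plugin_config["CACHE_DIR"] (even an empty one) stays authoritative.
    return pluginConfig.emplace("CACHE_DIR", globalCacheDir).second;
}
```

In this sketch the call would happen once per LLM node, after plugin_config has been parsed and before the pipeline is constructed. Whether the draft model needs a separate call is left open, in line with the unverified note in the workaround.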