[GGUF] Serialize Generated OV Model for Faster LLMPipeline Init (#2218)
**Details:**
This PR caches the OV model generated from a GGUF model on disk, enabling faster subsequent LLMPipeline initialization together with the OpenVINO model cache.
- The OV model generated from the GGUF model by the GGUF Reader is serialized when the property `ov::genai::enable_save_ov_model` is enabled (default value is `false`).
- Users can check whether an OV model already exists in the same folder as the GGUF model and load it directly, instead of regenerating it from the GGUF model with the GGUF Reader.
- If the GGUF model is updated, the user is responsible for cache invalidation and must regenerate the OV model with the GGUF Reader.
- The `OPENVINO_LOG_LEVEL` environment variable controls the verbosity of GGUF-related debug output; for details, refer to
[DEBUG_LOG.md](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/DEBUG_LOG.md)
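A minimal usage sketch of enabling the new property (assuming the `LLMPipeline` constructor that accepts an `ov::AnyMap` of properties and the `ov::genai::max_new_tokens` generation property; the model path matches the sample below):

```cpp
#include <iostream>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    // Enable serialization of the OV model generated from the GGUF file,
    // so subsequent runs can load openvino_model.xml from the same folder.
    ov::AnyMap properties{ov::genai::enable_save_ov_model(true)};
    ov::genai::LLMPipeline pipe(
        "gguf_models/qwen2.5-0.5b-instruct-q4_0.gguf",  // GGUF model path
        "CPU",
        properties);
    std::cout << pipe.generate("Who are you?", ov::genai::max_new_tokens(100)) << '\n';
    return 0;
}
```

On the first run the pipeline unpacks the GGUF file and writes `openvino_model.xml` next to it; later runs pointed at the same folder can skip GGUF conversion entirely.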
**Expected behavior:**
- Set environment variable: `export OPENVINO_LOG_LEVEL=3`
- First run with the GGUF model:
  - `build/samples/cpp/text_generation/greedy_causal_lm gguf_models/qwen2.5-0.5b-instruct-q4_0.gguf "Who are you?"`
  > [GGUF Reader]: Loading and unpacking model from: gguf_models/qwen2.5-0.5b-instruct-q4_0.gguf
  > [GGUF Reader]: Loading and unpacking model done. Time: 196ms
  > [GGUF Reader]: Start generating OpenVINO model...
  > [GGUF Reader]: Save generated OpenVINO model to: gguf_models/openvino_model.xml done. Time: 466 ms
  > [GGUF Reader]: Model generation done. Time: 757ms
  > I am Qwen, a large language model created by Alibaba Cloud. I am a language model designed to assist users in generating human-like text, such as writing articles, stories, and even writing books. I am trained on a vast corpus of text data, including books, articles, and other written works. I am also trained on a large corpus of human language data, including written and spoken language. I am designed to provide information and insights to users, and to assist them in their tasks and
- Second run with the serialized OV model:
  - `build/samples/cpp/text_generation/greedy_causal_lm gguf_models "Who are you?"`
  > I am Qwen, a large language model created by Alibaba Cloud. I am a language model designed to assist users in generating human-like text, such as writing articles, stories, and even writing books. I am trained on a vast corpus of text data, including books, articles, and other written works. I am also trained on a large corpus of human language data, including written and spoken language. I am designed to provide information and insights to users, and to assist them in their tasks and
---------
Co-authored-by: Andrei Kochin <[email protected]>
Co-authored-by: Copilot <[email protected]>
The changed hunk in the GGUF Reader (reconstructed from the diff view) routes the timing message through the logger's `stringstream` instead of printing to `std::cout` directly:

```diff
 auto [config, consts, qtypes] = load_gguf(model_path);
 auto load_finish_time = std::chrono::high_resolution_clock::now();
-std::cout << "Loading and unpacking model done. Time: " << std::chrono::duration_cast<std::chrono::milliseconds>(load_finish_time - start_time).count() << "ms" << std::endl;
-std::cout << "Start generating OV model..." << std::endl;
-
-std::shared_ptr<ov::Model> model;
+ss.str("");
+ss << "Loading and unpacking model done. Time: " << std::chrono::duration_cast<std::chrono::milliseconds>(load_finish_time - start_time).count() << "ms";
```

If serialization fails, the exception is rethrown with a hint to disable the feature:

```cpp
OPENVINO_THROW("Exception during model serialization ", e.what(),
               ", user can disable it by setting 'ov::genai::enable_save_ov_model' property to false");
```