Commit e344aef

refactor: change clean_vector_db default value to False
1 parent 20da166 commit e344aef

4 files changed: +42 -47 lines changed

demos/kfp/docling/asr-conversion/README.md

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ The pipeline enables rich RAG applications that can answer questions about spoke
 - `embed_model_id`: Embedding model to use (default: `ibm-granite/granite-embedding-125m-english`)
 - `max_tokens`: Maximum tokens per chunk (default: 512)
 - `use_gpu`: Whether to use GPU for processing (default: true)
-- `clean_vector_db`: if True, the vector database will be cleared during running the pipeline
+- `clean_vector_db`: The vector database will be cleared during running the pipeline (default: false)
 
 
 ### Creating the Pipeline for running on GPU node

demos/kfp/docling/asr-conversion/docling_asr_convert_pipeline.py

Lines changed: 1 addition & 1 deletion
@@ -489,7 +489,7 @@ def docling_convert_pipeline(
     embed_model_id: str = "ibm-granite/granite-embedding-125m-english",
     max_tokens: int = 512,
     use_gpu: bool = True,  # use only if you have additional gpu worker
-    clean_vector_db: bool = True,  # if True, the vector database will be cleared during running the pipeline
+    clean_vector_db: bool = False,  # if True, the vector database will be cleared during running the pipeline
 ) -> None:
     """
     Converts audio recordings to text using Docling ASR and generates embeddings

demos/kfp/docling/asr-conversion/docling_asr_convert_pipeline_compiled.yaml

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 # Inputs:
 # audio_filenames: str [Default: 'RAG_use_cases.wav, RAG_customers.wav, RAG_benefits.m4a, RAG_vs_Regular_LLM_Output.m4a']
 # base_url: str [Default: 'https://raw.githubusercontent.com/opendatahub-io/rag/main/demos/testing-data/audio-speech']
-# clean_vector_db: bool [Default: True]
+# clean_vector_db: bool [Default: False]
 # embed_model_id: str [Default: 'ibm-granite/granite-embedding-125m-english']
 # max_tokens: int [Default: 512.0]
 # num_workers: int [Default: 1.0]
@@ -2070,7 +2070,7 @@ root:
         isOptional: true
         parameterType: STRING
       clean_vector_db:
-        defaultValue: true
+        defaultValue: false
         description: boolean to enable/disable clearing the vector database before
           running the pipeline
         isOptional: true
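
The practical effect of this default change is that repeated runs now accumulate vectors unless the caller explicitly opts in to clearing. The sketch below illustrates that behavior only. It is not the pipeline's code: `run_ingest` and the dict-based `store` are hypothetical stand-ins for the pipeline's ingest step and the vector database, and only the parameter name `clean_vector_db` and its new default come from this commit.

```python
from typing import Dict, List


def run_ingest(
    store: Dict[str, List[float]],
    new_vectors: Dict[str, List[float]],
    clean_vector_db: bool = False,  # mirrors the new default from this commit
) -> Dict[str, List[float]]:
    """Hypothetical stand-in for an ingest step; not the real pipeline code."""
    if clean_vector_db:
        # Opt-in only: drop everything stored by earlier runs.
        store.clear()
    store.update(new_vectors)
    return store


# A plain dict stands in for the vector database (assumption for illustration).
store = {"old_doc": [0.1, 0.2]}

# Default call: earlier entries are kept alongside the new ones.
run_ingest(store, {"new_doc": [0.3, 0.4]})
kept = sorted(store)

# Explicit opt-in: earlier entries are removed before inserting.
run_ingest(store, {"newer_doc": [0.5, 0.6]}, clean_vector_db=True)
after_clean = sorted(store)
```

Callers that relied on the previous default of `True` to start from an empty database would now need to pass `clean_vector_db=True` explicitly.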
