Commit e0de0a1

Fix ingestion bugs (#266)
- Random sample workers were using 16 CPUs and `1Gi` of memory, which is problematic for most datasets. Sampling needs at least a couple of GB of memory, and it does not need 16 cores.
- `access_credentials_name` was not passed correctly when `embeddings_generation_driver_mode == Mode.BATCH`.
1 parent 3eb4996 commit e0de0a1

2 files changed, 2 additions and 2 deletions

apis/python/src/tiledb/vector_search/ingestion.py

Lines changed: 1 addition & 1 deletion
@@ -2016,7 +2016,7 @@ def create_ingestion_dag(
                 config=config,
                 verbose=verbose,
                 name="read-random-sample-" + str(idx),
-                resources={"cpu": str(threads), "memory": "1Gi"},
+                resources={"cpu": "2", "memory": "6Gi"},
                 image_name=DEFAULT_IMG_NAME,
             )
         )
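
To illustrate why `1Gi` can be too small for a random-sample task, here is a rough back-of-the-envelope sketch. The helper `sample_memory_gib` and the sample shape (1,000,000 float32 vectors of 768 dimensions) are assumptions made for this example only; they are not taken from the commit or from the library.

```python
def sample_memory_gib(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Rough size, in GiB, of a dense in-memory block of vectors (hypothetical helper)."""
    return num_vectors * dimensions * bytes_per_value / 2**30


# Hypothetical sample: 1,000,000 float32 vectors with 768 dimensions each.
estimate = sample_memory_gib(1_000_000, 768)

old_limit_gib = 1  # "1Gi", the memory request before this commit
new_limit_gib = 6  # "6Gi", the memory request after this commit

print(f"estimated sample size: {estimate:.2f} GiB")
print(f"fits in old limit: {estimate < old_limit_gib}")
print(f"fits in new limit: {estimate < new_limit_gib}")
```

The estimate covers only the raw vector data; a real task would also need headroom for buffers and intermediate copies, which is consistent with the commit message asking for "at least a couple of GB".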

apis/python/src/tiledb/vector_search/object_api/embeddings_ingestion.py

Lines changed: 1 addition & 1 deletion
@@ -443,7 +443,7 @@ def submit_local(d, func, *args, **kwargs):
     ):
         submit = d.submit
         driver_access_credentials_name_kwargs = {}
-        if embeddings_generation_mode == Mode.BATCH:
+        if embeddings_generation_driver_mode == Mode.BATCH:
             driver_access_credentials_name_kwargs[
                 "access_credentials_name"
             ] = worker_access_credentials_name
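
The effect of checking the wrong variable can be seen in a small standalone sketch. The `Mode` enum below is a stand-in defined only for this example, and the two helper functions and the credential name `"my-credentials"` are hypothetical; only the variable names in the conditions come from the diff.

```python
from enum import Enum


class Mode(Enum):
    # Stand-in for the execution-mode enum used by the library (assumption).
    LOCAL = "local"
    REALTIME = "realtime"
    BATCH = "batch"


def buggy_driver_kwargs(embeddings_generation_mode, worker_access_credentials_name):
    """Old behaviour: the condition looks at the worker (generation) mode."""
    kwargs = {}
    if embeddings_generation_mode == Mode.BATCH:
        kwargs["access_credentials_name"] = worker_access_credentials_name
    return kwargs


def fixed_driver_kwargs(embeddings_generation_driver_mode, worker_access_credentials_name):
    """New behaviour: the condition looks at the driver mode."""
    kwargs = {}
    if embeddings_generation_driver_mode == Mode.BATCH:
        kwargs["access_credentials_name"] = worker_access_credentials_name
    return kwargs


# Scenario where the two modes differ: the driver runs in BATCH,
# while embeddings generation itself runs in LOCAL mode.
embeddings_generation_mode = Mode.LOCAL
embeddings_generation_driver_mode = Mode.BATCH

before = buggy_driver_kwargs(embeddings_generation_mode, "my-credentials")
after = fixed_driver_kwargs(embeddings_generation_driver_mode, "my-credentials")

print("before fix:", before)
print("after fix: ", after)
```

When both modes happen to be `Mode.BATCH` the two versions agree, which may be why the mix-up only showed up in configurations where the driver and generation modes differ.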
