Commit 7002c56

[Fix] To change the model zip file name from hugging face org id to a custom prefix when upload_prefix provided. (#413)
* [Feature] Add a workflow parameter that model uploader can specific a customize prefix.
  Signed-off-by: conggguan <[email protected]>
* [Fix] To change the model zip file name from hugging face org id to a custom prefix when upload_prefix provided.
  Signed-off-by: conggguan <[email protected]>
* [Fix] Revert the redundant history.
  Signed-off-by: conggguan <[email protected]>
* [Add] add a changelog item.
  Signed-off-by: conggguan <[email protected]>
---------
Signed-off-by: conggguan <[email protected]>
1 parent f41c2ef commit 7002c56

9 files changed, +46 -41 lines changed

.ci/run-repository.sh

Lines changed: 2 additions & 1 deletion
@@ -72,6 +72,7 @@ elif [[ "$TASK_TYPE" == "SentenceTransformerTrace" || "$TASK_TYPE" == "SparseTra
 echo -e "\033[34;1mINFO:\033[0m TRACING_FORMAT: ${TRACING_FORMAT}\033[0m"
 echo -e "\033[34;1mINFO:\033[0m EMBEDDING_DIMENSION: ${EMBEDDING_DIMENSION:-N/A}\033[0m"
 echo -e "\033[34;1mINFO:\033[0m POOLING_MODE: ${POOLING_MODE:-N/A}\033[0m"
+echo -e "\033[34;1mINFO:\033[0m UPLOAD_PREFIX: ${UPLOAD_PREFIX:-N/A}\033[0m"
 echo -e "\033[34;1mINFO:\033[0m MODEL_DESCRIPTION: ${MODEL_DESCRIPTION:-N/A}\033[0m"
 
 if [[ "$TASK_TYPE" == "SentenceTransformerTrace" ]]; then
@@ -95,7 +96,7 @@ elif [[ "$TASK_TYPE" == "SentenceTransformerTrace" || "$TASK_TYPE" == "SparseTra
 --env "TEST_TYPE=server" \
 --name opensearch-py-ml-trace-runner \
 opensearch-project/opensearch-py-ml \
-nox -s "${NOX_TRACE_TYPE}-${PYTHON_VERSION}" -- ${MODEL_ID} ${MODEL_VERSION} ${TRACING_FORMAT} ${EXTRA_ARGS} -md ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"}
+nox -s "${NOX_TRACE_TYPE}-${PYTHON_VERSION}" -- ${MODEL_ID} ${MODEL_VERSION} ${TRACING_FORMAT} ${EXTRA_ARGS} -up ${UPLOAD_PREFIX} -md ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"}
 
 # To upload a model, we need the model artifact, description, license files into local path
 # trace_output should include description and license file.

.github/workflows/model_uploader.yml

Lines changed: 2 additions & 1 deletion
@@ -206,7 +206,8 @@ jobs:
 echo "MODEL_VERSION=${{ github.event.inputs.model_version }}" >> $GITHUB_ENV
 echo "TRACING_FORMAT=${{ github.event.inputs.tracing_format }}" >> $GITHUB_ENV
 echo "EMBEDDING_DIMENSION=${{ github.event.inputs.embedding_dimension }}" >> $GITHUB_ENV
-echo "POOLING_MODE=${{ github.event.inputs.pooling_mode }}" >> $GITHUB_ENV
+echo "POOLING_MODE=${{ github.event.inputs.pooling_mode }}" >> $GITHUB_ENV
+echo "UPLOAD_PREFIX=${{ github.event.inputs.upload_prefix }}" >> $GITHUB_ENV
 echo "MODEL_DESCRIPTION=${{ github.event.inputs.model_description }}" >> $GITHUB_ENV
 - name: Autotracing ${{ matrix.cluster }} secured=${{ matrix.secured }} version=${{matrix.entry.opensearch_version}}
 run: "./.ci/run-tests ${{ matrix.cluster }} ${{ matrix.secured }} ${{ matrix.entry.opensearch_version }} ${{github.event.inputs.model_type}}Trace"

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 - updating listing file with three v2 sparse model - by @dhrubo-os ([#412](https://github.com/opensearch-project/opensearch-py-ml/pull/412))
 
 ### Fixed
+- Fix the wrong final zip file name in model_uploader workflow, now will name it by the upload_prefix alse.([#413](https://github.com/opensearch-project/opensearch-py-ml/pull/413/files))
 - Fix the wrong input parameter for model_uploader's base_download_path in jekins trigger.([#402](https://github.com/opensearch-project/opensearch-py-ml/pull/402))
 - Enable make_model_config_json to add model description to model config file by @thanawan-atc in ([#203](https://github.com/opensearch-project/opensearch-py-ml/pull/203))
 - Correct demo_ml_commons_integration.ipynb by @thanawan-atc in ([#208](https://github.com/opensearch-project/opensearch-py-ml/pull/208))

opensearch_py_ml/ml_models/sparse_encoding_model.py

Lines changed: 2 additions & 2 deletions
@@ -81,8 +81,8 @@ def save_as_pt(
     add_apache_license: bool = True,
 ) -> str:
     """
-    Download sentence transformer model directly from huggingface, convert model to torch script format,
-    zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster
+    Download sparse encoding model directly from huggingface, convert model to torch script format,
+    zip the model file and its tokenizer.json file to prepare to upload to the OpenSearch cluster
 
     :param sentences:
         Required, for example sentences = ['today is sunny']

utils/model_uploader/autotracing_utils.py

Lines changed: 6 additions & 1 deletion
@@ -235,6 +235,7 @@ def prepare_files_for_uploading(
     model_format: str,
     src_model_path: str,
     src_model_config_path: str,
+    upload_prefix: str = None,
 ) -> tuple[str, str]:
     """
     Prepare files for uploading by storing them in UPLOAD_FOLDER_PATH
@@ -253,7 +254,11 @@ def prepare_files_for_uploading(
     (path to model config json file) in the UPLOAD_FOLDER_PATH
     :rtype: Tuple[str, str]
     """
-    model_type, model_name = model_id.split("/")
+    model_type, model_name = (
+        model_id.split("/")
+        if upload_prefix is None
+        else (upload_prefix, model_id.split("/")[-1])
+    )
     model_format = model_format.lower()
     folder_to_delete = (
         TORCHSCRIPT_FOLDER_PATH if model_format == "torch_script" else ONNX_FOLDER_PATH
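
The prefix selection in the hunk above can be tried in isolation. The following is a minimal sketch, not the repository's code: `resolve_upload_names` is a hypothetical helper name, and the prefix `my-custom-prefix` is an invented example value. It only mirrors the conditional expression that the diff introduces.

```python
from typing import Optional, Tuple


def resolve_upload_names(
    model_id: str, upload_prefix: Optional[str] = None
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the conditional added in this commit."""
    if upload_prefix is None:
        # Previous behaviour: the Hugging Face org id is the first component.
        # This unpacking requires exactly one "/" in model_id.
        model_type, model_name = model_id.split("/")
    else:
        # New behaviour: the custom prefix replaces the org id and only the
        # last segment of the model id is kept as the model name.
        model_type, model_name = upload_prefix, model_id.split("/")[-1]
    return model_type, model_name


if __name__ == "__main__":
    model_id = "opensearch-project/opensearch-neural-sparse-encoding-v2-distill"
    print(resolve_upload_names(model_id))
    print(resolve_upload_names(model_id, "my-custom-prefix"))
```

Without a prefix, a model id that does not contain exactly one `/` still raises `ValueError` during unpacking; with a prefix, any number of `/` separators is accepted because only the last segment is used.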

utils/model_uploader/model_autotracing.py

Lines changed: 11 additions & 0 deletions
@@ -281,6 +281,7 @@ def main(
     embedding_dimension: Optional[int] = None,
     pooling_mode: Optional[str] = None,
     model_description: Optional[str] = None,
+    upload_prefix: Optional[str] = None,
 ) -> None:
     """
     Perform model auto-tracing and prepare files for uploading to OpenSearch model hub
@@ -363,6 +364,7 @@ def main(
     TORCH_SCRIPT_FORMAT,
     torchscript_model_path,
     torchscript_model_config_path,
+    upload_prefix,
 )
 
 config_path_for_checking_description = torchscript_dst_model_config_path
@@ -425,6 +427,14 @@ def main(
     choices=["BOTH", "TORCH_SCRIPT", "ONNX"],
     help="Model format for auto-tracing",
 )
+parser.add_argument(
+    "-up",
+    "--upload_prefix",
+    type=str,
+    nargs="?",
+    default=None,
+    help="Model customize path prefix for upload",
+)
 parser.add_argument(
     "-ed",
     "--embedding_dimension",
@@ -462,4 +472,5 @@ def main(
     args.embedding_dimension,
     args.pooling_mode,
     args.model_description,
+    args.upload_prefix,
 )
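
The `nargs="?"` setting on `-up` matters because `.ci/run-repository.sh` passes `-up ${UPLOAD_PREFIX}` unquoted, so an empty variable leaves a bare `-up` on the command line. The sketch below shows how `argparse` handles that case. The `-up` definition is copied from the diff; the `-md` definition is an assumption, since the diff does not show it in full, and `parse_prefix` is a hypothetical helper.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "-up",
    "--upload_prefix",
    type=str,
    nargs="?",      # the flag may appear with zero or one value
    default=None,   # used when the flag is absent
    help="Model customize path prefix for upload",
)
# Assumed definition: the diff only shows the first lines of this argument.
parser.add_argument("-md", "--model_description", type=str, nargs="?", default=None)


def parse_prefix(argv):
    """Return the parsed upload prefix for a given argument list."""
    return parser.parse_args(argv).upload_prefix


if __name__ == "__main__":
    print(parse_prefix([]))                           # flag absent
    print(parse_prefix(["-up"]))                      # flag present, no value
    print(parse_prefix(["-up", "my-custom-prefix"]))  # flag with a value
    print(parse_prefix(["-up", "-md", "some text"]))  # empty prefix before -md
```

In every case where no value follows `-up`, the parsed prefix is `None` (the default `const` for `nargs="?"`), so the original org-id naming applies. Because the shell expansion is unquoted, a prefix containing whitespace would be split into several arguments.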

utils/model_uploader/sparse_model_autotracing.py

Lines changed: 22 additions & 3 deletions
@@ -186,6 +186,7 @@ def main(
     model_version: str,
     tracing_format: str,
     model_description: Optional[str] = None,
+    upload_prefix: Optional[str] = None,
 ) -> None:
     """
     Perform model auto-tracing and prepare files for uploading to OpenSearch model hub
@@ -235,7 +236,10 @@ def main(
     torchscript_model_path,
     torchscript_model_config_path,
 ) = trace_sparse_encoding_model(
-    model_id, model_version, TORCH_SCRIPT_FORMAT, model_description=None
+    model_id,
+    model_version,
+    TORCH_SCRIPT_FORMAT,
+    model_description=model_description,
 )
 
 torchscript_encoding_datas = register_and_deploy_sparse_encoding_model(
@@ -262,6 +266,7 @@ def main(
     TORCH_SCRIPT_FORMAT,
     torchscript_model_path,
     torchscript_model_config_path,
+    upload_prefix,
 )
 
 config_path_for_checking_description = torchscript_dst_model_config_path
@@ -273,7 +278,7 @@ def main(
     onnx_model_path,
     onnx_model_config_path,
 ) = trace_sparse_encoding_model(
-    model_id, model_version, ONNX_FORMAT, model_description=None
+    model_id, model_version, ONNX_FORMAT, model_description=model_description
 )
 
 onnx_embedding_datas = register_and_deploy_sparse_encoding_model(
@@ -325,6 +330,14 @@ def main(
     choices=["BOTH", "TORCH_SCRIPT", "ONNX"],
     help="Model format for auto-tracing",
 )
+parser.add_argument(
+    "-up",
+    "--upload_prefix",
+    type=str,
+    nargs="?",
+    default=None,
+    help="Model customize path prefix for upload",
+)
 parser.add_argument(
     "-md",
     "--model_description",
@@ -336,4 +349,10 @@ def main(
 )
 args = parser.parse_args()
 
-main(args.model_id, args.model_version, args.tracing_format, args.model_description)
+main(
+    args.model_id,
+    args.model_version,
+    args.tracing_format,
+    args.model_description,
+    args.upload_prefix,
+)

utils/model_uploader/upload_history/MODEL_UPLOAD_HISTORY.md

Lines changed: 0 additions & 3 deletions
@@ -21,6 +21,3 @@ The following table shows sentence transformer model upload history.
 |2023-09-13 18:03:32|@dhrubo-os|`sentence-transformers/distiluse-base-multilingual-cased-v1`|1.0.1|TORCH_SCRIPT|N/A|N/A|6178024517|
 |2023-10-18 18:06:15|@dhrubo-os|`sentence-transformers/paraphrase-mpnet-base-v2`|1.0.0|ONNX|N/A|N/A|6568285400|
 |2023-10-18 18:06:15|@dhrubo-os|`sentence-transformers/paraphrase-mpnet-base-v2`|1.0.0|TORCH_SCRIPT|N/A|N/A|6568285400|
-|2024-08-07 18:01:26|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill`|1.0.0|TORCH_SCRIPT|N/A|N/A|10293890748|
-|2024-08-07 18:23:41|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini`|1.0.0|TORCH_SCRIPT|N/A|N/A|10294048787|
-|2024-08-08 09:40:44|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-v2-distill`|1.0.0|TORCH_SCRIPT|N/A|N/A|10295327692|

utils/model_uploader/upload_history/supported_models.json

Lines changed: 0 additions & 30 deletions
@@ -48,35 +48,5 @@
         "Embedding Dimension": "N/A",
         "Pooling Mode": "N/A",
         "Workflow Run ID": "6568285400"
-    },
-    {
-        "Model Uploader": "@dhrubo-os",
-        "Upload Time": "2024-08-07 18:01:26",
-        "Model ID": "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill",
-        "Model Version": "1.0.0",
-        "Model Format": "TORCH_SCRIPT",
-        "Embedding Dimension": "N/A",
-        "Pooling Mode": "N/A",
-        "Workflow Run ID": "10293890748"
-    },
-    {
-        "Model Uploader": "@dhrubo-os",
-        "Upload Time": "2024-08-07 18:23:41",
-        "Model ID": "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini",
-        "Model Version": "1.0.0",
-        "Model Format": "TORCH_SCRIPT",
-        "Embedding Dimension": "N/A",
-        "Pooling Mode": "N/A",
-        "Workflow Run ID": "10294048787"
-    },
-    {
-        "Model Uploader": "@dhrubo-os",
-        "Upload Time": "2024-08-08 09:40:44",
-        "Model ID": "opensearch-project/opensearch-neural-sparse-encoding-v2-distill",
-        "Model Version": "1.0.0",
-        "Model Format": "TORCH_SCRIPT",
-        "Embedding Dimension": "N/A",
-        "Pooling Mode": "N/A",
-        "Workflow Run ID": "10295327692"
     }
 ]
