[Bug] Vectorizing documents with the open bge embedding model fails #4296

@sometimecry

Contact Information

[email protected]

MaxKB Version

v2

Problem Description

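Vectorizing a document fails: the embedding server rejects any input of 512 or more tokens (2818 in this case) with error code 413.
Vectorizing paragraph: 0198acc0-b3ad-7610-8b59-94ea606658ce Error: Error code: 413 - {'message': 'Input validation error: inputs must have less than 512 tokens. Given: 2818', 'code': 413, 'type': 'Validation'}
Traceback (most recent call last):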

  File "/opt/maxkb-app/apps/common/event/listener_manage.py", line 150, in embedding_by_paragraph
    VectorStore.get_embedding_vector().batch_save(data_list, embedding_model, is_the_task_interrupted)
  File "/opt/maxkb-app/apps/knowledge/vector/base_vector.py", line 102, in batch_save
    self._batch_save(child_array, embedding, is_the_task_interrupted)
  File "/opt/maxkb-app/apps/knowledge/vector/pg_vector.py", line 66, in _batch_save
    embeddings = embedding.embed_documents(texts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/maxkb-app/apps/models_provider/impl/openai_model_provider/model/embedding.py", line 38, in embed_documents
    res = self.client.create(input=texts, model=self.model_name, encoding_format="float")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.APIStatusError: Error code: 413 - {'message': 'Input validation error: inputs must have less than 512 tokens. Given: 2818', 'code': 413, 'type': 'Validation'}
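
The error above says the paragraph text (2818 tokens) exceeds the server's 512-token limit. One possible caller-side workaround is to cut long texts into smaller pieces before they are sent to `embed_documents`. The following is only a minimal sketch of that idea, not MaxKB's actual code: `estimate_tokens` and `split_to_token_budget` are hypothetical helper names, the one-token-per-character estimate is a crude placeholder for the model's real tokenizer, and the default budget of 510 is an assumed margin below the reported limit.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Hypothetical stand-in for a tokenizer: counts one token per character.
    # Replace with the tokenizer of the deployed embedding model.
    return len(text)


def split_to_token_budget(
    text: str,
    max_tokens: int = 510,  # assumed margin below the 512-token limit in the error
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> List[str]:
    """Greedily cut text into consecutive pieces of at most max_tokens each."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    pieces: List[str] = []
    start = 0
    while start < len(text):
        # Binary search for the largest end offset whose slice fits the budget.
        # Assumes the token count never shrinks as the slice grows.
        # lo starts at start + 1 so every piece has at least one character,
        # which guarantees progress even if a single character is over budget.
        lo, hi = start + 1, len(text)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if count_tokens(text[start:mid]) <= max_tokens:
                lo = mid
            else:
                hi = mid - 1
        pieces.append(text[start:lo])
        start = lo
    return pieces


if __name__ == "__main__":
    long_text = "x" * 2818
    chunks = split_to_token_budget(long_text)
    print(len(chunks), [len(c) for c in chunks])
```

Each piece could then be embedded separately. Note that this sketch cuts at arbitrary character offsets; splitting on sentence or paragraph boundaries would preserve meaning better, at the cost of a more involved splitter.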

Steps to Reproduce

Use a locally hosted bge1.5 embedding model through the OpenAI model provider.

The expected correct result

No response

Related log output

Additional Information

No response
