Description
Search before asking
- I had searched in the issues and found no similar issues.
What happened
The AbstractModel#vectorization (link) method returns the embedding as a double (8-byte) array, but the Elasticsearch sink (link) writes vector fields as a float (4-byte) array. As a result, the stored vector has twice the expected number of dimensions, and half of its values are zeros.
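A minimal sketch of the width mismatch, assuming the vector's raw bytes are handed from the double-producing model to the float-reading sink without conversion (class and method names here are illustrative, not SeaTunnel APIs). Reinterpreting a double[] buffer as floats doubles the apparent dimension; an explicit element-wise narrowing preserves it:

```java
import java.nio.ByteBuffer;

public class VectorWidthDemo {
    // Number of float "dimensions" seen when a double[] vector's raw
    // bytes are reinterpreted as float (the suspected bug).
    static int floatCountWhenMisread(double[] embedding) {
        ByteBuffer buf = ByteBuffer.allocate(embedding.length * Double.BYTES);
        for (double d : embedding) buf.putDouble(d);
        buf.flip();
        return buf.remaining() / Float.BYTES;
    }

    // A possible fix: explicit element-wise narrowing to float before
    // the vector reaches the float-based Elasticsearch sink.
    static float[] toFloatVector(double[] embedding) {
        float[] out = new float[embedding.length];
        for (int i = 0; i < embedding.length; i++) {
            out[i] = (float) embedding[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {0.1, 0.2, 0.3}; // a 3-dimensional embedding
        System.out.println(floatCountWhenMisread(v)); // 6: doubled dimension
        System.out.println(toFloatVector(v).length);  // 3: expected dimension
    }
}
```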

SeaTunnel Version
dev/2.3.12-SNAPSHOT
SeaTunnel Config
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  S3File {
    path = "/seatunnel/test_csv_data.csv"
    bucket = "s3a://ltchen"
    fs.s3a.endpoint = "tos-s3-cn-beijing.volces.com"
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    file_format_type = "csv"
    access_key = "xxx"
    secret_key = "xxx"
    csv_use_header_line = true
    field_delimiter = ","
    schema = {
      fields {
        code = int
        data = string
        success = boolean
      }
    }
  }
}

transform {
  Embedding {
    model_provider = "DOUBAO"
    model = "doubao-embedding-text-240715"
    api_key = "xxx"
    secret_key = "xxx"
    vectorization_fields {
      data_vector = data
    }
    custom_config = {
      custom_response_parse = "$.data[*].embedding"
      custom_request_headers = {
        "Content-Type" = "application/json"
        "Authorization" = "Bearer ${api_key}"
      }
      custom_request_body = {
        model = "${model}"
        input = ["${input}"]
      }
    }
  }
}

sink {
  Elasticsearch {
    hosts = ["http://127.0.0.1:9200"]
    index = "seatunnel-ltchen-embedding"
    schema_save_mode = "RECREATE_SCHEMA"
    data_save_mode = "APPEND_DATA"
    vectorization_fields = ["data_vector"]
  }
}
Running Command
bin/seatunnel.sh -c jobs/s32es_embedding.conf -m local
Error Exception
no exception
Zeta or Flink or Spark Version
No response
Java or Scala Version
No response
Screenshots
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct