[Bug] [connector-elasticsearch] vector field type convert issue cause dimensions double #9611

@loupipalien


Search before asking

  • I have searched the issues and found no similar issues.

What happened

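The AbstractModel#vectorization (link) method returns the vector as an array of doubles (8 bytes per element),
but the Elasticsearch sink (link) converts vector fields to an array of floats (4 bytes per element). As a result, the vector written to Elasticsearch has twice the expected number of dimensions, and half of its values are zeros.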
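
The width mismatch can be reproduced outside SeaTunnel. The following is a minimal sketch, not the actual SeaTunnel code: it assumes the vector travels between transform and sink as a raw byte buffer, and the class and method names (VectorWidthDemo, encodeAsDoubles, encodeAsFloats, decodeAsFloats) are hypothetical. It shows that a buffer written as 8-byte doubles and read back as 4-byte floats yields twice as many elements, and that narrowing to float before encoding keeps the dimension count intact.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical demo class; none of these names come from SeaTunnel.
class VectorWidthDemo {

    // Producer side as described in the report: each value is packed as an 8-byte double.
    static ByteBuffer encodeAsDoubles(double[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Double.BYTES);
        for (double v : vector) {
            buffer.putDouble(v);
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    // Possible remedy (an assumption, not the merged fix): narrow to 4-byte floats first.
    static ByteBuffer encodeAsFloats(double[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (double v : vector) {
            buffer.putFloat((float) v);
        }
        buffer.flip();
        return buffer;
    }

    // Consumer side: interpret the buffer as consecutive 4-byte floats.
    static float[] decodeAsFloats(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position untouched
        float[] out = new float[view.remaining() / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = view.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] embedding = {1.0, 0.5, -2.0};
        // Mismatched widths: 3 doubles are read back as 6 floats.
        System.out.println(Arrays.toString(decodeAsFloats(encodeAsDoubles(embedding))));
        // Matching widths: 3 floats are read back as 3 floats.
        System.out.println(Arrays.toString(decodeAsFloats(encodeAsFloats(embedding))));
    }
}
```

With these sample inputs, every second element of the mismatched result is 0.0 because the low-order four bytes of each double are all zero. The remaining elements are the high-order bytes reinterpreted as floats, so they no longer match the original values either.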

SeaTunnel Version

dev/2.3.12-SNAPSHOT

SeaTunnel Config

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  S3File {
    path = "/seatunnel/test_csv_data.csv"
    bucket = "s3a://ltchen"
    fs.s3a.endpoint="tos-s3-cn-beijing.volces.com"
    fs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    file_format_type = "csv"
    access_key="xxx"
    secret_key="xxx",
    csv_use_header_line = true,
    field_delimiter = ","
    schema={
        fields {
            code = int
            data = string
            success = boolean
        }
    }
  }
}

transform {
  Embedding {
    model_provider = "DOUBAO"
    model = "doubao-embedding-text-240715"
    api_key = "xxx"
    secret_key = "xxx"
    vectorization_fields {
        data_vector = data
    },
    custom_config={
      custom_response_parse = "$.data[*].embedding"
      custom_request_headers = {
          "Content-Type"= "application/json"
          "Authorization"= "Bearer ${api_key}"
      }
      custom_request_body ={
          model = "${model}"
          input = ["${input}"]
      }
    }
  }
}

sink {
    Elasticsearch {
        hosts = ["http://127.0.0.1:9200"]
        index = "seatunnel-ltchen-embedding"
        schema_save_mode="RECREATE_SCHEMA"
        data_save_mode="APPEND_DATA"
        vectorization_fields = ["data_vector"]
    }
}

Running Command

bin/seatunnel.sh -c jobs/s32es_embedding.conf -m local

Error Exception

no exception

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
