Bulk index to same shards if routing is null #178

@adrian-arapiles

Hi,

I found a weird behavior with bulk requests. When you have an index with, for example, 3 shards, all documents go to the same shard.
If you index into an index with 6 shards, all documents go to 2 shards.

When I set a custom routing on the bulk request, documents are spread across all shards. I think this is an issue/bug with routing on bulk requests, but I don't know what it could be.
I tried to reproduce it without the client, from the Kibana console, but I couldn't reproduce the same behavior, so I think it is a client issue.

The code is:

List<BulkOperation> bulkOperations = new ArrayList<>();
for (MultimediaDocument document : documents) {
    BulkOperation operation = BulkOperation.of(builder -> builder.index(builder1 -> builder1.index("multimedia-phash")
            .id(document.getDocumentId())
            .document(document)));
    bulkOperations.add(operation);
}
try {
    BulkResponse response = client.bulk(builder -> builder
            .operations(bulkOperations)
            .timeout(Time.of(builderTime -> builderTime.time("5m")))
    );
} catch (IOException e) {
    e.printStackTrace();
}

And the Elasticsearch _cat/shards/multimedia-phash output:

index            shard prirep state     docs  store  node
multimedia-phash 5     p      STARTED      0   225b  node3
multimedia-phash 3     p      STARTED      0   225b  node1
multimedia-phash 1     p      STARTED 180037 85.5mb  node2
multimedia-phash 4     p      STARTED      0   225b  node2
multimedia-phash 2     p      STARTED      0   225b  node3
multimedia-phash 0     p      STARTED 178963   85mb  node1
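
For context, here is a minimal sketch of the shard-selection formula given in the Elasticsearch `_routing` documentation: `routing_factor = num_routing_shards / num_primary_shards` and `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, where `_routing` defaults to the document `_id`. The class name, the stand-in hash values, and the value 768 for `num_routing_shards` are assumptions for illustration (768 is what the default should work out to for both 3 and 6 primary shards, but that is not confirmed for this index). It only shows that hash values confined to one third of the routing space would fill a single shard out of 3 and shards 0 and 1 out of 6, which matches the pattern above. It does not establish why the hashes would be skewed.

```java
import java.util.Arrays;

public class RoutingSketch {

    // Shard selection as described in the Elasticsearch _routing docs:
    //   routing_factor = num_routing_shards / num_primary_shards
    //   shard_num      = (hash(_routing) % num_routing_shards) / routing_factor
    static int shardFor(int hash, int numRoutingShards, int numPrimaryShards) {
        int routingFactor = numRoutingShards / numPrimaryShards;
        // floorMod keeps the result non-negative for negative hash values
        return Math.floorMod(hash, numRoutingShards) / routingFactor;
    }

    // Counts how many of the given hash values land on each primary shard.
    static int[] distribution(int[] hashes, int numRoutingShards, int numPrimaryShards) {
        int[] counts = new int[numPrimaryShards];
        for (int h : hashes) {
            counts[shardFor(h, numRoutingShards, numPrimaryShards)]++;
        }
        return counts;
    }

    // Hypothetical skewed input: every value falls in [0, 256), the low third
    // of a 768-slot routing space. This is a stand-in, NOT the Murmur3 hash
    // that Elasticsearch applies to the routing value.
    static int[] skewedHashes(int n) {
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {
            hashes[i] = (i * 37) % 256;
        }
        return hashes;
    }

    public static void main(String[] args) {
        int[] hashes = skewedHashes(1000);
        // Assumed num_routing_shards = 768 for both layouts (3 * 2^8 = 6 * 2^7)
        System.out.println("3 shards: " + Arrays.toString(distribution(hashes, 768, 3)));
        System.out.println("6 shards: " + Arrays.toString(distribution(hashes, 768, 6)));
    }
}
```

With this input, the 3-shard layout puts everything on shard 0 and the 6-shard layout splits the same documents between shards 0 and 1, leaving shards 2 to 5 empty.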

The code with the workaround is:

Random r = new Random();
List<BulkOperation> bulkOperations = new ArrayList<>();
for (MultimediaDocument document : documents) {
    BulkOperation operation = BulkOperation.of(builder -> builder.index(builder1 -> builder1.index("multimedia-phash")
            .id(document.getDocumentId())
            .document(document)));
    bulkOperations.add(operation);
}
try {
    BulkResponse response = client.bulk(builder -> builder
            .operations(bulkOperations)
            // Workaround: a single random routing value applied to the whole bulk request
            .routing(String.valueOf(r.nextInt(1000)))
            .timeout(Time.of(builderTime -> builderTime.time("5m")))
    );
} catch (IOException e) {
    e.printStackTrace();
}

And the Elasticsearch _cat/shards/multimedia-phash output:

index            shard prirep state    docs  store     node
multimedia-phash 5     p      STARTED 23000   11mb  node3
multimedia-phash 3     p      STARTED 33000 12.6mb  node1
multimedia-phash 1     p      STARTED 28000 10.6mb  node2
multimedia-phash 4     p      STARTED 30000 11.7mb  node2
multimedia-phash 2     p      STARTED 28000 10.6mb  node3
multimedia-phash 0     p      STARTED 22000  8.4mb  node3

Versions:
Elasticsearch: 8.0.0
co.elastic.clients.elasticsearch-java: 8.0.0

If you need any more info, please ask me.

Thanks in advance,
Adrian.
