Improve serialization of translog operations #134243

Tim-Brooks · 2025-09-05T19:00:01Z

Currently, translog operations are serialized in a relatively inefficient
way. This commit improves performance by reordering how operations are
serialized so that most of the header can be written without bounds
checks. Additionally, instead of calculating the checksum incrementally,
we compute it in a single pass at the end. Finally, we no longer copy the
source twice; instead, we serialize it directly into the translog writer.
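
To illustrate the idea in the description, here is a minimal sketch, not the code from this PR. It sizes the buffer once so the fixed-width fields need no per-field capacity checks, writes the source straight into the same buffer, and computes the checksum in one pass at the end. The class name, the three-field header layout, and the use of `java.util.zip.CRC32` are all assumptions made for illustration; the actual translog layout and checksum algorithm may differ.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class SinglePassChecksumSketch {
    // Hypothetical fixed header: seqNo (8) + primaryTerm (8) + version (8).
    static final int FIXED_HEADER_SIZE = 3 * Long.BYTES;

    // Layout (assumed): [size:4][fixed header:24][source][checksum:4]
    static byte[] serialize(long seqNo, long primaryTerm, long version, byte[] source) {
        int payload = FIXED_HEADER_SIZE + source.length;
        // Reserve the full capacity once instead of growing per field.
        byte[] bytes = new byte[Integer.BYTES + payload + Integer.BYTES];
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        buf.putInt(payload + Integer.BYTES);     // size of everything after the size slot
        buf.putLong(seqNo).putLong(primaryTerm).putLong(version);
        buf.put(source);                         // source is copied exactly once
        // Single checksum pass over the already-serialized payload.
        CRC32 crc = new CRC32();
        crc.update(bytes, Integer.BYTES, payload);
        buf.putInt((int) crc.getValue());
        return bytes;
    }

    // The incremental alternative: update the checksum field by field.
    static int incrementalChecksum(long seqNo, long primaryTerm, long version, byte[] source) {
        CRC32 crc = new CRC32();
        for (long field : new long[] { seqNo, primaryTerm, version }) {
            crc.update(ByteBuffer.allocate(Long.BYTES).putLong(field).array());
        }
        crc.update(source);
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] out = serialize(5L, 1L, 2L, new byte[] { 1, 2, 3 });
        System.out.println("serialized " + out.length + " bytes");
    }
}
```

Both approaches produce the same checksum value; the single-pass variant simply touches the checksum object once per operation rather than once per field.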

henningandersen

Looks good, left more comments.

henningandersen · 2025-10-03T14:57:00Z

server/src/main/java/org/elasticsearch/index/translog/TranslogHeaderWriter.java

+
+            int variableLengthSize = (int) (buffer.position() - variableLengthStart);
+            int sizeOfOperation = FIXED_INDEX_HEADER_SIZE - Integer.BYTES + variableLengthSize + sourceLength + Integer.BYTES;
+            ByteUtils.writeIntBE(sizeOfOperation, bytes, off);


Should we assert that it is identical to a slow header, perhaps rarely? You probably have tests for it, though.

I do have tests in TranslogHeaderWriterTests, which runs 30 checks per run, randomizing between single-page and cross-page cases.
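
For readers following the size computation in the snippet above, here is a small sketch of the back-patching pattern, not the PR's code. It leaves the 4-byte size slot empty, writes the fixed and variable-length parts, then patches the size in once it is known. The `writeIntBE` and `readIntBE` helpers below are local stand-ins for the `ByteUtils` call in the diff, and the header layout (size slot plus `seqNo`, then a length-prefixed id) is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BackPatchSizeSketch {
    // Hypothetical fixed header: size slot (4 bytes) + seqNo (8 bytes).
    static final int FIXED_HEADER_SIZE = Integer.BYTES + Long.BYTES;

    // Local stand-in for a big-endian int write into a byte array.
    static void writeIntBE(int value, byte[] bytes, int off) {
        bytes[off] = (byte) (value >>> 24);
        bytes[off + 1] = (byte) (value >>> 16);
        bytes[off + 2] = (byte) (value >>> 8);
        bytes[off + 3] = (byte) value;
    }

    static int readIntBE(byte[] bytes, int off) {
        return ((bytes[off] & 0xFF) << 24) | ((bytes[off + 1] & 0xFF) << 16)
            | ((bytes[off + 2] & 0xFF) << 8) | (bytes[off + 3] & 0xFF);
    }

    static byte[] encode(long seqNo, String id, byte[] source) {
        byte[] idBytes = id.getBytes(StandardCharsets.UTF_8);
        byte[] bytes = new byte[FIXED_HEADER_SIZE + Integer.BYTES + idBytes.length
            + source.length + Integer.BYTES];
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int off = buf.position();
        buf.position(off + Integer.BYTES);        // skip the size slot for now
        buf.putLong(seqNo);
        int variableLengthStart = buf.position();
        buf.putInt(idBytes.length).put(idBytes);  // variable-length section
        int variableLengthSize = buf.position() - variableLengthStart;
        buf.put(source);
        // The last 4 bytes stay zero here as a placeholder for the checksum.
        // Size excludes the size slot itself but includes the trailing checksum.
        int sizeOfOperation = FIXED_HEADER_SIZE - Integer.BYTES + variableLengthSize
            + source.length + Integer.BYTES;
        writeIntBE(sizeOfOperation, bytes, off);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] out = encode(7L, "doc-1", new byte[] { 1, 2, 3 });
        System.out.println("size prefix = " + readIntBE(out, 0) + ", total = " + out.length);
    }
}
```

In this sketch the size prefix always equals the total length minus the 4-byte size slot, which is the kind of invariant an equivalence test against a slower path can check.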

henningandersen · 2025-10-03T15:03:48Z

server/src/main/java/org/elasticsearch/index/translog/TranslogHeaderWriter.java

+            buffer.writeString(indexOperation.id());
+
+            int sourceLength = indexOperation.source().length();
+            buffer.writeVInt(indexOperation.source().length());


I need to check, but I am guessing we may not need this. I am OK with keeping it, though I think a comment on why we have it is appropriate.

Hmm. I'm not sure what you mean. We serialize it here to calculate the checksum. I actually don't love using a vInt here. It's mostly historical, from how we serialize BytesReferences in stream outputs.

Fixed-width values would be nicer, but we already have variable-length values for id and routing, so I did not feel it was worth changing this.
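
As background for the vInt versus fixed-width trade-off discussed here, the following is a minimal sketch of the common 7-bits-per-byte variable-length int encoding (low-order group first, high bit as a continuation flag). The class and method names are hypothetical and this is not the stream output implementation itself; it only shows why the encoded width depends on the value, which is what prevents the field from sitting at a fixed offset.

```java
final class VIntSketch {
    // Writes value using 7 bits per byte; returns the number of bytes written.
    static int writeVInt(int value, byte[] dest, int off) {
        int pos = off;
        while ((value & ~0x7F) != 0) {
            dest[pos++] = (byte) ((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        dest[pos++] = (byte) value;                       // final byte, high bit clear
        return pos - off;
    }

    static int readVInt(byte[] src, int off) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = src[off++];
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    public static void main(String[] args) {
        byte[] scratch = new byte[5]; // an int never needs more than 5 bytes
        for (int v : new int[] { 127, 128, 16384 }) {
            int n = writeVInt(v, scratch, 0);
            System.out.println(v + " -> " + n + " byte(s), reads back " + readVInt(scratch, 0));
        }
    }
}
```

A fixed-width length would always occupy `Integer.BYTES`, trading a few extra bytes for small sources against a layout whose offsets are known in advance.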

henningandersen · 2025-10-03T15:44:37Z

server/src/main/java/org/elasticsearch/index/translog/Translog.java

+                out.writeLong(seqNo);
+                out.writeLong(primaryTerm);
+            } else {
+                out.writeLong(version);


Can this and writeSlowIndexHeader share code?

I combined these.

henningandersen

LGTM.

Can we run a relevant nightly test using the new buildkite ability to run nightly tests on PRs? Both to verify performance and ensure it does not expose anything.

Tim-Brooks · 2025-10-07T20:58:58Z

Buildkite benchmark this with tsdb-metricsgen-270m please

Tim-Brooks · 2025-10-08T01:05:29Z

Buildkite benchmark this with http_logs-1n-1g please

Tim-Brooks · 2025-10-08T02:05:53Z

When I ran the http logs benchmark, it deleted the tsdb metrics gen comment. I had also run that one:

⏳ Build in-progress

Buildkite Build
Commit: 8c29c56
Baseline: f474779 (env ID 94ec114f-ce84-429a-80b5-cead24c90144)
Contender: 8c29c56 (env ID 785a0436-2ac1-405b-bc2f-f93b3cadc66e)
This build attempts two tsdb-metricsgen-270m benchmarks to evaluate performance impact of this PR. To estimate benchmark completion time inspect previous nightly runs here.

Result: https://esbench-metrics.kb.us-east-2.aws.elastic-cloud.com:9243/app/r/s/lwXvc

Tim-Brooks · 2025-10-08T02:20:10Z

💚 Build Succeeded

Buildkite Build
Commit: 8c29c56
Baseline: f474779 (env ID 137b5f68-e411-4537-a594-bb9eaf01058f)
Contender: 8c29c56 (env ID 120fa40c-6781-4cac-a2de-d8510e4cb70b)
Benchmark results

This build ran two http_logs-1n-1g benchmarks to evaluate performance impact of this PR.

History

💚 Build #28 succeeded 8c29c56

Tim-Brooks · 2025-10-08T02:20:33Z

Buildkite benchmark this with nyc_taxis-1n-8g please

Tim-Brooks · 2025-10-08T03:36:23Z

NYC Taxis

💚 Build Succeeded

Buildkite Build
Commit: 8c29c56
Baseline: f474779 (env ID 061aafaa-7ec3-4a98-9c24-7843529f9c2f)
Contender: 8c29c56 (env ID 1a5cfada-ed4e-47a9-a3b2-500cbcd917ae)
Benchmark results

This build ran two nyc_taxis-1n-8g benchmarks to evaluate performance impact of this PR.

History

💚 Build #29 succeeded 8c29c56
💚 Build #28 succeeded 8c29c56

Tim-Brooks · 2025-10-08T03:54:29Z

TSDB Metricsgen p50
Vanilla - 77006 -> 78734 (2.2%)
Tweaked - 87086 doc/s -> 89415 doc/s (2.7%)

NYC Taxi
Less than 1% improvement, but the CPUs are around 80%, so it is not really saturated. In my own benchmark environment, there is around a 1-2% improvement with enough clients to saturate.

Http Logs are nearly identical between contender and main. CPUs pegged.

I'll run the standard logs overnight and as long as that looks fine I'll commit tomorrow morning for a full nightly run.

Tim-Brooks · 2025-10-08T03:55:29Z

Buildkite benchmark this with elastic-logs please

elasticmachine · 2025-10-08T03:57:32Z

💚 Build Succeeded

Buildkite Build
Commit: 8c29c56
Baseline: f474779 (env ID 198e8326-d35b-461d-abaf-6738bb5a4ed6)
Contender: 8c29c56 (env ID 28c2d5d9-f4e1-47a3-ad36-f960022c83a4)
Benchmark results

This build ran two elastic-logs benchmarks to evaluate performance impact of this PR.

History

💚 Build #30 succeeded 8c29c56
💚 Build #29 succeeded 8c29c56
💚 Build #28 succeeded 8c29c56

Tim-Brooks · 2025-10-08T14:36:27Z

Logs
33438 doc/s -> 34392 doc/s (2.85%)
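
For anyone re-deriving the percentages quoted in these benchmark comments, a small sketch of the arithmetic follows. The helper name is hypothetical; the input numbers are the throughput figures reported above, and the formula is the usual relative change against the baseline.

```java
final class ThroughputDeltaSketch {
    // Relative improvement of contender over baseline, in percent.
    static double improvementPercent(double baseline, double contender) {
        return (contender - baseline) * 100.0 / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("TSDB vanilla: %.2f%%%n", improvementPercent(77006, 78734));
        System.out.printf("TSDB tweaked: %.2f%%%n", improvementPercent(87086, 89415));
        System.out.printf("Logs:         %.2f%%%n", improvementPercent(33438, 34392));
    }
}
```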

Tim-Brooks added 3 commits August 26, 2025 10:32

Changes

a4dfd29

Merge remote-tracking branch 'upstream/main' into recycler_byte_impro…

f9d9127

…vements

Change

fdb7d31

Tim-Brooks added the >non-issue, :Distributed Indexing/CRUD, and v9.2.0 labels Sep 5, 2025

Tim-Brooks and others added 24 commits September 5, 2025 14:22

more

84b5a55

test

416e01b

[CI] Auto commit changes from spotless

7097b35

WIP

5e5bb52

WIP

57ee8b5

Merge remote-tracking branch 'upstream/main' into recycler_byte_impro…

d81a57f

…vements

Merge remote-tracking branch 'upstream/main' into recycler_byte_impro…

fbbb6d7

…vements

Change

4bdf98f

Merge remote-tracking branch 'origin/recycler_byte_improvements' into…

dbf4837

… recycler_byte_improvements

[CI] Update transport version definitions

a5d8b78

Change

c2d85a7

Merge remote-tracking branch 'upstream/main' into recycler_byte_impro…

dcdb373

…vements

Merge remote-tracking branch 'origin/recycler_byte_improvements' into…

d71d480

… recycler_byte_improvements

[CI] Auto commit changes from spotless

d737572

Change

550bc1f

Merge remote-tracking branch 'origin/recycler_byte_improvements' into…

524e7ef

… recycler_byte_improvements

[CI] Update transport version definitions

83dc34f

Change

9a33a1f

Merge remote-tracking branch 'origin/recycler_byte_improvements' into…

86b01bc

… recycler_byte_improvements

more

655b3a0

more

f507583

Change

a05a360

Change

fa733bf

Change

1a7384e

elasticsearchmachine added v9.3.0 and removed v9.2.0 labels Oct 2, 2025

Tim-Brooks requested a review from henningandersen October 3, 2025 11:22

henningandersen reviewed Oct 3, 2025

Tim-Brooks added 3 commits October 3, 2025 11:19

Merge remote-tracking branch 'origin/main' into recycler_byte_improve…

a2721af

…ments

Change

5aedd0c

Change

b6d6445

Tim-Brooks requested a review from henningandersen October 3, 2025 18:58

Tim-Brooks and others added 9 commits October 6, 2025 13:54

Merge remote-tracking branch 'origin/main' into recycler_byte_improve…

9fb7d09

…ments

Change

676925a

Fix

eccc376

crc32c

ae0cee4

Revert "Fix"

e7fb5f2

This reverts commit eccc376.

[CI] Auto commit changes from spotless

3d68f75

checksum

c666120

Change

9722a0f

More reorder

30f8b51

henningandersen approved these changes Oct 7, 2025

Merge remote-tracking branch 'origin/main' into recycler_byte_improve…

8c29c56

…ments

Tim-Brooks merged commit 5ea2d96 into elastic:main Oct 8, 2025
34 checks passed

4 participants