Adding asynchronous fetching for DirectIO directory #134803

benwtrent · 2025-09-16T12:43:48Z

One significant cost of DirectIO is simply waiting for bytes to be read in a path dedicated to compute.

This change adds "prefetch" capabilities to DirectIO by allowing callers to prefetch particular file positions. For simplicity, I have it always prefetch a single DirectIO page. Initially I did a bunch of work to allow prefetching multiple pages (e.g. more than 8192 bytes), but this greatly complicated the implementation. I think this can be added as a follow-up.
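
As a rough illustration of what "always prefetch a DirectIO page" implies, a requested file position has to be mapped to the aligned page that contains it. The helper below is a minimal sketch, not code from this PR: the class and method names are hypothetical, and it assumes the page size is a power of two (8192 is the figure mentioned above).

```java
/** Hypothetical helper: maps a file position to the aligned page that contains it. */
final class PageAlign {
    private PageAlign() {}

    /** Rounds pos down to the nearest multiple of pageSize (assumed to be a power of two). */
    static long alignDown(long pos, int pageSize) {
        if (pageSize <= 0 || Integer.bitCount(pageSize) != 1) {
            throw new IllegalArgumentException("pageSize must be a positive power of two: " + pageSize);
        }
        // Clearing the low bits is equivalent to pos - (pos % pageSize) for powers of two.
        return pos & -(long) pageSize;
    }

    /** Offset of pos within its page, i.e. where reading would start inside the fetched buffer. */
    static int offsetInPage(long pos, int pageSize) {
        return (int) (pos - alignDown(pos, pageSize));
    }
}
```

A prefetch for position 10000 with an 8192-byte page would then target the page starting at 8192, and a later read would begin 1808 bytes into that buffer.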

Here are some benchmarks for vectors. Note, the recall difference indicates I am doing something wrong right now. I am thinking I have a couple off-by-one errors and I am still investigating.

Opening as a draft until I can figure out this weird bug (and of course, remove all my extraneous changes used for testing this thing)... This is labeled as 9.2, but I would be very surprised if it actually lands there.

This PR:

index_name                      index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity
------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------
cohere-wikipedia-docs-768d.vec         ivf                 1.00         5.57              0.00           0.00  179.53    0.92  87397.37                1.00

Baseline DirectIO:

index_name                      index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity
------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------
cohere-wikipedia-docs-768d.vec         ivf                 1.00         8.12              0.00           0.00  123.15    0.94  87397.37                1.00

Baseline MMAP (when many of the floating-point vectors can still reside in memory):

index_name                      index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited  filter_selectivity
------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  --------  ------------------
cohere-wikipedia-docs-768d.vec         ivf                 1.00         3.58              0.00           0.00  279.33    0.94  87397.37                1.00

elasticsearchmachine · 2025-09-16T12:44:29Z

Hi @benwtrent, I've created a changelog YAML for you.

benwtrent · 2025-09-18T21:20:18Z

server/src/main/java/org/elasticsearch/index/store/AsyncDirectIOIndexInput.java

+
+import static java.nio.ByteOrder.LITTLE_ENDIAN;
+
+public class AsyncDirectIOIndexInput extends IndexInput {


This is the main change.

benwtrent · 2025-09-18T21:27:06Z

Hey reviewers, I am marking this "ready to review", but obviously, I think pieces of it need to be split out.

Once we merge Lucene, I am gonna work on addressing reranking and bulk rescoring. We can likely replace pieces of this with the next lucene version.
We need to determine if we want to make this async DirectIO thing more general. Right now, it's only prefetching things you ask for directly. Maybe it should prefetch more eagerly? Prefetch more than a single buffer size?

Basically, the main focus of the review should be the AsyncDirectIOIndexInput. The rest is structure I needed to actually put it through its paces and will be moved out when I can into a separate PR.

elasticsearchmachine · 2025-09-18T21:27:37Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2025-09-18T21:27:38Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

thecoop · 2025-09-26T07:58:02Z

server/src/main/java/org/elasticsearch/index/store/AsyncDirectIOIndexInput.java

+    public AsyncDirectIOIndexInput clone() {
+        try {
+            var clone = new AsyncDirectIOIndexInput("clone:" + this, this, offset, length);
+            // TODO figure out how to make this async


Is a seek expected to take a long time?

@thecoop yep, I have noticed that this has a measurable impact every single time we construct this thing because it's a fully synchronous seek.

Gosh. Async seek would be something like putting the seek on a virtual thread, and blocking all calls to the cloned instance until the seek is complete. Bit of a faff really.

> Async seek would be something like putting the seek on a virtual thread, and blocking all calls to the cloned instance until the seek is complete.

Not really, async seek would just place one in the queue and kick it off. Then if the bytes are ever read (which, for vectors, doesn't really happen as we don't read from slice(pos=0)), it will join the async call and let it read.

The downside is that it will take an async buffer slot.

What it does now is block the calling thread until the seek and read is complete, which gets expensive as we very commonly request a new FloatVectorValues just to see vector count or pass around the object for other validations (never actually reading vector values at all).
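
The "kick it off, join only if the bytes are read" idea can be sketched with a `CompletableFuture`. This is only an illustration under assumptions: the class name and the `Supplier` standing in for the positioned read are hypothetical, and the PR's real implementation uses its own buffer slots rather than the common pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical sketch: starts a load eagerly and blocks only when the bytes are first read. */
final class AsyncSeekSketch {
    private final CompletableFuture<byte[]> pending;

    AsyncSeekSketch(Supplier<byte[]> loader) {
        // Start the read right away; the constructor returns without waiting for it.
        this.pending = CompletableFuture.supplyAsync(loader);
    }

    /** Joins the in-flight read on first use; callers that never read never block. */
    byte readByte(int index) {
        return pending.join()[index];
    }
}
```

A caller that only inspects metadata (for example a vector count) would construct the object and never pay for the read, at the cost of the occupied slot noted above.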

Could we then make the seek lazy, only doing it if the values are actually going to be used?

> Could we then make the seek lazy, only doing it if the values are actually going to be used?

@thecoop I have tried, and keep running into very strange edge cases. I will create an issue to track the TODO so it can be tackled later on.
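
For reference, the lazy variant discussed in this thread amounts to recording the target position and deferring the real seek until the first read. The sketch below is a simplified illustration only: the class name and the `LongConsumer` standing in for the expensive seek are hypothetical, and it deliberately ignores the edge cases mentioned above (clones, slices, concurrent prefetches).

```java
import java.util.function.LongConsumer;

/** Hypothetical sketch: remembers the requested position and seeks only when a read needs it. */
final class LazySeekSketch {
    private final LongConsumer realSeek; // stands in for the expensive synchronous seek
    private long pendingPos;
    private boolean seekPending;

    LazySeekSketch(LongConsumer realSeek) {
        this.realSeek = realSeek;
    }

    /** Cheap: only records where the next read should start. */
    void seek(long pos) {
        pendingPos = pos;
        seekPending = true;
    }

    /** Called at the top of every read method; performs at most one real seek per pending request. */
    void beforeRead() {
        if (seekPending) {
            realSeek.accept(pendingPos);
            seekPending = false;
        }
    }
}
```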

thecoop · 2025-10-06T08:27:45Z

qa/vector/build.gradle

  if (buildParams.getRuntimeJavaVersion().map { it.majorVersion.toInteger() }.get() >= 21) {
    jvmArgs '--add-modules=jdk.incubator.vector', '--enable-native-access=ALL-UNNAMED'
  }
+  if (System.getenv("DO_ASYNC_PROFILING") != null) {


Is this needed in the final version?

I found it useful to have a header to just apply my own async profiler, can we keep it?

Actually, looking at: #136021

that makes this work nicer, so I will remove this :).

thecoop · 2025-10-06T08:29:48Z

server/src/main/java/org/elasticsearch/index/store/FsDirectoryFactory.java

+                    @Override
+                    public IndexInput openInput(String name, IOContext context) throws IOException {
+                        ensureOpen();
+                        if (useDirectIO(name, context, OptionalLong.of(fileLength(name)))) {


useDirectIO is overridden to always return true in this class

I realize that, but I wanted the logic to be correct regardless of what the underlying API currently returns (for example, we may later have a more "custom" DirectIO implementation that only does direct IO based on file context, etc.).

thecoop · 2025-10-06T08:32:13Z

server/src/test/java/org/elasticsearch/index/store/AsyncDirectIODirectoryTests.java

+            public IndexInput openInput(String name, IOContext context) throws IOException {
+                int blockSize = getBlockSize(path);
+                ensureOpen();
+                if (useDirectIO(name, context, OptionalLong.of(fileLength(name)))) {


again, this will always be true here.

(could we just use the implementation in FsDirectoryFactory somehow?)

thecoop · 2025-10-06T08:33:04Z

server/src/test/java/org/elasticsearch/index/store/AsyncDirectIODirectoryTests.java

+
+            // Reading immediately after seeking past EOF should throw EOFException
+            expectThrows(EOFException.class, () -> i.readByte());
+            i.close();


nit: Put IndexInput i in a try block?
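
A minimal sketch of the suggested shape, using try-with-resources so the input is closed even when the read throws. `FakeInput` is a hypothetical stand-in for the `IndexInput` in the test, not a class from this PR.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

final class TryWithResourcesSketch {
    /** Hypothetical stand-in for an IndexInput that has been positioned past EOF. */
    static final class FakeInput implements Closeable {
        boolean closed;

        byte readByte() throws IOException {
            throw new EOFException("read past EOF");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Returns true if the read threw EOFException; the input is closed on every path. */
    static boolean readThrowsEof(FakeInput in) {
        try (FakeInput i = in) {
            i.readByte();
            return false;
        } catch (EOFException e) {
            return true; // the resource has already been closed by the time this runs
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```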

thecoop · 2025-10-06T08:34:14Z

server/src/test/java/org/elasticsearch/index/store/AsyncDirectIODirectoryTests.java

+    }
+
+    // Ping-pong seeks should be really fast, since the position should be within buffer.
+    // The test should complete within sub-second times, not minutes.


This doesn't check the time. Is it worth adding a stopwatch check for a long time, say 1 minute?

@thecoop honestly, I am not sure...that would likely make it flaky, I am inheriting these tests from Lucene.

thecoop · 2025-10-06T08:36:13Z

server/src/test/java/org/elasticsearch/index/store/AsyncDirectIOIndexInputTests.java

+        int offset = 84;
+        float[] vectorActual = new float[768];
+        int[] toSeek = new int[] { 1, 2, 3, 5, 6, 9, 11, 14, 15, 16, 18, 23, 24, 25, 26, 29, 30, 31 };
+        int byteSize = 768 * 4;


Suggested change

int byteSize = 768 * 4;

int byteSize = vectorActual.length * Float.BYTES;

thecoop · 2025-10-06T08:37:23Z

server/src/test/java/org/elasticsearch/index/store/AsyncDirectIOIndexInputTests.java

+import java.util.ArrayList;
+import java.util.List;
+
+public class AsyncDirectIOIndexInputTests extends ESTestCase {


There are quite a few magic numbers in this class. Could they be consolidated and some comments added?

thecoop · 2025-10-06T08:46:30Z

server/src/main/java/org/elasticsearch/index/store/AsyncDirectIOIndexInput.java

+            buffer.flip();
+            buffer.position(delta);
+        } catch (IOException ioe) {
+            throw new IOException(ioe.getMessage() + ": " + this, ioe);


Do we need to rethrow as an IOException? Note that this hides any thrown subclasses of IOException

I am not sure, this is copied from Lucene. I would hope once Lucene gets async directIO, we can remove this class and rely on Lucene's.
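
To make the concern concrete: wrapping in a fresh `IOException` means a caller that catches `EOFException` (or any other subclass) no longer matches, although the original is still reachable through `getCause()`. The snippet below is a small standalone demonstration; the class and method names are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;

final class RethrowSketch {
    private RethrowSketch() {}

    /** Wraps the same way as the diff above: message plus context, original kept as the cause. */
    static IOException wrap(IOException ioe, Object context) {
        return new IOException(ioe.getMessage() + ": " + context, ioe);
    }
}
```

With this wrapping, `wrap(new EOFException("eof"), "input")` is a plain `IOException` whose cause is the `EOFException`.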

thecoop · 2025-10-06T08:55:30Z

server/src/main/java/org/elasticsearch/index/store/AsyncDirectIOIndexInput.java

+         */
+        void prefetch(long pos, long length) {
+            // first determine how many slots we need given the length
+            int numSlots = (int) Math.min((length + prefetchBytesSize - 1) / prefetchBytesSize, Integer.MAX_VALUE - 1);


Do we really want a max number of slots as Integer.MAX_VALUE - 1? Wouldn't having that many slots cause significant problems?
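
For context, the expression in the diff is a ceiling division clamped so the cast to `int` cannot overflow. The standalone sketch below reproduces just that arithmetic (the class name is hypothetical); whether the clamp is the right bound is exactly the open question in this comment.

```java
final class SlotMath {
    private SlotMath() {}

    /** Number of fixed-size slots needed to cover length bytes: ceil(length / prefetchBytesSize), clamped. */
    static int slotsFor(long length, int prefetchBytesSize) {
        return (int) Math.min((length + prefetchBytesSize - 1) / prefetchBytesSize, Integer.MAX_VALUE - 1);
    }
}
```

With 8192-byte slots, a single 768-dimension float vector (3072 bytes) needs one slot and 8193 bytes need two; the clamp only matters for lengths in the multi-terabyte range.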

thecoop · 2025-10-06T09:00:14Z

server/src/main/java/org/elasticsearch/index/store/AsyncDirectIOIndexInput.java

+        void prefetch(long pos, long length) {
+            // first determine how many slots we need given the length
+            int numSlots = (int) Math.min((length + prefetchBytesSize - 1) / prefetchBytesSize, Integer.MAX_VALUE - 1);
+            while (numSlots > 0 && (this.posToSlot.size() + this.pendingPrefetches.size()) < maxTotalPrefetches) {


So this doesn't do any prefetching if we've got max in-progress then. That's probably the right thing to do, but it may be worth making a note that in high-pressure situations (IO taking a long time, exceptions thrown, large blocking prefetches, whatever), prefetching won't be doing anything. Is it worth a debug log in this case to help us diagnose IO problems around this in the future?

@thecoop let me add a logger! yes
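
One possible shape for the capped loop plus the debug log, as a sketch only. It assumes a simple queue of in-flight positions in place of the PR's `posToSlot` and `pendingPrefetches` structures, and uses `java.util.logging` as a stand-in for whatever logger the class ends up with; every name other than `prefetch`, `maxTotalPrefetches`, and `prefetchBytesSize` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Logger;

/** Hypothetical sketch of a capped prefetch queue; not the PR's actual data structures. */
final class BoundedPrefetchSketch {
    private static final Logger LOG = Logger.getLogger(BoundedPrefetchSketch.class.getName());

    private final Deque<Long> inFlight = new ArrayDeque<>(); // positions with a read outstanding
    private final int maxTotalPrefetches;
    private final int prefetchBytesSize;

    BoundedPrefetchSketch(int maxTotalPrefetches, int prefetchBytesSize) {
        this.maxTotalPrefetches = maxTotalPrefetches;
        this.prefetchBytesSize = prefetchBytesSize;
    }

    /** Schedules as many slots as the cap allows and returns how many were scheduled. */
    int prefetch(long pos, long length) {
        int numSlots = (int) Math.min((length + prefetchBytesSize - 1) / prefetchBytesSize, Integer.MAX_VALUE - 1);
        int scheduled = 0;
        while (numSlots > 0 && inFlight.size() < maxTotalPrefetches) {
            inFlight.add(pos + (long) scheduled * prefetchBytesSize);
            scheduled++;
            numSlots--;
        }
        if (numSlots > 0) {
            // At the cap: the remaining slots are silently skipped, so leave a trace for diagnosis.
            LOG.fine("prefetch limit reached, skipped " + numSlots + " slot(s) at pos " + pos);
        }
        return scheduled;
    }

    /** Marks the oldest outstanding prefetch as finished, freeing one slot. */
    void completeOne() {
        inFlight.poll();
    }
}
```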

thecoop

Happy with the overall structure of this. I've commented on a few details that could be worked on here, or separately later on

benwtrent · 2025-10-06T18:30:36Z

server/src/main/java/org/elasticsearch/index/store/FsDirectoryFactory.java

        }; // can we set on both - node and index level, some nodes might be running on NFS so they might need simple rather than native
    }, Property.IndexScope, Property.NodeScope);

+    public static final Setting<Integer> ASYNC_PREFETCH_LIMIT = Setting.intSetting(


@thecoop I made this change as I figured we want a way to shut it off or increase the limit. 64 is pretty conservative with only 8k buffers, but it seems better to be safer than not.

I am also not sure we actually want to document this setting. Ideally, it's never touched.

benwtrent added 4 commits September 12, 2025 18:11

Absolutely garbage async io POC, it reads correctly but is slower than regular direct io right now

f111798

iter

aa69974

iter

e36d5a4

iter

9f43918

benwtrent added >enhancement, :Search Relevance/Vectors, v9.2.0, :Search Foundations/Search labels Sep 16, 2025

Update docs/changelog/134803.yaml

8ccc268

benwtrent added 3 commits September 18, 2025 16:38

repeatably failing test

2ce6dd2

Merge remote-tracking branch 'upstream/main' into exp/async-direct-io

74793df

fixing bug

9ed9b4c

iter

c19b9b1

benwtrent marked this pull request as ready for review September 18, 2025 21:27

elasticsearchmachine added Team:Search Foundations, Team:Search Relevance labels Sep 18, 2025

benwtrent added >non-issue and removed >enhancement labels Sep 18, 2025

benwtrent and others added 4 commits September 18, 2025 17:32

Delete docs/changelog/134803.yaml

ca6db4a

iter

6a5086e

Merge branch 'exp/async-direct-io' of github.com:benwtrent/elasticsearch into exp/async-direct-io

87909c0

iter

d1feca4

benwtrent changed the title ~~[DRAFT] Adding asynchronous fetching for DirectIO directory~~ Adding asynchronous fetching for DirectIO directory Sep 19, 2025

benwtrent added 2 commits September 19, 2025 16:16

Merge remote-tracking branch 'upstream/main' into exp/async-direct-io

431fa36

iter

bfd8656

thecoop reviewed Sep 26, 2025

benwtrent added 2 commits September 26, 2025 09:15

iter

0ebdf43

Merge branch 'exp/async-direct-io' of github.com:benwtrent/elasticsearch into exp/async-direct-io

6633f68

benwtrent requested a review from thecoop September 26, 2025 13:16

benwtrent and others added 3 commits September 26, 2025 10:30

Merge branch 'main' into exp/async-direct-io

dd8abc9

Merge remote-tracking branch 'upstream/main' into exp/async-direct-io

1ae5dd3

Merge branch 'exp/async-direct-io' of github.com:benwtrent/elasticsearch into exp/async-direct-io

441aa24

elasticsearchmachine added v9.3.0 and removed v9.2.0 labels Oct 2, 2025

Merge branch 'main' into exp/async-direct-io

7b6a822

thecoop reviewed Oct 6, 2025

thecoop approved these changes Oct 6, 2025

Merge remote-tracking branch 'upstream/main' into exp/async-direct-io

b8afc2b

benwtrent mentioned this pull request Oct 6, 2025

Make slice & clone do async seeks for DirectIO inputs #136046

Open

Adding a setting and addressing PR comments

cdfb99d

benwtrent requested a review from a team as a code owner October 6, 2025 18:25

benwtrent and others added 2 commits October 6, 2025 15:52

fixing setting

7636387

Merge branch 'main' into exp/async-direct-io

cb5bd7b

