[GPU] Copy to host in case of small matrices to release resources early #136464

ldematte · 2025-10-13T06:20:38Z

This PR makes a small change to improve parallelism during graph build, addressing an issue we noticed with NVIDIA in profiler traces.
If the resulting graph is "small enough" (the threshold is currently set to 128 MB), we copy the graph entirely to host memory, release the cuvs resources, and then proceed. The alternative is to download the data from the device in pages and write each page to disk; that is more efficient, but it holds the resources until we have finished writing to disk, which can take a while on a busy system.
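
The decision described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `GraphFlushSketch`, `DeviceGraph`, `Sink`, and every method name are hypothetical and are not part of the cuvs API, and reading 128 MB as 128 * 1024 * 1024 bytes with an inclusive comparison is an assumption.

```java
/** Hypothetical sketch; none of these names come from the PR diff or from cuvs. */
final class GraphFlushSketch {
    // Assumption: "128 MB" is interpreted as 128 * 1024 * 1024 bytes.
    static final long MAX_HOST_COPY_BYTES = 128L * 1024 * 1024;

    /** Stand-in for a graph held in device memory, plus the GPU resources it keeps busy. */
    interface DeviceGraph {
        long sizeInBytes();
        byte[] copyAllToHost();                    // one bulk device-to-host copy
        byte[] readPage(long offset, int length);  // one paged device-to-host copy
        void releaseResources();
    }

    /** Stand-in for the (possibly slow) on-disk output. */
    interface Sink {
        void write(byte[] bytes);
    }

    // Assumption: the threshold is inclusive.
    static boolean shouldCopyToHost(long sizeInBytes) {
        return sizeInBytes <= MAX_HOST_COPY_BYTES;
    }

    static void flush(DeviceGraph graph, Sink disk, int pageSize) {
        if (shouldCopyToHost(graph.sizeInBytes())) {
            // Small graph: copy everything to host, then free the GPU
            // resources BEFORE the disk write starts.
            byte[] host = graph.copyAllToHost();
            graph.releaseResources();
            disk.write(host);
        } else {
            // Large graph: stream page by page; the GPU resources stay held
            // until the last page has been written.
            try {
                long size = graph.sizeInBytes();
                for (long offset = 0; offset < size; offset += pageSize) {
                    int length = (int) Math.min(pageSize, size - offset);
                    disk.write(graph.readPage(offset, length));
                }
            } finally {
                graph.releaseResources();
            }
        }
    }
}
```

The trade-off in this sketch is the same one the description names: the small-graph branch needs a host buffer as large as the whole graph, but the GPU resources are free while the disk write is still in progress.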

elasticsearchmachine · 2025-10-13T06:21:03Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

ChrisHegarty

Makes sense. LGTM.

ldematte · 2025-10-13T08:13:04Z

Updated without the test-gpu label to verify that this works well with "regular" CI.

ldematte added 2 commits October 13, 2025 08:11

Copy to host in case of small matrices to release resources early

014fa6d

Small change

c8c0fb7

ldematte requested review from ChrisHegarty and mayya-sharipova October 13, 2025 06:20

ldematte added the >non-issue, :Search Relevance/Vectors (Vector search), test-gpu (Run tests using a GPU), v9.2.1, and v9.3.0 labels Oct 13, 2025

ldematte mentioned this pull request Oct 13, 2025

[GPU] Optimize merge memory usage #136411

Open

elasticsearchmachine added the Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch) label Oct 13, 2025

ChrisHegarty approved these changes Oct 13, 2025

ldematte removed the test-gpu (Run tests using a GPU) label Oct 13, 2025

ldematte added 2 commits October 13, 2025 10:13

Merge branch 'main' into gpu/optimize-flush-memory-release

a04b2e5

Merge branch 'main' into gpu/optimize-flush-memory-release

d05323b

ldematte enabled auto-merge (squash) October 13, 2025 10:43

Merge branch 'main' into gpu/optimize-flush-memory-release

241f009

ldematte merged commit ca5ca98 into elastic:main Oct 13, 2025
34 checks passed
