Commit c7a482a
Remove vectors from _source transparently (#130382)
## Summary
This PR introduces a new **hybrid mode for the `_source` field** that stores the original source **without dense vector fields**. The goal is to reduce storage overhead and improve performance, especially as vector sizes grow. The setting also affects whether vectors are returned in **search and get APIs**, which matters even for synthetic source, since reconstructing vectors from doc values can be expensive.
## Background
Today, Elasticsearch supports two modes for `_source`:
* **Stored**: Original JSON is persisted as-is.
* **Synthetic**: `_source` is reconstructed from doc values at read time.
However, dense vector fields have become problematic:
* They **don’t compress well**, unlike text.
* They are **already stored in doc values**, so storing them again in `_source` is wasteful.
* Their `_source` representation is often **overly precise** (double precision), which isn’t needed for search/indexing.
While switching to full synthetic source is an option, retrieving the original `_source` (minus vectors) in a single read is often faster and more practical than reconstructing each field from its own storage when the number of metadata fields is high.
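The precision point above can be illustrated with a minimal sketch. It assumes the vector values kept outside `_source` are 32-bit floats, while the JSON in `_source` carries double-precision numbers; the helper name `to_float32` is hypothetical and only mimics that round trip.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float (double precision) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Values as they might appear in the original JSON `_source`.
original = [0.1, 0.25, 0.3333333333333333]

# Values as they would come back if rebuilt from 32-bit storage (assumption).
rehydrated = [to_float32(v) for v in original]

for before, after in zip(original, rehydrated):
    print(before, after, abs(before - after))
```

Values that are exactly representable in 32 bits (such as `0.25`) survive unchanged; others come back slightly different, which is the loss of precision the description refers to.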
## What This PR Adds
We’re introducing a **hybrid source mode**:
* Keeps the original `_source`, **minus any `dense_vector` fields**.
* Built on top of the **synthetic source infrastructure**, reusing parts of it.
* Controlled via a **single index-level setting**.
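As a conceptual sketch only (the actual change lives in the Java mapper code and is not shown here), the hybrid mode can be pictured as filtering the document against its mapping before `_source` is stored. The function name `strip_dense_vectors` and the example documents are hypothetical; the `properties` / `type: dense_vector` shape follows the usual mapping layout.

```python
from typing import Any, Dict


def strip_dense_vectors(source: Dict[str, Any], properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `source` without fields mapped as `dense_vector`.

    `properties` is the `properties` section of an index mapping.
    Object fields are handled recursively; the input is not mutated.
    """
    result: Dict[str, Any] = {}
    for name, value in source.items():
        field_mapping = properties.get(name, {})
        if field_mapping.get("type") == "dense_vector":
            # Skip the vector: in this sketch it is assumed to live elsewhere.
            continue
        sub_properties = field_mapping.get("properties")
        if sub_properties and isinstance(value, dict):
            result[name] = strip_dense_vectors(value, sub_properties)
        else:
            result[name] = value
    return result


example_properties = {
    "title": {"type": "text"},
    "embedding": {"type": "dense_vector", "dims": 3},
}
example_doc = {"title": "hello", "embedding": [0.1, 0.2, 0.3]}
print(strip_dense_vectors(example_doc, example_properties))
```

Everything except the vector fields is kept verbatim, which is what distinguishes this mode from fully synthetic source.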
### Key Behavior
* When enabled, `dense_vector` fields are **excluded from `_source` at index time**.
* The setting **also controls whether vectors are returned in search and get APIs**:
  * This matters even for **synthetic source**, as **rebuilding vectors is costly**.
  * You can override behavior at query time using the `exclude_vectors` option.
* The setting is:
  * **Disabled by default**
  * **Protected by a feature flag**
  * Intended to be **enabled by default for new indices** in a follow-up
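The interplay between the index-level setting and the query-time `exclude_vectors` option can be sketched as a simple precedence rule. This is an assumption about how the override resolves, not the PR's code; the function and parameter names are hypothetical.

```python
from typing import Optional


def should_exclude_vectors(index_default: bool, request_exclude_vectors: Optional[bool] = None) -> bool:
    """Decide whether vectors are left out of a search or get response.

    `index_default` stands for the index-level setting; `request_exclude_vectors`
    stands for the per-request `exclude_vectors` option (None when not supplied).
    An explicit request value wins over the index default (assumed precedence).
    """
    if request_exclude_vectors is not None:
        return request_exclude_vectors
    return index_default


# Index has the setting enabled, but the caller explicitly asks for vectors.
print(should_exclude_vectors(index_default=True, request_exclude_vectors=False))
```

Under this reading, a caller that needs the vectors back can opt in per request without changing the index setting.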
## Motivation
This hybrid option is designed for use cases where users:
* Want faster reads than full synthetic offers.
* Don’t want the storage cost of large vectors in `_source`.
* Are okay with **some loss of precision** when vectors are rehydrated.
By making this setting the default for newly created indices in a follow-up, we can help users avoid surprises from the hidden cost of storing and returning high-dimensional vectors.
## Benchmark Results
Benchmarking this PR against `main` using the `openai` Rally track shows substantial improvements, at the cost of a loss of precision when retrieving the original vectors:
| Metric | Main (Baseline) | This PR (Contender) | Change | % Change |
| :----------------------------------------- | :-------------- | :------------------ | :-------------- | :---------- |
| **Indexing throughput (mean)** | 1690.77 docs/s | 2704.57 docs/s | +1013.79 docs/s | **+59.96%** |
| **Indexing time** | 120.25 min | 74.32 min | –45.93 min | **–38.20%** |
| **Merge time** | 132.56 min | 69.28 min | –63.28 min | **–47.74%** |
| **Merge throttle time** | 100.99 min | 36.30 min | –64.69 min | **–64.06%** |
| **Flush time** | 2.71 min | 1.48 min | –1.23 min | **–45.29%** |
| **Refresh count** | 60 | 42 | –18 | **–30.00%** |
| **Dataset / Store size** | 52.29 GB | 19.30 GB | –32.99 GB | **–63.09%** |
| **Young Gen GC time** | 30.64 s | 22.17 s | –8.47 s | **–27.65%** |
| **Search throughput (k=10, multi-client)** | 613 ops/s | 677 ops/s | +64 ops/s | **+10.42%** |
| **Search latency (p99, k=10)** | 29.5 ms | 26.5 ms | –3.0 ms | **–10.43%** |
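For readers who want to check the "% Change" column, the figures are relative changes against the baseline. The short sketch below recomputes two rows from the rounded values shown in the table; the helper name `pct_change` is hypothetical. Recomputing from the rounded values can differ from the table in the last digit for some rows.

```python
def pct_change(baseline: float, contender: float) -> float:
    """Relative change of the contender versus the baseline, in percent."""
    return (contender - baseline) / baseline * 100.0


# (baseline, contender) pairs copied from the table above.
rows = {
    "Indexing throughput (docs/s)": (1690.77, 2704.57),
    "Dataset / Store size (GB)": (52.29, 19.30),
}

for metric, (baseline, contender) in rows.items():
    print(f"{metric}: {pct_change(baseline, contender):+.2f}%")
```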
## Miscellaneous
Reindexing is not covered in this PR. Since it's one of the main use cases for returning vectors, the plan is for reindex to **force the inclusion of** vectors by default. This will be addressed in a follow-up, as this PR is already quite large.

PR #130382, 1 parent: ec5254b
File tree
36 files changed, +1677 -226 lines changed
- docs/changelog
- qa
  - ccs-common-rest
    - src/yamlRestTest/java/org/elasticsearch/test/rest/yaml
  - smoke-test-multinode/src/yamlRestTest/java/org/elasticsearch/smoketest
- rest-api-spec/src
  - main/resources/rest-api-spec/api
  - yamlRestTest
    - java/org/elasticsearch/test/rest
    - resources/rest-api-spec/test/search.vectors
- server/src
  - main/java/org/elasticsearch
    - common/settings
    - index
      - engine
      - get
      - mapper
        - vectors
    - rest/action/search
    - search
      - fetch
        - subphase
      - vectors
  - test/java/org/elasticsearch/index
    - mapper/vectors
    - shard
- test
  - framework/src/main/java/org/elasticsearch/index/engine
  - test-clusters/src/main/java/org/elasticsearch/test/cluster
- x-pack
  - plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper
  - qa/core-rest-tests-with-security/src/yamlRestTest/java/org/elasticsearch/xpack/security