|
---
layout: post
title: "Optimizing inference processors for cost efficiency and performance"
authors:
  - will-hwang
  - heemin-kim
  - kolchfa
date: 2025-05-29
has_science_table: true
categories:
  - technical-posts
meta_keywords: inference processors, vector embeddings, OpenSearch text embedding, text image embedding, sparse encoding, caching mechanism, ingest pipeline, OpenSearch optimization
meta_description: Learn about a new OpenSearch optimization for inference processors that reduces redundant calls, lowering costs and improving performance in vector embedding generation.
---

Inference processors, such as `text_embedding`, `text_image_embedding`, and `sparse_encoding`, generate vector embeddings during document ingestion or updates. Today, these processors invoke model inference every time a document is ingested or updated, even if the embedding source fields are unchanged. This leads to unnecessary compute usage and increased costs.

This blog post introduces a new inference processor optimization that reduces redundant inference calls, lowering costs and improving overall performance.

## How the optimization works

The optimization adds a caching mechanism that compares the embedding source fields in the updated document against those in the existing document. If the embedding source fields are unchanged, the processor copies the existing embeddings into the updated document instead of triggering new inference. If the fields differ, the processor runs inference as usual. The following diagram illustrates this workflow.

This approach minimizes redundant inference calls, significantly improving efficiency without affecting the accuracy or freshness of embeddings.
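
As a concrete illustration, consider the following hypothetical request sequence against an index whose default pipeline has `skip_existing` enabled (the pipeline definitions are shown in the next section; the index name, field names, and values here are placeholders). The first request generates an embedding for the `text` field. The second request changes only the `category` metadata field, so the processor finds `text` unchanged and copies the stored embedding instead of calling the model again:

```json
PUT /my-index/_doc/1
{
  "text": "Hello World",
  "category": "greeting"
}

PUT /my-index/_doc/1
{
  "text": "Hello World",
  "category": "salutation"
}
```

Only a change to `text`, the embedding source field, would trigger a new inference call.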

## How to enable the optimization

To enable this optimization, set the `skip_existing` parameter to `true` in your ingest pipeline processor definition. This option is available for the [`text_embedding`](#text-embedding-processor), [`text_image_embedding`](#textimage-embedding-processor), and [`sparse_encoding`](#sparse-encoding-processor) processors. By default, `skip_existing` is set to `false`.

### Text embedding processor

The [`text_embedding` processor](https://docs.opensearch.org/docs/latest/ingest-pipelines/processors/text-embedding/) generates vector embeddings for text fields and is typically used in semantic search.

* **Optimization behavior**: If `skip_existing` is `true`, the processor checks whether the text fields mapped in `field_map` have changed. If they haven't, inference is skipped and the existing vectors are reused.

**Example pipeline**:

```json
PUT /_ingest/pipeline/optimized-ingest-pipeline
{
  "description": "Optimized ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "text": "<vector_field>"
        },
        "skip_existing": true
      }
    }
  ]
}
```
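
To apply the pipeline automatically at ingestion time, you can reference it as the index's default pipeline. The following is a minimal sketch under assumptions: the index name is a placeholder, and the vector field type, dimension (768 here, matching many sentence transformer models), and k-NN method settings depend on the model you deploy:

```json
PUT /my-index
{
  "settings": {
    "index.knn": true,
    "index.default_pipeline": "optimized-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "<vector_field>": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "name": "hnsw",
          "space_type": "l2"
        }
      }
    }
  }
}
```

With this setting, every indexing request to `my-index` runs the pipeline without requiring a `pipeline` query parameter.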

### Text/image embedding processor

The [`text_image_embedding` processor](https://docs.opensearch.org/docs/latest/ingest-pipelines/processors/text-image-embedding/) generates combined embeddings from text and image fields for multimodal search use cases.

* **Optimization behavior**: Because embeddings are generated for the combined text and image fields, inference is skipped only if **both** the text and image fields mapped in `field_map` are unchanged.

**Example pipeline**:

```json
PUT /_ingest/pipeline/optimized-ingest-pipeline
{
  "description": "Optimized ingest pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "<model_id>",
        "embedding": "<vector_field>",
        "field_map": {
          "text": "<input_text_field>",
          "image": "<input_image_field>"
        },
        "skip_existing": true
      }
    }
  ]
}
```
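
For example, in the following hypothetical update (field values are illustrative), the `text` field is unchanged but the `image` field differs from the stored document, so the processor runs inference again in order to keep the combined embedding consistent:

```json
PUT /test_index/_doc/1
{
  "text": "Orange table",
  "image": "bGlkaHQtd29rfx43..."
}

PUT /test_index/_doc/1
{
  "text": "Orange table",
  "image": "c29tZS1vdGhlci1pbWFnZQ..."
}
```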

### Sparse encoding processor

The [`sparse_encoding` processor](https://docs.opensearch.org/docs/latest/ingest-pipelines/processors/sparse-encoding/) generates sparse vectors from text fields and is used in neural sparse retrieval.

* **Optimization behavior**: If the text fields mapped in `field_map` are unchanged, the processor skips inference and reuses the existing sparse encoding.

**Example pipeline**:

```json
PUT /_ingest/pipeline/optimized-ingest-pipeline
{
  "description": "Optimized ingest pipeline",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "<model_id>",
        "prune_type": "max_ratio",
        "prune_ratio": 0.1,
        "field_map": {
          "text": "<vector_field>"
        },
        "skip_existing": true
      }
    }
  ]
}
```
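
If you want to sanity-check any of these pipeline definitions before attaching them to an index, you can use the ingest simulate API. The following request is a minimal sketch (the document content is a placeholder); note that simulation always runs inference because there is no previously indexed document to compare against:

```json
POST /_ingest/pipeline/optimized-ingest-pipeline/_simulate
{
  "docs": [
    {
      "_index": "test_index",
      "_id": "1",
      "_source": {
        "text": "Hello World"
      }
    }
  ]
}
```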

## Performance results

In addition to reducing compute costs, skipping redundant inference significantly lowers latency. The following benchmarks compare processor performance with and without the `skip_existing` optimization across three scenarios: a baseline with `skip_existing` disabled, updates in which the embedding source fields changed (inference still required), and updates in which the source fields were unchanged (inference skipped).

### Test environment

We used the following cluster setup to run benchmarking tests.

### Text embedding processor

* **Model**: `huggingface/sentence-transformers/msmarco-distilbert-base-tas-b`
* **Dataset**: [Trec-Covid](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/trec-covid.zip)

**Sample requests**

Single document:

```json
PUT /test_index/_doc/1
{
  "text": "Hello World"
}
```

Bulk update:

```json
POST _bulk
{ "index": { "_index": "test_index" } }
{ "text": "hello world" }
{ "index": { "_index": "test_index" } }
{ "text": "Hi World" }
```

The following table presents the benchmarking test results for the `text_embedding` processor.

| Operation type | Doc count | Batch size | Baseline (`skip_existing`=false) | Updated (`skip_existing`=true) | Δ vs. baseline | Unchanged (`skip_existing`=true) | Δ vs. baseline |
| -------------- | --------- | ---------- | -------------------------------- | ------------------------------ | -------------- | -------------------------------- | -------------- |
| Single update  | 3,000     | 1          | 1,400,710 ms                     | 1,401,216 ms                   | +0.04%         | 292,020 ms                       | -79.15%        |
| Batch update   | 171,332   | 200        | 2,247,191 ms                     | 2,192,883 ms                   | -2.42%         | 352,767 ms                       | -84.30%        |

### Text/image embedding processor

* **Model**: `amazon.titan-embed-image-v1`
* **Dataset**: [Flickr Image](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset)

**Sample requests**

Single document:

```json
PUT /test_index/_doc/1
{
  "text": "Orange table",
  "image": "bGlkaHQtd29rfx43..."
}
```

Bulk update:

```json
POST _bulk
{ "index": { "_index": "test_index" } }
{ "text": "Orange table", "image": "bGlkaHQtd29rfx43..." }
{ "index": { "_index": "test_index" } }
{ "text": "Red chair", "image": "aFlkaHQtd29rfx43..." }
```

The following table presents the benchmarking test results for the `text_image_embedding` processor.

| Operation type | Doc count | Batch size | Baseline     | Updated      | Δ vs. baseline | Unchanged    | Δ vs. baseline |
| -------------- | --------- | ---------- | ------------ | ------------ | -------------- | ------------ | -------------- |
| Single update  | 3,000     | 1          | 1,060,339 ms | 1,060,785 ms | +0.04%         | 465,771 ms   | -56.07%        |
| Batch update   | 31,783    | 200        | 1,809,299 ms | 1,662,389 ms | -8.12%         | 1,571,012 ms | -13.17%        |

### Sparse encoding processor

* **Model**: `huggingface/sentence-transformers/msmarco-distilbert-base-tas-b`
* **Dataset**: [Trec-Covid](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/trec-covid.zip)
* **Prune method**: `max_ratio`, **ratio**: `0.1` (see the illustration following this list)
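
With `max_ratio` pruning, tokens whose weight falls below `prune_ratio` multiplied by the largest weight in the sparse vector are dropped before indexing. As a hypothetical illustration (the token names and weights below are invented), with a `prune_ratio` of `0.1` the maximum weight is `1.48`, so the pruning threshold is `0.148` and the `hi` entry (`0.11`) is removed while the other tokens are kept:

```json
{
  "worlds": 1.48,
  "world": 1.21,
  "hello": 0.98,
  "hi": 0.11
}
```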

**Sample requests**

Single document:

```json
PUT /test_index/_doc/1
{
  "text": "Hello World"
}
```

Bulk update:

```json
POST _bulk
{ "index": { "_index": "test_index" } }
{ "text": "hello world" }
{ "index": { "_index": "test_index" } }
{ "text": "Hi World" }
```

The following table presents the benchmarking test results for the `sparse_encoding` processor.

| Operation type | Doc count | Batch size | Baseline     | Updated      | Δ vs. baseline | Unchanged  | Δ vs. baseline |
| -------------- | --------- | ---------- | ------------ | ------------ | -------------- | ---------- | -------------- |
| Single update  | 3,000     | 1          | 1,942,907 ms | 1,965,918 ms | +1.18%         | 306,766 ms | -84.21%        |
| Batch update   | 171,332   | 200        | 3,077,040 ms | 3,101,697 ms | +0.80%         | 475,197 ms | -84.56%        |

## Conclusion

As the performance results demonstrate, the `skip_existing` optimization significantly reduces redundant inference operations, which translates into lower costs and better system performance. By reusing existing embeddings when input fields remain unchanged, ingest pipelines can process updates faster and more efficiently, improving scalability and delivering more cost-effective embedding generation at scale.

## What's next

If you use the Bulk API with ingest pipelines, it's important to understand how different operations behave.

The Bulk API supports two operations---`index` and `update`:

* The `index` operation replaces the entire document and **does** trigger ingest pipelines.
* The `update` operation modifies only the specified fields but **does not** currently trigger ingest pipelines (see the example following this list).
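
For example, in the following bulk request, the `index` operation runs the ingest pipeline (assuming `test_index` has a default pipeline configured), generating or reusing embeddings, while the `update` operation bypasses the pipeline and leaves any embedding fields untouched:

```json
POST _bulk
{ "index": { "_index": "test_index", "_id": "1" } }
{ "text": "hello world" }
{ "update": { "_index": "test_index", "_id": "2" } }
{ "doc": { "text": "hi world" } }
```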

If you'd like to see ingest pipeline support added to the `update` operation in Bulk API requests, consider supporting [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/17494) by adding a +1.