Handle empty input inference #123763
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 123763 | ||
| summary: Handle empty input inference | ||
| area: Relevance | ||
| type: enhancement | ||
| issues: [] | ||
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1005,3 +1005,190 @@ setup: | ||
| - match: { hits.hits.0._source.dense_field: "another inference test" } | ||
| - match: { hits.hits.0._source.non_inference_field: "non inference test" } | ||
| - exists: hits.hits.0._source._inference_fields | ||
| | ||
| --- | ||
| "Empty semantic_text field skips embedding generation": | ||
| - requires: | ||
| cluster_features: "semantic_text.handle_empty_input" | ||
| reason: skips generating embeddings when a semantic_text field contains empty or whitespace-only input | ||
| | ||
| - do: | ||
| index: | ||
| index: test-index | ||
| id: doc_1 | ||
| body: | ||
| sparse_field: "" | ||
| refresh: true | ||
| | ||
| - do: | ||
| search: | ||
| index: test-index | ||
| body: | ||
| fields: [ _inference_fields ] | ||
| query: | ||
| match_all: { } | ||
| | ||
| - match: { hits.total.value: 1 } | ||
| - match: { hits.hits.0._source.sparse_field: "" } | ||
| - not_exists: hits.hits.0._source._inference_fields | ||
| | ||
| --- | ||
| "Whitespace-Only semantic_text field skips embedding generation": | ||
| - requires: | ||
| cluster_features: "semantic_text.handle_empty_input" | ||
| reason: skips generating embeddings when a semantic_text field contains empty or whitespace-only input | ||
| | ||
| - do: | ||
| index: | ||
| index: test-index | ||
| id: doc_1 | ||
| body: | ||
| sparse_field: " " | ||
| refresh: true | ||
| | ||
| - do: | ||
| search: | ||
| index: test-index | ||
| body: | ||
| fields: [ _inference_fields ] | ||
| query: | ||
| match_all: { } | ||
| | ||
| - match: { hits.total.value: 1 } | ||
| - match: { hits.hits.0._source.sparse_field: " " } | ||
| - not_exists: hits.hits.0._source._inference_fields | ||
| | ||
| --- | ||
| "Reindexing with empty or whitespace semantic_text skips embedding generation": | ||
| - requires: | ||
| cluster_features: "semantic_text.handle_empty_input" | ||
| reason: skips generating embeddings when a semantic_text field contains empty or whitespace-only input | ||
| | ||
| - do: | ||
| index: | ||
| index: test-index | ||
| id: doc_1 | ||
| body: | ||
| sparse_field: " " | ||
| refresh: true | ||
| | ||
| - do: | ||
| indices.create: | ||
| index: destination-index | ||
| body: | ||
| settings: | ||
| index: | ||
| mapping: | ||
| semantic_text: | ||
| use_legacy_format: false | ||
| mappings: | ||
| properties: | ||
| sparse_field: | ||
| type: semantic_text | ||
| inference_id: sparse-inference-id | ||
| | ||
| - do: | ||
| reindex: | ||
| wait_for_completion: true | ||
| body: | ||
| source: | ||
| index: test-index | ||
| dest: | ||
| index: destination-index | ||
| refresh: true | ||
| | ||
| - do: | ||
| get: | ||
| index: destination-index | ||
| id: doc_1 | ||
| | ||
| - match: { _source.sparse_field: " " } | ||
| | ||
| - do: | ||
| search: | ||
| index: destination-index | ||
| body: | ||
| fields: [ _inference_fields ] | ||
| query: | ||
| match_all: { } | ||
| | ||
| - not_exists: hits.hits.0._source._inference_fields | ||
| | ||
| --- | ||
| "Empty Multi-Field skips embedding generation": | ||
| - requires: | ||
| cluster_features: "semantic_text.handle_empty_input" | ||
| reason: skips generating embeddings when a semantic_text field contains empty or whitespace-only input | ||
| | ||
| - do: | ||
| indices.create: | ||
| index: test-multi-index | ||
| body: | ||
| settings: | ||
| index: | ||
| mapping: | ||
| semantic_text: | ||
| use_legacy_format: false | ||
| mappings: | ||
| properties: | ||
| field: | ||
| type: semantic_text | ||
| inference_id: sparse-inference-id | ||
| fields: | ||
| sparse: | ||
| type: semantic_text | ||
| inference_id: sparse-inference-id | ||
| | ||
| - do: | ||
| bulk: | ||
| index: test-multi-index | ||
| refresh: true | ||
| body: | | ||
| {"index":{"_id": "1"}} | ||
| {"field": ["you know, for testing", "now with chunks"]} | ||
| {"index":{"_id": "2"}} | ||
| {"field": ["", " "]} | ||
| | ||
| - do: | ||
| search: | ||
| index: test-multi-index | ||
| body: | ||
| fields: [ _inference_fields ] | ||
| query: | ||
| match_all: { } | ||
| | ||
| - exists: hits.hits.0._source._inference_fields | ||
| - not_exists: hits.hits.1._source._inference_fields | ||
| | ||
| --- | ||
| "Multi chunks skips empty input embedding generation": | ||
| - requires: | ||
| cluster_features: "semantic_text.handle_empty_input" | ||
| reason: skips generating embeddings when a semantic_text field contains empty or whitespace-only input | ||
| | ||
| - do: | ||
| index: | ||
| index: test-index | ||
| id: doc_1 | ||
| body: | ||
| sparse_field: ["some test data", " ", "now with chunks"] | ||
| refresh: true | ||
| | ||
| - do: | ||
| search: | ||
| index: test-index | ||
| body: | ||
| fields: [ _inference_fields ] | ||
| query: | ||
| match_all: { } | ||
| | ||
| - match: { hits.total.value: 1 } | ||
| | ||
| - length: { hits.hits.0._source._inference_fields.sparse_field.inference.chunks: 1 } | ||
| - length: { hits.hits.0._source._inference_fields.sparse_field.inference.chunks.sparse_field: 2 } | ||
| - exists: hits.hits.0._source._inference_fields.sparse_field.inference.chunks.sparse_field.0.embeddings | ||
| - match: { hits.hits.0._source._inference_fields.sparse_field.inference.chunks.sparse_field.0.start_offset: 0 } | ||
| - match: { hits.hits.0._source._inference_fields.sparse_field.inference.chunks.sparse_field.0.end_offset: 14 } | ||
| - exists: hits.hits.0._source._inference_fields.sparse_field.inference.chunks.sparse_field.1.embeddings | ||
| - match: { hits.hits.0._source._inference_fields.sparse_field.inference.chunks.sparse_field.1.start_offset: 20 } | ||
| - match: { hits.hits.0._source._inference_fields.sparse_field.inference.chunks.sparse_field.1.end_offset: 35 } | ||
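
For readers following the assertions in these tests, here is a minimal sketch of the behaviour they exercise. It is written in Python for brevity and is not the code from this PR. The helper names `is_blank`, `select_inference_inputs`, and `needs_inference` are hypothetical, and the sketch assumes that "empty" means a string that is empty or consists only of whitespace. Offset calculation is left out because the PR page does not show how offsets are derived.

```python
from typing import Iterable, List, Tuple


def is_blank(value: str) -> bool:
    """Return True for empty strings and whitespace-only strings."""
    return value.strip() == ""


def select_inference_inputs(values: Iterable[str]) -> List[Tuple[int, str]]:
    """Keep only the values worth sending for inference.

    Each kept value is paired with its position in the original list so
    that results could later be mapped back to the right input.
    """
    return [(i, v) for i, v in enumerate(values) if not is_blank(v)]


def needs_inference(values: Iterable[str]) -> bool:
    """A field needs inference only if at least one value is non-blank."""
    return len(select_inference_inputs(values)) > 0


if __name__ == "__main__":
    # Mirrors the multi-chunk test: the whitespace-only value is skipped.
    print(select_inference_inputs(["some test data", " ", "now with chunks"]))
    # Mirrors the multi-field test: nothing is left to embed.
    print(needs_inference(["", " "]))
```

Under these assumptions, the multi-chunk document keeps positions 0 and 2, which lines up with the two chunks the last test asserts. A field holding only `""` and `" "` keeps nothing, which is consistent with the `not_exists` checks on `_inference_fields`.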