36 changes: 36 additions & 0 deletions docs/reference/elasticsearch/mapping-reference/keyword.md
@@ -232,6 +232,42 @@ Will become:
}
```

If `null_value` is configured, `null` values are replaced with the `null_value` in synthetic source:

$$$synthetic-source-keyword-example-null-value$$$

```console
PUT idx
{
"settings": {
"index": {
"mapping": {
"source": {
"mode": "synthetic"
}
}
}
},
"mappings": {
"properties": {
"kwd": { "type": "keyword", "null_value": "NA" }
}
}
}
PUT idx/_doc/1
{
"kwd": ["foo", null, "bar"]
}
```

Will become:

```console-result
{
"kwd": ["NA", "bar", "foo"]
}
```
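
A minimal sketch of the transformation described above, in plain Java rather than Elasticsearch code: `null` entries are swapped for the configured `null_value`, then the values are de-duplicated and sorted. `KeywordSyntheticSourceSketch` and `synthesize` are hypothetical names, and `String` natural ordering is assumed here as a stand-in for the byte-wise ordering of stored terms, so uppercase ASCII sorts ahead of lowercase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class KeywordSyntheticSourceSketch {
    /**
     * Hypothetical helper, not an Elasticsearch API.
     * Replaces nulls with nullValue (when one is configured), then sorts and
     * de-duplicates, mirroring the default keyword synthetic source behavior.
     */
    static List<String> synthesize(List<String> values, String nullValue) {
        // TreeSet gives sorting and de-duplication in one step.
        TreeSet<String> sorted = new TreeSet<>();
        for (String value : values) {
            String effective = (value == null) ? nullValue : value;
            if (effective != null) { // without a null_value, nulls are simply dropped
                sorted.add(effective);
            }
        }
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        // Same input as the document indexed above; Arrays.asList tolerates null elements.
        System.out.println(synthesize(Arrays.asList("foo", null, "bar"), "NA"));
    }
}
```

Passing `null` as the second argument models a mapping without `null_value`, in which case the `null` entry disappears from the result.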


## Constant keyword field type [constant-keyword-field-type]

64 changes: 13 additions & 51 deletions docs/reference/elasticsearch/mapping-reference/text.md
@@ -104,11 +104,20 @@ Synthetic `_source` is Generally Available only for TSDB indices (indices that h
::::


`text` fields support [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) if they have a [`keyword`](/reference/elasticsearch/mapping-reference/keyword.md#keyword-synthetic-source) sub-field that supports synthetic `_source` or if the `text` field sets `store` to `true`. Either way, it may not have [`copy_to`](/reference/elasticsearch/mapping-reference/copy-to.md).
`text` fields may use a [`keyword`](/reference/elasticsearch/mapping-reference/keyword.md#keyword-synthetic-source) sub-field to support [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) without storing values of the text field itself.

::::{note}
Synthetic source of the `text` field will have the same [modifications](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) as a `keyword` field in this case.

These modifications can impact usage of `text` fields:
* Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/query-dsl/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more detail. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default.
* Handling of `null` values is different. `text` fields ignore `null` values but `keyword` fields support replacing `null`s with a value specified in the `null_value` parameter. This replacement will be represented in synthetic source.
::::
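
To illustrate the reordering point in the note above, the sketch below computes where each value's first token lands when several values are indexed into one `text` field. It assumes whitespace tokenization, one position per token, and the gap being added between consecutive values; `PositionGapSketch` and `firstTokenPositions` are hypothetical names, not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionGapSketch {
    /**
     * Hypothetical helper: returns the position of the first token of each value,
     * assuming whitespace tokenization and a fixed gap between values.
     */
    static List<Integer> firstTokenPositions(List<String> values, int positionIncrementGap) {
        List<Integer> starts = new ArrayList<>();
        int next = 0;
        for (String value : values) {
            starts.add(next);
            int tokens = value.trim().split("\\s+").length;
            // The next value starts after this value's tokens plus the gap.
            next += tokens + positionIncrementGap;
        }
        return starts;
    }

    public static void main(String[] args) {
        int gap = 100; // default position_increment_gap
        System.out.println(firstTokenPositions(
            List.of("the quick brown fox", "jumped over the lazy dog"), gap));
        // Sorted order, as synthetic source would return the values.
        System.out.println(firstTokenPositions(
            List.of("jumped over the lazy dog", "the quick brown fox"), gap));
    }
}
```

Reordering changes which tokens sit on either side of the gap, so a phrase query only depends on value order when its `slop` is large enough to reach across the gap.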
Contributor

You've reframed the entire section to be about `keyword`, so I'd take this out of a note.

Suggested change
::::{note}
Synthetic source of the `text` field will have the same [modifications](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) as a `keyword` field in this case.
These modifications can impact usage of `text` fields:
* Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/query-dsl/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more detail. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default.
* Handling of `null` values is different. `text` fields ignore `null` values but `keyword` fields support replacing `null`s with a value specified in the `null_value` parameter. This replacement will be represented in synthetic source.
::::
In this case, the synthetic source of the `text` field will have the same [modifications](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) as a `keyword` field.
These modifications can impact usage of `text` fields:
* Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/query-dsl/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more details. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default.
* Handling of `null` values is different. `text` fields ignore `null` values, but `keyword` fields support replacing `null` values with a value specified in the `null_value` parameter. This replacement is represented in synthetic source.

Contributor

IMO, it would be good to keep an example here because it's the actual synthetic part. Not critical, but I think the `store` example is less important.

Contributor Author

I didn't want to repeat the example that `keyword` already has. Do you think it's helpful?

Contributor

I don't think it's intuitive for people to look for the "core" example on the keyword page ... they'd likely just scroll down to the synthetics section and copy whatever is there/assume whatever example is there represents the happy path. you could consider using a snippet to avoid maintaining two examples.

I'm also ok with this merging as is if you're not super worried about it 🤷

Contributor Author

Given that we don't actually have an example with a multi-field, I'll go ahead and add it. Thanks for the feedback.


If using a sub-`keyword` field, then the values are sorted in the same way as a `keyword` field’s values are sorted. By default, that means sorted with duplicates removed. So:

$$$synthetic-source-text-example-default$$$
If the `text` field sets `store` to `true`, then the sub-field is not used and the modifications mentioned above do not apply.

$$$synthetic-source-text-example-stored$$$

```console
PUT idx
@@ -126,6 +135,7 @@ PUT idx
"properties": {
"text": {
"type": "text",
"store": true,
"fields": {
"raw": {
"type": "keyword"
@@ -147,54 +157,6 @@ PUT idx/_doc/1

Will become:

```console-result
{
"text": [
"jumped over the lazy dog",
"the quick brown fox"
]
}
```

::::{note}
Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/query-dsl/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more detail. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default.
::::


If the `text` field sets `store` to true then order and duplicates are preserved.

$$$synthetic-source-text-example-stored$$$

```console
PUT idx
{
"settings": {
"index": {
"mapping": {
"source": {
"mode": "synthetic"
}
}
}
},
"mappings": {
"properties": {
"text": { "type": "text", "store": true }
}
}
}
PUT idx/_doc/1
{
"text": [
"the quick brown fox",
"the quick brown fox",
"jumped over the lazy dog"
]
}
```

Will become:

```console-result
{
"text": [
@@ -1087,6 +1087,13 @@ public Builder builder(BlockFactory factory, int expectedCount) {
* using whatever
*/
private BlockSourceReader.LeafIteratorLookup blockReaderDisiLookup(BlockLoaderContext blContext) {
if (isSyntheticSource && syntheticSourceDelegate != null) {
// Since we are using synthetic source and a delegate, we can't use this field
// to determine whether the delegate has values in the document (e.g. handling of `null`
// differs between text and keyword).
return BlockSourceReader.lookupMatchingAll();
}

if (isIndexed()) {
if (getTextSearchInfo().hasNorms()) {
return BlockSourceReader.lookupFromNorms(name());
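
The effect of the early return above can be sketched in isolation. This is a simplified model under stated assumptions, not the real `BlockSourceReader` machinery: `DelegateLookupSketch`, `FROM_NORMS`, `MATCHING_ALL`, `delegateValues`, and `load` are all hypothetical names. A norms-style lookup skips a document whose `text` field holds only `null`, so the delegate's `null_value` is never seen; a match-all lookup visits the document and the replacement shows up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

final class DelegateLookupSketch {
    // Stand-in for a norms-based lookup: visit a document only if the text
    // field itself indexed at least one non-null value.
    static final Predicate<List<String>> FROM_NORMS =
        raw -> raw.stream().anyMatch(Objects::nonNull);

    // Stand-in for a match-all lookup: visit every document.
    static final Predicate<List<String>> MATCHING_ALL = raw -> true;

    // What a keyword delegate with null_value would contribute for one document.
    static List<String> delegateValues(List<String> raw, String nullValue) {
        List<String> out = new ArrayList<>();
        for (String value : raw) {
            String effective = (value == null) ? nullValue : value;
            if (effective != null) {
                out.add(effective);
            }
        }
        return out;
    }

    // Returns null when the lookup skips the document entirely.
    static List<String> load(List<String> raw, String nullValue, Predicate<List<String>> lookup) {
        return lookup.test(raw) ? delegateValues(raw, nullValue) : null;
    }

    public static void main(String[] args) {
        List<String> onlyNull = Arrays.asList((String) null);
        System.out.println(load(onlyNull, "NA", FROM_NORMS));   // null (document skipped)
        System.out.println(load(onlyNull, "NA", MATCHING_ALL)); // [NA]
    }
}
```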
@@ -62,15 +62,8 @@ public static Object expectedValue(Map<String, Object> fieldMapping, Object valu
if (params.syntheticSource() && testContext.forceFallbackSyntheticSource() == false && usingSyntheticSourceDelegate) {
var nullValue = (String) keywordMultiFieldMapping.get("null_value");

// Due to how TextFieldMapper#blockReaderDisiLookup works this is complicated.
// If we are using lookupMatchingAll() then we'll see all docs, generate synthetic source using syntheticSourceDelegate,
// parse it and see null_value inside.
// But if we are using lookupFromNorms() we will skip the document (since the text field itself does not exist).
// Same goes for lookupFromFieldNames().
boolean textFieldIndexed = (boolean) fieldMapping.getOrDefault("index", true);

if (value == null) {
if (textFieldIndexed == false && nullValue != null && nullValue.length() <= (int) ignoreAbove) {
if (nullValue != null && nullValue.length() <= (int) ignoreAbove) {
return new BytesRef(nullValue);
}

@@ -82,12 +75,6 @@ public static Object expectedValue(Map<String, Object> fieldMapping, Object valu
}

var values = (List<String>) value;

// See note above about TextFieldMapper#blockReaderDisiLookup.
if (textFieldIndexed && values.stream().allMatch(Objects::isNull)) {
return null;
}

var indexed = values.stream()
.map(s -> s == null ? nullValue : s)
.filter(Objects::nonNull)