37 changes: 37 additions & 0 deletions docs/reference/mapping/types/keyword.asciidoc
@@ -293,6 +293,43 @@ Will become:
----
// TEST[s/^/{"_source":/ s/\n$/}/]

If `null_value` is configured, `null` values are replaced with the `null_value` in synthetic source.
For example:
[source,console,id=synthetic-source-keyword-example-null-values]
----
PUT idx
{
"settings": {
"index": {
"mapping": {
"source": {
"mode": "synthetic"
}
}
}
},
"mappings": {
"properties": {
"kwd": { "type": "keyword", "null_value": "NA" }
}
}
}
PUT idx/_doc/1
{
"kwd": ["foo", null, "bar"]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
"kwd": ["NA", "bar", "foo"]
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

include::constant-keyword.asciidoc[]
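The `null_value` example above can be reproduced with a small model. This is only a sketch of the documented behaviour (replace `null` with `null_value`, then sort and de-duplicate); the class and method names are hypothetical and it is not Elasticsearch code. Dropping `null` entries when no `null_value` is configured is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of keyword synthetic source; not Elasticsearch code. */
final class KeywordSyntheticSourceSketch {

    /**
     * Replaces null entries with nullValue (or drops them when nullValue is null,
     * which is an assumption of this sketch), then sorts and removes duplicates.
     */
    static List<String> synthesize(List<String> values, String nullValue) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String value : values) {
            String effective = (value == null) ? nullValue : value;
            if (effective != null) {
                sorted.add(effective);
            }
        }
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        // Same input as the documented example: ["foo", null, "bar"] with null_value "NA".
        System.out.println(synthesize(Arrays.asList("foo", null, "bar"), "NA"));
    }
}
```

For plain ASCII strings, Java's natural `String` ordering places the uppercase `NA` before the lowercase values, which matches the order shown in the documented result.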

49 changes: 31 additions & 18 deletions docs/reference/mapping/types/text.asciidoc
@@ -168,15 +168,23 @@ be changed or removed in a future release. Elastic will work to fix
any issues, but features in technical preview are not subject to the support SLA
of official GA features.

`text` fields support <<synthetic-source,synthetic `_source`>> if they have
a <<keyword-synthetic-source, `keyword`>> sub-field that supports synthetic
`_source` or if the `text` field sets `store` to `true`. Either way, it may
not have <<copy-to,`copy_to`>>.

If using a sub-`keyword` field, then the values are sorted in the same way as
a `keyword` field's values are sorted. By default, that means sorted with
duplicates removed. So:
[source,console,id=synthetic-source-text-example-default]
`text` fields can use a <<keyword-synthetic-source, `keyword`>> sub-field to support
<<synthetic-source,synthetic `_source`>> without storing values of the text field itself.

In this case, the synthetic source of the `text` field will have the same <<synthetic-source-modifications,modifications>> as a `keyword` field.

These modifications can impact usage of `text` fields:
* Reordering text fields can have an effect on <<query-dsl-match-query-phrase,phrase>>
and <<span-queries,span>> queries. See the discussion about
<<position-increment-gap,`position_increment_gap`>> for more detail. You
can avoid this by making sure the `slop` parameter on the phrase queries
is lower than the `position_increment_gap`, which is the case by default.
* Handling of `null` values is different. `text` fields ignore `null` values,
but `keyword` fields support replacing `null` values with a value specified in the `null_value` parameter.
This replacement is represented in synthetic source.
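To make the `position_increment_gap` point concrete, here is a simplified model of how token positions are laid out across the values of a multi-valued `text` field. It assumes whitespace tokenization, non-empty values, and that the gap is added between consecutive values; the class and method names are hypothetical. The gap of 100 used in `main` is the default documented for `position_increment_gap`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of token positions in a multi-valued text field. */
final class PositionGapSketch {

    /** Returns {firstPosition, lastPosition} for each value, assuming whitespace tokens. */
    static List<int[]> tokenRanges(List<String> values, int positionIncrementGap) {
        List<int[]> ranges = new ArrayList<>();
        int position = -1; // position of the most recently emitted token
        for (String value : values) {
            int tokens = value.trim().split("\\s+").length;
            int first = position + 1;
            int last = first + tokens - 1;
            ranges.add(new int[] { first, last });
            // The gap is added after each value, before the next value's tokens.
            position = last + positionIncrementGap;
        }
        return ranges;
    }

    /** Distance between the last token of value `index` and the first token of the next value. */
    static int distanceBetweenValues(List<int[]> ranges, int index) {
        return ranges.get(index + 1)[0] - ranges.get(index)[1];
    }

    public static void main(String[] args) {
        List<String> original = List.of("the quick brown fox", "jumped over the lazy dog");
        List<String> reordered = List.of("jumped over the lazy dog", "the quick brown fox");
        System.out.println(distanceBetweenValues(tokenRanges(original, 100), 0));
        System.out.println(distanceBetweenValues(tokenRanges(reordered, 100), 0));
    }
}
```

In this model, tokens from different values are always separated by more than the gap, whatever the order of the values, so a phrase query whose `slop` is lower than the gap never depends on which values happen to be adjacent.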

For example:
[source,console,id=synthetic-source-text-example-multi-field]
----
PUT idx
{
@@ -194,8 +202,9 @@ PUT idx
"text": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
"kwd": {
"type": "keyword",
"null_value": "NA"
}
}
}
@@ -205,6 +214,7 @@ PUT idx
PUT idx/_doc/1
{
"text": [
null,
"the quick brown fox",
"the quick brown fox",
"jumped over the lazy dog"
@@ -218,19 +228,14 @@ Will become:
----
{
"text": [
"NA",
"jumped over the lazy dog",
"the quick brown fox"
]
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

NOTE: Reordering text fields can have an effect on <<query-dsl-match-query-phrase,phrase>>
and <<span-queries,span>> queries. See the discussion about
<<position-increment-gap,`position_increment_gap`>> for more detail. You
can avoid this by making sure the `slop` parameter on the phrase queries
is lower than the `position_increment_gap`. This is the default.

If the `text` field sets `store` to true then order and duplicates
are preserved.
[source,console,id=synthetic-source-text-example-stored]
@@ -248,7 +253,15 @@ PUT idx
},
"mappings": {
"properties": {
"text": { "type": "text", "store": true }
"text": {
"type": "text",
"store": true,
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
Original file line number Diff line number Diff line change
@@ -1075,6 +1075,13 @@ public Builder builder(BlockFactory factory, int expectedCount) {
* using whatever
*/
private BlockSourceReader.LeafIteratorLookup blockReaderDisiLookup(BlockLoaderContext blContext) {
if (isSyntheticSource && syntheticSourceDelegate != null) {
// Since we are using synthetic source and a delegate, we can't use this field
// to determine if the delegate has values in the document (e.g. handling of `null` is different
// between text and keyword).
return BlockSourceReader.lookupMatchingAll();
}

if (isIndexed()) {
if (getTextSearchInfo().hasNorms()) {
return BlockSourceReader.lookupFromNorms(name());
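The reason for the early return in this hunk can be illustrated with a toy model. The sketch below is not the Elasticsearch implementation; all names are hypothetical. It contrasts iterating only documents where the `text` field itself has a value (roughly what a norms-based lookup would see) with iterating all documents, for a document whose only value is `null` and whose `keyword` delegate has a `null_value`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical toy model of the lookup choice; not Elasticsearch code. */
final class DelegateLookupSketch {

    /** Doc ids where the text field itself has at least one non-null value. */
    static List<Integer> docsWithTextValue(List<List<String>> docs) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).stream().anyMatch(Objects::nonNull)) {
                ids.add(i);
            }
        }
        return ids;
    }

    /** Doc ids visited when every document is considered. */
    static List<Integer> allDocs(List<List<String>> docs) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            ids.add(i);
        }
        return ids;
    }

    /** Values the keyword delegate would hold: nulls replaced by nullValue, or dropped if none. */
    static List<String> delegateValues(List<String> values, String nullValue) {
        List<String> out = new ArrayList<>();
        for (String value : values) {
            String effective = (value == null) ? nullValue : value;
            if (effective != null) {
                out.add(effective);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(List.of("fox"), Collections.<String>singletonList(null));
        System.out.println(docsWithTextValue(docs));          // doc 1 is skipped
        System.out.println(allDocs(docs));                    // doc 1 is visited
        System.out.println(delegateValues(docs.get(1), "NA")); // the delegate still has a value
    }
}
```

In this model, document 1 has nothing in the `text` field but the delegate holds `NA`, so only the match-all iteration surfaces it.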
Original file line number Diff line number Diff line change
@@ -62,15 +62,8 @@ public static Object expectedValue(Map<String, Object> fieldMapping, Object valu
if (params.syntheticSource() && testContext.forceFallbackSyntheticSource() == false && usingSyntheticSourceDelegate) {
var nullValue = (String) keywordMultiFieldMapping.get("null_value");

// Due to how TextFieldMapper#blockReaderDisiLookup works this is complicated.
// If we are using lookupMatchingAll() then we'll see all docs, generate synthetic source using syntheticSourceDelegate,
// parse it and see null_value inside.
// But if we are using lookupFromNorms() we will skip the document (since the text field itself does not exist).
// Same goes for lookupFromFieldNames().
boolean textFieldIndexed = (boolean) fieldMapping.getOrDefault("index", true);

if (value == null) {
if (textFieldIndexed == false && nullValue != null && nullValue.length() <= (int) ignoreAbove) {
if (nullValue != null && nullValue.length() <= (int) ignoreAbove) {
return new BytesRef(nullValue);
}

@@ -82,12 +75,6 @@ public static Object expectedValue(Map<String, Object> fieldMapping, Object valu
}

var values = (List<String>) value;

// See note above about TextFieldMapper#blockReaderDisiLookup.
if (textFieldIndexed && values.stream().allMatch(Objects::isNull)) {
return null;
}

var indexed = values.stream()
.map(s -> s == null ? nullValue : s)
.filter(Objects::nonNull)
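The effect of dropping the `textFieldIndexed` checks in this test helper can be sketched as a before/after comparison. This is a hypothetical, self-contained model: it uses plain strings instead of `BytesRef`, returns an empty list where the helper returns no value, and applies `ignore_above` to every value, which is an assumption because the rest of the method is not shown in this hunk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical model of the test expectation; not the actual test helper. */
final class ExpectedValueSketch {

    /** Expectation after this change: nulls become nullValue regardless of the text field's index setting. */
    static List<String> expectedAfter(List<String> values, String nullValue, int ignoreAbove) {
        return values.stream()
            .map(s -> s == null ? nullValue : s)
            .filter(Objects::nonNull)
            // Assumption: values longer than ignore_above are not expected back.
            .filter(s -> s.length() <= ignoreAbove)
            .collect(Collectors.toList());
    }

    /** Expectation before this change: an indexed text field with only null values expected nothing. */
    static List<String> expectedBefore(List<String> values, String nullValue, int ignoreAbove, boolean textFieldIndexed) {
        if (textFieldIndexed && values.stream().allMatch(Objects::isNull)) {
            return List.of();
        }
        return expectedAfter(values, nullValue, ignoreAbove);
    }

    public static void main(String[] args) {
        List<String> onlyNull = Arrays.asList((String) null);
        System.out.println(expectedBefore(onlyNull, "NA", 10, true));
        System.out.println(expectedAfter(onlyNull, "NA", 10));
    }
}
```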