Commit 0ad1a4a

Default LogsDB value for ignore_dynamic_beyond_limit (#115921)
When ingesting logs, it is important that documents are not dropped because of mapping issues, including when fields are mapped dynamically. Elasticsearch provides two settings that control the total number of field mappings and what happens when that limit is exceeded:

1. **`index.mapping.total_fields.limit`**: The maximum number of fields allowed in an index. Once this limit is reached, mapping any further field causes indexing to fail.
2. **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: Whether Elasticsearch ignores dynamically mapped fields that would exceed the limit defined by `index.mapping.total_fields.limit`. If set to `false`, indexing fails once the limit is exceeded. If set to `true`, Elasticsearch still indexes the document but does not map the additional dynamically mapped fields beyond the limit; their names are reported in the document's `_ignored` field.

To prevent indexing failures caused by dynamic mappings, especially for logs, where the schema may change frequently, we change the default value of **`index.mapping.total_fields.ignore_dynamic_beyond_limit` from `false` to `true` in LogsDB**. With this change, documents are still indexed when the number of dynamically mapped fields exceeds the limit, and the additional fields are ignored instead of causing an indexing failure. This matters for LogsDB, where dynamically mapped fields are common and we want to prevent documents from being dropped.
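The behaviour of the two settings can be pictured with a small, self-contained toy model. This is only a sketch under simplifying assumptions: the class `TotalFieldsLimitSketch` and its methods are hypothetical names invented for illustration, it is not Elasticsearch code, and it counts every field name as exactly one field.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a mapping with a total-fields limit (hypothetical, not Elasticsearch code). */
final class TotalFieldsLimitSketch {
    private final int limit;
    private final boolean ignoreDynamicBeyondLimit;
    private final Set<String> mappedFields = new LinkedHashSet<>();

    TotalFieldsLimitSketch(int limit, boolean ignoreDynamicBeyondLimit, List<String> staticFields) {
        this.limit = limit;
        this.ignoreDynamicBeyondLimit = ignoreDynamicBeyondLimit;
        this.mappedFields.addAll(staticFields);
        // Statically mapped fields always count, so exceeding the limit here is an error.
        if (mappedFields.size() > limit) {
            throw new IllegalArgumentException("Limit of total fields [" + limit + "] has been exceeded");
        }
    }

    /** Indexes one document and returns the names of the fields that were ignored. */
    List<String> index(List<String> documentFields) {
        List<String> toMap = new ArrayList<>();
        List<String> ignored = new ArrayList<>();
        for (String field : documentFields) {
            if (mappedFields.contains(field) || toMap.contains(field)) {
                continue; // already mapped, nothing to add
            }
            if (mappedFields.size() + toMap.size() < limit) {
                toMap.add(field); // room left: map the field dynamically
            } else if (ignoreDynamicBeyondLimit) {
                ignored.add(field); // over the limit: keep the document, skip the field
            } else {
                // over the limit and not ignoring: reject the whole document
                throw new IllegalArgumentException("Limit of total fields [" + limit + "] has been exceeded");
            }
        }
        mappedFields.addAll(toMap);
        return ignored;
    }
}
```

In this model, a mapping created with a limit of 3, three static fields, and the flag set to `true` accepts a document with extra fields and returns the extra field names, while the same mapping with the flag set to `false` rejects the document.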
1 parent af02795 commit 0ad1a4a

5 files changed, +298 -2 lines changed


rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/20_synthetic_source.yml

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+---
 object with unmapped fields:
   - requires:
       cluster_features: ["mapper.ignored_source.dont_expand_dots"]

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/logsdb/10_settings.yml

Lines changed: 281 additions & 0 deletions
@@ -600,3 +600,284 @@ end time not allowed in logs mode:
   - match: { error.root_cause.0.type: "illegal_argument_exception" }
   - match: { error.type: "illegal_argument_exception" }
   - match: { error.reason: "[index.time_series.end_time] requires [index.mode=time_series]" }
+
+---
+ignore dynamic beyond limit logsdb default value:
+  - requires:
+      cluster_features: [ "mapper.logsdb_default_ignore_dynamic_beyond_limit" ]
+      reason: requires logsdb default value for `index.mapping.total_fields.ignore_dynamic_beyond_limit`
+
+  - do:
+      indices.create:
+        index: test-ignore-dynamic-default
+        body:
+          settings:
+            index:
+              mode: logsdb
+
+  - do:
+      indices.get_settings:
+        index: test-ignore-dynamic-default
+        include_defaults: true
+
+  - match: { test-ignore-dynamic-default.settings.index.mode: "logsdb" }
+  - match: { test-ignore-dynamic-default.defaults.index.mapping.total_fields.limit: "1000" }
+  - match: { test-ignore-dynamic-default.defaults.index.mapping.total_fields.ignore_dynamic_beyond_limit: "true" }
+
+---
+ignore dynamic beyond limit logsdb override value:
+  - requires:
+      cluster_features: [ "mapper.logsdb_default_ignore_dynamic_beyond_limit" ]
+      reason: requires logsdb default value for `index.mapping.total_fields.ignore_dynamic_beyond_limit`
+
+  - do:
+      indices.create:
+        index: test-ignore-dynamic-override
+        body:
+          settings:
+            index:
+              mode: logsdb
+              mapping:
+                total_fields:
+                  ignore_dynamic_beyond_limit: false
+
+  - do:
+      indices.get_settings:
+        index: test-ignore-dynamic-override
+
+  - match: { test-ignore-dynamic-override.settings.index.mode: "logsdb" }
+  - match: { test-ignore-dynamic-override.settings.index.mapping.total_fields.ignore_dynamic_beyond_limit: "false" }
+
+---
+logsdb with default ignore dynamic beyond limit and default sorting:
+  - requires:
+      cluster_features: ["mapper.logsdb_default_ignore_dynamic_beyond_limit"]
+      reason: requires default value for ignore_dynamic_beyond_limit
+
+  - do:
+      indices.create:
+        index: test-logsdb-default-sort
+        body:
+          settings:
+            index:
+              mode: logsdb
+              mapping:
+                # NOTE: When the index mode is set to `logsdb`, the `host.name` field is automatically injected if
+                # sort settings are not overridden.
+                # With `subobjects` set to `true` (default), this creates a `host` object field and a nested `name`
+                # keyword field (`host.name`).
+                #
+                # As a result, there are always at least 4 statically mapped fields (`@timestamp`, `host`, `host.name`
+                # and `name`). We cannot use a field limit lower than 4 because these fields are always present.
+                #
+                # Indeed, if `index.mapping.total_fields.ignore_dynamic_beyond_limit` is `true`, any dynamically
+                # mapped fields beyond the limit `index.mapping.total_fields.limit` are ignored, but the statically
+                # mapped fields are always counted.
+                total_fields:
+                  limit: 4
+          mappings:
+            properties:
+              "@timestamp":
+                type: date
+              name:
+                type: keyword
+
+  - do:
+      indices.get_settings:
+        index: test-logsdb-default-sort
+
+  - match: { test-logsdb-default-sort.settings.index.mode: "logsdb" }
+
+  - do:
+      bulk:
+        index: test-logsdb-default-sort
+        refresh: true
+        body:
+          - '{ "index": { } }'
+          - '{ "@timestamp": "2024-08-13T12:30:00Z", "name": "foo", "host.name": "92f4a67c", "value": 10, "message": "the quick brown fox", "region": "us-west", "pid": 153462 }'
+          - '{ "index": { } }'
+          - '{ "@timestamp": "2024-08-13T12:01:00Z", "name": "bar", "host.name": "24eea278", "value": 20, "message": "jumps over the lazy dog", "region": "us-central", "pid": 674972 }'
+  - match: { errors: false }
+
+  - do:
+      search:
+        index: test-logsdb-default-sort
+        body:
+          query:
+            match_all: {}
+
+  - match: { hits.total.value: 2 }
+  - match: { hits.hits.0._source.name: "bar" }
+  - match: { hits.hits.0._source.value: 20 }
+  - match: { hits.hits.0._source.message: "jumps over the lazy dog" }
+  - match: { hits.hits.0._ignored: [ "message", "pid", "region", "value" ] }
+  - match: { hits.hits.1._source.name: "foo" }
+  - match: { hits.hits.1._source.value: 10 }
+  - match: { hits.hits.1._source.message: "the quick brown fox" }
+  - match: { hits.hits.1._ignored: [ "message", "pid", "region", "value" ] }
+
+---
+logsdb with default ignore dynamic beyond limit and non-default sorting:
+  - requires:
+      cluster_features: ["mapper.logsdb_default_ignore_dynamic_beyond_limit"]
+      reason: requires default value for ignore_dynamic_beyond_limit
+
+  - do:
+      indices.create:
+        index: test-logsdb-non-default-sort
+        body:
+          settings:
+            index:
+              sort.field: [ "name" ]
+              sort.order: [ "desc" ]
+              mode: logsdb
+              mapping:
+                # NOTE: Here sort settings are overridden and we do not have any additional statically mapped field other
+                # than `name` and `timestamp`. As a result, there are only 2 statically mapped fields.
+                total_fields:
+                  limit: 2
+          mappings:
+            properties:
+              "@timestamp":
+                type: date
+              name:
+                type: keyword
+
+  - do:
+      indices.get_settings:
+        index: test-logsdb-non-default-sort
+
+  - match: { test-logsdb-non-default-sort.settings.index.mode: "logsdb" }
+
+  - do:
+      bulk:
+        index: test-logsdb-non-default-sort
+        refresh: true
+        body:
+          - '{ "index": { } }'
+          - '{ "@timestamp": "2024-08-13T12:30:00Z", "name": "foo", "host.name": "92f4a67c", "value": 10, "message": "the quick brown fox", "region": "us-west", "pid": 153462 }'
+          - '{ "index": { } }'
+          - '{ "@timestamp": "2024-08-13T12:01:00Z", "name": "bar", "host.name": "24eea278", "value": 20, "message": "jumps over the lazy dog", "region": "us-central", "pid": 674972 }'
+  - match: { errors: false }
+
+  - do:
+      search:
+        index: test-logsdb-non-default-sort
+        body:
+          query:
+            match_all: {}
+
+  - match: { hits.total.value: 2 }
+  - match: { hits.hits.0._source.name: "foo" }
+  - match: { hits.hits.0._source.value: 10 }
+  - match: { hits.hits.0._source.message: "the quick brown fox" }
+  - match: { hits.hits.0._ignored: [ "host", "message", "pid", "region", "value" ] }
+  - match: { hits.hits.1._source.name: "bar" }
+  - match: { hits.hits.1._source.value: 20 }
+  - match: { hits.hits.1._source.message: "jumps over the lazy dog" }
+  - match: { hits.hits.1._ignored: [ "host", "message", "pid", "region", "value" ] }
+
+---
+logsdb with default ignore dynamic beyond limit and too low limit:
+  - requires:
+      cluster_features: ["mapper.logsdb_default_ignore_dynamic_beyond_limit"]
+      reason: requires default value for ignore_dynamic_beyond_limit
+
+  - do:
+      catch: bad_request
+      indices.create:
+        index: test-logsdb-low-limit
+        body:
+          settings:
+            index:
+              mode: logsdb
+              mapping:
+                # NOTE: When the index mode is set to `logsdb`, the `host.name` field is automatically injected if
+                # sort settings are not overridden.
+                # With `subobjects` set to `true` (default), this creates a `host` object field and a nested `name`
+                # keyword field (`host.name`).
+                #
+                # As a result, there are always at least 4 statically mapped fields (`@timestamp`, `host`, `host.name`
+                # and `name`). We cannot use a field limit lower than 4 because these fields are always present.
+                #
+                # Indeed, if `index.mapping.total_fields.ignore_dynamic_beyond_limit` is `true`, any dynamically
+                # mapped fields beyond the limit `index.mapping.total_fields.limit` are ignored, but the statically
+                # mapped fields are always counted.
+                total_fields:
+                  limit: 3
+          mappings:
+            properties:
+              "@timestamp":
+                type: date
+              name:
+                type: keyword
+  - match: { error.type: "illegal_argument_exception" }
+  - match: { error.reason: "Limit of total fields [3] has been exceeded" }
+
+---
+logsdb with default ignore dynamic beyond limit and subobjects false:
+  - requires:
+      cluster_features: ["mapper.logsdb_default_ignore_dynamic_beyond_limit"]
+      reason: requires default value for ignore_dynamic_beyond_limit
+
+  - do:
+      indices.create:
+        index: test-logsdb-subobjects-false
+        body:
+          settings:
+            index:
+              mode: logsdb
+              mapping:
+                # NOTE: When the index mode is set to `logsdb`, the `host.name` field is automatically injected if
+                # sort settings are not overridden.
+                # With `subobjects` set to `false` anyway, a single `host.name` keyword field is automatically mapped.
+                #
+                # As a result, there are just 3 statically mapped fields (`@timestamp`, `host.name` and `name`).
+                # We cannot use a field limit lower than 3 because these fields are always present.
+                #
+                # Indeed, if `index.mapping.total_fields.ignore_dynamic_beyond_limit` is `true`, any dynamically
+                # mapped fields beyond the limit `index.mapping.total_fields.limit` are ignored, but the statically
+                # mapped fields are always counted.
+                total_fields:
+                  limit: 3
+          mappings:
+            subobjects: false
+            properties:
+              "@timestamp":
+                type: date
+              name:
+                type: keyword
+
+  - do:
+      indices.get_settings:
+        index: test-logsdb-subobjects-false
+
+  - match: { test-logsdb-subobjects-false.settings.index.mode: "logsdb" }
+
+  - do:
+      bulk:
+        index: test-logsdb-subobjects-false
+        refresh: true
+        body:
+          - '{ "index": { } }'
+          - '{ "@timestamp": "2024-08-13T12:30:00Z", "name": "foo", "host.name": "92f4a67c", "value": 10, "message": "the quick brown fox", "region": "us-west", "pid": 153462 }'
+          - '{ "index": { } }'
+          - '{ "@timestamp": "2024-08-13T12:01:00Z", "name": "bar", "host.name": "24eea278", "value": 20, "message": "jumps over the lazy dog", "region": "us-central", "pid": 674972 }'
+  - match: { errors: false }
+
+  - do:
+      search:
+        index: test-logsdb-subobjects-false
+        body:
+          query:
+            match_all: {}
+
+  - match: { hits.total.value: 2 }
+  - match: { hits.hits.0._source.name: "bar" }
+  - match: { hits.hits.0._source.value: 20 }
+  - match: { hits.hits.0._source.message: "jumps over the lazy dog" }
+  - match: { hits.hits.0._ignored: [ "message", "pid", "region", "value" ] }
+  - match: { hits.hits.1._source.name: "foo" }
+  - match: { hits.hits.1._source.value: 10 }
+  - match: { hits.hits.1._source.message: "the quick brown fox" }
+  - match: { hits.hits.1._ignored: [ "message", "pid", "region", "value" ] }
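The test comments above explain why the minimum usable limit is 4 with `subobjects: true` but 3 with `subobjects: false`. The counting rule they describe can be sketched as follows. This is an illustration only: `StaticFieldCountSketch` and `mappedFields` are hypothetical names, and the model assumes that, with subobjects enabled, every dotted prefix of a field path becomes an object field that counts toward the limit.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy counter for statically mapped fields (hypothetical, not Elasticsearch code). */
final class StaticFieldCountSketch {

    /**
     * Returns every field that counts toward the limit for the given leaf paths.
     * With subobjects enabled, "host.name" contributes both "host" and "host.name";
     * with subobjects disabled, it contributes only the single field "host.name".
     */
    static Set<String> mappedFields(List<String> leafPaths, boolean subobjects) {
        Set<String> fields = new TreeSet<>();
        for (String path : leafPaths) {
            fields.add(path);
            if (subobjects) {
                // Add each parent object implied by a dot in the path.
                for (int dot = path.indexOf('.'); dot >= 0; dot = path.indexOf('.', dot + 1)) {
                    fields.add(path.substring(0, dot));
                }
            }
        }
        return fields;
    }
}
```

Under this assumption, the leaf paths `@timestamp`, `name`, and `host.name` yield four counted fields with subobjects enabled and three with subobjects disabled, which matches the limits used in the tests.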

server/src/main/java/org/elasticsearch/index/IndexVersions.java

Lines changed: 1 addition & 0 deletions
@@ -119,6 +119,7 @@ private static IndexVersion def(int id, Version luceneVersion) {
     public static final IndexVersion UPGRADE_TO_LUCENE_9_12 = def(8_516_00_0, Version.LUCENE_9_12_0);
     public static final IndexVersion ENABLE_IGNORE_ABOVE_LOGSDB = def(8_517_00_0, Version.LUCENE_9_12_0);
     public static final IndexVersion ADD_ROLE_MAPPING_CLEANUP_MIGRATION = def(8_518_00_0, Version.LUCENE_9_12_0);
+    public static final IndexVersion LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT_BACKPORT = def(8_519_00_0, Version.LUCENE_9_12_0);
     /*
      * STOP! READ THIS FIRST! No, really,
      * ____ _____ ___ ____ _ ____ _____ _ ____ _____ _ _ ___ ____ _____ ___ ____ ____ _____ _

server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java

Lines changed: 2 additions & 1 deletion
@@ -57,7 +57,8 @@ public Set<NodeFeature> getTestFeatures() {
             RangeFieldMapper.DATE_RANGE_INDEXING_FIX,
             IgnoredSourceFieldMapper.DONT_EXPAND_DOTS_IN_IGNORED_SOURCE,
             SourceFieldMapper.REMOVE_SYNTHETIC_SOURCE_ONLY_VALIDATION,
-            IgnoredSourceFieldMapper.ALWAYS_STORE_OBJECT_ARRAYS_IN_NESTED_OBJECTS
+            IgnoredSourceFieldMapper.ALWAYS_STORE_OBJECT_ARRAYS_IN_NESTED_OBJECTS,
+            MapperService.LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT
         );
     }
 }

server/src/main/java/org/elasticsearch/index/mapper/MapperService.java

Lines changed: 13 additions & 1 deletion
@@ -23,9 +23,12 @@
 import org.elasticsearch.common.xcontent.XContentHelper;
 import org.elasticsearch.core.Nullable;
 import org.elasticsearch.core.UpdateForV9;
+import org.elasticsearch.features.NodeFeature;
 import org.elasticsearch.index.AbstractIndexComponent;
+import org.elasticsearch.index.IndexMode;
 import org.elasticsearch.index.IndexSettings;
 import org.elasticsearch.index.IndexVersion;
+import org.elasticsearch.index.IndexVersions;
 import org.elasticsearch.index.analysis.AnalysisRegistry;
 import org.elasticsearch.index.analysis.IndexAnalyzers;
 import org.elasticsearch.index.analysis.NamedAnalyzer;
@@ -122,9 +125,18 @@ public boolean isAutoUpdate() {
         Property.IndexScope,
         Property.ServerlessPublic
     );
+
+    public static final NodeFeature LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT = new NodeFeature(
+        "mapper.logsdb_default_ignore_dynamic_beyond_limit"
+    );
     public static final Setting<Boolean> INDEX_MAPPING_IGNORE_DYNAMIC_BEYOND_LIMIT_SETTING = Setting.boolSetting(
         "index.mapping.total_fields.ignore_dynamic_beyond_limit",
-        false,
+        settings -> {
+            boolean isLogsDBIndexMode = IndexSettings.MODE.get(settings) == IndexMode.LOGSDB;
+            boolean isNewIndexVersion = IndexMetadata.SETTING_INDEX_VERSION_CREATED.get(settings)
+                .onOrAfter(IndexVersions.LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT_BACKPORT);
+            return String.valueOf(isLogsDBIndexMode && isNewIndexVersion);
+        },
         Property.Dynamic,
         Property.IndexScope
     );
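The default-value function in the hunk above can be restated without any Elasticsearch types to make the decision explicit. The following is a minimal sketch, not the actual implementation: `IgnoreDynamicDefaultSketch`, `defaultValue`, and `resolve` are hypothetical names, the index mode is modelled as a plain string with an assumed fallback of `"standard"`, and the index version is modelled as the integer id `8_519_00_0` taken from the `IndexVersions` change in this commit.

```java
import java.util.Map;

/** Standalone restatement of the new default (hypothetical, not Elasticsearch code). */
final class IgnoreDynamicDefaultSketch {
    // Id of LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT_BACKPORT as defined in this commit.
    static final int LOGSDB_DEFAULT_VERSION_ID = 8_519_00_0;
    static final String SETTING_KEY = "index.mapping.total_fields.ignore_dynamic_beyond_limit";

    /** The default is true only for LogsDB indices created on or after the new index version. */
    static boolean defaultValue(String indexMode, int indexVersionCreatedId) {
        boolean isLogsDbIndexMode = "logsdb".equals(indexMode);
        boolean isNewIndexVersion = indexVersionCreatedId >= LOGSDB_DEFAULT_VERSION_ID;
        return isLogsDbIndexMode && isNewIndexVersion;
    }

    /** An explicitly configured value always wins over the computed default. */
    static boolean resolve(Map<String, String> settings, int indexVersionCreatedId) {
        String explicit = settings.get(SETTING_KEY);
        if (explicit != null) {
            return Boolean.parseBoolean(explicit);
        }
        // Assumption for this sketch: an index without index.mode is in "standard" mode.
        return defaultValue(settings.getOrDefault("index.mode", "standard"), indexVersionCreatedId);
    }
}
```

Tying the default to the index version created means that, in this model, a LogsDB index created before the new version keeps the old default of `false`, and an explicit `false` on a new LogsDB index still takes effect, as the override test in `10_settings.yml` checks.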
