Skip to content

Conversation

parkertimmins
Copy link
Contributor

@parkertimmins parkertimmins commented Aug 4, 2025

Add tests to the logsdb randomized tests which use multi-fields for each of the text, keyword, match_only_text, and wildcard types. For each type allow each other type of multi-field.

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.2.0 labels Aug 4, 2025
@parkertimmins
Copy link
Contributor Author

The updates in the PR should have caught the bug fixed in #131314 and #131383. To verify that this is the case, used the patch below which reverts the relevant code to the pre-fix state.

Running:
./gradlew :x-pack:plugin:logsdb:javaRestTest --tests "org.elasticsearch.xpack.logsdb.qa.BulkChallengeRestIT.testRandomQueries" -Dtests.iters=100 with the old code from the patch usually results in a few failures. Some with the expected messages, like Cannot invoke "java.util.List.iterator()" because "values" is null

Index: server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java b/server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
--- a/server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java	(revision 560d937fb913742ebfa548f6fbad0f6cae93d0a7)
+++ b/server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java	(date 1754276716082)
@@ -547,7 +547,6 @@
         private final IndexMode indexMode;
         private final IndexSortConfig indexSortConfig;
         private final boolean hasDocValuesSkipper;
-        private final String originalName;
 
         public KeywordFieldType(
             String name,
@@ -576,7 +575,6 @@
             this.indexMode = builder.indexMode;
             this.indexSortConfig = builder.indexSortConfig;
             this.hasDocValuesSkipper = DocValuesSkipIndexType.NONE.equals(fieldType.docValuesSkipIndexType()) == false;
-            this.originalName = isSyntheticSource ? name + "._original" : null;
         }
 
         public KeywordFieldType(String name, boolean isIndexed, boolean hasDocValues, Map<String, String> meta) {
@@ -591,7 +589,6 @@
             this.indexMode = IndexMode.STANDARD;
             this.indexSortConfig = null;
             this.hasDocValuesSkipper = false;
-            this.originalName = null;
         }
 
         public KeywordFieldType(String name) {
@@ -617,7 +614,6 @@
             this.indexMode = IndexMode.STANDARD;
             this.indexSortConfig = null;
             this.hasDocValuesSkipper = DocValuesSkipIndexType.NONE.equals(fieldType.docValuesSkipIndexType()) == false;
-            this.originalName = null;
         }
 
         public KeywordFieldType(String name, NamedAnalyzer analyzer) {
@@ -632,7 +628,6 @@
             this.indexMode = IndexMode.STANDARD;
             this.indexSortConfig = null;
             this.hasDocValuesSkipper = false;
-            this.originalName = null;
         }
 
         @Override
@@ -1096,15 +1091,6 @@
         ) {
             return new AutomatonQueryWithDescription(new Term(name()), automatonSupplier.get(), description);
         }
-
-        /**
-         * The name used to store "original" that have been ignored
-         * by {@link KeywordFieldType#ignoreAbove()} so that they can be rebuilt
-         * for synthetic source.
-         */
-        public String originalName() {
-            return originalName;
-        }
     }
 
     private final boolean indexed;
@@ -1158,7 +1144,7 @@
         this.forceDocValuesSkipper = builder.forceDocValuesSkipper;
         this.offsetsFieldName = offsetsFieldName;
         this.indexSourceKeepMode = indexSourceKeepMode;
-        this.originalName = mappedFieldType.originalName();
+        this.originalName = isSyntheticSource ? fullPath() + "._original" : null;
     }
 
     @Override
@@ -1218,7 +1204,7 @@
                 // Save a copy of the field so synthetic source can load it
                 var utfBytes = value.bytes();
                 var bytesRef = new BytesRef(utfBytes.bytes(), utfBytes.offset(), utfBytes.length());
-                context.doc().add(new StoredField(originalName, bytesRef));
+                context.doc().add(new StoredField(originalName(), bytesRef));
             }
             return false;
         }
@@ -1334,6 +1320,15 @@
         return normalizerName != null;
     }
 
+    /**
+     * The name used to store "original" that have been ignored
+     * by {@link KeywordFieldType#ignoreAbove()} so that they can be rebuilt
+     * for synthetic source.
+     */
+    private String originalName() {
+        return originalName;
+    }
+
     @Override
     protected SyntheticSourceSupport syntheticSourceSupport() {
         if (hasNormalizer()) {
@@ -1382,7 +1377,7 @@
         }
 
         if (fieldType().ignoreAbove != Integer.MAX_VALUE) {
-            layers.add(new CompositeSyntheticFieldLoader.StoredFieldLayer(originalName) {
+            layers.add(new CompositeSyntheticFieldLoader.StoredFieldLayer(originalName()) {
                 @Override
                 protected void writeValue(Object value, XContentBuilder b) throws IOException {
                     BytesRef ref = (BytesRef) value;
Index: modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java b/modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java
--- a/modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java	(revision 560d937fb913742ebfa548f6fbad0f6cae93d0a7)
+++ b/modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java	(date 1754276716081)
@@ -47,7 +47,6 @@
 import org.elasticsearch.index.mapper.BlockStoredFieldsReader;
 import org.elasticsearch.index.mapper.DocumentParserContext;
 import org.elasticsearch.index.mapper.FieldMapper;
-import org.elasticsearch.index.mapper.KeywordFieldMapper;
 import org.elasticsearch.index.mapper.MappedFieldType;
 import org.elasticsearch.index.mapper.MapperBuilderContext;
 import org.elasticsearch.index.mapper.SourceValueFetcher;
@@ -252,17 +251,6 @@
             if (searchExecutionContext.isSourceSynthetic() && withinMultiField) {
                 String parentField = searchExecutionContext.parentPath(name());
                 var parent = searchExecutionContext.lookup().fieldType(parentField);
-
-                if (parent instanceof KeywordFieldMapper.KeywordFieldType keywordParent
-                    && keywordParent.ignoreAbove() != Integer.MAX_VALUE) {
-                    if (parent.isStored()) {
-                        return storedFieldFetcher(parentField, keywordParent.originalName());
-                    } else if (parent.hasDocValues()) {
-                        var ifd = searchExecutionContext.getForField(parent, MappedFieldType.FielddataOperation.SEARCH);
-                        return combineFieldFetchers(docValuesFieldFetcher(ifd), storedFieldFetcher(keywordParent.originalName()));
-                    }
-                }
-
                 if (parent.isStored()) {
                     return storedFieldFetcher(parentField);
                 } else if (parent.hasDocValues()) {
@@ -274,19 +262,8 @@
             } else if (searchExecutionContext.isSourceSynthetic() && hasCompatibleMultiFields) {
                 var mapper = (MatchOnlyTextFieldMapper) searchExecutionContext.getMappingLookup().getMapper(name());
                 var kwd = TextFieldMapper.SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(mapper);
-
                 if (kwd != null) {
                     var fieldType = kwd.fieldType();
-
-                    if (fieldType.ignoreAbove() != Integer.MAX_VALUE) {
-                        if (fieldType.isStored()) {
-                            return storedFieldFetcher(fieldType.name(), fieldType.originalName());
-                        } else if (fieldType.hasDocValues()) {
-                            var ifd = searchExecutionContext.getForField(fieldType, MappedFieldType.FielddataOperation.SEARCH);
-                            return combineFieldFetchers(docValuesFieldFetcher(ifd), storedFieldFetcher(fieldType.originalName()));
-                        }
-                    }
-
                     if (fieldType.isStored()) {
                         return storedFieldFetcher(fieldType.name());
                     } else if (fieldType.hasDocValues()) {
@@ -335,52 +312,13 @@
             };
         }
 
-        private static IOFunction<LeafReaderContext, CheckedIntFunction<List<Object>, IOException>> storedFieldFetcher(String... names) {
-            var loader = StoredFieldLoader.create(false, Set.of(names));
+        private static IOFunction<LeafReaderContext, CheckedIntFunction<List<Object>, IOException>> storedFieldFetcher(String name) {
+            var loader = StoredFieldLoader.create(false, Set.of(name));
             return context -> {
                 var leafLoader = loader.getLoader(context, null);
                 return docId -> {
                     leafLoader.advanceTo(docId);
-                    var storedFields = leafLoader.storedFields();
-                    if (names.length == 1) {
-                        return storedFields.get(names[0]);
-                    }
-
-                    List<Object> values = new ArrayList<>();
-                    for (var name : names) {
-                        var currValues = storedFields.get(name);
-                        if (currValues != null) {
-                            values.addAll(currValues);
-                        }
-                    }
-
-                    return values;
-                };
-            };
-        }
-
-        private static IOFunction<LeafReaderContext, CheckedIntFunction<List<Object>, IOException>> combineFieldFetchers(
-            IOFunction<LeafReaderContext, CheckedIntFunction<List<Object>, IOException>> primaryFetcher,
-            IOFunction<LeafReaderContext, CheckedIntFunction<List<Object>, IOException>> secondaryFetcher
-        ) {
-            return context -> {
-                var primaryGetter = primaryFetcher.apply(context);
-                var secondaryGetter = secondaryFetcher.apply(context);
-                return docId -> {
-                    List<Object> values = new ArrayList<>();
-                    var primary = primaryGetter.apply(docId);
-                    if (primary != null) {
-                        values.addAll(primary);
-                    }
-
-                    var secondary = secondaryGetter.apply(docId);
-                    if (secondary != null) {
-                        values.addAll(secondary);
-                    }
-
-                    assert primary != null || secondary != null;
-
-                    return values;
+                    return leafLoader.storedFields().get(name);
                 };
             };
         }

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Aug 4, 2025
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work - LGTM 👍

I think the >test label is more appropriate here?

@parkertimmins parkertimmins added >test Issues or PRs that are addressing/adding tests and removed >non-issue labels Aug 4, 2025
Ideally, wildcard would be tested as a subfield. But data generation
code is used in server, and wildcard type is in xpack. With some better
wiring this could be fixed, but that will have to wait for the future.
@parkertimmins parkertimmins requested a review from lkts August 12, 2025 19:44
boolean fullyDynamicMapping,
List<PredefinedField> predefinedFields
List<PredefinedField> predefinedFields,
boolean includePluginTypes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lkts I added this because BlockLoaderTestCase tests were failing because I was adding wildcard and match_only_text multi-fields. Since these types are defined in plugins they cannot (I think) be added to BlockLoader tests in test. But I'm not sure this belongs up in the main specification. Since the data generation code provides lots of knobs for overriding behavior, I'm guessing there's a better way to do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping. We already use geo_shape and shape in DataGenerationHelper that are defined in plugins. Can you replicate that?

This is the configuration in DataGenerationHelper and you need to add a Gradle dependency on the plugin.

.withDataSourceHandlers(List.of(new GeoShapeDataSourceHandler(), new ShapeDataSourceHandler()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep match_only_text in DefaultMappingParametersHandler. counted_keyword and wildcard types are from xpack already and don't pose a problem. I instead tried just moving the creation of multi-fields to it's own handler which allows the choice of multi-fields used to the code that adds the handler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you can implement it in the "main" module it's fine. geo_shape is special because it relies on test helpers defined inside geo module.

Copy link
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion to handle includePluginTypesInMultiFields in a cleaner way.

boolean fullyDynamicMapping,
List<PredefinedField> predefinedFields
List<PredefinedField> predefinedFields,
boolean includePluginTypes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping. We already use geo_shape and shape in DataGenerationHelper that are defined in plugins. Can you replicate that?

This is the configuration in DataGenerationHelper and you need to add a Gradle dependency on the plugin.

.withDataSourceHandlers(List.of(new GeoShapeDataSourceHandler(), new ShapeDataSourceHandler()))

};
}

private Map<String, Object> stringSubField(DataSourceRequest.LeafMappingParametersGenerator request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would have to do some adjustments here to handle plugins. Instead of directly calling matchOnlyTextMapping(..) you need to get it from DataSource. Take a look at TextFieldWithParentBlockLoaderTests.

@parkertimmins parkertimmins requested a review from lkts August 14, 2025 14:09

// Need to delegate creation of the same type of field to other handlers. So skip request
// if it's for the placeholder name used when creating the child and parent fields.
if (request.fieldName().equals(PLACEHOLDER_NAME)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's easy to implement a reserved prefix, you need to update generateFieldName() in DefaultObjectGenerationHandler and that's it i believe. Should we do that?

private static Map<String, Object> getChildMappingForType(FieldType type, DataSourceRequest.LeafMappingParametersGenerator request) {
Map<String, Object> mapping = getMappingForType(type, request);
if (type == FieldType.KEYWORD) {
mapping.remove("copy_to");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't remember why this is here but i don't see a reason why this should be done only for keywords.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I guess it should probably be done for other types too 🤔

@parkertimmins parkertimmins merged commit 8ca0947 into elastic:main Aug 14, 2025
33 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 15, 2025
* upstream/main: (32 commits)
  Speed up loading keyword fields with index sorts (elastic#132950)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testSyntheticSourceWithTranslogSnapshot elastic#132964
  Simplify EsqlSession (elastic#132848)
  Implement WriteLoadConstraintDecider#canAllocate (elastic#132041)
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/400_synthetic_source/_doc_count} elastic#132965
  Switch to PR-based benchmark pipeline defined in ES repo (elastic#132941)
  Breakdown undesired allocations by shard routing role (elastic#132235)
  Implement v_magnitude function (elastic#132765)
  Introduce execution location marker for better handling of remote/local compatibility (elastic#132205)
  Mute org.elasticsearch.cluster.ClusterInfoServiceIT testMaxQueueLatenciesInClusterInfo elastic#132957
  Unmuting simulate index data stream mapping overrides yaml rest test (elastic#132946)
  Remove CrossClusterCancellationIT.createLocalIndex() (elastic#132952)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetch elastic#132956
  Fix failing UT by adding a required capability (elastic#132947)
  Precompute the BitsetCacheKey hashCode (elastic#132875)
  Adding simulate ingest effective mapping (elastic#132833)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetchMany elastic#132948
  Rename skipping logic to remove hard link to skip_unavailable (elastic#132861)
  Store ignored source in unique stored fields per entry (elastic#132142)
  Add random tests with match_only_text multi-field (elastic#132380)
  ...
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Aug 15, 2025
Add tests to the logsdb randomized tests which use multi-fields for each of the text, keyword, match_only_text, and wildcard types. For each type allow each other type of multi-field.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:StorageEngine/Mapping The storage related side of mappings Team:StorageEngine >test Issues or PRs that are addressing/adding tests v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants