Add random tests with match_only_text multi-field #132380
The updates in this PR should have caught the bugs fixed in #131314 and #131383. To verify this, I used the patch below, which reverts the relevant code to its pre-fix state. Running:
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Nice work - LGTM 👍
I think the >test label is more appropriate here?
Ideally, wildcard would be tested as a subfield. However, the data generation code is used in server, while the wildcard type lives in xpack. Better wiring could fix this, but that will have to wait.
boolean fullyDynamicMapping,
List<PredefinedField> predefinedFields
List<PredefinedField> predefinedFields,
boolean includePluginTypes
@lkts I added this because BlockLoaderTestCase tests were failing when I added wildcard and match_only_text multi-fields. Since these types are defined in plugins, I don't think they can be added to BlockLoader tests in test. But I'm not sure this belongs in the main specification. Since the data generation code provides lots of knobs for overriding behavior, I'm guessing there's a better way to do this.
Thanks for the ping. DataGenerationHelper already uses geo_shape and shape, which are defined in plugins. Can you replicate that?
This is the configuration in DataGenerationHelper; you also need to add a Gradle dependency on the plugin.
.withDataSourceHandlers(List.of(new GeoShapeDataSourceHandler(), new ShapeDataSourceHandler()))
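To make the registration pattern in the configuration line above concrete, here is a minimal, self-contained sketch of a handler chain in which caller-registered handlers are consulted before the built-in ones. Every name in it (HandlerChainSketch, MappingHandler, DefaultHandler, PluginTypeHandler, and this simplified withDataSourceHandlers) is hypothetical and only mirrors the idea; it is not the Elasticsearch data generation API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: a chain of mapping handlers where caller-registered handlers
// (e.g. for plugin-defined types) are consulted before the built-in defaults.
class HandlerChainSketch {

    /** Returns a mapping for the requested type, or empty to let the next handler try. */
    interface MappingHandler {
        Optional<Map<String, Object>> mappingFor(String fieldType);
    }

    /** Built-in handler: knows only types assumed to be available without extra plugins. */
    static final class DefaultHandler implements MappingHandler {
        private static final Set<String> BUILT_IN = Set.of("keyword", "text", "match_only_text");

        @Override
        public Optional<Map<String, Object>> mappingFor(String fieldType) {
            if (BUILT_IN.contains(fieldType)) {
                return Optional.of(Map.of("type", fieldType));
            }
            return Optional.empty();
        }
    }

    /** Stand-in for a handler contributed alongside a plugin-defined type such as wildcard. */
    static final class PluginTypeHandler implements MappingHandler {
        private final String type;

        PluginTypeHandler(String type) {
            this.type = type;
        }

        @Override
        public Optional<Map<String, Object>> mappingFor(String fieldType) {
            if (type.equals(fieldType)) {
                return Optional.of(Map.of("type", type));
            }
            return Optional.empty();
        }
    }

    private final List<MappingHandler> handlers = new ArrayList<>();

    HandlerChainSketch() {
        handlers.add(new DefaultHandler());
    }

    /** Registers extra handlers ahead of the defaults (simplified, hypothetical signature). */
    HandlerChainSketch withDataSourceHandlers(List<? extends MappingHandler> extra) {
        handlers.addAll(0, extra);
        return this;
    }

    /** Asks each handler in order; the first non-empty answer wins. */
    Map<String, Object> mappingFor(String fieldType) {
        for (MappingHandler handler : handlers) {
            Optional<Map<String, Object>> mapping = handler.mappingFor(fieldType);
            if (mapping.isPresent()) {
                return mapping.get();
            }
        }
        throw new IllegalArgumentException("no handler for field type [" + fieldType + "]");
    }
}
```

With this shape, only a test that has the plugin on its classpath registers the plugin's handler, and code that needs a subfield mapping asks the chain instead of calling a type-specific helper directly.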
I think it makes sense to keep match_only_text in DefaultMappingParametersHandler. The counted_keyword and wildcard types are already from xpack and don't pose a problem. Instead, I moved the creation of multi-fields into its own handler, which leaves the choice of multi-field types to the code that adds the handler.
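A rough sketch of the approach described above, under stated assumptions: multi-field creation lives in its own handler, and whoever registers that handler passes in the subfield types it can support. MultiFieldHandlerSketch, addMultiField, and the fixed subfield name "subfield" are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: multi-field creation lives in its own handler, and the code that
// registers the handler decides which subfield types may be generated.
class MultiFieldHandlerSketch {
    private final List<String> allowedSubfieldTypes;

    MultiFieldHandlerSketch(List<String> allowedSubfieldTypes) {
        this.allowedSubfieldTypes = List.copyOf(allowedSubfieldTypes);
    }

    /**
     * Returns a copy of the parent mapping with a "fields" block holding one subfield of a
     * randomly chosen allowed type, or the parent mapping unchanged if no types are allowed.
     */
    Map<String, Object> addMultiField(Map<String, Object> parentMapping, Random random) {
        if (allowedSubfieldTypes.isEmpty()) {
            return parentMapping;
        }
        String subfieldType = allowedSubfieldTypes.get(random.nextInt(allowedSubfieldTypes.size()));
        Map<String, Object> mapping = new LinkedHashMap<>(parentMapping);
        mapping.put("fields", Map.of("subfield", Map.of("type", subfieldType)));
        return mapping;
    }
}
```

A test that cannot load plugin-defined types would construct the handler with only the types available to it, while the logsdb randomized tests could pass the full list; no boolean flag on the main specification is needed.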
Yes, if you can implement it in the "main" module, that's fine. geo_shape is special because it relies on test helpers defined inside the geo module.
I have a suggestion to handle includePluginTypesInMultiFields in a cleaner way.
};
}

private Map<String, Object> stringSubField(DataSourceRequest.LeafMappingParametersGenerator request) {
You would need to make some adjustments here to handle plugins. Instead of calling matchOnlyTextMapping(..) directly, you need to get the mapping from DataSource. Take a look at TextFieldWithParentBlockLoaderTests.
// Need to delegate creation of the same type of field to other handlers. So skip request
// if it's for the placeholder name used when creating the child and parent fields.
if (request.fieldName().equals(PLACEHOLDER_NAME)) {
I think it's easy to implement a reserved prefix: you just need to update generateFieldName() in DefaultObjectGenerationHandler, and I believe that's it. Should we do that?
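One way to read the reserved-prefix suggestion, sketched under assumptions: the random name generator never returns a name starting with the prefix, so a handler can recognize internally created fields with a prefix check instead of comparing against a single PLACEHOLDER_NAME. The prefix value and every name below are hypothetical; this is not the actual DefaultObjectGenerationHandler code.

```java
import java.util.Random;

// Hypothetical sketch of a reserved field-name prefix: random names never use it, so any
// name that does must have been created internally (e.g. a multi-field placeholder).
class ReservedPrefixSketch {
    static final String RESERVED_PREFIX = "__reserved_"; // hypothetical value

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz_";

    /** Generates a random field name, retrying in the rare case it starts with the prefix. */
    static String generateFieldName(Random random) {
        while (true) {
            String candidate = randomName(random);
            if (candidate.startsWith(RESERVED_PREFIX) == false) {
                return candidate;
            }
        }
    }

    /** Handlers would skip requests for reserved names rather than matching one placeholder. */
    static boolean isReserved(String fieldName) {
        return fieldName.startsWith(RESERVED_PREFIX);
    }

    private static String randomName(Random random) {
        int length = 1 + random.nextInt(20);
        StringBuilder name = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            name.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return name.toString();
    }
}
```

The design choice is that the guarantee lives in one place (name generation), so any number of internal fields can share the prefix without each handler knowing their exact names.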
private static Map<String, Object> getChildMappingForType(FieldType type, DataSourceRequest.LeafMappingParametersGenerator request) {
    Map<String, Object> mapping = getMappingForType(type, request);
    if (type == FieldType.KEYWORD) {
        mapping.remove("copy_to");
I honestly don't remember why this is here, but I don't see a reason why it should be done only for keywords.
Oh yeah, I guess it should probably be done for other types too 🤔
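If the copy_to removal is applied to every child type rather than only keyword, the helper reduces to something like this hypothetical sketch. ChildMappingSketch and childMapping are illustrative names; unlike the snippet above, it copies the leaf mapping first so the caller's map is left untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: drop copy_to from every multi-field child mapping,
// not only from keyword children.
class ChildMappingSketch {
    /** Returns a copy of the leaf mapping without copy_to; the input map is not modified. */
    static Map<String, Object> childMapping(Map<String, Object> leafMapping) {
        Map<String, Object> mapping = new HashMap<>(leafMapping);
        mapping.remove("copy_to");
        return mapping;
    }
}
```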
Add tests to the logsdb randomized tests that use multi-fields for each of the text, keyword, match_only_text, and wildcard types. For each type, allow each of the other types as a multi-field.
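For a quick picture of the combinations the description covers, this hypothetical sketch enumerates every parent type with each of the other types as a single multi-field. It assumes "each other type" excludes the parent's own type, and it lists all pairs deterministically, whereas the actual randomized tests choose combinations at random. MultiFieldMatrixSketch and the subfield name "subfield" are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the test matrix: each parent type paired with each other type
// as a single multi-field named "subfield".
class MultiFieldMatrixSketch {
    static final List<String> TYPES = List.of("text", "keyword", "match_only_text", "wildcard");

    static List<Map<String, Object>> allMappings() {
        List<Map<String, Object>> result = new ArrayList<>();
        for (String parent : TYPES) {
            for (String child : TYPES) {
                if (parent.equals(child)) {
                    continue; // assumption: the parent's own type is not used as its multi-field
                }
                result.add(Map.of("type", parent, "fields", Map.of("subfield", Map.of("type", child))));
            }
        }
        return result;
    }
}
```

With four types this yields 4 × 3 = 12 parent/child mappings.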