Add sub-field support to flattened field type #144451
parkertimmins merged 44 commits into elastic:main
Allow specific keys within a flattened field to be mapped as typed sub-fields (keyword, ip, etc.) via a new "properties" mapping attribute. Mapped keys are indexed exclusively through their sub-field mapper and excluded from the flattened field's root/keyed representation. Made-with: Cursor
Tighten field count assertions to exact values, add test for mapped properties matched via nested object notation, simplify serialization roundtrip test, and fix minor doc formatting. Made-with: Cursor
Made-with: Cursor
# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldMapper.java
# server/src/main/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldParser.java
# server/src/test/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldParserTests.java
Add search tests covering term queries, sorting, aggregations, doc value field loading, and numeric range queries on mapped sub-fields within flattened fields. Made-with: Cursor
Extend FlattenedDocValuesSyntheticFieldLoader to compose mapped property loaders alongside the flattened field's own keyed doc values, so mapped properties are included in synthetic source. Made-with: Cursor
Register mapper.flattened.mapped_properties cluster feature and add REST tests covering queries, sorting, aggregations, synthetic source, and mapping serialization for mapped properties. Made-with: Cursor
Reset docValues to NO_VALUES in the binary doc values branch of FlattenedDocValuesSyntheticFieldLoader when no binary DVs exist for a segment, preventing stale state from a prior segment. Forward null tokens to mapped property mappers in FlattenedFieldParser so sub-field null_value handling works independently of the parent flattened field's null_value. Made-with: Cursor
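The stale-state problem described in this commit can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not the code in FlattenedDocValuesSyntheticFieldLoader; the sketch only assumes that a loader keeps a per-segment reference and that a segment may have no doc values at all.

```java
import java.util.List;

// Hypothetical stand-in for a loader that is reused across segments.
final class PerSegmentStateSketch {
    // Sentinel meaning "this segment has nothing to load".
    static final List<String> NO_VALUES = List.of();

    private List<String> docValues = NO_VALUES;

    // Called once per segment; null models a segment without doc values.
    void advanceToSegment(List<String> segmentValues) {
        if (segmentValues == null) {
            // Without this reset, the values of the previous segment would still be visible.
            docValues = NO_VALUES;
            return;
        }
        docValues = segmentValues;
    }

    boolean hasValue() {
        return !docValues.isEmpty();
    }
}
```

If the reset in the null branch is dropped, a loader that first visited a segment with values would keep reporting them for a later, empty segment.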
Reject copy_to and multi_fields on flattened mapped properties since multi-fields are silently un-queryable and copy_to is inconsistent with the parent flattened field's restrictions. Add tests for multi-value arrays, null value forwarding, exists queries, property preservation on merge, additional types, empty strings, ignore_above/depth_limit interaction, cross-index queries, synthetic source ordering, and all disallowed types. Made-with: Cursor
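As a rough illustration of the restrictions this commit enforces, the sketch below validates one mapped property definition. It is not the PR's implementation: the class name, method name, and error messages are assumptions, and the allow-list is copied from the PR description.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical validator for a single entry under the flattened field's "properties".
final class MappedPropertyValidationSketch {
    // Leaf types listed as allowed in the PR description.
    static final Set<String> ALLOWED_TYPES = Set.of(
        "keyword", "constant_keyword", "wildcard", "text",
        "long", "integer", "short", "byte",
        "double", "float", "half_float", "scaled_float", "unsigned_long",
        "date", "date_nanos", "boolean", "ip"
    );

    static void validate(String name, Map<String, Object> config) {
        Object type = config.get("type");
        if (!(type instanceof String) || !ALLOWED_TYPES.contains(type)) {
            throw new IllegalArgumentException("property [" + name + "] has disallowed type [" + type + "]");
        }
        if (config.containsKey("copy_to")) {
            throw new IllegalArgumentException("property [" + name + "] cannot use [copy_to]");
        }
        if (config.containsKey("fields")) {
            throw new IllegalArgumentException("property [" + name + "] cannot use multi-fields");
        }
    }
}
```

Rejecting these at mapping time, rather than ignoring them, avoids the silently un-queryable multi-fields mentioned in the commit message.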
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/340_flattened.yml
The RootFlattenedDocValuesBlockLoader was not passing mapped property loaders through to FlattenedDocValuesSyntheticFieldLoader, causing ES|QL block loading to omit all mapped property values from the resulting JSON. Also fixed writeToBlock to use hasValue() so it does not emit null when only mapped properties have data. Made-with: Cursor
Hi @parkertimmins, I've created a changelog YAML for you.
Replace LinkedHashMap with TreeMap for propertyBuilders and mappedProperties, and use unmodifiableSortedMap to preserve sort order through Map.copyOf. This removes the need to wrap in new TreeMap<>() at each usage site. Made-with: Cursor
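A small sketch of the idea, under the assumption that the goal is a read-only map that still iterates in key order. `Map.copyOf` leaves iteration order unspecified, while `Collections.unmodifiableSortedMap` keeps the order of the underlying `TreeMap`. The class name and the `String` value type are placeholders; the PR's maps hold mapper builders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper; not the code in FlattenedFieldMapper.
final class SortedPropertiesSketch {
    // Returns a read-only copy that still iterates in ascending key order.
    static SortedMap<String, String> freeze(SortedMap<String, String> properties) {
        return Collections.unmodifiableSortedMap(new TreeMap<>(properties));
    }

    // Collects the keys in iteration order, to make the ordering observable.
    static List<String> keysInOrder(SortedMap<String, String> properties) {
        return new ArrayList<>(properties.keySet());
    }
}
```

Because the frozen map is itself a `SortedMap`, callers can iterate it directly without re-wrapping it in a new `TreeMap`.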
Made-with: Cursor
Pinging @elastic/es-storage-engine (Team:StorageEngine)
The test relied on unsorted search returning results in index sort order, which fails when CCS searches a replica that is still being peer-recovered from the primary. Made-with: Cursor
Flattened type cannot be used as a multi-field since the feature branch changed its TypeParser from FieldMapper.TypeParser to Mapper.TypeParser. Use object properties instead. Made-with: Cursor
Made-with: Cursor
# [foo] is flattened in index5
# [bar] is keyword in index5
# [bar].[baz] is flattened in index5
# [bar.baz] is flattened in index5
Not strictly necessary, but I think it'd be nice to be able to tell immediately, just by looking at comments like these, whether foo.bar is itself the field name or bar is a field under foo ([foo].[bar]). It seems like it's still the latter case in index5.
@mouhc1ne Thanks for the review. Unfortunately, we ended up needing to keep the NOOP behavior that currently exists, because changing it would be a breaking change. Martijn pointed out that, because the flattened mapping is serialized, not being backwards compatible here would cause failures during a shard upgrade.
Just fyi that you might need to pull in #144741 in case you run into CI errors in |
The feature branch changed FlattenedFieldMapper.PARSER from a FieldMapper.TypeParser to a Mapper.TypeParser to handle the new properties key. This broke the instanceof check in TypeParsers.parseMultiField, rejecting flattened as a multi-field. Fix by subclassing FieldMapper.TypeParser instead: extract properties from the node before delegating to super.parse(). This preserves backwards compatibility while supporting the new properties parsing. Made-with: Cursor
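The subclassing approach from this commit can be sketched with simplified stand-in types. Everything below is hypothetical: the interface and classes only mimic the relationship between Mapper.TypeParser and FieldMapper.TypeParser, and the assumption that the base parser rejects unknown keys is made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the general parser contract (hypothetical).
interface MapperParserSketch {
    String parse(String name, Map<String, Object> node);
}

// Stand-in for the field-level parser that the multi-field check looks for (hypothetical).
class FieldParserSketch implements MapperParserSketch {
    @Override
    public String parse(String name, Map<String, Object> node) {
        // Assumption: the base parser fails on parameters it does not know.
        if (node.containsKey("properties")) {
            throw new IllegalArgumentException("unknown parameter [properties]");
        }
        return name + ":" + node.get("type");
    }
}

// Subclass that pulls out "properties" first, then delegates to the base parser.
class FlattenedParserSketch extends FieldParserSketch {
    @Override
    public String parse(String name, Map<String, Object> node) {
        Map<String, Object> remaining = new HashMap<>(node);
        Object properties = remaining.remove("properties");
        String base = super.parse(name, remaining);
        return properties == null ? base : base + "+properties";
    }
}

final class MultiFieldCheckSketch {
    // Mirrors the kind of instanceof check described in the commit message.
    static boolean allowedAsMultiField(MapperParserSketch parser) {
        return parser instanceof FieldParserSketch;
    }
}
```

Because the flattened parser remains a subtype of the field-level parser, the instanceof check keeps passing while the extra key is still handled.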
});
}

public void testBlockLoaderWithMappedPropertiesOnly() throws IOException {
From what I can tell, these two blockloader tests are testing the root blockloader. We should probably add a test for the KeyedFlattenedDocValuesBlockLoader too.
Good point, I've added a test here: https://github.com/elastic/elasticsearch/pull/144451/changes#diff-7113916e741652f4a9e73b8cf5573b3af612664e2c54e22cbc697b49468a8886R1760
This brings up an important question. Should the mapped values be included in the KeyedFlattenedDocValuesBlockLoader? Currently they are not. They have to be obtained through a separate, and correctly typed, blockloader. I can also see the argument that they should be included, though this would involve casting everything to strings. For this reason, I'm inclined to say they should be left out.
@jordan-powers How will this choice affect query time behavior in ES|QL? @martijnvg Any thoughts?
Good question. Not really related, but synthetic source does include the values of the mapped sub fields. The root block loader's purpose is to include all the content of a flattened field. So I think the root block loader should also include the content of mapped sub fields.
Maybe we can quickly implement this by falling back to the source based block loader if there are mapped sub fields (in RootFlattenedFieldType#blockLoader(...))? Which should do the right thing, given that synthetic source already has this behaviour.
Then in a follow up we can improve RootFlattenedDocValuesBlockLoader?
I misunderstood the question :) Never mind my previous comment.
Should the mapped values be included in the KeyedFlattenedDocValuesBlockLoader?
No, these fields have their own block loader via their mapped sub field.
How will this choice affect query time behavior in ES|QL?
So long as the sub-field mappers can be resolved by FieldTypeLookup#get, then the fields will be loaded with the correct blockloaders.
This reverts commit d2313f1.
Switch PARSER from a manual Mapper.TypeParser to FieldMapper.TypeParser via createTypeParserWithLegacySupport, allowing flattened fields as multi-fields for bwc. The "properties" field is now a Parameter<Map<String, Builder>> with parsing handled by the standard parameter framework. Made-with: Cursor
Use explicit sort: _doc to ensure deterministic result order after force merge, avoiding reliance on undefined tiebreaker behavior with equal _score values. Made-with: Cursor
The flattened field type indexes all leaf values as untyped keywords, preventing type-aware operations (range queries, date math, numeric aggregations) on individual keys. Users needing typed behavior must switch the entire object to the object type.
A new optional properties parameter lets users declare specific paths with real leaf field types while leaving the rest on the default untyped flattened path.
At index time, values matching a mapped property are delegated to that sub-field's mapper and excluded from the root/keyed flattened fields. Unmapped keys continue to behave as normal flattened keywords.
Allowed sub-field types: keyword, constant_keyword, wildcard, text, long, integer, short, byte, double, float, half_float, scaled_float, unsigned_long, date, date_nanos, boolean, ip.
Supported operations: typed search, sort (including index sort), aggregations, ESQL block loading, and synthetic _source.
Restrictions: copy_to and fields (multi-fields) are disallowed on mapped properties. Only leaf types from the allow-list are permitted.
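The index-time split described above can be sketched as a small routing function. This is an illustration under simplifying assumptions, not the logic in FlattenedFieldParser: the class and method names are hypothetical, and arrays, null handling, ignore_above, and depth_limit are left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: split a flattened object's leaves into typed and untyped buckets.
final class FlattenedRoutingSketch {
    static final class Routed {
        // Values handed to a typed sub-field mapper, keyed by full path.
        final Map<String, Object> typed = new TreeMap<>();
        // Values left in the untyped flattened field, stored as keyword strings.
        final Map<String, String> flattened = new TreeMap<>();
    }

    static Routed route(Map<String, Object> source, Set<String> mappedPaths) {
        Routed out = new Routed();
        walk("", source, mappedPaths, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> node, Set<String> mappedPaths, Routed out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (mappedPaths.contains(path)) {
                // A mapped key goes only to its sub-field, never to the flattened field.
                out.typed.put(path, value);
            } else if (value instanceof Map) {
                // Nested object notation: descend and keep building the dotted path.
                walk(path, (Map<String, Object>) value, mappedPaths, out);
            } else {
                out.flattened.put(path, String.valueOf(value));
            }
        }
    }
}
```

For example, with `host.ip` and `bytes` declared as mapped paths, a document containing `host.ip`, `host.name`, and `bytes` would place only `host.name` in the flattened bucket.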
Made-with: Cursor