Add template_id to patterned-text type #131401
    final IndexMode indexMode,
    final String fullFieldName
) {
    if (requireDocValueSkippers) {
It seems weird to have both useDocValuesSkipper and requireDocValuesSkipper fields. But I don't think it's a good idea to use the name-based way of deciding whether or not to use docValueSkippers. It seems like we should just enforce that they are always used for KeywordFields that are created for the purpose of being a templateId.
This reads more like enableDocValuesSkipper and useDocValuesSkipper. Can we clean up the logic to apply this?
We talked offline about changing useDocValuesSkipper to enableDocValuesSkipper and requireDocValuesSkipper to forceDocValuesSkipper. It's still a bit weird to have two such parameters, but one is needed to generally enable the use of doc values skippers, and another is needed for types (such as templateId) which are certain they want to use skippers and don't require any name-based logic to decide (like host.name). With future changes to skippers, we can probably clean this up some.
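To make the two-parameter idea concrete, here is a minimal sketch of how the decision could be structured. It is not the code in this PR: the class name, the method signature, and the `NAME_BASED_CANDIDATES` set are hypothetical, and it assumes that the force flag wins even when the general enable flag is off, which the discussion above leaves open.

```java
import java.util.Set;

// Illustrative only: not the KeywordFieldMapper implementation.
final class SkipperDecisionSketch {

    // Hypothetical stand-in for the name-based logic (the thread mentions host.name).
    static final Set<String> NAME_BASED_CANDIDATES = Set.of("host.name");

    static boolean shouldUseSkipper(boolean enableDocValuesSkipper, boolean forceDocValuesSkipper, String fullFieldName) {
        // Assumption: types such as template_id that force skippers bypass every other check.
        if (forceDocValuesSkipper) {
            return true;
        }
        // Otherwise skippers must be generally enabled AND the field must pass the name-based check.
        return enableDocValuesSkipper && NAME_BASED_CANDIDATES.contains(fullFieldName);
    }

    public static void main(String[] args) {
        System.out.println(shouldUseSkipper(false, true, "foo.template_id")); // true
        System.out.println(shouldUseSkipper(true, false, "host.name"));       // true
        System.out.println(shouldUseSkipper(true, false, "message"));         // false
        System.out.println(shouldUseSkipper(false, false, "host.name"));      // false
    }
}
```

Keeping the forced path as an early return means a later cleanup of the name-based branch would not affect fields like template_id.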
    Integer.MAX_VALUE,
    indexCreatedVersion,
    IndexMode.LOGSDB,
    null,
Should we be wiring an IndexSortConfig through here? I'm still a bit confused about how we want to control sorting.
No index sorting will be explicitly configured for now, see:
I also find it weird that it's propagated through this class. Looking at the uses, I see shouldUseDocValuesSkipper, which is just inappropriate: this logic belongs in the LogsdbIndexModeSettingsProvider, which can inject a setting to enable skiplists when appropriate.
    mappingParserContext.getIndexSettings().getIndexVersionCreated(),
    mappingParserContext.getIndexSettings().getMode(),
    mappingParserContext.getIndexSettings().getIndexSortConfig(),
    USE_DOC_VALUES_SKIPPER.get(mappingParserContext.getSettings()),
Should we remove this, or use regular indexing when this is not set? Maybe something to discuss with @martijnvg when he's back.
search:
  index: test
  body:
    docvalue_fields: [ "foo.template_id" ]
We should also test that it can be used in index sort config.
Well done, some minor comments.
Fixes #128937
Pinging @elastic/es-storage-engine (Team:StorageEngine)
For patterned-text mapper foo, add a sub-field called foo.template_id. This is an 8-byte hash of the template doc_value column. Unlike the template, template_id is accessible and can be used for querying, aggregations, etc. The template_id is stored as a KeywordField and can use any features of the KeywordFieldType. template_id has doc_values, and is not stored or indexed. It uses doc value skippers, which should be quite fast given that the index will be sorted on template.
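
As a rough illustration of what "an 8-byte hash of the template" could look like as a keyword value, the sketch below truncates a SHA-256 digest to 8 bytes and encodes it as URL-safe Base64. The hash function, the encoding, the class name `TemplateIdSketch`, and the sample template strings are all assumptions made for this example; the description above does not say which hash or encoding the PR uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

// Illustrative only: not the PatternedTextValueProcessor implementation.
final class TemplateIdSketch {

    // Derives a short, deterministic id from a template string.
    static String templateId(String template) {
        try {
            // Assumption: SHA-256 stands in for whatever hash the PR actually uses.
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] full = digest.digest(template.getBytes(StandardCharsets.UTF_8));
            // Keep only the first 8 bytes, matching the "8-byte hash" in the description.
            byte[] eightBytes = Arrays.copyOf(full, 8);
            // 8 bytes encode to 11 Base64 characters once padding is dropped.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(eightBytes);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical template strings; the real template format is not shown here.
        System.out.println(templateId("user logged in from"));
        System.out.println(templateId("disk usage at"));
    }
}
```

Because the id is a pure function of the template, documents that share a template share a template_id, which is what makes it useful for filtering and aggregations on a keyword field.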