-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Custom analyzer to reject unknown parameters #89112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@javanna @romseygeek @jtibshirani Hi, PTAL if free~ |
|
Pinging @elastic/es-search (Team:Search) |
|
Hi, can someone take a look about it |
|
@javanna Hi, PTAL |
javanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments, the approach looks good to me. I am going to run tests and edit the title which is going to go in the changelog entry.
| for (String key : analyzerSettings.keySet()) { | ||
| switch (key) { | ||
| case "tokenizer", "char_filter", "filter", "type", "position_increment_gap" -> {} | ||
| default -> throw new IllegalArgumentException("Custom Analyzer not support [" + key + "] now"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you rephrase the error to "Custom analyzer [" + name + "] does not support [" + key + "]" ?
| tokenizer: standard | ||
| filter: [lowercase] | ||
| search: | ||
| rest_total_hits_as_int: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this change necessary?
|
|
||
| - match: { status: 400 } | ||
| - match: { error.type: illegal_argument_exception } | ||
| - match: { error.reason: "Custom Analyzer not support [foo] now" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ thanks for adding this test!
| ) { | ||
| for (String key : analyzerSettings.keySet()) { | ||
| switch (key) { | ||
| case "tokenizer", "char_filter", "filter", "type", "position_increment_gap" -> {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another way of doing this, without deduplicating the expected analyzer keys, would be to duplicate the settings into a mutable map, which we'd remove from every time we read a certain setting. At the end, if any item is left in the map it means that some unsupported param has been provided.
| try { | ||
| createComponents("my_analyzer", analyzerSettings, testAnalysis.tokenizer, testAnalysis.charFilter, testAnalysis.tokenFilter); | ||
| fail("expected failure"); | ||
| } catch (IllegalArgumentException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use expectThrows instead .
| @BeforeClass | ||
| public static void setup() throws IOException { | ||
| testAnalysis = createTestAnalysis(new Index("test", "_na_"), settings); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need this as before class? It looks like it's only used in the new method that you introduced?
| assertEquals(Arrays.asList("hello", "world"), wordList); | ||
| } | ||
|
|
||
| public void testCustomAnalyzerWithNotSupportKey() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testCustomAnalyzerWithUnsupportedKey ?
|
@elasticsearchmachine test this please |
custom analyzer
docs/changelog/89112.yaml
Outdated
| area: Search/Analysis | ||
| type: enhancement | ||
| issues: | ||
| - 85710 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changelog generation is automatic in most situations, no need to create the entry yourself. And it gets updated as the labels and title of the PR get updated. Could you remove the changelog from your PR, then it should get automatically created with the expected area and all other fields. The current build errors are around the wrong area.
| final Map<String, CharFilterFactory> charFilters, | ||
| final Map<String, TokenFilterFactory> tokenFilters | ||
| ) { | ||
| for (String key : analyzerSettings.keySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one additional suggestion: this is technically a breaking change, as existing indices will not be loaded if they have have an unknown param in the definition of a custom analyzer. We could mitigate this by making the new behaviour depend on the index created version, so that we'd throw error only for newly created indices. We'd need to have the index created version propagated from AnalysisRegistry though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I am late. But i find the method of createComponents is stateless. How can we propagate the index created version to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I got it.
|
@javanna Hi, i resolved all comments. PTAL~ |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
@javanna I am not sure we should actually make this change. All analyzer builders accept unknown parameters without throwing, and it has been that way since the start. Seems like an expensive and broad ranging breaking change that might not be worth the squeeze. |
|
I agree that it's a breaking change. I agree it is rather late to fix it. It does feel like something we'll want to fix though at some point, in that incorrect configuration is being silently accepted, as opposed to how Elasticsearch works in many other places. Ideally we'd promptly fail instead and provide feedback to users. |
|
@javanna the real fix is to implement something like: And provide a deprecation warning on unknown parameters and deprecate the old ctor for custom analyzers that do not provide a "known settings" value. I am closing this PR as it is very far away from a good path forward. New work can be reopened through the plan laid out in the linked issue. |
|
Sounds good to me @benwtrent thanks |
Main Change
Nowtime, when we create custom analyzer with some not required params("foo":"nar"), it will succeed. This PR will make check with these parms to ensure these were our need.
Close: #85710