@likzn
Contributor

@likzn likzn commented Aug 4, 2022

Main Change

Currently, creating a custom analyzer with an unsupported parameter (for example "foo": "bar") succeeds. This PR validates the parameters and rejects any that a custom analyzer does not support.

Close: #85710

PUT customanlyzerindex
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "foo": "bar"
                }
            }
        }
    }
}

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team v8.5.0 labels Aug 4, 2022
@likzn
Contributor Author

likzn commented Aug 4, 2022

@javanna @romseygeek @jtibshirani Hi, PTAL if free~

@nik9000 nik9000 added :Search Relevance/Analysis How text is split into tokens team-discuss and removed needs:triage Requires assignment of a team area label labels Aug 4, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Aug 4, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@likzn
Contributor Author

likzn commented Aug 12, 2022

Hi, can someone take a look at this?

@likzn
Contributor Author

likzn commented Aug 20, 2022

@javanna Hi, PTAL

Member

@javanna javanna left a comment

I left a couple of comments; the approach looks good to me. I am going to run the tests and edit the title, which will go into the changelog entry.

for (String key : analyzerSettings.keySet()) {
    switch (key) {
        case "tokenizer", "char_filter", "filter", "type", "position_increment_gap" -> {}
        default -> throw new IllegalArgumentException("Custom Analyzer not support [" + key + "] now");
Member

could you rephrase the error to "Custom analyzer [" + name + "] does not support [" + key + "]"?
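
A minimal sketch of the check with the suggested wording, assuming the analyzer name is available where the keys are validated. The class and the `validateKeys` helper are hypothetical and not part of the PR; the supported keys are the ones listed in the hunk above.

```java
import java.util.Map;
import java.util.Set;

final class CustomAnalyzerKeyCheck {
    // Keys accepted by the switch statement in the hunk under review.
    private static final Set<String> SUPPORTED_KEYS =
        Set.of("tokenizer", "char_filter", "filter", "type", "position_increment_gap");

    // Hypothetical helper: throws on the first key that is not in SUPPORTED_KEYS.
    static void validateKeys(String name, Map<String, ?> analyzerSettings) {
        for (String key : analyzerSettings.keySet()) {
            if (SUPPORTED_KEYS.contains(key) == false) {
                throw new IllegalArgumentException(
                    "Custom analyzer [" + name + "] does not support [" + key + "]"
                );
            }
        }
    }

    public static void main(String[] args) {
        try {
            // Same settings as the request in the PR description.
            validateKeys("my_analyzer", Map.of("type", "custom", "tokenizer", "standard", "foo", "bar"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Naming both the analyzer and the offending key lets a user find the bad setting without having to look through the whole index definition.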

tokenizer: standard
filter: [lowercase]
search:
rest_total_hits_as_int: true
Member

why was this change necessary?


- match: { status: 400 }
- match: { error.type: illegal_argument_exception }
- match: { error.reason: "Custom Analyzer not support [foo] now" }
Member

++ thanks for adding this test!

) {
    for (String key : analyzerSettings.keySet()) {
        switch (key) {
            case "tokenizer", "char_filter", "filter", "type", "position_increment_gap" -> {}
Member

another way of doing this, without duplicating the expected analyzer keys, would be to copy the settings into a mutable map, from which we'd remove each setting as we read it. At the end, if any item is left in the map, it means that an unsupported param has been provided.
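
A rough sketch of that alternative, under the assumption that the settings can be treated as a plain map. `RemainingSettingsCheck` and its simplified `createComponents` are hypothetical stand-ins that show only the bookkeeping, not the real method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class RemainingSettingsCheck {
    // Hypothetical stand-in for the real method; it only demonstrates the bookkeeping.
    static void createComponents(String name, Map<String, ?> analyzerSettings) {
        // Copy first so that the caller's settings are left untouched.
        Map<String, Object> remaining = new HashMap<>(analyzerSettings);

        // Every read removes the key; real code would use the returned values
        // to look up the tokenizer, filters and so on.
        remaining.remove("type");
        remaining.remove("tokenizer");
        remaining.remove("char_filter");
        remaining.remove("filter");
        remaining.remove("position_increment_gap");

        // Anything still present was never read, so it is unsupported.
        if (remaining.isEmpty() == false) {
            // TreeSet gives a stable, sorted order in the error message.
            throw new IllegalArgumentException(
                "Custom analyzer [" + name + "] does not support " + new TreeSet<>(remaining.keySet())
            );
        }
    }

    public static void main(String[] args) {
        try {
            createComponents("my_analyzer", Map.of("type", "custom", "tokenizer", "standard", "foo", "bar"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape the supported keys are defined only by the reads themselves, so a newly supported setting cannot be forgotten in a separate allow-list.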

try {
    createComponents("my_analyzer", analyzerSettings, testAnalysis.tokenizer, testAnalysis.charFilter, testAnalysis.tokenFilter);
    fail("expected failure");
} catch (IllegalArgumentException e) {
Member

you can use expectThrows instead.
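
To illustrate the pattern, here is a self-contained sketch. The `expectThrows` below is a simplified local stand-in written for this example, not the test framework's implementation, and `rejectKey` is a hypothetical method under test.

```java
final class ExpectThrowsSketch {
    // Local stand-in for the functional interface a test framework would provide.
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    // Simplified stand-in: runs the code, returns the exception if it has the
    // expected type, and fails otherwise.
    static <T extends Throwable> T expectThrows(Class<T> expectedType, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("expected " + expectedType.getSimpleName() + " but nothing was thrown");
    }

    // Hypothetical code under test: always rejects the given key.
    static void rejectKey(String name, String key) {
        throw new IllegalArgumentException("Custom analyzer [" + name + "] does not support [" + key + "]");
    }

    public static void main(String[] args) {
        // Replaces the try / fail / catch block with a single call.
        IllegalArgumentException e = expectThrows(
            IllegalArgumentException.class,
            () -> rejectKey("my_analyzer", "foo")
        );
        System.out.println(e.getMessage());
    }
}
```

Compared with try / fail / catch, the returned exception can be asserted on directly, and a missing exception fails the test without an explicit `fail` call.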

@BeforeClass
public static void setup() throws IOException {
    testAnalysis = createTestAnalysis(new Index("test", "_na_"), settings);
}
Member

do you need this as a @BeforeClass method? It looks like it's only used in the new method that you introduced.

    assertEquals(Arrays.asList("hello", "world"), wordList);
}

public void testCustomAnalyzerWithNotSupportKey() {
Member

testCustomAnalyzerWithUnsupportedKey?

@javanna javanna self-assigned this Sep 9, 2022
@javanna javanna added the v8.5.0 label Sep 9, 2022
@javanna
Member

javanna commented Sep 9, 2022

@elasticsearchmachine test this please

@javanna javanna changed the title from "Make param check when create custom analyzer" to "Custom analyzer to reject unknown parameters" Sep 9, 2022
area: Search/Analysis
type: enhancement
issues:
- 85710
Member

the changelog generation is automatic in most situations, so there is no need to create the entry yourself. It also gets updated as the labels and title of the PR change. Could you remove the changelog from your PR? It should then be created automatically with the expected area and all other fields. The current build errors are caused by the wrong area.

    final Map<String, CharFilterFactory> charFilters,
    final Map<String, TokenFilterFactory> tokenFilters
) {
    for (String key : analyzerSettings.keySet()) {
Member

one additional suggestion: this is technically a breaking change, as existing indices will not be loaded if they have an unknown param in the definition of a custom analyzer. We could mitigate this by making the new behaviour depend on the index created version, so that we'd throw an error only for newly created indices. We'd need to have the index created version propagated from AnalysisRegistry though.
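
A hedged sketch of that mitigation, assuming the index created version reaches the check as a plain integer. The class, the `validateKeys` signature and the `FIRST_STRICT_VERSION` value are all hypothetical; in particular the value is arbitrary and not a real Elasticsearch version id.

```java
import java.util.Map;
import java.util.Set;

final class VersionGatedKeyCheck {
    // Arbitrary illustrative value, not a real Elasticsearch version id.
    static final int FIRST_STRICT_VERSION = 100;

    private static final Set<String> SUPPORTED_KEYS =
        Set.of("tokenizer", "char_filter", "filter", "type", "position_increment_gap");

    // Hypothetical helper: strict only for indices created on or after FIRST_STRICT_VERSION.
    static void validateKeys(String name, Map<String, ?> analyzerSettings, int indexCreatedVersion) {
        if (indexCreatedVersion < FIRST_STRICT_VERSION) {
            // Older index: keep accepting unknown keys so that the index still loads.
            return;
        }
        for (String key : analyzerSettings.keySet()) {
            if (SUPPORTED_KEYS.contains(key) == false) {
                throw new IllegalArgumentException(
                    "Custom analyzer [" + name + "] does not support [" + key + "]"
                );
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = Map.of("type", "custom", "tokenizer", "standard", "foo", "bar");

        // An index created before the cut-off loads as before.
        validateKeys("my_analyzer", settings, FIRST_STRICT_VERSION - 1);

        // A newly created index is rejected.
        try {
            validateKeys("my_analyzer", settings, FIRST_STRICT_VERSION);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the created version is passed in as an argument, the check itself can stay stateless; the caller only has to supply the version.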

Contributor Author

Sorry for the late reply. The createComponents method is stateless, though. How can we propagate the index created version to it?

Contributor Author

OK. I got it.

@likzn
Contributor Author

likzn commented Sep 18, 2022

@javanna Hi, I resolved all comments. PTAL~

@likzn likzn requested a review from javanna September 20, 2022 02:19
@csoulios csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022
@quux00 quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@mattc58 mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023
@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent
Member

@javanna I am not sure we should actually make this change.

All analyzer builders accept unknown parameters without throwing, and it has been that way since the start. Seems like an expensive and broad ranging breaking change that might not be worth the squeeze.

@javanna javanna removed their assignment Aug 12, 2025
@javanna
Member

javanna commented Aug 12, 2025

I agree that it's a breaking change. I agree it is rather late to fix it. It does feel like something we'll want to fix though at some point, in that incorrect configuration is being silently accepted, as opposed to how Elasticsearch works in many other places. Ideally we'd promptly fail instead and provide feedback to users.

@benwtrent
Member

@javanna the real fix is to implement something like:

#41299 (comment)

And provide a deprecation warning on unknown parameters and deprecate the old ctor for custom analyzers that do not provide a "known settings" value.

I am closing this PR as it is very far away from a good path forward.

New work can be reopened through the plan laid out in the linked issue.
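
For illustration only, warning on unknown parameters instead of throwing might look roughly like the sketch below. It does not reproduce the plan in the linked comment; `UnknownSettingWarnings`, `collectWarnings`, the "known settings" set and the warning text are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class UnknownSettingWarnings {
    // Hypothetical helper: returns one warning per unknown key instead of throwing,
    // so that existing indices keep loading while users are told about the problem.
    static List<String> collectWarnings(String name, Map<String, ?> settings, Set<String> knownSettings) {
        List<String> warnings = new ArrayList<>();
        // Sorted iteration keeps the warning order stable.
        for (String key : new TreeSet<>(settings.keySet())) {
            if (knownSettings.contains(key) == false) {
                warnings.add("Analyzer [" + name + "] was configured with unknown setting [" + key + "]");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Each analyzer type would declare its own set of known settings.
        Set<String> known = Set.of("tokenizer", "char_filter", "filter", "type", "position_increment_gap");
        Map<String, String> settings = Map.of("type", "custom", "tokenizer", "standard", "foo", "bar");

        // Real code would send these to a deprecation logger; printing stands in for that here.
        collectWarnings("my_analyzer", settings, known).forEach(System.out::println);
    }
}
```

Passing the known settings in as a parameter means the same check could serve every analyzer type, which is where this approach differs from hard-coding the keys for custom analyzers alone.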

@benwtrent benwtrent closed this Aug 12, 2025
@javanna
Member

javanna commented Aug 13, 2025

Sounds good to me @benwtrent thanks

Labels

>bug external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch team-discuss v9.2.0

Development

Successfully merging this pull request may close these issues.

Create analyzer accepting any random param and returning that in response as well