Changes from 3 commits
6 changes: 6 additions & 0 deletions docs/changelog/89112.yaml
@@ -0,0 +1,6 @@
pr: 89112
summary: Make param check when create custom analyzer
area: Search/Analysis
type: enhancement
issues:
- 85710
Member:
Changelog generation is automatic in most situations, so there is no need to create the entry yourself, and it gets updated as the labels and title of the PR change. Could you remove the changelog file from your PR? It should then be created automatically with the expected area and all other fields. The current build errors are caused by the wrong area.

@@ -59,3 +59,22 @@
- match: { detail.tokenizer.tokens.0.token: ABc }
- match: { detail.tokenfilters.0.name: lowercase }
- match: { detail.tokenfilters.0.tokens.0.token: abc }

---
"Custom analyzer with not supported param in request":
- do:
catch: bad_request
indices.create:
index: test
body:
settings:
analysis:
analyzer:
my_analyzer:
type: custom
tokenizer: standard
foo: bar

- match: { status: 400 }
- match: { error.type: illegal_argument_exception }
- match: { error.reason: "Custom Analyzer not support [foo] now" }
Member:

++ thanks for adding this test!

@@ -16,7 +16,6 @@
tokenizer: standard
filter: [lowercase]
search:
rest_total_hits_as_int: true
Member:
why was this change necessary?

tokenizer: standard
filter: [lowercase, keyword_repeat, porter_stem, unique_stem]
filter:
@@ -44,6 +44,12 @@ static AnalyzerComponents createComponents(
final Map<String, CharFilterFactory> charFilters,
final Map<String, TokenFilterFactory> tokenFilters
) {
for (String key : analyzerSettings.keySet()) {
Member:

one additional suggestion: this is technically a breaking change, as existing indices will not be loaded if they have an unknown param in the definition of a custom analyzer. We could mitigate this by making the new behaviour depend on the index-created version, so that we'd throw an error only for newly created indices. We'd need to have the index-created version propagated from AnalysisRegistry though.

Contributor Author:

Sorry, I am late. But I find that the createComponents method is stateless. How can we propagate the index-created version to it?

Contributor Author:

OK. I got it.

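The version-gated mitigation discussed in the thread above could look roughly like this. This is an illustrative, self-contained simplification — it uses a plain `Map` and integer versions in place of Elasticsearch's `Settings` and `Version` types, and all names are hypothetical, not the PR's actual code:

```java
import java.util.Map;
import java.util.Set;

// Sketch: only reject unknown params for indices created at or after the
// version that introduces the check, so pre-existing indices still load.
class AnalyzerParamCheck {
    static final Set<String> ALLOWED =
        Set.of("tokenizer", "char_filter", "filter", "type", "position_increment_gap");

    // versionCreated stands in for the index-created version propagated from
    // AnalysisRegistry; checkFromVersion is the version introducing the check.
    static void validate(Map<String, String> analyzerSettings,
                         int versionCreated, int checkFromVersion) {
        if (versionCreated < checkFromVersion) {
            return; // stay lenient for indices created before the check existed
        }
        for (String key : analyzerSettings.keySet()) {
            if (ALLOWED.contains(key) == false) {
                throw new IllegalArgumentException(
                    "Custom analyzer does not support [" + key + "]");
            }
        }
    }
}
```

Old indices with a stray key would then still open, while newly created indices fail fast at creation time.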
switch (key) {
case "tokenizer", "char_filter", "filter", "type", "position_increment_gap" -> {}
Member:

another way of doing this, without duplicating the expected analyzer keys, would be to copy the settings into a mutable map, removing from it every time we read a certain setting. At the end, if any item is left in the map, it means that some unsupported param has been provided.

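The reviewer's alternative can be sketched as follows — a hypothetical, self-contained simplification that uses a plain `Map` instead of Elasticsearch's `Settings` class, with only the tokenizer read shown:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: consume settings from a mutable copy as they are read; anything
// left over at the end is an unsupported parameter.
class LeftoverParamCheck {
    static String createComponents(Map<String, String> analyzerSettings) {
        Map<String, String> remaining = new HashMap<>(analyzerSettings);
        String tokenizer = remaining.remove("tokenizer");
        remaining.remove("char_filter");
        remaining.remove("filter");
        remaining.remove("type");
        remaining.remove("position_increment_gap");
        if (remaining.isEmpty() == false) {
            throw new IllegalArgumentException(
                "unsupported parameters " + remaining.keySet());
        }
        return tokenizer;
    }
}
```

The advantage is that the set of allowed keys is never written down twice: each `remove` is also the read, so the allow-list can't drift out of sync with the parsing code.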
default -> throw new IllegalArgumentException("Custom Analyzer not support [" + key + "] now");
Member:

could you rephrase the error to "Custom analyzer [" + name + "] does not support [" + key + "]" ?

}
}
String tokenizerName = analyzerSettings.get("tokenizer");
if (tokenizerName == null) {
throw new IllegalArgumentException("Custom Analyzer [" + name + "] must be configured with a tokenizer");
@@ -9,10 +9,14 @@
package org.elasticsearch.index.analysis;

import org.apache.lucene.analysis.CharArraySet;
import org.elasticsearch.Version;
import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.env.TestEnvironment;
import org.elasticsearch.index.Index;
import org.elasticsearch.test.ESTestCase;
import org.junit.BeforeClass;

import java.io.BufferedWriter;
import java.io.FileNotFoundException;
@@ -27,9 +31,18 @@
import java.util.Arrays;
import java.util.List;

import static org.elasticsearch.index.analysis.AnalyzerComponents.createComponents;
import static org.hamcrest.Matchers.is;

public class AnalysisTests extends ESTestCase {
private static TestAnalysis testAnalysis;
private static Settings settings = Settings.builder().put(IndexMetadata.SETTING_VERSION_CREATED, Version.CURRENT).build();

@BeforeClass
public static void setup() throws IOException {
testAnalysis = createTestAnalysis(new Index("test", "_na_"), settings);
}
Member:

do you need this as @BeforeClass? It looks like it's only used in the new method that you introduced.


public void testParseStemExclusion() {
/* Comma separated list */
Settings settings = Settings.builder().put("stem_exclusion", "foo,bar").build();
@@ -103,4 +116,15 @@ public void testParseWordList() throws IOException {
List<String> wordList = Analysis.getWordList(env, nodeSettings, "foo.bar");
assertEquals(Arrays.asList("hello", "world"), wordList);
}

public void testCustomAnalyzerWithNotSupportKey() {
Member:

testCustomAnalyzerWithUnsupportedKey ?

Settings analyzerSettings = Settings.builder().put("tokenizer", "standard").put("foo", "bar").put("type", "custom").build();

try {
createComponents("my_analyzer", analyzerSettings, testAnalysis.tokenizer, testAnalysis.charFilter, testAnalysis.tokenFilter);
fail("expected failure");
} catch (IllegalArgumentException e) {
Member:

you can use expectThrows instead.

assertEquals(e.getMessage(), "Custom Analyzer not support [foo] now");
}
}
}
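The expectThrows suggestion collapses the try/fail/catch pattern into a single call. Below is a self-contained sketch of that shape with a minimal stand-in for the helper; the real test would simply call the `expectThrows` inherited from `ESTestCase`/`LuceneTestCase` rather than define its own:

```java
// Illustrative stand-in for the expectThrows helper: run the code, assert
// that it throws the expected exception type, and return the exception so
// the caller can assert on its message.
class ExpectThrowsSketch {
    interface ThrowingRunnable { void run() throws Exception; }

    static <T extends Throwable> T expectThrows(Class<T> clazz, ThrowingRunnable r) {
        try {
            r.run();
        } catch (Throwable t) {
            if (clazz.isInstance(t)) {
                return clazz.cast(t);
            }
            throw new AssertionError("unexpected exception type", t);
        }
        throw new AssertionError("expected " + clazz.getName() + " but nothing was thrown");
    }
}
```

With this, the test body becomes a single statement followed by an assertion on `e.getMessage()`, with no explicit `fail("expected failure")` needed — the helper fails for you when nothing is thrown.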