-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Adds example plugin for custom ingest processor #112282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds example plugin for custom ingest processor #112282
Conversation
Adds an example for creating a plugin with a simple custom ingest processor. The example processor repeats the value of an expected filed in a document, or ignores it if the expected field does not exist. Closes elastic#111539
Documentation preview: |
*/ | ||
public class ExampleRepeatProcessor extends AbstractProcessor { | ||
public static final String TYPE = "repeat"; | ||
public static final String FILED_TO_REPEAT = "toRepeat"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/FILED/FIELD/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I'd do s/toRepeat/to_repeat/g
, too, we don't use a lot of camelCaseLikeThis in the Elasticsearch APIs, we mostly use snake_case_like_this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense for the snake case. By s/toRepeat/to_repeat/g
do you mean using regex for the field name, is there a method available for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using it as a shorthand for "please run this regex with your mind and do a global search and replace on this pull request".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specific s/foo/bar/g
syntax is from how I typically use sed
:
$ echo "Jim was here and used toRepeat, Ted also used toRepeat." | sed -e 's/toRepeat/to_repeat/g'
Jim was here and used to_repeat, Ted also used to_repeat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, thanks for the explanation.
@Override | ||
public Map<String, Processor.Factory> getProcessors(Processor.Parameters parameters) { | ||
return Map.ofEntries( | ||
entry(ExampleRepeatProcessor.TYPE, new ExampleRepeatProcessor.Factory()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you'd be happier with just a Map.of
here. It's slightly simpler.
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @samxbr, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a quick idea on how we could demonstrate a little more about creating processors.
*/ | ||
public class ExampleRepeatProcessor extends AbstractProcessor { | ||
public static final String TYPE = "repeat"; | ||
public static final String FILED_TO_REPEAT = "to_repeat"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think FILED_TO_REPEAT
still needs changed to FIELD_TO_REPEAT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I missed this
String description, | ||
Map<String, Object> config | ||
) { | ||
return new ExampleRepeatProcessor(tag, description); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth demonstrating how a user can pass properties into the processor here. For instance, most ES processors would have you specify which field name on a document to repeat. Maybe making a change so that the processor definition accepts a configuration parameter called field
in the definition that will later be used to identify which document field name to repeat instead of having a single property name that is expected by the processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about demonstrating passing in field name, and supporting Mustache template for the field name as well. But then I thought maybe it's better to keep this to minimum required for a processor plugin to avoid confusion.
I am thinking we keep this example simple, and add a reference link to an existing processor (like Set) that demonstrates passing in the field name and other features like Mustache.
What do you think? I am open to either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need to involve using mustache in the example plugin, but using some values from the config object to show how they might be organized would be helpful.
@elasticmachine test this please |
@elasticmachine update branch |
"Does not process document without field": | ||
# create ingest pipeline with custom processor | ||
- do: | ||
ingest.put_pipeline: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend putting this in a setup:
block so that you don't have to repeat yourself. Also you should probably remove it in a teardown:
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
230_change_target_index.yml
has an example of those two blocks in action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left one last comment, should be good to merge once all comments are resolved
String description, | ||
Map<String, Object> config | ||
) { | ||
String field = ConfigurationUtils.readStringProperty(TYPE, tag, config, "field"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move "field"
up to the top of the class into a public static final String
just to avoid magic strings in the code.
/** | ||
* {@link ExampleProcessorClientYamlTestSuiteIT} executes the plugin's REST API integration tests. | ||
* <p> | ||
* The tests can be executed using the command: ./gradlew :example-plugins:processor:yamlRestTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment actually true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, this was carelessly copied from other examples. It's probably worth fixing this for other examples as well, I will double check them and fix if necessary in another PR. Want to keep this PR self-contained.
@elasticmachine update branch |
Adds an example for creating a plugin with a simple custom ingest processor. The example processor repeats the value of an expected filed in a document, or ignores it if the expected field does not exist.
Closes #111539
Testing
Added YAML REST tests