samxbr
Contributor

@samxbr samxbr commented Aug 28, 2024

Adds an example for creating a plugin with a simple custom ingest processor. The example processor repeats the value of an expected field in a document, or ignores the document if the expected field does not exist.

Closes #111539
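Below is a minimal, standalone sketch of the behavior described above. It assumes that "repeat" means concatenating a String value with itself, and it models the document as a plain Map rather than Elasticsearch's IngestDocument, so it runs without Elasticsearch on the classpath. The class name RepeatSketch is hypothetical, as is the choice to treat a non-String value like a missing field. The field name to_repeat is the one agreed on in the review.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch; not the plugin's actual ExampleRepeatProcessor.
class RepeatSketch {
    // Field name agreed on in review; the constant name itself is illustrative.
    static final String FIELD_TO_REPEAT = "to_repeat";

    // ASSUMPTION: "repeat" means concatenating a String value with itself.
    static Map<String, Object> execute(Map<String, Object> document) {
        Object value = document.get(FIELD_TO_REPEAT);
        if (value instanceof String s) {
            document.put(FIELD_TO_REPEAT, s + s);
        }
        // Missing (or, by assumption, non-String) field: document is left unchanged.
        return document;
    }

    public static void main(String[] args) {
        Map<String, Object> withField = new HashMap<>(Map.of(FIELD_TO_REPEAT, "foo"));
        Map<String, Object> withoutField = new HashMap<>(Map.of("other", "bar"));
        System.out.println(execute(withField));    // {to_repeat=foofoo}
        System.out.println(execute(withoutField)); // {other=bar}
    }
}
```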

Testing

Added YAML REST tests

@samxbr samxbr requested a review from a team as a code owner August 28, 2024 14:01
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added v8.16.0 needs:triage Requires assignment of a team area label labels Aug 28, 2024
@samxbr samxbr added Team:Data Management Meta label for data/management team and removed needs:triage Requires assignment of a team area label labels Aug 28, 2024
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Data Management Meta label for data/management team labels Aug 28, 2024
@samxbr samxbr added Team:Data Management Meta label for data/management team >enhancement and removed needs:triage Requires assignment of a team area label v8.16.0 labels Aug 28, 2024
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Data Management Meta label for data/management team labels Aug 28, 2024
@samxbr samxbr requested review from jbaiera and a team August 28, 2024 14:05
@samxbr samxbr added v8.16.0 Team:Data Management Meta label for data/management team and removed needs:triage Requires assignment of a team area label labels Aug 28, 2024
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Data Management Meta label for data/management team labels Aug 28, 2024
@samxbr samxbr marked this pull request as draft August 28, 2024 14:10
@samxbr samxbr added the Team:Data Management Meta label for data/management team label Aug 28, 2024
@samxbr samxbr removed the request for review from a team August 28, 2024 14:12
@samxbr samxbr removed the needs:triage Requires assignment of a team area label label Aug 28, 2024
*/
public class ExampleRepeatProcessor extends AbstractProcessor {
public static final String TYPE = "repeat";
public static final String FILED_TO_REPEAT = "toRepeat";
Contributor

s/FILED/FIELD/g

Contributor

And I'd do s/toRepeat/to_repeat/g, too, we don't use a lot of camelCaseLikeThis in the Elasticsearch APIs, we mostly use snake_case_like_this.

Contributor Author

Makes sense for the snake case. By s/toRepeat/to_repeat/g, do you mean using a regex for the field name? Is there a method available for that?

Contributor

I'm using it as a shorthand for "please run this regex with your mind and do a global search and replace on this pull request".

Contributor

The specific s/foo/bar/g syntax is from how I typically use sed:

$ echo "Jim was here and used toRepeat, Ted also used toRepeat." | sed -e 's/toRepeat/to_repeat/g'
Jim was here and used to_repeat, Ted also used to_repeat.
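For comparison, the same substitution can be written in Java, the language of the plugin itself: String.replaceAll replaces every match, like the trailing g in the sed expression, while String.replaceFirst replaces only the first match, like sed without g. The class and method names below are illustrative only.

```java
// Illustrative Java equivalents of the sed substitution shown above.
class SedSketch {
    // Like sed 's/toRepeat/to_repeat/g': every match is replaced.
    static String replaceGlobal(String text) {
        return text.replaceAll("toRepeat", "to_repeat");
    }

    // Like sed 's/toRepeat/to_repeat/' (no g): only the first match is replaced.
    static String replaceFirstOnly(String text) {
        return text.replaceFirst("toRepeat", "to_repeat");
    }

    public static void main(String[] args) {
        String text = "Jim was here and used toRepeat, Ted also used toRepeat.";
        System.out.println(replaceGlobal(text));
        // Jim was here and used to_repeat, Ted also used to_repeat.
        System.out.println(replaceFirstOnly(text));
        // Jim was here and used to_repeat, Ted also used toRepeat.
    }
}
```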

Contributor Author

Ah I see, thanks for the explanation.

@samxbr samxbr marked this pull request as ready for review August 28, 2024 16:23
@samxbr samxbr requested a review from a team August 28, 2024 16:23
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Aug 28, 2024
@Override
public Map<String, Processor.Factory> getProcessors(Processor.Parameters parameters) {
return Map.ofEntries(
entry(ExampleRepeatProcessor.TYPE, new ExampleRepeatProcessor.Factory())
Contributor

I suspect you'd be happier with just a Map.of here. It's slightly simpler.
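For a single processor, both forms register the same mapping. Map.of takes key/value pairs directly (with overloads for up to 10 pairs), while Map.ofEntries needs an entry(...) wrapper per mapping but has no fixed limit. The sketch below uses a hypothetical Factory interface in place of Processor.Factory so that it compiles without Elasticsearch on the classpath.

```java
import java.util.Map;

import static java.util.Map.entry;

class MapOfSketch {
    // Hypothetical stand-in for Processor.Factory.
    interface Factory {}

    static final class RepeatFactory implements Factory {}

    static final String TYPE = "repeat";

    // Form used in the diff: one entry(...) wrapper per mapping.
    static Map<String, Factory> withOfEntries() {
        return Map.ofEntries(entry(TYPE, new RepeatFactory()));
    }

    // Form suggested in review: key/value pairs passed directly.
    static Map<String, Factory> withOf() {
        return Map.of(TYPE, new RepeatFactory());
    }

    public static void main(String[] args) {
        System.out.println(withOf().keySet());        // [repeat]
        System.out.println(withOfEntries().keySet()); // [repeat]
    }
}
```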

@samxbr samxbr added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Aug 28, 2024
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team and removed needs:triage Requires assignment of a team area label labels Aug 28, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @samxbr, I've created a changelog YAML for you.

Member

@jbaiera jbaiera left a comment

Had a quick idea on how we could demonstrate a little more about creating processors.

*/
public class ExampleRepeatProcessor extends AbstractProcessor {
public static final String TYPE = "repeat";
public static final String FILED_TO_REPEAT = "to_repeat";
Member

I think FILED_TO_REPEAT still needs to be changed to FIELD_TO_REPEAT.

Contributor Author

Oh, I missed this.

String description,
Map<String, Object> config
) {
return new ExampleRepeatProcessor(tag, description);
Member

It might be worth demonstrating how a user can pass properties into the processor here. For instance, most ES processors have you specify which field on a document to operate on. Maybe change it so that the processor definition accepts a configuration parameter called field, which is later used to identify which document field to repeat, instead of the processor expecting a single hard-coded property name.

Contributor Author

I thought about demonstrating passing in the field name, and supporting Mustache templates for the field name as well. But then I thought it might be better to keep this to the minimum required for a processor plugin, to avoid confusion.

I am thinking we keep this example simple, and add a reference link to an existing processor (like Set) that demonstrates passing in the field name and other features like Mustache.

What do you think? I am open to either way.

Member

We probably don't need to involve using mustache in the example plugin, but using some values from the config object to show how they might be organized would be helpful.
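A minimal sketch of the config-driven approach, assuming the processor definition carries a required field parameter (for example {"repeat": {"field": "message"}} inside a pipeline's processors list). It does not use Elasticsearch classes: readRequiredString is a hypothetical stand-in for the kind of lookup ConfigurationUtils.readStringProperty performs in the final version of this PR, and FIELD_KEY_NAME and RepeatProcessor are hypothetical names. The helper removes the key from the map, so anything left over afterwards is a key the processor did not consume.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigSketch {
    // Hypothetical constant for the config key, avoiding a magic string.
    static final String FIELD_KEY_NAME = "field";

    // Hypothetical minimal processor that remembers which document field to repeat.
    record RepeatProcessor(String tag, String description, String field) {}

    // Hypothetical helper: removes a required String property and fails if it is absent.
    static String readRequiredString(Map<String, Object> config, String key) {
        Object value = config.remove(key);
        if (value == null) {
            throw new IllegalArgumentException("[" + key + "] required property is missing");
        }
        if (value instanceof String s) {
            return s;
        }
        throw new IllegalArgumentException("[" + key + "] property isn't a string");
    }

    // Factory-style creation: the field name comes from the processor definition.
    static RepeatProcessor create(String tag, String description, Map<String, Object> config) {
        String field = readRequiredString(config, FIELD_KEY_NAME);
        return new RepeatProcessor(tag, description, field);
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>(Map.of(FIELD_KEY_NAME, "message"));
        RepeatProcessor p = create("my-tag", null, config);
        System.out.println(p.field());        // message
        System.out.println(config.isEmpty()); // true: the key was consumed
    }
}
```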

@samxbr
Contributor Author

samxbr commented Sep 4, 2024

@elasticmachine test this please

@samxbr
Contributor Author

samxbr commented Sep 4, 2024

@elasticmachine update branch

@samxbr samxbr requested review from jbaiera and joegallo September 4, 2024 19:52
"Does not process document without field":
# create ingest pipeline with custom processor
- do:
ingest.put_pipeline:
Contributor

I'd recommend putting this in a setup: block so that you don't have to repeat yourself. Also you should probably remove it in a teardown:.

Contributor

230_change_target_index.yml has an example of those two blocks in action.

Member

@jbaiera jbaiera left a comment

Looks good! I left one last comment; this should be good to merge once all comments are resolved.

String description,
Map<String, Object> config
) {
String field = ConfigurationUtils.readStringProperty(TYPE, tag, config, "field");
Member

I would move "field" up to the top of the class into a public static final String just to avoid magic strings in the code.

/**
* {@link ExampleProcessorClientYamlTestSuiteIT} executes the plugin's REST API integration tests.
* <p>
* The tests can be executed using the command: ./gradlew :example-plugins:processor:yamlRestTest
Contributor

Is this comment actually true?

Contributor Author

Nice catch, this was carelessly copied from other examples. It's probably worth fixing this in the other examples as well; I will double-check them and fix them if necessary in another PR. I want to keep this PR self-contained.

@samxbr
Contributor Author

samxbr commented Sep 5, 2024

@elasticmachine update branch

@samxbr samxbr requested a review from joegallo September 6, 2024 14:26
@samxbr samxbr merged commit 7cd6de7 into elastic:main Sep 6, 2024
15 checks passed
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an example plugin that includes a custom ingest processor
5 participants