@jan-elastic
Contributor

@jan-elastic jan-elastic commented Jul 11, 2025

The categorize_text aggregation has several configuration options. ES|QL CATEGORIZE has none.

This PR adds them, using a syntax comparable to the options of the ES|QL MATCH function.

The exposed options are:

  • analyzer
  • similarity threshold
  • output format (regex (default) or space-separated tokens)

Furthermore, the options handling of MATCH is refactored so that it can be reused.
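The description lists the new options and mentions a reusable options mechanism. The following is a minimal Java sketch of that idea, not the code in this PR. It shows a shared, function-agnostic validator for key names and value types, followed by CATEGORIZE-specific checks.

The sketch rests on several assumptions. Only the `analyzer` key appears in this thread. The `similarity_threshold` and `output_format` key names are assumed, and so are the `regex`/`tokens` values. The 1 to 100 range with a default of 70 is borrowed from the categorize_text aggregation. All class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch, not the Elasticsearch implementation: a reusable option
// validator that several ES|QL functions could share, plus a CATEGORIZE-style
// settings record built on top of it.
final class CategorizeOptionsSketch {

    private CategorizeOptionsSketch() {}

    /** Function-agnostic check: every key must be allowed and its value must have the expected type. */
    static void validateOptions(String functionName, Map<String, Object> supplied, Map<String, Class<?>> allowed) {
        for (Map.Entry<String, Object> entry : supplied.entrySet()) {
            Class<?> expected = allowed.get(entry.getKey());
            if (expected == null) {
                throw new IllegalArgumentException("Invalid option [" + entry.getKey() + "] in [" + functionName
                    + "], expected one of " + new TreeSet<>(allowed.keySet()));
            }
            if (!expected.isInstance(entry.getValue())) {
                throw new IllegalArgumentException("Option [" + entry.getKey() + "] in [" + functionName
                    + "] must be of type " + expected.getSimpleName());
            }
        }
    }

    /** Assumed output formats: a regex per category (default) or space-separated tokens. */
    enum OutputFormat { REGEX, TOKENS }

    /** Hypothetical parsed settings; in this sketch a null analyzer means "use the default". */
    record CategorizeSettings(String analyzer, int similarityThreshold, OutputFormat outputFormat) {}

    // Assumed option names and types; only "analyzer" appears in the PR thread.
    static final Map<String, Class<?>> CATEGORIZE_OPTIONS = Map.of(
        "analyzer", String.class,
        "similarity_threshold", Integer.class,
        "output_format", String.class);

    static CategorizeSettings parseCategorizeOptions(Map<String, Object> supplied) {
        // Reuse the shared validator for key names and value types.
        validateOptions("CATEGORIZE", supplied, CATEGORIZE_OPTIONS);

        // Function-specific range check; 1..100 with default 70 mirrors the categorize_text aggregation.
        int threshold = (Integer) supplied.getOrDefault("similarity_threshold", 70);
        if (threshold < 1 || threshold > 100) {
            throw new IllegalArgumentException("similarity_threshold must be between 1 and 100, got " + threshold);
        }

        // Map the assumed string values onto the enum, defaulting to regex output.
        String format = (String) supplied.getOrDefault("output_format", "regex");
        OutputFormat outputFormat;
        try {
            outputFormat = OutputFormat.valueOf(format.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("output_format must be [regex] or [tokens], got [" + format + "]", e);
        }

        return new CategorizeSettings((String) supplied.get("analyzer"), threshold, outputFormat);
    }
}
```

Here is how the sketch behaves. Passing `Map.of("similarity_threshold", 50, "output_format", "tokens")` yields a threshold of 50, token output and the default analyzer. An unknown key such as `"foo"` is rejected by the shared validator before any CATEGORIZE-specific logic runs. Keeping the key and type check generic is what would let another function plug in its own option table.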

@jan-elastic jan-elastic added >feature :ml Machine learning Team:ML Meta label for the ML team v9.2.0 v8.20.0 labels Jul 11, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @jan-elastic, I've created a changelog YAML for you.

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from 0faee91 to b308397 July 11, 2025 14:43
@jan-elastic jan-elastic marked this pull request as draft July 11, 2025 14:44
@jan-elastic jan-elastic force-pushed the esql-categorize-options branch 4 times, most recently from ddf2f1f to 5ed9dfa July 14, 2025 14:31
@jan-elastic jan-elastic marked this pull request as ready for review July 14, 2025 18:01
@elasticsearchmachine
Collaborator

Hi @jan-elastic, I've created a changelog YAML for you.

@jan-elastic jan-elastic requested review from alex-spies and removed request for alex-spies July 15, 2025 07:04
Contributor

@ivancea ivancea left a comment


It looks good!

```java
        }
    }
    return new CategorizeDef(
        (String) optionsMap.get("analyzer"),
```
Contributor


Can we validate the analyzer here?

Contributor Author


Not that I know of. You'd need the AnalysisRegistry here.

During execution, the registry comes from the SearchService via the EsPhysicalOperationProviders. I don't see how to obtain something similar at this stage.

Contributor


OK! Since it's now checked at planning time, no problem then.
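The thread settles on checking the analyzer at planning time rather than when the options map is parsed. The following is a minimal Java sketch of that two-step split, not the PR's code. `ParsedCategorize` and `verifyAnalyzers` are hypothetical names. A plain `Predicate<String>` stands in for whatever analyzer lookup the later phase has access to; it does not model the real `AnalysisRegistry` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: the analyzer name cannot be resolved when the options are
// parsed, so it is recorded then and checked later, once a phase has an analyzer
// lookup. Predicate<String> stands in for that lookup; it is NOT the
// Elasticsearch AnalysisRegistry API.
final class DeferredAnalyzerCheck {

    private DeferredAnalyzerCheck() {}

    /** Parse-time result: the analyzer name is kept as a string but not yet resolved. */
    record ParsedCategorize(String field, String analyzer) {}

    /** Later phase: checks every recorded analyzer and collects all errors instead of failing fast. */
    static List<String> verifyAnalyzers(List<ParsedCategorize> calls, Predicate<String> analyzerExists) {
        List<String> errors = new ArrayList<>();
        for (ParsedCategorize call : calls) {
            // In this sketch a null analyzer means "use the default", so there is nothing to check.
            if (call.analyzer() != null && !analyzerExists.test(call.analyzer())) {
                errors.add("CATEGORIZE(" + call.field() + "): unknown analyzer [" + call.analyzer() + "]");
            }
        }
        return errors;
    }
}
```

For instance, passing `Set.of("standard", "whitespace")::contains` as the lookup reports only the call that names an unregistered analyzer, and calls without an analyzer are skipped. Collecting all errors rather than throwing on the first one is a choice made for this sketch, not necessarily what the PR does.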

@ghudgins ghudgins removed the v8.20.0 label Jul 15, 2025
@ghudgins

@jan-elastic FYI there's no v8.20.0 as of now. I removed your label

@jan-elastic
Contributor Author

@ghudgins does that mean we're not backporting new functionality anymore? Just bugfixes to 8.19.x?

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from a154ede to 5300744 July 16, 2025 07:39
@jan-elastic jan-elastic requested a review from ivancea July 16, 2025 07:46
@jan-elastic
Contributor Author

@ivancea Thanks for the thorough review. Fixed all your comments. PTAL

Contributor

@bpintea bpintea left a comment


LGTM

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from 5300744 to 6248657 July 16, 2025 14:34
@jan-elastic
Contributor Author

@bpintea Thanks for your review. Fixed all your comments

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from 6248657 to db90b33 July 17, 2025 06:41
@jan-elastic jan-elastic enabled auto-merge (squash) July 17, 2025 06:43
@jan-elastic jan-elastic merged commit ec7f77b into elastic:main Jul 17, 2025
33 checks passed
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request Jul 17, 2025
* ES|QL categorize options

* refactor options

* fix serialization

* polish

* add verifications

* better test coverage + polish code

* better test coverage + polish code

Labels

>feature :ml Machine learning Team:ML Meta label for the ML team v9.2.0

5 participants