-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ES|QL categorize with multiple groupings #118173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/ml-core (Team:ML) |
|
Hi @jan-elastic, I've created a changelog YAML for you. |
099e143 to
7409695
Compare
7409695 to
35e9811
Compare
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Outdated
Show resolved
Hide resolved
|
Hi @jan-elastic, I've created a changelog YAML for you. |
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/Page.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Outdated
Show resolved
Hide resolved
ivancea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 👀
Some nits and new tests in comments; the impl looks quite solid to me
...in/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/BlockHash.java
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec
Show resolved
Hide resolved
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Outdated
Show resolved
Hide resolved
| if (id == 0) { | ||
| builder.appendNull(); | ||
| } else { | ||
| builder.appendBytesRef(regexes.getBytesRef(id + idsOffset, scratch)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now: We're repeating, potentially, a lot of bytesref values here. I wonder if there is or it would make sense to have a BytesRefBlock that instead of all the bytesrefs, stores every value just once, and then a reference per position:
AAAAAA
BBBBBBB
AAAAAA
AAAAAA
->
// 1: AAAAAA
// 2: BBBBBBB
1
2
1
1
@nik9000 Something to consider for later? Maybe it's too specific for this. And anyway, the next EVAL or whatever will duplicate the value again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a nice thing to have, but definitely out of scope for this PR.
However, the next EVAL should not duplicate the value again.
If you have:
// 1: AAAAAA
// 2: BBBBBBB
1
2
1
1
then an efficient EVAL x=SUBSTRING(x, 1, 2) should give
// 1: AA
// 2: BB
1
2
1
1
without ever duplicating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For that SUBSTRING to not duplicate, we would need to add that "hashtable" strategy in the BytesRefBlockBuilder. It looks goo (?), but I wonder if using that by default could perform negatively in some scenarios. Something to try eventually probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like worth trying in the future. Are you making a note (issue) of this, so that the idea doesn't get lost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I'll comment it with Nik, just in case it was considered and discarded already, and then I'll document it in an issue somewhere
ivancea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya, this generally looks very good, thank you @jan-elastic !
I only have one observation about potential untracked memory that I think we should ponder at least a bit to ensure we're safe; see below.
...in/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/BlockHash.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec
Show resolved
Hide resolved
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Show resolved
Hide resolved
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Show resolved
Hide resolved
alex-spies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wait with merging until we had a chance to look at this subtle point.
...va/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHashTests.java
Show resolved
Hide resolved
|
@alex-spies Thanks for the review! Processed all your comments, except the OOM one, which needs more discussion. PTAL |
alex-spies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jan-elastic !
Let's take care of the memory accounting in a follow-up.
Maybe we want to add a couple more block hash test cases, but otherwise this LGTM and should be unblocked :)
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Outdated
Show resolved
Hide resolved
...va/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHashTests.java
Show resolved
Hide resolved
…egorize-multiple-groupings
...in/java/org/elasticsearch/compute/aggregation/blockhash/CategorizePackedValuesBlockHash.java
Show resolved
Hide resolved
…egorize-multiple-groupings
…egorize-multiple-groupings
💔 Backport failed
You can use sqren/backport to manually backport by running |
* ES|QL categorize with multiple groupings. * Fix VerifierTests * Close stuff when constructing CategorizePackedValuesBlockHash fails * CategorizePackedValuesBlockHashTests * Improve categorize javadocs * Update docs/changelog/118173.yaml * Create CategorizePackedValuesBlockHash's deletegate page differently * Double check in BlockHash builder for single categorize * Reuse blocks array * More CSV tests * Remove assumeTrue categorize_v5 * Rename test * Two more verifier tests * more CSV tests * Add JavaDocs/comments * spotless * Refactor/unify recategorize * Better memory accounting * fix csv test * randomize CategorizePackedValuesBlockHashTests * Add TODO
* ES|QL categorize with multiple groupings. * Fix VerifierTests * Close stuff when constructing CategorizePackedValuesBlockHash fails * CategorizePackedValuesBlockHashTests * Improve categorize javadocs * Update docs/changelog/118173.yaml * Create CategorizePackedValuesBlockHash's deletegate page differently * Double check in BlockHash builder for single categorize * Reuse blocks array * More CSV tests * Remove assumeTrue categorize_v5 * Rename test * Two more verifier tests * more CSV tests * Add JavaDocs/comments * spotless * Refactor/unify recategorize * Better memory accounting * fix csv test * randomize CategorizePackedValuesBlockHashTests * Add TODO Co-authored-by: Elastic Machine <[email protected]>
…8590) * ES|QL categorize with multiple groupings. * Fix VerifierTests * Close stuff when constructing CategorizePackedValuesBlockHash fails * CategorizePackedValuesBlockHashTests * Improve categorize javadocs * Update docs/changelog/118173.yaml * Create CategorizePackedValuesBlockHash's deletegate page differently * Double check in BlockHash builder for single categorize * Reuse blocks array * More CSV tests * Remove assumeTrue categorize_v5 * Rename test * Two more verifier tests * more CSV tests * Add JavaDocs/comments * spotless * Refactor/unify recategorize * Better memory accounting * fix csv test * randomize CategorizePackedValuesBlockHashTests * Add TODO Co-authored-by: Elastic Machine <[email protected]>
…8590) * ES|QL categorize with multiple groupings. * Fix VerifierTests * Close stuff when constructing CategorizePackedValuesBlockHash fails * CategorizePackedValuesBlockHashTests * Improve categorize javadocs * Update docs/changelog/118173.yaml * Create CategorizePackedValuesBlockHash's deletegate page differently * Double check in BlockHash builder for single categorize * Reuse blocks array * More CSV tests * Remove assumeTrue categorize_v5 * Rename test * Two more verifier tests * more CSV tests * Add JavaDocs/comments * spotless * Refactor/unify recategorize * Better memory accounting * fix csv test * randomize CategorizePackedValuesBlockHashTests * Add TODO Co-authored-by: Elastic Machine <[email protected]>
No description provided.