-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ES|QL: Add PRESENT ES|QL function #133986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: Add PRESENT ES|QL function #133986
Conversation
Add a new ES|QL function that checks for the presence of a field in the output result. Presence means that the input expression yields any non-null value. Part of elastic#131069
Add unit tests and documentation for the PRESENT function. Part of elastic#131069
…nto feature/esql-present-function
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
Clean-up of the PRESENT function. Part of elastic#131069
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
...e/src/main/java/org/elasticsearch/compute/aggregation/PresentGroupingAggregatorFunction.java
Outdated
Show resolved
Hide resolved
- Change intermediate state for using boolean - Add unit tests for PresentAggregatorFunctionTests and PresentGroupingAggregatorFunctionTests Part of elastic#131069
- Add VerifierTests Part of elastic#131069
- Add union_types csv tests Part of elastic#131069
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
- Fix unit tests Part of elastic#131069
- Comment out TestLogging on CsvTests - Add missing DataTypes to the function Part of elastic#131069
- Improve documentation Part of elastic#131069
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments around the seen block and the state. The PR looks great. Thanks @leontyevdv
Note that we can rewrite PRESENT as COUNT, but I'm okay with implementing the aggregator for Present as in this PR. The Present aggregator should be lighter than COUNT, but COUNT can be pushed down to Lucene. Present can be pushed down too, though I don't think it's used enough to consider it.
|
||
private static final List<IntermediateStateDesc> INTERMEDIATE_STATE_DESC = List.of( | ||
new IntermediateStateDesc("present", ElementType.BOOLEAN), | ||
new IntermediateStateDesc("seen", ElementType.BOOLEAN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need seen
which tracks groups without values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it, thanks!
new IntermediateStateDesc("seen", ElementType.BOOLEAN) | ||
); | ||
|
||
private final BooleanArrayState state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove seen
, we can use BitArray to track whether a group has a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! BitArray is now being used in this class.
public AddInput prepareProcessRawInputPage(SeenGroupIds seenGroupIds, Page page) { | ||
Block valuesBlock = page.getBlock(blockIndex()); | ||
|
||
if (valuesBlock.mayHaveNulls()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this tracking once we remove seen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
continue; | ||
} | ||
int groupId = groups.getInt(groupPosition); | ||
state.set(groupId, state.getOrDefault(groupId) || values.getValueCount(position) > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't to check getValueCount
after checking isNull
(line 88).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I cleaned this up in all places within the class.
...e/src/main/java/org/elasticsearch/compute/aggregation/PresentGroupingAggregatorFunction.java
Outdated
Show resolved
Hide resolved
int groupEnd = groupStart + groups.getValueCount(groupPosition); | ||
for (int g = groupStart; g < groupEnd; g++) { | ||
int groupId = groups.getInt(g); | ||
state.set(groupId, state.getOrDefault(groupId) || present.getBoolean(groupPosition + positionOffset)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid reading the value from state?
if(present.getBoolean(groupPosition + positionOffset)){
state.set(groupId, true);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thank you! Done!
return INTERMEDIATE_STATE_DESC; | ||
} | ||
|
||
private final BooleanState state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the seen
block and use a boolean to track present instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done. I use AtomicBoolean now to preserve the field count and have an ability to update it from the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid using AtomicBoolean since this will only be accessed by a single thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Removed. Thank you!
- Optimize AggregatorFunctions Part of elastic#131069
- Fix Rest Tests Part of elastic#131069
Optimize AggregatorFunction Part of elastic#131069
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some nits, but this looks great. Thanks Dima!
assert page.getBlockCount() >= blockIndex() + intermediateStateDesc().size(); | ||
BooleanVector present = page.<BooleanBlock>getBlock(channels.get(0)).asVector(); | ||
for (int groupPosition = 0; groupPosition < groups.getPositionCount(); groupPosition++) { | ||
if (groups.isNull(groupPosition)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we read the value from the values block once each position?
if (groups.isNull(groupPosition) || present.getBoolean(groupPosition + positionOffset) == false) {
continue;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved! Thank you!
...e/src/main/java/org/elasticsearch/compute/aggregation/PresentGroupingAggregatorFunction.java
Outdated
Show resolved
Hide resolved
try (var valuesBuilder = driverContext.blockFactory().newBooleanBlockBuilder(selected.getPositionCount())) { | ||
for (int i = 0; i < selected.getPositionCount(); i++) { | ||
int group = selected.getInt(i); | ||
if (group < state.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We don't need to check the size; if it's out of range, BitArray will return false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
|
||
@Override | ||
public void evaluateIntermediate(Block[] blocks, int offset, IntVector selected) { | ||
try (var valuesBuilder = driverContext.blockFactory().newBooleanBlockBuilder(selected.getPositionCount())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use newBooleanVectorFixedBuilder instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Optimize PresentGroupingAggregatorFunction Part of elastic#131069
Add PresentErrorTests Part of elastic#131069
Add docs Part of elastic#131069
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/AggregateWritables.java
Add docs Part of elastic#131069
* ES|QL: Add PRESENT ES|QL function Add a new ES|QL function that checks for the presence of a field in the output result. Presence means that the input expression yields any non-null value. Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT ES|QL function Add unit tests and documentation for the PRESENT function. Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Clean-up of the PRESENT function. Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Change intermediate state for using boolean - Add unit tests for PresentAggregatorFunctionTests and PresentGroupingAggregatorFunctionTests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Add VerifierTests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Add union_types csv tests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Fix unit tests Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT_OVER_TIME ES|QL function - Comment out TestLogging on CsvTests - Add missing DataTypes to the function Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT_OVER_TIME ES|QL function - Improve documentation Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Optimize AggregatorFunctions Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT ES|QL function - Fix Rest Tests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Optimize AggregatorFunction Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Optimize PresentGroupingAggregatorFunction Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Add PresentErrorTests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Add docs Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Add docs Part of elastic#131069 --------- Co-authored-by: elasticsearchmachine <[email protected]>
* ES|QL: Add PRESENT ES|QL function Add a new ES|QL function that checks for the presence of a field in the output result. Presence means that the input expression yields any non-null value. Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT ES|QL function Add unit tests and documentation for the PRESENT function. Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Clean-up of the PRESENT function. Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Change intermediate state for using boolean - Add unit tests for PresentAggregatorFunctionTests and PresentGroupingAggregatorFunctionTests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Add VerifierTests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Add union_types csv tests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Fix unit tests Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT_OVER_TIME ES|QL function - Comment out TestLogging on CsvTests - Add missing DataTypes to the function Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT_OVER_TIME ES|QL function - Improve documentation Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function - Optimize AggregatorFunctions Part of elastic#131069 * [CI] Auto commit changes from spotless * ES|QL: Add PRESENT ES|QL function - Fix Rest Tests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Optimize AggregatorFunction Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Optimize PresentGroupingAggregatorFunction Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Add PresentErrorTests Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Add docs Part of elastic#131069 * ES|QL: Add PRESENT ES|QL function Add docs Part of elastic#131069 --------- Co-authored-by: elasticsearchmachine <[email protected]>
Add a new ES|QL function that checks for the presence of a field in the output result. Presence means that the input expression yields any non-null value.
Part of #131069