@leontyevdv leontyevdv commented Sep 2, 2025

Add a new ES|QL function that checks for the presence of a field in the output result. Presence means that the input expression yields any non-null value.

Part of #131069
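
As an editorial illustration of the semantics described above, here is a minimal sketch that models a column as a Java list and treats "present" as "at least one non-null value". The class and method names are hypothetical and this is not the Elasticsearch implementation; returning `false` for an empty input is an assumption that follows from the "any non-null value" wording.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustrative model of the PRESENT semantics; names are hypothetical. */
final class PresentSemanticsSketch {

    /** Returns true if at least one value in the column is non-null. */
    static boolean present(List<?> columnValues) {
        return columnValues.stream().anyMatch(Objects::nonNull);
    }

    public static void main(String[] args) {
        System.out.println(present(Arrays.asList(null, 42, null))); // true
        System.out.println(present(Arrays.asList(null, null)));     // false
        System.out.println(present(List.of()));                     // false (assumed for empty input)
    }
}
```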

@leontyevdv leontyevdv requested a review from dnhatn September 2, 2025 13:47
@leontyevdv leontyevdv self-assigned this Sep 2, 2025
elasticsearchmachine and others added 3 commits September 2, 2025 13:53
@github-actions
Contributor

github-actions bot commented Sep 2, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

Clean-up of the PRESENT function.

Part of elastic#131069
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
- Change intermediate state for using boolean
- Add unit tests for PresentAggregatorFunctionTests and
PresentGroupingAggregatorFunctionTests

Part of elastic#131069
- Add union_types csv tests

Part of elastic#131069
@leontyevdv leontyevdv marked this pull request as ready for review September 3, 2025 17:56
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Sep 3, 2025
@leontyevdv leontyevdv added :Analytics/ES|QL AKA ESQL >enhancement :StorageEngine/TSDB You know, for Metrics :StorageEngine/ES|QL Timeseries / metrics / logsdb capabilities in ES|QL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Sep 4, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@leontyevdv leontyevdv requested a review from dnhatn September 4, 2025 10:06
leontyevdv and others added 3 commits September 4, 2025 13:50
- Comment out TestLogging on CsvTests
- Add missing DataTypes to the function

Part of elastic#131069
Member

@dnhatn dnhatn left a comment

I've left some comments around the seen block and the state. The PR looks great. Thanks @leontyevdv

Note that we can rewrite PRESENT as COUNT, but I'm okay with implementing the aggregator for Present as in this PR. The Present aggregator should be lighter than COUNT, but COUNT can be pushed down to Lucene. Present can be pushed down too, though I don't think it's used enough to consider it.
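
To make the "rewrite PRESENT as COUNT" remark concrete, here is a small sketch under the assumption that COUNT counts non-null values, so presence is equivalent to `count > 0`. The names are hypothetical; the point is only that the direct form needs a single boolean of state where the COUNT form carries a long.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustrative comparison only; not the ES|QL planner or aggregator code. */
final class PresentAsCountSketch {

    /** Models COUNT(field): the number of non-null values. */
    static long count(List<?> values) {
        return values.stream().filter(Objects::nonNull).count();
    }

    /** PRESENT expressed through COUNT. */
    static boolean presentViaCount(List<?> values) {
        return count(values) > 0;
    }

    /** PRESENT with its own state: one boolean, set once a non-null value is seen. */
    static boolean presentDirect(List<?> values) {
        boolean present = false;
        for (Object v : values) {
            if (v != null) {
                present = true;
            }
        }
        return present;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(null, "a", "b");
        System.out.println(presentViaCount(values) == presentDirect(values)); // true
    }
}
```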


private static final List<IntermediateStateDesc> INTERMEDIATE_STATE_DESC = List.of(
    new IntermediateStateDesc("present", ElementType.BOOLEAN),
    new IntermediateStateDesc("seen", ElementType.BOOLEAN)
Member

I think we don't need seen which tracks groups without values.

Contributor Author

I removed it, thanks!

    new IntermediateStateDesc("seen", ElementType.BOOLEAN)
);

private final BooleanArrayState state;
Member

If we remove seen, we can use BitArray to track whether a group has a value.

Contributor Author

Done! BitArray is now being used in this class.
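
The idea of tracking one bit per group can be sketched with the JDK's `java.util.BitSet` as a stand-in. This is an assumption for illustration: the PR uses Elasticsearch's own `BitArray`, whose API is not shown in this conversation, and the class below is hypothetical.

```java
import java.util.BitSet;

/** Illustrative per-group presence tracking; BitSet stands in for BitArray. */
final class GroupPresenceSketch {

    // One bit per group id; a set bit means the group has seen a non-null value.
    private final BitSet present = new BitSet();

    /** Records a value for a group; nulls leave the group's bit untouched. */
    void add(int groupId, Object value) {
        if (value != null) {
            present.set(groupId);
        }
    }

    boolean isPresent(int groupId) {
        return present.get(groupId);
    }

    public static void main(String[] args) {
        GroupPresenceSketch state = new GroupPresenceSketch();
        state.add(0, null);
        state.add(3, "value");
        System.out.println(state.isPresent(0)); // false
        System.out.println(state.isPresent(3)); // true
    }
}
```

With a single bit per group there is no separate "seen" flag to maintain: a group that never received a non-null value simply reads as `false`.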

public AddInput prepareProcessRawInputPage(SeenGroupIds seenGroupIds, Page page) {
    Block valuesBlock = page.getBlock(blockIndex());

    if (valuesBlock.mayHaveNulls()) {
Member

Remove this tracking once we remove seen

Contributor Author

Done!

        continue;
    }
    int groupId = groups.getInt(groupPosition);
    state.set(groupId, state.getOrDefault(groupId) || values.getValueCount(position) > 0);
Member

We don't need to check getValueCount after checking isNull (line 88).

Contributor Author

Thanks, I cleaned this up in all places within the class.

int groupEnd = groupStart + groups.getValueCount(groupPosition);
for (int g = groupStart; g < groupEnd; g++) {
    int groupId = groups.getInt(g);
    state.set(groupId, state.getOrDefault(groupId) || present.getBoolean(groupPosition + positionOffset));
Member

Can we avoid reading the value from state?

if (present.getBoolean(groupPosition + positionOffset)) {
    state.set(groupId, true);
}

Contributor Author

Good point, thank you! Done!
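
The suggestion works because the state only ever moves from `false` to `true`, so OR-ing with the old value and writing `true` conditionally give the same result. A minimal sketch, using plain arrays in place of the real state and block types (all names here are hypothetical):

```java
import java.util.Arrays;

/** Illustrative merge of intermediate "present" flags into per-group state. */
final class MergeIntermediateSketch {

    /** Original form: reads the current state, ORs it, and writes it back. */
    static void mergeReadModifyWrite(boolean[] state, int[] groupIds, boolean[] incoming) {
        for (int i = 0; i < groupIds.length; i++) {
            int groupId = groupIds[i];
            state[groupId] = state[groupId] || incoming[i];
        }
    }

    /** Suggested form: writes only when the incoming flag is true, never reads the state. */
    static void mergeWriteOnly(boolean[] state, int[] groupIds, boolean[] incoming) {
        for (int i = 0; i < groupIds.length; i++) {
            if (incoming[i]) {
                state[groupIds[i]] = true;
            }
        }
    }

    public static void main(String[] args) {
        int[] groupIds = {0, 2, 2, 1};
        boolean[] incoming = {false, true, false, false};
        boolean[] a = new boolean[3];
        boolean[] b = new boolean[3];
        mergeReadModifyWrite(a, groupIds, incoming);
        mergeWriteOnly(b, groupIds, incoming);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```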

    return INTERMEDIATE_STATE_DESC;
}

private final BooleanState state;
Member

Can we remove the seen block and use a boolean to track present instead?

Contributor Author

This is done. I now use an AtomicBoolean, which keeps the field count unchanged and lets me update the value from the methods.

Member

Can we avoid using AtomicBoolean since this will only be accessed by a single thread?

Contributor Author

Got it! Removed. Thank you!
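
For reference, a non-grouping "present" state that is only touched by one thread can be a plain mutable field. The sketch below is a hypothetical illustration of that point, not the class from the PR.

```java
/** Illustrative single-threaded state; no atomics or volatile are needed. */
final class SingleThreadedPresentState {

    // Plain boolean: safe because only one thread reads and writes it (assumed, per the review).
    private boolean present;

    /** Marks the state as present when a non-null value arrives. */
    void add(Object value) {
        if (value != null) {
            present = true;
        }
    }

    boolean present() {
        return present;
    }

    public static void main(String[] args) {
        SingleThreadedPresentState state = new SingleThreadedPresentState();
        state.add(null);
        System.out.println(state.present()); // false
        state.add(1L);
        System.out.println(state.present()); // true
    }
}
```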

@leontyevdv leontyevdv requested a review from dnhatn September 5, 2025 11:46
Optimize AggregatorFunction

Part of elastic#131069
Member

@dnhatn dnhatn left a comment

I've left some nits, but this looks great. Thanks Dima!

assert page.getBlockCount() >= blockIndex() + intermediateStateDesc().size();
BooleanVector present = page.<BooleanBlock>getBlock(channels.get(0)).asVector();
for (int groupPosition = 0; groupPosition < groups.getPositionCount(); groupPosition++) {
    if (groups.isNull(groupPosition)) {
Member

nit: can we read the value from the values block once per position?

if (groups.isNull(groupPosition) || present.getBoolean(groupPosition + positionOffset) == false) {
    continue;
}

Contributor Author

Improved! Thank you!

try (var valuesBuilder = driverContext.blockFactory().newBooleanBlockBuilder(selected.getPositionCount())) {
    for (int i = 0; i < selected.getPositionCount(); i++) {
        int group = selected.getInt(i);
        if (group < state.size()) {
Member

nit: We don't need to check the size; if it's out of range, BitArray will return false.

Contributor Author

Fixed!
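
The same "no size check needed" property can be demonstrated with `java.util.BitSet`, which returns `false` for any non-negative index that was never set. That `BitArray` behaves this way is the reviewer's statement; `BitSet` is only a stand-in here, and the method below is a hypothetical sketch of evaluating the selected groups.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Illustrative evaluation of selected groups without a bounds check. */
final class SelectedGroupsSketch {

    /** Reads one flag per selected group id; ids beyond the tracked range read as false. */
    static boolean[] evaluate(BitSet present, int[] selected) {
        boolean[] result = new boolean[selected.length];
        for (int i = 0; i < selected.length; i++) {
            result[i] = present.get(selected[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet present = new BitSet();
        present.set(1);
        // Group 1000 was never tracked, so it reads as false rather than throwing.
        System.out.println(Arrays.toString(evaluate(present, new int[] {0, 1, 1000})));
        // prints [false, true, false]
    }
}
```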


@Override
public void evaluateIntermediate(Block[] blocks, int offset, IntVector selected) {
    try (var valuesBuilder = driverContext.blockFactory().newBooleanBlockBuilder(selected.getPositionCount())) {
Member

nit: can we use newBooleanVectorFixedBuilder instead?

Contributor Author

Done!

@leontyevdv leontyevdv added the test-release Trigger CI checks against release build label Sep 8, 2025
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/AggregateWritables.java
@leontyevdv leontyevdv merged commit cce52dd into elastic:main Sep 9, 2025
33 of 36 checks passed
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
* ES|QL: Add PRESENT ES|QL function

Add a new ES|QL function that checks for the presence of a field in the
output result. Presence means that the input expression yields any
non-null value.

Part of elastic#131069

* [CI] Auto commit changes from spotless

* ES|QL: Add PRESENT ES|QL function

Add unit tests and documentation for the PRESENT function.

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

Clean-up of the PRESENT function.

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

- Change intermediate state for using boolean
- Add unit tests for PresentAggregatorFunctionTests and
PresentGroupingAggregatorFunctionTests

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

- Add VerifierTests

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

- Add union_types csv tests

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

- Fix unit tests

Part of elastic#131069

* [CI] Auto commit changes from spotless

* ES|QL: Add PRESENT_OVER_TIME ES|QL function

- Comment out TestLogging on CsvTests
- Add missing DataTypes to the function

Part of elastic#131069

* [CI] Auto commit changes from spotless

* ES|QL: Add PRESENT_OVER_TIME ES|QL function

- Improve documentation

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

- Optimize AggregatorFunctions

Part of elastic#131069

* [CI] Auto commit changes from spotless

* ES|QL: Add PRESENT ES|QL function

- Fix Rest Tests

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

Optimize AggregatorFunction

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

Optimize PresentGroupingAggregatorFunction

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

Add PresentErrorTests

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

Add docs

Part of elastic#131069

* ES|QL: Add PRESENT ES|QL function

Add docs

Part of elastic#131069

---------

Co-authored-by: elasticsearchmachine
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Sep 9, 2025

Labels

- :Analytics/ES|QL (AKA ESQL)
- >enhancement
- :StorageEngine/ES|QL (Timeseries / metrics / logsdb capabilities in ES|QL)
- :StorageEngine/TSDB (You know, for Metrics)
- Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
- Team:StorageEngine
- test-release (Trigger CI checks against release build)
- v9.2.0


3 participants