Commit 5b18e1a

fix!: DH-20785: nan/null comparison in filter and query language (deephaven#7510)
This PR implements support for NaN/null comparison in the filter and query language, ensuring compliance with IEEE 754 standards for NaN handling in floating-point comparisons.

**Key changes:**

- Adds new `FilterIsNaN` filter type to explicitly check for NaN values
- Updates inequality comparison operators to return `false` when comparing with NaN (IEEE 754 compliance)
- Refactors `MatchFilter` to use a new `MatchOptions` configuration object instead of separate boolean flags

---------

Co-authored-by: margaretkennedy <[email protected]>
1 parent e2755da commit 5b18e1a

90 files changed (+6268 -2357 lines changed)

docs/groovy/how-to-guides/filters.md

Lines changed: 11 additions & 0 deletions
@@ -77,6 +77,11 @@ resultIn = source.where("X in 2,4,6")
 resultNotIn = source.where("X not in 2,4,6")
 ```

+> [!NOTE]
+> Match filters created using the equality operators (`=`, `==` or `!=`) follow standard IEEE 754 rules for handling `NaN` values. Any comparison involving `NaN` returns `false`, except for `!=`, which returns `true` for all values.
+>
+> In contrast, match filters created with set inclusion syntax (`in`, `not in`) _will_ match `NaN` values. For example: `value in NaN, 10.0` will return `true` if `value` is `NaN` or `10.0`. Alternatively, you can use the `isNaN(value)` function to explicitly test for NaN values such as `isNaN(value) || value < 10.0`.
+
 ### Range filters

 Range filters evaluate to true if the column value is within a specified range. This type of filter is typically applied to numeric columns but can be applied to any column that supports comparison operators.
@@ -89,6 +94,12 @@ resultRange = source.where("X >= 2 && X < 6")
 resultInRange = source.where("inRange(X, 2, 6)")
 ```

+> [!NOTE]
+> Null values are considered less than any non-null value for sorting and comparison purposes. Therefore, `<` and `<=` comparisons will always include `null`. To prevent this behavior, you can add an explicit null check; for example: `!isNull(value) && value < 10`.
+
+> [!NOTE]
+> Comparison operators on floating-point values follow standard IEEE 754 rules for handling `NaN` values. Any comparison involving `NaN` returns `false`, except for `!=`, which returns `true` for all values. To include `NaN` values in your comparisons, use the `isNaN(value)` function to explicitly test for NaN values, such as `isNaN(value) || value < 10.0`.
+
 Both `resultRange` and `resultInRange` can instead be implemented by [conjunctively](#conjunctive) combining two separate range filters:

 ```groovy order=source,resultRangeConjunctive

docs/groovy/how-to-guides/operators.md

Lines changed: 6 additions & 0 deletions
@@ -87,6 +87,12 @@ There are many operators available in the Deephaven Query Language (DQL). They a
 | `<` | Less than | Compares two values to see if the left value is less than the right value. |
 | `<=` | Less than or equal | Compares two values to see if the left value is less than or equal to the right value. |

+> [!NOTE]
+> Null values are considered less than any non-null value for sorting and comparison purposes. Therefore, `<` and `<=` comparisons will always include `null`. To prevent this behavior, you can add an explicit null check; for example: `!isNull(value) && value < 10`.
+
+> [!NOTE]
+> Comparison operators on floating-point values follow standard IEEE 754 rules for handling `NaN` values. Any comparison involving `NaN` returns `false`, except for `!=`, which returns `true` for all values. To include `NaN` values in your comparisons, you can use the set inclusion operators ("in"/"not in"). For example: `value in NaN, 10.0` will return true if `value` is `NaN` or `10.0`. Alternatively, use the `isNaN(value)` function to explicitly test for NaN values, such as `isNaN(value) || value < 10.0`.
+
 ### Assignment operators

 | Symbol | Name | Description |

docs/python/how-to-guides/filters.md

Lines changed: 11 additions & 0 deletions
@@ -89,6 +89,11 @@ result_in = source.where("X in 2,4,6")
 result_notin = source.where("X not in 2,4,6")
 ```

+> [!NOTE]
+> Match filters created using the equality operators (`=`, `==` or `!=`) follow standard IEEE 754 rules for handling `NaN` values. Any comparison involving `NaN` returns `false`, except for `!=`, which returns `true` for all values.
+>
+> In contrast, match filters created with set inclusion syntax (`in`, `not in`) _will_ match `NaN` values. For example: `value in NaN, 10.0` will return `true` if `value` is `NaN` or `10.0`. Alternatively, you can use the `isNaN(value)` function to explicitly test for NaN values such as `isNaN(value) || value < 10.0`.
+
 ### Range filters

 Range filters evaluate to true if the column value is within a specified range. This type of filter is typically applied to numeric columns but can be applied to any column that supports comparison operators.
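The NaN rules in the note above are the same IEEE 754 rules that Java's `double` comparisons follow, so they can be modeled in a few lines of plain Java. The sketch below is not Deephaven code: `NanFilterSketch` and its methods are hypothetical names that only mirror the documented behavior (equality never matches `NaN`, set inclusion does, and an explicit `isNaN` check keeps `NaN` rows in a range filter).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented filter semantics; not the Deephaven implementation.
final class NanFilterSketch {

    // Models `value == key`: IEEE 754 equality, so NaN never matches (not even NaN itself).
    static boolean equalsFilter(double value, double key) {
        return value == key;
    }

    // Models `value != key`: true whenever either side is NaN.
    static boolean notEqualsFilter(double value, double key) {
        return value != key;
    }

    // Models `value in k1, k2, ...`: a NaN key is treated as matching NaN values.
    static boolean inFilter(double value, double... keys) {
        for (double key : keys) {
            if (value == key || (Double.isNaN(value) && Double.isNaN(key))) {
                return true;
            }
        }
        return false;
    }

    // Models `isNaN(value) || value < limit`: keeps NaN rows that `<` alone would drop.
    static boolean lessThanOrNaN(double value, double limit) {
        return Double.isNaN(value) || value < limit;
    }

    public static void main(String[] args) {
        double[] column = {1.0, 10.0, Double.NaN, 42.0};
        List<Double> viaEquals = new ArrayList<>();
        List<Double> viaIn = new ArrayList<>();
        for (double v : column) {
            if (equalsFilter(v, Double.NaN) || equalsFilter(v, 10.0)) {
                viaEquals.add(v);
            }
            if (inFilter(v, Double.NaN, 10.0)) {
                viaIn.add(v);
            }
        }
        System.out.println("value == NaN || value == 10.0 keeps " + viaEquals);
        System.out.println("value in NaN, 10.0 keeps " + viaIn);
    }
}
```

Running `main` shows that only the set-inclusion form retains the `NaN` row, which is the distinction the note draws between `==` and `in`.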
@@ -103,6 +108,12 @@ result_range = source.where("X >= 2 && X < 6")
 result_inrange = source.where("inRange(X, 2, 6)")
 ```

+> [!NOTE]
+> Null values are considered less than any non-null value for sorting and comparison purposes. Therefore, `<` and `<=` comparisons will always include `null`. To prevent this behavior, you can add an explicit null check; for example: `!isNull(value) && value < 10`.
+
+> [!NOTE]
+> Comparison operators on floating-point values follow standard IEEE 754 rules for handling `NaN` values. Any comparison involving `NaN` returns `false`, except for `!=`, which returns `true` for all values. To include `NaN` values in your comparisons, use the `isNaN(value)` function to explicitly test for NaN values, such as `isNaN(value) || value < 10.0`.
+
 Both `result_range` and `result_inrange` can instead be implemented by [conjunctively](#conjunctive) combining two separate range filters:

 ```python order=source,result_range_conjunctive

docs/python/how-to-guides/operators.md

Lines changed: 6 additions & 0 deletions
@@ -87,6 +87,12 @@ There are many operators available in the Deephaven Query Language (DQL). They a
 | `<` | Less than | Compares two values to see if the left value is less than the right value. |
 | `<=` | Less than or equal | Compares two values to see if the left value is less than or equal to the right value. |

+> [!NOTE]
+> Null values are considered less than any non-null value for sorting and comparison purposes. Therefore, `<` and `<=` comparisons will always include `null`. To prevent this behavior, you can add an explicit null check; for example: `!isNull(value) && value < 10`.
+
+> [!NOTE]
+> Comparison operators on floating-point values follow standard IEEE 754 rules for handling `NaN` values. Any comparison involving `NaN` returns `false`, except for `!=`, which returns `true` for all values. To include `NaN` values in your comparisons, you can use the set inclusion operators ("in"/"not in"). For example: `value in NaN, 10.0` will return true if `value` is `NaN` or `10.0`. Alternatively, use the `isNaN(value)` function to explicitly test for NaN values, such as `isNaN(value) || value < 10.0`.
+
 ### Assignment operators

 | Symbol | Name | Description |
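The null-ordering rule in the note above ("null is less than any non-null value") can be illustrated with a nulls-first comparator. This is a sketch under assumptions, not Deephaven code: a boxed `Double` holding a Java `null` stands in for a null cell, `NullOrderingSketch` and its methods are hypothetical, and NaN handling is deliberately left out.

```java
import java.util.Comparator;

// Hypothetical model of "null sorts before every non-null value"; not Deephaven code.
final class NullOrderingSketch {

    // Nulls-first ordering over boxed values, standing in for the documented comparison rule.
    private static final Comparator<Double> NULLS_FIRST =
            Comparator.nullsFirst(Comparator.<Double>naturalOrder());

    // Models `value < limit`: a null value orders below any limit, so it passes the filter.
    static boolean lessThan(Double value, double limit) {
        return NULLS_FIRST.compare(value, limit) < 0;
    }

    // Models `!isNull(value) && value < limit`: the explicit check excludes nulls.
    static boolean lessThanNonNull(Double value, double limit) {
        return value != null && lessThan(value, limit);
    }

    public static void main(String[] args) {
        Double[] column = {3.0, null, 12.0};
        for (Double v : column) {
            System.out.println(v + ": value < 10 is " + lessThan(v, 10.0)
                    + ", !isNull(value) && value < 10 is " + lessThanNonNull(v, 10.0));
        }
    }
}
```

The null row passes the plain `<` filter but is dropped once the explicit null check is added, which is the behavior the note recommends guarding against.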

engine/api/src/main/java/io/deephaven/engine/table/ColumnSource.java

Lines changed: 4 additions & 6 deletions
@@ -42,19 +42,17 @@ default ChunkType getChunkType() {
     /**
      * Return a {@link RowSet row set} where the values in the column source match the given keys.
      *
-     * @param invertMatch Whether to invert the match, i.e. return the rows where the values do not match the given keys
      * @param usePrev Whether to use the previous values for the ColumnSource
-     * @param caseInsensitive Whether to perform a case insensitive match
-     * @param mapper Restrict results to this row set
+     * @param matchOptions How the match should be performed; whether should be inverted, case sensitive, etc.
+     * @param selection Restrict results to this row set
      * @param keys The keys to match in the column
      *
      * @return The rows that match the given keys
      */
     WritableRowSet match(
-            boolean invertMatch,
             boolean usePrev,
-            boolean caseInsensitive,
-            @NotNull RowSet mapper,
+            MatchOptions matchOptions,
+            @NotNull RowSet selection,
             Object... keys);

     /**
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+//
+// Copyright (c) 2016-2025 Deephaven Data Labs and Patent Pending
+//
+package io.deephaven.engine.table;
+
+import io.deephaven.annotations.BuildableStyle;
+import org.immutables.value.Value.Immutable;
+import org.immutables.value.Value.Default;
+
+/**
+ * An object for controlling the behavior of a {@code match} operation. Used by {@code ColumnSource#match} and when
+ * creating {@code MatchFilter}.
+ */
+@Immutable
+@BuildableStyle
+public abstract class MatchOptions {
+    public static final MatchOptions REGULAR = builder().build();
+    public static final MatchOptions INVERTED = builder().inverted(true).build();
+
+    /**
+     * Whether the match should be inverted.
+     */
+    @Default
+    public boolean inverted() {
+        return false;
+    }
+
+    /**
+     * In the case of string matching, whether the match should ignore case.
+     */
+    @Default
+    public boolean caseInsensitive() {
+        return false;
+    }
+
+    /**
+     * In the case of floating point matching, whether two NaN values are equivalent.
+     */
+    @Default
+    public boolean nanMatch() {
+        return false;
+    }
+
+    /**
+     * Return a clone of this {@link MatchOptions} with {@link #inverted()} set to the supplied value.
+     */
+    public MatchOptions withInverted(boolean inverted) {
+        return builder()
+                .caseInsensitive(caseInsensitive())
+                .nanMatch(nanMatch())
+                .inverted(inverted).build();
+    }
+
+    /**
+     * Get a new {@link Builder} for constructing {@link MatchOptions} objects.
+     */
+    public static Builder builder() {
+        return ImmutableMatchOptions.builder();
+    }
+
+    /**
+     * A class for constructing {@link MatchOptions} instances
+     */
+    public interface Builder {
+        /**
+         * Set {@link #inverted()} to the supplied value
+         */
+        Builder inverted(boolean inverted);
+
+        /**
+         * Set {@link #caseInsensitive()} to the supplied value
+         */
+        Builder caseInsensitive(boolean caseInsensitive);
+
+        /**
+         * Set {@link #nanMatch()} to the supplied value
+         */
+        Builder nanMatch(boolean nanMatch);
+
+        /**
+         * Construct a new immutable {@link MatchOptions} from this builder's state.
+         *
+         * @return a new, immutable {@link MatchOptions}
+         */
+        MatchOptions build();
+    }
+}
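To illustrate how an options object like this can replace separate boolean flags, the following self-contained sketch mirrors the three settings with a hand-written builder. The real class relies on the Immutables-generated `ImmutableMatchOptions`, which is not shown in this diff. `SketchMatchOptions` and `matches` are hypothetical, and the NaN handling in `matches` is an assumption drawn from the `nanMatch()` Javadoc, not from the engine's chunk filter code.

```java
// Hand-written stand-in for the Immutables-generated class; all names are hypothetical.
final class SketchMatchOptions {
    final boolean inverted;
    final boolean caseInsensitive; // carried along but unused by the double-only sketch below
    final boolean nanMatch;

    private SketchMatchOptions(Builder b) {
        this.inverted = b.inverted;
        this.caseInsensitive = b.caseInsensitive;
        this.nanMatch = b.nanMatch;
    }

    static Builder builder() {
        return new Builder();
    }

    // Copy with only the inverted flag changed, in the spirit of withInverted(boolean).
    SketchMatchOptions withInverted(boolean newInverted) {
        return builder()
                .caseInsensitive(caseInsensitive)
                .nanMatch(nanMatch)
                .inverted(newInverted)
                .build();
    }

    static final class Builder {
        private boolean inverted; // every flag defaults to false, as in the diff
        private boolean caseInsensitive;
        private boolean nanMatch;

        Builder inverted(boolean v) { inverted = v; return this; }
        Builder caseInsensitive(boolean v) { caseInsensitive = v; return this; }
        Builder nanMatch(boolean v) { nanMatch = v; return this; }
        SketchMatchOptions build() { return new SketchMatchOptions(this); }
    }

    // Assumed semantics: a value matches if it equals any key; NaN equals NaN only when
    // nanMatch is set; the answer is flipped when inverted is set.
    static boolean matches(SketchMatchOptions options, double value, double... keys) {
        boolean found = false;
        for (double key : keys) {
            if (value == key || (options.nanMatch && Double.isNaN(value) && Double.isNaN(key))) {
                found = true;
                break;
            }
        }
        return found != options.inverted;
    }

    public static void main(String[] args) {
        SketchMatchOptions strict = builder().build();
        SketchMatchOptions nanAware = builder().nanMatch(true).build();
        System.out.println("strict:   " + matches(strict, Double.NaN, Double.NaN, 10.0));
        System.out.println("nanAware: " + matches(nanAware, Double.NaN, Double.NaN, 10.0));
        System.out.println("inverted: " + matches(nanAware.withInverted(true), Double.NaN, Double.NaN, 10.0));
    }
}
```

Bundling the flags this way means a new setting such as `nanMatch` can be added without changing every `match` signature, which appears to be the motivation for the refactoring in this commit.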

engine/benchmark/src/benchmark/java/io/deephaven/benchmark/engine/MatchFilterBenchmark.java

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@

 import io.deephaven.engine.context.ExecutionContext;
 import io.deephaven.engine.context.TestExecutionContext;
+import io.deephaven.engine.table.MatchOptions;
 import io.deephaven.engine.table.Table;
-import io.deephaven.engine.table.impl.select.MatchFilter.MatchType;
 import io.deephaven.engine.testutil.ControlledUpdateGraph;
 import io.deephaven.time.DateTimeUtils;
 import io.deephaven.engine.table.impl.select.*;
@@ -116,7 +116,7 @@ public void setupEnv(BenchmarkParams params) {
                 values.add(ii);
             }
         }
-        matchFilter = new MatchFilter(MatchType.Regular, filterCol, values.toArray());
+        matchFilter = new MatchFilter(MatchOptions.REGULAR, filterCol, values.toArray());
     }

     @TearDown(Level.Trial)

engine/table/src/main/java/io/deephaven/engine/table/impl/AbstractColumnSource.java

Lines changed: 5 additions & 13 deletions
@@ -10,6 +10,7 @@
 import io.deephaven.engine.context.ExecutionContext;
 import io.deephaven.engine.rowset.*;
 import io.deephaven.engine.table.ColumnSource;
+import io.deephaven.engine.table.MatchOptions;
 import io.deephaven.engine.table.impl.chunkfillers.ChunkFiller;
 import io.deephaven.engine.table.impl.chunkfilter.ChunkFilter;
 import io.deephaven.engine.table.impl.chunkfilter.ChunkMatchFilterFactory;
@@ -103,21 +104,12 @@ public ColumnSource<T> getPrevSource() {

     @Override
     public WritableRowSet match(
-            final boolean invertMatch,
             final boolean usePrev,
-            final boolean caseInsensitive,
-            @NotNull final RowSet rowsetToFilter,
+            final MatchOptions matchOptions,
+            @NotNull final RowSet selection,
             final Object... keys) {
-        return doChunkFilter(invertMatch, usePrev, caseInsensitive, rowsetToFilter, keys);
-    }
-
-    private WritableRowSet doChunkFilter(final boolean invertMatch,
-            final boolean usePrev,
-            final boolean caseInsensitive,
-            @NotNull final RowSet rowsetToFilter,
-            final Object[] keys) {
-        return ChunkFilter.applyChunkFilter(rowsetToFilter, this, usePrev,
-                ChunkMatchFilterFactory.getChunkFilter(type, caseInsensitive, invertMatch, keys));
+        return ChunkFilter.applyChunkFilter(selection, this, usePrev,
+                ChunkMatchFilterFactory.getChunkFilter(type, matchOptions, keys));
     }

     @Override

engine/table/src/main/java/io/deephaven/engine/table/impl/TimeTable.java

Lines changed: 5 additions & 6 deletions
@@ -17,6 +17,7 @@
 import io.deephaven.engine.rowset.WritableRowSet;
 import io.deephaven.engine.rowset.chunkattributes.RowKeys;
 import io.deephaven.engine.table.ColumnSource;
+import io.deephaven.engine.table.MatchOptions;
 import io.deephaven.engine.table.Table;
 import io.deephaven.engine.table.impl.perf.PerformanceEntry;
 import io.deephaven.engine.table.impl.sources.FillUnordered;
@@ -268,9 +269,8 @@ public long getLong(long rowKey) {

     @Override
     public WritableRowSet match(
-            final boolean invertMatch,
             final boolean usePrev,
-            final boolean caseInsensitive,
+            @NotNull final MatchOptions matchOptions,
             @NotNull final RowSet selection,
             final Object... keys) {
         if (startTime == null) {
@@ -293,7 +293,7 @@ public WritableRowSet match(
             matchingSet.addKey(minus(key, startTime) / period);
         }

-        if (invertMatch) {
+        if (matchOptions.inverted()) {
             try (final WritableRowSet matching = matchingSet.build()) {
                 return selection.minus(matching);
             }
@@ -410,9 +410,8 @@ public void fillPrevChunk(

     @Override
     public WritableRowSet match(
-            final boolean invertMatch,
             final boolean usePrev,
-            final boolean caseInsensitive,
+            @NotNull final MatchOptions matchOptions,
             @NotNull final RowSet selection,
             final Object... keys) {
         if (startTime == null) {
@@ -435,7 +434,7 @@ public WritableRowSet match(
             matchingSet.addKey((key - epochNanos(startTime)) / period);
         }

-        if (matchOptions.inverted()) {
+        if (matchOptions.inverted()) {
             try (final WritableRowSet matching = matchingSet.build()) {
                 return selection.minus(matching);
             }
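The visible hunks suggest that a time-table column answers `match` arithmetically: each key timestamp is mapped back to a row index with `(key - start) / period`, and an inverted match subtracts those rows from the selection. The sketch below models that idea with plain `java.util` sets. It is not the `TimeTable` code: `TimeColumnMatchSketch` is hypothetical, and both the alignment guard and the non-inverted branch (intersection with the selection) are assumptions, because the corresponding lines are elided from the diff.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of matching against a computed time column; not the TimeTable code.
final class TimeColumnMatchSketch {

    // Row i of the synthetic column is assumed to hold startNanos + i * periodNanos.
    static Set<Long> match(long startNanos, long periodNanos, boolean inverted,
            Set<Long> selection, long... keyNanos) {
        Set<Long> matching = new TreeSet<>();
        for (long key : keyNanos) {
            long offset = key - startNanos;
            // Assumed guard: only timestamps that land exactly on a tick can match a row.
            if (offset >= 0 && offset % periodNanos == 0) {
                matching.add(offset / periodNanos);
            }
        }
        Set<Long> result = new TreeSet<>(selection);
        if (inverted) {
            result.removeAll(matching); // selection minus the matching rows
        } else {
            result.retainAll(matching); // assumed: selection intersected with the matching rows
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Long> selection = new TreeSet<>(Set.of(0L, 1L, 2L, 3L));
        long start = 1_000L;
        long period = 10L;
        System.out.println("regular:  " + match(start, period, false, selection, 1_010L, 1_030L, 1_005L));
        System.out.println("inverted: " + match(start, period, true, selection, 1_010L, 1_030L, 1_005L));
    }
}
```

Because row positions are computed rather than scanned, only the `inverted` flag of the options object matters in this kind of column source, which is consistent with the hunks reading nothing else from `matchOptions`.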

engine/table/src/main/java/io/deephaven/engine/table/impl/WouldMatchOperation.java

Lines changed: 8 additions & 9 deletions
@@ -363,10 +363,9 @@ public void fillPrevChunk(@NotNull FillContext context,

     @Override
     public WritableRowSet match(
-            final boolean invertMatch,
             final boolean usePrev,
-            final boolean caseInsensitive,
-            @NotNull final RowSet mapper,
+            @NotNull final MatchOptions matchOptions,
+            @NotNull final RowSet selection,
             final Object... keys) {
         boolean hasFalse = false;
         boolean hasTrue = false;
@@ -382,21 +381,21 @@ public WritableRowSet match(
         }

         if (hasTrue && hasFalse) {
-            return invertMatch ? RowSetFactory.empty() : mapper.copy();
+            return matchOptions.inverted() ? RowSetFactory.empty() : selection.copy();
         } else if (!hasTrue && !hasFalse) {
-            return invertMatch ? mapper.copy() : RowSetFactory.empty();
+            return matchOptions.inverted() ? selection.copy() : RowSetFactory.empty();
         }

         try (final SafeCloseableList closer = new SafeCloseableList()) {
             final WritableRowSet intersection;
             if (usePrev) {
-                intersection = mapper.intersect(closer.add(source.copyPrev()));
+                intersection = selection.intersect(closer.add(source.copyPrev()));
             } else {
-                intersection = mapper.intersect(source);
+                intersection = selection.intersect(source);
             }
-            if (invertMatch ^ hasFalse) {
+            if (matchOptions.inverted() ^ hasFalse) {
                 closer.add(intersection);
-                return mapper.minus(intersection);
+                return selection.minus(intersection);
             }
             return intersection;
         }
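The branching in this `match` method is compact, so a small model may help when reading it. The sketch below reproduces the visible decision logic with `java.util` sets in place of row sets. `BooleanColumnMatchSketch` is hypothetical, the key-scanning loop is an assumption (the original loop is elided from the diff), and `trueRows` stands in for the row set that the original calls `source`.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of matching a boolean "would match" column; not the Deephaven code.
final class BooleanColumnMatchSketch {

    // trueRows: rows whose column value is true; every other selected row counts as false.
    static Set<Long> match(boolean inverted, Set<Long> selection, Set<Long> trueRows,
            boolean... keys) {
        boolean hasTrue = false;
        boolean hasFalse = false;
        for (boolean key : keys) { // assumed stand-in for the elided key scan
            hasTrue |= key;
            hasFalse |= !key;
        }
        if (hasTrue && hasFalse) {
            // Both values requested: every row matches, so the inverse is empty.
            return inverted ? new TreeSet<Long>() : new TreeSet<Long>(selection);
        } else if (!hasTrue && !hasFalse) {
            // No keys: nothing matches, so the inverse is the whole selection.
            return inverted ? new TreeSet<Long>(selection) : new TreeSet<Long>();
        }
        Set<Long> intersection = new TreeSet<Long>(selection);
        intersection.retainAll(trueRows);
        if (inverted ^ hasFalse) {
            // Either "false" was requested, or "true" was requested with inversion.
            Set<Long> result = new TreeSet<Long>(selection);
            result.removeAll(intersection);
            return result;
        }
        return intersection;
    }

    public static void main(String[] args) {
        Set<Long> selection = new TreeSet<Long>(Set.of(0L, 1L, 2L, 3L));
        Set<Long> trueRows = new TreeSet<Long>(Set.of(1L, 3L, 7L));
        System.out.println("match true:        " + match(false, selection, trueRows, true));
        System.out.println("match false:       " + match(false, selection, trueRows, false));
        System.out.println("not match true:    " + match(true, selection, trueRows, true));
        System.out.println("match true, false: " + match(false, selection, trueRows, true, false));
    }
}
```

The XOR captures the symmetry of a two-valued column: asking for `false` rows and asking for the inverse of `true` rows both yield the selection minus the `true` rows.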
