Commit fac6ad2

feat: DH-18351: Add CumCountWhere() and RollingCountWhere() features to UpdateBy (#6566)
## Groovy Examples

```
table = emptyTable(1000).update("key=randomInt(0,10)", "intCol=randomInt(0,1000)")

// zero-key
t_summary = table.updateBy([
    CumCountWhere("running_gt_500", "intCol > 500"),
    RollingCountWhere(50, "windowed_gt_500", "intCol > 500"),
])

// bucketed
t_summary_bucketed = table.updateBy([
    CumCountWhere("running_gt_500", "intCol > 500"),
    RollingCountWhere(50, "windowed_gt_500", "intCol > 500"),
], "key")
```

## Python Examples

```
from deephaven import empty_table
from deephaven.updateby import cum_count_where, rolling_count_where_tick

table = empty_table(1000).update(["key=randomInt(0,10)", "intCol=randomInt(0,1000)"])

# zero-key
t_summary = table.update_by([
    cum_count_where(col="running_gt_500", filters="intCol > 500"),
    rolling_count_where_tick(rev_ticks=50, col="windowed_gt_500", filters="intCol > 500"),
])

# bucketed
t_summary_bucketed = table.update_by([
    cum_count_where(col="running_gt_500", filters="intCol > 500"),
    rolling_count_where_tick(rev_ticks=50, col="windowed_gt_500", filters="intCol > 500"),
], by="key")
```

## Performance Notes

TL;DR: Performance compares very well. `RollingCountWhere()` has near-identical performance to the comparison benchmarks (and can be faster, depending on the complexity of the filter). `CumCountWhere()` also compares well to `Ema()` but can't catch up to zero-key `CumSum()`, which is remarkably fast.
Comparing `CumCountWhere` to `CumSum` and `Ema`:

```
120000000 avg of 2

ZeroKey
CumSum                       137.36250
Ema                          449.5528125
CumCountWhereConstant        475.9980005
CumCountWhereMatch           649.9689995
CumCountWhereRange           654.322250
CumCountWhereMultiple        695.4477915
CumCountWhereMultipleOr      704.900583

Bucketed - 250 buckets
CumSum                       2979.1730005
Ema                          3024.152458
CumCountWhereConstant        2569.7280835
CumCountWhereMatch           3031.6534795
CumCountWhereRange           3030.5433335
CumCountWhereMultiple        3052.597625
CumCountWhereMultipleOr      3059.911729

Bucketed - 640 buckets
CumSum                       3827.299833
Ema                          3880.2538125
CumCountWhereConstant        3416.4387715
CumCountWhereMatch           3906.691333
CumCountWhereRange           3902.3064375
CumCountWhereMultiple        3967.1584795
CumCountWhereMultipleOr      3925.0775205
```

Comparing `RollingCountWhere` to `RollingCount` and `RollingSum`:

```
120000000 avg of 2

ZeroKey
RollingCount                 1511.7957295
RollingSum                   1513.6013545
RollingCountWhereConstant    1403.2817915
RollingCountWhereMatch       1453.9323125
RollingCountWhereRange       1764.2137915
RollingCountWhereMultiple    1576.4896255
RollingCountWhereMultipleOr  1541.5631455

Bucketed - 250 buckets
RollingCount                 3468.7696665
RollingSum                   3326.047792
RollingCountWhereConstant    2858.677771
RollingCountWhereMatch       3327.958604
RollingCountWhereRange       3347.961083
RollingCountWhereMultiple    3429.413562
RollingCountWhereMultipleOr  3364.244104

Bucketed - 640 buckets
RollingCount                 4310.4265835
RollingSum                   4286.427479
RollingCountWhereConstant    3869.1892705
RollingCountWhereMatch       4333.8479375
RollingCountWhereRange       4269.3454375
RollingCountWhereMultiple    4290.0618545
RollingCountWhereMultipleOr  4346.8478535
```
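For readers who want to check what the two new operations compute, the following is a minimal plain-Python sketch of the semantics used in the commit message examples. It is not Deephaven code: the function names mirror the Python API for readability only, the predicate stands in for a filter string such as `intCol > 500`, and the sketch assumes that a tick window of `rev_ticks` rows includes the current row.

```python
from collections import deque
from typing import Callable, List, Sequence


def cum_count_where(values: Sequence[int], predicate: Callable[[int], bool]) -> List[int]:
    """Running count of rows seen so far that satisfy the predicate."""
    count = 0
    out = []
    for v in values:
        if predicate(v):
            count += 1
        out.append(count)
    return out


def rolling_count_where_tick(values: Sequence[int], rev_ticks: int,
                             predicate: Callable[[int], bool]) -> List[int]:
    """Count of matching rows in a trailing window of rev_ticks rows.

    Assumption: the window includes the current row.
    """
    window = deque()  # match flags for the rows currently in the window
    matches = 0
    out = []
    for v in values:
        hit = predicate(v)
        window.append(hit)
        matches += hit
        if len(window) > rev_ticks:
            # Drop the oldest row and remove its contribution.
            matches -= window.popleft()
        out.append(matches)
    return out


def gt_500(v: int) -> bool:
    return v > 500


sample = [100, 600, 700, 200, 900]
running = cum_count_where(sample, gt_500)
windowed = rolling_count_where_tick(sample, 2, gt_500)
```

A bucketed run applies the same logic independently to the rows of each key.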
1 parent e7f731b commit fac6ad2

File tree

43 files changed, +7180 -2559 lines changed

engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/UpdateByOperator.java

Lines changed: 34 additions & 4 deletions
@@ -33,7 +33,7 @@
  * {@link UpdateByOperator#initializeRolling(Context, RowSet)} (Context)} for windowed operators</li>
  * <li>{@link UpdateByOperator.Context#accumulateCumulative(RowSequence, Chunk[], LongChunk, int)} for cumulative
  * operators or
- * {@link UpdateByOperator.Context#accumulateRolling(RowSequence, Chunk[], LongChunk, LongChunk, IntChunk, IntChunk, int)}
+ * {@link UpdateByOperator.Context#accumulateRolling(RowSequence, Chunk[], LongChunk, LongChunk, IntChunk, IntChunk, int, int)}
  * for windowed operators</li>
  * <li>{@link #finishUpdate(UpdateByOperator.Context)}</li>
  * </ol>
@@ -99,18 +99,48 @@ protected void pop(int count) {
             throw new UnsupportedOperationException("pop() must be overriden by rolling operators");
         }

-        public abstract void accumulateCumulative(RowSequence inputKeys,
+        /**
+         * For cumulative operators only, this method will be called to pass the input chunk data to the operator and
+         * produce the output data values.
+         *
+         * @param inputKeys the keys for the input data rows (also matches the output keys)
+         * @param valueChunkArr the input data chunks needed by the operator for internal calculations
+         * @param tsChunk the timestamp chunk for the input data (if applicable)
+         * @param len the number of items in the input data chunks
+         */
+        public abstract void accumulateCumulative(
+                RowSequence inputKeys,
                 Chunk<? extends Values>[] valueChunkArr,
                 LongChunk<? extends Values> tsChunk,
                 int len);

-        public abstract void accumulateRolling(RowSequence inputKeys,
+        /**
+         * For windowed operators only, this method will be called to pass the input chunk data to the operator and
+         * produce the output data values. It is important to note that the size of the influencer (input) and affected
+         * (output) chunks are not likely be the same. We pass these sizes explicitly to the operators for the sake of
+         * the operators (such as {@link io.deephaven.engine.table.impl.updateby.countwhere.CountWhereOperator} with
+         * zero input columns) where no input chunks are provided but we must still process the exact number of input
+         * rows.
+         *
+         * @param inputKeys the keys for the input data rows (also matches the output keys)
+         * @param influencerValueChunkArr the input data chunks needed by the operator for internal calculations, these
+         *        values will be pushed and popped into the current window
+         * @param affectedPosChunk the row positions of the affected rows
+         * @param influencerPosChunk the row positions of the influencer rows
+         * @param pushChunk a chunk containing the push instructions for each output row to be calculated
+         * @param popChunk a chunk containing the pop instructions for each output row to be calculated
+         * @param affectedCount how many affected (output) rows are being computed
+         * @param influencerCount how many influencer (input) rows are needed for the computation
+         */
+        public abstract void accumulateRolling(
+                RowSequence inputKeys,
                 Chunk<? extends Values>[] influencerValueChunkArr,
                 LongChunk<OrderedRowKeys> affectedPosChunk,
                 LongChunk<OrderedRowKeys> influencerPosChunk,
                 IntChunk<? extends Values> pushChunk,
                 IntChunk<? extends Values> popChunk,
-                int len);
+                int affectedCount,
+                int influencerCount);

         /**
          * Write the current value for this row to the output chunk
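
The Javadoc above distinguishes the number of affected (output) rows from the number of influencer (input) rows. The plain-Python sketch below illustrates why the two counts can differ. It is only an illustration, not the engine's implementation: the function name and the list-based "chunks" are hypothetical, each influencer row is reduced to a precomputed match flag, and the order "pop first, then push, then write" is an assumption.

```python
from collections import deque
from typing import List, Sequence


def accumulate_rolling(influencer_matches: Sequence[bool],
                       push_chunk: Sequence[int],
                       pop_chunk: Sequence[int],
                       affected_count: int,
                       influencer_count: int) -> List[int]:
    """Replay push/pop instructions over a window and emit one count per affected row."""
    assert len(push_chunk) == len(pop_chunk) == affected_count
    assert len(influencer_matches) == influencer_count

    window = deque()     # match flags for rows currently inside the window
    next_influencer = 0  # index of the next influencer row to push
    out = []
    for i in range(affected_count):
        # Assumed order: drop rows leaving the window, then add rows entering it.
        for _ in range(pop_chunk[i]):
            window.popleft()
        for _ in range(push_chunk[i]):
            window.append(influencer_matches[next_influencer])
            next_influencer += 1
        out.append(sum(window))
    return out


# Five influencer rows feed three affected rows with a trailing window of three rows:
# the first output needs three pushes, and each later output slides the window by one.
counts = accumulate_rolling(
    influencer_matches=[True, True, False, False, True],
    push_chunk=[3, 1, 1],
    pop_chunk=[0, 1, 1],
    affected_count=3,
    influencer_count=5,
)
```

Here `affected_count` is 3 while `influencer_count` is 5, so a single length argument could not describe both chunks.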

engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/UpdateByOperatorFactory.java

Lines changed: 146 additions & 0 deletions
@@ -10,12 +10,21 @@
 import io.deephaven.api.updateby.UpdateByControl;
 import io.deephaven.api.updateby.UpdateByOperation;
 import io.deephaven.api.updateby.spec.*;
+import io.deephaven.base.verify.Require;
+import io.deephaven.engine.rowset.RowSetFactory;
 import io.deephaven.engine.table.ColumnDefinition;
+import io.deephaven.engine.table.ColumnSource;
+import io.deephaven.engine.table.Table;
 import io.deephaven.engine.table.TableDefinition;
 import io.deephaven.engine.table.impl.MatchPair;
 import io.deephaven.engine.table.impl.QueryCompilerRequestProcessor;
+import io.deephaven.engine.table.impl.QueryTable;
 import io.deephaven.engine.table.impl.select.FormulaColumn;
 import io.deephaven.engine.table.impl.select.SelectColumn;
+import io.deephaven.engine.table.impl.select.WhereFilter;
+import io.deephaven.engine.table.impl.sources.NullValueColumnSource;
+import io.deephaven.engine.table.impl.sources.ReinterpretUtils;
+import io.deephaven.engine.table.impl.updateby.countwhere.CountWhereOperator;
 import io.deephaven.engine.table.impl.updateby.delta.*;
 import io.deephaven.engine.table.impl.updateby.em.*;
 import io.deephaven.engine.table.impl.updateby.emstd.*;
@@ -45,6 +54,7 @@
 import java.time.Instant;
 import java.util.*;
 import java.util.stream.Collectors;
+import java.util.stream.IntStream;
 import java.util.stream.Stream;

 import static io.deephaven.util.BooleanUtils.NULL_BOOLEAN_AS_BYTE;
@@ -414,6 +424,12 @@ public Void visit(CumProdSpec cps) {
             return null;
         }

+        @Override
+        public Void visit(CumCountWhereSpec spec) {
+            ops.add(makeCountWhereOperator(tableDef, spec));
+            return null;
+        }
+
         @Override
         public Void visit(@NotNull final DeltaSpec spec) {
             Arrays.stream(pairs)
@@ -537,6 +553,12 @@ public Void visit(@NotNull final RollingCountSpec spec) {
             return null;
         }

+        @Override
+        public Void visit(@NotNull final RollingCountWhereSpec spec) {
+            ops.add(makeCountWhereOperator(tableDef, spec));
+            return null;
+        }
+
         @Override
         public Void visit(@NotNull final RollingFormulaSpec spec) {
             final boolean isTimeBased = spec.revWindowScale().isTimeBased();
@@ -1240,6 +1262,130 @@ private UpdateByOperator makeRollingCountOperator(@NotNull final MatchPair pair,
             }
         }

+        /**
+         * This is used for Cum/Rolling CountWhere operators
+         */
+        private UpdateByOperator makeCountWhereOperator(
+                @NotNull final TableDefinition tableDef,
+                @NotNull final UpdateBySpec spec) {
+
+            Require.eqTrue(spec instanceof CumCountWhereSpec || spec instanceof RollingCountWhereSpec,
+                    "spec instanceof CumCountWhereSpec || spec instanceof RollingCountWhereSpec");
+
+            final boolean isCumulative = spec instanceof CumCountWhereSpec;
+
+            final WhereFilter[] whereFilters = isCumulative
+                    ? WhereFilter.fromInternal(((CumCountWhereSpec) spec).filter())
+                    : WhereFilter.fromInternal(((RollingCountWhereSpec) spec).filter());
+
+            final List<String> inputColumnNameList = new ArrayList<>();
+            final Map<String, Integer> inputColumnMap = new HashMap<>();
+            final List<int[]> filterInputColumnIndicesList = new ArrayList<>();
+
+            // Verify all the columns in the where filters are present in the dummy table and valid for use.
+            for (final WhereFilter whereFilter : whereFilters) {
+                whereFilter.init(tableDef);
+                if (whereFilter.isRefreshing()) {
+                    throw new UnsupportedOperationException("CountWhere does not support refreshing filters");
+                }
+
+                // Compute which input sources this filter will use.
+                final List<String> filterColumnName = whereFilter.getColumns();
+                final int inputColumnCount = whereFilter.getColumns().size();
+                final int[] inputColumnIndices = new int[inputColumnCount];
+                for (int ii = 0; ii < inputColumnCount; ++ii) {
+                    final String inputColumnName = filterColumnName.get(ii);
+                    final int inputColumnIndex = inputColumnMap.computeIfAbsent(inputColumnName, k -> {
+                        inputColumnNameList.add(inputColumnName);
+                        return inputColumnNameList.size() - 1;
+                    });
+                    inputColumnIndices[ii] = inputColumnIndex;
+                }
+                filterInputColumnIndicesList.add(inputColumnIndices);
+            }
+
+            // Gather the input column type info and create a dummy table we can use to initialize filters.
+            final String[] inputColumnNames = inputColumnNameList.toArray(String[]::new);
+            final ColumnSource<?>[] originalColumnSources = new ColumnSource[inputColumnNames.length];
+            final ColumnSource<?>[] reinterpretedColumnSources = new ColumnSource[inputColumnNames.length];
+
+            final Map<String, ColumnSource<?>> columnSourceMap = new LinkedHashMap<>();
+            for (int i = 0; i < inputColumnNames.length; i++) {
+                final String col = inputColumnNames[i];
+                final ColumnDefinition<?> def = tableDef.getColumn(col);
+                // Create a representative column source of the correct type for the filter.
+                final ColumnSource<?> nullSource =
+                        NullValueColumnSource.getInstance(def.getDataType(), def.getComponentType());
+                // Create a reinterpreted version of the column source.
+                final ColumnSource<?> maybeReinterpretedSource = ReinterpretUtils.maybeConvertToPrimitive(nullSource);
+                if (nullSource != maybeReinterpretedSource) {
+                    originalColumnSources[i] = nullSource;
+                }
+                columnSourceMap.put(col, maybeReinterpretedSource);
+                reinterpretedColumnSources[i] = maybeReinterpretedSource;
+            }
+            final Table dummyTable = new QueryTable(RowSetFactory.empty().toTracking(), columnSourceMap);
+
+            final CountWhereOperator.CountFilter[] countFilters =
+                    CountWhereOperator.CountFilter.createCountFilters(whereFilters, dummyTable,
+                            filterInputColumnIndicesList);
+
+            // If any filter is ConditionFilter or ChunkFilter and uses a reinterpreted column, need to produce
+            // original-typed chunks.
+            final boolean originalChunksRequired = Arrays.asList(countFilters).stream()
+                    .anyMatch(filter -> (filter.chunkFilter() != null || filter.conditionFilter() != null)
+                            && IntStream.of(filter.inputColumnIndices())
+                                    .anyMatch(i -> originalColumnSources[i] != null));
+
+            // If any filter is a standard WhereFilter or we need to produce original-typed chunks, need a chunk source
+            // table.
+            final boolean chunkSourceTableRequired = originalChunksRequired ||
+                    Arrays.asList(countFilters).stream().anyMatch(filter -> filter.whereFilter() != null);
+
+            // Create a new column pair with the same name for the left and right columns
+            final String columnName = isCumulative
+                    ? ((CumCountWhereSpec) spec).column().name()
+                    : ((RollingCountWhereSpec) spec).column().name();
+            final MatchPair pair = new MatchPair(columnName, columnName);
+
+            // Create and return the operator.
+            if (isCumulative) {
+                return new CountWhereOperator(
+                        pair,
+                        countFilters,
+                        inputColumnNames,
+                        originalColumnSources,
+                        reinterpretedColumnSources,
+                        chunkSourceTableRequired,
+                        originalChunksRequired);
+            } else {
+                final RollingCountWhereSpec rs = (RollingCountWhereSpec) spec;
+
+                final String[] affectingColumns;
+                if (rs.revWindowScale().timestampCol() == null) {
+                    affectingColumns = inputColumnNames;
+                } else {
+                    affectingColumns = ArrayUtils.add(inputColumnNames, rs.revWindowScale().timestampCol());
+                }
+
+                final long prevWindowScaleUnits = rs.revWindowScale().getTimeScaleUnits();
+                final long fwdWindowScaleUnits = rs.fwdWindowScale().getTimeScaleUnits();
+
+                return new CountWhereOperator(
+                        pair,
+                        affectingColumns,
+                        rs.revWindowScale().timestampCol(),
+                        prevWindowScaleUnits,
+                        fwdWindowScaleUnits,
+                        countFilters,
+                        inputColumnNames,
+                        originalColumnSources,
+                        reinterpretedColumnSources,
+                        chunkSourceTableRequired,
+                        originalChunksRequired);
+            }
+        }
+
         private UpdateByOperator makeRollingStdOperator(@NotNull final MatchPair pair,
                 @NotNull final TableDefinition tableDef,
                 @NotNull final RollingStdSpec rs) {
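
Two pieces of bookkeeping in `makeCountWhereOperator` are easy to misread in diff form: the de-duplicated column index assignment and the two boolean flags. The plain-Python sketch below models both under simplifying assumptions. The function names are hypothetical, each filter is reduced to a `(kind, indices)` pair where `kind` is one of `"chunk"`, `"condition"` or `"where"`, and `reinterpreted[i]` plays the role of `originalColumnSources[i] != null` in the Java code.

```python
from typing import Dict, List, Sequence, Tuple


def assign_input_columns(filter_columns: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[int]]]:
    """Give every distinct column one shared index, in first-seen order."""
    names: List[str] = []
    index_of: Dict[str, int] = {}
    per_filter: List[List[int]] = []
    for cols in filter_columns:
        indices = []
        for name in cols:
            if name not in index_of:
                # Same effect as the computeIfAbsent call in the Java code.
                index_of[name] = len(names)
                names.append(name)
            indices.append(index_of[name])
        per_filter.append(indices)
    return names, per_filter


def required_flags(filters: Sequence[Tuple[str, Sequence[int]]],
                   reinterpreted: Sequence[bool]) -> Tuple[bool, bool]:
    """Return (original_chunks_required, chunk_source_table_required)."""
    original_chunks_required = any(
        kind in ("chunk", "condition") and any(reinterpreted[i] for i in indices)
        for kind, indices in filters
    )
    chunk_source_table_required = original_chunks_required or any(
        kind == "where" for kind, _ in filters
    )
    return original_chunks_required, chunk_source_table_required


# Three filters sharing columns; the column names are illustrative only.
names, per_filter = assign_input_columns([["intCol"], ["key", "intCol"], ["ts"]])
```

Sharing one index per column means a column referenced by several filters only needs to be read once per chunk, which appears to be the intent of the `inputColumnMap` lookup.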

engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/UpdateByWindowRollingBase.java

Lines changed: 2 additions & 1 deletion
@@ -204,7 +204,8 @@ void processWindowBucketOperatorSet(final UpdateByWindowBucketContext context,
                             influencePosChunk,
                             ctx.pushChunks[affectedChunkOffset],
                             ctx.popChunks[affectedChunkOffset],
-                            affectedChunkSize);
+                            affectedChunkSize,
+                            influencerCount);
                 }

                 affectedChunkOffset++;
