Skip to content

Commit 51a7494

Browse files
authored
MSQ: Allow spelling rowsInMemory as maxRowsInMemory. (#18832)
The maxRowsInMemory spelling is better aligned with how the equivalent property is spelled in tuningConfigs. Both are allowed; if both are provided, maxRowsInMemory takes priority.
1 parent 4c1a28b commit 51a7494

File tree

6 files changed

+49
-16
lines changed

6 files changed

+49
-16
lines changed

docs/multi-stage-query/reference.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -395,7 +395,8 @@ The following table lists the context parameters for the MSQ task engine:
395395
| `finalizeAggregations` | SELECT, INSERT, REPLACE<br /><br />Determines the type of aggregation to return. If true, Druid finalizes the results of complex aggregations that directly appear in query results. If false, Druid returns the aggregation's intermediate type rather than finalized type. This parameter is useful during ingestion, where it enables storing sketches directly in Druid tables. For more information about aggregations, see [SQL aggregation functions](../querying/sql-aggregations.md). | `true` |
396396
| `arrayIngestMode` | INSERT, REPLACE<br /><br /> Controls how ARRAY type values are stored in Druid segments. When set to `array` (recommended for SQL compliance), Druid will store all ARRAY typed values in [ARRAY typed columns](../querying/arrays.md), and supports storing both VARCHAR and numeric typed arrays. When set to `mvd` (the default, for backwards compatibility), Druid only supports VARCHAR typed arrays, and will store them as [multi-value string columns](../querying/multi-value-dimensions.md). See [`arrayIngestMode`] in the [Arrays](../querying/arrays.md) page for more details. | `mvd` (for backwards compatibility, recommended to use `array` for SQL compliance)|
397397
| `sqlJoinAlgorithm` | SELECT, INSERT, REPLACE<br /><br />Algorithm to use for JOIN. Use `broadcast` (the default) for broadcast hash join or `sortMerge` for sort-merge join. Affects all JOIN operations in the query. This is a hint to the MSQ engine and the actual joins in the query may proceed in a different way than specified. See [Joins](#joins) for more details. | `broadcast` |
398-
| `rowsInMemory` | INSERT or REPLACE<br /><br />Maximum number of rows to store in memory at once before flushing to disk during the segment generation process. Ignored for non-INSERT queries. In most cases, use the default value. You may need to override the default if you run into one of the [known issues](./known-issues.md) around memory usage. | 100,000 |
398+
| `maxRowsInMemory` | INSERT or REPLACE<br /><br />Maximum number of rows to store in memory at once before flushing to disk during the segment generation process. Ignored for non-INSERT queries. In most cases, use the default value. You may need to override the default if you run into one of the [known issues](./known-issues.md) around memory usage. | 100,000 |
399+
| `rowsInMemory` | INSERT or REPLACE<br /><br />Alternate spelling of `maxRowsInMemory`. Ignored if `maxRowsInMemory` is set. | 100,000 |
399400
| `segmentSortOrder` | INSERT or REPLACE<br /><br />Normally, Druid sorts rows in individual segments using `__time` first, followed by the [CLUSTERED BY](#clustered-by) clause. When you set `segmentSortOrder`, Druid uses the order from this context parameter instead. Provide the column list as comma-separated values or as a JSON array in string form.<br />< br/>For example, consider an INSERT query that uses `CLUSTERED BY country` and has `segmentSortOrder` set to `__time,city,country`. Within each time chunk, Druid assigns rows to segments based on `country`, and then within each of those segments, Druid sorts those rows by `__time` first, then `city`, then `country`. | empty list |
400401
| `forceSegmentSortByTime` | INSERT or REPLACE<br /><br />When set to `true` (the default), Druid prepends `__time` to [CLUSTERED BY](#clustered-by) when determining the sort order for individual segments. Druid also requires that `segmentSortOrder`, if provided, starts with `__time`.<br /><br />When set to `false`, Druid uses the [CLUSTERED BY](#clustered-by) alone to determine the sort order for individual segments, and does not require that `segmentSortOrder` begin with `__time`. Setting this parameter to `false` is an experimental feature; see [Sorting](../ingestion/partitioning.md#sorting) for details. | `true` |
401402
| `maxParseExceptions`| SELECT, INSERT, REPLACE<br /><br />Maximum number of parse exceptions that are ignored while executing the query before it stops with `TooManyWarningsFault`. To ignore all the parse exceptions, set the value to -1. | 0 |

multi-stage-query/src/main/java/org/apache/druid/msq/indexing/MSQCompactionRunner.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -339,7 +339,7 @@ private static MSQTuningConfig buildMSQTuningConfig(CompactionTask compactionTas
339339

340340
// We don't consider maxRowsInMemory coming via CompactionTuningConfig since it always sets a default value if no
341341
// value specified by user.
342-
final int maxRowsInMemory = MultiStageQueryContext.getRowsInMemory(compactionTaskContext);
342+
final int maxRowsInMemory = MultiStageQueryContext.getMaxRowsInMemory(compactionTaskContext);
343343
final Integer maxNumSegments = MultiStageQueryContext.getMaxNumSegments(compactionTaskContext);
344344

345345
Integer rowsPerSegment = getRowsPerSegment(compactionTask);

multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskQueryMaker.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -545,7 +545,7 @@ private static MSQTuningConfig makeMSQTuningConfig(final PlannerContext plannerC
545545
// This parameter is used internally for the number of worker tasks only, so we subtract 1
546546
final int maxNumWorkers = maxNumTasks - 1;
547547
final int rowsPerSegment = MultiStageQueryContext.getRowsPerSegment(sqlQueryContext);
548-
final int maxRowsInMemory = MultiStageQueryContext.getRowsInMemory(sqlQueryContext);
548+
final int maxRowsInMemory = MultiStageQueryContext.getMaxRowsInMemory(sqlQueryContext);
549549
final Integer maxNumSegments = MultiStageQueryContext.getMaxNumSegments(sqlQueryContext);
550550
final IndexSpec indexSpec = MultiStageQueryContext.getIndexSpec(sqlQueryContext, plannerContext.getJsonMapper());
551551
MSQTuningConfig tuningConfig = new MSQTuningConfig(maxNumWorkers, maxRowsInMemory, rowsPerSegment, maxNumSegments, indexSpec);

multi-stage-query/src/main/java/org/apache/druid/msq/util/MultiStageQueryContext.java

Lines changed: 23 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -157,10 +157,22 @@ public class MultiStageQueryContext
157157
public static final String CTX_REMOVE_NULL_BYTES = "removeNullBytes";
158158
public static final boolean DEFAULT_REMOVE_NULL_BYTES = false;
159159

160-
public static final String CTX_ROWS_IN_MEMORY = "rowsInMemory";
161-
// Lower than the default to minimize the impact of per-row overheads that are not accounted for by
162-
// OnheapIncrementalIndex. For example: overheads related to creating bitmaps during persist.
163-
public static final int DEFAULT_ROWS_IN_MEMORY = 100000;
160+
/**
161+
* Used by {@link #getMaxRowsInMemory(QueryContext)}.
162+
*/
163+
static final String CTX_MAX_ROWS_IN_MEMORY = "maxRowsInMemory";
164+
165+
/**
166+
* Used by {@link #getMaxRowsInMemory(QueryContext)}. Alternate spelling of {@link #CTX_MAX_ROWS_IN_MEMORY}.
167+
* Ignored if {@link #CTX_MAX_ROWS_IN_MEMORY} is set.
168+
*/
169+
static final String CTX_ROWS_IN_MEMORY = "rowsInMemory";
170+
171+
/**
172+
* Lower than the default to minimize the impact of per-row overheads that are not accounted for by
173+
* OnheapIncrementalIndex. For example: overheads related to creating bitmaps during persist.
174+
*/
175+
public static final int DEFAULT_MAX_ROWS_IN_MEMORY = 100000;
164176

165177
public static final String CTX_IS_REINDEX = "isReindex";
166178

@@ -413,9 +425,14 @@ public static MSQSelectDestination getSelectDestination(final QueryContext query
413425
return destination;
414426
}
415427

416-
public static int getRowsInMemory(final QueryContext queryContext)
428+
public static int getMaxRowsInMemory(final QueryContext queryContext)
417429
{
418-
return queryContext.getInt(CTX_ROWS_IN_MEMORY, DEFAULT_ROWS_IN_MEMORY);
430+
Integer ctxValue = queryContext.getInt(CTX_MAX_ROWS_IN_MEMORY);
431+
if (ctxValue == null) {
432+
ctxValue = queryContext.getInt(CTX_ROWS_IN_MEMORY);
433+
}
434+
435+
return ctxValue != null ? ctxValue : DEFAULT_MAX_ROWS_IN_MEMORY;
419436
}
420437

421438
public static Integer getMaxNumSegments(final QueryContext queryContext)

multi-stage-query/src/test/java/org/apache/druid/msq/indexing/MSQCompactionRunnerTest.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -758,7 +758,7 @@ private static MSQTuningConfig getExpectedTuningConfig()
758758
{
759759
return new MSQTuningConfig(
760760
1,
761-
MultiStageQueryContext.DEFAULT_ROWS_IN_MEMORY,
761+
MultiStageQueryContext.DEFAULT_MAX_ROWS_IN_MEMORY,
762762
MAX_ROWS_PER_SEGMENT,
763763
null,
764764
createIndexSpec()

multi-stage-query/src/test/java/org/apache/druid/msq/util/MultiStageQueryContextTest.java

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@
4949
import static org.apache.druid.msq.util.MultiStageQueryContext.CTX_FINALIZE_AGGREGATIONS;
5050
import static org.apache.druid.msq.util.MultiStageQueryContext.CTX_MAX_FRAME_SIZE;
5151
import static org.apache.druid.msq.util.MultiStageQueryContext.CTX_MAX_NUM_TASKS;
52+
import static org.apache.druid.msq.util.MultiStageQueryContext.CTX_MAX_ROWS_IN_MEMORY;
5253
import static org.apache.druid.msq.util.MultiStageQueryContext.CTX_MAX_THREADS;
5354
import static org.apache.druid.msq.util.MultiStageQueryContext.CTX_MSQ_MODE;
5455
import static org.apache.druid.msq.util.MultiStageQueryContext.CTX_REMOVE_NULL_BYTES;
@@ -159,19 +160,33 @@ public void getRowsPerSegment_set_returnsCorrectValue()
159160
}
160161

161162
@Test
162-
public void getRowsInMemory_unset_returnsDefaultValue()
163+
public void getMaxRowsInMemory_unset_returnsDefaultValue()
163164
{
164165
Assert.assertEquals(
165-
MultiStageQueryContext.DEFAULT_ROWS_IN_MEMORY,
166-
MultiStageQueryContext.getRowsInMemory(QueryContext.empty())
166+
MultiStageQueryContext.DEFAULT_MAX_ROWS_IN_MEMORY,
167+
MultiStageQueryContext.getMaxRowsInMemory(QueryContext.empty())
167168
);
168169
}
169170

170171
@Test
171-
public void getRowsInMemory_set_returnsCorrectValue()
172+
public void getMaxRowsInMemory_set_returnsCorrectValue()
172173
{
173-
Map<String, Object> propertyMap = ImmutableMap.of(CTX_ROWS_IN_MEMORY, 10);
174-
Assert.assertEquals(10, MultiStageQueryContext.getRowsInMemory(QueryContext.of(propertyMap)));
174+
Map<String, Object> propertyMap = ImmutableMap.of(CTX_MAX_ROWS_IN_MEMORY, 10);
175+
Assert.assertEquals(10, MultiStageQueryContext.getMaxRowsInMemory(QueryContext.of(propertyMap)));
176+
}
177+
178+
@Test
179+
public void getMaxRowsInMemory_altSet_returnsCorrectValue()
180+
{
181+
Map<String, Object> propertyMap = ImmutableMap.of(CTX_ROWS_IN_MEMORY, 20);
182+
Assert.assertEquals(20, MultiStageQueryContext.getMaxRowsInMemory(QueryContext.of(propertyMap)));
183+
}
184+
185+
@Test
186+
public void getMaxRowsInMemory_bothSet_returnsCorrectValue()
187+
{
188+
Map<String, Object> propertyMap = ImmutableMap.of(CTX_ROWS_IN_MEMORY, 20, CTX_MAX_ROWS_IN_MEMORY, 10);
189+
Assert.assertEquals(10, MultiStageQueryContext.getMaxRowsInMemory(QueryContext.of(propertyMap)));
175190
}
176191

177192
@Test

0 commit comments

Comments
 (0)