Commit 026c649

Steve Vaughan Jr authored and andygrove committed

docs: Ignore prettier formatting for generated tables (apache#2790)

1 parent adba587

File tree

3 files changed: +32 −0 lines changed

docs/source/user-guide/latest/compatibility.md

Lines changed: 4 additions & 0 deletions

@@ -89,6 +89,7 @@ The following cast operations are generally compatible with Spark except for the
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->

 <!--BEGIN:COMPAT_CAST_TABLE-->
+<!-- prettier-ignore-start -->
 | From Type | To Type | Notes |
 |-|-|-|
 | boolean | byte | |
@@ -165,6 +166,7 @@ The following cast operations are generally compatible with Spark except for the
 | timestamp | long | |
 | timestamp | string | |
 | timestamp | date | |
+<!-- prettier-ignore-end -->
 <!--END:COMPAT_CAST_TABLE-->

 ### Incompatible Casts
@@ -174,6 +176,7 @@ The following cast operations are not compatible with Spark for all inputs and a
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->

 <!--BEGIN:INCOMPAT_CAST_TABLE-->
+<!-- prettier-ignore-start -->
 | From Type | To Type | Notes |
 |-|-|-|
 | float | decimal | There can be rounding differences |
@@ -182,6 +185,7 @@ The following cast operations are not compatible with Spark for all inputs and a
 | string | double | Does not support inputs ending with 'd' or 'f'. Does not support 'inf'. Does not support ANSI mode. |
 | string | decimal | Does not support inputs ending with 'd' or 'f'. Does not support 'inf'. Does not support ANSI mode. Returns 0.0 instead of null if input contains no digits |
 | string | timestamp | Not all valid formats are supported |
+<!-- prettier-ignore-end -->
 <!--END:INCOMPAT_CAST_TABLE-->

 ### Unsupported Casts

docs/source/user-guide/latest/configs.md

Lines changed: 20 additions & 0 deletions

@@ -25,19 +25,22 @@ Comet provides the following configuration settings.

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[scan]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.scan.allowIncompatible` | Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 | `spark.comet.scan.enabled` | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | true |
 | `spark.comet.scan.preFetch.enabled` | Whether to enable pre-fetching feature of CometScan. | false |
 | `spark.comet.scan.preFetch.threadNum` | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
 | `spark.hadoop.fs.comet.libhdfs.schemes` | Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side accesses via libhdfs, separated by commas. Valid only when built with hdfs feature enabled. | |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Parquet Reader Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[parquet]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.parquet.enable.directBuffer` | Whether to use Java direct byte buffer when reading Parquet. | false |
@@ -47,12 +50,14 @@ Comet provides the following configuration settings.
 | `spark.comet.parquet.read.parallel.io.enabled` | Whether to enable Comet's parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. | true |
 | `spark.comet.parquet.read.parallel.io.thread-pool.size` | The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number. | 16 |
 | `spark.comet.parquet.respectFilterPushdown` | Whether to respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config. This needs to be respected when running the Spark SQL test suite but the default setting results in poor performance in Comet when using the new native scans, disabled by default | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Query Execution Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[exec]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.caseConversion.enabled` | Java uses locale-specific rules when converting strings to upper or lower case and Rust does not, so we disable upper and lower by default. | false |
@@ -67,6 +72,7 @@ Comet provides the following configuration settings.
 | `spark.comet.metrics.updateInterval` | The interval in milliseconds to update metrics. If interval is negative, metrics will be updated upon task completion. | 3000 |
 | `spark.comet.nativeLoadRequired` | Whether to require Comet native library to load successfully when Comet is enabled. If not, Comet will silently fallback to Spark when it fails to load the native lib. Otherwise, an error will be thrown and the Spark job will be aborted. | false |
 | `spark.comet.regexp.allowIncompatible` | Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Viewing Explain Plan & Fallback Reasons
@@ -75,19 +81,22 @@ These settings can be used to determine which parts of the plan are accelerated

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[exec_explain]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.explain.format` | Choose extended explain output. The default format of 'verbose' will provide the full query plan annotated with fallback reasons as well as a summary of how much of the plan was accelerated by Comet. The format 'fallback' provides a list of fallback reasons instead. | verbose |
 | `spark.comet.explain.native.enabled` | When this setting is enabled, Comet will provide a tree representation of the native query plan before execution and again after execution, with metrics. | false |
 | `spark.comet.explain.rules` | When this setting is enabled, Comet will log all plan transformations performed in physical optimizer rules. Default: false | false |
 | `spark.comet.explainFallback.enabled` | When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
 | `spark.comet.logFallbackReasons.enabled` | When this setting is enabled, Comet will log warnings for all fallback reasons. It can be overridden by the environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Shuffle Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[shuffle]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.columnar.shuffle.async.enabled` | Whether to enable asynchronous shuffle for Arrow-based shuffle. | false |
@@ -101,24 +110,28 @@ These settings can be used to determine which parts of the plan are accelerated
 | `spark.comet.native.shuffle.partitioning.range.enabled` | Whether to enable range partitioning for Comet native shuffle. | true |
 | `spark.comet.shuffle.preferDictionary.ratio` | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
 | `spark.comet.shuffle.sizeInBytesMultiplier` | Comet reports smaller sizes for shuffle due to using Arrow's columnar memory format and this can result in Spark choosing a different join strategy due to the estimated size of the exchange being smaller. Comet will multiple sizeInBytes by this amount to avoid regressions in join strategy. | 1.0 |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Memory & Tuning Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[tuning]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.batchSize` | The columnar batch size, i.e., the maximum number of rows that a batch can contain. | 8192 |
 | `spark.comet.exec.memoryPool` | The type of memory pool to be used for Comet native execution when running Spark in off-heap mode. Available pool types are `greedy_unified` and `fair_unified`. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | fair_unified |
 | `spark.comet.exec.memoryPool.fraction` | Fraction of off-heap memory pool that is available to Comet. Only applies to off-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
 | `spark.comet.tracing.enabled` | Enable fine-grained tracing of events and memory usage. For more information, refer to the [Comet Tracing Guide](https://datafusion.apache.org/comet/user-guide/tracing.html). | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Development & Testing Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[testing]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.columnar.shuffle.memory.factor` | Fraction of Comet memory to be allocated per executor process for columnar shuffle when running in on-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
@@ -131,12 +144,14 @@ These settings can be used to determine which parts of the plan are accelerated
 | `spark.comet.sparkToColumnar.enabled` | Whether to enable Spark to Arrow columnar conversion. When this is turned on, Comet will convert operators in `spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format before processing. This is an experimental feature and has known issues with non-UTC timezones. | false |
 | `spark.comet.sparkToColumnar.supportedOperatorList` | A comma-separated list of operators that will be converted to Arrow columnar format when `spark.comet.sparkToColumnar.enabled` is true. | Range,InMemoryTableScan,RDDScan |
 | `spark.comet.testing.strict` | Experimental option to enable strict testing, which will fail tests that could be more comprehensive, such as checking for a specific fallback reason. It can be overridden by the environment variable `ENABLE_COMET_STRICT_TESTING`. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Operators

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_exec]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.exec.aggregate.enabled` | Whether to enable aggregate by default. | true |
@@ -157,12 +172,14 @@ These settings can be used to determine which parts of the plan are accelerated
 | `spark.comet.exec.takeOrderedAndProject.enabled` | Whether to enable takeOrderedAndProject by default. | true |
 | `spark.comet.exec.union.enabled` | Whether to enable union by default. | true |
 | `spark.comet.exec.window.enabled` | Whether to enable window by default. | true |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Scalar Expressions

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_expr]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.expression.Abs.enabled` | Enable Comet acceleration for `Abs` | true |
@@ -308,12 +325,14 @@ These settings can be used to determine which parts of the plan are accelerated
 | `spark.comet.expression.WeekOfYear.enabled` | Enable Comet acceleration for `WeekOfYear` | true |
 | `spark.comet.expression.XxHash64.enabled` | Enable Comet acceleration for `XxHash64` | true |
 | `spark.comet.expression.Year.enabled` | Enable Comet acceleration for `Year` | true |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Aggregate Expressions

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_agg_expr]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 | `spark.comet.expression.Average.enabled` | Enable Comet acceleration for `Average` | true |
@@ -334,4 +353,5 @@ These settings can be used to determine which parts of the plan are accelerated
 | `spark.comet.expression.Sum.enabled` | Enable Comet acceleration for `Sum` | true |
 | `spark.comet.expression.VariancePop.enabled` | Enable Comet acceleration for `VariancePop` | true |
 | `spark.comet.expression.VarianceSamp.enabled` | Enable Comet acceleration for `VarianceSamp` | true |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

spark/src/main/scala/org/apache/comet/GenerateDocs.scala

Lines changed: 8 additions & 0 deletions

@@ -52,6 +52,7 @@ object GenerateDocs {
         w.write(s"${line.stripTrailing()}\n".getBytes)
         line match {
           case pattern(category) =>
+            w.write("<!-- prettier-ignore-start -->\n".getBytes)
             w.write("| Config | Description | Default Value |\n".getBytes)
             w.write("|--------|-------------|---------------|\n".getBytes)
             category match {
@@ -61,12 +62,14 @@
                   w.write(
                     s"| `$config` | Enable Comet acceleration for `$expr` | true |\n".getBytes)
                 }
+                w.write("<!-- prettier-ignore-end -->\n".getBytes)
               case "enable_agg_expr" =>
                 for (expr <- QueryPlanSerde.aggrSerdeMap.keys.map(_.getSimpleName).toList.sorted) {
                   val config = s"spark.comet.expression.$expr.enabled"
                   w.write(
                     s"| `$config` | Enable Comet acceleration for `$expr` | true |\n".getBytes)
                 }
+                w.write("<!-- prettier-ignore-end -->\n".getBytes)
               case _ =>
                 val urlPattern = """Comet\s+(Compatibility|Tuning|Tracing)\s+Guide\s+\(""".r
                 val confs = publicConfigs.filter(_.category == category).toList.sortBy(_.key)
@@ -93,6 +96,7 @@
                     }
                   }
                 }
+                w.write("<!-- prettier-ignore-end -->\n".getBytes)
             }
           case _ =>
         }
@@ -106,6 +110,7 @@
       for (line <- lines) {
         w.write(s"${line.stripTrailing()}\n".getBytes)
         if (line.trim == "<!--BEGIN:COMPAT_CAST_TABLE-->") {
+          w.write("<!-- prettier-ignore-start -->\n".getBytes)
           w.write("| From Type | To Type | Notes |\n".getBytes)
           w.write("|-|-|-|\n".getBytes)
           for (fromType <- CometCast.supportedTypes) {
@@ -123,7 +128,9 @@
              }
            }
          }
+          w.write("<!-- prettier-ignore-end -->\n".getBytes)
         } else if (line.trim == "<!--BEGIN:INCOMPAT_CAST_TABLE-->") {
+          w.write("<!-- prettier-ignore-start -->\n".getBytes)
           w.write("| From Type | To Type | Notes |\n".getBytes)
           w.write("|-|-|-|\n".getBytes)
           for (fromType <- CometCast.supportedTypes) {
@@ -140,6 +147,7 @@
             }
           }
         }
+        w.write("<!-- prettier-ignore-end -->\n".getBytes)
       }
     }
     w.close()
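The pattern the Scala changes above apply can be sketched in isolation: emit `<!-- prettier-ignore-start -->` before a generated Markdown table and `<!-- prettier-ignore-end -->` after it, so Prettier leaves the compact generated layout (e.g. the `|-|-|-|` delimiter row) untouched. The object and method names below are hypothetical, not the actual GenerateDocs API.

```scala
// Minimal sketch (hypothetical names): wrap a generated Markdown table
// in prettier-ignore comments so Prettier skips reformatting it.
object PrettierIgnoreSketch {

  // Render a cast-compatibility-style table, fenced by the ignore markers.
  def renderTable(rows: Seq[(String, String, String)]): String = {
    val sb = new StringBuilder
    sb.append("<!-- prettier-ignore-start -->\n")
    sb.append("| From Type | To Type | Notes |\n")
    sb.append("|-|-|-|\n")
    for ((from, to, notes) <- rows) {
      sb.append(s"| $from | $to | $notes |\n")
    }
    sb.append("<!-- prettier-ignore-end -->\n")
    sb.toString
  }

  def main(args: Array[String]): Unit =
    print(renderTable(Seq(("boolean", "byte", ""), ("timestamp", "date", ""))))
}
```

Writing the end marker once per table branch (as the diff does in each `case`) keeps every generated table fenced even though the branches emit different row content.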
