docs/source/user-guide/latest/compatibility.md (4 additions, 0 deletions)
@@ -89,6 +89,7 @@ The following cast operations are generally compatible with Spark except for the
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->

 <!--BEGIN:COMPAT_CAST_TABLE-->
+<!-- prettier-ignore-start -->
 | From Type | To Type | Notes |
 |-|-|-|
 | boolean | byte ||
@@ -165,6 +166,7 @@ The following cast operations are generally compatible with Spark except for the
 | timestamp | long ||
 | timestamp | string ||
 | timestamp | date ||
+<!-- prettier-ignore-end -->
 <!--END:COMPAT_CAST_TABLE-->

 ### Incompatible Casts
@@ -174,6 +176,7 @@ The following cast operations are not compatible with Spark for all inputs and a
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->

 <!--BEGIN:INCOMPAT_CAST_TABLE-->
+<!-- prettier-ignore-start -->
 | From Type | To Type | Notes |
 |-|-|-|
 | float | decimal | There can be rounding differences |
@@ -182,6 +185,7 @@ The following cast operations are not compatible with Spark for all inputs and a
 | string | double | Does not support inputs ending with 'd' or 'f'. Does not support 'inf'. Does not support ANSI mode. |
 | string | decimal | Does not support inputs ending with 'd' or 'f'. Does not support 'inf'. Does not support ANSI mode. Returns 0.0 instead of null if the input contains no digits |
 | string | timestamp | Not all valid formats are supported |
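To make the string-to-decimal divergence concrete, here is a minimal Python sketch of the behavior the incompatible-casts table describes. It models Spark's null result with `None`; the function names and the digit check are illustrative assumptions, not Comet's actual implementation.

```python
from decimal import Decimal, InvalidOperation


def spark_string_to_decimal(s: str):
    # Spark returns null (modeled here as None) when the string
    # is not a parsable decimal number.
    try:
        return Decimal(s.strip())
    except InvalidOperation:
        return None


def comet_string_to_decimal(s: str):
    # Per the table above, Comet returns 0.0 instead of null when the
    # input contains no digits at all (hypothetical model of that note).
    value = spark_string_to_decimal(s)
    if value is None and not any(c.isdigit() for c in s):
        return Decimal("0.0")
    return value


print(spark_string_to_decimal("abc"))   # None
print(comet_string_to_decimal("abc"))   # 0.0
```

Inputs such as `"1.5d"` also raise `InvalidOperation` here, loosely mirroring the "does not support inputs ending with 'd' or 'f'" note, though Comet's real handling of those inputs is not modeled.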
docs/source/user-guide/latest/configs.md (20 additions, 0 deletions)
@@ -25,19 +25,22 @@ Comet provides the following configuration settings.

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[scan]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.scan.allowIncompatible`| Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 |`spark.comet.scan.enabled`| Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | true |
 |`spark.comet.scan.preFetch.enabled`| Whether to enable the pre-fetching feature of CometScan. | false |
 |`spark.comet.scan.preFetch.threadNum`| The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory is required to store pre-fetched row groups. | 2 |
 |`spark.hadoop.fs.comet.libhdfs.schemes`| Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side accesses via libhdfs, separated by commas. Valid only when built with the hdfs feature enabled. ||
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Parquet Reader Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[parquet]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.parquet.enable.directBuffer`| Whether to use Java direct byte buffer when reading Parquet. | false |
@@ -47,12 +50,14 @@ Comet provides the following configuration settings.
 |`spark.comet.parquet.read.parallel.io.enabled`| Whether to enable Comet's parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. | true |
 |`spark.comet.parquet.read.parallel.io.thread-pool.size`| The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number. | 16 |
 |`spark.comet.parquet.respectFilterPushdown`| Whether to respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config. This needs to be respected when running the Spark SQL test suite, but it results in poor performance in Comet when using the new native scans, so it is disabled by default. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Query Execution Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[exec]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.caseConversion.enabled`| Java uses locale-specific rules when converting strings to upper or lower case and Rust does not, so `upper` and `lower` are disabled by default. | false |
@@ -67,6 +72,7 @@ Comet provides the following configuration settings.
 |`spark.comet.metrics.updateInterval`| The interval in milliseconds to update metrics. If the interval is negative, metrics will be updated upon task completion. | 3000 |
 |`spark.comet.nativeLoadRequired`| Whether to require the Comet native library to load successfully when Comet is enabled. If not, Comet will silently fall back to Spark when it fails to load the native lib. Otherwise, an error will be thrown and the Spark job will be aborted. | false |
 |`spark.comet.regexp.allowIncompatible`| Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Viewing Explain Plan & Fallback Reasons
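The note on `spark.comet.scan.enabled` above states that native vectorized execution requires both that config and `spark.comet.exec.enabled` to be enabled. A minimal sketch of that rule over string-valued conf entries; the helper name and the `"false"` fallback are illustrative assumptions, not Comet's real defaults or implementation.

```python
def native_vectorized_execution_enabled(conf: dict) -> bool:
    # Model of the documented rule: both scan and exec must be enabled.
    # Missing keys are treated as "false" here for simplicity; the real
    # defaults live in Comet's config handling, not this sketch.
    def is_true(key: str) -> bool:
        return str(conf.get(key, "false")).lower() == "true"

    return is_true("spark.comet.scan.enabled") and is_true("spark.comet.exec.enabled")


conf = {
    "spark.comet.scan.enabled": "true",
    "spark.comet.exec.enabled": "true",
}
print(native_vectorized_execution_enabled(conf))  # True
```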
@@ -75,19 +81,22 @@ These settings can be used to determine which parts of the plan are accelerated

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[exec_explain]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.explain.format`| Choose the extended explain output. The default format of 'verbose' will provide the full query plan annotated with fallback reasons as well as a summary of how much of the plan was accelerated by Comet. The format 'fallback' provides a list of fallback reasons instead. | verbose |
 |`spark.comet.explain.native.enabled`| When this setting is enabled, Comet will provide a tree representation of the native query plan before execution and again after execution, with metrics. | false |
 |`spark.comet.explain.rules`| When this setting is enabled, Comet will log all plan transformations performed in physical optimizer rules. | false |
 |`spark.comet.explainFallback.enabled`| When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
 |`spark.comet.logFallbackReasons.enabled`| When this setting is enabled, Comet will log warnings for all fallback reasons. It can be overridden by the environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Shuffle Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[shuffle]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.columnar.shuffle.async.enabled`| Whether to enable asynchronous shuffle for Arrow-based shuffle. | false |
@@ -101,24 +110,28 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.native.shuffle.partitioning.range.enabled`| Whether to enable range partitioning for Comet native shuffle. | true |
 |`spark.comet.shuffle.preferDictionary.ratio`| The ratio of total values to distinct values in a string column used to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used when shuffling the string column. This config is effective only if it is higher than 1.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
 |`spark.comet.shuffle.sizeInBytesMultiplier`| Comet reports smaller sizes for shuffle due to using Arrow's columnar memory format, and this can result in Spark choosing a different join strategy due to the estimated size of the exchange being smaller. Comet will multiply sizeInBytes by this amount to avoid regressions in join strategy. | 1.0 |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Memory & Tuning Configuration Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[tuning]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.batchSize`| The columnar batch size, i.e., the maximum number of rows that a batch can contain. | 8192 |
 |`spark.comet.exec.memoryPool`| The type of memory pool to be used for Comet native execution when running Spark in off-heap mode. Available pool types are `greedy_unified` and `fair_unified`. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | fair_unified |
 |`spark.comet.exec.memoryPool.fraction`| Fraction of the off-heap memory pool that is available to Comet. Only applies to off-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
 |`spark.comet.tracing.enabled`| Enable fine-grained tracing of events and memory usage. For more information, refer to the [Comet Tracing Guide](https://datafusion.apache.org/comet/user-guide/tracing.html). | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Development & Testing Settings

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[testing]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.columnar.shuffle.memory.factor`| Fraction of Comet memory to be allocated per executor process for columnar shuffle when running in on-heap mode. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
@@ -131,12 +144,14 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.sparkToColumnar.enabled`| Whether to enable Spark to Arrow columnar conversion. When this is turned on, Comet will convert operators in `spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format before processing. This is an experimental feature and has known issues with non-UTC timezones. | false |
 |`spark.comet.sparkToColumnar.supportedOperatorList`| A comma-separated list of operators that will be converted to Arrow columnar format when `spark.comet.sparkToColumnar.enabled` is true. | Range,InMemoryTableScan,RDDScan |
 |`spark.comet.testing.strict`| Experimental option to enable strict testing, which will fail tests that could be more comprehensive, such as checking for a specific fallback reason. It can be overridden by the environment variable `ENABLE_COMET_STRICT_TESTING`. | false |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Operators

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_exec]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.exec.aggregate.enabled`| Whether to enable aggregate by default. | true |
@@ -157,12 +172,14 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.exec.takeOrderedAndProject.enabled`| Whether to enable takeOrderedAndProject by default. | true |
 |`spark.comet.exec.union.enabled`| Whether to enable union by default. | true |
 |`spark.comet.exec.window.enabled`| Whether to enable window by default. | true |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Scalar Expressions

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_expr]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.expression.Abs.enabled`| Enable Comet acceleration for `Abs` | true |
@@ -308,12 +325,14 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.expression.WeekOfYear.enabled`| Enable Comet acceleration for `WeekOfYear` | true |
 |`spark.comet.expression.XxHash64.enabled`| Enable Comet acceleration for `XxHash64` | true |
 |`spark.comet.expression.Year.enabled`| Enable Comet acceleration for `Year` | true |
+<!-- prettier-ignore-end -->
 <!--END:CONFIG_TABLE-->

 ## Enabling or Disabling Individual Aggregate Expressions

 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 <!--BEGIN:CONFIG_TABLE[enable_agg_expr]-->
+<!-- prettier-ignore-start -->
 | Config | Description | Default Value |
 |--------|-------------|---------------|
 |`spark.comet.expression.Average.enabled`| Enable Comet acceleration for `Average` | true |
@@ -334,4 +353,5 @@ These settings can be used to determine which parts of the plan are accelerated
 |`spark.comet.expression.Sum.enabled`| Enable Comet acceleration for `Sum` | true |
 |`spark.comet.expression.VariancePop.enabled`| Enable Comet acceleration for `VariancePop` | true |
 |`spark.comet.expression.VarianceSamp.enabled`| Enable Comet acceleration for `VarianceSamp` | true |
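The per-expression flags in the tables above all follow the pattern `spark.comet.expression.<Name>.enabled`. A small sketch of building these keys programmatically; the helper names are hypothetical and not part of Comet itself.

```python
def expression_conf_key(expr_name: str) -> str:
    # Key pattern used by the per-expression tables above.
    return f"spark.comet.expression.{expr_name}.enabled"


def disable_expressions(*names: str) -> dict:
    # Conf entries that turn off Comet acceleration for specific expressions,
    # e.g. to pass with --conf flags or spark.conf.set.
    return {expression_conf_key(n): "false" for n in names}


print(expression_conf_key("Abs"))  # spark.comet.expression.Abs.enabled
print(disable_expressions("Average", "Sum"))
```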