fix(help): fine-tune markdown generation of docopt usage text (#3600)

jqnatividad · claude · web-flow · commit 8f57e8715a12 · 2026-03-10T10:11:27.000-04:00
* fix(help): process list items in help parsing properly

Treat lines beginning with "- " or "* " as list items when parsing help text instead of as section breaks. Such lines are now appended as HTML bullets ("<br>• " + content with the marker removed). Also adjust the break condition so a hyphen followed by a space ("- ") is not interpreted as a new option/section. Changes applied to parse_arguments_section and parse_option_line.
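
A minimal sketch of the list-item handling described above, not the actual code in src/help_markdown_gen.rs. The helper names `list_item_content` and `append_description_line` are hypothetical; only the "- " / "* " markers and the "<br>• " prefix come from this commit message.

```rust
/// Hypothetical helper: if `line` starts with a "- " or "* " marker
/// (after leading whitespace), return its content with the marker removed.
/// Option flags such as "--flag" or values such as "-2:" do not match,
/// because the marker must be followed by a space.
fn list_item_content(line: &str) -> Option<&str> {
    let trimmed = line.trim_start();
    trimmed
        .strip_prefix("- ")
        .or_else(|| trimmed.strip_prefix("* "))
        .map(str::trim_end)
}

/// Hypothetical helper: append one help-text line to the description
/// that is being accumulated for the current option or argument.
fn append_description_line(description: &mut String, line: &str) {
    match list_item_content(line) {
        // List items become inline HTML bullets.
        Some(item) => {
            description.push_str("<br>• ");
            description.push_str(item);
        }
        // Anything else is joined to the running text with a single space.
        None => {
            let text = line.trim();
            if text.is_empty() {
                return;
            }
            if !description.is_empty() {
                description.push(' ');
            }
            description.push_str(text);
        }
    }
}
```

Requiring the space after the marker is what keeps "-2: ..." value lines and "--option" lines from being mistaken for bullets in this sketch.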

* docs(describegpt): wordsmith usage text to get rid of `<dir>`

which was causing markdown formatting error

* refactor(help): use inline HTML for bullets so they render properly in GH markdown inside tables

* docs(help): render bullets in options properly

* docs(help): render new pragmastat options

* docs(help): update help markdown to remove `<dir>` causing markdown rendering issues

also capitalize MCP

* fix(help): separate overly-merged list items and trim trailing whitespace in bullets

Use indentation tracking to detect when a non-bullet line in a list is
post-list text rather than a continuation of the current bullet item.
Also trim trailing whitespace from list item content before closing </li>.
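
The indentation rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the generator's real code: the function names are hypothetical, and the rule "a non-bullet line indented deeper than the bullet marker continues the item, anything else ends the list" is an assumption inferred from the joinp help text layout.

```rust
/// Number of leading space characters on a line.
fn indent_of(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Hypothetical renderer: turns the lines of one description into inline HTML,
/// emitting <ul><li>...</li></ul> for bullet runs.
fn render_description(lines: &[&str]) -> String {
    let mut out = String::new();
    // Indentation of the current list's bullet markers; None when not in a list.
    let mut list_indent: Option<usize> = None;

    for line in lines {
        let indent = indent_of(line);
        // Trimming here also removes trailing whitespace before any </li>.
        let text = line.trim();
        if text.is_empty() {
            continue;
        }
        let item = text.strip_prefix("* ").or_else(|| text.strip_prefix("- "));

        if let Some(item) = item {
            // A bullet either opens a list or closes the previous item.
            out.push_str(if list_indent.is_some() { "</li><li>" } else { "<ul><li>" });
            list_indent = Some(indent);
            out.push_str(item);
        } else if matches!(list_indent, Some(marker) if indent > marker) {
            // Deeper than the marker: continuation of the current bullet.
            out.push(' ');
            out.push_str(text);
        } else {
            // At or left of the marker: post-list text, so close the list first.
            if list_indent.take().is_some() {
                out.push_str("</li></ul>");
            }
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(text);
        }
    }
    if list_indent.is_some() {
        out.push_str("</li></ul>");
    }
    out
}
```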

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(help): Treat U+200E as line break in help generator

Recognize the Left-to-Right mark (U+200E) as an intentional line break when generating help markdown and render it as <br> in output. Update help text for the joinp command to use HTML <br> for line breaks and fix minor punctuation/formatting in the cached schema option. This ensures docopt-inserted U+200E markers (used to avoid parsing negative numbers as flags) are displayed properly in generated docs.
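
As a rough illustration of the U+200E handling, the sketch below joins description lines and starts a new `<br>`-separated line whenever a line begins with the mark. The names `LRM` and `join_with_line_breaks` are hypothetical, and the real generator may detect the marker differently.

```rust
/// The Left-to-Right Mark that the help text places in front of lines
/// such as "-2: ..." so docopt does not parse them as flags.
const LRM: char = '\u{200E}';

/// Hypothetical helper: join description lines into one table cell,
/// rendering a leading U+200E as an intentional line break.
fn join_with_line_breaks(lines: &[&str]) -> String {
    let mut out = String::new();
    for line in lines {
        // str::trim does not strip U+200E, since it is not Unicode whitespace.
        let trimmed = line.trim();
        let (is_break, text) = match trimmed.strip_prefix(LRM) {
            Some(rest) => (true, rest.trim()),
            None => (false, trimmed),
        };
        if text.is_empty() {
            continue;
        }
        if is_break {
            out.push_str("<br>");
        } else if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(text);
    }
    out
}
```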

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/docs/help/describegpt.md b/docs/help/describegpt.md
@@ -5,7 +5,7 @@
 **[Table of Contents](TableOfContents.md)** | **Source: [src/cmd/describegpt.rs](https://github.com/dathere/qsv/blob/master/src/cmd/describegpt.rs)** | [🌐](TableOfContents.md#legend "has web-aware options.")[🤖](TableOfContents.md#legend "command uses Natural Language Processing or Generative AI.")[🪄](TableOfContents.md#legend "\"automagical\" commands that uses stats and/or frequency tables to work \"smarter\" & \"faster\".")[🗃️](TableOfContents.md#legend "Limited Extended input support.")[📚](TableOfContents.md#legend "has lookup table support, enabling runtime \"lookups\" against local or remote reference CSVs.")[⛩️](TableOfContents.md#legend "uses Mini Jinja template engine.") [![CKAN](../images/ckan.png)](TableOfContents.md#legend "has CKAN-aware integration options.")
 
 <a name="nav"></a>
-[Description](#description) | [Examples](#examples) | [Usage](#usage) | [Data Analysis/Inferencing Options](#data-analysis/inferencing-options) | [Dictionary Options](#dictionary-options) | [Tag Options](#tag-options) | [Stats/Frequency Options](#stats/frequency-options) | [Custom Prompt Options](#custom-prompt-options) | [LLM API Options](#llm-api-options) | [Caching Options](#caching-options) | [Mcp Sampling Options](#mcp-sampling-options) | [Common Options](#common-options)
+[Description](#description) | [Examples](#examples) | [Usage](#usage) | [Data Analysis/Inferencing Options](#data-analysis/inferencing-options) | [Dictionary Options](#dictionary-options) | [Tag Options](#tag-options) | [Stats/Frequency Options](#stats/frequency-options) | [Custom Prompt Options](#custom-prompt-options) | [LLM API Options](#llm-api-options) | [Caching Options](#caching-options) | [MCP Sampling Options](#mcp-sampling-options) | [Common Options](#common-options)
 
 <a name="description"></a>
 
@@ -276,15 +276,15 @@ qsv describegpt --help
 | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
 |--------|------|-------------|--------|
 | &nbsp;`--no-cache`&nbsp; | flag | Disable default disk cache. |  |
-| &nbsp;`--disk-cache-dir`&nbsp; | string | The directory <dir> to store the disk cache. Note that if the directory does not exist, it will be created. If the directory exists, it will be used as is, and will not be flushed. This option allows you to maintain several disk caches for different describegpt jobs (e.g. one for a data portal, another for internal data exchange, etc.) | `~/.qsv/cache/describegpt` |
+| &nbsp;`--disk-cache-dir`&nbsp; | string | The directory to store the disk cache. Note that if the directory does not exist, it will be created. If the directory exists, it will be used as is, and will not be flushed. This option allows you to maintain several disk caches for different describegpt jobs (e.g. one for a data portal, another for internal data exchange). | `~/.qsv/cache/describegpt` |
 | &nbsp;`--redis-cache`&nbsp; | flag | Use Redis instead of the default disk cache to cache LLM completions. It connects to "redis://127.0.0.1:6379/3" by default, with a connection pool size of 20, with a TTL of 28 days, and cache hits NOT refreshing an existing cached value's TTL. This option automatically disables the disk cache. |  |
 | &nbsp;`--fresh`&nbsp; | flag | Send a fresh request to the LLM API, refreshing a cached response if it exists. When a --prompt SQL query fails, you can also use this option to request the LLM to generate a new SQL query. |  |
 | &nbsp;`--forget`&nbsp; | flag | Remove a cached response if it exists and then exit. |  |
 | &nbsp;`--flush-cache`&nbsp; | flag | Flush the current cache entries on startup. WARNING: This operation is irreversible. |  |
 
 <a name="mcp-sampling-options"></a>
 
-## Mcp Sampling Options [↩](#nav)
+## MCP Sampling Options [↩](#nav)
 
 | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
 |--------|------|-------------|--------|
diff --git a/docs/help/excel.md b/docs/help/excel.md
@@ -171,7 +171,7 @@ qsv excel --help
 | &nbsp;`--table`&nbsp; | string | An Excel table (case-insensitive) to extract to a CSV. Only valid for XLSX files. The --sheet option is ignored as a table could be in any sheet. Overrides --range option. |  |
 | &nbsp;`--range`&nbsp; | string | An Excel format range - like RangeName, C:T, C3:T25 or 'Sheet1!C3:T25' to extract to the CSV. If the specified range contains the required sheet, the --sheet option is ignored. If the range is not found, qsv will exit with an error. |  |
 | &nbsp;`--cell`&nbsp; | string | A single cell reference - like C3 or 'Sheet1!C3' to extract. This is a convenience option equivalent to --range C3:C3. If both --cell and --range are specified, --cell takes precedence. |  |
-| &nbsp;`--error-format`&nbsp; | string | The format to use when formatting error cells. There are 3 formats: * "code": return the error code. (#DIV/0!; #N/A; #NAME?; #NULL!; #NUM!; #REF!; #VALUE!; #DATA!) * "formula": return the formula, prefixed with '#'. (e.g. #=A1/B1 where B1 is 0; #=100/0) * "both": return both error code and the formula. (e.g. #DIV/0!: =A1/B1) | `code` |
+| &nbsp;`--error-format`&nbsp; | string | The format to use when formatting error cells. There are 3 formats:<ul><li>"code": return the error code. (#DIV/0!; #N/A; #NAME?; #NULL!; #NUM!; #REF!; #VALUE!; #DATA!)</li><li>"formula": return the formula, prefixed with '#'. (e.g. #=A1/B1 where B1 is 0; #=100/0)</li><li>"both": return both error code and the formula. (e.g. #DIV/0!: =A1/B1)</li></ul> | `code` |
 | &nbsp;`--flexible`&nbsp; | flag | Continue even if the number of columns is different from row to row. |  |
 | &nbsp;`--trim`&nbsp; | flag | Trim all fields so that leading & trailing whitespaces are removed. Also removes embedded linebreaks. |  |
 | &nbsp;`--date-format`&nbsp; | string | Optional date format to use when formatting dates. See <https://docs.rs/chrono/latest/chrono/format/strftime/index.html> for the full list of supported format specifiers. Note that if a date format is invalid, qsv will fall back and return the date as if no date-format was specified. |  |
diff --git a/docs/help/geoconvert.md b/docs/help/geoconvert.md
@@ -59,7 +59,7 @@ qsv geoconvert --help
 |----------|-------------|
 | &nbsp;`<input>`&nbsp; | The spatial file to convert. To use stdin instead, use a dash "-". Note: SHP input must be a path to a .shp file and cannot use stdin. |
 | &nbsp;`<input-format>`&nbsp; | Valid values are "geojson", "shp", and "csv" |
-| &nbsp;`<output-format>`&nbsp; | Valid values are: |
+| &nbsp;`<output-format>`&nbsp; | Valid values are:<ul><li>For GeoJSON input: "csv", "svg", and "geojsonl"</li><li>For SHP input: "csv", "geojson", and "geojsonl"</li><li>For CSV input: "geojson", "geojsonl", "csv", and "svg"</li></ul> |
 
 <a name="geoconvert-options"></a>
 
diff --git a/docs/help/joinp.md b/docs/help/joinp.md
@@ -74,7 +74,7 @@ qsv joinp --help
 |--------|------|-------------|--------|
 | &nbsp;`--try-parsedates`&nbsp; | flag | When set, will attempt to parse the columns as dates. If the parse fails, columns remain as strings. This is useful when the join keys are formatted as dates with differing date formats, as the date formats will be normalized. Note that this will be automatically enabled when using asof joins. |  |
 | &nbsp;`--infer-len`&nbsp; | string | The number of rows to scan when inferring the schema of the CSV. Set to 0 to do a full table scan (warning: very slow). Only used when --cache-schema is 0 or 1 and no cached schema exists or when --infer-len is 0. | `10000` |
-| &nbsp;`--cache-schema`&nbsp; | string | Create and cache Polars schema JSON files. Ignored when --infer-len is 0. ‎ -2: treat all columns as String. A Polars schema file is created & cached. ‎ -1: treat all columns as String. No Polars schema file is created. 0: do not cache Polars schema. Uses --infer-len to infer schema. 1: cache Polars schema with the following behavior: * If schema file exists and is newer than input: use cached schema * If schema file missing/outdated and stats cache exists: derive schema from stats and cache it * If no schema or stats cache: infer schema using --infer-len and cache the result Schema files use the same name as input with .pschema.json extension (e.g., data.csv -> data.pschema.json) NOTE: If the input files have pschema.json files that are newer or created at the same time as the input files, they will be used to inform the join operation regardless of the value of --cache-schema unless --infer-len is 0. | `0` |
+| &nbsp;`--cache-schema`&nbsp; | string | Create and cache Polars schema JSON files. Ignored when --infer-len is 0.<br>-2: treat all columns as String. A Polars schema file is created & cached.<br>-1: treat all columns as String. No Polars schema file is created.<br>0: do not cache Polars schema. Uses --infer-len to infer schema.<br>1: cache Polars schema with the following behavior:<ul><li>If schema file exists and is newer than input: use cached schema</li><li>If schema file missing/outdated and stats cache exists: derive schema from stats and cache it</li><li>If no schema or stats cache: infer schema using --infer-len and cache the result</li></ul> Schema files use the same name as input with .pschema.json extension (e.g., data.csv -> data.pschema.json).<br>NOTE: If the input files have pschema.json files that are newer or created at the same time as the input files, they will be used to inform the join operation regardless of the value of --cache-schema unless --infer-len is 0. | `0` |
 | &nbsp;`--low-memory`&nbsp; | flag | Use low memory mode when parsing CSVs. This will use less memory but will be slower. It will also process the join in streaming mode. Only use this when you get out of memory errors. |  |
 | &nbsp;`--no-optimizations`&nbsp; | flag | Disable non-default join optimizations. This will make joins slower. Only use this when you get join errors. |  |
 | &nbsp;`--ignore-errors`&nbsp; | flag | Ignore errors when parsing CSVs. If set, rows with errors will be skipped. If not set, the query will fail. Only use this when debugging queries, as polars does batched parsing and will skip the entire batch where the error occurred. To get more detailed error messages, set the environment variable POLARS_BACKTRACE_IN_ERR=1 before running the join. |  |
diff --git a/docs/help/pragmastat.md b/docs/help/pragmastat.md
@@ -20,6 +20,10 @@ This is a "smart" command that uses the stats cache to work smarter & faster.
 When a stats cache is available, non-numeric columns are automatically filtered out
 (unless --select is explicitly provided) and Date/DateTime columns are supported.
 
+By default, one-sample mode appends 7 ps_* columns to the .stats.csv cache file
+(like moarstats). Use --standalone for the old standalone CSV output. Two-sample,
+compare1, and compare2 modes always produce standalone output.
+
 Input handling
 * Only finite numeric values are used; non-numeric/NaN/Inf are ignored.
 * Date/DateTime columns are supported when a stats cache is available
@@ -99,12 +103,18 @@ Valid metrics for compare2: shift, ratio, disparity
 
 ## Examples [↩](#nav)
 
-> Basic one-sample statistics
+> Append pragmastat columns to stats cache (default one-sample behavior)
 
 ```console
 qsv pragmastat data.csv
 ```
 
+> Standalone one-sample output (old behavior)
+
+```console
+qsv pragmastat --standalone data.csv
+```
+
 > One-sample statistics with selected columns
 
 ```console
@@ -158,13 +168,17 @@ qsv pragmastat --help
 
 ## Pragmastat Options [↩](#nav)
 
-| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
+| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
 |--------|------|-------------|--------|
 | &nbsp;`-t,`<br>`--twosample`&nbsp; | flag | Compute two-sample estimators for all column pairs. |  |
 | &nbsp;`--compare1`&nbsp; | string | One-sample confirmatory analysis. Test center/spread against thresholds. Format: metric:value[,metric:value,...]. Mutually exclusive with --twosample and --compare2. |  |
 | &nbsp;`--compare2`&nbsp; | string | Two-sample confirmatory analysis. Test shift/ratio/disparity against thresholds. Format: metric:value[,metric:value,...]. Mutually exclusive with --twosample and --compare1. |  |
 | &nbsp;`-s,`<br>`--select`&nbsp; | string | Select columns for analysis. Uses qsv's column selection syntax. Non-numeric columns appear with n=0. In two-sample mode, all pairs of selected columns are computed. |  |
 | &nbsp;`-m,`<br>`--misrate`&nbsp; | string | Probability that bounds fail to contain the true parameter. Lower values produce wider bounds. Must be achievable for the given sample size. | `0.001` |
+| &nbsp;`--standalone`&nbsp; | flag | Output one-sample results as standalone CSV instead of appending to the stats cache. |  |
+| &nbsp;`--stats-options`&nbsp; | string | Options to pass to the stats command if baseline stats need to be generated. The options are passed as a single string that will be split by whitespace. | `--infer-dates --infer-boolean --mad --quartiles --force --stats-jsonl` |
+| &nbsp;`--round`&nbsp; | string | Round statistics to <n> decimal places. Rounding follows Midpoint Nearest Even (Bankers Rounding) rule. | `4` |
+| &nbsp;`--force`&nbsp; | flag | Force recomputing ps_* columns even if they already exist in the stats cache. |  |
 
 <a name="common-options"></a>
 
diff --git a/docs/help/stats.md b/docs/help/stats.md
@@ -266,7 +266,7 @@ qsv stats --help
 | &nbsp;`--force`&nbsp; | flag | Force recomputing stats even if valid precomputed stats cache exists. |  |
 | &nbsp;`-j,`<br>`--jobs`&nbsp; | string | The number of jobs to run in parallel. This works only when the given CSV has an index. Note that a file handle is opened for each job. When not set, the number of jobs is set to the number of CPUs detected. |  |
 | &nbsp;`--stats-jsonl`&nbsp; | flag | Also write the stats in JSONL format. If set, the stats will be written to <FILESTEM>.stats.csv.data.jsonl. Note that this option used internally by other qsv "smart" commands (see <https://github.com/dathere/qsv/blob/master/docs/PERFORMANCE.md#stats-cache>) to load cached stats to make them work smarter & faster. You can preemptively create the stats-jsonl file by using this option BEFORE running "smart" commands and they will automatically use it. |  |
-| &nbsp;`-c,`<br>`--cache-threshold`&nbsp; | string | Controls the creation of stats cache files. * when greater than 1, the threshold in milliseconds before caching stats results. If a stats run takes longer than this threshold, the stats results will be cached. * 0 to suppress caching. * 1 to force caching. * a negative number to automatically create an index when the input file size is greater than abs(arg) in bytes. If the negative number ends with 5, it will delete the index file and the stats cache file after the stats run. Otherwise, the index file and the cache files are kept. | `5000` |
+| &nbsp;`-c,`<br>`--cache-threshold`&nbsp; | string | Controls the creation of stats cache files.<ul><li>when greater than 1, the threshold in milliseconds before caching stats results. If a stats run takes longer than this threshold, the stats results will be cached.</li><li>0 to suppress caching.</li><li>1 to force caching.</li><li>a negative number to automatically create an index when the input file size is greater than abs(arg) in bytes. If the negative number ends with 5, it will delete the index file and the stats cache file after the stats run. Otherwise, the index file and the cache files are kept.</li></ul> | `5000` |
 | &nbsp;`--vis-whitespace`&nbsp; | flag | Visualize whitespace characters in the output. See <https://github.com/dathere/qsv/wiki/Supplemental#whitespace-markers> for the list of whitespace markers. |  |
 
 <a name="common-options"></a>
diff --git a/src/cmd/describegpt.rs b/src/cmd/describegpt.rs
@@ -301,11 +301,10 @@ describegpt options:
 
                            CACHING OPTIONS:
     --no-cache             Disable default disk cache.
-  --disk-cache-dir <dir>   The directory <dir> to store the disk cache. Note that if the directory
-                           does not exist, it will be created. If the directory exists, it will be used as is,
-                           and will not be flushed. This option allows you to maintain several disk caches
-                           for different describegpt jobs (e.g. one for a data portal, another for internal
-                           data exchange, etc.)
+  --disk-cache-dir <dir>   The directory to store the disk cache. Note that if the directory does not exist,
+                           it will be created. If the directory exists, it will be used as is, and will not
+                           be flushed. This option allows you to maintain several disk caches for different
+                           describegpt jobs (e.g. one for a data portal, another for internal data exchange).
                            [default: ~/.qsv/cache/describegpt]
     --redis-cache          Use Redis instead of the default disk cache to cache LLM completions.
                            It connects to "redis://127.0.0.1:6379/3" by default, with a connection pool
diff --git a/src/cmd/joinp.rs b/src/cmd/joinp.rs
@@ -121,16 +121,16 @@ joinp options:
                            Ignored when --infer-len is 0.
                            ‎ -2: treat all columns as String. A Polars schema file is created & cached.
                            ‎ -1: treat all columns as String. No Polars schema file is created.
-                             0: do not cache Polars schema. Uses --infer-len to infer schema.
-                             1: cache Polars schema with the following behavior:
+                           ‎  0: do not cache Polars schema. Uses --infer-len to infer schema.
+                           ‎  1: cache Polars schema with the following behavior:
                                 * If schema file exists and is newer than input: use cached schema
                                 * If schema file missing/outdated and stats cache exists: 
                                   derive schema from stats and cache it
                                 * If no schema or stats cache: infer schema using --infer-len 
                                   and cache the result
                                 Schema files use the same name as input with .pschema.json extension
-                                (e.g., data.csv -> data.pschema.json)
-                           NOTE: If the input files have pschema.json files that are newer or created
+                                (e.g., data.csv -> data.pschema.json).
+                           ‎NOTE: If the input files have pschema.json files that are newer or created
                            at the same time as the input files, they will be used to inform the join
                            operation regardless of the value of --cache-schema unless --infer-len is 0.
                            [default: 0]
diff --git a/src/help_markdown_gen.rs b/src/help_markdown_gen.rs