Commit 8f57e87

jqnatividad and claude authored
fix(help): fine-tune markdown generation of docopt usage text (#3600)
* fix(help): process list items in help parsing properly
  Treat lines beginning with "- " or "* " as list items when parsing help text instead of as section breaks. Such lines are now appended as HTML bullets ("<br>• " + content with the marker removed). Also adjust the break condition so a hyphen followed by a space ("- ") is not interpreted as a new option/section. Changes applied to parse_arguments_section and parse_option_line.
* docs(describegpt): wordsmith usage text to get rid of `<dir>` which was causing markdown formatting error
* refactor(help): use inline HTML for bullets so they render properly in GH markdown inside tables
* docs(help): render bullets in options properly
* docs(help): render new pragmastat options
* docs(help): update help markdown to remove `<dir>` causing markdown rendering issues
  also capitalize MCP
* fix(help): separate overly-merged list items and trim trailing whitespace in bullets
  Use indentation tracking to detect when a non-bullet line in a list is post-list text rather than a continuation of the current bullet item. Also trim trailing whitespace from list item content before closing </li>.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(help): Treat U+200E as line break in help generator
  Recognize the Left-to-Right mark (U+200E) as an intentional line break when generating help markdown and render it as <br> in output. Update help text for the joinp command to use HTML <br> for line breaks and fix minor punctuation/formatting in the cached schema option. This ensures docopt-inserted U+200E markers (used to avoid parsing negative numbers as flags) are displayed properly in generated docs.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
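The list-item rule in the first sub-commit can be illustrated with a small Rust sketch. This is not qsv's `parse_arguments_section` or `parse_option_line` code; `list_item_content` is a hypothetical helper, and it assumes a bullet is any line whose first non-indentation characters are `- ` or `* `.

```rust
/// Returns the text of a help-text bullet with its marker stripped,
/// or `None` if the line is not a list item. Hypothetical helper.
fn list_item_content(line: &str) -> Option<&str> {
    // Ignore the docopt indentation in front of the marker.
    let trimmed = line.trim_start();
    trimmed
        .strip_prefix("- ")
        .or_else(|| trimmed.strip_prefix("* "))
        // Trailing whitespace is dropped before the item is closed.
        .map(str::trim_end)
}
```

Returning `None` for lines such as `--flexible` or `-2:` is the point: only the two-character markers `- ` and `* ` count as bullets.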
1 parent 63cad7c commit 8f57e87

9 files changed (+111 / -24 lines)

docs/help/describegpt.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@
 **[Table of Contents](TableOfContents.md)** | **Source: [src/cmd/describegpt.rs](https://github.com/dathere/qsv/blob/master/src/cmd/describegpt.rs)** | [🌐](TableOfContents.md#legend "has web-aware options.")[🤖](TableOfContents.md#legend "command uses Natural Language Processing or Generative AI.")[🪄](TableOfContents.md#legend "\"automagical\" commands that uses stats and/or frequency tables to work \"smarter\" & \"faster\".")[🗃️](TableOfContents.md#legend "Limited Extended input support.")[📚](TableOfContents.md#legend "has lookup table support, enabling runtime \"lookups\" against local or remote reference CSVs.")[⛩️](TableOfContents.md#legend "uses Mini Jinja template engine.") [![CKAN](../images/ckan.png)](TableOfContents.md#legend "has CKAN-aware integration options.")
 
 <a name="nav"></a>
-[Description](#description) | [Examples](#examples) | [Usage](#usage) | [Data Analysis/Inferencing Options](#data-analysis/inferencing-options) | [Dictionary Options](#dictionary-options) | [Tag Options](#tag-options) | [Stats/Frequency Options](#stats/frequency-options) | [Custom Prompt Options](#custom-prompt-options) | [LLM API Options](#llm-api-options) | [Caching Options](#caching-options) | [Mcp Sampling Options](#mcp-sampling-options) | [Common Options](#common-options)
+[Description](#description) | [Examples](#examples) | [Usage](#usage) | [Data Analysis/Inferencing Options](#data-analysis/inferencing-options) | [Dictionary Options](#dictionary-options) | [Tag Options](#tag-options) | [Stats/Frequency Options](#stats/frequency-options) | [Custom Prompt Options](#custom-prompt-options) | [LLM API Options](#llm-api-options) | [Caching Options](#caching-options) | [MCP Sampling Options](#mcp-sampling-options) | [Common Options](#common-options)
 
 <a name="description"></a>
 
@@ -276,15 +276,15 @@ qsv describegpt --help
 | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
 |--------|------|-------------|--------|
 | &nbsp;`--no-cache`&nbsp; | flag | Disable default disk cache. | |
-| &nbsp;`--disk-cache-dir`&nbsp; | string | The directory <dir> to store the disk cache. Note that if the directory does not exist, it will be created. If the directory exists, it will be used as is, and will not be flushed. This option allows you to maintain several disk caches for different describegpt jobs (e.g. one for a data portal, another for internal data exchange, etc.) | `~/.qsv/cache/describegpt` |
+| &nbsp;`--disk-cache-dir`&nbsp; | string | The directory to store the disk cache. Note that if the directory does not exist, it will be created. If the directory exists, it will be used as is, and will not be flushed. This option allows you to maintain several disk caches for different describegpt jobs (e.g. one for a data portal, another for internal data exchange). | `~/.qsv/cache/describegpt` |
 | &nbsp;`--redis-cache`&nbsp; | flag | Use Redis instead of the default disk cache to cache LLM completions. It connects to "redis://127.0.0.1:6379/3" by default, with a connection pool size of 20, with a TTL of 28 days, and cache hits NOT refreshing an existing cached value's TTL. This option automatically disables the disk cache. | |
 | &nbsp;`--fresh`&nbsp; | flag | Send a fresh request to the LLM API, refreshing a cached response if it exists. When a --prompt SQL query fails, you can also use this option to request the LLM to generate a new SQL query. | |
 | &nbsp;`--forget`&nbsp; | flag | Remove a cached response if it exists and then exit. | |
 | &nbsp;`--flush-cache`&nbsp; | flag | Flush the current cache entries on startup. WARNING: This operation is irreversible. | |
 
 <a name="mcp-sampling-options"></a>
 
-## Mcp Sampling Options [](#nav)
+## MCP Sampling Options [](#nav)
 
 | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
 |--------|------|-------------|--------|

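The `Mcp` to `MCP` fix suggests the generator title-cases docopt headers such as `MCP SAMPLING OPTIONS:` word by word, so acronyms need special handling. A minimal sketch, assuming a hard-coded acronym list; `ACRONYMS`, `section_title`, and `title_case_word` are hypothetical names, and the real generator may solve this differently.

```rust
/// Words kept fully upper-case when title-casing section headers.
/// This list is an assumption for illustration.
const ACRONYMS: &[&str] = &["LLM", "API", "MCP", "CSV", "SQL"];

/// Turns a docopt header like "MCP SAMPLING OPTIONS:" into "MCP Sampling Options".
fn section_title(header: &str) -> String {
    header
        .trim()
        .trim_end_matches(':')
        .split_whitespace()
        .map(|word| {
            if ACRONYMS.contains(&word) {
                word.to_string()
            } else {
                // Title-case each "/"-separated part, e.g. "STATS/FREQUENCY".
                word.split('/')
                    .map(title_case_word)
                    .collect::<Vec<_>>()
                    .join("/")
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// "SAMPLING" -> "Sampling"
fn title_case_word(word: &str) -> String {
    let lower = word.to_lowercase();
    let mut chars = lower.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}
```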
docs/help/excel.md

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ qsv excel --help
 | &nbsp;`--table`&nbsp; | string | An Excel table (case-insensitive) to extract to a CSV. Only valid for XLSX files. The --sheet option is ignored as a table could be in any sheet. Overrides --range option. | |
 | &nbsp;`--range`&nbsp; | string | An Excel format range - like RangeName, C:T, C3:T25 or 'Sheet1!C3:T25' to extract to the CSV. If the specified range contains the required sheet, the --sheet option is ignored. If the range is not found, qsv will exit with an error. | |
 | &nbsp;`--cell`&nbsp; | string | A single cell reference - like C3 or 'Sheet1!C3' to extract. This is a convenience option equivalent to --range C3:C3. If both --cell and --range are specified, --cell takes precedence. | |
-| &nbsp;`--error-format`&nbsp; | string | The format to use when formatting error cells. There are 3 formats: * "code": return the error code. (#DIV/0!; #N/A; #NAME?; #NULL!; #NUM!; #REF!; #VALUE!; #DATA!) * "formula": return the formula, prefixed with '#'. (e.g. #=A1/B1 where B1 is 0; #=100/0) * "both": return both error code and the formula. (e.g. #DIV/0!: =A1/B1) | `code` |
+| &nbsp;`--error-format`&nbsp; | string | The format to use when formatting error cells. There are 3 formats:<ul><li>"code": return the error code. (#DIV/0!; #N/A; #NAME?; #NULL!; #NUM!; #REF!; #VALUE!; #DATA!)</li><li>"formula": return the formula, prefixed with '#'. (e.g. #=A1/B1 where B1 is 0; #=100/0)</li><li>"both": return both error code and the formula. (e.g. #DIV/0!: =A1/B1)</li></ul> | `code` |
 | &nbsp;`--flexible`&nbsp; | flag | Continue even if the number of columns is different from row to row. | |
 | &nbsp;`--trim`&nbsp; | flag | Trim all fields so that leading & trailing whitespaces are removed. Also removes embedded linebreaks. | |
 | &nbsp;`--date-format`&nbsp; | string | Optional date format to use when formatting dates. See <https://docs.rs/chrono/latest/chrono/format/strftime/index.html> for the full list of supported format specifiers. Note that if a date format is invalid, qsv will fall back and return the date as if no date-format was specified. | |

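GitHub markdown tables cannot hold multi-line lists, which is why the new excel row packs its bullets into inline `<ul><li>...</li></ul>` HTML. The sketch below shows one way to produce such a cell, including the indentation tracking described in the commit message: a non-bullet line indented deeper than the current marker continues that item, while a line at or left of the marker is treated as post-list text. The function name and the exact indentation rule are assumptions, not taken from qsv's source.

```rust
/// Joins a multi-line option description into one markdown table cell,
/// rendering "- " / "* " bullets as inline HTML. Hypothetical sketch.
fn render_description(lines: &[&str]) -> String {
    let mut out = String::new();
    // Column of the current bullet marker; `None` when not inside a list.
    let mut list_indent: Option<usize> = None;

    for raw in lines {
        let indent = raw.len() - raw.trim_start().len();
        let text = raw.trim(); // also drops trailing whitespace
        if text.is_empty() {
            continue;
        }
        let bullet = text.strip_prefix("- ").or_else(|| text.strip_prefix("* "));
        match (bullet, list_indent) {
            (Some(item), Some(_)) => {
                // Next item in an open list: close the previous one.
                out.push_str("</li><li>");
                out.push_str(item.trim());
                list_indent = Some(indent);
            }
            (Some(item), None) => {
                // First item: open the list directly after the lead text.
                out = out.trim_end().to_string();
                out.push_str("<ul><li>");
                out.push_str(item.trim());
                list_indent = Some(indent);
            }
            (None, Some(marker)) if indent > marker => {
                // Deeper-indented text continues the current bullet.
                out.push(' ');
                out.push_str(text);
            }
            (None, Some(_)) => {
                // Post-list text: close the list, then continue normally.
                out.push_str("</li></ul> ");
                out.push_str(text);
                list_indent = None;
            }
            (None, None) => {
                if !out.is_empty() {
                    out.push(' ');
                }
                out.push_str(text);
            }
        }
    }
    if list_indent.is_some() {
        out.push_str("</li></ul>");
    }
    out
}
```

Closing the list as `</li></ul> ` before post-list text mirrors the shape of the regenerated joinp `--cache-schema` cell.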
docs/help/geoconvert.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ qsv geoconvert --help
 |----------|-------------|
 | &nbsp;`<input>`&nbsp; | The spatial file to convert. To use stdin instead, use a dash "-". Note: SHP input must be a path to a .shp file and cannot use stdin. |
 | &nbsp;`<input-format>`&nbsp; | Valid values are "geojson", "shp", and "csv" |
-| &nbsp;`<output-format>`&nbsp; | Valid values are: |
+| &nbsp;`<output-format>`&nbsp; | Valid values are:<ul><li>For GeoJSON input: "csv", "svg", and "geojsonl"</li><li>For SHP input: "csv", "geojson", and "geojsonl"</li><li>For CSV input: "geojson", "geojsonl", "csv", and "svg"</li></ul> |
 
 <a name="geoconvert-options"></a>
 
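In the old geoconvert row, nothing follows "Valid values are:", which fits the commit message's explanation that a `- ` line used to be taken as the start of a new option or section. A hedged sketch of a break condition that accepts real docopt flags (`-t,`, `--select`) while rejecting `- ` bullets and negative-number values; `starts_new_option` is a hypothetical name and the rule is an assumption.

```rust
/// True if a help line begins a new docopt option such as
/// "-t, --twosample" or "--select <arg>". Hypothetical sketch.
fn starts_new_option(line: &str) -> bool {
    let t = line.trim_start();
    match t.strip_prefix('-') {
        // "--long-flag": a letter must follow the double dash.
        Some(rest) if rest.starts_with('-') => {
            rest[1..].starts_with(|c: char| c.is_ascii_alphabetic())
        }
        // "-x" short flag; rejects "- item" bullets and "-2:" values.
        Some(rest) => rest.starts_with(|c: char| c.is_ascii_alphabetic()),
        None => false,
    }
}
```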
docs/help/joinp.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ qsv joinp --help
 |--------|------|-------------|--------|
 | &nbsp;`--try-parsedates`&nbsp; | flag | When set, will attempt to parse the columns as dates. If the parse fails, columns remain as strings. This is useful when the join keys are formatted as dates with differing date formats, as the date formats will be normalized. Note that this will be automatically enabled when using asof joins. | |
 | &nbsp;`--infer-len`&nbsp; | string | The number of rows to scan when inferring the schema of the CSV. Set to 0 to do a full table scan (warning: very slow). Only used when --cache-schema is 0 or 1 and no cached schema exists or when --infer-len is 0. | `10000` |
-| &nbsp;`--cache-schema`&nbsp; | string | Create and cache Polars schema JSON files. Ignored when --infer-len is 0.-2: treat all columns as String. A Polars schema file is created & cached.-1: treat all columns as String. No Polars schema file is created. 0: do not cache Polars schema. Uses --infer-len to infer schema. 1: cache Polars schema with the following behavior: * If schema file exists and is newer than input: use cached schema * If schema file missing/outdated and stats cache exists: derive schema from stats and cache it * If no schema or stats cache: infer schema using --infer-len and cache the result Schema files use the same name as input with .pschema.json extension (e.g., data.csv -> data.pschema.json) NOTE: If the input files have pschema.json files that are newer or created at the same time as the input files, they will be used to inform the join operation regardless of the value of --cache-schema unless --infer-len is 0. | `0` |
+| &nbsp;`--cache-schema`&nbsp; | string | Create and cache Polars schema JSON files. Ignored when --infer-len is 0.<br>-2: treat all columns as String. A Polars schema file is created & cached.<br>-1: treat all columns as String. No Polars schema file is created.<br>0: do not cache Polars schema. Uses --infer-len to infer schema.<br>1: cache Polars schema with the following behavior:<ul><li>If schema file exists and is newer than input: use cached schema</li><li>If schema file missing/outdated and stats cache exists: derive schema from stats and cache it</li><li>If no schema or stats cache: infer schema using --infer-len and cache the result</li></ul> Schema files use the same name as input with .pschema.json extension (e.g., data.csv -> data.pschema.json).<br>NOTE: If the input files have pschema.json files that are newer or created at the same time as the input files, they will be used to inform the join operation regardless of the value of --cache-schema unless --infer-len is 0. | `0` |
 | &nbsp;`--low-memory`&nbsp; | flag | Use low memory mode when parsing CSVs. This will use less memory but will be slower. It will also process the join in streaming mode. Only use this when you get out of memory errors. | |
 | &nbsp;`--no-optimizations`&nbsp; | flag | Disable non-default join optimizations. This will make joins slower. Only use this when you get join errors. | |
 | &nbsp;`--ignore-errors`&nbsp; | flag | Ignore errors when parsing CSVs. If set, rows with errors will be skipped. If not set, the query will fail. Only use this when debugging queries, as polars does batched parsing and will skip the entire batch where the error occurred. To get more detailed error messages, set the environment variable POLARS_BACKTRACE_IN_ERR=1 before running the join. | |

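The `--cache-schema 1` behavior in the joinp row is a three-step lookup. A sketch of that order plus the `.pschema.json` naming rule, assuming the caller supplies file modification times; `SchemaSource`, `schema_source`, and `pschema_path` are hypothetical, and treating equal timestamps as fresh follows the NOTE ("newer or created at the same time") rather than the bullet's "newer than" wording.

```rust
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Where the schema comes from under `--cache-schema 1` (hypothetical names).
#[derive(Debug, PartialEq)]
enum SchemaSource {
    UseCached,
    DeriveFromStats,
    InferWithInferLen,
}

/// data.csv -> data.pschema.json
fn pschema_path(input: &Path) -> PathBuf {
    input.with_extension("pschema.json")
}

/// Mirrors the three bullets of `--cache-schema 1`. `schema_mtime` is
/// `None` when the schema file is missing.
fn schema_source(
    input_mtime: SystemTime,
    schema_mtime: Option<SystemTime>,
    stats_cache_exists: bool,
) -> SchemaSource {
    match schema_mtime {
        // Cached schema is at least as new as the input: reuse it.
        Some(t) if t >= input_mtime => SchemaSource::UseCached,
        // Missing or outdated schema, but a stats cache is available.
        _ if stats_cache_exists => SchemaSource::DeriveFromStats,
        // Neither: scan --infer-len rows, then cache the result.
        _ => SchemaSource::InferWithInferLen,
    }
}
```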
docs/help/pragmastat.md

Lines changed: 16 additions & 2 deletions
@@ -20,6 +20,10 @@ This is a "smart" command that uses the stats cache to work smarter & faster.
 When a stats cache is available, non-numeric columns are automatically filtered out
 (unless --select is explicitly provided) and Date/DateTime columns are supported.
 
+By default, one-sample mode appends 7 ps_* columns to the .stats.csv cache file
+(like moarstats). Use --standalone for the old standalone CSV output. Two-sample,
+compare1, and compare2 modes always produce standalone output.
+
 Input handling
 * Only finite numeric values are used; non-numeric/NaN/Inf are ignored.
 * Date/DateTime columns are supported when a stats cache is available
@@ -99,12 +103,18 @@ Valid metrics for compare2: shift, ratio, disparity
 
 ## Examples [](#nav)
 
-> Basic one-sample statistics
+> Append pragmastat columns to stats cache (default one-sample behavior)
 
 ```console
 qsv pragmastat data.csv
 ```
 
+> Standalone one-sample output (old behavior)
+
+```console
+qsv pragmastat --standalone data.csv
+```
+
 > One-sample statistics with selected columns
 
 ```console
@@ -158,13 +168,17 @@ qsv pragmastat --help
 
 ## Pragmastat Options [](#nav)
 
-| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
+| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Type | Description | Default |
 |--------|------|-------------|--------|
 | &nbsp;`-t,`<br>`--twosample`&nbsp; | flag | Compute two-sample estimators for all column pairs. | |
 | &nbsp;`--compare1`&nbsp; | string | One-sample confirmatory analysis. Test center/spread against thresholds. Format: metric:value[,metric:value,...]. Mutually exclusive with --twosample and --compare2. | |
 | &nbsp;`--compare2`&nbsp; | string | Two-sample confirmatory analysis. Test shift/ratio/disparity against thresholds. Format: metric:value[,metric:value,...]. Mutually exclusive with --twosample and --compare1. | |
 | &nbsp;`-s,`<br>`--select`&nbsp; | string | Select columns for analysis. Uses qsv's column selection syntax. Non-numeric columns appear with n=0. In two-sample mode, all pairs of selected columns are computed. | |
 | &nbsp;`-m,`<br>`--misrate`&nbsp; | string | Probability that bounds fail to contain the true parameter. Lower values produce wider bounds. Must be achievable for the given sample size. | `0.001` |
+| &nbsp;`--standalone`&nbsp; | flag | Output one-sample results as standalone CSV instead of appending to the stats cache. | |
+| &nbsp;`--stats-options`&nbsp; | string | Options to pass to the stats command if baseline stats need to be generated. The options are passed as a single string that will be split by whitespace. | `--infer-dates --infer-boolean --mad --quartiles --force --stats-jsonl` |
+| &nbsp;`--round`&nbsp; | string | Round statistics to <n> decimal places. Rounding follows Midpoint Nearest Even (Bankers Rounding) rule. | `4` |
+| &nbsp;`--force`&nbsp; | flag | Force recomputing ps_* columns even if they already exist in the stats cache. | |
 
 <a name="common-options"></a>
 
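The new `--round` option specifies Midpoint Nearest Even (banker's) rounding. A minimal sketch of that rule, assuming Rust 1.77 or later for `f64::round_ties_even`; `round_to` is a hypothetical helper. Because it scales by a power of ten, inputs whose decimal form is not exactly representable in binary (for example 2.675) may not land on the midpoint they appear to.

```rust
/// Rounds `x` to `places` decimal places, sending exact midpoints to the
/// nearest even digit (banker's rounding). Requires Rust 1.77+.
fn round_to(x: f64, places: i32) -> f64 {
    let scale = 10f64.powi(places);
    (x * scale).round_ties_even() / scale
}
```

Plain `f64::round` sends ties away from zero (2.5 becomes 3.0), which is why the sketch uses `round_ties_even` instead (2.5 becomes 2.0).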
docs/help/stats.md

Lines changed: 1 addition & 1 deletion
@@ -266,7 +266,7 @@ qsv stats --help
 | &nbsp;`--force`&nbsp; | flag | Force recomputing stats even if valid precomputed stats cache exists. | |
 | &nbsp;`-j,`<br>`--jobs`&nbsp; | string | The number of jobs to run in parallel. This works only when the given CSV has an index. Note that a file handle is opened for each job. When not set, the number of jobs is set to the number of CPUs detected. | |
 | &nbsp;`--stats-jsonl`&nbsp; | flag | Also write the stats in JSONL format. If set, the stats will be written to <FILESTEM>.stats.csv.data.jsonl. Note that this option used internally by other qsv "smart" commands (see <https://github.com/dathere/qsv/blob/master/docs/PERFORMANCE.md#stats-cache>) to load cached stats to make them work smarter & faster. You can preemptively create the stats-jsonl file by using this option BEFORE running "smart" commands and they will automatically use it. | |
-| &nbsp;`-c,`<br>`--cache-threshold`&nbsp; | string | Controls the creation of stats cache files. * when greater than 1, the threshold in milliseconds before caching stats results. If a stats run takes longer than this threshold, the stats results will be cached. * 0 to suppress caching. * 1 to force caching. * a negative number to automatically create an index when the input file size is greater than abs(arg) in bytes. If the negative number ends with 5, it will delete the index file and the stats cache file after the stats run. Otherwise, the index file and the cache files are kept. | `5000` |
+| &nbsp;`-c,`<br>`--cache-threshold`&nbsp; | string | Controls the creation of stats cache files.<ul><li>when greater than 1, the threshold in milliseconds before caching stats results. If a stats run takes longer than this threshold, the stats results will be cached.</li><li>0 to suppress caching.</li><li>1 to force caching.</li><li>a negative number to automatically create an index when the input file size is greater than abs(arg) in bytes. If the negative number ends with 5, it will delete the index file and the stats cache file after the stats run. Otherwise, the index file and the cache files are kept.</li></ul> | `5000` |
 | &nbsp;`--vis-whitespace`&nbsp; | flag | Visualize whitespace characters in the output. See <https://github.com/dathere/qsv/wiki/Supplemental#whitespace-markers> for the list of whitespace markers. | |
 
 <a name="common-options"></a>

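The `--cache-threshold` bullets in the stats row encode four cases keyed on one integer. A sketch of that classification as the help text describes it; `CacheThreshold` and `classify_cache_threshold` are hypothetical, and "ends with 5" is read here as the absolute value's last decimal digit being 5.

```rust
/// Interpretation of `stats --cache-threshold <n>` per its help text.
#[derive(Debug, PartialEq)]
enum CacheThreshold {
    /// Cache only if the stats run takes longer than this many milliseconds.
    AfterMillis(u64),
    SuppressCaching,
    ForceCaching,
    /// Auto-create an index when the input is larger than `min_bytes`;
    /// if `cleanup`, delete the index and stats cache after the run.
    AutoIndex { min_bytes: u64, cleanup: bool },
}

fn classify_cache_threshold(n: i64) -> CacheThreshold {
    match n {
        0 => CacheThreshold::SuppressCaching,
        1 => CacheThreshold::ForceCaching,
        n if n > 1 => CacheThreshold::AfterMillis(n as u64),
        // Negative: abs(n) is the size threshold in bytes.
        n => {
            let abs = n.unsigned_abs();
            CacheThreshold::AutoIndex { min_bytes: abs, cleanup: abs % 10 == 5 }
        }
    }
}
```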
src/cmd/describegpt.rs

Lines changed: 4 additions & 5 deletions
@@ -301,11 +301,10 @@ describegpt options:
 
 CACHING OPTIONS:
 --no-cache Disable default disk cache.
---disk-cache-dir <dir> The directory <dir> to store the disk cache. Note that if the directory
-does not exist, it will be created. If the directory exists, it will be used as is,
-and will not be flushed. This option allows you to maintain several disk caches
-for different describegpt jobs (e.g. one for a data portal, another for internal
-data exchange, etc.)
+--disk-cache-dir <dir> The directory to store the disk cache. Note that if the directory does not exist,
+it will be created. If the directory exists, it will be used as is, and will not
+be flushed. This option allows you to maintain several disk caches for different
+describegpt jobs (e.g. one for a data portal, another for internal data exchange).
 [default: ~/.qsv/cache/describegpt]
 --redis-cache Use Redis instead of the default disk cache to cache LLM completions.
 It connects to "redis://127.0.0.1:6379/3" by default, with a connection pool

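The reworded `--disk-cache-dir` text promises that a missing directory is created and an existing one is reused without being flushed. `std::fs::create_dir_all` has that contract, so one way to sketch it is below; expanding `~` through the `HOME` environment variable is an assumption (qsv may resolve the default path differently), and `prepare_cache_dir` is a hypothetical name.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Expands a leading "~/" using $HOME, then ensures the directory exists.
/// An existing directory and its contents are left untouched.
fn prepare_cache_dir(dir: &str) -> io::Result<PathBuf> {
    let path = match dir.strip_prefix("~/") {
        Some(rest) => {
            let home = env::var_os("HOME")
                .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "HOME not set"))?;
            PathBuf::from(home).join(rest)
        }
        None => PathBuf::from(dir),
    };
    // Returns Ok(()) if the directory already exists; nothing is flushed.
    fs::create_dir_all(&path)?;
    Ok(path)
}
```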
src/cmd/joinp.rs

Lines changed: 4 additions & 4 deletions
@@ -121,16 +121,16 @@ joinp options:
 Ignored when --infer-len is 0.
 ‎ -2: treat all columns as String. A Polars schema file is created & cached.
 ‎ -1: treat all columns as String. No Polars schema file is created.
-0: do not cache Polars schema. Uses --infer-len to infer schema.
-1: cache Polars schema with the following behavior:
+0: do not cache Polars schema. Uses --infer-len to infer schema.
+1: cache Polars schema with the following behavior:
 * If schema file exists and is newer than input: use cached schema
 * If schema file missing/outdated and stats cache exists:
 derive schema from stats and cache it
 * If no schema or stats cache: infer schema using --infer-len
 and cache the result
 Schema files use the same name as input with .pschema.json extension
-(e.g., data.csv -> data.pschema.json)
-NOTE: If the input files have pschema.json files that are newer or created
+(e.g., data.csv -> data.pschema.json).
+NOTE: If the input files have pschema.json files that are newer or created
 at the same time as the input files, they will be used to inform the join
 operation regardless of the value of --cache-schema unless --infer-len is 0.
 [default: 0]

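The `-2:` and `-1:` lines in the joinp usage text above begin with an invisible U+200E LEFT-TO-RIGHT MARK, which, per the commit message, keeps docopt from parsing the negative numbers as flags. A minimal sketch of rendering that mark as `<br>` when joining wrapped help lines into one table cell; `join_help_lines` is hypothetical, and it assumes the mark only matters at the start of a line.

```rust
/// U+200E LEFT-TO-RIGHT MARK, used here as an explicit line-break marker.
const LRM: char = '\u{200E}';

/// Joins wrapped help lines with spaces, except that a line starting with
/// U+200E begins a new visual line (`<br>`) and the mark itself is dropped.
fn join_help_lines(lines: &[&str]) -> String {
    let mut out = String::new();
    for line in lines {
        // `trim` does not remove U+200E: it is a format character,
        // not Unicode whitespace.
        let text = line.trim();
        if let Some(rest) = text.strip_prefix(LRM) {
            out.push_str("<br>");
            out.push_str(rest.trim_start());
        } else {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(text);
        }
    }
    out
}
```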