fix(help): fine-tune markdown generation of docopt usage text (#3600)
* fix(help): process list items in help parsing properly
Treat lines beginning with "- " or "* " as list items when parsing help text instead of as section breaks. Such lines are now appended as HTML bullets ("<br>• " + content with the marker removed). Also adjust the break condition so a hyphen followed by a space ("- ") is not interpreted as a new option/section. Changes applied to parse_arguments_section and parse_option_line.
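The bullet-conversion step described above could be sketched roughly as follows. This is a minimal illustration, not qsv's actual code: the function name `append_help_line` and its signature are assumptions; the real logic lives in `parse_arguments_section` and `parse_option_line`.

```rust
/// Convert help-text list markers ("- " / "* ") into inline HTML bullets
/// so they render inside GitHub markdown tables instead of being treated
/// as section breaks. Hypothetical sketch, not qsv's implementation.
fn append_help_line(out: &mut String, line: &str) {
    let trimmed = line.trim_start();
    if let Some(item) = trimmed
        .strip_prefix("- ")
        .or_else(|| trimmed.strip_prefix("* "))
    {
        // List item: emit an inline bullet with the marker removed.
        out.push_str("<br>• ");
        out.push_str(item);
    } else {
        // Plain continuation line: join onto the running text with a space.
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(trimmed);
    }
}
```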
* docs(describegpt): wordsmith usage text to get rid of `<dir>`
which was causing a markdown formatting error
* refactor(help): use inline HTML for bullets so they render properly in GH markdown inside tables
* docs(help): render bullets in options properly
* docs(help): render new pragmastat options
* docs(help): update help markdown to remove `<dir>` causing markdown rendering issues
also capitalize MCP
* fix(help): separate overly-merged list items and trim trailing whitespace in bullets
Use indentation tracking to detect when a non-bullet line in a list is
post-list text rather than a continuation of the current bullet item.
Also trim trailing whitespace from list item content before closing </li>.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
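The indentation heuristic described above might look something like this. A sketch under stated assumptions: the helper name and the exact threshold rule are hypothetical, chosen only to illustrate "continuation lines are indented at least as deep as the bullet's content; shallower lines end the list".

```rust
/// Decide whether a non-bullet line continues the current list item
/// or is post-list text that should close the list. Hypothetical
/// sketch of the indentation-tracking idea, not qsv's actual code.
fn continues_list_item(line: &str, item_content_indent: usize) -> bool {
    // Indentation = number of leading whitespace bytes.
    let indent = line.len() - line.trim_start().len();
    // Blank lines and lines indented shallower than the bullet's
    // content are treated as ending the list.
    !line.trim().is_empty() && indent >= item_content_indent
}
```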
* feat(help): Treat U+200E as line break in help generator
Recognize the Left-to-Right mark (U+200E) as an intentional line break when generating help markdown and render it as <br> in output. Update help text for the joinp command to use HTML <br> for line breaks and fix minor punctuation/formatting in the cached schema option. This ensures docopt-inserted U+200E markers (used to avoid parsing negative numbers as flags) are displayed properly in generated docs.
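In the generator, rendering the marker can amount to a single substitution along these lines (a sketch; the surrounding generator code and function name are assumptions):

```rust
/// Render docopt-inserted U+200E (LEFT-TO-RIGHT MARK) markers as
/// explicit <br> line breaks in generated markdown. U+200E is
/// invisible in terminals, so help text can use it to force a break
/// without docopt parsing the next token (e.g. a negative number)
/// as a flag. Illustrative sketch only.
fn render_lrm_breaks(text: &str) -> String {
    text.replace('\u{200E}', "<br>")
}
```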
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
docs/help/describegpt.md: 3 additions & 3 deletions
@@ -5,7 +5,7 @@
**[Table of Contents](TableOfContents.md)** | **Source: [src/cmd/describegpt.rs](https://github.com/dathere/qsv/blob/master/src/cmd/describegpt.rs)** | [🌐](TableOfContents.md#legend "has web-aware options.")[🤖](TableOfContents.md#legend "command uses Natural Language Processing or Generative AI.")[🪄](TableOfContents.md#legend "\"automagical\" commands that uses stats and/or frequency tables to work \"smarter\" & \"faster\".")[🗃️](TableOfContents.md#legend "Limited Extended input support.")[📚](TableOfContents.md#legend "has lookup table support, enabling runtime \"lookups\" against local or remote reference CSVs.")[⛩️](TableOfContents.md#legend "uses Mini Jinja template engine.")[](TableOfContents.md#legend "has CKAN-aware integration options.")
| Option | Type | Description | Default |
|--------|------|-------------|--------|
| `--no-cache` | flag | Disable default disk cache. ||
- | `--disk-cache-dir` | string | The directory <dir> to store the disk cache. Note that if the directory does not exist, it will be created. If the directory exists, it will be used as is, and will not be flushed. This option allows you to maintain several disk caches for different describegpt jobs (e.g. one for a data portal, another for internal data exchange, etc.)|`~/.qsv/cache/describegpt`|
+ | `--disk-cache-dir` | string | The directory to store the disk cache. Note that if the directory does not exist, it will be created. If the directory exists, it will be used as is, and will not be flushed. This option allows you to maintain several disk caches for different describegpt jobs (e.g. one for a data portal, another for internal data exchange).|`~/.qsv/cache/describegpt`|
| `--redis-cache` | flag | Use Redis instead of the default disk cache to cache LLM completions. It connects to "redis://127.0.0.1:6379/3" by default, with a connection pool size of 20, with a TTL of 28 days, and cache hits NOT refreshing an existing cached value's TTL. This option automatically disables the disk cache. ||
| `--fresh` | flag | Send a fresh request to the LLM API, refreshing a cached response if it exists. When a --prompt SQL query fails, you can also use this option to request the LLM to generate a new SQL query. ||
| `--forget` | flag | Remove a cached response if it exists and then exit. ||
| `--flush-cache` | flag | Flush the current cache entries on startup. WARNING: This operation is irreversible. ||
<a name="mcp-sampling-options"></a>
- ## Mcp Sampling Options [↩](#nav)
+ ## MCP Sampling Options [↩](#nav)
| Option | Type | Description | Default |
docs/help/excel.md: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ qsv excel --help
| `--table` | string | An Excel table (case-insensitive) to extract to a CSV. Only valid for XLSX files. The --sheet option is ignored as a table could be in any sheet. Overrides --range option. ||
| `--range` | string | An Excel format range - like RangeName, C:T, C3:T25 or 'Sheet1!C3:T25' to extract to the CSV. If the specified range contains the required sheet, the --sheet option is ignored. If the range is not found, qsv will exit with an error. ||
| `--cell` | string | A single cell reference - like C3 or 'Sheet1!C3' to extract. This is a convenience option equivalent to --range C3:C3. If both --cell and --range are specified, --cell takes precedence. ||
- | `--error-format` | string | The format to use when formatting error cells. There are 3 formats: * "code": return the error code. (#DIV/0!; #N/A; #NAME?; #NULL!; #NUM!; #REF!; #VALUE!; #DATA!) * "formula": return the formula, prefixed with '#'. (e.g. #=A1/B1 where B1 is 0; #=100/0) * "both": return both error code and the formula. (e.g. #DIV/0!: =A1/B1) |`code`|
+ | `--error-format` | string | The format to use when formatting error cells. There are 3 formats:<ul><li>"code": return the error code. (#DIV/0!; #N/A; #NAME?; #NULL!; #NUM!; #REF!; #VALUE!; #DATA!)</li><li>"formula": return the formula, prefixed with '#'. (e.g. #=A1/B1 where B1 is 0; #=100/0)</li><li>"both": return both error code and the formula. (e.g. #DIV/0!: =A1/B1)</li></ul>|`code`|
| `--flexible` | flag | Continue even if the number of columns is different from row to row. ||
| `--trim` | flag | Trim all fields so that leading & trailing whitespaces are removed. Also removes embedded linebreaks. ||
| `--date-format` | string | Optional date format to use when formatting dates. See <https://docs.rs/chrono/latest/chrono/format/strftime/index.html> for the full list of supported format specifiers. Note that if a date format is invalid, qsv will fall back and return the date as if no date-format was specified. ||
docs/help/geoconvert.md: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ qsv geoconvert --help
|----------|-------------|
| `<input>` | The spatial file to convert. To use stdin instead, use a dash "-". Note: SHP input must be a path to a .shp file and cannot use stdin. |
| `<input-format>` | Valid values are "geojson", "shp", and "csv" |
docs/help/joinp.md: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ qsv joinp --help
|--------|------|-------------|--------|
| `--try-parsedates` | flag | When set, will attempt to parse the columns as dates. If the parse fails, columns remain as strings. This is useful when the join keys are formatted as dates with differing date formats, as the date formats will be normalized. Note that this will be automatically enabled when using asof joins. ||
| `--infer-len` | string | The number of rows to scan when inferring the schema of the CSV. Set to 0 to do a full table scan (warning: very slow). Only used when --cache-schema is 0 or 1 and no cached schema exists or when --infer-len is 0. |`10000`|
- | `--cache-schema` | string | Create and cache Polars schema JSON files. Ignored when --infer-len is 0. -2: treat all columns as String. A Polars schema file is created & cached. -1: treat all columns as String. No Polars schema file is created.0: do not cache Polars schema. Uses --infer-len to infer schema.1: cache Polars schema with the following behavior: * If schema file exists and is newer than input: use cached schema * If schema file missing/outdated and stats cache exists: derive schema from stats and cache it * If no schema or stats cache: infer schema using --infer-len and cache the result Schema files use the same name as input with .pschema.json extension (e.g., data.csv -> data.pschema.json)NOTE: If the input files have pschema.json files that are newer or created at the same time as the input files, they will be used to inform the join operation regardless of the value of --cache-schema unless --infer-len is 0. |`0`|
+ | `--cache-schema` | string | Create and cache Polars schema JSON files. Ignored when --infer-len is 0.<br>-2: treat all columns as String. A Polars schema file is created & cached.<br>-1: treat all columns as String. No Polars schema file is created.<br>0: do not cache Polars schema. Uses --infer-len to infer schema.<br>1: cache Polars schema with the following behavior:<ul><li>If schema file exists and is newer than input: use cached schema</li><li>If schema file missing/outdated and stats cache exists: derive schema from stats and cache it</li><li>If no schema or stats cache: infer schema using --infer-len and cache the result</li></ul> Schema files use the same name as input with .pschema.json extension (e.g., data.csv -> data.pschema.json).<br>NOTE: If the input files have pschema.json files that are newer or created at the same time as the input files, they will be used to inform the join operation regardless of the value of --cache-schema unless --infer-len is 0. |`0`|
| `--low-memory` | flag | Use low memory mode when parsing CSVs. This will use less memory but will be slower. It will also process the join in streaming mode. Only use this when you get out of memory errors. ||
| `--no-optimizations` | flag | Disable non-default join optimizations. This will make joins slower. Only use this when you get join errors. ||
| `--ignore-errors` | flag | Ignore errors when parsing CSVs. If set, rows with errors will be skipped. If not set, the query will fail. Only use this when debugging queries, as polars does batched parsing and will skip the entire batch where the error occurred. To get more detailed error messages, set the environment variable POLARS_BACKTRACE_IN_ERR=1 before running the join. ||
> Append pragmastat columns to stats cache (default one-sample behavior)
```console
qsv pragmastat data.csv
```

+ > Standalone one-sample output (old behavior)
+
+ ```console
+ qsv pragmastat --standalone data.csv
+ ```
+
> One-sample statistics with selected columns
```console
@@ -158,13 +168,17 @@ qsv pragmastat --help
## Pragmastat Options [↩](#nav)
- | Option | Type | Description | Default |
+ | Option | Type | Description | Default |
|--------|------|-------------|--------|
| `-t,`<br>`--twosample` | flag | Compute two-sample estimators for all column pairs. ||
| `--compare1` | string | One-sample confirmatory analysis. Test center/spread against thresholds. Format: metric:value[,metric:value,...]. Mutually exclusive with --twosample and --compare2. ||
| `--compare2` | string | Two-sample confirmatory analysis. Test shift/ratio/disparity against thresholds. Format: metric:value[,metric:value,...]. Mutually exclusive with --twosample and --compare1. ||
| `-s,`<br>`--select` | string | Select columns for analysis. Uses qsv's column selection syntax. Non-numeric columns appear with n=0. In two-sample mode, all pairs of selected columns are computed. ||
| `-m,`<br>`--misrate` | string | Probability that bounds fail to contain the true parameter. Lower values produce wider bounds. Must be achievable for the given sample size. |`0.001`|
+ | `--standalone` | flag | Output one-sample results as standalone CSV instead of appending to the stats cache. ||
+ | `--stats-options` | string | Options to pass to the stats command if baseline stats need to be generated. The options are passed as a single string that will be split by whitespace. |`--infer-dates --infer-boolean --mad --quartiles --force --stats-jsonl`|
+ | `--round` | string | Round statistics to <n> decimal places. Rounding follows Midpoint Nearest Even (Bankers Rounding) rule. |`4`|
+ | `--force` | flag | Force recomputing ps_* columns even if they already exist in the stats cache. ||
docs/help/stats.md: 1 addition & 1 deletion
@@ -266,7 +266,7 @@ qsv stats --help
| `--force` | flag | Force recomputing stats even if valid precomputed stats cache exists. ||
| `-j,`<br>`--jobs` | string | The number of jobs to run in parallel. This works only when the given CSV has an index. Note that a file handle is opened for each job. When not set, the number of jobs is set to the number of CPUs detected. ||
| `--stats-jsonl` | flag | Also write the stats in JSONL format. If set, the stats will be written to <FILESTEM>.stats.csv.data.jsonl. Note that this option used internally by other qsv "smart" commands (see <https://github.com/dathere/qsv/blob/master/docs/PERFORMANCE.md#stats-cache>) to load cached stats to make them work smarter & faster. You can preemptively create the stats-jsonl file by using this option BEFORE running "smart" commands and they will automatically use it. ||
- | `-c,`<br>`--cache-threshold` | string | Controls the creation of stats cache files. * when greater than 1, the threshold in milliseconds before caching stats results. If a stats run takes longer than this threshold, the stats results will be cached. * 0 to suppress caching. * 1 to force caching. * a negative number to automatically create an index when the input file size is greater than abs(arg) in bytes. If the negative number ends with 5, it will delete the index file and the stats cache file after the stats run. Otherwise, the index file and the cache files are kept. |`5000`|
+ | `-c,`<br>`--cache-threshold` | string | Controls the creation of stats cache files.<ul><li>when greater than 1, the threshold in milliseconds before caching stats results. If a stats run takes longer than this threshold, the stats results will be cached.</li><li>0 to suppress caching.</li><li>1 to force caching.</li><li>a negative number to automatically create an index when the input file size is greater than abs(arg) in bytes. If the negative number ends with 5, it will delete the index file and the stats cache file after the stats run. Otherwise, the index file and the cache files are kept.</li></ul>|`5000`|
| `--vis-whitespace` | flag | Visualize whitespace characters in the output. See <https://github.com/dathere/qsv/wiki/Supplemental#whitespace-markers> for the list of whitespace markers. ||
0 commit comments