| 1 | +# qsv Data Wrangling - Workflow Guide |
| 2 | + |
| 3 | +This CLAUDE.md was auto-deployed by the qsv plugin to provide workflow guidance. |
| 4 | +You can edit or replace it; it will NOT be overwritten in future sessions. |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## Workflow Order |
| 9 | + |
| 10 | +For new files: |
| 11 | +1. **`qsv_list_files`** to discover files in the working directory |
| 12 | +2. **`qsv_index`** for files >10MB (enables faster processing) |
| 13 | +3. **`qsv_stats --cardinality --stats-jsonl`** to create a stats cache |
| 14 | +4. Then run analysis/transformation commands |
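The ordering above can be sketched as a small planning helper. This is only an illustration: `plan_new_file_steps` and `INDEX_THRESHOLD_BYTES` are hypothetical names, not part of qsv or its MCP server, and reading "10MB" as 10 MiB is an assumption.

```python
# Hypothetical planning helper; not part of qsv or its MCP server.
INDEX_THRESHOLD_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def plan_new_file_steps(size_bytes: int) -> list[str]:
    """Return the preparatory tool calls, in order, for a newly seen file."""
    steps = ["qsv_list_files"]
    if size_bytes > INDEX_THRESHOLD_BYTES:
        # Indexing is only suggested for larger files.
        steps.append("qsv_index")
    # The stats cache is built for every file before analysis begins.
    steps.append("qsv_stats --cardinality --stats-jsonl")
    return steps


print(plan_new_file_steps(50 * 1024 * 1024))
```

Analysis and transformation commands would follow once these steps have run.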
| 15 | + |
| 16 | +The stats cache accelerates: `frequency`, `schema`, `tojsonl`, `sqlp`, `joinp`, `pivotp`, `describegpt`, `moarstats`, `sample`. |
| 17 | + |
| 18 | +SQL queries on CSV inputs auto-convert to Parquet before execution. |
| 19 | + |
| 20 | +## File Handling |
| 21 | + |
| 22 | +- Save outputs to files with descriptive names rather than returning large results to chat. |
| 23 | +- Ensure output files are saved to the qsv working directory. |
| 24 | +- **Parquet** is ONLY for `sqlp`/DuckDB; all other qsv commands require CSV/TSV/SSV input. |
| 25 | +- The working directory is automatically synced from the MCP client's root directory when available. |
| 26 | +- If the auto-synced directory is incorrect or no root is provided, call **`qsv_set_working_dir`** to set it manually. |
| 27 | +- In Claude Cowork, verify the working directory matches the "Work in a folder" path by calling **`qsv_get_working_dir`**, and correct it with **`qsv_set_working_dir`** if needed. |
| 28 | + |
| 29 | +## Tool Composition |
| 30 | + |
| 31 | +- **`qsv_sqlp`** auto-converts CSV inputs to Parquet, then routes to DuckDB when available for better SQL compatibility and performance; falls back to Polars SQL otherwise. |
| 32 | +- For multi-file SQL queries, convert all files to Parquet first with **`qsv_to_parquet`**, then use `read_parquet()` references in SQL. |
| 33 | +- For custom row-level logic, use **`qsv_command`** with `command="luau"`. |
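As a rough sketch of the multi-file pattern, the snippet below builds a SQL string that references two Parquet files through `read_parquet()`. The file names, the join column, and the helper `parquet_join_sql` are all hypothetical; the resulting string would be passed to `qsv_sqlp` after both files have been converted with `qsv_to_parquet`.

```python
# Hypothetical file and column names, for illustration only.
def parquet_join_sql(left: str, right: str, key: str) -> str:
    """Build a two-file join that reads both Parquet files directly.

    Values are interpolated naively, so this sketch assumes trusted,
    quote-free file names and a plain column identifier.
    """
    return (
        f"SELECT * "
        f"FROM read_parquet('{left}') "
        f"JOIN read_parquet('{right}') "
        f"USING ({key})"
    )


sql = parquet_join_sql("orders.parquet", "customers.parquet", "customer_id")
print(sql)
```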
| 34 | + |
| 35 | +## Memory Limits |
| 36 | + |
| 37 | +The commands `dedup`, `sort`, `reverse`, `table`, `transpose`, and `pragmastat` load the entire file into memory. |
| 38 | + |
| 39 | +For files >1GB, prefer `extdedup`/`extsort` alternatives via **`qsv_command`**. |
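One way to encode this rule is a lookup that swaps in the external-memory variant once a file passes the size limit. The helper `choose_command` is hypothetical, and treating "1GB" as 1 GiB is an assumption; only the two alternatives named above are mapped.

```python
ONE_GB = 1024 ** 3  # assumption: "1GB" read as 1 GiB

# In-memory commands for which the guide names an external-memory alternative.
EXTERNAL_ALTERNATIVE = {"dedup": "extdedup", "sort": "extsort"}


def choose_command(command: str, size_bytes: int) -> str:
    """Return the command to pass to qsv_command for a file of this size."""
    if size_bytes > ONE_GB:
        # Commands without a listed alternative are returned unchanged.
        return EXTERNAL_ALTERNATIVE.get(command, command)
    return command


print(choose_command("sort", 3 * ONE_GB))
```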
| 40 | + |
| 41 | +Check column cardinality with **`qsv_stats`** before running `frequency` or `pivotp` to avoid huge output. |
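A minimal sketch of such a check, assuming the stats output is CSV with `field` and `cardinality` columns: the sample data is invented, real output carries many more columns, and the cut-off `MAX_CARDINALITY` is a hypothetical value to adjust per dataset.

```python
import csv
import io

# Hypothetical stats output; real output has many more columns.
STATS_CSV = """field,type,cardinality
order_id,Integer,250000
status,String,4
country,String,57
"""

MAX_CARDINALITY = 1000  # hypothetical cut-off


def high_cardinality_columns(stats_csv: str, limit: int) -> list[str]:
    """List columns whose distinct-value count exceeds the limit."""
    reader = csv.DictReader(io.StringIO(stats_csv))
    return [row["field"] for row in reader if int(row["cardinality"]) > limit]


print(high_cardinality_columns(STATS_CSV, MAX_CARDINALITY))
```

Columns returned by this check are the ones to leave out of `frequency` or `pivotp` runs, or to limit explicitly.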
| 42 | + |
| 43 | +## Tool Discovery |
| 44 | + |
| 45 | +Use **`qsv_search_tools`** to discover commands beyond the initially loaded core tools. There are 56 qsv commands available covering selection, filtering, transformation, aggregation, joining, validation, formatting, conversion, and more. |
| 46 | + |
| 47 | +## Operation Timeout |
| 48 | + |
| 49 | +qsv operations can take significant time on larger files. The MCP server's default operation timeout is 10 minutes (configurable via `QSV_MCP_OPERATION_TIMEOUT_MS`, max 30 minutes). Allow operations to run to completion. Check the current timeout with **`qsv_config`**. |
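The default and the ceiling can be illustrated with a small resolver. This is a sketch only: the guide does not say how the server treats out-of-range or unparsable values, so the clamping and fallback behaviour here are assumptions, and `effective_timeout_ms` is a hypothetical name.

```python
import os

DEFAULT_MS = 10 * 60 * 1000  # 10 minutes, the stated default
MAX_MS = 30 * 60 * 1000      # 30 minutes, the stated maximum


def effective_timeout_ms(env=None) -> int:
    """Resolve the operation timeout from an environment mapping."""
    env = os.environ if env is None else env
    raw = env.get("QSV_MCP_OPERATION_TIMEOUT_MS")
    if raw is None:
        return DEFAULT_MS
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MS  # assumption: fall back when the value is not a number
    if value <= 0:
        return DEFAULT_MS  # assumption: non-positive values are ignored
    return min(value, MAX_MS)  # assumption: larger values are clamped


print(effective_timeout_ms({"QSV_MCP_OPERATION_TIMEOUT_MS": "1200000"}))
```

The value actually in effect is the one reported by **`qsv_config`**.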