Add --describe mode for summary statistics #159

Description

@vmvarela

Add a --describe mode that loads the input, prints per-column summary statistics to stdout, and exits. No SQL query is required.

Examples

$ sql-pipe --describe sales.csv
Column     Type      NonNull  Null  Unique  Min         Max         Top3
─────────  ────────  ───────  ────  ──────  ──────────  ──────────  ──────────────────────────────────
id         INTEGER   12483    0     12483   1           12483       -
region     TEXT      12483    0     3       -           -           AMER(5200), EMEA(4100), APAC(3183)
amount     REAL      12400    83    8742    0.50        9999.99     -
date       DATE      12483    0     365     2024-01-01  2024-12-31  -
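
The aligned layout above could be produced along these lines. This is a minimal Python sketch for illustration only, not the project's formatting code: `format_table` is a hypothetical helper, and the two-space column gap, the "─" rule line, and "-" for missing values are read off the example output.

```python
def format_table(headers, rows):
    """Render headers and rows as a left-aligned, space-padded text table.

    Hypothetical helper: None cells are shown as "-", as in the example.
    """
    cells = [["-" if c is None else str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell (header included).
    widths = [
        max([len(h)] + [len(r[i]) for r in cells])
        for i, h in enumerate(headers)
    ]

    def line(values):
        padded = (v.ljust(w) for v, w in zip(values, widths))
        return "  ".join(padded).rstrip()

    out = [line(headers), line(["─" * w for w in widths])]
    out.extend(line(r) for r in cells)
    return "\n".join(out)


if __name__ == "__main__":
    print(format_table(
        ["Column", "Type", "NonNull", "Null"],
        [["id", "INTEGER", 12483, 0], ["amount", "REAL", 12400, 83]],
    ))
```

Computing widths from the data (rather than fixing them) keeps long cells such as the Top3 values from breaking the alignment of the columns before them.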

Acceptance Criteria

  • --describe flag loads data and prints summary statistics
  • Statistics include: column name, type, non-null count, null count, unique count, min, max
  • For TEXT columns with low cardinality, show top 3 values with counts
  • For numeric/date columns, show min and max values
  • Output is formatted as an aligned table
  • Works with file arguments and stdin
  • Compatible with all input formats (CSV, TSV, JSON, NDJSON, XML)
  • All existing tests pass
  • New tests cover statistics computation and output formatting
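
As a rough illustration of the statistics listed above, the sketch below computes them with plain SQL aggregates. It is written in Python against the standard `sqlite3` module and assumes the input has already been loaded into an SQLite table; `describe_column`, its parameters, and the `max_cardinality` threshold of 10 are hypothetical, since the issue does not define what counts as low cardinality.

```python
import sqlite3


def describe_column(conn, table, column, is_text=False, top_n=3, max_cardinality=10):
    """Return summary statistics for one column (hypothetical helper).

    max_cardinality is an assumed cut-off for "low cardinality".
    """
    col = '"' + column.replace('"', '""') + '"'  # quote identifiers safely
    tbl = '"' + table.replace('"', '""') + '"'
    # COUNT(col), COUNT(DISTINCT col), MIN and MAX all ignore NULLs.
    non_null, null, unique, lo, hi = conn.execute(
        f"SELECT COUNT({col}), COUNT(*) - COUNT({col}), "
        f"COUNT(DISTINCT {col}), MIN({col}), MAX({col}) FROM {tbl}"
    ).fetchone()
    top = []
    if is_text and unique <= max_cardinality:
        # Most frequent values first; ties broken by value for stable output.
        top = conn.execute(
            f"SELECT {col}, COUNT(*) AS n FROM {tbl} WHERE {col} IS NOT NULL "
            f"GROUP BY {col} ORDER BY n DESC, {col} LIMIT ?",
            (top_n,),
        ).fetchall()
    return {
        "non_null": non_null, "null": null, "unique": unique,
        "min": lo, "max": hi, "top": top,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("AMER", 1.5), ("AMER", 2.0), ("EMEA", None), ("APAC", 0.5)],
    )
    print(describe_column(conn, "sales", "region", is_text=True))
    print(describe_column(conn, "sales", "amount"))
```

A caller following the example output would print "-" for min and max on TEXT columns and for Top3 on non-TEXT columns; that display decision is left outside this helper.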

Notes

  • Estimated implementation size: ~300 lines
  • Reuse type inference infrastructure
  • Reuse table formatting code from pretty-print issue
  • For large inputs (>100K rows), use approximate unique counts or skip the unique count entirely
  • Fits existing mode pattern (--columns, --validate, --sample, --describe)
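
One possible way to get approximate unique counts for large inputs is a K-Minimum-Values (KMV) sketch, shown below in Python. The issue does not name a technique, so this is an assumption offered for discussion: `approx_unique` and the sketch size `k` are hypothetical, and other estimators (for example HyperLogLog) would serve equally well.

```python
import hashlib
import heapq


def approx_unique(values, k=1024):
    """Estimate the number of distinct non-null values with a KMV sketch.

    Keeps only the k smallest hash fractions, so memory is O(k).
    Values are compared by their string form, so 1 and "1" count as equal.
    """
    heap = []        # negated fractions: heap[0] is the largest one kept
    kept = set()     # fractions currently in the heap, for de-duplication
    for v in values:
        if v is None:
            continue
        digest = hashlib.blake2b(str(v).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big") / 2**64  # hash mapped into [0, 1)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the current largest kept fraction with the smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: count is exact
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)


if __name__ == "__main__":
    print(approx_unique(["AMER", "EMEA", "APAC", "AMER", None]))
    print(approx_unique(range(200_000)))
```

Columns with fewer than `k` distinct values still get an exact count, so low-cardinality columns such as region in the example would be unaffected; the relative error for larger columns shrinks roughly with the square root of `k`.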
