@@ -26,8 +26,8 @@ cargo install --git https://github.com/aisrael/datu
2626| Parquet (` .parquet ` , ` .parq ` ) | ✓ | ✓ | — |
2727| Avro (` .avro ` ) | ✓ | ✓ | — |
2828| ORC (` .orc ` ) | ✓ | ✓ | — |
29+ | CSV (` .csv ` ) | ✓ | ✓ | ✓ |
2930| XLSX (` .xlsx ` ) | — | ✓ | — |
30- | CSV (` .csv ` ) | — | ✓ | ✓ |
3131| JSON (` .json ` ) | — | ✓ | ✓ |
3232| JSON (pretty) | — | — | ✓ |
3333| YAML | — | — | ✓ |
@@ -36,6 +36,8 @@ cargo install --git https://github.com/aisrael/datu
3636- **Write**: Output file formats for `convert`.
3737- **Display**: Output format when printing to stdout (`schema`, `head`, `tail` via `--output`: csv, json, json-pretty, yaml).
3838
39+ **CSV options:** When reading CSV files, `--has-headers` controls whether the first row is treated as column names. If the option is omitted, or given without a value, a header row is assumed; pass `--has-headers=false` for headerless CSV. Applies to `convert`, `count`, `schema`, `head`, and `tail`.
40+
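To make the effect of `--has-headers` concrete, here is a minimal sketch that uses only standard POSIX tools (`head`, `awk`, `wc`), not `datu` itself. It assumes a simple CSV whose quoted fields never span multiple lines; the sample file contents and variable names are made up for illustration.

```sh
#!/bin/sh
# Sketch only: how treating the first line as a header changes what counts as data.
# Assumes a simple CSV with no quoted fields that span multiple lines.
set -eu
tmp=$(mktemp)
printf 'id,name\n1,alice\n2,bob\n' > "$tmp"

# Header present (the default, like omitting --has-headers):
# line 1 names the columns and every later line is a data row.
columns=$(head -n 1 "$tmp")
header_rows=$(awk 'NR > 1' "$tmp" | wc -l | tr -d ' ')

# Headerless (like --has-headers=false): every line, including line 1, is data.
headerless_rows=$(wc -l < "$tmp" | tr -d ' ')

echo "columns: $columns"
echo "rows with header: $header_rows"
echo "rows without header: $headerless_rows"
rm -f "$tmp"
```

For this three-line file, the header interpretation yields columns `id,name` and 2 data rows, while the headerless interpretation yields 3. A row count over the same file should therefore differ by one between the two settings.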
3941Usage
4042=====
4143
@@ -60,9 +62,9 @@ Perform the same conversion and column filtering.
6062
6163### ` schema `
6264
63- Display the schema of a Parquet, Avro, or ORC file (column names, types, and nullability). Useful for inspecting file structure without reading data.
65+ Display the schema of a Parquet, Avro, CSV, or ORC file (column names, types, and nullability). Parquet, Avro, and ORC files embed their schema, so it can be inspected without reading the data; CSV has no embedded schema, so column types are inferred from the data.
6466
65- **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), ORC (`.orc`).
67+ **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), CSV (`.csv`), ORC (`.orc`).
6668
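CSV type inference can be pictured with the simplified `awk` sketch below. It is an illustration under stated assumptions, not `datu`'s actual inference rules: it assumes unquoted, comma-separated fields and distinguishes only integer, float, and string, widening a column's type whenever a value does not fit the narrower one. Real inference may also recognize booleans, dates, or timestamps.

```sh
#!/bin/sh
# Sketch only: naive per-column type inference for a headered CSV.
# Assumes comma-separated fields with no quoting. Every column starts as
# "integer" and is widened to "float", then "string", when a value does not fit.
set -eu
tmp=$(mktemp)
printf 'id,price,name\n1,9.99,apple\n2,12,pear\n' > "$tmp"

schema=$(awk -F, '
  NR == 1 {
    # Header row: record column names and start each column as "integer".
    for (i = 1; i <= NF; i++) { name[i] = $i; type[i] = "integer" }
    ncols = NF
    next
  }
  {
    # Data rows: widen the type of any column whose value does not parse.
    for (i = 1; i <= ncols; i++) {
      v = $i
      if (type[i] == "integer" && v !~ /^-?[0-9]+$/) type[i] = "float"
      if (type[i] == "float" && v !~ /^-?[0-9]+(\.[0-9]+)?$/) type[i] = "string"
    }
  }
  END { for (i = 1; i <= ncols; i++) print name[i] ": " type[i] }
' "$tmp")

echo "$schema"
rm -f "$tmp"
```

Here `id` is inferred as integer, `price` as float (one value has a decimal point), and `name` as string. The same widening logic suggests why the header setting matters for `schema`: if a header row were treated as data, the text `id` would push that column to string.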
6870**Usage:**
6870
@@ -75,6 +77,7 @@ datu schema <FILE> [OPTIONS]
7577| Option | Description |
7678| --------| -------------|
7779| ` --output <FORMAT> ` | Output format: ` csv ` , ` json ` , ` json-pretty ` , or ` yaml ` . Case insensitive. Default: ` csv ` . |
80+ | ` --has-headers [BOOL] ` | For CSV input: whether the first row is a header. Default: true when omitted. Use ` --has-headers=false ` for headerless CSV. |
7881
7982**Output formats:**
8083
@@ -104,25 +107,35 @@ datu schema events.avro -o YAML
104107
105108### ` count `
106109
107- Return the number of rows in a Parquet, Avro, or ORC file.
110+ Return the number of rows in a Parquet, Avro, CSV, or ORC file.
108111
109- **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), ORC (`.orc`).
112+ **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), CSV (`.csv`), ORC (`.orc`).
110113
111114**Usage:**
112115
113116``` sh
114- datu count <FILE>
117+ datu count <FILE> [OPTIONS]
115118```
116119
120+ **Options:**
121+
122+ | Option | Description |
123+ | --------| -------------|
124+ | ` --has-headers [BOOL] ` | For CSV input: whether the first row is a header. Default: true when omitted. Use ` --has-headers=false ` for headerless CSV. |
125+
117126**Examples:**
118127
119128``` sh
120129# Count rows in a Parquet file
121130datu count data.parquet
122131
123- # Count rows in an Avro or ORC file
132+ # Count rows in an Avro, CSV, or ORC file
124133datu count events.avro
134+ datu count data.csv
125135datu count data.orc
136+
137+ # Count rows in a headerless CSV file
138+ datu count data.csv --has-headers=false
126139```
127140
128141---
@@ -131,7 +144,7 @@ datu count data.orc
131144
132145Convert data between supported formats. Input and output formats are inferred from file extensions.
133146
134- **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), ORC (`.orc`).
147+ **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), CSV (`.csv`), ORC (`.orc`).
135148
136149**Supported output formats:** CSV (`.csv`), JSON (`.json`), Parquet (`.parquet`, `.parq`), Avro (`.avro`), ORC (`.orc`), XLSX (`.xlsx`).
137150
@@ -149,23 +162,30 @@ datu convert <INPUT> <OUTPUT> [OPTIONS]
149162| ` --limit <N> ` | Maximum number of records to read from the input. |
150163| ` --sparse ` | For JSON/YAML: omit keys with null/missing values. Default: true. Use ` --sparse=false ` to include default values (e.g. empty string). |
151164| ` --json-pretty ` | When converting to JSON, format output with indentation and newlines. Ignored for other output formats. |
165+ | ` --has-headers [BOOL] ` | For CSV input: whether the first row is a header. Default: true when omitted. Use ` --has-headers=false ` for headerless CSV. |
152166
153167**Examples:**
154168
155169``` sh
156170# Parquet to CSV (all columns)
157171datu convert data.parquet data.csv
158172
173+ # CSV to Parquet (with automatic type inference)
174+ datu convert data.csv data.parquet
175+
159176# Parquet to Avro (first 1000 rows)
160177datu convert data.parquet data.avro --limit 1000
161178
162179# Avro to CSV, only specific columns
163180datu convert events.avro events.csv --select id,timestamp,user_id
164181
182+ # CSV to JSON with headerless input
183+ datu convert data.csv output.json --has-headers=false
184+
165185# Parquet to Parquet with column subset
166186datu convert input.parq output.parquet --select one,two,three
167187
168- # Parquet, Avro, or ORC to Excel (.xlsx)
188+ # Parquet, Avro, CSV, or ORC to Excel (.xlsx)
169189datu convert data.parquet report.xlsx
170190
171191# Parquet or Avro to ORC
@@ -179,9 +199,9 @@ datu convert data.parquet data.json
179199
180200### ` head `
181201
182- Print the first N rows of a Parquet, Avro, or ORC file to stdout (default CSV; use ` --output ` for other formats).
202+ Print the first N rows of a Parquet, Avro, CSV, or ORC file to stdout (default CSV; use ` --output ` for other formats).
183203
184- **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), ORC (`.orc`).
204+ **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), CSV (`.csv`), ORC (`.orc`).
185205
186206**Usage:**
187207
@@ -197,6 +217,7 @@ datu head <INPUT> [OPTIONS]
197217| ` --output <FORMAT> ` | Output format: ` csv ` , ` json ` , ` json-pretty ` , or ` yaml ` . Case insensitive. Default: ` csv ` . |
198218| ` --sparse ` | For JSON/YAML: omit keys with null/missing values. Default: true. Use ` --sparse=false ` to include default values. |
199219| ` --select <COLUMNS>... ` | Columns to include. If not specified, all columns are printed. Same format as ` convert --select ` . |
220+ | ` --has-headers [BOOL] ` | For CSV input: whether the first row is a header. Default: true when omitted. Use ` --has-headers=false ` for headerless CSV. |
200221
201222**Examples:**
202223
@@ -207,21 +228,25 @@ datu head data.parquet
207228# First 100 rows
208229datu head data.parquet -n 100
209230datu head data.avro --number 100
231+ datu head data.csv -n 100
210232datu head data.orc --number 100
211233
212234# First 20 rows, specific columns
213235datu head data.parquet -n 20 --select id,name,email
236+
237+ # Head from a headerless CSV file
238+ datu head data.csv --has-headers=false
214239```
215240
216241---
217242
218243### ` tail `
219244
220- Print the last N rows of a Parquet, Avro, or ORC file to stdout (default CSV; use ` --output ` for other formats).
245+ Print the last N rows of a Parquet, Avro, CSV, or ORC file to stdout (default CSV; use ` --output ` for other formats).
221246
222- **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), ORC (`.orc`).
247+ **Supported input formats:** Parquet (`.parquet`, `.parq`), Avro (`.avro`), CSV (`.csv`), ORC (`.orc`).
223248
224- > **Note:** For Avro files, `tail` requires a full file scan since Avro does not support random access to the end of the file.
249+ > **Note:** For Avro and CSV files, `tail` requires a full file scan since these formats do not support random access to the end of the file.
225250
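The note above reflects a general constraint: without random access to the end of a file, the last N rows can only be found by reading every row once. The sketch below shows that single-pass technique in `awk`, using a ring buffer so that only N rows are held in memory. It illustrates the idea rather than `datu`'s implementation, and assumes a headered CSV with one record per line; the sample data and `n` are made up.

```sh
#!/bin/sh
# Sketch only: last-N data rows of a headered CSV in a single forward pass,
# as a format without random access to its end must be read.
# Only n rows are kept in memory at any time.
set -eu
tmp=$(mktemp)
printf 'id\n1\n2\n3\n4\n5\n' > "$tmp"
n=2

last_rows=$(awk -v n="$n" '
  NR == 1 { next }                               # skip the header row
  { buf[(NR - 2) % n] = $0; rows = NR - 1 }      # data row i (0-based) goes to slot i % n
  END {
    start = (rows > n) ? rows - n : 0
    for (i = start; i < rows; i++) print buf[i % n]
  }
' "$tmp")

echo "$last_rows"
rm -f "$tmp"
```

With `n=2`, the five data rows `1` through `5` produce `4` and `5`. Memory stays proportional to N, but read time still grows with file size, which is the cost the note describes.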
226251**Usage:**
227252
@@ -237,6 +262,7 @@ datu tail <INPUT> [OPTIONS]
237262| ` --output <FORMAT> ` | Output format: ` csv ` , ` json ` , ` json-pretty ` , or ` yaml ` . Case insensitive. Default: ` csv ` . |
238263| ` --sparse ` | For JSON/YAML: omit keys with null/missing values. Default: true. Use ` --sparse=false ` to include default values. |
239264| ` --select <COLUMNS>... ` | Columns to include. If not specified, all columns are printed. Same format as ` convert --select ` . |
265+ | ` --has-headers [BOOL] ` | For CSV input: whether the first row is a header. Default: true when omitted. Use ` --has-headers=false ` for headerless CSV. |
240266
241267**Examples:**
242268
@@ -247,6 +273,7 @@ datu tail data.parquet
247273# Last 50 rows
248274datu tail data.parquet -n 50
249275datu tail data.avro --number 50
276+ datu tail data.csv -n 50
250277datu tail data.orc --number 50
251278
252279# Last 20 rows, specific columns
@@ -285,10 +312,11 @@ read("input") |> ... |> write("output")
285312
286313#### ` read(path) `
287314
288- Read a data file. Supported formats: Parquet (`.parquet`, `.parq`), Avro (`.avro`), ORC (`.orc`).
315+ Read a data file. Supported formats: Parquet (`.parquet`, `.parq`), Avro (`.avro`), CSV (`.csv`), ORC (`.orc`). CSV files are assumed to have a header row by default.
289316
290317``` text
291318> read("data.parquet") |> write("data.csv")
319+ > read("data.csv") |> write("data.parquet")
292320```
293321
294322#### ` write(path) `