Postgres has CSV support on the COPY command, but COPY
has problems:
- It uses a special protocol, so it doesn't work with other standard features like prepared statements, pipeline mode or pgbench.
- Is not composable. You can't use COPY inside CTEs, subqueries, view definitions or as function arguments.
pg_csv
offers flexible CSV processing as a solution.
- Includes a CSV aggregate that composes with SQL expressions.
- Native C extension, x2 times faster than SQL queries that try to output CSV (see our CI results).
- No dependencies except Postgres.
Clone this repo and run:
make && make install
To install the extension:
create extension pg_csv;
Aggregate that builds a CSV respecting RFC 4180, quoting as required.
create table projects as
select *
from (
values
(1, 'Death Star OS', 1),
(2, 'Windows 95 Rebooted', 1),
(3, 'Project "Comma,Please"', 2),
(4, 'Escape ""Plan""', 2),
(NULL, 'NULL & Void', NULL)
) as _(id, name, client_id);
select csv_agg(x) from projects x;
csv_agg
--------------------------------
id,name,client_id +
1,Death Star OS,1 +
2,Windows 95 Rebooted,1 +
3,"Project ""Comma,Please""",2+
4,"Escape """"Plan""""",2 +
,NULL & Void,
(1 row)
Custom delimiters can be used to produce different formats like pipe-separated values, tab-separated values or semicolon-separated values.
select csv_agg(x, csv_options(delimiter := '|')) from projects x;
csv_agg
-----------------------------
id|name|client_id +
1|Death Star OS|1 +
2|Windows 95 Rebooted|1 +
3|Open Source Lightsabers|2+
4|Galactic Payroll System|2+
7|Bugzilla Revival|3
(1 row)
select csv_agg(x, csv_options(delimiter := E'\t')) from projects x;
csv_agg
-----------------------------------
id name client_id +
1 Death Star OS 1 +
2 Windows 95 Rebooted 1+
3 Open Source Lightsabers 2+
4 Galactic Payroll System 2+
7 Bugzilla Revival 3
(1 row)
Note
- Newline, carriage return and double quotes are not supported as delimiters to maintain the integrity of the separated values format.
- The delimiter can only be a single char, if a longer string is specified only the first char will be used.
- Why use a
csv_options
constructor function instead of extra arguments? Aggregates don't support named arguments in postgres, see a discussion on #2 (comment).
You can include a byte-order mark (BOM) to make the CSV compatible with Excel.
select csv_agg(x, csv_options(bom := true)) from projects x;
csv_agg
-------------------
id,name,client_id+
1,Death Star OS,1
2,Windows 95 Rebooted,1
3,Open Source Lightsabers,2
4,Galactic Payroll System,2
5,Bugzilla Revival,3
(1 row)
You can omit or include the CSV header.
select csv_agg(x, csv_options(header := false)) from projects x;
csv_agg
-----------------------------
1,Death Star OS,1 +
2,Windows 95 Rebooted,1 +
3,Open Source Lightsabers,2+
4,Galactic Payroll System,2+
7,Bugzilla Revival,3
(1 row)
NULL values are represented by an empty string by default. This can be changed with the nullstr
option.
SELECT csv_agg(x, csv_options(nullstr:='<NULL>')) AS body
FROM projects x;
body
--------------------------------
id,name,client_id +
1,Death Star OS,1 +
2,Windows 95 Rebooted,1 +
3,"Project ""Comma,Please""",2+
4,"Escape """"Plan""""",2 +
<NULL>,NULL & Void,<NULL>
(1 row)
- For large bulk exports and imports,
COPY ... CSV
should still be preferred as its faster due to streaming support.