
Commit a0ceed2

Sync pg_clickhouse docs to the v0.1.4 release
1 parent 2ed7bad commit a0ceed2


4 files changed (+226 -70 lines)


docs/integrations/tools/data-integration/pg_clickhouse/introduction.md

Lines changed: 1 addition & 3 deletions
````diff
@@ -44,8 +44,6 @@ factor 1; ✔︎ indicates full pushdown, while a dash indicates a query
 cancellation after 1m. All tests run on a MacBook Pro M4 Max with 36 GB of
 memory.
 
-<!-- cd dev/tpch && make ch && make pg && make run -->
-
 | Query | PostgreSQL | pg_clickhouse | Pushdown |
 | ----------:| ----------:| -------------:|:--------:|
 | [Query 1] | 4693 ms | 268 ms | ✔︎ |
@@ -58,7 +56,7 @@ memory.
 | [Query 8] | 342 ms | 156 ms | ✔︎ |
 | [Query 9] | 3094 ms | 298 ms | ✔︎ |
 | [Query 10] | 581 ms | 197 ms | ✔︎ |
-| [Query 11] | 212 ms | 24 ms | ✔︎ |
+| [Query 11] | 212 ms | 24 ms | |
 | [Query 12] | 1116 ms | 84 ms | ✔︎ |
 | [Query 13] | 958 ms | 1368 ms | |
 | [Query 14] | 181 ms | 73 ms | ✔︎ |
````
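The speedup for each benchmark row in this hunk is the PostgreSQL time divided by the pg_clickhouse time. The Python sketch below uses only the rows visible in the diff. The dictionary and function names are illustrative.

```python
# Timings in milliseconds, copied from the rows visible in the diff above.
# Each value is (PostgreSQL, pg_clickhouse).
TIMINGS_MS = {
    1: (4693, 268),
    8: (342, 156),
    9: (3094, 298),
    10: (581, 197),
    11: (212, 24),
    12: (1116, 84),
    13: (958, 1368),
    14: (181, 73),
}


def speedup(pg_ms: float, ch_ms: float) -> float:
    """Return how many times faster pg_clickhouse ran; below 1 means slower."""
    return pg_ms / ch_ms


for query, (pg_ms, ch_ms) in sorted(TIMINGS_MS.items()):
    print(f"Query {query:>2}: {speedup(pg_ms, ch_ms):5.2f}x")
```

Query 13 has no pushdown and comes out below 1, at about 0.70x. Query 11 still runs roughly 8.8x faster, even though this commit removes its pushdown checkmark.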

docs/integrations/tools/data-integration/pg_clickhouse/reference.md

Lines changed: 162 additions & 11 deletions
````diff
@@ -357,7 +357,7 @@ DROP FOREIGN TABLE uact CASCADE;
 ## DML SQL Reference {#dml-sql-reference}
 
 The SQL [DML] expressions below may use pg_clickhouse. Examples depend on
-these ClickHouse tables, created by [make-logs.sql]:
+these ClickHouse tables:
 
 ```sql
 CREATE TABLE logs (
````

````diff
@@ -574,6 +574,15 @@ try=# EXECUTE avg_durations_between_dates('2025-12-09', '2025-12-13');
 (5 rows)
 ```
 
+:::warning
+Parameterized execution prevents the [http driver](#create-server) from
+properly converting DateTime time zones on ClickHouse versions prior to 25.8,
+when the [underlying bug] was [fixed]. Note that sometimes PostgreSQL will use
+a parameterized query plan even without using `PREPARE`. For any queries on
+that require accurate time zone conversion, and where upgrading to 25.8 or
+later is not an option, use the [binary driver](#create-server), instead.
+:::
+
 pg_clickhouse pushes down the aggregations, as usual, as seen in the
 [EXPLAIN](#explain) verbose output:
 
````
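The warning added here comes down to a version check. The helper below is hypothetical, since pg_clickhouse does not expose it. It assumes the server version is a purely numeric dotted string, such as the value returned by ClickHouse's `version()` function.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a numeric dotted version such as '25.8.1.5101' into integers."""
    return tuple(int(part) for part in version.split("."))


# Per the warning, the fix for the underlying bug shipped in ClickHouse 25.8.
FIXED_IN = (25, 8)


def http_tz_conversion_reliable(version: str) -> bool:
    """Return True if, per the warning, the http driver should convert
    DateTime time zones correctly for parameterized queries on this version.
    Only major.minor is compared, because the warning names the 25.8 release."""
    return parse_version(version)[:2] >= FIXED_IN


for v in ("24.12.1", "25.3.2.1", "25.8.1.5101"):
    print(v, http_tz_conversion_reliable(v))
```

If the check returns False and upgrading is not possible, the warning recommends the binary driver for the affected queries.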

````diff
@@ -714,8 +723,16 @@ SET pg_clickhouse.session_settings TO $$
 $$;
 ```
 
-pg_clickhouse doesn't validate the settings, but passes them on to ClickHouse
-for every query. It thus supports all settings for each ClickHouse version.
+Some settings will be ignored in cases where they would interfere with the
+operation of pg_clickhouse itself. These include:
+
+* `date_time_output_format`: the http driver requires it to be "iso"
+* `format_tsv_null_representation`: the http driver requires the default
+* `output_format_tsv_crlf_end_of_line` the http driver requires the default
+
+Otherwise, pg_clickhouse does not validate the settings, but passes them on to
+ClickHouse for every query. It thus supports all settings for each ClickHouse
+version.
 
 Note that pg_clickhouse must be loaded before setting
 `pg_clickhouse.session_settings`; either use [shared library preloading] or
````
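The new text describes a filter. Three settings that the http driver depends on are ignored, and every other setting is passed through unvalidated. The following is a minimal Python model of that behavior, assuming the settings are held as a name-to-value dict. The function is hypothetical and is not pg_clickhouse's implementation. The docs tie these settings only to the http driver's requirements, so the sketch simply drops them unconditionally.

```python
# Settings the updated docs say pg_clickhouse ignores because the http
# driver depends on their values.
IGNORED_SETTINGS = frozenset({
    "date_time_output_format",
    "format_tsv_null_representation",
    "output_format_tsv_crlf_end_of_line",
})


def forwarded_settings(session_settings: dict[str, str]) -> dict[str, str]:
    """Drop the reserved settings and pass every other one through unvalidated."""
    return {
        name: value
        for name, value in session_settings.items()
        if name not in IGNORED_SETTINGS
    }


print(forwarded_settings({"date_time_output_format": "simple", "join_use_nulls": "1"}))
# {'join_use_nulls': '1'}
```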

````diff
@@ -773,19 +790,19 @@ shared_preload_libraries = pg_clickhouse
 Useful to save memory and load overhead for every session, but requires the
 cluster to be restart when the library is updated.
 
-## Function and Operator Reference {#function-and-operator-reference}
-
-### Data Types {#data-types}
+## Data Types {#data-types}
 
 pg_clickhouse maps the following ClickHouse data types to PostgreSQL data
-types:
+types. [IMPORT FOREIGN SCHEMA](#import-foreign-schema) use the first type in
+the PostgreSQL column when importing columns; additional types may be used in
+[CREATE FOREIGN TABLE](#create-foreign-table) statements:
 
 | ClickHouse | PostgreSQL | Notes |
-| -----------|------------------|-------------------------------|
+|------------|------------------|-------------------------------|
 | Bool | boolean | |
 | Date | date | |
 | Date32 | date | |
-| DateTime | timestamp | |
+| DateTime | timestamptz | |
 | Decimal | numeric | |
 | Float32 | real | |
 | Float64 | double precision | |
````
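The data type table spans this hunk and the next one. The new introduction adds two rules:

* IMPORT FOREIGN SCHEMA uses the first PostgreSQL type listed.
* CREATE FOREIGN TABLE may use any listed type.

The Python sketch below models those rules, plus the UInt64 note, for the rows visible in the diff. Rows that fall between the hunks are omitted, and all names are illustrative.

```python
# ClickHouse -> PostgreSQL types, limited to rows visible in the diff.
# The first PostgreSQL type is what IMPORT FOREIGN SCHEMA uses; any listed
# type may be declared in CREATE FOREIGN TABLE.
TYPE_MAP: dict[str, list[str]] = {
    "Bool": ["boolean"],
    "Date": ["date"],
    "Date32": ["date"],
    "DateTime": ["timestamptz"],
    "Decimal": ["numeric"],
    "Float32": ["real"],
    "Float64": ["double precision"],
    "Int64": ["bigint"],
    "Int8": ["smallint"],
    "JSON": ["jsonb"],  # HTTP engine only
    "String": ["text", "bytea"],
    "UInt16": ["integer"],
    "UInt32": ["bigint"],
    "UInt64": ["bigint"],  # errors on values > BIGINT max
    "UInt8": ["smallint"],
    "UUID": ["uuid"],
}

BIGINT_MAX = 2**63 - 1


def imported_type(ch_type: str) -> str:
    """Type IMPORT FOREIGN SCHEMA would choose: the first listed one."""
    return TYPE_MAP[ch_type][0]


def allowed_in_create(ch_type: str, pg_type: str) -> bool:
    """Whether CREATE FOREIGN TABLE may declare pg_type for this column."""
    return pg_type in TYPE_MAP[ch_type]


def check_uint64(value: int) -> int:
    """Mimic the documented limit: UInt64 values above BIGINT max fail."""
    if value > BIGINT_MAX:
        raise OverflowError(f"UInt64 value {value} exceeds BIGINT max {BIGINT_MAX}")
    return value
```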

````diff
@@ -796,13 +813,136 @@ types:
 | Int64 | bigint | |
 | Int8 | smallint | |
 | JSON | jsonb | HTTP engine only |
-| String | text | |
+| String | text, bytea | |
 | UInt16 | integer | |
 | UInt32 | bigint | |
 | UInt64 | bigint | Errors on values > BIGINT max |
 | UInt8 | smallint | |
 | UUID | uuid | |
 
+Additional notes and details follow.
+
+### BYTEA {#bytea}
+
+ClickHouse does not provide the equivalent of the PostgreSQL [BYTEA] type, but
+allows any bytes to be stored in [String] type. In general ClickHouse strings
+should be mapped to the PostgreSQL [TEXT], but when using binary data, map it
+to [BYTEA]. Example:
+
+```sql
+-- Create clickHouse table with String columns.
+SELECT clickhouse_raw_query($$
+CREATE TABLE bytes (
+c1 Int8, c2 String, c3 String
+) ENGINE = MergeTree ORDER BY (c1);
+$$);
+
+-- Create foreign table with BYTEA columns.
+CREATE FOREIGN TABLE bytes (
+c1 int,
+c2 BYTEA,
+c3 BYTEA
+) SERVER ch_srv OPTIONS( table_name 'bytes' );
+
+-- Insert binary data into the foreign table.
+INSERT INTO bytes
+SELECT n, sha224(bytea('val'||n)), decode(md5('int'||n), 'hex')
+FROM generate_series(1, 4) n;
+
+-- View the results.
+SELECT * FROM bytes;
+```
+
+That final `SELECT` query will output:
+
+```pgsql
+c1 | c2 | c3
+----+------------------------------------------------------------+------------------------------------
+1 | \x1bf7f0cc821d31178616a55a8e0c52677735397cdde6f4153a9fd3d7 | \xae3b28cde02542f81acce8783245430d
+2 | \x5f6e9e12cd8592712e638016f4b1a2e73230ee40db498c0f0b1dc841 | \x23e7c6cacb8383f878ad093b0027d72b
+3 | \x53ac2c1fa83c8f64603fe9568d883331007d6281de330a4b5e728f9e | \x7e969132fc656148b97b6a2ee8bc83c1
+4 | \x4e3c2e4cb7542a45173a8dac939ddc4bc75202e342ebc769b0f5da2f | \x8ef30f44c65480d12b650ab6b2b04245
+(4 rows)
+```
+
+Note that if there are any nul bytes in the ClickHouse columns, a foreign
+table using [TEXT] columns will not output the proper values:
+
+```sql
+-- Create foreign table with TEXT columns.
+CREATE FOREIGN TABLE texts (
+c1 int,
+c2 TEXT,
+c3 TEXT
+) SERVER ch_srv OPTIONS( table_name 'bytes' );
+
+-- Encode binary data as hex.
+SELECT c1, encode(c2::bytea, 'hex'), encode(c3::bytea, 'hex') FROM texts ORDER BY c1;
+```
+
+Will output:
+
+```pgsql
+c1 | encode | encode
+----+----------------------------------------------------------+----------------------------------
+1 | 1bf7f0cc821d31178616a55a8e0c52677735397cdde6f4153a9fd3d7 | ae3b28cde02542f81acce8783245430d
+2 | 5f6e9e12cd8592712e638016f4b1a2e73230ee40db498c0f0b1dc841 | 23e7c6cacb8383f878ad093b
+3 | 53ac2c1fa83c8f64603fe9568d883331 | 7e969132fc656148b97b6a2ee8bc83c1
+4 | 4e3c2e4cb7542a45173a8dac939ddc4bc75202e342ebc769b0f5da2f | 8ef30f44c65480d12b650ab6b2b04245
+(4 rows)
+```
+
+Note that rows two and three contain truncated values. This is because
+PostgreSQL relies on nul-terminated strings and does not support nuls in its
+strings.
+
+Attempting to insert binary values into [TEXT] columns will succeed and work
+as expected:
+
+```sql
+-- Insert via text columns:
+TRUNCATE texts;
+INSERT INTO texts
+SELECT n, sha224(bytea('val'||n)), decode(md5('int'||n), 'hex')
+FROM generate_series(1, 4) n;
+
+-- View the data.
+SELECT c1, encode(c2::bytea, 'hex'), encode(c3::bytea, 'hex') FROM texts ORDER BY c1;
+```
+
+The text columns will be correct:
+
+```pgdsql
+
+c1 | encode | encode
+----+----------------------------------------------------------+----------------------------------
+1 | 1bf7f0cc821d31178616a55a8e0c52677735397cdde6f4153a9fd3d7 | ae3b28cde02542f81acce8783245430d
+2 | 5f6e9e12cd8592712e638016f4b1a2e73230ee40db498c0f0b1dc841 | 23e7c6cacb8383f878ad093b0027d72b
+3 | 53ac2c1fa83c8f64603fe9568d883331007d6281de330a4b5e728f9e | 7e969132fc656148b97b6a2ee8bc83c1
+4 | 4e3c2e4cb7542a45173a8dac939ddc4bc75202e342ebc769b0f5da2f | 8ef30f44c65480d12b650ab6b2b04245
+(4 rows)
+```
+
+But reading them as [BYTEA] will not:
+
+```pgsql
+# SELECT * FROM bytes;
+c1 | c2 | c3
+----+------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------
+1 | \x5c783162663766306363383231643331313738363136613535613865306335323637373733353339376364646536663431353361396664336437 | \x5c786165336232386364653032353432663831616363653837383332343534333064
+2 | \x5c783566366539653132636438353932373132653633383031366634623161326537333233306565343064623439386330663062316463383431 | \x5c783233653763366361636238333833663837386164303933623030323764373262
+3 | \x5c783533616332633166613833633866363436303366653935363864383833333331303037643632383164653333306134623565373238663965 | \x5c783765393639313332666336353631343862393762366132656538626338336331
+4 | \x5c783465336332653463623735343261343531373361386461633933396464633462633735323032653334326562633736396230663564613266 | \x5c783865663330663434633635343830643132623635306162366232623034323435
+(4 rows)
+```
+
+:::tip
+As a rule, only use [TEXT] columns for encoded strings and use [BYTEA] columns
+only for binary data, and never switch between them.
+:::
+
+## Function and Operator Reference {#function-and-operator-reference}
+
 ### Functions {#functions}
 
 These functions provide the interface to query a ClickHouse database.
````
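The outputs in the BYTEA section can be reproduced byte for byte in Python, which makes both failure modes easier to see. This is a simulation, not pg_clickhouse code. It rests on two assumptions:

* Reading a String into a TEXT column behaves like reading a C string that stops at the first nul byte, as the section explains.
* Inserting a bytea value into a TEXT column stores PostgreSQL's `\x...` hex output text. The `\x5c78...` values in the last output suggest this, because `5c78` is the hex for the characters `\x`.

The sample values are copied from the outputs above, and all helper names are illustrative.

```python
import hashlib


def pg_md5_decoded(text: str) -> bytes:
    """Model decode(md5(text), 'hex'): md5() returns hex text, and
    decode(..., 'hex') turns it back into 16 raw bytes."""
    return bytes.fromhex(hashlib.md5(text.encode()).hexdigest())


def pg_sha224(data: bytes) -> bytes:
    """Model sha224(bytea), which returns the raw 28-byte digest."""
    return hashlib.sha224(data).digest()


def read_as_text(stored: bytes) -> bytes:
    """Model a nul-terminated read: everything from the first nul is lost."""
    return stored.split(b"\x00", 1)[0]


def bytea_hex_output(data: bytes) -> str:
    """Model PostgreSQL's hex output format for bytea: a backslash, 'x', then hex."""
    return "\\x" + data.hex()


# Row 2, column c3 from the BYTEA output: its 13th byte is 0x00.
row2_c3 = bytes.fromhex("23e7c6cacb8383f878ad093b0027d72b")
print(read_as_text(row2_c3).hex())  # 23e7c6cacb8383f878ad093b

# Row 1, column c3 inserted through a TEXT column: the String holds the
# characters of its hex output text, so reading it as BYTEA yields 5c78....
row1_c3 = bytes.fromhex("ae3b28cde02542f81acce8783245430d")
stored_string = bytea_hex_output(row1_c3).encode("ascii")
print(bytea_hex_output(stored_string))
```

The second case also explains why the TEXT round trip looked correct. `c2::bytea` parses the stored `\x...` text back into the original bytes before `encode()` runs, so only a read through a BYTEA column exposes the extra layer of encoding.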

````diff
@@ -883,6 +1023,7 @@ maps the following functions:
 * `btrim`: [trimBoth](https://clickhouse.com/docs/sql-reference/functions/string-functions#trimboth)
 * `strpos`: [position](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#position)
 * `regexp_like`: [match](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#match)
+* `md5`: [MD5](https://clickhouse.com/docs/sql-reference/functions/hash-functions#MD5)
 
 ### Custom Functions {#custom-functions}
 
````

````diff
@@ -1040,8 +1181,18 @@ Copyright (c) 2025-2026, ClickHouse
 [dollar quoting]: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING
 "PostgreSQL Docs: Dollar-Quoted String Constants"
 [library preloading]: https://www.postgresql.org/docs/18/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-PRELOAD
-"PostgreSQL Docs: Shared Library Preloading
+"PostgreSQL Docs: Shared Library Preloading"
 [PREPARE notes]: https://www.postgresql.org/docs/current/sql-prepare.html#SQL-PREPARE-NOTES
 "PostgreSQL Docs: PREPARE notes"
 [query parameters]: https://clickhouse.com/docs/guides/developer/stored-procedures-and-prepared-statements#alternatives-to-prepared-statements-in-clickhouse
 "ClickHouse Docs: Alternatives to prepared statements in ClickHouse"
+[underlying bug]: https://github.com/ClickHouse/ClickHouse/issues/85847
+"ClickHouse/ClickHouse#85847 Some queries in a multipart forms don't read settings"
+[fixed]: https://github.com/ClickHouse/ClickHouse/pull/85570
+"ClickHouse/ClickHouse#85570 fix HTTP with multipart"
+[BYTEA]: https://www.postgresql.org/docs/current/datatype-binary.html
+"PostgreSQL Docs: Binary Data Types"
+[String]: https://clickhouse.com/docs/sql-reference/data-types/string
+"ClickHouse Docs: String"
+[TEXT]: https://www.postgresql.org/docs/current/datatype-character.html
+"PostgreSQL Docs: Character Types"
````
