diff --git a/api/compression/compress_chunk.md b/api/compression/compress_chunk.md index d964535992..9f67b98b96 100644 --- a/api/compression/compress_chunk.md +++ b/api/compression/compress_chunk.md @@ -51,11 +51,11 @@ SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk'); ## Optional arguments -| Name | Type | Default | Required | Description | -|----------------------|--|---------|--|----------------------------------------------------------------------------------------------------------------------------------------------------| -| `chunk` | REGCLASS | - |✔| Name of the chunk to add to the $COLUMNSTORE. | -| `if_not_columnstore` | BOOLEAN | `true` |✖| Set to `false` so this job fails with an error rather than a warning if `chunk` is already in the $COLUMNSTORE. | -| `recompress` | BOOLEAN | `false` |✖| Set to true to recompress. In-memory recompression will be attempted first; otherwise it will fall back to internal decompress/compress. | +| Name | Type | Default | Required | Description | +|----------------------|--|---------|--|--------------------------------------------------------------------------------------------------------------------------------| +| `chunk` | REGCLASS | - |✔| Name of the chunk to add to the $COLUMNSTORE. | +| `if_not_columnstore` | BOOLEAN | `true` |✖| Set to `false` so this job fails with an error rather than a warning if `chunk` is already in the $COLUMNSTORE. | +| `recompress` | BOOLEAN | `false` |✖| Set to true to recompress. In-memory recompression is attempted first; it falls back to internal decompress/compress. | ## Returns diff --git a/api/hypercore/convert_to_columnstore.md b/api/hypercore/convert_to_columnstore.md index 916c0d9847..7856de202c 100644 --- a/api/hypercore/convert_to_columnstore.md +++ b/api/hypercore/convert_to_columnstore.md @@ -34,11 +34,11 @@ CALL convert_to_columnstore('_timescaledb_internal._hyper_1_2_chunk'); ## Arguments -| Name | Type | Default | Required | Description | -|----------------------|--|---------|--|----------------------------------------------------------------------------------------------------------------------------------------------------| -| `chunk` | REGCLASS | - |✔| Name of the chunk to add to the $COLUMNSTORE. | -| `if_not_columnstore` | BOOLEAN | `true` |✖| Set to `false` so this job fails with an error rather than a warning if `chunk` is already in the $COLUMNSTORE. | -| `recompress` | BOOLEAN | `false` |✖| Set to true to recompress. In-memory recompression will be attempted first; otherwise it will fall back to internal decompress/compress. | +| Name | Type | Default | Required | Description | +|----------------------|--|---------|--|---------------------------------------------------------------------------------------------------------------------------------| +| `chunk` | REGCLASS | - |✔| Name of the chunk to add to the $COLUMNSTORE. | +| `if_not_columnstore` | BOOLEAN | `true` |✖| Set to `false` so this job fails with an error rather than a warning if `chunk` is already in the $COLUMNSTORE. | +| `recompress` | BOOLEAN | `false` |✖| Set to true to recompress. In-memory recompression is attempted first; it falls back to internal decompress/compress. | ## Returns diff --git a/use-timescale/write-data/index.md b/use-timescale/write-data/index.md index 51b86efb3d..7b90313133 100644 --- a/use-timescale/write-data/index.md +++ b/use-timescale/write-data/index.md @@ -18,12 +18,12 @@ using `INSERT`, `UPDATE`, and `DELETE` statements. 
* [Upsert data][upsert] into hypertables
* [Delete data][delete] from hypertables

-For more information about using third-party tools to write data
-into $TIMESCALE_DB, see the [Ingest data from other sources][ingest-data] section.
+To find out how to add and sync data to your $SERVICE_SHORT from other sources, see
+[Import and sync][ingest-data].

[about-writing-data]: /use-timescale/:currentVersion:/write-data/about-writing-data/
[delete]: /use-timescale/:currentVersion:/write-data/delete/
-[ingest-data]: /use-timescale/:currentVersion:/ingest-data/
+[ingest-data]: /migrate/:currentVersion:
[insert]: /use-timescale/:currentVersion:/write-data/insert/
[update]: /use-timescale/:currentVersion:/write-data/update/
[upsert]: /use-timescale/:currentVersion:/write-data/upsert/
diff --git a/use-timescale/write-data/insert.md b/use-timescale/write-data/insert.md
index 3d79f95444..220d88e031 100644
--- a/use-timescale/write-data/insert.md
+++ b/use-timescale/write-data/insert.md
@@ -1,22 +1,25 @@
---
title: Insert data
-excerpt: Insert single and multiple rows and return data in TimescaleDB with SQL
+excerpt: Insert single and multiple rows and bulk load data into TimescaleDB with SQL
products: [cloud, mst, self_hosted]
-keywords: [ingest]
-tags: [insert, write, hypertables]
+keywords: [ingest, bulk load]
+tags: [insert, write, hypertables, copy]
---

import EarlyAccess2230 from "versionContent/_partials/_early_access_2_23_0.mdx";

# Insert data

-Insert data into a hypertable with a standard [`INSERT`][postgres-insert] SQL
-command.
+You insert data into a $HYPERTABLE using the following standard SQL commands:
+
+- `INSERT`: single rows or small batches
+- `COPY`: bulk data loading
+
+To improve performance, insert time series data directly into the $COLUMNSTORE using [direct compress][direct-compress].

## Insert a single row

-To insert a single row into a hypertable, use the syntax `INSERT INTO ...
-VALUES`. For example, to insert data into a hypertable named `conditions`:
+To insert a single row into a $HYPERTABLE, use the syntax `INSERT INTO ... VALUES`:

```sql
INSERT INTO conditions(time, location, temperature, humidity)
@@ -25,11 +28,11 @@ INSERT INTO conditions(time, location, temperature, humidity)

## Insert multiple rows

-You can also insert multiple rows into a hypertable using a single `INSERT`
-call. This works even for thousands of rows at a time. This is more efficient
-than inserting data row-by-row, and is recommended when possible.
+A more efficient method than inserting data row-by-row is to insert multiple rows into a $HYPERTABLE using a single
+`INSERT` call. This works even for thousands of rows at a time. $TIMESCALE_DB batches the rows by chunk, then writes to
+each chunk in a single transaction.

-Use the same syntax, separating rows with a comma:
+You use the same syntax, separating rows with a comma:

```sql
INSERT INTO conditions
@@ -39,18 +42,13 @@ INSERT INTO conditions
    (NOW(), 'garage', 77.0, 65.2);
```

-
-
-You can insert multiple rows belonging to different
-chunks within the same `INSERT` statement. Behind the scenes, $TIMESCALE_DB batches the rows by chunk, and writes to each chunk in a single
-transaction.
-
-
+If you `INSERT` unsorted data, call [convert_to_columnstore('', recompress => true)][convert_to_columnstore]
+on the $CHUNK to reorder and optimize your data.

## Insert and return data

-In the same `INSERT` command, you can return some or all of the inserted data by
-adding a `RETURNING` clause.
For example, to return all the inserted data, run: +You can return some or all of the inserted data by adding a `RETURNING` clause to the `INSERT` command. For example, +to return all the inserted data, run: ```sql INSERT INTO conditions @@ -67,29 +65,96 @@ time | location | temperature | humidity (1 row) ``` -## Direct compress on INSERT +If you `INSERT` unsorted data, call [convert_to_columnstore('', recompress => true)][convert_to_columnstore] +on the $CHUNK to reorder and optimize your data. -This columnar format enables fast scanning and -aggregation, optimizing performance for analytical workloads while also saving significant storage space. In the -$COLUMNSTORE conversion, $HYPERTABLE chunks are compressed by up to 98%, and organized for efficient, large-scale -queries. +## Bulk insert with COPY -To improve performance, you can compress data during `INSERT` so that it is injected directly into chunks -in the $COLUMNSTORE rather than waiting for the policy. +The `COPY` command is the most efficient way to load large amounts of data into a $HYPERTABLE. For +bulk data loading, `COPY` can be 2-3x faster or more than `INSERT`, especially when combined with +[direct compress][direct-compress]. -To enable direct compress on INSERT, enable the following [GUC parameters][gucs]: +`COPY` supports loading from: -```sql -SET timescaledb.enable_compressed_insert = true; -SET timescaledb.enable_compressed_insert_sort_batches = true; -SET timescaledb.enable_compressed_insert_client_sorted = true; -``` +- **CSV files**: + + ```sql + COPY conditions(time, location, temperature, humidity) + FROM '/path/to/data.csv' + WITH (FORMAT CSV, HEADER); + ``` -When you set `enable_compressed_insert_client_sorted` to `true`, you must ensure that data in the input -stream is sorted. +- **Standard input** + + To load data from your application or script using standard input: + + ```sql + COPY conditions(time, location, temperature, humidity) + FROM STDIN + WITH (FORMAT CSV); + ``` + + To signal the end of input, add `\.` on a new line. + +- **Program output** + + To load data generated by a program or script: + + ```sql + COPY conditions(time, location, temperature, humidity) + FROM PROGRAM 'generate_data.sh' + WITH (FORMAT CSV); + ``` + +If you `COPY` unsorted data, call [convert_to_columnstore('', recompress => true)][convert_to_columnstore] +on the $CHUNK to reorder and optimize your data. + +## Improve performance with direct compress +The columnar format in the $COLUMNSTORE enables fast scanning and aggregation, optimizing performance for +analytical workloads while also saving significant storage space. In the $COLUMNSTORE conversion, $HYPERTABLE chunks are +compressed by up to 98%, and organized for efficient, large-scale queries. + +To improve performance, compress data during the `INSERT` and `COPY` operations so that it is injected +directly into chunks in the $COLUMNSTORE rather than waiting for the policy. Direct compress writes data in the +compressed format in memory, significantly reducing I/O and improving ingestion performance. + +When you enable direct compress, ensure that your data is already sorted by the table's compression `order_by` columns. +Incorrectly sorted data results in poor compression and query performance. 
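+
+For example, here is a minimal sketch of presorting rows during ingestion. It assumes that the $HYPERTABLE's
+$COLUMNSTORE settings order by `time`, and that a hypothetical `conditions_staging` table holds the incoming rows:
+
+```sql
+-- Presort on the assumed order_by column (time) so that direct compress
+-- receives ordered batches and produces well-organized columnstore chunks
+INSERT INTO conditions (time, location, temperature, humidity)
+SELECT time, location, temperature, humidity
+FROM conditions_staging
+ORDER BY time;
+```
+
+If your source data is already ordered by `time`, you can load it directly and skip this step.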
+
+- **Enable direct compress on `INSERT`**
+
+  Set the following [GUC parameters][gucs]:
+  ```sql
+  SET timescaledb.enable_direct_compress_insert = true;
+  SET timescaledb.enable_direct_compress_insert_client_sorted = true;
+  ```
+
+- **Enable direct compress on `COPY`**
+
+  Set the following [GUC parameters][gucs]:
+
+  ```sql
+  SET timescaledb.enable_direct_compress_copy = true;
+  SET timescaledb.enable_direct_compress_copy_client_sorted = true;
+  ```
+
+Note the following when you use direct compress:
+
+- **Optimal batch size**: best results with batches of 1,000 to 10,000 records
+- **Cardinality**: high cardinality datasets do not compress well and may degrade query performance
+- **Batch format**: the $COLUMNSTORE is optimized for 1,000 records per batch per segment
+- **WAL efficiency**: compressed batches are written to WAL rather than individual tuples
+- **Continuous aggregates**: not supported with direct compress
+- **Unique constraints**: tables with unique constraints cannot use direct compress
+
+
+
+
+[postgres-insert]: https://www.postgresql.org/docs/current/sql-insert.html
+[postgres-copy]: https://www.postgresql.org/docs/current/sql-copy.html
+[upsert]: /use-timescale/:currentVersion:/write-data/upsert/
+[gucs]: /api/:currentVersion:/configuration/gucs/
[postgres-update]: https://www.postgresql.org/docs/current/sql-update.html
[hypertable-create-table]: /api/:currentVersion:/hypertable/create_table/
[add_columnstore_policy]: /api/:currentVersion:/hypercore/add_columnstore_policy/
@@ -97,6 +162,4 @@ stream is sorted.
[create_table_arguments]: /api/:currentVersion:/hypertable/create_table/#arguments
[alter_job_samples]: /api/:currentVersion:/jobs-automation/alter_job/#samples
[convert_to_columnstore]: /api/:currentVersion:/hypercore/convert_to_columnstore/
-[gucs]: /api/:currentVersion:/configuration/gucs/
-
-[postgres-insert]: https://www.postgresql.org/docs/current/sql-insert.html
+[direct-compress]: /use-timescale/:currentVersion:/write-data/insert/#improve-performance-with-direct-compress
diff --git a/use-timescale/write-data/upsert.md b/use-timescale/write-data/upsert.md
index 70ca96dbab..5bffab3f27 100644
--- a/use-timescale/write-data/upsert.md
+++ b/use-timescale/write-data/upsert.md
@@ -2,77 +2,57 @@ title: Upsert data
excerpt: Insert a new row or update an existing row in a hypertable using UPSERT
products: [cloud, mst, self_hosted]
-keywords: [upsert, hypertables]
+keywords: [upsert, hypertables, bulk load, copy]
+tags: [insert, write, unique constraints]
---

# Upsert data

-Upserting is an operation that performs both:
+Upserting is an operation to add data to your database where:

-* Inserting a new row if a matching row doesn't already exist
-* Either updating the existing row, or doing nothing, if a matching row
-  already exists
+* **A matching row does not exist**: a new row is inserted
+* **A matching row exists**: the existing row is either updated, or nothing is done

-Upserts only work when you have a unique index or constraint. A matching row is
-one that has identical values for the columns covered by the index or
-constraint.
+## Upsert, unique indexes, and constraints
-
-
-In $PG, a primary key is a unique index with a `NOT NULL` constraint.
+Upserts work when you have a unique index or constraint. A matching row is one that has identical values for the columns
+covered by the index or constraint. In $PG, a primary key is a unique index with a `NOT NULL` constraint. If you have a primary key, you automatically have a unique index.
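+
+For example, here is a minimal sketch of using a primary key instead of the `UNIQUE` constraint shown in the examples
+on this page. It assumes a `conditions` table partitioned on `time` with the `create_hypertable` call:
+
+```sql
+-- A composite primary key that includes the partitioning column (time)
+-- creates the unique index that upserts match against
+CREATE TABLE conditions (
+    time        TIMESTAMPTZ NOT NULL,
+    location    TEXT NOT NULL,
+    temperature DOUBLE PRECISION,
+    humidity    DOUBLE PRECISION,
+    PRIMARY KEY (time, location)
+);
+SELECT create_hypertable('conditions', 'time');
+```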
- - -## Create a table with a unique constraint - -The examples in this section use a `conditions` table with a unique constraint -on the columns `(time, location)`. To create a unique constraint, use `UNIQUE -()` while defining your table: - -```sql -CREATE TABLE conditions ( - time TIMESTAMPTZ NOT NULL, - location TEXT NOT NULL, - temperature DOUBLE PRECISION NULL, - humidity DOUBLE PRECISION NULL, - UNIQUE (time, location) -); -``` +Unique constraints must include all partitioning columns. That means unique +constraints on a $HYPERTABLE must include the time column. If you added other +partitioning columns to your $HYPERTABLE, the constraint must include those as +well. For more information, see [Enforce constraints with unique indexes][hypertables-and-unique-indexes]. -You can also create a unique constraint after the table is created. Use the -syntax `ALTER TABLE ... ADD CONSTRAINT ... UNIQUE`. In this example, the -constraint is named `conditions_time_location`: -```sql -ALTER TABLE conditions - ADD CONSTRAINT conditions_time_location - UNIQUE (time, location); -``` +The examples in this page use a `conditions` table with a unique constraint +on the columns `(time, location)`. To create a unique constraint, either: -When you add a unique constraint to a table, you can't insert data that violates -the constraint. In other words, if you try to insert data that has identical -values to another row, within the columns covered by the constraint, you get an -error. +- Use `UNIQUE ()` when you define your table: - - -Unique constraints must include all partitioning columns. That means unique -constraints on a hypertable must include the time column. If you added other -partitioning columns to your hypertable, the constraint must include those as -well. For more information, see the section on -[hypertables and unique indexes](/use-timescale/latest/hypertables/hypertables-and-unique-indexes/). + ```sql + CREATE TABLE conditions ( + time TIMESTAMPTZ NOT NULL, + location TEXT NOT NULL, + temperature DOUBLE PRECISION NULL, + humidity DOUBLE PRECISION NULL, + UNIQUE (time, location) + ); + ``` - +- Use `ALTER TABLE` after the table is created: -## Insert or update data to a table with a unique constraint + ```sql + ALTER TABLE conditions + ADD CONSTRAINT conditions_time_location + UNIQUE (time, location); + ``` -You can tell the database to insert new data if it doesn't violate the -constraint, and to update the existing row if it does. Use the syntax `INSERT -INTO ... VALUES ... ON CONFLICT ... DO UPDATE`. +## Insert or update data -For example, to update the `temperature` and `humidity` values if a row with the -specified `time` and `location` already exists, run: +To insert new data that doesn't violate the constraint, and to update the existing row if it does, use the syntax +`INSERT INTO ... VALUES ... ON CONFLICT ... DO UPDATE`. For example, to update the `temperature` and `humidity` values +if a row with the specified `time` and `location` already exists, run: ```sql INSERT INTO conditions @@ -82,12 +62,11 @@ INSERT INTO conditions humidity = excluded.humidity; ``` -## Insert or do nothing to a table with a unique constraint +## Insert or do nothing -You can also tell the database to do nothing if the constraint is violated. The -new data is not inserted, and the old row is not updated. This is useful when -writing many rows as one batch, to prevent the entire transaction from failing. -The database engine skips the row and moves on. +You can also do nothing if the constraint is violated. 
The new data is not inserted, and the old row is not updated:
+the database engine skips the row and moves on. This is useful to prevent the entire transaction from failing when
+writing many rows as one batch.

To insert or do nothing, use the syntax `INSERT INTO ... VALUES ... ON CONFLICT DO NOTHING`:
@@ -98,4 +77,47 @@ INSERT INTO conditions
  ON CONFLICT DO NOTHING;
```
+## Bulk upsert using COPY
+
+When you need to upsert large amounts of data, `COPY` is significantly faster than `INSERT`. However, `COPY` doesn't
+support `ON CONFLICT` clauses directly. Best practice is to use a staging table. This two-step approach combines the
+speed of `COPY` for bulk loading with the flexibility of `INSERT...ON CONFLICT` for upsert logic. For large datasets,
+this is much faster than using `INSERT...ON CONFLICT` directly.
+
+To load data efficiently with `COPY`, then upsert:
+
+
+
+1. **Create a staging table with the same structure as the destination table**
+   ```sql
+   CREATE TEMP TABLE conditions_staging (LIKE conditions);
+   ```
+
+1. **Use `COPY` to bulk load data into the staging table**
+   ```sql
+   COPY conditions_staging(time, location, temperature, humidity)
+   FROM '/path/to/data.csv'
+   WITH (FORMAT CSV, HEADER);
+   ```
+
+1. **Upsert from the staging table to the destination table**
+   ```sql
+   INSERT INTO conditions
+   SELECT * FROM conditions_staging
+   ON CONFLICT (time, location) DO UPDATE
+   SET temperature = EXCLUDED.temperature,
+       humidity = EXCLUDED.humidity;
+   ```
+   To skip duplicate rows, use `ON CONFLICT (time, location) DO NOTHING`.
+
+1. **Clean up the staging table**
+   ```sql
+   DROP TABLE conditions_staging;
+   ```
+
+
+
+
[postgres-upsert]: https://www.postgresql.org/docs/current/static/sql-insert.html#SQL-ON-CONFLICT
+[postgres-copy]: https://www.postgresql.org/docs/current/sql-copy.html
+[hypertables-and-unique-indexes]: /use-timescale/:currentVersion:/hypertables/hypertables-and-unique-indexes/