133 changes: 95 additions & 38 deletions use-timescale/write-data/insert.md
---
title: Insert data
excerpt: Insert single and multiple rows and bulk load data into TimescaleDB with SQL
products: [cloud, mst, self_hosted]
keywords: [ingest, bulk load]
tags: [insert, write, hypertables, copy]
---

import EarlyAccess2230 from "versionContent/_partials/_early_access_2_23_0.mdx";

# Insert data

You insert data into a $HYPERTABLE using the following standard SQL commands:

- [`INSERT`][postgres-insert]: single rows or small batches
- [`COPY`][postgres-copy]: bulk data loading

To improve performance, insert time series data directly into the $COLUMNSTORE using [direct compress][direct-compress].

## Insert a single row

To insert a single row into a $HYPERTABLE, use the syntax `INSERT INTO ... VALUES`:

```sql
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES (NOW(), 'office', 70.0, 50.0);
```

## Insert multiple rows

A more efficient method than inserting rows one at a time is to insert multiple rows into a $HYPERTABLE using a single
`INSERT` call. This works even for thousands of rows at a time. $TIMESCALE_DB batches the rows by chunk, then writes to
each chunk in a single transaction.

You use the same syntax, separating rows with a comma:

```sql
INSERT INTO conditions
  VALUES
    (NOW(), 'office', 70.0, 50.0),
    (NOW(), 'basement', 66.5, 60.0),
(NOW(), 'garage', 77.0, 65.2);
```

<Highlight type="note">

You can insert multiple rows belonging to different
chunks within the same `INSERT` statement. Behind the scenes, $TIMESCALE_DB batches the rows by chunk, and writes to each chunk in a single
transaction.

</Highlight>

## Insert and return data

You can return some or all of the inserted data by adding a `RETURNING` clause to the `INSERT` command. For example,
to return all the inserted data, run:

```sql
INSERT INTO conditions
  VALUES (NOW(), 'office', 70.1, 50.1) RETURNING *;
```

The command returns the inserted row:

```
time | location | temperature | humidity
------+----------+-------------+----------
...
(1 row)
```

## Bulk insert with COPY

The `COPY` command is the most efficient way to load large amounts of data into a $HYPERTABLE. For
bulk data loading, `COPY` can be two to three times faster than `INSERT`, or more, especially when combined with
[direct compression][direct-compress].

`COPY` supports loading from:

- **CSV files**:

```sql
COPY conditions(time, location, temperature, humidity)
FROM '/path/to/data.csv'
WITH (FORMAT CSV, HEADER);
```

- **Standard input**

To load data from your application or script using standard input:

```sql
COPY conditions(time, location, temperature, humidity)
FROM STDIN
WITH (FORMAT CSV);
```

To signal the end of input, add `\.` on a new line, as shown in the example after this list.

- **Program output**

To load data generated by a program or script:

```sql
COPY conditions(time, location, temperature, humidity)
FROM PROGRAM 'generate_data.sh'
WITH (FORMAT CSV);
```
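
For example, a minimal sketch of the standard-input flow when you run `COPY` from `psql`. The timestamps and
values are illustrative sample data:

```
COPY conditions(time, location, temperature, humidity) FROM STDIN WITH (FORMAT CSV);
2025-01-01 08:00:00+00,office,70.0,50.0
2025-01-01 08:05:00+00,garage,77.0,65.2
\.
```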


## Improve performance with direct compress

<EarlyAccess2230 />

The columnar format in the $COLUMNSTORE enables fast scanning and aggregation, optimizing performance for
analytical workloads while also saving significant storage space. In the $COLUMNSTORE conversion, $HYPERTABLE chunks are
compressed by up to 98%, and organized for efficient, large-scale queries.

To improve performance, compress data during `INSERT` and `COPY` operations so that it is injected
directly into chunks in the $COLUMNSTORE rather than waiting for a $COLUMNSTORE policy to convert it later. Direct
compress writes data in the compressed format in memory, significantly reducing I/O and improving ingestion performance.

When you enable direct compress, ensure that your data is already sorted by the table's compression `order_by` columns.
Incorrectly sorted data results in poor compression and query performance.

- **Enable direct compress on `INSERT`**

Set the following [GUC parameters][gucs]:
```sql
SET timescaledb.enable_direct_compress_insert = true;
SET timescaledb.enable_direct_compress_insert_client_sorted = true;
```

- **Enable direct compress on `COPY`**

  Set the following [GUC parameters][gucs]:

```sql
SET timescaledb.enable_direct_compress_copy = true;
SET timescaledb.enable_direct_compress_copy_client_sorted = true;
```

Keep the following in mind when you use direct compress:

- **Optimal batch size**: you get the best results with batches of 1,000 to 10,000 records
- **Cardinality**: high-cardinality datasets do not compress well and may degrade query performance
- **Batch format**: the $COLUMNSTORE is optimized for 1,000 records per batch per segment
- **WAL efficiency**: compressed batches are written to the WAL rather than individual tuples
- **Continuous aggregates**: continuous aggregates are not supported with direct compress
- **Unique constraints**: tables with unique constraints cannot use direct compress
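
For example, a minimal sketch of a direct-compress bulk load, assuming the `conditions` $HYPERTABLE from the
examples above and an illustrative file path:

```sql
-- Enable direct compress for COPY in this session.
SET timescaledb.enable_direct_compress_copy = true;
-- Only set this when the input stream is already sorted by the
-- table's compression order_by columns.
SET timescaledb.enable_direct_compress_copy_client_sorted = true;

-- Bulk load pre-sorted data; batches are written in compressed form,
-- reducing I/O during ingestion.
COPY conditions(time, location, temperature, humidity)
FROM '/path/to/sorted_data.csv'
WITH (FORMAT CSV, HEADER);
```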




[postgres-insert]: https://www.postgresql.org/docs/current/sql-insert.html
[postgres-copy]: https://www.postgresql.org/docs/current/sql-copy.html
[upsert]: /use-timescale/:currentVersion:/write-data/upsert/
[gucs]: /api/:currentVersion:/configuration/gucs/
[postgres-update]: https://www.postgresql.org/docs/current/sql-update.html
[hypertable-create-table]: /api/:currentVersion:/hypertable/create_table/
[add_columnstore_policy]: /api/:currentVersion:/hypercore/add_columnstore_policy/
[remove_columnstore_policy]: /api/:currentVersion:/hypercore/remove_columnstore_policy/
[create_table_arguments]: /api/:currentVersion:/hypertable/create_table/#arguments
[alter_job_samples]: /api/:currentVersion:/jobs-automation/alter_job/#samples
[convert_to_columnstore]: /api/:currentVersion:/hypercore/convert_to_columnstore/

[direct-compress]: /use-timescale/:currentVersion:/write-data/insert/#improve-performance-with-direct-compress
140 changes: 81 additions & 59 deletions use-timescale/write-data/upsert.md
---
title: Upsert data
excerpt: Insert a new row or update an existing row in a hypertable using UPSERT
products: [cloud, mst, self_hosted]
keywords: [upsert, hypertables, bulk load, copy]
tags: [insert, write, unique constraints]
---

# Upsert data

Upserting is an operation that adds data to your database where, if a matching row:

* **Does not exist**: inserts a new row
* **Exists**: either updates the existing row, or does nothing

## Upsert, unique indexes and constraints

<Highlight type="note">

In $PG, a primary key is a unique index with a `NOT NULL` constraint.
Upserts work when you have a unique index or constraint. A matching row is one that has identical values for the columns
covered by the index or constraint. In $PG, a primary key is a unique index with a `NOT NULL` constraint.
If you have a primary key, you automatically have a unique index.

Unique constraints must include all partitioning columns. That means unique
constraints on a $HYPERTABLE must include the time column. If you added other
partitioning columns to your $HYPERTABLE, the constraint must include those as
well. For more information, see [$HYPERTABLE_CAPs and unique indexes][hypertables-and-unique-indexes].

The examples on this page use a `conditions` table with a unique constraint
on the columns `(time, location)`. To create a unique constraint, either:

- Use `UNIQUE (<COLUMNS>)` when you define your table:

<Highlight type="note">

Unique constraints must include all partitioning columns. That means unique
constraints on a hypertable must include the time column. If you added other
partitioning columns to your hypertable, the constraint must include those as
well. For more information, see the section on
[hypertables and unique indexes](/use-timescale/latest/hypertables/hypertables-and-unique-indexes/).
```sql
CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL,
UNIQUE (time, location)
);
```

- Use `ALTER TABLE` after the table is created:

```sql
ALTER TABLE conditions
ADD CONSTRAINT conditions_time_location
UNIQUE (time, location);
```

## Insert or update data

To insert new data when it doesn't violate the constraint, and update the existing row when it does, use the syntax
`INSERT INTO ... VALUES ... ON CONFLICT ... DO UPDATE`. For example, to update the `temperature` and `humidity` values
if a row with the specified `time` and `location` already exists, run:

```sql
INSERT INTO conditions
  VALUES (NOW(), 'office', 70.2, 50.1)
  ON CONFLICT (time, location) DO UPDATE
    SET temperature = excluded.temperature,
        humidity = excluded.humidity;
```

## Insert or do nothing

You can also tell the database to do nothing if the constraint is violated. The new data is not inserted, and the old
row is not updated; the database engine skips the row and moves on. This is useful to prevent the entire transaction
from failing when writing many rows as one batch.

To insert or do nothing, use the syntax `INSERT INTO ... VALUES ... ON CONFLICT
DO NOTHING`:

```sql
INSERT INTO conditions
  VALUES (NOW(), 'office', 70.2, 50.1)
  ON CONFLICT DO NOTHING;
```

## Bulk upsert using COPY

When you need to upsert large amounts of data, `COPY` is significantly faster than `INSERT`. However, `COPY` doesn't
support `ON CONFLICT` clauses directly. Best practice is to use a staging table. This two-step approach combines the
speed of `COPY` for bulk loading with the flexibility of `INSERT...ON CONFLICT` for upsert logic. For large datasets,
this is much faster than using `INSERT...ON CONFLICT` directly.

To load data efficiently with `COPY`, then upsert:

<Procedure>

1. **Create a staging table with the same structure as the destination table**
```sql
CREATE TEMP TABLE conditions_staging (LIKE conditions);
```

1. **Use `COPY` to bulk load data into the staging table**
```sql
COPY conditions_staging(time, location, temperature, humidity)
FROM '/path/to/data.csv'
WITH (FORMAT CSV, HEADER);
```

1. **Upsert from the staging table to the destination table**
```sql
INSERT INTO conditions
SELECT * FROM conditions_staging
ON CONFLICT (time, location) DO UPDATE
SET temperature = EXCLUDED.temperature,
humidity = EXCLUDED.humidity;
```
To skip duplicate rows, set `ON CONFLICT (time, location) DO NOTHING`.

1. **Clean up the staging table**
```sql
DROP TABLE conditions_staging;
```

</Procedure>
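
If you run the whole procedure in a single transaction, one variation is to create the staging table with
`ON COMMIT DROP`, so $PG removes it automatically and the explicit cleanup step is unnecessary. A sketch under
that assumption:

```sql
BEGIN;

-- The staging table is dropped automatically when the transaction commits.
CREATE TEMP TABLE conditions_staging (LIKE conditions) ON COMMIT DROP;

COPY conditions_staging(time, location, temperature, humidity)
FROM '/path/to/data.csv'
WITH (FORMAT CSV, HEADER);

INSERT INTO conditions
SELECT * FROM conditions_staging
ON CONFLICT (time, location) DO UPDATE
  SET temperature = EXCLUDED.temperature,
      humidity = EXCLUDED.humidity;

COMMIT;
```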


[postgres-upsert]: https://www.postgresql.org/docs/current/static/sql-insert.html#SQL-ON-CONFLICT
[postgres-copy]: https://www.postgresql.org/docs/current/sql-copy.html
[hypertables-and-unique-indexes]: /use-timescale/:currentVersion:/hypertables/hypertables-and-unique-indexes/