docs/docs/core/flow_def.mdx
This `@cocoindex.flow_def` decorator declares this function as a CocoIndex flow.
It takes two arguments:

* `flow_builder`: a `FlowBuilder` object to help build the flow.
* `data_scope`: a `DataScope` object, representing the top-level data scope. Any data created by the flow should be added to it.

Alternatively, for more flexibility (e.g. you want to do this conditionally or generate a dynamic name), you can explicitly call the `cocoindex.open_flow()` method:
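The code sample for this call is not shown in this excerpt; below is a minimal sketch, assuming the `open_flow(name, flow_def_fn)` signature and an illustrative flow name:

```python
import cocoindex

def demo_flow_def(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Build the flow here, e.g. add sources to `data_scope` (elided in this sketch).
    ...

# Open the flow explicitly, e.g. conditionally or with a dynamically generated name.
demo_flow = cocoindex.open_flow("DemoFlow", demo_flow_def)
```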
If the data slice has `Struct` type, you can obtain a data slice on a specific sub field of it, similar to getting a field of a data scope.
A **data collector** can be added from a specific data scope, and it collects multiple entries.
Call its `collect()` method to collect a specific entry, which can have multiple fields.
Each field has a name as specified by the argument name, and a value in one of the following representations:

* A `DataSlice`.
* An enum `cocoindex.GeneratedField.UUID` indicating its value is a UUID automatically generated by the engine.
  The UUID will remain stable when other collected input values are unchanged.
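As a hedged sketch of how `collect()` is typically used inside a flow (the flow name, source fields `documents`/`chunks`, and the field names `text`/`embedding` are illustrative assumptions, not part of the API):

```python
import cocoindex

@cocoindex.flow_def(name="DemoFlow")
def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Source and transformations are elided; `documents`/`chunks` are illustrative.
    doc_embeddings = data_scope.add_collector()
    with data_scope["documents"].row() as doc:
        with doc["chunks"].row() as chunk:
            doc_embeddings.collect(
                id=cocoindex.GeneratedField.UUID,  # engine-generated UUID, stable across runs
                text=chunk["text"],
                embedding=chunk["embedding"],
            )
```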
:::note
A *target spec* needs to be provided for any export operation, to describe the target.
Export must happen at the top level of a flow, i.e. not within any child scopes created by "for each row". It takes the following arguments:

* `name`: the name to identify the export target.
* `target_spec`: the target spec as the export target.
* `attachments` (optional): additional attachments for the export target.
  Different targets support different attachments.
  For example, `Postgres` supports `PostgresSqlCommand`, which can be used to run arbitrary SQL statements for the export target, after the target is created or before the target is dropped.
* `setup_by_user` (optional): whether the export target is set up by the user.
  By default, CocoIndex manages the target setup (see [Setup / drop flow](/docs/core/flow_methods#setupdrop-flow)), e.g. it creates related tables/collections/etc. with a compatible schema, and updates them upon change.
  If `True`, the export target will be managed by users, and users are responsible for creating the target and updating it upon change.
* Fields to configure [storage indexes](#storage-indexes). `primary_key_fields` is required, and all others are optional.
<Tabs>
<TabItem value="python" label="Python" default>
App namespaces can be used to organize flows across different environments (e.g., dev, staging, production).
In the code, you can call `flow.get_app_namespace()` to get the app namespace, and use it to name certain backends. It takes the following arguments:

* `trailing_delimiter` (optional): a string to append to the app namespace when it's not empty.

e.g. when the current app namespace is `Staging`, `flow.get_app_namespace(trailing_delimiter='.')` will return `Staging.`.
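The `trailing_delimiter` behavior can be illustrated with a plain-Python stand-in (a hypothetical helper mirroring the documented semantics, not the real `Flow` method):

```python
def get_app_namespace(app_namespace: str, trailing_delimiter: str = "") -> str:
    """Return the app namespace, appending the delimiter only when non-empty."""
    if not app_namespace:
        return ""
    return app_namespace + trailing_delimiter

print(get_app_namespace("Staging", trailing_delimiter="."))  # Staging.
print(get_app_namespace("", trailing_delimiter="."))         # (empty string)
```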
It will use `Staging__doc_embeddings` as the collection name if the current app namespace is `Staging`.
CocoIndex processes data in parallel to maximize throughput, but unconstrained parallelism can overwhelm your system.
Processing too many items simultaneously can lead to:

* **Memory exhaustion**: Large datasets loaded concurrently can consume excessive RAM
* **Resource contention**: Too many parallel operations competing for CPU, disk I/O, or network bandwidth
* **System instability**: High concurrency can cause timeouts, crashes, or degraded performance

To prevent these issues, CocoIndex provides concurrency controls that limit how many data items are processed simultaneously.

#### Concurrency Options

You can control processing concurrency using these options:

* `max_inflight_rows`: Limits the maximum number of data rows being processed concurrently
* `max_inflight_bytes`: Limits the total memory footprint of data being processed concurrently (measured in bytes)

When these limits are reached, CocoIndex will pause loading new data until some of the current processing completes, ensuring your system remains stable.

#### Where to Apply Concurrency Controls

These concurrency options can be configured at different levels:

* **Source level** via [`FlowBuilder.add_source()`](#import-from-source): Controls how many rows from a data source are processed simultaneously. This prevents overwhelming your system when ingesting large datasets.

  You can also set global limits across all sources and flows using [`GlobalExecutionOptions`](/docs/core/settings#globalexecutionoptions) or the environment variables [`COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS`](/docs/core/settings#list-of-environment-variables)/[`COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES`](/docs/core/settings#list-of-environment-variables).
  When both global and per-source limits are specified, both limits are enforced independently - a new row can only be processed if there's available capacity in both the global budget (shared across all sources) and the per-source budget (specific to that source).

* **Row iteration level** via [`DataSlice.row()`](#for-each-row): Provides fine-grained control over parallel processing within nested data structures, allowing you to tune concurrency at any level of your data hierarchy.
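A hedged sketch showing both levels in one flow definition (the flow name, `LocalFile` source path, and the specific limit values are illustrative assumptions):

```python
import cocoindex

@cocoindex.flow_def(name="ThrottledFlow")
def throttled_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Source level: cap how many source rows (and how many bytes) are in flight.
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="docs"),  # illustrative source
        max_inflight_rows=64,
        max_inflight_bytes=256 * 1024 * 1024,  # 256 MiB
    )
    # Row iteration level: fine-grained limit within nested row processing.
    with data_scope["documents"].row(max_inflight_rows=16) as doc:
        ...  # per-row transformations elided
```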
:::note
It's usually used for targets, where key stability is important for backend cleanup.
Operation spec is the default way to configure sources, functions and targets. But it has the following limitations:

* The spec isn't supposed to contain secret information, and it's frequently shown in various places, e.g. `cocoindex show`.
* For targets, once an operation is removed after a flow definition code change, the spec is also gone.
  But we still need to be able to drop the persistent backend (e.g. a table) when we [setup / drop the flow](/docs/core/flow_methods#setupdrop-flow).

The auth registry is introduced to solve the problems above.

#### Auth Entry

An auth entry is an entry in the auth registry with an explicit key.

* You can create a new *auth entry* with a key and a value.
* You can reference the entry by the key, and pass it as part of the spec for certain operations, e.g. `Neo4j` takes a `connection` field in the form of an auth entry reference.
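A hedged sketch of registering and referencing an auth entry (the registration function name `add_auth_entry`, the connection class, and all field values are assumptions for illustration):

```python
import os
import cocoindex

# Register an auth entry under an explicit key; the returned reference
# can be passed into operation specs instead of the raw secret.
neo4j_conn = cocoindex.add_auth_entry(
    "my_neo4j_conn",
    cocoindex.targets.Neo4jConnection(
        uri="bolt://localhost:7687",
        user="neo4j",
        password=os.environ["NEO4J_PASSWORD"],  # keep secrets out of the spec
    ),
)
# Then pass `neo4j_conn` to a spec field that takes an auth entry
# reference, e.g. the `connection` field of the Neo4j target.
```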
docs/docs/targets/postgres.md
CocoIndex automatically strips U+0000 (NUL) characters from strings before exporting them to Postgres.
The spec takes the following fields:

* `database` ([auth reference](/docs/core/flow_def#auth-registry) to `DatabaseConnectionSpec`, optional): The connection to the Postgres database.
  See [DatabaseConnectionSpec](/docs/core/settings#databaseconnectionspec) for its specific fields.
  If not provided, will use the same database as the [internal storage](/docs/core/basics#internal-storage).

* `table_name` (`str`, optional): The name of the table to store to. If unspecified, will use the table name `[${AppNamespace}__]${FlowName}__${TargetName}`, e.g. `DemoFlow__doc_embeddings` or `Staging__DemoFlow__doc_embeddings`.

* `schema` (`str`, optional): The PostgreSQL schema to create the table in. If unspecified, the table will be created in the default schema (usually `public`). When specified, `table_name` must also be explicitly specified. CocoIndex will automatically create the schema if it doesn't exist.
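The default table name pattern above can be computed mechanically; a minimal plain-Python sketch (the helper name is hypothetical, not part of the CocoIndex API):

```python
def default_table_name(flow_name: str, target_name: str, app_namespace: str = "") -> str:
    """Build `[${AppNamespace}__]${FlowName}__${TargetName}` as described above."""
    prefix = f"{app_namespace}__" if app_namespace else ""
    return f"{prefix}{flow_name}__{target_name}"

print(default_table_name("DemoFlow", "doc_embeddings"))             # DemoFlow__doc_embeddings
print(default_table_name("DemoFlow", "doc_embeddings", "Staging"))  # Staging__DemoFlow__doc_embeddings
```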
## Attachments

### PostgresSqlCommand

Execute arbitrary Postgres SQL during flow setup, with an optional SQL statement to undo it when the attachment or target is removed.

This attachment is useful for capabilities not natively modeled by the target spec, such as creating specialized indexes, triggers, or grants.

Fields:

* `name` (`str`, required): An identifier for this attachment on the target. Unique within the target.
* `setup_sql` (`str`, required): SQL to execute during setup.
* `teardown_sql` (`str`, optional): SQL to execute on removal/drop.

Notes about `setup_sql` and `teardown_sql`:

* Multiple statements are allowed in both `setup_sql` and `teardown_sql`. Use `;` to separate them.
* Both `setup_sql` and `teardown_sql` are expected to be idempotent, e.g. use statements like `CREATE ... IF NOT EXISTS` and `DROP ... IF EXISTS`.
* The `setup_sql` is expected to have an "upsert" behavior. If you update `setup_sql`, the updated `setup_sql` will be executed during setup.
* The `teardown_sql` is saved by CocoIndex, so it'll be executed when the attachment no longer exists. If you update `teardown_sql`, the updated `teardown_sql` will be saved and executed (instead of the previous one) during teardown.