Commit f6e3b31

LanceDB destination connector: tables should have at least a record_id field before attempting upserts (#394)
1 parent 313acc8 commit f6e3b31

2 files changed: +18 -0 lines changed

snippets/general-shared-text/lancedb-record-id.mdx

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+<Note>
+Unstructured recommends that the target table have a field named `record_id` with a text string data type.
+Unstructured can use this field to do intelligent record overwrites. Without this field, duplicate records
+might be written to the table or, in some cases, the operation could fail altogether.
+</Note>
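The effect described in the note can be illustrated with a minimal, in-memory sketch. This is not the connector's implementation, which this commit does not show; the table is modeled as a plain list of dictionaries, and the helper names `upsert_by_record_id` and `append_only` are hypothetical. It assumes an "overwrite" means replacing existing rows whose `record_id` matches an incoming row.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def upsert_by_record_id(table: List[Row], incoming: List[Row]) -> List[Row]:
    """Return a new table in which incoming rows replace rows with the same record_id.

    Hypothetical helper: models an overwrite keyed on a text `record_id` field.
    """
    incoming_ids = {row["record_id"] for row in incoming}
    # Keep only the existing rows that are not being overwritten.
    kept = [row for row in table if row.get("record_id") not in incoming_ids]
    return kept + incoming


def append_only(table: List[Row], incoming: List[Row]) -> List[Row]:
    """Model a table with no usable key: every write is a blind append."""
    return table + incoming


existing = [
    {"record_id": "doc-1", "text": "old version"},
    {"record_id": "doc-2", "text": "unrelated"},
]
update = [{"record_id": "doc-1", "text": "new version"}]

with_key = upsert_by_record_id(existing, update)
without_key = append_only(existing, update)
```

With the key, `with_key` holds two rows and `doc-1` carries the new text. Without it, `without_key` holds three rows, two of which describe the same document, which is the duplicate-record case the note warns about.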

snippets/general-shared-text/lancedb.mdx

Lines changed: 13 additions & 0 deletions
@@ -1,3 +1,5 @@
+import LanceDBRecordID from '/snippets/general-shared-text/lancedb-record-id.mdx';
+
 - A [LanceDB open source software (OSS) installation](https://lancedb.github.io/lancedb/basic/#installation) on a local machine, a server, or a virtual machine.
 (LanceDB Cloud is not supported.)
 - For LanceDB OSS with local data storage:
@@ -6,10 +8,15 @@
 See [Connect to a database](https://lancedb.github.io/lancedb/basic/#connect-to-a-database) in the LanceDB documentation.
 - The name of the target [LanceDB table](https://lancedb.github.io/lancedb/basic/#create-an-empty-table) within the local data folder.
 
+<LanceDBRecordID />
+
 - For LanceDB OSS with data storage in an Amazon S3 bucket:
 
 - The URI for the target Amazon S3 bucket and any target folder path within that bucket. Use the format `s3://<bucket-name>[/<folder-name>]`.
 - The name of the target [LanceDB table](https://lancedb.github.io/lancedb/guides/storage/#object-stores) within the Amazon S3 bucket.
+
+<LanceDBRecordID />
+
 - The AWS access key ID and AWS secret access key for the AWS IAM entity that has access to the Amazon S3 bucket.
 
 For more information, see [AWS S3](https://lancedb.github.io/lancedb/guides/storage/#aws-s3) in the LanceDB documentation, along with the following video:
@@ -29,6 +36,9 @@
 - The name of the target Azure Blob Storage account.
 - The URI for the target container within that Azure Blob Storage account and any target folder path within that container. Use the format `az://<container-name>[/<folder-name>]`.
 - The name of the target [LanceDB table](https://lancedb.github.io/lancedb/guides/storage/#object-stores) within the Azure Blob Storage account.
+
+<LanceDBRecordID />
+
 - The access key for the Azure Blob Storage account.
 
 For more information, see [Azure Blob Storage](https://lancedb.github.io/lancedb/guides/storage/#azure-blob-storage) in the LanceDB documentation, along with the following video:
@@ -47,6 +57,9 @@
 
 - The URI for the target Google Cloud Storage bucket and any target folder path within that bucket. Use the format `gs://<bucket-name>[/<folder-name>]`.
 - The name of the target [LanceDB table](https://lancedb.github.io/lancedb/guides/storage/#object-stores) within the Google Cloud Storage bucket.
+
+<LanceDBRecordID />
+
 - A single-line string that contains the contents of the downloaded service account key file for the Google Cloud service account that has access to the
 Google Cloud Storage bucket.
 
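The three URI formats listed in this file (`s3://`, `az://`, `gs://`) share one shape: a scheme, a bucket or container name, and an optional folder path. The sketch below assembles such a URI from its parts. The function name `build_storage_uri` and its validation rules are assumptions made for illustration only; they are not part of Unstructured or LanceDB.

```python
from typing import Optional

# Schemes taken from the URI formats documented in this file.
SUPPORTED_SCHEMES = {"s3", "az", "gs"}


def build_storage_uri(scheme: str, container: str, folder: Optional[str] = None) -> str:
    """Build `<scheme>://<container>[/<folder>]` (hypothetical helper)."""
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not container or "/" in container:
        raise ValueError("container must be a non-empty name without slashes")
    uri = f"{scheme}://{container}"
    # The folder part is optional; strip stray slashes so the URI has no '//' or trailing '/'.
    cleaned = folder.strip("/") if folder else ""
    if cleaned:
        uri += "/" + cleaned
    return uri


s3_uri = build_storage_uri("s3", "my-bucket")
az_uri = build_storage_uri("az", "my-container", "/lancedb/")
gs_uri = build_storage_uri("gs", "my-bucket", "data/lancedb")
```

Normalizing the folder path in one place avoids subtle mismatches such as a trailing slash, although whether LanceDB treats such variants differently is not stated in this file.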