@@ -38,9 +38,9 @@ just execute `unstructured_ingest/main.py`, e.g.:

 ## Adding Source Data Connectors

-To add a source connector, refer to [local.py](unstructured_ingest/v2/processes/connectors/local.py) as an example that implements the two relevant abstract base classes with their associated configs.
+To add a source connector, refer to [local.py](unstructured_ingest/processes/connectors/local.py) as an example that implements the two relevant abstract base classes with their associated configs.

-If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/v2/processes/connectors/fsspec/s3.py).
+If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/processes/connectors/fsspec/s3.py).

 Make sure to update the source registry via `add_source_entry` using a unique key for the source type. This will expose it as an available connector.

@@ -56,9 +56,9 @@ Double check that the connector is optimized for the best fan out, check [here](

 ## Adding Destination Data Connectors

-To add a destination connector, refer to [local.py](unstructured_ingest/v2/processes/connectors/local.py) as an example that implements the uploader abstract base classes with the associated configs.
+To add a destination connector, refer to [local.py](unstructured_ingest/processes/connectors/local.py) as an example that implements the uploader abstract base classes with the associated configs.

-If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/v2/processes/connectors/fsspec/s3.py).
+If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/processes/connectors/fsspec/s3.py).

 Make sure to update the destination registry via `add_destination_entry` using a unique key for the destination type. This will expose it as an available connector.

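To make the "uploader abstract base class with its associated config" idea concrete, the following is a minimal sketch of that shape. All names here (`UploaderConfig`, `Uploader`, `LocalUploader`, and the `run` method) are hypothetical and chosen for illustration; the real base classes in `local.py` may differ in name and signature.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class UploaderConfig:
    """Hypothetical config: where processed files should be written."""

    output_dir: Path


class Uploader(ABC):
    """Hypothetical stand-in for an uploader abstract base class."""

    def __init__(self, config: UploaderConfig) -> None:
        self.config = config

    @abstractmethod
    def run(self, path: Path) -> Path:
        """Send one processed file to the destination and return its location."""


class LocalUploader(Uploader):
    """Copies a processed file into a local output directory."""

    def run(self, path: Path) -> Path:
        # Create the destination directory on first use.
        self.config.output_dir.mkdir(parents=True, exist_ok=True)
        target = self.config.output_dir / path.name
        shutil.copy2(path, target)
        return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "doc.json"
        src.write_text("[]")
        uploader = LocalUploader(UploaderConfig(output_dir=Path(tmp) / "out"))
        print(uploader.run(src))
```

Separating the config from the uploader keeps connection details in one place and leaves the uploader responsible only for moving data.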
@@ -70,14 +70,14 @@ Double check that the connector is optimized for the best fan out, check [here](

 In checklist form, the above steps are summarized as:

-- [ ] Create a new file under [connectors/](unstructured_ingest/v2/processes/connectors/) implementing the base classes required depending on if it's a new source or destination connector.
+- [ ] Create a new file under [connectors/](unstructured_ingest/processes/connectors/) implementing the base classes required depending on if it's a new source or destination connector.
 - [ ] If the IngestDoc relies on a connection or session that could be reused, the subclass of `BaseConnectorConfig` implements a session handle to manage connections. The ConnectorConfig subclass should also inherit from `ConfigSessionHandleMixin` and the IngestDoc subclass should also inherit from `IngestDocSessionHandleMixin`. Check [here](https://github.com/Unstructured-IO/unstructured/pull/1058/files#diff-dae96d30f58cffe1b348c036d006b48bdc7e2e47fbd7c8ec1c45d63face1542d) for a detailed example.
 - [ ] Indexer should fetch appropriate metadata from the source that can be used to reference the doc in the pipeline and detect if there are any changes from what might already exist locally.
 - [ ] Add the relevant decorators from `unstructured.ingest.error` on top of relevant methods to handle errors such as a source connection error, destination connection error, or a partition error.
 - [ ] Register the required information via `add_source_entry` or `add_destination_entry` to expose the new connectors.
 - [ ] Update the CLI to expose the new connectors via CLI params
-- [ ] Add a new file under [cmds](unstructured_ingest/v2/cli/cmds)
-- [ ] Add the command base classes from the file above in the [__init__.py](unstructured_ingest/v2/cli/cmds/__init__.py). This will expose it in the CLI.
+- [ ] Add a new file under [cmds](unstructured_ingest/cli/cmds)
+- [ ] Add the command base classes from the file above in the [__init__.py](unstructured_ingest/cli/cmds/__init__.py). This will expose it in the CLI.
 - [ ] Update [unstructured_ingest/cli](unstructured_ingest/cli) with support for the new connector.
 - [ ] Create a folder under [examples/ingest](examples/ingest) that includes at least one well-documented script.
 - [ ] Add a script test_unstructured_ingest/[src|dest]/test-ingest-\<the-new-data-source\>.sh. Its JSON output files should have a total size of no more than 100K.
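
The last checklist item caps the combined size of the test's JSON output. One way to check that locally is sketched below. It assumes "100K" means 100 * 1024 bytes and that the output files end in `.json`; the helper names `total_json_bytes` and `within_budget` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumption: "100K" is read as 100 * 1024 bytes.
MAX_TOTAL_BYTES = 100 * 1024


def total_json_bytes(output_dir: Path) -> int:
    """Sum the sizes of all .json files under output_dir, recursively."""
    return sum(p.stat().st_size for p in output_dir.rglob("*.json") if p.is_file())


def within_budget(output_dir: Path, limit: int = MAX_TOTAL_BYTES) -> bool:
    """Return True when the combined JSON output fits within the limit."""
    return total_json_bytes(output_dir) <= limit
```

Calling `within_budget(Path("path/to/expected-output"))` before committing fixtures gives a quick signal on whether the test data needs trimming; the path shown is a placeholder.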