Commit 88fb05b

chore: update ingest cli readme (#548)
1 parent ef33ddf commit 88fb05b

4 files changed

+14
-10
lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## 1.0.50-dev0
+
+* **Update ingest cli and docs readme files**
+
 ## 1.0.49
 
 * **Improve MongoDB SCRAM-SHA-1 authentication error message**

docs/README.md

Lines changed: 8 additions & 8 deletions
@@ -29,7 +29,7 @@ See the [Quick Start](https://github.com/Unstructured-IO/unstructured#eight_poin
 When testing from a local checkout rather than a pip-installed version of `unstructured`,
 just execute `unstructured_ingest/main.py`, e.g.:
 
-PYTHONPATH=. ./unstructured_ingest/v2/main.py \
+PYTHONPATH=. ./unstructured_ingest/main.py \
 s3 \
 --remote-url s3://utic-dev-tech-fixtures/small-pdf-set/ \
 --anonymous \
@@ -38,9 +38,9 @@ just execute `unstructured_ingest/main.py`, e.g.:
 
 ## Adding Source Data Connectors
 
-To add a source connector, refer to [local.py](unstructured_ingest/v2/processes/connectors/local.py) as an example that implements the two relevant abstract base classes with their associated configs.
+To add a source connector, refer to [local.py](unstructured_ingest/processes/connectors/local.py) as an example that implements the two relevant abstract base classes with their associated configs.
 
-If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/v2/processes/connectors/fsspec/s3.py).
+If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/processes/connectors/fsspec/s3.py).
 
 Make sure to update the source registry via `add_source_entry` using a unique key for the source type. This will expose it as an available connector.
 
@@ -56,9 +56,9 @@ Double check that the connector is optimized for the best fan out, check [here](
 
 ## Adding Destination Data Connectors
 
-To add a source connector, refer to [local.py](unstructured_ingest/v2/processes/connectors/local.py) as an example that implements the uploader abstract base classes with the associated configs.
+To add a source connector, refer to [local.py](unstructured_ingest/processes/connectors/local.py) as an example that implements the uploader abstract base classes with the associated configs.
 
-If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/v2/processes/connectors/fsspec/s3.py).
+If the connector has an available `fsspec` implementation, then refer to [s3.py](unstructured_ingest/processes/connectors/fsspec/s3.py).
 
 Make sure to update the destination registry via `add_source_entry` using a unique key for the source type. This will expose it as an available connector.
 
@@ -70,14 +70,14 @@ Double check that the connector is optimized for the best fan out, check [here](
 
 In checklist form, the above steps are summarized as:
 
-- [ ] Create a new file under [connectors/](unstructured_ingest/v2/processes/connectors/) implementing the base classes required depending on if it's a new source or destination connector.
+- [ ] Create a new file under [connectors/](unstructured_ingest/processes/connectors/) implementing the base classes required depending on if it's a new source or destination connector.
 - [ ] If the IngestDoc relies on a connection or session that could be reused, the subclass of `BaseConnectorConfig` implements a session handle to manage connections. The ConnectorConfig subclass should also inherit from `ConfigSessionHandleMixin` and the IngestDoc subclass should also inherit from `IngestDocSessionHandleMixin`. Check [here](https://github.com/Unstructured-IO/unstructured/pull/1058/files#diff-dae96d30f58cffe1b348c036d006b48bdc7e2e47fbd7c8ec1c45d63face1542d) for a detailed example.
 - [ ] Indexer should fetch appropriate metadata from the source that can be used to reference the doc in the pipeline and detect if there are any changes from what might already exist locally.
 - [ ] Add the relevant decorators from `unstructured.ingest.error` on top of relevant methods to handle errors such as a source connection error, destination connection error, or a partition error.
 - [ ] Register the required information via `add_source_entry` or `add_source_entry` to expose the new connectors.
 - [ ] Update the CLI to expose the new connectors via CLI params
-- [ ] Add a new file under [cmds](unstructured_ingest/v2/cli/cmds)
-- [ ] Add the command base classes from the file above in the [__init__.py](unstructured_ingest/v2/cli/cmds/__init__.py). This will expose it in the CLI.
+- [ ] Add a new file under [cmds](unstructured_ingest/cli/cmds)
+- [ ] Add the command base classes from the file above in the [__init__.py](unstructured_ingest/cli/cmds/__init__.py). This will expose it in the CLI.
 - [ ] Update [unstructured_ingest/cli](unstructured_ingest/cli) with support for the new connector.
 - [ ] Create a folder under [examples/ingest](examples/ingest) that includes at least one well documented script.
 - [ ] Add a script test_unstructured_ingest/[src|dest\/test-ingest-\<the-new-data-source\>.sh. It's json output files should have a total of no more than 100K.
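
The registration step that the README text above describes (a unique key per source type, added to a registry so the connector becomes available) can be sketched roughly as follows. This is a self-contained illustration, not the library's code: the `SourceRegistryEntry` fields, the `source_registry` dict, the duplicate-key check, and the `add_source_entry` signature shown here are all assumptions made for the example, and only the function name is taken from the README.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass
class SourceRegistryEntry:
    """Hypothetical entry bundling the classes a source connector provides."""

    indexer: Type
    downloader: Type


# Hypothetical in-memory registry keyed by a unique source-type string.
source_registry: Dict[str, SourceRegistryEntry] = {}


def add_source_entry(source_type: str, entry: SourceRegistryEntry) -> None:
    """Local stand-in: register an entry, rejecting keys that are already taken."""
    if source_type in source_registry:
        raise ValueError(f"source type {source_type!r} is already registered")
    source_registry[source_type] = entry


class MyIndexer:
    """Placeholder for a connector's indexer class."""


class MyDownloader:
    """Placeholder for a connector's downloader class."""


# Registering under a unique key is what makes the connector discoverable.
add_source_entry(
    "my_source",
    SourceRegistryEntry(indexer=MyIndexer, downloader=MyDownloader),
)
```

Rejecting duplicate keys at registration time, as assumed here, is one way to make the "unique key" requirement fail loudly instead of silently overwriting an existing connector.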

unstructured_ingest/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "1.0.49" # pragma: no cover
+__version__ = "1.0.50-dev0" # pragma: no cover

unstructured_ingest/cli/README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ source and destination connectors.
 
 To manually run the cli:
 ```shell
-PYTHONPATH=. python unstructured_ingest/v2/main.py --help
+PYTHONPATH=. python unstructured_ingest/main.py --help
 ```
 
 The `main.py` file simply wraps the generated Click command created in `cli.py`.