* Add `requires_dependencies` decorator
* Use `requires_dependencies` on Reddit & S3
* Fix bug in `requires_dependencies`
To use named args, the decorator also needs to be wrapped in an outer function.
* Add `requires_dependencies` integration tests
* Add `requires_dependencies` in `Ingest.md`
* Update `CHANGELOG.md`
* Bump version 0.4.16-dev5
* Ignore `F401` unused imports in `requires_dependencies` tests
* Apply suggestions from code review
* Add `functools.wraps` to keep docstrings and annotations
* Use `requires_dependencies` in `GitHubConnector`
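
The decorator described in the commit message above could look roughly like the following. This is a minimal sketch, not the code from this commit: only the name `requires_dependencies`, the `dependencies` and `extras` keyword arguments, and the use of `functools.wraps` come from the commit. The check via `importlib.util.find_spec` and the error message wording are assumptions made for illustration.

```python
import functools
import importlib.util
from typing import Callable, List, Optional, Union


def requires_dependencies(
    dependencies: Union[str, List[str]],
    extras: Optional[str] = None,
) -> Callable:
    """Sketch: raise a helpful ImportError if optional packages are missing."""
    if isinstance(dependencies, str):
        dependencies = [dependencies]

    # The outer function takes the named arguments; the inner `decorator`
    # receives the function being wrapped.
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keeps __name__, __doc__ and annotations
        def wrapper(*args, **kwargs):
            missing = [
                dep for dep in dependencies
                if importlib.util.find_spec(dep) is None
            ]
            if missing:
                # Message wording is an assumption, not the project's text.
                hint = f' Try: pip install "unstructured[{extras}]"' if extras else ""
                raise ImportError(
                    f"Missing required dependencies: {', '.join(missing)}.{hint}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Because the check runs inside `wrapper`, a missing package is only reported when the decorated function is actually called, which matches the goal of importing connector dependencies at runtime rather than at module import time.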
Ingest.md: 1 addition, 1 deletion
@@ -67,11 +67,11 @@ In checklist form, the above steps are summarized as:
- [ ] Add them as an extra to [setup.py](unstructured/setup.py).
- [ ] Update the Makefile, adding a target for `install-ingest-<name>` and adding another `pip-compile` line to the `pip-compile` make target. See [this commit](https://github.com/Unstructured-IO/unstructured/commit/ab542ca3c6274f96b431142262d47d727f309e37) for a reference.
- [ ] The added dependencies should be imported at runtime when the new connector is invoked, rather than as top-level imports.
+- [ ] Add the `unstructured.utils.requires_dependencies` decorator to each method or function that uses those connector-specific dependencies. For example, for `S3Connector` it should look like `@requires_dependencies(dependencies=["boto3"], extras="s3")`.
- [ ] Honors the conventions of `BaseConnectorConfig` defined in [unstructured/ingest/interfaces.py](unstructured/ingest/interfaces.py) which is passed through [the CLI](unstructured/ingest/main.py):
- [ ] If running with an `.output_dir` where structured outputs already exist for a given file, the file content is neither re-downloaded from the data source nor reprocessed. This is made possible by implementing `MyIngestDoc.has_output()`, which is invoked in [MainProcess._filter_docs_with_outputs](ingest-prep-for-many/unstructured/ingest/main.py).
- [ ] Unless `.reprocess` is `True`, in which case documents are always reprocessed.
- [ ] If `.preserve_download` is `True`, documents downloaded to `.download_dir` are not removed after processing.
- [ ] Else if `.preserve_download` is `False`, documents downloaded to `.download_dir` are removed after they are **successfully** processed, during the invocation of `MyIngestDoc.cleanup_file()` in [process_document](unstructured/ingest/doc_processor/generalized.py).
- [ ] Does not re-download documents to `.download_dir` if `.re_download` is `False`, enforced in `MyIngestDoc.get_file()`.
- [ ] Prints more details if `.verbose` is `True`, similar to [unstructured/ingest/connector/s3_connector.py](unstructured/ingest/connector/s3_connector.py).
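
The config conventions in the checklist above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `SketchConfig` and `docs_to_process` are hypothetical names, the file layout is invented, and the "download" just writes placeholder text. Only the config field names and the method names `has_output()`, `get_file()` and `cleanup_file()` come from the checklist.

```python
import os
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a connector config; field names follow the checklist."""
    download_dir: str
    output_dir: str
    re_download: bool = False
    preserve_download: bool = False
    reprocess: bool = False
    verbose: bool = False


@dataclass
class MyIngestDoc:
    config: SketchConfig
    name: str

    @property
    def download_path(self) -> str:
        return os.path.join(self.config.download_dir, self.name)

    @property
    def output_path(self) -> str:
        # Output naming is an assumption made for this sketch.
        return os.path.join(self.config.output_dir, f"{self.name}.json")

    def has_output(self) -> bool:
        """True if structured output already exists for this document."""
        return os.path.isfile(self.output_path)

    def get_file(self) -> None:
        """Fetch the document unless a local copy exists and re_download is False."""
        if os.path.isfile(self.download_path) and not self.config.re_download:
            if self.config.verbose:
                print(f"Already downloaded: {self.download_path}")
            return
        os.makedirs(self.config.download_dir, exist_ok=True)
        with open(self.download_path, "w") as f:
            f.write("placeholder content")  # a real connector would fetch from the source

    def cleanup_file(self) -> None:
        """Remove the local copy after successful processing unless it is preserved."""
        if not self.config.preserve_download and os.path.isfile(self.download_path):
            os.remove(self.download_path)


def docs_to_process(docs, reprocess: bool):
    """Skip documents that already have output, unless reprocessing is requested."""
    return [d for d in docs if reprocess or not d.has_output()]
```

In this sketch, `cleanup_file()` is meant to be called only after a document has been processed successfully, so a failed run leaves the download in place for inspection.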