feat: partition_pdf() set inferred elements text (#3061)
This PR adds the ability to fill in the text of inferred elements from embedded
text (`pdfminer`) without depending on the `unstructured-inference` library.
This PR is the second part of moving embedded text related code from
`unstructured-inference` to `unstructured` and works together with
Unstructured-IO/unstructured-inference#349.
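To illustrate the idea of filling inferred elements from embedded text, here is a minimal sketch. It is not the code in this PR: `Box`, `Word`, `InferredElement`, `fill_text_from_embedded`, and the `min_overlap` threshold are hypothetical names chosen for the example. It assumes the embedded text has already been extracted (for instance by `pdfminer`) into words with bounding boxes, and that a word belongs to an element when most of its area lies inside the element's box.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned rectangle; (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def intersection_area(self, other: "Box") -> float:
        width = min(self.x2, other.x2) - max(self.x1, other.x1)
        height = min(self.y2, other.y2) - max(self.y1, other.y1)
        return max(0.0, width) * max(0.0, height)


@dataclass
class Word:
    """Hypothetical container for one piece of embedded text."""
    box: Box
    text: str


@dataclass
class InferredElement:
    """Hypothetical layout element produced by a detection model (no text yet)."""
    box: Box
    text: str = ""


def fill_text_from_embedded(
    elements: List[InferredElement],
    words: List[Word],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the PR
) -> List[InferredElement]:
    """Fill each empty element with the embedded words that fall inside it."""
    for element in elements:
        if element.text:
            continue  # keep text that is already set
        inside = [
            w for w in words
            if w.box.area > 0
            and w.box.intersection_area(element.box) / w.box.area >= min_overlap
        ]
        # Approximate reading order: top to bottom, then left to right.
        inside.sort(key=lambda w: (w.box.y1, w.box.x1))
        element.text = " ".join(w.text for w in inside)
    return elements


if __name__ == "__main__":
    words = [
        Word(Box(60, 10, 100, 20), "world"),
        Word(Box(10, 10, 50, 20), "Hello"),
        Word(Box(10, 200, 50, 210), "footer"),
    ]
    elements = [InferredElement(Box(0, 0, 120, 30))]
    print(fill_text_from_embedded(elements, words)[0].text)
```

The overlap is measured relative to the word's own area rather than the element's, so a small word fully inside a large element still counts as contained. A real implementation would likely need more careful reading-order handling for multi-column layouts.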
CHANGELOG.md (6 additions, 2 deletions)
@@ -1,15 +1,19 @@
-## 0.14.1-dev1
+## 0.14.1
 
-* **Add support for Python 3.12**. `unstructured` now works with Python 3.12!
+### Enhancements
+
+* **Refactor code related to embedded text extraction**. The embedded text extraction code is moved from `unstructured-inference` to `unstructured`.
 
 ### Features
+
 * **Large improvements to the ingest process:**
   * Support for multiprocessing and async, with limits for both.
   * Streamlined to process when mapping CLI invocations to the underlying code
   * More granular steps introduced to give better control over process (i.e. dedicated step to uncompress files already in the local filesystem, new optional staging step before upload)
   * Use the python client when calling the unstructured api for partitioning or chunking
   * Saving the final content is now a dedicated destination connector (local) set as the default if none are provided. Avoids adding new files locally if uploading elsewhere.
   * Leverage last modified date when deciding if new files should be downloaded and reprocessed.
+* **Add support for Python 3.12**. `unstructured` now works with Python 3.12!