Commit ad2f3b1

chore: Update README and cut 0.26.0 for publishing
Bring back some of the autogenerated README content and make sure our manual sections are using the right syntax. Once we merge and regenerate, 0.26.0 will be published.
1 parent 6e1fa29 commit ad2f3b1

2 files changed: 184 additions & 104 deletions

README.md

Lines changed: 183 additions & 103 deletions
@@ -11,20 +11,12 @@
 
 <div align="center">
 
-<a
-href="https://www.phorm.ai/query?projectId=34efc517-2201-4376-af43-40c4b9da3dc5">
-<img src="https://img.shields.io/badge/Phorm-Ask_AI-%23F2777A.svg?&logo=" />
-</a>
-
 </div>
 
-
 <h2 align="center">
 <p>Python SDK for the Unstructured API</p>
 </h2>
 
-NOTE: This README is for the `0.26.0-beta` version. The current published SDK, `0.25.5` can be found [here](https://github.com/Unstructured-IO/unstructured-python-client/blob/v0.25.5/README.md).
-
 This is a Python client for the [Unstructured API](https://docs.unstructured.io/api-reference/api-services/saas-api-development-guide) and you can sign up for your API key on https://app.unstructured.io.
 
 Please refer to the [Unstructured docs](https://docs.unstructured.io/api-reference/api-services/sdk-python) for a full guide to using the client.
@@ -73,94 +65,6 @@ poetry add unstructured-client
 ```
 <!-- End SDK Installation [installation] -->
 
-## SDK Example Usage
-
-### Example
-
-```python
-import os
-
-import unstructured_client
-from unstructured_client.models import operations, shared
-
-client = unstructured_client.UnstructuredClient(
-    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
-    server_url=os.getenv("UNSTRUCTURED_API_URL"),
-)
-
-filename = "PATH_TO_FILE"
-with open(filename, "rb") as f:
-    data = f.read()
-
-req = operations.PartitionRequest(
-    partition_parameters=shared.PartitionParameters(
-        files=shared.Files(
-            content=data,
-            file_name=filename,
-        ),
-        # --- Other partition parameters ---
-        strategy=shared.Strategy.AUTO,
-        languages=['eng'],
-    ),
-)
-
-try:
-    res = client.general.partition(request=req)
-    print(res.elements[0])
-except Exception as e:
-    print(e)
-```
-Refer to the [API parameters page](https://docs.unstructured.io/api-reference/api-services/api-parameters) for all available parameters.
-
-### Configuration
-
-#### Splitting PDF by pages
-
-See [page splitting](https://docs.unstructured.io/api-reference/api-services/sdk#page-splitting) for more details.
-
-In order to speed up processing of large PDF files, the client splits up PDFs into smaller files, sends these to the API concurrently, and recombines the results. `split_pdf_page` can be set to `False` to disable this.
-
-The amount of workers utilized for splitting PDFs is dictated by the `split_pdf_concurrency_level` parameter, with a default of 5 and a maximum of 15 to keep resource usage and costs in check. The splitting process leverages `asyncio` to manage concurrency effectively.
-The size of each batch of pages (ranging from 2 to 20) is internally determined based on the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio` the client can encouter event loop issues if it is nested in another async runner, like running in a `gevent` spawned task. Instead, this is safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with `fork` context).
-
-Example:
-```python
-req = shared.PartitionParameters(
-    files=files,
-    strategy="fast",
-    languages=["eng"],
-    split_pdf_concurrency_level=8
-)
-```
-
-#### Sending specific page ranges
-
-When `split_pdf_page=True` (the default), you can optionally specify a page range to send only a portion of your PDF to be extracted. The parameter takes a list of two integers to specify the range, inclusive. A ValueError is thrown if the page range is invalid.
-
-Example:
-```python
-req = shared.PartitionParameters(
-    files=files,
-    strategy="fast",
-    languages=["eng"],
-    split_pdf_page_range=[10,15],
-)
-```
-
-#### Splitting PDF by pages - strict mode
-
-When `split_pdf_allow_failed=False` (the default), any errors encountered during sending parallel request will break the process and raise an exception.
-When `split_pdf_allow_failed=True`, the process will continue even if some requests fail, and the results will be combined at the end (the output from the errored pages will not be included).
-
-Example:
-```python
-req = shared.PartitionParameters(
-    files=files,
-    strategy="fast",
-    languages=["eng"],
-    split_pdf_allow_failed=True,
-)
-```
 
 <!-- Start Retries [retries] -->
 ## Retries
@@ -229,6 +133,59 @@ if res.elements is not None:
 ```
 <!-- End Retries [retries] -->
 
+
+<!-- Start Error Handling [errors] -->
+## Error Handling
+
+Handling errors in this SDK should largely match your expectations. All operations return a response object or raise an error. If Error objects are specified in your OpenAPI Spec, the SDK will raise the appropriate Error type.
+
+| Error Object | Status Code | Content Type |
+| -------------------------- | -------------------------- | -------------------------- |
+| errors.HTTPValidationError | 422 | application/json |
+| errors.ServerError | 5XX | application/json |
+| errors.SDKError | 4xx-5xx | */* |
+
+### Example
+
+```python
+from unstructured_client import UnstructuredClient
+from unstructured_client.models import errors, shared
+
+s = UnstructuredClient()
+
+res = None
+try:
+    res = s.general.partition(request={
+        "partition_parameters": {
+            "files": {
+                "content": open("example.file", "rb"),
+                "file_name": "example.file",
+            },
+            "chunking_strategy": shared.ChunkingStrategy.BY_TITLE,
+            "split_pdf_page_range": [
+                1,
+                10,
+            ],
+            "strategy": shared.Strategy.HI_RES,
+        },
+    })
+
+    if res.elements is not None:
+        # handle response
+        pass
+
+except errors.HTTPValidationError as e:
+    # handle e.data: errors.HTTPValidationErrorData
+    raise(e)
+except errors.ServerError as e:
+    # handle e.data: errors.ServerErrorData
+    raise(e)
+except errors.SDKError as e:
+    # handle exception
+    raise(e)
+```
+<!-- End Error Handling [errors] -->
+
 <!-- Start Custom HTTP Client [http-client] -->
 ## Custom HTTP Client
 
@@ -310,13 +267,6 @@ s = UnstructuredClient(async_client=CustomClient(httpx.AsyncClient()))
 ```
 <!-- End Custom HTTP Client [http-client] -->
 
-<!-- No SDK Example Usage [usage] -->
-<!-- No SDK Available Operations -->
-<!-- No Pagination -->
-<!-- No Error Handling -->
-<!-- No Server Selection -->
-<!-- No Authentication -->
-
 <!-- Start IDE Support [idesupport] -->
 ## IDE Support
 
@@ -327,6 +277,131 @@ Generally, the SDK will work well with most IDEs out of the box. However, when u
 - [PyCharm Pydantic Plugin](https://docs.pydantic.dev/latest/integrations/pycharm/)
 <!-- End IDE Support [idesupport] -->
 
+
+<!-- Start SDK Example Usage [usage] -->
+## SDK Example Usage
+
+### Example
+
+```python
+# Synchronous Example
+from unstructured_client import UnstructuredClient
+from unstructured_client.models import shared
+
+s = UnstructuredClient()
+
+res = s.general.partition(request={
+    "partition_parameters": {
+        "files": {
+            "content": open("example.file", "rb"),
+            "file_name": "example.file",
+        },
+        "chunking_strategy": shared.ChunkingStrategy.BY_TITLE,
+        "split_pdf_page_range": [
+            1,
+            10,
+        ],
+        "strategy": shared.Strategy.HI_RES,
+    },
+})
+
+if res.elements is not None:
+    # handle response
+    pass
+```
+
+</br>
+
+The same SDK client can also be used to make asychronous requests by importing asyncio.
+```python
+# Asynchronous Example
+import asyncio
+from unstructured_client import UnstructuredClient
+from unstructured_client.models import shared
+
+async def main():
+    s = UnstructuredClient()
+    res = await s.general.partition_async(request={
+        "partition_parameters": {
+            "files": {
+                "content": open("example.file", "rb"),
+                "file_name": "example.file",
+            },
+            "chunking_strategy": shared.ChunkingStrategy.BY_TITLE,
+            "split_pdf_page_range": [
+                1,
+                10,
+            ],
+            "strategy": shared.Strategy.HI_RES,
+        },
+    })
+    if res.elements is not None:
+        # handle response
+        pass
+
+asyncio.run(main())
+```
+<!-- End SDK Example Usage [usage] -->
+
+Refer to the [API parameters page](https://docs.unstructured.io/api-reference/api-services/api-parameters) for all available parameters.
+
+
+## Configuration
+
+### Splitting PDF by pages
+
+See [page splitting](https://docs.unstructured.io/api-reference/api-services/sdk#page-splitting) for more details.
+
+In order to speed up processing of large PDF files, the client splits up PDFs into smaller files, sends these to the API concurrently, and recombines the results. `split_pdf_page` can be set to `False` to disable this.
+
+The amount of workers utilized for splitting PDFs is dictated by the `split_pdf_concurrency_level` parameter, with a default of 5 and a maximum of 15 to keep resource usage and costs in check. The splitting process leverages `asyncio` to manage concurrency effectively.
+The size of each batch of pages (ranging from 2 to 20) is internally determined based on the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio` the client can encouter event loop issues if it is nested in another async runner, like running in a `gevent` spawned task. Instead, this is safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with `fork` context).
+
+Example:
+```python
+req = operations.PartitionRequest(
+    partition_parameters=shared.PartitionParameters(
+        files=files,
+        strategy="fast",
+        languages=["eng"],
+        split_pdf_concurrency_level=8
+    )
+)
+```
+
+### Sending specific page ranges
+
+When `split_pdf_page=True` (the default), you can optionally specify a page range to send only a portion of your PDF to be extracted. The parameter takes a list of two integers to specify the range, inclusive. A ValueError is thrown if the page range is invalid.
+
+Example:
+```python
+req = operations.PartitionRequest(
+    partition_parameters=shared.PartitionParameters(
+        files=files,
+        strategy="fast",
+        languages=["eng"],
+        split_pdf_page_range=[10,15],
+    )
+)
+```
+
+### Splitting PDF by pages - strict mode
+
+When `split_pdf_allow_failed=False` (the default), any errors encountered during sending parallel request will break the process and raise an exception.
+When `split_pdf_allow_failed=True`, the process will continue even if some requests fail, and the results will be combined at the end (the output from the errored pages will not be included).
+
+Example:
+```python
+req = operations.PartitionRequest(
+    partition_parameters=shared.PartitionParameters(
+        files=files,
+        strategy="fast",
+        languages=["eng"],
+        split_pdf_allow_failed=True,
+    )
+)
+```
+
 <!-- Start File uploads [file-upload] -->
 ## File uploads
 
@@ -380,6 +455,11 @@ s = UnstructuredClient(debug_logger=logging.getLogger("unstructured_client"))
 ```
 <!-- End Debugging [debug] -->
 
+<!-- No SDK Available Operations -->
+<!-- No Pagination -->
+<!-- No Server Selection -->
+<!-- No Authentication -->
+
 <!-- Placeholder for Future Speakeasy SDK Sections -->
 
 ### Maturity
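
Note on the "Sending specific page ranges" and "Splitting PDF by pages" sections added to README.md above: the README says the range is a list of two inclusive integers, that an invalid range raises a ValueError, and that batches hold between 2 and 20 pages depending on the concurrency level (default 5, maximum 15). The sketch below illustrates those rules in plain Python. It is not the SDK's implementation: `validate_page_range` and `batch_pages` are hypothetical helper names, 1-based page numbering is assumed, and the ceiling-division sizing formula is an assumption chosen only to satisfy the stated 2 to 20 bounds.

```python
import math


def validate_page_range(page_range, total_pages):
    """Return (first, last) for an inclusive page range, or raise ValueError.

    Assumes 1-based page numbers; the SDK's own checks may differ.
    """
    if len(page_range) != 2:
        raise ValueError("page range must contain exactly two integers")
    first, last = page_range
    if not (isinstance(first, int) and isinstance(last, int)):
        raise ValueError("page range values must be integers")
    if first < 1 or last > total_pages or first > last:
        raise ValueError(f"invalid page range {page_range} for {total_pages} pages")
    return first, last


def batch_pages(first, last, concurrency_level=5):
    """Split the inclusive range [first, last] into inclusive (start, end) batches."""
    # README: concurrency defaults to 5 and is capped at 15.
    workers = max(1, min(concurrency_level, 15))
    page_count = last - first + 1
    # Assumed sizing rule: spread pages over the workers, clamped to 2..20 pages.
    size = min(20, max(2, math.ceil(page_count / workers)))
    return [
        (start, min(start + size - 1, last))
        for start in range(first, last + 1, size)
    ]


if __name__ == "__main__":
    first, last = validate_page_range([10, 15], total_pages=100)
    print(batch_pages(first, last, concurrency_level=8))
```

With the README's own example values (`split_pdf_page_range=[10,15]`, concurrency 8), this assumed rule yields three two-page batches; the real client may size batches differently.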

gen.yaml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ generation:
   auth:
     oAuth2ClientCredentialsEnabled: false
 python:
-  version: 0.26.0-beta.4
+  version: 0.26.0
   additionalDependencies:
     dev:
       deepdiff: '>=6.0'
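
Note on the "strict mode" section this commit adds to README.md: the difference between `split_pdf_allow_failed=False` and `True` can be pictured with `asyncio.gather`, which either propagates the first exception or returns exceptions alongside results. This is a minimal sketch of that behaviour only, not the SDK's code: `send_batch` is a hypothetical stand-in for one partition request, and `partition_batches` is a hypothetical wrapper.

```python
import asyncio


async def send_batch(batch_id, fail):
    """Hypothetical stand-in for one partition request; raises to simulate a failure."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"batch {batch_id} failed")
    return [f"element-from-batch-{batch_id}"]


async def partition_batches(batches, allow_failed=False, concurrency_level=5):
    """Send all batches concurrently and combine the results in batch order."""
    semaphore = asyncio.Semaphore(concurrency_level)

    async def limited(batch_id, fail):
        # Cap the number of in-flight requests at the concurrency level.
        async with semaphore:
            return await send_batch(batch_id, fail)

    # allow_failed=False: gather re-raises the first exception (strict mode).
    # allow_failed=True: exceptions come back as values and are filtered out below.
    results = await asyncio.gather(
        *(limited(batch_id, fail) for batch_id, fail in batches),
        return_exceptions=allow_failed,
    )

    elements = []
    for result in results:
        if isinstance(result, Exception):
            continue  # output from the failed batch is not included
        elements.extend(result)
    return elements


if __name__ == "__main__":
    demo = [(0, False), (1, True), (2, False)]
    print(asyncio.run(partition_batches(demo, allow_failed=True)))
```

Running the demo with `allow_failed=True` returns the elements from batches 0 and 2 only; with `allow_failed=False` the same input raises the `RuntimeError` from batch 1.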
