Commit ee62f23

Python SDK: new classes for more robust error handling for the Partition Endpoint (#700)
1 parent 731eaf8 commit ee62f23

1 file changed

+119
-32
lines changed

@@ -1,45 +1,132 @@
11
---
2-
title: Endpoint validation errors
3-
description: This section details the structure of HTTP validation errors returned by the Unstructured Partition Endpoint.
2+
title: Endpoint errors
43
---
54

6-
## HTTPValidationError
5+
For the [Unstructured Python SDK](/api-reference/partition/sdk-python),
6+
the [Unstructured Partition Endpoint](/api-reference/partition/overview) returns errors primarily through
7+
the `UnstructuredClientError` class (the base class for all errors raised by the Unstructured Python SDK) and
8+
the `HTTPValidationError` class (inherited from `UnstructuredClientError`). Less common errors are returned through the following classes:
79

8-
**Type**: object
10+
- `httpx.RequestError`, the base class for request errors.
11+
- `httpx.ConnectError`, for failures to establish an HTTP connection.
12+
- `httpx.TimeoutException`, for HTTP request timeout errors.
13+
- `ServerError` (inherited from `UnstructuredClientError`), for server-side errors.
14+
- `ResponseValidationError` (inherited from `UnstructuredClientError`), for type mismatches between the response data and the expected Pydantic model.
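
Because several of these classes inherit from `UnstructuredClientError`, the order of `except` clauses matters: Python runs the first clause that matches, so subclasses need to be listed before their base class. The following is a minimal sketch of that behavior. The classes in it are stand-ins defined only for illustration (they are not the SDK's real classes), and `classify` is a hypothetical helper name.

```python
# Stand-in classes that mirror the inheritance described in this section.
# They are NOT the SDK's real classes; they exist only so that this sketch
# runs without the SDK installed.
class UnstructuredClientError(Exception):
    def __init__(self, message: str, status_code: int, body: str = ""):
        super().__init__(message)
        self.message = message
        self.status_code = status_code
        self.body = body


class HTTPValidationError(UnstructuredClientError):
    """Stand-in for a validation error (HTTP 422)."""


class ServerError(UnstructuredClientError):
    """Stand-in for a server-side error (HTTP 5XX)."""


def classify(error: Exception) -> str:
    """Return a label for the first matching handler, most specific first."""
    try:
        raise error
    except HTTPValidationError:
        return "validation"
    except ServerError:
        return "server"
    except UnstructuredClientError:
        # Reached only for errors that are not one of the subclasses above.
        return "other-client"
    except Exception:
        return "unexpected"


print(classify(HTTPValidationError("invalid parameter", 422)))  # validation
print(classify(UnstructuredClientError("missing API key", 401)))  # other-client
```

If the `except UnstructuredClientError` clause were listed first, it would also catch the two subclasses, and the more specific handlers would never run.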
915

10-
**Title**: HTTPValidationError
16+
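Each of the preceding classes that inherits from `UnstructuredClientError`, and `UnstructuredClientError` itself, has the following members (the `httpx` error classes do not):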
1117

12-
**Detail**
18+
| Member | Type | Description |
19+
|--------|------|-------------|
20+
| `message` | `str` | The error message. |
21+
| `status_code` | `int` | The HTTP response status code, for example `401`. |
22+
| `headers` | `httpx.Headers` | A collection of HTTP response headers. |
23+
| `body` | `str` | The HTTP body. This can be an empty string if no body is returned. |
24+
| `raw_response` | `httpx.Response` | The raw HTTP response. |
1325

14-
* **Type**: array
15-
16-
* **Description**: An array of ValidationError items, providing detailed information about the validation errors encountered.
17-
26+
The following example shows how to handle the preceding errors. In this example,
27+
a required Unstructured API key is intentionally commented out of the code, so that a
28+
`401 Unauthorized` error is raised.
1829

19-
## ValidationError
30+
```python
31+
import os
32+
import json
2033

21-
**Type**: object
34+
from unstructured_client import UnstructuredClient
35+
from unstructured_client.models.operations import PartitionRequest
36+
from unstructured_client.models.shared import (
37+
PartitionParameters,
38+
Files,
39+
Strategy
40+
)
41+
from unstructured_client.models.errors import (
42+
UnstructuredClientError,
43+
HTTPValidationError
44+
)
45+
from unstructured_client.models.errors.servererror import ServerError
46+
from unstructured_client.models.errors.responsevalidationerror import ResponseValidationError
47+
import httpx
2248

23-
**Title**: ValidationError
49+
try:
50+
client = UnstructuredClient(
51+
# Intentionally leave out the API key to trigger a 401 Unauthorized error.
52+
# api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
53+
)
2454

25-
**Required Fields**: loc, msg, type
55+
filename = "PATH_TO_INPUT_FILE"
2656

27-
* **Location (loc)**
28-
29-
* **Type**: array
30-
31-
* **Description**: The location of the validation error in the request. Each item in the array can be either a string (e.g., field name) or an integer (e.g., array index).
32-
33-
34-
* **Message (msg)**
35-
36-
* **Type**: string
37-
38-
* **Description**: A descriptive message about the validation error.
39-
40-
41-
* **Error Type (type)**
57+
request = PartitionRequest(
58+
partition_parameters=PartitionParameters(
59+
files=Files(
60+
content=open(filename, "rb"),
61+
file_name=filename,
62+
),
63+
strategy=Strategy.VLM,
64+
vlm_model="gpt-4o",
65+
vlm_model_provider="openai",
66+
languages=['eng'],
67+
split_pdf_page=True, # If True, splits the PDF file into smaller chunks of pages.
68+
# split_pdf_allow_failed=True, # If True, the partitioning continues even if some pages fail.
69+
split_pdf_concurrency_level=15 # Set the number of concurrent requests to the maximum value: 15.
70+
),
71+
)
72+
73+
response = client.general.partition(
74+
request=request
75+
)
76+
element_dicts = [element for element in response.elements]
4277

43-
* **Type**: string
44-
45-
* **Description**: The type of validation error, categorizing the nature of the error.
78+
# Print the processed data's first element only.
79+
print(element_dicts[0])
80+
81+
# Write the processed data to a local file.
82+
json_elements = json.dumps(element_dicts, indent=2)
83+
84+
with open("PATH_TO_OUTPUT_FILE", "w") as file:
85+
file.write(json_elements)
86+
87+
except HTTPValidationError as e:
88+
print("Validation error (HTTP 422):", e)
89+
except ServerError as e:
90+
print("Server error (HTTP 5XX):", e)
91+
except ResponseValidationError as e:
92+
print("Response validation/type mismatch:", e)
93+
except UnstructuredClientError as e:
94+
# This catches any other UnstructuredClientError not already caught above.
95+
# This and the other UnstructuredClientError subclasses in this example expose the following members:
96+
print("Other Unstructured client error:")
97+
print(f"Message: {e.message}")
98+
print(f"Status code: {e.status_code}")
99+
print(f"Body: {e.body}")
100+
print(f"Raw response: {e.raw_response}")
101+
print("Headers:")
102+
103+
for header in e.headers.raw:
104+
key = header[0].decode('utf-8')
105+
value = header[1].decode('utf-8')
106+
print(f" {key}: {value}")
107+
108+
except httpx.ConnectError as e:
109+
print("HTTP connection error:", e)
110+
except httpx.TimeoutException as e:
111+
print("HTTP timeout error:", e)
112+
except httpx.RequestError as e:
113+
# This catches any other HTTPX request errors not already caught above.
114+
print("Other HTTPX request error:", e)
115+
except Exception as e:
116+
# Optional: this catches any other unforeseen errors.
117+
print("Unexpected error:", e)
118+
```
119+
120+
The results of running the preceding code are similar to the following:
121+
122+
```text
123+
Message: API error occurred: Status 401. Body: {"detail":"API key is missing, please provide an API key in the header."}
124+
Status code: 401
125+
Body: {"detail":"API key is missing, please provide an API key in the header."}
126+
Raw response: <Response [401 Unauthorized]>
127+
Headers:
128+
date: <date-and-time-of-the-error>
129+
server: <server-identifier>
130+
content-length: 73
131+
content-type: application/json
132+
```
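
The `body` value in the output above is a JSON string with a `detail` key. The following is a minimal sketch of one way to turn such a body into readable messages. It assumes, without confirmation from the SDK, that `detail` is either a string (as in the `401` output above) or a list of objects with `loc` and `msg` keys (the shape of a typical validation error); `summarize_error_body` is a hypothetical helper, not part of the SDK.

```python
import json
from typing import List


def summarize_error_body(body: str) -> List[str]:
    """Return readable messages found under the "detail" key of an error body.

    Assumption: the body is empty, plain text, or a JSON object whose
    "detail" value is a string or a list of objects with "loc" and "msg".
    """
    if not body:
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return [body]  # Not JSON: fall back to the raw text.

    detail = payload.get("detail") if isinstance(payload, dict) else None
    if isinstance(detail, str):
        return [detail]
    if isinstance(detail, list):
        messages = []
        for item in detail:
            if isinstance(item, dict):
                # Join the location parts, for example ["body", "files"] -> "body.files".
                loc = ".".join(str(part) for part in item.get("loc", []))
                msg = str(item.get("msg", ""))
                messages.append(f"{loc}: {msg}" if loc else msg)
            else:
                messages.append(str(item))
        return messages
    return [body]  # Unrecognized shape: fall back to the raw text.


body = '{"detail":"API key is missing, please provide an API key in the header."}'
print(summarize_error_body(body))
```

In an `except UnstructuredClientError as e:` handler, a helper like this could be called as `summarize_error_body(e.body)`.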
