|
1 | 1 | --- |
2 | | -title: Endpoint validation errors |
3 | | -description: This section details the structure of HTTP validation errors returned by the Unstructured Partition Endpoint. |
| 2 | +title: Endpoint errors |
4 | 3 | --- |
5 | 4 |
|
6 | | -## HTTPValidationError |
| 5 | +For the [Unstructured Python SDK](/api-reference/partition/sdk-python),
| 6 | +errors from the [Unstructured Partition Endpoint](/api-reference/partition/overview) are raised primarily through
| 7 | +the `UnstructuredClientError` class (the base class for the Unstructured Python SDK's own error classes) and
| 8 | +the `HTTPValidationError` class (inherited from `UnstructuredClientError`). Less common errors are raised through the following classes:
7 | 9 |
|
8 | | -**Type**: object |
| 10 | +- `httpx.RequestError`, the base class for request errors. |
| 11 | +- `httpx.ConnectError`, for HTTP connection request errors. |
| 12 | +- `httpx.TimeoutException`, for HTTP request timeout errors. |
| 13 | +- `ServerError` (inherited from `UnstructuredClientError`), for server-side errors. |
| 14 | +- `ResponseValidationError` (inherited from `UnstructuredClientError`), for type mismatches between the response data and the expected Pydantic model. |
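As a rough guide to which SDK class handles which HTTP response, the comments in the error-handling example on this page associate `HTTPValidationError` with HTTP 422 and `ServerError` with HTTP 5XX, while other error statuses such as `401` fall through to `UnstructuredClientError`. The following minimal sketch encodes that mapping in plain Python. The function name `expected_error_class` is hypothetical and is not part of the Unstructured Python SDK, and the mapping is an assumption drawn from those comments rather than from the SDK source.

```python
from typing import Optional


def expected_error_class(status_code: int) -> Optional[str]:
    """Return the name of the SDK error class likely raised for a status code.

    Hypothetical helper: the mapping is assumed from the status codes noted
    in this page's example, not taken from the SDK source.
    """
    if status_code == 422:
        # Request validation failure.
        return "HTTPValidationError"
    if 500 <= status_code <= 599:
        # Server-side failure.
        return "ServerError"
    if 400 <= status_code <= 499:
        # Any other client error, such as 401 Unauthorized.
        return "UnstructuredClientError"
    # Non-error statuses are not expected to raise an SDK error.
    return None


for code in (401, 422, 503, 200):
    print(code, expected_error_class(code))
```

Network-level failures such as `httpx.ConnectError` and `httpx.TimeoutException` never produce a status code, which is why they are not part of this mapping.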
9 | 15 |
|
10 | | -**Title**: HTTPValidationError |
| 16 | +The `UnstructuredClientError` class and each of the preceding classes that inherit from it have the following members (the `httpx` classes do not):
11 | 17 |
|
12 | | -**Detail** |
| 18 | +| Member | Type | Description | |
| 19 | +|--------|------|-------------| |
| 20 | +| `message` | `str` | The error message. |
| 21 | +| `status_code` | `int` | The HTTP response status code, for example `401`. |
| 22 | +| `headers` | `httpx.Headers` | A collection of HTTP response headers. |
| 23 | +| `body` | `str` | The HTTP response body. This is an empty string if no body is returned. |
| 24 | +| `raw_response` | `httpx.Response` | The raw HTTP response. |
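Because `body` is a plain string that can be empty, code that wants only the human-readable error text has to parse it defensively. The following sketch assumes, based on the sample `401` output on this page, that error bodies are JSON objects with a `detail` key; the helper name `error_detail` is hypothetical and is not part of the Unstructured Python SDK.

```python
import json
from typing import Optional


def error_detail(body: str) -> Optional[str]:
    """Extract the "detail" field from an error body.

    Hypothetical helper. Assumes bodies shaped like {"detail": ...}, as in
    the sample 401 output. Returns None for an empty body or a JSON body
    without "detail", and returns the raw text if the body is not JSON.
    """
    if not body:
        # The body is an empty string when no body is returned.
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON: fall back to the raw text.
        return body
    if isinstance(payload, dict) and "detail" in payload:
        detail = payload["detail"]
        # A non-string detail (for example, a list) is re-serialized as JSON.
        return detail if isinstance(detail, str) else json.dumps(detail)
    return None


sample = '{"detail":"API key is missing, please provide an API key in the header."}'
print(error_detail(sample))
```

Inside an `except UnstructuredClientError as e:` branch, this could be called as `error_detail(e.body)`.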
13 | 25 |
|
14 | | -* **Type**: array |
15 | | - |
16 | | -* **Description**: An array of ValidationError items, providing detailed information about the validation errors encountered. |
17 | | - |
| 26 | +The following example shows how to handle the preceding errors. In this example,
| 27 | +the required Unstructured API key is deliberately commented out of the code so that a
| 28 | +`401 Unauthorized` error is raised.
18 | 29 |
|
19 | | -## ValidationError |
| 30 | +```python |
| 31 | +import os |
| 32 | +import json |
20 | 33 |
|
21 | | -**Type**: object |
| 34 | +from unstructured_client import UnstructuredClient |
| 35 | +from unstructured_client.models.operations import PartitionRequest |
| 36 | +from unstructured_client.models.shared import ( |
| 37 | + PartitionParameters, |
| 38 | + Files, |
| 39 | + Strategy |
| 40 | +) |
| 41 | +from unstructured_client.models.errors import ( |
| 42 | + UnstructuredClientError, |
| 43 | + HTTPValidationError |
| 44 | +) |
| 45 | +from unstructured_client.models.errors.servererror import ServerError |
| 46 | +from unstructured_client.models.errors.responsevalidationerror import ResponseValidationError |
| 47 | +import httpx |
22 | 48 |
|
23 | | -**Title**: ValidationError |
| 49 | +try: |
| 50 | + client = UnstructuredClient( |
| 51 | + # Deliberately leave out the API key so that a 401 Unauthorized error is raised.
| 52 | + # api_key_auth=os.getenv("UNSTRUCTURED_API_KEY") |
| 53 | + ) |
24 | 54 |
|
25 | | -**Required Fields**: loc, msg, type |
| 55 | + filename = "PATH_TO_INPUT_FILE" |
26 | 56 |
|
27 | | -* **Location (loc)** |
28 | | - |
29 | | - * **Type**: array |
30 | | - |
31 | | - * **Description**: The location of the validation error in the request. Each item in the array can be either a string (e.g., field name) or an integer (e.g., array index). |
32 | | - |
33 | | - |
34 | | -* **Message (msg)** |
35 | | - |
36 | | - * **Type**: string |
37 | | - |
38 | | - * **Description**: A descriptive message about the validation error. |
39 | | - |
40 | | - |
41 | | -* **Error Type (type)** |
| 57 | + request = PartitionRequest( |
| 58 | + partition_parameters=PartitionParameters( |
| 59 | + files=Files( |
| 60 | + content=open(filename, "rb"), |
| 61 | + file_name=filename, |
| 62 | + ), |
| 63 | + strategy=Strategy.VLM, |
| 64 | + vlm_model="gpt-4o", |
| 65 | + vlm_model_provider="openai", |
| 66 | + languages=['eng'], |
| 67 | + split_pdf_page=True, # If True, splits the PDF file into smaller chunks of pages. |
| 68 | + # split_pdf_allow_failed=True, # If True, the partitioning continues even if some pages fail. |
| 69 | + split_pdf_concurrency_level=15 # Set the number of concurrent requests to the maximum value: 15.
| 70 | + ), |
| 71 | + ) |
| 72 | + |
| 73 | + response = client.general.partition( |
| 74 | + request=request |
| 75 | + ) |
| 76 | + element_dicts = [element for element in response.elements] |
42 | 77 |
|
43 | | - * **Type**: string |
44 | | - |
45 | | - * **Description**: The type of validation error, categorizing the nature of the error. |
| 78 | + # Print the processed data's first element only. |
| 79 | + print(element_dicts[0]) |
| 80 | + |
| 81 | + # Write the processed data to a local file. |
| 82 | + json_elements = json.dumps(element_dicts, indent=2) |
| 83 | + |
| 84 | + with open("PATH_TO_OUTPUT_FILE", "w") as file: |
| 85 | + file.write(json_elements) |
| 86 | + |
| 87 | +except HTTPValidationError as e: |
| 88 | + print("Validation error (HTTP 422):", e) |
| 89 | +except ServerError as e: |
| 90 | + print("Server error (HTTP 5XX):", e) |
| 91 | +except ResponseValidationError as e: |
| 92 | + print("Response validation/type mismatch:", e) |
| 93 | +except UnstructuredClientError as e: |
| 94 | + # This catches any other UnstructuredClientError not already caught above. |
| 95 | + # This class and its subclasses caught above expose the following members:
| 96 | + print("Other Unstructured client error:") |
| 97 | + print(f"Message: {e.message}") |
| 98 | + print(f"Status code: {e.status_code}") |
| 99 | + print(f"Body: {e.body}") |
| 100 | + print(f"Raw response: {e.raw_response}") |
| 101 | + print("Headers:")
| 102 | + |
| 103 | + for header in e.headers.raw: |
| 104 | + key = header[0].decode('utf-8') |
| 105 | + value = header[1].decode('utf-8') |
| 106 | + print(f" {key}: {value}") |
| 107 | + |
| 108 | +except httpx.ConnectError as e: |
| 109 | + print("HTTP connection error:", e) |
| 110 | +except httpx.TimeoutException as e: |
| 111 | + print("HTTP timeout error:", e) |
| 112 | +except httpx.RequestError as e: |
| 113 | + # This catches any other HTTPX request errors not already caught above.
| 114 | + print("Other HTTPX request error:", e) |
| 115 | +except Exception as e: |
| 116 | + # Optional: this catches any other unforeseen errors. |
| 117 | + print("Unexpected error:", e) |
| 118 | +``` |
| 119 | + |
| 120 | +The results of running the preceding code are similar to the following: |
| 121 | + |
| 122 | +```text |
| 123 | +Message: API error occurred: Status 401. Body: {"detail":"API key is missing, please provide an API key in the header."} |
| 124 | +Status code: 401 |
| 125 | +Body: {"detail":"API key is missing, please provide an API key in the header."} |
| 126 | +Raw response: <Response [401 Unauthorized]> |
| 127 | +Headers: |
| 128 | + date: <date-and-time-of-the-error> |
| 129 | + server: <server-identifier> |
| 130 | + content-length: 73 |
| 131 | + content-type: application/json |
| 132 | +``` |