SplitPdfHook ignores custom HTTPClient configuration
Description
When providing a custom httpClient to UnstructuredClient, the SplitPdfHook ignores it and creates its own HTTPClientExtension with default settings. This makes it impossible to configure socket-level timeouts for split PDF operations.
Current Behavior
In SplitPdfHook.ts, the sdkInit method discards the user-provided client:
```ts
sdkInit(opts: SDKInitOptions): SDKInitOptions {
  const { baseURL } = opts; // client is ignored
  this.client = new HTTPClientExtension(); // creates new client with defaults
  // ...
  return { baseURL: baseURL, client: this.client };
}
```
This causes split PDF requests to fall back to the default undici headersTimeout used by Node.js's built-in fetch (5 minutes), even when users configure extended timeouts via a custom HTTPClient.
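To make the effect concrete, here is a self-contained model of the excerpt above. The `HTTPClient`, `HTTPClientExtension`, `SDKInitOptions`, and `SplitPdfHookCurrent` types below are hypothetical simplifications, not the SDK's real classes: fetchers return strings instead of `Response` objects, and the model only tracks which fetcher ends up handling requests.

```typescript
// Hypothetical stand-ins for the SDK's HTTP types (not the real unstructured-client
// classes). Only "which fetcher handles the request" is modeled.
type Fetcher = (url: string) => Promise<string>;

class HTTPClient {
  constructor(readonly fetcher: Fetcher = async () => "default fetcher") {}
  request(url: string): Promise<string> {
    return this.fetcher(url);
  }
}

// Modeled as a subclass that only ever gets the default fetcher.
class HTTPClientExtension extends HTTPClient {}

interface SDKInitOptions {
  baseURL: string;
  client: HTTPClient;
}

// Mirrors the excerpt: the incoming client is dropped and replaced.
class SplitPdfHookCurrent {
  client?: HTTPClient;
  sdkInit(opts: SDKInitOptions): SDKInitOptions {
    const { baseURL } = opts; // opts.client is never read
    const split = new HTTPClientExtension(); // default fetcher, default timeouts
    this.client = split;
    return { baseURL, client: split };
  }
}

const userClient = new HTTPClient(async () => "custom fetcher");
const hook = new SplitPdfHookCurrent();
const result = hook.sdkInit({ baseURL: "https://example.invalid", client: userClient });

// Logs "default fetcher": the user's fetcher (and its dispatcher/timeouts) never runs.
result.client.request("/example").then((body) => console.log(body));
```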
Expected Behavior
The SplitPdfHook should respect the user-provided httpClient configuration in one of the following ways:
- Passing the custom client through to HTTPClientExtension
- Using the custom client's fetcher for split PDF requests
- Exposing timeout configuration options that apply to split PDF operations
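A minimal sketch of the first two options, using the same kind of hypothetical stand-in types (not the real SDK classes). It assumes `HTTPClientExtension` can accept a fetcher in its constructor and that the hook can read the incoming client's fetcher; in the real SDK the fetcher may not be publicly accessible, so an actual fix might instead pass the whole client through and delegate to it.

```typescript
// Hypothetical stand-ins; not the real unstructured-client classes.
type Fetcher = (url: string) => Promise<string>;

class HTTPClient {
  constructor(readonly fetcher: Fetcher = async () => "default fetcher") {}
  request(url: string): Promise<string> {
    return this.fetcher(url);
  }
}

// Assumption: the extension forwards an optional fetcher to HTTPClient.
// Passing undefined falls back to HTTPClient's default fetcher.
class HTTPClientExtension extends HTTPClient {
  constructor(fetcher?: Fetcher) {
    super(fetcher);
  }
}

interface SDKInitOptions {
  baseURL: string;
  client: HTTPClient;
}

class SplitPdfHookProposed {
  client?: HTTPClient;
  sdkInit(opts: SDKInitOptions): SDKInitOptions {
    const { baseURL, client } = opts;
    // Reuse the caller's fetcher so its dispatcher and timeouts also apply
    // to the split-page requests.
    const split = new HTTPClientExtension(client.fetcher);
    this.client = split;
    return { baseURL, client: split };
  }
}

const userClient = new HTTPClient(async () => "custom fetcher");
const hook = new SplitPdfHookProposed();
const { client } = hook.sdkInit({ baseURL: "https://example.invalid", client: userClient });

// Logs "custom fetcher": split requests now go through the user's fetcher.
client.request("/example").then((body) => console.log(body));
```

Delegating to the caller's fetcher, rather than copying individual timeout values, preserves whatever the user built into it (a custom dispatcher, proxy settings, and so on). The third option, explicit timeout settings for split requests, would still help callers who do not supply a custom client.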
Reproduction
```ts
import { Agent, fetch as undiciFetch } from 'undici';
import { UnstructuredClient } from 'unstructured-client';
import { HTTPClient } from 'unstructured-client/lib/http';

const agent = new Agent({
  headersTimeout: 20 * 60 * 1000, // 20 minutes
  bodyTimeout: 20 * 60 * 1000,
});

// Route every request made through this client via the long-timeout agent
const customFetch = (input, init) => undiciFetch(input, { ...init, dispatcher: agent });
const httpClient = new HTTPClient({ fetcher: customFetch });

const client = new UnstructuredClient({
  security: { apiKeyAuth: 'xxx' },
  httpClient, // This is ignored when splitPdfPage=true
  timeoutMs: 20 * 60 * 1000,
});

// A large PDF with splitPdfPage=true will still time out at 5 minutes despite this config
```
Workaround
Currently, the only option is to set a global undici dispatcher via setGlobalDispatcher(), which affects every fetch call in the process.
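For reference, the global-dispatcher workaround looks roughly like this. It uses undici's `Agent` and `setGlobalDispatcher` (the same package as the reproduction), and the 20-minute values are simply the ones from the reproduction. Node.js's built-in `fetch` is expected to pick up this dispatcher as well, which is exactly why the setting leaks into unrelated requests.

```typescript
import { Agent, getGlobalDispatcher, setGlobalDispatcher } from "undici";

const TWENTY_MINUTES_MS = 20 * 60 * 1000;

const agent = new Agent({
  headersTimeout: TWENTY_MINUTES_MS, // time allowed to receive response headers
  bodyTimeout: TWENTY_MINUTES_MS, // max idle time between body chunks
});

// Process-wide side effect: every fetch that uses the global dispatcher,
// not just unstructured-client's split PDF requests, now gets these timeouts.
setGlobalDispatcher(agent);
```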
Environment
- unstructured-client: 0.25.1
- Node.js: 20.x