feat: Parameter to send custom page range when splitting pdf (#101)
To match the python feature:
Unstructured-IO/unstructured-python-client#125
# New parameter
Add a client-side param called `splitPdfPageRange` which takes a list of
two integers, `[start, end]`. If `splitPdfPage` is `true` and a range is
set, slice the doc from `start` up to and including `end`. Only this
page range will be sent to the API. The subset of pages is still split
up as needed. If `[start, end]` is out of bounds, throw an error to the
user.
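The range rules described above can be sketched as follows. This is a minimal illustration, not the SDK's actual implementation: `validatePageRange` and `pagesInRange` are hypothetical names, and the sketch assumes pages are 1-indexed with an inclusive `end`.

```typescript
// Hypothetical helper (not part of the SDK): checks a 1-indexed, inclusive
// [start, end] page range against the number of pages in the document.
function validatePageRange(range: number[], pageCount: number): [number, number] {
  if (range.length !== 2) {
    throw new Error(`Page range must be [start, end], got ${range.length} value(s).`);
  }
  const [start, end] = range;
  if (!Number.isInteger(start) || !Number.isInteger(end)) {
    throw new Error("Page range values must be integers.");
  }
  if (end < start) {
    throw new Error(`Page range end (${end}) is before start (${start}).`);
  }
  if (start < 1 || end > pageCount) {
    throw new Error(
      `Page range [${start}, ${end}] is out of bounds for a ${pageCount}-page document.`
    );
  }
  return [start, end];
}

// Hypothetical helper: expands a valid range into the page numbers to send.
function pagesInRange(range: number[], pageCount: number): number[] {
  const [start, end] = validatePageRange(range, pageCount);
  return Array.from({ length: end - start + 1 }, (_, i) => start + i);
}
```

For a document of at least 8 pages, `pagesInRange([4, 8], pageCount)` selects pages 4 through 8, and that subset is what would then be split and sent.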
# Testing
Check out this branch and set up a request to your local API:
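```
import * as fs from "fs";
// UnstructuredClient, Strategy, and PartitionResponse come from the SDK
// built from this branch; `key` is a placeholder for your API key.

const client = new UnstructuredClient({
    serverURL: "http://localhost:8000",
    security: {
        apiKeyAuth: key,
    },
});

const filename = "layout-parser-paper.pdf";
const data = fs.readFileSync(filename);

client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        strategy: Strategy.Fast,
        splitPdfPage: true,
        // Only pages 4 through 8 (inclusive) are sent to the API.
        splitPdfPageRange: [4, 8],
    }
}).then((res: PartitionResponse) => {
    if (res.statusCode === 200) {
        console.log(res.elements);
    }
}).catch((e) => {
    // API errors carry a status code and body; anything else is logged as-is.
    if (e.statusCode) {
        console.log(e.statusCode);
        console.log(e.body);
    } else {
        console.log(e);
    }
});
```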
Test out various page ranges and confirm that the returned elements are
within the range. Invalid ranges should throw a useful Error (pages are
out of bounds, or end_page < start_page).
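One way to run the in-range check is a small helper like the one below. It is only a sketch: `elementsOutsideRange` and `ElementLike` are hypothetical names, and it assumes each returned element reports its page as `metadata.page_number`, numbered relative to the original document.

```typescript
// Assumed element shape: only the field this check needs.
interface ElementLike {
  metadata?: { page_number?: number };
}

// Hypothetical helper: returns the elements whose page number falls outside
// the inclusive [start, end] range. Elements with no page number are skipped.
function elementsOutsideRange(
  elements: ElementLike[],
  range: [number, number]
): ElementLike[] {
  const [start, end] = range;
  return elements.filter((el) => {
    const page = el.metadata?.page_number;
    return page !== undefined && (page < start || page > end);
  });
}
```

Calling `elementsOutsideRange(res.elements ?? [], [4, 8])` on a successful response should return an empty array if the range was honored.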
"description": "When `split_pdf_page` is set to `True`, this parameter selects a subset of the pdf to send to the API. The parameter is a list of 2 integers within the range [1, length_of_pdf]. An Error is thrown if the given range is invalid. Ignored on backend.",
src/sdk/models/shared/partitionparameters.ts (9 additions & 0 deletions)
@@ -148,6 +148,10 @@ export type PartitionParameters = {
      * Should the pdf file be split at client. Ignored on backend.
      */
     splitPdfPage?: boolean|undefined;
+    /**
+     * When `split_pdf_page` is set to `True`, this parameter selects a subset of the pdf to send to the API. The parameter is a list of 2 integers within the range [1, length_of_pdf]. An Error is thrown if the given range is invalid. Ignored on backend.
+     */
+    splitPdfPageRange?: Array<number>|undefined;
     /**
      * When PDF is split into pages before sending it into the API, providing this information will allow the page number to be assigned correctly. Introduced in 1.0.27.