File: articles/ai-services/document-intelligence/concept-add-on-capabilities.md (44 additions, 1 deletion)
@@ -48,6 +48,11 @@ Document Intelligence supports more sophisticated and modular analysis capabilit
* [`languages`](#language-detection)

Starting with the `2024-07-31-preview` release, the Read model supports searchable PDF output:

* [Searchable PDF](#searchable-pdf)

:::moniker-end

:::moniker range="doc-intel-4.0.0"
@@ -58,7 +63,7 @@ Document Intelligence supports more sophisticated and modular analysis capabilit
>
> * Add-on capabilities are currently not supported for Microsoft Office file types.

Document Intelligence supports optional features that you can enable or disable depending on your document extraction scenario. The following add-on capabilities are available for `2023-10-31-preview` and later releases:

* [`keyValuePairs`](#key-value-pairs)
@@ -927,6 +932,44 @@ for lang_idx, lang in enumerate(result.languages):
::: moniker range="doc-intel-4.0.0"

## Searchable PDF

The searchable PDF capability converts an image-only PDF, such as a scanned document, into a PDF with embedded text. The text detected by OCR is overlaid on top of the page images, so the document's extracted content becomes fully searchable within the PDF.

> [!IMPORTANT]
>
> * Currently, only the Read OCR model, `prebuilt-read`, supports searchable PDF. Specify `prebuilt-read` as the `modelId`; other models return an error in this preview version.
> * Searchable PDF is included with the `2024-07-31-preview` `prebuilt-read` model at no additional usage cost for general PDF consumption.

### Use searchable PDF

To use searchable PDF, make a `POST` request to the `Analyze` operation and set the `output` query parameter to `pdf`:

```http
POST /documentModels/prebuilt-read:analyze?output=pdf
{...}

202 Accepted
Operation-Location: /documentModels/prebuilt-read/analyzeResults/{resultId}
```
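The request above is abbreviated. As a rough sketch of how a client might assemble the full Analyze URL and pick the result ID out of the `Operation-Location` header, the Python below assumes a resource endpoint such as `https://contoso.example.com`, a `/documentintelligence` path prefix, and an `api-version` of `2024-07-31-preview`. None of these appear in the request above, so check them against the REST API reference for your release. The helper names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit

# ASSUMPTION: the path prefix and api-version are not given in the request above.
API_PREFIX = "/documentintelligence"
API_VERSION = "2024-07-31-preview"


def build_analyze_url(endpoint: str, model_id: str = "prebuilt-read") -> str:
    """Build an Analyze URL that asks for searchable PDF output (output=pdf)."""
    if model_id != "prebuilt-read":
        # In this preview, only prebuilt-read supports searchable PDF output.
        raise ValueError("searchable PDF output requires model_id='prebuilt-read'")
    query = urlencode({"api-version": API_VERSION, "output": "pdf"})
    return f"{endpoint.rstrip('/')}{API_PREFIX}/documentModels/{model_id}:analyze?{query}"


def result_id_from_operation_location(operation_location: str) -> str:
    """Return {resultId} from the Operation-Location header of the 202 response."""
    path = urlsplit(operation_location).path.rstrip("/")
    _, sep, result_id = path.rpartition("/analyzeResults/")
    if not sep or not result_id:
        raise ValueError(f"unexpected Operation-Location: {operation_location!r}")
    return result_id
```

Rejecting any other `model_id` up front mirrors the note that other models return an error for searchable PDF output in this preview.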
Poll the `Analyze` operation by sending `GET` requests to the result URL until the operation completes.

After the operation succeeds, retrieve the searchable PDF from the `/pdf` endpoint of the analyze result. The response body is the PDF file itself, returned as `application/pdf` rather than as Base64-encoded JSON, so you can download it directly.

```http
// Monitor the operation until it completes.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}
200 OK
{...}

// After the operation succeeds, download the searchable PDF.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}/pdf
200 OK
Content-Type: application/pdf
```
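The polling and download steps can be sketched in Python as follows. This is a minimal illustration, not an official client. It assumes key-based authentication through the `Ocp-Apim-Subscription-Key` header, that an in-progress operation reports `notStarted` or `running` and a finished one reports `succeeded` in the JSON `status` field, and that the `/pdf` URL accepts the same query string as the result URL. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable
from urllib.parse import urlsplit, urlunsplit

# ASSUMPTION: "notStarted" and "running" are the only in-progress states.
IN_PROGRESS = ("notStarted", "running")


def wait_for_result(get_json: Callable[[str], dict], result_url: str,
                    interval: float = 1.0, timeout: float = 120.0) -> dict:
    """Poll result_url until the analyze operation finishes; return the final JSON."""
    deadline = time.monotonic() + timeout
    while True:
        body = get_json(result_url)
        status = body.get("status")
        if status not in IN_PROGRESS:
            if status != "succeeded":
                raise RuntimeError(f"analyze operation ended with status {status!r}")
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("analyze operation did not finish before the timeout")
        time.sleep(interval)


def pdf_url(result_url: str) -> str:
    """Append '/pdf' to the result path, keeping the query string (e.g. api-version)."""
    parts = urlsplit(result_url)
    return urlunsplit(parts._replace(path=parts.path.rstrip("/") + "/pdf"))


def _get(url: str, key: str) -> bytes:
    # ASSUMPTION: key-based auth via the Ocp-Apim-Subscription-Key header.
    request = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(request) as response:
        return response.read()


def http_get_json(url: str, key: str) -> dict:
    """GET a URL and parse the JSON response body."""
    return json.loads(_get(url, key))


def download_searchable_pdf(result_url: str, key: str) -> bytes:
    """Wait for the operation to succeed, then download the searchable PDF bytes."""
    wait_for_result(lambda url: http_get_json(url, key), result_url)
    data = _get(pdf_url(result_url), key)
    if not data.startswith(b"%PDF-"):
        # The /pdf response should be raw PDF bytes, not JSON.
        raise ValueError("response body is not a PDF file")
    return data
```

Because `wait_for_result` takes the HTTP call as a parameter, you can pass a stub instead of a live request when testing. To save the output, pass the URL from the `Operation-Location` header as `result_url` and write the returned bytes to a file. The `%PDF-` signature check guards against saving an error payload with a `.pdf` extension.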
## Key-value Pairs

In earlier API versions, the prebuilt-document model extracted key-value pairs from forms and documents. With the addition of the `keyValuePairs` feature to prebuilt-layout, the layout model now produces the same results.