Commit 6b5ce2d

[DI] Add tests and samples for new API (Azure#37197)

Author: Yalin Li
1 parent d85e6c3, commit 6b5ce2d

31 files changed (+1638 -129 lines)

sdk/documentintelligence/azure-ai-documentintelligence/CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
 # Release History
 
-## 1.0.0b4 (Unreleased)
+## 1.0.0b4 (2024-09-05)
 
 ### Features Added
 - Added support for the Analyze Batch Documents API:
@@ -22,6 +22,7 @@
 - Added property `allow_overwrite` to model `BuildDocumentClassifierRequest`.
 - Added properties `allow_overwrite` and `max_training_hours` to model `BuildDocumentModelRequest`.
 - Added properties `classifier_id`, `split` and `doc_types` to model `ComposeDocumentModelRequest`.
+- Added support for getting `operation_id` via `details` property in the new return types `AnalyzeDocumentLROPoller` and `AsyncAnalyzeDocumentLROPoller` in operation `begin_analyze_document()`.
 
 ### Breaking Changes
 - Removed support for extracting lists from analyzed documents:
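
The new `details` property hands callers the operation ID that the README samples later pass as `result_id`. This commit does not show how the SDK derives that ID. The following minimal sketch assumes it is the last path segment of the service's `Operation-Location` URL; the helper `parse_operation_id` and the example URL are hypothetical and are not part of the SDK.

```python
from urllib.parse import urlparse


def parse_operation_id(operation_location: str) -> str:
    """Return the final path segment of an Operation-Location URL.

    Hypothetical helper: assumes the result ID is the last segment of the
    path, e.g. ".../analyzeResults/<id>?api-version=...".
    """
    path = urlparse(operation_location).path  # scheme, host and query removed
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    if not segment:
        raise ValueError(f"no operation id in {operation_location!r}")
    return segment


# Hypothetical URL, for illustration only.
example_url = (
    "https://example.cognitiveservices.azure.com/documentintelligence/"
    "documentModels/prebuilt-layout/analyzeResults/abc123?api-version=preview"
)
print(parse_operation_id(example_url))  # abc123
```

Parsing with `urlparse` rather than string slicing keeps the query string (such as `api-version`) out of the returned ID.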

sdk/documentintelligence/azure-ai-documentintelligence/README.md

Lines changed: 78 additions & 1 deletion
@@ -18,7 +18,7 @@ Azure AI Document Intelligence ([previously known as Form Recognizer][service-re
 
 ## _Disclaimer_
 
-_The API version 2024-02-29-preview is currently only available in some Azure regions, the available regions can be found from [here][python-di-available-regions]._
+_The latest service API is currently only available in some Azure regions, the available regions can be found from [here][python-di-available-regions]._
 
 ## Getting started
 
@@ -195,6 +195,8 @@ Sample code snippets are provided to illustrate using long-running operations [b
 The following section provides several code snippets covering some of the most common Document Intelligence tasks, including:
 
 * [Extract Layout](#extract-layout "Extract Layout")
+* [Extract Figures from Documents](#extract-figures-from-documents "Extract Figures from Documents")
+* [Analyze Documents Result in PDF](#analyze-documents-result-in-pdf "Analyze Documents Result in PDF")
 * [Using the General Document Model](#using-the-general-document-model "Using the General Document Model")
 * [Using Prebuilt Models](#using-prebuilt-models "Using Prebuilt Models")
 * [Build a Custom Model](#build-a-custom-model "Build a custom model")
@@ -303,6 +305,81 @@ print("----------------------------------------")
 
 <!-- END SNIPPET -->
 
+### Extract Figures from Documents
+
+Extract figures from the document as cropped images.
+
+<!-- SNIPPET:sample_analyze_result_figures.analyze_result_figures -->
+
+```python
+from azure.core.credentials import AzureKeyCredential
+from azure.ai.documentintelligence import DocumentIntelligenceClient
+from azure.ai.documentintelligence.models import AnalyzeOutputOption, AnalyzeResult
+
+endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
+key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
+
+document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
+
+with open(path_to_sample_documents, "rb") as f:
+    poller = document_intelligence_client.begin_analyze_document(
+        "prebuilt-layout",
+        analyze_request=f,
+        output=[AnalyzeOutputOption.FIGURES],
+        content_type="application/octet-stream",
+    )
+result: AnalyzeResult = poller.result()
+operation_id = poller.details["operation_id"]
+
+if result.figures:
+    for figure in result.figures:
+        if figure.id:
+            response = document_intelligence_client.get_analyze_result_figure(
+                model_id=result.model_id, result_id=operation_id, figure_id=figure.id
+            )
+            with open(f"{figure.id}.png", "wb") as writer:
+                writer.writelines(response)
+else:
+    print("No figures found.")
+```
+
+<!-- END SNIPPET -->
+
+### Analyze Documents Result in PDF
+
+Convert an analog PDF into a PDF with embedded text. Such text can enable text search within the PDF or allow the PDF to be used in LLM chat scenarios.
+
+_Note: For now, this feature is only supported by `prebuilt-read`. All other models will return error._
+
+<!-- SNIPPET:sample_analyze_result_pdf.analyze_result_pdf -->
+
+```python
+from azure.core.credentials import AzureKeyCredential
+from azure.ai.documentintelligence import DocumentIntelligenceClient
+from azure.ai.documentintelligence.models import AnalyzeOutputOption, AnalyzeResult
+
+endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
+key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
+
+document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
+
+with open(path_to_sample_documents, "rb") as f:
+    poller = document_intelligence_client.begin_analyze_document(
+        "prebuilt-read",
+        analyze_request=f,
+        output=[AnalyzeOutputOption.PDF],
+        content_type="application/octet-stream",
+    )
+result: AnalyzeResult = poller.result()
+operation_id = poller.details["operation_id"]
+
+response = document_intelligence_client.get_analyze_result_pdf(model_id=result.model_id, result_id=operation_id)
+with open("analyze_result.pdf", "wb") as writer:
+    writer.writelines(response)
+```
+
+<!-- END SNIPPET -->
+
 ### Using the General Document Model
 
 Analyze key-value pairs, tables, styles, and selection marks from documents using the general document model provided by the Document Intelligence service.
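
As excerpted, both new README snippets use `os` and `path_to_sample_documents` without defining them; presumably the full sample files named in the SNIPPET markers supply both. The figures sample skips entries without an `id`, then writes each streamed response to `<figure id>.png` with `writelines`. The sketch below isolates that loop so it runs without a service connection. `Figure`, `save_figures` and `fake_fetch` are hypothetical stand-ins, and `fetch` takes the place of a call such as `get_analyze_result_figure()`.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List, Optional


@dataclass
class Figure:
    """Hypothetical stand-in for an entry in an analyze result's `figures`."""

    id: Optional[str]


def save_figures(
    figures: Optional[List[Figure]],
    fetch: Callable[[str], Iterable[bytes]],
    out_dir: Path,
) -> List[Path]:
    """Write each figure's streamed bytes to <out_dir>/<figure id>.png.

    `fetch` plays the role of a call such as get_analyze_result_figure():
    it takes a figure id and returns an iterable of byte chunks.
    """
    saved: List[Path] = []
    for figure in figures or []:
        if not figure.id:  # the README sample also skips figures without an id
            continue
        target = out_dir / f"{figure.id}.png"
        with open(target, "wb") as writer:
            # In binary mode, writelines() writes each bytes chunk in order.
            writer.writelines(fetch(figure.id))
        saved.append(target)
    return saved


def fake_fetch(figure_id: str) -> Iterable[bytes]:
    """Hypothetical fetcher that yields two chunks instead of calling a service."""
    yield b"\x89PNG"
    yield figure_id.encode()


with tempfile.TemporaryDirectory() as tmp:
    paths = save_figures([Figure("fig-1"), Figure(None)], fake_fetch, Path(tmp))
    saved_names = [p.name for p in paths]
    first_bytes = paths[0].read_bytes()

print(saved_names)  # ['fig-1.png']
print(first_bytes)  # b'\x89PNGfig-1'
```

Injecting `fetch` keeps the file-writing logic testable; the same loop body would apply to the PDF sample with a single fixed file name.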

sdk/documentintelligence/azure-ai-documentintelligence/assets.json

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
   "AssetsRepo": "Azure/azure-sdk-assets",
   "AssetsRepoPrefixPath": "python",
   "TagPrefix": "python/documentintelligence/azure-ai-documentintelligence",
-  "Tag": "python/documentintelligence/azure-ai-documentintelligence_d5576d9de8"
+  "Tag": "python/documentintelligence/azure-ai-documentintelligence_c952134951"
 }

sdk/documentintelligence/azure-ai-documentintelligence/azure/ai/documentintelligence/__init__.py

Lines changed: 6 additions & 8 deletions
@@ -6,23 +6,21 @@
 # Changes may cause incorrect behavior and will be lost if the code is regenerated.
 # --------------------------------------------------------------------------
 
-from ._client import DocumentIntelligenceClient
-from ._client import DocumentIntelligenceAdministrationClient
+from ._patch import DocumentIntelligenceClient
+from ._patch import DocumentIntelligenceAdministrationClient
 from ._version import VERSION
 
 __version__ = VERSION
 
-try:
-    from ._patch import __all__ as _patch_all
-    from ._patch import * # pylint: disable=unused-wildcard-import
-except ImportError:
-    _patch_all = []
+
+from ._patch import AnalyzeDocumentLROPoller
 from ._patch import patch_sdk as _patch_sdk
 
 __all__ = [
+    "AnalyzeDocumentLROPoller",
     "DocumentIntelligenceClient",
     "DocumentIntelligenceAdministrationClient",
 ]
-__all__.extend([p for p in _patch_all if p not in __all__])
+
 
 _patch_sdk()
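
The removed lines merged whatever names `_patch.py` listed in its own `__all__` into the package exports; the new code imports `AnalyzeDocumentLROPoller` and the two clients from `_patch` explicitly and lists them statically. The minimal sketch below reproduces the old merge step with a hypothetical helper, `merge_exports`, to show its behaviour: names already exported are skipped, and the membership test runs against the list as it stood before `extend`. The `_patch.__all__` contents used in the example are invented.

```python
from typing import List


def merge_exports(generated: List[str], patch_all: List[str]) -> List[str]:
    """Mimic the removed line
    `__all__.extend([p for p in _patch_all if p not in __all__])`.

    Hypothetical helper for illustration; returns a new list instead of
    mutating a module-level __all__.
    """
    merged = list(generated)
    # The list comprehension is fully built before extend() runs, so each
    # membership test sees only the original names, as in the removed line.
    merged.extend([p for p in patch_all if p not in merged])
    return merged


# Explicit, static export list as it reads after this commit.
EXPLICIT_ALL = [
    "AnalyzeDocumentLROPoller",
    "DocumentIntelligenceClient",
    "DocumentIntelligenceAdministrationClient",
]

old_style = merge_exports(
    ["DocumentIntelligenceClient", "DocumentIntelligenceAdministrationClient"],
    ["DocumentIntelligenceClient", "AnalyzeDocumentLROPoller"],  # hypothetical _patch.__all__
)
print(old_style)
# ['DocumentIntelligenceClient', 'DocumentIntelligenceAdministrationClient', 'AnalyzeDocumentLROPoller']
print(sorted(old_style) == sorted(EXPLICIT_ALL))  # True
```

With the static list, the exported names no longer depend on what `_patch.py` happens to declare at import time.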

sdk/documentintelligence/azure-ai-documentintelligence/azure/ai/documentintelligence/_operations/__init__.py

Lines changed: 4 additions & 5 deletions
@@ -6,16 +6,15 @@
 # Changes may cause incorrect behavior and will be lost if the code is regenerated.
 # --------------------------------------------------------------------------
 
-from ._operations import DocumentIntelligenceClientOperationsMixin
-from ._operations import DocumentIntelligenceAdministrationClientOperationsMixin
+from ._patch import DocumentIntelligenceClientOperationsMixin
+from ._patch import DocumentIntelligenceAdministrationClientOperationsMixin
+
 
-from ._patch import __all__ as _patch_all
-from ._patch import * # pylint: disable=unused-wildcard-import
 from ._patch import patch_sdk as _patch_sdk
 
 __all__ = [
     "DocumentIntelligenceClientOperationsMixin",
     "DocumentIntelligenceAdministrationClientOperationsMixin",
 ]
-__all__.extend([p for p in _patch_all if p not in __all__])
+
 _patch_sdk()
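
Here the operation mixins are now imported from `_patch` instead of `_operations`, which presumably lets a hand-written subclass in `_patch.py` stand in for the generated one. That would fit the changelog's new `AnalyzeDocumentLROPoller` return type for `begin_analyze_document()`. The following sketch of such an override pattern is hypothetical: every class name and the `operation_id` format are invented. It only shows how a patched mixin can reuse the generated method through `super()` and return a richer poller type with a `details` property.

```python
from typing import Dict


class GenericPoller:
    """Hypothetical stand-in for the poller type returned by generated code."""

    def __init__(self, operation_id: str) -> None:
        self.operation_id = operation_id


class DetailedPoller(GenericPoller):
    """Hypothetical customized poller exposing a `details` mapping."""

    @property
    def details(self) -> Dict[str, str]:
        return {"operation_id": self.operation_id}


class GeneratedOperationsMixin:
    """Stand-in for the mixin emitted into _operations/_operations.py."""

    def begin_analyze_document(self, model_id: str) -> GenericPoller:
        # Invented ID format, for illustration only.
        return GenericPoller(operation_id=f"{model_id}-op")


class PatchedOperationsMixin(GeneratedOperationsMixin):
    """Stand-in for a hand-written override in _operations/_patch.py."""

    def begin_analyze_document(self, model_id: str) -> DetailedPoller:
        generic = super().begin_analyze_document(model_id)
        # Wrap the generated result in the richer return type.
        return DetailedPoller(operation_id=generic.operation_id)


class Client(PatchedOperationsMixin):
    """The client inherits whichever mixin the package __init__ exports."""


poller = Client().begin_analyze_document("prebuilt-layout")
print(poller.details["operation_id"])  # prebuilt-layout-op
```

Because `DetailedPoller` subclasses `GenericPoller`, code written against the generated return type keeps working while new callers can read `details`.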
