Commit 09a5d86

Merge branch 'MicrosoftDocs:main' into master
2 parents fb76ec2 + 3e58998 commit 09a5d86

22 files changed: +333 -179 lines changed

articles/ai-services/document-intelligence/concept-custom-generative.md

Lines changed: 54 additions & 25 deletions
@@ -1,87 +1,116 @@
---
-title: Azure AI Document Intelligence (formerly Form Recognizer) custom generative field extraction
+title: Azure AI Document Intelligence (formerly Form Recognizer) custom generative document field extraction
titleSuffix: Azure AI services
description: Custom generative AI model extracts user-specified fields from documents across a wide variety of visual templates.
author: laujan
manager: nitinme
ms.service: azure-ai-document-intelligence
-ms.custom:
-  - ignite-2023
ms.topic: overview
-ms.date: 08/07/2024
+ms.date: 08/09/2024
ms.author: lajanuar
monikerRange: '>=doc-intel-4.0.0'
---

-# Document Intelligence custom generative model
+# Document field extraction - custom generative AI model

-[!INCLUDE [preview-version-notice](includes/preview-notice.md)]
+> [!IMPORTANT]
+>
+> * Document Intelligence public preview releases provide early access to features that are in active development. Features, approaches, and processes may change, prior to General Availability (GA), based on user feedback.
+> * The public preview version of Document Intelligence client libraries defaults to REST API version [**2024-07-31-preview**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) and is currently only available in the following Azure regions:
+> * **East US**
+> * **West US2**
+> * **West Europe**
+> * **North Central US**
+>
+> * **The new custom generative model in AI Studio is only available in the North Central US region.**

-The custom generative model combines the power of document understanding with Large Language Models (LLMs) and the rigor and schema from custom extraction capabilities. Custom generative extraction enables you to easily automate data extraction workflows for any type of document, with minimal labeling and greater accuracy and speed.
+The document field extraction (custom generative AI) model utilizes generative AI to extract user-specified fields from documents across a wide variety of visual templates. The custom generative AI model combines the power of document understanding with Large Language Models (LLMs) and the rigor and schema from custom extraction capabilities to create a model with high accuracy in minutes. With this generative model type, you can start with a single document and go through the schema addition and model creation process with minimal labeling. The custom generative model allows developers and enterprises to easily automate data extraction workflows with greater accuracy and speed for any type of document. The custom generative AI model excels in extracting simple fields from documents without labeled samples. However, providing a few labeled samples improves the extraction accuracy for complex fields and user-defined fields like tables. You can use the [REST API](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) or client libraries to submit a document for analysis with a model build and use the custom generative process.

-## Custom generative model key features
+## Custom generative AI model benefits

* **Automatic labeling**. Utilize large language models (LLM) and extract user-specified fields for various document types and visual templates.
+
* **Improved Generalization**. Extract data from unstructured data and varying document templates with higher accuracy.
-* **Grounded results**. Localize the data extracted in documents and ensure the response is generated from the content and enables human review workflows.
-* **High confidence scores**. Use confidence scores and quickly filter high quality extracted data for downstream processing and lower manual review time.
+
+* **Grounded results**. Localize the data extracted in the documents. Custom generative models ground the results where applicable, ensuring that the response is generated from the content and enabling human review workflows.
+
+* **Confidence scores**. Use confidence scores for each extracted field to filter high-quality extracted data, maximize straight-through processing of documents, and minimize human review costs.

### Common use cases

-* **Contract Lifecycle Management**. Build a generative model and extract the fields, clauses, and obligations from a wide array of contract types.
-* **Loan & Mortgage Applications**. Automation of loan and mortgage application process enables banks, lenders, and government entities to quickly process loan and mortgage application.
-* **Financial Services**. Analyze complex documents like financial reports and asset management reports.
-* **Expense management**. The custom generative model can extract expenses, receipts, and invoices with varying formats and templates.
+* **Contract Lifecycle Management**. Build a generative model and extract the fields, clauses, and obligations from a wide array of contract types.
+
+* **Loan & Mortgage Applications**. Automation of the loan and mortgage application process enables banks, lenders, and government entities to quickly process loan and mortgage applications.
+
+* **Financial Services**. With the custom generative AI model, analyze complex documents like financial reports and asset management reports.
+
+* **Expense management**. Receipts and invoices from various retailers and businesses need to be parsed to validate the expenses. The custom generative AI model can extract expenses across different formats and documents with varying templates.
+
+### Managing the training dataset
+
+With our other custom models, you need to maintain the dataset, add new samples, and train the model for accuracy improvements. With the custom generative AI model, the labeled documents are transformed, encrypted, and stored as part of the model. This process ensures that the model can continually use the labeled samples to improve the extraction quality. As with other custom models, models are stored in Microsoft storage, and you can delete them anytime.
+
+The Document Intelligence service does manage your datasets, but your documents are stored encrypted and only used to improve the model results for your specific model. A service-managed key can be used to encrypt your data, or it can optionally be encrypted with a customer-managed key. The change in management and lifecycle of the dataset only applies to custom generative models.

## Model capabilities

-The Custom generative model currently supports dynamic table with the `2024-07-31-preview` and the following fields:
+The field extraction custom generative model currently supports dynamic tables with the `2024-07-31-preview` release and the following fields:

| Form fields | Selection marks | Tabular fields | Signature | Region labeling | Overlapping fields |
|:--:|:--:|:--:|:--:|:--:|:--:|
|Supported| Supported |Supported| Unsupported |Unsupported |Supported|

## Build mode

-The `build custom model` operation supports custom _template_, _neural_ and _generative_ models, _see_[Custom model build mode](concept-custom.md#build-mode):
+The `build custom model` operation supports custom **template**, **neural**, and **generative** models, _see_ [Custom model build mode](concept-custom.md#build-mode). Here are the differences in the model types:
+
+* **Custom generative AI models** can process complex documents with various formats, varied templates, and unstructured data.
+
+* **Custom neural models** support complex document processing and also support more variance in pages for structured and semi-structured documents.

-* **Custom generative models** can process complex documents in various formats, templates, and unstructured data.
-* **Custom neural models** support complex document processing and also support more variance in page for structured and semi-structured documents.
* **Custom template models** rely on consistent visual templates, such as questionnaires or applications, to extract the labeled data.

## Languages and locale support

-The custom generative model `2024-07-31-preview` version supports the **en-us** locale. For more information on language support, *see* [Language support - custom models](language-support-custom.md).
+The field extraction custom generative model `2024-07-31-preview` version supports the **en-us** locale. For more information on language support, _see_ [Language support - custom models](language-support-custom.md).

## Region support

-The custom generative model `2024-07-31-preview` version is only available in `North Central US`.
+The field extraction custom generative model `2024-07-31-preview` version is only available in `North Central US`.

-## Input requirements
+## Input requirements

[!INCLUDE [input requirements](./includes/input-requirements.md)]

## Best practices

* **Representative data**. Use representative documents that target actual data distribution, and train a high-quality custom generative model. For example, if the target document includes partially filled tabular fields, add training documents that consist of partially filled tables. Or if field is named date, values for this field should be a date as random strings can affect model performance.
+
* **Field naming**. Choose a precise field name that represents the field values. For example, for a field value containing the Transaction Date, consider naming the field _TransactionDate_ instead of `Date1`.
-* **Field Description**. Provide more contextual information in description to help clarify the field that needs to be extracted. Examples include location in the document, potential field labels it may be associated with, ways to differentiate with other terms that could be ambiguous.
-* **Variation**. Custom generative models can generalize across different document templates of the same document type. As a best practice, create a single model for all variations of a document type. Ideally, include a visual template for each type, especially for ones that
+
+* **Field Description**. Provide more contextual information in the description to help clarify the field that needs to be extracted. Examples include the location in the document, potential field labels it can be associated with, and ways to differentiate it from other terms that could be ambiguous.
+
+* **Variation**. Custom generative models can generalize across different document templates of the same document type. As a best practice, create a single model for all variations of a document type. Ideally, include a visual template for each type, especially for ones that

## Service guidance

* The Custom Generative preview model doesn't currently support fixed table and signature extraction.
+
* Inference on the same document could yield slightly different results across calls and is a known limitation of current `GPT` models.
+
* Confidence scores for each field might vary. We recommend testing with your representative data to establish the confidence thresholds for your scenario.
-* Grounding, especially for tabular fields, is challenging and might not be perfect in some cases.
+
+* Grounding, especially for tabular fields, is challenging and might not be perfect in some cases.
+
* Latency for large documents is high and a known limitation in preview.
+
* Composed models don't support custom generative extraction.

## Training a model

Custom generative models are available with the `2024-07-31-preview` version and later models.

-The `build operation` to train model supports the ```buildMode``` property, to train a custom generative model, set the ```buildMode``` to ```generative```.
+The `build` operation to train a model supports the `buildMode` property. To train a custom generative model, set the `buildMode` to `generative`.
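The article's own request example is truncated at this point in the diff view. For orientation only, here is a rough sketch of what a build request with `buildMode` set to `generative` could look like over the REST API. The endpoint path, API version, header, and payload field names are assumptions carried over from the existing custom model build operation, not content from this commit; confirm them against the 2024-07-31-preview REST reference linked above, and replace the endpoint, key, and container URL placeholders with your own values.

```bash
# Rough sketch (not the article's example): start a custom generative model build.
# DI_ENDPOINT and DI_KEY are placeholders for your Document Intelligence endpoint and key.
curl -i -X POST "${DI_ENDPOINT}/documentintelligence/documentModels:build?api-version=2024-07-31-preview" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: ${DI_KEY}" \
  -d '{
        "modelId": "my-generative-model",
        "buildMode": "generative",
        "azureBlobSource": {
          "containerUrl": "https://<storage-account>.blob.core.windows.net/<container>?<sas-token>"
        }
      }'
```
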
articles/ai-services/document-intelligence/includes/preview-notice.md

Lines changed: 1 addition & 2 deletions
@@ -13,8 +13,7 @@ ms.date: 07/31/2024

> [!IMPORTANT]
>
-> * Document Intelligence public preview releases provide early access to features that are in active development.
-> * Features, approaches, and processes may change, prior to General Availability (GA), based on user feedback.
+> * Document Intelligence public preview releases provide early access to features that are in active development. Features, approaches, and processes may change, prior to General Availability (GA), based on user feedback.
> * The public preview version of Document Intelligence client libraries default to REST API version [**2024-07-31-preview**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true).
> * Public preview version [**2024-07-31-preview**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) is currently only available in the following Azure regions. Note that the custom generative (document field extraction) model in AI Studio is only available in North Central US region:
> * **East US**

articles/azure-monitor/agents/azure-monitor-agent-manage.md

Lines changed: 66 additions & 21 deletions
@@ -373,8 +373,7 @@ The AgentSettings DCR currently supports configuring the following parameters:

| Parameter | Description | Valid values |
| --------- | ----------- | ----------- |
-
-| `DiscQuotaInMb` | Defines the amount of disk space used by the Azure Monitor Agent log files and cache. | 1,000-50,000 (or 1-50 GB) |
+| `MaxDiskQuotaInMB` | Defines the amount of disk space used by the Azure Monitor Agent log files and cache. | 1000-50000 (in MB) |
| `TimeReceivedForForwardedEvents` | Changes WEF column in the Sentinel WEF table to use TimeReceived instead of TimeGenerated data | 0 or 1 |

### Setting up AgentSettings DCR
@@ -397,9 +396,9 @@ N/A

[Install AMA](#installation-options) on your VM.

-1. **Create a DCR via template deployment:**
+1. **Create a DCR:**

-The following example changes the maximum amount of disk space used by AMA cache to 5 GB.
+This example sets the maximum amount of disk space used by AMA cache to 5000 MB.

```json
{
@@ -431,24 +430,70 @@ N/A
}
```

-> [!NOTE]
-> You can use the Get DataCollectionRule API to get the DCR payload you created with this template.
-
-1. **Associate DCR with your machine:**
+1. **Associate the DCR with your machine:**

-This can be done with a template or by using the [Create API](/rest/api/monitor/data-collection-rule-associations/create) with the following details:
-
-* **AssociationName:** agentSettings
-* **ResourceUri:** Full ARM ID of the VM
-* **api-version:** 2023-03-11 (Old API version is also fine)
-* **Body:**
-```json
-{
-  "properties": {
-    "dataCollectionRuleId": "Full ARM ID for agent setting DCR"
-  }
-}
-```
+Use these ARM template and parameter files:
+
+**ARM template file**
+
+```json
+{
+  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
+  "contentVersion": "1.0.0.0",
+  "parameters": {
+    "vmName": {
+      "type": "string",
+      "metadata": {
+        "description": "The name of the virtual machine."
+      }
+    },
+    "associationName": {
+      "type": "string",
+      "metadata": {
+        "description": "The name of the association."
+      }
+    },
+    "dataCollectionRuleId": {
+      "type": "string",
+      "metadata": {
+        "description": "The resource ID of the data collection rule."
+      }
+    }
+  },
+  "resources": [
+    {
+      "type": "Microsoft.Insights/dataCollectionRuleAssociations",
+      "apiVersion": "2021-09-01-preview",
+      "scope": "[format('Microsoft.Compute/virtualMachines/{0}', parameters('vmName'))]",
+      "name": "[parameters('associationName')]",
+      "properties": {
+        "description": "Association of data collection rule. Deleting this association will break the data collection for this virtual machine.",
+        "dataCollectionRuleId": "[parameters('dataCollectionRuleId')]"
+      }
+    }
+  ]
+}
+```
+
+**Parameter file**
+
+```json
+{
+  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
+  "contentVersion": "1.0.0.0",
+  "parameters": {
+    "vmName": {
+      "value": "my-azure-vm"
+    },
+    "associationName": {
+      "value": "my-windows-vm-my-dcr"
+    },
+    "dataCollectionRuleId": {
+      "value": "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/my-resource-group/providers/microsoft.insights/datacollectionrules/my-dcr"
+    }
+  }
+}
+```

1. **Activate the settings:**

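For orientation, the ARM template and parameter file added in the hunk above could be deployed with the Azure CLI. This sketch is not part of the commit; the resource group and file names are placeholders (save the two JSON documents above under the names used here, or adjust the paths):

```bash
# Deploy the DCR association template with its parameter file.
# "my-resource-group", dcr-association.json, and dcr-association.parameters.json
# are placeholder names, not values from the commit.
az deployment group create \
  --resource-group my-resource-group \
  --template-file dcr-association.json \
  --parameters dcr-association.parameters.json
```
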
articles/azure-monitor/alerts/resource-manager-alerts-metric.md

Lines changed: 17 additions & 16 deletions
@@ -4911,22 +4911,23 @@ resource metricAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
-"parameters": {
-"appName": {
-"type": "string"
-},
-"pingURL": {
-"type": "string"
-},
-"pingText": {
-"type": "string",
-"defaultValue": ""
-},
-"actionGroupId": {
-"type": "string"
-},
-"location": {
-"type": "string"
+  "parameters": {
+    "appName": {
+      "type": "string"
+    },
+    "pingURL": {
+      "type": "string"
+    },
+    "pingText": {
+      "type": "string",
+      "defaultValue": ""
+    },
+    "actionGroupId": {
+      "type": "string"
+    },
+    "location": {
+      "type": "string"
+    }
}
},
"variables": {

articles/azure-monitor/app/opentelemetry-configuration.md

Lines changed: 2 additions & 6 deletions
@@ -15,16 +15,14 @@ This article covers configuration settings for the Azure Monitor OpenTelemetry d

## Connection string

-A connection string in Application Insights defines the target location for sending telemetry data, ensuring it reaches the appropriate resource for monitoring and analysis.
-
+A connection string in Application Insights defines the target location for sending telemetry data.
### [ASP.NET Core](#tab/aspnetcore)

Use one of the following three ways to configure the connection string:

-- Add `UseAzureMonitor()` to your application startup, in your `program.cs` class.
+- Add `UseAzureMonitor()` to your `program.cs` file:

```csharp
-// Create a new ASP.NET Core web application builder.
var builder = WebApplication.CreateBuilder(args);

// Add the OpenTelemetry telemetry service to the application.
@@ -33,10 +31,8 @@ Use one of the following three ways to configure the connection string:
options.ConnectionString = "<Your Connection String>";
});

-// Build the ASP.NET Core web application.
var app = builder.Build();

-// Start the ASP.NET Core web application.
app.Run();
```

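As a hedged aside on the same configuration step: besides setting the connection string in code with `UseAzureMonitor()`, the Azure Monitor OpenTelemetry distro also reads it from the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable, which is one of the other options the article refers to. The value below is a placeholder, not content from this commit:

```bash
# Set the connection string via environment variable instead of in code.
# The value is a placeholder; copy the real connection string from your
# Application Insights resource in the Azure portal.
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://<region>.in.applicationinsights.azure.com/"
dotnet run
```
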
articles/azure-monitor/best-practices-logs.md

Lines changed: 5 additions & 1 deletion
@@ -14,7 +14,11 @@ This article provides architectural best practices for Azure Monitor Logs. The g


## Reliability
-[Reliability](/azure/well-architected/resiliency/overview) refers to the ability of a system to recover from failures and continue to function. Instead of trying to prevent failures altogether in the cloud, the goal is to minimize the effects of a single failing component. Use the following information to minimize failure of your Log Analytics workspaces and to protect the data they collect.
+[Reliability](/azure/well-architected/resiliency/overview) refers to the ability of a system to recover from failures and continue to function. The goal is to minimize the effects of a single failing component. Use the following information to minimize failure of your Log Analytics workspaces and to protect the data they collect.
+
+This video provides an overview of reliability and resilience options available for Log Analytics workspaces:
+
+> [!VIDEO https://www.youtube.com/embed/CYspm1Yevx8?cc_load_policy=1&cc_lang_pref=auto]

[!INCLUDE [waf-logs-reliability](includes/waf-logs-reliability.md)]
