Changed file: `articles/search/cognitive-search-skill-document-intelligence-layout.md` (16 additions, 10 deletions)
---
title: Document Layout skill
titleSuffix: Azure AI Search
description: Analyze a document to extract regions of interest and their inter-relationships to produce a syntactical representation (markdown format) in an enrichment pipeline in Azure AI Search.
ms.service: azure-ai-search
ms.custom:
- references_regions
ms.topic: reference
ms.date: 11/19/2024
---
# Document Layout skill

The **Document Layout** skill analyzes a document to extract regions of interest and their interrelationships, and produces a syntactical representation of the document in Markdown format. This skill uses the [Document Intelligence layout model](/azure/ai-services/document-intelligence/concept-layout) provided in [Azure AI Document Intelligence](/azure/ai-services/document-intelligence/overview).

This article is the reference documentation for the Document Layout skill.

The **Document Layout** skill calls the [Document Intelligence Public preview version 2024-07-31-preview](/rest/api/aiservices/operation-groups?view=rest-aiservices-v4.0%20(2024-07-31-preview)&preserve-view=true). It's currently only available in the following Azure regions:

…
+ For PDF and TIFF files, up to 2,000 pages can be processed. With a free tier subscription, only the first two pages are processed.
+ The maximum file size for analyzing documents is 500 MB for the [Azure AI Document Intelligence paid (S0) tier](https://azure.microsoft.com/pricing/details/cognitive-services/) and 4 MB for the [Azure AI Document Intelligence free (F0) tier](https://azure.microsoft.com/pricing/details/cognitive-services/).
+ Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
+ If your PDFs are password-locked, you must remove the lock before submission.
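You can check the limits above before you upload files. The following Python sketch is a hypothetical pre-flight helper and isn't part of Azure AI Search or Document Intelligence. The `CandidateFile` class and the `layout_limit_problems` function are illustrative names. The sketch assumes that "MB" means 1024 * 1024 bytes and that you already know each file's page count and pixel dimensions.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024  # assumption: the documented "MB" limits are binary megabytes
MAX_FILE_BYTES = {"S0": 500 * MB, "F0": 4 * MB}
MAX_PAGES = 2000
MIN_SIDE, MAX_SIDE = 50, 10_000
PAGED_FORMATS = {".pdf", ".tif", ".tiff"}


@dataclass
class CandidateFile:
    """Hypothetical metadata about a file you plan to put in the data source."""
    extension: str                   # for example ".pdf" or ".png"
    size_bytes: int
    page_count: int = 1
    width_px: Optional[int] = None   # set for images only
    height_px: Optional[int] = None
    password_protected: bool = False


def layout_limit_problems(f: CandidateFile, tier: str = "S0") -> List[str]:
    """Return the documented limits this file would hit; an empty list means none were detected."""
    problems: List[str] = []
    limit = MAX_FILE_BYTES[tier]
    if f.size_bytes > limit:
        problems.append(f"larger than the {limit // MB} MB limit for the {tier} tier")
    if f.extension.lower() in PAGED_FORMATS:
        if f.page_count > MAX_PAGES:
            problems.append(f"more than {MAX_PAGES} pages")
        if tier == "F0" and f.page_count > 2:
            problems.append("free tier processes only the first two pages")
    if f.width_px is not None and f.height_px is not None:
        # Both sides must fall inside the documented 50..10,000 pixel range.
        if not all(MIN_SIDE <= side <= MAX_SIDE for side in (f.width_px, f.height_px)):
            problems.append("image dimensions outside 50 x 50 to 10,000 x 10,000 pixels")
    if f.password_protected:
        problems.append("password-locked PDF; remove the lock before submission")
    return problems
```

This helper only flags the limits listed in this article. The service remains the authority on which files it accepts.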
In Azure AI Search, indexers for Azure Blob Storage and Azure Files support a `markdown` parsing mode for Markdown files. Markdown files can be indexed in two ways:

…
Text data chunking strategies play a key role in optimizing RAG responses and performance. Semantic chunking splits text into semantically coherent fragments. These fragments can then be processed independently and recombined as semantic representations without loss of information, interpretation, or semantic relevance. The inherent meaning of the text guides the chunking process. Markdown is a structured, formatted markup language and a popular input for semantic chunking in RAG (retrieval-augmented generation).
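The following Python sketch shows the idea in practice. It splits Markdown at ATX headings (`#` through `###` by default) and records the heading path above each section, using headings as a structural stand-in for semantic boundaries. It's an illustration only and assumes CommonMark-style headings and fenced code blocks. It isn't how the Document Layout skill or the `markdown` parsing mode is implemented, and `split_markdown_sections` is a hypothetical name.

```python
import re
from typing import Dict, List

# ATX heading: 1-6 '#' characters, whitespace, the title, and an optional closing run of '#'.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def split_markdown_sections(text: str, max_depth: int = 3) -> List[Dict]:
    """Split Markdown into sections at headings up to max_depth.

    Each section is {"headings": [...], "content": "..."}, where "headings" is the
    path of enclosing headings, for example ["Setup", "Install"].
    """
    sections: List[Dict] = []
    path: List[tuple] = []   # stack of (level, title) for the current position
    buffer: List[str] = []
    in_fence = False

    def flush() -> None:
        content = "\n".join(buffer).strip()
        if content:
            sections.append({"headings": [title for _, title in path], "content": content})
        buffer.clear()

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence          # '#' lines inside code fences aren't headings
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_depth:
            flush()                          # close the previous section
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()                   # leave sibling or deeper headings
            path.append((level, match.group(2)))
        else:
            buffer.append(line)              # deeper headings stay in the section body
    flush()
    return sections
```

For example, `"# Guide\nIntro\n## Install\nRun setup."` yields two sections, with heading paths `["Guide"]` and `["Guide", "Install"]`.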
The Document Layout skill offers a comprehensive solution for advanced content extraction and chunking. With the Layout skill, you can extract document layout and content in Markdown format, and then use Markdown parsing mode to produce a set of document chunks.
This article shows how to:

+ Use the Document Layout skill to extract Markdown sections.
+ Apply the Text Split skill to constrain chunk size within each Markdown section.
+ Generate embeddings for the content within those sections.
+ Use index projections to compile the chunks and embeddings and write them into a search index.
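The second step, constraining chunk size inside each Markdown section, is sketched below. This is a simple character-based stand-in for the Text Split skill, not its actual algorithm. `chunk_section` and `chunk_sections` are hypothetical names, sizes are counted in characters rather than tokens, and each input section is assumed to be a dict with `headings` and `content` keys.

```python
from typing import Dict, List


def chunk_section(content: str, max_chars: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into windows of at most max_chars characters.

    Windows prefer to end at a space, and consecutive windows share up to
    `overlap` characters so that context isn't lost at the boundaries.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks: List[str] = []
    start, n = 0, len(content)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            space = content.rfind(" ", start + 1, end)
            if space != -1:
                end = space                    # break at the last space inside the window
        piece = content[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= n:
            break
        start = max(end - overlap, start + 1)  # step back for overlap, but always advance
    return chunks


def chunk_sections(sections: List[Dict], max_chars: int = 2000, overlap: int = 200) -> List[Dict]:
    """Chunk every section independently so that no chunk crosses a heading boundary."""
    out: List[Dict] = []
    for section in sections:
        for i, piece in enumerate(chunk_section(section["content"], max_chars, overlap)):
            out.append({"headings": section["headings"], "chunk_index": i, "content": piece})
    return out
```

Chunking within sections, rather than across the whole document, is what keeps each chunk tied to a single heading path.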
## Prerequisites

+ An [indexer-based indexing pipeline](search-indexer-overview.md).
+ An index that accepts the output of the indexer pipeline.
+ A [supported data source](search-indexer-overview.md#supported-data-sources) with content that you want to chunk.
+ A [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) that splits documents based on paragraph boundaries.
+ An [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) that generates vector embeddings.
+ An [index projection](search-how-to-define-index-projections.md) for one-to-many indexing.
37
## Prepare data files
34
38
35
-
The raw inputs must be in a [supported data source](search-indexer-overview.md#supported-data-sources) and the file needs to be a format which [Document Intelligence Layout skill](cognitive-search-skill-document-intelligence-layout.md) supports.
39
+
The raw inputs must be in a [supported data source](search-indexer-overview.md#supported-data-sources) and the file needs to be a format which [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) supports.
…
You can use the REST APIs to [create or update a skill set](cognitive-search-defining-skillset.md).

Here's an example skill set definition payload that uses the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) and the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) to project individual Markdown section chunks and their vector outputs as documents in the search index:
```json
{
  …
}
```

…
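The payload itself isn't shown here, so the following Python sketch assembles a skill set body of the same general shape and prints it as JSON. Treat it as an outline built on assumptions, not as the article's payload:

+ The property names follow our reading of the 2024-11-01-preview references for the Document Layout skill, the Azure OpenAI Embedding skill, and index projections.
+ The output path `/document/markdownDocument/*`, the per-section `content` and `sections/h1` properties, the index fields (`chunk`, `text_vector`, `header_1`, `text_parent_id`), and the model name are hypothetical.
+ For brevity, the sketch omits the Text Split skill and embedding authentication (an API key or managed identity).

```python
import json

# Hypothetical path: the layout skill's oneToMany output, one element per Markdown section.
SECTIONS = "/document/markdownDocument/*"


def layout_skillset(name: str, aoai_uri: str, aoai_deployment: str, index_name: str) -> dict:
    """Return a skill set body: layout skill -> embedding skill -> index projection."""
    return {
        "name": name,
        "skills": [
            {
                # Splits each file into Markdown sections (up to h3 headings here).
                "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
                "context": "/document",
                "outputMode": "oneToMany",
                "markdownHeaderDepth": "h3",
                "inputs": [{"name": "file_data", "source": "/document/file_data"}],
                "outputs": [{"name": "markdown_document", "targetName": "markdownDocument"}],
            },
            {
                # Embeds each section's text; runs once per element of SECTIONS.
                "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
                "context": SECTIONS,
                "resourceUri": aoai_uri,
                "deploymentId": aoai_deployment,
                "modelName": "text-embedding-3-small",  # hypothetical model choice
                "inputs": [{"name": "text", "source": f"{SECTIONS}/content"}],
                "outputs": [{"name": "embedding", "targetName": "text_vector"}],
            },
        ],
        "indexProjections": {
            "selectors": [
                {
                    # One search document per section, linked back to its parent file.
                    "targetIndexName": index_name,
                    "parentKeyFieldName": "text_parent_id",
                    "sourceContext": SECTIONS,
                    "mappings": [
                        {"name": "chunk", "source": f"{SECTIONS}/content"},
                        {"name": "text_vector", "source": f"{SECTIONS}/text_vector"},
                        {"name": "header_1", "source": f"{SECTIONS}/sections/h1"},
                    ],
                }
            ],
            "parameters": {"projectionMode": "skipIndexingParentDocuments"},
        },
    }


if __name__ == "__main__":
    body = layout_skillset("layout-skillset", "https://<your-resource>.openai.azure.com",
                           "embedding-deployment", "layout-index")
    print(json.dumps(body, indent=2))
```

The embedding skill's `context` matches the projection's `sourceContext`. This alignment is what places each vector on the same per-section search document as its text.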
## Run the indexer

After you create a data source, an index, and a skill set, you're ready to [create and run the indexer](search-howto-create-indexers.md#run-the-indexer). This step puts the pipeline into execution.

When you use the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md), set the following parameters on the indexer definition:

+ Set the `allowSkillsetToReadFileData` parameter to `true`.
+ Set the `parsingMode` parameter to `default`.
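The following Python sketch places those two settings in context. It builds, but doesn't send, a Create or Update Indexer request. The endpoint, key, and resource names are placeholders, and `layout_indexer_request` is a hypothetical name. Placing both settings under `parameters.configuration` and using the `2024-11-01-preview` API version reflect our reading of the indexer REST reference, so verify them before use.

```python
import json
import urllib.request

API_VERSION = "2024-11-01-preview"  # assumption: a preview version that exposes the Document Layout skill


def layout_indexer_request(endpoint: str, api_key: str, name: str,
                           data_source: str, skillset: str, index: str) -> urllib.request.Request:
    """Build (but don't send) a Create or Update Indexer request for a Document Layout pipeline."""
    body = {
        "name": name,
        "dataSourceName": data_source,
        "skillsetName": skillset,
        "targetIndexName": index,
        "parameters": {
            "configuration": {
                "allowSkillsetToReadFileData": True,  # makes /document/file_data available to skills
                "parsingMode": "default",             # value this article requires for the layout skill
            }
        },
    }
    url = f"{endpoint.rstrip('/')}/indexers/{name}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )
```

To send the request, pass it to `urllib.request.urlopen`. Building it separately keeps the payload easy to inspect and test.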
…

```
POST /indexes/[index name]/docs/search?api-version=[api-version]
…
```
## See also

+ [Create a data source](search-howto-indexing-azure-blob-storage.md)
+ [Define an index projection](search-how-to-define-index-projections.md)
+ [How to define a skill set](cognitive-search-defining-skillset.md)
Azure AI Search can index Markdown files in Azure Blob Storage using an [indexer](search-indexer-overview.md) that knows how to read Markdown data.
This tutorial shows you how to index Markdown files by using the `oneToMany` Markdown parsing mode. It uses a REST client and the [Search REST APIs](/rest/api/searchservice/) to perform the following tasks:

…
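For orientation, here's a minimal Python sketch of the indexer settings that select the one-to-many Markdown mode. The `markdownParsingSubmode` and `markdownHeaderDepth` property names and their allowed values reflect our understanding of the 2024-11-01-preview indexer API, which this page doesn't confirm. The function, indexer, data source, and index names are hypothetical.

```python
import json

SUBMODES = {"oneToMany", "oneToOne"}  # assumed: one search document per section, or per file
HEADER_DEPTHS = {f"h{level}" for level in range(1, 7)}


def markdown_indexer(name: str, data_source: str, index: str,
                     submode: str = "oneToMany", header_depth: str = "h6") -> dict:
    """Return an indexer body that parses Markdown blobs with the chosen submode."""
    if submode not in SUBMODES:
        raise ValueError(f"unknown submode: {submode}")
    if header_depth not in HEADER_DEPTHS:
        raise ValueError(f"header depth must be h1-h6, got {header_depth}")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "parameters": {
            "configuration": {
                "parsingMode": "markdown",
                "markdownParsingSubmode": submode,
                "markdownHeaderDepth": header_depth,
            }
        },
    }


print(json.dumps(markdown_indexer("md-indexer", "md-blobs", "md-index"), indent=2))
```

Validating the submode and header depth locally catches typos before the service rejects the request.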

| Item | Type | Description |
|------|------|-------------|
|[**Add Azure AI Search to a network security perimeter**](search-security-network-security-perimiter.md)| Security | Join a search service to a [network security perimeter](/azure/private-link/network-security-perimeter-concepts) to control network access to your search service. The Azure portal and the Management REST APIs in the [2024-06-01-preview](/rest/api/searchmanagement/network-security-perimeter-configurations?view=rest-searchmanagement-2024-06-01-preview&preserve-view=true) can be used to view and reconcile network security perimeter configurations. |
|[**Query rewrite in the semantic reranker**](semantic-how-to-query-rewrite.md)| Relevance (scoring) | You can set options on a semantic query to rewrite the query input into a revised or expanded query that generates more relevant results from the L2 ranker. Available in the [Search Documents (2024-11-01-preview)](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-11-01-preview&preserve-view=true), the Azure portal, and in the Azure SDK beta packages that provide this feature.|
|[**New semantic ranker models**](semantic-search-overview.md)| Relevance (scoring) | Semantic ranker runs with improved models in all supported regions. There's no change to the APIs or the portal experience. |
|[**Document Layout skill**](cognitive-search-skill-document-intelligence-layout.md)| Applied AI (skills) | A new skill used to analyze a document for structure and provide [structure-aware chunking](search-how-to-semantic-chunking.md). This skill calls Document Intelligence and uses the Document Intelligence layout model. Available in selected regions through the [Create or Update Skillset (2024-11-01-preview)](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2024-11-01-preview&preserve-view=true), the Azure portal, and in the Azure SDK beta packages that provide this feature.|
|[**Managed identity for keyless billing to an Azure AI multiservice subdomain**](cognitive-search-attach-cognitive-services.md)| Applied AI (skills) | You can now use a managed identity and roles for a keyless connection to Azure AI services for built-in skills processing. This capability removes the requirement for the search service and the Azure AI services resource to be in the same region. Available in the [Create or Update Skillset (2024-11-01-preview)](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2024-11-01-preview&preserve-view=true), the Azure portal, and in the Azure SDK beta packages that provide this feature. |
|[**Markdown parsing mode**](search-how-to-index-markdown-blobs.md)| Indexer data source | With this parsing mode, indexers can generate one-to-one or one-to-many search documents from Markdown files in Azure Storage. Available in the [Create or Update Indexer (2024-11-01-preview)](/rest/api/searchservice/indexers/create-or-update?view=rest-searchservice-2024-11-01-preview&preserve-view=true), the Azure portal, and in the Azure SDK beta packages that provide this feature. |