Commit 601dd35

fixed formatting and link bugs
1 parent c50d2bd commit 601dd35

1 file changed

+34
-24
lines changed

articles/search/search-how-to-index-markdown-blobs.md

@@ -17,16 +17,18 @@ ms.date: 11/19/2024
1717

1818
[!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
1919

20-
**Applies to**: [Blob indexers](search-howto-indexing-azure-blob-storage.md), [OneLake indexers](search-how-to-index-onelake-files.md), [File indexers](search-file-storage-integration.md)
21-
2220
In Azure AI Search, indexers for Azure Blob Storage, Azure Files, and OneLake support a `markdown` parsing mode for Markdown files. Markdown files can be indexed in two ways:
2321

2422
+ One-to-many parsing mode, creating multiple search documents per Markdown file
2523
+ One-to-one parsing mode, creating one search document per Markdown file
2624

2725
## Prerequisites
2826

29-
+ A supported data source. For OneLake, make sure you meet all of the requirements of the [OneLake indexer](search-how-to-index-onelake-files#prerequisites). Azure Storage is a standard performance (general-purpose v2) instance that supports hot, cool, and cold access tiers.
27+
+ A supported data source: Azure Blob storage, Azure File storage, or OneLake in Microsoft Fabric.
28+
29+
For OneLake, make sure you meet all of the requirements of the [OneLake indexer](search-how-to-index-onelake-files.md#prerequisites).
30+
31+
Azure Storage for [blob indexers](search-howto-indexing-azure-blob-storage.md#prerequisites) and [file indexers](search-file-storage-integration.md#prerequisites) is a standard performance (general-purpose v2) instance that supports hot, cool, and cold access tiers.
3032

3133
## Markdown parsing mode parameters
3234

@@ -109,7 +111,7 @@ The one-to-many parsing mode parses Markdown files into multiple search document
109111

110112
- `ordinal_position`: An integer value indicating the position of the section within the document hierarchy. This field is used for ordering the sections in their original sequence as they appear in the document, beginning with an ordinal position of 1 and incrementing sequentially for each header.
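
The ordering rule for `ordinal_position` can be illustrated with a small Python sketch. This is not the indexer's parser: it is a simplified stand-in that only recognizes ATX-style (`#`) headers, ignores code fences, and drops any text before the first header. The sample Markdown and the function name `split_one_to_many` are assumptions made for illustration; the output keys mirror the `content`, `sections`, and `ordinal_position` fields named in this article.

```python
import re

# Matches ATX headers such as "# Title" or "### Title" (levels 1 to 6).
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

# Hypothetical sample with three headers, so three documents are expected.
SAMPLE = """# Section 1
Content for section 1.

## Subsection 1.1
Content for subsection 1.1.

# Section 2
Content for section 2.
"""


def split_one_to_many(markdown):
    """Return one dict per header, numbered from 1 in document order."""
    docs = []
    chain = {}  # header level -> title of the currently open header
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            level = len(match.group(1))
            # A new header closes every header at the same or a deeper level.
            chain = {k: v for k, v in chain.items() if k < level}
            chain[level] = match.group(2)
            docs.append({
                "content": [],
                "sections": {f"h{k}": v for k, v in sorted(chain.items())},
                "ordinal_position": len(docs) + 1,
            })
        elif docs:
            # Body text belongs to the most recent header.
            docs[-1]["content"].append(line)
    for doc in docs:
        doc["content"] = "\n".join(doc["content"]).strip()
    return docs


if __name__ == "__main__":
    for doc in split_one_to_many(SAMPLE):
        print(doc["ordinal_position"], doc["sections"], repr(doc["content"]))
```

Each header starts a new entry, so the position counter advances once per header regardless of nesting depth.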
111113

112-
### Index schema for one-to-many parsed Markdown files
114+
### Index schema for one-to-many parsing
113115

114116
An example index configuration might look something like this:
115117
```http
@@ -145,6 +147,8 @@ An example index configuration might look something like this:
145147
}
146148
```
147149

150+
### Indexer definition for one-to-many parsing
151+
148152
If field names and data types align, the blob indexer can infer the mapping without an explicit field mapping present in the request, so an indexer configuration corresponding to the provided index configuration might look like this:
149153

150154
```http
@@ -165,7 +169,9 @@ api-key: [admin key]
165169
> [!NOTE]
166170
> The `submode` does not need to be set explicitly here because `oneToMany` is the default.
167171
168-
This Markdown file would result in three search documents after indexing, due to the three content sections. The search document resulting from the first content section of the provided Markdown document would contain the following values for `content`, `sections`, `h1`, and `h2`:
172+
### Indexer output for one-to-many parsing
173+
174+
This Markdown file would result in three search documents after indexing, due to the three content sections. The search document resulting from the first content section of the provided Markdown document would contain the following values for `content`, `sections`, `h1`, and `h2`:
169175

170176
```http
171177
{
@@ -270,7 +276,7 @@ Content for subsection 1.1.
270276
Content for section 2.
271277
```
272278

273-
### Index schema for one-to-one parsed Markdown files
279+
### Index schema for one-to-one parsing
274280

275281
If you aren't utilizing field mappings, the shape of the index should reflect the shape of the Markdown content. Given the structure of sample Markdown with its two sections and single subsection, the index should look similar to the following example:
276282
```http
@@ -325,6 +331,28 @@ If you aren't utilizing field mappings, the shape of the index should reflect th
325331
}
326332
```
327333

334+
### Indexer definition for one-to-one parsing
335+
336+
```http
337+
POST https://[service name].search.windows.net/indexers?api-version=2024-11-01-preview
338+
Content-Type: application/json
339+
api-key: [admin key]
340+
341+
{
342+
"name": "my-markdown-indexer",
343+
"dataSourceName": "my-blob-datasource",
344+
"targetIndexName": "my-target-index",
345+
"parameters": {
346+
"configuration": {
347+
"parsingMode": "markdown",
348+
"markdownParsingSubmode": "oneToOne"
349+
}
350+
}
351+
}
352+
```
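
As a hedged aside, the request body for this indexer can also be assembled programmatically, which avoids hand-written JSON errors such as trailing commas. The sketch below only builds and prints the payload; it doesn't call the service. The helper name `build_indexer_body` is hypothetical, and the `oneToOne` submode value is inferred from the `oneToMany` value and the earlier note that `oneToMany` is the default, so confirm it against the REST API reference before relying on it.

```python
import json

# Submode values: "oneToMany" appears in this article; "oneToOne" is assumed
# to be its counterpart for one-to-one parsing.
ALLOWED_SUBMODES = ("oneToMany", "oneToOne")


def build_indexer_body(name, data_source, target_index, submode):
    """Build the JSON body of an indexer definition with Markdown parsing."""
    if submode not in ALLOWED_SUBMODES:
        raise ValueError(f"unexpected submode: {submode!r}")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {
            "configuration": {
                "parsingMode": "markdown",
                "markdownParsingSubmode": submode,
            }
        },
    }


body = build_indexer_body(
    "my-markdown-indexer", "my-blob-datasource", "my-target-index", "oneToOne"
)
# json.dumps always emits valid JSON, so no trailing commas can slip in.
payload = json.dumps(body, indent=2)

if __name__ == "__main__":
    print(payload)
```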
353+
354+
### Indexer output for one-to-one parsing
355+
328356
Because the Markdown we want to index only goes to a depth of `h2` ("##"), we need `sections` fields nested to a depth of 2 to match that. This configuration would result in the following data in the index:
329357

330358
```http
@@ -357,24 +385,6 @@ As you can see, the ordinal position increments based on the location of the con
357385

358386
Note that if header levels are skipped in the content, the structure of the resulting document reflects the headers that are present in the Markdown content and doesn't necessarily contain nested sections for `h1` through `h6` consecutively. For example, when the document begins at `h2`, the first element in the top-level sections array is `h2`.
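
The behavior with skipped header levels can be sketched in Python as follows. This is an illustrative approximation, not the service's implementation: it handles only ATX-style (`#`) headers and ignores code fences. The function name `build_section_tree`, the sample document, and the keys `header_level` and `header_name` are assumptions made for the example; `content`, `sections`, and `ordinal_position` follow the field names used in this article.

```python
import re

# Matches ATX headers such as "## Title" (levels 1 to 6).
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

# Hypothetical sample that starts at h2 and skips h3.
SKIPPED_LEVELS = """## Intro
Intro text.
#### Detail
Detail text.
## Next
Next text.
"""


def build_section_tree(markdown):
    """Nest each header under the closest preceding header of a lower level."""
    root = {"sections": []}
    stack = [(0, root)]  # (header level, node); level 0 is the document root
    ordinal = 0
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            level = len(match.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            ordinal += 1
            node = {
                "header_level": f"h{level}",
                "header_name": match.group(2),
                "content": "",
                "ordinal_position": ordinal,
                "sections": [],
            }
            stack[-1][1]["sections"].append(node)
            stack.append((level, node))
        elif stack[-1][0] > 0:
            # Body text belongs to the innermost open section.
            node = stack[-1][1]
            node["content"] = (node["content"] + "\n" + line).strip()
    return root["sections"]


if __name__ == "__main__":
    for section in build_section_tree(SKIPPED_LEVELS):
        print(section["header_level"], section["header_name"])
```

In this sketch no placeholder `h1` or `h3` nodes are created: the `h4` section attaches directly to the `h2` section that precedes it.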
359387

360-
```http
361-
POST https://[service name].search.windows.net/indexers?api-version=2024-11-01-preview
362-
Content-Type: application/json
363-
api-key: [admin key]
364-
365-
{
366-
"name": "my-markdown-indexer",
367-
"dataSourceName": "my-blob-datasource",
368-
"targetIndexName": "my-target-index",
369-
"parameters": {
370-
"configuration": {
371-
"parsingMode": "markdown",
372-
"markdownParsingSubmode": "oneToMany",
373-
}
374-
}
375-
}
376-
```
377-
378388
## Map one-to-one fields to search fields
379389

380390
If you would like to extract fields with custom names from the document, you can use field mappings to do so. Using the same Markdown sample as before, consider the following index configuration:
