Commit 6cd63d3

Merge pull request #202192 from HeidiSteen/heidist-support-case
[azure search] Network security docs (search-indexer-securing-resources.md)
2 parents 30b56cd + a3b10cf commit 6cd63d3

4 files changed

+93
-65
lines changed

articles/search/TOC.yml

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -230,12 +230,12 @@
230230
href: troubleshoot-shared-private-link-resources.md
231231
- name: Outbound connections
232232
items:
233+
- name: Connect as a trusted service
234+
href: search-indexer-howto-access-trusted-service-exception.md
233235
- name: Connect using a managed identity
234236
href: search-howto-managed-identities-data-sources.md
235-
- name: Connect over an IP range
237+
- name: Connect through a firewall
236238
href: search-indexer-howto-access-ip-restricted.md
237-
- name: Connect as a trusted service
238-
href: search-indexer-howto-access-trusted-service-exception.md
239239
- name: Connect through private endpoints
240240
href: search-indexer-howto-access-private.md
241241
- name: Document-level security

articles/search/search-indexer-howto-access-private.md

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ Private endpoints created through Azure Cognitive Search APIs are referred to as
3333

3434
+ Connections from the search client should be programmatic, either REST APIs or an Azure SDK, rather than through the Azure portal. The device must connect using an authorized IP in the Azure PaaS resource's firewall rules.
3535

36+
+ Indexer execution must use the private execution environment that's specific to your search service. Private endpoint connections aren't supported from the multi-tenant environment.
37+
3638
> [!NOTE]
3739
> When using Private Link for data sources, Azure portal access (from Cognitive Search to your content) - such as through the [Import data](search-import-data-portal.md) wizard - is not supported.
3840

articles/search/search-indexer-overview.md

Lines changed: 11 additions & 11 deletions
@@ -15,13 +15,13 @@ ms.date: 06/06/2022
1515

1616
An *indexer* in Azure Cognitive Search is a crawler that extracts searchable content from cloud data sources and populates a search index using field-to-field mappings between source data and a search index. This approach is sometimes referred to as a 'pull model' because the search service pulls data in without you having to write any code that adds data to an index. Indexers also drive the [AI enrichment](cognitive-search-concept-intro.md) capabilities of Cognitive Search, integrating external processing of content en route to an index.
1717

18-
Indexers are cloud-only, with individual indexers for [supported data sources](#supported-data-sources). When configuring an indexer, you'll specify a data source (origin) and a search index (destination). Several sources, such as Azure Blob Storage, have additional configuration properties specific to that content type.
18+
Indexers are cloud-only, with individual indexers for [supported data sources](#supported-data-sources). When configuring an indexer, you'll specify a data source (origin) and a search index (destination). Several sources, such as Azure Blob Storage, have more configuration properties specific to that content type.
1919

2020
You can run indexers on demand or on a recurring data refresh schedule that runs as often as every five minutes. More frequent updates require a ['push model'](search-what-is-data-import.md) that simultaneously updates data in both Azure Cognitive Search and your external data source.
2121

2222
## Indexer scenarios and use cases
2323

24-
You can use an indexer as the sole means for data ingestion, or as part of a combination of techniques that load and optionally transform or enrich content along the way. The following table summarizes the main scenarios.
24+
You can use an indexer as the sole means for data ingestion, or in combination with other techniques. The following table summarizes the main scenarios.
2525

2626
| Scenario |Strategy |
2727
|----------|---------|
@@ -55,7 +55,7 @@ Indexers crawl data stores on Azure and outside of Azure.
5555

5656
Indexers accept flattened row sets, such as a table or view, or items in a container or folder. In most cases, it creates one search document per row, record, or item.
5757

58-
Indexer connections to remote data sources can be made using standard Internet connections (public) or encrypted private connections when you use Azure virtual networks for client apps. You can also set up connections to authenticate using a managed identity. For more information about secure connections, see [Granting access via private endpoints](search-indexer-securing-resources.md#granting-access-via-private-endpoints) and [Connect to a data source using a managed identity](search-howto-managed-identities-data-sources.md).
58+
Indexer connections to remote data sources can be made using standard Internet connections (public) or encrypted private connections when you use Azure virtual networks for client apps. You can also set up connections to authenticate using a managed identity. For more information about secure connections, see [Indexer access to content protected by Azure network security features](search-indexer-securing-resources.md) and [Connect to a data source using a managed identity](search-howto-managed-identities-data-sources.md).
5959

6060
## Stages of indexing
6161

@@ -73,7 +73,7 @@ Document cracking is the process of opening files and extracting content. Text-b
7373

7474
Depending on the data source, the indexer will try different operations to extract potentially indexable content:
7575

76-
+ When the document is a file, such as a PDF or other supported file format in [Azure Blob Storage](search-howto-indexing-azure-blob-storage.md#supported-document-formats), the indexer will open the file and extract text, images, and metadata. Indexers can also open files from [SharePoint](search-howto-index-sharepoint-online.md#supported-document-formats) and [Azure Data Lake Storage Gen2](search-howto-index-azure-data-lake-storage.md#supported-document-formats).
76+
+ When the document is a file with embedded images, such as a PDF, the indexer extracts text, images, and metadata. Indexers can open files from [Azure Blob Storage](search-howto-indexing-azure-blob-storage.md#supported-document-formats), [Azure Data Lake Storage Gen2](search-howto-index-azure-data-lake-storage.md#supported-document-formats), and [SharePoint](search-howto-index-sharepoint-online.md#supported-document-formats).
7777

7878
+ When the document is a record in [Azure SQL](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md), the indexer will extract non-binary content from each field in each record.
7979

@@ -89,15 +89,15 @@ Field mapping occurs after document cracking, but before transformations, when t
8989

9090
### Stage 3: Skillset execution
9191

92-
Skillset execution is an optional step that invokes built-in or custom AI processing. You might need it for optical character recognition (OCR) in the form of image analysis if the source data is a binary image, or you might need text translation if content is in different languages.
92+
Skillset execution is an optional step that invokes built-in or custom AI processing. Skillsets can add optical character recognition (OCR) or other forms of image analysis if the content is binary. Skillsets can also add natural language processing. For example, you can add text translation or key phrase extraction.
9393

9494
Whatever the transformation, skillset execution is where enrichment occurs. If an indexer is a pipeline, you can think of a [skillset](cognitive-search-defining-skillset.md) as a "pipeline within the pipeline".
9595

9696
### Stage 4: Output field mappings
9797

98-
If you include a skillset, you will need to [specify output field mappings](cognitive-search-output-field-mapping.md) in the indexer definition. The output of a skillset is manifested internally as a tree structure referred to as an *enriched document*. Output field mappings allow you to select which parts of this tree to map into fields in your index.
98+
If you include a skillset, you'll need to [specify output field mappings](cognitive-search-output-field-mapping.md) in the indexer definition. The output of a skillset is manifested internally as a tree structure referred to as an *enriched document*. Output field mappings allow you to select which parts of this tree to map into fields in your index.
9999

100-
Despite the similarity in names, output field mappings and field mappings build associations from different sources. Field mappings associate the content of source field to a destination field in a search index. Output field mappings associate the content of an internal enriched document (skill outputs) to destination fields in the index. Unlike field mappings, which are considered optional, you will always need to define an output field mapping for any transformed content that needs to reside in an index.
100+
Despite the similarity in names, output field mappings and field mappings build associations from different sources. Field mappings associate the content of source field to a destination field in a search index. Output field mappings associate the content of an internal enriched document (skill outputs) to destination fields in the index. Unlike field mappings, which are considered optional, an output field mapping is required for any transformed content that should be in the index.
101101

102102
The next image shows a sample indexer [debug session](cognitive-search-debug-session.md) representation of the indexer stages: document cracking, field mappings, skillset execution, and output field mappings.
103103

@@ -111,11 +111,11 @@ Indexers can offer features that are unique to the data source. In this respect,
111111

112112
Indexers require a *data source* object that provides a connection string and possibly credentials. Call the [Create Data Source (REST)](/rest/api/searchservice/create-data-source) or [SearchIndexerDataSourceConnection class](/dotnet/api/azure.search.documents.indexes.models.searchindexerdatasourceconnection) to create the resource.
113113

114-
Data sources are configured and managed independently of the indexers that use them, which means a data source can be used by multiple indexers to load more than one index at a time.
114+
Data sources are independent objects. Multiple indexers can use the same data source object to load more than one index at a time.
115115

116116
### Step 2: Create an index
117117

118-
An indexer will automate some tasks related to data ingestion, but creating an index is generally not one of them. As a prerequisite, you must have a predefined index with fields that match those in your external data source. Fields need to match by name and data type. If not, you can [define field mappings](search-indexer-field-mappings.md) to establish the association. For more information about structuring an index, see [Create an Index (REST)](/rest/api/searchservice/Create-Index) or [SearchIndex class](/dotnet/api/azure.search.documents.indexes.models.searchindex).
118+
An indexer will automate some tasks related to data ingestion, but creating an index is generally not one of them. As a prerequisite, you must have a predefined index that contains corresponding target fields for any source fields in your external data source. Fields need to match by name and data type. If not, you can [define field mappings](search-indexer-field-mappings.md) to establish the association. For more information about structuring an index, see [Create an Index (REST)](/rest/api/searchservice/Create-Index) or [SearchIndex class](/dotnet/api/azure.search.documents.indexes.models.searchindex).
119119

120120
> [!Tip]
121121
> Although indexers cannot generate an index for you, the **Import data** wizard in the portal can help. In most cases, the wizard can infer an index schema from existing metadata in the source, presenting a preliminary index schema which you can edit in-line while the wizard is active. Once the index is created on the service, further edits in the portal are mostly limited to adding new fields. Consider the wizard for creating, but not revising, an index. For hands-on learning, step through the [portal walkthrough](search-get-started-portal.md).
@@ -124,9 +124,9 @@ An indexer will automate some tasks related to data ingestion, but creating an i
124124

125125
By default, the first indexer execution occurs when you [create an indexer](/rest/api/searchservice/Create-Indexer) on the search service. You can set the "disabled" property in an indexer to create it without running it.
126126

127-
During indexer execution is when you'll find out if the data source is accessible or the skillset is valid. Until indexer execution starts, dependent objects such as data sources and skillsets are inactive on the search service.
127+
Any errors or warnings about data access or skillset validation will occur during indexer execution. Until indexer execution starts, dependent objects such as data sources and skillsets are passive on the search service.
128128

129-
After the first indexer run, you can re-run it on demand using [Run Indexer](/rest/api/searchservice/run-indexer), or you can [define a recurring schedule](search-howto-schedule-indexers.md).
129+
After the first indexer run, you can rerun it on demand using [Run Indexer](/rest/api/searchservice/run-indexer), or you can [define a recurring schedule](search-howto-schedule-indexers.md).
130130

131131
You can monitor [indexer status in the portal](search-howto-monitor-indexers.md) or through [Get Indexer Status API](/rest/api/searchservice/get-indexer-status). You should also [run queries on the index](search-query-create.md) to verify the result is what you expected.
132132
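
Reviewer sketch (not part of the commit): the "Run the indexer" text above can be illustrated by building the REST requests it describes. This is a minimal sketch that only constructs URLs and a request body and makes no network calls. The service, indexer, data source, and index names are hypothetical, and the `api-version` value is an assumption to adjust for your service. The `disabled` property and the five-minute schedule interval come from the article text.

```python
import json

# Assumed REST API version; adjust to whatever your search service supports.
API_VERSION = "2020-06-30"


def indexer_definition(name, data_source, index, disabled=True, interval=None):
    """Build a Create Indexer request body as a plain dict.

    disabled=True creates the indexer without running it immediately.
    interval is an ISO 8601 duration such as "PT5M" (every five minutes).
    """
    body = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "disabled": disabled,
    }
    if interval is not None:
        body["schedule"] = {"interval": interval}
    return body


def indexer_url(service, name, action=None):
    """Return the indexer endpoint; action may be "run" or "status"."""
    base = f"https://{service}.search.windows.net/indexers/{name}"
    if action:
        base += f"/{action}"
    return f"{base}?api-version={API_VERSION}"


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    body = indexer_definition(
        "hotels-indexer", "hotels-ds", "hotels-index", interval="PT5M"
    )
    print("PUT ", indexer_url("my-service", "hotels-indexer"))
    print(json.dumps(body, indent=2))
    print("POST", indexer_url("my-service", "hotels-indexer", "run"))
    print("GET ", indexer_url("my-service", "hotels-indexer", "status"))
```

Sending these requests would also need an `api-key` header or another supported credential, which is omitted here. Creating the indexer with `disabled` set to true and enabling it later gives you a chance to review the data source and skillset before the first execution surfaces any errors.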
