      This will delete all indexes in Azure Cognitive Search along with their supporting infrastructure
      like indexers, data sources and skillset definitions.
    </p>
    <p>
      It will also delete all content in the <code>Chunks</code> blob container in the configured Azure Storage
      account (as the chunks will be recreated from the source data, optionally with the new settings below).
    </p>
    <p>
      However, it will <b><i>not</i></b> delete any data in the <code>Documents</code> container, so all your
      previously uploaded documents will remain available and will be re-indexed after the configuration is reset.
    </p>
  </div>
  <div class="card mb-3">
    <div class="card-header">Options</div>
    <div class="card-body">
      <div class="mb-2">
        <label class="form-label" for="appSettingsOverride-TextEmbedderNumTokens">Number of tokens per chunk</label>
        <span class="info-tip" data-bs-toggle="popover" data-bs-content="The source data will be split up into smaller chunks of approximately the token size you specify here. Embeddings are generated per chunk, so the larger the chunks, the more likely you are to hit token limits and the less specific the vector representation becomes (as it is generated from a larger body of content). Experiment with this value based on the type of content being chunked and the kind of recall performance required for retrieval scenarios."><i class="bi bi-info-circle"></i></span>
<labelclass="form-label"for="appSettingsOverride-TextEmbedderTokenOverlap">Token overlap between chunks</label>
79
-
<spanclass="info-tip"data-bs-toggle="popover"data-bs-content="The number of tokens to overlap between consecutive chunks. This is useful to maintain context continuity between chunks. By including some overlapping tokens, you can ensure that a small portion of context is shared between adjacent chunks, which can help with preserving the meaning and coherence when processing the text with language models."><iclass="bi bi-info-circle"></i></span>
<spanclass="info-tip"data-bs-toggle="popover"data-bs-content="The minimum number of tokens that a chunk should contain in order to be included in the <code>Chunks</code> index. This helps avoid that small chunks (with only a few words for example) have a disproportional impact on search results."><iclass="bi bi-info-circle"></i></span>
<labelclass="form-label"for="appSettingsOverride-SearchIndexerScheduleMinutes">Search indexer interval in minutes (minimum <code>5</code>)</label>
89
-
<spanclass="info-tip"data-bs-toggle="popover"data-bs-content="The number of minutes between indexer executions. If you upload new documents, it can take up to this amount of time for the data to be included in the <code>Documents</code> index. It can again take the same amount of time after that for the data to be included in the <code>Chunks</code> index, as the chunks are created while the indexer for the <code>Documents</code> index runs."><iclass="bi bi-info-circle"></i></span>
<spanclass="info-tip"data-bs-toggle="popover"data-bs-content="The source data will be split up into smaller chunks of approximately the token size you specify here. Embeddings are generated per chunk so the larger the chunks, the more likely you will hit token limits and the more likely the vector representation will be less specific (as it's generated from a larger body of content). Experiment with this value based on the type of content being chunked and the kinds of recall performance required for retrieval scenarios."><iclass="bi bi-info-circle"></i></span>
<spanclass="info-tip"data-bs-toggle="popover"data-bs-content="The number of tokens to overlap between consecutive chunks. This is useful to maintain context continuity between chunks. By including some overlapping tokens, you can ensure that a small portion of context is shared between adjacent chunks, which can help with preserving the meaning and coherence when processing the text with language models."><iclass="bi bi-info-circle"></i></span>
<spanclass="info-tip"data-bs-toggle="popover"data-bs-content="The minimum number of tokens that a chunk should contain in order to be included in the <code>Chunks</code> index. This helps avoid that small chunks (with only a few words for example) have a disproportional impact on search results."><iclass="bi bi-info-circle"></i></span>
<spanclass="info-tip"data-bs-toggle="popover"data-bs-content="The number of minutes between indexer executions. If you upload new documents, it can take up to this amount of time for the data to be included in the <code>Documents</code> index. It can again take the same amount of time after that for the data to be included in the <code>Chunks</code> index, as the chunks are created while the indexer for the <code>Documents</code> index runs."><iclass="bi bi-info-circle"></i></span>