Commit 1b17adf

Merge pull request #94264 from Careyjmac/splitSkillUpdate
Add clarification for split skill with lang detect
2 parents 405cf51 + 15ee6dd commit 1b17adf

1 file changed: +2 -2 lines changed

articles/search/cognitive-search-skill-textsplit.md

Lines changed: 2 additions & 2 deletions
@@ -29,15 +29,15 @@ Parameters are case-sensitive.
 |--------------------|-------------|
 | textSplitMode | Either "pages" or "sentences" |
 | maximumPageLength | If textSplitMode is set to "pages", this refers to the maximum page length as measured by `String.Length`. The minimum value is 100. If the textSplitMode is set to "pages", the algorithm will try to split the text into chunks that are at most "maximumPageLength" in size. In this case, the algorithm will do its best to break the sentence on a sentence boundary, so the size of the chunk may be slightly less than "maximumPageLength". |
-| defaultLanguageCode | (optional) One of the following language codes: `da, de, en, es, fi, fr, it, ko, pt`. Default is English (en). Few things to consider:<ul><li>If you pass a languagecode-countrycode format, only the languagecode part of the format is used.</li><li>If the language is not in the previous list, the split skill breaks the text at character boundaries.</li><li>Providing a language code is useful to avoid cutting a word in half for non-space languages such as Chinese, Japanese, and Korean.</li></ul> |
+| defaultLanguageCode | (optional) One of the following language codes: `da, de, en, es, fi, fr, it, ko, pt`. Default is English (en). Few things to consider:<ul><li>If you pass a languagecode-countrycode format, only the languagecode part of the format is used.</li><li>If the language is not in the previous list, the split skill breaks the text at character boundaries.</li><li>Providing a language code is useful to avoid cutting a word in half for non-whitespace languages such as Chinese, Japanese, and Korean.</li><li>If you do not know the language (i.e. you need to split the text for input into the [LanguageDetectionSkill](cognitive-search-skill-language-detection.md)), the default of English (en) should be sufficient. </li></ul> |


 ## Skill Inputs

 | Parameter name | Description |
 |----------------------|------------------|
 | text | The text to split into substring. |
-| languageCode | (Optional) Language code for the document. |
+| languageCode | (Optional) Language code for the document. If you do not know the language (i.e. you need to split the text for input into the [LanguageDetectionSkill](cognitive-search-skill-language-detection.md)), it is safe to remove this input. |

 ## Skill Outputs
