docs/docs/ops/functions.md
Lines changed: 43 additions & 4 deletions
@@ -31,7 +31,7 @@ The spec takes the following fields:

 *   `separators_regex` (`list[str]`): A list of regex patterns to split the text.
     Higher-level boundaries should come first and lower-level ones later, e.g. `[r"\n# ", r"\n## ", r"\n\n", r"\. "]`.
-    See [regex Syntax](https://docs.rs/regex/latest/regex/#syntax) for supported regular expression syntax.
+    See [regex syntax](https://docs.rs/regex/latest/regex/#syntax) for supported regular expression syntax.

 Input data:

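The `separators_regex` field describes an ordered hierarchy of boundaries. The minimal Python sketch below illustrates that idea; it is not CocoIndex's implementation. It tries the highest-level pattern first and falls back to lower-level patterns only for pieces that are still too long. The function name `split_recursively` and the `max_len` size limit are hypothetical, because the spec fields shown here do not define a size limit. The sketch uses Python's `re` module rather than the Rust `regex` crate linked above. The example separators behave the same in both, but `re` also accepts features such as lookaround, which the `regex` crate rejects.

```python
import re


def split_recursively(text: str, separators: list[str], max_len: int) -> list[str]:
    """Hypothetical sketch: split `text` into pieces of at most `max_len` characters.

    `separators` is ordered from higher-level to lower-level boundaries; a
    lower-level separator is only tried on pieces that are still too long
    after splitting on the higher-level ones. No merging or overlap is done.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No boundary left to try: fall back to fixed-size slices.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    head, rest = separators[0], separators[1:]
    # Cut just before each match so the separator stays with the following
    # piece (e.g. a "\n## " heading stays with the section it introduces).
    cuts = [m.start() for m in re.finditer(head, text) if m.start() > 0]
    if not cuts:
        return split_recursively(text, rest, max_len)

    bounds = [0, *cuts, len(text)]
    chunks: list[str] = []
    for start, end in zip(bounds, bounds[1:]):
        # Pieces that still exceed max_len are split with lower-level separators.
        chunks.extend(split_recursively(text[start:end], rest, max_len))
    return chunks


text = "# A\nintro text.\n## B\nfirst. second.\n## C\nthird."
print(split_recursively(text, [r"\n# ", r"\n## ", r"\n\n", r"\. "], max_len=20))
```

With the separators from the spec example and `max_len=20`, the text above has no `\n# ` match. It is split at its two `\n## ` boundaries into three chunks, and the `\n\n` and `\. ` patterns are never reached. Cutting before each match keeps a heading with its section. For sentence separators such as `\. `, it leaves the period at the start of the next piece, a trade-off this sketch accepts for simplicity.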
@@ -57,9 +57,12 @@ Input data:

 We use the `language` field to determine how to split the input text, following these rules:

-*   We'll match the input `language` field against the `language_name` or `aliases` of each element of `custom_languages`, and use the matched one. If value of `language` is null, it'll be treated as empty string when matching `language_name` or `aliases`.
-*   If no match is found, we'll match the `language` field against the builtin language configurations.
-    For all supported builtin language names and aliases (extensions), see [the code](https://github.com/search?q=org%3Acocoindex-io+lang%3Arust++%22static+TREE_SITTER_LANGUAGE_BY_LANG%22&type=code).
+*   We match the input `language` field against the following registries, in this order:
+    *   `custom_languages` in the spec, against the `language_name` or `aliases` field of each entry.
+    *   Builtin languages (see the [Supported Languages](#supported-languages) section below), against the language name, aliases, or file extensions of each entry.
+
+    All matching is case-insensitive. If the value of `language` is null, it is treated as an empty string.
+
 *   If no match is found, the input will be treated as plain text.

 :::
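The matching order in the added (`+`) rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not CocoIndex's code. `CustomLanguage`, `BUILTIN_LANGUAGES` and `resolve_language` are hypothetical names. The builtin table is a tiny made-up stand-in, not the real list of supported languages. Case-insensitivity is approximated with `str.lower()`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CustomLanguage:
    """Hypothetical stand-in for one `custom_languages` entry (matching fields only)."""
    language_name: str
    aliases: list[str] = field(default_factory=list)


# Hypothetical, abbreviated builtin registry: canonical name -> the names,
# aliases and file extensions it answers to. NOT the real supported list.
BUILTIN_LANGUAGES: dict[str, list[str]] = {
    "python": ["Python", "py"],
    "rust": ["Rust", "rs"],
}


def resolve_language(
    language: Optional[str], custom_languages: list[CustomLanguage]
) -> tuple[str, Optional[str]]:
    """Return (registry, matched name), or ("plain_text", None) if nothing matches."""
    key = (language or "").lower()  # null is matched as an empty string
    # 1. `custom_languages` first, in the order given in the spec.
    for lang in custom_languages:
        if key in (name.lower() for name in [lang.language_name, *lang.aliases]):
            return ("custom", lang.language_name)
    # 2. Then the builtin languages.
    for name, keys in BUILTIN_LANGUAGES.items():
        if key in (k.lower() for k in keys):
            return ("builtin", name)
    # 3. No match: treat the input as plain text.
    return ("plain_text", None)


custom = [CustomLanguage("MyDSL", aliases=["mdsl"])]
print(resolve_language("MYDSL", custom))  # ('custom', 'MyDSL')
print(resolve_language("PY", custom))     # ('builtin', 'python')
print(resolve_language(None, custom))     # ('plain_text', None)
```

Because custom entries are checked first, a custom language whose alias collides with a builtin one (for example `PY`) takes precedence. A null `language` only matches an entry that explicitly lists an empty-string name or alias.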
@@ -73,6 +76,42 @@ Return: [*KTable*](/docs/core/data_types#ktable), each row represents a chunk, w
 *   `line` (*Int64*): The line number of the position, starting from 1.
 *   `column` (*Int64*): The column number of the position, starting from 1.

+### Supported Languages
+
+Currently, `SplitRecursively` supports the following languages: