
Commit d305ebd

docs: update docs for SplitRecursively.
1 parent 37484bc


1 file changed (+11, -0 lines)


docs/docs/ops/functions.md

Lines changed: 11 additions & 0 deletions
@@ -26,6 +26,17 @@ Input data:
 
 * `text` (type: `str`, required): The text to split.
 * `chunk_size` (type: `int`, required): The maximum size of each chunk, in bytes.
+* `min_chunk_size` (type: `int`, optional): The minimum size of each chunk, in bytes. Defaults to `chunk_size / 2` if not provided.
+
+:::note
+
+`SplitRecursively` does its best to keep each output chunk between `min_chunk_size` and `chunk_size` in size.
+In rare cases, however, a chunk may be smaller than `min_chunk_size` or larger than `chunk_size`, e.g. when the input text is too short, or when a large span of text cannot be split any further.
+
+Avoid setting `min_chunk_size` too close to `chunk_size`; a wider range leaves the function more room to plan an optimal chunking.
+
+:::
+
 * `chunk_overlap` (type: `int`, optional): The maximum overlap size between adjacent chunks, in bytes.
 * `language` (type: `str`, optional): The language of the document.
 Can be a language name (e.g. `Python`, `Javascript`, `Markdown`) or a file extension (e.g. `.py`, `.js`, `.md`).
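The note describes a target size window rather than a hard guarantee. The following is a minimal sketch of why that is, assuming a simplified separator hierarchy and a small dynamic-programming planner; it is not CocoIndex's actual implementation. `SEPARATORS`, `byte_size`, `split_pieces`, `plan_chunks` and this `split_recursively` are all hypothetical helpers, and the sketch ignores `chunk_overlap` and `language`. It does show both rare cases from the note: a too-short input produces a chunk below `min_chunk_size`, and text with no remaining separator stays above `chunk_size`.

```python
from typing import List, Optional, Tuple

# Hypothetical separator hierarchy, coarsest first; not CocoIndex's actual rules.
SEPARATORS = ["\n\n", "\n", " "]


def byte_size(text: str) -> int:
    # Chunk sizes in the docs are measured in bytes, so count UTF-8 bytes.
    return len(text.encode("utf-8"))


def split_pieces(text: str, chunk_size: int, level: int = 0) -> List[str]:
    # Recursively split on finer separators until every piece fits, or none are left.
    if byte_size(text) <= chunk_size or level >= len(SEPARATORS):
        return [text]  # fits, or cannot be split further (may stay oversized)
    sep = SEPARATORS[level]
    parts = text.split(sep)
    pieces: List[str] = []
    for i, part in enumerate(parts):
        # Re-attach the separator so that "".join(pieces) == text.
        piece = part + sep if i < len(parts) - 1 else part
        if piece:
            pieces.extend(split_pieces(piece, chunk_size, level + 1))
    return pieces


def plan_chunks(pieces: List[str], chunk_size: int, min_chunk_size: int) -> List[str]:
    # Dynamic programming over piece boundaries: minimise the number of chunks
    # below min_chunk_size first, then the total number of chunks.
    n = len(pieces)
    sizes = [byte_size(p) for p in pieces]
    inf = float("inf")
    best: List[Tuple[float, float]] = [(0, 0)] + [(inf, inf)] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        total = 0
        for start in range(end - 1, -1, -1):
            total += sizes[start]
            if total > chunk_size and end - start > 1:
                break  # only a single unsplittable piece may exceed chunk_size
            undersized = 1 if total < min_chunk_size else 0
            cost = (best[start][0] + undersized, best[start][1] + 1)
            if cost < best[end]:
                best[end], back[end] = cost, start
    # Walk the back-pointers to rebuild the chosen chunks.
    chunks: List[str] = []
    end = n
    while end > 0:
        chunks.append("".join(pieces[back[end]:end]))
        end = back[end]
    return chunks[::-1]


def split_recursively(text: str, chunk_size: int,
                      min_chunk_size: Optional[int] = None) -> List[str]:
    if not text:
        return []
    if min_chunk_size is None:
        min_chunk_size = chunk_size // 2  # mirrors the documented chunk_size / 2 default
    return plan_chunks(split_pieces(text, chunk_size), chunk_size, min_chunk_size)


print(split_recursively("aaaaa bb cc", chunk_size=10))  # → ['aaaaa ', 'bb cc']
```

In this example a greedy packer would emit `"aaaaa bb "` followed by a 2-byte `"cc"`; planning over all the pieces instead yields two chunks that both fall within the 5 to 10 byte window. Raising `min_chunk_size` toward `chunk_size` narrows that window and leaves fewer valid groupings, which is the intuition behind the note's advice.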
