Commit 5b3e29f

text splitters: add chunk_size and chunk_overlap validations (#31916)
1 parent 0a17a62 commit 5b3e29f

2 files changed, +11 -0 lines changed

libs/text-splitters/langchain_text_splitters/base.py

Lines changed: 6 additions & 0 deletions
@@ -47,6 +47,12 @@ def __init__(
             strip_whitespace: If `True`, strips whitespace from the start and end of
                 every document
         """
+        if chunk_size <= 0:
+            msg = f"chunk_size must be > 0, got {chunk_size}"
+            raise ValueError(msg)
+        if chunk_overlap < 0:
+            msg = f"chunk_overlap must be >= 0, got {chunk_overlap}"
+            raise ValueError(msg)
         if chunk_overlap > chunk_size:
             msg = (
                 f"Got a larger chunk overlap ({chunk_overlap}) than chunk size "
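
To illustrate the order in which these checks fire, here is a minimal standalone sketch. The helper names `validate_chunk_params` and `first_error` are hypothetical and not part of `langchain_text_splitters`. The message for the pre-existing overlap check is truncated in the diff, so the wording used for it below is an assumption.

```python
from typing import Optional


def validate_chunk_params(chunk_size: int, chunk_overlap: int) -> None:
    """Hypothetical helper mirroring the argument checks shown in the hunk."""
    # New check: the chunk size must be strictly positive.
    if chunk_size <= 0:
        msg = f"chunk_size must be > 0, got {chunk_size}"
        raise ValueError(msg)
    # New check: the overlap may be zero but not negative.
    if chunk_overlap < 0:
        msg = f"chunk_overlap must be >= 0, got {chunk_overlap}"
        raise ValueError(msg)
    # Pre-existing check: the overlap may not exceed the chunk size.
    # The real message is cut off in the diff; this wording is assumed.
    if chunk_overlap > chunk_size:
        msg = (
            f"chunk_overlap ({chunk_overlap}) must not exceed "
            f"chunk_size ({chunk_size})"
        )
        raise ValueError(msg)


def first_error(chunk_size: int, chunk_overlap: int) -> Optional[str]:
    """Return the message of the first failing check, or None if all pass."""
    try:
        validate_chunk_params(chunk_size, chunk_overlap)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for size, overlap in [(0, 0), (-1, -1), (2, -1), (2, 4), (4, 2)]:
        print(size, overlap, first_error(size, overlap))
```

Because the size check runs first, a call with both arguments invalid (for example `chunk_size=-1, chunk_overlap=-1`) reports the `chunk_size` problem only.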

libs/text-splitters/tests/unit_tests/test_text_splitters.py

Lines changed: 5 additions & 0 deletions
@@ -212,6 +212,11 @@ def test_character_text_splitting_args() -> None:
     """Test invalid arguments."""
     with pytest.raises(ValueError):
         CharacterTextSplitter(chunk_size=2, chunk_overlap=4)
+    for invalid_size in (0, -1):
+        with pytest.raises(ValueError):
+            CharacterTextSplitter(chunk_size=invalid_size)
+    with pytest.raises(ValueError):
+        CharacterTextSplitter(chunk_size=2, chunk_overlap=-1)
 
 
 def test_merge_splits() -> None:
