Commit f7b5b7b

Removed unnecessary ValueError.
1 parent 25cf665 commit f7b5b7b

2 files changed: +0 −4 lines changed

CHANGELOG.md (0 additions, 1 deletion)
@@ -5,7 +5,6 @@ All notable changes to `semchunk` will be documented here. This project adheres
 ### Added
 - Added an `offsets` argument to `chunk()` and `Chunker.__call__()` that specifies whether to return the start and end offsets of each chunk ([#9](https://github.com/umarbutler/semchunk/issues/9)). The argument defaults to `False`.
 - Added an `overlap` argument to `chunk()` and `Chunker.__call__()` that specifies the proportion of the chunk size or, if >=1, the number of tokens, by which chunks should overlap ([#1](https://github.com/umarbutler/semchunk/issues/1)). The argument defaults to `None`, in which case no overlapping occurs.
-- Began raising a `ValueError` where the `chunk_size` is smaller than the number of tokens in an empty string (i.e., where the token counter adds special tokens to every input).
 - Added an undocumented, private `_make_chunk_function()` method to the `Chunker` class that constructs chunking functions with call-level arguments passed.
 - Added more unit tests for new features, as well as for multiple token counters and for ensuring there are no chunks comprised entirely of whitespace characters.
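For context, a minimal sketch of how the two new arguments added above might be used together. The whitespace token counter and sample text are illustrative only, and the `(chunks, offsets)` return shape with `offsets = True` is assumed from the changelog wording:

```python
import semchunk

# Illustrative token counter: counts whitespace-delimited words.
def token_counter(text: str) -> int:
    return len(text.split())

text = "The quick brown fox jumps over the lazy dog. " * 10

# offsets = True is assumed to return (chunks, offsets), where each offset
# is the (start, end) character span of the corresponding chunk in `text`.
# overlap = 0.5 requests chunks that overlap by half the chunk size.
chunks, offsets = semchunk.chunk(
    text,
    chunk_size = 8,
    token_counter = token_counter,
    offsets = True,
    overlap = 0.5,
)

for chunk, (start, end) in zip(chunks, offsets):
    assert text[start:end] == chunk
```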

src/semchunk/semchunk.py (0 additions, 3 deletions)
@@ -153,9 +153,6 @@ def chunk(
         if token_counter(split) > local_chunk_size:
             new_chunks, new_offsets = chunk(text = split, chunk_size = local_chunk_size, token_counter = token_counter, offsets = return_offsets, _recursion_depth = _recursion_depth + 1, _start = split_start)
 
-            if not new_chunks:
-                raise ValueError(f"`semchunk` was given a `chunk_size` smaller than the number of tokens in an empty string ({token_counter(''):,}) (or was given an `overlap` so high that the effective chunk size became smaller than the number of tokens in an empty string). Try increasing the `chunk_size` to >={token_counter('') + 2:,} or decreasing `overlap`.")
-
             chunks.extend(new_chunks)
             offsets.extend(new_offsets)
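The deleted guard targeted token counters that, like many transformer tokenizers, attach special tokens to every input, so that even an empty string has a nonzero token count. A hypothetical counter of that kind, for illustration only:

```python
# Hypothetical token counter mimicking tokenizers that wrap every input
# in two special tokens (e.g. [CLS] and [SEP]).
def special_token_counter(text: str) -> int:
    return len(text.split()) + 2

special_token_counter("")  # 2: even an empty string costs two tokens.

# The removed check raised a ValueError when the effective chunk size fell
# below token_counter("") (here, 2), since no chunk could fit that budget.
```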
