Commit f7b5b7b

Removed unnecessary ValueError.
1 parent 25cf665 commit f7b5b7b

2 files changed: +0 −4 lines changed

CHANGELOG.md (0 additions, 1 deletion)
@@ -5,7 +5,6 @@ All notable changes to `semchunk` will be documented here. This project adheres
 ### Added
 - Added an `offsets` argument to `chunk()` and `Chunker.__call__()` that specifies whether to return the start and end offsets of each chunk ([#9](https://github.com/umarbutler/semchunk/issues/9)). The argument defaults to `False`.
 - Added an `overlap` argument to `chunk()` and `Chunker.__call__()` that specifies the proportion of the chunk size or, if >=1, the number of tokens, by which chunks should overlap ([#1](https://github.com/umarbutler/semchunk/issues/1)). The argument defaults to `None`, in which case no overlapping occurs.
-- Began raising a `ValueError` where the `chunk_size` is smaller than the number of tokens in an empty string (i.e., where the token counter adds special tokens to every input).
 - Added an undocumented, private `_make_chunk_function()` method to the `Chunker` class that constructs chunking functions with call-level arguments passed.
 - Added more unit tests for new features, as well as for multiple token counters and for ensuring there are no chunks comprised entirely of whitespace characters.
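For context, a minimal sketch of how the two new arguments added above might be used together. The whitespace token counter and sample text are illustrative only, and the `(chunks, offsets)` return shape with `offsets = True` is assumed from the changelog wording:

```python
import semchunk

# Illustrative token counter: counts whitespace-delimited words.
def token_counter(text: str) -> int:
    return len(text.split())

text = "The quick brown fox jumps over the lazy dog. " * 10

# offsets = True is assumed to return (chunks, offsets), where each offset
# is the (start, end) character span of the corresponding chunk in `text`.
# overlap = 0.5 requests chunks that overlap by half the chunk size.
chunks, offsets = semchunk.chunk(
    text,
    chunk_size = 8,
    token_counter = token_counter,
    offsets = True,
    overlap = 0.5,
)

for chunk, (start, end) in zip(chunks, offsets):
    assert text[start:end] == chunk
```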

src/semchunk/semchunk.py (0 additions, 3 deletions)
@@ -153,9 +153,6 @@ def chunk(
         if token_counter(split) > local_chunk_size:
             new_chunks, new_offsets = chunk(text = split, chunk_size = local_chunk_size, token_counter = token_counter, offsets = return_offsets, _recursion_depth = _recursion_depth + 1, _start = split_start)
 
-            if not new_chunks:
-                raise ValueError(f"`semchunk` was given a `chunk_size` smaller than the number of tokens in an empty string ({token_counter(''):,}) (or was given an `overlap` so high that the effective chunk size became smaller than the number of tokens in an empty string). Try increasing the `chunk_size` to >={token_counter('') + 2:,} or decreasing `overlap`.")
-
             chunks.extend(new_chunks)
             offsets.extend(new_offsets)
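The deleted guard targeted token counters that, like many transformer tokenizers, attach special tokens to every input, so that even an empty string has a nonzero token count. A hypothetical counter of that kind, for illustration only:

```python
# Hypothetical token counter mimicking tokenizers that wrap every input
# in two special tokens (e.g. [CLS] and [SEP]).
def special_token_counter(text: str) -> int:
    return len(text.split()) + 2

special_token_counter("")  # 2: even an empty string costs two tokens.

# The removed check raised a ValueError when the effective chunk size fell
# below token_counter("") (here, 2), since no chunk could fit that budget.
```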
