Commit 542e508

Fixed bug where attempting to chunk only whitespace characters would raise `ValueError: not enough values to unpack (expected 2, got 0)` ([ScrapeGraphAI/Scrapegraph-ai#893](https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/893))
1 parent 964e880 commit 542e508

4 files changed

+17
-7
lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -1,6 +1,10 @@
 ## Changelog 🔄
 All notable changes to `semchunk` will be documented here. This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [3.0.4] - 2025-02-14
+### Fixed
+- Fixed bug where attempting to chunk only whitespace characters would raise `ValueError: not enough values to unpack (expected 2, got 0)` ([ScrapeGraphAI/Scrapegraph-ai#893](https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/893)).
+
 ## [3.0.3] - 2025-02-13
 ### Fixed
 - Fixed `isaacus/emubert` mistakenly being set to `isaacus-dev/emubert` in the README and tests.
@@ -121,6 +125,7 @@ All notable changes to `semchunk` will be documented here. This project adheres
 ### Added
 - Added the `chunk()` function, which splits text into semantically meaningful chunks of a specified size as determined by a provided token counter.

+[3.0.4]: https://github.com/isaacus-dev/semchunk/compare/v3.0.3...v3.0.4
 [3.0.3]: https://github.com/isaacus-dev/semchunk/compare/v3.0.2...v3.0.3
 [3.0.2]: https://github.com/isaacus-dev/semchunk/compare/v3.0.1...v3.0.2
 [3.0.1]: https://github.com/isaacus-dev/semchunk/compare/v3.0.0...v3.0.1

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "semchunk"
-version = "3.0.3"
+version = "3.0.4"
 authors = [
     {name="Isaacus", email="[email protected]"},
     {name="Umar Butler", email="[email protected]"},

src/semchunk/semchunk.py

Lines changed: 8 additions & 6 deletions
@@ -249,12 +249,14 @@ def chunk(
     # If this is the first call, remove any empty chunks as well as chunks comprised entirely of whitespace and then overlap the chunks if desired and finally return the chunks, optionally with their offsets.
     if is_first_call:
         # Remove empty chunks.
-        chunks, offsets = (
-            zip(*[(chunk, offset) for chunk, offset in zip(chunks, offsets) if chunk and not chunk.isspace()])
-            if chunks
-            else ([], [])
-        )  # NOTE `if chunks else ([], [])` ensures that we don't unpack an empty list if there's no chunks (i.e., if the provided text was empty).
-        chunks, offsets = list(chunks), list(offsets)
+        chunks_and_offsets = [(chunk, offset) for chunk, offset in zip(chunks, offsets) if chunk and not chunk.isspace()]
+
+        if chunks_and_offsets:
+            chunks, offsets = zip(*chunks_and_offsets)
+            chunks, offsets = list(chunks), list(offsets)
+
+        else:
+            chunks, offsets = [], []

         # Overlap chunks if desired.
         if overlap:
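
The hunk above moves the emptiness check from before the whitespace filter to after it. A minimal standalone sketch of why that matters follows. The helper names `filter_old` and `filter_new` and the tuple-shaped offsets are hypothetical, introduced only for illustration; they are not part of `semchunk`, and the bodies simply mirror the removed and added lines.

```python
def filter_old(chunks, offsets):
    """Mirror of the removed lines: the guard tests `chunks` BEFORE filtering."""
    chunks, offsets = (
        zip(*[(c, o) for c, o in zip(chunks, offsets) if c and not c.isspace()])
        if chunks
        else ([], [])
    )
    return list(chunks), list(offsets)


def filter_new(chunks, offsets):
    """Mirror of the added lines: the guard tests the FILTERED list."""
    kept = [(c, o) for c, o in zip(chunks, offsets) if c and not c.isspace()]
    if kept:
        chunks, offsets = zip(*kept)
        return list(chunks), list(offsets)
    return [], []


# A non-empty input whose only chunk is whitespace passes the old guard,
# the filter then drops everything, and `zip(*[])` yields nothing to unpack.
whitespace_only = (["\n\n"], [(0, 2)])  # hypothetical chunk/offset pair

try:
    filter_old(*whitespace_only)
except ValueError as error:
    print(f"old logic: ValueError: {error}")

print("new logic:", filter_new(*whitespace_only))
```

Both versions behave the same for empty input and for input containing at least one non-whitespace chunk; they differ only when every chunk is filtered out.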

tests/test_semchunk.py

Lines changed: 3 additions & 0 deletions
@@ -188,6 +188,9 @@ def test_semchunk() -> None:

     # Test chunking nothing to ensure no errors are raised.
     semchunk.chunk('', 512, lambda *args: 0)
+
+    # Test chunking whitespace to ensure no errors are raised.
+    semchunk.chunk('\n\n', 512, lambda *args: 0)

 if __name__ == '__main__':
     test_semchunk()
