Commit 9945642

Fixed bug.
1 parent 491c74e commit 9945642

4 files changed: +10 -2 lines changed


CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -1,6 +1,10 @@
 ## Changelog 🔄
 All notable changes to `semchunk` will be documented here. This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.0.1] - 2024-01-10
+### Fixed
+- Fixed a bug where attempting to chunk an empty text would raise a `ValueError`.
+
 ## [3.0.0] - 2024-12-31
 ### Added
 - Added an `offsets` argument to `chunk()` and `Chunker.__call__()` that specifies whether to return the start and end offsets of each chunk ([#9](https://github.com/umarbutler/semchunk/issues/9)). The argument defaults to `False`.
@@ -105,6 +109,7 @@ All notable changes to `semchunk` will be documented here. This project adheres
 ### Added
 - Added the `chunk()` function, which splits text into semantically meaningful chunks of a specified size as determined by a provided token counter.
 
+[3.0.1]: https://github.com/umarbutler/semchunk/compare/v3.0.0...v3.0.1
 [3.0.0]: https://github.com/umarbutler/semchunk/compare/v2.2.2...v3.0.0
 [2.2.2]: https://github.com/umarbutler/semchunk/compare/v2.2.1...v2.2.2
 [2.2.1]: https://github.com/umarbutler/semchunk/compare/v2.2.0...v2.2.1

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "semchunk"
-version = "3.0.0"
+version = "3.0.1"
 authors = [
 {name="Umar Butler", email="[email protected]"},
 ]

src/semchunk/semchunk.py

Lines changed: 1 addition & 1 deletion
@@ -183,7 +183,7 @@ def chunk(
     # If this is the first call, remove any empty chunks as well as chunks comprised entirely of whitespace and then overlap the chunks if desired and finally return the chunks, optionally with their offsets.
     if is_first_call:
         # Remove empty chunks.
-        chunks, offsets = zip(*[(chunk, offset) for chunk, offset in zip(chunks, offsets) if chunk and not chunk.isspace()])
+        chunks, offsets = zip(*[(chunk, offset) for chunk, offset in zip(chunks, offsets) if chunk and not chunk.isspace()]) if chunks else ([], []) # NOTE `if chunks else ([], [])` ensures that we don't unpack an empty list if there's no chunks (i.e., if the provided text was empty).
         chunks, offsets = list(chunks), list(offsets)
 
         # Overlap chunks if desired.
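
The one-line change above guards against unpacking the result of `zip(*[])`, which yields nothing and so cannot be assigned to two names. The following is a minimal standalone sketch of that failure mode and of one way to guard it; it is not taken from `semchunk`, and `drop_blank_chunks` and `Offset` are hypothetical names introduced only for illustration. Unlike the committed line, this sketch checks the filtered list rather than the input list, so it also covers input in which every chunk is whitespace; whether that case can reach this point in `semchunk` is not shown in the diff.

```python
from typing import List, Tuple

Offset = Tuple[int, int]  # hypothetical alias for a (start, end) pair


def drop_blank_chunks(
    chunks: List[str], offsets: List[Offset]
) -> Tuple[List[str], List[Offset]]:
    """Hypothetical helper: keep only chunks that contain non-whitespace text."""
    kept = [(c, o) for c, o in zip(chunks, offsets) if c and not c.isspace()]

    # zip(*[]) produces no items, so unpacking it into two names would raise
    # ValueError. Returning early on an empty filtered list avoids that.
    if not kept:
        return [], []

    kept_chunks, kept_offsets = zip(*kept)
    return list(kept_chunks), list(kept_offsets)


# The unguarded pattern, shown in isolation, fails on an empty list.
try:
    a, b = zip(*[])
    unguarded_error = ""
except ValueError as exc:
    unguarded_error = str(exc)
```

Calling `drop_blank_chunks([], [])` returns two empty lists instead of raising, which mirrors the behaviour the changelog entry describes for empty text.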

tests/test_semchunk.py

Lines changed: 3 additions & 0 deletions
@@ -185,6 +185,9 @@ def test_semchunk() -> None:
     # Try enabling a progress bar.
     chunker([DETERMINISTIC_TEST_INPUT, DETERMINISTIC_TEST_INPUT], progress = True)
     chunker([DETERMINISTIC_TEST_INPUT, DETERMINISTIC_TEST_INPUT], offsets = True, progress = True)
+
+    # Test chunking nothing to ensure no errors are raised.
+    semchunk.chunk('', 512, lambda *args: 0)
 
 if __name__ == '__main__':
     test_semchunk()
