Commit 3b35b1e

Fixed typo in error messages in chunkerify().
1 parent 3d1f7f7 commit 3b35b1e

3 files changed: +7 -3 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 ## Changelog 🔄
 All notable changes to `semchunk` will be documented here. This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.3.1] - 2024-05-18
+### Fixed
+- Fixed typo in error messages in `chunkerify()` where it was referred to as `make_chunker()`.
+
 ## [0.3.0] - 2024-05-18
 ### Added
 - Introduced the `chunkerify()` function, which constructs a chunker from a tokenizer or token counter that can be reused and can also chunk multiple texts in a single call. The resulting chunker speeds up chunking by 40.4% thanks, in large part, to a token counter that avoid having to count the number of tokens in a text when the number of characters in the text exceed a certain threshold, courtesy of [@R0bk](https://github.com/R0bk) ([#3](https://github.com/umarbutler/semchunk/pull/3)) ([337a186](https://github.com/umarbutler/semchunk/pull/3/commits/337a18615f991076b076262288b0408cb162b48c)).
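
The 0.3.0 entry mentions a token counter that skips full tokenization once a text passes a character threshold. A minimal sketch of that idea follows. It is not the code from the referenced pull request: the helper name `make_fast_token_counter`, the `max_token_chars` padding, and the six-characters-per-token guess are all assumptions made for illustration.

```python
from typing import Callable


def make_fast_token_counter(
    token_counter: Callable[[str], int],
    chunk_size: int,
    max_token_chars: int,
    chars_per_token_guess: int = 6,  # assumed heuristic, not taken from the changelog
) -> Callable[[str], int]:
    """Wrap `token_counter` so that very long texts are only partially tokenized."""
    char_threshold = chunk_size * chars_per_token_guess

    def fast_token_counter(text: str) -> int:
        if len(text) > char_threshold:
            # Tokenize a prefix only. The extra `max_token_chars` characters reduce
            # the chance that the prefix ends in the middle of a token.
            prefix = text[: char_threshold + max_token_chars]
            if token_counter(prefix) > chunk_size:
                # The text cannot fit in one chunk, so the exact count is not needed.
                return chunk_size + 1
        return token_counter(text)

    return fast_token_counter


# Hypothetical usage with a whitespace "tokenizer".
count_words = lambda text: len(text.split())
fast_count = make_fast_token_counter(count_words, chunk_size=4, max_token_chars=10)
```

The wrapper returns an exact count only when it might matter. For texts that clearly overflow a chunk it reports `chunk_size + 1`, which is enough for a chunker that only asks whether a text fits.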

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "semchunk"
-version = "0.3.0"
+version = "0.3.1"
 authors = [
     {name="Umar Butler", email="[email protected]"},
 ]

src/semchunk/semchunk.py

Lines changed: 2 additions & 2 deletions
@@ -178,7 +178,7 @@ def chunkerify(
                 tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_or_token_counter)
             
             except Exception:
-                raise ValueError(f'"{tokenizer_or_token_counter}" was provided to `semchunk.make_chunker` as the name of a tokenizer but neither `tiktoken` nor `transformers` have a tokenizer by that name. Perhaps they are not installed or maybe there is a typo in that name?')
+                raise ValueError(f'"{tokenizer_or_token_counter}" was provided to `semchunk.chunkerify` as the name of a tokenizer but neither `tiktoken` nor `transformers` have a tokenizer by that name. Perhaps they are not installed or maybe there is a typo in that name?')
         
         tokenizer_or_token_counter = tokenizer
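
The hunk above sits at the end of a fallback chain: try to load a tokenizer by name, and raise a `ValueError` naming the public function if every attempt fails. A rough, self-contained sketch of that pattern is shown here. The helper `resolve_tokenizer` and its injectable `loaders` argument are hypothetical and are not part of `semchunk`; they only make the behaviour easy to check without `tiktoken` or `transformers` installed.

```python
from typing import Any, Callable, Sequence


def resolve_tokenizer(name: str, loaders: Sequence[Callable[[str], Any]]) -> Any:
    """Return the first tokenizer that any loader can build from `name`."""
    for load in loaders:
        try:
            return load(name)
        except Exception:
            # This loader does not know the name (or is unavailable); try the next one.
            continue

    # The message names the function the caller actually used, which is the
    # detail this commit corrects.
    raise ValueError(
        f'"{name}" was provided to `semchunk.chunkerify` as the name of a tokenizer '
        "but no loader has a tokenizer by that name."
    )


def always_fails(name: str) -> Any:
    """Hypothetical loader that never recognises the name."""
    raise KeyError(name)
```

A regression check for this kind of typo can simply assert that the raised message contains `chunkerify` and does not contain `make_chunker`.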

@@ -206,7 +206,7 @@ def chunkerify(
             chunk_size -= len(tokenizer_or_token_counter.encode(''))
         
         else:
-            raise ValueError("Your desired chunk size was not passed to `semchunk.make_chunker` and the provided tokenizer either lacks an attribute named 'model_max_length' or that attribute is not an integer. Either specify a chunk size or provide a tokenizer that has a 'model_max_length' attribute that is an integer.")
+            raise ValueError("Your desired chunk size was not passed to `semchunk.chunkerify` and the provided tokenizer either lacks an attribute named 'model_max_length' or that attribute is not an integer. Either specify a chunk size or provide a tokenizer that has a 'model_max_length' attribute that is an integer.")
     
     # If we have been given a tokenizer, construct a token counter from it.
     if hasattr(tokenizer_or_token_counter, 'encode'):
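
The second hunk belongs to the logic that picks a default chunk size when none is given. As far as the visible context shows, it falls back to the tokenizer's `model_max_length`, subtracts the tokens produced by encoding an empty string, and otherwise raises. The sketch below mirrors that behaviour under stated assumptions: `default_chunk_size` and `FakeTokenizer` are hypothetical names, and guarding the subtraction with a `hasattr` check is a guess because the surrounding lines are not part of the diff.

```python
from typing import Any, Optional


def default_chunk_size(tokenizer: Any, chunk_size: Optional[int] = None) -> int:
    """Pick a chunk size, falling back to the tokenizer's `model_max_length`."""
    if chunk_size is not None:
        return chunk_size

    max_length = getattr(tokenizer, "model_max_length", None)
    if not isinstance(max_length, int):
        raise ValueError(
            "Your desired chunk size was not passed to `semchunk.chunkerify` and the "
            "provided tokenizer has no integer 'model_max_length' attribute."
        )

    # Encoding an empty string yields only the special tokens the tokenizer adds,
    # so subtracting their count leaves room for them in every chunk.
    if hasattr(tokenizer, "encode"):  # assumed guard, not visible in the diff
        max_length -= len(tokenizer.encode(""))
    return max_length


class FakeTokenizer:
    """Hypothetical stand-in that wraps every input in two special tokens."""

    model_max_length = 512

    def encode(self, text: str) -> list:
        return [0, *map(ord, text), 1]
```

With this stand-in, the default chunk size is the model limit minus the two special tokens, and an object without `model_max_length` triggers the corrected error message.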
