Commit 65298f9

Better documented the progress argument.
1 parent 525890b commit 65298f9

3 files changed, +6 -2 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 ## Changelog 🔄
 All notable changes to `semchunk` will be documented here. This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.1] - 2024-06-02
+### Fixed
+- Documented the `progress` argument in the docstring for `chunkerify()` and in its type hints in the README.
+
 ## [1.0.0] - 2024-06-02
 ### Added
 - Added a `progress` argument to the chunker returned by `chunkerify()` that, when set to `True` and multiple texts are passed, displays a progress bar.

README.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ def chunkerify(
     chunk_size: int = None,
     max_token_chars: int = None,
     memoize: bool = True,
-) -> Callable[[str | Sequence[str]], list[str] | list[list[str]]]:
+) -> Callable[[str | Sequence[str], bool], list[str] | list[list[str]]]:
 ```
 
 `chunkerify()` constructs a chunker that splits one or more texts into semantically meaningful chunks of a specified size as determined by the provided tokenizer or token counter.

src/semchunk/semchunk.py

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ def chunkerify(
         memoize (bool, optional): Whether to memoize the token counter. Defaults to `True`.
 
     Returns:
-        Callable[[str | Sequence[str]], list[str] | list[list[str]]]: A function that takes either a single text or a sequence of texts and returns, if a single text has been provided, a list of chunks up to `chunk_size`-tokens-long with any whitespace used to split the text removed, or, if multiple texts have been provided, a list of lists of chunks, with each inner list corresponding to the chunks of one of the provided input texts."""
+        Callable[[str | Sequence[str], bool], list[str] | list[list[str]]]: A function that takes either a single text or a sequence of texts and returns, if a single text has been provided, a list of chunks up to `chunk_size`-tokens-long with any whitespace used to split the text removed, or, if multiple texts have been provided, a list of lists of chunks, with each inner list corresponding to the chunks of one of the provided input texts. The function can also be passed a `progress` argument which if set to `True` and multiple texts are passed, will display a progress bar."""
 
     # If the provided tokenizer is a string, try to load it with either `tiktoken` or `transformers` or raise an error if neither is available.
     if isinstance(tokenizer_or_token_counter, str):
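
The updated return type describes a callable that takes a text or a sequence of texts plus a `bool`, and returns either a flat list of chunks or a list of lists. As a rough illustration of that call shape only, here is a minimal sketch of a stand-in chunker. `make_toy_chunker` and its word-grouping logic are hypothetical and are not how `semchunk` splits text, and the stderr counter is an assumed substitute for the progress bar the docstring mentions.

```python
import sys
from typing import Callable, Sequence, Union

# Equivalent to the `str | Sequence[str]` and `list[str] | list[list[str]]` hints in the diff.
TextOrTexts = Union[str, Sequence[str]]
Chunks = Union[list[str], list[list[str]]]


def make_toy_chunker(chunk_size: int) -> Callable[..., Chunks]:
    """Hypothetical stand-in for the chunker returned by `chunkerify()`.

    It groups whitespace-separated words, which is NOT semchunk's algorithm.
    """

    def chunk_one(text: str) -> list[str]:
        words = text.split()
        return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

    def chunker(text_or_texts: TextOrTexts, progress: bool = False) -> Chunks:
        # A single text yields a flat list of chunks.
        if isinstance(text_or_texts, str):
            return chunk_one(text_or_texts)

        # Multiple texts yield one inner list per input text.
        results = []
        total = len(text_or_texts)
        for done, text in enumerate(text_or_texts, start=1):
            results.append(chunk_one(text))
            if progress:
                # Crude counter on stderr, standing in for a real progress bar.
                print(f"\rchunked {done}/{total}", end="", file=sys.stderr)
        if progress and total:
            print(file=sys.stderr)
        return results

    return chunker


chunker = make_toy_chunker(chunk_size=2)
single = chunker("the quick brown fox jumps")
batch = chunker(["the quick brown fox", "jumps over"], progress=True)
```

In this sketch `progress` only has an effect when a sequence of texts is passed, matching the behaviour described in the changelog entry. A single string ignores it.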
