Commit a58a7ff

Fixed links in the README.
1 parent 4dfbdd5 commit a58a7ff

3 files changed: +8 -4 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 ## Changelog 🔄
 All notable changes to `semchunk` will be documented here. This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.1] - 2023-11-07
+### Fixed
+- Fixed links in the README.
+
 ## [0.1.1] - 2023-11-07
 ### Added
 - Added new test samples.

README.md

Lines changed: 3 additions & 3 deletions
@@ -3,7 +3,7 @@
 
 `semchunk` is a fast and lightweight pure Python library for splitting text into semantically meaningful chunks.
 
-Owing to its complex yet highly efficient chunking algorithm, `semchunk` is both more semantically accurate than [`langchain.text_splitter.RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) (see [How It Works 🔍](#how-it-works-)) and is also over 60% faster than [`semantic-text-splitter`](https://pypi.org/project/semantic-text-splitter/) (see the [Benchmarks 📊](#benchmarks-)).
+Owing to its complex yet highly efficient chunking algorithm, `semchunk` is both more semantically accurate than [`langchain.text_splitter.RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) (see [How It Works 🔍](https://github.com/umarbutler/semchunk#how-it-works-)) and is also over 60% faster than [`semantic-text-splitter`](https://pypi.org/project/semantic-text-splitter/) (see the [Benchmarks 📊](https://github.com/umarbutler/semchunk#benchmarks-)).
 
 ## Installation 📦
 `semchunk` may be installed with `pip`:
@@ -63,7 +63,7 @@ To ensure that chunks are as semantically meaningful as possible, `semchunk` use
 ## Benchmarks 📊
 On a desktop with a Ryzen 3600, 64 GB of RAM, Windows 11 and Python 3.12.0, it takes `semchunk` 35.75 seconds to split every sample in [NLTK's Gutenberg Corpus](https://www.nltk.org/howto/corpus.html#plaintext-corpora) into 512-token-long chunks (for context, the Corpus contains 18 texts and 3,001,260 tokens). By comparison, it takes [`semantic-text-splitter`](https://pypi.org/project/semantic-text-splitter/) 1 minute and 50.5 seconds to chunk the same texts into 512-token-long chunks — a difference of 67.65%.
 
-The code used to benchmark `semchunk` and `semantic-text-splitter` is available [here](tests/bench.py).
+The code used to benchmark `semchunk` and `semantic-text-splitter` is available [here](https://github.com/umarbutler/semchunk/blob/main/tests/bench.py).
 
 ## Licence 📄
-This library is licensed under the [MIT License](https://github.com/umarbutler/semchunk/blob/main/LICENSE).
+This library is licensed under the [MIT License](https://github.com/umarbutler/semchunk/blob/main/LICENCE).

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "semchunk"
-version = "0.1.1"
+version = "0.1.2"
 authors = [
 {name="Umar Butler", email="[email protected]"},
 ]
