Commit d2c733f

Updated links in README.
1 parent 10c8dd2 commit d2c733f

1 file changed: +4 -4 lines changed

README.md

Lines changed: 4 additions & 4 deletions
@@ -1,15 +1,15 @@
 <div align='center'>
 
 # semchunk 🧩
-<a href="https://pypi.org/project/semchunk/" alt="PyPI Version"><img src="https://img.shields.io/pypi/v/semchunk"></a> <a href="https://github.com/umarbutler/semchunk/actions/workflows/ci.yml" alt="Build Status"><img src="https://img.shields.io/github/actions/workflow/status/umarbutler/semchunk/ci.yml?branch=main"></a> <a href="https://app.codecov.io/gh/umarbutler/semchunk" alt="Code Coverage"><img src="https://img.shields.io/codecov/c/github/umarbutler/semchunk"></a> <a href="https://pypistats.org/packages/semchunk" alt="Downloads"><img src="https://img.shields.io/pypi/dm/semchunk"></a>
+<a href="https://pypi.org/project/semchunk/" alt="PyPI Version"><img src="https://img.shields.io/pypi/v/semchunk"></a> <a href="https://github.com/isaacus-dev/semchunk/actions/workflows/ci.yml" alt="Build Status"><img src="https://img.shields.io/github/actions/workflow/status/isaacus-dev/semchunk/ci.yml?branch=main"></a> <a href="https://app.codecov.io/gh/isaacus-dev/semchunk" alt="Code Coverage"><img src="https://img.shields.io/codecov/c/github/isaacus-dev/semchunk"></a> <a href="https://pypistats.org/packages/semchunk" alt="Downloads"><img src="https://img.shields.io/pypi/dm/semchunk"></a>
 
 </div>
 
 `semchunk` is a fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
 
 It has built-in support for tokenizers from OpenAI's `tiktoken` and Hugging Face's `transformers` and `tokenizers` libraries, in addition to supporting custom tokenizers and token counters. It can also overlap chunks as well as return their offsets.
 
-Powered by an efficient yet highly accurate chunking algorithm ([How It Works 🔍](https://github.com/umarbutler/semchunk#how-it-works-)), `semchunk` produces chunks that are more semantically meaningful than regular token and recursive character chunkers like `langchain`'s `RecursiveCharacterTextSplitter`, while also being 80% faster than its closest alternative, `semantic-text-splitter` ([Benchmarks 📊](https://github.com/umarbutler/semchunk#benchmarks-)).
+Powered by an efficient yet highly accurate chunking algorithm ([How It Works 🔍](https://github.com/isaacus-dev/semchunk#how-it-works-)), `semchunk` produces chunks that are more semantically meaningful than regular token and recursive character chunkers like `langchain`'s `RecursiveCharacterTextSplitter`, while also being 80% faster than its closest alternative, `semantic-text-splitter` ([Benchmarks 📊](https://github.com/isaacus-dev/semchunk#benchmarks-)).
 
 ## Installation 📦
 `semchunk` can be installed with `pip`:
@@ -147,7 +147,7 @@ If overlapping chunks have been requested, `semchunk` also:
 ## Benchmarks 📊
 On a desktop with a Ryzen 9 7900X, 96 GB of DDR5 5600MHz CL40 RAM, Windows 11 and Python 3.12.4, it takes `semchunk` 2.96 seconds to split every sample in [NLTK's Gutenberg Corpus](https://www.nltk.org/howto/corpus.html#plaintext-corpora) into 512-token-long chunks with GPT-4's tokenizer (for context, the Corpus contains 18 texts and 3,001,260 tokens). By comparison, it takes [`semantic-text-splitter`](https://pypi.org/project/semantic-text-splitter/) (with multiprocessing) 23.28 seconds to chunk the same texts into 512-token-long chunks — a difference of 87.28%.
 
-The code used to benchmark `semchunk` and `semantic-text-splitter` is available [here](https://github.com/umarbutler/semchunk/blob/main/tests/bench.py).
+The code used to benchmark `semchunk` and `semantic-text-splitter` is available [here](https://github.com/isaacus-dev/semchunk/blob/main/tests/bench.py).
 
 ## Licence 📄
-This library is licensed under the [MIT License](https://github.com/umarbutler/semchunk/blob/main/LICENCE).
+This library is licensed under the [MIT License](https://github.com/isaacus-dev/semchunk/blob/main/LICENCE).
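
Every changed line in this commit swaps the repository slug `umarbutler/semchunk` for `isaacus-dev/semchunk`, in the GitHub, shields.io and Codecov URLs alike. The commit does not say how the edit was made; the following is only a minimal sketch of one way to reproduce it, assuming a plain string substitution is enough. The names `OLD_SLUG`, `NEW_SLUG` and `update_links` are hypothetical and do not come from the repository.

```python
# Hypothetical helper: rewrite the repository slug in README text.
# The two slugs are taken from the diff above; everything else is illustrative.
OLD_SLUG = "umarbutler/semchunk"
NEW_SLUG = "isaacus-dev/semchunk"


def update_links(text: str) -> tuple[str, int]:
    """Return the rewritten text and the number of lines that changed."""
    old_lines = text.splitlines(keepends=True)
    new_lines = [line.replace(OLD_SLUG, NEW_SLUG) for line in old_lines]
    changed = sum(1 for old, new in zip(old_lines, new_lines) if old != new)
    return "".join(new_lines), changed


# Two of the lines touched by the commit, used here as sample input.
sample = (
    "The code used to benchmark `semchunk` and `semantic-text-splitter` is available "
    "[here](https://github.com/umarbutler/semchunk/blob/main/tests/bench.py).\n"
    "\n"
    "This library is licensed under the "
    "[MIT License](https://github.com/umarbutler/semchunk/blob/main/LICENCE).\n"
)

updated, changed = update_links(sample)
print(f"{changed} line(s) changed")
```

Counting changed lines rather than replaced substrings matches how the diff reports the change: a line with several links, such as the badge row, still counts as one addition and one deletion.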
