2 files changed: +35 -0 lines changed
@@ -4,6 +4,7 @@ All notable changes to `semchunk` will be documented here. This project adheres
 ## [0.1.1] - 2023-11-07
 ### Added
 - Added new test samples.
+- Added benchmarks.

 ### Changed
 - Improved chunking performance.
+import semchunk
+import semantic_text_splitter
+import test_semchunk
+import time
+
+chunk_size = 512
+semantic_text_splitter_chunker = semantic_text_splitter.TiktokenTextSplitter('gpt-4')
+
+def bench_semchunk(text: str) -> None:
+    semchunk.chunk(text, chunk_size=chunk_size, token_counter=test_semchunk._token_counter)
+
+def bench_semantic_text_splitter(text: str) -> None:
+    semantic_text_splitter_chunker.chunks(text, chunk_size)
+
+libraries = {
+    'semchunk': bench_semchunk,
+    #'semantic_text_splitter': bench_semantic_text_splitter,
+}
+
+def bench() -> dict[str, float]:
+    benchmarks = dict.fromkeys(libraries.keys(), 0)
+
+    for fileid in test_semchunk.gutenberg.fileids():
+        sample = test_semchunk.gutenberg.raw(fileid)
+        for library, function in libraries.items():
+            start = time.time()
+            function(sample)
+            benchmarks[library] += time.time() - start
+
+    return benchmarks
+
+if __name__ == '__main__':
+    for library, time_taken in bench().items():
+        print(f'{library}: {time_taken:.2f}s')
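
The benchmark above accumulates wall-clock time per library across every sample. The same pattern can be sketched without the `semchunk` or NLTK dependencies, which may be useful for trying the harness in isolation. This is a minimal sketch, not part of the commit: `split_on_whitespace` is a hypothetical stand-in chunker, the `libraries` and `samples` parameters are assumptions introduced for illustration, and `time.perf_counter()` is swapped in for `time.time()` because it is monotonic and intended for measuring short intervals.

```python
import time
from typing import Callable


def bench(
    libraries: dict[str, Callable[[str], object]],
    samples: list[str],
) -> dict[str, float]:
    """Return the total seconds each function spends processing all samples."""
    # Start every total at 0.0 so the values are floats from the outset.
    totals = dict.fromkeys(libraries, 0.0)

    for sample in samples:
        for name, function in libraries.items():
            start = time.perf_counter()
            function(sample)
            totals[name] += time.perf_counter() - start

    return totals


def split_on_whitespace(text: str) -> list[str]:
    """Hypothetical stand-in for a real chunker; splits on whitespace only."""
    return text.split()


if __name__ == '__main__':
    results = bench(
        {'split_on_whitespace': split_on_whitespace},
        ['a b c ' * 1000],
    )
    for name, seconds in results.items():
        print(f'{name}: {seconds:.2f}s')
```

Passing the functions and samples in as arguments, rather than reading module-level globals, lets the same loop be reused with any callable that accepts a string.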