Skip to content

Commit 33c429f

Browse files
docs(README): update speed figures to min of v14,12 & 10 tests
1 parent 090dd67 commit 33c429f

File tree

1 file changed

+7
-4
lines changed

1 file changed

+7
-4
lines changed

README.md

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
## Developer friendly NLP ✨
66
[<img align="right" src="https://decisively.github.io/wink-logos/logo-title.png" width="100px" >](https://winkjs.org/)
77

8-
winkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP solutions **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy. The package can handle large amount of raw text at speeds over **600,000 tokens/second**. And with a test coverage of ~100%, winkNLP is a tool for building production grade systems with confidence.
8+
winkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP solutions **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy. The package can handle large amount of raw text at speeds over **525,000 tokens/second**. And with a test coverage of ~100%, winkNLP is a tool for building production grade systems with confidence.
99

1010
## Features
1111
It packs a rich feature set into a small foot print codebase of [under 1500 lines](https://coveralls.io/github/winkjs/wink-nlp?branch=master):
@@ -17,7 +17,7 @@ It packs a rich feature set into a small foot print codebase of [under 1500 line
1717
5. Extensive text pre-processing features
1818
6. Pre-trained models with sizes starting from <3MB onwards
1919
7. Word vector integration
20-
8. Comprehensive NLP pipeline covering tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, named entity extraction, custom entities detection and pattern matching
20+
8. Comprehensive NLP pipeline covering tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity extraction, custom entities detection and pattern matching
2121
9. No external dependencies.
2222

2323

@@ -75,13 +75,16 @@ console.log( doc.tokens().out( its.type, as.freqTable ) );
7575
```
7676

7777
## Speed & Accuracy
78-
The [winkNLP](https://winkjs.org/wink-nlp/) processes raw text at **>600,000 tokens per second** with its default language model — [wink-eng-lite-model](https://github.com/winkjs/wink-eng-lite-model), when [benchmarked](https://github.com/bestiejs/benchmark.js) using "Ch 13 of Ulysses by James Joyce" on a 2.2 GHz Intel Core i7 machine with 16GB RAM. The benchmark covered the entire NLP pipeline — tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This is way ahead of the prevailing speed benchmarks.
78+
The [winkNLP](https://winkjs.org/wink-nlp/) processes raw text at **~525,000 tokens per second** with its default language model — [wink-eng-lite-model](https://github.com/winkjs/wink-eng-lite-model), when [benchmarked](https://github.com/bestiejs/benchmark.js) using "Ch 13 of Ulysses by James Joyce" on a 2.2 GHz Intel Core i7 machine with 16GB RAM. The processing included the entire NLP pipeline — tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is way ahead of the prevailing speed benchmarks.
79+
80+
The benchmark was conducted on [Node.js versions 14.8.0, 12.18.3 and 10.22.0](https://nodejs.org/en/about/releases/).
7981

8082
It pos tags a subset of WSJ corpus with an accuracy of **~94.7%** — this includes *tokenization of raw text prior to pos tagging*. The current state-of-the-art is at ~97% accuracy but at lower speeds and is generally computed using gold standard pre-tokenized corpus.
8183

8284
Its general purpose sentiment analysis delivers a [f-score](https://en.wikipedia.org/wiki/F1_score) of **~84.5%**, when validated using Amazon Product Review [Sentiment Labelled Sentences Data Set](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/) at [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). The current benchmark accuracy for **specifically trained** models can range around 95%.
8385

84-
winkNLP delivers this performance with the minimal load on RAM. For example, it processes the entire [History of India Volume I](https://en.wikisource.org/wiki/History_of_India/Volume_1) with a peak memory requirement of under **80MB**. The book has around 350 pages which translates to over 125,000 tokens.
86+
## Memory Requirement
87+
Wink NLP delivers this performance with the minimal load on RAM. For example, it processes the entire [History of India Volume I](https://en.wikisource.org/wiki/History_of_India/Volume_1) with a total peak memory requirement of under **80MB**. The book has around 350 pages which translates to over 125,000 tokens.
8588

8689
## Documentation
8790
- [Concepts](https://winkjs.org/wink-nlp/getting-started.html) — everything you need to know to get started.

0 commit comments

Comments
 (0)