Skip to content

Commit c560a18

Browse files
docs(README): add section on speed & accuracy
1 parent b09bc6d commit c560a18

File tree

1 file changed

+9
-1
lines changed

1 file changed

+9
-1
lines changed

README.md

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
## Developer friendly NLP ✨
66
[<img align="right" src="https://decisively.github.io/wink-logos/logo-title.png" width="100px" >](https://winkjs.org/)
77

8-
winkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP solutions **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy. The package can handle large amount of raw text at speeds over **500,000 tokens/second**. And with a test coverage of ~100%, winkNLP is a tool for building production grade systems with confidence.
8+
winkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP solutions **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy. The package can handle large amount of raw text at speeds over **600,000 tokens/second**. And with a test coverage of ~100%, winkNLP is a tool for building production grade systems with confidence.
99

1010
## Features
1111
It packs a rich feature set into a small foot print codebase of [under 1500 lines](https://coveralls.io/github/winkjs/wink-nlp?branch=master):
@@ -74,6 +74,14 @@ console.log( doc.tokens().out( its.type, as.freqTable ) );
7474
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
7575
```
7676

77+
## Speed & Accuracy
78+
The [winkNLP](https://winkjs.org/wink-nlp/) processes raw text at **>600,000 tokens per second** with its default language model — [wink-eng-lite-model](https://github.com/winkjs/wink-eng-lite-model), when [benchmarked](https://github.com/bestiejs/benchmark.js) using "Ch 13 of Ulysses by James Joyce" on a 2.2 GHz Intel Core i7 machine with 16GB RAM. The benchmark covered the entire NLP pipeline — tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This is way ahead of the prevailing speed benchmarks.
79+
80+
It pos tags a subset of WSJ corpus with an accuracy of **~94.7%** — this includes *tokenization of raw text prior to pos tagging*. The current state-of-the-art is at ~97% accuracy but at far lower speeds.
81+
82+
Its general purpose sentiment analysis delivers a [f-score](https://en.wikipedia.org/wiki/F1_score) of **~84.5%**, when validated using Amazon Product Review [Sentiment Labelled Sentences Data Set](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/) at [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). The current accuracy for a **specifically trained** models can range around 95%.
83+
84+
winkNLP delivers this performance with the minimal load on RAM. For example, it processes the entire [History of India Volume I](https://en.wikisource.org/wiki/History_of_India/Volume_1) with a peak memory requirement of under **80MB**. The book has around 350 pages which translates to over 125,000 tokens.
7785

7886
## Documentation
7987
- [Concepts](https://winkjs.org/wink-nlp/getting-started.html) — everything you need to know to get started.

0 commit comments

Comments
 (0)