WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy.
It is built ground up with a lean code base that has [no external dependency](https://snyk.io/test/github/winkjs/wink-nlp?tab=dependencies). A test coverage of [~100%](https://coveralls.io/github/winkjs/wink-nlp?branch=master) and compliance with the [Open Source Security Foundation best practices](https://bestpractices.coreinfrastructure.org/en/projects/6035) make winkNLP the ideal tool for building production grade systems with confidence.
WinkNLP, with full [TypeScript support](https://github.com/winkjs/wink-nlp/blob/master/types/index.d.ts), runs on Node.js and browsers.
Head to [live examples](https://winkjs.org/examples.html) to explore further.
## Blazing fast
WinkNLP can easily process large amounts of raw text at speeds over <mark>**650,000 tokens/second**</mark> on an M1 MacBook Pro in both browser and Node.js environments. It even runs smoothly on a low-end smartphone's browser.
## Features

WinkNLP has a [comprehensive natural language processing (NLP) pipeline](https://winkjs.org/wink-nlp/processing-pipeline.html) covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner) and custom entities recognition (cer). It offers a rich feature set:
<table>
<tr><td style="width:45%">Fast, lossless & multilingual tokenizer ⚡️</td><td>For example, a multilingual text string <b><code style="font-size: 0.9em">"¡Hola! नमस्कार! Hi! Bonjour chéri"</code></b> tokenizes as <code style="font-size: 0.9em">["¡", "Hola", "!", "नमस्कार", "!", "Hi", "!", "Bonjour", "chéri"]</code>. It tokenizes text at <b>4 million</b> tokens/second on an M1 MBP's browser.</td></tr>
<tr><td>Developer friendly and intuitive <a href="https://winkjs.org/wink-nlp/getting-started.html">API</a> 💚</td><td>As simple as DOM manipulation; most <a href="https://observablehq.com/@winkjs/how-to-build-a-naive-wikification-tool?collection=@winkjs/winknlp-recipes">live examples</a> have <b>30-40</b> lines of code.</td></tr>
<tr><td>Best-in-class <a href="https://winkjs.org/wink-nlp/visualizing-markup.html">text visualization</a> 🖼</td><td>Programmatically <b><a href="https://winkjs.org/wink-nlp/markup.html">mark</a></b> tokens, sentences, entities, etc. using HTML mark or any other tag of your choice.</td></tr>
<tr><td>Extensive text processing features ♻️</td><td>Check out how a <a href="https://github.com/winkjs/wink-naive-bayes-text-classifier#readme">Naive Bayes classifier</a> achieves <b>impressive</b> chatbot intent classification accuracy with the right kind of winkNLP preprocessing.</td></tr>
<tr><td>Pre-trained <a href="https://winkjs.org/wink-nlp/language-models.html">language models</a> 🔠</td><td>Compact sizes starting from <b>&lt;3MB</b>.</td></tr>
<tr><td>Host of <a href="https://winkjs.org/wink-nlp/its-as-helper.html">utilities &amp; tools</a> 💼</td><td>BM25 vectorizer; several similarity methods – Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai; helpers to get bag of words, frequency table, lemma/stem, stop word removal and many more.</td></tr>
</table>
WinkJS also has packages like [Naive Bayes classifier](https://github.com/winkjs/wink-naive-bayes-text-classifier), [multi-class averaged perceptron](https://github.com/winkjs/wink-perceptron) and [popular token and string distance methods](https://github.com/winkjs/wink-distance), which complement winkNLP.
## Documentation

- [Concepts](https://winkjs.org/wink-nlp/getting-started.html) — everything you need to know to get started.
- [API Reference](https://winkjs.org/wink-nlp/read-doc.html) — explains usage of APIs with examples.
- [Change log](https://github.com/winkjs/wink-nlp/blob/master/CHANGELOG.md) — version history along with the details of breaking changes, if any.
- [Examples](https://winkjs.org/examples.html) — live examples with code to give you a head start.
## Installation
Use [npm](https://www.npmjs.com/package/wink-nlp) install:

```shell
npm install wink-nlp --save
```

In order to use winkNLP after its installation, you also need to install a language model according to the Node.js version used. The table below outlines the version-specific installation command:
| Node.js Version | Installation |
| --- | --- |
| 16 or 18 | `npm install wink-eng-lite-web-model --save` |
| 14 or 12 | `node -e "require('wink-nlp/models/install')"` |

The [wink-eng-lite-web-model](https://github.com/winkjs/wink-eng-lite-web-model) is designed to work with Node.js version 16 or 18. It can also work on browsers as described in the next section. This is the **recommended** model.
The second command installs the [wink-eng-lite-model](https://github.com/winkjs/wink-eng-lite-model), which works with Node.js version 14 or 12.
### How to install for Web Browser
If you’re using winkNLP in the browser use the [wink-eng-lite-web-model](https://www.npmjs.com/package/wink-eng-lite-web-model). Learn about its installation and usage in our [guide to using winkNLP in the browser](https://winkjs.org/wink-nlp/wink-nlp-in-browsers.html). Explore **[winkNLP recipes](https://observablehq.com/collection/@winkjs/winknlp-recipes)** on [Observable](https://observablehq.com/) for live browser based examples.
Experiment with winkNLP on [RunKit](https://npm.runkit.com/wink-nlp).
## Speed & Accuracy
[WinkNLP](https://winkjs.org/wink-nlp/) processes raw text at **~650,000 tokens per second** with its [wink-eng-lite-web-model](https://github.com/winkjs/wink-eng-lite-web-model), when [benchmarked](https://github.com/bestiejs/benchmark.js) using "Ch 13 of Ulysses by James Joyce" on an M1 MacBook Pro with 16GB RAM. The processing included the entire NLP pipeline — tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is way ahead of the prevailing speed benchmarks.
[WinkNLP](https://winkjs.org/wink-nlp/) delivers similar performance on browsers; its performance on a specific machine/browser combination can be measured using the Observable notebook — [How to measure winkNLP's speed on browsers?](https://observablehq.com/@winkjs/how-to-measure-winknlps-speed-on-browsers?collection=@winkjs/winknlp-recipes).
The benchmark was conducted on [Node.js versions 16 and 18](https://nodejs.org/en/about/releases/).
It pos tags a subset of the WSJ corpus with an accuracy of **~94.7%** — this includes *tokenization of raw text prior to pos tagging*. The present state-of-the-art is at ~97% accuracy, but at lower speeds, and is generally computed using a gold standard pre-tokenized corpus.
Its general purpose sentiment analysis delivers an [f-score](https://en.wikipedia.org/wiki/F1_score) of **~84.5%**, when validated using the Amazon Product Review [Sentiment Labelled Sentences Data Set](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/) at the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). The current benchmark accuracy for **specifically trained** models can range around 95%.
## Memory Requirement
WinkNLP delivers this performance with minimal load on RAM. For example, it processes the entire [History of India Volume I](https://en.wikisource.org/wiki/History_of_India/Volume_1) with a total peak memory requirement of under **80MB**. The book has around 350 pages, which translates to over 125,000 tokens.
## Need Help?

If you spot a bug and the same has not yet been reported, raise a new [issue](https://github.com/winkjs/wink-nlp/issues).
Looking for a new feature, request it via the [new features & ideas](https://github.com/winkjs/wink-nlp/discussions/categories/new-features-ideas) discussion forum or consider becoming a [contributor](https://github.com/winkjs/wink-nlp/blob/master/CONTRIBUTING.md).
## About winkJS
[WinkJS](https://winkjs.org/) is a family of open source packages for **Natural Language Processing**, **Machine Learning**, and **Statistical Analysis** in NodeJS. The code is **thoroughly documented** for easy human comprehension and has a **test coverage of ~100%** for reliability to build production grade solutions.
## Copyright & License
**Wink NLP** is copyright 2017-22 [GRAYPE Systems Private Limited](https://graype.in/).
It is licensed under the terms of the MIT License.