Commit 2dc3d15

docs(README): improve contents
closes #93 closes #92 references #91
1 parent 82c8206 commit 2dc3d15

1 file changed

+41
-58
lines changed

README.md

Lines changed: 41 additions & 58 deletions
Original file line number | Diff line number | Diff line change
@@ -5,43 +5,46 @@
55
## Developer friendly Natural Language Processing ✨
66
[<img align="right" src="https://decisively.github.io/wink-logos/logo-title.png" width="100px" >](https://winkjs.org/)
77

8-
winkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP solutions **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy. The package can handle large amount of raw text at speeds over **525,000 tokens/second**. And with a test coverage of ~100%, winkNLP is a tool for building production grade systems with confidence.
8+
WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy.
99

10-
[<img src="https://user-images.githubusercontent.com/9491/100614781-ad17bb00-333c-11eb-87ab-2ae41aa21285.png" alt="Wink Wizard Showcase">](https://winkjs.org/showcase-wiz/)
10+
It is built ground up with a lean code base that has [no external dependency](https://snyk.io/test/github/winkjs/wink-nlp?tab=dependencies). A test coverage of [~100%](https://coveralls.io/github/winkjs/wink-nlp?branch=master) and compliance with the [Open Source Security Foundation best practices](https://bestpractices.coreinfrastructure.org/en/projects/6035) make winkNLP the ideal tool for building production grade systems with confidence.
1111

12+
WinkNLP, with full [TypeScript support](https://github.com/winkjs/wink-nlp/blob/master/types/index.d.ts), runs on Node.js and in browsers.
1213

13-
## Features
14-
WinkNLP has a comprehensive natural language processing (NLP) pipeline covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner), custom entities recognition (cer):
15-
16-
<img src="https://winkjs.org/images/wink-nlp-processing-pipeline.png" alt="Processing pipeline: text, tokenization, SBD, negation, sentiment, NER, POS, CER" title="WinkNLP processing pipeline">
17-
18-
At every stage a range of properties become accessible for tokens, sentences, and entities. Read more about the processing pipeline and how to configure it in the [winkNLP documentation](https://winkjs.org/wink-nlp/processing-pipeline.html).
19-
20-
21-
It packs a rich feature set into a small foot print codebase of [under 1500 lines](https://coveralls.io/github/winkjs/wink-nlp?branch=master):
22-
23-
1. Fast, lossless & multilingual [tokenizer](https://winkjs.org/wink-nlp/processing-pipeline.html)
24-
25-
2. Developer friendly and intuitive [API](https://winkjs.org/wink-nlp/getting-started.html)
14+
## Build amazing apps quickly
15+
| [Wikipedia article timeline](https://winkjs.org/showcase-timeline/) | [Context aware word cloud](https://observablehq.com/@winkjs/how-to-create-a-context-aware-word-cloud) | [Key sentences detection](https://observablehq.com/@winkjs/how-to-visualize-key-sentences-in-a-document) |
16+
| --- | --- | --- |
17+
| [<img src="https://user-images.githubusercontent.com/29990/202497363-19c30578-8146-4f36-9c4b-4de613610837.png">](https://winkjs.org/showcase-timeline/)| [<img src="https://user-images.githubusercontent.com/29990/202506181-1a926ee0-788f-4aa1-aeac-a097f09fe747.png">](https://observablehq.com/@winkjs/how-to-create-a-context-aware-word-cloud)|[<img src="https://user-images.githubusercontent.com/29990/202506490-7f999d12-8319-4969-b92b-0649559ffbe6.png">](https://observablehq.com/@winkjs/how-to-visualize-key-sentences-in-a-document)|
2618

27-
3. Built-in [API](https://winkjs.org/wink-nlp/visualizing-markup.html) to aid [text visualization](https://observablehq.com/@winkjs/how-to-perform-sentiment-analysis?collection=@winkjs/winknlp-recipes)
19+
Head to [live examples](https://winkjs.org/examples.html) to explore further.
2820

29-
4. Extensive [text processing features](https://winkjs.org/wink-nlp/its-as-helper.html) such as bag-of-words, frequency table, stop word removal, readability statistics computation and many more.
21+
## Blazing fast
22+
WinkNLP can easily process large amounts of raw text at speeds over <mark>**650,000 tokens/second**</mark> on an M1 MacBook Pro in both browser and Node.js environments. It even runs smoothly on a low-end smartphone's browser.
3023

31-
5. Pre-trained [language models](https://winkjs.org/wink-nlp/language-models.html) with sizes starting from <3MB onwards
24+
| Environment | Benchmarking Command |
25+
|--- | --- |
26+
| Node.js | [node benchmark/run](https://github.com/winkjs/wink-nlp/tree/master/benchmark) |
27+
| Browser | [How to measure winkNLP's speed on browsers?](https://observablehq.com/@winkjs/how-to-measure-winknlps-speed-on-browsers) |
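As a rough illustration of how a tokens/second figure is computed, here is a minimal sketch. It is not the project's benchmark script: `naiveTokenize` is a hypothetical stand-in tokenizer and `tokensPerSecond` is a helper invented for this example. To time winkNLP itself, pass in a function that wraps its processing instead.

```javascript
// Minimal sketch: estimate tokens/second for any tokenizer function.
// `naiveTokenize` is a hypothetical stand-in, NOT winkNLP's tokenizer.
const { performance } = require('perf_hooks');

function naiveTokenize(text) {
  // A token is a run of letters/marks/digits, or one other non-space symbol.
  return text.match(/[\p{L}\p{M}\p{N}]+|[^\s\p{L}\p{M}\p{N}]/gu) || [];
}

function tokensPerSecond(tokenize, text, runs = 5) {
  let tokens = 0;
  const start = performance.now();
  for (let i = 0; i < runs; i += 1) {
    tokens += tokenize(text).length;
  }
  const elapsedSeconds = (performance.now() - start) / 1000;
  return { tokens, elapsedSeconds, rate: tokens / elapsedSeconds };
}

// Repeat a short multilingual sample to get a measurable workload.
const sample = '¡Hola! नमस्कार! Hi! Bonjour chéri '.repeat(2000);
const result = tokensPerSecond(naiveTokenize, sample);
console.log(`${Math.round(result.rate)} tokens/second over ${result.tokens} tokens`);
```

Numbers from a sketch like this depend heavily on the machine, the text and warm-up effects, so they are not comparable with the published figures.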
3228

33-
6. [BM25-based vectorizer](https://winkjs.org/wink-nlp/bm25-vectorizer.html)
34-
35-
7. Multiple [similarity](https://winkjs.org/wink-nlp/similarity.html) methods
36-
37-
8. Word vector integration
38-
39-
9. No external dependencies
29+
## Features
30+
WinkNLP has a [comprehensive natural language processing (NLP) pipeline](https://winkjs.org/wink-nlp/processing-pipeline.html) covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner), custom entities recognition (cer). It offers a rich feature set:
4031

41-
10. [Runs on web browsers](https://winkjs.org/wink-nlp/wink-nlp-in-browsers.html)
32+
<table>
33+
<tr><td style="width:45%">Fast, lossless & multilingual tokenizer ⚡️</td><td>For example, a multilingual text string <b><code style="font-size: 0.9em">"¡Hola! नमस्कार! Hi! Bonjour chéri"</code></b> tokenizes as <code style="font-size: 0.9em">["¡", "Hola", "!", "नमस्कार", "!", "Hi", "!", "Bonjour", "chéri"]</code>. It tokenizes text at <b>4 million</b> tokens/second on an M1 MBP's browser.</td></tr>
34+
<tr><td>Developer friendly and intuitive <a href="https://winkjs.org/wink-nlp/getting-started.html">API</a> 💚</td><td>As simple as DOM manipulation; most <a href="https://observablehq.com/@winkjs/how-to-build-a-naive-wikification-tool?collection=@winkjs/winknlp-recipes">live examples</a> have <b>30-40</b> lines of code.</td></tr>
35+
<tr><td>Best-in-class <a href="https://winkjs.org/wink-nlp/visualizing-markup.html">text visualization</a> 🖼</td><td>Programmatically <b><a href="https://winkjs.org/wink-nlp/markup.html">mark</a></b> tokens, sentences, entities, etc. using HTML mark or any other tag of your choice.</td></tr>
36+
<tr><td>Extensive text processing features ♻️</td><td>Check out how a <a href="https://github.com/winkjs/wink-naive-bayes-text-classifier#readme">Naive Bayes classifier</a> achieves <b>impressive</b> chatbot intent classification accuracy with the right kind of preprocessing with winkNLP.</td></tr>
37+
<tr><td>Pre-trained <a href="https://winkjs.org/wink-nlp/language-models.html">language models</a> 🔠</td><td>Compact sizes starting from <b>&lt;3MB</b>.</td></tr>
38+
<tr><td>Host of <a href="https://winkjs.org/wink-nlp/its-as-helper.html">utilities & tools</a> 💼</td><td>BM25 vectorizer; Several similarity methods – Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai; Helpers to get bag of words, frequency table, lemma/stem, stop word removal and many more.</td></tr>
39+
</table>
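
To make the similarity measures named in the table concrete, the sketch below implements them over plain JavaScript `Set` and `Map` values. This is an illustration of the textbook formulas only, not winkNLP's own code; every function name here is hypothetical.

```javascript
// Sketch of set- and vector-based similarity measures (textbook definitions).
// These helpers are illustrative and are NOT winkNLP's implementation.
function intersectionSize(a, b) {
  let n = 0;
  for (const x of a) if (b.has(x)) n += 1;
  return n;
}

// Tversky index: alpha = beta = 0.5 reduces to Sørensen-Dice.
function tversky(a, b, alpha = 0.5, beta = 0.5) {
  const common = intersectionSize(a, b);
  const denom = common + alpha * (a.size - common) + beta * (b.size - common);
  return denom === 0 ? 0 : common / denom;
}

function sorensenDice(a, b) {
  const total = a.size + b.size;
  return total === 0 ? 0 : (2 * intersectionSize(a, b)) / total;
}

function otsukaOchiai(a, b) {
  const denom = Math.sqrt(a.size * b.size);
  return denom === 0 ? 0 : intersectionSize(a, b) / denom;
}

// Cosine similarity over bags of words stored as Map(term -> count).
function cosine(bowA, bowB) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, count] of bowA) {
    normA += count * count;
    dot += count * (bowB.get(term) || 0);
  }
  for (const count of bowB.values()) normB += count * count;
  const denom = Math.sqrt(normA * normB);
  return denom === 0 ? 0 : dot / denom;
}

const a = new Set(['the', 'cat', 'sat']);
const b = new Set(['the', 'cat', 'ran', 'away']);
console.log(sorensenDice(a, b), tversky(a, b), otsukaOchiai(a, b));
```

With binary term counts, cosine similarity and the Otsuka-Ochiai coefficient give the same value, which is a quick sanity check for the two functions.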
4240

43-
11. [Typescript support](https://github.com/winkjs/wink-nlp/blob/master/types/index.d.ts).
41+
WinkJS also has packages like [Naive Bayes classifier](https://github.com/winkjs/wink-naive-bayes-text-classifier), [multi-class averaged perceptron](https://github.com/winkjs/wink-perceptron) and [popular token and string distance methods](https://github.com/winkjs/wink-distance), which complement winkNLP.
4442

43+
## Documentation
44+
- [Concepts](https://winkjs.org/wink-nlp/getting-started.html) — everything you need to know to get started.
45+
- [API Reference](https://winkjs.org/wink-nlp/read-doc.html) — explains usage of APIs with examples.
46+
- [Change log](https://github.com/winkjs/wink-nlp/blob/master/CHANGELOG.md) — version history along with the details of breaking changes, if any.
47+
- [Examples](https://winkjs.org/examples.html) — live examples with code to give you a head start.
4548

4649
## Installation
4750

@@ -51,22 +54,22 @@ Use [npm](https://www.npmjs.com/package/wink-nlp) install:
5154
npm install wink-nlp --save
5255
```
5356

54-
In order to use winkNLP after its installation, you also need to install a language model according to the node version used. The following table outlines the version specific installation command:
57+
In order to use winkNLP after its installation, you also need to install a language model according to the Node.js version used. The table below outlines the version-specific installation commands:
5558

5659
| Node.js Version |Installation |
5760
| --- | --- |
5861
| 16 or 18 | `npm install wink-eng-lite-web-model --save` |
5962
| 14 or 12 | `node -e "require('wink-nlp/models/install')"` |
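
The table can also be expressed as a small lookup, which may be handy in a setup script. This is only a sketch: `modelInstallCommand` is a hypothetical helper, and it returns `null` for any Node.js version the table does not list rather than guessing.

```javascript
// Sketch: map a Node.js version string to the install command from the table.
// `modelInstallCommand` is a hypothetical helper, not part of winkNLP.
function modelInstallCommand(nodeVersion) {
  // Accepts "18.17.0" or "v18.17.0"; parseInt stops at the first dot.
  const major = Number.parseInt(String(nodeVersion).replace(/^v/, ''), 10);
  if (major === 16 || major === 18) {
    return 'npm install wink-eng-lite-web-model --save';
  }
  if (major === 12 || major === 14) {
    return 'node -e "require(\'wink-nlp/models/install\')"';
  }
  return null; // Version not covered by the table.
}

console.log(modelInstallCommand(process.versions.node));
```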
6063

61-
The [wink-eng-lite-web-model](https://github.com/winkjs/wink-eng-lite-web-model) is designed to work with Node.js version 16 or 18. It can also work on browsers as described in the next section.
64+
The [wink-eng-lite-web-model](https://github.com/winkjs/wink-eng-lite-web-model) is designed to work with Node.js version 16 or 18. It can also work on browsers as described in the next section. This is the **recommended** model.
6265

6366
The second command installs the [wink-eng-lite-model](https://github.com/winkjs/wink-eng-lite-model), which works with Node.js version 14 or 12.
6467

6568
### How to install for Web Browser
6669
If you’re using winkNLP in the browser, use the [wink-eng-lite-web-model](https://www.npmjs.com/package/wink-eng-lite-web-model). Learn about its installation and usage in our [guide to using winkNLP in the browser](https://winkjs.org/wink-nlp/wink-nlp-in-browsers.html). Explore **[winkNLP recipes](https://observablehq.com/collection/@winkjs/winknlp-recipes)** on [Observable](https://observablehq.com/) for live browser-based examples.
6770

68-
## Getting Started
69-
The "Hello World!" in winkNLP is given below:
71+
### Get started
72+
Here is the "Hello World!" of winkNLP:
7073

7174
```javascript
7275
// Load wink-nlp package.
@@ -99,40 +102,20 @@ console.log( doc.tokens().out() );
99102
console.log( doc.tokens().out( its.type, as.freqTable ) );
100103
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
101104
```
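
To show what a frequency table like the one in the last line of the example contains, here is a plain JavaScript sketch that needs no packages. It is not how winkNLP computes `as.freqTable`; the `freqTable` function and the `types` array are made up for illustration, and the alphabetical tie-break is an assumption.

```javascript
// Sketch of a frequency table: count items, then sort by descending count.
// Ties are broken alphabetically here; that ordering is an assumption.
function freqTable(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item, (counts.get(item) || 0) + 1);
  }
  return [...counts.entries()].sort(
    (x, y) => y[1] - x[1] || x[0].localeCompare(y[0])
  );
}

// Hypothetical token types, chosen to mirror the shape of the output above.
const types = [
  'word', 'punctuation', 'word', 'word',
  'emoji', 'word', 'word', 'punctuation'
];
console.log(freqTable(types));
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
```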
102-
Experiment with the above code on [RunKit](https://npm.runkit.com/wink-nlp).
103-
104-
### Explore Further
105-
Dive into [winkNLP's concepts](https://winkjs.org/wink-nlp/getting-started.html) or head to **[winkNLP recipes](https://observablehq.com/collection/@winkjs/winknlp-recipes)** for common NLP tasks or just explore live [showcases](https://winkjs.org/showcase.html) to learn:
106-
107-
#### [Wikipedia Timeline](https://winkjs.org/showcase-timeline/)
108-
Reads any wikipedia article and generates a visual timeline of all its events.
109-
110-
#### [NLP Wizard](https://winkjs.org/showcase-wiz/) 🧙
111-
Performs tokenization, sentence boundary detection, pos tagging, named entity detection and sentiment analysis of user input text in real time.
112-
113-
#### [Naive Wikification Tool](https://observablehq.com/@winkjs/how-to-build-a-naive-wikification-tool) 🔗
114-
Links entities such as famous persons, locations or objects to the relevant Wikipedia pages.
115-
105+
Experiment with winkNLP on [RunKit](https://npm.runkit.com/wink-nlp).
116106

117107
## Speed & Accuracy
118-
The [winkNLP](https://winkjs.org/wink-nlp/) processes raw text at **~525,000 tokens per second** with its default language model — [wink-eng-lite-model](https://github.com/winkjs/wink-eng-lite-model), when [benchmarked](https://github.com/bestiejs/benchmark.js) using "Ch 13 of Ulysses by James Joyce" on a 2.2 GHz Intel Core i7 machine with 16GB RAM. The processing included the entire NLP pipeline — tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is way ahead of the prevailing speed benchmarks.
119-
120-
The benchmark was conducted on [Node.js versions 14.8.0, and 12.18.3](https://nodejs.org/en/about/releases/). It delivered similar/better performance on Node.js versions 16/18.
108+
[winkNLP](https://winkjs.org/wink-nlp/) processes raw text at **~650,000 tokens per second** with its [wink-eng-lite-web-model](https://github.com/winkjs/wink-eng-lite-web-model), when [benchmarked](https://github.com/bestiejs/benchmark.js) using "Ch 13 of Ulysses by James Joyce" on an M1 MacBook Pro with 16GB RAM. The processing included the entire NLP pipeline: tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is well ahead of the prevailing speed benchmarks.
121109

122-
The [winkNLP](https://winkjs.org/wink-nlp/) delivers similar performance on browsers; its performance on a specific machine/browser combination can be measured using the Observable notebook — [How to measure winkNLP's speed on browsers?](https://observablehq.com/@winkjs/how-to-measure-winknlps-speed-on-browsers?collection=@winkjs/winknlp-recipes).
110+
The benchmark was conducted on [Node.js versions 16 and 18](https://nodejs.org/en/about/releases/).
123111

124-
It pos tags a subset of WSJ corpus with an accuracy of **~94.7%** — this includes *tokenization of raw text prior to pos tagging*. The current state-of-the-art is at ~97% accuracy but at lower speeds and is generally computed using gold standard pre-tokenized corpus.
112+
It POS-tags a subset of the WSJ corpus with an accuracy of **~94.7%**; this includes *tokenization of raw text prior to POS tagging*. The present state-of-the-art is at ~97% accuracy but at lower speeds, and is generally computed using a gold-standard pre-tokenized corpus.
125113

126114
Its general-purpose sentiment analysis delivers an [F-score](https://en.wikipedia.org/wiki/F1_score) of **~84.5%**, when validated using Amazon Product Review [Sentiment Labelled Sentences Data Set](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/) at [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). The current benchmark accuracy for **specifically trained** models can range around 95%.
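
For readers unfamiliar with the metric, the F-score combines precision and recall into one number. The sketch below computes it from confusion counts; `fScore` is a hypothetical helper and the counts are made up purely for illustration, with no relation to the evaluation reported above.

```javascript
// F-beta score from confusion counts; beta = 1 gives the familiar F1 score.
// tp = true positives, fp = false positives, fn = false negatives.
function fScore({ tp, fp, fn }, beta = 1) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const b2 = beta * beta;
  const denom = b2 * precision + recall;
  return denom === 0 ? 0 : ((1 + b2) * precision * recall) / denom;
}

// Made-up counts: precision = 6/8 = 0.75, recall = 6/10 = 0.6.
console.log(fScore({ tp: 6, fp: 2, fn: 4 }).toFixed(3));
// -> 0.667
```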
127115

128116
## Memory Requirement
129117
WinkNLP delivers this performance with minimal load on RAM. For example, it processes the entire [History of India Volume I](https://en.wikisource.org/wiki/History_of_India/Volume_1) with a total peak memory requirement of under **80MB**. The book has around 350 pages, which translates to over 125,000 tokens.
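
One way to check peak memory on your own machine is to read the process-wide maximum resident set size after running a workload in Node.js. The sketch below is an assumption about how such a measurement could be made, not the method used for the figure above; `peakMemoryMB` and the placeholder workload are hypothetical.

```javascript
// Sketch: report the peak resident memory of the current Node.js process
// after running a workload. `workload` is a hypothetical placeholder for
// whatever text processing you want to measure.
function peakMemoryMB(workload) {
  workload();
  // Node.js documents maxRSS as being reported in kilobytes.
  return process.resourceUsage().maxRSS / 1024;
}

// Placeholder workload: build and discard a large array of strings.
const peak = peakMemoryMB(() => {
  const words = [];
  for (let i = 0; i < 100000; i += 1) words.push(`token-${i}`);
  return words.length;
});
console.log(`Peak RSS: ${peak.toFixed(1)} MB`);
```

The value covers the whole process, including the Node.js runtime's own baseline, so compare it against a run with an empty workload.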
130118

131-
## Documentation
132-
- [Concepts](https://winkjs.org/wink-nlp/getting-started.html) — everything you need to know to get started.
133-
- [API Reference](https://winkjs.org/wink-nlp/read-doc.html) — explains usage of APIs with examples.
134-
- [Change log](https://github.com/winkjs/wink-nlp/blob/master/CHANGELOG.md) — version history along with the details of breaking changes, if any.
135-
- [Showcases](https://winkjs.org/showcase.html) — live examples with code to give you a head start.
136119

137120
## Need Help?
138121

@@ -146,11 +129,11 @@ If you spot a bug and the same has not yet been reported, raise a new [issue](ht
146129
Looking for a new feature? Request it via the [new features & ideas](https://github.com/winkjs/wink-nlp/discussions/categories/new-features-ideas) discussion forum or consider becoming a [contributor](https://github.com/winkjs/wink-nlp/blob/master/CONTRIBUTING.md).
147130

148131

149-
## About wink
150-
[Wink](https://winkjs.org/) is a family of open source packages for **Natural Language Processing**, **Machine Learning**, and **Statistical Analysis** in NodeJS. The code is **thoroughly documented** for easy human comprehension and has a **test coverage of ~100%** for reliability to build production grade solutions.
132+
## About winkJS
133+
[WinkJS](https://winkjs.org/) is a family of open source packages for **Natural Language Processing**, **Machine Learning**, and **Statistical Analysis** in NodeJS. The code is **thoroughly documented** for easy human comprehension and has a **test coverage of ~100%** for reliability to build production grade solutions.
151134

152135
## Copyright & License
153136

154137
**Wink NLP** is copyright 2017-22 [GRAYPE Systems Private Limited](https://graype.in/).
155138

156-
It is licensed under the terms of the MIT License.
139+
It is licensed under the terms of the MIT License.
