Commit ad3b655

docs(ROADMAP): add short/medium term roadmap

Parent commit: df86e20

ROADMAP.md (15 additions, 0 deletions)

# Roadmap 🧭

With winkNLP's production-ready release in late 2020, the core is already in place. Apart from maintaining it, our goal is to keep improving winkNLP by adding new features and capabilities. The features we plan to add to winkNLP are listed below:

| S. No. | Feature | Complexity | Status |
|---|---|---|---|
| 01. | **Extractive Summarization**:<br/>Add an `its.summary` helper to produce an extractive summary of text via `doc.out( its.summary )`. While it should be language agnostic, it should leverage the loaded language model's capabilities to improve summarization. | Simple | Work in progress |
| 02. | **Text Pre-processor**:<br/>Add a text preprocessing utility that provides options to (a) filter specific tokens based on their properties, such as `pos`, `isStopWordFlag`, and `type`; (b) map entity types to a definable keyword; (c) add bigrams and trigrams; and (d) inject sentiment. The API should follow winkNLP's style and standards. | Medium | Yet to start |
| 03. | **Word Vectors Integration**:<br/>Add integration with various word vectors, starting with GloVe. This should include compression/decompression for fast loading, and helpers for computing token, sentence, and document vectors. | High | Yet to start |
| 04. | **Sub-word Tokenizer**:<br/>Add sub-word tokenization using techniques such as Byte Pair Encoding (BPE) and/or WordPiece. The processing pipeline should allow the choice of tokenizer. | Very High | Yet to start |
| 05. | **Compose Corpus**:<br/>Add a utility to produce a training corpus from patterns using a cartesian product. | Simple | Yet to start |
| 06. | **Keywords Extraction**:<br/>Add an `its.keywords` helper to extract keywords/keyphrases from text via `doc.out( its.keywords )`. While it should be language agnostic, it should leverage the loaded language model's capabilities to improve extraction. | Simple | Yet to start |
| 07. | **BM25 Vectorizer**:<br/>Add a utility to train a BM25 model and to vectorize text using an already trained model. It follows winkNLP's API style. | Medium | [Completed](https://github.com/winkjs/wink-nlp/discussions/22) |
| 08. | **Constituency/Dependency Parser**:<br/>Add a constituency and/or dependency parser; the details are yet to be worked out. | Very High | Yet to start |

The above list is intended as a guideline for users and [contributors](https://github.com/winkjs/wink-nlp/blob/master/CONTRIBUTING.md). It is shared for information and to invite feedback, [participation & discussion](https://github.com/winkjs/wink-nlp/discussions/categories/new-features-ideas).

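Item 05 (Compose Corpus) describes producing a training corpus from patterns via a cartesian product. The idea can be sketched in plain JavaScript as follows. This is only an illustration under assumptions: the `[a|b]` pattern syntax and the `composeCorpus` and `cartesianProduct` functions are hypothetical and are not part of winkNLP's API.

```javascript
// Hypothetical sketch of pattern-based corpus composition.
// The "[a|b|c]" bracket syntax is an assumption for illustration only;
// it is not an existing winkNLP API.

// Returns the cartesian product of an array of arrays,
// e.g. [[a, b], [x]] -> [[a, x], [b, x]].
function cartesianProduct(lists) {
  return lists.reduce(
    (acc, options) => acc.flatMap((prefix) => options.map((o) => [...prefix, o])),
    [[]]
  );
}

// Expands a pattern such as "[I|We] [love|like] NLP" into every sentence
// it describes. Text outside brackets is kept verbatim.
function composeCorpus(pattern) {
  // Split into literal and bracketed segments, keeping the brackets.
  const parts = pattern.split(/(\[[^\]]*\])/).filter((p) => p !== '');
  // Each bracketed segment becomes a list of alternatives;
  // each literal segment becomes a single-item list.
  const choices = parts.map((p) =>
    p.startsWith('[') && p.endsWith(']') ? p.slice(1, -1).split('|') : [p]
  );
  return cartesianProduct(choices).map((combo) => combo.join(''));
}

console.log(composeCorpus('[I|We] [love|like] NLP'));
// [ 'I love NLP', 'I like NLP', 'We love NLP', 'We like NLP' ]
```

Each bracketed group contributes one factor to the product, so two groups of two alternatives each yield four sentences; literal text contributes a factor of one.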
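Item 07 (BM25 Vectorizer) is marked completed, and its actual API is described in the linked discussion. For readers unfamiliar with BM25, the term weighting such a vectorizer builds on can be sketched as below. The sketch assumes pre-tokenized documents, the commonly used defaults `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)); `bm25Vectors` is a hypothetical name, and winkNLP's own implementation may differ in defaults and details.

```javascript
// Hypothetical sketch of Okapi BM25 term weighting over pre-tokenized docs.
// Assumes k1 = 1.2, b = 0.75 and the non-negative IDF variant
// ln(1 + (N - n + 0.5) / (n + 0.5)); this is NOT winkNLP's implementation.
function bm25Vectors(docs, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N;

  // Document frequency: number of documents containing each term.
  const df = new Map();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) || 0) + 1);
  }

  return docs.map((doc) => {
    // Raw term frequencies within this document.
    const tf = new Map();
    for (const term of doc) tf.set(term, (tf.get(term) || 0) + 1);

    const weights = {};
    for (const [term, f] of tf) {
      const n = df.get(term);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Length normalization: longer-than-average docs are penalized via b.
      const norm = f + k1 * (1 - b + (b * doc.length) / avgdl);
      weights[term] = (idf * f * (k1 + 1)) / norm;
    }
    return weights;
  });
}

const docs = [
  ['rain', 'rain', 'go', 'away'],
  ['sun', 'come', 'out'],
];
console.log(bm25Vectors(docs));
```

Terms that occur in many documents receive a lower IDF, term frequency saturates as `k1` bounds its contribution, and `b` controls how strongly longer documents are penalized.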