
Commit afb25d9

Merge remote-tracking branch 'upstream/develop' into oewn22
2 parents: 90f1855 + 8588483


5 files changed: 93 additions & 10 deletions


ChangeLog

Lines changed: 58 additions & 7 deletions
@@ -1,12 +1,63 @@
+
+Version 3.8 2022-12-12
+
+* Refactor dispersion plot (#3082)
+* Provide type hints for LazyCorpusLoader variables (#3081)
+* Throw warning when LanguageModel is initialized with incorrect vocabulary (#3080)
+* Fix WordNet's all_synsets() function (#3078)
+* Resolve TreebankWordDetokenizer inconsistency with end-of-string contractions (#3070)
+* Support both iso639-3 codes and BCP-47 language tags (#3060)
+* Avoid DeprecationWarning in Regexp tokenizer (#3055)
+* Fix many doctests, add doctests to CI (#3054, #3050, #3048)
+* Fix bool field not being read in VerbNet (#3044)
+* Greatly improve time efficiency of SyllableTokenizer when tokenizing numbers (#3042)
+* Fix encodings of Polish udhr corpus reader (#3038)
+* Allow TweetTokenizer to tokenize emoji flag sequences (#3034)
+* Prevent LazyModule from increasing the size of nltk.__dict__ (#3033)
+* Fix CoreNLPServer non-default port issue (#3031)
+* Add "acion" suffix to the Spanish SnowballStemmer (#3030)
+* Allow loading WordNet without OMW (#3026)
+* Use input() in nltk.chat.chatbot() for Jupyter support (#3022)
+* Fix edit_distance_align() in distance.py (#3017)
+* Tackle performance and accuracy regression of sentence tokenizer since NLTK 3.6.6 (#3014)
+* Add the Iota operator to semantic logic (#3010)
+* Resolve critical errors in WordNet app (#3008)
+* Resolve critical error in CHILDES Corpus (#2998)
+* Make WordNet information_content() accept adjective satellites (#2995)
+* Add "strict=True" parameter to CoreNLP (#2993, #3043)
+* Resolve issue with WordNet's synset_from_sense_key (#2988)
+* Handle WordNet synsets that were lost in mapping (#2985)
+* Resolve TypeError in Boxer (#2979)
+* Add function to retrieve WordNet synonyms (#2978)
+* Warn about nonexistent OMW offsets instead of raising an error (#2974)
+* Fix missing ic argument in res, jcn and lin similarity functions of WordNet (#2970)
+* Add support for the extended OMW (#2946)
+* Fix LC cutoff policy of text tiling (#2936)
+* Optimize ConditionalFreqDist.__add__ performance (#2939)
+* Add Markdown corpus reader (#2902)
+
+Thanks to the following contributors to 3.8:
+Alexandre Perez-Lebel, David Lukes, Eric Kafe, Fernando Carranza, Heungson Lee,
+Hoyeol Kim, James Huang, Jelle Zijlstra, Louis-Justin Tallot, M.K. Pawelkiewicz,
+Jan Lennartz, Malinda Dilhara, Martin Kondratzky, Rob Malouf, Saud Kadiri,
+Siddhesh Mhadnak, Stephan Hasler, Steve Smith, Tom Aarsen, Tyler Sheaffer,
+Yue Zhao, cestwc, elespike, purificant, richardyy1188
+
 Version 3.7 2022-02-09
 
 * Improve and update the NLTK team page on nltk.org (#2855, #2941)
 * Drop support for Python 3.6, support Python 3.10 (#2920)
 
+Thanks to the following contributors to 3.7:
+Tom Aarsen
+
 Version 3.6.7 2021-12-28
 
 * Resolve IndexError in `sent_tokenize` and `word_tokenize` (#2922)
 
+Thanks to the following contributors to 3.6.7:
+Tom Aarsen
+
 Version 3.6.6 2021-12-21
 
 * Refactor `gensim.doctest` to work for gensim 4.0.0 and up (#2914)
@@ -44,9 +95,9 @@ Version 3.6.6 2021-12-21
 * Fix TypeError: _pretty() takes 1 positional argument but 2 were given in sem/drt.py (#2854)
 * Replace `http` with `https` in most URLs (#2852)
 
-Thanks to the following contributors to 3.6.6
+Thanks to the following contributors to 3.6.6:
 Adam Hawley, BatMrE, Danny Sepler, Eric Kafe, Gavish Poddar, Panagiotis Simakis,
-RnDevelover, Robby Horvath, Tom Aarsen, Yuta Nakamura, Mohaned Mashaly
+RnDevelover, Robby Horvath, Tom Aarsen, Yuta Nakamura, Mohaned Mashaly
 
 Version 3.6.5 2021-10-11
 
@@ -60,7 +111,7 @@ Version 3.6.5 2021-10-11
 * specify minimum regex version that supports regex.Pattern
 * avoid re.Pattern and regex.Pattern which fail for Python 3.6, 3.7
 
-Thanks to the following contributors to 3.6.5
+Thanks to the following contributors to 3.6.5:
 Tom Aarsen, Saibo Geng, Mohaned Mashaly, Dimitri Papadopoulos, Danny Sepler,
 Ahmet Yildirim, RnDevelover, yutanakamura
 
@@ -75,7 +126,7 @@ Version 3.6.4 2021-10-01
 * replace travis badge with github actions badge
 * add SECURITY.md
 
-Thanks to the following contributors to 3.6.4
+Thanks to the following contributors to 3.6.4:
 Tom Aarsen, Mohaned Mashaly, Dimitri Papadopoulos Orfanos, purificant, Danny Sepler
 
 Version 3.6.3 2021-09-19
@@ -96,7 +147,7 @@ Version 3.6.3 2021-09-19
 * Optional show arg for FreqDist.plot, ConditionalFreqDist.plot
 * edit_distance now computes Damerau-Levenshtein edit-distance
 
-Thanks to the following contributors to 3.6.3
+Thanks to the following contributors to 3.6.3:
 Tom Aarsen, Abhijnan Bajpai, Michael Wayne Goodman, Michał Górny, Maarten ter Huurne,
 Manu Joseph, Eric Kafe, Ilia Kurenkov, Daniel Loney, Rob Malouf, Mohaned Mashaly,
 purificant, Danny Sepler, Anthony Sottile
@@ -107,7 +158,7 @@ Version 3.6.2 2021-04-20
 * fix bug in NgramAssocMeasures (order preserving fix)
 * fixes for compatibility with Pypy 7.3.4
 
-Thanks to the following contributors to 3.6.2
+Thanks to the following contributors to 3.6.2:
 Ruben Cartuyvels, Rob Malouf, Dalton Pearson, Danny Sepler
 
 Version 3.6 2021-04-07
@@ -124,7 +175,7 @@ Version 3.6 2021-04-07
 
 Thanks to the following contributors to 3.6:
 Tom Aarsen, K Abainia, Akshita Bhagia, Andrew Bird, Thomas Bird,
-Tom Conroy, CubieDev, Christopher Hench, Andrew Jorgensen, Eric Kafe,
+Tom Conroy, Christopher Hench, Andrew Jorgensen, Eric Kafe,
 Ilia Kurenkov, Yeting Li, Joseph Manu, Marius Mather, Denali Molitor,
 Jacob Moorman, Philippe Ombredanne, Vassilis Palassopoulos, Ram Rachum,
 Danny Sepler, Or Sharir, Brad Solomon, Hiroki Teranishi, Constantin Weisser,
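The 3.8 entry "Fix missing ic argument in res, jcn and lin similarity functions of WordNet (#2970)" concerns the three WordNet similarity measures that are defined in terms of information content (IC), which is why each of them needs an `ic` argument. In NLTK they are exposed as `Synset.res_similarity(other, ic)`, `Synset.jcn_similarity(other, ic)` and `Synset.lin_similarity(other, ic)`. The sketch below is not NLTK's code: it assumes the IC values of the two synsets and of their lowest common subsumer (LCS) have already been computed, and all function names in it are hypothetical.

```python
import math


def information_content(concept_count: float, root_count: float) -> float:
    """IC(c) = -log P(c), estimating P(c) as count(c) / count(root)."""
    return -math.log(concept_count / root_count)


def res_sim(ic_lcs: float) -> float:
    """Resnik: the IC of the lowest common subsumer."""
    return ic_lcs


def lin_sim(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Lin: 2 * IC(lcs) / (IC(s1) + IC(s2))."""
    denominator = ic1 + ic2
    # Assumption of this sketch: report 0.0 when neither synset carries information.
    return 0.0 if denominator == 0 else 2.0 * ic_lcs / denominator


def jcn_sim(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Jiang-Conrath: 1 / (IC(s1) + IC(s2) - 2 * IC(lcs))."""
    distance = ic1 + ic2 - 2.0 * ic_lcs
    # Assumption of this sketch: a zero distance maps to infinite similarity.
    return math.inf if distance == 0 else 1.0 / distance


# Made-up IC values for two synsets and their LCS.
print(res_sim(1.5), lin_sim(2.0, 4.0, 1.5), jcn_sim(2.0, 4.0, 1.5))
```

In NLTK itself the IC values are derived from an information-content file, for example one loaded with `nltk.corpus.wordnet_ic.ic("ic-brown.dat")`, which is the object passed as `ic`.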

nltk/VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.7.1a
+3.8.1a

nltk/sentiment/vader.py

Lines changed: 5 additions & 0 deletions
@@ -356,6 +356,11 @@ def polarity_scores(self, text):
         Return a float for sentiment strength based on the input text.
         Positive values are positive valence, negative value are negative
         valence.
+
+        :note: Hashtags are not taken into consideration (e.g. #BAD is neutral). If you
+        are interested in processing the text in the hashtags too, then we recommend
+        preprocessing your data to remove the #, after which the hashtag text may be
+        matched as if it was a normal word in the sentence.
         """
         # text, words_and_emoticons, is_cap_diff = self.preprocess(text)
         sentitext = SentiText(
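The new `:note:` in `polarity_scores` recommends removing the `#` from hashtags before scoring. A minimal sketch of that preprocessing step, assuming a hashtag is a `#` that is not preceded by a word character and is followed by one or more word characters (so `C#` or `issue#42` are left alone). `strip_hashtags` is a hypothetical helper, not part of NLTK.

```python
import re

# Hypothetical helper (not part of NLTK): a "#" that is not preceded by a word
# character and is followed by word characters is treated as a hashtag marker.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def strip_hashtags(text: str) -> str:
    """Drop the leading "#" so the hashtag text can be matched as a normal word."""
    return HASHTAG_RE.sub(r"\1", text)


print(strip_hashtags("What a film #BAD #waste"))
print(strip_hashtags("I write C# code, see issue#42"))  # left unchanged
```

The cleaned text can then be passed to `SentimentIntensityAnalyzer().polarity_scores(...)` as usual (this needs the `vader_lexicon` resource). The hashtag keeps its original casing, so `#BAD` becomes `BAD`, which VADER may score with extra emphasis because it is in capitals.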

web/conf.py

Lines changed: 2 additions & 2 deletions
@@ -126,9 +126,9 @@ def generate_custom_files():
 # built documents.
 #
 # The short X.Y version.
-version = "3.7.1a"
+version = "3.8.1a"
 # The full version, including alpha/beta/rc tags.
-release = "3.7.1a"
+release = "3.8.1a"
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

web/news.rst

Lines changed: 27 additions & 0 deletions
@@ -4,6 +4,33 @@ Release Notes
 2022
 ----
 
+NLTK 3.8 release: December 2022:
+
+- Fix WordNet's all_synsets() function
+- Greatly improve time efficiency of SyllableTokenizer when tokenizing numbers
+- Tackle performance and accuracy regression of sentence tokenizer since NLTK 3.6.6
+- Resolve TreebankWordDetokenizer inconsistency with end-of-string contractions
+- Optimize ConditionalFreqDist.__add__ performance
+- Fix LC cutoff policy of text tiling
+- Add Markdown corpus reader
+- Add support for the extended OMW
+- Support both iso639-3 codes and BCP-47 language tags
+- Fix bool field not being read in VerbNet
+- Fix encodings of Polish udhr corpus reader
+- Allow TweetTokenizer to tokenize emoji flag sequences
+- Add "acion" suffix to the Spanish SnowballStemmer
+- Allow loading WordNet without OMW
+- Fix edit_distance_align() in distance.py
+- Add the Iota operator to semantic logic
+- Resolve critical error in CHILDES Corpus
+- Make WordNet information_content() accept adjective satellites
+- Add "strict=True" parameter to CoreNLP
+- Resolve issue with WordNet's synset_from_sense_key
+- Handle WordNet synsets that were lost in mapping
+- Add function to retrieve WordNet synonyms
+- Warn about nonexistent OMW offsets instead of raising an error
+- Fix missing ic argument in res, jcn and lin similarity functions of WordNet
+
 NLTK 3.7 release: February 2022:
 
 - improve and update the NLTK team page on nltk.org
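One of the 3.8 items lets TweetTokenizer keep emoji flag sequences together. A flag emoji is encoded as a pair of Unicode Regional Indicator Symbols (U+1F1E6 to U+1F1FF), so a tokenizer that falls back to one token per non-space character splits every flag in two. The sketch below illustrates the idea with a toy regular expression; it is not TweetTokenizer's actual pattern, and `tokenize` and `naive_tokenize` are hypothetical names.

```python
import re

# Regional Indicator Symbols span U+1F1E6 (A) to U+1F1FF (Z); a flag is two of them.
# This is an ordinary (non-raw) string, so Python inserts the actual code points.
FLAG = "[\U0001F1E6-\U0001F1FF]{2}"

# Toy tokenizer: flag pairs first, then runs of word characters,
# then any other single non-space character.
TOKEN_RE = re.compile(FLAG + r"|\w+|\S")
NAIVE_RE = re.compile(r"\w+|\S")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def naive_tokenize(text: str) -> list[str]:
    return NAIVE_RE.findall(text)


netherlands = "\U0001F1F3\U0001F1F1"  # regional indicators N + L
print(naive_tokenize(f"Go {netherlands}!"))  # the flag is split into two tokens
print(tokenize(f"Go {netherlands}!"))        # the flag stays a single token
```

Placing the flag alternative before the single-character fallback is what keeps the pair intact, because `re` tries alternatives from left to right at each position.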
