Commit 84ddb3e

Modernize packaging, resource loading, and test/CI setup
1 parent a8be2a2 commit 84ddb3e

15 files changed

+479
-221
lines changed

.github/workflows/ci.yml

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+name: CI
+
+on:
+  push:
+  pull_request:
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install package and test dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e . pytest build twine
+      - name: Run unit tests
+        run: pytest -q -m "not integration"
+
+  integration:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install package and dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e . pytest textblob
+      - name: Download corpora
+        run: python -m textblob.download_corpora
+      - name: Run full tests
+        run: pytest -q
+
+  packaging:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Build and validate distributions
+        run: |
+          python -m pip install --upgrade pip
+          pip install build twine
+          python -m build
+          twine check dist/*

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+__pycache__/
+*.py[cod]
+*.egg-info/
+.pytest_cache/
+.ruff_cache/
+.venv/
+.idea/
+dist/
+build/

COMPATIBILITY.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# Compatibility Notes (vs NRCLex 4.0 behavior)
+
+## Comparison method
+
+Because external network access is unavailable in this environment, direct installation of `NRCLex==4.0` from PyPI was not possible. Instead, behavior was compared against the repository's 4.0 implementation logic using the same bundled lexicon data.
+
+## Results
+
+### `load_token_list`
+
+For token list `['happy', 'sad', 'unknown', 'happy']`, the updated implementation produced identical values to the 4.0 logic for:
+
+- `raw_emotion_scores`
+- `affect_frequencies`
+- `top_emotions`
+
+### Constructor behavior (`NRCLex()`)
+
+- **4.0 behavior:** default `lexicon_file='nrc_en.json'` required current working directory assumptions and could fail after installation.
+- **Updated behavior:** `NRCLex()` now reliably loads bundled `nrclex/data/nrc_en.json` via `importlib.resources`.
+
+This constructor change is intentional and improves install-time reliability without changing the public API.
+
+## Metadata corrections (intentional)
+
+- Removed stdlib entries (`collections`, `json`) from dependencies.
+- Set `Requires-Python` to `>=3.9` to align with TextBlob support expectations.
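
Reviewer note: the constructor change described in COMPATIBILITY.md relies on `importlib.resources` to find a data file relative to the installed package rather than the current working directory. The sketch below illustrates that general technique only; it is not the code from this commit. The helper `load_bundled_json` and the throwaway package `demo_pkg` are hypothetical names, used so the example runs without NRCLex installed.

```python
import importlib
import json
import sys
import tempfile
from importlib import resources
from pathlib import Path


def load_bundled_json(package: str, *parts: str) -> dict:
    """Read a JSON file shipped inside an importable package.

    Hypothetical helper: `package` is an importable package name and
    `parts` are the path components of the resource inside it.
    """
    # files() resolves relative to the package location, not the CWD.
    node = resources.files(package)
    for part in parts:
        node = node.joinpath(part)
    with node.open("r", encoding="utf-8") as handle:
        return json.load(handle)


# Demo: build a throwaway package on disk and load its bundled file.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    (pkg_dir / "data").mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "data" / "lexicon.json").write_text(
        json.dumps({"happy": ["joy", "positive"]}), encoding="utf-8"
    )
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        lexicon = load_bundled_json("demo_pkg", "data", "lexicon.json")
    finally:
        # Undo the import-path changes made for the demo.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)

print(lexicon)
```

For the real package, the equivalent call would presumably name `nrclex` and the `data/nrc_en.json` path mentioned above, but the exact loading code in this commit is not shown on this page.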

README.md

Lines changed: 71 additions & 88 deletions
@@ -1,88 +1,71 @@
-# NRCLex
-
-(C) 2019 Mark M. Bailey, PhD
-
-## About
-NRCLex will measure emotional affect from a body of text. Affect dictionary contains approximately 27,000 words, and is based on the National Research Council Canada (NRC) affect lexicon (see link below) and the NLTK library's WordNet synonym sets.
-
-Lexicon source is (C) 2016 National Research Council Canada (NRC) and this package is **for research purposes only**. Source: http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm As per the terms of use of the NRC Emotion Lexicon, if you use the lexicon or any derivative from it, cite this paper: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013.
-
-NLTK data is (C) 2019, NLTK Project. Source: [NLTK] (https://www.nltk.org/). Reference: Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.
-
-## Update
-* Finally got around to cleaning this up a bit. Updated PyPI package with current version. Thanks to all the contributors for cleaning up my terrible code!
-* Expanded NRC lexicon from approximately 10,000 words to 27,000 based on WordNet synonyms.
-* Minor bug fixes.
-* Contributor updated NTC library.
-
-## Installation
-`pip install NRCLex`
-
-## Affects
-Emotional affects measured include the following:
-
-* fear
-* anger
-* anticipation
-* trust
-* surprise
-* positive
-* negative
-* sadness
-* disgust
-* joy
-
-## Sample Usage
-
-`from nrclex import NRCLex`<br><br>
-
-
-*#Instantiate NRCLex object, you can pass your own dictionary filename in json format.*<br>
-
-`text_object = NRCLex(lexicon_file='nrc_en.json')`<br><br>
-
-
-*#You can pass your raw text to this method(for best results, 'text' should be unicode).*<br>
-
-`text_object.load_raw_text(text: str)`<br><br>
-
-
-*#You can pass your already tokenized text as a list of tokens, if you want to use an already tokenized input.
-This usage assumes that the text is correctly tokenized and does not make use of TextBlob.*<br>
-
-`text_object.load_token_list(list_of_tokens: list)`<br><br>
-
-
-*#Return words list.*<br>
-
-`text_object.words`<br><br>
-
-
-*#Return sentences list.*<br>
-
-`text_object.sentences`<br><br>
-
-
-*#Return affect list.*<br>
-
-`text_object.affect_list`<br><br>
-
-
-*#Return affect dictionary.*<br>
-
-`text_object.affect_dict`<br><br>
-
-
-*#Return raw emotional counts.*<br>
-
-`text_object.raw_emotion_scores`<br><br>
-
-
-*#Return highest emotions.*<br>
-
-`text_object.top_emotions`<br><br>
-
-
-*#Return affect frequencies.*<br>
-
-`text_object.affect_frequencies`
+# NRCLex
+
+(C) 2019 Mark M. Bailey, PhD
+
+## About
+NRCLex measures emotional affect from text. Affect dictionary contains approximately 27,000 words and is based on the National Research Council Canada (NRC) affect lexicon and NLTK WordNet synonym sets.
+
+Lexicon source is (C) 2016 National Research Council Canada (NRC) and this package is **for research purposes only**. Source: http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm As per the terms of use of the NRC Emotion Lexicon, if you use the lexicon or any derivative from it, cite this paper: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013.
+
+NLTK data is (C) 2019, NLTK Project. Source: [NLTK](https://www.nltk.org/). Reference: Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.
+
+## Installation
+`pip install NRCLex`
+
+## Affects
+Emotional affects measured include:
+
+* fear
+* anger
+* anticipation
+* trust
+* surprise
+* positive
+* negative
+* sadness
+* disgust
+* joy
+
+## Sample Usage
+
+`from nrclex import NRCLex`
+
+Instantiate NRCLex object. By default this loads the bundled lexicon packaged with the library:
+
+`text_object = NRCLex()`
+
+You can pass your raw text to this method (for best results, text should be unicode):
+
+`text_object.load_raw_text(text: str)`
+
+You can pass already tokenized text as a list of tokens. This usage does not require TextBlob tokenization:
+
+`text_object.load_token_list(list_of_tokens: list)`
+
+Return words list:
+
+`text_object.words`
+
+Return sentences list:
+
+`text_object.sentences`
+
+Return affect list:
+
+`text_object.affect_list`
+
+Return affect dictionary:
+
+`text_object.affect_dict`
+
+Return raw emotional counts:
+
+`text_object.raw_emotion_scores`
+
+Return highest emotions:
+
+`text_object.top_emotions`
+
+Return affect frequencies:
+
+`text_object.affect_frequencies`
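
Reviewer note: the README names `raw_emotion_scores`, `affect_frequencies`, and `top_emotions` without showing what they contain. The sketch below is a toy illustration of lexicon-based counting, assuming each token maps to a list of affects and that frequencies are counts divided by the total number of affect hits. `TOY_LEXICON` and `score_tokens` are hypothetical; this is not NRCLex's implementation and the real return shapes may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical two-word lexicon; the bundled NRC lexicon is far larger.
TOY_LEXICON: Dict[str, List[str]] = {
    "happy": ["joy", "positive"],
    "sad": ["sadness", "negative"],
}


def score_tokens(
    tokens: List[str], lexicon: Dict[str, List[str]]
) -> Tuple[Dict[str, int], Dict[str, float], List[str]]:
    """Return (raw counts, frequencies, top affects) for a token list."""
    # Collect every affect attached to every known token; unknown tokens add nothing.
    affects = [affect for token in tokens for affect in lexicon.get(token, [])]
    raw = Counter(affects)
    total = sum(raw.values())
    # Frequencies are relative to the number of affect hits, not the token count.
    frequencies = {affect: count / total for affect, count in raw.items()} if total else {}
    best = max(raw.values(), default=0)
    top = sorted(affect for affect, count in raw.items() if count == best)
    return dict(raw), frequencies, top


# Same token list as the one used in COMPATIBILITY.md.
raw_scores, frequencies, top = score_tokens(["happy", "sad", "unknown", "happy"], TOY_LEXICON)
print(raw_scores)
print(frequencies)
print(top)
```

With this toy lexicon, `happy` appears twice, so `joy` and `positive` each count 2 out of 6 affect hits and tie as the top affects.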

RELEASING.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# Releasing NRCLex
+
+## 1) Choose version bump
+
+- Patch: bug fixes only, no compatibility changes.
+- Minor: backward-compatible features/packaging improvements.
+- Major: intentional breaking changes.
+
+For this modernization, use at least a **minor** bump if you changed supported Python versions.
+
+## 2) Update version
+
+Update these files together:
+
+1. `pyproject.toml` → `[project].version`
+2. `nrclex/__init__.py` → `__version__`
+
+## 3) Validate locally
+
+```bash
+python -m pip install --upgrade pip
+pip install -e . pytest build twine
+pytest -q -m "not integration"
+python -m build
+twine check dist/*
+```
+
+Optional integration tests (requires corpora):
+
+```bash
+python -m textblob.download_corpora
+pytest -q
+```
+
+## 4) Verify artifacts
+
+Ensure **both** files exist:
+
+- `dist/NRCLex-<version>.tar.gz`
+- `dist/NRCLex-<version>-py3-none-any.whl`
+
+## 5) Publish
+
+```bash
+twine upload dist/*
+```
+
+## 6) Don't forget
+
+- Confirm `from nrclex import NRCLex` works from a clean environment.
+- Confirm `NRCLex()` loads bundled lexicon without manual file path.
+- Confirm dependency metadata does **not** list stdlib modules.
+- Confirm `Requires-Python` matches supported versions (>=3.9).
+- Tag release in git and add release notes.
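
Reviewer note: step 2 of RELEASING.md asks for the version to be updated in two files together, which is easy to get wrong by hand. A minimal sketch of an automated consistency check follows, assuming the version is written as a plain quoted string in both places. The function names and the sample version `9.9.9` are hypothetical, and the line-based parsing is deliberately naive (Python 3.11+ could use `tomllib` instead).

```python
import re
from typing import Optional


def project_version(pyproject_text: str) -> Optional[str]:
    """Return the `version` value found under the [project] table, if any."""
    in_project = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [project] table.
            in_project = stripped == "[project]"
            continue
        if in_project:
            match = re.match(r"version\s*=\s*[\"']([^\"']+)[\"']", stripped)
            if match:
                return match.group(1)
    return None


def module_version(init_text: str) -> Optional[str]:
    """Return the string assigned to __version__ in a module's source text."""
    match = re.search(r"^__version__\s*=\s*[\"']([^\"']+)[\"']", init_text, re.MULTILINE)
    return match.group(1) if match else None


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """True only when both versions are present and identical."""
    declared = project_version(pyproject_text)
    return declared is not None and declared == module_version(init_text)


# Hypothetical file contents standing in for the two files named in step 2.
SAMPLE_PYPROJECT = """
[build-system]
requires = ["setuptools"]

[project]
name = "NRCLex"
version = "9.9.9"
"""
SAMPLE_INIT = '__version__ = "9.9.9"\n'

print(versions_match(SAMPLE_PYPROJECT, SAMPLE_INIT))
```

In a real checkout the two strings would be read from `pyproject.toml` and `nrclex/__init__.py`, and the check could run as a unit test before `python -m build`.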

__init__.py

Lines changed: 0 additions & 9 deletions
This file was deleted.
