Commit 0538ccb
Update Phupha integration to filter in spell checker
- Changed to use full Phupha dataset (62,264 words) in corpus file
- Added filtering logic in pythainlp/spell/pn.py to filter by thai_orst_words
- This allows the full Phupha dataset to be available for other uses
- Updated tests to verify filtering works correctly
- Spell checker now filters 38,160 ORST words from full Phupha dataset
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
1 parent 10b566d commit 0538ccb
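The diff body for `pythainlp/spell/pn.py` is not shown on this page, so the following is only a minimal sketch of the kind of filtering the commit message describes: the full word list stays intact and the spell checker uses only the entries found in a reference dictionary. The function name `filter_word_freqs`, the `(word, frequency)` pair layout, and the toy data are assumptions for illustration, not the code from this commit.

```python
from typing import AbstractSet, Iterable, List, Tuple


def filter_word_freqs(
    word_freqs: Iterable[Tuple[str, int]],
    allowed_words: AbstractSet[str],
) -> List[Tuple[str, int]]:
    """Keep only (word, frequency) pairs whose word is in allowed_words.

    Hypothetical helper: the real filtering in pn.py may differ.
    """
    return [(word, freq) for word, freq in word_freqs if word in allowed_words]


# Toy stand-ins (assumed data): a "full" dataset and a dictionary word set.
full_dataset = [("แมว", 120), ("หมา", 95), ("แมวว", 3)]
dictionary_words = frozenset({"แมว", "หมา"})

# The full dataset is left untouched; only the spell checker's view is filtered.
spell_checker_words = filter_word_freqs(full_dataset, dictionary_words)
print(spell_checker_words)
```

Filtering at load time in the spell checker, rather than in the corpus file, is what lets the unfiltered dataset remain available to other callers.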
File tree
5 files changed, +24151 -13 lines changed
- pythainlp
  - corpus
  - spell
- tests/core