Commit b614188

Add paragraph about the three problematic versions
1 parent b90ee66 commit b614188

1 file changed, 9 additions, 1 deletion

nltk/test/wordnet.doctest

@@ -824,7 +824,15 @@ Normally, only small edits are necessary for NLTK to load any
 Wordnet in the original Princeton WordNet wndb format. This could
 for ex. be a Princeton WordNet from the 1.x or 2.x series, which
 were never included in NLTK, or any Open English Wordnet version.
-However, some older versions have problems that require more effort.
+This process has been tested and works with all PWN versions since
+WN 1.5SC (from 1995), which was the first version to use sense keys.
+
+However, three of these older versions have problems that require
+more effort. Two versions (1.5SC and 2.1) miss a copy of the
+'lexnames' file, which has been the same for all modern PWN releases,
+and needs to be copied manually from any other version.
+PWN v. 2.0 is the most difficult to deal with, since some pointer_counts
+in the index.POS files are off-by-one.
 
 Let's illustrate the process with Edition 2023 of the Open English
 Wordnet, since nltk_data does not include it.
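The two manual fixes named in the added paragraph (supplying a missing 'lexnames' file, and repairing off-by-one pointer counts in PWN 2.0's index.POS files) can be sketched in Python, the language of the doctest. This is a minimal, hypothetical sketch rather than NLTK code: `copy_lexnames` and `fix_pointer_counts` are illustrative names. The pointer-count check assumes the wndb index line layout `lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt synset_offset...`, assumes that license header lines begin with a space, and assumes that p_cnt is the only wrong field on an inconsistent line.

```python
# Hypothetical helpers (not part of NLTK) for preparing an older PWN release.
import shutil
from pathlib import Path


def copy_lexnames(source_dir, target_dir):
    """Copy 'lexnames' from another PWN dict directory if target_dir lacks it.

    Returns True if a copy was made, False if the file was already present.
    """
    target = Path(target_dir) / "lexnames"
    if target.exists():
        return False
    shutil.copyfile(Path(source_dir) / "lexnames", target)
    return True


def fix_pointer_counts(lines):
    """Return (lines, n_fixed) with p_cnt corrected on inconsistent index lines.

    Assumes the wndb index line layout
        lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt offset...
    so the true number of pointer symbols is len(fields) - 6 - synset_cnt.
    Only p_cnt is rewritten; every other field is assumed to be correct.
    """
    fixed, n_fixed = [], 0
    for line in lines:
        # Keep blank lines and the license header (assumed to start with a space).
        if not line.strip() or line.startswith(" "):
            fixed.append(line)
            continue
        fields = line.split()
        synset_cnt, declared = int(fields[2]), int(fields[3])
        actual = len(fields) - 6 - synset_cnt
        if actual >= 0 and actual != declared:
            # Preserve the original trailing whitespace and line ending.
            trailing = line[len(line.rstrip()):]
            fields[3] = str(actual)
            line = " ".join(fields) + trailing
            n_fixed += 1
        fixed.append(line)
    return fixed, n_fixed
```

The check catches any mismatch between the declared and actual pointer count, which covers the off-by-one case. For PWN 2.0, each index.POS file could be read with `readlines()`, passed through `fix_pointer_counts`, and written back; only index lines change, so the synset byte offsets into the data.POS files stay valid. The repaired directory can then be given as `root` to `nltk.corpus.reader.wordnet.WordNetCorpusReader`.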
