Hi @AmitMY, the offsets come from the original WNDB-formatted data and are literally byte offsets into the data files. As such, they are specific to the original language and version of the data. The NLTK and original OMW data reuse the English synsets for their structure and just add new words. The WN-LMF XML data from the current OMW, which Wn uses, instead provides unique identifiers for each lexicon's elements. The form of those identifiers may contain the offsets for historical reasons, but the identifiers are not meant to be decomposed or interpreted.
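
To make the "byte offset" point concrete, here is a minimal sketch using a toy, in-memory file laid out in the WNDB style, where each line begins with its own 8-digit byte position. The helper names (`build_toy_wndb`, `read_synset_line`), the header line, and the simplified record bodies are all made up for illustration; this is not real WordNet data and not how Wn or NLTK read their files.

```python
import io


def build_toy_wndb(records):
    """Build a toy WNDB-style data file (hypothetical helper).

    Each record line starts with the 8-digit byte offset at which
    that line begins in the file.
    """
    out = bytearray(b"  1 toy header line, not real WordNet data\n")
    offsets = []
    for body in records:
        offset = len(out)  # byte position where this line starts
        offsets.append(offset)
        out += f"{offset:08d} {body}\n".encode("ascii")
    return bytes(out), offsets


def read_synset_line(data, offset):
    """Seek straight to a byte offset and read one line (hypothetical helper)."""
    f = io.BytesIO(data)
    f.seek(offset)
    return f.readline().decode("ascii").rstrip("\n")


# Two toy "versions" of the same data file: v2 adds one record at the front.
cat = "n cat | toy gloss for cat"
dog = "n dog | toy gloss for dog"
aardvark = "n aardvark | toy gloss for aardvark"

data_v1, offsets_v1 = build_toy_wndb([cat, dog])
data_v2, offsets_v2 = build_toy_wndb([aardvark, cat, dog])

dog_v1 = offsets_v1[1]
dog_v2 = offsets_v2[2]

print(read_synset_line(data_v1, dog_v1))
print(read_synset_line(data_v2, dog_v2))
print("same offset in both versions?", dog_v1 == dog_v2)
```

In this toy setup, adding a single record shifts the offset of everything after it, which is why an offset only identifies a synset within one specific file of one specific version.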

If you are looking for a way to refer to a concept regardless of the language, you want the ILI (interlingual index). ILIs were c…

Answer selected by goodmami
This discussion was converted from issue #272 on July 13, 2025 18:49.