The source map sometimes provides multiple language labels for the same polygon. In this dataset, such a polygon is assigned to every language that appears as a label, even though the distribution of the labels within the polygon seems to suggest that each language concerns only an (unspecified) part of it.
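
The assignment rule can be illustrated with a minimal sketch. The polygon ids, language names, and the `assign_polygons` helper below are hypothetical and chosen only for illustration; they do not reflect the dataset's actual identifiers or file format.

```python
from collections import defaultdict

# Hypothetical input: (polygon_id, language_label) pairs read off the source map.
# The ids and language names are illustrative, not taken from the dataset.
labels = [
    ("poly-1", "Language A"),
    ("poly-2", "Language B"),
    ("poly-2", "Language C"),  # second label inside the same polygon
]


def assign_polygons(label_pairs):
    """Map each language to the set of all polygons that carry its label."""
    polygons_by_language = defaultdict(set)
    for polygon_id, language in label_pairs:
        # The whole polygon is assigned, not a sub-area of it.
        polygons_by_language[language].add(polygon_id)
    return dict(polygons_by_language)


assignment = assign_polygons(labels)
# "poly-2" is now listed in full under both "Language B" and "Language C".
```

One consequence of this rule is that the areas assigned to different languages can overlap, so summing polygon areas across languages may count a multi-label polygon more than once.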