
TypeError in address_masker when processing addresses with lok abbreviation #2

Open
wysockigrzegorz wants to merge 1 commit into NASK-NLP:master from wysockigrzegorz:fix/address-masker-lok-crash

Bug description

This PR fixes a crash in the address_masker component of priv_masker, caused by incorrect usage of the Python list .append() method.

Problematic line:

masked_tokens = masked_tokens.append(doc[end])

This causes masked_tokens to become None, since list.append() returns None. As a result, any later use of masked_tokens (like if token in masked_tokens) crashes with:

TypeError: argument of type 'NoneType' is not iterable
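The failure mode can be reproduced outside the library. The following is a minimal sketch, not the actual priv_masker code: a plain list of strings stands in for the spaCy tokens, and the names doc and end simply mirror the problematic line. The exact wording of the TypeError message may vary between Python versions.

```python
# Stand-in for the tokens of a parsed document (assumption: plain strings
# are used here instead of real spaCy tokens).
doc = ["ul.", "Juliusza", "Słowackiego", "13", "lok", "3"]
end = 5  # index of the token after the "lok" key word

masked_tokens = []
# Bug: list.append() mutates the list in place and returns None,
# so this assignment rebinds masked_tokens to None.
masked_tokens = masked_tokens.append(doc[end])

try:
    "3" in masked_tokens  # membership test on None
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(masked_tokens)  # None
print(error_message)
```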

Reproduction

Minimal example (adapted from the official README):

ul. Juliusza Słowackiego 13 lok 3

This triggers a crash in search_address_nums_with_key_words().


Fix

The incorrect line was replaced with:

masked_tokens.append(doc[end])

This keeps masked_tokens as a valid list and prevents the crash.
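As a rough illustration of the corrected pattern, the sketch below collects the token that follows an apartment key word. The helper collect_unit_numbers and its key_words parameter are hypothetical and are not part of priv_masker; whitespace splitting stands in for real tokenization.

```python
def collect_unit_numbers(tokens, key_words=("lok", "lokal")):
    """Hypothetical helper: gather the token following each apartment key word."""
    masked_tokens = []
    for i, token in enumerate(tokens[:-1]):
        if token.lower().rstrip(".") in key_words:
            # Call append() for its side effect only; never reassign its result.
            masked_tokens.append(tokens[i + 1])
    return masked_tokens


tokens = "ul. Juliusza Słowackiego 13 lok 3".split()
masked = collect_unit_numbers(tokens)
print(masked)         # ['3']
print("3" in masked)  # True
```

Because masked_tokens stays a list, later membership checks such as if token in masked_tokens work as intended.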


Why this matters

This fix restores correct address matching for apartment indicators such as lok and lokal, which are common in real-world Polish addresses.


Regards
