v2.2.0

vmenger released this 28 Sep 09:47
3ccd61c

2.2.0 (2023-09-28)

Changed

  • tokenizer logic:
    • a token is now a sequence of alphanumeric characters, a single newline, or a single special character.
    • whitespace is no longer considered a token
  • moved token pattern logic to config, using a new TokenPatternAnnotator
  • moved context pattern logic to config, using a new ContextAnnotator
  • many updates to name detection logic
    • lookup list optimizations
    • added, removed and simplified patterns
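The new tokenizer rules can be approximated with a single regular expression. This is a minimal sketch for illustration, not deduce's actual implementation. The names `TOKEN_PATTERN` and `tokenize` are hypothetical. It assumes that "alphanumeric" means Unicode letters and digits, so `ë` stays inside a word and `_` counts as a special character.

```python
import re

# Hypothetical pattern approximating the 2.2.0 tokenizer rules (not deduce's own source):
#   [^\W_]+    a run of alphanumeric characters (Unicode-aware, excluding "_")
#   \n         a single newline
#   [^\w\s]|_  a single special character (punctuation, symbols, underscore)
# Other whitespace (spaces, tabs, "\r") matches none of the alternatives and is skipped.
TOKEN_PATTERN = re.compile(r"[^\W_]+|\n|[^\w\s]|_")


def tokenize(text: str) -> list[tuple[str, int, int]]:
    """Return (token_text, start_char, end_char) tuples for every token in text."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for token in tokenize("Patiënt J. Jansen\n(32 jaar)"):
        print(token)
```

Because spaces and tabs are never emitted as tokens, this sketch keeps character offsets on each token. That way, annotations found on tokens can still be mapped back to the original text. The offsets are a design choice of the sketch. Two consecutive newlines produce two separate `"\n"` tokens, which matches the "single newline" rule.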