Fix KeyError in SumBasicSummarizer by bysiber · Pull Request #234 · miso-belica/sumy

bysiber · 2026-02-21T23:35:40Z

Fixes #176.

The SumBasicSummarizer crashes with a KeyError when processing certain texts because the word processing pipeline is inconsistent between two methods.

_get_content_words_in_sentence processes words in this order: normalize → filter stop words → stem

But _get_all_content_words_in_doc (via _get_all_words_in_doc) does it differently: stem → filter stop words → normalize

This means a word can end up in the per-sentence word list but be missing from the document frequency table (or vice versa), which causes the KeyError when looking up the word frequency.

The fix aligns _get_all_content_words_in_doc to follow the same processing order as _get_content_words_in_sentence: normalize → filter stop words → stem. _get_all_words_in_doc now returns raw (unstemmed) words so the caller can apply the pipeline consistently.
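The ordering problem can be illustrated with a minimal, self-contained sketch. The helpers below (`normalize`, `stem`, `STOP_WORDS`, `sentence_order`, `old_document_order`, `frequency_table`) are hypothetical toy stand-ins and not sumy's actual implementation; they only show how applying the same three steps in a different order can leave the two word lists out of sync in both directions.

```python
STOP_WORDS = {"the", "it", "this"}  # toy list; "its" is deliberately absent


def normalize(word):
    """Toy normalizer: lowercase only."""
    return word.lower()


def stem(word):
    """Toy stemmer: strip one trailing "s" from words longer than two letters."""
    return word[:-1] if len(word) > 2 and word.endswith("s") else word


def sentence_order(words):
    """normalize -> filter stop words -> stem (the per-sentence order)."""
    normalized = [normalize(w) for w in words]
    content = [w for w in normalized if w not in STOP_WORDS]
    return [stem(w) for w in content]


def old_document_order(words):
    """stem -> filter stop words -> normalize (the document order before the fix)."""
    stemmed = [stem(w) for w in words]
    content = [w for w in stemmed if w not in STOP_WORDS]
    return [normalize(w) for w in content]


def frequency_table(words):
    """Count occurrences in a plain dict, so a missing key raises KeyError on lookup."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


words = ["This", "its", "colors"]

sentence_words = sentence_order(words)                   # ['it', 'color']
old_table = frequency_table(old_document_order(words))   # {'thi': 1, 'color': 1}
fixed_table = frequency_table(sentence_order(words))     # {'it': 1, 'color': 1}

# Case 1: a sentence word with no entry in the table (old_table['it'] would raise KeyError).
missing_before = [w for w in sentence_words if w not in old_table]   # ['it']
# Case 2: a table entry that never appears in any sentence word list.
extra_before = [w for w in old_table if w not in sentence_words]     # ['thi']
# With one shared order, every sentence word has a table entry.
missing_after = [w for w in sentence_words if w not in fixed_table]  # []
```

In this toy setup, "its" survives the stop-word filter only when filtering happens before stemming, while "This" survives only when filtering happens before lowercasing. Using a single order for both lists removes both mismatches.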

All existing sum_basic tests pass.

The _get_all_content_words_in_doc method was processing words in a different order (stem -> filter -> normalize) compared to _get_content_words_in_sentence (normalize -> filter -> stem). This mismatch meant some words would appear in the per-sentence word lists but not in the document frequency table, causing a KeyError during summarization. Aligned both methods to use the same processing order: normalize -> filter stop words -> stem. Also fixed _get_all_words_in_doc to return raw words instead of pre-stemmed words, since stemming is now handled consistently in _get_all_content_words_in_doc. Fixes miso-belica#176

miso-belica

Hello, thank you for the fix. Before I can accept it, please add a test that reproduces the issue with the original code (red/green TDD approach). You mention two cases in which this can happen, so please add two tests, one for each.

Also, please add an entry for this to the CHANGELOG.md file.

Thank you for your help 🙂

miso-belica · 2026-02-22T11:37:20Z

commit_msg.txt

Please remove this file

miso-belica requested changes Feb 22, 2026

miso-belica added the bug label Feb 22, 2026

bysiber wants to merge 1 commit into miso-belica:main from bysiber:fix-sumbasic-keyerror
