Commit 0442358

Catch LookupError in case of bad encoding string
I've seen cases where a bad encoding string results in an error. Catching LookupError should solve the problem by falling back onto `chardet` or `utf-8`.

Here's one case:

```
textPayload: "Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 189, in summary
    self._html(True)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 132, in _html
    self.html = self._parse(self.input)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 141, in _parse
    doc, self.encoding = build_doc(input)
  File "/opt/conda/lib/python3.7/site-packages/readability/htmls.py", line 17, in build_doc
    encoding = get_encoding(page) or 'utf-8'
  File "/opt/conda/lib/python3.7/site-packages/readability/encoding.py", line 46, in get_encoding
    page.decode(encoding)
LookupError: unknown encoding: utf-8, ie=edge, chrome=1
```
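The failure in the traceback can be reproduced with the standard library alone. The snippet below is a minimal sketch, not code from this repository: the `page` bytes and the variable names are made up for illustration, and only the encoding string is taken from the traceback. It shows that an unknown codec name raises `LookupError` before any bytes are decoded, so an `except UnicodeDecodeError` clause never sees it.

```python
# The malformed charset value taken from the traceback in the commit message.
bad_encoding = "utf-8, ie=edge, chrome=1"
# Hypothetical page content; any bytes would do.
page = b"<html><body>hello</body></html>"

caught = None
message = ""
try:
    page.decode(bad_encoding)
except UnicodeDecodeError:
    # Not reached: the codec lookup fails before decoding starts.
    caught = "UnicodeDecodeError"
except LookupError as exc:
    caught = "LookupError"
    message = str(exc)

# LookupError and UnicodeDecodeError sit in separate branches of the
# exception hierarchy, so catching one does not catch the other.
is_subclass = issubclass(LookupError, UnicodeDecodeError)
print(caught, "|", message, "|", is_subclass)
```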
1 parent de20908 commit 0442358

1 file changed: +1 -1 lines changed


readability/encoding.py

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ def get_encoding(page):
             page.decode(encoding)
             # It worked!
             return encoding
-        except UnicodeDecodeError:
+        except (UnicodeDecodeError, LookupError):
             pass
 
     # Fallback to chardet if declared encodings fail
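The effect of the widened `except` clause can be sketched in isolation. The helper below is hypothetical and is not the library's `get_encoding`: the name `pick_encoding`, its parameters, and the fixed `default` are assumptions for illustration, and it substitutes a plain default for the `chardet` fallback so that it runs with the standard library only.

```python
def pick_encoding(page, declared_encodings, default="utf-8"):
    """Return the first declared encoding that decodes `page`, else `default`.

    Hypothetical helper for illustration; not part of readability.
    """
    for encoding in declared_encodings:
        try:
            page.decode(encoding)
            return encoding
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the codec exists but the bytes do not fit it.
            # LookupError: the declared name is not a known codec at all.
            continue
    # Stand-in for the real fallback (chardet, then 'utf-8').
    return default


page = "café".encode("utf-8")

# A bogus codec name no longer aborts the loop; the default is used instead.
result_bad_name = pick_encoding(page, ["utf-8, ie=edge, chrome=1"])

# A valid but wrong codec is skipped, and the next candidate is tried.
result_wrong_codec = pick_encoding(page, ["ascii", "latin-1"])

print(result_bad_name, result_wrong_codec)
```

With only `UnicodeDecodeError` in the tuple, the first call would raise instead of returning the default, which is the behavior this commit changes.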
