Commit 0eec2fd

Add notes about unicode to README
1 parent edf587c commit 0eec2fd

2 files changed, with 48 additions and 0 deletions

README.md

Lines changed: 24 additions & 0 deletions
@@ -388,6 +388,30 @@ AdvancedHTMLParser can be installed without dependencies (pass '\-\-no\-deps' to
 By default, https://github.com/kata198/QueryableList will be installed, which will enable support for those additional filter methods.
 
 
+Unicode
+-------
+
+AdvancedHTMLParser generally has very good support for unicode, and defaults to "utf\-8" (this can be changed with the "encoding" argument to AdvancedHTMLParser.AdvancedHTMLParser when parsing).
+
+If you are still getting UnicodeDecodeError or UnicodeEncodeError, there are a few things you can try:
+
+* If the error happens when printing/writing to stdout (the default behaviour for apache / mod\_python is to open stdout with the ANSI/ASCII encoding), ensure your streams are, in fact, set to utf\-8.
+
+* Set the environment variable PYTHONIOENCODING to "utf\-8" before python is launched. In Apache, you can add the line "SetEnv PYTHONIOENCODING utf\-8" to achieve this.
+
+* Ensure that the data you are passing to AdvancedHTMLParser has the correct encoding (matching the "encoding" parameter).
+
+* Switch to python3 if at all possible \-\- python2 does have 'unicode' support and AdvancedHTMLParser uses it to the best of its ability, but python2 still has some inherent flaws which may come up when using standard library / output functions. You should ensure that these are set to use utf\-8 (as described above).
+
+
+AdvancedHTMLParser is tested against unicode (it even has a unit test), and works in both python2 and python3 in the general case.
+
+If you are having an issue (even on python2), you've checked the above "common configuration/usage" errors, and you think there is still an issue, please open a bug report at https://github.com/kata198/AdvancedHTMLParser with a test case, python version, and traceback.
+
+
+The library itself is considered unicode-safe; almost always the issue lies outside of this library or has a simple workaround.
+
+
 Example Usage
 -------------
 
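
The stream advice in the added section (make sure stdout really is utf-8) can be sketched with the standard library alone. This is a minimal sketch, not code from AdvancedHTMLParser: the helper names `utf8_text_stream` and `ensure_utf8_stdout` are hypothetical, and it assumes Python 3.7 or later, where text streams such as `sys.stdout` usually offer `reconfigure()`.

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8.

    Hypothetical helper for illustration; newline="\n" disables
    platform-specific newline translation on write.
    """
    return io.TextIOWrapper(
        binary_stream, encoding="utf-8", newline="\n", line_buffering=True
    )


def ensure_utf8_stdout():
    """Switch sys.stdout to UTF-8 when the stream supports it.

    If stdout has been replaced by an object without reconfigure()
    (for example a capture buffer), it is returned unchanged.
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return sys.stdout
```

Calling `ensure_utf8_stdout()` once at startup is the in-process counterpart of setting PYTHONIOENCODING before launch; `utf8_text_stream` covers other binary streams, such as a file opened in `"wb"` mode.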

README.rst

Lines changed: 24 additions & 0 deletions
@@ -406,6 +406,30 @@ AdvancedHTMLParser can be installed without dependencies (pass '\-\-no\-deps' to
 By default, https://github.com/kata198/QueryableList will be installed, which will enable support for those additional filter methods.
 
 
+Unicode
+-------
+
+AdvancedHTMLParser generally has very good support for unicode, and defaults to "utf\-8" (this can be changed with the "encoding" argument to AdvancedHTMLParser.AdvancedHTMLParser when parsing).
+
+If you are still getting UnicodeDecodeError or UnicodeEncodeError, there are a few things you can try:
+
+* If the error happens when printing/writing to stdout (the default behaviour for apache / mod\_python is to open stdout with the ANSI/ASCII encoding), ensure your streams are, in fact, set to utf\-8.
+
+* Set the environment variable PYTHONIOENCODING to "utf\-8" before python is launched. In Apache, you can add the line "SetEnv PYTHONIOENCODING utf\-8" to achieve this.
+
+* Ensure that the data you are passing to AdvancedHTMLParser has the correct encoding (matching the "encoding" parameter).
+
+* Switch to python3 if at all possible \-\- python2 does have 'unicode' support and AdvancedHTMLParser uses it to the best of its ability, but python2 still has some inherent flaws which may come up when using standard library / output functions. You should ensure that these are set to use utf\-8 (as described above).
+
+
+AdvancedHTMLParser is tested against unicode (it even has a unit test), and works in both python2 and python3 in the general case.
+
+If you are having an issue (even on python2), you've checked the above "common configuration/usage" errors, and you think there is still an issue, please open a bug report at https://github.com/kata198/AdvancedHTMLParser with a test case, python version, and traceback.
+
+
+The library itself is considered unicode-safe; almost always the issue lies outside of this library or has a simple workaround.
+
+
 Example Usage
 -------------
 
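
The bullet about passing correctly encoded data can be checked before anything reaches the parser. The sketch below is an assumption-laden illustration, not part of AdvancedHTMLParser: `decode_for_parser` is a hypothetical helper that decodes raw bytes with the same encoding name you intend to give the parser's "encoding" argument, and fails early with a readable message when the bytes do not match.

```python
def decode_for_parser(raw, encoding="utf-8"):
    """Decode raw bytes using the encoding the parser will be told about.

    Hypothetical helper for illustration. Text input is returned as-is;
    bytes that are not valid in `encoding` raise ValueError with the
    offset of the first offending byte.
    """
    if isinstance(raw, str):
        return raw  # already decoded text, nothing to verify
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(
            "input is not valid %s (bad byte at offset %d)"
            % (encoding, exc.start)
        ) from exc
```

If this raises for input you believed was utf-8, the mismatch is in the data or its declared encoding, which is the "outside of this library" case the README describes.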
