README.md: 5 additions & 3 deletions
@@ -76,13 +76,15 @@ The parser then exposes many "standard" functions as you'd find on the web for a

 getElementsCustomFilter - Provide a function/lambda that takes a tag argument, and returns True to "match" it. Returns all matched objects

-getHTML - Returns string of HTML representing this DOM
-
 getRootNodes - Get a list of nodes at root level (0)

 getAllNodes - Get all the nodes contained within this document

-getFormattedHTML - Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to two spaces)
+getHTML - Returns string of HTML representing this DOM
+
+getFormattedHTML - Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to four spaces)
+
+getMiniHTML - Returns a "mini" HTML representation which disregards all whitespace and indentation beyond the functional single-space


 The results of all of these getElement\* functions are TagCollection objects. These objects can be modified, and will be reflected in the parent DOM.
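
Reviewer note: the new getMiniHTML entry describes output that keeps only the "functional single-space". As a rough illustration of that idea, the sketch below collapses every run of whitespace to one space using only the standard library. The function name `collapse_whitespace` is hypothetical and this is not how AdvancedHTMLParser implements getMiniHTML; it only shows the kind of transformation the description implies.

```python
import re


def collapse_whitespace(html_text):
    """Collapse each run of whitespace (spaces, tabs, newlines) to one space.

    Hypothetical helper for illustration only; it is not part of
    AdvancedHTMLParser and does not parse the HTML.
    """
    return re.sub(r"\s+", " ", html_text).strip()


# Indented, multi-line markup such as a formatter might produce.
formatted = "<div>\n    <span>Hello   world</span>\n</div>\n"

mini = collapse_whitespace(formatted)
print(mini)
```

A text-level substitution like this would also alter whitespace inside elements such as `<pre>`, where it is significant, so it should be read as a sketch of the concept rather than a substitute for the library method.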
README.rst: 7 additions & 5 deletions
@@ -86,13 +86,15 @@ The parser then exposes many "standard" functions as you'd find on the web for a

 getElementsCustomFilter \- Provide a function/lambda that takes a tag argument, and returns True to "match" it. Returns all matched objects

-getHTML \- Returns string of HTML representing this DOM
-
 getRootNodes \- Get a list of nodes at root level (0)

 getAllNodes \- Get all the nodes contained within this document

-getFormattedHTML \- Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to two spaces)
+getHTML \- Returns string of HTML representing this DOM
+
+getFormattedHTML \- Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to four spaces)
+
+getMiniHTML \- Returns a "mini" HTML representation which disregards all whitespace and indentation beyond the functional single\-space


 The results of all of these getElement\* functions are TagCollection objects. These objects can be modified, and will be reflected in the parent DOM.
@@ -434,7 +436,7 @@ Notes

 * In general, for tag names and attribute names, you should use lowercase values. During parsing, the parser will lowercase attribute names (like NAME="Abc" becomes name="Abc"). During searching, however, for performance reasons, it is assumed you are passing in already-lowercased strings. If you can't trust the input to be lowercase, then it is your responsibility to call .lower() before calling .getElementsBy\*

-* If you are using IndexedAdvancedHTMLParser to construct HTML and not search, I recommend either setting the index params to False in the constructor, or calling AdvancedHTMLParser.disableIndexing()
+* If you are using this to construct HTML and not search, I recommend either setting the index params to False in the constructor, or calling AdvancedHTMLParser.disableIndexing()

 * There are additional functions and usages not documented here, check the file for more information.

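
Reviewer note: the lowercase guidance in the context lines above can be sketched without the library. The example below uses a plain dict as a stand-in for a parsed document whose names were lowercased at parse time. The names `parsed_tags`, `find_by_tag_name`, and `find_by_untrusted_tag_name` are hypothetical; the only behaviour taken from the README is that searches do no case folding and that the caller is expected to call `.lower()` on untrusted input.

```python
# Stand-in for a parsed document: names are stored lowercased,
# mirroring what the note says happens during parsing.
parsed_tags = {
    "div": ["<div id='a'>", "<div id='b'>"],
    "span": ["<span>"],
}


def find_by_tag_name(tags, tag_name):
    """Exact-match lookup with no case folding, as the note describes for searches."""
    return tags.get(tag_name, [])


def find_by_untrusted_tag_name(tags, tag_name):
    """Lowercase the caller-supplied name before searching."""
    return find_by_tag_name(tags, tag_name.lower())


miss = find_by_tag_name(parsed_tags, "DIV")           # not lowercased: no match
hit = find_by_untrusted_tag_name(parsed_tags, "DIV")  # lowercased first: match
print(len(miss), len(hit))
```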
@@ -469,7 +471,7 @@ If you are still getting UnicodeDecodeError or UnicodeEncodeError, there are a f

 * If the error happens when printing/writing to stdout ( default behaviour for apache / mod\_python is to open stdout with the ANSI/ASCII encoding ), ensure your streams are, in fact, set to utf\-8.

-* Set the environment variable PYTHONIOENCODING to "utf\-8" before python is launched. In Apache, you can add the line "SetEnv PYTHONIOENCODING utf\-8" to your httpd.conf in order to achieve this.
+\* Set the environment variable PYTHONIOENCODING to "utf\\\-8" before python is launched. In Apache, you can add the line "SetEnv PYTHONIOENCODING utf\\\-8" to your httpd.conf in order to achieve this.

 * Ensure that the data you are passing to AdvancedHTMLParser has the correct encoding (matching the "encoding" parameter).
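
Reviewer note: the PYTHONIOENCODING advice in this hunk can be checked from Python itself. The sketch below launches a child interpreter with the variable set and reports the encoding that child uses for stdout. The helper name `child_stdout_encoding` is hypothetical; the Apache `SetEnv` route mentioned in the README is not exercised here.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and return its stdout encoding."""
    # Copy the current environment and set the variable before launch,
    # since it is read when the interpreter starts.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


encoding = child_stdout_encoding("utf-8")
print(encoding)
```

Setting the variable inside an already running interpreter has no effect on its existing streams, which is why the README says to set it before Python is launched.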