
Commit 75e6938

Update READMEs to improve things a little bit and to advertise the new getMiniHTML function
1 parent ce5768a commit 75e6938

2 files changed (+12 -8 lines changed)

README.md

Lines changed: 5 additions & 3 deletions
@@ -76,13 +76,15 @@ The parser then exposes many "standard" functions as you'd find on the web for a
 
 getElementsCustomFilter - Provide a function/lambda that takes a tag argument, and returns True to "match" it. Returns all matched objects
 
-getHTML - Returns string of HTML representing this DOM
-
 getRootNodes - Get a list of nodes at root level (0)
 
 getAllNodes - Get all the nodes contained within this document
 
-getFormattedHTML - Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to two spaces)
+getHTML - Returns string of HTML representing this DOM
+
+getFormattedHTML - Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to four spaces)
+
+getMiniHTML - Returns a "mini" HTML representation which disregards all whitespace and indentation beyond the functional single-space
 
 
 The results of all of these getElement\* functions are TagCollection objects. These objects can be modified, and will be reflected in the parent DOM.
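The newly documented getMiniHTML produces a whitespace-minimised form of the markup. The stdlib-only Python sketch below shows what that kind of output looks like. The helper `mini_html` is hypothetical. It is a rough approximation that collapses every whitespace run into one space, not AdvancedHTMLParser's implementation, and it ignores elements such as `<pre>` where whitespace is significant.

```python
import re

# Hypothetical helper approximating the idea behind getMiniHTML:
# every run of whitespace (spaces, tabs, newlines) becomes a single space.
# NOT AdvancedHTMLParser's implementation; <pre>/<textarea> content is not protected.
_WHITESPACE_RUN = re.compile(r"\s+")


def mini_html(html: str) -> str:
    """Collapse whitespace runs to one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", html).strip()


# Input indented with four spaces, like the getFormattedHTML default described above.
formatted = """<div>
    <span>Hello,
        world</span>
</div>"""

print(mini_html(formatted))  # <div> <span>Hello, world</span> </div>
```

The sketch keeps one space instead of deleting whitespace between tags outright. Removing it completely could change how inline content renders, and that seems to be why the README calls the single space "functional".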

README.rst

Lines changed: 7 additions & 5 deletions
@@ -86,13 +86,15 @@ The parser then exposes many "standard" functions as you'd find on the web for a
 
 getElementsCustomFilter \- Provide a function/lambda that takes a tag argument, and returns True to "match" it. Returns all matched objects
 
-getHTML \- Returns string of HTML representing this DOM
-
 getRootNodes \- Get a list of nodes at root level (0)
 
 getAllNodes \- Get all the nodes contained within this document
 
-getFormattedHTML \- Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to two spaces)
+getHTML \- Returns string of HTML representing this DOM
+
+getFormattedHTML \- Returns a formatted string (using AdvancedHTMLFormatter; see below) of the HTML. Takes as argument an indent (defaults to four spaces)
+
+getMiniHTML \- Returns a "mini" HTML representation which disregards all whitespace and indentation beyond the functional single\-space
 
 
 The results of all of these getElement\* functions are TagCollection objects. These objects can be modified, and will be reflected in the parent DOM.
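The getElementsCustomFilter entry describes matching with a predicate: a function that takes a tag and returns True to keep it. The Python sketch below shows the same idea using only the standard `html.parser` module. The `Tag` class and `get_elements_custom_filter` function are hypothetical stand-ins, not the library's API. The real method returns a TagCollection tied to the parent DOM, whereas this sketch returns a plain list.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Dict, List


@dataclass
class Tag:
    # Hypothetical stand-in for a parsed tag (not AdvancedHTMLParser's tag class).
    tag_name: str
    attributes: Dict[str, str] = field(default_factory=dict)


class _TagCollector(HTMLParser):
    """Records every start tag (including self-closing ones) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[Tag] = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        self.tags.append(Tag(tag, {name: (value or "") for name, value in attrs}))


def get_elements_custom_filter(html: str, predicate: Callable[[Tag], bool]) -> List[Tag]:
    """Return every tag for which predicate(tag) is True, in document order."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return [tag for tag in collector.tags if predicate(tag)]


doc = '<div class="a"><span class="item">x</span><p class="item">y</p></div>'
matches = get_elements_custom_filter(doc, lambda t: t.attributes.get("class") == "item")
print([t.tag_name for t in matches])  # ['span', 'p']
```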
@@ -434,7 +436,7 @@ Notes
 
 * In general, for tag names and attribute names, you should use lowercase values. During parsing, the parser will lowercase attribute names (like NAME="Abc" becomes name="Abc"). During searching, however, for performance reasons, it is assumed you are passing in already-lowercased strings. If you can't trust the input to be lowercase, then it is your responsibility to call .lower() before calling .getElementsBy\*
 
-* If you are using IndexedAdvancedHTMLParser to construct HTML and not search, I recommend either setting the index params to False in the constructor, or calling AdvancedHTMLParser.disableIndexing()
+* If you are using this to construct HTML and not search, I recommend either setting the index params to False in the constructor, or calling AdvancedHTMLParser.disableIndexing()
 
 * There are additional functions and usages not documented here, check the file for more information.
 
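The two notes in this hunk describe the trade-offs of an index: normalise names once on insert, skip normalisation on lookup for speed, and pay index upkeep only if you actually search. The toy Python store below illustrates those trade-offs. `TinyTagStore`, its `use_index` flag, and its methods are all hypothetical. None of this is AdvancedHTMLParser code, and the library's real index parameters and `disableIndexing()` may behave differently.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

Element = Dict[str, str]  # hypothetical: an element is just {"tagName": ...}


class TinyTagStore:
    """Hypothetical illustration of lowercase-on-insert and optional indexing."""

    def __init__(self, use_index: bool = True) -> None:
        self.use_index = use_index
        self.elements: List[Element] = []
        self._by_tag: DefaultDict[str, List[Element]] = defaultdict(list)

    def add(self, tag_name: str) -> Element:
        # Names are lowercased once, when the element is stored.
        element = {"tagName": tag_name.lower()}
        self.elements.append(element)
        if self.use_index:
            # Extra bookkeeping on every insert: pure overhead if you never search.
            self._by_tag[element["tagName"]].append(element)
        return element

    def get_by_tag_name(self, tag_name: str) -> List[Element]:
        # For speed the query is NOT lowercased: callers must pass lowercase input.
        if self.use_index:
            return list(self._by_tag.get(tag_name, []))
        return [e for e in self.elements if e["tagName"] == tag_name]


store = TinyTagStore()
for name in ("DIV", "span", "Div"):
    store.add(name)

query = "DIV"  # untrusted, possibly mixed-case input
print(len(store.get_by_tag_name(query)))          # 0: the un-lowercased query misses
print(len(store.get_by_tag_name(query.lower())))  # 2: lowercasing first finds both divs

builder = TinyTagStore(use_index=False)  # construction-only use: no index upkeep
builder.add("P")
```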
@@ -469,7 +471,7 @@ If you are still getting UnicodeDecodeError or UnicodeEncodeError, there are a f
 
 * If the error happens when printing/writing to stdout ( default behaviour for apache / mod\_python is to open stdout with the ANSI/ASCII encoding ), ensure your streams are, in fact, set to utf\-8.
 
-* Set the environment variable PYTHONIOENCODING to "utf\-8" before python is launched. In Apache, you can add the line "SetEnv PYTHONIOENCODING utf\-8" to your httpd.conf in order to achieve this.
+\* Set the environment variable PYTHONIOENCODING to "utf\\\-8" before python is launched. In Apache, you can add the line "SetEnv PYTHONIOENCODING utf\\\-8" to your httpd.conf in order to achieve this.
 
 * Ensure that the data you are passing to AdvancedHTMLParser has the correct encoding (matching the "encoding" parameter).
 
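One way to check the PYTHONIOENCODING advice is to launch a child interpreter with the variable set and ask which encoding its stdout uses. The Python sketch below does that, then shows why the bytes you hand to the parser must be decoded with the encoding they were written in. The helper `child_stdout_encoding` is hypothetical, and `subprocess.run(..., capture_output=True)` requires Python 3.7 or later.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding: str) -> str:
    """Start a child Python with PYTHONIOENCODING set and return its stdout encoding."""
    # Copy the current environment so the child still sees PATH, SYSTEMROOT, etc.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Normalise aliases such as "UTF-8" or "utf8" to the canonical codec name.
    return codecs.lookup(result.stdout.strip()).name


print(child_stdout_encoding("utf-8"))  # utf-8

# Decoding with a mismatched encoding is one source of UnicodeDecodeError.
raw = "café".encode("utf-8")
try:
    raw.decode("ascii")  # wrong encoding for these bytes: raises
except UnicodeDecodeError as exc:
    print("mismatch:", exc.reason)
text = raw.decode("utf-8")  # matching encoding round-trips cleanly
```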