|
7 | 7 | <tr bgcolor="#7799ee" > |
8 | 8 | <td valign="bottom" > <br /> |
9 | 9 | <font color="#ffffff" face="helvetica, arial" > <br /><big ><big ><strong ><a href="AdvancedHTMLParser.html" ><font color="#ffffff" >AdvancedHTMLParser</font></a>.Parser</strong></big></big></font></td><td align="right" valign="bottom" ><font color="#ffffff" face="helvetica, arial" ><a href="AdvancedHTMLParser.html" >index</a></font></td></tr></table> |
10 | | - <p ><tt ># Copyright (c) 2015, 2016, 2017 Tim Savannah under LGPLv3. See LICENSE (https://gnu.org/licenses/lgpl-3.0.txt) for more information.<br /> |
| 10 | + <p ><tt ># Copyright (c) 2015, 2016, 2017, 2018 Tim Savannah under LGPLv3. See LICENSE (https://gnu.org/licenses/lgpl-3.0.txt) for more information.<br /> |
11 | 11 | #<br /> |
12 | 12 | # Parser implementation</tt></p> |
13 | 13 | <p > |
|
48 | 48 | <font color="#000000" face="helvetica, arial" ><a name="AdvancedHTMLParser" >class <strong >AdvancedHTMLParser</strong></a>(<a href="html.parser.html#HTMLParser" >html.parser.HTMLParser</a>)</font></td></tr> |
49 | 49 |
|
50 | 50 | <tr bgcolor="#ffc8d8" ><td rowspan="2" ><tt > </tt></td> |
51 | | -<td colspan="2" ><tt ><a href="#AdvancedHTMLParser" >AdvancedHTMLParser</a> - This class parses and allows searching of documents<br /> </tt></td></tr> |
| 51 | +<td colspan="2" ><tt ><a href="#AdvancedHTMLParser" >AdvancedHTMLParser</a>(filename=None, encoding='utf-8')<br /> |
| 52 | + <br /> |
| 53 | +<a href="#AdvancedHTMLParser" >AdvancedHTMLParser</a> - This class parses and allows searching of documents<br /> </tt></td></tr> |
52 | 54 | <tr ><td > </td> |
53 | 55 | <td width="100%" ><dl ><dt >Method resolution order:</dt> |
54 | 56 | <dd ><a href="AdvancedHTMLParser.Parser.html#AdvancedHTMLParser" >AdvancedHTMLParser</a></dd> |
|
205 | 207 | <br /> |
206 | 208 | @return - An AdvancedTag of the node that matched, or None if no match.</tt></dd></dl> |
207 | 209 |
|
208 | | -<dl ><dt ><a name="AdvancedHTMLParser-getFormattedHTML" ><strong >getFormattedHTML</strong></a>(self, indent=' ')</dt><dd ><tt >getFormattedHTML - Get formatted and xhtml of this document<br /> |
| 210 | +<dl ><dt ><a name="AdvancedHTMLParser-getFormattedHTML" ><strong >getFormattedHTML</strong></a>(self, indent=' ')</dt><dd ><tt >getFormattedHTML - Get formatted and xhtml of this document, replacing the original whitespace<br /> |
| 211 | + with a pretty-printed version<br /> |
209 | 212 | <br /> |
210 | 213 | @param indent - space/tab/newline of each level of indent, or integer for how many spaces per level<br /> |
211 | 214 | <br /> |
212 | | -@return - Formatted html as string</tt></dd></dl> |
| 215 | +@return - <str> Formatted html<br /> |
| 216 | + <br /> |
| 217 | +@see getHTML - Get HTML with original whitespace<br /> |
| 218 | + <br /> |
| 219 | +@see getMiniHTML - Get HTML with only functional whitespace remaining</tt></dd></dl> |
213 | 220 |
|
214 | | -<dl ><dt ><a name="AdvancedHTMLParser-getHTML" ><strong >getHTML</strong></a>(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree<br /> |
215 | | - @returns - String</tt></dd></dl> |
| 221 | +<dl ><dt ><a name="AdvancedHTMLParser-getHTML" ><strong >getHTML</strong></a>(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree.<br /> |
| 222 | + <br /> |
| 223 | + If parsed from a document, this will contain the original whitespacing.<br /> |
| 224 | + <br /> |
| 225 | + @returns - <str> of html<br /> |
| 226 | + <br /> |
| 227 | + @see getFormattedHTML<br /> |
| 228 | + <br /> |
| 229 | + @see getMiniHTML</tt></dd></dl> |
| 230 | + |
| 231 | +<dl ><dt ><a name="AdvancedHTMLParser-getMiniHTML" ><strong >getMiniHTML</strong></a>(self)</dt><dd ><tt >getMiniHTML - Gets the HTML representation of this document without any pretty formatting<br /> |
| 232 | + and disregarding original whitespace beyond the functional.<br /> |
| 233 | + <br /> |
| 234 | + @return <str> - HTML with only functional whitespace present</tt></dd></dl> |
216 | 235 |
|
217 | 236 | <dl ><dt ><a name="AdvancedHTMLParser-getRoot" ><strong >getRoot</strong></a>(self)</dt><dd ><tt >getRoot - returns the root Tag.<br /> |
218 | 237 | <br /> |
|
396 | 415 | <font color="#000000" face="helvetica, arial" ><a name="IndexedAdvancedHTMLParser" >class <strong >IndexedAdvancedHTMLParser</strong></a>(<a href="AdvancedHTMLParser.Parser.html#AdvancedHTMLParser" >AdvancedHTMLParser</a>)</font></td></tr> |
397 | 416 |
|
398 | 417 | <tr bgcolor="#ffc8d8" ><td rowspan="2" ><tt > </tt></td> |
399 | | -<td colspan="2" ><tt >An <a href="#AdvancedHTMLParser" >AdvancedHTMLParser</a> that indexes for much much faster searching. If you are doing searching/validation, this is your bet.<br /> |
| 418 | +<td colspan="2" ><tt ><a href="#IndexedAdvancedHTMLParser" >IndexedAdvancedHTMLParser</a>(filename=None, encoding='utf-8', indexIDs=True, indexNames=True, indexClassNames=True, indexTagNames=True)<br /> |
| 419 | + <br /> |
| 420 | +An <a href="#AdvancedHTMLParser" >AdvancedHTMLParser</a> that indexes for much much faster searching. If you are doing searching/validation, this is your bet.<br /> |
400 | 421 | If you are writing/modifying, you may use this, but be sure to call <a href="#IndexedAdvancedHTMLParser-reindex" >reindex</a>() after changes.<br /> </tt></td></tr> |
401 | 422 | <tr ><td > </td> |
402 | 423 | <td width="100%" ><dl ><dt >Method resolution order:</dt> |
|
499 | 520 | <br /> |
500 | 521 | @param state <dict> - The state</tt></dd></dl> |
501 | 522 |
|
502 | | -<dl ><dt ><a name="IndexedAdvancedHTMLParser-asHTML" ><strong >asHTML</strong></a> = getHTML(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree<br /> |
503 | | - @returns - String</tt></dd></dl> |
| 523 | +<dl ><dt ><a name="IndexedAdvancedHTMLParser-asHTML" ><strong >asHTML</strong></a> = getHTML(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree.<br /> |
| 524 | + <br /> |
| 525 | + If parsed from a document, this will contain the original whitespacing.<br /> |
| 526 | + <br /> |
| 527 | + @returns - <str> of html<br /> |
| 528 | + <br /> |
| 529 | + @see getFormattedHTML<br /> |
| 530 | + <br /> |
| 531 | + @see getMiniHTML</tt></dd></dl> |
504 | 532 |
|
505 | 533 | <dl ><dt ><a name="IndexedAdvancedHTMLParser-contains" ><strong >contains</strong></a>(self, em)</dt><dd ><tt >Checks if #em is found anywhere within this element tree<br /> |
506 | 534 | <br /> |
|
613 | 641 | <br /> |
614 | 642 | @return - An AdvancedTag of the node that matched, or None if no match.</tt></dd></dl> |
615 | 643 |
|
616 | | -<dl ><dt ><a name="IndexedAdvancedHTMLParser-getFormattedHTML" ><strong >getFormattedHTML</strong></a>(self, indent=' ')</dt><dd ><tt >getFormattedHTML - Get formatted and xhtml of this document<br /> |
| 644 | +<dl ><dt ><a name="IndexedAdvancedHTMLParser-getFormattedHTML" ><strong >getFormattedHTML</strong></a>(self, indent=' ')</dt><dd ><tt >getFormattedHTML - Get formatted and xhtml of this document, replacing the original whitespace<br /> |
| 645 | + with a pretty-printed version<br /> |
617 | 646 | <br /> |
618 | 647 | @param indent - space/tab/newline of each level of indent, or integer for how many spaces per level<br /> |
619 | 648 | <br /> |
620 | | -@return - Formatted html as string</tt></dd></dl> |
| 649 | +@return - <str> Formatted html<br /> |
| 650 | + <br /> |
| 651 | +@see getHTML - Get HTML with original whitespace<br /> |
| 652 | + <br /> |
| 653 | +@see getMiniHTML - Get HTML with only functional whitespace remaining</tt></dd></dl> |
621 | 654 |
|
622 | | -<dl ><dt ><a name="IndexedAdvancedHTMLParser-getHTML" ><strong >getHTML</strong></a>(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree<br /> |
623 | | - @returns - String</tt></dd></dl> |
| 655 | +<dl ><dt ><a name="IndexedAdvancedHTMLParser-getHTML" ><strong >getHTML</strong></a>(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree.<br /> |
| 656 | + <br /> |
| 657 | + If parsed from a document, this will contain the original whitespacing.<br /> |
| 658 | + <br /> |
| 659 | + @returns - <str> of html<br /> |
| 660 | + <br /> |
| 661 | + @see getFormattedHTML<br /> |
| 662 | + <br /> |
| 663 | + @see getMiniHTML</tt></dd></dl> |
| 664 | + |
| 665 | +<dl ><dt ><a name="IndexedAdvancedHTMLParser-getMiniHTML" ><strong >getMiniHTML</strong></a>(self)</dt><dd ><tt >getMiniHTML - Gets the HTML representation of this document without any pretty formatting<br /> |
| 666 | + and disregarding original whitespace beyond the functional.<br /> |
| 667 | + <br /> |
| 668 | + @return <str> - HTML with only functional whitespace present</tt></dd></dl> |
624 | 669 |
|
625 | 670 | <dl ><dt ><a name="IndexedAdvancedHTMLParser-getRoot" ><strong >getRoot</strong></a>(self)</dt><dd ><tt >getRoot - returns the root Tag.<br /> |
626 | 671 | <br /> |
|
660 | 705 | <br /> |
661 | 706 | @param html <str> - valid HTML</tt></dd></dl> |
662 | 707 |
|
663 | | -<dl ><dt ><a name="IndexedAdvancedHTMLParser-toHTML" ><strong >toHTML</strong></a> = getHTML(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree<br /> |
664 | | - @returns - String</tt></dd></dl> |
| 708 | +<dl ><dt ><a name="IndexedAdvancedHTMLParser-toHTML" ><strong >toHTML</strong></a> = getHTML(self)</dt><dd ><tt >getHTML - Get the full HTML as contained within this tree.<br /> |
| 709 | + <br /> |
| 710 | + If parsed from a document, this will contain the original whitespacing.<br /> |
| 711 | + <br /> |
| 712 | + @returns - <str> of html<br /> |
| 713 | + <br /> |
| 714 | + @see getFormattedHTML<br /> |
| 715 | + <br /> |
| 716 | + @see getMiniHTML</tt></dd></dl> |
665 | 717 |
|
666 | 718 | <dl ><dt ><a name="IndexedAdvancedHTMLParser-unknown_decl" ><strong >unknown_decl</strong></a>(self, decl)</dt><dd ><tt >Internal for parsing</tt></dd></dl> |
667 | 719 |
|
|