Releases · kata198/AdvancedHTMLParser

04 Jun 07:13

kata198

7.2.0

8849721

7.2.0 - Slushy Wushy Wuzza Dwink

7.2.0 Jun 4 2017

Add ".forms" to AdvancedHTMLParser to emulate "document.forms" - returns all
"form" elements in a tag collection
Update removeText to only remove the FIRST occurrence of text (in line
with other JavaScript DOM functions). For the old behaviour, see
removeTextAll, which removes all occurrences of the text.

^^^ ^^^ ^^^ ^^^^ ^^^^ ^^
MAYBE BACKWARDS INCOMPATIBLE - depending on usage
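
To make the behaviour change concrete, here is a minimal sketch of the two semantics on a plain list of text blocks. It is not the library's implementation: `remove_text_first` and `remove_text_all` are hypothetical helpers that only mirror what these notes describe (first occurrence removed and the old block returned, versus every occurrence removed and a list of old blocks returned). Returning None when nothing matches is an assumption.

```python
def remove_text_first(blocks, text):
    """Remove only the first occurrence of `text` from the first text block
    containing it. Returns that block as it was before the change, or None
    (assumed) if no block contains `text`."""
    for i, block in enumerate(blocks):
        if isinstance(block, str) and text in block:
            blocks[i] = block.replace(text, '', 1)
            return block
    return None


def remove_text_all(blocks, text):
    """Remove every occurrence of `text` from every text block.
    Returns a list of the blocks as they were before the change."""
    old_blocks = []
    for i, block in enumerate(blocks):
        if isinstance(block, str) and text in block:
            old_blocks.append(block)
            blocks[i] = block.replace(text, '')
    return old_blocks
```

Code that relied on the old removeText removing every occurrence would, under this reading, need to call removeTextAll instead.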

Add removeTextAll function to remove all occurrences of text from text
nodes on an element.
Change removeText to return the old block (text in that node prior to
replace). removeTextAll returns a list of all old blocks.
Add addBlock / addBlocks functions which take generic blocks (may be a
str, may be an AdvancedTag ). Returns the added block.
Add removeBlock / removeBlocks functions which take generic blocks
(may be str, may be AdvancedTag). Returns the removed tag or the block
of text prior to removal. removeBlock returns None if nothing was found;
removeBlocks returns None in the corresponding element of its returned list.
Add removeChildren function as a helper to remove multiple children.
Returns the children removed in a list, with a "None" if that child
was not present.
Add childBlocks method, to return both text nodes and tag nodes. This matches
what childNodes does in JS DOM (childNodes in AdvancedHTMLParser only returns
tags. Probably will be changed in a future version, such that .children
returns a TagCollection and childNodes returns all blocks.)
Add isTextNode and isTagNode functions, to test if a block is a text node
(str) or a tag node (AdvancedTag)
Add "getChildBlocks" method which returns child blocks (same as childBlocks
property, but not a property)
Update "insertAfter" and "insertBefore" methods:
1. Now support blocks (text, or node)
2. Now always return the child, not just when insertBefore/insertAfter was
  NULL
3. Move the actual insertion outside of the try/catch, as that error
  should be raised (and should NEVER happen)
4. Cleanup documentation
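
The "block" vocabulary used throughout these notes (a block is either a text node, i.e. a str, or a tag node) can be sketched as follows. `Tag` is a hypothetical stand-in, not the library's AdvancedTag, and the helper names are illustrative only.

```python
class Tag:
    """Hypothetical tag node holding its child blocks in document order."""

    def __init__(self, name, blocks=None):
        self.name = name
        # Mixed list: str entries are text nodes, Tag entries are tag nodes.
        self.blocks = list(blocks or [])


def is_text_node(block):
    """A text node is represented as a plain str."""
    return isinstance(block, str)


def is_tag_node(block):
    """A tag node is represented as a Tag object."""
    return isinstance(block, Tag)


def child_tags(tag):
    """Only the tag children (what the notes say childNodes currently returns)."""
    return [b for b in tag.blocks if is_tag_node(b)]


def child_blocks(tag):
    """Both text nodes and tag nodes (what the notes say childBlocks returns)."""
    return list(tag.blocks)
```

For a hypothetical `div` holding `'Hello '`, a `b` tag, and `'!'`, `child_blocks` yields all three entries while `child_tags` yields only the `b` tag.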


21 May 06:57

kata198

7.1.0

de4316a

7.1.0 - Slurpee Coming Soon

7.1.0 May 21 2017

Add createElement function on AdvancedHTMLParser, to work like
document.createElement. Creates an element with the given tag name.
Add createElementFromHTML function to parser, which returns an AdvancedTag from the given HTML
Add createElementsFromHTML function to parser, which accepts HTML containing one or more
elements and returns a list of the parsed AdvancedTags.
Add createBlocksFromHTML function to parser, which parses HTML and returns a
list of blocks (either AdvancedTag objects or text nodes (str)).
Add appendInnerHTML function to AdvancedTag which works like in javascript
tag.innerHTML += 'someOtherHTML'
and will parse and append any tags and/or text nodes
Significant improvements in performance on creating tags ( On average, 125%
reduction in time to create an AdvancedTag. ) Use is also improved.
Add "body" and "head" properties to Parser.AdvancedHTMLParser - to act same
as document.body and document.head
Add method to both Parser and AdvancedTag, "getFirstElementCustomFilter",
which will apply a lambda/function on each tag, starting with first child and
all children, then second child and all children, etc.

This is used for finding things like "body" and "head" without needing to walk
through the whole document. It's also designed to find them as quickly as possible, as
they are very likely to be early and high-level objects in the tree.
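
The traversal order described for getFirstElementCustomFilter (first child and everything beneath it, then the second child and everything beneath it, stopping at the first hit) is a depth-first search with early exit. A minimal sketch, assuming a hypothetical `Node` class and a `first_match` helper rather than the library's own code:

```python
class Node:
    """Hypothetical tree node with a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def first_match(node, predicate):
    """Return the first descendant of `node` for which predicate(descendant)
    is true, visiting each child before that child's own children, and each
    subtree fully before the next sibling. Returns None if nothing matches."""
    for child in node.children:
        if predicate(child):
            return child
        found = first_match(child, predicate)
        if found is not None:
            return found
    return None
```

Because the search stops at the first match, an element such as "head" that sits near the top of the tree is found after inspecting only a handful of nodes.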


28 Apr 21:46

kata198

7.0.2

4ddd68d

7.0.2 - Fuzzy Wuzzy Wuzza Weasel

7.0.2 Apr 28 2017

Fix two typos which would result in exceptions
Add "href" as a standard property name for anchors (so em.href = 'abc' sets
the href attribute)
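
The idea of a "standard property name" (assigning em.href writes the href attribute) can be sketched as below. `SketchTag` and its `STANDARD_NAMES` set are hypothetical and much smaller than whatever the library supports; returning None for an unset name follows the behaviour listed in the 7.0.0 notes.

```python
class SketchTag:
    """Hypothetical tag that maps a few property names onto an attribute dict."""

    STANDARD_NAMES = {'id', 'name', 'href'}

    def __init__(self):
        # Bypass our own __setattr__ so 'attrs' is stored as a normal attribute.
        object.__setattr__(self, 'attrs', {})

    def __setattr__(self, name, value):
        if name in self.STANDARD_NAMES:
            # Property-style assignment writes the HTML attribute.
            self.attrs[name] = value
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in self.STANDARD_NAMES:
            return self.attrs.get(name)  # None when the attribute is unset
        raise AttributeError(name)
```

With this sketch, `em = SketchTag(); em.href = 'abc'` leaves `em.attrs` equal to `{'href': 'abc'}`.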


06 Apr 18:29

kata198

7.0.0

1d6b988

7.0.0 - Fuzzy Weasel

7.0.0 Apr 6 2017

Add "filter"-style functions (think ORM-style filtering, like
name__contains='abc' or name='abc' or name__ne='something'). Supports all filter operations provided by QueryableList
- These have been added to AdvancedHTMLParser.AdvancedHTMLParser (as
  filter/filterAnd and filterOr) to work on all nodes in the parser
- These have been added to AdvancedTag (as filter/filterAnd and filterOr)
  which work on the tag itself and all children (and their children, and so
  on)
- These have been added to TagCollection (as filter/filterAnd and filterOr)
  that work directly on the elements contained only
- Also, TagCollection has filterAll/filterAllAnd and filterAllOr that work
  directly on the containing elements, and all children of those elements (and
  their children)
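
For readers unfamiliar with ORM-style keyword filters, the following sketch shows how a key such as name__contains splits into a field and an operation, and how "and" and "or" variants differ. It does not use QueryableList: plain dicts stand in for tags, only three operations are shown, and `filter_and` / `filter_or` are hypothetical names.

```python
# A small, illustrative subset of operations keyed by their suffix.
OPERATIONS = {
    'eq': lambda actual, expected: actual == expected,
    'ne': lambda actual, expected: actual != expected,
    'contains': lambda actual, expected: actual is not None and expected in actual,
}


def _matches(item, key, expected):
    """Split 'field__op' into field and operation; no suffix means equality."""
    field, _, op = key.partition('__')
    return OPERATIONS[op or 'eq'](item.get(field), expected)


def filter_and(items, **criteria):
    """Keep items that satisfy ALL of the criteria."""
    return [it for it in items
            if all(_matches(it, k, v) for k, v in criteria.items())]


def filter_or(items, **criteria):
    """Keep items that satisfy ANY of the criteria."""
    return [it for it in items
            if any(_matches(it, k, v) for k, v in criteria.items())]
```

For example, `filter_and(tags, name__contains='abc', id__ne='1')` keeps only entries whose name contains 'abc' and whose id is not '1'.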

This adds QueryableList as a dependency, but setup.py can be passed "--no-deps" to skip that installation (and the filter methods will be unavailable)

Add "find" function on AdvancedHTMLParser, which supports a very small subset of the QueryableList operations (only equals, contains, and icontains). It offers similar functionality without the QueryableList dependency, but is only usable from the document level (AdvancedHTMLParser.AdvancedHTMLParser)
Support javascript-style assignment and access for a handful of attributes (the older ones: name, id, align, etc).
You can now do things like: myTag.name = 'hello' to set the name, and myTag.name to access it
(previously you had to use setAttribute and getAttribute for everything)
The names used here match what are used in HTML, and include the javascript events
Fix where "className" could get out of sync with "classList"/"classNames"
No longer treat "classname" and "class" as the same attribute, they are in fact distinct on the attribute dict, but
className maps to class on object-access
Support binary-style attribute set/access, (like for "hidden" property, or "checked")
Support attributes conditional on tag name, like "checked" on an input
Change so accessing an attribute on an AdvancedTag which is not set returns None (undefined/null), instead of raising an AttributeError
Implement "cloneNode" function from JS DOM
Fix TagCollection __add__ and __sub__, which were modifying the collection in place. Moved that behaviour to __iadd__ and __isub__ (for += and -=)
and implemented __add__ and __sub__ to work on copies
Add "isEqualNode" JS DOM method as equivalent to the '==' operator on AdvancedTag
Add "contains" JS DOM method to AdvancedTag, TagCollection, and AdvancedHTMLParser
Implement "in" operator on Parser to check if an element (or uuid, if passed) is contained
Implement "hasChild" method to see if an element is a direct child of another element
Implement "remove" method on an AdvancedTag, to remove it from the parent.
Some other minor DOM methods, (childElementCount)
Rename on AdvancedTag "attributes" to "_attributes" in preparation of implementing DOM-style named node map
Add ownerDocument to Tags which point to the document (parser), if associated with one
Added some functions for accessing the whole of uids
Proper quote-escaping within attribute values. \" isn't understood across the board, but &quot; is, so switch from the former to the latter.
Add DOM-style "attributes" to every AdvancedTag. This follows the horrible antiquated interface that DOM
uses for "attributes", created before getAttribute/setAttribute were standardized.
This is always available as .attributesDOM , and the dict impl always available as .attributesDict

By default, we will retain the "dict" impl, as the NamedNodeMap impl is deprecated.
There's a new function, toggleAttributesDOM which will change the global .attributes property to be the DOM (official) or Dict (sane and prior) impl.
Some minor cleanups, doc updates, test updates, etc
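
The TagCollection operator fix in these notes comes down to the usual Python distinction between the copying operators (+ and -) and the in-place ones (+= and -=). A minimal sketch on a hypothetical list subclass, not the library's TagCollection:

```python
class SketchCollection(list):
    """Hypothetical collection showing copying vs in-place operators."""

    def __add__(self, other):
        # a + b: build a new collection, leave self untouched.
        result = SketchCollection(self)
        result.extend(other)
        return result

    def __iadd__(self, other):
        # a += b: modify self in place and return it.
        self.extend(other)
        return self

    def __sub__(self, other):
        # a - b: new collection without the items found in other.
        return SketchCollection(x for x in self if x not in other)

    def __isub__(self, other):
        # a -= b: drop the items found in other from self, in place.
        self[:] = [x for x in self if x not in other]
        return self
```

The practical difference shows up when another name refers to the same collection: `a + b` leaves that other reference unchanged, while `a += b` changes what it sees.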

6.8.0

Add "getAllChildNodes" to tags, which returns all the children (and all their
children, and so forth) of the given tag
Add "getAllNodes" to AdvancedHTMLParser.AdvancedHTMLParser - which gets the
root nodes, all children of them, and all children all the way to the end
Add "getAllNodes" to TagCollection, which returns those nodes contained
within, and all of their children (and so forth)
Add "find" method to AdvancedHTMLParser.AdvancedHTMLParser, which supports filtering by attr=value style, supporting
either single values or list of values (for ANY aka or), and some specials
( __contains and __icontains suffixes on keys for "value contains" or
"case-insensitive value contains") This method is only available in one place.
7.0.0 will have a full filter implementation on the parser, tags, and tag
collections, but will require QueryableList to be installed. This will be
optional, and this method will remain as an incomplete version.
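
The "all children, and all their children, and so forth" collection that getAllChildNodes is described as returning in these 6.8.0 notes is a full descendant walk. A short sketch using nested dicts as hypothetical nodes; the depth-first, document-order result shown here is an assumption about ordering, not something the notes state:

```python
def all_descendants(node):
    """Yield every descendant of `node`: each child, followed immediately
    by that child's own descendants (depth-first, document order)."""
    for child in node.get('children', []):
        yield child
        yield from all_descendants(child)
```

Unlike a first-match search, this walk always visits the entire subtree.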


14 Mar 21:35

kata198

6.7.0

65365bd

6.7.0 - Needs no Cool Name

6.7.0 Mar 14 2017

Fix camel-case vs dash names when using style attributes (so that em.style.paddingTop translates to 'padding-top')
Implement __repr__ on AdvancedTag
Fix __repr__ on StyleAttribute to include the class name
Make style attributes compare (__eq__ and __ne__) regardless of order
Allow StyleAttribute objects to be created from other StyleAttribute objects
Implement __eq__ and __ne__ on AdvancedTag, these do identity
comparison (same tag equals itself ONLY).
Implement __copy__ and __deepcopy__ methods on StyleAttribute and
AdvancedTag so that tags can be copied.
Add getAttributesList and getAttributesDict on an AdvancedTag to make a copy of a list
of values (like for attrList on AdvancedTag constructor) or a copy of a
dict of values.
Implement an isTagEqual method on AdvancedTag which compares the tag
name and attributes to another tag (for testing that what falls between < and > is the
same), a non-identity comparison.
Add tests for all changes
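
Two of the style changes in these notes (camel-case names mapping to dashed CSS names, and comparison regardless of declaration order) can be sketched with the standard library alone. `camel_to_dash` and `parse_style` are hypothetical helpers, not StyleAttribute methods, and the parser below is deliberately naive.

```python
import re


def camel_to_dash(name):
    """Convert a camel-case property name to its dashed CSS form."""
    return re.sub(r'[A-Z]', lambda m: '-' + m.group(0).lower(), name)


def parse_style(style):
    """Parse 'a: b; c: d' into a dict. Comparing two such dicts ignores the
    order in which the declarations were written."""
    result = {}
    for declaration in style.split(';'):
        if ':' in declaration:
            key, _, value = declaration.partition(':')
            result[key.strip()] = value.strip()
    return result
```

Here `camel_to_dash('paddingTop')` gives `'padding-top'`, and two style strings with the same declarations in a different order parse to equal dicts.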


27 Oct 20:58

kata198

6.6.4

ed0e98f

6.6.4 - Jumping Tacos

6.6.4 Oct 27 2016
Fix regression where "AdvancedTag.getAttribute" method would not accept a default
(second param).
Fix calling ".value" on an AdvancedTag to get the "value" attribute (was
broken by previous regression)

03 Oct 22:50

kata198

6.6.3

fd77384

6.6.3

6.6.3 Oct 03 2016
Fix no-value attributes not appearing in html output (like "checked" on an input). Was in attributes, but not in html output.


27 Jul 04:43

kata198

6.6.2

8f98b5b

6.6.2 - Chili Cheese Parsing Pretzel

6.6.2 Jul 27 2016
Python's HTMLParser automatically converts charrefs on python3 only
(backwards incompatible...) -- so make it stop doing that. This allows things
like &nbsp; and &lt; to not be converted to ' ' and '<' on python3. Added
tests as well.
Cleanup imports and add comments to test cases.
Add fixes made to AdvancedHTMLParser in 6.6.0, relating to text outside root
nodes into AdvancedHTMLFormatter.
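
The charref behaviour mentioned in these 6.6.2 notes can be observed with the standard library's html.parser.HTMLParser and its convert_charrefs argument. The `Collector` class and `collect_text` helper below are illustrative only and are not how AdvancedHTMLParser is written.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Collects text, re-emitting entity and character references verbatim
    whenever the parser reports them instead of converting them."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.parts.append('&#%s;' % name)


def collect_text(html, convert_charrefs):
    """Feed `html` through a Collector and return the gathered text."""
    parser = Collector(convert_charrefs)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)
```

With conversion enabled, `'<p>1 &lt; 2</p>'` comes back as `'1 < 2'`; with it disabled, the reference survives as `'1 &lt; 2'`, which is what a round-tripping parser wants.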

25 Jul 16:43

kata198

6.6.0

a8b38e2

6.6.0

6.6.0 Jul 25 2016
In a multiple root node scenario, make sure getHTML returns text that falls
between the root nodes.
Retain text, comments, etc that occur before and after the root node
Update runTests.py to be latest from GoodTests -- allows providing arguments
to GoodTests.py (by passing them to runTests.py) and removes the need for the
symlink in the "tests" directory (which duplicates source in the dist)


25 Mar 18:36

kata198

6.5.1

b1cec97

6.5.1 - Mega Salty Pretzel Cannon

6.5.1 Mar 23 2016
Merge in patch by "Tai Kedzierski" which fixes a typo in getElementsByAttr. Thanks!
Fix missing files in MANIFEST.in
