Releases: kata198/AdvancedHTMLParser

7.2.0 - Slushy Wushy Wuzza Dwink

04 Jun 07:13

  • 7.2.0 Jun 4 2017
  • Add ".forms" to AdvancedHTMLParser to emulate "document.forms" - returns all
    "form" elements in a tag collection

  • Update removeText to only remove the FIRST occurrence of text (in line
    with other JavaScript DOM functions). For the old behaviour, see
    removeTextAll, which removes all occurrences of the text.
    MAYBE BACKWARDS INCOMPATIBLE, depending on usage.

  • Add removeTextAll function to remove all occurrences of text from text
    nodes on an element.

  • Change removeText to return the old block (text in that node prior to
    replace). removeTextAll returns a list of all old blocks.

  • Add addBlock / addBlocks functions which take generic blocks (may be a
    str, may be an AdvancedTag ). Returns the added block.

  • Add removeBlock / removeBlocks functions which take generic blocks
    (may be str, may be AdvancedTag). Returns the removed tag or the block
    of text prior to remove. removeBlock returns None if none found; removeBlocks
    returns None in the corresponding element of its list.

  • Add removeChildren function as a helper to remove multiple children.
    Returns the children removed in a list, with a "None" if that child
    was not present.

  • Add childBlocks method, to return both text nodes and tag nodes. This matches
    what childNodes does in JS DOM (childNodes in AdvancedHTMLParser only returns
    tags. Probably will be changed in a future version, such that .children
    returns a TagCollection and childNodes returns all blocks.)

  • Add isTextNode and isTagNode functions, to test if a block is a text node
    (str) or a tag node (AdvancedTag)

  • Add "getChildBlocks" method which returns child blocks (same as childBlocks
    property, but not a property)

  • Update "insertAfter" and "insertBefore" methods:

    1. Now support blocks (text, or node)
    2. Now always return the child, not just when insertBefore/insertAfter was
      NULL
    3. Move the actual insertion outside of the try/catch, as that error
      should be raised (and should NEVER happen)
    4. Cleanup documentation
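
The removeText / removeTextAll change above can be illustrated without the library. This is a minimal sketch, assuming an element's text nodes are held as a plain list of str blocks; the helpers remove_text_first and remove_text_all are hypothetical stand-ins and not AdvancedHTMLParser's implementation.

```python
def remove_text_first(blocks, text):
    """Remove only the FIRST occurrence of text.

    Returns the old block (its content before the removal), or None if
    the text was not found in any block.
    """
    for i, block in enumerate(blocks):
        if isinstance(block, str) and text in block:
            blocks[i] = block.replace(text, '', 1)  # count=1: first match only
            return block
    return None


def remove_text_all(blocks, text):
    """Remove every occurrence of text; return a list of all old blocks."""
    old_blocks = []
    for i, block in enumerate(blocks):
        if isinstance(block, str) and text in block:
            old_blocks.append(block)
            blocks[i] = block.replace(text, '')
    return old_blocks


first_blocks = ['one two one', 'three one']
first_old = remove_text_first(first_blocks, 'one')

all_blocks = ['one two one', 'three one']
all_old = remove_text_all(all_blocks, 'one')
```

Code that relied on the old removeText stripping every match would, under this reading, need to call the "All" variant instead.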

7.1.0 - Slurpee Coming Soon

21 May 06:57

  • 7.1.0 May 21 2017
  • Add createElement function on AdvancedHTMLParser, to work like
    document.createElement. Creates an element with the given tag name.

  • Add createElementFromHTML function to parser which returns an AdvancedTag from given HTML

  • Add createElementsFromHTML function to parser which parses HTML and returns a list of
    AdvancedTags (one or more).

  • Add createBlocksFromHTML function to parser which parses HTML and returns a
    list of blocks (either AdvancedTag objects, or text nodes (str)).

  • Add appendInnerHTML function to AdvancedTag which works like in javascript
    tag.innerHTML += 'someOtherHTML'
    and will parse and append any tags and/or text nodes

  • Significant improvements in performance on creating tags ( On average, 125%
    reduction in time to create an AdvancedTag. ) Use is also improved.

  • Add "body" and "head" properties to Parser.AdvancedHTMLParser - to act same
    as document.body and document.head

  • Add method to both Parser and AdvancedTag, "getFirstElementCustomFilter",
    which will apply a lambda/function on each tag, starting with first child and
    all children, then second child and all children, etc.

This is used for finding things like "body" and "head" without needing to walk
through the whole document. It's also designed to find them the quickest, as
they are very likely to be early and high-level objects in the tree.
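
The traversal order described for getFirstElementCustomFilter (first child and all of its children, then second child and all of its children) amounts to a depth-first search that stops at the first match. The sketch below shows that order on a tiny tree; the Node class and first_match function are hypothetical and only assume that a tag exposes a list of children.

```python
class Node:
    """Hypothetical stand-in for a tag: a name plus a list of children."""

    def __init__(self, tag_name, children=()):
        self.tag_name = tag_name
        self.children = list(children)


def first_match(node, predicate):
    """Test each child, then that child's descendants, before the next sibling.

    Returns the first node for which predicate(node) is true, else None.
    """
    for child in node.children:
        if predicate(child):
            return child
        found = first_match(child, predicate)
        if found is not None:
            return found
    return None


root = Node('html', [
    Node('head', [Node('title')]),
    Node('body', [Node('div')]),
])

visited = []


def is_body(node):
    visited.append(node.tag_name)  # record the order in which nodes are tested
    return node.tag_name == 'body'


body = first_match(root, is_body)
```

Because the search returns as soon as it matches, the "div" under "body" is never tested, which is the point of not walking the whole document.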

7.0.2 - Fuzzy Wuzzy Wuzza Weasel

28 Apr 21:46

  • 7.0.2 Apr 28 2017
  • Fix two typos which would result in exceptions
  • Add "href" as a standard property name for anchors (so em.href = 'abc' sets
    the href attribute)

7.0.0 - Fuzzy Weasel

06 Apr 18:29

  • 7.0.0 Apr 6 2017
  • Add "filter"-style functions (think ORM-style filtering, like
    name__contains='abc' or name='abc' or name__ne='something'). Supports all filter operations provided by QueryableList
    • These have been added to AdvancedHTMLParser.AdvancedHTMLParser (as
      filter/filterAnd and filterOr) to work on all nodes in the parser
    • These have been added to AdvancedTag (as filter/filterAnd and filterOr)
      which work on the tag itself and all children (and their children, and so
      on)
    • These have been added to TagCollection (as filter/filterAnd and filterOr)
      that work directly on the elements contained only
    • Also, TagCollection has filterAll/filterAllAnd and filterAllOr that work
      directly on the contained elements, and all children of those elements (and
      their children)

This adds QueryableList as a dependency, but setup.py can be passed "--no-deps" to skip that installation (and the filter methods will be unavailable)
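
The ORM-style keyword filters mentioned above (name='abc', name__contains='abc', name__ne='something') can be sketched in plain Python. This is an illustration of the idea only, assuming each tag is reduced to a dict of attributes; matches, filter_and and filter_or are hypothetical helpers, cover just a few operations, and are not QueryableList's implementation.

```python
def matches(attrs, key, expected):
    """Evaluate one 'name__op=value' style criterion against an attribute dict."""
    name, _, op = key.partition('__')
    value = attrs.get(name)
    if op == '':
        return value == expected
    if op == 'ne':
        return value != expected
    if op == 'contains':
        return value is not None and expected in value
    if op == 'icontains':
        return value is not None and expected.lower() in value.lower()
    raise ValueError('unsupported operation: %r' % (op,))


def filter_and(items, **criteria):
    """Keep items that satisfy ALL criteria."""
    return [item for item in items
            if all(matches(item, k, v) for k, v in criteria.items())]


def filter_or(items, **criteria):
    """Keep items that satisfy ANY criterion."""
    return [item for item in items
            if any(matches(item, k, v) for k, v in criteria.items())]


tags = [
    {'name': 'abc', 'id': 'one'},
    {'name': 'xABCx', 'id': 'two'},
    {'name': 'something', 'id': 'three'},
]

exact = filter_and(tags, name='abc')
loose = filter_and(tags, name__icontains='abc', name__ne='abc')
either = filter_or(tags, name='abc', id='three')
```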

  • Add "find" function on AdvancedHTMLParser, which supports a very small subset of QueryableList (only equals, contains, and icontains) and can be used for "similar" functionality but without the QueryableList dependency, and only usable from the document level (AdvancedHTMLParser.AdvancedHTMLParser)

  • Support javascript-style assignment and access on a handful of attributes (the older ones: name, id, align, etc).
    You can now do things like: myTag.name = 'hello' to set the name, and myTag.name to access it
    (previously you had to use setAttribute and getAttribute for everything)
    The names used here match what are used in HTML, and include the javascript events

  • Fix where "className" could get out of sync with "classList"/"classNames"

  • No longer treat "classname" and "class" as the same attribute; they are in fact distinct on the attribute dict, but
    className maps to class on object-access

  • Support binary-style attribute set/access (like for "hidden" property, or "checked")

  • Support attributes conditional on tag name, like "checked" on an input

  • Change so accessing an attribute on an AdvancedTag which is not set returns None (undefined/null), instead of raising an AttributeError

  • Implement "cloneNode" function from JS DOM

  • Fix TagCollection __add__ and __sub__, which were working in place on the collection. Moved that behaviour to __iadd__ and __isub__ (for += and -=)
    and implemented __add__ and __sub__ to work on copies

  • Add "isEqualNode" JS DOM method as equivalent to the '==' operator on AdvancedTag

  • Add "contains" JS DOM method to AdvancedTag, TagCollection, and AdvancedHTMLParser

  • Implement "in" operator on Parser to check if an element (or uuid, if passed) is contained

  • Implement "hasChild" method to see if an element is a direct child of another element

  • Implement "remove" method on an AdvancedTag, to remove it from the parent.

  • Add some other minor DOM methods (childElementCount)

  • Rename on AdvancedTag "attributes" to "_attributes" in preparation of implementing DOM-style named node map

  • Add ownerDocument to Tags which point to the document (parser), if associated with one

  • Added some functions for accessing the whole of uids

  • Proper quote-escaping within attribute values. \" isn't understood across the board, but &quot; is, so switch from the former to the latter.

  • Add DOM-style "attributes" to every AdvancedTag. This follows the horrible antiquated interface that DOM
    uses for "attributes", created before getAttribute/setAttribute were standardized.
    This is always available as .attributesDOM , and the dict impl always available as .attributesDict

    By default, we will retain the "dict" impl, as the NamedNodeMap impl is deprecated.
    There's a new function, toggleAttributesDOM which will change the global .attributes property to be the DOM (official) or Dict (sane and prior) impl.

  • Some minor cleanups, doc updates, test updates, etc

  • 6.8.0
  • Add "getAllChildNodes" to tags, which returns all the children (and all their
    children, and so forth) of the given tag

  • Add "getAllNodes" to AdvancedHTMLParser.AdvancedHTMLParser - which gets the
    root nodes, all children of them, and all children all the way to the end

  • Add "getAllNodes" to TagCollection, which returns those nodes contained
    within, and all of their children (and so forth)

  • Add "find" method to AdvancedHTMLParser.AdvancedHTMLParser, which supports filtering by attr=value style, supporting
    either single values or list of values (for ANY aka or), and some specials
    ( __contains and __icontains suffixes on keys for "value contains" or
    "case-insensitive value contains"). This method is only available in one place.
    7.0.0 will have a full filter implementation on the parser, tags, and tag
    collections, but will require QueryableList to be installed. This will be
    optional, and this method will remain as an incomplete version.
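
The TagCollection operator fix in the 7.0.0 notes above (+ and - on copies, += and -= in place) follows the usual Python operator protocol. A minimal sketch, assuming a collection that is simply a list subclass; the Collection class here is hypothetical and not the library's TagCollection.

```python
class Collection(list):
    """Hypothetical list-backed collection showing copy vs in-place operators."""

    def __add__(self, other):
        # a + b: build a new collection, leave self untouched
        result = Collection(self)
        result.extend(other)
        return result

    def __sub__(self, other):
        # a - b: new collection without the items found in other
        return Collection(item for item in self if item not in other)

    def __iadd__(self, other):
        # a += b: modify self in place and return it
        self.extend(other)
        return self

    def __isub__(self, other):
        # a -= b: modify self in place and return it
        self[:] = [item for item in self if item not in other]
        return self


original = Collection(['x', 'y'])
combined = original + ['z']      # original is unchanged

alias = original
alias += ['z']                   # same object, now extended
alias -= ['x']                   # same object, 'x' removed
```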

6.7.0 - Needs no Cool Name

14 Mar 21:35

  • 6.7.0 Mar 14 2017
  • Fix camel-case vs dash names when using style attributes (so that em.style.paddingTop translates to 'padding-top')

  • Implement __repr__ on AdvancedTag

  • Fix __repr__ on StyleAttribute to include the class name

  • Make style attributes compare (__eq__ and __ne__) regardless of order

  • Allow StyleAttribute objects to be created from other StyleAttribute objects

  • Implement __eq__ and __ne__ on AdvancedTag; these do identity
    comparison (same tag equals itself ONLY).

  • Implement __copy__ and __deepcopy__ methods on StyleAttribute and
    AdvancedTag so that tags can be copied.

  • Add getAttributesList and getAttributesDict on an AdvancedTag to make a copy of a list
    of values (like for attrList on AdvancedTag constructor) or a copy of a
    dict of values.

  • Implement an isTagEqual method on AdvancedTag which compares the tag
    name and attributes to another tag (tests whether everything between
    the < and > is the same); a non-identity comparison.

  • Add tests for all changes
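
Two of the style-related items above, mapping camel-case names such as paddingTop to 'padding-top' and comparing styles regardless of order, can be sketched with the standard library alone. The functions camel_to_dash and parse_style are hypothetical and only illustrate the behaviour described; they are not StyleAttribute's code.

```python
import re


def camel_to_dash(name):
    """Convert a camel-case style name to its dashed CSS form."""
    return re.sub(r'[A-Z]', lambda m: '-' + m.group(0).lower(), name)


def parse_style(style):
    """Parse 'key: value; key: value' into a dict, ignoring empty parts."""
    result = {}
    for part in style.split(';'):
        if ':' in part:
            key, _, value = part.partition(':')
            result[key.strip()] = value.strip()
    return result


dashed = camel_to_dash('paddingTop')

# Dict equality ignores insertion order, so declaration order does not matter.
same = parse_style('color: red; padding-top: 2px') == parse_style('padding-top: 2px; color: red')
```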

6.6.4 - Jumping Tacos

27 Oct 20:58

  • 6.6.4 Oct 27 2016
  • Fix regression where "AdvancedTag.getAttribute" method would not accept a default
    (second param).
  • Fix calling ".value" on an AdvancedTag to get the "value" attribute (was
    broken by previous regression)

6.6.3

03 Oct 22:50

  • 6.6.3 Oct 03 2016
  • Fix no-value attributes not appearing in html output (like "checked" on an input). Was in attributes, but not in html output.

6.6.2 - Chili Cheese Parsing Pretzel

27 Jul 04:43

  • 6.6.2 Jul 27 2016
  • Python's HTMLParser on python3 only automatically converts charrefs
    (backwards incompatible...) -- so make it stop doing that. This allows things
    like &nbsp; and &lt; to not be converted to ' ' and '<' on python3. Added
    tests as well.
  • Cleanup imports and add comments to test cases.
  • Add fixes made to AdvancedHTMLParser in 6.6.0, relating to text outside root
    nodes into AdvancedHTMLFormatter.
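
The charref behaviour in the 6.6.2 notes above can be reproduced with the standard library's html.parser.HTMLParser, whose convert_charrefs constructor argument controls whether character references are decoded before reaching handle_data. The sketch below compares both settings; the Collector class and roundtrip helper are hypothetical, and whether AdvancedHTMLParser uses this exact switch internally is an assumption.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Re-assemble the text of a document, keeping references as written."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False, e.g. for &nbsp; or &lt;
        self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        # Only called when convert_charrefs is False, e.g. for &#60;
        self.parts.append('&#%s;' % name)


def roundtrip(html, convert_charrefs):
    parser = Collector(convert_charrefs)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


source = 'a&nbsp;b &lt; c'
preserved = roundtrip(source, convert_charrefs=False)
converted = roundtrip(source, convert_charrefs=True)
```

With conversion off the references survive untouched; with it on they arrive as a non-breaking space and a literal '<'.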

6.6.0

25 Jul 16:43

  • 6.6.0 Jul 25 2016
  • In a multiple root node scenario, make sure getHTML returns text that falls
    between the root nodes.
  • Retain text, comments, etc that occur before and after the root node
  • Update runTests.py to be latest from GoodTests -- allows providing arguments
    to GoodTests.py (by passing them to runTests.py) and removes the need for the
    symlink in the "tests" directory (which duplicates source in the dist)

6.5.1 - Mega Salty Pretzel Cannon

25 Mar 18:36

  • 6.5.1 Mar 23 2016
  • Merge in patch by "Tai Kedzierski" which fixes a typo in getElementsByAttr. Thanks!
  • Fix missing files in MANIFEST.in