Release 8.0.0 - Fix and Implement All The Things · kata198/AdvancedHTMLParser

8.0.0 - Nov 30 2017
7.4.0 - Nov 30 2017

NOTE: Was originally released as 7.4.0, but since it's such a major update, I updated the major to the "8" series.

Ensure that getAttribute, setAttribute, hasAttribute, 'key' in em.attributes, etc. all lowercase the key. This is how the standard operates.
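
The key-lowercasing behaviour can be pictured with a small standalone sketch. LowercaseKeyDict is a hypothetical name used only for illustration; it is not the library's attributes class and only assumes that every lookup path lowercases the key first.

```python
class LowercaseKeyDict(dict):
    """Hypothetical stand-in for a tag's attribute mapping (not the library's class)."""

    def __setitem__(self, key, value):
        # Store every key in lowercase, whatever case the caller used.
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())

    def get(self, key, default=None):
        return super().get(key.lower(), default)


attrs = LowercaseKeyDict()
attrs['ID'] = 'main'
```

With this sketch, 'id', 'Id' and 'ID' all refer to the same stored entry.
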
Add "src", "height", and "width" as linked attributes for "img" tags (so imgEm.height = '60px' will set height='60px' on an img). As a reminder, you can always use element.setAttribute('height', '60px') whether or not a dot-alias is set up.
Add "bgcolor" and "background" as linked attributes for "body" tag ( so bodyEm.bgcolor = 'black' will set bgcolor='black' on a body tag). Again, setAttribute/getAttribute/removeAttribute always work.
Fix pickling of AdvancedHTMLParser.AdvancedHTMLParser and AdvancedHTMLParser.AdvancedTag objects. These now work flawlessly.
Implement firstChild / firstElementChild and lastChild / lastElementChild to get the first/last child block [text or AdvancedTag] (firstChild/lastChild) or first/last child AdvancedTag (firstElementChild/lastElementChild)
Fix name of nextElementSibling and previousElementSibling (I had named them nextSiblingElement and previousSiblingElement, as those names make more sense, but they don't match the official names found in the standard). For now, we will retain the alternate names as aliases, but they may be marked deprecated in the future.
Add AdvancedTag.append (official API name) as an alias for appendBlock. It allows you to pass either a string or an AdvancedTag and adds it as a child block.
Add "innerText" property (as an alias for .text) on AdvancedTag, which will return a string representing all the text nodes which are direct children of this node. This is read-only for now, but you can use .appendText to add a new child block of text
Add "textContent" property on AdvancedTag, which will return a string of all text nodes which appear at or beneath the given node, as they would appear in the document. Basically, this is the innerHTML without any of the markup
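
The difference between direct-child text (innerText / .text as described above) and textContent can be sketched on a toy tree. The Node class and both helper functions are hypothetical and exist only for this illustration; in particular, joining the direct text nodes with an empty string is an assumption.

```python
class Node:
    """Hypothetical minimal element: children are str (text blocks) or Node objects."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def direct_text(node):
    # Only the text blocks that are direct children of this node.
    return ''.join(child for child in node.children if isinstance(child, str))


def text_content(node):
    # All text at or beneath this node, in document order, with no markup.
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        else:
            parts.append(text_content(child))
    return ''.join(parts)


div = Node('div', ['Hello ', Node('b', ['big']), ' world'])
```

Here direct_text(div) skips the text inside the nested "b" node, while text_content(div) includes it.
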
Update READMEs to list a few more methods and properties that are available. As always, the full documentation is available as pydoc at: http://htmlpreview.github.io/?https://github.com/kata198/AdvancedHTMLParser/blob/master/doc/AdvancedHTMLParser.html?vers=7.3.3 or the "doc" directory that comes with the distribution. Several usage examples can be found throughout the "tests" directory, available in the source distribution and also online at https://github.com/kata198/AdvancedHTMLParser/tree/master/tests/AdvancedHTMLParserTests
Add a lot more unit tests covering everything in this release, and some other minor tweaks/expansions of others. As of this release, there are 2 lines of unit test code for every 3 lines of library code (including extensive comments in library code)
Add attribute links to body tag for old (pre-HTML5) attributes on body, "link", "vlink", "alink". There is also "text" but that has a name conflict right now; another reason to use get/set Attribute methods instead.
Add support for attributes which are boolean through dot-access, but whose getAttribute value and HTML attribute are the string "true" or "false" (versus standard boolean attributes, which are signified by present-vs-not, such as "checked"). One example is "spellcheck".
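
A minimal sketch of the two boolean styles described above, assuming a simple renderer. PRESENCE_BOOLEANS, TRUE_FALSE_STRINGS and render_attribute are hypothetical names for this illustration and do not reflect the library's internal tables.

```python
# Illustrative sets only; the library's real attribute tables are much larger.
PRESENCE_BOOLEANS = {'checked'}
TRUE_FALSE_STRINGS = {'spellcheck'}


def render_attribute(name, value):
    """Return the HTML text for one attribute given its dot-access value."""
    if name in PRESENCE_BOOLEANS:
        # Present when true, absent when false.
        return name if value else ''
    if name in TRUE_FALSE_STRINGS:
        # Always present, carrying the string "true" or "false".
        return '%s="%s"' % (name, 'true' if value else 'false')
    return '%s="%s"' % (name, value)
```
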
Add "spellcheck" special attribute, which is a global (Valid for all tags).
Fix a bug where if you do "myTag.tabIndex = 'blah'" (or assign any other non-integer), Firefox sets tabIndex to "0", whereas we used an "invalid" value of -1. We now match Firefox and use 0.
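
The fallback can be sketched as follows. convert_tab_index is a hypothetical helper; only the "non-integer becomes 0" rule comes from the note above.

```python
def convert_tab_index(value):
    """Convert an assigned tabIndex value to an int, falling back to 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Non-integer input: use 0 rather than the former -1.
        return 0
```
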
Fix where "select" and "option" were not inheriting all the attribute links from "input" base
Add "spellcheck" global attribute on all tags
Major - Add all the tag-specific attribute links as defined by w3 for all HTML4 and HTML5 tags and attributes (for example, "noWrap" on td tag, myTdTag.noWrap = True will add "nowrap" to the html representation, myTdTag.colspan = 2 will add "colspan=2" to the html representation)
Support more special conversion of values from dot-access to html attribute syntax
Stop stripping non-binary attributes from the html representation when they have an empty value, since ="" can have a different meaning than an absent attribute, i.e. autocomplete should show up as ' autocomplete="" ' when given an empty string value.
Implement special conversions for 'crossOrigin' attribute of images and link tags.
Implement special conversions for 'autocomplete' attribute on input and forms
Add "encoding" as an alias for "enctype" on form. This is an extension which at least Firefox implements, though it is not in the w3 standard.
Implement AdvancedTag.getParentElementCustomFunction which takes a lambda and returns the first parent of given node which matches (returns True)
Implement "form" attribute on several tags (such as 'input' and 'button') which returns the form to which that element belongs (parent form), or None if not within a form tag.
Implement "colSpan" attribute on 'td' to be a clamped value from 1 to 1000 (Firefox limits)
Implement "rowSpan" attribute on 'td' to be a clamped value from 0 to 65534 (Firefox limits)
Implement clamping on "col" tag attribute "span" (colEm.span) between 1 and 1000 (Firefox limits)
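
The clamping ranges above can be sketched like this. The function names are hypothetical, and how non-integer input is handled is not covered here; only the numeric limits come from the notes.

```python
def clamp(value, minimum, maximum):
    """Limit value to the inclusive range [minimum, maximum]."""
    return max(minimum, min(maximum, value))


def clamp_col_span(value):
    # colSpan on 'td' (and span on 'col'): 1 to 1000
    return clamp(int(value), 1, 1000)


def clamp_row_span(value):
    # rowSpan on 'td': 0 to 65534
    return clamp(int(value), 0, 65534)
```
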
Implement AdvancedTag.getPeersCustomFilter, which takes a lambda/function that gets passed an element (each "peer" of this node) and returns True if match. Returns all matching peer nodes
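
The idea behind the two custom-function lookups in this release (getParentElementCustomFunction and getPeersCustomFilter) can be sketched on a toy tree. Node, first_matching_parent and matching_peers are hypothetical names; this is an illustration of the described behaviour, not the library's code.

```python
class Node:
    """Hypothetical minimal element with a parent pointer and child list."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def first_matching_parent(node, match):
    # Walk upward and return the first ancestor for which match() is True.
    current = node.parent
    while current is not None:
        if match(current):
            return current
        current = current.parent
    return None


def matching_peers(node, match):
    # Peers are the other children of the same parent.
    if node.parent is None:
        return []
    return [peer for peer in node.parent.children
            if peer is not node and match(peer)]


form = Node('form')
div = Node('div', form)
text_input = Node('input', div)
button = Node('button', div)
```

The same upward walk is one way an element's "form" attribute could locate its parent form, returning None when there is none.
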
Handle 'cols' and 'rows' special attributes on textarea with their defined behaviour and defaults, whilst retaining the "stringy" implementation on those same attribute names if found on a 'fieldset' tag.
Add a DOMTokenList implementation. This behaves like a list, but can be constructed from a string (by stripping whitespace so that just distinct words remain, and using those as the elements). Also, str() a DOMTokenList ( .toString() in javascript terminology ) will join by " ".
Change AdvancedTag.classList to return a DOMTokenList instead of a regular list. Stringing it will give the className now, same as in javascript.
Add "sandbox" attribute of an iframe, as a DOMTokenList special attribute
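
The DOMTokenList behaviour described above can be approximated in a few lines. TokenList is a hypothetical class written for this illustration, not the library's implementation; it assumes whitespace splitting and does not remove repeated words.

```python
class TokenList(list):
    """Hypothetical list that can be built from a whitespace-separated string."""

    def __init__(self, values=()):
        if isinstance(values, str):
            # split() with no arguments drops leading, trailing and repeated whitespace.
            values = values.split()
        super().__init__(values)

    def __str__(self):
        # Same idea as .toString() in javascript: join the tokens with one space.
        return ' '.join(self)


tokens = TokenList('  classOne   classTwo ')
```

Because it subclasses list, the usual list operations (append, indexing, iteration) keep working.
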
Add special conversions for the "kind" attribute on a "track" tag
Lots of additional tests and other improvements

7.3.1 - Nov 21 2017

Update str(AdvancedTag) to give the real HTML representation (start tag,
inner html, and end tag) versus the former implementation, which was: start tag, joined direct-child text nodes only, end tag

The old method is still available as _old__str__, and you can revert to old
behaviour (why would you want to?) by doing AdvancedTag.__str__ =
AdvancedTag._old__str__
Add toHTML/getHTML/asHTML methods to AdvancedTag which also return an HTML
document starting at that node
Minor updates to READMEs
Measurable performance improvements, especially in tags
Improve removeAttribute / del tag.attributes['whatever'] to perform the
special handling / linkage required when dealing with the "style" or "class" attribute
Merge in latest distrib/runTests.py from GoodTests. Among improvements for installation / ensuring GoodTests.py is present, it now supports easily testing python2 vs python3 just by invoking runTests.py with one or the other (i.e. "python2 ./runTests.py" executes the test suite in python2, whereas "python3 ./runTests.py" executes it in python3). Previously, to do this reliably you'd have to use virtualenvs.
AdvancedTag.classList now returns a COPY, so changes don't flow back to the associated tag (just like in JS api).
Unify tag classes to be stored in a single location (the private _classNames list); wherever the class is used, it is generated from that. Previously it was stored both as a string in the attributes dict and in the _classNames list, which caused strangeness and opened the potential for disconnects. I don't much care for this hacked impl: 'class' needs to show up in or be absent from attributes, as well as work with setAttribute and removeAttribute, so it is hacked on via interception. It also digs a slightly deeper hole for attributes to be standalone (they already require a tag to work properly). Hopefully I can come up with a better impl soon, but this at least meets the goals.
Fix nextSibling and nextSiblingElement, which would throw an exception (instead of returning None) when called on the last element in a set of peers. Also fix the typo in the tests that was preventing them from running; they would otherwise have caught this.
Remove stray pdb.set_trace() which was live in the .find method (Alternative for ORM-style filtering without having QueryableList dependency installed)
Fix up addClass / removeClass and anywhere else the class/className attribute can be inserted or modified, and ensure we properly strip the value (remove all leading and trailing whitespace, and reduce any in-between-words whitespace to a single space)
Allow addClass/removeClass to handle adding and removing multiple classes at once, e.g. tag.addClass('classOne classTwo') would add both classOne and classTwo
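
The whitespace handling and multi-class add described above can be sketched as two small helpers. Both function names are hypothetical, and skipping a class that is already present is an assumption of this sketch rather than something the notes state.

```python
def normalize_class_string(value):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return ' '.join(value.split())


def add_classes(class_names, new_value):
    """Add every whitespace-separated name in new_value to the class_names list."""
    for name in new_value.split():
        # Assumption: a class already on the tag is not added a second time.
        if name not in class_names:
            class_names.append(name)
    return class_names
```
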
Ensure that the "style" attribute is always linked between the tag and attributes, and that calling .setAttribute('style', '') for example doesn't leave an empty "style" attribute in html representation
Many additional unit tests

7.3.0 - Nov 19 2017

Lots of fixups to "style" property:
- Fix where the "style" attribute wouldn't show up in HTML if the tag
  was created without one, and only the form "myTag.style.someProperty =
  'value'" was used.
- Fix where empty style="" would be on HTML if all the values removed
  via dot-access
- Fix where we weren't doing a copy with "tag1.style = tag2.style", and
  thus any changes to either style would affect both tags
- Fix issue where setting style to empty string twice would cause it to
  lose the special StyleAttribute type
- Some performance improvements when dealing with style
- Add "isEmpty" method to check if a style is empty (has no values
  set). For now we do a dict comparison between the two styles (or
  convert a string to a StyleAttribute map and then perform the test),
  whereas Javascript does an identity comparison (so even the same style
  on different tags don't equal each other, and with tag1.style =
  "font-weight:bold", tag1.style == "font-weight: bold" would be False in
  javascript but we return True). This may change in the future.
- Update the "equals" on a style to be able to compare against a string
  (see note above on how this differs from the javascript API), so
  myTag.style == "display: block" will now work as expected.
- Fix issue where assigning a tag's style to itself would cause a
  disconnect with the HTML attribute value, i.e. myTag.style =
  myTag.style
- Properly handle other misc. situations with style that were being
  handled wrong before
- Ensure we always link the HTML-displayed attribute to the underlying
  style object attribute
- Add several more tests to style, many of which fail on 7.2 but now
  pass with these changes
- When removing a style from a tag ( like myTag.style = somethingElse ), make sure we remove the association
  between the old style and the tag to prevent updates on the old style from affecting the former tag
- Add some comments to the style section
- Add "setProperty" method per JS api, which is a function call to set
  (name, value) for a style property, or provide value='' or value=null to
  remove that property
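
The dict-based style comparison mentioned in the list above can be illustrated with a standalone sketch. parse_style and styles_equal are hypothetical helpers, and the parsing is deliberately simplified (it does not handle values that contain ";").

```python
def parse_style(style_string):
    """Parse 'name: value; name2: value2' into a dict, ignoring empty pieces."""
    result = {}
    for declaration in style_string.split(';'):
        if ':' not in declaration:
            continue
        name, value = declaration.split(':', 1)
        name = name.strip()
        value = value.strip()
        if name and value:
            result[name] = value
    return result


def styles_equal(left, right):
    # Compare by parsed content, so spacing differences do not matter.
    return parse_style(left) == parse_style(right)
```

Under this comparison, "font-weight:bold" and "font-weight: bold" are treated as equal, which matches the behaviour the notes contrast with javascript's identity comparison.
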
Addition of a lot of comments throughout code
Change the "style" (StyleAttribute) attribute's backref on a tag to that tag into a weak
reference, which removes a circular reference.
Change the "_attribute" ( SpecialAttributesDict ) attribute's backref on a tag to that tag into a weak
reference, which removes a circular reference.
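
A minimal sketch of the weak back-reference pattern described above, using the standard weakref module. The Style and Tag classes here are hypothetical stand-ins, not the library's StyleAttribute or AdvancedTag.

```python
import gc
import weakref


class Style:
    """Hypothetical style object that remembers its owning tag weakly."""

    def __init__(self, tag):
        self._tag_ref = weakref.ref(tag)

    @property
    def tag(self):
        # Returns the owning tag, or None once the tag has been collected.
        return self._tag_ref()


class Tag:
    """Hypothetical tag; the only strong reference points from tag to style."""

    def __init__(self):
        self.style = Style(self)


tag = Tag()
style = tag.style
owner_while_alive = style.tag is tag

del tag
gc.collect()  # not needed on CPython, but harmless on other interpreters
owner_after_delete = style.tag
```

Because the style holds no strong reference back to the tag, dropping the last reference to the tag frees it without waiting for the cycle collector.
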
Additions and modification to pydoc documentation
Minor cleanups / improvements
