Selected patches from Calibre #245
| This seems to have broken 2.6 badly. Huh. | 
| Current coverage is 89.15%

@@             master       #245   diff @@
==========================================
  Files            51         50     -1
  Lines          6817       6726    -91
  Methods           0          0
  Messages          0          0
  Branches       1316       1307     -9
==========================================
- Hits           6172       5996   -176
- Misses          485        559    +74
- Partials        160        171    +11
 |
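The headline figure can be recomputed from the Hits/Misses/Partials rows of the report. This is a minimal Python sketch, assuming Codecov's usual definition in which partially covered lines count against coverage (hits / (hits + misses + partials)). It reproduces the reported 89.15% for #245 and puts master at roughly 90.54%, a drop of about 1.4 points.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Percentage of tracked lines that are fully hit.

    Assumes Codecov's usual definition, where partial lines are not
    counted as covered: hits / (hits + misses + partials).
    """
    total = hits + misses + partials
    return 100.0 * hits / total


# Figures copied from the coverage report in this thread.
before = coverage_pct(hits=6172, misses=485, partials=160)  # master, 6817 lines
after = coverage_pct(hits=5996, misses=559, partials=171)   # #245, 6726 lines

print(f"master: {before:.2f}%  #245: {after:.2f}%  diff: {after - before:+.2f}")
```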
| Oh, right, this is  | 
| How do you suggest I override the application of attributes for html and body tags in my builder? Without those patches, it would require overriding the entire getPhases() method in html5parser.py. Remember that the problem those patches are solving is that there can be multiple  If you don't want to merge gsnedders/html5lib-python@a2d2e05, then how do you suggest I replace the stream input class? The one in html5lib is too slow. The only alternative I can see is monkey patching, which is less than optimal for obvious reasons. |
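One way to avoid overriding getPhases() is for the parser phase to delegate the attribute merge for a repeated html/body start tag to an overridable builder method. The following is a minimal sketch only. Element, TreeBuilder, merge_root_attributes, OverwritingBuilder and handle_duplicate_body are hypothetical names, not html5lib's actual API. The default method follows the HTML5 rule (only add attributes the existing element lacks), and the subclass shows an arbitrary alternative policy.

```python
from typing import Dict


class Element:
    """Minimal stand-in for a tree builder's element node (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, str] = {}


class TreeBuilder:
    """Hypothetical builder base class; not html5lib's actual API."""

    def merge_root_attributes(self, element: Element, attrs: Dict[str, str]) -> None:
        # HTML5 rule for a repeated <html> or <body> start tag: add each
        # attribute only if the existing element does not already have it.
        for name, value in attrs.items():
            if name not in element.attributes:
                element.attributes[name] = value


class OverwritingBuilder(TreeBuilder):
    """Illustrative override: attributes on later duplicate tags win."""

    def merge_root_attributes(self, element: Element, attrs: Dict[str, str]) -> None:
        element.attributes.update(attrs)


def handle_duplicate_body(builder: TreeBuilder, body: Element, attrs: Dict[str, str]) -> None:
    # The parser phase only delegates; the merge policy lives in the builder,
    # so a downstream builder can change it without touching the phases.
    builder.merge_root_attributes(body, attrs)
```

With the default builder, a second `<body class="b" id="x">` only contributes `id`; the overriding builder would also replace `class`.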
| @kovidgoyal I'll take a look at dealing with html/body attributes later (I'm literally about to board a plane). When it comes to the input stream, if it yields good performance gains when given a byte/unicode object, we should just specialise them in html5lib. |
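"Specialising" here could mean dispatching on the input type up front, so that a str is used as-is while bytes and file-like objects take slower, more general paths. A minimal sketch, where to_text is a hypothetical helper (not html5lib's API) and the bytes branch is deliberately simplified to a fixed encoding rather than real HTML encoding sniffing:

```python
import io
from typing import IO, Union


def to_text(source: Union[str, bytes, bytearray, IO], encoding: str = "utf-8") -> str:
    """Return the document as str, choosing a path based on the input type.

    Hypothetical helper for illustration; not html5lib's API.
    """
    if isinstance(source, str):
        # Fast path: already unicode, so no io.StringIO wrapper and no copy.
        return source
    if isinstance(source, (bytes, bytearray)):
        # Simplification: a real HTML parser sniffs the encoding (BOM,
        # <meta charset>, ...) instead of trusting a fixed default.
        return bytes(source).decode(encoding)
    # Generic fallback for file-like objects in text or binary mode.
    data = source.read()
    return data if isinstance(data, str) else data.decode(encoding)


if __name__ == "__main__":
    print(to_text("<p>already unicode</p>"))
    print(to_text(io.BytesIO(b"<p>caf\xc3\xa9</p>")))
```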
| Sure, you are welcome to take the input stream class from calibre for dealing with unicode objects. It is faster because it avoids wrapping the unicode in StringIO, and it actually implements position tracking. For my use case that is important, since I need line and column numbers. |
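The two properties described above, reading straight from a str and tracking line/column as characters are consumed, can be sketched as follows. This is an illustration under assumptions, not calibre's class: UnicodeStream, char, chars_until and position are hypothetical names, lines are 1-based and columns 0-based, and newlines are assumed to be already normalised to "\n". A cached regex per stop-set lets chars_until consume a whole run in one C-level match.

```python
import re
from functools import lru_cache
from typing import Optional, Tuple


@lru_cache(maxsize=None)
def _until(stop: str):
    # Matches the longest run of characters not in `stop` (stop must be non-empty).
    return re.compile("[^" + re.escape(stop) + "]*")


class UnicodeStream:
    """Sketch of a str-backed input stream with line/column tracking.

    Hypothetical class; method names are not taken from calibre or html5lib.
    Assumes newlines are already normalised to "\\n".
    """

    def __init__(self, text: str) -> None:
        self._text = text  # read directly; no io.StringIO wrapper
        self._pos = 0      # index of the next unread character
        self._line = 1     # 1-based line of the next unread character
        self._col = 0      # 0-based column of the next unread character

    def _advance(self, chunk: str) -> None:
        # Update line/column for the characters just consumed.
        newlines = chunk.count("\n")
        if newlines:
            self._line += newlines
            self._col = len(chunk) - chunk.rfind("\n") - 1
        else:
            self._col += len(chunk)

    def char(self) -> Optional[str]:
        """Consume and return one character, or None at end of input."""
        if self._pos >= len(self._text):
            return None
        c = self._text[self._pos]
        self._pos += 1
        self._advance(c)
        return c

    def chars_until(self, stop: str) -> str:
        """Consume and return characters up to (not including) any in `stop`."""
        m = _until(stop).match(self._text, self._pos)
        chunk = m.group()
        self._pos = m.end()
        self._advance(chunk)
        return chunk

    def position(self) -> Tuple[int, int]:
        return self._line, self._col
```

Tracking incrementally in `_advance` keeps `position()` O(1), rather than recounting newlines from the start of the document on every call.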
82b971c to 1f04a3f
76bf242 to 761f3ab
See #119. CC @kovidgoyal.
This cherry-picks a few things from https://github.com/gsnedders/html5lib-python/commits/calibre-patches, which was a complete set of Calibre's patches from November 2013. https://github.com/kovidgoyal/calibre/commits/master/src/html5lib has very little changed in it since then, primarily a move to 0.999999-dev and a separate downstream fix for 0c551c9.
So, of those on that branch…
True/False cases it's likely slower, therefore failing at its stated goal: it results in more bytecode, and POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE already special-case the condition being True or False (oddly, they don't specialise None, though it is special-cased in PyObject_IsTrue; if that makes any notable performance difference, I'd suggest fixing that in CPython).
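The "more bytecode" point can be checked by disassembling a plain truth test next to an explicit comparison. This sketch assumes, for illustration only, that the change in question replaced `if flag:` with `if flag is True:`; the two are only equivalent when flag is always a bool. The exact opcodes differ between CPython versions, so the sketch only compares instruction counts rather than asserting a specific listing.

```python
import dis


def implicit(flag):
    if flag:             # the conditional jump performs the truth test itself
        return 1
    return 0


def explicit(flag):
    if flag is True:     # extra constant load plus an identity comparison
        return 1
    return 0


def instruction_count(func) -> int:
    """Number of bytecode instructions CPython generated for func."""
    return sum(1 for _ in dis.get_instructions(func))


if __name__ == "__main__":
    dis.dis(implicit)
    dis.dis(explicit)
    print(instruction_count(implicit), instruction_count(explicit))
```

For actual timings, `timeit` on both functions would be the next step; instruction count alone does not prove a speed difference.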