This is cleaner than using the verbose XPath trick shown above. Just remember
to start the XPath expressions that follow with ``.`` so that they are evaluated
relative to the elements selected by the CSS query.

Beware of how script and style tags differ from other tags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Following the standard`__, the contents of ``script`` and ``style`` elements
are parsed as plain text.

__ https://www.w3.org/TR/html401/types.html#type-cdata

This means that any markup-like content found within them, including comments
and tags such as ``<br/>``, is treated as part of the element text rather than
as separate nodes.

For example::

    >>> from parsel import Selector
    >>> selector = Selector(text="""
    ...     <script>
    ...         text
    ...         <!-- comment -->
    ...         <br/>
    ...     </script>
    ...     <style>
    ...         text
    ...         <!-- comment -->
    ...         <br/>
    ...     </style>
    ...     <div>
    ...         text
    ...         <!-- comment -->
    ...         <br/>
    ...     </div>""")
    >>> for tag in selector.xpath('//*[contains(text(), "text")]'):
    ...     print(tag.xpath('name()').get())
    ...     print('    Text: ' + (tag.xpath('text()').get() or ''))
    ...     print('    Comment: ' + (tag.xpath('comment()').get() or ''))
    ...     print('    Children: ' + ''.join(tag.xpath('*').getall()))
    ...
    script
        Text:
            text
            <!-- comment -->
            <br/>

        Comment:
        Children:
    style
        Text:
            text
            <!-- comment -->
            <br/>

        Comment:
        Children:
    div
        Text:
            text

        Comment: <!-- comment -->
        Children: <br>

.. _old-extraction-api:

extract() and extract_first()