@@ -824,26 +824,31 @@ namespaces altogether and just work with element names, to write more
 simple/convenient XPaths. You can use the
 :meth:`Selector.remove_namespaces` method for that.
 
-Let's show an example that illustrates this with Github blog atom feed.
+Let's show an example that illustrates this with the Python Insider blog atom feed.
 
 Let's download the atom feed using `requests`_ and create a selector::
 
     >>> import requests
     >>> from parsel import Selector
-    >>> text = requests.get('https://github.com/blog.atom').text
+    >>> text = requests.get('https://feeds.feedburner.com/PythonInsider').text
     >>> sel = Selector(text=text, type='xml')
 
 This is how the file starts::
 
     <?xml version="1.0" encoding="UTF-8"?>
-    <feed xml:lang="en-US"
-          xmlns="http://www.w3.org/2005/Atom"
-          xmlns:media="http://search.yahoo.com/mrss/">
-      <id>tag:github.com,2008:/blog</id>
+    <?xml-stylesheet ...
+    <feed xmlns="http://www.w3.org/2005/Atom"
+          xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
+          xmlns:blogger="http://schemas.google.com/blogger/2008"
+          xmlns:georss="http://www.georss.org/georss"
+          xmlns:gd="http://schemas.google.com/g/2005"
+          xmlns:thr="http://purl.org/syndication/thread/1.0"
+          xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
     ...
 
-You can see two namespace declarations: a default "http://www.w3.org/2005/Atom"
-and another one using the "media:" prefix for "http://search.yahoo.com/mrss/".
+You can see several namespace declarations, including a default one for
+"http://www.w3.org/2005/Atom" and another one using the "gd:" prefix for
+"http://schemas.google.com/g/2005".
 
 We can try selecting all ``<link>`` objects and then see that it doesn't work
 (because the Atom XML namespace is obfuscating those nodes)::
@@ -856,8 +861,8 @@ nodes can be accessed directly by their names::
 
     >>> sel.remove_namespaces()
     >>> sel.xpath("//link")
-    [<Selector xpath='//link' data='<link xmlns="http://www.w3.org/2005/Atom'>,
-     <Selector xpath='//link' data='<link xmlns="http://www.w3.org/2005/Atom'>,
+    [<Selector xpath='//link' data='<link rel="alternate" type="text/html" h'>,
+     <Selector xpath='//link' data='<link rel="next" type="application/atom+'>,
     ...
 
 If you wonder why the namespace removal procedure isn't called always by default
@@ -883,11 +888,11 @@ Ad-hoc namespaces references
 references along with the query, through a ``namespaces`` argument,
 with the prefixes you declare being used in your XPath or CSS query.
 
-Let's use the same Atom feed from Github::
+Let's use the same Python Insider Atom feed::
 
     >>> import requests
     >>> from parsel import Selector
-    >>> text = requests.get('https://github.com/blog.atom').text
+    >>> text = requests.get('https://feeds.feedburner.com/PythonInsider').text
     >>> sel = Selector(text=text, type='xml')
 
 And try to select the links again, now using an "atom:" prefix
@@ -900,11 +905,11 @@ for the "link" node test::
 
 You can pass several namespaces (here we're using shorter 1-letter prefixes)::
 
-    >>> sel.xpath("//a:entry/m:thumbnail/@url",
-    ...           namespaces={"a": "http://www.w3.org/2005/Atom",
-    ...                       "m": "http://search.yahoo.com/mrss/"}).getall()
-    ['https://avatars1.githubusercontent.com/u/11529908?v=3&s=60',
-     'https://avatars0.githubusercontent.com/u/15114852?v=3&s=60',
+    >>> sel.xpath("//a:entry/a:author/g:image/@src",
+    ...           namespaces={"a": "http://www.w3.org/2005/Atom",
+    ...                       "g": "http://schemas.google.com/g/2005"}).getall()
+    ['http://photos1.blogger.com/blogger/4554/1119/400/beethoven_10.jpg',
+     '//lh3.googleusercontent.com/-7xisiK0EArc/AAAAAAAAAAI/AAAAAAAAAuM/-r6o6A8RKCM/s512-c/photo.jpg',
     ...
 
 
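The two strategies touched by this change, stripping namespaces versus passing an explicit prefix map, can be tried offline without fetching the live feed. The sketch below swaps parsel for the standard library's `xml.etree.ElementTree` so it runs with no third-party install. The feed content is a made-up sample (hypothetical author names and example.com URLs), and `strip_namespaces` and `author_images` are hypothetical helpers: `strip_namespaces` only roughly mimics `Selector.remove_namespaces` by dropping the `{uri}` part of element tags.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GD = "http://schemas.google.com/g/2005"

# Made-up sample feed (hypothetical data, not the real Python Insider feed)
# that reuses the default Atom namespace and the "gd:" prefix.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <link rel="alternate" type="text/html" href="https://example.com/"/>
  <entry>
    <author>
      <name>Alice</name>
      <gd:image src="https://example.com/alice.jpg"/>
    </author>
  </entry>
  <entry>
    <author>
      <name>Bob</name>
      <gd:image src="https://example.com/bob.jpg"/>
    </author>
  </entry>
</feed>"""


def author_images(root):
    """Hypothetical helper: explicit prefix map, similar in spirit to
    passing ``namespaces=`` to Selector.xpath()."""
    ns = {"a": ATOM, "g": GD}
    # findall() supports only a small XPath subset, so @src is read via .get().
    return [img.get("src") for img in root.findall("a:entry/a:author/g:image", ns)]


def strip_namespaces(root):
    """Hypothetical helper: a rough analogue of Selector.remove_namespaces().

    Drops the "{uri}" part of every element tag in place; namespaced
    attributes are left untouched in this sketch.
    """
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


if __name__ == "__main__":
    root = ET.fromstring(FEED)
    print(author_images(root))
    # ['https://example.com/alice.jpg', 'https://example.com/bob.jpg']
    print(root.findall(".//link"))  # [] because <link> is in the Atom namespace
    strip_namespaces(root)
    print([link.get("rel") for link in root.findall(".//link")])  # ['alternate']
```

ElementTree's `findall` understands only a limited XPath subset, so the `@src` step from the parsel query becomes a `.get("src")` call here. Note the order in the demo: the prefix-map query runs before stripping, because once the tags lose their `{uri}` part the namespaced path no longer matches.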