Commit d422fa5

Merge pull request #119 from wRAR/master
Replace a removed feed used as an example in the Usage docs.
2 parents b20b537 + 0505c05 commit d422fa5

1 file changed

+22
-17
lines changed

docs/usage.rst

Lines changed: 22 additions & 17 deletions
@@ -824,26 +824,31 @@ namespaces altogether and just work with element names, to write more
 simple/convenient XPaths. You can use the
 :meth:`Selector.remove_namespaces` method for that.
 
-Let's show an example that illustrates this with Github blog atom feed.
+Let's show an example that illustrates this with the Python Insider blog atom feed.
 
 Let's download the atom feed using `requests`_ and create a selector::
 
 >>> import requests
 >>> from parsel import Selector
->>> text = requests.get('https://github.com/blog.atom').text
+>>> text = requests.get('https://feeds.feedburner.com/PythonInsider').text
 >>> sel = Selector(text=text, type='xml')
 
 This is how the file starts::
 
 <?xml version="1.0" encoding="UTF-8"?>
-<feed xml:lang="en-US"
-xmlns="http://www.w3.org/2005/Atom"
-xmlns:media="http://search.yahoo.com/mrss/">
-<id>tag:github.com,2008:/blog</id>
+<?xml-stylesheet ...
+<feed xmlns="http://www.w3.org/2005/Atom"
+xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
+xmlns:blogger="http://schemas.google.com/blogger/2008"
+xmlns:georss="http://www.georss.org/georss"
+xmlns:gd="http://schemas.google.com/g/2005"
+xmlns:thr="http://purl.org/syndication/thread/1.0"
+xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
 ...
 
-You can see two namespace declarations: a default "http://www.w3.org/2005/Atom"
-and another one using the "media:" prefix for "http://search.yahoo.com/mrss/".
+You can see several namespace declarations including a default
+"http://www.w3.org/2005/Atom" and another one using the "gd:" prefix for
+"http://schemas.google.com/g/2005".
 
 We can try selecting all ``<link>`` objects and then see that it doesn't work
 (because the Atom XML namespace is obfuscating those nodes)::
@@ -856,8 +861,8 @@ nodes can be accessed directly by their names::
 
 >>> sel.remove_namespaces()
 >>> sel.xpath("//link")
-[<Selector xpath='//link' data='<link xmlns="http://www.w3.org/2005/Atom'>,
-<Selector xpath='//link' data='<link xmlns="http://www.w3.org/2005/Atom'>,
+[<Selector xpath='//link' data='<link rel="alternate" type="text/html" h'>,
+<Selector xpath='//link' data='<link rel="next" type="application/atom+'>,
 ...
 
 If you wonder why the namespace removal procedure isn't called always by default
@@ -883,11 +888,11 @@ Ad-hoc namespaces references
 references along with the query, through a ``namespaces`` argument,
 with the prefixes you declare being used in your XPath or CSS query.
 
-Let's use the same Atom feed from Github::
+Let's use the same Python Insider Atom feed::
 
 >>> import requests
 >>> from parsel import Selector
->>> text = requests.get('https://github.com/blog.atom').text
+>>> text = requests.get('https://feeds.feedburner.com/PythonInsider').text
 >>> sel = Selector(text=text, type='xml')
 
 And try to select the links again, now using an "atom:" prefix
@@ -900,11 +905,11 @@ for the "link" node test::
 
 You can pass several namespaces (here we're using shorter 1-letter prefixes)::
 
->>> sel.xpath("//a:entry/m:thumbnail/@url",
-... namespaces={"a": "http://www.w3.org/2005/Atom",
-... "m": "http://search.yahoo.com/mrss/"}).getall()
-['https://avatars1.githubusercontent.com/u/11529908?v=3&s=60',
-'https://avatars0.githubusercontent.com/u/15114852?v=3&s=60',
+>>> sel.xpath("//a:entry/a:author/g:image/@src",
+... namespaces={"a": "http://www.w3.org/2005/Atom",
+... "g": "http://schemas.google.com/g/2005"}).getall()
+['http://photos1.blogger.com/blogger/4554/1119/400/beethoven_10.jpg',
+'//lh3.googleusercontent.com/-7xisiK0EArc/AAAAAAAAAAI/AAAAAAAAAuM/-r6o6A8RKCM/s512-c/photo.jpg',
 ...
 

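Reviewer note: the prefix-to-URI mapping documented in the last hunk can be tried offline, without fetching the live feed. The following is a minimal sketch and not parsel itself: it uses the standard library's xml.etree.ElementTree as a stand-in, and the sample feed, its example.com URLs, and all variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down Atom document; NOT the real Python Insider feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <link rel="alternate" type="text/html" href="https://example.com/"/>
  <entry>
    <link rel="alternate" type="text/html" href="https://example.com/post"/>
    <author>
      <name>Example Author</name>
      <gd:image src="https://example.com/photo.jpg"/>
    </author>
  </entry>
</feed>
"""

# Ad-hoc prefixes chosen by the caller; they need not match the prefixes
# used inside the document, only the namespace URIs must match.
NAMESPACES = {
    "a": "http://www.w3.org/2005/Atom",
    "g": "http://schemas.google.com/g/2005",
}

root = ET.fromstring(SAMPLE_FEED)

# Without a prefix the query looks for <link> in no namespace and finds nothing.
plain_links = root.findall(".//link")

# With the "a" prefix bound to the Atom namespace, both <link> elements match.
atom_links = root.findall(".//a:link", NAMESPACES)

# Several prefixes can be combined in one path. ElementTree has no "/@src"
# step, so the attribute is read from each matched element instead.
image_sources = [
    image.get("src")
    for image in root.findall(".//a:entry/a:author/g:image", NAMESPACES)
]

print(len(plain_links), len(atom_links), image_sources)
```

The prefixes in the mapping are local to the query, which is why the sketch can use "g" even though the sample document declares the same URI as "gd".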