Commit 1adf454

DOC explain new .attrib property (a follow-up to #107).
1 parent 3c94d7b commit 1adf454

1 file changed, +77 -6 lines changed
docs/usage.rst

Lines changed: 77 additions & 6 deletions
@@ -101,6 +101,28 @@ selectors. This API can be used for quickly selecting nested data::
      'image4_thumb.jpg',
      'image5_thumb.jpg']
 
+Instead of using the '@src' XPath, it is possible to query for attributes using
+the ``.attrib`` property of a :class:`~parsel.selector.Selector`::
+
+    >>> [img.attrib['src'] for img in selector.css('img')]
+    ['image1_thumb.jpg',
+     'image2_thumb.jpg',
+     'image3_thumb.jpg',
+     'image4_thumb.jpg',
+     'image5_thumb.jpg']
+
+As a shortcut, ``.attrib`` is also available on SelectorList directly;
+it returns attributes for the first matching element::
+
+    >>> selector.css('img').attrib['src']
+    'image1_thumb.jpg'
+
+This is most useful when only a single result is expected, e.g. when selecting
+by id, or selecting unique elements on a web page::
+
+    >>> selector.css('base').attrib['href']
+    'http://example.com/'
+
 To actually extract the textual data, you must call the selector ``.extract()``
 method, as follows::
 
@@ -132,6 +154,9 @@ Now we're going to get the base URL and some image links::
     >>> selector.css('base::attr(href)').extract()
     ['http://example.com/']
 
+    >>> selector.css('base').attrib['href']
+    'http://example.com/'
+
     >>> selector.xpath('//a[contains(@href, "image")]/@href').extract()
     ['image1.html',
      'image2.html',
@@ -215,6 +240,9 @@ Examples:
     make much sense: text nodes do not have attributes, and attribute values
     are string values already and do not have children nodes.
 
+.. note::
+    See also: :ref:`selecting-attributes`.
+
 
 .. _CSS Selectors: https://www.w3.org/TR/css3-selectors/#selectors
 
@@ -237,13 +265,56 @@ too. Here's an example::
 
     >>> for index, link in enumerate(links):
     ...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
-    ...     print 'Link number %d points to url %s and image %s' % args
+    ...     print('Link number %d points to url %s and image %s' % args)
+
+    Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
+    Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
+    Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
+    Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
+    Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']
+
+.. _selecting-attributes:
+
+Selecting element attributes
+----------------------------
+
+There are several ways to get the value of an attribute. First, one can use
+XPath syntax::
+
+    >>> selector.xpath("//a/@href").extract()
+    ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
+
+XPath syntax has a few advantages: it is a standard XPath feature, and
+``@attributes`` can be used in other parts of an XPath expression - e.g.
+it is possible to filter by attribute value.
+
+parsel also provides an extension to CSS selectors (``::attr(...)``)
+which allows getting attribute values::
+
+    >>> selector.css('a::attr(href)').extract()
+    ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
+
+In addition to that, there is a ``.attrib`` property of Selector.
+You can use it if you prefer to look up attributes in Python
+code, without using XPath or CSS extensions::
+
+    >>> [a.attrib['href'] for a in selector.css('a')]
+    ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
+
+This property is also available on SelectorList; it returns a dictionary
+with attributes of the first matching element. It is convenient to use when
+a selector is expected to give a single result (e.g. when selecting by element
+ID, or when selecting a unique element on a page)::
+
+    >>> selector.css('base').attrib
+    {'href': 'http://example.com/'}
+    >>> selector.css('base').attrib['href']
+    'http://example.com/'
+
+The ``.attrib`` property of an empty SelectorList is empty::
 
-    Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
-    Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg']
-    Link number 2 points to url [u'image3.html'] and image [u'image3_thumb.jpg']
-    Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg']
-    Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']
+    >>> selector.css('foo').attrib
+    {}
 
 Using selectors with regular expressions
 ----------------------------------------
