@@ -101,6 +101,28 @@ selectors. This API can be used for quickly selecting nested data::
101101 'image4_thumb.jpg',
102102 'image5_thumb.jpg']
103103
104+ Instead of using the ``@src`` XPath expression, it is possible to query for attributes using
105+ the ``.attrib`` property of a :class:`~parsel.selector.Selector`::
106+
107+ >>> [img.attrib['src'] for img in selector.css('img')]
108+ ['image1_thumb.jpg',
109+ 'image2_thumb.jpg',
110+ 'image3_thumb.jpg',
111+ 'image4_thumb.jpg',
112+ 'image5_thumb.jpg']
113+
114+ As a shortcut, ``.attrib`` is also available on SelectorList directly;
115+ it returns the attributes of the first matching element::
116+
117+ >>> selector.css('img').attrib['src']
118+ 'image1_thumb.jpg'
119+
120+ This is most useful when only a single result is expected, e.g. when selecting
121+ by id, or selecting unique elements on a web page::
122+
123+ >>> selector.css('base').attrib['href']
124+ 'http://example.com/'
125+
104126To actually extract the textual data, you must call the selector ``.extract()``
105127method, as follows::
106128
@@ -132,6 +154,9 @@ Now we're going to get the base URL and some image links::
132154 >>> selector.css('base::attr(href)').extract()
133155 ['http://example.com/']
134156
157+ >>> selector.css('base').attrib['href']
158+ 'http://example.com/'
159+
135160 >>> selector.xpath('//a[contains(@href, "image")]/@href').extract()
136161 ['image1.html',
137162 'image2.html',
@@ -215,6 +240,9 @@ Examples:
215240 make much sense: text nodes do not have attributes, and attribute values
216241 are string values already and do not have children nodes.
217242
243+ .. note::
244+ See also: :ref:`selecting-attributes`.
245+
218246
219247.. _CSS Selectors: https://www.w3.org/TR/css3-selectors/#selectors
220248
@@ -237,13 +265,56 @@ too. Here's an example::
237265
238266 >>> for index, link in enumerate(links):
239267 ... args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
240- ... print 'Link number %d points to url %s and image %s' % args
268+ ... print('Link number %d points to url %s and image %s' % args)
269+
270+ Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
271+ Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
272+ Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
273+ Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
274+ Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']
275+
276+ .. _selecting-attributes:
277+
278+ Selecting element attributes
279+ ----------------------------
280+
281+ There are several ways to get the value of an attribute. First, one can use
282+ XPath syntax::
283+
284+ >>> selector.xpath("//a/@href").extract()
285+ ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
286+
287+ XPath syntax has a few advantages: it is a standard XPath feature, and
288+ ``@attributes`` can be used in other parts of an XPath expression, e.g.
289+ to filter elements by attribute value.
290+
291+ parsel also provides an extension to CSS selectors (``::attr(...)``)
292+ which allows getting attribute values::
293+
294+ >>> selector.css('a::attr(href)').extract()
295+ ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
296+
297+ In addition to that, there is a ``.attrib`` property of Selector.
298+ You can use it if you prefer to look up attributes in Python
299+ code, without using XPath or CSS extensions::
300+
301+ >>> [a.attrib['href'] for a in selector.css('a')]
302+ ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
303+
304+ This property is also available on SelectorList; it returns a dictionary
305+ with the attributes of the first matching element. It is convenient to use when
306+ a selector is expected to give a single result (e.g. when selecting by element
307+ ID, or when selecting a unique element on a page)::
308+
309+ >>> selector.css('base').attrib
310+ {'href': 'http://example.com/'}
311+ >>> selector.css('base').attrib['href']
312+ 'http://example.com/'
313+
314+ The ``.attrib`` property of an empty SelectorList is an empty dictionary::
241315
242- Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
243- Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg']
244- Link number 2 points to url [u'image3.html'] and image [u'image3_thumb.jpg']
245- Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg']
246- Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']
316+ >>> selector.css('foo').attrib
317+ {}
247318
248319Using selectors with regular expressions
249320----------------------------------------